Dietaryindex: A Case Study in Improving Reproducibility, Standardization and Access in Research and Beyond 


Introduction 

Last year, I conducted validation work for dietaryindex, a comprehensive, easy-to-use, and highly versatile R package that was created by James Zhan, a PhD student at Emory in the Nutritional and Health Sciences program. Dietaryindex contains over fifty highly flexible functions to calculate more than ten different dietary pattern indexes – an enormous undertaking that immensely improves standardization and reproducibility in nutritional research. 

Prior to dietaryindex, nutritional research lacked standardized, validated, open-source tools for calculating dietary indexes, and the few tools that did exist were only available in SAS. That’s the other way dietaryindex is helping reshape the field: it’s expanding who can even work in it. With the release of dietaryindex, R users now have true access to (and even a leg up in) a field that has been SAS-centric for decades. This expansion isn’t just nice for R users; it’s essential to the field, because more and more nutrition research is starting to include microbiome and other omics data, which are better suited to analysis in R than in SAS.

  

What’s this blog post about? 

In this blog post, we’ll dive into the ways dietaryindex was systematically created to address the challenges found in nutritional research and in calculating dietary pattern indexes. We’ll discuss what those challenges are, how dietaryindex addressed them, and how dietaryindex serves as a case study for how we can improve access and reproducibility in science (and beyond) by creating standardized, open-source tools in R.

 

First off: what are dietary pattern indexes? How does this problem relate to fields outside of nutritional research?  

Dietary pattern indexes (also known as diet indexes or diet scores) quantify dietary intake to assess the diet quality and overall diet patterns of individuals in a study population. They’re commonly used in nutritional research to study the relationship of diet quality or larger diet patterns with diseases such as diabetes, obesity, or cancer.

However, creating scores to classify individuals in order to understand or predict their disease risk isn’t limited to nutritional research. Lots of other fields, both in research and in clinical medicine, as well as fields entirely outside of science, use them too.

Here are a few examples of indexes or scores used in fields besides nutrition:

  • Polygenic risk scores, or genetic risk scores: These are scores calculated according to the genetic risk factors an individual has for a given disease. 

 

  • Medical scores: Scores (or ‘checklists’, or classification systems) can be based on a variety of factors, such as behavioral risk factors (e.g., smoking history, body mass index, age), clinical biomarkers (e.g., prostate-specific antigen (PSA) levels), or symptoms and their severity (e.g., chest pain), which are used to help clinicians make decisions for their patients’ care. They are used all across medicine, from pediatrics to oncology, to psychiatry and emergency medicine. 

 

  • Credit scores: These are rating systems based on credit use, payment history, income, and more, which insurance or financial companies use to assess the level of credit or financial risk an individual poses. We’re probably all familiar with these: they impact our ability to get loans or open new credit cards or bank accounts.

 

Challenges in calculating dietary indexes, and how dietaryindex solved them 

 There were four primary challenges that dietaryindex was built to address.

 

Challenge 1: It’s a time-consuming and complicated process to figure out how to calculate a diet score

Prior to the creation of dietaryindex, there was only one dietary pattern index (the Healthy Eating Index (HEI)) that had an openly available, validated SAS program to calculate it. Unfortunately, for the rest of the dietary patterns commonly used in the field, such as the Alternative Healthy Eating Index (AHEI), the Mediterranean diet scale (MED), and more, calculating them was left up to the researcher or clinician using them in their studies. 

 

This was problematic for a lot of reasons: 

  • It was a manual and tedious process that could result in incorrect interpretations or miscoding, which could lead to errors in analysis. 

 

  • Even if you managed to interpret and encode everything correctly, many of the diet scores, like the Dietary Inflammatory Index (DII), are complicated and take a long time to work through.

 

  • If you planned to calculate multiple diet scores, that meant looking up a lot of papers, and referencing all of them. 

 

Now, with the creation of dietaryindex, anyone conducting dietary pattern research can find everything they need in one place: the dietaryindex GitHub page. It has a comprehensive Excel file that lays out every dietary pattern’s scoring system, explains how each component is calculated, and cites the publication each one came from. For anyone working in SAS, this saves a lot of time and effort. For everyone working in R, it saves even more, because we don’t even need the Excel file: we can use the dietaryindex functions themselves to calculate the scores with minimal coding.

[Figure: the scoring table for the HEI-2020 index, one of the simpler indexes. Just think about the coding involved in this and the many places you might get tripped up.]
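To give a sense of what scoring even one component by hand involves, here is a minimal base R sketch for a single HEI adequacy component (Total Fruits). It assumes the published HEI-2015/HEI-2020 standard for ages 2 and older: the maximum of 5 points at 0.8 cup equivalents per 1,000 kcal, scored proportionally down to 0 points for no intake. The function name, argument names, and intake values are hypothetical and are not part of dietaryindex.

```r
# Hypothetical helper, not a dietaryindex function.
# Scores one HEI-style adequacy component on a density (per 1,000 kcal) basis.
score_adequacy <- function(intake, kcal, max_standard, max_points) {
  density <- intake / (kcal / 1000)               # amount per 1,000 kcal
  proportion <- pmin(density / max_standard, 1)   # cap at the standard
  proportion * max_points
}

# Toy intakes, made up for illustration: cup equivalents of total fruit and energy
fruit_cup_eq <- c(0, 0.8, 2.4)
energy_kcal  <- c(2000, 2000, 2000)

total_fruit_score <- score_adequacy(fruit_cup_eq, energy_kcal,
                                    max_standard = 0.8, max_points = 5)
print(total_fruit_score)
```

This covers one component only. The full HEI has thirteen components, some of which are scored in reverse (less intake earns more points), so the opportunities for miscoding multiply quickly.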

 

Challenge 2: Dietary indexes are cohort-specific and weren’t easily open-sourced

Many dietary indexes are population-specific, meaning that the score a particular individual gets for a particular dietary pattern can depend on the other people in their study population or cohort. For instance, several diet scores (i.e., AHEI-2010, DASH, DII, and ACS-2020) assign individuals points according to the tertile, quartile, or quintile of food intake that an individual’s consumption falls within for that particular study population.
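The cohort dependence described above can be sketched in a few lines of base R. This is an illustration of quintile-based scoring in general, not the scoring rule of any specific index and not dietaryindex code. The helper name and both toy cohorts are made up.

```r
# Hypothetical helper, not a dietaryindex function.
# Assigns 1 to 5 points by the quintile of intake within the supplied cohort.
quintile_points <- function(intake) {
  breaks <- quantile(intake, probs = seq(0, 1, by = 0.2), na.rm = TRUE)
  as.integer(cut(intake, breaks = breaks, include.lowest = TRUE))
}

# Toy cohorts (made-up daily fruit servings)
cohort_a <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
cohort_b <- cohort_a / 2   # a cohort that eats half as much fruit

points_a <- quintile_points(cohort_a)
points_b <- quintile_points(cohort_b)

# Points earned by someone eating 2.5 servings/day in each cohort
print(points_a[cohort_a == 2.5])
print(points_b[cohort_b == 2.5])
```

In this toy data, the same 2.5 servings per day earns 3 points in cohort A but 5 points in cohort B, which is why a single fixed scoring script cannot simply be shared across studies.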

Prior to the creation of dietaryindex, only one dietary index had a publicly available SAS program to calculate it: the Healthy Eating Index (HEI).  Perhaps unsurprisingly, this is also one of the only dietary indexes that isn’t cohort-specific. 

With the creation of dietaryindex, R users can now easily and reproducibly calculate over ten dietary indexes, including HEI. Dietaryindex managed to get around the issue of cohort-specific coding by creating a variety of functions that enable users to input the different food components the scores need, with dietaryindex standardizing any food units to work for any diet index, before it calculates the individual scores for the provided study population. 

 

Challenge 3: There were freely available Food Frequency Questionnaires (FFQs) to collect dietary data, but no tools to analyze them

There are many freely available and commonly used dietary assessment tools in the nutrition field, such as the DHQ-III food frequency questionnaire (FFQ) and the ASA24 24-hour recall tool, which both provide clean data output files. But prior to dietaryindex, there were no easy ways to calculate dietary indexes with them, aside from HEI.

Now that we have dietaryindex, there are also freely available, standardized, and validated tools to calculate diet scores directly from the data output of these questionnaires. This all but eliminates the need to write scoring code yourself: you just supply the locations of the data output files for your study. Not only does this vastly improve reproducibility, access, and standardization in the field, but it also makes it a lot easier to conduct nutritional research in the first place.

Need another reason this feat is so amazing? Let’s say you’re a researcher who wants to calculate multiple dietary indexes. Unfortunately, prior to dietaryindex, this would actually mean more effort, not less. This is because the different dietary indexes often require different units of input and don’t always count the same individual foods within the same food group.  

For instance, both the DASH and Mediterranean dietary indexes need daily servings of whole fruit and 100% fruit juices; meanwhile, AHEI-2010 and ACS-2020 don’t include fruit juices (whether they’re 100% or not), but do want daily servings, whereas the HEI wants only whole fruits (no juices) in cup equivalents, and the PHDI wants only whole fruits in grams. 
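As a rough sketch of the bookkeeping this implies, the base R snippet below encodes the fruit rules from the paragraph above in a small lookup table and derives the input each index would expect for one person. The helper name is hypothetical, and the conversion factors are placeholders passed in by the caller for illustration only. They are not taken from dietaryindex or from any food composition database.

```r
# Inclusion rules and units as described in the paragraph above
fruit_rules <- data.frame(
  index         = c("DASH", "MED", "AHEI-2010", "ACS-2020", "HEI", "PHDI"),
  include_juice = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  unit          = c("serving", "serving", "serving", "serving", "cup_eq", "gram"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not a dietaryindex function.
# Conversion factors are supplied by the caller (assumed values, see below).
fruit_input <- function(whole_servings, juice_servings,
                        cup_eq_per_serving, grams_per_serving) {
  servings <- whole_servings +
    ifelse(fruit_rules$include_juice, juice_servings, 0)
  conv <- c(serving = 1, cup_eq = cup_eq_per_serving, gram = grams_per_serving)
  data.frame(index  = fruit_rules$index,
             amount = servings * unname(conv[fruit_rules$unit]),
             unit   = fruit_rules$unit,
             stringsAsFactors = FALSE)
}

# One person: 2 servings of whole fruit and 1 serving of 100% juice per day.
# The two conversion factors are placeholders, not real reference values.
person <- fruit_input(whole_servings = 2, juice_servings = 1,
                      cup_eq_per_serving = 0.5, grams_per_serving = 100)
print(person)
```

Even in this simplified form, one person’s fruit intake turns into three different numbers in three different units, and that is for a single food group.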

Dietaryindex not only made calculating multiple dietary indexes easy for the commonly used FFQs, but it also completely eliminated this ‘units’ issue. Now, researchers using the common FFQs (as well as anyone using the publicly available datasets from NHANES) don’t have to prepare any of these inputs by hand. They can simply specify the location of the standard output data files from these different resources, and dietaryindex will calculate the scores directly from those files in a single, streamlined piece of code.

That’s pretty dang impressive.

 

Challenge 4: The publicly available dietary dataset (NHANES) had few tools to analyze its dietary recall files in a standardized, reproducible way

The National Health and Nutrition Examination Survey (NHANES) combines a survey with one to two dietary recalls, conducted in a nationally representative sample of approximately 5,000 Americans each year and released in two-year cycles. Survey and diet data from these studies are publicly available going back as far as the 2001-2002 cycle. Prior to dietaryindex, only HEI could be calculated easily and reproducibly with NHANES data. But for the other scores? Despite the NHANES data being standardized, it’s not as easy as you might think to calculate them.

Why? Because dietary recalls, which ask people about everything they ate or drank in the last twenty-four hours, contain thousands of foods and beverages. It can be a lot of work to figure out which foods or beverages should count toward the different components each dietary index asks for – and that’s ignoring the problem of figuring out how to convey all your decisions in your later publication so that other researchers can reproduce it.

Here’s an example of the type of decisions you will have to make: someone drank a latte. Should the dairy in a latte (a coffee drink made with milk) count toward the dairy score? If so, how do you decide what portion of the latte is dairy and should be counted? And to make matters more complicated, if a diet score requires grams or servings, but the latte was recorded in ounces, then you need to convert units too, right?
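To make the latte question concrete, here is a minimal base R sketch of the arithmetic once those decisions have been made. The helper name and the 75% milk fraction are assumptions chosen purely for illustration. The sketch also assumes that one cup of milk counts as one cup equivalent of dairy, and it uses the fixed conversion of 8 US fluid ounces per US cup.

```r
# Hypothetical helper, not a dietaryindex function.
# Converts the milk portion of a mixed drink to cup equivalents of dairy,
# assuming 1 cup of milk = 1 cup equivalent of dairy.
dairy_cup_eq <- function(drink_fl_oz, dairy_fraction) {
  fl_oz_per_cup <- 8   # 1 US cup = 8 US fluid ounces
  drink_fl_oz * dairy_fraction / fl_oz_per_cup
}

# A 12 fl oz latte, assuming (for illustration only) that 75% of it is milk
latte_dairy <- dairy_cup_eq(drink_fl_oz = 12, dairy_fraction = 0.75)
print(latte_dairy)
```

The arithmetic is trivial. The hard part is that every analyst might pick a different milk fraction, and that choice has to be documented for anyone who wants to reproduce the result.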

Dietaryindex solved this reproducibility issue by specifying which NHANES food groups or individual food codes were used to calculate each scoring component for each of the dietary indexes, as well as by validating its results for HEI (the only diet score with standardized code available for NHANES) against the output from the NHANES SAS code. Now, anyone calculating any of the ten commonly used dietary indexes with NHANES data can use dietaryindex’s functions, cite dietaryindex, and know their results are valid and easily reproducible.
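The general idea of validating one implementation against a reference program can be sketched in base R as below. This is not the validation procedure actually used for dietaryindex. The helper name, the tolerance, and the score values are all made up for illustration.

```r
# Hypothetical helper, not a dietaryindex function.
# Reports the largest gap between two sets of scores and whether
# every pair agrees within a chosen tolerance.
compare_scores <- function(r_scores, reference_scores, tolerance = 0.01) {
  stopifnot(length(r_scores) == length(reference_scores))
  abs_diff <- abs(r_scores - reference_scores)
  list(max_abs_diff = max(abs_diff),
       all_within_tolerance = all(abs_diff <= tolerance))
}

# Toy values (made up): total HEI scores for the same three participants
# from an R calculation and from a reference program
r_hei         <- c(52.31, 67.80, 45.12)
reference_hei <- c(52.31, 67.80, 45.125)

result <- compare_scores(r_hei, reference_hei)
print(result)
```

A check like this is only meaningful when both programs are run on the same participants and the same input files, which is exactly what standardized NHANES data makes possible.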

 

Convinced yet of the inspiring, revolutionary feat that is dietaryindex?

Hopefully, by now I’ve sung its praises enough, and helped even those of you completely unfamiliar with nutritional research and dietary indexes to understand the revolutionary work of dietaryindex. Dietaryindex has created a standardized, validated, and open-source tool for calculating dietary pattern indexes that improves reproducibility and access within nutritional research while also expanding the types of research that can be done by allowing more R users into the field.

There are lots of other fields out there, both in science and in industry, that use classification systems similar to dietary indexes. Hopefully, this blog post has inspired you to think harder about the challenges in your own field, and about how you might dig into them to find innovative solutions.

References

If you want to learn more about dietaryindex, check out our recently published paper or the dietaryindex GitHub page. 
