Designed to Fail: Why Short-term Diet Trials Cannot Solve the Epidemics of Chronic Disease
A version of this essay was published in STAT News First Opinion on April 22
See bottom for a Stats 101 primer on the fatal flaw of “carry-over effects”
IMAGINE a clinical trial with sedentary, overweight adults. One group is assigned to remain sedentary, the other to undergo intensive physical training with daily runs, calisthenics, and sports. After a week or two, the training group would probably feel sore and tired, and their endurance might be reduced. But we wouldn’t conclude that physical activity is bad for health. Clearly, we’d need a better, longer study to see the benefits.
Unfortunately, this fundamental flaw affects clinical trials at the heart of the $170 million Nutrition for Precision Health program, as we discuss in a new paper in The BMJ (British Medical Journal).
Nutrition for Precision Health has an ambitious goal: to use artificial intelligence to determine which type of diet works best to prevent chronic disease for each person. To do this, research teams from across the country intend to study several thousand volunteers on 3 different diets. One diet is conventionally healthy, including vegetables, fruits, and whole grains. Another diet is highly processed, with lots of sugar, refined grains, and meat. The third is low-carbohydrate, based on high-fat foods and strictly limiting sugar and grains.
The investigators go to great lengths to maintain scientific rigor, providing the volunteers fully prepared meals to help them stick to the diets. Up to 1000 will be admitted as inpatients and kept under continuous observation, to ensure they eat nothing but the prescribed diets. A huge amount of data will be collected using advanced technologies with complicated names, like “microbial metagenomics and metatranscriptomics, targeted and untargeted metabolomics,” and an “Automatic Ingestion Monitor” system.
The problem is, with all the costs and complexity, the diets can’t last long — just 2 weeks each. Two weeks is simply too short to tell us anything meaningful about how diet affects obesity and the other chronic diseases plaguing Americans today.
Consider the trial comparing “ultra-processed” and “unprocessed” diets, on which the new research program is partly based. During 2-week inpatient stays, 20 volunteers initially ate about 600 calories more a day on the ultra-processed diet. However, this effect shrank by about 25 calories each day throughout the trial. At this rate, the diets would no longer differ after another two weeks. In a replication trial, the effect of the ultra-processed diet weakened after just one week.
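The extrapolation in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: it uses the rounded figures quoted in the text (about 600 calories at the start, shrinking by about 25 calories a day, 2-week diet periods) and assumes the decline stays perfectly linear, which is what "at this rate" implies but which the trial itself cannot confirm. The variable and function names are made up for illustration.

```python
# Rounded figures quoted in the essay; the straight-line decline is an assumption.
INITIAL_GAP_KCAL = 600      # extra daily intake on the ultra-processed diet at the start
DECLINE_KCAL_PER_DAY = 25   # how much that gap shrinks each day
DIET_LENGTH_DAYS = 14       # length of each inpatient diet period


def gap_on_day(day: int) -> int:
    """Extrapolated intake gap (kcal/day) after `day` days, never below zero."""
    return max(0, INITIAL_GAP_KCAL - DECLINE_KCAL_PER_DAY * day)


# Day on which the extrapolated gap reaches zero.
days_to_zero = INITIAL_GAP_KCAL // DECLINE_KCAL_PER_DAY
# How far past the end of the 2-week diet period that day falls.
days_past_trial = days_to_zero - DIET_LENGTH_DAYS
# Gap still remaining when the diet period ends.
gap_at_trial_end = gap_on_day(DIET_LENGTH_DAYS)

print(f"Gap at end of the diet period: {gap_at_trial_end} kcal/day")
print(f"Gap reaches zero on day {days_to_zero}, "
      f"{days_past_trial} days after the diet period ends")
```

Under these assumptions the gap would be gone about 10 days after the 2-week period ends, that is, within another two weeks.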
Myriad factors affect how much a person eats on any particular day, among them utensil size, plate color, room temperature, and social setting. But we don’t eat with small spoons on colorless plates in a warm room with strangers in the hope of losing weight, because we understand these effects are transient.
Does ultra-processed food cause obesity? Maybe, but we’ll never know from short-term trials like these.
To complicate matters further, the Nutrition for Precision Health trials use a cross-over design, in which all volunteers consume all three diets in succession over several months. This design is efficient and lets investigators examine how people vary in their individual responses. But there is a big catch — the effects of one diet can bleed into the next, creating a statistical mess.
Let’s imagine what might happen with that hypothetical short-term physical activity study, if it were run as a cross-over. One group would get the vigorous activities first, becoming tired and sore. Then, during the subsequent sedentary condition, they’d rest, recover, and experience delayed benefits of exercise as the temporary side effects wore off. But the group assigned the sedentary condition first could become even less fit than they started, making the subsequent physical activities even more likely to cause side effects. This is what’s called a carry-over effect — exercise makes being sedentary appear better than it really is, whereas being sedentary makes exercise appear worse. As every statistician knows, carry-over effects invalidate the trial (see quotes below).
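To make the bias concrete, here is a minimal sketch of a two-treatment, two-period cross-over under a simple additive model. Every number is hypothetical, and the labels A and B are placeholders rather than the actual diets or activities in any trial. The code works with expected values only (no random noise) and compares the usual within-subject cross-over estimate against an estimate that uses the first period alone.

```python
# Illustrative expected values only; every number below is hypothetical.
MU = 50.0                        # baseline outcome level
PERIOD = {1: 0.0, 2: 2.0}        # period effects (same for everyone)
DIRECT = {"A": 10.0, "B": 0.0}   # direct treatment effects
CARRY = {"A": 6.0, "B": 0.0}     # effect that lingers into the following period


def expected_outcome(treatment, period, previous=None):
    """Expected outcome under an additive model with first-order carry-over."""
    carry = CARRY[previous] if previous is not None else 0.0
    return MU + PERIOD[period] + DIRECT[treatment] + carry


# Sequence AB: treatment A in period 1, then B in period 2.
ab1 = expected_outcome("A", 1)
ab2 = expected_outcome("B", 2, previous="A")
# Sequence BA: treatment B in period 1, then A in period 2.
ba1 = expected_outcome("B", 1)
ba2 = expected_outcome("A", 2, previous="B")

true_effect = DIRECT["A"] - DIRECT["B"]
# Standard cross-over estimate: half the difference between the two
# sequences' within-subject (period 1 minus period 2) differences.
crossover_estimate = ((ab1 - ab2) - (ba1 - ba2)) / 2
# Parallel-group style estimate that ignores period 2 entirely.
first_period_estimate = ab1 - ba1

print("true effect:          ", true_effect)
print("cross-over estimate:  ", crossover_estimate)
print("first-period estimate:", first_period_estimate)
```

In this sketch the cross-over estimate is off by half the difference between the two carry-over effects (7 instead of 10), while the first-period comparison recovers the true effect. With equal carry-over from both treatments, the bias would cancel.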
Short-term diet trials are highly prone to these types of bias, because it takes weeks to months for the body to adapt to a major change in nutrients. For this reason, people starting a very-low-carb diet often experience fatigue and other symptoms — it’s called ‘keto flu.’ With volunteers selected from the general population — that is, with habitually high intakes of carbohydrate — bias inevitably works against the low-carbohydrate diet. (This problem is on full display in an influential trial comparing low-carbohydrate and low-fat diets — the carry-over effect was a massive 2000 Calories a day!)
These trials are not only inconclusive but also potentially misleading, because they can make a healthy diet look bad and an unhealthy diet look good. We must do better.
Can specific diets support metabolism, calm inflammation, slow aging, or protect the brain? Feeding studies with all the bells and whistles can help answer these basic questions, but they must be long enough to allow the body to adapt, with diets and wash-out periods (the time between successive diets) of at least 2 months.
Ultimately, there is no substitute for long-term trials. We’d never approve a drug for obesity, diabetes, or any chronic disease based on 2-week or even 2-month data.
And there is no substitute for government support of nutrition science, as we’ve seen from the success of publicly funded research on tobacco-related diseases, HIV/AIDS, and birth defects, among other areas. Whereas Big Pharma can easily raise the $1 billion that may be needed to develop just 1 drug for just 1 health condition, no big company profits by preventing disease through diet. Only government can fill this gap.
With the recent budget cuts, every research dollar must count. Funding at the level of Nutrition for Precision Health could support several large-scale trials of low-carbohydrate, ultra-processed, and other diets over at least 2 years. These studies would lay a solid foundation for future dietary guidelines and patient care recommendations. (The government previously sponsored a slew of major low-fat diet trials, all basically negative, but not one low-carbohydrate diet trial of comparable magnitude.)
We’ve been debating diet for decades, even as rates of diet-related chronic diseases continue to surge. To solve this public health crisis, we need definitive research from high-quality studies. Short diet trials will never measure up to this task.
Stats 101: The fatal flaw of carry-over effects
Carry-over effects invalidate cross-over trials, as evidenced by these quotes dating back half a century. This point really isn’t debatable.
As we document in our BMJ paper, this problem seems more the rule than the exception in short-term cross-over trials comparing diets.
Varma 1974: “If there is a differential carryover, then only the first two readings are used, i.e., the baseline and the first ‘active period’ … If there is a differential carryover, the treatment difference can be estimated only from the first period”
Hill 1979: “If the results of the trial suggest a definite interaction between treatments and periods [i.e., a carry-over effect] then … the treatment comparison should be based on the first period alone.”
Armitage 1982: “The crucial problem is that if there is an interaction [carry-over effect] there is no point in estimating and testing the treatment main effect in the manner just described. If the treatment effect is different in the two periods there is little or no interest in its average over the two periods, since the particular pattern of administration of treatments adopted for the trial will normally have no clinical relevance. It is therefore a matter of great importance to decide whether or not we can safely assume no interaction [carry-over].”
Laska 1983: “the usual two-period, two-treatment crossover design, AB,BA, cannot be used to estimate the contrast between direct treatment effects when unequal carryover effects are present”
Senn 1988: “The justification for a cross-over design must depend on medical opinion as to whether the wash-out period can be regarded as achieving its aim”
Freeman 1989: “When carryover is present, the nominal level very seriously understates the actual level, and this becomes even worse when baseline observations are ignored. Increasing sample size only exacerbates the problem since this adverse behaviour then occurs at smaller values of the carryover effect.”
“This [2 group crossover] design is still widely used in clinical trials, despite its well-known deficiency of being unable to provide an unbiased estimate of the difference between the treatment effects in the presence of a differential carryover effect. Such an estimate can only be obtained by: (a) using data from the first treatment period only (thereby making the second period pointless) or (b) assuming that there is no differential carryover”
Statistical Principles for Clinical Trials, 1998: “Crossover designs have a number of problems that can invalidate their results. The chief difficulty concerns carryover, that is, the residual influence of treatments in subsequent treatment periods. In an additive model the effect of unequal carryover will be to bias direct treatment comparisons. In the 2×2 design the carryover effect cannot be statistically distinguished from the interaction between treatment and period and the test for either of these effects lacks power because the corresponding contrast is ‘between subject’.”
Fleiss 1989: “If markedly unequal carryover effects are a realistic, not merely a theoretical possibility, the trial should be carried out as a simple parallel groups experiment”
Lehmacher 1991: “If [unequal carry-over effects can’t be excluded] then the crossover trial should be avoided, because a lack of residual effects cannot be determined satisfactorily from the sample data”
Chow 1992: “When unequal carryover effects are present, the standard 2 X 2 crossover design may not be useful because the differential carryover effect is confounded with sequence or formulation-by-period effects.”
Senn 1994: “baselines do not provide a cure for the problem of carry-over; and it is concluded that any rational analysis of such trials will always be dependent on assumptions regarding carry-over, and that it is necessary to pay particular attention to washout periods.”
Wang 1997: “One of the challenges with two-treatment two-period crossover designs stems from the possible presence of differential carryover effects that may invalidate use of the second-period data. The carryover effects are completely confounded with treatment-by-period interaction and sequence effects”
Food and Drug Administration 2001: “if the possibility of unequal carryover effects cannot be ruled out, no unbiased estimate of [the treatment effect] based on within-subject comparisons can be obtained with this design.”
Jones & Kenward 2003: “the carry-over and the direct-by-period interaction parameters are intrinsically aliased with each other.”
“Our advice therefore is to avoid having to test for a carry-over difference by doing all that is possible to remove the possibility that such a difference will exist. This requires using wash-out periods of adequate length between the treatment periods. This in turn requires a good working knowledge of the treatment effects, which will most likely be based on prior knowledge”
Reed 2004: “The AB/BA design is still widely used in clinical trials despite its well-known deficiency of being unable to provide an unbiased estimate of the difference between the treatment effects in the presence of a differential carryover effect. Such an estimate can only be obtained by using data from the first treatment period (which obviously makes the second period pointless) or assuming that there is no differential carryover.”
Ambrosius 2007: “To use the crossover design, we need to ensure that the effects from the previous treatment do not carry over into the period of the next treatment. If a carryover effect exists, the analysis is complicated, and the direct comparison of the treatment effects can be invalidated.”
Mori 2015: “the AB/BA design cannot be applied in the presence of carryover effects and/or treatments-by-period interaction”
Lichtenstein 2021: “Since each participant serves as his/her own control, crossover study designs are unsuitable when outcomes … have long carryover effects”
Penn State STATS 509: “The recommendation for crossover designs is to avoid the problems caused by differential carryover effects at all costs by employing lengthy washout periods and/or designs where treatment and carryover are not aliased or confounded with each other”
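Several of the quotes above say that, in the two-treatment, two-period design, carry-over is "aliased" or "confounded" with a treatment-by-period interaction. A rough way to see what that means: two quite different explanations can produce exactly the same expected results in all four cells of the design, so the data cannot tell them apart. The sketch below uses a hypothetical additive model with made-up numbers; the function and parameter names are illustrative and are not taken from any of the cited sources.

```python
def cell_means(direct, carry, interaction, mu=50.0, period2=2.0):
    """Expected outcome in each (sequence, period) cell of an AB/BA design.

    direct[t]:      direct effect of treatment t
    carry[t]:       effect of treatment t lingering into the next period
    interaction[t]: extra effect of treatment t when it is given in period 2
    """
    return {
        ("AB", 1): mu + direct["A"],
        ("AB", 2): mu + period2 + direct["B"] + carry["A"] + interaction["B"],
        ("BA", 1): mu + direct["B"],
        ("BA", 2): mu + period2 + direct["A"] + carry["B"] + interaction["A"],
    }


direct = {"A": 10.0, "B": 0.0}
none = {"A": 0.0, "B": 0.0}

# Explanation 1: treatment A leaves a carry-over effect; no interaction.
carryover_world = cell_means(direct, carry={"A": 6.0, "B": 0.0}, interaction=none)
# Explanation 2: no carry-over; treatment B simply works differently in period 2.
interaction_world = cell_means(direct, carry=none, interaction={"A": 0.0, "B": 6.0})

for cell in carryover_world:
    print(cell, carryover_world[cell], interaction_world[cell])
print("identical cell means:", carryover_world == interaction_world)
```

Because both explanations predict the same four cell means, no analysis of the trial data alone can separate them, which is why the quoted authors stress adequate wash-out periods and prior knowledge instead of after-the-fact testing.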