Mental Health Diagnoses May Be Less Reliable Than Thought

A meta-analysis found that repeating the same psychiatric assessment often produced different results, underscoring the complexity of diagnosing mental health conditions.

A mental health diagnosis can influence everything from the way people see themselves to the medications they are prescribed, their insurance coverage, and even their job opportunities.

However, a new study, published in JAMA Network Open, suggests that psychiatry’s most trusted diagnostic interviews—often considered the gold standard—may be less reliable than many clinicians and patients assume.

Researchers found that when adults completed the same interview twice, typically within one to two weeks, they did not always receive the same diagnosis.

“Many people assume these interviews give a definitive answer—that you either do or do not have a condition,” senior author of the study Laura Duncan, assistant professor in the Department of Psychiatry and Behavioural Neurosciences at McMaster University, told The Epoch Times via email.

“In reality, diagnosis may be more contextual.”

Mental Health Diagnoses Don’t Always Match

The analysis pools data from 46 studies covering more than 8,000 adults in 26 countries. Those studies used 17 different structured diagnostic tools to assess mental and substance use disorders, including the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, the Composite International Diagnostic Interview, and the Mini-International Neuropsychiatric Interview.

Across the studies reviewed, however, overall agreement between the first and second interviews was only moderate. On a standard scale on which a score of one means perfect agreement, the interviews scored 0.69. That falls short of the consistency many people might expect from a diagnostic tool so often considered a gold standard.

“The more accurate takeaway is not that diagnoses are arbitrary,” Duncan said. “It’s that they are not perfectly reliable when measured using structured interviews.”

Diagnoses tied to more observable behaviors tended to be more consistent. For example, substance use disorders as a group scored higher than mental disorders as a group, with an agreement score of 0.72 compared with 0.65.

Opioid use disorder was among the most reliably diagnosed conditions in the entire analysis, scoring 0.81.

At the other end of the spectrum, nonaffective psychosis, a category that includes disorders such as schizophrenia, showed agreement of just 0.55, well below the overall score of 0.69. Anxiety disorders, depression, and personality disorders generally fell in the low to mid 0.60s.

Bipolar disorder, at 0.74, was a relative bright spot among psychiatric diagnoses. Among substance use disorders, hallucinogen use disorder ranked lowest, at 0.59.

The Difficulty With Mental Health Diagnosis

Mental health conditions can be difficult to measure in a perfectly consistent way. Unlike a broken bone on an X-ray, most psychiatric disorders are assessed entirely through self-report: how a person describes his thoughts, feelings, and behaviors at a given moment in time.

“Behaviors like substance use or actions like stealing or vandalism tend to be easier to recall and describe consistently,” Duncan said. “Internal experiences like mood or anxiety are more subjective and can be harder to assess in a consistent way.”

A person's current mental state can also shape how he describes his symptoms from one week to the next.

“That can impact the reliability of their ability to self‑assess,” she said.

Further, mental health symptoms themselves can shift with stress, sleep, relationships, or major life events. As a result, two interviews conducted close together may capture different slices of the same person's mental state. A bad week, or a patient's unwillingness to talk about what he is going through, can affect his answers.

Duncan's earlier research on children and adolescents found even lower reliability. A 2019 meta-analysis of standardized psychiatric interviews that she co-authored found only moderate agreement, with an average reliability of about 0.58 on the same scale.

These findings point to a broader difficulty: capturing changing emotional states with fixed diagnostic labels.

The Implications

The implications extend well beyond the clinic. Structured interviews are widely used in psychiatric research to estimate disorder prevalence, screen participants for clinical trials, and validate diagnostic instruments.

If the instruments themselves carry significant measurement error, those findings inherit that uncertainty.

“Structured interviews are often treated as a ‘gold standard,’ but our findings suggest they have important limitations,” Duncan said. “My hope is that these findings open up an important conversation: whether we should think differently about how we define and measure mental disorders.”

That does not mean that diagnoses are meaningless or that clinicians should abandon structured tools. It does suggest, however, that a single interview, no matter how carefully administered, should rarely be treated as definitive.

Diagnosis should be seen as a working formulation rather than a final verdict, Duncan said.

“These interviews can be very helpful, but their results should be interpreted in light of what moderate reliability actually means,” she said.

Cara Michelle Miller
Cara Michelle Miller is a health reporter for The Epoch Times. She covers both health news and in-depth features on emerging health issues. Prior to taking up writing, she taught at the Pacific College of Health and Science in NYC for 12 years and led communication seminars for engineering students at The Cooper Union.