Did The New York Times Overhype the AI Study?

Unpacking How AI is Changing Healthcare

This Week in Health AI #7 | Subscribe

Happy Friday! Let’s jump in.

Today We’re Covering:

  • AI "Beat the Doctors" Story: A JAMA study claims ChatGPT outperformed physicians in diagnostic reasoning, but a key critic argues the results are overhyped due to flawed methodology and unrealistic test conditions.

  • Paragon Health Institute’s AI Roadmap: Kev Coleman’s regulatory framework could shape healthcare AI policy under the Trump administration.

  • AI in Healthcare: The Rush to Innovate: A survey reveals 73% of healthcare executives are increasing AI investments, with a focus on quick operational wins and the pivotal "build vs. buy" debate.

The AI "Beat the Doctors" Story—Too Good to Be True?

Last Sunday, The New York Times made waves with a story that highlighted a JAMA study claiming ChatGPT "defeated" a group of physicians in diagnostic reasoning. The study, led by Dr. Adam Rodman, compared ChatGPT's performance to that of 50 physicians tackling six complex clinical cases. On average, the chatbot scored 90%, while physicians achieved 74% to 76%, whether or not they used the AI tool.

This finding struck a nerve, feeding into the growing narrative that AI is poised to take a central role in healthcare diagnostics. The Times' piece amplified these claims, quickly attracting the attention of both AI enthusiasts and skeptics.

For many in the AI health community, the study reinforced the belief that AI will inevitably work its way into key aspects of clinical decision-making.

However, not everyone was convinced.

Enter Sergei Polevikov, an AI researcher and healthcare entrepreneur, whose detailed critique of the study put the brakes on this runaway narrative.

Polevikov’s Counterpoints: It’s Not What You Think

Polevikov raised significant concerns about the study's validity:

  • Sample Size Problems: The study evaluated just six clinical cases, cherry-picked from a dataset of 105. A sample that small severely undermines statistical reliability (see the back-of-the-envelope sketch after this list).

  • Misleading Framing: The study focused on diagnostic reasoning (the cognitive process of analyzing evidence), not final diagnosis accuracy. This distinction was glossed over in the Times' coverage, which conflated the two.

  • Case Selection Bias: The cases used were unusually complex, and key medical terms were replaced with simplified descriptions, potentially skewing results in favor of the AI model.

  • Real-World Applicability: The controlled vignette format of the study doesn’t replicate the messy, real-world scenarios where doctors must juggle incomplete data, patient interactions, and time constraints.
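
To see why a six-case sample is so fragile, here's a minimal back-of-the-envelope sketch in Python. The numbers are hypothetical, not drawn from the study: we simply assume per-case scores vary with a standard deviation of roughly 15 percentage points and ask how precisely a mean can be estimated from six cases.

```python
# Illustrative only -- hypothetical numbers, not data from the JAMA study.
# Question: how precisely can a mean score be estimated from just 6 cases?
import math

n = 6           # number of clinical cases in the study
sd = 15.0       # assumed between-case standard deviation, in percentage points
t_crit = 2.571  # two-sided 95% critical value of Student's t with n - 1 = 5 df

margin_of_error = t_crit * sd / math.sqrt(n)
print(f"95% margin of error on a 6-case mean: +/- {margin_of_error:.1f} points")
# -> roughly +/- 16 points, on the same order as the reported 90% vs. 74-76% gap,
#    before even accounting for the cases being hand-picked from a pool of 105.
```

Under that purely illustrative assumption, the uncertainty around a six-case average is about as large as the headline performance gap itself, which is the heart of the sample-size objection.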

Polevikov’s critique is a reminder that context and methodology matter. While AI applications in patient care are deeply intriguing, the study's limitations make its headline-grabbing conclusions premature at best.

So where does that leave us? The New York Times' portrayal of this study likely overstates ChatGPT's readiness to surpass human doctors in real-world diagnostic tasks. This kind of premature framing can distort public perception and confuse an already nuanced debate about AI's role in healthcare. With ongoing advancements in AI and its potential for diagnostic support, more rigorous, large-scale studies are essential to clarify what these tools can—and cannot—do.

As the dust settles, we’ll be keeping an eye out for follow-up research, which seems inevitable given the hype surrounding stories like this one.

Paragon Health Institute’s AI Roadmap for Healthcare Regulation

Kev Coleman, a visiting fellow at the Paragon Health Institute, proposes a detailed framework for regulating AI in healthcare as the Trump administration prepares its technology policies. Coleman’s proposal, grounded in his experience in software and healthcare, aims to preserve AI’s cost-saving and efficiency potential while ensuring oversight is tailored to the specific technologies involved.

Key Points:

  1. Granular AI Regulation: Coleman stresses the need to regulate AI at a detailed level, recognizing that broad policies might lead to unintended consequences. He recommends specific definitions for technologies like large language models and neural networks.

  2. Decentralized Agency Role: Rather than centralizing AI oversight under a single officer, Coleman suggests leveraging the expertise of agencies like the FDA and CMS to independently govern AI applications.

  3. Risk-Based Approach: High-risk AI applications should undergo more rigorous scrutiny, while proven tools could operate with less oversight, reducing costs and fostering competition.

  4. Economic Impacts: Coleman’s July policy paper highlights the potential for AI to disrupt consolidated hospital markets by enabling competition through autonomous care delivery and devices.

Why This Matters

Kev Coleman’s upcoming paper, expected next month, could significantly influence healthcare AI policy as the Trump administration shapes its own approach to technology regulation. With Trump poised to repeal Biden’s 2023 AI executive order, organizations like the Paragon Health Institute, which is seen as ideologically aligned with Trump, may play a pivotal role in framing the regulatory landscape. These developments are worth monitoring, as they could redefine how AI is integrated into healthcare, balancing innovation against patient safety and market competition.

Full Article | Read Coleman's Last Report

AI in Healthcare: The Rush to Innovate

A recent survey of healthcare executives by Define Ventures indicates that AI is a top priority for providers and payers, with 73% planning increased investments. The focus is on achieving early operational efficiencies to build momentum for more transformative applications. The survey also highlights a significant "build vs. buy" debate, as organizations navigate the choice between creating custom AI tools or leveraging third-party solutions.

Key Takeaways

  • Quick Wins Drive Adoption: Healthcare organizations prioritize applications like ambient scribing (83%) and administrative AI (59%) to improve operational efficiency and clinician workflows.

  • Governance Structures in Place: Nearly three-fourths of surveyed organizations have established AI governance to prioritize use cases, ensure safety, and address data policies.

  • Build vs. Buy: 72% of organizations rely on external vendors for application solutions, but many invest internally in data infrastructure, reflecting a hybrid approach.

  • Challenges Persist: Integration issues, unclear ROI, and limited technical resources remain significant hurdles, while seamless EHR integration and data security are top priorities when evaluating vendors.

Why This Matters

The findings spotlight AI's growing influence as healthcare players rush to integrate new solutions, underscoring how central AI has become to competitive strategy. The "build vs. buy" dilemma captures a critical crossroads: will healthcare systems lean on incumbents like Epic and other established healthcare software vendors, or will new entrants use AI as a foothold to disrupt their market dominance? Layered on top of this is ongoing regulatory uncertainty, as regulators struggle to issue meaningful guidance to shape how AI will be used in our industry.

Full Article

Other Things Worth Checking Out

Here are some other developments that might be worth your time.

FDA advisory committee to roll up sleeves on generative AI this week - The FDA's Digital Health Advisory Committee (DHAC) met Wednesday and Thursday of this week to establish a framework for regulating generative AI in medical devices, addressing risks unique to these systems, such as hallucinations and bias in foundation models.

Researchers: LLMs Can Help Process Hospital Quality Measures - A UC San Diego pilot study found that large language models (LLMs) can efficiently and accurately process complex hospital quality measures, such as CMS's SEP-1 sepsis measure.

Tenet to deploy Commure’s AI scribe at physician network - Commure and Tenet announced that Commure's ambient AI platform, Commure Scribe, will be deployed across Tenet Physician Resources’ national network.

That’s it for now. We’ll catch up again next week.

-Patrick

Enjoy this? Please take a moment to forward it to a friend or subscribe.