AI has passed medical exams. Will the doctors of the future be made of silicon?

ChatGPT is just Uncle Google on steroids – a stochastic parrot – so the use of AI in medicine is a pipe dream. Right? Meanwhile, a group of AI models just crushed the USMLE medical licensing exam, achieving results that most living doctors can only dream of. And they did it without reading textbooks, without night shifts, and without a single cup of coffee.

When the machines begin to confer

Scientists from Johns Hopkins University have created something that sounds like a science fiction scenario – the “AI council”. Five models based on ChatGPT not only solved exam questions but, above all… debated with one another. They compared answers, argued, persuaded each other, and reached joint conclusions.

The result? On 325 questions covering clinical knowledge and basic medical science, the system achieved 97%, 93%, and 94% correct answers across the three stages of the test. These are not “pretty good” results. They exceed the capabilities of single AI models and set the bar at a level that many human medical candidates will never reach.

The power of AI collective thinking (even if there is nothing to think about)

The most fascinating aspect of this experiment? The models did not receive any specialized medical training or additional data. As Yahya Shaikh, co-author of the study published in “PLOS Digital Health”, emphasizes:

Our research shows that when several AI systems conduct a collaborative discussion, they achieve the highest performance ever in medical licensing exams – without special training or access to medical data.

It’s a bit like gathering five first-year students, having them talk about diagnostics, and suddenly discovering that together they solve cases at the level of experienced specialists. Except there are no students here, only algorithms exchanging arguments.

When the models could not reach an agreement, the scientists brought a mediator into play: an additional AI system that analyzes the discrepancies and prompts further discussion. Thanks to this approach, as many as 53% of the initially incorrect answers were corrected.

Picture this: machines argue over a diagnosis, a digital arbitrator is called in, and in the end the group arrives at a better answer than any of them offered individually. Sounds like a parody of bureaucratic management? And yet it works better than even the most advanced single AI.
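The deliberation loop described above can be sketched in a few lines. This is not the researchers’ actual code – the agents and the mediator here are hypothetical stand-ins for calls to real language models – but it illustrates the mechanism: each agent answers independently, and only when the votes split does the mediator see the disagreement and decide.

```python
# A minimal sketch of the "AI council" idea: independent agents answer,
# and a mediator resolves disagreements. In a real system each Agent
# would wrap a language-model API call; these stubs return canned
# answers purely for illustration.
from collections import Counter
from typing import Callable

Agent = Callable[[str], str]

def council_answer(question: str, agents: list[Agent], mediator: Agent) -> str:
    """Collect every agent's answer; return it if they are unanimous,
    otherwise pass the question and the split votes to the mediator."""
    answers = [agent(question) for agent in agents]
    tally = Counter(answers)
    if len(tally) == 1:  # full agreement, no mediation needed
        return answers[0]
    # Disagreement: the mediator sees the question plus the vote split.
    summary = f"{question} | candidate answers: {dict(tally)}"
    return mediator(summary)

# Stub agents simulating a 3-2 split on a multiple-choice question.
agents = [lambda q: "B", lambda q: "B", lambda q: "B",
          lambda q: "C", lambda q: "C"]
mediator = lambda prompt: "B"  # a real mediator would reason over the split

print(council_answer("Q42: best next diagnostic step?", agents, mediator))  # B
```

In the study, the mediator reportedly prompted further rounds of discussion rather than simply picking a winner; a fuller sketch would loop until the agents converge or a round limit is hit.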

Is this the end of the stethoscope?

Before you start imagining a future where holograms in white coats make diagnoses, scientists warn: the method has not yet been tested in real clinical settings. This is the fundamental difference between passing an exam and holding someone’s life in (non-existent) hands.

Nevertheless, researchers emphasize the potential:

Our work provides the first clear evidence that AI systems can self-correct through structured dialogue, and the effects of their interaction exceed the capabilities of a single model

– adds Shaikh.

It is worth adding that over the last three years (since the start of the so-called generative revolution, i.e. the launch of ChatGPT built on GPT-3.5), many doctors have tested genAI’s diagnostic capabilities, and some of the results were genuinely promising. Importantly, patients themselves also often experiment with AI when first trying to identify their diseases and conditions. Is that appropriate? Well, on the one hand, AI will not replace a human specialist; on the other, it can often produce an accurate preliminary diagnosis, which is then refined during a visit to a real doctor. None of this changes the fact that AI may support doctors in making more accurate decisions in the future. Not to replace, but to support. At least for now.

Silicon doctors? Not necessarily. Consultants – coming soon

In a decade, will we be consulting our symptoms with an AI advisory board instead of a GP? It’s hard to say, though the imagination conjures Trauma Team clinics straight out of a Cyberpunk dystopia. Technology is no longer just a tool in human hands. It is becoming a discussion partner that can question its own assumptions and correct its mistakes – something that many people are… well, not so good at.

Maybe the doctors of the future won’t be made of silicon. But they will definitely have silicon colleagues who are not afraid to say, “Wait a minute, let’s think about this again.” And honestly? In medicine, such a colleague can be worth their weight in gold.