We propose an active learning framework that leverages disagreement between large language models (GPT and DeepSeek) to selectively fine-tune GPT models with reasoning-augmented supervision, significantly improving accuracy and recall in automated biomedical literature screening.
Mar 9, 2026
Our systematic review shows that current evaluations of LLMs in real clinical settings are surprisingly narrow, focusing mostly on radiology and decision support, while key areas such as patient communication remain largely underexplored.
Jan 13, 2026