Paper-Journal

Interactive active learning for literature screening: finetuning GPT with DeepSeek reasoning for cross-domain generalization

We propose an active learning framework that leverages disagreement between large language models (GPT and DeepSeek) to selectively fine-tune GPT models with reasoning-augmented supervision, significantly improving accuracy and recall in automated biomedical literature screening.

Mar 9, 2026

Testing and evaluation of generative large language models in electronic health record applications: a systematic review

Our systematic review shows that current evaluations of LLMs in real clinical settings are narrow, focusing mostly on radiology and decision support, while key areas such as patient communication remain largely underexplored.

Jan 13, 2026