RESEARCH · arXiv CS.CL · 4/14/2026
Generating High Quality Synthetic Data for Dutch Medical Conversations
This paper presents a pipeline that uses a fine-tuned Large Language Model to generate synthetic Dutch medical dialogues, addressing the scarcity of clinical data caused by privacy constraints. Evaluation showed strong lexical variety, while qualitative review found a scripted conversational flow and problems with domain specificity.