Generating High Quality Synthetic Data for Dutch Medical Conversations

arXiv CS.CL · April 14, 2026

This paper presents a pipeline for generating synthetic Dutch medical dialogues with a fine-tuned Large Language Model, addressing the scarcity of clinical data caused by privacy constraints. Evaluation showed strong lexical variety, but qualitative review found a scripted conversation flow and problems with domain specificity.
