SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]

Reddit r/MachineLearning · April 20, 2026

An independent researcher created SGOCR, an open-source dataset pipeline for spatially grounded, OCR-focused visual question answering (VQA), to fill a gap in visual datasets for grounding text in images. The pipeline generates VQA tuples with rich metadata, supporting a range of vision-language model (VLM) training strategies.
