SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]
An independent researcher created SGOCR, an open-source dataset pipeline for spatially-grounded, OCR-focused visual question answering (VQA), to fill a gap in visual datasets for grounding text in images. The pipeline generates VQA tuples with rich metadata, which supports a range of vision-language model (VLM) training strategies.
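
To make "VQA tuple with rich metadata" concrete, here is a minimal sketch of what one spatially-grounded record could look like. Everything in it is an assumption for illustration: the class name `GroundedVQATuple`, the field names, the pixel `(x_min, y_min, x_max, y_max)` box convention, the `ocr_confidence` key, and the sample values are all hypothetical and are not taken from SGOCR's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple


@dataclass
class GroundedVQATuple:
    """Hypothetical record layout; SGOCR's real schema may differ."""

    image_id: str
    question: str
    answer: str
    # Box around the text region that supports the answer, in pixels,
    # as (x_min, y_min, x_max, y_max). This convention is an assumption.
    bbox: Tuple[int, int, int, int]
    # Free-form extra information, e.g. OCR confidence (assumed key names).
    metadata: Dict[str, object] = field(default_factory=dict)

    def normalized_bbox(self, width: int, height: int) -> Tuple[float, float, float, float]:
        """Scale the pixel box to the [0, 1] range given the image size."""
        x0, y0, x1, y1 = self.bbox
        return (x0 / width, y0 / height, x1 / width, y1 / height)


# Made-up example values, for illustration only.
sample = GroundedVQATuple(
    image_id="img_0001",
    question="What does the sign above the door say?",
    answer="EXIT",
    bbox=(120, 40, 280, 90),
    metadata={"ocr_confidence": 0.97},
)

# Serialize to JSON, e.g. as one line of a JSONL training file.
record = json.dumps(asdict(sample))
```

Keeping the box next to the question and answer is what would let the same record serve different training setups: a model can be trained on the answer alone, on the box alone, or on both together.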
