computer vision

125 items

RESEARCH · arXiv CS.LG · 5d ago

Do Transformers Need Three Projections? Systematic Study of QKV Variants

This research systematically evaluates variants of the Query, Key, and Value (QKV) attention formulation in Transformers, including shared key-value, shared query-key, and single-projection designs. Experiments across synthetic, vision, and language modeling tasks show that these alternatives perform on par with, or occasionally better than, standard QKV Transformers, and that Q-K=V sharing significantly reduces the KV cache in language modeling.

29
ARTICLE · DEV.to AI · 27d ago

Everything Google announced at its Android Show, from Googlebooks to vibe-coded widgets

The article examines the technical side of Google's Android Show announcements, focusing on the new Google Books app and vibe-coded widgets. It describes how Google Books uses a proprietary rendering engine with ML for text recognition, while vibe-coded widgets use NLP and computer vision via TensorFlow Lite to personalize the experience.

29
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/16/2026

Camera-ready paranoia [D]

A user describes "camera-ready paranoia" after submitting their paper to CVPRW: despite using a PDF validation tool and the correct template, they fear the paper will be rejected over potential errors. They ask when the paper will appear in the proceedings, noting that its current status is "In production".

29
ARTICLE · DEV.to AI · 4/18/2026

Privacy-Preserving Active Learning for sustainable aquaculture monitoring systems with inverse simulation verification

The post introduces the challenges of using AI to optimize sustainable aquaculture, citing data scarcity, privacy concerns, and the simulation-to-reality gap in computer vision applications. It describes how the author arrived at a Privacy-Preserving Active Learning approach with inverse simulation verification to address these practical issues.

28
RESEARCH · arXiv CS.LG · 25d ago

Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations

This paper investigates certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. It proposes a reusable monitor that infers safety-relevant quantities from images and provides finite-sample guarantees, leveraging semantic latent representations to certify formulas without per-formula retraining.

28
RESEARCH · arXiv CS.AI · 4/20/2026

GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology

GIST introduces a multimodal knowledge extraction pipeline for spatial grounding in complex environments, transforming mobile point clouds into semantically annotated navigation topologies. It distills scenes into 2D occupancy maps, extracts topological layouts, and overlays a lightweight semantic layer to aid embodied AI and humans.

28
RESEARCH · arXiv CS.CL · 4/23/2026

Hybrid Multi-Phase Page Matching and Multi-Layer Diff Detection for Japanese Building Permit Document Review

This research presents a hybrid multi-phase page matching algorithm for automating the comparison of complex Japanese building permit document sets, a task that is currently done manually and is labor-intensive and error-prone. The algorithm pairs pages across revisions using structural alignment and dynamic programming, then applies a multi-layer diff engine to produce detailed difference reports.

28
RESEARCH · arXiv CS.AI · 4/8/2026

Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation

This work proposes a generative framework for synthesizing animatable 3D Gaussian vehicles from a single image or sparse multi-view inputs. It aims to overcome the limitations of the rigid vehicle models currently used in autonomous driving simulation by introducing a refinement module for part articulation.

28