computer vision

125 items

RESEARCH · arXiv CS.LG · 5d ago

Do Transformers Need Three Projections? Systematic Study of QKV Variants

This research systematically evaluates variants of the Query, Key, and Value (QKV) attention formulation in Transformers, including shared key-value, shared query-key, and single-projection variants. Experiments across synthetic, vision, and language modeling tasks show that these alternative formulations perform on par with, or occasionally better than, standard QKV Transformers, with Q-K=V sharing offering a significant KV cache reduction in language modeling.

QKV computer vision attention mechanisms Language modeling
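A minimal sketch of why sharing the key and value projection shrinks the KV cache. It uses single-head scaled dot-product attention in plain Python and assumes no biases, no output projection, and no multi-head split. The function names are hypothetical and are not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def standard_qkv(x, w_q, w_k, w_v):
    # Three independent projections: K and V must both be cached during decoding.
    return attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))


def shared_kv(x, w_q, w_kv):
    # Q-K=V style: one projection serves as both keys and values,
    # so only one tensor per layer has to live in the KV cache.
    kv = matmul(x, w_kv)
    return attention(matmul(x, w_q), kv, kv)


def kv_cache_bytes(layers, tokens, d_model, bytes_per_elem, kv_shared):
    """Cache size when every layer stores d_model values per token per cached tensor."""
    tensors_per_layer = 1 if kv_shared else 2
    return layers * tokens * d_model * bytes_per_elem * tensors_per_layer
```

The sizes here are illustrative and do not come from the paper: 32 layers, 4096 tokens, d_model 4096, and fp16 (2 bytes). Under those assumptions the cache drops from 2 GiB to 1 GiB. Whether quality holds up is the empirical question the study addresses.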

ARTICLE · DEV.to AI · 27d ago

Everything Google announced at its Android Show, from Googlebooks to vibe-coded widgets

The article offers a technical analysis of Google's Android Show announcements, focusing on the new Google Books app and vibe-coded widgets. It describes how Google Books uses a proprietary rendering engine with ML-based text recognition, while the vibe-coded widgets use NLP and computer vision via TensorFlow Lite to deliver personalized experiences.

Android machine learning computer vision Natural Language Processing

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/16/2026

Camera-ready paranoia [D]

A user expresses "camera-ready paranoia" after submitting their paper to CVPRW, fearing rejection due to potential errors despite having used a PDF validation tool and the correct template. They are seeking confirmation on when the paper will be placed in the proceedings, noting its current status as "In production".

academic submission research publishing computer vision AI Research

ARTICLE · DEV.to AI · 4/18/2026

Privacy-Preserving Active Learning for sustainable aquaculture monitoring systems with inverse simulation verification

The content introduces the challenges of optimizing sustainable aquaculture using AI, specifically citing data scarcity, privacy concerns, and the simulation-to-reality gap in computer vision applications. It describes the author's journey to formulate a Privacy-Preserving Active Learning approach with inverse simulation verification to address these practical issues.

Privacy AI aquaculture computer vision sustainable AI
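The summary does not say which acquisition rule the author uses, so this is only a hedged illustration of the active-learning idea. It shows entropy-based uncertainty sampling, a common generic strategy that picks the unlabeled samples the current model is least sure about. The privacy-preserving mechanism and the inverse simulation verification are not modeled, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, budget):
    """Return the `budget` sample ids with the most uncertain predictions.

    pool_probs maps a sample id to the model's class probabilities.
    Ties keep the pool's original order (Python's sort is stable).
    """
    ranked = sorted(pool_probs, key=lambda sid: entropy(pool_probs[sid]), reverse=True)
    return ranked[:budget]
```

Consider a hypothetical pool `{"frame_a": [0.98, 0.02], "frame_b": [0.5, 0.5], "frame_c": [0.7, 0.3]}` with a budget of 2. The selection is `["frame_b", "frame_c"]`, so the near-certain `frame_a` is not sent for labeling.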

RESEARCH · DEV.to AI · 4/13/2026

FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age

FairFace is a face attribute dataset designed to mitigate biases in AI models by providing balanced representation across race, gender, and age. It aims to improve the fairness and robustness of computer vision systems, ensuring more equitable performance.

FairFace Dataset Bias Mitigation computer vision

RESEARCH · DEV.to AI · 4/8/2026

An All-in-One Network for Dehazing and Beyond

This content explores a unified neural network designed to remove haze from images and potentially perform other image-processing tasks. It covers advanced solutions in computer vision and artificial intelligence.

Image processing deep learning computer vision Dehazing

RESEARCH · DEV.to AI · 5/2/2026

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

This research introduces a Temporal-Channel Transformer designed for 3D Lidar-based video object detection. It aims to improve the perception capabilities of autonomous driving systems by processing sequential Lidar data.

object detection computer vision autonomous driving LiDAR

ARTICLE · DEV.to AI · 5/4/2026

Flagged by a Face: Innocent Shoppers Banned With No Way to Fight Back

The article discusses how technical failures in facial recognition watchlists lead to innocent shoppers being banned without recourse. It highlights the gap between identification accuracy in labs and real-world accountability, emphasizing the problem of false positives in 1:N search systems.

ethics privacy security computer vision
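Some back-of-envelope arithmetic helps explain why 1:N watchlist search produces false positives even when lab accuracy looks excellent. This sketch treats one search as N independent 1:1 comparisons that share the same per-comparison false match rate (FMR). That is a simplification, and all numbers are hypothetical rather than taken from the article.

```python
def prob_any_false_match(fmr, gallery_size):
    """P(at least one false match) when one innocent face is searched against N entries.

    Assumes each of the N comparisons independently false-matches with rate fmr.
    """
    return 1.0 - (1.0 - fmr) ** gallery_size


def expected_false_matches(fmr, gallery_size, probes):
    """Expected number of false matches when `probes` innocent faces are searched."""
    return fmr * gallery_size * probes
```

Under these assumptions, an FMR of 0.01% (1e-4) against a 10,000-face watchlist gives each innocent shopper about a 63% chance of matching someone. Even an FMR of 1e-6 yields roughly 500 expected false matches across 50,000 shopper visits. The search scales the per-comparison error by N, which is why real deployments need much stricter thresholds than 1:1 verification does.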

RESEARCH · DEV.to AI · 5/9/2026

Anticipating Visual Representations from Unlabeled Video

This content explores methods for anticipating visual representations from unlabeled video. The research investigates models' ability to learn visual features without explicit supervision, enhancing contextual understanding in video sequences.

computer vision representation learning video-analysis unsupervised learning

RESEARCH · arXiv CS.LG · 25d ago

Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations

This paper investigates certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. It proposes a reusable monitor that infers safety-relevant quantities from images and provides finite-sample guarantees, leveraging semantic latent representations to certify formulas without per-formula retraining.

machine learning computer vision runtime monitoring formal methods
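To make "monitoring ptSTL" concrete, here is a hedged sketch of online robustness for two bounded past-time operators over a discrete-time signal. The operators are "historically" (H) and "once" (O). In the paper the signal would be a safety-relevant quantity inferred from images. Here it is simply given, and the finite-sample guarantees are not modeled. The class name is hypothetical.

```python
from collections import deque


class BoundedPastMonitor:
    """Online robustness of H_[0,w](x > c) ('historically') or O_[0,w](x > c) ('once').

    The robustness of the predicate x > c at one step is x - c. 'historically'
    takes the min over the last w+1 samples; 'once' takes the max. Before w+1
    samples exist, only the samples seen so far are used (a convention choice).
    """

    def __init__(self, threshold, window, mode="historically"):
        if mode not in ("historically", "once"):
            raise ValueError("mode must be 'historically' or 'once'")
        self.threshold = threshold
        self.agg = min if mode == "historically" else max
        self.buf = deque(maxlen=window + 1)  # old samples fall off automatically

    def step(self, x):
        """Feed one (e.g. image-inferred) value; return the current robustness."""
        self.buf.append(x - self.threshold)
        return self.agg(self.buf)
```

Several monitors with different thresholds, windows, or operators can consume the same inferred signal. This loosely mirrors the paper's goal of checking new formulas without retraining the perception model. A negative output means the formula is currently violated.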

RESEARCH · DEV.to AI · 4/19/2026

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

F-VLM introduces a novel approach for open-vocabulary object detection by efficiently leveraging frozen pre-trained vision and language models. This method allows for identifying a wide range of objects without requiring specific training data for each new category.

Vision-Language Models deep learning object detection computer vision
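A simplified sketch of the generic open-vocabulary classification step that approaches like this rely on. Each detected region's embedding is compared with text embeddings of category names by cosine similarity, then turned into probabilities with a temperature-scaled softmax. This is not F-VLM's exact scoring, which also combines detector and VLM scores. The toy 3-d embeddings and the temperature value are assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_region(region_emb, text_embs, temperature=0.01):
    """Score one region embedding against text embeddings of category names.

    Returns a softmax distribution over the category names. Adding a category
    only requires a new text embedding; nothing in the detector is retrained.
    """
    names = list(text_embs)
    logits = [cosine(region_emb, text_embs[name]) / temperature for name in names]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}
```

In practice the embeddings would come from the frozen vision and text encoders, and "zebra" could be added to the vocabulary by encoding the word alone.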

DOC · DEV.to AI · 5/10/2026

How I cut speech-bubble retries from 70% to 0% with 200 lines of Pillow code

The author drastically reduced AI image generation retries for unreadable text in speech bubbles by offloading typography to a deterministic post-processing step. This involved having the AI draw empty bubbles and then using Pillow and OpenCV to add readable text, achieving a zero-retry rate for text issues.

Image processing AI generation Stable Diffusion computer vision
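The post's own code is not shown here. Below is a hypothetical sketch of one piece such a post-processing step needs: picking a font size and line breaks so text fits inside a bubble's bounding box. The text-width function is injected. `approx_measure` is a crude stand-in that assumes each character is 0.6 × font size wide. With Pillow, a real measure could call `ImageDraw.Draw(img).textlength(text, font=ImageFont.truetype(font_path, size))`, and the chosen lines could be drawn with `ImageDraw.text`.

```python
def wrap_words(text, max_width, size, measure):
    """Greedy word wrap: add words to the current line while it still fits."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if measure(candidate, size) <= max_width or not current:
            current = candidate  # a single over-long word is caught by the caller
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def fit_text(text, box_w, box_h, measure, max_size=48, min_size=10, line_spacing=1.2):
    """Return (size, lines) for the largest font size that fits the box, else None."""
    for size in range(max_size, min_size - 1, -1):
        lines = wrap_words(text, box_w, size, measure)
        too_wide = any(measure(line, size) > box_w for line in lines)
        if not too_wide and len(lines) * size * line_spacing <= box_h:
            return size, lines
    return None


def approx_measure(s, size):
    # Hypothetical width model: every character is 0.6 * font size wide.
    return len(s) * size * 0.6
```

Because the layout is deterministic, a `None` result means the bubble is too small for the text. That can be handled by regenerating the image or shortening the line, rather than hoping the model renders legible text.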

RESEARCH · arXiv CS.AI · 4/20/2026

GIST: Multimodal Knowledge Extraction and Spatial Grounding via Intelligent Semantic Topology

GIST introduces a multimodal knowledge extraction pipeline for spatial grounding in complex environments, transforming mobile point clouds into semantically annotated navigation topologies. It distills scenes into 2D occupancy maps, extracts topological layouts, and overlays a lightweight semantic layer to aid embodied AI and humans.

multimodal AI navigation spatial AI computer vision
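As a rough illustration of the "distill scenes into 2D occupancy maps" step, this sketch projects a 3D point cloud onto a grid. A point marks its cell occupied if its height falls within an obstacle band. The band, resolution, and function name are assumptions; GIST's actual pipeline is not described in the summary.

```python
import math


def occupancy_grid(points, resolution, z_min, z_max):
    """Project 3D points (x, y, z) in meters onto a 2D occupancy grid.

    Points with z in [z_min, z_max] mark their (x, y) cell as occupied (1).
    Returns (grid, origin): grid[row][col] covers the cell whose lower-left
    corner is origin + (col * resolution, row * resolution).
    """
    cells = set()
    for x, y, z in points:
        if z_min <= z <= z_max:
            # floor (not int) keeps negative coordinates in the correct cell
            cells.add((math.floor(x / resolution), math.floor(y / resolution)))
    if not cells:
        return [], (0.0, 0.0)
    min_i = min(i for i, _ in cells)
    min_j = min(j for _, j in cells)
    width = max(i for i, _ in cells) - min_i + 1
    height = max(j for _, j in cells) - min_j + 1
    grid = [[0] * width for _ in range(height)]
    for i, j in cells:
        grid[j - min_j][i - min_i] = 1
    return grid, (min_i * resolution, min_j * resolution)
```

Points below `z_min` (floor) or above `z_max` (ceiling) are ignored, so the map records only what would block a walker or robot at that height.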

RESEARCH · arXiv CS.CL · 4/23/2026

Hybrid Multi-Phase Page Matching and Multi-Layer Diff Detection for Japanese Building Permit Document Review

This research presents a hybrid multi-phase page matching algorithm for automating the comparison of complex Japanese building permit document sets, which is currently a labor-intensive and error-prone manual process. The algorithm robustly pairs pages across revisions using structural alignment and dynamic programming, then applies a multi-layer diff engine to produce detailed difference reports with high accuracy.

machine learning computer vision document processing automation
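The summary mentions pairing pages across revisions with dynamic programming. Below is a hedged sketch of that general idea as a Needleman-Wunsch-style global alignment of two page sequences. It uses token-set Jaccard similarity with a hypothetical gap penalty and match threshold, and it is not the paper's multi-phase algorithm. Matched pairs could then be fed to a text differ such as Python's `difflib`.

```python
def jaccard(a, b):
    """Token-set similarity between two pages' extracted text."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def align_pages(old, new, gap=-0.2, min_sim=0.3, mismatch=-1.0):
    """Globally align two page sequences; None marks a removed or added page."""
    n, m = len(old), len(new)
    # Pairing two dissimilar pages is penalized more than skipping one.
    pair = [[s if (s := jaccard(o, p)) >= min_sim else mismatch for p in new] for o in old]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair[i - 1][j - 1],  # pair the two pages
                score[i - 1][j] + gap,  # old page removed
                score[i][j - 1] + gap,  # new page added
            )
    # Trace back from the bottom-right corner to recover the alignment.
    i, j, result = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair[i - 1][j - 1]:
            result.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            result.append((i - 1, None))
            i -= 1
        else:
            result.append((None, j - 1))
            j -= 1
    return result[::-1]
```

Here is a toy example in which a revision inserts a cover sheet: `align_pages(["site plan north", "floor plan level 1", "elevation east"], ["cover sheet", "site plan north", "floor plan level 1 revised", "elevation east"])` pairs each old page with its revised counterpart and reports new page 0 as added. The `:=` operator requires Python 3.8 or later.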

RESEARCH · DEV.to AI · 5/9/2026

DeXpression: Deep Convolutional Neural Network for Expression Recognition

DeXpression is a deep convolutional neural network model designed for accurate facial expression recognition. It aims to enhance computer vision systems' ability to interpret human emotions from images.

facial expression recognition deep learning computer vision Convolutional Neural Networks

ARTICLE · DEV.to AI · 4/23/2026

BiRefNet vs rembg vs U2Net: Which Background Removal Model Actually Works in Production?

This article compares the production performance of BiRefNet, rembg, and U2Net background removal models, emphasizing that real-world differences are much larger than benchmarks suggest. It details the brutal and distinct failure cases of each model when applied at scale.

AI models Production AI Image processing Benchmarking

RESEARCH · arXiv CS.AI · 4/8/2026

Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation

This work proposes a generative framework for synthesizing animatable 3D Gaussian vehicles from a single image or sparse multi-view inputs. It aims to overcome the limitations of current rigid vehicle models in autonomous-driving simulation by introducing a refinement module for part articulation.

computer vision autonomous driving 3D Generation Vehicle Simulation

ARTICLE · DEV.to AI · 4/15/2026

We Integrated Netflix's VOID Model Into Our API — Here's What Nobody Tells You About Video Object Removal

The article describes the integration of Netflix's VOID model into an API for video object removal. It explains how VOID overcomes the issues of traditional inpainting tools, which fail to address artifacts and physics, by treating the problem as 4D.

Netflix VOID computer vision Video Inpainting Video Object Removal

ARTICLE · DEV.to AI · 4/18/2026

Discord Leaked 70,000 IDs Answering One Simple Question: Are You 18?

Discord's exposure of 70,000 government IDs for age verification highlights a severe case of architectural over-collection. The article advocates for moving away from full identity-linked verification towards threshold-based estimation, utilizing facial age estimation tools for binary questions.

biometrics data privacy data breach computer vision
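A hypothetical sketch of the threshold-based design the article advocates. The check answers only the binary question "over 18?" from a facial age estimate plus a safety margin. It returns an explicit "inconclusive" result for borderline faces, so only those cases fall back to a stronger check, and no ID or estimate is retained. The estimator callable and the margin value are assumptions, not any vendor's API.

```python
def age_gate(estimated_age, margin, threshold=18.0):
    """Answer only 'is this person over the threshold?'.

    Returns True (clearly over), False (clearly under), or None (inconclusive:
    the estimate +/- margin straddles the threshold, so escalate to a fallback).
    """
    if estimated_age - margin >= threshold:
        return True
    if estimated_age + margin < threshold:
        return False
    return None


def verify_age(image_bytes, estimator, margin, threshold=18.0):
    # `estimator` is a hypothetical callable: image bytes -> estimated age in years.
    # Only the decision leaves this function; the estimate is not stored or returned.
    return age_gate(estimator(image_bytes), margin, threshold)
```

A 30-year-old face with a 5-year margin passes outright, while a 20-year-old estimate is inconclusive and would trigger the fallback. The width of the margin directly trades convenience against the number of people asked for stronger proof.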

ARTICLE · DEV.to AI · 17d ago

YouTube Just Made Every Creator a Deepfake Cop — Here's Why Investigators Should Be Nervous

YouTube's expanded deepfake detection tools transform synthetic media verification into a standard production requirement, shifting the burden of proof in digital investigations. This "democratization of detection" implies that platform likeness detection flags will become primary artifacts in legal and insurance disputes.

deepfake security computer vision fraud detection