computer vision

125 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/16/2026

Can frontier AI models actually read a painting? [R]

An experiment tested frontier multimodal AI models, including Gemini 3.1 Pro and GPT-5.4, on their ability to appraise art from vision alone. The study revealed a "recognition vs commitment gap," suggesting that for AI, "seeing" something and actually relying on what is seen are not the same.

multimodal AI AI capabilities art appraisal Benchmarking

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/18/2026

We’re proud to open-source LIDARLearn [R] [D] [P]

LIDARLearn is a unified PyTorch library for 3D point cloud deep learning, supporting 56 ready-to-use configurations and built-in cross-validation. It also automates publication-ready LaTeX PDF generation after training, making it ideal for researchers in 3D computer vision and remote sensing.

Open Source deep learning computer vision 3d-point-cloud

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/27/2026

Microsoft Presents "TRELLIS.2": An Open-Source, 4b-Parameter, Image-To-3D Model Producing Up To 1536³ PBR Textured Assets, Built On Native 3D VAES With 16× Spatial Compression, Delivering Efficient, Scalable, High-Fidelity Asset Generation.

Microsoft's TRELLIS.2 is a 4B-parameter open-source model for high-fidelity image-to-3D generation, capable of producing up to 1536³ PBR textured assets using native 3D VAEs with 16× spatial compression and a novel O-Voxel structure. It offers an efficient, scalable way to generate detailed 3D assets with full PBR materials.

Open Source Image-to-3D 3D modeling computer vision

ARTICLE · ↑ trending · Reddit r/MachineLearning · 5/7/2026

Dataset of 150k+ stool images and not sure how to fully use it [D]

A user with a 150k stool image dataset seeks best practices for training a computer vision model. They question their current manual verification workflow and look for smarter, more scalable approaches for ensuring dataset and annotation quality.

dataset-quality model training machine learning computer vision

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/7/2026

Visual Perceptual to Conceptual First-Order Rule Learning Networks [R]

The post discusses recent Inductive Logic Programming (ILP) research on predicate induction from image datasets. The author questions whether ILP can compete with deep learning in computer vision, despite strong performance claims.

learning machine learning computer vision ILP

RESEARCH · ↑ trending · Reddit r/MachineLearning · 27d ago

Elastic Attention Cores for Scalable Vision Transformers [R]

This paper introduces Elastic Attention Cores as a new building block for scalable Vision Transformers, addressing the high cost of dense self-attention. The approach uses a core-periphery block-sparse attention structure and nested dropout for elastic inference cost adjustments, achieving competitive accuracy.

deep learning computer vision attention mechanisms Vision Transformers

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

Gemma 4 Vision

Gemma 4's default vision budget is often too low for effective detail recognition, causing poor OCR performance. Users can significantly enhance its vision by configuring `llama.cpp` parameters like `--image-min-tokens` and `--image-max-tokens` to higher values, such as 560 and 2240.

Optimization Configuration computer vision Gemma
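
As a rough illustration of the Gemma 4 Vision tip above, the sketch below assembles a `llama.cpp` command line with the raised image token budget. The two `--image-*-tokens` flags and the values 560 and 2240 come from the post; the choice of the `llama-server` binary, the helper name `build_llama_server_args`, and the GGUF file names are assumptions made for illustration only.

```python
import shlex


def build_llama_server_args(model_path, mmproj_path,
                            image_min_tokens=560, image_max_tokens=2240):
    """Assemble a llama-server argument list with a raised image token budget.

    The helper itself is hypothetical; it only strings together flags.
    """
    if image_min_tokens > image_max_tokens:
        raise ValueError("image_min_tokens must not exceed image_max_tokens")
    return [
        "llama-server",
        "-m", model_path,            # main model weights (GGUF)
        "--mmproj", mmproj_path,     # multimodal projector file
        "--image-min-tokens", str(image_min_tokens),
        "--image-max-tokens", str(image_max_tokens),
    ]


# Hypothetical file names; substitute your own GGUF files.
args = build_llama_server_args("gemma-4.gguf", "mmproj-gemma-4.gguf")
print(shlex.join(args))
```

Whether these particular values suit a given workload is something to check empirically: a higher budget costs more context and compute per image.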

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/9/2026

Detecting mirrored selfie images: OCR the best way? [D]

The user is looking for an effective way to detect mirrored text in selfies before passing them to Vision-Language Models (VLMs) or face-embedding extractors, which are insensitive to mirroring because they were trained on augmented data. Their idea is to use OCR (EasyOCR) to compare the text-reading score of the normal image against the mirrored one, and they ask whether this is the best approach or whether a smaller, smarter model-based solution exists.

AI models Image processing Vision-Language Models computer vision
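
The OCR comparison described in the mirrored-selfie post can be sketched roughly as follows. This is a minimal illustration, not the poster's code: the function names, the `margin` threshold, and the list-of-rows image representation are all assumptions. The scoring function is passed in so that any OCR engine can be plugged in.

```python
def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def mean_confidence(ocr_results):
    """Average confidence over (bbox, text, confidence) tuples."""
    if not ocr_results:
        return 0.0
    return sum(conf for _, _, conf in ocr_results) / len(ocr_results)


def is_probably_mirrored(image, score_fn, margin=0.1):
    """Return True when the flipped image reads clearly better than the original.

    score_fn maps an image to a text-readability score (higher is better).
    margin is an assumed threshold that keeps small score noise from
    triggering a false positive.
    """
    normal_score = score_fn(image)
    flipped_score = score_fn(flip_horizontal(image))
    return flipped_score - normal_score > margin
```

With EasyOCR, `score_fn` could be something like `lambda img: mean_confidence(reader.readtext(img))`, since `readtext` returns `(bbox, text, confidence)` tuples by default; in that case the image would be a NumPy array and the flip would be done with `numpy.fliplr` instead. Images without any text give no signal either way, so a fallback would still be needed.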

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/10/2026

What image/video training data is hardest to find right now? [R]

A user is building a crowdsourced photo collection platform that uses YOLO/CLIP for automatic labeling and metadata enrichment. They ask which kinds of image data are hardest to find and most in demand for training AI models, citing examples such as European street scenes and supermarket shelves.

computer vision Image Annotation AI development Crowdsourcing

ARTICLE · DEV.to AI · 2d ago

Iowa Wants Your Driver's License. Nobody Will Say Where It Goes.

Iowa's new age-gate law presents significant technical hurdles for developers, mandating "reasonable age verification" under threat of heavy fines. This shift requires the deployment of complex solutions like document OCR, facial comparison, and robust data retention logic, turning security features into critical backend requirements.

biometrics privacy security computer vision

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/17/2026

Thoughts on vision-captchas [D]

The author explores the potential of vision-based CAPTCHAs (webcam + gesture detection) that run locally in the browser for bot prevention. They raise the question of trust and privacy concerns regarding camera usage for this purpose.

AI applications privacy security computer vision

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/27/2026

What do reviewers actually mean when they say the paper sounds more like a technical report? [D]

An author's paper was rejected from a workshop for sounding more like a technical report than a research paper, despite following the usual computer vision format. They are seeking community opinion to understand common faux pas that lead to such an assessment.

academic publishing computer vision Peer review AI Research

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/5/2026

Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]

A PhD student in AI/computer vision is struggling to reproduce the reported accuracy of a published paper, consistently achieving ~73% against the paper's ~77% baseline. Despite thorough checks and attempts to contact authors, the student is encountering a reproducibility gap that impedes further research.

research PhD student machine learning computer vision

ARTICLE · DEV.to AI · 4/20/2026

Building a Touchless AI Mouse Control in 2 hours with Python 🖱️✨

This article introduces NUMBA_3, a Python-based open-source tool that enables touchless AI mouse control using a webcam and hand gestures. Developed in just two hours, it leverages MediaPipe, OpenCV, and Numba for fluid, real-time cursor movement, packaged with PyInstaller.

Open Source human-computer interaction machine learning computer vision

ARTICLE · DEV.to AI · 3d ago

Face Recognition: From Traditional to Deep Learning Methods

An overview of face recognition methods, from traditional approaches to current deep learning techniques, tracing how the technology used in the field has evolved.

deep learning Face Recognition computer vision AI Methods

NEWS · ↑ trending · Reddit r/MachineLearning · 4/20/2026

CVPR Broadening Participation Results. [D]

A user on Reddit reported not receiving the CVPR26 Broadening Participation Scholarship decision email, despite conference chairs confirming that all participants had been notified. The user is still awaiting their decision communication.

scholarship AI conference computer vision

ARTICLE · DEV.to AI · 3d ago

Mom, Don't Wire That Money: The 6-Word Rule That Stops a $1M Deepfake Cold

A recent deepfake scam where a senior lost nearly $1 million to a synthetic Canadian Prime Minister highlights a critical breakdown of biometric trust. This incident signals that human verification is no longer a reliable fail-safe due to the sophistication of generative AI.

biometrics deepfake security computer vision

RESEARCH · DEV.to AI · 4/18/2026

Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

This work introduces Density-aware Chamfer Distance, a new comprehensive metric for evaluating point cloud completion. It aims to give a more robust and accurate assessment of completed 3D models.

3D reconstruction point cloud Metrics computer vision
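
For context on the point cloud completion metric above, here is a minimal sketch of the standard Chamfer Distance that the density-aware variant takes its name from. It is not the density-aware metric itself, whose formula the summary does not give. Conventions differ between papers (squared or unsquared distances, summed or averaged terms); this version assumes squared Euclidean distances averaged per point, and the function name is illustrative.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Each set is a sequence of equal-length coordinate tuples. For every
    point, take the squared distance to its nearest neighbour in the other
    set, average per set, and add the two directions together.
    """
    def mean_nearest_sq(src, dst):
        total = 0.0
        for p in src:
            total += min(math.dist(p, q) ** 2 for q in dst)
        return total / len(src)

    return mean_nearest_sq(points_a, points_b) + mean_nearest_sq(points_b, points_a)


# Tiny worked example: one point of `a` has no exact match in `b`.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0)]
print(chamfer_distance(a, b))
```

This brute-force version is quadratic in the number of points; real evaluations typically use a spatial index or a GPU kernel for the nearest-neighbour search.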

RESEARCH · DEV.to AI · 3d ago

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

This work introduces a method for image caption generation, utilizing region-based attention and scene factorization to enhance descriptive relevance and accuracy. It aims to more effectively align visual perception with textual narration.

scene understanding deep learning computer vision attention mechanisms

RESEARCH · DEV.to AI · 4/19/2026

Self-Supervised Learning for Stereo Matching with Self-Improving Ability

This work explores self-supervised learning for stereo matching, focusing on a system with a self-improving ability. The research aims to enhance the accuracy and robustness of computer vision algorithms in depth estimation.

Stereo Matching deep learning self-supervised learning computer vision