
computer vision

125 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/27/2026

Microsoft Presents "TRELLIS.2": An Open-Source, 4B-Parameter, Image-To-3D Model Producing Up To 1536³ PBR Textured Assets, Built On Native 3D VAEs With 16× Spatial Compression, Delivering Efficient, Scalable, High-Fidelity Asset Generation.

Microsoft's TRELLIS.2 is a 4B-parameter open-source model for high-fidelity image-to-3D generation. It produces PBR-textured assets at up to 1536³ resolution using native 3D VAEs and a novel O-Voxel structure, and is presented as an efficient, scalable way to generate detailed 3D assets with full PBR materials.

42
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

Gemma 4 Vision

Gemma 4's default image token budget is often too low to resolve fine detail, which leads to poor OCR performance. Users can improve its vision significantly by raising the `llama.cpp` parameters `--image-min-tokens` and `--image-max-tokens`, for example to 560 and 2240.
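As a rough illustration of the settings mentioned in this post, the sketch below assembles a `llama-server` command line with the two image-token flags set to the suggested values. The helper name `build_llama_cmd` and both file names are hypothetical placeholders, and the right token budget will depend on the model build and the images involved.

```python
import shlex


def build_llama_cmd(model_path, mmproj_path,
                    image_min_tokens=560, image_max_tokens=2240,
                    binary="llama-server"):
    """Return an argv list for llama.cpp with a raised image token budget.

    The defaults are the values quoted in the post; the paths are supplied
    by the caller and are not checked here.
    """
    if image_min_tokens > image_max_tokens:
        raise ValueError("image_min_tokens must not exceed image_max_tokens")
    return [
        binary,
        "-m", model_path,            # main model weights (GGUF)
        "--mmproj", mmproj_path,     # multimodal projector file
        "--image-min-tokens", str(image_min_tokens),
        "--image-max-tokens", str(image_max_tokens),
    ]


# Hypothetical file names, for illustration only.
cmd = build_llama_cmd("gemma-4.gguf", "mmproj-gemma-4.gguf")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` once the paths point at real files.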

41
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/9/2026

Detecting mirrored selfie images: OCR the best way? [D]

The user is looking for an effective way to detect mirrored text in selfies before passing them to vision-language models (VLMs) or face-embedding extractors, which are insensitive to mirroring because they were trained with augmented data. Their idea is to use OCR (EasyOCR) to compare the text-reading score of the normal image against the mirrored one, and they ask whether this is the best approach or whether a smaller, smarter model-based solution exists.
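The score comparison described in this post could be sketched roughly as follows. This is a minimal illustration, not the poster's code: the OCR scorer is passed in as a callable so the logic stays independent of any particular OCR library, the image is a plain list of pixel rows standing in for a real array, and the `margin` threshold is an arbitrary assumption that would need tuning on real data.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixel values; stand-in for a real image array


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right by reversing every row."""
    return [list(reversed(row)) for row in image]


def looks_mirrored(image: Image,
                   ocr_confidence: Callable[[Image], float],
                   margin: float = 0.15) -> bool:
    """Return True when OCR reads the flipped image clearly better.

    ocr_confidence is any function mapping an image to a score in [0, 1],
    for example the mean confidence of the detected text regions.
    margin is an assumed threshold that guards against noisy scores.
    """
    original_score = ocr_confidence(image)
    flipped_score = ocr_confidence(flip_horizontal(image))
    return flipped_score - original_score > margin
```

With EasyOCR, one plausible scorer is the mean of the confidence values returned by `Reader.readtext`, with a score of 0 when no text is found. Images without any text give no signal either way, so a separate fallback would still be needed for them.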

40
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/10/2026

What image/video training data is hardest to find right now? [R]

A user is building a crowdsourced photo-collection platform that uses YOLO/CLIP for automatic labeling and metadata enrichment. They are asking which kinds of image data are hardest to find and most in demand for training AI models, citing examples such as European street scenes or supermarket shelves.

40
RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/5/2026

Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]

A PhD student in AI/computer vision is struggling to reproduce the reported accuracy of a published paper, consistently reaching ~73% against the paper's ~77% baseline. Despite thorough checks and attempts to contact the authors, the gap remains and is blocking further research.

36