Model Architecture

13 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Can Geometric Deep Learning eliminate the need for "Brute Force" pre-training? [D]

The author questions whether Geometric Deep Learning, by explicitly building symmetries and invariances into its architecture, could significantly reduce or eliminate the need for extensive, data-intensive pre-training. This raises the question of whether current massive-scale pre-training is largely a consequence of architectures lacking inherent invariance.

42
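To make the idea of "building invariance into the architecture" concrete, here is a minimal sketch of a permutation-invariant set encoder next to an order-sensitive one. It is not taken from the Reddit post; the function names and the feature map `phi` are hypothetical, chosen only for illustration. The point is that sum pooling ignores input order by construction, so no training data is needed to teach the encoder that symmetry.

```python
import itertools
import math


def phi(x):
    # Hypothetical per-element feature map; any function of a single element works.
    return (x, x * x, math.tanh(x))


def invariant_encode(xs):
    # Sum-pool the per-element features. Addition is order-independent,
    # so the output is permutation-invariant by design, not by training.
    feats = [phi(x) for x in xs]
    return tuple(sum(col) for col in zip(*feats))


def order_sensitive_encode(xs):
    # Contrast case: position-weighted sum, which changes when inputs are reordered.
    return (sum((i + 1) * x for i, x in enumerate(xs)),)


def is_permutation_invariant(encode, xs, tol=1e-9):
    # Brute-force check over every ordering of xs (fine for tiny inputs only).
    base = encode(xs)
    return all(
        all(math.isclose(a, b, abs_tol=tol) for a, b in zip(base, encode(list(p))))
        for p in itertools.permutations(xs)
    )
```

Calling `is_permutation_invariant(invariant_encode, [0.5, -1.25, 2.0])` checks all six orderings; the order-sensitive encoder fails the same check and would have to learn the symmetry from data instead.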
RESEARCH · arXiv cs.CL · 5/1/2026

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

This paper introduces the Length Value Model (LenVM), a novel token-level framework for modeling the remaining generation length in autoregressive models. By formulating length modeling as a value estimation problem, LenVM provides an annotation-free, scalable, and effective signal for LLMs and VLMs, improving performance on exact length matching tasks.

27
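One way to read "length modeling as a value estimation problem" in the LenVM summary is sketched below. This is an assumption, not the paper's confirmed formulation: each generated token is treated as a unit cost, and the target at each position is the discounted sum of costs still to come. The function name and the `gamma` parameter are hypothetical. The targets depend only on the sequence length, which is why no human annotation would be required under this reading.

```python
def length_value_targets(num_tokens, gamma=0.99):
    # Assumed formulation: V_t = 1 + gamma * V_{t+1}, with V = 0 after the last token.
    # With gamma = 1.0 this reduces to the plain count of tokens left,
    # including the current one.
    targets = [0.0] * num_tokens
    running = 0.0
    for t in reversed(range(num_tokens)):
        running = 1.0 + gamma * running
        targets[t] = running
    return targets
```

A value head trained to regress these targets at every token position would give a per-token estimate of remaining length; with `gamma < 1` the targets stay bounded by `1 / (1 - gamma)` even for very long generations.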
RESEARCH · arXiv cs.CL · 27d ago

The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models

The Bicameral Model couples two frozen, pretrained language models via a trainable neural interface on their intermediate hidden states, allowing them to operate in lockstep. This method enables a primary model to drive a task while an auxiliary model uses tools or solves constraints, significantly improving accuracy on tasks like arithmetic and logic puzzles.

27
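The hidden-state coupling described in the Bicameral Model summary can be sketched roughly as follows. This is a guess at the general shape, not the paper's architecture: it assumes the interface is a pair of linear maps that add a residual message from each model's hidden state to the other's, while the base models themselves stay frozen. All names here (`coupled_step`, `W_b_to_a`, `W_a_to_b`) are hypothetical.

```python
def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def coupled_step(h_a, h_b, W_b_to_a, W_a_to_b):
    # Assumed interface: each stream gets a residual message computed from the
    # other stream's hidden state. Only the two W matrices would be trained;
    # the models producing h_a and h_b are treated as frozen.
    new_a = add(h_a, matvec(W_b_to_a, h_b))
    new_b = add(h_b, matvec(W_a_to_b, h_a))
    return new_a, new_b
```

With both interface matrices set to zero, `coupled_step` returns the hidden states unchanged, so under this sketch the coupled system starts out behaving exactly like the two independent models.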