
Machine learning research


RESEARCH · arXiv CS.LG · 4/17/2026

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

MixAtlas is an uncertainty-aware method for optimizing data mixtures in multimodal LLM midtraining. It decomposes training corpora along two axes, image concepts and task supervision, then uses proxy models and a Gaussian-process surrogate to search for data recipes that improve sample efficiency and generalization.

RESEARCH · arXiv CS.CL · 4/20/2026

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

This paper uses perturbation-based attribution analysis to examine how LLMs interpret inputs in automated code compliance, comparing fine-tuning strategies and model scales. Full fine-tuning yields more focused attribution patterns, and larger models prioritize specific textual elements such as numerical constraints.

RESEARCH · arXiv CS.LG · 21d ago

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

This work addresses poor credit assignment in reinforcement learning for multi-step reasoning with large language models, where sparse terminal rewards cause high gradient variance and unstable training. It proposes a counterfactual comparison-based framework and Implicit Behavior Policy Optimization (IBPO) to create step-sensitive learning signals, which the authors report improve training stability and performance.

RESEARCH · arXiv CS.AI · 28d ago

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

Auto-Rubric as Reward (ARR) is a framework for aligning multimodal generative models with human preferences. It externalizes a VLM's implicit preference knowledge into explicit, prompt-specific rubrics, decomposing human judgment into independently verifiable quality dimensions to address limitations of traditional RLHF approaches.
