
Fine-tuning

59 items

RESEARCH · arXiv CS.CL · 1d ago

Post-training is (Massive) Supervised Learning

This paper argues that the prevailing post-training paradigm for LLMs, involving SFT and RL, effectively reverts to the "pre-train then fine-tune" approach, explicitly tailoring models to specific benchmarks. Empirical evidence shows that models post-trained from scratch can yield significant performance on reasoning datasets.

47
RESEARCH · arXiv CS.CL · 1d ago

Community-Specific Slang and Entity Detection via Semantic Shift in Fine-Tuned Language Models

This research proposes an unsupervised method to identify community-specific slang and unique entities by analyzing the magnitude of semantic shift. Semantic shift is defined as the evolution of a word's encoded representation after fine-tuning a pre-trained Large Language Model (LLM) on a community-specific text corpus.

46
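For the slang-detection item above, a minimal sketch of how a semantic-shift ranking could be computed. It assumes each word already has one embedding from the pre-trained model and one from the fine-tuned model; the use of cosine distance as the shift magnitude, the function names, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_semantic_shift(base_vecs, tuned_vecs):
    """Rank words by how far their representation moved after fine-tuning.

    base_vecs / tuned_vecs: dicts mapping word -> embedding (hypothetical inputs).
    Words missing from either dict are skipped.
    """
    shifts = {
        word: cosine_distance(base_vecs[word], tuned_vecs[word])
        for word in base_vecs
        if word in tuned_vecs
    }
    # Largest shift first: these are the candidate slang terms or entities.
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)

# Toy 2-D embeddings, made up purely for illustration.
base = {"the": [1.0, 0.0], "based": [1.0, 0.0], "ship": [0.0, 1.0]}
tuned = {"the": [1.0, 0.0], "based": [0.0, 1.0], "ship": [1.0, 1.0]}

ranking = rank_by_semantic_shift(base, tuned)
for word, shift in ranking:
    print(f"{word}: {shift:.3f}")
```

In this toy data the function word keeps its representation while the community-specific term moves the most, which is the kind of signal an unsupervised ranking like this would surface.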
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

[Model Release] I trained a 9B model to be agentic Data Analyst (Qwen3.5-9B + LoRA). Base model failed 100%, this LoRA completes 89% of workflows without human intervention.

A developer trained a Qwen3.5-9B model with LoRA to act as an agentic data analyst, aiming to build autonomy into the weights. The model completed 89% of end-to-end workflows without human intervention, whereas the base model failed all of them.

42
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

The author is transitioning from fine-tuning dense 3B/7B transformers to NVIDIA's Nemotron 3 Nano (a hybrid Mamba-Attention-MoE architecture) for multi-task reasoning. They are seeking guidance on how the hybrid architecture impacts standard LoRA fine-tuning, as their prior experience is limited to dense models.

42
RESEARCH · arXiv CS.LG · 4/20/2026

Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

Aletheia introduces a gradient-guided layer selection method for LoRA fine-tuning: it identifies the most task-relevant layers and applies adapters only to those, with asymmetric rank. The approach yields a 15-28% training speedup across diverse large language models and architectures while broadly matching downstream behavior.

32
RESEARCH · arXiv CS.LG · 4/21/2026

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

This research introduces a rubric-based Generative Reward Model (GRM) to enhance Reinforced Fine-Tuning (RFT) for LLM Agents in Software Engineering (SWE) tasks. By providing richer learning signals beyond binary terminal rewards, this approach shapes intermediate behaviors and significantly improves the quality of the resolution process.

31
RESEARCH · arXiv CS.LG · 4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a likelihood-free method for fine-tuning masked diffusion large language models (dLLMs) that addresses the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and reports strong results on tasks such as Sudoku and Countdown.

30
RESEARCH · arXiv CS.AI · 6d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

StepPRM-RTL is a novel framework that enhances LLM-based RTL code generation by combining stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT). It uses dense feedback from a PRM to guide reinforcement-style updates and Monte Carlo Tree Search (MCTS) to enrich the training dataset.

29
RESEARCH · arXiv CS.LG · 4/15/2026

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Polynomial Expansion Rank Adaptation (PERA) is a novel method to enhance low-rank adaptation (LoRA) for fine-tuning large language models. It introduces structured polynomial expansion into the low-rank factor space to model richer nonlinear high-order interactions, overcoming LoRA's linear limitations without increasing rank or inference cost.

28
RESEARCH · arXiv CS.CL · 4/20/2026

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

This paper analyzes the interpretive behaviors of LLMs for automated code compliance using perturbation-based attribution analysis, comparing different fine-tuning strategies and model scales. Results show full fine-tuning yields more focused attribution patterns, and larger models prioritize specific textual elements like numerical constraints.

28