Fine-tuning

59 items

NEWS · ↑ trending · Reddit r/MachineLearning · 4/21/2026

We open-sourced Chaperone-Thinking-LQ-1.0 — a 4-bit GPTQ + QLoRA fine-tuned DeepSeek-R1-32B that hits 84% on MedQA in ~20GB[N]

Chaperone-Thinking-LQ-1.0, a 4-bit GPTQ + QLoRA fine-tuned DeepSeek-R1-32B model, has been open-sourced. It achieves 84% accuracy on MedQA, close to GPT-4o, while being only ~20GB in size and 1.6x faster than the base model.

Open Source Benchmarking quantization Fine-tuning
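
As a rough sanity check on the ~20GB figure, the weight storage of a quantized model can be estimated from parameter count and bit width. This is a back-of-the-envelope sketch, not the release's actual accounting: the 32e9 parameter count is assumed from the "32B" in the model name, and the helper `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "32B" in the model name means roughly 32e9 parameters.
N_PARAMS = 32e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # unquantized 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {int4_gb:.0f} GB")
```

Pure 4-bit storage comes to about 16 GB under these assumptions. The remaining gap to ~20GB would plausibly be quantization metadata (scales and zero-points) and any layers kept at higher precision, though the post summary does not break this down.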

RESEARCH · arXiv CS.CL · 1d ago

Evaluating Hallucinations in Domain-Adapted Large Language Models

This study investigates hallucinations in domain-adapted Large Language Models, specifically Llama-2 fine-tuned with the Lamini dataset. It found that while the model excels in training-similar tasks, its ability to reason about and recall new domain-specific information is limited, leading to hallucinations and a tendency for over-generation.

Llama-2 hallucinations Domain Adaptation large language models

RESEARCH · arXiv CS.CL · 1d ago

Post-training is (Massive) Supervised Learning

This paper argues that the prevailing post-training paradigm for LLMs, involving SFT and RL, effectively reverts to the "pre-train then fine-tune" approach, explicitly tailoring models to specific benchmarks. Empirical evidence shows that models post-trained from scratch can yield significant performance on reasoning datasets.

LLMs machine learning Benchmarking Training

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/18/2026

Trials and tribulations fine-tuning & deploying Gemma-4 [P]

An ML team documented the technical challenges faced while fine-tuning and deploying Gemma-4. Key issues included PEFT's incompatibility with Gemma 4's custom layers, SFTTrainer silently breaking KV-sharing attention, and DeepSpeed ZeRO-3 saving half-empty LoRA adapters.

MLOps Gemma 4 Fine-tuning LoRA

RESEARCH · arXiv CS.CL · 1d ago

Community-Specific Slang and Entity Detection via Semantic Shift in Fine-Tuned Language Models

This research proposes an unsupervised method to identify community-specific slang and unique entities by analyzing the magnitude of semantic shift. Semantic shift is defined as the evolution of a word's encoded representation after fine-tuning a pre-trained Large Language Model (LLM) on a community-specific text corpus.

online-communities semantic-shift Natural Language Processing large language models
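
The semantic-shift idea in the summary above can be illustrated with a toy ranking. This is a minimal sketch under stated assumptions: cosine distance is used as the shift measure (the paper may define magnitude differently), the two-dimensional vectors are made up for illustration, and `cosine_distance` and `rank_by_shift` are hypothetical helper names.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_shift(base, tuned):
    """Rank words by how far their embedding moved after fine-tuning."""
    shifts = {w: cosine_distance(base[w], tuned[w]) for w in base if w in tuned}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)


# Toy embeddings (invented): word -> vector before / after fine-tuning.
base = {"the": [1.0, 0.0], "moon": [1.0, 1.0], "ape": [0.0, 1.0]}
tuned = {"the": [1.0, 0.0], "moon": [1.0, 0.0], "ape": [1.0, 0.0]}

ranked = rank_by_shift(base, tuned)
for word, shift in ranked:
    print(f"{word}: {shift:.3f}")
```

Words at the top of the ranking are candidates for community-specific usage; function words such as "the" should barely move.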

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

[Model Release] I trained a 9B model to be agentic Data Analyst (Qwen3.5-9B + LoRA). Base model failed 100%, this LoRA completes 89% of workflows without human intervention.

A developer trained a Qwen3.5-9B model with LoRA to act as an agentic data analyst, aiming to build autonomy into the weights. The model completed 89% of end-to-end workflows without human intervention, whereas the base model failed every one.

data analysis Agentic AI Fine-tuning LoRA

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

The author is transitioning from fine-tuning dense 3B/7B transformers to NVIDIA's Nemotron 3 Nano (a hybrid Mamba-Attention-MoE architecture) for multi-task reasoning. They are seeking guidance on how the hybrid architecture impacts standard LoRA fine-tuning, as their prior experience is limited to dense models.

LLMs multi-task reasoning AI Architectures Fine-tuning

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/23/2026

First time fine-tuning, need a sanity check — 3B or 7B for multi-task reasoning? [D]

A self-taught user new to fine-tuning seeks advice on choosing between 3B and 7B LLM models for a multi-task reasoning project. The project involves understanding underlying questions, maintaining multiple perspectives, and handling messy inputs.

LLMs model selection multi-task reasoning NLP

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/26/2026

Qwen3.6 35B A3B Heretic (KLD 0.0015!) Incredible model. Best 35B I have found!

This content reviews the Qwen3.6 35B A3B Heretic model, praising it as the best uncensored 35B model the user has found. It highlights its ability to fit in 24GB VRAM, handle multi-turn tool calls, and its potential to benchmark higher than the original Qwen 3.6 model.

Model Evaluation Fine-tuning LLM

RESEARCH · arXiv CS.LG · 4/20/2026

Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

Aletheia introduces a gradient-guided layer selection method for LoRA fine-tuning, identifying the most task-relevant layers and applying adapters selectively with asymmetric rank. This approach achieves a significant 15-28% training speedup across diverse large language models and architectures while broadly matching downstream behavior.

Parameter-efficient fine-tuning efficiency large language models Fine-tuning
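
The selection step described above can be sketched as ranking layers by a gradient-based score and keeping the top few. This is only an illustration of the general idea, not Aletheia's actual procedure: the per-layer gradient norms are invented, `select_layers` is a hypothetical helper, and the asymmetric-rank assignment is left out because the summary does not specify it.

```python
def select_layers(grad_norms, k):
    """Return the indices of the k layers with the largest gradient norm."""
    ranked = sorted(grad_norms, key=grad_norms.get, reverse=True)
    return sorted(ranked[:k])


# Invented scores: layer index -> gradient norm from a short probing pass.
grad_norms = {0: 0.12, 1: 0.85, 2: 0.40, 3: 0.91, 4: 0.05}

selected = select_layers(grad_norms, k=2)
print("attach LoRA adapters to layers:", selected)
```

Training adapters on fewer layers is where a speedup of the kind reported would come from, since the skipped layers need no adapter forward or backward work.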

RESEARCH · arXiv CS.LG · 4/21/2026

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

This research introduces a rubric-based Generative Reward Model (GRM) to enhance Reinforced Fine-Tuning (RFT) for LLM Agents in Software Engineering (SWE) tasks. By providing richer learning signals beyond binary terminal rewards, this approach shapes intermediate behaviors and significantly improves the quality of the resolution process.

reinforcement learning Fine-tuning Software Engineering AI agents

RESEARCH · arXiv CS.LG · 4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a novel likelihood-free method for fine-tuning masked diffusion large language models (dLLMs), addressing the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and achieves strong results on various tasks like Sudoku and Countdown.

Diffusion Models LLMs reinforcement learning machine learning

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

These "Claude-4.6-Opus" Fine Tunes of Local Models Are Usually A Downgrade

The title suggests that fine-tuning local AI models using

model performance AI models LLMs local models

RESEARCH · arXiv CS.AI · 6d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

StepPRM-RTL is a novel framework that enhances LLM-based RTL code generation by combining stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT). It uses dense feedback from a PRM to guide reinforcement-style updates and Monte Carlo Tree Search (MCTS) to enrich the training dataset.

LLMs reinforcement learning code generation RTL Synthesis

RESEARCH · arXiv CS.CL · 4/20/2026

Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

This research introduces a data-efficient fine-tuning framework to teach large language models to effectively code-switch for reasoning tasks. It identifies beneficial code-switched behaviors, moving beyond treating code-switching as an error, through systematic analysis of diverse reasoning traces.

Multilingual AI Code-Switching Reasoning large language models

RESEARCH · DEV.to AI · 4/20/2026

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

O1-Pruner introduces Length-Harmonizing Fine-Tuning, which prunes the reasoning of O1-like models. It targets the length of the generated reasoning, not the model's weights, with the aim of cutting inference overhead while preserving accuracy.

Pruning Reasoning Fine-tuning model optimization

DOC · AWS Machine Learning Blog · 6d ago

Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

This post explains how to use Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to enhance the tool-calling accuracy of small language models. It details how to leverage Amazon SageMaker AI training jobs to focus on training code and evaluate model quality.

SageMaker learning tool-calling SLM
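
For readers unfamiliar with DPO, the per-example loss can be written in a few lines. This is a minimal sketch of the standard DPO objective, not the code from the AWS post: the inputs are assumed to be summed log-probabilities of each full response under the policy and a frozen reference model, and `beta=0.1` is an illustrative default.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: the loss is log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy has moved toward the chosen tool call and away from the rejected one.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)

print(f"neutral:  {neutral:.4f}")
print(f"improved: {improved:.4f}")
```

In a tool-calling setting, the chosen response would typically be the correct call and the rejected one a malformed or wrong call; the loss falls as the policy separates the two relative to the reference.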

DOC · DEV.to AI · 16d ago

96. LoRA: Fine-Tune a Billion-Parameter Model on a Laptop

This article explains how the LoRA (Low-Rank Adaptation) technique enables fine-tuning billion-parameter language models on consumer hardware like laptops. Instead of updating all parameters, LoRA adds tiny trainable modules, drastically reducing GPU memory requirements.

GPU memory Fine-tuning LoRA HuggingFace
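
The memory saving follows from simple parameter arithmetic: LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The sketch below uses illustrative numbers (a 4096 x 4096 projection and rank 8) that are assumptions, not figures from the article.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when updating the whole weight matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


# Illustrative sizes: one 4096 x 4096 projection, LoRA rank 8.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={rank}):     {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.2%}")
```

Because gradients and optimizer state are only kept for the trainable parameters, shrinking that count is what cuts GPU memory; the frozen base weights still have to fit, which is why LoRA is often paired with quantization.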

RESEARCH · arXiv CS.LG · 4/15/2026

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Polynomial Expansion Rank Adaptation (PERA) is a novel method to enhance low-rank adaptation (LoRA) for fine-tuning large language models. It introduces structured polynomial expansion into the low-rank factor space to model richer nonlinear high-order interactions, overcoming LoRA's linear limitations without increasing rank or inference cost.

LLMs Low-Rank Adaptation machine learning Polynomial Expansion

RESEARCH · arXiv CS.CL · 4/20/2026

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

This paper analyzes the interpretive behaviors of LLMs for automated code compliance using perturbation-based attribution analysis, comparing different fine-tuning strategies and model scales. Results show full fine-tuning yields more focused attribution patterns, and larger models prioritize specific textual elements like numerical constraints.

model interpretability LLMs Machine learning research Fine-tuning