
machine learning

790 items

RESEARCH · arXiv CS.LG · 7d ago

Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning

This paper introduces the Human-in-the-Loop Gated Bandit (HITL-GB) framework for dynamic pricing in short-term rental markets. It demonstrates that historical pricing data can be structurally equivalent to on-policy warm-up data, significantly reducing the cold-start period for online bandit learning.

RESEARCH · arXiv CS.AI · 9d ago

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

This paper disentangles two self-evolving LLM agent capabilities: harness-updating (producing useful updates) and harness-benefit (gaining from these updates). The analysis reveals that harness-updating is surprisingly consistent across models of different base capabilities, suggesting that even less capable models can produce useful updates.

RESEARCH · arXiv CS.CL · 9d ago

Configurable Reward Model for Balanced Safety Alignment

This paper introduces the Configurable Safety Reward Model (CSRM) to address the challenge of aligning LLMs with heterogeneous and rapidly evolving safety requirements. Jointly optimized for calibrated safety compliance and reward modeling, CSRM substantially improves generalization to previously unseen safety configurations and achieves state-of-the-art benchmark performance.

RESEARCH · arXiv CS.AI · 16d ago

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

ImProver 2 is a new neurosymbolic framework for automated proof optimization in Lean 4, designed to address challenges in refactoring formal mathematics proofs. It uses a data-efficient expert-iteration pipeline and a neurosymbolic scaffold, enabling a 7B-parameter model to achieve competitive performance against much larger models.

RESEARCH · arXiv CS.LG · 7d ago

Geometry-Aware Tabular Diffusion

Geometry-Aware Tabular Diffusion (GATD) improves tabular synthesis by augmenting denoisers with pairwise angles and lengths computed from column value differences. It achieves state-of-the-art performance with fewer parameters, reduces Shape and Trend error, and shows that explicit relational supervision drives the gains.

RESEARCH · arXiv CS.LG · 7d ago

Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation

This paper proposes an auditable climate risk intelligence framework for validating fragmented ESG data, integrating deterministic orchestration, temporal anomaly detection, and imbalance-aware ensemble learning. It introduces a synthetic ESG validation benchmark to support open reproducibility.

RESEARCH · arXiv CS.LG · 9d ago

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn is a new framework for scalable, high-dimensional time series forecasting, bridging the gap between channel-independent and channel-dependent models. It leverages a latent prototype codebook to learn universal correlation patterns, significantly outperforming state-of-the-art architectures, especially in few-shot transfer scenarios.

RESEARCH · arXiv CS.CL · 15d ago

SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

This paper introduces SLAP, a batch-aware data selection framework that improves the data efficiency of instruction tuning for LLMs. SLAP evaluates entire batch compositions rather than individual examples, covering the data distribution and maximizing intra-batch diversity so that performance is preserved at reduced training cost.

RESEARCH · arXiv CS.LG · 16d ago

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

This paper introduces FuRA (Full-Rank Adaptation), a parameter-efficient fine-tuning method that addresses limitations in existing techniques by incorporating spectral preconditioning. By reparameterizing weight matrices via full-rank Singular Value Decomposition and constraining updates, FuRA outperforms unconstrained full fine-tuning while maintaining efficiency.
