machine learning

790 items

RESEARCH · arXiv CS.AI · 15d ago

Confidence Calibration in Large Language Models

This study investigates confidence calibration in Large Language Models (LLMs) across diverse tasks, finding that current LLMs are overconfident on difficult tests and underconfident on easy ones. The researchers developed LifeEval, a new test to evaluate model calibration across varying levels of difficulty.

Confidence Calibration Overconfidence machine learning large language models
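The over- and underconfidence described in the summary above can be made concrete with a standard binned calibration metric. The following is a minimal sketch, assuming each answer comes with a stated confidence in [0, 1] and a correctness flag; the function names and the toy numbers are hypothetical, and this is not the paper's evaluation protocol.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def confidence_gap(confidences, correct):
    """Mean confidence minus accuracy: positive means overconfident."""
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(1 for ok in correct if ok) / len(correct)
    return mean_conf - accuracy


# Toy data (made up for illustration): high confidence on hard items,
# modest confidence on easy items.
hard_gap = confidence_gap([0.9, 0.9, 0.9, 0.9], [True, False, False, False])
easy_gap = confidence_gap([0.6, 0.6, 0.6, 0.6], [True, True, True, True])
```

Under this convention, a positive gap on the hard subset and a negative gap on the easy subset would correspond to the pattern the summary reports.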

RESEARCH · arXiv CS.LG · 15d ago

CAFD: Concept-Aware DNN Fault Detection using VLMs

CAFD is a new learning-based method for detecting faults in Deep Neural Networks (DNNs) that combines multiple information sources for superior performance and efficiency. It utilizes model-based signals, distance features, and a novel Concept Failure Ratio (CFR) derived from Vision-Language Models (VLMs).

Fault Detection Vision-Language Models machine learning AI reliability

RESEARCH · arXiv CS.LG · 9d ago

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

This article presents a novel architecture for LLMs that eliminates the need for deep neural networks. The proposed model, building on RBF networks, finds the global optimum of the loss function in a single iteration, thereby removing the tedious training step.

neural networks AI architecture LLMs machine learning
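For context on the single-iteration claim in the summary above: in a classical RBF network with fixed centers and widths, the output weights solve a linear least-squares problem, so the global optimum is available in closed form. The sketch below illustrates only that textbook property on 1-D data. It is not the article's architecture, and every name in it (`fit_rbf_weights`, `rbf_predict`, the `ridge` term) is a hypothetical choice for illustration.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian radial basis function for scalar inputs."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf_weights(xs, ys, centers, width, ridge=1e-8):
    """Closed-form weights from the normal equations (Phi^T Phi + ridge*I) w = Phi^T y."""
    phi = [[gaussian_rbf(x, c, width) for c in centers] for x in xs]
    k = len(centers)
    gram = [
        [sum(row[a] * row[b] for row in phi) + (ridge if a == b else 0.0)
         for b in range(k)]
        for a in range(k)
    ]
    rhs = [sum(row[a] * y for row, y in zip(phi, ys)) for a in range(k)]
    return solve_linear_system(gram, rhs)


def rbf_predict(x, centers, weights, width):
    """Network output: weighted sum of basis activations."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))


# Fit sin(x) on five points with one center per point; no gradient steps needed.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [math.sin(x) for x in train_x]
weights = fit_rbf_weights(train_x, train_y, centers=train_x, width=1.0)
```

The small `ridge` term is there only to keep the normal equations well conditioned. How the article chooses centers, widths, or any regularization is not stated in the summary.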

RESEARCH · arXiv CS.LG · 7d ago

Making Brain-Computer Interfaces More Secure

This study proposes a lightweight custom Convolutional Neural Network (CNN) architecture to investigate adversarial robustness in EEG-based Brain-Computer Interfaces (BCIs). The method is assessed using two EEG datasets and compared with other CNN models under gradient-based adversarial attack scenarios to ensure reliable BCI deployment.

neural networks brain-computer interfaces security machine learning
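The summary above does not name the specific attacks used. As one well-known instance of a gradient-based adversarial attack, the fast gradient sign method (FGSM) perturbs each input feature by a fixed step in the direction that increases the loss. The sketch below applies it to a tiny logistic-regression model rather than a CNN on EEG data; the weights, input, and `eps` value are made up for illustration.

```python
import math


def predict_proba(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(x, y, w, b):
    """Binary cross-entropy loss for one example with label y in {0, 1}."""
    p = predict_proba(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm(x, y, w, b, eps):
    """FGSM: x_adv = x + eps * sign(dLoss/dx).

    For logistic regression with cross-entropy, dLoss/dx = (p - y) * w.
    """
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]  # -1, 0, or +1 per feature
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical model and input, chosen only to make the effect visible.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.5)
clean_loss = cross_entropy(x, y, w, b)
adv_loss = cross_entropy(x_adv, y, w, b)
```

Here `eps` bounds the per-feature perturbation, so comparing `clean_loss` with `adv_loss` across several `eps` values gives a simple robustness curve for a model.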

RESEARCH · arXiv CS.LG · 7d ago

Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning

This paper introduces the Human-in-the-Loop Gated Bandit (HITL-GB) framework for dynamic pricing in short-term rental markets. It demonstrates that historical pricing data can be structurally equivalent to on-policy warm-up data, significantly reducing the cold-start period for online bandit learning.

human-in-the-loop machine learning contextual bandits online learning

RESEARCH · arXiv CS.LG · 7d ago

ReLoRA: Knowledge-Reusing Adaptation for Fast Rollout of Evolving LLM Services

This paper introduces ReLoRA, a knowledge-reusing re-adaptation framework that efficiently restores service-ready LoRA adapters for evolving LLM services. It addresses both the computational cost of retraining adapters and the quality degradation that occurs when existing adapters are naively applied to updated base models.

AI models machine learning fine-tuning LoRA

RESEARCH · arXiv CS.AI · 7d ago

Evaluating Transformer and LSTM Frameworks for Prediction in Ungauged Basins

This study evaluates Transformer and LSTM frameworks for streamflow inference in ungauged basins under limited hydrological information. The LSTM architecture showed stronger overall performance than the Transformer model, and incorporating downstream information further boosted performance for all models.

deep learning Environmental Modeling machine learning AI

RESEARCH · arXiv CS.AI · 16d ago

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

This paper introduces BOHM, a novel method for zero-cost hierarchical attribution in compound AI systems. Unlike traditional Shapley-based methods, BOHM extracts attribution directly from routing weights, eliminating the need for internal component access and providing multi-resolution insights.

attribution routing machine learning Explainable AI

RESEARCH · arXiv CS.AI · 9d ago

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

This paper disentangles two self-evolving LLM agent capabilities: harness-updating (producing useful updates) and harness-benefit (gaining from these updates). The analysis reveals that harness-updating is surprisingly consistent across models of different base capabilities, suggesting that even less capable models can produce useful updates.

AI capabilities LLM agents machine learning self-evolution

RESEARCH · arXiv CS.LG · 7d ago

Assessing Region-Level EEG Contributions to Cognitive Workload Prediction

This paper introduces a region-level evaluation framework for EEG-based cognitive workload prediction, analyzing contributions from anatomically defined scalp regions. It conducts a large-scale analysis across four public datasets to quantify region importance using a model-agnostic, performance-based approach.

Brain Activity Cognitive Workload machine learning Workload Prediction

RESEARCH · arXiv CS.CL · 9d ago

Configurable Reward Model for Balanced Safety Alignment

This paper introduces the Configurable Safety Reward Model (CSRM) to address the challenge of aligning LLMs with heterogeneous and rapidly evolving safety requirements. CSRM substantially improves generalization to previously unseen safety configurations by being jointly optimized for calibrated safety compliance and reward modeling, achieving state-of-the-art performance on benchmarks.

Generalization machine learning large language models Reward Models

RESEARCH · arXiv CS.AI · 16d ago

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

ImProver 2 is a new neurosymbolic framework for automated proof optimization in Lean 4, designed to address challenges in refactoring formal mathematics proofs. It uses a data-efficient expert-iteration pipeline and a neurosymbolic scaffold, enabling a 7B-parameter model to achieve competitive performance against much larger models.

Neurosymbolic AI machine learning Proof Optimization Formal Mathematics

RESEARCH · arXiv CS.LG · 7d ago

Geometry-Aware Tabular Diffusion

Geometry-Aware Tabular Diffusion (GATD) improves tabular data synthesis by augmenting denoisers with pairwise angles and lengths computed from column value differences. It achieves state-of-the-art performance with fewer parameters, reduces Shape and Trend error, and shows that explicit relational supervision drives the gains.

Diffusion Models data synthesis deep learning machine learning

RESEARCH · arXiv CS.LG · 7d ago

Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection

Within-dataset class-split evaluation for anomaly detection can be ill-posed when anomaly classes overlap with normal data, causing score instability or inversion. A new diagnostic, neighborhood class leakage, is introduced to predict this instability across various datasets and models.

Score Instability Anomaly Detection machine learning Out-of-Distribution Detection

RESEARCH · arXiv CS.LG · 7d ago

Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation

This paper proposes an auditable climate risk intelligence framework for validating fragmented ESG data, integrating deterministic orchestration, temporal anomaly detection, and imbalance-aware ensemble learning. It introduces a synthetic ESG validation benchmark to support open reproducibility.

ESG Auditability Climate Risk machine learning

RESEARCH · arXiv CS.LG · 9d ago

Calibrated Preference Learning: The Case of Label Ranking

This paper formalizes calibration for probabilistic label ranking, introducing a hierarchy of notions for full, sub-ranking, and top-k calibration. Empirically, popular label ranking models are often poorly calibrated, with implications for RLHF reward models.

Calibration AI models ranking machine learning

RESEARCH · arXiv CS.LG · 9d ago

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Unicorn is a new framework for scalable, high-dimensional time series forecasting, bridging the gap between channel-independent and channel-dependent models. It leverages a latent prototype codebook to learn universal correlation patterns, significantly outperforming state-of-the-art architectures, especially in few-shot transfer scenarios.

forecasting pretraining deep learning machine learning

RESEARCH · arXiv CS.LG · 16d ago

Latent Cache Flow: Model-to-Model Communication Without Text

Latent Cache Flow (LCF) is introduced as a new method for efficient model-to-model communication, addressing the latency and information loss of text-based LLM agent communication. LCF jointly translates and compresses keys and values, significantly reducing adapter size and transmitting a summary of new information for differing contexts.

research machine learning AI Communication

RESEARCH · arXiv CS.CL · 15d ago

SLAP: Stratified Loss-based Pruning for On-Policy Data-Efficient Instruction Tuning

This research introduces SLAP, a novel batch-aware data selection framework designed to improve the data efficiency of instruction tuning for LLMs. SLAP optimizes learning by evaluating entire batch compositions, ensuring comprehensive data distribution coverage and maximizing intra-batch diversity to achieve lossless performance with reduced training costs.

Instruction Tuning LLMs machine learning model optimization

RESEARCH · arXiv CS.LG · 16d ago

FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

This research introduces FuRA (Full-Rank Adaptation), a novel parameter-efficient fine-tuning method that addresses limitations in existing techniques by incorporating spectral preconditioning. By reparameterizing weight matrices via full-rank Singular Value Decomposition and constraining updates, FuRA outperforms unconstrained Full Fine-Tuning while maintaining efficiency.

Optimization deep learning machine learning spectral preconditioning