machine learning

790 items

RESEARCH·arXiv CS.CL·4/16/2026

A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews

This study classifies sentiment in English and Bangla reviews of Bangladeshi government mobile banking apps, using a hybrid labeling approach for 5,652 reviews. It found that traditional machine learning models like Random Forest and Linear SVM significantly outperformed fine-tuned XLM-RoBERTa for this specific task.

31
ARTICLE·DEV.to AI·4/18/2026

Part 2: The Data — Building the First Public Coffee Roasting Audio Dataset with Warp/Oz

This article describes the creation of the first public audio dataset for detecting first crack in coffee roasting, a task for which no public data previously existed. The dataset comprises 973 annotated 10-second segments built from scratch, and a model trained on it achieved 100% precision thanks to careful data splitting and loss weighting.

31
RESEARCH·arXiv CS.LG·4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a novel likelihood-free method for fine-tuning masked diffusion large language models (dLLMs), addressing the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and achieves strong results on various tasks like Sudoku and Countdown.

30
ARTICLE·DEV.to AI·4/25/2026

My AI Agent Over-Corrected Itself — So I Built Metabolic Regulation

The author details how their AI agent, with an Active Inference perception pipeline, learned a correction rule that led to over-correction, causing it to misclassify human speech. This incident highlights the challenge of building robust regulation mechanisms in AI systems to prevent over-generalization and suggests a need for more metabolic control.

30
RESEARCH·DEV.to AI·4/10/2026

Cross-Modal Knowledge Distillation for planetary geology survey missions with ethical auditability baked in

The text recounts the author's research journey into cross-modal knowledge distillation with ethical auditability, prompted by the observation that mineral-classification AIs can make decisions that are technically correct but ethically naive. The goal is to develop AI systems that are both accurate and ethically robust for planetary geology survey missions.

30
RESEARCH·arXiv CS.CL·5d ago

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

This paper introduces a hybrid pre-training objective for text encoders, combining a JEPA-style latent-space prediction loss with a standard Masked Language Modelling (MLM) objective. The approach aims to anchor representations to deeper semantic structure rather than surface-form token identity, and the resulting embeddings are significantly more uniform.

30
RESEARCH·arXiv CS.LG·4/17/2026

Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation

This research addresses the challenge of explainability in AI for financial fraud detection, crucial for U.S. regulatory compliance. It introduces the SHAP-Guided Adaptive Ensemble (SGAE) method, which dynamically adjusts ensemble weights based on SHAP attribution agreement, achieving high performance and transparency.

29
RESEARCH·arXiv CS.AI·5/9/2026

ZAYA1-8B Technical Report

ZAYA1-8B is a reasoning-focused mixture-of-experts (MoE) model with 700M active parameters, outperforming DeepSeek-R1-0528 on math and coding benchmarks. It was trained from scratch for reasoning on an AMD platform and uses a four-stage RL cascade for post-training.

29