Optimization

134 items

RESEARCH · arXiv CS.CL · 12d ago

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

EvoSpec introduces a framework for real-time evolution of draft models in speculative decoding for Large Language Models, addressing the bottleneck of large vocabulary sizes. It uses dynamic vocabulary and parameter adaptation, employing a context-aware mechanism and a lightweight online alignment strategy to improve acceptance rates and minimize distributional gaps.

27
ARTICLE · DEV.to AI · 4/27/2026

Context Compression in .NET

This quick tip explains how to implement context compression in .NET for RAG systems, addressing the lack of a direct equivalent to tools like LLMLingua. It proposes using a smaller, cheaper worker model to pre-process retrieved documentation, extracting only essential facts to reduce cost and latency with premium AI models.

27
RESEARCH · arXiv CS.LG · 4/6/2026

Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers

This study characterizes WebGPU dispatch overhead for LLM inference across a range of GPU platforms, backends, and browsers. It shows that simple benchmarks overestimate the costs and identifies the true per-dispatch cost of the WebGPU API, highlighting the need for this distinction when targeting effective optimizations.

27
RESEARCH · arXiv CS.AI · 4/30/2026

Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

This paper proposes a hierarchical framework to induce multiple evidence-grounded user personas from behavioral logs by clustering intent memories and optimizing persona quality. The method uses a groupwise extension of Direct Preference Optimization (DPO) and yields more coherent, truthful personas while also improving prediction of future interactions.

27
RESEARCH · arXiv CS.AI · 4/14/2026

Linear Programming for Multi-Criteria Assessment with Cardinal and Ordinal Data: A Pessimistic Virtual Gap Analysis

This paper introduces novel linear programming-based Virtual Gap Analysis (VGA) models for multi-criteria assessment, addressing issues of subjective evaluations and data diversity. The two-step method assesses alternatives pessimistically using cardinal and ordinal data, enabling efficient ranking and elimination of unfavorable options within decision support systems.

27
RESEARCH · arXiv CS.AI · 4/22/2026

On Solving the Multiple Variable Gapped Longest Common Subsequence Problem

This paper tackles the Variable Gapped Longest Common Subsequence (VGLCS) problem, a generalization of LCS with flexible gap constraints, relevant to molecular sequence comparison and time-series analysis. It proposes a root-based state graph search framework combined with an iterative beam search strategy to manage combinatorial explosion and find high-quality solutions.

26