
LLMs


ARTICLE · DEV.to AI · 4/22/2026

Why LoRA? Understanding the representative PEFT

The post introduces LoRA (Low-Rank Adaptation) as the leading parameter-efficient fine-tuning (PEFT) method, one that adapts massive LLMs such as Llama 3 without extensive hardware resources. It promises to cover LoRA's mathematical intuition, the concept of "intrinsic dimension," and its practical impact for AI engineers.
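As a rough illustration of why a low-rank update is cheap, here is a minimal sketch based on the standard LoRA formulation (a frozen weight `W` plus a scaled product `B @ A`), not on code from the post itself. The function names, the 4096 x 4096 layer size, and the rank of 8 are assumptions chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # fine-tuning W directly
    lora = rank * (d_out + d_in)   # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_effective_weight(W, B, A, alpha, rank):
    """Effective weight W + (alpha / rank) * (B @ A); W itself stays frozen."""
    scale = alpha / rank
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]


# Hypothetical layer size and rank, chosen only for illustration.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

With `B` initialized to zeros, as in the usual LoRA setup, `lora_effective_weight` returns `W` unchanged, so training starts from the pretrained behavior.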

ARTICLE · DEV.to AI · 4/12/2026

Serverless Memory DBs for AI Agents in 2025

The post argues that the lack of memory in AI agents is an architectural problem rather than a data problem, and notes that the community is developing solutions. It proposes serverless memory databases that decouple storage from inference so that LLMs can focus on reasoning, and it criticizes the inefficiency of inserting context into prompts.

ARTICLE · DEV.to AI · 5/9/2026

Future of AI Agents in Agentic AI

Agentic AI refers to artificial intelligence systems capable of acting autonomously, making decisions, and carrying out tasks without constant human intervention. Powered by large language models and sophisticated tool-use frameworks, these agents are presented as the field's next major development.

RESEARCH · arXiv CS.LG · 4/20/2026

The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference

This research reveals that KV caching in autoregressive transformer inference, under standard FP16 precision, causes a systematic divergence in decoded token sequences due to different floating-point accumulation orders. Across LLaMA-2-7B, Mistral-7B, and Gemma-2-2B, a 100% token divergence rate was observed, with cache-ON often leading to higher accuracy.
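The underlying effect, that floating-point addition is not associative at FP16 precision, can be seen in a small sketch. This is not the paper's experiment: it only emulates half-precision accumulation in plain Python by rounding through the `struct` module's `"e"` (binary16) format, and the helper names and input values are assumptions chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Accumulate left to right, rounding to FP16 after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


# Same three numbers, two accumulation orders.
values = [2048.0, 1.0, 1.0]
print(fp16_sum(values))            # 2048.0: each +1 is rounded away
print(fp16_sum(reversed(values)))  # 2050.0: the small terms combine first
```

Above 2048 the spacing between adjacent FP16 values is 2, so adding 1 to 2048 rounds back to 2048, while adding 2 does not. Different accumulation orders over the same terms can therefore give different results.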

RESEARCH · arXiv CS.LG · 4/20/2026

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

This research introduces sequential KV compression, a novel two-layer architecture for transformer key-value caches that surpasses the per-vector Shannon limit. It leverages the sequential nature of KV cache tokens, using probabilistic prefix deduplication with language tries and predictive delta coding to achieve more efficient compression.

RESEARCH · arXiv CS.AI · 4/16/2026

SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications

This work introduces SciFi, a safe, lightweight, and user-friendly agentic framework for the autonomous execution of scientific tasks. It combines an isolated environment, a three-layer agent loop, and a self-assessing mechanism to ensure reliable operation, leveraging LLMs to automate routine scientific workloads and free researchers for creative activities.

ARTICLE · DEV.to AI · 4/22/2026

RAG: How AI Models Use Your Data Without Forgetting

Large language models are inherently stateless, lacking memory of past conversations or access to up-to-date or private data due to training limitations. Retrieval Augmented Generation (RAG) addresses this by introducing a retrieval step, allowing models to access external information and act as a reasoning engine over that data.

ARTICLE · DEV.to AI · 4/22/2026

One Open Source Project a Day (No. 45): Browser Harness - A Lightweight Bridge Giving AI Agents "Hands" and "Eyes"

Browser Harness is a lightweight open-source project designed to let AI agents interact with browsers efficiently and cost-effectively, overcoming the limitations of traditional automation tools like Playwright or Selenium. It does this by bridging directly to the Chrome DevTools Protocol and encouraging agents to write and modify their own helper functions in real time.
