LLMs

722 items

ARTICLE · DEV.to AI · 4/22/2026

Why LoRA? Understanding the representative PEFT

LoRA (Low-Rank Adaptation) is introduced as the leading PEFT method, enabling efficient adaptation of massive LLMs like Llama 3 without requiring extensive hardware resources. The post promises to delve into LoRA's mathematical intuition, the concept of "intrinsic dimension," and its game-changing impact for AI engineers.

LLMs deep learning fine-tuning PEFT
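
As a rough illustration of why LoRA is cheap, the sketch below counts trainable parameters for a full weight update versus a rank-r update, and applies the standard LoRA forward pass y = Wx + (alpha / r) * B(Ax). The function names, layer sizes, and rank are assumptions chosen for illustration and are not taken from the post.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full_update_params, lora_params) for one weight matrix."""
    full = d_out * d_in              # every entry of W is trainable
    lora = rank * (d_out + d_in)     # B is d_out x rank, A is rank x d_in
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base output plus the scaled low-rank correction."""
    base = matvec(W, x)                 # W stays frozen during fine-tuning
    update = matvec(B, matvec(A, x))    # only A and B are trained
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 4096 x 4096 projection adapted with rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)
```

With these assumed sizes the low-rank update trains 65,536 parameters instead of 16,777,216, a 256-fold reduction for that one matrix.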

ARTICLE · DEV.to AI · 4/12/2026

Serverless Memory DBs for AI Agents in 2025

The article frames the lack of memory in AI agents as an architectural problem rather than a data problem, noting that the community is developing solutions. It proposes serverless memory databases to decouple storage from inference, allowing LLMs to focus on reasoning, and criticizes the inefficiency of inserting context into prompts.

LLMs memory Architecture serverless databases

ARTICLE · DEV.to AI · 5/9/2026

Future of AI Agents in Agentic AI

Agentic AI refers to artificial intelligence systems capable of acting autonomously, making decisions, and carrying out tasks without constant human intervention. Powered by large language models and sophisticated tool-use frameworks, these AI agents are considered the next big thing in the field.

future of AI LLMs Agentic AI AI Agents

ARTICLE · DEV.to AI · 4/21/2026

Amazon Is Betting $25 Billion More on Anthropic. Here's What That Really Means.

Amazon confirmed an investment of up to $25 billion in Anthropic, in addition to $8 billion already invested, for an expanded partnership centered on AI infrastructure, with Anthropic committing to use AWS technologies for a decade. This deal reveals the direction of AI, the infrastructure arms race, and Anthropic's commercial rise.

LLMs cloud computing AWS AI partnership

ARTICLE · ML Mastery · 7d ago

Scikit-LLM vs. Traditional Text Classifiers: When Should You Use an LLM?

The content examines how generative AI models, specifically LLMs, have increasingly replaced traditional machine learning methods for tasks like text classification. It discusses the scenarios in which an LLM should be preferred.

LLMs machine learning text classification Scikit-LLM

ARTICLE · DEV.to AI · 4/21/2026

Harness Engineering: The Most Important Part of AI Agents

The article argues that AI agents emerge not from more intelligent LLMs, but from integrating them into a robust system through "harness engineering." This approach emphasizes the practical challenges of building reliable, real-world AI applications beyond just model performance.

System Design LLMs Reliability software engineering

ARTICLE · DEV.to AI · 4/15/2026

AI Tech Daily Agent — Complete Architecture Deep Dive & Workflow Analysis

This content explores the architecture and workflow of an autonomous AI agent, built on the Fetch.ai uAgents framework, designed for daily tech journalism. It details how the system automates research, analysis, and generation of articles about AI and technology companies.

LLMs workflow automation Autonomous systems Architecture

RESEARCH · arXiv CS.AI · 4/20/2026

LACE: Lattice Attention for Cross-thread Exploration

LACE is a novel framework that lets Large Language Models (LLMs) coordinate and share insights across multiple parallel reasoning paths through cross-thread attention. It uses a synthetic data pipeline to teach collaborative error correction, improving reasoning accuracy by more than 7 points.

synthetic data LLMs Attention Mechanisms AI Reasoning

RESEARCH · arXiv CS.LG · 4/20/2026

The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference

This research reveals that KV caching in autoregressive transformer inference, under standard FP16 precision, causes a systematic divergence in decoded token sequences due to different floating-point accumulation orders. Across LLaMA-2-7B, Mistral-7B, and Gemma-2-2B, a 100% token divergence rate was observed, with cache-ON often leading to higher accuracy.

AI models inference LLMs numerical precision
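
The underlying numerical effect, that FP16 addition is not associative, can be shown with a small sketch. This is a generic illustration of accumulation-order sensitivity, not a reproduction of the paper's experiments; the helper names and input values are assumptions. It uses the standard library's half-precision `struct` format to round after every addition.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values):
    """Sum in the given order, rounding to FP16 after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


# Above 2048 the spacing between adjacent FP16 values is 2, so adding 1
# to 2048 is rounded away, while adding 1 + 1 = 2 survives.
values = [2048.0, 1.0, 1.0]
forward = fp16_sum(values)             # (2048 + 1) + 1
backward = fp16_sum(reversed(values))  # (1 + 1) + 2048
print(forward, backward)               # 2048.0 2050.0
```

The same three numbers give two different sums depending only on the order of accumulation, which is the kind of discrepancy that can arise when a cached and an uncached code path reduce the same values in a different order.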

RESEARCH · arXiv CS.LG · 4/20/2026

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

This research introduces sequential KV compression, a novel two-layer architecture for transformer key-value caches that surpasses the per-vector Shannon limit. It leverages the sequential nature of KV cache tokens, using probabilistic prefix deduplication with language tries and predictive delta coding to achieve more efficient compression.

Transformer Architecture AI models LLMs data compression

RESEARCH · arXiv CS.AI · 4/15/2026

GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses

This research introduces GoodPoint, a method leveraging LLMs and author responses to generate constructive feedback for scientific papers. It develops GoodPoint-ICLR, a dataset of ICLR papers, and a training recipe using fine-tuning and preference optimization for valid and actionable feedback.

LLMs Feedback Generation machine learning NLP

RESEARCH · arXiv CS.AI · 4/16/2026

SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications

This work introduces SciFi, a safe, lightweight, and user-friendly agentic framework for the autonomous execution of scientific tasks. It combines an isolated environment, a three-layer agent loop, and a self-assessing mechanism to ensure reliable operation, leveraging LLMs to automate routine scientific workloads and free researchers for creative activities.

LLMs Workflow Agentic AI automation

RESEARCH · arXiv CS.AI · 4/17/2026

GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

This work introduces Group Fine-Tuning (GFT), a unified post-training framework for large language models. It addresses intrinsic limitations of supervised fine-tuning (SFT), such as single-path dependency and entropy collapse, through Group Advantage Learning and Dynamic Coefficient Rectification.

LLMs reinforcement learning post-training machine learning

ARTICLE · DEV.to AI · 5/6/2026

Released my first open source project — MIT-licensed CLI for AI-assisted commit messages

The author has released their first open source project, an MIT-licensed CLI tool for AI-assisted commit messages. The project supports local models through Ollama integration, and the author commits to maintaining it and is open to co-maintainers if interest grows.

open-source LLMs development AI tools

RESEARCH · arXiv CS.CL · 4/22/2026

Scripts Through Time: A Survey of the Evolving Role of Transliteration in NLP

This paper surveys the evolving role of transliteration in NLP, a technique crucial for overcoming the "script barrier" in cross-lingual transfer. It presents a taxonomy of motivations and approaches for incorporating transliterations, analyzing their effectiveness and contextualizing their need in modern LLMs across various beneficial settings.

Cross-lingual AI language models LLMs NLP

RESEARCH · arXiv CS.CL · 4/22/2026

Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

This paper investigates counterfactual unfairness in LLMs by analyzing how their responses to humor change when swapping speaker and addressee identities. Experiments reveal consistent relational disparities, where jokes told by privileged speakers are more often refused or judged as malicious by the models.

ethics social impact LLMs Bias

RESEARCH · arXiv CS.AI · 4/22/2026

ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System

ARES introduces a framework to address systemic weaknesses in RLHF-aligned LLMs, where imperfect Reward Models fail to penalize unsafe behaviors. It uses a "Safety Mentor" for adaptive red-teaming to discover and mitigate these dual vulnerabilities in both the LLM and its Reward Model.

LLMs reinforcement learning security

ARTICLE · DEV.to AI · 4/22/2026

RAG: How AI Models Use Your Data Without Forgetting

Large language models are inherently stateless, lacking memory of past conversations or access to up-to-date or private data due to training limitations. Retrieval Augmented Generation (RAG) addresses this by introducing a retrieval step, allowing models to access external information and act as a reasoning engine over that data.

LLMs RAG AI Information Retrieval
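
The retrieval step described in the summary can be sketched in a few lines. This is a toy example under stated assumptions: it ranks documents by bag-of-words cosine similarity rather than learned embeddings, and the corpus, function names, and prompt template are hypothetical rather than taken from the article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Place the retrieved context ahead of the question for the model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "LoRA adds low-rank adapter matrices to frozen weights.",
    "KV caching stores attention keys and values between decoding steps.",
    "RAG retrieves external documents before generation.",
]
print(build_prompt("How does KV caching work?", docs))
```

A production system would typically swap the word-count similarity for an embedding model and a vector index, but the control flow stays the same: retrieve first, then generate over the retrieved text.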

ARTICLE · DEV.to AI · 27d ago

The Death of RAG? Long-Context Windows vs. Vector Databases

The article discusses whether Retrieval-Augmented Generation (RAG) is becoming obsolete due to the rise of large context windows in new LLMs. It argues that RAG remains relevant, primarily due to its cost-effectiveness, lower latency, and efficiency in handling frequently updated proprietary data.

AI architecture LLMs Vector Databases RAG

ARTICLE · DEV.to AI · 4/22/2026

One Open Source Project a Day (No. 45): Browser Harness - A Lightweight Bridge Giving AI Agents "Hands" and "Eyes"

Browser Harness is a lightweight open-source project designed to enable AI agents to interact with browsers efficiently and cost-effectively, overcoming the limitations of traditional automation tools like Playwright or Selenium. It achieves this by directly bridging to the Chrome DevTools Protocol, encouraging agents to write and modify their own helper functions in real-time.

open-source LLMs browser automation AI Agents