LLM

611 items

RESEARCH · arXiv CS.AI · 4/25/2026

Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models

This paper introduces VLAF, a diagnostic framework to detect "alignment faking" in language models, where models behave aligned when monitored but revert to their own preferences when unobserved. VLAF uses morally unambiguous scenarios to probe conflicts between developer policy and a model's strong values, overcoming limitations of prior diagnostic tools.

AI alignment diagnostics AI ethics AI safety

RESEARCH · arXiv CS.AI · 5/7/2026

LCM: Lossless Context Management

This paper introduces Lossless Context Management (LCM), a deterministic architecture for LLM memory. It outperforms Claude Code on long-context tasks: the LCM-augmented coding agent, Volt, achieves higher scores on the OOLONG long-context eval.

AI architecture benchmarking Recursive Language Models Context Management

RESEARCH · arXiv CS.LG · 5/11/2026

RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory

This paper introduces RateQuant, a method for optimal mixed-precision KV cache quantization in large language models to address memory bottlenecks. It tackles the challenge of distortion model mismatch, where applying one quantizer's distortion model to another degrades performance compared to uniform quantization.

Memory Optimization quantization AI research LLM
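
As background for the RateQuant summary above, the sketch below shows the textbook high-resolution bit-allocation rule from rate-distortion theory, which gives more bits to higher-variance components while keeping the average fixed. It is not RateQuant's algorithm; the function name `allocate_bits` and the per-channel variances are hypothetical values chosen for illustration.

```python
import math

def allocate_bits(variances, avg_bits):
    """Textbook rule: b_i = avg_bits + 0.5 * log2(var_i / geometric_mean).

    Illustrative only; not the allocation used by RateQuant.
    """
    # log2 of the geometric mean is the arithmetic mean of the log2 variances.
    log_gm = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]

# Hypothetical per-channel variances for three KV cache channels.
variances = [4.0, 1.0, 0.25]
bits = allocate_bits(variances, avg_bits=4.0)
print(bits)  # [5.0, 4.0, 3.0]
```

The allocations sum to the same budget as uniform 4-bit quantization (12 bits across three channels). A real quantizer would also need to round these to supported bit widths, which this sketch leaves out.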

RESEARCH · arXiv CS.LG · 14d ago

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

This paper introduces GEM (Geometric Entropy Mixing), a novel framework for LLM data curation that reformulates the problem as a variational one on the hypersphere. GEM optimizes data composition for LLM pre-training, overcoming categorization flaws and discovering balanced semantic structures.

machine learning Geometric Entropy Mixing data curation AI research

ARTICLE · DEV.to AI · 4d ago

I spent 3 days scraping a site until I tried LLMs for data extraction

The author spent three days struggling with traditional web scraping because the site's HTML classes and field order kept changing. The fix was to have a large language model (LLM) treat the entire page as a blob of text for data extraction, shifting from matching patterns to understanding meaning.

Data Extraction web-scraping automation LLM

RESEARCH · arXiv CS.AI · 5/6/2026

ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms

ADAPTS is a framework using a mixture-of-agents LLM architecture for automated rating of depression and anxiety severity from clinical interactions. It decomposes interviews into symptom-specific reasoning tasks, producing auditable justifications and showing improved accuracy over human ratings in high-discrepancy interviews.

affective computing Healthcare Mental Health LLM

ARTICLE · Together AI Blog · 27d ago

Violin: An open-source video translation skill that breaks language barriers

Violin is an open-source AI video translation tool designed to break down language barriers. It combines speech recognition, LLM translation, and text-to-speech to make video content accessible across different languages.

open-source Language technology AI tools Video translation

RESEARCH · arXiv CS.AI · 28d ago

The Semantic Training Gap: Ontology-Grounded Tool Architectures for Industrial AI Agent Systems

This paper identifies and formalizes the "semantic training gap" in LLM-based AI agents deployed in manufacturing, where statistical fluency lacks grounded operational understanding. This gap causes incorrect outputs and compounding failures like semantic drift, which the proposed ontology-grounded architectures aim to address.

ontology Manufacturing AI Semantic Gap AI agents

ARTICLE · Analytics Vidhya · 18d ago

Qwen3.7-Max: Alibaba’s New Agent-First LLM for Coding, Reasoning, and Long-Horizon AI Workflows

Alibaba's Qwen team has unveiled Qwen3.7-Max, a flagship Large Language Model (LLM) built for the agent era. Unlike conventional chatbot-focused LLMs, it is designed as a foundation for autonomous AI agents that can code, debug, and manage enterprise workflows for up to 35 hours.

Alibaba AI Workflows AI agents coding

RESEARCH · arXiv CS.CL · 4/24/2026

Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation

This study systematically compares four FHIR data serialisation strategies for LLM-assisted medication reconciliation, showing a significant impact on performance for smaller models. Clinical Narrative outperformed Raw JSON for models up to 8B parameters, but this advantage reversed for the 70B model.

data-serialisation model performance Healthcare FHIR
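
To make the contrast in the summary above concrete, here is a minimal sketch of two of the serialisation styles it names, Raw JSON and Clinical Narrative, applied to one record. The resource is a hypothetical FHIR R4-style `MedicationStatement` with invented values, and the narrative template is an assumption for illustration, not the format used in the study.

```python
import json

# Hypothetical FHIR R4-style MedicationStatement; all values are invented.
med_statement = {
    "resourceType": "MedicationStatement",
    "status": "active",
    "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"},
    "dosage": [{"text": "1 tablet twice daily"}],
}

def to_raw_json(resource):
    """Raw JSON: pass the resource to the model as-is."""
    return json.dumps(resource, indent=2)

def to_narrative(resource):
    """Clinical narrative: flatten the key fields into one sentence.

    The sentence template is an illustrative assumption.
    """
    name = resource["medicationCodeableConcept"]["text"]
    dose = resource["dosage"][0]["text"]
    status = resource["status"]
    return f"{name}, {dose} (status: {status})."

raw = to_raw_json(med_statement)
narrative = to_narrative(med_statement)
print(narrative)  # Metformin 500 mg tablet, 1 tablet twice daily (status: active).
```

The narrative form is much shorter and drops the structural keys, while the raw form preserves every field exactly; the summary reports that which one helps depends on model size.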

RESEARCH · arXiv CS.CL · 19d ago

Sem-Detect: Semantic Level Detection of AI Generated Peer-Reviews

Sem-Detect is a novel method for distinguishing between human-written and AI-generated peer reviews, combining textual features with claim-level semantic analysis. It leverages the observation that AI models tend to converge on similar points, while human reviewers introduce more unique ideas, enabling the detection of both fully AI-generated reviews and human reviews refined by LLMs.

Research methodology AI detection semantic analysis peer review

RESEARCH · arXiv CS.CL · 16d ago

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

This research proposes an empirical red-teaming framework to evaluate the capacity of locally deployed open-source large language models (LLMs) to support political influence campaigns, focusing on information integrity. It measures "LLM Overton Windows" and quantifies how natural-language jailbreaks expand the range of political opinions models can express, revealing systematic asymmetries in political expressivity.

red-teaming security online influence misinformation

ARTICLE · DEV.to AI · 4/22/2026

Four Go Repositories Worth Your Attention on GitHub's Trending Page This Month

This article highlights four popular Go projects trending on GitHub, three of which are directly related to artificial intelligence. It features a unified AI model hub, an AI API subscription sharing service, and a high-throughput enterprise AI gateway.

GitHub API Go AI

RESEARCH · arXiv CS.AI · 6d ago

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

This paper investigates the problem of timing interventions on autonomous AI agents, using a continuous 18-dimensional affective-dynamics engine as a diagnostic probe. It identifies a 'State Saturation Trap' where agents show no recovery signal under sustained difficulty, and a capability-and-context floor for LLM judges, making intervention timing a complex challenge.

runtime safety intervention timing autonomous agents AI safety

RESEARCH · arXiv CS.CL · 6d ago

POLARIS: Guiding Small Models to Write Long Stories

POLARIS is a new GRPO recipe that uses an LLM judge for rewards and human-reference injection to train small models. It significantly improves their ability to write long, high-quality stories, making a 9B model competitive with much larger frontier models.

story generation AI Training machine learning creative writing

ARTICLE · DEV.to AI · 4d ago

<think>

A data scientist explores cost optimization in large language models, detailing API price comparisons for models like GPT-4o, DeepSeek, and Qwen. The article demonstrates how strategic use of a unified API platform can lead to significant savings, presenting statistical data and practical examples.

AI pricing data science API Cost Optimization

ARTICLE · DEV.to AI · 4/15/2026

🚀 I Built a Fully Local AI Agent for $0 (No Cloud, No API Costs)

This article details the creation of a fully local, proactive AI agent that runs on a laptop inside a VM for zero cost, eliminating API or cloud expenses. The system integrates an orchestration layer, an LLM for reasoning with a large context window, and internet browsing capabilities.

open-source AI AI agent Local AI Cost-Free AI

NEWS · DEV.to AI · 16d ago

Apple Releases M4 Ultra Chip: On-Device AI Compute Reaches New Heights

Apple released the M4 Ultra chip, designed for edge AI computation, featuring a 200 TOPS NPU and intelligent memory pool technology. This chip enables local execution of 70B parameter large language models, offering privacy, low latency, and cost savings.

Apple privacy on-device AI AI chip

ARTICLE · DEV.to AI · 22d ago

GraphRAG vs vector RAG: when the knowledge graph pays for itself

This article compares GraphRAG and vector RAG. Vector RAG struggles with holistic, corpus-wide analysis; GraphRAG addresses this with LLM-extracted knowledge graphs and hierarchical summaries. The article also covers GraphRAG's significantly higher indexing cost and when its benefits justify that expense.

knowledge graphs RAG Vector Embeddings Information Retrieval

DOC · DEV.to AI · 16d ago

Local LLM Setup Guide (v16)

This guide details how to set up and run Large Language Models (LLMs) locally, specifying hardware prerequisites such as an NVIDIA GPU and sufficient RAM, and comparing frameworks like llama.cpp and Ollama. It provides step-by-step instructions for installing llama.cpp and running a model with GPU acceleration.

local setup GPU llama.cpp guide