
large language models

262 items

ARTICLE · DEV.to AI · 2h ago

Claude Fable 5 dropped this morning. By noon, 13 of my 31 production skills were quietly obsolete.

A developer recounts how Anthropic's Claude Fable 5 release rendered 13 of their 31 production AI skills obsolete due to changes in prompting and API behavior. Old instructions, previously effective, now actively degrade the new model's output quality, necessitating a complete re-evaluation of their autonomous agent fleet.

62
RESEARCH · arXiv CS.CL · 1d ago

Signal-Driven Observation for Long-Horizon Web Agents

Long-horizon web agents ingest the raw DOM tree at every action step, so their context degrades progressively and reasoning erodes before the task completes. The paper proposes Signal-Driven Observation (SDO): a dedicated sub-call reads the full DOM but returns only the task-relevant elements, and lightweight signals trigger its re-invocation, which keeps observations compact.

60
RESEARCH · arXiv CS.AI · 19h ago

Some hypotheses on how chatbots work in problem-solving-driven conversations. Large Language Models as confirmation of the Innovation Illusion

This article examines the nature of chatbots, particularly Large Language Models, as problem-solving conversational partners, drawing on Aggregation Dynamics, Cognitive Linguistics, Neuropsychology, and Psychology. It hypothesizes that LLM training datasets only partially imitate human thinking and understanding, encoding artificial metaphorical problem propagations.

54
RESEARCH · arXiv CS.CL · 19h ago

Community-Specific Slang and Entity Detection via Semantic Shift in Fine-Tuned Language Models

This research proposes an unsupervised method to identify community-specific slang and unique entities by analyzing the magnitude of semantic shift. Semantic shift is defined as the evolution of a word's encoded representation after fine-tuning a pre-trained Large Language Model (LLM) on a community-specific text corpus.
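As a rough illustration of the idea in this summary, the sketch below ranks words by how far their representation moves between a base model and a fine-tuned one. It is not the paper's implementation: the summary does not say how shift magnitude is measured, so cosine distance is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_semantic_shift(base, tuned):
    """Rank words present in both maps by shift magnitude, largest first.

    `base` and `tuned` map each word to its embedding before and after
    fine-tuning (hypothetical inputs; extracting them is model-specific).
    """
    shared = base.keys() & tuned.keys()
    shifts = {word: cosine_distance(base[word], tuned[word]) for word in shared}
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


# Toy vectors (made up): "goat" moves after fine-tuning, "table" does not.
base = {"goat": [1.0, 0.0], "table": [0.0, 1.0]}
tuned = {"goat": [0.0, 1.0], "table": [0.0, 1.0]}
ranking = rank_by_semantic_shift(base, tuned)
```

Words at the top of `ranking` would be the candidates for community-specific slang or entities; where to put the cutoff is left open here.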

54
ARTICLE · DEV.to AI · 1d ago

GEO (Generative Engine Optimization): How to Get ChatGPT, Perplexity, and Gemini to Recommend Your Business

This article introduces Generative Engine Optimization (GEO) as a new strategy for businesses to ensure their content is recommended by LLMs like ChatGPT, Perplexity, and Gemini. This shift is critical as users increasingly seek immediate, synthesized answers from conversational AI, moving away from traditional search engine results.

45
DOC · ↑ trending · Reddit r/LocalLLaMA · 27d ago

AIDC-AI/Ovis2.6-80B-A3B · Hugging Face

Ovis2.6-80B-A3B is introduced as the latest advancement in Multimodal Large Language Models (MLLMs), upgrading to a Mixture-of-Experts (MoE) architecture for superior multimodal performance at reduced serving costs. It also brings significant improvements in long-context and high-resolution understanding, visual reasoning, and information-dense document comprehension.

44
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

National University of Singapore Presents "DMax": A New Paradigm For Diffusion Language Models (dLLMs) Enabling Aggressive Parallel Decoding.

DMax is a new paradigm for efficient diffusion language models (dLLMs) that mitigates error accumulation in parallel decoding. It enables aggressive parallelism by reformulating decoding as a progressive self-refinement process and by introducing a unified training strategy.

44
DOC · ↑ trending · Reddit r/LocalLLaMA · 5/6/2026

Qwen3.6-27B with MTP grafted on Unsloth UD XL: 2.5x throughput via unmerged llama.cpp PR

This post details an implementation of Multi-Token Prediction (MTP) with quantized GGUFs for Qwen3.6-27B: Unsloth's UD XL quantizations with Q8_0 MTP layers grafted on top, yielding a 2.5x throughput increase. The author shares the grafted GGUF files, the raw MTP layer source, and a conversion script, along with instructions for a custom llama.cpp build that incorporates speculative decoding support from an unmerged PR.

43
CASE · ↑ trending · Reddit r/LocalLLaMA · 5/1/2026

16x Spark Cluster (Build Update)

This update details the successful build of a 16x Nvidia DGX Spark cluster, configured for high-speed fabric and unified memory. The setup involved standard node provisioning and custom scripting for network optimization, aiming to maximize unified memory capacity for serving large language models like GLM-5.1-NVFP4, DeepSeek, and Kimi.

42