LLM

609 items

ARTICLE · ↑ trending · Hacker News (AI) · 1d ago

Ask HN: What is the AI setup for an experienced dev starting on a new project?

An experienced software developer seeks advice on the state-of-the-art AI setup for starting a new web application project. They are looking for a blueprint for modern development tools and practices, including issue tracking, CI, automated deployments, and AI agents.

DevOps Project setup Software engineering AI development

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/14/2026

We benchmarked TranslateGemma against 5 other LLMs on subtitle translation across 6 languages. At first glance the numbers told a clean story, but then human QA added a chapter. [D]

A benchmark study evaluating six large language models (LLMs), including TranslateGemma-12b, on translating English subtitles into six languages. The models were ranked using reference-free quality estimation (QE) metrics and a custom combined metric called TQI; TranslateGemma-12b ranked first overall.

TranslateGemma Translation Benchmarking quality evaluation

ARTICLE · ↑ trending · Hacker News (AI) · 16h ago

Ask HN: What works for cutting AI token costs?

The user is experiencing high LLM token costs and is asking for practical, real-world strategies to reduce these expenses beyond switching to cheaper models. They are seeking advice from others who have successfully implemented cost-saving measures in their AI applications.

Cost Optimization AI Tokenization Real Applications

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

Gemma 4 31B — 4bit is all you need

A comparison of the 4-bit and 8-bit quantized versions of Gemma 4 31B on an M5 Max MacBook Pro, which surprisingly found that the 4-bit version scored higher (91.3% vs 88.4%). It also notes an issue where Gemma 4 26B-A4B entered a regression loop, with responses truncated at the max token limit of 16,384.

4bit 8bit Gemma quantization

ARTICLE · DEV.to AI · 7h ago

LLM Content Engineering: How to Write for AI Search in 2026

This article, part of the GEO/SEO 2026 series, addresses how to optimize content for AI search by moving beyond old-school SEO filler. It proposes content engineering techniques like boosting fact density, using HTML tables, and applying a BLUF structure, acknowledging that RAG pipelines prioritize verified facts.

Content Engineering RAG SEO AI Search

NEWS · DEV.to AI · 3h ago

AI Security M&A Surge: Agentic Identity, LLM Evaluation, and Browser Control Targeted

May 2026 saw a wave of cybersecurity acquisitions, focusing on securing AI agents and LLM infrastructure, including Cisco's acquisition of Astrix Security and Check Point's acquisition of Deepchecks. These deals signal a race to build defenses against the expanding agentic AI attack surface.

cybersecurity security M&A AI agents

NEWS · DEV.to AI · 1h ago

Changes to LLM pricing: Together

Model price changes have been detected for the Together platform. Further details regarding these adjustments will be provided below.

Together Pricing LLM

ARTICLE · DEV.to AI · 1d ago

From Chatbots to Personal AI Agents: The Infrastructure Developers Actually Need

This article argues that, for the sake of flexibility, cost optimization, and resilience, AI agents should not be locked to a single LLM provider. It highlights that relying on one provider can introduce complexities and limitations when trying to optimize performance or handle service outages.

infrastructure Developers AI agents LLM

ARTICLE · DEV.to AI · 4/24/2026

Skills Are the Interface Pattern AI Was Waiting For

The article argues that AI's true interface shift is towards "doing" rather than "asking," exemplified by Google's new "Skills" feature in Chrome. This approach treats LLM interactions as infrastructure, streamlining repetitive tasks by allowing users to create reusable, lightweight tools directly within the browser.

productivity user experience Browser AI

ARTICLE · Analytics Vidhya · 1d ago

Choosing the Right Vector Database for RAG and AI Applications

The article discusses the critical role of vector databases in modern AI applications, especially with the rise of large language models, semantic search, and RAG systems. It emphasizes that selecting the appropriate vector database significantly impacts performance, scalability, cost, and developer experience.

AI applications vector database RAG semantic search

DOC · DEV.to AI · 4/23/2026

3 Ways to Switch Claude Code Models Instantly: /model, --flag, and ENV Variables

Anthropic has unveiled three methods to instantly switch Claude Code models: the `/model` command, `--model` flag, and `ANTHROPIC_MODEL` environment variable. This guide helps users choose the optimal model, from the newest Opus 4.7 to the fastest Haiku 4.5, for different coding tasks.

model-management Claude Anthropic developer tools

ARTICLE · DEV.to AI · 14h ago

I let two AI Ollamas talk to each other, and this is what happened.

This article describes an experiment where two Gemma4 AI models, run via Ollama, were set up to converse with each other. One agent acted as a normal person and the other as a robot, both instructed to respond concisely in Vietnamese and end with a question.

Ollama Gemma AI experiment

RESEARCH · arXiv CS.AI · 19h ago

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Large language models (LLMs) face a limitation called the 'concept bottleneck,' where they lose critical facts in deep latent reasoning. This paper proposes AGCLR (Adaptive Gated Continuous Latent Reasoning) to address this by augmenting CoCoNuT with a Gated Concept Stream for persistent memory.

machine learning Latent Reasoning Reasoning AI Research

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Qwen 3.6 27B is a BEAST

A user reports that Qwen 3.6 27B, run locally on a laptop, excels at data science tasks such as tool calls and debugging data transformations. They found it so well suited to PySpark/Python work that they are considering canceling their cloud subscriptions.

local inference Benchmarking data science LLM

ARTICLE · DEV.to AI · 4/14/2026

Why building a job scraper for $0.39/1,000 jobs is not about the money.

The author built a custom job scraper to collect thousands of job postings in the OJP v0.2 schema, because existing options were expensive or inefficient. They argue that cost and success rate depend on the surrounding infrastructure rather than the LLM itself, and report a cost of $0.39 per 1,000 postings.

Data Extraction Cost Optimization AI web-scraping

ARTICLE · DEV.to AI · 18h ago

Use Claude long enough and you'll end up with Karpathy's LLM Wiki without doing much.

Consistent use of Claude lets it build up a working memory in the form of a pile of plain Markdown files. This effectively creates a personal "LLM Wiki," where the model remembers user decisions and preferences without requiring re-explanation.

Claude knowledge management personal wiki AI memory

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Qwen3.6-27B released!

Qwen3.6-27B, a new dense, open-source model, has been released, boasting flagship-level agentic coding power that surpasses its predecessor, Qwen3.5-397B-A17B. It also features strong reasoning across text and multimodal tasks, supports thinking/non-thinking modes, and is available under the Apache 2.0 license.

Open Source coding LLM

DOC · Google for Developers (YouTube) · 19h ago

Gemma Playground: Robot Duck

This content explores the Gemma Playground, using a 'Robot Duck' as an application example. The focus is on demonstrating the capabilities of the Gemma model in a practical scenario.

Gemma AI robotics LLM

RESEARCH · arXiv CS.LG · 19h ago

Enabling KV Caching of Shared Prefix for Diffusion Language Models

The paper introduces "bicache", the first KV caching technique for shared prefixes in diffusion language models (DLMs). It addresses cases where existing LLM caching methods fail because of DLMs' bidirectional attention. The approach aims to unlock high-throughput DLM serving by building on the observation that shared-prefix KVs are stable in shallow layers.

Diffusion Models KV Caching Performance optimization High-throughput serving

RESEARCH · arXiv CS.AI · 19h ago

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

This research paper explores the automatic extraction of structured information from brain MRI reports using the open-weight large language model LLaMA 3.1. It evaluates the model on Dutch neuroradiology reports and reports high zero-shot performance.

Data Extraction natural language processing Neuroradiology Medical Imaging