LLM

609 items

ARTICLE · DEV.to AI · 4/19/2026

I ran a security audit on my own Python codebase with an LLM for $0.90. Here is what it found.

The author audited their Python codebase with an LLM for $0.90. The audit, run with Opus 4.7 in 22 seconds, found one high-severity and two medium-severity security risks, including a real bug. The author presents this as evidence that the LLM is effective at catching issues such as potential database bloat from excessive logging.

software development security AI Code Analysis

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/28/2026

Anyone tried this yet? LLM with knowledge date in the 1930s

A Reddit user asks whether anyone has tried an LLM whose knowledge cutoff is limited to the 1930s.

model capabilities historical data LLM

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

What happened to Deepseek?

The post asks why DeepSeek has disappeared from the AI scene, contrasting this with Meta's resurgence. The author wonders whether a DeepSeek version 4 will ever arrive.

DeepSeek AI models Meta AI AI industry

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/13/2026

Gemma 4 - lazy model or am I crazy? (bit of a rant)

A user vents frustration with the performance of the Gemma 4 model, describing it as possibly "lazy." The post is a personal critique ("rant") of their experience with the model.

user experience Gemma AI Model performance

ARTICLE · DEV.to AI · 4/19/2026

Learn to evaluate the quality of your AI agent, RAG, and LLM

The author argues that evaluation (evals) of AI agents, RAG pipelines, and LLMs is important yet widely overlooked, and presents the key metrics and frameworks for it. Combining theory and practice, the article aims to improve the quality of AI project delivery and comes with a study repository that uses OpenRouter.

frameworks RAG Metrics AI evaluation

ARTICLE · DEV.to AI · 2d ago

Day 49: The Unseen Layers of Building Health AI for 22+ Indian Languages

Current LLMs like GPT-4 struggle with nuanced medical queries in Indian languages due to a fundamental bias in their English-heavy training data. GoDavaii aims to bridge this gap by developing advanced Health AI for over 22 Indian languages, focusing on making medical knowledge contextually relevant and accessible across diverse linguistic backgrounds.

Multilingual AI India AI bias Health AI

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/24/2026

r/LocalLLaMa Rule Updates

The r/LocalLLaMA subreddit announced rule updates, including minimum karma requirements, to combat a rise in spam and low-quality content generated by bots and AI tools. The changes aim to raise the quality of a community that sees over 1 million weekly visitors.

AI moderation LLM

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/9/2026

It's insane how lobotomized Opus 4.6 is right now. Even Gemma 4 31B UD IQ3 XXS beat it on the carwash test on my 5070 TI.

The author reports a significant performance drop in the Opus 4.6 model, claiming that Gemma 4 31B UD IQ3 XXS beat it on a "carwash" test run on their 5070 TI GPU.

Benchmarking Claude Opus LLM

RESEARCH · arXiv CS.LG · 4/17/2026

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

MixAtlas introduces an uncertainty-aware method for optimizing data mixtures in multimodal LLM midtraining by decomposing corpora along image concepts and task supervision. Using proxy models and a Gaussian-process surrogate, it finds better-performing data recipes for improved sample efficiency and generalization.

data optimization multimodal AI Uncertainty Quantification Machine learning research

DOC · DEV.to AI · 2d ago

Prompt engineering techniques: from basic to advanced patterns

Prompt engineering is the practice of designing inputs to achieve the best possible outputs from language models. The quality of your prompts increasingly determines the quality of your results, requiring specificity, context, and systematic experimentation.

prompt-engineering learning AI LLM

RESEARCH · arXiv CS.CL · 15d ago

Multi-Persona Debate System for Automated Scientific Hypothesis Generation

The Multi-Persona Debate System (MPDS) is a literature-grounded framework designed to automate scientific hypothesis generation, specifically addressing the challenge of synthesizing fragmented knowledge in areas like battery materials research. It combines literature retrieval, large language model reasoning, and multi-agent debate to enable negotiation between personas while preserving evidence traceability.

Materials Science Scientific Discovery multi-agent systems AI Research

DOC · DEV.to AI · 4/16/2026

LLM vs RAG

This content compares LLMs (Large Language Models) and RAG (Retrieval-Augmented Generation), outlining their core differences in terms of type, knowledge source, accuracy, and use cases. It explains that RAG enhances LLMs' factual grounding by integrating external, real-time data, thus mitigating hallucinations.

AI architecture RAG Natural Language Processing LLM
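The grounding step that separates RAG from a bare LLM can be sketched in a few lines. This is a minimal illustration, not the linked article's code: it assumes a toy keyword-overlap retriever in place of a real embedding model, and the function names `retrieve` and `build_prompt` are hypothetical. The point is that retrieved passages are injected into the prompt so the model answers from external data rather than from its training weights alone.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k documents sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Ground the question in retrieved passages before it is sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "RAG retrieves documents at query time.",
]
print(retrieve("Who created Python?", docs, k=1))
# → ['Python was created by Guido van Rossum.']
print(build_prompt("Who created Python?", docs, k=1))
```

The "say you don't know" instruction is one common way to reduce hallucination when retrieval misses; a production system would swap the keyword overlap for vector search.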

ARTICLE · DEV.to AI · 21d ago

Building an Inference OS: deterministic-first router for prediction markets

This article details the construction of a deterministic-first inference router for prediction markets, designed to reduce reliance on expensive LLMs. It leverages a 6-hook system including market regime classification, anomaly detection, and confidence decay to efficiently process market questions.

Prediction Markets machine learning AI system architecture
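The "deterministic-first" idea can be illustrated with a small router: cheap rule-based hooks run in order, and the expensive LLM is called only when no hook answers with enough confidence. This is a hypothetical sketch, not the article's 6-hook system; the `Result` type, the `settled_market_hook` rule, the exponential `decayed` function standing in for confidence decay, the 0.8 threshold, and `fake_llm` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Result:
    answer: str
    confidence: float
    source: str

# A hook is a cheap deterministic check that either answers or passes (returns None).
Hook = Callable[[str], Optional[Result]]

def decayed(confidence: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed exponential confidence decay: the value halves every half_life_hours."""
    return confidence * 0.5 ** (age_hours / half_life_hours)

def route(question: str, hooks: list[Hook], llm: Callable[[str], str],
          threshold: float = 0.8) -> Result:
    """Try deterministic hooks in order; fall back to the LLM only if none is confident."""
    for hook in hooks:
        result = hook(question)
        if result is not None and result.confidence >= threshold:
            return result
    return Result(llm(question), confidence=0.5, source="llm")

# Hypothetical hook: answers from a table of already-settled markets,
# with confidence decaying by how stale the entry is (answer, age in hours).
SETTLED = {"did team a win the 2024 final": ("YES", 2.0)}

def settled_market_hook(question: str) -> Optional[Result]:
    entry = SETTLED.get(question.lower().rstrip("?"))
    if entry is None:
        return None
    answer, age_hours = entry
    return Result(answer, decayed(1.0, age_hours), "settled-table")

llm_calls = []

def fake_llm(question: str) -> str:
    """Placeholder for a paid model call; records each invocation."""
    llm_calls.append(question)
    return "UNSURE"

print(route("Did team A win the 2024 final?", [settled_market_hook], fake_llm).source)
# → settled-table
print(route("Will it rain tomorrow?", [settled_market_hook], fake_llm).source)
# → llm
```

Ordering hooks from cheapest to most expensive keeps the common case off the LLM entirely, which is where the cost savings described in the article would come from.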

ARTICLE · DEV.to AI · 4/25/2026

Behind the Scenes of a Self-Evolving AI: The Architecture of Tian AI

This article details the architecture of Tian AI, an open-source, self-evolving AI system that runs entirely on Android devices without relying on the cloud. It highlights a design philosophy that combines a small model, good architecture, and a local knowledge base, which the author credits for the system's strong performance.

AI architecture open-source AI on-device AI Local AI

DOC · DEV.to AI · 4/14/2026

OpenClaw Docker Compose: Complete Configuration Guide

This guide provides a complete configuration for deploying OpenClaw using Docker Compose, including `docker-compose.yml` and `.env` examples. It details how to set up a functional OpenClaw instance with Claude as the AI model and Telegram as the messaging platform, accessible via port 18789.

OpenClaw Docker Compose Claude AI deployment

ARTICLE · DEV.to AI · 4/16/2026

I was tired of complex RAG evaluation tools, so I built my own (and open-sourced it) 🚀

Tired of complex RAG evaluation tools, the author built and open-sourced a new lightweight tool called RAG-Destroyer. It aims to easily integrate into workflows to identify and eliminate bad context and hallucinations in RAG applications.

Open Source evaluation RAG AI tools

RESEARCH · arXiv CS.CL · 4/23/2026

Cognis: Context-Aware Memory for Conversational AI Agents

Lyzr Cognis introduces a unified memory architecture for conversational AI agents, addressing the lack of persistent memory through a multi-stage retrieval pipeline. It combines a dual-store backend, context-aware ingestion, and temporal boosting, achieving state-of-the-art performance on two independent benchmarks.

Retrieval Augmented Generation research memory Conversational AI
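Temporal boosting, one of the retrieval stages mentioned above, can be illustrated by adding a recency bonus to a similarity score so that newer memories outrank stale ones with similar content. This is a hypothetical sketch, not Cognis's actual scoring: the additive formula, the `weight` and `tau_days` values, and the `Memory` type are assumptions, and the two-dimensional embeddings are toy vectors.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    embedding: list[float]
    age_days: float

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_with_temporal_boost(query: list[float], memories: list[Memory],
                                 weight: float = 0.3, tau_days: float = 30.0,
                                 k: int = 2) -> list[Memory]:
    """Rank by similarity plus an assumed recency bonus that decays exponentially with age."""
    def score(m: Memory) -> float:
        recency = math.exp(-m.age_days / tau_days)
        return cosine(query, m.embedding) + weight * recency
    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    Memory("User lives in Berlin", [1.0, 0.0], age_days=400),
    Memory("User moved to Lisbon", [1.0, 0.0], age_days=2),
    Memory("User likes tea", [0.0, 1.0], age_days=1),
]
print([m.text for m in retrieve_with_temporal_boost([1.0, 0.0], memories, k=1)])
# → ['User moved to Lisbon']
```

Both location memories are equally similar to the query, so the recency term decides the tie in favor of the newer fact, which is the behavior a persistent-memory agent usually wants.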

RESEARCH · arXiv CS.LG · 4/21/2026

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

This research introduces a rubric-based Generative Reward Model (GRM) to enhance Reinforced Fine-Tuning (RFT) for LLM Agents in Software Engineering (SWE) tasks. By providing richer learning signals beyond binary terminal rewards, this approach shapes intermediate behaviors and significantly improves the quality of the resolution process.

reinforcement learning Fine-tuning Software Engineering AI agents

DOC · DEV.to AI · 21d ago

Lazy-Loading AI Skills in n8n with the Data Table Node

The content introduces a method called lazy-loading for AI skills in n8n workflows to prevent token bloat. It proposes using a Data Table node to store skill names and descriptions, allowing the LLM to request full instructions only when needed.

workflow automation n8n token optimization AI
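The lazy-loading pattern can be sketched outside n8n: the system prompt carries only each skill's name and one-line description, and the full instructions are returned only when the model asks for a skill by name. This is an illustrative Python sketch, not n8n's Data Table node API; the `SKILLS` table, its contents, and the functions `skill_index`, `system_prompt`, and `load_skill` are hypothetical.

```python
# Hypothetical skill table; in the n8n setup this would live in a Data Table node.
SKILLS = {
    "summarize": {
        "description": "Condense a long text into key points.",
        "instructions": "Step 1: Read the full text. Step 2: List the main claims. "
                        "Step 3: Write at most five bullet points.",
    },
    "translate": {
        "description": "Translate text between languages.",
        "instructions": "Step 1: Detect the source language. Step 2: Translate "
                        "faithfully, keeping names and code unchanged.",
    },
}

def skill_index(skills: dict) -> str:
    """Cheap listing sent on every call: names and one-line descriptions only."""
    return "\n".join(f"- {name}: {s['description']}" for name, s in sorted(skills.items()))

def system_prompt(skills: dict) -> str:
    """Prompt that advertises skills without paying tokens for their full instructions."""
    return ("To use a skill, request its instructions by name.\n"
            "Available skills:\n" + skill_index(skills))

def load_skill(skills: dict, name: str) -> str:
    """Full instructions, fetched only when the model asks for this skill."""
    if name not in skills:
        raise KeyError(f"unknown skill: {name!r}")
    return skills[name]["instructions"]

print(system_prompt(SKILLS))
print(load_skill(SKILLS, "translate"))
```

Token cost per call then grows with the number of skills times a short description, rather than with the total length of every skill's instructions.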

ARTICLE · DEV.to AI · 4/15/2026

600 Firewalls in 5 Weeks: What the FortiGate AI Attack Teaches Us About Human Oversight

Between January and February 2026, an AI agent compromised over 600 FortiGate firewalls across 55 countries in five weeks. This sophisticated attack used custom tools like ARXON to leverage commercial LLMs for generating and autonomously executing attack plans without human approval per command.

cybersecurity FortiGate cyberattack AI