machine learning

790 items

ARTICLE · DEV.to AI · 4/10/2026

Static Agents Are Already Legacy Code

Static agents with frozen weights and fixed prompts are considered legacy code because they drift from reality and discard what could be learned from interactions. IBM Research's ALTK-Evolve proposes agents that learn at runtime, adapting to task outcomes and feedback, which is essential for handling edge cases and changes in workflows.

26
ARTICLE · DEV.to AI · 4/13/2026

From $0 to First Sales Call: Building ThumbGate in Public

ThumbGate introduces a system for AI coding agents to prevent errors by generating "PreToolUse" gates, with Thompson Sampling adapting their confidence. The creator describes building and marketing the product rapidly, detailing features like self-distillation and SQL protection, and noting the first sales contact originated from a GitHub issue rather than extensive social media efforts.
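As a rough illustration of how Thompson Sampling can adapt a gate's confidence, the sketch below keeps a Beta posterior per gate and blocks a tool call only when a sample from that posterior clears a threshold. This is not ThumbGate's implementation: `GateStats`, `should_block`, and the 0.5 threshold are hypothetical names and values chosen for the example.

```python
import random


class GateStats:
    """Beta(alpha, beta) posterior over how often a gate's block was correct.

    Hypothetical structure for illustration; not taken from ThumbGate.
    """

    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(1, 1) is a uniform prior: no opinion about the gate yet.
        self.alpha = alpha
        self.beta = beta

    def sample(self, rng):
        # Thompson Sampling step: draw one plausible success rate.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, was_helpful):
        # Feedback shifts the posterior toward or away from blocking.
        if was_helpful:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self):
        # Posterior mean, usable as a point estimate of gate confidence.
        return self.alpha / (self.alpha + self.beta)


def should_block(stats, rng, threshold=0.5):
    """Block the tool call when the sampled confidence clears the threshold."""
    return stats.sample(rng) >= threshold


rng = random.Random(0)
gate = GateStats()
for helpful in [True] * 8 + [False] * 2:
    gate.update(helpful)
decision = should_block(gate, rng)
```

Because the decision uses a sample rather than the mean, a gate with little feedback still fires some of the time, which is how it keeps gathering evidence about itself.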

26
NEWS · DEV.to AI · 4/10/2026

Claude Office Copilot, CoreWeave Cloud, and Models That Slim Themselves

The AI world is more practical this week: Anthropic's Claude is being integrated into Microsoft Office, and a new technique lets AI models optimize their architectures during training, reducing cost and latency. Meanwhile, PyTorch is expanding its developer tooling, and a new AI tool for creating social media visuals has been launched.

26
ARTICLE · LangChain Blog · 4/5/2026

Continual learning for AI agents

This article argues that continual learning for AI agents extends beyond updating model weights. It identifies three layers where learning can occur (the model, the harness, and the context) and describes how that view changes the way improving AI systems are built.

26
ARTICLE · DEV.to AI · 14d ago

Understanding Reinforcement Learning with Human Feedback Part 6: How the Reward Model Trains the Original Model

This article, part of a series on Reinforcement Learning with Human Feedback (RLHF), explains how a trained reward model is used to train the original model. The original model generates responses to new prompts, the reward model scores them, and those scores serve as the feedback signal that teaches the original model to produce more helpful, human-aligned outputs.
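The loop described above can be sketched as a toy example. This is a simplification under stated assumptions, not the article's method: the "model" is a softmax over three canned responses that ignores the prompt, `reward_model` is a hypothetical hand-written scorer standing in for a trained reward model, and the update is plain REINFORCE with a running baseline rather than the PPO-style optimization typically used in RLHF.

```python
import math
import random

# Canned responses the toy policy can choose from (illustrative only).
RESPONSES = ["helpful answer", "rude answer", "off-topic answer"]


def reward_model(prompt, response):
    # Hypothetical stand-in for a trained reward model: scores 1.0 for the
    # helpful response and 0.0 otherwise.
    return 1.0 if response.startswith("helpful") else 0.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(prompts, steps=500, lr=0.1, seed=0):
    """Return response probabilities after reward-driven updates."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)  # the "original model" starts uniform
    baseline = 0.0                   # running average reward
    for _ in range(steps):
        prompt = rng.choice(prompts)                   # 1. take a new prompt
        probs = softmax(logits)
        idx = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        reward = reward_model(prompt, RESPONSES[idx])  # 2. score the response
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # 3. REINFORCE: gradient of log-probability of the sampled response,
        #    scaled by how much better than average it scored.
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


final_probs = train(["How do I reset my password?", "Explain recursion."])
```

After training, the probability mass shifts toward the response the reward model prefers, which is the behavior the summary describes at a much larger scale.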

24