prompt engineering

251 items

ARTICLE · DEV.to AI · 22h ago

Claude Fable 5 dropped this morning. By noon, 13 of my 31 production skills were quietly obsolete.

A developer recounts how Anthropic's Claude Fable 5 release rendered 13 of their 31 production AI skills obsolete due to changes in prompting and API behavior. Old instructions, previously effective, now actively degrade the new model's output quality, necessitating a complete re-evaluation of their autonomous agent fleet.

62
RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/9/2026

[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants)

This study addresses Type II errors in LLM task classification, where seemingly simple prompts require deep understanding. The research showed that open-ended exploration prompts ("What's really going on here?") significantly reduce these errors compared with direct extraction prompts.

45
CASE · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane

The poster reports very good results running the PI Coding Agent with a local Qwen3.6 35b model on production projects. They attribute the success to a custom "plan-first skill file" that enforces a structured planning workflow: the plan must be approved before any coding begins, and execution then proceeds step by step.

42
ARTICLE · DEV.to AI · 4/22/2026

Eval workflow for agentic builders: fork any prompt through baseline vs scaffolded agents, blind third-party judge.

A solo founder built an n8n eval workflow for AI agents, A/B testing prompts with plain GPT-4o versus GPT-4o with a reasoning scaffold, using a blind Gemini evaluator. This tool allows builders to test agent performance on their own tasks, focusing on how scaffolding affects depth, sycophancy, and diagnostic procedures.

35
ARTICLE · DEV.to AI · 4/22/2026

Stop Paying OpenAI to Read Garbage: The Two-Stage Agent Pipeline

This article critiques the common practice of feeding raw, unformatted data directly into AI prompts, which drives up costs and degrades agent performance. It describes how a junior developer's approach caused an AI agent to loop endlessly while trying to parse malformed JSON, and argues for proper data engineering instead of using LLMs as parsers.

34