code generation

107 items

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Qwen3.6 can code

A user frustrated with OpenAI models tried Qwen3.6-27b for Svelte 5 code generation and got a perfect result, although generation took longer. The evaluation was informal, but they anticipate interesting developments over the next 12 months.

AI models Model Evaluation code generation

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

Qwen3.6. This is it.

A user recounts how the Qwen3.6 model built and tested a tower defense game, identifying and fixing its own bugs along the way. The model verified its builds using screenshots, which astonished the user.

game development code generation AI programming Qwen

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/4/2026

AutoBe benchmark: structured harness narrows frontier-vs-local gap in backend generation [D]

AutoBe is a new benchmark for end-to-end backend generation, in which natural language requests produce six structured outputs via structured function calls. The benchmark suggests that backend quality depends more on harness design than on model prestige, with local models performing comparably to frontier models at a significantly lower cost.

AI models Benchmarking code generation backend development

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/27/2026

Qwen3 4B outperforms cloud agents on code tasks—with Mahoraga research [R]

The author introduces Mahoraga, an open-source orchestrator that routes tasks between local and cloud AI agents using a contextual bandit (LinUCB). Built after the author ran into cloud credit limits, the tool is presented alongside results in which Qwen3 4B performs strongly on code tasks.

Open Source orchestration machine learning code generation
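For readers unfamiliar with LinUCB, the routing idea can be sketched as follows. This is a minimal illustration of disjoint LinUCB in plain Python, not Mahoraga's actual code: the class names, the two-element feature vector, the arm names `local` and `cloud`, and the reward values are all hypothetical.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB. Stores A^-1 directly and updates it
    with the Sherman-Morrison formula, so no matrix library is needed."""

    def __init__(self, dim):
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha):
        a_inv_x = self._a_inv_times(x)
        # theta = A^-1 b, and A^-1 is symmetric, so theta . x = b . (A^-1 x)
        mean = sum(bi * vi for bi, vi in zip(self.b, a_inv_x))
        width = math.sqrt(sum(xi * vi for xi, vi in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class LinUCBRouter:
    """Picks the arm with the highest upper confidence bound for a task."""

    def __init__(self, arm_names, dim, alpha=0.5):
        self.alpha = alpha
        self.arms = {name: LinUCBArm(dim) for name in arm_names}

    def choose(self, x):
        return max(self.arms, key=lambda n: self.arms[n].ucb(x, self.alpha))

    def update(self, name, x, reward):
        self.arms[name].update(x, reward)


def toy_reward(arm, is_hard):
    # Hypothetical rewards: the local model is ideal on easy tasks and
    # fails hard ones; the cloud agent is decent everywhere but costs more.
    if arm == "local":
        return 0.0 if is_hard else 1.0
    return 0.6


router = LinUCBRouter(["local", "cloud"], dim=2)
for step in range(100):
    is_hard = step % 2 == 1
    features = [1.0, 1.0 if is_hard else 0.0]  # bias term + difficulty flag
    arm = router.choose(features)
    router.update(arm, features, toy_reward(arm, is_hard))
```

After the loop, `router.choose([1.0, 0.0])` and `router.choose([1.0, 1.0])` show which agent the learned policy prefers for easy and hard tasks under these made-up rewards. A real router would presumably use richer task features and a reward that blends task success with cost.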


ARTICLE · ↑ trending · Hacker News (AI) · 11d ago

Flathub disallows AI-assisted code and documentation

Flathub has introduced a policy that prohibits AI-assisted code and documentation in its contributions. This measure aims to maintain human authorship and quality in software development.

Open Source documentation AI policy code generation

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/7/2026

META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet?

Meta Superintelligence Lab introduces ProgramBench, which tests whether advanced AI models can recreate executable programs such as ffmpeg, SQLite, and ripgrep from scratch without internet access. The research evaluates how autonomously and completely models can carry out complex software synthesis, probing the limits of AI code generation.

program synthesis code generation Benchmarks AI programming

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/28/2026

Local model on coding has reached a certain threshold to be feasible for real work

Open-weight 27B–32B code models, such as Qwen 3.6-27B, achieved a 38.2% success rate on Terminal-Bench 2.0 for coding tasks under standard constraints. The post focuses on the feasibility of local models for real work and on the significant inference speedups offered by MoE architectures.

AI models open-source AI Benchmarking code generation


RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 5/1/2026

Qwen 3.6 27B vs Gemma 4 31B - making Packman game!

A local LLM gamedev contest compared Qwen 3.6 27B and Gemma 4 31B in creating a Pac-Man game. Gemma 4 31B was the clear winner, producing stronger game logic and higher quality in much less time, despite Qwen generating more tokens.

code generation model comparison benchmark LLM


ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 26d ago

I Let a Small Model Train on Its Own Mistakes. It Reached 80% on HumanEval and Beat GPT-3.5 on Math

An experiment showed that a small AI model can train itself to code by inventing problems, solving them, and fine-tuning on its own corrections. The model achieved 80% on HumanEval and outperformed GPT-3.5 on math, using only a Python interpreter as the judge.

self-correction AI training Benchmarking code generation
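The "Python interpreter as the judge" idea can be illustrated with a short sketch. This is not the author's pipeline: the function names, the timeout, and the way correction pairs are assembled are assumptions made for illustration, and a real setup would need proper sandboxing before running model-written code.

```python
import os
import subprocess
import sys
import tempfile


def passes(candidate_src, test_src, timeout=5.0):
    """Run candidate code plus its tests in a fresh Python interpreter.
    The only verdict is the exit code: 0 means every assertion held."""
    program = candidate_src + "\n\n" + test_src + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attempt.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(program)
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return False  # treat hangs and infinite loops as failures
    return result.returncode == 0


def correction_pairs(attempts, test_src):
    """Given attempts in the order they were generated, pair each failed
    attempt with the first attempt that passes (hypothetical data format
    for fine-tuning on corrections). Returns [] if nothing passes."""
    failed = []
    for src in attempts:
        if passes(src, test_src):
            return [(wrong, src) for wrong in failed]
        failed.append(src)
    return []


# Toy problem: the first attempt has a bug, the second one fixes it.
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\n"
pairs = correction_pairs([buggy, fixed], tests)
```

Here `pairs` holds the (failed, corrected) examples that a self-training loop of this kind could fine-tune on. No human or stronger model is involved in deciding which attempt counts as correct.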


CASE · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane

A user reports highly effective results with the PI Coding Agent running a local Qwen3.6 35b model on production projects. They attribute the success to a custom "plan-first skill file" that enforces a structured planning workflow: the plan must be approved before any coding starts, and execution then proceeds step by step.

LLMs prompt engineering workflow automation code generation

ARTICLE · ↑ trending · Hacker News (AI) · 11d ago

When AI starts writing systems code

This article explores the emerging landscape where artificial intelligence begins to develop systems code. It discusses the implications and future of programming as AI tools become more proficient.

Software Development code generation AI Programming

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar?

A user attempting real coding tasks with Qwen3.6-35B on a 32GB M2 MacBook Pro runs into memory exhaustion and context window management issues. The model identifies the essence of a bug but struggles to implement the fix, because critical information is lost during context compaction.

LLMs open-source AI local inference code generation

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/24/2026

DeepSeek-v4 has a comical 384K max output capability

A user expresses shock at DeepSeek-v4's 384K max output capability after the model generated a comprehensive web OS as a single 100KB HTML file. The result showcases the model's potential for generating extensive and complex content in one pass.

DeepSeek AI models code generation large language models


ARTICLE · DEV.to AI · 4/22/2026

Cursor Rules for Vue.js: Composition API Patterns That Scale

This article discusses how AI assistants like Cursor or Claude often generate suboptimal Vue.js code because outdated training data leads them to mix old and new API patterns. Rather than relying solely on prompts, it proposes enforcing specific, modern Vue 3 Composition API patterns through rules checked into the repository, such as `.cursorrules`, to ensure code quality and scalability.

Vue.js code generation best practices AI development

ARTICLE · DEV.to AI · 3d ago

Yapay Zeka ile Kod Yazmanın En İyi Araçları (The Best Tools for Writing Code with AI)

The article introduces top AI-powered tools like GitHub Copilot, Tabnine, and OpenAI Codex that assist in writing code. These tools accelerate software development by providing code suggestions and converting natural language into programming code.

Software Development AI coding code generation AI tools

RESEARCH · arXiv CS.AI · 5d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

StepPRM-RTL is a novel framework that enhances LLM-based RTL code generation by combining stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT). It uses dense feedback from a PRM to guide reinforcement-style updates and Monte Carlo Tree Search (MCTS) to enrich the training dataset.

LLMs reinforcement learning code generation RTL Synthesis

RESEARCH · DEV.to AI · 4/21/2026

We Ran 52 AI Coding Benchmarks. Here's Every Uncomfortable Thing We Found.

This study ran 52 AI coding benchmarks, finding that the biggest variable in AI-assisted development is the initial brief, not the model or tool. A structured brief (CONTRACT.md) reduces costs by 54% and boosts quality from 5/10 to 9/10, while agent teams and retry loops proved costly or detrimental.

prompt engineering Benchmarking code generation developer tools

ARTICLE · DeepLearning.AI (YouTube) · 19d ago

AI Dev 26 x SF | Tom Howlett: Can LLMs Generate Enterprise Quality Code?

Tom Howlett examines whether large language models (LLMs) can produce code of the quality required in enterprise environments, covering both the challenges and the capabilities of these technologies in enterprise-grade software development.

LLMs Software Development code generation AI development


ARTICLE · DEV.to AI · 4/22/2026

My Junior Can Explain It. My Senior Can Defend It. The AI Just... Did It.

A developer recounts using GitHub Copilot for a small code change, which resulted in 12 test failures without any explanation. The anecdote, from over a year ago, highlights the limitations of AI code generation at the time in terms of reliability and traceability.

Software Development Testing Reliability code generation

ARTICLE · DEV.to AI · 4/23/2026

Stop getting generic output from Copilot. Teach it your patterns.

The article addresses the issue of AI copilots generating generic code, leading to inconsistent codebases over time. It introduces 'Agent Skills' as Markdown files designed to provide persistent context about team conventions, helping Copilot generate more specific and aligned output.

Copilot code generation Customization AI