reinforcement learning

154 items

RESEARCH·Together AI Blog·3/31/2026

Aurora

Aurora is an open-source RL framework designed to self-improve speculative decoding, learning from every served request. It achieves a 1.25x performance increase over well-trained static speculators.

open-source AI Framework reinforcement learning Performance Improvement

DOC·Hugging Face (YouTube)·4/22/2026

RL for Agents Workshop - Deep Dive on Training Agents with RL and Open Source

This workshop offers a deep dive into training AI agents using Reinforcement Learning (RL) principles. It specifically focuses on leveraging open-source tools and techniques for practical agent development.

open-source reinforcement learning learning Training

RESEARCH·Qwen Blog·3/5/2025

QwQ-32B: Embracing the Power of Reinforcement Learning

The post discusses the potential of reinforcement learning (RL) at scale to improve the performance and reasoning capabilities of AI models beyond what conventional methods achieve. The research specifically explores the impact of RL on the intelligence of large language models (LLMs), citing examples such as DeepSeek R1.

model performance deep learning reinforcement learning large language models

RESEARCH·Qwen Blog·7/27/2025

GSPO: Towards Scalable Reinforcement Learning for Language Models

Reinforcement learning is crucial for scaling language models, but existing algorithms suffer from instability and model collapse. To address this and enable successful scaling, the Group Sequence Policy Optimization (GSPO) algorithm is proposed.

scalability Policy optimization language models reinforcement learning

ARTICLE·Hugging Face Blog·3/10/2026

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

This content explores valuable lessons derived from an analysis of 16 open-source Reinforcement Learning (RL) libraries. It aims to provide insights for practitioners and developers working with RL frameworks.

open-source AI Libraries reinforcement learning Machine Learning

DOC·StatQuest (YouTube)·5/5/2025

Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!

This content clearly explains Reinforcement Learning with Human Feedback (RLHF), a crucial technique used to align large language models with human preferences. It details how human input helps fine-tune AI models for better performance and safety.

reinforcement learning learning RLHF AI Explanation

DOC·StatQuest (YouTube)·4/7/2025

Reinforcement Learning with Neural Networks: Essential Concepts

This content covers the essential concepts of Reinforcement Learning, focusing on its integration with Neural Networks. It serves as a foundational guide for understanding this area of artificial intelligence.

neural networks reinforcement learning learning

RESEARCH·StatQuest (YouTube)·4/14/2025

Reinforcement Learning with Neural Networks: Mathematical Details

This content delves into the mathematical details of reinforcement learning when combined with neural networks. It explores the theoretical foundations and algorithms involved in this area of artificial intelligence.

neural networks reinforcement learning Machine Learning mathematics

DOC·StatQuest (YouTube)·3/31/2025

Reinforcement Learning: Essential Concepts

This content covers the essential concepts of Reinforcement Learning, a fundamental area of artificial intelligence. It serves as a guide to understanding the basic principles.

reinforcement learning learning Machine Learning AI

RESEARCH·arXiv CS.AI·4/6/2026

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

This content describes GrandCode, an artificial intelligence project that aims to reach grandmaster level in competitive programming. To do so, the system uses an agentic reinforcement learning approach.

reinforcement learning Grandmaster AI competitive programming Agentic AI

ARTICLE·DEV.to AI·15d ago

Understanding Reinforcement Learning with Human Feedback Part 6: How the Reward Model Trains the Original Model

This article, part of a series on Reinforcement Learning with Human Feedback (RLHF), details how a trained reward model is used to train the original AI model. New prompts are fed to the original model, which generates responses; the reward model scores those responses, and the resulting feedback signal lets the original model learn to produce more helpful, human-aligned outputs.

reinforcement learning learning AI training Machine Learning

NEWS·DEV.to AI·4/14/2026

AI Contract Closing, RL Hunting & Fractal Analytics

Sovereign Node Omega v10087.0 unifies WiGLE RF telemetry, Copernicus CDSE fractal analysis, and RL bug bounty hunting into a single edge-quantized Termux node. Led by Samuel James Hiotis, this project aims to integrate advanced AI and data analysis in a unified edge environment.

reinforcement learning Telemetry Fractal Analysis AI

NEWS·Qwen Blog·7/24/2025

Qwen-MT: Where Speed Meets Smart Translation

Qwen-MT introduces the qwen-mt-turbo update, which significantly improves the model's translation and multilingual comprehension capabilities. Built on Qwen3 and using reinforcement learning, it supports 92 languages with greater accuracy and fluency.

Qwen-MT AI translation reinforcement learning language model

NEWS·Qwen Blog·3/23/2025

Qwen2.5-VL-32B: Smarter and Lighter

The post announces Qwen2.5-VL-32B-Instruct, a new model in the Qwen2.5-VL series optimized with reinforcement learning and released as open source under the Apache 2.0 license. The model stands out for its scale of 32 billion parameters.

open-source 32B Parameters reinforcement learning Machine Learning