RLHF and PPO - Search Videos

RLHF Explained & Coded (feat. PPO)

230 views · 6 months ago

YouTube · AIArchives

RLHF, PPO and DPO for Large language models

3.6K views · Feb 18, 2024

YouTube · Arvind N

Visualizing PPO Behind RLHF

3.9K views · Jan 31, 2025

YouTube · AGI Lambda

LLMs from Scratch – Practical Engineering from Base Model to PPO RLHF

140.4K views · 4 months ago

YouTube · freeCodeCamp.org

DPO Meets PPO: Reinforced Token Optimization for RLHF

171 views · Apr 30, 2024

YouTube · Arxiv Papers

Proximal Policy Optimization (PPO) - How to train Large Language Models

77.9K views · Jan 24, 2024

YouTube · Serrano.Academy

RLHF from scratch, step-by-step, in code

129 views · 7 months ago

YouTube · Ashwani Kumar

Reinforcement Learning with Human Feedback (RLHF) - How to train an…

32.4K views · Feb 12, 2024

YouTube · Serrano.Academy

[RLHF] From PPO RLHF to DPO: Formula Derivation and Principle Analysis

22K views · Jun 23, 2024

bilibili · 五道口纳什

[QA] DPO Meets PPO: Reinforced Token Optimization for RLHF

95 views · Apr 30, 2024

YouTube · Arxiv Papers

Reinforcement Learning, RLHF, & DPO Explained

15.7K views · Jun 12, 2024

YouTube · Mark Hennings

Reinforcement Learning with Human Feedback (RLHF)

2.5K views · Jan 31, 2024

YouTube · AI Makerspace

Reinforcement Learning: ChatGPT and RLHF

23.7K views · Aug 14, 2023

YouTube · Graphics in 5 Minutes

Reinforcement Learning from Human Feedback (RLHF) Explained

76.7K views · Aug 7, 2024

YouTube · IBM Technology

Unlock the Power of Generative AI with RLHF Powered by Appen - Yo…

16.9K views · Mar 31, 2023

4 Ways to Align LLMs: RLHF, DPO, KTO, and ORPO

3.7K views · Jul 10, 2024

YouTube · Snorkel AI

RLAIF Reinforcement Learning with AI Feedback or Aligning Large La…

1.3K views · Sep 6, 2023

YouTube · AI WITH Rithesh

Direct Preference Optimization: Forget RLHF (PPO)

16.1K views · Jun 6, 2023

YouTube · Discover AI

ChatGPT explained: A Guide to Conversational AI w/ InstructGPT, …

8.1K views · Dec 12, 2022

YouTube · Discover AI

Reinforcement Learning through Human Feedback - EXPLAINED! | …

28.8K views · Dec 11, 2023

YouTube · CodeEmporium

Policy Optimization & TRPO & PPO | RL Principles Explained Series #3

8.5K views · Dec 20, 2023

Reinforcement Learning with Human Feedback

276 views · Nov 14, 2024

YouTube · Open Data Science

How to Code RLHF on LLama2 w/ LoRA, 4-bit, TRL, DPO

16.9K views · Aug 31, 2023

YouTube · Discover AI

What’s the Difference Between an HMO, PPO and POS?

Fudan NLP Team Open-Sources MOSS-RLHF, Implementing ChatGPT's PPO Algorithm

321 views · Sep 1, 2023

bilibili · 二范数智能

OpenRLHF - Simplest and Fastest RLHF Training

823 views · May 21, 2024

YouTube · Fahd Mirza

Interpreting the Analects with the RLHF Approach

3.8K views · Oct 5, 2023

bilibili · jurejoy

L4 TRPO and PPO (Foundations of Deep RL Series)

45.9K views · Aug 25, 2021

YouTube · Pieter Abbeel

Reproducing RLHF Training from Scratch: Hands-On Code, Large Language Model Training

21K views · May 8, 2024

bilibili · 蓝斯诺特

HDHP vs. PPO: Choose the health plan that's right for you
