1:18:00
RLHF Explained & Coded (feat. PPO)
230 views
6 months ago
YouTube
AIArchives
1:27:21
RLHF, PPO and DPO for Large language models
3.6K views
Feb 18, 2024
YouTube
Arvind N
7:37
Visualizing PPO Behind RLHF
3.9K views
Jan 31, 2025
YouTube
AGI Lambda
6:06:21
LLMs from Scratch – Practical Engineering from Base Model to P…
140.4K views
4 months ago
YouTube
freeCodeCamp.org
24:31
DPO Meets PPO: Reinforced Token Optimization for RLHF
171 views
Apr 30, 2024
YouTube
Arxiv Papers
38:24
Proximal Policy Optimization (PPO) - How to train Large Language Mod…
77.9K views
Jan 24, 2024
YouTube
Serrano.Academy
3:14:37
RLHF from scratch, step-by-step, in code
129 views
7 months ago
YouTube
Ashwani Kumar
15:31
Reinforcement Learning with Human Feedback (RLHF) - How to train an…
32.4K views
Feb 12, 2024
YouTube
Serrano.Academy
16:01
[RLHF] From PPO RLHF to DPO: Formula Derivation and Principle Analysis
22K views
Jun 23, 2024
bilibili
五道口纳什
9:36
[QA] DPO Meets PPO: Reinforced Token Optimization for RLHF
95 views
Apr 30, 2024
YouTube
Arxiv Papers
19:39
Reinforcement Learning, RLHF, & DPO Explained
15.7K views
Jun 12, 2024
YouTube
Mark Hennings
59:15
Reinforcement Learning with Human Feedback (RLHF)
2.5K views
Jan 31, 2024
YouTube
AI Makerspace
6:31
Reinforcement Learning: ChatGPT and RLHF
23.7K views
Aug 14, 2023
YouTube
Graphics in 5 Minutes
11:29
Reinforcement Learning from Human Feedback (RLHF) Explained
76.7K views
Aug 7, 2024
YouTube
IBM Technology
1:47
Unlock the Power of Generative AI with RLHF Powered by Appen - Yo…
16.9K views
Mar 31, 2023
YouTube
Appen
6:18
4 Ways to Align LLMs: RLHF, DPO, KTO, and ORPO
3.7K views
Jul 10, 2024
YouTube
Snorkel AI
9:44
RLAIF Reinforcement Learning with AI Feedback or Aligning Large La…
1.3K views
Sep 6, 2023
YouTube
AI WITH Rithesh
9:10
Direct Preference Optimization: Forget RLHF (PPO)
16.1K views
Jun 6, 2023
YouTube
Discover AI
18:37
ChatGPT explained: A Guide to Conversational AI w/ InstructGPT,…
8.1K views
Dec 12, 2022
YouTube
Discover AI
10:17
Reinforcement Learning through Human Feedback - EXPLAINED! |…
28.8K views
Dec 11, 2023
YouTube
CodeEmporium
15:55
Policy Optimization & TRPO & PPO | RL Principles Explained Series #3
8.5K views
Dec 20, 2023
bilibili
Up-Fei
28:51
Reinforcement Learning with Human Feedback
276 views
Nov 14, 2024
YouTube
Open Data Science
36:14
How to Code RLHF on LLama2 w/ LoRA, 4-bit, TRL, DPO
16.9K views
Aug 31, 2023
YouTube
Discover AI
What’s the Difference Between an HMO, PPO and POS?
Nov 6, 2018
trinet.com
0:40
Fudan NLP Team Open-Sources MOSS-RLHF, Implementing ChatGPT's PPO Algorithm
321 views
Sep 1, 2023
bilibili
二范数智能
5:58
OpenRLHF - Simplest and Fastest RLHF Training
823 views
May 21, 2024
YouTube
Fahd Mirza
30:12
Interpreting the Analects with the RLHF Approach
3.8K views
Oct 5, 2023
bilibili
jurejoy
25:21
L4 TRPO and PPO (Foundations of Deep RL Series)
45.9K views
Aug 25, 2021
YouTube
Pieter Abbeel
1:53
Reproducing RLHF Training from Scratch: Hands-On Code, Large Language Model Training
21K views
May 8, 2024
bilibili
蓝斯诺特
HDHP vs. PPO: Choose the health plan that's right for you
Nov 27, 2024
wexinc.com