Machine Learning - Theory | Research
manitcor
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
https://arxiv.org/pdf/2305.18290.pdf