@tnq177
tnq177 / RLHF.md
Created May 20, 2023 16:29 — forked from JoaoLages/RLHF.md
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
tnq177 / folder_to_apkg.py
Created December 11, 2022 07:33
properly zip a folder to anki apkg
"""
properly zip folder to anki apkg
learned from https://github.com/patarapolw/ankisync2
"""
import sys
from zipfile import ZipFile
from pathlib import Path
if __name__ == "__main__":
    indir = sys.argv[1]  # path to the input dir containing {1...n} files + media + collection.anki2
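
The preview above stops before the zipping step. As a rough illustration only (not the gist's actual code), the sketch below assumes the package wants every file at the archive root, so each file is stored under its bare name rather than under the folder name. The function name `zip_folder_flat` and its arguments are hypothetical.

```python
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile


def zip_folder_flat(indir, outpath):
    """Zip every regular file directly inside `indir` into `outpath`.

    Assumption (not confirmed by the truncated gist): members must sit at
    the archive root, so each file is written under its bare file name.
    Returns the list of member names in the order they were written.
    """
    indir = Path(indir)
    outpath = Path(outpath)
    names = []
    with ZipFile(outpath, "w", ZIP_DEFLATED) as zf:
        for path in sorted(indir.iterdir()):
            # Skip subdirectories, and skip the output file itself in case
            # it was created inside the input directory.
            if not path.is_file() or path.resolve() == outpath.resolve():
                continue
            zf.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

Called as `zip_folder_flat(indir, "deck.apkg")`, this produces an archive whose entries are `collection.anki2`, `media`, and the numbered files with no parent directory prefix, which is the usual pitfall when a folder is zipped from its parent.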
tnq177 / multiple_ssh_setting.md
Created January 16, 2021 17:06 — forked from jexchan/multiple_ssh_setting.md
Multiple SSH keys for different github accounts

Multiple SSH key settings for different GitHub accounts

Create a different public key

Create a different SSH key by following the article Mac Set-Up Git

$ ssh-keygen -t rsa -C "your_email@youremail.com"
tnq177 / plot_attention_weights.py
Created January 7, 2021 20:43
plotting attention weights with bokeh #attention #bokeh
from bokeh.plotting import figure, output_file, save
from bokeh.palettes import Blues256
from bokeh.io import export_png
def plot_att(src, tgt, weights, out_filepath):
    """
    Plot attention using Bokeh.
    Output is a 2D matrix with x-axis=src, y-axis=tgt.
    Each cell = attention weight between corresponding src
tnq177 / download_opensub.py
Created July 23, 2019 17:48
download opensubtitles data
import os
import sys
import requests
from homura import download
from multiprocessing import Pool
LANGUAGES = ['af', 'ar', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'da', 'de', 'el', 'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fr', 'gl', 'he', 'hi', 'hr', 'hu', 'hy', 'id', 'is', 'it', 'ja', 'ka', 'kk', 'ko', 'lt', 'lv', 'mk', 'ml', 'ms', 'nl', 'no', 'pl', 'pt', 'pt_br', 'ro', 'ru', 'si', 'sk', 'sl', 'sq', 'sr', 'sv', 'ta', 'te', 'th', 'tl', 'tr', 'uk', 'ur', 'vi', 'ze_en', 'ze_zh', 'zh_cn', 'zh_tw']
DELIMITER = '<bazingaaaaa>'
tnq177 / tensorflow_random_seed.md
Last active May 11, 2020 09:21
Tensorflow global random seed
tf.reset_default_graph()
with tf.Graph().as_default():
    tf.set_random_seed(42)
    
    with tf.Session() as sess:
        ...define graph here...
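  • Reset the default graph before defining the graph, and set the random seed before creating the session. Note that `tf.reset_default_graph`, `tf.set_random_seed`, and `tf.Session` are TensorFlow 1.x APIs.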