How to use adam8bit

Setup

To use adam8bit on the FAIR cluster gshard branch, you need the following dependencies (run from inside your fairseq env, assuming CUDA 11.0):

pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
pip install -U fairscale
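
As a quick, optional sanity check (not part of the original instructions), you can confirm both packages import cleanly from the same fairseq env you will train in:

python -c "import bitsandbytes, fairscale; print('bitsandbytes and fairscale imported OK')"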

WARNING: if you don't do this step, your checkpoints will not be usable!

Change Sweep Script

Remove your old --optimizer and add the following:

grid.extend([
        hyperparam('--optimizer', 'adam8bit', save_dir_key=lambda val: original_opt),
        hyperparam('--no-scale-embedding'),
        hyperparam('--use-stable-embedding', save_dir_key=lambda x: 'stable' if x else ''),
        hyperparam('--block-wise', save_dir_key=lambda x: 'blockwise' if x else ''),
])
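
Note that original_opt is not defined in the snippet above; it is assumed to be a string you set earlier in your sweep script, typically the name of the optimizer you removed, so the save-dir key stays informative. A minimal, hypothetical example:

original_opt = 'adam'  # hypothetical: whatever --optimizer value you removed above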

If you are using FSDP, you also need to add:

grid.append(hyperparam('--use-sharded-state'))

This will make your checkpoint files look different, like this:

checkpoint_last-rank-0-shard0.pt 
checkpoint_last-rank-1-shard1.pt
checkpoint_last-shared-shard0.pt
checkpoint_last-shared-shard1.pt

which eval_lm and gpt3_eval cannot consume directly.

Consolidating Sharded Checkpoints

After training with sharded state, you can run, for example:

python scripts/consolidate_fsdp_shards.py SAVE_DIR/checkpoint_1_1000.pt

which will save files like

SAVE_DIR/checkpoint_1_1000_consolidated-shared.pt
SAVE_DIR/checkpoint_1_1000_consolidated-rank-0.pt

so that

fairseq_cli/eval_lm.py SAVE_DIR/checkpoint_1_1000.pt ...

works.
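
If you have many sharded checkpoints to consolidate, a small loop like the following can help. This is a sketch, not part of the original gist: it assumes the shard files follow the naming shown above and that consolidate_fsdp_shards.py takes the logical checkpoint path (without the rank/shard suffix) as its only argument, as in the example command.

import subprocess
from pathlib import Path

SAVE_DIR = Path('/path/to/SAVE_DIR')  # hypothetical: your actual save dir

# derive each logical checkpoint name from its rank-0 shard, then consolidate it
for shard in sorted(SAVE_DIR.glob('checkpoint_*-rank-0-shard0.pt')):
    logical_name = shard.name.replace('-rank-0-shard0.pt', '.pt')
    subprocess.run(
        ['python', 'scripts/consolidate_fsdp_shards.py', str(SAVE_DIR / logical_name)],
        check=True,
    )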
