@mapmeld
Last active December 14, 2021 03:11
Patching Models BigSci Proposal

Patching Models with New Words, People, and Events

May 6 - June 15, 2021

Scope

Once a large pre-trained language model is published, it is a snapshot of language as it stood when its corpus was collected. How can we update such a model to support new or newly frequent terms (BIPOC), phrasing (social distancing), or people and events (Fyre Festival)? What are reliable, low-cost ways to test and benchmark these update methods?

Current status

This effort is moving into the Modeling / Retrieval Working Group. If you have resources on making models updatable, feel free to join that group, contact Nick Doiron on Slack, and/or paste links to papers below.

Resources

My goal would be a benchmark comparing two kinds of approaches: moving or inserting embeddings (CPU only) and running a short burst of additional training (GPU). Test terms would come from news articles, Reddit comments, and/or fictional events, where we can show that the models have no prior knowledge.
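To make the CPU-only "insert embeddings" idea concrete, here is a minimal sketch using a toy embedding matrix and a plain vocabulary dict rather than any particular model library. It initializes a new term's vector as the mean of the embeddings of the subword pieces the old tokenizer would have split it into. This is one common heuristic, not necessarily the method this proposal would benchmark. `add_token_embedding`, the toy vocabulary, and the numbers are all hypothetical.

```python
import numpy as np


def add_token_embedding(embeddings, vocab, new_word, subword_pieces):
    """Append a row for `new_word`, initialized as the mean of its subword pieces.

    Hypothetical helper for illustration only.
    embeddings: (V, d) array; vocab: dict mapping token -> row index.
    Returns an enlarged copy of the matrix and an updated copy of the vocab;
    the inputs are not modified.
    """
    if new_word in vocab:
        raise ValueError(f"{new_word!r} is already in the vocabulary")
    rows = [vocab[piece] for piece in subword_pieces]
    # Mean of the pieces' vectors, kept 2-D so it can be stacked as a new row.
    new_row = embeddings[rows].mean(axis=0, keepdims=True)
    new_embeddings = np.vstack([embeddings, new_row])
    new_vocab = dict(vocab)
    new_vocab[new_word] = embeddings.shape[0]  # index of the appended row
    return new_embeddings, new_vocab


# Toy usage: "distancing" previously split into "distanc" + "##ing".
E = np.array([[1.0, 0.0], [0.0, 2.0], [2.0, 4.0]])
vocab = {"social": 0, "distanc": 1, "##ing": 2}
E2, vocab2 = add_token_embedding(E, vocab, "distancing", ["distanc", "##ing"])
print(E2[vocab2["distancing"]])  # [1. 3.]
```

In a real model, the output (softmax) layer would also need a matching row if its weights are not tied to the input embeddings. A benchmark could then compare this zero-training initialization against the same model after a short burst of GPU training on text containing the new term.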

KD: I feel this is a very important task, and one that most language models struggle with. There has been some interesting work on dynamic evaluation, which attempts to fit models to recent history: https://arxiv.org/pdf/1904.08378.pdf https://www.aclweb.org/anthology/2021.eacl-main.6.pdf
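The core loop of dynamic evaluation can be sketched briefly. Each segment of the evaluation stream is scored with the current parameters, and only afterwards does the model take a gradient step on that segment, so the model adapts to recent history without ever seeing a segment before it is scored. The sketch assumes a toy unigram softmax model in place of a neural language model and uses plain SGD. The linked papers may use different models and update rules. `dynamic_eval` and its parameters are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def dynamic_eval(logits, segments, lr=1.0):
    """Toy dynamic evaluation of a unigram softmax model (hypothetical helper).

    logits: initial (V,) parameters; segments: list of token-id sequences.
    Returns per-segment losses (each computed BEFORE adapting on that
    segment) and the adapted parameters.
    """
    theta = logits.astype(float).copy()
    losses = []
    for seg in segments:
        seg = np.asarray(seg)
        # 1) Score the segment with the current parameters; this is the reported loss.
        lp = log_softmax(theta)
        losses.append(float(-lp[seg].mean()))
        # 2) Adapt: one SGD step on the mean cross-entropy of the segment just seen.
        #    For a softmax over logits, the gradient is softmax(theta) - empirical frequencies.
        freqs = np.bincount(seg, minlength=theta.size) / seg.size
        theta -= lr * (np.exp(lp) - freqs)
    return losses, theta


# Toy usage: a 3-token vocabulary where recent text keeps repeating token 2
# (think of a newly frequent term); the per-segment loss falls as the model adapts.
losses, theta = dynamic_eval(np.zeros(3), [[2, 2, 2, 2]] * 3, lr=1.0)
print([round(x, 3) for x in losses])
```

A static evaluation of the same stream would report log(3) for every segment, so the gap between static and dynamic losses is one cheap signal of how much a model benefits from adapting to recent text.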

ND: ^^ thanks, these are good resources and also remind me to include FB's dynabench.org in our brainstorming
