@takluyver
Created September 6, 2014 21:44
Flatten notebooks for git diff

Copy nbflatten.py to somewhere on $PATH. Then, in the root of a git repository, run these commands:

echo "*.ipynb diff=ipynb" >> .gitattributes 
git config diff.ipynb.textconv nbflatten.py

When you change a notebook and run git diff, you'll see the diff of flattened, simplified notebooks, rather than the full JSON. This does lose some information (metadata, non-text output), but it makes it easier to see simple changes in the notebook.

This doesn't help with merging conflicting changes in notebooks. For that, see nbdiff.org.

#!/usr/bin/python3
import sys
from IPython.nbformat.current import read
from IPython.utils.text import strip_ansi

fname = sys.argv[1]
with open(fname, encoding='utf-8') as f:
    nb = read(f, 'ipynb')

banners = {
    'heading': 'Heading %d ------------------',
    'markdown': 'Markdown cell ---------------',
    'code': 'Code cell -------------------',
    'raw': 'Raw cell --------------------',
    'output': 'Output ----------------------',
}

for cell in nb.worksheets[0].cells:
    if cell.cell_type == 'heading':
        print(banners['heading'] % cell.level)
    else:
        print(banners[cell.cell_type])

    if cell.cell_type == 'code':
        source = cell.input
    else:
        source = cell.source

    print(source)
    if not source.endswith('\n'):
        print()

    if cell.cell_type == 'code':
        if cell.outputs:
            print(banners['output'])
            for output in cell.outputs:
                if 'text' in output:
                    print(strip_ansi(output.text))
                elif 'traceback' in output:
                    print(strip_ansi('\n'.join(output.traceback)))
                else:
                    print("(Non-plaintext output)")
            print()

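The script above reads notebooks through IPython's old `nbformat.current` API and the v3 layout (`worksheets`, `cell.input`). As a rough sketch only, assuming a v4 notebook where cells live under a top-level `cells` key (the layout the jq filters later in this thread rely on), the same flattening can be done with the standard library alone. The names `flatten_notebook` and `as_text` and the ANSI regex are illustrative assumptions, not part of the original gist.

```python
import json
import re

# Assumption: a simple pattern for ANSI colour sequences, standing in for
# IPython's strip_ansi helper.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

BANNERS = {
    "markdown": "Markdown cell ---------------",
    "code": "Code cell -------------------",
    "raw": "Raw cell --------------------",
    "output": "Output ----------------------",
}


def as_text(value):
    """v4 stores multi-line strings either as a str or as a list of str."""
    return value if isinstance(value, str) else "".join(value)


def flatten_notebook(nb):
    """Return the flattened text of a v4 notebook given as a dict."""
    lines = []
    for cell in nb.get("cells", []):
        cell_type = cell["cell_type"]
        lines.append(BANNERS.get(cell_type, cell_type))
        source = as_text(cell.get("source", ""))
        lines.append(source)
        if not source.endswith("\n"):
            lines.append("")
        outputs = cell.get("outputs") or []
        if cell_type == "code" and outputs:
            lines.append(BANNERS["output"])
            for output in outputs:
                if "text" in output:  # stream output
                    lines.append(ANSI_RE.sub("", as_text(output["text"])))
                elif "text/plain" in output.get("data", {}):  # results
                    lines.append(as_text(output["data"]["text/plain"]))
                elif "traceback" in output:
                    lines.append(ANSI_RE.sub("", "\n".join(output["traceback"])))
                else:
                    lines.append("(Non-plaintext output)")
            lines.append("")
    return "\n".join(lines)


def flatten_file(path):
    """Load a notebook file and return its flattened text."""
    with open(path, encoding="utf-8") as f:
        return flatten_notebook(json.load(f))
```

To use this as a textconv command, a script would print `flatten_file(sys.argv[1])`, mirroring how the original reads its file name from the command line.
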
@michaelaye

I get an error.
Here's the end of git config -l to show I have the right line in there:

branch.new_offsets.merge=refs/heads/new_offsets
branch.feature/output_formatter.remote=origin
branch.feature/output_formatter.merge=refs/heads/feature/output_formatter
diff.ipynb.textconv=nbflatten.py

Here's the content of .gitattributes:

maye@lunatic|~/Dropbox/src/diviner on develop!
± cat .gitattributes
*.ipynb diff=ipynb
maye@lunatic|~/Dropbox/src/diviner on develop!

Here's my try to use it:

± git diff notebooks/analyses/Physics.ipynb
error: cannot run nbflatten.py: No such file or directory
fatal: unable to read files to diff
-> [128]
maye@lunatic|~/Dropbox/src/diviner on develop!

Here's the content of my $HOME/bin which is on the PATH:

± ll ~/bin
total 40
-rwxr--r--  1 maye  staff   1.2K Sep  8 15:24 nbflatten.py
-rwx------  1 maye  staff   116B Oct  5  2012 printpath
-rwxr-xr-x  1 maye  staff   1.2K Sep 28  2012 ssh-copy-id
lrwxr-xr-x  1 maye  staff    62B Apr 21 18:19 subl -> /Applications/Sublime Text.app/Contents/SharedSupport/bin/subl
lrwxr-xr-x  1 maye  staff    37B Nov 20  2012 vcprompt -> /Users/maye/src/vcprompt/bin/vcprompt

@michaelaye

Very mysterious: Adding the full path results in git lying to me:

diff.ipynb.textconv=/Users/maye/bin/nbflatten.py
maye@lunatic|~/Dropbox/src/diviner on develop!
± git diff notebooks/analyses/Physics.ipynb
error: cannot run /Users/maye/bin/nbflatten.py: No such file or directory
fatal: unable to read files to diff
-> [128]
maye@lunatic|~/Dropbox/src/diviner on develop!
± ll /Users/maye/bin/nbflatten.py
-rwxr--r--  1 maye  staff   1.2K Sep  8 15:24 /Users/maye/bin/nbflatten.py
maye@lunatic|~/Dropbox/src/diviner on develop!

@holdenweb

Make it executable? Oh, sorry, it is. Do you have a /usr/bin/python3? I believe you'll see "No such file or directory" if the kernel can't find the executable named in the shebang line. Sometimes this happens with DOS-style files, when the carriage return is taken as part of the filename.
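A quick way to test both guesses is to inspect the first line of the script directly. This is only a sketch: `check_shebang` is a hypothetical helper, not something git or IPython provides, and it reports a missing interpreter and a DOS line ending separately.

```python
import os


def check_shebang(script_path):
    """Return a list of problems found in the script's shebang line."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return ["no shebang line"]

    problems = []
    if first_line.endswith(b"\r\n"):
        # With DOS line endings the carriage return is read as part of the
        # interpreter path, so the lookup fails.
        problems.append("DOS line ending after the interpreter path")

    parts = first_line[2:].strip().split()
    if not parts:
        problems.append("shebang names no interpreter")
        return problems

    interpreter = parts[0].decode(errors="replace")
    if not os.path.exists(interpreter):
        problems.append("interpreter not found: " + interpreter)
    return problems
```

For the case above, `check_shebang("/Users/maye/bin/nbflatten.py")` returning an empty list would rule out both explanations.
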

@gforsyth

@takluyver

This is awesome. I backported it for Python 2.7 (that sounds much grander than changing 2 lines) and it's already saving me a number of headaches.

@ethanwhite

Getting the following error using IPython 2.2.0 on Ubuntu 14.04:

ethan@oryx:~/ProgBio/repo (gh-pages *)$ git diff ipynbs/functions-writing.ipynb
Traceback (most recent call last):
  File "/usr/local/bin/nbflatten.py", line 4, in <module>
    from IPython.utils.text import strip_ansi
ImportError: cannot import name 'strip_ansi'
fatal: unable to read files to diff

@ethanwhite

Looks like in the current release (2.2.0) line 4 should still be from IPython.nbconvert.filters.ansi import strip_ansi. Change is in ipython/ipython@d2acc30
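One way to avoid editing the import by hand for each IPython version is to try both locations in turn. This is a sketch rather than a tested fix for every release: the two import paths are the ones mentioned in this gist and this comment, while the final regex fallback is an assumption added so the snippet still works when neither import is available.

```python
import re

try:
    # Location used by the script at the top of this gist.
    from IPython.utils.text import strip_ansi
except ImportError:
    try:
        # Location reported above to work on IPython 2.2.0.
        from IPython.nbconvert.filters.ansi import strip_ansi
    except ImportError:
        # Assumption: a plain regex fallback, not IPython's implementation.
        _ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

        def strip_ansi(text):
            """Remove ANSI colour escape sequences from text."""
            return _ANSI_RE.sub("", text)
```
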

@ethanwhite

Oh, and now that this works, it is awesome!

@takluyver
Author

Oh, I wasn't getting pinged by comments on here for some reason. I'm glad it's helping people - if anyone is still having trouble with it, let me know.

@jfeist

jfeist commented Mar 19, 2015

Just in case it might be useful to someone:

I have been a big fan of nbflatten.py since I discovered it, and have been using it extensively as a diff filter for git. However, I find it to be a bit slow, especially for repositories with many (large) notebooks. So I spent a bit of time writing a filter for jq which does the same thing, but is orders of magnitude faster.

The relevant section of my .gitconfig now looks like this:

[diff "ipynb"]
    textconv = "jq -r 'def banner: \"\\(.) \"+(28-(.|length))*\"-\"; (\"Non-cell info\"|banner),del(.cells),\"\", (.cells[] | (\"\\(.cell_type) cell\"|banner), \"\\(.source|add)\\n\")'"

I am typically not interested in the outputs when diffing notebooks, so the textconv filter here does not show them. However, I find it convenient to also see the notebook's metadata, so everything not in "cells" is shown first under the header "Non-cell info". To drop that section, remove the part (\"Non-cell info\"|banner),del(.cells),\"\", from the filter.
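Because the escaped one-liner is hard to read, here is a rough Python rendering of what the filter computes. It is an illustration only, assuming each cell's `source` is a list of strings (which is what jq's `add` concatenates); the function names and the `show_non_cell_info` flag are made up for this sketch, and the exact JSON formatting of the non-cell part may differ from jq's.

```python
import json


def banner(label):
    # Mirrors the jq `banner` helper: the label, a space, then
    # (28 - len(label)) dashes.
    return label + " " + "-" * (28 - len(label))


def flatten(nb, show_non_cell_info=True):
    """Return the text the textconv filter would produce for a notebook dict."""
    parts = []
    if show_non_cell_info:
        # Equivalent of del(.cells): everything except the cell list.
        rest = {key: value for key, value in nb.items() if key != "cells"}
        parts += [banner("Non-cell info"), json.dumps(rest, indent=2), ""]
    for cell in nb["cells"]:
        parts.append(banner(cell["cell_type"] + " cell"))
        # Join the source lines and add a trailing newline, as the filter does.
        parts.append("".join(cell["source"]) + "\n")
    return "\n".join(parts)
```

Passing `show_non_cell_info=False` corresponds to deleting the "Non-cell info" part of the jq expression.
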

I have also written a more "complete" script which shows the outputs in pretty much the same way as nbflatten.py. That can be found at https://gist.github.com/jfeist/cd00aa3b681092e1d5dc. If you download it and put it somewhere in your path, you can use textconv = nbflatten.jq instead.

@jfeist

jfeist commented Mar 19, 2015

PS: jq is also very useful (and fast!) for making a filter to remove the output of notebooks when adding them to git. The relevant part of .gitconfig is

[filter "clean_nb"]
    clean = "jq '(.cells[] | select(.outputs) ) |= [] | (.cells[] | select(.execution_count)) |= null'"

And in .gitattributes, you then need *.ipynb filter=clean_nb diff=ipynb.

@jakirkham

@jfeist, using jq is fantastic. This really saved me a lot of time!

@jakirkham

Modification to @jfeist's snippet for .gitconfig. More details here ( jqlang/jq#921 ). Also, double quotes must be escaped in single quotes with .gitconfig. ( http://stackoverflow.com/a/25535431 )

[filter "clean_nb"]
        clean = jq '(.cells[] | select(has(\"outputs\")) | .outputs) = [] | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null'
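For anyone who cannot install jq, the same cleaning step can be sketched in Python. This is an assumption-laden equivalent, not a drop-in replacement: the function names are hypothetical, it will be slower than jq, and the output indentation is a guess that may not match how the notebook was originally written.

```python
import json
import sys


def clean_notebook(nb):
    """Clear outputs and execution counts in place, like the jq filter."""
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
    return nb


def main():
    # A git clean filter reads the file on stdin and writes the cleaned
    # version to stdout.
    json.dump(clean_notebook(json.load(sys.stdin)), sys.stdout, indent=1)
    sys.stdout.write("\n")
```

Saved as an executable script that calls `main()` from an `if __name__ == "__main__":` guard, it could be named as the `clean` command in place of the jq expression.
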

@edgimar

edgimar commented Sep 21, 2015

In case anyone cares, newer versions of ipython have a "nbconvert" function built into them, so you can do something like ipython nbconvert myfile.ipynb --to markdown --stdout and get a similar effect to this script. Otherwise you will need to mess around with the nbflatten script in order to get it to work with recent versions of ipython.

@jankatins

Has anyone here used jq as an nbflatten replacement (with [diff "ipynb"], not [filter ...]!) on Windows? I tried, and jq crashes even on jq "." whatever.ipynb

@nicowilliams

Re: jq, @JanSchulz filed jqlang/jq#1072, and it's a fun one.

@jankatins

JFYI: with a recent build of jq, the jq version of nbflatten and the filter now work on Windows.

@vmuriart

vmuriart commented Apr 6, 2016

For anyone looking to download the version @JanSchulz was referring to, it's on AppVeyor.

Thanks @JanSchulz for the heads up
