so you want to host some notebooks
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# So You Want to Host Some HTML Notebooks\n",
"## _A Field Guide_\n",
"[jupyter/nbviewer](https://github.com/jupyter/nbviewer) is the code behind [The Jupyter Notebook Viewer](http://nbviewer.org), which _fetches_ publicly-accessible notebooks we don't control, or even trust, and _renders_ them for anyone in the world to see.\n",
"\n",
"**So you want to host some notebooks.** While it may seem tempting to just `git clone` and start running your very own `nbviewer.company.com`, there are a number of pitfalls."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vocabulary\n",
"- **Content Delivery Network (CDN)**, such as [Fastly](http://www.fastly.com/): a distributed cache that makes serving rendered HTML and static assets to browsers really fast\n",
"- **Cross-Site Scripting (XSS)**: the use of JavaScript in a web application to steal information from an unsuspecting user\n",
"- **Jupyter Notebook Viewer** (_a.k.a. nbviewer.org, nbviewer.jupyter.org_): the globally visible instance of `nbviewer`, which always runs the `master` branch\n",
"- **memcache**: an in-memory key-value store with a network API and support for timeouts to ensure freshness\n",
"- **nbconvert**: the Jupyter sub-project responsible for generating different kinds of output from `.ipynb` files\n",
"- **`nbviewer`**: the codebase at http://github.com/jupyter/nbviewer\n",
"- **nbviewer provider**: a Python module that can _fetch_ content from a source (e.g. a URL or an API) and turn URLs from a host URL namespace into _canonical local URIs_\n",
"- **nbviewer format**: a Python module that _renders_ a fetched `.ipynb` file into HTML\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What Won't `nbviewer` Do?\n",
"These are not hard and fast rules: if someone wanted to overcome all the issues entailed in one of the things below, PRs would be reviewed. However, based on the availability of the maintainers and the design space of nbviewer.org, here are some current opinions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### protect individual notebooks behind authentication/authorization\n",
"Because nbviewer will render notebooks hosted anywhere, there is no secure, scalable way to handle the possible types of integration that would be required. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### execute notebooks\n",
"We present the notebook as close as possible to how the author intended it to be viewed the last time they hit \"save\"."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use Cases\n",
"Here are some ways to host notebooks: for some cases, using `nbviewer` makes a lot of sense. For others, it can introduce a lot of complexity and attack surface."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I have a directory of _trustworthy_ `.ipynb` files I want to serve as HTML..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ...that don't change\n",
"_Should I run nbviewer?_ __No.__\n",
"\n",
"The most secure, fastest web application is _no application_. Statically generated HTML, uploaded to your server, is a great way to get your notebooks hosted.\n",
"\n",
"```bash\n",
"jupyter nbconvert --to=html the_notebooks/*.ipynb\n",
"scp -r the_notebooks/*.html server:/var/www/htdocs/notebooks\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ...that change\n",
"_Should I run nbviewer?_ __No.__\n",
"\n",
"If your notebooks change, you can make a [`cron` job](http://www.unixgeeks.org/security/newbie/unix/cron-1.html) to refresh the generated HTML. This is still easier than standing up nbviewer."
]
},
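{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the `cron` approach, a crontab entry along these lines would regenerate the HTML every 15 minutes. The schedule and both paths are assumptions for the sake of the example; `--output-dir` is the nbconvert option that controls where the generated files are written.\n",
"\n",
"```\n",
"# hypothetical schedule and paths: adjust to your own setup\n",
"*/15 * * * * jupyter nbconvert --to=html --output-dir=/var/www/htdocs/notebooks /home/me/the_notebooks/*.ipynb\n",
"```\n",
"\n",
"Writing straight into the web root keeps the setup to a single command, at the cost of readers briefly seeing a mix of old and new pages while a run is in progress."
]
},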
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ...after I run them\n",
"_Should I run nbviewer?_ __No.__\n",
"\n",
"You can leverage the reproducible nature of notebooks, and run them prior to generating the HTML:\n",
"\n",
"```bash\n",
"jupyter nbconvert --to=html --execute the_notebooks/*.ipynb\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ...protected by username/password\n",
"_Should I run nbviewer?_ __No.__\n",
"\n",
"With static files, you can leverage existing, production-grade means of protecting your content. The simplest approach is the time-honored [.htaccess](https://httpd.apache.org/docs/2.4/howto/htaccess.html) file used by Apache `httpd`. Any number of other servers, such as [nginx](https://www.nginx.com), can also host files and limit access to them through various means."
]
},
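{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such an `.htaccess` file, placed in the directory that holds the generated HTML, might look like the following. The realm name and the password file location are assumptions, and the server configuration must permit the override (`AllowOverride AuthConfig`) for these directives to take effect.\n",
"\n",
"```\n",
"# hypothetical realm name and password file path\n",
"AuthType Basic\n",
"AuthName \"Notebooks\"\n",
"AuthUserFile /etc/apache2/.htpasswd\n",
"Require valid-user\n",
"```\n",
"\n",
"The password file itself is typically created with Apache's `htpasswd` utility. Basic authentication sends credentials with every request, so it is best combined with HTTPS."
]
},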
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I want to host notebooks on my company's on-premises web app"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ... that are \"internally public\"\n",
"_Should I run nbviewer?_ __Maybe.__\n",
"\n",
"If your web application (e.g. GitLab, Stash, PowerPoint) contains an unknowable number of notebooks, or they change frequently, running a local nbviewer can be a good choice. To avoid XSS, you'll want to ensure that the `nbviewer` server is running on its own host/port.\n",
"\n",
"With the `url`(`s`) provider, nbviewer can fetch and render notebooks from any site that has a reasonable URL structure, reachable by an unauthenticated user, often called \"public\" in site configuration.\n",
"\n",
"GitHub Enterprise (GHE) is a special case, as it is basically API-compatible with GitHub, which constitutes the bulk of hosted notebooks on `nbviewer.org`. There are no tests specifically for GHE, but some users have been able to use the two together. In this case, you just need to generate an API key for the application."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> <i class=\"fa fa-exclamation-triangle text-warning fa-2x\"></i>\n",
"> When running `nbviewer` in this context, you may want to disable the other default _providers_, by just specifying the `url` provider.\n",
"```\n",
"python -m nbviewer --providers=nbviewer.providers.url\n",
"```\n",
"\n",
"> A malicious individual could still discover your internal nbviewer instance and use it to send a link to a malicious notebook, such that it could leak information. This could be mitigated by using a firewall to limit what the machine itself can access, but future work could include a `--url_whitelist` option that allowed you to specify patterns at runtime."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## I want to integrate rendered notebooks into my application\n",
"_Should I run nbviewer?_ __No.__\n",
"\n",
"There are a number of considerations here. About the only way to be sure is to serve untrusted rendered notebooks from a wholly separate domain/port from your main application.\n",
"\n",
"> <i class=\"fa fa-question-circle text-primary fa-2x\"></i> We would like at some point to support more uniform integration with popular platforms. Create an [issue](https://github.com/jupyter/nbviewer/issues) or chime in on \n",
"[gitter][gitter] if you have ideas about platforms that need notebooks!\n",
"\n",
"[gitter]: https://gitter.im/jupyter/nbviewer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix 1: How Does [nbviewer.org](http://nbviewer.org) Work?\n",
"> _TODO: get validation from @rgbkrk_\n",
"\n",
"The hosting of nbviewer.org is donated by [Rackspace](https://developer.rackspace.com/?nbviewer=awesome). On their servers, the system is run in a [Docker Swarm](https://docs.docker.com/swarm/) cluster with a number of nodes running tornado, and some running the `memcache` server.\n",
"\n",
"> TODO: architecture diagram\n",
"\n",
"When a client makes a `GET` request, it first\n",
"- hits the [Fastly](http://www.fastly.com/) CDN cache. If an up-to-date version of the file\n",
"  - is found\n",
"    - the cached version of the request is returned to the client. Hooray!\n",
"  - is not found\n",
"    - it gets routed to one of the tornado nodes, and nbviewer...\n",
"      - looks at the URL, and hands it to the appropriate _provider_, e.g. `github`, which...\n",
"        - fetches the `.ipynb`\n",
"        - hands it to the `format`, which\n",
"          - renders the HTML\n",
"      - caches the HTML\n",
"      - returns it to the upstream server, with cache information\n",
"    - it is cached _again_ by the CDN\n",
"    - it is returned to the client"
]
}
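,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cache-miss path above can be sketched as a simple cache-aside lookup. This is only an illustration, not nbviewer's actual code: `TTLCache`, `handle_get`, `fetch` and `render` are hypothetical names, the dict-backed cache stands in for `memcache`, and the CDN layer is left out.\n",
"\n",
"```python\n",
"import time\n",
"\n",
"\n",
"class TTLCache:\n",
"    '''Hypothetical in-memory stand-in for memcache: entries expire after ttl seconds.'''\n",
"\n",
"    def __init__(self, ttl):\n",
"        self.ttl = ttl\n",
"        self.store = {}\n",
"\n",
"    def get(self, key):\n",
"        entry = self.store.get(key)\n",
"        if entry is None:\n",
"            return None\n",
"        value, expires = entry\n",
"        if time.monotonic() >= expires:\n",
"            # stale entry: drop it so the caller re-renders\n",
"            del self.store[key]\n",
"            return None\n",
"        return value\n",
"\n",
"    def set(self, key, value):\n",
"        self.store[key] = (value, time.monotonic() + self.ttl)\n",
"\n",
"\n",
"def handle_get(url, cache, fetch, render):\n",
"    '''Return (html, was_cached); fetch and render only on a cache miss.'''\n",
"    html = cache.get(url)\n",
"    if html is not None:\n",
"        return html, True\n",
"    notebook = fetch(url)    # the provider's job\n",
"    html = render(notebook)  # the format's job\n",
"    cache.set(url, html)\n",
"    return html, False\n",
"```\n",
"\n",
"Passing `fetch` and `render` in as plain callables mirrors the provider/format split: either side can be swapped without touching the caching logic."
]
}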
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}