I love docker. It makes environments:
- portable across any machine
- subject to version control (a traceable and recoverable history)
- easily deployed and discarded with simple commands
Jupyter is a great tool that lets you create nice-looking documents consisting of ordered code chunks with inline output. It's a fantastic way to get started with programming, clearly step through your workflow, and/or turn your work into stories and presentations. It runs in the browser after you install it on your machine (i.e. you access it with a URL in your web browser).
This walkthrough will get you set up with a jupyter lab (or classic jupyter notebook) environment that is fully customizable, isolated, version controlled, and portable, using docker.
If you don’t want to use docker, you can always install jupyter the classic way. I like docker for the reasons outlined above (isolation, portability, version control, easy deployment and disposal).
1: Install Docker
To install Docker, follow the official installation instructions for your operating system at docs.docker.com.
Once installed, you’ll want to open a terminal (or cmd.exe for windows) and enter
docker. You should see a long list of options and commands and not something like
command 'docker' not found. If you see the long list, you’re ready to proceed!
Docker uses the language of images (recipes for containers) and containers (instances of images). In this specific case, things like
jupyter/pyspark-notebook are images, while our container will just be called jupyter.
I like to think of a container as a sandboxed environment that my app/service (in this case, jupyter) runs in, almost entirely isolated from the rest of my host machine.
2: Clone the docker-stacks Repo
You’ll need git for this step, which you can install from git-scm.com if you don’t already have it. You can check by typing
git into the terminal and seeing if the command is found.
What you’ll clone is Project Jupyter’s docker-stacks. You can also fork this repo and clone your fork. We’ll go through adding a couple of files to this repo that will make deployment and customization easy. You can use my fork or this tutorial as a reference. To clone the
docker-stacks repo, run the following command in terminal/command prompt:
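Assuming you're cloning the upstream repo (substitute your fork's URL if you made one):

```shell
git clone https://github.com/jupyter/docker-stacks.git
```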
The docker-stacks directory we just cloned contains a lot of stuff. Open the folder in your favorite editor or just look at the repo on GitHub. We’re going to look at just a few bits of a few important files. First, the
Dockerfile in the base-notebook directory: this
Dockerfile specifies that all jupyter containers will derive from a Linux Ubuntu “bionic” OS (the ubuntu:bionic base image).
If you examine the
Dockerfiles in the other directories, you’ll see there’s a chain of derivations that ends in the base-notebook:
ubuntu:bionic → base-notebook → minimal-notebook → scipy-notebook → pyspark-notebook
The docker-stacks team has structured the project so you can choose how complete a setup you want out of the box based on which image you select. The
jupyter/minimal-notebook image will contain the simplest installation you need for jupyter to run at all. The
jupyter/pyspark-notebook image will come with a lot more, like a spark installation in the jupyter container file system (likely overkill unless you’re using pyspark!). Higher up images in the inheritance tree will include everything below them (e.g. pyspark-notebook contains everything from scipy-notebook, minimal-notebook, and base-notebook).
Custom File 1: Docker Compose
The first file we add to the
docker-stacks base repo is
docker-compose.yml. The
docker-compose tool is often used for orchestrating many containers (e.g. “bring up an nginx container and a jupyter container together so nginx can act as a reverse proxy and make the jupyter URL more user-friendly”), but here we have only one container (
jupyter). I like to use a
docker-compose.yml file as a way of version controlling a
docker run command, which is what you’d typically use to bring a single container up. Any
docker run command can be translated into a
docker-compose file and vice versa.
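For instance, a hypothetical single-container deployment could be written as a one-off command like this (the image, port, and host path are just placeholders):

```shell
# detached container named "jupyter", host port 9999 -> container port 8888,
# with the current directory mounted into the container's work directory
docker run -d --name jupyter -p 9999:8888 -v "$PWD":/home/jovyan/work jupyter/base-notebook
```

The compose file in the next section captures the same information declaratively, so it can live in version control.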
You’ll create a
docker-stacks/docker-compose.yml file that looks something like this (totally up to you to customize!):
Meaning of this witchcraft:
- version: '3': docker-compose file formats are versioned, and the versions have slightly different syntax
- services: our list of services; here it's just one service named jupyter, which will create a container called jupyter
- image: where you call out which image you want to use, as discussed above
- volumes: how you break isolation between the container and host environments, linking directories and/or files from the host into the container filesystem. This is important because otherwise your files would be lost when you destroy your container. The syntax is host_path:container_path, and changes made on either side are reflected in the other
- ports: jupyter runs on a port inside the container, and ports binds a host port to that container port. The syntax is host_port:container_port
- restart: unless-stopped: if your container fails, it will auto-restart unless you stop it yourself (e.g. with docker-compose stop)
- environment: specifies environment variables. JUPYTER_ENABLE_LAB enables jupyter lab by default
- command: the final command docker runs in the container once it's up. NotebookApp.token password-protects your jupyter instance, and the port set here should match the container-side port from ports
Custom File 2: Setup Script
You may have noticed that the second volume specified in the
docker-compose.yml file is
./setup_envs.sh. This is a custom startup script that I use to do the following:
- Create environment variables that are accessible in jupyter
- Add jupyterlab extensions
- Create custom conda environments and link them to jupyter kernels
- Run other useful commands in the container upon startup (e.g. configure git, install extra packages)
You’ll create a
docker-stacks/setup_envs.sh file that looks something like this (totally up to you to customize!):
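Here's a sketch covering the four use cases above; the environment variable, extension, conda environment name, and git details are all placeholders:

```bash
#!/bin/bash
# setup_envs.sh -- runs at container startup via the before-notebook.d hook
# (see the explanation of how this works below)

# 1. Environment variables accessible in jupyter
export MY_API_URL="https://example.com/api"   # placeholder variable

# 2. Jupyterlab extensions
jupyter labextension install @jupyterlab/toc  # example extension

# 3. A custom conda environment linked to a jupyter kernel
conda create --yes --name myenv python=3.7 ipykernel
/opt/conda/envs/myenv/bin/python -m ipykernel install \
    --user --name myenv --display-name "Python (myenv)"

# 4. Other useful startup commands
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
```

Calling the environment's python binary directly to register the kernel sidesteps any need to activate the environment inside the script.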
Dirty details on how this runs just by placing it in the container: the
start-notebook.sh script run by the
docker-compose command will subsequently run
start.sh. Both these shell scripts are in the
base-notebook directory. The
start.sh script has “hooks” for running any scripts placed in the
/usr/local/bin/before-notebook.d directory. All this to say that
setup_envs.sh will be run as part of your jupyter setup because we put it in the
/usr/local/bin/before-notebook.d directory in the jupyter container!
Note that you could totally exclude the
setup_envs.sh script and volume and your deployment would be fully functional. I like to use this method as a way of customizing my jupyter setup in a version-controlled manner. Rather than manually installing a new python or system package each time and trying to remember what I did when I move to a new machine, I’ll edit the
setup_envs.sh script, commit and push to my fork, then redeploy on whatever machine I want and get the exact same setup.
3: Bring Up the Jupyter Container
If you’ve gotten this far, you’re basically done! The one thing we’re missing is to set a password/token. In terminal, run this command with your custom password. If you don’t do this,
docker-compose will warn you at the next step.
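For example (substitute your own password; on Windows cmd.exe use set instead of export):

```shell
export JUPYTER_PASSWORD=my-secret-password
```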
You can now navigate to the
docker-stacks directory in a terminal and run:
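The command to bring the service up is:

```shell
docker-compose up -d
```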
The -d flag stands for “detached”, and ensures the
jupyter container will stay running even if you exit the terminal.
You can now run:
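To follow the container's logs:

```shell
docker-compose logs -f jupyter
```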
The -f flag stands for “follow”. Watch the progress in your terminal as your jupyter container is instantiated and your
setup_envs.sh script is run (if you made one).
Once the logs show that jupyter is up and serving, you’re ready to go! Navigate to http://127.0.0.1:9999 (or whatever port you specified, if different from
9999), where you should see a page that prompts you for your password (set with the
JUPYTER_PASSWORD environment variable earlier).
You have a fully functioning and replicable jupyter environment, portable to any machine that runs docker! Remember that even though your container has access to your host file system through the volume we created, your file system isn’t (and shouldn’t be) version controlled with the
docker-stacks directory. Version control your projects separately.
- Another approach to version-controlled customization could be to modify the Dockerfiles themselves. I prefer the
setup_envs.sh script approach, as rebuilding the images is time and disk space intensive. Modifying the Dockerfiles, though, is a totally valid approach.
- If you get stuck on anything in this process, or see that I’ve been wrong or unclear about something, please open an issue on this website’s source repo.