Python Docker Image

Using Python in a Docker image "can" be as simple as:

FROM python:3

And for small scripts, this simple base image usually gets you up and running! Unfortunately, using the official Python image from Docker can be limiting as your application grows. You may reach for a slightly more flexible image, such as those based on Debian or Alpine, but these still make some assumptions about how you will use Python and the tools it provides.

You may need to install a system package to compile a Python library that your new application uses. A great example of this is the library Pillow. Pillow requires at LEAST libjpeg and zlib, plus up to 8 optional system packages for further functionality.

While Pillow, and many other Python packages, claim to bundle the binaries for their related system packages in their wheel variants, this is not always the case (or they don't always work as described).
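To make that concrete, here is a minimal sketch of what compiling Pillow yourself can look like on a Debian-based image. The python:3-slim tag and the Debian package names (build-essential, libjpeg-dev, zlib1g-dev) are my assumptions for illustration; check Pillow's own installation notes for the exact list your version needs.

```dockerfile
# syntax=docker/dockerfile:1
# Assumption: a Debian-based slim Python image, which ships without a compiler.
FROM python:3-slim
# Compiler toolchain plus the headers for Pillow's two minimum requirements (libjpeg and zlib).
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential libjpeg-dev zlib1g-dev \
    && rm -rf /var/lib/apt/lists/*
# --no-binary tells pip to skip the prebuilt wheel and compile Pillow from source.
RUN pip install --no-binary pillow pillow
```

Every optional Pillow feature would add another package to that apt-get line, which is exactly the kind of growth the rest of this post plans for.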

Instead of relying on other folks to provide functionality for you, we'll build our own Python image from scratch that handles all of our use cases.

Fresh Image

A few key points about this image:

  1. We will be making use of layers to reduce build times and image size.
  2. However, we are NOT focusing on making the slimmest image possible.
  3. We are NOT assuming 100% lockdown. We will include a few items for debugging.

Let's treat the image like a fresh VM. In my world I use the LTS (long-term support) release of Ubuntu, so that's what we'll base our image on.

# syntax=docker/dockerfile:1
FROM ubuntu:latest AS base

You may have noticed a line you haven't used before! Having # syntax=docker/dockerfile:1 tells our container building tool which syntax is valid in this Dockerfile. We'll explicitly set the standard one for Docker; you can read more about BuildKit frontends here. The ubuntu:latest tag refers to the latest LTS release of Ubuntu, which is Ubuntu 22.04 (jammy) at the time of writing. We then name this stage base so we can reference it later in the Dockerfile! (This post uses "layer" loosely for these named build stages.)
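If you'd rather not have the base move underneath you when a new LTS ships, one option is to pin the release explicitly. A minimal sketch, assuming you want to stay on 22.04:

```dockerfile
# syntax=docker/dockerfile:1
# Pin the Ubuntu release instead of following whatever ubuntu:latest currently points to.
FROM ubuntu:22.04 AS base
```

The rest of this post keeps ubuntu:latest, so treat the pinned tag as an optional swap. Pinning matters most for the apt steps later on, since package names can change between Ubuntu releases.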

Environment Variables

Next we'll set up some environment variables in a single step. If you've read my post on setting up Python on a local system you'll recognize a few items here. We'll again make use of Pyenv and Poetry, installing Python through Pyenv rather than through apt-get.

ENV \
    # LANG and LC_ALL are set to C.UTF-8 to ensure that Python and other programs are running in UTF-8 mode.
    LANG="C.UTF-8" \
    LC_ALL="C.UTF-8" \
    # DEBIAN_FRONTEND is set to noninteractive to prevent any interactive prompts from appearing during package installation.
    DEBIAN_FRONTEND=noninteractive \
    # PYTHONDONTWRITEBYTECODE is set to 1 to prevent Python from writing .pyc files to disc.
    PYTHONDONTWRITEBYTECODE=1 \
    # PYTHONUNBUFFERED is set to 1 to prevent Python from buffering stdout and stderr.
    PYTHONUNBUFFERED=1 \
    # PIP_NO_CACHE_DIR disables pip's download cache. pip treats any valid boolean value here, including off, as "disable the cache".
    PIP_NO_CACHE_DIR=off \
    # PIP_DISABLE_PIP_VERSION_CHECK is set to on to prevent pip from checking for newer versions of itself.
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    # PIP_DEFAULT_TIMEOUT is set to 100 to prevent pip from timing out when downloading packages.
    PIP_DEFAULT_TIMEOUT=100 \
    # PATH is set to include the pyenv shims and bin directories.
    PATH="/opt/pyenv/shims:/opt/pyenv/bin:$PATH" \
    # PYENV_ROOT is set to /opt/pyenv to specify the location of the pyenv installation.
    PYENV_ROOT="/opt/pyenv" \
    # PYENV_SHELL is set to bash to specify the shell that pyenv should use.
    PYENV_SHELL="bash" \
    # POETRY_NO_INTERACTION is set to 1 to prevent poetry from prompting for input.
    POETRY_NO_INTERACTION=1 \
    # POETRY_VIRTUALENVS_IN_PROJECT is set to false to prevent poetry from creating virtual environments in the project directory.
    POETRY_VIRTUALENVS_IN_PROJECT=false \
    # POETRY_VIRTUALENVS_CREATE is set to false to prevent poetry from creating virtual environments.
    POETRY_VIRTUALENVS_CREATE=false \
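    # Do NOT set PYENV_VERSION to the pyenv release (2.3.35) here. Pyenv reads PYENV_VERSION as the Python
    # version to activate, so the shims look for a Python 2.3.35 that is not installed and pip install poetry fails.
    # The pyenv release itself is pinned by the git tag used when cloning pyenv.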
    # POETRY_VERSION is set to the version of Poetry that should be installed (the latest at the time of writing).
    POETRY_VERSION=1.7.1 \
    # PYTHON_VERSION is set to the version of Python that should be installed. Python has no LTS releases; this is the stable release pinned for this image.
    PYTHON_VERSION=3.12.1
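Because these variables live in base, every stage that starts FROM base (directly or through another stage) sees them, both in RUN commands and in instructions like COPY. Here is a tiny sketch of that, using a hypothetical env-check stage that is not part of the final Dockerfile and assuming it is appended to the Dockerfile built so far:

```dockerfile
# Hypothetical throwaway stage; build it with --target env-check to try it out.
FROM base AS env-check
# Prints the values inherited from the ENV instruction in base.
RUN echo "PYENV_ROOT=$PYENV_ROOT PYTHON_VERSION=$PYTHON_VERSION POETRY_VERSION=$POETRY_VERSION"
```

Passing --progress=plain to docker build makes the echoed line visible in the build output.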

Installing System Dependencies

This is the layer where we'll install system dependencies required for Python and your application's Python dependencies to be compiled. You will only need to update this layer if a Python package you install fails and explicitly asks for a system package to be installed. A common culprit for updating this layer would be changes to PyTorch.

FROM base AS system-deps
# We'll copy the sources.list file and enable source repositories.
# Next we'll update the package list.
RUN cp /etc/apt/sources.list /etc/apt/sources.list~ && \
    sed -Ei 's/^# deb-src /deb-src /' /etc/apt/sources.list && \
    apt-get update && apt-get install -y --no-install-recommends \
    # these are the packages required to build Python and ALL of its standard library
    build-essential gdb lcov pkg-config \
      libbz2-dev libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
      libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev \
      lzma lzma-dev tk-dev uuid-dev zlib1g-dev \
    # other packages we typically use in Ubuntu for development or debugging
    # we'll use these to install Pyenv, Poetry, and Python
    ca-certificates \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
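As an example of extending this layer, say a package build fails and asks for the libjpeg headers (as a from-source Pillow build would). One way to handle that without touching the list above is a follow-on stage. The stage name system-deps-extra is hypothetical, and whichever stage comes next would then need to start FROM it instead of system-deps:

```dockerfile
FROM system-deps AS system-deps-extra
# Add only the missing package, then clean the apt lists again to keep the layer small.
RUN apt-get update && apt-get install -y --no-install-recommends \
    libjpeg-dev \
    && rm -rf /var/lib/apt/lists/*
```

Adding the package name straight to the apt-get list in system-deps is equally valid; the separate stage just avoids rebuilding the larger layer while you experiment.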

Install Python

We'll install Pyenv from its Git repository, then use it to build Python and set it as the system's global Python. After a new global Python has been installed, we'll pip install Poetry. Again, this step only reruns when we change the version of Python or Poetry, so this layer stays constant for long periods of time, reducing build times!

FROM system-deps AS python-setup
RUN git clone -b v2.3.35 --single-branch --depth 1 https://github.com/pyenv/pyenv.git $PYENV_ROOT && \
    pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION} && \
    find $PYENV_ROOT/versions -type d '(' -name '__pycache__' -o -name 'test' -o -name 'tests' ')' -exec rm -rf '{}' + && \
    find $PYENV_ROOT/versions -type f '(' -name '*.pyo' -o -name '*.exe' ')' -exec rm -f '{}' + && \
    rm -rf /tmp/* && \
    pip install poetry==${POETRY_VERSION}
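A quick way to confirm this layer did what we wanted is to ask each tool for its version. Here is a sketch using a hypothetical python-check stage (not part of the final Dockerfile):

```dockerfile
FROM python-setup AS python-check
# Each command resolves through the pyenv shims on PATH; the build fails here if any of them is missing.
RUN python --version && pip --version && poetry --version
```

Building with --target python-check runs only the stages this check depends on, so it is a cheap sanity test after bumping PYTHON_VERSION or POETRY_VERSION.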

Installing Your Application's Python Dependencies

Now that we have system packages and Python installed, we'll make a new layer that holds YOUR application's Python dependencies. These will change far less often than the code you write, so making a reusable layer out of them is a great idea.

FROM python-setup AS poetry-install
WORKDIR /code/
# We'll copy over your pyproject.toml and poetry.lock files to install your dependencies.
COPY poetry.lock pyproject.toml /code/
# Install your dependencies without development dependencies to the system (no virtual environment required).
# --without dev replaces the --no-dev flag, which Poetry deprecated in 1.2.
RUN poetry install --without dev --no-root --no-interaction --no-ansi

Great! Now we have a layer with your Python packages ready to go. We'll copy the built packages to the final image.

Extra Credit: Installing PyTorch for CPU Only

If PyTorch is specified in your local development settings, or for images that use a GPU, you can remove it here and install the CPU-only variant instead. This is useful when you need PyTorch functions in environments that have no GPU. Add these lines to the poetry-install stage, after the poetry install step.

RUN pip uninstall torch --yes
RUN pip install torch --index-url https://download.pytorch.org/whl/cpu

Add Your Code Into The Image

Finally, we've got a layer that has a built version of Python and our system dependencies with NO development dependencies. We'll first install any system dependencies that are required at runtime and cannot be set aside in another layer. Then we'll copy over our codebase.

FROM base AS copy-codebase
WORKDIR /code/
# Install any system dependencies that are required at runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
    # ca-certificates is typically required for a certificate store
    ca-certificates \
    # libpq-dev is used for Postgres interaction via libraries like psycopg2
    libpq-dev \
    && update-ca-certificates --fresh
# SSL_CERT_DIR is set to /etc/ssl/certs to specify the location of the SSL certificate directory.
ENV SSL_CERT_DIR=/etc/ssl/certs
# Copy Python and your application's Python dependencies from the poetry-install layer
COPY --from=poetry-install $PYENV_ROOT $PYENV_ROOT
# ADD YOUR PYTHON CODE HERE
COPY <some_files> .
# Run `pyenv init -` once at build time. Shell state set by this eval does not persist past this RUN;
# Python works in the final image because PATH (set in base) already includes pyenv's shims and binaries
RUN eval "$(pyenv init -)"

Now you've got it! A Python Docker image that can be used for any Python application.
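One thing this image does not set is its own default command, so it inherits whatever the Ubuntu base image specifies. If you want the container to start your app on its own, a sketch could look like this. The runtime stage name is hypothetical, and hello_world.py is the example script used in the full Dockerfile below; swap in your own entry point:

```dockerfile
FROM copy-codebase AS runtime
# Exec-form CMD; python is found through the pyenv shims that base put on PATH.
CMD ["python", "hello_world.py"]
```

Keeping this in its own small stage means changing the start command never invalidates the dependency layers above it.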

Putting It All Together

Here's the full Dockerfile for you to use! You can find it, along with an example of how to use it, in the GitHub repository.

# syntax=docker/dockerfile:1
FROM ubuntu:latest AS base
 
ENV \
    LANG="C.UTF-8" \
    LC_ALL="C.UTF-8" \
    DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    PATH="/opt/pyenv/shims:/opt/pyenv/bin:$PATH" \
    PYENV_ROOT="/opt/pyenv" \
    PYENV_SHELL="bash" \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=false \
    POETRY_VIRTUALENVS_CREATE=false \
    # Do NOT set PYENV_VERSION=2.3.35 here: pyenv reads it as the Python version to activate, so pip install poetry fails.
    POETRY_VERSION=1.7.1 \
    PYTHON_VERSION=3.12.1
 
FROM base AS system-deps
RUN cp /etc/apt/sources.list /etc/apt/sources.list~ && \
    sed -Ei 's/^# deb-src /deb-src /' /etc/apt/sources.list && \
    apt-get update && apt-get install -y --no-install-recommends \
    build-essential gdb lcov pkg-config \
      libbz2-dev libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
      libncurses5-dev libreadline6-dev libsqlite3-dev libssl-dev \
      lzma lzma-dev tk-dev uuid-dev zlib1g-dev \
    ca-certificates \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 
FROM system-deps AS python-setup
RUN git clone -b v2.3.35 --single-branch --depth 1 https://github.com/pyenv/pyenv.git $PYENV_ROOT && \
    pyenv install ${PYTHON_VERSION} && \
    pyenv global ${PYTHON_VERSION} && \
    find $PYENV_ROOT/versions -type d '(' -name '__pycache__' -o -name 'test' -o -name 'tests' ')' -exec rm -rf '{}' + && \
    find $PYENV_ROOT/versions -type f '(' -name '*.pyo' -o -name '*.exe' ')' -exec rm -f '{}' + && \
    rm -rf /tmp/* && \
    pip install poetry==${POETRY_VERSION}
 
FROM python-setup AS poetry-install
WORKDIR /code/
COPY poetry.lock pyproject.toml /code/
RUN poetry install --without dev --no-root --no-interaction --no-ansi
 
FROM base AS copy-codebase
WORKDIR /code/
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && update-ca-certificates --fresh
ENV SSL_CERT_DIR=/etc/ssl/certs
COPY --from=poetry-install $PYENV_ROOT $PYENV_ROOT
COPY hello_world.py .
RUN eval "$(pyenv init -)"

Usage

First, ensure you have Docker installed and running. For interactive use, refer to https://docs.docker.com/desktop/. Next, clone the example repo via:

git clone https://github.com/djstein/python-docker-image

Then, in the same directory as the Dockerfile, run the following command to build your image with some tag:

docker build -t hello-world .