This example demonstrates building a data science application that requires system dependencies for compilation.
- ✅ Installing packages with native extensions (NumPy, Pandas)
- ✅ Adding system build dependencies (gcc, gfortran)
- ✅ Processing data with common data science libraries
- ✅ Copying required runtime libraries to distroless
Project structure:

```
data-science/
├── Dockerfile
├── requirements.txt
├── analyze.py
└── sample_data.csv
```
```bash
docker build -t data-science-example .

# Run the analysis on sample data
docker run --rm data-science-example

# Run with custom data (mount volume)
docker run --rm -v "$(pwd)/your_data.csv:/app/data.csv" data-science-example
```

Many data science packages (NumPy, Pandas, SciPy) need a compiler toolchain and system libraries when they are built from source:
```dockerfile
# Install build dependencies in the build stage
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ gfortran \
    libopenblas-dev \
    && rm -rf /var/lib/apt/lists/*
```

Compiled packages need runtime libraries in the final image:
```dockerfile
# Copy required runtime libraries
COPY --from=build-venv /usr/lib/x86_64-linux-gnu/libgfortran.so.5* /usr/lib/x86_64-linux-gnu/
COPY --from=build-venv /usr/lib/x86_64-linux-gnu/libopenblas.so.0* /usr/lib/x86_64-linux-gnu/
COPY --from=build-venv /usr/lib/x86_64-linux-gnu/libquadmath.so.0* /usr/lib/x86_64-linux-gnu/
```

Why this is needed:
- Build dependencies (gcc, gfortran) compile the code
- Runtime libraries (libgfortran, libopenblas) are needed to run it
- Distroless doesn't include these by default
- We copy only what's needed to keep the image small
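To confirm that the copied libraries are actually visible to the dynamic loader, a small check like the one below could be run inside the final image. This is a minimal sketch: the helper name `check_shared_libs` and the `REQUIRED_LIBS` list are illustrative assumptions, with the sonames taken from the `COPY` lines above.

```python
import ctypes

# Sonames taken from the COPY lines above; adjust for your own dependencies.
REQUIRED_LIBS = ["libgfortran.so.5", "libopenblas.so.0", "libquadmath.so.0"]


def check_shared_libs(names):
    """Return {soname: bool}, True when the dynamic loader can open the library."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
            results[name] = True
        except OSError:
            results[name] = False
    return results
```

Calling `check_shared_libs(REQUIRED_LIBS)` returns a dictionary, so any entry mapped to `False` points at a library that still has to be copied from the build stage.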
Problem: Pandas/NumPy can't find required shared libraries (typically an `ImportError` ending in `cannot open shared object file`).
Solution: Copy the missing library from the build stage (shown above).
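One way to find out which library is missing is to run `ldd` on the compiled extension module in the build stage (distroless has no shell or `ldd`) and look for `not found` entries. The sketch below parses that output. The function name `missing_libraries` is hypothetical, and `SAMPLE` is illustrative text, not output captured from a real build.

```python
def missing_libraries(ldd_output):
    """Return the sonames that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libgfortran.so.5 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


# Illustrative ldd output (assumption, not from a real image).
SAMPLE = """\
    libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0
    libgfortran.so.5 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
"""

print(missing_libraries(SAMPLE))  # ['libgfortran.so.5']
```

Each name returned is a candidate for an additional `COPY --from=build-venv` line.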
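Problem: `error: command 'gcc' failed`
Solution: Add the required build dependencies in the build stage:

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ make \
    && rm -rf /var/lib/apt/lists/*
```

Common dependency sets (use the line that matches your packages):

```dockerfile
# Fortran and BLAS toolchain (same set as the NumPy/SciPy build stage above)
RUN apt-get update && apt-get install -y gcc g++ gfortran libopenblas-dev

# JPEG and zlib headers, e.g. for image libraries
RUN apt-get update && apt-get install -y gcc libjpeg-dev zlib1g-dev

# PostgreSQL client headers (libpq), e.g. for database drivers
RUN apt-get update && apt-get install -y gcc libpq-dev
```

See TROUBLESHOOTING.md for more package-specific dependencies.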
If copying libraries becomes complex, consider using a slim runtime instead of distroless:
```dockerfile
FROM debian:bookworm-slim

# Install only runtime dependencies (not build tools)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgfortran5 \
    libopenblas0 \
    && rm -rf /var/lib/apt/lists/*

COPY --from=build-venv /.venv /.venv
# ... rest of Dockerfile
```

Trade-off:
- ✅ Easier - no need to manually copy libraries
- ✅ More compatible - includes more system libraries
- ❌ Larger image - includes package manager and shell
- ❌ Less secure - more attack surface
Pre-built wheels avoid compilation:

```dockerfile
# uv automatically uses wheels when available
RUN uv pip install numpy pandas
```

Use BuildKit cache mounts for faster rebuilds:

```dockerfile
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install -r requirements.txt
```

The analyze.py script:
- Loads CSV data with Pandas
- Calculates basic statistics
- Demonstrates NumPy array operations
- Shows that compiled packages work in distroless
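The steps above could look roughly like the sketch below. It is not the actual `analyze.py` from this example. The function name `analyze` and the choice of z-scores as the NumPy operation are assumptions made for illustration, and the input is assumed to have at least one numeric column.

```python
import numpy as np
import pandas as pd


def analyze(csv_path):
    """Load a CSV and return (summary statistics, z-scores) for numeric columns."""
    df = pd.read_csv(csv_path)
    numeric = df.select_dtypes(include="number")

    # Basic statistics: count, mean, std, min, quartiles, max per column.
    stats = numeric.describe()

    # NumPy array operation: standardize each numeric column.
    values = numeric.to_numpy(dtype=float)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    zscores = (values - values.mean(axis=0)) / std

    return stats, zscores
```

With the volume mount shown earlier, this would be called as `analyze("/app/data.csv")`. If the imports and the call succeed inside the container, the compiled extensions found their runtime libraries.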
A successful run confirms that the compiled packages load and work correctly in the distroless image.