
Dockerfile/docker image to quickly test (initial work provided) #1

@awesomebytes

Description

Hello,
First, thanks for your effort!

I wanted to give it a try without spending too long fiddling, so I thought of doing it in a Docker container.
Given that the results of the network (at the size I could afford to run) are... not really good, I haven't gone further in making this more accessible. However, let me share what I made:

Starting from the assumption that you have a host computer with an Nvidia card and the drivers installed (i.e. you can run nvidia-smi), you can build a Docker image with this Dockerfile:

# minillm (well, pytorch) was built with CUDA 11.6, so we need to match the version
FROM nvidia/cuda:11.6.0-devel-ubuntu20.04

# DEBIAN_FRONTEND=noninteractive stops apt-get from hanging on the tzdata prompt
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y git python3-pip
RUN git clone https://github.com/kuleshov/minillm
RUN cd minillm && pip3 install -r requirements.txt
# Note: find your GPU compute capability with `nvidia-smi --query-gpu=compute_cap --format=csv`
# (for an RTX 3000-series card it's 8.6). Setting TORCH_CUDA_ARCH_LIST avoids
# https://github.com/pytorch/extension-cpp/issues/71:
#   arch_list[-1] += '+PTX'
#   IndexError: list index out of range
RUN cd minillm && TORCH_CUDA_ARCH_LIST="8.6+PTX" python3 setup.py install
RUN minillm download --model llama-7b-4bit --weights llama-7b-4bit.pt

Note that TORCH_CUDA_ARCH_LIST needs to contain your GPU's CUDA compute capability, which depends on your machine. This could be done via a bash script that detects it on the host and passes it into the Dockerfile as a build argument.
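As a sketch of that idea (this assumes the Dockerfile is changed to declare a hypothetical `ARG COMPUTE_CAP=8.6` and to use `TORCH_CUDA_ARCH_LIST="${COMPUTE_CAP}+PTX"` in the setup.py install step):

```shell
# Detect the host GPU's compute capability and pass it to docker build.
# `--format=csv,noheader` prints just the value, e.g. "8.6"; `head -n1`
# keeps the first GPU's value on multi-GPU machines.
CC=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
docker build -f Dockerfile --build-arg COMPUTE_CAP="${CC}" -t minillm .
```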
The image is about 14.6 GB, which is pretty big; 3.6 GB of that is just the weights, which, arguably, could be downloaded on the host instead and mounted as a volume.
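A sketch of that alternative (the host directory and in-container path are illustrative assumptions, and the `minillm download` step would be dropped from the Dockerfile):

```shell
# Keep the 3.6 GB of weights on the host and mount them into the container,
# downloading them once on first run. Paths here are illustrative.
mkdir -p "$HOME/minillm-weights"
docker run -it --rm --runtime=nvidia \
    -v "$HOME/minillm-weights:/weights" \
    minillm \
    minillm download --model llama-7b-4bit --weights /weights/llama-7b-4bit.pt
```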

To build this Docker image, run docker build -f Dockerfile -t minillm . (with Dockerfile being the filename of the previous code block). Once built, it can be run with the rather long command below, which may contain unnecessary parameters, since I took it from a different project:

docker run \
    -it \
    --rm \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    --env="DISPLAY=${DISPLAY}" \
    --privileged \
    --net=host \
    -v /dev:/dev \
    --runtime=nvidia \
    --name minillm \
    minillm
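For a text-only workload, a leaner invocation would likely suffice (untested assumption on my part), dropping the X11, --privileged, /dev, and host-networking flags:

```shell
# Only GPU access is kept. `--gpus all` needs a Docker recent enough for the
# NVIDIA Container Toolkit; `--runtime=nvidia` is the older equivalent.
docker run -it --rm --gpus all --name minillm minillm
```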

And that will drop you into a shell where you can use minillm.
For example, I ran:

generate --model llama-7b-4bit --weights llama-7b-4bit.pt --max-length 100 --temperature 1. --top_k 50 --top_p 0.95 --prompt "Make a rhyme that includes the following words: computer, mug, plant, yoga"
Loading LLAMA model
Done
Make a rhyme that includes the following words: computer, mug, plant, yoga
Computer, mug, plant, yoga
Computer, computer, computer!
Washing machines were not that smart!
In the beginning of time, before time, and space,
Before the age of the dinosaur, the mammal was a worm.
And even before the dawn of our planet
We had a plant, but before our species

The outputs for all the prompts I tried were rather... random. But this was my first try ever, so I'm happy I got to play with it.

Again, thanks for your work. I thought I'd better share what I did instead of just leaving it in a forgotten folder on my laptop.
