Load any CLIP model with a standardized interface.

Install with:

pip install all_clip
from all_clip import load_clip
import torch
from PIL import Image
import pathlib
model, preprocess, tokenizer = load_clip("open_clip:ViT-B-32/laion2b_s34b_b79k", device="cpu", use_jit=False)
image = preprocess(Image.open(str(pathlib.Path(__file__).parent.resolve()) + "/CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]

Check out the examples in the repository to call this as a lib.
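The last three lines of the snippet above are cosine similarity followed by a temperature-scaled softmax. As a plain-Python sketch with made-up 3-dimensional embeddings (real CLIP embeddings are 512-dimensional torch tensors):

```python
import math

def softmax(xs):
    # numerically stable softmax
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

image_feat = [0.3, 0.4, 0.5]        # made-up image embedding
text_feats = [[0.3, 0.4, 0.5],      # identical to the image -> similarity 1.0
              [-0.5, 0.1, 0.2]]     # a dissimilar text
sims = [cosine(image_feat, t) for t in text_feats]
# the 100.0 factor acts as an inverse temperature, sharpening the distribution
probs = softmax([100.0 * s for s in sims])
print([round(p, 3) for p in probs])  # -> [1.0, 0.0]
```

Dividing each feature vector by its norm first (as the snippet does) reduces the dot product to exactly this cosine similarity.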
This module exposes a single function, load_clip, which takes:
- clip_model: CLIP model to load (default "ViT-B/32"); see the supported models section below
- use_jit: use JIT for the CLIP model (default True)
- warmup_batch_size: batch size used for the warmup pass (default 1)
- clip_cache_path: cache path for CLIP weights (default None)
- device: torch device to load the model on (default None)
Related projects:
- clip-retrieval to use CLIP for inference and retrieval
- open_clip to train CLIP models
- CLIP_benchmark to evaluate CLIP models
Supported models:
- "ViT-B-32" to use an OpenAI CLIP model
- "open_clip:ViT-B-32/laion2b_s34b_b79k" to use an open_clip model
- "hf_clip:patrickjohncyh/fashion-clip" to use a Hugging Face CLIP model
DeepSparse is an inference runtime for fast sparse model inference on CPUs. A DeepSparse backend is available: install it with pip install deepsparse-nightly[clip] and specify a clip_model with a prepended "nm:", such as "nm:neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds" or "nm:mgoin/CLIP-ViT-B-32-laion2b_s34b_b79k-ds".
japanese-clip provides models for Japanese; for example, "ja_clip:rinna/japanese-clip-vit-b-16".
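Putting the naming conventions above together: the backend is selected by an optional prefix before ":", and the rest of the string names the model. The helper below is a hypothetical sketch of that convention, not all_clip's actual dispatch code (which lives in all_clip/main.py):

```python
def parse_clip_model(clip_model: str):
    """Illustrative (hypothetical) parser for the model-string convention:
    an optional backend prefix before ":", e.g. "open_clip:...", "hf_clip:...",
    "nm:...", "ja_clip:...". No prefix means an OpenAI CLIP model."""
    backend, sep, name = clip_model.partition(":")
    if not sep:  # no prefix, e.g. "ViT-B-32"
        return "openai", clip_model
    return backend, name

print(parse_clip_model("ViT-B-32"))                          # -> ('openai', 'ViT-B-32')
print(parse_clip_model("open_clip:ViT-B-32/laion2b_s34b_b79k"))  # -> ('open_clip', 'ViT-B-32/laion2b_s34b_b79k')
print(parse_clip_model("ja_clip:rinna/japanese-clip-vit-b-16"))  # -> ('ja_clip', 'rinna/japanese-clip-vit-b-16')
```

Note that any "/" after the prefix is interpreted per backend: for open_clip it separates model and pretrained tag, while for hf_clip and ja_clip it is part of the Hugging Face repo id.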
To add a new model type, please follow these steps:
- Add a file to load the model in all_clip/
- Define a loading function that returns a tuple (model, transform, tokenizer). Please see all_clip/open_clip.py as an example.
- Add the function into TYPE2FUNC in all_clip/main.py
- Add the model type in test_main.py and ci.yml
Remarks:
- The new tokenizer/model must support the following operations, as in https://github.com/openai/CLIP#usage:
  - tokenizer(texts).to(device), where texts is a list of strings
  - model.encode_text(tokenized_texts), where tokenized_texts is the output of tokenizer(texts).to(device)
  - model.encode_image(images), where images is an image tensor produced by the transform
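To make the required interface concrete, here is a minimal, runnable sketch of a loading function with stand-in classes. Everything here (DummyModel, DummyTokenizer, load_dummy_clip) is hypothetical and for illustration only; a real loader such as all_clip/open_clip.py returns actual torch modules and tensors:

```python
class _DummyTokens(list):
    def to(self, device):
        # mimic torch's .to(device); a no-op in this sketch
        return self

class DummyTokenizer:
    def __call__(self, texts):
        # texts is a list of strings -> per-text "token id" lists
        return _DummyTokens([[ord(c) for c in t] for t in texts])

class DummyModel:
    def encode_text(self, tokenized_texts):
        # one fixed-size "embedding" per tokenized text
        return [[float(len(tokens))] * 4 for tokens in tokenized_texts]

    def encode_image(self, images):
        return [[1.0] * 4 for _ in images]

def dummy_transform(image):
    # a real transform would return a preprocessed image tensor
    return image

def load_dummy_clip(clip_model, device=None, **kwargs):
    # the contract: return a (model, transform, tokenizer) tuple
    return DummyModel(), dummy_transform, DummyTokenizer()

model, transform, tokenizer = load_dummy_clip("dummy:anything")
tokens = tokenizer(["a diagram", "a dog"]).to("cpu")
embeddings = model.encode_text(tokens)
print(len(embeddings), len(embeddings[0]))  # -> 2 4 (2 texts, 4-dim embeddings)
```

A loader shaped like load_dummy_clip is what gets registered in TYPE2FUNC.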
Setup a virtualenv:
python3 -m venv .env
source .env/bin/activate
pip install -e .
To run tests:

pip install -r requirements-test.txt

then:

make lint
make test

You can use make black to reformat the code.
Run a specific test with: python -m pytest -x -s -v tests -k "ja_clip"