For example, install spaCy and its small English model:
pip install spacy
python -m spacy download en_core_web_sm
and in the Dockerfile:
RUN pip install spacy
RUN python -m spacy download en_core_web_sm
In microllama.py:
from langchain.text_splitter import SpacyTextSplitter
and change the splitter construction from
splitter = CharacterTextSplitter(...)
to
splitter = SpacyTextSplitter(...)
Note that with SpacyTextSplitter, index creation takes roughly twice as long.
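As I understand it, SpacyTextSplitter runs spaCy's sentence segmentation over each document and then greedily packs whole sentences into chunks of up to `chunk_size` characters, so chunk boundaries fall on sentence boundaries. Running the spaCy pipeline on every document is a plausible reason indexing gets slower. The dependency-free sketch below only illustrates the packing idea and is not LangChain's implementation. `split_sentences` and `merge_sentences` are hypothetical helpers, the regex is a crude stand-in for spaCy's segmentation, chunk overlap is ignored, and sentences are joined with a single space for readability.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Hypothetical helper: a naive stand-in for spaCy's sentence segmentation.
    # Breaks after ".", "!" or "?" when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def merge_sentences(sentences: list[str], chunk_size: int, separator: str = " ") -> list[str]:
    # Hypothetical helper: greedily pack whole sentences into chunks of at most
    # chunk_size characters. A single sentence longer than chunk_size is kept
    # as its own (oversized) chunk rather than being cut.
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for sentence in sentences:
        extra = len(sentence) + (len(separator) if current else 0)
        if current and length + extra > chunk_size:
            # Adding this sentence would overflow the chunk: flush and start anew.
            chunks.append(separator.join(current))
            current, length = [], 0
            extra = len(sentence)
        current.append(sentence)
        length += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "Llamas are camelids. They live in the Andes. Alpacas are smaller! Are they related?"
chunks = merge_sentences(split_sentences(text), chunk_size=50)
for chunk in chunks:
    print(repr(chunk))
```

With `chunk_size=50` this yields two chunks, each made of two complete sentences; a plain character-count cut at 50 would instead end partway through the third sentence.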