This repository houses the Python notebooks used to fine-tune, evaluate, and compare the performance of Small Language Models (SLMs), all in service of answering the following question:

Can Small Language Models (SLMs) perform comparably to LLMs at extracting information from HTML?

I have also published a Substack article discussing the approach I took to tackle this question, my findings, thoughts, and more!
To run the notebooks locally, clone the repository and install the dependencies:

```bash
git clone https://github.com/pradyGn/are-SLMs-performant.git
cd are-SLMs-performant
pip install -r requirements.txt
```
- `2025-05-26_finetuning-SML.ipynb`: Contains code to fine-tune a language model of your choice.
- `2025-05-20_evaluate-SML.ipynb`: Contains code to run inference with the fine-tuned language model; a minimal inference sketch follows this list.
- `2025-04-07_results-comparison.ipynb`: Contains helper functions and usage examples for comparing the performance of the fine-tuned language models.
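For a sense of what the evaluation step looks like, here is a minimal inference sketch using Hugging Face `transformers`. The model ID, prompt, and generation settings are illustrative assumptions, not the exact ones used in the notebooks, which load the actual fine-tuned checkpoints and HTML dataset.

```python
# Minimal sketch, assuming Hugging Face transformers is installed.
# The model ID and prompt below are placeholders; the notebooks use
# the repo's fine-tuned checkpoints and evaluation dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # assumption: one of the base models in this repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A toy HTML snippet standing in for the real evaluation inputs.
prompt = (
    "Extract the product name from the following HTML:\n"
    "<div class='product'><h1>Acme Widget</h1></div>\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```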
```
are-SLMs-performant/
├── notebooks/
│   ├── 2025-05-26_finetuning-SML.ipynb
│   ├── 2025-05-20_evaluate-SML.ipynb
│   └── 2025-04-07_results-comparison.ipynb
├── results/
│   ├── Llama-3.2-1B_test_dataset_output.parquet
│   ├── Llama-3.2-1B_unseen_test_dataset_output.parquet
│   ├── ReaderLM-v2_test_dataset_output.parquet
│   └── ReaderLM-v2_unseen_test_dataset_output.parquet
├── README.md
├── .gitignore
└── requirements.txt
```