Are-SLMs-Performant?

This repository houses the Python notebooks used to fine-tune, evaluate, and compare the performance of Small Language Models (SLMs) in order to answer the following question:

Can Small Language Models (SLMs) achieve performance comparable to LLMs in extracting information from HTML?

I have also published a Substack article discussing the approach I take to tackle this question, my findings, thoughts, and more!

📦 Installation

git clone https://github.com/pradyGn/are-SLMs-performant.git
cd are-SLMs-performant
pip install -r requirements.txt

🧪 Notebooks

2025-05-26_finetuning-SML.ipynb: Contains code to fine-tune a language model of your choice.
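
Roughly, the fine-tuning step looks like the sketch below. This is illustrative only, not the notebook's exact code: the base model, dataset format, prompt template, and hyperparameters are assumptions made for the example (Llama-3.2-1B appears here simply because it is one of the models compared in `results/`).

```python
# Illustrative LoRA fine-tune of a small causal LM on HTML-to-structured-output
# pairs. Model name, dataset file, columns, and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-3.2-1B"  # gated on the Hub; requires access
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Hypothetical training file with "html" and "extracted" columns.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def to_features(example):
    # Hypothetical prompt template pairing raw HTML with its extracted target.
    prompt = (f"Extract the key information from this HTML:\n{example['html']}\n"
              f"### Answer:\n{example['extracted']}")
    return tokenizer(prompt, truncation=True, max_length=2048)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-slm", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-slm")
tokenizer.save_pretrained("finetuned-slm")
```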

2025-05-20_evaluate-SML.ipynb: Contains code to run inference with the fine-tuned language model.
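
As an illustration of this evaluation step, the sketch below loads the fine-tuned adapter and runs greedy decoding on one HTML snippet; the paths, prompt template, and generation settings are assumptions rather than the notebook's exact configuration.

```python
# Illustrative only: load the adapter saved by the fine-tuning sketch above and
# generate an answer for a single HTML snippet.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("finetuned-slm", torch_dtype=torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("finetuned-slm")

html = "<html><body><h1>ACME Widget</h1><p>Price: $19.99</p></body></html>"
prompt = f"Extract the key information from this HTML:\n{html}\n### Answer:\n"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the tokens generated after the prompt, i.e. the model's answer.
answer = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer)
```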

2025-04-07_results-comparison.ipynb: Contains helper functions and usage examples to compare the performance of the fine-tuned language models.
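
A minimal sketch of this comparison step, reading the parquet outputs under `results/` (see the folder structure below): the column names and the exact-match metric are assumptions, not necessarily what the notebook computes.

```python
# Illustrative only: score each model's saved outputs. The "prediction" and
# "ground_truth" column names are assumptions about the parquet schema.
import pandas as pd

def exact_match_rate(path: str) -> float:
    df = pd.read_parquet(path)
    return (df["prediction"].str.strip() == df["ground_truth"].str.strip()).mean()

for path in [
    "results/Llama-3.2-1B_test_dataset_output.parquet",
    "results/ReaderLM-v2_test_dataset_output.parquet",
]:
    print(f"{path}: exact match = {exact_match_rate(path):.3f}")
```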

πŸ“ Folder Structure

are-SLMs-performant/
├── notebooks/
│   ├── 2025-05-26_finetuning-SML.ipynb
│   ├── 2025-05-20_evaluate-SML.ipynb
│   └── 2025-04-07_results-comparison.ipynb
├── results/
│   ├── Llama-3.2-1B_test_dataset_output.parquet
│   ├── Llama-3.2-1B_unseen_test_dataset_output.parquet
│   ├── ReaderLM-v2_test_dataset_output.parquet
│   └── ReaderLM-v2_unseen_test_dataset_output.parquet
├── README.md
├── .gitignore
└── requirements.txt

🔗 References

πŸ™‹β€β™‚οΈ Contact
