This repository contains hands-on experiments with:
- text generation using small language models
- prompt engineering techniques
- word and sentence embeddings
- semantic similarity using cosine distance
- transformer-based applications
The goal is to understand practical behaviour of LLMs and embeddings through experimentation.
The experiments live in four scripts:
- llm_foundations.py – Text generation and tokenisation experiments
- prompt_engineering.py – Summarisation, Q&A, and creative prompting
- embeddings.py – Word and sentence embeddings using GloVe
- sentiment_analysis.py – Transformer-based sentiment analysis demo (see the sketch below)
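For orientation, here is a minimal sketch of what a transformer-based sentiment demo might look like, assuming the Hugging Face transformers pipeline API; the actual sentiment_analysis.py may pin a different model or interface:

```python
from transformers import pipeline

# Uses the pipeline's default sentiment model (a DistilBERT fine-tuned
# on SST-2); the real demo may configure a specific checkpoint instead.
classifier = pipeline("sentiment-analysis")

print(classifier("The model followed the instructions surprisingly well."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```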
This project uses different models for different tasks based on observed behaviour rather than convenience. During experimentation, it became clear that no single small language model performs well across all use cases:
- DistilGPT-2 was used for free-form text generation because it produces diverse and creative continuations when sampling is enabled, making it suitable for open-ended prompts. However, it was less reliable for structured tasks such as summarisation or factual question answering. (See the sampling sketch after this list.)
- Flan-T5 (instruction-tuned) was used for summarisation, Q&A, and prompt engineering experiments. It followed explicit instructions more consistently, and its outputs improved significantly when few-shot examples and decoding controls such as beam search and repetition penalties were applied. Flan-T5 was also preferred for tokenisation experiments due to its cleaner and more interpretable subword tokens. (See the decoding sketch after this list.)
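A minimal sketch of the kind of sampled, open-ended generation described above, assuming the transformers pipeline API; the prompt and sampling parameters are illustrative, not the exact values used in llm_foundations.py:

```python
from transformers import pipeline, set_seed

# Free-form generation with sampling enabled.
generator = pipeline("text-generation", model="distilgpt2")
set_seed(42)  # only to make this sketch reproducible

outputs = generator(
    "The strangest thing about the old lighthouse was",
    max_new_tokens=40,
    do_sample=True,    # sampling is what produces diverse, creative continuations
    temperature=0.9,   # higher temperature keeps outputs varied
    top_p=0.95,        # nucleus sampling trims the unlikely tail
    num_return_sequences=2,
)
for out in outputs:
    print(out["generated_text"])
```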
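And a sketch of the instruction-following setup with decoding controls; the checkpoint name (google/flan-t5-small), prompt, and parameter values here are assumptions for illustration:

```python
from transformers import AutoTokenizer, pipeline

# Assumed checkpoint; any Flan-T5 size behaves the same way.
model_name = "google/flan-t5-small"

# Instruction-following with beam search and a repetition penalty.
generator = pipeline("text2text-generation", model=model_name)
prompt = (
    "Summarise in one sentence: Small generative models behave very "
    "differently depending on the task and the decoding strategy used."
)
result = generator(
    prompt,
    num_beams=4,             # beam search: more deterministic, instruction-faithful output
    repetition_penalty=1.3,  # discourages repeated phrases
    max_new_tokens=40,
)
print(result[0]["generated_text"])

# Flan-T5's SentencePiece tokeniser produces relatively interpretable subword pieces.
tokeniser = AutoTokenizer.from_pretrained(model_name)
print(tokeniser.tokenize("tokenisation experiments"))
```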
A key insight from the project is the difference between LLMs and embeddings. LLMs are probabilistic and generative, making them powerful but sometimes inconsistent. Embeddings, on the other hand, are deterministic representations of meaning and are highly reliable for similarity, clustering, and retrieval tasks. Using GloVe embeddings with cosine similarity demonstrated how semantic relationships can be captured without text generation.
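A short sketch of the embedding side of that contrast, assuming GloVe vectors loaded through gensim's downloader; the model name and word pairs are illustrative:

```python
import numpy as np
import gensim.downloader as api

# "glove-wiki-gigaword-50" is one of several GloVe variants gensim can fetch.
glove = api.load("glove-wiki-gigaword-50")

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 for aligned vectors, near 0.0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Deterministic: the same word always maps to the same vector, so these
# scores never change between runs (unlike sampled LLM output).
print(cosine_similarity(glove["king"], glove["queen"]))   # high: related meanings
print(cosine_similarity(glove["king"], glove["banana"]))  # low: unrelated
```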
Overall, the main takeaway is that effective NLP systems are task-driven and often combine generative models with embeddings, rather than relying on a single model for all problems.