embeddings

What are embeddings ?

Embeddings are representations of values or objects like text, images, and audio that are designed to be consumed by machine learning models and semantic search algorithms. They translate objects like these into a mathematical form according to the factors or traits each one may or may not have, and the categories they belong to.

What types they can be ?

Text Embeddings:

Word Embeddings (words to vectors)
Sentence Embeddings (sentences to vectors. more subtle nuances of language can be captured) Image Embeddings:
CNN (images to vectors. e. g. can be used for classification problems or to generate a new one)
Autoencoders (compress image to dense vector representation) Other types of Embeddings:
Graph Embeddings (recommendation systems)
Sequence Embeddings (for sequence analysis, for example, in time series)

Key terms used for Embedding

Vector - is a mathematical object that represents a quantity with both magnitude and direction. In machine learning, a vector typically refers to an ordered set of numerical values representing a data point or features in a multi-dimensional space.
Dense Vector - is a type of vector where most of its elements are non-zero. In the context of embeddings, dense vectors are used to represent discrete objects or values within a continuous multi-dimensional vector space. Dense vectors contain information about the attributes or features of the represented objects, and they are often utilized in machine learning tasks for their ability to capture intricate relationships and patterns within data.
Cosine Similarity - the most popular metrics to see the distance between two vectorized objects

What Object can be embedded ? (and which algorithms used for it)

Words (Word2Vec; GloVe (Global Vectors for Word Representation); FastText; BERT (Bidirectional Encoder Representations from Transformers))
Text Document (Doc2Vec; Universal Sentence Encoder (USE); BERT; ELMO)
Audio Data (VGGish; OpenL3; Wav2Vec) etc.

How do they work ? (briefly)

Embeddings work by transforming high-dimensional and sparse data into dense, low-dimensional representations in a continuous vector space

Which algorithms were used in this project ?

Word2Vec to vectorize sample data and to look for similar word
GloVe to test this algoritm and also to look for similar word in sentence
SentenceTransformer to use embeddings for news titles
KMeans to cluster those embedded news articles
Elbow method and Silhouette Score to calculate most powerful k-value for k-clusters
PCA for visualization of Word Embeddings

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
Screenshot 2025-02-05 at 01.00.31.png		Screenshot 2025-02-05 at 01.00.31.png
glove_large.6B.100d.txt		glove_large.6B.100d.txt
glove_large.6B.100d.word2vec		glove_large.6B.100d.word2vec
main.ipynb		main.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

embeddings

What are embeddings ?

What types they can be ?

Key terms used for Embedding

What Object can be embedded ? (and which algorithms used for it)

How do they work ? (briefly)

Which algorithms were used in this project ?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

embeddings

What are embeddings ?

What types they can be ?

Key terms used for Embedding

What Object can be embedded ? (and which algorithms used for it)

How do they work ? (briefly)

Which algorithms were used in this project ?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages