1. What are Embeddings?
Embeddings are numerical vectors that represent words, sentences, or documents. They allow AI systems to measure **semantic similarity**: how close in meaning two pieces of text are.
Example: "I love AI" and "Artificial intelligence is amazing" would have similar embeddings, even though they share almost no words.
2. Visualizing Embeddings
Imagine a 2D map where similar words are close together:
"cat" and "dog" are nearby, "cat" and "car" are far apart.
# Conceptually:
# cat -> [0.1, 0.9]
# dog -> [0.2, 0.85]
# car -> [0.9, 0.1]
# Smaller Euclidean distance = more similar meaning
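To make the toy map concrete, here is a minimal NumPy sketch using those invented 2D vectors (real embeddings have hundreds of dimensions; these numbers are purely for illustration):
import numpy as np

# Toy 2D "embeddings" from the conceptual map above
cat = np.array([0.1, 0.9])
dog = np.array([0.2, 0.85])
car = np.array([0.9, 0.1])

# Smaller distance = more similar
print(np.linalg.norm(cat - dog))  # ~0.11 (close together)
print(np.linalg.norm(cat - car))  # ~1.13 (far apart)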
3. Simple HuggingFace Example
A Python example using the sentence-transformers library to generate embeddings and compare them:
from sentence_transformers import SentenceTransformer, util

# Load a small, general-purpose embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ["I love AI", "Artificial intelligence is amazing", "I enjoy hiking"]
embeddings = model.encode(sentences)

# Compare the two AI-related sentences
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.3f}")
# Output: a relatively high score, well above what the AI/hiking pair would get
This shows how AI measures semantic closeness between texts. Higher cosine similarity means more similar meaning.
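Under the hood, cosine similarity is just the dot product of two vectors divided by the product of their lengths. Here is a minimal sketch, reusing the toy 2D vectors from earlier (util.cos_sim computes the same thing on real embeddings):
import numpy as np

def cosine_similarity(a, b):
    # cos(angle) = (a . b) / (|a| * |b|); ranges from -1 to 1
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.1, 0.9])
dog = np.array([0.2, 0.85])
print(cosine_similarity(cat, dog))  # ~0.99, nearly parallel vectors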
4. Applications
- Semantic search (like Google or RAG retrieval; see the sketch after this list)
- Recommendation systems
- Question answering and chatbots
- Clustering similar documents
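To see the first application in miniature, here is a hedged semantic-search sketch: embed a query and a few documents, then rank the documents by cosine similarity to the query. The documents and query here are made up for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Hypothetical mini document store
docs = [
    "How to train a neural network",
    "Best hiking trails in the Alps",
    "Introduction to machine learning",
]
query = "getting started with ML"

doc_embeddings = model.encode(docs)
query_embedding = model.encode(query)

# Score every document against the query, then rank best-first
scores = util.cos_sim(query_embedding, doc_embeddings)[0].tolist()
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")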
5. Try It Yourself
Pick a few sentences or short paragraphs. Generate embeddings and calculate similarity to see which are closest in meaning.
Visualize with a 2D plot if you like!
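If you want to try the plot, here is one way to do it, assuming scikit-learn and matplotlib are installed: PCA projects the high-dimensional embeddings down to 2D so you can eyeball which sentences land near each other.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ["I love AI", "Artificial intelligence is amazing", "I enjoy hiking"]
embeddings = model.encode(sentences)

# Reduce the 384-dimensional embeddings to 2D for plotting
points = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(points[:, 0], points[:, 1])
for (x, y), text in zip(points, sentences):
    plt.annotate(text, (x, y))
plt.show()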
6. Inspirational Quote
"Meaning is hidden in numbers; embeddings reveal it."