Embeddings: Turning Meaning into Numbers

Learn how AI represents text and data in vector space for similarity, search, and reasoning.

1. What are Embeddings?

Embeddings are numerical vectors representing words, sentences, or documents. They allow AI to measure **semantic similarity**: how close in meaning two pieces of text are.

Example: "I love AI" and "Artificial intelligence is amazing" would have similar embeddings, even though the words differ.

2. Visualizing Embeddings

Imagine a 2D map where similar words are close together: "cat" and "dog" are nearby, while "cat" and "car" are far apart.

```
# Conceptually:
# cat -> [0.1, 0.9]
# dog -> [0.2, 0.85]
# car -> [0.9, 0.1]
# Smaller Euclidean distance (or higher cosine similarity) -> closer in meaning
```
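The toy vectors above are enough to compute both measures directly. A minimal sketch in plain Python (the 2D vectors are illustrative, not real model outputs):

```python
import math

# Toy 2D "embeddings" from the conceptual map above
vectors = {
    "cat": [0.1, 0.9],
    "dog": [0.2, 0.85],
    "car": [0.9, 0.1],
}

def euclidean(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: closer to 1 means more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(euclidean(vectors["cat"], vectors["dog"]))  # small: cat and dog are close
print(euclidean(vectors["cat"], vectors["car"]))  # large: cat and car are far apart
```

Either measure gives the same ranking here; cosine similarity is what most embedding libraries report, because it ignores vector length and compares only direction.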

3. Simple HuggingFace Example

A Python example using the sentence-transformers library to generate embeddings:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

sentences = ["I love AI", "Artificial intelligence is amazing", "I enjoy hiking"]
embeddings = model.encode(sentences)

# Compare the first two sentences
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.2f}")
# The two AI sentences score much higher with each other than either does with "I enjoy hiking"
```
This shows how AI measures semantic closeness between texts. Higher cosine similarity means more similar meaning.

4. Applications

Embeddings power semantic search, document clustering, recommendation, deduplication, and retrieval-augmented generation, where relevant documents are fetched by similarity before a model answers.
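Semantic search is the most direct application: embed a query and every document, then rank documents by cosine similarity to the query. A minimal sketch with made-up toy vectors (a real system would use embeddings from a model like the one above):

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical precomputed document embeddings (toy 3-D vectors for illustration)
docs = {
    "Intro to machine learning": [0.9, 0.1, 0.2],
    "Hiking trails near Denver": [0.1, 0.9, 0.3],
    "Neural networks explained": [0.8, 0.2, 0.1],
}
query_embedding = [0.85, 0.15, 0.15]  # stand-in embedding for a query like "what is AI?"

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs.items(), key=lambda kv: cos_sim(query_embedding, kv[1]), reverse=True)
for title, _ in ranked:
    print(title)
```

Note the ranking works even though the query shares no words with the documents; that is the whole point of searching in embedding space rather than by keyword.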

5. Try It Yourself

Pick a few sentences or short paragraphs. Generate embeddings and calculate similarity to see which are closest in meaning. Visualize with a 2D plot if you like!
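For the 2D plot, high-dimensional embeddings first need to be projected down to two components, e.g. with PCA. A minimal sketch using NumPy with random stand-in vectors (swap in real embeddings from `model.encode`):

```python
import numpy as np

def pca_2d(embeddings):
    """Project rows of `embeddings` onto their top two principal components."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)           # center the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T              # (x, y) coordinates in the top-2 component plane

# Stand-in for real sentence embeddings (e.g. 384-dim vectors from all-MiniLM-L6-v2)
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(5, 384))

points = pca_2d(fake_embeddings)
print(points.shape)  # (5, 2): one point per sentence, ready for a scatter plot
```

From here, `matplotlib.pyplot.scatter(points[:, 0], points[:, 1])` with text labels gives the kind of 2D map sketched in section 2.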

6. Inspirational Quote

"Meaning is hidden in numbers; embeddings reveal it."