1. What is a Vector Database?
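A vector database stores **high-dimensional embeddings** (numeric vectors that represent text, images, or other data) and indexes them so that similar items can be retrieved quickly. Unlike traditional databases, which typically match on exact values or keywords, vector DBs rank results by **semantic similarity**.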
Example: Searching "AI ethics" in a vector DB might also return results like "responsible AI guidelines" based on meaning.
2. How it Works
- Store embeddings (vectors) for text, images, or other data.
- Use a similarity or distance metric (cosine, dot product, or Euclidean/L2) to find the closest matches.
- Use approximate nearest-neighbor (ANN) indexes to keep search fast across millions of vectors.
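Popular options: Pinecone (managed service); Milvus, Weaviate, and Qdrant (open-source databases); and FAISS (an open-source similarity-search library rather than a full database).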
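To make the similarity metrics above concrete, here is a minimal pure-Python sketch of dot product, cosine similarity, and squared L2 distance. The function names and the toy 3-dimensional vectors are illustrative assumptions, not part of any vector DB's API.

```python
import math

def dot(a, b):
    """Dot product: larger means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def squared_l2(a, b):
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors, made up for illustration
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]  # same direction as query, twice the length
orthogonal = [0.0, 3.0, 0.0]      # shares no direction with query

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    print(
        name,
        "dot:", dot(query, vec),
        "cosine:", round(cosine_similarity(query, vec), 3),
        "squared L2:", squared_l2(query, vec),
    )
```

Cosine similarity ignores vector length, so same_direction scores about 1.0 even though its squared L2 distance from the query is not zero. Which metric fits best usually depends on how the embeddings were trained.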
3. Simple Python Example using FAISS
import numpy as np
import faiss
# Sample embeddings (3 vectors of dimension 5); FAISS expects float32
vectors = np.array([
    [0.1, 0.3, 0.2, 0.7, 0.5],
    [0.2, 0.1, 0.4, 0.6, 0.3],
    [0.9, 0.7, 0.8, 0.2, 0.1]
], dtype='float32')
# Build an exact (brute-force) index over 5-dimensional vectors
index = faiss.IndexFlatL2(5)  # ranks by squared L2 (Euclidean) distance
index.add(vectors)
# Query vector: shape (1, 5), same dtype as the indexed data
query = np.array([[0.15, 0.25, 0.3, 0.65, 0.4]], dtype='float32')
# Retrieve the 2 nearest neighbors
distances, indices = index.search(query, 2)
print("Closest vectors:", indices)
print("Distances:", distances)
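The search returns the indices of the two stored vectors closest to the query (here, vectors 0 and 1) along with their squared L2 distances. IndexFlatL2 compares the query against every stored vector, which is fine for small collections; production systems typically rely on approximate nearest-neighbor indexes to search millions of vectors efficiently.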
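To see what an exact index computes, the following sketch reproduces the search above in plain Python: it measures the squared L2 distance from the query to every stored vector and keeps the k smallest. The helper name brute_force_search is hypothetical; a flat FAISS index performs the same exhaustive comparison in optimized native code.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors nearest to query."""
    # Score every stored vector, then sort ascending by distance
    scored = sorted((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Same data as the FAISS example
vectors = [
    [0.1, 0.3, 0.2, 0.7, 0.5],
    [0.2, 0.1, 0.4, 0.6, 0.3],
    [0.9, 0.7, 0.8, 0.2, 0.1],
]
query = [0.15, 0.25, 0.3, 0.65, 0.4]

distances, indices = brute_force_search(vectors, query, k=2)
print("Closest vectors:", indices)
print("Distances:", [round(d, 4) for d in distances])
```

The two nearest vectors are 0 and 1, with squared distances of roughly 0.0275 and 0.0475. The cost of this approach grows linearly with the number of stored vectors, which is the main reason large collections move to approximate indexes.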
4. Applications
- Semantic search engines
- RAG systems (Retrieval-Augmented Generation)
- Recommendation engines
- Image or multimedia similarity search
- AI knowledge retrieval
5. Try It Yourself
Generate embeddings for a few sentences or documents.
Store them in a vector DB (like FAISS or Milvus) and perform similarity search.
Visualize results to see how similar items cluster.
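As a starting point, the steps above can be sketched end to end without installing anything. This is a toy under stated assumptions: embed is a hypothetical word-count embedding standing in for a real embedding model, and store is a plain Python list standing in for a vector DB. Word counts only capture keyword overlap, so swap in a real model to see true semantic matches.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy embedding: word counts over a fixed vocabulary (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

documents = [
    "vector databases store embeddings",
    "cats sleep most of the day",
    "embeddings capture meaning in vectors",
]

# Step 1: generate embeddings for each document
vocabulary = sorted({word for doc in documents for word in doc.lower().split()})

# Step 2: store (document, embedding) pairs; a list stands in for the vector DB
store = [(doc, embed(doc, vocabulary)) for doc in documents]

# Step 3: similarity search, ranking stored items by cosine similarity
def search(query, k=2):
    q = embed(query, vocabulary)
    ranked = sorted(store, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search("how do embeddings store meaning"))
```

The query shares terms with the two embedding-related sentences, so they rank ahead of the unrelated one. Replacing the list with FAISS or Milvus changes how the vectors are stored and searched, but the overall flow stays the same.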
6. Inspirational Quote
"Vectors carry knowledge; databases let it speak."