AI Search Engine | CTO Program Project

🎯 Project Goal

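This project demonstrates how to rank documents by relevance to a query using TF-IDF weighting and cosine similarity, a classic lexical retrieval technique that underlies keyword search and still serves as a baseline or first-stage retriever in RAG systems.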
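As a rough sketch of what these two techniques compute, the pure-Python example below builds TF-IDF vectors by hand and compares them with cosine similarity. The helper names (`tokenize`, `tfidf_vectors`, `cosine`) and the three toy texts are illustrative assumptions, not part of the project. The tokenizer is deliberately simplistic, and the smoothed IDF formula is the one scikit-learn applies by default.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical, simplified tokenizer: lowercase, split on whitespace, strip punctuation.
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(texts):
    """Return one {term: weight} dict per text, weighted by tf * idf."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so rare terms weigh more.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

texts = ["cloud cost report", "machine learning models", "learning to cook"]
vecs = tfidf_vectors(texts + ["machine learning"])  # last vector is the query
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]
print(scores)
```

The text that shares both query terms scores highest, the one sharing only "learning" scores lower, and the one with no shared terms scores zero.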

⚙️ Implementation

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
  "Our company launched a new AI analytics tool for predictive insights.",
  "The quarterly finance report showed steady revenue growth.",
  "We migrated to AWS and Azure for scalable cloud infrastructure.",
  "The latest marketing campaign focused on SEO optimization.",
  "This article discusses AI tools and machine learning applications."
]

query = "AI and machine learning"

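# Fit on the documents plus the query so both share one vocabulary and IDF weighting.
# The query becomes the last row of the TF-IDF matrix.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs + [query])

# Cosine similarity between the query (last row) and every document row.
cosine_scores = cosine_similarity(tfidf[-1], tfidf[:-1]).flatten()

# Print every document with its score, most relevant first.
print("🔍 Top Results:")
for doc, score in sorted(zip(docs, cosine_scores), key=lambda x: x[1], reverse=True):
    print(f"({score:.2f}) {doc}")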

💡 Example Output

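🔍 Top Results:
(0.56) This article discusses AI tools and machine learning applications.
(0.10) Our company launched a new AI analytics tool for predictive insights.
(0.10) We migrated to AWS and Azure for scalable cloud infrastructure.
(0.00) The quarterly finance report showed steady revenue growth.
(0.00) The latest marketing campaign focused on SEO optimization.

Scores are approximate and assume scikit-learn's default TfidfVectorizer settings. The top result is the only document that contains every query term. The next two each share a single term with the query ("AI" and "and" respectively), so their scores are effectively tied and their order may vary. The last two share no terms with the query and score zero.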
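The script above fits the vectorizer on the documents and the query together, which is convenient for a demo but means refitting for every query. A minimal sketch of an alternative, assuming the corpus is indexed once and each query is only transformed, is shown below. The function names `build_index` and `search`, the `top_k` parameter, and the choice of `stop_words="english"` are illustrative assumptions rather than part of the original project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_index(corpus):
    """Hypothetical helper: fit the vectorizer once on the corpus only."""
    # Assumption: filter English stop words so terms like "and" do not count as matches.
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(corpus)
    return vectorizer, doc_matrix

def search(query, corpus, vectorizer, doc_matrix, top_k=3):
    """Hypothetical helper: return up to top_k (score, document) pairs with a nonzero score."""
    query_vec = vectorizer.transform([query])  # reuse the fitted vocabulary and IDF
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]    # indices of the highest scores first
    return [(float(scores[i]), corpus[i]) for i in ranked if scores[i] > 0]

corpus = [
    "Our company launched a new AI analytics tool for predictive insights.",
    "The quarterly finance report showed steady revenue growth.",
    "We migrated to AWS and Azure for scalable cloud infrastructure.",
    "The latest marketing campaign focused on SEO optimization.",
    "This article discusses AI tools and machine learning applications.",
]

vectorizer, doc_matrix = build_index(corpus)
results = search("AI and machine learning", corpus, vectorizer, doc_matrix)
for score, doc in results:
    print(f"({score:.2f}) {doc}")
```

Because the fit and the stop-word handling differ from the script above, the scores from this variant will not match the ones it prints. With stop words removed, the cloud-infrastructure document no longer matches on "and" and drops out of the results.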

📘 CTO Takeaway

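This classic approach remains cheap to compute and easy to interpret, since every score traces back to the terms a query and a document share. That makes it a practical baseline for enterprise document retrieval and a useful lexical component in RAG systems. Because it matches only on shared terms, it misses synonyms and paraphrases and can reward common words such as "and" unless stop words are filtered out.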