AI Fundamentals - Vector Databases
A simple, easy-to-understand introduction to the vector databases that power AI models and applications
Hello guys, in the rapidly evolving world of AI, managing and searching through high-dimensional data efficiently is becoming a critical challenge. This is where vector databases step in as a powerful solution.
Earlier, we talked about The Complete AI and LLM Engineering Roadmap, RAG fundamentals, and 10 Must Read AI and LLM Engineering Books. In today’s article, we will discuss what a vector database is and why we need one.
For this, I have partnered with Rajendra, an Engineering Manager, Software Architect, Thought Leader, IIT Delhi alumnus, and the author behind the popular Kite newsletter, who unpacks the fundamentals of vector databases. Rajendra shares practical insights on why vector databases matter, how they power AI applications like semantic search and recommendation systems, and what you need to know as a developer or architect to use them effectively.
Let’s dive into this essential building block for modern AI systems.
What is a Vector Database? (The Simple Answer)
Imagine you're organizing a massive library, but instead of sorting books alphabetically, you organize them by how similar they are to each other.
Romance novels go near each other, science fiction clusters together, and cookbooks sit in their own section.
A vector database does something similar, but with any kind of data - text, images, audio, you name it.
Let's Start with Vectors (Don't Worry, It's Simple!)
What's a vector? Think of it as a list of numbers that describes something, where each number measures one characteristic.
Toy Example: Let's say I want to describe different fruits:
Apple: [5, 2, 8] (sweetness=5, sourness=2, size=8)
Lemon: [1, 9, 3] (sweetness=1, sourness=9, size=3)
Orange: [6, 4, 7] (sweetness=6, sourness=4, size=7)
See how each fruit becomes a list of numbers? That's a vector!
Why Do We Need This?
Traditional databases are like filing cabinets - you need to know exactly what drawer to open.
If I ask a regular database "find me something like an apple," it has no clue what "like" means.
But with vectors, I can ask: "Find me fruits similar to [5, 2, 8]" and it can calculate that oranges [6, 4, 7] are pretty close!
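To make "pretty close" concrete, here is a minimal Python sketch that measures the straight-line (Euclidean) distance between the apple vector and the other fruits. Euclidean distance is an assumption on my part; it is just one common way to compare vectors, and the function name is made up for illustration.

```python
import math

# Fruit vectors from the toy example: (sweetness, sourness, size)
fruits = {
    "apple": [5, 2, 8],
    "lemon": [1, 9, 3],
    "orange": [6, 4, 7],
}

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = fruits["apple"]

# Distance from the apple to every other fruit
distances = {
    name: euclidean_distance(query, vec)
    for name, vec in fruits.items()
    if name != "apple"
}

closest = min(distances, key=distances.get)
print(closest, round(distances[closest], 2))   # orange 2.45
print("lemon", round(distances["lemon"], 2))   # lemon 9.49
```

The smaller the distance, the more similar the fruit, so the orange wins easily over the lemon.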
Real-World Example: Netflix Recommendations
When Netflix suggests movies, it's not just looking at genres. It creates a vector for each movie based on hundreds of characteristics:
Movie: "The Matrix" → [0.8, 0.2, 0.9, 0.1, 0.7...] (action=0.8, romance=0.2, sci-fi=0.9, comedy=0.1, etc.)
When you like The Matrix, Netflix finds other movies with similar vectors. That's why you get Blade Runner, not The Notebook!
How Does "Similarity" Work?
Think of it like measuring distance between points on a map. If two vectors are "close" in this mathematical space, the things they represent are similar.
Simple example:
Cat: [4, 1, 8, 9] (fluffy=4, barks=1, small=8, pet=9)
Dog: [3, 9, 7, 9] (fluffy=3, barks=9, small=7, pet=9)
Lion: [5, 2, 1, 2] (fluffy=5, barks=2, small=1, pet=2)
Cat and Dog are more similar to each other than either is to Lion, even though cats and lions are both felines!
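You can check this claim yourself. The sketch below uses cosine similarity, which compares the direction of two vectors and returns a score where higher means more alike. Choosing cosine similarity is my assumption here (the example above does not name a metric), but it is a very common choice for this kind of comparison.

```python
import math

# Animal vectors from the example: (fluffy, barks, small, pet)
animals = {
    "cat": [4, 1, 8, 9],
    "dog": [3, 9, 7, 9],
    "lion": [5, 2, 1, 2],
}

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat_dog = cosine_similarity(animals["cat"], animals["dog"])
cat_lion = cosine_similarity(animals["cat"], animals["lion"])
dog_lion = cosine_similarity(animals["dog"], animals["lion"])

print(f"cat-dog:  {cat_dog:.2f}")   # cat-dog:  0.84
print(f"cat-lion: {cat_lion:.2f}")  # cat-lion: 0.65
print(f"dog-lion: {dog_lion:.2f}")  # dog-lion: 0.67
```

Cat and Dog score highest, mostly because they agree on the "small" and "pet" features, which is exactly what the chosen features were designed to capture.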
The Magic: How Do We Get These Vectors?
This is where it gets cool. We use AI models called embedding models (relatives of the ones behind ChatGPT) to automatically convert things into vectors:
Text: "I love pizza" → [0.23, -0.45, 0.78, 0.12, ...]
Images: Photo of a sunset → [0.67, 0.34, -0.12, 0.89, ...]
Audio: Beatles song → [0.45, -0.23, 0.56, 0.78, ...]
The AI learns patterns from millions of examples and creates these numerical descriptions automatically.
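A real embedding model is far too large to show here, but the shape of the operation is simple: text goes in, a fixed-length list of numbers comes out. The sketch below is a deliberately crude stand-in that hashes each word into one of a few buckets. It is not a learned model and captures no meaning at all; the function name `toy_embed` and the choice of 8 dimensions are made up for illustration.

```python
import zlib

def toy_embed(text, dims=8):
    """Map text to a fixed-length vector by hashing each word into a bucket.

    A stand-in for a real embedding model: the output has the right shape,
    but the numbers carry no learned meaning.
    """
    vector = [0.0] * dims
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        bucket = zlib.crc32(word.encode("utf-8")) % dims
        vector[bucket] += 1.0
    return vector

vector = toy_embed("I love pizza")
print(len(vector), sum(vector))  # 8 3.0
```

Whatever the input length, the output always has the same number of dimensions, which is what lets a database compare any two items. What a trained model adds on top is that similar meanings end up with similar numbers.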
Real Applications You Use Daily
Google Search: When you search "cute puppy videos," Google can convert your query into a vector and match it against pages with similar vectors, alongside its other ranking signals.
Spotify: Creates vectors for songs based on rhythm, genre, mood, etc. That's how Discover Weekly works!
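ChatGPT-style assistants: The language model itself does not look things up in its training data. But apps built around it with the RAG pattern convert your question to a vector and fetch the most relevant documents from a vector database for the model to answer from.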
Photo Apps: When you search "beach" in your photos, the app finds pictures with vectors similar to typical beach scenes.
Why Not Just Use Regular Databases?
Let me give you a concrete example:
Traditional Database Query: "Find all customers named John"
Perfect! Exact matches.
Vector Database Query: "Find customers similar to this one who might like our new product"
Traditional database: 🤷‍♀️ "I don't know what 'similar' means"
Vector database: 💡 "Here are customers with similar purchase patterns, demographics, and behavior!"
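Here is a small Python sketch of the two kinds of query side by side. The customers, their names, and their vectors are all invented for illustration; each vector is assumed to hold interest scores for (electronics, books, sports) on a 0 to 1 scale.

```python
import math

# Hypothetical customers; names and numbers are made up.
# Each vector is (electronics, books, sports) interest on a 0-1 scale.
customers = [
    {"id": 1, "name": "John", "vector": [0.9, 0.1, 0.3]},
    {"id": 2, "name": "Maria", "vector": [0.8, 0.2, 0.4]},
    {"id": 3, "name": "Wei", "vector": [0.1, 0.9, 0.2]},
    {"id": 4, "name": "John", "vector": [0.2, 0.8, 0.1]},
]

# Traditional query: exact match on a field
johns = [c["id"] for c in customers if c["name"] == "John"]

# Vector query: rank everyone else by distance to customer 1
reference = customers[0]
others = [c for c in customers if c["id"] != reference["id"]]
ranked = sorted(
    others,
    key=lambda c: math.dist(reference["vector"], c["vector"]),
)

print(johns)                        # [1, 4]
print([c["name"] for c in ranked])  # ['Maria', 'John', 'Wei']
```

Notice that the two Johns share a name but not a taste. The exact-match query groups them together, while the similarity query correctly puts Maria closest to the first John.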
The Technical Magic (Simplified)
1. Convert everything to vectors using AI models
2. Store vectors in a special database optimized for similarity search
3. When querying: Convert your question to a vector too
4. Find similar vectors using mathematical distance calculations
5. Return the original data that those similar vectors represent
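The five steps above can be sketched as a tiny in-memory store. Everything here is a simplification under stated assumptions: `embed` is a toy word counter standing in for a real AI model, `TinyVectorStore` is a hypothetical class name, and the search compares the query against every stored vector one by one. Real vector databases typically rely on approximate nearest-neighbor indexes so they do not have to scan everything.

```python
import math

# Toy vocabulary; a real embedding model needs no hand-picked word list.
VOCAB = ["pizza", "pasta", "recipe", "dog", "cat", "training"]

def embed(text):
    """Toy stand-in for an AI embedding model: count vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # all-zero vectors match nothing

class TinyVectorStore:
    def __init__(self):
        self.rows = []  # list of (vector, original_text) pairs

    def add(self, text):
        # Steps 1 and 2: convert to a vector, store it with the original
        self.rows.append((embed(text), text))

    def search(self, query, k=1):
        # Step 3: convert the question to a vector too
        query_vec = embed(query)
        # Step 4: score every stored vector (brute-force scan)
        scored = sorted(
            self.rows,
            key=lambda row: cosine_similarity(query_vec, row[0]),
            reverse=True,
        )
        # Step 5: return the original data behind the best matches
        return [text for _, text in scored[:k]]

store = TinyVectorStore()
for doc in ["pizza recipe with pasta", "dog training tips", "cat and dog care"]:
    store.add(doc)

print(store.search("easy pasta recipe"))  # ['pizza recipe with pasta']
```

The query shares no exact phrase with the stored text, yet the right document comes back because their vectors point in a similar direction. Swap the toy `embed` for a real model and the same flow starts matching on meaning rather than shared words.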
Think of it like having a super-smart librarian who understands the essence of what you're looking for, not just the exact words you use.
That’s all, guys. If you liked this explanation of vector databases, don’t forget to subscribe to Rajendra’s Substack, Kite, where he shares interesting articles like this one.
All the best with your AI journey.
Here are a few more AI and LLM related articles you may like to read: