Vector similarity explained: metrics, algorithms, & best infrastructure
Building AI apps that understand meaning requires more than keyword matching. Your search engine needs to understand that "cheap flights" and "budget airfare" are asking for the same thing. Your RAG system needs to find relevant documents without exact matches. Your AI agent needs to recall past conversations even when the wording changes. Vector similarity makes all of this possible.
The math behind it is straightforward, but scaling it in production isn't. You need to choose the right similarity metric for your use case, pick algorithms that deliver sub-100ms latency at scale, and build infrastructure that doesn't bottleneck when you're searching millions of vectors.
This guide breaks down how vector similarity works under the hood, which metrics and algorithms to use, and how to scale it in production without performance bottlenecks.
Understanding vector similarity
Vector similarity is the mathematical measurement of how close two data points are in a high-dimensional vector space. It's the foundation for semantic search, RAG, recommendation systems, AI agent memory, and most modern AI features.
What are vectors?
A vector is a list of numbers that represents data in a way machines can process. In AI, vectors capture the meaning of text, images, audio, or any other data type as coordinates in a multi-dimensional space.
It’s like plotting words on a graph where similar meanings are clustered together. Two words like "king" and "queen" would be close to each other, while "king" and "refrigerator" would be far apart. Vectors make this kind of positioning possible by encoding semantic relationships as numerical positions.
In practice, AI models convert raw data like sentences, product descriptions, user behavior, and images into vectors with hundreds or thousands of dimensions. These high-dimensional vectors can capture nuanced relationships that simple keyword matching can't.
What is vector similarity?
Vector similarity is the process of measuring how close two vectors are in that multi-dimensional space. When vectors are close together, the data they represent is semantically similar. When they're far apart, the data is unrelated. Vector similarity connects raw data to meaningful AI outputs through a pipeline of embeddings, semantic relationships, similarity metrics, and search algorithms.
Vector embeddings
Vector embeddings are the numerical representations that make similarity possible. Embedding models convert text, images, or other data into dense vectors, ensuring that words with similar meanings get similar vectors, sentences about related topics cluster together, and images with similar content map to nearby coordinates.
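For example, here's a minimal sketch of generating embeddings with the open-source sentence-transformers library. The model name is one common choice, not a requirement; any embedding model works similarly:

```python
# Minimal embedding sketch (pip install sentence-transformers).
# "all-MiniLM-L6-v2" is one common open-source model, used here as an example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["cheap flights", "budget airfare", "king", "refrigerator"]

# Each sentence becomes a 384-dimensional vector; semantically similar
# sentences ("cheap flights" / "budget airfare") land close together.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 384)
```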
Semantic relationships
Embeddings encode semantic relationships that go beyond surface-level text matching. This is what makes vector similarity powerful for AI applications.
Semantic search uses embeddings to capture conceptually similar items. Exact keyword matches might not return "queen" when you search for "king," but vector similarity understands the relationship. Instacart uses this approach to help users filter through noisy product data across hundreds of retailers.
RAG systems depend on semantic relationships to retrieve relevant context. When an employee queries personnel records, vector similarity ensures the system returns relevant documents even if the exact phrasing doesn't match what's stored.
AI agents use semantic relationships for memory. As agents work through tasks, they build context that needs to be retrieved later. Vector similarity allows agents to recall relevant past interactions without exact keyword matches.
Measuring similarity
Once you have vectors, you need a way to measure how close they are. The three most commonly used similarity metrics are cosine similarity, dot product, and Euclidean distance. Each handles direction and magnitude differently, and choosing the right one affects your application's accuracy.
| Metric | Considers direction | Considers magnitude |
|---|---|---|
| Cosine similarity | ✅ | ❌ |
| Dot product | ✅ | ✅ |
| Euclidean distance | ✅ | ✅ |
Cosine similarity
Cosine similarity measures the angle between two vectors, regardless of their magnitude: you calculate it as the cosine of the angle between them. A score of 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates complete dissimilarity.
Because cosine similarity disregards the length of the vectors, focusing only on directionality, it’s most useful when the overall scale of the compared vectors’ values isn’t meaningful. For example, if you’re querying across documents, the fact that one document might be much longer than another isn’t meaningful when looking for substantive similarity.
Cosine similarity can become limiting, however, because it misses differences in scale that can be meaningful in certain contexts.
Dot product
The dot product multiplies the corresponding components of two vectors and sums the results. Because it considers both direction and magnitude, two vectors that point the same way score higher when they're also longer. For vectors normalized to unit length, the dot product and cosine similarity produce identical rankings.
That sensitivity to magnitude is the dot product's strength. It's a natural fit when scale carries meaning, such as recommendation systems where a user's activity level should influence results, and it's the right choice when your embedding model was trained with a dot product loss.
Euclidean distance
Euclidean distance measures the straight-line distance between vectors, treating them as points in geometric space. You calculate it by taking the difference between corresponding vector components, squaring each difference, summing them, and taking the square root.
A smaller Euclidean distance means vectors are closer in terms of all their component values. This metric works well when absolute differences in feature values matter—like comparing user profiles with count-based features where the frequency of items or events is meaningful.
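Here's a short NumPy illustration of all three metrics. The two vectors point in the same direction but differ in magnitude, which is exactly where the metrics diverge:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

# Cosine similarity: direction only, so a and b score a perfect 1.0
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Dot product: direction and magnitude, so b's larger scale raises the score
dot = np.dot(a, b)

# Euclidean distance: straight-line distance between the two points
euclidean = np.linalg.norm(a - b)

print(cosine, dot, euclidean)  # 1.0, 28.0, ~3.74
```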
When to use each metric
Your choice of similarity metric should match your use case:
- Cosine similarity: Use for text similarity, document comparison, and semantic search where document length varies. Best when you care about meaning.
- Dot product: Use for recommendation systems, collaborative filtering, and applications where magnitude represents importance (like user activity levels). Also use when your embedding model was trained with dot product loss.
- Euclidean distance: Use for clustering, anomaly detection, and applications where absolute differences in feature values matter. Works well for count-based features and spatial data.
If you're unsure which metric will work best, or if your use cases are still evolving, choose infrastructure that supports all three. RedisVL, for example, supports cosine, dot product, and Euclidean distance and integrates with LangChain, SpringAI, and LlamaIndex.
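As a rough sketch, here's how a RedisVL index schema might declare its distance metric. The field names, dimensions, and exact schema keys are illustrative and may vary by RedisVL version:

```python
from redisvl.index import SearchIndex

# Illustrative schema; adjust dims and names to your embedding model.
schema = {
    "index": {"name": "docs", "prefix": "doc"},
    "fields": [
        {"name": "content", "type": "text"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 384,                  # match your embedding model
                "distance_metric": "cosine",  # or "ip" (dot product) / "l2" (Euclidean)
                "algorithm": "hnsw",
                "datatype": "float32",
            },
        },
    ],
}

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)
```

Swapping metrics is a one-line schema change, which makes it practical to benchmark all three against your own data.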
Vector similarity algorithms
It’s computationally expensive to calculate similarity across millions or billions of vectors. At production scale, you need algorithms that trade a small amount of precision for massive speed improvements.
These are some of the most commonly used vector similarity algorithms:
- HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph where searches start at sparse top layers and navigate down through denser layers. Delivers 95%+ recall while being orders of magnitude faster than brute-force search. The tradeoff is memory—HNSW stores the graph structure alongside vectors.
- ScaNN (Scalable Nearest Neighbors): Developed by Google, ScaNN uses learned quantization to compress vectors while preserving similarity relationships. Excels at balancing speed, accuracy, and memory efficiency for large-scale deployments.
- IVF (Inverted File Index): Partitions vectors into clusters using k-means, then searches only relevant clusters at query time. Memory-efficient for very large datasets, but recall can suffer near cluster boundaries. Often combined with product quantization (IVF-PQ) for further memory savings.
When you’re choosing an algorithm, consider the tradeoffs between query speed, recall accuracy, memory usage, and index build time. The right choice depends on your dataset size, latency requirements, and infrastructure constraints.
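To make the HNSW tradeoffs concrete, here's a minimal sketch using the open-source hnswlib library. The parameter values are typical starting points, not recommendations for any particular workload:

```python
import hnswlib
import numpy as np

dim, num_vectors = 128, 100_000
data = np.random.rand(num_vectors, dim).astype(np.float32)

# Build the multi-layer graph. M controls graph connectivity (more memory,
# better recall); ef_construction controls build-time accuracy.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_vectors, ef_construction=200, M=16)
index.add_items(data)

# ef is the query-time speed/recall knob: higher = slower but more accurate.
index.set_ef(50)

query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)  # approximate top-5 neighbors
```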
What makes vector similarity hard to scale
The mathematics at the foundation of vector similarity is relatively simple. You don’t need to be a data scientist, ML engineer, or mathematician to understand the basic mechanics. The nuance, the complexity, and ultimately, the scalability and success of these approaches emerge when you consider vector similarity at production scale.
Performance at scale
In theory, computing vector similarity is just a mathematical calculation. In practice, computing it across large vector sets in production is a systems challenge that routinely runs into latency bottlenecks.
High-dimensional vectors are expensive to store and process at scale. In a production context, that work is compounded by infrastructure overhead, including index structures and sharding. All of this increases the memory footprint, which drives up infrastructure costs. These costs can also translate into performance issues, especially in use cases like real-time search that require sub-millisecond response times.
Throughout, there are tradeoffs between memory and latency. Some approaches use more memory to run faster queries; others reduce memory and increase computation costs; and others compress vectors to reduce the memory costs of distance calculation, even though it might require more computation.
Operational complexity
Especially given the numerous performance issues and cost tradeoffs mentioned above, operational complexity can make vector similarity hard to scale. There are many open-source options, such as FAISS, a library that supports similarity search and dense vector clustering, but they tend to be complex.
These options tend to be powerful, but the infrastructure setup can be complex, and the complexity costs can accumulate over time, making maintenance difficult. Hosted solutions can often take care of this complexity for you, but there’s a tradeoff. Hosted solutions often introduce lock-in or inflexibility, making it difficult to iterate.
Integration challenges
Similarity search is one of the most frequent occasions where vector similarity comes into play, and it also highlights one of the scalability challenges: integrations. Teams building similarity search often need to plug into existing stacks. They might already be using AI frameworks, such as LangChain, semantic caches, or memory systems.
All of these tools and approaches can be powerful, but different toolstacks can struggle to scale when used in complex multi-modal or agentic workflows. The Redis Vector Library (RedisVL), in contrast, simplifies indexing and querying, making it easier to manage similarity metrics.
How Redis supports scalable vector similarity
Redis has supported vector storage since Redis 7.2 introduced scalable vector similarity search in 2023, followed by RedisVL in 2024. In April 2025, Redis announced vector sets, a data type designed specifically for vector similarity that reduces memory use, simplifies indexing, and optimizes real-time similarity queries.
Vector search built for real-time AI workflows
Redis Cloud provides sub-millisecond latency with native support for cosine, dot product, and Euclidean vector similarity through RedisVL. Organizations use Redis vector search to power RAG, chatbots, semantic caching, and long-term agent memory, without latency that damages user experience.
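For instance, a k-nearest-neighbor query in RedisVL looks roughly like this (continuing the illustrative schema sketched earlier; exact APIs may differ across RedisVL versions):

```python
from redisvl.query import VectorQuery

# query_vector comes from the same embedding model used at index time.
query = VectorQuery(
    vector=query_vector,
    vector_field_name="embedding",
    return_fields=["content"],
    num_results=3,
)
results = index.query(query)  # top-3 semantically closest documents
```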
There’s a significant performance gap between Redis and competitors. Redis achieves 9.5X higher QPS and 9.7X lower latency than Amazon Aurora PostgreSQL with pgvector, 11X higher QPS and 14.2X lower latency than MongoDB Atlas, and 53X higher QPS with latency improvements over Amazon OpenSearch. At scale, Redis delivers sub-100ms latency for vector search operations, with 95th percentile latency at 30ms under heavy usage. With Redis 8, searching 1 billion vectors achieves 90% precision at 200ms median latency.
Developer-ready & flexible
Infrastructure is only as good as it is usable. Powerful features hidden behind overly complex interfaces lose their power, and tools that don’t integrate well with the rest of the stack aren’t as useful as they could be.
Redis integrates with a wide range of tools and frameworks developers are already familiar with, including LangChain, LlamaIndex, SpringAI, and more. RedisVL is open source, easy to use, and available through Redis Cloud and hybrid or on-premises environments.
Redis in action
Redis supports organizations across industries with their similarity workloads:
- Relevance AI, which helps companies build AI agents, uses Redis to power vector search with sub-millisecond latency, allowing AI agents to retrieve relevant information and generate instant responses.
- Superlinked, a compute framework and cloud infrastructure provider, used Redis Cloud to build a highly responsive, scalable vector database sustaining heavy usage with 95th percentile latency at 30ms.
- Docugami, an AI-powered document engineering platform, uses Redis as a vector database to enable RAG, in-context learning, and vector search.
Jacky Koh, Co-Founder and CEO of Relevance AI, put it simply: "Every millisecond counts, and slow vector searches were limiting our AI agents from delivering instant, accurate responses."
Redis vs. other vector infrastructure options
Redis is not the only vector infrastructure available, but customers frequently test it and find it to be the fastest, most scalable option. For example, Daniel Svonava, co-founder of Superlinked, said his team “had very specific requirements for a vector database. We looked at the available options and determined that the Redis Cloud best fit our needs.”
- Pinecone is a fully managed, cloud-native vector database offered primarily as a managed service, whereas Redis supports both hosted and self-managed options.
- FAISS is an open-source option that’s highly performant but doesn’t offer a REST API or native service layer, unlike Redis.
- Weaviate is another open-source option, but its GraphQL interface can be intimidating compared to the more intuitive interface Redis offers.
- Milvus is a specialized vector database, but it requires significant overhead, whereas Redis minimizes overhead while offering fine-tuning options.
- Elasticsearch offers vector search, but because Elasticsearch isn’t optimized for this use case, Redis can maintain greater performance and recall at scale.
Redis vs. Pinecone
Pinecone is a cloud-native, fully managed vector database. Pinecone is optimized for vector search and supports serverless architecture for scalability and hybrid search to enhance search accuracy. Pinecone is primarily offered as a managed service on cloud platforms, such as AWS and Azure, meaning that Pinecone is limited as a self-hosted option.
Redis, in contrast, supports both hosted and self-managed options, allowing companies full control over how and where they build their vector databases and vector search features. This matters for industries like financial services and healthcare where data residency, compliance requirements, or security policies require on-premises deployments.
Redis vs. FAISS, Weaviate, Milvus, & Elasticsearch
There is a wide variety of open source options available to support vector infrastructure, but each poses separate limitations.
FAISS, for example, is highly performant but infrastructure-heavy. FAISS doesn’t offer a REST API or native service layer, meaning it’s only suitable for teams that want to build everything from scratch.
Weaviate includes numerous built-in assumptions and requires more ramp-up for teams to understand. The GraphQL interface adds complexity compared to Redis' more intuitive approach.
Milvus is a specialized vector database that is fairly performant, but it introduces operational overhead that poses tradeoffs to that performance. Milvus also requires separate metadata stores and GPU tuning to reach optimal speed.
Elasticsearch is familiar to most teams and does offer vector search via dense vector fields, but it isn't optimized for this use case. Performance and recall can degrade at scale, making it better suited for hybrid keyword and semantic search rather than pure vector workloads.
Own your vector search stack
AI is changing daily, and catching up can be a Pyrrhic victory if you buy into tools that create vendor lock-in. Instead, developers and teams should own their vector search stack so they have the freedom and flexibility to move faster, ship better, and build smarter AI solutions.
Redis gives teams that control without sacrificing performance, allowing teams to blend flexibility, usability, and production readiness so that developers can use vectors however they need to.
Ultimately, as Svonava, co-founder of Superlinked, put it, “Users expect great search and recommendation functionality in every application and website they encounter, yet more than 80 percent of business data is unstructured—stored as text, images, audio, video, or other formats. That’s why vector databases with powerful search features will fuel the next generation of applications.”
If you want to build the next generation of applications, Redis is the infrastructure you need. Ready to own your vector search stack? Check out the Redis Vector Library (RedisVL) and then try Redis free or request a demo.
FAQs about vector similarity
When should I use vector similarity vs. keyword search?
Use vector similarity when you need semantic understanding, like finding results that mean the same thing even with different words. Use keyword search for exact matches or when precision on specific terms matters. Many applications combine both through hybrid search.
How do I choose between cosine, dot product, & Euclidean for my use case?
Start with your data and what matters most.
- Use cosine similarity if you're comparing text or documents of varying lengths and only care about semantic meaning.
- Use dot product if magnitude carries meaning in your application, like user activity levels in recommendations or when your embedding model was trained with dot product loss.
- Use Euclidean distance if you need to measure absolute differences in feature values, like count-based user profiles or spatial data.
When in doubt, test all three on your actual data and measure recall.
When should I use each vector similarity algorithm?
The algorithm you should use depends on your priorities around speed, memory, and accuracy.
- Use HNSW when query speed and high recall are priorities and you have memory to spare. It's the best general-purpose choice for most production workloads.
- Use ScaNN when you need to balance speed, accuracy, and memory efficiency at large scale, especially in memory-constrained environments.
- Use IVF when you're working with very large datasets and memory efficiency matters more than peak recall. It's a good fit when you can tolerate slightly lower accuracy near cluster boundaries.
For most AI applications, start with HNSW and optimize from there.
What's the difference between exact & approximate nearest neighbor search?
Exact nearest neighbor search compares a query vector against every vector in your database. It’s accurate, but slow at scale. Approximate nearest neighbor (ANN) algorithms like HNSW, ScaNN, and IVF trade a small amount of precision for massive speed improvements, making real-time search possible with millions or billions of vectors.
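To make the contrast concrete, here's what exact search looks like in NumPy on toy data. Every query scans all n vectors, which is why it breaks down at scale:

```python
import numpy as np

# 10,000 stored vectors and one query vector (toy data).
vectors = np.random.rand(10_000, 128).astype(np.float32)
query = np.random.rand(128).astype(np.float32)

# Cosine similarity of the query against every stored vector: O(n * d) work.
scores = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
top_5 = np.argsort(scores)[-5:][::-1]  # indices of the 5 most similar vectors
```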
How fast is Redis for vector similarity search?
Redis achieves 9.5X higher QPS than Amazon Aurora PostgreSQL with pgvector, 11X higher QPS than MongoDB Atlas, and 53X higher QPS than Amazon OpenSearch. At scale, Redis delivers sub-100ms latency with 95th percentile at 30ms. Searching 1 billion vectors with Redis 8 achieves 90% precision at 200ms median latency.
Get started with Redis today
Speak to a Redis expert and learn more about enterprise-grade Redis today.
