You need more than a vector database

Unless you’ve been off the grid the past two years, you’ve likely come across the term “vector database” in conversations about GenAI.

I joined Redis in 2022, pre-ChatGPT, when vectors were largely unknown to most software developers. Back then, my team advocated for vector and vector search use cases, but the story was a hard sell until the rise of Large Language Models (LLMs). Just look at how the Google search trends for the term have exploded over time:

There are now over 40 vector database options to choose from, including Redis.

In 2024, vector databases became the go-to for building retrieval-augmented generation (RAG) systems. Why? Because vectors power semantic search between user questions and document chunks. But it’s not quite that simple.

Reflecting on our customers’ success, lessons learned, and the rapidly shifting AI landscape, there’s still an elephant in the room that needs addressing—one that might be news to some and obvious to others: you need more than a vector database.

Information retrieval is not new

Long before the hype around LLMs and vector search, the information retrieval (IR) community spent decades making sense of vast amounts of text and unstructured data. Algorithms like TF-IDF and BM25 still power world-class search engines. These methods generate scores based on how often words appear in a document and how rare they are across the entire corpus, highlighting terms that best characterize a document. 
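
To make that concrete, here's a minimal TF-IDF sketch in plain Python (BM25 refines the same idea with term saturation and length normalization; the corpus and whitespace tokenizer here are toy illustrations, not a production ranker):

```python
import math
from collections import Counter

# Toy corpus: in practice these would be your document chunks.
corpus = [
    "redis is an in-memory data store",
    "vector search finds nearest neighbors",
    "redis supports vector search and semantic caching",
]

def tf_idf(term: str, doc: str, docs: list[str]) -> float:
    """Score a term higher if it's frequent in this doc but rare across the corpus."""
    tokens = doc.split()
    tf = Counter(tokens)[term] / len(tokens)
    df = sum(1 for d in docs if term in d.split())
    idf = math.log(len(docs) / (1 + df))  # +1 avoids division by zero
    return tf * idf

# "caching" appears in only one document, so it characterizes that doc strongly;
# "redis" appears in two, so its score is discounted.
for term in ("caching", "redis"):
    print(term, [round(tf_idf(term, d, corpus), 3) for d in corpus])
```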

Though the spotlight shifted to vectors, vector database offerings are scrambling to catch up with lexical search—like it’s 1990 all over again.

Why? The most effective retrieval solutions don’t choose one over the other. Instead, they often implement some form of hybrid search that combines signals from both vector and lexical search. Vectors might be the hot topic today, but they’re building on decades of innovation—standing on the shoulders of giants.
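
As a rough sketch of that idea, hybrid search can be as simple as blending a normalized lexical score with a normalized vector score using a tunable weight (the `alpha` parameter and pre-normalized scores below are simplifying assumptions, not any particular engine's API):

```python
def hybrid_score(lexical: float, semantic: float, alpha: float = 0.7) -> float:
    """Blend a BM25-style lexical score with a vector similarity score.

    Both inputs are assumed to be normalized to [0, 1]; alpha controls
    how much weight the semantic signal gets.
    """
    return alpha * semantic + (1 - alpha) * lexical

# A chunk that matches keywords strongly but is semantically weaker...
print(hybrid_score(lexical=0.9, semantic=0.4))   # 0.55
# ...versus one with few keyword matches but a close semantic match.
print(hybrid_score(lexical=0.2, semantic=0.95))  # 0.725
```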

Vectors are just lists of numbers

In the field, I’ve seen non-technical leaders get wide-eyed hearing about the power of vectors and semantic search. But vectors aren’t magic; they’re just numerical fingerprints of data.

Part of the issue here is that we tend to make things sound more complicated than they actually are.

Top tech companies like Google, Amazon, and Meta have used embeddings for recommendations and personalization for decades. Back at my previous startup, I had to train a two-tower deep learning model from scratch (for days) just to generate quality word embeddings.

Today, we can create them on-demand with off-the-shelf APIs from the likes of HuggingFace, Cohere, and OpenAI.
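
For example, a few lines with an off-the-shelf library produce an embedding on demand (the sentence-transformers library and model name here are illustrative choices, not a recommendation):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# Any off-the-shelf embedding model works; this one is a small, popular default.
model = SentenceTransformer("all-MiniLM-L6-v2")

embedding = model.encode("How do I reset my password?")
print(embedding.shape)  # (384,) -- literally just a list of numbers
```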

Tooling and techniques have improved, but the core concept remains the same.

Vector search is applied linear algebra

At its core, vector search compares lists of numbers to find "nearest neighbors." The math—dot product, cosine similarity, or Euclidean distance—has been well understood for decades. Advanced indexing techniques that accelerate search, like HNSW or Microsoft's DiskANN, are well known and commonly implemented in vector databases today.
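
Stripped of the indexing tricks, a brute-force nearest-neighbor search fits in a few lines of NumPy (a toy sketch; real indexes like HNSW avoid scanning every vector):

```python
import numpy as np

# Pretend corpus of 1,000 embeddings with 384 dimensions each.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(1000, 384))
query = rng.normal(size=384)

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k most similar vectors by cosine similarity."""
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = corpus_norm @ query_norm  # one dot product per document
    return np.argsort(scores)[::-1][:k]

print(cosine_top_k(query, corpus))
```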

The hard part isn’t the vector math.

The real challenge begins where the math ends

In practice, the hardest parts of bringing a RAG system to production aren't related to vector math at all. Complexity arises from managing real-time updates, dynamic re-indexing, large data volumes, fluctuating query loads, and disaster prevention and recovery. These are the operational challenges that determine whether your platform can deliver consistent performance at scale. Redis, for instance, has long offered multi-tenancy, high availability, active-active replication, and durability—capabilities that pure vector database vendors are now scrambling to implement.

As Chip Huyen (a well-known ML researcher and founder) notes, the ambiguity of LLM responses, the need for prompt versioning and evaluation, and the careful consideration of cost and latency all illustrate that success comes from mastering more than just retrieval. While I won't detail an entire LLMOps framework here, let's highlight a few key considerations that underscore why you need more than a vector database.

Managing access: security and governance

The very first hurdle you will likely encounter is protecting your most valuable assets: your proprietary data and your customers' sensitive information. Unfortunately, it's all too easy to pass the wrong pieces of information over the web to a third-party model. This introduces new requirements into the development lifecycle.

Financial services companies that we work with invest heavily in model risk management (MRM) processes to mitigate the potential for harm. Other companies introduce strict document governance to ensure users only access documents appropriate to their roles.

While necessary, these processes introduce added complexity and slowdowns in delivery timelines.
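
For illustration, one common pattern for the role-based document governance mentioned above is to filter the vector query itself by a role tag stored with each chunk. Here's a hedged sketch using redis-py's search API; the index name, field names, embedding dimension, and role values are assumptions for illustration:

```python
import numpy as np
import redis
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Hypothetical index "docs" with a TAG field "role" and a VECTOR field "embedding".
user_roles = "hr|finance"  # roles the current user is allowed to see
query_vector = np.random.rand(384).astype(np.float32)  # stand-in for a real query embedding

q = (
    Query(f"(@role:{{{user_roles}}})=>[KNN 5 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("content", "score")
    .dialect(2)
)

# Only chunks tagged with an allowed role are even considered by the KNN search.
results = r.ft("docs").search(q, query_params={"vec": query_vector.tobytes()})
for doc in results.docs:
    print(doc.content, doc.score)
```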

Managing state: session management

LLMs don't retain memory between inference calls, and any distributed app needs a notion of a "session." So if you're building a conversational app or a multi-step LLM workflow, you'll need a data layer to store and retrieve session context and chat history in real time. Relying on in-process app memory is fraught with peril as soon as you scale beyond a single user!
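
A minimal sketch of such a data layer, assuming one Redis list per session (the key naming and 24-hour TTL are illustrative choices):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def append_message(session_id: str, role: str, content: str) -> None:
    """Append a chat turn to the session's history and keep it for 24 hours."""
    key = f"session:{session_id}:history"
    r.rpush(key, json.dumps({"role": role, "content": content}))
    r.expire(key, 60 * 60 * 24)

def load_history(session_id: str, last_n: int = 20) -> list[dict]:
    """Fetch the most recent turns to feed back into the next LLM call."""
    key = f"session:{session_id}:history"
    return [json.loads(m) for m in r.lrange(key, -last_n, -1)]

append_message("abc123", "user", "What plans do you offer?")
print(load_history("abc123"))
```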

Controlling cost with caching

At AWS re:Invent a few weeks back, one of our marquee RAG customers shared that they spend over $80k per quarter on their OpenAI bill for text generation alone (input and output tokens). They estimate that roughly 30-40% of their calls are similar to previously asked questions, which lines up with academic research on this very topic. The lead architect said he's worried about the end-of-year conversation with the CTO and CFO about cost optimization.

I don't blame him.

Third party LLMs can be expensive. Repeatedly calling them for redundant queries is wasteful. Semantic caching is a helpful technique for minimizing latency, increasing throughput, and reducing costs. By caching results keyed by semantic similarity rather than exact strings, you can quickly deliver answers without repeatedly hitting LLMs. 
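
The core idea fits in a few lines: embed the incoming prompt, look for a previously cached prompt within a similarity threshold, and only call the LLM on a miss. The sketch below keeps the cache in memory, and `embed()` and `call_llm()` are stand-in stubs you'd replace with your embedding model and LLM client; libraries like RedisVL wrap this same pattern behind a dedicated semantic cache backed by Redis.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: in practice, call your embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.normal(size=384)
    return vec / np.linalg.norm(vec)

def call_llm(prompt: str) -> str:
    """Stand-in for an expensive third-party LLM call."""
    return f"answer to: {prompt}"

cache: list[tuple[np.ndarray, str]] = []  # (prompt embedding, cached response)

def cached_answer(prompt: str, threshold: float = 0.92) -> str:
    """Serve semantically similar prompts from cache; only call the LLM on a miss."""
    vec = embed(prompt)
    for cached_vec, response in cache:
        if float(np.dot(vec, cached_vec)) >= threshold:  # cosine sim on unit vectors
            return response  # cache hit: no token cost, no extra latency
    response = call_llm(prompt)  # cache miss: pay once, reuse later
    cache.append((vec, response))
    return response

print(cached_answer("What is your refund policy?"))
print(cached_answer("What is your refund policy?"))  # repeat: served from cache
```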

And much more

As apps grow in complexity, you may find other techniques or patterns helpful:

  • AI gateways: centrally manage access to LLMs and enforce rate limits across teams. Gateways like Kong or open source proxy services like LiteLLM are often necessary to control the flow of requests to different models. We see this pattern applied heavily in large banks and financial institutions.
  • Semantic routing: route queries based on their meaning/intent (see the sketch after this list). Product support questions go to one pipeline, HR questions to another. Send simple queries to a cheaper LLM and more complex reasoning-heavy requests to your heavy hitters like OpenAI's o1.
  • Embedding caching: cache embeddings to avoid re-embedding the exact same chunk of data repeatedly.
  • Agent checkpointing: if you’re using agent-like architectures with tools like CrewAI or LangGraph, save intermediate states so they can resume reasoning without retracing every step.
  • Message brokering & streaming: use message streams (like Redis Streams) to orchestrate multi-step LLM pipelines and handle real-time updates efficiently.
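
To make the routing idea concrete, here's a small sketch that assigns a query to whichever route's example phrases it sits closest to in embedding space; the route names, example phrases, and model choice are all made-up assumptions for illustration, and dedicated semantic routers in the ecosystem implement the same idea more robustly.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed model choice; any embedding model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each route is described by a few representative phrases (made up for illustration).
routes = {
    "product_support": ["my order hasn't arrived", "the app keeps crashing"],
    "hr": ["how much vacation do I have left", "update my benefits enrollment"],
}
route_centroids = {
    name: np.mean(model.encode(examples), axis=0) for name, examples in routes.items()
}

def route(query: str) -> str:
    """Send the query to whichever route's centroid it is closest to (cosine)."""
    q = model.encode(query)
    scores = {
        name: float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        for name, c in route_centroids.items()
    }
    return max(scores, key=scores.get)

print(route("I can't log in to the mobile app"))    # likely -> product_support
print(route("Who do I ask about parental leave?"))  # likely -> hr
```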

Below is an example architecture based on Redis that includes many of these components in an end-to-end flow.

Also, check out this great reference architecture from the Dell AI Factory that demonstrates a similar approach.

TLDR: Vector search is only one piece of the puzzle.

As the GenAI market matures, users will demand platforms that cover the full gamut—full-text search, vectors, caching, message streaming, session management, and beyond. Expect consolidation; nobody wants to juggle half a dozen single-purpose tools while trying to build something real.

We’re also heading into a world of agent-driven workflows with multiple LLM calls per request. You’ll need robust infrastructure, battle-tested features, and a data platform that can handle it all. Redis has spent years delivering operational excellence at enterprise scale, while newcomers scramble to catch up.

The future belongs to those who look past the hype and embrace holistic solutions. You need more than a vector database—start thinking comprehensively today. After all, the real challenge begins once you’ve nailed the math.

To learn more, check out Redis for AI or read our docs.