Choosing a vector database is one of the first decisions in any RAG project. Three options dominate developer usage: Pinecone, Chroma, and Weaviate. Here’s a practical comparison.
## Quick Overview
| | Pinecone | Chroma | Weaviate |
|---|---|---|---|
| Hosting | Managed cloud only | Local or cloud | Self-hosted or cloud |
| Setup time | 5 minutes | 1 minute | 10–20 minutes |
| Free tier | Yes (1 index) | Yes (local) | Yes (self-hosted) |
| Best for | Production scale | Development, prototyping | Hybrid search, complex queries |
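All three rank documents by a vector similarity metric, most commonly cosine similarity (the Pinecone example below sets `metric="cosine"` explicitly). As a quick refresher on what that score means, here is the underlying math in plain Python; the three-dimensional vectors are made-up toy embeddings, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit hundreds of dimensions)
query = [0.9, 0.1, 0.3]
doc_close = [0.8, 0.2, 0.25]   # points in nearly the same direction as query
doc_far = [-0.1, 0.9, -0.4]    # points in a very different direction

print(round(cosine_similarity(query, doc_close), 3))  # close to 1.0
print(round(cosine_similarity(query, doc_far), 3))    # much lower
```

The databases differ in *how* they find the nearest vectors at scale (approximate-nearest-neighbor indexes), but the score they return is this kind of similarity.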
## Chroma: Start Here
Chroma runs in-process — no server needed. Perfect for development and small projects.
```bash
pip install chromadb sentence-transformers
```

```python
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient("./chroma_db")

ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection("docs", embedding_function=ef)

collection.add(
    documents=[
        "RAG stands for Retrieval-Augmented Generation.",
        "Fine-tuning updates model weights on new data.",
        "Vector search finds semantically similar content.",
    ],
    ids=["doc1", "doc2", "doc3"],
)

results = collection.query(query_texts=["how does RAG work?"], n_results=2)
print(results["documents"])
```
**Pros:** zero setup, runs locally, great for prototyping.
**Cons:** not designed for multi-node scale, no built-in auth.
## Pinecone: Managed Scale
Pinecone is a fully managed vector database. No infrastructure to run.
```bash
pip install pinecone sentence-transformers  # the pinecone-client package was renamed to pinecone
```

```python
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

pc.create_index(
    name="kalyna-docs",
    dimension=384,  # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("kalyna-docs")
model = SentenceTransformer("all-MiniLM-L6-v2")

texts = ["RAG is a retrieval technique.", "Fine-tuning changes model weights."]
embeddings = model.encode(texts).tolist()

index.upsert(vectors=[
    {"id": "v1", "values": embeddings[0], "metadata": {"text": texts[0]}},
    {"id": "v2", "values": embeddings[1], "metadata": {"text": texts[1]}},
])

query_vec = model.encode(["how to retrieve documents?"]).tolist()[0]
results = index.query(vector=query_vec, top_k=2, include_metadata=True)
for match in results.matches:
    print(match.metadata["text"], "| score:", round(match.score, 3))
```
**Pros:** zero ops, scales automatically, fast at large scale.
**Cons:** vendor lock-in, can get expensive, free tier limits.
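In production you will upsert far more than two vectors, and Pinecone's docs recommend sending them in batches rather than one giant request (around 100 vectors per batch is a common choice; treat that number as an assumption to tune). A small client-agnostic helper for chunking:

```python
from typing import Iterator, TypeVar

T = TypeVar("T")

def batched(items: list[T], batch_size: int = 100) -> Iterator[list[T]]:
    """Yield successive fixed-size batches from a list."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Usage sketch with the hypothetical `vectors` list and `index` from above:
# for batch in batched(vectors, 100):
#     index.upsert(vectors=batch)

print([len(b) for b in batched(list(range(250)), 100)])  # [100, 100, 50]
```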
## Weaviate: Hybrid Search
Weaviate supports both vector search and keyword (BM25) search — called hybrid search.
```bash
pip install weaviate-client
# Note: the text2vec-transformers vectorizer used below also requires that
# module to be enabled in your Weaviate deployment.
docker run -p 8080:8080 cr.weaviate.io/semitechnologies/weaviate:latest
```

```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

client.collections.create(
    "Document",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ],
)

collection = client.collections.get("Document")
collection.data.insert({
    "content": "RAG retrieves documents at inference time.",
    "source": "guide",
})

results = collection.query.hybrid(
    query="retrieval augmented generation",
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    limit=2,
)
for obj in results.objects:
    print(obj.properties["content"])

client.close()
```
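To build intuition for what `alpha` controls: hybrid search produces a keyword (BM25) ranking and a vector ranking, then fuses them into one score. Weaviate's actual fusion algorithms are more involved, but a simplified alpha-weighted blend over made-up normalized scores illustrates the idea:

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Blend normalized vector and keyword (BM25) scores; alpha weights the vector side."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# Made-up normalized scores for one document
vector_score, keyword_score = 0.9, 0.3

print(hybrid_score(vector_score, keyword_score, alpha=1.0))  # pure vector
print(hybrid_score(vector_score, keyword_score, alpha=0.0))  # pure keyword
print(hybrid_score(vector_score, keyword_score, alpha=0.5))  # equal blend
```

Lower `alpha` when exact terms (product codes, names) matter; raise it when paraphrased queries should still match.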
**Pros:** hybrid search, strong filtering, active community.
**Cons:** more complex setup, steeper learning curve.
## Which One to Pick?

- **Chroma** — prototyping, demos, local development. You'll be up and running in 10 minutes.
- **Pinecone** — production workloads where you don't want to manage infrastructure. You pay for the convenience.
- **Weaviate** — hybrid search (semantic + keyword), complex filters, multi-tenant systems.
## With Claude (RAG Example)
```python
import anthropic

def rag_answer(question: str) -> str:
    # Retrieve the top matching chunks from the vector store (Chroma here)
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Answer based on this context only:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text
```
This pattern works identically with Pinecone or Weaviate — just swap the retrieval step.
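One way to keep that swap trivial is to pass the retrieval step in as a function, so the prompt-building logic never knows which database is behind it. A minimal sketch in plain Python; `fake_retrieve` is a hypothetical stand-in for the Chroma, Pinecone, or Weaviate queries shown above:

```python
from typing import Callable

def rag_prompt(question: str, retrieve: Callable[[str], list[str]]) -> str:
    """Build the grounded prompt; `retrieve` hides which vector DB is in use."""
    context = "\n".join(retrieve(question))
    return f"Answer based on this context only:\n{context}\n\nQuestion: {question}"

# Stand-in retriever; in practice this wraps collection.query / index.query /
# collection.query.hybrid and returns the matched document texts.
def fake_retrieve(question: str) -> list[str]:
    return ["RAG retrieves documents at inference time."]

print(rag_prompt("What is RAG?", fake_retrieve))
```

Swapping databases then means writing one new `retrieve` function and changing nothing else.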
Originally published at kalyna.pro