The Standard Recommendation and Why It Is Incomplete
pgvector 0.5.0 introduced HNSW (Hierarchical Navigable Small World) indexing alongside the existing IVFFlat index type. The pgvector benchmarks, the documentation, and most tutorials consistently recommend HNSW: it achieves higher recall at equivalent query latency, does not require knowing the number of vectors in advance, and handles incremental inserts without requiring a full rebuild. For the typical read-heavy similarity search workload — embedding a query, finding the top-K nearest neighbors — HNSW is the better index in most benchmarks.
The cases where IVFFlat is operationally preferable are real, they come up regularly in production deployments, and they are underrepresented in the documentation that teams rely on when making the initial architecture decision.
How Each Index Works
IVFFlat
IVFFlat (Inverted File with Flat compression) divides the vector space into a fixed number of clusters (the lists parameter) using k-means. Each vector is assigned to its nearest cluster centroid. At query time, the search scans the probes closest clusters and computes exact distances against the vectors in those clusters.
The index is built once, offline, from the full dataset. Once built, it is static — new vectors are inserted into the appropriate cluster but the cluster centroids do not update. As the vector distribution drifts from what it was at build time, recall quality degrades. Periodically rebuilding the index restores recall quality.
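As a minimal sketch of that build-then-query model (the table, its tiny 3-dimension column, and the values are illustrative assumptions, not a real schema):
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
-- (bulk load vectors here)
-- Build the index after the load; k-means runs once over the existing rows
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 10);
-- At query time, choose how many clusters to probe
SET ivfflat.probes = 3;
SELECT id FROM items ORDER BY embedding <=> '[0.1, 0.2, 0.3]' LIMIT 5;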
HNSW
HNSW builds a multi-layer graph where each vector is connected to its approximate nearest neighbors at each layer. Queries traverse from the top layer (few nodes, coarse navigation) down through increasingly dense layers to find approximate nearest neighbors. The graph is maintained incrementally — each INSERT updates the graph structure without requiring a full rebuild.
The index build is significantly more memory-intensive than an IVFFlat build. The m parameter (number of connections per node) and ef_construction (search depth during build) control the index quality/build-time tradeoff. Higher values produce better recall but require more memory during construction and more storage for the index itself.
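A comparable sketch of the incremental model, again on an illustrative 3-dimension table (names, parameters, and values are assumptions):
CREATE TABLE items_hnsw (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items_hnsw USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- New rows are linked into the graph at INSERT time; no rebuild step
INSERT INTO items_hnsw (embedding) VALUES ('[0.4, 0.5, 0.6]');
-- Query-time search depth is a session-level setting
SET hnsw.ef_search = 40;
SELECT id FROM items_hnsw ORDER BY embedding <=> '[0.1, 0.2, 0.3]' LIMIT 5;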
When IVFFlat Wins: The Four Cases
Case 1: Bulk Load Followed by Read-Heavy Query Phase
If your workload is batch — load a large corpus of embeddings, then serve similarity queries with minimal new inserts — IVFFlat's build-then-query model fits naturally. You build the index after the bulk load with optimal cluster configuration, then queries run against a fully-tuned index.
HNSW's incremental, graph-based build on a bulk load is substantially slower and more memory-intensive than IVFFlat's one-shot k-means clustering. On a 10M vector load into a 1536-dimension collection, HNSW build time can be 3-5x longer than IVFFlat's, and the build requires significantly more RAM.
-- IVFFlat: build after bulk load
-- Rule of thumb: lists = row_count / 1000 for up to 1M rows
-- lists = sqrt(row_count) for larger collections
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000); -- for ~1M vectors
-- Sanity-check column statistics after ANALYZE (pg_stats cannot show IVFFlat cluster assignments)
SELECT
n_distinct,
most_common_vals
FROM pg_stats
WHERE tablename = 'embeddings' AND attname = 'embedding';
-- HNSW equivalent for comparison
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Case 2: Memory-Constrained Environments
HNSW's memory footprint during index build scales with the number of vectors, their dimensionality, and the m parameter: the build keeps the vectors themselves plus the graph connectivity data in memory, ideally within maintenance_work_mem. For high-dimension embeddings (1536-d from OpenAI, 3072-d from text-embedding-3-large) and large collections, the build-time memory requirement can exceed available RAM on smaller instances; once the graph no longer fits in maintenance_work_mem, the build spills to disk and becomes extremely slow.
-- Estimate HNSW build memory requirement
-- Approximate formula: rows * m * 2 * 4 bytes (for graph edges)
-- Plus: rows * dimensions * 4 bytes (for the vectors themselves)
-- For 500K vectors at 1536 dimensions, m=16:
-- Graph edges: 500,000 * 16 * 2 * 4 = ~64 MB
-- Vectors: 500,000 * 1536 * 4 = ~3 GB
-- Total build memory: ~3+ GB needed in maintenance_work_mem
-- Set for HNSW build session
SET maintenance_work_mem = '4GB';
-- IVFFlat k-means clustering is much less memory-intensive
-- maintenance_work_mem of 256MB-1GB typically sufficient
SET maintenance_work_mem = '512MB';
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 500); -- 500K rows / 1000, per the rule of thumb above
On db.r6g.2xlarge Aurora instances (64 GB RAM) with other databases sharing memory, allocating 4+ GB to maintenance_work_mem for an HNSW build is often impractical. IVFFlat builds comfortably within tighter memory constraints.
Case 3: Vector Distribution Drift with Scheduled Rebuilds
If your embedding collection represents a domain that evolves significantly over time — product catalog embeddings that update monthly, document embeddings that change as a knowledge base is rewritten, user preference embeddings that shift with user behavior — the vector distribution at query time may differ substantially from the distribution at index build time.
For HNSW, incremental inserts degrade the graph structure compared to a fresh build from the current distribution, and refreshing the graph means re-running the full, expensive construction (REINDEX, or drop and recreate). IVFFlat, designed around periodic rebuilds, handles this explicitly:
-- IVFFlat: periodic rebuild to refresh cluster centroids
-- Can be done concurrently without locking reads
-- Step 1: build new index concurrently
CREATE INDEX CONCURRENTLY embeddings_ivfflat_new
ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
-- Step 2: swap (drop old, rename new)
BEGIN;
DROP INDEX embeddings_ivfflat; -- the existing index (assumed name)
ALTER INDEX embeddings_ivfflat_new RENAME TO embeddings_ivfflat;
COMMIT;
-- Schedule this rebuild based on insert volume
-- Common trigger: rebuild when new inserts exceed 20% of total rows
SELECT
relname,
n_live_tup,
n_ins_since_vacuum
FROM pg_stat_user_tables
WHERE relname = 'embeddings';
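The query above surfaces the raw counters; a sketch that evaluates the 20% trigger directly (the threshold and table name follow the example above, and note that n_ins_since_vacuum resets at each VACUUM, so it only approximates inserts since the last rebuild):
SELECT
relname,
n_live_tup,
n_ins_since_vacuum,
(n_ins_since_vacuum::numeric / NULLIF(n_live_tup, 0)) > 0.20 AS rebuild_due
FROM pg_stat_user_tables
WHERE relname = 'embeddings';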
Case 4: Filtered Similarity Search at Scale
Both index types work with WHERE clause filters on non-vector columns. However, their behavior with aggressive filtering differs. When a filter eliminates most rows before the similarity comparison (e.g., searching only within a specific tenant, category, or date range), both indexes may effectively scan all qualifying rows anyway if the filtered set is small enough that the index provides no benefit.
For workloads with high-selectivity WHERE clauses, a partial IVFFlat index scoped to the filtered subset can outperform a full HNSW index:
-- Partial index for a specific tenant's embeddings
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100)
WHERE tenant_id = 'acme-corp';
-- Query uses the partial index automatically
SELECT id, content,
embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM embeddings
WHERE tenant_id = 'acme-corp'
ORDER BY distance
LIMIT 10;
Multiple small partial IVFFlat indexes often outperform one large HNSW index when query patterns are tenant- or category-scoped.
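A sketch of that pattern, assuming the set of high-volume tenants is known in advance (the tenant values, index names, and lists setting are illustrative):
-- One small partial IVFFlat index per tenant
DO $$
DECLARE t text;
BEGIN
  FOREACH t IN ARRAY ARRAY['acme-corp', 'globex', 'initech'] LOOP
    EXECUTE format(
      'CREATE INDEX IF NOT EXISTS embeddings_ivfflat_%s
       ON embeddings USING ivfflat (embedding vector_cosine_ops)
       WITH (lists = 100)
       WHERE tenant_id = %L',
      replace(t, '-', '_'), t);
  END LOOP;
END $$;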
Tuning IVFFlat: lists and probes
IVFFlat has two tuning parameters that matter:
lists: Number of clusters created at build time. More clusters = smaller clusters = faster search (fewer distance computations per probe) but lower recall if the query vector's true neighbors span multiple clusters. The standard heuristic is row_count / 1000 for collections up to 1M rows and sqrt(row_count) above that.
probes: Number of clusters searched at query time. Set via SET ivfflat.probes = N. Higher probes = higher recall but slower queries. The default is 1, which is too low for most production use cases.
-- Check recall quality at different probe settings
-- Run against a held-out test set with known ground truth nearest neighbors
SET ivfflat.probes = 1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM embeddings
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
SET ivfflat.probes = 10;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM embeddings
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
-- Set probes based on recall requirement
-- For 95%+ recall: probes ≈ lists / 10
-- For 99%+ recall: probes ≈ lists / 3
-- Measure against your actual data and query distribution
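One way to produce that measurement, sketched for a single query vector (the temp table name, placeholder vector, and LIMIT are illustrative; run it over a sample of real queries and average the results):
-- Capture the exact top-10 by forcing a sequential scan
BEGIN;
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id FROM embeddings
ORDER BY embedding <=> '[...]'::vector
LIMIT 10;
COMMIT;
-- Approximate top-10 through the IVFFlat index; recall@10 is the overlap
SET ivfflat.probes = 10;
SELECT count(*) / 10.0 AS recall_at_10
FROM (
SELECT id FROM embeddings
ORDER BY embedding <=> '[...]'::vector
LIMIT 10
) approx
JOIN exact_top10 USING (id);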
Tuning HNSW: m and ef_construction
-- m: number of connections per node
-- Higher m = better recall, more memory, slower build, larger index
-- Default: 16. Range: 2-100. Values above 64 rarely help.
-- ef_construction: search depth during index build
-- Higher = better recall at build time, slower build
-- Default: 64. Should be >= 2 * m.
-- ef_search: search depth at query time (session-level)
-- SET hnsw.ef_search = 40; -- default 40
-- Higher = better recall, slower queries
-- Production starting points:
-- m = 16, ef_construction = 64, ef_search = 40 (balanced)
-- m = 32, ef_construction = 128, ef_search = 80 (higher recall)
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
SET hnsw.ef_search = 40;
The Decision Framework
Start with HNSW if: queries are read-heavy with continuous small inserts, memory budget is generous, and recall quality on a stable vector distribution is the primary concern.
Consider IVFFlat if: workload is bulk-load followed by query-heavy phase, memory at build time is constrained, vector distribution changes significantly over time requiring periodic rebuilds, or query patterns are filtered well enough to justify partial indexes.
Benchmark both on your actual data before committing. Synthetic benchmarks on uniformly distributed random vectors do not accurately predict performance on real embedding distributions, which tend to have cluster structure that affects how both index types behave.
Building a production RAG or semantic search system?
We assess pgvector configurations, index tuning, and Aurora/RDS storage architecture for AI workloads as part of a free cloud database assessment.