Exploiting Structural Properties for Efficient Constraint-Aware HNSW Hyperparameter Tuning

SIGMOD 2027

CHAT turns HNSW hyperparameter tuning from expensive black-box trial-and-error into a structure-aware search over constraint boundaries.

1Seoul National University
*Corresponding author
45% higher QPS
11% higher Recall
44× faster convergence

Default HNSW settings rarely sit on the right constraint-aware frontier.

Black-box tuning spends its budget rebuilding expensive indexes and sampling noisy, discrete configurations.
CHAT uses HNSW-specific structure to search constraint boundaries directly and validate only useful candidates.

Abstract

Vector databases are now a core substrate for modern retrieval systems, including Retrieval-Augmented Generation pipelines. Their search layer must satisfy several demands at once: high recall, high throughput, bounded latency, manageable build time, and compact index size. HNSW is widely adopted because it offers an attractive recall-latency trade-off, yet its three central knobs, M, efc, and efs, interact in nonlinear ways that make production tuning costly and brittle.

This paper shows that the HNSW search space is not arbitrary. Query-time candidate expansion induces monotone feasibility boundaries; construction-time parameters exhibit dominant unimodal behavior after the search boundary is selected; and resource usage separates into compact build-time and index-size dependencies. CHAT uses these structural regularities to tune HNSW with deterministic, sample-efficient search while pruning resource-infeasible configurations before full index construction.

Across multiple datasets and HNSW-based vector search engines, CHAT finds configurations that maximize Recall or QPS while satisfying constraints on accuracy, throughput, build time, index size, and tuning budget.

Motivation

HNSW looks simple from the outside, but a production deployment rarely asks for a single metric. It asks for a configuration that is fast enough, accurate enough, small enough, cheap enough to build, and discoverable within a limited tuning window.

Defaults Are Dataset-Specific Compromises

The same default parameters can be conservative on one workload and infeasible on another. Under identical Recall or QPS targets, the best configuration can move substantially across datasets.

Constraints Change the Objective

A Recall constraint asks the tuner to maximize throughput at the smallest useful search breadth, while a QPS constraint asks it to spend just enough search effort to improve accuracy without violating service targets.

Index Builds Are Expensive Evidence

Construction parameters require rebuilding the graph, so naive exploration wastes most of its budget on configurations that are either resource-infeasible or far from the constraint boundary.

Contributions

01

A Constraint-Aware Formulation

We cast HNSW tuning as selecting M, efc, and efs under explicit performance, resource, and tuning-budget constraints.

02

Structural Regularities

We identify monotone query-time boundaries, dominant unimodal construction trends, and separable resource behavior in standard HNSW implementations.

03

CHAT

We introduce an API-level black-box, HNSW-structure-aware tuner that combines hierarchical search, boundary search, and measured validation.

04

Resource-Aware Pruning

We develop compact build-time and index-size models that prune unlikely candidates before full graph construction while preserving final validation.

05

Broad Evaluation

We validate CHAT across Faiss, Hnswlib, and Milvus on five representative datasets, comparing against exhaustive search and strong black-box tuners.

Structural Insights

CHAT is not a generic optimizer with a new acquisition function. Its efficiency comes from the mechanics of HNSW itself: how query breadth changes performance, how graph construction improves then saturates, and how resource costs separate.

01

Monotone Search-Time Feasibility

For a fixed HNSW graph, efs is an ordered query-time knob. Larger efs explores more candidates, so Recall increases toward saturation while QPS decreases.

CHAT uses this monotonic structure to localize the smallest feasible efs under a Recall constraint, or the largest feasible efs under a throughput constraint, instead of blindly sampling query settings.

Effect of efs on Recall and QPS
02

Dominant Unimodality

Construction knobs such as efc improve graph quality at first, but the benefit saturates. Past that point, extra construction effort mainly increases traversal overhead.

The resulting constrained objective follows a dominant rise-then-fall trend, allowing CHAT to search the construction space directionally rather than enumerate every (M, efc) pair.

Dominant unimodal objective trends as efc varies
03

Separable Resources

Resource costs do not move as a single black-box quantity. Build time follows construction search breadth and realized connectivity, while index size is governed mainly by the degree budget induced by M.

CHAT turns this separation into early pruning: resource-infeasible construction candidates can be rejected before spending the full cost of validation.

Build time and realized connectivity as construction parameters vary
Index size dependence on M and efc

Method

CHAT decomposes the original three-dimensional tuning problem into a sequence of smaller decisions. It first proposes construction parameters, checks whether they are likely to satisfy resource budgets, builds only surviving candidates, and then searches for the boundary efs value on the completed index. Measurements from each build and validation run feed back into the search.

Architecture and iterative tuning workflow of CHAT

CHAT iteratively proposes HNSW configurations, filters resource-infeasible candidates, tunes efs on built indexes, and feeds measurements back into the constraint models.

1

Propose a Construction Region

The outer search localizes promising M values, while the inner search selects candidate efc values for each fixed M.

2

Filter by Resource Feasibility

Conservative build-time and index-size models reject candidates likely to violate resource budgets before expensive graph construction begins.

3

Build, Measure, and Calibrate

Candidates that pass the filter are built on the target backend. CHAT records build time, index size, Recall, and QPS to update the search history.

4

Locate the Boundary efs

If Recall is constrained, CHAT chooses the smallest feasible efs to preserve QPS. If QPS is constrained, it chooses the largest feasible efs to recover Recall.

API-Level Black Box

CHAT does not inspect adjacency lists, layer populations, or degree distributions. It uses standard build/query APIs and validation metrics.

Structure-Aware Search

The search procedure is specialized to standard HNSW semantics: bounded-degree construction, greedy layer traversal, and diversified neighbor selection.

Validated Output

Resource models only prune. Any configuration returned by CHAT has survived actual construction, measurement, and performance validation.

Results

CHAT is evaluated on nytimes, glove, sift, deep1M, and youtube datasets with Faiss and Hnswlib, plus an additional Milvus study. Under Recall constraints, CHAT maximizes throughput; under QPS constraints, it maximizes Recall while satisfying the required service target.

The experiments compare CHAT with exhaustive search, Grid Search, Random Search, Optuna, NSGA-II, Expected Constrained Improvement, and VDTuner. All methods are evaluated under the same tuning budget, and reported configurations must satisfy the active performance and resource constraints.

98.03% - 99.97%

of Oracle QPS under Recall constraints.

99.11% - 99.99%

of Oracle Recall under QPS constraints.

up to 45%

higher throughput than strong black-box tuning baselines.

up to 44×

faster convergence to 95% of the Oracle objective.

Tuning performance over time for CHAT and baseline methods

Main tuning results: CHAT reaches high-quality configurations quickly across datasets, backends, and constraint settings, staying close to the exhaustive-search oracle while outperforming black-box baselines.

Dimension Setup
Backends Faiss, Hnswlib, and Milvus
Datasets nytimes, glove, sift, deep1M, and youtube
Knobs M in [4, 64], efc in [8, 1024], efs in [10, 1024]
Constraints Recall, QPS, build time, index size, tail latency, and tuning budget
Baselines Oracle, Grid, Random, Optuna, NSGA-II, ECI, and VDTuner

Performance

CHAT stays close to exhaustive search while consistently reaching strong configurations earlier than generic tuners. Under Recall constraints it improves throughput by up to 45%, and under QPS constraints it improves Recall by up to 11% over the best baselines.

Resource Constraints

The resource filter reduces unnecessary index builds and total tuning time without changing the final selected configuration in the reported settings. This is especially important because construction-time exploration dominates practical tuning cost.

Deployment Robustness

CHAT also supports query-side constraints such as hard-query Recall and P95/P99 latency. Because these constraints can be expressed through the same ordered efs boundary, the tuner can handle stricter service-level objectives without changing its core workflow.

Takeaway

CHAT shows that HNSW tuning does not need to be treated as an opaque optimization problem. By respecting the algorithm's structural properties, it can navigate narrow feasibility regions, avoid expensive invalid builds, and return validated configurations that are close to exhaustive-search quality within practical tuning budgets.

BibTeX

@article{choi2027chat,
  author    = {Choi, Geon and Lee, Hoeun and Do, Jaeyoung},
  title     = {Exploiting Structural Properties for Efficient Constraint-Aware HNSW Hyperparameter Tuning},
  journal   = {Proceedings of the 2027 International Conference on Management of Data (SIGMOD)},
  year      = {2027},
}