Working with Vector Search

Description — Use Vector Search with Full Text Search and Query.
Important

This feature is an Enterprise Edition feature.

To use vector search in Dart, bundle the extension library as part of your build and enable it before opening a database that uses vector indexes.

Configure your package pubspec.yaml like this:

hooks:
  user_defines:
    cbl:
      edition: enterprise
      vector_search: true
note

If you are using a Dart pub workspace, hooks.user_defines are read from the workspace root pubspec.yaml, so put this configuration there instead.

Then initialize Couchbase Lite and enable vector search:

Example 1. Enable Vector Search
await CouchbaseLite.init();

final status = Extension.enableVectorSearch();
if (status != VectorSearchStatus.enabled) {
  throw StateError('Vector search is not available: $status');
}

You can also inspect the current status without enabling the extension by reading Extension.vectorSearchStatus.

If the vector search library is bundled but its path cannot be resolved, both Extension.vectorSearchStatus and Extension.enableVectorSearch() throw a DatabaseException.

VectorSearchStatus values

| Value | Meaning |
| --- | --- |
| available | The library is bundled and the current system supports it. Call Extension.enableVectorSearch() to enable it. |
| enabled | Vector search has already been enabled in this process. |
| libraryNotAvailable | The library was not bundled. Set vector_search: true under hooks.user_defines.cbl in pubspec.yaml. |
| systemNotSupported | The current system cannot use vector search. Couchbase Lite supports ARM64 and x86-64, and x86-64 additionally requires AVX2 support. |

Create a Vector Index

Create a vector index with VectorIndexConfiguration and then pass it to Collection.createIndex.

Example 2. Create a Vector Index
final collection = await database.createCollection('colors');

final config = VectorIndexConfiguration(
  'vector',
  dimensions: 3,
  centroids: 100,
  encoding: VectorEncoding.none(),
  metric: DistanceMetric.cosine,
  numProbes: 8,
  minTrainingSize: 2500,
  maxTrainingSize: 5000,
);

await collection.createIndex('colors_index', config);

This example creates an index over the vector property with:

  • dimensions set to 3
  • centroids set to 100
  • no vector compression
  • cosine distance
  • explicit probe and training-size settings
note

Increasing dimensions, training size, and other vector-index parameters can raise CPU and memory requirements because training vectors must be resident in memory while the index is trained.

Vector Index Configuration

The table below summarizes the VectorIndexConfiguration options.

| Configuration Name | Is Required | Default Configuration | Further Information |
| --- | --- | --- | --- |
| Expression | Yes | No default | A SQL++ expression that returns the vector source. This can be a document property with embedded vectors or a PREDICTION() call. |
| Number of Dimensions | Yes | No default | Supported range: 2 to 4096. |
| Number of Centroids | Yes | No default | Supported range: 1 to 64000. A common starting point is roughly the square root of the number of documents. |
| Distance Metric | No | DistanceMetric.euclideanSquared | Alternatives are DistanceMetric.cosine, DistanceMetric.euclidean, and DistanceMetric.dot. |
| Encoding | No | VectorEncoding.scalarQuantizer(ScalarQuantizerType.eightBit) | Use VectorEncoding.none(), scalar quantization, or product quantization depending on the quality/size tradeoff you need. |
| Training Size | No | Determined from the number of centroids and the encoding | Use minTrainingSize and maxTrainingSize to override the defaults when you have measured a better configuration for your data set. |
| NumProbes | No | Determined from the number of centroids | A common guideline for a custom value is at least 8 or around 0.5% of the number of centroids. |
| Lazy | No | false | Set lazy: true to disable automatic updates and manage vector computation yourself. |
caution

Changing the default training sizes can hurt both index quality and query performance. Treat custom values as an optimization you measure, not a setting you tune blindly.
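The starting-point guidelines in the table can be turned into a small helper. The sketch below is only one reading of those guidelines: the function names suggestCentroids and suggestNumProbes are hypothetical, and "at least 8 or around 0.5%" is interpreted here as "the larger of the two". Treat the results as initial values to measure against, not as recommended settings.

```dart
import 'dart:math' as math;

/// Hypothetical helper: a rough starting value for `centroids`, taken as the
/// square root of the document count and kept inside the supported range of
/// 1 to 64000.
int suggestCentroids(int documentCount) {
  final estimate = math.sqrt(documentCount).round();
  return math.min(64000, math.max(1, estimate));
}

/// Hypothetical helper: a rough starting value for `numProbes`, taken as the
/// larger of 8 and about 0.5% of the number of centroids (an assumption about
/// how to combine the two figures in the guideline).
int suggestNumProbes(int centroids) {
  final halfPercent = (centroids * 0.005).round();
  return math.max(8, halfPercent);
}

void main() {
  final centroids = suggestCentroids(10000);
  print('centroids: $centroids, numProbes: ${suggestNumProbes(centroids)}');
}
```

The values can then be passed as the centroids and numProbes arguments of VectorIndexConfiguration.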

Generating Vectors

There are three common ways to generate vectors in Couchbase Lite:

  1. Store embeddings directly in your documents and index the document property.
  2. Use the SQL++ PREDICTION() function so vectors are generated while the index is built.
  3. Use lazy vector indexes to generate vectors asynchronously, which is useful when the model is remote or temporarily unavailable.

Create a Vector Index with Embeddings

If your documents already contain vectors, create the index directly over that property.

Example 3. Create a Vector Index with Embeddings
final config = VectorIndexConfiguration(
  'vector',
  dimensions: 3,
  centroids: 100,
);

await collection.createIndex('colors_index', config);

Create Vector Index Embeddings from a Predictive Model

You can also generate vectors at index time with a registered predictive model.

Example 4. Create Vector Index Embeddings from a Predictive Model
class ColorModel implements PredictiveModel {
  @override
  Dictionary? predict(Dictionary input) {
    final color = input.string('colorInput');
    if (color == null) {
      // No color input, so there is nothing to predict.
      return null;
    }

    // colorVectorSync is an application-defined helper.
    final vector = colorVectorSync(color);
    return MutableDictionary({'vector': vector});
  }
}

Database.prediction.registerModel('ColorModel', ColorModel());

final expression =
    'PREDICTION(ColorModel, {"colorInput": color}, "vector")';
final config = VectorIndexConfiguration(
  expression,
  dimensions: 3,
  centroids: 100,
);

await collection.createIndex('colors_index', config);

This approach can use less document storage because the embeddings only live in the index. The tradeoff is that indexing takes longer, because the model runs while the index is being built or updated.
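The examples on this page call colorVectorSync and colorVector without defining them. The following is a minimal sketch of what such helpers might look like, assuming the input is a 6-digit hex color such as 'FF00AA' and the embedding is simply the red, green, and blue channels scaled to the range 0 to 1. Both functions are hypothetical stand-ins; a real application would more likely call an embedding model here.

```dart
/// Hypothetical stand-in for the synchronous helper used by ColorModel:
/// converts a 6-digit hex color into a 3-dimensional vector of red, green,
/// and blue components scaled to the range 0 to 1.
List<double> colorVectorSync(String hex) {
  final digits = hex.startsWith('#') ? hex.substring(1) : hex;
  if (!RegExp(r'^[0-9a-fA-F]{6}$').hasMatch(digits)) {
    throw FormatException('Expected a 6-digit hex color', hex);
  }

  // Parses the two hex digits starting at [start] and scales them to 0..1.
  double channel(int start) =>
      int.parse(digits.substring(start, start + 2), radix: 16) / 255;

  return [channel(0), channel(2), channel(4)];
}

/// Hypothetical asynchronous variant, matching the `await colorVector(...)`
/// calls in the query and lazy-index examples. A remote model would do its
/// network call here instead.
Future<List<double>> colorVector(String hex) async => colorVectorSync(hex);

void main() async {
  print(colorVectorSync('FF00AA'));
  print(await colorVector('#00FF00'));
}
```

The vector length matches the dimensions: 3 setting used throughout the examples; an index with a different dimension count would need a helper that returns vectors of that length.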

Create a Lazy Vector Index

Lazy vector indexes are useful when vectors cannot be generated synchronously as part of normal document writes.

Example 5. Create a Lazy Vector Index
final config = VectorIndexConfiguration(
  'color',
  dimensions: 3,
  centroids: 100,
  lazy: true,
);

await collection.createIndex('colors_index', config);

With lazy: true, the expression does not have to return the final vector. Instead, it returns the value your application will use later to compute the vector.

note

Lazy vector indexing is opt-in. The default is lazy: false.

Updating the Lazy Index

To update a lazy index, fetch the index and repeatedly call QueryIndex.beginUpdate until there are no more pending rows.

Example 6. Update a Lazy Vector Index
final index = await collection.index('colors_index');
if (index == null) {
  throw StateError('colors_index not found');
}

while (true) {
  final updater = await index.beginUpdate(limit: 50);
  if (updater == null) {
    break;
  }

  for (var i = 0; i < updater.length; i++) {
    final color = await updater.value<String>(i);
    if (color == null) {
      await updater.skipVector(i);
      continue;
    }

    try {
      final vector = await colorVector(color);
      await updater.setVector(i, vector);
    } catch (_) {
      await updater.skipVector(i);
    }
  }

  await updater.finish();
}

The update flow is:

  1. Start a batch with beginUpdate(limit: ...).
  2. Read the pending source value for each row with value<T>(index).
  3. Call setVector with the computed vector, setVector(..., null) to remove an existing vector, or skipVector to retry that row later.
  4. Call finish() to commit the batch.

finish() throws if any row in the updater was neither set nor skipped.

Vector Search SQL++ Support

Couchbase Lite SQL++ supports hybrid vector search and the APPROX_VECTOR_DISTANCE() function.

important

Like the full-text MATCH() function, vector-search predicates cannot be combined with other predicates in the same WHERE clause using OR.

Hybrid vector search combines vector similarity with regular SQL++ filters. The WHERE clause narrows the candidate documents first, and vector distance is then evaluated only for the remaining matches.

note

For non-hybrid vector search, use a LIMIT clause to avoid an exhaustive scan of every possible match. Hybrid vector search does not require a LIMIT, but you will usually still want one to cap result size.
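The filter-then-rank order described above can be illustrated with a small in-memory sketch. This is not how Couchbase Lite executes the query: it is an exact, brute-force stand-in for the approximate index, and the ColorDoc class, the hybridSearch function, and the use of squared Euclidean distance are all assumptions made for illustration.

```dart
/// Hypothetical in-memory document used only for this illustration.
class ColorDoc {
  const ColorDoc(this.id, this.saturation, this.vector);

  final String id;
  final double saturation;
  final List<double> vector;
}

/// Squared Euclidean distance between two vectors of equal length.
double euclideanSquared(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

/// Conceptual hybrid search: apply the regular filter first, then rank only
/// the remaining documents by distance to [target] and keep [limit] results.
List<ColorDoc> hybridSearch(
  List<ColorDoc> docs,
  List<double> target, {
  required bool Function(ColorDoc) filter,
  required int limit,
}) {
  final candidates = docs.where(filter).toList();
  candidates.sort((x, y) => euclideanSquared(x.vector, target)
      .compareTo(euclideanSquared(y.vector, target)));
  return candidates.take(limit).toList();
}

void main() {
  const docs = [
    ColorDoc('a', 0.9, [1.0, 0.0, 0.0]),
    ColorDoc('b', 0.2, [0.9, 0.0, 0.1]),
    ColorDoc('c', 0.8, [0.0, 1.0, 0.0]),
  ];
  final hits = hybridSearch(
    docs,
    [0.9, 0.0, 0.1],
    filter: (doc) => doc.saturation > 0.5,
    limit: 8,
  );
  print(hits.map((doc) => doc.id).toList());
}
```

In this sketch, document b is the closest vector to the target but never appears in the results, because the saturation filter removes it before any distance is computed.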

Hybrid Vector Search with Full Text Match

The following example combines a full-text match with vector similarity.

Example 7. Hybrid Vector Search with Full Text Match
final query = await database.createQuery(r'''
SELECT META().id, color
FROM colors
WHERE MATCH(color_desc_index, $text)
ORDER BY APPROX_VECTOR_DISTANCE(vector, $vector)
LIMIT 8
''');

query.parameters = Parameters({
  'text': 'vibrant',
  'vector': await colorVector('FF00AA'),
});

final resultSet = await query.execute();

You can also compute vectors on the fly with PREDICTION() inside the query.

Example 8. Prediction with Hybrid Vector Search
final query = await database.createQuery(r'''
SELECT META().id, color
FROM colors
WHERE saturation > 0.5
ORDER BY APPROX_VECTOR_DISTANCE(
  PREDICTION(ColorModel, {"colorInput": color}, "vector"),
  $vector
)
LIMIT 8
''');

query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});

final resultSet = await query.execute();

APPROX_VECTOR_DISTANCE(vector-expr, target-vector, [metric], [nprobes], [accurate])

APPROX_VECTOR_DISTANCE() calculates the approximate distance between a query vector and the vectors returned by vector-expr.

warning

If you pass a metric argument, it must match the metric configured for the index. Otherwise query compilation fails.

| Parameter | Is Required | Description |
| --- | --- | --- |
| vector-expr | Yes | The expression that returns the indexed vector. It must match the vector-index expression exactly. |
| target-vector | Yes | The query vector to compare against. |
| metric | No | One of "EUCLIDEAN_SQUARED", "L2_SQUARED", "EUCLIDEAN", "L2", "COSINE", or "DOT". If omitted, the index metric is used. |
| nprobes | No | Overrides the configured probe count for this query. |
| accurate | No | If omitted, false is used. This means the encoded vectors in the index are used for distance calculations. |
important

Only accurate = false is supported.

Use APPROX_VECTOR_DISTANCE()

The following example filters by approximate distance.

Example 9. Use APPROX_VECTOR_DISTANCE()
final query = await database.createQuery(r'''
SELECT META().id, color
FROM colors
WHERE APPROX_VECTOR_DISTANCE(vector, $vector) < 0.5
LIMIT 8
''');

query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});

final resultSet = await query.execute();

Smaller distance values indicate more similar vectors.
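To build intuition for what a threshold such as < 0.5 means, it can help to compute exact distances by hand. The sketch below uses the common textbook definitions of squared Euclidean distance and cosine distance (1 minus cosine similarity). This page does not state the exact formulas the index uses, so treat these as assumptions; values returned by APPROX_VECTOR_DISTANCE() are also approximate and may differ from exact results.

```dart
import 'dart:math' as math;

/// Dot product of two vectors of equal length.
double dot(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Squared Euclidean distance: the sum of squared component differences.
double euclideanSquared(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

/// Cosine distance, assumed here to be 1 minus the cosine similarity.
/// Neither vector may be all zeros.
double cosineDistance(List<double> a, List<double> b) {
  final norms = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b));
  return 1 - dot(a, b) / norms;
}

void main() {
  const red = [1.0, 0.0, 0.0];
  const green = [0.0, 1.0, 0.0];
  print(euclideanSquared(red, green));
  print(cosineDistance(red, green));
  print(cosineDistance(red, red));
}
```

Under these definitions, identical vectors have a distance of 0 and the value grows as vectors become less alike, which is why the examples order results by ascending distance.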

Prediction with APPROX_VECTOR_DISTANCE()

You can also use PREDICTION() together with APPROX_VECTOR_DISTANCE().

Example 10. Prediction with APPROX_VECTOR_DISTANCE()
final query = await database.createQuery(r'''
SELECT META().id, color
FROM colors
ORDER BY APPROX_VECTOR_DISTANCE(
  PREDICTION(ColorModel, {"colorInput": color}, "vector"),
  $vector
)
LIMIT 8
''');

query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});

final resultSet = await query.execute();

See Also