Working with Vector Search
Vector search is an Enterprise Edition feature.
Use Vector Search
To use vector search in Dart, bundle the extension library as part of your build and enable it before opening a database that uses vector indexes.
Configure your package's pubspec.yaml like this:

```yaml
hooks:
  user_defines:
    cbl:
      edition: enterprise
      vector_search: true
```
If you are using a Dart pub workspace, hooks.user_defines are read from the
workspace root pubspec.yaml, so put this configuration there instead.
Then initialize Couchbase Lite and enable vector search:
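```dart
import 'package:cbl/cbl.dart';

Future<void> main() async {
  await CouchbaseLite.init();
  // Enable the extension before opening a database that uses vector indexes.
  final status = Extension.enableVectorSearch();
  if (status != VectorSearchStatus.enabled) {
    throw StateError('Vector search is not available: $status');
  }
}
```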
You can also inspect the current status without enabling the extension by
reading Extension.vectorSearchStatus.
If the vector search library is bundled but its path cannot be resolved, both
Extension.vectorSearchStatus and Extension.enableVectorSearch()
throw a DatabaseException.
VectorSearchStatus values
| Value | Meaning |
|---|---|
| available | The library is bundled and the current system supports it. Call Extension.enableVectorSearch() to enable it. |
| enabled | Vector search has already been enabled in this process. |
| libraryNotAvailable | The library was not bundled. Set vector_search: true under hooks.user_defines.cbl in pubspec.yaml. |
| systemNotSupported | The current system cannot use vector search. Couchbase Lite supports ARM64 and x86-64, and x86-64 additionally requires AVX2 support. |
Create a Vector Index
Create a vector index with VectorIndexConfiguration and then pass it to
Collection.createIndex.
```dart
// `database` is an already-open database with vector search enabled.
final collection = await database.createCollection('colors');

final config = VectorIndexConfiguration(
  'vector', // expression: the document property that holds the vector
  dimensions: 3,
  centroids: 100,
  encoding: VectorEncoding.none(),
  metric: DistanceMetric.cosine,
  numProbes: 8,
  minTrainingSize: 2500,
  maxTrainingSize: 5000,
);

await collection.createIndex('colors_index', config);
```
This example creates an index over the vector property with:
- dimensions set to 3
- centroids set to 100
- no vector compression
- cosine distance
- explicit probe and training-size settings
Increasing dimensions, training size, and other vector-index parameters can raise CPU and memory requirements because training vectors must be resident in memory while the index is trained.
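A rough estimate makes the memory point concrete. The hypothetical helper below multiplies the training size by the number of dimensions, assuming each vector component is held as a 4-byte float and ignoring index and allocator overhead, so treat its result as a lower bound rather than a figure Couchbase Lite reports.

```dart
/// Hypothetical helper: rough lower bound on the memory needed to hold the
/// training vectors.
///
/// Assumption: each vector component is a 32-bit (4-byte) float; index
/// structures and allocator overhead are not included.
int trainingVectorBytes({required int trainingSize, required int dimensions}) {
  const bytesPerComponent = 4; // assumed float32 storage
  return trainingSize * dimensions * bytesPerComponent;
}

/// Formats a byte count as MiB with one decimal place.
String formatMiB(int bytes) =>
    '${(bytes / (1024 * 1024)).toStringAsFixed(1)} MiB';
```

For example, trainingVectorBytes(trainingSize: 5000, dimensions: 3) returns 60000 bytes, while a hypothetical 1536-dimensional embedding model at the same training size needs 30720000 bytes, which formatMiB renders as 29.3 MiB.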
Vector Index Configuration
The table below summarizes the VectorIndexConfiguration options.
| Configuration Name | Is Required | Default Configuration | Further Information |
|---|---|---|---|
| Expression | Yes | No default | A SQL++ expression that returns the vector source. This can be a document property with embedded vectors or a PREDICTION() call. |
| Number of Dimensions | Yes | No default | Supported range: 2 to 4096. |
| Number of Centroids | Yes | No default | Supported range: 1 to 64000. A common starting point is roughly the square root of the number of documents. |
| Distance Metric | No | DistanceMetric.euclideanSquared | Alternatives are DistanceMetric.cosine, DistanceMetric.euclidean, and DistanceMetric.dot. |
| Encoding | No | VectorEncoding.scalarQuantizer(ScalarQuantizerType.eightBit) | Use VectorEncoding.none(), scalar quantization, or product quantization depending on the quality/size tradeoff you need. |
| Training Size | No | Determined from the number of centroids and the encoding | Use minTrainingSize and maxTrainingSize to override the defaults when you have measured a better configuration for your data set. |
| Number of Probes | No | Determined from the number of centroids | Set with numProbes. A common guideline for a custom value is at least 8 or around 0.5% of the number of centroids. |
| Lazy | No | false | Set lazy: true to disable automatic updates and manage vector computation yourself. |
Changing the default training sizes can hurt both index quality and query performance. Treat custom values as an optimization you measure, not a setting you tune blindly.
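The centroid and probe guidelines in the table can be turned into starting values. The helpers below are hypothetical and read "at least 8 or around 0.5%" as "whichever is larger", which is one interpretation of the guideline rather than an official formula; measure before adopting the values.

```dart
import 'dart:math' as math;

/// Hypothetical helper: starting-point centroid count of roughly the square
/// root of the document count, clamped to the supported range 1 to 64000.
int suggestedCentroids(int documentCount) {
  final estimate = math.sqrt(documentCount).round();
  return math.min(math.max(estimate, 1), 64000);
}

/// Hypothetical helper: custom probe count of at least 8, or about 0.5% of
/// the centroids (rounded up), whichever is larger. Integer math avoids
/// floating-point rounding surprises.
int suggestedNumProbes(int centroids) {
  final halfPercent = (centroids + 199) ~/ 200; // ceil(centroids / 200)
  return math.max(8, halfPercent);
}
```

For 10,000 documents, suggestedCentroids returns 100, and suggestedNumProbes(100) returns 8, which matches the centroids and numProbes values used in the index example in this section.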
Generating Vectors
There are three common ways to generate vectors in Couchbase Lite:
- Store embeddings directly in your documents and index the document property.
- Use the SQL++ PREDICTION() function so vectors are generated while the index is built.
- Use lazy vector indexes to generate vectors asynchronously, which is useful when the model is remote or temporarily unavailable.
Create a Vector Index with Embeddings
If your documents already contain vectors, create the index directly over that property.
```dart
final config = VectorIndexConfiguration(
  'vector',
  dimensions: 3,
  centroids: 100,
);
await collection.createIndex('colors_index', config);
```
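Every stored vector should have exactly as many components as the index's dimensions setting. A small guard like the one below, a hypothetical helper and not part of the cbl API, can catch mismatched embeddings before you write the document.

```dart
/// Hypothetical helper: builds the document data for a color with an
/// embedded vector, rejecting vectors whose length does not match the
/// index's `dimensions` setting. The 'color' and 'vector' keys match the
/// examples in this section.
Map<String, Object?> colorDocumentData(
  String color,
  List<double> vector, {
  required int dimensions,
}) {
  if (vector.length != dimensions) {
    throw ArgumentError.value(
      vector,
      'vector',
      'expected $dimensions components, got ${vector.length}',
    );
  }
  return {'color': color, 'vector': vector};
}
```

You could then pass the resulting map to MutableDocument and save it to the collection as usual.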
Create Vector Index Embeddings from a Predictive Model
You can also generate vectors at index time with a registered predictive model.
```dart
class ColorModel implements PredictiveModel {
  @override
  Dictionary? predict(Dictionary input) {
    final color = input.string('colorInput');
    if (color == null) {
      // No input value, so there is no prediction output for this document.
      return null;
    }
    // predict() is synchronous, so the embedding must be computed
    // synchronously. colorVectorSync is an application-provided function.
    final vector = colorVectorSync(color);
    return MutableDictionary({'vector': vector});
  }
}

Database.prediction.registerModel('ColorModel', ColorModel());

final expression =
    'PREDICTION(ColorModel, {"colorInput": color}, "vector")';
final config = VectorIndexConfiguration(
  expression,
  dimensions: 3,
  centroids: 100,
);
await collection.createIndex('colors_index', config);
```
This approach can use less document storage because the embeddings only live in the index. The tradeoff is that indexing takes longer, because the model runs while the index is being built or updated.
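The examples in this section call colorVectorSync and colorVector without defining them; in a real app these would wrap your embedding model. For local experiments, a hypothetical stand-in can map a hex color string such as FF00AA to its normalized red, green, and blue channels, which yields the 3-dimensional vectors these indexes expect.

```dart
/// Hypothetical stand-in for an embedding model: converts a hex color such
/// as 'FF00AA' (optionally prefixed with '#') into normalized [r, g, b]
/// values in the range 0.0 to 1.0.
List<double> colorVectorSync(String hex) {
  final digits = hex.startsWith('#') ? hex.substring(1) : hex;
  if (digits.length != 6) {
    throw FormatException('Expected a 6-digit hex color', hex);
  }
  final value = int.parse(digits, radix: 16);
  final r = (value >> 16) & 0xFF;
  final g = (value >> 8) & 0xFF;
  final b = value & 0xFF;
  return [r / 255, g / 255, b / 255];
}

/// Hypothetical asynchronous variant, mirroring how a remote model would be
/// awaited in the lazy-index and query examples.
Future<List<double>> colorVector(String hex) async => colorVectorSync(hex);
```

With this stand-in, colorVectorSync('FF00AA') returns 1.0 for red, 0.0 for green, and 170 / 255 for blue.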
Create a Lazy Vector Index
Lazy vector indexes are useful when vectors cannot be generated synchronously as part of normal document writes.
```dart
final config = VectorIndexConfiguration(
  'color',
  dimensions: 3,
  centroids: 100,
  lazy: true,
);
await collection.createIndex('colors_index', config);
```
With lazy: true, the expression does not have to return the final vector.
Instead, it returns the value your application will use later to compute the
vector.
Lazy vector indexing is opt-in. The default is lazy: false.
Updating the Lazy Index
To update a lazy index, fetch the index and repeatedly call
QueryIndex.beginUpdate until there are no more pending rows.
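```dart
final index = await collection.index('colors_index');
if (index == null) {
  throw StateError('colors_index not found');
}

while (true) {
  // Fetch up to 50 rows whose vectors still need to be computed.
  final updater = await index.beginUpdate(limit: 50);
  if (updater == null) {
    // No pending rows remain; the lazy index is up to date.
    break;
  }

  var updatedInBatch = 0;
  for (var i = 0; i < updater.length; i++) {
    // The value returned by the index expression ('color' in this example).
    final color = await updater.value<String>(i);
    if (color == null) {
      await updater.skipVector(i);
      continue;
    }
    try {
      // colorVector is an application-provided async embedding function.
      final vector = await colorVector(color);
      await updater.setVector(i, vector);
      updatedInBatch++;
    } catch (_) {
      // Leave the row pending so a later update can retry it.
      await updater.skipVector(i);
    }
  }

  // Commit the batch. Every row must have been set or skipped.
  await updater.finish();

  // Skipped rows are offered again by later beginUpdate calls, so stop when
  // a batch made no progress instead of spinning on the same rows.
  if (updatedInBatch == 0) {
    break;
  }
}
```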
The update flow is:
- Start a batch with beginUpdate(limit: ...).
- Read the pending source value for each row with value<T>(index).
- Call setVector with the computed vector, setVector(..., null) to remove an existing vector, or skipVector to retry that row later.
- Call finish() to commit the batch.
finish() throws if any row in the updater was neither set nor skipped.
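The rule that every row must be either set or skipped before finish() is easiest to see with a small in-memory model. The FakeIndexUpdater class below is a hypothetical mock written for illustration; it only mimics the bookkeeping described here and is not the updater object that beginUpdate returns.

```dart
/// Hypothetical in-memory mock of a lazy-index update batch. It tracks which
/// rows were handled and refuses to finish while any row is still pending.
class FakeIndexUpdater {
  FakeIndexUpdater(List<String?> values)
      : _values = values,
        _handled = List<bool>.filled(values.length, false);

  final List<String?> _values;
  final List<bool> _handled;

  /// Vectors recorded per row; a null value means "remove the vector".
  final Map<int, List<double>?> vectors = {};

  /// Rows left for a later batch.
  final Set<int> skipped = {};

  int get length => _values.length;

  String? value(int index) => _values[index];

  void setVector(int index, List<double>? vector) {
    vectors[index] = vector;
    _handled[index] = true;
  }

  void skipVector(int index) {
    skipped.add(index);
    _handled[index] = true;
  }

  /// Commits the batch; throws if any row was neither set nor skipped.
  void finish() {
    final pending = [
      for (var i = 0; i < length; i++)
        if (!_handled[i]) i,
    ];
    if (pending.isNotEmpty) {
      throw StateError('Rows neither set nor skipped: $pending');
    }
  }
}
```

Driving this mock with the same loop shape as the update code (set a vector when you have one, skip otherwise, then finish) never triggers the error; forgetting a row does.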
Vector Search SQL++ Support
Couchbase Lite SQL++ supports hybrid vector search and the
APPROX_VECTOR_DISTANCE() function.
Like the full-text MATCH() function, vector-search predicates cannot be
combined with other predicates in the same WHERE clause using OR.
Use Hybrid Vector Search
Hybrid vector search combines vector similarity with regular SQL++ filters. The
WHERE clause narrows the candidate documents first, and vector distance is
then evaluated only for the remaining matches.
For non-hybrid vector search, use a LIMIT clause to avoid an exhaustive scan
of every possible match. Hybrid vector search does not require a LIMIT, but
you will usually still want one to cap result size.
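To make the evaluation order concrete, the following pure-Dart sketch mimics a hybrid query over an in-memory list: it applies the regular filter first, then computes squared Euclidean distance (the default index metric) only for the survivors, sorts by distance, and applies a limit. It illustrates the semantics only; Couchbase Lite uses the vector index and approximate distances rather than this brute-force loop, and the ColorDoc type is hypothetical.

```dart
/// Hypothetical in-memory document used only for this illustration.
class ColorDoc {
  const ColorDoc(this.id, this.saturation, this.vector);
  final String id;
  final double saturation;
  final List<double> vector;
}

/// Exact squared Euclidean distance between two equal-length vectors.
double squaredEuclidean(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

/// Filter first (like a WHERE clause), then rank the remaining documents by
/// distance to [target] (like ORDER BY ... LIMIT).
List<String> hybridSearch(
  List<ColorDoc> docs,
  List<double> target, {
  required bool Function(ColorDoc) where,
  required int limit,
}) {
  final candidates = docs.where(where).toList()
    ..sort((x, y) => squaredEuclidean(x.vector, target)
        .compareTo(squaredEuclidean(y.vector, target)));
  return candidates.take(limit).map((d) => d.id).toList();
}
```

A document that is closest to the target but fails the filter never reaches the distance step, which is why the filter shapes the result set before similarity does.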
Hybrid Vector Search with Full Text Match
The following example combines a full-text match with vector similarity.
```dart
final query = await database.createQuery(r'''
  SELECT META().id, color
  FROM colors
  WHERE MATCH(color_desc_index, $text)
  ORDER BY APPROX_VECTOR_DISTANCE(vector, $vector)
  LIMIT 8
''');
query.parameters = Parameters({
  'text': 'vibrant',
  'vector': await colorVector('FF00AA'),
});
final resultSet = await query.execute();
```
Prediction with Hybrid Vector Search
You can also compute vectors on the fly with PREDICTION() inside the query.
```dart
final query = await database.createQuery(r'''
  SELECT META().id, color
  FROM colors
  WHERE saturation > 0.5
  ORDER BY APPROX_VECTOR_DISTANCE(
    PREDICTION(ColorModel, {"colorInput": color}, "vector"),
    $vector
  )
  LIMIT 8
''');
query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});
final resultSet = await query.execute();
```
APPROX_VECTOR_DISTANCE(vector-expr, target-vector, [metric], [nprobes], [accurate])
APPROX_VECTOR_DISTANCE() calculates the approximate distance between a query
vector and the vectors returned by vector-expr.
If you pass a metric argument, it must match the metric configured for the
index. Otherwise query compilation fails.
| Parameter | Is Required | Description |
|---|---|---|
| vector-expr | Yes | The expression that returns the indexed vector. It must match the vector-index expression exactly. |
| target-vector | Yes | The query vector to compare against. |
| metric | No | One of "EUCLIDEAN_SQUARED", "L2_SQUARED", "EUCLIDEAN", "L2", "COSINE", or "DOT". If omitted, the index metric is used. |
| nprobes | No | Overrides the configured probe count for this query. |
| accurate | No | If omitted, false is used, which means the encoded vectors in the index are used for distance calculations. |
Only accurate = false is supported.
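For intuition about what some of the metric names measure, here are textbook formulas for three of them in plain Dart. These are standard definitions, not Couchbase Lite's internal code; in particular, treating cosine distance as 1 minus cosine similarity is an assumption about how the COSINE metric is scaled, and DOT is left out because its sign convention is not stated here.

```dart
import 'dart:math' as math;

double _dot(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// EUCLIDEAN_SQUARED / L2_SQUARED: sum of squared component differences.
double euclideanSquared(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

/// EUCLIDEAN / L2: straight-line distance.
double euclidean(List<double> a, List<double> b) =>
    math.sqrt(euclideanSquared(a, b));

/// COSINE (assumed scaling): 1 - cosine similarity, so 0 means the vectors
/// point in the same direction regardless of their lengths.
double cosineDistance(List<double> a, List<double> b) =>
    1 - _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)));
```

Because the square root is monotonic, EUCLIDEAN and EUCLIDEAN_SQUARED rank results in the same order; only the numeric values you compare against differ.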
Use APPROX_VECTOR_DISTANCE()
The following example filters by approximate distance.
```dart
final query = await database.createQuery(r'''
  SELECT META().id, color
  FROM colors
  WHERE APPROX_VECTOR_DISTANCE(vector, $vector) < 0.5
  LIMIT 8
''');
query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});
final resultSet = await query.execute();
```
Smaller distance values indicate more similar vectors.
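Distance values are computed with the index's metric, so the same cutoff can select different documents under different metrics. The pure-Dart sketch below (a hypothetical helper, not a cbl API) shows one vector that passes a < 0.5 cutoff under squared Euclidean distance but fails it under plain Euclidean distance.

```dart
import 'dart:math' as math;

/// Squared Euclidean distance (the default index metric).
double euclideanSquared(List<double> a, List<double> b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

/// Hypothetical helper: returns the IDs whose distance to [target] is below
/// [threshold], using squared or plain Euclidean distance.
List<String> belowThreshold(
  Map<String, List<double>> vectors,
  List<double> target,
  double threshold, {
  bool squared = true,
}) {
  final hits = <String>[];
  vectors.forEach((id, vector) {
    final sq = euclideanSquared(vector, target);
    final distance = squared ? sq : math.sqrt(sq);
    if (distance < threshold) {
      hits.add(id);
    }
  });
  return hits;
}
```

For a vector [1.0, 0.0, 0.67] and target [1.0, 0.0, 0.0], the squared distance is about 0.449 and passes the cutoff, while the plain distance is 0.67 and does not; pick thresholds with the index metric in mind.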
Prediction with APPROX_VECTOR_DISTANCE()
You can also use PREDICTION() together with APPROX_VECTOR_DISTANCE().
```dart
final query = await database.createQuery(r'''
  SELECT META().id, color
  FROM colors
  ORDER BY APPROX_VECTOR_DISTANCE(
    PREDICTION(ColorModel, {"colorInput": color}, "vector"),
    $vector
  )
  LIMIT 8
''');
query.parameters = Parameters({
  'vector': await colorVector('FF00AA'),
});
final resultSet = await query.execute();
```