Vector Query
This article is generated by AI translation.
In AI and machine learning scenarios, unstructured data such as text, images, and audio are typically encoded by models into high-dimensional vectors (also called Embeddings).
The distance between vectors measures the semantic similarity of the original data — the closer the distance, the more semantically similar.
The core question of vector query is: given a target vector, find the records in the database closest to it.
Distance Metrics
Different metrics suit different scenarios. dbVisitor supports 6 metrics: L2, Cosine, IP, Hamming, Jaccard, and BM25. Some rules of thumb:
- When unsure, choose L2 (Euclidean distance) — the most general-purpose metric.
- Text semantic search → Cosine, which only considers direction, not vector magnitude. Ideal for normalized embeddings.
- Recommendation / ranking → IP (Inner Product). When vectors are normalized, IP results are equivalent to Cosine similarity.
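To make the three most common metrics concrete, here is a minimal plain-Java sketch of the underlying formulas. The class `VectorMetrics` and its method names are illustrative assumptions and are not part of dbVisitor; in a real query the database computes these values.

```java
/** Illustrative helper (not part of dbVisitor): reference formulas for L2, Cosine and IP. */
class VectorMetrics {

    /** Inner product: sum of element-wise products. */
    static double innerProduct(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** L2 (Euclidean) distance: square root of the summed squared differences. */
    static double l2Distance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine distance: 1 minus the cosine of the angle, so magnitude is ignored. */
    static double cosineDistance(float[] a, float[] b) {
        double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("cosine distance is undefined for a zero vector");
        }
        return 1.0 - innerProduct(a, b) / norms;
    }

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {4f, 3f};
        System.out.println("L2     = " + l2Distance(a, b));
        System.out.println("Cosine = " + cosineDistance(a, b));
        System.out.println("IP     = " + innerProduct(a, b)); // prints "IP     = 24.0"
    }
}
```

For unit-length vectors the norms in `cosineDistance` are 1, so `1 - cosineDistance(a, b)` equals `innerProduct(a, b)`, which is why IP and Cosine agree on normalized embeddings.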
Two Query Modes
dbVisitor provides two vector search modes, corresponding to SQL ORDER BY and WHERE:
| Comparison | KNN (orderBy*) | Range (vectorBy*) |
|---|---|---|
| SQL position | ORDER BY | WHERE |
| Result count | Fixed K rows (use with initPage) | Variable, depends on threshold |
| Use case | "Find the N most similar" | "Find all within distance range" |
| Composability | Can add WHERE conditions for pre-filtering | Freely combine with other WHERE conditions |
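The difference between the two modes can be sketched with a brute-force, in-memory version of each. This is only an illustration of the semantics under the assumption of L2 distance; the class `KnnVsRangeSketch` and its methods are hypothetical and do not reflect how dbVisitor or the database executes the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: in-memory analogue of KNN (ORDER BY + LIMIT) and Range (WHERE). */
class KnnVsRangeSketch {

    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** KNN: rank every row by distance, then keep the first k. Result size is at most k. */
    static List<Integer> knn(List<float[]> rows, float[] target, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> l2(rows.get(i), target)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    /** Range: keep every row whose distance is strictly below the threshold. Result size varies. */
    static List<Integer> range(List<float[]> rows, float[] target, double threshold) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (l2(rows.get(i), target) < threshold) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<float[]> rows = Arrays.asList(
                new float[]{0f, 0f}, new float[]{1f, 0f}, new float[]{3f, 4f}, new float[]{10f, 10f});
        float[] target = {0f, 0f};
        System.out.println(knn(rows, target, 2));     // prints [0, 1]
        System.out.println(range(rows, target, 6.0)); // prints [0, 1, 2]
    }
}
```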
Preparation
1. Create Table
Using PostgreSQL + pgvector as an example, install the vector extension and create a vector column:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE product_vector (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
embedding vector(128) -- 128-dimensional vector
);
2. Entity Mapping
@Table("product_vector")
public class ProductVector {
@Column(primary = true)
private Integer id;
private String name;
@Column(typeHandler = PgVectorTypeHandler.class)
private List<Float> embedding;
// getter / setter ...
}
- Vector fields are represented as List<Float>.
- A typeHandler must be specified for vector fields to handle conversion between List<Float> and the database vector type.
PgVectorTypeHandler is an implementation for PostgreSQL pgvector, using PGobject to convert between List<Float> and the vector type.
For other databases (e.g., Milvus), use the corresponding TypeHandler.
3. Vector Parameter Format
In KNN sort queries (orderBy* series), the vector parameter needs to be a database-recognizable type.
For pgvector, you cannot pass List<Float> directly — it must be wrapped as a PGobject:
PGobject vectorParam = new PGobject();
vectorParam.setType("vector");
vectorParam.setValue("[0.1,0.2,0.3,...]"); // pgvector text format
In Range filter queries (vectorBy* series), the vector parameter is automatically converted through the entity mapping's TypeHandler, so you can pass List<Float> directly.
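When the vector starts out as a List<Float>, the pgvector text form used with setValue can be produced by a small helper. The sketch below is an assumption about how you might do this yourself; `PgVectorText` and `toText` are hypothetical names, not a dbVisitor or pgvector API, and the helper uses only the JDK so it can run without the PostgreSQL driver.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Illustrative helper (hypothetical name): formats a List<Float> as pgvector text, e.g. [0.1,0.2,0.3]. */
class PgVectorText {

    static String toText(List<Float> vector) {
        if (vector == null || vector.isEmpty()) {
            throw new IllegalArgumentException("vector must contain at least one element");
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (Float v : vector) {
            // Reject values that cannot be stored in a vector column.
            if (v == null || v.isNaN() || v.isInfinite()) {
                throw new IllegalArgumentException("vector elements must be finite numbers");
            }
            joiner.add(Float.toString(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toText(Arrays.asList(0.1f, 0.2f, 0.3f))); // prints [0.1,0.2,0.3]
    }
}
```

The returned string would then be passed to vectorParam.setValue(...) on the PGobject shown above.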
KNN Nearest-Neighbor Search
Use orderBy* methods to sort by vector distance and return the K records closest to the target.
L2 Euclidean Distance
LambdaTemplate lambda = ...;
Object target = ...; // target vector (PGobject or database-specific type)
List<ProductVector> results = lambda.query(ProductVector.class)
.orderByL2(ProductVector::getEmbedding, target)
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <-> ? ASC
Cosine Distance
List<ProductVector> results = lambda.query(ProductVector.class)
.orderByCosine(ProductVector::getEmbedding, target)
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <=> ? ASC
IP Inner Product Distance
List<ProductVector> results = lambda.query(ProductVector.class)
.orderByIP(ProductVector::getEmbedding, target)
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <#> ? ASC
pgvector's <#> operator returns the negative inner product, so an ascending sort puts the record with the largest inner product (most similar) first.
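The effect of negating the inner product can be checked with a small in-memory sketch. The class `NegativeInnerProductOrder` and its methods are illustrative assumptions; they mimic the ascending sort on a negated inner product and are not how pgvector or dbVisitor is implemented.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: ascending sort on the negated inner product ranks the largest inner product first. */
class NegativeInnerProductOrder {

    /** Mirrors the value described for pgvector's operator: the inner product with its sign flipped. */
    static double negativeInnerProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return -sum;
    }

    /** Returns row indexes ordered ascending by negative inner product against the target. */
    static Integer[] rankAscending(float[][] rows, float[] target) {
        Integer[] order = new Integer[rows.length];
        for (int i = 0; i < rows.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> negativeInnerProduct(rows[i], target)));
        return order;
    }

    public static void main(String[] args) {
        float[][] rows = {{1f, 0f}, {2f, 2f}, {-1f, -1f}}; // inner products with target: 1, 4, -2
        float[] target = {1f, 1f};
        System.out.println(Arrays.toString(rankAscending(rows, target))); // prints [1, 0, 2]
    }
}
```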
Top-K Query
Combine with initPage to return only the K nearest records:
int topK = 5;
List<ProductVector> results = lambda.query(ProductVector.class)
.orderByL2(ProductVector::getEmbedding, target)
.initPage(topK, 0) // take only top 5
.queryForList();
Generic Metric Interface
Use orderByMetric with the MetricType enum to dynamically specify the metric:
import net.hasor.dbvisitor.lambda.core.MetricType;
List<ProductVector> results = lambda.query(ProductVector.class)
.orderByMetric(MetricType.L2, ProductVector::getEmbedding, target)
.queryForList();
See the Distance Metrics section above for all available metrics.
| MetricType | Shortcut method | pgvector operator |
|---|---|---|
| MetricType.L2 | orderByL2 | <-> |
| MetricType.COSINE | orderByCosine | <=> |
| MetricType.IP | orderByIP | <#> |
| MetricType.HAMMING | orderByHamming | <~> |
| MetricType.JACCARD | orderByJaccard | <%> |
| MetricType.BM25 | orderByBM25 | <?> |
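Selecting a metric at runtime amounts to choosing an operator from a lookup like the table above. The following is a toy sketch of that idea, limited to the three operators whose equivalent SQL is shown earlier in this article; `SketchMetric` is a hypothetical enum written for illustration and is not dbVisitor's MetricType or its SQL generator.

```java
/** Hypothetical enum for illustration; not dbVisitor's MetricType. */
enum SketchMetric {
    L2("<->"),
    COSINE("<=>"),
    IP("<#>");

    final String operator;

    SketchMetric(String operator) {
        this.operator = operator;
    }

    /** Builds the ORDER BY fragment for a vector column, with one bind placeholder for the target. */
    String orderByClause(String column) {
        return "ORDER BY " + column + " " + operator + " ? ASC";
    }
}

class MetricDispatchSketch {
    public static void main(String[] args) {
        // The metric could come from configuration or a request parameter.
        for (SketchMetric metric : SketchMetric.values()) {
            System.out.println(metric + " -> " + metric.orderByClause("embedding"));
        }
    }
}
```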
Range Search
Use vectorBy* methods to return only records whose distance to the target vector is less than the threshold. These are WHERE conditions that can be freely combined with other conditions.
L2 Distance Filtering
List<Float> target = ...; // can use List<Float> directly
double threshold = 5.0;
List<ProductVector> results = lambda.query(ProductVector.class)
.vectorByL2(ProductVector::getEmbedding, target, threshold)
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector WHERE embedding <-> ? < ?
Cosine Distance Filtering
List<ProductVector> results = lambda.query(ProductVector.class)
.vectorByCosine(ProductVector::getEmbedding, target, 0.1)
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector WHERE embedding <=> ? < ?
IP Distance Filtering
List<ProductVector> results = lambda.query(ProductVector.class)
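// pgvector's <#> yields the negative inner product, so a threshold of -50.0 keeps rows whose inner product is greater than 50
.vectorByIP(ProductVector::getEmbedding, target, -50.0)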
.queryForList();
vectorBy* vector parameters are automatically converted through the entity mapping's TypeHandler, so you can pass List<Float> directly.
orderBy* vector parameters go directly into SQL parameter binding and need to be a database-recognizable type (e.g., PGobject).
Condition Toggle
All vectorBy* methods support a boolean first parameter to control whether the condition is active:
boolean enableVectorFilter = ...;
List<ProductVector> results = lambda.query(ProductVector.class)
.vectorByL2(enableVectorFilter, ProductVector::getEmbedding, target, threshold)
.queryForList();
// When enableVectorFilter = false, the vector filter condition will not appear in the SQL
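The toggle behavior can be pictured with a toy builder that only records a clause when its flag is true. This is a sketch of the general pattern, under the assumption that a skipped condition simply leaves no trace in the SQL; `ToyWhereBuilder` is a hypothetical class and says nothing about dbVisitor's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy builder: a clause is added only when its test flag is true. */
class ToyWhereBuilder {
    private final List<String> clauses = new ArrayList<>();

    ToyWhereBuilder and(boolean test, String clause) {
        if (test) {
            clauses.add(clause);
        }
        return this;
    }

    /** Appends a WHERE section only if at least one clause was kept. */
    String build(String select) {
        return clauses.isEmpty() ? select : select + " WHERE " + String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        String base = "SELECT * FROM product_vector";
        System.out.println(new ToyWhereBuilder().and(true, "embedding <-> ? < ?").build(base));
        // prints SELECT * FROM product_vector WHERE embedding <-> ? < ?
        System.out.println(new ToyWhereBuilder().and(false, "embedding <-> ? < ?").build(base));
        // prints SELECT * FROM product_vector
    }
}
```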
Combined Queries
Vector queries can be freely combined with scalar conditions: a scalar filter can narrow the candidates before a KNN sort, or sit alongside a vector range filter in the same WHERE clause.
KNN + Scalar Filtering
List<ProductVector> results = lambda.query(ProductVector.class)
.likeRight(ProductVector::getName, "Cat-A") // scalar condition
.orderByL2(ProductVector::getEmbedding, target) // vector sort
.initPage(3, 0) // Top-3
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector
// WHERE name LIKE 'Cat-A%'
// ORDER BY embedding <-> ?
// LIMIT 3
Range + Scalar Filtering
List<ProductVector> results = lambda.query(ProductVector.class)
.likeRight(ProductVector::getName, "R-A") // scalar condition
.vectorByL2(ProductVector::getEmbedding, target, 6.0) // vector range filter
.queryForList();
// Equivalent SQL (pgvector):
// SELECT * FROM product_vector
// WHERE name LIKE 'R-A%'
// AND embedding <-> ? < ?
Basic Operations
Vector data insert, update, and delete operations are identical to regular entities. The TypeHandler automatically handles serialization and deserialization of List<Float>.
ProductVector p = new ProductVector();
p.setId(1001);
p.setName("sample");
p.setEmbedding(Arrays.asList(0.1f, 0.2f, 0.3f, ...)); // 128 dimensions
lambda.insert(ProductVector.class)
.applyEntity(p)
.executeSumResult();
List<Float> newVec = Arrays.asList(0.9f, 0.8f, 0.7f, ...);
lambda.update(ProductVector.class)
.eq(ProductVector::getId, 1001)
.updateTo(ProductVector::getEmbedding, newVec)
.doUpdate();
ProductVector loaded = lambda.query(ProductVector.class)
.eq(ProductVector::getId, 1001)
.queryForObject();
List<Float> vec = loaded.getEmbedding(); // automatically deserialized to List<Float>
API Reference
KNN Sort (QueryFunc Interface)
| Method | Description |
|---|---|
| orderByL2(P property, Object vector) | Sort by L2 distance |
| orderByCosine(P property, Object vector) | Sort by Cosine distance |
| orderByIP(P property, Object vector) | Sort by IP distance |
| orderByHamming(P property, Object vector) | Sort by Hamming distance |
| orderByJaccard(P property, Object vector) | Sort by Jaccard distance |
| orderByBM25(P property, Object vector) | Sort by BM25 score |
| orderByMetric(MetricType, P property, Object vector) | Specify metric via enum |
Range Search (QueryCompare Interface)
| Method | Description |
|---|---|
| vectorByL2(P property, Object vector, Number threshold) | L2 distance less than threshold |
| vectorByCosine(P property, Object vector, Number threshold) | Cosine distance less than threshold |
| vectorByIP(P property, Object vector, Number threshold) | IP distance less than threshold |
| vectorByHamming(P property, Object vector, Number threshold) | Hamming distance less than threshold |
| vectorByJaccard(P property, Object vector, Number threshold) | Jaccard distance less than threshold |
| vectorByBM25(P property, Object vector, Number threshold) | BM25 distance less than threshold |
All vectorBy* methods also support the (boolean test, P property, Object vector, Number threshold) overload for dynamically controlling whether the condition is active.
Related classes:
- net.hasor.dbvisitor.lambda.core.MetricType
- net.hasor.dbvisitor.lambda.core.QueryFunc
- net.hasor.dbvisitor.lambda.core.QueryCompare