
Vector Query

Hint

This article is generated by AI translation.

In AI and machine learning scenarios, unstructured data such as text, images, and audio is typically encoded by a model into high-dimensional vectors (also called embeddings).
The distance between two vectors measures the semantic similarity of the original data: the smaller the distance, the more similar the items.

[Figure: raw data (a cat photo, a dog photo, a car photo, a text passage) is encoded by an embedding model into a vector space, e.g. [0.12, 0.85, ..., 0.33]. Records are then ranked by distance to the query: Cat (dist=0.05), Dog (dist=0.12), Text (dist=0.58), Car (dist=0.91).]

The core question of a vector query is: given a target vector, which records in the database are closest to it?

Distance Metrics

Different metrics suit different scenarios. dbVisitor supports 6 metrics:

  • L2 Euclidean Distance (pgvector: <->): straight-line distance between two points, d = √(Σ(aᵢ - bᵢ)²). Lower → more similar. Suits: general-purpose default.
  • Cosine Distance (pgvector: <=>): complement of the cosine of the angle, d = 1 - cos(θ). Lower → closer direction. Suits: text semantics / NLP.
  • IP Inner Product (pgvector: <#>): dot product of the vectors, negated for sorting, d = -Σ(aᵢ × bᵢ). Lower → larger inner product. Suits: recommendation systems.
  • Hamming Distance (pgvector: <~>): number of differing bits, for binary vectors. Lower → more similar. Suits: hash / fingerprint matching.
  • Jaccard Distance (pgvector: <%>): complement of intersection-over-union, d = 1 - |A∩B| / |A∪B|. Lower → more similar. Suits: set / tag similarity.
  • BM25 (pgvector: <?>): term-frequency-based text relevance, classic IR scoring. Lower → more relevant. Suits: full-text search.

How to choose a metric
  • When unsure, choose L2 (Euclidean distance), the most general-purpose metric.
  • Text semantic search → Cosine, which considers only the direction of the vectors, not their magnitude.
  • Recommendation / ranking → IP (Inner Product). When vectors are normalized to unit length, ranking by IP gives the same order as ranking by Cosine.
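
To make the formulas concrete, here is a minimal plain-Java sketch of the L2, Cosine, and IP metrics. The class and method names are illustrative and are not part of dbVisitor; in practice the database computes these values.

```java
/** Plain-Java versions of three metrics, for illustration only (not dbVisitor API). */
class VectorMetrics {

    /** L2 (Euclidean) distance: sqrt(sum((a_i - b_i)^2)). */
    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Inner product: sum(a_i * b_i). */
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine distance: 1 - cos(theta). */
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Negative inner product, so that "lower" still means "more similar". */
    static double negativeInnerProduct(float[] a, float[] b) {
        return -dot(a, b);
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {4f, 3f};
        System.out.println("L2     = " + l2(a, b));
        System.out.println("Cosine = " + cosineDistance(a, b));
        System.out.println("IP     = " + negativeInnerProduct(a, b));
    }
}
```

For unit-length vectors, cosineDistance(a, b) equals 1 + negativeInnerProduct(a, b) up to rounding, which is why the two metrics rank normalized embeddings identically.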

Two Query Modes

dbVisitor provides two vector search modes, corresponding to SQL ORDER BY and WHERE:

  • KNN Sort Mode (orderBy*): returns the K records closest to the target by sorting on distance and taking the top K. SQL: ORDER BY embedding <-> ? LIMIT K
  • Range Filter Mode (vectorBy*): returns all records within a threshold distance of the target; records outside the threshold are discarded. SQL: WHERE embedding <-> ? < threshold

| Comparison | KNN (orderBy*) | Range (vectorBy*) |
|---|---|---|
| SQL position | ORDER BY | WHERE |
| Result count | At most K rows (use with initPage) | Variable, depends on threshold |
| Use case | "Find the N most similar" | "Find all within distance range" |
| Composability | Can add WHERE conditions for pre-filtering | Freely combine with other WHERE conditions |
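
The difference between the two modes can be sketched in memory. The following is a brute-force illustration with hypothetical names, not how dbVisitor or pgvector execute the query: KNN sorts and truncates, while range filtering keeps every row under the threshold.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory illustration of KNN versus range search (hypothetical helper). */
class KnnVersusRange {

    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** KNN mode: sort all rows by distance to the target and keep the first k. */
    static List<float[]> knn(List<float[]> rows, float[] target, int k) {
        List<float[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((float[] r) -> l2(r, target)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Range mode: keep every row whose distance is strictly below the threshold. */
    static List<float[]> range(List<float[]> rows, float[] target, double threshold) {
        List<float[]> hits = new ArrayList<>();
        for (float[] r : rows) {
            if (l2(r, target) < threshold) {
                hits.add(r);
            }
        }
        return hits;
    }
}
```

With the same data, knn returns at most k rows regardless of how far away they are, whereas range may return none or all of them depending on the threshold.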

Preparation

1. Create Table

Using PostgreSQL + pgvector as an example, install the vector extension and create a vector column:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE product_vector (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    embedding vector(128) -- 128-dimensional vector
);

2. Entity Mapping

Vector entity class
@Table("product_vector")
public class ProductVector {
    @Column(primary = true)
    private Integer id;
    private String name;

    @Column(typeHandler = PgVectorTypeHandler.class)
    private List<Float> embedding;

    // getter / setter ...
}
  • Vector fields use List<Float> for representation.
  • A typeHandler must be specified for vector fields to handle conversion between List<Float> and the database vector type.
Tip

PgVectorTypeHandler is an implementation for PostgreSQL pgvector, using PGobject to convert between List<Float> and the vector type. For other databases (e.g., Milvus), use the corresponding TypeHandler.

3. Vector Parameter Format

In KNN sort queries (the orderBy* series), the vector parameter is bound to the SQL statement as-is, so it must already be a type the database recognizes.
For pgvector, a List<Float> cannot be passed directly; wrap it in a PGobject:

Construct vector query parameter (pgvector)
PGobject vectorParam = new PGobject();
vectorParam.setType("vector");
vectorParam.setValue("[0.1,0.2,0.3,...]"); // pgvector text format
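
The text value passed to setValue can be produced from a List<Float>. Below is a minimal sketch of such a conversion in both directions; PgVectorText and its methods are hypothetical helpers written for this example and are not part of dbVisitor or the PostgreSQL driver.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the pgvector text format, e.g. [0.1,0.2,0.3]. */
class PgVectorText {

    /** Formats a vector as a bracketed, comma-separated list of floats. */
    static String format(List<Float> vector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < vector.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(vector.get(i).floatValue());
        }
        return sb.append(']').toString();
    }

    /** Parses the bracketed text form back into a list of floats. */
    static List<Float> parse(String text) {
        String body = text.trim();
        if (body.length() < 2 || body.charAt(0) != '[' || body.charAt(body.length() - 1) != ']') {
            throw new IllegalArgumentException("not a vector literal: " + text);
        }
        body = body.substring(1, body.length() - 1).trim();
        List<Float> out = new ArrayList<>();
        if (body.isEmpty()) {
            return out;
        }
        for (String part : body.split(",")) {
            out.add(Float.parseFloat(part.trim()));
        }
        return out;
    }
}
```

The result of PgVectorText.format(list) could then be handed to vectorParam.setValue(...) as in the snippet above.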

In Range filter queries (vectorBy* series), the vector parameter is automatically converted through the entity mapping's TypeHandler, so you can pass List<Float> directly.

KNN Nearest-Neighbor Search

Use orderBy* methods to sort by vector distance and return the K records closest to the target.

L2 Euclidean Distance

Sort by L2 distance
LambdaTemplate lambda = ...;
Object target = ...; // target vector (PGobject or database-specific type)

List<ProductVector> results = lambda.query(ProductVector.class)
    .orderByL2(ProductVector::getEmbedding, target)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <-> ? ASC

Cosine Distance

Sort by Cosine distance
List<ProductVector> results = lambda.query(ProductVector.class)
    .orderByCosine(ProductVector::getEmbedding, target)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <=> ? ASC

IP Inner Product Distance

Sort by Inner Product distance
List<ProductVector> results = lambda.query(ProductVector.class)
    .orderByIP(ProductVector::getEmbedding, target)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector ORDER BY embedding <#> ? ASC
IP Distance Note

pgvector's <#> operator returns the negative inner product, so an ascending sort places the record with the largest inner product (the most similar) first.
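
A small in-memory sketch shows why negating the inner product makes an ascending sort return the best match first. The class below is illustrative only and merely mimics the ordering; it does not call pgvector.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Mimics an ascending sort on the negative inner product (illustrative only). */
class NegativeInnerProductOrder {

    static double negativeInnerProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return -sum;
    }

    /** Returns row indexes sorted ascending by negative inner product with the target. */
    static Integer[] order(float[][] rows, float[] target) {
        Integer[] idx = new Integer[rows.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble(
                (Integer i) -> negativeInnerProduct(rows[i], target)));
        return idx;
    }

    public static void main(String[] args) {
        float[][] rows = {{1f, 0f}, {5f, 5f}, {2f, 2f}};
        float[] target = {1f, 1f};
        // Inner products are 1, 10 and 4, so row 1 sorts first.
        System.out.println(Arrays.toString(order(rows, target)));
    }
}
```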

Top-K Query

Combine with initPage to return only the K nearest records:

Top-K nearest neighbors
int topK = 5;

List<ProductVector> results = lambda.query(ProductVector.class)
    .orderByL2(ProductVector::getEmbedding, target)
    .initPage(topK, 0) // take only top 5
    .queryForList();

Generic Metric Interface

Use orderByMetric with the MetricType enum to dynamically specify the metric:

Using MetricType enum
import net.hasor.dbvisitor.lambda.core.MetricType;

List<ProductVector> results = lambda.query(ProductVector.class)
    .orderByMetric(MetricType.L2, ProductVector::getEmbedding, target)
    .queryForList();

The table below maps each MetricType value to its shortcut method and pgvector operator. See the Distance Metrics section above for what each metric measures.

| MetricType | Shortcut method | pgvector operator |
|---|---|---|
| MetricType.L2 | orderByL2 | <-> |
| MetricType.COSINE | orderByCosine | <=> |
| MetricType.IP | orderByIP | <#> |
| MetricType.HAMMING | orderByHamming | <~> |
| MetricType.JACCARD | orderByJaccard | <%> |
| MetricType.BM25 | orderByBM25 | <?> |
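
As a rough illustration of how an enum can select the operator at runtime, the sketch below uses a stand-in enum named Metric. It is hypothetical and only mirrors the mapping listed here; the real MetricType lives in dbVisitor and its internals may differ.

```java
/** Hypothetical sketch: selecting an operator from an enum value (not dbVisitor code). */
class MetricOperatorSketch {

    /** Stand-in for MetricType, carrying the pgvector operator for each metric. */
    enum Metric {
        L2("<->"), COSINE("<=>"), IP("<#>"), HAMMING("<~>"), JACCARD("<%>"), BM25("<?>");

        final String pgvectorOperator;

        Metric(String pgvectorOperator) {
            this.pgvectorOperator = pgvectorOperator;
        }
    }

    /** Builds an ORDER BY fragment in the shape of the "Equivalent SQL" comments. */
    static String orderByClause(String column, Metric metric) {
        return "ORDER BY " + column + " " + metric.pgvectorOperator + " ? ASC";
    }

    public static void main(String[] args) {
        for (Metric m : Metric.values()) {
            System.out.println(m + " -> " + orderByClause("embedding", m));
        }
    }
}
```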

Range Search

Use vectorBy* methods to return only records whose distance to the target vector is less than the threshold. These are WHERE conditions that can be freely combined with other conditions.

L2 Distance Filtering

L2 distance range filter
List<Float> target = ...;   // can use List<Float> directly
double threshold = 5.0;

List<ProductVector> results = lambda.query(ProductVector.class)
    .vectorByL2(ProductVector::getEmbedding, target, threshold)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector WHERE embedding <-> ? < ?

Cosine Distance Filtering

Cosine distance range filter
List<ProductVector> results = lambda.query(ProductVector.class)
    .vectorByCosine(ProductVector::getEmbedding, target, 0.1)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector WHERE embedding <=> ? < ?
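
Choosing a cosine threshold is easier when it is read as an angle. Since cosine distance is 1 - cos(θ), a threshold can be converted to the boundary angle as in this small sketch (the class name is illustrative, not part of dbVisitor):

```java
/** Converts a cosine-distance threshold into the corresponding boundary angle. */
class CosineThreshold {

    /** Angle in degrees at which the cosine distance equals the given threshold. */
    static double boundaryAngleDegrees(double cosineDistanceThreshold) {
        if (cosineDistanceThreshold < 0.0 || cosineDistanceThreshold > 2.0) {
            throw new IllegalArgumentException("cosine distance lies in [0, 2]");
        }
        double cosine = 1.0 - cosineDistanceThreshold;
        return Math.toDegrees(Math.acos(cosine));
    }

    public static void main(String[] args) {
        // A threshold of 0.1 corresponds to an angle of roughly 25.8 degrees.
        System.out.println(boundaryAngleDegrees(0.1));
    }
}
```

Rows pass the filter when the angle between their embedding and the target is smaller than this boundary angle.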

IP Distance Filtering

IP distance range filter
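List<ProductVector> results = lambda.query(ProductVector.class)
    .vectorByIP(ProductVector::getEmbedding, target, -50.0)
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector WHERE embedding <#> ? < ?
// Because <#> yields the negative inner product, a threshold of -50.0 keeps rows whose inner product exceeds 50.
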
vectorBy vs orderBy parameter differences

vectorBy* vector parameters are automatically converted through the entity mapping's TypeHandler, so you can pass List<Float> directly.
orderBy* vector parameters go directly into SQL parameter binding and need to be a database-recognizable type (e.g., PGobject).

Condition Toggle

All vectorBy* methods support a boolean first parameter to control whether the condition is active:

Dynamically control vector filtering
boolean enableVectorFilter = ...;

List<ProductVector> results = lambda.query(ProductVector.class)
    .vectorByL2(enableVectorFilter, ProductVector::getEmbedding, target, threshold)
    .queryForList();

// When enableVectorFilter = false, the vector filter condition will not appear in the SQL
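
The toggle behaves like a conditional append to the WHERE clause. The following mini builder is a hypothetical sketch that only imitates this behaviour; it is not how dbVisitor is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini builder that imitates the boolean toggle (not dbVisitor code). */
class ConditionalWhere {

    private final List<String> conditions = new ArrayList<>();

    /** Records the condition only when test is true; otherwise it is skipped. */
    ConditionalWhere and(boolean test, String condition) {
        if (test) {
            conditions.add(condition);
        }
        return this;
    }

    /** Renders the SELECT, omitting WHERE entirely when no condition is active. */
    String toSql(String table) {
        String sql = "SELECT * FROM " + table;
        if (conditions.isEmpty()) {
            return sql;
        }
        return sql + " WHERE " + String.join(" AND ", conditions);
    }
}
```

For example, new ConditionalWhere().and(false, "embedding <-> ? < ?").toSql("product_vector") yields a plain SELECT with no WHERE clause.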

Combined Queries

Vector conditions combine freely with scalar conditions: a scalar filter can narrow the candidate rows before a KNN sort, or sit alongside a vector range filter in the same WHERE clause.

KNN + Scalar Filtering

KNN search within a specific category
List<ProductVector> results = lambda.query(ProductVector.class)
    .likeRight(ProductVector::getName, "Cat-A") // scalar condition
    .orderByL2(ProductVector::getEmbedding, target) // vector sort
    .initPage(3, 0) // Top-3
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector
// WHERE name LIKE 'Cat-A%'
// ORDER BY embedding <-> ?
// LIMIT 3
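
The order of operations in this query (filter, then sort, then limit) can be mirrored in memory with Java streams. This is a sketch with hypothetical types (Row, search) that stand in for the table and the query; it is not dbVisitor code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory mirror of WHERE name LIKE 'prefix%' ORDER BY distance LIMIT k. */
class FilterThenKnn {

    /** Minimal stand-in for a row of product_vector. */
    static final class Row {
        final String name;
        final float[] embedding;

        Row(String name, float[] embedding) {
            this.name = name;
            this.embedding = embedding;
        }
    }

    static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static List<Row> search(List<Row> rows, String namePrefix, float[] target, int k) {
        return rows.stream()
                .filter(r -> r.name.startsWith(namePrefix))                              // scalar pre-filter
                .sorted(Comparator.comparingDouble((Row r) -> l2(r.embedding, target)))  // vector sort
                .limit(k)                                                                // Top-K
                .collect(Collectors.toList());
    }
}
```

Because the scalar filter runs first, a row outside the category never appears in the result, even if it is the nearest vector overall.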

Range + Scalar Filtering

Vector range filter + scalar condition
List<ProductVector> results = lambda.query(ProductVector.class)
    .likeRight(ProductVector::getName, "R-A") // scalar condition
    .vectorByL2(ProductVector::getEmbedding, target, 6.0) // vector range filter
    .queryForList();

// Equivalent SQL (pgvector):
// SELECT * FROM product_vector
// WHERE name LIKE 'R-A%'
// AND embedding <-> ? < ?

Basic Operations

Vector data insert, update, and delete operations are identical to regular entities. The TypeHandler automatically handles serialization and deserialization of List<Float>.

Insert vector
ProductVector p = new ProductVector();
p.setId(1001);
p.setName("sample");
p.setEmbedding(Arrays.asList(0.1f, 0.2f, 0.3f, ...)); // 128 dimensions

lambda.insert(ProductVector.class)
    .applyEntity(p)
    .executeSumResult();

Update vector
List<Float> newVec = Arrays.asList(0.9f, 0.8f, 0.7f, ...);

lambda.update(ProductVector.class)
    .eq(ProductVector::getId, 1001)
    .updateTo(ProductVector::getEmbedding, newVec)
    .doUpdate();

Read vector
ProductVector loaded = lambda.query(ProductVector.class)
    .eq(ProductVector::getId, 1001)
    .queryForObject();

List<Float> vec = loaded.getEmbedding(); // automatically deserialized to List<Float>

API Reference

KNN Sort (QueryFunc Interface)

| Method | Description |
|---|---|
| orderByL2(P property, Object vector) | Sort by L2 distance |
| orderByCosine(P property, Object vector) | Sort by Cosine distance |
| orderByIP(P property, Object vector) | Sort by IP distance |
| orderByHamming(P property, Object vector) | Sort by Hamming distance |
| orderByJaccard(P property, Object vector) | Sort by Jaccard distance |
| orderByBM25(P property, Object vector) | Sort by BM25 score |
| orderByMetric(MetricType, P property, Object vector) | Specify metric via enum |

Range Search (QueryCompare Interface)

| Method | Description |
|---|---|
| vectorByL2(P property, Object vector, Number threshold) | L2 distance less than threshold |
| vectorByCosine(P property, Object vector, Number threshold) | Cosine distance less than threshold |
| vectorByIP(P property, Object vector, Number threshold) | IP distance less than threshold |
| vectorByHamming(P property, Object vector, Number threshold) | Hamming distance less than threshold |
| vectorByJaccard(P property, Object vector, Number threshold) | Jaccard distance less than threshold |
| vectorByBM25(P property, Object vector, Number threshold) | BM25 distance less than threshold |

All vectorBy* methods also support the (boolean test, P property, Object vector, Number threshold) overload for dynamically controlling whether the condition is active.

Related classes:

  • net.hasor.dbvisitor.lambda.core.MetricType
  • net.hasor.dbvisitor.lambda.core.QueryFunc
  • net.hasor.dbvisitor.lambda.core.QueryCompare