Metadata Filtering
Metadata filters can be applied during nearest-neighbor search or used standalone via
meta_search / meta_search_paginated.
Privacy Notice
Warning
Metadata fields (tag1–tag5 and facets) are stored in plaintext and are not
encrypted by default. Chunk content and embedding vectors are end-to-end encrypted, so
metadata is the only part of a chunk that the XTrace server can read.
If metadata privacy is a requirement, consider the following mitigations until native support is available:
- Store only opaque identifiers in metadata tags (e.g. a hashed or randomly assigned user_id rather than a readable name), and keep the mapping in your own system.
- Restrict filters to equality checks ($eq / $in) on those opaque values. Range operators ($gt, $lte, $begins_with, etc.) leak ordering or prefix information and should be avoided when the tag value itself is sensitive.
XTrace plans to support encrypted metadata indexes natively in a future release.
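A minimal sketch of the first mitigation, assuming you derive opaque identifiers with a keyed hash (HMAC-SHA256) on the client. A plain, unkeyed hash of a guessable value such as an email address can be reversed by hashing likely candidates; an HMAC cannot be recomputed without the secret key, which stays in your own system. The helper `opaque_tag`, the placeholder key, and the sample email are hypothetical and not part of the XTrace SDK.

```python
import hashlib
import hmac


def opaque_tag(secret_key: bytes, value: str) -> str:
    """Hypothetical helper: map a readable value to a deterministic, opaque tag.

    The same (key, value) pair always yields the same 64-character hex digest,
    so equality filters still work, but the server never sees the readable value.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in practice, load a random key from your own secret store and never upload it.
SECRET_KEY = b"replace-with-a-random-32-byte-key"

# Equality filter on an opaque identifier (no range operators).
meta_filter = {"tag1": opaque_tag(SECRET_KEY, "alice@example.com")}

# Keep the reverse mapping client-side if you need to show readable names.
reverse_mapping = {meta_filter["tag1"]: "alice@example.com"}
```

Because the digest is deterministic, equality filters on tag1 keep working. Its lexicographic order is unrelated to the original value, so range operators on such tags return nothing meaningful, which is consistent with the advice to stick to equality checks.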
Metadata Fields
Each chunk has five indexed scalar tags and one multi-value field:
| Field | Type | Recommended Use |
|---|---|---|
| tag1 | String | High-cardinality identifier |
| tag2 | String | Collection / project / knowledge base |
| tag3 | Zero-padded number string | Numeric ranges (e.g. score, price, count) |
| tag4 | ISO 8601 date string | Temporal ranges |
| tag5 | String | Source or namespace |
| facets | List of strings | Labels, categories, ad-hoc metadata |
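Before uploading, it can help to check a metadata record against the conventions in the table. The sketch below is a hypothetical client-side check, not an XTrace API; it assumes a 10-digit width for tag3 and second-precision UTC timestamps for tag4, and treats any key other than tag1 to tag5 and facets as a mistake.

```python
import re

# Assumed conventions for illustration: 10-digit tag3, second-precision UTC tag4.
TAG3_PATTERN = re.compile(r"[0-9]{10}")
TAG4_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z")
KNOWN_FIELDS = {"tag1", "tag2", "tag3", "tag4", "tag5", "facets"}


def check_metadata(meta: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"unknown field: {key}" for key in sorted(meta.keys() - KNOWN_FIELDS)]
    # All five scalar tags are compared as strings, so they must be strings.
    for key in ("tag1", "tag2", "tag3", "tag4", "tag5"):
        if key in meta and not isinstance(meta[key], str):
            problems.append(f"{key} must be a string")
    tag3 = meta.get("tag3")
    if isinstance(tag3, str) and not TAG3_PATTERN.fullmatch(tag3):
        problems.append("tag3 must be a zero-padded 10-digit string")
    tag4 = meta.get("tag4")
    if isinstance(tag4, str) and not TAG4_PATTERN.fullmatch(tag4):
        problems.append("tag4 must look like 2024-01-01T00:00:00Z")
    facets = meta.get("facets")
    if facets is not None and not (
        isinstance(facets, list) and all(isinstance(f, str) for f in facets)
    ):
        problems.append("facets must be a list of strings")
    return problems


record = {
    "tag1": "org_772",
    "tag2": "invoices",
    "tag3": "0000000010",
    "tag4": "2024-01-01T00:00:00Z",
    "facets": ["finance", "tax_audit"],
}
```

The patterns only check the shape of the strings (for example, a month of 13 would pass), which is enough to catch the unpadded numbers and mixed timestamp formats that break range filters.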
Warning
All tags are compared as strings. For correct range ordering, numeric values must be zero-padded to a fixed width and dates must use ISO 8601 UTC.
metadata = {
    # Numeric: zero-pad to a fixed width so "0000000010" > "0000000002"
    "tag3": "0000000010",
    # Date: ISO 8601 in UTC
    "tag4": "2024-01-01T00:00:00Z",
}
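Encoding these strings by hand is error-prone, so a pair of small helpers can produce them consistently. This is a sketch under two assumptions: a fixed width of 10 digits (matching the "0000000010" example) and second-precision timestamps ending in Z. The names encode_number and encode_datetime are hypothetical.

```python
from datetime import datetime, timezone

TAG3_WIDTH = 10  # assumption: one fixed width shared by every stored tag3 value


def encode_number(value: int, width: int = TAG3_WIDTH) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("negative numbers do not sort correctly when zero-padded")
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text.zfill(width)


def encode_datetime(dt: datetime) -> str:
    """Render a timezone-aware datetime as ISO 8601 UTC with second precision."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


metadata = {
    "tag3": encode_number(10),
    "tag4": encode_datetime(datetime(2024, 1, 1, tzinfo=timezone.utc)),
}
```

Plain zero-padding does not handle negative numbers or decimals, which is why the helper rejects negatives. Keep one timestamp format per tag as well: "2024-01-01T00:00:00.500Z" sorts before "2024-01-01T00:00:00Z" because "." < "Z", so mixing precisions breaks range filters.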
Supported Operators
facets (multi-value)
| Operator | Description |
|---|---|
| $subset | Contains all provided tokens |
| … | Contains at least one provided token |
| … | Contains none of the provided tokens |
| $contains | Contains the given token |
| … | Facets list has exactly N tokens |
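The descriptions above are set relations between a chunk's facets and the tokens in a filter. The plain-Python predicates below restate them for local reasoning or unit tests; the function names are hypothetical (they are not the XTrace operator names), and the sketch treats facets as a set except for the count check, which uses the list length.

```python
from typing import Sequence


def contains_all(facets: Sequence[str], tokens: Sequence[str]) -> bool:
    """Every provided token appears among the chunk's facets."""
    return set(tokens) <= set(facets)


def contains_any(facets: Sequence[str], tokens: Sequence[str]) -> bool:
    """At least one provided token appears among the chunk's facets."""
    return not set(tokens).isdisjoint(facets)


def contains_none(facets: Sequence[str], tokens: Sequence[str]) -> bool:
    """None of the provided tokens appear among the chunk's facets."""
    return set(tokens).isdisjoint(facets)


def contains_token(facets: Sequence[str], token: str) -> bool:
    """The single given token appears among the chunk's facets."""
    return token in facets


def has_size(facets: Sequence[str], n: int) -> bool:
    """The facets list has exactly n entries."""
    return len(facets) == n


chunk_facets = ["finance", "tax_audit", "2024"]
```

Edge cases such as an empty token list follow Python's set semantics here (contains_all with no tokens is vacuously true) and may behave differently on the server.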
Query Examples
Nearest-neighbor search with a filter:
ids = await retriever.nn_search_for_ids(
query_vector,
k=5,
kb_id="your_kb_id",
meta_filter={"tag1": "org_772", "tag2": "invoices"},
)
Range filter on a numeric tag:
meta_filter = {
    "tag1": "org_772",
    "tag3": {"$gt": "0000100000"},  # 100000, zero-padded to the same 10-digit width as above
}
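Building range conditions from integers avoids hand-padding mistakes. A minimal sketch, assuming every stored tag3 value uses the same 10-digit width as the "0000000010" example and using only the $gte and $lte operators that appear on this page; pad and numeric_range_filter are hypothetical helpers.

```python
from typing import Optional

WIDTH = 10  # assumption: matches the width used when the tags were stored


def pad(value: int, width: int = WIDTH) -> str:
    """Zero-pad a non-negative integer to the shared fixed width."""
    if value < 0 or len(str(value)) > width:
        raise ValueError(f"{value} cannot be encoded in {width} digits")
    return str(value).zfill(width)


def numeric_range_filter(tag: str, low: Optional[int] = None, high: Optional[int] = None) -> dict:
    """Build an inclusive range condition on one numeric tag."""
    condition = {}
    if low is not None:
        condition["$gte"] = pad(low)
    if high is not None:
        condition["$lte"] = pad(high)
    if not condition:
        raise ValueError("give at least one bound")
    return {tag: condition}


meta_filter = {"tag1": "org_772", **numeric_range_filter("tag3", low=100000, high=250000)}
```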
Date range with facet refinement:
meta_filter = {
"tag4": {
"$gte": "2024-01-01T00:00:00Z",
"$lte": "2024-03-31T23:59:59Z",
},
"facets": {"$contains": "finance"},
}
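The quarter boundaries in the example can be derived from datetime objects instead of typed by hand. This sketch assumes stored tag4 values use second-precision UTC strings; quarter_filter and iso_utc are hypothetical helpers. It ends the range with an inclusive $lte on the last second of the quarter because that is the form shown above.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(dt: datetime) -> str:
    """Format an aware datetime as second-precision ISO 8601 UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def quarter_filter(year: int, quarter: int, tag: str = "tag4") -> dict:
    """Inclusive [$gte, $lte] range covering one calendar quarter in UTC."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    start = datetime(year, 3 * quarter - 2, 1, tzinfo=timezone.utc)
    # First instant of the next quarter, then step back one second.
    if quarter == 4:
        next_start = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        next_start = datetime(year, 3 * quarter + 1, 1, tzinfo=timezone.utc)
    end = next_start - timedelta(seconds=1)
    return {tag: {"$gte": iso_utc(start), "$lte": iso_utc(end)}}


meta_filter = {**quarter_filter(2024, 1), "facets": {"$contains": "finance"}}
```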
Subset facet filter:
meta_filter = {
"tag1": "org_772",
"facets": {"$subset": ["finance", "tax_audit"]},
}
Standalone metadata search:
results = await xtrace.meta_search(
kb_id="your_kb_id",
meta_filter={"tag1": "org_772", "tag2": "invoices"},
context_id=execution_context.id,
)
Paginated metadata search:
page = await xtrace.meta_search_paginated(
kb_id="your_kb_id",
context_id=execution_context.id,
meta_filter={"tag1": "org_772", "tag2": "invoices"},
limit=20,
offset=0,
return_content=True, # include encrypted chunk_content in results
)
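To read every match rather than a single page, advance offset by limit until a short page comes back. The sketch assumes each page can be treated as a sequence of results and that a page shorter than limit is the last one; neither is stated on this page, so adapt the loop if meta_search_paginated returns a wrapper object or a total count. iter_all_results and the in-memory fake_fetch_page are hypothetical.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Sequence

FetchPage = Callable[[int, int], Awaitable[Sequence[Any]]]


async def iter_all_results(fetch_page: FetchPage, limit: int = 20) -> AsyncIterator[Any]:
    """Yield every result by requesting consecutive (limit, offset) pages."""
    offset = 0
    while True:
        page = await fetch_page(limit, offset)
        for item in page:
            yield item
        if len(page) < limit:  # assumption: a short page means no more results
            break
        offset += limit


# Hypothetical adapter for the real client (arguments copied from the example above):
# async def fetch_page(limit, offset):
#     return await xtrace.meta_search_paginated(
#         kb_id="your_kb_id",
#         context_id=execution_context.id,
#         meta_filter={"tag1": "org_772", "tag2": "invoices"},
#         limit=limit,
#         offset=offset,
#     )

# In-memory stand-in used only to demonstrate the loop.
FAKE_RESULTS = [f"chunk_{i}" for i in range(45)]


async def fake_fetch_page(limit: int, offset: int) -> Sequence[Any]:
    return FAKE_RESULTS[offset:offset + limit]


async def collect_all() -> list:
    return [item async for item in iter_all_results(fake_fetch_page, limit=20)]


if __name__ == "__main__":
    results = asyncio.run(collect_all())
```

When the total is an exact multiple of limit, the loop makes one extra request that returns an empty page before stopping.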