Agentic memory search
Agentic search over the memory pool.
Pipeline: per-corpus vector retrieval → (mode=compose only)
LLM context selection. data: list[Memory] is always returned;
mode=compose additionally assembles a markdown block into
context: str, while mode=retrieve stops after retrieval
and leaves context null. Scope is enforced server-side by the
per-request store (QdrantVectorDB.scope_user_id /
scope_group_ids); no caller-supplied filter DSL.
Authorizations
Long-lived org API key. Alternative: Authorization: Bearer <key>.
Required alongside the API key (no key→org reverse index).
Body
POST /v1/memories/search — agentic memory search.
The pipeline has two phases:
- Retrieve: embed the query, fetch per-corpus vector candidates, optionally rerank. Output: a ranked set of Memory rows.
- Compose (only when mode='compose'): an LLM context-selection step picks the most relevant subset of the candidates and weaves them into a markdown block ready to drop into an LLM prompt.
mode toggles whether phase 2 runs:
- "compose" (default) → data populated with rows and context populated with the assembled markdown. The dominant use case (memory search → LLM prompt) gets the prompt-ready blob without an opt-in.
- "retrieve" → data populated, context null. Skips the LLM compose step (cheaper, faster); for callers building their own UI or doing custom downstream processing.
Scope is enforced server-side by the per-request store
(QdrantVectorDB._scope_must) and follows "scope by what you
pass": every scope axis you supply ANDs; an axis you omit is left
unconstrained. org_id always comes from auth. At least one of
user_id / agent_id / app_id / group_ids must be
supplied — an unscoped org-wide search is rejected. user_id is
optional here (unlike ingest, where it is required): omit it and
pass group_ids to read a shared group across users. No filter
DSL — the four scope axes are the only narrowing.
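The "scope by what you pass" rule above can be mirrored client-side before a request is sent. The following is a minimal sketch under stated assumptions, not the server's validation: `build_search_body` is a hypothetical helper, and the body key `query` is assumed (the `mode` and scope keys are the ones named above). org_id is deliberately absent because it comes from auth.

```python
from typing import Any, Optional


def build_search_body(
    query: str,
    mode: str = "compose",
    user_id: Optional[str] = None,
    agent_id: Optional[str] = None,
    app_id: Optional[str] = None,
    group_ids: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /v1/memories/search.

    Hypothetical client-side helper; the "query" key name is an assumption.
    """
    if mode not in ("compose", "retrieve"):
        raise ValueError(f"unknown mode: {mode!r}")

    supplied = {
        "user_id": user_id,
        "agent_id": agent_id,
        "app_id": app_id,
        "group_ids": group_ids,
    }
    # Keep only the axes the caller actually passed; omitted axes stay unconstrained.
    scope = {key: value for key, value in supplied.items() if value}
    if not scope:
        # An unscoped org-wide search is rejected, so fail early on the client.
        raise ValueError(
            "supply at least one of user_id, agent_id, app_id, group_ids"
        )
    return {"query": query, "mode": mode, **scope}
```

For example, `build_search_body("who likes thai food?", group_ids=["grp_..."])` would describe a cross-user read of one shared group, while adding `user_id` narrows it to that user's slice of the group.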
Legacy compatibility (undocumented in the public spec, kept so existing SDK installations don't break):
- filters: {user_id: "X", ...} is accepted; user_id is lifted out of filters if absent at the body root. Other filter keys are silently dropped (the new shape doesn't support a general filter DSL).
- mode: "rows" → translated to "retrieve".
- mode: "context" → translated to "compose".
- include: ["context_prompt"] → sets mode="compose" and drops the value. "full_content" is dropped silently.
These translations are intentionally invisible on the wire — the public spec only advertises the new shape — and will be removed once consumers have migrated.
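The legacy translations described above can be pictured as a small normalization pass over the request body. This is an illustrative sketch only, not the service's actual shim: `normalize_legacy_body` is a hypothetical name, and the order in which the rules are applied is an assumption.

```python
import copy
from typing import Any

# Legacy mode names and the modes they are translated to.
_LEGACY_MODES = {"rows": "retrieve", "context": "compose"}


def normalize_legacy_body(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `body` rewritten into the new wire shape (hypothetical)."""
    out = copy.deepcopy(body)

    # filters.user_id is lifted to the root only if the root does not set it;
    # every other filter key is dropped along with the filters object.
    filters = out.pop("filters", None)
    if isinstance(filters, dict) and out.get("user_id") is None:
        if "user_id" in filters:
            out["user_id"] = filters["user_id"]

    # Legacy mode names map onto the two supported modes.
    if out.get("mode") in _LEGACY_MODES:
        out["mode"] = _LEGACY_MODES[out["mode"]]

    # include is removed entirely; "context_prompt" forces compose mode,
    # while "full_content" has no effect.
    include = out.pop("include", None)
    if isinstance(include, list) and "context_prompt" in include:
        out["mode"] = "compose"

    return out
```

Applied to `{"query": "q", "filters": {"user_id": "X", "color": "red"}, "mode": "rows"}`, this sketch yields a body with `user_id` at the root, `mode` set to `"retrieve"`, and no `filters` key.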
Natural-language query text. Embedded server-side; xmem's pipeline ranks and selects.
1 - 4000
"who likes thai food?"
Scope key. When supplied, baked into the per-request store as an AND pin so reads are tenant-isolated at the Qdrant filter layer (QdrantVectorDB._scope_must). Optional on search (unlike ingest): omit it and pass group_ids to read a shared group across users. At least one of user_id / agent_id / app_id / group_ids is required. The compat shim also accepts it inside legacy filters.user_id and lifts it before field validation runs, so old SDKs that send the pre-#68 wire shape don't 422.
"user-123"
Pipeline depth selector.
- compose (default): vector retrieval plus an LLM context-selection step that picks the most relevant subset, then assembles it into a markdown block. data carries the selected rows; context carries the assembled markdown. One LLM call.
- retrieve: vector retrieval only. No LLM, no agent, cheaper and faster. data carries the raw ranked candidates (the unfiltered set); context is null.
data is populated in both modes; under compose it's the LLM-selected subset, under retrieve it's the raw candidate set.
retrieve, compose
Which corpora to search. Pass a subset to restrict, e.g. ["fact"] for facts-only. Default is all three.
fact, artifact, episode
Optional group tags, another AND scope axis. When non-empty, candidate rows must be tagged to at least one of the requested group(s):
org AND kb_type [ AND user_id ] AND ( group_ids ∩ <group_ids> ) [ AND agent_id ] [ AND app_id ]
Group membership (the ∩) is OR / any-of across the list — pass [trip_tokyo, trip_paris] to span both trips.
How it composes with user_id ("scope by what you pass"):
- omit user_id, pass group_ids: the cross-user whole-group read, i.e. every user's rows tagged to those group(s). A traveler's AI sees the whole trip's shared facts, from any traveler.
- pass both user_id and group_ids: the intersection, i.e. only the caller's own rows that are also tagged to those group(s) (the caller's slice of the group).
(agent_id / app_id AND on top when set — they narrow further.) Group ids are server-generated unguessable handles (see POST /v1/groups), so knowing the id is the access boundary; other users' untagged memories never surface.
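To make the AND / any-of semantics concrete, the scope expression above can be written as a plain predicate over one stored row. This is a sketch for reasoning about which rows match, not the Qdrant filter the service builds: `row_in_scope` is a hypothetical function, and the row payload keys (`org_id`, `kb_type`, `user_id`, `group_ids`, `agent_id`, `app_id`) are assumed to mirror the request field names.

```python
from typing import Any, Optional


def row_in_scope(
    row: dict[str, Any],
    org_id: str,
    kb_type: str,
    user_id: Optional[str] = None,
    group_ids: Optional[list[str]] = None,
    agent_id: Optional[str] = None,
    app_id: Optional[str] = None,
) -> bool:
    """Return True if `row` satisfies every scope axis that was supplied."""
    # org and corpus always constrain the search.
    if row.get("org_id") != org_id or row.get("kb_type") != kb_type:
        return False
    # Each optional axis ANDs only when the caller passed it.
    if user_id is not None and row.get("user_id") != user_id:
        return False
    # Group membership is any-of: one shared group id is enough.
    if group_ids and not set(group_ids) & set(row.get("group_ids") or []):
        return False
    if agent_id is not None and row.get("agent_id") != agent_id:
        return False
    if app_id is not None and row.get("app_id") != app_id:
        return False
    return True
```

With only `group_ids=["trip_tokyo"]`, rows from any user tagged to that group match; adding `user_id="alice"` keeps only Alice's rows in the group, and a row with no group tags never matches a group-scoped search.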
["grp_a1b2c3d4e5f6071829304a5b6c7d8e9f"]
Optional agent scope. When set, ANDs onto the active primary scope (whether that's user_id or group_ids): candidate rows must also carry this exact agent_id. Use it to narrow a search to one agent's contributions. Indexed payload key, same axis ingest stamps.
"bot-7"
Optional app scope. When set, ANDs onto the active primary scope (like agent_id): candidate rows must also carry this exact app_id. Indexed payload key, same axis ingest stamps.
"app-3"
Response
Successful Response
POST /v1/memories/search response.
data is always populated — both modes return the ranked
Memory rows from retrieval. context is populated only when
mode='compose' with the assembled markdown block from the
LLM context-selection step. stage_timings is always present
for per-stage latency attribution.
Note that under mode='compose' the rows in data are the
retrieval candidates — a possibly larger set than the subset
the LLM selected and wove into context. Useful for showing
"everything we found" alongside "what we sent to the model."
Echoes the request's mode. compose ⇒ both data and context populated; retrieve ⇒ data populated and context null.
retrieve, compose
Constant discriminator for the resource type.
"search"
Ranked Memory rows from retrieval. Populated for both modes. Sort order is score desc. For mode=compose, these are the candidates considered by the context-selection step, a (possibly larger) superset of what ended up in context.
Assembled markdown context block ready for prompt insertion. Populated only when mode=compose; null otherwise.
Per-stage retrieval-pipeline latencies (seconds). Populated in both modes.
True when xmem's LLM context-selection step ran. False indicates the pipeline fell through (e.g. empty candidate pool, or XMEM_ENABLE_CONTEXT_SELECTION=false).
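A caller that wants a prompt-ready string in either mode might handle the response along these lines. This is a rough sketch under stated assumptions: `prompt_block` and `slowest_stage` are hypothetical helpers, the per-row text key `content` is an assumption (the Memory row schema is not shown here), and stage_timings is assumed to be a mapping from stage name to seconds.

```python
from typing import Any, Optional


def prompt_block(response: dict[str, Any]) -> str:
    """Return the assembled context, or a plain bullet list built from data."""
    context = response.get("context")
    if context:
        # mode=compose: the service already assembled the markdown block.
        return context
    # mode=retrieve (context is null): fall back to the ranked rows.
    # "content" is an assumed key name for a row's text.
    rows = response.get("data") or []
    return "\n".join(f"- {row['content']}" for row in rows)


def slowest_stage(stage_timings: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (stage name, seconds) for the slowest stage, or None if empty."""
    if not stage_timings:
        return None
    name = max(stage_timings, key=stage_timings.get)
    return name, stage_timings[name]
```

Falling back to data keeps the caller working when context is null, whether because mode=retrieve was requested or for any other reason the service returned no context.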