Ingest is the write path. You send conversation messages; the server runs LLM-based extraction to pull out facts (and, when relevant, artifacts and episodes), embeds each one, and stores them in your org’s vector index.
The mental model
Ingest is asynchronous by default. Extraction is LLM-bound — typically 3–10 seconds — so the API returns a job immediately and does the work in the background. Your code polls or opts into sync mode.
┌──────────┐                           ┌───────────────┐
│  Client  │ POST /v1/memories ──────► │  Memory API   │
│          │ ◄──── IngestJob (pending) │ (returns <1s) │
└──────────┘                           └───────────────┘
                                               │
                                               │ extraction (3–10s)
                                               ▼
                                       status: succeeded
                                       result.memories_created: [...]
Required fields
Every ingest needs:
- messages: an array of { role, content } objects. An empty array returns 400.
- user_id: keys the per-user session namespace.
- conv_id: anchors every extracted memory to a conversation (for replay, export, and bulk retract).
Optional fields:
- agent_id, app_id
- group_ids: tags the extracted memories to shared groups.
- timestamp_format: a strptime format for parsing dated turns on the batch path.
- extract_artifacts: defaults to true; pass false to skip the artifact-extraction stage, the most expensive part of the pipeline.
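The field rules above can be checked locally before a request is sent. The following is a minimal sketch, not the SDK's own types: the IngestRequest interface and validateIngestRequest function are hypothetical names, and rejecting blank user_id or conv_id strings is an assumption about how "required" is enforced. The 20-id cap on group_ids comes from the group tagging rules later on this page.

```typescript
// Hypothetical request shape inferred from the field list; not the SDK's published types.
interface Message {
  role: string;
  content: string;
}

interface IngestRequest {
  messages: Message[];
  user_id: string;
  conv_id: string;
  agent_id?: string;
  app_id?: string;
  group_ids?: string[];
  timestamp_format?: string;
  extract_artifacts?: boolean;
}

const MAX_GROUP_IDS = 20; // documented per-ingest limit; more returns 422

// Returns a list of problems; an empty list means the body passes these local checks.
function validateIngestRequest(req: IngestRequest): string[] {
  const problems: string[] = [];
  if (req.messages.length === 0) {
    problems.push('messages must not be empty (the server returns 400)');
  }
  // Assumption: blank strings are treated as missing.
  if (req.user_id.trim() === '') problems.push('user_id is required');
  if (req.conv_id.trim() === '') problems.push('conv_id is required');
  const groupCount = req.group_ids ? req.group_ids.length : 0;
  if (groupCount > MAX_GROUP_IDS) {
    problems.push('at most ' + MAX_GROUP_IDS + ' group_ids per ingest (the server returns 422)');
  }
  return problems;
}
```

Running a check like this first avoids spending a round trip on a body the server would reject anyway; the server remains the source of truth for validation.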
Async ingest (default)
const job = await client.memories.ingest({
  messages: [
    { role: 'user', content: 'My favorite food is pad see ew.' },
    { role: 'assistant', content: 'Noted: Thai food.' },
  ],
  user_id: 'alice',
  conv_id: 'conv_2026_05_16',
});
// pollUntilDone handles exponential backoff (500ms → 5s) and timeout.
const done = await client.memories.jobs.pollUntilDone(job.id);
if (done.status === 'failed') {
  throw new Error(`Ingest failed: ${done.error?.message}`);
}
console.log('Created', done.result?.memories_created.length, 'memories');
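To make the backoff behaviour concrete, here is a rough sketch of what a polling loop with a 500ms to 5s schedule could look like. It is not the SDK's implementation: delayForAttempt, pollWithBackoff, the fetchJob and sleep callbacks, the doubling factor, and the 60-second default timeout are all assumptions chosen for illustration.

```typescript
type JobStatus = 'pending' | 'running' | 'succeeded' | 'failed';

interface Job {
  id: string;
  status: JobStatus;
}

// Delay before poll number `attempt` (0-based): doubles from initialMs, capped at maxMs.
// The doubling factor is an assumption; the docs only state the 500ms and 5s bounds.
function delayForAttempt(attempt: number, initialMs: number = 500, maxMs: number = 5000): number {
  return Math.min(initialMs * Math.pow(2, attempt), maxMs);
}

// Polls until the job is terminal or the time budget is spent.
// `fetchJob` and `sleep` are hypothetical injected callbacks, which keeps the loop testable.
async function pollWithBackoff(
  fetchJob: () => Promise<Job>,
  sleep: (ms: number) => Promise<void>,
  timeoutMs: number = 60000, // assumed default, not a documented value
): Promise<Job> {
  let waitedMs = 0;
  let attempt = 0;
  for (;;) {
    const job = await fetchJob();
    if (job.status === 'succeeded' || job.status === 'failed') return job;
    if (waitedMs >= timeoutMs) throw new Error('Timed out waiting for job ' + job.id);
    const delay = delayForAttempt(attempt);
    attempt += 1;
    await sleep(delay);
    waitedMs += delay;
  }
}
```

With these assumed defaults the waits run 500, 1000, 2000, 4000, then 5000 ms for every later attempt, so a typical 3 to 10 second extraction would resolve within the first four or five polls.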
Sync ingest (wait: true)
Useful for demos, one-shot scripts, or any code where you want the result inline:
const job = await client.memories.ingest(
  {
    messages: [{ role: 'user', content: 'I am vegetarian.' }],
    user_id: 'alice',
    conv_id: 'conv_2026_05_16',
  },
  { wait: true },
);
if (job.status === 'succeeded') {
  console.log('Inline result:', job.result?.memories_created);
} else if (job.status === 'failed') {
  console.error('Extraction failed:', job.error);
} else {
  // Sync budget elapsed (30s): fell back to async, so poll job.id as above.
  console.log('Polling required:', job.id);
}
The server holds the connection for up to 30 seconds. If extraction finishes in that window the response is terminal (succeeded or failed). If the budget elapses, you get a pending/running job back and have to poll — same as async mode.
Use sync mode for interactive demos and CLI tools; use async mode for production agent loops where you want to dispatch ingest and continue working.
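Because a sync call can end in three different ways, it can help to fold the response into a single tagged value that the rest of the code switches on. The sketch below is one way to do that; the IngestJob shape is inferred from the examples on this page, and SyncOutcome and classifySyncResponse are hypothetical names, not part of the SDK.

```typescript
type JobStatus = 'pending' | 'running' | 'succeeded' | 'failed';

interface MemoryRef {
  id: string;
  type: string;
  text: string;
}

// Job shape inferred from the examples on this page; treat it as an assumption.
interface IngestJob {
  id: string;
  status: JobStatus;
  result?: { memories_created: MemoryRef[] };
  error?: { code: string; message: string };
}

type SyncOutcome =
  | { kind: 'done'; memories: MemoryRef[] }
  | { kind: 'failed'; code: string; message: string }
  | { kind: 'poll'; jobId: string };

// Maps a wait: true response onto the three cases described above.
function classifySyncResponse(job: IngestJob): SyncOutcome {
  if (job.status === 'succeeded') {
    return { kind: 'done', memories: job.result ? job.result.memories_created : [] };
  }
  if (job.status === 'failed') {
    return {
      kind: 'failed',
      code: job.error ? job.error.code : 'unknown',
      message: job.error ? job.error.message : '',
    };
  }
  // pending or running: the 30-second sync budget elapsed, so the caller has to poll.
  return { kind: 'poll', jobId: job.id };
}
```

A caller can then handle the 'poll' case with the same polling path used in async mode, which keeps sync and async code paths from diverging.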
You pass messages; you don’t pre-decide what’s a fact vs an artifact vs an episode. The server’s extraction pipeline decides:
| Type | Triggered when |
|---|---|
| Fact | The default. A semantic claim in a turn (“User likes X”, “User works at Y”). |
| Artifact | The conversation references a structured object (a doc, code snippet, or summary) that’s worth storing standalone. Extracted by default; pass extract_artifacts: false to skip this stage. |
| Episode | A stretch of turns gets summarized into a session-level memory. Server-driven; no client knob. |
The result.memories_created array tells you what landed; each entry is a thin reference ({id, type, text}). For the full row, call client.memories.get(id).
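Since each entry in memories_created carries a type, a small helper can summarize what an ingest produced and pick out the ids worth fetching in full. This is an illustrative sketch: countByType and idsOfType are hypothetical helpers, and the literal type strings 'fact', 'artifact', and 'episode' are assumed spellings of the three types described above.

```typescript
// Thin reference shape as described above: { id, type, text }.
interface MemoryRef {
  id: string;
  type: string;
  text: string;
}

// Counts how many references of each type an ingest produced.
function countByType(refs: MemoryRef[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ref of refs) {
    const current = counts[ref.type];
    counts[ref.type] = (current === undefined ? 0 : current) + 1;
  }
  return counts;
}

// Collects the ids of one type, for example to fetch full rows for artifacts only.
function idsOfType(refs: MemoryRef[], type: string): string[] {
  return refs.filter((ref) => ref.type === type).map((ref) => ref.id);
}

// Sample data; the type strings are assumed values.
const sampleRefs: MemoryRef[] = [
  { id: 'mem_1', type: 'fact', text: 'User likes pad see ew' },
  { id: 'mem_2', type: 'fact', text: 'User is vegetarian' },
  { id: 'mem_3', type: 'artifact', text: 'Trip itinerary draft' },
];
const sampleCounts = countByType(sampleRefs);
const sampleArtifactIds = idsOfType(sampleRefs, 'artifact');
```

Each id returned by idsOfType can then be passed to client.memories.get(id) when the full row is needed.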
Tagging memories to groups
Pass group_ids to associate this ingest with one or more groups — shared tagging targets you register up front (see Groups). At extraction time a classifier reads each group’s prompt and tags the extracted memories that belong to it. Other members of the group can then surface those memories with a group search.
await client.memories.ingest({
  messages: [
    { role: 'user', content: "When I'm in Tokyo I always stay near Shibuya station." },
    { role: 'assistant', content: 'Noted.' },
  ],
  user_id: 'alice',
  conv_id: 'conv_2026_05_16',
  group_ids: ['grp_tokyo2026'],
});
- The classifier tags each extracted memory with the subset of group_ids it belongs to; a memory can land in several groups, one, or none. Untagged extraction still happens as usual; tagging is additive.
- Unknown or archived ids are soft-skipped: they never fail the ingest, and they come back in result.ignored_group_ids so you can prune stale ids client-side.
- Up to 20 group ids per ingest; more returns 422.
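The pruning step mentioned above can be sketched as a pair of small pure functions. pruneGroupIds and withinGroupLimit are hypothetical helper names; only the ignored_group_ids field and the 20-id limit come from this page. Removing duplicates is an extra convenience, not something the docs require.

```typescript
const MAX_GROUP_IDS = 20; // documented per-ingest limit; more returns 422

// Drops ids the server reported in result.ignored_group_ids.
// Preserves the original order and also removes duplicates.
function pruneGroupIds(requested: string[], ignored: string[]): string[] {
  const kept: string[] = [];
  for (const id of requested) {
    if (ignored.indexOf(id) === -1 && kept.indexOf(id) === -1) {
      kept.push(id);
    }
  }
  return kept;
}

// True when the list can be sent in a single ingest without triggering a 422.
function withinGroupLimit(groupIds: string[]): boolean {
  return groupIds.length <= MAX_GROUP_IDS;
}
```

A client might keep its group list in local state and run pruneGroupIds after each finished job, so that archived groups stop being sent on later ingests.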
Groups are how you share memory across users. A fact Alice ingests with group_ids: ['grp_tokyo2026'] becomes visible to every member of that group via group search — without exposing her untagged personal memories.
Failure modes
Extraction can fail for several reasons: an upstream LLM hiccup, content that doesn’t yield extractable facts, or rate limits. The job lands in status: "failed" with an error.code and error.message. To retry, submit the same body again; the server does not retry automatically.
Common failure codes:
| Code | Meaning |
|---|---|
| ingest_failed | Generic extraction error; check error.message |
| rate_limit_exceeded | Org quota hit; wait and retry |
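Because retries are left to the client, a small policy function can decide whether and when to resubmit. The sketch below is one possible policy, not a recommendation from the API: retryDelayMs is a hypothetical helper, and the base delays (5s for rate limits, 1s otherwise), the doubling, and the three-attempt cap are all assumed values. Note that a failure caused by content with no extractable facts will not be fixed by retrying, which is one reason to keep the cap low.

```typescript
interface JobError {
  code: string;
  message: string;
}

// Returns how long to wait before resubmitting the same body, or null to give up.
// `attempt` is the number of retries already made (0 for the first retry).
function retryDelayMs(error: JobError, attempt: number, maxAttempts: number = 3): number | null {
  if (attempt >= maxAttempts) return null;
  // Assumed policy: back off longer when the org quota was hit.
  const baseMs = error.code === 'rate_limit_exceeded' ? 5000 : 1000;
  return baseMs * Math.pow(2, attempt);
}
```

For example, under these assumptions a rate_limit_exceeded failure waits 5s, then 10s, then 20s before the helper returns null and the caller surfaces the error.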
See also
- Searching memories — query what you just ingested
- API Reference → Memories → Ingest — full request/response schemas