xtrace_sdk.x_vec.data_loaders.loader
Attributes
_log
Classes
DataLoader – Encrypts and uploads document collections to XTrace.
Module Contents
- xtrace_sdk.x_vec.data_loaders.loader._log
- class xtrace_sdk.x_vec.data_loaders.loader.DataLoader(execution_context, integration)
Encrypts and uploads document collections to XTrace.
DataLoader handles the two encryption steps required before data reaches XTrace:
1. AES encryption of chunk content (text → ciphertext bytes).
2. Homomorphic encryption of embedding vectors (float vector → encrypted index).
The encrypted data is then uploaded via XTraceIntegration. Neither the plaintext chunk content nor the raw embedding vectors leave the client.
- Parameters:
execution_context (xtrace_sdk.x_vec.utils.execution_context.ExecutionContext) – Initialised ExecutionContext containing the AES key and homomorphic client.
integration (xtrace_sdk.integrations.xtrace.XTraceIntegration) – Authenticated XTraceIntegration instance.
- execution_context
- integration
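The two-step flow described above (encrypt on the client, then upload only ciphertext) can be sketched without the SDK. This is a minimal, runnable sketch under loud assumptions: toy_aes_encrypt, toy_he_encrypt, toy_load and toy_dump are hypothetical names that do not exist in xtrace_sdk, and both "encryption" functions are insecure placeholders that only reproduce the documented shapes (ciphertext bytes and a list[list[int]] index). It also shows the argument order to watch for: the load methods return (index, encrypted_db), while dump_db takes (db, index, kb_id).

```python
import asyncio

# Hypothetical, insecure stand-ins: nothing below is real AES or homomorphic
# encryption, and none of these names exist in xtrace_sdk. They only mimic the
# documented shapes (ciphertext bytes, list[list[int]] index) and call order.

KEY = b"demo-key"


def toy_aes_encrypt(text: str, key: bytes = KEY) -> bytes:
    """Placeholder for AES: XOR with a repeating key. NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(text.encode("utf-8")))


def toy_he_encrypt(vector: list[float]) -> list[int]:
    """Placeholder for homomorphic encryption: scale to ints. NOT secure."""
    return [round(x * 1000) for x in vector]


async def toy_load(chunks: list[dict], vectors: list[list[float]]):
    """Mirrors load_data_from_memory(): returns (index, encrypted_db)."""
    encrypted_db = [{**c, "chunk_content": toy_aes_encrypt(c["chunk_content"])} for c in chunks]
    index = [toy_he_encrypt(v) for v in vectors]
    return index, encrypted_db


uploaded = []  # records everything that would leave the client


async def toy_dump(db: list[dict], index: list[list[int]], kb_id: str) -> list[dict]:
    """Mirrors dump_db(db, index, kb_id): one response per batch (one batch here)."""
    uploaded.append({"kb_id": kb_id, "db": db, "index": index})
    return [{"status": "ok", "chunks": len(db)}]


async def main():
    chunks = [{"chunk_content": "alpha"}, {"chunk_content": "beta"}]
    vectors = [[0.1, 0.2], [0.3, 0.4]]
    index, encrypted_db = await toy_load(chunks, vectors)  # load returns (index, db)
    return await toy_dump(encrypted_db, index, "kb-demo")  # dump takes (db, index, kb_id)


responses = asyncio.run(main())
```

With the real class, the same shape would presumably be an awaited load_data_from_memory() call followed by dump_db(); only the call order and argument order carry over from this sketch.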
- async dump_db(db, index, kb_id, concurrent=False)
Upload a pre-encrypted database to XTrace.
Typically called after load_data_from_memory() or load_data_from_memory_batch() has produced an (index, encrypted_db) pair.
- Parameters:
db (EncryptedDB) – Encrypted document collection produced by load_data_from_memory() or load_data_from_memory_batch().
index (list[list[int]]) – Encrypted embedding vectors produced by the same call.
kb_id (str) – Destination knowledge-base ID.
concurrent (bool, optional) – Upload batches concurrently. Defaults to False.
- Returns:
List of server responses, one per upload batch.
- Return type:
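The concurrent flag switches between awaiting upload batches one after another and running them together. The sketch below illustrates that distinction with asyncio.gather; upload_batch, make_batches and the batch size of 2 are hypothetical, and nothing here describes how dump_db actually splits or sends batches. Because asyncio.gather returns results in the order its awaitables were passed, both modes yield one response per batch in batch order; whether the real dump_db preserves ordering is not stated in this documentation.

```python
import asyncio


async def upload_batch(batch_no: int, batch: list) -> dict:
    """Hypothetical stand-in for one server upload call."""
    await asyncio.sleep(0)  # yield to the event loop as real network I/O would
    return {"batch": batch_no, "size": len(batch)}


def make_batches(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def dump(items: list, batch_size: int = 2, concurrent: bool = False) -> list[dict]:
    batches = make_batches(items, batch_size)
    if concurrent:
        # All batches in flight at once; gather keeps results in input order.
        return list(await asyncio.gather(*(upload_batch(i, b) for i, b in enumerate(batches))))
    # Sequential: each batch is awaited before the next one starts.
    responses = []
    for i, b in enumerate(batches):
        responses.append(await upload_batch(i, b))
    return responses


seq = asyncio.run(dump(list(range(5))))
par = asyncio.run(dump(list(range(5)), concurrent=True))
```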
- async upsert_one(chunk, vector, kb_id)
Encrypt and upload a single chunk.
- async delete_chunks(chunk_ids, kb_id)
Delete chunks by ID.
- async update_chunks(chunk_updates, vectors, kb_id)
Re-encrypt updated chunks and upload them.
Each chunk in chunk_updates must include a chunk_id field identifying the record to replace.
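Since every entry in chunk_updates must carry a chunk_id, a caller may want to check that before starting any encryption. The helper below is a hypothetical pre-flight check, not part of xtrace_sdk; the requirement that vectors has one entry per updated chunk is an assumption carried over from the load methods, not something stated for update_chunks.

```python
def check_updates(chunk_updates: list[dict], vectors: list[list[float]]) -> None:
    """Hypothetical pre-flight check before calling DataLoader.update_chunks."""
    # Documented requirement: each update names the record it replaces.
    missing = [i for i, c in enumerate(chunk_updates) if "chunk_id" not in c]
    if missing:
        raise ValueError(f"chunk_updates entries missing 'chunk_id': {missing}")
    # Assumption: one replacement vector per updated chunk.
    if len(vectors) != len(chunk_updates):
        raise ValueError(
            f"expected {len(chunk_updates)} vectors, got {len(vectors)}"
        )
```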
- async load_data_from_memory(chunks, vectors, disable_progress=False)
Encrypt a document collection one chunk at a time.
AES-encrypts each chunk’s chunk_content and homomorphically encrypts each float embedding vector into an encrypted index. The results are ready to pass to dump_db().
- Parameters:
chunks (DocumentCollection) – Document collection; each item must have a chunk_content string field.
vectors (list[list[float]]) – Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited bin_embed() call), which is awaited automatically.
disable_progress (bool, optional) – If True, suppress the tqdm progress bar. Defaults to False.
- Returns:
Tuple of (index, encrypted_db), where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.
- Return type:
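The vectors argument may mix plain lists with coroutines that still need awaiting. One plausible way to resolve such a list, shown here as a sketch rather than the loader's actual code, is to test each entry with inspect.isawaitable and await it in place. fake_bin_embed is a hypothetical embedder standing in for an unawaited bin_embed() call; its output is made up for illustration.

```python
import asyncio
import inspect


async def resolve_vectors(vectors: list) -> list[list[float]]:
    """Await any awaitable entries; pass plain lists through unchanged."""
    resolved = []
    for v in vectors:
        resolved.append(await v if inspect.isawaitable(v) else v)
    return resolved


async def fake_bin_embed(text: str) -> list[float]:
    """Hypothetical embedder standing in for an unawaited bin_embed() call."""
    await asyncio.sleep(0)
    return [float(len(text)), 0.0]


async def main():
    # One precomputed vector and one pending coroutine, in chunk order.
    vectors = [[1.0, 2.0], fake_bin_embed("abc")]
    return await resolve_vectors(vectors)


out = asyncio.run(main())
```

Awaiting entries one by one matches the "one chunk at a time" description; a loader could instead gather all pending coroutines at once, and this documentation does not say which approach is used.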
- async load_data_from_memory_batch(chunks, vectors, disable_progress=False)
Encrypt a document collection using batch homomorphic encryption.
Faster than load_data_from_memory() for large collections because all embedding vectors are passed to the homomorphic client in a single batch call instead of one at a time. AES encryption is still applied per chunk.
- Parameters:
chunks (DocumentCollection) – Document collection; each item must have a chunk_content string field.
vectors (list[list[float]]) – Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited bin_embed() call), which is awaited automatically.
disable_progress (bool, optional) – Unused; kept for API compatibility with load_data_from_memory().
- Returns:
Tuple of (index, encrypted_db), where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.
- Return type:
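The difference between the two load methods comes down to how many times the homomorphic client is invoked. The stub below counts calls to make that concrete: per-item encryption costs one call per vector, the batch path costs one call in total, and both produce the same index. CountingHEClient and its encrypt / encrypt_batch methods are hypothetical and are not the SDK's homomorphic client API; the "encryption" is an insecure placeholder. The speed-up the docstring describes presumably comes from amortizing per-call overhead in this way.

```python
from dataclasses import dataclass


@dataclass
class CountingHEClient:
    """Hypothetical homomorphic client stub that counts calls (no real crypto)."""
    calls: int = 0

    def encrypt(self, vector: list[float]) -> list[int]:
        self.calls += 1  # one call per vector
        return [round(x * 1000) for x in vector]  # placeholder, NOT encryption

    def encrypt_batch(self, vectors: list[list[float]]) -> list[list[int]]:
        self.calls += 1  # one call for the whole collection
        return [[round(x * 1000) for x in v] for v in vectors]


vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

per_item = CountingHEClient()
index_a = [per_item.encrypt(v) for v in vectors]  # like load_data_from_memory()

batched = CountingHEClient()
index_b = batched.encrypt_batch(vectors)  # like load_data_from_memory_batch()
```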