xtrace_sdk.x_vec.data_loaders.loader

Classes

DataLoader

Encrypts and uploads document collections to XTrace.

Module Contents

class xtrace_sdk.x_vec.data_loaders.loader.DataLoader(execution_context, integration)

Encrypts and uploads document collections to XTrace.

DataLoader handles the two encryption steps required before data reaches XTrace:

AES encryption of chunk content (text → ciphertext bytes).
Homomorphic encryption of embedding vectors (float vector → encrypted index).

The encrypted data is then uploaded via XTraceIntegration. Neither the plaintext chunk content nor the raw embedding vectors leave the client.

Parameters:

execution_context (xtrace_sdk.x_vec.utils.execution_context.ExecutionContext) – Initialised ExecutionContext containing the AES key and homomorphic client.
integration (xtrace_sdk.integrations.xtrace.XTraceIntegration) – Authenticated XTraceIntegration instance.
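The methods below expect chunks as dicts carrying at least a chunk_content string, plus one float embedding vector per chunk. The sketch below assembles and sanity-checks such inputs from plain texts. It assumes a DocumentCollection can be a plain list of chunk dicts; build_inputs is a hypothetical helper, not part of xtrace_sdk, and the equal-dimension check is a precaution rather than a documented requirement.

```python
from typing import Any

# Assumption: a DocumentCollection is a plain list of chunk dicts.
Chunk = dict[str, Any]


def build_inputs(
    texts: list[str], vectors: list[list[float]]
) -> tuple[list[Chunk], list[list[float]]]:
    """Hypothetical helper: pair each text with its embedding and check shapes."""
    if len(texts) != len(vectors):
        raise ValueError(
            f"got {len(texts)} texts but {len(vectors)} vectors; need one per chunk"
        )
    chunks: list[Chunk] = []
    for i, text in enumerate(texts):
        if not isinstance(text, str):
            raise TypeError(f"text {i} is not a str; chunk_content must be a string")
        # chunk_content is the only field the docs require.
        chunks.append({"chunk_content": text})
    # Coerce components to float so integer-valued embeddings become floats.
    clean = [[float(x) for x in vec] for vec in vectors]
    # Precaution (not a documented rule): all embeddings should share one dimension.
    dims = {len(vec) for vec in clean}
    if len(dims) > 1:
        raise ValueError(f"mixed embedding dimensions: {sorted(dims)}")
    return chunks, clean
```

This helper only handles vectors that are already computed; the returned pair can then be passed as chunks and vectors to load_data_from_memory() or load_data_from_memory_batch().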

execution_context

integration

async dump_db(db, index, kb_id, concurrent=False)

Upload a pre-encrypted database to XTrace.

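Typically called after load_data_from_memory() or load_data_from_memory_batch() has produced an (index, encrypted_db) pair. Note that those methods return the index first, whereas dump_db() takes the database first: pass encrypted_db as db and index as index.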

Parameters:

db (EncryptedDB) – Encrypted document collection produced by load_data_from_memory() or load_data_from_memory_batch().
index (EncryptedIndex) – Encrypted embedding vectors produced by the same call as db.
kb_id (str) – Destination knowledge-base ID.
concurrent (bool, optional) – Upload batches concurrently. Defaults to False.

Returns:

List of server responses, one per upload batch.

Return type:

list[dict]
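A typical round trip is to encrypt with one of the loaders and then upload with dump_db(). The easy mistake is argument order: the loaders return (index, encrypted_db), while dump_db() takes db first. The sketch below is written against the method signatures documented on this page; SupportsEncryptAndDump and encrypt_and_upload are illustrative names, not part of xtrace_sdk, and a real DataLoader instance is assumed to satisfy the protocol.

```python
from typing import Any, Protocol


class SupportsEncryptAndDump(Protocol):
    """Illustrative protocol: the two DataLoader methods used below."""

    async def load_data_from_memory_batch(
        self, chunks: list, vectors: list, disable_progress: bool = False
    ) -> tuple[Any, Any]: ...

    async def dump_db(
        self, db: Any, index: Any, kb_id: str, concurrent: bool = False
    ) -> list[dict]: ...


async def encrypt_and_upload(
    loader: SupportsEncryptAndDump,
    chunks: list[dict],
    vectors: list,
    kb_id: str,
    concurrent: bool = False,
) -> list[dict]:
    # The loaders return the index first...
    index, encrypted_db = await loader.load_data_from_memory_batch(chunks, vectors)
    # ...but dump_db expects the encrypted database first, then the index.
    responses = await loader.dump_db(encrypted_db, index, kb_id, concurrent=concurrent)
    return responses  # one response dict per upload batch
```

With a real loader this runs from async code, e.g. responses = await encrypt_and_upload(loader, chunks, vectors, kb_id), where kb_id is an existing knowledge-base ID.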

async upsert_one(chunk, vector, kb_id)

Encrypt and upload a single chunk.

Parameters:

chunk (Chunk) – Chunk dict with at least a chunk_content string field.
vector (list[float]) – Float embedding vector for this chunk.
kb_id (str) – Destination knowledge-base ID.

Returns:

List of server responses.

Return type:

list[dict]

Raises:

ValueError – If chunk is not a dict.
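Because upsert_one() raises ValueError when chunk is not a dict, a small wrapper that always builds the dict avoids passing a bare string by mistake. This is a sketch against the documented signature; add_text and SupportsUpsert are hypothetical names, and the optional metadata keys assume the server accepts fields beyond chunk_content (the docs only say chunk_content is the minimum).

```python
from typing import Any, Protocol


class SupportsUpsert(Protocol):
    """Illustrative protocol matching DataLoader.upsert_one as documented."""

    async def upsert_one(
        self, chunk: dict, vector: list[float], kb_id: str
    ) -> list[dict]: ...


async def add_text(
    loader: SupportsUpsert,
    text: str,
    vector: list[float],
    kb_id: str,
    **metadata: Any,
) -> list[dict]:
    # Always wrap the text in a dict so upsert_one never sees a non-dict chunk.
    # Assumption: extra keys are allowed alongside the required chunk_content.
    # chunk_content is written last so metadata cannot overwrite it.
    chunk = {**metadata, "chunk_content": text}
    return await loader.upsert_one(chunk, vector, kb_id)
```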

async delete_chunks(chunk_ids, kb_id)

Delete chunks by ID.

Parameters:

chunk_ids (list[int]) – List of chunk IDs to delete.
kb_id (str) – Knowledge-base ID the chunks belong to.

Returns:

Server response.

Return type:

dict
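delete_chunks() takes a list[int]. When IDs come from a set, a generator, or strings parsed from elsewhere, normalising them first keeps the request well-typed. The de-duplication, sorting, and refusal to send an empty list are choices made in this sketch, not documented requirements; delete_ids and SupportsDelete are hypothetical names.

```python
from typing import Iterable, Protocol, Union


class SupportsDelete(Protocol):
    """Illustrative protocol matching DataLoader.delete_chunks as documented."""

    async def delete_chunks(self, chunk_ids: list[int], kb_id: str) -> dict: ...


async def delete_ids(
    loader: SupportsDelete, ids: Iterable[Union[int, str]], kb_id: str
) -> dict:
    # Coerce to int, drop duplicates, and sort for a stable, readable request.
    chunk_ids = sorted({int(i) for i in ids})
    if not chunk_ids:
        # Design choice: refuse an empty request rather than make a no-op call.
        raise ValueError("no chunk IDs to delete")
    return await loader.delete_chunks(chunk_ids, kb_id)
```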

async update_chunks(chunk_updates, vectors, kb_id)

Re-encrypt updated chunks and upload them.

Each chunk in chunk_updates must include a chunk_id field identifying the record to replace.

Parameters:

chunk_updates (list[Chunk]) – Updated chunk dicts, each containing a chunk_id.
vectors (list[list[float]]) – New float embedding vectors, one per chunk.
kb_id (str) – Knowledge-base ID the chunks belong to.

Returns:

List of server responses.

Return type:

list[dict]
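update_chunks() relies on two documented preconditions: every chunk carries a chunk_id, and there is one vector per chunk. Checking them client-side gives a clear error before any encryption work starts. The duplicate-ID check is an extra precaution added in this sketch; check_updates, apply_updates, and SupportsUpdate are hypothetical names.

```python
from typing import Any, Protocol


class SupportsUpdate(Protocol):
    """Illustrative protocol matching DataLoader.update_chunks as documented."""

    async def update_chunks(
        self, chunk_updates: list[dict], vectors: list[list[float]], kb_id: str
    ) -> list[dict]: ...


def check_updates(chunk_updates: list[dict[str, Any]], vectors: list) -> None:
    """Raise ValueError if the documented preconditions do not hold."""
    if len(chunk_updates) != len(vectors):
        raise ValueError(
            f"{len(chunk_updates)} chunks but {len(vectors)} vectors; need one per chunk"
        )
    seen: set = set()
    for pos, chunk in enumerate(chunk_updates):
        if "chunk_id" not in chunk:
            raise ValueError(f"update at position {pos} has no chunk_id")
        # Extra precaution: two updates to the same record would be ambiguous.
        if chunk["chunk_id"] in seen:
            raise ValueError(f"chunk_id {chunk['chunk_id']!r} appears twice")
        seen.add(chunk["chunk_id"])


async def apply_updates(
    loader: SupportsUpdate, chunk_updates: list[dict], vectors: list, kb_id: str
) -> list[dict]:
    # Validate locally, then let the loader re-encrypt and upload.
    check_updates(chunk_updates, vectors)
    return await loader.update_chunks(chunk_updates, vectors, kb_id)
```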

async load_data_from_memory(chunks, vectors, disable_progress=False)

Encrypt a document collection one chunk at a time.

AES-encrypts each chunk’s chunk_content and homomorphically encrypts each float embedding vector into an encrypted index. Results are ready to pass to dump_db().

Parameters:

chunks (DocumentCollection) – Document collection; each item must have a chunk_content string field.
vectors (list[list[float]]) – Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited bin_embed() call), which is awaited automatically.
disable_progress (bool, optional) – If True, suppress the tqdm progress bar. Defaults to False.

Returns:

Tuple of (index, encrypted_db) where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.

Return type:

tuple[EncryptedIndex, EncryptedDB]
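The vectors argument may mix ready float lists with pending coroutines such as unawaited bin_embed() calls. The sketch below illustrates what resolving such a mixed list involves; it is not the SDK's implementation. In particular it awaits the pending entries concurrently with asyncio.gather, whereas the loader may await them one at a time. resolve_vectors is a hypothetical name.

```python
import asyncio
import inspect
from typing import Awaitable, Union

# An entry is either a finished embedding or something that will produce one.
VectorOrPending = Union[list[float], Awaitable[list[float]]]


async def resolve_vectors(vectors: list[VectorOrPending]) -> list[list[float]]:
    """Await every pending entry and return plain vectors in the original order."""
    pending_mask = [inspect.isawaitable(v) for v in vectors]
    # Run all pending embeddings concurrently; gather preserves their order.
    pending = [v for v, is_pending in zip(vectors, pending_mask) if is_pending]
    finished = iter(await asyncio.gather(*pending))
    # Splice the awaited results back into the slots they came from.
    return [
        next(finished) if is_pending else v
        for v, is_pending in zip(vectors, pending_mask)
    ]
```

Computing the mask before awaiting keeps the ready vectors in place and guarantees each awaited result lands back in its original position.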

async load_data_from_memory_batch(chunks, vectors, disable_progress=False)

Encrypt a document collection using batch homomorphic encryption.

Faster than load_data_from_memory() for large collections because all embedding vectors are passed to the homomorphic client in a single batch call instead of one at a time. AES encryption is still applied per chunk.

Parameters:

chunks (DocumentCollection) – Document collection; each item must have a chunk_content string field.
vectors (list[list[float]]) – Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited bin_embed() call), which is awaited automatically.
disable_progress (bool, optional) – Unused; kept for API compatibility with load_data_from_memory().

Returns:

Tuple of (index, encrypted_db) where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.

Return type:

tuple[EncryptedIndex, EncryptedDB]
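For very large collections, one option is to feed load_data_from_memory_batch() in slices and upload each slice before encrypting the next. Because the batch variant ignores disable_progress, the sketch reports progress through its own callback. upload_in_slices, slice_size, and on_progress are hypothetical, and the sketch assumes that calling dump_db() several times with the same kb_id appends to that knowledge base; confirm this against the service before relying on it.

```python
from typing import Any, Callable, Optional, Protocol


class SupportsBatchUpload(Protocol):
    """Illustrative protocol: the DataLoader methods used below."""

    async def load_data_from_memory_batch(
        self, chunks: list, vectors: list, disable_progress: bool = False
    ) -> tuple[Any, Any]: ...

    async def dump_db(
        self, db: Any, index: Any, kb_id: str, concurrent: bool = False
    ) -> list[dict]: ...


async def upload_in_slices(
    loader: SupportsBatchUpload,
    chunks: list[dict],
    vectors: list,
    kb_id: str,
    slice_size: int = 1000,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[dict]:
    if len(chunks) != len(vectors):
        raise ValueError("need one vector per chunk")
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    total = len(chunks)
    responses: list[dict] = []
    for start in range(0, total, slice_size):
        stop = min(start + slice_size, total)
        # One batch homomorphic-encryption call per slice.
        index, encrypted_db = await loader.load_data_from_memory_batch(
            chunks[start:stop], vectors[start:stop]
        )
        # Upload this slice before encrypting the next, so only one slice of
        # encrypted output is held at a time. dump_db takes db first.
        responses.extend(await loader.dump_db(encrypted_db, index, kb_id))
        if on_progress is not None:
            on_progress(stop, total)  # (chunks done, chunks total)
    return responses
```

Smaller slices mean more batch calls, so they give up part of the batching speed-up in exchange for bounding how much encrypted data is held in memory at once.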