xtrace_sdk.x_vec.data_loaders.loader

Attributes

Classes

DataLoader

Encrypts and uploads document collections to XTrace.

Module Contents

xtrace_sdk.x_vec.data_loaders.loader._log
class xtrace_sdk.x_vec.data_loaders.loader.DataLoader(execution_context, integration)

Encrypts and uploads document collections to XTrace.

DataLoader handles the two encryption steps required before data reaches XTrace:

  1. AES encryption of chunk content (text → ciphertext bytes).

  2. Homomorphic encryption of embedding vectors (float vector → encrypted index).

The encrypted data is then uploaded via XTraceIntegration. Neither the plaintext chunk content nor the raw embedding vectors leave the client.

Parameters:
  • execution_context

  • integration – XTraceIntegration used to upload the encrypted data.

async dump_db(db, index, kb_id, concurrent=False)

Upload a pre-encrypted database to XTrace.

Typically called after load_data_from_memory() or load_data_from_memory_batch() has produced an (index, encrypted_db) pair.

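Parameters:
  • db – Encrypted database to upload, i.e. the encrypted_db element of the pair.

  • index – Encrypted index, i.e. the index element of the pair.

  • kb_id (str) – Destination knowledge-base ID.

  • concurrent (bool, optional) – Defaults to False.

Returns: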

List of server responses, one per upload batch.

Return type:

list[dict]
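
The encrypt-then-upload flow described above can be sketched as follows. This is a minimal illustration, not the SDK's own code: encrypt_and_upload is a hypothetical helper, and StubLoader is an invented stand-in (with placeholder "ciphertext" and an invented response shape) so the block runs without the SDK or a server. The only details taken from this page are the method names, their argument order, and the (index, encrypted_db) return pair.

```python
import asyncio


async def encrypt_and_upload(loader, chunks, vectors, kb_id):
    """Encrypt a collection, then upload it with dump_db().

    `loader` is expected to be a DataLoader, or any object exposing the
    same two async methods.
    """
    # The encryption step returns (index, encrypted_db) ...
    index, encrypted_db = await loader.load_data_from_memory_batch(chunks, vectors)
    # ... while dump_db() takes the database first, then the index.
    return await loader.dump_db(encrypted_db, index, kb_id)


class StubLoader:
    """Hypothetical stand-in; performs no real encryption or upload."""

    async def load_data_from_memory_batch(self, chunks, vectors, disable_progress=False):
        index = [[0] * len(v) for v in vectors]                     # placeholder index
        encrypted_db = [{"chunk_content": b"..."} for _ in chunks]  # placeholder ciphertext
        return index, encrypted_db

    async def dump_db(self, db, index, kb_id, concurrent=False):
        return [{"kb_id": kb_id, "uploaded": len(db)}]              # invented response shape


responses = asyncio.run(
    encrypt_and_upload(
        StubLoader(),
        chunks=[{"chunk_content": "hello"}, {"chunk_content": "world"}],
        vectors=[[0.1, 0.2], [0.3, 0.4]],
        kb_id="kb-demo",
    )
)
```

Note the argument order: the loaders return the index first, whereas dump_db() expects the database first, so unpacking into named variables helps avoid swapping the two.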

async upsert_one(chunk, vector, kb_id)

Encrypt and upload a single chunk.

Parameters:
  • chunk (Chunk) – Chunk dict with at minimum a chunk_content string field.

  • vector (list[float]) – Float embedding vector for this chunk.

  • kb_id (str) – Destination knowledge-base ID.

Returns:

Server response list.

Return type:

list[dict]

Raises:

ValueError – If chunk is not a dict.
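
A single-chunk upsert might look like the sketch below. upsert_text is a hypothetical convenience wrapper, and StubLoader is an invented stand-in that mirrors only the documented ValueError and returns a made-up response shape; a real call would go through a DataLoader instance.

```python
import asyncio


async def upsert_text(loader, text, vector, kb_id):
    """Wrap plain text in a chunk dict and pass it to upsert_one()."""
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    chunk = {"chunk_content": text}  # the minimum field the docs require
    return await loader.upsert_one(chunk, vector, kb_id)


class StubLoader:
    """Hypothetical stand-in; performs no real encryption or upload."""

    async def upsert_one(self, chunk, vector, kb_id):
        if not isinstance(chunk, dict):
            raise ValueError("chunk must be a dict")
        return [{"kb_id": kb_id, "dims": len(vector)}]  # invented response shape


result = asyncio.run(upsert_text(StubLoader(), "hello", [0.1, 0.2, 0.3], "kb-demo"))
```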

async delete_chunks(chunk_ids, kb_id)

Delete chunks by ID.

Parameters:
  • chunk_ids (list[int]) – List of chunk IDs to delete.

  • kb_id (str) – Knowledge-base ID the chunks belong to.

Returns:

Server response.

Return type:

dict

async update_chunks(chunk_updates, vectors, kb_id)

Re-encrypt updated chunks and upload them.

Each chunk in chunk_updates must include a chunk_id field identifying the record to replace.

Parameters:
  • chunk_updates (list[Chunk]) – Updated chunk dicts, each containing a chunk_id.

  • vectors (list[list[float]]) – New float embedding vectors, one per chunk.

  • kb_id (str) – Knowledge-base ID the chunks belong to.

Returns:

Server response list.

Return type:

list[dict]
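
Because every update needs a chunk_id and a matching vector, it can help to check both before calling update_chunks(). The sketch below shows one way to do that. update_checked is a hypothetical helper, not part of the SDK, and StubLoader is an invented stand-in with a made-up response shape so the block runs on its own.

```python
import asyncio


async def update_checked(loader, chunk_updates, vectors, kb_id):
    """Validate the documented preconditions, then call update_chunks()."""
    if len(chunk_updates) != len(vectors):
        raise ValueError(
            f"got {len(chunk_updates)} chunks but {len(vectors)} vectors"
        )
    # Positions of any chunk dicts that lack the required chunk_id field.
    missing = [i for i, chunk in enumerate(chunk_updates) if "chunk_id" not in chunk]
    if missing:
        raise ValueError(f"chunk_updates entries missing chunk_id: {missing}")
    return await loader.update_chunks(chunk_updates, vectors, kb_id)


class StubLoader:
    """Hypothetical stand-in; performs no real encryption or upload."""

    async def update_chunks(self, chunk_updates, vectors, kb_id):
        ids = [chunk["chunk_id"] for chunk in chunk_updates]
        return [{"kb_id": kb_id, "updated": ids}]  # invented response shape


updates = [
    {"chunk_id": 7, "chunk_content": "revised text"},
    {"chunk_id": 9, "chunk_content": "another revision"},
]
new_vectors = [[0.1, 0.2], [0.3, 0.4]]
result = asyncio.run(update_checked(StubLoader(), updates, new_vectors, "kb-demo"))
```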

async load_data_from_memory(chunks, vectors, disable_progress=False)

Encrypt a document collection one chunk at a time.

AES-encrypts each chunk’s chunk_content and homomorphically encrypts each float embedding vector into an encrypted index. Results are ready to pass to dump_db().

Parameters:
  • chunks (DocumentCollection) – Document collection. Each item must have a chunk_content string field.

  • vectors (list[list[float]]) – Float embedding vectors, one per chunk. An entry may also be a coroutine (e.g. an unawaited bin_embed() call), in which case it is awaited automatically.

  • disable_progress (bool, optional) – If True, suppress the tqdm progress bar. Defaults to False.

Returns:

Tuple of (index, encrypted_db) where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.

Return type:

tuple[list[list[int]], EncryptedDB]
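
The vectors argument accepts a mix of plain lists and unawaited coroutines. The sketch below shows one way such mixed input could be normalized using only the standard library. It illustrates the general pattern and is not the SDK's implementation; resolve_vectors and fake_embed are hypothetical names, with fake_embed standing in for an async embedder such as bin_embed().

```python
import asyncio
import inspect


async def resolve_vectors(vectors):
    """Return plain float vectors, awaiting any entry that is awaitable."""
    resolved = []
    for entry in vectors:
        if inspect.isawaitable(entry):
            entry = await entry  # e.g. an unawaited embedding call
        resolved.append(list(entry))
    return resolved


async def fake_embed(text):
    """Hypothetical async embedder; returns a toy two-dimensional vector."""
    await asyncio.sleep(0)
    return [float(len(text)), 0.0]


async def demo():
    # The second entry is an unawaited coroutine, the first a plain list.
    mixed = [[0.1, 0.2], fake_embed("abc")]
    return await resolve_vectors(mixed)


vectors = asyncio.run(demo())
```

Entries are awaited one at a time here for clarity; asyncio.gather() would be an alternative if the embedding calls are independent and can run concurrently.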

async load_data_from_memory_batch(chunks, vectors, disable_progress=False)

Encrypt a document collection using batch homomorphic encryption.

Faster than load_data_from_memory() for large collections because all embedding vectors are passed to the homomorphic client in a single batch call instead of one at a time. AES encryption is still applied per chunk.

Parameters:
  • chunks (DocumentCollection) – Document collection. Each item must have a chunk_content string field.

  • vectors (list[list[float]]) – Float embedding vectors, one per chunk. An entry may also be a coroutine (e.g. an unawaited bin_embed() call), in which case it is awaited automatically.

  • disable_progress (bool, optional) – Unused; kept for API compatibility with load_data_from_memory().

Returns:

Tuple of (index, encrypted_db) where index contains the encrypted vectors and encrypted_db contains chunks with AES-encrypted content.

Return type:

tuple[list[list[int]], EncryptedDB]