xtrace_sdk.data_loaders.base

Classes

DataLoaderBase

This is the base class for a DataLoader.

Module Contents

class xtrace_sdk.data_loaders.base.DataLoaderBase(execution_context, integration, **kwargs)

This is the base class for a DataLoader.

Parameters:
execution_context
integration
classmethod _from_disk(path_to_execution_context, passphrase, integration)

Constructs a DataLoader instance from an execution context saved on disk.

Returns:

a DataLoader instance.

Return type:

DataLoader

async classmethod _from_remote(passphrase, context_id, integration)

Constructs a DataLoader instance from an execution context saved in remote storage.

Returns:

a DataLoader instance.

Return type:

DataLoader

static init_default_execution_context(passphrase, homomorphic_client_type, embedding_length, key_len, path=None)

Initializes a default execution context with AES, embedding, and homomorphic encryption clients.

Parameters:
  • passphrase (str) – The AES key to be used for encryption and decryption of text chunks.

  • homomorphic_client_type (str) – The type of homomorphic client to be used, e.g., “PaillierClient”.

  • path (str, optional) – The path to the execution context file, defaults to None

  • embedding_length (int)

  • key_len (int)

Returns:

The initialized execution context

Return type:

dict
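
The "PaillierClient" option refers to the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The following is a toy sketch of that property in plain Python. It is not the SDK's implementation; the function names (`keygen`, `encrypt`, `decrypt`) and the tiny primes are assumptions chosen for illustration and offer no security.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key from two small primes (illustrative only)."""
    n = p * q
    # lambda = lcm(p - 1, q - 1)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # With generator g = n + 1, the decryption constant is lambda^-1 mod n.
    mu = pow(lam, -1, n)
    return n, lam, mu


def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, lam, mu, c):
    """Recover the plaintext from ciphertext c."""
    n2 = n * n
    u = pow(c, lam, n2)
    return ((u - 1) // n * mu) % n


rng = random.Random(0)
n, lam, mu = keygen(61, 53)  # toy primes, far too small for real use
c1 = encrypt(n, 12, rng)
c2 = encrypt(n, 30, rng)
# Multiplying ciphertexts adds the underlying plaintexts.
c_sum = (c1 * c2) % (n * n)
total = decrypt(n, lam, mu, c_sum)
```

Because the sum is computed on ciphertexts, a party holding only encrypted values can combine them without seeing the plaintexts; only the key holder can decrypt `total`.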

dump_execution_context_to_disk(path)

Saves the execution context to disk.

Parameters:

path (str) – the path to which the execution context will be dumped

Return type:

None

async dump_index(index, **kwargs)

Dumps an index from memory to persistent storage, which can be a local disk or a remote storage service.

Parameters:
  • index (Index) – index object in memory

  • kwargs (Any)

Return type:

Any

async dump_db(db, **kwargs)

Dumps a DB file to persistent storage for future use.

Parameters:
  • db (xtrace_sdk.utils.xtrace_types.EncryptedDB) – the DB object to be saved

  • dest (str, optional) – the destination to which the DB file will be stored; can be a local path or a remote URL. Defaults to None

  • kwargs (Any)

Return type:

Any

async upsert_one(chunk, vector, **kwargs)

Upserts a single chunk of data into the database. The chunk is encrypted before it is stored.

Parameters:
  • chunk (Chunk) – the chunk of data to be upserted

  • vector (list[float])

  • kwargs (Any)

Raises:

ValueError – if the chunk is not a Chunk instance

Return type:

Any
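
Since `upsert_one` is a coroutine, it has to be awaited inside an event loop, and the documented `ValueError` surfaces at the `await`. The sketch below shows that calling pattern against a hypothetical in-memory stand-in. `StubLoader`, the `Chunk` dataclass and its fields, and the keyed-by-id storage are all assumptions made for illustration; they are not the SDK's types or behavior, and the stub does no encryption.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for the SDK's Chunk type."""
    chunk_id: int
    text: str


class StubLoader:
    """Hypothetical stand-in mimicking the documented upsert_one contract."""

    def __init__(self):
        self.rows = {}

    async def upsert_one(self, chunk, vector, **kwargs):
        if not isinstance(chunk, Chunk):
            raise ValueError("chunk must be a Chunk instance")
        # A real loader would encrypt the chunk here; the stub stores it as-is.
        self.rows[chunk.chunk_id] = (chunk, list(vector))


async def main():
    loader = StubLoader()
    await loader.upsert_one(Chunk(1, "hello"), [0.1, 0.2])
    # Same id again: the stub overwrites instead of inserting a second row.
    await loader.upsert_one(Chunk(1, "hello again"), [0.3, 0.4])
    try:
        await loader.upsert_one("not a chunk", [0.0])
        rejected = False
    except ValueError:
        rejected = True
    return loader, rejected


loader, rejected = asyncio.run(main())
```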

async delete_chunks(chunk_ids, **kwargs)

Deletes chunks from the database.

Parameters:
  • chunk_ids (list[int]) – the list of chunk ids to be deleted

  • kwargs (Any)

Return type:

Any

async update_chunks(chunk_updates, vectors, **kwargs)

Updates chunks in the database. The chunks are encrypted before they are written.

Parameters:
  • chunks (DocumentCollection) – the list of chunks to be updated

  • vectors (Iterable[list[int]]) – the list of vectors corresponding to the chunks

  • chunk_updates (list[dict])

  • kwargs (Any)

Raises:

ValueError – if any chunk is not a Chunk instance

Return type:

Any

load_data_from_memory(chunks, vectors, disable_progress=False)

Loads a document collection from memory and encrypts it. Modifies the document collection in place.

Parameters:
  • chunks (DocumentCollection) – the document collection to be loaded

  • vectors (Iterable[list[int]]) – the list of vectors corresponding to the chunks

  • disable_progress (bool, optional) – whether to disable the progress bar, defaults to False

Returns:

the index and the encrypted db as a tuple

Return type:

tuple[Index, EncryptedDB]
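
The signatures above suggest a workflow in which the synchronous `load_data_from_memory` returns an `(index, db)` tuple that is then handed to the asynchronous `dump_index` and `dump_db`. The sketch below illustrates only that control flow with a hypothetical stand-in. `StubLoader`, the `persist` helper, and the dict and list used in place of `Index` and `EncryptedDB` are assumptions, not the SDK's classes.

```python
import asyncio


class StubLoader:
    """Hypothetical stand-in with the same method names as DataLoaderBase."""

    def __init__(self):
        self.saved = {}

    def load_data_from_memory(self, chunks, vectors, disable_progress=False):
        vectors = list(vectors)
        if len(vectors) != len(chunks):
            raise ValueError("each chunk needs exactly one vector")
        # Placeholders for the real Index and EncryptedDB objects.
        index = {"size": len(vectors)}
        db = [f"<encrypted:{i}>" for i in range(len(chunks))]
        return index, db

    async def dump_index(self, index, **kwargs):
        self.saved["index"] = index

    async def dump_db(self, db, **kwargs):
        self.saved["db"] = db


async def persist(loader, chunks, vectors):
    # Synchronous step: build the index and encrypted DB in memory.
    index, db = loader.load_data_from_memory(chunks, vectors, disable_progress=True)
    # Asynchronous steps: write both artifacts to persistent storage.
    await loader.dump_index(index)
    await loader.dump_db(db)
    return index, db


loader = StubLoader()
index, db = asyncio.run(persist(loader, ["a", "b"], [[1, 2], [3, 4]]))
```

Keeping the load step synchronous and the dump steps asynchronous mirrors the documented signatures; a concrete subclass would decide where the artifacts actually go.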