xtrace_sdk.x_vec.data_loaders
=============================

.. py:module:: xtrace_sdk.x_vec.data_loaders


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/xtrace_sdk/x_vec/data_loaders/loader/index


Classes
-------

.. autoapisummary::

   xtrace_sdk.x_vec.data_loaders.DataLoader


Package Contents
----------------

.. py:class:: DataLoader(execution_context, integration)

   Encrypts and uploads document collections to XTrace.

   ``DataLoader`` handles the two encryption steps required before data reaches XTrace:

   1. **AES encryption** of chunk content (text → ciphertext bytes).
   2. **Homomorphic encryption** of embedding vectors (float vector → encrypted index).

   The encrypted data is then uploaded via :class:`~xtrace_sdk.integrations.xtrace.XTraceIntegration`.
   Neither the plaintext chunk content nor the raw embedding vectors leave the client.

   :param execution_context: Initialised :class:`~xtrace_vec.utils.execution_context.ExecutionContext` containing the AES key and homomorphic client.
   :param integration: Authenticated :class:`~xtrace_sdk.integrations.xtrace.XTraceIntegration` instance.


   .. py:attribute:: execution_context


   .. py:attribute:: integration


   .. py:method:: dump_db(db, index, kb_id, concurrent = False)
      :async:

      Upload a pre-encrypted database to XTrace.

      Typically called after :meth:`load_data_from_memory` or :meth:`load_data_from_memory_batch` has produced an ``(index, encrypted_db)`` pair.

      :param db: Encrypted document collection produced by :meth:`load_data_from_memory` or :meth:`load_data_from_memory_batch`.
      :type db: EncryptedDB
      :param index: Encrypted embedding vectors produced by :meth:`load_data_from_memory` or :meth:`load_data_from_memory_batch`.
      :type index: list[list[int]]
      :param kb_id: Destination knowledge-base ID.
      :type kb_id: str
      :param concurrent: Upload batches concurrently. Defaults to ``False``.
      :type concurrent: bool, optional
      :return: List of server responses, one per upload batch.
      :rtype: list[dict]


   .. py:method:: upsert_one(chunk, vector, kb_id)
      :async:

      Encrypt and upload a single chunk.
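
      The documented contract of this method can be modelled with a stand-in. The sketch below is hypothetical and is not the SDK implementation. ``FakeLoader`` and its placeholder "encryption" are assumptions made only to keep the example self-contained: it reverses UTF-8 bytes instead of using AES, and scales floats to integers instead of applying homomorphic encryption. The sketch shows the two documented behaviours. A dict chunk is accepted and a list of response dicts is returned, while a non-dict chunk raises ``ValueError``.

      .. code-block:: python

         import asyncio
         from typing import Any


         class FakeLoader:
             """Hypothetical stand-in mirroring DataLoader.upsert_one's documented
             contract. It is NOT the SDK implementation and does no real encryption."""

             def __init__(self) -> None:
                 self.uploaded: list[dict[str, Any]] = []  # records "sent" to the server

             async def upsert_one(self, chunk: Any, vector: list[float], kb_id: str) -> list[dict]:
                 # Documented behaviour: a chunk that is not a dict raises ValueError.
                 if not isinstance(chunk, dict):
                     raise ValueError("chunk must be a dict")
                 # Placeholder for AES (NOT encryption): reversed UTF-8 bytes, so the
                 # stored record no longer holds the plaintext string.
                 record = {**chunk, "chunk_content": chunk["chunk_content"].encode("utf-8")[::-1]}
                 # Placeholder for homomorphic encryption: floats scaled to ints.
                 index = [round(x * 100) for x in vector]
                 self.uploaded.append({"kb_id": kb_id, "chunk": record, "index": index})
                 # Mirror the documented return type: a list of response dicts.
                 return [{"kb_id": kb_id, "status": "ok"}]


         async def main() -> None:
             loader = FakeLoader()
             print(await loader.upsert_one({"chunk_content": "hello"}, [0.1, 0.2], "kb-demo"))
             try:
                 await loader.upsert_one("hello", [0.1, 0.2], "kb-demo")  # not a dict
             except ValueError as exc:
                 print("rejected:", exc)


         asyncio.run(main())

      Against the real SDK, the equivalent call is ``await loader.upsert_one(chunk, vector, kb_id)`` on an initialised :class:`DataLoader`.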
      :param chunk: Chunk dict with at minimum a ``chunk_content`` string field.
      :type chunk: Chunk
      :param vector: Float embedding vector for this chunk.
      :type vector: list[float]
      :param kb_id: Destination knowledge-base ID.
      :type kb_id: str
      :return: Server response list.
      :rtype: list[dict]
      :raises ValueError: If ``chunk`` is not a dict.


   .. py:method:: delete_chunks(chunk_ids, kb_id)
      :async:

      Delete chunks by ID.

      :param chunk_ids: List of chunk IDs to delete.
      :type chunk_ids: list[int]
      :param kb_id: Knowledge-base ID the chunks belong to.
      :type kb_id: str
      :return: Server response.
      :rtype: dict


   .. py:method:: update_chunks(chunk_updates, vectors, kb_id)
      :async:

      Re-encrypt updated chunks and upload them.

      Each chunk in ``chunk_updates`` must include a ``chunk_id`` field identifying the record to replace.

      :param chunk_updates: Updated chunk dicts, each containing a ``chunk_id``.
      :type chunk_updates: list[Chunk]
      :param vectors: New float embedding vectors, one per chunk.
      :type vectors: list[list[float]]
      :param kb_id: Knowledge-base ID the chunks belong to.
      :type kb_id: str
      :return: Server response list.
      :rtype: list[dict]


   .. py:method:: load_data_from_memory(chunks, vectors, disable_progress = False)
      :async:

      Encrypt a document collection one chunk at a time.

      AES-encrypts each chunk's ``chunk_content`` and homomorphically encrypts each float embedding vector into an encrypted index. The results are ready to pass to :meth:`dump_db`.

      :param chunks: Document collection; each item must have a ``chunk_content`` string field.
      :type chunks: DocumentCollection
      :param vectors: Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited ``bin_embed()`` call), which is awaited automatically.
      :type vectors: list[list[float]]
      :param disable_progress: If ``True``, suppress the tqdm progress bar. Defaults to ``False``.
      :type disable_progress: bool, optional
      :return: Tuple of ``(index, encrypted_db)`` where ``index`` contains the encrypted vectors and ``encrypted_db`` contains chunks with AES-encrypted content.
      :rtype: tuple[list[list[int]], EncryptedDB]


   .. py:method:: load_data_from_memory_batch(chunks, vectors, disable_progress = False)
      :async:

      Encrypt a document collection using batch homomorphic encryption.

      Faster than :meth:`load_data_from_memory` for large collections because all embedding vectors are passed to the homomorphic client in a single batch call instead of one at a time. AES encryption is still applied per chunk.

      :param chunks: Document collection; each item must have a ``chunk_content`` string field.
      :type chunks: DocumentCollection
      :param vectors: Float embedding vectors, one per chunk. Each entry may also be a coroutine (e.g. an unawaited ``bin_embed()`` call), which is awaited automatically.
      :type vectors: list[list[float]]
      :param disable_progress: Unused; kept for API compatibility with :meth:`load_data_from_memory`.
      :type disable_progress: bool, optional
      :return: Tuple of ``(index, encrypted_db)`` where ``index`` contains the encrypted vectors and ``encrypted_db`` contains chunks with AES-encrypted content.
      :rtype: tuple[list[list[int]], EncryptedDB]
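
      The difference between the per-chunk and batch loaders can be sketched with stand-ins. This is a minimal model, not the SDK's code. ``CountingClient``, its ``encrypt``/``encrypt_batch`` methods, ``resolve_vectors`` and ``fake_embed`` are all hypothetical names, and the "encryption" only scales floats to integers. The sketch shows two things. First, coroutine entries in ``vectors`` are awaited before encryption. Second, the per-chunk path calls the client once per vector, while the batch path calls it once for the whole collection.

      .. code-block:: python

         import asyncio
         import inspect
         from typing import Awaitable, Union

         Vector = list[float]


         class CountingClient:
             """Hypothetical homomorphic-client stub. It counts calls and scales
             floats to ints; it performs NO real encryption."""

             def __init__(self) -> None:
                 self.calls = 0

             def encrypt(self, vec: Vector) -> list[int]:
                 self.calls += 1  # one call per vector
                 return [round(x * 100) for x in vec]

             def encrypt_batch(self, vecs: list[Vector]) -> list[list[int]]:
                 self.calls += 1  # one call covers the whole collection
                 return [[round(x * 100) for x in v] for v in vecs]


         async def resolve_vectors(vectors: list[Union[Vector, Awaitable[Vector]]]) -> list[Vector]:
             # Await coroutine entries (e.g. an unawaited embedding call); plain lists pass through.
             return [await v if inspect.isawaitable(v) else v for v in vectors]


         async def encrypt_per_chunk(client: CountingClient, vectors: list) -> list[list[int]]:
             # load_data_from_memory-style: one client call per vector.
             return [client.encrypt(v) for v in await resolve_vectors(vectors)]


         async def encrypt_batched(client: CountingClient, vectors: list) -> list[list[int]]:
             # load_data_from_memory_batch-style: a single client call for all vectors.
             return client.encrypt_batch(await resolve_vectors(vectors))


         async def fake_embed(text: str) -> Vector:
             # Hypothetical async embedder standing in for bin_embed().
             return [float(len(text)), 0.5]


         def make_vectors() -> list:
             # Fresh list each time: a coroutine object can only be awaited once.
             return [[0.1, 0.2], fake_embed("abc"), [0.3, 0.4]]


         async def main() -> None:
             per, batch = CountingClient(), CountingClient()
             idx_per = await encrypt_per_chunk(per, make_vectors())
             idx_batch = await encrypt_batched(batch, make_vectors())
             print(idx_per == idx_batch, per.calls, batch.calls)  # True 3 1


         asyncio.run(main())

      Both paths yield the same index. The batch path wins on large collections because it replaces N client round trips with one, which matches the rationale given for :meth:`load_data_from_memory_batch` above.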