Quickstart

This page gives a brief introduction to getting started with XTraceSDK.

First make sure that you have installed XTraceSDK. You can find the installation instructions in the install section.

Preprocessing Data with DataLoader

Say we want to preprocess a set of local files. We import the DataLoaderBase class, a connector, and all the necessary cryptography, embedding, and storage modules.

from xtrace_sdk.data_loaders.base import DataLoaderBase
from xtrace_sdk.crypto.paillier_client import PaillierClient
from xtrace_sdk.crypto.encryption.aes import AESClient
from xtrace_sdk.utils.embedding import Embedding
from xtrace_sdk.integrations.local.storage import LocalStorage
from xtrace_sdk.connectors.local_disk_connector import LocalDiskConnector

Next, we need to create instances of the crypto clients, the embedding module, and the storage module to be used by the data loader.

paillier_client = PaillierClient()
aes_client = AESClient("this is a super safe AES Key")
embedding = Embedding()
storage = LocalStorage("data/")  # mounts the data directory to the storage module
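
For readers new to the name: Paillier is an additively homomorphic public-key cryptosystem, meaning that multiplying two ciphertexts yields a ciphertext of the sum of the two plaintexts. The toy below illustrates that property only. It is not the SDK's implementation, it uses insecurely small primes, and all function names in it are hypothetical; how `PaillierClient` is used internally is not described on this page.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy key pair from two primes (real keys use far larger primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


n, priv = keygen(1009, 1013)
# Multiplying ciphertexts adds the underlying plaintexts.
c_sum = (encrypt(n, 1234) * encrypt(n, 4321)) % (n * n)
total = decrypt(n, priv, c_sum)
print(total)  # 5555
```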

Now we can create an instance of the DataLoaderBase class and use it with a LocalDiskConnector to load our data from files into an encrypted index and database to be used later.

data_loader = DataLoaderBase(embedding, aes_client, paillier_client, storage)
connector = LocalDiskConnector("path/to/your/data/files")
collection = connector.load_data()
index, db = data_loader.load_data_from_memory(collection)

We can also dump the index and database to the storage mounted in the storage object for later use:

# saves the index and database under the data/ directory
# (await requires an async context, e.g. an async function or a notebook)
await data_loader.dump_index(index)
await data_loader.dump_db(db)
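
The `await` calls above need a running event loop. Outside a notebook or asyncio REPL, one way to run them from a plain script is to wrap them in a coroutine and hand it to `asyncio.run`. The sketch below uses a hypothetical `FakeLoader` stand-in so that it runs without the SDK; with the real SDK you would pass `data_loader`, `index`, and `db` from the steps above instead.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in for the data loader; it only records what was dumped."""

    def __init__(self):
        self.dumped = []

    async def dump_index(self, index):
        self.dumped.append(("index", index))

    async def dump_db(self, db):
        self.dumped.append(("db", db))


async def persist(loader, index, db):
    # Await the two dumps one after the other, as in the snippet above.
    await loader.dump_index(index)
    await loader.dump_db(db)


loader = FakeLoader()
# asyncio.run creates an event loop, runs the coroutine to completion, then closes the loop.
asyncio.run(persist(loader, index="toy-index", db="toy-db"))
print(loader.dumped)
```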

Querying Encrypted Data with an Encrypted Query Using Retriever

Now that we have preprocessed our data, we can query it using the Retriever class. We will use the same crypto clients and embedding as before, together with a compute module that performs the query logic.

from xtrace_sdk.compute.local.compute import LocalCompute
from xtrace_sdk.retrievers.simple_retriever import SimpleRetriever


compute = LocalCompute('data/', paillier_client)
retriever = SimpleRetriever(embedding, aes_client, paillier_client, compute)

Querying is as simple as calling the search and retrieval methods on the retriever object, which run against the index and database we created earlier.

query = "How do I use XTraceSDK?"
ids = await retriever.nn_search_for_ids(query, k=3, meta_filter=None)
contexts = await retriever.retrieve_and_decrypt(ids)
print(contexts)
# Output: ['XTraceSDK is a software development kit that allows you to ...', 'To use XTraceSDK, you need to ...', 'XTraceSDK is designed to be ...']
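
To make the role of `k` concrete, here is a plaintext analogue of a nearest-neighbor search: score every stored vector against the query and keep the ids of the `k` best matches. This is only an illustration. It assumes cosine similarity and toy 2-D vectors, and the helper `top_k_ids` is hypothetical; the SDK's actual metric and its encrypted search protocol are not described on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_ids(query_vec, vectors, k):
    """Return the ids of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "index": document id -> embedding vector.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
nearest = top_k_ids([1.0, 0.0], vectors, k=3)
print(nearest)  # ['a', 'b', 'd']
```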

Integrating with LLMs to Build a Privacy-Preserving RAG Pipeline

You can easily integrate the retriever with a large language model (LLM) to generate responses based on the retrieved contexts, as in the example below. To do this, you can use the OllamaClient to interact with an LLM such as DeepSeek. First, make sure you have Ollama installed and configured.

from xtrace_sdk.inference.ollama import OllamaClient
from xtrace_sdk.retrievers.base import RetrieverBase

# format the decrypted chunks into a context block for the prompt
formatted_context = RetrieverBase.format_context([i['chunk_content'] for i in contexts])

# prompt template that asks the model to ground its answer in the retrieved context
RAG_PROMPT_TEMP = lambda context, query: f"DOCUMENT:\n{context}\nQUESTION:\n{query}\nINSTRUCTIONS:\nAnswer the user's QUESTION using the DOCUMENT text above. Keep your answer grounded in the facts of the DOCUMENT."

OLLAMA_URL = 'http://localhost:11434'  # default local Ollama endpoint
deepseek = OllamaClient(OLLAMA_URL, "deepseek-r1:1.5b")
deepseek.query(RAG_PROMPT_TEMP(formatted_context, query))
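
The prompt template can also be written as a small named function, which is easier to test and reuse than a lambda. The sketch below is self-contained: `build_rag_prompt` is a hypothetical helper, the chunk texts are toy strings, and joining the chunks with blank lines is an assumption (the output format of `RetrieverBase.format_context` is not shown on this page).

```python
def build_rag_prompt(chunks, question):
    """Assemble a RAG prompt from retrieved text chunks and a user question."""
    context = "\n\n".join(chunks)  # assumption: chunks separated by blank lines
    return (
        f"DOCUMENT:\n{context}\n"
        f"QUESTION:\n{question}\n"
        "INSTRUCTIONS:\n"
        "Answer the user's QUESTION using the DOCUMENT text above. "
        "Keep your answer grounded in the facts of the DOCUMENT."
    )


# Toy chunks standing in for decrypted retrieval results.
prompt = build_rag_prompt(
    ["XTraceSDK is a software development kit.", "To use XTraceSDK, install it first."],
    "How do I use XTraceSDK?",
)
print(prompt)
```

The resulting string can then be passed to the LLM client in place of the lambda's output.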