LLM Inference

XTrace SDK also provides packaged inference capabilities for multiple inference service providers:
  1. OpenAI

  2. Anthropic

  3. Redpill

  4. Ollama

For end-to-end privacy, we recommend using a locally hosted Ollama instance as the inference service. If that is not feasible, users can still achieve privacy via GPU TEEs by using Redpill's inference service. If inference privacy is not a concern, users can use OpenAI's or Anthropic's inference services.
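
If you run Ollama locally, the call pattern mirrors the provider examples below. The following is a minimal sketch; the provider string "ollama" and the model name are assumptions, so adjust them to match your local installation:

from xtrace_sdk.inference.llm import InferenceClient

# Assumed local Ollama setup: a local service should not need an API key,
# but the exact provider string and model name may differ in your environment.
inference = InferenceClient(inference_provider="ollama", model_name="llama3.1")
inference.query("how many r are there in the word strawberry")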

Use OpenAI as Inference Provider

from xtrace_sdk.inference.llm import InferenceClient

inference = InferenceClient(inference_provider="OpenAI", model_name="o1", api_key="your_api_key")
inference.query("how many r are there in the word strawberry")

For supported models, refer to OpenAI's documentation.

Use Redpill as Inference Provider

Why Use Redpill?

Redpill provides private inference with models running in TEE (Trusted Execution Environment) GPUs, ensuring your data and queries remain secure and private during inference. This makes it an ideal choice when you need privacy protection but cannot run models locally.

Key features:

  • Private inference: Models run in TEE GPU environments

  • Unified API: Access to 200+ AI models through a single API

  • High performance: Superior RPM (Requests Per Minute) and TPM (Tokens Per Minute)

  • Cost-effective: Tokenization model for transparent pricing

Registration and Setup

  1. Create API Key: https://redpill.ai/

  2. Usage Example:

from xtrace_sdk.inference.llm import InferenceClient

inference = InferenceClient(inference_provider="redpill", model_name="DeepSeek: DeepSeek V3 0324", api_key="your_api_key")
inference.query("how many r are there in the word strawberry")

Supported Models

Currently supported models include:

  • Google: Gemma 3 27B

  • OpenAI: GPT OSS 120B

  • Qwen: Qwen3 Coder, Qwen2.5 VL 72B Instruct, Qwen2.5 7B Instruct

  • DeepSeek: DeepSeek V3 0324

  • Meta: Llama 3.3 70B Instruct

For the complete list of available models and pricing, visit https://docs.redpill.ai/
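
To use one of the models listed above, pass its display name as model_name. The sketch below assumes the "Provider: Model" naming from the earlier Redpill example carries over to the other listed models:

from xtrace_sdk.inference.llm import InferenceClient

# Assumption: the display name from the supported-models list is accepted as-is.
inference = InferenceClient(inference_provider="redpill", model_name="Meta: Llama 3.3 70B Instruct", api_key="your_api_key")
inference.query("how many r are there in the word strawberry")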

Embedding Models

XTrace SDK provides embedding models through Ollama, Sentence Transformers, and OpenAI. We recommend using Ollama for embeddings, as it can be run locally and thus provides better privacy protection. If that is not feasible, users can use OpenAI's embedding models.

Use Ollama as Embedding Provider

Ollama can be run as a local service. For more details about how to set up Ollama, please refer to https://ollama.com/docs/installation.

Assuming you have the Ollama service running and the embedding model pulled (for example, with ollama pull mxbai-embed-large), you can use the following code to get embeddings:

from xtrace_sdk.inference.embedding import Embedding

embed = Embedding("ollama","mxbai-embed-large",1024)
vector = embed.bin_embed("how many r are there in the word strawberry")

Use Sentence Transformers as Embedding Provider

You can use Sentence Transformers to generate embeddings locally. For more details about Sentence Transformers, please refer to https://www.sbert.net/. Note that you need to have the model downloaded locally to use it. You can find the list of available models at https://www.sbert.net/docs/pretrained_models.html.

from xtrace_sdk.inference.embedding import Embedding

embed = Embedding("sentence_transformer","mxbai-embed-large-v1",512)
vector = embed.bin_embed("how many r are there in the word strawberry")

Use OpenAI as Embedding Provider

If privacy is not a concern, you can use OpenAI's embedding models. For supported models, refer to OpenAI's documentation. Note that you need to set your OpenAI API key as an environment variable to use OpenAI's embedding models.
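
For example, you can set the key from within Python before creating the embedding client. This is a minimal sketch that assumes the SDK reads the standard OPENAI_API_KEY environment variable:

import os

# Assumption: the SDK picks up the standard OPENAI_API_KEY environment variable.
os.environ["OPENAI_API_KEY"] = "your_api_key"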

from xtrace_sdk.inference.embedding import Embedding

embed = Embedding("openai","text-embedding-3-small",1536)
vector = embed.bin_embed("how many r are there in the word strawberry")

Bring Your Own Vectors

If you have your own vectors, you can use them directly without using any embedding models. Just make sure the vectors are in the correct format (list of floats).

from xtrace_sdk.inference.embedding import Embedding

your_vector = [0.1, -0.2, 0.3, ...] # list of floats
xtrace_compatible_vector = Embedding.float_2_bin(your_vector) # convert to list of binary integers