xtrace_sdk.utils.chunking

Classes

ChunkingUtils

A utility class that splits text into smaller chunks for embedding and encryption.

Module Contents

class xtrace_sdk.utils.chunking.ChunkingUtils

A utility class that splits text into smaller chunks for embedding and encryption. It wraps the langchain text splitters.

static split_text(text_data, max_chunk_size=300)

A helper method that splits text data into smaller chunks for embedding and encryption.

Parameters:
  • text_data (str) – the text data to be split

  • max_chunk_size (int) – the maximum size of each chunk, defaults to 300

Returns:

a list of chunks

Return type:

Iterable[str]
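The contract described above (a string in, an iterable of chunks out, each bounded by max_chunk_size) can be illustrated with a minimal stand-in. The function name split_text_sketch is hypothetical, and the sketch assumes that size is measured in characters and that splits fall on whitespace; the SDK's actual splitting rules may differ.

```python
from typing import Iterable, List


def split_text_sketch(text_data: str, max_chunk_size: int = 300) -> Iterable[str]:
    """Hypothetical stand-in: greedily pack whitespace-separated words into chunks.

    Assumption: chunk size is counted in characters.
    """
    chunks: List[str] = []
    current = ""
    for word in text_data.split():
        # A single word longer than the limit is hard-split into pieces.
        while len(word) > max_chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chunk_size])
            word = word[max_chunk_size:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chunk_size:
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text_sketch("the quick brown fox jumps over the lazy dog", 15):
        print(repr(chunk))
```

With the SDK installed, the corresponding call would presumably be ChunkingUtils.split_text(text, max_chunk_size=300), as given in the signature above.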

static split_html(html_data)

A helper method that splits HTML data into smaller chunks for embedding and encryption.

Parameters:
  • html_data (str) – the HTML data to be split

Returns:

a list of chunks

Return type:

Iterable[str]
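The reference does not say where HTML is split. As a rough illustration only, the sketch below assumes a structure-aware strategy that starts a new chunk at every heading tag and keeps only visible text. The names split_html_sketch and _HeadingChunker are hypothetical, and the code uses only the standard library rather than the SDK.

```python
from html.parser import HTMLParser
from typing import List


class _HeadingChunker(HTMLParser):
    """Collects text and opens a new chunk at every <h1>..<h6> start tag."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    # Closing one of these tags inserts a space so adjacent blocks do not fuse.
    BLOCKS = HEADINGS | {"p", "div", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[List[str]] = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.chunks.append([])

    def handle_endtag(self, tag):
        if tag in self.BLOCKS:
            self.chunks[-1].append(" ")

    def handle_data(self, data):
        self.chunks[-1].append(data)


def split_html_sketch(html_data: str) -> List[str]:
    """Hypothetical stand-in: one text chunk per heading-delimited section."""
    parser = _HeadingChunker()
    parser.feed(html_data)
    parser.close()
    # Collapse runs of whitespace and drop sections with no visible text.
    texts = (" ".join("".join(parts).split()) for parts in parser.chunks)
    return [t for t in texts if t]
```

With the SDK installed, the corresponding call would presumably be ChunkingUtils.split_html(html_data).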

static split_json(json_data, max_chunk_size=300)

A helper method that splits JSON data into smaller chunks for embedding and encryption.

Parameters:
  • json_data (str) – the JSON data to be split

  • max_chunk_size (int) – the maximum size of each chunk, defaults to 300

Returns:

a list of chunks

Return type:

Iterable[str]
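One way to picture size-bounded JSON chunking is to regroup the top-level keys of an object into smaller objects, each of which is still valid JSON. The sketch below does this under stated assumptions: split_json_sketch is a hypothetical name, size is measured as the character length of the serialized chunk, and a single oversized entry is emitted alone rather than split further. The SDK may split nested structures differently.

```python
import json
from typing import List


def split_json_sketch(json_data: str, max_chunk_size: int = 300) -> List[str]:
    """Hypothetical stand-in: pack top-level keys into JSON objects of bounded size."""
    obj = json.loads(json_data)
    if not isinstance(obj, dict):
        # Arrays and scalars are returned as a single chunk in this sketch.
        return [json.dumps(obj)]
    chunks: List[str] = []
    current: dict = {}
    for key, value in obj.items():
        candidate = {**current, key: value}
        if current and len(json.dumps(candidate)) > max_chunk_size:
            # Adding this key would exceed the limit: close the current chunk.
            chunks.append(json.dumps(current))
            current = {key: value}
        else:
            current = candidate
    if current:
        chunks.append(json.dumps(current))
    return chunks
```

With the SDK installed, the corresponding call would presumably be ChunkingUtils.split_json(json_data, max_chunk_size=300).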

static split_markdown(markdown_data)

A helper method that splits Markdown data into smaller chunks for embedding and encryption.

Parameters:

markdown_data (str) – the Markdown data to be split

Returns:

a list of chunks

Return type:

Iterable[str]
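Since this method takes no size argument, a plausible reading is that chunks follow the document's structure. The sketch below assumes heading-based splitting: each ATX heading (a line starting with one to six # characters followed by whitespace) opens a new chunk, and headings inside fenced code blocks are ignored. The name split_markdown_sketch is hypothetical and the behaviour is an assumption, not the SDK's documented rule.

```python
import re
from typing import List

FENCE = "`" * 3  # marker that opens or closes a fenced code block
HEADING = re.compile(r"#{1,6}\s")


def split_markdown_sketch(markdown_data: str) -> List[str]:
    """Hypothetical stand-in: one chunk per heading-delimited section."""
    chunks: List[List[str]] = [[]]
    in_fence = False
    for line in markdown_data.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A heading outside a code fence opens a new chunk.
        if not in_fence and HEADING.match(line) and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    # Drop sections that contain only whitespace.
    return ["\n".join(c).strip() for c in chunks if "".join(c).strip()]
```

With the SDK installed, the corresponding call would presumably be ChunkingUtils.split_markdown(markdown_data).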

static split_code(code_data, language, chunk_size)

A helper method that splits code data into smaller chunks for embedding and encryption.

Parameters:
  • code_data (str) – the code data to be split

  • language (Language) – the language of the code data

  • chunk_size (int) – the maximum size of each chunk

Returns:

a list of chunks

Return type:

Iterable[str]
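Language-aware splitting generally means preferring cut points that match the syntax of the language, such as the start of a function or class. The sketch below illustrates that idea with a hypothetical Language enum and hypothetical boundary patterns; neither is taken from the SDK. It also treats chunk_size as a character count and as a soft limit: a single definition longer than chunk_size is kept whole.

```python
import re
from enum import Enum
from typing import List


class Language(Enum):
    """Hypothetical stand-in for the Language type named in the signature."""

    PYTHON = "python"
    JS = "js"


# Assumed split points: the start of a line that opens a top-level definition.
_BOUNDARIES = {
    Language.PYTHON: r"(?m)^(?=(?:class|def|async def)\s)",
    Language.JS: r"(?m)^(?=(?:function|class|const|let)\s)",
}


def split_code_sketch(code_data: str, language: Language, chunk_size: int) -> List[str]:
    """Hypothetical stand-in: cut at definition boundaries, then merge small pieces."""
    # Splitting on a zero-width match requires Python 3.7 or later.
    pieces = [p for p in re.split(_BOUNDARIES[language], code_data) if p.strip()]
    chunks: List[str] = []
    for piece in pieces:
        # Merge neighbouring pieces while the combined length stays within the limit.
        if chunks and len(chunks[-1]) + len(piece) <= chunk_size:
            chunks[-1] += piece
        else:
            chunks.append(piece)
    return chunks
```

With the SDK installed, the corresponding call would presumably be ChunkingUtils.split_code(code_data, language, chunk_size), with language taken from whatever Language type the SDK expects.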