Other Utils

oscopilot.utils.utils.send_chat_prompts(sys_prompt, user_prompt, llm)[source]

Sends a sequence of chat prompts to a language learning model (LLM) and returns the model’s response.

Parameters:
  • sys_prompt (str) – The system prompt that sets the context or provides instructions for the language learning model.

  • user_prompt (str) – The user prompt that contains the specific query or command intended for the language learning model.

  • llm (object) – The language learning model to which the prompts are sent. This model is expected to have a chat method that accepts structured prompts.

Returns:

The response from the language learning model, which is typically a string containing the model’s answer or generated content based on the provided prompts.

The function is a utility for simplifying the process of sending structured chat prompts to a language learning model and parsing its response, useful in scenarios where dynamic interaction with the model is required.

oscopilot.utils.utils.random_string(length)[source]

Generates a random string of a specified length.

Parameters:

length (int) – The desired length of the random string.

Returns:

A string of random characters and digits of the specified length.

Return type:

str

oscopilot.utils.utils.num_tokens_from_string(string: str) int[source]

Calculates the number of tokens in a given text string according to a specific encoding.

Parameters:

text (str) – The text string to be tokenized.

Returns:

The number of tokens the string is encoded into according to the model’s tokenizer.

Return type:

int

oscopilot.utils.utils.parse_content(content, html_type='html.parser')[source]

Parses and cleans the given HTML content, removing specified tags, ids, and classes.

Parameters:
  • content (str) – The HTML content to be parsed and cleaned.

  • type (str, optional) – The type of parser to be used by BeautifulSoup. Defaults to “html.parser”. Supported types include “html.parser”, “lxml”, “lxml-xml”, “xml”, and “html5lib”.

Raises:

ValueError – If an unsupported parser type is specified.

Returns:

The cleaned text extracted from the HTML content.

Return type:

str

oscopilot.utils.utils.clean_string(text)[source]

Cleans a given string by performing various operations such as whitespace normalization, removal of backslashes, and replacement of hash characters with spaces. It also reduces consecutive non-alphanumeric characters to a single occurrence.

Parameters:

text (str) – The text to be cleaned.

Returns:

The cleaned text after applying all the specified cleaning operations.

Return type:

str

oscopilot.utils.utils.chunks(iterable, batch_size=100, desc='Processing chunks')[source]

Breaks an iterable into smaller chunks of a specified size, yielding each chunk in sequence.

Parameters:
  • iterable (iterable) – The iterable to be chunked.

  • batch_size (int, optional) – The size of each chunk. Defaults to 100.

  • desc (str, optional) – Description text to be displayed alongside the progress bar. Defaults to “Processing chunks”.

Yields:

tuple – A chunk of the iterable, with a maximum length of batch_size.

oscopilot.utils.utils.generate_prompt(template: str, replace_dict: dict)[source]

Generates a string by replacing placeholders in a template with values from a dictionary.

Parameters:
  • template (str) – The template string containing placeholders to be replaced.

  • replace_dict (dict) – A dictionary where each key corresponds to a placeholder in the template and each value is the replacement for that placeholder.

Returns:

The resulting string after all placeholders have been replaced with their corresponding values.

Return type:

str

oscopilot.utils.utils.cosine_similarity(a, b)[source]

Calculates the cosine similarity between two vectors.

Parameters:
  • a (array_like) – The first vector.

  • b (array_like) – The second vector.

Returns:

The cosine similarity between vectors a and b.

Return type:

float

oscopilot.utils.utils.is_valid_json_string(source: str)[source]

Checks if a given string is a valid JSON.

Parameters:

source (str) – The string to be validated as JSON.

Returns:

True if the given string is a valid JSON format, False otherwise.

Return type:

bool