Advantages and use cases of Impala AI functions
You can use Impala's ai_generate_text function to access Large Language Models (LLMs) in SQL queries. This function enables you to input a prompt, retrieve the LLM response, and include it in results. You can create custom UDFs for complex tasks like sentiment analysis and translation.
Use LLMs directly in SQL with Impala's ai_generate_text
function>
Impala introduces a built-in AI function called ai_generate_text
that
enables direct access to and utilization of Large Language Models (LLMs) in SQL queries.
With this function, you can input a prompt, which may include data. The function
communicates with a supported LLM endpoint, sends the prompt, retrieves the response, and
includes it in the query result.
Alternatively, seamlessly integrate LLM intelligence into your Impala workflow by creating
custom User Defined Functions (UDFs) on top of ai_generate_text
. This
allows you to use concise SQL statements for sending prompts to an LLM and receiving
responses. You can define UDFs for complex tasks like sentiment analysis, language
translation, and generative contextual analysis.
Advantages of using AI functions
- Simplified Workflow: Eliminates the necessity for setting up intricate data pipelines.
- No ML Expertise Required: No specialized machine learning skills are needed.
- Swift Decision-Making: Enables faster insights on the data, facilitating critical business decisions by using in-database function calls.
- Integrated Functionality: Requires no external applications, as it is a built-in feature in Data Warehouse.
List of possible use cases
- Sentiment Analysis: Use the AI model to examine customer reviews for a product and identify their sentiment as positive, negative, or neutral.
- Language Translation: Translate product reviews written in different languages to understand customer feedback from various regions.
- Generative Contextual Analysis: Generate detailed reports and insights on various topics based on provided data.
Syntax for AI built-in function arguments
The following example of a built-in AI function demonstrates the use of the OpenAI API as a large language model. Currently, OpenAI's public endpoint and Azure OpenAI endpoints are supported.
- AI_GENERATE_TEXT_DEFAULT
- Syntax:
ai_generate_text_default(prompt)
- AI_GENERATE_TEXT
- Syntax:
ai_generate_text(ai_endpoint, prompt, ai_model, ai_api_key_jceks_secret, additional_params)
The ai_generate_text
function uses the values you provide as an argument
in the function for ai_endpoint
, ai_model
, and
ai_api_key_jceks_secret
. If any of the arguments are left empty or set to
NULL, the function uses the default values defined at the instance level. These default
values correspond to the flag settings configured in the Impala instance. For example, if
the ai_endpoint
argument is NULL or empty, the function will use the value
specified by the ai_endpoint
flag as the default.
When using the ai_generate_text_default
function, make sure to
set all parameters (ai_endpoint
, ai_model
, and
ai_api_key_jceks_secret
) in the coordinator/executor flagfiles with
appropriate values.
Key parameters for using the AI model
- ai_endpoint: The endpoint for the model API that is being interfaced with, supports services like OpenAI and Azure OpenAI Service, for example, https://api.openai.com/v1/chat/completions.
- prompt: The text you submit to the AI model to generate a response.
- ai_model: The specific model name you want to use within the desired API, for example, gpt-3.5-turbo.
- ai_api_key_jceks_secret: The key name for the JCEKS secret that contains your API
key for the AI API you are using. You need a JCEKS keystore containing the specified JCEKS
secret referenced in
ai_api_key_jceks_secret
. To do this, set thehadoop.security.credential.provider.path
property in thecore-site
configuration for both the executor and coordinator. - additional_params: Additional parameters that the AI API offers that is provided to the built-in function as a JSON object.
Examples of using the built-in AI function
ai_generate_text_default
.> select ai_generate_text_default('hello');
Response:
Hello! How can I assist you today?
> select customer_id, star_rating, ai_generate_text_default(CONCAT('Classify the following review as positive, neutral, or negative', and only include the uncapitalized category in the response: ', review_body)) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id LIMIT 1;
Response:
+--+------------+------------+----------------+------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+------------------+
|1 |4343565 | 5 |positive |What is this book |
| | | | |all about ………… |
+--+------------+------------+----------------+------------------+
Examples of creating and using custom UDFs along with the built-in AI function
Instead of writing the prompts in a SQL query, you can build a UDF with your intended
prompt. Once you build your custom UDF, pass your desired prompt within your custom UDF into
the ai_generate_text_default
built-in Impala function.
Example: Classify input customer reviews
The following UDF uses the Amazon book reviews database as the input and requests the LLM to classify the sentiment.
Classify input customer reviews:
IMPALA_UDF_EXPORT
StringVal ClassifyReviews(FunctionContext* context, const StringVal& input) {
std::string request =
std::string("Classify the following review as positive, neutral, or negative")
+ std::string(" and only include the uncapitalized category in the response: ")
+ std::string(reinterpret_cast<const char*>(input.ptr), input.len);
StringVal prompt(request.c_str());
const StringVal endpoint("https://api.openai.com/v1/chat/completions");
const StringVal model("gpt-3.5-turbo");
const StringVal api_key_jceks_secret("open-ai-key");
const StringVal params("{\"temperature\": 0.9, \"model\": \"gpt-4\"}");
return context->Functions()->ai_generate_text(
context, endpoint, prompt, model, api_key_jceks_secret, params);
}
Now you can define these prompt building UDFs and build them in Impala. Once you have them running, you can query your datasets using them.
Creating analyze_reviews
function:
> CREATE FUNCTION analyze_reviews(STRING)
RETURNS STRING
LOCATION ‘s3a://dw-...............’
SYMBOL=’ClassifyReviews’
Using SELECT query for Sentiment analysis to classify Amazon book reviews
> SELECT customer_id, star_rating, analyze_reviews(review_body) AS review_analysis, review_body from amazon_book_reviews where product_title='Artificial Superintelligence' order by customer_id;
Response:
+--+------------+------------+----------------+----------------------+
| |customer_id |star_rating |review_analysis |review_body |
+--+------------+------------+----------------+----------------------+
|1 |44254093 | 5 |positive |What is this book all |
| | | | |about? It is all about|
| | | | |a mind-blowing |
| | | | |universal law of |
| | | | |nature. Mind-blow… |
+--+------------+------------+----------------+----------------------+
|2 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|3 |50050072 | 5 |positive |The two tightly- |
| | | | |connected ideas strike|
| | | | |you as amazed. In the |
| | | | |first place, what has |
| | | | |never bef… |
+--+------------+------------+----------------+----------------------+
|4 |52932308 | 1 |negative |This book is seriously|
| | | | |flawed. I could not |
| | | | |work out if the author|
| | | | |was a mathemetician |
| | | | |dabbi… |
+--+------------+------------+----------------+----------------------+
|5 |52971961 | 1 |negative |Abdoullaev's |
| | | | |exploration of |
| | | | |Al issues appears to |
| | | | |be very technological |
| | | | |and straightforward… |
+--+------------+------------+----------------+----------------------+
|6 |53008416 | 4 |positive |As Co Founder of |
| | | | |ArtilectWorld:ultra |
| | | | |intelligent machine, |
| | | | |I recommend reading |
| | | | |this book! |
+--+------------+------------+----------------+----------------------+