vanna
Source Code
View the source code on GitHub: https://github.com/vanna-ai/vanna
Basic Usage
Getting an API key
import vanna as vn
api_key = vn.get_api_key('[email protected]')
vn.set_api_key(api_key)
Setting the model
vn.set_model('chinook')
Asking a question
vn.ask(question='What are the top 10 artists by sales?')
vn.ask(...)
is a convenience wrapper around vn.generate_sql(...)
, vn.run_sql(...)
, vn.generate_plotly_code(...)
, vn.get_plotly_figure(...)
, and vn.generate_followup_questions(...)
.
For a runnable notebook where you can ask questions, see here
Training
There are 3 main types of training data that you can add to a model: SQL, DDL, and documentation.
# DDL Statements
vn.train(ddl='CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)')
# Documentation
vn.train(documentation='Our organization\'s definition of sales is the discount price of an item multiplied by the quantity sold.')
# SQL
vn.train(sql='SELECT AVG(salary) FROM employees')
vn.train(...)
is a convenience wrapper around vn.add_sql(...)
, vn.add_ddl(...)
, and vn.add_documentation(...)
.
For a runnable notebook where you can train a model, see here
Nomenclature
Prefix | Definition | Examples |
---|---|---|
vn.set_ |
Sets the variable for the current session | [vn.set_model(...) ][set_model] [ vn.set_api_key(...) ][set_api_key] |
vn.get_ |
Performs a read-only operation | [vn.get_model() ][get_models] |
vn.add_ |
Adds something to the model | [vn.add_sql(...) ][add_sql] [ vn.add_ddl(...) ][add_ddl] |
vn.generate_ |
Generates something using AI based on the information in the model | [vn.generate_sql(...) ][generate_sql] [ vn.generate_explanation() ][generate_explanation] |
vn.run_ |
Runs code (SQL or Plotly) | [vn.run_sql ][run_sql] |
vn.remove_ |
Removes something from the model | [vn.remove_training_data ][remove_training_data] |
vn.update_ |
Updates something in the model | [vn.update_model_visibility(...) ][update_model_visibility] |
vn.connect_ |
Connects to a database | [vn.connect_to_snowflake(...) ][connect_to_snowflake] |
Permissions
By default when you create a model it is private. You can add members or admins to your model or make it public.
User Role | Public Model | Private Model | ||
---|---|---|---|---|
Use | Train | Use | Train | |
Non-Member | ✅ | ❌ | ❌ | ❌ |
Member | ✅ | ❌ | ✅ | ❌ |
Admin | ✅ | ✅ | ✅ | ✅ |
Open-Source and Extending
Vanna.AI is open-source and extensible. If you'd like to use Vanna without the servers, see an example here.
The following is an example of where various functions are implemented in the codebase when using the default "local" version of Vanna. vanna.base.VannaBase
is the base class which provides a vanna.base.VannaBase.ask
and vanna.base.VannaBase.train
function. Those rely on abstract methods which are implemented in the subclasses vanna.openai_chat.OpenAI_Chat
and vanna.chromadb_vector.ChromaDB_VectorStore
. vanna.openai_chat.OpenAI_Chat
uses the OpenAI API to generate SQL and Plotly code. vanna.chromadb_vector.ChromaDB_VectorStore
uses ChromaDB to store training data and generate embeddings.
If you want to use Vanna with other LLMs or databases, you can create your own subclass of vanna.base.VannaBase
and implement the abstract methods.
flowchart subgraph VannaBase ask train end subgraph OpenAI_Chat get_sql_prompt submit_prompt generate_question generate_plotly_code end subgraph ChromaDB_VectorStore generate_embedding add_question_sql add_ddl add_documentation get_similar_question_sql get_related_ddl get_related_documentation end
API Reference
Example
vn.run_sql = lambda sql: pd.read_sql(sql, engine)
Set the SQL to DataFrame function for Vanna.AI. This is used in the [vn.ask(...)
][ask] function.
Instead of setting this directly you can also use [vn.connect_to_snowflake(...)
][connect_to_snowflake] to set this.
Example:
vn.get_api_key(email="[email protected]")
Login to the Vanna.AI API.
Arguments:
- email (str): The email address to login with.
- otp_code (Union[str, None]): The OTP code to login with. If None, an OTP code will be sent to the email address.
Returns:
str: The API key.
Sets the API key for Vanna.AI.
Example:
api_key = vn.get_api_key(email="[email protected]")
vn.set_api_key(api_key)
Arguments:
- key (str): The API key.
Example:
models = vn.get_models()
List the models that the user is a member of.
Returns:
List[str]: A list of model names.
Example:
vn.create_model(model="my-model", db_type="postgres")
Create a new model.
Arguments:
- model (str): The name of the model to create.
- db_type (str): The type of database to use for the model. This can be "Snowflake", "BigQuery", "Postgres", or anything else.
Returns:
bool: True if the model was created successfully, False otherwise.
Example:
vn.add_user_to_model(model="my-model", email="[email protected]")
Add a user to an model.
Arguments:
- model (str): The name of the model to add the user to.
- email (str): The email address of the user to add.
- is_admin (bool): Whether or not the user should be an admin.
Returns:
bool: True if the user was added successfully, False otherwise.
Example:
vn.update_model_visibility(public=True)
Set the visibility of the current model. If a model is visible, anyone can see it. If it is not visible, only members of the model can see it.
Arguments:
- public (bool): Whether or not the model should be publicly visible.
Returns:
bool: True if the model visibility was set successfully, False otherwise.
Set the models to use for the Vanna.AI API.
Example:
vn.set_model("my-model")
Arguments:
- model (str): The name of the model to use.
Adds a question and its corresponding SQL query to the model's training data. The preferred way to call this is to use [vn.train(sql=...)
][train].
Example:
vn.add_sql(
question="What is the average salary of employees?",
sql="SELECT AVG(salary) FROM employees"
)
Arguments:
- question (str): The question to store.
- sql (str): The SQL query to store.
- tag (Union[str, None]): A tag to associate with the question and SQL query.
Returns:
bool: True if the question and SQL query were stored successfully, False otherwise.
Adds a DDL statement to the model's training data
Example:
vn.add_ddl(
ddl="CREATE TABLE employees (id INT, name VARCHAR(255), salary INT)"
)
Arguments:
- ddl (str): The DDL statement to store.
Returns:
bool: True if the DDL statement was stored successfully, False otherwise.
Adds documentation to the model's training data
Example:
vn.add_documentation(
documentation="Our organization's definition of sales is the discount price of an item multiplied by the quantity sold."
)
Arguments:
- documentation (str): The documentation string to store.
Returns:
bool: True if the documentation string was stored successfully, False otherwise.
A class representing a training plan. You can see what's in it, and remove items from it that you don't want trained.
Example:
plan = vn.get_training_plan()
plan.get_summary()
EXPERIMENTAL : This method is experimental and may change in future versions.
Get a training plan based on the metadata in the database. Currently this only works for Snowflake.
Example:
plan = vn.get_training_plan_experimental(filter_databases=["employees"], filter_schemas=["public"])
vn.train(plan=plan)
Example:
vn.train()
Train Vanna.AI on a question and its corresponding SQL query.
If you call it with no arguments, it will check if you connected to a database and it will attempt to train on the metadata of that database.
If you call it with the sql argument, it's equivalent to [add_sql()
][add_sql].
If you call it with the ddl argument, it's equivalent to [add_ddl()
][add_ddl].
If you call it with the documentation argument, it's equivalent to [add_documentation()
][add_documentation].
It can also accept a JSON file path or SQL file path to train on a batch of questions and SQL queries or a list of SQL queries respectively.
Additionally, you can pass a [TrainingPlan
][TrainingPlan] object. Get a training plan with [vn.get_training_plan_experimental()
][get_training_plan_experimental].
Arguments:
- question (str): The question to train on.
- sql (str): The SQL query to train on.
- sql_file (str): The SQL file path.
- json_file (str): The JSON file path.
- ddl (str): The DDL statement.
- documentation (str): The documentation to train on.
- plan (TrainingPlan): The training plan to train on.
Example:
vn.flag_sql_for_review(question="What is the average salary of employees?")
Flag a question and its corresponding SQL query for review. You can see the tag show up in [vn.get_all_questions()
][get_all_questions]
Arguments:
- question (str): The question to flag.
- sql (str): The SQL query to flag.
- error_msg (str): The error message to flag.
Returns:
bool: True if the question and SQL query were flagged successfully, False otherwise.
Remove a question and its corresponding SQL query from the model's training data
Example:
vn.remove_sql(question="What is the average salary of employees?")
Arguments:
- question (str): The question to remove.
Remove training data from the model
Example:
vn.remove_training_data(id="1-ddl")
Arguments:
- id (str): The ID of the training data to remove.
Example:
vn.generate_sql(question="What is the average salary of employees?")
# SELECT AVG(salary) FROM employees
Generate an SQL query using the Vanna.AI API.
Arguments:
- question (str): The question to generate an SQL query for.
Returns:
str or None: The SQL query, or None if an error occurred.
Example:
vn.generate_meta(question="What tables are in the database?")
# Information about the tables in the database
Generate answers about the metadata of a database using the Vanna.AI API.
Arguments:
- question (str): The question to generate an answer for.
Returns:
str or None: The answer, or None if an error occurred.
Example:
vn.generate_followup_questions(question="What is the average salary of employees?", df=df)
# ['What is the average salary of employees in the Sales department?', 'What is the average salary of employees in the Engineering department?', ...]
Generate follow-up questions using the Vanna.AI API.
Arguments:
- question (str): The question to generate follow-up questions for.
- df (pd.DataFrame): The DataFrame to generate follow-up questions for.
Returns:
List[str] or None: The follow-up questions, or None if an error occurred.
Example:
vn.generate_questions()
# ['What is the average salary of employees?', 'What is the total salary of employees?', ...]
Generate questions using the Vanna.AI API.
Returns:
List[str] or None: The questions, or None if an error occurred.
Example:
# RECOMMENDED IN A NOTEBOOK:
sql, df, fig, followup_questions = vn.ask()
sql, df, fig, followup_questions = vn.ask(question="What is the average salary of employees?")
# SELECT AVG(salary) FROM employees
Ask a question using the Vanna.AI API. This generates an SQL query, runs it, and returns the results in a dataframe and a Plotly figure. If you set print_results to True, the sql, dataframe, and figure will be output to the screen instead of returned.
Arguments:
- question (str): The question to ask. If None, you will be prompted to enter a question.
- print_results (bool): Whether to print the SQL query and results.
- auto_train (bool): Whether to automatically train the model if the SQL query is incorrect.
- generate_followups (bool): Whether to generate follow-up questions.
Returns:
str or None: The SQL query, or None if an error occurred. pd.DataFrame or None: The results of the SQL query, or None if an error occurred. plotly.graph_objs.Figure or None: The Plotly figure, or None if an error occurred. List[str] or None: The follow-up questions, or None if an error occurred.
Example:
vn.generate_plotly_code(
question="What is the average salary of employees?",
sql="SELECT AVG(salary) FROM employees",
df=df
)
# fig = px.bar(df, x="name", y="salary")
Generate Plotly code using the Vanna.AI API.
Arguments:
- question (str): The question to generate Plotly code for.
- sql (str): The SQL query to generate Plotly code for.
- df (pd.DataFrame): The dataframe to generate Plotly code for.
- chart_instructions (str): Optional instructions for how to plot the chart.
Returns:
str or None: The Plotly code, or None if an error occurred.
Example:
fig = vn.get_plotly_figure(
plotly_code="fig = px.bar(df, x='name', y='salary')",
df=df
)
fig.show()
Get a Plotly figure from a dataframe and Plotly code.
Arguments:
- df (pd.DataFrame): The dataframe to use.
- plotly_code (str): The Plotly code to use.
Returns:
plotly.graph_objs.Figure: The Plotly figure.
DEPRECATED. Use vn.run_sql
instead.
Run the SQL query and return the results as a pandas dataframe. This is just a helper function that does not use the Vanna.AI API.
Arguments:
- cs: Snowflake connection cursor.
- default_database (str): The default database to use.
- sql (str): The SQL query to execute.
Returns:
pd.DataFrame: The results of the SQL query.
Example:
vn.generate_explanation(sql="SELECT * FROM students WHERE name = 'John Doe'")
# 'This query selects all columns from the students table where the name is John Doe.'
Generate an explanation of an SQL query using the Vanna.AI API.
Arguments:
- sql (str): The SQL query to generate an explanation for.
Returns:
str or None: The explanation, or None if an error occurred.
Example:
vn.generate_question(sql="SELECT * FROM students WHERE name = 'John Doe'")
# 'What is the name of the student?'
Generate a question from an SQL query using the Vanna.AI API.
Arguments:
- sql (str): The SQL query to generate a question for.
Returns:
str or None: The question, or None if an error occurred.
Get a list of questions from the Vanna.AI API.
Example:
questions = vn.get_all_questions()
Returns:
pd.DataFrame or None: The list of questions, or None if an error occurred.
Get the training data for the current model
Example:
training_data = vn.get_training_data()
Returns:
pd.DataFrame or None: The training data, or None if an error occurred.
Connect to a SQLite database. This is just a helper function to set [vn.run_sql
][run_sql]
Arguments:
- url (str): The URL of the database to connect to.
Returns:
None
Connect to Snowflake using the Snowflake connector. This is just a helper function to set [vn.run_sql
][run_sql]
Example:
import snowflake.connector
vn.connect_to_snowflake(
account="myaccount",
username="myusername",
password="mypassword",
database="mydatabase",
role="myrole",
)
Arguments:
- account (str): The Snowflake account name.
- username (str): The Snowflake username.
- password (str): The Snowflake password.
- database (str): The default database to use.
- schema (Union[str, None], optional): The schema to use. Defaults to None.
- role (Union[str, None], optional): The role to use. Defaults to None.
Connect to postgres using the psycopg2 connector. This is just a helper function to set [vn.run_sql
][run_sql]
Example:
import psycopg2.connect
vn.connect_to_bigquery(
host="myhost",
dbname="mydatabase",
user="myuser",
password="mypassword",
port=5432
)
Arguments:
- host (str): The postgres host.
- dbname (str): The postgres database name.
- user (str): The postgres user.
- password (str): The postgres password.
- port (int): The postgres Port.
Connect to gcs using the bigquery connector. This is just a helper function to set [vn.run_sql
][run_sql]
Example:
import bigquery.Client
vn.connect_to_bigquery(
project_id="myprojectid",
cred_file_path="path/to/credentials.json",
)
Arguments:
- project_id (str): The gcs project id.
- cred_file_path (str): The gcs credential file path