Vanna.AI Data Security FAQ
Vanna AI Architecture
Vanna AI Core Python Package
The Vanna AI Core Python Package is a Python package that provides a set of tools for connecting to various databases, generating SQL queries using AI, running SQL queries, generating visualizations, and related functionality. The package is designed to be extensible, allowing users to add or modify functionality as needed.
In order to function, the Python package needs two major components: a large language model (LLM) and a retrieval augmentation layer. The LLM generates SQL queries from natural-language questions, while the retrieval augmentation layer supplies the LLM with relevant context. The retrieval augmentation layer is trained on a combination of DDL statements, documentation strings, SQL statements, and question-SQL pairs.
You may choose to use the Vanna AI Core Python Package with your own LLM and retrieval augmentation or you may choose to use the Vanna AI Hosted Services, which provide access to an LLM and retrieval augmentation layer.
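The split between the two components can be illustrated with a minimal sketch. The names below (Retriever, LLM, TextToSQL, related_context, complete) are hypothetical and are not the Vanna package's API. They only show how a retrieval layer and an LLM can be composed so that either one can be swapped for your own implementation or for a hosted one.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface for the retrieval augmentation layer."""

    def related_context(self, question: str) -> list[str]: ...


class LLM(Protocol):
    """Hypothetical interface for the large language model."""

    def complete(self, prompt: str) -> str: ...


class TextToSQL:
    """Combines a retriever and an LLM; either part can be replaced independently."""

    def __init__(self, retriever: Retriever, llm: LLM) -> None:
        self.retriever = retriever
        self.llm = llm

    def generate_sql(self, question: str) -> str:
        # Context from the retrieval layer is placed ahead of the question.
        context = "\n".join(self.retriever.related_context(question))
        prompt = f"{context}\n\nQuestion: {question}\nSQL:"
        return self.llm.complete(prompt)


class StaticRetriever:
    """Toy retriever that always returns the same DDL snippet."""

    def related_context(self, question: str) -> list[str]:
        return ["CREATE TABLE customers (id INTEGER, name TEXT);"]


class CannedLLM:
    """Stand-in for a real model: ignores the prompt and returns a fixed query."""

    def complete(self, prompt: str) -> str:
        return "SELECT COUNT(*) FROM customers;"


engine = TextToSQL(StaticRetriever(), CannedLLM())
print(engine.generate_sql("How many customers are there?"))
```

Because TextToSQL depends only on the two small interfaces, replacing CannedLLM with a client for a real model, or StaticRetriever with a real vector store, requires no change to the orchestration code.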
Code Integrity
The core Python package is an open-source project, and the code is available on GitHub. Code that is contributed to the project is reviewed by the Vanna AI team before being merged into the main codebase. The code is also subject to automated testing and linting to ensure that it meets the project's standards.
Vanna AI Hosted Services
If you use Vanna's hosted services, the training data is stored on Vanna's servers. Some of that data is sent to the LLM for the purpose of generating SQL queries or related functionality.
Data Stored
- DDL statements that were used to train the system (e.g. from vn.train(ddl=...))
- Documentation strings that were used to train the system (e.g. from vn.train(documentation=...))
- SQL statements that were used to train the system (e.g. from vn.train(sql=...))
- Question-SQL pairs that were used to train the system (e.g. from vn.train(question=..., sql=...))
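The four categories of stored training data can be modeled as a small in-memory store. This is a hypothetical sketch: TrainingStore and its train method only mirror the keyword shape of vn.train and say nothing about how Vanna actually persists this data on its servers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingStore:
    """Hypothetical store for the four kinds of training data."""

    ddl: list[str] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)
    sql: list[str] = field(default_factory=list)
    question_sql: list[tuple[str, str]] = field(default_factory=list)

    def train(
        self,
        ddl: Optional[str] = None,
        documentation: Optional[str] = None,
        sql: Optional[str] = None,
        question: Optional[str] = None,
    ) -> None:
        # A question is only stored together with its SQL, as a pair.
        if question is not None:
            if sql is None:
                raise ValueError("question requires a matching sql")
            self.question_sql.append((question, sql))
        elif sql is not None:
            self.sql.append(sql)
        if ddl is not None:
            self.ddl.append(ddl)
        if documentation is not None:
            self.documentation.append(documentation)


store = TrainingStore()
store.train(ddl="CREATE TABLE orders (id INTEGER, total REAL);")
store.train(documentation="Totals are stored in US dollars.")
store.train(sql="SELECT SUM(total) FROM orders;")
store.train(question="What is the average order total?", sql="SELECT AVG(total) FROM orders;")
print(len(store.ddl), len(store.documentation), len(store.sql), len(store.question_sql))
```

Note that none of these categories contains rows from the database itself; they hold schema, descriptions, and queries.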
Data Sent to LLM
During each call, a subset of the stored data is sent to the LLM for the purpose of generating SQL queries or related functionality. This data is sent securely over HTTPS.
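The idea that only a subset of the stored training data accompanies each request can be sketched with a simple similarity ranking. Token-overlap (Jaccard) scoring is a stand-in chosen for brevity; a production retrieval layer would more likely use vector embeddings, and nothing here describes how Vanna's hosted service actually ranks items. The helper names tokens, jaccard, and select_context are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_context(question: str, stored: list[str], k: int = 2) -> list[str]:
    """Return the k stored items most similar to the question."""
    q = tokens(question)
    ranked = sorted(stored, key=lambda item: jaccard(q, tokens(item)), reverse=True)
    return ranked[:k]


stored = [
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);",
    "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);",
    "The total column in orders is stored in US dollars.",
    "SELECT name FROM employees WHERE salary > 100000;",
]
# Only this subset would be placed in the prompt; the rest stays in storage.
print(select_context("What is the total of all orders?", stored))
```

With k=2, the two employee-related items score zero for this question and are never sent, which is the sense in which each call carries only part of the stored training data.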
Database contents are not sent to Vanna's servers or the LLM unless you explicitly opt in, either by setting the parameter allow_llm_to_see_data = True in the built-in Flask app or by calling functions such as vn.generate_summary that require the LLM to "see" the data in order to produce an answer. The allow_llm_to_see_data parameter is set to False by default.
For functionality that requires the LLM to "see" the data, the data is only sent to the LLM and not stored on Vanna's servers.
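The default-off behaviour of allow_llm_to_see_data can be illustrated with a hypothetical prompt builder. Only the parameter name comes from this FAQ; build_prompt and its other arguments are invented for illustration and do not reflect how the Flask app or vn.generate_summary are implemented.

```python
def build_prompt(
    question: str,
    context: list[str],
    rows: list[tuple],
    allow_llm_to_see_data: bool = False,
) -> str:
    """Assemble a prompt; query results are included only when explicitly allowed."""
    parts = list(context)
    parts.append(f"Question: {question}")
    if allow_llm_to_see_data:
        # Opt-in path: actual database rows are serialized into the prompt.
        parts.append("Results: " + "; ".join(repr(r) for r in rows))
    return "\n".join(parts)


context = ["CREATE TABLE orders (id INTEGER, total REAL);"]
rows = [(1, 19.99), (2, 5.0)]

default_prompt = build_prompt("Summarize the orders.", context, rows)
opt_in_prompt = build_prompt("Summarize the orders.", context, rows, allow_llm_to_see_data=True)

print("19.99" in default_prompt)  # False: rows are withheld by default
print("19.99" in opt_in_prompt)   # True: rows are sent only after opting in
```

Making the flag a keyword argument that defaults to False means any caller who does nothing gets the conservative behaviour; seeing the data requires an explicit change at the call site.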
Database Credentials
Database credentials are only used in the context of the Python package and are not sent to Vanna's servers. They are used to connect to your database and run SQL queries locally wherever the Python package is running.
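A minimal sketch of this local-execution pattern, assuming Python's built-in sqlite3 module as a stand-in for your actual database driver: the connection is opened inside your own process, and the only thing the LLM contributes is the SQL text. run_locally is a hypothetical helper, not part of the Vanna package.

```python
import sqlite3


def run_locally(conn: sqlite3.Connection, generated_sql: str) -> list[tuple]:
    """Execute LLM-generated SQL on a connection opened in this process."""
    cursor = conn.execute(generated_sql)
    return cursor.fetchall()


# The connection (and any credentials it needs) is created locally;
# an in-memory SQLite database stands in for your real database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 19.99), (2, 5.0)])

# Only this SQL text would have come back from the LLM.
generated_sql = "SELECT COUNT(*) FROM orders"
print(run_locally(conn, generated_sql))  # [(2,)]
conn.close()
```

In this arrangement the connection object and its credentials never appear in any prompt; the model sees the question and training context, and returns a string that is executed locally.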
Third-Party Services
Vanna AI uses the following third-party services for hosting, storage, and other functionality. These services are chosen for their security and reliability.
- Microsoft Azure
- Google Cloud Platform
- Amazon Web Services
Employee Access
Vanna AI employees and contractors do not have direct access to the training data as a matter of everyday business. Access to the training data is restricted to a small number of employees who require access for the purpose of maintaining the system. All employees with access to the training data are required to sign a confidentiality agreement.
If you need support that requires Vanna AI employees to access your training data, you must e-mail [email protected] from the e-mail address associated with your account in order to authorize the support employee to view your training data for the purpose of providing support.