Welcome to NLP2SQL, a Streamlit-based application that allows users to interact with a SQL database using natural language. Powered by Azure OpenAI, LangChain, and Vector Search, this tool simplifies database exploration with an intuitive chat interface.
- Natural Language to SQL: Convert plain language queries into SQL commands.
- Embedded Knowledge Base: Retrieve proper nouns like artist names or album titles using FAISS for more precise filtering.
- Interactive Chat Interface: Engage with the database through a user-friendly Streamlit chat interface.
.env: Contains environment variables for the Azure OpenAI configuration.requirements.txt: Lists required Python libraries.Chinook.db: A sample SQLite database.app.py: The main application file that connects LangChain, Azure OpenAI, and the Chinook database.
-
Clone the repository:
git clone https://github.com/sifinell/NLP2SQL.git cd NLP2SQL -
Install dependencies:
pip install -r requirements.txt
-
Set up the environment variables:
- Create a
.envfile in the project root with the following content:
AZURE_OPENAI_ENDPOINT="your-azure-openai-endpoint" AZURE_OPENAI_API_KEY="your-azure-openai-api-key" AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o" AZURE_OPENAI_EMBEDDING_NAME="text-embedding-ada-002 model" AZURE_OPENAI_API_VERSION="2024-05-01-preview"- Replace
your-azure-openai-endpointandyour-azure-openai-api-keywith your Azure OpenAI deployment details.
- Create a
-
Run the application:
streamlit run app.py
Here are some examples of queries you can ask the application:
-
List all artists:
Input: List all artists. Query: SELECT * FROM Artist; -
Find all albums for the artist 'AC/DC':
Input: Find all albums for the artist 'AC/DC'. Query: SELECT * FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'AC/DC'); -
List all tracks in the 'Rock' genre:
Input: List all tracks in the 'Rock' genre. Query: SELECT * FROM Track WHERE GenreId = (SELECT GenreId FROM Genre WHERE Name = 'Rock'); -
Find the total duration of all tracks:
Input: Find the total duration of all tracks. Query: SELECT SUM(Milliseconds) FROM Track; -
List all customers from Canada:
Input: List all customers from Canada. Query: SELECT * FROM Customer WHERE Country = 'Canada'; -
How many employees are there?:
Input: How many employees are there? Query: SELECT COUNT(*) FROM "Employee";
For more details about the LangChain SQL capabilities used in this project, visit the LangChain SQL Quickstart documentation.
Feel free to submit issues or pull requests for new features and improvements!