This code implements an example of a Report Generation Agent for single-table relational data source, including a demo agent demo UI and evaluations with Langfuse.
The data source implemented here is an SQLite database which is supported natively by Python and saves the data in disk. SQLAlchemy is used as a SQL connection tool so this SQL connection can be easily swapped for other databases.
The Report Generation Agent will provide an UI to read user queries in natural language
and proceed to make SQL queries to the database in order to produce the data for
the report. At the end, the Agent will provide a downloadable link to the report as
an .xlsx file.
This example also provides agent monitoring and evaluations using Langfuse.
Below is a diagram of the system:
To run the demo, you can choose to follow the steps below or follow the instructions in the notebooks in this folder.
The dataset used in this example is the Online Retail dataset. It contains information about invoices for products that were purchased by customers, which also includes product quantity, the invoice date and the country that the customer resides in. For a more detailed data structure, please check the OnlineRetail.ddl file.
To import the data, please download the dataset file from the link below and save it to your file system.
https://archive.ics.uci.edu/static/public/352/online+retail.zip
You can import the dataset to the database by running the script below:
uv run --env-file .env python -m implementations.report_generation.data.import_online_retail_data --dataset-path <path_to_the_csv_file>Replace <path_to_the_csv_file> with the path the dataset's .CSV file is saved in your machine.
NOTE: The location the database is saved is determined by an environment variable
named REPORT_GENERATION_DB__DATABASE.
To run the agent, please execute:
uv run --env-file .env python -m implementations.report_generation.demoThe agent will be available through a Gradio web UI under the local address http://127.0.0.1:7860, which can be accessed on your preferred browser.
On the UI, there will be a few examples of requests you can make to this agent. It also features a text input so you can make your own report requests to it.
The agent will automatically upload a trace to Langfuse that can be used to evaluate the run or debug any issues.
If you wish to run the agent using the Google Agent Development Kit UI, please run the command below from the project's root folder:
adk web implementations/Once the service is up, it will be available at http://127.0.0.1:8000. On the
"Select an Agent" dropdown, please select "report_generation". You can type a request to
the agent on the text box on the right hand side panel.
Before running the evaluations, the ground truth dataset needs to be uploaded to Langfuse first.
To do so with the sample ground-truth dataset in this repository, please run:
uv run --env-file .env python -m implementations.report_generation.data.langfuse_uploadTo upload custom data or use a different dataset name, please run:
uv run --env-file .env python -m implementations.report_generation.data.langfuse_upload --dataset-path <path/to/dataset.json> --dataset-name <dataset name>Once the dataset has been uploaded to Langfuse, the offline evaluations against a pre-determined dataset can be run with the command below:
uv run --env-file .env python -m implementations.report_generation.evaluateTo run the offline evaluations against a custom dataset, please execute:
uv run --env-file .env python -m implementations.report_generation.evaluate --dataset-name <dataset name>This script will run the Report Generation Agent against each element of the dataset and then use LLM-as-a-judge Evaluator Agents to evaluate each result. The evaluator will check the data generated to produce the report and also the trajectory the agent used against the ground truth and produce True/False scores along with a reasoning.
At the end of the run, an evaluation report will be displayed along with a link to check details about the evaluation in Langfuse.
The agent is also set to collect online evaluation metrics in both the Gradio Demo UI and the Google ADK UI. The online evaluations will check if the token usage and execution time are higher than a certain threshold defined in the code, and it will also check if the final result is present and contains a link to the report.
Those evaluation results will be sent to Langfuse as scores, where they can be analyzed both in an aggregate fashion as well as individually.
On the Gradio Demo UI, there are two buttons to record user feedback: a thumbs up button to record positive user feedback and a thumbs down button to record negative user feedback. The buttons will appear at the end of the agent's execution and it will record the user feedback as Langfuse scores.
