NodeSynth

Tool Prototype: http://go/sarai-external-prototype

NodeSynth is a research prototype that implements a scalable, multi-stage method for creating socially relevant and grounded synthetic data (e.g., annotated queries) for AI model evaluation. The pipeline breaks down topics related to safety policies (e.g., harassment) and sensitive domains (e.g., education) into taxonomies using a fine-tuned taxonomy generator; identifies key relationships within the taxonomies (e.g., social groups, use cases); and validates synthetic query quality for model evaluation.

NodeSynth enables users (e.g., researchers, developers) to go from a topic to a synthetic dataset capturing relationships that represent documented harms in the real world. This prototype and the approach outlined in the accompanying paper can be used to conduct lightweight model evaluations specific to sensitive topics, enabling model developers and deployers to prioritize key areas of concern for in-depth human evaluation.
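The staged flow described above (topic, then taxonomy, then relationships, then validated queries) can be pictured with a small sketch. This is not the NodeSynth implementation: `TaxonomyNode`, `build_query_specs`, and `is_valid` are hypothetical names, the taxonomy is written by hand rather than produced by the fine-tuned generator, and the validation step is a placeholder that only checks for empty fields.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class TaxonomyNode:
    """One node in a topic taxonomy (hypothetical structure, for illustration)."""
    name: str
    children: list["TaxonomyNode"] = field(default_factory=list)

    def leaves(self, path=()):
        """Yield the full path (as a tuple of names) to every leaf below this node."""
        path = path + (self.name,)
        if not self.children:
            yield path
        for child in self.children:
            yield from child.leaves(path)


def build_query_specs(root, social_groups, use_cases):
    """Cross every leaf path with the relationship dimensions to get annotated specs."""
    return [
        {"path": " > ".join(path), "social_group": group, "use_case": use_case}
        for path, group, use_case in product(root.leaves(), social_groups, use_cases)
    ]


def is_valid(spec):
    """Placeholder quality check: every annotation must be non-empty."""
    return all(spec.values())


# Illustrative, hand-written inputs; a real run would take these from earlier stages.
root = TaxonomyNode(
    "harassment",
    [TaxonomyNode("online", [TaxonomyNode("doxxing")]), TaxonomyNode("workplace")],
)
specs = build_query_specs(root, ["teenagers"], ["advice seeking", "content moderation"])
valid_specs = [spec for spec in specs if is_valid(spec)]

for spec in valid_specs:
    print(spec)
```

Each resulting record keeps its taxonomy path and relationship annotations, so a query generated from it can later be traced back to the part of the topic it was meant to cover.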

Getting Started

TODO: Add installation and usage instructions.

Requirements

TODO: List requirements and dependencies.

Usage

TODO: Add usage examples.

Citation

If you use NodeSynth in your research, please cite the following paper:

@article{nodesynth2026,
  title={From Policy to Prompt: Socially Aligned Synthetic Data for AI Evaluation},
  author={TODO: Add full author list},
  year={2026}
}

Disclaimer

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

This project is intended for demonstration purposes only. It is not intended for use in a production environment.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contributing

See CONTRIBUTING.md for details.

How to run it on your own machine

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
streamlit run streamlit_app.py

Repository files

CONTRIBUTING.md
LICENSE
NodeSynth_Data_Cultural_Full_Export.csv
NodeSynth_Data_med_Full_Export.csv
README.md
[EXTERNAL]NodeSynth_Prompt_Creation_General_Pipeline_+_Streamlit_.ipynb
[EXTERNAL]Nodesynth_Taxnomomy_SFT_Model_+_Performance_Comparison.ipynb
[EXTERNAL]_Nodesynth_Prompt_Eval_with_target_models.ipynb
analyse.csv
d3_sankey.py
evaluation_data.csv
requirements.txt
streamlit_app.py
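The repository ships several CSV files, but their columns are not documented in this README. A quick way to see what one contains before opening the notebooks is to read its header and count its rows. The sketch below uses only the standard library; `summarize_csv` is a hypothetical helper, and it assumes the files are UTF-8 encoded with a single header row, neither of which is confirmed here.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file that has a header row."""
    # Assumption: UTF-8 encoding; adjust if the export uses something else.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


if __name__ == "__main__":
    # File name taken from the repository listing; skipped if it is not present.
    target = Path("evaluation_data.csv")
    if target.exists():
        columns, rows = summarize_csv(target)
        print(f"{target}: {rows} rows, columns = {columns}")
```

Run it from the repository root so the relative path resolves; the same function works for the other exported CSV files.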
