YogeshSivakumar18/Research-on-EVO-2

EVO 2: A Critical Analysis of AI-Driven Genomics Through its Data Pipeline

Overview

This repository presents an academic research project that assesses EVO 2, a state-of-the-art AI model for genomic analysis. The study examines EVO 2's data pipeline architecture and shows how its structure, training process, and data handling strategies raise broader concerns about risk, ethics, and governance in AI-driven genomics.

EVO 2 was developed through a collaboration between the Arc Institute, NVIDIA, Stanford, UC Berkeley, and UCSF. It can process genomic sequences of up to 1 million base pairs, uses a novel architecture (StripedHyena 2), and was trained on over 9.3 trillion nucleotides from the OpenGenome2 dataset.

Core Focus

This project does not build or extend EVO 2. Instead, it critically assesses the implications of its data pipeline, including:

Pipeline-Based Risk Assessment

  • Data quality controls and privacy protections
  • Risks of genetic misuse or unintended consequences in genome editing
  • Bias inherited from uneven organismal diversity in the training data
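
The training-data bias concern can be made concrete by measuring how sequence volume is distributed across organism groups. The following is a minimal sketch, not part of EVO 2 or its tooling: the function name `representation_report`, the `(group, nucleotide_count)` record layout, and the toy counts are all assumptions made for illustration, and the numbers are placeholders rather than OpenGenome2 statistics.

```python
from collections import Counter
import math


def representation_report(records):
    """Summarize how nucleotide volume is spread across organism groups.

    `records` is an iterable of (group_label, nucleotide_count) pairs.
    This layout is a hypothetical manifest format, not one used by EVO 2.
    Returns (shares, evenness), where evenness is normalized Shannon
    entropy: 1.0 means perfectly balanced, 0.0 means a single group.
    """
    totals = Counter()
    for group, count in records:
        totals[group] += count

    grand_total = sum(totals.values())
    shares = {group: n / grand_total for group, n in totals.items()}

    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    max_entropy = math.log(len(shares)) if len(shares) > 1 else 1.0
    return shares, entropy / max_entropy


# Made-up placeholder counts, NOT real OpenGenome2 figures.
toy_manifest = [
    ("bacteria", 600),
    ("archaea", 50),
    ("eukaryota", 300),
    ("bacteria", 50),
]

shares, evenness = representation_report(toy_manifest)
for group, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {share:.1%}")
print(f"evenness: {evenness:.2f}")
```

A low evenness score would flag a dataset dominated by a few groups, which is one way an assessment could quantify the representation skew that a model may inherit.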

Ethical Evaluation

  • Informed consent in genomic data collection and use
  • Transparency about pathogen data excluded from training and its impact on research access
  • Fair use and equitable treatment in predictions and applications

Governance and Policy Implications

  • International regulation needs and ethical frameworks
  • Public accountability and legal alignment (e.g., GDPR, EU Bioethics)
  • Real-time oversight of AI systems in biomedical research

Structure of This Repository

  • EVO2_Assessment_Report.docx – Full research document including references
  • Data-Pipeline-chart-of-Evo-2.png – A diagram of the EVO 2 data pipeline, created by the author
  • README.md

Disclaimer

This project is intended for academic, educational, and policy discourse purposes. No proprietary data or private genomic information is used or accessed. All conclusions are based on publicly available documentation and the observed behavior of the open-source model.

License

Open-access under CC BY-NC 4.0. Refer to the LICENSE file for reuse conditions.

About

This project critically analyzes the data pipeline of EVO 2, a cutting-edge AI model for genomic research developed by the Arc Institute and collaborators. The focus lies in evaluating its risks, ethical challenges, and governance structures in the context of data privacy, genome editing, and AI-driven biomedical innovation.
