This repository contains the complete implementation of the University Knowledge Graph project, developed for the Knowledge Engineering and Ontologies course. The project models, validates, and queries the academic ecosystem of the Manisa Celal Bayar University (MCBU) Engineering Faculty, focusing on the Computer, Industrial, and Mechanical Engineering departments.
The primary objective of this project is to address fragmented, siloed academic data across university faculties. To that end, the unified Semantic Web application:
- Formalizes the Domain: Captures complex relationships between students, academic staff (professors and research assistants), courses (mandatory and elective), and academic sub-fields using an OWL 2 ontology.
- Ensures Data Integrity: Employs SHACL shapes to validate essential business constraints, such as mandatory student IDs and departmental affiliations.
- Enables Intelligent Querying: Leverages SPARQL for logical reasoning and automated deduction (e.g., transitive prerequisite chains).
- Integrates Large Language Models (LLMs): Bridges the gap between natural language and structured query logic by translating user questions into ontology-grounded SPARQL queries using the Claude API, which reduces the risk of hallucinated classes and properties.
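The transitive prerequisite deduction listed above can be illustrated outside SPARQL as well. The following is a minimal plain-Python sketch of the same idea: starting from direct prerequisite links, collect every course reachable through one or more links. The course codes and the `DIRECT_PREREQS` table are hypothetical and are not taken from the project's data.

```python
from collections import deque

# Hypothetical direct-prerequisite links (course -> courses it directly requires).
DIRECT_PREREQS = {
    "CSE301": ["CSE201"],
    "CSE201": ["CSE101"],
    "CSE101": [],
}


def all_prerequisites(course, direct=DIRECT_PREREQS):
    """Return every course reachable through one or more prerequisite links."""
    seen = set()
    queue = deque(direct.get(course, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cyclic data
        seen.add(current)
        queue.extend(direct.get(current, []))
    return seen


print(sorted(all_prerequisites("CSE301")))  # ['CSE101', 'CSE201']
```

In a SPARQL 1.1 query, the same closure is usually written with a one-or-more property path (the `+` operator) on the prerequisite property, so no explicit traversal code is needed.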
Rather than relying on static or synthetic datasets, real-time data extraction was implemented to reflect the authentic academic ecosystem of the faculty:
- MCBU Course Information System: Used to scrape course codes, names, semester data, and multi-tier prerequisite chains.
- AVESIS (Academic Data Management System): Used to extract academic staff attributes, including titles, institutional email addresses, and official profile URLs.
Note: Data was collected using an automated Python pipeline leveraging Selenium for dynamic content handling and BeautifulSoup for HTML parsing, cleaned with Pandas, and mapped to the structural schema (TBox) to generate 1,977 RDF triples.
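The mapping step of that pipeline can be sketched roughly as follows. This is an illustration only: it uses the standard-library `csv` module instead of Pandas, emits N-Triples lines as plain strings, and assumes a hypothetical namespace (`BASE`), hypothetical column names, and hypothetical property names (`courseName`, `hasPrerequisite`). The actual logic lives in src/rdf_mapper.py.

```python
import csv
import io

BASE = "http://example.org/university#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical cleaned scraper output.
SAMPLE_CSV = """code,name,prerequisite
CSE201,Data Structures,CSE101
CSE101,Introduction to Programming,
"""


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text, base=BASE):
    """Map each CSV row to a type triple, a name triple, and an optional link."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}{row['code']}>"
        name = escape_literal(row["name"])
        prerequisite = row["prerequisite"]
        triples.append(f"{subject} <{RDF_TYPE}> <{base}Course> .")
        triples.append(f'{subject} <{base}courseName> "{name}" .')
        if prerequisite:  # empty cell means no prerequisite
            triples.append(f"{subject} <{base}hasPrerequisite> <{base}{prerequisite}> .")
    return triples


for line in rows_to_ntriples(SAMPLE_CSV):
    print(line)
```

Each row yields two triples plus one more when a prerequisite is present, which is how a modest number of scraped records grows into a larger triple count.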
The repository is organized as follows to fulfill all semantic web and deployment constraints:
├── ontology/
│ └── university-knowledge-graph-v2.rdf # Core OWL ontology file (Turtle format)
├── shacl/
│ └── shacl-shapes.ttl # SHACL constraints for data validation
├── queries/
│ ├── query1_software_staff.sparql # SPARQL query for sub-field staff retrieval
│ ├── query2_transitive_prereq.sparql # SPARQL query testing transitive prerequisites
│ ├── query3_professor_emails.sparql # SPARQL query for professor contact details
│ └── query4_student_enrollments.sparql # SPARQL query for per-student schedules
├── src/
│ ├── scraper.py # Python automation script (Selenium & BeautifulSoup)
│ ├── rdf_mapper.py # Relational CSV-to-RDF triple generator
│ ├── validator.py # pySHACL validation engine script
│ └── llm_interface.py # Claude API Natural Language to SPARQL pipeline
├── docs/ # WIDOCO automated ontology documentation
│ └── index.html # Interactive visualization and documentation
└── README.md # Project documentation overview
To test data integrity and see the validation constraints in action (including the intentional validation failure of the incomplete :wrong_student record):
python src/validator.py
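To give a feel for what the validation report expresses, here is a plain-Python approximation of a "required property" check, similar in spirit to a SHACL minimum-count constraint. It does not use pySHACL, and the property names (`studentID`, `memberOf`), the sample values, and the assumption that `wrong_student` lacks a student ID are all hypothetical; the real constraints are defined in shacl/shacl-shapes.ttl.

```python
# Hypothetical required properties, standing in for minimum-count constraints.
REQUIRED = ("studentID", "memberOf")

# Hypothetical records: property name -> list of values.
STUDENTS = {
    "student_001": {"studentID": ["2101"], "memberOf": ["ComputerEngineering"]},
    "wrong_student": {"memberOf": ["ComputerEngineering"]},  # assumed: no student ID
}


def validate(records, required=REQUIRED):
    """Return (conforms, violations); each violation is (node, missing property)."""
    violations = [
        (node, prop)
        for node, props in records.items()
        for prop in required
        if not props.get(prop)  # missing key or empty value list
    ]
    return (not violations, violations)


conforms, violations = validate(STUDENTS)
print(conforms)    # False
print(violations)  # [('wrong_student', 'studentID')]
```

A single non-conforming node is enough to make the whole data graph fail, which is why the intentional `wrong_student` record is useful as a negative test.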
You can run the predefined SPARQL queries against the in-memory graph loaded from the ontology file:
python src/rdf_mapper.py
Alternatively, you can open ontology/university-knowledge-graph-v2.rdf in Protégé or GraphDB to execute queries directly inside their respective SPARQL Query tabs.
To launch the semantic natural language interface using Claude Sonnet, make sure to export your API key first:
export ANTHROPIC_API_KEY="your-api-key-here"
python src/llm_interface.py
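The idea of grounding generated queries in the ontology can be sketched without calling any API. The example below is a hedged illustration, not the contents of src/llm_interface.py: it builds a prompt that lists the allowed vocabulary and then checks a returned query for default-prefix terms that the vocabulary does not contain. The term list, the prompt wording, and both function names are assumptions.

```python
import re

# Hypothetical vocabulary; the real terms come from the ontology file.
ONTOLOGY_TERMS = {":Professor", ":Course", ":teaches", ":hasEmail", ":hasPrerequisite"}


def build_prompt(question, terms=ONTOLOGY_TERMS):
    """Build a prompt that restricts the model to known ontology terms."""
    vocabulary = "\n".join(sorted(terms))
    return (
        "Translate the question into a SPARQL query.\n"
        "Use only these ontology terms:\n"
        f"{vocabulary}\n"
        f"Question: {question}\n"
    )


def unknown_terms(sparql, terms=ONTOLOGY_TERMS):
    """Return default-prefix names used in the query that are not in the vocabulary."""
    # Match ':name' tokens that are not part of 'prefix:name' or an '<IRI>'.
    used = set(re.findall(r"(?<![\w<])(:[A-Za-z_]\w*)", sparql))
    return used - set(terms)


generated = "SELECT ?e WHERE { ?p a :Professor ; :hasEmail ?e ; :worksAt ?d . }"
print(unknown_terms(generated))  # {':worksAt'}
```

A query that references an unknown term can be rejected or sent back to the model for correction before it is executed against the graph.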
This project was developed collaboratively by:
Efe Şamil Sarıgül
Ayça Selda Keskin
For a detailed look into the TBox/ABox conceptual decisions, axiom justifications, and technical outputs, please explore the interactive documentation inside the docs/ folder or read our full project report.