Open research questions, experimental techniques, and academic collaboration opportunities.
Last Updated: November 2025
Question: Can we identify causal relationships (not just correlations) between pilot experience, weather, and accident severity using causal inference?
Motivation: Traditional statistical models identify correlations, but aviation safety requires understanding true causality to inform policy.
Approach:
- Use DoWhy library for causal inference
- Build causal DAG: weather → pilot decisions → accident severity
- Estimate treatment effects: ATE of pilot experience on fatality risk
- Refute estimates: placebo tests, random common cause
Expected Outcome: Quantify causal effects (e.g., "1000 additional flight hours reduces fatality risk by 15%")
Timeline: 3-4 months Collaborators: Aviation safety researchers, causal inference experts Deliverable: Research paper for Safety Science or Reliability Engineering & System Safety
Question: Can we fine-tune general-purpose LLMs (GPT-4, Claude) on aviation accident reports to create specialized models with 90%+ accuracy?
Motivation: General LLMs lack aviation domain knowledge. Fine-tuned models could outperform SafeAeroBERT.
Approach:
- Fine-tune GPT-3.5/GPT-4 on 10K+ NTSB narratives
- Compare to BERT-based models (SafeAeroBERT, DistilBERT)
- Evaluate on classification, NER, and summarization tasks
- Measure cost-benefit: accuracy vs API costs
Expected Outcome: Specialized aviation LLM with 92%+ accuracy (vs 87-91% for BERT)
Timeline: 2-3 months Budget: $500-1000 (OpenAI fine-tuning costs) Deliverable: Model on Hugging Face Hub, paper for NLP in Aviation workshop
Question: Can we predict future accident hotspots using spatiotemporal models (e.g., Gaussian Processes, spatial LSTMs)?
Motivation: Current geospatial analysis identifies historical hotspots. Predictive models could enable proactive safety measures.
Approach:
- Build spatiotemporal dataset: (lat, lon, time, features)
- Train Gaussian Process regression with RBF kernel
- Train spatial LSTM (ConvLSTM) for sequence prediction
- Validate on 2023-2024 data: predict accident density for 2025
Expected Outcome: Identify 5-10 future hotspots with 70%+ precision
Timeline: 4-5 months Collaborators: Geospatial data scientists, aviation safety analysts Deliverable: Paper for Transportation Research or GeoInformatica
Question: How can we improve SHAP explanations for aviation ML models to be interpretable by non-technical stakeholders (pilots, regulators)?
Motivation: SHAP values are technical. Aviation stakeholders need natural language explanations.
Approach:
- Generate SHAP values for 1000+ predictions
- Use LLM to convert SHAP to natural language: "This accident was predicted as fatal because the aircraft was 30 years old (20% contribution), the pilot had low hours (15% contribution), and weather was IMC (12% contribution)"
- Evaluate with aviation experts: comprehension, trust, actionability
- Compare to LIME, anchor explanations
Expected Outcome: Natural language explanations with 85%+ expert approval rating
Timeline: 2-3 months Collaborators: HCI researchers, aviation safety experts Deliverable: Paper for XAI conference (IUI, CHI, ExSS)
Question: Can we improve accident classification by combining textual narratives with accident scene images?
Motivation: NTSB reports include images (wreckage, damage). Multi-modal models could capture visual cues missed by text-only models.
Approach:
- Collect accident images from NTSB reports (5K+ images)
- Train vision model: ResNet, Vision Transformer (ViT)
- Train multi-modal model: CLIP, ViLT, or custom fusion
- Compare to text-only baseline (SafeAeroBERT)
Expected Outcome: Multi-modal model achieves 93%+ accuracy (vs 87-91% text-only)
Timeline: 5-6 months Challenges: Limited labeled image data, image quality varies Deliverable: Paper for Computer Vision in Transportation (CVPR, ICCV workshops)
Question: Can we predict occupant injury severity (Kaplan-Meier survival curves) based on accident characteristics?
Motivation: Current models predict accident severity. Occupant-level predictions could inform safety design (seatbelts, airbags, crashworthiness).
Approach:
- Build occupant-level dataset: seat position, restraints, impact forces
- Train Cox proportional hazards model
- Identify risk factors: age, seat position, aircraft type
- Validate with medical trauma scoring systems (ISS, AIS)
Expected Outcome: Predict occupant fatality risk with 75%+ C-index
Timeline: 3-4 months Collaborators: Medical researchers, aviation safety engineers Deliverable: Paper for Journal of Trauma or Aviation, Space, and Environmental Medicine
Question: Can we use graph neural networks (GNNs) to predict missing relationships in the aviation knowledge graph?
Motivation: Knowledge graph has 20%+ disconnected nodes. Link prediction could infer implicit relationships.
Approach:
- Train GNN on knowledge graph: GraphSAGE, GAT, or R-GCN
- Predict missing links: (Aircraft A) -[SIMILAR_TO]-> (Aircraft B)
- Evaluate with link prediction metrics: MRR, Hits@10
- Compare to rule-based approaches (Cypher queries)
Expected Outcome: Predict 1000+ new relationships with 70%+ precision
Timeline: 4-5 months Collaborators: Graph ML researchers, Neo4j experts Deliverable: Paper for Graph Learning in Transportation (KDD, ICML workshops)
Question: Can we train ML models on distributed NTSB data (across multiple agencies) without sharing raw data?
Motivation: Federated learning enables privacy-preserving collaboration. Could combine NTSB (US) + EASA (EU) + CASA (Australia) data.
Approach:
- Simulate federated setting: split data by region
- Train federated XGBoost: aggregate model updates (not raw data)
- Compare to centralized model: accuracy, convergence time
- Evaluate privacy guarantees: differential privacy
Expected Outcome: Federated model achieves 90%+ centralized accuracy with formal privacy
Timeline: 6-8 months Collaborators: Privacy researchers, international aviation agencies Deliverable: Paper for Privacy in ML (CCS, USENIX Security) or Aviation Safety
Question: Can we detect anomalous accident patterns in real-time (e.g., sudden spike in engine failures)?
Motivation: Early detection of emerging safety issues could trigger investigations before fatalities increase.
Approach:
- Implement streaming anomaly detection: Isolation Forest, LSTM autoencoder
- Monitor incoming accident reports (monthly NTSB updates)
- Define anomalies: 2σ above expected rate, novel occurrence codes
- Alert system: Slack/email when anomaly detected
Expected Outcome: Detect 80%+ of anomalies within 24 hours
Timeline: 2-3 months Dependencies: Phase 5 (real-time Kafka pipeline) Deliverable: Production system + paper for Anomaly Detection in Time Series
Question: Can we generate counterfactual explanations: "If pilot had 1000 more hours, would accident still be fatal?"
Motivation: Counterfactuals help stakeholders understand how to prevent future accidents.
Approach:
- Use DICE (Diverse Counterfactual Explanations) library
- Generate counterfactuals for 1000+ accidents
- Validate feasibility: are counterfactuals realistic?
- Compare to causal inference results (DoWhy)
Expected Outcome: Generate 3-5 actionable counterfactuals per accident
Timeline: 2-3 months Collaborators: Explainable AI researchers Deliverable: Tool for investigators + paper for XAI conference
- Hypothesis: TabTransformer outperforms XGBoost on accident prediction
- Approach: Train TabTransformer on 100+ engineered features
- Baseline: XGBoost (91% accuracy)
- Target: 92%+ accuracy
- Effort: 20 hours
- Hypothesis: GNN leveraging knowledge graph improves predictions
- Approach: Encode accident as graph node, aggregate neighbor features
- Baseline: XGBoost with tabular features
- Target: 2-3% accuracy improvement
- Effort: 30 hours
- Hypothesis: Stacking ensemble (meta-learner) beats voting ensemble
- Approach: Stack XGBoost + RF + LSTM with Logistic Regression meta-learner
- Baseline: Voting ensemble (91% accuracy)
- Target: 92%+ accuracy
- Effort: 15 hours
- Hypothesis: Few-shot learning improves classification for rare accidents (<100 examples)
- Approach: Use Prototypical Networks or MAML
- Baseline: Standard classification (poor on rare classes)
- Target: 70%+ accuracy on rare classes
- Effort: 25 hours
- Hypothesis: GPT-4 can generate synthetic accident narratives to augment training data
- Approach: Generate 5000+ synthetic narratives, train SafeAeroBERT
- Baseline: 87-91% accuracy (real data only)
- Target: 90-93% accuracy (with synthetic data)
- Effort: 20 hours
- Budget: $200-500 (GPT-4 API)
- Hypothesis: SMOTE (Synthetic Minority Over-sampling) improves rare accident classification
- Approach: Apply SMOTE to minority classes, retrain XGBoost
- Baseline: F1-score 0.75 on minority classes
- Target: F1-score 0.82+
- Effort: 10 hours
- A: Dense retrieval (FAISS)
- B: Hybrid retrieval (BM25 + FAISS)
- Metric: Precision@5, Recall@10
- Hypothesis: Hybrid beats dense by 10-15%
- A: XGBoost v1.0
- B: XGBoost v1.1 (tuned with Optuna)
- Metric: Accuracy, inference latency
- Hypothesis: v1.1 improves accuracy by 1-2%
- A: Current Streamlit layout
- B: Redesigned layout (user testing)
- Metric: Time to complete task, user satisfaction
- Hypothesis: Redesign reduces task time by 20%
- A: Simple prompt: "Answer based on these reports"
- B: Chain-of-thought prompt: "First identify relevant reports, then analyze patterns, finally provide recommendations"
- Metric: Answer quality (ROUGE, human evaluation)
- Hypothesis: CoT prompt improves quality by 15%
-
MIT - International Center for Air Transportation
- Focus: Aviation safety, operations research
- Potential collaboration: Causal inference, predictive modeling
- Contact: Prof. John Hansman
-
Embry-Riddle Aeronautical University
- Focus: Aviation safety, human factors
- Potential collaboration: Pilot behavior analysis, survival analysis
- Contact: Dr. David Esser (Aviation Safety)
-
Stanford - AI Lab
- Focus: Explainable AI, causal inference
- Potential collaboration: XAI for aviation, counterfactual explanations
- Contact: Prof. Percy Liang
-
Carnegie Mellon - Software Engineering Institute
- Focus: ML engineering, MLOps
- Potential collaboration: Production ML systems, monitoring
- Contact: Dr. Grace Lewis
-
NASA - Aviation Safety Reporting System (ASRS)
- Focus: Incident reporting, NLP
- Potential collaboration: Text analysis, topic modeling
- Dataset sharing: ASRS reports (complementary to NTSB)
-
FAA - Office of Aerospace Medicine
- Focus: Human factors, medical certification
- Potential collaboration: Occupant injury analysis, survival models
- Access to medical data
-
EASA - European Union Aviation Safety Agency
- Focus: International aviation safety standards
- Potential collaboration: Federated learning, cross-border analysis
- Dataset sharing: European accident data
-
Boeing - Safety Analytics
- Focus: Commercial aviation, fleet safety
- Potential collaboration: Predictive maintenance, ML models
- Data access: Flight data recorder (FDR) data
-
Airbus - Flight Safety
- Focus: Aircraft design, crashworthiness
- Potential collaboration: Multi-modal learning (text + images), safety design
- Access to engineering data
-
FlightSafety International
- Focus: Pilot training, simulation
- Potential collaboration: Pilot behavior modeling, training effectiveness
- Simulator data
- NeurIPS - May 2025 (notification: Sep 2025)
- ICML - January 2026 (notification: May 2026)
- AAAI - August 2025 (notification: November 2025)
- KDD - February 2026 (notification: May 2026)
- International Conference on Aviation Safety - June 2025
- Aerospace Technology Congress - October 2025
- Flight Safety Foundation Annual Safety Summit - November 2025
- ACL - February 2025 (notification: May 2025)
- EMNLP - June 2025 (notification: September 2025)
- NAACL - December 2025 (notification: March 2026)
- XAI Workshop @ NeurIPS - September 2025
- IUI (Intelligent User Interfaces) - October 2025
- CHI (Computer-Human Interaction) - September 2025
- Target: Safety Science (Q1 journal, IF: 4.2)
- Timeline:
- Q2 2025: Data analysis, causal models (DoWhy)
- Q3 2025: Write draft, internal review
- Q4 2025: Submit to journal
- Q1 2026: Revisions, acceptance
- Authors: Lead Data Scientist, Aviation Safety Expert, Causal Inference Researcher
- Expected Citations: 20-50 in first 2 years
- Target: EMNLP 2025 or ACL 2026
- Timeline:
- Q3 2025: Fine-tune BERT, evaluate on test set
- Q4 2025: Write draft, internal review
- Q1 2026: Submit to conference
- Q2 2026: Presentation (if accepted)
- Authors: NLP Specialist, Data Scientist
- Code: Open-source on Hugging Face
- Expected Citations: 50-100 in first 2 years (high-impact venue)
- Target: KDD 2026 or ICDM 2026
- Timeline:
- Q4 2025: Build knowledge graph, run graph algorithms
- Q1 2026: GNN experiments, link prediction
- Q2 2026: Write draft, submit to conference
- Q3 2026: Presentation (if accepted)
- Authors: AI Engineer, Graph ML Researcher
- Code: Open-source Neo4j schema + queries
- Expected Citations: 30-60 in first 2 years
- Format: Parquet files (compressed, 10x smaller than CSV)
- Size: ~2GB (100K accidents, 200+ features)
- Platform: Kaggle, Hugging Face Datasets
- License: CC BY 4.0 (public domain data)
- Timeline: Q2 2025 (after Phase 1 complete)
- Expected Downloads: 1000+ in first year
- Format: Hugging Face Datasets
- Size: 10K+ narratives with labels (severity, causes)
- Use case: NLP research, BERT fine-tuning
- License: CC BY 4.0
- Timeline: Q3 2025 (after SafeAeroBERT training)
- Expected Downloads: 500+ in first year
- Format: Neo4j dump, RDF triples
- Size: 50K+ entities, 100K+ relationships
- Platform: Zenodo, GitHub
- License: CC BY 4.0
- Timeline: Q1 2026 (after Phase 4 complete)
- Expected Downloads: 200+ in first year
- Description: Python SDK for NTSB analytics API
- Features: Authentication, rate limiting, pagination
- Platform: PyPI
- Timeline: Q1 2026 (Phase 5)
- Expected Downloads: 500+ monthly
- Description: Fine-tuned BERT for aviation accident classification
- Platform: Hugging Face Hub
- Timeline: Q3 2025 (Phase 4)
- Expected Downloads: 100+ monthly
- Description: Feature engineering pipelines for aviation ML
- Features: NTSB code extraction, temporal/spatial features
- Platform: PyPI
- Timeline: Q3 2025 (Phase 3)
- Expected Downloads: 200+ monthly
| Category | Cost |
|---|---|
| LLM API costs (fine-tuning, RAG) | $1,000 - $2,000 |
| Cloud compute (GPU training) | $500 - $1,000 |
| Conference travel (2-3 conferences) | $4,000 - $6,000 |
| Open-access publication fees | $1,500 - $3,000 |
| Dataset hosting (Zenodo, S3) | $100 - $500 |
| Total | $7,100 - $12,500 |
Funding Sources:
- NSF grants (SBIR/STTR)
- Academic partnerships (equipment, cloud credits)
- Industry sponsorship (Boeing, Airbus)
- Crowdfunding (Patreon, GitHub Sponsors)
Last Updated: November 2025 Version: 1.0