Thank you for your interest in contributing to DataLineagePy! This project aims to make data lineage tracking accessible to everyone.
1. Fork the repository

   ```bash
   # Fork on GitHub, then clone your fork
   git clone https://github.com/yourusername/DataLineagePy.git
   cd DataLineagePy
   ```

2. Set up the development environment

   ```bash
   # Create a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

   # Install in development mode
   pip install -e ".[dev]"
   ```

3. Run the tests to verify your setup

   ```bash
   pytest tests/ -v
   ```
- Check existing issues and pull requests
- For new features, please open an issue first to discuss the approach
1. Create a feature branch

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make your changes
   - Write clean, readable code
   - Follow existing code style and patterns
   - Add tests for new functionality
   - Update documentation as needed

3. Test your changes

   ```bash
   # Run the test suite
   pytest tests/

   # Check code formatting (if black is installed)
   black --check .

   # Run specific test categories
   pytest tests/test_basic_functionality.py -v
   ```

4. Commit your changes

   ```bash
   git add .
   git commit -m "feat: add your feature description"
   ```

5. Push and open a pull request

   ```bash
   git push origin feature/your-feature-name
   # Then create the PR on GitHub
   ```
- Follow the PEP 8 Python style guide
- Use descriptive variable and function names
- Add type hints where appropriate
- Write docstrings for public functions and classes
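Taken together, these style points might look like the following (a generic illustration, not code from the DataLineagePy codebase):

```python
from typing import Iterable


def unique_column_count(rows: Iterable[dict]) -> int:
    """Return the number of distinct column names across all rows.

    Args:
        rows: An iterable of mappings, one per record.

    Returns:
        The count of unique keys seen in any row.
    """
    seen: set[str] = set()
    for row in rows:
        seen.update(row.keys())
    return len(seen)
```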
- Add tests for all new functionality
- Ensure existing tests continue to pass
- Aim for good test coverage of new code
- Use descriptive test names that explain what is being tested
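As a sketch of the naming guideline, a test name can state the expected behavior directly; the `normalize_column_name` helper here is hypothetical, not part of DataLineagePy:

```python
def normalize_column_name(name: str) -> str:
    """Lower-case a column name and strip surrounding whitespace."""
    return name.strip().lower()


def test_normalize_column_name_strips_whitespace_and_lowercases():
    # The test name alone explains what behavior is being verified.
    assert normalize_column_name("  User_ID ") == "user_id"
```

A descriptive name also makes it easy to select the test with `pytest -k normalize`.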
- Update relevant documentation for new features
- Add examples to demonstrate usage
- Keep README.md and other docs up to date
```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/test_basic_functionality.py

# Run tests with verbose output
pytest -v

# Run tests matching a pattern
pytest -k "test_dataframe"
```

```bash
# If using Sphinx or similar (future enhancement)
cd docs/
make html
```

```bash
# Run performance benchmarks
python tests/test_benchmark_suite.py
```

- New Connectors: Database connectors (MongoDB, Snowflake, etc.)
- Performance Optimizations: Memory usage, speed improvements
- Documentation: More examples and tutorials
- Testing: Additional test coverage and edge cases
- Visualization Enhancements: New chart types, better layouts
- Integration Examples: Jupyter notebooks, real-world scenarios
- Error Handling: Better error messages and recovery
- Configuration Options: More customization capabilities
- Apache Spark Integration: Native Spark DataFrame support
- Real-time Streaming: Enhanced Kafka and streaming support
- Cloud Integrations: Better cloud storage connectors
- ML Pipeline Support: Integration with ML frameworks
When reporting bugs, please include:
- Python version and operating system
- DataLineagePy version
- Minimal code example that reproduces the issue
- Full error message and stack trace
- Expected vs actual behavior
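A small script like this can collect the environment details listed above for your report (illustrative; add your installed DataLineagePy version however you normally check it):

```python
import platform
import sys


def environment_report() -> dict:
    """Gather the environment details a bug report should include."""
    return {
        "python": sys.version.split()[0],
        "os": f"{platform.system()} {platform.release()}",
    }


print(environment_report())
```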
For new features, please provide:
- Clear description of the feature
- Use case and motivation
- Proposed API or interface (if applicable)
- Any implementation ideas
- Ensure tests pass: All existing tests must continue to pass
- Add tests: New functionality should include appropriate tests
- Update documentation: Add or update docs for new features
- Describe changes: Write a clear PR description explaining what changed and why
- Link issues: Reference any related GitHub issues
- A maintainer will review your PR within a few days
- Address any feedback or requested changes
- Once approved, a maintainer will merge the PR
Contributors will be:
- Listed in the project's contributors section
- Credited in release notes for significant contributions
- Invited to join the project as a maintainer (for ongoing contributors)
- GitHub Issues: For bugs and feature requests
- Discussions: For questions and general discussion
- Email: Contact arbaznazir4@gmail.com for complex topics
By contributing to DataLineagePy, you agree that your contributions will be licensed under the same MIT License that covers the project.
Every contribution, no matter how small, helps make DataLineagePy better for the entire data engineering community. Thank you for being part of this project!
Happy coding! 🚀