- Moved `src/tasks/batch_integration` to `task_batch_integration` (PR #910).
- Moved `src/tasks/denoising` to `task_denoising` (PR #910).
- Moved `src/tasks/dimensionality_reduction` to `task_dimensionality_reduction` (PR #910).
- Moved `src/tasks/label_projection` to `task_label_projection` (PR #910).
- Moved `src/tasks/match_modalities` to `task_match_modalities` (PR #910).
- Moved `src/tasks/predict_modality` to `task_predict_modality` (PR #910).
- Moved `src/tasks/spatial_decomposition` to `task_spatial_decomposition` (PR #910).
- Moved `src/tasks/spatially_variable_genes` to `task_spatially_variable_genes` (PR #910).
- Moved `src/datasets` to `datasets` (PR #933).
- Update Viash to 0.9.0 (PR #911).
- Update Viash to 0.9.4 (PR #925).
- Add the CELLxGENE immune cell atlas dataset as a common test resource (PR #907).
- Update `dataset_id` for `tenx_visium`, `zenodo_spatial`, `zenodo_spatial_slidetags` datasets and use `mouse_brain_coronal` as a test resource in the `spatially_variable_genes` task (PR #908).
- Update `get_task_info`, `get_method_info` and `get_metrics_info` reporting components with more info and extend output (PR #915).
- Fix extracting metadata from anndata files in the `extract_metadata` component (PR #914).
- Fix path in `touch` cmd in `reporting/process_task_results/run_test.sh` (PR #915).
A major update to the OpenProblems framework, switching from a Python-based framework to a Viash + Nextflow-based framework. This update features the same concepts as the previous version, but with a new implementation that is more flexible, scalable, and maintainable.
Most relevant parts of the overall structure:
- `src/tasks`: Benchmarking tasks:
  - `batch_integration`: Batch integration
  - `denoising`: Denoising
  - `dimensionality_reduction`: Dimensionality reduction
  - `match_modalities`: Match modalities
  - `predict_modality`: Predict modality
  - `spatial_decomposition`: Spatial decomposition
  - `spatially_variable_genes`: Spatially variable genes
- `src/datasets`: Components for creating common datasets.
  - Loaders:
    - `cellxgene_census`: Query cells from a CellxGene Census
    - `openproblems_neurips2021_bmmc`: Fetch a dataset from the OpenProblems NeurIPS2021 competition
    - `openproblems_neurips2022_pbmc`: Fetch a dataset from the OpenProblems NeurIPS2022 competition
    - `openproblems_v1`: Fetch a legacy OpenProblems v1 dataset
    - `openproblems_v1_multimodal`: Fetch a legacy OpenProblems v1 multimodal dataset
    - `tenx_visium`: Fetch and convert a 10x Visium dataset
    - `zenodo_spatial`: Fetch and process an AnnData file containing DBiT-seq, MERFISH, seqFISH, Slide-seq v2, STARmap, and Stereo-seq data from Zenodo.
    - `zenodo_spatial_slidetags`: Download a compressed file containing a gene expression matrix and spatial locations from Zenodo.
- `src/common`: Common components used by all tasks.
  - `check_dataset_schema`: Check whether an h5ad dataset adheres to a dataset schema
  - `check_yaml_schema`: Check whether a YAML file adheres to a JSON schema
  - `comp_tests`: Reusable component unit tests
  - `create_component`: Create a Viash component.
  - `create_task_readme`: Create a README for an OpenProblems task.
  - `extract_metadata`: Extract the `.uns` metadata from an h5ad file.
  - `helper_functions`: Commonly used helper functions in Python or in R.
  - `process_task_results`: Process the raw task results (containing raw logs, unprocessed component configs, and various metrics) into nicely formatted task results.
  - `schemas`: JSON schemas for YAML files in the repository
  - `sync_test_resources`: Synchronise the test resources from s3 to `resources_test`
For more information related to the structure of this repository, see the documentation.
Note: This changelog was automatically generated from the git log.
- Added `cell2location` to the `spatial_decomposition` task.
- Added nearest-neighbor ranking matrix computation to `_utils`.
- Datasets now store nearest-neighbor ranking matrix in `adata.obsm["X_ranking"]`.
- Added support for parsing Nextflow output and generating benchmark results for the website.
- Added `max_samples` parameter to `qlocal`, `qglobal`, `qnn_auc`, `lcmc`, `qnn`, and `continuity` metrics to allow for subsampling of data for faster computation.
- Added new scArches based methods: `scarches_scanvi_xgb_all_genes` and `scarches_scanvi_xgb_hvg`.
- Added `prediction_method` parameter to `_scanvi_scarches` to specify prediction method.
- Added `_pred_xgb` function to perform XGBoost prediction based on latent representations.
- Added `obsm` parameter to `_xgboost` function to allow specifying the embedding space for XGBoost training.
- Updated `scvi-tools` to version `0.20` in both Python and R environments.
- Updated datasets to include nearest-neighbor ranking matrix.
- Modified dimensionality reduction task to include nearest-neighbor ranking matrix computation in dataset generation.
- The website update workflow was refactored to use a new workflow using JSON instead of Markdown.
- Updated the website generation process to remove duplicate BibTeX entries.
- Added a new `parse_metadata.py` script for generating metadata for the website.
- Added a new function to `openproblems.utils.py` to get the member ID of a task, dataset, method or metric.
- Removed the redundant computation and storage of the nearest-neighbor ranking matrix in datasets.
- Updated method names to be shorter and more consistent across tasks.
- Improved method summaries for clarity.
- Updated JAX and JAXlib versions to 0.4.6.
- Updated dependencies to support new versions of Snakemake and GitPython.
- Removed code related to "nbt2022-reproducibility" repo and merged it into the main website.
- Updated the schema for benchmark results to include submission time, code version, and resource usage metrics.
- Improved error handling and added logging to the parsing script.
- Removed the "raw.json" file from the results directory and merged all data into a single "results.json" file.
- Updated the workflow to upload the final results to the website's results directory instead of the data directory.
- Removed unnecessary code and refactored the parsing script for better readability.
- Added unit tests for the new parsing script.
- Updated the `run_tests` workflow to skip testing on the `test_website` branch.
- Updated the `run_tests` workflow to skip testing on the `test_process` branch.
- Updated the `create-pull-request` step to set the author for the pull request.
- Updated the `run_tests` workflow to skip testing on pull request reviews.
- Updated the `update_website_content` workflow to update the website on the `main` branch.
- Updated the `main.bib` file to fix a typo.
- Removed extraneous headings from task README files.
- Updated `generate_test_matrix.py` to use the new `openproblems.utils.get_member_id` function.
- Updated the website generation process to copy BibTeX files to the correct location.
- Updated the `process_requires` section in `setup.py` to include `gitpython`.
- Updated git commit hash generation for openproblems functions.
- Modified `_xgboost` to allow for specifying `tree_method`.
- Modified `_scanvi_scarches` to consistently use `unlabeled_category`.
- Modified `_scanvi_scarches` to remove unnecessary copying of `labels`.
- Removed `_scanvi_scarches` functions that were redundant with `_scanvi_scarches`.
- Removed unused `_scanvi` functions.
- Modified `_scanvi_scarches` to allow for specifying `prediction_method` and handle `unlabeled_category` consistently.
- Improved the documentation of the `auprc` metric.
- Improved the documentation of the `cell2location` methods.
- Documented sub-stub task behaviour.
- Fixed an error in `neuralee_default` where the `subsample_genes` argument could be too small.
- Fixed an error in `knn_naive` where the `is_baseline` argument was set to `False`.
- Fixed calculation of ranking matrix in `_utils` to include ties.
- Fixed a bug in `load_tenx_5k_pbmc()` where a warning about non-unique variable names was being raised.
- Removed the unused `_utils.py` file.
- Removed the `X_ranking` entry from the `obsm` attribute of datasets.
- The `_fit()` function in `nn_ranking.py` now subsamples the data if `max_samples` is specified.
- The `nn_ranking` metrics now use subsampling in the `_fit()` function to improve performance.
- Fixed the git hash generation for openproblems functions.
- Fixed a warning about `pkg_resources` being deprecated.
- Removed unnecessary `fetch-depth: 1` from workflow.
- Fixed potential issue in `_scanvi_scarches` where `labels_pred` could be overwritten.
- Fixed potential issue in `_pred_xgb` where `num_round` wasn't being used correctly.
- Fixed an issue where baseline methods were not being filtered correctly from the benchmark results.
- Fixed an issue where metrics with all NaN values were not being removed from the benchmark results.
- Fixed an issue where some metrics were not being parsed correctly from the Nextflow output.
- Fixed an issue where the "mean_score" field was not being calculated correctly for each method.
- Fixed an issue where the "code_version" field was not being populated correctly for each method.
- Fixed an issue where the "submission_time" field was not being populated correctly for each method.
- Fixed an issue where the resource usage metrics were not being parsed correctly from the Nextflow output.
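The nearest-neighbor ranking entries above (ties in the ranking matrix, and `max_samples` subsampling before `_fit()` runs) can be illustrated with a small pure-Python sketch. It is not the `_utils` or `nn_ranking.py` implementation. It assumes Euclidean distance and a "min" tie rule, where tied neighbors share the lowest rank. The helper names `subsample` and `ranking_matrix` are hypothetical.

```python
import math
import random


def subsample(points, max_samples=None, seed=0):
    """Return at most `max_samples` points, drawn without replacement.

    Hypothetical helper illustrating the idea behind `max_samples`;
    it is not the openproblems code.
    """
    if max_samples is None or len(points) <= max_samples:
        return list(points)
    rng = random.Random(seed)
    return rng.sample(list(points), max_samples)


def ranking_matrix(points):
    """Nearest-neighbor ranking matrix with ties (assumed "min" rule).

    R[i][j] counts the points strictly closer to point i than point j is,
    so neighbors at equal distance share the same rank and R[i][i] == 0.
    Naive O(n^3) loop, fine for a sketch.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    return [
        [sum(1 for k in range(n) if dist[i][k] < dist[i][j]) for j in range(n)]
        for i in range(n)
    ]
```

For points at 0, 1, 2 and 3 on a line, the row for the point at 1 is `[1, 0, 1, 3]`: its two neighbors at distance 1 tie for rank 1. Subsampling first keeps the quadratic distance matrix small, which is the motivation the `max_samples` entries give.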
Note: This changelog was automatically generated from the git log.
- Added the zebrafish_labs dataset to the dimensionality reduction task.
- Added the `diffusion_map` method to the dimensionality reduction task.
- Added the `spectral_features` method to the dimensionality reduction task, which uses diffusion maps to create embedding features.
- Added the `distance_correlation_spectral` metric to the dimensionality reduction task, which evaluates the similarity of the high-dimensional Laplacian eigenmaps on the full data matrix and the dimensionally-reduced matrix.
- Added baseline methods for batch integration: no integration, random integration, random integration by cell type, random integration by batch.
- Added `alra_sqrt_reversenorm` and `alra_log_reversenorm` methods for ALRA with reversed normalization order.
- Added `celltype_random_embedding_jitter` method to randomize embedding with jitter.
- Improved the `density_preservation` metric calculation.
- Updated the `distance_correlation` metric to use the new `diffusion_map` method.
- Increased the default number of components used for `distance_correlation_spectral` to 1000.
- Made metrics more robust by copying the AnnData object before passing it to the metric function.
- Added `is_baseline` flag to `adata.uns` in `method` decorator.
- Added `is_baseline` field to `adata.uns` for all methods.
- Increased default values for `max_epochs_sp` and `max_epochs_sc` in `destvi` method.
- Changed default value of `early_stopping_monitor` to `elbo_validation` from `reconstruction_loss_train` in `destvi` method.
- Added `train_size` and `validation_size` arguments to the `sc_model.train` call in `destvi` method.
- Added `batch_size` and `plan_kwargs` arguments to the `st_model.train` call in `destvi` method.
- Refactored ALRA methods for improved clarity and consistency.
- Added tests for new ALRA methods with reversed normalization order.
- Added jitter parameter to `_random_embedding` function.
- Updated `celltype_random_embedding` to use `jitter=None` in `_random_embedding`.
- Removed unnecessary parameters from the `sample_dataset` function.
- Removed unnecessary checks for PCA and neighbors in the `check_dataset` function.
- Updated `pytest.ini` to ignore deprecation warning related to `pkg_resources`.
- Added permission to all workflows to read and write contents.
- Added permission to write pull requests to several workflows.
- Added permission to write packages to the `run_tests` workflow.
- Fixed a bug in `density_preservation` that caused it to return 0 when there were NaN values in the embedding.
- Removed unused `true_features_log_cp10k` and `true_features_log_cp10k_hvg` methods.
- Removed unnecessary imports in metrics.
- Removed unnecessary `neighbors` calls in metrics.
- Removed unused `_get_split` function.
- Added `embedding_to_graph` and `feature_to_graph` functions for graph-based metrics.
- Added `get_split` function for metrics that require splitting data into training and testing sets.
- Added `feature_to_embedding` function for embedding-based metrics.
- Fixed issue where baseline methods were not properly documented.
- Increased default maximum epochs for spatial models to improve performance.
- Improved training parameters for both spatial and single-cell models to improve stability and performance.
- Updated validation metric used for early stopping in spatial model to improve training quality.
- Updated documentation to clarify that the AnnData object passed to metric functions is a copy.
- Updated the documentation for batch integration tasks to reflect the change in the expected format of the dataset objects.
- Moved baseline methods from individual task modules to a common module.
- Removed redundant baseline methods from individual task modules.
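The cell-type baseline entries (`celltype_random_embedding`, `celltype_random_embedding_jitter`, and the `jitter` parameter of `_random_embedding`) describe a one-hot encoding of cell-type labels with optional noise. Elsewhere in this changelog the noise range is given as (-0.01, 0.01). Below is a minimal sketch, assuming that `jitter` is the half-width of a uniform noise interval and that `jitter=None` means no noise. The name `celltype_embedding` is hypothetical and does not refer to the openproblems function.

```python
import random


def celltype_embedding(labels, jitter=None, seed=0):
    """One-hot encode cell-type labels, optionally adding uniform noise.

    Hypothetical sketch: `jitter` is assumed to be the half-width of the
    noise interval, so jitter=0.01 draws noise from [-0.01, 0.01];
    jitter=None returns the exact one-hot matrix.
    """
    categories = sorted(set(labels))  # fixed column order per cell type
    index = {c: i for i, c in enumerate(categories)}
    rng = random.Random(seed)
    embedding = []
    for label in labels:
        row = [0.0] * len(categories)
        row[index[label]] = 1.0
        if jitter is not None:
            row = [x + rng.uniform(-jitter, jitter) for x in row]
        embedding.append(row)
    return embedding
```

Without jitter, cells of the same type collapse onto a single point. A small jitter separates them while keeping each cell close to its cell-type corner.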
Note: This changelog was automatically generated from the git log.
- Added metadata for all datasets, methods, and metrics.
- Updated nf-openproblems to v1.10.
- Added a new `docker_pull` rule to the Snakemake workflow to pull Docker images.
- Added a new `docker` rule to the Snakemake workflow to build Docker images.
- Changed the `pytest` command to include coverage for the `test` directory.
- Added new environment variables for the `TOWER_TEST_ACTION_ID` and `TOWER_FULL_ACTION_ID` to the Snakemake workflow.
- Updated the `scripts/install_renv.R` script to increase the number of retry attempts.
Note: This changelog was automatically generated from the git log.
- Updated `scib` version to `1.1.3` in `docker/openproblems-r-extras/requirements.txt` and `docker/openproblems-r-pytorch/requirements.txt`.
- Added `pytest-timestamper` to test dependencies for better debugging.
Note: This changelog was automatically generated from the git log.
- Fixed an issue where pymde did not work on sparse data.
Note: This changelog was automatically generated from the git log.
- Added `hvg_unint` and `n_genes_pre` to the lung batch.
Note: This changelog was automatically generated from the git log.
- Added a BibTeX file `main.bib` for storing all references cited in the repository.
- Added a section on adding paper references to `CONTRIBUTING.md` explaining how to add entries to `main.bib` and link to them in markdown documents.
- Added new baseline methods for dimensionality reduction: "True Features (logCPM)", "True Features (logCPM, 1kHVG)".
- Added `alra_log` method, which implements ALRA with log normalization.
- Added `alra_sqrt` method, which implements ALRA with square root normalization.
- Added PyMDE dimensionality reduction methods.
- Added citations for Chen et al. (2009) "Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis", Kraemer et al. (2018) "dimRed and coRanking - Unifying Dimensionality Reduction in R", Lee et al. (2009) "Quality assessment of dimensionality reduction: Rank-based criteria", Lueks et al. (2011) "How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix", Szubert et al. (2019) "Structure-preserving visualisation of high dimensional single-cell datasets", and Venna et al. (2006) "Local multidimensional scaling".
- Added `install_renv.R` script to install R packages using `renv` with retries.
- Added a new metric to evaluate the conservation of highly variable genes (HVGs) after batch integration.
- Added support for lung data from Vieira Braga et al.
- Added `magic_reverse_norm` and `magic_approx_reverse_norm` methods which reverse the order of normalization and transformation in the MAGIC algorithm.
- Added a new workflow to comment on pull request status.
- Updated the `openproblems` repository to cite papers using BibTeX references.
- Renamed `alra` method to `alra_sqrt`.
- Updated `spacexr` to latest version.
- Added `fc_cutoff` and `fc_cutoff_reg` parameters to `rctd` method to control minimum log-fold-change for genes in the normalization and RCTD steps.
- Renamed the "multimodal_data_integration" task to "matching_modalities".
- Bumped version to 0.7.0.
- Added BibTeX references to all data loaders in `openproblems/data`.
- Added BibTeX references to all methods in `openproblems/tasks`.
- Added BibTeX references to all metrics in `openproblems/tasks`.
- Updated `update_website_content.yml` to copy `main.bib` to the Open Problems website.
- Added a BibTeX Tidy hook to `.pre-commit-config.yaml`.
- Updated `scvi-tools` version to `~0.19` in both `openproblems-python-pytorch` and `openproblems-r-pytorch` dockerfiles.
- Updated `cell2location` version to `47c8d6dc90dd3f1ab639861e8617c6ef0b62bb89` in the `openproblems-python-pytorch` dockerfile.
- Updated `bslib` to version 0.4.2.
- Updated `htmltools` to version 0.5.4.
- Updated the `alra_sqrt` method to use square root normalization.
- Updated the `alra_log` method to use log normalization.
- Updated the method names to reflect the normalization used.
- Updated dependencies for `gtfparse` and `polars`.
- Added PyMDE dependency to `requirements.txt`.
- Updated the API to specify that datasets should provide log CPM-normalized counts in `adata.X
Note: This changelog was automatically generated from the git log.
- Added `cell2location_detection_alpha_1` method, which uses `detection_alpha=1` and a hard-coded reference.
- Added a new parameter `hard_coded_reference` to `cell2location_detection_alpha_1` method.
- Added a new baseline method for dimensionality reduction using high-dimensional Laplacian Eigenmaps.
- Added organism metadata to datasets.
- Added a new image, `openproblems-python-bedtools`, to contain packages required for running `pybedtools` and `pyensembl` Python packages.
- Added support for TensorFlow 2.9.0.
- Added a new schema for storing results in JSON format.
- Added a new function to parse Nextflow trace files to this JSON schema.
- Added `rmse_spectral` metric, which calculates the root mean squared error (RMSE) between high-dimensional Laplacian eigenmaps on the full (or processed) data matrix and the dimensionally-reduced matrix.
- Added new methods to LIANA: `magnitude_max`, `magnitude_sum`, `specificity_max`, and `specificity_sum`.
- Added `aggregate_how` parameter to `liana` R function to allow aggregation by "magnitude" or "specificity".
- Added `top_prop` parameter to `odds_ratio` metric to allow specifying the proportion of interactions to consider for calculating the odds ratio.
- Removed unused `openproblems-python-batch-integration` docker image.
- Moved `scanorama`, `bbknn`, `scVI`, `mnnpy` and `scib` from `openproblems-python-batch-integration` to `openproblems-r-pytorch`.
- Moved `cell2location`, `molecular-cross-validation`, `neuralee`, `tangram` and `phate` from `openproblems-python-extras` to `openproblems-python-pytorch`.
- Moved `pybedtools`, `pyensembl` and `scalex` from `openproblems-python-extras` to `openproblems-python-pytorch`.
- Moved `dca` and `keras` from `openproblems-python-tf2.4` to `openproblems-python-tensorflow`.
- Added `openproblems-python-bedtools` docker image.
- Added `openproblems-python-tensorflow` docker image.
- Added `openproblems-python-pytorch` docker image.
- Moved `harmony-pytorch` from `openproblems-r-extras` to `openproblems-r-pytorch`.
- Added `openproblems-r-pytorch` docker image.
- Updated `anndata2ri` version in `openproblems-r-base`.
- Updated `kBET` version in `openproblems-r-extras`.
- Updated `scib` version in `openproblems-r-extras`.
- Updated `scvi-tools` version in `openproblems-r-pytorch`.
- Updated `torch` version in `openproblems-r-pytorch`.
- Moved the `codecov` action to run only on success.
- Updated the workflow to upload coverage reports to GitHub Actions as an artifact.
- Renamed the `run_benchmark` job to `setup_benchmark`.
- Added a new `run_benchmark` job that runs after `setup_benchmark`.
- Moved the benchmark running logic from the `run_benchmark` job to the new `run_benchmark` job.
- Added a `setup-environment` step to `setup_benchmark` job.
- Added outputs to the `setup_benchmark` job.
- Renamed `nbt2022-reproducibility` to `website-experimental`.
- Updated `numpy` and `scipy` dependencies in `setup.py`.
- Updated `scikit-learn`, `louvain`, `python-igraph`, `decorator` and `colorama` dependencies in `setup.py`.
- Improved Docker image caching.
- Removed the `counts` layer from the `immune_cells` and `pancreas` datasets, and from the `batch_integration_feature` task.
- Removed the `counts` layer from `generate_synthetic_dataset` functions in spatial decomposition datasets.
- Updated the `normalize` functions to not modify the data in place.
- Updated the `log_cpm_hvg` function to annotate HVGs instead of subsetting the data.
- Updated the `_high_dim` function in the `nn_ranking` metric to subset to HVGs.
- Updated the `dimensionality_reduction` task README to clarify the role of the `highly_variable` key.
- Reduced the random noise added to the one-hot embedding in the `_random_embedding` function from (-0.1, 0.1) to (-0.01, 0.01).
- Removed `high_dim_pca` and `high_dim_spectral` methods.
- Updated the `random_features` method to use the `check_version` function.
- Moved raw output files from website to the NBT 2022 reproducibility repository.
- Updated the `process_results.yml` workflow to include the NBT 2022 reproducibility repository.
- Updated the `run_tests.yml` workflow to skip tests when pushing to specific branches.
- Removed `# ci skip` from commit message in CI workflow.
- Removed redundant file deletion from `process_results.yml` workflow.
- Added `update_website_content.yml` workflow to update benchmark content on the website repository.
- Modified the `process_results.yml` workflow to update website content based on results.
- Changed the `update_website_content.yml` workflow to trigger on both the `main` and `test_website` branches.
- Updated workflow to push changes to the website only if there are changes to the website content.
- Added environment variable to track changes.
- Removed unused git command.
- Decreased number of samples for testing.
- Updated `igraph` to 0.10.* in `setup.py`.
- Updated `anndata2ri` to 1.1.* in `openproblems-r-base/README.md`.
- Updated `kBET` to `a10ffea` in `openproblems-r-extras/r_requirements.txt`.
- Updated `scib` to `f0be826` in `openproblems-r-extras/requirements.txt`.
- Updated `harmony-pytorch` to 0.1.* in `openproblems-r-pytorch/requirements.txt`.
- Updated `torch` to 1.13.* in `openproblems-r-pytorch/requirements.txt`.
- Updated `scanorama` to 1.7.0 in `openproblems-r-pytorch/requirements.txt`.
- Updated `scvi-tools` to 0.16.* in `openproblems-r-pytorch/requirements.txt`.
- Updated the `regulatory_effect_prediction` task to use
Note: This changelog was automatically generated from the git log.
- Added a new dataset: "Pancreas (inDrop)"
- Added a new function: "pancreas"
- Added a new utility function: "utils.split_data"
- Added `tabula_muris_senis_lung_random` dataset.
- Added `celltype_random_embedding` baseline method for batch integration embedding.
- Added `celltype_random_graph` baseline method for batch integration graph.
- Added a new argument `sctransform_n_cells` to the seuratv3 function to allow users to specify the number of cells used to build the negative binomial regression in the SCTransform function.
- Added a new sample dataset that is smaller and more efficient than the previous one.
- Added a "mean score" metric to the results table.
- Added support for loading the sample dataset in `load_sample_data`.
- Added support for running benchmarks on pull requests.
- Added a new workflow for creating a test matrix.
- Added a new script to generate a test matrix for the `run_tester` workflow.
- Added a new script for cleaning up runner disk space.
- Added support for uploading docker images to ECR.
- Added `tabula_muris_senis` dataset to `openproblems/tasks/denoising/datasets/__init__.py`.
- Updated `styler` to version 1.8.1.
- Updated the method for normalizing scores to correctly account for baseline method scores.
- Improved the way NaN and infinite values are handled in the ranking calculation.
- Removed redundant code that was previously used to upload results and markdown artifacts to test.
- Removed the raw output files from the website data directory.
- Updated the list of reviewers for the pull request to include more relevant team members.
- Changed the reference to "Code" to "Library" in the JSON output to better reflect the data presented.
- Added a check to ensure that the task has a minimum number of non-baseline methods before processing results.
- Removed the check to ensure that the task has a minimum number of methods before processing results.
- Removed redundant code that was previously used to handle incomplete tasks.
- Updated the workflow to use a consistent version of Python across all jobs.
- Updated flake8 dependency to `https://github.com/pycqa/flake8`.
- Improved random embedding for `celltype_random_embedding` and `celltype_random_graph`.
- Removed `pip check` from Dockerfile.
- Updated code to use a more consistent random number generator.
- Updated liana code to invert the distribution of the aggregate rank.
- Improved the logic in `odds_ratio` to ensure that the numerator/denominator is not zero.
- Removed unnecessary `NXF_DEFAULT_DSL` from `run_tester` workflow.
- Increased the number of cells used to build the negative binomial regression in the SCTransform function from 3000 to 5000.
- Adjusted the default values for `n_pca` and `sctransform_n_cells` in the seuratv3 function for test and non-test cases.
- Updated the `seuratv3_wrapper.R` script to pass the `sctransform_n_cells` argument to the SCTransform function.
- Moved the sample dataset from the `multimodal` folder to the `sample` folder.
- Refactored the sample data generation to be more efficient.
- Modified the `compute_ranking` function to calculate and add the "mean score" to the `dataset_results` dictionary.
- Updated the `dataset_results_to_json` function to include the "mean score" in the results table.
- Updated the pull request template to reflect recent changes and improvements in the workflow.
- Updated the workflow to include a new `test_full_benchmark` branch.
- Removed redundant code from the workflow.
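Several entries in this release concern result ranking: normalizing scores so that they account for baseline methods, and the "mean score" added by `compute_ranking`. Below is a minimal sketch of that idea. It assumes that each metric is rescaled so the lowest-scoring baseline maps to 0 and the highest-scoring baseline maps to 1, with higher always better. It also assumes the mean score is the unweighted average of a method's normalized metrics. The function names `normalize_scores` and `mean_scores` are hypothetical and are not part of the openproblems API.

```python
from statistics import mean


def normalize_scores(raw, baselines):
    """Rescale each metric so the worst baseline is 0 and the best is 1.

    `raw` maps method -> {metric: score}; `baselines` lists the methods
    treated as baselines. Assumes higher is better for every metric.
    """
    metrics = next(iter(raw.values())).keys()
    normalized = {method: {} for method in raw}
    for metric in metrics:
        base = [raw[b][metric] for b in baselines]
        lo, hi = min(base), max(base)
        if hi == lo:
            raise ValueError(f"baselines do not separate metric {metric!r}")
        for method, scores in raw.items():
            normalized[method][metric] = (scores[metric] - lo) / (hi - lo)
    return normalized


def mean_scores(normalized):
    """Unweighted average of each method's normalized metric scores."""
    return {method: mean(scores.values()) for method, scores in normalized.items()}
```

Normalizing against baselines puts metrics with different raw ranges on a common scale before averaging. Without it, a metric measured in large units would dominate the mean score.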
Note: This changelog was automatically generated from the git log.
- Added a new metric, AUPRC, for evaluating cell-cell communication predictions.
- Added support for aggregating method scores using "max" and "sum" operations.
- Implemented a new method, true events, which predicts all possible interactions.
- Added a new method, random events, which randomly predicts interactions.
- Implemented LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods with the option to aggregate scores using "max" or "sum."
- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication ligand-target task.
- Added LIANA, CellPhoneDB, Connectome, Log2FC, NATMI, and SingleCellSignalR methods to the cell-cell communication source-target task.
- Fixed a bug where the odds ratio metric was not handling cases where the numerator or denominator was zero.
- Updated the IRkernel package version in the R base docker image to 1.3.1.
- Updated the saezlab/liana package version in the R extras docker image to 0.1.7.
- Updated the boto3 package version in the main docker image to 1.26.*.
- Added a check to the cell-cell communication dataset validation to ensure that there are no duplicate entries in the target data.
- Updated the documentation for the cell-cell communication ligand-target task.
- Updated the documentation for the cell-cell communication source-target task.
Note: This changelog was automatically generated from the git log.
- Fixed an issue where a sparse matrix was not being converted to CSR format.
- Fixed a bug in `docker_run.sh` where pip check was not being executed.
- Updated `pkgload` to version 1.3.1.
Note: This changelog was automatically generated from the git log.
- Converted sparse matrix to csr format.
Note: This changelog was automatically generated from the git log.
- Converted sparse matrices to CSR format.
Note: This changelog was automatically generated from the git log.
Note: This changelog was automatically generated from the git log.
- Fixed a bug where the bioconductor version was incorrect.
- Fixed a bug where the matrix in obs was incorrect.
- Updated the scran package to version 1.24.1.
- Updated the batchelor and scuttle packages.
Note: This changelog was automatically generated from the git log.
Note: This changelog was automatically generated from the git log.
- Updated workflow to run tests against `prod` branch.
Note: This changelog was automatically generated from the git log.
- Skip benchmark if tester fails.
Note: This changelog was automatically generated from the git log.
- Explicitly push prod images on tag
- Added short metric descriptions to README
- Added labels tests
Note: This changelog was automatically generated from the git log.
- Reverted bump of louvain to 0.8, which caused issues.
- Updated torch requirement to 1.13 in the openproblems-r-pytorch docker.
Note: This changelog was automatically generated from the git log.
- Added support for SCALEX version 1.0.2.
- Updated RcppAnnoy to version 0.0.20.
- Updated SageMaker requirement to version 2.116.*.
- Fixed a bug in the `docker_hash` function, which now returns a string instead of an integer.
- Fixed a bug in the `scalex` method, which now correctly handles the `outdir` parameter.
Note: This changelog was automatically generated from the git log.
- Update rpy2 requirement from <3.5.5 to <3.5.6
- Update ragg to 1.2.4
- Don't fail job if hash fails
Note: This changelog was automatically generated from the git log.
- Updated scIB to 77ab015.
Note: This changelog was automatically generated from the git log.
- Added a new batch integration subtask for corrected feature matrices.
- Added a new sub-task for batch integration, "batch integration embed", which includes all methods that output a joint embedding of cells across batches.
- Added a new sub-task for batch integration, "batch integration graph", which includes all methods that output a cell-cell similarity graph (e.g., a kNN graph).
Note: This changelog was automatically generated from the git log.
- Fixed an issue where the `::` in branch names would cause problems.
- Fixed an issue where the `check_r_dependencies.yml` workflow was not properly handling branch names with `::`.
- Updated the `caret` package to version 6.0-93.
- Updated the README to include information about the Open Problems team and task leaders.
- Replaced the `NuSVR` method with a faster alternative, improving performance.
- Added a new method for running Seuratv3 from a fork, allowing for more efficient use of resources.
- Added a new requirement to the `r_requirements.txt` file for the `bslib` package.
- Added a new requirement to the `r_requirements.txt` file for the `caret` package.
- Added a new section to the README to document the process of running Seuratv3 from a fork.
- Updated the README to include a list of all contributors to the Open Problems project.
Note: This changelog was automatically generated from the git log.
- Fix sampling and reindexing
- Fix docker unavailable error to include image name
- Require minimum celltype count for `spatial_decomposition`
- Update Rcpp to 1.0.9
- Update to nf-openproblems v1.7
Note: This changelog was automatically generated from the git log.
- Fixed an issue where some cell types were missing from the output.
Note: This changelog was automatically generated from the git log.
- Fixed a bug in the rctd method where cell types with fewer than 25 cells were not being used.
Note: This changelog was automatically generated from the git log.
- Handle missing function error by catching FileNotFoundError and NoSuchFunctionError instead of just RuntimeError.
Note: This changelog was automatically generated from the git log.
- Updated `scipy` requirement from `==1.8.*` to `>=1.8,<1.10`.
- Updated `igraph` to version `1.3.4`.
- Changed the mnnpy dependency to use a patch version instead of a specific commit hash.
- Changed `docker_hash` to use the Docker API if `docker` is not available.
- Use `curl` to retrieve the Docker hash if `docker` fails.
- Fixed an issue with using `git+https` for `mnnpy`.
Note: This changelog was automatically generated from the git log.
- Updated several R package dependencies.
- Updated several Python package dependencies.
- Added several new methods for spatial decomposition: RCTD, DestVI, Stereoscope.
- Added a new dataset for dimensionality reduction: Mouse hematopoietic stem cell differentiation.
- Improved documentation for tasks and datasets.
- Fixed a bug where the `lintr` package was not being installed correctly.
- Fixed a bug where the `BRANCH_PARSED` variable was not being properly sanitized in the `run_tests.yml` workflow.
- Fixed a bug in the `_scanvi` and `_scvi` functions where the `max_epochs` parameter was not being passed to the `scanvi` and `scvi` functions.
- Fixed a bug in `install_renv.R` causing incorrect installation of packages from R repositories.
- Fixed an issue where the dependency upgrade script would fail to capture the output of the upgrade process.
- Fixed an issue where the dependency upgrade script would not correctly write updates to the requirements file.
- Fixed an issue where the `git_hash` function was not being called for external modules.
- Fixed a bug in `openproblems/tasks/denoising/methods/__init__.py` that prevented DCA from being used.
- Fixed a bug in `neuralee_default` where it could fail due to sparseness of data.
- Fixed a bug in `scanvi_all_genes` where the code version was not being set correctly.
- Fixed a bug in `scanvi_hvg` where the code version was not being set correctly.
- Fixed a bug in `scarches_scanvi_all_genes` where the code version was not being set correctly.
- Fixed a bug in `scarches_scanvi_hvg` where the code version was not being set correctly.
- Added a new denoising method called "DCA" based on a deep count autoencoder.
- Added `xgboost_log_cpm` and `xgboost_scran` methods to `openproblems.tasks.label_projection`.
- Added a new command-line interface for testing datasets, methods, and metrics.
- Added a new `install_renv.R` script to simplify the installation of the `renv` package.
- Added an automated CI check to find and suggest available updates to R packages in Docker images.
- Added a hash to the Docker image that is based on the age of the code.
- Added `data_reference` to dataset metadata.
- Added a `docker_hash` function to retrieve the Docker image hash associated with an image.
- Added support for retrieving the Docker image hash for R functions that have a defined `__r_file__`.
- Updated the contributing guide to reflect the `main` branch as the default branch.
- Updated issue templates to reflect the `main` branch as the default branch.
- Updated the pull request template to reflect the `main` branch as the default branch.
Note: This changelog was automatically generated from the git log.
- Added a new Docker image `openproblems-r-pytorch` for running Harmony in Python.
- Moved `harmony` to the Python-based `harmony-pytorch`.
- Fixed an issue where `adata.var` was not being correctly handled in `_utils.py`.
- Updated the documentation for the `openproblems-r-extras` Docker image.
Note: This changelog was automatically generated from the git log.
- Added PHATE with sqrt potential
- Fixed path to R_HOME
- Fixed Dockerfile to use R 4.2
- Minor CI fixes
Note: This changelog was automatically generated from the git log.
- Run scran pooling in series, not in parallel.
Note: This changelog was automatically generated from the git log.
- Added `FastMNN`, `Harmony`, and `Liger` methods for batch integration.
- Added `bbknn_full_unscaled` method.
- Added Dependabot configuration for pip and GitHub Actions dependencies.
- Updated dependencies: `scib`, `bbknn`, `scanorama`, `annoy`, and `mnnpy`.
- Improved the performance of several methods by pre-processing the data before running them.
- Fixed bugs in `fastMNN`, `harmony`, `liger`, `scanorama`, `scanvi`, `scvi`, `mnn`, and `combat` that caused incorrect embeddings.
Note: This changelog was automatically generated from the git log.
- Added a new file `workflow/generate_website_markdown.py` to generate website markdown files for all tasks and datasets.
- Updated Nextflow version to v1.5.
- Updated Nextflow version to v1.6.
- Added code version to the output of each method.
- Updated `nextflow` version to `v1.3`.
- Updated `nextflow` version to `v1.4`.
- Updated Docker version to 20.10.15.
- Removed Docker setup from CI workflow.
- Updated Python version to 3.8.13.
- Updated dependencies for the Docker images.
- Updated pre-commit hooks to include `requirements-txt-fixer`.
- Updated Nextflow workflow to version 1.4.
- Updated the location of method versions in the results directory.
- Updated the Tower action ID.
- Fixed a bug where Docker images were not properly pushed to Docker Hub.
- Updated `requirements.txt` files to fix dependency conflicts.
- Removed unnecessary dependencies from CI workflows to reduce disk space usage on GitHub runners.
Note: This changelog was automatically generated from the git log.
- Added new integration methods: BBKNN, Combat, FastMNN feature, FastMNN embed, Harmony, Liger, MNN, Scanorama feature, Scanorama embed, Scanvi, Scvi
- Added new metrics: graph_connectivity, iso_label_f1, nmi
- Added `_utils.py` with functions: `hvg_batch`, `scale_batch`
- Added `run_bbknn` function.
- Added a test for the trustworthiness metric, which now passes for sparse matrices.
- Added a test for the density preservation metric, which now passes against densmap for a reasonable degree of similarity.
- Added tests for all methods and metrics.
- Added a new workflow to automatically delete untagged images from the OpenProblems ECR repository.
- Added a new workflow to process results and create a PR to update the OpenProblems benchmark.
- Added support for running tests with the `process` extra in `setup.py`.
- Added `densmap` dimensionality reduction method.
- Added `neuralee` dimensionality reduction method.
- Added `alra` denoising method.
- Added `scarches_scanvi` label projection method.
- Added `bbknn` batch integration graph method.
- Added `beta` regulatory effect prediction method.
- Added a new `invite-contributors.yml` file to the repository.
- The `test_methods.py` file has been simplified by removing unused arguments.
- The `test_metrics.py` file has been simplified by removing unused arguments.
- The `test_utils/docker.py` file has been modified to allow specifying the Docker image as a decorator argument.
- Updated Nextflow version to 22.04.0.
- Modified the processing of Nextflow results to save them in a temporary directory.
- Modified `workflow/parse_nextflow.py` to parse results from Nextflow runs.
- Modified `.github/workflows/run_tests.yml` to cancel previous runs when a new commit is pushed.
- Removed `.nextflow`, `scratch/`, `openproblems/results/` and `openproblems/work/` from `.gitignore`.
- Updated `CONTRIBUTING.md`.
- Methods should not edit `adata.obsm["train"]` or `adata.obsm["test"]`.
- Redirect stdout to stderr when running subcommands to ensure that output is printed correctly.
- Updated the CI workflow to skip running tests on push if they failed on the `run_tester` job, unless the branch name starts with `test_benchmark`.
- Refactored the NeuralEE method to use a separate function for embedding.
- Improved performance by using a default value for `maxit` in `fine_tune_kwargs`.
- Removed unnecessary code for storing raw counts in the `neuralee_default` method.
Note: This changelog was automatically generated from the git log.
- Added CeNGEN, Tabula Muris Senis, and Pancreas datasets to the label_projection task.
- Added scANVI and scArches+scANVI methods to the label_projection task.
- Added majority_vote and random_labels baseline methods to the label_projection task.
- Added new methods: densMAP, NeuralEE, scvis
- Added new metrics: NN Ranking (continuity, co-KNN size, co-KNN AUC, Local continuity meta criterion, Local property metric, Global property metric)
- Added pre-processing function: log_cpm_hvg()
- Added support for custom pre-processing functions
- Added support for variants of methods
- Added a new batch integration task.
- Added a batch integration graph subtask.
- Added a batch integration embedding subtask.
- Added a batch integration corrected feature matrix subtask.
- Added ivis method for dimensionality reduction to openproblems.
- Added self-hosted runner support for the `run_benchmark` workflow using Cirun.io.
- Added a `--test` flag to the `run` subcommand, allowing for running a test version of a method.
- Added `test_load_dataset` to `test/test__load_data.py` to test loading and caching of datasets.
- Added `test_method` to `test/test_methods.py` to test application of methods.
- Added `test_trustworthiness_sparse` to `test/test_metrics.py` to test the trustworthiness metric on sparse data.
- Added `test_density_preservation_matches_densmap` to `test/test_metrics.py` to test the density preservation metric against densMAP.
- Updated `test/utils/docker.py` to allow specifying the Docker image as the last argument.
- Added `--test` flag to the `run` subcommand to run the test version of a method.
- Added Docker image building to `run_tests.yml`.
- Added a new workflow to process Nextflow results
- Added a new workflow to run tests and benchmarks
- Added support for running benchmarks from tags
- Added support for running benchmarks from forks
- Added `openproblems-cli` command to run test-hash
Note: This changelog was automatically generated from the git log.
- Added support for balanced SCOT alignment.
- Updated the workflow to store benchmark results in `/tmp`.
- Fixed the parsing and committing of benchmark results on tag.
- Fixed the GitHub Actions badge link.
- Fixed the coverage badge.
- Fixed the benchmark commit.
- Ignored AWS warning and cleaned up S3 properly.
- Updated the workflow to continue on error for forks.
Note: This changelog was automatically generated from the git log.
- Added trustworthiness metric to the dimensionality reduction task.
- Added density preservation metric.
- Added several metrics based on nearest neighbor ranking: continuity, co-KNN size, co-KNN AUC, local continuity meta criterion, local property, global property.
- Added mouse blood data from Olsson et al. (2016) Nature to the `openproblems` dataset collection.
- Added a test mode to the `load_olsson_2016_mouse_blood` function.
- Added a dataset function for the `mouse_blood_olssen_labelled` dataset in the `openproblems.tasks.dimensionality_reduction.datasets` module.
- Added ALRA denoising method.
- Added support for the Single Cell Optimal Transport (SCOT) method for multimodal data integration.
- SCOT implements Gromov-Wasserstein optimal transport to align single-cell multi-omics data.
- Added four variations of SCOT:
  - sqrt CPM unbalanced
  - sqrt CPM balanced
  - log scran unbalanced
  - log scran balanced
- Each variation implements different normalization strategies for the input data.
- Added `scot` method to `openproblems.tasks.multimodal_data_integration.methods`.
- Added pre-processing to the `dimensionality_reduction` task.
- Added pre-processing to all `dimensionality_reduction` methods.
- Added `Wagner_2018_zebrafish_embryo_CRISPR` dataset loader
- Added PR review checklist to the pull request template.
- Added `cmake==3.18.4` to the `docker/openproblems-python-extras/requirements.txt` file.
- Added `--version` flag to print the version.
- Added `--test-hash` flag to print the current hash.
- Added basic help message.
- Added `install_renv.R` script for installing R packages in Docker images.
- Added `docker/.version` file to track Docker image version.
- Added a new Docker image for running GitHub Actions.
- Added a new `utils.git` module to determine which tasks have changed relative to base/main.
- Added support for running benchmark tests on tags.
- Added a test directory for use in the workflow.
Note: This changelog was automatically generated from the git log.
- Added chromatin potential task
- Added PHATE to the dimensional_reduction task.
- Added support for testing docker builds on a separate branch.
- Added support for building images and pushing them to docker hub.
- Added support for writing methods in R using `scprep`'s `RFunction` class.
- Added a CLI to `openproblems`.
- Added `f1_micro` metric.
- Added `mlp_log_cpm` and `mlp_scran` methods for label projection.
- Added `pancreas_batch` and `pancreas_random` datasets for label projection.
- Added `f1` metric for label projection.
- Added metadata to methods and metrics.
- Added `openproblems.tools.decorators` for decorating methods and metrics.
- Added `openproblems.tools.normalize` for common normalization functions.
- Added methods for `logistic_regression`, `mlp`, `harmonic_alignment`, `mnn`, and `procrustes`.
- Added metrics for `accuracy`, `f1`, `knn_auc`, and `mse`.
- Added `openproblems.version` to provide package version.
- Added `dataset` decorator for registering datasets.
- Added `tools.decorators.profile` decorator to measure memory usage and runtime of methods.
- Added `tools.normalize` module to provide normalization functions.
- Added `tools.decorators.normalizer` decorator to normalize data prior to applying methods.
- Added a new "data loader" component that loads data in a way that's formatted correctly for a given task.
- Added CITE-seq Cord Blood Mononuclear Cells dataset.
- Added snakemake support for automatic evaluation.
- Added zebrafish data to label projection task.
- Added a new task, "Link gene expression with chromatin accessibility"
- Added a new dataset, "sciCAR Mouse Kidney with cell clusters"
- Added a new method, "BETA"
- Added a new metric, "Correlation between RNA and ATAC"
- Added a new task, "Dimensional reduction"
- Added human blood dataset from Nestorowa et al. Blood. 2016
- Added 10x PBMC dataset
- Added `load_10x_5k_pbmc` function to load the 10x 5k PBMC dataset.
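The `tools.decorators.profile` entry describes a decorator that measures the memory usage and runtime of methods. As an illustration only, here is a minimal sketch of that idea, assuming a plain Python callable, `time.perf_counter` for runtime and `tracemalloc` for peak memory. It is not the OpenProblems implementation, and `toy_method` is a hypothetical example method. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used by native extensions may be under-reported.

```python
import functools
import time
import tracemalloc


def profile(func):
    """Hypothetical sketch: report wall-clock runtime and peak traced memory of a call."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        finally:
            # Always stop tracing, even if the method raises.
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        # Return the method's output together with the measurements.
        return result, {"runtime_s": elapsed, "peak_memory_bytes": peak}

    return wrapper


@profile
def toy_method(n):
    # Materialize a list so there is measurable memory to trace.
    return sum(list(range(n)))


result, stats = toy_method(100_000)
print(result, stats)
```

Returning the measurements alongside the result keeps the wrapped method's signature unchanged for callers that only need to pass arguments through.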
Note: This changelog was automatically generated from the git log.
- Added MLP method for label projection task.
- Added pancreas data loading to label projection task.
- Updated black.
- Updated test version of pancreas_batch to have test data.
- Added random pancreas train data.
- Fixed zebrafish code duplication.
- Fixed pancreas import location.
- Fixed bug in zebrafish data.
- Fixed bug in pancreas import.
- Removed normalization from loader.
- Removed dummy and cheat metrics/datasets.
- Removed excess covariates from pancreas dataset.
Note: This changelog was automatically generated from the git log.
- Added zebrafish label projection task
- Moved scIB, rpy2, harmonicalignment, and mnnpy to optional dependencies
- Improved n_components fix
- Moved URL into function for neater namespace
- Fixed n_svd for truncatedSVD
- Fixed data loader
- Fixed n_pca problem
- Scaled without mean if sparse
- Scaled data for regression
- Added check to ensure that data has nonzero size
Note: This changelog was automatically generated from the git log.
- Added a results page to the website.
- Added a new zebrafish dataset to the openproblems library.
- Added netlify.toml to deploy website.
- Updated documentation to reflect new features and datasets.
- Bumped version to 0.1.
- Improved the website's home menu link.
- Improved website links.
- Updated website's hero and social links.
- Updated website's task cards.
- Updated the website's demo.
- Improved website's frontmatter.
- Separated frontmatter from content in website's Markdown files.
- Fixed black syntax.
- Excluded website from black.
- Updated website content to display results.
- Updated the Travis CI configuration to exclude website from black.
- Fixed zebrafish data loader.
Note: This changelog was automatically generated from the git log.
- Added harmonic alignment method.
- Added scicar datasets.
- Added logistic regression methods.
- Added ability to normalize obsm.
- Added test suite.
- Added normalization tools.
- Updated documentation to reflect normalization changes.
- Migrated normalizations to openproblems.tools.normalize.
- Updated dataset specification to require normalization in methods.
- Removed zebrafish dataset.
- Moved dataset test spec.
- Removed "mode2_raw" and "raw" from datasets.
- Added test dataset spec.
- Consolidated scicar datasets.
- Migrated references to github repo.
- Improved sparse array equality test.
- Improved sparse inequality check.
- Increased test data size.
- Normalized mode2.
- Fixed decorator.
- Used uns.
- Used functools.wraps.
- Updated name of log_scran_pooling function.
- Fixed storing normalization results.
- Fixed zebrafish load caching.
- Fixed zebrafish test.
- Added normalization functions.
- Updated logistic regression function to work with anndata properly.
- Fixed cheat method.
- Fixed git upload.
- Fixed Travis CI.
- Fixed harmonic alignment import.
- Increased test coverage.
- Bugfix harmonic_alignment, closes #4.
- Bugfix harmonic alignment import.
- Normalized data inside methods, closes #19.
- Fix storing normalization results.
- Fix zebrafish load caching.
- Fix decorator.
- Fix cheat method.
- Don't check for raw data -- we are no longer normalizing.
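Several entries above concern moving normalizations into `openproblems.tools.normalize` and normalizing data inside methods. As an illustration only, here is a minimal pure-Python sketch of a log-CPM normalization. It assumes the common definition log(1 + CPM), where CPM = count / library size × 10^6. The function name `log_cpm` and the list-of-rows input (one row per cell) are assumptions for this example, and the library's own normalizers, such as the scran-pooling variant, work differently.

```python
import math


def log_cpm(counts):
    """Hypothetical sketch: log(1 + counts per million) for a cells x genes matrix.

    `counts` is a list of rows, one per cell, holding raw non-negative counts.
    """
    normalized = []
    for row in counts:
        total = sum(row)  # library size of this cell
        if total == 0:
            # An empty cell stays all zeros instead of dividing by zero.
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(value / total * 1e6) for value in row])
    return normalized


counts = [[1, 3, 0], [5, 5, 0]]
print(log_cpm(counts))
```

Scaling each cell by its own library size before the log transform makes cells with different sequencing depths comparable. Using `log1p` keeps zero counts at exactly zero.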
Note: This changelog was automatically generated from the git log.
- Added dummy dataset to `openproblems/data`.
- Added `load_dummy` function to `openproblems/data`.
- Added `loader` decorator to `openproblems/data`.
- Added loading functions for sciCAR datasets to `openproblems/data/scicar`.
- Added `scicar_cell_lines` dataset to `openproblems/tasks/multimodal_data_integration/datasets`.
- Added `scicar_mouse_kidney` dataset to `openproblems/tasks/multimodal_data_integration/datasets`.
- Added `dummy` dataset to `openproblems/tasks/label_projection/datasets`.
- Changed data structure for multimodal data integration tasks in `openproblems/tasks/multimodal_data_integration`.
- Bumped version to 0.0.2 in `openproblems/version.py`.
- Modified the way to run `evaluate.sh` in `.travis.yml`.
- Added `chmod +x evaluate.sh` to `.travis.yml`.
- Added documentation for adding a dataset to a task in `README.md`.
- Added documentation for dataset loading in `README.md`.
- Added documentation for adding a new dataset in `README.md`.
- Updated documentation in `openproblems/tasks/multimodal_data_integration/README.md`.
- Updated documentation in `openproblems/version.py`.
First release of OpenProblems.
methods, 1 metric)
- Multimodal data integration (2 datasets, 2 methods, 2 metrics)