@@ -442,9 +442,9 @@ def get_adaJSD_matrix(
        Two Ensemble objects storing the ensemble data to compare.
    return_bins : bool, optional
        If True, also return the histogram bin edges used in the comparison.
    **remaining**
        Additional arguments passed to `dpet.comparison.score_adaJSD`.

    Output
    ------
    score : float
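The adaJSD score above is an average of per-feature Jensen-Shannon divergences, each bounded between 0 and log(2). A minimal pure-Python sketch of the JSD and its bounds, for illustration only (the `jsd` helper here is hypothetical and not part of dpet, which works on binned histograms of molecular features):

```python
import math

def jsd(p, q):
    # Jensen-Shannon divergence (natural log base) between two discrete
    # probability distributions given as equal-length sequences.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability bins.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(jsd([0.5, 0.5], [0.5, 0.5]))  # identical distributions -> 0.0
print(jsd([1.0, 0.0], [0.0, 1.0]))  # disjoint distributions -> log(2) ~= 0.6931
```

Identical distributions give 0, fully disjoint ones give log(2), which is why the scores reported by these functions fall in the [0, 0.6931] range.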
@@ -824,95 +824,88 @@ def all_vs_all_comparison(
    verbose: bool = False
) -> dict:
    """
    Compare all pairs of ensembles using divergence scores.

    Implemented scores are approximate average Jensen-Shannon divergences
    (JSD) over several kinds of molecular features. The lower these scores
    are, the higher the similarity between the probability distributions of
    the features of the ensembles. JSD scores here range from a minimum of 0
    to a maximum of log(2) ≈ 0.6931.

    Parameters
    ----------
    ensembles : List[Ensemble]
        Ensemble objects to analyze.
    score : str
        Type of score used to compare ensembles. Choices: `adaJSD` (carbon
        Alpha Distance Average JSD), `ramaJSD` (RAMAchandran Average JSD)
        and `ataJSD` (Alpha Torsion Average JSD). `adaJSD` scores the
        average JSD over all Cα-Cα distance distributions of residue pairs
        with sequence separation > 1. `ramaJSD` scores the average JSD over
        the φ-ψ angle distributions of all residues. `ataJSD` scores the
        average JSD over all alpha torsion angles, which are the angles
        formed by four consecutive Cα atoms in a protein.
    featurization_params : dict, optional
        Optional dictionary to customize the featurization process for the
        above features.
    bootstrap_iters : int, optional
        Number of bootstrap iterations. By default its value is `None`. In
        this case, IDPET will directly compare each pair of ensembles i and
        j by using all of their conformers, and perform the comparison only
        once. On the other hand, if an integer value is provided for this
        argument, each pair of ensembles i and j will be compared
        `bootstrap_iters` times by randomly selecting (bootstrapping)
        conformations from them. Additionally, each ensemble will be
        auto-compared with itself by subsampling conformers via
        bootstrapping. IDPET will then perform a statistical test to
        determine whether the inter-ensemble (i != j) scores are
        significantly different from the intra-ensemble (i == j) scores.

        The tests work as follows: for each ensemble pair i != j, IDPET
        obtains their inter-ensemble comparison scores from bootstrapping.
        Then, it retrieves the bootstrapping scores from the
        auto-comparisons of ensembles i and j, and the scores with the
        higher mean are selected as reference intra-ensemble scores.
        Finally, the inter-ensemble and intra-ensemble scores are compared
        via a one-sided Mann-Whitney U test with the alternative hypothesis
        that inter-ensemble scores are stochastically greater than
        intra-ensemble scores. The p-values obtained from these tests will
        additionally be returned.

        For small protein structural ensembles (fewer than 500
        conformations), most comparison scores in IDPET are not robust
        estimators of divergence or distance. Performing bootstrapping
        provides an estimate of how ensemble size affects the comparison.
        Use values >= 50 when comparing ensembles with very few
        conformations (fewer than 100). When comparing large ensembles
        (more than 1,000-5,000 conformations), you can safely avoid
        bootstrapping.
    bootstrap_frac : float, optional
        Fraction of the total conformations to sample when bootstrapping.
        Default value is 1.0, which results in bootstrap samples with the
        same number of conformations as the original ensemble.
    bootstrap_replace : bool, optional
        If `True`, bootstrap will sample with replacement. Default is
        `True`.
    bins : Union[int, str], optional
        Number of bins or bin assignment rule for JSD comparisons. See the
        documentation of `dpet.comparison.get_num_comparison_bins` for
        more information.
    random_seed : int, optional
        Random seed used when performing bootstrapping.
    verbose : bool, optional
        If `True`, prints additional information about the comparisons to
        stdout.

    Returns
    -------
    results : dict
        A dictionary containing the following key-value pairs:

        - `scores`: a (M, M, B) NumPy array storing the comparison
          scores, where M is the number of ensembles being compared and
          B is the number of bootstrap iterations (B = 1 if bootstrapping
          was not performed).
        - `p_values`: a (M, M) NumPy array storing the p-values obtained
          from the statistical tests performed when using a bootstrapping
          strategy (see the `bootstrap_iters` parameter). Returned only
          when performing a bootstrapping strategy.
    """

    score_type, feature = scores_data[score]

    ### Check arguments.
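The significance test described under `bootstrap_iters` can be sketched as follows. This is an illustrative, stdlib-only normal approximation of the one-sided Mann-Whitney U test (a real implementation would typically call `scipy.stats.mannwhitneyu` with `alternative='greater'`), and the inter-/intra-ensemble score samples below are made up:

```python
import math
import random

def mannwhitney_u_greater(x, y):
    # One-sided Mann-Whitney U test (normal approximation, no tie
    # correction): alternative hypothesis is that x is stochastically
    # greater than y. Returns (U statistic, p-value).
    n1, n2 = len(x), len(y)
    # U = number of (xi, yj) pairs with xi > yj (+0.5 per tie).
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return u, p

random.seed(0)
# Hypothetical bootstrap JSD scores: inter-ensemble scores are clearly
# higher than the reference intra-ensemble scores.
inter = [0.30 + random.gauss(0, 0.02) for _ in range(50)]
intra = [0.10 + random.gauss(0, 0.02) for _ in range(50)]
u, p = mannwhitney_u_greater(inter, intra)
print(p)  # small p-value: inter scores are stochastically greater
```

A small p-value here supports the conclusion that the two ensembles differ more than sampling noise alone would explain, which is exactly how the returned `p_values` array is meant to be read.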