diff --git a/docs/_includes/ADR0014.md b/docs/_includes/ADR0014.md
new file mode 100644
index 00000000..787b159c
--- /dev/null
+++ b/docs/_includes/ADR0014.md
@@ -0,0 +1,11 @@
+!!! note inline end "About the term "decision tree""
+
+    In machine learning, a decision tree usually refers to a model learned from
+    data through statistical analysis. That’s not what SSVC uses the term for. In
+    the original SSVC documentation, decision tree referred to the
+    operations-research concept: a hand-crafted structure that encodes deliberate
+    choices, not a model inferred from datasets.
+
+    Starting in SSVC v2025.9, we shifted to the term decision table to avoid
+    confusion with the ML meaning. For more information, see
+    [ADR-0014](../adr/0014-use-decision-table-terminology.md).
diff --git a/docs/topics/decision_trees.md b/docs/topics/decision_trees.md
index 8e03ce6c..004b38fa 100644
--- a/docs/topics/decision_trees.md
+++ b/docs/topics/decision_trees.md
@@ -1,43 +1,68 @@
 # Decision Trees
 
-A decision tree is an acyclic structure where nodes represent aspects of the decision or relevant properties and branches represent possible options for each aspect or property.
-Each decision point can have two or more options.
+{% include-markdown "../_includes/ADR0014.md" %}
 
-Decision trees can be used to meet all of the design goals, even plural recommendations and transparent tree-construction processes.
-Decision trees support plural recommendations because a separate tree can represent each stakeholder group.
-The opportunity for transparency surfaces immediately: any deviation among the decision trees for different stakeholder groups should have a documented reason—supported by public evidence when possible—for the deviation.
-Transparency may be difficult to achieve, since each node in the tree and each of the values need to be explained and justified, but this cost is paid infrequently.
+A decision tree is an acyclic structure where nodes represent aspects of the
+decision or relevant properties and branches represent possible options for each
+aspect or property. Each decision point can have two or more options.
 
-There has been limited but positive use of decision trees in vulnerability management.
-For example, Vulnerability Response Decision Assistance (VRDA) studies how to make decisions about how to respond to vulnerability reports [@manion2009vrda].
-This paper continues roughly in the vein of such work to construct multiple decision trees for prioritization within the vulnerability management process.
+Decision trees can be used to meet all of the design goals, even plural
+recommendations and transparent tree-construction processes. Decision trees
+support plural recommendations because a separate tree can represent each
+stakeholder group. The opportunity for transparency surfaces immediately: any
+deviation among the decision trees for different stakeholder groups should have
+a documented reason—supported by public evidence when possible—for the
+deviation. Transparency may be difficult to achieve, since each node in the
+tree and each of the values need to be explained and justified, but this cost is
+paid infrequently.
+
+There has been limited but positive use of decision trees in vulnerability
+management. For example, [Vulnerability Response Decision Assistance
+(VRDA)](https://www.sei.cmu.edu/library/effectiveness-of-the-vulnerability-response-decision-assistance-vrda-framework)
+studies how to make decisions about how to respond to vulnerability reports.
+This paper continues roughly in the vein of such work to construct multiple
+decision trees for prioritization within the vulnerability management process.
 
 ## Representation choices
 
-A decision tree can represent the same content in different ways.
-Since a decision tree is a representation of logical relationships between qualitative variables, the equivalent content can be represented in other formats as well.
-The R package [data.tree](https://cran.r-project.org/web/packages/data.tree/data.tree.pdf) has a variety of both internal representations and visualizations.
+A decision tree can represent the same content in different ways. Since a
+decision tree is a representation of logical relationships between qualitative
+variables, the equivalent content can be represented in other formats as well.
+The R package
+[data.tree](https://cran.r-project.org/web/packages/data.tree/data.tree.pdf) has
+a variety of both internal representations and visualizations.
 
-For data input, we elected to keep SSVC simpler than R, and just use a CSV (or other fixed-delimiter separated file) as canonical data input.
-All visualizations of a tree should be built from a canonical CSV that defines the decisions for that stakeholder.
-Examples are located in [SSVC/data](https://github.com/CERTCC/SSVC/tree/main/data).
-An interoperable CSV format is also flexible enough to support a variety of uses.
-Every situation in SSVC is defined by the values for each decision point and the priority label (outcome) for that situation (as defined in [Likely Decision Points and Relevant Data](../reference/decision_points/index.md)).
-A CSV will typically be 30-100 rows that each look something like:
+For data input, we elected to keep SSVC simpler than R, and just use a CSV (or
+other fixed-delimiter separated file) as canonical data input. All
+visualizations of a tree should be built from a canonical CSV that defines the
+decisions for that stakeholder. Examples are located in
+[SSVC/data](https://github.com/CERTCC/SSVC/tree/main/data). An interoperable
+CSV format is also flexible enough to support a variety of uses. Every
+situation in SSVC is defined by the values for each decision point and the
+priority label (outcome) for that situation (as defined in [Likely Decision
+Points and Relevant Data](../reference/decision_points/index.md)). A CSV will
+typically be 30-100 rows that each look something like:
 
 ```
 2,none,laborious,partial,significant,scheduled
 ```
 
-Where “2” is the row number, [*none*](../reference/decision_points/exploitation.md) through [*significant*](../reference/decision_points/public_safety_impact.md) are values for decision points, and *scheduled* is a priority label or outcome.
-Different stakeholders will have different decision points (and so different options for values) and different outcomes, but this is the basic shape of a CSV file to define SSVC stakeholder decisions.
+Where “2” is the row number,
+[*none*](../reference/decision_points/exploitation.md) through
+[*significant*](../reference/decision_points/public_safety_impact.md) are values
+for decision points, and *scheduled* is a priority label or outcome. Different
+stakeholders will have different decision points (and so different options for
+values) and different outcomes, but this is the basic shape of a CSV file to
+define SSVC stakeholder decisions.
 
 ### Visualizing Decision Trees
 
-The tree visualization options are more diverse.
-We provide an example format, and codified it in [src/SSVC_csv-to-latex.py](https://github.com/CERTCC/SSVC/tree/main/src).
-Why have we gone to this trouble when (for example) the R data.tree package has a handy print-to-ASCII function?
-Because this function produces output like the following:
+The tree visualization options are more diverse. We provide an example format,
+and codified it in
+[src/SSVC_csv-to-latex.py](https://github.com/CERTCC/SSVC/tree/main/src). Why
+have we gone to this trouble when (for example) the R data.tree package has a
+handy print-to-ASCII function? Because this function produces output like the
+following:
 
 ```
 1 start
@@ -52,7 +77,9 @@ Because this function produces output like the following:
 35 ¦ ¦ ¦ ¦ ¦ ¦ ¦--A:H Critical
 ```
 
-This sample is a snippet of the CVSS version 3.0 base scoring algorithm represented as a decision tree.
-The full tree can be found [here](cvss_full_tree.md).
-This tree representation is functional, but not as flexible or aesthetic as might be hoped.
-The visualizations provided by R are geared towards analysis of decision trees in a random forest ML model, rather than operations-research type trees.
+This sample is a snippet of the CVSS version 3.0 base scoring algorithm
+represented as a decision tree. The full tree can be found
+[here](cvss_full_tree.md). This tree representation is functional, but not as
+flexible or aesthetic as might be hoped. The visualizations provided by R are
+geared towards analysis of decision trees in a random forest ML model, rather
+than operations-research type trees.
diff --git a/docs/topics/formalization_options.md b/docs/topics/formalization_options.md
index 75d8d14e..8f911feb 100644
--- a/docs/topics/formalization_options.md
+++ b/docs/topics/formalization_options.md
@@ -1,5 +1,7 @@
 # Formalization Options
 
+{% include-markdown "../_includes/ADR0014.md" %}
+
 This section briefly surveys the available formalization options against the six design goals described above.
 The table below summarizes the results.
 This survey is opportunistic; it is based on conversations with several experts and our professional experience.