13 + CHG
15 - CHG
```

The following table shows which trinucleotides are counted towards
each context. Every trinucleotide begins with a C, and in our formats
this is the cytosine whose methylation level or state is being
reported.

| Triplet | CpG | CHH | CHG | CWG | CCG |     |
| ------- | --- | --- | --- | --- | --- | --- |
| CAA     |     | 1   |     |     |     |     |
| CAC     |     | 1   |     |     |     |     |
| CAG     |     |     | 1   | 1   |     | *   |
| CAT     |     | 1   |     |     |     |     |
| CCA     |     | 1   |     |     |     |     |
| CCC     |     | 1   |     |     |     |     |
| CCG     |     |     | 1   |     | 1   | *   |
| CCT     |     | 1   |     |     |     |     |
| CGA     | 1   |     |     |     |     |     |
| CGC     | 1   |     |     |     |     |     |
| CGG     | 1   |     |     |     |     |     |
| CGT     | 1   |     |     |     |     |     |
| CTA     |     | 1   |     |     |     |     |
| CTC     |     | 1   |     |     |     |     |
| CTG     |     |     | 1   | 1   |     | *   |
| CTT     |     | 1   |     |     |     |     |
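
The table amounts to a simple rule on the second and third bases. The Python sketch below assumes a hypothetical function `contexts` (not part of any existing tool) that returns every context label from the table for a C-initial trinucleotide over ACGT, using the usual IUPAC codes H = A, C or T and W = A or T.

```python
from itertools import product


def contexts(trinuc: str) -> set:
    """Hypothetical helper: context labels (CpG, CHH, CHG, CWG, CCG) for a
    trinucleotide whose first base is the cytosine of interest."""
    t = trinuc.upper()
    if len(t) != 3 or t[0] != "C" or any(b not in "ACGT" for b in t):
        raise ValueError(f"expected a C-initial trinucleotide over ACGT: {trinuc!r}")
    second, third = t[1], t[2]
    if second == "G":
        return {"CpG"}
    # From here on the second base is H (A, C or T).
    if third != "G":
        return {"CHH"}
    labels = {"CHG"}
    if second in "AT":  # W = A or T
        labels.add("CWG")
    if second == "C":
        labels.add("CCG")
    return labels


if __name__ == "__main__":
    # One line per row of the table, in the same order.
    for rest in product("ACGT", repeat=2):
        trinuc = "C" + "".join(rest)
        print(trinuc, sorted(contexts(trinuc)))
```

Because `contexts` returns exactly one of CpG, CHH and CHG for each of the 16 inputs, it also reflects the partition formed by the three traditional contexts.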

The traditional contexts are CpG, CHH and CHG; CHH and CHG have
received more attention in plants, especially *Arabidopsis*. Together
these three contexts cover all 16 trinucleotides that start with C and
partition them unambiguously: each trinucleotide belongs to exactly one
of them. They are the first three columns of the table.

The CWG and CCG contexts are of interest mostly because CWG is so
important in vertebrates, and even more so in mammals. There, a
combination of deamination-induced loss of the middle cytosine of a
CCG and tandem expansion of CAG/CTG repeats (including within human
populations) has led to a relative abundance of CWG in important places
in the genome. CWG can be called "symmetric" in the same way as CHG:
the reverse complement of CAG is CTG, so a CWG on one strand always
pairs with a CWG on the other. Strictly speaking, however, not every
CHG is symmetric: a CCG on one strand pairs with a CGG on the other,
and CGG is a CpG, not a CHG.
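
The strand relationship can be made concrete with a reverse complement. This Python sketch assumes hypothetical helpers `revcomp` and `c_triplet`; `c_triplet` reads the C-initial triplet for the cytosine at a 0-based position of a forward-strand sequence, treating a G at that position as a cytosine on the reverse strand.

```python
from typing import Optional

# Complement map for the four unambiguous bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over ACGT."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_triplet(genome: str, pos: int) -> Optional[str]:
    """Hypothetical helper: C-initial triplet for the cytosine at `pos`.

    A 'C' is a cytosine on the forward strand, read rightwards; a 'G' is a
    cytosine on the reverse strand, read 5'->3' on that strand. Returns None
    for other bases or when the triplet would run off either end.
    """
    base = genome[pos].upper()
    if base == "C" and pos + 3 <= len(genome):
        return genome[pos:pos + 3].upper()
    if base == "G" and pos >= 2:
        return revcomp(genome[pos - 2:pos + 1])
    return None


if __name__ == "__main__":
    # The C at position 1 and the G at position 3 form a symmetric pair.
    for genome in ["ACAGT", "ACTGT", "ACCGT"]:
        plus, minus = c_triplet(genome, 1), c_triplet(genome, 3)
        print(f"{genome}: forward {plus}, reverse-strand partner {minus}")
```

For these three sequences the reverse-strand partners are CTG, CAG and CGG, so only the CCG case leaves the CHG context.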

The table does not include CXG; we plan to remove the CXG context and
support only the contexts listed in the table.