
Commit 9c40aba

ldgauthier, sooheelee
authored and committed
Doc fix for SOR (#5703)
Update StrandOddsRatio (SOR) and AS_StrandOddsRatio (AS_SOR) tool documents
- Render LaTeX equations with plugin (pre-tested)
- Render Markdown table correctly and add spacing between columns
- In 'Statistical notes' section, introduce the contingency table and place appropriately with text
- Provide both R and 1/R equations instead of just one (both are mentioned in the text)
- Close out the equation elements appropriately with `]` and `)`s
- Mention the final annotation is given in log space and provide a link to an example calculation
- Update links, e.g. to related annotations, to point to GATK4 current docs instead of v3.8
- Add example step-by-step SOR calculation using variant record provided by Laura
- Clarify SOR is calculated with `ln(ratio) + ln(refRatio) - ln(altRatio)`
- Mention one is added to each count
1 parent 811e72a commit 9c40aba


2 files changed

+112
-33
lines changed

src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/StrandOddsRatio.java

+78-15
@@ -20,37 +20,100 @@
2020
/**
2121
* Strand bias estimated by the Symmetric Odds Ratio test
2222
*
23-
* <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other. The StrandOddsRatio annotation is one of several methods that aims to evaluate whether there is strand bias in the data. It is an updated form of the Fisher Strand Test that is better at taking into account large amounts of data in high coverage situations. It is used to determine if there is strand bias between forward and reverse strands for the reference or alternate allele.</p>
23+
* <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in
24+
* incorrect evaluation of the amount of evidence observed for one allele vs. the other. The StrandOddsRatio annotation
25+
* is one of several methods that aims to evaluate whether there is strand bias in the data. It is an updated form of
26+
* the <a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_FisherStrand.php">Fisher Strand Test</a>
27+
* that is better at taking into account large amounts of data in high coverage situations. It is used to determine if
28+
* there is strand bias between forward and reverse strands for the reference or alternate allele(s).</p>
2429
*
2530
* <h3>Statistical notes</h3>
26-
* <p> Odds Ratios in the 2x2 contingency table below are</p>
2731
*
28-
* $$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$
29-
*
30-
* <p>and its inverse:</p>
32+
* <p>The following 2x2 contingency table gives the notation for allele support and strand orientation.</p>
3133
*
3234
* <table>
33-
* <tr><td>&nbsp;</td><td>+ strand </td><td>- strand</td></tr>
34-
* <tr><td>REF;</td><td>X[0][0]</td><td>X[0][1]</td></tr>
35-
* <tr><td>ALT;</td><td>X[1][0]</td><td>X[1][1]</td></tr>
35+
* <tr><th>&nbsp;</th><th>+ strand&nbsp;&nbsp;&nbsp;</th><th>- strand&nbsp;&nbsp;&nbsp;</th></tr>
36+
* <tr><th>REF&nbsp;&nbsp;&nbsp;</th><td>X[0][0]</td><td>X[0][1]</td></tr>
37+
* <tr><th>ALT&nbsp;&nbsp;&nbsp;</th><td>X[1][0]</td><td>X[1][1]</td></tr>
3638
* </table>
3739
*
38-
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT (the sum makes it symmetric). A high value is indicative of large difference where one entry is very small compared to the others. A scale factor of refRatio/altRatio where</p>
40+
* <p>We can then represent the Odds Ratios with the equation:</p>
41+
*
42+
* <img src="http://latex.codecogs.com/svg.latex?$$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$" border="0"/>
43+
*
44+
* <p>and its inverse:</p>
45+
*
46+
* <img src="http://latex.codecogs.com/svg.latex?$$ \frac{1}{R} = \frac{X[0][1] * X[1][0]}{X[0][0] * X[1][1]} $$" border="0"/>
3947
*
40-
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
48+
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT. The sum makes it symmetric.
49+
* A high value is indicative of large difference where one entry is very small compared to the others. A scale factor
50+
* of refRatio/altRatio where</p>
51+
*
52+
* <img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1])} $$" border="0"/>
4153
*
4254
* <p>and </p>
4355
*
44-
* $$ altRatio = \frac{max(X[1][0], X[1][1])}{min(X[1][0], X[1][1]} $$
56+
* <img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{min(X[1][0], X[1][1])}{max(X[1][0], X[1][1])} $$" border="0"/>
57+
*
58+
* <p>ensures that the annotation value is large only. The final SOR annotation is given in natural log space.</p>
59+
*
60+
* <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a>
61+
* for a more detailed explanation of this statistical test.</p>
62+
*
63+
* <h3>Example calculation</h3>
64+
*
65+
* <p>Here is a variant record where SOR is 0.592.</p>
66+
*
67+
* <pre>
68+
* AC=78;AF=2.92135e-02;AN=2670;DP=31492;FS=48.628;MQ=58.02;MQRankSum=-2.02400e+00;MQ_DP=3209;QD=3.03; \
69+
* ReadPosRankSum=-1.66500e-01;SB_TABLE=1450,345,160,212;SOR=0.592;VarDP=2167
70+
* </pre>
71+
*
72+
* <p>Read support shows some strand bias for the reference allele but not
73+
* the alternate allele. The SB_TABLE annotation (a non-GATK annotation) indicates 1450 reference alleles on the forward strand, 345
74+
* reference alleles on the reverse strand, 160 alternate alleles on the forward strand and 212 alternate alleles on
75+
* the reverse strand. The tool uses these counts towards calculating SOR. To avoid multiplying or dividing by zero
76+
* values, the tool adds one to each count.</p>
77+
*
78+
* <pre>
79+
* refFw = 1450 + 1 = 1451
80+
* refRv = 345 + 1 = 346
81+
* altFw = 160 + 1 = 161
82+
* altRv = 212 + 1 = 213
83+
* </pre>
84+
*
85+
* <p>Calculate SOR with the following.</p>
86+
*
87+
* <p><img src="http://latex.codecogs.com/svg.latex?$$ SOR = ln(symmetricalRatio) + ln(refRatio) - ln(altRatio) $$" border="0"/></p>
88+
*
89+
* <p>where</p>
90+
*
91+
* <p><img src="http://latex.codecogs.com/svg.latex?$$ symmetricalRatio = R + \frac{1}{R} $$" border="0"/></p>
92+
* <p><img src="http://latex.codecogs.com/svg.latex?$$ R = \frac{(\frac{refFw}{refRv})}{(\frac{altFw}{altRv})} = \frac{(refFw*altRv)}{(altFw*refRv)} $$" border="0"/></p>
93+
*
94+
* <p><img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{(smaller\;of\;refFw\;and\;refRv)}{(larger\;of\;refFw\;and\;refRv)} $$" border="0"/></p>
95+
*
96+
* <p>and</p>
97+
*
98+
* <p><img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{(smaller\;of\;altFw\;and\;altRv)}{(larger\;of\;altFw\;and\;altRv)} $$" border="0"/></p>
4599
*
46-
* <p>ensures that the annotation value is large only. </p>
100+
* <p>Fill out the component equations with the example counts to calculate SOR.</p>
47101
*
48-
* <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> for a more detailed explanation of this statistical test.</p>
102+
* <pre>
103+
* symmetricalRatio = (1451*213)/(161*346) + (161*346)/(1451*213) = 5.7284
104+
* refRatio = 346/1451 = 0.2385
105+
* altRatio = 161/213 = 0.7559
106+
* SOR = ln(5.7284) + ln(0.2385) – ln(0.7559) = 1.7454427755 + (-1.433) – (-0.2798) = 0.592
107+
* </pre>
49108
*
50109
* <h3>Related annotations</h3>
51110
* <ul>
52-
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b> outputs counts of read depth per allele for each strand orientation.</li>
53-
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_FisherStrand.php">FisherStrand</a></b> uses Fisher's Exact Test to evaluate strand bias.</li>
111+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_allelespecific_AS_StrandOddsRatio.php">AS_StrandOddsRatio</a></b>
112+
* allele-specific strand bias estimated by the symmetric odds ratio test.</li>
113+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b>
114+
* outputs counts of read depth per allele for each strand orientation.</li>
115+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_FisherStrand.php">FisherStrand</a></b>
116+
* uses Fisher's Exact Test to evaluate strand bias.</li>
54117
* </ul>
55118
*
56119
*/
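
The example calculation in the Javadoc above can be checked with a few lines of Java. This is a minimal sketch that follows only the formulas written in the documentation (add one to each count, then `ln(R + 1/R) + ln(refRatio) - ln(altRatio)`); the class and method names are hypothetical and it is not copied from the GATK implementation.

```java
import java.util.Locale;

/** Illustrative re-computation of SOR from the documented formulas; not the GATK source. */
class SorExample {
    // The documentation states that one is added to each count to avoid
    // multiplying or dividing by zero.
    static final double PSEUDOCOUNT = 1.0;

    /** Counts are in SB_TABLE order: ref forward, ref reverse, alt forward, alt reverse. */
    static double sor(int refFwRaw, int refRvRaw, int altFwRaw, int altRvRaw) {
        double refFw = refFwRaw + PSEUDOCOUNT;
        double refRv = refRvRaw + PSEUDOCOUNT;
        double altFw = altFwRaw + PSEUDOCOUNT;
        double altRv = altRvRaw + PSEUDOCOUNT;

        // R = (refFw / refRv) / (altFw / altRv); R + 1/R makes the measure symmetric.
        double r = (refFw * altRv) / (altFw * refRv);
        double symmetricalRatio = r + 1.0 / r;

        // Scale factors: smaller count over larger count, per allele.
        double refRatio = Math.min(refFw, refRv) / Math.max(refFw, refRv);
        double altRatio = Math.min(altFw, altRv) / Math.max(altFw, altRv);

        // The final annotation is reported in natural log space.
        return Math.log(symmetricalRatio) + Math.log(refRatio) - Math.log(altRatio);
    }

    public static void main(String[] args) {
        // SB_TABLE=1450,345,160,212 from the example variant record.
        double value = sor(1450, 345, 160, 212);
        System.out.println(String.format(Locale.ROOT, "SOR = %.3f", value));
    }
}
```

Run with the counts from the example record, the result should agree with the `SOR=0.592` annotation to three decimal places.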

src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/allelespecific/AS_StrandOddsRatio.java

+34-18
@@ -16,44 +16,60 @@
1616
/**
1717
* Allele-specific strand bias estimated by the Symmetric Odds Ratio test
1818
*
19-
* <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other. </p>
20-
*
21-
* <p>The AS_StrandOddsRatio annotation is one of several methods that aims to evaluate whether there is strand bias in the data. It is an updated form of the Fisher Strand Test that is better at taking into account large amounts of data in high coverage situations. It is used to determine if there is strand bias between forward and reverse strands for the reference or alternate allele. It does so separately for each allele. The reported value is ln-scaled.</p>
19+
* <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in
20+
* incorrect evaluation of the amount of evidence observed for one allele vs. the other. The AS_StrandOddsRatio
21+
* annotation is one of several methods that aims to evaluate whether there is strand bias in the data. It is an
22+
* updated form of the Fisher Strand Test that is better at taking into account large amounts of data in high coverage
23+
* situations. It is used to determine if there is strand bias between forward and reverse strands for the reference or
24+
* alternate allele. It does so separately for each allele. The reported value is ln-scaled.</p>
2225
*
2326
* <h3>Statistical notes</h3>
24-
* <p> Odds Ratios in the 2x2 contingency table below are</p>
27+
* <p>The following 2x2 contingency table gives the notation for allele support and strand orientation.</p>
28+
*
29+
* <table>
30+
* <tr><th>&nbsp;</th><th>+ strand&nbsp;&nbsp;&nbsp;</th><th>- strand&nbsp;&nbsp;&nbsp;</th></tr>
31+
* <tr><th>REF&nbsp;&nbsp;&nbsp;</th><td>X[0][0]</td><td>X[0][1]</td></tr>
32+
* <tr><th>ALT&nbsp;&nbsp;&nbsp;</th><td>X[1][0]</td><td>X[1][1]</td></tr>
33+
* </table>
2534
*
26-
* $$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$
35+
* <p>We can then represent the Odds Ratios with the equation:</p>
36+
*
37+
* <img src="http://latex.codecogs.com/svg.latex?$$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$" border="0"/>
2738
*
2839
* <p>and its inverse:</p>
2940
*
30-
* <table>
31-
* <tr><td>&nbsp;</td><td>+ strand </td><td>- strand</td></tr>
32-
* <tr><td>REF;</td><td>X[0][0]</td><td>X[0][1]</td></tr>
33-
* <tr><td>ALT;</td><td>X[1][0]</td><td>X[1][1]</td></tr>
34-
* </table>
41+
* <img src="http://latex.codecogs.com/svg.latex?$$ \frac{1}{R} = \frac{X[0][1] * X[1][0]}{X[0][0] * X[1][1]} $$" border="0"/>
3542
*
36-
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT (the sum makes it symmetric). A high value is indicative of large difference where one entry is very small compared to the others. A scale factor of refRatio/altRatio where</p>
43+
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT. The sum makes it symmetric.
44+
* A high value is indicative of large difference where one entry is very small compared to the others. A scale factor
45+
* of refRatio/altRatio where</p>
3746
*
38-
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
47+
* <img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1])} $$" border="0"/>
3948
*
4049
* <p>and </p>
4150
*
42-
* $$ altRatio = \frac{max(X[1][0], X[1][1])}{min(X[1][0], X[1][1]} $$
51+
* <img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{min(X[1][0], X[1][1])}{max(X[1][0], X[1][1])} $$" border="0"/>
4352
*
44-
* <p>ensures that the annotation value is large only. </p>
53+
* <p>ensures that the annotation value is large only. The final SOR annotation is given in natural log space.</p>
4554
*
46-
* <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> for a more detailed explanation of this statistical test.</p>
55+
* <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a>
56+
* for a more detailed explanation of this statistical test, and see
57+
* <a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_StrandOddsRatio.php">StrandOddsRatio</a>
58+
* documentation for an example calculation.</p>
4759
*
4860
* <h3>Caveat</h3>
4961
* <p>
5062
* The name AS_StrandOddsRatio is not entirely appropriate because the implementation was changed somewhere between the start of development and release of this annotation. Now SOR isn't really an odds ratio anymore. The goal was to separate certain cases of data without penalizing variants that occur at the ends of exons because they tend to only be covered by reads in one direction (depending on which end of the exon they're on), so if a variant has 10 ref reads in the + direction, 1 ref read in the - direction, 9 alt reads in the + direction and 2 alt reads in the - direction, it's actually not strand biased, but the FS score is pretty bad. The implementation that resulted derived in part from empirically testing some read count tables of various sizes with various ratios and deciding from there.</p>
5163
*
5264
* <h3>Related annotations</h3>
5365
* <ul>
54-
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandOddsRatio.php">StrandOddsRatio</a></b> outputs a version of this annotation that includes all alternate alleles in a single calculation.</li>
55-
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b> outputs counts of read depth per allele for each strand orientation.</li>
56-
* <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_FisherStrand.php">FisherStrand</a></b> uses Fisher's Exact Test to evaluate strand bias.</li>
66+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_allelespecific_AS_FisherStrand.php">AS_FisherStrand</a></b>
67+
* uses Fisher's Exact Test to evaluate strand bias.</li>
68+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_StrandOddsRatio.php">StrandOddsRatio</a></b>
69+
* outputs a version of this annotation that includes all alternate alleles in a single calculation.</li>
70+
* <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b>
71+
* outputs counts of read depth per allele for each strand orientation.</li>
72+
*
5773
* </ul>
5874
*
5975
*/
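
The Caveat paragraph argues that a table with 10 ref reads on +, 1 ref read on -, 9 alt reads on + and 2 alt reads on - is not really strand biased. The sketch below evaluates the documented formula on that table and, for contrast, on a second table that is purely hypothetical (balanced REF, one-sided ALT). It assumes a single alternate allele and the add-one adjustment described in the StrandOddsRatio documentation; the names are illustrative and this is not the GATK code.

```java
import java.util.Locale;

/** Hedged illustration of the Caveat example; formulas taken from the Javadoc text only. */
class CaveatTableExample {
    /** Counts: ref forward, ref reverse, alt forward, alt reverse (one is added to each). */
    static double symmetricOddsRatio(int refFwRaw, int refRvRaw, int altFwRaw, int altRvRaw) {
        double refFw = refFwRaw + 1.0;
        double refRv = refRvRaw + 1.0;
        double altFw = altFwRaw + 1.0;
        double altRv = altRvRaw + 1.0;

        double r = (refFw * altRv) / (altFw * refRv);
        double refRatio = Math.min(refFw, refRv) / Math.max(refFw, refRv);
        double altRatio = Math.min(altFw, altRv) / Math.max(altFw, altRv);

        // ln(R + 1/R) + ln(refRatio) - ln(altRatio)
        return Math.log(r + 1.0 / r) + Math.log(refRatio) - Math.log(altRatio);
    }

    public static void main(String[] args) {
        // Table from the Caveat text: both alleles lean toward the + strand.
        double exonEdge = symmetricOddsRatio(10, 1, 9, 2);
        // Hypothetical contrast table: REF balanced, ALT seen only on the + strand.
        double oneSidedAlt = symmetricOddsRatio(10, 10, 20, 0);

        System.out.println(String.format(Locale.ROOT, "caveat table:       %.3f", exonEdge));
        System.out.println(String.format(Locale.ROOT, "hypothetical table: %.3f", oneSidedAlt));
    }
}
```

Under these assumptions the Caveat table scores much lower than the hypothetical table, which is the behaviour the paragraph describes: when REF and ALT share the same strand skew, the scale factor pulls the value down.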
