|
20 | 20 | /**
|
21 | 21 | * Strand bias estimated by the Symmetric Odds Ratio test
|
22 | 22 | *
|
23 |
| - * <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in incorrect evaluation of the amount of evidence observed for one allele vs. the other. The StrandOddsRatio annotation is one of several methods that aims to evaluate whether there is strand bias in the data. It is an updated form of the Fisher Strand Test that is better at taking into account large amounts of data in high coverage situations. It is used to determine if there is strand bias between forward and reverse strands for the reference or alternate allele.</p> |
| 23 | + * <p>Strand bias is a type of sequencing bias in which one DNA strand is favored over the other, which can result in |
| 24 | + * incorrect evaluation of the amount of evidence observed for one allele vs. the other. The StrandOddsRatio annotation |
| 25 | + * is one of several methods that aim to evaluate whether there is strand bias in the data. It is an updated form of |
| 26 | + * the <a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_FisherStrand.php">Fisher Strand Test</a> |
| 27 | + * that is better at taking into account large amounts of data in high coverage situations. It is used to determine if |
| 28 | + * there is strand bias between forward and reverse strands for the reference or alternate allele(s).</p> |
24 | 29 | *
|
25 | 30 | * <h3>Statistical notes</h3>
|
26 |
| - * <p> Odds Ratios in the 2x2 contingency table below are</p> |
27 | 31 | *
|
28 |
| - * $$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$ |
29 |
| - * |
30 |
| - * <p>and its inverse:</p> |
| 32 | + * <p>The following 2x2 contingency table gives the notation for allele support and strand orientation.</p> |
31 | 33 | *
|
32 | 34 | * <table>
|
33 |
| - * <tr><td> </td><td>+ strand </td><td>- strand</td></tr> |
34 |
| - * <tr><td>REF;</td><td>X[0][0]</td><td>X[0][1]</td></tr> |
35 |
| - * <tr><td>ALT;</td><td>X[1][0]</td><td>X[1][1]</td></tr> |
| 35 | + * <tr><th> </th><th>+ strand </th><th>- strand </th></tr> |
| 36 | + * <tr><th>REF </th><td>X[0][0]</td><td>X[0][1]</td></tr> |
| 37 | + * <tr><th>ALT </th><td>X[1][0]</td><td>X[1][1]</td></tr> |
36 | 38 | * </table>
|
37 | 39 | *
|
38 |
| - * <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT (the sum makes it symmetric). A high value is indicative of large difference where one entry is very small compared to the others. A scale factor of refRatio/altRatio where</p> |
| 40 | + * <p>We can then represent the Odds Ratios with the equation:</p> |
| 41 | + * |
| 42 | + * <img src="http://latex.codecogs.com/svg.latex?$$ R = \frac{X[0][0] * X[1][1]}{X[0][1] * X[1][0]} $$" border="0"/> |
| 43 | + * |
| 44 | + * <p>and its inverse:</p> |
| 45 | + * |
| 46 | + * <img src="http://latex.codecogs.com/svg.latex?$$ \frac{1}{R} = \frac{X[0][1] * X[1][0]}{X[0][0] * X[1][1]} $$" border="0"/> |
39 | 47 | *
|
40 |
| - * $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$ |
| 48 | + * <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT. The sum makes it symmetric. |
| 49 | + * A high value indicates a large difference, where one entry is very small compared to the others. A scale factor |
| 50 | + * of refRatio/altRatio where</p> |
| 51 | + * |
| 52 | + * <img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1])} $$" border="0"/> |
41 | 53 | *
|
42 | 54 | * <p>and </p>
|
43 | 55 | *
|
44 |
| - * $$ altRatio = \frac{max(X[1][0], X[1][1])}{min(X[1][0], X[1][1]} $$ |
| 56 | + * <img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{min(X[1][0], X[1][1])}{max(X[1][0], X[1][1])} $$" border="0"/> |
| 57 | + * |
| 58 | + * <p>ensures that the annotation value is large only. The final SOR annotation is given in natural log space.</p> |
| 59 | + * |
| 60 | + * <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> |
| 61 | + * for a more detailed explanation of this statistical test.</p> |
| 62 | + * |
| 63 | + * <h3>Example calculation</h3> |
| 64 | + * |
| 65 | + * <p>Here is a variant record where SOR is 0.592.</p> |
| 66 | + * |
| 67 | + * <pre> |
| 68 | + * AC=78;AF=2.92135e-02;AN=2670;DP=31492;FS=48.628;MQ=58.02;MQRankSum=-2.02400e+00;MQ_DP=3209;QD=3.03; \ |
| 69 | + * ReadPosRankSum=-1.66500e-01;SB_TABLE=1450,345,160,212;SOR=0.592;VarDP=2167 |
| 70 | + * </pre> |
| 71 | + * |
| 72 | + * <p>Read support shows some strand bias for the reference allele but not |
| 73 | + * the alternate allele. The SB_TABLE annotation (a non-GATK annotation) indicates 1450 reference alleles on the forward strand, 345 |
| 74 | + * reference alleles on the reverse strand, 160 alternate alleles on the forward strand and 212 alternate alleles on |
| 75 | + * the reverse strand. The tool uses these counts towards calculating SOR. To avoid multiplying or dividing by zero |
| 76 | + * values, the tool adds one to each count.</p> |
| 77 | + * |
| 78 | + * <pre> |
| 79 | + * refFw = 1450 + 1 = 1451 |
| 80 | + * refRv = 345 + 1 = 346 |
| 81 | + * altFw = 160 + 1 = 161 |
| 82 | + * altRv = 212 + 1 = 213 |
| 83 | + * </pre> |
| 84 | + * |
| 85 | + * <p>Calculate SOR with the following.</p> |
| 86 | + * |
| 87 | + * <p><img src="http://latex.codecogs.com/svg.latex?$$ SOR = ln(symmetricalRatio) + ln(refRatio) - ln(altRatio) $$" border="0"/></p> |
| 88 | + * |
| 89 | + * <p>where</p> |
| 90 | + * |
| 91 | + * <p><img src="http://latex.codecogs.com/svg.latex?$$ symmetricalRatio = R + \frac{1}{R} $$" border="0"/></p> |
| 92 | + * <p><img src="http://latex.codecogs.com/svg.latex?$$ R = \frac{(\frac{refFw}{refRv})}{(\frac{altFw}{altRv})} = \frac{(refFw*altRv)}{(altFw*refRv)} $$" border="0"/></p> |
| 93 | + * |
| 94 | + * <p><img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{(smaller\;of\;refFw\;and\;refRv)}{(larger\;of\;refFw\;and\;refRv)} $$" border="0"/></p> |
| 95 | + * |
| 96 | + * <p>and</p> |
| 97 | + * |
| 98 | + * <p><img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{(smaller\;of\;altFw\;and\;altRv)}{(larger\;of\;altFw\;and\;altRv)} $$" border="0"/></p> |
45 | 99 | *
|
46 |
| - * <p>ensures that the annotation value is large only. </p> |
| 100 | + * <p>Fill out the component equations with the example counts to calculate SOR.</p> |
47 | 101 | *
|
48 |
| - * <p>See the <a href="http://www.broadinstitute.org/gatk/guide/article?id=4732">method document on statistical tests</a> for a more detailed explanation of this statistical test.</p> |
| 102 | + * <pre> |
| 103 | + * symmetricalRatio = (1451*213)/(161*346) + (161*346)/(1451*213) = 5.7284 |
| 104 | + * refRatio = 346/1451 = 0.2385 |
| 105 | + * altRatio = 161/213 = 0.7559 |
| 106 | + * SOR = ln(5.7284) + ln(0.2385) - ln(0.7559) = 1.7454 + (-1.4334) - (-0.2798) = 0.592 |
| 107 | + * </pre> |
49 | 108 | *
|
50 | 109 | * <h3>Related annotations</h3>
|
51 | 110 | * <ul>
|
52 |
| - * <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b> outputs counts of read depth per allele for each strand orientation.</li> |
53 |
| - * <li><b><a href="https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_FisherStrand.php">FisherStrand</a></b> uses Fisher's Exact Test to evaluate strand bias.</li> |
| 111 | + * <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_allelespecific_AS_StrandOddsRatio.php">AS_StrandOddsRatio</a></b> |
| 112 | + * estimates allele-specific strand bias with the symmetric odds ratio test.</li> |
| 113 | + * <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_StrandBiasBySample.php">StrandBiasBySample</a></b> |
| 114 | + * outputs counts of read depth per allele for each strand orientation.</li> |
| 115 | + * <li><b><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_annotator_FisherStrand.php">FisherStrand</a></b> |
| 116 | + * uses Fisher's Exact Test to evaluate strand bias.</li> |
54 | 117 | * </ul>
|
55 | 118 | *
|
56 | 119 | */
|
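The example calculation in the Javadoc above can be checked with a small stand-alone sketch. This is not the tool's actual implementation: the class name `SorSketch` and the method name `calculateSor` are hypothetical, and the input layout `{{refFw, refRv}, {altFw, altRv}}` is assumed to mirror the X[i][j] contingency table. It follows only what the comment states: add one to each count, form R + 1/R, and scale by refRatio/altRatio in natural log space.

```java
import java.util.Locale;

/** Hypothetical stand-alone sketch of the SOR arithmetic described in the Javadoc; not GATK code. */
final class SorSketch {

    /**
     * Computes SOR from a 2x2 table laid out as {{refFw, refRv}, {altFw, altRv}}.
     * One is added to every count so that no ratio multiplies or divides by zero.
     */
    static double calculateSor(final int[][] table) {
        final double refFw = table[0][0] + 1.0;
        final double refRv = table[0][1] + 1.0;
        final double altFw = table[1][0] + 1.0;
        final double altRv = table[1][1] + 1.0;

        // R = (refFw / refRv) / (altFw / altRv); R + 1/R makes the statistic symmetric.
        final double r = (refFw * altRv) / (altFw * refRv);
        final double symmetricalRatio = r + 1.0 / r;

        // Smaller count over larger count for each allele, so both ratios lie in (0, 1].
        final double refRatio = Math.min(refFw, refRv) / Math.max(refFw, refRv);
        final double altRatio = Math.min(altFw, altRv) / Math.max(altFw, altRv);

        // SOR = ln(symmetricalRatio) + ln(refRatio) - ln(altRatio)
        return Math.log(symmetricalRatio) + Math.log(refRatio) - Math.log(altRatio);
    }

    public static void main(final String[] args) {
        // Counts taken from SB_TABLE=1450,345,160,212 in the example record.
        final int[][] sbTable = {{1450, 345}, {160, 212}};
        System.out.println(String.format(Locale.ROOT, "SOR = %.3f", calculateSor(sbTable)));
    }
}
```

Run with the counts from the example record, the sketch should reproduce the SOR of 0.592 reported in the variant's INFO field. An all-zero table reduces to ln(2), since every pseudocounted ratio equals one.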