Self-substitution costs #21

veghp · 2020-06-13T11:27:51Z

Thank you for this great package, which helps me compare short sequences (https://github.com/Edinburgh-Genome-Foundry/Examples/tree/master/SeqDistance).

I'm wondering if it would be possible to add a feature: self-substitution costs. Currently the diagonal of the substitution matrix seems to be ignored.

To expand on this a bit, we use some characters to stand for several possible characters (e.g. S = C or G), that is, to encode uncertainty. Assuming both options are equally likely, the chance that two Ss encode the same letter is 50%, so the penalty for substituting S with S should be 0.5.
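To make the arithmetic concrete, here is a minimal sketch of how such penalties could be derived. It assumes each ambiguity code stands for a uniformly random, independently chosen base; the `IUPAC` table (a small subset of the codes) and the `mismatch_probability` helper are names made up for this illustration and are not part of the package.

```python
from itertools import product

# Subset of nucleotide ambiguity codes, mapped to the bases they may encode.
# Assumption: every base within a code is equally likely.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "N": "ACGT",
}

def mismatch_probability(x, y):
    """Probability that symbols x and y encode different bases."""
    pairs = list(product(IUPAC[x], IUPAC[y]))
    mismatches = sum(a != b for a, b in pairs)
    return mismatches / len(pairs)

# Self-substitution is free for unambiguous bases, but not for ambiguity codes.
for symbol in "ASN":
    print(symbol, mismatch_probability(symbol, symbol))
```

Under these assumptions the diagonal entry is 0 for A, 0.5 for S and 0.75 for N, which is why a substitution matrix with an ignored diagonal cannot express this.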

veghp · 2020-06-13T11:38:19Z

A current workaround is to replace all characters (ATCG...) in one of the strings with characters from a second set (#@;&...) and define penalties between the two sets of characters (alphabets), at the cost of halving the number of allowed characters.
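A rough sketch of this workaround follows. It does not use the package's API: `weighted_distance` is a hypothetical pure-Python stand-in for a weighted edit distance, and the shadow alphabet, the `SETS` table and the cost rule (probability of a mismatch, assuming equally likely bases) are assumptions chosen for illustration.

```python
PLAIN = "ACGTS"
SHADOW = "#@;&$"  # hypothetical second alphabet, one symbol per plain symbol
TO_SHADOW = str.maketrans(PLAIN, SHADOW)

# Bases each symbol may encode (S = C or G).
SETS = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}

def pair_cost(x, y):
    """Probability that x and y encode different bases (uniform assumption)."""
    pairs = [(a, b) for a in SETS[x] for b in SETS[y]]
    return sum(a != b for a, b in pairs) / len(pairs)

# Costs are defined only between the plain and the shadow alphabet, so the
# pair (S, $) plays the role of the S-to-S diagonal entry.
COSTS = {(x, y.translate(TO_SHADOW)): pair_cost(x, y)
         for x in PLAIN for y in PLAIN}

def weighted_distance(a, b, costs, indel=1.0):
    """Edit distance with per-pair substitution costs (row-by-row DP)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                 # delete ca
                curr[j - 1] + indel,             # insert cb
                prev[j - 1] + costs[(ca, cb)],   # substitute ca with cb
            ))
        prev = curr
    return prev[-1]

def distance(a, b):
    """Compare two plain-alphabet strings by re-encoding the second one."""
    return weighted_distance(a, b.translate(TO_SHADOW), COSTS)

print(distance("ACS", "ACS"))    # the S/S pair contributes 0.5
print(distance("ACGT", "ACGT"))  # unambiguous bases match at no cost
```

Because no symbol is ever compared with itself, the ignored diagonal no longer matters, but each real symbol now consumes two slots of the allowed character set.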
