Self-substitution costs #21

veghp opened this issue Jun 13, 2020 · 1 comment

veghp commented Jun 13, 2020

Thank you for this great package that helps me in comparing short sequences (https://github.com/Edinburgh-Genome-Foundry/Examples/tree/master/SeqDistance).

I'm wondering if it would be possible to add a feature: self-substitution costs. Currently the diagonal of the substitution matrix seems to be ignored.

To expand on this a bit, we use some characters to encode multiple characters (e.g. S = C or G), that is, to encode uncertainty. In this case the chance that two Ss encode the same letter is 50%, so the penalty score should be 0.5.
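The 0.5 figure can be reproduced with a small calculation. The following is only a sketch: it assumes each ambiguity symbol stands for one of its bases with equal probability, independently in the two sequences, and the `CODES` table and `mismatch_probability` function are names made up for this illustration, not part of the package.

```python
# Hypothetical table: each symbol maps to the bases it may stand for.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG",    # S = C or G, as in the example above
    "W": "AT",    # W = A or T
    "N": "ACGT",  # N = any base
}


def mismatch_probability(x, y, codes=CODES):
    """Probability that symbols x and y stand for different bases.

    Assumes every base allowed by a symbol is equally likely and that
    the two symbols are resolved independently of each other.
    """
    a, b = set(codes[x]), set(codes[y])
    # Of the len(a) * len(b) equally likely base pairs, len(a & b) match.
    return 1 - len(a & b) / (len(a) * len(b))


print(mismatch_probability("S", "S"))  # two S symbols
print(mismatch_probability("A", "A"))  # two unambiguous, identical bases
print(mismatch_probability("S", "W"))  # no base in common
```

Under these assumptions an unambiguous base keeps a self-substitution cost of 0, while `S` against `S` costs 0.5, which is the diagonal entry the feature request asks for.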

veghp commented Jun 13, 2020

A current workaround is to replace all characters (ATCG...) in one of the strings with characters from a second set (#@;&...) and define penalties between the two sets of characters (alphabets), at the cost of halving the number of allowed characters.
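A rough sketch of this workaround, for illustration only. It does not use the package's API: `weighted_distance` is a stand-in pure-Python edit distance written to mimic the behaviour described in this issue (identical characters always align at zero cost), and the lowercase second alphabet, the `SHADOW` table and the helper names are all assumptions made for the example.

```python
def weighted_distance(a, b, sub_cost, indel=1.0):
    """Stand-in edit distance that never charges for identical characters,
    i.e. it ignores the diagonal of the substitution costs."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * indel]
        for j, cb in enumerate(b, 1):
            # Unlisted pairs fall back to a full substitution cost of 1.0.
            sub = 0.0 if ca == cb else sub_cost.get((ca, cb), 1.0)
            cur.append(min(prev[j] + indel,       # delete ca
                           cur[j - 1] + indel,    # insert cb
                           prev[j - 1] + sub))    # substitute ca -> cb
        prev = cur
    return prev[-1]


# Hypothetical symbol table: S encodes "C or G".
BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}


def penalty(x, y):
    """Mismatch probability, assuming equally likely, independent bases."""
    a, b = set(BASES[x]), set(BASES[y])
    return 1 - len(a & b) / (len(a) * len(b))


# Second alphabet for one of the strings (lowercase is an arbitrary choice).
SHADOW = str.maketrans("ACGTS", "acgts")

# Penalties defined between the two alphabets, including "self" pairs.
cross_costs = {(x, y.translate(SHADOW)): penalty(x, y)
               for x in BASES for y in BASES}


def distance_with_self_costs(a, b):
    """Remap b so that no character of a is ever identical to one of b."""
    return weighted_distance(a, b.translate(SHADOW), cross_costs)


print(weighted_distance("S", "S", {}))       # diagonal ignored
print(distance_with_self_costs("S", "S"))    # self-substitution cost applied
```

Because the two strings no longer share any character, every aligned pair goes through the cost table, so the `S`/`s` entry acts as the self-substitution cost. The price is the one noted above: each symbol now occupies two characters of the allowed alphabet.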
