Issue with Gene Pair Discrepancy #773
Comments
As with the commonly used BLAST tool, matches are sometimes not transitive. For example, suppose A is a fusion protein between B and C, and 1, 2, 3 are the species.
You can then have matches between |
To investigate whether this issue exists, I used the following code to identify CDS sequences that are present in A vs B and A vs C gene pairs but not in the B vs C gene pairs. I then performed a BLAST search to check the matches, and I found that most of the sequences matched 100%, suggesting that they are not fusion proteins. However, despite these sequences being 100% identical, JCVI did not recognize them as gene pairs. I have attached the code I used, as well as a portion of the results, for your review.
I would appreciate any insights you can provide on why JCVI is not identifying these sequences as gene pairs and how I might resolve this issue. |
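The attached code did not survive in this thread, so here is a minimal sketch of the kind of check described above: collect the gene pairs from the A vs B and A vs C anchors files, then list every B/C pair they imply that is absent from the B vs C file. It assumes the usual anchors layout (whitespace-separated `gene1 gene2 score` lines, with `#`-prefixed lines separating blocks). The function names are hypothetical and not part of JCVI.

```python
def read_anchor_pairs(lines):
    """Collect (gene1, gene2) pairs from the lines of an anchors-style file.

    Assumes whitespace-separated columns and '#'-prefixed separator lines.
    """
    pairs = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and block separators such as '###'
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_transitive_pairs(ab, ac, bc):
    """Return (b, c) pairs implied by a shared A gene but absent from bc."""
    partners_in_b = {}
    for a, b in ab:
        partners_in_b.setdefault(a, set()).add(b)

    missing = set()
    for a, c in ac:
        for b in partners_in_b.get(a, ()):
            if (b, c) not in bc:
                missing.add((b, c))
    return missing


# Tiny in-memory example; with real data, pass open file handles instead.
ab = read_anchor_pairs(["###", "A1\tB1\t100", "A2\tB2\t90"])
ac = read_anchor_pairs(["###", "A1\tC1\t100", "A2\tC2\t95"])
bc = read_anchor_pairs(["###", "B2\tC2\t80"])
print(sorted(missing_transitive_pairs(ab, ac, bc)))
```

The pairs reported this way are only candidates. As discussed in the replies, a pair can be missing because of filtering or naming rather than a lack of sequence similarity.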
Take a look at the file There are two filters:
In particular, the tandem filter may also be a reason for removal, which is what's happening in #770. |
Also, when you check the gene pairs, make sure to also check |
It seems that it was filtered out at this step. I did not find the gene pairs identified by the BLAST search in the
|
wait, are the gene names the same in different species? JCVI will remove gene pairs with the same name by default to prevent self matches. |
no wonder! thank you for the clarification. I will make the necessary changes and try again. Thanks! |
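One way to check for the name collision described above is to compare the gene names in the two BED files before running the comparison. The sketch below assumes standard BED columns with the gene name in the fourth column. The helper names and the prefixes are hypothetical and not part of JCVI.

```python
def bed_gene_names(lines):
    """Return the set of names found in the 4th column of BED-style lines."""
    names = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def shared_gene_names(bed_a_lines, bed_b_lines):
    """Names present in both BED files; these can look like self matches."""
    return bed_gene_names(bed_a_lines) & bed_gene_names(bed_b_lines)


def prefix_bed_names(lines, prefix):
    """Return BED lines with `prefix` prepended to each gene name."""
    renamed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and not line.startswith("#"):
            fields[3] = prefix + fields[3]
        renamed.append("\t".join(fields))
    return renamed


# Tiny in-memory example; with real data, pass open file handles instead.
bed_a = ["chr1\t0\t100\tgene1\t0\t+", "chr1\t200\t300\tgene2\t0\t-"]
bed_b = ["chr1\t5\t105\tgene1\t0\t+", "chr1\t200\t300\tgene9\t0\t-"]
print(shared_gene_names(bed_a, bed_b))
print(prefix_bed_names(bed_a, "A_")[0])
```

If names are prefixed this way, the sequence IDs in the matching CDS FASTA files need the same prefix so that the BED and sequence files stay consistent.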
Indeed, the number of gene pairs has returned to normal. However, I have encountered a new issue. In the *.lifted.anchors file, there are some gene pairs that also exist in the .anchors files. Yet, after running the following command: |
I am using JCVI, which is a very powerful tool, but I have encountered some issues. I am analyzing four varieties (A, B, C, D) of the same species using the same code for gene pair analysis. However, the results show significant differences in the number of gene pairs, as follows:
For example, A#gene1 and B#gene1 are identified as a gene pair, A#gene1 and C#gene1 are also identified as a gene pair, but B#gene1 and C#gene1 are not identified as a gene pair.
I would like to understand why this discrepancy occurs and how to resolve it. The problem seems to be related to a similar issue, #770.
Could you please provide guidance on how to address this? Thank you very much for your assistance.