Problem with Iterative Search Web Server Output #456

Open
meltzj opened this issue Apr 28, 2025 · 8 comments

@meltzj

meltzj commented Apr 28, 2025

I've noticed that in the iterative search web server output, the matches are not strictly ordered from best to worst.
The bitscores of the hits first decrease, and then, after the lowest one, another high bitscore appears, sometimes even double that of the top hit.
I noticed there are three such sections, each starting with a high score that then descends. Could this be related to the number of iterations? Is there a way to order the output across all iterations combined, so that the best score from any iteration comes first and the lowest comes last?

@martin-steinegger
Collaborator

The results are ordered by iteration. We ordered them this way so that you can see in which iteration each hit was found.
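
If you download the hits as an m8 file and want a single ranking across all iterations, a minimal sketch along these lines re-sorts them by bitscore. It assumes a tab-separated file with the bitscore in column 12, as in Foldseek's default BLAST-tab layout; the function name `resort_by_bits` is hypothetical, and the column index is an assumption you should check against your download. Keep in mind that later iterations score hits against a profile, so their bitscores are not necessarily comparable with those from the first iteration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-sort an m8 (BLAST-tab) file by bitscore, highest
# first, ignoring the per-iteration grouping of the server output.
# Assumption: tab-separated columns with the bitscore in column 12
# (Foldseek's default 12-column layout); pass another index if needed.
resort_by_bits() {
  local m8_file="$1"
  local bits_col="${2:-12}"
  # -t: tab separator; -k N,N: sort on that column only;
  # g: general numeric (handles values like 1.2E+02); r: descending order.
  sort -t "$(printf '\t')" -k "${bits_col},${bits_col}gr" "$m8_file"
}
```

Usage: `resort_by_bits results.m8 > results.sorted.m8`. Sorting on a single key (`N,N`) avoids accidentally using the following columns as tie-breakers.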

@meltzj
Author

meltzj commented Apr 29, 2025

Thank you for your very quick response and for clarifying!

@meltzj
Author

meltzj commented Apr 29, 2025

A few more questions:

  1. Is it possible that the E-values become inflated during the iterations?
  2. It seems that each hit is reported only once in the local alignment results. If a hit is picked up in an early iteration, is its score not updated in later iterations?
  3. Is it possible to compare the scores between iterations? My ultimate objective is to find a short list of the structures most similar to my query. If I download the m8 files and re-sort them by bitscore, will that give me the correct ranking?

@martin-steinegger
Collaborator

(1) Yes, the E-values are inflated because we do not apply a reverse score correction. I pushed a change that should improve this, but the server still needs to be updated. @milot-mirdita
(2) Yes. The server does not use --alt-ali (which reports additional alternative alignments per target), but you can use it when running Foldseek locally.
(3) I assume you would like something like a query-normalized LDDT?

@meltzj
Author

meltzj commented May 2, 2025

  1. That would be great, thank you!
  2. What I meant is that with the iterative 3Di local search, each hit is reported only once, in a single iteration. For example, if AF_protein_1 is picked up in the first iteration, even as the top result, it is not reported again in a later iteration (I don't see it listed in iteration 2 or 3). If a hit is picked up in an early iteration, is its score not updated further?
  3. I think a query-normalised LDDT score would be very helpful and would allow comparison between iterations.

@meltzj
Author

meltzj commented May 8, 2025

Hi Martin, thanks for all your help so far. Any idea when this will be updated?
I have a deadline for this study coming up very soon, so I wanted to know what you recommend as the best way forward. I'm happy to wait if the update is coming soon; otherwise, please let me know how you recommend comparing matches across iterations with the current server settings, or whether I should run only one iteration.

@martin-steinegger
Collaborator

martin-steinegger commented May 8, 2025

Do you have the resources to run it locally? If so, I obtained the best results with the following search workflow. It performs as well as --num-iterations 3 but avoids problems caused by shifts in the E-value distribution across iterations. Alternatively, you can print the alnlddt or qtmscore of each hit using the --format-output option of convertalis.

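```shell
# Step 1: initial search; -e 10 and --max-seqs 2000 keep many candidate hits,
# and -a stores the alignment backtraces that result2profile needs
foldseek search querydb targetdb aln tmp -e 10 --max-seqs 2000 -a
# Step 2: build a query profile from the hits of the first search
foldseek result2profile querydb targetdb aln queryProfile
# Step 3: search the target database again with the profile
foldseek search queryProfile targetdb aln tmp -a
```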

This achieves the following performance:

|         | Family   | Superfamily | Fold     |
|---------|----------|-------------|----------|
| Profile | 0.898821 | 0.559285    | 0.145641 |
| Seq.    | 0.860359 | 0.486473    | 0.105935 |
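
To rank the hits of the profile search by a query-normalized score rather than by E-value or bitscore, the alignment can be converted with convertalis and the table sorted on the score column. This is a minimal sketch under stated assumptions: it assumes that running convertalis with the original `querydb` on the profile-search result works for your setup, uses the `qtmscore` and `lddt` output fields, and introduces a hypothetical helper `top_hits_by_column`. The Foldseek step is shown as a comment because it needs the databases from the workflow above.

```shell
#!/usr/bin/env bash
# Step 1 (requires Foldseek and the databases from the workflow above):
#   foldseek convertalis querydb targetdb aln aln.tsv \
#     --format-output "query,target,evalue,bits,qtmscore,lddt"
#
# Step 2: sort the resulting table by one score column, highest first,
# and keep the top N hits. With the field list above, qtmscore is column 5.
top_hits_by_column() {
  local tsv_file="$1"   # tab-separated convertalis output
  local col="$2"        # 1-based column to rank by (5 = qtmscore above)
  local n="${3:-10}"    # number of hits to keep
  # g: general numeric comparison; r: descending order.
  sort -t "$(printf '\t')" -k "${col},${col}gr" "$tsv_file" | head -n "$n"
}
```

Usage: `top_hits_by_column aln.tsv 5 20` keeps the 20 hits with the highest qtmscore. With several queries in one file, you would first split the table by the query column so that each query gets its own ranking.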

@martin-steinegger
Collaborator

We have deployed the new iterative search. Please give it a try.
Your search might be cached. To avoid the cache, slightly change the input PDB/mmCIF file or add extra databases to your search.

Please let us know if this iterative search helped you.
