Inputfile SMILES ERROR #14

Tiger2Wings opened this issue Mar 19, 2025 · 4 comments

@Tiger2Wings

Tiger2Wings commented Mar 19, 2025

Hi, excellent job!

We have run into a problem.
When we run the command, we get this error:
ERROR (2, 10) [21, 22, 23, 24] [24NH2][21CH2][22CH2][23OH]
What does this line mean?
We believe that one SMILES in the input file, C1=CC(=C(C=C1[N+](=O)[O-])Cl)NC(=O)C2=C(C=CC(=C2)Cl)O.C(CO)N, is wrong.

Could you please modify the program so that it automatically skips problematic SMILES and saves them to error_smiles.csv?
It would help a lot.
Thank you!
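
The requested behaviour (skip rows that fail, keep them for error_smiles.csv) can be sketched in plain Python. This is only an illustration of the pattern, not how Fiora implements it: `split_rows`, `rows_to_csv`, and `toy_process` are hypothetical names, `toy_process` stands in for the real per-molecule step, and the `Name`/`SMILES` columns are assumptions.

```python
import csv
import io


def split_rows(rows, process):
    """Apply `process` to every row and separate the rows that raise."""
    ok_rows, error_rows = [], []
    for row in rows:
        try:
            process(row)
        except Exception as exc:  # deliberately broad: any failure skips the row
            error_rows.append({**row, "error": str(exc)})
        else:
            ok_rows.append(row)
    return ok_rows, error_rows


def rows_to_csv(rows, fieldnames):
    """Serialise rows to CSV text, e.g. for writing to error_smiles.csv."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def toy_process(row):
    # Hypothetical stand-in for the real per-molecule step.
    if not row["SMILES"]:
        raise ValueError("empty SMILES")


rows = [{"Name": "a", "SMILES": "CCO"}, {"Name": "b", "SMILES": ""}]
ok_rows, error_rows = split_rows(rows, toy_process)
error_csv = rows_to_csv(error_rows, ["Name", "SMILES", "error"])
```

Recording the exception message next to each skipped row makes it easier to see afterwards why a given SMILES was dropped.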

@ch4perone
Collaborator

Hi, thanks for the remark.
Which version are you using? In the latest release, v0.1.2, I implemented skipping of SMILES strings that cause input reading errors, and the --debug option prints the SMILES that cause problems.
Can you confirm that you still encounter this error in v0.1.2? Could you also give me the command-line output (in debug mode), so I can see at which point it crashes?

@Tiger2Wings
Author

Tiger2Wings commented Apr 19, 2025

The md5sum of fiora-main/README.md is "c4e14e6815450ce09005264295aef554". Does that mean the version is 0.1.2?

The Instrument_type values are ["HCD", "Q-TOF", "IT-FT/ion trap with FTMS", "IT/ion trap"]. Are other types available?
If the Instrument_type is "ABSCIEX", which is not one of ["HCD", "Q-TOF", "IT-FT/ion trap with FTMS", "IT/ion trap"], the command "fiora-predict" still works. So what is the default Instrument_type when the program runs?

The input file is attached as error1.csv.

With the command "fiora-predict -i error1.csv -o e1.mgf --debug", how can it detect invalid SMILES?
The command prints:

Running fiora prediction with the following parameters: Namespace(input='error1.csv', output='e1.mgf', model='default', dev='cpu', min_prob=0.001, rt=False, ccs=False, annotation=False, debug=True)

-----Model-----
Fiora OS v0.1.0
---------------

Disclaimer: No prediction software is perfect. This is an early open-source model. Use with caution.
ERROR (0, 1) [22] [22Cl-]
Traceback (most recent call last):
  File "/home/ubuntu/.conda/envs/fiora/bin/fiora-predict", line 196, in <module>
    main()
  File "/home/ubuntu/.conda/envs/fiora/bin/fiora-predict", line 179, in main
    df, invalid_df = build_metabolites(df, model.model_params)
  File "/home/ubuntu/.conda/envs/fiora/bin/fiora-predict", line 105, in build_metabolites
    df["Metabolite"].apply(lambda x: x.fragment_MOL(depth=1))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/series.py", line 4924, in apply
    ).apply()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/apply.py", line 1427, in apply
    return self.apply_standard()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "/home/ubuntu/.conda/envs/fiora/bin/fiora-predict", line 105, in <lambda>
    df["Metabolite"].apply(lambda x: x.fragment_MOL(depth=1))
  File "/home/ubuntu/.conda/envs/fiora/lib/python3.10/site-packages/fiora/MOL/Metabolite.py", line 210, in fragment_MOL
    self.fragmentation_tree.build_fragmentation_tree(self.MOL, self.edges_as_tuples, depth=depth)
  File "/home/ubuntu/.conda/envs/fiora/lib/python3.10/site-packages/fiora/MOL/FragmentationTree.py", line 132, in build_fragmentation_tree
    _, fragments = self.create_Fragments(mol, i, j, original_mol_isotopes=mol_isotopes)
  File "/home/ubuntu/.conda/envs/fiora/lib/python3.10/site-packages/fiora/MOL/FragmentationTree.py", line 173, in create_Fragments
    return new_mol, [Fragment(m, edge=(int(i), int(j)), isotope_labels=original_mol_isotopes) for m in fragment_mols]
  File "/home/ubuntu/.conda/envs/fiora/lib/python3.10/site-packages/fiora/MOL/FragmentationTree.py", line 173, in <listcomp>
    return new_mol, [Fragment(m, edge=(int(i), int(j)), isotope_labels=original_mol_isotopes) for m in fragment_mols]
  File "/home/ubuntu/.conda/envs/fiora/lib/python3.10/site-packages/fiora/MOL/FragmentationTree.py", line 27, in __init__
    raise ValueError("Unidentified edge in fragment")
ValueError: Unidentified edge in fragment

@ch4perone
Collaborator

ch4perone commented Apr 22, 2025

I think the program has problems reading the "." symbol in SMILES, although the error occurs much later, when the molecule is fragmented. I will work on a fix soon. For now, I recommend removing every SMILES that contains a dot "." from the csv file. I hope this already helps.
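
A minimal sketch of this workaround, using only the standard library. The function name `split_by_dot` and the column name `SMILES` are assumptions for illustration. It is a plain string test, so it would also drop the rare SMILES in which dot-separated parts are joined by ring-closure digits.

```python
import csv
import io


def split_by_dot(csv_text, smiles_column="SMILES"):
    """Return (kept_rows, skipped_rows); skipped rows have a '.' in the SMILES."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept, skipped = [], []
    for row in reader:
        # A dot separates disconnected components in SMILES notation.
        (skipped if "." in row[smiles_column] else kept).append(row)
    return kept, skipped


# Made-up example rows, not taken from error1.csv.
example = (
    "Name,SMILES\n"
    "ethanol,CCO\n"
    "salt,CC(=O)O.C(CO)N\n"
)
kept, skipped = split_by_dot(example)
```

The skipped rows can then be written to a separate file so that nothing is lost silently.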

Regarding the instrument type: yours will automatically be flagged as the "Others" instrument type for model input. I recommend using "HCD" instead, since the OS model was predominantly trained on Orbitrap data. You should get better results, even if it's not technically correct.
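
If you want to apply that recommendation before running the prediction, something along these lines could work. It is only a sketch: the set below contains the four types listed earlier in this thread, and `normalise_instrument_type` is a hypothetical helper, not part of Fiora.

```python
# The four instrument types named in this thread (assumed to be the full list).
KNOWN_INSTRUMENT_TYPES = {"HCD", "Q-TOF", "IT-FT/ion trap with FTMS", "IT/ion trap"}


def normalise_instrument_type(value, fallback="HCD"):
    """Return `value` if it is a listed type, otherwise the fallback."""
    return value if value in KNOWN_INSTRUMENT_TYPES else fallback


rows = [{"Instrument_type": "ABSCIEX"}, {"Instrument_type": "Q-TOF"}]
for row in rows:
    row["Instrument_type"] = normalise_instrument_type(row["Instrument_type"])
```

Rows that already use a listed type are left unchanged; only unlisted values such as "ABSCIEX" are rewritten to "HCD".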

@Tiger2Wings
Author

Tiger2Wings commented Apr 23, 2025

I have just written a script that filters the SMILES and then makes predictions.
fiora_check1.py
(File type not allowed: .py, so just rename the uploaded .txt to .py.)

It looks like the error occurs when a molecule is NOT singly connected, i.e. when the SMILES describes more than one fragment.

from rdkit import Chem

def is_single_connected(smiles):
    # True only if the SMILES parses and consists of exactly one fragment
    mol = Chem.MolFromSmiles(smiles)
    return len(Chem.GetMolFrags(mol, asMols=True)) == 1 if mol else False

fiora_check1.txt
