Metadata FBC3 #1440
…notationList to serve as alternative to StandardizedAnnotationStore, but without ownership.
…fiersAlias enum to handle common sets of qualifiers.
…vides a powerful interface to select resources.
…t cases, improved implementation. Almost all tests are still passing, except some that seem to be caused by lacking implementations in libsbml.
…c-20304). Only remaining failing test is missing implementation of saving nested annotations of KeyValuePairs (fbc-21502).
Fantastic, thank you for picking up the work. I'll need to reacquaint myself with the changes as it has been too long ago, but I will try to review soon so we can finally publish this work.
Thanks!
…RLs and compact urls with integrated namespace) should all work correctly now. Changed Resource equality to namespace+identifier equality, instead of URI equality. Added tests for these cases.
…identifiers.org identifiers as URI. Added the appropriate tests for that. Made hashing method consistent with equality method.
…or standardized.py.
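The commits above switch `Resource` equality from URI equality to namespace+identifier equality and keep hashing consistent with it. A minimal sketch of that idea follows; `ResourceSketch` and its attributes are hypothetical stand-ins and not the class from this PR, and the example URIs are only illustrative.

```python
class ResourceSketch:
    """Hypothetical stand-in for a resource; not the PR's Resource class."""

    def __init__(self, namespace, identifier, uri=None):
        self.namespace = namespace
        self.identifier = identifier
        self.uri = uri  # kept for reference, ignored by __eq__ and __hash__

    def _key(self):
        # Single source of truth for both equality and hashing.
        return (self.namespace, self.identifier)

    def __eq__(self, other):
        if not isinstance(other, ResourceSketch):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal objects must hash equally, so hash the same key.
        return hash(self._key())


old_style = ResourceSketch(
    "chebi", "CHEBI:17234", "http://identifiers.org/chebi/CHEBI:17234"
)
other_uri = ResourceSketch(
    "chebi", "CHEBI:17234", "https://identifiers.org/CHEBI:17234"
)
unique_resources = {old_style, other_uri}
```

Because both objects share the same namespace and identifier, they compare equal and collapse to a single entry in a set, even though their URIs differ.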
For reference, I fixed/started fixing the following:
I opened an issue in libsbml and got a response (sbmlteam/libsbml#429).
Looking at this issue in the libsbml repo, it should not be an issue: sbmlteam/libsbml#360. At least from a technical point of view, mixing FBC3 with L3 v2 Core should be fine.
Added some, will add more.
I took a look at the Resource/identifiers.org logic and found out that the problem is slightly more complex. There are two types of URLs, an old variant: … Another issue with the two types of URLs is that their patterns overlap. An identifier can have a colon (:), which is the case for biocyc. The biocyc identifiers.org URL … There are two instances where the compact identifier does not 1:1 match the … But all in all, the new identifiers.org interpretation function should handle almost all cases correctly. In other cases, you can always set …
This is fixed now.
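The overlap between the two identifiers.org URL styles described above can be illustrated with a small sketch. It assumes the older style separates namespace and identifier with a slash and the compact style with a colon, and it decides by whichever separator comes first. `split_identifiers_org` is a hypothetical helper, not the interpretation function from this PR; it ignores the special cases mentioned above, and the example URLs are illustrative.

```python
import re
from typing import Optional, Tuple

# The namespace is everything up to the first "/" or ":" after the host.
# Whatever follows that first separator is the identifier, which may itself
# contain colons (as with biocyc) or slashes.
_IDENTIFIERS_ORG = re.compile(
    r"^https?://identifiers\.org/"
    r"(?P<namespace>[^/:]+)[/:](?P<identifier>.+)$"
)


def split_identifiers_org(uri: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, identifier), or None for other URIs."""
    match = _IDENTIFIERS_ORG.match(uri)
    if match is None:
        return None
    return match.group("namespace"), match.group("identifier")


slash_style = split_identifiers_org(
    "https://identifiers.org/biocyc/ECOLI:CYT-D-UBIOX-CPLX"
)
colon_style = split_identifiers_org(
    "https://identifiers.org/biocyc:ECOLI:CYT-D-UBIOX-CPLX"
)
not_identifiers_org = split_identifiers_org("https://example.org/biocyc/X")
```

Splitting on the first separator only is what keeps the colon inside the biocyc identifier from being mistaken for the namespace boundary.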
…ons to speed up copying of models.
…s not being written to SBML. Updated tests. Added xfail test for nested kvps.
…n adapted accordingly and some tests are marked for skipping when run with python 3.8.
The latest commits addressed the following issues:
The new metadata implementation still makes cobrapy slower, which is expected, since extra complexity is added to each object. The comparison between the …
The worst performance hit still happens for copying-related benchmarks, maybe that can be optimized further. The highest increase in runtime happens for …
…overhead. Metadata should not have any recurring (identical) objects, so those checks are not necessary.
This is shaping up very nicely. Thanks!
Not an issue from my side. It's not a massive change and we are talking about a fairly fast operation already (~1s). Does it affect SBML parsing as well? I think we don't have a benchmark for this but it might be good to know how long it takes to read Recon3 for example with the old and new version. The review might take a while because so many files got changed, but the following would help me at least:
I am personally really excited for the custom annotations because that would really fix some pain points we have in MICOM.
@cdiener Thank you, great to hear!
Yes, it will. I would estimate that for files with a lot of annotations it will be in the same range (let's say >2x slower). I will look at adding a benchmark for this. Optionally, we could add a keyword argument (e.g. …
Yeah, that's understandable.
I added a jupyter notebook with an example workflow to the documentation, so that this is a chapter/page of the docs. I will look at that again and provide some more scripts so that you can play around with the new metadata system.
It should be 99% backwards compatible (as in: old scripts should almost always still work). The … The new save formats (SBML L3 V2 Core + FBC3 and JSON schema 2) can of course not be read reliably by older cobrapy versions.
Great! I am still a proponent of renaming … I saw you ran the CI/CD again and it failed for some environments. The failing tests are all …
Updated version of the SBML FBC version 3 implementation #1237 by @akaviaLab and earlier #988 by @Hemant27031999.
Implements:
- … `metadata` attribute.
- … `annotation` attribute. This interface behaves like a dictionary for backward compatibility and allows access to an object's `StandardizedAnnotation`s.

For reference, the FBC3 specification can be found here: https://github.com/sbmlteam/sbml-specifications/blob/develop/sbml-level-3/version-1/fbc/spec/sbml-fbc-version-3-release-1.pdf
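To illustrate what a dictionary-like compatibility interface over structured annotations could look like, here is a minimal read-only sketch. `AnnotationViewSketch` and the flat list of `(namespace, identifier)` pairs are assumptions made for this example; the actual `annotation` class in this PR is more involved and also supports writing.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, List, Tuple


class AnnotationViewSketch(Mapping):
    """Read-only dict-like view over (namespace, identifier) pairs."""

    def __init__(self, resources: List[Tuple[str, str]]) -> None:
        # Keep a reference instead of a copy, so later changes to the
        # underlying structured data are visible through the view.
        self._resources = resources

    def _grouped(self) -> Dict[str, List[str]]:
        grouped: Dict[str, List[str]] = {}
        for namespace, identifier in self._resources:
            grouped.setdefault(namespace, []).append(identifier)
        return grouped

    def __getitem__(self, namespace: str) -> List[str]:
        return self._grouped()[namespace]

    def __iter__(self) -> Iterator[str]:
        return iter(self._grouped())

    def __len__(self) -> int:
        return len(self._grouped())


structured = [("chebi", "CHEBI:17234"), ("kegg.compound", "C00031")]
annotation = AnnotationViewSketch(structured)
# A change to the structured data shows up in the dict-like view.
structured.append(("chebi", "CHEBI:4167"))
```

Old-style code can then read `annotation["chebi"]` or call `dict(annotation)` without knowing about the structured representation behind it.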
Major differences with the two previous pull requests:
- `CVTerm` has been renamed to `StandardizedAnnotation` and `KeyValuePair` to `CustomAnnotation`.
- … `StandardizedAnnotationStore` and `CustomAnnotationStore`, respectively. The annotations keep track of their parent and can be removed using `annotation.remove_from_parent()`, similar to how metabolites etc. do this. This is also implemented for `Resource`s with respect to their parent `StandardizedAnnotation`. There is also a `StandardizedAnnotationList`, which is basically the same class as `StandardizedAnnotationStore`, but it does not change the ownership/parent-child relationship. So this class can be used as a return type for a selection of existing annotations (a bit like a view).
- `StandardizedAnnotation` has been simplified a bit. Previously, it would contain a qualifier and a list of `ExternalResource`s, which could contain resources and nested annotations. The external resources class has now been removed and `StandardizedAnnotation`s directly contain resources and nested annotations. A `Resource` is now its own class that handles interpreting URIs.
- `CustomAnnotation`s (key-value pairs) do not prominently feature a `name` and `id` anymore, since I think they make them more confusing. They can still be set, but they are not part of the `__init__` method etc. A `CustomResource` now also inherits from the `Object` class, so it can have its own metadata. The FBC3 specification is a bit vague here, since it is not explicitly mentioned that `KeyValuePair` inherits from `SBase`, but it can have a name and id, `fbc-21501` mentions that it can have an `sboTerm` and `fbc-21502` mentions that it can have notes and annotations/metadata, which is basically what is implemented in the cobrapy `Object` class.
- `Qualifier` names are now also more readable (e.g. `Biological_is` vs. `bqb_is`, `Modelling_isDerivedFrom` vs. `bqm_isDerivedFrom`).
- … `metadata` attribute, instead of `annotation`. The `annotation` attribute is now a class instance that is basically a dict-like view of the `metadata.standardized` standardized annotations (SBML CVTerms).
- … `to_records` method, of which the output can directly be used in pandas (demonstrated in the metadata jupyter notebook).
- … `annotation` interface), this can probably be expanded/improved.

Outstanding issues:
- … `CustomAnnotation`). As mentioned above, it is explicitly stated that `KeyValuePair`, like an `SBase` instance, can have annotations. Reading files that have this does work, but saving does not, since this was not implemented in `libsbml`. I will open an issue in the libsbml repo addressing this.

Smaller issues:
- … `CustomAnnotation`. An example could be the value of `functional` for a gene, as proposed in issue "[BUG] Gene.functional attribute not saved in SBML model" #1422. One could think of storing its value under the key 'cobra_gene_functional' for example. Questions would be: should these values still be available under `metadata.custom`? And what happens when that value changes? Should it have a way of storing the datatype (e.g. 'bool:true') and should that be implemented as part of the `CustomAnnotation` class? This is all not really urgent, but if some changes are needed to the custom annotation interface, we might want to make them now.
- … `Resource` class name, since it's a bit vague, but I wasn't sure what else to name it. In previous discussions it was also mentioned that `Author` may be a better name than `Creator`. The `StandardizedAnnotation` and `CustomAnnotation` class names are quite long, so we could try to rename those too. Maybe `Annotation` instead of `StandardizedAnnotation` and `Property`/`Symbol`/`Attribute`/`Parameter`/`Flag`/... instead of `CustomAnnotation`.

Fixes issues:
- #810: `sbml_info`, which holds basic SBML document information (packages, level, version, notes, annotations attached to the SBML component, etc.), is now written to JSON.
- #684: The complete metadata structure has been redesigned. A compatibility interface remains in the `annotation` attribute of each object, whilst all structured metadata can be accessed through the `metadata` attribute. I've split this up, since changing values through the old interface may change the structure/hierarchy or qualifiers. Having separate `annotation` and `metadata` attributes ensures that changing anything in `metadata` does not have a destructive effect. Additionally, old code and old file formats (e.g. json schema v1) can just write to and read from the `annotation` interface as if nothing has changed, while new code and formats (e.g. json schema v2) use the `metadata` attribute.
- #937: As mentioned above, using the `metadata` interface should not lead to loss of any relation information.

Let me know what you think of the changes. In the meantime, I will try to fix some of the outstanding issues.
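As a rough illustration of the parent tracking described under the major differences, the sketch below shows an owning container next to a non-owning selection. All class and method names except `remove_from_parent` are hypothetical, and a plain Python list stands in for the role that `StandardizedAnnotationList` plays in this PR.

```python
from typing import List, Optional


class AnnotationSketch:
    """Hypothetical annotation that remembers which store owns it."""

    def __init__(self, qualifier: str) -> None:
        self.qualifier = qualifier
        self.parent: Optional["OwningStoreSketch"] = None

    def remove_from_parent(self) -> None:
        if self.parent is not None:
            self.parent.remove(self)


class OwningStoreSketch:
    """Container that sets and clears the parent of its items."""

    def __init__(self) -> None:
        self._items: List[AnnotationSketch] = []

    def add(self, annotation: AnnotationSketch) -> None:
        # An annotation has one owner at most, so detach it first.
        annotation.remove_from_parent()
        annotation.parent = self
        self._items.append(annotation)

    def remove(self, annotation: AnnotationSketch) -> None:
        self._items.remove(annotation)
        annotation.parent = None

    def select(self, qualifier: str) -> List[AnnotationSketch]:
        # Non-owning selection: the returned list does not touch .parent.
        return [a for a in self._items if a.qualifier == qualifier]

    def __len__(self) -> int:
        return len(self._items)


store = OwningStoreSketch()
first = AnnotationSketch("Biological_is")
second = AnnotationSketch("Biological_isVersionOf")
store.add(first)
store.add(second)
selection = store.select("Biological_is")
owner_while_selected = selection[0].parent
first.remove_from_parent()
```

The selection still holds `first` after removal, but the store no longer does, which is the view-like behaviour a non-owning list is meant to provide.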