Thanks for the link. As far as I understand, the Arlington PDF model is to PDF files what XSD is to XML files. This means that it should enforce strict compliance with the standard itself and thus (from my experience) fail on quite a few PDF files. Regarding pypdf, I do not see the real benefit of this. We can already create PDF files to some extent, although building objects from scratch quickly becomes quite complex (and tends to be out of scope for pypdf IMHO). Defining a mandatory dependency on the model itself would further restrict the license of pypdf, as the model definitions are subject to Apache-2.0, which we should avoid, apart from the fact that this would most likely require larger rewrites. TL;DR: I do not see any real benefit of the Arlington model for pypdf.
I honestly do not get the rationale/goal of this sentence. What are you trying to achieve? PDF analysis can be done on any PDF file in conjunction with the official specification (this most likely is what most developers with experience of the PDF internals - including myself - have done at some point and/or are still doing on a regular basis) and using the model (or developing with it) already requires proper experience (from their README):
-
https://github.com/pdf-association/arlington-pdf-model/
The model is specification derived, using structured data which is machine and human readable.
It has a comprehensive definition of every PDF object:
The model has applications for:
PDF-Days-2021-Arlington-PDF-model.pdf
So there are potential use cases for pypdf.
More broadly, the model + pypdf could be used to create PDFs.
Something like starting with Catalog.tsv, then either selecting a value from the PossibleValues column or repeating the process with the table referenced in the Link column. There are probably lots of complications, such as xref and similar things.
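To make the idea a bit more concrete, here is a minimal sketch of that walk in plain Python. It makes several assumptions: the two tables in `SAMPLE_TABLES` are hypothetical, heavily simplified stand-ins (not real model data) with only the `Key`, `PossibleValues` and `Link` columns, cells are assumed to hold bracketed comma-separated lists, and the helper names `split_list` and `build` are made up for illustration. It only produces a nested Python dict; turning that into actual pypdf objects, and handling xref and the like, is left out.

```python
import csv
import io

# Hypothetical, heavily simplified stand-ins for Arlington TSV files.
# The real files have more columns and richer syntax; this is not model data.
SAMPLE_TABLES = {
    "Catalog": "Key\tPossibleValues\tLink\n"
               "Type\t[Catalog]\t\n"
               "Pages\t\t[PageTreeNodeRoot]\n",
    "PageTreeNodeRoot": "Key\tPossibleValues\tLink\n"
                        "Type\t[Pages]\t\n"
                        "Count\t\t\n",
}


def split_list(cell):
    """Turn a cell such as '[A,B]' into ['A', 'B']; an empty cell gives []."""
    inner = cell.strip().strip("[]")
    return [item for item in inner.split(",") if item]


def build(table_name, tables, seen=()):
    """Walk one table: take the first PossibleValues entry, else follow Link."""
    if table_name in seen:  # Avoid endless recursion on circular links.
        return None
    rows = csv.DictReader(io.StringIO(tables[table_name]), delimiter="\t")
    result = {}
    for row in rows:
        values = split_list(row["PossibleValues"])
        links = split_list(row["Link"])
        if values:
            result[row["Key"]] = values[0]
        elif links:
            result[row["Key"]] = build(links[0], tables, seen + (table_name,))
        else:
            result[row["Key"]] = None  # Nothing to derive; must be supplied.
    return result


skeleton = build("Catalog", SAMPLE_TABLES)
print(skeleton)
```

Always taking the first entry keeps the sketch deterministic; picking randomly among the possible values or links would be one way to generate varied test files.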
Comments (and hopefully code!) are welcome on how the model and pypdf could be used to create PDFs. The use case would be PDF analysis, so the text content could, for example, be lorem ipsum.