Is it ok to throw non-PyPDF2 exceptions (e.g. ValueError) on malformed PDFs? #584
Comments
This exception is raised when the

    import io

    from PyPDF2 import PdfFileReader


    def test_startxref_zero():
        strict = False
        should_fail = True
        with_prev_0 = False
        # Template for a minimal PDF; the %-placeholders are filled in below.
        pdf_data = (
            b"%%PDF-1.7\n"
            b"1 0 obj << /Count 1 /Kids [4 0 R] /Type /Pages >> endobj\n"
            b"2 0 obj << >> endobj\n"
            b"3 0 obj << >> endobj\n"
            b"4 0 obj << /Contents 3 0 R /CropBox [0.0 0.0 2550.0 3508.0]"
            b" /MediaBox [0.0 0.0 2550.0 3508.0] /Parent 1 0 R"
            b" /Resources << /Font << >> >>"
            b" /Rotate 0 /Type /Page >> endobj\n"
            b"5 0 obj << /Pages 1 0 R /Type /Catalog >> endobj\n"
            b"xref 1 5\n"
            b"%010d 00000 n\n"
            b"%010d 00000 n\n"
            b"%010d 00000 n\n"
            b"%010d 00000 n\n"
            b"%010d 00000 n\n"
            b"trailer << %s/Root 5 0 R /Size 6 >>\n"
            # No byte offset follows "startxref": this is the malformed part.
            b"startxref\n"
            b"%%%%EOF"
        )
        # Fill the xref entries with object offsets and the optional /Prev key.
        pdf_data = pdf_data % (
            pdf_data.find(b"1 0 obj"),
            pdf_data.find(b"2 0 obj"),
            pdf_data.find(b"3 0 obj"),
            pdf_data.find(b"4 0 obj"),
            pdf_data.find(b"5 0 obj"),
            b"/Prev 0 " if with_prev_0 else b"",
        )
        pdf_stream = io.BytesIO(pdf_data)
        PdfFileReader(pdf_stream, strict=strict)
A file with no value after startxref, followed directly by %%EOF, seems very odd. The test file cannot be read with Acrobat. Is it normal to accept such a corrupted file?
The bug here is that PyPDF2 is raising a
As long as an exception is raised during PDF loading, it means that the loading failed. Converting the exception into another one is not very helpful. If you put the loading code in a try/except, you can convert or handle it as you wish. To clarify my point: the test.pdf provided can hardly be viewed as a PDF, and the example with no value after startxref seems very unlikely to be produced by any software. My proposal would be to close this issue as not relevant.
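For readers following along, the try/except approach described above might look roughly like the sketch below. It is only an illustration: `PdfLoadError` and `load_pdf` are hypothetical names that are not part of PyPDF2, and the loader is passed in as a callable so that the snippet does not assume any particular PyPDF2 version.

```python
class PdfLoadError(Exception):
    """Hypothetical application-level error; not part of PyPDF2."""


def load_pdf(loader, source, expected=(ValueError,)):
    """Call loader(source) and convert the listed exception types.

    `loader` is any callable that parses a PDF, for example
    `lambda s: PdfFileReader(s, strict=False)`.
    `expected` lists the exception types the caller chooses to treat
    as "malformed input"; this is an assumption, not a documented contract.
    """
    try:
        return loader(source)
    except expected as exc:
        # Chain the original exception so its traceback is preserved.
        raise PdfLoadError(f"could not load PDF: {exc}") from exc
```

With PyPDF2 installed, a caller could pass `expected=(PdfReadError, ValueError)`, with `PdfReadError` imported from `PyPDF2.errors`. The drawback raised in this thread still applies: the caller has to know in advance which non-PyPDF2 exception types can escape.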
Are you implying that the correct way to use PyPDF2 is to wrap calls to
Let me share some thoughts
I partially agree here: exceptions that are expected should be documented. Throwing a ValueError seems fine to me, but it should be part of the docs (on Read the Docs).
@MartinThoma,
The issue is not clearly a bug to me, but the expected behavior / good behavior isn't clear either. For this reason, I've moved this to a discussion: #1210
Edit: this code was adjusted for PyPDF2==2.9.0.
Running the following code with the latest PyPI version of PyPDF2 on test.pdf results in an unexpected ValueError: