Closed
Description
While extracting text from a PDF with pypdf, I get the following error:
TypeError: unsupported operand type(s) for /: 'IndirectObject' and 'int'
Environment
Which environment were you using when you encountered the problem?
macOS-10.16-x86_64-i386-64bit
pypdf==3.17.0, crypt_provider=('cryptography', '37.0.4'), PIL=9.0.1
It happens both locally and during Azure deployment.
Code + PDF
This is a minimal, complete example that shows the issue:
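from pypdf import PdfReader
reader = PdfReader(file)
pages = reader.pages
text = [page.extract_text() for page in pages]  # the TypeError is raised here, per the traceback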
Unfortunately, I cannot share the document that causes the problem because it contains sensitive information, and I could not reproduce the issue with other documents. However, the following adjustment after line 89 of pypdf/_cmap.py (the file named in the traceback) solves the problem:
import pypdf
...
sp_width = compute_space_width(ft, sp, space_width)
# If the width is an indirect reference, resolve it to the referenced object before dividing.
if isinstance(sp_width, pypdf.generic.IndirectObject):
    sp_width = sp_width.get_object()
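The idea behind this adjustment can be sketched without pypdf installed. Note that `StubIndirectObject`, `resolve`, and `half_space_width` are hypothetical names made up for illustration; only `get_object()` mirrors the method used in the adjustment above, and the stub simply stores its target instead of looking it up in a PDF.

```python
class StubIndirectObject:
    """Hypothetical stand-in for pypdf's IndirectObject (only get_object is modelled)."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        # In pypdf this resolves the reference; the stub just returns the stored target.
        return self._target


def resolve(value):
    """Return the referenced object for a stub reference, else the value unchanged."""
    if isinstance(value, StubIndirectObject):
        return value.get_object()
    return value


def half_space_width(sp_width):
    """Mirror the failing expression float(sp_width / 2), resolving the reference first."""
    return float(resolve(sp_width) / 2)


print(half_space_width(500))                      # plain number: 250.0
print(half_space_width(StubIndirectObject(500)))  # reference to a number: 250.0
```

Resolving before dividing leaves plain numbers untouched, so the change should only affect documents where the width is stored as a reference.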
Traceback
This is the complete Traceback I see:
  File "..././scripts/prepdocs.py", line 262, in <module>
    loop.run_until_complete(main(file_strategy, azd_credential, args))
  File ".../opt/anaconda3/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "..././scripts/prepdocs.py", line 137, in main
    await strategy.run(search_info)
  File ".../scripts/prepdocslib/filestrategy.py", line 58, in run
    pages = [page async for page in self.pdf_parser.parse(content=file.content)]
  File ".../scripts/prepdocslib/filestrategy.py", line 58, in <listcomp>
    pages = [page async for page in self.pdf_parser.parse(content=file.content)]
  File ".../scripts/prepdocslib/pdfparser.py", line 52, in parse
    page_text = p.extract_text()
  File ".../scripts/.venv/lib/python3.9/site-packages/pypdf/_page.py", line 2284, in extract_text
    return self._extract_text(
  File ".../scripts/.venv/lib/python3.9/site-packages/pypdf/_page.py", line 1903, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
  File ".../scripts/.venv/lib/python3.9/site-packages/pypdf/_cmap.py", line 29, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
  File ".../scripts/.venv/lib/python3.9/site-packages/pypdf/_cmap.py", line 93, in build_char_map_from_dict
    float(sp_width / 2),
TypeError: unsupported operand type(s) for /: 'IndirectObject' and 'int'
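Since the PDF itself cannot be shared, the last frame can at least be illustrated in isolation. The class below is a hypothetical stand-in, not pypdf's real `IndirectObject`. It assumes only that the object defines no division operator, which is what the error message implies for pypdf 3.17.0.

```python
class IndirectObject:
    """Hypothetical stand-in: holds a target but defines no arithmetic operators."""

    def __init__(self, target):
        self.target = target


sp_width = IndirectObject(500)

try:
    float(sp_width / 2)  # same expression as in the last traceback frame
except TypeError as exc:
    message = str(exc)

print(message)
# unsupported operand type(s) for /: 'IndirectObject' and 'int'
```

This suggests the affected document stores the value behind `sp_width` as a reference rather than as a direct number, which would also explain why other documents do not trigger the error.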