Words in DocumentLine store polygons as lists of floats instead of lists of Points #39062

Open, reported by @dylwil3

  • Package Name: azure.ai.formrecognizer
  • Package Version: 3.3.3
  • Operating System: macOS
  • Python Version: 3.12

Describe the bug

The words returned by `DocumentLine.get_words()` seem to store their polygons as lists of floats rather than as lists of `Point` objects.

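```python
from typing import reveal_type  # runtime reveal_type needs Python 3.11+

from azure.ai.formrecognizer import AnalyzeResult


def show_line_word_polygons(result: AnalyzeResult) -> None:
    line = result.pages[0].lines[0]  # any DocumentLine from the result
    for w in line.get_words():
        for p in w.polygon:
            # A type checker reveals `Point`, but at runtime this reports `float`
            reveal_type(p)
```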

This is in contrast to iterating over the words in the `words` field of a `DocumentPage`, where the polygons are indeed sequences of `Point` objects.
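Until this is fixed, one possible workaround is to normalize polygons before using them. The sketch below assumes the floats are a flat `[x1, y1, x2, y2, ...]` coordinate sequence (the way the service's REST response encodes polygons). `to_points` is a hypothetical helper, and `Point` here is a local stand-in `NamedTuple`, not the SDK's own class.

```python
from typing import List, NamedTuple, Sequence, Union


class Point(NamedTuple):
    """Assumption: local stand-in for the SDK's Point (an x/y pair)."""
    x: float
    y: float


def to_points(polygon: Sequence[Union[float, Point]]) -> List[Point]:
    """Return `polygon` as a list of Points (hypothetical workaround helper).

    If every element is a number, the input is treated as a flat
    [x1, y1, x2, y2, ...] list and consecutive values are paired up.
    Otherwise each element is assumed to already be an (x, y) pair.
    """
    items = list(polygon)
    if all(isinstance(p, (int, float)) for p in items):
        if len(items) % 2 != 0:
            raise ValueError("a flat polygon must have an even number of values")
        return [
            Point(float(items[i]), float(items[i + 1]))
            for i in range(0, len(items), 2)
        ]
    # Already pairs (e.g. SDK Points, which are named tuples): copy through
    return [Point(float(p[0]), float(p[1])) for p in items]
```

Because the helper accepts both shapes, it can be applied uniformly, e.g. `to_points(word.polygon)`, whether the word came from `DocumentPage.words` or from `DocumentLine.get_words()`.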

To Reproduce
Steps to reproduce the behavior:

  1. Run OCR on PDF bytes with the `prebuilt-read` model
  2. Examine the words of a `DocumentLine` in the result (via `get_words()`)
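Steps 1 and 2 can be sketched as follows, assuming the `DocumentAnalysisClient.begin_analyze_document("prebuilt-read", ...)` entry point of azure-ai-formrecognizer 3.3. The endpoint, key, and file path are placeholders, and `analyze_pdf` and `polygon_element_types` are hypothetical helpers. The second one records the runtime type names of polygon elements for words reached through `DocumentPage.words` and through `DocumentLine.get_words()`, so the two paths can be compared directly.

```python
from typing import Any, Dict, Set


def polygon_element_types(result: Any) -> Dict[str, Set[str]]:
    """Collect runtime type names of polygon elements, per access path.

    `result` is expected to look like an AnalyzeResult: it has `pages`,
    each page has `words` and `lines`, and each line has `get_words()`.
    """
    found: Dict[str, Set[str]] = {"page.words": set(), "line.get_words()": set()}
    for page in result.pages:
        for word in page.words:
            found["page.words"].update(type(p).__name__ for p in word.polygon or [])
        for line in page.lines:
            for word in line.get_words():
                found["line.get_words()"].update(
                    type(p).__name__ for p in word.polygon or []
                )
    return found


def analyze_pdf(path: str, endpoint: str, key: str) -> Any:
    """Run the prebuilt-read model on a local PDF (needs azure-ai-formrecognizer)."""
    # Imported lazily so the inspection helper above works without the SDK
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    poller = client.begin_analyze_document("prebuilt-read", document=pdf_bytes)
    return poller.result()
```

For example, call `polygon_element_types(analyze_pdf("sample.pdf", endpoint, key))`. If the bug reproduces, the `"page.words"` set should contain `Point` while the `"line.get_words()"` set contains `float`.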

Tangentially related to #39031

Labels: Client, Cognitive - Form Recognizer, Service Attention, customer-reported, needs-team-attention, question
