
fix chunking text: 'NoneType' object has no attribute 'strip' #4018

Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -133,7 +133,7 @@ jobs:
- name: Test
env:
UNS_API_KEY: ${{ secrets.UNS_API_KEY }}
-TESSERACT_VERSION : "5.4.1"
+TESSERACT_VERSION : "5.5.1"
run: |
source .venv/bin/activate
sudo apt-get update
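The step pins `TESSERACT_VERSION`, and the fixture changes further down appear to follow OCR output differences between Tesseract 5.4.1 and 5.5.1. A minimal sketch of a fail-fast check, assuming the pinned value comes from that environment variable and that the first line of `tesseract --version` output contains an `X.Y.Z` version; `parse_tesseract_version` and `is_pinned_version` are hypothetical helpers, not part of this workflow:

```python
import re


def parse_tesseract_version(version_output: str) -> str:
    """Return 'X.Y.Z' from the first line of `tesseract --version` output (hypothetical helper)."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    # Assumption: the first line looks like "tesseract 5.5.1"; a "v" prefix or build suffix is tolerated.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", lines[0])
    if match is None:
        raise ValueError(f"unrecognized version line: {lines[0]!r}")
    return ".".join(match.groups())


def is_pinned_version(version_output: str, pinned: str) -> bool:
    """True when the reported Tesseract version equals the pinned one, e.g. "5.5.1"."""
    return parse_tesseract_version(version_output) == pinned
```

A CI script could feed it the combined stdout and stderr of `subprocess.run(["tesseract", "--version"], capture_output=True, text=True)` together with `os.environ["TESSERACT_VERSION"]`, and exit non-zero on `False`, so a silent OCR-engine drift fails before fixture comparisons do.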
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,11 @@
-## 0.17.11-dev0
+## 0.17.11-dev1

### Enhancements

### Features

### Fixes
+- Fix chunking for elements with None text that has AttributeError 'NoneType' object has no attribute 'strip'.
- Invalid elements IDs are not visible in VLM output. Parent-child hierarchy is now retrieved based on unstructured element ID, instead of id injected into HTML code of element.

## 0.17.10
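The changelog entry names the failure mode: calling `.strip()` on an element whose text is `None`. The library-side change itself is not among the files shown here. A minimal sketch of the error and of one common guard, coercing `None` to an empty string before stripping; `safe_strip` and `reproduce_error` are hypothetical names:

```python
from typing import Optional


def safe_strip(text: Optional[str]) -> str:
    """Strip whitespace, treating a missing (None) text as an empty string."""
    return (text or "").strip()


def reproduce_error() -> str:
    """Return the message raised by calling .strip() directly on None."""
    text: Optional[str] = None
    try:
        text.strip()  # type: ignore[union-attr]
    except AttributeError as exc:
        return str(exc)
    return ""
```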
23 changes: 23 additions & 0 deletions test_unstructured/chunking/test_base.py
@@ -416,6 +416,20 @@ def it_can_handle_element_with_none_as_text(self):
        )
        assert pre_chunk._text == "hello"

+    def it_can_chunk_elements_with_none_text_without_error(self):
+        """Regression test for AttributeError when Image elements have None text."""
+        pre_chunk = PreChunk(
+            [Image(None), Text("hello world"), Image(None)],
+            overlap_prefix="",
+            opts=ChunkingOptions(),
+        )
+
+        # Should not raise AttributeError when generating chunks
+        chunks = list(pre_chunk.iter_chunks())
+
+        assert len(chunks) == 1
+        assert chunks[0].text == "hello world"
+
    @pytest.mark.parametrize(
        ("max_characters", "combine_text_under_n_chars", "expected_value"),
        [
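The first regression test expects a pre-chunk of `[Image(None), Text("hello world"), Image(None)]` to yield exactly one chunk whose text is `"hello world"`. A minimal sketch of that behavior, assuming pre-chunk text is built by skipping elements whose text is `None` or blank and joining the rest with a blank-line separator; `Element` and `pre_chunk_text` are hypothetical stand-ins, not the `unstructured` classes:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a document element whose text may be None."""
    category: str
    text: Optional[str]


def pre_chunk_text(elements: Iterable[Element], separator: str = "\n\n") -> str:
    """Join element texts, skipping elements whose text is None or blank."""
    parts: List[str] = []
    for element in elements:
        stripped = (element.text or "").strip()  # None-safe
        if stripped:
            parts.append(stripped)
    return separator.join(parts)
```

Dropping blank parts before joining keeps image-only elements from contributing stray separators around the surviving text.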
@@ -1026,6 +1040,15 @@ def it_computes_the_original_elements_list_to_help(self):
        # -- computation is only on first call, all chunks get exactly the same orig-elements --
        assert table_chunker._orig_elements is orig_elements

+    def it_handles_table_with_none_text_without_error(self):
+        """Regression test for AttributeError when Table elements have None text."""
+        table = Table(None) # Table with None text
+
+        # Should not raise AttributeError and should produce no chunks
+        chunks = list(_TableChunker.iter_chunks(table, "", ChunkingOptions()))
+
+        assert len(chunks) == 0
+

# ================================================================================================
# HTML SPLITTERS
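The second regression test expects `_TableChunker.iter_chunks` to produce no chunks at all for `Table(None)`. A minimal sketch of that contract, assuming a table chunker that returns early when the table text is `None` or blank and otherwise splits the text into windows of at most `max_characters`; `iter_table_chunks` and its parameter are hypothetical, and the fixed-width windowing is deliberately simplistic:

```python
from typing import Iterator, Optional


def iter_table_chunks(table_text: Optional[str], max_characters: int = 500) -> Iterator[str]:
    """Yield text windows for a table; yield nothing when its text is None or blank."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    text = (table_text or "").strip()
    if not text:
        return  # no text, so no chunks (mirrors the regression test's expectation)
    for start in range(0, len(text), max_characters):
        yield text[start : start + max_characters]
```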
@@ -7,8 +7,8 @@
</title>
</head>
<body>
-<h1 class="Title" id="33d8fd813310ae3e74efd7e17fef99df">
-a Department of the Treasury Internal Revenue Service Instructions for Form 3115 (Rev. November 1987) Application for Change in Accounting Method
+<h1 class="Title" id="9c3a63df0fa9649fd2065ebcc4922e18">
+gai) Department of the Treasury Internal Revenue Service Instructions for Form 3115 (Rev. November 1987) Application for Change in Accounting Method
</h1>
<p class="NarrativeText" id="5801c515b515aadfb7717e4c36a4cea4">
(Section references are to the Internal Revenue Code unless otherwise noted.)
@@ -28,29 +28,29 @@ <h1 class="Title" id="85af235e687b4a6537e5542a42456d25">
<p class="NarrativeText" id="8753b1907d0b40b882489a68baf3fe2c">
File this form to request a change in your accounting method, including the accounting treatment of any item. If you are requesting a change in accounting period, use Form 1128, Application for Change in Accounting Period. For more information, see Publication 538, Accounting Periods and Methods.
</p>
-<p class="NarrativeText" id="0cf9161971e9ea8feec111ff7d24f403">
-When filing Form 3115, taxpayers are reminded to determine if IRS has published a ruling or procedure dealing with the specific type of change since November 1987 (the current revision date of Form 3115),
+<p class="NarrativeText" id="7b5365f4534832bac87e1df792cf5b16">
+When filing Form 3115, taxpayers are reminded to determine if IRS has published a ruling or procedure dealing with the specific type of change since November 1987 (the current revision date of Form 3115).
</p>
<p class="NarrativeText" id="0fb8eb24db1b27f6f8b69213e3dd9b41">
Long-term contracts. —If you are required to change your method of accounting for long-term contracts under section 460, see Notice 87-61 (9/21/87), 1987-38 IRB 40, for the notification procedures that must be followed.
</p>
<p class="NarrativeText" id="7282f497b067ed1e34176cc85d46ea8e">
Other methods.—Unless the Service has published a regulation or procedure to the contrary, all other changes !n accounting methods required by the Act are automatically considered to be approved by the Commissioner. Examples of method changes automatically approved by the Commissioner are those changes required to effect: (1) the repeal of the reserve method for bad debts of taxpayers other than financial institutions (Act section 805); (2) the repeal of the installment method for sales under a revolving credit plan (Act section 812); (3) the Inclusion of income attributable to the sale or furnishing of utility services no later than the year In which the services were provided to customers (Act section 821); and (4) the repeal of the deduction for qualified discount coupons (Act section 823). Do not file Form 3115 for these changes.
</p>
-<p class="NarrativeText" id="61f76478266283c91988a108081fc02e">
-Generally, applicants must complete Section A. In addition, complete the appropriate sections (B-1 through H) for which a change Is desired.
+<p class="NarrativeText" id="9218e8a34790d23be418f5c4ffaaf54c">
+Generally, applicants must complete Section A. \n addition, complete the appropriate sections (B-1 through H) for which a change Is desired.
</p>
<p class="NarrativeText" id="b8f9f1fdeffadd34472959092459fba9">
You must give all relevant facts, including a detailed description of your present and proposed methods. You must also state the reason(s) you believe approval to make the requested change should be granted. Attach additional pages if more space is needed for explanations. Each page should show your name, address, and identifying number.
</p>
-<p class="NarrativeText" id="6055008a5485b687b614551c78a89c6e">
-State whether you desire a conference in the National Office if the Service proposes to disapprove your application.
+<p class="NarrativeText" id="b7ac9f40a0b010ca0f9a6dedba12a95c">
+State whether you desire a conference In the National Office if the Service proposes to disapprove your application.
</p>
<h1 class="Title" id="45da2e5561453f7cdfcf31c1ace13cf0">
Changes to Accounting Methods Required Under the Tax Reform Act of 1986
</h1>
-<p class="NarrativeText" id="9256e7591256b6799035172da259b839">
-Uniform capitalization rules and limitation on cash method.—If you are required to change your method of accounting under section,263A (relating to the capitalization and inclusion in inventory costs of certain expenses) or 448 (limiting the use of the cash method of accounting by certain taxpayers) as added by the Tax Reform Act of 1986 (“Act”), the change 1s treated as initiated by the taxpayer, approved by the Commissioner, and the period for taking the adjustments under section 481(a) into account will not exceed 4 years. (Hospitals required to change from the cash method under section 448 have 10 years to take the adjustrnents into account.) Complete Section A and the appropriate sections (B-1 or C and D) for which the change is required.
+<p class="NarrativeText" id="0476fb3d546e315ae90c733259812973">
+Uniform capitalization rules and limitation on cash method.—If you are required to change your method of accounting under section,263A (relating to the capitalization and inclusion in inventory costs of certain expenses) or 448 (limiting the use of the cash method of accounting by certain taxpayers) as added by the Tax Reform Act of 1986 (“Act”), the change is treated as initiated by the taxpayer, approved by the Commissioner, and the period for taking the adjustments under section 481(a) into account will not exceed 4 years. (Hospitals required to change from the cash method under section 448 have 10 years to take the adjustrnents into account.) Complete Section A and the appropriate sections (B-1 or C and D) for which the change is required.
</p>
<p class="NarrativeText" id="9951e8eac8f909df08655f3bc100a586">
Disregard the instructions under Time and Place for Filing and Late Applications. Instead, attach Form 3115 to your income tax return for the year of change; do not file it separately. Also include on a separate statement accompanying the Form 3115 the period over which the section 481(a) adjustment will be taken into account and the basis for that conclusion. Identify the automatic change being made at the top of page 1 of Form 3115 (e.g., “Automatic Change to Accrual Method—Section 448"). See Temporary Regulations sections 1.263A-1T and 1.448-1T for additional information.
@@ -76,8 +76,8 @@ <h1 class="Title" id="daacd181c8b4c9cdeaa9762e5efd3586">
<h1 class="Title" id="9bac1c8a91f637da3c6114d95239ceee">
Late Applications
</h1>
-<p class="NarrativeText" id="c92c7f4def0263141b370bf307d6bcc0">
-If your application is filed after the 180-day period, it is late. The application will be considered for processing only upon a showing of “good cause” and if it can be shown to the satisfaction of the Commissioner that granting you an extension will not jeopardize the Government's interests. For further information, see Rev, Proc. 79-63.
+<p class="NarrativeText" id="adad72fa6ed1f3d66351440221c1ad23">
+If your application is filed after the 180-day period, it 1s late. The application will be considered for processing only upon a showing of “good cause” and if it can be shown to the satisfaction of the Commissioner that granting you an extension will not jeopardize the Government's interests. For further information, see Rev, Proc. 79-63.
</p>
<h1 class="Title" id="569b780f1a01b3fe19031adfd2ff6567">
Identifying Number
@@ -118,8 +118,8 @@ <h1 class="Title" id="441fb1ede36ac4766833502b0400a14a">
<h1 class="Title" id="5a646ca8e56ece623a47079b32e62fc6">
Specific Instructions
</h1>
-<h1 class="Title" id="e0e692b1f478333e3950f8cb2483a484">
-Section A
+<h1 class="Title" id="1505240fbe441adc4acdbc867689af29">
+SectionA
</h1>
<p class="NarrativeText" id="43c45bb43eaf69131bf2392df1239ef2">
Item 5a, page 1.—“Taxable income or (loss) from operations” is to be entered before application of any net operating loss deduction under section 172(a).
@@ -166,8 +166,8 @@ <h1 class="Title" id="1f5704b56b007d890b634121c86d81ac">
<p class="NarrativeText" id="454de5bfbdcba4385a21dd6261c57d53">
The limitation on the use of the cash method (except for tax shelters) does not apply to—
</p>
-<p class="NarrativeText" id="fc1f0d4d56acd27a18ba80ab0acfb9e9">
-(1) Farming businesses.—F or this purpose, the term “farming business” 1s defined in section 263A(e)(4), but it also includes the raising, harvesting, or growing of trees to which section 263A(c)(5) applies. Notwithstanding this exception, section 447 requires certain C corporations and partnerships with a C corporation as a partner to use the accrual method.
+<p class="NarrativeText" id="d268b0c2840319e1b229673523368cae">
+(1) Farming businesses.—For this purpose, the term “farming business” 1s defined in section 263A(e)(4), but it also includes the raising, harvesting, or growing of trees to which section 263A(c)(5) applies. Notwithstanding this exception, section 447 requires certain C corporations and partnerships with a C corporation as a partner to use the accrual method.
</p>
<p class="NarrativeText" id="51dcb59cd362d0003f609fdb43fbdfdc">
(2) Qualified personal service corporations. — A “qualified personal service corporation” is any corporation: (a) substantially all of the activities of which involve the performance of services in the fields of health, law, engineering, architecture, accounting, actuarial science, performing arts, or consulting, and (b)
@@ -178,8 +178,8 @@ <h1 class="Title" id="80474543fe96478feeda72a22f019cd1">
<p class="NarrativeText" id="e4776aaec9edf7383c95941623c47ff6">
substantially all of the stock of which is owned by employees performing the services, retired employees who had performed the services, any estate of any individual who had performed the services listed above, or any person who acquired stock of the corporation as a result of the death of an employee or retiree described above if the acquisition occurred within 2 years of death.
</p>
-<p class="NarrativeText" id="5f5c402f9ebefef3ba8eabf1b5f628b2">
-(3) Entities with gross receipts of $5,000,000 or less. —To qualify for this exception, the C corporation's or partnership’s annual average gross receipts for the three years ending with the prior tax year may not exceed $5,000,000. If the corporation or partnership was not in existence for the entire 3-year period, the period of existence is used to determine whether the corporation or partnership qualifies. If any tax year in the 3-year period is a short tax year, the corporation or partnership must annualize the gross receipts by multiplying the gross receipts by 12 and dividing the result by the number of months in the short period.
+<p class="NarrativeText" id="02eb85f4c80a008b9e03744e68528aff">
+(3) Entities with gross receipts of $5,000,000 or less. —To qualify for this exception, the C corporation's or partnership’s annual average gross receipts for the three years ending with the prior tax year may not exceed $5,000,000. If the corporation or partnership was not in existence for the entire 3-year period, the period of existence is used to determine whether the corporation or partnership qualifies. If any tax year in the 3-year period is a short tax year, the corporation or partnership must annualize the gross receipts by multiplying the gross receipts by 12 and dividing the result by the number of months tn the short period.
</p>
<p class="NarrativeText" id="427e5fe33c8c181ccb93c7de11946c13">
For more information, see section 448 and Temporary Regulations section 1.448-1T.
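In the fixture hunks above, every element whose text changed also received a new `id`, while unchanged context elements kept theirs. That pattern suggests the ids are derived deterministically from element content. A minimal sketch of such a scheme, assuming the id is the first 32 hex characters of a SHA-256 digest over the text plus a page number and position; the exact fields and the name `element_id` are hypothetical:

```python
import hashlib


def element_id(text: str, page_number: int, sequence_number: int) -> str:
    """Derive a stable 32-hex-character id from element text and position (hypothetical scheme)."""
    payload = f"{text}|{page_number}|{sequence_number}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
```

Under this kind of scheme, a one-character OCR difference such as `1s` versus `is` is enough to produce a different id, which would explain why ids churn in expected-output fixtures whenever the OCR engine version changes.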