Bug description
When using TikaDocumentReader, I often encountered the following error:
```java
Caused by: java.lang.StackOverflowError: null
	at java.base/java.util.regex.Pattern$Caret.match(Pattern.java:3896)
	at java.base/java.util.regex.Pattern$Curly.match1(Pattern.java:4597)
	at java.base/java.util.regex.Pattern$Curly.match(Pattern.java:4546)
	at java.base/java.util.regex.Pattern$Dollar.match(Pattern.java:3996)
	at java.base/java.util.regex.Pattern$Caret.match(Pattern.java:3906)
	at java.base/java.util.regex.Pattern$GroupHead.match(Pattern.java:4969)
```
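After debugging, I found that the issue lies in the regular expression used by the trimAdjacentBlankLines() method in the ExtractedTextFormatter class. Java's regex engine appears to recurse on each repetition of this pattern (note the repeating Pattern$Caret / Pattern$Curly / Pattern$GroupHead frames in the trace), so the stack depth grows with the number of adjacent blank lines. In my case, the problem occasionally occurred even with files containing a small number of empty lines (~150), using the default JVM thread stack size (the heap was set with -Xmx8192m, which does not change the stack size).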
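One way to avoid the recursion is to drop the regex and collapse blank lines with a single linear scan. The sketch below assumes that trimAdjacentBlankLines() is meant to collapse each run of consecutive blank (whitespace-only) lines into one empty line; the actual pattern and its exact semantics may differ. The class and method names are hypothetical, not an existing API, and the code needs Java 11+ for `String.isBlank()` and `String.repeat()`.

```java
// Hypothetical regex-free alternative to trimAdjacentBlankLines().
// Assumed behavior: each run of consecutive blank lines becomes one empty line.
final class BlankLineCollapser {

    private BlankLineCollapser() {}

    static String collapseBlankLines(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean previousBlank = false;
        int start = 0;
        // Plain loop over lines: stack depth stays constant regardless of input size.
        while (true) {
            int end = text.indexOf('\n', start);
            boolean hasNewline = end >= 0;
            if (!hasNewline) {
                end = text.length();
            }
            String line = text.substring(start, end);
            boolean blank = line.isBlank();
            // Keep every non-blank line and only the first blank line of a run.
            if (!(blank && previousBlank)) {
                out.append(blank ? "" : line); // a kept blank line is normalized to empty
                if (hasNewline) {
                    out.append('\n');
                }
            }
            previousBlank = blank;
            if (!hasNewline) {
                break;
            }
            start = end + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 100,000 consecutive whitespace-only lines, far more than the ~150 above.
        String input = "header\n" + "   \n".repeat(100_000) + "footer";
        String result = collapseBlankLines(input);
        System.out.println(result.equals("header\n\nfooter")); // prints "true"
    }
}
```

A loop is used here rather than a rewritten pattern because it makes the memory behavior obvious: the work is proportional to the input length and no call depth depends on how many blank lines appear.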
Steps to reproduce
To reproduce the error, I created an XLSX file with a large number of empty rows and attached it to this issue: Stack_overflow_exception_test.xlsx. Reading this file with TikaDocumentReader triggers the StackOverflowError.