Optimize regex in trimAdjacentBlankLines() method of ExtractedTextFormatter to prevent stack overflow #2248

Open. Wants to merge 1 commit into base: main
@@ -32,8 +32,9 @@
  * An instance of this formatter can be customized using the {@link Builder} nested class.
  *
  * @author Christian Tzolov
+ * @author Iryna Kopchak
  */
-public final class ExtractedTextFormatter {
+public class ExtractedTextFormatter {
 
 	/** Flag indicating if the text should be left-aligned */
 	private final boolean leftAlignment;
@@ -84,7 +85,7 @@ public static ExtractedTextFormatter defaults() {
 	 * @return Returns the same text but with blank lines trimmed.
 	 */
 	public static String trimAdjacentBlankLines(String pageText) {
-		return pageText.replaceAll("(?m)(^ *\n)", "\n").replaceAll("(?m)^$([\r\n]+?)(^$[\r\n]+?^)+", "$1");
+		return pageText.replaceAll("(?m)^(?:\\s*\\r?\\n)+", "\n");
 	}
 
 	/**
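To check the new pattern in isolation, here is a minimal sketch that assumes a standalone harness. The class `BlankLineTrimDemo` and the precompiled `ADJACENT_BLANK_LINES` constant are hypothetical names and are not part of Spring AI or of this PR. The harness runs the PR's regex over a few inputs, including a run of 100,000 empty lines like the input that motivates the change.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical standalone harness (not part of Spring AI or this PR) that applies
 * the regex introduced by the PR so its behaviour can be checked in isolation.
 */
public class BlankLineTrimDemo {

	// Same pattern as the PR's new trimAdjacentBlankLines(). It is precompiled once here
	// (a choice made for this demo); String.replaceAll would compile it on every call.
	private static final Pattern ADJACENT_BLANK_LINES = Pattern.compile("(?m)^(?:\\s*\\r?\\n)+");

	static String trimAdjacentBlankLines(String pageText) {
		// Each run of empty or whitespace-only lines that starts at a line boundary
		// is replaced by a single "\n", which leaves one empty line between text blocks.
		return ADJACENT_BLANK_LINES.matcher(pageText).replaceAll("\n");
	}

	public static void main(String[] args) {
		// Several empty lines collapse to one.
		System.out.println(trimAdjacentBlankLines("a\n\n\n\nb").equals("a\n\nb"));
		// Lines containing only spaces or tabs also count as blank.
		System.out.println(trimAdjacentBlankLines("a\n  \n\t\nb").equals("a\n\nb"));
		// The indentation of the next non-blank line is preserved.
		System.out.println(trimAdjacentBlankLines("a\n\n    b").equals("a\n\n    b"));
		// A long run of blank lines, the kind of input that motivated the PR.
		String big = "start\n" + "\n".repeat(100_000) + "end";
		System.out.println(trimAdjacentBlankLines(big).equals("start\n\nend"));
	}
}
```

Each `println` should print `true`. Hoisting the pattern into a constant is optional. The PR's one-liner keeps `String.replaceAll`, which recompiles the regex on every call. One behaviour is worth noting in review. Because the replacement is a bare `\n`, a CRLF document such as `"a\r\n\r\n\r\nb"` comes out as `"a\r\n\nb"`, which has mixed line endings.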