smarty: double sets of quotes not working at end #1514

ChrisMayfield · 2025-03-18T15:57:20Z

The following example renders incorrectly:

import markdown

text = '''
He replied, "She said 'Hello.'"
'''

html = markdown.markdown(text, extensions=['smarty'])
print(html)

The output is:

<p>He replied, &ldquo;She said &lsquo;Hello.&rsquo;&ldquo;</p>

The “ at the end of this example should be ”.

The quote marks were replaced in this order:

openingSingleQuotesRegex
closingSingleQuotesRegex
openingDoubleQuotesRegex
remainingDoubleQuotesRegex

Unfortunately:

closingDoubleQuotesRegex didn't match, because the " is not followed by a space (it's the last character of the string, which is stripped prior to replacement).
closingDoubleQuotesRegex2 didn't match, because of the closing single quote replacement (i.e., the intermediate data in SubstituteTextPattern.handleMatch() contains \u0003 before the ").

One possible way to fix this issue might be to change closingDoubleQuotesRegex to r'"(?=\s|$)' (i.e., double quote followed by whitespace or end of string). However that doesn't work in general, so a special case is likely needed.

The text was updated successfully, but these errors were encountered:

facelessuser · 2025-03-18T16:08:00Z

One possible way to fix this issue might be to change closingDoubleQuotesRegex to r'"(?=\s|$)' (i.e., double quote followed by whitespace or end of string).

That doesn't seem like a bad solution.

However that doesn't work in general, so a special case is likely needed.

What do you mean by this?

ChrisMayfield · 2025-03-18T16:20:04Z

After posting the issue, I went back to what I was working on and realized my solution didn't work when wrapped in html tags, for example:

<span>He replied, "She said 'Hello.'"</span>

In this case, the closing double quote is not at the end of the string and is not followed by whitespace. So adding another regex similar to doubleQuoteSetsRe might be needed.

None of the other regex variables use $ so it's possible doing so for closingDoubleQuotesRegex might lead to unintended side effects.

facelessuser · 2025-03-18T16:27:30Z

I see, that makes sense. It may be that those tags are converted to placeholders creating a non-word boundary. I think it will need some investigation. I haven't taken a close look at the current pattern and its intentions. I'm sure there are some tricky cases where checking for space following the quote was important, I just don't know what all the cases are right now.

facelessuser · 2025-03-18T16:37:46Z

It seems situations like this are also problematic (non-word boundaries):

(He replied, "She said 'Hello.'")

My initial thought is it needs a non-word boundary check, but that might blow up some other cases. Definitely needs some further investigation to be sure.

Generally, IIRC, smarty quotes can be difficult to always get right the way we do them. Personally, I often turn off smarty quotes for these reasons, but I understand the desire to improve them and get them right (or at least better).

facelessuser · 2025-03-18T16:48:33Z

I don't think a word boundary check would be correct, but maybe a not word check (?=\s|\W|$). It seems to solve the the HTML case, the non-word case, and the end of buffer case without breaking any tests. I'm not sure if there are more cases I'm not considering though. I think @mitya57 was the original developer on this extension, so they would be the best to speak on this.

mitya57 · 2025-03-18T18:18:56Z

I don't think a word boundary check would be correct, but maybe a not word check (?=\s|\W|$). It seems to solve the the HTML case, the non-word case, and the end of buffer case without breaking any tests.

Hm, what exactly change did you try? I have just tried this:

--- a/markdown/extensions/smarty.py
+++ b/markdown/extensions/smarty.py
@@ -142,7 +142,7 @@ decadeAbbrRe = r"(?<!\w)'(?=\d{2}s)"
 openingDoubleQuotesRegex = r'%s"(?=\w)' % openingQuotesBase
 
 # Double closing quotes:
-closingDoubleQuotesRegex = r'"(?=\s)'
+closingDoubleQuotesRegex = r'"(?=\s|\W|$)'
 closingDoubleQuotesRegex2 = '(?<=%s)"' % closeClass
 
 # Get most opening single quotes:

but it does break one of the tests:

======================================================================
FAIL: test_basic (tests.test_syntax.extensions.test_smarty.TestSmarty.test_basic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dmitry/upstream/python-markdown/tests/test_syntax/extensions/test_smarty.py", line 59, in test_basic
    self.assertMarkdownRenders(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        '"[Link](http://example.com)" --- she said.',
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        '<p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>'
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/dmitry/upstream/python-markdown/markdown/test_tools.py", line 77, in assertMarkdownRenders
    self.assertMultiLineEqual(output, expected)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
AssertionError: '<p>&rdquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>' != '<p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>'
- <p>&rdquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>
?     ^
+ <p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>
?     ^

facelessuser · 2025-03-18T18:53:40Z

@mitya57 You are correct. I ran r'"(?=\s|\W)' and had not yet run r'"(?=\s|\W|$)'. The latter causes a test to fail.

ChrisMayfield · 2025-03-19T01:57:06Z

Thank you both for the suggestions. I can't think of a way to modify closingDoubleQuotesRegex without breaking the existing tests. And modifying closingDoubleQuotesRegex2 won't work either, because if a preceding single quote has already been replaced, closeClass will no longer match (due to \u0003). This leads me to believe that both quote marks ('" or "') need to be matched at the same time. I'll submit a PR with this approach.

ChrisMayfield added a commit to ChrisMayfield/markdown that referenced this issue Mar 19, 2025

Add special case for closing nested quotes; fixes Python-Markdown#1514

ee76bf3

mitya57 mentioned this issue Mar 19, 2025

Add special case for closing nested quotes; fixes #1514 #1515

Merged

waylan closed this as completed in 7aae61b Mar 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

smarty: double sets of quotes not working at end #1514

smarty: double sets of quotes not working at end #1514

ChrisMayfield commented Mar 18, 2025 •

edited

Loading

facelessuser commented Mar 18, 2025

ChrisMayfield commented Mar 18, 2025

facelessuser commented Mar 18, 2025

facelessuser commented Mar 18, 2025

facelessuser commented Mar 18, 2025

mitya57 commented Mar 18, 2025

facelessuser commented Mar 18, 2025

ChrisMayfield commented Mar 19, 2025

smarty: double sets of quotes not working at end #1514

smarty: double sets of quotes not working at end #1514

Comments

ChrisMayfield commented Mar 18, 2025 • edited Loading

facelessuser commented Mar 18, 2025

ChrisMayfield commented Mar 18, 2025

facelessuser commented Mar 18, 2025

facelessuser commented Mar 18, 2025

facelessuser commented Mar 18, 2025

mitya57 commented Mar 18, 2025

facelessuser commented Mar 18, 2025

ChrisMayfield commented Mar 19, 2025

ChrisMayfield commented Mar 18, 2025 •

edited

Loading