Skip to content

smarty: double sets of quotes not working at end #1514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ChrisMayfield opened this issue Mar 18, 2025 · 8 comments
Closed

smarty: double sets of quotes not working at end #1514

ChrisMayfield opened this issue Mar 18, 2025 · 8 comments

Comments

@ChrisMayfield
Copy link
Contributor

ChrisMayfield commented Mar 18, 2025

The following example renders incorrectly:

import markdown

text = '''
He replied, "She said 'Hello.'"
'''

html = markdown.markdown(text, extensions=['smarty'])
print(html)

The output is:

<p>He replied, &ldquo;She said &lsquo;Hello.&rsquo;&ldquo;</p>

The &ldquo; at the end of this example should be &rdquo;.

The quote marks were replaced in this order:

  1. openingSingleQuotesRegex
  2. closingSingleQuotesRegex
  3. openingDoubleQuotesRegex
  4. remainingDoubleQuotesRegex

Unfortunately:

  • closingDoubleQuotesRegex didn't match, because the " is not followed by a space (it's the last character of the string, which is stripped prior to replacement).
  • closingDoubleQuotesRegex2 didn't match, because of the closing single quote replacement (i.e., the intermediate data in SubstituteTextPattern.handleMatch() contains \u0003 before the ").

One possible way to fix this issue might be to change closingDoubleQuotesRegex to r'"(?=\s|$)' (i.e., double quote followed by whitespace or end of string). However that doesn't work in general, so a special case is likely needed.

@facelessuser
Copy link
Collaborator

One possible way to fix this issue might be to change closingDoubleQuotesRegex to r'"(?=\s|$)' (i.e., double quote followed by whitespace or end of string).

That doesn't seem like a bad solution.

However that doesn't work in general, so a special case is likely needed.

What do you mean by this?

@ChrisMayfield
Copy link
Contributor Author

After posting the issue, I went back to what I was working on and realized my solution didn't work when wrapped in html tags, for example:

<span>He replied, "She said 'Hello.'"</span>

In this case, the closing double quote is not at the end of the string and is not followed by whitespace. So adding another regex similar to doubleQuoteSetsRe might be needed.

None of the other regex variables use $ so it's possible doing so for closingDoubleQuotesRegex might lead to unintended side effects.

@facelessuser
Copy link
Collaborator

I see, that makes sense. It may be that those tags are converted to placeholders creating a non-word boundary. I think it will need some investigation. I haven't taken a close look at the current pattern and its intentions. I'm sure there are some tricky cases where checking for space following the quote was important, I just don't know what all the cases are right now.

@facelessuser
Copy link
Collaborator

It seems situations like this are also problematic (non-word boundaries):

(He replied, "She said 'Hello.'")

My initial thought is it needs a non-word boundary check, but that might blow up some other cases. Definitely needs some further investigation to be sure.

Generally, IIRC, smarty quotes can be difficult to always get right the way we do them. Personally, I often turn off smarty quotes for these reasons, but I understand the desire to improve them and get them right (or at least better).

@facelessuser
Copy link
Collaborator

I don't think a word boundary check would be correct, but maybe a not word check (?=\s|\W|$). It seems to solve the the HTML case, the non-word case, and the end of buffer case without breaking any tests. I'm not sure if there are more cases I'm not considering though. I think @mitya57 was the original developer on this extension, so they would be the best to speak on this.

@mitya57
Copy link
Collaborator

mitya57 commented Mar 18, 2025

I don't think a word boundary check would be correct, but maybe a not word check (?=\s|\W|$). It seems to solve the the HTML case, the non-word case, and the end of buffer case without breaking any tests.

Hm, what exactly change did you try? I have just tried this:

--- a/markdown/extensions/smarty.py
+++ b/markdown/extensions/smarty.py
@@ -142,7 +142,7 @@ decadeAbbrRe = r"(?<!\w)'(?=\d{2}s)"
 openingDoubleQuotesRegex = r'%s"(?=\w)' % openingQuotesBase
 
 # Double closing quotes:
-closingDoubleQuotesRegex = r'"(?=\s)'
+closingDoubleQuotesRegex = r'"(?=\s|\W|$)'
 closingDoubleQuotesRegex2 = '(?<=%s)"' % closeClass
 
 # Get most opening single quotes:

but it does break one of the tests:

======================================================================
FAIL: test_basic (tests.test_syntax.extensions.test_smarty.TestSmarty.test_basic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dmitry/upstream/python-markdown/tests/test_syntax/extensions/test_smarty.py", line 59, in test_basic
    self.assertMarkdownRenders(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        '"[Link](http://example.com)" --- she said.',
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        '<p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>'
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/dmitry/upstream/python-markdown/markdown/test_tools.py", line 77, in assertMarkdownRenders
    self.assertMultiLineEqual(output, expected)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
AssertionError: '<p>&rdquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>' != '<p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>'
- <p>&rdquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>
?     ^
+ <p>&ldquo;<a href="http://example.com">Link</a>&rdquo; &mdash; she said.</p>
?     ^

@facelessuser
Copy link
Collaborator

@mitya57 You are correct. I ran r'"(?=\s|\W)' and had not yet run r'"(?=\s|\W|$)'. The latter causes a test to fail.

ChrisMayfield added a commit to ChrisMayfield/markdown that referenced this issue Mar 19, 2025
@ChrisMayfield
Copy link
Contributor Author

Thank you both for the suggestions. I can't think of a way to modify closingDoubleQuotesRegex without breaking the existing tests. And modifying closingDoubleQuotesRegex2 won't work either, because if a preceding single quote has already been replaced, closeClass will no longer match (due to \u0003). This leads me to believe that both quote marks ('" or "') need to be matched at the same time. I'll submit a PR with this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants