Skip to content

[libclang/python] Expose Rewriter to the python binding #77269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 29, 2024

Conversation

jimmy-zx
Copy link
Contributor

@jimmy-zx jimmy-zx commented Jan 8, 2024

Exposes CXRewriter API to the python binding as class Rewriter.

Copy link

github-actions bot commented Jan 8, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 8, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2024

@llvm/pr-subscribers-clang

Author: Jimmy Z (jimmy-zx)

Changes

Exposes CXRewriter API to the python binding as in 69e5abb.


Full diff: https://github.com/llvm/llvm-project/pull/77269.diff

2 Files Affected:

  • (modified) clang/bindings/python/clang/cindex.py (+62)
  • (added) clang/bindings/python/tests/cindex/test_rewrite.py (+75)
diff --git a/clang/bindings/python/clang/cindex.py b/clang/bindings/python/clang/cindex.py
index d780ee353a133c..ced449180d98fc 100644
--- a/clang/bindings/python/clang/cindex.py
+++ b/clang/bindings/python/clang/cindex.py
@@ -3531,6 +3531,61 @@ def cursor(self):
         return cursor
 
 
+class Rewriter(ClangObject):
+    """
+    The Rewriter is a wrapper class around clang::Rewriter
+
+    It enables rewriting buffers.
+    """
+
+    @staticmethod
+    def create(tu):
+        """
+        Creates a new Rewriter
+        Parameters:
+        tu -- The translation unit for the target AST.
+        """
+        return Rewriter(conf.lib.clang_CXRewriter_create(tu))
+
+    def __init__(self, ptr):
+        ClangObject.__init__(self, ptr)
+
+    def __del__(self):
+        conf.lib.clang_CXRewriter_dispose(self)
+
+    def insertTextBefore(self, loc, insert):
+        """
+        Insert the specified string at the specified location in the original buffer.
+        """
+        conf.lib.clang_CXRewriter_insertTextBefore(self, loc, insert)
+
+    def replaceText(self, toBeReplaced, replacement):
+        """
+        This method replaces a range of characters in the input buffer with a new string.
+        """
+        conf.lib.clang_CXRewriter_replaceText(self, toBeReplaced, replacement)
+
+    def removeText(self, toBeRemoved):
+        """
+        Remove the specified text region.
+        """
+        conf.lib.clang_CXRewriter_removeText(self, toBeRemoved)
+
+    def overwriteChangedFiles(self):
+        """
+        Save all changed files to disk.
+
+        Returns 1 if any files were not saved successfully, returns 0 otherwise.
+        """
+        return conf.lib.clang_CXRewriter_overwriteChangedFiles(self)
+
+    def writeMainFileToStdOut(self):
+        """
+        Writes the main file to stdout.
+        """
+        conf.lib.clang_CXRewriter_writeMainFileToStdOut(self)
+
+
 # Now comes the plumbing to hook up the C library.
 
 # Register callback types in common container.
@@ -3596,6 +3651,13 @@ def cursor(self):
     ("clang_codeCompleteGetNumDiagnostics", [CodeCompletionResults], c_int),
     ("clang_createIndex", [c_int, c_int], c_object_p),
     ("clang_createTranslationUnit", [Index, c_interop_string], c_object_p),
+    ("clang_CXRewriter_create", [TranslationUnit], c_object_p),
+    ("clang_CXRewriter_dispose", [Rewriter]),
+    ("clang_CXRewriter_insertTextBefore", [Rewriter, SourceLocation, c_interop_string]),
+    ("clang_CXRewriter_overwriteChangedFiles", [Rewriter], c_int),
+    ("clang_CXRewriter_removeText", [Rewriter, SourceRange]),
+    ("clang_CXRewriter_replaceText", [Rewriter, SourceRange, c_interop_string]),
+    ("clang_CXRewriter_writeMainFileToStdOut", [Rewriter]),
     ("clang_CXXConstructor_isConvertingConstructor", [Cursor], bool),
     ("clang_CXXConstructor_isCopyConstructor", [Cursor], bool),
     ("clang_CXXConstructor_isDefaultConstructor", [Cursor], bool),
diff --git a/clang/bindings/python/tests/cindex/test_rewrite.py b/clang/bindings/python/tests/cindex/test_rewrite.py
new file mode 100644
index 00000000000000..2c5f904ce50bdc
--- /dev/null
+++ b/clang/bindings/python/tests/cindex/test_rewrite.py
@@ -0,0 +1,75 @@
+import sys
+import io
+import unittest
+import tempfile
+
+from clang.cindex import (
+    Rewriter,
+    TranslationUnit,
+    Config,
+    File,
+    SourceLocation,
+    SourceRange,
+)
+
+
+class TestRewrite(unittest.TestCase):
+    code = """
+int test1;
+
+void test2(void);
+
+int f(int c) {
+    return c;
+}
+"""
+
+    def setUp(self):
+        self.tmp = tempfile.NamedTemporaryFile(suffix=".cpp", buffering=0)
+        self.tmp.write(TestRewrite.code.encode("utf-8"))
+        self.tmp.flush()
+        self.tu = TranslationUnit.from_source(self.tmp.name)
+        self.rew = Rewriter.create(self.tu)
+        self.file = File.from_name(self.tu, self.tmp.name)
+
+    def tearDown(self):
+        self.tmp.close()
+
+    def test_insert(self):
+        snip = "#include <cstdio>\n"
+
+        beginning = SourceLocation.from_offset(self.tu, self.file, 0)
+        self.rew.insertTextBefore(beginning, snip)
+        self.rew.overwriteChangedFiles()
+
+        with open(self.tmp.name, "r", encoding="utf-8") as f:
+            self.assertEqual(f.read(), snip + TestRewrite.code)
+
+    def test_replace(self):
+        pattern = "test2"
+        replacement = "func"
+
+        offset = TestRewrite.code.find(pattern)
+        pattern_range = SourceRange.from_locations(
+            SourceLocation.from_offset(self.tu, self.file, offset),
+            SourceLocation.from_offset(self.tu, self.file, offset + len(pattern)),
+        )
+        self.rew.replaceText(pattern_range, replacement)
+        self.rew.overwriteChangedFiles()
+
+        with open(self.tmp.name, "r", encoding="utf-8") as f:
+            self.assertEqual(f.read(), TestRewrite.code.replace(pattern, replacement))
+
+    def test_remove(self):
+        pattern = "int c"
+
+        offset = TestRewrite.code.find(pattern)
+        pattern_range = SourceRange.from_locations(
+            SourceLocation.from_offset(self.tu, self.file, offset),
+            SourceLocation.from_offset(self.tu, self.file, offset + len(pattern)),
+        )
+        self.rew.removeText(pattern_range)
+        self.rew.overwriteChangedFiles()
+
+        with open(self.tmp.name, "r", encoding="utf-8") as f:
+            self.assertEqual(f.read(), TestRewrite.code.replace(pattern, ""))

@jimmy-zx
Copy link
Contributor Author

jimmy-zx commented Jan 9, 2024

Pinging @AaronBallman and @jyknight for review. (sorry if I chose the wrong people)

@AaronBallman AaronBallman requested a review from Endilll January 16, 2024 19:59
Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look reasonable to me as far as they go, but I don't have much expertise in Python. I added @Endilll to hopefully help give it a second set of eyes.

You should add a release note (clang/docs/ReleaseNotes.rst) about the new bindings so others know that they've been improved.

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.
It's good that you wrote your own tests, but it would also be nice to mirror tests in clang/unittests/libclang/LibclangTest.cpp which test the same API. This way we can identify issues in binding layer itself (when C++ tests pass, but Python tests don't, or vise versa)

@Endilll Endilll added the clang:as-a-library libclang and C++ API label Jan 17, 2024
@jimmy-zx
Copy link
Contributor Author

Looks good overall. It's good that you wrote your own tests, but it would also be nice to mirror tests in clang/unittests/libclang/LibclangTest.cpp which test the same API. This way we can identify issues in binding layer itself (when C++ tests pass, but Python tests don't, or vise versa,)

I have reviewed the tests for libclang and it appears that there are already tests for the rewriter, which are more extensive than the ones I wrote. Therefore, I have decided to mirror the tests from libclang in the Python binding. Please let me know if this approach is appropriate.

@Endilll
Copy link
Contributor

Endilll commented Jan 21, 2024

I have reviewed the tests for libclang and it appears that there are already tests for the rewriter, which are more extensive than the ones I wrote. Therefore, I have decided to mirror the tests from libclang in the Python binding. Please let me know if this approach is appropriate.

If you think that existing tests fully cover yours, then it's appropriate. Otherwise, we'd benefit from additional test coverage, but it's not necessarily in the scope of this PR.

@jimmy-zx
Copy link
Contributor Author

Both tests are just invoking the functionalities of the rewriter on a sample code. I think additional tests for libclang does not provide extra coverage (as it is just the front-end). Thus I mirrored the tests back to the Python binding in 7a4ebf9.

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
PR description is going to become a commit message after merging. Let me know when it's ready.

@jimmy-zx
Copy link
Contributor Author

Just updated the description a bit and I believe it's all set for merging.

@linux4life798
Copy link
Contributor

Hi all, sorry for jumping in late. Minor nit here.

It looks like this change is introducing camelCase methods, but the majority of the existing Python file uses snake_case, even though the underlying libclang interface is camelCase. For consistency, can we make these interface functions snake_case or is there some overriding style guide that wants camelCase?

@jimmy-zx
Copy link
Contributor Author

@linux4life798 Thanks for pointing out! I've updated the naming to conform snake_case.

@linux4life798
Copy link
Contributor

Thanks!

@Endilll
Copy link
Contributor

Endilll commented Jan 22, 2024

@linux4life798 Nice catch! I see that so far only CompilationDatabase and CompletionChunk expose functions in camelCase. It would be nice to change them, too, but those are decade-old APIs that I think we promise stability for, so it might not be possible.

@linux4life798
Copy link
Contributor

linux4life798 commented Jan 23, 2024

@linux4life798 Nice catch! I see that so far only CompilationDatabase and CompletionChunk expose functions in camelCase. It would be nice to change them, too, but those are decade-old APIs that I think we promise stability for, so it might not be possible.

No worries. Of course unrelated to this change, we might be able to add the snake_case equivalent binding and marking the old ones as depreciated.

@linux4life798
Copy link
Contributor

I don't think this needs to block this change, but another concern I have about this API is that it is writing to stdout from C++ intermixed with Python writing to its own buffered stdout. This means that the user may see inconsistent output. This is somewhat a deficiency of libclang Rewrite API, since there is only one specific function to "write the main file out" and that is through stdout.

In an attempt to keep the world consistent, we may want to consider flushing the Python stdout buffer (sys.stdout.flush()) before calling clang_CXRewriter_writeMainFileToStdOut. I would hope that clang_CXRewriter_writeMainFileToStdOut flushes its own C++ stdout buffer after writing the file contents, but I honestly haven't looked at the code.

@Endilll
Copy link
Contributor

Endilll commented Jan 23, 2024

@linux4life798 Do you mind filing issues for the points you highlighted? Feel free to skip this step if you (or someone else) plan to submit PRs that address them.

@jimmy-zx
Copy link
Contributor Author

Added flush before calling writeMainFileToStdOut.

Personally I didn't find this function particularly useful: I was hoping to perform rewriting without creating new files, but seems like capturing stdout from a shared library requires some complex operation (dup2 stuff).

Rewriter::getRewrittenText() might be a more interesting interface to interact with, which returns the rewritten content as a string. (This is defined in clang but not exposed by libclang yet).

std::string Rewriter::getRewrittenText(CharSourceRange Range) const {

@Endilll
Copy link
Contributor

Endilll commented Jan 24, 2024

@linux4life798 Can you review the last update?

@linux4life798
Copy link
Contributor

Added flush before calling writeMainFileToStdOut.

I think that's reasonable.

Personally I didn't find this function particularly useful: I was hoping to perform rewriting without creating new files, but seems like capturing stdout from a shared library requires some complex operation (dup2 stuff).

Rewriter::getRewrittenText() might be a more interesting interface to interact with, which returns the rewritten content as a string. (This is defined in clang but not exposed by libclang yet).

std::string Rewriter::getRewrittenText(CharSourceRange Range) const {

I agree. This should probably be exposed to libclang (and then to this python binding).

@linux4life798
Copy link
Contributor

linux4life798 commented Jan 27, 2024

@linux4life798 Can you review the last update?

This change LGTM (I don't have reviewer rights).

@jimmy-zx
Copy link
Contributor Author

Ping. Can one of you press the merge button?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:as-a-library libclang and C++ API clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants