Remove nvim-treesitter dependency #390

Open. Wants to merge 21 commits into base: master

@TheLeoP TheLeoP commented May 30, 2025

I started working on this without realizing that #330 existed :p

This PR:

  • Removes the nvim-treesitter dependency (and hence, it works perfectly with its main branch).
  • Only memoizes the computed matches, using the buffer, language and id of the node (tied to its state) as the hash. Using buftick has proven to cause issues, as in nvim-treesitter/nvim-treesitter#4902 ("Cached and memoize values are incorrect when buffer changes without modifying b:changedtick"), and other functions do not have access to the id of the root of the tree, which is the hashable value that tells us if the tree of a buffer has changed.
  • Adds a bunch of lua annotations to ease contributing to the Lua side of the codebase.
  • Cleans the lua setup wrapper (that simply defines g: vimscript variables).
  • Uses a single interface for configuration (g: vimscript variables).
  • Removes the dependency on the Neovim highlighter (outside of the syntax.lua module). Now this plugin parses the buffer if necessary (and could even use g:matchup_delim_stopline if we could know which window is being used to get the cursor location. An option would be to always use the current window, but I don't know enough of the codebase to say if that would work).
  • Formats all queries using https://github.com/ribru17/ts_query_ls
  • Removes the dependency on the old make-range directive. Instead, it relies on the fact that treesitter matches can capture multiple nodes under the same name, so a range can be obtained from the first and last nodes with the same name in a given match.
  • Adds a small query to use treesitter to % between the start and end of Lua tables, because I was getting mad at an edge case that the regex-based engine couldn't catch in a file in my personal config (https://github.com/TheLeoP/nvim-config/blob/1ba78ab0a268fd5f8e3dc371a82e1af44c829fee/lua/personal/emmet.lua#L27-L51)

Only memoize was copied from nvim-treesitter; the whole query.lua dependency was removed in favor of a smaller (and non-nvim-treesitter-dependent) implementation of the functions used by this plugin.
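A minimal sketch of the idea behind dropping make-range: when a match holds several nodes under one capture name, the range runs from the start of the first node to the end of the last. The `fake_node` helper and `capture_range` name are hypothetical stand-ins (a real TSNode exposes `range()`, which is the only method mimicked here), so this is an illustration and not the code in the PR.

```lua
-- Stand-in for a TSNode: only the range() method is mimicked.
local function fake_node(start_row, start_col, end_row, end_col)
  return {
    range = function()
      return start_row, start_col, end_row, end_col
    end,
  }
end

-- Given all nodes captured under one name in a single match (in document
-- order), span from the start of the first node to the end of the last one.
local function capture_range(nodes)
  if #nodes == 0 then
    return nil
  end
  local start_row, start_col = nodes[1]:range()
  local _, _, end_row, end_col = nodes[#nodes]:range()
  return { start_row, start_col, end_row, end_col }
end

local nodes = { fake_node(2, 4, 2, 6), fake_node(3, 0, 3, 9), fake_node(5, 0, 5, 3) }
local range = capture_range(nodes)
print(table.concat(range, ", ")) --> 2, 4, 5, 3
```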

This PR does not

  • modify tests
  • modify GitHub actions
  • add a Dockerfile for tests
  • add a Makefile for creating a docker image for tests

This PR lacks documentation for the changes introduced in it, but I wanted to see what everyone thinks of it before committing further to it.

Additionally, since treesitter parsing can now be async, I could make use of it in a follow-up PR if deemed necessary.

@andymass
Owner

Thank you for contributing 🙏! Looks like there are some merits to both this and #330, I will take bits of both. Since they are extensive I will likely merge them in bits and pieces where possible.

Both PRs have mentioned performance issues (which likely do exist in head already) and which are not yet solved by either.

#330 (comment)

Additionally, since treesitter parsing can now be async, I could make use of it in a follow-up PR if deemed necessary.

I am curious which approach do we think is more viable? Can you give any more detail on what changes would be required for async parsing (no need for a full PR for that yet)?

@TheLeoP
Author

TheLeoP commented May 31, 2025

Thank you for contributing 🙏! Looks like there are some merits to both this and #330, I will take bits of both. Since they are extensive I will likely merge them in bits and pieces where possible.

Awesome c:.

Both PRs have mentioned performance issues (which likely do exist in head already) and which are not yet solved by either.

I was actually talking about performance in an abstract way (because I haven't had any performance issues using vim-matchup). Treesitter parsing can be expensive for huge buffers and/or multiple language injections. I would love to have some files that have given others performance problems with this plugin, in order to look into it and try to fix the performance issues.

I am curious which approach do we think is more viable? Can you give any more detail on what changes would be required for async parsing (no need for a full PR for that yet)?

To make the parsing async, we would need to modify

https://github.com/theleop/vim-matchup/blob/289564e2dad1b7834cc954525c6697387e36a7b0/lua/treesitter-matchup/internal.lua#L137

to take a callback (for when the parsing has finished) as a second argument. This would mean that get_matches wouldn't be able to synchronously return a list of matches, which could be solved in two ways:

  • make get_matches also receive a callback as a parameter and call it with the list of matches
  • use lua coroutines to make functions async-await-ish
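The two options can be sketched in plain Lua. Everything here is hypothetical: `parse_async` is a stand-in that queues its callback (a real version would hand the callback to the treesitter parser), and `get_matches_cb`, `get_matches_co` and `await_parse` are illustrative names, not functions from this PR. The coroutine variant assumes the callback always fires after the coroutine has yielded.

```lua
-- Hypothetical async parser: the "parse" is queued and completed later by
-- run_pending(), mimicking a callback that fires once parsing has finished.
local pending = {}
local function parse_async(bufnr, on_done)
  pending[#pending + 1] = function()
    on_done({ bufnr = bufnr, root_id = "root-" .. bufnr })
  end
end
local function run_pending()
  local jobs = pending
  pending = {}
  for _, job in ipairs(jobs) do
    job()
  end
end

-- Option 1: get_matches receives a callback and calls it with the matches.
local function get_matches_cb(bufnr, callback)
  parse_async(bufnr, function(tree)
    callback({ "matches for " .. tree.root_id })
  end)
end

-- Option 2: wrap the callback API so callers read sequentially in a coroutine.
local function await_parse(bufnr)
  local co, is_main = coroutine.running()
  assert(co and not is_main, "await_parse must run inside a coroutine")
  parse_async(bufnr, function(tree)
    assert(coroutine.resume(co, tree))
  end)
  return coroutine.yield()
end
local function get_matches_co(bufnr)
  local tree = await_parse(bufnr)
  return { "matches for " .. tree.root_id }
end

local results = {}
get_matches_cb(1, function(matches)
  results.cb = matches[1]
end)
coroutine.wrap(function()
  results.co = get_matches_co(2)[1]
end)()
run_pending() -- both callers receive their matches only now
print(results.cb, results.co)
```

The callback version changes every caller's signature, while the coroutine version keeps call sites looking synchronous but requires that they run inside a coroutine.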

That same line also has a comment mentioning that instead of parsing the whole buffer, we could use g:matchup_delim_stopline to parse a range around the cursor. This should be possible by either mapping the bufnr to a winid or always using the current window (I don't know if there's a situation where the plugin parses a non-current window). #330 mentioned that using a range for the parse would cause issues with how cache is being handled, but I think that shouldn't be a problem with the changes on cache handling in this PR. If there were changes on the root node, its id will change, so the cache won't get in the way of only parsing a certain range of a given buffer.

I should also mention that async parsing wouldn't make the plugin faster, it would simply avoid freezing Neovim while treesitter is parsing the buffer. Something that could actually make the plugin faster is parsing a given range instead of the whole buffer (like I mentioned above).
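As a rough illustration of parsing a range instead of the whole buffer, the row window around the cursor could be computed like this. `parse_window` is a hypothetical helper, and the 0-based, end-exclusive convention is an assumption; the resulting pair would then be handed to the parser as the range to parse.

```lua
-- Hypothetical helper: rows [first, last) to parse around the cursor,
-- clamped to the buffer. Rows are 0-based and `last` is exclusive.
local function parse_window(cursor_row, stopline, line_count)
  local first = math.max(cursor_row - stopline, 0)
  local last = math.min(cursor_row + stopline + 1, line_count)
  return first, last
end

print(parse_window(1000, 400, 5000)) --> 600 1401
print(parse_window(10, 400, 120))    --> 0 120
```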

@Slotos

Slotos commented Jun 10, 2025

Removes the dependency on the Neovim highlighter (outside of the syntax.lua module). Now this plugin parses the buffer if necessary (and could even use g:matchup_delim_stopline if we could know which window is being used to get the cursor location. An option would be to always use the current window, but I don't know enough of the codebase to say if that would work).

If I could hug you right now, I would. For this and for cleanup around old attach/detach routines - I totally overlooked them.


Slotos#1 - this is my test branch/PR that I use for CI runs. I have a change that parses on a cursor there, which resolves some performance issues, but there's still a failure cascade in a "no match" handling, which can take a few minutes on a file I'm verifying against (.deps/build/src/treesitter-vim/src/parser.c in neovim source after make).

It also works correctly with injections, whereas parse(nil) avoids those. Neither works well on initial parse of a large file.

The problem with memoizing results when parsing on a cursor position is that it introduces memory bloat. Most of the memoized values will be used in one user action (there are multiple calls associated with it), never to be touched again.
I'm planning to try a limited depth memoization (something akin to ring buffer with random access) to see if that can help. Still, it's not the primary performance hog I'm looking at right now - even unmemoized functions work perfectly fine. The time seems to be lost in the loop in if opts.direction == 'current' then branch of get_delim function.
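One way the "limited depth memoization" idea could look, as a sketch only: a fixed number of slots reused in insertion order (the ring) plus a key-to-value map for direct lookup. `bounded_memoize` and its parameters are hypothetical and are not taken from either PR.

```lua
-- Memoize with a fixed capacity: the oldest entry is evicted when full.
local function bounded_memoize(fn, hash_fn, capacity)
  local values = {} -- key -> cached value (random access)
  local ring = {}   -- slot index -> key, overwritten in insertion order
  local next_slot = 1
  return function(...)
    local key = hash_fn(...)
    local hit = values[key]
    if hit ~= nil then
      return hit
    end
    local value = fn(...)
    if value == nil then
      return nil -- nil results are not cached in this sketch
    end
    local evicted = ring[next_slot]
    if evicted ~= nil then
      values[evicted] = nil
    end
    ring[next_slot] = key
    values[key] = value
    next_slot = next_slot % capacity + 1
    return value
  end
end

local calls = 0
local lookup = bounded_memoize(function(bufnr, row)
  calls = calls + 1
  return { bufnr = bufnr, row = row }
end, function(bufnr, row)
  return bufnr .. ":" .. row
end, 2)

lookup(1, 10)
lookup(1, 10) -- cache hit, no recomputation
lookup(1, 11)
lookup(1, 12) -- capacity is 2, so this evicts "1:10"
lookup(1, 10) -- recomputed after eviction
print(calls) --> 4
```

Memory stays bounded by `capacity` regardless of how many cursor positions are visited, at the cost of recomputing entries that fall out of the ring.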

LanguageTree:parse() also turned out not to be the fastest growing TS function, Query:iter_matches() takes that crown.

Anyways, if you see good bits of my PR and want to merge them into yours - please do so. Ping me if you need help or just a rubber duck - I'll try to contribute as much as time allows.

@andymass
Owner

andymass commented Jun 18, 2025

@Slotos any concerns if I go ahead and merge this PR? Based on your comments regarding "cleanup around old attach/detach routines" it seems this will be a bit cleaner to build on top of than your original. I would like to take your CI/CD updates, though. Just want to make sure any performance improvements you are planning could be built on top of this CR and not make it difficult for you.

@TheLeoP Any guidance for users to migrate? It seems they just need to replace a call from nvim-treesitter.setup to matchup.setup, I will have to add some readme section on this migration. Doing nothing it seems the user's config would break right?

@TheLeoP
Author

TheLeoP commented Jun 18, 2025

Any guidance for users to migrate?

Users simply need to add

vim.g.matchup_treesitter_enabled = true

or

let g:matchup_treesitter_enabled = v:true

to their configs. They could also use the lua interface if they want to

require("match-up").setup {
  treesitter = {
    enabled = true,
  },
}

It's worth noting that I'm using either Lua's true and false or Neovim's v:true and v:false instead of 1 and 0 like on other options.

Doing nothing it seems the user's config would break right?

It shouldn't break, unless I'm missing something. It'll stop using treesitter by default and rely on the regex engine unless they enable treesitter as mentioned above. I don't know how nvim-treesitter on its master branch handles the configuration of non-existent modules, though, so I'm not sure whether that may throw errors for users' configurations.

@Slotos

Slotos commented Jun 19, 2025

There are three issues that pop out to me:

  • Use of parser:parse(nil) will yield different results depending on presence of treesitter highlighting. Either the code that loops over all trees needs to be removed, or parser needs to parse with injections.
  • I'm a strong opponent of require.setup. It has no reason to exist nearly ever, and most certainly not in vim-matchup. Editor plugins should just work, offering sane defaults and an unobtrusive way to alter them. A litmus test I use is simple - if I remove the plugin, continued presence of its configuration should not matter. require.setup breaks that.
  • memoize memory usage will keep on growing with the number of buffers opened in an editing session. I don't think it's a warranted replacement for memoize_by_buf_tick, which actively tries to manage memory usage.

As a personal opinion, I believe treesitter functionality should be enabled by default. While treesitter is said to be experimental in neovim docs, its functionality has been a stable presence over the years, and all changes to vim.treesitter interface had been made with an ample forewarning.

@TheLeoP
Author

TheLeoP commented Jun 19, 2025

Use of parser:parse(nil) will yield different results depending on presence of treesitter highlighting. Either the code that loops over all trees needs to be removed, or parser needs to parse with injections.

Oh, I missed that, you are right. I first thought that parsing the whole buffer including injections would be too costly. As mentioned previously, parsing only a certain range around the cursor is also feasible. In the case of parsing a range, iterating over all language trees would still be the right thing to do.

I'm a strong opponent of require.setup. It has no reason to exist nearly ever, and most certainly not in vim-matchup. Editor plugins should just work, offering sane defaults and an unobtrusive way to alter them. A litmus test I use is simple - if I remove the plugin, continued presence of its configuration should not matter. require.setup breaks that.

It was already present in the plugin. It was just not documented and had some edge cases that I cleaned up. It's by no means necessary and I'm not against removing it.

Sane defaults are there whether setup is called or not, though.

memoize memory usage will keep on growing with the number of buffers opened in an editing session. I don't think it's a warranted replacement for memoize_by_buf_tick, which actively tries to manage memory usage.

memoize uses a weak table for its cache. This means that Lua will garbage collect an entry whenever the only references to it are in the table itself. So, memory shouldn't be a problem. Furthermore, the function is taken from the nvim-treesitter repo itself; it is what they used to replace memoize_by_buf_tick. Besides, as I already mentioned, memoize_by_buf_tick had issues when the buffer changed but the buf_tick didn't.
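A minimal sketch of a weak-table memoize, assuming a string key built from the buffer, language and root id as described earlier in this thread. It illustrates the technique and is not a copy of the nvim-treesitter function; the key format and the `get_matches` stand-in are assumptions.

```lua
-- Memoize with a weak cache: an entry can be dropped by the garbage
-- collector once nothing outside the cache references its value.
local function memoize(fn, hash_fn)
  local cache = setmetatable({}, { __mode = "kv" })
  return function(...)
    local key = hash_fn(...)
    local value = cache[key]
    if value == nil then
      value = fn(...)
      cache[key] = value
    end
    return value
  end
end

local calls = 0
local get_matches = memoize(function(bufnr, lang, root_id)
  calls = calls + 1
  return { bufnr = bufnr, lang = lang, root_id = root_id }
end, function(bufnr, lang, root_id)
  return table.concat({ bufnr, lang, root_id }, "|")
end)

local first = get_matches(1, "lua", "root-a")
local again = get_matches(1, "lua", "root-a") -- same key: served from the cache
local other = get_matches(1, "lua", "root-b") -- root id changed: recomputed
print(first == again, first == other, calls) --> true false 2
```

Because the key includes the root id, a reparse that produces a new root yields a new key, and the stale entry simply becomes unreachable.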

As a personal opinion, I believe treesitter functionality should be enabled by default

Agree, but I thought the sanest default would be to disable it and let users opt-in (just like it worked before).

@andymass
Owner

I'm a strong opponent of require.setup.. Editor plugins should just work

I agree- the treesitter capability should be enabled by default now and controlled just by g: variables (like in @TheLeoP's PR) as this plugin has for every other functionality. However, I think we should still offer an optional setup function which simply updates the corresponding g: variables because many neovim users expect this convenience.

This would mean setting vim.g.matchup_treesitter_enabled = true by default.
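A sketch of what such an optional setup wrapper might do, assuming option names are formed by joining nested keys with underscores under a matchup_ prefix (consistent with treesitter.enabled mapping to g:matchup_treesitter_enabled above). The table `g` stands in for vim.g so the snippet runs outside Neovim; the real wrapper may differ.

```lua
-- `g` stands in for vim.g so the sketch runs outside Neovim.
local g = {}

-- Flatten nested option tables into prefixed global variable names.
local function apply(opts, prefix)
  for key, value in pairs(opts) do
    local name = prefix .. "_" .. key
    if type(value) == "table" and next(value) ~= nil and value[1] == nil then
      apply(value, name) -- nested section, e.g. treesitter = { ... }
    else
      g[name] = value -- leaf value (including list-like tables)
    end
  end
end

-- Optional convenience: only writes variables, it holds no state of its own.
local function setup(opts)
  apply(opts or {}, "matchup")
end

setup({ treesitter = { enabled = true, stopline = 400 } })
print(g.matchup_treesitter_enabled, g.matchup_treesitter_stopline)
```

Since the wrapper only assigns variables, skipping it leaves the defaults in place and the g: variables remain the single source of configuration.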

It also works correctly with injections, whereas parse(nil)

Thanks, got it, we need @Slotos's version of this part.

@peter-bread

Hey! Stumbled across this:

local function f(a) end

f(function()
  local x = "("
end)

The open bracket in the string "(" is matching with the close bracket of the function call end). Likewise, if it's changed to be ")", then that matches with the open bracket of the function call f(.

This is with vim.g.matchup_treesitter_enabled = true.

This is not the case on master (using nvim-treesitter master branch too) or using the fork in #330.

Seems to have been introduced in 829df75. Unfortunately I'm not familiar enough with treesitter or this project to dig much deeper :(

Thanks for the great work!

@TheLeoP
Author

TheLeoP commented Jun 24, 2025

@peter-bread that's not because of this PR (as far as I can tell). That happens simply because there is no treesitter capture for a function call and the regex engine fails to correctly find the closing parenthesis.

If you add the following capture to after/queries/lua/matchup.scm, it works as expected

(function_call
  (arguments
    "(" @open.call
    ")" @close.call)) @scope.call

@peter-bread

Thanks! Also found these:

C:

int main() {
  printf("}");    // } matches with { opening the block
}
int main() {
  printf(")");    // ) matches with ( opening the function call
}

Go:

func main() {
  fmt.Println("}")    // } matches with { opening the block
}
func main() {
  fmt.Println(")")    // ) matches with ( opening the function call
}

Bash:

foo() {
  echo "}"    # } matches with { opening the block
}

These seem to be fixed with the following queries, based off the one in this comment:

C:

(compound_statement
  "{" @open.block
  "}" @close.block) @scope.block
(argument_list
  "(" @open.call
  ")" @close.call) @scope.call

Go:

(block
  "{" @open.block
  "}" @close.block) @scope.block
(argument_list
  "(" @open.call
  ")" @close.call) @scope.call

Bash:

(compound_statement
  "{" @open.block
  "}" @close.block) @scope.block

I'm assuming this is the case for other languages but I haven't tested any more.

In all cases (with and without the extra queries), putting the cursor over the bracket inside a string highlights the incorrect match. However (with the queries), after pressing % to get to the outer brackets, matching then works as expected.

[Images: three screenshots]

These images show what happens as you press % with the new Lua query, starting with the cursor on ")". Why is there a one-way match from the closing bracket in the string to the open bracket in the function call?

After some quick testing I found that these queries aren't needed on master or in #330.

I realise this is a bit of a weird edge case but I'm wondering if more of the supported languages might need extra queries or if there's been some other fundamental change.

@TheLeoP
Author

TheLeoP commented Jun 24, 2025

Thanks for the example @peter-bread , I now understand the issue properly.

The issue was an oversight on my side when modifying the vimscript side of the codebase. It should be working in the same way as master or #330 in this particular case.

@peter-bread

Hello again!

Strange one this time.

There seems to be a bug triggered by the builtin gcc comment line keymap with vim-matchup + treesitter enabled. It is only happening when the parser for a language is installed/available and only on non-Lua files (I've only checked Lua, C, Go and Haskell; Lua is the only one that is not affected).

I've tested this on Neovim 0.11.2 on both macOS and Ubuntu via WSL2.

With a minimal configuration, this happens on the second use of gcc. On my main config, it is less predictable.

Minimal repro config
-- Bootstrap lazy.nvim
local lazypath = vim.fn.stdpath("data") .. "/lazy/lazy.nvim"
if not (vim.uv or vim.loop).fs_stat(lazypath) then
	local lazyrepo = "https://github.com/folke/lazy.nvim.git"
	local out = vim.fn.system({ "git", "clone", "--filter=blob:none", "--branch=stable", lazyrepo, lazypath })
	if vim.v.shell_error ~= 0 then
		vim.api.nvim_echo({
			{ "Failed to clone lazy.nvim:\n", "ErrorMsg" },
			{ out, "WarningMsg" },
			{ "\nPress any key to exit..." },
		}, true, {})
		vim.fn.getchar()
		os.exit(1)
	end
end
vim.opt.rtp:prepend(lazypath)

-- Setup lazy.nvim
require("lazy").setup({
	spec = {
		{
			url = "https://github.com/TheLeoP/vim-matchup",
			branch = "update-treesitter",
		},
		{
			"nvim-treesitter/nvim-treesitter",
			branch = "main",
		},
	},
})

I traced the issue back to this line in internal.lua. After a quick search through the code, I think M.is_enabled is called in a FileType * autocmd. I think this autocmd is triggering on some kind of hidden/internal buffer. When this buffer is not loaded, get_parser throws an error. Even though { error = false } is used, that does not suppress this error specifically. See neovim source code.

I added some logging to M.is_enabled to help see what exactly was going on.

Logging code
function M.is_enabled(bufnr)
	bufnr = bufnr or api.nvim_get_current_buf()
	local lang = ts.language.get_lang(vim.bo[bufnr].filetype)
	if not lang then
		return false
	end

+	local info = {
+		bufnr = bufnr,
+		name = vim.api.nvim_buf_get_name(bufnr),
+		filetype = vim.bo[bufnr].filetype,
+		buftype = vim.bo[bufnr].buftype,
+		bufhidden = vim.bo[bufnr].bufhidden,
+		swapfile = vim.bo[bufnr].swapfile,
+		modifiable = vim.bo[bufnr].modifiable,
+		readonly = vim.bo[bufnr].readonly,
+		loaded = vim.api.nvim_buf_is_loaded(bufnr),
+		listed = vim.bo[bufnr].buflisted,
+		lines = vim.api.nvim_buf_get_lines(bufnr, 0, 5, false),
+		changedtick = vim.api.nvim_buf_get_changedtick(bufnr),
+	}
+
+	print(vim.inspect(info))

	local _, err = ts.get_parser(bufnr, nil, { error = false })
	if err then
		return false
	end
	return is_enabled(lang, bufnr)
end

I was also careful to use print(vim.inspect(...)) here. I use snacks.notifier for a prettier vim.notify, but this plugin opens more buffers. If a vim.notify call is made in M.is_enabled, it starts an infinite loop of checking every new notification buffer and throwing the same error on all of them. As long as this function does not create any buffers this is not an issue.

Below are two videos, one from a Lua file (which works fine) and one from a Go file (which throws an error):

Lua:

lua.mov

Go:

go.mov

In both cases, the main file is buffer 1, as seen in the first logging output. After pressing gcc, there is a second log. In both cases, it is for buffer 2, and the buffer is hidden, nameless, and buftype = 'nofile'. However, for some reason the Lua one is loaded, but the Go one is not loaded. The buffer not being loaded then causes the get_parser error. Interestingly, after that, every subsequent use of gcc throws the same error, but with the bufnr increasing by 1 each time. I have no idea why the Lua buffer is loaded versus all the other languages I've tried which are not loaded. I also do not know where these hidden buffers are coming from; perhaps they are used internally during the commenting process(?).

A quick fix is just to put the following check somewhere in M.is_enabled before ts.get_parser is called:

if not vim.api.nvim_buf_is_loaded(bufnr) then
	return false
end

Setting vim.g.matchup_treesitter_disabled = { "markdown", "go", "haskell" } does not have any effect as this check happens after the get_parser call. Also, even with this bug, matchup is working properly on the intended file (i.e. main.go). It's just the hidden buffers causing issues.

Questions:

  • Is this a good enough fix?
  • Should vim-matchup be more selective when checking buffers (e.g. should it be checking buffers that are 'nofile' or hidden or unnamed)?
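For the second question, a more selective check could be expressed as a predicate over the same fields the logging code collects. This is only a sketch of one possible policy: `should_inspect` is a hypothetical name, and whether skipping 'nofile' buffers is actually desirable for vim-matchup is an open assumption.

```lua
-- Hypothetical predicate over a buffer description shaped like the `info`
-- table in the logging code (only `loaded` and `buftype` are consulted).
local function should_inspect(info)
  if not info.loaded then
    return false -- a parser cannot be created for an unloaded buffer
  end
  if info.buftype == "nofile" then
    return false -- assumed policy: skip scratch/internal buffers
  end
  return true
end

local main_file = { bufnr = 1, loaded = true, buftype = "" }
local go_scratch = { bufnr = 2, loaded = false, buftype = "nofile" }
local lua_scratch = { bufnr = 2, loaded = true, buftype = "nofile" }
print(should_inspect(main_file), should_inspect(go_scratch), should_inspect(lua_scratch))
--> true false false
```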

Thanks!

@TheLeoP
Author

TheLeoP commented Jun 29, 2025

Is this a good enough fix?

In my opinion, it looks good enough. I also don't know why the builtin gcc is creating unloaded buffers, but given that treesitter parsers can't be created for unloaded buffers, this check seems appropriate.

Should vim-matchup be more selective when checking buffers (e.g. should it be checking buffers that are 'nofile' or hidden or unnamed)?

I lack context of the bigger vim-matchup codebase, especially on the vimscript side. But that seems reasonable.

It's strange that the FileType autocmd is being triggered in an unloaded buffer, though. Maybe worth further investigation and/or opening an issue in the Neovim repo itself, especially because this is being triggered by the builtin commenting functionality.

@TheLeoP
Author

TheLeoP commented Jun 29, 2025

btw, thanks for the detailed research and report. It helps a lot, @peter-bread :D

@andymass
Owner

Thanks for your extensive and continued work @TheLeoP! Do you think you are able to handle the key issues @Slotos raised (potentially using code from @Slotos's branch)? (Note I think the only blocking one at this point is regarding parsing and injections)

@TheLeoP
Author

TheLeoP commented Jun 29, 2025

Thanks for your extensive and continued work @TheLeoP! Do you think you are able to handle the key issues @Slotos raised (potentially using code from @Slotos's branch)? (Note I think the only blocking one at this point is regarding parsing and injections)

Of course. I'll be working on doing it

@TheLeoP TheLeoP force-pushed the update-treesitter branch from d955af1 to 8904121 Compare June 29, 2025 20:31
@TheLeoP
Author

TheLeoP commented Jun 29, 2025

The parsing issue should be fixed now. I also added documentation for the new g:matchup_treesitter variables and modified the README a bit. I noticed that my autoformatter changed more of the README than I intended to, so let me know if that's an issue in order to fix it @andymass .

I chose 400 as the default value for g:matchup_treesitter_stopline because it may have a high impact on performance and it's already the default value of g:matchup_matchparen_stopline.

I also rebased the branch on top of master and force-pushed it

@Slotos

Slotos commented Jun 30, 2025

I tried testing things on a fresh NVIM_APPNAME, and discovered that the breakage was due to me using nvim-nightly.

As an aside, I noticed something while trying to figure out what I was doing wrong. You're using uv.gettimeofday() for timing, which returns time that can go backwards. Consider uv.now() or uv.clock_gettime("monotonic") instead.
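To illustrate why a clock that can go backwards is a problem for timing, here is a small sketch with the clock injected as a function. `make_deadline` and `clock_from` are hypothetical helpers; inside Neovim the clock argument could be a monotonic source such as uv.now, while the fake clocks below just replay fixed samples.

```lua
-- Returns a function that reports whether `timeout_ms` has elapsed,
-- measured with the supplied clock (milliseconds).
local function make_deadline(now_ms, timeout_ms)
  local start = now_ms()
  return function()
    return now_ms() - start >= timeout_ms
  end
end

-- Fake clock that replays fixed samples, repeating the last one.
local function clock_from(samples)
  local i = 0
  return function()
    i = math.min(i + 1, #samples)
    return samples[i]
  end
end

-- Wall clock stepped back by 600 ms: elapsed time goes negative, so the
-- timeout is not detected even though real time has passed.
local wall_expired = make_deadline(clock_from({ 1000, 400 }), 50)
-- Monotonic clock: 60 ms elapsed, so the 50 ms timeout is detected.
local mono_expired = make_deadline(clock_from({ 1000, 1060 }), 50)
print(wall_expired(), mono_expired()) --> false true
```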


Brief explanation of my previous statement - function get_memoized_matches has two global states that influence its outcome: tree root state that arises from parsing on range around cursor position and matchup queries.

That said, I've been throwing various scenarios at the problem for an hour or so and I don't see a reasonable one where parsing on a range would create a realistic breakage.

Maybe there's something possible with disjointed injections (disconnected chunks of code in different language that are joined into a single tree), but I'm not even sure whether they are still a thing, so it's best to leave it until and if a real breakage happens.

I also remember that range parsing is used to optimize syntax highlighting, which I took to mean that parse can yield a partial tree, but that's a big assumption on my part that I never took time to properly verify.


Finally, I apologize for stalling things. This branch is in a great state, you've meticulously cleaned obsolete code pathways, and in my opinion it can be merged as is.

I won't say no to a fix for nightly breakage, but I took a bleeding edge path with full knowledge of its pitfalls and I can fix things at my own pace with no complaints.

@andymass
Owner

Is it ok if I add a bunch of these kind of captures @andymass ?

certainly, no concern here

@TheLeoP
Author

TheLeoP commented Jul 1, 2025

I tried testing things on a fresh NVIM_APPNAME, and discovered that the breakage was due to me using nvim-nightly.

Don't worry. nvim nightly has had a lot of regressions around treesitter lately. One that affected me recently was neovim/neovim#34631

As an aside, I noticed something while trying to figure out what I was doing wrong. You're using uv.gettimeofday() for timing, which returns time that can go backwards. Consider uv.now() or uv.clock_gettime("monotonic") instead.

That's code that I haven't touched, so I don't feel confident making that change. Thanks for the suggestion anyway c: .

Finally, I apologize for stalling things. This branch is in a great state, you've meticulously cleaned obsolete code pathways, and in my opinion it can be merged as is.

There's no need to apologize. Thanks for your help and I appreciate the effort of making sure the PR is working as expected.

@TheLeoP
Author

TheLeoP commented Jul 1, 2025

I think this PR is ready to be merged/further comments @andymass
