Description
We can't trust the `absolute_offset` that ripgrep reports for non-UTF-8 encoded files (see BurntSushi/ripgrep#1627 (comment)), so we need to parse the file ourselves.
Goals for this issue:
- Use the same approach to encoding sniffing that ripgrep uses, either:
  - checking for a UTF-8 or UTF-16 BOM, and then using that encoding (defaulting to UTF-8 otherwise)
  - using the encoding passed on the command line
- Find the exact location of the match in a non-UTF-8 encoded file, and insert the replacement text in the specified encoding. We changed tactics, but the result is the same: we now decode into UTF-8/ASCII, perform the replacements, and then re-encode before writing to disk.
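
A minimal sketch of the decode, replace, re-encode approach described above, using only the Rust standard library. This is an illustration, not the code that landed for this issue: the names `Encoding`, `sniff_bom`, and `replace_in_bytes` are hypothetical, malformed input is rejected with an error (an assumption about the desired behavior), and a real implementation would more likely lean on the `encoding_rs` crate that ripgrep uses.

```rust
/// Encodings covered by this sketch (hypothetical type, not from the project).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Look for a UTF-8 or UTF-16 BOM at the start of `bytes`.
/// Returns the detected encoding and the BOM length in bytes,
/// defaulting to UTF-8 with a zero-length BOM when none is found.
pub fn sniff_bom(bytes: &[u8]) -> (Encoding, usize) {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (Encoding::Utf8, 3)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        (Encoding::Utf16Le, 2)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        (Encoding::Utf16Be, 2)
    } else {
        (Encoding::Utf8, 0)
    }
}

/// Decode `body` (BOM already stripped) into a Rust `String`.
fn decode(body: &[u8], enc: Encoding) -> Result<String, String> {
    match enc {
        Encoding::Utf8 => String::from_utf8(body.to_vec()).map_err(|e| e.to_string()),
        Encoding::Utf16Le | Encoding::Utf16Be => {
            if body.len() % 2 != 0 {
                return Err("UTF-16 input has an odd number of bytes".to_string());
            }
            // Assemble 16-bit code units with the right byte order.
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|pair| {
                    let pair = [pair[0], pair[1]];
                    if enc == Encoding::Utf16Le {
                        u16::from_le_bytes(pair)
                    } else {
                        u16::from_be_bytes(pair)
                    }
                })
                .collect();
            String::from_utf16(&units).map_err(|e| e.to_string())
        }
    }
}

/// Encode `text` back into the original encoding (without a BOM).
fn encode(text: &str, enc: Encoding) -> Vec<u8> {
    match enc {
        Encoding::Utf8 => text.as_bytes().to_vec(),
        Encoding::Utf16Le => text.encode_utf16().flat_map(|u| u.to_le_bytes()).collect(),
        Encoding::Utf16Be => text.encode_utf16().flat_map(|u| u.to_be_bytes()).collect(),
    }
}

/// Replace every occurrence of `needle` in an encoded file's bytes.
/// `forced` stands in for an encoding passed on the command line;
/// when it is `None`, the encoding comes from BOM sniffing.
pub fn replace_in_bytes(
    bytes: &[u8],
    forced: Option<Encoding>,
    needle: &str,
    replacement: &str,
) -> Result<Vec<u8>, String> {
    let (sniffed, sniffed_len) = sniff_bom(bytes);
    let (enc, bom_len) = match forced {
        // Only strip a BOM that matches the forced encoding.
        Some(enc) if enc == sniffed => (enc, sniffed_len),
        Some(enc) => (enc, 0),
        None => (sniffed, sniffed_len),
    };
    let (bom, body) = bytes.split_at(bom_len);
    let replaced = decode(body, enc)?.replace(needle, replacement);
    // Keep the original BOM (if any) so the file round-trips unchanged.
    let mut out = bom.to_vec();
    out.extend(encode(&replaced, enc));
    Ok(out)
}
```

Because the replacement happens on the decoded string, no byte offsets into the original file are needed, which sidesteps the unreliable `absolute_offset` entirely. The trade-off is that the whole file is held in memory while it is decoded and re-encoded.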
Supported encodings (tests exist for them):
- ASCII
- UTF8
- UTF16BE
- UTF16LE
- TODO: get a list of all the encodings ripgrep supports (uses the `encoding_rs` crate)