
Add support for the same encodings that ripgrep supports #12

Closed
@acheronfail

Description

We can't trust the absolute_offset that ripgrep reports for non-UTF-8 encoded files (see BurntSushi/ripgrep#1627 (comment)), so we need to parse the file ourselves.

Goals for this issue:

  • Use the same approach to encoding sniffing that ripgrep uses, either:
    • checking for a UTF-8 or UTF-16 BOM, and then using that encoding (defaulting to UTF-8 otherwise)
    • using the encoding passed on the command line
  • Find the exact location of the match in a non-UTF-8 encoded file, and insert the replacement text in the specified encoding. Update: we changed tactics, but the result is the same. We now decode into UTF-8/ASCII, perform the replacements, and then re-encode before writing to disk.
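
The goals above (BOM sniffing with a UTF-8 default, an optional encoding override, then decode, replace, and re-encode) can be sketched roughly as follows. This is only an illustration using the Rust standard library, not ripgrep's implementation and not necessarily what this project ships: ripgrep relies on the encoding_rs crate and supports many more encodings. The names `Enc`, `sniff_bom`, `decode`, `encode`, and `replace_in_file_bytes` are hypothetical, and the sketch assumes lossy decoding of invalid input is acceptable.

```rust
/// Encodings covered by this sketch (hypothetical type; ASCII is a subset of UTF-8).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Enc {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Look for a UTF-8 or UTF-16 BOM. Returns the encoding and the BOM length
/// in bytes, defaulting to UTF-8 with no BOM when none is found.
fn sniff_bom(bytes: &[u8]) -> (Enc, usize) {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (Enc::Utf8, 3)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        (Enc::Utf16Le, 2)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        (Enc::Utf16Be, 2)
    } else {
        (Enc::Utf8, 0)
    }
}

/// Decode bytes (without BOM) into a Rust String. Invalid sequences become
/// U+FFFD, and an odd trailing byte in UTF-16 input is dropped (assumption).
fn decode(body: &[u8], enc: Enc) -> String {
    match enc {
        Enc::Utf8 => String::from_utf8_lossy(body).into_owned(),
        Enc::Utf16Le | Enc::Utf16Be => {
            let units: Vec<u16> = body
                .chunks_exact(2)
                .map(|p| {
                    let pair = [p[0], p[1]];
                    if enc == Enc::Utf16Le {
                        u16::from_le_bytes(pair)
                    } else {
                        u16::from_be_bytes(pair)
                    }
                })
                .collect();
            String::from_utf16_lossy(&units)
        }
    }
}

/// Re-encode text into the target encoding (no BOM is added here).
fn encode(text: &str, enc: Enc) -> Vec<u8> {
    match enc {
        Enc::Utf8 => text.as_bytes().to_vec(),
        Enc::Utf16Le => text.encode_utf16().flat_map(|u| u.to_le_bytes()).collect(),
        Enc::Utf16Be => text.encode_utf16().flat_map(|u| u.to_be_bytes()).collect(),
    }
}

/// Decode, replace every occurrence of `from` with `to`, and re-encode.
/// `forced` stands in for an encoding passed on the command line.
fn replace_in_file_bytes(input: &[u8], forced: Option<Enc>, from: &str, to: &str) -> Vec<u8> {
    let (sniffed, sniffed_bom_len) = sniff_bom(input);
    let enc = forced.unwrap_or(sniffed);
    // Only treat the leading bytes as a BOM if they match the encoding in use.
    let bom_len = if enc == sniffed { sniffed_bom_len } else { 0 };
    let (bom, body) = input.split_at(bom_len);
    let replaced = decode(body, enc).replace(from, to);
    // Preserve the original BOM so the file keeps its encoding marker.
    let mut out = bom.to_vec();
    out.extend(encode(&replaced, enc));
    out
}
```

Because the replacement happens on the decoded string, the byte offsets reported by ripgrep are never needed. The trade-off is that the whole file is held in memory and invalid input is not round-tripped byte for byte.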

Supported encodings (tests exist for them):

  • ASCII
  • UTF8
  • UTF16BE
  • UTF16LE
  • TODO: get a list of all the encodings ripgrep supports (uses the encoding_rs crate)

Labels: enhancement (New feature or request)
