Skip to content

basicio: use fs::path #3264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

basicio: use fs::path #3264

wants to merge 4 commits into from

Conversation

neheb
Copy link
Collaborator

@neheb neheb commented Apr 26, 2025

Allows removing manual UTF8 handling.

ping @gh-andre @VioletGiraffe

@neheb
Copy link
Collaborator Author

neheb commented Apr 26, 2025

ping @w17

@neheb
Copy link
Collaborator Author

neheb commented Apr 26, 2025

from libc++'s implementation:

 18   _LIBCPP_HIDE_FROM_ABI std::string string() const { return string<char>(); }
 17   _LIBCPP_HIDE_FROM_ABI __u8_string u8string() const {
 16     using _CVT = __narrow_to_utf8<sizeof(wchar_t) * __CHAR_BIT__>;
 15     __u8_string __s;
 14     __s.reserve(__pn_.size());
 13     _CVT()(back_inserter(__s), __pn_.data(), __pn_.data() + __pn_.size());
 12     return __s;
 11   }

u8string seems to do the conversion.

@neheb
Copy link
Collaborator Author

neheb commented Apr 26, 2025

oh this is funny.

so I renamed the exiv2 directory to ☺ in the hopes of getting some failures. I got stopped by CMake itself:

CMake Error: Could not read presets from /home/mangix/devstuff/☺:
File not found: /home/mangix/devstuff/☺/CMakePresets.json

@neheb
Copy link
Collaborator Author

neheb commented Apr 26, 2025

HAH. Even meson fails:

meson a;meson compile -C a

ninja: fatal: chdir to 'C:/Users/mangix/scoop/apps/msys2/2023-01-27/home/mangix/devstuff/?/a' - Invalid argument
ninja: Entering directory `C:/Users/mangix/scoop/apps/msys2/2023-01-27/home/mangix/devstuff/?/a'

cd a
ninja

src/lib_convert.a.p/convert.cpp.obj
"ccache" "c++" "-Isrc/lib_convert.a.p" "-Isrc" "-I../src" "-I." "-I.." "-I../include/exiv2" "-I../xmpsdk/include" "-fvisibility=hidden" "-fdiagnostics-color=always" "-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST" "-D_FILE_OFFSET_BITS=64" "-std=c++20" "-O0" "-g" "-DEXV_LOCALEDIR=\"C:/Users/mangix/scoop/apps/msys2/2023-01-27/clang64/share/locale\"" "-DEXIV2API=__declspec(dllexport)" -MD -MQ src/lib_convert.a.p/convert.cpp.obj -MF "src/lib_convert.a.p/convert.cpp.obj.d" -o src/lib_convert.a.p/convert.cpp.obj "-c" ../src/convert.cpp
libc++abi: terminating due to uncaught exception of type std::__1::__fs::filesystem::filesystem_error: filesystem error: in __wide_to_char: Illegal byte sequence

@neheb neheb requested a review from kmilos April 26, 2025 21:05
@gh-andre
Copy link

When I worked on replacing string paths with filesystem::path for my purposes, one of the challenges was handling actual files vs. other I/O implementations. I needed only files to work with arbitrary characters in file paths, so using filesystem::path in FileIo worked well for that, but I also had to leave the original string path methods in place, so people could still use them to open URLs, etc.

This is an export of the DGML I created when I reworked FileIo to use filesystem::path, which shows the components that would need additional changes, such as BasicIo::path() having to return a temporary because there is no longer a string in FileIo to return a reference to.

exiv2-classes-dgml-export

The export tool didn't do a good job and some arrows aren't leading to inner boxes, so some comments may look confusing, but it still should give you a good idea of the scope of the change.

I also want to mention that I think the approach or re-testing everything with paths containing various characters is a good idea - see what's working and what isn't before implementing a significant change like this. I would only do it against the test data, like image names with emojis, not with build tools, like project paths. It would be a good idea to introduce all types of characters in these paths, like A, £, ゎ, 🌎, to cover 1-, 2-, 3- and 4-byte characters.

These tests will probably reveal things like calling ReplaceFileA, etc., and calling filesystem::path::string, which will throw exceptions for characters outside of Win1252, unless the locale is set to en_US.UTF-8:

    std::filesystem::path p(u8"A" u8"\U0001F30E" u8"B");
    std::wstring ws = p.wstring();
    std::string ns = p.string();        // will throw unless std::locale("en_US.UTF-8") is set
    std::u8string us = p.u8string();

The locale may present a challenge if people want to run it for non-UTF-8 locales, but still be able to open files with emojis. This may be tricky without an explicit character handling like that in the diagram.

Lastly, Windows 10 reaches the end of support in 2025, so probably isn't as important for this anymore.

@neheb neheb force-pushed the fspp branch 2 times, most recently from 3320ce0 to cda9422 Compare April 27, 2025 23:31
Copy link

codecov bot commented Apr 27, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 63.88%. Comparing base (2e6fb08) to head (8fc087b).
Report is 48 commits behind head on main.

Files with missing lines Patch % Lines
src/basicio.cpp 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3264      +/-   ##
==========================================
- Coverage   63.91%   63.88%   -0.04%     
==========================================
  Files         104      104              
  Lines       21752    21714      -38     
  Branches    10636    10631       -5     
==========================================
- Hits        13903    13872      -31     
+ Misses       5641     5634       -7     
  Partials     2208     2208              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@neheb
Copy link
Collaborator Author

neheb commented Apr 27, 2025

  Actual: it throws Exiv2::Error with description "D:/a/exiv2/exiv2/test/data\Реган.jp2: Failed to open the data source: No such file or directory (errno = 2)".

boo.

@neheb neheb force-pushed the fspp branch 2 times, most recently from bd385d4 to 27ae3a6 Compare April 27, 2025 23:47
@gh-andre
Copy link

Реган.jp2

I cannot figure out the actual character for the first pair bytes, but the subsequent pairs appear to be spelling еган and if that's the case, it means it interprets UTF-8 as Latin1, which would point to an instance of std::string interpreted without setting the locale to UTF-8.

@neheb neheb force-pushed the fspp branch 2 times, most recently from c791779 to 559f336 Compare April 28, 2025 00:24
@neheb
Copy link
Collaborator Author

neheb commented Apr 28, 2025

Реган.jp2

I cannot figure out the actual character for the first pair bytes, but the subsequent pairs appear to be spelling еган and if that's the case, it means it interprets UTF-8 as Latin1, which would point to an instance of std::string interpreted without setting the locale to UTF-8.

it gets better. that's from the CMake CI. The meson one properly prints Реган but still fails to open.

neheb added 3 commits April 27, 2025 18:29
Allows removing manual UTF8 handling.

Signed-off-by: Rosen Penev <[email protected]>
Signed-off-by: Rosen Penev <[email protected]>
@neheb neheb force-pushed the fspp branch 2 times, most recently from 8fc087b to 37a911f Compare April 28, 2025 01:37
Signed-off-by: Rosen Penev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants