[RFC] Add subrandr SRV3 and WebVTT subtitle renderer #16271
Excellent job! This works almost perfectly on the few videos I tried. Besides the few limitations you've listed,
This is actually not necessary: you can set PREFIX to any directory and then point meson to the pkgconfig file. For example I installed it to |
By "Nothing" I meant that we could "do nothing", not that there are no alternatives; there wouldn't be an alternatives section if there weren't any. Looking at ffmpeg-devel I don't see any patches for improving styling support, ffmpeg supported simple webvtt throwing out Also I fixed the stride hack so now the resulting |
Hi, I wonder why Windows is blocked for this reason. Fontconfig is available on Windows too. |
Well if that's the case that would make things easier. After downloading many dlls from msys packages I even got it to compile, run, and find the config file in wine but it doesn't find any fonts. I'm hoping that this actually does work on Windows but someone on Windows would have to actually check that. In particular I have no idea what encoding fontconfig returns in FC_FILE on Windows and am currently wishfully assuming it's UTF-8. |
subrandr is a subtitle rendering library which aims to render SRV3 (YouTube) subtitles and WebVTT subtitles accurately. Currently, WebVTT subs in mpv are rendered via ffmpeg conversion to ASS, which throws away a lot of the style and completely disregards the WebVTT non-region-cue positioning algorithm. Furthermore, if one wants to render more complex SRV3 subtitles, one has to resort to external converters since SRV3 is not even supported by ffmpeg. subrandr, however, is able to render SRV3 subtitles natively with support for the most commonly used features. It can render ruby text without relying on font metrics during conversion (an obviously fragile approach), and it can perform correct scaling using the exact calculations used by YouTube instead of making up ASS approximations. Similarly, it follows the WebVTT spec for the features of WebVTT that it supports (mostly). It's not perfect of course, and there are still many things it doesn't do or does wrong, but those are things that can be incrementally improved outside of mpv.
Allows scripts to detect the presence of subrandr at runtime, useful for determining whether this mpv instance can play SRV3 subtitles.
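As a rough illustration of such a runtime check from outside mpv, here is a minimal Python sketch that asks a running mpv for the `subrandr-version` property (the property name used later in this PR) over mpv's JSON IPC socket. It assumes mpv was started with `--input-ipc-server=<path>` on a Unix-like system, and it assumes that builds without subrandr answer with a non-`success` error; the helper names `build_request`, `parse_version`, and `query_subrandr_version` are made up for this example, and the property value is treated as an opaque string.

```python
import json
import socket


def build_request(request_id=1):
    """Serialize a get_property request; mpv expects one JSON object per line."""
    msg = {"command": ["get_property", "subrandr-version"], "request_id": request_id}
    return json.dumps(msg) + "\n"


def parse_version(reply_line):
    """Return the property value, or None if mpv reported any error."""
    reply = json.loads(reply_line)
    if reply.get("error") != "success":
        # Assumption: a build without subrandr does not report "success" here.
        return None
    return reply.get("data")


def query_subrandr_version(socket_path, request_id=1):
    """Query a running mpv instance; returns None if the socket closes early."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_request(request_id).encode("utf-8"))
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                return None
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                msg = json.loads(line)
                # Skip asynchronous events; only the reply carries our request_id.
                if msg.get("request_id") == request_id and "error" in msg:
                    return parse_version(line)
```

A caller could then fall back to requesting WebVTT subtitles whenever `query_subrandr_version(path)` returns `None`.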
This allows YouTube videos played directly from a URL to make use of subrandr's SRV3 support.
Implemented WebVTT snap-to-lines = false layout and Unicode line breaking. This means I'm now slightly less afraid of breaking people's WebVTT rendering. Is there anything to do on the MPV side before this is ready for review? I was thinking that it may be confusing if people using very customized subtitle options have their customization ignored in WebVTT (because sd_sbr doesn't implement them). Maybe initially we could just use subrandr for SRV3 to not cause regressions with WebVTT if subrandr is enabled, though there is still |
This PR adds support for subrandr, a subtitle rendering library I've been working on for, uh, the past 7 months.
The whole point is to render non-ASS subtitle formats correctly, without conversion, because conversion is lossy most of the time. Currently the supported formats are SRV3 (YouTube's subtitle format) and WebVTT.
Results
I have collected a few videos that use more complex SRV3 subtitles while working on subrandr, so I spent some time making three funny dwm four way comparisons between:
I believe supports ruby text via manual layout with font metrics at conversion time (I don't know whether this is actually the case?), this approach is obviously fragile with font fallback in the mix so personally I don't consider it a real solution.
Comparisons of example videos
【original anime MV】幽霊船戦【hololive/宝鐘マリン】

Hololive music videos often have ruby text and as such are decent testing material. subrandr should handle SRV3 ruby text correctly although it's not implemented for WebVTT yet.
Worst Teambuilding Exercise Ever

This video contains a lot of positioned subtitles with different types of text shadow, and at this particular moment also exercises line-wrapping, which for SRV3 should be greedy.
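For readers unfamiliar with the term, greedy line-wrapping means putting as many words as fit on the current line before starting the next one, without looking ahead to balance line lengths. The following is a minimal Python sketch of that idea, not subrandr's actual code: it measures width in characters as a stand-in for shaped text width, treats only spaces as break opportunities, and the name `greedy_wrap` and its parameters are hypothetical.

```python
def greedy_wrap(words, max_width, measure=len, space_width=1):
    """Greedily pack words into lines no wider than max_width.

    measure() stands in for real text measurement (here: character count).
    A single word wider than max_width still gets a line of its own.
    """
    lines = []
    current = []
    current_width = 0
    for word in words:
        word_width = measure(word)
        # Width added by this word, including the separating space if needed.
        extra = word_width if not current else space_width + word_width
        if current and current_width + extra > max_width:
            # The word does not fit: close the current line and start a new one.
            lines.append(" ".join(current))
            current = [word]
            current_width = word_width
        else:
            current.append(word)
            current_width += extra
    if current:
        lines.append(" ".join(current))
    return lines
```

A real renderer would measure shaped glyph runs and take break opportunities from the Unicode line breaking algorithm rather than splitting on spaces.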
sodapoppin checks out Northernlion's stream

This one is not that special and the ASS conversion is decently close, but the positioning is of course incorrect because it is fully in the video frame and the font size is similarly just slightly off.
For the sake of completeness, the process I used to create the comparisons
Each quadrant is either an mpv or firefox window, run under X11 with the dwm window manager in master layout with nmasters = 2, with the windows being constructed as follows:
Top left: Use mpv compiled with subrandr and play a downloaded copy of the video with an accompanying srv3 file. (--sub-format srv3 in ytdl)
Top right: Convert srv3 file to ass file via ffmpeg (ffmpeg -i <in>.srv3 <out>.ass) then play the video with mpv and switch to the ASS track. (requires this fork of ffmpeg)
Bottom left: Play a downloaded copy of the video with an accompanying vtt file downloaded from YouTube.
Bottom right: userChrome.css.
I could've probably made separate screenshots in fullscreen mode and then stitched them together inside a markdown table... whatever, this is the first thing I thought of and it probably looks better.
Limitations
Since the library is still in "early" stages there's a lot of things that are not done correctly yet, this is a non-exhaustive list of the most important such things:
Line-breaking is very naive and does unnecessary reshaping, lines are only broken on whitespace instead of following the Unicode line breaking algorithm. Fixed.
Font selection is not compliant with the CSS font matching algorithm, this has been mostly implemented in a branch but is not yet finished. I have since learned that chromium does something pretty close to what I do, so it's staying.
There are like two places where it can panic because of unimplemented things but that's easily fixable. Fixed.
The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig". I attempted to write a DirectWrite font provider earlier today, but got brain damage due to the lack of a simple "give me a font with this codepoint" function and put it off for later. (didn't think there could be something worse than fontconfig, but then IDWriteTextAnalysisSource hit me)
mpv integration unresolved issues
I changed ytdl-hook to request srv3 subtitles, however this is not gated behind conditional compilation or any sort of runtime check so it breaks in builds without subrandr. Maybe a runtime property could be added that Lua can read? Solved with subrandr-version property.
dpi of 72, this shouldn't have impact on subtitle layout with the currently supported formats, but it does impact debug UI when enabled via SBR_DEBUG=draw_version,draw_perf,draw_layout and may break in the future if support for CSS in WebVTT is added since one could do ::cue { font-size: 20px; } which must be scaled by the device pixel ratio. I have no idea how to get dpi information in get_bitmaps without digging into mpv_global which contains a warning specifically telling you not to do that.
Oh and I almost forgot, currently I forcefully un-align the stride of the resulting mp_image which could probably cause issues down the line on some platforms, so that should probably be changed. Fixed.
Building
So if you got this far and are on Linux with FreeType, HarfBuzz (with FreeType support), and Fontconfig libraries installed, here's how you build and install the library:
(You also need Rust installed, the latest stable toolchain should work)
You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)
After building the library itself, you should be able to build mpv as usual. By passing -D subrandr=enabled to meson setup you can ensure the library is correctly detected or you will get a build error. The library itself is linked statically, with mpv inheriting the dynamic library dependencies.
Alternatives
Is this all necessary? Other possible approaches could be:
Naturally, after spending months working on this, I am slightly biased and believe a separate renderer is worth it because it allows iterating on other subtitle formats without having to worry about the unhinged format known as Advanced SubStation Alpha. At first I was developing ASS support in parallel to SRV3 in subrandr, but then I realized how much horrible complexity ASS adds and purged it from the code base. This, in my eyes, confirmed that it is significantly simpler to have other formats handled separately.
Thanks for reading, hope you like my work :)