[RFC] Add subrandr SRV3 and WebVTT subtitle renderer #16271

Draft
wants to merge 3 commits into
base: master

afishhh commented Apr 20, 2025

This PR adds support for subrandr, a subtitle rendering library I've been working on for, uh, the past 7 months.

The whole point is to render non-ASS subtitle formats correctly, without conversion, because conversion is lossy most of the time. The currently supported formats are SRV3 (YouTube's subtitle format) and WebVTT.

Results

While working on subrandr I collected a few videos that use more complex SRV3 subtitles, so I spent some time making three funny four-way dwm comparisons between:

  • Top left: mpv with subrandr
  • Top right: conversion to ASS via my ffmpeg decoder. Alternatively one may use YTSubConverter, which I believe supports ruby text via manual layout with font metrics at conversion time (I don't know whether this is actually the case). That approach is obviously fragile with font fallback in the mix, so personally I don't consider it a real solution.
  • Bottom left: the status quo when playing a video from a URL, i.e. subtitles converted to WebVTT on YouTube's side, then converted to ASS by ffmpeg and rendered with libass
  • Bottom right: YouTube web player, ground truth

Comparisons of example videos

【original anime MV】幽霊船戦【hololive/宝鐘マリン】
Hololive music videos often have ruby text and as such are decent testing material. subrandr should handle SRV3 ruby text correctly although it's not implemented for WebVTT yet.
[Screenshot: 2025-04-20_14-53_1]

Worst Teambuilding Exercise Ever
This video contains a lot of positioned subtitles with different types of text shadow, and at this particular moment also exercises line-wrapping, which for SRV3 should be greedy.
[Screenshot: 2025-04-20_15-01]
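
For reference, "greedy" wrapping means each line takes as many words as fit before breaking, with no attempt to balance line lengths. The following is only a minimal sketch of that idea, not subrandr's implementation: `greedy_wrap` and its `measure` callback are hypothetical names, character count stands in for real shaped-text width, and breaks are only considered at whitespace.

```python
from typing import Callable, List


def greedy_wrap(text: str, max_width: float,
                measure: Callable[[str], float] = len) -> List[str]:
    """Break `text` into lines, filling each line as far as possible."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit: close the current line and start a new one.
            lines.append(current)
            current = word
        else:
            # Either the word fits, or the line is empty and must take it anyway.
            current = candidate
    if current:
        lines.append(current)
    return lines


print(greedy_wrap("the quick brown fox jumps over the lazy dog", 10))
# ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
```

A word wider than `max_width` ends up alone on an overflowing line here; a real renderer would need a policy for that case.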

sodapoppin checks out Northernlion's stream
This one is not that special and the ASS conversion is decently close, but the positioning is of course incorrect because it is fully in the video frame and the font size is similarly just slightly off.
[Screenshot: 2025-04-20_15-11]

For the sake of completeness, the process I used to create the comparisons

Each quadrant is either an mpv or Firefox window, run under X11 with the dwm window manager in master layout with nmasters = 2. The windows were set up as follows:
  • Top left: Use mpv compiled with subrandr and play a downloaded copy of the video with an accompanying SRV3 file (--sub-format srv3 in ytdl).
  • Top right: Convert the SRV3 file to an ASS file via ffmpeg (ffmpeg -i <in>.srv3 <out>.ass), then play the video with mpv and switch to the ASS track (requires this fork of ffmpeg).
  • Bottom left: Play a downloaded copy of the video with an accompanying VTT file downloaded from YouTube.
  • Bottom right:

  1. Disable browser decorations via userChrome.css.
  2. Open the video on YouTube.
  3. Seek to the appropriate time, put the player in theater mode.
  4. Run below snippet in the console
(() => {
  // Pin an element on top of everything else on the page.
  // h == "auto" centers the element; any other value anchors it at the top-left corner.
  function full(el, h) {
    el.style.position = "fixed";
    el.style.zIndex = 10000000;
    if (h == "auto") {
      el.style.top = "50%";
      el.style.left = "50%";
      el.style.transform = "translate(-50%, -50%)";
    } else {
      el.style.top = "0";
      el.style.left = "0";
    }
    el.style.backgroundColor = "black";
  }
  const player = document.querySelector("#movie_player");
  full(player, "100vh");
  full(player.querySelector("video"), "auto");
  // Hide the player controls at the bottom.
  player.querySelector(".ytp-chrome-bottom").style.display = "none";
})()
  5. If the subtitles move when the video is paused (I don't remember what this depends on), make sure the video is playing when the screenshot is taken.

I could've probably made separate screenshots in fullscreen mode and then stitched them together inside a markdown table... whatever, this is the first thing I thought of and it probably looks better.

Limitations

Since the library is still in "early" stages, there are a lot of things that are not done correctly yet. This is a non-exhaustive list of the most important ones:

  • No DirectWrite font provider. (no fonts will be found on Windows)
  • No CoreText font provider. (no fonts will be found on macOS)
  • Line-breaking is very naive and does unnecessary reshaping; lines are only broken on whitespace instead of following the Unicode line breaking algorithm. (Fixed.)
  • Unicode bidirectional algorithm is not used.
  • Any form of vertical text is unsupported.
  • Font selection is not compliant with the CSS font matching algorithm; this has been mostly implemented in a branch but is not yet finished. I have since learned that Chromium does something pretty close to what I do, so it's staying.
  • Subpixel glyph rendering is not implemented, so positions are rounded to integer values. This looks wrong on non-HiDPI displays at lower window sizes. I plan to fix this soon.
  • The rasterizer is not optimized to libass levels; everything is portable (although unsafe) Rust code. The design allows for some GPU acceleration through an optional wgpu rasterizer, although that mostly covers only blitting and blurring, with actual path rendering being too complicated. Because mpv is designed with software-only OSD rendering in mind, it can't be integrated easily. (correct me if I'm wrong)
  • There are like two places where it can panic because of unimplemented things, but that's easily fixable. (Fixed.)

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".
I attempted to write a DirectWrite font provider earlier today, but got brain damage due to the lack of a simple "give me a font with this codepoint" function and put it off for later. (didn't think there could be something worse than fontconfig, but then IDWriteTextAnalysisSource hit me)
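
To illustrate why rounding glyph positions to whole pixels (the subpixel item in the list above) can look wrong at small sizes, here is a small sketch. The 7.4 px advance is a made-up value and the pen-position model is generic, not subrandr's code. Rounding each position yields gaps that alternate between 7 and 8 pixels, and a one-pixel difference is proportionally larger the smaller the text is.

```python
def pen_positions(advance: float, count: int) -> list:
    """Exact (fractional) x position of each glyph origin on a line."""
    return [i * advance for i in range(count)]


def gaps(positions: list) -> list:
    """Distance between consecutive glyph origins."""
    return [b - a for a, b in zip(positions, positions[1:])]


exact = pen_positions(7.4, 5)        # hypothetical 7.4 px advance
snapped = [round(x) for x in exact]  # integer-only positioning

print(snapped)        # [0, 7, 15, 22, 30]
print(gaps(snapped))  # [7, 8, 7, 8]
```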

mpv integration unresolved issues

  • I changed ytdl-hook to request srv3 subtitles, however this is not gated behind conditional compilation or any sort of runtime check, so it breaks in builds without subrandr. Maybe a runtime property could be added that Lua can read? (Solved with the subrandr-version property.)
  • subrandr is always passed a DPI of 72. This shouldn't affect subtitle layout with the currently supported formats, but it does affect the debug UI when enabled via SBR_DEBUG=draw_version,draw_perf,draw_layout, and it may break in the future if support for CSS in WebVTT is added, since one could write ::cue { font-size: 20px; }, which must be scaled by the device pixel ratio.
    I have no idea how to get DPI information in get_bitmaps without digging into mpv_global, which contains a warning specifically telling you not to do that.
  • Oh and I almost forgot, currently I forcefully un-align the stride of the resulting mp_image, which could probably cause issues down the line on some platforms, so that should probably be changed. (Fixed.)
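
As a worked example of the scaling concern in the DPI item above: a CSS pixel length such as the 20px in the ::cue example has to be multiplied by the device pixel ratio to get a size in physical pixels. The helper name below is hypothetical, and how a renderer derives the ratio from a DPI value is deliberately left out, since that mapping is an assumption I can't confirm for subrandr.

```python
def css_px_to_device_px(css_px: float, device_pixel_ratio: float) -> float:
    """Convert a CSS pixel length to physical (device) pixels."""
    return css_px * device_pixel_ratio


for ratio in (1.0, 1.5, 2.0):
    # The same 20px cue text needs a different raster size per display.
    print(ratio, css_px_to_device_px(20, ratio))
# 1.0 20.0
# 1.5 30.0
# 2.0 40.0
```

With a fixed ratio, the text would render at the same physical pixel size on every display, so it would look too small on HiDPI screens.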

Building

So if you got this far and are on Linux with FreeType, HarfBuzz (with FreeType support), and Fontconfig libraries installed, here's how you build and install the library:

git clone https://github.com/afishhh/subrandr
cd subrandr
cargo b --release
PREFIX=/<path>/<to>/<prefix> ./install.sh

(You also need Rust installed; the latest stable toolchain should work.)
You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)

After building the library itself, you should be able to build mpv as usual. Passing -D subrandr=enabled to meson setup ensures that you get a build error if the library is not detected. The library itself is linked statically, with mpv inheriting its dynamic library dependencies.

Alternatives

Is this all necessary? Other possible approaches could be:

  • Adding extensions to libass for things like ruby text. This doesn't account for format idiosyncrasies like positioning, which is especially important with WebVTT, where the positioning algorithm is very different (step 10 onward).
  • Conversion in ffmpeg, see my patch. It does a decent job but has no chance of supporting stuff like ruby text, because loading fonts in a decoder would be very strange and would probably never get merged into mainline ffmpeg. Did I mention it's hacky and fragile yet?
  • Do nothing. The sad truth is that these formats, even though they're quite powerful, are seldom used to even a fraction of their full potential. YouTube themselves don't support some of SRV3's features on non-Web players (a crazy idea would be to add subrandr to revanced, food for thought).

Naturally, after spending months working on this, I am slightly biased and believe a separate renderer is worth it, because it allows iterating on other subtitle formats without having to worry about the unhinged format known as Advanced SubStation Alpha. At first I was developing ASS support in parallel to SRV3 in subrandr, but then I realized how much horrible complexity ASS adds and purged it from the code base. In my eyes this confirmed that it is significantly simpler to have other formats handled separately.

Thanks for reading, hope you like my work :)

@llyyr
Contributor

llyyr commented Apr 20, 2025

Excellent job! This works almost perfectly on the few videos I tried. Besides the few limitations you've listed, one other thing I noticed was that alignment is a bit off in the video I tested with at 00:59. Scratch that, subrandr matches Chromium output, it's actually Firefox that's dorky here. It's probably sensible to treat Chromium as the ground truth here.

You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)

This is actually not necessary: you can set PREFIX to any directory and then point meson to the pkgconfig file. For example, I installed it to PREFIX=/opt/subrandr and configured meson with meson setup build --pkg-config-path /opt/subrandr/lib/pkgconfig. This avoids littering /usr/local with files or needing to change environment variables.

afishhh force-pushed the subrandr branch 2 times, most recently from 800fc48 to c6e4b4f (April 20, 2025 23:01)
@afishhh
Author

afishhh commented Apr 20, 2025

Alternatives -> Nothing

Didn't FFMPEG finally add WebVTT support just 2 days ago?

By "Nothing" I meant that we could "do nothing", not that there are no alternatives; there wouldn't be an alternatives section with alternatives if there weren't alternatives. Looking at ffmpeg-devel I don't see any patches for improving styling support; ffmpeg has supported simple WebVTT, throwing out most of the styles, for a long time already.

Also, I fixed the stride hack, so now the resulting mp_image has a properly aligned stride.
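
For context on the stride fix: image rows are commonly padded so that each row starts at an aligned byte offset. Below is a sketch of the usual round-up computation. The helper name, the 64-byte alignment and the 4 bytes per pixel are illustrative assumptions, not values taken from mpv or from this PR.

```python
def aligned_stride(width_px: int, bytes_per_pixel: int = 4, align: int = 64) -> int:
    """Smallest multiple of `align` that holds one row of pixels.

    `align` must be a power of two for the bit mask below to work.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    row_bytes = width_px * bytes_per_pixel
    # Round up to the next multiple of `align`.
    return (row_bytes + align - 1) & ~(align - 1)


print(aligned_stride(1918))  # 7680: 7672 bytes of pixels plus 8 bytes of padding
print(aligned_stride(1920))  # 7680: already a multiple of 64
```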

@hooke007
Contributor

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".

Hi, I wonder why Windows is blocked for this reason. Fontconfig is available on Windows too.

@afishhh
Author

afishhh commented Apr 21, 2025

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".

Hi, I wonder why Windows is blocked for this reason. Fontconfig is available on Windows too.

Well, if that's the case, that would make things easier. After downloading many DLLs from MSYS packages I even got it to compile, run, and find the config file in Wine, but it doesn't find any fonts. I'm hoping that this actually does work on Windows, but someone on Windows would have to check that. In particular, I have no idea what encoding fontconfig returns in FC_FILE on Windows and am currently wishfully assuming it's UTF-8.

afishhh added 3 commits April 22, 2025 18:40
subrandr is a subtitle rendering library which aims to render
SRV3 (YouTube) subtitles and WebVTT subtitles accurately.

Currently in mpv WebVTT subs are rendered via ffmpeg conversion to ASS
which throws away a lot of the style and completely disregards
the WebVTT non-region-cue positioning algorithm. Furthermore if
one wants to render some more complex SRV3 subtitles one has to
resort to external converters since it's not even supported by ffmpeg.

However subrandr is able to render SRV3 subtitles natively with support
for the most commonly used features. It can render ruby text without
relying on font metrics during conversion which is obviously fragile,
and it can perform correct scaling using the exact calculations used by
YouTube instead of making up ASS approximations. Similarly it follows
the WebVTT spec for the features of WebVTT that it supports (mostly).

It's not perfect of course and there's still many things it doesn't do
or does wrong but those are things that can be incrementally improved
outside of mpv.

Allows script to detect the presence of subrandr at runtime, useful for
determining whether this mpv instance can play SRV3 subtitles.

This allows YouTube videos played directly from a URL to make use of
subrandr's SRV3 support.

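
The runtime detection described in the commit messages above is meant for Lua scripts. As a rough illustration of the same check from outside mpv, here is a sketch using mpv's JSON IPC protocol. It assumes the subrandr-version property is reachable through the generic get_property command like any other property, which I have not verified. The helper names are hypothetical, and the example replies are hand-written, not captured from mpv.

```python
import json


def build_request(prop: str, request_id: int = 1) -> bytes:
    """Encode a get_property command as one newline-terminated JSON line."""
    message = {"command": ["get_property", prop], "request_id": request_id}
    return (json.dumps(message) + "\n").encode("utf-8")


def property_value(reply_line: str):
    """Return the property's value, or None if mpv reported an error."""
    reply = json.loads(reply_line)
    if reply.get("error") != "success":
        return None
    return reply.get("data")


# Hand-written example replies; "x.y.z" is a placeholder version string.
print(property_value('{"data": "x.y.z", "error": "success", "request_id": 1}'))
print(property_value('{"error": "property not found", "request_id": 1}'))
```

The bytes from `build_request("subrandr-version")` would be written to the socket opened with mpv's --input-ipc-server option. A `None` result would then be treated as "this build cannot play SRV3".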
@afishhh
Author

afishhh commented Apr 22, 2025

Implemented WebVTT snap-to-lines = false layout and Unicode line breaking. This means I'm now slightly less afraid of breaking people's WebVTT rendering.

Is there anything to do on the mpv side before this is ready for review? I was thinking that it may be confusing if people using very customized subtitle options have their customization ignored in WebVTT (because sd_sbr doesn't implement them). Maybe initially we could use subrandr only for SRV3, so that enabling subrandr doesn't cause regressions with WebVTT. There is still ytdl_hook, which will now start ignoring your settings by preferring SRV3, but that's less difficult to make configurable at runtime I guess.
