[RFC] Add subrandr SRV3 and WebVTT subtitle renderer #16271

afishhh · 2025-04-20T16:54:59Z

This PR adds support for subrandr, a subtitle rendering library I've been working on for, uh, the past 7 months.

The whole point is to render non-ASS subtitle formats correctly, without conversion, because conversion is lossy most of the time. Currently the supported formats are SRV3 (YouTube's subtitle format) and WebVTT.

Results

I have collected a few videos that use more complex SRV3 subtitles while working on subrandr, so I spent some time making three funny dwm four-way comparisons between:

Top left: mpv with subrandr
Top right: conversion to ASS via my ffmpeg decoder. Alternatively one may use YTSubConverter, which ~~I believe supports ruby text via manual layout with font metrics at conversion time~~ (I don't know whether this is actually the case?). That approach is obviously fragile with font fallback in the mix, so personally I don't consider it a real solution.
Bottom left: status quo when playing a video from a URL now, which is subtitles in WebVTT format converted on YouTube's side, converted to ASS by ffmpeg and then played using libass
Bottom right: YouTube web player, ground truth

Comparisons of example videos

【original anime MV】幽霊船戦【hololive/宝鐘マリン】
Hololive music videos often have ruby text and as such are decent testing material. subrandr should handle SRV3 ruby text correctly although it's not implemented for WebVTT yet.

Worst Teambuilding Exercise Ever
This video contains a lot of positioned subtitles with different types of text shadow, and at this particular moment also exercises line-wrapping, which for SRV3 should be greedy.

sodapoppin checks out Northernlion's stream
This one is not that special and the ASS conversion is decently close, but the positioning is of course incorrect because it is fully in the video frame and the font size is similarly just slightly off.
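For anyone unfamiliar with the term, here is a minimal sketch of the greedy line-wrapping mentioned for the second video. It is not subrandr's implementation: `greedyWrap` is a hypothetical helper, and it measures width in characters as a simplifying assumption, whereas a real renderer would measure shaped glyph runs.

```javascript
// Greedy line wrapping: keep appending words to the current line while they
// fit, and start a new line as soon as the next word would overflow.
// ASSUMPTION: width is counted in characters purely for illustration.
function greedyWrap(text, maxWidth) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current === "" ? word : current + " " + word;
    if (current === "" || candidate.length <= maxWidth) {
      // Either the line is empty (an overlong word still gets its own line)
      // or the word fits on the current line.
      current = candidate;
    } else {
      // The word does not fit: close the current line and start a new one.
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

console.log(greedyWrap("the quick brown fox jumps", 10));
```

Greedy wrapping never looks ahead, so it can leave a short last line where a balanced algorithm would even things out. That difference is the kind of thing that shows up in a side-by-side comparison.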

For the sake of completeness, the process I used to create the comparisons

Each quadrant is either an mpv or Firefox window, run under X11 with the dwm window manager in master layout with nmaster = 2, with the windows being constructed as follows:
Top left: Use mpv compiled with subrandr and play a downloaded copy of the video with an accompanying srv3 file. (--sub-format srv3 in ytdl)
Top right: Convert srv3 file to ass file via ffmpeg (ffmpeg -i <in>.srv3 <out>.ass) then play the video with mpv and switch to the ASS track. (requires this fork of ffmpeg)
Bottom left: Play a downloaded copy of the video with an accompanying vtt file downloaded from YouTube.
Bottom right:

Disable browser decorations via userChrome.css.
Open the video on YouTube.
Seek to the appropriate time, put the player in theater mode.
Run the snippet below in the console:

(() => {
  // Pin `el` above the rest of the page; `h` only selects the positioning mode.
  function full(el, h) {
    el.style.position = "fixed";
    el.style.zIndex = 10000000;
    if (h === "auto") {
      // Center the element in the viewport.
      el.style.top = "50%";
      el.style.left = "50%";
      el.style.transform = "translate(-50%, -50%)";
    } else {
      // Anchor the element to the top-left corner.
      el.style.top = "0";
      el.style.left = "0";
    }
    el.style.backgroundColor = "black";
  }

  const player = document.querySelector("#movie_player");
  full(player, "100vh");
  full(player.querySelector("video"), "auto");
  // Hide the player controls.
  player.querySelector(".ytp-chrome-bottom").style.display = "none";
})()

If the subtitles move when the video is paused (I don't remember what this depends on), make sure the video is playing when the screenshot is taken.

I could've probably made separate screenshots in fullscreen mode and then stitched them together inside a markdown table... whatever, this is the first thing I thought of and it probably looks better.

Limitations

Since the library is still in "early" stages, there are a lot of things that are not done correctly yet. This is a non-exhaustive list of the most important ones:

No DirectWrite font provider. (no fonts will be found on Windows)
No CoreText font provider. (no fonts will be found on macOS)
~~Line-breaking is very naive and does unnecessary reshaping, lines are only broken on whitespace instead of following the Unicode line breaking algorithm.~~ Fixed.
Unicode bidirectional algorithm is not used.
Any form of vertical text is unsupported.
~~Font selection is not compliant with the CSS font matching algorithm, this has been mostly implemented in a branch but is not yet finished.~~ I have since learned that Chromium does something pretty close to what I do, so it's staying.
Subpixel glyph rendering is not implemented, so positions are rounded to integer values. This looks wrong on non-HiDPI displays at smaller window sizes. I plan to fix this soon.
The rasterizer is not optimized to libass levels; everything is portable (although unsafe) Rust code. The design allows for some GPU acceleration, implemented in an optional wgpu rasterizer, although that mostly covers only blitting and blurring, since actual path rendering is too complicated. Because mpv is designed with software-only OSD rendering in mind, it can't be integrated easily. (correct me if I'm wrong)
~~There are like two places where it can panic because of unimplemented things but that's easily fixable.~~ Fixed.

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".
I attempted to write a DirectWrite font provider earlier today, but got brain damage due to the lack of a simple "give me a font with this codepoint" function and put it off for later. (didn't think there could be something worse than fontconfig, but then IDWriteTextAnalysisSource hit me)

mpv integration unresolved issues

I changed ytdl_hook to request srv3 subtitles, however this is not gated behind conditional compilation or any sort of runtime check, so it breaks in builds without subrandr. Maybe a runtime property could be added that Lua can read? Solved with the subrandr-version property.
subrandr is always passed a dpi of 72. This shouldn't have any impact on subtitle layout with the currently supported formats, but it does impact the debug UI when enabled via SBR_DEBUG=draw_version,draw_perf,draw_layout, and it may break in the future if support for CSS in WebVTT is added, since one could do ::cue { font-size: 20px; } which must be scaled by the device pixel ratio.
I have no idea how to get dpi information in get_bitmaps without digging into mpv_global which contains a warning specifically telling you not to do that.
~~Oh and I almost forgot, currently I forcefully un-align the stride of the resulting mp_image which could probably cause issues down the line on some platforms, so that should probably be changed.~~ Fixed.
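To illustrate the runtime check from the first item, here is a rough sketch of how a script could use the subrandr-version property to decide which subtitle formats to ask for. The real ytdl_hook is written in Lua; this uses mpv's JavaScript scripting API instead, and both `preferredSubFormats` and the returned format list are hypothetical. It also assumes the property is simply unavailable on builds without subrandr, so that `mp.get_property` returns `undefined`.

```javascript
// Hypothetical helper: pick the YouTube subtitle formats to request.
// `mp` is mpv's scripting object; only mp.get_property is used here.
function preferredSubFormats(mp) {
  // "subrandr-version" is the property added by this PR.
  // ASSUMPTION: without subrandr the lookup fails and yields undefined.
  const version = mp.get_property("subrandr-version");
  const hasSubrandr = typeof version === "string" && version.length > 0;
  // Prefer SRV3 only when this mpv instance can actually render it.
  return hasSubrandr ? ["srv3", "vtt"] : ["vtt"];
}
```

Inside an mpv script this would be called as `preferredSubFormats(mp)`. Passing `mp` in explicitly just makes the helper easy to try out with a stub outside of mpv.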

Building

So if you got this far and are on Linux with FreeType, HarfBuzz (with FreeType support), and Fontconfig libraries installed, here's how you build and install the library:

git clone https://github.com/afishhh/subrandr
cd subrandr
cargo b --release
PREFIX=/<path>/<to>/<prefix> ./install.sh

(You also need Rust installed, the latest stable toolchain should work)
You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)

After building the library itself, you should be able to build mpv as usual. Passing -D subrandr=enabled to meson setup ensures that the library is either detected correctly or the build fails with an error. The library itself is linked statically, with mpv inheriting the dynamic library dependencies.

Alternatives

Is this all necessary? Other possible approaches could be:

Adding extensions to libass for things like ruby text. This doesn't account for format idiosyncrasies like positioning, which is especially important with WebVTT, where the positioning algorithm is very different (step 10 onward).
Conversion in ffmpeg (see my patch). It does a decent job but has no chance of supporting stuff like ruby text, because loading fonts in a decoder would be very strange and would probably never get merged into mainline ffmpeg. Did I mention it's hacky and fragile yet?
Do nothing. The sad truth is that these formats, even though they're quite powerful, are seldom used to even a fraction of their full potential. YouTube themselves don't support some of SRV3's features on non-Web players (a crazy idea would be to add subrandr to ReVanced, food for thought).

Naturally, after spending months working on this, I am slightly biased and believe a separate renderer is worth it, because it allows iterating on other subtitle formats without having to worry about the unhinged format known as Advanced SubStation Alpha. At first I was developing ASS support in parallel to SRV3 in subrandr, but then I realized how much horrible complexity ASS adds and purged it from the code base. This, in my eyes, confirmed that it is significantly simpler to have other formats handled separately.

Thanks for reading, hope you like my work :)

llyyr · 2025-04-20T17:20:16Z

Excellent job! This works almost perfectly on the few videos I tried. Besides the few limitations you've listed, ~~one other thing I noticed was that alignment is a bit off in the video I tested with at 00:59.~~ Scratch that, subrandr matches Chromium output, it's actually Firefox that's dorky here. It's probably sensible to treat Chromium as the ground truth here.

You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)

This is actually not necessary: you can set PREFIX to any directory and then point meson to the pkg-config file. For example, I installed it to PREFIX=/opt/subrandr and configured meson with meson setup build --pkg-config-path /opt/subrandr/lib/pkgconfig. This avoids littering /usr/local with files or needing to change environment variables.

github-actions · 2025-04-20T18:47:35Z

Download the artifacts for this pull request:

Windows

macOS

afishhh · 2025-04-20T23:07:40Z

Alternatives -> Nothing

Didn't FFMPEG finally add WebVTT support just 2 days ago?

By "Nothing" I meant that we could "do nothing", not that there are no alternatives; there wouldn't be an alternatives section with alternatives if there weren't alternatives. Looking at ffmpeg-devel I don't see any patches for improving styling support. ffmpeg has supported simple WebVTT, throwing out ~~all~~ most of the styles, for a long time already.

Also I fixed the stride hack so now the resulting mp_image has proper aligned stride.

hooke007 · 2025-04-21T05:55:56Z

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".

Hi, I wonder why Windows is blocked for this reason. Fontconfig is available on Windows too.

afishhh · 2025-04-21T10:47:38Z

The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".

Hi, I wonder why Windows is blocked for this reason. Fontconfig is available on Windows too.

Well, if that's the case, that would make things easier. After downloading many DLLs from MSYS packages I even got it to compile, run, and find the config file in Wine, but it doesn't find any fonts. I'm hoping that this actually does work on Windows, but someone on Windows would have to actually check that. In particular, I have no idea what encoding fontconfig returns in FC_FILE on Windows and am currently wishfully assuming it's UTF-8.

subrandr is a subtitle rendering library which aims to render SRV3 (YouTube) subtitles and WebVTT subtitles accurately. Currently in mpv, WebVTT subs are rendered via ffmpeg conversion to ASS, which throws away a lot of the style and completely disregards the WebVTT non-region-cue positioning algorithm. Furthermore, if one wants to render some more complex SRV3 subtitles, one has to resort to external converters since the format is not even supported by ffmpeg. However, subrandr is able to render SRV3 subtitles natively with support for the most commonly used features. It can render ruby text without relying on font metrics during conversion, which is obviously fragile, and it can perform correct scaling using the exact calculations used by YouTube instead of making up ASS approximations. Similarly, it follows the WebVTT spec for the features of WebVTT that it supports (mostly). It's not perfect of course, and there are still many things it doesn't do or does wrong, but those are things that can be incrementally improved outside of mpv.


afishhh · 2025-04-22T17:37:00Z

Implemented WebVTT snap-to-lines = false layout and Unicode line breaking. This means I'm now slightly less afraid of breaking people's WebVTT rendering.

Is there anything to do on the mpv side before this is ready for review? I was thinking that it may be confusing if people using very customized subtitle options have their customization ignored in WebVTT (because sd_sbr doesn't implement them). Maybe initially we could just use subrandr for SRV3, so that enabling subrandr doesn't cause regressions with WebVTT. There is still ytdl_hook, which will now start ignoring your stuff by preferring SRV3, but that's less difficult to make configurable at runtime I guess.

afishhh force-pushed the subrandr branch from 1f5644e to d5fe4c7 Compare April 20, 2025 18:19

afishhh force-pushed the subrandr branch 2 times, most recently from 800fc48 to c6e4b4f Compare April 20, 2025 23:01

afishhh added 3 commits April 22, 2025 18:40

player/command: add subrandr-version property

d801077

Allows scripts to detect the presence of subrandr at runtime, useful for determining whether this mpv instance can play SRV3 subtitles.

ytdl_hook: accept srv3 subtitle format if subrandr is present

0c2de57

This allows YouTube videos played directly from a URL to make use of subrandr's SRV3 support.

afishhh force-pushed the subrandr branch from c6e4b4f to 0c2de57 Compare April 22, 2025 16:41
