Abstraction for Multilingual Content #6109

Open
dtdesign opened this issue Nov 20, 2024 · 3 comments

@dtdesign
Member

There are multiple different implementations for multilingual content, and they are all either flawed to some extent, quite limited, or both. Some store their values in phrases, others use separate tables but are inconsistent in how the IDs are mapped. Furthermore, being able to mix monolingual and multilingual content creates a lot of issues when sorting, filtering, or generally working with those values because it involves some kind of magic: for example, the value could be a plain value or the name of a phrase.

The only way forward is a consistent abstraction that handles this in a uniform way, does not rely on phrases (which were a stop-gap solution in themselves), and provides convenient helper methods to work with the values. It needs to solve the following problems:

  • Any localized value must be stored separately regardless of the number of available languages.
  • Monolingual content must be represented as a multilingual content with only a single localization.
  • Displaying, sorting and filtering by values must be deterministic.
    • Available localizations can be inconsistent, the value must be determined in this order:
      • Requested language.
      • Default language.
      • Values for the language with the lowest languageID.

Determining the values can be done through a subselect, using either CASE … THEN or by assigning each languageID a value by preference. The latter could be achieved by using -2 for the preferred language, -1 for the default language, and the actual languageID for everything else, and then selecting the row with the lowest value.
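
The ranking described above can be sketched in plain PHP, independent of the database. This is only an illustration under stated assumptions: `pickLanguageID()` is a hypothetical helper that is not part of the proposal, and it assumes that all language IDs are positive integers so that -2 and -1 always sort first.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: picks the language to use from the available localizations.
 * Assumes every languageID is a positive integer.
 *
 * @param list<int> $availableLanguageIDs
 */
function pickLanguageID(array $availableLanguageIDs, int $preferredLanguageID, int $defaultLanguageID): ?int
{
    $bestLanguageID = null;
    $bestRank = null;

    foreach ($availableLanguageIDs as $languageID) {
        // -2 for the preferred language, -1 for the default language,
        // otherwise the languageID itself.
        $rank = match (true) {
            $languageID === $preferredLanguageID => -2,
            $languageID === $defaultLanguageID => -1,
            default => $languageID,
        };

        // Keep the candidate with the lowest rank seen so far.
        if ($bestRank === null || $rank < $bestRank) {
            $bestRank = $rank;
            $bestLanguageID = $languageID;
        }
    }

    // null means that there is no localization at all.
    return $bestLanguageID;
}
```

The same ordering is what the `ORDER BY CASE … END ASC LIMIT 1` queries later in this thread express in SQL.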

This requires extra tables for any such content, and these tables should be entirely managed, as is already the case with the *_search_index tables. objectID and languageID need to be fixed, and all other columns could be represented through a list of IDatabaseTableColumn. Additionally, this would set up a foreign key for both the languageID and the objectID, with the latter referencing the original object.

We need to explore if this is feasible through specialized helper methods to avoid API consumers having to write custom queries to work with these values.

@Cyperghost
Contributor

Cyperghost commented Apr 14, 2025

Multilingual objects

We need two new classes that each inherit from DatabaseObject:

  • MultilingualContent represents the content of a multilingual object
  • Multilingual represents the multilingual object itself

MultilingualContent has the properties objectID and languageID and a helper method getContent().
The getContent() method takes an objectID and a languageID and returns the content of the object in the requested language or the default language if the requested language is not available.
If neither is available, the content for the language with the lowest ID is returned.

Example code for MultilingualContent
/**
 * @property-read int $objectID
 * @property-read ?int $languageID
 */
abstract class MultilingualContent extends DatabaseObject
{
    public static function getContent(int $objectID, ?int $languageID): ?static
    {
        if ($languageID === null) {
            $languageID = WCF::getLanguage()->languageID;
        }
        $defaultLanguageID = LanguageFactory::getInstance()->getDefaultLanguage()->languageID;

        $contentTableName = static::getDatabaseTableName();

        // Bind the language IDs as parameters instead of interpolating them into the query.
        $statement = WCF::getDB()->prepare(
            <<<SQL
            SELECT   *
            FROM     {$contentTableName}
            WHERE    objectID = ?
            ORDER BY CASE
                WHEN languageID = ? THEN -2 -- preferred languageID
                WHEN languageID = ? THEN -1 -- default languageID
                ELSE languageID
            END ASC
            LIMIT 1
            SQL
        );
        $statement->execute([$objectID, $languageID, $defaultLanguageID]);

        return $statement->fetchObject(static::class);
    }
}

Multilingual contains a static property of type class-string<MultilingualContent>, which refers to the content class.
The column isMultilingual is required; it indicates whether the object is multilingual (1) or not (0).
In addition, an array $contents is required, which stores a list of MultilingualContent objects indexed by languageID.

Example code for Multilingual
/**
 * @template TContent of MultilingualContent
 *
 * @property-read int $isMultilingual
 */
abstract class Multilingual extends DatabaseObject
{
    /**
     * @var class-string<TContent>
     */
    protected static string $contentClassName;

    /**
     * @var array<int, TContent>
     */
    protected array $contents;

    /**
     * @param TContent $content
     */
    public function setContent(MultilingualContent $content): void
    {
        if (!isset($this->contents)) {
            $this->contents = [];
        }

        $this->contents[$content->languageID ?: 0] = $content;
    }

    /**
     * @return ?TContent
     */
    public function getContent(?int $languageID = null): ?MultilingualContent
    {
        $this->loadContent();

        if ($this->isMultilingual === 0) {
            if (isset($this->contents[0])) {
                return $this->contents[0];
            }
        } else {
            if ($languageID === null) {
                $languageID = WCF::getLanguage()->languageID;
            }

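            // \reset() returns false for an empty array, which must be mapped to null
            // to satisfy the return type.
            return $this->contents[$languageID]
                ?? $this->contents[LanguageFactory::getInstance()->getDefaultLanguageID()]
                ?? (\reset($this->contents) ?: null);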
        }

        return null;
    }

    protected function loadContent(): void
    {
        if (!isset($this->contents)) {
            $this->contents = [];
            $contentTableName = static::getContentClassName()::getDatabaseTableName();

            $statement = WCF::getDB()->prepare(
                <<<SQL
                SELECT *
                FROM   {$contentTableName}
                WHERE  objectID = ?
                SQL
            );
            $statement->execute([$this->getObjectID()]);

            while ($content = $statement->fetchObject(static::getContentClassName())) {
                $this->contents[$content->languageID ?: 0] = $content;
            }
            
            if ($this->isMultilingual) {
                \ksort($this->contents);
            }
        }
    }

    /**
     * @return class-string<TContent>
     */
    public static function getContentClassName(): string
    {
        return static::$contentClassName;
    }
}

Database Table Editor

To create the database table, a helper class should be provided, which already inserts objectID and languageID and the necessary keys.
The developer MUST provide the foreign key for objectID.

Database object list

A sorted or filtered list of objects (wcf1_foo) must be deterministic.
To solve this, we need to use a subquery in DatabaseObjectList::readObjectIDs() and DatabaseObjectList::readObjects() which contains an extra column languageID.
For this we use a new class MultilingualList, which overrides the above methods.
This class must accept the requested $languageID in the constructor.
In addition, the join must be applied to both DatabaseObjectList::$sqlConditionJoins and DatabaseObjectList::$sqlJoins in the constructor.

A function should be provided that loads the MultilingualContent after readObjects() and calls Multilingual::setContent().
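
As a rough sketch of the grouping step such a function would need, the following plain PHP takes rows as they might come back from a single query against the content table and groups them per object. `groupContentRows()` and the row shape are assumptions made for illustration only; the real implementation would create MultilingualContent objects and pass them to Multilingual::setContent().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: groups content rows by objectID and languageID.
 *
 * @param list<array<string, mixed>> $rows rows with at least `objectID` and `languageID`
 * @return array<int, array<int, array<string, mixed>>>
 */
function groupContentRows(array $rows): array
{
    $grouped = [];
    foreach ($rows as $row) {
        // Monolingual rows have no languageID and are stored under key 0,
        // mirroring `$content->languageID ?: 0` in Multilingual::setContent().
        $grouped[$row['objectID']][$row['languageID'] ?: 0] = $row;
    }

    // Sort by languageID so that the first entry of each object is the one
    // with the lowest languageID.
    foreach ($grouped as &$contents) {
        \ksort($contents);
    }
    unset($contents);

    return $grouped;
}
```

Loading the content for all objects of a list with one query and grouping it afterwards avoids running one content query per object.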

The SQL query could then look like this:
SELECT    foo.*
FROM      (
    SELECT foo.*, (
        SELECT   languageID
        FROM     wcf1_foo_content fooContent
        WHERE    foo.fooID = fooContent.objectID
        ORDER BY CASE
            WHEN languageID = ? THEN -2 -- preferred languageID
            WHEN languageID = ? THEN -1 -- default languageID
            ELSE languageID
        END ASC
        LIMIT 1
    ) AS   languageID
    FROM   wcf1_foo foo
) as foo
INNER JOIN wcf1_foo_content fooContent
       ON foo.fooID = fooContent.objectID
      AND foo.languageID = fooContent.languageID
WHERE     fooContent.title LIKE ?
ORDER BY  fooContent.secondColumn DESC

Object Action helper

A command should be called when create(), update() and delete() are called.
The command classes should be provided via a protected property.

Example for create
abstract class MultilingualDatabaseObjectAction extends AbstractDatabaseObjectAction
{
    /**
     * @var class-string
     */
    protected string $createContentCommand;

    #[\Override]
    public function create()
    {
        $object = parent::create();

        if (isset($this->createContentCommand) && isset($this->parameters['content'])) {
            $objectID = $object->getObjectID();

            foreach ($this->parameters['content'] as $languageID => $content) {
                $parameters = \array_merge([
                    "objectID" => $objectID,
                    "languageID" => $languageID ?: null,
                ], $content);

                (new $this->createContentCommand(...$parameters))();
            }
        }

        return $object;
    }
}

A command MUST provide the parameters objectID and languageID in the constructor and a variable-length argument ...$extra as the last parameter.

final class CreateFooContent
{
    public function __construct(
        public readonly int $objectID,
        public readonly ?int $languageID,
        // extra defined properties
        public readonly string $title,
        public readonly string $description,
        // everything else
        ...$extra
    ) {
    }
    
    public function __invoke(): void
    {
        // do something
    }
}

Form Builder

A MultilingualFormBuilderForm is required, which provides helper functions to create multilingual content.

abstract protected function createLanguageFields(?Language $language): array is called for each language, or with null if multilingualism is not active, and should return an array of IFormField.
The developer must use the function protected function createLanguageForm(): array, which returns the form elements either as an array with one TabMenuFormContainer if multilingualism is active or as an array of IFormField if it is not.
When the form is called, the request parameter isMultilingual is read and inserted into the form as a hidden parameter.

All fields with content_(\d+)_, where 0 stands for monolingual, are inserted into the array $parameters['content'][\d+]; a FormDataProcessor is required for this. In addition, content from the MultilingualContent must be loaded into the appropriate array as soon as an edit form is loaded. An abstract function must be available for this.
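
The regrouping of the field values can be sketched as a plain function. This is a minimal illustration, not the proposed FormDataProcessor: `groupContentFields()` is a hypothetical name, and it assumes that the part of the field ID after `content_(\d+)_` is the name of the content column.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the data processor: moves every `content_<languageID>_<name>`
 * field into `$parameters['content'][<languageID>][<name>]`.
 *
 * @param array<string, mixed> $formData
 * @return array<string, mixed>
 */
function groupContentFields(array $formData): array
{
    $parameters = ['content' => []];

    foreach ($formData as $fieldID => $value) {
        if (\preg_match('~^content_(\d+)_(.+)$~', (string)$fieldID, $matches) === 1) {
            // Language ID 0 stands for monolingual content.
            $parameters['content'][(int)$matches[1]][$matches[2]] = $value;
        } else {
            // Everything else, such as isMultilingual, is passed through unchanged.
            $parameters[$fieldID] = $value;
        }
    }

    return $parameters;
}
```

The resulting `content` array has the shape that the create() example above iterates over, with the language ID as the key and the column values as the entry.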

@dtdesign
Member Author

After thinking about it for a while, there is possibly little value in providing a concrete implementation, at least in terms of a rigid implementation like the proposed DBO classes.

I’m also not convinced that the MultilingualFormBuilderForm is really of any help because things are just ever so slightly different in each case. What makes matters worse is that multilingual content isn’t just about localizing the content but should also be used for things like language-specific forum titles. The whole i18n implementation that uses phrases is a dead end on its own, and this can serve as a replacement.

At this point I’m leaning towards not providing an abstract implementation because there is little added value while at the same time increasing complexity by a lot. That doesn’t mean that we won’t provide anything; instead, we would document the concept and provide helpful code pieces. This simply isn’t a problem that can be meaningfully solved by abstraction.

The best approach would be to implement this concept for two existing pieces and then, in retrospect, check whether there are meaningful gains in providing any sort of helpers. Even if there are none, the need could still surface later on, and helpers can then be introduced when it arises.

@Cyperghost
Contributor

Cyperghost commented Apr 24, 2025

We currently use multilingual content in different components, and this content is stored in different ways.

I18nHandler is used by:

  • BB Codes uses buttonLabel
  • BB Code Media Provider uses title and regex
  • Captcha Questions uses question and answers
  • Category uses title and description
  • Contact Recipient uses email and name
  • Cronjobs uses description
  • Label uses label
  • Label Group uses groupName
  • Menu uses title
  • Menu Items uses title
  • Notice uses notice
  • Paid Subscription uses title and description
  • Reaction Types uses buttonLabel
  • Reaction Types uses button
  • Reaction Types uses title
  • Smiley uses smileyTitle
  • Style uses styleDescription
  • Trophy uses title and description
  • User Trophy uses (custom) description
  • User Rank uses rankTitle
  • User Option Category uses categoryName
  • Text(Area) Option Type
  • Used for options Page Title, Page Description, ...

A separate table is used by:

  • Article uses the table wcf1_article_content
  • Box uses the table wcf1_box_content
  • Page uses the table wcf1_page_content

Projects
Status: Major Task