Description
Preface: this RFC (Request For Comments) is meant to gather feedback and opinions for the feature set and some implementation details for @discordjs/sharder
Note: While a lot of the things mentioned in this RFC are opinionated on my side (like what I'd like to see implemented in the module), not everything might be added in the initial version, and some things might be added later as extras.
This RFC is split into several parts, described below. As an important note, throughout this entire specification, "shard" is not used to describe a gateway shard; those are handled by @discordjs/ws (see #8083). "Shard" is used to describe the client that the manager instantiates. Those could be (but are not limited to): workers, processes, or servers.
Before any work is done on this, I would like to kindly ask that we not take much (if any) inspiration from the current shard manager, and that this implementation be as agnostic as possible so it can be used by bots made with discord.js's components (or even with older versions of discord.js).
Yes, I'm also aware this is over-engineered, but everything is designed with composability in mind, informed by experience from implementing this in #7204. Truth be told, without some of those components, certain things would be significantly harder to implement.
ShardManager
Alternatively: `Sharder`, `ProcessManager`, `ClientManager`
This is the entry point for using this module. Its job is to handle the creation of shards, and it is the central place where shards can intercommunicate. This manager also sets several parameters via environment variables (if available) to configure the clients.

As a result, a `ShardManager` has access to one or more `ShardClient`s, and has a channel for each one of them.
API
```ts
// Error handling: TBD
// Lifecycle handlers: TBD

class ShardManager {
	public constructor(options: ShardManagerOptions);
}

interface ShardManagerOptions {
	strategy?: ChannelHandler | string;
}
```
Note: Shards will be configured in the strategy's options. The reasoning for this is simple: not all strategies have to be single-layer. There are also proxies, which act as a second layer.
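To make that note concrete, here is a minimal, self-contained sketch of the proposed options surface. The `'fork'` strategy name and the stubbed `ChannelHandler` shape are assumptions for illustration, not a confirmed API.

```typescript
// Sketch only: ChannelHandler is stubbed, and registering shard
// configuration on the strategy (not the manager) is the RFC's intent.
type ChannelHandler = { name: string };

interface ShardManagerOptions {
	strategy?: ChannelHandler | string;
}

class ShardManager {
	public constructor(public readonly options: ShardManagerOptions) {}
}

// Shard configuration would live on the strategy, not the manager:
const manager = new ShardManager({ strategy: 'fork' });
console.log(manager.options.strategy); // fork
```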
Strategies
This class also instantiates shards in one of the following four ways:

- Worker: using `worker_threads`; it has the benefit of faster IPC than the process-based strategies.
- Fork: using `child_process.fork`; it's how the current shard manager creates the processes, and has the benefit that you can separate the manager from the worker. Note: If you open ports in the processes, you will have to open them on different ports, otherwise it will error.
- Cluster: using `cluster.fork`; unlike the fork mode, ports are shared and load-balanced. Note: You will have to handle the cluster-worker logic in the same file.
- Network: this is meant for cross-server sharding, and requires `ShardManagerProxy`. More information below.

Custom strategies will also be supported (although if you see something useful, please let us know and we'll consider it!).

Default: `Fork`?
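The strategy names above map onto Node.js primitives (or, for Network, onto `ShardManagerProxy`). A tiny sketch restating that mapping as a lookup with the proposed default; the `resolveStrategy` helper is purely illustrative, not part of the proposed API.

```typescript
// The RFC's four strategies restated as data; 'network' has no Node
// primitive because it relies on ShardManagerProxy instead.
const strategyPrimitives: Record<string, string> = {
	worker: 'worker_threads.Worker',
	fork: 'child_process.fork',
	cluster: 'cluster.fork',
	network: 'ShardManagerProxy',
};

// Hypothetical helper: resolve a strategy name, defaulting to 'fork'.
function resolveStrategy(name = 'fork'): string {
	const primitive = strategyPrimitives[name];
	if (!primitive) throw new RangeError(`Unknown strategy: ${name}`);
	return primitive;
}

console.log(resolveStrategy()); // child_process.fork
console.log(resolveStrategy('worker')); // worker_threads.Worker
```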
Lifecycle Handlers
This class will emit events (not necessarily using `EventEmitter`) when:
- A shard has been created.
- A shard has been restarted.
- A shard has been destroyed.
- A shard has signalled a new status.
- A shard has signalled a "ready" status.
- A shard has signalled an "exit" status.
- A shard has signalled a "restarting" status.
- A shard has sent an invalid message. If unset, this will print the error to console.
- A shard has sent a ping.
- A shard has not sent a ping in the maximum time-frame specified. If unset, it will restart the shard.
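The RFC doesn't fix a registration API (only that it won't necessarily be an `EventEmitter`), so here is one hypothetical shape: a plain hooks object carrying the two documented defaults (printing invalid-message errors, restarting on a missed ping). Every hook name here is an assumption.

```typescript
// Hypothetical hook names; the RFC only lists the events, not this API.
interface LifecycleHooks {
	shardCreate?: (id: number) => void;
	shardRestart?: (id: number) => void;
	shardDestroy?: (id: number) => void;
	shardStatus?: (id: number, status: 'ready' | 'exit' | 'restarting') => void;
	invalidMessage?: (id: number, error: Error) => void;
	pingTimeout?: (id: number) => void;
}

const restarted: number[] = [];

// Defaults stated in the RFC: print invalid-message errors to the
// console, and restart a shard that missed its ping window.
const defaultHooks: LifecycleHooks = {
	invalidMessage: (id, error) => console.error(`Shard ${id}:`, error),
	pingTimeout: (id) => restarted.push(id),
};

defaultHooks.pingTimeout?.(3);
console.log(restarted); // [ 3 ]
```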
Life checks
All of the manager's requests must automatically time out within the maximum ping time: if the shards are configured to ping every 45 seconds with a maximum time of 60 seconds, then the request timeout must be 60 seconds. This timeout can also be configured globally (defaults to the maximum ping time) and per-request (defaults to the global value).
Similarly, if a shard hasn't sent a ping within the maximum time, the lifecycle handler is called.
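The fallback chain described above can be sketched in a few lines; `resolveTimeout` is an illustrative helper, not a proposed API.

```typescript
// Timeout resolution as described: per-request value, else the global
// value, else the maximum ping time.
function resolveTimeout(maxPingTime: number, globalTimeout?: number, requestTimeout?: number): number {
	return requestTimeout ?? globalTimeout ?? maxPingTime;
}

// Pings every 45s, maximum time 60s: requests time out at 60s by default.
const byDefault = resolveTimeout(60_000);
const withGlobal = resolveTimeout(60_000, 30_000);
const perRequest = resolveTimeout(60_000, 30_000, 5_000);
console.log(byDefault, withGlobal, perRequest); // 60000 30000 5000
```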
ShardClient
Alternatively: `ShardWorker`, `Worker`, `Client`
This is a class that instantiates the channel with its respective `ShardManager` (or `ShardManagerProxy`) using the correct stream, and boils down to a wrapper of a duplex with extra management.
API
```ts
// Signals: TBD
// Error handling: TBD

class ShardClient {
	public constructor(options: ShardClientOptions);
	public static registerMessageHandler(name: string, op: () => MessageHandler): typeof ShardClient;
	public static registerMessageTransformer(name: string, op: () => MessageTransformer): typeof ShardClient;
}

interface ShardClientOptions {
	messageHandler?: MessageHandler | string;
	messageTransformer?: MessageTransformer | string;
}
```
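The static `register*` methods suggest a name-keyed registry that the string options resolve against. A self-contained sketch of that pattern; the internal `Map` and the resolve step are assumptions about the implementation.

```typescript
type MessageHandler = { name: string };

class ShardClient {
	private static readonly messageHandlers = new Map<string, () => MessageHandler>();

	// Returning the class allows chained registrations.
	public static registerMessageHandler(name: string, op: () => MessageHandler): typeof ShardClient {
		this.messageHandlers.set(name, op);
		return this;
	}

	// Hypothetical internal step: turn the string option into an instance.
	public static resolveMessageHandler(name: string): MessageHandler {
		const factory = this.messageHandlers.get(name);
		if (!factory) throw new RangeError(`Unknown message handler: ${name}`);
		return factory();
	}
}

ShardClient.registerMessageHandler('json', () => ({ name: 'json' }));
const resolved = ShardClient.resolveMessageHandler('json');
console.log(resolved.name); // json
```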
Signals
- Starting: the shard is starting up and is not yet ready to respond to the manager's requests.
- Ready: the shard is ready and fully operative.
- Exit: the shard is shutting down and should not be restarted.
- Restarting: the shard is shutting down and should be restarted.
MessageHandler
Alternatively: `MessageFormat`, `MessageBroker`, `MessageSerder`
This is a class that defines how messages are serialized and deserialized.
API
```ts
type ChannelData = string | Buffer;

class MessageHandler {
	public serialize(message: unknown, channel): Awaitable<ChannelData>;
	public deserialize(message: ChannelData, channel): Awaitable<unknown>;
}
```
Strategies
- JSON: data is serialized with `JSON.stringify` and deserialized with `JSON.parse`.
- V8: data is serialized with `v8.serialize` and deserialized with `v8.deserialize`.

Custom strategies will also be supported (custom format, ETF, YAML, TOML, XML, you name it).

Default: `JSON`?
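A minimal JSON strategy along the lines above: serialize with `JSON.stringify`, deserialize with `JSON.parse`. The `channel` parameter from the interface is omitted here for brevity, and the class name is illustrative.

```typescript
type ChannelData = string | Buffer;

class JsonMessageHandler {
	// Application object -> wire format.
	public serialize(message: unknown): ChannelData {
		return JSON.stringify(message);
	}

	// Wire format -> application object.
	public deserialize(message: ChannelData): unknown {
		return JSON.parse(message.toString());
	}
}

const handler = new JsonMessageHandler();
const wire = handler.serialize({ op: 'ping', shard: 3 }) as string;
const roundTrip = handler.deserialize(wire) as { op: string; shard: number };
console.log(wire); // {"op":"ping","shard":3}
console.log(roundTrip.shard); // 3
```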
MessageTransformer
Alternatively: `MessageMiddleware`, `EdgeMiddleware`, `ChannelMiddleware`
This is a class that defines what to do when reading a message, and what to do when writing one:

- When reading:

  ```mermaid
  graph LR
      A[Channel]
      A -->|Raw| B(MessageTransformer)
      B -->|Transformed| C(MessageHandler)
      B -->|Rejected| D[Channel]
      C -->|Deserialized| E[Application]
  ```

- When writing:

  ```mermaid
  graph LR
      A[Application]
      A -->|Raw output| B(MessageHandler)
      B -->|Serialized| C(MessageTransformer)
      C -->|Raw| D[Channel]
  ```
Note: This is optional but also composable (e.g. you can have two transformers), executed in insertion order when writing (and in reverse order when reading). It may be used to compress, encrypt, decorate, or otherwise transform the data in any way the developer desires.

For example, with `[Gzip, Aes256]`, written data will go through `Gzip` and then `Aes256`, while read data goes in the opposite direction (`Aes256`, then `Gzip`).
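The ordering rule can be sketched as a pair of folds over the transformer list. The interface is simplified to synchronous string functions, and the two stand-in transformers only tag the data so the ordering is visible; neither is a real Gzip or AES implementation.

```typescript
interface Transformer {
	write(data: string): string;
	read(data: string): string;
}

// Writing applies transformers in insertion order.
function writePipeline(transformers: Transformer[], data: string): string {
	return transformers.reduce((acc, t) => t.write(acc), data);
}

// Reading applies them in reverse order.
function readPipeline(transformers: Transformer[], data: string): string {
	return transformers.reduceRight((acc, t) => t.read(acc), data);
}

// Stand-ins for Gzip and Aes256 that just wrap the data in their name:
const tag = (name: string): Transformer => ({
	write: (d) => `${name}(${d})`,
	read: (d) => d.slice(name.length + 1, -1),
});

const pipeline = [tag('Gzip'), tag('Aes256')];
const written = writePipeline(pipeline, 'msg');
console.log(written); // Aes256(Gzip(msg))
console.log(readPipeline(pipeline, written)); // msg
```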
API
type ChannelData = string | Buffer;
class MessageTransformer {
public read(message: ChannelData, channel): Awaitable<ChannelData>;
public write(message: ChannelData, channel): Awaitable<ChannelData>;
}
Strategies
It is unknown if we will ship any built-in strategy for encryption or transformation, since they're very application-defined, but we will at least provide two from Node.js:
- Gzip: uses `zlib.gzip` and `zlib.gunzip`.
- Brotli: uses `zlib.brotliCompress` and `zlib.brotliDecompress`.
Warning: We may change the functions used if we find a way to work with streams for better efficiency, but the algorithms may not change.
Custom strategies will also be supported (although if you see something useful, please let us know and we'll consider it!).
Default: `[]`
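A sketch of what the Gzip transformer could look like. The RFC names the async `zlib.gzip`/`zlib.gunzip`; the sync variants are used here only to keep the example short, and the class shape is illustrative.

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

type ChannelData = string | Buffer;

class GzipTransformer {
	// write: application -> channel, so compress on the way out.
	public write(message: ChannelData): Buffer {
		return gzipSync(message);
	}

	// read: channel -> application, so decompress on the way in.
	public read(message: Buffer): Buffer {
		return gunzipSync(message);
	}
}

const transformer = new GzipTransformer();
const compressed = transformer.write('hello shards');
const restored = transformer.read(compressed).toString();
console.log(restored); // hello shards
```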
ShardManagerProxy
This is a special class that operates as a shard for `ShardManager` and communicates with it via HTTPS, HTTP/2, or HTTP/3 (QUIC), depending on the networking strategy used. Custom ones (such as gRPC) may be added. Encryption is recommended, so plain HTTP might not be available. There might also be a need to support SSH tunnels to bypass firewalls for greater security. A `ShardManagerProxy` is configured in a way similar to `ShardManager`, supporting all of its features, but it also adds logic to communicate with the manager itself.

The proxy may also use a strategy so that if a shard needs to send a message to another shard available within the same proxy, no request is made to the parent; otherwise, the message is sent to the parent, which will try to find the shard among the different proxies.
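That routing rule boils down to a membership check. A trivial sketch, where the `Set` of local shard IDs is an assumption about how a proxy might track its shards:

```typescript
// Deliver locally when the target shard lives under this proxy,
// otherwise forward to the parent manager.
function route(localShards: Set<number>, targetShard: number): 'local' | 'parent' {
	return localShards.has(targetShard) ? 'local' : 'parent';
}

const proxyShards = new Set([0, 1, 2]);
const sameProxy = route(proxyShards, 1);
const crossProxy = route(proxyShards, 7);
console.log(sameProxy, crossProxy); // local parent
```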
Questions to answer

1. Should we keep the names for the classes, or are any of the suggested alternative names better?
2. Should we use `Result<T, E>` classes to avoid `try`/`catch` and enhance performance?
3. Should we support `async` in `MessageTransformer`? Require streams only?
4. Should we have a raw data approach, or a command approach? If the latter, should we have `eval`?
   > Raw data approach; commands can be implemented at the bot level. Similarly to how discord.js just emits events, people are free to build frameworks on top of `@discordjs/sharder`, so there's no need for us to force a command format, which would pose several limitations on how users can structure their payloads.
5. Should messages be "repliable" (requires tracking as well as message tagging, etc.)?
   - Always? Opt-in? Opt-out?
     > Opt-in, not all broadcasts or requests need to be replied to.
   - Should we have timeouts?
     > Yes, and tasks should be aborted and de-queued once the timeout is done.
   - Should we be able to cancel a request before/without a timeout?
     > I personally think this could be beneficial, but the implementation remains to be seen.
6. What should be done when a message is sent from shard A to shard B, but B hasn't signalled "ready" (and is not "exit")? Should it wait for B to signal ready, or abort early? If the former, should we allow an option to "hint" how long a shard takes to become ready?
7. Is there a need for an `error` signal, or do we use `exit` for it?
8. If a shard exits with `process.exit()` without signalling the manager (or proxy), should it try to restart indefinitely? Have a limit that resets on "restart"?
9. If `ShardManager` dies, what should the shards do?
10. If a `ShardManagerProxy` goes offline, what should its shards do? What should the manager do? Load-balance to other proxies?
11. If a `ShardManagerProxy` is offline at spawn, what should the manager do? Load-balance to other proxies?
12. If a `ShardManagerProxy` goes back online, should the manager load-balance its shards back to it?
13. NEW: Should the HTTP-based channel strategies use pooling or open connections on demand?
    > The WebSocket strategy obviously needs to be open at all times. Keeping a connection open makes it easier for messages to be sent both ways, and reduces latency since fewer network hops are needed (no handshake on every request), but it will also require a more advanced system to properly read messages, and will make aborting requests harder.
14. NEW: Should different `ShardManagerProxy`s be able to communicate directly with each other, similar to P2P? Or should it be centralized, requiring the `ShardManager`?
    > I personally think there isn't really much of a need for this; a centralized cache database (such as Redis) could store the information, same for RabbitMQ or alternatives. Is there a need for each proxy to know each other? One pro of having this is that it makes proxies more resilient to outages from the manager, but one downside is that it would need to send a lot of data between proxies and the manager to keep all the information synchronized. I believe it's best to leave this job to Redis and let the user decide how it should be used. After all, proxies are very configurable.
15. NEW: Should we allow `ShardManagerProxy` to have multiple managers, or stick to a single one for simplicity?
    > This way, managers can be mirrored, so if the main one is down, the proxies connect to a secondary one. How the data is shared between managers is up to the user; we will try to give the hooks required for the storage to be easy to store and persist.
16. NEW: What should `ShardManagerProxy` do if all the managers are offline? Should they just stay up?
    > Their capabilities may be limited, since they can't load-balance by themselves, and they may not be able to communicate with each other without P2P communication or a central message broker system, but most of the operations should, in theory, be able to stay up. I believe we can make the behaviour customizable by the developer, with the default of keeping the work going and trying to reconnect once the managers are up.
Anything else missing
If you think this RFC is missing any critical/crucial bits of information, let me know. I will add it to the main issue body with a bolded EDIT tag to ensure everyone sees it (if they don't want to check the edit history).
Changelog
- Added answers to questions 4 and 5.
- Added four more questions (13, 14, 15, 16), all related to `ShardManagerProxy`.