-
Notifications
You must be signed in to change notification settings - Fork 472
Internal sinks, part 3: coordinator plumbing #13346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b47de2b
to
86a1360
Compare
Do you already have steps in mind for persisting the shard ID? Or who should be responsible for it? (You mention in one of the commits that we currently loose this information because the storage controller doesn't store shard IDs for local sources) |
@aljoscha Based on this comment my assumption is that the storage controller will start storing these shard IDs once the system table question is sorted. I don't think it makes sense for someone else than the storage controller to persist that information, but I might be missing something. Edit: I think this is the relevant issue: https://github.com/MaterializeInc/database-issues/issues/3738 |
86a1360
to
8f4d493
Compare
Looks great! I tried to break it and couldn't :-) I think we could also test (i did both manually and they work, so no bug):
|
8f4d493
to
a3ab1cc
Compare
Yes! @jkosh44 got a PR out for that issue earlier today: #13373. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant!
This commit implements the part of `sequence_create_recorded_view` that adds the recorded view to the catalog. It also introduces the mz_recorded_views system table. This makes most of the SQL commands return successfully: ``` materialize=> CREATE RECORDED VIEW v AS SELECT 1; CREATE RECORDED VIEW materialize=> SHOW CREATE RECORDED VIEW v; Recorded View | Create Recorded View ----------------------+---------------------------------------------------------------------------- materialize.public.v | CREATE RECORDED VIEW "materialize"."public"."v" IN CLUSTER [1] AS SELECT 1 (1 row) materialize=> ALTER RECORDED VIEW v RENAME TO x; ALTER RECORDED VIEW materialize=> SELECT * FROM mz_recorded_views; id | oid | schema_id | name | cluster_id | definition ----+-------+-----------+------+------------+------------ u1 | 20288 | 3 | x | 1 | SELECT 1; (1 row) materialize=> DROP RECORDED VIEW x; DROP RECORDED VIEW materialize=> SELECT * FROM mz_recorded_views; id | oid | schema_id | name | cluster_id | definition ----+-----+-----------+------+------------+------------ (0 rows) ``` Even though they are registered in the catalog, the recorded views don't do anything yet.
``` materialize=> CREATE RECORDED VIEW v AS SELECT 1; CREATE RECORDED VIEW materialize=> SHOW RECORDED VIEWS; name ------ v (1 row) materialize=> SHOW FULL RECORDED VIEWS; cluster | name | type ---------+------+------ default | v | user (1 row) ```
Item No 1. EXPLAIN VIEW does not work for recorded views:
This in particular prevents me from checking if monotonicity is properly propagated from the source. |
Item No 2.
This would be a genuinely confusing message for the user. Also, once dropping is supported by the cloud UI, the code there will also need to special-case recorded views to avoid running into the same error. |
Item No 3. Same for SHOW CREATE:
|
Thanks, I've added making EXPLAIN work to the "Future work" list. I'll also check monotonicity then. I didn't come across it when working on the PR, so it might very likely not work correctly right now!
Improving that is already part of "Future work". This is the suggested error in the design doc (modeled after what Postgres does).
Good catch, this should be handled the same way as |
Item No 4. If recorded views exist outside of a particular cluster, why the mz_recorded_views table has a cluster_id column? |
A recorded view can be read by all clusters, but it is always maintained by a single one. That cluster runs the view dataflow and writes it output to storage, from where everyone can read it back. Like an index that gets exported globally, not just to the current cluster. |
Item 5. Same as the others,
|
Item No6 . |
This commit adds the last of the plumbing necessary to make recorded views actually do something. Sinking and sourcing works now: ``` materialize=> CREATE TABLE t (a int); CREATE TABLE materialize=> CREATE RECORDED VIEW v AS SELECT * FROM t; CREATE RECORDED VIEW materialize=> SELECT * FROM v; a --- (0 rows) materialize=> INSERT INTO t VALUES (1), (2), (2); INSERT 0 3 materialize=> SELECT * FROM v; a --- 1 2 2 (3 rows) ```
Prior to this commit, a `DROP RECORDED VIEW` only removed the item from the catalog. Now it also arranges for the compute sink and the storage source to be dropped.
This commit teaches coord how to bootstrap recorded views it finds in the catalog after a restart. Note that currently the contents of recorded views are still lost on restart because the storage controller does not persist shard IDs of local sources.
This commit adds a missing check to ensure that a cluster cannot be dropped (without CASCADE) when it still maintains an active recorded view.
Item No 7.
|
a3ab1cc
to
977821c
Compare
Item No 8. Source errors do not cause the recorded view to enter an errored state. If you apply the patch below and run:
You will see that the test will fail at the final statement -- the recorded view should have entered an error state but it did not . Normal views will start returning the error written in the test under those same circumstances.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience. If you could commit the .td file from the last comment that would be much appreciated -- I could do that myself bug I noticed you are force-pushing and did not want to screw up your pending rebases.
Separately from this PR I will add recorded views to all the other frameworks that are exercising upgrade, etc.
c5aff74
to
633aa4c
Compare
Thanks a lot @philip-stoev! I've added all your findings as follow-ups to this PR in the internal sinks epic. I've also added your testdrive file, but with the failing test commented out. I think that fix will be discussion-worthy enough that it warrants a separate PR. |
The test that ensures that recorded views correctly propagate errors is currently commented out, because that doesn't work yet.
633aa4c
to
e399f8b
Compare
This is the third PR in support of the internal storage sinks feature (#12860).
RECORDED VIEW
syntax in SQL.Part 3 now introduces the necessary plumbing in the coordinator to connect everything together. This means recorded views work now (roughly) as intended:
Future Work
After this PR, the following TODOs are left for the internal sinks feature to be complete:
CREATE SINK ... INTO PERSIST
(Remove user-visible persist sinks #13380)DROP VIEW <recorded-view>
,ALTER VIEW <recorded-view>
, orSHOW CREATE VIEW <recorded-view>
EXPLAIN
.SHOW OBJECTS
.SHOW INDEXES
.Also, there is work left to switch away from the placeholder
RECORDED VIEW
name to something better (see Internal storage sink syntax).Motivation
Part of MaterializeInc/database-issues#3692.
Tips for reviewer
I tried to split the change into sensible commits to make it more digestible.
If you can think of an edge-case that's not covered by recorded_views.slt, chances are I have not considered it and it is currently broken. Let me know!
Relevant design docs:
Testing
Release notes
This PR includes the following user-facing behavior changes: