Internal sinks, part 3: coordinator plumbing #13346


Merged (10 commits) on Jul 4, 2022

Conversation

@teskje (Contributor) commented Jun 29, 2022

This is the third PR in support of the internal storage sinks feature (#12860).

Part 3 now introduces the necessary plumbing in the coordinator to connect everything together. This means recorded views work now (roughly) as intended:

-- demo that we can sink and source
materialize=> CREATE TABLE t (a int, b int);
CREATE TABLE
materialize=> INSERT INTO t VALUES (1, 2), (3, 4), (5, 6);
INSERT 0 3
materialize=> CREATE RECORDED VIEW v AS SELECT a + b FROM t;
CREATE RECORDED VIEW
materialize=> SELECT * FROM v;
 ?column?
----------
        3
        7
       11
(3 rows)

-- demo that other stuff can build on top of recorded views
materialize=> CREATE DEFAULT INDEX ON v;
CREATE INDEX
materialize=> CREATE RECORDED VIEW v2 AS SELECT * FROM v;
CREATE RECORDED VIEW
materialize=> DELETE FROM t WHERE a != 1;
DELETE 2
materialize=> SELECT * FROM v2;
 ?column?
----------
        3
(1 row)

-- demo that dropping takes dependents into account
materialize=> DROP RECORDED VIEW v;
ERROR:  cannot drop materialize.public.v: still depended upon by catalog item 'materialize.public.v2'
materialize=> DROP RECORDED VIEW v CASCADE;
DROP RECORDED VIEW

Future Work

After this PR, the following TODOs are left for the internal sinks feature to be complete:

  • Remove support for CREATE SINK ... INTO PERSIST (Remove user-visible persist sinks #13380)
  • Introduce a helpful error when a user does DROP VIEW <recorded-view>, ALTER VIEW <recorded-view>, or SHOW CREATE VIEW <recorded-view>
  • Disallow creating recorded views on log sources.
  • Add support for EXPLAIN.
  • Add support for SHOW OBJECTS.
  • Add support for SHOW INDEXES.
  • Propagate dataflow errors.

Also, there is work left to switch away from the placeholder RECORDED VIEW name to something better (see Internal storage sink syntax).

Motivation

  • This PR adds a known-desirable feature.

Part of MaterializeInc/database-issues#3692.

Tips for reviewer

I tried to split the change into sensible commits to make it more digestible.

If you can think of an edge-case that's not covered by recorded_views.slt, chances are I have not considered it and it is currently broken. Let me know!

Relevant design docs:

Testing

  • This PR has adequate test coverage / QA involvement has been duly considered.

Release notes

This PR includes the following user-facing behavior changes:

  • Make recorded views functional.

@teskje force-pushed the recorded-view-plumbing branch 4 times, most recently from b47de2b to 86a1360 on June 30, 2022 11:02
@teskje marked this pull request as ready for review June 30, 2022 11:48
@aljoscha (Contributor):

Do you already have steps in mind for persisting the shard ID? Or who should be responsible for it? (You mention in one of the commits that we currently lose this information because the storage controller doesn't store shard IDs for local sources.)

@teskje (Contributor, Author) commented Jun 30, 2022

@aljoscha Based on this comment my assumption is that the storage controller will start storing these shard IDs once the system table question is sorted. I don't think it makes sense for anyone other than the storage controller to persist that information, but I might be missing something.

Edit: I think this is the relevant issue: https://github.com/MaterializeInc/database-issues/issues/3738

@teskje force-pushed the recorded-view-plumbing branch from 86a1360 to 8f4d493 on June 30, 2022 12:48
@lluki (Contributor) commented Jun 30, 2022

Looks great! I tried to break it and couldn't :-) I think we could also test (I did both manually and they work, so no bug):

  • instead of cluster size 1, use cluster size '2-1' just to be sure it'll work with multiple workers
  • test that indices/materialized views create correct dependencies on recorded views?

@teskje force-pushed the recorded-view-plumbing branch from 8f4d493 to a3ab1cc on June 30, 2022 15:40
@benesch (Contributor) commented Jun 30, 2022

@aljoscha Based on this comment my assumption is that the storage controller will start storing these shard IDs once the system table question is sorted. I don't think it makes sense for someone else than the storage controller to persist that information, but I might be missing something.

Edit: I think this is the relevant issue: MaterializeInc/database-issues#3738

Yes! @jkosh44 got a PR out for that issue earlier today: #13373.

@teskje mentioned this pull request Jul 1, 2022
@benesch (Contributor) left a comment:

Brilliant!

teskje added 4 commits July 4, 2022 09:11
This commit implements the part of `sequence_create_recorded_view` that
adds the recorded view to the catalog. It also introduces the
mz_recorded_views system table.

This makes most of the SQL commands return successfully:

```
materialize=> CREATE RECORDED VIEW v AS SELECT 1;
CREATE RECORDED VIEW
materialize=> SHOW CREATE RECORDED VIEW v;
    Recorded View     |                            Create Recorded View
----------------------+----------------------------------------------------------------------------
 materialize.public.v | CREATE RECORDED VIEW "materialize"."public"."v" IN CLUSTER [1] AS SELECT 1
(1 row)

materialize=> ALTER RECORDED VIEW v RENAME TO x;
ALTER RECORDED VIEW
materialize=> SELECT * FROM mz_recorded_views;
 id |  oid  | schema_id | name | cluster_id | definition
----+-------+-----------+------+------------+------------
 u1 | 20288 |         3 | x    |          1 | SELECT 1;
(1 row)

materialize=> DROP RECORDED VIEW x;
DROP RECORDED VIEW
materialize=> SELECT * FROM mz_recorded_views;
 id | oid | schema_id | name | cluster_id | definition
----+-----+-----------+------+------------+------------
(0 rows)
```

Even though they are registered in the catalog, the recorded views don't
do anything yet.
```
materialize=> CREATE RECORDED VIEW v AS SELECT 1;
CREATE RECORDED VIEW
materialize=> SHOW RECORDED VIEWS;
 name
------
 v
(1 row)

materialize=> SHOW FULL RECORDED VIEWS;
 cluster | name | type
---------+------+------
 default | v    | user
(1 row)
```
@philip-stoev (Contributor):

Item No 1. EXPLAIN VIEW does not work for recorded views:

materialize=> explain physical plan for view v1;
ERROR:  Expected [u2 AS materialize.public.v1] to be a view, not a recorded view
materialize=> explain physical plan for recorded view v1;
ERROR:  Expected SELECT, VALUES, or a subquery in the query body, found RECORDED
LINE 1: explain physical plan for recorded view v1;

This in particular prevents me from checking if monotonicity is properly propagated from the source.

@philip-stoev (Contributor):

Item No 2.

materialize=> drop view v1;
ERROR:  materialize.public.v1 is not of type VIEW

This would be a genuinely confusing message for the user. Also, once dropping is supported by the cloud UI, the code there will also need to special-case recorded views to avoid running into the same error.

@philip-stoev (Contributor):

Item No 3. Same for SHOW CREATE:

materialize=> show create view v2;
ERROR:  materialize.public.v2 is not a view

@teskje (Contributor, Author) commented Jul 4, 2022

Item No 1.

Thanks, I've added making EXPLAIN work to the "Future work" list. I'll also check monotonicity then. I didn't come across it when working on the PR, so it may well not work correctly right now!

Item No 2.

Improving that is already part of "Future work". This is the suggested error in the design doc (modeled after what Postgres does).
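
For comparison, this is roughly what a stock Postgres session does in the analogous situation, which is presumably what the design doc's suggested error is modeled on (the materialized-view name `mv` is made up for illustration):

```sql
-- Postgres refuses DROP VIEW on the wrong object type, but adds a hint:
postgres=> DROP VIEW mv;
ERROR:  "mv" is not a view
HINT:  Use DROP MATERIALIZED VIEW to remove a materialized view.
```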

Item No 3.

Good catch, this should be handled the same way as DROP VIEW.

@philip-stoev (Contributor):

Item No 4. If recorded views exist outside of a particular cluster, why does the mz_recorded_views table have a cluster_id column?

@teskje (Contributor, Author) commented Jul 4, 2022

Item No 4.

A recorded view can be read by all clusters, but it is always maintained by a single one. That cluster runs the view dataflow and writes its output to storage, from where everyone can read it back. It is like an index that gets exported globally, not just to the current cluster.
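
A hypothetical session sketching this split (the cluster names c1 and c2 are made up for illustration; the IN CLUSTER syntax is the one shown in the commit messages above):

```sql
-- v is maintained by cluster c1: c1 runs the dataflow and sinks its results.
CREATE RECORDED VIEW v IN CLUSTER c1 AS SELECT a + b FROM t;

-- Any other cluster can still read v, because it sources the sinked
-- collection from storage rather than from c1 directly.
SET cluster = c2;
SELECT * FROM v;
```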

@philip-stoev (Contributor):

Item No 5. Same as the others, ALTER VIEW RENAME does not work:

materialize=> alter view v1 rename to v3;
ERROR:  materialize.public.v1 is a recorded view not a VIEW

@philip-stoev (Contributor):

Item No 6. SHOW OBJECTS does not list recorded views, although it does list normal and materialized views.

teskje added 4 commits July 4, 2022 10:33
This commit adds the last of the plumbing necessary to make recorded
views actually do something.

Sinking and sourcing works now:

```
materialize=> CREATE TABLE t (a int);
CREATE TABLE
materialize=> CREATE RECORDED VIEW v AS SELECT * FROM t;
CREATE RECORDED VIEW
materialize=> SELECT * FROM v;
 a
---
(0 rows)

materialize=> INSERT INTO t VALUES (1), (2), (2);
INSERT 0 3
materialize=> SELECT * FROM v;
 a
---
 1
 2
 2
(3 rows)
```
Prior to this commit, a `DROP RECORDED VIEW` only removed the item from
the catalog. Now it also arranges for the compute sink and the storage
source to be dropped.
This commit teaches the coordinator how to bootstrap recorded views it
finds in the catalog after a restart.

Note that currently the contents of recorded views are still lost on
restart because the storage controller does not persist shard IDs of
local sources.
This commit adds a missing check to ensure that a cluster cannot be
dropped (without CASCADE) when it still maintains an active recorded
view.
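A sketch of the behavior this check adds (the cluster and replica names are hypothetical, and the exact error text may differ):

```sql
CREATE CLUSTER c1 REPLICAS (r1 (SIZE '1'));
CREATE RECORDED VIEW v IN CLUSTER c1 AS SELECT 1;

DROP CLUSTER c1;          -- now rejected: c1 still maintains recorded view v
DROP CLUSTER c1 CASCADE;  -- drops c1 together with the recorded view
```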
@philip-stoev (Contributor):

Item No 7. SHOW INDEXES returns an empty list even if an index has been created explicitly.

materialize=> create default index on v1;
CREATE INDEX
materialize=> show indexes in v1;
 cluster | on_name | key_name | seq_in_index | column_name | expression | nullable 
---------+---------+----------+--------------+-------------+------------+----------
(0 rows)

@teskje force-pushed the recorded-view-plumbing branch from a3ab1cc to 977821c on July 4, 2022 08:37
@philip-stoev (Contributor):

Item No 8. Source errors do not cause the recorded view to enter an errored state. If you apply the patch below and run:

cd test/testdrive
./mzcompose run default recorded-views.td

You will see that the test fails at the final statement -- the recorded view should have entered an error state, but it did not. Normal views start returning the error written in the test under the same circumstances.

diff --git a/test/testdrive/recorded-views.td b/test/testdrive/recorded-views.td
new file mode 100644
index 000000000..38022e83e
--- /dev/null
+++ b/test/testdrive/recorded-views.td
@@ -0,0 +1,79 @@
+
+# Copyright Materialize, Inc. and contributors. All rights reserved.
+#
+# Use of this software is governed by the Business Source License
+# included in the LICENSE file at the root of this repository.
+#
+# As of the Change Date specified in that file, in accordance with
+# the Business Source License, use of this software will be governed
+# by the Apache License, Version 2.0.
+
+# Additional test for recorded views, on top of those in test/sqllogictest/recorded_views.slt
+
+
+
+# Kafka source as a source for a recorded view
+
+$ set recorded-views={
+        "type" : "record",
+        "name" : "test",
+        "fields" : [
+            {"name":"f1", "type":"string"}
+        ]
+    }
+
+$ kafka-create-topic topic=recorded-views
+
+$ kafka-ingest format=avro topic=recorded-views schema=${recorded-views} publish=true
+{"f1": "123"}
+
+> CREATE MATERIALIZED SOURCE s1
+  FROM KAFKA BROKER '${testdrive.kafka-addr}' TOPIC
+  'testdrive-recorded-views-${testdrive.seed}'
+  FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY '${testdrive.schema-registry-url}'
+  ENVELOPE NONE
+
+$ kafka-ingest format=avro topic=recorded-views schema=${recorded-views} publish=true
+{"f1": "234"}
+
+> SELECT COUNT(*) FROM s1;
+2
+
+> CREATE RECORDED VIEW v1 AS SELECT COUNT(f1::integer) AS c1 FROM s1;
+
+$ kafka-ingest format=avro topic=recorded-views schema=${recorded-views} publish=true
+{"f1": "345"}
+
+> CREATE SINK sink1 FROM v1
+  INTO KAFKA BROKER '${testdrive.kafka-addr}'
+  TOPIC 'recorded-view-sink'
+  FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY '${testdrive.schema-registry-url}'
+
+$ kafka-ingest format=avro topic=recorded-views schema=${recorded-views} publish=true
+{"f1": "456"}
+
+$ set-regex match=\d{13} replacement=<TIMESTAMP>
+
+> SELECT * FROM v1;
+4
+
+$ kafka-verify format=avro sink=materialize.public.sink1 sort-messages=true
+{"before": null, "after": {"row": {"c1": 2}}}
+{"before": {"row": {"c1": 2}}, "after": {"row": {"c1": 4}}}
+
+> BEGIN
+
+> DECLARE c CURSOR FOR TAIL v1;
+
+> FETCH ALL c;
+<TIMESTAMP> 1 4
+
+> COMMIT
+
+# Inject failure in the source
+
+$ kafka-ingest format=avro topic=recorded-views schema=${recorded-views} publish=true
+{"f1": "ABC"}
+
+! SELECT * FROM v1;
+contains: invalid input syntax for type integer

@philip-stoev (Contributor) left a comment:

Thank you for your patience. If you could commit the .td file from the last comment, that would be much appreciated -- I could do that myself, but I noticed you are force-pushing and did not want to screw up your pending rebases.

Separately from this PR I will add recorded views to all the other frameworks that are exercising upgrade, etc.

@teskje force-pushed the recorded-view-plumbing branch from c5aff74 to 633aa4c on July 4, 2022 10:17
@teskje (Contributor, Author) commented Jul 4, 2022

Thanks a lot @philip-stoev! I've added all your findings as follow-ups to this PR in the internal sinks epic. I've also added your testdrive file, but with the failing test commented out. I think that fix will be discussion-worthy enough that it warrants a separate PR.

teskje added 2 commits July 4, 2022 12:39
The test that ensures that recorded views correctly propagate errors is
currently commented out, because that doesn't work yet.