Data projection with views #6181

Open
@peternied

Is your feature request related to a problem? Please describe.

Sometimes there are clear relationships between indices, e.g. http-logs-2023-01-20 and http-logs-2023-01-21. As data gets reshaped or physically moved, there is a desire to preserve how the data is referenced. OpenSearch Dashboards has a feature for this called index patterns, but nothing equivalent exists in the backend.

If there was a way to create a logical grouping of these physical indices, the responsibilities for data usage and data ingestion could be separated. I think this would be a big win for lowering the maintenance of OpenSearch clusters over time.

Describe the solution you'd like

In SQL there are tables and views. Views offer flexibility and centralized management; see the answers to the Stack Overflow question "What is a good reason to use SQL views?". Drawing on the answer by user210748, I'd suggest this system does the following:

  • Views can join and simplify multiple indices into a single virtual index
  • Views can act as aggregated tables, where the database engine aggregates data (sum, average etc) and presents the calculated results as part of the data
  • Views can hide the complexity of data; for example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying indices
  • Views take very little space to store; the database contains only the definition of a view, not a copy of all the data it presents
  • Depending on the SQL engine used, views can provide extra security
  • Views can limit the degree of exposure of an index or indices to the outer world

Describe alternatives you've considered

Aliases

OpenSearch already has aliases that represent a virtualized view; maybe they could be built up to offer these additional features. However, there are some quirks, like is_write_index, that we would want to be careful around.
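For reference, a filtered alias already covers part of this. The sketch below builds a request body for the `_aliases` API as a Python dict; the alias names and the `status` field are made up for illustration, and the second action shows where `is_write_index` comes in.

```python
import json

# Body for POST /_aliases. Alias names and the "status" field are assumptions.
alias_request = {
    "actions": [
        {
            # Read-side alias that only exposes documents matching the filter.
            "add": {
                "index": "http-logs-2023-01-21",
                "alias": "http-logs-errors",
                "filter": {"range": {"status": {"gte": 500}}},
            }
        },
        {
            # Write-side alias: is_write_index marks where new documents land.
            "add": {
                "index": "http-logs-2023-01-21",
                "alias": "http-logs",
                "is_write_index": True,
            }
        },
    ]
}
print(json.dumps(alias_request, indent=2))
```

Note how the alias mixes a read concern (the filter) with a write concern (the write index), which is the kind of coupling a view could avoid.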

Data streams

Data streams are a virtualized view focused on managing the physical storage; maybe they could be built up to handle data projection and filtering.

Additional context

Coming from the security plugin, there are features for document level security (DLS), field level security, and field masking. These features are built into index permissions and are clunky to use: for example, a query that applies DLS has to be double-encoded in the JSON body. Views could encompass these scenarios. Modeling view creation and management separately from managing permissions to the views is a cleaner separation than what is available in the security plugin.
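As a rough illustration of the double encoding, the sketch below builds a role body in the shape used by the security plugin, where the `dls` value is a string containing JSON rather than a nested object. The index pattern, the `department` field, and the `read` action are illustrative assumptions.

```python
import json

# The DLS query as a normal query object (field name is an assumption).
dls_query = {"term": {"department": "sales"}}

role_body = {
    "index_permissions": [
        {
            "index_patterns": ["http-logs-*"],
            # First encoding: the query is stored as a string, not an object.
            "dls": json.dumps(dls_query),
            "allowed_actions": ["read"],
        }
    ]
}

# Second encoding: the whole role is serialized for the HTTP request,
# so the quotes inside the dls string end up escaped.
request_body = json.dumps(role_body)
print(request_body)
```

Reading the query back requires two `json.loads` calls, one for the request body and one for the `dls` string, which is the awkwardness a view definition with a plain query object would remove.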

Metadata

Assignees

No one assigned

    Labels

    Search: Search query, autocomplete ...etc
    discuss: Issues intended to help drive brainstorming and decision making
    enhancement: Enhancement or improvement to existing feature or request
    feature: New feature or request

    Type

    No type

    Projects

    Status

    Later (6 months plus)

    Status

    New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
