Table, Lookup, Union, Inline, Query and Join datasource types #104

Open
anskarl opened this issue May 24, 2020 · 0 comments
anskarl commented May 24, 2020

In recent versions of Druid, the datasource specification has been extended to support joins between datasources, inline datasources, queries as datasources, etc. At the moment Scruid supports only table datasources, which are the most common type (the one you get when you perform data ingestion).

With some additions, Scruid can support the following:

  • Table, Lookup, Union, Inline, Query and Join datasource types in Scruid definitions, as well as in the DQL API.
  • Druid expressions which are useful for expressing join conditions.
  • Expression operators and functions in join conditions using the same DQL syntax as filtering and post-aggregation conditions.

Example scan query over inline data:

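import java.util.Locale

import ing.wbaa.druid._
import ing.wbaa.druid.definitions._
import ing.wbaa.druid.dql.DSL._

// Column names for the inline rows (the same names are used in the join example)
val columnNames = Seq("iso2_code", "iso3_code", "name")

// One row per ISO country: ISO-2 code, ISO-3 code and English display name
val countryData = Locale.getISOCountries.toList
  .map { code =>
    val locale = new Locale("en", code)
    // Pass Locale.ENGLISH so the name does not depend on the JVM default locale
    List(code, locale.getISO3Country, locale.getDisplayCountry(Locale.ENGLISH))
  }

val query: ScanQuery = DQL
  .scan()
  .interval("0000/3000")
  .from(Inline(columnNames, countryData))
  .build()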
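
For reference, Druid's native inline datasource is a JSON object with `type`, `columnNames` and `rows` fields. The sketch below renders that shape by hand for a small sample, to illustrate what an `Inline` definition is expected to serialize to. The object `InlineJsonSketch` and its helpers are hypothetical and written only for this illustration; they are not part of Scruid, which would use its own JSON encoders.

```scala
// Hypothetical helper, not Scruid code: renders the JSON shape of a Druid inline datasource.
object InlineJsonSketch {

  // Quote a string as a JSON string literal (escapes only backslash and double quote).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render already-encoded items as a JSON array.
  private def array(items: Iterable[String]): String =
    items.mkString("[", ",", "]")

  def inlineJson(columnNames: Iterable[String], rows: Iterable[Iterable[String]]): String = {
    val cols = array(columnNames.map(quote))
    val data = array(rows.map(row => array(row.map(quote))))
    s"""{"type":"inline","columnNames":$cols,"rows":$data}"""
  }
}
```

Calling `InlineJsonSketch.inlineJson(Seq("iso2_code", "name"), Seq(Seq("GR", "Greece")))` yields a single object whose `rows` field holds one array per row, in the same order as `columnNames`.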
		

Example inner join over inline data. The query below joins the wikipedia table with inline data (ISO-2 code, ISO-3 code and English country name) on the country ISO-2 code:

val query: ScanQuery = DQL
  .scan()
  .columns(
    "channel",
    "cityName",
    "countryIsoCode",
    "user",
    "mapped_country_iso3_code",
    "mapped_country_name")
  .granularity(GranularityType.All)
  .interval("0000/4000")
  .batchSize(10)
  .limit(numberOfResults)
  .from(
    Table("wikipedia")
      .join(
        right = Inline(Seq("iso2_code", "iso3_code", "name"), countryData),
        prefix = "mapped_country_",
        condition = d"countryIsoCode" === d"mapped_country_iso2_code"
      )
  )
  .build()

The expression d"countryIsoCode" === d"mapped_country_iso2_code" uses the same syntax as filtering and having clauses (e.g., .where(d"countryIsoCode" === d"mapped_country_iso2_code")). Alternatively, the expression can be written as:

expr"""countryIsoCode == mapped_country_iso2_code"""

A work-in-progress branch that contains functional Join, Inline and Table datasource types, as well as all the operators of Druid expressions, can be found at https://github.com/anskarl/scruid/tree/wip/datasource

Internal implementation details

All native query types in package ing.wbaa.druid extend the DruidNativeQuery trait, in which the type of the dataSource field changes from String to Datasource:

sealed trait DruidNativeQuery extends DruidQuery {

  val dataSource: Datasource

}

Trait Datasource is located in package ing.wbaa.druid.definitions:

sealed trait Datasource {
  val `type`: DatasourceType
}

The types Table, Lookup, Union, Inline, Query and Join are listed in the enumeration DatasourceType. Each of them is represented by a class that extends Datasource.
For example, the Union datasource type:

case class Union(dataSources: Iterable[String]) extends Datasource {
  override val `type`: DatasourceType = DatasourceType.Union
}

For Join operations, the left side supports any of the Table, Lookup, Union, Inline, Query and Join datasource types, while the right side supports only the Lookup, Query and Inline types.
For that reason, the Lookup, Query and Inline classes extend the RightHandDatasource trait (which directly extends Datasource).

sealed trait RightHandDatasource extends Datasource


case class Inline(columnNames: Iterable[String], rows: Iterable[Iterable[String]])
    extends RightHandDatasource {
  override val `type`: DatasourceType = DatasourceType.Inline
}
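
The left/right restriction can be enforced by the compiler. The following is a minimal, self-contained sketch of that idea, assuming a simplified `Join` whose condition is a plain `String`; the object `DatasourceSketch`, the field names of `Table`, `Lookup` and `Join`, and the sample values are hypothetical and do not reproduce the classes in the branch.

```scala
// Hypothetical model of the hierarchy described above; not the code from the branch.
object DatasourceSketch {
  sealed trait Datasource

  // Only datasources that are allowed on the right side of a join extend this trait.
  sealed trait RightHandDatasource extends Datasource

  final case class Table(name: String)    extends Datasource
  final case class Lookup(lookup: String) extends RightHandDatasource
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])
      extends RightHandDatasource

  // Assumption: the condition is kept as a plain String to keep the sketch small.
  final case class Join(left: Datasource,
                        right: RightHandDatasource,
                        prefix: String,
                        condition: String)
      extends Datasource

  val countries: Inline = Inline(Seq("iso2_code", "name"), Seq(Seq("GR", "Greece")))

  // Any datasource is accepted on the left side.
  val joined: Join =
    Join(Table("wikipedia"), countries, "c_", "countryIsoCode == c_iso2_code")

  // A Join is itself a Datasource, so it can be the left side of another join.
  val nested: Join = Join(joined, Lookup("some_lookup"), "l_", "channel == l_k")

  // Does not compile, because Table is not a RightHandDatasource:
  // Join(countries, Table("wikipedia"), "t_", "iso2_code == t_countryIsoCode")
}
```

With this layout an invalid right-hand side is rejected at compile time rather than by Druid at query time.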

Regarding DQL, the main additions are:

  • Support for Druid Expressions, similar to Filtering and Aggregation Expressions.
  • Implicits that convert a Dim to an expression.
  • Operators between Dim instances that result in expressions.
  • An extension function (through an implicit value class) for Datasource that lets joins be written as DSL-like expressions.
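
The extension-function item can be sketched as follows. This is a minimal illustration of the implicit value class technique, assuming a simplified datasource model and a `String` join condition; `JoinDslSketch`, `DatasourceOps` and the `Join` fields are hypothetical names chosen for this example, not the ones used in the branch.

```scala
// Hypothetical sketch of adding `.join(...)` to every Datasource via an implicit value class.
object JoinDslSketch {
  sealed trait Datasource
  sealed trait RightHandDatasource extends Datasource

  final case class Table(name: String) extends Datasource
  final case class Inline(columnNames: Seq[String], rows: Seq[Seq[String]])
      extends RightHandDatasource
  final case class Join(left: Datasource,
                        right: RightHandDatasource,
                        prefix: String,
                        condition: String)
      extends Datasource

  // Extending AnyVal makes this a value class, so the wrapper is normally not allocated.
  implicit class DatasourceOps(private val left: Datasource) extends AnyVal {
    def join(right: RightHandDatasource, prefix: String, condition: String): Join =
      Join(left, right, prefix, condition)
  }

  val joined: Join = Table("wikipedia").join(
    right = Inline(Seq("iso2_code", "name"), Seq(Seq("GR", "Greece"))),
    prefix = "mapped_country_",
    condition = "countryIsoCode == mapped_country_iso2_code"
  )
}
```

Because `join` returns a `Join`, which is also a `Datasource`, calls can be chained to express multi-way joins without adding methods to the datasource classes themselves.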

For Druid expressions that share their syntax with Filtering and Aggregation expressions, there are BaseExpression and BaseArithmeticExpression traits in package ing.wbaa.druid.dql.expressions.

  • BaseExpression provides asFilteringExpression and asExpression functions that convert the BaseExpression to FilteringExpression and Expression, respectively.
  • Similarly, BaseArithmeticExpression provides asArithmeticPostAgg and asExpression functions that convert the BaseArithmeticExpression to ArithmeticPostAgg and Expression, respectively.

For example, the BaseExpression for an and expression is represented as an AND logical expression filter when it appears in a where clause, and as a && (binary logical AND) expression inside a join condition.
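
A minimal sketch of this dual representation is shown below, assuming simplified stand-ins for the target types. `ExpressionSketch`, `Eq`, `And` and the `json`/`value` fields are hypothetical; only the two conversion method names come from the description above. The filter side renders Druid's `selector` and `and` native filters, and the expression side renders a Druid expression string.

```scala
// Hypothetical sketch: one expression tree, two renderings.
object ExpressionSketch {
  // Simplified stand-ins for the two target representations.
  final case class FilteringExpression(json: String)
  final case class Expression(value: String)

  sealed trait BaseExpression {
    def asFilteringExpression: FilteringExpression
    def asExpression: Expression
  }

  // Equality between a dimension and a string literal.
  final case class Eq(dim: String, value: String) extends BaseExpression {
    def asFilteringExpression: FilteringExpression =
      FilteringExpression(s"""{"type":"selector","dimension":"$dim","value":"$value"}""")
    def asExpression: Expression =
      Expression(s"($dim == '$value')")
  }

  // Logical AND: an "and" filter in a where clause, "&&" in a join condition.
  final case class And(left: BaseExpression, right: BaseExpression) extends BaseExpression {
    def asFilteringExpression: FilteringExpression = {
      val l = left.asFilteringExpression.json
      val r = right.asFilteringExpression.json
      FilteringExpression(s"""{"type":"and","fields":[$l,$r]}""")
    }
    def asExpression: Expression =
      Expression(s"(${left.asExpression.value} && ${right.asExpression.value})")
  }
}
```

The caller decides which rendering applies: a where clause would call `asFilteringExpression`, while a join condition would call `asExpression` on the same tree.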
