Why save epoch_millis as string? #2318

Closed
puppylpg opened this issue Oct 1, 2022 · 4 comments
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@puppylpg
Contributor

puppylpg commented Oct 1, 2022

I've noticed that when a field is an Instant with the epoch_millis format:

    @Field(type = FieldType.Date, format = DateFormat.epoch_millis)
    private Instant timestamp;

spring-data-elasticsearch serializes the object to JSON with a string value such as "timestamp":"1644234181000" rather than a long value such as "timestamp":1644234181000. After digging into the code, I found that DateFormatter#format only returns a String, so the Instant timestamp is converted into a string rather than a long.

  1. Although Elasticsearch accepts a string value (a long written as a string) for epoch_millis, this isn't mentioned in the docs;
  2. Worse, before adopting spring-data-elasticsearch we saved and updated our epoch_millis values as longs, so the timestamp field now contains both strings and longs;
  3. Additionally, we use elasticsearch-hadoop to read data from Elasticsearch, and it can only read epoch_millis as either long or string, not both.

Is there any way to have epoch_millis and epoch_second values for the date type converted to long rather than string? Or at least an option to choose between long and string, rather than always using a string regardless of the real date type?
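To make the two representations concrete, here is a minimal plain-Java sketch that renders the same Instant both ways. It does not use spring-data-elasticsearch; the class and method names (EpochMillisRendering, asJsonString, asJsonNumber) are made up for illustration only.

```java
import java.time.Instant;

// Hypothetical helper, not part of spring-data-elasticsearch.
final class EpochMillisRendering {

    // Renders the epoch milliseconds as a JSON string,
    // the form described above as the current behaviour.
    static String asJsonString(String field, Instant value) {
        return "\"" + field + "\":\"" + value.toEpochMilli() + "\"";
    }

    // Renders the epoch milliseconds as a JSON number,
    // the form our existing documents already use.
    static String asJsonNumber(String field, Instant value) {
        return "\"" + field + "\":" + value.toEpochMilli();
    }

    public static void main(String[] args) {
        Instant timestamp = Instant.ofEpochMilli(1644234181000L);
        System.out.println(asJsonString("timestamp", timestamp)); // "timestamp":"1644234181000"
        System.out.println(asJsonNumber("timestamp", timestamp)); // "timestamp":1644234181000
    }
}
```

Both forms carry the same instant; only the JSON type differs, which is what trips up consumers that expect one type per field.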

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 1, 2022
@sothawo sothawo changed the title Why save epoli_millis as string? Why save epoch_millis as string? Oct 1, 2022
@sothawo
Collaborator

sothawo commented Oct 1, 2022

The documentation you already linked explicitly states:

Dates will always be rendered as strings, even if they were initially supplied as a long in the JSON document.

Elasticsearch stores the values in the _source the way they came in and when returning the _source in a query Elasticsearch will return what came in.

But when fields are retrieved, for example with the fields option or with the docvalue_fields option, they are returned as strings, no matter how they were sent in.

Consider this mapping for two fields with the same date format:

{
  "epoch-millis": {
    "mappings": {
      "properties": {
        "date1": {
          "type": "date",
          "format": "epoch_millis"
        },
        "date2": {
          "type": "date",
          "format": "epoch_millis"
        }
      }
    }
  }
}

We store this document:

{
  "date1": 1664641434,
  "date2": "1664641434"
}

Then search it with field values (normally you'd set "_source": false when using fields):

{
  "fields": [
    "date1",
    "date2"
  ],
  "_source": true
}

The response is:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "epoch-millis",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "date1": 1664641434,
          "date2": "1664641434"
        },
        "fields": {
          "date2": [
            "1664641434"
          ],
          "date1": [
            "1664641434"
          ]
        }
      }
    ]
  }
}

In the _source the mixed notation is returned, but in the fields the values are returned as strings. Elasticsearch takes whatever it gets and internally uses a numeric instant value, but whenever it returns the value (other than in the _source) it represents the date as a string - as documented.

If Spring Data Elasticsearch converted Instant properties to numeric values, it would fail when reading responses where users request only selected fields instead of the full document source, so there's no point in changing that behaviour.

If you have mixed data in the _source of your documents, you'd probably be better off using fields in your queries to get a consistent representation (which would be string).

One possibility would be to add a new format value epoch_millis_long which would explicitly convert to/from a Long value.
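As a rough illustration of what such a to/from Long conversion could look like, here is a plain-Java sketch. It is only an assumption about the idea, not an existing format or class in Spring Data Elasticsearch: the name EpochMillisLongConverter and its methods are hypothetical, and how it would be registered with the mapping is not shown. The read side also accepts numeric strings, since values returned via fields arrive as strings.

```java
import java.time.Instant;

// Hypothetical converter sketching an "epoch_millis_long"-style format.
// Not an existing Spring Data Elasticsearch class.
final class EpochMillisLongConverter {

    // Write side: always emit a numeric value for the document source.
    static Long write(Instant value) {
        return value.toEpochMilli();
    }

    // Read side: accept a number (as found in _source) or a numeric
    // string (as returned by the fields option).
    static Instant read(Object raw) {
        if (raw instanceof Number) {
            return Instant.ofEpochMilli(((Number) raw).longValue());
        }
        if (raw instanceof String) {
            return Instant.ofEpochMilli(Long.parseLong(((String) raw).trim()));
        }
        throw new IllegalArgumentException("Unsupported epoch_millis value: " + raw);
    }
}
```

With this shape, read(1664641434L) and read("1664641434") yield the same Instant, so documents with mixed notation in the _source would still be readable.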

@sothawo sothawo added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-triage An issue we've not yet triaged labels Oct 1, 2022
@puppylpg
Contributor Author

puppylpg commented Oct 3, 2022

Thanks very much for your detailed response! It really helps me a lot.

Does spring-data-elasticsearch support queries with the fields/stored_fields/docvalue_fields options? I haven't found any clues about that in the docs or the code so far.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Oct 3, 2022
@sothawo
Collaborator

sothawo commented Oct 3, 2022

Support for fields has been in Spring Data Elasticsearch from the beginning. Since 4.4 it is available on every QueryBuilder via the withFields() methods. In older versions I think you had to set it directly on the Query instance.

Support for stored_fields was added to the NativeSearchQuery in 4.4 (#2004). In version 5 this is moved to the BaseQueryBuilder (#2250), so it will be available for all queries.

For docvalue_fields there is the open issue #2316.

@sothawo sothawo added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Oct 3, 2022
@puppylpg
Contributor Author

puppylpg commented Oct 3, 2022

Thanks~ I'll consider using these in the future.

Appreciated!

@puppylpg puppylpg closed this as completed Oct 3, 2022