Default FieldType.Auto on Arrays of Objects #1803

Closed
Stexxen opened this issue May 9, 2021 · 3 comments · Fixed by #1807
Labels
type: documentation A documentation update

Comments

@Stexxen
Contributor

Stexxen commented May 9, 2021

I'm not sure if this is by design or not, so apologies if it is.

If an array of objects is not annotated with FieldType.Object, the created mapping ignores the FieldType annotations on the fields within that object's class.

Here is some sample code that shows this.

@SpringBootApplication
@Import(EConfig.class)
public class Keyword implements CommandLineRunner {

  public static void main(String[] args) {
    SpringApplication.run(Keyword.class).close();
  }

  @Autowired
  ElasticScrollRepository repo;

  @Override
  public void run(String... args) throws Exception {
    repo.count();

    // State 1

    TestRecord testRecord = new TestRecord();
    testRecord.subRecordList2 = new ArrayList<>();
    SubRecord subRecord = new SubRecord();
    subRecord.subKeyword = "Key";
    subRecord.subString = "String";
    testRecord.subRecordList2.add(subRecord);
    repo.save(testRecord);

    // State 2
  }

}

@Configuration
@EnableElasticsearchRepositories(basePackageClasses = ElasticScrollRepository.class)
class EConfig extends AbstractElasticsearchConfiguration {

  @Override
  public RestHighLevelClient elasticsearchClient() {
    final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                                                            .connectedTo("localhost:9200")
                                                            .build();
    return RestClients.create(clientConfiguration).rest();
  }
}

@Repository
interface ElasticScrollRepository extends ElasticsearchRepository<TestRecord, String> {
}

@Document(indexName = "keyword_test")
class TestRecord {

  @Id
  public String id;

  @Field(name = "sub_record", type = FieldType.Object)
  public SubRecord subRecord;

  @Field(name = "sub_array_1", type = FieldType.Object)
  public List<SubRecord> subRecordList1;

  @Field(name = "sub_array_2")
  public List<SubRecord> subRecordList2;

}

class SubRecord {

  @Field(name = "sub_keyword", type = FieldType.Keyword)
  public String subKeyword;

  @Field(name = "sub_string", type = FieldType.Text)
  public String subString;

}

Within TestRecord there are three fields:

  @Field(name = "sub_record", type = FieldType.Object)
  public SubRecord subRecord;

  @Field(name = "sub_array_1", type = FieldType.Object)
  public List<SubRecord> subRecordList1;

  @Field(name = "sub_array_2")
  public List<SubRecord> subRecordList2;

During execution, at // State 1, the created mapping looks like this:

{
  "keyword_test" : {
    "mappings" : {
      "properties" : {
        "sub_array_1" : {
          "properties" : {
            "sub_keyword" : {
              "type" : "keyword"
            },
            "sub_string" : {
              "type" : "text"
            }
          }
        },
        "sub_record" : {
          "properties" : {
            "sub_keyword" : {
              "type" : "keyword"
            },
            "sub_string" : {
              "type" : "text"
            }
          }
        }
      }
    }
  }
}

Everything is OK: sub_array_1 and its fields are correctly defined, as keywords where required. sub_array_2 is not defined in the mapping (maybe because of the default FieldType.Auto?).

Then, at // State 2, we've added our first record. The mapping now looks like this:

{
  "keyword_test" : {
    "mappings" : {
      "properties" : {
-- SNIP --
        "sub_array_1" : {
          "properties" : {
            "sub_keyword" : {
              "type" : "keyword"
            },
            "sub_string" : {
              "type" : "text"
            }
          }
        },
        "sub_array_2" : {
          "properties" : {
            "sub_keyword" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "sub_string" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
-- SNIP --
      }
    }
  }
}

You can see that within sub_array_2 the fields sub_keyword and sub_string are both created as a text field with a keyword sub-field, as if they had been defined as a MultiField.

Is this the correct result? If so, I think it means that all arrays of objects must be annotated with FieldType.Object for the mapping creation to observe the annotations on any child fields.

I wasn't sure if this is as designed or not. Once we realised this, we annotated all our Lists and arrays with FieldType.Object and recreated the index.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label May 9, 2021
@sothawo
Collaborator

sothawo commented May 9, 2021

That's exactly the behaviour for FieldType.Auto. Auto means that no mapping is written for that property (state 1), and that Elasticsearch automatically creates mapping entries when data is first encountered (state 2). That mapping is then written by Elasticsearch, which does not know about the @Field annotations on these properties.

Does that answer your question?
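
To make the state 1 behaviour concrete, here is a minimal sketch in plain Java. It is a simplified model, not Spring Data Elasticsearch's actual mapping builder: `SketchField`, `SketchFieldType`, `buildMapping` and `elementType` are hypothetical names invented for this illustration. It assumes only what is described above, namely that a property left at the default Auto type gets no mapping entry, while an Object-typed property is descended into so the annotations on its child fields are read.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified, hypothetical model of the behaviour described above.
// None of these types are the real Spring Data Elasticsearch classes.
class AutoMappingSketch {

  enum SketchFieldType { AUTO, OBJECT, KEYWORD, TEXT }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface SketchField {
    String name();
    SketchFieldType type() default SketchFieldType.AUTO; // assumed default, as in the issue
  }

  static class SubRecord {
    @SketchField(name = "sub_keyword", type = SketchFieldType.KEYWORD)
    String subKeyword;

    @SketchField(name = "sub_string", type = SketchFieldType.TEXT)
    String subString;
  }

  static class TestRecord {
    @SketchField(name = "sub_array_1", type = SketchFieldType.OBJECT)
    List<SubRecord> subRecordList1;

    @SketchField(name = "sub_array_2") // type left at AUTO
    List<SubRecord> subRecordList2;
  }

  // Builds a "properties" map for the given class, skipping AUTO fields entirely.
  static Map<String, Object> buildMapping(Class<?> type) {
    Map<String, Object> properties = new LinkedHashMap<>();
    for (Field field : type.getDeclaredFields()) {
      SketchField ann = field.getAnnotation(SketchField.class);
      if (ann == null || ann.type() == SketchFieldType.AUTO) {
        continue; // nothing is written; the child annotations are never visited
      }
      Map<String, Object> entry = new LinkedHashMap<>();
      if (ann.type() == SketchFieldType.OBJECT) {
        entry.put("properties", buildMapping(elementType(field)));
      } else {
        entry.put("type", ann.type().name().toLowerCase(Locale.ROOT));
      }
      properties.put(ann.name(), entry);
    }
    return properties;
  }

  // Returns the element type for List<X> fields, otherwise the field's own type.
  static Class<?> elementType(Field field) {
    Type generic = field.getGenericType();
    if (generic instanceof ParameterizedType) {
      Type arg = ((ParameterizedType) generic).getActualTypeArguments()[0];
      if (arg instanceof Class) {
        return (Class<?>) arg;
      }
    }
    return field.getType();
  }

  public static void main(String[] args) {
    System.out.println(buildMapping(TestRecord.class));
  }
}
```

In this model the result contains sub_array_1 with its keyword and text children, and no sub_array_2 entry at all, which mirrors the state 1 mapping in the report.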

@sothawo sothawo added the status: waiting-for-feedback We need additional information before we can continue label May 9, 2021
@Stexxen
Contributor Author

Stexxen commented May 9, 2021

Thanks. Initially I'd assumed Auto meant it would make a best guess at the mapping, but when the results weren't what I expected I wrote the test above.

I couldn't find anything explicit matching the definition you just provided, though. Where should I be looking?
I've been using here and here

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels May 9, 2021
@sothawo
Collaborator

sothawo commented May 9, 2021

That's based on the mapping rules defined by Elasticsearch. We should add some documentation that auto does not create a mapping entry but lets Elasticsearch do that.
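
As a rough illustration of those rules, the sketch below models what Elasticsearch's default dynamic mapping produces for a plain JSON string value that is not detected as a date or number, such as "Key" or "String" in the report. It is not Elasticsearch code; `DynamicStringMappingSketch` and `dynamicMappingForString` are hypothetical names, and the values are taken from the state 2 mapping shown in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the default dynamic mapping for a plain string value.
// This only reproduces the shape shown in the state 2 mapping of the report.
class DynamicStringMappingSketch {

  static Map<String, Object> dynamicMappingForString() {
    // The "keyword" sub-field that is added next to the text field.
    Map<String, Object> keyword = new LinkedHashMap<>();
    keyword.put("type", "keyword");
    keyword.put("ignore_above", 256);

    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("keyword", keyword);

    // The main field is analysed text, regardless of any @Field annotation
    // on the Java side, because Elasticsearch never sees those annotations.
    Map<String, Object> mapping = new LinkedHashMap<>();
    mapping.put("type", "text");
    mapping.put("fields", fields);
    return mapping;
  }

  public static void main(String[] args) {
    // prints {type=text, fields={keyword={type=keyword, ignore_above=256}}}
    System.out.println(dynamicMappingForString());
  }
}
```

This is why both sub_keyword and sub_string under sub_array_2 ended up as text with a keyword sub-field: each was just a string to Elasticsearch.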

@sothawo sothawo added type: documentation A documentation update and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged labels May 9, 2021
sothawo added a commit that referenced this issue May 10, 2021
@sothawo sothawo added this to the 4.3 M1 (2021.1.0) milestone May 10, 2021