Constants for dimensions and variables

Can we live without constants for dimension names in datasets (cf. https://github.com/pystatgen/sgkit/pull/10)?

Specifically I mean these in `api.py`:

```python
DIM_VARIANT = "variants"
DIM_SAMPLE = "samples"
DIM_PLOIDY = "ploidy"
DIM_ALLELE = "alleles"
DIM_GENOTYPE = "genotypes"
```

Using these to build/append a dataset involves something like this:

```python
data_vars = {"variant/contig": ([DIM_VARIANT], variant_contig)}
```

What I'm wondering is: if we're going to use constants, why stop at the dimension names? The same approach applied to variable names could look like this:

```python
data_vars = {f"{VAR_GROUP_VARIANT}/{VAR_NAME_CONTIG}": ([DIM_VARIANT], variant_contig)}
```

but I doubt any of us would prefer it.  I think the two biggest advantages of the constants are:

1. Preventing typos
2. Allowing us to change the names

Of these, (2) seems unlikely to matter (and we probably won't use the constants in examples or documentation anyhow), and (1) might eventually be solved by things like https://github.com/python/typing/issues/28#issuecomment-351284520 and https://github.com/pydata/xarray/issues/3967. I'm not going to hold my breath for that, but I do think it's worth asking whether we would all prefer this instead:

```python
data_vars = {"variant/contig": (["variants"], variant_contig)}
```

As a user, I think I would be happy to make my own constants, or use `Literal` types checked with mypy, if I wanted some static safety, and to just live with the risk of typos otherwise. As a contributor, I'm not so sure, but I'm leaning towards the plain string literals in the last snippet. Any thoughts?
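
For reference, here is a rough sketch of what the `Literal` option might look like on the user side. `DimName` and `dims` are hypothetical names I made up for illustration; nothing like them exists in `api.py`, and the `variant_contig` values are placeholder data. The idea is that mypy flags a misspelled dimension name statically, and the helper also rejects it at runtime:

```python
from typing import Literal, Tuple, get_args

# Hypothetical alias (not part of sgkit): the allowed dimension names.
DimName = Literal["variants", "samples", "ploidy", "alleles", "genotypes"]


def dims(*names: DimName) -> Tuple[DimName, ...]:
    """Return the names unchanged, raising ValueError for unknown ones."""
    allowed = get_args(DimName)
    for name in names:
        if name not in allowed:
            raise ValueError(f"unknown dimension {name!r}; expected one of {allowed}")
    return names


variant_contig = [0, 0, 1]  # placeholder data for illustration only

# Plain strings at the call site; a type checker such as mypy can reject
# dims("variant") because "variant" is not a member of DimName.
data_vars = {"variant/contig": (list(dims("variants")), variant_contig)}
```

This keeps the strings readable in examples and documentation while still giving some typo protection to whoever opts in.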
