
Remove hourly gtsm3 data from subset retrieve functions #1160


Open
veenstrajelmer opened this issue Mar 19, 2025 · 1 comment


veenstrajelmer commented Mar 19, 2025

The gtsm3-era5-cds source was added to the subset/retrieve observations functions in #1153. The original dataset is available on CDS and includes a user guide. Both the 10-minute and the hourly frequencies were implemented (10-minute is the default). It was assumed that the hourly data was instantaneous, just like the 10-minute values. Comparing the two datasets suggests that this is not the case:

import os
import dfm_tools as dfmt
import xarray as xr
import matplotlib.pyplot as plt
plt.close("all")

# separate output folders for the hourly and the 10-minute retrievals
dir_output = r"c:\Users\veenstra\Downloads"
dirh = os.path.join(dir_output, "gtsm3_hourly")
dirm = os.path.join(dir_output, "gtsm3_10min")
os.makedirs(dirh, exist_ok=True)
os.makedirs(dirm, exist_ok=True)

# catalog of gtsm3-era5-cds stations
gdf = dfmt.ssh_catalog_subset(source='gtsm3-era5-cds')

# retrieve one day of data for the first station, once per frequency
dfmt.ssh_retrieve_data(gdf.iloc[[0]], dir_output=dirh, time_min="2020-01-01", time_max="2020-01-02 00:00", time_freq="hourly")
dfmt.ssh_retrieve_data(gdf.iloc[[0]], dir_output=dirm, time_min="2020-01-01", time_max="2020-01-02 00:00", time_freq="10_min")

# both retrievals write a file with the same name, each in its own folder
fname = "gtsm3-era5-0-id_coast_glob_eur_00001.nc"
ds_h = xr.open_dataset(os.path.join(dirh, fname))
ds_m = xr.open_dataset(os.path.join(dirm, fname))

# plot the retrieved 10-minute and hourly water levels in one figure
fig, ax = plt.subplots(figsize=(12,6))
ds_m.waterlevel.plot(ax=ax, marker=".", linestyle=None, label="10min")
ds_h.waterlevel.plot(ax=ax, marker="x", linestyle=None, label="hourly")

# hourly means derived from the 10-minute data, labelled at the start of each hour
da_h2 = ds_m.waterlevel.resample(time='h').mean(dim='time')
da_h2.plot(ax=ax, marker="+", linestyle=None, label="hourly from 10min")

ds_m.close()
ds_h.close()
fig.tight_layout()
ax.grid()
ax.legend()

Gives:
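[Image: water level time series for 2020-01-01 comparing the "10min", "hourly" and "hourly from 10min" lines]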

The hourly timeseries can be approximately reproduced by resampling the 10-minute timeseries to hours and taking the mean, so this seems to be how the hourly timeseries was derived. However, all hourly means are snapped to the start-of-interval times, which causes a 30-minute time shift with respect to the original 10-minute timeseries. In an "observation" context the hourly mean dataset therefore has no added value; an hourly subset of the 10-minute data would have been more useful. To avoid confusion, it might be better to remove the possibility to retrieve the hourly dataset from dfm_tools.
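The difference between an hourly mean and an hourly subset can be illustrated without downloading anything. The sketch below is not the retrieved data: it uses a synthetic sine wave as a stand-in for the 10-minute water level, with plain pandas instead of xarray, and all variable names are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute "water level" for one day (illustrative only, not GTSM data)
time = pd.date_range("2020-01-01", periods=144, freq="10min")
hours = np.arange(144) / 6.0
wl = pd.Series(np.sin(2 * np.pi * hours / 12.42), index=time)

# Hourly means: pandas labels each mean with the START of its interval (hh:00)
hourly_mean = wl.resample("h").mean()

# Hourly subset: keep only the instantaneous values at hh:00
hourly_subset = wl[wl.index.minute == 0]

# Compare the hourly means with the instantaneous values at hh:00 and at hh:30
half_hour = pd.Timedelta(minutes=30)
inst_at_start = wl.reindex(hourly_mean.index).to_numpy()
inst_at_half = wl.reindex(hourly_mean.index + half_hour).to_numpy()
err_start = np.abs(hourly_mean.to_numpy() - inst_at_start).mean()
err_half = np.abs(hourly_mean.to_numpy() - inst_at_half).mean()

print(f"mean abs difference vs values at hh:00: {err_start:.3f}")
print(f"mean abs difference vs values at hh:30: {err_half:.3f}")
```

In this synthetic case the hourly means lie much closer to the instantaneous values half an hour later than to the values at their own start-of-interval labels, which is the kind of time shift described above. The hourly subset, by construction, coincides with the 10-minute values at the full hours.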

@veenstrajelmer veenstrajelmer changed the title Remove hourly freq from gtsm3-era5-cds subset retrieve functions Remove hourly gtsm3 data from subset retrieve functions Mar 19, 2025

veenstrajelmer commented Apr 8, 2025

This offset in the orange line is not observed for the year 2010. So it seems that the hourly averages use a different time labelling in the CDS extension (start-of-interval, years 1980-2018) than in the original CDS dataset (center-of-interval, from 2019 onwards):
[Image: the same comparison for the year 2010]

This could still be corrected for.
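One possible shape for such a correction is to shift only the timestamps in the start-of-interval period by half an hour. This is a minimal sketch on a pandas Series, not existing dfm_tools functionality: the helper name `center_start_labelled` is hypothetical, and the period bounds in the example are taken from the comment above and would need to be verified against the data first.

```python
import numpy as np
import pandas as pd

def center_start_labelled(series, start, end, shift=pd.Timedelta(minutes=30)):
    """Shift timestamps within [start, end] by `shift`, leave all others untouched.

    Hypothetical helper: `start` and `end` delimit the period in which the
    hourly means are assumed to be labelled at the start of the interval.
    """
    idx = series.index
    in_period = (idx >= pd.Timestamp(start)) & (idx <= pd.Timestamp(end))
    new_idx = np.where(in_period, (idx + shift).values, idx.values)
    out = series.copy()
    out.index = pd.DatetimeIndex(new_idx)
    return out

# Dummy hourly values around the assumed boundary between the two periods
idx = pd.date_range("2018-12-31 22:00", periods=4, freq="h")
hourly = pd.Series([0.1, 0.2, 0.3, 0.4], index=idx)

# Assumed period bounds (from the comment above, to be verified)
corrected = center_start_labelled(hourly, "1980-01-01", "2018-12-31 23:59:59")
print(corrected)
```

After such a shift the time axis is no longer evenly spaced across the boundary between the two periods, so downstream code that assumes a regular hourly axis would need to account for that.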
