Remove hourly gtsm3 data from subset retrieve functions #1160

veenstrajelmer · 2025-03-19T09:55:04Z

The gtsm3-era5-cds source was added to the subset/retrieve observations functions in #1153. The original dataset is available on CDS and includes a user guide. Both the 10-minute and hourly frequencies were implemented (the default is 10-minute). It was assumed that the hourly data was instantaneous, just like the 10-minute values. When comparing the two datasets, this does not seem to be the case:

```python
import os
import dfm_tools as dfmt
import xarray as xr
import matplotlib.pyplot as plt
plt.close("all")

# separate output directories for the hourly and the 10-minute retrievals
dir_output = r"c:\Users\veenstra\Downloads"
dirh = os.path.join(dir_output, "gtsm3_hourly")
dirm = os.path.join(dir_output, "gtsm3_10min")
os.makedirs(dirh, exist_ok=True)
os.makedirs(dirm, exist_ok=True)

# catalog of gtsm3-era5-cds locations
gdf = dfmt.ssh_catalog_subset(source='gtsm3-era5-cds')

# retrieve one day for the first location, at both frequencies
dfmt.ssh_retrieve_data(gdf.iloc[[0]], dir_output=dirh, time_min="2020-01-01", time_max="2020-01-02 00:00", time_freq="hourly")
dfmt.ssh_retrieve_data(gdf.iloc[[0]], dir_output=dirm, time_min="2020-01-01", time_max="2020-01-02 00:00", time_freq="10_min")

fname = "gtsm3-era5-0-id_coast_glob_eur_00001.nc"
ds_h = xr.open_dataset(os.path.join(dirh, fname))
ds_m = xr.open_dataset(os.path.join(dirm, fname))

fig, ax = plt.subplots(figsize=(12,6))
ds_m.waterlevel.plot(ax=ax, marker=".", linestyle=None, label="10min")
ds_h.waterlevel.plot(ax=ax, marker="x", linestyle=None, label="hourly")

# hourly means derived from the 10-minute data (labelled at the start of each hour by default)
da_h2 = ds_m.waterlevel.resample(time='h').mean(dim='time')
da_h2.plot(ax=ax, marker="+", linestyle=None, label="hourly from 10min")

ds_m.close()
ds_h.close()
fig.tight_layout()
ax.grid()
ax.legend()
```

Gives:

[Figure: "10min", "hourly" and "hourly from 10min" waterlevel timeseries]

The hourly timeseries can approximately be reproduced by resampling to hours and taking the mean, so this seems to have been the method with which the hourly timeseries was derived. However, all hourly means are snapped to the start-of-interval times, which causes a 30-minute timeshift with respect to the original 10-minute timeseries. In an "observation" context the hourly mean dataset is therefore of no added value; an hourly subset would have been more useful instead. To avoid confusion, it might be better to remove the possibility to retrieve the hourly dataset from dfm_tools.
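The difference between an hourly mean labelled at the start of its interval and an instantaneous hourly subset can be illustrated with a small synthetic example. This is a sketch in plain pandas, whose default `resample` labelling (left edge of each bin) is also what xarray's `resample` uses; the 10-minute "waterlevel" series is a made-up linear ramp (value equals minutes since 00:00), not GTSM data.

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute "waterlevel" (assumption: a linear ramp, so each value
# equals the number of minutes elapsed since 00:00; not real GTSM data).
times = pd.date_range("2020-01-01 00:00", "2020-01-01 02:50", freq="10min")
waterlevel = pd.Series(np.arange(len(times)) * 10.0, index=times)

# Hourly mean with the default labelling: the bin [hh:00, hh+1:00)
# gets the timestamp hh:00, i.e. the start of the interval.
hourly_mean = waterlevel.resample("h").mean()

# The same means, relabelled at the centre of each hourly interval (hh:30).
hourly_mean_centered = hourly_mean.copy()
hourly_mean_centered.index = hourly_mean.index + pd.Timedelta("30min")

# Instantaneous hourly subset: keep only the 10-minute samples at full hours.
hourly_subset = waterlevel[waterlevel.index.minute == 0]

print(hourly_mean)
print(hourly_subset)
```

Because each ramp value equals the elapsed minutes, the mean labelled 00:00 (25.0) is the value the ramp reaches at 00:25, so the averaged samples belong to roughly half an hour after their start-of-interval label. The subset keeps the instantaneous values at full hours (0.0, 60.0, 120.0), which coincide with the 10-minute series. In xarray, a comparable subset could presumably be taken with `ds_m.sel(time=ds_m.time.dt.minute == 0)`.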


veenstrajelmer · 2025-04-08T09:36:18Z

This offset in the orange ("hourly") line is not observed for the year 2010. So it seems that the hourly averages have a different time administration in the CDS extension (start-of-interval, from 2019 onwards) than in the original CDS dataset (center-of-interval, years 1980-2018):

[Figure: the same comparison for the year 2010]

This might still be corrected for.
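One way to correct for this would be to shift the timestamps of the start-of-interval hourly means by half an hour. In the 2020 example above the offset was present, while for 2010 it was not. A minimal sketch, assuming timestamps at full hours and assuming the start-of-interval convention applies from a cutoff date onward; the helper `center_hourly_labels` and its `cutoff` default of `2019-01-01` are hypothetical (not part of dfm_tools) and should be checked against the CDS documentation.

```python
import numpy as np
import pandas as pd


def center_hourly_labels(times, cutoff="2019-01-01", half_interval="30min"):
    """Hypothetical helper, not part of dfm_tools.

    Hourly means at or after `cutoff` are assumed to be labelled at the start
    of their interval; those labels are moved to the interval centre. Labels
    before `cutoff` are assumed to be centred already and are left untouched.
    """
    times = pd.DatetimeIndex(times)
    # True where the label is assumed to mark the start of the interval
    start_labelled = np.asarray(times >= pd.Timestamp(cutoff))
    # per-timestamp offset: half an hour for start-of-interval labels, zero otherwise
    offsets = pd.TimedeltaIndex(
        np.where(
            start_labelled,
            pd.Timedelta(half_interval).to_timedelta64(),
            np.timedelta64(0, "ns"),
        )
    )
    return times + offsets


times = pd.date_range("2018-12-31 22:00", periods=4, freq="h")
print(center_hourly_labels(times))
```

With xarray, the corrected labels could be assigned via `ds_h.assign_coords(time=center_hourly_labels(ds_h.time.values))`. Note that the result mixes labels at full hours (before the cutoff) and at half hours (after it), so a series spanning the cutoff is no longer on a regular hourly grid.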

veenstrajelmer changed the title from "Remove hourly freq from gtsm3-era5-cds subset retrieve functions" to "Remove hourly gtsm3 data from subset retrieve functions" on Mar 19, 2025

veenstrajelmer mentioned this issue in Prepare 0.37.0 release #1158 on Mar 19, 2025
