1121 subsetretrieve observations add gtsmip c3s locations and data #1153

n-aleksandrova
Collaborator

Support for GTSM reanalysis data from CDS was added to observations.py. It enables retrieving a dataframe of GTSM output locations and downloading GTSM-ERA5 data from CDS.

The GTSM obs points are read from the .csv file in the GTSM-ERA5 repository. Please note that the station names are not always identical to the station names used by the model: some names included special characters, which I removed in postprocessing because they caused issues when the station name is used in filenames for saving individual observations (example: 'currents_l��staviken' vs. 'currents_lstaviken').
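
For illustration, the name cleanup described above could look roughly like the sketch below. This is not the code from this PR: `sanitize_station_name` is a hypothetical helper, and keeping only ASCII letters, digits, underscores and hyphens is an assumption about what counts as filename-safe.

```python
import re


def sanitize_station_name(name: str) -> str:
    """Return a filename-safe version of a station name (hypothetical helper)."""
    # Assumption: only ASCII letters, digits, underscore and hyphen are kept;
    # every other character (including mis-encoded ones) is dropped.
    return re.sub(r"[^A-Za-z0-9_-]", "", name)


# "\ufffd" is the Unicode replacement character that shows up for
# bytes that could not be decoded, as in the example station name.
print(sanitize_station_name("currents_l\ufffd\ufffdstaviken"))
```

Dropping characters can in principle make two different names collide, so a real implementation may want to check the cleaned names for duplicates.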

The GTSM data cannot be subset by station when downloading from CDS. The data is therefore downloaded once to a cache folder and then read from there to retrieve data for individual stations.
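
The download-once pattern can be sketched as follows. This is a minimal illustration and not the implementation in observations.py: `get_cached_file` and its `download` callback are hypothetical names, and the actual CDS request would live inside that callback.

```python
from pathlib import Path
from typing import Callable


def get_cached_file(cache_dir: Path, filename: str,
                    download: Callable[[Path], None]) -> Path:
    """Return the cached file, calling `download` only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Write to a temporary name first, so an interrupted download
        # does not leave a partial file that looks like a valid cache entry.
        tmp = target.with_name(target.name + ".part")
        download(tmp)
        tmp.replace(target)
    return target
```

Each per-station retrieval would then open the file returned by this function and select the station of interest, so the expensive download happens at most once per file.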

Possible further improvements:

  • Currently this code retrieves hourly timeseries from CDS. We could optionally add 10-min timeseries as well, but this is not done yet.
  • For any area, there are many more GTSM stations than stations from other sources (about 10 times more). To mitigate this a bit in the subset/retrieve notebook, I am subsetting the GTSM dataset to exclude gridded points in open sea (keeping only points located close to the coast).
  • The large number of GTSM stations results in a very long legend for the figure with the timeseries. For now I have commented the legend out, but we could consider not plotting GTSM data in this overview.
  • It seems unnecessary to show GTSM data in the overview created by dfmt.ssh_netcdf_overview(), because there will always be data available for any period in 1950-2024. But there is no simple way to exclude this data, so I left it as it is.

@n-aleksandrova n-aleksandrova linked an issue Mar 5, 2025 that may be closed by this pull request
17 tasks
Collaborator

@veenstrajelmer veenstrajelmer left a comment

Thanks for this valuable contribution, including the very clear description with discussion points. Very neat! It looks like it will work well (I have not tested it yet, I will do that after your changes), but I have added some suggestions on specific parts of the code. Please let me know if any of these review comments are annoying to you. I realize I am micro-managing (for a good cause, but still), so you can drop any of the tasks back at my desk.

For the notebook (too large diff to comment inline):

  • maybe skip downloading (and plotting the subset), just like for gesla; this avoids all the issues you encountered. But of course do keep them in the global overview. I am also planning to update the modelbuilder notebook to include GTSM as model obspoints, so there will be exposure nevertheless.
  • remove the new gtsm source from the TODO comment again (see other review points)

@n-aleksandrova
Collaborator Author

I have submitted a new version of the scripts, with adjustments based on the discussion we had.

In summary:

  • Source for GTSM was renamed to gtsm3-era5-cds.
  • Added option to retrieve surge timeseries, and to choose between hourly and 10-min time resolutions for GTSM. To do: currently all retrieved timeseries are saved to files with the station name in the filename and a 'waterlevel' variable, so we still need a way to distinguish between the waterlevel and surge timeseries. This is however a 'special case' compared to other observation sources. To be discussed how best to do that without adding too many exceptions to the code.
  • Fixed other minor issues related to formatting and visualizations.
  • The ModelBuilder notebook example now includes the retrieval and visualization of the obs points based on GTSM. Note: I have not yet tested whether the model runs correctly with these new obs points.

@veenstrajelmer
Collaborator

Some minor things added:

  • removed support for surge again. It made the code complex, and surge and waterlevel were impossible to distinguish in the netcdf files, as you also noted. Furthermore, there is no request for this yet.
  • whatsnew: linked to PR instead of issue
  • added testcase to cover time_min/time_max=None and invalid time_freq
  • suppressed the CMEMS logging in the notebook
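
As a rough sketch of the kind of argument handling such a testcase exercises (not the actual code or test from this PR): the helper `resolve_time_request`, the labels in `VALID_TIME_FREQS`, and the choice to let `None` fall back to the full 1950-2024 period are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple

# Assumptions: the label strings and the exact period bounds are illustrative.
VALID_TIME_FREQS = ("hourly", "10min")
DATA_START = datetime(1950, 1, 1)
DATA_END = datetime(2024, 12, 31, 23)


def resolve_time_request(time_min: Optional[datetime],
                         time_max: Optional[datetime],
                         time_freq: str) -> Tuple[datetime, datetime]:
    """Validate time_freq and fill in missing time bounds (hypothetical helper)."""
    if time_freq not in VALID_TIME_FREQS:
        raise ValueError(
            f"time_freq must be one of {VALID_TIME_FREQS}, got {time_freq!r}"
        )
    # Assumption: None means "use the start/end of the available period".
    start = DATA_START if time_min is None else time_min
    end = DATA_END if time_max is None else time_max
    if start > end:
        raise ValueError("time_min must not be later than time_max")
    return start, end
```

A test for this would call the helper once with both bounds set to `None` and once with an unsupported `time_freq`, expecting a `ValueError` in the second case.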

@veenstrajelmer veenstrajelmer merged commit c29c4d9 into main Mar 18, 2025
2 of 7 checks passed
@veenstrajelmer veenstrajelmer deleted the 1121-subsetretrieve-observations-add-gtsmip-c3s-locations-and-data branch March 18, 2025 09:03
Development

Successfully merging this pull request may close these issues.

subset/retrieve observations: add GTSMip C3S locations and data
2 participants