Time series with missingness #268

Open
hojjatkarami opened this issue Apr 8, 2024 · 5 comments

@hojjatkarami

Feature Description

I have developed a GAN framework for generating irregularly sampled time series with missing values. However, I cannot add it to synthcity, because synthcity does not support time series data loaders with missing values.

Is there an existing solution? If not, it would be great if you could add this in the future.

@robsdavis
Contributor

Hi @hojjatkarami, thanks for engaging with Synthcity!

We currently consider Synthcity's scope to be generating synthetic records from complete datasets, with no missing data. All missing values must be imputed in the real dataset before training, and none of our models generate missing values. However, we do already support generating synthetic time series datasets from real, irregularly sampled time series. Such datasets could be said to contain missing time points in theory, but they do not contain placeholder missing values. You just need to label the time points you have in your dataloader.

Is this the sort of thing you mean, or are you suggesting something else, like generating a dataset with missing values in it or training on a dataset with multiple features at irregular time points with some (but not all) feature values missing?
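
To make the distinction concrete, here is a minimal sketch of the two cases in plain Python. The feature names, values, and helper functions are hypothetical and chosen only for illustration; this is not Synthcity's dataloader format.

```python
import math

# Case 1: irregular sampling only. Each series carries its own observation
# times, and every feature has a value at every recorded time point.
irregular = {
    "times": [0.0, 1.5, 4.0],          # hours since admission (illustrative)
    "heart_rate": [82.0, 90.0, 85.0],
    "lactate": [1.1, 1.4, 1.2],
}

# Case 2: irregular sampling plus per-feature missingness. Some features
# have no value at some recorded time points (NaN used as a placeholder).
with_missing = {
    "times": [0.0, 1.5, 4.0],
    "heart_rate": [82.0, 90.0, 85.0],
    "lactate": [1.1, math.nan, math.nan],
}


def has_missing(series):
    """Return True if any feature (every key except 'times') contains a NaN."""
    return any(
        math.isnan(value)
        for name, values in series.items()
        if name != "times"
        for value in values
    )


def missing_rate(series, feature):
    """Fraction of recorded time points at which `feature` is NaN."""
    values = series[feature]
    return sum(math.isnan(v) for v in values) / len(values)
```

Under these assumptions, `has_missing(irregular)` is false while `has_missing(with_missing)` is true, which is the difference between the case that is already supported and the second case asked about above.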

@hojjatkarami
Author

Hi @robsdavis,

I am thinking about irregularly sampled time series with missingness, such as clinical time series of ICU patients. At each timestamp, a few variables might be missing.

@robsdavis
Contributor

Thanks for your response. Are you generating synthetic data from data with missingness, or generating synthetic data that itself contains missingness? In either case, we currently consider that out of scope for Synthcity, because it is possible to 1) impute missing values before using Synthcity to create synthetic records, or 2) create a synthetic dataset with Synthcity and then retrospectively delete values to create missing data. Can I ask what improvement your method provides over these approaches?
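
As a rough sketch of the two workarounds, assuming forward-fill as the imputation method and independent random deletion as the masking step (both are illustrative choices, and the generator is replaced by a stand-in copy rather than an actual Synthcity call):

```python
import math
import random


def forward_fill(values, default=0.0):
    """Impute each NaN with the last observed value.

    `default` is used for NaNs that occur before the first observation.
    """
    filled, last = [], default
    for value in values:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


def mask_at_random(values, rate, rng):
    """Replace each value with NaN independently with probability `rate`."""
    return [math.nan if rng.random() < rate else value for value in values]


# Workaround 1: impute the real series before training a generator on it.
real = [1.1, math.nan, math.nan, 2.0]
imputed = forward_fill(real)

# Stand-in for "train a generator and sample from it"; no model is fitted here.
synthetic = list(imputed)

# Workaround 2: delete values from the synthetic series after generation.
rng = random.Random(0)
synthetic_with_gaps = mask_at_random(synthetic, rate=0.5, rng=rng)
```

Note that `mask_at_random` deletes values independently of their magnitude, so the gaps it creates are missing completely at random by construction.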

@hojjatkarami
Author

hojjatkarami commented Apr 15, 2024

I mean generating synthetic data from data with missingness, so no imputation is needed. Consider hourly measurements of ICU patients: the missingness rate is very high for many laboratory variables, and the missingness is MNAR (missing not at random).
I would like to add the following model to synthcity: https://github.com/hojjatkarami/TimEHR .
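
To illustrate why MNAR matters here, the following is a small simulation sketch in plain Python. It is not TimEHR's method; the distribution, threshold, and observation probabilities are all made-up assumptions. A lab value is recorded with high probability only when it is abnormal, so the observed values are not representative of the true ones.

```python
import math
import random

rng = random.Random(42)

# Hypothetical "true" hourly lab values (illustrative distribution only).
true_values = [rng.gauss(1.5, 0.5) for _ in range(2000)]


def observe_mnar(value, rng, threshold=2.0, p_high=0.9, p_low=0.1):
    """MNAR: the chance a value is recorded depends on the value itself.

    Values above `threshold` are recorded with probability `p_high`,
    all others with probability `p_low`; unrecorded values become NaN.
    """
    p_observed = p_high if value > threshold else p_low
    return value if rng.random() < p_observed else math.nan


def observed_mean(values):
    """Mean over the non-NaN entries only."""
    seen = [v for v in values if not math.isnan(v)]
    return sum(seen) / len(seen)


mnar_series = [observe_mnar(v, rng) for v in true_values]

true_mean = sum(true_values) / len(true_values)
mnar_mean = observed_mean(mnar_series)
missing_fraction = sum(math.isnan(v) for v in mnar_series) / len(mnar_series)
```

In this setup the mean of the recorded values is pulled toward the abnormal range relative to the true mean, because whether a value is missing carries information about the value.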

@hojjatkarami
Author

Hi @robsdavis,

I have added TimEHR to a forked repo of synthcity.
Could you please check the tutorial and give feedback on it?
https://github.com/hojjatkarami/synthcity/blob/timehr/tutorials/plugins/time_series/TimEHR/plugin_timehr.ipynb
