Time series with missingness #268
Comments
Hi @hojjatkarami, thanks for engaging with Synthcity! We currently consider the scope of Synthcity to be generating synthetic records from complete datasets with no missing data. All missing values in the real dataset must be imputed before training, and none of our models generate missing values. However, we do already support generating synthetic time series datasets from real irregular time series datasets. Such datasets could be said to contain missing time points in theory, but they do not contain any missing values marked with placeholders. You just need to label the time points you have in your dataloader. Is this the sort of thing you mean, or are you suggesting something else, such as generating a dataset with missing values in it, or training on a dataset with multiple features at irregular time points where some (but not all) feature values are missing?
Hi @robsdavis, I am thinking about irregularly sampled time series with missingness, such as clinical time series of ICU patients. At each timestamp, a few variables might be missing.
Thanks for your response. Are you generating synthetic data from data with missingness, or generating synthetic data with missingness? In either case, we currently consider that out of scope for Synthcity, as it is possible to 1) impute missing values before using Synthcity to create synthetic records, or 2) create a synthetic dataset with Synthcity and then retrospectively delete values to create missing data. Can I ask what the method you have developed offers over these two approaches?
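The two workarounds named above can be sketched with pandas alone. This is a minimal illustration on made-up data, not Synthcity code: the column names, the forward-fill-then-median imputation, the uniform random deletion, and both helper functions are assumptions chosen for the example, and no generator is actually called.

```python
import numpy as np
import pandas as pd

# Toy irregular clinical series (hypothetical values): one row per observed time point.
real = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "hours":      [0.0, 1.5, 4.0, 0.0, 3.0],
    "heart_rate": [88.0, np.nan, 92.0, 75.0, np.nan],
    "lactate":    [np.nan, 2.1, np.nan, 1.0, 1.4],
})
features = ["heart_rate", "lactate"]


def impute_before_training(df, cols):
    """Option 1: forward-fill within each patient, then fall back to the column median."""
    out = df.sort_values(["patient_id", "hours"]).copy()
    out[cols] = out.groupby("patient_id")[cols].ffill()
    out[cols] = out[cols].fillna(out[cols].median())
    return out


def delete_after_generation(df, cols, rate, seed=0):
    """Option 2: blank out values uniformly at random in an already complete table."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    drop = rng.random((len(out), len(cols))) < rate  # True where a value is removed
    out[cols] = out[cols].mask(drop)
    return out


complete = impute_before_training(real, features)
# `complete` stands in for a generator's output here; no generator is called.
with_gaps = delete_after_generation(complete, features, rate=0.4)
```

Note that deleting values uniformly at random produces missingness that is independent of the data, so option 2 as sketched does not reproduce a missingness pattern that depends on the underlying values.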
I mean generating synthetic data from data with missingness (so no imputation is needed). Consider hourly measurements of ICU patients. In this case, the missingness rate is very high for many laboratory variables, and the missingness mechanism is MNAR (missing not at random).
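To make the setting concrete, here is a small sketch of hourly data in which the missingness pattern is kept as part of the data instead of being imputed away. It is a generic values-plus-mask representation on made-up numbers, shown as an assumption for illustration; it does not describe Synthcity's data loaders or the method discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly grid for one ICU stay; NaN marks a variable not measured that hour.
stay = pd.DataFrame(
    {
        "heart_rate": [88.0, 90.0, np.nan, 92.0, 91.0, 89.0],
        "lactate":    [np.nan, 2.1, np.nan, np.nan, np.nan, 1.8],
    },
    index=pd.RangeIndex(6, name="hour"),
)

# Fraction of hours in which each variable is missing.
missing_rate = stay.isna().mean()

# Binary observation mask (1 = observed, 0 = missing), kept alongside the values
# so that the missingness pattern itself is available to a model.
mask = stay.notna().astype(int)

# Values with a neutral placeholder where unobserved; the mask distinguishes
# a placeholder 0.0 from a genuinely measured value.
values = stay.fillna(0.0)

print(missing_rate)
```

In this toy stay the laboratory variable is missing far more often than the vital sign, which mirrors the high laboratory missingness rates described above.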
Hi @robsdavis, I have added TimEHR to a fork of the synthcity repo.
Feature Description
I have developed a GAN framework for generating irregularly sampled time series with missing values. However, I cannot add it to synthcity because synthcity does not support time series data loaders with missing values.
Is there a way to do this? If not, it would be great if you could add this support in the future.