Add Unlabeled Image Dataset for Unsupervised Training #9050
Comments
Would you be open to adding this? I’d be happy to contribute a PR if there’s interest. Thanks!
Thanks for the feature request, @mduszyk. Can you share a bit more about the API you have in mind? Naively this sounds like a shallow wrapper around
I was thinking about something like this:

from pathlib import Path

from torchvision.io import read_image, ImageReadMode


class UnlabeledImageFolder:
    def __init__(self, root_dir, patterns=('**/*.jpg', '**/*.png'), transform=None):
        self.root = Path(root_dir)
        self.images = []
        for pattern in patterns:
            self.images.extend(self.root.glob(pattern))
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        # str() is used in case read_image does not accept Path objects
        img = read_image(str(self.images[i]), ImageReadMode.RGB)
        if self.transform:
            img = self.transform(img)
        return img

It uses glob, allowing for multiple patterns, loads the image, and applies an optional transform.
One more idea is to unify the
Looking forward to hearing your thoughts on this.
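The multi-pattern globbing step in the proposal can be tried on its own, without torchvision. The following is a minimal, stdlib-only sketch, not part of the proposed API: `collect_image_paths` is a hypothetical helper name, and sorting and de-duplicating the matches is an assumption added here, since `Path.glob` makes no ordering guarantee and overlapping patterns can match the same file more than once.

```python
import tempfile
from pathlib import Path


def collect_image_paths(root_dir, patterns=("**/*.jpg", "**/*.png")):
    """Return a sorted, de-duplicated list of files under root_dir matching any pattern.

    Hypothetical helper for illustration only; not a torchvision function.
    """
    root = Path(root_dir)
    matches = set()
    for pattern in patterns:
        # A set drops files matched by more than one pattern.
        matches.update(p for p in root.glob(pattern) if p.is_file())
    # Sorting gives a stable index -> file mapping across runs.
    return sorted(matches)


# Small demo on a throwaway directory with one nested folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "nested").mkdir()
    for name in ("b.jpg", "a.png", "nested/c.jpg", "notes.txt"):
        (demo_root / name).touch()
    demo_paths = collect_image_paths(
        demo_root, patterns=("**/*.jpg", "**/*.png", "**/c*.jpg")
    )
    print([p.relative_to(demo_root).as_posix() for p in demo_paths])
```

A stable ordering matters mostly for reproducibility: with an unsorted list, index `i` could refer to a different image from one run to the next.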
Thanks for the details. I think this is reasonable, but I hope we can support that with the existing
Here is a general view of this in the code:

class DatasetFolder(VisionDataset):
    ...

    def find_classes(self, directory: Union[str, Path]) -> tuple[list[str], dict[str, int]]:
        """Find the class folders in a dataset structured as follows::

            directory/
            ├── class_x
            │   ├── xxx.ext
            │   ├── xxy.ext
            │   └── ...
            │       └── xxz.ext
            └── class_y
                ├── 123.ext
                ├── nsdf3.ext
                └── ...
                └── asd932_.ext
        ...
        """


class ImageFolder(DatasetFolder):
    ...

I was thinking initially about extending
Let me know your thoughts on this, and if you are interested, I could propose
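To make the mismatch with a flat folder concrete, here is a stdlib-only approximation of the class-discovery step that the docstring above describes. It is a sketch under assumptions, not the torchvision source: `find_class_folders` is a hypothetical name, and the real `find_classes` may differ in details. The point is only that a directory containing images but no subfolders yields no classes at all.

```python
import os
import tempfile
from pathlib import Path


def find_class_folders(directory):
    """Map each immediate subdirectory name to an integer index, sorted by name.

    Hypothetical stand-in for the class-discovery step; illustration only.
    """
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    if not classes:
        # A flat folder of images has no subdirectories, so it ends up here.
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Labeled layout: one subfolder per class.
with tempfile.TemporaryDirectory() as tmp:
    for cls in ("class_y", "class_x"):
        (Path(tmp) / cls).mkdir()
        (Path(tmp) / cls / "img.ext").touch()
    print(find_class_folders(tmp))

# Flat layout: images directly in the root, no class folders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "img.ext").touch()
    try:
        find_class_folders(tmp)
    except FileNotFoundError as err:
        print("flat folder rejected:", type(err).__name__)
```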
🚀 The feature
I’m proposing to add a dataset class for unsupervised learning (e.g., generative models), where the dataset consists of a flat folder of unlabeled images.
Introduce a new class, e.g. UnlabeledImageDataset, that:
ImageFolder conventions where applicable
torchvision/datasets/folder.py and reuses existing utilities
ImageFolder
Motivation, pitch
torchvision.datasets.ImageFolder and DatasetFolder are designed for supervised tasks, requiring a specific directory structure and class-label mappings. In unsupervised scenarios, I end up writing custom datasets for this case. A built-in dataset would improve usability and consistency across the PyTorch ecosystem.

This feature request is similar in spirit to Issue #660, where a user suggested supporting unlabeled or unsupervised datasets. The use case remains common, and a lightweight, built-in solution would reduce boilerplate and improve consistency.
Alternatives
An alternative would be to have an "unsupervised" mode for ImageFolder, as suggested in Issue #660. But that would result in increased complexity in this class, as pointed out in the comment of the issue.
Additional context
It feels like this functionality belongs in a common library, especially since ImageFolder is already present in torchvision.