How to train MVTec dataset without defective samples? #1238
Comments
@fanchuanster, you need to use |
BTW, I found a workaround for this error: disable all asserts by running train.py with the Python flag -O, like this: |
You can work on the official "fix", but it is not blocking me now. |
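The effect of the -O flag mentioned above can be checked in isolation. The following is a minimal sketch, not the command used in this thread: it runs a one-line stand-in script (the `SNIPPET` string and the `run_snippet` helper are hypothetical) with and without -O to show that assert statements are skipped under -O. Note that -O silences every assert in the process, including checks unrelated to the dataset.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a script whose assert fails; not anomalib code.
SNIPPET = "assert False, 'dataset check failed'\nprint('reached end')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh interpreter, optionally with -O."""
    # Drop PYTHONOPTIMIZE so the unoptimized run really keeps its asserts.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    args = [sys.executable]
    if optimize:
        args.append("-O")
    args += ["-c", SNIPPET]
    return subprocess.run(args, capture_output=True, text=True, env=env)


normal = run_snippet(optimize=False)    # assert is active and fails
optimized = run_snippet(optimize=True)  # assert is compiled out

print("without -O, exit code:", normal.returncode)
print("with -O, exit code:", optimized.returncode)
```

Without -O the stand-in script stops with an AssertionError; with -O it runs to the end. This is why the workaround hides the failed check rather than fixing it.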
If the MVTec format allowed the absence of test and ground_truth, anomalib could drop folder format support, since the extended MVTec format would cover the folder format. |
Sorry, I'm not aware where MVTec supports this. Can you provide an example where this is done please? |
@samet-akcay |
Oh, I understand your point now. I'm not sure if we should make
MVTec
├── bottle
│   ├── ground_truth
│   │   ├── broken_large
│   │   ├── broken_small
│   │   └── contamination
│   ├── license.txt
│   ├── readme.txt
│   ├── test
│   │   ├── broken_large
│   │   ├── broken_small
│   │   ├── contamination
│   │   └── good
│   └── train
│       └── good
For any sort of customisation, or customised data, @djdameln, what is your thought here? |
Until @djdameln provides his opinion, you could meanwhile use the following data configuration to train a model that uses only the good images from an MVTec category dataset:
name: mvtec_good
format: folder
root: ./datasets/MVTec
normal_dir: bottle/train/good
normal_test_dir: bottle/test/good
task: classification
abnormal_dir: null
mask_dir: null
extensions: null
train_batch_size: 32
eval_batch_size: 32
num_workers: 8
image_size: 256 # dimensions to which images are resized (mandatory)
center_crop: null # dimensions to which images are center-cropped after resizing (optional)
normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
transform_config:
  train: null
  eval: null
test_split_mode: synthetic # options: [from_dir, synthetic]
test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
val_split_mode: synthetic # options: [same_as_test, from_test, synthetic]
val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode) |
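Before training with a configuration like the one above, it can help to confirm that the directories it points to actually contain images. The sketch below is a standalone check using only the standard library. `count_images`, `check_layout`, and the suffixes in `IMAGE_SUFFIXES` are assumptions made for illustration and are not part of anomalib.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def count_images(root: str, relative_dir: str) -> int:
    """Count image files directly inside root/relative_dir (0 if missing)."""
    directory = Path(root) / relative_dir
    if not directory.is_dir():
        return 0
    return sum(
        1
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def check_layout(root: str, normal_dir: str, normal_test_dir: str) -> dict:
    """Return image counts for the two normal-image directories.

    Raises FileNotFoundError if either directory is missing or has no images.
    """
    counts = {
        normal_dir: count_images(root, normal_dir),
        normal_test_dir: count_images(root, normal_test_dir),
    }
    empty = [name for name, count in counts.items() if count == 0]
    if empty:
        raise FileNotFoundError(f"No images found in: {', '.join(empty)}")
    return counts
```

With the values from the configuration, the call would be `check_layout("./datasets/MVTec", "bottle/train/good", "bottle/test/good")`.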
I agree with @samet-akcay that the recommended dataset format for custom datasets is the
, it seems that there is technically nothing blocking us from running MVTec dataset without anomalous samples, but the training fails due to a failed assert. This is most likely a left-over from a previous release in which we did not support training without anomalous images at all. When we added support for this, we only updated the Folder dataset, under the assumption that MVTec would always have anomalous images, because this is the case for the official MVTec dataset. If this is correct, we could consider removing this restriction to provide this little bit of additional flexibility for users who may prefer the MVTec format, or who happen to have their custom dataset arranged in MVTec style. Of course, I will need to have a closer look at the code to confirm that this change does not have any unwanted side effects, but these are my first thoughts. |
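To illustrate the kind of relaxation described above, here is a minimal sketch of a mask-consistency check that passes when a dataset has no anomalous samples. It is not anomalib's code: the `Sample` type, its fields, and `masks_match_images` are hypothetical names, and the traceback in this issue suggests the real assert evaluates a pandas object rather than a plain list. The sketch relies on Python's built-in all() returning True for an empty iterable, so the check is vacuously satisfied when nothing is anomalous.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Sample:
    """Hypothetical record for one image; not anomalib's data structure."""

    image_path: str
    label_index: int  # 0 = normal, 1 = anomalous
    mask_path: Optional[str] = None


def masks_match_images(samples: List[Sample]) -> bool:
    """Check that every anomalous sample has a mask named after its image.

    With no anomalous samples the generator is empty and all() returns True.
    """
    anomalous = [s for s in samples if s.label_index == 1]
    return all(
        s.mask_path is not None
        and Path(s.image_path).stem in Path(s.mask_path).stem
        for s in anomalous
    )


good_only = [Sample("test/good/000.png", 0), Sample("test/good/001.png", 0)]
with_defect = good_only + [
    Sample("test/broken_large/000.png", 1, "ground_truth/broken_large/000_mask.png")
]
mismatched = good_only + [
    Sample("test/broken_large/000.png", 1, "ground_truth/broken_large/007_mask.png")
]

print(masks_match_images(good_only))    # no anomalous samples to check
print(masks_match_images(with_defect))  # mask stem contains the image stem
print(masks_match_images(mismatched))   # mask belongs to a different image
```

Reducing the check to a single boolean before asserting on it avoids the ambiguous truth value that appears when an assert receives a whole Series.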
Thanks, cheers! |
Describe the bug
Training fails when the test folder contains no defective data. Dataset folder structure:
mydata
-- train
   -- good
-- test
   -- good
(There is no ground_truth folder because there is no defective data. Defective data does not matter in my case, and I would like to keep my training process simple, without providing defective data or ground_truth.)
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/datamodule.py", line 118, in _setup
    self.train_data.setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/dataset.py", line 162, in setup
    self._setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 195, in _setup
    self.samples = make_mvtec_dataset(self.root_category, split=self.split, extensions=IMG_EXTENSIONS)
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 156, in make_mvtec_dataset
    assert (
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Dataset
MVTec
Model
PADiM
Steps to reproduce the behavior
mydata
-- train
   -- good
-- test
   -- good
OS information
OS information:
Expected behavior
Training completes successfully without error.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
Based on the default PaDiM YAML config, with changes to the dataset section to point to the training data.
Logs
Code of Conduct