
How to train MVTec dataset without defective samples? #1238

Closed
1 task done
fanchuanster opened this issue Aug 7, 2023 · 10 comments · Fixed by #1241

@fanchuanster

Describe the bug

Training fails when the test folder contains no defective data. Dataset folder structure:
mydata
-- train
   -- good
-- test
   -- good

(There is no ground_truth folder because there is no defective data. Defective data does not matter in my case, and I would like to keep my training process simple, without providing defective data or ground_truth.)

  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/datamodule.py", line 118, in _setup
    self.train_data.setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/dataset.py", line 162, in setup
    self._setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 195, in _setup
    self.samples = make_mvtec_dataset(self.root_category, split=self.split, extensions=IMG_EXTENSIONS)
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 156, in make_mvtec_dataset
    assert (
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
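
The last frames of the traceback show pandas raising while the assert condition was being evaluated, which suggests the condition was a Series rather than a single boolean. The actual expression in mvtec.py is not visible in the traceback, so the sketch below uses a stand-in Series (`mask_matches` is a hypothetical name) only to show why the error is a ValueError instead of an AssertionError, and how an explicit reduction avoids it.

```python
import pandas as pd

# Stand-in for a per-sample check; empty because there are no defective samples.
# (Hypothetical name, not taken from anomalib.)
mask_matches = pd.Series([], dtype=bool)

# Asking for the truth value of a Series raises ValueError,
# which is what happens when a Series is used directly as an assert condition.
try:
    bool(mask_matches)
    error_message = None
except ValueError as err:
    error_message = str(err)

# An explicit reduction gives a single boolean; all() over no elements is True.
all_match = bool(mask_matches.all())

print(error_message)
print(all_match)
```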

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

  1. Train a PADiM model on data in MVTec format without ground_truth:
    mydata
    -- train
       -- good
    -- test
       -- good

OS information

  • OS: [e.g. Ubuntu 20.04]
  • Python version: [e.g. 3.8.10]
  • Anomalib version: version from latest source code main branch
  • PyTorch version: version from nvcr.io/nvidia/pytorch:22.12-py3
  • CUDA/cuDNN version: [e.g. 11.8]
  • GPU models and configuration: [e.g. 2x GeForce RTX 3090]
  • Any other relevant information: [e.g. I'm using a custom dataset]

Expected behavior

Training completes successfully without error.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

Based on the default PADiM YAML config, with the dataset section changed to point to my training data.

Logs

N/A

Code of Conduct

  • I agree to follow this project's Code of Conduct
@samet-akcay
Contributor

@fanchuanster, you need to use folder format if you want to modify the dataset structure. mvtec format will not work since it checks these directories by default.

@samet-akcay changed the title from "[Bug]: MVTech error when train without defective test images" to "How to train MVTec dataset without defective samples?" on Aug 7, 2023
@fanchuanster
Author

BTW, I found a workaround for this error: disable all assert statements by running train.py with the Python -O flag, like this:
python3 -O anomalib/tools/train.py ...
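
To illustrate what the -O flag does, here is a minimal sketch that runs the same one-line script with and without it. The snippet text and the `run_snippet` helper are made up for the demonstration and have nothing to do with anomalib's own code.

```python
import os
import subprocess
import sys

# A tiny script whose assert always fails (illustrative only).
SNIPPET = "assert False, 'dataset check failed'\nprint('reached end')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a child interpreter, optionally with -O."""
    cmd = [sys.executable]
    if optimize:
        cmd.append("-O")
    cmd += ["-c", SNIPPET]
    # Drop PYTHONOPTIMIZE so the non-optimized run really keeps its asserts.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    return subprocess.run(cmd, capture_output=True, text=True, env=env)


normal = run_snippet(optimize=False)
optimized = run_snippet(optimize=True)

print(normal.returncode, optimized.returncode)
```

Note that -O removes every assert in the process, including those in third-party packages, so it silences unrelated sanity checks as well as the one that triggered this issue.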

@fanchuanster
Author

You can work on the official "fix", but it is not blocking me now.

@fanchuanster
Author

@fanchuanster, you need to use folder format if you want to modify the dataset structure. mvtec format will not work since it checks these directories by default.

If the MVTec format did allow the absence of test and ground_truth, anomalib could eliminate folder format support, as the strengthened MVTec format would cover the folder format.

@samet-akcay
Contributor

If the MVTec format did allow the absence of test and ground_truth

Sorry, I'm not aware of where MVTec supports this. Can you provide an example where this is done, please?

@fanchuanster
Author

fanchuanster commented Aug 7, 2023

@samet-akcay
Contributor

Oh, I understand your point now.

I'm not sure if we should make MVTec more flexible though. When we customise the MVTec dataset structure, it is not mvtec format anymore. When we use the mvtec format, I think the file structure should be the following:

MVTec
├── bottle
│   ├── ground_truth
│   │   ├── broken_large
│   │   ├── broken_small
│   │   └── contamination
│   ├── license.txt
│   ├── readme.txt
│   ├── test
│   │   ├── broken_large
│   │   ├── broken_small
│   │   ├── contamination
│   │   └── good
│   └── train
│       └── good

For any sort of customisation, or customised data, folder should be used.
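
As a rough sketch of the rule above, the helper below inspects a category directory and suggests which format fits. The function names are hypothetical and not part of anomalib; the check only looks for defect-class subfolders under test, which is an assumption about what distinguishes the two cases.

```python
import tempfile
from pathlib import Path
from typing import List


def defect_classes(category_dir: Path) -> List[str]:
    """Return the test subfolders other than 'good', i.e. the defect classes."""
    test_dir = category_dir / "test"
    if not test_dir.is_dir():
        return []
    return sorted(
        p.name for p in test_dir.iterdir() if p.is_dir() and p.name != "good"
    )


def suggested_format(category_dir: Path) -> str:
    """Suggest 'mvtec' when defect classes exist, otherwise 'folder'."""
    return "mvtec" if defect_classes(category_dir) else "folder"


# Demo on a throwaway directory that contains only good images.
with tempfile.TemporaryDirectory() as tmp:
    category = Path(tmp) / "mydata"
    (category / "train" / "good").mkdir(parents=True)
    (category / "test" / "good").mkdir(parents=True)
    good_only_suggestion = suggested_format(category)

print(good_only_suggestion)
```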

@djdameln, what are your thoughts here?

@samet-akcay
Contributor

samet-akcay commented Aug 7, 2023

Until @djdameln provides his opinion, you could use the following data configuration in the meantime to train a model that uses only the good images from an MVTec category:

dataset:
  name: mvtec_good
  format: folder
  root: ./datasets/MVTec
  normal_dir: bottle/train/good
  normal_test_dir: bottle/test/good
  task: classification
  abnormal_dir: null
  mask_dir: null
  extensions: null
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: synthetic # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: synthetic # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
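
If the same setup is needed for several categories, the path-related keys of the configuration above could be generated with a small helper. This is only a sketch: `good_only_folder_config` is a hypothetical function, the keys are copied from the YAML above, and whether a given anomalib version accepts them unchanged is an assumption. The remaining keys (batch sizes, image size, and so on) would stay as in the YAML.

```python
from typing import Any, Dict


def good_only_folder_config(
    category: str, root: str = "./datasets/MVTec"
) -> Dict[str, Any]:
    """Build the path-related part of a folder-format dataset section."""
    return {
        "name": "mvtec_good",
        "format": "folder",
        "root": root,
        "normal_dir": f"{category}/train/good",
        "normal_test_dir": f"{category}/test/good",
        "task": "classification",
        "abnormal_dir": None,  # no defective images
        "mask_dir": None,  # no ground-truth masks
        "test_split_mode": "synthetic",
        "val_split_mode": "synthetic",
    }


bottle_config = good_only_folder_config("bottle")
for key, value in bottle_config.items():
    print(f"{key}: {value}")
```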

@djdameln
Contributor

djdameln commented Aug 7, 2023

I agree with @samet-akcay that the recommended dataset format for custom datasets is the Folder format. However, consider this comment:

BTW, I found a workaround for this error, simply disabling all the assert by running the train.py with python flag -O, like this:
python3 -O anomalib/tools/train.py ...

Based on it, there seems to be nothing technically blocking us from running the MVTec dataset without anomalous samples; the training fails only because of a failed assert. This is most likely a leftover from a previous release in which we did not support training without anomalous images at all. When we added support for this, we only updated the Folder dataset, under the assumption that MVTec would always have anomalous images, because this is the case for the official MVTec dataset.

If this is correct, we could consider removing this restriction to provide this little bit of additional flexibility for users who may prefer the MVTec format, or who happen to have their custom dataset arranged in MVTec style.

Of course, I will need to have a closer look at the code to confirm that this change does not have any unwanted side effects, but these are my first thoughts.

@fanchuanster
Author

Thanks, cheers!
