How to train MVTec dataset without defective samples? #1238

fanchuanster · 2023-08-07T08:04:01Z

Describe the bug

Training fails when the test folder contains no defective data. My dataset folder structure is:

mydata
├── train
│   └── good
└── test
    └── good

There is no ground_truth folder because there is no defective data. Defective data does not matter in my case, and I would like to keep the training process simple, without providing defective data or ground_truth masks.

  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/datamodule.py", line 118, in _setup
    self.train_data.setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/base/dataset.py", line 162, in setup
    self._setup()
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 195, in _setup
    self.samples = make_mvtec_dataset(self.root_category, split=self.split, extensions=IMG_EXTENSIONS)
  File "/usr/local/lib/python3.8/dist-packages/anomalib/data/mvtec.py", line 156, in make_mvtec_dataset
    assert (
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

Train a PADiM model on an MVTec-format dataset that has no ground_truth folder:

mydata
├── train
│   └── good
└── test
    └── good

OS information

OS information:

OS: [e.g. Ubuntu 20.04]
Python version: [e.g. 3.8.10]
Anomalib version: version from latest source code main branch
PyTorch version: version from nvcr.io/nvidia/pytorch:22.12-py3
CUDA/cuDNN version: [e.g. 11.8]
GPU models and configuration: [e.g. 2x GeForce RTX 3090]
Any other relevant information: [e.g. I'm using a custom dataset]

Expected behavior

Training completes successfully without an error.

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

Based on the default PADiM YAML config, with changes to the dataset section to point to my training data.

Logs

N/A

Code of Conduct

I agree to follow this project's Code of Conduct

samet-akcay · 2023-08-07T08:07:11Z

@fanchuanster, you need to use the folder format if you want to modify the dataset structure. The mvtec format will not work since it checks for these directories by default.

fanchuanster · 2023-08-07T10:16:06Z

By the way, I found a workaround for this error: disable all assert statements by running train.py with the Python -O flag, like this:
python3 -O anomalib/tools/train.py ...
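
For context, here is a minimal sketch of why this workaround has an effect: with -O, Python strips assert statements at compile time, so a failing check is never evaluated. The snippet is an illustration only and does not use anomalib; the one-line program and the helper name run_snippet are hypothetical stand-ins for the dataset check.

```python
import subprocess
import sys

# A one-line program whose assert always fails (stand-in for the dataset check).
SNIPPET = "assert False, 'dataset check failed'; print('training would start')"


def run_snippet(optimize: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh interpreter, optionally with the -O flag."""
    cmd = [sys.executable]
    if optimize:
        cmd.append("-O")  # -O removes assert statements from the compiled code
    cmd += ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True)


normal = run_snippet(optimize=False)     # assert runs and raises AssertionError
optimized = run_snippet(optimize=True)   # assert is skipped, print() is reached

print("without -O: exit code", normal.returncode)
print("with -O:    exit code", optimized.returncode)
```

Note that -O disables every assert in the process, including those in other libraries, so it is a blunt workaround rather than a targeted fix.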

fanchuanster · 2023-08-07T10:17:07Z

You can work on the official "fix", but it is not blocking me now.

fanchuanster · 2023-08-07T10:37:00Z

> @fanchuanster, you need to use folder format if you want to modify the dataset structure. mvtec format will not work since it checks these directories by default.

If the MVTec format allowed the absence of defective test data and ground_truth, anomalib could drop folder format support, since the more flexible MVTec format would cover the folder format's use cases.

samet-akcay · 2023-08-07T10:39:55Z

> If MVTech format does allow absense of test and ground_truth

Sorry, I'm not aware where MVTec supports this. Can you provide an example where this is done please?

fanchuanster · 2023-08-07T10:49:11Z

@samet-akcay
Hi Samet,
This is where the MVTec dataset format is selected in the config:
https://github.com/openvinotoolkit/anomalib/blob/5a46d03042b4564a2931000d0fa509a86afb08ee/src/anomalib/models/padim/config.yaml#L3

samet-akcay · 2023-08-07T10:55:30Z

Oh, I understand your point now.

I'm not sure we should make MVTec more flexible, though. Once we customise the MVTec dataset structure, it is no longer the mvtec format. When we use the mvtec format, I think the file structure should be the following:

MVTec
├── bottle
│   ├── ground_truth
│   │   ├── broken_large
│   │   ├── broken_small
│   │   └── contamination
│   ├── license.txt
│   ├── readme.txt
│   ├── test
│   │   ├── broken_large
│   │   ├── broken_small
│   │   ├── contamination
│   │   └── good
│   └── train
│       └── good

For any sort of customisation, or for custom data, the folder format should be used.
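
As a rough illustration of this rule of thumb, the sketch below inspects a category directory and suggests a format. It is not part of anomalib: the function names describe_layout and suggested_format are hypothetical, and the rule simply mirrors the advice given in this comment.

```python
import tempfile
from pathlib import Path


def describe_layout(category_dir):
    """Report which MVTec-style subdirectories exist under a category directory."""
    root = Path(category_dir)
    test_dir = root / "test"
    defect_types = []
    if test_dir.is_dir():
        # Any test subfolder other than "good" is treated as a defect type.
        defect_types = sorted(
            p.name for p in test_dir.iterdir() if p.is_dir() and p.name != "good"
        )
    return {
        "train_good": (root / "train" / "good").is_dir(),
        "test_good": (test_dir / "good").is_dir(),
        "ground_truth": (root / "ground_truth").is_dir(),
        "defect_types": defect_types,
    }


def suggested_format(layout):
    """Hypothetical rule: use 'mvtec' only when the full layout is present."""
    full = (
        layout["train_good"]
        and layout["test_good"]
        and layout["ground_truth"]
        and bool(layout["defect_types"])
    )
    return "mvtec" if full else "folder"


# Demo on a throwaway directory that mimics the layout from this issue.
with tempfile.TemporaryDirectory() as tmp:
    category = Path(tmp) / "mydata"
    (category / "train" / "good").mkdir(parents=True)
    (category / "test" / "good").mkdir(parents=True)
    good_only = suggested_format(describe_layout(category))

    # Add one defect type plus its mask folder to get the full MVTec layout.
    (category / "test" / "broken_large").mkdir()
    (category / "ground_truth" / "broken_large").mkdir(parents=True)
    full_layout = suggested_format(describe_layout(category))

print(good_only, full_layout)
```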

@djdameln, what is your thought here?

samet-akcay · 2023-08-07T11:01:47Z

Until @djdameln provides his opinion, you could use the following data configuration in the meantime to train a model that uses only the good images from an MVTec category:

dataset:
  name: mvtec_good
  format: folder
  root: ./datasets/MVTec
  normal_dir: bottle/train/good
  normal_test_dir: bottle/test/good
  task: classification
  abnormal_dir: null
  mask_dir: null
  extensions: null
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: synthetic # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: synthetic # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
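
If the same setup is needed for several MVTec categories, the category-dependent keys can be generated instead of edited by hand. This is only a sketch: the key names are copied from the configuration above, while the function name folder_dataset_overrides and the generated dataset name are hypothetical. It builds a plain dictionary and does not call anomalib.

```python
def folder_dataset_overrides(category, root="./datasets/MVTec"):
    """Build the category-dependent part of the folder-format dataset section."""
    return {
        "name": f"mvtec_{category}_good",  # hypothetical naming scheme
        "format": "folder",
        "root": root,
        "normal_dir": f"{category}/train/good",
        "normal_test_dir": f"{category}/test/good",
        "task": "classification",
        "abnormal_dir": None,  # no defective images are provided
        "mask_dir": None,      # no ground-truth masks are provided
        "test_split_mode": "synthetic",
        "val_split_mode": "synthetic",
    }


# Example: print the overrides for two categories.
for name in ("bottle", "cable"):
    overrides = folder_dataset_overrides(name)
    print(name, "->", overrides["normal_dir"], overrides["normal_test_dir"])
```

The remaining keys (batch sizes, image size, normalization, split ratios) do not depend on the category and can stay as they are in the YAML above.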

djdameln · 2023-08-07T11:18:57Z

I agree with @samet-akcay that the recommended dataset format for custom datasets is the Folder format. However, based on this comment:

> BTW, I found a workaround for this error, simply disabling all the assert by running the train.py with python flag -O, like this:
> python3 -O anomalib/tools/train.py ...

it seems that there is technically nothing blocking us from running the MVTec dataset without anomalous samples; training fails only because of a failed assert. This is most likely a leftover from a previous release in which we did not support training without anomalous images at all. When we added support for this, we only updated the Folder dataset, under the assumption that MVTec would always have anomalous images, because this is the case for the official MVTec dataset.

If this is correct, we could consider removing this restriction to provide this little bit of additional flexibility for users who may prefer the MVTec format, or who happen to have their custom dataset arranged in MVTec style.

Of course, I will need to have a closer look at the code to confirm that this change does not have any unwanted side effects, but these are my first thoughts.

fanchuanster · 2023-08-08T01:20:40Z

Thanks, cheers!

samet-akcay changed the title ~~[Bug]: MVTech error when train without defective test images~~ How to train MVTec dataset without defective samples? Aug 7, 2023

djdameln self-assigned this Aug 7, 2023

djdameln mentioned this issue Aug 7, 2023 in "Enable training with only normal images for MVTec #1241" (merged)

samet-akcay closed this as completed in #1241 Aug 7, 2023
