Commit 95b8982

Merge pull request facebookresearch#2 from chengyangfu/retina_maskrcnn

Retina maskrcnn

Author: Cheng-Yang Fu
2 parents f2fd7ed + 2d96517, commit 95b8982

77 files changed, +4159 -49 lines changed

ABSTRACTIONS.md

+3-3
@@ -31,15 +31,15 @@ a specific image, as well as the size of the image as a `(width, height)` tuple.
 It also contains a set of methods that allow to perform geometric
 transformations to the bounding boxes (such as cropping, scaling and flipping).
 The class accepts bounding boxes from two different input formats:
-- `xyxy`, where each box is encoded as a `x1`, `y1`, `x2` and `y2` coordinates)
+- `xyxy`, where each box is encoded as a `x1`, `y1`, `x2` and `y2` coordinates, and
 - `xywh`, where each box is encoded as `x1`, `y1`, `w` and `h`.
 
 Additionally, each `BoxList` instance can also hold arbitrary additional information
 for each bounding box, such as labels, visibility, probability scores etc.
 
 Here is an example on how to create a `BoxList` from a list of coordinates:
 ```python
-from maskrcnn_baseline.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT
+from maskrcnn_benchmark.structures.bounding_box import BoxList, FLIP_LEFT_RIGHT
 
 width = 100
 height = 200
@@ -49,7 +49,7 @@ boxes = [
 [10, 10, 50, 50]
 ]
 # create a BoxList with 3 boxes
-bbox = BoxList(boxes, size=(width, height), mode='xyxy')
+bbox = BoxList(boxes, image_size=(width, height), mode='xyxy')
 
 # perform some box transformations, has similar API as PIL.Image
 bbox_scaled = bbox.resize((width * 2, height * 3))
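
The two encodings named in this hunk differ only in how the second corner is stored. A minimal sketch of the conversion in plain Python, not using the `BoxList` API: the helper names are hypothetical, and it assumes `w = x2 - x1` and `h = y2 - y1`, whereas a library may add a one-pixel offset.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] to [x1, y1, w, h] (assumes w = x2 - x1)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """Convert [x1, y1, w, h] back to [x1, y1, x2, y2]."""
    x1, y1, w, h = box
    return [x1, y1, x1 + w, y1 + h]


box = [10, 10, 50, 50]
as_xywh = xyxy_to_xywh(box)        # width and height of the box
round_trip = xywh_to_xyxy(as_xywh)  # should recover the original corners
```

Under this convention the round trip is lossless for any box, which makes it easy to check a conversion against whichever offset rule the library actually uses.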
+48
@@ -0,0 +1,48 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-101"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-101-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800, )
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.005
+  WEIGHT_DECAY: 0.0001
+  STEPS: (120000, 160000)
+  MAX_ITER: 180000
+  IMS_PER_BATCH: 8
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+
+
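
The `INPUT`, `DATALOADER` and stride settings in this file interact, and the `POOLER_SCALES` values are the reciprocals of the first four `ANCHOR_STRIDE` entries. A rough sketch of the arithmetic for an 800 x 1333 input, assuming images are padded up to a multiple of `SIZE_DIVISIBILITY` and each level's feature map is the padded size divided by the stride, rounded up. The function names are hypothetical and the real sizes depend on the backbone's convolution padding.

```python
import math


def pad_to_multiple(size, divisor):
    """Round size up to the next multiple of divisor."""
    return int(math.ceil(size / divisor) * divisor)


def feature_map_sizes(height, width, strides, divisor):
    """Approximate (height, width) of the feature map at each stride."""
    h = pad_to_multiple(height, divisor)
    w = pad_to_multiple(width, divisor)
    return [(math.ceil(h / s), math.ceil(w / s)) for s in strides]


strides = (4, 8, 16, 32, 64)                      # ANCHOR_STRIDE
sizes = feature_map_sizes(800, 1333, strides, 32)  # SIZE_DIVISIBILITY = 32
pooler_scales = [1.0 / s for s in strides[:4]]     # matches POOLER_SCALES
```

With these assumptions the width 1333 is padded to 1344, so every level except the coarsest divides evenly.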
+46
@@ -0,0 +1,46 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
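
The `SOLVER` blocks in the R-101 and R-50 files are consistent with linear scaling: going from `IMS_PER_BATCH: 16` to `8` halves `BASE_LR` from 0.01 to 0.005 and doubles `STEPS` and `MAX_ITER`, so both schedules process the same number of images (90000 x 16 = 180000 x 8). A small sketch of that arithmetic, using a hypothetical helper that is not part of the repository:

```python
def scale_schedule(base_lr, base_batch, steps, max_iter, new_batch):
    """Rescale a schedule linearly for a different images-per-batch value."""
    ratio = new_batch / base_batch
    return {
        "BASE_LR": base_lr * ratio,                        # lr grows with batch size
        "STEPS": tuple(int(round(s / ratio)) for s in steps),
        "MAX_ITER": int(round(max_iter / ratio)),          # iterations shrink with it
        "IMS_PER_BATCH": new_batch,
    }


# Start from the R-50 settings and derive a batch-size-8 schedule.
scaled = scale_schedule(0.01, 16, (60000, 80000), 90000, new_batch=8)
```

The derived values match the R-101 file's `SOLVER` block, which suggests the two schedules were meant to be equivalent in images seen.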
@@ -0,0 +1,47 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+  SELFADJUST_SMOOTH_L1: True
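
`SCALES_PER_OCTAVE: 3` presumably follows the RetinaNet convention of placing anchors at geometrically spaced sizes within each factor-of-two octave. The anchor generator itself is not shown in this part of the commit, so the sketch below only illustrates that convention, with a hypothetical function name:

```python
def octave_scales(scales_per_octave, octave=2.0):
    """Return multipliers octave**(i/n) for i in 0..n-1 (assumed convention)."""
    return [octave ** (i / scales_per_octave) for i in range(scales_per_octave)]


# Three multipliers per pyramid level: 2^0, 2^(1/3), 2^(2/3).
scales = octave_scales(3)
```

Each multiplier would be applied to a level's base anchor size, so that adjacent pyramid levels, which differ by a factor of two, are bridged by evenly spaced intermediate sizes.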
@@ -0,0 +1,48 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+  BBOX_REG_BETA: 1.0
+  SELFADJUST_SMOOTH_L1: True
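
`BBOX_REG_BETA` reads like the transition point of a smooth L1 box-regression loss. Assuming the usual definition (quadratic below `beta`, linear above it), a scalar sketch looks like the following. How `SELFADJUST_SMOOTH_L1` changes this is not visible in this part of the diff and is not modelled here.

```python
def smooth_l1(diff, beta=1.0):
    """Standard smooth L1 on a scalar residual; beta must be positive."""
    n = abs(diff)
    if n < beta:
        return 0.5 * n * n / beta   # quadratic region near zero
    return n - 0.5 * beta           # linear region for large residuals


small = smooth_l1(0.5, beta=1.0)    # falls in the quadratic region
large = smooth_l1(2.0, beta=1.0)    # falls in the linear region
```

A smaller `beta` narrows the quadratic region, so the loss behaves more like plain L1 for all but the smallest residuals.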
@@ -0,0 +1,47 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+  SELFADJUST_SMOOTH_L1: True
@@ -0,0 +1,48 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+  BBOX_REG_BETA: 1.0
+  SELFADJUST_SMOOTH_L1: False
@@ -0,0 +1,47 @@
+MODEL:
+  META_ARCHITECTURE: "RetinaNet"
+  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
+  RPN_ONLY: True
+  BACKBONE:
+    CONV_BODY: "R-50-FPN"
+    OUT_CHANNELS: 256
+  RPN:
+    USE_FPN: True
+    FG_IOU_THRESHOLD: 0.5
+    BG_IOU_THRESHOLD: 0.4
+    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
+    PRE_NMS_TOP_N_TRAIN: 2000
+    PRE_NMS_TOP_N_TEST: 1000
+    POST_NMS_TOP_N_TEST: 1000
+    FPN_POST_NMS_TOP_N_TEST: 1000
+  ROI_HEADS:
+    USE_FPN: True
+    BATCH_SIZE_PER_IMAGE: 256
+  ROI_BOX_HEAD:
+    POOLER_RESOLUTION: 7
+    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
+    POOLER_SAMPLING_RATIO: 2
+    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
+    PREDICTOR: "FPNPredictor"
+DATASETS:
+  TRAIN: ("coco_2017_train",)
+  TEST: ("coco_2017_val",)
+INPUT:
+  MIN_SIZE_TRAIN: (800,)
+  MAX_SIZE_TRAIN: 1333
+  MIN_SIZE_TEST: 800
+  MAX_SIZE_TEST: 1333
+DATALOADER:
+  SIZE_DIVISIBILITY: 32
+SOLVER:
+  # Assume 4 gpus
+  BASE_LR: 0.01
+  WEIGHT_DECAY: 0.0001
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+  IMS_PER_BATCH: 16
+RETINANET:
+  RETINANET_ON: True
+  SCALES_PER_OCTAVE: 3
+  STRADDLE_THRESH: -1
+  LOW_QUALITY_THRESHOLD: 0.4
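
`FG_IOU_THRESHOLD: 0.5` and `BG_IOU_THRESHOLD: 0.4` appear in every config above and most likely control how anchors are labelled by their best IoU with a ground-truth box: foreground at or above 0.5, background below 0.4, ignored in between. A sketch assuming that standard matching rule, with plain `xyxy` boxes, no pixel offset, and hypothetical function names. The role of `LOW_QUALITY_THRESHOLD` is not shown in this part of the commit and is left out.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, fg_thresh=0.5, bg_thresh=0.4):
    """Return 1 (foreground), 0 (background) or -1 (ignored)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= fg_thresh:
        return 1
    if best < bg_thresh:
        return 0
    return -1  # between the two thresholds: excluded from the loss


anchor = [0, 0, 10, 10]
labels = [label_anchor(anchor, [gt]) for gt in ([3, 0, 13, 10],
                                                [4, 0, 14, 10],
                                                [5, 0, 15, 10])]
```

The three ground-truth boxes shift progressively away from the anchor, so the labels move from foreground through ignored to background.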
