We tackle the problem of partially supervised instance segmentation, in which box annotations are available for all classes but mask annotations only for a subset of classes. We show that the architecture of the mask head plays a surprisingly important role in generalizing to masks of unseen classes. The figure below shows improved mask predictions for unseen classes as we use better mask-head architectures.

Just by using better mask-head architectures (no extra losses or modules), we achieve state-of-the-art performance on the partially supervised instance segmentation task. We call our model DeepMAC, short for Deep Mask heads Above CenterNet.
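To make "a better (deeper) mask head" concrete, below is a minimal TensorFlow sketch of a per-box mask head: features are cropped per box from a shared feature map and passed through a deep stack of convolutions that predicts one mask logit per pixel. This is a simplified illustration in the spirit of the paper, not its exact architecture (the strongest heads in the paper are hourglass networks); `build_mask_head`, `predict_masks`, and all layer counts and sizes are placeholder choices of ours.

```python
import tensorflow as tf


def build_mask_head(num_layers=20, channels=64, mask_size=32):
  """A deep fully convolutional mask head (illustrative, not the paper's
  exact hourglass architecture). Depth is the point: the paper reports that
  deeper mask heads generalize better to masks of unseen classes."""
  inputs = tf.keras.Input(shape=(mask_size, mask_size, channels))
  x = inputs
  for _ in range(num_layers):
    x = tf.keras.layers.Conv2D(channels, 3, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
  logits = tf.keras.layers.Conv2D(1, 1)(x)  # one mask logit per pixel
  return tf.keras.Model(inputs, logits)


def predict_masks(feature_map, boxes, mask_head, crop_size=32):
  """Crops per-box features and runs the mask head on every crop.

  feature_map: [1, H, W, channels] backbone features.
  boxes: [num_boxes, 4] normalized [ymin, xmin, ymax, xmax] boxes.
  Returns [num_boxes, crop_size, crop_size, 1] mask logits.
  """
  box_indices = tf.zeros([tf.shape(boxes)[0]], dtype=tf.int32)
  crops = tf.image.crop_and_resize(
      feature_map, boxes, box_indices, crop_size=(crop_size, crop_size))
  return mask_head(crops)
```

A head like this would typically be trained with a per-pixel sigmoid cross-entropy loss, applied only to instances whose class has mask supervision; boxes of all classes still drive the detection losses.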

## Code

## Demos

## Main Results

In this table, X → Y means that we train with masks from classes in split X and evaluate with masks from classes in split Y. All experiments are done on the COCO dataset; the VOC split contains the 20 classes shared with PASCAL VOC, and the non-VOC split contains the remaining 60 classes. Bounding boxes are provided for all classes in both settings.

| Model | Train → Eval | Mask mAP | Config |
| --- | --- | --- | --- |
| Deep-MAC (CenterNet based) | VOC → Non-VOC | 35.5 | Link |
| Deep-MAC (CenterNet based) | Non-VOC → VOC | 39.1 | Link |
| Deep-MARC (Mask R-CNN based) | VOC → Non-VOC | 38.7 | Link |
| Deep-MARC (Mask R-CNN based) | Non-VOC → VOC | 41.0 | Link |
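
For reference, the VOC split is the set of 20 COCO categories that also appear in PASCAL VOC, and the non-VOC split is the remaining 60 categories. The sketch below shows one way to simulate the partially supervised setup from COCO-style annotations: boxes are kept for every instance, while masks are kept only for the supervised split. The function name and the assumed annotation fields (`category_name`, `segmentation`) are our own illustration, not part of the released code.

```python
# The 20 COCO category names that overlap with PASCAL VOC; the other
# 60 COCO categories form the non-VOC split.
VOC_CLASSES = {
    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
    'boat', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'bottle',
    'chair', 'couch', 'potted plant', 'dining table', 'tv',
}


def filter_mask_supervision(annotations, supervised_classes=VOC_CLASSES):
  """Keeps box supervision for every instance but drops mask supervision
  for instances outside `supervised_classes` (e.g. VOC -> non-VOC training).

  `annotations` is assumed to be a list of COCO-style dicts in which the
  category id has already been resolved to a `category_name` string.
  """
  filtered = []
  for annotation in annotations:
    annotation = dict(annotation)  # avoid mutating the caller's data
    if annotation['category_name'] not in supervised_classes:
      annotation.pop('segmentation', None)  # box stays, mask is removed
    filtered.append(annotation)
  return filtered
```

Evaluation then measures mask mAP only on the held-out split, which is what the Train → Eval column above reports.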

## Checkpoints

Both models take an image plus boxes as input and produce per-box instance masks as output.
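
As a sketch of how such a checkpoint might be driven, the snippet below assumes a model exported as a TensorFlow SavedModel whose callable takes a batched image plus normalized boxes and returns per-box mask logits. The path and the call signature are placeholders of ours; consult the linked configs and demos for the exact interface of the released checkpoints.

```python
import tensorflow as tf

# Placeholder path; the released checkpoints define their own layout.
model = tf.saved_model.load('/path/to/deepmac_saved_model')

image = tf.zeros([1, 512, 512, 3], dtype=tf.float32)  # batched input image
# Normalized [ymin, xmin, ymax, xmax] boxes for the instances to segment.
boxes = tf.constant([[0.10, 0.15, 0.60, 0.55],
                     [0.30, 0.40, 0.90, 0.95]], dtype=tf.float32)

# Assumed signature: (image, boxes) -> per-box mask logits.
mask_logits = model(image, boxes)
binary_masks = tf.sigmoid(mask_logits) > 0.5  # one mask per input box
```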

## Citation

```bibtex
@misc{birodkar2021surprising,
  title={The surprising impact of mask-head architecture on novel class segmentation},
  author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang},
  year={2021},
  eprint={2104.00613},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```