DAVIS: Densely Annotated VIdeo Segmentation 2017

DAVIS (Densely Annotated VIdeo Segmentation) 2017 is a public dataset designed specifically for the task of video object segmentation. It is built on top of the DAVIS 2016 dataset.

The DAVIS 2017 dataset consists of 150 sequences, totaling 10,459 annotated frames and 376 objects. The main new challenge introduced in DAVIS 2017 is the presence of multiple objects in each scene.
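As a rough sense of scale, the per-sequence averages follow from dividing the totals above by the number of sequences. This is only arithmetic on the three quoted totals; it says nothing about how frames or objects are actually distributed across individual sequences.

```python
# Totals for DAVIS 2017 as quoted in the text above.
NUM_SEQUENCES = 150
NUM_FRAMES = 10459
NUM_OBJECTS = 376


def per_sequence_averages(sequences: int, frames: int, objects: int) -> tuple[float, float]:
    """Return (mean annotated frames per sequence, mean objects per sequence)."""
    return frames / sequences, objects / sequences


frames_avg, objects_avg = per_sequence_averages(NUM_SEQUENCES, NUM_FRAMES, NUM_OBJECTS)
print(f"{frames_avg:.1f} frames/sequence, {objects_avg:.1f} objects/sequence")
# prints "69.7 frames/sequence, 2.5 objects/sequence"
```

On average, then, a sequence has roughly 70 annotated frames and two to three annotated objects.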

In this dataset, in addition to segmenting the main moving objects in the scene, the authors divide them by semantics, even when they share the same motion. Specifically, people and animals are segmented as a single instance together with their clothing (including helmets, caps, etc.), whereas any object that is carried and easily separated (such as bags, skis, skateboards, and poles) is annotated as a separate instance.
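Since one frame can contain several separately annotated instances, a convenient representation is a single label image in which each pixel stores an instance index. The sketch below assumes that convention (0 for background, 1..K for objects); this storage format is an assumption for illustration, not something stated above. It splits such a label map into one binary mask per instance using NumPy, with a toy example where a person and a carried object receive different ids.

```python
import numpy as np


def split_instances(label_map: np.ndarray) -> dict[int, np.ndarray]:
    """Split an integer label map into per-instance boolean masks.

    Assumes (hypothetically) that 0 marks background and each positive
    value identifies one annotated instance, e.g. a skier with clothing
    as one id and the skis as another.
    """
    ids = np.unique(label_map)
    return {int(i): label_map == i for i in ids if i != 0}


# Toy 4x6 label map: instance 1 = person, instance 2 = carried object.
toy = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
], dtype=np.uint8)

masks = split_instances(toy)
for obj_id, mask in masks.items():
    print(obj_id, int(mask.sum()))  # instance id and its pixel count
```

Keeping all instances in one label map guarantees that no pixel belongs to two objects, which matches the idea of a semantic division of the scene into non-overlapping instances.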

In terms of resolution, the majority of the new sequences are at 4K resolution (3840×2160 pixels), but some are provided at 1440p, 1080p, or 720p, their raw resolution.

The DAVIS 2017 Challenge comprised the DAVIS 2017 dataset, an evaluation methodology, and a public competition with a dedicated workshop co-located with CVPR 2017. The challenge was conducted on images downsampled to 480p, because this was the de facto standard in the DAVIS 2016 Challenge and because it eased processing given the large number of frames.
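The downsampling to 480p can be sketched as an aspect-ratio-preserving resize to a fixed height of 480 pixels. The helper below only computes target dimensions. Two details are assumptions for illustration: rounding the width to an even integer (a common requirement of video encoders), and treating 1440p as 2560×1440. Neither is necessarily what the DAVIS authors did. The actual resampling would be done with an image library of your choice.

```python
def resize_to_height(width: int, height: int, target_height: int = 480) -> tuple[int, int]:
    """Return (new_width, target_height), preserving the aspect ratio.

    The width is rounded to the nearest even integer (an assumption made
    for illustration; many video encoders require even dimensions).
    """
    scaled = width * target_height / height
    new_width = 2 * round(scaled / 2)
    return new_width, target_height


# Raw resolutions mentioned in the text (1440p assumed to be 2560x1440).
for w, h in [(3840, 2160), (2560, 1440), (1920, 1080), (1280, 720)]:
    print(f"{w}x{h} -> {resize_to_height(w, h)}")
```

All four inputs are 16:9, so they map to the same 854×480 target under these assumptions. This is one reason a common working resolution simplifies processing across sequences captured at different sizes.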

Related publications:

J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbelaez, A. Sorkine-Hornung, and L. Van Gool, “The 2017 DAVIS Challenge on Video Object Segmentation”, IEEE Conference on Computer Vision and Pattern Recognition, 2017.

Related datasets:

DAVIS 2016 dataset