Cityscapes is a benchmark suite and large-scale dataset for training and testing approaches to pixel-level and instance-level semantic labeling and understanding of urban street scenes. It comprises a large, diverse set of stereo video sequences recorded from a moving vehicle over several months, covering spring, summer, and fall in 50 cities, primarily in Germany but also in neighboring countries. The data recording and annotation methodology was designed to capture the high variability of outdoor street scenes.

Of the images in these sequences, 5,000 have high-quality pixel-level annotations, and 20,000 additional images have coarse annotations to enable methods that leverage large volumes of weakly labeled data.

The authors deliberately did not record in adverse weather conditions, such as heavy rain or snow, as they believe such conditions require specialized techniques and datasets.

Related publications:

  • Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele, "The Cityscapes Dataset for Semantic Urban Scene Understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.