SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds

Published in: International Journal of Computer Vision

Abstract

With the recent availability and affordability of commercial depth sensors and 3D scanners, an increasing number of 3D (i.e., RGB-D, point cloud) datasets have been made publicly available to facilitate research in 3D computer vision. However, existing datasets either cover relatively small areas or have limited semantic annotations, and fine-grained understanding of urban-scale 3D scenes is still in its infancy. In this paper, we introduce SensatUrban, an urban-scale UAV photogrammetry point cloud dataset consisting of nearly three billion points collected from three UK cities and covering 7.6 km². Each point in the dataset has been labelled with fine-grained semantic annotations, making the dataset three times larger than the previous largest photogrammetric point cloud dataset. In addition to commonly encountered categories such as road and vegetation, urban-level categories including rail, bridge, and river are also included. Based on this dataset, we further build a benchmark to evaluate the performance of state-of-the-art segmentation algorithms. In particular, we provide a comprehensive analysis and identify several key challenges limiting urban-scale point cloud understanding. The dataset is available at http://point-cloud-analysis.cs.ox.ac.uk/.
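
As a concrete illustration of how such a segmentation benchmark is typically scored, the following is a minimal sketch (not the authors' released evaluation code) that computes per-class IoU, mean IoU, and overall accuracy from per-point ground-truth and predicted labels. The 13-class count and the flat label-array layout are assumptions based on the dataset description; a real evaluation would read the dataset's point-cloud tiles and a model's per-point predictions.

```python
# Minimal sketch of benchmark scoring for a SensatUrban-style dataset.
# Assumptions (not from the paper's released toolkit): 13 semantic classes,
# and ground-truth/predicted labels given as flat per-point integer arrays.
import numpy as np

NUM_CLASSES = 13  # assumed class count (e.g. ground, vegetation, building,
                  # rail, bridge, water, ...)

def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> np.ndarray:
    """Accumulate a num_classes x num_classes confusion matrix (rows = GT)."""
    mask = (gt >= 0) & (gt < num_classes) & (pred >= 0) & (pred < num_classes)
    return np.bincount(
        num_classes * gt[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)

def iou_per_class(conf: np.ndarray) -> np.ndarray:
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN where a class never occurs."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# Usage with random stand-in labels; a real run would substitute the
# dataset's per-point ground truth and a model's per-point predictions.
rng = np.random.default_rng(0)
gt = rng.integers(0, NUM_CLASSES, size=100_000)
pred = rng.integers(0, NUM_CLASSES, size=100_000)
conf = confusion_matrix(gt, pred, NUM_CLASSES)
ious = iou_per_class(conf)
print(f"mIoU: {np.nanmean(ious):.3f}  OA: {np.diag(conf).sum() / conf.sum():.3f}")
```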


Notes

  1. https://self-driving.lyft.com/level5/data/

  2. https://www.sensefly.com/drone/ebee-x-fixed-wing-drone/

  3. https://www.pix4d.com/

  4. http://point-cloud-analysis.cs.ox.ac.uk

  5. https://competitions.codalab.org/competitions/31519


Acknowledgements

This work was supported by a China Scholarship Council (CSC) scholarship, a Huawei UK AI Fellowship, and the UKRI Natural Environment Research Council (NERC) Flood-PREPARED project (NE/P017134/1). Bo Yang was partially supported by HK PolyU (P0034792) and the Shenzhen Science and Technology Innovation Commission (JCYJ20210324120603011). The authors highly appreciate the Data Study Group (DSG) organised by the Alan Turing Institute and the GPU resources generously provided by the LAVA group led by Professor Yulan Guo at Sun Yat-sen University, China. The authors would also like to thank Hanchen Wang from the University of Cambridge for providing the pre-training results.

Author information

Corresponding author: Bo Yang.

Additional information

Communicated by A. Hilton.



About this article


Cite this article

Hu, Q., Yang, B., Khalid, S. et al. SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds. Int J Comput Vis 130, 316–343 (2022). https://doi.org/10.1007/s11263-021-01554-9
