Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference

Nauata, Nelson; Furukawa, Yasutaka

doi:10.1007/978-3-030-58598-3_42

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12353)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This paper tackles a 2D architecture vectorization problem, whose task is to infer an outdoor building architecture as a 2D planar graph from a single RGB image. We provide a new benchmark with ground-truth annotations for 2,001 complex buildings across the cities of Atlanta, Paris, and Las Vegas. We also propose a novel algorithm that uses 1) convolutional neural networks (CNNs) to detect geometric primitives and infer their relationships, and 2) integer programming (IP) to assemble this information into a 2D planar graph. Although trivial for human vision, inferring a graph structure with an arbitrary topology remains an open problem in computer vision. Qualitative and quantitative evaluations demonstrate that our algorithm significantly improves on the current state of the art, a step towards an intelligent system at the level of human perception. We will share code and data.

Notes

1. In their problem, the regions are rooms, which can be detected easily; in ours, the regions are roof segments, which are much harder to distinguish.
2. In short, a corner is declared correct if there exists a ground-truth corner within a certain distance. An edge is declared correct if both of its corners are declared correct. A region is declared correct if there exists a ground-truth region with an IoU greater than 0.7. Our only change is to tighten the distance tolerance for corner detection from 10 pixels to 8 pixels.
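The three correctness rules in note 2 can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes corners are pixel coordinates, regions are rasterized as sets of pixels, and "within a certain distance" means at most `tol` pixels. It applies each rule literally as stated in the note, so it does not model any further conditions the full protocol may impose (for example one-to-one matching, or requiring that matched corners are joined by a ground-truth edge). All function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]      # (x, y) corner position in pixels
Edge = Tuple[int, int]           # indices of two corners in the predicted corner list
Region = Set[Tuple[int, int]]    # (row, col) pixels covered by one region


def corner_correct(pred: List[Point], gt: List[Point], tol: float = 8.0) -> List[bool]:
    # A predicted corner counts as correct if any ground-truth corner lies
    # within `tol` pixels (8 px by default, the tightened tolerance in the note).
    return [any(math.dist(p, g) <= tol for g in gt) for p in pred]


def edge_correct(edges: List[Edge], corner_ok: List[bool]) -> List[bool]:
    # Literal reading of the note: an edge is correct if both endpoints are correct.
    return [corner_ok[i] and corner_ok[j] for i, j in edges]


def iou(a: Region, b: Region) -> float:
    # Intersection over union of two pixel sets.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def region_correct(pred: List[Region], gt: List[Region], thresh: float = 0.7) -> List[bool]:
    # A predicted region is correct if some ground-truth region overlaps it
    # with IoU strictly greater than `thresh`.
    return [any(iou(r, g) > thresh for g in gt) for r in pred]


def precision(flags: List[bool]) -> float:
    # Fraction of predicted primitives declared correct.
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    pred_corners = [(10.0, 10.0), (50.0, 12.0), (30.0, 80.0)]
    gt_corners = [(12.0, 14.0), (50.0, 10.0), (100.0, 100.0)]
    ok = corner_correct(pred_corners, gt_corners)
    print(ok)                                              # [True, True, False]
    print(precision(edge_correct([(0, 1), (1, 2)], ok)))   # 0.5

    # A corner 9 px away passes the old 10 px tolerance but not the new 8 px one.
    print(corner_correct([(0.0, 9.0)], [(0.0, 0.0)], tol=10.0))  # [True]
    print(corner_correct([(0.0, 9.0)], [(0.0, 0.0)]))            # [False]

    square = {(r, c) for r in range(10) for c in range(10)}
    shifted = {(r, c) for r in range(10) for c in range(1, 11)}   # IoU 90/110
    far = {(r, c) for r in range(10) for c in range(5, 15)}       # IoU 50/150
    print(region_correct([shifted, far], [square]))        # [True, False]
```

Swapping the prediction and ground-truth arguments of `corner_correct` or `region_correct` gives the corresponding recall-style check.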

References

SpaceNet on Amazon Web Services (AWS). “Datasets.” The SpaceNet Catalog. Last modified April 30, 2018. https://spacenetchallenge.github.io/datasets/datasetHomePage.html. Accessed 19 Oct 2018
Acuna, D., Ling, H., Kar, A., Fidler, S.: Efficient interactive annotation of segmentation datasets with polygon-rnn++. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 859–868 (2018)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 6, 679–698 (1986)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: Openpose: realtime multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008 (2018)
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299 (2017)
Chao, Y.-W., Choi, W., Pantofaru, C., Savarese, S.: Layout estimation of highly cluttered indoor scenes using geometric and semantic cues. In: Petrosino, A. (ed.) ICIAP 2013. LNCS, vol. 8157, pp. 489–499. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41184-7_50
Chen, J., Liu, C., Wu, J., Furukawa, Y.: Floor-sp: inverse cad for floorplans by sequential room-wise shortest path. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2661–2670 (2019)
Cheng, D., Liao, R., Fidler, S., Urtasun, R.: Darnet: deep active ray network for building segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7431–7439 (2019)
Etten, A.V., Lindenbaum, D., Bacastow, T.M.: Spacenet: A remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232 (2018)
Flint, A., Mei, C., Murray, D., Reid, I.: A dynamic programming approach to reconstructing building interiors. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6315, pp. 394–407. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15555-0_29
Flint, A., Murray, D., Reid, I.: Manhattan scene understanding using monocular, stereo, and 3D features. In: 2011 International Conference on Computer Vision, pp. 2228–2235. IEEE (2011)
Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Manhattan-world stereo. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1422–1429. IEEE (2009)
Hamaguchi, R., Hikosaka, S.: Building detection from satellite imagery using ensemble of size-specific detectors. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 223–2234. IEEE (2018)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 2980–2988. IEEE (2017)
Hedau, V., Hoiem, D., Forsyth, D.: Recovering the spatial layout of cluttered rooms. In: Computer vision, 2009 IEEE 12th international conference on, pp. 1849–1856. IEEE (2009)
Huang, K., Wang, Y., Zhou, Z., Ding, T., Gao, S., Ma, Y.: Learning to parse wireframes in images of man-made environments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 626–635 (2018)
Lee, C.Y., Badrinarayanan, V., Malisiewicz, T., Rabinovich, A.: Roomnet: end-to-end room layout estimation. arXiv preprint arXiv:1703.06241 (2017)
Lin, H., et al.: Semantic decomposition and reconstruction of residential scenes from lidar data. ACM Trans. Graph. (TOG) 32(4), 66 (2013)
Liu, C., Wu, J., Kohli, P., Furukawa, Y.: Raster-to-vector: revisiting floorplan transformation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2195–2203 (2017)
Liu, C., Wu, J., Furukawa, Y.: Floornet: a unified framework for floorplan reconstruction from 3D scans. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 201–217 (2018)
Liu, H., Zhang, J., Zhu, J., Hoi, S.: Deepfacade: a deep learning approach to facade parsing. pp. 2301–2307 (2017) https://doi.org/10.24963/ijcai.2017/320
Martinović, A., Mathias, M., Weissenberg, J., Van Gool, L.: A three-layered approach to facade parsing. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7578, pp. 416–429. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33786-4_31
Nishida, G., Bousseau, A., Aliaga, D.G.: Procedural modeling of a building from a single image. Comput. Graph. Forum 37, 415–429 (2018)
Parish, Y.I., Müller, P.: Procedural modeling of cities. In: Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 301–308. ACM (2001)
Schwing, A.G., Hazan, T., Pollefeys, M., Urtasun, R.: Efficient structured prediction for 3D indoor scene understanding. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pp. 2815–2822. IEEE (2012)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science & Business Media, Springer, London (2010). https://doi.org/10.1007/978-1-84882-935-0
Von Gioi, R.G., Jakubowicz, J., Morel, J.M., Randall, G.: Lsd: a line segment detector. Image Process. Line 2, 35–55 (2012)
Yu, F., Koltun, V., Funkhouser, T.A.: Dilated residual networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 636–644 (2017)
Zeng, H., Wu, J., Furukawa, Y.: Neural procedural reconstruction for residential buildings. In: The European Conference on Computer Vision (ECCV), pp. 737–753 (2018)
Zhang, Z., et al.: Ppgnet: learning point-pair graph for line segment detection. arXiv preprint arXiv:1905.03415 (2019)
Zhou, Y., Qi, H., Ma, Y.: End-to-end wireframe parsing. arXiv preprint arXiv:1905.03246 (2019)

Acknowledgement

This research is partially supported by NSERC Discovery Grants, NSERC Discovery Grants Accelerator Supplements, and DND/NSERC Discovery Grant Supplement. This research is also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00288. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.

Author information

Authors and Affiliations

Simon Fraser University, Burnaby, Canada
Nelson Nauata & Yasutaka Furukawa

Corresponding authors

Correspondence to Nelson Nauata or Yasutaka Furukawa.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Supplementary material 1 (PDF, 17335 KB)

About this paper

Cite this paper

Nauata, N., Furukawa, Y. (2020). Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_42

DOI: https://doi.org/10.1007/978-3-030-58598-3_42
Published: 07 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58597-6
Online ISBN: 978-3-030-58598-3
eBook Packages: Computer Science, Computer Science (R0)
