Open Access

Multi-View Stereo Matching Method Based on Recursive Neural Networks


DOI: 10.23977/jipta.2023.060113


Jiajia Liu 1, Guoliang Jiang 1


1 School of Avionics and Electrical Engineering, Civil Aviation Flight University of China, Guanghan, Sichuan, 618307, China

Corresponding Author: Guoliang Jiang


This article proposes a recursive hierarchical network reconstruction method with pixel-wise attention cost aggregation to address poor reconstruction in textureless regions and at scene edges in multi-view stereo (MVS) matching. First, multi-scale features are extracted from the input images by downsampling and warped into a cost volume using differentiable homography. Next, a pixel-wise attention aggregation module is added to the cost-volume aggregation stage to reweight individual pixels and produce a new cost volume. A network with recursive layers then regularizes the cost volume in place of the conventional 3D CNN and generates an initial depth map. Finally, the depth maps are filtered, refined, and fused into a dense 3D point cloud. Experimental results show that the proposed network achieves completeness, accuracy, and overall scores of 0.377, 0.363, and 0.370, respectively, improving on the compared network models, and produces more complete point clouds in weakly textured regions and at scene edges.
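The cost-volume construction step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: it follows the plane-sweep formulation popularised by MVSNet, in which each source feature map is warped onto fronto-parallel planes at hypothesised depths and the views are combined by their per-element variance. The helper names (`warp_coords`, `bilinear_sample`, `variance_cost_volume`) and the convention that `(rot, trans)` maps reference-camera coordinates into a source camera are assumptions made for this example. Back-projecting each reference pixel to depth `d` and reprojecting it into the source view is equivalent to applying the plane-induced homography for that depth, and bilinear sampling keeps the warp differentiable with respect to the features when written in a deep-learning framework.

```python
import numpy as np


def warp_coords(k_ref, k_src, rot, trans, depth, height, width):
    """Source-view pixel coordinates of every reference pixel on the plane Z = depth.

    Assumed convention: (rot, trans) map reference-camera coordinates to
    source-camera coordinates.
    """
    v, u = np.mgrid[0:height, 0:width].astype(float)
    pix = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # homogeneous, 3 x N
    rays = np.linalg.inv(k_ref) @ pix                        # back-projected rays with Z = 1
    pts = rot @ (rays * depth) + np.reshape(trans, (3, 1))  # 3D points in the source frame
    proj = k_src @ pts
    uv = proj[:2] / proj[2:3]                                # perspective divide
    return uv.reshape(2, height, width)


def bilinear_sample(feat, uv):
    """Bilinearly sample feat (C, H, W) at float coordinates uv (2, H, W).

    Neighbours that fall outside the image contribute zero.
    """
    _, height, width = feat.shape
    u, v = uv
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    out = np.zeros((feat.shape[0],) + u.shape)
    for du in (0, 1):
        for dv in (0, 1):
            ui, vi = u0 + du, v0 + dv
            weight = (1 - np.abs(u - ui)) * (1 - np.abs(v - vi))
            inside = (ui >= 0) & (ui < width) & (vi >= 0) & (vi < height)
            sample = feat[:, np.clip(vi, 0, height - 1), np.clip(ui, 0, width - 1)]
            out += sample * (weight * inside)
    return out


def variance_cost_volume(ref_feat, src_feats, src_cams, k_ref, depths):
    """Variance-based cost volume of shape (D, C, H, W); src_cams holds (K, R, t) per view."""
    channels, height, width = ref_feat.shape
    volume = np.empty((len(depths), channels, height, width))
    for i, depth in enumerate(depths):
        views = [ref_feat] + [
            bilinear_sample(feat, warp_coords(k_ref, k, rot, trans, depth, height, width))
            for feat, (k, rot, trans) in zip(src_feats, src_cams)
        ]
        volume[i] = np.var(np.stack(views), axis=0)  # per-element variance across views
    return volume


# Usage: identical cameras, then a pure horizontal translation of 0.1 units.
K = np.array([[50.0, 0.0, 8.0], [0.0, 50.0, 8.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
ref = rng.random((4, 16, 16))
same_view = variance_cost_volume(ref, [ref.copy()], [(K, np.eye(3), np.zeros(3))], K, [1.0, 2.0])
shifted_uv = warp_coords(K, K, np.eye(3), np.array([0.1, 0.0, 0.0]), 1.0, 16, 16)
```

With identical cameras the warp is the identity and the variance is zero at every depth; with a pure horizontal translation `t_x`, the warp shifts each pixel by `f * t_x / d` (here 50 × 0.1 / 1 = 5 pixels), which is the familiar stereo disparity.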
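The attention-weighting, recurrent regularization, and depth-estimation steps can be sketched in the same spirit. The abstract does not specify the attention module or the recurrent cell, so both are assumptions here: the attention is a generic pixel attention (a 1×1 convolution followed by a sigmoid that scales every cost entry), and the regularizer is a GRU with 1×1 kernels that scans the cost volume slice by slice along the depth axis, similar in spirit to R-MVSNet. The regularized scores are turned into a per-pixel probability over depth with a softmax, and the depth is taken as the expectation over the hypotheses (soft argmin). All weights are random placeholders, GRU biases are omitted for brevity, and names such as `pixel_attention` and `gru_regularize` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pixel_attention(cost, w, b):
    """Hypothetical pixel attention: a 1x1 conv (C -> C) plus a sigmoid yields one
    weight per (depth, channel, pixel) entry; cost has shape (D, C, H, W)."""
    logits = np.einsum("oc,dchw->dohw", w, cost) + b[None, :, None, None]
    return cost * sigmoid(logits)


def conv1x1(m, x):
    """Apply a 1x1 convolution with weight matrix m (out, in) to x (in, H, W)."""
    return np.einsum("oc,chw->ohw", m, x)


def gru_regularize(cost, params):
    """Scan depth slices with a 1x1-kernel GRU (no biases); returns one
    regularized score per depth and pixel, shape (D, H, W)."""
    wz, uz, wr, ur, wh, uh, w_out = params
    depth_count, _, height, width = cost.shape
    h = np.zeros((uz.shape[0], height, width))  # hidden state carried along depth
    scores = np.empty((depth_count, height, width))
    for d in range(depth_count):
        x = cost[d]
        z = sigmoid(conv1x1(wz, x) + conv1x1(uz, h))  # update gate
        r = sigmoid(conv1x1(wr, x) + conv1x1(ur, h))  # reset gate
        h_new = np.tanh(conv1x1(wh, x) + conv1x1(uh, r * h))
        h = (1.0 - z) * h + z * h_new
        scores[d] = np.einsum("c,chw->hw", w_out, h)  # project the state to a scalar
    return scores


def regress_depth(scores, depths):
    """Softmax over depth (scores treated as costs) and expected depth (soft argmin)."""
    logits = -scores
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(depths, prob, axes=(0, 0)), prob


# Usage with random placeholder weights.
rng = np.random.default_rng(0)
D, C, H, W, HID = 8, 4, 5, 6, 3
cost = rng.random((D, C, H, W))
weighted = pixel_attention(cost, rng.normal(size=(C, C)), np.zeros(C))
shapes = [(HID, C), (HID, HID), (HID, C), (HID, HID), (HID, C), (HID, HID), (HID,)]
params = tuple(rng.normal(size=s) for s in shapes)
depths = np.linspace(2.0, 3.0, D)
depth_map, prob = regress_depth(gru_regularize(weighted, params), depths)
```

Scanning the depth axis sequentially keeps only one hidden-state slice in memory at a time, which is the usual motivation for replacing a 3D CNN regularizer with a recurrent one at high resolution.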
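The reported figures can be read more easily with the metric definitions in hand. The abstract does not name the benchmark or protocol, so the sketch below assumes the DTU-style definitions commonly used in MVS papers: accuracy is the mean distance from each reconstructed point to its nearest ground-truth point, completeness is the mean distance from each ground-truth point to its nearest reconstructed point, and the overall score is their mean. On DTU these are distances in millimetres and lower is better. Under these definitions the reported numbers are self-consistent: (0.377 + 0.363) / 2 = 0.370. The brute-force nearest-neighbour search and the helper names are illustrative only, and the sketch omits the masking and outlier thresholds applied by official evaluation scripts.

```python
import math


def mean_nearest_distance(src, dst):
    """Mean Euclidean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def mvs_scores(recon, gt):
    """Accuracy, completeness and overall score (lower is better for all three)."""
    accuracy = mean_nearest_distance(recon, gt)      # reconstruction -> ground truth
    completeness = mean_nearest_distance(gt, recon)  # ground truth -> reconstruction
    return accuracy, completeness, (accuracy + completeness) / 2


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]  # accurate, but misses the point at x = 2
acc, comp, overall = mvs_scores(recon, gt)
```

The toy example shows why the two measures diverge: every reconstructed point lies 0.1 from the ground truth, so accuracy is good, but the missing region inflates completeness to about 0.40.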


Keywords: Cost Volume Aggregation, Recursive Hierarchy, Multi-View Stereo, Three-Dimensional Reconstruction


Jiajia Liu, Guoliang Jiang, Multi-View Stereo Matching Method Based on Recursive Neural Networks. Journal of Image Processing Theory and Applications (2023) Vol. 6: 111-121. DOI:





All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.