Study on Vegetation Extraction from Riparian Zone Images Based on Cswin Transformer

DOI: 10.23977/acss.2024.080209

Author(s)

Yuanjie Ma 1, Yaping Zhang 1

Affiliation(s)

1 School of Information Science and Technology, Yunnan Normal University, Kunming, Yunnan, 650500, China

Corresponding Author

Yaping Zhang

ABSTRACT

In ecological conservation, accurately extracting vegetation areas from UAV images is a critical task. This study aims to identify vegetation accurately in high-resolution UAV images of riparian zones. To cope with complicating factors such as illumination changes and water ripples, we combine Convolutional Neural Networks with Vision Transformers and propose a semantic segmentation network built on an encoder-decoder structure. We introduce the Explicit Visual Center (EVC) mechanism and the CSWin Transformer structure to improve image feature capture, particularly to reduce the misclassification caused by the visual similarity between vegetation and water ripples. Experimental results show that the proposed network outperforms classical models such as U-Net, PSP-Net, and DeepLabv3+: its mIoU is 1.3 percentage points higher than that of U-Net, the best of the three baseline networks. The paper thus offers an effective scheme for vegetation extraction from UAV images of riparian zones.
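
For readers unfamiliar with the mIoU metric used in the comparison above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for a segmentation mask. It is not the authors' evaluation code: the function name `mean_iou`, the two-class label convention (0 = background, 1 = vegetation), and the toy masks are assumptions made for illustration only.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in either mask (illustrative sketch)."""
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both masks
    return float((intersection[valid] / union[valid]).mean())


# Toy 2x4 masks with the assumed labels: 0 = background, 1 = vegetation.
target = [[0, 0, 1, 1],
          [0, 1, 1, 1]]
pred = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.3f}")
```

In this toy case the background IoU is 3/4 and the vegetation IoU is 4/5, so the mean is 0.775. A difference of 1.3 percentage points between two networks corresponds to a gap of 0.013 on this scale.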

KEYWORDS

Vegetation extraction; deep learning; semantic segmentation; Transformer

CITE THIS PAPER

Yuanjie Ma, Yaping Zhang, Study on Vegetation Extraction from Riparian Zone Images Based on Cswin Transformer. Advances in Computer, Signals and Systems (2024) Vol. 8: 57-62. DOI: http://dx.doi.org/10.23977/acss.2024.080209.


All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.