Tibetan Speech Recognition Based on Wav2vec Feature
DOI: 10.23977/acss.2023.070302
Author(s)
Zixi Yan 1, Guanyu Li 1, Senyan Li 1
Affiliation(s)
1 Northwest Minzu University, Lanzhou, Gansu, 730000, China
Corresponding Author
Guanyu Li
ABSTRACT
Speech recognition for low-resource languages such as Tibetan has not matched the results achieved for high-resource languages. This paper introduces the wav2vec2 model into a traditional speech recognition pipeline as a feature extractor to improve Tibetan speech recognition. A Tibetan speech recognition system was built with the Kaldi toolkit, with wav2vec2 features replacing the traditional MFCC features. The improvement that this front-end model brings over the traditional speech recognition model for minority languages such as Tibetan was comparatively analyzed, and the effectiveness of the wav2vec2 model on languages not seen in pre-training was verified. Finally, compared with the baseline system, the proposed model improved phoneme error rate (PER) and word error rate (WER) by 2.92% and 5.24%, respectively.
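The PER and WER figures quoted above are edit-distance-based error rates. As a minimal sketch of how such a rate is commonly computed (not the authors' scoring script, which is not shown here), the following divides the Levenshtein distance between a reference and a hypothesis token sequence by the reference length. The function names and the placeholder tokens are assumptions made for illustration only.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions each cost 1)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length.
    With word tokens this is WER; with phoneme tokens it is PER."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical placeholder tokens, not taken from the paper's data.
reference = ["a", "b", "c", "d"]
hypothesis = ["a", "x", "c"]
rate = error_rate(reference, hypothesis)  # one substitution + one deletion
```

The same function serves both metrics; only the tokenization differs (words for WER, phonemes for PER). Because insertions are counted against the reference length, the rate can exceed 1.0 for very poor hypotheses.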
KEYWORDS
Tibetan, wav2vec2, speech recognition, kaldi, low resource
CITE THIS PAPER
Zixi Yan, Guanyu Li, Senyan Li. Tibetan Speech Recognition Based on Wav2vec Feature. Advances in Computer, Signals and Systems (2023) Vol. 7: 9-12. DOI: http://dx.doi.org/10.23977/acss.2023.070302.