
Investigation on Chinese Function Words and Construction of Generalized Function Word Knowledge Base in Natural Language Processing


DOI: 10.23977/langl.2023.061705

Author(s)

Lin Yang 1, Mengyang Yu 1

Affiliation(s)

1 School of Computer Science and Technology, North University of China, Taiyuan, Shanxi, China

Corresponding Author

Lin Yang

ABSTRACT

The study of function words in Chinese has a long history and has produced fruitful results. However, most existing research on function words is human-centered, so its descriptions of the characteristics of function words tend to be subjective and vague. As a result, it is difficult to apply this research directly to Natural Language Processing (NLP). From the perspective of computational linguistics, and drawing on the existing literature and the part-of-speech-tagged People's Daily corpus, this paper aims to provide a scientific basis for the use of function words in Chinese. It mainly explains the theory and method of conjunctions and phrases among Chinese function words. Chinese function words were integrated, proofread and identified using several dictionaries and corpora, and a knowledge base was constructed from them. Two preprocessing models in NLP, ELMo and NNLM, were then compared in terms of function word recognition rate, handling of text, grammar, semantics and lexical resolution, and the amount of training text data. The results showed that, on the constructed knowledge base, the ELMo model improved the recognition rate of function words by 14.5% compared with the NNLM model and showed a better understanding of the various kinds of grammar and morphology. ELMo also drew on more text data for training, whereas the NNLM model was more stable with respect to the amount of training text.
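The abstract does not define how the function word recognition rate is computed, nor whether the 14.5% improvement is an absolute or a relative difference. The following minimal sketch assumes the rate is the fraction of gold-standard function words that a model also labels as function words, and shows both readings of "improvement". The function name `recognition_rate` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def recognition_rate(gold: Iterable[bool], predicted: Iterable[bool]) -> float:
    """Fraction of gold function words that the model also marked as function words.

    Assumption: this recall-style definition is illustrative only; the paper
    does not state its exact metric.
    """
    pairs = list(zip(gold, predicted))
    total = sum(1 for g, _ in pairs if g)
    if total == 0:
        return 0.0
    hits = sum(1 for g, p in pairs if g and p)
    return hits / total


# Hypothetical toy labels: True means "this token is a function word".
gold = [True, False, True, True, False, True]
model_a = [True, False, False, True, False, False]  # stands in for a weaker model
model_b = [True, False, True, True, True, True]     # stands in for a stronger model

rate_a = recognition_rate(gold, model_a)
rate_b = recognition_rate(gold, model_b)

# Two different ways an "improvement" figure can be reported.
absolute_gain = rate_b - rate_a             # difference in percentage points
relative_gain = (rate_b - rate_a) / rate_a  # difference relative to the baseline
```

With these toy labels the two figures differ substantially, which is why a reported improvement is easier to interpret when the paper states which of the two it uses.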

KEYWORDS

Knowledge Base Construction, Generalized Function Words, Chinese Function Words, Natural Language Processing, ELMo Model, NNLM Model

CITE THIS PAPER

Lin Yang, Mengyang Yu, Investigation on Chinese Function Words and Construction of Generalized Function Word Knowledge Base in Natural Language Processing. Lecture Notes on Language and Literature (2023) Vol. 6: 26-36. DOI: http://dx.doi.org/10.23977/langl.2023.061705.


All published work is licensed under a Creative Commons Attribution 4.0 International License.
