The Scope of Part of Speech Tagging: A Bibliometric Study

DOI: 10.23977/langl.2024.070408

Author(s)

Xiaoyi Du 1, Litong Wu 2, Bingliang Zhang 2, Xinye Li 2, Feng Hu 3

Affiliation(s)

1 Research Center for Language and Language Education, Central China Normal University, Wuhan, China
2 College of Economics and Management, Zhejiang Normal University, Jinhua, China
3 School of Business, Taizhou University, Taizhou, Jinhua, China

Corresponding Author

Xiaoyi Du

ABSTRACT

Part-of-speech (POS) tagging is the process of identifying the part of speech of each word in a text by considering the context in which it appears. To better understand the intellectual structure of POS tagging research, we analyzed the POS tagging literature indexed in the Web of Science. Applying co-word, co-citation, and social network analysis to 1,656 relevant articles and 69,357 cited references, we identified the main research topics and research streams in POS tagging. We describe each research stream in detail and provide a visualization that shows how the streams evolved over time and how the field is structured intellectually. Finally, we discuss the findings, highlighting current hotspots and future prospects in POS tagging research.
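
As a rough illustration of the co-citation step named in the abstract, the sketch below counts how often two references appear together in the same article's reference list. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `cocitation_counts`, the labels `ref_A` to `ref_C`, and the toy reference lists are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each unordered pair of references is cited together.

    reference_lists: iterable of lists, one list of reference labels per
    citing article (hypothetical input format for this sketch).
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so every unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: each inner list is one citing article's references.
papers = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_B", "ref_C"],
    ["ref_A", "ref_B"],
]
counts = cocitation_counts(papers)
```

Pairs with high counts are candidates for the same research stream. The same pair-counting idea applies to co-word analysis if each inner list holds an article's keywords instead of its references.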

KEYWORDS

Part of Speech, Natural Language Processing, Bibliometric Analysis, Research Scope

CITE THIS PAPER

Xiaoyi Du, Litong Wu, Bingliang Zhang, Xinye Li, Feng Hu, The Scope of Part of Speech Tagging: A Bibliometric Study. Lecture Notes on Language and Literature (2024) Vol. 7: 51-58. DOI: http://dx.doi.org/10.23977/langl.2024.070408.

All published work is licensed under a Creative Commons Attribution 4.0 International License.
