Theme Classification of the Complete Song Ci from the Perspective of the Digital Humanities

DOI: 10.23977/langl.2023.061103

Author(s)

Yuanyuan Fang 1

Affiliation(s)

1 Xihua University, Chengdu, Sichuan, 610000, China

Corresponding Author

Yuanyuan Fang

ABSTRACT

To explore the underlying themes of the Complete Song Ci, we adopt a digital humanities paradigm for efficiently extracting themes from large-scale ancient poetry texts, which we expect to offer new perspectives for the study of traditional poetic themes. Within the BERTopic framework, we fine-tune a BERT model pre-trained on ancient Chinese using the unsupervised SimCSE method, and we present the resulting topic classification of the Complete Song Ci through quantitative and visual means. The results divide the Complete Song Ci into 43 sub-themes, some of which are similar to or compatible with one another. After further merging the sub-themes based on their cosine similarity values, we identify ten distinct themes. These conform to the Ten Major Themes theory of classical Chinese literature proposed in previous research, and they also demonstrate the research value of machine learning methods such as BERTopic for the topic classification of ancient poetic texts.
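
The merging step described in the abstract, in which sub-themes are combined according to their cosine similarity, can be illustrated with a minimal sketch. The abstract does not state the threshold, the linkage rule, or the vectors used, so everything below is an assumption made for illustration: the function names `cosine_similarity` and `merge_topics` are hypothetical, the threshold of 0.9 is arbitrary, the toy vectors are made up, and the grouping is transitive (single linkage via union-find), which may differ from the procedure used in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def merge_topics(topic_vectors, threshold):
    """Group topics whose pairwise cosine similarity is >= threshold.

    Union-find makes the grouping transitive (single linkage): if A is
    similar to B and B to C, all three land in one group. This linkage
    rule is an assumption, not something stated in the paper.
    Returns a sorted list of groups, each a list of topic indices.
    """
    n = len(topic_vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(topic_vectors[i], topic_vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for sub-theme vectors (made-up numbers, not from the paper).
toy_topics = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.2],
    [0.0, 0.0, 1.0],
]
merged = merge_topics(toy_topics, threshold=0.9)
print(merged)
```

On the toy data, the five vectors collapse into three groups. In practice the vectors would be topic representations produced by the topic model, and the threshold would need to be tuned and the merged groups checked by a domain reader.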

KEYWORDS

Digital humanities; the Complete Song Ci; topic classification; unsupervised learning

CITE THIS PAPER

Yuanyuan Fang, Theme Classification of the Complete Song Ci from the Perspective of the Digital Humanities. Lecture Notes on Language and Literature (2023) Vol. 6: 13-24. DOI: http://dx.doi.org/10.23977/langl.2023.061103.

All published work is licensed under a Creative Commons Attribution 4.0 International License.
