A survey of Few-Shot Action Recognition

DOI: 10.23977/jaip.2023.060105

Author(s)

Congmin Wang 1, Yancong Zhou 2

Affiliation(s)

1 School of Science, Tianjin University of Commerce, Tianjin, China
2 School of Information Engineering, Tianjin University of Commerce, Tianjin, China

Corresponding Author

Yancong Zhou

ABSTRACT

In recent years, with the development of network technology, countless videos are produced every day, and action recognition has become an active area of computer vision with many notable achievements. Training action recognition models, however, requires a large number of labeled samples. In practice, labeled data are often scarce, because collecting and annotating a large dataset is costly and difficult. Few-shot learning addresses this problem by learning new categories from only a few labeled samples per class. This paper reviews recent research on few-shot action recognition. Organized by the stages of the training process, it summarizes the research progress and representative models from four perspectives: data processing, feature embedding, feature augmentation, and metric learning. Finally, it points out the challenges facing current research and directions for future development.
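To make the metric-learning perspective concrete, the following is a minimal sketch of nearest-prototype classification in an N-way K-shot episode, in the spirit of Prototypical Networks (Snell et al., 2017). It assumes each video clip has already been mapped to a fixed-length embedding by some feature extractor, which is not shown. The functions `prototypes`, `classify`, and `class_probabilities`, the action labels, and the 2-D toy vectors are all hypothetical illustrations, not the method of this survey or of any specific model it reviews.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the K support embeddings of each class into one prototype vector."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, the metric used by Prototypical Networks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Assign the query embedding to the class whose prototype is nearest."""
    return min(protos, key=lambda label: squared_euclidean(query, protos[label]))


def class_probabilities(query: Vector, protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative distances, giving a probability per support class."""
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical 2-way 2-shot episode with toy 2-D embeddings standing in for video features.
support = {
    "jump": [[1.0, 0.0], [0.8, 0.2]],
    "wave": [[0.0, 1.0], [0.2, 0.8]],
}
protos = prototypes(support)
print(classify([0.7, 0.3], protos))  # → jump
```

The query lies at squared distance 0.08 from the "jump" prototype and 0.72 from the "wave" prototype, so it is labeled "jump". In the models surveyed, the embedding network is what gets trained so that such distances become meaningful for unseen action classes; this sketch keeps the embeddings fixed to isolate the metric step.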

KEYWORDS

Few-Shot Learning, Action Recognition, Deep Learning

CITE THIS PAPER

Congmin Wang, Yancong Zhou, A survey of Few-Shot Action Recognition. Journal of Artificial Intelligence Practice (2023) Vol. 6: 34-40. DOI: http://dx.doi.org/10.23977/jaip.2023.060105.


All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.