A Transformer-Based Model for Abnormal Activity Recognition

Document Type: Computer Article

Authors

1 Master's student, Faculty of Electrical and Computer Science, Semnan University, Semnan, Iran

2 Associate Professor, Faculty of Electrical and Computer Science, Semnan University, Semnan, Iran

3 Assistant Professor, Faculty of Electrical and Computer Science, Semnan University, Semnan, Iran

Abstract

As security cameras in private and public spaces generate a growing volume of video every day, monitoring the activities captured in that video has become crucial. Many video surveillance systems are designed to verify performance accuracy and to raise alerts when abnormal activities occur. To this end, various intelligent models have been proposed for detecting activities in videos. Building on recent advances in artificial intelligence, particularly deep learning, this paper introduces a model based on the Transformer network. To reduce computational complexity, the approach uses human body keypoints as its input. Fifteen body keypoints are fed into the Transformer model, which processes its input in parallel during training and relies on a self-attention mechanism; this improves both the speed and the accuracy of the model. Experimental results on the public JHMDB dataset show improved accuracy in detecting abnormal activities compared with baseline models.
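The self-attention step over keypoint input can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract specifies only that fifteen body keypoints are fed to a Transformer, so everything else here is an assumption made for illustration, including 2D (x, y) coordinates, one token per video frame, a single attention head, random stand-in weights, and all function names. The sketch shows why no recurrence is needed: every frame is scored against every other frame in one pass.

```python
import math
import random

NUM_KEYPOINTS = 15   # from the abstract: fifteen body keypoints
COORDS = 2           # assumption: 2D (x, y) coordinates per keypoint


def flatten_frame(keypoints):
    """Turn one frame's [(x, y), ...] keypoints into a single token vector."""
    assert len(keypoints) == NUM_KEYPOINTS
    return [c for point in keypoints for c in point]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over frame tokens.

    Returns (outputs, weights); weights[i][j] is how strongly frame i
    attends to frame j.
    """
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))

    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * v[d] for a, v in zip(attn, values))
                        for d in range(len(values[0]))])
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Small random projection matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_frames, d_model = 8, 16          # hypothetical clip length and model width
d_in = NUM_KEYPOINTS * COORDS

# Synthetic clip: 8 frames of 15 random (x, y) keypoints in [0, 1).
clip = [[(rng.random(), rng.random()) for _ in range(NUM_KEYPOINTS)]
        for _ in range(num_frames)]
tokens = [flatten_frame(frame) for frame in clip]

w_q, w_k, w_v = (random_matrix(d_model, d_in, rng) for _ in range(3))
outputs, weights = self_attention(tokens, w_q, w_k, w_v)
print(len(outputs), len(outputs[0]))  # 8 16
```

Each token here has only 30 values (15 keypoints times 2 coordinates), far fewer than the pixels of a video frame, which is consistent with the abstract's point about reduced computational complexity. A full model would add positional encoding, multiple heads, feed-forward layers, and a classification head, none of which are shown.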
