Applying a Transformer-Based Model to Detect Abnormal Activities in Video

Article type: Computer article

Authors

1 Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

2 Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

3 Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Abstract

Given the ever-increasing volume of video produced by security and surveillance cameras in private and public places, monitoring the activities that appear in video is vital. Much video surveillance is carried out to verify correct operation and to raise an alert when abnormal actions occur. To this end, various intelligent models have been proposed for recognizing activities in video. In light of recent advances in artificial intelligence, and deep learning in particular, this paper presents a model based on the Transformer network. To reduce the amount of computation, body keypoints are used. Fifteen body keypoints are fed to the Transformer model so that, by relying on the network's parallel processing during training and on its self-attention mechanism, the model's speed and accuracy are increased. Experimental results on the public JHMDB dataset indicate improved accuracy in detecting abnormal activities compared with baseline models.

Article Title [English]

A Transformer-Based Model for Abnormal Activity Recognition

Authors [English]

  • Amir Mohammad Ahmadi 1
  • Kourosh Kiani 2
  • Razieh Rastgoo 3
1 Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
2 Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
3 Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
Abstract [English]

As security and surveillance cameras in private and public spaces generate ever more video, monitoring the activities captured in that video has become crucial. Many video surveillance systems are deployed to verify that operations proceed correctly and to raise alerts when abnormal activities occur. Various intelligent models have therefore been proposed for recognizing activities in video. Building on recent advances in artificial intelligence, particularly deep learning, this paper introduces a model based on the Transformer network. To reduce computational cost, the model operates on human body keypoints rather than on raw frames. Fifteen body keypoints are fed to the Transformer, whose parallel processing during training and self-attention mechanism are used to improve the model's speed and accuracy. Experimental results on the public JHMDB dataset indicate improved accuracy in detecting abnormal activities compared with baseline models.
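
As a rough illustration of the idea summarized above, the following sketch applies scaled dot-product self-attention to a short sequence of frames, each represented by fifteen body keypoints. It is not the authors' implementation. The 2D (x, y) coordinates, the random input data, the omission of learned query/key/value projections, and the function names `flatten_frame` and `self_attention` are all assumptions made for illustration.

```python
import math
import random

NUM_KEYPOINTS = 15  # number of body keypoints stated in the abstract
COORDS = 2          # assumption: each keypoint is a 2D (x, y) position


def flatten_frame(keypoints):
    """Turn one frame's list of (x, y) keypoints into a single flat token."""
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError("expected %d keypoints per frame" % NUM_KEYPOINTS)
    return [coord for point in keypoints for coord in point]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a sequence of tokens.

    Simplification (assumption): queries, keys, and values are the tokens
    themselves; a real Transformer layer applies learned projections first.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for query in tokens:
        # Similarity of this frame to every frame in the clip.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights.append(softmax(scores))
    # Each output token is an attention-weighted mix of all frame tokens.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical clip: 4 frames of random keypoints standing in for pose data.
random.seed(0)
clip = [
    [(random.random(), random.random()) for _ in range(NUM_KEYPOINTS)]
    for _ in range(4)
]
tokens = [flatten_frame(frame) for frame in clip]
outputs, weights = self_attention(tokens)
```

Each frame becomes a 30-value token (15 keypoints times 2 coordinates), far smaller than a raw image, which is the source of the computational saving the abstract mentions. Because every frame's attention weights are computed independently of the others, the loop over frames can run in parallel during training.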

Keywords [English]

  • Video processing
  • Video surveillance
  • Abnormal activities
  • Deep learning
  • Transformer network