Applying a Transformer-Based Model to Recognize Abnormal Activities in Video

Ahmadi, Amir Mohammad; Kiani, Kourosh; Rastgoo, Razieh

doi:10.22075/jme.2024.32914.2604


Article type: Research paper

Authors

¹ Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

² Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

³ Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran


Abstract

With the ever-growing volume of video produced by security and surveillance cameras in private and public places, monitoring the activities captured in video is vital. Much video surveillance is intended to verify correct operation and to raise an alert when abnormal actions occur. To this end, various intelligent models have been proposed for recognizing activities in video. In light of recent advances in artificial intelligence, and deep learning in particular, this paper presents a model based on the Transformer network. To reduce the amount of computation, body keypoints are used: 15 body keypoints are fed to the Transformer model so that the network's parallel processing during training and its self-attention mechanism improve the model's speed and accuracy. Experimental results on the public JHMDB dataset indicate improved accuracy in recognizing abnormal activities compared with baseline models.


Subjects

Computer Engineering

Title [English]

A Transformer-Based Model for Abnormal Activity Recognition

Authors [English]

Amir Mohammad Ahmadi ¹
Kourosh Kiani ²
Razieh Rastgoo ³

¹ Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

² Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

³ Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Abstract [English]

Given the steadily increasing volume of video generated by security cameras in personal and public spaces, monitoring the activities present in video has become crucial. Many video surveillance systems are designed to verify correct operation and to raise alerts when abnormal activities occur. In this regard, various intelligent models have been proposed for detecting activities in videos. Considering recent advances in artificial intelligence, particularly deep learning, this paper introduces a model based on the Transformer network. To reduce the computational cost, the approach operates on keypoints of the human body rather than on full frames. Fifteen body keypoints are input into the Transformer model, whose parallel processing during training and self-attention mechanism are used to improve the speed and accuracy of the model. Experimental results on the public JHMDB dataset indicate an improvement in the accuracy of detecting abnormal activities compared to baseline models.
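The keypoint-plus-self-attention idea described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the abstract gives only the number of keypoints (15) and the use of self-attention, so treating each frame as one token of flattened 2D coordinates, the single attention head, the random projection weights, the clip length of 8 frames, and the mean pooling are all assumptions made here for illustration.

```python
import math
import random

NUM_KEYPOINTS = 15                # stated in the abstract
COORDS = 2                        # assumption: 2D (x, y) per keypoint
D_MODEL = NUM_KEYPOINTS * COORDS  # assumption: one 30-value token per frame


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over frame tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    # scores[i][j]: how strongly frame i attends to frame j
    scores = matmul(q, [list(col) for col in zip(*k)])
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical untrained projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_frames = 8  # assumption: a short hypothetical clip
# Synthetic clip: each frame is 15 keypoints flattened to 30 values.
clip = [[rng.random() for _ in range(D_MODEL)] for _ in range(num_frames)]
w_q, w_k, w_v = (random_matrix(D_MODEL, D_MODEL, rng) for _ in range(3))

attended, weights = self_attention(clip, w_q, w_k, w_v)
# Mean-pool over frames to get one clip-level feature vector.
clip_feature = [sum(col) / num_frames for col in zip(*attended)]
print(len(attended), len(attended[0]), len(clip_feature))
```

Because every frame token attends to every other frame in a single matrix product, all frames of a clip can be processed together during training, which is the parallelism the abstract refers to. A classifier over `clip_feature` would follow in a full model; it is omitted here because the abstract does not describe it.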

Keywords [English]

Video processing
Video surveillance
Abnormal activities
Deep learning
Transformer network



Volume 22, Issue 76
Ordibehesht 1403 (April-May 2024)
Pages 213-221
