Applying a Transformer-Based Model to Detect Abnormal Activities in Video

Article type: Computer article

Authors

1 Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

2 Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

3 Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran

Abstract

Given the ever-growing volume of video produced by security and surveillance cameras in private and public places, monitoring the activities in these videos is vital. Much video surveillance serves to verify correct operation and to raise an alert when abnormal actions occur. To this end, various intelligent models have been proposed for recognizing activities in video. In light of recent advances in artificial intelligence, and deep learning in particular, this paper presents a model based on the Transformer network. To reduce the amount of computation, body keypoints are used. Fifteen body keypoints are fed into the Transformer model so that, by relying on the network's parallel processing during training and on its self-attention mechanism, the model's speed and accuracy are increased. Experimental results on the public JHMDB dataset indicate improved accuracy in detecting abnormal activities compared with baseline models.

Article title [English]

A Transformer-based model for abnormal activity recognition in video

Authors [English]

  • Amir Mohammad Ahmadi 1
  • Kourosh Kiani 2
  • Razieh Rastgoo 3
1 Master's student, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
2 Associate Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
3 Assistant Professor, Faculty of Electrical and Computer Engineering, Semnan University, Semnan, Iran
Abstract [English]

The volume of video generated by security cameras in private and public spaces grows daily, which makes monitoring the activities in these videos crucial. Many video surveillance systems are designed to verify correct operation and to raise alerts when abnormal activities occur. To this end, various intelligent models have been proposed for recognizing activities in videos. Building on recent advances in artificial intelligence, particularly deep learning, this paper introduces a model based on the Transformer network. To reduce computational cost, the model operates on human body keypoints rather than raw frames. Fifteen body keypoints are fed into the Transformer model, which exploits parallel processing during training together with the self-attention mechanism to improve both speed and accuracy. Experimental results on the public JHMDB dataset indicate improved accuracy in detecting abnormal activities compared with baseline models.

Keywords: Video processing, Video surveillance, Abnormal activities, Deep learning, Transformer Network.
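
The abstract's core idea, feeding a sequence of 15 body keypoints per frame through self-attention, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the use of 2D (x, y) coordinates, the clip length `T`, the projection width `D_MODEL`, the random weights, and all helper names are assumptions made only for illustration. It shows a single scaled dot-product self-attention step over frames, without the classifier head, positional encoding, or training that a full model would need.

```python
import math
import random

NUM_KEYPOINTS = 15   # stated in the abstract
COORDS = 2           # assumption: 2D (x, y) keypoints
T = 8                # assumption: frames per clip
D_MODEL = 16         # assumption: projection width


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def self_attention(frames, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the frames of one clip."""
    queries = [matvec(w_q, f) for f in frames]
    keys = [matvec(w_k, f) for f in frames]
    values = [matvec(w_v, f) for f in frames]
    scale = math.sqrt(len(queries[0]))

    outputs, weights = [], []
    for q in queries:
        # Similarity of this frame to every frame in the clip.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of value vectors gives the context for this frame.
        outputs.append([
            sum(row[t] * values[t][j] for t in range(len(values)))
            for j in range(len(values[0]))
        ])
    return outputs, weights


rng = random.Random(0)
d_in = NUM_KEYPOINTS * COORDS  # 30 numbers per frame

# Synthetic stand-in for flattened keypoint coordinates of one clip.
frames = [[rng.random() for _ in range(d_in)] for _ in range(T)]

w_q = random_matrix(D_MODEL, d_in, rng)
w_k = random_matrix(D_MODEL, d_in, rng)
w_v = random_matrix(D_MODEL, d_in, rng)

context, attn = self_attention(frames, w_q, w_k, w_v)
print(len(context), len(context[0]))  # one D_MODEL-sized vector per frame
```

Because each frame is reduced to 30 numbers instead of a full image, the attention computation stays small, which is consistent with the abstract's motivation for using keypoints to cut computation.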
