Multiple transfer learning-based multimodal sentiment analysis using weighted convolutional neural network ensemble

Document Type: Computer Article

Authors

1 Semnan branch

2 Department of Computer, Semnan branch, Islamic Azad University, Semnan, Iran

3 Semnan University

Abstract

Analyzing the opinions of social media users can lead to a correct understanding of their attitudes toward different topics. The emotions found in these comments, feedback, or criticisms provide useful indicators for many purposes and can be divided into negative, positive, and neutral categories. Sentiment analysis is one of the natural language processing tasks used in various areas. Some social media posts are multimodal, combining multiple media, including text, images, and emojis, which provides a useful structure for extracting and better understanding emotions. This paper presents a hybrid transfer learning method for multimodal sentiment analysis that uses five pre-trained models and hybrid convolutional networks. In this method, two pre-trained convolutional network-based models are used to extract image features, and three other pre-trained models are used to extract text features and embed words. The extracted features are fed into hybrid convolutional networks; a visual attention mechanism focuses on the most emotionally salient regions of the images, and a multi-head attention mechanism highlights the emotional words. The image and text classification results are combined using a voting technique, and late fusion is finally applied to determine the polarity and the final label. Empirical experiments with the proposed model on a standard dataset show 96% accuracy.

Keywords

Main Subjects


[1] N. Jing, Z. Wu, and H. Wang, "A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction", Expert Systems with Applications, Vol. 178, 2021, p. 115019.
[2] K. Chakraborty, S. Bhatia, S. Bhattacharyya, J. Platos, R. Bag, and AE. Hassanien, "Sentiment Analysis of COVID-19 tweets by Deep Learning Classifiers—A study to show how popularity is affecting accuracy in social media", Applied Soft Computing, Vol. 97, 2020, p. 106754.
[3] H. Mirshahvalad, R. Ghasemi Asl, N. Raoufi, and M. Malekzadeh Dirin, "Modeling and prediction of the flash point of hydrocarbon compounds using neural networks", Journal of Modeling in Engineering, Vol. 19, No. 64, Spring 2021, pp. 109-116 (in Persian).
[4] Z. Abbasi-Moud, H. Vahdat-Nejad, and J. Sadri, "Tourism recommendation system based on semantic clustering and sentiment analysis", Expert Systems with Applications, Vol. 167, 2021, p. 114324.
[5] H. Taghizadeh, TN. Chakherlou, A. Alizadeh, and A. Sheikh Abdollahzadeh Mamaghani, "Fatigue life modeling of double shear lap joints using artificial neural networks", Journal of Modeling in Engineering, Vol. 15, No. 49, Summer 2017, pp. 55-63 (in Persian).
[6] H. Jafarian, AH. Taghavi, A. Javaheri, and R. Rawassizadeh, "Exploiting BERT to improve aspect-based sentiment analysis performance on Persian language", In: 7th International Conference on Web Research (ICWR), IEEE, 2021, pp. 5-8.
[7] F. Huang, X. Zhang, Z. Zhao, J. Xu, and Z. Li, "Image–text sentiment analysis via deep multimodal attentive fusion", Knowledge-Based Systems, Vol. 167, 2019, pp. 26-37.
[8] W. Nie, Y. Yan, D. Song, and K. Wang, "Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition", Multimedia Tools and Applications, Vol. 80(11), 2021, pp. 16205-16214.
[9] V. Aiswaryadevi, S. Kiruthika, G. Priyanka, N. Nataraj, and M. Sruthi, "Effective Multimodal Opinion Mining Framework Using Ensemble Learning Technique for Disease Risk Prediction", In: Inventive Computation and Information Technologies, Springer, 2021, pp. 925-933.
[10] A. Ghorbanali, MK. Sohrabi, and F. Yaghmaee, "Ensemble transfer learning-based multimodal sentiment analysis using weighted convolutional neural networks", Information Processing & Management, Vol. 59(3), 2022, p. 102929.
[11] J. Deng, S. Frühholz, Z. Zhang, and B. Schuller, "Recognizing emotions from whispered speech based on acoustic feature transfer learning", IEEE Access, Vol. 5, 2017, pp. 235-246.
[12] A. Maas, RE. Daly, PT. Pham, D. Huang, AY. Ng, and C. Potts, "Learning word vectors for sentiment analysis", In: Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, 2011, pp. 142-150.
[13] Y. Rao, J. Lei, L. Wenyin, Q. Li, and M. Chen, "Building emotional dictionary for sentiment analysis of online news", World Wide Web, Vol. 17(4), 2014, pp. 723-742.
[14] A. Ishaq, S. Asghar, and SA. Gillani, "Aspect-based sentiment analysis using a hybridized approach based on CNN and GA", IEEE Access, Vol. 8, 2020, pp. 499-512.
[15] Y. Ma, J. Yu, B. Ji, J. Chen, S. Zhao, and J. Chen, "Three-Way Decisions Based RNN Models for Sentiment Classification", In: International Joint Conference on Rough Sets. Springer, 2021, pp. 247-258.
[16] L. Zhao, L. Li, X. Zheng, and J. Zhang, "A BERT based sentiment analysis and key entity detection approach for online financial texts", In: 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), IEEE, 2021, pp. 233-238.
[17] Y. Yang, J. Jia, S. Zhang, B. Wu, Q. Chen, J. Li, C. Xing, and J. Tang, "How do your friends on social media disclose your emotions?", In: 28th AAAI conference on artificial intelligence, 2014, pp. 306-312.
[18] A. Yadav and DK. Vishwakarma, "A deep learning architecture of RA-DLNet for visual sentiment analysis", Multimedia Systems, Vol. 26, 2020, pp. 431-451.
[19] Q. You, J. Luo, H. Jin, and J. Yang, "Joint visual-textual sentiment analysis with deep neural networks", In: Proceedings of the 23rd ACM international conference on Multimedia, 2015, pp. 1071-1074.
[20] C. Baecchi, T. Uricchio, M. Bertini, and A. Del Bimbo, "A multimodal feature learning approach for sentiment analysis of social network multimedia", Multimedia Tools and Applications, Vol. 75(5), 2016, pp. 507-525.
[21] X. Zhu, B. Cao, S. Xu, B. Liu, and J. Cao, "Joint visual-textual sentiment analysis based on cross-modality attention mechanism", In: International conference on multimedia modeling, Springer, 2019, pp. 264-276.
[22] D. Borth, R. Ji, T. Chen, T. Breuel, and S-F. Chang, "Large-scale visual sentiment ontology and detectors using adjective noun pairs", In: Proceedings of the 21st ACM international conference on Multimedia, 2013, pp. 223-232.
[23] Z. Zhao, H. Zhu, Z. Xue, Z. Liu, J. Tian, MCH. Chua, and M. Liu, "An image-text consistency driven multimodal sentiment analysis approach for social media", Information Processing & Management, Vol. 56 (6), 2019, p. 102097.
[24] Q. Fang, C. Xu, J. Sang, MS. Hossain, and G. Muhammad, "Word-of-mouth understanding: Entity-centric multimodal aspect-opinion mining in social media", IEEE Transactions on Multimedia, Vol. 17(12), 2015, pp. 281-296.
[25] A. Ghorbanali, MK. Sohrabi, and F. Yaghmaee, "Multimodal sentiment classification and analysis using hybrid weighted convolutional networks", Journal of Information Technology in Engineering Design, Vol. 14, No. 1, 2021, pp. 1-10 (in Persian).
[26] N. Xu, W. Mao, "A residual merged neutral network for multimodal sentiment analysis", In: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), IEEE, 2017, pp. 6-10.
[27] N. Xu, "Analyzing multimodal public sentiment based on hierarchical semantic attentional network", In: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), IEEE, 2017, pp. 152-154.
[28] N. Xu, W. Mao, and G. Chen, "A co-memory network for multimodal sentiment analysis", In: The 41st international ACM SIGIR conference on research & development in information retrieval, 2018, pp. 929-932.
[29] T. Jiang, J. Wang, Z. Liu, and Y. Ling, "Fusion-extraction network for multimodal sentiment analysis", Advances in Knowledge Discovery and Data Mining, Vol. 12085, 2020, p. 785.
[30] X. Yang, S. Feng, D. Wang, and Y. Zhang, "Image-text Multimodal Emotion Classification via Multi-view Attentional Network", IEEE Transactions on Multimedia, Vol. 23, 2020, pp. 4014-4026.
[31] D. Gkoumas, Q. Li, C. Lioma, Y. Yu, and D. Song, "What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis", Information Fusion, Vol. 66, 2021, pp. 184-197.
[32] Y. Xiao, F. Codevilla, A. Gurram, O. Urfalioglu, and AM. López, "Multimodal end-to-end autonomous driving", IEEE Transactions on Intelligent Transportation Systems, Vol. 23, 2022, pp. 537-547.
[33] X. Zhang, J. Liu, J. Shen, S. Li, K. Hou, B. Hu, J. Gao, and T. Zhang, "Emotion recognition from multimodal physiological signals using a regularized deep fusion of kernel machine", IEEE Transactions on Cybernetics, Vol. 51(9), 2021, pp. 4386-4399.
[34] J. Huang, J. Tao, B. Liu, Z. Lian, and M. Niu, "Multimodal transformer fusion for continuous emotion recognition", In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2020, pp. 507-511.
[35] F. Fasihi, MR. Keymanesh, SA. Sahaf, and S. Ghareh, "Determination of the equivalent load factor based on an artificial neural network algorithm", Journal of Modeling in Engineering, Vol. 19, No. 65, Summer 2021, pp. 149-160 (in Persian).
[36] Y. Zhang, B. Wallace, "A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification", arXiv preprint arXiv:1510.03820, 2015.
[37] Y. Cheng, L. Yao, G. Xiang, G. Zhang, T. Tang, and L. Zhong, "Text sentiment orientation analysis based on multi-channel CNN and bidirectional GRU with attention mechanism", IEEE Access, Vol. 8, 2020, pp. 964-975.
[38] AH. Ombabi, W. Ouarda, and AM. Alimi, "Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks", Social Network Analysis and Mining, Vol. 10(1), 2020, pp. 1-13.
[39] JP. Gujjar, HP. Kumar, and NN. Chiplunkar, "Image classification and prediction using transfer learning in colab notebook", Global Transitions Proceedings, Vol. 2(2), 2021, pp. 382-385.
[40] T. Tang, X. Tang, and T. Yuan, "Fine-Tuning BERT for Multi-Label Sentiment Analysis in Unbalanced Code-Switching Text", IEEE Access, Vol. 8, 2020, pp. 248-256.
[41] TN. Rincy and R. Gupta, "Ensemble Learning Techniques and its Efficiency in Machine Learning: A Survey", In: 2nd International Conference on Data, Engineering and Applications (IDEA), IEEE, 2020, pp. 1-6.
[42] X. Frazao and LA. Alexandre, "Weighted convolutional neural network ensemble", In: E. Bayro-Corrochano and E. Hancock (eds), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (CIARP 2014), Lecture Notes in Computer Science, Vol. 8827, Springer, 2014.
[43] Á. Casado-García and J. Heras, "Ensemble methods for object detection", In: ECAI 2020, IOS Press, 2020, pp. 688-695.
[44] Y. Kawana, N. Ukita, J-B. Huang, and M-H. Yang, "Ensemble convolutional neural networks for pose estimation", Computer Vision and Image Understanding, Vol. 169, 2018, pp. 62-74.
[45] S. Poria, H. Peng, A. Hussain, N. Howard, and E. Cambria, "Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis", Neurocomputing, Vol. 261, 2017, pp. 217-230.
[46] L. Nanni, YM. Costa, RL. Aguiar, RB. Mangolin, S. Brahnam, and CN. Silla, "Ensemble of convolutional neural networks to improve animal audio classification", EURASIP Journal on Audio, Speech, and Music Processing, 2020, https://doi.org/10.1186/s13636-020-00175-3.
[47] AK. Das, S. Ghosh, S. Thunder, R. Dutta, S. Agarwal, and A. Chakrabarti, "Automatic COVID-19 detection from X-ray images using ensemble learning with convolutional neural network", Pattern Analysis and Applications, Vol. 24, 2021, pp. 1111-1124.
[48] D. Alexandru, S. Stelian, NI. Alina, and F. Aschim, "Ensembles of Convolutional Neural Networks Trained Using Unconventional Data for Stock Predictions", In: Business Revolution in a Digital Era, Springer, 2021, pp. 241-250.
[49] Y. Wang, M. Huang, X. Zhu, and L. Zhao, "Attention-based LSTM for aspect-level sentiment classification", In: Proceedings of the 2016 conference on empirical methods in natural language processing, 2016, pp. 606-615.
[50] J. Briskilal and C. Subalalitha, "An ensemble model for classifying idioms and literal texts using BERT and RoBERTa", Information Processing & Management, Vol. 59(1), 2022, p. 102756.
[51] J. Devlin, M-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding", arXiv preprint arXiv:1810.04805, 2018.
[52] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", arXiv preprint arXiv:1409.1556, 2014.
[53] S. Qian, C. Ning, and Y. Hu, "MobileNetV3 for Image Classification", In: 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), IEEE, 2021, pp. 490-497.
[54] M. Abd Elaziz, A. Dahou, NA. Alsaleh, AH. Elsheikh, AI. Saba, and M. Ahmadein, "Boosting COVID-19 Image Classification Using MobileNetV3 and Aquila Optimizer Algorithm", Entropy, Vol. 23(11), 2021, p. 1383.
[55] Q. You, L. Cao, H. Jin, and J. Luo, "Robust visual-textual sentiment analysis: When attention meets tree-structured recursive neural networks", In: Proceedings of the 24th ACM international conference on Multimedia, 2016, pp. 1008-1017.
[56] Q. You, H. Jin, and J. Luo, "Visual sentiment analysis by attending on local image regions", In: Thirty-First AAAI Conference on Artificial Intelligence, 2017, pp. 231-237.
[57] H. Chen, M. Sun, C. Tu, Y. Lin, and Z. Liu, "Neural sentiment classification with user and product attention", In: Proceedings of the 2016 conference on empirical methods in natural language processing, 2016, pp. 1650-1659.
[58] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, AN. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need", In: Advances in neural information processing systems, 2017, pp. 5998-6008.
[59] Y. Zhu, W. Zheng, and H. Tang, "Interactive dual attention network for text sentiment classification", Computational Intelligence and Neuroscience, Vol. 2020, 2020, p. 8858717.
[60] Q. Le and T. Mikolov, "Distributed Representations of Sentences and Documents", In: Proceedings of the 31st International Conference on Machine Learning, PMLR, Vol. 32(2), 2014, pp. 1188-1196.