A Conditional Generative Chatbot using Deep Learning

Article type: Research article

Authors

  • Nura Esfandiari
  • Kourosh Kiani
  • Razieh Rastgoo
Electrical and Computer Engineering Faculty, Semnan University, Semnan, Iran

Abstract

A chatbot serves as a communication tool between a human user and a machine, producing an appropriate answer based on the human input. In more recent approaches, a combination of Natural Language Processing and sequential models is used to build generative chatbots. The main challenge of these models is their sequential nature, which leads to less accurate results. To tackle this challenge, this paper proposes a novel architecture that uses a conditional Wasserstein Generative Adversarial Network (cWGAN) and a transformer model for answer generation in chatbots. The generator of the proposed model consists of a full transformer model that generates an answer, whereas the discriminator includes only the encoder part of a transformer model followed by a classifier. To the best of our knowledge, this is the first time a generative chatbot has been proposed with a transformer embedded in both the generator and the discriminator. Relying on the parallel computing of the transformer model, the results of the proposed model on the Cornell Movie-Dialog corpus and the Chit-Chat dataset confirm its superiority over state-of-the-art alternatives across different evaluation metrics.
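
The adversarial objective behind a conditional Wasserstein GAN can be illustrated with a minimal sketch. It assumes the standard Wasserstein formulation with weight clipping; the abstract does not state which Lipschitz constraint the authors use, so this is not a description of their implementation. The function names and the example scores are hypothetical. Each score stands for the discriminator's (critic's) output for an answer given its question, so the conditioning is implicit in how the scores would be produced.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss (to be minimised by the critic).

    Minimising it pushes scores of real answers up and scores of
    generated answers down.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Generator loss: the generator tries to raise the critic's
    scores for its own answers."""
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN recipe (an assumption
    here), used to keep the critic approximately Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]


# Hypothetical critic scores for two (question, answer) pairs.
real_scores = [1.0, 0.5]    # human answers
fake_scores = [0.5, -0.25]  # generated answers

print(critic_loss(real_scores, fake_scores))
print(generator_loss(fake_scores))
print(clip_weights([0.5, -0.003, -2.0]))
```

In a full model, the scores would come from the transformer-encoder discriminator described in the abstract, and the losses would be minimised with a gradient-based optimiser rather than computed on fixed lists.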

Keywords

  • Generative chatbot
  • Deep learning
  • Conditional generative model
  • Transformer model
  • Dialog system