Article type: Computer article
Authors
1 Hamedan
2 M.Sc., Hamedan University of Technology
Abstract
Image captioning is the task of assigning descriptive text to an image or photograph. Producing an accurate description involves two steps. Object identification: the objects in the image must first be identified correctly, which includes recognizing their specific features and understanding the relationships between them. Sentence generation: once the objects are identified, grammatically and semantically correct sentences are generated to describe the image.
In this research, an encoder-decoder architecture is employed to produce textual descriptions. The proposed model consists of the following three components. Encoder (ResNet): a ResNet network serves as the encoder and extracts visual features from the input image. Decoder (convolutional network): in the decoding stage, a four-layer convolutional neural network (CNN) acts as the language model and generates the description. Attention mechanism: to enhance the representation of image features and capture the relationships between objects, an attention mechanism is used, which allows the model to focus on both the input image and the language model. The performance of the proposed model is evaluated on the MSCOCO and Flickr datasets. Experimental results show that the proposed architecture outperforms state-of-the-art methods on the BLEU-1 and METEOR measures while also requiring less training time.
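The attention step described above can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so scaled dot-product attention is assumed here purely for illustration. The names `attend`, `softmax`, `regions`, and `state`, the feature size of 8, and the 7x7 grid of image regions are all hypothetical, and random numbers stand in for real ResNet features and decoder states.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_state, region_features):
    """Scaled dot-product attention of one decoder state over image regions.

    word_state: list of d floats (hypothetical decoder state for one word).
    region_features: list of R lists of d floats (stand-in for encoder output).
    Returns (weights, context): R attention weights and a d-dim weighted sum.
    """
    d = len(word_state)
    # One similarity score per image region (assumed scoring function).
    scores = [
        sum(w * f for w, f in zip(word_state, region)) / math.sqrt(d)
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(a * region[k] for a, region in zip(weights, region_features))
        for k in range(d)
    ]
    return weights, context


random.seed(0)
D, R = 8, 49  # assumed feature size and a 7x7 grid of image regions
regions = [[random.gauss(0, 1) for _ in range(D)] for _ in range(R)]
state = [random.gauss(0, 1) for _ in range(D)]

weights, context = attend(state, regions)
print(len(weights), len(context))
```

In a full model, a context vector of this kind would be combined with the decoder's convolutional features before predicting the next word; how the proposed model does this is not stated in the abstract.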
Keywords