Multi-Stage Selective Re-Decoding Module for Image Paragraph Captioning
DOI: 10.23977/cnci2021.008
Author(s)
Guozhang Nie, Xian Zhong, Chengming Zou, Qi Cu and Luo Zhong
Corresponding Author
Xian Zhong
ABSTRACT
Image paragraph captioning describes an image with a paragraph. Existing
methods typically train hierarchical networks with a one-stage strategy, meaning
that the model generates a description directly, without multi-stage modification.
Because of exposure bias, we have observed that errors and omissions can occur during
description generation: for example, an object in the image is described incorrectly or a
subregion of the image is neglected. To solve this problem, we present a novel approach for
image paragraph captioning, called the multi-stage selective re-decoding (MSSRD)
module, which extends conventional one-stage methods to generate richer captions.
After obtaining a preliminary caption, our module dynamically selects appropriate words and
the visual features that were left un-decoded in the previous stage. These selected features
are re-decoded into a new caption in the next stage. The new caption is more diverse and
finer-grained than the previous one. We conduct extensive experiments to demonstrate the
effectiveness of our approach.
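
The select-then-re-decode loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `Region`, `toy_decode`, `select` and `multi_stage_caption`, the `budget` and `stages` parameters, and the idea of representing a visual feature by a label with a salience score are all assumptions made for illustration. A real module would presumably operate on learned feature vectors and a trained decoder.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str       # toy stand-in for a visual feature vector (assumption)
    salience: float  # toy stand-in for an attention weight (assumption)


def toy_decode(regions, kept_words, budget):
    """Hypothetical decoder: describes the `budget` most salient regions.

    Returns the caption words and the labels of the regions it decoded.
    """
    chosen = sorted(regions, key=lambda r: r.salience, reverse=True)[:budget]
    words = list(kept_words)
    words += [r.label for r in chosen if r.label not in kept_words]
    return words, {r.label for r in chosen}


def select(words, regions, decoded, is_reliable):
    """Selection step: keep reliable words and regions not decoded so far."""
    kept = [w for w in words if is_reliable(w)]
    remaining = [r for r in regions if r.label not in decoded]
    return kept, remaining


def multi_stage_caption(regions, stages=3, budget=2, is_reliable=lambda w: True):
    """Stage 1 produces a preliminary caption; later stages re-decode."""
    words, decoded = toy_decode(regions, [], budget)
    for _ in range(stages - 1):
        kept, remaining = select(words, regions, decoded, is_reliable)
        if not remaining:  # nothing left to describe
            break
        words, newly_decoded = toy_decode(remaining, kept, budget)
        decoded |= newly_decoded
    return words


regions = [
    Region("dog", 0.9),
    Region("frisbee", 0.7),
    Region("grass", 0.4),
    Region("tree", 0.2),
]
one_stage = multi_stage_caption(regions, stages=1)
two_stage = multi_stage_caption(regions, stages=2)
```

In this toy setting the one-stage call covers only the two most salient regions, while the two-stage call also covers the regions that the first pass skipped, which mirrors the abstract's claim that re-decoding recovers neglected subregions.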
KEYWORDS
Image Paragraph Captioning, Encoder-Decoder, Multi-Stage Re-Decoding