Deep learning based image description generation

Philip Kinghorn, Li Zhang, Ling Shao

Research output: Chapter in Book/Report/Conference proceeding › Chapter

6 Citations (Scopus)


Describing the contents of images is a challenging task for machines. It requires not only accurate recognition of objects and humans, but also of their attributes and relationships, as well as scene information. Extending this process to identify falls and hazardous objects, to aid the elderly or other users in need of care, is more challenging still. This research makes initial attempts to address these challenges and to produce multi-sentence natural language descriptions of image contents. It employs a local region based approach to extract regional image details and combines multiple techniques, including deep learning and attribute learning, using machine-learned features to create high-level labels from which detailed descriptions of real-world images can be generated. The system contains the core functions of scene classification, object detection and classification, attribute learning, relationship detection and sentence generation. We have further extended this process to deal with open-ended fall detection and hazard identification. In comparison to state-of-the-art related research, our system shows superior robustness and flexibility in dealing with test images from new, unrelated domains, which pose great challenges to many existing methods. Evaluated on a subset of Flickr8k and Pascal VOC 2012, our system achieves an average BLEU score of 46; on a small dataset of images containing falls and hazardous objects, it outperforms related research by a margin of 10 BLEU points. It also performs well on a subset of the IAPR TC-12 dataset.
Original language: English
Title of host publication: Neural Networks (IJCNN), 2017 International Joint Conference on
Place of Publication: Piscataway
ISBN (Electronic): 978-1-5090-6182-2
ISBN (Print): 978-1-5090-6183-9
Publication status: E-pub ahead of print - 3 Jul 2017
Event: 2017 International Joint Conference on Neural Networks (IJCNN) - Anchorage, AK, USA
Duration: 14 May 2017 to 19 May 2017


Conference: 2017 International Joint Conference on Neural Networks (IJCNN)

