Abstract

To enable scientists to quickly locate the key stages of an experiment and obtain detailed information about the experimental process, descriptive content must be added automatically to space science experiment imagery. To address the small targets and small data samples characteristic of space science experiments, this paper proposes an image captioning method for space science experiments based on multi-modal learning. The method consists of four parts: a semantic segmentation model based on an improved U-Net, candidate vocabulary generation for space science experiments based on the segmentation results, general-scene image feature vector extraction with a bottom-up model, and image caption generation based on multi-modal learning. In addition, a dataset of space science experiments is constructed, including semantic masks and image caption annotations. Experimental results demonstrate that, compared with the state-of-the-art image captioning model NeuralTalk2, the proposed algorithm improves accuracy by 0.089 on METEOR and 0.174 on SPICE. The method overcomes the difficulties posed by small targets and small data samples in space science experiments, builds a multi-modal learning model for space science experiment image captioning that meets the requirement of describing such experiments professionally and accurately, and advances from low-level perception to deep scene understanding.