Answering knowledge-based visual questions via the exploration of Question Purpose
Document Type
Article
Publication Date
1-1-2023
Abstract
Visual question answering has been greatly advanced by deep learning technologies, but it remains an open problem for two reasons. First, previous works estimate the correctness of each candidate answer mainly by its semantic correlation with the visual question, overlooking the fact that some questions and their answers are semantically inconsistent. Second, previous works that require external knowledge mainly use knowledge facts retrieved by keywords or visual objects. However, the retrieved knowledge facts may be related only to the semantics of the question and be useless or even misleading for answer prediction. To address these issues, we investigate how to capture the purpose of visual questions and propose a Purpose Guided Visual Question Answering model, called PGVQA. It has two appealing properties: (1) it estimates the correctness of candidate answers based on the Question Purpose (QP), which reveals which aspects of a concept are examined by a visual question; this helps to avoid the negative effect of semantic inconsistency between answers and questions. (2) It incorporates the knowledge facts consistent with the QP into answer prediction, which improves the probability of answering visual questions correctly. Empirical studies on benchmark datasets show that PGVQA achieves state-of-the-art performance.
Publication Source (Journal or Book title)
Pattern Recognition
Recommended Citation
Song, L., Li, J., Liu, J., Yang, Y., Shang, X., & Sun, M. (2023). Answering knowledge-based visual questions via the exploration of Question Purpose. Pattern Recognition, 133. https://doi.org/10.1016/j.patcog.2022.109015