Discourse structure identification for knowledge extraction

Document Type

Conference Proceeding

Publication Date

1-1-2013

Abstract

Identifying a document's discourse structure (what each part contributes to the ideas presented, such as hypothesis, support, comparison, and results) is a key precursor to improving knowledge extraction from technical documents. So far, only a few efforts have been made to automate discourse structure identification, and they have had limited success. The current state-of-the-art discourse parser, SPADE, is limited to parsing discourse within a single sentence. HILDA extends SPADE's parsing abilities to document-level structure, but with a significant decrease in performance. Both are based on Rhetorical Structure Theory (RST), a widely accepted approach to analyzing discourse coherence, which holds that coherent text can be organized into a hierarchy of interrelated clauses. This paper documents the first part of a study that aims to achieve RST-based document-level discourse parsing without sacrificing performance. It addresses the first two steps of discourse parsing: structuring and nuclearity labeling. An algorithm developed for classifying relation existence and nuclearity improved upon previous methods.

Publication Source (Journal or Book title)

IIE Annual Conference and Expo 2013

First Page

214

Last Page

223
