Discourse structure identification for knowledge extraction
Document Type
Conference Proceeding
Publication Date
1-1-2013
Abstract
Identification of a document's discourse structure (what each part contributes to the ideas presented, such as hypothesis, support, comparison, and results) is a key precursor to improving knowledge extraction from technical documents. As yet, only a few efforts have been made at automating discourse structure identification, with limited success. The current state-of-the-art discourse parser, SPADE, is limited to parsing discourse within a single sentence. HILDA extends the parsing abilities of SPADE to document-level structure, but with a significant decrease in performance. Both are based on Rhetorical Structure Theory (RST), a widely accepted approach to analyzing discourse coherence, which holds that coherent text can be organized into a hierarchy of interrelated clauses. This paper documents the first part of a study that will achieve RST-based document-level discourse parsing without sacrificing performance. It addresses the first two steps of discourse parsing: structuring and nuclearity labeling. An algorithm was developed for classifying relation existence and nuclearity that improved upon previous methods.
Publication Source (Journal or Book title)
IIE Annual Conference and Expo 2013
First Page
214
Last Page
223
Recommended Citation
Guidry, J., Javadpour, L., & Knapp, G. (2013). Discourse structure identification for knowledge extraction. IIE Annual Conference and Expo 2013, 214-223. Retrieved from https://repository.lsu.edu/mechanical_engineering_pubs/1477