Identifier
etd-11142012-040550
Degree
Master of Science in Engineering Science (MSES)
Department
Engineering Science (Interdepartmental Program)
Document Type
Thesis
Abstract
Rhetorical Structure Theory (Mann et al. 1988), a popular approach for analyzing discourse coherence, suggests that coherent text can be placed into a hierarchical organization of clauses. Identification of a text’s rhetorical structure through automatic discourse analysis is a crucial element for many of today’s Natural Language Processing tasks, but no sufficient tool is available. The current state-of-the-art discourse parser, SPADE (Soricut et al. 2003), is limited to parsing discourse within a single sentence. HILDA (Hernault et al. 2010) extends the parsing abilities of SPADE to the document level, but with a decrease in performance. This study achieved document-level discourse parsing without sacrificing performance. Provided that the text was already segmented into elementary discourse units, the task of discourse parsing was separated into three steps: structuring, nuclearity labeling, and relation labeling. An algorithm was developed for classifying relation existence, nuclearity, and relation label that improved upon previous methods. New features were explored for all three steps to maintain state-of-the-art performance when parsing at the document level.
Date
2012
Document Availability at the Time of Submission
Student has submitted appropriate documentation to restrict access to LSU for 365 days, after which the document will be released for worldwide access.
Recommended Citation
Guidry, Jamie Allison, "Improving discourse structure identification" (2012). LSU Master's Theses. 2209.
https://repository.lsu.edu/gradschool_theses/2209
Committee Chair
Knapp, Gerald
DOI
10.31390/gradschool_theses.2209