LSU Doctoral Dissertations

Knowledge-based methods for automatic extraction of domain-specific ontologies

Identifier

etd-01042007-194007

Janardhana R. Punuru, Louisiana State University and Agricultural and Mechanical College

Degree

Doctor of Philosophy (PhD)

Department

Computer Science

Document Type

Dissertation

Abstract

Semantic web technology aims at developing methodologies for representing large amounts of knowledge in web-accessible form. The semantics of that knowledge should be easy for computer programs to interpret and understand, so that knowledge can be shared and used across the Web. Domain-specific ontologies form the basis for knowledge representation in the semantic web. Research on automated development of ontologies from texts has become increasingly important because manual construction of ontologies is labor-intensive and costly and, at the same time, large amounts of text for individual domains are already available in electronic form. However, automatic extraction of domain-specific ontologies is challenging due to the unstructured nature of texts and the inherent semantic ambiguities of natural language. Moreover, the large size of the texts to be processed renders full-fledged natural language processing methods infeasible. In this dissertation, we develop a set of knowledge-based techniques for automatic extraction of ontological components (concepts, taxonomic and non-taxonomic relations) from domain texts. The proposed methods combine information retrieval metrics, a lexical knowledge base (such as WordNet), machine learning techniques, heuristics, and statistical approaches to meet the challenge of the task. These methods are domain-independent and automatic. For extraction of concepts, the proposed WNSCA+{PE, POP} method utilizes the lexical knowledge base WordNet to improve precision and recall over the traditional information retrieval metrics. A WordNet-based approach, the compound term heuristic, and a supervised learning approach are developed for taxonomy extraction. We also developed a weighted word-sense disambiguation method for use with the WordNet-based approach. An unsupervised approach using log-likelihood ratios is proposed for extracting non-taxonomic relations. Furthermore, a supervised approach is investigated to learn the semantic constraints for identifying relations from prepositional phrases. The proposed methods are validated by experiments with the Electronic Voting and the Tender Offers, Mergers, and Acquisitions domain corpora. Experimental results and comparisons with some existing approaches clearly indicate the superiority of our methods. In summary, a good combination of information retrieval, lexical knowledge base, statistics, and machine learning methods in this study has led to efficient and effective techniques for extracting ontological components automatically.
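The abstract mentions log-likelihood ratios for non-taxonomic relation extraction without giving the formulation. As a rough illustration only, the sketch below computes the standard log-likelihood ratio statistic (G^2) for a 2x2 contingency table of co-occurrence counts. Reading the counts as "sentences in which a concept pair and a candidate verb do or do not co-occur" is an assumption made here for illustration; the function name and arguments are hypothetical and are not taken from the dissertation.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Return the G^2 statistic for a 2x2 table of co-occurrence counts.

    Assumed (hypothetical) reading of the cells:
      k11: sentences with both the concept pair and the verb
      k12: sentences with the concept pair but not the verb
      k21: sentences with the verb but not the concept pair
      k22: sentences with neither
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2
```

Under this reading, a higher score means the verb and the concept pair co-occur more often than independence would predict, so candidate relation labels could be ranked by the score. How the dissertation actually builds the counts and applies the statistic is not stated in the abstract.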

Date

2007

Document Availability at the Time of Submission

Release the entire work immediately for access worldwide.

Recommended Citation

Punuru, Janardhana R., "Knowledge-based methods for automatic extraction of domain-specific ontologies" (2007). LSU Doctoral Dissertations. 708.
https://repository.lsu.edu/gradschool_dissertations/708

Committee Chair

Jianhua Chen

DOI

10.31390/gradschool_dissertations.708

Included in

Computer Sciences Commons
