LSU Master's Theses

Parallel Suffix Tree Construction for Genome Sequence Using Hadoop

Identifier

etd-08122013-104039

Author

Umesh Chandra Satish, Louisiana State University and Agricultural and Mechanical College

Degree

Master of Science in Computer Science (MSCS)

Department

Computer Science

Document Type

Thesis

Abstract

Genome indexing is the basis for many bioinformatics applications. Read mapping (sequence alignment) is one such application: it aligns millions of short reads against a reference genome. Several tools, such as BLAST, SOAP, BOWTIE, Cloudburst, and Rapid Parallel Genome Indexing with MapReduce, use indexing techniques to align short reads. Many contemporary alignment techniques are time-consuming and memory-intensive, and cannot easily be scaled to larger genomes. The suffix tree is a popular data structure that can be used to overcome the drawbacks of other alignment techniques. However, constructing a suffix tree is itself highly memory-intensive and time-consuming. In this thesis, a MapReduce-based parallel construction of the suffix tree is proposed. The performance of the algorithm is measured on the Hadoop framework over a commodity cluster in which each node has 8 GB of primary memory. The results show a significantly shorter construction time for the suffix tree of a large data set such as the human genome.

Date

2013

Document Availability at the Time of Submission

Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.

Recommended Citation

Satish, Umesh Chandra, "Parallel Suffix Tree Construction for Genome Sequence Using Hadoop" (2013). LSU Master's Theses. 1665.
https://repository.lsu.edu/gradschool_theses/1665

Committee Chair

Park, Seung-Jong

DOI

10.31390/gradschool_theses.1665

Included in

Computer Sciences Commons
