Semester of Graduation

Summer 2018


Master of Science (MS)


Computer Science and Engineering

Document Type



A tremendous amount of data is generated every day from a wide range of sources such as social networks, sensors, and application logs. Graph data is one such type, representing valuable relationships between entities. Analytics of large graphs has become an essential part of business processes and scientific studies because the connections between entities yield deep and meaningful insights into the related domain. However, efficient processing of large-scale iterative graph computations is very challenging due to issues such as fault tolerance, high memory requirements, parallelization, and scalability. Most contemporary systems focus either on keeping the entire graph in memory and minimizing disk access, or on processing the graph entirely on a single node with a centralized disk system. GraphMap is a state-of-the-art, scalable, and efficient out-of-core, disk-based iterative graph processing system that focuses on using secondary storage and optimizing I/O access. In this thesis, we investigate two new extensions to the existing out-of-core, NoSQL-based distributed iterative graph processing system: 1) intra-worker data locality and 2) mincut-based partitioning. We design an additional data locality scheme that moves the computation toward the data rather than the other way around. This locality implementation improves performance by up to 39%. Similarly, we use mincut-based graph partitioning to distribute the graph data uniformly across the workers for parallelization so that inter-worker communication volume is minimized. Through extensive experiments, we also show that mincut-based graph partitioning can lead to poor parallelization due to sub-optimal load balancing.



Committee Chair

Lee, Kisung