Doctor of Philosophy (PhD)


Computer Science

Document Type



Multivariate informational data, which are both abstract and complex, are becoming increasingly common in many domains, including science, medicine, social science, and business. Displaying and analyzing large amounts of multivariate data with more than three variables of different types is challenging. Visualizations of such data suffer from a high degree of clutter when the number of dimensions (variables) and the number of observations become too large. We propose multiple approaches to effectively visualize large datasets with an ultrahigh number of dimensions by generalizing two standard multivariate visualization methods: the star plot and the parallel coordinates plot. We refine three variants of the star plot (the overlapped star plot, the shifted origin plot, and the multilevel star plot) by embedding distribution plots, displaying the dataset in groups, and supporting adjustable positioning of the star axes. We introduce a bifocal parallel coordinates plot (BPCP) based on the focus + context approach. BPCP splits the overall rendering area vertically into a focus region and a context region. The focus region maps a few selected dimensions of interest at sufficiently wide spacing. The remaining dimensions are represented compactly in the context region to retain useful information and preserve data continuity. The focus display can be further enriched with options such as axis overlays, scatterplots, and nested PCPs. To accommodate an arbitrarily large number of dimensions, the context display supports a multi-level stacked view. Finally, we present two ways of enhancing parallel coordinates axes to better convey all variables and their interrelationships in high-dimensional datasets. First, histograms and circle/ellipse plots based on uniform and non-uniform frequency/density mappings are adopted to visualize the distributions of numerical and categorical data values. Second, color-mapped axis stripes are added to the parallel coordinates layout so that correlations can be recognized in the same plot irrespective of axis locations. These colors are also propagated to the histograms as stacked bars and to the categorical values as pie charts to further facilitate data exploration. Using datasets of 25 to 130 variables of different data types, we demonstrate the effectiveness of the proposed multivariate visualization enhancements.
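As a rough illustration of the focus + context idea behind BPCP, the sketch below assigns a horizontal position to each parallel-coordinates axis: the selected focus dimensions are spread widely over one region, and the remaining context dimensions are packed into the other. The function name `bifocal_axis_positions`, the `focus_fraction` parameter, the even spacing, and the side-by-side placement of the two regions are all assumptions made for this example; the abstract does not specify the actual layout rules.

```python
def bifocal_axis_positions(n_dims, focus, width=1000.0, focus_fraction=0.6):
    """Map each dimension index to an x-position for its axis.

    Hypothetical layout: focus axes share the first `focus_fraction` of the
    width, and all other (context) axes share the remainder.
    """
    focus = list(focus)
    focus_set = set(focus)
    context = [d for d in range(n_dims) if d not in focus_set]

    split = width * focus_fraction  # boundary between the two regions
    positions = {}

    # Focus axes: few axes, wide spacing (each centered in its own slot).
    focus_step = split / max(len(focus), 1)
    for i, dim in enumerate(focus):
        positions[dim] = (i + 0.5) * focus_step

    # Context axes: many axes, compact spacing in the remaining area.
    context_step = (width - split) / max(len(context), 1)
    for i, dim in enumerate(context):
        positions[dim] = split + (i + 0.5) * context_step

    return positions


if __name__ == "__main__":
    layout = bifocal_axis_positions(n_dims=10, focus=[2, 5])
    for dim in sorted(layout, key=layout.get):
        print(dim, round(layout[dim], 1))
```

With 10 dimensions and 2 in focus, each focus axis gets a slot six times wider than a context axis under these assumed defaults, which is the kind of contrast the focus + context split is meant to produce.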



Committee Chair

Bijaya B. Karki