A parallel and accurate method for large-scale image segmentation on a cloud environment

Document Type

Article

Publication Date

2-1-2022

Abstract

In this paper, we present a parallel algorithm for SLIC on Apache Spark, which we call PSLIC-on-Spark. To this end, we extend the original SLIC algorithm to use Apache Spark operations, so that it can run in parallel on multiple executors in an Apache Spark cluster. We then analyze the trade-off between the processing speed and the accuracy of PSLIC-on-Spark that arises from partitioning the original image datasets, and we verify this trade-off through experiments. Specifically, we show that PSLIC-on-Spark using 8 CPU cores reduces the processing time of SLIC by a factor of 2.24–2.93, while it reduces the boundary recall (BR) of SLIC by 1.54–6.32% and increases the under-segmentation error (UE) by 1.79–6.2%. Next, we propose an improved algorithm, which we call PASLIC-on-Spark, that improves the accuracy of PSLIC-on-Spark. It has two main features: (1) image partitioning that considers the shape and position of the clusters rather than partitioning the image evenly, and (2) controllable duplication of the boundary between image partitions. Through experiments, we show the accuracy and efficiency of PASLIC-on-Spark in an actual cloud environment configured with 8 worker nodes on Amazon AWS. The experimental results indicate that PASLIC-on-Spark improves the accuracy of PSLIC-on-Spark by 3.66–3.77% in BR and 1.39–1.96% in UE. PASLIC-on-Spark still reduces the processing time of SLIC significantly: by a factor of 1.5–1.67 on a single-node configuration using 8 CPU cores, and by a factor of 1.18–1.26 in a cloud environment using 8 computing nodes.
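
As an illustration of the second feature, duplicating the boundary between image partitions, the following minimal sketch splits an image into horizontal strips and extends each strip by a configurable number of rows on both sides. It is not the PASLIC-on-Spark implementation: the function name `partition_rows`, its parameters, and the even row-wise split are assumptions made for this example only, and the cluster-aware partitioning described in the abstract is not modeled.

```python
def partition_rows(height, num_parts, overlap):
    """Return (start, end) row ranges, end exclusive, one per partition.

    Hypothetical helper: the image is first cut into near-equal strips,
    then every strip is widened by `overlap` rows on each side and
    clamped to the image, so neighboring strips share boundary rows.
    """
    if num_parts <= 0 or overlap < 0 or height < num_parts:
        raise ValueError("need 0 < num_parts <= height and overlap >= 0")

    base, extra = divmod(height, num_parts)  # spread leftover rows over the first strips
    ranges = []
    start = 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        end = start + size
        # Duplicate `overlap` rows from each neighbor, clamped to the image.
        ranges.append((max(0, start - overlap), min(height, end + overlap)))
        start = end
    return ranges


if __name__ == "__main__":
    # A 100-row image, 4 partitions, 5 duplicated rows on each side.
    print(partition_rows(100, 4, 5))
    # → [(0, 30), (20, 55), (45, 80), (70, 100)]
```

Under this assumption, each range could be handed to a separate executor. A larger `overlap` gives every partition more context near its borders at the cost of processing the duplicated rows more than once, which mirrors the speed and accuracy trade-off discussed in the abstract.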

Publication Source (Journal or Book title)

Journal of Supercomputing

First Page

4330

Last Page

4357
