Abstract
The Gradient Vector Flow (GVF) is a feature-preserving
spatial diffusion of gradients. It is used extensively
in several image segmentation and skeletonization
algorithms. Calculating the GVF is slow as many iterations
are needed to reach convergence. However, each pixel or
voxel can be processed in parallel for each iteration. This
makes GVF ideal for execution on Graphics Processing
Units (GPUs). In this paper, we present a highly optimized
parallel GPU implementation of GVF written in OpenCL.
We have investigated memory access optimization for
GPUs, such as using texture memory, shared memory and a
compressed storage format. Our results show that this
algorithm benefits considerably from using texture memory
and the compressed storage format on the GPU. Shared
memory, on the other hand, makes the calculations slower
with or without the other optimizations because of
increased kernel complexity and synchronization. With
these optimizations, our implementation can process large 2D
images (512²) in real time and 3D images
(256³) in only a few seconds on modern GPUs.
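
To illustrate why every pixel can be updated independently within an iteration, the following is a minimal sequential sketch of one explicit GVF step in 2D. It is not the OpenCL implementation presented in this paper. It assumes the commonly used explicit update v ← v + μ∇²v − (v − f)|f|², where f is the initial gradient field, a 4-neighbour Laplacian, and clamped borders. The names gvf_step and gvf, the value of mu and the boundary rule are illustrative assumptions.

```python
def gvf_step(u, v, fx, fy, mu):
    """One explicit GVF update on a 2D grid stored as lists of rows.

    u, v   : current vector field components
    fx, fy : initial gradient field components (kept fixed)
    mu     : smoothing weight (assumed; must be small for stability)
    """
    h, w = len(u), len(u[0])

    def at(field, y, x):
        # Clamp coordinates at the border (an assumed boundary rule).
        return field[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    def laplacian(field, y, x):
        # 4-neighbour discrete Laplacian.
        return (at(field, y - 1, x) + at(field, y + 1, x)
                + at(field, y, x - 1) + at(field, y, x + 1)
                - 4.0 * field[y][x])

    new_u = [[0.0] * w for _ in range(h)]
    new_v = [[0.0] * w for _ in range(h)]
    # Each (y, x) reads only the previous iteration's values, so the
    # loop body could run as one independent work-item per pixel.
    for y in range(h):
        for x in range(w):
            mag2 = fx[y][x] ** 2 + fy[y][x] ** 2
            new_u[y][x] = (u[y][x] + mu * laplacian(u, y, x)
                           - (u[y][x] - fx[y][x]) * mag2)
            new_v[y][x] = (v[y][x] + mu * laplacian(v, y, x)
                           - (v[y][x] - fy[y][x]) * mag2)
    return new_u, new_v


def gvf(fx, fy, mu=0.05, iterations=100):
    """Run a fixed number of GVF iterations, starting from the gradients."""
    u = [row[:] for row in fx]
    v = [row[:] for row in fy]
    for _ in range(iterations):
        u, v = gvf_step(u, v, fx, fy, mu)
    return u, v
```

Because each output value depends only on the previous iteration, the two nested loops map directly onto one GPU thread per pixel, with a new kernel launch (or buffer swap) per iteration. The neighbour reads in the Laplacian are the memory accesses that the texture memory, shared memory and compressed storage optimizations target.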