Sorting is a very important task in computer science and becomes a critical operation for programs that make heavy use of sorting algorithms, in particular when dealing with huge amounts of data. Generalpurpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU-based implementations of the quicksort algorithm were presented in literature: the GPU-quicksort, a CUDA iterative implementation, and the NVIDIA CUDA Dynamic Parallel (CDP) advanced quicksort, a recursive implementation. In this article we propose CUDA-Quicksort a new blockoriented iterative GPU-based implementation of the sorting algorithm. CUDA-Quicksort has been designed starting from GPU-Quicksort. Unlike GPU-Quicksort, it uses atomic primitives to perform inter-block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-Quicksort is up to four times faster than the iterative GPU-Quicksort and up to three times faster than the recursive NVIDIA CDPQuicksort. An in depth analysis of the performance between the proposed CUDA-Quicksort and the GPUQuicksort show that the main improvement is related to the optimized GPU memory access rather than to the use of the atomic primitives. Moreover, with the aim to assess the advantages of using the CUDA dynamic parallelism, we also implemented a recursive version of the CUDA-Quicksort. Experimental results show that the proposed implementation is faster than the one provided by NVIDIA, with better performance achieved using the iterative implementation.

CUDA-quicksort: An improved GPU-based implementation of quicksort

MANCA, EMANUELE;MANCONI, ANDREA;ARMANO, GIULIANO;
2016-01-01

Abstract

Sorting is a very important task in computer science and becomes a critical operation for programs that make heavy use of sorting algorithms, in particular when dealing with huge amounts of data. Generalpurpose computing has been successfully used on Graphics Processing Units (GPUs) to parallelize some sorting algorithms. Two GPU-based implementations of the quicksort algorithm were presented in literature: the GPU-quicksort, a CUDA iterative implementation, and the NVIDIA CUDA Dynamic Parallel (CDP) advanced quicksort, a recursive implementation. In this article we propose CUDA-Quicksort a new blockoriented iterative GPU-based implementation of the sorting algorithm. CUDA-Quicksort has been designed starting from GPU-Quicksort. Unlike GPU-Quicksort, it uses atomic primitives to perform inter-block communications while ensuring an optimized access to the GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-Quicksort is up to four times faster than the iterative GPU-Quicksort and up to three times faster than the recursive NVIDIA CDPQuicksort. An in depth analysis of the performance between the proposed CUDA-Quicksort and the GPUQuicksort show that the main improvement is related to the optimized GPU memory access rather than to the use of the atomic primitives. Moreover, with the aim to assess the advantages of using the CUDA dynamic parallelism, we also implemented a recursive version of the CUDA-Quicksort. Experimental results show that the proposed implementation is faster than the one provided by NVIDIA, with better performance achieved using the iterative implementation.
2016
CUDA; GPU; High performance computing; Quick sort; Computer Networks and Communications; Computer Science Applications1707 Computer Vision and Pattern Recognition; Software; Computational Theory and Mathematics; Theoretical Computer Science
File in questo prodotto:
File Dimensione Formato  
CUDA-Quicksort.pdf

Solo gestori archivio

Tipologia: versione editoriale (VoR)
Dimensione 2 MB
Formato Adobe PDF
2 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11584/134254
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 19
  • ???jsp.display-item.citation.isi??? 11
social impact