Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data

SangWoo An;Seog Chung Seo;An; SangWoo;Seo; Seog Chung;

doi:10.3390/app10113711

Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data

Clicks: 182

ID: 269938

2020

Free PDF

Article Quality & Performance Metrics

Overall Quality Improving Quality

0.0 /100

Combines engagement data with AI-assessed academic quality

Reader Engagement Emerging Content

30.0 /100

180 views

23 readers

AI Quality Assessment

Not analyzed

Abstract

EN
- Turkish
- Spanish
- Portuguese
- Arabic
- Chinese
- French
- German
- Indonesian
- Russian
- Thai

With the advent of IoT and Cloud computing service technology, the size of user data to be managed and file data to be transmitted has been significantly increased. To protect users’ personal information, it is necessary to encrypt it in secure and efficient way. Since servers handling a number of clients or IoT devices have to encrypt a large amount of data without compromising service capabilities in real-time, Graphic Processing Units (GPUs) have been considered as a proper candidate for a crypto accelerator for processing a huge amount of data in this situation. In this paper, we present highly efficient implementations of block ciphers on NVIDIA GPUs (especially, Maxwell, Pascal, and Turing architectures) for environments using massively large data in IoT and Cloud computing applications. As block cipher algorithms, we choose AES, a representative standard block cipher algorithm; LEA, which was recently added in ISO/IEC 29192-2:2019 standard; and CHAM, a recently developed lightweight block cipher algorithm. To maximize the parallelism in the encryption process, we utilize Counter (CTR) mode of operation and customize it by using GPU’s characteristics. We applied several optimization techniques with respect to the characteristics of GPU architecture such as kernel parallelism, memory optimization, and CUDA stream. Furthermore, we optimized each target cipher by considering the algorithmic characteristics of each cipher by implementing the core part of each cipher with handcrafted inline PTX (Parallel Thread eXecution) codes, which are virtual assembly codes in CUDA platforms. With the application of our optimization techniques, in our implementation on RTX 2070 GPU, AES and LEA show up to 310 Gbps and 2.47 Tbps of throughput, respectively, which are 10.7% and 67% improved compared with the 279.86 Gbps and 1.47 Tbps of the previous best result. In the case of CHAM, this is the first optimized implementation on GPUs and it achieves 3.03 Tbps of throughput on RTX 2070 GPU.

Reference Key	an2020appliedhighly Use this key to autocite in the manuscript while using SciMatic Manuscript Manager or Thesis Manager
Authors	SangWoo An;Seog Chung Seo;An, SangWoo;Seo, Seog Chung;
Journal	applied sciences
Year	2020
DOI	10.3390/app10113711 Searching for DOI...
URL	https://www.mdpi.com/2076-3417/10/11/3711 https://doi.org/10.3390/app10113711
Keywords	aes cuda parallel processing cham lea graphic processing unit (gpu) counter (ctr) mode

Citations

No citations found. To add a citation, contact the admin at info@scimatic.org

Comments

Login to comment Register

No comments yet. Be the first to comment on this article.