High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm
The fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications...
Main Author: | |
---|---|
Format: | Article |
Language: | zho |
Published: |
Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
2022-06-01
|
Series: | Jisuanji kexue yu tansuo |
Subjects: | |
Online Access: | http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf |
_version_ | 1811241799448002560 |
---|---|
author | GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng |
author_facet | GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng |
author_sort | GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng |
collection | DOAJ |
description | The fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications are also continuously rising. Therefore, it is of definite significance to study the high-performance implementation of FFT algorithm, especially the high-performance implementation of large radices of FFT in ARMv8 and X86-64, and to improve the calculation performance of FFT algorithm. In view of the architectural features of the ARMv8 and X86-64 computing platforms, this paper studies the high-performance implementation and optimization methods of the FFT algorithm. Through the application of butterfly network optimization, large radices network stages decrease, large radices butterfly computation optimization, SIMD (single instruction multiple data) assembly optimization, and register usage optimization methods, this paper effectively improves the performance of the FFT algorithm, considerably improves the calculation performance of the large radices of FFT, and solves the performance bottlenecks of insufficiency of register resources. Lastly, the summary of a set of Cooley-Tukey FFT algorithm high-performance implementation strategies and optimization solutions is made. The experimental results indicate that for the ARM and X86-64 processors, the FFT algorithm implemented can achieve a significant improvement in performance compared with ARMPL (ARM performance library), Intel MKL (math kernel library) and FFTW (fastest Fourier transform in the West) and can achieve a significant improvement in performance compared with small and medium radices. |
first_indexed | 2024-04-12T13:41:40Z |
format | Article |
id | doaj.art-dedbd87ff16746849d61f47674267132 |
institution | Directory Open Access Journal |
issn | 1673-9418 |
language | zho |
last_indexed | 2024-04-12T13:41:40Z |
publishDate | 2022-06-01 |
publisher | Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press |
record_format | Article |
series | Jisuanji kexue yu tansuo |
spelling | doaj.art-dedbd87ff16746849d61f476742671322022-12-22T03:30:50ZzhoJournal of Computer Engineering and Applications Beijing Co., Ltd., Science PressJisuanji kexue yu tansuo1673-94182022-06-011661304131510.3778/j.issn.1673-9418.2011092High-Performance Implementation and Optimization of Cooley-Tukey FFT AlgorithmGUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng01. College of Data Science, Taiyuan University of Technology, Taiyuan 030024, China;2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, ChinaThe fast Fourier transform (FFT) algorithm is considered as an important element of the processor’s basic software ecology, and it is widely applied in the field of engineering, science, physics and mathematics. Meanwhile, the requirements for the performance of FFT in these applications are also continuously rising. Therefore, it is of definite significance to study the high-performance implementation of FFT algorithm, especially the high-performance implementation of large radices of FFT in ARMv8 and X86-64, and to improve the calculation performance of FFT algorithm. In view of the architectural features of the ARMv8 and X86-64 computing platforms, this paper studies the high-performance implementation and optimization methods of the FFT algorithm. Through the application of butterfly network optimization, large radices network stages decrease, large radices butterfly computation optimization, SIMD (single instruction multiple data) assembly optimization, and register usage optimization methods, this paper effectively improves the performance of the FFT algorithm, considerably improves the calculation performance of the large radices of FFT, and solves the performance bottlenecks of insufficiency of register resources. Lastly, the summary of a set of Cooley-Tukey FFT algorithm high-performance implementation strategies and optimization solutions is made. The experimental results indicate that for the ARM and X86-64 processors, the FFT algorithm implemented can achieve a significant improvement in performance compared with ARMPL (ARM performance library), Intel MKL (math kernel library) and FFTW (fastest Fourier transform in the West) and can achieve a significant improvement in performance compared with small and medium radices.http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf|fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization |
spellingShingle | GUO Jinxin, ZHANG Guangting, ZHANG Yunquan, CHEN Zehua, JIA Haipeng High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm Jisuanji kexue yu tansuo |fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization |
title | High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm |
title_full | High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm |
title_fullStr | High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm |
title_full_unstemmed | High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm |
title_short | High-Performance Implementation and Optimization of Cooley-Tukey FFT Algorithm |
title_sort | high performance implementation and optimization of cooley tukey fft algorithm |
topic | |fast fourier transform (fft)|armv8|x86-64|fftw|simd optimization |
url | http://fcst.ceaj.org/fileup/1673-9418/PDF/2011092.pdf |
work_keys_str_mv | AT guojinxinzhangguangtingzhangyunquanchenzehuajiahaipeng highperformanceimplementationandoptimizationofcooleytukeyfftalgorithm |