Low power and high performance FFT with different radices

Received Jan 25, 2019; Revised Apr 3, 2019; Accepted May 15, 2019

The FFT is one of the most widely used blocks in digital signal processing and in many fields of communication systems, and it has received significant attention in recent years aimed at increasing its capability and versatility. This paper presents an extensive study of the trade-offs between different radices combined with different computational elements of the butterfly, such as adders and multipliers. Finding an efficient radix together with suitable computational elements is the key to the best fit for high-precision, low-power and low-area applications such as radar, filtering and image compression. The work also considers the precision and the data format, such as the Q-format, used to represent constant values. The proposed FFT architectures not only offer better solutions for low-power and high-performance application systems but also open up new lines of research. For the combination of a carry save adder (CSA) with a modified Booth multiplier, radix-2^3 consumes 43% fewer LUTs and 17% less power than radix-2, and radix-2^2 achieves a 40% higher frequency than radix-2. With various combinations of adders and multipliers, radix-2^2 achieves about 19% higher frequency, 26% fewer LUTs and 26% less power than radix-4. Synthesis was performed with Xilinx ISE 14.7 XST, targeting the Spartan-6 XC6SLX100 device. Simulation was carried out in Xilinx ISim, timing analysis was performed, and post-place-and-route results were generated.


INTRODUCTION
The fast Fourier transform (FFT) was introduced by Cooley and Tukey in 1965 and plays a vital role in many areas. The FFT is a class of algorithms that computes the discrete Fourier transform (DFT) efficiently: by recursively splitting an N-point DFT into smaller DFTs, it reduces the computation from O(N^2) to O(N log N) operations, and the basic computational unit of this recursion is called the butterfly [1]. Today's communication market is characterized by strong competition over new standards. The Fourier transform converts a signal from the time domain to the frequency domain (and the inverse transform converts it back), and the FFT performs such transformations rapidly. The radix-2 algorithm is a well-known, simple algorithm for FFT processors, but it requires many complex multipliers [2]. As the radix increases, the number of non-trivial twiddle-factor multiplications decreases. Each butterfly unit requires computational elements, and because multipliers occupy a large area and consume considerable power in hardware, the number of complex multipliers and adders should be reduced to build an efficient FFT processor. Adders are much simpler than multipliers, but a multiplier is still required in every radix to multiply the input by the twiddle factor [3]. This paper discusses 16-point FFT algorithms of different radices (radix-2, radix-4, radix-2^2 and radix-2^3) built with different computational elements (multipliers and adders) and evaluates their overall impact on the FFT processor. Fixed-point representation is used for the twiddle-factor multiplications. Radix-4 FFT algorithms contain both trivial and non-trivial multiplications [4,5]. Using a higher radix with a wider bit width gives higher precision in fixed-point representation, which is useful in applications such as radar and image encoding. The remaining sections of the paper are organized as follows. Section II describes the radix-2 algorithm, Section III discusses the radix-4 algorithm, and Section IV explains the radix-2^i algorithms (radix-2^2 and radix-2^3). Section V presents the proposed work, Section VI gives the synthesis results of the different radices, and Section VII concludes the paper.
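The claim that higher radices need fewer non-trivial twiddle-factor multiplications can be checked with a short count. The sketch below is illustrative and is not the authors' tool: it assumes a recursive decimation-in-frequency decomposition in which a radix-r stage of an n-point transform multiplies by W_n^(i·m) for i = 0, ..., n/r - 1 and m = 0, ..., r - 1, and it treats 1, -1, j and -j as trivial. The function names are hypothetical, and multiplications inside a butterfly kernel are not counted.

```python
def is_trivial_twiddle(e: int, n: int) -> bool:
    """W_n^e is trivial (1, -j, -1 or j) exactly when 4*e is a multiple of n."""
    return (4 * e) % n == 0


def count_nontrivial(n: int, radices: list[int]) -> int:
    """Count non-trivial twiddle multiplications of an n-point DIF FFT.

    `radices` lists the radix of each stage, e.g. [2, 2, 2, 2] or [4, 4]
    for n = 16. Multiplications inside a butterfly kernel are not counted.
    """
    if not radices:
        if n != 1:
            raise ValueError("product of radices must equal n")
        return 0
    r = radices[0]
    if n % r:
        raise ValueError("radix does not divide n")
    q = n // r
    # Twiddles W_n^(i*m) applied between this stage and the q-point sub-FFTs
    here = sum(1 for i in range(q) for m in range(r)
               if not is_trivial_twiddle(i * m, n))
    # r independent q-point sub-FFTs follow
    return here + r * count_nontrivial(q, radices[1:])


print(count_nontrivial(16, [2, 2, 2, 2]))  # radix-2: 10
print(count_nontrivial(16, [4, 4]))        # radix-4: 8
```

Under these assumptions a 16-point transform needs 10 non-trivial twiddle multiplications with radix-2 and 8 with radix-4; He and Torkelson's radix-2^2 has the same multiplicative complexity as radix-4 while keeping radix-2 butterflies.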

RADIX-2 ALGORITHM
The radix-2 algorithm is the simplest radix used in FFTs. The original input sequence $x(n)$ is divided into two sequences of length N/2, the even-indexed and odd-indexed input terms $x_e(n)$ and $x_o(n)$ [6,7]:

$$x_e(n) = x(2n), \qquad x_o(n) = x(2n+1), \qquad n = 0, 1, \ldots, N/2-1$$

Substituting these into the DFT gives the radix-2 DIT FFT:

$$X(k) = \sum_{n=0}^{N/2-1} x_e(n)\,W_{N/2}^{nk} + W_N^{k}\sum_{n=0}^{N/2-1} x_o(n)\,W_{N/2}^{nk} = X_e(k) + W_N^{k}X_o(k)$$
$$X(k+N/2) = X_e(k) - W_N^{k}X_o(k), \qquad k = 0, 1, \ldots, N/2-1$$

where $W_N = e^{-j2\pi/N}$ and $X_e(k)$, $X_o(k)$ are the N/2-point DFTs of $x_e(n)$ and $x_o(n)$. These equations split the computation into even-indexed and odd-indexed inputs and then combine the two results to produce the entire DFT sequence [8,9]. Figure 1 shows the signal flow graph of the 16-point radix-2 DIT FFT. In each butterfly, the second input is multiplied by the twiddle factor and added to the first input to give the first output; the second output is obtained by subtracting the same product from the first input.
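The even/odd split and the butterfly described above can be sketched in software. This is a minimal floating-point illustration, assuming Python's `cmath` for the twiddle factors; the function name is hypothetical, and it is not the paper's Verilog hardware, which uses fixed-point arithmetic.

```python
import cmath


def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    xe = fft_radix2_dit(x[0::2])     # X_e(k): N/2-point DFT of x(2n)
    xo = fft_radix2_dit(x[1::2])     # X_o(k): N/2-point DFT of x(2n+1)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * xo[k]   # W_N^k * X_o(k)
        out[k] = xe[k] + t            # first butterfly output: X(k)
        out[k + n // 2] = xe[k] - t   # second butterfly output: X(k + N/2)
    return out


impulse = [1] + [0] * 15
print(fft_radix2_dit(impulse))       # flat spectrum: every bin equals 1
```

Each recursion level corresponds to one column of butterflies in the signal flow graph, so a 16-point input passes through log2(16) = 4 stages.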

RADIX-2^i FFT ALGORITHM
4.1. Radix-2^2 algorithm
The radix-2^2 approach was proposed by He and Torkelson. Using a linear index mapping, two radix-2 butterfly stages are combined into one radix-2^2 butterfly unit [11], so that the factor between the two radix-2 stages reduces to the trivial multiplication by -j. For N = 16, radix-2^2 is computed in two stages, with twiddle factors different from those of the radix-2 algorithm. Figure 4 shows the signal flow graph of the 16-point radix-2^2 DIF FFT. Figure 5 shows the signal flow graph of the 16-point radix-2^3 DIT FFT.
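The radix-2^2 decomposition can be illustrated with a recursive software sketch. It assumes N is a power of four and uses floating-point `complex` values rather than the pipelined fixed-point hardware of He and Torkelson: each 4-point kernel is built from two layers of radix-2 butterflies, the only factor between the layers is -j (a swap of real and imaginary parts with a sign change, so no multiplier is needed), and the non-trivial twiddle W_N^(n·r) is applied once per radix-2^2 stage. Function names are hypothetical.

```python
import cmath


def mul_minus_j(z: complex) -> complex:
    """Multiply by -j without a multiplier: (a + jb)(-j) = b - ja."""
    return complex(z.imag, -z.real)


def radix22_kernel(a0, a1, a2, a3):
    """4-point DFT of (a0, a1, a2, a3) from two layers of radix-2 butterflies."""
    b0, b2 = a0 + a2, a0 - a2      # first radix-2 layer
    b1, b3 = a1 + a3, a1 - a3
    t = mul_minus_j(b3)            # trivial -j between the two layers
    # second radix-2 layer; outputs in natural order r = 0, 1, 2, 3
    return b0 + b1, b2 + t, b0 - b1, b2 - t


def fft_radix22(x):
    """Recursive radix-2^2 DIF FFT; len(x) must be a power of four."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n % 4:
        raise ValueError("length must be a power of four")
    q = n // 4
    branches = [[] for _ in range(4)]
    for i in range(q):
        ys = radix22_kernel(x[i], x[i + q], x[i + 2 * q], x[i + 3 * q])
        for r in range(4):
            # the only non-trivial twiddle of this stage: W_N^(i*r)
            branches[r].append(ys[r] * cmath.exp(-2j * cmath.pi * i * r / n))
    subs = [fft_radix22(b) for b in branches]
    out = [0j] * n
    for r in range(4):
        for k in range(q):
            out[4 * k + r] = subs[r][k]   # X(r + 4k)
    return out
```

For N = 16 the recursion has two radix-2^2 stages, and non-trivial twiddles appear only between them, which is where the multiplier savings over radix-2 come from.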

PROPOSED WORK
The proposed work implements 16-point radix-2, radix-4, radix-2^2 and radix-2^3 FFT architectures using different combinations of computational elements (multipliers and adders), and evaluates the overall impact of each combination on the performance of each radix. The work focuses on the FFT architecture and on the computations performed in each butterfly unit with the different multipliers. The signal flow graphs above show that a multiplication is needed wherever the input is multiplied by a twiddle factor, so an efficient multiplier is essential for an efficient FFT processor.
First, different computational elements (multipliers and adders) were studied and used in the different radices to obtain an efficient FFT architecture. The multipliers used to compute the twiddle-factor products in the butterfly unit include the Booth multiplier, the modified Booth multiplier [13] and the canonical signed digit (CSD) multiplier [6]. The adder used is the carry save adder (CSA), because it is faster and more efficient than the carry look-ahead adder.
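The combination of a modified Booth multiplier with carry-save addition can be modelled at the bit level. The sketch below is an illustration under stated assumptions, not the paper's Verilog: it uses radix-4 Booth recoding of a signed multiplier, sums the partial products with a linear chain of 3:2 carry-save steps (an array-style accumulation rather than a Wallace tree), and finishes with one carry-propagate addition. Python's unbounded integers stand in for the fixed-width datapath, and the function names and the `n_bits` parameter are hypothetical.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Modified (radix-4) Booth recoding of a signed multiplier y.

    y must fit in n_bits as a two's-complement value. Each digit
    d_i = -2*y[2i+1] + y[2i] + y[2i-1] lies in {-2, -1, 0, 1, 2},
    and y == sum(d_i * 4**i).
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    digits, prev = [], 0                 # prev holds y[2i-1]; y[-1] = 0
    for i in range(0, n_bits, 2):
        b0 = (y >> i) & 1                # Python ints behave as two's complement
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save step: a + b + c == sum_bits + carry, with no carry propagation."""
    sum_bits = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry


def booth_csa_multiply(x: int, y: int, n_bits: int = 16) -> int:
    """x * y: Booth partial products accumulated in carry-save form."""
    partials = [(d * x) << (2 * i)
                for i, d in enumerate(booth_radix4_digits(y, n_bits))]
    s, c = 0, 0
    for pp in partials:
        s, c = carry_save_add(s, c, pp)  # accumulate without propagating carries
    return s + c                          # single final carry-propagate addition
```

Radix-4 recoding halves the number of partial products (8 instead of 16 for a 16-bit multiplier), and the carry-save steps avoid a full carry chain until the last addition.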

Twiddle Factor Multiplication
Twiddle-factor multiplication plays a significant role in computing the butterfly units in each stage of the different radices, so an efficient multiplier is used for it. The twiddle factors used in the 16-point radices are

$$W_{16}^{k} = e^{-j2\pi k/16} = \cos\left(\frac{2\pi k}{16}\right) - j\sin\left(\frac{2\pi k}{16}\right), \qquad k = 0, 1, \ldots, 15$$

Figure 6 shows the diagram of the butterfly unit, which illustrates how the multiplication and addition are carried out: an adder is used wherever an addition is required, and any of the multipliers mentioned above can be used for the multiplications. The non-trivial twiddle-factor components take values such as 0.707, 0.923 and 0.382 (that is, cos(π/4), cos(π/8) and sin(π/8)); these values are converted into binary form and then represented in Q-format (fixed-point representation) [14]. The twiddle-factor values in Q-format are listed in Table 1.
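Table 1 is not reproduced here, and this section does not state the exact Q-format, so the sketch below assumes Q1.15 (16-bit signed, 15 fractional bits) with round-to-nearest and saturation. It shows how the 16-point twiddle factors could be converted to fixed-point integers; the constants and function names are hypothetical.

```python
import math

FRAC_BITS = 15                      # assumed Q1.15: 1 sign bit, 15 fractional bits
Q_MAX = (1 << FRAC_BITS) - 1        # 32767, the largest value just below +1.0
Q_MIN = -(1 << FRAC_BITS)           # -32768, exactly -1.0


def to_q15(value: float) -> int:
    """Round a real number to the nearest Q1.15 integer, saturating at the limits."""
    return max(Q_MIN, min(Q_MAX, round(value * (1 << FRAC_BITS))))


def twiddle_q15(k: int, n: int = 16) -> tuple[int, int]:
    """(real, imag) of W_n^k = cos(2*pi*k/n) - j*sin(2*pi*k/n) in Q1.15."""
    angle = 2 * math.pi * k / n
    return to_q15(math.cos(angle)), to_q15(-math.sin(angle))


for k in range(8):
    print(k, twiddle_q15(k))
```

Under this assumption 0.707, 0.923 and 0.382 map to 23170, 30274 and 12540 (0x5A82, 0x7642 and 0x30FC), while +1.0 cannot be represented exactly and saturates to 32767. A different word length would change these integers.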

Fixed point Multiplication
There are different methods for multiplying two signed or unsigned fixed-point numbers. In a fixed-point number representation, the number of bits before and after the radix point (the binary point) is fixed. Figure 7 shows the format of a fixed-point number.
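Because the binary point is fixed, a multiplication only needs an integer multiply followed by a shift. The sketch below assumes Q1.15 operands (the word length is an assumption): the exact product of two Q1.15 numbers is in Q2.30 format, so it is rounded (round half up, also an assumption), shifted right by 15 bits and saturated. It then applies this to the complex twiddle multiplication of the butterfly. The names are hypothetical.

```python
FRAC_BITS = 15
Q_MAX, Q_MIN = (1 << FRAC_BITS) - 1, -(1 << FRAC_BITS)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q1.15 integers and return a Q1.15 result."""
    product = a * b                          # exact Q2.30 product (fits in 32 bits)
    product += 1 << (FRAC_BITS - 1)          # add half an LSB: round half up
    result = product >> FRAC_BITS            # arithmetic shift back to Q1.15
    return max(Q_MIN, min(Q_MAX, result))    # only (-1) * (-1) overflows


def q15_complex_mul(xr: int, xi: int, wr: int, wi: int) -> tuple[int, int]:
    """(xr + j*xi) * (wr + j*wi) using four real Q1.15 multiplications.

    The final additions are left unsaturated; a hardware datapath would
    widen or scale them.
    """
    return (q15_mul(xr, wr) - q15_mul(xi, wi),
            q15_mul(xr, wi) + q15_mul(xi, wr))


# 0.5 * W_16^2, where W_16^2 = 0.707 - j0.707 is (23170, -23170) in Q1.15
print(q15_complex_mul(16384, 0, 23170, -23170))   # (11585, -11585)
```

The result 11585/32768 is about 0.354, which matches 0.5 × 0.707 to within one LSB.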

RESULTS
This section presents the implementation of the 16-point radices using different combinations of computational elements in the butterfly unit of the FFT architecture, and compares the different radices. The functionality of the multiplier modules was verified by running test benches, simulation and synthesis in the Xilinx ISE 14.7 tools targeting the Spartan-6 family. Tables 2 and 3 show that, for 16-bit operands, the Vedic multiplier uses the fewest LUTs (502) and the canonical signed digit multiplier has the smallest delay (18.052 ns). The carry save adder is faster than the carry look-ahead adder. Tables 4 and 5 compare the radices (radix-2, radix-2^2 and radix-2^3); in percentage terms:
a) Radix-2^3 consumes 43% fewer LUTs than radix-2.
b) Radix-2^2 achieves about 40% higher frequency (MHz) than radix-2.
c) Radix-2^3 consumes 17% less power (W) than radix-2.

CONCLUSION
This paper has presented efficient multiplicative and additive methods for FFT algorithms, in which various combinations of computational elements were discussed with emphasis on the butterfly units of the radix-2, radix-4, radix-2^2 and radix-2^3 FFT algorithms. These radix algorithms were implemented in Verilog. The synthesis results in Tables 4 and 5 identify the efficient radix when the butterfly units are computed with different combinations of computational elements. The 16-point FFT architectures were analyzed, and the architectures with different computational elements were simulated and synthesized. The synthesis results identify the best-performing configurations in terms of LUTs and frequency (MHz). In summary, the proposed work reduces the power consumption and area and increases the speed of FFT architectures.