High Speed Area Efficient FPGA Implementation of AES Algorithm

ABSTRACT


INTRODUCTION
NIST developed a Federal Information Processing Standard (FIPS) for the AES algorithm as the replacement for the Data Encryption Standard (DES) algorithm. AES is also known as the Rijndael algorithm. The Rijndael algorithm offers resistance against all known attacks, speed and code compactness, and a simple design.
Cryptography is a process in which the information to be sent is combined with a secret key so that the data can be transmitted securely to its destination. There are two types of cryptography, depending on the type of key applied: symmetric key cryptography and asymmetric key cryptography. In symmetric key cryptography, the same key is used for encryption and decryption, whereas in asymmetric key cryptography, different keys are required for encryption and decryption. The AES algorithm is selected for implementation because it is secure and because its components and design principles are completely specified. AES is a symmetric key block cipher. Its design is based on a substitution-permutation structure that combines byte substitution with linear transformations.
Rijndael supports several block and key sizes, which was not possible in the DES algorithm. Block and key sizes can each be selected from 128/160/192/224/256 bits and need not be the same. The AES standard, however, fixes the block size at 128 bits and allows key sizes of 128, 192 or 256 bits. The number of rounds depends on the key size: for key sizes of 128, 192 and 256 bits, the number of rounds is 10, 12 and 14 respectively. The structure of the AES algorithm is shown in Figure 1. In this paper, the algorithm is designed with a block size and a key size of 128 bits each, i.e. AES generates 128 bits of ciphertext for 128 bits of plaintext. After the initial round, the plaintext is processed through 10 rounds. Each round consists of byte substitution, shift rows, mix columns and add round key.
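The relation between key size and round count quoted above follows the rule Nr = Nk + 6, where Nk is the key length in 32-bit words. The short sketch below illustrates this; the function name aes_rounds is hypothetical and used only for illustration.

```python
def aes_rounds(key_bits):
    """Return the number of AES rounds for a 128-bit block and the given key size."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key size must be 128, 192 or 256 bits")
    nk = key_bits // 32   # key length in 32-bit words
    return nk + 6         # Nr = Nk + 6 for the 128-bit AES block


for bits in (128, 192, 256):
    print(bits, "bit key:", aes_rounds(bits), "rounds")
```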

Byte Substitution
The sixteen input bytes are substituted using a fixed lookup table known as the S-box. Figure 2 shows the S-box of the AES algorithm. The S-box contains all 256 possible 8-bit values. The resulting 16 bytes are organized in a matrix of four rows and four columns. Figure 3 shows the byte substitution stage of the AES algorithm.
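As an illustration of the lookup described above, the following Python sketch builds the S-box from its standard definition (multiplicative inverse in GF(2^8) followed by an affine transformation) and applies it to a 16-byte state. It is a software sketch only, not the hardware S-box of the proposed design; the helper names gf_mul, gf_inv, build_sbox and sub_bytes are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B          # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):       # a^254 equals a^-1 in a field of 256 elements
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """Build the 256-entry S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        b = gf_inv(x)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_bytes(state):
    """Replace every byte of a 16-byte state by its S-box entry."""
    return bytes(SBOX[b] for b in state)
```

In hardware the table is normally stored rather than computed; building it here simply keeps the sketch self-contained.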

Shift Row
Each row of the matrix generated by byte substitution is cyclically shifted to the left. Any entry that drops off the left is reinserted on the right. The first row is kept as it is, the second row is shifted by one byte position to the left, the third row by two byte positions and the fourth row by three byte positions. The resultant matrix consists of the same 16 bytes but at different positions. Figure 4 shows the shift row stage of the AES algorithm.
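The row rotation described above can be sketched in a few lines of Python. The state is assumed here to be a list of four rows of four bytes each; the function names shift_rows and inv_shift_rows are hypothetical, and the right-shifting inverse is included only to show that the step is reversible.

```python
def shift_rows(state):
    """Cyclically shift row r of a 4x4 state to the left by r byte positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Cyclically shift row r of a 4x4 state to the right by r byte positions."""
    return [row[-r:] + row[:-r] for r, row in enumerate(state)]


example = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shifted = shift_rows(example)
for row in shifted:
    print(row)
```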

Mix Column
Each column of four bytes is then transformed using a special arithmetic function over the Galois field GF(2^8). This function takes the four bytes of a column as input and outputs four completely new bytes that replace the original four. Figure 5 shows the mix column stage of the AES algorithm.
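The column transformation can be illustrated as follows. The sketch assumes the standard AES MixColumns matrix with coefficients 2, 3, 1, 1, with multiplication by 2 realised by the xtime operation; the function names are hypothetical and the code is not taken from the proposed VHDL design.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply a byte by 3 in GF(2^8): 3*a = 2*a XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix to one column of four bytes."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```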

Add Round Key
The sixteen bytes of the matrix generated by the mix column stage are then treated as 128 bits. In the add round key stage, the 128 bits of the state are bitwise XORed with the 128 bits of the round key. This is a column-wise operation between the four bytes of a state column and one word of the round key. If the result belongs to the last round, the output is the ciphertext; otherwise the resulting 128 bits are treated as 16 bytes and another round starts with a new byte substitution. In the last round, there is no mix column step. Figure 6 shows the add round key stage of the AES algorithm.
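A minimal Python sketch of the add round key stage is given below. The plaintext and key values are the example vectors from Appendix B of FIPS-197 rather than data from this paper, and the function name add_round_key is hypothetical. Applying the same key a second time restores the original state, because XOR is its own inverse.

```python
def add_round_key(state, round_key):
    """Bitwise XOR of a 16-byte state with a 16-byte round key."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


# Example vectors from FIPS-197 Appendix B (initial round of AES-128).
plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

after_initial_round = add_round_key(plaintext, key)
print(after_initial_round.hex())
```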
Decryption of the ciphertext generated by AES encryption contains all the stages of encryption but in reverse order. AES decryption starts with the inverse initial round. The remaining nine rounds of decryption consist of add round key, inverse shift rows, inverse byte substitution and inverse mix columns.
Add round key: Add round key is its own inverse, since the XOR function is its own inverse; the round keys must be applied in reverse order.
Inverse shift rows: Inverse shift rows works in exactly the same way as the shift row stage but in the opposite direction. The first row is kept as it is, the second row is shifted by one byte position to the right, the third row by two byte positions and the fourth row by three byte positions. The resultant matrix consists of the same 16 bytes but at different positions. Figure 7 shows the inverse shift row stage of the AES algorithm.
Inverse byte substitution: Inverse byte substitution is done using a predefined substitution table known as the inverse S-box. Figure 8 shows the inverse S-box of the AES algorithm.
Inverse mix column: The transformation in inverse mix column is done using polynomials of degree less than 4 over the Galois field GF(2^8), in which the coefficients are the elements of a state column.
The work in [2] proposed an implementation of the AES algorithm with a low power MUX LUT based S-box on FPGA. This design achieved a total power dissipation of 0.55 W. A. Agarwal et al [4] suggested an implementation of the AES algorithm using Verilog on a Spartan3E FPGA. This design utilizes 1464 slices. U. Farooq et al [5] discussed implementation of the AES algorithm on an FPGA device using five different techniques that are suitable for area critical and speed critical applications. This design was implemented on a Spartan-6 FPGA device and utilizes 161 slices at a maximum operating frequency of 886.64 MHz. The throughput of this system is 113.5 Gbps. N. S. Sai Srinivas et al [6] proposed a less complex hardware implementation of the AES Rijndael algorithm on a Xilinx Virtex-7 XC7VX90T FPGA. In that design, the synthesis tool was set to optimize speed, area and power.
Nishtha Mathur et al [7] proposed a cryptosystem that combines the AES algorithm with ECC. This is a hybrid encryption scheme with a key size of 192 bits and 12 iterations. K. Kalaiselvi et al [8] proposed a low power, high throughput FPGA implementation of the AES algorithm using a key expansion technique. This design accepts a key size of 256 bits for both encryption and decryption. It utilizes 5493 slices, its maximum operating frequency is 277.4 MHz and its throughput is 0.06 Gbps. H. S. Deshpande et al [9] suggested a BRAM based FPGA implementation of the AES algorithm. Because BRAMs are used to implement the S-box, this design utilizes fewer slices. The design was implemented on XC3S1400AN and utilizes 3376 slices. Atef Ibrahim [10] presented an FPGA implementation of an AES encryption core that is suitable for resource-limited applications. This design was implemented on Spartan-3 and utilizes 150 slices at a maximum operating frequency of 90 MHz. Khose P. N. et al [11] proposed an implementation of the AES algorithm on FPGA in order to achieve high speed data processing and to reduce the time needed to generate the key. This design utilizes 201 slices and 2 BRAMs at a maximum operating frequency of 70 MHz. A. O. Mulani et al [12] proposed an FPGA implementation of the DES algorithm. The design was implemented on XC2S200 and utilizes 2118 slices and 97 IOBs. Yewale Minal J. et al [13] proposed an implementation of AES encryption using VHDL and decryption using Visual Basic. With this approach, 1403 slices are utilized at a maximum operating frequency of 160.875 MHz, with a throughput of 2.059 Gbps. H. S. Deshpande et al [14] discussed an FPGA based optimized architecture that utilizes less area. This design was intended for a plaintext of 128 bits and a key of 128 bits. A. R. Tonde et al [15] discussed an FPGA based implementation of the AES algorithm using an iterative looping approach for a block and key size of 128 bits.
Sonali A. Varhade et al [18] proposed an FPGA based AES algorithm that utilizes 1746 logic elements and 32768 memory bits. This design was synthesized on Cyclone-II using Altera tools. Salim M Wadi et al [19] proposed modifications such as decreasing the number of rounds and replacing the S-box with a new S-box to reduce hardware requirements and to enhance the performance of the AES algorithm in terms of ciphering time and pattern appearance.
Wei Wang et al [21] suggested a high speed implementation of the AES algorithm on FPGA that transmits data securely using pipelining and parallel processing methods. Shylashree N. et al [22] focused on various novel FPGA architectures of the AES algorithm. Borkar A. M. et al [23] proposed an iterative design approach for FPGA implementation of the AES algorithm using VHDL. This design utilizes 1853 slices and its operating frequency is 140.390 MHz. A. M. Deshpande et al [24] presented a very low complexity FPGA based architecture for an integrated AES encryptor and decryptor. This design is synthesized on a Spartan-3 XC3S400 FPGA. S. Kaur et al [25] suggested an efficient implementation of the AES algorithm on FPGA in which multiple rounds are processed simultaneously. This implementation increases speed at the cost of increased area. The design utilizes 6279 slices and 5 BRAMs and its operating frequency is 119.954 MHz. Sounak Samanta [26] proposed a fast and efficient reconfigurable platform based implementation of the AES algorithm using pipelining. This design utilizes 1051 slices and 11 BRAMs and its operating frequency is 76.699 MHz. T. Good et al [27] discussed hardware implementations of the fastest and slowest AES designs, utilizing 16,693 slices at a maximum operating frequency of 184.8 MHz.

IMPLEMENTATION OF PROPOSED DESIGN
The proposed design is implemented with the aim of achieving both area and speed optimization. This is achieved by generating the keys required for each round offline using MATLAB and then using these keys directly in the VHDL code. With this approach, the design occupies fewer slices and runs faster than the conventional approach of generating the round keys in hardware. The design is implemented using Xilinx System Generator.
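The offline generation of round keys can be illustrated with the standard AES-128 key expansion. The sketch below is written in Python rather than MATLAB and is not the authors' script, which is not reproduced in the paper; the function names and the test key (taken from FIPS-197 Appendix A.1) are assumptions made for illustration. The eleven 16-byte round keys it produces are the kind of constants that could then be embedded in the VHDL code.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: inverse in GF(2^8), then the affine transformation."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):   # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key_128(key):
    """Expand a 16-byte key into eleven 16-byte round keys (AES-128)."""
    if len(key) != 16:
        raise ValueError("AES-128 key must be 16 bytes")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]            # RotWord
            temp = [SBOX[b] for b in temp]        # SubWord
            temp[0] ^= RCON[i // 4 - 1]           # round constant
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


# Example key from FIPS-197 Appendix A.1, used here only as a test vector.
round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
for r, rk in enumerate(round_keys):
    print(r, rk.hex())
```

Precomputing the keys in this way fixes the cipher key at synthesis time, which is the trade-off implied by moving key expansion out of the hardware.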

RTL Schematic
The design is synthesized using the Xilinx XST synthesizer. In the proposed design, optimized and synthesizable VHDL code for image as well as 128-bit data encryption is developed so as to utilize less area and increase speed. Table 1 shows the design utilization summary of the proposed design. The synthesis results show that this system utilizes only 121 slice registers and that its maximum operating frequency is 1102.536 MHz. The throughput of the system is calculated using equation (1). Substituting the values into equation (1) gives a throughput of 14.1125 Gbps.
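Equation (1) is not reproduced in this text. The reported figure is, however, consistent with the commonly used expression throughput = (block size × maximum frequency) / (clock cycles per block), assuming a 128-bit block and 10 cycles per block (one per round). The sketch below checks this arithmetic; the function name and the cycle count are assumptions and not values stated by the authors.

```python
def throughput_gbps(block_bits, f_max_mhz, cycles_per_block):
    """Throughput in Gbps for a block cipher core (assumed formula, see lead-in)."""
    mbps = block_bits * f_max_mhz / cycles_per_block
    return mbps / 1000.0


# Proposed design: 128-bit block, 1102.536 MHz, assumed 10 cycles per block.
proposed = throughput_gbps(128, 1102.536, 10)
print(round(proposed, 4))
```

Under the same assumed expression, one block per clock cycle at 886.64 MHz gives approximately 113.5 Gbps, which agrees with the throughput quoted for [5].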

PERFORMANCE ANALYSIS
Performance analysis is necessary to compare the proposed implementation with existing methods. The performance is compared on the basis of area and operating frequency. To date, various researchers have worked on FPGA based implementations of the AES algorithm; some have optimized speed and some have optimized area. In the proposed system, both area and speed are optimized. Table 2 shows the performance comparison of the proposed system with previous work.

CONCLUSION
In this paper, a fast and secure implementation of the AES algorithm on FPGA is presented. The literature survey shows that [5] achieves better performance in terms of speed, whereas [10] achieves better performance in terms of area. In the proposed design, owing to offline key generation and a Xilinx System Generator based design flow, the system is optimized and utilizes only 121 slice registers at a maximum operating frequency of 1102.536 MHz. The throughput of the proposed system is 14.1125 Gbps.

Figure 1. Structure of AES algorithm

Figure 6. Add round key stage

Figure 8. Inverse S-box of AES algorithm

Figure 9. System Generator based Simulink model for AES algorithm

Figure 10. Detailed RTL schematic of AES algorithm

Table 1. Design Utilization Summary

Table 2. Performance Comparison of Proposed System with Previous Work