Approximate arithmetic circuits

Received May 4, 2020; Revised Jun 5, 2020; Accepted Jul 27, 2020

Low power consumption is a necessity for integrated circuit design in nanometer-scale CMOS technology. Recent research indicates that approximate designs can achieve lower power dissipation than accurate designs. In most multimedia applications, DSP blocks are used as the core blocks. Most video and image processing algorithms are implemented by these DSP blocks, and the result is an image or video intended for human observation. Because human perception is limited, the output of the DSP blocks may be numerically approximate instead of exact. This concession on numerical exactness allows approximate designs to be proposed. In this work, approximate adders, an approximate compressor and approximate multipliers are proposed. Two approximate adders of the TGA type, PA1 and PA2, are proposed. PA1 comprises 14 transistors with an error distance of 2, and achieves a 64.9% reduction in delay and a 74.33% reduction in power, whereas TGA1 has 16 transistors and higher power dissipation. PA2 comprises 20 transistors with an error distance of 2. Similarly, PA2 achieves a delay reduction of 51.43% and a power reduction of 67.2%; PDP is reduced by 61.97%. TGA2, by comparison, has 22 transistors. An approximate 4-2 compressor is proposed to reduce the number of partial product stages. At circuit level the compressor takes 30 transistors with 4 errors out of 16 input combinations, whereas existing compressor design 1 takes 38 transistors and design 2 takes 36. Using the proposed adders and compressor, an approximate 4x4 multiplier is proposed. The proposed multiplier achieves a delay of 124.56 ns and a power of 29.332 uW, a reduction of 68.01% in delay and 95.97% in power compared with the accurate multiplier.


INTRODUCTION
Digital Signal Processing (DSP) blocks are most commonly used in multimedia applications whose output is an image for human recognition [1,2]. The output image need not be numerically accurate for human perception. This allows approximate computation to be used to reduce power consumption relative to conventional designs. As adders and multipliers are the main components of an ALU, these arithmetic circuits determine the overall performance of the processor. So, in order to achieve better performance and reduced power consumption, the design of approximate arithmetic circuits is needed. Various high-speed conventional adders such as carry look ahead adders (CLAs) and multipliers like Wallace tree multipliers,

ISSN: 2089-4864  Approximate arithmetic circuits (Navabharat Reddy. G) 185

only one error out of seven possible inputs. The modified K-Map is presented with a change in one output bit of an accurate 2x2 multiplier. The approximate multiplier reduces the complexity of the design and also shortens the critical path compared with the accurate multiplier. In [11], a novel multiplier design is discussed which involves multiplication of a group of coefficients in DSP blocks. A 4x4 modified array multiplier with reduced switching activity is proposed. The proposed multiplier uses new adder blocks formed by adding multiplexers to the existing blocks. The design achieves 50% less power consumption than a conventional multiplier. In [12], 4x4 conventional array and Vedic multipliers were proposed and SPICE simulations were performed. Simulation results show that the Vedic multiplier achieves a 29% reduction in power compared with the array multiplier. The hardware complexity of the array multiplier is higher than that of the Vedic multiplier. In [4], two novel approximate 4-2 compressors were implemented. The accurate 4-2 compressor, implemented in [13], requires 52 transistors, and the existing approximate compressors require 38 and 36 transistors for compressor 1 and compressor 2 respectively.
The proposed compressors in [4] were implemented at circuit level using transmission gate based technology in the HSPICE tool. The approximate compressors use far fewer transistors, and achieve lower critical path delay and much lower power consumption than exact 4-2 compressors. The compressors were used as the main block in implementing an approximate Dadda multiplier. Two Dadda multiplier designs were implemented in [4]: one uses compressor 1 in the LSBs and the other uses compressor 2 in the LSBs. The Normalized Error Distance (NED) is calculated for the Dadda multipliers and compared with other multipliers. The application of these multipliers in image processing is presented by multiplying two images.
In [14], an error tolerant multiplier was proposed. In this method the input bits are divided into a multiplier part and a non-multiplier part: the multiplier part consists of the MSBs and the non-multiplier part consists of the LSBs. The size of the error tolerant multiplier is 12 bits. The accuracy, area and power of a conventional 12-bit multiplier and the 12-bit error tolerant multiplier were compared and tabulated. New terminology such as Minimum Acceptable Accuracy (MAA) and Acceptance Probability (AP) was used in [14]. For the MSBs the normal multiplication method is applied, whereas for the LSBs a new method is applied in which no partial products are generated and the carry propagation path is removed. According to simulation results, the 12-bit error tolerant multiplier reduces power by 52% to 94%, depending on input transitions, and also reduces the area overhead. In [13], low power 4-2 and 5-2 compressors were designed. Both the accurate 4-2 compressor and the accurate 5-2 compressor can operate at a low supply voltage of 0.6 V. The 4-2 compressor consists of three XOR-XNOR blocks, two MUX blocks and one XOR block. The transistor count of the 4-2 compressor is 52. Similarly, the 5-2 compressor consists of five XOR-XNOR blocks, three MUX blocks and one XOR block. The compressors were implemented in transmission gate based technology and compared with existing compressors implemented in CMOS style; simulation results show that the 4-2 and 5-2 compressors in [13] achieve low power dissipation and less hardware complexity. In [15,16], the analysis and design of three new approximate 4-2 compressors were proposed by changing the logic of the accurate compressor, for use in multipliers. The designs reduce power dissipation and transistor count compared with the exact design. The compressor achieves better accuracy when compared to accurate compressor.
An 8-bit approximate Dadda multiplier is implemented in [15], in which both approximation and truncation are used to reduce the partial product stages. The multiplier is designed so that the 4 least significant bits are truncated and the next four bits use approximate compressors. For the MSBs, accurate compressors are used. In total the approximate multiplier uses 9 accurate compressors, 8 approximate compressors, 3 full adders and 2 half adders. The use of approximation and truncation reduces power dissipation and area overhead compared with the accurate multiplier.

PROPOSED METHOD
This section describes the working of PA1, PA2, the proposed approximate 4-2 compressor and the proposed 4x4 approximate multiplier.

Proposed adder design 1 (PA1)
Because a transmission gate passes both a strong 0 and a strong 1, it is used as an alternative to pass transistor logic. PA1 consists of TGA-based multiplexers, used to build the XOR/XNOR modules, and inverters. The sum and carry expressions of TGA1 are given in [5]. In PA1 these expressions are modified: in the sum expression the second term (XȲ) is removed, and the carry output is connected directly to input X. The total number of transistors used is 14, which is 2 fewer than the existing TGA1 in [5]. The error table of PA1 is shown in Table 1.
From Table 1 we can observe that PA1 has 2 errors in sum and 2 errors in carry. When inputs X and Y are both 1, the carry output is in error; when X and Y are 1 and 0 respectively, the sum output is in error. When both X and Y are 0, both the sum and carry outputs are in error. The circuit-level implementation of PA1 is shown in Figure 1.
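The effect of tying the carry output directly to input X can be checked against an exact full adder. The following is a minimal Python sketch rather than the authors' netlist: it models only the carry path (the modified sum expression is not reproduced here), and the function names are illustrative assumptions.

```python
from itertools import product


def exact_carry(x: int, y: int, cin: int) -> int:
    """Carry-out of an exact full adder (majority of the three inputs)."""
    return (x & y) | (cin & (x ^ y))


def pa1_carry(x: int, y: int, cin: int) -> int:
    """Assumed PA1 carry model: the output is wired straight to input X."""
    return x


# Count mismatches over all 8 input combinations of (X, Y, Cin).
carry_errors = sum(
    exact_carry(x, y, cin) != pa1_carry(x, y, cin)
    for x, y, cin in product((0, 1), repeat=3)
)
print(f"carry errors: {carry_errors} of 8")
```

Under this assumption the carry differs from the exact value in 2 of the 8 input combinations, which agrees with the carry error count quoted for Table 1.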

Proposed adder design 2 (PA2)
The sum and carry expressions of TGA2 are given in [5]. PA2 requires 20 transistors, fewer than TGA2 in [3]. The carry expression of PA2 is the same as the exact full adder carry expression, so there are zero errors in carry. The sum is obtained by inverting the carry. The error table of PA2 is shown in Table 2. From Table 2, when X and Y are both 0 there is an error in the sum. When X and Y are 1 and 0 respectively, the sum output is also in error. The carry output has zero errors. The circuit-level implementation of PA2 is shown in Figure 2. It contains a transmission gate based multiplexer for the carry and one inverter for the sum output.
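The PA2 behaviour described above (exact carry, sum taken as the inverted carry) is simple enough to enumerate. The sketch below is a behavioural model under that assumption only; it says nothing about the transistor-level circuit, and the function names are illustrative.

```python
from itertools import product


def exact_full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry) of an exact 1-bit full adder."""
    total = x + y + cin
    return total & 1, total >> 1


def pa2_full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Assumed PA2 model: carry is exact, sum is the complement of carry."""
    carry = (x & y) | (cin & (x ^ y))
    return 1 - carry, carry


sum_errors = 0
carry_errors = 0
for x, y, cin in product((0, 1), repeat=3):
    exact_sum, exact_c = exact_full_adder(x, y, cin)
    approx_sum, approx_c = pa2_full_adder(x, y, cin)
    sum_errors += int(exact_sum != approx_sum)
    carry_errors += int(exact_c != approx_c)

print(f"sum errors: {sum_errors} of 8, carry errors: {carry_errors} of 8")
```

With this model the enumeration gives 2 sum errors and no carry errors out of 8 combinations, in line with the error counts stated for PA2.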

Proposed approximate 4-2 compressor
In [13], exact 4-2 and 5-2 compressors were implemented. The exact 4-2 compressor has five inputs and three outputs. The four inputs X1, X2, X3, X4 and the output Sum have the same weight. The carry output has a weight one bit position higher. The 4-2 compressor receives an input Cin from the preceding cell, which is one bit position lower in significance, and produces an output Cout to the next compressor cell, which is one bit position higher. The different forms of the 4-2 compressor all follow the fundamental equation:

X1 + X2 + X3 + X4 + Cin = Sum + 2(Carry + Cout)

The conventional 4-2 compressor is composed of two serially connected full adders. It consists mainly of six modules: two 2-1 MUX modules, one XOR module and three XOR-XNOR modules. At circuit level the 2-1 MUX takes 8 transistors, the XOR-XNOR circuit 10 transistors and the XOR circuit 6 transistors, so the exact 4-2 compressor takes 52 transistors in total (2 x 8 + 3 x 10 + 6 = 52) in transmission gate based technology. The sum, carry and Cout expressions of the exact 4-2 compressor follow from the two-full-adder structure described above. In order to reduce the number of transistors and the power consumption, approximate compressors were implemented in [4,15] with relaxed accuracy. Two approximate 4-2 compressors were designed in [4], reducing the transistor count to 38 and 36 respectively. Design 1 has 12 errors out of 32 combinations. Design 2 has 4 errors out of 16 combinations. Despite these errors, power consumption was reduced drastically. The presence of errors in approximate designs does not noticeably affect image clarity, and human perception does not require an exact image. In this paper a new approximate 4-2 compressor is implemented. At circuit level the proposed compressor takes only 30 transistors, fewer than the existing designs, with the same number of errors.
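The exact 4-2 compressor can be modelled behaviourally as two serially connected full adders, which makes it easy to check the weight relation between its inputs and outputs and the module-level transistor count. The sketch below assumes the common arrangement in which the first full adder takes X1, X2, X3 and the second takes the intermediate sum, X4 and Cin; the names are illustrative and the model is not the transistor-level design of [13].

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of an exact 1-bit full adder."""
    total = a + b + c
    return total & 1, total >> 1


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor built from two cascaded full adders."""
    s1, cout = full_adder(x1, x2, x3)   # first stage produces Cout
    s, carry = full_adder(s1, x4, cin)  # second stage produces Sum and Carry
    return s, carry, cout


# Inputs and Sum share one weight; Carry and Cout are one position higher.
for bits in product((0, 1), repeat=5):
    s, carry, cout = exact_compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)

# Module-level transistor count quoted for the transmission gate design.
MODULE_TRANSISTORS = {"mux_2to1": 8, "xor_xnor": 10, "xor": 6}
MODULE_COUNT = {"mux_2to1": 2, "xor_xnor": 3, "xor": 1}
total_transistors = sum(
    MODULE_TRANSISTORS[name] * MODULE_COUNT[name] for name in MODULE_COUNT
)
print(total_transistors)
```

The loop covers all 32 input combinations, and the module tally reproduces the 52-transistor figure quoted for the exact design.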
The sum and carry expressions for existing design 1 and design 2 are given in [4]. In the proposed approximate 4-2 compressor, the sum and carry expressions are modified so that the number of transistors is reduced to 30. The proposed compressor has no Cin or Cout, which were present in the existing compressors [4, 13, 15]. The circuit-level implementation of the proposed design is shown in Figure 3. In Figure 3, two blocks are XOR circuits, each taking 10 transistors. The next two blocks perform the OR function, which takes 8 transistors and is implemented in MUX-based transmission gate technology. The last block performs the AND function. Hence the proposed design is implemented with 30 transistors, fewer than the designs in [4, 13, 15]. The error table for the proposed approximate 4-2 compressor is shown in Table 3.
From Table 3 we can observe that there are 4 errors out of 16 combinations, the same as in the existing approximate 4-2 compressor; the main advantage of the proposed design is the reduction in transistor count. If the inputs are 0111, the exact output is 11 but the output obtained is 10, so the difference is 1. Similarly, for the remaining three cases the output of the proposed design differs from the exact output. The error table for the proposed approximate multiplier is shown in Table 4.

Table 4. Error table for the proposed approximate multiplier

A     B     Accurate output  Approximate output  ED
0000  0000  00000000         00001110            3
0101  0111  00100011         00101110            3
1111  1111  11100001         11100111            2
1111  1011  10111101         10110011            3
1011  1001  01100011         01101101            3
1010  1101  10000010         10001111            3
1110  1101  10100100         10100010            2
0110  1000  01110010         01111110            2
0100  1000  01000000         01001110            3
0001  0100  00000010         00000110            1
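The ED column of the multiplier error table can be cross-checked from the two 8-bit output columns. The text does not restate how ED is computed for the multiplier; the sketch below assumes it is the number of bit positions in which the two outputs differ, and the column interpretation (accurate output, approximate output, ED) is likewise an assumption.

```python
# (accurate output, approximate output, ED as listed in the error table)
ROWS = [
    ("00000000", "00001110", 3),
    ("00100011", "00101110", 3),
    ("11100001", "11100111", 2),
    ("10111101", "10110011", 3),
    ("01100011", "01101101", 3),
    ("10000010", "10001111", 3),
    ("10100100", "10100010", 2),
    ("01110010", "01111110", 2),
    ("01000000", "01001110", 3),
    ("00000010", "00000110", 1),
]


def bit_distance(a: str, b: str) -> int:
    """Number of differing bit positions between two binary strings."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")


computed = [bit_distance(accurate, approx) for accurate, approx, _ in ROWS]
listed = [ed for _, _, ed in ROWS]
print(computed == listed)
```

For the ten rows listed, the bit-difference count agrees with the tabulated ED in every case, and all differing bits fall in the lower four bit positions.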

Proposed approximate 4x4 multiplier design
In the literature, various multiplier designs are implemented at circuit level, namely array, Vedic and Dadda multipliers. All these multiplier designs proceed in three stages:
− CMOS logic style AND gates generate the partial products.
− Exact or approximate 4-2/5-2 compressors reduce the number of partial product stages.
− Accurate half adders, accurate full adders or approximate adder designs such as PA1/PA2 add the partial products and generate the final output.
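The first stage, partial product generation with AND gates, can be illustrated for a 4x4 multiplier. This is a behavioural sketch with illustrative names; the reduction and final addition stages are collapsed into an exact weighted sum here, so it shows what the compressors and adders must jointly compute rather than how the proposed circuit computes it.

```python
def partial_products(a: int, b: int, width: int = 4) -> list[list[int]]:
    """Row j holds the AND of every bit of a with bit j of b."""
    return [
        [((a >> i) & 1) & ((b >> j) & 1) for i in range(width)]
        for j in range(width)
    ]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Exact reduction: add each partial product bit at weight 2**(i + j)."""
    return sum(
        bit << (i + j)
        for j, row in enumerate(pp)
        for i, bit in enumerate(row)
    )


# With exact reduction, the partial products reproduce the true product.
all_match = all(
    sum_partial_products(partial_products(a, b)) == a * b
    for a in range(16)
    for b in range(16)
)
print(all_match)
```

An approximate multiplier keeps the same partial products and replaces the exact reduction with approximate compressors and adders, which is where the output errors originate.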
In [17], a 2x2 multiplier was implemented and errors were introduced into the design by manipulating its logic function. The K-Map is modified so that when all inputs are 1 the product, which would normally be 1001 (9), becomes 111 (7); this shortens the critical path by two gates compared with the accurate 2x2 multiplier. With [17] as reference, an approximate 4x4 multiplier is proposed in this paper. In this design the proposed approximate adders and approximate compressor are used as basic blocks. The multiplier shown in Figure 4 has two 4-bit inputs, A and B, and an 8-bit output product. The circuit level of the proposed 4x4 array multiplier is implemented with an accurate 1-bit transmission gate based half adder, PA1, PA2 and the approximate 4-2 compressor. All possible input combinations are tabulated in the error table. Most of the errors are found in the LSBs. Out of 256 combinations, 32 are in error according to the error table. The circuit-level implementation of the proposed multiplier design is shown in Figure 4.
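The 2x2 approximation from [17] described above can be modelled in a few lines. The sketch is behavioural only (it does not reproduce the modified K-Map or gate network), and the function name is an illustrative assumption.

```python
def approx_mul_2x2(a: int, b: int) -> int:
    """2x2 multiplier that returns 7 (111) instead of 9 (1001) for 3 x 3."""
    if a == 3 and b == 3:
        return 7
    return a * b


# Enumerate all 16 input pairs and collect those with a wrong product.
errors = [
    (a, b)
    for a in range(4)
    for b in range(4)
    if approx_mul_2x2(a, b) != a * b
]
max_output = max(approx_mul_2x2(a, b) for a in range(4) for b in range(4))
print(errors, max_output)
```

Only the input pair (3, 3) is affected, and the largest output becomes 7, so the product fits in three bits instead of four.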
Accuracy: the accuracy of an adder indicates, as a percentage, how close the output of an approximate adder is to the exact output for a particular input. The value ranges from 0% to 100%.

RESULTS AND DISCUSSION
The proposed approximate adders (PA1, PA2), the approximate 4-2 compressor and the proposed approximate multiplier were implemented at circuit level in the Cadence Virtuoso tool in gpdk 90nm technology. The results obtained are compared with existing adders, compressors and multipliers. The proposed PA1 and PA2 require fewer transistors, with minimal errors in the sum and carry expressions. As the transistor count falls, the power of PA1 and PA2 also falls relative to existing adders. In the proposed approximate 4-2 compressor, the number of transistors required is reduced by modifying the sum and carry logic at circuit level. The number of transistors required is reduced to 30, compared with the existing compressors in [4], which require 38 for compressor 1 and 36 for compressor 2. With the transistor count reduced to 30, power consumption also falls drastically. As the proposed approximate multiplier uses PA1, PA2 and the approximate 4-2 compressor, its transistor count is also much lower than that of the accurate multiplier.

Proposed adder design (PA1)
The schematic of PA1 consists of 14 transistors. The circuit level of PA1 is implemented in the Cadence Virtuoso tool in gpdk 90nm technology.
From Figure 5 we can observe that a transmission gate based XNOR circuit and an AND gate are used to obtain the sum output. The carry output is obtained by connecting it directly to input X.

Proposed adder design (PA2)
PA2 comprises 20 transistors, fewer than the existing design 2 adder in [3], which requires 22 transistors. The circuit level of PA2 is implemented in the Cadence Virtuoso tool in gpdk 90nm technology. The schematic in Figure 7 consists of MUX-based OR and AND functions implementing the carry expression, and the sum is the inverted carry output. PA2 is also implemented in transmission gate based technology.

Proposed approximate 4-2 compressor
The proposed compressor comprises 30 transistors implemented in transmission gate based technology. The circuit-level implementation of the approximate compressor is shown in Figure 9. It has four inputs and two outputs. The compressor consists of XOR, OR and AND logic functions, which are implemented in MUX-based transmission gate technology. The circuit level is implemented in the Cadence Virtuoso tool in gpdk 90nm technology.

Proposed approximate 4 bit ripple carry adder using PA1
Using the proposed adder design PA1, a 4-bit ripple carry adder (RCA) is proposed and compared with an accurate ripple carry adder. From Figure 12 we observe that more errors are present in the LSBs than in the MSBs. The circuit level of the 4-bit RCA using PA1 is implemented in the Cadence Virtuoso tool in gpdk 90nm technology. The transistor count of the proposed 4-bit RCA is 56 (four PA1 cells of 14 transistors each). The power consumption of the proposed design is lower than that of the accurate ripple carry adder. The schematic of the proposed RCA is shown in Figure 11.

Proposed approximate 4 bit ripple carry adder using PA2
Using the proposed adder design PA2, a 4-bit ripple carry adder (RCA) is proposed and compared with an accurate ripple carry adder. From Figure 14 we observe that more errors are present in the LSBs than in the MSBs. The circuit level of the 4-bit RCA using PA2 is implemented in the Cadence Virtuoso tool in gpdk 90nm technology. The transistor count of the proposed 4-bit RCA is 80 (four PA2 cells of 20 transistors each). The power consumption of the proposed design is lower than that of the accurate ripple carry adder. The schematic of the proposed RCA is shown in Figure 13.
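A behavioural model of a 4-bit ripple carry adder built from PA2 cells can be sketched as follows. It assumes that each cell keeps the exact carry and outputs the inverted carry as its sum, as described for PA2, and that the RCA is simply four such cells in cascade; the names are illustrative, and the model makes no claim about the simulated waveforms in the figures.

```python
def pa2_full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Assumed PA2 model: carry is exact, sum is the complement of carry."""
    carry = (x & y) | (cin & (x ^ y))
    return 1 - carry, carry


def rca4_pa2(a: int, b: int, cin: int = 0) -> int:
    """4-bit ripple carry adder made of four cascaded PA2 cells."""
    result = 0
    carry = cin
    for i in range(4):
        s, carry = pa2_full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)  # bit 4 is the final carry-out


TRANSISTORS_PER_PA2 = 20
rca_transistors = 4 * TRANSISTORS_PER_PA2

# The carry chain is exact, so the final carry-out always matches.
carry_out_exact = all(
    (rca4_pa2(a, b) >> 4) == ((a + b) >> 4)
    for a in range(16)
    for b in range(16)
)
wrong_results = sum(rca4_pa2(a, b) != a + b for a in range(16) for b in range(16))
print(rca_transistors, carry_out_exact, wrong_results)
```

In this model the errors are confined to the four sum bits, while the carry-out is always exact; `wrong_results` counts how many of the 256 input pairs give a 5-bit result that differs from the true sum.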

Proposed approximate multiplier using PA1 and compressor
The proposed approximate multiplier is of size 4x4. It consists of an accurate half adder, the proposed transmission gate based approximate adder design (PA1) and the proposed approximate 4-2 compressor as the main blocks. The circuit-level implementation of the proposed multiplier is shown in Figure 15. The design is implemented in the Cadence Virtuoso tool in 90nm technology. The design has two 4-bit inputs, A and B, and an 8-bit product. From the error table we observe that, out of 256 combinations in total, errors are found in 32 combinations. Most of the errors are found in the LSBs.
The waveform of the approximate multiplier using PA1 is shown in Figure 16. The inputs of the design are A and B, each of 4 bits, and the output is the 8-bit product. From Figure 16 we observe that most of the errors are present in the LSBs, so the impact of the proposed design on image quality and overall accuracy is small. The transistor count of the proposed design is 246, far fewer than that of the accurate multiplier, and power is also reduced significantly in the approximate multiplier.

Proposed approximate multiplier using PA2 and compressor
The proposed approximate multiplier is of size 4x4. It consists of an accurate half adder, the proposed transmission gate based approximate adder design (PA2) and the proposed approximate 4-2 compressor as the main blocks. The circuit-level implementation of the proposed multiplier is shown in Figure 17. The design is implemented in the Cadence Virtuoso tool in 90nm technology. The design has two 4-bit inputs, A and B, and an 8-bit product. From the error table we observe that, out of 256 combinations in total, errors are found in 32 combinations. Most of the errors are found in the LSBs.
The waveform of the approximate multiplier using PA2 is shown in Figure 18. The inputs of the design are A and B, each of 4 bits, and the output is the 8-bit product. From Figure 18 we observe that most of the errors are present in the LSBs, so the impact of the proposed design on image quality and overall accuracy is small. The transistor count of the proposed design is 282, far fewer than that of the accurate multiplier, and power is also reduced significantly in the approximate multiplier.

Comparison with existing designs
The proposed adders, compressors and multipliers were compared in terms of number of transistors, power, delay and Power Delay Product (PDP).

Comparison of Number of Transistors for proposed designs
From Table 5 we can observe that [3] proposes adders based on transmission gate technology, where TGA1 uses 16 transistors and TGA2 uses 22 transistors, each with an error distance of 2. This paper also uses transmission gate based technology, but with fewer transistors than [3] and than adder styles that suffer a larger voltage drop across cascaded stages. From Table 6 we can observe that [13] implemented a circuit-level exact compressor design consisting of XOR-XNOR modules, MUX blocks and an XOR block. In total the exact 4-2 compressor requires 52 transistors. In [4], two approximate designs were implemented, in which design 1 requires 38 transistors with Cin and Cout and design 2 requires 36 transistors without Cin and Cout. The proposed design modifies the sum and carry expressions at transistor level and reduces the transistor count to 30.
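The transistor counts quoted above can be turned into relative savings. The short sketch below only performs that arithmetic on the counts stated in the text; the dictionary keys and function name are illustrative.

```python
def reduction_percent(baseline: int, proposed: int) -> float:
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# (baseline transistor count, proposed transistor count)
COUNTS = {
    "PA1 vs TGA1": (16, 14),
    "PA2 vs TGA2": (22, 20),
    "proposed vs exact 4-2 compressor": (52, 30),
    "proposed vs approximate design 1": (38, 30),
    "proposed vs approximate design 2": (36, 30),
}

reductions = {name: reduction_percent(*pair) for name, pair in COUNTS.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% fewer transistors")
```

On these counts the adders save roughly a tenth of the transistors of their TGA counterparts, while the compressor saves between about one sixth and two fifths depending on the baseline.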

Comparison of delay, power and PDP
The results obtained for the proposed approximate adders (PA1, PA2), the approximate 4-2 compressor and the proposed approximate multiplier are compared with existing adders, compressors and multipliers in terms of power, delay and PDP, as shown in Table 7, Table 8 and Table 9.
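Power Delay Product is simply power multiplied by delay, so it can be derived from any reported delay and power pair. The sketch below uses the delay and power figures quoted in the paper's conclusion; the resulting products are computed here for illustration and are not taken from Tables 7 to 9, which may report PDP under different conditions.

```python
def pdp_femtojoules(delay_ns: float, power_uw: float) -> float:
    """PDP in femtojoules: 1 ns x 1 uW = 1e-9 s x 1e-6 W = 1e-15 J."""
    return delay_ns * power_uw


# (delay in ns, power in uW) as quoted in the conclusion
REPORTED = {
    "PA1": (0.755, 1.13),
    "PA2": (0.824, 1.32),
    "4-2 compressor": (0.08736, 1.06),  # 87.36 ps expressed in ns
    "4x4 multiplier": (124.56, 29.332),
}

pdp = {name: pdp_femtojoules(d, p) for name, (d, p) in REPORTED.items()}
for name, value in pdp.items():
    print(f"{name}: {value:.4f} fJ")
```

Because PDP combines both metrics, a design can rank differently on PDP than on delay or power alone, which is why the three quantities are tabulated separately.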

CONCLUSION
From the literature we conclude that mirror adders and pass transistor designs suffer from signal degradation. To achieve high voltage swing with a reduced number of errors, we proposed approximate adders based on TGA. Approximate multipliers were also proposed in this paper. To reduce the partial product stages in the approximate multiplier, an approximate 4-2 compressor was implemented. The proposed designs were simulated in the Cadence Virtuoso tool in 90nm technology. The simulation results show that PA1 and PA2 use fewer transistors, 14 for PA1 and 20 for PA2, than existing adders such as TGA1 and TGA2 in [14] with the same error rate, and achieve reduced delay and power consumption. PA1 achieves a delay of 0.755 ns and a power consumption of 1.13 uW. PA2 achieves a delay of 0.824 ns and a power consumption of 1.32 uW. The proposed compressor uses 30 transistors and achieves a delay of 87.36 ps and a power consumption of 1.06 uW. The proposed approximate multiplier achieves less area with 282 transistors, a delay of 124.56 ns and a power consumption of 29.332 uW. The area, delay and power of the proposed multiplier are much lower than those of the accurate array multiplier. The proposed adders and multipliers are suited to multimedia applications whose output image does not need to be numerically exact, where approximate designs reduce power.