An efficient floating point adder for low-power devices

ABSTRACT


INTRODUCTION
Battery-operated portable electronic devices have become an indispensable part of everyday life. The key enabler is the scaling of metal-oxide-semiconductor field-effect transistors (MOSFETs) in very large-scale integration (VLSI): functionality per unit area has increased, which has brought device prices down and led to wide usage. However, scaling and the increase in functionality per unit area have also increased power consumption, and this increase has not been matched by improvements in battery capacity. Operation time per charge has therefore come down, inconveniencing users. For this reason, reducing the power consumption of portable devices has become a compelling design constraint. Energy consumption is dominated by two components: dynamic power and leakage power. To extend battery life, technology-based, architecture-based, and circuit-based solutions that reduce the sum of these two power components without sacrificing performance have to be developed. At the technology level, feature size scaling has continuously yielded lower-power circuits by reducing supply voltages. To retain performance, the threshold voltages of these circuits have also been reduced with technology scaling. However, in recent technologies, the benefits of constant-field scaling have been compromised by an exponential increase in leakage current. At the architectural level, pipelining and parallelism have helped lower the power consumption of digital circuits.
In current complementary metal-oxide-semiconductor (CMOS) technology, the benefits of device scaling are impeded by reliability issues arising from process variations, ageing effects, and soft errors. Leakage current and static power add further to the difficulty of achieving low power consumption. Hence, device scaling, which once offered advantages for low-power applications, is no longer attractive, and new architectures need to be developed to achieve low power consumption. The design of approximate computation blocks is one such potential solution [1].
Int J Reconfigurable & Embedded Syst, Vol. 13, No. 2, July 2024: 253-261, ISSN: 2089-4864
Most modern graphics processors for multimedia and other applications have dedicated digital signal processing (DSP) blocks. These applications output an image, video, or audio signal, and the limited perception of human senses allows the computations in the demanding DSP algorithms for these applications to be approximated [2]. Even an analog computation that yields good-enough rather than accurate results is acceptable [3]. Addition is one of the most fundamental and significant mathematical operations used in signal and image processing applications [4], [5]. Deterministic approximate logic or probabilistic imprecise arithmetic is normally employed for soft adders [6].
Various low-power design approaches using approximate computing have been introduced, such as algorithmic noise tolerance [7], [8], non-uniform voltage overscaling [9], and significance-driven computation [10], [11]. Verma et al. [12] presented an adder design known as the almost correct adder (ACA), which offers exponentially faster performance compared to traditional adders. They also proposed the variable latency speculative adder (VLSA), which has a slight area overhead. Additionally, some adder configurations meet real-time energy requirements by reducing complexity at the algorithmic level [13], [14]. The lower-part OR adder [15] relies on approximate logic whose truth table differs from that of a standard adder. The probabilistic full adder (PFA) [16]-[20] is based on probabilistic CMOS technology, which is a platform for modeling nano-scale designs and reducing power consumption [21]-[23].

BACKGROUND

IEEE-754 floating point format
Floating point representation offers a wider dynamic range than fixed-point representation for real numbers. However, floating point hardware is known for its complexity and substantial power consumption. The predominant standard for floating point formats is IEEE 754-2008 [24], which encompasses various basic and extended types. These formats include half precision (16 bits), single precision (32 bits), double precision (64 bits), extended precision (80 bits), and quad precision (128 bits). The typical IEEE floating point format, as depicted in Figure 1, features an exponent part with a bias of 2^(E-1)-1, where E denotes the number of exponent bits. Single precision and double precision formats are the most commonly used in contemporary computer systems. Details of the exponent and mantissa bits for the IEEE-754 basic and extended floating point types are given in Table 1.
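As an illustration of the format described above, the following minimal Python sketch computes the exponent bias 2^(E-1)-1 and splits a single-precision value into its sign, biased exponent, and mantissa fields. The function names `ieee754_bias` and `decode_single` are hypothetical and chosen only for this example; the field widths (1, 8, and 23 bits) are those of the IEEE-754 single-precision format.

```python
import struct

def ieee754_bias(exp_bits):
    """Exponent bias 2^(E-1) - 1 for a format with E exponent bits."""
    return (1 << (exp_bits - 1)) - 1

def decode_single(x):
    """Split a value, rounded to single precision, into its three fields."""
    # Reinterpret the 32-bit float as an unsigned integer bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 biased exponent bits
    mantissa = bits & 0x7FFFFF        # 23 stored mantissa bits (hidden 1 not stored)
    return sign, exponent, mantissa

sign, exponent, mantissa = decode_single(-2.5)
unbiased = exponent - ieee754_bias(8)   # true exponent of -2.5 = -1.25 * 2^1
print(sign, exponent, unbiased, hex(mantissa))
```

For single precision the bias evaluates to 127, so the value 1.0 is stored with a biased exponent of 127 and an all-zero mantissa field.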

Floating point adder architecture
A typical floating point adder architecture comprises distinct hardware components for exponent comparison, mantissa alignment, mantissa addition, normalization, and rounding of the mantissa (as depicted in Figure 2 and elaborated by Behrooz [25]). Initially, the two operands are extracted from their floating point formats, and the hidden '1' bit is added to each mantissa. The addition of floating point numbers entails a series of operations, starting with comparing the exponents and adding the mantissas. The exponents are first compared to determine the larger of the two. Depending on the result of the exponent comparison, the mantissas are swapped and then aligned to the same exponent value before undergoing addition in the mantissa adder. After the addition, normalization shifts are needed to bring the result back to the IEEE standard format. Normalization is achieved by left-shifting by the count of leading zeros, making the detection of leading zeros a critical step in this process. Finally, rounding the normalized result is the last operation before storing the result back. Special cases such as overflow, underflow, and not-a-number are also detected and indicated by flags.
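The data path described in this subsection (operand unpacking, exponent comparison, swap and alignment, mantissa addition, normalization, and rounding) can be sketched in software. The following Python sketch is an exact, not approximate, reference model under simplifying assumptions that are not from the paper: both operands are positive, normal single-precision numbers, rounding is round-to-nearest-even using three extra guard/round/sticky bits, and the only special case handled is overflow to infinity. Because both operands have the same sign, only a 1-bit right normalization shift can occur; the leading-zero left shift needed after subtraction is outside the scope of this sketch. All function names are hypothetical.

```python
import struct

def f32_bits(x):
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(u):
    """Single-precision value encoded by the 32-bit pattern u."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def fp32_add_positive(a_bits, b_bits):
    """Add two positive, normal single-precision operands given as bit patterns."""
    # Unpack exponents and restore the hidden '1' on each mantissa.
    ea, eb = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    ma = (a_bits & 0x7FFFFF) | (1 << 23)
    mb = (b_bits & 0x7FFFFF) | (1 << 23)
    # Exponent comparison: swap so that (ea, ma) is the larger operand.
    if eb > ea:
        ea, eb, ma, mb = eb, ea, mb, ma
    # Alignment: keep 3 extra bits; shifted-out bits are OR-ed into a sticky bit.
    d = min(ea - eb, 27)
    ma <<= 3
    mb <<= 3
    sticky = 1 if mb & ((1 << d) - 1) else 0
    mb = (mb >> d) | sticky
    # Mantissa addition.
    total = ma + mb
    exp = ea
    # Normalization: a carry out of the top bit needs a 1-bit right shift.
    if total >> 27:
        total = (total >> 1) | (total & 1)
        exp += 1
    # Rounding to nearest, ties to even.
    mant, grs = total >> 3, total & 7
    if grs > 4 or (grs == 4 and (mant & 1)):
        mant += 1
        if mant >> 24:          # rounding overflowed the mantissa
            mant >>= 1
            exp += 1
    if exp >= 255:              # overflow: return +infinity
        return 0x7F800000
    return (exp << 23) | (mant & 0x7FFFFF)

result = fp32_add_positive(f32_bits(1.5), f32_bits(2.25))
print(bits_f32(result))
```

In an approximate design such as the one discussed in this paper, the mantissa addition step is where an inexact adder would replace the exact `ma + mb`.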

APPROXIMATE FLOATING POINT ADDER
The approximate floating point adder design originates at the architecture level, with the exponent and mantissa adder/subtractor built from approximate fixed-point adders. An N-bit adder consists of two parts: an m-bit exact adder and an n-bit inexact adder, as shown in Figure 3. The exact part can be implemented as a conventional full adder circuit. The inexact part ignores the carry bits during computation, thereby shortening the critical path and reducing hardware utilization. The modified approximate adder concept can also be used in the mantissa adder for approximate computation. The mantissa adder offers a larger scope for approximation, because the mantissa has more bits than the exponent; at the same time, approximation in the mantissa adder has a lower impact on the error, because the mantissa is less significant than the exponent. Therefore, an inexact design is more appropriate for the mantissa adder.

Basic building block: 8-bit approximate adder
The carry equation for a conventional carry look-ahead adder is given by (1):

$$c_{i+1} = \sum_{j=0}^{i} g_j \prod_{k=j+1}^{i} p_k + c_0 \prod_{k=0}^{i} p_k \quad (1)$$

where $c_0$ is the input carry and $p_i$ and $g_i$ are the propagate and generate signals of the $i$th stage. The carry equation can be split into two segments, as in (2):

$$c_{i+1} = \sum_{j=i-W+1}^{i} g_j \prod_{k=j+1}^{i} p_k + \left[ \sum_{j=0}^{i-W} g_j \prod_{k=j+1}^{i} p_k + c_0 \prod_{k=0}^{i} p_k \right] \quad (2)$$

where $W$ is the window size. For the output carry of an N-bit adder ($i = N-1$), the first segment covers the W most significant (MS) bits and the second segment covers the N-W least significant (LS) bits. The first part of (2) is the approximate part, while the second part is called the augmenting part. For approximate carry generation with a window size of W, the output carry at the $i$th stage is computed using the approximate part only. Computing an approximate $c_{i+1}$ is faster and consumes fewer hardware resources, and hence less power, than computing the precise carry. An 8-bit adder is chosen as the basic building block for the floating point approximate adder in the proposed design. Figure 4 shows the structure of a conventional full adder. As shown in Figure 5, the 8 bits are partitioned into two blocks: an MS block of 4 bits and an LS block of 4 bits. The output carry of this 8-bit adder block is computed approximately using the approximate part of (2); the 4-bit carry generator block shown in Figure 6 generates the approximate carry for the 8-bit adder from the 4 MS bits.
Overall, 3 errors are introduced in the sum computation and 1 error in the carry computation. Assigning the inverted carry out at each stage to the sum computed for that stage reduces the hardware for the sum computation block. This is a significant reduction in hardware requirements compared to a conventional adder. Using the look-ahead carry generation logic on the 4 MS bits improves the timing performance of the circuit, because the carry no longer depends on sequential computation at each bit. A total of 8 transistors are used for 1-bit sum and carry generation.
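To make the windowed carry concrete, the following Python sketch compares the exact output carry of an 8-bit addition with an approximate output carry that uses only the W most significant stages, i.e. the approximate part of the split carry equation. It models the carry generator only and does not model the sum logic of the proposed adder. The assumptions here are not from the paper: propagate is taken as XOR and generate as AND of the operand bits, the input carry is 0, and the function names are hypothetical.

```python
def generate_propagate(a, b, n):
    """Per-stage generate (AND) and propagate (XOR) signals, LSB first."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p

def exact_carry_out(a, b, n=8, c0=0):
    """Exact output carry: the carry recurrence applied over all n stages."""
    g, p = generate_propagate(a, b, n)
    c = c0
    for i in range(n):
        c = g[i] | (p[i] & c)
    return c

def approx_carry_out(a, b, n=8, w=4):
    """Approximate output carry: only the w MS stages are considered,
    so any carry arriving from the lower n-w stages is ignored."""
    g, p = generate_propagate(a, b, n)
    c = 0
    for i in range(n - w, n):
        c = g[i] | (p[i] & c)
    return c

pairs = [(a, b) for a in range(256) for b in range(256)]
mismatches = sum(approx_carry_out(a, b) != exact_carry_out(a, b) for a, b in pairs)
print(mismatches, len(pairs))
```

Under these assumptions, the approximate carry is wrong only when all four MS stages propagate and a carry arrives from the LS block, which happens for 1920 of the 65,536 operand pairs (about 2.9%); in every such case the approximate carry is 0 where the exact carry is 1.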

Mantissa approximate adder
To realize the 23-bit approximate adder, three 8-bit adders are used. The lower two 8-bit adders are the proposed 8-bit approximate adders, while the MS byte is implemented using an exact 8-bit adder. The proposed 23-bit mantissa adder is shown in Figure 8.

Exponent adder/subtractor
Because the exponent has the greatest impact on the accuracy of the result, the exponent adder is implemented using an exact 8-bit adder. Next, we discuss the error metrics used for evaluation.

ERROR METRICS

Error distance
The error distance (ED) between two binary numbers, $a$ (erroneous) and $b$ (correct), is defined as the arithmetic distance between these two numbers:

$$ED(a, b) = |a - b| = \left| \sum_{i} a[i] \cdot 2^{i} - \sum_{j} b[j] \cdot 2^{j} \right|$$

where $i$ and $j$ are the indices for the bits in $a$ and $b$, respectively. Suppose that, for an 8-bit adder, the correct sum for a given set of operands is "11100101" and the incorrect outputs are "11100100" and "11110101". Then the two erroneous values "11100100" and "11110101" have an ED of 1 and 16, respectively.
For a non-deterministic implementation, the output is probabilistic and usually follows a distribution for a given input $a_i$. In this case, the ED of the output (denoted by $d_i$) is defined as the weighted average of the EDs of all possible outputs to the nominal output. Assume that, for a given input, the output has a nominal value $b$ but can take any value in a set of vectors $r_j$ ($1 \le j \le R$). The ED of the output is then given by (6):

$$d_i = \sum_{j=1}^{R} ED(r_j, b) \, p_j \quad (6)$$

where $p_j$ is the output probability of $r_j$ ($1 \le j \le R$).
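The worked example above can be checked with a few lines of Python. This is a minimal sketch; the helper name `error_distance` is hypothetical, and the operands are passed as binary strings (spaces used for grouping are ignored).

```python
def error_distance(erroneous, correct):
    """Arithmetic distance |a - b| between two binary strings."""
    a = int(erroneous.replace(" ", ""), 2)
    b = int(correct.replace(" ", ""), 2)
    return abs(a - b)

correct_sum = "1110 0101"
for wrong in ("11100100", "11110101"):
    print(wrong, error_distance(wrong, correct_sum))
```

An error in bit position 0 contributes a distance of 1, while an error in bit position 4 contributes 16, which is why errors confined to the least significant bits are the cheapest to tolerate.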

Mean error distance
The mean error distance (MED) $d_m$ of a circuit for non-deterministic inputs with a certain probability of occurrence is defined as the mean value of the EDs of all possible outputs for each input. Assuming that the inputs are $a_i$ ($1 \le i \le S$) and that the probability of occurrence of each vector is $q_i$ ($1 \le i \le S$), the MED is given by (7):

$$d_m = \sum_{i=1}^{S} d_i \, q_i \quad (7)$$

where $d_i$ is the ED of the outputs for input $a_i$. For a uniformly distributed system, all inputs have an equal probability of occurrence, and hence $q_i$ is the same for all input vectors.
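The two averages in (6) and (7) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: inputs are uniformly distributed, the adder under test is deterministic (so each input has a single ED), and `toy_adder`, which simply clears the least significant bit of the exact sum, is a hypothetical stand-in used only to exercise the metric; it is not the adder proposed in this paper.

```python
from itertools import product

def weighted_ed(nominal, outcomes):
    """Eq. (6): ED of a probabilistic output.
    outcomes is a list of (value, probability) pairs."""
    return sum(abs(value - nominal) * prob for value, prob in outcomes)

def mean_error_distance(approx_add, n_bits=8):
    """Eq. (7) for uniformly distributed inputs; also returns the maximum ED."""
    size = 1 << n_bits
    q = 1.0 / (size * size)          # equal probability for every input pair
    med, max_ed = 0.0, 0
    for a, b in product(range(size), repeat=2):
        ed = abs(approx_add(a, b) - (a + b))
        med += ed * q
        max_ed = max(max_ed, ed)
    return med, max_ed

def toy_adder(a, b):
    """Hypothetical stand-in: exact sum with the least significant bit cleared."""
    return (a + b) & ~1

print(mean_error_distance(toy_adder))
print(weighted_ed(229, [(229, 0.5), (228, 0.25), (245, 0.25)]))
```

The same harness could be applied to any bit-accurate model of an approximate adder by passing it in place of `toy_adder`.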

SIMULATION AND RESULTS
The proposed adder circuit is simulated in the Cadence environment for delay, power consumption, and error analysis. The results are presented below.

Error metrics for proposed 8-bit adder
The proposed 8-bit adder with a window size of W=4 is simulated for all possible input combinations of a and b. For all 256×256 combinations, the approximate and exact sums and carries are computed, and the error distances between the approximate and accurate outputs are evaluated. The maximum ED for the 8-bit adder is 3.

Delay
In a conventional 8-bit adder, the delay is due to the ripple carry effect: the carry must propagate through all 8 stages. Assuming the delay of computing a 1-bit full adder result to be T, the delay in generating the 8-bit result is 8T. In the proposed adder, the total delay equals the delay for computing the carry out from the 4 MS bits, which is 4T.

Power consumption tradeoff
The energy consumed by a probabilistic inverter increases exponentially as the probability of obtaining the correct output rises. For approximate implementations, power consumption is generally viewed as being directly proportional to the number of gates involved. In the proposed adder circuits, the reduction in the number of transistors enables a lower operating voltage, resulting in an overall reduction in power consumption. For a traditional full adder, if the power consumed for a 1-bit operation is normalized to 1, then the power consumed by a k-bit conventional adder amounts to k. In the proposed 8-bit adder, however, the reduction in the number of transistors lowers the operating voltage from 1.13 V in an accurate implementation to 1.04 V. Consequently, the estimated power consumption is 7.5% lower than that of the conventional adder.
Both the discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) blocks operate at a lower supply voltage with approximate adders than in the exact mode. In the exact mode, the DCT and IDCT operate at supply voltages of 1.28 V and 1.13 V, respectively. The supply voltages for different approximations and truncations over varied numbers of bits are shown in Figures 9 and 10. Table 3 lists the percentage power savings for the different approximations and truncations against the base case. Approximation 3 saves the most power.

CONCLUSION
A novel approximate adder topology for a single-precision floating point adder is presented in this paper. The proposed design takes advantage of the fact that the addition of the less significant bits can be approximated without greatly affecting the result, while the power savings due to the approximate computation are significant. The proposed configuration has a lower propagation delay and comparable error performance relative to other architectures. The proposed mantissa adder, a hybrid of a carry look-ahead adder for carry generation and an approximate adder for sum generation, gives a distinct advantage in power consumption over the conventional full adder.

Figure 1. General IEEE-754 floating point format

Table 1. Exponent and mantissa bits for IEEE-754 basic and extended floating point types

Table 2. Truth table for proposed adder