ISSN: 2089-4864, DOI: 10.11591/ijres.v14.i1.pp117-125

# Design of medium grain integrated clock gater for low power clock network

# Shylashree Nagaraja<sup>1</sup>, Abhinav Sathisha<sup>1</sup>, Mamatha Aruvanalli Shivaraj<sup>2</sup>, Latha Bavikatte Nanjundappa<sup>3</sup>, Prakash Tunga Pandeshwara<sup>4</sup>

<sup>1</sup>Department of Electronics and Communication Engineering, RV College of Engineering, Bengaluru, India

<sup>2</sup>Department of Electronics and Communication, NITTE (Deemed to be University), NMAM Institute of Technology, Nitte, India

<sup>3</sup>Department of Electronics and Communication Engineering, JSS Academy of Technology, Bengaluru, India

<sup>4</sup>Department of Electronics and Communication Engineering, RNS Institute of Technology, Bengaluru, India

#### **Article Info**

# Article history:

Received Oct 7, 2023

Revised Aug 7, 2024

Accepted Sep 22, 2024

# Keywords:

Flow regression

Integrated clock gating

Low power

Medium grain

Physical synthesis

#### **ABSTRACT**

Very large scale integration (VLSI) applications were once driven mainly by area, reliability, and cost rather than power. The growing emphasis on power is mainly due to the recent growth of electronic products such as portable mobile phones, laptops, and other devices that need high speed and low power consumption. Power analysis provides insight into the switching activity of the sequential logic and thus helps power optimization approaches to be incorporated early in the design flow. Inserting a medium grain integrated clock gater helps the synthesis flow apply other low-power techniques. The power analysis is performed with a physically driven synthesis network for both leakage and dynamic power. It revealed that medium grain clock gaters enable finer granularity of clock gating, thus improving gating efficiency.
The medium grain clock gating techniques help the tool understand the activity of the various sinks, which in turn helps in the insertion of fine gaters. For a single medium grain clock gater, the power savings obtained were 41.37% and 79.35% without and with fine gater insertion, respectively, while cloning of the medium gaters resulted in 45.1% and 67.4% power savings without and with fine gater insertion, respectively. The fine-grain integrated clock gating insertion incurred a maximum increase of 14.7% in gate count.

This is an open access article under the <u>CC BY-SA</u> license.

# Corresponding Author:

Shylashree Nagaraja

Department of Electronics and Communication Engineering, RV College of Engineering

RV Vidyanikethan Post, 8th Mile, Mysuru Road, Bengaluru 560059, Karnataka, India

Email: shylashreen@rvce.edu.in or drshylashreen@gmail.com

#### 1. INTRODUCTION

Previously, very large scale integration (VLSI) applications were primarily determined by area, reliability, and cost rather than power. The recent growth of electronic products such as portable mobile phones, laptops, and other devices that require high speed and low power consumption is primarily responsible for the rising power demand. The main disadvantage of portable devices has been that they consume a lot of power, which reduces battery life and can cause failures in the silicon parts of the devices. High power consumption also calls for costly packaging and cooling arrangements to control heat levels. As a result, low-power devices are critical in the semiconductor industry today. At the same time, the critical path delay of the devices must be reduced while their power is decreased. Low static power consumption has long been associated with complementary metal oxide semiconductor (CMOS) technology.
It is widely used in the design of VLSI circuits such as digital logic devices, integrated circuits, microprocessors, and microcontrollers. However, as technology advances, there is a greater demand for lower power consumption, smaller footprints, and high-speed performance. As integrated-circuit process technologies advance, chip density and operating frequency have increased, raising concerns about power consumption in battery-powered portable and non-portable devices.

Journal homepage: http://ijres.iaescore.com

Dustin is a 16-core parallel ultra-low-power cluster with fully adjustable bit precision from 2b to 32b and a vector lockstep execution mode [1]. It is designed to be used in edge devices for computationally demanding tasks such as deep neural network inference. Dustin's high performance and efficiency are the result of many design choices. Fine-grained performance and efficiency tuning is possible thanks to the 16 RISC-V cores with 2b-to-32b bit-precision arithmetic. Tasks requiring high accuracy, for example, can be executed with 32-bit precision, whereas tasks requiring lower accuracy can be executed with 2b precision. This adaptability enables Dustin to perform well across a wide range of tasks [2]. In highly data parallel kernels, the vector lockstep execution mode (VLEM) reduces power consumption even further. In VLEM, a single leader core retrieves instructions and broadcasts them to the 15 follower cores. This reduces the number of instruction fetches that must be performed, lowering power consumption. Dustin's high efficiency is also due to the 65 nm CMOS technology used to implement it [3]. Because of its high performance, efficiency, and flexibility, it is well suited for a wide variety of computationally intensive tasks.

A technique described in the literature [4] reduces power consumption in digital circuits by efficiently aligning flip-flops within the clock network.
This methodology employs a virtual tile-based approach to place flip-flops strategically so that clock signal distribution is minimized and, as a result, power consumption is reduced [5]. The virtual tile-based approach divides the chip area into virtual tiles. Each tile represents a specific location where a group of flip-flops can be placed optimally. The clock network's complexity and power consumption are managed more effectively by organizing the placement of flip-flops based on this tile structure. Instead of distributing flip-flops randomly across the chip, this methodology aligns them within their designated virtual tiles [6]. This alignment aims to reduce signal propagation delays and overall power consumption by minimizing the distance that clock signals must travel. The capacitance and switching activity associated with clock distribution are reduced by shortening clock signal paths, resulting in lower power consumption [7]. Power-aware placement is an approach that considers power consumption during the physical design phase, ensuring that flip-flop placement is optimized to meet power reduction targets. While the methodology's goal is to optimize power consumption, it may involve trade-offs with other design parameters such as performance, area utilization, and signal integrity [8]. It is critical to balance these factors to achieve an overall effective design. To provide a more comprehensive overview of this methodology's effectiveness and applicability, technical data and specific details such as algorithms, simulation results, and comparisons with other power optimization techniques would be required.

Research by Tunga *et al.* [9] focuses on the problem of determining the best path for two mobile sinks in a wireless sensor network that share a common junction. It aims to maximise data collection efficiency while conserving energy.
The research presents a novel algorithm that takes sink mobility and network topology into account [10]. It reduces energy consumption while increasing data retrieval by dynamically adjusting the paths of mobile sinks. The proposed method is tested using simulations and outperforms existing methods in terms of data delivery rate and energy efficiency. This study provides important insights into improving the performance of wireless sensor networks with mobile sinks and shared junctions.

Research by Afridi *et al.* [11] introduces a method for quickly producing shift timing constraints and sanity checks. It suggests an efficient algorithm for automating the process of defining timing constraints within a system or software. The algorithm identifies critical timing parameters by analysing the system's design and requirements, ensuring proper component synchronisation. It also includes sanity checks to detect and prevent errors. This method simplifies the process of defining timing constraints, reducing human error and saving time during the development phase. It provides a practical solution for improving system performance and reliability while simplifying design.

# 2. METHOD

The synthesis and optimization processes are described below. Synthesis optimization is critical in determining the quality of results (QoR) of the design in terms of power, performance, and area (PPA). Figure 1 depicts the various steps taken by the optimization [12] engine present in the synthesis tool. Optimization occurs at various hierarchy levels of the design to be synthesized as well as at various stages of the synthesis process. In logic optimization [13], the goal is to remove as much redundant logic as possible. The different logic functions are optimized in the following order: high-speed data path, multiplexer logic, and sequential elements. Further, in physically aware synthesis, technology-based, constraint-driven optimization for timing and power is performed.
Upon optimization of timing and power, area recovery is performed to squeeze out maximum performance for the minimum area possible [14]. The tool runs multiple iterations to converge the design for a constrained area and compares the timing and power of the present iteration with those of the previous iterations. The tool is equipped with complex optimization algorithms such that the iteration providing the global minimum for a batch of runs becomes the seed for the next batch of iterations. The optimization ends when the variance of the global minimums obtained in subsequent runs becomes negligible [15]. The architecture of the processor core that contains the implemented cache block is depicted in Figure 2.

Figure 1. Synthesis optimization flow

Figure 2. High level processor architecture

Figure 2 represents the high-level architecture of the processor. The instruction cache is part of the L1 cache, which is smaller in size and faster in operation as it is the closest to the execution engine. The output of the instruction cache is fed to the decoder, which decodes the instructions and places them in the operation queue. The instructions are also fed to the branch predictor module to analyze any branching possibilities of the instructions from the data write-back [16]. This helps in improving the efficiency of the processor, as a wrong branch prediction forces the pipeline to be flushed. Modern processors have deep pipelines, hence there is a need for superior branch prediction algorithms to ensure peak efficiency of the processor. The instructions from the operation queue are scheduled into the execution engines. The scheduled instructions are then presented to the execution engine of the processor, which forms the back-end data flow of the processor. The execution engine can contain arithmetic logic units (ALUs), floating-point arithmetic units, multipliers, accumulators, and accelerators.
The instructions are executed on the data and the results are passed to the load/store unit, which determines the data and instructions to be read or written back for the next set of operations to be performed. The data cache handles the data flow into the processor core [17]–[20].

Figure 3 represents the physical synthesis implementation of the cache block present in the front-end data flow of the processor. The various macros, containing the static random access memory (SRAM), the built-in self-test (BIST), and counters, are synthesized along with the standard cell logic [21]. The cache module here has an SRAM macro for storing the instructions, while the standard cell logic incorporates the functionality of the instruction cache, enabling high-speed communication between the front-end data flow units of the processor. The BIST macro is used for design for testability (DFT) analysis of the SRAM macro. It is used to verify proper functioning of the SRAM macro [22]. The macros are surrounded by sufficient halo spacing so that physical verification checks such as design rule check (DRC) and layout versus schematic (LVS) remain manageable for any movement in the designed seed floorplan. This also aids future improvements of the design with respect to floorplan changes in place and route (PnR). In general, processor core designs have tighter budgets than the other components of the SoC [23]. This has an impact on PPA convergence, as large trade-offs are incurred to maintain the desired performance metrics.

Figure 3. Synthesized netlist

# 3. RESULTS AND DISCUSSION

The data furnished below represent the physical synthesis metrics obtained for the implemented cache block. Figure 4 represents the medium grain clock gater register transfer level (RTL) description mapped to standard cell logic. The medium grain clock gater consists of two latches and 'AND' gates.
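As a behavioural illustration of latch-plus-AND gating, the following minimal Python sketch models a conventional single-latch integrated clock gate. It is an assumption made for illustration only: the medium grain gater of Figure 4 uses two latches and its exact structure is not reproduced here, and the names `gated_clock`, `clk`, and `en` are hypothetical.

```python
def gated_clock(clk, en):
    """Model a latch-based clock gate, one sample per half clock cycle.

    clk, en: equal-length sequences of 0/1 samples.
    The latch is transparent while clk is low and holds while clk is high,
    so a change on `en` during the high phase cannot glitch the output.
    """
    latched_en = 0  # latch state, assumed to reset to 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            # Transparent phase: the latch follows the enable input.
            latched_en = e
        # AND gate: the clock passes only while the latched enable is 1.
        out.append(c & latched_en)
    return out


clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [1, 1, 0, 1, 1, 0, 0, 0]
gclk = gated_clock(clk, en)
print(gclk)       # gated clock samples
print(sum(gclk))  # number of clock pulses that reach the sinks
```

In this toy trace only two of the four clock pulses reach the sinks, and the enable changes that occur while the clock is high neither create nor truncate a pulse.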
The latches store the enable signal, while the AND gates gate the clock signal based on the enable signal. The use of latches can impose additional timing checks to ensure proper functionality of the gater design. It also implies precise control over the switching time frames of the gater for power savings, thus providing an architectural feature for low power designs. The gater can be retained as described in RTL by applying a don't-touch attribute in the physical synthesis flow, or it can be further optimized by the tool to reduce toggle activity [24]. The same gater is cloned to reduce the effective capacitance driven by each gater, which depends on the number of sinks connected to it. Each clone remains functionally correct as a clock gating element while driving fewer sinks, which helps timing analysis by improving the launch side delay for setup checks. Cloning is thus a trade-off between power and timing.

Table 1 lists the power of the physically synthesized design without any low power techniques applied in the design flow. The amount of dynamic power indicates the need for low power techniques in highly integrated designs to improve their power efficiency and operational lifetime. The physical synthesis flow incorporates low power related parameters with suitable values so that it can converge to a reasonable power metric without a large trade-off in timing. The primary metric impacted by using a power driven flow or including low power parameters in the flow is timing. Area recovery is usually incorporated in the optimization engine of the synthesis tool and is generally a second priority compared to timing or performance.
Hence there is a need for a detailed understanding of the impact of such flow parameters on design convergence at the physical synthesis stage. Asserting low power with clock gating is one of the parameters in the physical synthesis flow. If enabled, this parameter inserts clock gating elements in the synthesized netlist depending on the amount of module ungrouping specified in the flow [25].

Figure 4. Medium grain integrated clock gater

Table 1. Design metrics without clock gating

| Design metrics | Values |
|----------------|--------|
| Gates | 7746 |
| Area | 13420 μm<sup>2</sup> |
| Leakage power | 27990.597 nW |
| Dynamic power | 1343148.39 nW |
| Total power | 1371138.98 nW |

Usually, if the design is completely flattened, fine grain clock gating elements are added because better knowledge of the activity of the design elements is available. The designer can achieve better power efficiency depending on the architecture of the design and the appropriate low power techniques applied to the design. The same steps to achieve low power should be incorporated in the synthesis flow. It is possible to control the type of clock gating elements that can be inserted into the design. The list of suitable clock gating elements can be compiled in a file and fed to the design flow so that the tool is allowed to pick clock gating elements only from the cells mentioned in the file. The work here arranges the design hierarchy such that medium grain gaters are inserted in RTL, and flattening of sub-modules enables the tool to spend more time providing optimized fine grain gating in the flattened netlist. This also makes power recovery easier, as the optimization is limited to the modules that are flattened. Figure 4 illustrates the power report obtained for the design synthesized with both medium grain and fine grain integrated clock gating elements.
Table 2 provides the power savings obtained by applying a single medium grain integrated clock gater to the design.

Table 2. Design metrics with single medium grain integrated clock gater

| Design metrics | Without fine gater | With fine gater |
|----------------|--------------------|-----------------|
| Gates | 7602 | 8365 |
| Area | 13480 μm<sup>2</sup> | 13410 μm<sup>2</sup> |
| Leakage power | 28822.773 nW | 32195.061 nW |
| Dynamic power | 702679.878 nW | 250822.902 nW |
| Total power | 731502.650 nW | 283017.963 nW |

Clock gating elements usually provide significant power savings when the sinks are sequential logic or macros, as these are large cells with a large cumulative load capacitance. Reducing the activity on such a large cumulative load capacitance leads to good power savings. The design is synthesized with the physically driven flow both without and with fine grain integrated clock gaters. The additional activity reduction analyzed by the tool provides a greater clock gating efficiency. Thus, there is a substantial reduction in the power consumption of the design. The reduction in power observed is 79.35% with fine grain integrated clock gaters and 41.37% without them.

The power savings are obtained at the cost of timing performance. A single medium grain integrated clock gater connected to a large capacitive load draws a large surge current during the transition from the OFF state to the ON state and vice versa. The fanout of the medium grain integrated clock gater is also large, hence the launch path of the timing paths may see a large delay. This can impact setup timing analysis, as improving this large delay is not easy. Changes such as increasing the size of the cell or swapping the threshold voltage (Vt) of the gater cell would not significantly improve the launch path delay.
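The link between activity, load capacitance, and power in this discussion follows the standard first-order expression for CMOS switching power. It is given here as a reminder, not as the model used by the synthesis tool; the symbols $\alpha$ (switching activity), $C_L$ (switched load capacitance), $V_{DD}$ (supply voltage), and $f$ (clock frequency) are notation introduced for this sketch.

$$P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f$$

Gating the clock lowers $\alpha$ for every sink behind the gater, so the saving grows with the cumulative $C_L$ that stops toggling. The same $C_L$ is the load that the single gater must drive, which is why a large saving and a large launch path delay appear together.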
It is also not feasible to custom construct the routes from the integrated clock gater to the major sinks to improve the delay in the launch path. Adding buffers or inverter chains on these paths may introduce functional verification failures, which may need additional effort to address and verify. Hence there is a need to improve the timing performance of the design without losing the obtained power savings. This is achieved by cloning the medium grain integrated clock gater element through the design flow.

Table 3 provides the data on power savings obtained by cloning the medium grain integrated clock gating element through the physical synthesis flow. It can be observed that the dynamic power component of the design is reduced by cloning the medium grain integrated clock gaters without insertion of the fine grain integrated clock gating elements. The distribution of the capacitive load across the cloned medium grain integrated clock gaters is better optimized by the tool for activity reduction. The static power of the design does not vary much across the runs carried out to demonstrate the different clock gating scenarios. The maximum difference observed in static power consumption is 17.7% for a 14.7% increase in the gate count, which approximately scales with the gate count increase. The design with insertion of fine grain integrated clock gaters shows improved timing performance, with a difference of 12% in power savings between the single medium grain integrated clock gater and the cloned medium grain integrated clock gater. This leads to the conclusion that cloning of integrated clock gating cells introduces an important trade-off between power and timing for the designer. An increase in the number of cloned integrated clock gaters increases the leakage power consumption, while the dynamic power optimization is left to the tool and the timing of the design improves.
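The reported savings can be cross-checked against the total power values in Tables 1 to 3. The sketch below is a reader-side check, not part of the authors' flow. It assumes that savings are defined as the relative reduction in total power with respect to Table 1, and the names `BASELINE_NW`, `TOTAL_NW`, and `savings_percent` are hypothetical.

```python
# Total power in nW, copied from Tables 1 to 3.
BASELINE_NW = 1371138.98  # Table 1: no clock gating

TOTAL_NW = {
    "single, without fine gater": 731502.650,  # Table 2
    "single, with fine gater": 283017.963,     # Table 2
    "cloned, without fine gater": 692395.344,  # Table 3
    "cloned, with fine gater": 446571.951,     # Table 3
}


def savings_percent(total_nw, baseline_nw=BASELINE_NW):
    """Relative reduction in total power versus the ungated baseline (assumed definition)."""
    return 100.0 * (1.0 - total_nw / baseline_nw)


for name, total in TOTAL_NW.items():
    print(f"{name}: {savings_percent(total):.2f}%")
```

Under this assumption the two with-fine-gater cases evaluate to about 79.36% and 67.43%, in line with the reported 79.35% and 67.4% and with the 12% difference noted above. The two without-fine-gater cases evaluate to about 46.65% and 49.50%, which are higher than the reported 41.37% and 45.1%, so those reported figures may rest on a baseline or metric that the text does not state.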
There is a need to develop a flow regression that could provide insights on the optimum number of integrated clock gaters to be cloned, such that there is little degradation in timing while as much power as possible is saved. Table 4 compares the power savings of this work with the studies used in developing it. Research by Ottavi *et al.* [1] claims a maximum power savings of 38% in the instruction cache design by operating in different processor instruction fetch modes, while Kwon *et al.* [4] report a maximum power savings of 14.1% by applying low power techniques involving flop multi-banking, register multi-banking, and connections to integrated clock gaters based on the placement of these sequential sinks.

Table 3. Design metrics with cloning of medium grain integrated clock gater

| Design metrics | Without fine gater | With fine gater |
|----------------|--------------------|-----------------|
| Gates | 7783 | 8541 |
| Area | 13540 μm<sup>2</sup> | 13470 μm<sup>2</sup> |
| Leakage power | 29553.729 nW | 32957.946 nW |
| Dynamic power | 662841.615 nW | 413614.005 nW |
| Total power | 692395.344 nW | 446571.951 nW |

Table 4. Power savings comparison

| Design metric | Without fine gater | With fine gater | Cloning and without fine gater | Cloning and with fine gater | Ottavi *et al.* [1] | Kwon *et al.* [4] |
|---------------|--------------------|-----------------|--------------------------------|-----------------------------|---------------------|-------------------|
| Power savings | 41.37% | 79.35% | 45.1% | 67.4% | 38% | 14.1% |

# 4. CONCLUSION

Power analysis provides insight into the switching activity of the sequential logic and thus helps power optimization approaches to be incorporated early in the design flow. Inserting a medium grain integrated clock gater helps the synthesis flow apply other low power techniques.
The power analysis is performed with a physically driven synthesis network for both leakage and dynamic power. It revealed that medium grain clock gaters enable finer granularity of clock gating, thus improving gating efficiency. The medium grain clock gating techniques help the tool understand the activity of the various sinks, which in turn helps in the insertion of fine gaters. For a single medium grain clock gater, the power savings obtained were 41.37% and 79.35% without and with fine gater insertion, respectively, while cloning of the medium gaters resulted in 45.1% and 67.4% power savings without and with fine gater insertion, respectively. The fine grain integrated clock gating insertion incurred a maximum increase of 14.7% in gate count.

#### REFERENCES

- [1] G. Ottavi *et al.*, "Dustin: a 16-cores parallel ultra-low-power cluster with 2b-to-32b fully flexible bit-precision and vector lockstep execution mode," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 70, no. 6, pp. 2450–2463, 2023, doi: 10.1109/TCSI.2023.3254810.
- [2] T. Chindhu S. and N. Shanmugasundaram, "Clock gating techniques: an overview," in *2018 Conference on Emerging Devices and Smart Systems (ICEDSS)*, Tiruchengode, India, 2018, pp. 217–221, doi: 10.1109/ICEDSS.2018.8544281.
- [3] G. Hyun and T. Kim, "Flip-flop state driven clock gating: concept, design, and methodology," in *IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD*, IEEE, Nov. 2019, pp. 1–6, doi: 10.1109/ICCAD45719.2019.8942061.
- [4] T. Kwon, M. Imran, D. Z. Pan, and J.-S. Yang, "Virtual-tile-based flip-flop alignment methodology for clock network power optimization," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 28, no. 5, pp. 1256–1268, May 2020, doi: 10.1109/TVLSI.2020.2966912.
- [5] G. Yang and T.
Kim, "Design and algorithm for clock gating and flip-flop co-optimization," in *IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD*, New York, NY, USA: ACM, Nov. 2018, pp. 1–6, doi: 10.1145/3240765.3240793.
- [6] M. Sanaeepur, "Crosstalk delay and stability analysis of MLGNR interconnects on rough surface dielectrics," *IEEE Transactions on Nanotechnology*, vol. 18, pp. 1181–1187, 2019, doi: 10.1109/TNANO.2019.2945354.
- [7] N. Egidos *et al.*, "20-ps resolution clock distribution network for a fast-timing single-photon detector," *IEEE Transactions on Nuclear Science*, vol. 68, no. 4, pp. 434–446, Apr. 2021, doi: 10.1109/TNS.2021.3057581.
- [8] K. S. Zaman, M. B. I. Reaz, S. H. Md Ali, A. A. A. Bakar, and M. E. H. Chowdhury, "Custom hardware architectures for deep learning on portable devices: a review," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 33, no. 11, pp. 6068–6088, Nov. 2022, doi: 10.1109/TNNLS.2021.3082304.
- [9] S. Tunga, S. V. Chakrasali, N. Shylashree, B. N. Latha, and A. S. Mamatha, "Optimal path discovery for two moving sinks with a common junction in a wireless sensor network," *Indonesian Journal of Electrical Engineering and Computer Science*, vol. 23, no. 2, pp. 879–889, Aug. 2021, doi: 10.11591/ijeecs.v23.i2.pp879-889.
- [10] P. Bhattacharjee, P. Rana, B. K. Bhattacharyya, and A. Majumder, "Clock-gated variable frequency signaling to alleviate power supply noise in a packaged IC," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 41, no. 6, pp. 1704–1715, Jun. 2022, doi: 10.1109/TCAD.2021.3099438.
- [11] S. M. A. Afridi, N. Shylashree, S. Tunga, and L. B. Nanjundappa, "An effective way to generate the shift timing constraints and sanity checks," *Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)*, vol. 30, no. 3, pp. 1399–1406, Jun. 2023, doi: 10.11591/ijeecs.v30.i3.pp1399-1406.
- [12] S. Madhavan, R. Phadke, and N.
Bhagyashree, "Physical design and implementation of lakshya - sub-system of built in self test system," in *2021 International Conference on Circuits, Controls and Communications, CCUBE 2021*, IEEE, Dec. 2021, pp. 1–6, doi: 10.1109/CCUBE53681.2021.9702732.
- [13] P. U. Sathyakam, P. S. Mallick, and P. Singh, "Geometry-based crosstalk reduction in CNT interconnects," *Journal of Circuits, Systems and Computers*, vol. 29, no. 6, May 2020, doi: 10.1142/S0218126620500942.
- [14] S. Majji, T. R. Patnala, M. Valleti, C. S. Pasumarthi, S. Kothapalli, and S. R. Karanam, "A study on the comprehensive analysis of electro migration for the nano technology trends," in *2020 6th International Conference on Advanced Computing and Communication Systems, ICACCS 2020*, IEEE, Mar. 2020, pp. 898–901, doi: 10.1109/ICACCS48705.2020.9074328.
- [15] D. M. T. Nguyen, T. Van Quang, A. H. Nguyen, and M. S. Nguyen, "Advanced on-chip variation in static timing analysis for deep submicron regime," in *Proceedings - 2020 International Conference on Advanced Computing and Applications, ACOMP 2020*, IEEE, Nov. 2020, pp. 130–134, doi: 10.1109/ACOMP50827.2020.00026.
- [16] H. Cheng, X. Li, Y. Gu, and P. A. Beerel, "Converting flip-flop to clock-gated 3-phase latch-based designs using graph-based retiming," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 41, no. 4, pp. 979–992, Apr. 2022, doi: 10.1109/TCAD.2021.3068109.
- [17] B. M. Shah and U. Mehta, "Development of static timing analysis tool in perl," in *Proceedings - 5th IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology, RTEICT 2020*, pp. 252–255, 2020, doi: 10.1109/RTEICT49044.2020.9315618.
- [18] V. G. Srivatsa, A. P. Chavan, and Di.
Mourya, "Design of low power high performance multi source h-tree clock distribution network," in *Proceedings of 2nd International Conference on VLSI Device, Circuit and System, VLSI DCS 2020*, 2020, doi: 10.1109/VLSIDCS47293.2020.9179954.
- [19] D. Mishagli, E. Koskin, and E. Blokhina, "Path-based statistical static timing analysis for large integrated circuits in a weak correlation approximation," in *Proceedings - IEEE International Symposium on Circuits and Systems*, IEEE, May 2019, pp. 1–5, doi: 10.1109/ISCAS.2019.8702198.
- [20] S. Lyu and Z. Shi, "On-chip process variation sensor based on sub-threshold leakage current with weak bias voltages," in *17th IEEE International Conference on IC Design and Technology, ICICDT 2019 - Proceedings*, IEEE, Jun. 2019, pp. 1–4, doi: 10.1109/ICICDT.2019.8790891.
- [21] I. L. Tseng, Z. C. Lee, V. Tripathi, C. M. T. Yip, Z. Chen, and J. Ong, "A system for standard cell routability checking and placement routability improvements," in *Proceedings - APCCAS 2019: 2019 IEEE Asia Pacific Conference on Circuits and Systems: Innovative CAS Towards Sustainable Energy and Technology Disruption*, IEEE, Nov. 2019, pp. 125–128, doi: 10.1109/APCCAS47518.2019.8953119.
- [22] U. Gandhi, I. Bustany, W. Swartz, and L. Behjat, "A reinforcement learning-based framework for solving physical design routing problem in the absence of large test sets," in *2019 ACM/IEEE 1st Workshop on Machine Learning for CAD, MLCAD 2019*, IEEE, Sep. 2019, pp. 1–6, doi: 10.1109/MLCAD48534.2019.9142109.
- [23] S. J. Kulkarni and N. S. Murty, "Methodology for congestion reduction and timing closure during placement," in *2019 3rd International Conference on Electronics, Materials Engineering and Nano-Technology, IEMENTech 2019*, IEEE, Aug. 2019, pp. 1–4, doi: 10.1109/IEMENTech48150.2019.8981032.
- [24] S. Abhinav, S. Srinivasan, A. Ganesan, M. R. Anala, and T.
Mamatha, "Wireless water quality monitoring and quality deterioration prediction system," in *Proceedings - 26th IEEE International Conference on High Performance Computing Workshops, HiPCW 2019*, IEEE, Dec. 2019, pp. 23–28, doi: 10.1109/HiPCW.2019.00013.
- [25] S. Abhinav, D. Sagar, and K. B. Sowmya, "Modified floating point adder and multiplier IP design," in *Lecture Notes in Networks and Systems*, 2023, pp. 347–361, doi: 10.1007/978-981-19-7874-6\_25.

#### **BIOGRAPHIES OF AUTHORS**

**Dr. Shylashree Nagaraja** is currently working as Associate Professor in the Department of Electronics and Communication Engineering at RV College of Engineering, Bengaluru. She has 17 years of teaching experience. She was a recipient of the best Ph.D. thesis award for the year 2016-2017 in electronics and communication engineering from BITES. She received the best IEEE researcher award at the IEEE-AGM meeting held during 2021 from the Bangalore IEEE section. She also received the best paper award at IEEE-ICERECT held during 2015 at Mandya. She has research publications in 40 international journals (of which 12 are SCI journals), 6 Springer book chapters, and 10 international conferences. She has received one US patent grant and two Indian patent grants in cryptography. She has also received two Indian patent grants in the area of VLSI. She is also the co-author of the network theory, engineering statistics and linear algebra, and control engineering textbook. She has funded projects and consultancy projects and has delivered many technical talks on VLSI. She has delivered lectures as a subject matter expert in the VTU eshikshana and EDUSAT programs. She is a recipient of an international travel grant under the SERB young research scholar category. She is a life member of ISTE and IETE, fellow member of ISVE, senior member of IEEE, and IEEE CAS Secretary, Bangalore section.
Her areas of interest include cryptography and network security, network analysis, analysis and design of digital circuits, digital VLSI design, analog mixed mode VLSI design, low power VLSI design, statistics and linear algebra, and control engineering. She can be contacted at email: shylashreen@rvce.edu.in or drshylashreen@gmail.com.

**Abhinav Sathisha** received his bachelor's degree in electronics and communication engineering at RV College of Engineering. He is pursuing his post-graduation in VLSI design and embedded systems at RV College of Engineering. His research interests lie in processor micro-architecture designs, silicon for AI, high-performance architectures, accelerators for AI/ML, and physical design. He can be contacted at email: abhinavs.lvs21@rvce.edu.in.

**Dr. Mamatha Aruvanalli Shivaraj** is currently working as Associate Professor in the Department of Electronics and Communication Engineering at NITTE (Deemed to be University), NMAM Institute of Technology, Nitte, India. She has 26 years of teaching experience. She is the author of eight international journal papers and six international conference papers in the field of multispectral image compression. She is the author of the network theory, engineering statistics and linear algebra, and control engineering textbook. Her areas of interest are signal processing, HDL, image compression, control engineering, design of digital circuits, and digital VLSI design. She is a senior IEEE member. She can be contacted at email: mamatha.girish@nitte.edu.in or mamathatanya@gmail.com.

**Prof. Latha Bavikatte Nanjundappa** is currently working as Assistant Professor in the Department of Electronics and Communication Engineering at JSS Academy of Technical Education, Bangalore. She has 34 years of teaching experience. Her areas of interest are signal processing, power electronics, computer networks, HDL, and control engineering. She has completed B.E. and M.Tech.
from Mysore University and NITK, Surathkal, Mangalore University in the years 1990 and 1997, respectively. She can be contacted at email: lathabn@jssateb.ac.in or lathajss@gmail.com.

**Dr. Prakash Tunga Pandeshwara** is currently working as Associate Professor in the Department of Electronics and Communication Engineering at RNS Institute of Technology, Bangalore. He has 20 years of teaching experience. His areas of interest are signal processing, power electronics, image processing, HDL, and control engineering. He has completed M.Tech. and Ph.D. from VTU. He can be contacted at email: prakashtunga.p@rnsit.ac.in.