Design of fault tolerant algorithm for network on chip router using field programmable gate array

ABSTRACT


INTRODUCTION
A system on chip (SoC) integrates several intellectual property (IP) modules, such as a general-purpose processor, input/output modules, and signal processing blocks, on a single chip. It is considered ideal for systems with few IP blocks, as it takes up less area and consumes less power than a multi-chip system. However, its direct connections and shared buses fall short in applications with tens to thousands of IP modules. The network on chip (NoC) paradigm is emerging as a dependable answer to the scalability problem that the communication bottleneck imposes on SoCs. With the very large-scale integration (VLSI) industry having crossed the billion-transistor mark, NoC circuitry is of great value.
In NoC, IPs are placed in a grid of tiles and communicate with the help of networking protocols. This architecture has three main components: router, links, and network interface (NI). The backbone of the NoC architecture is its router. It contains the routing algorithm, which is the logic that decides the path taken by an incoming packet to reach its assigned destination. The NI acts as an adapter that decouples computation from communication, and the links form the physical connections between routers in the network.
The router, being the most critical element of any NoC architecture, is the main focus of this study. It directs traffic from source to destination by means of the routing algorithm. It comprises input and output ports in five directions: north, south, east, west, and core. The structure for any N×N matrix can be achieved by implementing any of the topologies such as mesh, torus, star, polygon, and butterfly [1]-[3].
Another important aspect is the algorithm used to provide the optimum route for data packets in a given network. The routing algorithm is a crucial component of an NoC implementation, as it determines the route between the source and destination nodes. Any routing method should aim to transport packets to their destinations, distribute network traffic so as to minimize packet interference, choose a short route, reduce packet delays, and ultimately improve network performance. Routing algorithms are of major significance in NoCs because they strongly influence the performance, complexity, and implementation cost of the NoC [4], [5]. They can be broadly classified, according to how routing decisions are made, into deterministic routing, in which data packets take a fixed path towards the destination based on designated rules, and adaptive routing, in which the path depends on the network conditions.
Fault-tolerant routing is one of the key and fundamental concerns that need to be addressed for NoCs. In an NoC system, packet routing plays a crucial role because it significantly affects system performance. Errors may occur during system fabrication, and faulty nodes may disturb the routing of data packets. A single faulty node disrupts data packet routing, which can result in the failure of the entire NoC system. Therefore, the development of fault-tolerant NoCs has become one of the research areas in SoC-based design [5], [6]. In this work, a fault tolerant routing algorithm is developed and applied to a 4×4 mesh-based NoC router architecture to validate the algorithm.
In this paper, Section 2 reviews the work done in the NoC domain. Section 3 proposes the single node router architecture. Faults in the NoC router are discussed in Section 4. In Section 5, the fault tolerant routing algorithm and its implementation are discussed along with the results. Section 6 concludes with a discussion of the router's fault-tolerant algorithm implementation on the field programmable gate array (FPGA) and an emphasis on future work.

LITERATURE REVIEW
NoC has been studied keenly by researchers over the last few years because of its advantages over SoCs in terms of electrical properties, bandwidth, and scalability. This section reviews some of the research that has contributed significantly to this study. The basic NoC architecture is explained by Bjerregaard and Mahadevan [7], who also give a brief description of the research areas in the NoC field. At system level, the design methodology and the architectural domain play a very important role in NoC design. At network level, the topology, protocol, and flow control are important, and at link level, reliability and packet synchronization.
Benini and De Micheli [8] highlighted the issues with SoC that cause the communication bottleneck. Some of these are synchronization between the various clock and network domains on the chip, physical wiring delays, shared resources, and scalability issues. The paper also explains why the authors believe that the NoC approach is the best solution to overcome these obstacles.
The authors of [9] explained how NoC is an interconnection of several computational, storage, and input/output blocks. This paper also discusses a honeycomb architecture implementation made up of hexagonal structures with resources at the centre and switches at the nodes. This seems to be an efficient architecture for NoC and is similar to the 2D mesh topology. Area and performance parameters of this implementation were also discussed.
Agarwal et al. [10] reviewed NoC analysis for various topologies such as mesh, torus, ring, and butterfly. The paper covers the key aspects of NoC router architecture: routing algorithms and their types, switching algorithms and their types, flow control methods, virtual channels, and the importance of quality of service (QoS). It also gives a brief idea of real-world implementations of NoC, for example, the Æthereal NoC by Philips.
Hesham et al. [11] examined the real-time challenges encountered by NoC, such as deadlines, performance, and cost. The paper also analyses the time division multiplexing (TDM) concept as a substitute for the commonly used circuit switching in NoC, to improve efficiency. It further compares real-time NoCs on the basis of routing, clocking, and design flexibility, and analyses the parameters affecting their performance, such as energy efficiency, runtime adaptability, and scalability.
Achballah and Saoud [12] explained the NoC architecture in terms of its key characteristics: topology, routing, and switching. The paper also examines NoC-dedicated simulation tools: NS-2, a prototyping and simulation tool; Noxim, which allows the user to define a 2D mesh NoC architecture and evaluate throughput and latency; and Orion, a simulator that estimates power and area for an NoC architecture. It carries out a comparison among them for modelling and hardware synthesis.
Silva et al. [13] discussed how NoC has overcome the constraints of the bus-based architecture of SoC through parallel transactions and scalability. The paper also evaluates the 2D mesh and 2D torus topologies and routing schemes such as west first and odd even (OE). The NoC-based multiprocessor system-on-chip (MPSoC) shown in this paper is modelled in very high-speed integrated circuit hardware description language (VHDL) and SystemC.
Design of fault tolerant algorithm for network on chip router using field programmable … (Priti Shahane)
Gamal et al. [14] implemented an NoC design on a Virtex 5 FPGA and analysed it at sub-module level. Soft and hard implementations were carried out and compared. It was observed that NoCs designed for FPGA in soft implementations would utilize area better than an application specific integrated circuit (ASIC).
The concept of buffered and bufferless routing was explained by Cai et al. [15], who carried out a comparative study of buffered and bufferless routing implemented on ASIC and FPGA. Buffered routing requires buffers to store data packets for routing. The path that a packet will take is predetermined, which removes any route calculation. Bufferless routing, on the other hand, was carried out with the FLIT-BLESS routing algorithm, in which each flit is routed individually. Buffered routing takes up more area and power than bufferless routing.
Gindin et al. [16] proposed an NoC-based FPGA architecture that combines hard functionality (implemented in silicon, including modules such as processors, multipliers, and memory interfaces) with soft functionality (programmable units). The approach uses a new routing scheme, weighted ordered toggled (WOT). In WOT, packets are split between routes as in the XY-YX routing algorithm, with additional weights applied. It is optimal when traffic requirements in the network are not symmetric.
Brugge and Khalid [17] presented an NoC router for FPGA with features such as mesh topology, the XY routing algorithm, store-and-forward flow control, four ports (north, south, east, and west), and a first in, first out (FIFO) buffer. The router has a low area overhead, which is key in NoC designs for FPGA, and was created with flexible parameters.
Akshay et al. [18] presented the implementation of a new fault tolerance technique for mesh NoC. This method makes use of fault bits, which are patterned to indicate the type of fault. Compared with XY routing, the proposed method is less efficient, as it takes more hops due to deflection. Another disadvantage is the increase in latency. The only advantage is the guarantee of packet delivery.
Patooghy and Miremadi [19] presented a routing algorithm for NoC that eliminates problems caused by faults in the system. In this method, two copies of each packet are made and routed through separate paths: one through the standard XY routing path and the second through the YX routing path. This method generates a limited number of redundant bits and routes them along the path with minimum traffic. This XYX routing algorithm incurs negligible performance and power consumption overheads, yet it provides almost the same reliability as deterministic approaches.
Pasricha et al. [20] presented a turn model-based approach for NoC. The approach combines OE and inverted OE (IOE) to obtain better fault tolerance results than the existing turn model and N-random walk. The packet is sent through the OE route and the redundant bits through the IOE route. In OE many turns are restricted. Hence, the chances of deadlock are lower and the packets always take the shortest path to reach the end point.
Typically, traditional fault-tolerant routing algorithms and shortest-path routing algorithms base their decisions on predefined rules. Additionally, owing to a lack of intelligence, the packets always pass via the same node on the way to the destination, causing congestion and queuing issues. The rules are user-defined, based on commonly occurring routing problems noticed by the programmer, and every new case requires human intervention to update them. As the number of routing issues grows, so does the need to design new rules to address each issue properly, which could reduce efficiency or accuracy. Machine learning (ML) technology makes this easier and more reliable. A recent field of research is the use of ML approaches to recommend a fault-tolerant routing algorithm for mesh-based NoC. The ML algorithm learns many routing scenarios during the learning process, which enables it to handle complex scenarios precisely and effectively. Samala et al. [21] proposed a reinforcement learning (RL) based fault-tolerant routing algorithm to address link and router faults in NoC instead of using a traditional method. Developing a mesh-based NoC fault-tolerant routing system using ML has become an upcoming research domain [22], [23].
It is evident from the analysis of several NoC router architectures that the majority of NoC router designs are implemented with buffers. Buffers and virtual channels are preferred because they reduce network latency and contention. However, the major concern with buffers is their larger area requirement. Therefore, there is a need to emphasize optimized use of buffers in the NoC router. This work is primarily concerned with the efficient design of the NoC router blocks in terms of a fast scheduler and optimized use of buffers. As a result, the study's emphasis is on efficient NoC router design for area and latency optimization. The NoC router parameters are evaluated for implementation on FPGAs, since an FPGA serves as an excellent platform for hardware-based NoC router implementation and provides support for a large number of IP modules. However, due to limited logic and routing resources, a low area overhead for the NoC router is pivotal in FPGA-based NoC. Therefore, this study deals with the efficient design of an NoC router implementation on FPGA.

RESEARCH METHOD
In the NoC architecture, the router is the most important block. It is the structure that enables an incoming packet to reach its destination along an optimal path with the help of the routing algorithm. The router architecture studied here has three main blocks: the input block, the scheduler block, and the crossbar switch block. Each of these has a distinct role in processing the incoming data packet as it travels through the router. The novelty of the proposed design is that the router uses a single side buffer instead of virtual channels in the buffers, and iSLIP as a fast scheduler. Therefore, the proposed NoC router design consists of the following blocks: i) input block with single side buffer memory, ii) iSLIP scheduler, and iii) crossbar switch.
A single router has a 5-port architecture in which data packets can be transferred in the east, west, north, and south directions, and the final destination is the core port of the respective node. Traditionally, routers include buffers in order to store incoming packets in a queue before they are forwarded. This costs area and power, significantly affecting the performance of the circuitry. To reduce this cost, a single data storage buffer is implemented at every input instead. The input block is designed with a single data packet memory buffer that stores only deflected data packets in the side buffer memory. Therefore, in order to prevent data loss, 5 buffers are required, one for each direction, to hold data packets in the case of contention for output ports.
The second block is the iSLIP scheduler, which determines the precise direction of data packet transfer and contains a programmable priority encoder that can vary the priority in each clock cycle depending on the active node. The benefit of the iSLIP scheduler is that data packets move in a predetermined direction rather than round robin, allowing for fast scheduling. The crossbar switch in the design is responsible for physically connecting an input port to its destined output port based on the grant issued by the scheduler. Finally, the output port stores the data packets. Figure 1 shows the architecture of the single node router.
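The programmable priority encoder mentioned above can be sketched in a few lines. This is a generic illustration of the technique, not the authors' VHDL: the function name, the port ordering, and the pointer-update rule (move the pointer one position past the granted input, as in the standard iSLIP description) are assumptions made for the example.

```python
# Port order is an assumption made for this sketch.
PORTS = ["north", "south", "east", "west", "core"]


def programmable_priority_encoder(requests, pointer):
    """Grant the first asserted request at or after `pointer`, wrapping around.

    requests: list of booleans, one per input port.
    pointer:  index of the input that currently has the highest priority.
    Returns the granted index, or None when nothing is requested.
    """
    n = len(requests)
    for offset in range(n):
        idx = (pointer + offset) % n
        if requests[idx]:
            return idx
    return None


def next_pointer(grant, n):
    """iSLIP-style update: the granted input gets the lowest priority next time."""
    return (grant + 1) % n


# Inputs "south" (1) and "west" (3) contend for the same output port.
requests = [False, True, False, True, False]
pointer = 2                      # priority currently starts at "east"
grant = programmable_priority_encoder(requests, pointer)
pointer = next_pointer(grant, len(requests))
print(PORTS[grant], pointer)
```

With the priority pointer at index 2, the request from "west" wins over the one from "south", and the pointer then moves on to "core".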
Figure 1. Architecture of single node router [24]
This single node architecture accepts a 24-bit input data packet along with clk and reset signals. The 24-bit input data can be split into 16 bits of information, a 4-bit source address, and a 4-bit destination address. The output data consists of the 24 bits of data received at the end of a successful cycle of data transfer. The architecture used in this study is a 4×4 mesh topology with a routing algorithm derived from XY routing, as it is the most widely used routing algorithm.
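As a minimal sketch of the 24-bit packet described above, the fields can be packed and unpacked as follows. The paper gives only the field widths; the field order used here (source address in the top 4 bits, destination address next, payload in the low 16 bits) and the function names are assumptions made for illustration.

```python
ADDR_BITS = 4       # 4-bit source and destination addresses (16 nodes)
PAYLOAD_BITS = 16   # 16 bits of information


def pack_packet(src, dst, payload):
    """Build a 24-bit packet; the field order is an assumption of this sketch."""
    if not (0 <= src < 1 << ADDR_BITS and 0 <= dst < 1 << ADDR_BITS):
        raise ValueError("addresses must fit in 4 bits")
    if not 0 <= payload < 1 << PAYLOAD_BITS:
        raise ValueError("payload must fit in 16 bits")
    return (src << (ADDR_BITS + PAYLOAD_BITS)) | (dst << PAYLOAD_BITS) | payload


def unpack_packet(packet):
    """Split a 24-bit packet into (source, destination, payload)."""
    src = (packet >> (ADDR_BITS + PAYLOAD_BITS)) & 0xF
    dst = (packet >> PAYLOAD_BITS) & 0xF
    payload = packet & 0xFFFF
    return src, dst, payload


packet = pack_packet(0b1110, 0b0001, 0xBEEF)
print(f"{packet:06X}", unpack_packet(packet))
```

The example uses source 1110 and destination 0001, the node pair considered later in the routing example.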

FAULTS IN NETWORK ON CHIP
NoC architectures offer adaptability for on-chip communications through the use of fault-tolerant routing algorithms. When fault-tolerance adaptivity is added to a routing algorithm, its design complexity rises and, if improperly designed, it becomes vulnerable to deadlock and other issues. The following system-level faults are discussed by Bengtsson et al. [25] in their study:

Fault due to dropping of data
These faults can occur in FIFOs, routers, or multiplexers. To describe the first case (the FIFO), its operation has to be explained in detail. In a FIFO, the head and tail counters mark the first and last occupied locations. Suppose the head counter is corrupted and its current value is increased; in that case, a portion of the data is skipped and never comes out. In the second case (the router), assume that a fault occurs while the router is steering a flit to its end point. The flit is then removed from the FIFO but is not forwarded to any destination port. As a result, the packet is dropped.
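The FIFO case can be illustrated with a small behavioural model. This is a simplified sketch, not the router's actual buffer: the class and its counters are hypothetical, the counters grow without wrapping, and there is no full-buffer check.

```python
class Fifo:
    """Toy FIFO with a head (read) counter and a tail (write) counter."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.head = 0   # index of the next flit to read
        self.tail = 0   # index of the next free location

    def push(self, flit):
        self.mem[self.tail % self.depth] = flit
        self.tail += 1

    def pop(self):
        if self.head >= self.tail:
            return None             # FIFO is empty
        flit = self.mem[self.head % self.depth]
        self.head += 1
        return flit


fifo = Fifo(depth=4)
for flit in ["A", "B", "C"]:
    fifo.push(flit)

fifo.head += 1      # injected fault: the head counter is spuriously incremented

received = []
while (flit := fifo.pop()) is not None:
    received.append(flit)
print(received)
```

After the spurious increment, flit "A" is skipped and never leaves the FIFO, which is the data-dropping behaviour described above.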

Fault due to corruption of data
This fault can occur in any component of a router. Consider a router that has received the header flit of a data packet and has selected the proper output port. It then sends the header flit followed by the remaining flits; if a fault occurs during this transfer and switches it to another port, the fault can be considered a data corruption fault: part of the packet is diverted away from its intended destination and is lost, because it lacks the header flit that carries the routing information.

Fault in data travel direction
Choosing a wrong direction is the consequence of faulty behaviour of the network. Consider a router that behaves faultily and makes incorrect decisions when routing incoming data. The faulty behaviour could produce inappropriate select signals for router components and thereby transfer data to an output port that is not the intended one.

Space fault due to several copies
As discussed above, faulty behaviour of a router could produce improper select signals for units within the router. This kind of erroneous data transfer can affect the operation of several components in a system. The data then comes out of the correct output port as well as of one or more other output ports. If, because of the faulty behaviour, an unintended multiplexer selects the same incoming data, that data also travels towards a different output port.

Time fault due to several copies
Multiple copies in time originate in the FIFOs. In a FIFO affected by this fault, old data is sent to the ports. Such data generally consists of a previous packet's flits, and sending it leads to repetition of a packet, i.e., multiple copies in time.
Along with these system-level faults, routers tend to be affected by faults at the architectural level in the form of link faults and node faults. Link faults occur when one or more links are unable to send data to the next node, and node faults occur when an entire node and all its surrounding links become inactive in terms of data transfer. These faults are difficult to detect by inspection, but they can be handled with appropriate measures in the routing algorithm. They make data transfer within the circuitry difficult, causing loss of information as well as power. In this study, we analyse node faults. We generated a node fault in the 4×4 mesh architecture by disabling a node and then rerouted the data packet upon detection of the fault.
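To make the difference between a link fault and a node fault concrete, the sketch below lists the links that become inactive when a whole node of a 4×4 mesh is disabled. The row-major node numbering (node = 4 × row + column, with row 0 at the north edge) is an assumption of this example, as are the function names.

```python
MESH = 4  # 4x4 mesh, nodes numbered 0..15 in row-major order (assumed)


def neighbours(node):
    """Nodes directly linked to `node` in the mesh."""
    row, col = divmod(node, MESH)
    result = []
    if row > 0:
        result.append(node - MESH)   # north
    if row < MESH - 1:
        result.append(node + MESH)   # south
    if col < MESH - 1:
        result.append(node + 1)      # east
    if col > 0:
        result.append(node - 1)      # west
    return result


def links_lost(faulty_node):
    """A node fault takes down every link attached to the node."""
    return [(faulty_node, n) for n in neighbours(faulty_node)]


print(links_lost(0b1001))   # interior node: four links become inactive
print(links_lost(0))        # corner node: only two links are attached
```

A link fault, by contrast, would remove a single pair from such a list while leaving the node itself usable.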

FAULT TOLERANT XY ROUTING
The implemented algorithm works on the principle of first detecting the fault and then, as soon as the fault is detected, rerouting the data packet to its intended destination via an alternate route that follows YX routing. For convenience in coding, a faulty node is emulated by setting its node enable signal to 0 (this signal from the basic building block acts as a grant signal for the movement of data when its value is 1 and as a stop signal when it is 0). The node is thereby switched off, so that no data transfer is carried out through it, which mimics a faulty node in an NoC router architecture. This is implemented over a 4×4 mesh topology of routers. The detailed working of the fault tolerant XY routing algorithm is as follows:
Step 1. Data enters and the source address is divided into two parts, src_reg_x and src_reg_y; likewise, the destination address is divided into two parts, dest_x and dest_y.
Step 2. First, src_y is compared with dest_y to decide in which horizontal direction (along the X-axis) the data moves, i.e., east or west.
If src > dest, the data is intended to move west. It moves west only if the node_enable signal is 1. If this signal is 0, the node is faulty and the packet needs to be rerouted.
If src < dest, the data is intended to move east. It moves east only if the node_enable signal is 1. If this signal is 0, the node is faulty and the packet needs to be rerouted.
Step 3. For rerouting, the x-bits of source and destination are compared. If the x-bits of src < dest, the packet is routed north. It then continues along the rerouted path until it reaches a pre-destination node and from there proceeds towards the original destination. If the x-bits of src > dest, the packet is routed south. It then continues along the rerouted path until it reaches a pre-destination node and from there proceeds towards the original destination.
Step 4. If a fault occurs in the first leg of data traversal, i.e., during its movement along the X-axis, the data packet has to be rerouted based on the y-bits. If the y-bits of the source are greater, the packet has to move north to reach the destination, and fault tolerance therefore also has to proceed in the same direction.
Similarly, if the y-bits of the source are smaller, the packet has to move south to reach the destination, and fault tolerance therefore also has to proceed in the same direction.
Step 5. In the second leg of data traversal, i.e., during its movement along the Y-axis, the data packet has to be rerouted based on the x-bits. Whenever a fault occurs during movement along the Y-axis, the packet can be rerouted towards the east first. The only exception is the nodes in the rightmost column.
Step 6. For rerouting, the y-bits of source and destination are compared.
If the y-bits of src < dest, the packet is routed east. It then continues along the rerouted path until it reaches a pre-destination node and from there proceeds towards the original destination. If the y-bits of src > dest, the packet is routed west. It then continues along the rerouted path until it reaches a pre-destination node and from there proceeds towards the original destination.
As shown in Figure 2, the traditional route for data travelling from source (1110) to destination (0001) first runs in the X-direction. Once the packet reaches the same column as the destination, it moves along the Y-direction until it reaches the intended destination. However, if a fault occurs at a node on this path (here the fault is considered at node 1001), the packet, upon detecting the fault, reroutes its path in order to reach its destination. Data packet traversal for traditional and fault tolerant routing is given in Figure 2.
For this specific source and destination, with the fault at node 9 (1001), the fault tolerant routing algorithm works as follows:
The data packet is supposed to continue moving north, but node_enable of node 9 = 0. Therefore, it moves west (D -> C).
At node C, packet_src_y_reg < packet_dest_y, i.e., as per XY routing the packet should move east in order to reach the same column as the destination. Because of the fault at node 9, it instead moves north if packet_src_y_reg == 2'b00. Therefore, it travels north until it reaches node 0. The data transfer is C -> 8 -> 4 -> 0.
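The walk-through above can be reproduced with a small behavioural model. This is a sketch under stated assumptions, not the authors' VHDL: the upper two address bits are taken as the row and the lower two as the column (consistent with the node sequence in the example), only a single faulty node is handled, and the detour tie-break (destination side first, then east or south, never back to an already visited node) is chosen here so that the model follows the reported path. All function names are hypothetical.

```python
MESH = 4  # 4x4 mesh; row = upper 2 address bits, column = lower 2 bits (assumed)


def row_col(node):
    return node >> 2, node & 0b11


def node_id(row, col):
    return (row << 2) | col


def step_towards(cur, dst, axis):
    """Node one hop closer to dst along axis ('col' = east/west, 'row' = north/south).

    Returns None when cur and dst are already aligned on that axis.
    """
    r, c = row_col(cur)
    dr, dc = row_col(dst)
    if axis == "col" and c != dc:
        return node_id(r, c + (1 if dc > c else -1))
    if axis == "row" and r != dr:
        return node_id(r + (1 if dr > r else -1), c)
    return None


def sidestep_options(cur, dst, axis):
    """In-mesh neighbours of cur along axis, destination side first."""
    r, c = row_col(cur)
    cells = [(r, c + 1), (r, c - 1)] if axis == "col" else [(r + 1, c), (r - 1, c)]
    nodes = [node_id(rr, cc) for rr, cc in cells
             if 0 <= rr < MESH and 0 <= cc < MESH]
    toward = step_towards(cur, dst, axis)
    nodes.sort(key=lambda n: n != toward)   # stable sort keeps east/south before west/north
    return nodes


def route(src, dst, node_enable):
    """Return the list of nodes visited from src to dst, detouring around one disabled node."""
    path = [src]
    order = ("col", "row")                  # XY routing: horizontal leg first
    while path[-1] != dst:
        if len(path) > MESH * MESH:
            raise RuntimeError("no route found under the single-fault assumption")
        cur = path[-1]
        axis = order[0] if step_towards(cur, dst, order[0]) is not None else order[1]
        nxt = step_towards(cur, dst, axis)
        if not node_enable[nxt]:            # fault detected on the next hop
            other = "row" if axis == "col" else "col"
            options = [n for n in sidestep_options(cur, dst, other)
                       if node_enable[n] and n not in path]
            if not options:
                raise RuntimeError("no enabled neighbour to detour through")
            nxt = options[0]
            order = (axis, other)           # resume along the blocked axis first
        path.append(nxt)
    return path


node_enable = [True] * (MESH * MESH)
node_enable[0b1001] = False                 # emulate the faulty node
print([format(n, "04b") for n in route(0b1110, 0b0001, node_enable)])
```

With node 1001 disabled, the model visits 1110, 1101, 1100, 1000, 0100, 0000, 0001; with all nodes enabled it follows the plain XY route through 1101, 1001, and 0101.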
The design is modelled in VHDL using the Xilinx ISE design suite 14.7 with the Virtex 5 device xc5vlx50t-2ff665. Simulation is performed using the ISim simulator and synthesis is achieved using the XST tool. Figure 3 shows data transfer from source node 1110 to destination node 0001 with a fault at node 1001. The fault tolerant algorithm reroutes the data packets through nodes 1110-1101-1100-1000-0100-0000-0001. The suggested fault tolerant algorithm provides an optimized path for data packet traversal in the faulty node scenario, although it may increase data latency due to the rerouting of the packet. The proposed design of a 4×4 mesh topology NoC router is synthesized on a Virtex 5 FPGA, and the result of the synthesized design of the fault-tolerant NoC algorithm is presented in Table 1. It is observed that LUT utilization is 12% and slice register utilization is 25% on the Virtex 5 FPGA family device. The synthesized design supports a maximum operating frequency of 201 MHz. The overall power consumption, estimated using the XPower Analyzer, is 0.125 W.
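The latency cost of the rerouted path reported above can be expressed in hops. The short calculation below compares the rerouted node sequence with the minimal hop count between the same source and destination, assuming that the upper two address bits give the row and the lower two the column; no per-hop cycle count is assumed, since none is stated.

```python
def minimal_hops(src, dst):
    """Manhattan distance between two nodes of the mesh (row = upper 2 bits, column = lower 2 bits)."""
    return abs((src >> 2) - (dst >> 2)) + abs((src & 0b11) - (dst & 0b11))


# Rerouted node sequence for the fault at node 1001.
rerouted = [0b1110, 0b1101, 0b1100, 0b1000, 0b0100, 0b0000, 0b0001]

rerouted_hops = len(rerouted) - 1
fault_free_hops = minimal_hops(rerouted[0], rerouted[-1])
extra_hops = rerouted_hops - fault_free_hops
print(rerouted_hops, fault_free_hops, extra_hops)
```

For this source and destination, the detour costs two hops more than the fault-free route (six hops instead of four).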

CONCLUSION
The proposed approach to fault tolerance in a 4×4 mesh topology was inspired by XY routing, the most commonly used routing algorithm for mesh topologies. This methodology detects a fault when it appears on the path between a given source and destination. Upon detecting the fault, the router reroutes the data packet along a new path based on the principles of XY routing. As it is based on XY routing, it can be expected to take fewer hops than other routing algorithms. This approach can be extended to larger mesh topology-based systems. The future scope of the design is to develop fault tolerant algorithms for routers based on other NoC topologies and to optimize the router path to reduce data latency.

Table 1. Logic utilization summary