Image processing using a reconfigurable platform: Pre-processing block hardware architecture

ABSTRACT


INTRODUCTION
Image processing algorithms and architectures [1], [2] generally perform best when given specific types of data as inputs. In the vast majority of instances, however, the input picture fails to meet these requirements, so preprocessing takes place prior to the application-specific processing [3]. Image storage is a significant issue in image processing. Many image file formats have been developed over the years with the aim of representing images in a compact, high-quality manner that can be used on a variety of platforms [4]. From the preprocessing point of view, different images of the same type can have different scales of signal intensities. The operations that are usually needed prior to the main data operations in the IP [5] core are grouped as preprocessing functions. In this study, a hardware architecture is used as the initial stage, or preprocessing block, of image processing applications, with a higher degree of accuracy in reading pixel values. As a result, images are processed in such a way that they can be used directly for subsequent operations, reducing data storage access time [6]-[8]. Pre-processing often entails the elimination of unnecessary or irrelevant regions, as well as the enhancement of contrast and service features such as zero padding.
Digital images consist of a finite set of image components, usually known as pixels, and are used to display two-dimensional images. Digital image processing allows for the retrieval, delivery, and representation of image data in a human-readable format [9]. In applied image processing, a variety of techniques are applied to the chosen image data set for image pre-processing. This work proposes a technique to improve the memory read and write operations needed by IP cores. The chosen pre-processing hardware block has proven to be the most effective method for reading pixel values [10]. This research paper is organized as follows. Section 2 discusses pre-processing blocks in image processing. Section 3 explains the pre-processing block hardware architecture used for this research work. Section 4 presents the pre-processing block memory. Section 5 presents the experimental evaluation on a reconfigurable platform carried out throughout the preprocessing process. Finally, Section 6 concludes the research work with its findings.

PRE PROCESSING BLOCKS IN IMAGE PROCESSING
The primary goal of most preprocessing systems is to prepare data so that the following blocks can make optimum use of it. The main aim of pre-processing is to enhance the image's quality [11], [12] so that it can be properly analyzed. Preprocessing removes unwanted distortions [13] and improves features (reconstruction and regression) [14], [15] that are essential for the application at hand. Those features can differ depending on the application (general image processing, image enhancement, or image analysis), so that other types of algorithms [16] can use them effectively.
The enormous amount of information needed to depict images is one of their most distinguishing features (architecture using Vedic computing) [17]. Even a gray-scale image with a moderate resolution, such as 512 by 512, requires 512 × 512 × 8 = 2,097,152 (about 2 × 10^6) bits to represent. As a result, in order to store and transmit digital images (using XSG blocks) [18], some type of image compression and image edge detection [19], or the use of pre-processing hardware, is required.
Within FPGA pre-processing sub-systems (DSP modules) [20], algorithms evolve from standard software-suitable representations [21] to more hardware-friendly ones that can fully exploit data parallelism [22], [23] across application-specific hardware architectures (signal and video processing architecture) [24]. These architectures, such as dataflow [25], are often significantly different from the conventional Von Neumann model.
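The storage figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `image_bits` is hypothetical and chosen only for illustration.

```python
def image_bits(width: int, height: int, bits_per_pixel: int = 8) -> int:
    """Return the number of bits needed to store an uncompressed image."""
    return width * height * bits_per_pixel


# 512 by 512 gray-scale image with 8 bits per pixel, as in the text.
bits = image_bits(512, 512)
print(bits)               # 2097152 bits, i.e. roughly 2 * 10**6
print(bits // 8 // 1024)  # 256 (size in KiB)
```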
The aim of the architecture is to prepare data and make image processing activities easier [26], [27]. The general structure of the strategy suggested in this study is depicted in Figure 1. The image preliminary pre-processing method and the image screening algorithm are the foundations of this article.

PRE PROCESSING BLOCK MEMORY
In the design of the preprocessing block memory, two constraints need to be taken care of from the user end. The first is that one pixel value, which is one byte wide, is written into the memory at each clock cycle of the target device. The second is the size of the input image (for example, 512*512). The main target of this block memory is flexibility in modifying the kernel size during read and write operations. The ability to choose the kernel size during read operations is seen as a benefit over the inbuilt IP core model. In detail, the IP core can access data only in power-of-two bit widths (i.e., 2, 4, 8, 16, 32, 64) during a read operation, whereas the proposed design accesses data based on the kernel requirements and is not restricted to any specific values. In general, the data accessed during a read operation is the 8-bit pixel width multiplied by the kernel size.
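The read-width rule (8-bit pixel times kernel size) can be illustrated with a short sketch. The helper names are hypothetical, and the comparison against the next power-of-two width is only an assumption about how a power-of-two interface would have to be sized; the source states only that the inbuilt IP core is restricted to such widths.

```python
def read_width_bits(kernel_size: int, pixel_bits: int = 8) -> int:
    """Bits delivered by one read: pixel width times kernel size."""
    if kernel_size < 1:
        raise ValueError("kernel_size must be positive")
    return kernel_size * pixel_bits


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (assumed sizing for a fixed-width core)."""
    p = 1
    while p < n:
        p *= 2
    return p


for k in (3, 5, 7):
    exact = read_width_bits(k)
    print(k, exact, next_power_of_two(exact))
# 3 24 32
# 5 40 64
# 7 56 64
```

For a 5 by 5 kernel the exact width is 40 bits, whereas a power-of-two interface would need a 64-bit port to carry the same data.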

Write mode
During this operation, one pixel of data is written into the memory on each clock cycle, at the location indicated by the address pointer.

Read mode
To get around the first-in first-out (FIFO) paradigm, the proposed hardware architecture activates the read operation 'N' times, depending on the user's needs. The number of read iterations is set by the kernel size. For example, if the kernel size is 3 by 3, the values from three adjacent positions are read. This also determines whether enough data is available for the IP core architecture. The user must specify the memory location from which the data is read. The read output is then extracted from the pre-processing block hardware as a concatenation of the three pixel data values indicated by the read pointer. In this way, data is accessed simultaneously from all three locations.
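The write and read behaviour can be mimicked in software with a small behavioural model. This is a sketch under stated assumptions, not the authors' Verilog design: the class name `PreprocMemory`, the method names, and the choice to place the first pixel in the most significant byte of the concatenated word are all hypothetical.

```python
class PreprocMemory:
    """Behavioural model: one-pixel writes, kernel-wide concatenated reads."""

    def __init__(self, depth: int, kernel_size: int) -> None:
        self.mem = [0] * depth
        self.kernel_size = kernel_size
        self.write_ptr = 0  # address pointer for the next write

    def write(self, pixel: int) -> None:
        """Store one 8-bit pixel (models one clock cycle of write mode)."""
        if not 0 <= pixel <= 0xFF:
            raise ValueError("pixel must fit in 8 bits")
        if self.write_ptr >= len(self.mem):
            raise IndexError("memory full")
        self.mem[self.write_ptr] = pixel
        self.write_ptr += 1

    def read(self, addr: int) -> int:
        """Return kernel_size adjacent pixels as one concatenated word."""
        # Refuse the read when not enough data has been written yet.
        if addr < 0 or addr + self.kernel_size > self.write_ptr:
            raise IndexError("not enough data at this address")
        word = 0
        for pixel in self.mem[addr:addr + self.kernel_size]:
            word = (word << 8) | pixel  # assumed order: first pixel is the MSB
        return word


mem = PreprocMemory(depth=16, kernel_size=3)
for p in (0x11, 0x22, 0x33, 0x44):
    mem.write(p)
print(hex(mem.read(1)))  # 0x223344
```

With a kernel size of 3, each read returns a 24-bit word built from three adjacent locations, which matches the concatenated output described above.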

Additional user choice
Image padding adds new pixels around the edges of an image. When advanced filtering methods are used, the border provides space for annotations or serves as a boundary. Three separate cases are offered as user preference inputs.

Duplicate mode
Two rows and two columns are added. As indicated in Figure 2, the pixel values of the first and last rows are copied into the new rows placed before the first row and after the last row, respectively. Similarly, the values of the first and last columns of the original image are copied into the new columns. With the two additional rows and two additional columns, a 512 by 512 image becomes 514 by 514 after duplication.
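Duplicate mode can be sketched in a few lines. The function name `pad_duplicate` is hypothetical, and the treatment of the four corner pixels (each takes the value of the nearest original corner) is an assumption, since the text does not describe it.

```python
def pad_duplicate(image):
    """Add one row/column on every side by copying the nearest edge pixel."""
    # Duplicate the first and last rows.
    rows = [image[0]] + list(image) + [image[-1]]
    # Duplicate the first and last columns of every row (corners included).
    return [[row[0]] + list(row) + [row[-1]] for row in rows]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
for row in pad_duplicate(img):
    print(row)
# [1, 1, 2, 3, 3]
# [1, 1, 2, 3, 3]
# [4, 4, 5, 6, 6]
# [7, 7, 8, 9, 9]
# [7, 7, 8, 9, 9]
```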

Zero padding
Additional rows are added with all pixel values set to zero, as indicated in Figure 3. Similarly, columns with a pixel value of zero are added.
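A minimal sketch of zero padding follows. The function name `pad_zero` and the `border` parameter are hypothetical; a border of one pixel per side is assumed by analogy with duplicate mode, since the text does not state how many rows and columns are added.

```python
def pad_zero(image, border: int = 1):
    """Surround the image with `border` rows and columns of zeros."""
    width = len(image[0]) + 2 * border
    padded = [[0] * width for _ in range(border)]          # top rows
    for row in image:
        padded.append([0] * border + list(row) + [0] * border)
    padded.extend([0] * width for _ in range(border))      # bottom rows
    return padded


img = [[1, 2],
       [3, 4]]
for row in pad_zero(img):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```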

Non duplicate mode
In this case, the IP core operates on the existing image and ignores the boundaries. The edge pixels are skipped, and the computation is carried out only for those pixels that have all of their neighbors, as highlighted in Figure 4.
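The set of pixels that have all of their neighbors can be enumerated directly. This is a sketch assuming a square kernel of odd size; the function names are hypothetical.

```python
def valid_output_shape(height: int, width: int, kernel_size: int):
    """Output size when only pixels with a full neighborhood are computed."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size is assumed to be a positive odd number")
    if kernel_size > min(height, width):
        raise ValueError("kernel larger than image")
    return height - kernel_size + 1, width - kernel_size + 1


def valid_centers(height: int, width: int, kernel_size: int):
    """Coordinates (row, col) of pixels whose whole neighborhood is inside the image."""
    r = kernel_size // 2
    return [(y, x) for y in range(r, height - r) for x in range(r, width - r)]


print(valid_output_shape(512, 512, 3))  # (510, 510)
print(valid_centers(4, 4, 3))           # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Unlike the two padding modes, the output here is smaller than the input: a 512 by 512 image filtered with a 3 by 3 kernel yields 510 by 510 results.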

EXPERIMENTAL EVALUATION OVER RECONFIGURABLE PLATFORM
On an FPGA, hardware design strategies such as parallelism and pipelining are feasible, which are not possible in dedicated DSP modules. The use of reconfigurable hardware to implement image processing algorithms reduces time-to-market costs, allows for rapid prototyping of complex algorithms, and simplifies debugging and verification. As a result, FPGAs are an excellent alternative for real-time image processing algorithms. Verilog is used in this research paper to build the preprocessing hardware architecture. Functionality is verified on reconfigurable hardware following the design flow shown in Figure 5. Figure 6 shows the external view of the block for a 5 by 5 kernel, with access to all the external input and output pins. The output in this case is 40 bits wide, that is, 5 times the 8-bit pixel value in one clock cycle. The findings in Figure 8 demonstrate the effectiveness of FPGA-based reconfigurable systems in image processing applications, showing a significant speedup over software versions as operations become more complex. Using our scheme, the number of operations in a real-time application can therefore be relaxed, enabling more complex algorithms to be incorporated.

RESULTS AND DISCUSSION
Convolution is a general-purpose image filter effect that determines the value of the center pixel by adding the weighted values of all its neighbors. The convolution of a 512*512 image matrix with a kernel of size (3*3), (5*5), ..., (K*K) produces a new, filtered image. The kernel is overlapped on top of the image, the products of the mutually overlapping pixels are calculated and summed, and the result is the value of the output pixel at that particular location. Figure 9 depicts input pixels being written into memory and accessed through a read operation; the results shown are for a 3 by 3 kernel. In other words, the consecutive read operation is activated three times in a row.

Figure 9. Simulation results for preprocessing architecture of the work
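The overlap, multiply, and sum procedure described in this section can be written out as a short reference sketch. The function name `filter2d_valid` is hypothetical. The kernel is applied without flipping, exactly as the text describes, which gives the same result as a mathematical convolution whenever the kernel is symmetric. Only pixels with a full neighborhood are computed, so no padding is assumed.

```python
def filter2d_valid(image, kernel):
    """Slide a K by K kernel over the image; each output is a sum of products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            # Multiply overlapping pixels by kernel weights and accumulate.
            for i in range(k):
                for j in range(k):
                    acc += image[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1] * 3 for _ in range(3)]  # 3 by 3 kernel with all weights equal to 1
print(filter2d_valid(image, box))  # [[54, 63], [90, 99]]
```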

CONCLUSION
The pre-processing technique used in this study aids in accessing the data required by the IP core and in enhancing image quality through various image processing techniques. The findings are analyzed and verified on a standard reconfigurable platform (Zynq board), and the consistency in terms of hardware utilization (area) is also evaluated. The focus of this research is on selecting the appropriate memory operations, such as multiple read and write techniques, while taking the user's choice of kernel size as a primary input. The pre-processing technique also minimizes memory access time. The information gathered as a result of this process could be useful in furthering this study. Deploying FPGAs as reconfigurable platforms for high-level image processing applications involves the efficient mapping of high-level descriptions of image frames to low-level memory systems.