# FPGA Implementation of DTCWT and PCA Based Watermarking Technique

# M. S. Sudha<sup>1</sup>, T. C. Thanuja<sup>2</sup>

<sup>1</sup>Department of Electronics and Communication Engineering, Jain University, Bengaluru, India <sup>2</sup>Department of Very Large Scale Integration and Embedded Systems, Center for Post Graduation Studies, VTU, Belgaum, India

#### Article Info

#### Article history:

Received March 03, 2018 Revised May 04, 2018 Accepted May 28, 2018

# Keywords:

2D DWT, 2D signal spectrum, DTCWT, PCA, Robustness

# ABSTRACT

Hardware implementation of an image watermarking algorithm offers distinct advantages over software implementation in terms of low power consumption, small area usage and reliability. The advantages of the Dual Tree Complex Wavelet Transform (DTCWT) and Principal Component Analysis (PCA) are exploited to improve robustness and perceptibility. A hardware watermarking solution is also more economical, because the added component takes up only a small dedicated area of silicon. The algorithm is developed and simulated using Matlab, Simulink and System Generator. The implementation is carried out on a Spartan 6 Digilent Atlys Field Programmable Gate Array (FPGA). The architecture uses 256 slice registers, 257 slice Look Up Tables (LUTs) and 47 I/O pins. It also meets the requirements of a high-speed architecture, with a delay of 1.328 ns and an operating frequency of 549.451 MHz.

Copyright © 2018 Institute of Advanced Engineering and Science. All rights reserved.

#### Corresponding Author:

M. S. Sudha, Department of Electronics and Communication Engineering. Jain University, Atria Towers, No.1, Palace Road, Bangalore 560001, India. Email: sudhams@sapthagiri.edu.in

#### 1. INTRODUCTION

Digital watermarking has been very successful in addressing the basic problems of legal ownership and content authentication for digital media such as images, video and music. These problems arise from recent advances in internet and computer technology. Together, these two technologies provide the tools for unlimited copying of data and for sharing it on the internet without any loss of fidelity [1].

A digital watermark is an information signal that contains the owner's copyright information and is used to protect multimedia data. The watermark can later be extracted from a suspected image to verify ownership [2]. Hardware implementation of watermarking offers several distinct advantages over software implementation in terms of low power consumption, small area usage and reliability [3]. The hardware implementation is done on custom-designed circuitry, either Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs). Its key attributes are real-time capability, compact implementation, power consumption, propagation delay (and hence speed) and area required.

The main objective of this paper is to describe an invisible, robust and efficient hardware-based digital image watermarking system, with the additional features of low power consumption, low-cost implementation, high processing speed and reliability [3]. The proposed algorithm is based on two techniques, DTCWT and PCA. The DTCWT is an enhanced version of the discrete wavelet transform that offers good directionality, phase information and perfect reconstruction.


PCA is used to reduce a complex dataset to a lower dimension. It has the inherent property of removing the correlation among the data, i.e. the wavelet coefficients, and it helps distribute the watermark bits over the subband used for embedding, resulting in a more robust watermarking scheme that is resistant to almost all possible attacks [4]. The watermark is embedded into the luminance component of the extracted frames because the human visual system is less sensitive to changes in it. The proposed algorithm is designed in Matlab 2013a, Simulink and System Generator, with Xilinx 14.7. The algorithm is implemented on a Spartan 6 Digilent Atlys board. In this paper, Section 2 reviews the literature, Section 3 explains the watermarking algorithm, Section 4 describes the architecture of the algorithm using Simulink blocks, Section 5 presents the results and Section 6 concludes.

# 2. LITERATURE SURVEY

Data authentication is one of the most important requirements in present-day communication systems, and watermarking is one of the popular techniques used for it. Lim et al. used the discrete cosine transform (DCT) and the Least Significant Bit (LSB) technique to design an invisible watermarking technique for digital cameras [5] and tested it by implementing it on an FPGA kit. The watermark is embedded into the image coming out of the sensor much faster than with a software implementation. Tamilvanan et al. adopted DWT and the LSB technique to improve the robustness and perceptibility of the watermarking technique, which is implemented on an FPGA [6]. Sonjoy Deb Roy et al. present a hardware implementation of a digital watermarking system that can insert invisible, semi-fragile watermark information into compressed video streams in real time [7]. The work discusses watermark embedding in the discrete cosine transform domain, which usually consumes high power because it requires a large number of computations and also degrades quality. Enas Dhuhri Kusuma et al. also discuss DCT-based image compression using a Hardware Description Language (HDL) for hardware applications, which in turn results in high power consumption [8].

Pankaj U. Lande et al. describe a watermarking algorithm in the Discrete Hadamard Transform (DHT) domain [9]. The algorithm was developed using a human visual system (HVS) model based on the DHT. This technique consumes a larger area; therefore, an algorithm suited to high-speed applications is needed. The paper also discusses DCT/IDCT for digital watermarking in the spatial domain. Rajesh Kannan Megalingam et al. describe [10] the implementation of image watermarking methods in the spatial domain and the DCT domain using Matlab, Verilog HDL and an FPGA. The peak signal-to-noise ratio (PSNR) is compared for both domains.

S. Sowmya et al. describe the FPGA implementation of image enhancement techniques that address the loss of information during transmission. The algorithms were implemented on the xc2vp30-7ff896 target device. The number of block Random Access Memories (RAMs) required for histogram equalization is lower than for brightness control and contrast stretching. The minimum period for all the implementations is 5.001 ns [11]. P. Karthigaikumar et al. describe fragile and semi-fragile watermarking techniques and some of their serious disadvantages, such as increased use of resources, larger area requirements and high power consumption. To overcome these, a robust invisible watermarking technique based on DWT is used [12].

Wael Wasfy et al. discuss increasing the speed and accuracy of fast image processing algorithms that compute image intensity using low-level 3x3 operations with different kernels but the same parallel calculation method [13]. The paper also explains that the FPGA is one of the fastest embedded platforms for implementing fast image processing algorithms, by using the DSP slice modules inside the FPGA [13]. The advantage of the DSP slice is faster and more accurate calculation with a higher number of bits. Using a higher number of bits in the calculations leads to higher accuracy than the same algorithm computed with fewer bits, while keeping the use of FPGA resources to a minimum.

Mohammad-Reza Keyvanpour et al. discuss the importance of wavelets for watermarking and compare various transform domain techniques. The paper discusses the use of DWT in designing an authentication algorithm, which fails for different variances and different directionalities. Hence wavelets such as the DTCWT are required to achieve a higher degree of freedom [14].

#### 3. THE IMAGE WATERMARKING ALGORITHM

The proposed watermarking scheme is developed using a transform domain technique. In the transform domain, the transform coefficients are modified rather than the pixel values. To detect the watermark, the inverse transform is used. The DTCWT is the transform used in this work. Principal component analysis and the LSB technique are used to embed the watermark signal [4]. The embedding and extraction algorithm is shown in Figure 1.
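The decorrelating property of PCA that motivates its use here can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that coefficients are grouped into two-element vectors so that the eigen-decomposition has a closed form, and the names `pca_2d` and `covariance` are hypothetical.

```python
import math

def covariance(pairs):
    """Sample covariance between the two components of a list of 2-D points."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)

def pca_2d(pairs):
    """Project 2-D samples onto their principal axes (closed-form 2x2 PCA)."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    centered = [(x - mx, y - my) for x, y in pairs]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Rotation angle that diagonalises a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Principal-component scores: coordinates along the rotated axes
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return scores, theta

# Hypothetical, strongly correlated coefficient pairs
coeffs = [(2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1), (10.0, 9.9)]
scores, theta = pca_2d(coeffs)
print("covariance before:", covariance(coeffs))
print("covariance after: ", covariance(scores))
```

The input pairs are strongly correlated, while the covariance between the two score components is zero up to rounding error, which is the decorrelation the text refers to.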



Figure 1. Watermark embedding and extraction algorithm

The steps shown in Figure 1 are as follows:

- a. The host and watermark images, which may be of any size, are converted to grayscale and then resized to smaller dimensions to reduce the complexity of processing the image.
- b. Dual tree complex wavelet transform is performed to generate 8 different subbands LaLa, LaLb, LaHa, LbHb, LaHa, LbHb, HaHa and HbHb.
- c. The LaLa component is used for watermark embedding, and principal component analysis is applied to the LaLa subband coefficients.
- d. The least significant bit technique is used to embed the watermark signal in the host image.
- e. The watermark can be extracted from the watermarked image using the extraction algorithm, which is the reverse of the embedding process [4].
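Steps d and e can be sketched in isolation as follows. This is a simplified illustration rather than the published design: it assumes the values being marked have already been quantised to non-negative integers and that the watermark is a flat sequence of bits. The function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(host, bits):
    """Replace the least significant bit of each host value with a watermark bit."""
    if len(bits) > len(host):
        raise ValueError("watermark is longer than the host sequence")
    marked = list(host)
    for i, bit in enumerate(bits):
        # Clear bit 0 of the host value, then set it to the watermark bit
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked

def extract_lsb(marked, count):
    """Read back the first `count` least significant bits."""
    return [value & 1 for value in marked[:count]]

host = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical quantised values
watermark = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical watermark bits
marked = embed_lsb(host, watermark)
recovered = extract_lsb(marked, len(watermark))
print(marked)
print(recovered == watermark)
```

Because only bit 0 changes, each marked value differs from its original by at most 1, which is why LSB embedding has little visible effect.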

Perceptibility is checked by calculating the peak signal-to-noise ratio between the original host image and the watermarked image. The extracted watermarks can be compared with the original watermark subjectively. Besides subjective judgment of watermark fidelity, we have defined an objective measure of similarity between the original watermark and the extracted watermark.

For instance, applying any image processing operation that performs low-pass filtering (compression, resizing) to the watermarked image will result in the loss of multiscale DT-CWT coefficients in the higher-frequency bands of the watermark. In this case, the multiscale DT-CWT coefficients in the lower-frequency subbands are used to determine whether a suspected image contains the watermark. The signals are usually embedded into the perceptually important components of the host image to achieve a balance between perceptual quality and robustness.
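The two quality measures can be sketched as below. The PSNR definition for 8-bit images (peak value 255) is the conventional one. The text does not state which similarity measure is used, so the normalised correlation shown here is an assumption, chosen because it is a common option in watermarking work. Function names are illustrative.

```python
import math

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized 2-D images."""
    flat_o = [p for row in original for p in row]
    flat_p = [p for row in processed for p in row]
    if len(flat_o) != len(flat_p):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_p)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

def normalized_correlation(reference, extracted):
    """Assumed similarity measure: sum(w*w') / sqrt(sum(w^2) * sum(w'^2))."""
    flat_r = [p for row in reference for p in row]
    flat_e = [p for row in extracted for p in row]
    num = sum(a * b for a, b in zip(flat_r, flat_e))
    den = math.sqrt(sum(a * a for a in flat_r) * sum(b * b for b in flat_e))
    return num / den if den else 0.0

host = [[52, 55], [61, 66]]       # hypothetical host block
marked = [[53, 54], [61, 67]]     # same block after LSB changes
wm = [[1, 0], [1, 1]]             # hypothetical binary watermark
psnr_db = psnr(host, marked)
nc_same = normalized_correlation(wm, wm)
print(round(psnr_db, 2), nc_same)
```

A higher PSNR indicates a less perceptible watermark, and a normalised correlation close to 1 indicates that the extracted watermark closely matches the original.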

## **3.1. DTCWT Computation**

Dual-tree complex discrete wavelet transform (DTCWT) provides advantages over the critically sampled DWT for signal, image, and video processing.



Figure 2. DTCWT level decomposition

The DTCWT is one of the most promising decompositions, offering good directionality, phase information, perfect reconstruction and limited redundancy. The drawbacks of the DWT are satisfactorily removed in the dual-tree complex wavelet transform (DTCWT) [4]. Two classical wavelet trees (with real filters) are developed in parallel, with the wavelets forming (approximate) Hilbert pairs.

One can then interpret the wavelets in the two trees of the DTCWT as the real and imaginary parts of a complex wavelet. The requirement for the dual-tree setting to form Hilbert transform pairs is the well-known half-sample delay condition. The resulting complex wavelet is then approximately analytic (i.e. approximately one-sided in the frequency domain), so it can differentiate positive and negative frequencies. It produces six subbands oriented at $\pm 15^\circ$, $\pm 45^\circ$ and $\pm 75^\circ$. Figure 2 shows the DTCWT decomposition with its two trees, which provide the real and imaginary parts. The Matlab simulation of the 1-level DTCWT is shown in Figure 3. Most of the image information is available in the LaLa band, and the remaining subbands contain high-frequency components. Since human eyes are not sensitive to small changes in the edges and textures of the image, the LaLa subband is selected to embed the watermark signal efficiently, and the invisibility of the watermark is maintained at low resolution.
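The Hilbert-pair construction described above can be written compactly. The symbols $\psi_h$, $\psi_g$, $h_0$ and $g_0$ are notation introduced here for illustration (wavelets and low-pass filters of tree a and tree b respectively) and do not appear in the original text.

$$
\psi_c(t) = \psi_h(t) + j\,\psi_g(t), \qquad \psi_g(t) \approx \mathcal{H}\{\psi_h(t)\}
$$

$$
g_0(n) \approx h_0(n - 0.5) \quad\Longleftrightarrow\quad G_0(e^{j\omega}) \approx e^{-j\,0.5\,\omega}\, H_0(e^{j\omega}), \quad |\omega| < \pi
$$

$$
\Psi_c(\omega) \approx 0 \quad \text{for } \omega < 0
$$

The first line states that the two real wavelets form the real and imaginary parts of one complex wavelet, the second is the half-sample delay condition on the low-pass filters, and the third expresses approximate analyticity, i.e. a spectrum that is approximately one-sided.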



Figure 3. 1-Level DTCWT

# 4. ARCHITECTURE OF THE WATERMARKING ALGORITHMS

The watermarking algorithm designed and simulated in MATLAB is implemented on the Spartan 6 Digilent Atlys FPGA kit to check the area required, speed and power consumption. VHDL and Verilog code can be generated automatically for Xilinx FPGAs from MATLAB, Simulink and Stateflow models using HDL Coder. HDL Coder supports code generation for Simulink models constructed with a combination of blocks from Simulink and Xilinx-specific block sets from System Generator. The System Generator Subsystem block in HDL Coder makes it possible to include models built with System Generator in Simulink as subsystems. HDL Coder uses System Generator to generate code from the subsystem blocks and integrates the complete design into synthesizable HDL.

#### 4.1. Watermark Embedding and Extraction System

The Simulink model of the watermark embedding and extraction system is shown in Figure 4. Host and watermark images of any size are imported. The color space conversion block converts the input values from the R'G'B' color space to intensity. The two images are resized to smaller dimensions to make the processing easier; the resizing technique used here is bilinear interpolation. The resized output is padded with constant values to improve the resolution of the image. The preprocessing includes transposing, 2D to 1D conversion, frame conversion and then unbuffering, i.e. conversion to scalar sample output at a higher rate.
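The preprocessing chain can be approximated in software as in the sketch below. It is an illustration under stated assumptions, not a reproduction of the Simulink blocks: the intensity conversion assumes the Rec. 601 luma weights, the bilinear resize assumes a corner-aligned sampling grid (the Simulink Resize block may align samples differently), and all function names are hypothetical.

```python
def rgb_to_intensity(r, g, b):
    """Assumed R'G'B'-to-intensity conversion using Rec. 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D list of numbers with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row index to a fractional input row coordinate
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def transpose_and_flatten(img):
    """Transpose, then read out row by row (column-major order of the input)."""
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]

image = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
small = resize_bilinear(image, 2, 2)
large = resize_bilinear([[0, 10], [20, 30]], 3, 3)
stream = transpose_and_flatten([[1, 2], [3, 4]])
print(small, large[1][1], stream)
```

The final flattening step produces the one-dimensional sample stream that a serial hardware datapath would consume one value at a time.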







Figure 4. Simulink model of a watermark embedding and extraction system

# **4.2. The DTCWT computation Model**

The DTCWT is performed after preprocessing of the host and watermark images. Figure 5 shows the Simulink model of the DTCWT. Each subsystem uses the real (tree a) and imaginary (tree b) parts of the first-level DTCWT coefficients. The real and imaginary parts of the DTCWT form a quadrature pair. Subsystem 1 and Subsystem 2 process these coefficients to generate the second-level DTCWT coefficients (see Figure 4).
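The structure of a first-level dual-tree analysis, two real filter banks running in parallel whose outputs are paired as real and imaginary parts, can be sketched in one dimension as follows. This is only a structural illustration: Haar filters are used as a stand-in because their coefficients are simple, tree b is modelled as the same filters offset by one sample, and circular signal extension is assumed. A practical DTCWT uses specially designed filter pairs (particularly beyond the first level) to make the complex wavelets approximately analytic, which plain Haar filters do not achieve. All names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)
LO = (S, S)     # Haar low-pass filter (stand-in, not the paper's filter)
HI = (S, -S)    # Haar high-pass filter (stand-in, not the paper's filter)

def analysis(signal, offset=0):
    """One two-channel filter-bank level with circular extension and 2x decimation."""
    n = len(signal)
    if n % 2:
        raise ValueError("signal length must be even")
    lo, hi = [], []
    for k in range(0, n, 2):
        a = signal[(k + offset) % n]
        b = signal[(k + 1 + offset) % n]
        lo.append(LO[0] * a + LO[1] * b)
        hi.append(HI[0] * a + HI[1] * b)
    return lo, hi

def dual_tree_level1(signal):
    """Tree a filters the signal directly; tree b uses a one-sample offset."""
    lo_a, hi_a = analysis(signal, offset=0)
    lo_b, hi_b = analysis(signal, offset=1)
    # Pair the two trees as real and imaginary parts of complex coefficients
    complex_hi = [complex(ra, rb) for ra, rb in zip(hi_a, hi_b)]
    return (lo_a, hi_a), (lo_b, hi_b), complex_hi

def energy(values):
    return sum(v * v for v in values)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
tree_a, tree_b, detail = dual_tree_level1(x)
print(energy(x), energy(tree_a[0]) + energy(tree_a[1]))
```

Each tree on its own preserves the signal energy, and together the two trees produce twice as many coefficients as input samples, which is the limited redundancy of the dual-tree approach in one dimension.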



Figure 5. Simulink model of 2 level decomposition using DTCWT


#### 4.3. The IDTCWT computation Model

The extraction algorithm is the reverse of the embedding process. The inverse DTCWT (IDTCWT) is used to extract the watermark from the host image. Figure 6 shows the Simulink model of the IDTCWT and the extraction algorithm. The proposed IDTCWT-based algorithm provides better features by using low-pass and high-pass analytic filters.



Figure 6. Inverse dual tree complex wavelet transform (IDTCWT) at extraction

# 5. EXPERIMENTAL RESULTS

The implementation is carried out using Simulink, System Generator and the Spartan 6 Digilent Atlys board. The Simulink models are simulated to view the images, and the results are presented in Section 5.1. HDL Coder converts the Matlab code to Verilog HDL code, which is then implemented on the Digilent Atlys board. The implementation reports give the speed, power consumption and area required; this information is presented in Section 5.2.

#### 5.1. Proposed Architecture is Designed Using Matlab, Simulink and System Generator

The Simulink model designed for the watermark embedding and extraction algorithm is simulated and the results are shown below. The Video Viewer displays of the host image, watermark image, embedded image and extracted image are shown in Figures 6-9 respectively.




Figure 7. Watermark image

FPGA Implementation of DTCWT and PCA Based Watermarking Technique (M. S. Sudha)





Figure 8. Watermark embedded image

Figure 9. Extracted Watermark image

# 5.2. The FPGA Implementation Results

The proposed work is also designed in Verilog for the Spartan 6 XC6SLX45 on the Digilent Atlys board. HDL Coder generates VHDL or Verilog code. The RTL (Register Transfer Level) block of the watermarking system is shown in Figure 10. The LUTs of the RTL block are shown in Figure 11. The device utilization summary, timing summary and RTL view of the proposed work are shown below.



Figure 10. RTL schematic

[Blocks shown in the schematic: `default_clock_driver_dtcwt_wm_main_embedd_x0`, `dtcwt_wm_main_embedd_x0` and `xlpersistentdff`, with output port `gateway_out1(31:0)`.]

Figure 11. Look Up Tables

Device utilization summary (selected device: 6slx45csg324-2):

| Category | Item | Used | Available | Utilization |
|---|---|---|---|---|
| Slice Logic Utilization | Number of Slice Registers | 257 | 54576 | 0% |
| | Number of Slice LUTs | 256 | 27288 | 0% |
| | Number used as Memory | 256 | 6408 | 3% |
| | Number used as SRL | 256 | - | - |
| Slice Logic Distribution | Number of LUT Flip Flop pairs used | 257 | - | - |
| | Number with an unused Flip Flop | 0 | 257 | 0% |
| | Number with an unused LUT | 1 | 257 | 0% |
| | Number of fully used LUT-FF pairs | 256 | 257 | 99% |
| | Number of unique control sets | 2 | - | - |
| IO Utilization | Number of IOs | 47 | - | - |
| | Number of bonded IOBs | 46 | 218 | 21% |
| Specific Feature Utilization | Number of BUFG/BUFGCTRLs | 1 | 16 | 6% |

Figure 12. Device utilization summary

Figure 13. Timing summary

Figure 12 and Figure 13 show the device utilization summary and the timing summary of the watermarking system, respectively. The device summary reveals the area occupied by the watermarking system. The timing summary reports a clock period of 1.82 ns, an operating frequency of 549.451 MHz and a delay of 1.32 ns. Figure 14 shows the power consumption report. Total power is the sum of dynamic and quiescent power. The system requires 0.004 W of dynamic power and 0.036 W of quiescent power for a 128x128 image.
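The reported figures can be cross-checked with simple arithmetic, as in the short sketch below. It uses only values quoted in this section. The helper names are illustrative, and the truncation to whole percentages is an assumption about how the synthesis report rounds.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz

def utilization_percent(used, available):
    """Utilization truncated to a whole percent (assumed report convention)."""
    return (100 * used) // available

clock_period = period_ns(549.451)      # operating frequency from the timing summary
total_power_w = 0.004 + 0.036          # dynamic + quiescent power in watts
bonded_iobs = utilization_percent(46, 218)
bufg = utilization_percent(1, 16)
print(round(clock_period, 3), round(total_power_w, 3), bonded_iobs, bufg)
```

The computed period agrees with the quoted 1.82 ns clock period, and the sum of dynamic and quiescent power agrees with the 0.040 W total in the power report.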

Device and environment settings:

| Setting | Value |
|---|---|
| Family | Spartan6 |
| Part | xc6slx45 |
| Package | csg324 |
| Temp Grade | C-Grade |
| Process | Typical |
| Speed Grade | -2 |
| Ambient Temp (C) | 25.0 |
| Use custom TJA? | No |
| Custom TJA (C/W) | NA |
| Airflow (LFM) | 0 |
| Heat Sink | None |
| Custom TSA (C/W) | NA |
| Characterization | Production v1.3, 2011-05-04 |

On-chip power:

| On-Chip | Power (W) | Used | Available | Utilization (%) |
|---|---|---|---|---|
| Clocks | 0.004 | 1 | - | - |
| Logic | 0.000 | 204 | 27288 | 1 |
| Signals | 0.000 | 1616 | - | - |
| DSPs | 0.000 | 20 | 58 | 34 |
| IOs | 0.000 | 46 | 218 | 21 |
| Leakage | 0.036 | | | |
| Total | 0.040 | | | |

Supply summary:

| Source | Voltage (V) | Total Current (A) | Dynamic Current (A) | Quiescent Current (A) |
|---|---|---|---|---|
| Vccint | 1.200 | 0.019 | 0.004 | 0.015 |
| Vccaux | 2.500 | 0.005 | 0.000 | 0.005 |
| Vcco25 | 2.500 | 0.002 | 0.000 | 0.002 |

| | Total | Dynamic | Quiescent |
|---|---|---|---|
| Supply Power (W) | 0.040 | 0.004 | 0.036 |

Thermal properties:

| Effective TJA (C/W) | Max Ambient (C) | Junction Temp (C) |
|---|---|---|
| 22.6 | 84.1 | 25.9 |

Figure 14. Power consumption report


## 6. CONCLUSION

The hardware implementation of the image watermarking algorithm based on DTCWT and PCA is carried out using Matlab, Simulink and System Generator, and interfaced with the Spartan 6 Digilent Atlys XC6SLX45. The watermarking architecture uses 256 slice registers, 257 slice LUTs and 47 I/O pins. It also meets high-speed architecture requirements, with a delay of 1.328 ns, an operating frequency of 549.451 MHz, a dynamic power of 0.004 W and a quiescent power of 0.036 W for a 128x128 image.

#### ACKNOWLEDGEMENTS

I thank Sapthagiri College of Engineering, Bangalore, India for the facilities extended to carry out my work. The software and FPGA boards were supplied by the college.

#### REFERENCES

- [1] Chang and J. C. Chuan, "An image intellectual property protection scheme for gray-level images using visual secret sharing strategy," Pattern Recognition Letters, vol. 23, pp. 931-941, June 2002.
- [2] N. Nikolaidis, I. Pitas, "Robust Image Watermarking in Spatial Domain", International journal of signal processing, 66(3), 385-403, 1998.
- [3] Xin Li, Yonatan Shoshan, et al., "*Hardware implementation for video watermarking*", Conference on Information Research and Applications, 2008, Varna, Bulgaria, June-July 2008.
- [4] Sudha, M. S., and T. C. Thanuja. "A Robust Image Watermarking Technique using DTCWT and PCA." International Journal of Applied Engineering Research 12.19 (2017): 8252-8256.
- [5] Lim, Hyun, Wan-Hyun Cho. "FPGA implementation of image watermarking algorithm for a digital camera." Communications, Computers and Signal Processing, 2003. PACRIM. 2003 IEEE Pacific Rim Conference on. Vol. 2. IEEE, 2003.
- [6] Tamilvanan, K., and R. B. Selvakumar. "FPGA implementation of digital watermarking system." *International Journal of Computer Science and Mobile Computing (IJCSMC)* 3.4 (2014): 1321-7.
- [7] Sonjoy Deb Roy, Xin Li, Yonatan Shoshan, Alexander Fish, and Orly Yadid-Pecht, "Hardware Implementation of a Digital Watermarking System for Video Authentication", 1051-8215, *IEEE*, 2012.
- [8] Enas Dhuhri Kusuma, Thomas Sri Widodo, "FPGA Implementation of Pipelined 2D-DCT and Quantization Architecture for JPEG Image Compression", *IEEE*, 2010.
- [9] Pankaj U. Lande, Sanjay N. Talbar, G.N. Shinde, "FPGA Implementation of Image Adaptive Watermarking Using Human Visual Model", *ICGST-PDCS Journal*, Volume 9, Issue 1, October 2009.
- [10] Rajesh Kannan Megalingam, Mithun Muralidharan Nair, Rahul Srikumar, Venkat Krishnan Balasubramanian, Vineeth Sarma Venugopala Sarma, "Performance Comparison of Novel, Robust Spatial Domain Digital Image Watermarking with the Conventional Frequency Domain Watermarking Techniques", *IEEE*, 2010.
- [11] Sowmya, S., and Roy Paily. "FPGA implementation of image enhancement algorithms." Communications and Signal Processing (ICCSP), 2011 International Conference on. IEEE, 2011.
- [12] Karthigaikumar, P., and K. Baskaran. "FPGA implementation of High Speed Low Area DWT based invisible image watermarking algorithm." Procedia Engineering 30 (2012): 266-273.
- [13] Wasfy, Wael, and Hong Zheng. "General structure design for fast image processing algorithms based upon FPGA DSP slice." Physics Procedia 33 (2012): 690-697.
- [14] Keyvanpour, Mohammad-Reza, and Farnoosh Merrikh-Bayat. "Robust dynamic block-based image watermarking in DWT domain." Procedia Computer Science 3 (2011): 238-242.

## **BIOGRAPHIES OF AUTHORS**



Mrs Sudha M S is working as an Assistant Professor in the Electronics and Communication Engineering department at Sapthagiri College of Engineering, Bangalore. Her research interest is digital image watermarking.



Dr. T. C. Thanuja is working as a Professor in the Department of VLSI and Embedded Systems, VTU Belgaum. Her research interests are digital image watermarking and nanotechnology.