Proximate node aware optimal and secure data aggregation in wireless sensor network based IoT environment

ABSTRACT


INTRODUCTION
A wireless sensor network (WSN) comprises a number of sensors. It can be defined as a self-configured wireless network that does not have any specific infrastructure. Its purpose is to observe and record the environment and its physical conditions. Moreover, the internet of things (IoT) combined with WSN has enormous application potential; the information that is collected is preserved in a centralized location. Lately, WSNs have gained substantial recognition due to their applications in various fields such as the military, healthcare industries, and underwater observation, as they are inexpensive and occupy minimal area [1]-[3].
WSNs have also extended their applications to sectors such as intelligent factories and smart cities. Here, the device, data, and network administration techniques of the WSN play an important role. To run factory operations intelligently, sensor nodes are used to accumulate information on machines and products. A vital part of planning a smart city is the efficient use of the available resources without wastage; hence, WSNs are employed to provide residents and municipal workers with a well-organized service. Figure 1 shows typical data transmission in a WSN-based environment, which comprises source devices (sensor nodes), a cluster head (CH), and a base station. The data is sensed by the source device, sent to the cluster head, and forwarded by the cluster head to the base station. The sensor nodes are distributed over a wide area so that they can sense the monitored information and transmit the data to a server or sink. For the successful transmission of the accumulated data to the server, a multi-hop transmission method is employed. The server, where the data is stored, is positioned far from the source sensor node, i.e., out of its transmission range. To determine an ideal route to the sink, sensor node accumulation is performed [4]-[6]. Some redundancy can be expected in the source information, since the information is accumulated by many sensors that observe the same phenomenon although they are positioned in their respective areas. The method popularly employed in WSN for removing redundancy and reducing the transmission or information size is data aggregation (DA), i.e., processing within the network. This method decreases the energy consumed during data collection. In various WSN applications, there is no requirement to transmit the exact data collected at the sensor node to the sink node, and the data can be processed for easier transmission. Depending on the monitoring purpose of the application, multiple aggregation methods can be employed to abstract or compress the raw information present in the network.
The techniques used here are abstraction as {mean, variance}, maximum and minimum values, lossy compression, feature domain reduction, and information prediction. An increase in the correlation between the information accumulated by multiple sensors increases the efficiency of DA. DA can successfully enhance the energy efficiency of data accumulation when the sensed information is aggregated by the relaying nodes [7]. DA has a plethora of applications because of its dominance in efficient energy usage. However, assuring its security is significant, as WSNs are mostly used in unattended or hostile domains where data may be forged in the process of delivery or sensor nodes may be captured. Network security aims to assure confidentiality, integrity, and availability (CIA). To achieve these, multiple approaches have been delineated, namely encryption, vulnerability analysis, authentication, and attack detection. Yet, conventional security approaches cannot be applied to DA directly, as they can clash with a WSN with DA [8]. Considering encryption as an example, aggregation operations (i.e., add, multiply, subtract, divide, max/min, sum, and average) require the actual plaintext, but encrypting the data prevents the relay nodes from using the plaintext.
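As an illustration of the {mean, variance}, maximum, and minimum abstraction, the following minimal sketch shows how a relaying node could forward a small summary instead of the raw readings, and how two summaries can be merged further up the hierarchy. The `Summary` class, the `summarize` and `merge` functions, and the sample readings are hypothetical and are not taken from the PNA-SDA design.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Fixed-size abstraction of a set of readings (hypothetical layout)."""
    count: int
    total: float      # sum of readings
    total_sq: float   # sum of squared readings
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def variance(self) -> float:
        # Population variance derived from the running sums.
        return self.total_sq / self.count - self.mean ** 2


def summarize(readings):
    """Abstract raw readings into a summary at a sensor or relay node."""
    return Summary(
        count=len(readings),
        total=sum(readings),
        total_sq=sum(r * r for r in readings),
        minimum=min(readings),
        maximum=max(readings),
    )


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two summaries, e.g. at a cluster head, without raw data."""
    return Summary(
        count=a.count + b.count,
        total=a.total + b.total,
        total_sq=a.total_sq + b.total_sq,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
    )


if __name__ == "__main__":
    cluster_a = summarize([20.0, 22.0, 24.0])
    cluster_b = summarize([26.0])
    combined = merge(cluster_a, cluster_b)
    print(combined.mean, combined.variance, combined.minimum, combined.maximum)
```

Carrying sums rather than the mean itself keeps the summary mergeable, so the amount of data forwarded stays constant regardless of how many readings were abstracted.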
An ideal remedy for this issue is the use of a shared key: initially, the two nodes obtain the shared key; the source then encrypts its sensed information into a ciphertext, and the destination receives the ciphertext and decrypts it using the shared key. With this method, the plaintext is transmitted without being visible to other nodes [9]. Furthermore, several studies have been carried out on secure DA in the past; a few of them are reviewed here. Girao et al. [10] introduced concealed data aggregation (CDA), an algorithm based on symmetric homomorphic encryption. Here, individual nodes can aggregate the encrypted information directly. However, this approach has the disadvantage of not providing proper security, as every node in CDA shares the same key; so, if one node fails, the whole network is affected. Castelluccia et al. [11] delineate an algorithm that uses a one-time pad to enhance CDA. To assure information accuracy, this algorithm needs extra data which, in turn, increases the communication overhead. In [12], a technique is developed in which the information present in the network is encrypted using fully homomorphic encryption; this method strengthens security and decreases energy consumption. Zhang et al. [13] presented an algorithm that is efficient in DA and utilizes exclusive-or (XOR) homomorphic encryption and probabilistic coding. Chim et al. [14] presented a scheme that uses the features of Paillier homomorphic encryption and Bloom filters. Cheon et al. [15] used a unique method in which homomorphic cryptography plays an important role. With this method, there is no requirement for secret key management; hence, it removes the problem of encrypting and decrypting the information inside the network. Acs and Castelluccia [16] introduced a new scheme that uses the gamma distribution and homomorphic encryption. The method guarantees that intruders and aggregators cannot obtain the original information of each user in the process of aggregation. Ding et al. [17] presented a homomorphic encryption method to reduce the overhead during encryption and decryption; this method also improves security against quantum computation. Kapusta et al. [18] delineate a method called the additively homomorphic encryption and fragmentation (AHEF) scheme. This scheme uses additively homomorphic fragmentation in place of the additively homomorphic secret sharing employed in current methods [19], [20]. A reduction in data volume and in energy consumption can be observed after these changes are made to the technique. The studies in [21] and [22] also developed secure approaches for DA, which were promising; however, they failed to address node privacy [23]-[25].
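The one-time-pad idea mentioned for [11] can be illustrated with additive masking modulo a fixed value: each node adds a secret key to its reading, relay nodes add ciphertexts without seeing any plaintext, and the sink removes the sum of the keys. The following is a minimal sketch under stated assumptions, not the scheme as specified in [11]: the modulus, the function names, and the way keys are drawn are hypothetical, and key distribution between the nodes and the sink is left out.

```python
import secrets

# Hypothetical modulus; it must exceed the largest possible sum of readings.
MODULUS = 2 ** 32


def encrypt(reading: int, key: int, modulus: int = MODULUS) -> int:
    """Mask an integer reading with a one-time key."""
    return (reading + key) % modulus


def aggregate(ciphertexts, modulus: int = MODULUS) -> int:
    """Relay-node step: add ciphertexts without decrypting them."""
    return sum(ciphertexts) % modulus


def decrypt_sum(aggregated: int, keys, modulus: int = MODULUS) -> int:
    """Sink step: remove the sum of all keys to recover the sum of readings."""
    return (aggregated - sum(keys)) % modulus


if __name__ == "__main__":
    readings = [21, 19, 25]
    # Assumption: each key is fresh and known only to its node and the sink.
    keys = [secrets.randbelow(MODULUS) for _ in readings]
    ciphertexts = [encrypt(r, k) for r, k in zip(readings, keys)]
    total = decrypt_sum(aggregate(ciphertexts), keys)
    print(total == sum(readings))
```

Because each key must be used only once, the sink has to know which nodes contributed to an aggregate; this is one source of the extra data and communication overhead noted above.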
In general, sensor nodes in a WSN operate with limited resources, such as energy and storage, owing to the simplicity of the WSN architecture. Several areas motivate this work, such as energy utilization, which can increase the network lifetime; DA is one of the mechanisms that helps to tackle computation overhead and data redundancy and improves the network lifetime. However, security remains a foremost issue in DA; motivated by this, this research work designs and develops the proximate node aware secure data aggregation (PNA-SDA) mechanism. The contributions of this research work are as follows: i) the PNA-SDA mechanism is introduced for secure and optimal DA; ii) the PNA-SDA mechanism is a proximate node aware aggregation in which proximate nodes hold the information of others, which is updated in each state; and iii) PNA-SDA is evaluated considering average energy consumption and the average number of deceased nodes; a comparative analysis is also carried out against the existing model, and PNA-SDA outperforms the existing model.
This research work is organized into several sections and sub-sections. The first section starts with the background and applications of WSN-based IoT; the security issue is then addressed and related work is reviewed, and the section ends with the motivation and contribution of our research work. The second section presents the mathematical design of the PNA-SDA mechanism along with the algorithm. The third section evaluates the methodology along with a comparative analysis and discussion.

PROPOSED METHOD
In this section, we design and develop a secure and efficient DA method. It not only preserves the privacy of each node but also improves network lifetime through optimal energy consumption. The PNA-SDA methodology is divided into several parts, i.e., network model and problem definition, and optimal and secure DA.

Network model and problem definition
The PNA-SDA methodology initializes the network and assumes that the nodes in a network are self-arranged into clusters; further, the network follows a hierarchical level of clustering in which each node has the same probability of being selected as cluster head. Let us consider a connected, undirected graph denoted as $G = (V, E)$, where $V$ represents the node set and $E$ the set of communication links. The main intention in developing the communication link is to secure the network topology information. Furthermore, a communication link $(i, j)$ belongs to $E$ if and only if two distinct nodes $i$ and $j$ can communicate with one another. Furthermore, let us consider $N_i$ as the proximate node set of node $i$; then the proximate node set is mathematically formulated as (1).
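Assuming (1) takes the usual neighbourhood form, in which the proximate node set of a node contains every node that shares a communication link with it, the sets can be built from a list of links as sketched here. The function name `proximate_sets` and the five-node ring topology are hypothetical and serve only as an illustration.

```python
def proximate_sets(num_nodes, links):
    """Return, for each node, the set of nodes it shares a link with.

    `links` is a list of undirected pairs (i, j) with 0 <= i, j < num_nodes.
    """
    neighbors = [set() for _ in range(num_nodes)]
    for i, j in links:
        if i == j:
            continue  # a node is not its own proximate node
        # The graph is undirected, so the link is recorded in both directions.
        neighbors[i].add(j)
        neighbors[j].add(i)
    return neighbors


if __name__ == "__main__":
    ring_links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    for node, proximate in enumerate(proximate_sets(5, ring_links)):
        print(node, sorted(proximate))
```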

Problem definition
To develop a secure DA mechanism, each node communicates with its proximate nodes and updates its information; moreover, to secure the model, additional data (randomly added data similar to noise) is added before the information is sent to the proximate nodes. The information sent is given by (2).
In (2), the added term denotes the additional data of the sending node; further, the update equation can be written as (3).
The two coefficients in the update equation are entries of the weight matrix and are greater than 0; the weight matrix is also a stochastic matrix, and the average is computed through (5).
Considering node privacy, sharing the initial state is a real concern, and a node might not be willing to share its real state with its proximate nodes; thus, additional data is added to the original state whenever nodes communicate with their proximate nodes. This can be mathematically represented as (6).
Thus, we design an algorithm that adds the additional data such that the design objective stated above is achieved.
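As an illustration only, one standard noise-masked averaging formulation that is consistent with the description above is sketched here; it is not necessarily the exact form of (2) to (6). All symbols are assumptions introduced for this sketch: $x_i(k)$ is the state of node $i$ at iteration $k$, $\theta_i(k)$ is its additional data, $\hat{x}_i(k)$ is the value it transmits, $N_i$ is its proximate node set, $w_{ij}$ are entries of the weight matrix $W$, and $n$ is the number of nodes.

```latex
\begin{align*}
\hat{x}_i(k) &= x_i(k) + \theta_i(k), \\
x_i(k+1) &= w_{ii}\,\hat{x}_i(k) + \sum_{j \in N_i} w_{ij}\,\hat{x}_j(k),
  \qquad w_{ii} > 0,\; w_{ij} > 0, \\
\bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i(0).
\end{align*}
```

Under these assumptions, if $W$ is doubly stochastic, the graph is connected, and the additional data of each node sums to zero over the iterations and eventually vanishes, the states converge to $\bar{x}$, while the first transmitted value $\hat{x}_i(0)$ differs from the true initial state $x_i(0)$.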

Proximate node aware secure data aggregation algorithm
Algorithm 1 is designed for a given number of nodes. Γ indicates the threshold iteration, and iterating up to this threshold guarantees absolute DA. Also, each involved node ends the iteration once the proximate nodes are found.

Algorithm 1. PNA-SDA algorithm
Step 1. Select the initial additional-data element of each node, at iteration 0, from random data.
In the case of secure DA, a node in the PNA-SDA model sends only its sequence of masked values, for iterations 0, 1, and so on, to its proximate nodes; further, in each data packet of this sequence, the additional data selected at iteration 0 is added to the state of the node. Thus, nodes other than the proximate nodes, including any external node, do not obtain any information. Further, when the iteration index is greater than or equal to 1, the transmitted value is updated and differs from the initial state, since each update comprises an averaging process over the information received from the proximate nodes. Thus, for every proximate node, the information set available to the node at a given iteration is given as (7).
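The behaviour described above, in which masked values are exchanged with proximate nodes and averaged at every iteration, can be simulated with a small sketch. This is not Algorithm 1 itself but a simplified stand-in under loudly stated assumptions: Metropolis weights are used for the weight matrix, each node adds random additional data at iteration 0 and subtracts the same amount at iteration 1 so that the network average is preserved, and the names `metropolis_weights` and `masked_average` are hypothetical.

```python
import random


def metropolis_weights(neighbors):
    """Symmetric, doubly stochastic weights with positive self-weights."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # Remaining mass stays on the node itself so that each row sums to 1.
        w[i][i] = 1.0 - sum(w[i])
    return w


def masked_average(states, neighbors, iterations=200, noise_scale=10.0, seed=0):
    """Run noise-masked averaging; return final states and first sent values."""
    rng = random.Random(seed)
    n = len(states)
    w = metropolis_weights(neighbors)
    # Assumption: additional data is drawn once per node from random data.
    noise = [rng.uniform(-noise_scale, noise_scale) for _ in range(n)]
    x = list(states)
    first_sent = None
    for k in range(iterations):
        if k == 0:
            theta = noise                      # mask the true initial state
        elif k == 1:
            theta = [-v for v in noise]        # cancel the mask to keep the sum
        else:
            theta = [0.0] * n
        sent = [x[i] + theta[i] for i in range(n)]
        if k == 0:
            first_sent = sent
        # Each node averages the masked values of itself and its proximate nodes.
        x = [sum(w[i][j] * sent[j] for j in range(n)) for i in range(n)]
    return x, first_sent


if __name__ == "__main__":
    ring = [{1, 4}, {0, 2}, {1, 3}, {2, 4}, {3, 0}]
    readings = [10.0, 20.0, 30.0, 40.0, 50.0]
    final, first_sent = masked_average(readings, ring)
    print([round(v, 6) for v in final])
    print(first_sent)
```

In this sketch every node ends close to the true average of the readings, while the values exchanged at iteration 0 differ from the real initial states, which is the privacy property the text describes.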

PERFORMANCE EVALUATION
In this section, we evaluate the PNA-SDA model using the network parameters given in Table 1. The evaluation is carried out on the Windows 10 platform using the Visual Studio 2017 integrated development environment (IDE) and the sensoria simulator; the system has 8 GB of CUDA-enabled Nvidia RAM and a 1 TB hard disk.

Energy consumption
Energy plays an important role in network lifetime; thus, we adopt a suitable DA, which can optimize the energy consumption. Figure 2 depicts the energy consumption for various percentages of dishonest nodes. In the case of 20% dishonest nodes, the average energy consumption is 0.00712 mJ; for 40% dishonest nodes, the energy consumed is 0.07709 mJ. Similarly, in the case of 60% dishonest nodes, the average energy consumption is 0.008189 mJ.

Packet identification rate
In general, considering secured DA, the network model comprises two distinct types of nodes, i.e., sincere nodes and compromised nodes; compromised nodes carry compromised packets. The packet identification rate is a parameter that measures how correctly packets are identified. Figure 3 shows the comparison of correct packet identification with the existing model for varying percentages of compromised nodes. In the case of 20% compromised nodes, the existing model achieves a packet identification of 83, whereas the PNA-SDA model achieves a higher identification rate. In the case of 40% compromised nodes, the existing model identifies 86 correct packets whereas the PNA-SDA model identifies 89 packets. Similarly, for 60% compromised nodes, the existing model identifies 90 packets whereas the PNA-SDA model identifies 96 packets.

Throughput
A general definition of throughput is the rate at which work is done; it is one of the primary parameters considered to demonstrate model efficiency. Figure 4 shows the throughput comparison of the existing model and the PNA-SDA model for varying percentages of compromised nodes. In the case of 20% compromised nodes, the throughput of the existing model is 0.5395 whereas the PNA-SDA model achieves a throughput of 0.585. Similarly, in the case of 40% compromised nodes, the existing model achieves a throughput of 0.3182 whereas the PNA-SDA model achieves 0.3293. Finally, for 60% compromised nodes, the existing model achieves a throughput of 0.189 and the PNA-SDA model achieves 0.2016.

Average deceased nodes
Nodes are the primary components of a WSN; a larger number of alive nodes makes the model more efficient and further increases the network lifetime, as less energy is required for data transmission. Figure 5 shows the average number of deceased nodes in the network for various percentages of compromised nodes. In the case of 20% compromised nodes, the average number of deceased nodes is 0.03. Similarly, for 40% and 60% compromised nodes, the average number of deceased nodes is 0.21.

Comparative analysis and discussion
In this section, we present the improvement of the PNA-SDA model over the existing model, considering security and model efficiency as the primary concerns. Table 2 shows the improvement of the PNA-SDA model over the existing model in terms of packet identification. Moreover, considering other parameters such as average deceased nodes, we observe that only 0.03 nodes fail on average for 20% compromised nodes, and in the case of 40% and 60% only 0.21 nodes fail to survive in the network.
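The improvement figures for packet identification can be checked with a short calculation, assuming that improvement is defined as the relative gain over the existing model (this definition is an assumption; the function name `improvement_percent` is hypothetical). The counts for 40% and 60% compromised nodes, the count of 83 for the existing model at 20%, and the 15.66% figure are the values reported in this paper.

```python
def improvement_percent(existing, proposed):
    """Relative improvement of the proposed model over the existing one, in %."""
    return (proposed - existing) / existing * 100.0


# Correctly identified packets reported in the text: (existing, PNA-SDA).
reported = {40: (86, 89), 60: (90, 96)}

for share, (existing, proposed) in reported.items():
    gain = improvement_percent(existing, proposed)
    print(f"{share}% compromised nodes: {gain:.2f}% improvement")

# Back-of-envelope inference only: the PNA-SDA count that a 15.66% gain over
# the existing model's 83 packets would imply for 20% compromised nodes.
implied_20 = 83 * (1 + 15.66 / 100)
print(f"implied PNA-SDA count at 20%: {implied_20:.1f}")
```

The computed gains for 40% and 60% are about 3.49% and 6.67%, which agree with the reported 3.48% and 6.66% up to truncation, and the 15.66% figure corresponds to roughly 96 correctly identified packets at 20% compromised nodes under this definition.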

CONCLUSION
WSNs generate a huge range of application-based data, which require processing and transmission to the base station. Since WSNs are resource constrained, efficient data processing and energy conservation are the primary challenges. These issues can be tackled through DA, which helps in avoiding redundancy and increasing the network lifetime; however, security has been a major constraint. Thus, this research work designed a novel mechanism named PNA-SDA, which aims at secure and efficient DA by adding additional data and monitoring proximate nodes. To evaluate model efficiency, average energy consumption and deceased nodes were considered for 20%, 40%, and 60% compromised nodes; from the security perspective, the packet identification parameter was evaluated along with a comparison with an existing model. The comparative analysis indicates improvements of 15.66%, 3.48%, and 6.66% over the existing model for 20%, 40%, and 60% compromised nodes, respectively. Although PNA-SDA outperforms the existing model, other parameters such as packet misclassification and node identification need to be evaluated, which will be carried out in future work.

Figure 1. Typical data transmission in WSN environment

Figure 5. Average number of deceased nodes in the network

Table 1. Network parameters and values

Table 2. Improvement observed for correct packet identification