An Adaptive Data Gathering Scheme for Multi-Hop Wireless Sensor Networks Based on Compressed Sensing and Network Coding.

Jun Yin 1, Yuwang Yang 1,* and Lei Wang 2

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; [email protected]
2 School of Computer Science and Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-25-8431-5660

Academic Editor: Mihai Lazarescu
Received: 25 January 2016; Accepted: 22 March 2016; Published: 1 April 2016

Abstract: Joint design of compressed sensing (CS) and network coding (NC) has been demonstrated to provide a new data gathering paradigm for multi-hop wireless sensor networks (WSNs). By exploiting the correlation of the network sensed data, a variety of data gathering schemes based on NC and CS (Compressed Data Gathering, CDG) have been proposed. However, these schemes assume that the sparsity of the network sensed data is constant and that its value is known before each data gathering epoch starts; thus, they ignore the variation of the data observed by WSNs deployed in practical circumstances. In this paper, we present a complete design of a feedback CDG scheme in which the sink node adaptively queries the interested nodes to acquire an appropriate number of measurements. The adaptive measurement-formation procedure and its termination rules are proposed and analyzed in detail. Moreover, in order to minimize the number of overall transmissions in the formation procedure of each measurement, we formulate an NP-complete model (Maximum Leaf Nodes Minimum Steiner Nodes, MLMS) and realize a scalable greedy algorithm to solve the problem. Experimental results show that the proposed measurement-formation method outperforms previous schemes, and experiments on datasets from both ocean temperature and a practical network deployment also demonstrate the effectiveness of the proposed feedback CDG scheme.

Keywords: data gathering; sensor networks; compressed sensing; network coding

Sensors 2016, 16, 462; doi:10.3390/s16040462
www.mdpi.com/journal/sensors

1. Introduction

Wireless sensor networks (WSNs) are currently deployed for many applications, such as environmental monitoring, civil structure maintenance, and military surveillance. In most of these applications, sensor nodes in the network are set to periodically report their sensed data (i.e., readings) to a sink node (or remote base station) through the relay of intermediate nodes. Under such circumstances, energy efficiency becomes one of the dominating issues of the data gathering process. Many solutions have been proposed based on various aspects, which include, among others, topology control (e.g., [1]), sleep scheduling (e.g., [2]), mobile data collectors (e.g., [3]), and data aggregation (e.g., [4–7]). The first three approaches focus on the energy efficiency of data gathering protocols or strategies, while the last one aims at reducing the number of data packets to be sent to the sink node by eliminating data redundancy [8]; hence it complements the others.

Generally speaking, depending on the information that is needed at the sink node, existing data aggregation research falls into two categories: “functional” and “recoverable” [9]. The first one corresponds to cases where only some values of a statistical function of the network sensed data (e.g., AVG, MAX, SUM) are required by the sink node. Alternatively, the second one is for applications where the sink node needs the entire set of network sensed data. Compressed data gathering (CDG), based on the theory of compressed sensing (CS, [10]) and network coding (NC, [11]), has recently been proposed as a promising “recoverable” scheme; it enables the sink node to acquire the complete network sensed data in an energy-efficient manner, as well as to balance the energy consumption among sensor nodes.

First of all, we make some definitions on the problem to be discussed. Given a WSN of $N$ sensor nodes, each having a reading $x_i$ ($i = 1, 2, \ldots, N$) at each data gathering epoch, we denote $X_N = [x_1, \ldots, x_N]^T$ as the network sensed data. In a typical CDG scheme, the sink node will reconstruct $X_N$ by collecting only $M$ ($M \ll N$) weighted sums of $x_i$ from the network rather than all $N$ original readings (in CS theory, these $M$ weighted sums $Y_M = [y_1, y_2, \cdots, y_M]^T$ are formed by $Y_M = \Phi X_N$, where the $M$-by-$N$ matrix $\Phi$ is the measurement matrix (also called the projection matrix), and each $y_i$ is the corresponding measurement (or projection) result of $X_N$; unless otherwise specified, “projection” and “measurement” are interchangeable hereunder). Such a reconstruction process is guaranteed and performed based on the observation that, in most network deployment cases, $X_N$ is $K$-sparse ($K \ll N$) under certain orthogonal transform domains (e.g., discrete wavelet transform, discrete cosine transform [10,12]).

In most previous CDG studies, classical CS theory is directly adopted without concern for the WSN specialty, which inevitably incurs several shortcomings. For example, when processing a signal, CS theory conventionally supposes that the sparsity (i.e., sparse degree) of this signal is fixed, while in a more general network deployment environment, the sparsity of the data observed by the network may change frequently. As shown in Figure 1, two 1000-reading datasets of ocean temperature data (OTD) are plotted with red lines. These datasets were collected from the Pacific Ocean on 29 March 2014 (Figure 1a) and 2 April 2014 (Figure 1c), respectively [13].

Figure 1. The sparsity varies in OTD datasets. (a) The Pacific Ocean temperature (°C) against depth/pressure (dbar) observed on 29 March 2014 (OTD A); (b) The corresponding coefficients of OTD A after six-layer Daubechies2 wavelet transformation (61 non-zero coefficients); (c) The Pacific Ocean temperature (°C) against depth/pressure (dbar) observed on 2 April 2014 (OTD B); (d) The corresponding coefficients of OTD B after six-layer Daubechies2 wavelet transformation (53 non-zero coefficients).
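As a rough illustration of how such a coefficient count can be obtained, the sketch below applies a full orthonormal Haar decomposition and counts the coefficients whose magnitude exceeds a threshold. The Haar transform is swapped in here for the six-layer Daubechies2 transform of Figure 1 purely to keep the example dependency-free; the function names, the threshold and the two toy signals are assumptions for illustration and are not the OTD data.

```python
import math

def haar_transform(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two."""
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Coarser levels are computed later and placed in front of finer ones.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details

def sparsity(signal, threshold=1e-9):
    """Number of transform coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in haar_transform(signal) if abs(c) > threshold)

# Two hypothetical 8-sample "epochs": the second one has more structure,
# so more coefficients are significant and the sparsity K is larger.
epoch_a = [4, 4, 4, 4, 2, 2, 2, 2]
epoch_b = [4, 4, 2, 2, 5, 5, 1, 1]
print(sparsity(epoch_a), sparsity(epoch_b))
```

The point of the sketch is only that the count K is a property of the data in a given epoch, so it cannot be fixed in advance by the gathering scheme.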

By comparing these two figures, we can see that the sparsity is changing; specifically, there are 61 larger coefficients after transformation of the first dataset (61-sparse signal), and only 53 in the second (53-sparse signal). There are also some other practical shortcomings, which will be elaborated in Section 3. To address these challenges, we propose an adaptive CDG scheme in this paper. During each data gathering epoch, we evaluate the current network sensed data at the sink node and adjust the measurement-formation process according to this evaluation. By doing so, it forms a kind of feedback-control process, and the required number of measurements is tuned adaptively according to the real-time variation of the data to be gathered.

From another point of view, despite CDG's ability to reduce the global communication cost, multiple studies show that the effectiveness of CDG is still affected by the strategy of the measurements' formation process [9,12,14]. Several optimization schemes, such as Hybrid-CDG [9], are proposed to form all of the measurement results through a single routing tree in each data gathering epoch. To further reduce the energy consumption of such a process, and different from Hybrid-CDG, we supply a measurement-formation algorithm in this paper in which each measurement-formation path is treated individually in a data gathering epoch. A similar idea was also adopted by PB-CDG [14], yet the novelty of our approach lies in the path-generation procedure and the underlying method of distributing measurement coefficients, which omits the massive coordination among sensor nodes in the network.

The remainder of this paper is organized as follows: in Section 2 we first give a comprehensive overview of the typical CDG schemes mentioned above. Then, we present our explicit motivation and main solution approach in Section 3. Next, we illustrate our data gathering approach in detail in Sections 4 and 5. Simulation and practical experiment results are presented in Section 6. At last, we address the conclusions in Section 7.

2. Related Work of Compressed Data Gathering

In non-aggregation data gathering schemes, data packets generated by sensor nodes are directly forwarded to the sink node through a certain topology organization (e.g., the tree-type topology as shown in Figure 2A). These data gathering schemes do not exploit the correlation of network sensed data, resulting in the network forwarding a larger number of original packets; what's more, in addition to sending their own detected data, nodes that are closer to the sink node tend to relay a number of packets from remote nodes (e.g., in Figure 2A, node #5 forwards 11 packets while each leaf node only forwards one packet). Such an imbalance of energy consumption will inevitably lead to the quick failure of the whole network. Thus, the lifetime of nodes which are closer to the sink node forms a bottleneck in these data-gathering schemes.

Figure 2. Non-CDG and typical CDG schemes: (A) Non-Compressed Data Gathering; (B) Plain-Compressed Data Gathering; (C) Hybrid-Compressed Data Gathering; (D) Sparse Compressed Data Gathering, in which the measurements $y_1 = \phi_{11} x_1 + \phi_{13} x_3 + \phi_{19} x_9$, $y_2 = \phi_{22} x_2 + \phi_{24} x_4 + \phi_{26} x_6$ and $y_3 = \phi_{37} x_7 + \phi_{3,10} x_{10} + \phi_{3,11} x_{11}$ are formed. Link labels give the number of transmissions.

Plain-CDG, as shown in Figure 2B, is the earliest and the most rudimentary data gathering scheme which exploits the CS and NC theory. During each data gathering epoch, every node needs to forward a fixed number of packets (denoted as $M$) to form $M$ weighted sums $y_i$ (we let $M = 3$ in Figure 2B, and $y_i = \phi_{i1} x_1 + \phi_{i2} x_2 + \cdots + \phi_{i11} x_{11}$, $i = 1, 2, 3$, where $x_j$ is the reading of node #$j$ and $\phi_{ij}$ is its coefficient of this measurement). Note that each weighted sum corresponds to one measurement of the network sensed data, and after receiving these measurement results, the sink node can reconstruct the network sensed data by adopting an appropriate CS reconstruction algorithm. Compared to the Non-CDG schemes, Plain-CDG can reduce the number of transmissions and balance the transmission load among sensor nodes.

The authors in [9] proposed a scheme based on Plain-CDG, called Hybrid-CDG (shown in Figure 2C), to further reduce the number of transmissions. In the Hybrid-CDG scheme, nodes whose degrees are equal to or less than $M$ employ Non-CDG to forward their readings, and the others employ the Plain-CDG scheme to generate $M$ measurement packets. By comparing Figure 2C to Figure 2B, it is easy to see that the Hybrid-CDG scheme indeed reduces the redundant transmissions of Plain-CDG (e.g., the leaf node #1 forwards three packets in Plain-CDG but only one packet in Hybrid-CDG).

More efficient data gathering schemes using random sparse measurements were first introduced in [15] and developed in [14] as follows: at the beginning of each epoch, $M$ projection nodes are selected randomly to collect $M$ measurements (i.e., each projection node collects one measurement); meanwhile, each projection node is assigned, or generates by itself, a sparse vector $\Phi_i$. Then, projection node #$i$ is obliged to inform all nodes whose coefficient $\phi_{ij} \neq 0$ to report their contributions ($\phi_{ij} x_j$) back. After receiving all segments of a measurement ($y_i$), the projection node sends the result to the sink node through the shortest routing path. An example of such a measurement-formation process is illustrated in Figure 2D.
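The forwarding rules of the three tree-based schemes compared above (Non-CDG, Plain-CDG and Hybrid-CDG) can be made concrete with a small sketch. It assumes that the "degree" of a node means the number of readings it has to deliver (its own plus those of its descendants), and it uses a hypothetical 11-node routing tree rather than the exact topology of Figure 2; the function names are illustrative only.

```python
def subtree_sizes(children, root):
    """Number of readings each node must deliver: its own plus its descendants'."""
    sizes = {}

    def visit(node):
        sizes[node] = 1 + sum(visit(child) for child in children.get(node, []))
        return sizes[node]

    visit(root)
    return sizes

def transmissions(children, sink, m):
    """Per-node packet counts under the three forwarding rules."""
    sizes = subtree_sizes(children, sink)
    nodes = [n for n in sizes if n != sink]
    non_cdg = {n: sizes[n] for n in nodes}         # relay every raw reading
    plain = {n: m for n in nodes}                  # always send M weighted sums
    hybrid = {n: min(sizes[n], m) for n in nodes}  # raw readings until M is cheaper
    return non_cdg, plain, hybrid

# Hypothetical routing tree rooted at the sink "S" (an assumption, not Figure 2):
# node 5 is the only neighbour of the sink, so it relays everything in Non-CDG.
tree = {"S": [5], 5: [3, 10], 3: [1, 8], 8: [2, 4], 4: [6], 10: [9, 11], 11: [7]}
non_cdg, plain, hybrid = transmissions(tree, "S", m=3)
for name, load in (("Non-CDG", non_cdg), ("Plain-CDG", plain), ("Hybrid-CDG", hybrid)):
    print(name, "total:", sum(load.values()), "max per node:", max(load.values()))
```

On such a small tree the main effect of Plain-CDG is the lower maximum per-node load; the saving in total transmissions becomes more visible when the number of nodes is much larger than M.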
To form the measurement result $y_1$, node #5, acting as the projection node, randomly generates a sparse coefficient vector $\Phi_1 = [\phi_{11}\ 0\ \phi_{13}\ 0\ 0\ 0\ 0\ 0\ \phi_{19}\ 0\ 0]$; then, for the non-zero coefficients ($\phi_{11}$, $\phi_{13}$ and $\phi_{19}$), it sends transmission requests to nodes #1, #3 and #9, and these nodes reply by sending their contributions back (e.g., node #3 replies with $\phi_{13} x_3$). Note that, similar to other CDG schemes, those packets can be merged into one single packet on their measurement-formation path (e.g., packets can be merged on nodes #3 and #5 when forming measurement $y_1$; this packet-merging process can be regarded as a network coding process [16]). Similarly, other measurements are formed in the network and forwarded to the sink node. It is easy to see that these sparse measurement-based CDG schemes would outperform prior dense schemes, because the formation of each measurement only involves several nodes, and the rest of the nodes can stay in an idle/sleeping mode to reduce energy consumption.

3. Motivation and Proposed Method Design

3.1. Motivation of the Study

In most existing CDG studies, the network sensed data is considered to have a known and fixed sparsity. Therefore, according to CS theory, once a recovery algorithm is picked, an upper bound on the number of required measurements ($O(K \log(N/K))$) is determined. However, this assumption is not practical, and is even problematic. Firstly, as described in Section 1, the sparsity of data signals observed in a real natural environment is changing, making it impractical to know the exact sparsity value before reconstruction. Secondly, the number of measurements necessary to successfully reconstruct the network sensed data varies continuously over a range; thus, taking a fixed upper bound as the number of measurements may inevitably cause excessive measurements.
To show this point, we carried out an experiment to reconstruct an $N$-dimensional $K$-sparse signal using two typical measurement methods, namely a Gaussian measurement method and a Bernoulli measurement method (each entry of the measurement matrix takes the value $+1$ or $-1$, each with probability $s/2$, or 0 with probability $1 - s$ ($0 < s < 1$)). We randomly generated an original signal $x$ ($N = 100$, $K = 10$, $s = 1/2$), and to average out random variation we repeated the experiment 600 times for each measurement method and employed the same algorithm (L1-formulation matching pursuit) for each reconstruction of $x$. The experimental results are depicted in Figure 3. We can conclude that both measurement ensembles exhibit high variance (mean values are 32.9450 and 33.8167, respectively; standard deviations are 4.8468 and 4.9414, respectively) and that most of the minimum numbers of required measurements are lower than the upper bound (50). This shows that schemes relying on an upper bound can guarantee successful data recovery, but often at the cost of many unnecessary measurements (in our experiment, 17.06 and 16.18 of 50 on average, i.e., 50 minus the respective means). For CDG applications, as shown in Figure 2, the formation of each measurement result requires processing and forwarding through several intermediate nodes; thus, reducing the unnecessary measurements will definitely decrease the nodes' energy consumption.
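The two measurement ensembles used in this experiment can be sketched as follows. The snippet only samples rows of the dense Gaussian and sparse Bernoulli matrices and counts how many nodes would have to contribute to each measurement; it does not reproduce the recovery experiment itself. The function names and the fixed random seed are assumptions made for illustration.

```python
import random

def gaussian_row(n, rng):
    """Dense row: every entry is drawn from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def sparse_bernoulli_row(n, s, rng):
    """Sparse row: +1 or -1 with probability s/2 each, 0 with probability 1 - s."""
    row = []
    for _ in range(n):
        u = rng.random()
        row.append(1 if u < s / 2 else -1 if u < s else 0)
    return row

def interested_nodes(row):
    """1-based IDs of nodes whose coefficient in this measurement is non-zero."""
    return [j + 1 for j, coeff in enumerate(row) if coeff != 0]

rng = random.Random(2016)  # fixed seed so the sketch is repeatable
N, s, rounds = 100, 0.5, 600
dense = sum(len(interested_nodes(gaussian_row(N, rng))) for _ in range(rounds)) / rounds
sparse = sum(len(interested_nodes(sparse_bernoulli_row(N, s, rng))) for _ in range(rounds)) / rounds
print(f"average interested nodes per measurement: dense={dense:.1f}, sparse={sparse:.1f}")
```

With $s = 1/2$, roughly half of the nodes have a zero coefficient in any given sparse measurement, which is what allows them to stay asleep while that measurement is formed.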

Figure 3. Histograms of the minimum number of measurements required to recover a 10-sparse signal with Gaussian (dense) measurements (top) and Bernoulli (sparse) measurements (bottom).

Note that, compared to the dense measurement method, we are particularly interested in the sparse measurement method. The reason is that the sparse measurement method requires a number of measurements very similar to that of the dense method (as shown in Figure 3), while for CDG applications a sparse measurement coefficient vector leads to fewer interested nodes (interested nodes are the nodes whose corresponding coefficient of such a measurement is non-zero) than the dense method in the formation of each measurement, and thus other nodes can keep sleeping to save energy. For the sparse Bernoulli example in Figure 3, we experimented with $s = 1/2$; thus, on average there were 50% fewer nodes in the formation of each measurement than with the dense Gaussian method.

3.2. Main Program of the Proposed Method

In the CDG scheme, the key problem is how these measurements will be formed and sent to the sink node in an energy-efficient manner. As we mentioned above, in actual network deployment, the network sensed data fluctuates, and thus the sparsity also changes with time. Given that the required number of measurements ($M$) cannot be decided at the beginning (it depends on the sparsity of the network sensed data $X$), it is inappropriate to use a specific upper bound as in previous research. In order to make up for this defect, we propose a feedback control strategy that does not require specific knowledge of the sparsity of the network sensed data. The procedure is as follows (as shown in Figure 4 and Algorithm 1). During each epoch, the sink node evaluates the reconstruction quality of the measurements which have been received; if the reconstruction quality indicators have not been met, a feedback packet will be sent to inform the nodes that an additional $\Delta M$ measurements are required. This measurement-reconstruction process repeats until the reconstruction quality is satisfied. Therefore, unlike existing schemes, a typical feature of our scheme is that the measurement procedure during each epoch is separated into several rounds. Since the network sensed data is adaptively and progressively reconstructed in our scheme, the sink node can also satisfy the varying levels of quality requirement of network sensed data by different users.

Figure 4. Main procedure of the proposed scheme.
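The round-based feedback procedure described above can be sketched in a few lines. This is a minimal illustration under several assumptions that are not taken from the paper: the signal is sparse in the identity basis, each round adds dense Gaussian projections, recovery uses a plain orthogonal matching pursuit, and the stopping rule is "the relative change between two successive reconstructions falls below `eps`". The names `omp`, `adaptive_gather`, `delta_m` and `eps` are hypothetical.

```python
import numpy as np

def omp(Phi, y, tol=1e-9):
    """Orthogonal matching pursuit that stops once the residual is negligible."""
    m, n = Phi.shape
    support, coef = [], np.zeros(0)
    residual = y.copy()
    while np.linalg.norm(residual) > tol * np.linalg.norm(y) and len(support) < m:
        corr = np.abs(Phi.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(n)
    if support:
        x_hat[support] = coef
    return x_hat

def adaptive_gather(x, delta_m=5, eps=1e-3, seed=0):
    """Request delta_m extra measurements per round until x_hat stops changing."""
    rng = np.random.default_rng(seed)
    n = x.size
    Phi, y = np.empty((0, n)), np.empty(0)
    x_prev, x_hat = None, np.zeros(n)
    while Phi.shape[0] < n:
        new_rows = rng.standard_normal((delta_m, n))  # one feedback round
        Phi = np.vstack([Phi, new_rows])
        y = np.concatenate([y, new_rows @ x])         # add to the projection pool
        x_hat = omp(Phi, y)
        if x_prev is not None:
            change = np.linalg.norm(x_hat - x_prev) / max(np.linalg.norm(x_hat), 1e-12)
            if change < eps:                          # assumed stopping rule
                break
        x_prev = x_hat
    return x_hat, Phi.shape[0]

rng = np.random.default_rng(1)
N, K = 100, 10
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = rng.choice([-1.0, 1.0], size=K) * rng.uniform(1.0, 2.0, size=K)
x_hat, m_used = adaptive_gather(x)
print("measurements used:", m_used, "relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```

The sink never needs to know K in this sketch: the number of measurements it ends up requesting is decided by the data through the stopping rule.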

Algorithm 1: Main Procedure of the Proposed Scheme
Input: Reconstruction quality requirement, time of each data gathering period T
Output: Network sensed data X in epoch T_i
Initiate the network settings, flag = 1
If clock interrupt of T_i is received do
    generate feed-back packet and propagate it to the network
    ΔM measurements are propagated to the sink node through the designed routing architecture
    the sink node receives and adds these projections into the projection pool and gets the reconstruction result X̂
    evaluate X̂
    If stopping rule is achieved then
        flag = 0
    End if
While flag = 1
End if
Return X̂

4. Measurement Formation Process

4. Measurement Formation Process in Section 2, CDG schemes require a certain number of According to the description measurements of network sensed data, and these measurement results are generated and aggregated According to the description in Section 2, CDG schemes require a certain number of on measurement-formation paths (i.e., trees). As each measurement is a linear combination of multiple measurements of network sensed data, and these measurement results are generated and contributions, each of which is generated by multiplying the reading of an interested node and its aggregated on measurement-formation paths (i.e., trees). As each measurement is a linear corresponding measurement coefficient, an underlying question surfaces that how the measurement combination of multiple contributions, each of which is generated by multiplying the reading of an coefficients are generated and distributed. For the example proposed in Figure 2D, before responding to interested node and its corresponding measurement coefficient, an underlying question surfaces the requirement of the formation of the measurement y1 , interested nodes #1, #3 and #9 should receive that how the measurement coefficients are generated and distributed. For the example proposed in their coefficients first. One may think that the sink node can pack and broadcast each measurement Figure 2D, before responding to the requirement of the formation of the measurement , interested coefficient vector Φi to the network during each data gathering epoch, and every node in the network nodes #1, #3 and #9 should receive their coefficients first. One may think that the sink node can can directly acquire its measurement coefficient vector by monitoring these packets. 
However, for a WSN with dozens of sensor nodes deployed in a large, harsh field, such a process incurs a very heavy time-slice overhead. Worse, such broadcasting requires synchronization among the nodes in the network, so it is impractical, especially for WSNs that apply sleeping mechanisms.

Considering the tree-type, sink-rooted measurement-formation path that covers all interested nodes, we propose to generate the measurement coefficients while these paths are being constructed. We achieve this objective by means of a pseudo-random generator, an algorithm publicly known to both the sink node and the sensor nodes in the network. In each measurement-coefficient-generating procedure, the sink node first produces a global seed corresponding to this measurement (in fact, the sink node's running time would be a good candidate for this global seed). Once an interested node in the network receives this global seed, it can generate its own measurement coefficient by adding its node ID. The sink node can generate a sparse coefficient vector $\Phi_i$, where each element $\Phi_{ij}$ is randomly generated by using the group seed (global seed $S_i$, node ID $j$). Thus, the coefficients in $\Phi_i$ are independent and identically distributed (i.i.d.), and the sink node can plan the measurement-formation path according to $\Phi_i$. For the measurement $y_1$ proposed in Figure 2D, this measurement-formation process is depicted in Figure 5, and two contribution packets can be merged on node #3. One issue still remains: how to build, in an energy-efficient way, an optimal measurement-formation path that covers all interested nodes. In other words, how can we ensure that all interested nodes in the network receive the global seeds efficiently while minimizing the total transmission cost? In the following sections, we present our measurement-formation method.
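The seed-based coefficient generation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of a SHA-256 hash as the publicly known pseudo-random generator, the function names, and the mapping from hash bits to coefficients are all assumptions made for illustration. The coefficient distribution (+1 and -1 with probability 1/4 each, 0 with probability 1/2) is the sparse Bernoulli distribution used in the experiments of Section 6.1.

```python
import hashlib

def coefficient(global_seed: int, node_id: int) -> int:
    """Return a sparse Bernoulli coefficient in {-1, 0, +1} for one node.

    Hypothetical mapping: hash the pair (global_seed, node_id) and use two
    bits of the digest, so +1 and -1 each get 1/4 of the cases and 0 gets 1/2.
    """
    digest = hashlib.sha256(f"{global_seed}:{node_id}".encode()).digest()
    bits = digest[0] & 0b11            # two pseudo-random bits: 0, 1, 2 or 3
    return {0: 1, 1: -1}.get(bits, 0)  # 0 -> +1, 1 -> -1, 2 or 3 -> 0

def coefficient_vector(global_seed: int, num_nodes: int) -> list:
    """Sink-side view: regenerate the whole row Phi_i from the global seed."""
    return [coefficient(global_seed, j) for j in range(1, num_nodes + 1)]

def interested_nodes(global_seed: int, num_nodes: int) -> list:
    """IDs of the nodes whose coefficient is non-zero for this measurement."""
    row = coefficient_vector(global_seed, num_nodes)
    return [j for j, c in enumerate(row, start=1) if c != 0]
```

Because both sides evaluate the same deterministic function, a node that receives only the global seed obtains exactly the coefficient the sink computed for it, and the sink knows the interested nodes without transmitting the coefficient vector.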

Figure 5. Global seed distribution and measurement result packet merge on node #3.

4.1. Construction of the Measurement-Formation Path (Tree)

From the perspective of graph theory, this measurement-formation process comes down to constructing a sink-rooted path which covers all interested nodes. To improve the efficiency of measurement formation, the total length of this tree should be minimized. Thus, the construction of such an optimal tree is a typical Steiner minimum tree problem [17]. Unfortunately, this problem has been proved to be NP-complete [18]. To solve it, a heuristic method based on the All-Pair-Shortest-Paths (APSP) algorithm [14] has been proposed for CDG applications. However, APSP tries to form a line-shaped measurement-formation path, which does not consider the wireless transmission characteristics of WSNs. For example, the total lengths of both the left and right trees in Figure 6 are 6, but if we consider the eavesdropping characteristics of the wireless channels used in WSNs, the actual transmission overheads of the left and right trees are 6 and 4, respectively. This example shows that the overhead of constructing the tree depends on the number of non-leaf nodes, namely the number of relay nodes. Thus, this is a Maximum Leaf Nodes-Minimum Steiner Nodes (MLMS) problem, i.e., we need to build a sink-rooted minimum Steiner tree (denoted by $T$) that covers all interested nodes (denoted by $I$). Note that not all interested nodes are leaf nodes; some can also act as non-leaf nodes. An MLMS tree is required to minimize the number of introduced non-interested nodes (also called Steiner nodes) and to maximize the number of leaf nodes. In the next subsection, we present the mathematical model of this problem, and after that, a global approximation algorithm is designed to solve it.

Note that classical converge-cast protocols can be good candidates for data aggregation schemes, as well as for those CDG applications that do not consider measurement coefficient generation/distribution (these CDG applications assume that the nodes in the network have already been assigned the measurement coefficients). Through this local gradient-routing mechanism, measurement results can be safely reported to the sink node. We proposed our MLMS tree-shape method based on an overall consideration of path generation, measurement coefficient generation/distribution and measurement result formation, and the measurement-formation path is created at the time of the global seed distribution.
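The observation that the overhead depends on the number of non-leaf nodes can be checked with a short Python sketch. The two topologies below are hypothetical (they are not the trees drawn in Figure 6); they are only chosen so that both have 6 edges while their broadcast costs are 6 and 4. The function names are likewise assumptions.

```python
def edge_count(children: dict) -> int:
    """Total tree length when every hop has unit length."""
    return sum(len(kids) for kids in children.values())

def transmission_cost(children: dict) -> int:
    """Number of transmissions needed to push one packet down the tree.

    `children` maps each node to the list of its child nodes. Thanks to
    wireless overhearing, a single broadcast by a parent reaches all of its
    children, so only non-leaf nodes have to transmit.
    """
    return sum(1 for kids in children.values() if kids)

# Hypothetical line-shaped tree: sink 0 followed by a chain of six nodes.
chain = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: []}

# Hypothetical branching tree with the same number of edges.
bushy = {0: [1], 1: [2, 3], 2: [4], 3: [5, 6], 4: [], 5: [], 6: []}

print(edge_count(chain), transmission_cost(chain))  # 6 6
print(edge_count(bushy), transmission_cost(bushy))  # 6 4
```

With equal total length, the tree with more leaves needs fewer transmissions, which is the motivation for maximizing leaf nodes in the MLMS formulation.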

Figure 6. Measurement-formation path of PB-CDG and MLMS.

4.2. Mathematical Model of MLMS Problem

We describe the WSN as a connected graph $G = \langle V, E, T \rangle$, where $V$ is the set of nodes and $E$ is the set of undirected edges connecting any two nodes that can directly communicate with each other. $T = \{\mathit{In}_i, S_i, Y_i\}$, $i = 1, 2, \ldots, M$, is the set of $M$ measurement-formation routing trees. $\mathit{In}_i$ is the set of interested nodes of the $i$th measurement, $S_i$ ($S_i \subset V - \mathit{In}_i$) is the set of Steiner nodes, and $Y_i = \{y_{ij}^{t}\}$ is a binary variable indicating whether there is an edge between node $i$ and node $j$ for tree $t$: when $(i, j)$ is an edge of the optimal Steiner tree $t$, $y_{ij} = 1$; otherwise, $y_{ij} = 0$. The objective and constraints of our MLMS construction problem can then be formulated as below:

$$\min z = \sum_{t=1}^{M} \sum_{(i,j) \in E} y_{ij}^{t}, \quad \text{s.t.} \tag{1}$$

$$\sum_{i:(i,j) \in E} y_{ij} = 1, \quad \forall i \in \mathit{In}_i,\ \forall j \in S_i \tag{2}$$

$$\sum_{i,j \in \mathit{In}_i \cup S_i} y_{ij} = |\mathit{In}_i \cup S_i| - 1 \tag{3}$$

$$\sum_{i,j \in \mathit{In}_i \cup S_i} y_{ij} = |\mathit{In}_i \cup S_i| - 1 \tag{4}$$

$$\sum_{i \in \mathit{In}_i,\ j \in \mathit{In}_i \cup S_i} y_{ij}^{t} = \left\lceil \frac{N}{M} \right\rceil, \quad \forall t \in T \tag{5}$$

$$y_{ij}^{t} \in \{0, 1\}, \quad \forall (i,j) \in E,\ \forall t \in T \tag{6}$$

Constraint (2) limits each interested node to connecting with only one Steiner node. By using Constraint (3), the degree of each Steiner node is limited to no less than 2. Constraint (4) ensures that the result we obtain is a spanning tree. Constraint (5) is a restriction on the number of interested nodes for the spanning tree.
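As an illustration of the spanning-tree requirement, the following Python sketch checks whether a candidate edge set forms a tree over a given node set. It is only a feasibility check under assumed names (`is_tree`, `nodes`, `edges`), not part of the authors' model. Note that having exactly $|\mathit{In}_i \cup S_i| - 1$ edges is necessary but not sufficient on its own; the selected edges must also connect all the nodes, which the sketch verifies with a depth-first search.

```python
def is_tree(nodes: set, edges: set) -> bool:
    """True if the undirected `edges` form a tree spanning exactly `nodes`."""
    # Edge-count condition: a tree on k nodes has k - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u not in nodes or v not in nodes:
            return False
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Connectivity condition: every node must be reachable from any start node.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for nxt in adjacency[current] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes
```

For example, `is_tree({0, 1, 2, 3}, {(0, 1), (1, 2), (1, 3)})` holds, whereas the edge set `{(0, 1), (1, 2), (0, 2)}` over the same four nodes has the right number of edges but leaves node 3 disconnected.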

4.3. Scalable Algorithm for MLMS Construction

Notice that all interested nodes $I_v$ within the communication range of node $v$ can be considered to come from the same group. Based on this observation, we can perform a greedy iterative algorithm to reduce the complexity of the MLMS construction problem. Without loss of generality, node $v$ can perform the iteration operation on behalf of the other nodes in such a group. During each iteration, we first select the node that connects the most interested nodes or groups, combine this node and its associated nodes or groups to form a new group, and repeat this process until all groups have been found. Next, in order to connect the group leader nodes and the remaining interested nodes with the sink node, an optimal Steiner tree approximation algorithm is required. Since this problem is NP-complete, we propose an alternative algorithm built upon the minimum spanning tree (MST) algorithm. The scope in which the MST is run is the closure that takes the sink node as a fulcrum and contains all the remaining interested nodes. Then, we perform the Graham scan algorithm to obtain a convex hull of all remaining groups. Finally, we perform the MST algorithm to connect the nodes in the convex hull. A detailed description of this algorithm is given in Algorithm 2. It is easy to see that, for each measurement-formation tree, the maximum number of iterations for group formation is $\rho/2$; for a node-sparse graph the proposed algorithm has a complexity of $O(\rho^2 d^2)$, and for an edge-sparse graph $O(\rho^2 d^2 + n \log n + n^2)$, where $d$ is the average degree of the nodes in the network, $n$ is the number of edges and $\rho$ is the number of interested nodes.

Algorithm 2: Construction of the Measurement-Formation Path

Input: Sparse measurement coefficient vectors $\Phi_1, \Phi_2, \ldots, \Phi_M$, each with $n$ i.i.d. elements; nodes' maximum communication range $R_{comm}$
Output: Set of measurement-formation trees $T = \{T_1, T_2, \ldots, T_M\}$, $T_i = \{\mathit{In}_i, S_i, Y^i\}$

For each measurement $i = 1, 2, \ldots, M$
    Calculate the set of interested nodes $\mathit{In}_i = \{$all nodes s.t. $\Phi_{ij} \neq 0,\ j = 1, 2, \ldots, n\}$
    Calculate the set of neighbor nodes $Ne = \{$each node $v$ s.t. $dist(v, t) \leq R_{comm},\ v \in V,\ t \in \mathit{In}_i\}$
    Do
        Calculate the number of edges incident from $Ne$ to $\mathit{In}_i$ (denoted as $De\{|Ne|\}$)
        If $\max(De\{|Ne|\}) \geq 2$
            Remove all adjacent nodes of node $v$ from $\mathit{In}_i$, where $v \in \mathit{In}_i$ and $De\{v\} = \max(De\{|Ne|\})$
            Add node $v$ into $\mathit{In}_i$
            Renew $Ne$ and $\mathit{In}_i$
        End If
    While ($|\mathit{In}_i^{new}| < |\mathit{In}_i^{old}|$)
    Calculate the convex hull (denoted as $Cl$) of the set $Ne \cup \{sink\}$ by using the Graham scan method
    Construct the MST which takes the sink as a root and connects all $Cl$ nodes
    $T_i = \{\mathit{In}_i, Cl, Y^i\}$, where $Y^i$ records the edge information of this MST
End For
Return $T = \{T_1, \ldots, T_M\}$

4.4. MLMS Tree Construction and Maintenance

In order to avoid the large number of control packets that constructing MLMS trees would otherwise cause, after calculating the approximate minimum-cost MLMS path $T$, we let the sink node pack the information of $T$ together with the global seed into several packets. As the nodes relay these packets, the MLMS tree is constructed in order (as shown in Figure 7). For a large-scale network, however (meaning that there are many interested nodes in each measurement and the path from the sink node to the interested nodes may be extremely deep), the complete path information takes up a lot of space in the packet header. Because of the length limitation of packets transmitted in the WSN, the sink node needs to compress and encode the information of $T$. Noticing that Algorithm 2 divides neighboring interested nodes into a group, we can exploit the broadcast nature of wireless communication to locally broadcast such packets at each branch node, so that all its neighbors can overhear them. Thus, in the branches of the MLMS path, we only need to encode the branch node information and add the local broadcast hops to the packet header, rather than the information of all interested nodes. In this way, we can compress the path information to reduce the number of control packets.
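The group-formation loop of Algorithm 2 in Section 4.3 can be sketched in Python as follows. This is a simplified, single-level interpretation rather than the authors' implementation: group heads are kept in a separate dictionary instead of being re-inserted into the interested set, ties are broken by the smallest node ID, and the convex-hull and MST steps are omitted. The names `greedy_groups`, `interested` and `neighbors` are assumptions.

```python
def greedy_groups(interested: set, neighbors: dict):
    """Greedily merge interested nodes that share a common neighbor.

    interested: IDs of the interested nodes of one measurement.
    neighbors:  maps every node ID to the set of nodes within its range.
    Returns (heads, remaining): `heads` maps each chosen group head to the
    interested nodes it covers; `remaining` holds the ungrouped ones.
    """
    remaining = set(interested)
    heads = {}
    while True:
        best, best_cover = None, set()
        for v in sorted(neighbors):            # sorted: deterministic ties
            if v in heads:
                continue
            cover = neighbors[v] & remaining   # interested nodes v can reach
            if len(cover) > len(best_cover):
                best, best_cover = v, cover
        # Stop when no node covers at least two interested nodes.
        if best is None or len(best_cover) < 2:
            break
        heads[best] = best_cover
        remaining -= best_cover
        remaining.discard(best)  # a head that is itself interested is covered
    return heads, remaining

# Hypothetical 8-node topology with interested nodes 2, 3, 6, 7 and 8.
topology = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5},
            5: {4, 6, 7}, 6: {5}, 7: {5}, 8: set()}
heads, remaining = greedy_groups({2, 3, 6, 7, 8}, topology)
print(heads, remaining)  # {1: {2, 3}, 5: {6, 7}} {8}
```

Each iteration removes at least two nodes from the remaining set, so the loop runs at most $\rho/2$ times for $\rho$ interested nodes, in line with the iteration bound stated in Section 4.3.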

Figure 7. MLMS construction and the compressed packet format.

As shown in Figure 7, all interested nodes and branch nodes are in four groups a, b, c and d, and we can guarantee that all interested nodes receive the corresponding global seed simply by coding the information of the branch nodes into the packet header; the MLMS path is constructed as these packets are disseminated. Obviously, this mechanism is affected by the density of the interested nodes (i.e., the proportion of the number of interested nodes to the total number of network nodes), and we will evaluate it in Section 6.

4.5. Maintenance of MLMS Measurement-Formation Path

WSNs are often deployed in harsh environments, where unreliable wireless links and failures of certain nodes are prevalent, which can cause the creation and maintenance of the MLMS path to fail. In fact, the failure of any intermediate node compromises the delivery of all data aggregated and sent by the previous nodes in the path. Hence, some improvements to the scheme should be implemented. We discuss this problem from two aspects: node failure and packet loss.

Because the failures of other nodes do not affect the current MLMS measurement-formation process, we mainly deal with node failures on the MLMS path. If a node has failed, a new routing path should be established in a timely fashion, and the information of this failed node should be reported to the sink node to guide the building of another MLMS path. Two kinds of node failures on an MLMS path are discussed below: group head node failure and ordinary node failure. If a group head node at the end of the MLMS path has failed, a new head node needs to be selected to continue transmitting the packets of the nodes in the group. By dividing the remaining nodes into several small groups, we can choose the node which has the second-highest number of neighbor nodes to act as the new group head. If a node on the path fails to operate, its upstream node chooses the next neighbor node, similar to the GPSR protocol [19], and continues to transmit the packet, since all routing information (heading to the interested nodes) is encoded in the packet header. As shown in Figure 8, when node #5 has failed and its upstream node #4 encounters this situation, node #4 automatically selects a next-hop node (e.g., node #6) from the working nodes among its neighbors. At this point, however, the routing path is probably no longer optimal, and if necessary, the optimal MLMS path should be recalculated.

For returning the node-failure message, we need to deal with node failures in the different measurement-formation periods separately. During the distribution of the global seed, we can use the piggyback method, which allows the sender to add the ID of the failed node to a fixed position of the seed-distribution packet. As node #4 sends packets to node #6, because node #3 can eavesdrop on the packet, it writes the ID of failure node #8 into a corresponding fixed position of its next packet to be sent, and so on in turn, until the failure information of node #5 is backtracked to the sink through the normal data channel.

Figure 8. Piggyback nodes recovery and NAck packets loss.

Therefore, without adding any control packets, and by exploiting the eavesdropping property, information about failed nodes is quickly reported to the sink. For failed nodes detected during measurement formation, we can directly write the ID of the failed node into the fixed position of the measurement-formation packet. With the return of the measurement-formation packets, the sink node directly obtains the information of the failed nodes.

Each measurement-formation process, both in the global seed distribution period and during the establishment of the measurement result, will inevitably experience packet losses, and if packet losses occur without a recovery mechanism, the measurement formation will fail. If a node on the MLMS path discovers that several packets in a sequence have not been received within an acceptable TTL time, it asks the upstream node for these packets along the reverse path. If the upstream node does not keep these packets, this process continues until the interested nodes are reached. As shown in Figure 8, if node #4 fails to receive the packet from #6, node #4 will request this packet from its upstream node #6; if node #6 does not keep it, node #6 will request this packet from its upstream node #8, and the process is repeated until the packet is found. Of course, to further improve reliability and recover lost packets in a timely manner, we can set nodes around the MLMS paths to buffer the packets they have eavesdropped on; when certain packets are lost, they can be recovered quickly and accurately through multicast inquiry messages to the neighbor nodes. Moreover, a consultation method can also be used to further improve the reliability of packet transmission (for example, three-way handshake negotiation). The disadvantage of these two methods is the large overhead of transmitting control messages, and we will evaluate it in Section 6.

5. Adaptive Termination Rule of the Measurement-Formation

As mentioned before, after choosing a reconstruction algorithm (such as BP or MP), the sink node can reconstruct the $K$-sparse original signal with probability close to 1 as long as it receives a certain number of measurement results (theoretically, $O(K \log N)$). However, as shown in Section 3, this bound cannot be applied to our method directly, because it requires prior knowledge of the sparsity of the network sensed data. By using the conclusion from "sequential compressed sensing" [20], an adaptive termination condition for the measurement procedure was proposed in [21]. Specifically, assume that we can obtain the measurement results of a $K$-sparse signal $x$ ($x \in \mathbb{R}^N$, $K$ is unknown) step by step (each step yields one measurement result, and we denote the measurement result at step $i$ as $y_i = \phi_i x$); if the reconstruction result at step $M$ is $\hat{x}^{M}$ and $\hat{x}^{M} = \hat{x}^{M-1}$, then $\hat{x}^{M}$ is the result we need (i.e., $\hat{x}^{M} = x$) with probability 1.

Although this termination rule is very concise, it still has some limitations when applied to CDG for WSNs. Because this rule is based on the dense measurement method (each measurement coefficient vector $\phi_i$ obeys a Gaussian distribution), as shown in Section 2, each measurement formation involves every node in the network, which leads to more energy consumption than the sparse measurement method. On the other hand, the coefficients in the measurement vectors of this termination rule are continuous; thus, they are very difficult to generate on cheap WSN sensor nodes. Therefore, we need to find a termination rule for the sparse measurement method in this study. The difference from the dense Gaussian measurement method is that the probability of being singular for each submatrix of the measurement coefficient matrix is no longer 0. In order to meet our specific demands, we modified the termination rule in [20] to require agreement over a certain number of consecutive measurements (denoted as $\Delta M$), as shown in Proposition 1. Note that we assume that the sink node can obtain and reconstruct the sparse Bernoulli measurement results step by step, and $x$ represents the $N$-dimensional data vector desired by the sink node in each data-gathering epoch.

Lemma 1. Let $v$ be an $n$-dimensional sparse Bernoulli vector ($v \in \{-1, 0, 1\}^n$) with $\|v\|_0 = \lambda$ ($0 < \lambda \leq n$). Let $W$ be a deterministic $w$-dimensional subspace of $\mathbb{R}^n$ ($0 < \lambda < w \leq n$); then $\mathrm{Prob}(v \in W) \leq 3^{w-n}$.

Proof. Since $W$ is a $w$-dimensional subspace, there exist $w$ coordinates $c_{i_1}, \ldots, c_{i_w}$ that determine all the other $n - w$ coordinates of an arbitrary vector $c = (c_1, \ldots, c_n) \in W$. Thus, if we condition on the coordinates $c_{i_1}, \ldots, c_{i_w}$ of $v$, there is at most one choice of the other $n - w$ coordinates that makes $v \in W$. Hence the probability of $v \in W$ is at most $3^{-(n-w)}$.

Proposition 1. Suppose that the sink node has obtained $M - 1$ measurement results in the previous $M - 1$ steps, and at step $M$, once the sink node has obtained the measurement result $y_M$ from the network, it can compute a reconstruction result of the network sensed data (denoted as $\hat{x}^{M}$) from the received measurement results; if $\hat{x}^{M} = \hat{x}^{M-1} = \ldots = \hat{x}^{M-\Delta M}$, then $\hat{x}^{M}$ is the reconstruction result we need with probability no less than $1 - 3^{-\Delta M}$.
Proof. We prove this proposition by contradiction. We denote the measurement coefficient vector at step $i$ as $\phi_i$, and the measurement result of step $i$ as $y_i$. Assume that $\hat{x}^{M} = \hat{x}^{M-1} = \ldots = \hat{x}^{M-\Delta M}$ and that $\hat{x}^{M}$ is not the correct reconstruction result of $x$; then $x \neq \hat{x}^{M-\Delta M} = \ldots = \hat{x}^{M}$. Because we have obtained the solution $\hat{x}^{M-\Delta M}$ at step $M - \Delta M$, the equation $(\phi_{M-\Delta M})^{T} \hat{x}^{M-\Delta M} = y_{M-\Delta M} = (\phi_{M-\Delta M})^{T} x$ holds. Thus, according to the assumption that $\hat{x}^{M-\Delta M} = \hat{x}^{M}$, it is easy to obtain $(\phi_{M-\Delta M})^{T} (x - \hat{x}^{M}) = 0$. Following this idea, for the $\Delta M$ measurement coefficient vectors generated from step $M - \Delta M + 1$ to step $M$, we also get the equations $(\phi_i)^{T} (x - \hat{x}^{M}) = 0$, $i = M - \Delta M + 1, \ldots, M$. Let $V$ be the vector space spanned by the non-zero vector $x - \hat{x}^{M}$; if this equation set holds, all $\Delta M$ measurement coefficient vectors $\phi_i$ must belong to the orthogonal complement of $V$ (denoted as $W$, where $W = V^{\perp} = \{x \mid x \in \mathbb{R}^N \text{ s.t. } (x, y) = 0 \text{ for all } y \in V\}$). Since $V$ is the 1-dimensional subspace spanned by the nonzero vector $x - \hat{x}^{M}$, its orthogonal complement is an $(N-1)$-dimensional subspace of $\mathbb{R}^N$. Hence, from Lemma 1, each of the $\Delta M$ coefficient vectors lies in $W$ with probability at most $3^{-1}$, so the probability that an imprecise solution $\hat{x}^{M}$ is calculated and kept is no greater than $3^{-\Delta M}$, as desired.

Proposition 1 provides a termination rule for sparse Bernoulli measurement formation by comparing several consecutive reconstruction results. In our experiments, signals are generated with the same signal length and sparsity as shown in Figure 3. We test the signal reconstruction under the termination rule of Proposition 1 with different $\Delta M$ values, and the experiment results are shown in Table 1. We can see that Proposition 1 presents a very efficient terminating condition for reconstruction with the sparse Bernoulli measurement method. The larger the value of $\Delta M$, the fewer erroneous reconstructions occur (e.g., 128 errors happen under $\Delta M = 1$, while only 2 happen under $\Delta M = 2$). Thus, our data gathering process can adapt to the various requirements of network applications by adjusting the value of $\Delta M$.
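The termination rule of Proposition 1 can be expressed as a small Python sketch. The reconstruction algorithm itself (e.g., BP or MP) is not implemented here; `reconstruct` and `next_measurement` are hypothetical callables supplied by the caller, and the tolerance `tol` used to decide that two reconstructions are equal is an assumption.

```python
def agreed(history: list, delta_m: int, tol: float = 1e-9) -> bool:
    """True if the latest reconstruction equals the delta_m previous ones."""
    if len(history) < delta_m + 1:
        return False
    latest = history[-1]
    return all(
        max(abs(a - b) for a, b in zip(latest, earlier)) <= tol
        for earlier in history[-(delta_m + 1):-1]
    )

def error_bound(delta_m: int) -> float:
    """Upper bound 3^(-delta_m) on the probability of stopping on a wrong result."""
    return 3.0 ** (-delta_m)

def gather(next_measurement, reconstruct, delta_m: int, max_steps: int):
    """Request measurements one by one until Proposition 1's rule fires.

    next_measurement() -> (phi, y): one coefficient row and its result.
    reconstruct(rows, ys) -> sequence of floats: current estimate of x.
    Returns the final estimate and the number of measurements used.
    """
    rows, ys, history = [], [], []
    for _ in range(max_steps):
        phi, y = next_measurement()
        rows.append(phi)
        ys.append(y)
        history.append(tuple(reconstruct(rows, ys)))
        if agreed(history, delta_m):
            break
    return history[-1], len(ys)
```

A larger `delta_m` lowers `error_bound(delta_m)` at the price of a few extra measurements, which is the trade-off reported in Table 1.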

Table 1. Data reconstruction experiment on sparse Bernoulli measurement by using Proposition 1.

∆M    Errors    Upper Bound    Average Measurement    Standard Deviation
1     128       200            31.0733                6.9430
2     2         200/3          34.8133                4.8461
3     0         200/9          36.7350                4.8882
4     0         200/27         37.9867                4.8428

6. Numerical Results

We evaluate the performance of the proposed and existing schemes through both simulations and practical experiments. The experiments fall into two main parts: (1) a simulation of the cost of MLMS measurement formation and (2) the reconstruction of the sensed data in the network using our proposed termination rule. We employ OMNeT++ to build sensor networks of randomly distributed wireless nodes. Because the reconstruction performance at the sink node depends on the sparsity of the data itself and on the choice of reconstruction algorithm, it belongs to the application layer; for convenience, we use MATLAB to reconstruct the data received by the sink node.

6.1. Energy Consumption of Measurements Formation in MLMS Construction and Maintenance

We randomly disperse N nodes in an 800 × 800 two-dimensional plane. Every node has the same transmission distance r, and two nodes can communicate with each other only if the Euclidean distance d between them is less than r. To guarantee network connectivity, the communication distance of the nodes is fixed at $800\sqrt{5/N}$ [22], and the collection node is located at (400, 800). The detailed parameters are shown in Table 2. We compare our MLMS method with the schemes mentioned in Section 2, including Plain-CS, Hybrid-CDG, and PB-CDG [14]. The measurement coefficients for all of these methods obey a sparse Bernoulli distribution (entries take the value 1 or −1, each with probability 1/4, and the value 0 with probability 1/2).

Table 2. Parameters of the simulation experiments.

Parameter               | Value
------------------------|-------------------------------------
size of network         | 800 × 800
number of nodes         | 100~1200
max transmission radius | $800\sqrt{5/(\text{number of nodes})}$
sink node position      | (400, 800)
packet length           | 120 Byte
PHY layer               | Free space propagation model
MAC layer               | CSMA/CA
E_elec                  | 50 nJ/bit
E_start                 | 250 nJ/packet
channel type            | erasure channel
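The simulation setup above can be sketched in a few lines. This is only an illustrative approximation of the described configuration, not the OMNeT++ model used in the experiments: the helper names (`comm_radius`, `random_topology`, `sparse_bernoulli_row`) and the random seed are hypothetical, and we assume that the nodes with a nonzero coefficient are the ones that contribute to a measurement.

```python
import numpy as np

def comm_radius(n_nodes, side=800.0):
    """Communication distance 800 * sqrt(5 / N) from Table 2."""
    return side * np.sqrt(5.0 / n_nodes)

def random_topology(n_nodes, rng, side=800.0):
    """Scatter nodes uniformly in the plane and link pairs closer than r."""
    pos = rng.uniform(0.0, side, size=(n_nodes, 2))
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < comm_radius(n_nodes, side)) & ~np.eye(n_nodes, dtype=bool)
    return pos, adj

def sparse_bernoulli_row(n_nodes, rng):
    """One coefficient vector: +1 or -1 with prob. 1/4 each, 0 with prob. 1/2."""
    return rng.choice([1, -1, 0], size=n_nodes, p=[0.25, 0.25, 0.5])

rng = np.random.default_rng(1)          # arbitrary seed for repeatability
pos, adj = random_topology(500, rng)
phi = sparse_bernoulli_row(500, rng)
contributing = np.flatnonzero(phi)      # assumed: nodes with nonzero coefficient
```

For N = 500 the radius evaluates to 80, and on average about half of the nodes carry a nonzero coefficient in each measurement.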

We first test the energy consumption of the various measurement-formation methods as the number of measurements and the number of nodes in the network (density) increase. Note that, similar to reference [14], we measure a node’s energy consumption by the number of packets it transmits. The experimental results are shown in Figure 9. Non-CS and Hybrid-CS incur a higher energy cost than PB-CDG and the proposed MLMS method, because all measurements in these two methods are collected through a single fixed forwarding tree, which considers neither the sparse structure nor the resulting routing path of each measurement. Although both MLMS-CDG and PB-CDG are derived from the idea of a Steiner tree, MLMS-CDG consumes less energy than PB-CDG (up to 14.52% and 17.76% less in Figure 9A,B, respectively). This is largely because MLMS incurs a lower overhead of measurement coefficient distribution than PB-CDG (11.4% lower on average across different network densities).

[Figure 9: (A) transmission cost versus number of projections (N = 500 nodes); (B) transmission cost versus number of nodes (M = N × 10%). Curves: Non-CS, Hybrid-CS, PB-CDG, MLMS-CDG.]

Figure 9. Comparison of MLMS measurement formation and other methods. (A) Experiment of the influence of the number of measurements; (B) Experiment of the influence of the density of the network.
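Since energy is measured by packet count, a packet count can be turned into an energy figure with the parameters of Table 2. The sketch below assumes a simple per-packet model, E_start plus E_elec times the number of bits, which this section does not state explicitly; the function names and the packet counts in the usage lines are hypothetical.

```python
E_ELEC_NJ_PER_BIT = 50.0        # E_elec from Table 2
E_START_NJ_PER_PACKET = 250.0   # E_start from Table 2
PACKET_BYTES = 120              # packet length from Table 2

def packet_energy_nj(packet_bytes=PACKET_BYTES):
    """Assumed energy of one transmission: start-up cost plus per-bit cost."""
    return E_START_NJ_PER_PACKET + E_ELEC_NJ_PER_BIT * 8 * packet_bytes

def total_energy_mj(num_packets):
    """Energy in millijoules for a given number of transmitted packets."""
    return num_packets * packet_energy_nj() * 1e-6

def relative_saving(baseline_packets, packets):
    """Fraction of transmissions (and, under this model, energy) saved."""
    return 1.0 - packets / baseline_packets

# Hypothetical packet counts, for illustration only.
saving = relative_saving(1000, 850)
energy = total_energy_mj(1000)
```

Because every packet has the same length in this model, the relative saving in energy equals the relative saving in packet count, which is why the packet count serves as a proxy.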

We then experiment on the number of control messages of the PB-CDG and MLMS measurement-formation methods as the node density in the network increases. Because the PB-CDG method specifies neither how the measurement coefficients are generated nor how node failures and packet losses are handled, we apply the same measurement generation to the PB-CDG scheme as to MLMS. Figure 10A shows the overhead of constructing the measurement-formation paths with the original methods and with the compressed construction method (MLMS-CC). As the number of sensor nodes grows, the overhead of PB-CDG increases more sharply than that of both MLMS and MLMS-CC. This is mainly because PB-CDG builds several line-shaped paths for each measurement formation, which requires a greater number of control messages to distribute the global seeds (the overheads of MLMS and MLMS-CC are on average 53.33% and 68.70% less than that of PB-CDG).

Then, we experiment on the influence of the intensity of interested nodes. As shown in Figure 10B, when the density of interested nodes first increases, the control messages of all schemes increase sharply; the difference is that MLMS-CC begins to increase slowly once the intensity ratio reaches 50%, which is lower than for MLMS (60%) and PB-CDG (70%). This is because, once the interested nodes account for a certain share of the nodes in the network, further increasing their number makes group construction in the MLMS method highly efficient; therefore, the MLMS-CC and MLMS methods are not severely affected.

[Figure 10: (A) number of control messages (×10^5) versus number of nodes (×10), with an average (AVE) column; (B) number of control messages (×10^5) versus the proportion of interested nodes to the total number of network nodes (20% to 80%, with AVE). Curves: PB-CDG, MLMS, MLMS-CC.]

Figure 10. Comparison of the number of control messages of MLMS measurement formation and other methods. (A) Experiment of the overhead of the construction of measurement-formation paths (interested nodes of each measurement account for 50% of nodes in network); (B) Experiment of the influence of the number of interested nodes.

In the performance evaluation of the Piggyback mechanism for handling node failures, we simulated the influence of the network size (500 and 1000 nodes) and of the intensity of interested nodes (30% and 60% of the nodes in the network). Each node on an MLMS path may fail with a probability of 1%~7% during the measurement-formation periods. The experimental results are shown in Figure 11A. Both the network size and the density of interested nodes affect the Piggyback mechanism, but the impact of the network size is significantly larger than that of the intensity of interested nodes. For example, at a node failure rate of 7%, if the network size doubles (from 500 to 1000), the number of packets increases by 205.46%, whereas if the node density doubles (from 30% to 60%), the number of packets increases by only 42.17%. This is because a larger network inevitably lengthens the MLMS paths, which markedly increases the overhead of returning failure-node information in the Piggyback mechanism; increasing the density of interested nodes, however, has no significant influence.

[Figure 11: (A) number of messages (×10^6) versus the probability of failure nodes in each measurement-formation path (no error, 1% to 7%); curves: Nodes 500, intensity 50%; Nodes 1000, intensity 50%; Nodes 1000, intensity 30%; Nodes 1000, intensity 60%. (B) number of messages (×10^6) versus bit error rate (×10^-4); curves: NAck mechanism, Negotiation mechanism.]

Figure 11. Overhead of maintaining measurement-formation paths. (A) Influence of nodes failure; (B) Influence of packets loss.
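For the packet-loss experiment of Figure 11B, the bit error rate of the erasure channel determines how many packets are delivered. As a rough sketch, assuming independent bit errors, that a packet is lost whenever any of its bits is corrupted, and the 120-byte packets of Table 2 (none of these modelling details is spelled out in the text), the delivery rate can be estimated as follows. The function name is hypothetical.

```python
def packet_delivery_rate(ber, packet_bytes=120):
    """Probability that all bits of a packet arrive intact (assumed model)."""
    return (1.0 - ber) ** (8 * packet_bytes)

rate_low = packet_delivery_rate(0.4e-4)    # BER = 0.4 x 10^-4
rate_high = packet_delivery_rate(0.5e-4)   # BER = 0.5 x 10^-4
```

Under these assumptions a BER between 0.4 × 10^-4 and 0.5 × 10^-4 gives a delivery rate of roughly 95% to 96%, in line with the figure of about 95% quoted in the discussion of Figure 11B.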

Because nodes in the network communicate over an erasure channel, we can generate random packet losses by setting the bit error rate (BER) of this channel. We simulated the proposed NAck mechanism and the hand-shaking negotiation mechanism on a 1000-node network (the intensity of interested nodes is 50%). The experimental results are shown in Figure 11B, where the difference between the two mechanisms is clear. As the BER increases, the number of packet transmissions in the NAck mechanism grows more rapidly than in the negotiation mechanism. Specifically, when the BER is relatively small, the NAck mechanism incurs fewer control packets; when the BER is around $0.4 \times 10^{-4}$ to $0.5 \times 10^{-4}$ (in this case, the successful packet delivery rate is about 95%), both mechanisms perform similarly; and if the error rate increases further, the negotiation mechanism outperforms NAck. This means that, when network conditions are poor, the negotiation mechanism is the better choice.

6.2. Measurement-Formation and Reconstruction of OTD Datasets with the Proposed Termination Rule

To verify the effectiveness of the proposed adaptive termination rule, we select the two OTD datasets (depicted in Figure 1) as the original experimental signals. Because ∆M influences the performance of this algorithm, we examine the number of measurements and the quality of signal reconstruction under different ∆M values. To ensure the accuracy of the experimental results, we repeat the experiment for each set of parameters 500 times. The detailed experimental parameters are listed in Table 3.

Table 3. Parameters of the OTD simulation experiments.

Parameter             | Value
----------------------|------------------------------------
data file of OTD A    | 1403291948.cor
data file of OTD B    | 1404020436.cor
transformation base   | 6-level Daubechies2
reconstruction method | BP
experiment rounds     | 500
relative error        | $\|\hat{x} - x\| / \|x\|$
SNR                   | $20 \lg(\|x\| / \|x - \hat{x}\|)$
range of ∆M           | 1~8
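The two quality metrics of Table 3 are straightforward to compute. The following is a minimal sketch of both definitions, assuming the Euclidean norm and the base-10 logarithm for lg; the function names and the two-element test vectors are hypothetical.

```python
import numpy as np

def relative_error(x, x_hat):
    """||x_hat - x|| / ||x|| as defined in Table 3."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

def snr(x, x_hat):
    """20 * lg(||x|| / ||x - x_hat||) as defined in Table 3."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

# Hypothetical signal and reconstruction, for illustration only.
x_demo = [3.0, 4.0]
x_hat_demo = [3.0, 4.5]
err_demo = relative_error(x_demo, x_hat_demo)
snr_demo = snr(x_demo, x_hat_demo)
```

With these definitions the SNR is simply −20 lg of the relative error, so a relative error of 0.1 corresponds to an SNR of 20 and a relative error of 0.01 to an SNR of 40.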

Figure 12 shows one execution (∆M = 8) of our proposed method in the recovery of the two datasets. For both datasets, the relative error between the reconstructed signal and the original signal tends to 0 as the number of measurements increases, and this decline continues until the termination condition of Proposition 1 is reached. When the algorithm terminates (dataset OTD A stops at Step 276, and OTD B stops at Step 380), we obtain high-quality reconstruction results (the relative error values are 0.0068 and 0.0036, respectively). Detailed experimental results are shown in Table 4, and Figure 13 shows the average relative time consumption of the experiment on these two datasets.

[Figure 12: (top) relative error versus number of projections, with stop marks for OTD A and OTD B, ∆M = 8; (bottom) temperature (°C) versus depth/pressure (dbar) for the original and reconstructed OTD A and OTD B, ∆M = 8.]

Figure 12. Experiment process (∆M = 8) of our proposed method on OTD datasets. (top) Changing process of the relative error; (bottom) Reconstruction results.

Table 4. Experiment result of our proposed method performed on OTD datasets.

∆M | DataSet | Measurements AVG | Measurements MIN | Measurements MAX | Relative Error AVG | Relative Error MIN | Relative Error MAX | MIN SNR
---|---------|------------------|------------------|------------------|--------------------|--------------------|--------------------|--------
1  | A       | 143.43           | 103              | 198              | 0.1042             | 0.0139             | 0.4568             | 22.9357
1  | B       | 141.60           | 103              | 180              | 0.1202             | 0.0110             | 0.4504             | 22.2775
2  | A       | 189.26           | 118              | 274              | 0.0225             | 0.0068             | 0.1527             | 34.1279
2  | B       | 181.07           | 156              | 210              | 0.0152             | 0.0056             | 0.0517             | 37.0925
3  | A       | 222.04           | 160              | 283              | 0.0131             | 0.0069             | 0.0445             | 38.2235
3  | B       | 196.70           | 169              | 226              | 0.0102             | 0.0045             | 0.0197             | 40.3772
4  | A       | 246              | 180              | 316              | 0.0098             | 0.0057             | 0.0175             | 40.4925
4  | B       | 211.73           | 176              | 244              | 0.0066             | 2.1694 × 10^-8     | 0.0115             | 47.3546
5  | A       | 256.85           | 175              | 330              | 0.0090             | 0.0051             | 0.0212             | 41.2457
5  | B       | 213.33           | 165              | 265              | 0.0065             | 4.6999 × 10^-10    | 0.0144             | 48.6981
6  | A       | 276.22           | 202              | 334              | 0.0078             | 0.0043             | 0.0135             | 42.3482
6  | B       | 225.20           | 196              | 262              | 0.0047             | 4.5934 × 10^-11    | 0.0078             | 62.4401
7  | A       | 286.69           | 219              | 338              | 0.0070             | 0.0043             | 0.0127             | 43.3284
7  | B       | 220.87           | 184              | 254              | 0.0052             | 6.7932 × 10^-11    | 0.0118             | 64.2592
8  | A       | 296.24           | 212              | 356              | 0.0069             | 0.0040             | 0.0132             | 43.4967
8  | B       | 225.07           | 188              | 268              | 0.0049             | 1.3999 × 10^-10    | 0.0105             | 69.4010

[Figure 13: average relative time consumption versus value of ∆M (1 to 8) for Dataset OTD A and Dataset OTD B.]

Figure 13. Experiment process (∆M = 8) of our proposed method on OTD datasets.

By comparing the experimental results of these two time-continuous datasets, we find that the number of required measurements differs (i.e., OTD-B requires fewer measurements than OTD-A), which is due to their different sparsity degrees (as shown in Figure 1, OTD-B is sparser than OTD-A, i.e., 53-sparse and 61-sparse, respectively). As ∆M increases, the average number of measurements required by the two datasets increases rapidly (i.e., by 106.5% for OTD-A and by 59% for OTD-B). The minimum and maximum numbers show a similar trend. The reason is that, as ∆M increases, each round of reconstruction requires more additional measurements, so, according to Proposition 1, the measurement termination condition becomes more difficult to meet.

Another typical characteristic is that the SNR of the reconstructed signal increases rapidly when ∆M first increases, but once the SNR reaches a certain level, further increasing ∆M has little effect. For example, for dataset OTD-A, the Min SNR is 22.9 when ∆M = 1 and rises to 40.49 when ∆M = 4; if ∆M is increased further to 8, the corresponding SNR becomes 43.5. This behavior is caused by a shortcoming of the CS reconstruction algorithm itself in processing non-strictly sparse signals, which is beyond the scope of this article. Note that, although a higher ∆M value yields a higher recovery quality and a lower error rate (the reconstruction is correct with probability at least $1 - 3^{-\Delta M}$ according to Proposition 1), the measurement cost incurred in the network also increases significantly because of the extra measurements required. So, on the whole, there is a tradeoff between performance and cost, and the specific network implementation target should also be considered when choosing a proper value of ∆M.
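The tradeoff described above can be checked with a little arithmetic on the averages of Table 4. The sketch below recomputes the growth in the average number of measurements between ∆M = 1 and ∆M = 8 and lists the quantity $3^{-\Delta M}$ from Proposition 1; treating that quantity as a bound on the probability of keeping a wrong reconstruction is our reading of the proposition, and the variable names are hypothetical.

```python
# Average number of measurements from Table 4, for delta_m = 1..8.
avg_measurements = {
    "OTD-A": [143.43, 189.26, 222.04, 246.0, 256.85, 276.22, 286.69, 296.24],
    "OTD-B": [141.60, 181.07, 196.70, 211.73, 213.33, 225.20, 220.87, 225.07],
}

def percent_increase(first, last):
    """Relative growth from `first` to `last`, in percent."""
    return 100.0 * (last - first) / first

# Growth of the measurement cost when delta_m goes from 1 to 8.
growth = {name: percent_increase(v[0], v[-1]) for name, v in avg_measurements.items()}

# Assumed reading of Proposition 1: chance of keeping a wrong result <= 3^(-delta_m).
error_bound = {dm: 3.0 ** (-dm) for dm in range(1, 9)}
```

The growth values come out at about 106.5% for OTD-A and 59% for OTD-B, matching the figures in the text, while the bound already falls below 0.2% at ∆M = 6, which suggests diminishing returns from larger ∆M values.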

6.3. Experiment of Data Gathered by Practical Network

To evaluate the performance of the proposed algorithm on data acquired from an actual environment, we implemented a WSN consisting of 30 nodes to monitor the luminosity of the environment. The microprocessor of the sensor nodes is the CC2530 [23], and the luminosity sensor is the BH1750 [24]. The measurement resolution of this sensor is 0.1 lx, and the measurement error is ±5 lx over the measurement range of 0 to 65,535 lx. To test the impact of the collected signals on the proposed method under different experimental environments and topology conditions, we deployed the network in both outdoor and semi-outdoor scenes. In the semi-outdoor scene, shown in Figure 14B, some of the nodes are placed outdoors and the rest are placed inside the building. To obtain the luminosity data of the entire environment, the sink node queries the sensor readings of all nodes every 5 min.

Figure 14. Network deployment of data gathering WSN. (A) Outdoor deployment; (B) Semi-outdoor deployment.

According to our experimental results, the proposed schemes can be applied directly to both the uniform and the random outdoor deployment experiments; for the semi-outdoor experimental data, however, the algorithm cannot meet the stopping condition within the effective measurement steps. Analysis indicates that, in the semi-outdoor experiment, nodes placed outdoors are exposed to strong light while their neighbor nodes may be placed in a relatively dim building, which destroys the spatial sparsity of the luminosity data observed by the whole network. For one epoch of the network sensed data (denoted as vector x), shown in Figure 15A, we can see that it is not sparse under the wavelet transformation, as shown in Figure 15C.

After examining the statistics of the observation data, we find that, although the readings of adjacent nodes are very different, the changes in the readings of adjacent nodes between two consecutive rounds of network sensed data differ only slightly. Let $x_i^t$ denote the sensor reading of node i during data collecting epoch t, and let $\Delta x_i^t = x_i^{t+1} - x_i^t$ denote the difference of node i between two consecutive periods. Then $\Delta x_i^t$ and $\Delta x_j^t$ of adjacent nodes should be correlated (the experimental results are shown in Figure 15B,D; the wavelet coefficients of x are spread out, but the wavelet transformation coefficients of ∆x clearly tend to 0). If we denote $x^{t+1}$ and $x^t$ as the (t + 1)th and the (t)th network sensed data, we obtain the following two equations:

$$\Delta x^t = x^{t+1} - x^t \qquad (7)$$

$$\Delta x^t = \Psi U \qquad (8)$$

[Figure 15: four panels; the vertical axis of the data panels is luminosity (lx).]

Figure 15. Wavelet representation of an original illumination data vector x and the difference vector ∆x. (A) Original data vector x; (B) Difference between two consecutive data vectors ∆x; (C) Wavelet representation of x; (D) Wavelet representation of ∆x.

We assume that the previous rounds of data collection have been completed and that the network sensed data $x^t$ has been worked out at the sink node. Thus, if the sink node has received the measurement results $y^{t+1}$ at epoch t + 1, where $y^{t+1} = \Phi^{t+1} x^{t+1}$, then from Equation (7), by solving for $\Delta x^t$, we can obtain $x^{t+1}$. In fact, $\Phi^{t+1}$ can be treated as the measurement matrix of $\Delta x^t$, because:

$$y^{t+1} - {y'}^{t} = \Phi^{t+1} x^{t+1} - \Phi^{t+1} x^{t} = \Phi^{t+1} \Delta x^{t} \qquad (9)$$

Therefore, by Equations (8) and (9), $\Delta x^t$ can be calculated through a typical CS reconstruction method. It should be pointed out that ${y'}^{t} = \Phi^{t+1} x^{t} \neq y^{t}$ in Equation (9), because the measurement matrices are randomly generated anew in each data gathering epoch.

With this method, we experiment on two consecutive rounds of data collected under each deployment condition. The experimental results are shown in Table 5. With increasing ∆M, both the recovery quality of the signal and the required number of measurements generally increase. For the outdoor experimental data, however, the SNR of the reconstructed signal changes only slightly between ∆M = 5 and ∆M = 6; thus, taking the average number of measurements into account, ∆M = 5 is a good choice for this application and reconstruction algorithm.

Table 5. Experiment of reconstruction the data collected by the network.

Location     | Topology | ∆M = 4 Measurement | ∆M = 4 SNR | ∆M = 5 Measurement | ∆M = 5 SNR | ∆M = 6 Measurement | ∆M = 6 SNR
-------------|----------|--------------------|------------|--------------------|------------|--------------------|-----------
outdoor      | mesh     | 9.3                | 32.5       | 14.4               | 62.68      | 20.6               | 67.83
outdoor      | random   | 12.4               | 28.6       | 15.0               | 42.35      | 22.5               | 42.72
semi-outdoor | mesh     | 13.3               | 32.1       | 17.9               | 31.3       | 24.4               | 46.06
semi-outdoor | random   | 14.9               | 29.7       | 19.9               | 45.55      | 26.6               | 44.08
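The difference-based recovery of Equations (7) to (9) can be sketched numerically. This is not the MATLAB implementation used in our experiments: for brevity the sketch swaps basis pursuit for a simple greedy orthogonal matching pursuit (OMP), takes Ψ to be the identity so that ∆x itself is sparse, and uses made-up readings for 30 nodes. The function names, seed, and all numeric values are hypothetical.

```python
import numpy as np

def sparse_bernoulli(m, n, rng):
    """Entries +1 or -1 with prob. 1/4 each and 0 with prob. 1/2."""
    return rng.choice([1.0, -1.0, 0.0], size=(m, n), p=[0.25, 0.25, 0.5])

def omp(phi, y, tol=1e-8):
    """Greedy OMP (stand-in for BP): find a sparse z with phi @ z close to y."""
    n = phi.shape[1]
    support, z, residual = [], np.zeros(n), y.copy()
    for _ in range(n):
        if np.linalg.norm(residual) <= tol * max(1.0, np.linalg.norm(y)):
            break
        corr = np.abs(phi.T @ residual)
        if support:
            corr[support] = 0.0            # never pick the same column twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        z = np.zeros(n)
        z[support] = coef
        residual = y - phi @ z
    return z

rng = np.random.default_rng(0)
n, m = 30, 20
x_t = rng.uniform(100.0, 1000.0, size=n)   # epoch-t readings, known at the sink
dx = np.zeros(n)
dx[[3, 17]] = [12.0, -7.5]                 # made-up sparse change, Eq. (7)
x_next = x_t + dx

phi_next = sparse_bernoulli(m, n, rng)     # fresh matrix for epoch t + 1
y_next = phi_next @ x_next                 # what the sink receives
y_prime = phi_next @ x_t                   # computed locally by the sink
dy = y_next - y_prime                      # equals phi_next @ dx, Eq. (9)

dx_hat = omp(phi_next, dy)
x_next_hat = x_t + dx_hat                  # estimate of the new readings
```

The key point is that the sink never needs a sparse representation of x itself: it only reconstructs the sparse difference from `dy` and adds it to the previous epoch's data. Whether the estimate matches the true change exactly depends on the sparsity level and the number of measurements, as in any CS reconstruction.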

7. Conclusions

In this paper we have proposed a feedback-control-based compressed data gathering scheme. Unlike existing studies, in this scheme the sink node can adaptively adjust the measurement formation according to the reconstruction of the received measurements in each data gathering epoch. To save the energy cost of the measurement-formation process in the network, a sparse measurement method and an optimized measurement-formation construction strategy are proposed, and the control traffic for requesting additional measurements is designed to be combined with the generation of measurement coefficients. Simulation experiments verify that our measurement formation method outperforms existing methods in both energy consumption and the number of control messages. Moreover, experiments on datasets from both OTD and a practical network deployment also show the effectiveness of our proposed feedback data recovery method.

Acknowledgments: This research was supported by the National Natural Science Foundation of China (No. 61502249), the Pre-research Fund of General Reserve Department of PLA (No. 9140A05010113xxx), the Innovation Program of Jiangsu (No. CX(14)2114 and CX(13)3054), and Shenzhen Strategic Emerging Industry Development Special Fund (No. JCYJ20130331151710105).

Author Contributions: Jun Yin and Yuwang Yang conceived and designed the experiments; Jun Yin and Lei Wang performed the experiments; Jun Yin analyzed the data; Lei Wang contributed analysis tools; Jun Yin wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
