CN110011847B

CN110011847B - Data source quality evaluation method under sensing cloud environment

Info

Publication number: CN110011847B
Application number: CN201910256445.9A
Authority: CN
Inventors: 李默涵; 田志宏; 孙彦斌; 顾钊铨; 韩伟红; 仇晶; 苏申
Original assignee: Guangzhou University
Current assignee: Guangzhou University
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2022-03-25
Anticipated expiration: 2039-03-29
Also published as: CN110011847A

Abstract

The embodiment of the invention discloses a data source quality evaluation method in a sensing cloud environment, which comprises the following steps: acquiring current and historical monitoring data of a sensing cloud storage data source, wherein the sensing cloud is a combination of cloud computing and a wireless sensor network and is used for collecting and processing monitoring data from a plurality of sensor nodes or sensor sub-networks; integrating monitoring data of the data source based on the spatial correlation and the temporal correlation and determining a data true value; generating an initial quality evaluation vector of a data source based on the data truth value, and adjusting the initial quality evaluation vector of the data source according to a quality rule; and calculating a final quality evaluation result of the data source according to the adjusted initial quality evaluation vector of the data source. By adopting the invention, the quality of the data source can be described in multiple angles, and the quality of the data source can be more comprehensively depicted.

Description

Data source quality evaluation method under sensing cloud environment

Technical Field

The invention relates to the field of quality evaluation, in particular to a data source quality evaluation method in a sensing cloud environment.

Background

Several data source quality evaluation methods have been proposed. Most existing methods evaluate the quality of data in a database and detect poor-quality data based on rules such as conditional functional dependencies, conditional inclusion dependencies, currency constraints, and matching rules. The quality of a data source may then be evaluated from the proportion of poor-quality data it produces. A data quality rule generally has the form A → B, with the semantics "if the value of attribute set A is a, then the value of attribute set B must be b". By selecting the data in the database that satisfy the rule antecedent A = a and checking whether the rule consequent B = b holds, it can be judged whether the data contain errors; data that do not satisfy the rule are considered poor-quality data (also called erroneous data). The data quality of some data sources may be degraded by internal or external factors. If these negatively affected, poor-quality data sources are not discovered in time, they further affect the quality of services built on them. To find poor-quality data sources accurately, an accurate data source quality evaluation method is required.
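As an illustration only, the rule check described above can be sketched in Python as follows. The record fields, the example rule, and the helper name `violates` are hypothetical and chosen purely for demonstration; they do not come from the patent.

```python
def violates(record, antecedent, consequent):
    """Return True if the record matches A = a but fails B = b."""
    matches_antecedent = all(record.get(k) == v for k, v in antecedent.items())
    matches_consequent = all(record.get(k) == v for k, v in consequent.items())
    return matches_antecedent and not matches_consequent


# Hypothetical records and rule: (country, area_code) = (CN, 020) -> city = Guangzhou
records = [
    {"country": "CN", "area_code": "020", "city": "Guangzhou"},
    {"country": "CN", "area_code": "020", "city": "Shenzhen"},
    {"country": "CN", "area_code": "010", "city": "Beijing"},
]
antecedent = {"country": "CN", "area_code": "020"}
consequent = {"city": "Guangzhou"}

# Records that satisfy the antecedent but not the consequent are poor-quality data.
bad = [r for r in records if violates(r, antecedent, consequent)]
bad_ratio = len(bad) / len(records)  # proportion of poor-quality data from this source
```

The proportion `bad_ratio` corresponds to the one-dimensional error-rate style of estimate that rule-based methods of this kind produce.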

Disclosure of Invention

In order to solve the above problems, the invention provides a data source quality evaluation method in a sensing cloud environment, which describes the quality of a data source more accurately, from multiple angles, and more comprehensively.

Based on the above, the invention provides a data source quality evaluation method in a sensing cloud environment, which comprises the following steps:

acquiring current and historical monitoring data of a sensing cloud storage data source, wherein the sensing cloud is a combination of cloud computing and a wireless sensor network and is used for collecting and processing monitoring data from a plurality of sensor nodes or sensor sub-networks;

integrating monitoring data of the data source based on the spatial correlation and the temporal correlation and determining a data true value;

generating an initial quality evaluation vector of a data source based on the data truth value, and adjusting the initial quality evaluation vector of the data source according to a quality rule;

and calculating a final quality evaluation result of the data source according to the adjusted initial quality evaluation vector of the data source.

After the current and historical monitoring data of the sensing cloud storage data source are obtained, if the volume of the data exceeds a threshold, data reduction is performed to reduce the data volume; the data reduction uses piecewise aggregate approximation (PAA) or adaptive piecewise constant approximation (APCA).

Wherein integrating the monitoring data of the data source based on the spatial correlation and the temporal correlation and determining the data truth value comprises: judging whether the data have spatial correlation and, if so, for a given data source s_i, reading the set S_N^(i) of other sensor nodes in a regular monitoring area around s_i together with the monitoring data sequences of the nodes in S_N^(i); the nodes in S_N^(i) form the cluster Cluster^(i).

Wherein, to obtain the cluster Cluster^(i) in which s_i is located, the monitoring data sequences of the nodes in S_N^(i) are clustered by combining position similarity and data similarity; after Cluster^(i) is obtained, the centroid of Cluster^(i) at each time is calculated and used as the truth value candidate sequence.

After the truth value candidate sequence produced by the spatial correlation processing is obtained, temporal correlation processing, namely smoothing, is also performed on it. The smoothing uses an n-order moving average method or a least squares method, and the smoothed sequence is the final truth value sequence.

Wherein generating an initial quality evaluation vector of the data source based on the data truth value comprises: evaluating the quality of s_i by comparing the values of s_i with the truth values. The quality value Q(s_i, t_k) of s_i at time t_k is obtained from a quality evaluation function, and the quality values <Q(s_i, t₁), …, Q(s_i, t_m)> at times t₁ to t_m constitute the initial quality evaluation vector Qvec(s_i) of s_i. The quality evaluation function is:

Q(s_i,t_k)＝1-dist(v_ik,true(v_ik))/maxdist

where v_ik is the value of s_i at time t_k, true(v_ik) is the corresponding truth value of s_i at time t_k, dist(v_ik, true(v_ik)) is the distance between v_ik and true(v_ik), and maxdist is the maximum value of the distance between v_ik and true(v_ik).

Wherein the quality rule represents positive correlation, negative correlation and other numerical association relations, and the quality rule is represented as:

(f(A)∈target_A)→(g(B)∈valid_B)

where A and B are two attribute sets, f() and g() are functions acting on A and B, target_A is the target value range of f(A), and valid_B is the legal value range of g(B); target_A and valid_B may each be an interval, a set of values, or another function. If the quality rule is satisfied at a certain time, the data at that time are considered reasonable and free of quality problems.

Wherein said adjusting an initial quality assessment vector of the data source according to a quality rule comprises:

Step (1): calculate the mean QMean_i and the standard deviation QSD_i of Qvec(s_i) = <Q(s_i, t₁), …, Q(s_i, t_m)>;

Step (2): define the deviation threshold T_i as h times the standard deviation, i.e. T_i = h·QSD_i;

Step (3): for each time t_k at which the quality score falls below QMean_i by more than T_i (i.e., the quality score is too low), traverse all rules in the data quality rule set Ψ and check whether the data of s_i at time t_k violate any rule, that is, whether the condition of the antecedent is satisfied while the condition of the consequent is not:

a) if a violated rule exists, the quality score Q(s_i, t_k) is kept unchanged;

b) if the traversal completes without finding a violated rule, the quality score at that time is adjusted to QMean_i, and the method jumps back to steps (1) and (2) to update QMean_i, QSD_i and T_i;

Step (4): repeat steps (1), (2) and (3) until QMean_i and QSD_i no longer change.

Wherein the quality evaluation result of s_i comprises: the mean of Qvec(s_i), the standard deviation of Qvec(s_i), and the stationarity QStationary_i.

Wherein the mean QMean_i is the average quality score over the evaluated period; the higher its value, the better the quality performance of s_i;

the standard deviation QSD_i represents the stability of the quality of s_i; the smaller its value, the more stable the quality;

the stationarity QStationary_i takes values in {True, False}, where True indicates stationary and False indicates non-stationary.

The invention considers spatial and temporal correlation together when discovering truth values and evaluating the quality of the data source, thereby overcoming the inability of existing work to handle spatio-temporal attributes and making truth discovery and quality evaluation more accurate;

in the quality evaluation process, rather than relying solely on the truth values found by an unsupervised method, a new kind of quality rule is proposed and used to correct the evaluation result, which reduces the possibility of misjudgment;

whereas current methods of assessing data source quality can only give a one-dimensional estimate (e.g., an error rate), the technique proposed by the invention uses the triple <QMean_i, QSD_i, QStationary_i> to describe the final quality of the data source from multiple angles, so that the quality of the data source is characterized more comprehensively.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a data source quality evaluation method in a sensing cloud environment according to an embodiment of the present invention;

fig. 2 is a flow chart for determining a true value of data according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a data source quality evaluation method in a sensing cloud environment according to an embodiment of the present invention, where the data source quality evaluation method in the sensing cloud environment includes:

S101, acquiring current and historical monitoring data of a sensing cloud storage data source, wherein the sensing cloud is a combination of cloud computing and a wireless sensor network and is used for collecting and processing monitoring data from a plurality of heterogeneous sensor nodes or sensor sub-networks.

Data from the physical world can be conveniently collected by deploying a Wireless Sensor Network (WSN) in a target area. Wireless sensor nodes are small and inexpensive, so they are widely used in environmental monitoring, national defense and military applications, traffic control, community security, target positioning, and similar fields. However, they are limited in computation, storage, and communication capability, and large-scale sensor network applications still face many challenges. With the development of cloud computing and the growing demand for cyber-physical convergence, combining cloud computing with wireless sensor networks into a sensor-cloud has become an inevitable trend. In a sensor-cloud, cloud services can collect and process data from multiple heterogeneous sensor nodes or sensor sub-networks, thereby completing data-driven, compute-intensive tasks that would otherwise be difficult to complete on the sensor side. Thanks to the computing capability of the cloud, relatively heavy services such as data source quality evaluation can now be applied to sensor networks with limited computing capability: the cloud service integrates heterogeneous sensor nodes and sub-networks and evaluates data source quality in the cloud, so that data sources can be selected or discarded as required.

However, since sensor nodes are mostly deployed in harsh environments or unattended areas, the correctness and accuracy of the data of some data sources (i.e. sensor nodes or sub-networks) are vulnerable to the environment or to attacks during collection and transmission. In other words, the data quality of some data sources may be degraded by intrinsic or extrinsic factors. If these negatively affected, poor-quality data sources are not discovered in time, they further affect the quality of data-driven cloud services. To find poor-quality data sources accurately, an accurate data source quality evaluation method is required. Because a sensor acquires data continuously, the quality of a sensor data source fluctuates over time; the data source quality evaluation method therefore also needs to support time series or data stream analysis so that the evaluation result can be updated promptly and accurately. Based on the data source quality evaluation result, cloud applications can better select and use data sources to improve the efficiency and quality of service.

Let S_e = {s₁, …, s_n} denote the set of data sources participating in the evaluation, let t₁ be the initial time, and let t_m be the current time. First, n time series V = {<v₁₁, …, v_1m>, …, <v_n1, …, v_nm>} are read from the database, where <v_i1, …, v_im> is the sequence of monitoring values of data source s_i, and v_ij is the vector of monitored values of data source s_i at time t_j.

Next, it is checked whether the data volume exceeds a threshold θ. If m > θ, a dimension reduction operation is performed on each sequence <v_i1, …, v_im>. For simplicity, the reduction may use PAA (Piecewise Aggregate Approximation) or APCA (Adaptive Piecewise Constant Approximation), i.e., the sequence <v_i1, …, v_im> is divided into l segments of equal or unequal length, each represented by the average of its points. The threshold θ is an empirical value set according to the computing power of the cloud.
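A minimal sketch of the equal-length (PAA) variant of this reduction is given below, assuming scalar monitoring values for simplicity. The function names `paa` and `reduce_if_needed` are illustrative and not part of the patent; APCA, which adapts segment lengths to the data, is not shown.

```python
def paa(sequence, segments):
    """Piecewise Aggregate Approximation: split the sequence into `segments`
    roughly equal-length pieces and represent each piece by its mean."""
    m = len(sequence)
    if segments <= 0:
        raise ValueError("segments must be positive")
    if segments >= m:
        return list(sequence)  # nothing to reduce
    reduced = []
    for j in range(segments):
        start = j * m // segments
        end = (j + 1) * m // segments
        chunk = sequence[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def reduce_if_needed(sequence, theta, segments):
    """Apply the reduction only when the sequence length m exceeds theta."""
    if len(sequence) > theta:
        return paa(sequence, segments)
    return list(sequence)


print(paa([1, 2, 3, 4, 5, 6], 3))  # [1.5, 3.5, 5.5]
```

When m is not a multiple of the segment count, the integer boundaries above make some segments one point longer than others.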

S102, integrating monitoring data of the data source based on the spatial correlation and the temporal correlation and determining a data true value.

Existing methods, which target Web data sources and relational databases, evaluate the quality of a data source by simple voting and repeated iteration, and do not consider how the spatial, temporal, and inter-attribute correlations of data sources influence quality evaluation. In a sensor network, however, sensor nodes are geographically close, monitor the same objects, monitor continuously, and measure physical quantities that are related to one another. As a result, there is spatial correlation between data generated by different data sources at the same time, temporal correlation between data generated by the same data source at different times, and physical-quantity correlation between different attributes of the same data source. These correlations may appear either as similarity of data values or as positive or negative correlation between the trends of data changes.

The spatial correlation may be processed first to obtain a truth value time series, and the temporal correlation is then processed on that series. Fig. 2 is a flowchart of determining a data truth value according to an embodiment of the present invention. Referring to Fig. 2:

S201, judging whether the data have spatial correlation.

The sources of spatial correlation considered here are mainly whether the sensor nodes are close in position and whether their monitored objects are consistent. Because sensors are themselves vulnerable (limited energy, easily damaged), deployments are often redundant, i.e., multiple sensor nodes monitor the same object at the same time. Such sensor nodes are geographically close, and the data they generate should be similar. Given a data source (i.e. a sensor node or sub-network) s_i, the set of all data sources whose monitored object is consistent with that of s_i is denoted as the cluster Cluster^(i) to which s_i belongs.

S202, for a given data source s_i, reading the set S_N^(i) of other sensor nodes in the same monitoring area as s_i and the monitoring data sequences of the nodes in S_N^(i).

If the area corresponding to the monitored object is regular (e.g. monitoring the temperature and humidity of a room), then for a given data source s_i, the set S_N^(i) of other sensor nodes in a regular monitoring area around s_i and the monitoring data sequences of the nodes in S_N^(i) can be read. All nodes in S_N^(i) naturally constitute Cluster^(i).

S203, clustering the monitoring data sequences of the nodes in S_N^(i) by combining position similarity and data similarity.

However, in some cases, due to the influence of rivers, valleys, roads, buildings, and the like, the area corresponding to the monitored object is not regular. In this case, some of the nodes in S_N^(i) may monitor a different object from s_i, and their corresponding truth values also differ. To prevent these nodes from polluting the truth discovery result, the nodes sufficiently similar to s_i need to be selected by clustering, and the clustering must consider data similarity and position similarity at the same time. The similarity of sensor nodes is defined as the weighted average of position similarity and data similarity, as shown in formula (1):

Sim(s_i,s_j)＝w₁×Simspace(s_i,s_j)+w₂×Simdata(s_i,s_j)

where, for s_i and any node s_j in S_N^(i), Simspace(s_i, s_j) denotes the position similarity of s_i and s_j, for which coordinate similarity may be used; Simdata(s_i, s_j) denotes the data similarity of s_i and s_j, for which the normalized Euclidean distance of the time series may be used, or the normalized EMD (Earth Mover's Distance) computed after converting the time series into histograms; w₁ and w₂ are weights and may both be set to 0.5.
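The weighted similarity of formula (1) can be sketched as follows. This is an illustration under stated assumptions: the normalizers `max_distance` and `value_range`, the node dictionaries, and the fixed `threshold` used in place of a real clustering algorithm are all hypothetical, since the patent does not fix these details.

```python
import math


def sim_space(pos_a, pos_b, max_distance):
    """Position similarity in [0, 1]: 1 at identical coordinates, 0 at max_distance or beyond."""
    return max(0.0, 1.0 - math.dist(pos_a, pos_b) / max_distance)


def sim_data(series_a, series_b, value_range):
    """Data similarity in [0, 1] derived from a normalized Euclidean (RMS) distance."""
    squared = sum((a - b) ** 2 for a, b in zip(series_a, series_b))
    rms = math.sqrt(squared / len(series_a))
    return max(0.0, 1.0 - rms / value_range)


def sim(node_a, node_b, max_distance, value_range, w1=0.5, w2=0.5):
    """Formula (1): weighted average of position similarity and data similarity."""
    return (w1 * sim_space(node_a["pos"], node_b["pos"], max_distance)
            + w2 * sim_data(node_a["series"], node_b["series"], value_range))


# Hypothetical nodes: coordinates in metres, temperature readings in degrees Celsius.
s_i = {"pos": (0.0, 0.0), "series": [20.0, 20.0]}
s_near = {"pos": (3.0, 4.0), "series": [25.0, 25.0]}
s_far = {"pos": (30.0, 40.0), "series": [35.0, 35.0]}

threshold = 0.4  # assumed cut-off standing in for a full clustering step
cluster = [n for n in (s_near, s_far)
           if sim(s_i, n, max_distance=10.0, value_range=10.0) >= threshold]
```

Here `s_near` scores 0.5 and is kept, while `s_far` scores 0 on both components and is screened out.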

S204, obtaining the cluster in which s_i is located and calculating the centroid of the cluster at each time as the truth value candidate sequence.

After the cluster Cluster^(i) in which s_i is located is obtained, the centroid of Cluster^(i) at each time is calculated as the truth value candidate sequence (i.e. the candidate truth value at time t is the centroid of Cluster^(i) at time t).
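For scalar readings, the per-time centroid is simply the mean across the cluster members at each time. A short sketch, assuming all series are aligned and equally long (the function name `centroid_sequence` is illustrative):

```python
def centroid_sequence(cluster_series):
    """Return the centroid (mean across nodes) of the cluster at each time.

    cluster_series: list of equally long, time-aligned series, one per node.
    """
    if not cluster_series:
        raise ValueError("cluster must contain at least one series")
    return [sum(values) / len(values) for values in zip(*cluster_series)]


# Three hypothetical nodes observed at three times.
candidates = centroid_sequence([[20.0, 21.0, 22.0],
                                [22.0, 23.0, 24.0],
                                [21.0, 22.0, 23.0]])
print(candidates)  # [21.0, 22.0, 23.0]
```

For vector-valued readings the same idea applies component by component.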

S205, judging whether the data have temporal correlation.

Processing temporal correlation first requires determining whether similarity exists in the time dimension. Here, similarity between nearby times is considered. Many monitored quantities are similar at nearby times; for example, temperature, humidity, and altitude generally change continuously, and this similarity should be reflected in the truth values. Therefore, after the truth value candidate sequence produced by the spatial correlation processing is obtained, it needs to be smoothed so that errors in sensor data do not cause abrupt changes in the truth values.

S206, smoothing the time series of centroids.

The smoothing strategy can adopt an n-order moving average method or a least square method, and the smoothed sequence is the final truth value sequence.
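A sketch of the n-order moving average option follows. It uses a trailing window and, as an assumption not stated in the patent, averages over the available prefix for the first n-1 points so that the output has the same length as the input.

```python
def moving_average(sequence, n):
    """Trailing n-order moving average with a shortened window at the start."""
    if n <= 0:
        raise ValueError("n must be positive")
    smoothed = []
    window_sum = 0.0
    for k, value in enumerate(sequence):
        window_sum += value
        if k >= n:
            window_sum -= sequence[k - n]  # drop the value that left the window
        smoothed.append(window_sum / min(k + 1, n))
    return smoothed


# The spike at the last point is damped rather than copied into the truth sequence.
truth = moving_average([1.0, 2.0, 3.0, 4.0, 10.0], 3)
```

A larger n smooths more aggressively but also delays the response to genuine changes in the monitored quantity.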

S103, generating an initial quality evaluation vector of the data source based on the data truth value.

After the truth value sequence is obtained, the quality of s_i can be evaluated by comparing the values of s_i with the truth values. The quality value Q(s_i, t_k) of s_i at time t_k can be obtained from the quality evaluation function shown in formula (2), and the quality values <Q(s_i, t₁), …, Q(s_i, t_m)> at times t₁ to t_m constitute the initial quality evaluation vector Qvec(s_i) of s_i.

Q(s_i,t_k)＝1-dist(v_ik,true(v_ik))/maxdist

where v_ik is the value of s_i at time t_k, true(v_ik) is the corresponding truth value of s_i at time t_k, dist(v_ik, true(v_ik)) is the distance between v_ik and true(v_ik), and maxdist is the maximum value of the distance between v_ik and true(v_ik). If only precision errors are considered, the dist function can be the absolute value of the numerical difference. If the deployment environment of the sensor network is harsh, bit-string transmission and storage errors must also be considered; in that case the values can first be converted into binary strings and then compared using the Hamming distance or the edit distance.
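Formula (2) with the two distance choices mentioned above can be sketched as follows. The fixed bit width in `hamming_distance` and the requirement that `maxdist` be supplied by the caller are assumptions made for this illustration; the Hamming variant further assumes non-negative integer readings.

```python
def abs_dist(value, truth):
    """Absolute numerical difference, suitable when only precision errors matter."""
    return abs(value - truth)


def hamming_distance(value, truth, bits=16):
    """Number of differing bits between two non-negative integers of `bits` width."""
    mask = (1 << bits) - 1
    return bin((value ^ truth) & mask).count("1")


def quality_vector(values, truths, dist, maxdist):
    """Formula (2): Q = 1 - dist(v, true(v)) / maxdist for every time t_k."""
    if maxdist <= 0:
        raise ValueError("maxdist must be positive")
    return [1.0 - dist(v, t) / maxdist for v, t in zip(values, truths)]


# Precision errors: readings versus the truth sequence, with an assumed maxdist of 10.
qvec = quality_vector([20.0, 22.0, 30.0], [20.0, 21.0, 25.0], abs_dist, maxdist=10.0)

# Bit errors: 0b1010 versus 0b0110 differ in two bits out of 16.
qbits = quality_vector([0b1010], [0b0110], hamming_distance, maxdist=16)
```

With these inputs `qvec` is approximately [1.0, 0.9, 0.5] and `qbits` is [0.875].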

S104, adjusting the initial quality evaluation vector of the data source according to a quality rule.

Using the initial quality evaluation vector directly for data source quality evaluation is still problematic, because it does not take sudden abnormal events into account. Some emergencies in the environment (e.g., a sudden fire causing a sharp increase in temperature readings) produce abrupt changes in sensor readings that should not be considered quality problems; in other words, such abrupt changes should not lower the quality score of the data source.

In quality evaluation for relational databases, quality rules are typically used to state which dependencies are legitimate and must not be violated. However, the quality rules of relational databases cannot be used directly in the sensor monitoring scenario. Therefore, the invention designs a new kind of quality rule, shown in formula (3), which can represent positive correlation, negative correlation, and other numerical associations.

(f(A)∈target_A)→(g(B)∈valid_B)

where A and B are two attribute sets, f() and g() are functions acting on A and B, target_A is the target value range of f(A), and valid_B is the legal value range of g(B). target_A and valid_B may each be an interval, a set of values, or another function, e.g. (−∞, 0], [0, +∞), or {0, 1}.

The rule declares an association that should exist in the physical world between attribute sets A and B (e.g., altitude and barometric pressure), with the following semantics: if the value of the function f(A) in the rule antecedent (the part to the left of the arrow) falls within the target value range target_A, then the value of the function g(B) in the rule consequent (the part to the right of the arrow) should fall within valid_B. If the rule is satisfied at a certain time (i.e., whenever the antecedent condition is satisfied, the consequent condition is also satisfied), the data at that time can be considered reasonable and free of quality problems.
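One possible way to represent a rule of the form in formula (3) is sketched below. The class name `QualityRule`, the membership predicates `in_target` and `in_valid`, and the field names in the example readings are hypothetical; the example rule encodes the physical fact that barometric pressure falls as altitude rises.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class QualityRule:
    """(f(A) in target_A) -> (g(B) in valid_B), with the ranges given as predicates."""
    f: Callable[[Mapping[str, float]], float]
    in_target: Callable[[float], bool]
    g: Callable[[Mapping[str, float]], float]
    in_valid: Callable[[float], bool]

    def is_violated(self, reading: Mapping[str, float]) -> bool:
        """A violation means the antecedent holds while the consequent does not."""
        antecedent_holds = self.in_target(self.f(reading))
        return antecedent_holds and not self.in_valid(self.g(reading))


# If altitude increased (f(A) in (0, +inf)), pressure should not increase (g(B) in (-inf, 0]).
altitude_pressure = QualityRule(
    f=lambda r: r["d_altitude"],
    in_target=lambda x: x > 0,
    g=lambda r: r["d_pressure"],
    in_valid=lambda y: y <= 0,
)

consistent = altitude_pressure.is_violated({"d_altitude": 50.0, "d_pressure": -6.0})
inconsistent = altitude_pressure.is_violated({"d_altitude": 50.0, "d_pressure": 4.0})
not_applicable = altitude_pressure.is_violated({"d_altitude": -10.0, "d_pressure": 4.0})
```

Expressing target_A and valid_B as predicates lets the same class cover intervals, finite value sets, and function-defined ranges.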

The quality rule set Ψ is derived from domain knowledge about the monitored object, and the quality evaluation vector Qvec(s_i) is iteratively adjusted using the rule set as follows.

Step (1): calculate the mean QMean_i and the standard deviation QSD_i of Qvec(s_i) = <Q(s_i, t₁), …, Q(s_i, t_m)>.

Step (2): define the deviation threshold T_i as h times the standard deviation (h is a predetermined constant), i.e. T_i = h·QSD_i.

Step (3): for each time t_k at which the quality score falls below QMean_i by more than T_i (i.e., the quality score is too low), traverse all rules in Ψ and check whether the data of s_i at time t_k violate any rule, i.e., whether the condition of the antecedent is satisfied while the condition of the consequent is not:

a) if a violated rule exists, the quality score Q(s_i, t_k) is kept unchanged;

b) if the traversal completes without finding a violated rule, the quality score at that time is adjusted to QMean_i, and the process jumps back to steps (1) and (2) to update QMean_i, QSD_i and T_i.

Step (4): repeat steps (1), (2) and (3) until QMean_i and QSD_i no longer change.

Since quality rules reflect physical laws in the real world, the intuition behind the adjustment process described above is as follows: if anomalous data satisfy all the physical laws of their application scenario, the anomaly is more likely to indicate an emergency in the physical world than erroneous data. Conversely, if the data are anomalous in value and also violate a physical law, they are more likely to be erroneous data than an abnormal event that actually occurred in the physical world. Based on this adjustment, misjudgments can be corrected.
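The iterative adjustment can be sketched as follows. This is an illustration under assumptions: the callback `violates_rule_at(k)` stands for "some rule in Ψ is violated by the data of s_i at time t_k", the population standard deviation is used because the patent does not say which variant is intended, and `max_rounds` is a safety guard that is not part of the described method.

```python
import statistics


def adjust_quality_vector(qvec, violates_rule_at, h=2.0, max_rounds=1000):
    """Raise suspiciously low scores to the mean unless a quality rule is violated."""
    q = list(qvec)
    for _ in range(max_rounds):
        q_mean = statistics.fmean(q)       # step (1): mean
        q_sd = statistics.pstdev(q)        # step (1): standard deviation (population)
        threshold = h * q_sd               # step (2): T_i = h * QSD_i
        adjusted = False
        for k, score in enumerate(q):      # step (3): scan for scores that are too low
            if q_mean - score > threshold and not violates_rule_at(k):
                q[k] = q_mean              # case b): no rule violated, likely a real event
                adjusted = True
                break                      # jump back to steps (1) and (2)
            # case a): a rule is violated, so the low score is kept as it is
        if not adjusted:                   # step (4): nothing changed, so stop
            break
    return q, statistics.fmean(q), statistics.pstdev(q)


scores = [0.9, 0.92, 0.91, 0.9, 0.2, 0.93, 0.9, 0.91, 0.92, 0.9]

# No rule is violated at any time: the dip at index 4 is treated as a real event.
adjusted, q_mean, q_sd = adjust_quality_vector(scores, lambda k: False)

# A rule is violated at every time: the dip is treated as erroneous data and kept.
kept, _, _ = adjust_quality_vector(scores, lambda k: True)
```

In the first call the score at index 4 is raised in successive rounds until it no longer lies below the mean by more than the threshold; in the second call the vector is returned unchanged.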

S105, calculating a final quality evaluation result of the data source according to the adjusted initial quality evaluation vector of the data source.

After the final quality evaluation vector Qvec(s_i) of data source s_i is obtained, the quality evaluation of s_i can be completed based on this vector. The quality evaluation result of s_i can be represented by a triple <QMean_i, QSD_i, QStationary_i>.

Mean QMean_i. QMean_i is the mean of Qvec(s_i), i.e. the average quality score over the evaluated period; the higher its value, the better the quality of s_i on average.

Standard deviation QSD_i. QSD_i is the standard deviation of Qvec(s_i) and represents the stability of the quality of s_i; the smaller its value, the less the quality score of s_i varies and the more stable the quality.

Stationarity QStationary_i. QStationary_i takes values in {True, False}, where True indicates stationary and False indicates non-stationary. Normally, if the quality scores of a data source at successive times are regarded as a random process, that process should be a stationary stochastic process; in other words, the behavior of the data source faithfully providing monitoring data at each time does not change over time. If the process is not stationary, the quality score of the data source has a non-negligible correlation with time, and it can be presumed that some abnormal factor in the data source is affecting data quality as time passes. Therefore, a stationarity test needs to be performed on Qvec(s_i). If QStationary_i is False, i.e., Qvec(s_i) is not stationary, the data source is affected by some unknown abnormal factor and is riskier to use; where conditions permit, the abnormal factors of the data source should be investigated before deciding whether to continue using it.

Based on the triple <QMean_i, QSD_i, QStationary_i>, the overall quality and stability of data source s_i can be characterized. For the set S_e = {s₁, …, s_n} of data sources participating in the evaluation, calculating the triple of each data source completes the data source quality evaluation task.
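Computing the triple can be sketched as follows. The patent does not name a particular stationarity test, so `looks_stationary` below is only a crude stand-in that compares the mean and spread of the two halves of the vector against an assumed tolerance `tol`; a formal test such as the augmented Dickey-Fuller test could be substituted in practice.

```python
import statistics


def looks_stationary(qvec, tol=0.1):
    """Heuristic check (an assumption, not a formal test): the mean and standard
    deviation of the first and second halves must differ by at most tol."""
    if len(qvec) < 4:
        raise ValueError("need at least four quality scores")
    half = len(qvec) // 2
    first, second = qvec[:half], qvec[half:]
    mean_shift = abs(statistics.fmean(first) - statistics.fmean(second))
    sd_shift = abs(statistics.pstdev(first) - statistics.pstdev(second))
    return mean_shift <= tol and sd_shift <= tol


def quality_triple(qvec, tol=0.1):
    """Return (QMean_i, QSD_i, QStationary_i) for one data source."""
    return (statistics.fmean(qvec), statistics.pstdev(qvec), looks_stationary(qvec, tol))


# Hypothetical adjusted quality vectors for two data sources.
steady = quality_triple([0.9, 0.91, 0.9, 0.92, 0.9, 0.91])
degrading = quality_triple([0.95, 0.94, 0.95, 0.6, 0.55, 0.5])
```

The first source yields a high mean, a small standard deviation, and a True stationarity flag, while the second is flagged as non-stationary because its scores drop sharply in the second half.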

Compared with the prior art, the technology provided by the invention has the following advantages:

the truth value discovery is carried out by comprehensively considering the time-space relevance, and the quality of the data source is evaluated, so that the defect that the time-space attribute cannot be processed in the prior art is overcome, and the truth value discovery and the quality evaluation are more accurate;

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims

1. A data source quality assessment method in a sensing cloud environment is characterized by comprising the following steps:

generating an initial quality assessment vector for a data source based on the data truth values;

adjusting the initial quality evaluation vector of the data source according to a quality rule, specifically comprising:

Step (3): for each time t_k at which the quality score falls below QMean_i by more than T_i, traverse the rules in the data quality rule set Ψ and check whether, in the data of s_i at time t_k, the condition of the antecedent is satisfied while the condition of the consequent is not:

a) if a violated rule exists, the quality score Q(s_i, t_k) is kept unchanged;

b) if the traversal completes without finding a violated rule, the quality score at that time is adjusted to QMean_i, and the method jumps back to steps (1) and (2) to update QMean_i, QSD_i and T_i;

Step (4): repeat steps (1), (2) and (3) until QMean_i and QSD_i no longer change;

2. The method for evaluating the quality of the data source in the sensor cloud environment according to claim 1, wherein after the current and historical monitoring data of the sensor cloud storage data source are obtained, if the obtained current and historical monitoring data of the sensor cloud storage data source exceed a threshold, data reduction is performed on the data, the data reduction is used for reducing the data volume, and the data reduction includes a segment-by-segment aggregation approximation method or an adaptive segment-by-segment constant approximation method.

3. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 1, wherein the integrating the monitoring data of the data source based on the spatial correlation and the temporal correlation and determining the data truth value comprises: judging whether the data have spatial correlation and, if so, for a given data source s_i, reading the set S_N^(i) of other sensor nodes in a regular monitoring area around s_i together with the monitoring data sequences of the nodes in S_N^(i), the nodes in S_N^(i) forming the cluster Cluster^(i).

4. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 3, wherein, to obtain the cluster Cluster^(i) in which s_i is located, the monitoring data sequences of the nodes in S_N^(i) are clustered by combining position similarity and data similarity, and after Cluster^(i) is obtained, the centroid of Cluster^(i) at each time is calculated and used as the truth value candidate sequence.

5. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 4, wherein after the truth value candidate sequence is obtained, temporal correlation processing, namely smoothing, is performed on the truth value candidate sequence, the smoothing uses an n-order moving average method or a least squares method, and the smoothed sequence is the final truth value sequence.

6. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 1, wherein the generating an initial quality evaluation vector of the data source based on the data truth value comprises: evaluating the quality of s_i by comparing the values of s_i with the truth values; obtaining the quality value Q(s_i, t_k) of s_i at time t_k from a quality evaluation function; the quality values <Q(s_i, t₁), …, Q(s_i, t_m)> at times t₁ to t_m constituting the initial quality evaluation vector Qvec(s_i) of s_i; the quality evaluation function being:

Q(s_i,t_k)＝1-dist(v_ik,true(v_ik))/maxdist

7. The method for evaluating the quality of the data source in the sensing cloud environment according to claim 1, wherein the quality rule is expressed as:

(f(A)∈target_A)→(g(B)∈valid_B)

where A and B are two attribute sets, f() and g() are functions acting on A and B, target_A is the target value range of f(A), and valid_B is the legal value range of g(B); target_A and valid_B may each be an interval, a set of values, or another function; if the quality rule is satisfied at a certain time, the data at that time are reasonable and free of quality problems.

8. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 1, wherein the quality evaluation result of s_i comprises: the mean of Qvec(s_i), the standard deviation of Qvec(s_i), and the stationarity QStationary_i.

9. The method for evaluating the quality of a data source in a sensing cloud environment according to claim 8, wherein the mean QMean_i indicates the average quality score over the evaluated period; the higher its value, the better the quality performance of s_i;

the stationarity QStationary_i takes values in {True, False}, where True indicates stationary and False indicates non-stationary.