CN110011847A - Data source quality evaluation method in a sensor-cloud environment

Data source quality evaluation method in a sensor-cloud environment

Info

Publication number
CN110011847A
Authority
CN
China
Prior art keywords
data
data source
quality
value
true
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910256445.9A
Other languages
Chinese (zh)
Other versions
CN110011847B (en
Inventor
李默涵
田志宏
孙彦斌
顾钊铨
韩伟红
仇晶
苏申
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN201910256445.9A
Publication of CN110011847A
Application granted
Publication of CN110011847B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/38 Services specially adapted for particular environments, situations or purposes for collecting sensor information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

An embodiment of the invention discloses a data source quality evaluation method in a sensor-cloud environment, comprising: obtaining the current and historical monitoring data of the data sources stored in the sensor cloud, where the sensor cloud is a combination of cloud computing and wireless sensor networks that collects and processes monitoring data from multiple sensor nodes or sensor sub-networks; integrating the monitoring data of the data sources based on spatial and temporal correlation and determining the true values of the data; generating an initial quality evaluation vector for each data source based on the true values, and adjusting the initial quality evaluation vector according to quality rules; and computing the final quality evaluation result of the data source from the adjusted quality evaluation vector. The invention describes data source quality from multiple angles and therefore characterizes it more comprehensively.

Description

Data source quality evaluation method in a sensor-cloud environment
Technical field
The present invention relates to the field of quality evaluation, and in particular to a data source quality evaluation method in a sensor-cloud environment.
Background
Several data source quality evaluation methods have already been proposed. Most existing methods are rule-based: they use conditional functional dependencies, conditional inclusion dependencies, currency constraints, matching rules and the like to evaluate the quality of data in a database and to detect bad data. The quality of a data source can then be evaluated from the proportion of bad data it produces. A data quality rule generally has the form A=a → B=b, meaning "if the value of attribute set A is a, then the value of attribute set B must be b". By selecting the records that satisfy the antecedent A=a and checking whether the consequent B=b holds, one can determine whether the data contain errors; data that violate the rule are regarded as bad data (also called erroneous data). The data quality of some data sources may decline because of internal or external factors. If these degraded data sources are not discovered in time, they in turn degrade the quality of the services built on them. An accurate data source quality evaluation method is therefore needed to identify bad data sources reliably.
Summary of the invention
To solve the above problems, the present invention provides a data source quality evaluation method in a sensor-cloud environment that describes data source quality more accurately and from multiple angles, and therefore characterizes it more comprehensively.
Accordingly, the present invention provides a data source quality evaluation method in a sensor-cloud environment, comprising:
obtaining the current and historical monitoring data of the data sources stored in the sensor cloud, where the sensor cloud is a combination of cloud computing and wireless sensor networks that collects and processes monitoring data from multiple sensor nodes or sensor sub-networks;
integrating the monitoring data of the data sources based on spatial and temporal correlation and determining the true values of the data;
generating an initial quality evaluation vector for the data source based on the true values, and adjusting the initial quality evaluation vector of the data source according to quality rules;
computing the final quality evaluation result of the data source from the adjusted quality evaluation vector.
After the current and historical monitoring data of the data sources stored in the sensor cloud are obtained, if the volume of the data exceeds a threshold, data reduction is applied to the data. Data reduction shrinks the data volume and uses piecewise aggregate approximation (PAA) or adaptive piecewise constant approximation (APCA).
Integrating the monitoring data of the data sources based on spatial and temporal correlation and determining the true values comprises: judging whether the data are spatially correlated; if they are, for a given data source s_i, reading the set S_N^(i) of other sensor nodes within a regular monitoring region around s_i together with the monitoring data sequences of the nodes in S_N^(i); the nodes in S_N^(i) form the cluster Cluster^(i).
To obtain the cluster Cluster^(i) containing s_i, the monitoring data sequences of the nodes in S_N^(i) are clustered using a combination of location similarity and data similarity; the centroid of Cluster^(i) at each time is then computed and taken as the candidate true value sequence.
After the candidate true value sequence has been obtained from spatial correlation processing, temporal correlation processing, namely smoothing, is applied to it. Smoothing uses an n-order moving average or the least squares method, and the smoothed sequence is the final true value sequence.
Generating the initial quality evaluation vector of a data source based on the true values comprises: evaluating the quality of s_i by comparing its values with the true values. A quality evaluation function gives the quality value Q(s_i, t_k) of s_i at time t_k, and the quality values <Q(s_i, t_1), …, Q(s_i, t_m)> for t_1 to t_m form the initial quality evaluation vector Qvec(s_i) of s_i. The quality evaluation function is:
Q(s_i, t_k) = 1 - dist(v_ik, true(v_ik)) / maxdist
where v_ik is the value of s_i at time t_k, true(v_ik) is the true value corresponding to s_i at time t_k, dist(v_ik, true(v_ik)) is the distance between v_ik and true(v_ik), and maxdist is the maximum distance between v_ik and true(v_ik).
The quality rule can express positive correlation, negative correlation and other numerical associations. It is written as:
(f(A) ∈ target_A) → (g(B) ∈ valid_B)
where A and B are two attribute sets, f() and g() are functions applied to A and B, target_A is the target range of f(A), and valid_B is the legal range of g(B). target_A and valid_B can be intervals, value sets, or another function. If the quality rule is satisfied at a given time, the data at that time are considered reasonable and free of quality problems.
Adjusting the initial quality evaluation vector of the data source according to the quality rules comprises:
Step (1): compute the mean QMean_i and the standard deviation QSD_i of Qvec(s_i) = <Q(s_i, t_1), …, Q(s_i, t_m)>.
Step (2): define the deviation threshold T_i as h times the standard deviation, that is, T_i = h·QSD_i.
Step (3): for each time t_k at which the quality score falls below QMean_i by more than T_i (that is, the score is too low), traverse all rules in the data quality rule set Ψ and check whether the data of s_i at t_k satisfy the antecedent of a rule but not its consequent:
a) if a violated rule exists, the quality score Q(s_i, t_k) is left unchanged;
b) if the traversal completes without finding a violated rule, the quality score at that time is set to QMean_i, and the procedure returns to steps (1) and (2) to update QMean_i, QSD_i and T_i.
Step (4): repeat steps (1), (2) and (3) until QMean_i and QSD_i no longer change.
The quality evaluation of s_i comprises the mean of Qvec(s_i), the standard deviation of Qvec(s_i), and the stationarity QStationary_i:
the mean QMean_i is the average quality score over the evaluated period; the higher the value, the better the quality of s_i;
the standard deviation QSD_i indicates how stable the quality of s_i is; the smaller the value, the more stable the quality;
the stationarity QStationary_i takes a value in {True, False}, where True means stationary and False means non-stationary.
The invention considers spatial and temporal correlation together when discovering true values and evaluating data source quality. This remedies the inability of existing work to handle spatio-temporal attributes and makes both true value discovery and quality evaluation more accurate;
quality evaluation does not rely solely on true values discovered by unsupervised methods: a new kind of quality rule is proposed and used to correct the evaluation result, which reduces the chance of misjudgment;
current methods evaluate data source quality along a single dimension only (such as error rate), whereas the proposed technique represents the final data source quality as the triple <QMean_i, QSD_i, QStationary_i>, which describes data source quality from multiple angles and characterizes it more comprehensively.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the data source quality evaluation method in a sensor-cloud environment according to an embodiment of the present invention;
Fig. 2 is a flow chart of determining the true values of the data according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is a flow chart of the data source quality evaluation method in a sensor-cloud environment according to an embodiment of the present invention. The method comprises:
S101: obtain the current and historical monitoring data of the data sources stored in the sensor cloud, where the sensor cloud is a combination of cloud computing and wireless sensor networks that collects and processes monitoring data from multiple heterogeneous sensor nodes or sensor sub-networks.
Deploying a wireless sensor network (WSN) in a target area makes it easy to collect data from the physical world. Wireless sensor nodes are small and inexpensive, which has led to wide use in environmental monitoring, national defense, traffic control, community security, target localization and other fields. However, their limited computing, storage and communication capabilities pose many challenges for large-scale sensor network applications. With the development of cloud computing and the growing demand for cyber-physical integration, combining cloud computing with wireless sensor networks has become an inevitable trend, and the sensor cloud (sensor-cloud) has emerged. In a sensor cloud, cloud services collect and process data from multiple heterogeneous sensor nodes or sensor sub-networks, and can thereby complete data-driven, compute-intensive tasks that would be difficult to complete on the sensor side. The computing power of the cloud allows heavier services such as data source quality evaluation to be applied to sensor networks whose own computing capability is limited: by integrating heterogeneous sensor nodes and sub-networks, the cloud service can evaluate data source quality in the cloud and then use or discard each source as needed.
However, sensor nodes are mostly deployed in harsh or unattended environments, so the correctness and accuracy of the data from some data sources (sensor nodes or sub-networks) are vulnerable to environmental effects or attacks during acquisition and transmission. In other words, the data quality of some data sources may decline because of internal or external factors. If these degraded data sources are not discovered in time, they in turn degrade the quality of data-driven services in the cloud. An accurate data source quality evaluation method is therefore needed to identify bad data sources reliably. Because sensors acquire data continuously, the quality of a data source also fluctuates over time; the evaluation method must therefore support time series or data stream analysis so that the quality evaluation results can be updated promptly and accurately. Based on these results, cloud applications can select and use data sources more effectively and thereby improve the efficiency and quality of their services.
Let S_e = {s_1, …, s_n} denote the set of data sources to be evaluated, let t_1 be the initial time and t_m the current time. First, n time series V = {<v_11, …, v_1m>, …, <v_n1, …, v_nm>} are read from the database, where <v_i1, …, v_im> is the sequence of monitored values of data source s_i and v_ij is the vector of values monitored by s_i at time t_j.
Next, check whether the data volume exceeds a threshold θ. If m > θ, a dimensionality reduction is applied to each sequence <v_i1, …, v_im>. For simplicity, the reduction can use PAA (Piecewise Aggregate Approximation) or APCA (Adaptive Piecewise Constant Approximation); that is, the sequence <v_i1, …, v_im> is divided into l segments of equal or variable length, and each segment is represented by the average of its points. The threshold θ is an empirical value set according to the computing capacity of the cloud.
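The equal-length variant of this reduction (PAA) can be sketched as follows. This is a minimal illustration for scalar readings, not the patented implementation; the function names `paa` and `reduce_if_needed` and the way segment boundaries are rounded are assumptions made for the example.

```python
from statistics import mean


def paa(sequence, segments):
    """Piecewise Aggregate Approximation: split the sequence into `segments`
    nearly equal-length pieces and represent each piece by its average."""
    m = len(sequence)
    if segments <= 0:
        raise ValueError("segments must be positive")
    if segments >= m:
        return list(sequence)  # nothing to reduce
    reduced = []
    for j in range(segments):
        start = j * m // segments        # assumed boundary rule (integer division)
        end = (j + 1) * m // segments
        reduced.append(mean(sequence[start:end]))
    return reduced


def reduce_if_needed(sequence, theta, segments):
    """Apply the reduction only when the length m exceeds the threshold theta."""
    if len(sequence) > theta:
        return paa(sequence, segments)
    return list(sequence)
```

For instance, `paa([1, 2, 3, 4, 5, 6], 3)` averages the pairs (1, 2), (3, 4) and (5, 6). APCA differs in that segment lengths adapt to the data and is not shown here.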
S102: integrate the monitoring data of the data sources based on spatial and temporal correlation and determine the true values of the data.
Current methods mostly evaluate data source quality through simple voting or iterative computation, and mainly target Web data sources and relational databases. They do not consider how correlations across space, time and attributes affect quality evaluation. In a sensor network, however, sensor nodes are geographically close, monitor the same objects and monitor continuously, and some physical quantities are related. Consequently, data produced by different data sources at the same time are spatially correlated, data produced by the same data source at different times are temporally correlated, and different attributes of the same data source are correlated through the underlying physical quantities. These correlations may appear as similarity of data values, or as positive or negative correlation in the trends of the data.
Spatial correlation can be processed first to obtain a true value time series, and temporal correlation is then processed on that series. Fig. 2 is a flow chart of determining the true values of the data according to an embodiment of the present invention. Referring to Fig. 2:
S201: judge whether the data are spatially correlated.
The spatial correlation considered here arises mainly from whether sensor nodes are close in position and whether they monitor the same object. Because sensors are fragile (limited energy, easily damaged), deployments are often redundant: several sensor nodes monitor the same target at once. These nodes are geographically close, so the data they produce should also be similar. For a given data source (a sensor node or sub-network) s_i, the set of all data sources that monitor the same object as s_i is called the cluster of s_i, denoted Cluster^(i).
S202: for a given data source s_i, read the set S_N^(i) of other sensor nodes in the same monitoring region as s_i and the monitoring data sequences of the nodes in S_N^(i).
If the region corresponding to the monitored object is regular (for example, the temperature and humidity of a room), then for a given data source s_i one can read the set S_N^(i) of other sensor nodes within a regular monitoring region around s_i and the monitoring data sequences of the nodes in S_N^(i). All nodes in S_N^(i) then naturally form Cluster^(i).
S203: cluster the monitoring data sequences of the nodes in S_N^(i) using a combination of location similarity and data similarity.
In some cases, however, rivers, valleys, roads, buildings and similar factors make the region corresponding to the monitored object irregular. Some nodes in S_N^(i) may then monitor a different object from s_i and correspond to different true values. To keep these nodes from contaminating the true value discovery result, clustering is used to select the nodes that are sufficiently similar to s_i, and the clustering must consider both data similarity and location similarity. The similarity of two sensor nodes is defined as the weighted average of their location similarity and data similarity, as shown in formula (1):
Sim(s_i, s_j) = w_1 × Simspace(s_i, s_j) + w_2 × Simdata(s_i, s_j)    (1)
Here, for s_i and any node s_j in S_N^(i), Simspace(s_i, s_j) is the location similarity of s_i and s_j, for which coordinate similarity can be used, and Simdata(s_i, s_j) is their data similarity, for which one can use the Euclidean distance between the normalized time series, or first convert the time series into histograms and then compute the normalized EMD (Earth Mover's Distance). w_1 and w_2 are weights and can both be set to 0.5.
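Formula (1) can be sketched as below for scalar time series and planar coordinates. The text does not say how a distance is converted into a similarity, so the mapping 1 / (1 + d) and the min-max normalization are assumptions of this sketch, as are all function names.

```python
import math


def distance_to_similarity(d):
    # Assumed mapping from a distance to a similarity in (0, 1].
    return 1.0 / (1.0 + d)


def normalize(series):
    """Min-max normalize a series to [0, 1] (assumed normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def sim_space(pos_a, pos_b):
    """Location similarity from the Euclidean distance between coordinates."""
    return distance_to_similarity(math.dist(pos_a, pos_b))


def sim_data(series_a, series_b):
    """Data similarity from the Euclidean distance of normalized series."""
    return distance_to_similarity(
        math.dist(normalize(series_a), normalize(series_b)))


def sim(pos_a, series_a, pos_b, series_b, w1=0.5, w2=0.5):
    """Weighted combination of location and data similarity, formula (1)."""
    return w1 * sim_space(pos_a, pos_b) + w2 * sim_data(series_a, series_b)
```

Nodes whose similarity to s_i falls below some cut-off would then be excluded from the cluster; the cut-off, like the clustering algorithm itself, is left open by the text.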
S204: obtain the cluster containing s_i and compute the centroid of that cluster at each time as the candidate true value sequence.
After the cluster Cluster^(i) containing s_i is obtained, the centroid of Cluster^(i) at each time is computed and used as the candidate true value sequence (that is, the true value at time t is the centroid of Cluster^(i) at time t).
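For scalar readings, the per-time centroid is simply the average over the cluster members at each time. A minimal sketch, assuming the cluster is given as a list of equal-length sequences (the name `centroid_sequence` is illustrative):

```python
def centroid_sequence(cluster_series):
    """Candidate true value sequence: the centroid (mean) of the cluster
    members at each time. Each element of `cluster_series` is the reading
    sequence of one node; all sequences must have the same length."""
    if not cluster_series:
        raise ValueError("cluster must contain at least one node")
    length = len(cluster_series[0])
    if any(len(series) != length for series in cluster_series):
        raise ValueError("all sequences must have the same length")
    # zip(*...) groups the readings of all nodes by time.
    return [sum(values) / len(values) for values in zip(*cluster_series)]
```

For vector-valued readings the same averaging would be applied per component.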
S205: judge whether the data are temporally correlated.
Processing temporal correlation first requires judging whether the data are similar along the time dimension. The similarity considered here is that between adjacent times. Such similarity exists for many monitored objects: quantities such as temperature, humidity and altitude usually change continuously, and this similarity should be reflected in the true values. Therefore, after the candidate true value sequence has been obtained from spatial correlation processing, it must also be smoothed so that errors in sensor data do not cause abrupt changes in the true values.
S206: smooth the time series of centroids.
The smoothing strategy can be an n-order moving average or the least squares method; the smoothed sequence is the final true value sequence.
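The moving-average option can be sketched as follows. The text does not specify how the first n-1 points are handled; this sketch assumes a trailing window that simply uses however many points are available so far.

```python
def moving_average(sequence, n):
    """n-order trailing moving average. For the first n-1 points the window
    is shorter (an assumption; the text does not define the boundary)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    smoothed = []
    window_sum = 0.0
    for k, value in enumerate(sequence):
        window_sum += value
        if k >= n:
            window_sum -= sequence[k - n]  # drop the value leaving the window
        smoothed.append(window_sum / min(k + 1, n))
    return smoothed
```

A larger n suppresses isolated spikes more strongly but also delays the response of the true value sequence to genuine changes.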
S103: generate the initial quality evaluation vector of the data source based on the true values.
Once the true value sequence is available, the quality of s_i can be evaluated by comparing its values with the true values. The quality evaluation function in formula (2) gives the quality value Q(s_i, t_k) of s_i at time t_k, and the quality values <Q(s_i, t_1), …, Q(s_i, t_m)> for t_1 to t_m form the initial quality evaluation vector Qvec(s_i) of s_i.
Q(s_i, t_k) = 1 - dist(v_ik, true(v_ik)) / maxdist    (2)
where v_ik is the value of s_i at time t_k, true(v_ik) is the true value corresponding to s_i at time t_k, dist(v_ik, true(v_ik)) is the distance between v_ik and true(v_ik), and maxdist is the maximum distance between v_ik and true(v_ik). If only precision errors are considered, dist can be the absolute difference of the numeric values. If the sensor network is deployed in a harsh environment, transmission and storage errors in the bit strings must also be considered; in that case the values can first be converted to binary strings and compared with the Hamming distance or the edit distance.
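Formula (2) can be sketched as below. The text leaves open over which set the maximum in maxdist is taken; this sketch assumes it is the largest distance observed for the source over t_1 to t_m. The function names and the 16-bit width used for the Hamming variant are likewise assumptions.

```python
def absolute_difference(a, b):
    """Distance for precision errors only."""
    return abs(a - b)


def hamming_distance(a, b, width=16):
    """Number of differing bits between two non-negative integers,
    compared as bit strings of an assumed fixed width."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def initial_quality_vector(values, true_values, dist=absolute_difference):
    """Qvec(s_i): Q(s_i, t_k) = 1 - dist(v_ik, true(v_ik)) / maxdist,
    with maxdist assumed to be the largest distance observed for this source."""
    distances = [dist(v, t) for v, t in zip(values, true_values)]
    maxdist = max(distances)
    if maxdist == 0:
        return [1.0] * len(distances)  # every reading matches its true value
    return [1.0 - d / maxdist for d in distances]
```

With this choice of maxdist, the worst reading of a source always scores 0, so scores are relative within one source; a fixed physical bound for maxdist would make scores comparable across sources.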
S104: adjust the initial quality evaluation vector of the data source according to the quality rules.
Using the initial quality evaluation vector directly for data source quality evaluation is still problematic, because it does not account for the effect of sudden anomalous events. Emergency events in the environment (a fire, for example, makes temperature readings rise abruptly) can cause abrupt changes in sensor readings. Such changes should not be regarded as quality problems; in other words, they should not lower the quality score of the data source.
In quality evaluation for relational databases, quality rules are generally used to state which dependencies are reasonable and must not be violated. However, the quality rules of relational databases cannot be applied directly to sensor monitoring scenarios. The invention therefore defines a new kind of quality rule, shown in formula (3), that can express positive correlation, negative correlation and other numerical associations.
(f(A) ∈ target_A) → (g(B) ∈ valid_B)    (3)
where A and B are two attribute sets, f() and g() are functions applied to A and B, target_A is the target range of f(A), and valid_B is the legal range of g(B). target_A and valid_B can be intervals or value sets, such as (-∞, 0], [0, +∞) or {0, 1}, or can be given by another function.
A rule states an association that should hold in the physical world between attribute sets A and B (for example, altitude and air pressure). Its semantics are as follows: if the value of the function f(A) in the antecedent (the part left of the arrow) falls in the target range target_A, then the value of the function g(B) in the consequent (the part right of the arrow) should fall in valid_B. If a rule is satisfied at a given time (that is, whenever the antecedent condition holds, the consequent condition also holds), the data at that time can be considered reasonable and free of quality problems.
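One way to represent such a rule in code is as four callables: f and g extract values from a record, and target_A and valid_B are membership tests. The sketch below, including the record layout and the altitude/pressure rule, is a hypothetical illustration rather than a rule taken from the text.

```python
def make_rule(f, in_target_a, g, in_valid_b):
    """Build a checker for (f(A) in target_A) -> (g(B) in valid_B).
    The returned function reports whether a record VIOLATES the rule,
    i.e. the antecedent holds but the consequent does not."""
    def violated(record):
        antecedent_holds = in_target_a(f(record))
        consequent_holds = in_valid_b(g(record))
        return antecedent_holds and not consequent_holds
    return violated


# Hypothetical rule: if altitude does not decrease (change in [0, +inf)),
# air pressure must not increase (change in (-inf, 0]).
altitude_pressure_rule = make_rule(
    f=lambda r: r["d_altitude"],
    in_target_a=lambda x: x >= 0,
    g=lambda r: r["d_pressure"],
    in_valid_b=lambda y: y <= 0,
)
```

A record for which the antecedent does not hold never violates the rule, which matches the implication semantics described above.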
A quality rule set Ψ is obtained from domain knowledge about the monitored objects, and the rule set is used to adjust the quality evaluation vector Qvec(s_i) iteratively as follows.
Step (1): compute the mean QMean_i and the standard deviation QSD_i of Qvec(s_i) = <Q(s_i, t_1), …, Q(s_i, t_m)>.
Step (2): define the deviation threshold T_i as h times the standard deviation (h is a constant agreed in advance), that is, T_i = h·QSD_i.
Step (3): for each time t_k at which the quality score falls below QMean_i by more than T_i (that is, the score is too low), traverse all rules in Ψ and check whether the data of s_i at t_k violate any rule, that is, whether the antecedent condition of a rule is satisfied while its consequent condition is not:
a) if a violated rule exists, the quality score Q(s_i, t_k) is left unchanged;
b) if the traversal completes without finding a violated rule, the quality score at that time is set to QMean_i, and the procedure returns to steps (1) and (2) to update QMean_i, QSD_i and T_i.
Step (4): repeat steps (1), (2) and (3) until QMean_i and QSD_i no longer change.
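Steps (1) to (4) can be sketched as the loop below. Each rule is assumed to be a callable that returns True when a record violates it, and `records[k]` is assumed to hold the data of s_i at time t_k. The use of the population standard deviation and the `max_rounds` safeguard are choices of this sketch, not of the text.

```python
from statistics import mean, pstdev


def adjust_quality_vector(qvec, records, rules, h=2.0, max_rounds=1000):
    """Iteratively raise low scores that no quality rule can explain.
    A low score is kept when some rule is violated at that time
    (likely bad data) and reset to the mean otherwise (likely a real event)."""
    q = list(qvec)
    for _ in range(max_rounds):                 # safeguard against slow convergence
        q_mean = mean(q)                        # step (1)
        threshold = h * pstdev(q)               # step (2): T_i = h * QSD_i
        changed = False
        for k, score in enumerate(q):           # step (3)
            if q_mean - score > threshold:
                if not any(rule(records[k]) for rule in rules):
                    q[k] = q_mean               # case b): no violated rule found
                    changed = True
                    break                       # recompute mean and threshold
                # case a): a rule is violated, so the score stays as it is
        if not changed:                         # step (4): nothing left to adjust
            break
    return q
```

Note that an adjusted score may still lie more than T_i below the updated mean, because the standard deviation shrinks after each adjustment; the loop then raises it again in the next round, so such scores approach the level of the remaining ones.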
Because quality rules reflect physical laws of the real world, the intuition behind the adjustment is as follows. If an anomalous data value satisfies all the physical rules of its application scenario, the anomaly more likely reflects an emergency event in the physical world than erroneous data. Conversely, if a data value is anomalous and also violates a physical rule, it is more likely erroneous data than an anomalous event that really occurred in the physical world. The adjustment therefore corrects misjudgments.
S105: compute the final quality evaluation result of the data source from the adjusted quality evaluation vector.
Once the final quality evaluation vector Qvec(s_i) of data source s_i is available, the quality evaluation of s_i can be completed from that vector. The quality evaluation of s_i is represented by the triple <QMean_i, QSD_i, QStationary_i>.
Mean QMean_i. QMean_i is the mean of Qvec(s_i), that is, the average quality score over the evaluated period. The higher the value, the better the average quality of s_i.
Standard deviation QSD_i. QSD_i is the standard deviation of Qvec(s_i) and indicates how stable the quality of s_i is. The smaller the value, the less the quality score of s_i varies and the more stable the quality.
Stationarity QStationary_i. QStationary_i takes a value in {True, False}, where True means stationary and False means non-stationary. Normally, if the quality scores of a data source at successive times are regarded as a random process, that process should be a stationary stochastic process; in other words, the data source faithfully provides the monitoring data at every time, and this behavior does not change over time. If the process is not stationary, there is a non-negligible correlation between the quality score of the data source and time, from which one can infer that some abnormal factor in the data source is affecting data quality over time. A stationarity test must therefore be applied to Qvec(s_i). If QStationary_i is False, that is, Qvec(s_i) is non-stationary, then some unknown abnormal factor is at work and using the data of this data source carries considerable risk; where conditions permit, the abnormal factor should be investigated before deciding whether to keep using the data source.
The triple <QMean_i, QSD_i, QStationary_i> describes both the overall quality and the stability of data source s_i. Computing the triple for every data source in the evaluated set S_e = {s_1, …, s_n} completes the data source quality evaluation task.
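Computing the triple can be sketched as follows. The text does not name a particular stationarity test, so the split-half mean comparison here is only a crude placeholder (a formal test such as a unit-root test would be the natural substitute); its tolerance, the function names and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def split_half_stationary(qvec, tolerance=0.1):
    """Crude placeholder, NOT a formal stationarity test: report True when
    the mean of the first half and of the second half differ by at most
    `tolerance` (an assumed, hand-picked value)."""
    half = len(qvec) // 2
    if half == 0:
        return True
    return abs(mean(qvec[:half]) - mean(qvec[half:])) <= tolerance


def assess(qvec, is_stationary=split_half_stationary):
    """Return the triple (QMean_i, QSD_i, QStationary_i) for one data source."""
    return (mean(qvec), pstdev(qvec), is_stationary(qvec))
```

Passing the test as a parameter keeps the triple computation independent of whichever stationarity test is finally chosen.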
Compared with existing work, the proposed technique has the following advantages:
it considers spatial and temporal correlation together when discovering true values and evaluating data source quality, which remedies the inability of existing work to handle spatio-temporal attributes and makes both true value discovery and quality evaluation more accurate;
quality evaluation does not rely solely on true values discovered by unsupervised methods: a new kind of quality rule is proposed and used to correct the evaluation result, which reduces the chance of misjudgment;
current methods evaluate data source quality along a single dimension only (such as error rate), whereas the proposed technique represents the final data source quality as the triple <QMean_i, QSD_i, QStationary_i>, which describes data source quality from multiple angles and characterizes it more comprehensively.
The above is only a preferred embodiment of the invention. A person of ordinary skill in the art can make improvements and substitutions without departing from the technical principles of the invention, and such improvements and substitutions shall also be regarded as falling within the scope of protection of the invention.

Claims (10)

1. A data source quality evaluation method in a sensing cloud environment, characterized by comprising:
obtaining current and historical monitoring data of the data sources stored in the sensing cloud, the sensing cloud being a combination of cloud computing and wireless sensor networks that collects and processes monitoring data from multiple sensor nodes or sensor sub-networks;
integrating the monitoring data of the data sources based on spatial correlation and temporal correlation, and determining data true values;
generating an initial quality assessment vector of the data source based on the data true values;
adjusting the initial quality assessment vector of the data source according to quality rules;
calculating the final quality assessment result of the data source from the adjusted initial quality assessment vector of the data source.
2. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that, after the current and historical monitoring data of the data sources stored in the sensing cloud are obtained, if the volume of the obtained data exceeds a threshold, data reduction is performed on the data; the data reduction is used to reduce the data volume and comprises the piecewise aggregate approximation method or the adaptive piecewise constant approximation method.
3. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that integrating the monitoring data of the data sources based on spatial correlation and temporal correlation and determining data true values comprises: determining whether the data are spatially correlated; if the data are spatially correlated, then for a given data source s_i, reading the set S_N^(i) of other sensor nodes within a regular monitoring region around s_i and the monitoring data sequences of the nodes in S_N^(i), the nodes in S_N^(i) constituting the cluster Cluster^(i).
4. The data source quality evaluation method in a sensing cloud environment according to claim 3, characterized in that, after the cluster Cluster^(i) to which s_i belongs is obtained, the monitoring data sequences of the nodes in S_N^(i) are clustered by combining positional similarity and data similarity, and the centroid of Cluster^(i) at each moment is calculated and used as the candidate sequence of true values.
5. The data source quality evaluation method in a sensing cloud environment according to claim 4, characterized in that, after the true value candidate sequence is obtained, temporal correlation processing, namely smoothing, is applied to the true value candidate sequence; the smoothing uses the n-order moving average method or the least squares method, and the smoothed sequence is the final true value sequence.
6. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that generating the initial quality assessment vector of the data source based on the data true values comprises: evaluating the quality of s_i by comparing the value of s_i with the true value; obtaining, based on a quality evaluation function, the quality value Q(s_i, t_k) of s_i at moment t_k; the quality values <Q(s_i, t_1), …, Q(s_i, t_m)> for t_1 to t_m constitute the initial quality assessment vector Qvec(s_i) of s_i; the quality evaluation function comprises
Q(s_i, t_k) = 1 - dist(v_ik, true(v_ik)) / maxdist
where v_ik is the value of s_i at moment t_k, true(v_ik) is the true value corresponding to s_i at moment t_k, dist(v_ik, true(v_ik)) is the distance between v_ik and true(v_ik), and maxdist is the maximum distance between v_ik and true(v_ik).
7. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that the quality rule is expressed as:
(f(A) ∈ target_A) → (g(B) ∈ valid_B)
where A and B are two attribute sets, f() and g() are functions acting on A and B respectively, target_A denotes the target value domain of f(A), and valid_B is the valid value range of g(B); target_A and valid_B may each be an interval, a set of values, or another function; if the quality rule is satisfied at a given moment, the data at that moment are reasonable and have no quality problem.
8. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that adjusting the initial quality assessment vector of the data source according to quality rules comprises:
Step (1): calculating the mean QMean_i and standard deviation QSD_i of Qvec(s_i) = <Q(s_i, t_1), …, Q(s_i, t_m)>;
Step (2): defining the deviation threshold T_i as h times the standard deviation, i.e., T_i = h × QSD_i;
Step (3): for each moment t_k at which the quality score is lower than QMean_i by more than T_i, traversing the rules in the data quality rule set Ψ and checking whether, at moment t_k, the data of s_i satisfy the antecedent condition of a rule but not its consequent condition:
a) if a violated rule exists, the quality score Q(s_i, t_k) remains unchanged;
b) if the traversal completes without finding a violated rule, the quality score at that moment is revised to QMean_i, and the method jumps back to steps (1) and (2) to update QMean_i, QSD_i and T_i;
Step (4): repeating steps (1), (2) and (3) until QMean_i and QSD_i no longer change.
9. The data source quality evaluation method in a sensing cloud environment according to claim 1, characterized in that the quality assessment of s_i comprises: the mean of Qvec(s_i), the standard deviation of Qvec(s_i), and the stationarity QStationary_i.
10. The data source quality evaluation method in a sensing cloud environment according to claim 9, characterized in that the mean QMean_i indicates the average quality score over the evaluated period, and the higher the value, the better the quality of s_i;
the standard deviation QSD_i indicates how stable the quality of s_i is, and the smaller the value, the more stable the quality;
the stationarity QStationary_i takes a value in {True, False}, where True indicates stationary and False indicates non-stationary.
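The scoring function of claim 6, the rule form of claim 7 and the adjustment loop of claim 8 can be sketched together in Python. This is a minimal illustration, not the patented implementation, and it rests on several assumptions: values are scalars and dist is taken as the absolute difference; maxdist is taken as the largest distance over the evaluated moments; a rule is modelled as a pair of predicates (antecedent, consequent) over one record; the multiplier h = 2.0, the cap `max_rounds`, the sample readings and the "indoor temperature" rule are all made up for this example. The function names `initial_scores`, `violates_any` and `adjust_scores` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
# A rule (f(A) in target_A) -> (g(B) in valid_B), modelled as two predicates.
Rule = Tuple[Callable[[Record], bool], Callable[[Record], bool]]


def initial_scores(values: Sequence[float], truths: Sequence[float]) -> List[float]:
    """Q(s_i, t_k) = 1 - dist / maxdist, with dist assumed to be |value - truth|."""
    dists = [abs(v - t) for v, t in zip(values, truths)]
    maxdist = max(dists)  # assumption: maximum over the evaluated moments
    if maxdist == 0:
        return [1.0] * len(dists)
    return [1.0 - d / maxdist for d in dists]


def violates_any(record: Record, rules: Sequence[Rule]) -> bool:
    """True if some rule has its antecedent satisfied but its consequent not."""
    return any(ante(record) and not cons(record) for ante, cons in rules)


def adjust_scores(scores: Sequence[float], records: Sequence[Record],
                  rules: Sequence[Rule], h: float = 2.0,
                  max_rounds: int = 100) -> List[float]:
    """Rule-based correction of the initial score vector (steps (1) to (4))."""
    adjusted = list(scores)
    for _ in range(max_rounds):  # safety cap, not part of the claim
        q_mean, q_sd = mean(adjusted), pstdev(adjusted)  # step (1)
        threshold = h * q_sd                             # step (2)
        revised = False
        for k, q in enumerate(adjusted):                 # step (3)
            if q_mean - q > threshold and not violates_any(records[k], rules):
                adjusted[k] = q_mean  # 3b: no violated rule, revise to the mean
                revised = True
                break                 # jump back to step (1)
        if not revised:               # step (4): mean and deviation are stable
            break
    return adjusted


# Made-up readings: the fourth value is far from its true value.
values = [20.0, 20.5, 21.0, 35.0, 20.2, 20.8]
truths = [20.1, 20.4, 20.9, 21.0, 20.3, 20.7]
# Hypothetical rule: an indoor sensor should read between 0 and 30 degrees.
rules = [(lambda r: r["indoor"], lambda r: 0.0 <= r["temp"] <= 30.0)]

qvec = initial_scores(values, truths)
indoor = [{"indoor": True, "temp": v} for v in values]
outdoor = [{"indoor": False, "temp": v} for v in values]
kept = adjust_scores(qvec, indoor, rules)      # rule violated: low score kept
raised = adjust_scores(qvec, outdoor, rules)   # no rule violated: score revised
```

In the first call the low score at the fourth moment is backed by a violated rule and therefore stays as it is; in the second call no rule is violated, so the low score is treated as a likely misjudgment and is repeatedly revised to the current mean until the mean and standard deviation stop changing.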
CN201910256445.9A 2019-03-29 2019-03-29 Data source quality evaluation method under sensing cloud environment Active CN110011847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910256445.9A CN110011847B (en) 2019-03-29 2019-03-29 Data source quality evaluation method under sensing cloud environment

Publications (2)

Publication Number Publication Date
CN110011847A true CN110011847A (en) 2019-07-12
CN110011847B CN110011847B (en) 2022-03-25

Family

ID=67169319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910256445.9A Active CN110011847B (en) 2019-03-29 2019-03-29 Data source quality evaluation method under sensing cloud environment

Country Status (1)

Country Link
CN (1) CN110011847B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020478A (en) * 2012-12-28 2013-04-03 杭州师范大学 Method for checking reality of ocean color remote sensing product
CN103530347A (en) * 2013-10-09 2014-01-22 北京东方网信科技股份有限公司 Internet resource quality assessment method and system based on big data mining
CN103916860A (en) * 2014-04-16 2014-07-09 东南大学 Outlier data detection method based on space-time correlation in wireless sensor cluster network
CN108614803A (en) * 2018-04-16 2018-10-02 深圳市赑玄阁科技有限公司 A kind of meteorological data method of quality control and system
CN108898311A (en) * 2018-06-28 2018-11-27 国网湖南省电力有限公司 A kind of data quality checking method towards intelligent distribution network repairing dispatching platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
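伍荣坤 (Wu Rongkun): "A preliminary study on combined methods for assessing the data quality of periodic statistical reports", 《统计研究》 (Statistical Research) *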

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110519720A (en) * 2019-08-23 2019-11-29 绍兴文理学院 Bursty traffic maps load capacity optimization method under a kind of sensing cloud environment
CN111898871A (en) * 2020-07-08 2020-11-06 南京南瑞水利水电科技有限公司 Power grid power end data quality evaluation method, device and system
CN111898871B (en) * 2020-07-08 2023-07-18 南京南瑞水利水电科技有限公司 Method, device and system for evaluating data quality of power grid power supply end
CN115097526A (en) * 2022-08-22 2022-09-23 江苏益捷思信息科技有限公司 Seismic acquisition data quality evaluation method
CN115097526B (en) * 2022-08-22 2022-11-11 江苏益捷思信息科技有限公司 Seismic acquisition data quality evaluation method

Also Published As

Publication number Publication date
CN110011847B (en) 2022-03-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant