CN107341515A - A kind of particular point detection method based on data representation - Google Patents
- Publication number
- CN107341515A (application CN201710548770.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The present invention relates to a particular point detection method based on data representation. Based on the positions at which negative-valued elements occur in an affine combination, and drawing on the theory of affine combinations within a data set, the method uses the reversible unreachable measure value NC_i to judge the degree to which a sample point is a particular point, and determines particular points automatically through an automatically set threshold γ. This improves the accuracy and speed of particular point detection, better reflects the structural features of the data set, and detects outliers and edge points at the same time. In addition, the invention is only weakly affected by data distribution and data dimension, so its range of application in practice is wider. It addresses the shortcomings of the prior art in detection accuracy for particular points and in detection performance on high-dimensional data.
Description
Technical field
The invention belongs to the field of data mining, and in particular relates to a particular point detection method based on data representation.
Background technology
Traditional techniques such as clustering, classification, and pattern recognition aim to find common patterns, whereas the detection of particular points, including edge points and outliers, is generally used to identify patterns in data that are valid, interesting, and potentially valuable. Compared with detecting normal points, particular point detection is often the more meaningful task. In addition, most algorithms are affected by outliers. For example, the well-known nonlinear manifold learning dimension-reduction algorithm Isomap does not itself address outlier detection. How to correctly detect outliers in high-dimensional space is therefore a pressing practical problem and an important task in data preprocessing.
Researchers in China and abroad have proposed various detection algorithms from different technical perspectives. In terms of how outliers are judged, methods can be divided into global models and local models. A global model makes a binary decision for every observation point, i.e. it judges whether the current observation point is an outlier. A local model typically assigns each observation point a measure value (such as an angle variation factor) that estimates the degree to which the point is an outlier. Depending on whether data label information is used, algorithms can be divided into supervised, semi-supervised, and unsupervised. At present, most methods rely on one of five classes of shallow techniques: statistical-based, distance-based, neighbor- or density-based, clustering-based, or deviation-based.
Prior art:
Statistics-based methods are among the earliest outlier detection methods. Such methods generally assume that the given data follow a certain distribution or probabilistic model, and usually apply a discordancy test to decide whether a data point is abnormal. Examples include the 3σ rule for data following a Gaussian distribution, Grubbs' test for data following a normal distribution, χ²-based methods, and analysis of data divergence, all of which are widely applied. However, most statistics-based methods are designed for univariate data sets that follow a known distribution. In real environments, data with unknown distribution and higher dimension are difficult to judge. Especially in the era of massive high-dimensional data, such methods cannot achieve optimal detection results.
Because their geometric meaning is clear, distance-based methods are currently the most commonly used. These methods use distance as the measure and judge points that do not have "enough" neighbors to be abnormal data, which broadens the discordancy-test idea of the statistical methods. A classic example is the distance-based outlier DB(ε, d) method proposed by Knorr and Ng: if at least ε × 100% of the points in the data set lie at a distance greater than d from the current data point, that point is judged to be an outlier. This method depends heavily on the parameters ε and d; for the same data set, different parameter values can lead to large differences in performance. Later distance-based neighbor methods have similar parameter problems. In high-dimensional space in particular, distance-based measures separate points poorly, as the following formula shows:
$$\lim_{m \to \infty} \frac{dist_{max} - dist_{min}}{dist_{min}} \to 0$$
where m is the data dimension, and dist_max and dist_min are the distances from the current point to its farthest and nearest neighbors, respectively. As m increases, the expression tends to zero. Distance measures are therefore more or less ill-suited to the high-dimensional data that are now ubiquitous.
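The loss of contrast described above can be observed numerically. The following is a minimal sketch, not part of the patent: it assumes points drawn uniformly from the unit cube, a fixed sample size, and a hypothetical helper name `relative_contrast`. It prints (dist_max - dist_min) / dist_min for one query point at several dimensions m.

```python
import math
import random


def relative_contrast(m, n=200, seed=0):
    """Return (dist_max - dist_min) / dist_min for one query point.

    The n points are drawn uniformly from [0, 1]^m; the distribution and
    the sample size are assumptions made only for illustration.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(m)] for _ in range(n)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


# The ratio shrinks as the dimension m grows.
for m in (2, 10, 100, 1000):
    print(m, round(relative_contrast(m), 3))
```

The exact values depend on the random seed; only the downward trend with m is the point of the sketch.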
Existing outlier detection algorithms perform well under specific conditions or in specific domains, or on lower-dimensional spaces, but when the data dimension is high their performance is unsatisfactory and their generalization ability is weak. Research on outlier detection in high-dimensional space is still at an early stage. For example, Kriegel proposed the angle-based outlier detection algorithm (ABOD), which does not depend on parameter selection. However, ABOD considers only the relation between the current point and its neighbors, and not the further relations among those neighbors, which causes the algorithm to identify some outliers incorrectly. Outlier detection algorithms for high-dimensional space therefore need more in-depth study.
How to further improve the precision and efficiency of outlier detection has thus become a problem that urgently needs solving in the field of data mining.
The content of the invention
In view of the deficiencies of the prior art, the present invention proposes a particular point detection method based on data representation, comprising the following steps:
Step 1: Input the data set X ∈ R^{m×n}, where X is an m × n data matrix and each column of X represents one data sample; that is, X contains n samples, each of dimension m. Data sample x_i ∈ R^m, i ∈ {1, 2, …, n}; m denotes the sample dimension and n denotes the number of samples in the data set X;
Step 2: Calculate the weight expression matrix W ∈ R^{n×n} of the data set X using formula (1),
$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \ \sum_{j} w_{ij} = 1. \tag{1}$$
where x_i denotes the i-th data sample in the data set X, x_j denotes the j-th data sample in the data set, w_ij denotes the connection relation between the i-th and j-th data samples, and the weight expression matrix W is an n × n matrix;
Step 3: Calculate the reversible unreachable index NC_i measure value. For the data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, the NC_i measure value is calculated by formula (2),
$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$
where
$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$
and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^{n×n};
Step 4: Sort the n NC_i measure values obtained in step 3, corresponding to the n data samples, from largest to smallest;
Step 5: Judge particular points, comprising the following steps:
Step 51: Determine the threshold γ by formula (3),
$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$
where σ is the influence factor;
Step 52: For each data sample x_i, i ∈ {1, 2, …, n}, when NC_i ≥ γ, x_i is judged to be an outlier.
According to a preferred embodiment, the method further comprises calculating the weight expression matrix W using formula (4),
$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \ 1^T w_i = 1. \tag{4}$$
where, for the i-th data sample x_i ∈ R^m in the data set X, the remaining n-1 data samples are first preprocessed by normalization and decoupling to obtain X_i ∈ R^{m×(n-1)}; w_i is a column vector whose elements are required to sum to 1; Q_i is a distance-inducing matrix, which is a positive-definite diagonal matrix; λ_1 is a balance factor that balances sparsity against the affine reconstruction error.
According to a preferred embodiment, the method further comprises calculating the weight expression matrix W using formula (5),
$$\min \|w\|_1, \quad \text{s.t.} \ b = Xw \ \text{and} \ 1^T w = 1. \tag{5}$$
where b is the sample to be reconstructed; b is reconstructed from a subset of the samples in the known data set X, and w is determined by the affine constraint 1^T w = 1 and the sparsity constraint ||w||_1. w is a column vector, and the samples corresponding to its nonzero elements are used to reconstruct the sample b.
According to a preferred embodiment, the method further comprises labeling the particular points, as follows:
Step 6: Visualization. When 1 ≤ m ≤ 3, the particular points detected in the data set X ∈ R^{m×n} are marked and displayed in the visualization space, in order to judge whether they lie on the edge of the data set. When m > 3, the data set X ∈ R^{m×n} is first reduced to the visualization space by a dimension-reduction method, and the particular points are then marked and displayed.
The beneficial effects of the technical solution of the present invention are:
1. The outlier detection method of the invention, based on data representation, discovers the intrinsic structure of high-dimensional data through data representation and reveals the specific features that the data representation carries for anomaly detection, collectively called abnormal features. The algorithm better reflects the structural features of the data set, detects outliers and edge points at the same time, and is applicable to data of unknown distribution.
2. The invention proposes the reversible unreachable measure index NC_i for evaluating the degree to which the current observation point is an edge point. The index considers the global structure of the data rather than only the local neighborhood. Particular points are determined automatically through an automatically set threshold, which improves the accuracy and speed of particular point detection, better reflects the structural features of the data set, and meets the needs of outlier detection in practical application environments.
3. The invention is only weakly affected by data distribution and data dimension, so it is equally applicable to high-dimensional data and has a wider range of application in practice.
Brief description of the drawings
Fig. 1 is the flow chart of the particular point detection method of the invention;
Fig. 2 shows the result of particular point detection by the method of the invention on the two-dimensional Flame data set;
Fig. 3 shows the result of particular point detection by the method of the invention on the two-dimensional XOR data set;
Fig. 4 shows the result of particular point detection by the method of the invention on the Aggregation data set;
Fig. 5(a) is an example of a convex combination; and
Fig. 5(b) is an example of an affine combination.
Embodiment
The invention is described in detail below with reference to the accompanying drawings.
In the present invention, the NC_i measure value refers to the reversible unreachable index, which evaluates the degree to which the current observation point belongs to the edge. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation point.
Particular points in the present invention include outliers and edge points.
To address the deficiencies of existing algorithms, the invention is based on the positions at which negative-valued elements occur in an affine combination and on the theory of affine combinations within a data set. It uses the NC_i measure to judge the degree to which a sample point is a particular point, and determines particular points automatically through an automatically set threshold γ. This improves the accuracy and speed of particular point detection, better reflects the structural features of the data set, and detects outliers and edge points at the same time. The invention is also only weakly affected by data distribution and data dimension, so its range of application in practice is wider.
Fig. 1 is the flow chart of the particular point detection method of the invention. As shown in Fig. 1, the particular point detection method based on data representation proposed by the invention comprises the following steps:
Step 1: Input the data set X ∈ R^{m×n}, where X is an m × n data matrix and each column represents one data sample; that is, X contains n samples, each of dimension m. x_i ∈ R^m, i ∈ {1, 2, …, n}; m denotes the sample dimension and n denotes the number of samples in the data set.
For example, given 10 pictures of size 20 × 20, each image can be processed into a column vector of size 400 × 1, so the 10 images form the data set X ∈ R^{400×10}; X is then a matrix of size 400 × 10.
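The image example can be sketched in a few lines. This is only an illustration: the row-by-row flattening order, the zero-filled placeholder images, and the helper name `images_to_matrix` are assumptions, since the text does not say how an image is turned into a column vector.

```python
def images_to_matrix(images):
    """Stack images as columns: each h-by-w image becomes one column of length h*w."""
    # Flatten every image row by row (assumed order).
    columns = [[pixel for row in img for pixel in row] for img in images]
    # Transpose so that X[r][c] is feature r of sample c (an m-by-n layout).
    return [list(feature) for feature in zip(*columns)]


# Ten placeholder 20 x 20 images filled with zeros (hypothetical data).
images = [[[0.0] * 20 for _ in range(20)] for _ in range(10)]
X = images_to_matrix(images)
print(len(X), len(X[0]))  # 400 10
```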
Step 2: Calculate the weight expression matrix W ∈ R^{n×n} of the data set using formula (1),
$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \ \sum_{j} w_{ij} = 1. \tag{1}$$
where x_i denotes the i-th data sample in the data set, x_j denotes the j-th data sample in the data set, and w_ij denotes the connection relation between the i-th and j-th data samples.
The weight expression matrix W is an n × n matrix: there are n samples, and each sample has a distance to each of the other n-1 samples, i.e. n-1 distance values. For convenience of representation, the distance of a sample to itself is recorded as 0, so each sample has n values. The n samples therefore give n × n values in total, which form an n × n matrix.
For the i-th sample x_i, $\sum_j w_{ij} = 1$ is the constraint that the i-th row of the weight expression matrix W sums to 1. $\sum_j w_{ij} x_j$ is the linear combination used to reconstruct x_i, $|x_i - \sum_j w_{ij} x_j|^2$ is the reconstruction error of x_i, and $\sum_{i=1}^{n} |x_i - \sum_j w_{ij} x_j|^2$ accumulates the reconstruction errors of the n samples.
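One way to solve the per-sample problem in formula (1) is the closed form familiar from locally linear embedding: with the weights summing to 1, the reconstruction error of x_i equals w^T C w, where C is the Gram matrix of the differences x_j - x_i, so w is proportional to C^{-1} 1. The sketch below follows that route. It rests on assumptions that the patent does not state: the sample itself is excluded (w_ii = 0), a small ridge term `reg` is added to keep C invertible, samples are passed as a list of vectors rather than as matrix columns, and all function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (M[r][n] - s) / M[r][r]
    return u


def affine_weights(samples, i, reg=1e-3):
    """Row i of W: weights over the other samples, summing to 1, with w_ii = 0."""
    xi = samples[i]
    others = [j for j in range(len(samples)) if j != i]
    diffs = [[a - b for a, b in zip(samples[j], xi)] for j in others]
    k = len(others)
    # Gram matrix of the differences x_j - x_i.
    C = [[sum(p * q for p, q in zip(diffs[a], diffs[b])) for b in range(k)]
         for a in range(k)]
    trace = sum(C[a][a] for a in range(k))
    eps = reg * trace / k if trace > 0 else reg  # ridge term (an assumption)
    for a in range(k):
        C[a][a] += eps
    u = solve_linear(C, [1.0] * k)
    total = sum(u)  # normalising enforces the affine constraint
    row = [0.0] * len(samples)
    for j, value in zip(others, u):
        row[j] = value / total
    return row


def weight_matrix(samples, reg=1e-3):
    """Assemble W row by row."""
    return [affine_weights(samples, i, reg) for i in range(len(samples))]


# Three collinear points; the middle one is the midpoint of the other two.
W = weight_matrix([(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)])
for row in W:
    print([round(w, 4) for w in row])
```

In this toy data set the midpoint is reconstructed with weights 0.5 and 0.5, while each end point needs a negative weight, which is the effect the NC_i measure counts.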
In another embodiment, the weight expression matrix W can also be calculated by formula (4):
$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \ 1^T w_i = 1. \tag{4}$$
where, for the i-th data sample x_i ∈ R^m in the data set X, the remaining n-1 data samples are first preprocessed by normalization and decoupling to obtain X_i ∈ R^{m×(n-1)}. 1^T is the transpose of the all-ones column vector, w_i is a column vector, and 1^T w_i = 1 is the affine constraint that the elements of w_i sum to 1. Q_i is a distance-inducing matrix, which is a positive-definite diagonal matrix. λ_1 is the balance factor, which balances sparsity (the ||·||_1 sparsity constraint) against the affine reconstruction error $\|X_i w_i\|_2^2$. W is composed of the w_i, each a column vector of size n × 1, with i ranging from 1 to n.
In another embodiment, the weight expression matrix W can also be calculated by formula (5):
$$\min \|w\|_1, \quad \text{s.t.} \ b = Xw \ \text{and} \ 1^T w = 1. \tag{5}$$
where b is the sample to be reconstructed, obtained from a subset of the samples in the known data set X. Which samples of X are selected is determined by w, through the affine constraint 1^T w = 1 and the sparsity constraint ||w||_1. The samples corresponding to the nonzero elements of w are used to reconstruct b, and b = Xw is the expression for the reconstructed sample b. For a given reconstructed sample b and data set X, w^T is one row of the weight expression matrix W; w^T is the transpose of w, and w is a column vector. The elements of w sum to 1.
Calculating the weight expression matrix W reveals the relations among the samples in the data set, such as their positional relations. The invention provides three different embodiments; the method for calculating the weight matrix W can be chosen according to the characteristics of the data set and the computational requirements. Formula (1) is simple, convenient, and efficient to compute. For more complex data sets, formula (4) can be used to calculate the weight expression matrix W. Formula (5) is also simple and convenient to compute.
Step 4: Calculate the reversible unreachable measure index NC_i. For each sample point, NC_i is calculated by formula (2),
$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$
where
$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$
and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^{n×n}. χ(·) is an indicator function: when its argument w_ij is less than or equal to 0, χ(·) takes the value 1; in all other cases it takes the value 0.
For the data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, formula (2) counts the number of elements in the i-th row of the weight expression matrix W ∈ R^{n×n} that are less than or equal to zero.
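The counting in formula (2) can be sketched as follows. The matrix W below is made-up illustration data and `nc_values` is a hypothetical name. Because the condition is w_ij ≤ 0 and the sum runs over all j, a zero diagonal entry is counted too, which adds the same constant to every row.

```python
def nc_values(W):
    """NC_i = number of entries w_ij <= 0 in row i of W, as in formula (2)."""
    return [sum(1 for w in row if w <= 0) for row in W]


# Hypothetical 3 x 3 weight matrix; each row sums to 1.
W = [
    [0.0, 0.6, 0.4],
    [0.7, 0.0, 0.3],
    [1.5, -0.5, 0.0],
]
print(nc_values(W))  # [1, 1, 2]
```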
Compared with existing locally based particular point detection methods, formula (2), which is built on the weight expression matrix W, explicitly considers the global structure of the data set and reveals the non-connectivity between the current data point and the other points. Judging particular points from a global perspective helps improve detection performance. Points whose surrounding density is lower have higher reversible unreachable measure values.
The reversible unreachable measure index NC_i evaluates the degree to which the current observation point is an edge point. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation point. It explicitly considers the global structure of the data rather than only the local neighborhood, and thus further improves the accuracy of particular point detection.
The theoretical basis for calculating the NC_i measure value is briefly introduced below.
As shown in Fig. 5(a), in geometry, all convex combinations of a set of points lie within the convex hull of those points. Formally, a convex combination can be expressed as:
$$\min \Big| p - \sum_{i=1}^{k} w_i x_i \Big|^2, \quad \text{s.t.} \ \sum_{i=1}^{k} w_i = 1, \ w_i \ge 0. \tag{6}$$
Here w_i denotes a component of w, i.e. w_i is the i-th component of the column vector w, with w_i ≥ 0. w represents one column vector in the weight expression matrix W.
Specifically, a convex combination is a special kind of linear combination. w_i are the weight values, x_i (i = 1, 2, …, k) are data samples, $\sum_{i=1}^{k} w_i x_i$ is the linear combination of the k samples x_i, and the minimization reconstructs p by this linear combination in order to determine suitable w_i. $\sum_{i=1}^{k} w_i = 1, w_i \ge 0$ is the convex constraint on w. If formula (6) is satisfied, p is called a convex combination of the data samples x_i (i = 1, 2, …, k). As shown in Fig. 5(a), point p is a convex combination of x_1, x_2, x_3.
As shown in Fig. 5(b), the affine hull of an affine combination is the whole space. Formally, an affine combination can be expressed as:
$$\min \Big| q - \sum_{i=1}^{n} w_i x_i \Big|^2, \quad \text{s.t.} \ \sum_{i=1}^{n} w_i = 1. \tag{7}$$
If formula (7) is satisfied, q is called an affine combination of x_i (i = 1, 2, …, n).
Here w_i are the weight values, x_i (i = 1, 2, …, n) are data samples, $\sum_{i=1}^{n} w_i x_i$ is the linear combination of the n samples x_i, and the minimization reconstructs q by this linear combination in order to determine suitable w_i. $\sum_{i=1}^{n} w_i = 1$ is the affine constraint on w_i; unlike the convex combination, there is no restriction w_i ≥ 0. As shown in Fig. 5(b), point q is an affine combination of x_1, x_2, x_3.
The constraint $\sum_{i=1}^{n} w_i = 1$ restricts the current point to be reconstructed in the space of its neighbors, and the optimized weights are obtained by computing the projection onto that space.
Based on the positions at which negative-valued elements occur in an affine combination, and drawing on the theory of affine combinations within a data set, the reversible unreachable index NC_i is used to judge the degree to which a sample point is a particular point. The negative components of the data representation generally correspond to points that lie nearer the edge in the affine combination of the current observation point.
Theory shows that if 0 ≤ w_i ≤ 1 for every i, point p lies inside the triangle or on its boundary. If any w_i is less than 0 or greater than 1, point p lies outside the triangle. If any w_i equals 0, point p lies on a side of the triangle.
Clearly, if 0 < w_i < 1 for every i, point p lies in the interior of the triangle. In all other cases, point p lies outside the triangle or on its boundary. Assuming that the interior data lie in a convex set, the reversible unreachable NC_i measure value is calculated from the weight expression matrix W according to affine combination theory and is used to judge particular points. Points whose surrounding density is lower have higher reversible unreachable values.
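The triangle argument can be checked with a small worked example. For three points in the plane the affine weights are unique and are the barycentric coordinates of p, so they can be computed directly. The triangle, the test points, and the names `affine_coords` and `locate` below are illustrative assumptions.

```python
def affine_coords(p, a, b, c):
    """Weights (w1, w2, w3) with w1 + w2 + w3 = 1 and w1*a + w2*b + w3*c = p."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w1, w2, 1.0 - w1 - w2


def locate(weights, tol=1e-12):
    """Classify a point from the signs of its affine weights."""
    if any(w < -tol for w in weights):
        return "outside"  # at least one negative weight
    if any(abs(w) <= tol for w in weights):
        return "edge"     # a zero weight and no negative one
    return "inside"       # all weights strictly between 0 and 1


a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
for p in [(1.0, 1.0), (2.0, 0.0), (5.0, 5.0)]:
    w = affine_coords(p, a, b, c)
    print(p, w, locate(w))
```

For (1, 1) the weights are (0.5, 0.25, 0.25), for (2, 0) they are (0.5, 0.5, 0.0), and for (5, 5) they are (-1.5, 1.25, 1.25), so the negative weight appears exactly when the point leaves the triangle.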
According to the above theory, the weight expression matrix W of the data set is obtained from an objective function with an added affine constraint. For each sample point, the number of elements with w_ij ≤ 0 in its row is counted and used as the NC_i measure value.
The objective function is as follows:
Wherein,
Step 5: Sort the n NC_i values obtained in step 4, corresponding to the n data samples, from largest to smallest. The larger the NC_i value, the more likely the point is a particular point, so the points ranked first are the most likely to be outliers, which makes outliers easier to judge.
Step 6: Judge particular points, comprising the following steps:
Step 61: Determine the threshold by formula (3),
$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$
where σ is the influence factor.
Step 62: For each sample x_i, i ∈ {1, 2, …, n}, if NC_i ≥ γ, x_i is judged to be an outlier. Here x_i denotes the i-th data sample in the data set X, NC_i is the measure value obtained in step 4, and γ is the threshold set in step 61.
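The sorting and thresholding steps can be sketched together. The text calls σ an influence factor without saying how it is chosen, so the sketch takes it as a caller-supplied parameter; the NC values and the function name `detect_particular_points` are hypothetical.

```python
import math


def detect_particular_points(nc, sigma):
    """Rank samples by NC_i (descending) and flag those with NC_i >= gamma."""
    gamma = 3.0 / math.sqrt(2.0) * sigma  # formula (3)
    ranked = sorted(range(len(nc)), key=lambda i: nc[i], reverse=True)
    flagged = [i for i in ranked if nc[i] >= gamma]
    return gamma, ranked, flagged


# Hypothetical NC values for six samples; sigma = 1.5 is an arbitrary choice.
nc = [1, 1, 2, 5, 1, 4]
gamma, ranked, flagged = detect_particular_points(nc, sigma=1.5)
print(round(gamma, 4), ranked, flagged)
```

With these inputs the threshold is about 3.182, and the samples at indices 3 and 5 are flagged.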
The particular point detection method of the invention also includes verifying the particular points, as follows:
Step 7: Visualization. When 1 ≤ m ≤ 3, the particular points detected in the data set X ∈ R^{m×n} are marked and displayed in the visualization space, in order to judge whether they lie on the edge of the data set. When m > 3, the data set X ∈ R^{m×n} is first reduced to the visualization space by a dimension-reduction method, and the particular points are then marked and displayed. By visually observing the positions of the marked particular points, one can judge whether the performance of the algorithm has improved. For the visualization method, if the detected particular points lie on the edge of the data set, the proposed detection method is shown to be effective.
The dimension-reduction method can be LLE (Locally Linear Embedding), SMCE (Sparse Manifold Clustering and Embedding), or a similar method.
Alternatively, the feasibility of the particular point detection method can be verified by removing the detected particular points and checking whether the performance of a clustering algorithm improves. If the performance of the clustering algorithm (e.g. k-means, SMCE) improves, the proposed detection method can be applied as an effective data preprocessing method. The higher the performance of the clustering algorithm, the more effective the clustering algorithm is.
To illustrate how well the proposed algorithm detects particular points in a data set, test experiments were run on different data sets.
Fig. 2 shows the result of particular point detection by the method of the invention on the two-dimensional Flame data set. As shown in Fig. 2, the data points marked with diamonds are particular points. It can be seen directly from Fig. 2 that the method of the invention identifies the particular points located at the edge of the data set well.
Fig. 3 shows the result of particular point detection by the method of the invention on the two-dimensional XOR data set. As shown in Fig. 3, the data points marked with diamonds are particular points. It can be seen directly from Fig. 3 that the method of the invention identifies the particular points located at the edge of the data set well and better expresses the outline of the data set.
Fig. 4 shows the result of particular point detection by the method of the invention on the Aggregation data set. As shown in Fig. 4, the data points marked with diamonds are particular points. It can be seen directly from Fig. 4 that the method of the invention identifies the particular points located at the edge of the data set well and better expresses the outline of the data set.
To illustrate the performance of the proposed particular point detection method more objectively, the validity of the method is verified using Top-m recognition accuracy. The experimental data are the BUPA data set, and the experimental results are shown in Table 1:
Table 1
In Table 1, "class" denotes the two different classes in the BUPA data set, "normal points" denotes the normal points detected, and R (%) denotes the recognition rate. The higher R is, the better the recognition accuracy of the algorithm; the lower R is, the poorer the performance of the algorithm.
Top-m recognition accuracy means that the NC_i values are sorted and the points with the top m NC_i values are judged as to whether they are particular points.
It should be noted that the above specific embodiments are exemplary. Those skilled in the art may devise various solutions inspired by the disclosure of the present invention, and these solutions also belong to the scope of the disclosure and fall within the scope of protection of the invention. Those skilled in the art will understand that the description and its drawings are illustrative and do not limit the claims. The scope of protection of the invention is defined by the claims and their equivalents.
Claims (4)
1. A particular point detection method based on data representation, characterized by comprising the following steps:
Step 1: Input the data set X ∈ R^{m×n}, where X is an m × n data matrix and each column of the data set X represents one data sample, i.e. X contains n samples and each sample has m dimensions; data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, m denotes the sample dimension, and n denotes the number of samples in the data set X;
Step 2: Calculate the weight expression matrix W ∈ R^{n×n} of the data set X, where W is calculated using formula (1),
$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \ \sum_{j} w_{ij} = 1. \tag{1}$$
where x_i denotes the i-th data sample in the data set X, x_j denotes the j-th data sample in the data set, w_ij denotes the connection relation between the i-th data sample and the j-th data sample, and the weight expression matrix W is an n × n matrix;
Step 3: Calculate the reversible unreachable index NC_i measure value; for the data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, the NC_i measure value is calculated by formula (2),
$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$
where
$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$
and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^{n×n};
Step 4: Sort the n NC_i measure values obtained in step 3, corresponding to the n data samples, from largest to smallest;
Step 5: Judge particular points, comprising the following steps:
Step 51: Determine the threshold γ by formula (3),
$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$
where σ is the influence factor;
Step 52: For each data sample x_i, i ∈ {1, 2, …, n}, when NC_i ≥ γ, x_i is judged to be an outlier.
2. The particular point detection method as claimed in claim 1, characterized in that the method further comprises calculating the weight expression matrix W using formula (4),
$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \ 1^T w_i = 1. \tag{4}$$
where, for the i-th data sample x_i ∈ R^m in the data set X, the remaining n-1 data samples are first normalized
and decoupled in a preprocessing operation, yielding X_i ∈ R^{m×(n-1)}; w_i is a column vector whose elements are required to sum to 1;
Q_i is a distance-induced matrix, which is a positive definite diagonal matrix; λ_1 is a balance factor used to balance sparsity against
the affine reconstruction error.
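The construction of X_i and the objective in formula (4) can be sketched as below. This is an illustrative sketch only: it builds the normalized difference vectors and evaluates the objective for a given candidate w_i, and it does not solve the minimization. The function names, the diagonal entries chosen for Q_i, and the sample data are assumptions made for the example, since the claim does not state how Q_i is constructed.

```python
import math


def direction_matrix(samples, i):
    """Columns of X_i: unit vectors (x_j - x_i) / ||x_j - x_i|| for j != i.

    Returned as a list of n-1 columns, each a list of length m.
    """
    xi = samples[i]
    columns = []
    for j, xj in enumerate(samples):
        if j == i:
            continue
        diff = [a - b for a, b in zip(xj, xi)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            raise ValueError("duplicate sample: direction is undefined")
        columns.append([d / norm for d in diff])
    return columns


def objective(columns, w, q_diag, lam):
    """lam * ||Q_i w||_1 + 0.5 * ||X_i w||_2^2 for a diagonal Q_i."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("w must satisfy the affine constraint 1^T w = 1")
    l1_term = sum(abs(q * wk) for q, wk in zip(q_diag, w))
    m = len(columns[0])
    residual = [sum(col[r] * wk for col, wk in zip(columns, w)) for r in range(m)]
    return lam * l1_term + 0.5 * sum(r * r for r in residual)


# Hypothetical 2-D data; sample 0 sits between samples 1 and 2.
samples = [(0, 0), (1, 0), (-1, 0), (0, 2)]
columns = direction_matrix(samples, 0)
# Q_i diagonal assumed here to hold the distances ||x_j - x_i||.
value = objective(columns, w=[0.5, 0.5, 0.0], q_diag=[1.0, 1.0, 2.0], lam=0.1)
```

In this example the two opposite unit directions cancel, so the reconstruction term is zero and the objective reduces to the weighted sparsity term.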
3. The particular point detection method as claimed in claim 1, characterized in that the method further comprises
calculating the weight representation matrix W using formula (5),
$$
\min \|w\|_1, \quad \text{s.t.} \quad b = Xw \ \text{and} \ \mathbf{1}^T w = 1. \tag{5}
$$
where b is the reconstructed sample, which is reconstructed from a subset of the samples in the known data set X; w is determined by
the affine constraint 1^T w = 1 and the sparsity constraint ||w||_1; w is a column vector, and the samples corresponding to its
nonzero elements are used to reconstruct the sample b.
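As a worked note on formula (5), the problem can be rewritten as a linear program by the standard splitting of w into nonnegative parts. This reformulation is a well-known technique for L1 minimization and is offered as an assumption about how the problem might be solved; the patent text here does not specify a solver, and the auxiliary variables u and v are introduced only for illustration.

```latex
% Split w into nonnegative parts: w = u - v, with u >= 0 and v >= 0.
\begin{aligned}
\min_{u,\,v} \quad & \mathbf{1}^T (u + v) \\
\text{s.t.} \quad & X (u - v) = b, \\
                  & \mathbf{1}^T (u - v) = 1, \\
                  & u \ge 0, \quad v \ge 0.
\end{aligned}
% At an optimum, u and v have disjoint supports, so
% \mathbf{1}^T (u + v) = \|u - v\|_1 = \|w\|_1.
```

Any linear programming routine that accepts equality constraints and nonnegativity bounds could then be applied to recover w = u - v.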
4. The particular point detection method as claimed in claim 2 or 3, characterized in that the method further comprises marking
the particular points, with the following step:
Step 6: Visualization. When 1 ≤ m ≤ 3, mark and display the determined particular points of the data set X ∈ R^{m×n} in the
visualization space, in order to judge whether the particular points lie on the edge of the data set; when m > 3, first reduce the data set
X ∈ R^{m×n} to the visualization space using a dimension reduction method, then mark and display the particular points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710548770.3A CN107341515A (en) | 2017-07-07 | 2017-07-07 | A kind of particular point detection method based on data representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710548770.3A CN107341515A (en) | 2017-07-07 | 2017-07-07 | A kind of particular point detection method based on data representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107341515A true CN107341515A (en) | 2017-11-10 |
Family
ID=60219075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710548770.3A Pending CN107341515A (en) | 2017-07-07 | 2017-07-07 | A kind of particular point detection method based on data representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107341515A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921202A (en) * | 2018-06-12 | 2018-11-30 | 成都信息工程大学 | A kind of abnormal point detecting method based on data structure |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109947086B (en) | Mechanical fault migration diagnosis method and system based on counterstudy | |
Li et al. | Localizing and quantifying damage in social media images | |
CN111368690B (en) | Deep learning-based video image ship detection method and system under influence of sea waves | |
CN101520847B (en) | Pattern identification device and method | |
CN109658387A (en) | The detection method of the pantograph carbon slide defect of power train | |
CN113763312B (en) | Detection of defects in semiconductor samples using weak labels | |
CN102693452A (en) | Multiple-model soft-measuring method based on semi-supervised regression learning | |
CN103197299A (en) | Extraction and quantitative analysis system of weather radar radial wind information | |
CN110826642B (en) | Unsupervised anomaly detection method for sensor data | |
CN107016416B (en) | Data classification prediction method based on neighborhood rough set and PCA fusion | |
Zhang et al. | Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns | |
CN106991049A (en) | A kind of Software Defects Predict Methods and forecasting system | |
CN113591948A (en) | Defect pattern recognition method and device, electronic equipment and storage medium | |
CN112766301A (en) | Similarity judgment method for indicator diagram of oil extraction machine | |
CN117789038B (en) | Training method of data processing and recognition model based on machine learning | |
CN107341514A (en) | A kind of abnormity point and endpoint detections method based on joint density and angle | |
CN116842459B (en) | Electric energy metering fault diagnosis method and diagnosis terminal based on small sample learning | |
CN116720109B (en) | FPGA-based improved local linear embedded fan bearing fault diagnosis method | |
Rieger et al. | Aggregating explainability methods for neural networks stabilizes explanations | |
CN103093239B (en) | A kind of merged point to neighborhood information build drawing method | |
CN107341515A (en) | A kind of particular point detection method based on data representation | |
Qu et al. | Boundary detection using a Bayesian hierarchical model for multiscale spatial data | |
CN104462826B (en) | The detection of multisensor evidences conflict and measure based on Singular Value Decomposition Using | |
Li et al. | Semantic‐Segmentation‐Based Rail Fastener State Recognition Algorithm | |
CN116719714A (en) | Training method and corresponding device for screening model of test case |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171110 |