CN107341515A - A special point detection method based on data representation


Info

Publication number
CN107341515A
Authority
CN (China)
Application number
CN201710548770.3A
Other languages
Chinese (zh)
Inventors
李孝杰
吕建成
吴锡
周激流
史沧红
Current assignee
Chengdu University of Information Technology
Original assignee
Chengdu University of Information Technology
Legal status
Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques


Abstract

The present invention relates to a special point detection method based on data representation. Exploiting the positions at which negative elements appear in affine combinations, and combining this with affine combination theory for data sets, the method uses a reversible unreachability index NC_i to judge the degree to which a sample point is a special point, and determines special points automatically by setting a threshold γ. This improves the accuracy and speed of special point detection, better reflects the structural characteristics of the data set, and detects outliers and boundary points simultaneously. In addition, the method is only weakly affected by data distribution and data dimension, so it has a wide range of practical applications. It addresses the limited detection accuracy of the prior art for special points and its poor performance on high-dimensional data.

Description

A special point detection method based on data representation
Technical field
The invention belongs to the field of data mining, and in particular relates to a special point detection method based on data representation.
Background art
Traditional techniques such as clustering, classification and pattern recognition aim to find common patterns, whereas special point detection, which covers boundary points and outliers, is generally used to identify patterns in data that are valid, interesting and potentially valuable. Compared with detecting normal points, detecting special points is usually a more meaningful task. Moreover, most algorithms are affected by outliers; for example, the well-known nonlinear manifold-learning dimensionality-reduction algorithm Isomap does not itself address outlier detection. Correctly detecting outliers in high-dimensional spaces is therefore a pressing practical problem and an important task in data preprocessing.
Researchers in China and abroad have proposed various detection algorithms from different technical standpoints. By the way they judge outliers, methods can be divided into global models and local models. A global model makes a binary judgment for every observation, deciding whether the current observation is an outlier, whereas a local model typically assigns each observation a score (such as an angle-variance factor) that estimates the degree to which the point is an outlier. Depending on whether data label information is used, methods can be divided into supervised, semi-supervised and unsupervised algorithms. At present, most methods rely on five classes of shallow techniques: statistics-based, distance-based, neighbor- or density-based, clustering-based and deviation-based.
Prior art:
Statistics-based methods were among the earliest outlier detection methods. They generally assume that the given data follow some distribution or probabilistic model and use a discordancy test to decide whether data are abnormal. Examples include the 3σ rule for Gaussian-distributed data, Grubbs' test for normally distributed data, and χ²-based methods, which are widely used to analyze data dispersion. However, most statistics-based methods are designed for univariate data sets with known distributions; in real settings, data whose distribution is unknown and whose dimension is high are hard to judge. Particularly in the era of massive high-dimensional data, such methods cannot achieve the best detection performance.
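The 3σ rule mentioned above can be illustrated with a short sketch. It assumes univariate data and uses the sample mean and population standard deviation; the function name `three_sigma_outliers` and the example readings are hypothetical, made up for illustration.

```python
import numpy as np

def three_sigma_outliers(values):
    """Flag values lying more than 3 standard deviations from the mean (univariate 3-sigma rule)."""
    v = np.asarray(values, dtype=float)
    mu, sd = v.mean(), v.std()
    return np.abs(v - mu) > 3.0 * sd

# Nineteen typical readings and one extreme reading (made-up data).
data = [1.0] * 19 + [100.0]
print(three_sigma_outliers(data))
```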
Distance-based methods are currently the most commonly used, because their geometric meaning is clear. They use distance as the measure and declare points that lack "enough" neighbors to be outliers, generalizing the discordancy-test idea of statistical methods. A classical example is the distance-based DB(ε, d) outlier method proposed by Knorr and Ng: if at least ε × 100% of the points in the data set lie at a distance greater than d from the current point, the current point is judged an outlier. This method depends heavily on the parameters ε and d; on the same data set, different parameter values can produce very different performance. Later distance-based neighbor methods suffer from similar parameter problems. In particular, in high-dimensional spaces the separability of distance-based measures is poor, as expressed by the following ratio:
$$\frac{dist_{\max}-dist_{\min}}{dist_{\min}}$$
Here m is the data dimension, and dist_max and dist_min are the distances from the current point to its farthest and nearest neighbors, respectively. As m increases, this ratio tends to zero, so distance measures are more or less unsuited to the high-dimensional data that are now commonplace.
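The concentration effect described above can be observed numerically. The sketch below is an illustration, not part of the cited work: it draws uniform random points in [0, 1]^m and computes (dist_max - dist_min)/dist_min for one random query point. The function name `relative_contrast`, the sample size and the uniform distribution are assumptions.

```python
import numpy as np

def relative_contrast(m, n=500, seed=0):
    """(dist_max - dist_min) / dist_min from one random query to n uniform points in [0, 1]^m."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, m))
    query = rng.random(m)
    d = np.linalg.norm(points - query, axis=1)   # Euclidean distances to every point
    return (d.max() - d.min()) / d.min()

for m in (2, 10, 100, 1000):
    print(m, round(relative_contrast(m), 3))
```

The printed ratio should shrink markedly as m grows, which is the loss of separability the text refers to.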
Existing outlier detection algorithms work well under specific conditions or in specific domains, or on relatively low-dimensional spaces; when the data dimension is high, their performance is unsatisfactory and their generalization ability is weak. Research on outlier detection in high-dimensional spaces is still at an early stage. For example, Kriegel proposed the angle-based outlier detection algorithm (ABOD), which does not depend on parameter selection. However, ABOD considers only the relations between the current point and its neighbors, not the relations among those neighbors, which can cause it to identify outliers incorrectly. Outlier detection algorithms for high-dimensional spaces therefore need more in-depth study.
Therefore, how to further improve the accuracy and efficiency of outlier detection has become an urgent problem in the field of data mining.
Summary of the invention
To address the shortcomings of the prior art, the present invention proposes a special point detection method based on data representation, comprising the following steps:
Step 1: Input a data set X ∈ R^{m×n}, where X is an m × n data matrix and each column of X represents one data sample; that is, X contains n samples, each of dimension m, with data samples x_i ∈ R^m, i ∈ {1, 2, …, n}; m is the sample dimension and n is the number of samples in X;
Step 2: Compute the weight representation matrix W ∈ R^{n×n} of the data set X using formula (1):
$$\min\sum_{i=1}^{n}\Big\|x_i-\sum_{j}w_{ij}x_j\Big\|^2,\quad \text{s.t.}\ \sum_{j}w_{ij}=1, \tag{1}$$
where x_i is the i-th data sample in X, x_j is the j-th data sample, w_ij represents the connection between the i-th and j-th data samples, and W is an n × n matrix;
Step 3: Compute the reversible unreachability index NC_i: for each data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, compute NC_i by formula (2):
$$NC_i=\sum_{j}\chi(w_{ij}),\quad j\in\{1,2,\ldots,n\}, \tag{2}$$
where
$$\chi(w_{ij})=\begin{cases}1, & w_{ij}\le 0,\\ 0, & w_{ij}>0,\end{cases}$$
and w_ij is the element in row i, column j of W ∈ R^{n×n};
Step 4: Sort the n NC_i values obtained in Step 3 for the n data samples in descending order;
Step 5: Judge special points, comprising the following steps:
Step 51: Determine the threshold γ by formula (3):
$$\gamma=\frac{3}{\sqrt{2}}\,\sigma, \tag{3}$$
where σ is an influence factor;
Step 52: For each data sample x_i, i ∈ {1, 2, …, n}, when NC_i ≥ γ, x_i is judged to be an outlier.
According to a preferred embodiment, the method further comprises computing the weight representation matrix W using formula (4):
$$X_i=\left[\frac{x_1-x_i}{\|x_1-x_i\|},\ldots,\frac{x_n-x_i}{\|x_n-x_i\|}\right]\in\mathbb{R}^{m\times(n-1)},\qquad \min\ \lambda_1\|Q_i w_i\|_1+\frac{1}{2}\|X_i w_i\|_2^2,\quad \text{s.t.}\ \mathbf{1}^T w_i=1, \tag{4}$$
where, for the i-th data sample x_i ∈ R^m in X, the remaining n - 1 data samples are first centered on x_i and normalized, giving X_i ∈ R^{m×(n-1)}; w_i is a column vector whose elements are required to sum to 1; Q_i is a distance-induced matrix, namely a positive-definite diagonal matrix; and λ_1 is a balance factor that trades off sparsity against affine reconstruction error.
According to a preferred embodiment, the method further comprises computing the weight representation matrix W using formula (5):
$$\min\|w\|_1,\quad \text{s.t.}\ b=Xw\ \text{and}\ \mathbf{1}^T w=1, \tag{5}$$
where b is a reconstructed sample obtained from some of the samples in the known data set X; w is a column vector determined by the affine constraint 1ᵀw = 1 and the sparsity constraint ‖w‖_1, and the samples corresponding to the nonzero elements of w are used to reconstruct b.
According to a preferred embodiment, the method further comprises labeling the special points, as follows:
Step 6: Visualization. When 1 ≤ m ≤ 3, the special points determined for the data set X ∈ R^{m×n} are marked and displayed in the visualization space, to check whether they lie on the boundary of the data set; when m > 3, X ∈ R^{m×n} is first reduced to the visualization space by a dimensionality-reduction method, and the special points are then marked and displayed.
The beneficial effects of the technical solution of the present invention are:
1. The outlier detection method based on data representation discovers the intrinsic structure of high-dimensional data through data representation and reveals the features relevant to anomaly detection contained in that representation, generally called abnormal features. The algorithm better reflects the structural features of the data set, detects outliers and boundary points simultaneously, and is applicable to data of unknown distribution.
2. The invention proposes the reversible unreachability index NC_i to evaluate the degree to which the current observation is a boundary point. The index considers the global structure of the data rather than only the local neighborhood, and special points are determined automatically by setting a threshold. This improves the accuracy and speed of special point detection, better reflects the structural features of the data set, and meets the needs of outlier detection in practical application environments.
3. The invention is only weakly affected by data distribution and data dimension, so it is equally applicable to high-dimensional data and has a wide range of practical applications.
Brief description of the drawings
Fig. 1 is a flow chart of the special point detection method of the present invention;
Fig. 2 shows the special points judged by the method of the invention on the two-dimensional Flame data set;
Fig. 3 shows the special points judged by the method of the invention on the two-dimensional XOR data set;
Fig. 4 shows the special points judged by the method of the invention on the Aggregation data set;
Fig. 5(a) is an example of a convex combination; and
Fig. 5(b) is an example of an affine combination.
Detailed description of embodiments
A detailed description is given below with reference to the accompanying drawings.
In the present invention, the NC_i value refers to the reversible unreachability index, which evaluates the degree to which the current observation belongs to the boundary. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation.
Special points in the present invention include outliers and boundary points.
To address the shortcomings of existing algorithms, the present invention exploits the positions at which negative elements appear in affine combinations and, combining this with affine combination theory for data sets, uses the NC_i value to judge the degree to which a sample point is a special point. Special points are determined automatically by setting a threshold γ, which improves the accuracy and speed of special point detection, better reflects the structural features of the data set, and detects outliers and boundary points simultaneously. Moreover, the invention is only weakly affected by data distribution and data dimension and has a wide range of practical applications.
Fig. 1 is the flow chart of the special point detection method of the invention. As shown in Fig. 1, the proposed special point detection method based on data representation comprises the following steps:
Step 1: Input a data set X ∈ R^{m×n}, where X is an m × n data matrix in which each column represents one data sample; that is, X contains n samples, each of dimension m, x_i ∈ R^m, i ∈ {1, 2, …, n}; m is the sample dimension and n is the number of samples.
For example, ten images of size 20 × 20 can each be flattened into a 400 × 1 column vector; the ten images then form the data set X ∈ R^{400×10}, i.e. X is a 400 × 10 matrix.
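The flattening in this example can be sketched as follows. The random arrays stand in for real images, and the helper name `images_to_data_matrix` is hypothetical.

```python
import numpy as np

def images_to_data_matrix(images):
    """Stack images (each h x w) as the columns of an m x n matrix X, with m = h * w."""
    return np.stack([np.asarray(img, dtype=float).reshape(-1) for img in images], axis=1)

rng = np.random.default_rng(0)
images = [rng.random((20, 20)) for _ in range(10)]  # ten placeholder 20 x 20 "images"
X = images_to_data_matrix(images)
print(X.shape)  # (400, 10)
```

Each column of `X` is one flattened image, matching the column-per-sample convention of Step 1.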
Step 2: Compute the weight representation matrix W ∈ R^{n×n} of the data set using formula (1):
$$\min\sum_{i=1}^{n}\Big\|x_i-\sum_{j}w_{ij}x_j\Big\|^2,\quad \text{s.t.}\ \sum_{j}w_{ij}=1, \tag{1}$$
where x_i and x_j are the i-th and j-th data samples in the data set, and w_ij represents the connection between the i-th and j-th data samples.
W is an n × n matrix because there are n samples: each sample has a distance to each of the other n - 1 samples, i.e. n - 1 distance values, and for convenience its distance to itself is recorded as 0. Each sample therefore has n values, and the n samples together give n × n values, forming an n × n matrix.
For the i-th sample x_i, Σ_j w_ij = 1 constrains the i-th row of W to sum to 1; Σ_j w_ij x_j is the linear combination used to reconstruct x_i; ‖x_i - Σ_j w_ij x_j‖² is the reconstruction error of x_i; and Σ_{i=1}^{n} ‖x_i - Σ_j w_ij x_j‖² accumulates the reconstruction errors of the n samples.
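A minimal sketch of formula (1), solved independently for each row of W. Two assumptions are added that the text does not state: the diagonal entry w_ii is fixed to 0 (otherwise w_ii = 1 reconstructs x_i trivially), and, as in locally linear embedding, a small ridge term is added to the Gram matrix because the problem is underdetermined when n - 1 > m. The function name and the `reg` parameter are hypothetical.

```python
import numpy as np

def affine_weights(X, reg=1e-3):
    """Row-wise sketch of formula (1): min ||x_i - sum_j w_ij x_j||^2 s.t. sum_j w_ij = 1.

    X is m x n with one sample per column. Assumptions not stated in the text:
    w_ii is fixed to 0, and a ridge term reg * trace(C) * I is added to the
    Gram matrix C (as in LLE) because C is singular whenever n - 1 > m.
    """
    X = np.asarray(X, dtype=float)
    _, n = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # With sum_j w_ij = 1, x_i - sum_j w_ij x_j = -D w, where D has columns x_j - x_i.
        D = X[:, others] - X[:, [i]]
        C = D.T @ D
        C += reg * max(np.trace(C), 1e-12) * np.eye(n - 1)
        # Minimizing w^T C w subject to 1^T w = 1 gives w proportional to C^{-1} 1.
        w = np.linalg.solve(C, np.ones(n - 1))
        W[i, others] = w / w.sum()   # rescale so the affine constraint holds exactly
    return W
```

Entries of the resulting W that are less than or equal to 0 are what formula (2) counts.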
In another embodiment, the weight representation matrix W can also be computed by formula (4):
$$X_i=\left[\frac{x_1-x_i}{\|x_1-x_i\|},\ldots,\frac{x_n-x_i}{\|x_n-x_i\|}\right]\in\mathbb{R}^{m\times(n-1)},\qquad \min\ \lambda_1\|Q_i w_i\|_1+\frac{1}{2}\|X_i w_i\|_2^2,\quad \text{s.t.}\ \mathbf{1}^T w_i=1, \tag{4}$$
where, for the i-th data sample x_i ∈ R^m in X, the remaining n - 1 data samples are first centered on x_i and normalized, giving X_i ∈ R^{m×(n-1)}. 1 is the all-ones column vector and 1ᵀ its transpose; w_i is a column vector, and 1ᵀw_i = 1 is the affine constraint requiring the elements of w_i to sum to 1. Q_i is the distance-induced matrix, a positive-definite diagonal matrix. λ_1 is a balance factor that trades off sparsity (the ℓ1 penalty ‖Q_i w_i‖_1) against the affine reconstruction error ½‖X_i w_i‖₂². W is assembled from the vectors w_i, with i ranging from 1 to n; since X_i has n - 1 columns, each w_i has one entry per remaining sample.
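Formula (4) combines a weighted ℓ1 penalty with an affine equality constraint. The text does not say how it is solved; the sketch below uses ADMM (alternating direction method of multipliers), which is our choice, not the patent's. It also assumes Q_i = diag(q) with q_j = ‖x_j - x_i‖ / Σ_t ‖x_t - x_i‖, the distance-proportional weighting used in sparse manifold clustering and embedding (SMCE); the text only says Q_i is a positive-definite diagonal distance-induced matrix. The function and parameter names (`lam` for λ_1, `rho`, `iters`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise proximal operator of t * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_affine_weights(X, i, lam=1.0, rho=1.0, iters=500):
    """ADMM sketch of formula (4) for sample i:
    min lam * ||Q w||_1 + 0.5 * ||X_i w||_2^2  s.t.  1^T w = 1.

    Assumes Q = diag(q), q_j = ||x_j - x_i|| / sum_t ||x_t - x_i|| (SMCE-style),
    and that no other sample coincides with x_i (columns are divided by distances).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    others = [j for j in range(n) if j != i]
    diffs = X[:, others] - X[:, [i]]
    dists = np.linalg.norm(diffs, axis=0)
    Xi = diffs / dists                      # columns (x_j - x_i) / ||x_j - x_i||
    q = dists / dists.sum()                 # assumed diagonal of Q_i
    k = len(others)
    ones = np.ones(k)
    M_inv = np.linalg.inv(Xi.T @ Xi + rho * np.eye(k))
    z = np.full(k, 1.0 / k)
    u = np.zeros(k)
    w = z.copy()
    for _ in range(iters):
        # w-update: equality-constrained least squares, closed form from the KKT system
        r = rho * (z - u)
        nu = (ones @ M_inv @ r - 1.0) / (ones @ M_inv @ ones)
        w = M_inv @ (r - nu * ones)
        # z-update: weighted soft-thresholding, the proximal step of lam * ||Q z||_1
        z = soft_threshold(w + u, lam * q / rho)
        # scaled dual update
        u += w - z
    return others, w, z
```

The returned `w` satisfies 1ᵀw = 1 exactly, while `z` carries the exact zeros produced by the ℓ1 step; at convergence the two agree, so either could be written into row i of W (with w_ii = 0, another assumption) before counting with formula (2).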
In another embodiment, the weight representation matrix W can also be computed by formula (5):
$$\min\|w\|_1,\quad \text{s.t.}\ b=Xw\ \text{and}\ \mathbf{1}^T w=1, \tag{5}$$
where b is a reconstructed sample obtained from some of the samples in the given data set X. The samples of X to be used are selected, and w is then determined by the affine constraint 1ᵀw = 1 and the sparsity constraint ‖w‖_1. The samples corresponding to the nonzero elements of w are used to reconstruct b, and b = Xw is the expression of the reconstructed sample b. For a reconstructed sample b and data set X, wᵀ is one row of the weight representation matrix W, i.e. the transpose of the column vector w, and the elements of w sum to 1.
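Formula (5) is an ℓ1-minimization problem of the basis-pursuit type with an extra affine row. One standard way to solve it, which the text does not specify, is to split w into nonnegative parts and solve a linear program. This sketch assumes b's own column is left out of X, since otherwise the unit vector selecting b is a trivial solution.

```latex
w = p - q,\quad p \ge 0,\ q \ge 0,\qquad
\min_{p,\,q}\ \mathbf{1}^T (p + q)
\quad \text{s.t.}\quad
\begin{bmatrix} X \\ \mathbf{1}^T \end{bmatrix} (p - q)
= \begin{bmatrix} b \\ 1 \end{bmatrix}
```

At the optimum p_j q_j = 0 for every j (otherwise subtracting min(p_j, q_j) from both lowers the objective without changing p - q), so 1ᵀ(p + q) equals ‖w‖_1 and any linear-programming solver can be used.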
Computing the weight representation matrix W reveals the relations, such as positional relations, among the samples in the data set. The invention provides three embodiments, and the method for computing W can be chosen according to the characteristics of the data set and the computational requirements. Formula (1) is simple to compute and efficient; for more complex data sets, formula (4) can be used; formula (5) is also simple and convenient.
Step 4: Compute the reversible unreachability index NC_i: for each sample point, NC_i is computed by formula (2):
$$NC_i=\sum_{j}\chi(w_{ij}),\quad j\in\{1,2,\ldots,n\}, \tag{2}$$
where
$$\chi(w_{ij})=\begin{cases}1, & w_{ij}\le 0,\\ 0, & w_{ij}>0,\end{cases}$$
and w_ij is the element in row i, column j of W ∈ R^{n×n}. χ(·) is an indicator function: when its argument w_ij is less than or equal to 0, χ takes the value 1; otherwise it takes the value 0.
For a data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, formula (2) counts the elements in row i of W ∈ R^{n×n} that are less than or equal to zero.
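Formula (2) is a row-wise count and can be sketched in a few lines. Formula (2) sums over all j, including j = i; if an implementation fixes w_ii = 0 to exclude self-reconstruction, every row receives the same +1 from its diagonal, which does not change the ranking but does shift the comparison with γ. The `include_diagonal` switch and the function name are hypothetical conveniences.

```python
import numpy as np

def nc_scores(W, include_diagonal=True):
    """Formula (2): NC_i = number of entries in row i of W that are <= 0."""
    chi = (np.asarray(W) <= 0).astype(int)   # chi(w_ij) = 1 if w_ij <= 0 else 0
    if not include_diagonal:
        np.fill_diagonal(chi, 0)             # ignore the j = i term
    return chi.sum(axis=1)
```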
Compared with existing local special point detection methods, formula (2), being based on the weight representation matrix W, explicitly considers the global structure of the data set and reveals the non-connectivity between the current data point and the other points. It judges special points from a global perspective, which is more conducive to improving detection performance. Points in regions of lower density have higher reversible unreachability values.
The reversible unreachability index NC_i evaluates the degree to which the current observation belongs to the boundary points. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation. It explicitly considers the global structure of the data rather than only the local neighborhood, which further improves the accuracy of special point detection.
The theoretical basis for computing NC_i is briefly introduced below:
As shown in Fig. 5(a), in geometry the convex hull of a set of points consists of all their convex combinations. Formally, a convex combination can be expressed as:
$$p=\sum_{i=1}^{k}w_i x_i,\quad \sum_{i=1}^{k}w_i=1,\ w_i\ge 0. \tag{6}$$
Here w_i denotes the i-th element of the column vector w, with w_i ≥ 0; w represents a column vector of the weight representation matrix W.
Specifically, a convex combination is a special linear combination: w_i are specific weight values, x_i (i = 1, 2, …, k) are data samples, Σ_{i=1}^{k} w_i x_i is a linear combination of the k samples x_i, and p = Σ_{i=1}^{k} w_i x_i reconstructs p from this linear combination, which determines suitable w_i.
Σ_{i=1}^{k} w_i = 1 with w_i ≥ 0 is the convex constraint on w. If formula (6) is satisfied, p is called a convex combination of the data samples x_i (i = 1, 2, …, k). As shown in Fig. 5(a), point p is a convex combination of x_1, x_2, x_3.
As shown in Fig. 5(b), the affine hull formed by affine combinations is the whole space. Formally, an affine combination can be expressed as:
$$q=\sum_{i=1}^{n}w_i x_i,\quad \sum_{i=1}^{n}w_i=1. \tag{7}$$
If formula (7) is satisfied, q is called an affine combination of x_i (i = 1, 2, …, n).
Here w_i are specific weight values, x_i (i = 1, 2, …, n) are data samples, and Σ_{i=1}^{n} w_i x_i is a linear combination of the n samples; the goal is to reconstruct q from this linear combination, which determines suitable w_i. Σ_{i=1}^{n} w_i = 1 is the affine constraint on w_i; unlike a convex combination, it does not require w_i ≥ 0. As shown in Fig. 5(b), point q is an affine combination of x_1, x_2, x_3.
The constraint Σ_j w_ij = 1 restricts the current point to be reconstructed within the affine space spanned by its neighbors, and the optimized weights are obtained from the projection onto that space.
Based on the positions at which negative elements appear in affine combinations, and combining this with affine combination theory for data sets, the reversible unreachability index NC_i is used to judge the degree to which a sample point is a special point. The negative components of the data representation usually correspond to points that lie closer to the boundary in the affine combination of the current observation.
Theory shows that if 0 ≤ w_i ≤ 1 for all i, point p lies inside the triangle or on its edges. If any w_i is less than 0 or greater than 1, p lies outside the triangle. If some w_i equals 0 and the others lie between 0 and 1, p lies on an edge of the triangle.
Clearly, if 0 < w_i < 1 for all i, p lies strictly inside the triangle; otherwise p lies outside the triangle or on its boundary. Assuming the interior data lie in a convex set, the reversible unreachability value NC_i is computed from the weight representation matrix W according to affine combination theory and is used to judge special points. Points in regions of lower density have higher reversible unreachability values.
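The triangle argument can be checked directly by solving for the affine weights of a 2-D point with respect to three vertices. The vertices and query points below are made up for illustration, and the function name is hypothetical.

```python
import numpy as np

def affine_coordinates(p, x1, x2, x3):
    """Weights w with w1*x1 + w2*x2 + w3*x3 = p and w1 + w2 + w3 = 1 (2-D points)."""
    A = np.array([[x1[0], x2[0], x3[0]],
                  [x1[1], x2[1], x3[1]],
                  [1.0,   1.0,   1.0]])     # last row encodes the affine constraint
    return np.linalg.solve(A, np.array([p[0], p[1], 1.0]))

x1, x2, x3 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(affine_coordinates((0.25, 0.25), x1, x2, x3))  # inside: every weight lies in (0, 1)
print(affine_coordinates((1.0, 1.0), x1, x2, x3))    # outside: the weight on x1 is negative
```

The point outside the triangle gets weights (-1, 1, 1), illustrating why negative entries in a point's representation signal a position toward the boundary.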
Based on the above theory, the weight representation matrix W of the data set is obtained from an objective function with the affine constraint added. For each sample point, the number of elements with w_ij ≤ 0 in its row is counted as its NC_i value.
The objective function is:
$$\min\sum_{i=1}^{n}\Big\|x_i-\sum_{j}w_{ij}x_j\Big\|^2,\quad \text{s.t.}\ \sum_{j}w_{ij}=1.$$
Step 5: Sort the n NC_i values obtained in Step 4 for the n data samples in descending order. Because a larger NC_i value means a higher likelihood that the point is a special point, the points ranked first are the most likely outliers, which makes outliers easier to judge.
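Step 5 is a descending sort of the NC_i values. A minimal sketch, assuming ties are kept in original sample order (the text does not specify tie handling); the function name is hypothetical.

```python
import numpy as np

def rank_by_nc(nc):
    """Return sample indices sorted by NC_i in descending order (most suspicious first)."""
    nc = np.asarray(nc)
    return np.argsort(-nc, kind="stable")   # stable sort keeps ties in index order
```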
Step 6: Judge special points, comprising:
Step 61: Determine the threshold by formula (3):
$$\gamma=\frac{3}{\sqrt{2}}\,\sigma, \tag{3}$$
where σ is an influence factor.
Step 62: For each sample x_i, i ∈ {1, 2, …, n}, if NC_i ≥ γ, x_i is judged to be an outlier. Here x_i is the i-th data sample in X, NC_i is the value obtained in Step 4, and γ is the threshold set in Step 61.
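Steps 61 and 62 amount to a fixed cut on the NC_i values. The text does not explain how σ is chosen, so this sketch takes it as a caller-supplied number; the function name is hypothetical.

```python
import math
import numpy as np

def flag_special_points(nc, sigma):
    """Formula (3) and Step 62: gamma = 3 / sqrt(2) * sigma; flag x_i when NC_i >= gamma."""
    gamma = 3.0 / math.sqrt(2.0) * sigma
    return gamma, np.asarray(nc) >= gamma
```

For example, `flag_special_points([1, 5, 2, 8], sigma=2.0)` gives γ ≈ 4.24 and flags the second and fourth samples.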
The special point detection method of the invention further includes verifying the special points, as follows:
Step 7: Visualization. When 1 ≤ m ≤ 3, the special points determined for the data set X ∈ R^{m×n} are marked and displayed in the visualization space, to check whether they lie on the boundary of the data set. When m > 3, X ∈ R^{m×n} is first reduced to the visualization space by a dimensionality-reduction method, and the special points are then marked and displayed. Visual inspection of the marked special points shows whether the algorithm's position judgments are sound: with this visualization approach, if the detected special points lie on the boundary of the data set, the proposed detection method is effective.
The dimensionality-reduction method can be LLE (Locally Linear Embedding), SMCE (Sparse Manifold Clustering and Embedding), or similar.
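The text names LLE and SMCE for reducing m > 3 data to a displayable space. The sketch below substitutes plain PCA, computed with an SVD, purely to keep the example short and self-contained; it is not the method the text specifies, and the function name is hypothetical. It returns display coordinates plus the boolean special-point mask, which can then be passed to any plotting library (for example, drawing flagged points as diamonds, as in Figs. 2 to 4).

```python
import numpy as np

def project_for_display(X, flags, dims=2):
    """Return (coords, mask) for plotting special points.

    X is m x n with samples as columns. PCA via SVD is a stand-in for LLE/SMCE.
    If m <= 3 the data are returned unchanged (already displayable).
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(flags, dtype=bool)
    if X.shape[0] <= 3:
        return X, mask
    centered = X - X.mean(axis=1, keepdims=True)   # center each feature (row)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :dims].T @ centered, mask           # dims x n projected coordinates
```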
Alternatively, the feasibility of the special point detection method can be verified by removing the detected special points and checking whether the performance of a clustering algorithm improves. If the performance of a clustering algorithm (e.g. k-means or SMCE) improves, the proposed detection method can be applied as an effective data preprocessing method; the higher the clustering performance, the more effective the clustering.
To illustrate how well the proposed algorithm detects special points in data sets, experiments were conducted on different data sets.
Fig. 2 shows the special points judged by the method of the invention on the two-dimensional Flame data set. As shown in Fig. 2, the data points marked with diamonds are special points; Fig. 2 shows visually that the method identifies the special points located on the boundary of the data set well.
Fig. 3 shows the special points judged by the method of the invention on the two-dimensional XOR data set. As shown in Fig. 3, the data points marked with diamonds are special points; Fig. 3 shows visually that the method identifies the special points located on the boundary of the data set well and expresses the contour of the data set well.
Fig. 4 shows the special points judged by the method of the invention on the Aggregation data set. As shown in Fig. 4, the data points marked with diamonds are special points; Fig. 4 shows visually that the method identifies the special points located on the boundary of the data set well and expresses the contour of the data set well.
To demonstrate the performance of the proposed special point detection method more objectively, its validity is verified using Top-m recognition accuracy. The experimental data is the BUPA data set, and the experimental results are shown in Table 1:
Table 1
In Table 1, "class" denotes the two different classes in the BUPA data set, "normal points" denotes the detected normal points, and R (%) denotes the recognition rate: the higher R is, the higher the recognition accuracy of the algorithm; the lower R is, the worse the algorithm's performance.
Top-m recognition accuracy means that the NC_i values are ranked and the m samples with the largest NC_i values are judged to be special points.
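One plausible reading of Top-m evaluation, sketched under the assumption that R is the percentage of the m highest-ranked samples that are truly special (precision at m); the text does not give the exact formula, and the function name and ground-truth labels are hypothetical.

```python
import numpy as np

def top_m_rate(nc, is_special, m):
    """Percentage of the m highest-NC samples that are truly special (assumed definition of R)."""
    order = np.argsort(-np.asarray(nc), kind="stable")   # descending NC_i, ties in index order
    top = order[:m]
    return 100.0 * np.asarray(is_special, dtype=bool)[top].mean()
```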
It should be noted that the specific embodiments above are exemplary. Those skilled in the art can devise various solutions inspired by the disclosure of the invention, and these solutions also belong to the scope of the disclosure and fall within the protection scope of the invention. Those skilled in the art will understand that the description and drawings of the invention are illustrative and do not limit the claims. The protection scope of the invention is defined by the claims and their equivalents.

Claims (4)

1. A special point detection method based on data representation, characterized by comprising the following steps:
Step 1: Input a data set X ∈ R^{m×n}, where X is an m × n data matrix and each column of X represents one data sample; that is, X contains n samples, each of dimension m, with data samples x_i ∈ R^m, i ∈ {1, 2, …, n}; m is the sample dimension and n is the number of samples in X;
Step 2: Compute the weight representation matrix W ∈ R^{n×n} of the data set X using formula (1):
$$\min\sum_{i=1}^{n}\Big\|x_i-\sum_{j}w_{ij}x_j\Big\|^2,\quad \text{s.t.}\ \sum_{j}w_{ij}=1, \tag{1}$$
where x_i is the i-th data sample in X, x_j is the j-th data sample, w_ij represents the connection between the i-th and j-th data samples, and W is an n × n matrix;
Step 3: Compute the reversible unreachability index NC_i: for each data sample x_i ∈ R^m, i ∈ {1, 2, …, n}, compute NC_i by formula (2):
$$NC_i=\sum_{j}\chi(w_{ij}),\quad j\in\{1,2,\ldots,n\}, \tag{2}$$
where
$$\chi(w_{ij})=\begin{cases}1, & w_{ij}\le 0,\\ 0, & w_{ij}>0,\end{cases}$$
and w_ij is the element in row i, column j of W ∈ R^{n×n};
Step 4: Sort the n NC_i values obtained in Step 3 for the n data samples in descending order;
Step 5: Judge special points, comprising the following steps:
Step 51: Determine the threshold γ by formula (3):
$$\gamma=\frac{3}{\sqrt{2}}\,\sigma, \tag{3}$$
where σ is an influence factor;
Step 52: For each data sample x_i, i ∈ {1, 2, …, n}, when NC_i ≥ γ, x_i is judged to be an outlier.
2. particular point detection method as claimed in claim 1, it is characterised in that methods described also includes calculating using formula (4) The weight expression matrix W,
<mrow> <mtable> <mtr> <mtd> <mrow> <msub> <mi>X</mi> <mi>i</mi> </msub> <mo>=</mo> <mo>&amp;lsqb;</mo> <mfrac> <mrow> <msub> <mi>x</mi> <mn>1</mn> </msub> <mo>-</mo> <msub> <mi>x</mi> <mi>i</mi> </msub> </mrow> <mrow> <mo>|</mo> <mo>|</mo> <msub> <mi>x</mi> <mn>1</mn> </msub> <mo>-</mo> <msub> <mi>x</mi> <mi>i</mi> </msub> <mo>|</mo> <mo>|</mo> </mrow> </mfrac> <mo>,</mo> <mo>...</mo> <mfrac> <mrow> <msub> <mi>x</mi> <mi>n</mi> </msub> <mo>-</mo> <msub> <mi>x</mi> <mi>i</mi> </msub> </mrow> <mrow> <mo>|</mo> <mo>|</mo> <msub> <mi>x</mi> <mi>n</mi> </msub> <mo>-</mo> <msub> <mi>x</mi> <mi>i</mi> </msub> <mo>|</mo> <mo>|</mo> </mrow> </mfrac> <mo>&amp;rsqb;</mo> <mo>&amp;Element;</mo> <msup> <mi>R</mi> <mrow> <mi>m</mi> <mo>&amp;times;</mo> <mrow> <mo>(</mo> <mi>n</mi> <mo>-</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </msup> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> </mtd> <mtd> <mrow> <msub> <mi>&amp;lambda;</mi> <mn>1</mn> </msub> <mo>|</mo> <mo>|</mo> <msub> <mi>Q</mi> <mi>i</mi> </msub> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>|</mo> <msub> <mo>|</mo> <mn>1</mn> </msub> <mo>+</mo> <mfrac> <mn>1</mn> <mn>2</mn> </mfrac> <mo>|</mo> <mo>|</mo> <msub> <mi>X</mi> <mi>i</mi> </msub> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>|</mo> <msubsup> <mo>|</mo> <mn>2</mn> <mn>2</mn> </msubsup> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>s</mi> <mo>.</mo> <mi>t</mi> <mo>.</mo> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <msup> <mn>1</mn> <mi>T</mi> </msup> <msub> <mi>w</mi> <mi>i</mi> </msub> <mo>=</mo> <mn>1.</mn> </mrow> </mtd> </mtr> </mtable> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>4</mn> <mo>)</mo> </mrow> </mrow>
where, for the $i$-th data sample $x_i \in \mathbb{R}^m$ in the data set $X$, the remaining $n-1$ data samples are first preprocessed by normalization and centering on $x_i$ (each column is $(x_j - x_i)/\|x_j - x_i\|$, as in formula (4)) to obtain $X_i \in \mathbb{R}^{m \times (n-1)}$; $w_i$ is a column vector whose elements are required to sum to 1; $Q_i$ is a distance-induced matrix, namely a positive definite diagonal matrix; and $\lambda_1$ is a balance factor that trades off sparsity against the affine reconstruction error.
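Formula (4) is convex: a weighted $\ell_1$ term plus a quadratic, under a single affine constraint. The claim does not state how it is solved or how the diagonal of $Q_i$ is chosen, so the following is a hedged pure-Python sketch under two assumptions. First, it uses proximal gradient descent (ISTA), a solver chosen here and not named in the patent. Second, it assumes $Q_i = \mathrm{diag}(q_j)$ with $q_j = \|x_j - x_i\| / \sum_t \|x_t - x_i\|$, so that distant samples are penalized more. Samples are passed as a list of $n$ rows (the patent's $X$ stores them as columns), and all function names are hypothetical.

```python
import math


def soft(a, tau):
    """Soft-thresholding: sign(a) * max(|a| - tau, 0)."""
    return math.copysign(max(abs(a) - tau, 0.0), a)


def prox_l1_affine(v, tau, iters=80):
    """argmin_w 0.5*||w - v||^2 + sum_j tau[j]*|w_j|  subject to  sum(w) = 1.
    Optimality gives w_j = soft(v_j - mu, tau_j) for one scalar mu; sum(w) is
    non-increasing in mu, so mu is located by bisection."""
    lo = min(a - t for a, t in zip(v, tau)) - 1.0  # every term >= 1 here
    hi = max(a + t for a, t in zip(v, tau)) + 1.0  # every term <= -1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(soft(a - mid, t) for a, t in zip(v, tau)) > 1.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [soft(a - mu, t) for a, t in zip(v, tau)]


def build_problem(data, i):
    """Unit-norm columns (x_j - x_i)/||x_j - x_i|| of X_i and an ASSUMED diagonal of Q_i."""
    xi = data[i]
    others, cols, dists = [], [], []
    for j, xj in enumerate(data):
        if j == i:
            continue
        diff = [a - b for a, b in zip(xj, xi)]
        d = math.sqrt(sum(c * c for c in diff))  # assumes no sample duplicates x_i
        others.append(j)
        cols.append([c / d for c in diff])
        dists.append(d)
    total = sum(dists)
    q = [d / total for d in dists]  # hypothetical choice: q_j proportional to distance
    return others, cols, q


def objective(data, i, w, lam):
    """lam * ||Q_i w||_1 + 0.5 * ||X_i w||_2^2 from formula (4)."""
    _, cols, q = build_problem(data, i)
    r = [sum(c[p] * wj for c, wj in zip(cols, w)) for p in range(len(data[i]))]
    return lam * sum(qj * abs(wj) for qj, wj in zip(q, w)) + 0.5 * sum(x * x for x in r)


def sparse_weights(data, i, lam=0.1, iters=300):
    """Proximal-gradient (ISTA) sketch for formula (4) at sample i."""
    others, cols, q = build_problem(data, i)
    k, m = len(cols), len(data[i])
    step = 1.0 / k     # ||X_i||_F^2 = k (unit columns) bounds the Lipschitz constant
    w = [1.0 / k] * k  # feasible start: entries sum to 1
    for _ in range(iters):
        r = [sum(cols[j][p] * w[j] for j in range(k)) for p in range(m)]    # X_i w
        grad = [sum(cols[j][p] * r[p] for p in range(m)) for j in range(k)]  # X_i^T X_i w
        v = [w[j] - step * grad[j] for j in range(k)]
        w = prox_l1_affine(v, [step * lam * qj for qj in q])
    return others, w


def weight_matrix(data, lam=0.1, iters=300):
    """W[j][i] = weight of sample j in the representation of sample i; W[i][i] = 0."""
    n = len(data)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others, w = sparse_weights(data, i, lam, iters)
        for j, wj in zip(others, w):
            W[j][i] = wj
    return W
```

For example, `weight_matrix(data, lam=0.1)` returns an $n \times n$ nested list whose column $i$ holds $w_i$. A larger `lam` tends to push more entries of each $w_i$ to zero, mirroring the role of $\lambda_1$.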
3. The particular point detection method as claimed in claim 1, characterized in that the method further comprises computing the weight representation matrix $W$ using formula (5):
$$\min_{w}\ \|w\|_1 \quad \text{s.t.} \quad b = Xw \ \text{ and } \ \mathbf{1}^T w = 1. \tag{5}$$
where $b$ is the sample to be reconstructed, and $b$ is reconstructed from a subset of the samples in the known data set $X$: the column vector $w$ is determined by the affine constraint $\mathbf{1}^T w = 1$ together with the sparsity requirement (minimizing $\|w\|_1$), and the samples corresponding to the nonzero elements of $w$ are the ones used to reconstruct $b$.
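Because the objective in formula (5) is piecewise linear and both constraints are linear, the problem can be rewritten as a standard linear program by splitting $w$ into nonnegative parts. This is a textbook reformulation offered as an aid, not a step stated in the claims:

```latex
w = u - v, \quad u \ge 0,\ v \ge 0, \qquad
\min_{u,\,v}\ \mathbf{1}^T (u + v)
\quad \text{s.t.} \quad X(u - v) = b, \quad \mathbf{1}^T (u - v) = 1.
```

At any optimum $u_j v_j = 0$ for every $j$ (otherwise lowering both by $\min(u_j, v_j)$ keeps the constraints and reduces the objective), so $\mathbf{1}^T(u + v) = \|w\|_1$, and the nonzero entries of $w = u - v$ identify the samples used to reconstruct $b$.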
4. The particular point detection method as claimed in claim 2 or claim 3, characterized in that the method further comprises labeling the particular points, as follows:
Step 6: Visual display. When $1 \le m \le 3$, the determined particular points of the data set $X \in \mathbb{R}^{m \times n}$ are marked and displayed directly in the visualization space, so as to judge whether the particular points lie on the edge of the data set; when $m > 3$, the data set $X \in \mathbb{R}^{m \times n}$ is first reduced to the visualization space by a dimensionality reduction method, and the particular points are then labeled and displayed.
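Step 6 leaves the dimensionality reduction method open. As an assumption, the sketch below uses principal component analysis (PCA) to two dimensions, computed by power iteration in pure Python. It returns coordinates paired with a "particular" flag rather than drawing anything, so any plotting tool can render the result. Samples are passed as a list of $n$ rows (the transpose of the patent's $X \in \mathbb{R}^{m \times n}$), and all function names are hypothetical.

```python
import math
import random


def _orthonormalize(v, basis):
    """Subtract v's components along the unit vectors in basis, then normalize.
    Returns None if nothing is left (v lies in the span of basis)."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def pca_2d(points, iters=300, seed=0):
    """Project n points of dimension m onto their top two principal axes, found by
    power iteration on the covariance matrix, re-orthogonalizing against earlier axes."""
    n, m = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(m)]
    centered = [[p[k] - mean[k] for k in range(m)] for p in points]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(m)] for a in range(m)]
    rng = random.Random(seed)
    comps = []
    for _ in range(2):
        v = None
        while v is None:  # random start vector orthogonal to earlier components
            v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(m)], comps)
        for _ in range(iters):
            cv = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            nxt = _orthonormalize(cv, comps)
            if nxt is None:  # covariance vanishes on the remaining directions
                break
            v = nxt
        comps.append(v)
    return [[sum(r[k] * c[k] for k in range(m)) for c in comps] for r in centered]


def prepare_display(points, particular):
    """Step 6 sketch: keep 1-3 dimensional data as-is, otherwise reduce to 2-D,
    and pair each coordinate with a flag saying whether it is a particular point."""
    m = len(points[0])
    coords = [list(p) for p in points] if m <= 3 else pca_2d(points)
    return [(c, i in particular) for i, c in enumerate(coords)]
```

PCA is only one possible choice; a nonlinear method could be substituted in `prepare_display` without changing how the particular points are flagged.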
CN201710548770.3A 2017-07-07 2017-07-07 A kind of particular point detection method based on data representation Pending CN107341515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710548770.3A CN107341515A (en) 2017-07-07 2017-07-07 A kind of particular point detection method based on data representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710548770.3A CN107341515A (en) 2017-07-07 2017-07-07 A kind of particular point detection method based on data representation

Publications (1)

Publication Number Publication Date
CN107341515A true CN107341515A (en) 2017-11-10

Family

ID=60219075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710548770.3A Pending CN107341515A (en) 2017-07-07 2017-07-07 A kind of particular point detection method based on data representation

Country Status (1)

Country Link
CN (1) CN107341515A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921202A (en) * 2018-06-12 2018-11-30 成都信息工程大学 A kind of abnormal point detecting method based on data structure

Similar Documents

Publication Publication Date Title
CN109947086B (en) Mechanical fault migration diagnosis method and system based on counterstudy
Li et al. Localizing and quantifying damage in social media images
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN101520847B (en) Pattern identification device and method
CN109658387A (en) The detection method of the pantograph carbon slide defect of power train
CN113763312B (en) Detection of defects in semiconductor samples using weak labels
CN102693452A (en) Multiple-model soft-measuring method based on semi-supervised regression learning
CN103197299A (en) Extraction and quantitative analysis system of weather radar radial wind information
CN110826642B (en) Unsupervised anomaly detection method for sensor data
CN107016416B (en) Data classification prediction method based on neighborhood rough set and PCA fusion
Zhang et al. Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns
CN106991049A (en) A kind of Software Defects Predict Methods and forecasting system
CN113591948A (en) Defect pattern recognition method and device, electronic equipment and storage medium
CN112766301A (en) Similarity judgment method for indicator diagram of oil extraction machine
CN117789038B (en) Training method of data processing and recognition model based on machine learning
CN107341514A (en) A kind of abnormity point and endpoint detections method based on joint density and angle
CN116842459B (en) Electric energy metering fault diagnosis method and diagnosis terminal based on small sample learning
CN116720109B (en) FPGA-based improved local linear embedded fan bearing fault diagnosis method
Rieger et al. Aggregating explainability methods for neural networks stabilizes explanations
CN103093239B (en) A kind of merged point to neighborhood information build drawing method
CN107341515A (en) A kind of particular point detection method based on data representation
Qu et al. Boundary detection using a Bayesian hierarchical model for multiscale spatial data
CN104462826B (en) The detection of multisensor evidences conflict and measure based on Singular Value Decomposition Using
Li et al. Semantic‐Segmentation‐Based Rail Fastener State Recognition Algorithm
CN116719714A (en) Training method and corresponding device for screening model of test case

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171110
