CN107341515A

CN107341515A - A kind of particular point detection method based on data representation

Info

Publication number: CN107341515A
Application number: CN201710548770.3A
Authority: CN
Inventors: 李孝杰; 吕建成; 吴锡; 周激流; 史沧红
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2017-07-07
Filing date: 2017-07-07
Publication date: 2017-11-10

Abstract

The present invention relates to a particular point detection method based on data representation. Building on the positions at which negative-valued elements occur in an affine combination, and on the theory of affine combinations over a data set, the method uses the reversible unreachable measure value NC_i to judge the degree to which a sample point is a particular point, and determines particular points automatically by means of an automatically set threshold γ. This improves the accuracy and speed of particular point detection, reflects the structural features of the data set better, and detects outliers and edge points at the same time. In addition, the invention is only weakly affected by data distribution and data dimension and therefore has a wider scope of application in practice, addressing the shortcomings of the prior art in detection accuracy for particular points and in detection performance on high-dimensional data.

Description

A particular point detection method based on data representation

Technical field

The invention belongs to the field of data mining, and in particular relates to a particular point detection method based on data representation.

Background art

Traditional techniques such as clustering, classification and pattern recognition aim to find common patterns, whereas the detection of particular points, which include edge points and outliers, is generally used to identify patterns in the data that are valid, interesting and potentially valuable. Compared with detecting normal points, detecting particular points is usually the more meaningful task. Moreover, most algorithms are affected by outliers. For example, the well-known nonlinear manifold learning and dimensionality reduction algorithm Isomap does not itself address outlier detection. How to correctly detect outliers in high-dimensional space is therefore a pressing practical problem and an important task in data preprocessing.

Researchers at home and abroad have proposed various detection algorithms from different technical standpoints. In terms of how outliers are judged, methods can be divided into global models and local models. A global model makes a binary decision for every observation point, i.e. it decides whether the current observation point is an outlier. A local model typically assigns each observation point a measure value (such as an angle variation factor) that estimates the degree to which the point is an outlier. According to whether label information is used, methods can be divided into supervised, semi-supervised and unsupervised algorithms. At present, most methods use one of five classes of shallow techniques: statistical-based, distance-based, neighbor- or density-based, clustering-based, or deviation-based.

Prior art:

Statistics-based methods are among the earliest outlier detection methods. Such methods generally assume that the given data follow a certain distribution or probability model, and use a discordancy test to decide whether a datum is abnormal. Examples include the 3σ rule for data following a Gaussian distribution, Grubbs' test for data following a normal distribution, and χ²-based methods that analyze data dispersion, all of which are widely applied. However, most statistics-based methods are designed for univariate data sets that follow a known distribution. In real settings, where the distribution is unknown and the dimension is high, they struggle to make a judgment. Especially in an era of massive high-dimensional data, such methods cannot achieve the best detection results.
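As an illustration of the 3σ rule mentioned above, the following is a minimal sketch, not part of the patented method. The function name `three_sigma_outliers` and the sample data are assumptions made for the example, and the population standard deviation is used.

```python
import numpy as np

def three_sigma_outliers(values):
    """Return a boolean mask marking values more than 3 standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # population standard deviation (an assumption of this sketch)
    return np.abs(values - mean) > 3.0 * std

# Hypothetical univariate data: thirty identical readings and one extreme value.
data = np.array([10.0] * 30 + [100.0])
mask = three_sigma_outliers(data)
print(np.flatnonzero(mask))  # indices of the flagged values
```

The sketch only handles a single variable, which is exactly the limitation of statistics-based methods noted above.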

Because their geometric meaning is clear, distance-based methods are currently the most commonly used. These methods use distance as the measure and judge points that do not have "enough" neighbors to be abnormal data, broadening the discordancy-test idea of statistical methods. A classic example is the distance-based outlier DB (ε, d) method proposed by Knorr and Ng: if at least ε × 100% of the points in the data set lie at a distance greater than d from the current data point, that point is judged to be an outlier. This method depends heavily on the parameters ε and d; for the same data set, different parameter values can lead to large differences in performance. Later distance-based neighbor methods have similar parameter problems. In high-dimensional space in particular, distance-based measures separate points poorly (see the following formula).

$$\lim_{m \to \infty} \frac{dist_{\max} - dist_{\min}}{dist_{\min}} = 0$$

where m is the data dimension, and dist_max and dist_min are the distances from the current point to its farthest and nearest neighbors, respectively. As m increases, the expression tends to zero. Distance measures are therefore more or less ill-suited to the high-dimensional data that is now ubiquitous.
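The loss of contrast between the farthest and nearest neighbor can be observed numerically. The following is a minimal sketch under stated assumptions: the points are drawn uniformly from the unit hypercube, and the function name `relative_contrast`, the sample size and the random seed are all choices made for this illustration.

```python
import numpy as np

def relative_contrast(m, n=500, seed=0):
    """(dist_max - dist_min) / dist_min from one query point to n uniform points in [0, 1]^m."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, m))
    query = rng.random(m)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {m: relative_contrast(m) for m in (2, 10, 100, 1000)}
for m, c in contrasts.items():
    print(f"m={m:5d}  relative contrast={c:.3f}")
```

With these settings the printed ratio should shrink as m grows, which is the effect the passage describes.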

Existing outlier detection algorithms work well under specific conditions or in specific fields, or for outliers in lower-dimensional spaces; when the data dimension is high, their results are unsatisfactory and their generalization ability is weak. Research on outlier detection in high-dimensional space is still at an early stage. For example, Kriegel proposed the angle-based outlier detection algorithm (ABOD), which does not depend on parameter selection. However, ABOD only considers the relationship between the current point and its neighbors and ignores the relationships among those neighbors, which leads it to identify wrong outliers. Outlier detection algorithms for high-dimensional space therefore need more in-depth study.

Therefore, how to further improve the precision and efficiency of outlier detection has become a problem in urgent need of a solution in the field of data mining.

Summary of the invention

In view of the deficiencies of the prior art, the present invention proposes a particular point detection method based on data representation, comprising the following steps:

Step 1: Input the data set X ∈ R^m×n, where X is an m × n data set matrix and each column of X represents one data sample; that is, X contains n samples, each with m dimensions. Data sample x_i∈R^m, i ∈ {1,2 ... n}, where m is the sample dimension and n is the number of samples in the data set X;

Step 2: Compute the weight expression matrix W ∈ R^n×n of the data set X using formula (1),

$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \quad \sum_{j} w_{ij} = 1. \tag{1}$$

where x_i is the i-th data sample in the data set X, x_j is the j-th data sample in the data set, and w_ij represents the connection relationship between the i-th and j-th data samples; the weight expression matrix W is an n × n matrix;

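Step 3: Compute the reversible unreachable index NC_i measure value. For the data sample x_i∈R^m, i ∈ {1,2 ... n}, the NC_i measure value is computed by formula (2),

$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$

where

$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$

and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^n×n;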

Step 4: Sort the n NC_i measure values obtained in step 3, one for each of the n data samples, from largest to smallest;

Step 5: Judge particular points, comprising the following steps:

Step 51: Determine the threshold γ by formula (3),

$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$

where σ is the influence factor;

Step 52: For each data sample x_i, i ∈ {1,2 ... n}, when NC_i ≥ γ, x_i is judged to be an outlier.

According to a preferred embodiment, the method further comprises computing the weight expression matrix W using formula (4),

$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \quad 1^T w_i = 1. \tag{4}$$

where, for the i-th data sample x_i∈R^m in the data set X, the remaining n-1 data samples are first normalized and decoupled in a preprocessing step to obtain X_i∈R^m×(n-1); w_i is a column vector whose elements are required to sum to 1; Q_i is a distance-induced matrix, which is a positive definite diagonal matrix; and λ₁ is a balance factor that balances sparsity against the affine reconstruction error.

According to a preferred embodiment, the method further comprises computing the weight expression matrix W using formula (5),

$$\min \|w\|_1, \quad \text{s.t.} \quad b = Xw \ \text{and} \ 1^T w = 1. \tag{5}$$

where b is the sample to be reconstructed, which is reconstructed from a subset of the samples in the known data set X; w is a column vector determined by the affine constraint 1^T w = 1 and the sparsity limitation ||w||₁, and the samples corresponding to the nonzero elements of w are used to reconstruct the sample b.

According to a preferred embodiment, the method further comprises labeling the particular points, as follows:

Step 6: Visualization. When 1 ≤ m ≤ 3, the particular points that have been determined are marked and displayed for the data set X ∈ R^m×n in the visualization space, in order to check whether they lie on the edge of the data set; when m > 3, a dimensionality reduction method is first used to reduce the data set X ∈ R^m×n to the visualization space, and the particular points are then labeled and displayed.

The beneficial effects of the technical solution of the present invention are:

1. The outlier detection method based on data representation of the present invention discovers the intrinsic structure of high-dimensional data through data representation and reveals the specific features for anomaly detection contained in the data representation, collectively called abnormal features. The algorithm reflects the structural features of the data set better, detects outliers and edge points at the same time, and is suitable for data of unknown distribution.

2. The present invention proposes the reversible unreachable measurement index NC_i for evaluating the degree to which the current observation point is an edge point. The index considers the global structure of the data rather than only a local neighborhood. Particular points are determined automatically by an automatically set threshold, which improves the accuracy and speed of particular point detection, reflects the structural features of the data set better, and meets the needs of outlier detection in practical application environments.

3. The present invention is only weakly affected by data distribution and data dimension, so it is equally applicable to high-dimensional data and has a wider scope of application in practice.

Brief description of the drawings

Fig. 1 is a flow chart of the particular point detection method of the present invention;

Fig. 2 shows the result of particular point judgment with the method of the invention on the two-dimensional Flame data set;

Fig. 3 shows the result of particular point judgment with the method of the invention on the two-dimensional XOR data set;

Fig. 4 shows the result of particular point judgment with the method of the invention on the Aggregation data set;

Fig. 5 (a) is an example of a convex combination; and

Fig. 5 (b) is an example of an affine combination.

Embodiment

It is described in detail below in conjunction with the accompanying drawings.

The NC_i measure value in the present invention is the reversible unreachable index, which evaluates the degree to which the current observation point belongs to the edge. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation point.

Particular points in the present invention include outliers and edge points.

To address the shortcomings of existing algorithms, the present invention builds on the positions at which negative-valued elements occur in an affine combination, and on the theory of affine combinations over a data set. It uses the NC_i measure to judge the degree to which a sample point is a particular point and determines particular points automatically by means of an automatically set threshold γ. This improves the accuracy and speed of particular point detection, reflects the structural features of the data set better, and detects outliers and edge points at the same time. The invention is also only weakly affected by data distribution and data dimension, so it has a wider scope of application in practice.

Fig. 1 is a flow chart of the particular point detection method of the present invention. As shown in Fig. 1, the particular point detection method based on data representation proposed by the present invention comprises the following steps:

Step 1: Input the data set X ∈ R^m×n, where X is an m × n data set matrix and each column represents one data sample; that is, X contains n samples, each with m dimensions. x_i∈R^m, i ∈ {1,2 ... n}, where m is the sample dimension and n is the number of samples in the data set.

For example, given 10 pictures of size 20 × 20, each image can be processed into a column vector of size 400 × 1; the data set formed by the 10 images is then X ∈ R^400×10, i.e. X is a matrix of size 400 × 10.
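The image example can be written out as follows. This is a minimal sketch; the random pixel values are placeholders standing in for real pictures, and the variable names are chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((10, 20, 20))   # 10 placeholder images of 20 x 20 pixels

# Flatten each image to a 400-vector, then place the vectors as columns of X.
X = images.reshape(10, -1).T
print(X.shape)   # (400, 10)
```

Column j of X is the flattened j-th image, matching the column-per-sample convention of step 1.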

Step 2: Compute the weight expression matrix W ∈ R^n×n of the data set using formula (1),

$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \quad \sum_{j} w_{ij} = 1. \tag{1}$$

where x_i is the i-th data sample in the data set, x_j is the j-th data sample in the data set, and w_ij represents the connection relationship between the i-th and j-th data samples.

The weight expression matrix W is an n × n matrix because there are n samples in total. Any one sample has a distance to each of the other n-1 samples, i.e. n-1 distance values; for convenience of notation, its distance to itself is recorded as 0, so any one sample has n distance values. The n samples therefore yield n × n values in total, which form the n × n matrix.

For the i-th sample x_i, $\sum_j w_{ij} = 1$ constrains the i-th row of the weight expression matrix W to sum to 1. $\sum_j w_{ij} x_j$ denotes the linear combination used to reconstruct x_i, $|x_i - \sum_j w_{ij} x_j|^2$ computes the reconstruction error of x_i, and $\sum_{i=1}^{n}$ accumulates the reconstruction errors of the n samples.
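One way to solve formula (1) row by row is the closed-form solution of affine-constrained least squares familiar from locally linear embedding. The following is a minimal sketch under stated assumptions, not the patent's own implementation: the function name `affine_weights`, fixing w_ii to 0, and the regularization term `reg` are all assumptions of this example. Some regularization is needed because the local Gram matrix is singular whenever n - 1 > m.

```python
import numpy as np

def affine_weights(X, reg=1e-3):
    """Weight matrix W (n x n) for formula (1), with each row summing to 1.

    X has shape (m, n); column i is sample x_i. Row i of W reconstructs x_i
    from all other samples (w_ii is fixed to 0, an assumption of this sketch).
    """
    m, n = X.shape
    W = np.zeros((n, n))
    ones = np.ones(n - 1)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        Z = X[:, others] - X[:, [i]]          # other samples shifted so x_i is the origin
        G = Z.T @ Z                           # local Gram matrix, (n-1) x (n-1)
        # Regularization (an assumption): G is singular whenever n - 1 > m.
        G = G + reg * max(np.trace(G), 1e-12) * np.eye(n - 1)
        w = np.linalg.solve(G, ones)
        W[i, others] = w / w.sum()            # enforce sum_j w_ij = 1
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 30))                  # 30 two-dimensional samples as columns
W = affine_weights(X)
print(W.shape, np.allclose(W.sum(axis=1), 1.0))
```

Because only the sum of each row is constrained, individual weights may be negative, which is what the NC_i measure later counts.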

In another embodiment, the weight expression matrix W can also be computed by formula (4):

$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \quad 1^T w_i = 1. \tag{4}$$

where, for the i-th data sample x_i∈R^m in the data set X, the remaining n-1 data samples are first normalized and decoupled in a preprocessing step to obtain X_i∈R^m×(n-1). 1^T is the transpose of the all-ones column vector, w_i is a column vector, and 1^T w_i = 1 is the affine constraint requiring the elements of w_i to sum to 1. Q_i is a distance-induced matrix, which is a positive definite diagonal matrix. λ₁ is a balance factor that balances sparsity (the ||·||₁ sparsity limitation) against the affine reconstruction error $\frac{1}{2}\|X_i w_i\|_2^2$. W is composed of the w_i, each of which is an n × 1 column vector, and i ranges from 1 to n.
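The preprocessing that produces X_i in formula (4) can be sketched as follows, based on the definition of X_i given in claim 2. The function name `normalized_differences` is an assumption of this example, and the sketch assumes that no other sample coincides with x_i.

```python
import numpy as np

def normalized_differences(X, i):
    """Build X_i of formula (4): unit-length directions from x_i to every other sample."""
    others = np.delete(np.arange(X.shape[1]), i)
    D = X[:, others] - X[:, [i]]
    norms = np.linalg.norm(D, axis=0)
    # Assumes no other sample coincides with x_i; a duplicate would give a zero norm.
    return D / norms

# Three 2-D samples as columns: (0, 0), (3, 4) and (0, 2).
X = np.array([[0.0, 3.0, 0.0],
              [0.0, 4.0, 2.0]])
X0 = normalized_differences(X, 0)
print(X0)
```

Each column of the result has unit length, so only the direction from x_i to the other sample is kept; the distance information is carried separately by Q_i.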

In another embodiment, the weight expression matrix W can also be computed by formula (5):

$$\min \|w\|_1, \quad \text{s.t.} \quad b = Xw \ \text{and} \ 1^T w = 1. \tag{5}$$

where b is the sample to be reconstructed, which is reconstructed from a subset of the samples in the known data set X. Which samples of X are selected is determined through w, by the affine constraint 1^T w = 1 and the sparsity limitation ||w||₁: the samples corresponding to the nonzero elements of w are used to reconstruct the sample b. b = Xw is the expression for the reconstructed sample b. For a sample b and the data set X, w^T is one row of the weight expression matrix W, where w^T is the transpose of the column vector w. The elements of w sum to 1.
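Formula (5) is an ℓ1 minimization with linear equality constraints, and one standard way to solve such a problem is to recast it as a linear program. The reformulation below is a worked sketch; the split variables u and v are introduced here for illustration and do not appear in the original text.

```latex
% Split w into its positive and negative parts: w = u - v, with u >= 0 and v >= 0.
\begin{aligned}
\min_{u,\,v} \quad & 1^T (u + v) \\
\text{s.t.} \quad  & X (u - v) = b, \\
                   & 1^T (u - v) = 1, \\
                   & u \ge 0, \quad v \ge 0 .
\end{aligned}
% At an optimum, u_j v_j = 0 for every j, so 1^T (u + v) = \|w\|_1 .
```

Note that the affine constraint implies ||w||₁ ≥ |1^T w| = 1, with equality exactly when every element of w is non-negative. A minimizer of formula (5) therefore contains negative elements only when b cannot be written as a convex combination of the columns of X.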

Computing the weight expression matrix W reveals the relationships among the samples in the data set, such as their positional relationships. The present invention provides three different embodiments; depending on the characteristics of the data set and the computational requirements, a different method can be chosen to compute the weight matrix W. Formula (1) is simple, convenient and efficient to compute. For more complex data sets, formula (4) can be used to compute the weight expression matrix W. Computing the weight expression matrix W with formula (5) is also simple and convenient.

Step 4: Compute the reversible unreachable index NC_i measure value. For each sample point, NC_i is computed by formula (2),

$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$

where

$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$

and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^n×n. χ(·) is an indicator function: when its argument w_ij is less than or equal to 0, χ(·) takes the value 1; in all other cases it takes the value 0.

For the data sample x_i∈R^m, i ∈ {1,2 ... n}, formula (2) counts the number of elements in the i-th row of the weight expression matrix W ∈ R^n×n that are less than or equal to zero.
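Counting the non-positive entries of each row, as formula (2) prescribes, takes a single NumPy call. This is a minimal sketch; the function name `nc_values` and the small matrix W are illustrative only.

```python
import numpy as np

def nc_values(W):
    """NC_i of formula (2): number of entries in row i of W that are <= 0."""
    W = np.asarray(W, dtype=float)
    return np.count_nonzero(W <= 0, axis=1)

# Hypothetical 3 x 3 weight matrix whose rows each sum to 1.
W = np.array([
    [0.0,  0.7,  0.3],
    [1.2,  0.0, -0.2],
    [-0.5, 1.5,  0.0],
])
print(nc_values(W))   # [1 2 2]
```

Zero entries count as well as negative ones, so a diagonal fixed at zero adds the same constant to every NC_i and does not change the ranking.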

Compared with existing particular point detection methods based on local information, formula (2), being based on the weight expression matrix W, explicitly considers the global structure of the data set, reveals the lack of connectivity between the current data point and the other points, and judges particular points from a global perspective, which helps improve detection performance. A point in a region where the surrounding density is low has a higher reversible unreachable measure value.

The reversible unreachable index NC_i measure value evaluates the degree to which the current observation point is an edge point. It is defined as a linear function of the number of zero and negative values in the global representation associated with the current observation point. It explicitly considers the global structure of the data, rather than only a local neighborhood, and so further improves the accuracy of particular point detection.

The theoretical basis for computing the NC_i measure value is briefly introduced below:

As shown in Fig. 5 (a), in geometry all convex combinations of a given set of points lie within its convex hull. Formally, a convex combination can be expressed as:

$$p = \sum_{i=1}^{k} w_i x_i, \quad \text{s.t.} \quad \sum_{i=1}^{k} w_i = 1, \ w_i \ge 0. \tag{6}$$

where w_i here denotes a component of w, i.e. w_i is the i-th component of the column vector w, and w_i ≥ 0. w denotes a column vector in the weight expression matrix W.

Specifically, a convex combination is a special linear combination: the w_i are the specific weight values, x_i (i=1,2 ... k) are the data samples, $\sum_{i=1}^{k} w_i x_i$ is the linear combination of the k samples x_i (i=1,2 ... k), and $p = \sum_{i=1}^{k} w_i x_i$ states that the linear combination reconstructs p, which determines suitable w_i.

$\sum_{i=1}^{k} w_i = 1,\ w_i \ge 0$ is the convex constraint on w. If formula (6) is satisfied, p is called a convex combination of the data samples x_i (i=1,2 ... k). As shown in Fig. 5 (a), the point p is a convex combination of x₁, x₂, x₃.

As shown in Fig. 5 (b), the affine hull of an affine combination is the whole space. Formally, an affine combination can be expressed as:

$$q = \sum_{i=1}^{n} w_i x_i, \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = 1. \tag{7}$$

If formula (7) is satisfied, q is called an affine combination of x_i (i=1,2 ... n).

Here the w_i are the specific weight values, x_i (i=1,2 ... n) are the data samples, $\sum_{i=1}^{n} w_i x_i$ is the linear combination of the n samples x_i (i=1,2 ... n), and the target $q = \sum_{i=1}^{n} w_i x_i$ states that the linear combination reconstructs q, which determines suitable w_i. $\sum_{i=1}^{n} w_i = 1$ is the affine constraint on w_i; unlike a convex combination, it does not require w_i ≥ 0. As shown in Fig. 5 (b), the point q is an affine combination of x₁, x₂, x₃.

The constraint $\sum_{i} w_i = 1$ restricts the current point to be reconstructed within the space of its neighbors, and the optimized weights are obtained by computing its projection onto that space.

Based on the positions at which negative-valued elements occur in an affine combination, and on the theory of affine combinations over a data set, the reversible unreachable index NC_i measure value is used to judge the degree to which a sample point is a particular point. The negative components of a data representation generally correspond to points nearer the edge in the affine combination of the current observation point.

Theory shows that if 0 ≤ w_i ≤ 1 for every i, the point p lies inside the triangle or on its boundary. If any w_i is less than 0 or greater than 1, the point p lies outside the triangle. If any w_i equals 0, the point p lies on a side of the triangle.

Clearly, if 0 < w_i < 1 for every i, the point p lies inside the triangle. In all other cases, p lies outside the triangle or on its sides. Assuming the interior data lie in a convex set, the reversible unreachable NC_i measure value is computed according to affine combination theory and the weight expression matrix W, and is used to judge particular points. A point in a region where the surrounding density is low has a higher reversible unreachable value.
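The triangle statements can be checked directly by solving for the affine weights of a point with respect to three vertices. This is a minimal sketch for the two-dimensional case; the function name `affine_coordinates` and the example triangle are assumptions made for the illustration.

```python
import numpy as np

def affine_coordinates(p, x1, x2, x3):
    """Solve p = w1*x1 + w2*x2 + w3*x3 with w1 + w2 + w3 = 1 for 2-D points."""
    A = np.vstack([np.column_stack([x1, x2, x3]), np.ones(3)])   # 3 x 3 linear system
    rhs = np.append(np.asarray(p, dtype=float), 1.0)
    return np.linalg.solve(A, rhs)

x1, x2, x3 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_inside = affine_coordinates([0.25, 0.25], x1, x2, x3)    # strictly inside the triangle
w_edge = affine_coordinates([0.5, 0.0], x1, x2, x3)        # on the side joining x1 and x2
w_outside = affine_coordinates([1.0, 1.0], x1, x2, x3)     # outside the triangle
print(w_inside, w_edge, w_outside)
```

For this triangle the interior point gets weights strictly between 0 and 1, the point on a side gets one zero weight, and the exterior point gets a negative weight, in line with the cases listed above.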

According to the above theory, the weight expression matrix W of the data set is obtained from an objective function with an added affine constraint. For each sample point, the number of elements with w_ij ≤ 0 in its row is counted and taken as the NC_i measure value.

Object function is shown below：

Wherein,

Step 5: Sort the n NC_i values obtained in step 4, one for each of the n data samples, from largest to smallest. The larger the NC_i value, the more likely the point is a particular point, so the points ranked first are the most likely to be outliers, which makes outliers easier to judge.
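The descending sort can be sketched as below. The NC values are hypothetical, and the stable sort is a choice made for this example so that ties keep their original sample order.

```python
import numpy as np

nc = np.array([3, 7, 1, 7, 5])            # hypothetical NC_i values for five samples
order = np.argsort(-nc, kind="stable")    # sample indices, largest NC first
print(order)        # [1 3 4 0 2]
print(nc[order])    # [7 7 5 3 1]
```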

Step 6: Judge particular points, comprising the steps:

Step 61: Determine the threshold by formula (3),

$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$

where σ is the influence factor.

Step 62: For each sample x_i, i ∈ {1,2 ... n}, if NC_i ≥ γ, x_i is judged to be an outlier, where x_i is the i-th data sample in the data set X, NC_i is the measure value obtained in step 4, and γ is the threshold set in step 61.
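The thresholding of steps 61 and 62 can be sketched as follows. The text does not state how the influence factor σ is chosen, so it is left as a caller-supplied argument here; the function name `detect_particular_points` and the sample values are assumptions of this example.

```python
import math
import numpy as np

def detect_particular_points(nc, sigma):
    """Flag samples whose NC value reaches the threshold gamma = 3 / sqrt(2) * sigma."""
    gamma = 3.0 / math.sqrt(2.0) * sigma
    return gamma, np.asarray(nc) >= gamma

nc = np.array([3, 7, 1, 7, 5])            # hypothetical NC_i values
gamma, flags = detect_particular_points(nc, sigma=3.0)   # sigma = 3.0 is an arbitrary choice
print(gamma, flags)
```

With σ = 3 the threshold is about 6.36, so only the two samples with NC_i = 7 are flagged.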

The particular point detection method of the present invention further comprises checking the particular points, as follows:

Step 7: Visualization. When 1 ≤ m ≤ 3, the particular points that have been determined are marked and displayed for the data set X ∈ R^m×n in the visualization space, in order to check whether they lie on the edge of the data set. When m > 3, a dimensionality reduction method is first used to reduce the data set X ∈ R^m×n to the visualization space, and the particular points are then labeled and displayed. By visually observing the positions of the marked particular points, one can judge whether the algorithm performs well. For the visualization method, if the detected particular points lie on the edge of the data set, the proposed detection method is effective.

The dimensionality reduction method can be LLE (Locally Linear Embedding), SMCE (Sparse Manifold Clustering and Embedding), or the like.
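For the case m > 3, the following sketch reduces the data to two dimensions before marking the detected points. Note that it substitutes plain PCA (computed with an SVD) for the LLE or SMCE methods named above, purely to keep the example self-contained; the function name `pca_project`, the random data and the placeholder flags are all assumptions.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the columns of X (shape m x n) onto the top-k principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)          # center each feature
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T @ Xc                          # shape k x n

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))      # 50 placeholder samples with m = 10
Y = pca_project(X, k=2)

flags = np.zeros(50, dtype=bool)
flags[:5] = True                   # placeholder detection result, not real output
normal_xy, particular_xy = Y[:, ~flags], Y[:, flags]   # plot these with two different markers
print(Y.shape, particular_xy.shape)
```

A linear projection such as PCA may not preserve the edge structure of data lying on a curved manifold, which is one reason a manifold method like LLE may be preferable in practice.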

Alternatively, the feasibility of the particular point detection method can be verified by removing the detected particular points and checking whether the performance of a clustering algorithm improves. If the performance of the clustering algorithm (e.g. k-means or SMCE) improves, the proposed detection method can be applied as an effective data preprocessing method. The higher the performance of the clustering algorithm, the more effective the clustering.

To illustrate how well the proposed algorithm detects particular points in a data set, tests were carried out on different data sets.

Fig. 2 shows the result of particular point judgment with the method of the invention on the two-dimensional Flame data set. As shown in Fig. 2, the data points marked with diamonds are particular points. It can be seen intuitively from Fig. 2 that the method of the invention identifies the particular points located at the edge of the data set well.

Fig. 3 shows the result of particular point judgment with the method of the invention on the two-dimensional XOR data set. As shown in Fig. 3, the data points marked with diamonds are particular points. It can be seen intuitively from Fig. 3 that the method of the invention identifies the particular points located at the edge of the data set well and expresses the outline of the data set well.

Fig. 4 shows the result of particular point judgment with the method of the invention on the Aggregation data set. As shown in Fig. 4, the data points marked with diamonds are particular points. It can be seen intuitively from Fig. 4 that the method of the invention identifies the particular points located at the edge of the data set well and expresses the outline of the data set well.

To illustrate the performance of the proposed particular point detection method more objectively, Top-m recognition precision is used to verify the validity of the method of the invention. The experimental data is the BUPA data set, and the experimental results are shown in Table 1:

Table 1

In Table 1, class denotes the two different classes in the BUPA data set, normal points denotes the normal points detected, and R (%) denotes the recognition rate: the higher R is, the higher the recognition accuracy of the algorithm, and the lower R is, the worse the algorithm performs.

Top-m recognition precision means sorting the NC_i values and judging whether the points with the top m NC_i values are particular points.
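One plausible reading of Top-m recognition precision is the fraction of the m highest-ranked samples that really are particular points. The sketch below implements that reading only; the exact definition of R (%) behind Table 1 is not given in the text, and the function name `top_m_precision`, the NC values and the ground-truth labels are hypothetical.

```python
import numpy as np

def top_m_precision(nc, is_particular, m):
    """Fraction of the m highest-NC samples that are labelled as particular points."""
    order = np.argsort(-np.asarray(nc), kind="stable")
    return float(np.mean(np.asarray(is_particular, dtype=bool)[order[:m]]))

nc = np.array([3, 7, 1, 7, 5])                           # hypothetical NC_i values
labels = np.array([False, True, False, False, True])     # hypothetical ground truth
precision = top_m_precision(nc, labels, m=3)
print(precision)
```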

It should be noted that the above specific embodiments are exemplary. Those skilled in the art can devise various solutions inspired by the disclosure of the present invention, and these solutions also belong to the scope of the disclosure and fall within the scope of protection of the present invention. Those skilled in the art should understand that the description and its drawings are illustrative and do not limit the claims. The scope of protection of the present invention is defined by the claims and their equivalents.

Claims

1. A particular point detection method based on data representation, characterized by comprising the following steps:

Step 1: Input the data set X ∈ R^m×n, where X is an m × n data set matrix and each column of X represents one data sample; that is, X contains n samples, each with m dimensions. Data sample x_i∈R^m, i ∈ {1,2 ... n}, where m is the sample dimension and n is the number of samples in the data set X;

Step 2: Compute the weight expression matrix W ∈ R^n×n of the data set X using formula (1),

$$\min \sum_{i=1}^{n} \Big| x_i - \sum_{j} w_{ij} x_j \Big|^2, \quad \text{s.t.} \quad \sum_{j} w_{ij} = 1. \tag{1}$$

where x_i is the i-th data sample in the data set X, x_j is the j-th data sample in the data set, and w_ij represents the connection relationship between the i-th and j-th data samples; the weight expression matrix W is an n × n matrix;

Step 3: Compute the reversible unreachable index NC_i measure value. For the data sample x_i∈R^m, i ∈ {1,2 ... n}, the NC_i measure value is computed by formula (2),

$$NC_i = \sum_{j} \chi(w_{ij}), \quad j \in \{1, 2, \ldots, n\}, \tag{2}$$

where

$$\chi(w_{ij}) = \begin{cases} 1, & w_{ij} \le 0 \\ 0, & \text{otherwise} \end{cases}$$

and w_ij is the element in row i, column j of the weight expression matrix W ∈ R^n×n;

Step 5: Judge particular points, comprising the following steps:

Step 51: Determine the threshold γ by formula (3),

$$\gamma = \frac{3}{\sqrt{2}} \sigma, \tag{3}$$

where σ is the influence factor;

2. The particular point detection method as claimed in claim 1, characterized in that the method further comprises computing the weight expression matrix W using formula (4),

$$X_i = \left[ \frac{x_1 - x_i}{\|x_1 - x_i\|}, \ldots, \frac{x_n - x_i}{\|x_n - x_i\|} \right] \in R^{m \times (n-1)}, \quad \min \ \lambda_1 \|Q_i w_i\|_1 + \frac{1}{2} \|X_i w_i\|_2^2, \quad \text{s.t.} \quad 1^T w_i = 1. \tag{4}$$

where, for the i-th data sample x_i∈R^m in the data set X, the remaining n-1 data samples are first normalized and decoupled in a preprocessing step to obtain X_i∈R^m×(n-1); w_i is a column vector whose elements are required to sum to 1; Q_i is a distance-induced matrix, which is a positive definite diagonal matrix; and λ₁ is a balance factor that balances sparsity against the affine reconstruction error.

3. The particular point detection method as claimed in claim 1, characterized in that the method further comprises computing the weight expression matrix W using formula (5),

$$\min \|w\|_1, \quad \text{s.t.} \quad b = Xw \ \text{and} \ 1^T w = 1. \tag{5}$$

where b is the sample to be reconstructed, which is reconstructed from a subset of the samples in the known data set X; w is a column vector determined by the affine constraint 1^T w = 1 and the sparsity limitation ||w||₁, and the samples corresponding to the nonzero elements of w are used to reconstruct the sample b.

4. The particular point detection method as claimed in claim 2 or 3, characterized in that the method further comprises labeling the particular points, as follows:

Step 6: Visualization. When 1 ≤ m ≤ 3, the particular points that have been determined are marked and displayed for the data set X ∈ R^m×n in the visualization space, in order to check whether they lie on the edge of the data set; when m > 3, a dimensionality reduction method is first used to reduce the data set X ∈ R^m×n to the visualization space, and the particular points are then labeled and displayed.