CN104268149A - Clustering method and clustering device - Google Patents


Publication number
CN104268149A
Authority
CN
China
Prior art keywords
class
distance
rank
candidate
less
Legal status
Pending
Application number
CN201410432412.2A
Other languages
Chinese (zh)
Inventor
陈志军
张涛
龙飞
Current Assignee
Beijing Xiaomi Technology Co Ltd
Xiaomi Inc
Original Assignee
Xiaomi Inc
Application filed by Xiaomi Inc
Priority to CN201410432412.2A
Publication of CN104268149A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a clustering method and a clustering device. The method iteratively merges classes in the sample data whose inter-class Rank-Order distance $D_R(C_i, C_j)$, inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ and inter-class absolute distance $d(C_i, C_j)$ satisfy given conditions. $D_R(C_i, C_j)$ and $D_N(C_i, C_j)$ measure the neighborhood relationship between classes, while $d(C_i, C_j)$ measures the absolute distance between them and therefore directly reflects how similar two classes are. Classes with low similarity are kept apart according to the inter-class absolute distance, which identifies the outliers in a class. Low-similarity outliers are thus removed and the accuracy of the clustering result improves, particularly when the sample data contain many data objects but few objects belong to any one class.

Description

Clustering method and device
Technical field
The disclosure relates to the field of computer technology, and in particular to a clustering method and device.
Background
Clustering is the process of dividing a set of physical or abstract objects into multiple classes (clusters) made up of similar objects. Objects in the same class are highly similar, while objects in different classes differ greatly. The term "class" is used below and has the same meaning as "cluster".
For example, when clustering is used to classify face pictures, the pictures of the same person are grouped into one class. A related clustering method measures the similarity between two faces with the Rank-Order distance and can group the pictures of the same person together. However, when a collection of pictures contains many different faces but only a few pictures of each person, the clustering results of this method have low accuracy.
Summary of the invention
To overcome the problems in the related art, the disclosure provides a clustering method and device.
To solve the above technical problem, the embodiments of the disclosure disclose the following technical solutions:
According to a first aspect of the embodiments of the disclosure, a clustering method is provided, comprising:
Obtaining, for any two classes $C_i$ and $C_j$ in sample data, the inter-class Rank-Order distance $D_R(C_i, C_j)$, the inter-class absolute distance $d(C_i, C_j)$ and the inter-class normalized Rank-Order distance $D_N(C_i, C_j)$;
Determining, for any two classes $C_i$ and $C_j$ in the sample data, whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$;
When $C_i$ and $C_j$ are candidate merge classes, merging the candidate merge classes and updating the number of classes;
When the updated number of classes is smaller than the number before the update, returning to the step of obtaining $D_R(C_i, C_j)$, $d(C_i, C_j)$ and $D_N(C_i, C_j)$ for any two classes, until the number of classes after an update equals the number before it, thereby obtaining a clustering result.
With reference to the first aspect, in a first possible implementation of the first aspect, determining whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$ is performed as follows:
Determining whether $D_R(C_i, C_j)$ is less than a first distance threshold, whether $D_N(C_i, C_j)$ is less than 1, and whether $d(C_i, C_j)$ is less than a second distance threshold;
When $D_R(C_i, C_j)$ is less than the first distance threshold, $D_N(C_i, C_j)$ is less than 1, and $d(C_i, C_j)$ is less than the second distance threshold, determining that $C_i$ and $C_j$ are candidate merge classes.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the method further comprises: when $D_R(C_i, C_j)$ is not less than the first distance threshold, or $D_N(C_i, C_j)$ is not less than 1, or $d(C_i, C_j)$ is not less than the second distance threshold, determining whether any two classes in the sample data other than those already judged are candidate merge classes, until the whole sample data has been judged.
With reference to the first aspect, or to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the inter-class absolute distance comprises the mean distance between the two classes or the minimum distance between the two classes.
With reference to the first aspect, in a fourth possible implementation of the first aspect, merging the candidate merge classes when $C_i$ and $C_j$ are candidate merge classes is performed as follows:
After all candidate merge classes in the sample data have been determined, merging the candidate merge classes pairwise until no candidate merge classes remain.
According to a second aspect of the embodiments of the disclosure, a clustering apparatus is provided, comprising:
A first acquiring unit, configured to obtain the inter-class Rank-Order distance $D_R(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in sample data;
A second acquiring unit, configured to obtain the inter-class absolute distance $d(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in the sample data;
A third acquiring unit, configured to obtain the inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in the sample data;
A first judging unit, configured to determine, for any two classes $C_i$ and $C_j$ in the sample data, whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$;
An iterative merging unit, configured to merge the candidate merge classes and update the number of classes when $C_i$ and $C_j$ are candidate merge classes, and, when the updated number of classes is smaller than the number before the update, to return to the step of obtaining $D_R(C_i, C_j)$, $d(C_i, C_j)$ and $D_N(C_i, C_j)$ for any two classes, until the number of classes after an update equals the number before it, thereby obtaining a clustering result.
With reference to the second aspect, in a first possible implementation of the second aspect, the first judging unit comprises:
A first judging subunit, configured to determine whether $D_R(C_i, C_j)$ is less than a first distance threshold, whether $D_N(C_i, C_j)$ is less than 1, and whether $d(C_i, C_j)$ is less than a second distance threshold;
A determining subunit, configured to determine that $C_i$ and $C_j$ are candidate merge classes when $D_R(C_i, C_j)$ is less than the first distance threshold, $D_N(C_i, C_j)$ is less than 1, and $d(C_i, C_j)$ is less than the second distance threshold.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the first judging unit further comprises:
A second judging subunit, configured to determine, when $D_R(C_i, C_j)$ is not less than the first distance threshold, or $D_N(C_i, C_j)$ is not less than 1, or $d(C_i, C_j)$ is not less than the second distance threshold, whether any two classes in the sample data other than those already judged are candidate merge classes, until the whole sample data has been judged.
With reference to the second aspect, in a third possible implementation of the second aspect, the iterative merging unit comprises:
A merging subunit, configured to merge the candidate merge classes pairwise, after all candidate merge classes in the sample data have been determined, until no candidate merge classes remain.
According to a third aspect of the embodiments of the disclosure, a terminal is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to:
Obtain, for any two classes $C_i$ and $C_j$ in sample data, the inter-class Rank-Order distance $D_R(C_i, C_j)$, the inter-class absolute distance $d(C_i, C_j)$ and the inter-class normalized Rank-Order distance $D_N(C_i, C_j)$;
Determine, for any two classes $C_i$ and $C_j$ in the sample data, whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$;
When $C_i$ and $C_j$ are candidate merge classes, merge the candidate merge classes and update the number of classes;
When the updated number of classes is smaller than the number before the update, return to the step of obtaining $D_R(C_i, C_j)$, $d(C_i, C_j)$ and $D_N(C_i, C_j)$ for any two classes, until the number of classes after an update equals the number before it, thereby obtaining a clustering result.
The technical solutions provided by the embodiments of the disclosure may have the following beneficial effects. The clustering method of this embodiment iteratively merges classes in the sample data whose inter-class Rank-Order distance $D_R(C_i, C_j)$, inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ and inter-class absolute distance $d(C_i, C_j)$ satisfy the conditions. $D_R(C_i, C_j)$ and $D_N(C_i, C_j)$ measure the neighborhood relationship between classes, and $d(C_i, C_j)$ measures their absolute distance relationship. Because the absolute distance directly measures how similar two classes are, classes with low similarity can be separated out according to it, which identifies the outliers in a class. Low-similarity outliers are thus excluded during clustering, improving the accuracy of the clustering result. In particular, when the sample data contain many data objects but few objects belong to each class, the clustering result is more accurate.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, show embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of the ranked sequences of multiple objects;
Fig. 2 is a flowchart of a clustering method according to an exemplary embodiment;
Fig. 3 is a block diagram of a clustering apparatus according to an exemplary embodiment;
Fig. 4 is a block diagram of a device for the clustering method according to an exemplary embodiment;
Fig. 5 is a block diagram of another device for the clustering method according to an exemplary embodiment.
The above drawings show specific embodiments of the disclosure, which are described in more detail below. The drawings are not intended to limit the scope of the disclosed concept in any way; rather, they illustrate the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. In the following description, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
Before the exemplary embodiments are described, the Rank-Order distance is introduced. First, the distances between objects are computed (for example, cosine similarity or Euclidean distance), and the objects are reordered by distance to obtain a sequence. Suppose there are n objects $i_1, i_2, i_3, \ldots, i_n$. With $i_1$ as the reference object, the distance between every other object and $i_1$ is computed and the objects are sorted by distance, nearest first, giving the sequence $O_1$ shown in Fig. 1. With $i_2$ as the reference object, the distances to $i_2$ are computed in the same way, giving the sequence $O_2$ shown in Fig. 1.
The Rank-Order distance $D(i_1, i_2)$ is computed from the positions in $O_2$ of the objects that appear in $O_1$ from $i_1$ up to and including $i_2$. In the example of Fig. 1 these objects are $i_1$, $i_3$, $i_4$ and $i_2$, whose positions in $O_2$ are 5, 2, 4 and 0, so $D(i_1, i_2)$ is given by Formula 1:
$$D(i_1, i_2) = \sum_{x=0}^{O_1(i_2)} O_2\big(f_1(x)\big) = O_2(i_1) + O_2(i_3) + O_2(i_4) + O_2(i_2) = 5 + 2 + 4 + 0 = 11 \qquad \text{(Formula 1)}$$
In Formula 1, $f_1(x)$ is the object at position $x$ in $O_1$, and $O_2(i_1)$, $O_2(i_3)$, $O_2(i_4)$ and $O_2(i_2)$ are the positions of objects $i_1$, $i_3$, $i_4$ and $i_2$ in $O_2$.
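Formula 1 can be checked with a short Python sketch. Fig. 1 is not reproduced here, so the positions of $i_5$ and $i_6$ in both sequences are hypothetical; only the positions of $i_1$, $i_3$, $i_4$ and $i_2$ follow the example in the text. The helper name `asymmetric_rank_order` is illustrative.

```python
def asymmetric_rank_order(order_a, order_b, b):
    """D(a, b) per Formula 1, where a = order_a[0].

    order_a[x] plays the role of f_a(x): the object at position x when a is the
    reference object. The sum runs over positions 0..O_a(b) and adds up the
    position of each of those objects in order_b.
    """
    position_in_b = {obj: pos for pos, obj in enumerate(order_b)}
    o_a_b = order_a.index(b)  # O_a(b): position of b in a's sequence
    return sum(position_in_b[order_a[x]] for x in range(o_a_b + 1))


# Hypothetical sequences consistent with the Fig. 1 example:
# O_2(i1) = 5, O_2(i3) = 2, O_2(i4) = 4, O_2(i2) = 0 (i5, i6 are made up).
O1 = ["i1", "i3", "i4", "i2", "i5", "i6"]
O2 = ["i2", "i5", "i3", "i6", "i4", "i1"]

print(asymmetric_rank_order(O1, O2, "i2"))  # 5 + 2 + 4 + 0 = 11
```

With the same hypothetical sequences, `asymmetric_rank_order(O2, O1, "i1")` returns 15 rather than 11: the one-sided distance is asymmetric, which is why Formula 2 adds both directions.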
$D(i_2, i_1)$ is computed in the same way, and the normalized Rank-Order distance $D_R(i_1, i_2)$ between $i_1$ and $i_2$ is then obtained from Formula 2:
$$D_R(i_1, i_2) = \frac{D(i_1, i_2) + D(i_2, i_1)}{\min\big(O_1(i_2),\, O_2(i_1)\big)} \qquad \text{(Formula 2)}$$
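The following sketch builds every object's sequence from a distance matrix and evaluates Formula 2. It assumes that ties are broken by object index and that each object sits at position 0 of its own sequence, as $O_2(i_2) = 0$ in the example; the four 1-D points are made-up data and the function names are illustrative.

```python
import numpy as np


def rank_sequences(dist):
    """Return (order, pos): order[a, x] is f_a(x), pos[a, b] is O_a(b)."""
    dd = np.array(dist, dtype=float)
    np.fill_diagonal(dd, -1.0)  # force each object to position 0 of its own sequence
    order = np.argsort(dd, axis=1, kind="stable")  # nearest first; assumption: ties by index
    pos = np.argsort(order, axis=1)  # inverse permutation of each row
    return order, pos


def one_sided(order, pos, a, b):
    # Formula 1: D(a, b) = sum_{x=0}^{O_a(b)} O_b(f_a(x))
    return int(pos[b, order[a, : pos[a, b] + 1]].sum())


def rank_order_distance(dist, a, b):
    # Formula 2: symmetric sum divided by the smaller of the two mutual positions
    order, pos = rank_sequences(dist)
    total = one_sided(order, pos, a, b) + one_sided(order, pos, b, a)
    return float(total / min(pos[a, b], pos[b, a]))


x = np.array([0.0, 1.0, 5.0, 6.0])  # toy 1-D objects
dist = np.abs(x[:, None] - x[None, :])
print(rank_order_distance(dist, 0, 1))  # 2.0
print(rank_order_distance(dist, 0, 2))  # 5.5
```

Taking the inverse permutation with a second `argsort` turns "who is at position x" into "at which position is object b", which is exactly the $O_a(b)$ lookup the formula needs.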
$D_R(i_1, i_2)$ is the normalized Rank-Order distance between two objects. The inter-class Rank-Order distance uses the same algorithm as the object-level one: one class is taken as the reference class, and all classes are reordered by their inter-class distance to it.
The inter-class Rank-Order distance is computed by Formula 3:
$$D_R(C_i, C_j) = \frac{D(C_i, C_j) + D(C_j, C_i)}{\min\big(O_{C_i}(C_j),\, O_{C_j}(C_i)\big)} \qquad \text{(Formula 3)}$$
In Formula 3, $D(C_i, C_j)$ is the Rank-Order distance from class $C_i$ to class $C_j$ and $D(C_j, C_i)$ is the Rank-Order distance from $C_j$ to $C_i$; $O_{C_i}(C_j)$ is the position of $C_j$ in the sequence whose reference class is $C_i$, and $O_{C_j}(C_i)$ is the position of $C_i$ in the sequence whose reference class is $C_j$.
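The text does not state which inter-class distance is used to order the classes. The sketch below assumes the minimum object distance of Formula 6 and otherwise applies Formulas 1 to 3 unchanged; the points, the grouping into three classes and the function names are hypothetical.

```python
import numpy as np


def class_distances(dist, clusters):
    # Assumption: classes are ranked by the minimum object-to-object distance.
    m = len(clusters)
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d[i, j] = d[j, i] = dist[np.ix_(clusters[i], clusters[j])].min()
    return d


def class_rank_order(dist, clusters, i, j):
    """Formula 3 for classes C_i and C_j."""
    d = class_distances(dist, clusters)
    np.fill_diagonal(d, -1.0)  # each class is at position 0 of its own sequence
    order = np.argsort(d, axis=1, kind="stable")  # order[a, x]: class at position x
    pos = np.argsort(order, axis=1)  # pos[a, b] = O_{C_a}(C_b)
    d_ij = pos[j, order[i, : pos[i, j] + 1]].sum()  # D(C_i, C_j)
    d_ji = pos[i, order[j, : pos[j, i] + 1]].sum()  # D(C_j, C_i)
    return float((d_ij + d_ji) / min(pos[i, j], pos[j, i]))


x = np.array([0.0, 1.0, 5.0, 6.0, 20.0])  # toy 1-D objects
dist = np.abs(x[:, None] - x[None, :])
clusters = [[0, 1], [2, 3], [4]]  # hypothetical current classes
print(class_rank_order(dist, clusters, 0, 1))  # 2.0
print(class_rank_order(dist, clusters, 0, 2))  # 3.0
```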
The inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ is computed by Formula 4:
$$D_N(C_i, C_j) = \frac{1}{\phi(C_i, C_j)} \cdot d(C_i, C_j) \qquad \text{(Formula 4)}$$
In Formula 4, $d(C_i, C_j)$ is the absolute distance between classes $C_i$ and $C_j$, and $\phi(C_i, C_j)$ is the average distance from each object in the two classes to its $K$ nearest neighbors.
$\phi(C_i, C_j)$ is computed by Formula 5:
$$\phi(C_i, C_j) = \frac{1}{|C_i| + |C_j|} \sum_{a \in C_i \cup C_j} \frac{1}{K} \sum_{k=1}^{K} d\big(a, f_a(k)\big) \qquad \text{(Formula 5)}$$
In Formula 5, $|C_i|$ and $|C_j|$ are the numbers of objects in $C_i$ and $C_j$, $K$ is a constant, and $f_a(k)$ is the $k$-th nearest neighbor of object $a$.
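Formulas 4 and 5 can be sketched as follows. The sketch assumes that $f_a(k)$ ranges over all objects in the sample data except $a$ itself and that $K$ is smaller than the number of objects; the 1-D points, $K = 2$ and the function names are made up for illustration.

```python
import numpy as np


def phi(dist, ci, cj, K):
    """Formula 5: mean K-nearest-neighbour distance over all objects of C_i and C_j."""
    # Assumption: column 0 after sorting is the object itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1 : K + 1]
    members = list(ci) + list(cj)
    # Every row holds K values, so the plain mean equals
    # (1 / (|C_i| + |C_j|)) * sum_a (1 / K) * sum_k d(a, f_a(k)).
    return float(knn[members].mean())


def normalized_rank_order(dist, ci, cj, K):
    """Formula 4: D_N = d(C_i, C_j) / phi(C_i, C_j), with d taken from Formula 6."""
    d = dist[np.ix_(list(ci), list(cj))].min()
    return float(d / phi(dist, ci, cj, K))


x = np.array([0.0, 1.0, 5.0, 6.0])  # toy 1-D objects
dist = np.abs(x[:, None] - x[None, :])
print(normalized_rank_order(dist, [0], [1], K=2))        # 1 / 2.75, below 1
print(normalized_rank_order(dist, [0, 1], [2, 3], K=2))  # 4 / 2.75, above 1
```

In this toy case, merging $\{0, 1\}$ with $\{2, 3\}$ fails the $D_N < 1$ test because the gap between the two groups (4) is larger than their typical neighbor spacing (2.75).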
$d(C_i, C_j)$ is computed by Formula 6. Here the inter-class absolute distance is the smallest distance between an object in $C_i$ and an object in $C_j$:
$$d(C_i, C_j) = \min_{a \in C_i,\; b \in C_j} d(a, b) \qquad \text{(Formula 6)}$$
In Formula 6, $C_i$ and $C_j$ are classes, $a$ is an object in $C_i$, and $b$ is an object in $C_j$.
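Formula 6, together with the mean-distance alternative that the disclosure also allows, fits in one small helper. The `linkage` parameter name, the function name and the sample points are illustrative assumptions.

```python
import numpy as np


def absolute_distance(dist, ci, cj, linkage="min"):
    """Inter-class absolute distance d(C_i, C_j); `linkage` is a hypothetical switch."""
    block = dist[np.ix_(list(ci), list(cj))]  # d(a, b) for every a in C_i, b in C_j
    if linkage == "min":
        return float(block.min())  # Formula 6
    if linkage == "mean":
        return float(block.mean())  # mean-distance alternative
    raise ValueError(f"unknown linkage: {linkage!r}")


x = np.array([0.0, 1.0, 5.0, 6.0])  # toy 1-D objects
dist = np.abs(x[:, None] - x[None, :])
print(absolute_distance(dist, [0, 1], [2, 3]))                  # 4.0
print(absolute_distance(dist, [0, 1], [2, 3], linkage="mean"))  # 5.0
```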
Suppose the objects are face images. The clustering method of the disclosure can then group the images that belong to the same person into one cluster. The features of each face image are converted into a vector, so the distance between two objects is the distance between their vectors. The clustering method can of course also be applied to other data.
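The disclosure names cosine similarity and Euclidean distance as possible object distances but does not specify the face-feature extractor. The sketch below therefore starts from feature vectors that are assumed to exist already and turns them into an object distance matrix; using 1 minus cosine similarity as a distance, the toy vectors and the function names are assumptions.

```python
import numpy as np


def euclidean_matrix(features):
    X = np.asarray(features, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cosine_distance_matrix(features):
    # Assumption: distance = 1 - cosine similarity, so identical directions give 0.
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)  # guard against all-zero vectors
    return 1.0 - unit @ unit.T


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-ins for face embeddings
print(euclidean_matrix(feats)[0, 1])        # sqrt(2), about 1.414
print(cosine_distance_matrix(feats)[0, 2])  # 1 - 1/sqrt(2), about 0.293
```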
Fig. 2 is a flowchart of a clustering method according to an exemplary embodiment. The method is applied in a terminal and, as shown in Fig. 2, includes the following steps:
In S110, the inter-class Rank-Order distance $D_R(C_i, C_j)$, the inter-class absolute distance $d(C_i, C_j)$ and the inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ are obtained for any two classes $C_i$ and $C_j$ in the sample data.
In this embodiment, face images are used as the sample data. Suppose there are N face images. Initially each face image is treated as a separate class, so the initial number of classes is N. The first distance threshold $t_1$ and the constant $K$ are also set. For any classes $C_i$ and $C_j$, $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$ are computed with Formulas 1 to 6.
Optionally, the inter-class absolute distance may be any quantity that characterizes the absolute distance between classes, such as the minimum distance or the mean distance between them; the disclosure does not limit this.
With N initial classes, $D_R$ is an N × N matrix whose element $D_R(i, j)$ is the inter-class Rank-Order distance between classes $C_i$ and $C_j$. $D_N$ is likewise an N × N matrix whose element $D_N(i, j)$ is the normalized Rank-Order distance between $C_i$ and $C_j$, and the inter-class absolute distances form an N × N matrix $d$ whose element $d_{ij}$ is the absolute distance between $C_i$ and $C_j$.
In S120, for any two classes $C_i$ and $C_j$ in the sample data, it is determined whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$.
In an exemplary embodiment of the disclosure, step S120 may include steps 1) to 3):
Step 1): Determine whether $D_R(C_i, C_j)$ is less than the first distance threshold, whether $D_N(C_i, C_j)$ is less than 1, and whether $d(C_i, C_j)$ is less than the second distance threshold.
Let the first distance threshold be $t_1$ and the second distance threshold be $t_2$. Both thresholds depend on the data and can be determined experimentally.
Check whether all three conditions $D_R(C_i, C_j) < t_1$, $D_N(C_i, C_j) < 1$ and $d(C_i, C_j) < t_2$ hold. If all three hold, go to step 2); if any one of them fails, go to step 3).
$D_R(C_i, C_j) < t_1$ indicates that $C_i$ and $C_j$ are highly similar; $D_N(C_i, C_j) < 1$ indicates that the dispersion between $C_i$ and $C_j$ is small; and $d(C_i, C_j) < t_2$ also indicates that $C_i$ and $C_j$ are highly similar. $D_R(C_i, C_j)$ and $D_N(C_i, C_j)$ characterize the neighborhood relationship between the two classes, whereas $d(C_i, C_j)$ characterizes the absolute distance between them.
Initially each object is a class of its own. If the absolute distance between two objects is large, their similarity is low and they should not be merged into one class. This identifies the objects that do not belong in a class, that is, the outliers, which are then excluded from that class and clustered again.
Step 2): When $D_R(C_i, C_j)$ is less than the first distance threshold, $D_N(C_i, C_j)$ is less than 1, and $d(C_i, C_j)$ is less than the second distance threshold, $C_i$ and $C_j$ are determined to be candidate merge classes.
For the N face images, find the elements of the inter-class Rank-Order distance matrix $D_R$ that are less than $t_1$, the elements of the normalized Rank-Order distance matrix $D_N$ that are less than 1, and the elements of the absolute distance matrix $d$ that are less than $t_2$. Pairs of classes that satisfy all three conditions are determined to be candidate merge classes.
Step 3): When $D_R(C_i, C_j)$ is not less than the first distance threshold, or $D_N(C_i, C_j)$ is not less than 1, or $d(C_i, C_j)$ is not less than the second distance threshold, determine whether any other two classes are candidate merge classes, until all of the sample data have been judged.
In S130, when $C_i$ and $C_j$ are candidate merge classes, they are merged and the number of classes is updated.
After all candidate merge classes in the sample data have been determined, the candidate merge classes are merged pairwise until no candidate merge classes remain.
For example, if $C_1$ and $C_2$ satisfy the three conditions above, and $C_3$ and $C_4$ also satisfy them, then $C_1$ and $C_2$ are merged into one new class and $C_3$ and $C_4$ into another. After all candidate merge classes have been merged, the classes are counted and the number of classes is updated.
In S140, when the updated number of classes is smaller than the number before the update, the method returns to the step of obtaining $D_R(C_i, C_j)$, $d(C_i, C_j)$ and $D_N(C_i, C_j)$ for any two classes, until the number of classes after an update equals the number before it, and the clustering result is obtained.
If the number of classes after the update is smaller than before, the within-class dispersion is still large, that is, the objects in a class are not yet tightly aggregated and outlier objects may exist. Iterative merging therefore continues, and outliers are identified, until the number of classes no longer decreases.
For example, suppose classes are merged according to the three conditions above and there are 6 classes before merging and 4 after. The updated number of classes is then 4, which is less than the 6 before the update, so the method returns and continues iterating.
The clustering result may take any of the following forms: 1) classes containing multiple objects, together with one or more single objects; 2) only classes containing multiple objects; 3) only single objects. A single object is also regarded as a class.
The clustering method of this embodiment iteratively merges classes in the sample data whose inter-class Rank-Order distance $D_R(C_i, C_j)$, inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ and inter-class absolute distance $d(C_i, C_j)$ satisfy the conditions. $D_R(C_i, C_j)$ and $D_N(C_i, C_j)$ measure the neighborhood relationship between classes, and $d(C_i, C_j)$ measures their absolute distance relationship. Because the absolute distance directly measures how similar two classes are, classes with low similarity can be separated out according to it, which identifies the outliers in a class. Low-similarity outliers are thus excluded during clustering, improving the accuracy of the clustering result. In particular, when the sample data contain many data objects but few objects belong to each class, the clustering result is more accurate.
Fig. 3 is a block diagram of a clustering apparatus according to an exemplary embodiment. As shown in Fig. 3, the apparatus includes a first acquiring unit 110, a second acquiring unit 120, a third acquiring unit 130, a first judging unit 140 and an iterative merging unit 150.
The first acquiring unit 110 is configured to obtain the inter-class Rank-Order distance $D_R(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in the sample data.
The second acquiring unit 120 is configured to obtain the inter-class absolute distance $d(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in the sample data.
The third acquiring unit 130 is configured to obtain the inter-class normalized Rank-Order distance $D_N(C_i, C_j)$ between any two classes $C_i$ and $C_j$ in the sample data.
The first judging unit 140 is configured to determine, for any two classes $C_i$ and $C_j$ in the sample data, whether $C_i$ and $C_j$ are candidate merge classes according to $D_R(C_i, C_j)$, $D_N(C_i, C_j)$ and $d(C_i, C_j)$.
In the disclosure one exemplary embodiment, this first judging unit can comprise: the first judgment sub-unit and determine subelement;
This first judgment sub-unit is configured to, and judges Rank-Order distance D between described class r(C i, C j) whether be less than the first distance threshold, and normalization Rank-Order distance D between described class n(C i, C j) whether be less than 1, and absolute distance d (C between described class i, C j) whether be less than second distance threshold value.
This determines that subelement is configured to, as described D r(C i, C j) be less than described first distance threshold, and described D n(C i, C j) be less than 1, and described d (C i, C j) when being less than described second distance threshold value, determine class C iwith class C jthat candidate merges class.
Alternatively, in another exemplary embodiment of the present disclosure, described first judging unit can also comprise the second judgment sub-unit;
This second judgment sub-unit is configured to, as described D r(C i, C j) be not less than described first distance threshold, or, described D n(C i, C j) be not less than 1, or, described d (C i, C j) when being not less than described second distance threshold value, whether any two classes in judgement sample data except the class judged are that candidate merges class, until whole described sample data all completes judgement.
The iteration merging unit 150 is configured as follows. When any two classes C_i and C_j are candidate merge classes, it merges the candidate merge classes and updates the number of classes. When the number of classes after the update is smaller than the number before the update, it returns to the step of obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j. This repeats until the number of classes after the update equals the number before the update, which yields the clustering result. The clustering result comprises classes containing multiple objects as well as single objects.
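The outer loop of the iteration merging unit can be sketched as repeated merge rounds that stop once the class count no longer drops. Several details here are assumptions. A class is represented as a list of object indices. The candidate test is a stand-in callable; in the described method it would evaluate D_R, D_N and d. Each class joins at most one merge per round, so chained candidates are merged over later rounds. All names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Set

Cluster = List[int]  # a class, stored as indices of its objects (assumed representation)


def merge_round(clusters: List[Cluster],
                is_candidate: Callable[[Cluster, Cluster], bool]) -> List[Cluster]:
    """Judge all pairs, then merge disjoint candidate pairs; other classes are kept."""
    pairs = [(i, j) for i, j in combinations(range(len(clusters)), 2)
             if is_candidate(clusters[i], clusters[j])]
    used: Set[int] = set()
    merged: List[Cluster] = []
    for i, j in pairs:
        if i in used or j in used:
            continue  # each class takes part in at most one merge per round
        merged.append(clusters[i] + clusters[j])
        used.update((i, j))
    # classes not merged in this round are carried over unchanged
    merged.extend(c for k, c in enumerate(clusters) if k not in used)
    return merged


def iterative_merge(clusters: List[Cluster],
                    is_candidate: Callable[[Cluster, Cluster], bool]) -> List[Cluster]:
    """Repeat merge rounds until the number of classes stops decreasing."""
    current = [list(c) for c in clusters]
    while True:
        updated = merge_round(current, is_candidate)
        if len(updated) == len(current):  # count unchanged: clustering result reached
            return updated
        current = updated
```

Consider 1-D objects at 0.0, 0.5, 1.0 and 10.0, with a stand-in test of "minimum gap below 0.6". Starting from singletons, the loop ends with `[[0, 1, 2], [3]]`: one multi-object class plus one single object, matching the shape of the clustering result described above.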
In an exemplary embodiment of the present disclosure, the iteration merging unit 150 comprises a merging subunit.
The merging subunit is configured to merge the candidate merge classes pairwise, after all candidate merge classes in the sample data have been determined, until no candidate merge classes remain.
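The source does not say how chained candidate pairs, such as C_1 with C_2 and C_2 with C_3, are resolved. One possible reading of "merge the candidate merge classes pairwise until no candidate merge classes remain" is to merge all candidate pairs transitively with a union-find (disjoint-set) structure, sketched below. This reading is an assumption. Transitive merging can join two classes that were never judged against each other directly, which weakens the per-pair absolute-distance check.

```python
from typing import Dict, List, Tuple


def merge_candidate_pairs(num_classes: int,
                          pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge every candidate pair (transitively); returns groups of original class indices."""
    parent = list(range(num_classes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # attach one root under the other: the two classes merge

    groups: Dict[int, List[int]] = {}
    for k in range(num_classes):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())
```

For five classes with candidate pairs (0, 1), (1, 2) and (3, 4), the result is `[[0, 1, 2], [3, 4]]`. After this pass, no candidate pair joins two different groups.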
The clustering apparatus of this embodiment iteratively merges the classes in the sample data that satisfy the conditions on three distances: the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j). D_R(C_i, C_j) and D_N(C_i, C_j) measure the neighbourhood relationship between two classes, while d(C_i, C_j) measures their absolute distance relationship. The absolute distance gives an accurate measure of the direct similarity between two classes. It can therefore separate out classes with low similarity, that is, identify the outliers within a class. Low-similarity outliers are thus removed during clustering, which improves the accuracy of the clustering result. The improvement is greatest when the sample data contains many objects but only a few objects belong to any one class.
For the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiment and is not repeated here.
Fig. 4 is a block diagram of an apparatus 800 for the clustering method according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant or the like.
As shown in Fig. 4, the apparatus 800 may comprise one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814 and a communication component 816.
The processing component 802 generally controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may comprise one or more processors 820 that execute instructions to perform all or some of the steps of the method described above. In addition, the processing component 802 may comprise one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may comprise a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the apparatus 800. Examples of such data include instructions for any application or method running on the apparatus 800, contact data, phonebook data, messages, pictures and videos. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. Examples are static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.
The power component 806 supplies power to the various components of the apparatus 800. The power component 806 may comprise a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the apparatus 800.
The multimedia component 808 comprises a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may comprise a liquid crystal display (LCD) and a touch panel (TP). If the screen comprises a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel comprises one or more touch sensors that sense touches, swipes and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also its duration and pressure. In some embodiments, the multimedia component 808 comprises a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or may have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 comprises a microphone (MIC). The microphone is configured to receive external audio signals when the apparatus 800 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further comprises a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, such as a keyboard, a click wheel or buttons. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.
The sensor component 814 comprises one or more sensors that provide status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect:
- a change in position of the apparatus 800 or of one of its components,
- the presence or absence of user contact with the apparatus 800,
- the orientation or acceleration/deceleration of the apparatus 800,
- temperature changes of the apparatus 800.
The sensor component 814 may comprise a proximity sensor configured to detect the presence of nearby objects without any physical contact. It may also comprise an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further comprise an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further comprises a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), Bluetooth (BT) and other technologies.
In exemplary embodiments, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components to perform the method described above.
In exemplary embodiments, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804. The instructions can be executed by the processor 820 of the apparatus 800 to perform the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device or the like.
A non-transitory computer-readable storage medium is provided whose instructions, when executed by a processor of a terminal, enable the terminal to perform a clustering method comprising:
Obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j in the sample data;
For any two classes C_i and C_j in the sample data, judging whether the two classes C_i and C_j are candidate merge classes according to the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j);
When any two classes C_i and C_j are candidate merge classes, merging the candidate merge classes and updating the number of classes;
When the number of classes after the update is smaller than the number before the update, returning to the step of obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j, until the number of classes after the update equals the number before the update, thereby obtaining a clustering result.
Fig. 5 is a block diagram of an apparatus 1900 for the clustering method according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. As shown in Fig. 5, the apparatus 1900 comprises a processing component 1922, which further comprises one or more processors. It also comprises memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may comprise one or more modules, each corresponding to a set of instructions. The processing component 1922 is configured to execute the instructions to perform the method embodiment shown in Fig. 2 above.
The apparatus 1900 may further comprise a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may run an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practising the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the invention that follow its general principles, including common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A clustering method, characterized by comprising:
Obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j in sample data;
For any two classes C_i and C_j in the sample data, judging whether the two classes C_i and C_j are candidate merge classes according to the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j);
When any two classes C_i and C_j are candidate merge classes, merging the candidate merge classes and updating the number of classes;
When the number of classes after the update is smaller than the number before the update, returning to the step of obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j, until the number of classes after the update equals the number before the update, thereby obtaining a clustering result.
2. The method according to claim 1, characterized in that the step of judging, for any two classes C_i and C_j, whether the two classes C_i and C_j are candidate merge classes according to the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j) is performed as follows:
Judging whether the inter-class Rank-Order distance D_R(C_i, C_j) is less than a first distance threshold, whether the inter-class normalized Rank-Order distance D_N(C_i, C_j) is less than 1, and whether the inter-class absolute distance d(C_i, C_j) is less than a second distance threshold;
When D_R(C_i, C_j) is less than the first distance threshold, D_N(C_i, C_j) is less than 1 and d(C_i, C_j) is less than the second distance threshold, determining that class C_i and class C_j are candidate merge classes.
3. The method according to claim 2, characterized in that the method further comprises: when D_R(C_i, C_j) is not less than the first distance threshold, or D_N(C_i, C_j) is not less than 1, or d(C_i, C_j) is not less than the second distance threshold, judging whether any other two classes in the sample data, apart from the classes already judged, are candidate merge classes, until the whole of the sample data has been judged.
4. The method according to any one of claims 1-3, characterized in that the inter-class absolute distance comprises the average distance between the two classes or the minimum distance between the two classes.
5. The method according to claim 1, characterized in that the step of merging the candidate merge classes when any two classes C_i and C_j are candidate merge classes is performed as follows:
After all candidate merge classes in the sample data have been determined, merging the candidate merge classes pairwise until no candidate merge classes remain.
6. A clustering apparatus, characterized by comprising:
A first acquiring unit, configured to obtain the inter-class Rank-Order distance D_R(C_i, C_j) between any two classes C_i and C_j in sample data;
A second acquiring unit, configured to obtain the inter-class absolute distance d(C_i, C_j) between any two classes C_i and C_j in the sample data;
A third acquiring unit, configured to obtain the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j in the sample data;
A first judging unit, configured to judge, for any two classes C_i and C_j in the sample data, whether the two classes C_i and C_j are candidate merge classes according to the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j);
An iteration merging unit, configured to merge the candidate merge classes and update the number of classes when any two classes C_i and C_j are candidate merge classes. When the number of classes after the update is smaller than the number before the update, the iteration merging unit returns to the step of obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j, until the number of classes after the update equals the number before the update, thereby obtaining a clustering result.
7. The apparatus according to claim 6, characterized in that the first judging unit comprises:
A first judging subunit, configured to judge whether the inter-class Rank-Order distance D_R(C_i, C_j) is less than a first distance threshold, whether the inter-class normalized Rank-Order distance D_N(C_i, C_j) is less than 1, and whether the inter-class absolute distance d(C_i, C_j) is less than a second distance threshold;
A determining subunit, configured to determine that class C_i and class C_j are candidate merge classes when D_R(C_i, C_j) is less than the first distance threshold, D_N(C_i, C_j) is less than 1 and d(C_i, C_j) is less than the second distance threshold.
8. The apparatus according to claim 7, characterized in that the first judging unit further comprises:
A second judging subunit, configured to act when D_R(C_i, C_j) is not less than the first distance threshold, or D_N(C_i, C_j) is not less than 1, or d(C_i, C_j) is not less than the second distance threshold. In that case the second judging subunit judges whether any other two classes in the sample data, apart from the classes already judged, are candidate merge classes, until the whole of the sample data has been judged.
9. The apparatus according to claim 6, characterized in that the iteration merging unit comprises:
A merging subunit, configured to merge the candidate merge classes pairwise, after all candidate merge classes in the sample data have been determined, until no candidate merge classes remain.
10. A terminal, characterized by comprising:
A processor;
A memory for storing instructions executable by the processor;
Wherein the processor is configured to:
Obtain the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j in sample data;
For any two classes C_i and C_j in the sample data, judge whether the two classes C_i and C_j are candidate merge classes according to the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class normalized Rank-Order distance D_N(C_i, C_j) and the inter-class absolute distance d(C_i, C_j);
When any two classes C_i and C_j are candidate merge classes, merge the candidate merge classes and update the number of classes;
When the number of classes after the update is smaller than the number before the update, return to the step of obtaining the inter-class Rank-Order distance D_R(C_i, C_j), the inter-class absolute distance d(C_i, C_j) and the inter-class normalized Rank-Order distance D_N(C_i, C_j) between any two classes C_i and C_j, until the number of classes after the update equals the number before the update, thereby obtaining a clustering result.
CN201410432412.2A 2014-08-28 2014-08-28 Clustering method and clustering device Pending CN104268149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410432412.2A CN104268149A (en) 2014-08-28 2014-08-28 Clustering method and clustering device

Publications (1)

Publication Number Publication Date
CN104268149A 2015-01-07

Family

ID=52159671

Country Status (1)

Country Link
CN (1) CN104268149A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120029454A1 (en) * 2010-07-27 2012-02-02 Wenbin Li Absorbent Articles with Printed Graphics Thereon Providing A Three-Dimensional Appearance
TW201407390A (en) * 2012-08-15 2014-02-16 Acer Inc Data clustering apparatus and method
CN103902689A (en) * 2014-03-26 2014-07-02 小米科技有限责任公司 Clustering method, incremental clustering method and related device
CN103914518A (en) * 2014-03-14 2014-07-09 小米科技有限责任公司 Clustering method and clustering device

Non-Patent Citations (1)

Title
CHUNHUI ZHU ET AL.: "A Rank-Order Distance based Clustering Algorithm for Face Tagging", CVPR 2011 *

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN105426878A (en) * 2015-12-22 2016-03-23 小米科技有限责任公司 Method and device for face clustering
CN105426878B (en) * 2015-12-22 2019-05-21 小米科技有限责任公司 Face cluster method and device
CN107679052A (en) * 2016-06-09 2018-02-09 株式会社岛津制作所 Big data analysis method and the mass spectrometry system that make use of the analysis method
CN107679052B (en) * 2016-06-09 2021-09-14 株式会社岛津制作所 Big data analysis method and mass spectrometry system using the same
CN106228188A (en) * 2016-07-22 2016-12-14 北京市商汤科技开发有限公司 Clustering method, device and electronic equipment
CN106228188B (en) * 2016-07-22 2020-09-08 北京市商汤科技开发有限公司 Clustering method and device and electronic equipment
US11080306B2 (en) 2016-07-22 2021-08-03 Beijing Sensetime Technology Development Co., Ltd. Method and apparatus and electronic device for clustering
CN108763462A (en) * 2018-05-28 2018-11-06 深圳前海微众银行股份有限公司 Update method, equipment and the computer readable storage medium of parallel statement library
CN108763462B (en) * 2018-05-28 2021-11-12 深圳前海微众银行股份有限公司 Method and device for updating parallel sentence library and computer readable storage medium
CN110414429A (en) * 2019-07-29 2019-11-05 佳都新太科技股份有限公司 Face cluster method, apparatus, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103914518A (en) Clustering method and clustering device
CN104105169B (en) From method and the device of the WLAN (wireless local area network) that is dynamically connected
CN103902689A (en) Clustering method, incremental clustering method and related device
CN104156947A (en) Image segmentation method, mechanism and device
CN105488112A (en) Information pushing method and device
CN105224349A (en) The deletion reminding method of application program and device
CN104408402A (en) Face identification method and apparatus
CN105160320B (en) Fingerprint identification method, device and mobile terminal
CN103944804A (en) Contact recommending method and device
CN104850852A (en) Feature vector calculation method and device
CN104268149A (en) Clustering method and clustering device
CN105069089A (en) Picture detection method and device
CN105279499A (en) Age recognition method and device
CN104408404A (en) Face identification method and apparatus
CN105808050A (en) Information search method and device
CN105426878A (en) Method and device for face clustering
CN104268129A (en) Message reply method and message reply device
CN104537380A (en) Clustering method and device
CN104461568A (en) Electronic accessory recognition device and method
CN103927545A (en) Clustering method and device
CN104615663A (en) File sorting method and device and terminal
CN104077563A (en) Human face recognition method and device
CN103886284A (en) Character attribute information identification method and device and electronic device
CN105335684A (en) Face detection method and device
CN104598534A (en) Picture folding method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20150107)