CN114399653A - Fast multi-view discrete clustering method and system based on anchor point diagram - Google Patents

Publication number
CN114399653A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111452689.8A
Other languages
Chinese (zh)
Inventor
张斌
强倩瑶
王飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN202111452689.8A
Publication of CN114399653A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Computing Systems (AREA)

Abstract

The invention discloses a fast multi-view discrete clustering method and system based on an anchor graph, belonging to the field of multi-view image analysis. The method selects a small set of representative images from all original multi-view images as anchor images, constructs, on each image feature, an anchor graph between all original images and the anchor images, and computes a similarity graph from it; during multi-view image clustering, it assigns weights to the anchor graphs built on the different features to measure each feature's contribution to the clustering task; and it solves the original spectral clustering problem directly to compute a discrete multi-view image cluster indicator matrix. The method thereby addresses the high computational complexity of similarity graph construction and of eigendecomposition of the graph Laplacian.

Description

Fast multi-view discrete clustering method and system based on anchor point diagram
Technical Field
The invention belongs to the field of multi-view image analysis, and in particular relates to a fast multi-view discrete clustering method and system based on an anchor graph.
Background
Multi-view clustering is an important technique in multi-view image analysis. A multi-view image is image data represented by several different features. As an effective approach to multi-view image clustering, graph-based multi-view clustering has been widely studied and applied in computer vision.
Most existing graph-based multi-view clustering methods comprise two steps: (1) constructing a similarity graph from the features of each view; and (2) performing eigendecomposition of the graph Laplacian to compute a continuous cluster assignment matrix, and then obtaining a discrete cluster assignment matrix through a post-processing algorithm such as k-means or spectral rotation. Although these methods work well, they have two drawbacks when applied to multi-view image clustering. The first is the high time cost. In practical multi-view image clustering tasks the sample size is usually very large, and the time complexities of conventional similarity graph construction and eigendecomposition are $O(n^2 d)$ and $O(n^3)$ respectively, where $n$ is the number of samples and $d$ is the feature dimension. Such overhead is unacceptable for large-scale multi-view image clustering. The second is that obtaining an interpretable discrete image cluster assignment matrix through a two-stage process may lose multi-view image feature information, so that the clustering process deviates from directly solving the original multi-view image clustering problem.
Many graph-based multi-view clustering methods aim to speed up similarity graph construction or eigendecomposition. For example, constructing an anchor graph or a sparse graph accelerates similarity graph construction, and approximate eigendecomposition accelerates the decomposition step. In general, anchors may be generated by random sampling or by a lightweight clustering method such as k-means. Applied to multi-view image clustering, random sampling picks a small number of images at random from all original images as anchor images, whereas the k-means approach clusters all original images and takes the image at each cluster center as an anchor image. Anchor images generated by k-means are more representative than those chosen by random sampling. However, k-means is far more time-consuming than random sampling and may produce image clusters of unbalanced size. The Balanced k-means-based Hierarchical k-means method (BKHK) first divides all images into two image sub-clusters of balanced size, then applies balanced k-means recursively, level by level, to the sub-clusters until the number of leaf clusters equals the required number of anchor images, and finally takes the center image of each leaf cluster as an anchor. BKHK has been used successfully to accelerate graph-based methods for hashing, clustering, dimensionality reduction, classification, and the like. However, the k-means algorithm still has a drawback: randomly selecting samples as the initial cluster centers makes the clustering result unstable and convergence slow. Among the methods for reducing the cost of eigendecomposition, approximate eigendecomposition is the most common.
Approximate eigendecomposition typically computes pseudo-eigenvectors using random projection and finite-order polynomial expansion, or avoids direct eigendecomposition through truncated power iteration, but these techniques are ineffective on large multi-view image datasets. In addition, among existing graph-based multi-view clustering methods for multi-view image clustering, directly solving for the discrete image cluster label matrix without any post-processing step remains unexplored.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a fast multi-view discrete clustering method and system based on an anchor graph.
To achieve this aim, the invention adopts the following technical solution:
A fast multi-view discrete clustering method based on an anchor graph comprises the following steps:
Step 1: generating m anchor images for n original images using a balanced k-means++-based hierarchical k-means++ method, where m is smaller than n;
Step 2: constructing, on each view feature, an anchor graph between the original images and the anchor images;
Step 3: computing a similarity matrix between the original images based on the anchor graph;
Step 4: assigning a weight coefficient to the similarity graph on each feature to obtain an aggregated graph;
Step 5: constructing the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image cluster label matrix;
Step 6: solving the FMDC objective function by solving the original spectral clustering problem directly, outputting a discrete multi-view image cluster indicator matrix, and completing the multi-view image clustering task.
Further, the anchor images in step 1 are generated as follows:
All original images are first divided by k-means++ into two image sub-clusters of balanced size; balanced k-means++ is then applied recursively to the sub-clusters until m leaf clusters are obtained, and the center images of the m leaf clusters are taken as the anchor images.
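The recursive balanced splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, samples are assumed to be rows of a NumPy array, the number of anchors is restricted to a power of two (`2**depth`), the balanced assignment simply sorts samples by the difference of their distances to the two centers and cuts the sorted list in half, and each anchor is taken as the mean of its leaf cluster rather than an actual image at the cluster center.

```python
import numpy as np

def kmeanspp_two_centers(X, rng):
    """k-means++ seeding for two centers: the second center is sampled with
    probability proportional to its squared distance from the first."""
    first = X[rng.integers(len(X))]
    d2 = ((X - first) ** 2).sum(axis=1)
    total = d2.sum()
    probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
    second = X[rng.choice(len(X), p=probs)]
    return np.stack([first, second])

def balanced_two_means(X, rng, n_iter=10):
    """Split X into two index sets whose sizes differ by at most one."""
    centers = kmeanspp_two_centers(X, rng)
    half = len(X) // 2
    for _ in range(n_iter):
        d0 = ((X - centers[0]) ** 2).sum(axis=1)
        d1 = ((X - centers[1]) ** 2).sum(axis=1)
        order = np.argsort(d0 - d1)          # samples closest to center 0 first
        left, right = order[:half], order[half:]
        centers = np.stack([X[left].mean(axis=0), X[right].mean(axis=0)])
    return left, right

def bkhk_anchors(X, depth, seed=0):
    """Return 2**depth anchors (leaf-cluster means) and the leaf index sets."""
    assert len(X) >= 2 ** depth, "need at least one sample per leaf"
    rng = np.random.default_rng(seed)
    leaves = [np.arange(len(X))]
    for _ in range(depth):
        next_leaves = []
        for idx in leaves:
            left, right = balanced_two_means(X[idx], rng)
            next_leaves += [idx[left], idx[right]]
        leaves = next_leaves
    anchors = np.stack([X[idx].mean(axis=0) for idx in leaves])
    return anchors, leaves
```

Because every split halves a cluster, the tree has depth log2(m), which is why this style of anchor generation is cheaper than running a flat k-means with m centers.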
Further, in step 1, a multi-view image dataset $\{X^1, X^2, X^3, X^4\}$ represented by the four features CENTRIST, wavelet moments, Gabor and LBP is obtained;
The concatenated data $[X^1; X^2; X^3; X^4]$ of the four features is used to generate the concatenated anchor feature data $[U^1; U^2; U^3; U^4]$, which is then cut according to the image feature dimensions $d_1$, $d_2$, $d_3$, $d_4$ of the four views to obtain the multi-view anchor image set $\{U^1, U^2, U^3, U^4\}$;
where
Figure BDA0003385611030000031
denotes the anchors for the CENTRIST feature, and
Figure BDA0003385611030000032
denotes the CENTRIST feature of the j-th anchor image;
Figure BDA0003385611030000033
denotes the anchors for the wavelet moments feature, and
Figure BDA0003385611030000034
denotes the wavelet moments feature of the j-th anchor image;
Figure BDA0003385611030000035
denotes the anchors for the Gabor feature, and
Figure BDA0003385611030000036
denotes the Gabor feature of the j-th anchor image;
Figure BDA0003385611030000037
denotes the anchors for the LBP feature, and
Figure BDA0003385611030000038
denotes the LBP feature of the j-th anchor image;
Figure BDA0003385611030000041
denotes the CENTRIST features, where n is the number of images in the dataset,
Figure BDA0003385611030000042
denotes the CENTRIST feature of the i-th image, $i \in \{1, \ldots, n\}$, and $d_1$ is the dimension of the CENTRIST feature;
Figure BDA0003385611030000043
denotes the wavelet moments features,
Figure BDA0003385611030000044
denotes the wavelet moments feature of the i-th image, and $d_2$ is the dimension of the wavelet moments feature;
Figure BDA0003385611030000045
denotes the Gabor features,
Figure BDA0003385611030000046
denotes the Gabor feature of the i-th image, and $d_3$ is the dimension of the Gabor feature;
Figure BDA0003385611030000047
denotes the LBP features,
Figure BDA0003385611030000048
denotes the LBP feature of the i-th image, and $d_4$ is the dimension of the LBP feature.
Further, step 2 specifically comprises:
An anchor graph is constructed on the v-th feature by solving the minimization problem in equation (1),
Figure BDA0003385611030000049
Figure BDA00033856110300000410
Figure BDA00033856110300000411
where
Figure BDA00033856110300000412
$\|\cdot\|_2$ denotes the $\ell_2$ norm,
Figure BDA00033856110300000413
denotes the similarity between the i-th original image and the j-th anchor image on the v-th feature,
Figure BDA00033856110300000414
denotes the vector of similarities between the i-th image and all m anchor images on the v-th feature; $\gamma$ denotes a regularization parameter, which is set to
Figure BDA00033856110300000415
The solution to the problem in equation (1) is:
Figure BDA00033856110300000416
The anchor graph $Z^v$ is a k-nearest-neighbor graph, i.e.
Figure BDA00033856110300000417
contains k non-zero elements.
Further, step 3 specifically comprises:
After the anchor graph $Z^v$ is obtained, the similarity graph on the v-th feature is computed as
Figure BDA00033856110300000418
$S^v = Z^v (\Lambda^v)^{-1} (Z^v)^T = B^v (B^v)^T$ (3)
where
Figure BDA0003385611030000051
is a diagonal matrix whose j-th diagonal element is
Figure BDA0003385611030000052
The similarity graph $S^v$ computed by equation (3) is symmetric and doubly stochastic.
Further, step 4 specifically comprises:
A weight coefficient $\alpha_v$ is assigned to the similarity graph $S^v$ on each feature:
Figure BDA0003385611030000053
where
Figure BDA0003385611030000054
is the weight vector, and
Figure BDA0003385611030000055
is the aggregated graph, which is symmetric and doubly stochastic.
Further, step 5 specifically comprises:
Taking as the optimization goal the minimization of the difference between the aggregated graph
Figure BDA0003385611030000056
and the similarity graph $Y(Y^TY)^{-1}Y^T$ reconstructed from the image cluster label matrix, the FMDC objective function is constructed:
Figure BDA0003385611030000057
where $\|\cdot\|_F$ denotes the Frobenius norm, $Y = [y_1, \ldots, y_c] \in \{0,1\}^{n \times c}$ is the discrete image cluster indicator matrix, $c$ is the number of classes, $y_l$ ($l \in \{1, \ldots, c\}$) is the cluster indicator vector of the l-th image class, and Ind abbreviates "discrete image cluster indicator matrix".
Further, the FMDC objective function in step 6 is solved as follows:
(601) Fix α and update Y;
When α is fixed, the sub-problem for Y is:
Figure BDA0003385611030000058
where
Figure BDA0003385611030000059
Solving this problem is equivalent to solving the problem in the following equation:
Figure BDA00033856110300000510
Equation (7) can be written in another form:
Figure BDA0003385611030000061
Since
Figure BDA0003385611030000062
involves all rows of Y, Y is updated row by row using a coordinate descent method;
(602) Fix Y and update α;
When Y is fixed, the sub-problem for α is:
Figure BDA0003385611030000063
where
Figure BDA0003385611030000064
Writing
Figure BDA0003385611030000065
and $S^v$ in vectorized form, let
Figure BDA0003385611030000066
Then:
Figure BDA0003385611030000067
Equation (15) is solved using the augmented Lagrange multiplier method;
(603) Steps (601) and (602) are repeated, alternately optimizing α and Y, until the objective function converges.
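The α sub-problem in (602) is a least-squares problem over the probability simplex once every graph is written as a long vector. The sketch below illustrates that vectorized form only. Note that it deliberately swaps in projected gradient descent for the augmented Lagrange multiplier method named in the text, because the latter's update equations are not reproduced here. The names `project_to_simplex` and `update_alpha` are hypothetical, `S_list` holds dense $n \times n$ similarity graphs (suitable for small n only), and `P` stands for the reconstructed graph $Y(Y^TY)^{-1}Y^T$.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                       # sort in decreasing order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def update_alpha(S_list, P, n_iter=500):
    """Minimize ||sum_v alpha_v S_v - P||_F^2 subject to alpha on the simplex."""
    A = np.stack([S.ravel() for S in S_list], axis=1)  # vec(S_v) as columns
    g = P.ravel()                                      # vec(P)
    Q, p = A.T @ A, A.T @ g                            # small V x V problem
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1])     # 1 / Lipschitz constant
    alpha = np.full(len(S_list), 1.0 / len(S_list))    # start from uniform weights
    for _ in range(n_iter):
        grad = 2.0 * (Q @ alpha - p)
        alpha = project_to_simplex(alpha - step * grad)
    return alpha
```

After vectorization the problem only involves the $V \times V$ matrix `Q` and the length-$V$ vector `p`, so each iteration costs almost nothing regardless of n; forming `Q` and `p` is the expensive part.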
Further, in (601), Y is updated row by row using a coordinate descent method, specifically:
When the i-th row $y_i$ of $Y$ is being updated, the c candidate values of $y_i$, from $[1, 0, \ldots, 0]$ to $[0, 0, \ldots, 1]$, are compared, and the value that maximizes the objective function is selected as the optimal solution for $y_i$;
Let $\{Y^{(1)}, \ldots, Y^{(c)}\}$ denote $Y$ when $y_i$ takes each of the c candidate values; the objective function value for the h-th candidate is:
Figure BDA0003385611030000068
The problem to be solved is therefore:
Figure BDA0003385611030000069
Define $Y^{(0)}$ as $Y$ with its i-th row set to $[0, 0, \ldots, 0]$, i.e., all zeros; then:
Figure BDA00033856110300000610
Define $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$; solving the problem in equation (10) is equivalent to solving the following problem:
Figure BDA0003385611030000071
The optimal solution is then:
Figure BDA0003385611030000072
where $y_{ip} = 1$ if the condition in $\langle\cdot\rangle$ is true, and $y_{ip} = 0$ otherwise.
A fast multi-view discrete clustering system based on an anchor graph comprises an anchor image generation module, an anchor graph construction module, a similarity matrix generation module, a weight assignment module, an FMDC objective function construction module, and an FMDC objective function solving module;
The anchor image generation module is configured to generate m representative anchor images for the n original images using the balanced k-means++-based hierarchical k-means++ method;
The anchor graph construction module is configured to construct, on each view feature, an anchor graph between the original images and the anchor images;
The similarity matrix generation module is configured to compute a similarity matrix between the original images based on the anchor graph;
The weight assignment module is configured to assign weight coefficients to the similarity graphs on the features to obtain an aggregated graph;
The FMDC objective function construction module is configured to construct the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image cluster label matrix;
The FMDC objective function solving module is configured to solve the FMDC objective function, output a discrete multi-view image cluster indicator matrix, and complete the multi-view image clustering task.
Compared with the prior art, the invention has the following beneficial effects:
The fast multi-view discrete clustering method and system based on an anchor graph select a small set of representative images from all original multi-view images as anchor images, construct, on each image feature, an anchor graph between all original images and the anchor images, and compute a similarity graph from it; assign weights to the anchor graphs built on the different features during multi-view image clustering to measure each feature's contribution to the clustering task; and solve the original spectral clustering problem directly to compute a discrete multi-view image cluster indicator matrix. The invention addresses two main problems that arise when current graph-based multi-view clustering methods are applied to multi-view image clustering: (1) the high computational complexity of similarity graph construction and of eigendecomposition of the graph Laplacian; and (2) the two-stage approach of first computing a continuous image cluster assignment matrix and then obtaining the final discrete image cluster assignment matrix through a post-processing step, which may lose multi-view image feature information so that the clustering process deviates from directly solving the original multi-view image clustering problem. The invention can complete the multi-view image clustering task with high quality while reducing the computational complexity of graph-based clustering.
Furthermore, the anchor graph is constructed with a parameter-free strategy: a small number of representative anchor images are generated for the images in all views, and the similarity graph is then computed.
Furthermore, the anchor images are kept consistent across the different features of the multiple views.
Furthermore, the invention provides an efficient solution method that computes the discrete cluster indicator matrix for the multi-view images directly.
Furthermore, the FMDC objective function is very compact: it introduces no additional terms and requires no trade-off parameters to be tuned.
Drawings
FIG. 1 is a diagram of the anchor selection structure of the invention;
FIG. 2 is a flowchart of the anchor-graph-based fast multi-view discrete clustering method of embodiment 1;
FIG. 3 is a schematic block diagram of the anchor-graph-based fast multi-view discrete clustering system of embodiment 2.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention aims to provide a Fast Multi-view Discrete Clustering (FMDC) method and system based on an anchor graph, which address two main problems that arise when current graph-based multi-view clustering methods are applied to multi-view image clustering: (1) the high computational complexity of similarity graph construction and of eigendecomposition of the graph Laplacian; and (2) the two-stage approach of first computing a continuous image cluster assignment matrix and then obtaining the final discrete image cluster assignment matrix through a post-processing step, which may lose multi-view image feature information so that the clustering process deviates from directly solving the original multi-view image clustering problem. The method selects a small set of representative images from all original multi-view images as anchor images, constructs, on each image feature, an anchor graph between all original images and the anchor images, and computes a similarity graph from it; automatically assigns weights to the anchor graphs built on the different features during multi-view image clustering to measure each feature's contribution to the clustering task; and solves the original spectral clustering problem directly to compute a discrete multi-view image cluster indicator matrix.
The invention is described in further detail below with reference to the accompanying drawings:
The fast multi-view discrete clustering method based on an anchor graph takes as input four-view image data $\{X^1, X^2, X^3, X^4\}$ represented by the four features CENTRIST, wavelet moments, Gabor and LBP, efficiently selects the anchor image dataset $\{U^1, U^2, U^3, U^4\}$, constructs the anchor graphs $\{Z^1, Z^2, Z^3, Z^4\}$, automatically weights the similarity matrices $\{S^1, S^2, S^3, S^4\}$ on all features, builds a graph-based multi-view clustering model, and solves the multi-view image clustering problem directly. By constructing anchor graphs, a symmetric doubly stochastic similarity graph is computed; the original spectral clustering problem is then solved and a discrete multi-view image cluster indicator matrix is computed directly. The method reduces time and space complexity while preserving multi-view image clustering performance, enabling fast discrete clustering in multi-view image analysis.
Example 1
First, the mathematical notation for multi-view images is defined. A four-view image dataset $\{X^1, X^2, X^3, X^4\}$ represented by the four features CENTRIST, wavelet moments, Gabor and LBP is defined, where
Figure BDA0003385611030000101
(n is the number of images in the dataset) denotes the CENTRIST features,
Figure BDA0003385611030000102
denotes the CENTRIST feature of the i-th image, and $d_1$ is the dimension of the CENTRIST feature;
Figure BDA0003385611030000103
denotes the wavelet moments features,
Figure BDA0003385611030000104
denotes the wavelet moments feature of the i-th image, and $d_2$ is the dimension of the wavelet moments feature;
Figure BDA0003385611030000105
denotes the Gabor features,
Figure BDA0003385611030000106
denotes the Gabor feature of the i-th image, and $d_3$ is the dimension of the Gabor feature;
Figure BDA0003385611030000107
denotes the LBP features,
Figure BDA0003385611030000108
denotes the LBP feature of the i-th image, and $d_4$ is the dimension of the LBP feature.
FIG. 2 is a flowchart of embodiment 1:
Step 1: generate m (m < n) representative anchor images for the n original images. FMDC uses a balanced k-means++-based hierarchical k-means++ method to generate anchor images for the images in all views. FIG. 1 shows the anchor selection structure of the invention: k-means++ is used to cluster the samples hierarchically into sub-clusters with balanced sample counts, and the centers of the leaf clusters are taken as anchors. Specifically, the method first uses k-means++ to divide all original images into two image sub-clusters of balanced size, then applies balanced k-means++ recursively, level by level, to the sub-clusters until m leaf clusters are obtained, and selects the center images of the m leaf clusters as anchor images. To keep the anchor images consistent across the different features of the multiple views, FMDC first uses the concatenated data $[X^1; X^2; X^3; X^4]$ of the four features to generate the concatenated anchor feature data $[U^1; U^2; U^3; U^4]$, and then cuts $[U^1; U^2; U^3; U^4]$ according to the image feature dimensions $d_1$, $d_2$, $d_3$, $d_4$ of the four views to obtain the multi-view anchor image set $\{U^1, U^2, U^3, U^4\}$, where
Figure BDA0003385611030000111
denotes the anchors for the CENTRIST feature, and
Figure BDA0003385611030000112
denotes the CENTRIST feature of the j-th anchor image;
Figure BDA0003385611030000113
denotes the anchors for the wavelet moments feature, and
Figure BDA0003385611030000114
denotes the wavelet moments feature of the j-th anchor image;
Figure BDA0003385611030000115
denotes the anchors for the Gabor feature, and
Figure BDA0003385611030000116
denotes the Gabor feature of the j-th anchor image;
Figure BDA0003385611030000117
denotes the anchors for the LBP feature, and
Figure BDA0003385611030000118
denotes the LBP feature of the j-th anchor image.
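The concatenate-then-cut step that keeps anchors consistent across views can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: samples are stored as rows (so concatenation is along columns), the dimensions in `dims` are made-up values standing in for $d_1, \ldots, d_4$, and the first five rows of the concatenated data are used as a stand-in for anchors produced by the hierarchical clustering. The function name `split_anchors_by_view` is hypothetical.

```python
import numpy as np

def split_anchors_by_view(U_concat, dims):
    """Cut concatenated anchors (m x sum(dims)) back into per-view blocks."""
    boundaries = np.cumsum(dims)[:-1]          # column indices where views end
    return np.split(U_concat, boundaries, axis=1)

rng = np.random.default_rng(0)
dims = [6, 4, 5, 3]                            # hypothetical d_1, ..., d_4
views = [rng.normal(size=(20, d)) for d in dims]   # n = 20 images per view
X_concat = np.hstack(views)                    # n x (d_1 + d_2 + d_3 + d_4)
U_concat = X_concat[:5]                        # stand-in for m = 5 anchors
U_views = split_anchors_by_view(U_concat, dims)
```

Because the anchors are chosen once on the concatenated features, the j-th row of every per-view block refers to the same anchor image, which is the consistency property the step is meant to guarantee.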
Step 2: construct an anchor graph between the original images and the anchor images. An anchor graph is constructed on the v-th ($v \in \{1, 2, 3, 4\}$) feature by solving the following minimization problem:
Figure BDA0003385611030000119
Figure BDA00033856110300001110
where
Figure BDA00033856110300001111
$\|\cdot\|_2$ denotes the $\ell_2$ norm,
Figure BDA00033856110300001112
denotes the similarity between the i-th original image and the j-th anchor image on the v-th feature, and
Figure BDA00033856110300001113
denotes the vector of similarities between the i-th image and all m anchor images on the v-th feature. $\gamma$ denotes a regularization parameter, which can be set to
Figure BDA0003385611030000121
The solution to the problem in equation (1) is:
Figure BDA0003385611030000122
The anchor graph $Z^v$ is a k-nearest-neighbor graph, i.e.
Figure BDA0003385611030000123
contains k non-zero elements.
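Since the equation images for (1) and (2) are not reproduced in this text, the closed form used in the sketch below is an assumption: it is the parameter-free k-nearest-neighbor weighting commonly used in the anchor graph literature, in which the weight of the j-th nearest anchor is $(d_{k+1} - d_j) / (k\,d_{k+1} - \sum_{j' \le k} d_{j'})$ with $d_j$ the squared Euclidean distance to the j-th nearest anchor. It matches the properties stated above (non-negative similarities, exactly k non-zeros per row), but it may differ in detail from the patent's formula. The function name `anchor_graph` is hypothetical.

```python
import numpy as np

def anchor_graph(X, U, k):
    """Row-stochastic n x m anchor graph with k non-zero entries per row.

    X: n x d samples, U: m x d anchors, k < m nearest anchors per sample.
    """
    # Squared Euclidean distance between every sample and every anchor.
    d = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
    n, m = d.shape
    idx = np.argsort(d, axis=1)[:, :k + 1]           # k+1 nearest anchors
    d_sorted = np.take_along_axis(d, idx, axis=1)
    d_k1 = d_sorted[:, k:k + 1]                      # distance to the (k+1)-th
    num = d_k1 - d_sorted[:, :k]
    den = k * d_k1 - d_sorted[:, :k].sum(axis=1, keepdims=True)
    # Fall back to uniform weights if all k+1 distances coincide.
    weights = np.where(den > 1e-12, num / np.maximum(den, 1e-12), 1.0 / k)
    Z = np.zeros((n, m))
    np.put_along_axis(Z, idx[:, :k], weights, axis=1)
    return Z
```

The distance computation costs O(nmd), compared with O(n²d) for a full sample-to-sample graph, which is where the saving from using m ≪ n anchors comes from.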
Step 3: compute a similarity matrix between the original images. After the anchor graph $Z^v$ is obtained, the similarity graph on the v-th feature is computed as
Figure BDA0003385611030000124
$S^v = Z^v (\Lambda^v)^{-1} (Z^v)^T = B^v (B^v)^T$ (3)
where
Figure BDA0003385611030000125
is a diagonal matrix whose j-th diagonal element is
Figure BDA0003385611030000126
The similarity graph $S^v$ computed by equation (3) is symmetric and doubly stochastic.
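The factorization in equation (3) means the $n \times n$ similarity graph never has to be stored: the $n \times m$ factor $B^v$ is enough. The sketch below assumes the usual anchor graph convention that the j-th diagonal element of the diagonal matrix is the j-th column sum of $Z^v$, so that $B^v = Z^v (\Lambda^v)^{-1/2}$; the equation image defining that element is not reproduced in this text, so treat this as an assumption. The function name `low_rank_factor` is hypothetical.

```python
import numpy as np

def low_rank_factor(Z):
    """Return B = Z Lambda^{-1/2}, so that B @ B.T = Z Lambda^{-1} Z^T."""
    col_sums = Z.sum(axis=0)                      # assumed diagonal of Lambda
    # Guard against anchors that no sample selected (zero columns).
    return Z / np.sqrt(np.maximum(col_sums, 1e-12))
```

With a row-stochastic, non-negative `Z`, the product `B @ B.T` is symmetric with unit row sums, which is the doubly stochastic property stated above; forming it explicitly is only sensible for small n.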
Step 4: fuse the similarity graphs on the multiple features. Considering that different image features contribute differently, the invention automatically assigns a weight coefficient $\alpha_v$ to the similarity graph $S^v$ on each feature:
Figure BDA0003385611030000127
where
Figure BDA0003385611030000128
is the weight vector.
Figure BDA0003385611030000129
can be regarded as an aggregated graph.
Figure BDA00033856110300001210
is also symmetric and doubly stochastic.
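A weighted sum of factorized graphs is itself factorized, which the following sketch makes concrete. It assumes each view's similarity graph is available as $S^v = B^v (B^v)^T$ and that the weights lie on the probability simplex (non-negative, summing to one); the function name `aggregate_factor` is hypothetical.

```python
import numpy as np

def aggregate_factor(B_list, alpha):
    """Return B_hat with B_hat @ B_hat.T == sum_v alpha[v] * B_v @ B_v.T."""
    alpha = np.asarray(alpha, dtype=float)
    assert np.all(alpha >= 0) and np.isclose(alpha.sum(), 1.0)
    # Scaling each factor by sqrt(alpha_v) and stacking columns gives the sum.
    return np.hstack([np.sqrt(a) * B for a, B in zip(alpha, B_list)])
```

A convex combination of symmetric doubly stochastic matrices is again symmetric and doubly stochastic, which is why the aggregated graph inherits the property from the per-view graphs.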
Step 5: taking as the optimization goal the minimization of the difference between the aggregated graph
Figure BDA00033856110300001211
and the similarity graph $Y(Y^TY)^{-1}Y^T$ reconstructed from the image cluster label matrix, the FMDC objective function is constructed:
Figure BDA00033856110300001212
where $\|\cdot\|_F$ denotes the Frobenius norm, $Y = [y_1, \ldots, y_c] \in \{0,1\}^{n \times c}$ is the discrete image cluster indicator matrix, $c$ is the number of classes, $y_l$ ($l \in \{1, \ldots, c\}$) is the cluster indicator vector of the l-th image class, and Ind abbreviates "discrete image cluster indicator matrix". The objective function in equation (5) smooths the weight distribution and avoids the trivial solution for α in which one similarity graph receives weight 1 and all others receive weight 0.
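To make the objective concrete, the sketch below evaluates the squared Frobenius distance between an aggregated graph and the graph reconstructed from a label vector. It is a dense, small-scale illustration only: `S_hat` is a full $n \times n$ array, `labels` holds integer class indices in `0..c-1`, every class is assumed non-empty, and the function name `fmdc_objective` is hypothetical.

```python
import numpy as np

def fmdc_objective(S_hat, labels, c):
    """Squared Frobenius distance between S_hat and Y (Y^T Y)^{-1} Y^T."""
    Y = np.eye(c)[labels]              # n x c one-hot indicator matrix
    counts = Y.sum(axis=0)             # Y^T Y is diag(counts)
    P = (Y / counts) @ Y.T             # reconstructed block-constant graph
    return np.linalg.norm(S_hat - P, "fro") ** 2
```

The reconstructed graph has entry $1/n_l$ for two images in the same class of size $n_l$ and 0 otherwise, so the objective is zero exactly when the aggregated graph already has that ideal block structure.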
Step 6: Design an optimization method to solve the FMDC objective function.
The invention uses an iterative alternating optimization method to solve the objective function in equation (5), as follows:
(601) Fix α and update Y.
When α is fixed, the sub-problem for Y is:
Figure BDA0003385611030000131
where
Figure BDA0003385611030000132
Solving this problem is equivalent to solving the following problem:
Figure BDA0003385611030000133
Equation (7) can be written in another form:
Figure BDA0003385611030000134
Because
Figure BDA0003385611030000135
involves all rows of Y, the invention updates Y row by row using a coordinate descent method.
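The step from the minimization in the Y sub-problem to an equivalent maximization can be verified with a short derivation. The sketch below writes $S$ for the aggregated graph and $P$ for the reconstructed graph (both symbols are introduced here only for illustration), assumes a squared Frobenius norm, a symmetric $S$ and no empty clusters, and may differ in form from the equations in the original figures.

```latex
\begin{aligned}
P &= Y (Y^T Y)^{-1} Y^T, \qquad P^T = P,\quad P^2 = P,\quad \operatorname{Tr}(P) = c,\\
\|S - P\|_F^2 &= \operatorname{Tr}(S^T S) - 2\operatorname{Tr}(S P) + \operatorname{Tr}(P^2)
              = \operatorname{Tr}(S^T S) - 2\operatorname{Tr}(S P) + c,\\
\min_{Y \in \mathrm{Ind}} \|S - P\|_F^2
  &\iff \max_{Y \in \mathrm{Ind}} \operatorname{Tr}\!\big((Y^T Y)^{-1} Y^T S Y\big)
  = \max_{Y \in \mathrm{Ind}} \sum_{l=1}^{c} \frac{y_l^T S y_l}{y_l^T y_l}.
\end{aligned}
```

The last equality holds because $Y^T Y$ is diagonal with entries $y_l^T y_l$ (the cluster sizes), since each row of $Y$ contains exactly one 1.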
Specifically, suppose the update has reached the i-th row $y_i$ of Y. To determine $y_i$, compare the objective values obtained when $y_i$ takes each of the c candidate values [1,0,…,0], …, [0,0,…,1], and select the value that maximizes the objective function as the optimal $y_i$. For convenience, denote by $\{Y^{(1)}, \ldots, Y^{(c)}\}$ the matrices Y obtained when $y_i$ takes each of the c values; the objective value for the h-th ($h \in 1, \ldots, c$) candidate is:
Figure BDA0003385611030000136
Therefore, the problem to be solved is:
Figure BDA0003385611030000137
Define $Y^{(0)}$ as the matrix Y with its i-th row set to [0,0,…,0], i.e., all zeros; then:
Figure BDA0003385611030000138
Define $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$. Solving the problem in equation (10) is equivalent to solving the following problem:
Figure BDA0003385611030000141
The optimal solution is then:
Figure BDA0003385611030000142
where $y_{ip} = 1$ if the condition inside $\langle\cdot\rangle$ is true, and $y_{ip} = 0$ otherwise.
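The row-wise update can be sketched as follows. This is an illustrative implementation under stated assumptions, not the patent's exact procedure: it maximizes the sum over clusters of $y_l^T S y_l / y_l^T y_l$ for a symmetric graph `S`, recomputes the per-cluster terms from scratch for clarity rather than speed, and skips a row whenever moving it would empty its cluster. The name `update_labels` and the toy graph are hypothetical.

```python
import numpy as np

def update_labels(S, labels, c, n_sweeps=20):
    """Row-wise coordinate ascent over one-hot rows of Y."""
    labels = np.asarray(labels).copy()
    Y = np.eye(c)[labels]                       # n-by-c one-hot indicator matrix
    for _ in range(n_sweeps):
        changed = False
        for i in range(len(labels)):
            p = labels[i]
            if Y[:, p].sum() <= 1:              # keep every cluster non-empty
                continue
            Y[i, p] = 0.0                       # Y now plays the role of Y^(0)
            quad = (Y * (S @ Y)).sum(axis=0)    # y_l^T S y_l for each cluster l
            size = Y.sum(axis=0)                # y_l^T y_l for each cluster l
            cross = S[i] @ Y                    # s_i^T y_l for each cluster l
            old = np.divide(quad, size, out=np.zeros(c), where=size > 0)
            new = (quad + 2.0 * cross + S[i, i]) / (size + 1.0)
            h = int(np.argmax(new - old))       # Delta(h) = obj(Y^(h)) - obj(Y^(0))
            Y[i, h] = 1.0
            if h != p:
                labels[i], changed = h, True
        if not changed:
            break
    return labels

# Toy graph with two clean blocks {0, 1} and {2, 3}; image 2 starts in the wrong cluster.
S = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
result = update_labels(S, [0, 0, 0, 1], c=2)
```

Since the current assignment is always among the candidates, each row update cannot decrease the objective, which is what makes the sweep safe to repeat until no label changes.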
(602) Fix Y and update α.
When Y is fixed, the sub-problem for α is:
Figure BDA0003385611030000143
where
Figure BDA0003385611030000144
Writing
Figure BDA0003385611030000145
and $S^v$ in long-vector form, let
Figure BDA0003385611030000146
Then:
Figure BDA0003385611030000147
The problem in equation (15) is a standard quadratic programming problem that can be solved directly by the augmented Lagrange multiplier method.
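For illustration only, the sketch below solves a small instance of this weight sub-problem with projected gradient descent onto the probability simplex. This is a swapped-in solver chosen for brevity, not the augmented Lagrange multiplier method named in the text; it also assumes the sub-problem is a least-squares fit of the vectorized reconstructed graph by a simplex-weighted combination of the vectorized similarity graphs. All function names are hypothetical.

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection of x onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, x.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(x - css[rho] / (rho + 1), 0.0)

def update_alpha(S_list, P, n_iter=500):
    """Minimize || sum_v alpha_v vec(S^v) - vec(P) ||^2 over the simplex."""
    A = np.stack([S.ravel() for S in S_list], axis=1)   # columns are vectorized graphs
    G, b = A.T @ A, A.T @ P.ravel()
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G).max())    # 1 / Lipschitz constant of the gradient
    alpha = np.full(len(S_list), 1.0 / len(S_list))     # start from uniform weights
    for _ in range(n_iter):
        alpha = project_simplex(alpha - step * 2.0 * (G @ alpha - b))
    return alpha

S1 = np.array([[0.5, 0.5, 0.0],
               [0.5, 0.5, 0.0],
               [0.0, 0.0, 1.0]])
S2 = np.array([[0.6, 0.2, 0.2],
               [0.2, 0.6, 0.2],
               [0.2, 0.2, 0.6]])
alpha = update_alpha([S1, S2], P=S1)   # the first graph already equals the target
```

In this toy case the first graph matches the target exactly, so the weights move toward (1, 0); with real data the quadratic term typically keeps several weights non-zero.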
(603) Repeat (601) and (602), alternately optimizing Y and α, until the objective function converges.
Step 7: Output the discrete multi-view image cluster indicator matrix Y, and compute the clustering metrics ACC (accuracy), NMI (normalized mutual information) and Purity.
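Two of these metrics can be sketched with the standard library alone. The code below is an illustrative implementation under common definitions: Purity counts the majority true class in each cluster, and ACC takes the best one-to-one mapping from cluster ids to class ids. The brute-force search over mappings is only practical for a small number of clusters, NMI is omitted for brevity, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations

def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings of cluster ids to class ids (brute force)."""
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    assert len(clusters) <= len(classes)
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for t, p in zip(y_true, y_pred)))
    return best / len(y_true)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [1, 1, 0, 0, 0, 2]   # cluster ids are arbitrary; one image is misplaced
pur = purity(y_true, y_pred)
acc = clustering_accuracy(y_true, y_pred)
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm is the usual replacement for the permutation loop.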
Simulation experiments were carried out using the anchor-graph-based fast multi-view discrete clustering method.
In this example, 15883 images from 10 classes of the NUS-WIDE dataset were clustered. Each image is represented by five low-level features, including a 64-dimensional color histogram, a 144-dimensional color correlogram, a 73-dimensional edge direction histogram, and a 128-dimensional wavelet texture.
In this example, MLHR (Multi-feature Learning via Hierarchical Regression), SMGI (Sparse Multiple Graph Integration), AMGL (Auto-weighted Multiple Graph Learning), MLAN (Multi-view Learning with Adaptive Neighbors), FMSSL-K (FMSSL with anchors selected by k-means), FMSSL-R (FMSSL with randomly selected anchors) and FMSSL are used for clustering, and ACC (accuracy), NMI (normalized mutual information) and Purity are used as evaluation metrics. The comparison results are shown in Table 1 below, and the run-time comparison is shown in Table 2 below.
TABLE 1 Comparison results
Evaluation index  Cotrain  CoregSC  SwMC    MVGL    MLAN    OMSC    LMVSC   GMC     FMDC
ACC               0.5845   0.5926   0.5760  0.7335  0.5886  0.5005  0.4669  0.7437  0.7389
NMI               0.5112   0.5057   0.5817  0.6152  0.6168  0.3927  0.4391  0.6339  0.6387
Purity            0.5761   0.5925   0.5782  0.7335  0.6302  0.5011  0.5987  0.7437  0.7392
TABLE 2 run time comparison results (units: seconds)
Method MNIST
Cotrain 6756.06
CoregSC 2599.33
SwMC 96021.83
MVGL 34901.54
MLAN 1249.61
LMVSC 3926.07
GMC 1268.18
FMDC 263.82
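As can be seen from Tables 1 and 2, the invention (FMDC) achieves the highest NMI (0.6387), ACC and Purity within 0.005 of the best comparison method (GMC), and the shortest run time (263.82 s, roughly 4.7 times faster than the next fastest method, MLAN, at 1249.61 s). The simulation experiment thus supports the effectiveness of the invention.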
Example 2
A fast multi-view discrete clustering system based on an anchor graph, which implements the anchor-graph-based fast multi-view discrete clustering method described above. Referring to FIG. 3, which is a schematic block diagram of the system, the system comprises an anchor image generation module, an anchor graph construction module, a similarity matrix generation module, a weight assignment module, an FMDC objective function construction module and an FMDC objective function solving module;
the anchor image generation module is used for generating m representative anchor images for the n original images using a hierarchical k-means++ method built on balanced k-means++;
the anchor graph construction module is used for constructing, on each view feature, an anchor graph between the original images and the anchor images;
the similarity matrix generation module is used for calculating the similarity matrix between the original images based on the anchor graph;
the weight assignment module is used for assigning weight coefficients to the similarity graphs on the features to obtain an aggregated graph;
the FMDC objective function construction module is used for constructing the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image cluster label matrix;
and the FMDC objective function solving module is used for solving the FMDC objective function, outputting the discrete multi-view image cluster indicator matrix and completing the multi-view image clustering task.
The above merely illustrates the technical idea of the invention and does not limit its scope of protection; any modification made on the basis of this technical idea falls within the scope of protection of the claims of the invention.

Claims (10)

1. A fast multi-view discrete clustering method based on an anchor point diagram is characterized by comprising the following steps:
step 1: generating m anchor images for n original images using a hierarchical k-means++ method built on balanced k-means++, wherein m is smaller than n;
step 2: constructing, on each view feature, an anchor graph (anchor point diagram) between the original images and the anchor images;
step 3: calculating the similarity matrix between the original images based on the anchor graph;
step 4: assigning a weight coefficient to the similarity graph on each feature to obtain an aggregated graph;
step 5: constructing the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image cluster label matrix;
step 6: solving the FMDC objective function by solving the original spectral clustering problem, outputting the discrete multi-view image cluster indicator matrix and completing the multi-view image clustering task.
2. The anchor point diagram-based fast multi-view discrete clustering method according to claim 1, wherein the anchor point image generation process in step 1 is:
dividing all original images into two image sub-clusters of balanced size using k-means++, then continuing to apply balanced k-means++ to each sub-cluster until m leaf clusters are obtained, and taking the class-center images of the m leaf clusters as the anchor images.
3. The anchor point diagram-based fast multi-view discrete clustering method according to claim 2, wherein in step 1 a multi-view image dataset {X1, X2, X3, X4} represented by four features, CENTRIST, wavelet moments, Gabor and LBP, is obtained;
concatenated anchor feature data [U1; U2; U3; U4] are generated from the concatenated data [X1; X2; X3; X4] of the four features; [U1; U2; U3; U4] is then split according to the dimensions d1, d2, d3, d4 of the image features in the four views to obtain the multi-view anchor image set {U1, U2, U3, U4};
wherein
Figure FDA0003385611020000021
denotes the anchors of the CENTRIST feature,
Figure FDA0003385611020000022
denotes the CENTRIST feature of the j-th anchor image;
Figure FDA0003385611020000023
denotes the anchors of the wavelet moments feature,
Figure FDA0003385611020000024
denotes the wavelet moments feature of the j-th anchor image;
Figure FDA0003385611020000025
denotes the anchors of the Gabor feature,
Figure FDA0003385611020000026
denotes the Gabor feature of the j-th anchor image;
Figure FDA0003385611020000027
denotes the anchors of the LBP feature,
Figure FDA0003385611020000028
denotes the LBP feature of the j-th anchor image;
Figure FDA0003385611020000029
denotes the CENTRIST feature, n is the number of images contained in the dataset,
Figure FDA00033856110200000210
denotes the CENTRIST feature of the i-th image, i ∈ 1, …, n, and d1 denotes the dimension of the CENTRIST feature;
Figure FDA00033856110200000211
denotes the wavelet moments feature,
Figure FDA00033856110200000212
denotes the wavelet moments feature of the i-th image, and d2 denotes the dimension of the wavelet moments feature;
Figure FDA00033856110200000213
denotes the Gabor feature,
Figure FDA00033856110200000214
denotes the Gabor feature of the i-th image, and d3 denotes the dimension of the Gabor feature;
Figure FDA00033856110200000215
denotes the LBP feature,
Figure FDA00033856110200000216
denotes the LBP feature of the i-th image, and d4 denotes the dimension of the LBP feature.
4. The anchor point diagram-based fast multi-view discrete clustering method according to claim 3, wherein the step 2 specifically comprises:
constructing the anchor graph on the v-th feature by minimizing equation (1):
Figure FDA00033856110200000217
Figure FDA00033856110200000218
Figure FDA00033856110200000219
where
Figure FDA00033856110200000220
$\|\cdot\|_2$ denotes the $\ell_2$ norm,
Figure FDA00033856110200000221
denotes the similarity between the i-th original image and the j-th anchor image on the v-th feature,
Figure FDA00033856110200000222
denotes the similarity vector between the i-th image and all m anchor images on the v-th feature; γ denotes a regularization parameter,
Figure FDA00033856110200000223
the solution to the problem in equation (1) is:
Figure FDA0003385611020000031
the anchor graph $Z^v$ is k-neighbor sparse, i.e.,
Figure FDA0003385611020000032
contains k non-zero elements.
5. The anchor point diagram-based fast multi-view discrete clustering method according to claim 4, wherein the step 3 specifically comprises:
after the anchor graph $Z^v$ is obtained, the similarity graph $S^v \in \mathbb{R}^{n \times n}$ on the v-th feature is calculated:
$$S^v = Z^v (\Lambda^v)^{-1} (Z^v)^T = B^v (B^v)^T \quad (3)$$
where $B^v = Z^v (\Lambda^v)^{-1/2}$, and $\Lambda^v \in \mathbb{R}^{m \times m}$ is a diagonal matrix whose j-th diagonal element is $\sum_{i=1}^{n} z_{ij}^v$;
the similarity graph $S^v$ calculated by equation (3) is symmetric and doubly stochastic.
6. The anchor point diagram-based fast multi-view discrete clustering method according to claim 5, wherein the step 4 specifically comprises:
assigning a weight coefficient $\alpha_v$ to the similarity graph $S^v$ on each feature:
$$\sum_{v} \alpha_v S^v, \quad \text{s.t. } \alpha^T \mathbf{1} = 1,\ \alpha_v \ge 0 \quad (4)$$
where $\alpha$ is the weight vector collecting the coefficients $\alpha_v$; $\sum_{v} \alpha_v S^v$ is the aggregated graph, which is symmetric and doubly stochastic.
7. The anchor point diagram-based fast multi-view discrete clustering method according to claim 6, wherein the step 5 specifically comprises:
taking as the optimization goal the minimization of the difference between the aggregated graph $\sum_{v} \alpha_v S^v$ and the similarity graph $Y (Y^T Y)^{-1} Y^T$ reconstructed from the image cluster label matrix, the objective function of FMDC is constructed:
$$\min_{\alpha, Y} \left\| \sum_{v} \alpha_v S^v - Y (Y^T Y)^{-1} Y^T \right\|_F^2 \quad \text{s.t. } \alpha^T \mathbf{1} = 1,\ \alpha_v \ge 0,\ Y \in \mathrm{Ind} \quad (5)$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $Y = [y_1, \ldots, y_c] \in \{0,1\}^{n \times c}$ is the discrete image cluster indicator matrix, c is the number of categories, and $y_l$ ($l \in 1, \ldots, c$) is the indicator vector of the l-th image class; Ind is shorthand for "discrete image cluster indicator matrix".
8. The anchor point diagram-based fast multi-view discrete clustering method according to claim 7, wherein the solution process of the objective function of FMDC in step 6 is:
(601) fixing α and updating Y;
when α is fixed, the sub-problem for Y is:
Figure FDA0003385611020000041
where
Figure FDA0003385611020000042
solving this problem is equivalent to solving the problem in the following equation:
Figure FDA0003385611020000043
equation (7) is written in another form:
Figure FDA0003385611020000044
because
Figure FDA0003385611020000045
involves all rows of Y, Y is updated row by row using a coordinate descent method;
(602) fixing Y and updating α;
when Y is fixed, the sub-problem for α is:
Figure FDA0003385611020000046
where
Figure FDA0003385611020000047
writing
Figure FDA0003385611020000048
and $S^v$ in long-vector form, let
Figure FDA0003385611020000049
then:
Figure FDA00033856110200000410
equation (15) is solved using the augmented Lagrange multiplier method;
(603) repeating (601) and (602) to alternately optimize Y and α until the objective function converges.
9. The anchor point diagram-based fast multi-view discrete clustering method according to claim 8, wherein Y is updated by rows in (601) by using a coordinate descent method, specifically:
when the update reaches the i-th row $y_i$ of Y, $y_i$ is determined by comparing the objective values obtained when $y_i$ takes each of the c candidate values [1,0,…,0], …, [0,0,…,1], and selecting the value that maximizes the objective function as the optimal $y_i$;
denoting by $\{Y^{(1)}, \ldots, Y^{(c)}\}$ the matrices Y obtained when $y_i$ takes each of the c values, the objective value for the h-th candidate is:
Figure FDA0003385611020000051
thus, the problem to be solved is:
Figure FDA0003385611020000052
defining $Y^{(0)}$ as the matrix Y with its i-th row set to [0,0,…,0], i.e., all zeros, then:
Figure FDA0003385611020000053
defining $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$, solving the problem in equation (10) is equivalent to solving the following problem:
Figure FDA0003385611020000054
the optimal solution is then:
Figure FDA0003385611020000055
where $y_{ip} = 1$ if the condition inside $\langle\cdot\rangle$ is true, and $y_{ip} = 0$ otherwise.
10. A fast multi-view discrete clustering system based on an anchor graph, characterized by comprising an anchor image generation module, an anchor graph construction module, a similarity matrix generation module, a weight assignment module, an FMDC objective function construction module and an FMDC objective function solving module;
the anchor image generation module is used for generating m representative anchor images for the n original images using a hierarchical k-means++ method built on balanced k-means++;
the anchor graph construction module is used for constructing, on each view feature, an anchor graph between the original images and the anchor images;
the similarity matrix generation module is used for calculating the similarity matrix between the original images based on the anchor graph;
the weight assignment module is used for assigning weight coefficients to the similarity graphs on the features to obtain an aggregated graph;
the FMDC objective function construction module is used for constructing the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image cluster label matrix;
and the FMDC objective function solving module is used for solving the FMDC objective function, outputting the discrete multi-view image cluster indicator matrix and completing the multi-view image clustering task.
CN202111452689.8A 2021-11-30 2021-11-30 Fast multi-view discrete clustering method and system based on anchor point diagram Pending CN114399653A (en)

Publications (1)

Publication Number Publication Date
CN114399653A true CN114399653A (en) 2022-04-26

Family

ID=81225985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111452689.8A Pending CN114399653A (en) 2021-11-30 2021-11-30 Fast multi-view discrete clustering method and system based on anchor point diagram

Country Status (1)

Country Link
CN (1) CN114399653A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310452A (en) * 2023-02-16 2023-06-23 广东能哥知识科技有限公司 Multi-view clustering method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074220A1 (en) * 2018-09-04 2020-03-05 Inception Institute of Artificial Intelligence, Ltd. Multi-view image clustering techniques using binary compression
CN111753904A (en) * 2020-06-24 2020-10-09 广东工业大学 Rapid hyperspectral image clustering method, device, equipment and medium
AU2020103322A4 (en) * 2020-11-09 2021-01-14 Southwest University Supervised Discrete Hashing Algorithm With Relaxation Over Distributed Network
CN112418286A (en) * 2020-11-16 2021-02-26 武汉大学 Multi-view clustering method based on constrained non-negative matrix factorization

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIANYAO QIANG: "《Fast Multi-view Discrete Clustering with Anchor Graphs》", 《THE THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-21)》, pages 1 - 4 *
夏冬雪;杨燕;王浩;阳树洪;: "基于邻域多核学习的后融合多视图聚类算法", 计算机研究与发展, no. 08 *
孔颉;孙权森;徐晖;***;纪则轩;: "基于仿射不变离散哈希的遥感图像多目标分类", 软件学报, no. 04 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310452A (en) * 2023-02-16 2023-06-23 广东能哥知识科技有限公司 Multi-view clustering method and system
CN116310452B (en) * 2023-02-16 2024-03-19 广东能哥知识科技有限公司 Multi-view clustering method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination