CN114399653A - Fast multi-view discrete clustering method and system based on anchor point diagram - Google Patents
- Publication number
- CN114399653A (application number CN202111452689.8A)
- Authority
- CN
- China
- Prior art keywords
- anchor point
- image
- clustering
- view
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Evolutionary Biology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Computing Systems (AREA)
Abstract
The invention discloses a fast multi-view discrete clustering method and system based on an anchor graph, in the field of multi-view images. The method selects a small set of representative images from all original multi-view images as anchor images and computes a similarity graph by constructing, on every image feature, an anchor graph between all original images and the anchor images; during multi-view image clustering, it assigns weights to the anchor graphs constructed on different features to measure each feature's contribution to the clustering task; and it solves the original spectral clustering problem to compute a discrete multi-view cluster indicator matrix directly. The method addresses the high computational complexity of similarity-graph construction and of eigendecomposition of the graph Laplacian matrix.
Description
Technical Field
The invention belongs to the field of multi-view images, and particularly relates to a rapid multi-view discrete clustering method and system based on an anchor point diagram.
Background
Multi-view clustering is an important technique in multi-view image analysis. A multi-view image refers to image data that is represented using a variety of different features. As an efficient method for solving the problem of multi-view image clustering, the multi-view clustering method based on the graph is widely researched and applied in the field of computer vision.
Most existing graph-based multi-view clustering methods consist of two steps: (1) constructing a similarity graph from the features of each view; (2) performing eigendecomposition of the graph Laplacian matrix to compute a continuous cluster assignment matrix, and then obtaining a discrete cluster assignment matrix through a post-processing algorithm such as k-means or spectral rotation. Although these methods work well, they still have two drawbacks for multi-view image clustering. The first is the expensive time cost. In practical multi-view image clustering tasks the sample size is usually very large, and the time complexities of conventional similarity-graph construction and of eigendecomposition are $O(n^2 d)$ and $O(n^3)$, respectively, where $n$ is the number of samples and $d$ the feature dimension. Such computational overhead is unacceptable for large-scale multi-view image clustering. The second drawback is that obtaining an interpretable discrete cluster assignment matrix through a two-stage process may lose multi-view image feature information, so that the clustering process deviates from directly solving the original multi-view image clustering problem.
Many graph-based multi-view clustering approaches aim to speed up similarity-graph construction or eigendecomposition, for example by constructing anchor graphs or sparse graphs, or by approximate eigendecomposition. Anchors are generally generated by random sampling or by lightweight clustering methods such as k-means. Applied to multi-view image clustering, random sampling selects a few images at random from all original images as anchor images, while the k-means approach clusters all original images and takes the images at the cluster centers as anchor images. Anchor images generated by k-means are clearly more representative than randomly sampled ones; however, k-means is far more time-consuming than random sampling and may produce image clusters of unbalanced size. The Balanced k-means-based Hierarchical k-means method (BKHK) first divides all images into two image sub-clusters of balanced size, then continues to apply balanced k-means hierarchically to the sub-clusters until it obtains as many leaf image clusters as the required number of anchor images, and finally uses the class-center images of all leaf clusters as anchors. BKHK has been successfully used to accelerate graph-based methods for hashing, clustering, dimensionality reduction, classification and other tasks. However, it still inherits a drawback of k-means: randomly selecting samples as the initial cluster centers makes the clustering result unstable and convergence slow. Among methods that reduce the cost of eigendecomposition, approximate eigendecomposition is the most commonly used.
Approximate eigendecomposition typically computes pseudo-eigenvectors with random projection and finite-order polynomial expansion, or avoids direct eigendecomposition through truncated power iteration, but these techniques are ineffective on large multi-view image datasets. In addition, among existing graph-based multi-view clustering methods for multi-view images, directly solving for the discrete cluster label matrix without any post-processing step remains unexplored.
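To give a rough sense of the costs discussed above, the following sketch compares operation counts with constant factors ignored. The values n = 15,883 and d = 409 are taken from the NUS-WIDE setting described in the example below (the sum of the four listed feature dimensions 64 + 144 + 73 + 128); the anchor count m = 1024 and the O(nmd) cost of an anchor graph (distances between n samples and m anchors) are assumptions made only for illustration.

```python
def op_counts(n: int, d: int, m: int) -> dict:
    """Rough operation counts (constants ignored) for the graph steps."""
    return {
        "full_graph": n * n * d,       # all pairwise distances: O(n^2 d)
        "eigendecomposition": n ** 3,  # dense eigendecomposition: O(n^3)
        "anchor_graph": n * m * d,     # sample-to-anchor distances: O(n m d)
    }

# n and d from the example section; m is a hypothetical anchor count
counts = op_counts(n=15883, d=64 + 144 + 73 + 128, m=1024)
for name, ops in counts.items():
    print(f"{name:>20}: {ops:.3e}")

# the full graph costs n / m times as much as the anchor graph
ratio = counts["full_graph"] / counts["anchor_graph"]
print(f"full / anchor = {ratio:.2f}")
```

Because the ratio is simply n/m, the saving grows linearly with the dataset size for a fixed number of anchors.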
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method and a system for fast multi-view discrete clustering based on an anchor point diagram.
To achieve this aim, the invention adopts the following technical solution:
A fast multi-view discrete clustering method based on an anchor graph comprises the following steps:
Step 1: generate m anchor images for the n original images using a hierarchical k-means++ method based on balanced k-means++, where m is smaller than n;
Step 2: construct, on each view feature, an anchor graph between the original images and the anchor images;
Step 3: compute a similarity matrix between the original images based on the anchor graphs;
Step 4: assign a weight coefficient to the similarity graph on each feature to obtain an aggregated graph;
Step 5: construct the FMDC objective function, whose optimization goal is to minimize the difference between the aggregated graph and the similarity graph reconstructed from the cluster label matrix;
Step 6: solve the FMDC objective function by solving the original spectral clustering problem, output a discrete multi-view cluster indicator matrix, and complete the multi-view image clustering task.
Further, the anchor images in step 1 are generated as follows:
k-means++ divides all original images into two image sub-clusters of balanced size; balanced k-means++ is then applied hierarchically to the sub-clusters until m leaf clusters are obtained, and the class-center images of the m leaf clusters are taken as anchor images.
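The hierarchical balanced splitting can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the patented code: each split is a balanced 2-means seeded with k-means++, balance is enforced by sending exactly half of the samples (those relatively closer to the first center) to one side, the largest leaf is split first so that m need not be a power of two, and each anchor is the mean of its leaf rather than the actual image nearest to it. Samples are rows of `X` here, whereas the text stores them as columns; all function names are hypothetical.

```python
import numpy as np

def kmeanspp_two_centers(X, rng):
    """k-means++ seeding for k = 2 on the rows of X."""
    first = X[rng.integers(len(X))]
    d2 = ((X - first) ** 2).sum(axis=1)
    total = d2.sum()
    # second seed drawn with probability proportional to squared distance
    probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
    second = X[rng.choice(len(X), p=probs)]
    return np.stack([first, second])

def balanced_two_means(X, rng, n_iter=10):
    """Split the rows of X (at least 2) into two groups whose sizes differ by at most one."""
    C = kmeanspp_two_centers(X, rng)
    half = len(X) // 2
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, 2) squared distances
        # rank samples by how much closer they are to center 0 than to center 1;
        # the first `half` go left and the rest go right, which enforces the balance
        order = np.argsort(d[:, 0] - d[:, 1], kind="stable")
        left, right = order[:half], order[half:]
        C = np.stack([X[left].mean(axis=0), X[right].mean(axis=0)])
    return left, right

def bkhk_anchors(X, m, seed=0):
    """Hypothetical helper: bisect X (n x d, rows = samples) into m leaves, return leaf means."""
    if not 1 <= m <= len(X):
        raise ValueError("need 1 <= m <= n")
    rng = np.random.default_rng(seed)
    leaves = [np.arange(len(X))]
    while len(leaves) < m:
        leaves.sort(key=len)
        idx = leaves.pop()  # split the largest remaining leaf
        left, right = balanced_two_means(X[idx], rng)
        leaves += [idx[left], idx[right]]
    return np.stack([X[idx].mean(axis=0) for idx in leaves])
```

With the text's column convention, `bkhk_anchors(X_concat.T, m).T` would give an array shaped like the stacked anchors.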
Further, in step 1, a multi-view image dataset $\{X_1, X_2, X_3, X_4\}$ represented by four features (CENTRIST, wavelet moments, Gabor and LBP) is obtained.
The concatenated feature data $[X_1; X_2; X_3; X_4]$ are used to generate concatenated anchor feature data $[U_1; U_2; U_3; U_4]$, which are then split according to the feature dimensions $d_1$, $d_2$, $d_3$, $d_4$ of the four views to obtain the multi-view anchor image set $\{U_1, U_2, U_3, U_4\}$.
Here $U_1 = [u_1^1, \dots, u_m^1] \in \mathbb{R}^{d_1\times m}$ denotes the anchors of the CENTRIST feature, where $u_j^1$ is the CENTRIST feature of the $j$-th anchor image; $U_2 = [u_1^2, \dots, u_m^2] \in \mathbb{R}^{d_2\times m}$ denotes the anchors of the wavelet moments feature, where $u_j^2$ is the wavelet moments feature of the $j$-th anchor image; $U_3 = [u_1^3, \dots, u_m^3] \in \mathbb{R}^{d_3\times m}$ denotes the anchors of the Gabor feature, where $u_j^3$ is the Gabor feature of the $j$-th anchor image; and $U_4 = [u_1^4, \dots, u_m^4] \in \mathbb{R}^{d_4\times m}$ denotes the anchors of the LBP feature, where $u_j^4$ is the LBP feature of the $j$-th anchor image.
$X_1 = [x_1^1, \dots, x_n^1] \in \mathbb{R}^{d_1\times n}$ denotes the CENTRIST feature, where $n$ is the number of images in the dataset, $x_i^1$ is the CENTRIST feature of the $i$-th image ($i \in 1, \dots, n$) and $d_1$ is the dimension of the CENTRIST feature; $X_2 = [x_1^2, \dots, x_n^2] \in \mathbb{R}^{d_2\times n}$ denotes the wavelet moments feature, where $x_i^2$ is the wavelet moments feature of the $i$-th image and $d_2$ its dimension; $X_3 = [x_1^3, \dots, x_n^3] \in \mathbb{R}^{d_3\times n}$ denotes the Gabor feature, where $x_i^3$ is the Gabor feature of the $i$-th image and $d_3$ its dimension; and $X_4 = [x_1^4, \dots, x_n^4] \in \mathbb{R}^{d_4\times n}$ denotes the LBP feature, where $x_i^4$ is the LBP feature of the $i$-th image and $d_4$ its dimension.
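The concatenate-then-split step is what keeps the anchors aligned across views: anchor $j$ in every view comes from the same selection on the stacked features. A minimal sketch, keeping the text's convention that samples are columns; `select_anchors` is a hypothetical placeholder for any anchor selector applied to the stacked features (for instance a balanced hierarchical k-means++ procedure as described in step 1), and the function names are illustrative.

```python
import numpy as np

def split_anchor_features(U_concat, dims):
    """Cut stacked anchors [U1; U2; U3; U4] ((d1+...+d4) x m) into per-view blocks."""
    cuts = np.cumsum(dims)[:-1]  # row offsets where each view ends
    return np.split(U_concat, cuts, axis=0)

def consistent_multiview_anchors(X_views, select_anchors):
    """X_views: list of d_v x n arrays; select_anchors maps a D x n array to D x m anchors."""
    X_concat = np.vstack(X_views)        # [X1; X2; X3; X4], shape (sum of d_v) x n
    U_concat = select_anchors(X_concat)  # [U1; U2; U3; U4], shape (sum of d_v) x m
    return split_anchor_features(U_concat, [X.shape[0] for X in X_views])
```

Because the selector sees all views at once, no view can end up with anchors that correspond to different images than another view's anchors.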
Further, step 2 specifically comprises:
constructing the anchor graph on the $v$-th feature by minimizing problem (1), where $\|\cdot\|_2$ denotes the $\ell_2$-norm, $z_{ij}^v$ denotes the similarity between the $i$-th original image and the $j$-th anchor image on the $v$-th feature, $z_i^v$ denotes the similarity vector between the $i$-th image and all $m$ anchor images on the $v$-th feature, and $\gamma$ is a regularization parameter.
The solution to problem (1) is:
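The surviving text does not show problem (1) or its solution explicitly. A widely used parameter-free formulation consistent with the definitions above is $\min_{z_i^v \ge 0,\ (z_i^v)^\top\mathbf{1} = 1} \sum_{j} \|x_i^v - u_j^v\|_2^2\, z_{ij}^v + \gamma\|z_i^v\|_2^2$. If $\gamma$ is chosen per sample so that exactly $k$ anchors receive nonzero weight, it has the closed form $z_{ij}^v = (e_{i,k+1} - e_{ij})/(k\,e_{i,k+1} - \sum_{j'\le k} e_{ij'})$ for the $k$ nearest anchors (squared distances $e_{ij}$ sorted in ascending order) and $0$ otherwise. The sketch below implements that closed form; whether it matches the patent's own solution exactly, and the neighbour count `k`, are assumptions. Samples and anchors are rows here (the transpose of the text's convention), and `anchor_graph` is a hypothetical name.

```python
import numpy as np

def anchor_graph(X, U, k=5):
    """X: n x d samples, U: m x d anchors (rows), k < m. Returns Z (n x m), rows on the simplex."""
    # squared Euclidean distances e_ij = ||x_i - u_j||^2, shape (n, m)
    E = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ U.T + (U ** 2).sum(axis=1)[None, :]
    E = np.maximum(E, 0.0)  # clip tiny negatives caused by rounding
    n, m = E.shape
    order = np.argsort(E, axis=1)
    E_sorted = np.take_along_axis(E, order, axis=1)
    e_next = E_sorted[:, k][:, None]  # (k+1)-th smallest distance per sample
    nearest = E_sorted[:, :k]         # k smallest distances per sample
    denom = k * e_next - nearest.sum(axis=1, keepdims=True)
    # closed-form weights; fall back to uniform weights if all k+1 distances tie
    W = np.where(denom > 1e-12, (e_next - nearest) / np.maximum(denom, 1e-12), 1.0 / k)
    Z = np.zeros((n, m))
    np.put_along_axis(Z, order[:, :k], W, axis=1)
    return Z
```

Each row sums to one because the numerators add up to exactly the denominator, and only the k nearest anchors get nonzero weight, so Z is sparse.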
Further, step 3 specifically comprises:
after the anchor graph $Z^v$ is obtained, computing the similarity graph on the $v$-th feature as
$S^v = Z^v(\Delta^v)^{-1}(Z^v)^\top = B^v(B^v)^\top \quad (3)$
where $\Delta^v \in \mathbb{R}^{m\times m}$ is a diagonal matrix whose $j$-th diagonal element is $\sum_{i=1}^{n} z_{ij}^v$, and $B^v = Z^v(\Delta^v)^{-1/2}$. The similarity graph $S^v$ computed by equation (3) is symmetric and doubly stochastic.
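A short sketch of equation (3), assuming (as the doubly stochastic property requires) that each row of the anchor graph sums to 1 and that the diagonal of $\Delta^v$ holds the column sums of $Z^v$. Working with the factor $B^v = Z^v(\Delta^v)^{-1/2}$ means the $n\times n$ matrix $S^v$ never has to be stored: a product $S^v x$ can be evaluated as $B^v((B^v)^\top x)$ in $O(nm)$ time. The function name is hypothetical, and the explicit `S` below is formed only to check the properties.

```python
import numpy as np

def similarity_factor(Z):
    """Return B = Z Delta^{-1/2}, with Delta = diag(column sums of Z), so that S = B @ B.T."""
    col_sums = Z.sum(axis=0)
    # guard against anchors that no sample is connected to (column sum 0)
    scale = np.zeros_like(col_sums)
    nz = col_sums > 0
    scale[nz] = 1.0 / np.sqrt(col_sums[nz])
    return Z * scale[None, :]

# usage on a small random anchor graph whose rows lie on the simplex
rng = np.random.default_rng(0)
Z = rng.random((10, 4))
Z /= Z.sum(axis=1, keepdims=True)
B = similarity_factor(Z)
S = B @ B.T  # explicit n x n graph, for checking only
```

Row sums equal one because $S\mathbf{1} = Z\Delta^{-1}(Z^\top\mathbf{1}) = Z\mathbf{1} = \mathbf{1}$, and symmetry then gives unit column sums as well.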
Further, step 4 specifically includes:
assigning a weight coefficient $\alpha_v$ to the similarity graph $S^v$ on each feature:
$\bar S = \sum_{v=1}^{4} \alpha_v S^v \quad (4)$
where $\alpha = [\alpha_1, \dots, \alpha_4]^\top$ is the weight vector and $\bar S$ is the aggregated graph, which is symmetric and doubly stochastic.
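Why the aggregated graph $\bar S = \sum_{v=1}^{4}\alpha_v S^v$ stays symmetric and doubly stochastic can be checked in one line, assuming (the constraint is not visible in the surviving text) that the weights satisfy $\alpha_v \ge 0$ and $\sum_v \alpha_v = 1$:

```latex
\bar S^\top = \sum_{v=1}^{4} \alpha_v (S^v)^\top = \bar S,
\qquad
\bar S\,\mathbf 1 = \sum_{v=1}^{4} \alpha_v\, S^v \mathbf 1
               = \Big(\sum_{v=1}^{4} \alpha_v\Big)\mathbf 1 = \mathbf 1 .
```

Nonnegativity follows from $\alpha_v \ge 0$ and $S^v \ge 0$, and symmetry turns the unit row sums into unit column sums.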
Further, step 5 specifically comprises:
constructing the objective function of FMDC with the optimization goal of minimizing the difference between the aggregated graph $\bar S$ and the similarity graph $Y(Y^\top Y)^{-1}Y^\top$ reconstructed from the cluster label matrix:
where $\|\cdot\|_F$ denotes the Frobenius norm, $Y = [y_1, \dots, y_c] \in \{0,1\}^{n\times c}$ is the discrete cluster indicator matrix, $c$ is the number of classes, $y_l$ ($l \in 1, \dots, c$) is the indicator vector of the $l$-th image class, and Ind denotes the set of discrete cluster indicator matrices.
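The objective itself is not legible in this copy. A plausible form, reconstructed from the description and assuming that $\alpha$ is constrained to the probability simplex (consistent with the aggregated graph being doubly stochastic and with the α-subproblem being a standard quadratic program), is:

```latex
\min_{\alpha,\; Y \in \mathrm{Ind}}\;
\Big\| \sum_{v=1}^{4} \alpha_v S^v - Y (Y^\top Y)^{-1} Y^\top \Big\|_F^2
\qquad \text{s.t.} \quad \alpha \ge 0,\;\; \alpha^\top \mathbf 1 = 1 .
```

Since $(Y^\top Y)^{-1} = \mathrm{diag}(1/n_1, \dots, 1/n_c)$ with $n_l$ the size of cluster $l$, entry $(i,j)$ of $Y(Y^\top Y)^{-1}Y^\top$ equals $1/n_l$ when images $i$ and $j$ both belong to cluster $l$ and $0$ otherwise; the reconstructed graph is therefore itself block-diagonal, symmetric and doubly stochastic, so both sides of the difference live in the same family.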
Further, the objective function of FMDC in step 6 is solved as follows:
(601) Fix α and update Y.
When α is fixed, the subproblem in Y is:
Solving this problem is equivalent to solving the following problem:
Equation (7) can be written in another form:
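The step from the Frobenius-norm subproblem to a trace maximization can be reconstructed as follows. This derivation is ours, and it assumes that no cluster is empty, so that $P = Y(Y^\top Y)^{-1}Y^\top$ is a rank-$c$ orthogonal projection ($P^\top = P$, $P^2 = P$), and that $\bar S = \sum_v \alpha_v S^v$ is symmetric:

```latex
\|\bar S - P\|_F^2
  = \|\bar S\|_F^2 - 2\,\mathrm{tr}(\bar S P) + \mathrm{tr}(P^\top P)
  = \|\bar S\|_F^2 - 2\,\mathrm{tr}\big((Y^\top Y)^{-1} Y^\top \bar S Y\big) + c ,

\min_{Y \in \mathrm{Ind}} \|\bar S - P\|_F^2
\;\Longleftrightarrow\;
\max_{Y \in \mathrm{Ind}} \mathrm{tr}\big((Y^\top Y)^{-1} Y^\top \bar S Y\big)
= \max_{Y \in \mathrm{Ind}} \sum_{l=1}^{c} \frac{y_l^\top \bar S\, y_l}{y_l^\top y_l} .
```

The trace form and the per-cluster ratio form plausibly correspond to the two equivalent problems the text refers to; the $\|\bar S\|_F^2$ and $c$ terms do not depend on $Y$ and drop out.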
(602) Fix Y and update α.
When Y is fixed, the subproblem in α is:
Problem (15) is solved with the augmented Lagrange multiplier (ALM) method.
(603) Repeat (601) and (602), alternately optimizing α and Y until the objective function converges.
Further, in (601), Y is updated row by row with coordinate descent, specifically:
When updating the $i$-th row $y_i$ of Y, compare the $c$ candidate values of $y_i$, namely $[1,0,\dots,0]$ through $[0,0,\dots,1]$, and select the one that maximizes the objective function as the optimal $y_i$.
Let $\{Y^{(1)}, \dots, Y^{(c)}\}$ denote Y when $y_i$ takes each of the $c$ values; the objective value for the $h$-th value is:
Thus, the problem to be solved is:
Define $Y^{(0)}$ as Y with its $i$-th row set to $[0,0,\dots,0]$, i.e., all zeros; then:
Define $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$; solving problem (10) is equivalent to solving the following problem:
The optimal solution is then obtained as:
where $\langle\cdot\rangle$ is the indicator function: $y_{ip} = 1$ if the condition holds (here $p = \arg\max_h \Delta(h)$) and $y_{ip} = 0$ otherwise.
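A minimal NumPy sketch of this row-wise coordinate descent, under two assumptions: the Y-subproblem is the ratio form $\max_Y \sum_l y_l^\top \bar S y_l / y_l^\top y_l$, and the aggregated graph $\bar S = \sum_v \alpha_v S^v$ is supplied in factored form $\bar S = \bar B\bar B^\top$ (for example $\bar B = [\sqrt{\alpha_1}B^1, \dots, \sqrt{\alpha_4}B^4]$, which reproduces $\sum_v \alpha_v B^v(B^v)^\top$). Y is encoded as integer labels. For each row the code removes it from its cluster (giving $Y^{(0)}$), evaluates $\Delta(h)$ for every cluster, and assigns the row to the argmax. Unlike $(Y^\top Y)^{-1}$, the code tolerates empty clusters, which simply contribute 0. The names `objective` and `update_Y_rows` are hypothetical.

```python
import numpy as np

def objective(B, labels, c):
    """sum_l ||B^T y_l||^2 / n_l, i.e. sum_l y_l^T S y_l / y_l^T y_l with S = B B^T."""
    M = np.zeros((c, B.shape[1]))
    np.add.at(M, labels, B)  # M[l] = B^T y_l
    counts = np.bincount(labels, minlength=c)
    return float(((M ** 2).sum(axis=1) / np.maximum(counts, 1)).sum())

def update_Y_rows(B, labels, c, n_sweeps=10):
    """Row-wise coordinate descent on the one-hot matrix Y encoded by `labels`."""
    labels = labels.copy()
    M = np.zeros((c, B.shape[1]))
    np.add.at(M, labels, B)  # M[l] = B^T y_l
    counts = np.bincount(labels, minlength=c).astype(float)
    for _ in range(n_sweeps):
        changed = False
        for i in range(B.shape[0]):
            b, old = B[i], labels[i]
            # Y^(0): take row i out of its current cluster
            M[old] -= b
            counts[old] -= 1
            # per-cluster terms of obj(Y^(0)); an empty cluster contributes 0
            base = (M ** 2).sum(axis=1) / np.maximum(counts, 1)
            # per-cluster term if row i is placed in cluster h
            moved = ((M + b) ** 2).sum(axis=1) / (counts + 1)
            p = int(np.argmax(moved - base))  # Delta(h) = moved_h - base_h
            M[p] += b
            counts[p] += 1
            labels[i] = p
            changed = changed or p != old
        if not changed:
            break
    return labels
```

Each row update can only increase the objective, because keeping the row in its current cluster is one of the candidates; the sweep loop therefore stops once a full pass changes no label.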
A fast multi-view discrete clustering system based on an anchor point diagram comprises an anchor point image generation module, an anchor point diagram construction module, a similarity matrix generation module, a weight distribution module, an FMDC construction target function module and an FMDC solving target function module;
the anchor image generation module is used to generate m representative anchor images for the n original images using a hierarchical k-means++ method based on balanced k-means++;
the anchor point diagram constructing module is used for constructing an anchor point diagram between the original image and the anchor point image on each view characteristic;
the similarity matrix generation module is used for calculating a similarity matrix between original images based on the anchor point images;
the weight distribution module is used for distributing weight coefficients to the similarity graphs on the characteristics to obtain an aggregation graph;
the FMDC constructing target function module is used for constructing a FMDC target function by taking the difference between the minimized aggregation graph and the similarity graph reconstructed by using the image clustering label matrix as an optimization target;
and the FMDC solving target function module is used for solving the FMDC target function, outputting a discrete multi-view image clustering indication matrix and completing a multi-view image clustering task.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a fast multi-view discrete clustering method and system based on an anchor graph: a small set of representative images is selected from all original multi-view images as anchor images, and a similarity graph is computed by constructing, on every image feature, an anchor graph between all original images and the anchor images; during multi-view image clustering, weights are assigned to the anchor graphs constructed on different features to measure each feature's contribution to the clustering task; and the original spectral clustering problem is solved to compute a discrete multi-view cluster indicator matrix directly. The invention solves two main problems of current graph-based multi-view clustering methods when applied to multi-view image clustering: (1) the high computational complexity of similarity-graph construction and of eigendecomposition of the graph Laplacian matrix; (2) the two-stage approach, which first computes a continuous cluster assignment matrix and then derives the final discrete one in a post-processing step, can lose multi-view image feature information, so that the clustering process deviates from directly solving the original multi-view image clustering problem. The invention can therefore perform multi-view image clustering with high quality while reducing the computational complexity of graph-based clustering.
Furthermore, the anchor graph is constructed with a parameter-free strategy: a small number of representative anchor images are generated for the images in all views, and the similarity graph is then computed.
Furthermore, the anchor images are kept consistent across the different features of multiple views.
Furthermore, the invention designs an efficient solution method that directly computes a discrete cluster indicator matrix for the multi-view images.
Furthermore, the objective function of FMDC is very compact: it introduces no additional terms and requires no tuning of balance parameters.
Drawings
FIG. 1 is a diagram of the anchor point selection architecture of the present invention;
FIG. 2 is a flowchart of the fast multi-view discrete clustering method based on the anchor point map of embodiment 1;
fig. 3 is a schematic block diagram of the anchor point diagram-based fast multi-view discrete clustering system according to embodiment 2.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention aims to provide a Fast Multi-view Discrete Clustering method and system (FMDC) based on an anchor graph, addressing two main problems that arise when current graph-based multi-view clustering methods are used for multi-view image clustering: (1) the high computational complexity of similarity-graph construction and of eigendecomposition of the graph Laplacian matrix; (2) the two-stage approach, which first computes a continuous cluster assignment matrix and then obtains the final discrete one through post-processing, can lose multi-view image feature information, so that the clustering process deviates from directly solving the original multi-view image clustering problem. The method selects a small set of representative images from all original multi-view images as anchor images and computes a similarity graph by constructing, on every image feature, anchor graphs between all original images and the anchor images; it automatically assigns weights to the anchor graphs constructed on different features during clustering, measuring each feature's contribution to the multi-view clustering task; and it solves the original spectral clustering problem to compute a discrete multi-view cluster indicator matrix directly.
The invention is described in further detail below with reference to the accompanying drawings:
The fast multi-view discrete clustering method based on an anchor graph takes as input four-view image data $\{X_1, X_2, X_3, X_4\}$ represented by four features (CENTRIST, wavelet moments, Gabor and LBP), efficiently selects an anchor image set $\{U_1, U_2, U_3, U_4\}$, constructs anchor graphs $\{Z^1, Z^2, Z^3, Z^4\}$, automatically weights the similarity matrices $\{S^1, S^2, S^3, S^4\}$ on all features, and builds a graph-based multi-view clustering model that directly solves the multi-view image clustering problem. A symmetric doubly stochastic similarity graph is computed from the anchor graph, the original spectral clustering problem is then solved, and a discrete multi-view cluster indicator matrix is computed directly. The method reduces time and space complexity while maintaining multi-view clustering performance, enabling discrete clustering in fast multi-view image analysis.
Example 1
First, define the notation for multi-view images. A four-view image dataset $\{X_1, X_2, X_3, X_4\}$ is represented by four features: CENTRIST, wavelet moments, Gabor and LBP. Here $X_1 = [x_1^1, \dots, x_n^1] \in \mathbb{R}^{d_1\times n}$ ($n$ is the number of images in the dataset) denotes the CENTRIST feature, where $x_i^1$ is the CENTRIST feature of the $i$-th image and $d_1$ is the dimension of the CENTRIST feature; $X_2 = [x_1^2, \dots, x_n^2] \in \mathbb{R}^{d_2\times n}$ denotes the wavelet moments feature, where $x_i^2$ is the wavelet moments feature of the $i$-th image and $d_2$ its dimension; $X_3 = [x_1^3, \dots, x_n^3] \in \mathbb{R}^{d_3\times n}$ denotes the Gabor feature, where $x_i^3$ is the Gabor feature of the $i$-th image and $d_3$ its dimension; and $X_4 = [x_1^4, \dots, x_n^4] \in \mathbb{R}^{d_4\times n}$ denotes the LBP feature, where $x_i^4$ is the LBP feature of the $i$-th image and $d_4$ its dimension.
Referring to fig. 2, fig. 2 is a flow chart of embodiment 1:
Step 1: Generate m (m < n) representative anchor images for the n original images. FMDC uses a hierarchical k-means++ method based on balanced k-means++ to generate anchor images for the images in all views. FIG. 1 shows the anchor selection structure of the invention: samples are hierarchically clustered by k-means++ into subclasses with balanced sample numbers, and the centers of the leaf clusters are finally taken as anchors. Specifically, k-means++ first divides all original images into two image sub-clusters of balanced size; balanced k-means++ is then executed hierarchically on the sub-clusters until m leaf clusters are obtained, and the class-center images of the m leaf clusters are selected as anchor images. To keep the anchor images consistent across the different features of multiple views, FMDC first uses the concatenation of the four feature data $[X_1; X_2; X_3; X_4]$ to generate concatenated anchor feature data $[U_1; U_2; U_3; U_4]$, and then splits $[U_1; U_2; U_3; U_4]$ according to the feature dimensions $d_1$, $d_2$, $d_3$, $d_4$ of the four views to obtain the multi-view anchor image set $\{U_1, U_2, U_3, U_4\}$. Here $U_1 = [u_1^1, \dots, u_m^1] \in \mathbb{R}^{d_1\times m}$ denotes the anchors of the CENTRIST feature, where $u_j^1$ is the CENTRIST feature of the $j$-th anchor image; $U_2 = [u_1^2, \dots, u_m^2] \in \mathbb{R}^{d_2\times m}$ denotes the anchors of the wavelet moments feature, where $u_j^2$ is the wavelet moments feature of the $j$-th anchor image; $U_3 = [u_1^3, \dots, u_m^3] \in \mathbb{R}^{d_3\times m}$ denotes the anchors of the Gabor feature, where $u_j^3$ is the Gabor feature of the $j$-th anchor image; and $U_4 = [u_1^4, \dots, u_m^4] \in \mathbb{R}^{d_4\times m}$ denotes the anchors of the LBP feature, where $u_j^4$ is the LBP feature of the $j$-th anchor image.
Step 2: Construct the anchor graph between the original images and the anchor images. The anchor graph on the $v$-th feature ($v \in \{1, 2, 3, 4\}$) is constructed by minimizing the following problem:
where $\|\cdot\|_2$ denotes the $\ell_2$-norm, $z_{ij}^v$ denotes the similarity between the $i$-th original image and the $j$-th anchor image on the $v$-th feature, and $z_i^v$ denotes the similarity vector between the $i$-th image and all $m$ anchor images on the $v$-th feature. $\gamma$ is a regularization parameter, which can be set to … . The solution to problem (1) is:
Step 3: Compute the similarity matrix between the original images. After the anchor graph $Z^v$ is obtained, the similarity graph on the $v$-th feature is computed as
$S^v = Z^v(\Delta^v)^{-1}(Z^v)^\top = B^v(B^v)^\top \quad (3)$
where $\Delta^v \in \mathbb{R}^{m\times m}$ is a diagonal matrix whose $j$-th diagonal element is $\sum_{i=1}^{n} z_{ij}^v$, and $B^v = Z^v(\Delta^v)^{-1/2}$. The similarity graph $S^v$ computed by equation (3) is symmetric and doubly stochastic.
Step 4: Fuse the similarity graphs on multiple features. Because different image features contribute differently, the invention automatically assigns a weight coefficient $\alpha_v$ to the similarity graph $S^v$ on each feature:
$\bar S = \sum_{v=1}^{4} \alpha_v S^v \quad (4)$
where $\alpha = [\alpha_1, \dots, \alpha_4]^\top$ is the weight vector. $\bar S$ can be regarded as an aggregated graph and is also symmetric and doubly stochastic.
And 5: to minimize the aggregate mapSimilarity graph Y (Y) reconstructed by using image clustering label matrixTY)-1YTThe difference between them is the optimization goal, the objective function of FMDC is constructed:
in the formula, | · the luminance | |FRepresents l-F norm, Y ═ Y1,…,yc]∈{0,1}n×cIs a discrete image clustering indication matrix, c represents the number of categories, yl(l ∈ 1, …, c) is the clustering indication matrix for the l-th image class. Ind is an abbreviation for discrete image cluster indication matrix. The objective function in equation (5) smoothes the weight distribution, and avoids the trivial solution of α, i.e. the case where only one similarity graph has a weight of 1 and the weights of the other graphs have weights of 0.
Step 6: Solve the objective function of FMDC with a dedicated optimization method:
the invention designs an iterative alternating optimization method to solve the objective function in equation (5), as follows:
(601) Fix α and update Y.
When α is fixed, the subproblem in Y is:
Solving this problem is equivalent to solving the following problem:
Equation (7) can be written in another form:
Because the objective in equation (8) involves all rows of Y, the invention updates Y row by row using coordinate descent.
Specifically, suppose the $i$-th row $y_i$ of Y is being updated. To determine $y_i$, compare the $c$ candidate values of $y_i$, namely $[1,0,\dots,0]$ through $[0,0,\dots,1]$, and select the one that maximizes the objective function as the optimal $y_i$. For convenience, let $\{Y^{(1)}, \dots, Y^{(c)}\}$ denote Y when $y_i$ takes each of the $c$ values; the objective value for the $h$-th value ($h \in 1, \dots, c$) is:
Therefore, the problem to be solved is:
Define $Y^{(0)}$ as Y with its $i$-th row set to $[0,0,\dots,0]$, i.e., all zeros; then:
Define $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$. Solving problem (10) is equivalent to solving the following problem:
The optimal solution is then obtained as:
where $\langle\cdot\rangle$ is the indicator function: $y_{ip} = 1$ if the condition holds (here $p = \arg\max_h \Delta(h)$) and $y_{ip} = 0$ otherwise.
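Because only the $h$-th column of Y changes when row $i$ is placed in cluster $h$, $\Delta(h)$ reduces to a difference of two terms. Writing $\hat y_h$ for the $h$-th column of $Y^{(0)}$, $n_h = \hat y_h^\top \hat y_h$ for its size, and $\bar s_i$ for the $i$-th column of the aggregated graph $\bar S = \sum_v \alpha_v S^v$, and assuming the objective takes the ratio form $\sum_l y_l^\top \bar S y_l / y_l^\top y_l$ (our reconstruction), one obtains:

```latex
\Delta(h) = \frac{\hat y_h^\top \bar S \hat y_h + 2\,\bar s_i^\top \hat y_h + \bar s_{ii}}{n_h + 1}
          - \frac{\hat y_h^\top \bar S \hat y_h}{n_h},
\qquad
p = \arg\max_{h \in \{1,\dots,c\}} \Delta(h),
```

with the second term read as $0$ when $n_h = 0$. The quadratic terms $\hat y_h^\top \bar S \hat y_h$ can be maintained incrementally as rows move between clusters; with the factorization $\bar S = \bar B\bar B^\top$, where $\bar B = [\sqrt{\alpha_1}B^1, \dots, \sqrt{\alpha_4}B^4]$, they equal $\|\bar B^\top \hat y_h\|_2^2$ and $\bar s_i^\top \hat y_h = \bar b_i^\top(\bar B^\top \hat y_h)$, so the $n\times n$ graph never needs to be formed.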
(602) Fix Y and update α.
When Y is fixed, the subproblem in α is:
The problem in equation (15) is a standard quadratic programming problem that can be solved directly with the augmented Lagrange multiplier (ALM) method.
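The source solves problem (15) with ALM. The sketch below swaps in a different and simpler solver, projected gradient descent onto the probability simplex, purely for illustration. It assumes that the α-subproblem is $\min_{\alpha \ge 0,\ \alpha^\top\mathbf{1} = 1} \|\sum_v \alpha_v S^v - P\|_F^2$ with $P = Y(Y^\top Y)^{-1}Y^\top$. Expanding gives the quadratic $\alpha^\top H\alpha - 2f^\top\alpha$ plus a constant, with $H_{uv} = \mathrm{tr}(S^u S^v) = \|(B^u)^\top B^v\|_F^2$ and $f_v = \mathrm{tr}(S^v P) = \sum_l \|(B^v)^\top y_l\|_2^2 / n_l$, both computable from the factors $B^v$ without forming any $n\times n$ matrix. The function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def update_alpha(Bs, labels, c, n_iter=500):
    """Projected gradient for min alpha^T H alpha - 2 f^T alpha over the simplex.

    Bs: list of factors B^v (n x m_v) with S^v = B^v (B^v)^T; labels encode Y.
    """
    V = len(Bs)
    counts = np.bincount(labels, minlength=c).astype(float)
    H = np.empty((V, V))
    f = np.empty(V)
    for u in range(V):
        for v in range(V):
            H[u, v] = np.sum((Bs[u].T @ Bs[v]) ** 2)  # tr(S^u S^v)
        M = np.zeros((c, Bs[u].shape[1]))
        np.add.at(M, labels, Bs[u])  # M[l] = (B^u)^T y_l
        f[u] = np.sum((M ** 2).sum(axis=1) / np.maximum(counts, 1))  # tr(S^u P)
    lipschitz = 2.0 * max(np.linalg.eigvalsh(H).max(), 1e-12)
    alpha = np.full(V, 1.0 / V)  # start from equal weights
    for _ in range(n_iter):
        grad = 2.0 * (H @ alpha - f)
        alpha = project_simplex(alpha - grad / lipschitz)
    return alpha
```

In this formulation a view whose graph matches the current cluster structure closely (large $f_v$) is favoured, while the $H$ term penalizes putting weight on views whose graphs are large or redundant with one another.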
(603) Repeat (601) and (602), alternately optimizing α and Y until the objective function converges.
Step 7: Output the discrete multi-view cluster indicator matrix Y and compute the clustering accuracy (ACC), normalized mutual information (NMI) and Purity.
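ACC, NMI and Purity are standard external clustering measures. A small NumPy sketch of Purity and NMI follows. The NMI here uses geometric-mean normalization, which is one common convention (scikit-learn's `normalized_mutual_info_score` defaults to the arithmetic mean), and the text does not state which one was used. ACC additionally needs the best one-to-one mapping between clusters and classes, usually found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), and is omitted here. Function names are illustrative.

```python
import numpy as np

def contingency(y_true, y_pred):
    """Counts C[k, t] of samples in predicted cluster k and true class t."""
    _, t = np.unique(y_true, return_inverse=True)
    _, k = np.unique(y_pred, return_inverse=True)
    C = np.zeros((k.max() + 1, t.max() + 1))
    np.add.at(C, (k, t), 1)
    return C

def purity(y_true, y_pred):
    C = contingency(y_true, y_pred)
    # each cluster is credited with its most frequent true class
    return float(C.max(axis=1).sum() / C.sum())

def nmi(y_true, y_pred):
    P = contingency(y_true, y_pred)
    P /= P.sum()  # joint distribution of (cluster, class)
    pk, pt = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = float(np.sum(P[nz] * np.log(P[nz] / np.outer(pk, pt)[nz])))
    hk = float(-np.sum(pk[pk > 0] * np.log(pk[pk > 0])))
    ht = float(-np.sum(pt[pt > 0] * np.log(pt[pt > 0])))
    if hk == 0.0 or ht == 0.0:
        return float(hk == ht)  # both partitions trivial -> 1.0, only one trivial -> 0.0
    return float(mi / np.sqrt(hk * ht))
```

Both measures are invariant to how cluster IDs are numbered, so they can be computed directly from the labels encoded in Y.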
Simulation experiments were carried out with the fast multi-view discrete clustering method based on an anchor graph.
In this example, the NUS-WIDE dataset is used, and 15,883 images from 10 classes are clustered. Each image is represented by five low-level features, including 64-dimensional color histograms, 144-dimensional color correlograms, 73-dimensional edge direction histograms and 128-dimensional wavelet textures.
In this example, MLHR (Multi-feature Learning via Hierarchical Regression), SMGI (Sparse Multiple Graph Integration), AMGL (Auto-weighted Multiple Graph Learning), MLAN (Multi-view Learning with Adaptive Neighbors), FMSSL-K (FMSSL using k-means to select anchors), FMSSL-R (FMSSL using random sampling to select anchors) and FMSSL are used for clustering and evaluated with ACC (accuracy), NMI (normalized mutual information) and Purity. The comparison results are shown in Table 1 below, and the run-time comparison is shown in Table 2 below.
TABLE 1 comparative results
Evaluation index | Cotrain | CoregSC | SwMC | MVGL | MLAN | OMSC | LMVSC | GMC | FMDC |
ACC | 0.5845 | 0.5926 | 0.5760 | 0.7335 | 0.5886 | 0.5005 | 0.4669 | 0.7437 | 0.7389 |
NMI | 0.5112 | 0.5057 | 0.5817 | 0.6152 | 0.6168 | 0.3927 | 0.4391 | 0.6339 | 0.6387 |
Purity | 0.5761 | 0.5925 | 0.5782 | 0.7335 | 0.6302 | 0.5011 | 0.5987 | 0.7437 | 0.7392 |
TABLE 2 run time comparison results (units: seconds)
Method | MNIST |
Cotrain | 6756.06 |
CoregSC | 2599.33 |
SwMC | 96021.83 |
MVGL | 34901.54 |
MLAN | 1249.61 |
LMVSC | 3926.07 |
GMC | 1268.18 |
FMDC | 263.82 |
As Tables 1 and 2 show, FMDC achieves the highest NMI and the shortest run time among the compared methods, while its ACC and Purity are close to the best values (obtained by GMC). The simulation experiment thus supports the effectiveness of the invention.
Example 2
A fast multi-view discrete clustering system based on an anchor graph implements the fast multi-view discrete clustering method based on the anchor graph. Referring to FIG. 3, which is a schematic block diagram of the system, the system comprises an anchor image generation module, an anchor graph construction module, a similarity matrix generation module, a weight assignment module, an FMDC objective construction module, and an FMDC objective solving module;
the anchor image generation module is used to generate m representative anchor images for the n original images using a balanced hierarchical k-means++ method;
the anchor graph construction module is used to construct, on each view feature, an anchor graph between the original images and the anchor images;
the similarity matrix generation module is used to calculate a similarity matrix between the original images based on the anchor images;
the weight assignment module is used to assign weight coefficients to the similarity graphs on the features to obtain an aggregated graph;
the FMDC objective construction module is used to construct the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image clustering label matrix;
and the FMDC objective solving module is used to solve the FMDC objective function, output a discrete multi-view image clustering indicator matrix, and complete the multi-view image clustering task.
The above content only illustrates the technical idea of the present invention and does not limit its scope of protection; any modification made on the basis of the technical idea of the present invention falls within the scope of protection of the claims of the present invention.
Claims (10)
1. A fast multi-view discrete clustering method based on an anchor graph, characterized by comprising the following steps:
Step 1: generating m anchor images for n original images using a balanced hierarchical k-means++ method, where m is smaller than n;
Step 2: constructing, on each view feature, an anchor graph between the original images and the anchor images;
Step 3: calculating a similarity matrix between the original images based on the anchor images;
Step 4: assigning a weight coefficient to the similarity graph on each feature to obtain an aggregated graph;
Step 5: constructing the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image clustering label matrix;
Step 6: solving the FMDC objective function by solving the original spectral clustering problem, outputting a discrete multi-view image clustering indicator matrix, and completing the multi-view image clustering task.
2. The fast multi-view discrete clustering method based on an anchor graph according to claim 1, wherein the anchor image generation process in step 1 is:
all original images are divided by k-means++ into two image sub-clusters of balanced size; balanced k-means++ is then applied repeatedly to the resulting sub-clusters until m leaf clusters are obtained, and the class-center images of the m leaf clusters are taken as the anchor images.
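A minimal sketch of the balanced hierarchical splitting in claim 2. It assumes rows of `X` are feature vectors, that m is a power of two (each level doubles the number of leaves), and that a "balanced" 2-means step seeds two centers with k-means++ and then assigns the half of the points relatively closest to the first center to it. The function names and the number of refinement iterations are hypothetical choices for illustration, not taken from the patent.

```python
import numpy as np

def kmeanspp_two(X, rng):
    # k-means++ seeding for two centers: the second is drawn with
    # probability proportional to squared distance from the first.
    c1 = X[rng.integers(len(X))]
    d2 = np.sum((X - c1) ** 2, axis=1)
    if d2.sum() == 0:
        return c1, c1.copy()
    c2 = X[rng.choice(len(X), p=d2 / d2.sum())]
    return c1, c2

def balanced_bisect(X, rng, n_iter=10):
    # Split X into two halves of (almost) equal size.
    c1, c2 = kmeanspp_two(X, rng)
    half = len(X) // 2
    for _ in range(n_iter):
        # Points relatively closer to c1 than to c2 come first.
        diff = np.sum((X - c1) ** 2, axis=1) - np.sum((X - c2) ** 2, axis=1)
        order = np.argsort(diff)
        left, right = order[:half], order[half:]
        c1, c2 = X[left].mean(axis=0), X[right].mean(axis=0)
    return left, right

def hierarchical_anchors(X, m, seed=0):
    # m must be a power of two and at most the number of samples.
    assert m > 0 and m & (m - 1) == 0 and m <= len(X)
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(X))]
    while len(clusters) < m:
        next_level = []
        for idx in clusters:
            left, right = balanced_bisect(X[idx], rng)
            next_level += [idx[left], idx[right]]
        clusters = next_level
    # Each anchor is the mean (class-center) vector of one leaf cluster.
    return np.stack([X[idx].mean(axis=0) for idx in clusters])
```

Sorting by the distance difference is one simple way to force an exact half/half split; it keeps every leaf the same size, so the number of images per anchor stays balanced.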
3. The fast multi-view discrete clustering method based on an anchor graph according to claim 2, wherein in step 1 a multi-view image dataset $\{X_1, X_2, X_3, X_4\}$ represented by four features, CENTRIST, wavelet moments, Gabor and LBP, is obtained;
the four kinds of feature data are concatenated into $[X_1; X_2; X_3; X_4]$, from which the concatenated anchor feature data $[U_1; U_2; U_3; U_4]$ are generated; $[U_1; U_2; U_3; U_4]$ is then cut according to the dimensions $d_1$, $d_2$, $d_3$, $d_4$ of the image features in the four views to obtain the multi-view anchor image set $\{U_1, U_2, U_3, U_4\}$;
wherein $U_1$ denotes the anchors of the CENTRIST feature, whose j-th element is the CENTRIST feature of the j-th anchor image; $U_2$ denotes the anchors of the wavelet moments feature, whose j-th element is the wavelet moments feature of the j-th anchor image; $U_3$ denotes the anchors of the Gabor feature, whose j-th element is the Gabor feature of the j-th anchor image; $U_4$ denotes the anchors of the LBP feature, whose j-th element is the LBP feature of the j-th anchor image;
$X_1$ denotes the CENTRIST feature, whose i-th element is the CENTRIST feature of the i-th image, $i \in \{1, \dots, n\}$, where n is the number of images in the dataset, and $d_1$ is the dimension of the CENTRIST feature; $X_2$ denotes the wavelet moments feature, whose i-th element is the wavelet moments feature of the i-th image, and $d_2$ is its dimension; $X_3$ denotes the Gabor feature, whose i-th element is the Gabor feature of the i-th image, and $d_3$ is its dimension; $X_4$ denotes the LBP feature, whose i-th element is the LBP feature of the i-th image, and $d_4$ is its dimension.
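The concatenate-then-cut step of claim 3 amounts to bookkeeping on column offsets. The sketch assumes each view is stored as an n × d_v array with one image per row, so concatenation runs along the feature axis; the dimensions in the usage lines are hypothetical, and random rows stand in for whatever anchor generator is actually used.

```python
import numpy as np

def split_anchor_views(U_concat, dims):
    """Cut concatenated anchors [U1, ..., UV] back into per-view blocks."""
    # np.split takes the column offsets at which each new view starts.
    offsets = np.cumsum(dims)[:-1]
    return np.split(U_concat, offsets, axis=1)

rng = np.random.default_rng(0)
n, m = 100, 8
dims = [5, 3, 4, 6]                      # hypothetical d1..d4
views = [rng.normal(size=(n, d)) for d in dims]
X_concat = np.hstack(views)              # n x (d1 + d2 + d3 + d4)

# Placeholder anchor generator: m random rows of the concatenated data.
U_concat = X_concat[rng.choice(n, size=m, replace=False)]
anchors = split_anchor_views(U_concat, dims)
```

Generating anchors once on the concatenated data and then cutting them guarantees that anchor j describes the same underlying image in every view.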
4. The fast multi-view discrete clustering method based on an anchor graph according to claim 3, wherein step 2 specifically comprises:
where $\|\cdot\|_2$ denotes the $\ell_2$ norm, the (i, j) entry of $Z^v$ represents the similarity between the i-th original image and the j-th anchor image on the v-th feature, the i-th row of $Z^v$ is the similarity vector between the i-th image and all m anchor images on the v-th feature, and $\gamma$ denotes a regularization parameter.
The solution to the problem in equation (1) is:
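Equations (1) and (2) are not reproduced in this text. A common formulation that matches the definitions above is the adaptive-neighbor problem $\min_{z_i^v \ge 0,\ \mathbf{1}^T z_i^v = 1} \sum_j \|x_i^v - u_j^v\|_2^2 z_{ij}^v + \gamma \|z_i^v\|_2^2$. When γ is chosen so that each row keeps its k nearest anchors, its closed-form solution is $z_{ij} = (d_{i,k+1} - d_{ij}) / (k\,d_{i,k+1} - \sum_{h=1}^{k} d_{ih})$ for the k nearest anchors and 0 otherwise, where $d_{ij}$ are squared distances sorted in ascending order. The sketch implements that assumed form; it is not confirmed to be the patent's equation (2), and the neighbor count `k` is a hypothetical parameter.

```python
import numpy as np

def anchor_graph(X, U, k=5):
    """n x m anchor graph Z for one view; each row keeps its k nearest anchors."""
    assert 0 < k < U.shape[0]
    # Squared Euclidean distances between samples (rows of X) and anchors (rows of U).
    d = np.sum(X**2, axis=1)[:, None] + np.sum(U**2, axis=1)[None, :] - 2 * X @ U.T
    d = np.maximum(d, 0.0)
    idx = np.argsort(d, axis=1)[:, : k + 1]           # k+1 nearest anchors per row
    d_sorted = np.take_along_axis(d, idx, axis=1)
    d_k1 = d_sorted[:, k:k + 1]                        # distance to the (k+1)-th anchor
    denom = k * d_k1 - d_sorted[:, :k].sum(axis=1, keepdims=True)
    # If all k+1 distances tie, fall back to uniform weights 1/k.
    tie = denom <= 1e-12
    w = np.where(tie, 1.0 / k, (d_k1 - d_sorted[:, :k]) / np.where(tie, 1.0, denom))
    Z = np.zeros_like(d)
    np.put_along_axis(Z, idx[:, :k], w, axis=1)
    return Z
```

Each row of the result is non-negative and sums to 1, with at most k nonzeros, so Z stays sparse and costs O(nm) to store.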
5. The fast multi-view discrete clustering method based on an anchor graph according to claim 4, wherein step 3 specifically comprises:
after the anchor graph $Z^v$ is obtained, the similarity graph on the v-th feature is calculated as
$S^v = Z^v (\Delta^v)^{-1} (Z^v)^T = B^v (B^v)^T \qquad (3)$
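Equation (3) implies $B^v = Z^v (\Delta^v)^{-1/2}$. The definition of $\Delta^v$ is not given in this text; the sketch assumes the usual choice in anchor-graph methods, a diagonal matrix holding the column sums of $Z^v$. Under that assumption, and with rows of $Z^v$ summing to 1, $S^v$ is symmetric with every row summing to 1, which makes a convenient sanity check.

```python
import numpy as np

def similarity_from_anchor_graph(Z):
    """Return B and S = Z diag(colsum)^-1 Z^T = B B^T for one view."""
    col_sums = Z.sum(axis=0)                            # assumed diagonal of Delta
    inv_sqrt = 1.0 / np.sqrt(np.maximum(col_sums, 1e-12))
    B = Z * inv_sqrt[None, :]                           # B = Z Delta^{-1/2}
    return B, B @ B.T
```

Forming S explicitly needs O(n²) memory; for large n it is cheaper to keep only the n × m factor B and compute products such as S y as B (Bᵀ y).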
6. The fast multi-view discrete clustering method based on an anchor graph according to claim 5, wherein step 4 specifically comprises:
assigning a weight coefficient $\alpha_v$ to the similarity graph $S^v$ on each feature:
7. The fast multi-view discrete clustering method based on an anchor graph according to claim 6, wherein step 5 specifically comprises:
taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph $Y(Y^T Y)^{-1} Y^T$ reconstructed from the image clustering label matrix, the objective function of FMDC is constructed:
where $\|\cdot\|_F$ denotes the Frobenius norm, $Y = [y_1, \dots, y_c] \in \{0,1\}^{n \times c}$ is the discrete image clustering indicator matrix, c is the number of classes, $y_l$ ($l \in 1, \dots, c$) is the cluster indicator vector of the l-th image class, and Ind is short for "discrete image cluster indicator matrix".
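The FMDC objective can be evaluated for a given weighting and labeling. The aggregated graph's formula is not reproduced in this text; the sketch assumes it is the weighted sum $\sum_v \alpha_v S^v$ built from the weights of claim 6, and it builds $Y(Y^TY)^{-1}Y^T$ from integer labels. Because $Y^TY$ is the diagonal matrix of cluster sizes, the reconstructed graph has entry $1/n_l$ for two images in the same cluster l and 0 otherwise.

```python
import numpy as np

def reconstructed_graph(labels, c):
    """P = Y (Y^T Y)^{-1} Y^T for a hard labeling with c clusters."""
    Y = np.eye(c)[labels]                 # n x c one-hot indicator matrix
    counts = Y.sum(axis=0)                # cluster sizes = diagonal of Y^T Y
    return (Y / np.maximum(counts, 1)) @ Y.T

def fmdc_objective(alphas, S_list, labels, c):
    # Assumed aggregation: weighted sum of the per-view similarity graphs.
    S = sum(a * Sv for a, Sv in zip(alphas, S_list))
    return np.linalg.norm(S - reconstructed_graph(labels, c), "fro") ** 2
```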
8. The fast multi-view discrete clustering method based on an anchor graph according to claim 7, wherein the objective function of FMDC in step 6 is solved as follows:
(601) Fix α and update Y.
When α is fixed, the subproblem for Y is:
Solving this problem is equivalent to solving the problem in the following equation:
Equation (7) can be rewritten in another form:
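Equations (6) to (8) are not reproduced in this text. The following standard derivation, assuming the aggregated graph is $S = \sum_v \alpha_v S^v$ (symmetric, since each $S^v = B^v (B^v)^T$) and that no cluster is empty, shows why minimizing over Y with α fixed reduces to a maximization, which is consistent with claim 9 selecting the row value that maximizes the objective:

```latex
\begin{aligned}
P &= Y(Y^\top Y)^{-1}Y^\top, \qquad P^\top = P,\quad P^2 = P,\quad \operatorname{tr}(P) = c,\\
\|S - P\|_F^2 &= \|S\|_F^2 - 2\operatorname{tr}(SP) + \operatorname{tr}(P^2)
  = \|S\|_F^2 - 2\operatorname{tr}\bigl((Y^\top Y)^{-1}Y^\top S Y\bigr) + c,\\
\min_{Y \in \mathrm{Ind}} \|S - P\|_F^2
  &\iff \max_{Y \in \mathrm{Ind}} \operatorname{tr}\bigl((Y^\top Y)^{-1}Y^\top S Y\bigr)
  = \max_{Y \in \mathrm{Ind}} \sum_{l=1}^{c} \frac{y_l^\top S y_l}{y_l^\top y_l}.
\end{aligned}
```

The last equality holds because $Y^\top Y$ is diagonal, with the cluster sizes $y_l^\top y_l$ on its diagonal.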
(602) Fix Y and update α.
When Y is fixed, the subproblem for α is:
Formula (15) is solved using the augmented Lagrange multiplier method.
(603) Repeat (601) and (602), alternately optimizing α and Y, until the objective function converges to a minimum.
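The patent solves the α subproblem with the augmented Lagrange multiplier method. As a swapped-in alternative for illustration only, the sketch below minimizes $\|\sum_v \alpha_v S^v - P\|_F^2$, with $P = Y(Y^TY)^{-1}Y^T$ held fixed, by projected gradient descent. It assumes α is constrained to the probability simplex ($\alpha_v \ge 0$, $\sum_v \alpha_v = 1$); that constraint set, the step size and the iteration count are assumptions, not taken from the text.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1) / k > 0][-1]
    theta = (css[rho - 1] - 1) / rho
    return np.maximum(v - theta, 0.0)

def update_alpha(S_list, P, n_iter=200):
    """Minimize ||sum_v a_v S^v - P||_F^2 over the simplex by projected gradient."""
    V = len(S_list)
    # Objective = a^T M a - 2 b^T a + ||P||_F^2, with M_uv = <S^u, S^v>, b_v = <S^v, P>.
    M = np.array([[np.sum(Su * Sv) for Sv in S_list] for Su in S_list])
    b = np.array([np.sum(Sv * P) for Sv in S_list])
    # Step 1/L, where L = 2 * lambda_max(M) is the gradient's Lipschitz constant.
    step = 1.0 / (2 * max(np.linalg.eigvalsh(M).max(), 1e-12))
    a = np.full(V, 1.0 / V)
    for _ in range(n_iter):
        a = project_to_simplex(a - step * (2 * M @ a - 2 * b))
    return a
```

Because the subproblem only involves V weights, the V × V matrix M can be formed once per outer iteration; with anchor graphs its entries can also be computed from the $B^v$ factors instead of the dense $S^v$.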
9. The fast multi-view discrete clustering method based on an anchor graph according to claim 8, wherein in (601) Y is updated row by row using the coordinate descent method, specifically:
when updating the i-th row $y_i$ of Y, the objective function values obtained when $y_i$ takes each of its c possible values $[1, 0, \dots, 0], \dots, [0, 0, \dots, 1]$ are compared, and the value that maximizes the objective function is selected as the optimal solution for $y_i$;
denote by $\{Y^{(1)}, \dots, Y^{(c)}\}$ the matrices Y obtained when $y_i$ takes each of the c values; the objective function value for the h-th value is:
thus, the problem to be solved is:
define $Y^{(0)}$ as the matrix Y whose i-th row is $[0, 0, \dots, 0]$, i.e., all zeros; then:
define $\Delta(h) = \mathrm{obj}(Y^{(h)}) - \mathrm{obj}(Y^{(0)})$; solving the problem in equation (10) is equivalent to solving the following problem:
the optimal solution is then obtained as:
where $y_{ip} = 1$ if the condition in $\langle \cdot \rangle$ holds, and $y_{ip} = 0$ otherwise.
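A minimal sketch of the row-wise coordinate descent in claim 9. It assumes the maximization objective $\sum_l y_l^T S y_l / (y_l^T y_l)$, which follows from the Frobenius objective when α is fixed, and a dense aggregated graph `S`. Removing image i gives $Y^{(0)}$; assigning it to cluster h changes only that cluster's term, so $\Delta(h)$ needs the numerator and size of cluster h without image i, plus $S_{i,:} y_h$ and $S_{ii}$. This naive version recomputes everything per row and is far slower than an anchor-based implementation would be; it only illustrates the update rule, and the function name and sweep limit are hypothetical.

```python
import numpy as np

def update_Y_rows(S, labels, c, n_sweeps=5):
    """Coordinate descent over rows of Y maximizing sum_l y_l^T S y_l / y_l^T y_l."""
    labels = np.array(labels, copy=True)
    n = len(labels)
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            # Y^(0): row i set to all zeros (image i removed from its cluster).
            others = np.arange(n) != i
            Y0 = np.eye(c)[labels] * others[:, None]
            counts = Y0.sum(axis=0)                     # cluster sizes without image i
            num = np.einsum("il,ij,jl->l", Y0, S, Y0)    # y_l^T S y_l for each l
            s_i = S[i] @ Y0                             # S[i, :] y_l for each l
            old = np.divide(num, counts, out=np.zeros(c), where=counts > 0)
            new = (num + 2 * s_i + S[i, i]) / (counts + 1)
            delta = new - old                            # Delta(h) for h = 1..c
            p = int(np.argmax(delta))                    # y_ip = 1 for the best h
            if p != labels[i]:
                labels[i], changed = p, True
        if not changed:
            break
    return labels
```

Since the current assignment of image i is always among the c candidates, each row update never decreases the objective, so the sweeps stop once no row changes.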
10. A fast multi-view discrete clustering system based on an anchor graph, characterized by comprising an anchor image generation module, an anchor graph construction module, a similarity matrix generation module, a weight assignment module, an FMDC objective construction module, and an FMDC objective solving module;
the anchor image generation module is configured to generate m representative anchor images for the n original images using a balanced hierarchical k-means++ method;
the anchor graph construction module is configured to construct, on each view feature, an anchor graph between the original images and the anchor images;
the similarity matrix generation module is configured to calculate a similarity matrix between the original images based on the anchor images;
the weight assignment module is configured to assign weight coefficients to the similarity graphs on the features to obtain an aggregated graph;
the FMDC objective construction module is configured to construct the FMDC objective function, taking as the optimization goal the minimization of the difference between the aggregated graph and the similarity graph reconstructed from the image clustering label matrix;
and the FMDC objective solving module is configured to solve the FMDC objective function, output a discrete multi-view image clustering indicator matrix, and complete the multi-view image clustering task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111452689.8A CN114399653A (en) | 2021-11-30 | 2021-11-30 | Fast multi-view discrete clustering method and system based on anchor point diagram |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111452689.8A CN114399653A (en) | 2021-11-30 | 2021-11-30 | Fast multi-view discrete clustering method and system based on anchor point diagram |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114399653A (en) | 2022-04-26 |
Family
ID=81225985
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111452689.8A Pending CN114399653A (en) | 2021-11-30 | 2021-11-30 | Fast multi-view discrete clustering method and system based on anchor point diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114399653A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200074220A1 (en) * | 2018-09-04 | 2020-03-05 | Inception Institute of Artificial Intelligence, Ltd. | Multi-view image clustering techniques using binary compression |
CN111753904A (en) * | 2020-06-24 | 2020-10-09 | 广东工业大学 | Rapid hyperspectral image clustering method, device, equipment and medium |
AU2020103322A4 (en) * | 2020-11-09 | 2021-01-14 | Southwest University | Supervised Discrete Hashing Algorithm With Relaxation Over Distributed Network |
CN112418286A (en) * | 2020-11-16 | 2021-02-26 | 武汉大学 | Multi-view clustering method based on constrained non-negative matrix factorization |
Non-Patent Citations (3)
Title |
---|
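QIANYAO QIANG: "Fast Multi-view Discrete Clustering with Anchor Graphs", The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), pages 1 - 4 * |
夏冬雪; 杨燕; 王浩; 阳树洪: "Late-fusion multi-view clustering algorithm based on neighborhood multiple kernel learning", Journal of Computer Research and Development, no. 08 * |
孔颉; 孙权森; 徐晖; ***; 纪则轩: "Multi-target classification of remote sensing images based on affine-invariant discrete hashing", Journal of Software, no. 04 * |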
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116310452A (en) * | 2023-02-16 | 2023-06-23 | 广东能哥知识科技有限公司 | Multi-view clustering method and system |
CN116310452B (en) * | 2023-02-16 | 2024-03-19 | 广东能哥知识科技有限公司 | Multi-view clustering method and system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Jia et al. | Bagging-based spectral clustering ensemble selection | |
Mur et al. | Determination of the optimal number of clusters using a spectral clustering optimization | |
Alaba et al. | Towards a more efficient and cost-sensitive extreme learning machine: A state-of-the-art review of recent trend | |
CN107292341A (en) | Adaptive multi views clustering method based on paired collaboration regularization and NMF | |
Liu et al. | Unsupervised feature selection via diversity-induced self-representation | |
CN111724867A (en) | Molecular property measurement method, molecular property measurement device, electronic apparatus, and storage medium | |
Yin | Nonlinear dimensionality reduction and data visualization: a review | |
Chen et al. | Uncorrelated lasso | |
Chen et al. | LABIN: Balanced min cut for large-scale data | |
Zhang et al. | Enabling in-situ data analysis for large protein-folding trajectory datasets | |
CN114399649B (en) | Rapid multi-view semi-supervised learning method and system based on learning graph | |
CN113255873A (en) | Clustering longicorn herd optimization method, system, computer equipment and storage medium | |
CN114299362A (en) | Small sample image classification method based on k-means clustering | |
CN112766400A (en) | Semi-supervised classification integration method for high-dimensional data based on multiple data transformation spaces | |
CN111401413A (en) | Optimization theory-based parallel clustering method with scale constraint | |
CN114399653A (en) | Fast multi-view discrete clustering method and system based on anchor point diagram | |
CN110175631A (en) | A kind of multiple view clustering method based on common Learning Subspaces structure and cluster oriental matrix | |
Chu et al. | On regularized square-root regression problems: distributionally robust interpretation and fast computations | |
JP5552023B2 (en) | Clustering system, method and program | |
Ng et al. | Incremental hashing with sample selection using dominant sets | |
CN113139556B (en) | Manifold multi-view image clustering method and system based on self-adaptive composition | |
Zhu et al. | Incremental classifier learning based on PEDCC-loss and cosine distance | |
Wang et al. | Adaptive hypergraph superpixels | |
CN114037931A (en) | Multi-view discrimination method of self-adaptive weight | |
Chen et al. | FINC: An efficient and effective optimization method for normalized cut |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |