CN112784902A - Two-mode clustering method with missing data - Google Patents

Two-mode clustering method with missing data

Info

Publication number
CN112784902A
CN112784902A (application CN202110095029.2A)
Authority
CN
China
Prior art keywords
modal
data
modality
encoder
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110095029.2A
Other languages
Chinese (zh)
Other versions
CN112784902B (en)
Inventor
彭玺
林义杰
杨谋星
李云帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202110095029.2A
Publication of CN112784902A
Application granted
Publication of CN112784902B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a two-modal clustering method with missing data. Based on autoencoders, the method learns a modality-specific representation of each modality's data through an intra-modal reconstruction loss, learns modality-consistent representations through a cross-modal contrastive learning loss, and, through a cross-modal dual prediction loss, recovers the information of missing modalities while discarding cross-modal inconsistency, thereby further improving consistency. Data recovery and consistency learning are handled in a unified framework, yielding a better clustering effect.

Description

Two-mode clustering method with missing data
Technical Field
The invention relates to the field of big data analysis, in particular to a two-modal clustering method with missing data.
Background
At present, multi-modal data clustering technology is widely applied in various fields. In commodity recommendation, massive product images are combined with their text attributes to learn semantic feature representations of the images, improving how well recommendations match user needs; in multi-turn dialogue with intelligent customer service, multi-modal clustering of vision and language enables automatic responses with text, pictures, or video. The success of these multi-modal techniques mainly benefits from consistency learning on multi-modal data, that is, exploring and exploiting the inherent correlations and invariance of data across different modalities. However, consistency learning presupposes the completeness of the multi-modal data: every sample must cover all modalities, with no missing modality data. In practice, due to the complexity of data acquisition environments, modality missingness is common. For example, in an online conference, some video frames may lose the visual or auditory signal due to sensor damage; in medical diagnosis, patients often undergo only some of the possible examinations rather than all of them, and how to diagnose the etiology from partial examination information is the essential problem of multi-modal clustering with missing data. With current technology, clustering real multi-modal data requires completing the data in advance to guarantee the completeness of the objects to be clustered. Existing completion methods mainly exploit the similarity among samples rather than recovering the missing data samples themselves, such as matrix-factorization-based doubly aligned incomplete multi-modal clustering (DAIMC), partial multi-modal clustering (PVC), and incomplete multi-modal visual data grouping (IMG).
Incomplete multi-modal data clustering methods can be roughly divided into two categories. One is based on shallow models; for example, the DAIMC method proposed by Menglei Hu et al. models the high-order correlation among modalities through low-rank matrix factorization and, combined with relevant prior information, effectively exploits the consistent information among modalities to achieve multi-modal subspace learning. The other is based on deep learning; for example, the DM2C method proposed by Yangbangyan Jiang et al. first obtains a modality-specific representation of each modality with an autoencoder, then uses cycle-consistent generative adversarial networks (CycleGANs) to generate the missing modality data from the complete modality data, and concatenates the modality-specific representations of each modality to obtain a common representation.
Moreover, almost all existing methods treat data recovery and consistency learning as two separate problems or steps, lacking a unified theoretical understanding; examples include deep mixed-modal clustering (DM2C) and adversarial incomplete multi-modal clustering (AIMC), both based on generative adversarial networks. Therefore, under conditions of missing modality data, research on a clustering technique that unifies data completion and consistency learning has high application prospects and practical value.
Disclosure of Invention
Aiming at the above defects in the prior art, the two-modal clustering method with missing data provided by the invention solves the problem that data recovery and consistency learning are not handled in a unified way in the prior art.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
A two-modal clustering method with missing data is provided, comprising the following steps:
S1, feeding the two modality data of each sample in which both modalities are present into the corresponding autoencoders to obtain the corresponding hidden representations;
S2, obtaining the corresponding cross-modal contrastive learning loss and intra-modal reconstruction loss from the hidden representations corresponding to the two modality data;
S3, back-propagating through the current autoencoders according to the cross-modal contrastive learning loss and the intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S4, judging whether the number of back-propagation iterations reaches a threshold; if so, proceeding to step S5, otherwise returning to step S1;
S5, obtaining the corresponding cross-modal contrastive learning loss, intra-modal reconstruction loss, and cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data;
S6, back-propagating through the current autoencoders according to the current latest cross-modal contrastive learning loss, cross-modal dual prediction loss, and intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S7, judging whether the current autoencoders have converged; if so, proceeding to step S8, otherwise returning to step S5;
S8, feeding the set of samples in which both modalities are present, the samples with only the first modality, and the samples with only the second modality, as a two-modal data set with missing data, into the current latest autoencoders to obtain the hidden representations corresponding to the two-modal data set with missing data;
S9, obtaining, based on the dual mapping, the representation of the missing modality corresponding to the hidden representation of the sample set with only the first modality and the representation of the missing modality corresponding to the hidden representation of the sample set with only the second modality in the two-modal data set;
S10, concatenating the different modality representations corresponding to each sample as its common representation, and clustering the common representations to complete the two-modal clustering with missing data.
Further, the autoencoder in step S1 includes an encoder and a decoder. The encoder includes, connected in sequence, a first fully-connected layer, a first batch normalization layer, a first activation function, a second fully-connected layer, a second batch normalization layer, a second activation function, a third fully-connected layer, a third batch normalization layer, a third activation function, a fourth fully-connected layer, and a fourth activation function. The input dimension of the first fully-connected layer is the dimension of the input modality data; the output dimensions of the first, second, and third fully-connected layers are all 1024; the first, second, and third activation functions are all ReLU; the output dimension of the fourth fully-connected layer is 128, and the fourth activation function is Softmax.
The decoder includes, connected in sequence, a fifth fully-connected layer, a fourth batch normalization layer, a fifth activation function, a sixth fully-connected layer, a fifth batch normalization layer, a sixth activation function, a seventh fully-connected layer, a sixth batch normalization layer, a seventh activation function, an eighth fully-connected layer, a seventh batch normalization layer, and an eighth activation function. The input dimension of the fifth fully-connected layer is 128; the output dimensions of the fifth, sixth, and seventh fully-connected layers are all 1024; the fifth, sixth, seventh, and eighth activation functions are all ReLU; and the output dimension of the eighth fully-connected layer is the dimension of the input modality data.
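For illustration only, a minimal PyTorch sketch of one such autoencoder is given below; the layer sizes follow the description above, while the class and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoder f: three FC+BatchNorm+ReLU blocks, then FC(1024 -> 128) with Softmax,
    so the hidden representation can be read as a soft cluster assignment."""
    def __init__(self, input_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, latent_dim), nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class Decoder(nn.Module):
    """Decoder g: mirrors the encoder and maps the 128-d hidden representation
    back to the dimension of the input modality data."""
    def __init__(self, output_dim: int, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, output_dim), nn.BatchNorm1d(output_dim), nn.ReLU(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)
```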
Further, the specific method for obtaining the corresponding cross-modal contrastive learning loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{cl} = -\frac{1}{m}\sum_{t=1}^{m}\left[ I\left(z_t^{(1)}, z_t^{(2)}\right) + \alpha\left( H\left(z_t^{(1)}\right) + H\left(z_t^{(2)}\right) \right) \right]$$

obtaining the cross-modal contrastive learning loss $\ell_{cl}$; where m is the total number of samples in which both modalities are present, t denotes the t-th sample, $I(\cdot,\cdot)$ denotes mutual information, $z_t^{(1)}$ is the hidden representation corresponding to the first modality data of the t-th sample in which both modalities are present, $z_t^{(2)}$ is the hidden representation corresponding to the second modality data of the t-th sample in which both modalities are present, $H(\cdot)$ denotes information entropy, and α is the balance parameter of the entropy.
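Because the encoder ends in a Softmax, each hidden representation is a distribution over D = 128 over-clusters, so the mutual information and entropy terms can be evaluated from an estimated joint assignment matrix (this discrete form is made precise in the detailed description below). A minimal sketch, assuming paired representations z1 and z2 of shape (m, D); the symmetrization step is an assumption borrowed from IIC-style objectives, not stated in the original text:

```python
import torch

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                     alpha: float = 10.0, eps: float = 1e-8) -> torch.Tensor:
    """Cross-modal contrastive loss: negative mutual information minus
    alpha-weighted entropies, computed from the joint assignment matrix P.
    alpha = 10 follows the value fixed in the detailed description."""
    m, D = z1.shape
    P = (z1.T @ z2) / m                    # D x D joint distribution over cluster pairs
    P = ((P + P.T) / 2).clamp_min(eps)     # symmetrize (assumption) and guard the log
    P = P / P.sum()
    Pd = P.sum(dim=1)                      # marginal P(z = d), row sums
    Pd_prime = P.sum(dim=0)                # marginal P(z' = d'), column sums
    mi = (P * (P.log() - Pd.log()[:, None] - Pd_prime.log()[None, :])).sum()
    h1 = -(Pd * Pd.log()).sum()
    h2 = -(Pd_prime * Pd_prime.log()).sum()
    return -(mi + alpha * (h1 + h2))
```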
Further, the specific method for obtaining the corresponding intra-modal reconstruction loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{rec} = \frac{1}{m}\sum_{t=1}^{m}\sum_{v=1}^{2}\left\| x_t^{(v)} - g^{(v)}\!\left(f^{(v)}\!\left(x_t^{(v)}\right)\right) \right\|_2^2$$

obtaining the intra-modal reconstruction loss $\ell_{rec}$; where m is the total number of samples in which both modalities are present, t denotes the t-th sample, $x_t^{(v)}$ denotes the v-th modality data of the t-th sample, $f^{(v)}(\cdot)$ and $g^{(v)}(\cdot)$ denote the encoder and decoder currently corresponding to the v-th modality data, respectively, and $\|\cdot\|_2$ is the norm.
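A corresponding sketch, reusing the Encoder/Decoder modules sketched above and taking mean squared error for the squared norm:

```python
import torch.nn.functional as F

def reconstruction_loss(xs, encoders, decoders):
    """Intra-modal reconstruction loss summed over both modalities.
    xs: list of two tensors of shape (m, d_v) holding the complete samples."""
    loss = 0.0
    for x, f, g in zip(xs, encoders, decoders):
        loss = loss + F.mse_loss(g(f(x)), x)  # || x - g(f(x)) ||^2, averaged over the batch
    return loss
```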
Further, the specific method of step S3 is:
taking the result of $\ell_{cl} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; where $\ell_{cl}$ is the cross-modal contrastive learning loss and $\ell_{rec}$ is the intra-modal reconstruction loss.
Further, in step S5, the specific method for obtaining the corresponding cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data is: according to the formula

$$\ell_{pre} = \left\| G^{(1)}\!\left(Z^{1}\right) - Z^{2} \right\|_2^2 + \left\| G^{(2)}\!\left(Z^{2}\right) - Z^{1} \right\|_2^2$$

obtaining the cross-modal dual prediction loss $\ell_{pre}$; where $Z^1$ is the set of hidden representations corresponding to all first modality data of the samples in which both modalities are present, $Z^2$ is the set of hidden representations corresponding to all second modality data of the samples in which both modalities are present, $G^{(1)}(Z^1)$ maps $Z^1$, $G^{(2)}(Z^2)$ maps $Z^2$, $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping, and $\|\cdot\|_2$ is the norm.
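A minimal sketch of this loss; G1 and G2 stand for the dual mapping networks whose 6-layer structure is given in the detailed description:

```python
def dual_prediction_loss(z1, z2, G1, G2):
    """Cross-modal dual prediction: each modality's representation
    must predict the other's through the dual maps."""
    return F.mse_loss(G1(z1), z2) + F.mse_loss(G2(z2), z1)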
Further, the specific method of step S6 is:
taking the result of $\ell_{cl} + 0.1\,\ell_{pre} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; where $\ell_{cl}$ is the cross-modal contrastive learning loss, $\ell_{pre}$ is the cross-modal dual prediction loss, and $\ell_{rec}$ is the intra-modal reconstruction loss.
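The two training phases (steps S3-S4 warm up without the dual prediction term; steps S5-S7 add it) could be organized as below, reusing the loss functions sketched above. The optimizer choice, learning rate, and convergence test are illustrative assumptions; the 100-epoch warm-up threshold is taken from the detailed description:

```python
import itertools
import torch

def train(x1, x2, enc, dec, G1, G2,
          warmup_epochs: int = 100, max_epochs: int = 500, tol: float = 1e-6):
    """enc, dec: lists holding the two modality-specific encoders/decoders."""
    params = itertools.chain(enc[0].parameters(), enc[1].parameters(),
                             dec[0].parameters(), dec[1].parameters(),
                             G1.parameters(), G2.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)      # optimizer choice is an assumption
    prev = float("inf")
    for epoch in range(max_epochs):
        z1, z2 = enc[0](x1), enc[1](x2)          # step S1: hidden representations
        loss = contrastive_loss(z1, z2) + 0.1 * reconstruction_loss([x1, x2], enc, dec)
        if epoch >= warmup_epochs:               # steps S5-S6: add the dual prediction term
            loss = loss + 0.1 * dual_prediction_loss(z1, z2, G1, G2)
        opt.zero_grad()
        loss.backward()                          # steps S3/S6: back-propagation
        opt.step()
        if epoch >= warmup_epochs and abs(prev - loss.item()) < tol:
            break                                # step S7: crude convergence test
        prev = loss.item()
    return enc, dec, G1, G2
```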
Further, the specific method of step S8 is: according to the formulas

$$Z^{1} = f^{(1)}\!\left(X_{1}\right),\qquad Z^{2} = f^{(2)}\!\left(X_{2}\right)$$
$$Z^{(1)} = f^{(1)}\!\left(X^{(1)}\right),\qquad Z^{(2)} = f^{(2)}\!\left(X^{(2)}\right)$$

obtaining the hidden representations corresponding to the two-modal data set with missing data, including the hidden representation $Z^{1}$ corresponding to the sample set $X_{1}$ of first modality data of the samples in which both modalities are present, the hidden representation $Z^{2}$ corresponding to the sample set $X_{2}$ of second modality data of the samples in which both modalities are present, the hidden representation $Z^{(1)}$ corresponding to the sample set $X^{(1)}$ in which only the first modality is present, and the hidden representation $Z^{(2)}$ corresponding to the sample set $X^{(2)}$ in which only the second modality is present; where $f^{(1)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the first modality data and $f^{(2)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the second modality data.
Further, the specific method of step S9 is: according to the formulas

$$\hat{Z}^{(2)} = G^{(1)}\!\left(Z^{(1)}\right),\qquad \hat{Z}^{(1)} = G^{(2)}\!\left(Z^{(2)}\right)$$

obtaining, respectively, the representation $\hat{Z}^{(2)}$ of the missing modality corresponding to the hidden representation $Z^{(1)}$ of the sample set in which only the first modality is present, and the representation $\hat{Z}^{(1)}$ of the missing modality corresponding to the hidden representation $Z^{(2)}$ of the sample set in which only the second modality is present; where $G^{(1)}(\cdot)$ denotes the mapping corresponding to the first modality, $G^{(2)}(\cdot)$ denotes the mapping corresponding to the second modality, and $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping.
Further, the specific method of concatenating the different modality representations corresponding to each sample and using the result as its common representation in step S10 is:
taking $\left[Z^{1}; Z^{2}\right]$ as the common representation of the samples in which both modalities are present; taking $\left[Z^{(1)}; \hat{Z}^{(2)}\right]$ as the common representation of the samples in which only the first modality is present; and taking $\left[\hat{Z}^{(1)}; Z^{(2)}\right]$ as the common representation of the samples in which only the second modality is present.
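Steps S8-S10 then amount to encoding every subset, imputing the missing side through the dual maps, concatenating, and running an off-the-shelf clusterer. A sketch under those assumptions (the detailed description names k-means explicitly; scikit-learn's implementation is one choice):

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def cluster_with_missing(x1_full, x2_full, x1_only, x2_only, enc, G1, G2, n_clusters):
    """Modules are assumed to be trained and switched to eval mode."""
    # Step S8: hidden representations for complete and modality-missing samples
    z1, z2 = enc[0](x1_full), enc[1](x2_full)
    z1_only, z2_only = enc[0](x1_only), enc[1](x2_only)
    # Step S9: predict each missing modality's representation via the dual mapping
    z2_hat = G1(z1_only)   # representation of the missing second modality
    z1_hat = G2(z2_only)   # representation of the missing first modality
    # Step S10: concatenate into common representations and cluster
    common = torch.cat([
        torch.cat([z1, z2], dim=1),
        torch.cat([z1_only, z2_hat], dim=1),
        torch.cat([z1_hat, z2_only], dim=1),
    ], dim=0)
    return KMeans(n_clusters=n_clusters).fit_predict(common.cpu().numpy())
```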
The invention has the following beneficial effects: based on autoencoders, the method learns a modality-specific representation of each modality's data through the intra-modal reconstruction loss, learns modality-consistent representations through the cross-modal contrastive learning loss, and, through the cross-modal dual prediction loss, recovers the information of missing modalities while discarding cross-modal inconsistency, thereby further improving consistency; data recovery and consistency learning are handled in a unified framework, yielding a better clustering effect.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a block diagram of a model of the present invention;
FIG. 3 is a comparison of accuracy as the missing rate varies from 0 to 0.8 in example 1;
FIG. 4 is a comparison of normalized mutual information as the missing rate varies from 0 to 0.8 in example 1;
FIG. 5 is a comparison of the adjusted Rand index as the missing rate varies from 0 to 0.8 in example 1;
FIG. 6 is a comparison of accuracy as the missing rate varies from 0 to 0.8 in example 2;
FIG. 7 is a comparison of normalized mutual information as the missing rate varies from 0 to 0.8 in example 2;
FIG. 8 is a comparison of the adjusted Rand index as the missing rate varies from 0 to 0.8 in example 2.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of the embodiments. To those skilled in the art, various changes within the spirit and scope of the invention as defined by the appended claims are apparent, and all inventions making use of the inventive concept are protected.
As shown in FIG. 1 and FIG. 2, the two-modal clustering method with missing data comprises the following steps:
S1, feeding the two modality data of each sample in which both modalities are present into the corresponding autoencoders to obtain the corresponding hidden representations;
S2, obtaining the corresponding cross-modal contrastive learning loss and intra-modal reconstruction loss from the hidden representations corresponding to the two modality data;
S3, back-propagating through the current autoencoders according to the cross-modal contrastive learning loss and the intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S4, judging whether the number of back-propagation iterations reaches a threshold; if so, proceeding to step S5, otherwise returning to step S1; the threshold is 100;
S5, obtaining the corresponding cross-modal contrastive learning loss, intra-modal reconstruction loss, and cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data;
S6, back-propagating through the current autoencoders according to the current latest cross-modal contrastive learning loss, cross-modal dual prediction loss, and intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S7, judging whether the current autoencoders have converged; if so, proceeding to step S8, otherwise returning to step S5;
S8, feeding the set of samples in which both modalities are present, the samples with only the first modality, and the samples with only the second modality, as a two-modal data set with missing data, into the current latest autoencoders to obtain the hidden representations corresponding to the two-modal data set with missing data;
S9, obtaining, based on the dual mapping, the representation of the missing modality corresponding to the hidden representation of the sample set with only the first modality and the representation of the missing modality corresponding to the hidden representation of the sample set with only the second modality in the two-modal data set;
S10, concatenating the different modality representations corresponding to each sample as its common representation, and clustering the common representations to complete the two-modal clustering with missing data.
The autoencoder in step S1 includes an encoder and a decoder. The encoder includes, connected in sequence, a first fully-connected layer, a first batch normalization layer, a first activation function, a second fully-connected layer, a second batch normalization layer, a second activation function, a third fully-connected layer, a third batch normalization layer, a third activation function, a fourth fully-connected layer, and a fourth activation function. The input dimension of the first fully-connected layer is the dimension of the input modality data; the output dimensions of the first, second, and third fully-connected layers are all 1024; the first, second, and third activation functions are all ReLU; the output dimension of the fourth fully-connected layer is 128, and the fourth activation function is Softmax.
The decoder includes, connected in sequence, a fifth fully-connected layer, a fourth batch normalization layer, a fifth activation function, a sixth fully-connected layer, a fifth batch normalization layer, a sixth activation function, a seventh fully-connected layer, a sixth batch normalization layer, a seventh activation function, an eighth fully-connected layer, a seventh batch normalization layer, and an eighth activation function. The input dimension of the fifth fully-connected layer is 128; the output dimensions of the fifth, sixth, and seventh fully-connected layers are all 1024; the fifth, sixth, seventh, and eighth activation functions are all ReLU; and the output dimension of the eighth fully-connected layer is the dimension of the input modality data.
The specific method for obtaining the corresponding cross-modal contrastive learning loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{cl} = -\frac{1}{m}\sum_{t=1}^{m}\left[ I\left(z_t^{(1)}, z_t^{(2)}\right) + \alpha\left( H\left(z_t^{(1)}\right) + H\left(z_t^{(2)}\right) \right) \right]$$

obtaining the cross-modal contrastive learning loss $\ell_{cl}$; where m is the total number of samples in which both modalities are present, t denotes the t-th sample, $I(\cdot,\cdot)$ denotes mutual information, and $z_t^{(v)}$ denotes the hidden representation corresponding to the v-th modality data of the t-th sample, v ∈ {1, 2}; that is, $z_t^{(1)}$ is the hidden representation corresponding to the first modality data of the t-th sample in which both modalities are present, and $z_t^{(2)}$ is the hidden representation corresponding to the second modality data of the t-th sample in which both modalities are present; $H(\cdot)$ denotes information entropy, and α is the balance parameter of the entropy.
The specific method for obtaining the corresponding intra-modal reconstruction loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{rec} = \frac{1}{m}\sum_{t=1}^{m}\sum_{v=1}^{2}\left\| x_t^{(v)} - g^{(v)}\!\left(f^{(v)}\!\left(x_t^{(v)}\right)\right) \right\|_2^2$$

obtaining the intra-modal reconstruction loss $\ell_{rec}$; where m is the total number of samples in which both modalities are present, t denotes the t-th sample, $x_t^{(v)}$ denotes the v-th modality data of the t-th sample, $f^{(v)}(\cdot)$ and $g^{(v)}(\cdot)$ denote the encoder and decoder currently corresponding to the v-th modality data, respectively, and $\|\cdot\|_2$ is the norm.
The specific method of step S3 is: taking the result of $\ell_{cl} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; where $\ell_{cl}$ is the cross-modal contrastive learning loss and $\ell_{rec}$ is the intra-modal reconstruction loss.
In step S5, the specific method for obtaining the corresponding cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data is: according to the formula

$$\ell_{pre} = \left\| G^{(1)}\!\left(Z^{1}\right) - Z^{2} \right\|_2^2 + \left\| G^{(2)}\!\left(Z^{2}\right) - Z^{1} \right\|_2^2$$

obtaining the cross-modal dual prediction loss $\ell_{pre}$; where $Z^1$ is the set of hidden representations corresponding to all first modality data of the samples in which both modalities are present, $Z^2$ is the set of hidden representations corresponding to all second modality data of the samples in which both modalities are present, $G^{(1)}(Z^1)$ maps $Z^1$, $G^{(2)}(Z^2)$ maps $Z^2$, $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping, and $\|\cdot\|_2$ is the norm.
The specific method of step S6 is: taking the result of $\ell_{cl} + 0.1\,\ell_{pre} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; where $\ell_{cl}$ is the cross-modal contrastive learning loss, $\ell_{pre}$ is the cross-modal dual prediction loss, and $\ell_{rec}$ is the intra-modal reconstruction loss.
The specific method of step S8 is: according to the formulas

$$Z^{1} = f^{(1)}\!\left(X_{1}\right),\qquad Z^{2} = f^{(2)}\!\left(X_{2}\right)$$
$$Z^{(1)} = f^{(1)}\!\left(X^{(1)}\right),\qquad Z^{(2)} = f^{(2)}\!\left(X^{(2)}\right)$$

obtaining the hidden representations corresponding to the two-modal data set with missing data, including the hidden representation $Z^{1}$ corresponding to the sample set $X_{1}$ of first modality data of the samples in which both modalities are present, the hidden representation $Z^{2}$ corresponding to the sample set $X_{2}$ of second modality data of the samples in which both modalities are present, the hidden representation $Z^{(1)}$ corresponding to the sample set $X^{(1)}$ in which only the first modality is present, and the hidden representation $Z^{(2)}$ corresponding to the sample set $X^{(2)}$ in which only the second modality is present; where $f^{(1)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the first modality data and $f^{(2)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the second modality data.
The specific method of step S9 is: according to the formulas

$$\hat{Z}^{(2)} = G^{(1)}\!\left(Z^{(1)}\right),\qquad \hat{Z}^{(1)} = G^{(2)}\!\left(Z^{(2)}\right)$$

obtaining, respectively, the representation $\hat{Z}^{(2)}$ of the missing modality corresponding to the hidden representation $Z^{(1)}$ of the sample set in which only the first modality is present, and the representation $\hat{Z}^{(1)}$ of the missing modality corresponding to the hidden representation $Z^{(2)}$ of the sample set in which only the second modality is present; where $G^{(1)}(\cdot)$ denotes the mapping corresponding to the first modality, $G^{(2)}(\cdot)$ denotes the mapping corresponding to the second modality, and $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping.
In step S10, the specific method of concatenating the different modality representations corresponding to each sample and using the result as its common representation is: taking $\left[Z^{1}; Z^{2}\right]$ as the common representation of the samples in which both modalities are present; taking $\left[Z^{(1)}; \hat{Z}^{(2)}\right]$ as the common representation of the samples in which only the first modality is present; and taking $\left[\hat{Z}^{(1)}; Z^{(2)}\right]$ as the common representation of the samples in which only the second modality is present.
In the specific implementation, the entropy is regularized and the parameter α is fixed at 10. The design of the cross-modal contrastive learning loss has two advantages: on the one hand, from information theory, the information entropy is the average amount of information conveyed by an event, so a larger entropy corresponds to a more informative representation; on the other hand, maximizing $H(z_t^{(1)})$ and $H(z_t^{(2)})$ avoids the trivial solution of assigning all samples to the same cluster. To compute $I(z_t^{(1)}, z_t^{(2)})$, the joint probability distribution p(z, z') of the variables z and z' is defined first. Since a Softmax activation function is stacked on the last layer of the encoder, $z_t^{(1)}$ and $z_t^{(2)}$ can be regarded as over-clustering probabilities; that is, $z_t^{(1)}$ and $z_t^{(2)}$ can be understood as the distributions of the two discrete cluster-assignment variables z and z' over D classes, D being the dimension of $z_t^{(1)}$ and $z_t^{(2)}$. The joint probability p(z, z') is thus defined as the matrix $P \in \mathbb{R}^{D \times D}$:

$$P = \frac{1}{m}\sum_{t=1}^{m} z_t^{(1)} \left(z_t^{(2)}\right)^{\top}$$

Let $P_d$ and $P_{d'}$ denote the marginal probability distributions P(z = d) and P(z' = d'), which can be obtained by summing the d-th row and the d'-th column of the joint probability distribution matrix P, respectively. For discrete variables, the cross-modal contrastive learning loss function can therefore be redefined as:

$$\ell_{cl} = -\sum_{d=1}^{D}\sum_{d'=1}^{D} P_{dd'} \log \frac{P_{dd'}}{P_d\, P_{d'}} + \alpha\sum_{d=1}^{D} P_d \log P_d + \alpha\sum_{d'=1}^{D} P_{d'} \log P_{d'}$$

where $P_{dd'}$ is the element in row d, column d' of P.
To infer the missing modality, the invention proposes a dual prediction mechanism. Specifically, in a latent space parameterized by a neural network, the representation $Z_i$ of a given modality can be predicted from $Z_j$ by minimizing the conditional entropy $H(Z_i \mid Z_j)$, where i = 1, j = 2 or i = 2, j = 1; that is, $Z_i$ is fully determined by $Z_j$ if and only if $H(Z_i \mid Z_j) = 0$. A common approach to optimizing this objective is to introduce a variational distribution $Q(Z_i \mid Z_j)$ and maximize the lower bound $\mathbb{E}_{P(Z_i, Z_j)}\left[\log Q\left(Z_i \mid Z_j\right)\right]$ of $-H\left(Z_i \mid Z_j\right)$. The variational distribution Q may be of any type, such as a Gaussian, categorical, or Laplace distribution. In particular, the method may take Q to be the Gaussian $\mathcal{N}\left(Z_i \mid G^{(j)}(Z_j), \sigma I\right)$, where σI is the variance matrix. Omitting the constants of the Gaussian, maximizing the lower bound is equivalent to minimizing $\left\| G^{(j)}(Z_j) - Z_i \right\|_2^2$. For given bimodal data, the cross-modal dual prediction loss is then obtained as

$$\ell_{pre} = \left\| G^{(1)}\!\left(Z^{1}\right) - Z^{2} \right\|_2^2 + \left\| G^{(2)}\!\left(Z^{2}\right) - Z^{1} \right\|_2^2$$

It is noted that, without the intra-modal reconstruction loss, the above dual prediction loss alone may lead to a trivial solution in which $Z^1$ and $Z^2$ collapse to one and the same constant. After the model converges, the missing-modality representation corresponding to the hidden representation of the sample set with only the first modality can easily be predicted through the dual mapping.
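The lower bound used above follows from the non-negativity of the KL divergence; spelled out, with P the true conditional and Q the variational surrogate:

```latex
-H(Z_i \mid Z_j)
  = \mathbb{E}_{P(Z_i, Z_j)}\bigl[\log P(Z_i \mid Z_j)\bigr]
  = \mathbb{E}_{P(Z_i, Z_j)}\bigl[\log Q(Z_i \mid Z_j)\bigr]
    + \mathbb{E}_{P(Z_j)}\Bigl[ D_{\mathrm{KL}}\bigl(P(\cdot \mid Z_j) \,\|\, Q(\cdot \mid Z_j)\bigr) \Bigr]
  \;\ge\; \mathbb{E}_{P(Z_i, Z_j)}\bigl[\log Q(Z_i \mid Z_j)\bigr],
\qquad
\log \mathcal{N}\bigl(Z_i \mid G^{(j)}(Z_j), \sigma I\bigr)
  = -\tfrac{1}{2\sigma}\bigl\| Z_i - G^{(j)}(Z_j) \bigr\|_2^2 + \text{const}.
```

Substituting the Gaussian choice of Q into the bound and dropping the constant yields exactly the squared-error form of $\ell_{pre}$ above.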
When the whole model has been trained to convergence on the modality-complete data, the entire data set is fed directly into the network, which performs missing-modality completion and infers the corresponding representations. The representations of the different modalities are then concatenated to obtain a common representation, which is clustered with a traditional clustering method such as k-means, completing the two-modal clustering with missing data. Since the method applies to any pair of modalities, it can be directly generalized to multi-modal clustering.
The mapping models $G^{(1)}$ and $G^{(2)}$ adopt the same network structure, which has 6 layers:
First layer: a fully-connected layer with input 128 and output 128, followed by a batch normalization layer (BatchNorm1d); the activation function is ReLU.
Second layer: a fully-connected layer with input 128 and output 256, followed by a batch normalization layer (BatchNorm1d); the activation function is ReLU.
Third layer: a fully-connected layer with input 256 and output 128, followed by a batch normalization layer (BatchNorm1d); the activation function is ReLU.
Fourth layer: a fully-connected layer with input 128 and output 256, followed by a batch normalization layer (BatchNorm1d); the activation function is ReLU.
Fifth layer: a fully-connected layer with input 256 and output 128, followed by a batch normalization layer (BatchNorm1d); the activation function is ReLU.
Sixth layer: a fully-connected layer with input 128 and output 128; the activation function is Softmax.
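A sketch of this mapping network; both dual maps instantiate the same class, and the names are illustrative:

```python
import torch
import torch.nn as nn

class DualMap(nn.Module):
    """Six-layer MLP that predicts one modality's hidden representation
    from the other's, mirroring the 6-layer structure described above."""
    def __init__(self, dim: int = 128, hidden: int = 256):
        super().__init__()
        layers = []
        # Layers 1-5: FC + BatchNorm1d + ReLU with the stated in/out sizes
        for d_in, d_out in [(dim, dim), (dim, hidden), (hidden, dim),
                            (dim, hidden), (hidden, dim)]:
            layers += [nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out), nn.ReLU()]
        # Layer 6: FC(128 -> 128) with Softmax activation
        layers += [nn.Linear(dim, dim), nn.Softmax(dim=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

G1, G2 = DualMap(), DualMap()
```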
In one embodiment of the invention, the Caltech101-20 data set is used, which contains 2386 pictures from 20 object classes; 2 extracted image features (HOG and GIST) are used as the 2 modalities. The experimental data category information and sample number distribution are shown in Table 1.
Table 1: experimental data Classification information and sample number distribution
[Table 1 is reproduced as an image in the original publication.]
The experiments were performed at different missing rates, defined as η = (n − m)/n, where n is the size of the data set and m is the number of modality-complete samples. To verify the superiority of this scheme, we compared this scheme (COMPLETER) with 10 other multi-modal clustering techniques, namely partial multi-view clustering (PVC), incomplete multi-modal visual data grouping (IMG), the unified embedding alignment framework (UEAF), doubly aligned incomplete multi-view clustering (DAIMC), spectral-perturbation incomplete multi-view clustering (PIC), efficient regularized incomplete multi-view clustering (EERIMVC), deep canonical correlation analysis (DCCA), the deep canonically correlated autoencoder (DCCAE), binary multi-view clustering (BMVC), and autoencoder-in-autoencoder networks (AE2-Nets).
The test results at a deletion rate of 0.5 are shown in table 2.
Table 2: test results when the deletion rate η is 0.5
[Table 2 is reproduced as an image in the original publication.]
The test results at a deletion rate of 0 are shown in table 3.
Table 3: test results when the deletion rate η is 0
[Table 3 is reproduced as an image in the original publication.]
As can be seen from Tables 2 and 3, compared with other clustering methods, the method achieves a large improvement on the two indices of normalized mutual information and adjusted Rand index, which means that in practical applications the object picture data can be clustered correctly without consuming a large amount of human resources on picture classification.
To further explore the effectiveness of our method, we varied the missing rate η from 0 to 0.8 on Caltech101-20 with an interval of 0.1, as shown in FIGS. 3, 4, and 5. From the results in FIGS. 3-5 it can be observed that: i) COMPLETER (the present method) is significantly better than all comparison methods at all missing-rate settings; ii) as the missing rate increases, the performance of the comparison methods drops by a much greater amount than that of our method. For example, at η = 0, COMPLETER and PIC achieve NMI values of 0.6806 and 0.6793, respectively, while COMPLETER becomes significantly better than PIC as the missing rate increases.
In another embodiment of the invention, the Scene-15 data set is used, which contains 4485 pictures from 15 scene categories; 2 extracted image features (PHOG and GIST) are used as the 2 modalities. The experimental data category information and sample number distribution are shown in Table 4.
Table 4: experimental data Classification information and sample number distribution
Office 215 | Kitchen 210 | Living room 289 | Bedroom 216 | Store 315
Industrial 311 | Tall building 356 | Inside city 308 | Street 292 | Highway 260
Coast 360 | Open country 410 | Mountain 374 | Forest 328 | Suburb 241
The results of the experiment when the deletion rate η was 0.5 are shown in table 5.
Table 5: experimental results when the deletion rate η is 0.5
[Table 5 is reproduced as an image in the original publication.]
The results of the experiment when the deletion rate η is 0 are shown in table 6.
Table 6: experimental results when the deletion rate η is 0
[Table 6 is reproduced as an image in the original publication.]
As can be seen from Tables 5 and 6, compared with other clustering methods, the method achieves a large improvement on the two indices of accuracy and normalized mutual information, which means that in practical applications the object picture data can be clustered correctly without consuming a large amount of human resources on picture classification. Meanwhile, the method achieves the best results in both the missing and non-missing settings.
As shown in FIGS. 6, 7, and 8, to further investigate the effectiveness of the method, experiments were performed varying the missing rate η from 0 to 0.8 at intervals of 0.1. From the results in FIGS. 6-8, it can be observed that COMPLETER (the present method) outperforms all comparison methods at almost all missing-rate settings.
In summary, based on autoencoders, the invention learns a modality-specific representation of each modality's data through the intra-modal reconstruction loss, learns modality-consistent representations through the cross-modal contrastive learning loss, and, through the cross-modal dual prediction loss, recovers the information of missing modalities while discarding cross-modal inconsistency, thereby further improving consistency; data recovery and consistency learning are handled in a unified framework, achieving a better clustering effect.

Claims (10)

1. A two-modal clustering method with missing data, characterized by comprising the following steps:
S1, feeding the two modality data of each sample in which both modalities are present into the corresponding autoencoders to obtain the corresponding hidden representations;
S2, obtaining the corresponding cross-modal contrastive learning loss and intra-modal reconstruction loss from the hidden representations corresponding to the two modality data;
S3, back-propagating through the current autoencoders according to the cross-modal contrastive learning loss and the intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S4, judging whether the number of back-propagation iterations reaches a threshold; if so, proceeding to step S5, otherwise returning to step S1;
S5, obtaining the corresponding cross-modal contrastive learning loss, intra-modal reconstruction loss, and cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data;
S6, back-propagating through the current autoencoders according to the current latest cross-modal contrastive learning loss, cross-modal dual prediction loss, and intra-modal reconstruction loss to update the parameters and weights of the current autoencoders;
S7, judging whether the current autoencoders have converged; if so, proceeding to step S8, otherwise returning to step S5;
S8, feeding the set of samples in which both modalities are present, the samples with only the first modality, and the samples with only the second modality, as a two-modal data set with missing data, into the current latest autoencoders to obtain the hidden representations corresponding to the two-modal data set with missing data;
S9, obtaining, based on the dual mapping, the representation of the missing modality corresponding to the hidden representation of the sample set with only the first modality and the representation of the missing modality corresponding to the hidden representation of the sample set with only the second modality in the two-modal data set;
S10, concatenating the different modality representations corresponding to each sample as its common representation, and clustering the common representations to complete the two-modal clustering with missing data.
2. The two-modal clustering method with missing data according to claim 1, wherein the autoencoder in step S1 comprises an encoder and a decoder, the encoder comprising, connected in sequence, a first fully-connected layer, a first batch normalization layer, a first activation function, a second fully-connected layer, a second batch normalization layer, a second activation function, a third fully-connected layer, a third batch normalization layer, a third activation function, a fourth fully-connected layer, and a fourth activation function; the input dimension of the first fully-connected layer is the dimension of the input modality data; the output dimensions of the first, second, and third fully-connected layers are all 1024; the first, second, and third activation functions are all ReLU; the output dimension of the fourth fully-connected layer is 128; and the fourth activation function is Softmax;
the decoder comprises, connected in sequence, a fifth fully-connected layer, a fourth batch normalization layer, a fifth activation function, a sixth fully-connected layer, a fifth batch normalization layer, a sixth activation function, a seventh fully-connected layer, a sixth batch normalization layer, a seventh activation function, an eighth fully-connected layer, a seventh batch normalization layer, and an eighth activation function; the input dimension of the fifth fully-connected layer is 128; the output dimensions of the fifth, sixth, and seventh fully-connected layers are all 1024; the fifth, sixth, seventh, and eighth activation functions are all ReLU; and the output dimension of the eighth fully-connected layer is the dimension of the input modality data.
3. The two-modal clustering method with missing data according to claim 1, wherein the specific method for obtaining the corresponding cross-modal contrastive learning loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{cl} = -\frac{1}{m}\sum_{t=1}^{m}\left[ I\left(z_t^{(1)}, z_t^{(2)}\right) + \alpha\left( H\left(z_t^{(1)}\right) + H\left(z_t^{(2)}\right) \right) \right]$$

obtaining the cross-modal contrastive learning loss $\ell_{cl}$; wherein m is the total number of samples in which both modalities are present; t denotes the t-th sample; $I(\cdot,\cdot)$ denotes mutual information; $z_t^{(1)}$ is the hidden representation corresponding to the first modality data of the t-th sample in which both modalities are present; $z_t^{(2)}$ is the hidden representation corresponding to the second modality data of the t-th sample in which both modalities are present; $H(\cdot)$ denotes information entropy; and α is the balance parameter of the entropy.
4. The two-modal clustering method with missing data according to claim 1, wherein the specific method for obtaining the corresponding intra-modal reconstruction loss from the hidden representations corresponding to the two modality data in step S2 is: according to the formula

$$\ell_{rec} = \frac{1}{m}\sum_{t=1}^{m}\sum_{v=1}^{2}\left\| x_t^{(v)} - g^{(v)}\!\left(f^{(v)}\!\left(x_t^{(v)}\right)\right) \right\|_2^2$$

obtaining the intra-modal reconstruction loss $\ell_{rec}$; wherein m is the total number of samples in which both modalities are present; t denotes the t-th sample; $x_t^{(v)}$ denotes the v-th modality data of the t-th sample; $f^{(v)}(\cdot)$ and $g^{(v)}(\cdot)$ denote the encoder and decoder currently corresponding to the v-th modality data, respectively; and $\|\cdot\|_2$ is the norm.
5. The two-modal clustering method with missing data according to claim 1, wherein the specific method of step S3 is:
taking the result of $\ell_{cl} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; wherein $\ell_{cl}$ is the cross-modal contrastive learning loss and $\ell_{rec}$ is the intra-modal reconstruction loss.
6. The two-modal clustering method with missing data according to claim 1, wherein the specific method for obtaining the corresponding cross-modal dual prediction loss from the current latest hidden representations corresponding to the two modality data in step S5 is: according to the formula

$$\ell_{pre} = \left\| G^{(1)}\!\left(Z^{1}\right) - Z^{2} \right\|_2^2 + \left\| G^{(2)}\!\left(Z^{2}\right) - Z^{1} \right\|_2^2$$

obtaining the cross-modal dual prediction loss $\ell_{pre}$; wherein $Z^1$ is the set of hidden representations corresponding to all first modality data of the samples in which both modalities are present; $Z^2$ is the set of hidden representations corresponding to all second modality data of the samples in which both modalities are present; $G^{(1)}(Z^1)$ maps $Z^1$, $G^{(2)}(Z^2)$ maps $Z^2$, and $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping; and $\|\cdot\|_2$ is the norm.
7. The two-modal clustering method with missing data according to claim 1, wherein the specific method of step S6 is:
taking the result of $\ell_{cl} + 0.1\,\ell_{pre} + 0.1\,\ell_{rec}$ as the current loss, back-propagating through the current autoencoders, and updating the parameters and weights of the current autoencoders; wherein $\ell_{cl}$ is the cross-modal contrastive learning loss, $\ell_{pre}$ is the cross-modal dual prediction loss, and $\ell_{rec}$ is the intra-modal reconstruction loss.
8. The two-modal clustering method with missing data according to claim 1, wherein the specific method of step S8 is: according to the formulas

$$Z^{1} = f^{(1)}\!\left(X_{1}\right),\qquad Z^{2} = f^{(2)}\!\left(X_{2}\right),\qquad Z^{(1)} = f^{(1)}\!\left(X^{(1)}\right),\qquad Z^{(2)} = f^{(2)}\!\left(X^{(2)}\right)$$

obtaining the hidden representations corresponding to the two-modal data set with missing data, including the hidden representation $Z^{1}$ corresponding to the sample set $X_{1}$ of first modality data of the samples in which both modalities are present, the hidden representation $Z^{2}$ corresponding to the sample set $X_{2}$ of second modality data of the samples in which both modalities are present, the hidden representation $Z^{(1)}$ corresponding to the sample set $X^{(1)}$ in which only the first modality is present, and the hidden representation $Z^{(2)}$ corresponding to the sample set $X^{(2)}$ in which only the second modality is present; wherein $f^{(1)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the first modality data and $f^{(2)}(\cdot)$ denotes the encoder of the latest autoencoder corresponding to the second modality data.
9. The two-modal clustering method with missing data according to claim 8, wherein the specific method of step S9 is: according to the formulas

$$\hat{Z}^{(2)} = G^{(1)}\!\left(Z^{(1)}\right),\qquad \hat{Z}^{(1)} = G^{(2)}\!\left(Z^{(2)}\right)$$

obtaining, respectively, the representation $\hat{Z}^{(2)}$ of the missing modality corresponding to the hidden representation $Z^{(1)}$ of the sample set in which only the first modality is present, and the representation $\hat{Z}^{(1)}$ of the missing modality corresponding to the hidden representation $Z^{(2)}$ of the sample set in which only the second modality is present; wherein $G^{(1)}(\cdot)$ denotes the mapping corresponding to the first modality, $G^{(2)}(\cdot)$ denotes the mapping corresponding to the second modality, and $G^{(1)}(\cdot)$ and $G^{(2)}(\cdot)$ form a dual mapping.
10. The two-modal clustering method with missing data according to claim 9, wherein the specific method of concatenating the different modality representations corresponding to each sample and using the result as its common representation in step S10 is:
taking $\left[Z^{1}; Z^{2}\right]$ as the common representation of the samples in which both modalities are present; taking $\left[Z^{(1)}; \hat{Z}^{(2)}\right]$ as the common representation of the samples in which only the first modality is present; and taking $\left[\hat{Z}^{(1)}; Z^{(2)}\right]$ as the common representation of the samples in which only the second modality is present.
CN202110095029.2A 2021-01-25 2021-01-25 Image classification method with missing data in mode Active CN112784902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110095029.2A CN112784902B (en) 2021-01-25 2021-01-25 Image classification method with missing data in mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110095029.2A CN112784902B (en) 2021-01-25 2021-01-25 Image classification method with missing data in mode

Publications (2)

Publication Number Publication Date
CN112784902A true CN112784902A (en) 2021-05-11
CN112784902B CN112784902B (en) 2023-06-30

Family

ID=75758853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110095029.2A Active CN112784902B (en) 2021-01-25 2021-01-25 Image classification method with missing data in mode

Country Status (1)

Country Link
CN (1) CN112784902B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657272A (en) * 2021-08-17 2021-11-16 山东建筑大学 Micro-video classification method and system based on missing data completion
CN114742132A (en) * 2022-03-17 2022-07-12 湖南工商大学 Deep multi-view clustering method, system and equipment based on common difference learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255739B1 (en) * 2008-06-30 2012-08-28 American Megatrends, Inc. Achieving data consistency in a node failover with a degraded RAID array
CN106202281A (en) * 2016-06-28 2016-12-07 广东工业大学 A kind of multi-modal data represents learning method and system
WO2017122785A1 (en) * 2016-01-15 2017-07-20 Preferred Networks, Inc. Systems and methods for multimodal generative machine learning
WO2018232378A1 (en) * 2017-06-16 2018-12-20 Markable, Inc. Image processing system
CN112001437A (en) * 2020-08-19 2020-11-27 四川大学 Modal non-complete alignment-oriented data clustering method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8255739B1 (en) * 2008-06-30 2012-08-28 American Megatrends, Inc. Achieving data consistency in a node failover with a degraded RAID array
WO2017122785A1 (en) * 2016-01-15 2017-07-20 Preferred Networks, Inc. Systems and methods for multimodal generative machine learning
CN106202281A (en) * 2016-06-28 2016-12-07 广东工业大学 A kind of multi-modal data represents learning method and system
WO2018232378A1 (en) * 2017-06-16 2018-12-20 Markable, Inc. Image processing system
CN112001437A (en) * 2020-08-19 2020-11-27 四川大学 Modal non-complete alignment-oriented data clustering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIJIE LIN et al.: "COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11169-11178 *
敬明旻: "Multi-modal feature adaptive clustering method based on deep neural networks", Computer Applications and Software, no. 10, pages 262-269 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657272A (en) * 2021-08-17 2021-11-16 山东建筑大学 Micro-video classification method and system based on missing data completion
CN114742132A (en) * 2022-03-17 2022-07-12 湖南工商大学 Deep multi-view clustering method, system and equipment based on common difference learning

Also Published As

Publication number Publication date
CN112784902B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
US20210012198A1 (en) Method for training deep neural network and apparatus
CN111310707B (en) Bone-based graph annotation meaning network action recognition method and system
Shao et al. Multiple incomplete views clustering via weighted nonnegative matrix factorization with regularization
CN111639544B (en) Expression recognition method based on multi-branch cross-connection convolutional neural network
WO2021218471A1 (en) Neural network for image processing and related device
CN113468227B (en) Information recommendation method, system, equipment and storage medium based on graph neural network
WO2022042043A1 (en) Machine learning model training method and apparatus, and electronic device
Sun et al. Global-local label correlation for partial multi-label learning
CN112016601B (en) Network model construction method based on knowledge graph enhanced small sample visual classification
CN111860193B (en) Text-based pedestrian retrieval self-supervision visual representation learning system and method
CN112784902A (en) Two-mode clustering method with missing data
CN110210540B (en) Cross-social media user identity recognition method and system based on attention mechanism
WO2020253180A1 (en) Smart home decision support system and decision support method
CN110110724A (en) The text authentication code recognition methods of function drive capsule neural network is squeezed based on exponential type
CN113705596A (en) Image recognition method and device, computer equipment and storage medium
CN112786160A (en) Multi-image input multi-label gastroscope image classification method based on graph neural network
Liang et al. ClusterFomer: Clustering As A Universal Visual Learner
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
CN114359656A (en) Melanoma image identification method based on self-supervision contrast learning and storage device
CN112668543B (en) Isolated word sign language recognition method based on hand model perception
CN112084913B (en) End-to-end human body detection and attribute identification method
CN107729821B (en) Video summarization method based on one-dimensional sequence learning
CN116450827A (en) Event template induction method and system based on large-scale language model
CN111177492A (en) Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization
CN113378934B (en) Small sample image classification method and system based on semantic perception map neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant