CN106650817B - Multi-mode data fusion method based on deep learning - Google Patents


Info

Publication number
CN106650817B
Authority
CN
China
Prior art keywords
mode data
data
hidden layer
layer
mode
Prior art date
Legal status
Active
Application number
CN201611243618.6A
Other languages
Chinese (zh)
Other versions
CN106650817A (en)
Inventor
郭利
周盛宗
王开军
余志刚
付璐斯
Current Assignee
Fujian Institute of Research on the Structure of Matter of CAS
Original Assignee
Fujian Institute of Research on the Structure of Matter of CAS
Priority date
Filing date
Publication date
Application filed by Fujian Institute of Research on the Structure of Matter of CAS
Priority to CN201611243618.6A
Publication of CN106650817A
Application granted
Publication of CN106650817B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a multi-mode data fusion method based on deep learning, comprising: vectorizing each of N mode data, where N is a natural number and the N mode data include sensor data; modeling each of the N mode data to obtain N single-mode data; fusing any two of the obtained single-mode data to obtain dual-mode data; fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data; and so on, performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data. The application can fuse multiple kinds of mode data, including sensor data.

Description

Multi-mode data fusion method based on deep learning
Technical Field
The application relates to a deep learning-based multi-mode data fusion method, and belongs to the field of machine learning.
Background
Deep learning has become the dominant form of machine learning in computer vision, speech analysis, and many other areas. Deep learning adopts a layered structure similar to a neural network: the system is a multilayer network consisting of an input layer, several hidden layers, and an output layer; only nodes in adjacent layers are connected, while nodes within the same layer, and nodes in non-adjacent layers, are not connected to each other.
In the prior art, multi-mode data fusion in deep learning mainly uses a deep autoencoder to fuse the two modes of audio and video data, or uses a Gaussian-Bernoulli restricted Boltzmann machine together with a replicated-softmax restricted Boltzmann machine to fuse the two modes of image and text data, or uses a deep Boltzmann machine to fuse data such as audio, video, and text.
In practical applications, however, a large amount of sensor data is involved, and there is no existing method that fuses multiple modes of data such as audio, image, text, and sensor data.
Disclosure of Invention
According to one aspect of the application, a multi-mode data fusion method based on deep learning is provided that can fuse multi-mode data including sensor data.
A multi-mode data fusion method based on deep learning comprises the following steps:
vectorizing each of the N mode data, where N is a natural number and the N mode data include sensor data;
modeling each of the N mode data to obtain N single-mode data;
fusing any two of the obtained single-mode data to obtain dual-mode data;
fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data;
and so on, performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data.
Wherein N is 4, and the four mode data are audio data, sensor data, image data, and text data, respectively.
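The bottom-up fusion order described above can be sketched as follows; the helper `fusion_levels` and the single-letter mode names are illustrative assumptions, not part of the patent.

```python
from itertools import combinations

# Sketch of the fusion hierarchy: starting from N single-mode representations
# (here N = 4: A=audio, B=sensor, C=image, D=text), level k lists the k-mode
# subsets that are fused before level k+1 can be built.
def fusion_levels(modes):
    return {k: ["".join(c) for c in combinations(modes, k)]
            for k in range(2, len(modes) + 1)}

levels = fusion_levels("ABCD")
# levels[2] -> ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']  (6 dual-mode results)
# levels[3] -> ['ABC', 'ABD', 'ACD', 'BCD']          (4 three-mode results)
# levels[4] -> ['ABCD']                              (the final N-mode result)
```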
The sparsification and vectorization of the audio data specifically comprises:
computing the mean activation of the j-th hidden-layer neuron, $\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$, where m is the number of audio data and $x^{(i)}$ denotes the i-th audio datum;
wherein $\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$ denotes the relative entropy between two Bernoulli distributions with means $\rho$ and $\hat\rho_j$, $\rho$ is the sparsity parameter, $\hat\rho_j$ is the activation degree of hidden neuron j, and n is the number of hidden neurons;
setting a truncated nuclear norm;
then performing sparse autoencoder learning to obtain the sparsified and vectorized audio data via the objective $J_{sparse}(W,b) = J(W,b) + \beta\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$;
wherein $h_{W,b}(x^{(i)})$ denotes the reconstruction of $x^{(i)}$, $\beta$ denotes the weight of the sparsification penalty term, and $W^{(1)}$ denotes the weights from the visible layer to the first hidden layer.
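The sparsity penalty above can be illustrated with a minimal numpy sketch, assuming the standard sparse-autoencoder form: the mean activation of each hidden unit is compared to the target sparsity via the Bernoulli KL divergence. Names and shapes are illustrative assumptions.

```python
import numpy as np

# KL-divergence sparsity penalty of a sparse autoencoder: rho_hat_j is the
# mean activation of hidden unit j over m inputs, and the penalty sums
# KL(rho || rho_hat_j) over the n hidden units.
def kl_sparsity_penalty(activations, rho=0.05):
    # activations: (m, n) matrix of hidden activations in (0, 1)
    rho_hat = activations.mean(axis=0)  # mean activation per hidden unit
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

acts = np.full((10, 4), 0.05)   # every unit already at the target sparsity
penalty = kl_sparsity_penalty(acts, rho=0.05)
# penalty is 0 when rho_hat equals rho exactly, and grows as they diverge
```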
The sparsification and vectorization of the sensor data and image data specifically comprises:
setting the neural network to k layers, with the sensor data and image data consisting of N data samples, each sample being a D-dimensional vector, and denoting the data vector of the k-th layer by $h^{(k)}$;
presetting a learning threshold $(B_1, \dots, B_K)$ for each layer, with the learning threshold increasing layer by layer;
learning from the visible layer to the first hidden layer to obtain the vector of the first hidden layer;
learning from the i-th hidden layer to the (i+1)-th hidden layer according to the obtained vector of the first hidden layer to obtain the vector of the (i+1)-th hidden layer, where 0 < i < k-2;
and learning from the (k-2)-th hidden layer to the (k-1)-th hidden layer according to the vector of the (k-2)-th hidden layer to obtain the sparsified and vectorized sensor data and image data.
The vector of the first hidden layer is $h^{(1)}_j = \mathrm{sigmoid}\big(\sum_{i=1}^{D} W_{ij} v_i + b_j\big)$, where $\{v_1,\dots,v_m\}$ denotes the m training samples of the training set, $\rho$ is the sparsity parameter, D denotes the dimension, $W_{ij}$ denotes the contribution of the i-th visible-layer unit to the j-th neuron of the first hidden layer, and $b_j$ denotes the offset value of the j-th neuron.
The vector of the (k-2)-th hidden layer is $h^{(k-2)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-3,k-2)}_{sj} h^{(k-3)}_s + b^{(k-2)}_j\big)$.
Wherein $h^{(i)}_j$ denotes the j-th vector of the i-th hidden layer and $h^{(k-2)}_j$ the j-th vector of the (k-2)-th hidden layer; $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and $b^{(i)}_j$ the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; $W^{(k-3,k-2)}_{sj}$ denotes the contribution of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and $b^{(k-2)}_j$ the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer.
Let the sparsified and vectorized sensor data and image data be $h^{(k-1)}$, with $h^{(k-1)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-2,k-1)}_{sj} h^{(k-2)}_s + b^{(k-1)}_j\big)$ the j-th vector of the (k-1)-th hidden layer, where $W^{(k-2,k-1)}_{sj}$ denotes the contribution of the s-th neuron of the (k-2)-th hidden layer to the j-th neuron of the (k-1)-th hidden layer and $b^{(k-1)}_j$ the offset value of the j-th neuron from the (k-2)-th layer to the (k-1)-th layer.
Fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data, specifically comprises:
combining any two dual-mode data that contain the same mode data, referred to as a first combination; combining any dual-mode data with single-mode data not contained in it, referred to as a second combination;
and modeling the first and second hidden layers of any one of the first and second combinations with a restricted Boltzmann machine to obtain three-mode data.
Performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data specifically comprises:
combining any one of the obtained three-mode data with the single-mode data not contained in it, referred to as a third combination; combining any dual-mode data with the dual-mode data disjoint from it, referred to as a fourth combination;
and modeling the first and second hidden layers of any one of the third and fourth combinations with a restricted Boltzmann machine to obtain four-mode data.
Further, the method comprises: inferring first-mode data from multi-mode data that contains the first mode together with corresponding single-mode or multi-mode data that does not contain it.
The beneficial effects of the present application include:
1) the N mode data are separately vectorized, each of the N vectorized mode data is modeled, the resulting single-mode data are fused into dual-mode data, the dual-mode data are fused into three-mode data, and so on until N-mode data are finally obtained, thereby achieving the fusion of multiple kinds of mode data including sensor data;
2) further, when vectorizing high-dimensional data such as sensor data, the learning threshold of each layer is gradually increased, so that the middle hidden layers activate the fewest neurons while the last layer activates the most among a limited number of neurons with the greatest activation relevance; this enables layer-by-layer learning and layer-by-layer correction and distributes the high-dimensional data over a limited number of dimensions, achieving data compression, i.e., a sparse representation of high-dimensional data such as sensor data during learning, which facilitates subsequent processing;
3) further, when fusing dual-mode or multi-mode data with a Gaussian-Bernoulli restricted Boltzmann machine, the number of activated hidden-layer neurons is limited, yielding a representation of high-dimensional data on a limited number of dimensions and simplifying the fusion process;
4) further, when one or more modes are missing, the missing or incomplete data can be inferred by feeding the known mode data into the multi-mode data model.
Drawings
Fig. 1 is a schematic flow chart of a deep learning-based multi-mode data fusion method.
Detailed Description
The present application will be described in detail with reference to examples, but the present application is not limited to these examples.
Example 1
Referring to fig. 1, an embodiment of the present invention provides a multi-mode data fusion method based on deep learning, the method comprising:
101. vectorizing each of the N mode data, where N is a natural number and the N mode data include sensor data;
In the embodiment of the present invention, N is set to 4; that is, the four mode data comprise audio data, image data, and text data in addition to the sensor data.
Specifically, the audio data is sparsified and vectorized as follows:
the mean activation of the j-th hidden-layer neuron is computed as $\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$, where m is the number of audio data and $x^{(i)}$ denotes the i-th audio datum;
wherein $\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$ denotes the relative entropy between two Bernoulli distributions with means $\rho$ and $\hat\rho_j$, $\rho$ is the sparsity parameter, $\hat\rho_j$ is the activation degree of hidden-layer neuron j, and n is the number of hidden-layer neurons; since the $\hat\rho_j$ are independent of one another, the terms $\mathrm{KL}(\rho\,\|\,\hat\rho_j)$ are independent of one another, and minimizing $\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$, i.e., minimizing all the relative entropies, drives each $\hat\rho_j$ toward $\rho$;
a truncated nuclear norm is set;
in particular, given the matrix $W^{(1)} \in \mathbb{R}^{D\times n}$, the truncated nuclear norm $\|W\|_r$ is defined as the sum of the min(D, n) − r smallest singular values;
sparse autoencoder learning is then performed to obtain the sparsified and vectorized audio data via $J_{sparse}(W,b) = J(W,b) + \beta\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$;
wherein $h_{W,b}(x^{(i)})$ denotes the reconstruction of $x^{(i)}$, $\beta$ denotes the weight of the sparsification penalty term, and $W^{(1)}$ denotes the weights from the visible layer to the first hidden layer.
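The truncated nuclear norm defined above (the sum of the min(D, n) − r smallest singular values, with the r largest "truncated" away) can be computed directly from the singular value decomposition; this is a hedged sketch, with the function name and example matrix as illustrative assumptions.

```python
import numpy as np

# Truncated nuclear norm ||W||_r: drop the r largest singular values and
# sum the remaining min(D, n) - r smallest ones.
def truncated_nuclear_norm(W, r):
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending order
    return s[r:].sum()

W = np.diag([3.0, 2.0, 1.0])
# with r = 1 the largest singular value (3.0) is dropped: 2.0 + 1.0 = 3.0
val = truncated_nuclear_norm(W, 1)
```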
The sensor data and image data are sparsified and vectorized as follows:
the neural network is set to k layers, with the sensor data and image data consisting of N data samples, each a D-dimensional vector, and the data vector of the k-th layer denoted $h^{(k)}$;
a learning threshold $(B_1, \dots, B_K)$ is preset for each layer, with the learning threshold increasing layer by layer;
learning from a visible layer to a first hidden layer to obtain a vector of the first hidden layer;
specifically, learning from a visible layer to a first hidden layer is realized by adopting a Gaussian Bernoulli limiting Boltzmann machine based on a limited number of active neurons in the hidden layer, and the edge distribution of the visible layer v is P (v; theta), so that
Wherein, the truth unit v belongs to RD,h∈{0,1}F
The energy function of v, h is:
where θ ═ { a, b, W, σ } is the model parameter, with a penalty termAnd (3) performing sparseness, wherein lambda is the weight of a penalty term, F represents the number of neurons, then reconstructing a visible layer through a contrast divergence algorithm, and then learning a model theta by using a gradient descent algorithm, wherein the contrast divergence algorithm and the gradient descent learning algorithm belong to common knowledge of technicians in the field, and are not repeated herein.
After θ is obtained, the conditional probability when the visible layer is a given value and the hidden layer neuron is 1 can be obtained based on the above energy function as:
based on the energy function, the conditional probability of the hidden layer as a given value and the visible layer as x can be obtained:
obtaining the vector of the first hidden layer asWherein,{v1,...,vmrepresenting that m training samples exist in the training set, D is the dimension of an input vector, and rho is a sparsity parameter; wijRepresenting the contribution degree of the ith unit of the visible layer to the jth neuron of the first hidden layer; bjThe offset value of the jth neuron is indicated.
Learning from the i-th hidden layer to the (i+1)-th hidden layer is performed according to the obtained vector of the first hidden layer to obtain the vector of the (i+1)-th hidden layer, where 0 < i < k-2;
specifically, the vector of the i-th hidden layer is used as the input of the (i+1)-th hidden layer to obtain the vector of the (i+1)-th hidden layer; that is, the vector of the (k-2)-th hidden layer is obtained after multi-layer learning;
in the embodiment of the invention, the vector of the i-th hidden layer consists of real values; the j-th vector of the i-th hidden layer is $h^{(i)}_j$, where $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer and $b^{(i)}_j$ the offset value of the j-th neuron when learning from the (i-1)-th to the i-th hidden layer;
the reconstructed (i-1)-th hidden-layer vector is z, with the corresponding weights denoting the contribution of the s-th neuron of the i-th hidden layer to the j-th neuron of the (i-1)-th hidden layer, and the corresponding offset the value from the i-th to the (i-1)-th hidden layer when reconstructing the j-th neuron;
given z, the conditional distribution of the (i-1)-th hidden layer is set as $H^{(i-1)} \mid z \sim \mathcal{N}(z, \sigma^2 I)$, where $H^{(i-1)}$ denotes the (i-1)-th hidden layer and $h^{(i-1)}_k$ its k-th neuron.
The loss function $\mathrm{Loss}(x,z) = C(\sigma^2)\,\|h^{(i-1)} - z\|^2 + \lambda_1 \big\|\mathbb{1}\{\mathrm{sigmoid}(h^{(i)}) \ge \xi\}\big\|_*$ is then optimized to obtain the vector of the (i+1)-th hidden layer satisfying the constraint $|z - h^{(i-1)}| \le B_i$.
Wherein $C(\sigma^2)$ denotes a constant, $\lambda_1$ the weight of the penalty term, $\|\cdot\|_*$ the nuclear norm, and $\xi$ the activation value; optimizing Loss(x,z) makes z as close as possible to $h^{(i-1)}$ within the $\sigma^2$ error range. Through multi-layer learning, the vectors of successive (i+1)-th hidden layers are obtained, and the optimized vector of the (k-2)-th hidden layer is:
$h^{(k-2)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-3,k-2)}_{sj} h^{(k-3)}_s + b^{(k-2)}_j\big)$
wherein $h^{(i)}_j$ denotes the j-th vector of the i-th hidden layer and $h^{(k-2)}_j$ the j-th vector of the (k-2)-th hidden layer; $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and $b^{(i)}_j$ the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; $W^{(k-3,k-2)}_{sj}$ denotes the contribution of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and $b^{(k-2)}_j$ the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer;
according to the vector of the k-2 hidden layer, learning from the k-2 hidden layer to the k-1 hidden layer is carried out to obtain the vector of the k-1 hidden layer:
specifically, the vector of the k-2 hidden layer is used as the input of the k-1 hidden layer to obtain the vector of the k-1 hidden layer, namely the thinned and vectorized sensor data and image data h(k-1) Represents the jth vector of the (k-1) th hidden layer, wherein,the contribution degree of the s-th neuron of the k-2 hidden layer to the j-th neuron of the k-1 hidden layer,represents the offset value of the jth neuron from the (k-2) th layer to the (k-1) th layer;
the reconstructed k-2 hidden layer vector iszk-2Is the constraint ofk-2-h(k-2)|≤B(k-2)
Then useAnd optimizing the k-2 hidden layer vector obtained by reconstruction, so that the sensor data and the image data features are distributed on a limited number of neurons with the maximum relevance in a large quantity.
Wherein 1 isTW(k-1,k)1, ⊙ denotes the multiplication of elements, diIs the k-2 hidden layer vector h(k-2)With the ith vector of the weight matrix from the k-1 th hidden layer to the kth hidden layerThe euclidean distance of (c).
In the embodiment of the invention, each layer is associated with the previous layer after learning, the learning threshold value of each layer is gradually increased, the number of the activated neurons of a plurality of hidden layers in the middle is the least, and the limited number of neurons with the maximum activated association of the last layer is the most, so that layer-by-layer learning and layer-by-layer correction can be realized, high-dimensional data can be distributed on the limited dimension, the compression of the data is realized, and the high-dimensional data such as a sensor and the like are sparsely represented in the learning process.
The vectorization of the text data specifically comprises: ordering the distinct words in the text data and converting each text into a vector of occurrence frequencies of the corresponding words.
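The text vectorization step above amounts to a bag-of-words frequency vector; the following is a minimal sketch, with the corpus and helper name as illustrative assumptions.

```python
from collections import Counter

# Order the distinct words of a corpus, then map each document to its
# word-frequency vector over that fixed vocabulary.
def frequency_vector(doc, vocabulary):
    counts = Counter(doc.split())
    return [counts[w] for w in vocabulary]

corpus = "the sensor reads the value"
vocab = sorted(set(corpus.split()))      # ['reads', 'sensor', 'the', 'value']
vec = frequency_vector(corpus, vocab)    # [1, 1, 2, 1]
```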
102. Modeling each of the N vectorized mode data to obtain N single-mode data;
In the embodiment of the invention, the N vectorized mode data serve as visible layers, and the visible layers are modeled with restricted Boltzmann machines: a restricted Boltzmann machine models the vectorized audio data, sensor data, and image data, and a replicated-softmax restricted Boltzmann machine models the text data; the first and second hidden layers of the N mode data are modeled with restricted Boltzmann machines.
103. Fusing any two of the obtained single-mode data to obtain dual-mode data;
In the embodiment of the invention, if A is the vectorized audio data, B the vectorized sensor data, C the vectorized image data, and D the vectorized text data, the obtained dual-mode data are AB, AC, AD, BC, BD, and CD.
The fusion of any single-mode data with the text data, such as AD, BD, or CD, can be obtained with a Gaussian-Bernoulli restricted Boltzmann machine together with a replicated-softmax restricted Boltzmann machine.
The result of dual-mode or multi-mode data fusion is represented by a first mode and a second mode. For example, for the dual-mode fusion result (A, B), the first mode is A and the second mode is B; for the three-mode fusion result ((AB)C), the first mode is the already-fused AB and the second mode is C; for the four-mode fusion result ((AB)(CD)), the first mode is the already-fused AB and the second mode is the already-fused CD; and so on.
In the embodiment of the present invention, the dual-mode or multi-mode fusion process may be represented by a Gaussian-Bernoulli restricted Boltzmann machine, with its first mode denoted m and its second mode denoted t. Continuing the example above: in the dual-mode fusion (A, B), the first mode m represents A and the second mode t represents B; in the three-mode fusion ((AB)C), the first mode m represents the already-fused AB and the second mode t represents C; in the four-mode fusion ((AB)(CD)), the first mode m represents the already-fused AB and the second mode t represents the already-fused CD; and so on.
The dual-mode or multi-mode fusion process may also be represented by a {0,1} restricted Boltzmann machine, whose first mode is denoted n and whose second mode is denoted a.
Apart from text data, the fusion of any two other single-mode data can be obtained either through the first mode m and second mode t of the Gaussian-Bernoulli restricted Boltzmann machine, or through the first mode n and second mode a of the {0,1} restricted Boltzmann machine.
104. Fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data;
specifically, any two dual-mode data that contain the same mode data are combined, referred to as a first combination; any dual-mode data is combined with single-mode data not contained in it, referred to as a second combination;
and the first and second hidden layers of any one of the first and second combinations are modeled with a restricted Boltzmann machine to obtain three-mode data.
Continuing the example above, the first combinations of any two of the 6 dual-mode data that contain the same mode data are: (AB, AC), (AB, AD), (AB, BC), (AB, BD), (AC, AD), (AC, BC), (AC, CD), (AD, BD), (AD, CD), (BC, BD), (BC, CD), (BD, CD);
the second combinations of any one of the 6 dual-mode data with single-mode data not contained in it are: (AB, C), (AB, D), (AC, B), (AC, D), (AD, B), (AD, C), (BC, A), (BC, D), (BD, A), (BD, C), (CD, A), (CD, B);
then the first and second hidden layers of any one of the first and second combinations are modeled with a restricted Boltzmann machine, resulting in the three-mode data (ABC, ABD, ACD, BCD).
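The two enumerations above can be reproduced mechanically; this sketch (variable names are illustrative assumptions) builds the "first combination" as pairs of dual-mode items sharing a mode, and the "second combination" as dual-mode items paired with a disjoint single mode.

```python
from itertools import combinations

modes = "ABCD"
duals = ["".join(c) for c in combinations(modes, 2)]  # AB, AC, AD, BC, BD, CD

# first combination: two dual-mode items that share at least one mode
first = [(x, y) for x, y in combinations(duals, 2) if set(x) & set(y)]
# second combination: a dual-mode item with a single mode it does not contain
second = [(d, m) for d in duals for m in modes if m not in d]

# len(first) == 12 and len(second) == 12, matching the lists in the text;
# the 3 disjoint pairs (AB,CD), (AC,BD), (AD,BC) are excluded from `first`.
```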
When fusing dual-mode or multi-mode data with the Gaussian-Bernoulli restricted Boltzmann machine, the hidden-layer variables are $h = \{h^{(1m)}, h^{(2m)}, h^{(1t)}, h^{(2t)}, h^{(3)}\}$; that is, the number of activated hidden-layer neurons is limited, and the fusion result of the dual-mode or multi-mode data is expressed by the following conditional probabilities:
the conditional probability of the first hidden layer in the m mode, given the visible-layer and second-hidden-layer neurons;
the conditional probability of the second hidden layer in the m mode, given the first-hidden-layer and third-hidden-layer neurons;
the conditional probability of the first hidden layer in the t mode, given the visible layer and the second hidden layer;
the conditional probability of the second hidden layer in the t mode, given the first hidden layer and the third hidden layer;
the conditional probability of the third hidden layer, given the second hidden layer of the t mode and the second hidden layer of the m mode;
in the t mode, given the first hidden layer, the visible layer follows a Gaussian distribution;
in the m mode, given the first hidden layer, the visible layer follows a Gaussian distribution;
when the boltzmann machine is limited to fuse dual-mode or multi-mode data based on {0,1}, the hidden layer variable h is { h ═ h(1n),h(2n),h(1a),h(2a),h(3)And then the specific fusion result of the dual-mode or multi-mode data is expressed as follows:
wherein,
representing the conditional probability of the first hidden layer under the condition of known neurons of the visible layer and the second hidden layer in the n mode;
representing the conditional probability of the first hidden layer under the condition of known neurons of the visible layer and the second hidden layer in the n mode;
representing the conditional probability of a first hidden layer under the condition of known neurons of a visible layer and a second hidden layer in an a mode;
representing the conditional probability of a mode second hidden layer under the condition of a known a mode first hidden layer and a known a mode third hidden layer neuron;
the conditional probability of a third hidden layer under the condition of a known n-mode second hidden layer and a known a-mode second hidden layer neuron is represented;
representing that in a mode, given a first hidden layer, a visible layer obeys a Gaussian distribution;
representing that in an n mode, given a first hidden layer, a visible layer obeys Gaussian distribution;
105. And so on: N-mode data fusion is performed according to the obtained (N-1)-mode data to obtain N-mode data.
In the embodiment of the invention, with N = 4, four-mode modeling is performed on the obtained three-mode data.
Specifically, any one of the obtained three-mode data is combined with the single-mode data not contained in it, referred to as a third combination; any dual-mode data is combined with the dual-mode data disjoint from it, referred to as a fourth combination;
then the first and second hidden layers of any one of the third and fourth combinations are modeled with a restricted Boltzmann machine to obtain four-mode data.
Continuing the example above, the third combinations of any three-mode data with the single-mode data not contained in it are: (ABC, D), (ABD, C), (ACD, B), (BCD, A);
the fourth combinations of any dual-mode data with the dual-mode data disjoint from it are: (AB, CD), (AC, BD), (AD, BC);
according to the third and fourth combinations, the four-mode data ABCD is obtained.
Further, after the multi-mode data are obtained, the invention can also use them for learning. Since learning with multi-mode data belongs to the common knowledge of those skilled in the art, it is only briefly described here: to find the mean-field parameters μ satisfying the condition, the right-hand side of the variational inequality $\log P(v;\theta) \ge \mathcal{L}(Q(h\mid v;\mu))$ should be maximized to obtain the optimal $\mu = \{\mu^{(1m)}, \mu^{(1t)}, \mu^{(2m)}, \mu^{(2t)}, \mu^{(3)}\}$, where μ is related to θ.
When $Q(h\mid v;\mu) = P(h\mid v;\theta)$, $\log P(v;\theta) = \mathcal{L}(Q(h\mid v;\mu))$, so maximizing $\log P(v;\theta)$ translates into maximizing $\mathcal{L}(Q(h\mid v;\mu))$ on the right-hand side of the inequality; when $\mathrm{KL}(Q(h\mid v;\mu)\,\|\,P(h\mid v;\theta)) = 0$, $\mathcal{L}(Q(h\mid v;\mu))$ is largest, so optimizing $\log P(v;\theta)$ can be translated into approximating $P(h\mid v;\theta)$ with $Q(h\mid v;\mu)$.
The embodiment of the invention uses a naive mean-field approximation to approximate $P(h\mid v;\theta)$;
when the dual-mode or multi-mode fusion process is represented by the Gaussian-Bernoulli restricted Boltzmann machine, $Q(h\mid v;\mu)$ is constructed, and the μ obtained by the learning process is as follows:
wherein $Q(h\mid v;\mu)$ takes the form of a product over the hidden-layer neurons, since this simplifies the approximation to the posterior distribution.
The components $\{\mu^{(1m)}, \mu^{(1t)}, \mu^{(2m)}, \mu^{(2t)}, \mu^{(3)}\}$ of μ are then obtained respectively as follows:
wherein $\mu^{(1m)}$ and $\mu^{(2m)}$ correspond to the first and second hidden layers of the m mode, $\mu^{(1t)}$ and $\mu^{(2t)}$ to the first and second hidden layers of the t mode, and $\mu^{(3)}$ to the mixing layer, i.e., the third hidden layer.
When the dual-mode or multi-mode fusion process is instead represented by the {0,1} restricted Boltzmann machine, a corresponding $Q(h\mid v;\mu)$ is constructed;
in this case the components $\{\mu^{(1n)}, \mu^{(1a)}, \mu^{(2n)}, \mu^{(2a)}, \mu^{(3)}\}$ are obtained respectively as follows:
wherein $\mu^{(1n)}$ and $\mu^{(2n)}$ correspond to the first and second hidden layers of the n mode, $\mu^{(1a)}$ and $\mu^{(2a)}$ to the first and second hidden layers of the a mode, and $\mu^{(3)}$ to the mixing layer, i.e., the third hidden layer.
Then:
sampling from $Q(v,h;\theta)$ yields the model expectation $P(v,h\mid\theta)$;
$v_i$, $h_j$ denote samples on $Q(v,h;\theta)$, $N_v$ denotes the number of visible-layer units, and $N_h$ the number of hidden-layer neurons;
the data expectation $E_{P(h\mid v)}$ is then calculated from the obtained μ;
a full sample is calculated according to the conditional distribution of the current neuron relative to the other neurons and the collected partial sample;
specifically, with Q samples, the initial state of the s-th sample is $(v^{0,s}, h^{0,s})$; a Markov chain is obtained after Gibbs sampling, a stationary-state sample is obtained after t = 50 sampling steps, and the Q samples form Q Markov chains.
Updating the connection weight w between each layer of the neural network and the deviant b of hidden layer neuron according to the collected new sample and the obtained mu;
specifically, when θ is set, the energy function P (v, h) is calculated with respect to θ and W(1)、W(i)A, b, obtaining a difference value corresponding to the desired case with respect to P (h | v; theta) and with respect to P (v, h; theta);
and updating the weight by using a gradient descent method according to the obtained expected difference value so as to achieve the aim of learning by using the obtained multimode data.
The first derivatives of the energy function P(v, h) with respect to θ, W(1), W(i), a and b are prior art; the embodiment of the present invention only lists the first derivatives of the energy function with respect to the connection weights W(1) and W(i), and the rest will not be described in detail.
The first derivative of the energy function with respect to the connection weight W(1) is:
W(1) represents the weight matrix between the visible layer and the first hidden layer; P(v; θ) is the marginal probability of v;
the first derivative of the energy function with respect to the connection weight W(i) is:
W(i) represents the weight matrix between the (i-1)-th hidden layer and the i-th hidden layer.
Finally, when Q(h | v; μ) is equal to P(h | v; θ), the right-hand side of the above inequality is maximized and the optimal μ is obtained.
Furthermore, the multimode data model finally obtained by the embodiment of the invention can also infer lost or incomplete data; that is, the first mode data can be inferred from multimode data containing the first mode data together with the corresponding single-mode or multimode data not containing the first mode data;
for example, suppose the first mode data is the C mode data. If the C mode data is missing, it can be inferred by guiding with the dual-mode data P(AC), P(BC) or P(CD) and then sampling correspondingly on the single-mode data P(A), P(B) or P(D); the C mode data can also be inferred by guiding with the three-mode data P(ABC), P(ACD) or P(BCD) and then sampling correspondingly on the dual-mode data P(AB), P(AD) or P(BD); the C mode data can further be inferred by guiding with the four-mode data P(ABCD) and then sampling on the three-mode data P(ABD). The inference when two or more modes are missing is similar and is not repeated herein.
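A minimal sketch of this guided inference, assuming a trained joint RBM over the modes: the known-mode visible units are clamped while the hidden layer and the missing-mode visible units are alternately Gibbs-sampled, so the chain converges to samples of the missing mode conditioned on the known modes. The shapes and names (infer_missing_mode, W_known, W_miss) are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infer_missing_mode(v_known, W_known, W_miss, b, c_miss, steps=50):
    """Estimate a missing modality by clamping the known modalities.

    v_known is held fixed throughout; only the hidden layer h and the
    missing-mode visible units v_miss are resampled each sweep.
    """
    v_miss = (rng.random((1, W_miss.shape[0])) < 0.5).astype(float)
    for _ in range(steps):
        ph = sigmoid(v_known @ W_known + v_miss @ W_miss + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W_miss.T + c_miss)
        v_miss = (rng.random(pv.shape) < pv).astype(float)
    return v_miss

# illustrative demo with random (untrained) weights
W_known = rng.normal(scale=0.1, size=(4, 3))   # known-mode units -> hidden
W_miss = rng.normal(scale=0.1, size=(5, 3))    # missing-mode units -> hidden
v_known = np.ones((1, 4))
v_c = infer_missing_mode(v_known, W_known, W_miss, np.zeros(3), np.zeros(5))
```

With trained weights, v_c would be a sample of the missing C mode conditioned on the clamped modes, in the spirit of the guided sampling described above.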
In the embodiment of the invention, the N mode data are each vectorized; each of the vectorized mode data is then modeled, pairs of the resulting single-mode data are fused to obtain dual-mode data, the dual-mode data are fused to obtain three-mode data, and so on until N-mode data are finally obtained, thereby realizing the fusion of multiple mode data including sensor data. Furthermore, when vectorizing high-dimensional data such as sensor data, the learning threshold of each layer is gradually increased, so that the intermediate hidden layers activate the fewest neurons while the last layer activates the limited set of neurons with the greatest activation relevance; layer-by-layer learning and layer-by-layer correction can thus be realized, and the high-dimensional data are distributed over a limited number of dimensions, achieving data compression, i.e. a sparse representation of high-dimensional data such as sensor data is learned, which facilitates subsequent processing. Furthermore, when the dual-mode or multimode data are fused based on a Gaussian-Bernoulli restricted Boltzmann machine, the number of activated hidden-layer neurons is limited, so that a representation of the high-dimensional data in a limited dimension is obtained and the fusion process is simplified. Further, in the event that one or more modes are missing, the missing or incomplete data can be inferred by bringing the known mode data into the multimode data model.
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (7)

1. A multi-mode data fusion method based on deep learning is characterized by comprising the following steps:
vectorizing the N pieces of mode data respectively; n is a natural number, and the N pieces of mode data comprise sensor data;
modeling each mode data in the N mode data to obtain N single mode data;
fusing any two obtained single-mode data to obtain dual-mode data;
fusing any two dual-mode data containing the same mode data, and fusing any one dual-mode data with single-mode data different from the dual-mode data to obtain three-mode data;
by analogy, performing N-mode data fusion according to the obtained N-1 mode data to obtain N-mode data;
N is 4, the four mode data are audio data, sensor data, image data and text data, and the sparsification and vectorization processing of the audio data specifically includes:
according to the mean activation of the j-th hidden-layer neuron, obtaining the relative-entropy sparsity term, wherein m is the number of audio data and x(i) represents the i-th audio data;
wherein KL(ρ ‖ ρ̂j) denotes the relative entropy between two Bernoulli distributions with means ρ and ρ̂j, ρ is the sparsity parameter, ρ̂j is the activation degree of hidden neuron j, and n is the number of hidden neurons;
setting a truncated nuclear norm;
then sparse self-coding learning is carried out to obtain sparse and vectorized audio data Jsparse(W,b);
wherein hW,b(x(i)) represents the reconstructed x(i), β denotes the weight of the sparsification penalty factor, and W(1) represents the weights from the visible layer to the first hidden layer;
performing sparsification and vectorization processing on the sensor data and the image data, specifically:
setting the neural network to k layers, the sensor data and image data being composed of N data samples, each data sample being a D-dimensional vector, and the data vector of the k-th layer being
presetting learning threshold values (b1, …, bK) for the layers, the learning threshold value of each layer being gradually increased;
learning from a visible layer to a first hidden layer to obtain a vector of the first hidden layer;
learning from the ith hidden layer to the (i + 1) th hidden layer according to the obtained vector of the first hidden layer to obtain the vector of the (i + 1) th hidden layer, wherein 0< i < k-2;
and learning from the (k-2)-th hidden layer to the (k-1)-th hidden layer according to the vector of the (k-2)-th hidden layer to obtain the sparsified and vectorized sensor data and image data.
2. The method of claim 1, wherein, for the vector of the first hidden layer, {v1, ..., vm} indicates that there are m training samples in the training set, ρ is the sparsity parameter, D represents the dimension, Wij represents the contribution degree of the i-th unit of the visible layer to the j-th neuron of the first hidden layer, and bj is the offset value of the j-th neuron.
3. The method of claim 1, wherein the vector of the (k-2)-th hidden layer is
wherein hj(i) represents the j-th vector of the i-th hidden layer, hj(k-2) represents the j-th vector of the (k-2)-th hidden layer, Wsj(i) represents the contribution degree of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and bj(i) represents the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; Wsj(k-2) represents the contribution degree of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and bj(k-2) represents the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer.
4. The method of claim 1, wherein the sparsified and vectorized sensor data and image data are taken as h(k-1), wherein hj(k-1) represents the j-th vector of the (k-1)-th hidden layer, Wsj(k-1) represents the contribution degree of the s-th neuron of the (k-2)-th hidden layer to the j-th neuron of the (k-1)-th hidden layer, and bj(k-1) represents the offset value of the j-th neuron from the (k-2)-th layer to the (k-1)-th layer.
5. The method according to claim 1, wherein any two dual-mode data containing the same mode data are fused, and any one dual-mode data and a single-mode data different from the dual-mode data are fused to obtain three-mode data, specifically:
combining any two dual-mode data containing the same mode data, called a first combination; combining any one dual-mode data with a single-mode data different from it, called a second combination;
and modeling the first hidden layer and the second hidden layer of any one of the first combination and the second combination respectively by using a restricted Boltzmann machine to obtain three-mode data.
6. The method according to claim 1, wherein N-mode data fusion is performed according to the obtained N-1-mode data to obtain N-mode data, specifically:
combining any one of the obtained three-mode data with a single-mode data different from it, called a third combination; combining any one dual-mode data with a dual-mode data different from it, called a fourth combination;
and modeling the first hidden layer and the second hidden layer of any one of the third combination and the fourth combination respectively by using a restricted Boltzmann machine to obtain four-mode data.
7. The method of any of claims 1 to 6, further comprising:
the first mode data is inferred from multi-mode data that includes the first mode data and corresponding single mode or multi-mode data that does not include the first mode data.
CN201611243618.6A 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning Active CN106650817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611243618.6A CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611243618.6A CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Publications (2)

Publication Number Publication Date
CN106650817A CN106650817A (en) 2017-05-10
CN106650817B true CN106650817B (en) 2019-09-20

Family

ID=58835902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611243618.6A Active CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Country Status (1)

Country Link
CN (1) CN106650817B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107280697A (en) * 2017-05-15 2017-10-24 北京市计算中心 Lung neoplasm grading determination method and system based on deep learning and data fusion
CN109993291B (en) * 2017-12-30 2020-07-07 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN108768585B (en) * 2018-04-27 2021-03-16 南京邮电大学 Multi-user detection method of uplink signaling-free non-orthogonal multiple access (NOMA) system based on deep learning

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101118610A (en) * 2007-09-10 2008-02-06 东北大学 Sparseness data process modeling approach
CN104766433A (en) * 2015-04-23 2015-07-08 河南理工大学 Electrical fire warning system based on data fusion
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CA2951723A1 (en) * 2014-06-10 2015-12-17 Sightline Innovation Inc. System and method for network based application development and implementation

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN101118610A (en) * 2007-09-10 2008-02-06 东北大学 Sparseness data process modeling approach
CA2951723A1 (en) * 2014-06-10 2015-12-17 Sightline Innovation Inc. System and method for network based application development and implementation
CN104766433A (en) * 2015-04-23 2015-07-08 河南理工大学 Electrical fire warning system based on data fusion
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data

Non-Patent Citations (2)

Title
"A Multi-Source Heterogeneous Data Fusion Method and Its Application Research"; Jiang Jianhua et al.; Electronic Design Engineering; 20160620; Vol. 24, No. 12; full text *
"A Neural-Network-Based Data Fusion Model in Wireless Sensor Networks"; Yu Liyang et al.; Computer Science; 20081215; Vol. 35, No. 12; full text *

Also Published As

Publication number Publication date
CN106650817A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN107516110B (en) Medical question-answer semantic clustering method based on integrated convolutional coding
Sirat et al. Neural trees: a new tool for classification
CN111461322B (en) Deep neural network model compression method
WO2019155064A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
Khodayar et al. Robust deep neural network for wind speed prediction
Hawkins et al. Towards compact neural networks via end-to-end training: A Bayesian tensor approach with automatic rank determination
CN106650817B (en) A kind of multimode data fusion method based on deep learning
Rajan et al. Is cross-attention preferable to self-attention for multi-modal emotion recognition?
CN113064968A (en) Social media emotion analysis method and system based on tensor fusion network
Yi et al. Expanded autoencoder recommendation framework and its application in movie recommendation
Vardhan et al. Deep learning based fea surrogate for sub-sea pressure vessel
CN110991601A (en) Neural network recommendation method based on multi-user behaviors
Sento Image compression with auto-encoder algorithm using deep neural network (DNN)
Raksarikorn et al. Facial expression classification using deep extreme inception networks
Varshitha et al. Natural language processing using convolutional neural network
CN112541541B (en) Lightweight multi-modal emotion analysis method based on multi-element layering depth fusion
Cho et al. Non-contrastive self-supervised learning of utterance-level speech representations
Gkoumas et al. An entanglement-driven fusion neural network for video sentiment analysis
CN112734025B (en) Neural network parameter sparsification method based on fixed base regularization
Rehman et al. Performance evaluation of MLPNN and NB: a comparative study on Car Evaluation Dataset
Sevakula et al. Fuzzy rule reduction using sparse auto-encoders
Shabalov et al. Automatized design application of intelligent information technologies for data mining problems
Ren The advance of generative model and variational autoencoder
CN206961151U (en) A kind of multimode data fusing device based on deep learning
CN112686306B (en) ICD operation classification automatic matching method and system based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant