CN106650817B - Multi-mode data fusion method based on deep learning - Google Patents


Info

Publication number
CN106650817B
Authority
CN
China
Prior art keywords
mode data
data
hidden layer
layer
mode
Prior art date
Legal status
Active
Application number
CN201611243618.6A
Other languages
Chinese (zh)
Other versions
CN106650817A (en)
Inventor
郭利
周盛宗
王开军
余志刚
付璐斯
Current Assignee
Fujian Institute of Research on the Structure of Matter of CAS
Original Assignee
Fujian Institute of Research on the Structure of Matter of CAS
Priority date
Filing date
Publication date
Application filed by Fujian Institute of Research on the Structure of Matter of CAS
Priority to CN201611243618.6A
Publication of CN106650817A
Application granted
Publication of CN106650817B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a multi-mode data fusion method based on deep learning, comprising: vectorizing each of N mode data, where N is a natural number and the N mode data include sensor data; modeling each of the N mode data to obtain N single-mode data; fusing any two of the obtained single-mode data to obtain dual-mode data; fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data; and so on, performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data. The application can fuse multiple kinds of mode data, including sensor data.

Description

Multi-mode data fusion method based on deep learning
Technical Field
The application relates to a deep learning-based multi-mode data fusion method, and belongs to the field of machine learning.
Background
Deep learning has become the dominant form of machine learning in computer vision, speech analysis, and many other areas. Deep learning adopts a layered structure similar to a neural network: the system is a multilayer network consisting of an input layer, several hidden layers, and an output layer; only nodes in adjacent layers are connected, while nodes within the same layer, and nodes in non-adjacent layers, are not connected to each other.
In the prior art, multi-mode data fusion in deep learning mainly uses a deep autoencoder to fuse the two modes of audio and video data, or uses a Gaussian-Bernoulli restricted Boltzmann machine together with a replicated-softmax restricted Boltzmann machine to fuse the two modes of image and text data, or uses a deep Boltzmann machine to fuse data such as audio, video, and text.
In practical applications, however, a large amount of sensor data is involved, and there is no existing method that fuses multiple modes of data such as audio, image, text, and sensor data.
Disclosure of Invention
According to one aspect of the application, a multi-mode data fusion method based on deep learning is provided that can fuse multi-mode data including sensor data.
A multi-mode data fusion method based on deep learning comprises the following steps:
vectorizing each of the N mode data, where N is a natural number and the N mode data include sensor data;
modeling each of the N mode data to obtain N single-mode data;
fusing any two of the obtained single-mode data to obtain dual-mode data;
fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data;
and so on, performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data.
Wherein N is 4, and the four mode data are audio data, sensor data, image data, and text data, respectively.
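The bottom-up fusion order described above can be sketched as follows; the helper `fusion_levels` and the single-letter mode names are illustrative assumptions, not part of the patent.

```python
from itertools import combinations

# Sketch of the fusion hierarchy: starting from N single-mode representations
# (here N = 4: A=audio, B=sensor, C=image, D=text), level k lists the k-mode
# subsets that are fused before level k+1 can be built.
def fusion_levels(modes):
    return {k: ["".join(c) for c in combinations(modes, k)]
            for k in range(2, len(modes) + 1)}

levels = fusion_levels("ABCD")
# levels[2] -> ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']  (6 dual-mode results)
# levels[3] -> ['ABC', 'ABD', 'ACD', 'BCD']          (4 three-mode results)
# levels[4] -> ['ABCD']                              (the final N-mode result)
```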
The sparsification and vectorization of the audio data specifically comprises:
computing the mean activation of the j-th hidden-layer neuron, $\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$, where m is the number of audio data and $x^{(i)}$ denotes the i-th audio datum;
wherein $\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$ denotes the relative entropy between two Bernoulli distributions with means $\rho$ and $\hat\rho_j$, $\rho$ is the sparsity parameter, $\hat\rho_j$ is the activation degree of hidden neuron j, and n is the number of hidden neurons;
setting a truncated nuclear norm;
then performing sparse autoencoder learning to obtain the sparsified and vectorized audio data via the objective $J_{sparse}(W,b) = J(W,b) + \beta\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$;
wherein $h_{W,b}(x^{(i)})$ denotes the reconstruction of $x^{(i)}$, $\beta$ denotes the weight of the sparsification penalty term, and $W^{(1)}$ denotes the weights from the visible layer to the first hidden layer.
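The sparsity penalty above can be illustrated with a minimal numpy sketch, assuming the standard sparse-autoencoder form: the mean activation of each hidden unit is compared to the target sparsity via the Bernoulli KL divergence. Names and shapes are illustrative assumptions.

```python
import numpy as np

# KL-divergence sparsity penalty of a sparse autoencoder: rho_hat_j is the
# mean activation of hidden unit j over m inputs, and the penalty sums
# KL(rho || rho_hat_j) over the n hidden units.
def kl_sparsity_penalty(activations, rho=0.05):
    # activations: (m, n) matrix of hidden activations in (0, 1)
    rho_hat = activations.mean(axis=0)  # mean activation per hidden unit
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

acts = np.full((10, 4), 0.05)   # every unit already at the target sparsity
penalty = kl_sparsity_penalty(acts, rho=0.05)
# penalty is 0 when rho_hat equals rho exactly, and grows as they diverge
```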
The sparsification and vectorization of the sensor data and image data specifically comprises:
setting the neural network to k layers, with the sensor data and image data consisting of N data samples, each sample being a D-dimensional vector, and denoting the data vector of the k-th layer by $h^{(k)}$;
presetting a learning threshold $(B_1, \dots, B_K)$ for each layer, with the learning threshold increasing layer by layer;
learning from the visible layer to the first hidden layer to obtain the vector of the first hidden layer;
learning from the i-th hidden layer to the (i+1)-th hidden layer according to the obtained vector of the first hidden layer to obtain the vector of the (i+1)-th hidden layer, where 0 < i < k-2;
and learning from the (k-2)-th hidden layer to the (k-1)-th hidden layer according to the vector of the (k-2)-th hidden layer to obtain the sparsified and vectorized sensor data and image data.
The vector of the first hidden layer is $h^{(1)}_j = \mathrm{sigmoid}\big(\sum_{i=1}^{D} W_{ij} v_i + b_j\big)$, where $\{v_1,\dots,v_m\}$ denotes the m training samples of the training set, $\rho$ is the sparsity parameter, D denotes the dimension, $W_{ij}$ denotes the contribution of the i-th visible-layer unit to the j-th neuron of the first hidden layer, and $b_j$ denotes the offset value of the j-th neuron.
The vector of the (k-2)-th hidden layer is $h^{(k-2)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-3,k-2)}_{sj} h^{(k-3)}_s + b^{(k-2)}_j\big)$.
Wherein $h^{(i)}_j$ denotes the j-th vector of the i-th hidden layer and $h^{(k-2)}_j$ the j-th vector of the (k-2)-th hidden layer; $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and $b^{(i)}_j$ the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; $W^{(k-3,k-2)}_{sj}$ denotes the contribution of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and $b^{(k-2)}_j$ the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer.
Let the sparsified and vectorized sensor data and image data be $h^{(k-1)}$, with $h^{(k-1)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-2,k-1)}_{sj} h^{(k-2)}_s + b^{(k-1)}_j\big)$ the j-th vector of the (k-1)-th hidden layer, where $W^{(k-2,k-1)}_{sj}$ denotes the contribution of the s-th neuron of the (k-2)-th hidden layer to the j-th neuron of the (k-1)-th hidden layer and $b^{(k-1)}_j$ the offset value of the j-th neuron from the (k-2)-th layer to the (k-1)-th layer.
Fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data, specifically comprises:
combining any two dual-mode data that contain the same mode data, referred to as a first combination; combining any dual-mode data with single-mode data not contained in it, referred to as a second combination;
and modeling the first and second hidden layers of any one of the first and second combinations with a restricted Boltzmann machine to obtain three-mode data.
Performing N-mode data fusion according to the obtained (N-1)-mode data to obtain N-mode data specifically comprises:
combining any one of the obtained three-mode data with the single-mode data not contained in it, referred to as a third combination; combining any dual-mode data with the dual-mode data disjoint from it, referred to as a fourth combination;
and modeling the first and second hidden layers of any one of the third and fourth combinations with a restricted Boltzmann machine to obtain four-mode data.
Further, the method comprises: inferring first-mode data from multi-mode data that contains the first mode together with corresponding single-mode or multi-mode data that does not contain it.
The beneficial effects of the present application include:
1) the N mode data are separately vectorized, each of the N vectorized mode data is modeled, the resulting single-mode data are fused into dual-mode data, the dual-mode data are fused into three-mode data, and so on until N-mode data are finally obtained, thereby achieving the fusion of multiple kinds of mode data including sensor data;
2) further, when vectorizing high-dimensional data such as sensor data, the learning threshold of each layer is gradually increased, so that the middle hidden layers activate the fewest neurons while the last layer activates the most among a limited number of neurons with the greatest activation relevance; this enables layer-by-layer learning and layer-by-layer correction and distributes the high-dimensional data over a limited number of dimensions, achieving data compression, i.e., a sparse representation of high-dimensional data such as sensor data during learning, which facilitates subsequent processing;
3) further, when fusing dual-mode or multi-mode data with a Gaussian-Bernoulli restricted Boltzmann machine, the number of activated hidden-layer neurons is limited, yielding a representation of high-dimensional data on a limited number of dimensions and simplifying the fusion process;
4) further, when one or more modes are missing, the missing or incomplete data can be inferred by feeding the known mode data into the multi-mode data model.
Drawings
Fig. 1 is a schematic flow chart of a deep learning-based multi-mode data fusion method.
Detailed Description
The present application will be described in detail with reference to examples, but the present application is not limited to these examples.
Example 1
Referring to fig. 1, an embodiment of the present invention provides a multi-mode data fusion method based on deep learning, the method comprising:
101. vectorizing each of the N mode data, where N is a natural number and the N mode data include sensor data;
In the embodiment of the present invention, N is set to 4; that is, the four mode data comprise audio data, image data, and text data in addition to the sensor data.
Specifically, the audio data is sparsified and vectorized as follows:
the mean activation of the j-th hidden-layer neuron is computed as $\hat\rho_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$, where m is the number of audio data and $x^{(i)}$ denotes the i-th audio datum;
wherein $\mathrm{KL}(\rho\,\|\,\hat\rho_j) = \rho\log\frac{\rho}{\hat\rho_j} + (1-\rho)\log\frac{1-\rho}{1-\hat\rho_j}$ denotes the relative entropy between two Bernoulli distributions with means $\rho$ and $\hat\rho_j$, $\rho$ is the sparsity parameter, $\hat\rho_j$ is the activation degree of hidden-layer neuron j, and n is the number of hidden-layer neurons; since the $\hat\rho_j$ are independent of one another, the terms $\mathrm{KL}(\rho\,\|\,\hat\rho_j)$ are independent of one another, and minimizing $\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$, i.e., minimizing all the relative entropies, drives each $\hat\rho_j$ toward $\rho$;
a truncated nuclear norm is set;
in particular, given the matrix $W^{(1)} \in \mathbb{R}^{D\times n}$, the truncated nuclear norm $\|W\|_r$ is defined as the sum of the min(D, n) − r smallest singular values;
sparse autoencoder learning is then performed to obtain the sparsified and vectorized audio data via $J_{sparse}(W,b) = J(W,b) + \beta\sum_{j=1}^{n}\mathrm{KL}(\rho\,\|\,\hat\rho_j)$;
wherein $h_{W,b}(x^{(i)})$ denotes the reconstruction of $x^{(i)}$, $\beta$ denotes the weight of the sparsification penalty term, and $W^{(1)}$ denotes the weights from the visible layer to the first hidden layer.
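The truncated nuclear norm defined above (the sum of the min(D, n) − r smallest singular values, with the r largest "truncated" away) can be computed directly from the singular value decomposition; this is a hedged sketch, with the function name and example matrix as illustrative assumptions.

```python
import numpy as np

# Truncated nuclear norm ||W||_r: drop the r largest singular values and
# sum the remaining min(D, n) - r smallest ones.
def truncated_nuclear_norm(W, r):
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending order
    return s[r:].sum()

W = np.diag([3.0, 2.0, 1.0])
# with r = 1 the largest singular value (3.0) is dropped: 2.0 + 1.0 = 3.0
val = truncated_nuclear_norm(W, 1)
```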
The sensor data and image data are sparsified and vectorized as follows:
the neural network is set to k layers, with the sensor data and image data consisting of N data samples, each a D-dimensional vector, and the data vector of the k-th layer denoted $h^{(k)}$;
a learning threshold $(B_1, \dots, B_K)$ is preset for each layer, with the learning threshold increasing layer by layer;
learning from a visible layer to a first hidden layer to obtain a vector of the first hidden layer;
specifically, learning from a visible layer to a first hidden layer is realized by adopting a Gaussian Bernoulli limiting Boltzmann machine based on a limited number of active neurons in the hidden layer, and the edge distribution of the visible layer v is P (v; theta), so that
Wherein, the truth unit v belongs to RD,h∈{0,1}F
The energy function of v, h is:
where θ ═ { a, b, W, σ } is the model parameter, with a penalty termAnd (3) performing sparseness, wherein lambda is the weight of a penalty term, F represents the number of neurons, then reconstructing a visible layer through a contrast divergence algorithm, and then learning a model theta by using a gradient descent algorithm, wherein the contrast divergence algorithm and the gradient descent learning algorithm belong to common knowledge of technicians in the field, and are not repeated herein.
After θ is obtained, the conditional probability when the visible layer is a given value and the hidden layer neuron is 1 can be obtained based on the above energy function as:
based on the energy function, the conditional probability of the hidden layer as a given value and the visible layer as x can be obtained:
obtaining the vector of the first hidden layer asWherein,{v1,...,vmrepresenting that m training samples exist in the training set, D is the dimension of an input vector, and rho is a sparsity parameter; wijRepresenting the contribution degree of the ith unit of the visible layer to the jth neuron of the first hidden layer; bjThe offset value of the jth neuron is indicated.
Learning from the i-th hidden layer to the (i+1)-th hidden layer is performed according to the obtained vector of the first hidden layer to obtain the vector of the (i+1)-th hidden layer, where 0 < i < k-2;
specifically, the vector of the i-th hidden layer is used as the input of the (i+1)-th hidden layer to obtain the vector of the (i+1)-th hidden layer; that is, the vector of the (k-2)-th hidden layer is obtained after multi-layer learning;
in the embodiment of the invention, the vector of the i-th hidden layer consists of real values; the j-th vector of the i-th hidden layer is $h^{(i)}_j$, where $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer and $b^{(i)}_j$ the offset value of the j-th neuron when learning from the (i-1)-th to the i-th hidden layer;
the reconstructed (i-1)-th hidden-layer vector is z, with the corresponding weights denoting the contribution of the s-th neuron of the i-th hidden layer to the j-th neuron of the (i-1)-th hidden layer, and the corresponding offset the value from the i-th to the (i-1)-th hidden layer when reconstructing the j-th neuron;
given z, the conditional distribution of the (i-1)-th hidden layer is set as $H^{(i-1)} \mid z \sim \mathcal{N}(z, \sigma^2 I)$, where $H^{(i-1)}$ denotes the (i-1)-th hidden layer and $h^{(i-1)}_k$ its k-th neuron.
The loss function $\mathrm{Loss}(x,z) = C(\sigma^2)\,\|h^{(i-1)} - z\|^2 + \lambda_1 \big\|\mathbb{1}\{\mathrm{sigmoid}(h^{(i)}) \ge \xi\}\big\|_*$ is then optimized to obtain the vector of the (i+1)-th hidden layer satisfying the constraint $|z - h^{(i-1)}| \le B_i$.
Wherein $C(\sigma^2)$ denotes a constant, $\lambda_1$ the weight of the penalty term, $\|\cdot\|_*$ the nuclear norm, and $\xi$ the activation value; optimizing Loss(x,z) makes z as close as possible to $h^{(i-1)}$ within the $\sigma^2$ error range. Through multi-layer learning, the vectors of successive (i+1)-th hidden layers are obtained, and the optimized vector of the (k-2)-th hidden layer is:
$h^{(k-2)}_j = \mathrm{sigmoid}\big(\sum_{s} W^{(k-3,k-2)}_{sj} h^{(k-3)}_s + b^{(k-2)}_j\big)$
wherein $h^{(i)}_j$ denotes the j-th vector of the i-th hidden layer and $h^{(k-2)}_j$ the j-th vector of the (k-2)-th hidden layer; $W^{(i-1,i)}_{sj}$ denotes the contribution of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and $b^{(i)}_j$ the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; $W^{(k-3,k-2)}_{sj}$ denotes the contribution of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and $b^{(k-2)}_j$ the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer;
according to the vector of the k-2 hidden layer, learning from the k-2 hidden layer to the k-1 hidden layer is carried out to obtain the vector of the k-1 hidden layer:
specifically, the vector of the k-2 hidden layer is used as the input of the k-1 hidden layer to obtain the vector of the k-1 hidden layer, namely the thinned and vectorized sensor data and image data h(k-1) Represents the jth vector of the (k-1) th hidden layer, wherein,the contribution degree of the s-th neuron of the k-2 hidden layer to the j-th neuron of the k-1 hidden layer,represents the offset value of the jth neuron from the (k-2) th layer to the (k-1) th layer;
the reconstructed k-2 hidden layer vector iszk-2Is the constraint ofk-2-h(k-2)|≤B(k-2)
Then useAnd optimizing the k-2 hidden layer vector obtained by reconstruction, so that the sensor data and the image data features are distributed on a limited number of neurons with the maximum relevance in a large quantity.
Wherein 1 isTW(k-1,k)1, ⊙ denotes the multiplication of elements, diIs the k-2 hidden layer vector h(k-2)With the ith vector of the weight matrix from the k-1 th hidden layer to the kth hidden layerThe euclidean distance of (c).
In the embodiment of the invention, each layer is associated with the previous layer after learning, the learning threshold value of each layer is gradually increased, the number of the activated neurons of a plurality of hidden layers in the middle is the least, and the limited number of neurons with the maximum activated association of the last layer is the most, so that layer-by-layer learning and layer-by-layer correction can be realized, high-dimensional data can be distributed on the limited dimension, the compression of the data is realized, and the high-dimensional data such as a sensor and the like are sparsely represented in the learning process.
The vectorization of the text data specifically comprises: ordering the distinct words in the text data and converting each text into a vector of occurrence frequencies of the corresponding words.
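The text vectorization step above amounts to a bag-of-words frequency vector; the following is a minimal sketch, with the corpus and helper name as illustrative assumptions.

```python
from collections import Counter

# Order the distinct words of a corpus, then map each document to its
# word-frequency vector over that fixed vocabulary.
def frequency_vector(doc, vocabulary):
    counts = Counter(doc.split())
    return [counts[w] for w in vocabulary]

corpus = "the sensor reads the value"
vocab = sorted(set(corpus.split()))      # ['reads', 'sensor', 'the', 'value']
vec = frequency_vector(corpus, vocab)    # [1, 1, 2, 1]
```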
102. Modeling each of the N vectorized mode data to obtain N single-mode data;
In the embodiment of the invention, the N vectorized mode data serve as visible layers, and the visible layers are modeled with restricted Boltzmann machines: a restricted Boltzmann machine models the vectorized audio data, sensor data, and image data, and a replicated-softmax restricted Boltzmann machine models the text data; the first and second hidden layers of the N mode data are modeled with restricted Boltzmann machines.
103. Fusing any two of the obtained single-mode data to obtain dual-mode data;
In the embodiment of the invention, if A is the vectorized audio data, B the vectorized sensor data, C the vectorized image data, and D the vectorized text data, the obtained dual-mode data are AB, AC, AD, BC, BD, and CD.
The fusion of any single-mode data with the text data, such as AD, BD, or CD, can be obtained with a Gaussian-Bernoulli restricted Boltzmann machine together with a replicated-softmax restricted Boltzmann machine.
The result of dual-mode or multi-mode data fusion is represented by a first mode and a second mode. For example, for the dual-mode fusion result (A, B), the first mode is A and the second mode is B; for the three-mode fusion result ((AB)C), the first mode is the already-fused AB and the second mode is C; for the four-mode fusion result ((AB)(CD)), the first mode is the already-fused AB and the second mode is the already-fused CD; and so on.
In the embodiment of the present invention, the dual-mode or multi-mode fusion process may be represented by a Gaussian-Bernoulli restricted Boltzmann machine, with its first mode denoted m and its second mode denoted t. Continuing the example above: in the dual-mode fusion (A, B), the first mode m represents A and the second mode t represents B; in the three-mode fusion ((AB)C), the first mode m represents the already-fused AB and the second mode t represents C; in the four-mode fusion ((AB)(CD)), the first mode m represents the already-fused AB and the second mode t represents the already-fused CD; and so on.
The dual-mode or multi-mode fusion process may also be represented by a {0,1} restricted Boltzmann machine, whose first mode is denoted n and whose second mode is denoted a.
Apart from text data, the fusion of any two other single-mode data can be obtained either through the first mode m and second mode t of the Gaussian-Bernoulli restricted Boltzmann machine, or through the first mode n and second mode a of the {0,1} restricted Boltzmann machine.
104. Fusing any two dual-mode data that contain the same mode data, and fusing any dual-mode data with single-mode data not contained in it, to obtain three-mode data;
specifically, any two dual-mode data that contain the same mode data are combined, referred to as a first combination; any dual-mode data is combined with single-mode data not contained in it, referred to as a second combination;
and the first and second hidden layers of any one of the first and second combinations are modeled with a restricted Boltzmann machine to obtain three-mode data.
Continuing the example above, the first combinations of any two of the 6 dual-mode data that contain the same mode data are: (AB, AC), (AB, AD), (AB, BC), (AB, BD), (AC, AD), (AC, BC), (AC, CD), (AD, BD), (AD, CD), (BC, BD), (BC, CD), (BD, CD);
the second combinations of any one of the 6 dual-mode data with single-mode data not contained in it are: (AB, C), (AB, D), (AC, B), (AC, D), (AD, B), (AD, C), (BC, A), (BC, D), (BD, A), (BD, C), (CD, A), (CD, B);
then the first and second hidden layers of any one of the first and second combinations are modeled with a restricted Boltzmann machine, resulting in the three-mode data (ABC, ABD, ACD, BCD).
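The two enumerations above can be reproduced mechanically; this sketch (variable names are illustrative assumptions) builds the "first combination" as pairs of dual-mode items sharing a mode, and the "second combination" as dual-mode items paired with a disjoint single mode.

```python
from itertools import combinations

modes = "ABCD"
duals = ["".join(c) for c in combinations(modes, 2)]  # AB, AC, AD, BC, BD, CD

# first combination: two dual-mode items that share at least one mode
first = [(x, y) for x, y in combinations(duals, 2) if set(x) & set(y)]
# second combination: a dual-mode item with a single mode it does not contain
second = [(d, m) for d in duals for m in modes if m not in d]

# len(first) == 12 and len(second) == 12, matching the lists in the text;
# the 3 disjoint pairs (AB,CD), (AC,BD), (AD,BC) are excluded from `first`.
```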
When fusing dual-mode or multi-mode data with the Gaussian-Bernoulli restricted Boltzmann machine, the hidden-layer variables are $h = \{h^{(1m)}, h^{(2m)}, h^{(1t)}, h^{(2t)}, h^{(3)}\}$; that is, the number of activated hidden-layer neurons is limited, and the fusion result of the dual-mode or multi-mode data is expressed by the following conditional probabilities:
the conditional probability of the first hidden layer in the m mode, given the visible-layer and second-hidden-layer neurons;
the conditional probability of the second hidden layer in the m mode, given the first-hidden-layer and third-hidden-layer neurons;
the conditional probability of the first hidden layer in the t mode, given the visible layer and the second hidden layer;
the conditional probability of the second hidden layer in the t mode, given the first hidden layer and the third hidden layer;
the conditional probability of the third hidden layer, given the second hidden layer of the t mode and the second hidden layer of the m mode;
in the t mode, given the first hidden layer, the visible layer follows a Gaussian distribution;
in the m mode, given the first hidden layer, the visible layer follows a Gaussian distribution;
when the boltzmann machine is limited to fuse dual-mode or multi-mode data based on {0,1}, the hidden layer variable h is { h ═ h(1n),h(2n),h(1a),h(2a),h(3)And then the specific fusion result of the dual-mode or multi-mode data is expressed as follows:
wherein,
representing the conditional probability of the first hidden layer under the condition of known neurons of the visible layer and the second hidden layer in the n mode;
representing the conditional probability of the first hidden layer under the condition of known neurons of the visible layer and the second hidden layer in the n mode;
representing the conditional probability of a first hidden layer under the condition of known neurons of a visible layer and a second hidden layer in an a mode;
representing the conditional probability of a mode second hidden layer under the condition of a known a mode first hidden layer and a known a mode third hidden layer neuron;
the conditional probability of a third hidden layer under the condition of a known n-mode second hidden layer and a known a-mode second hidden layer neuron is represented;
representing that in a mode, given a first hidden layer, a visible layer obeys a Gaussian distribution;
representing that in an n mode, given a first hidden layer, a visible layer obeys Gaussian distribution;
105. And so on: N-mode data fusion is performed according to the obtained (N-1)-mode data to obtain N-mode data.
In the embodiment of the invention, with N = 4, four-mode modeling is performed on the obtained three-mode data.
Specifically, any one of the obtained three-mode data is combined with the single-mode data not contained in it, referred to as a third combination; any dual-mode data is combined with the dual-mode data disjoint from it, referred to as a fourth combination;
then the first and second hidden layers of any one of the third and fourth combinations are modeled with a restricted Boltzmann machine to obtain four-mode data.
Continuing the example above, the third combinations of any three-mode data with the single-mode data not contained in it are: (ABC, D), (ABD, C), (ACD, B), (BCD, A);
the fourth combinations of any dual-mode data with the dual-mode data disjoint from it are: (AB, CD), (AC, BD), (AD, BC);
according to the third and fourth combinations, the four-mode data ABCD is obtained.
Further, after the multi-mode data are obtained, the invention can also use them for learning. Since learning with multi-mode data belongs to the common knowledge of those skilled in the art, it is only briefly described here: to find the mean-field parameters μ satisfying the condition, the right-hand side of the variational inequality $\log P(v;\theta) \ge \mathcal{L}(Q(h\mid v;\mu))$ should be maximized to obtain the optimal $\mu = \{\mu^{(1m)}, \mu^{(1t)}, \mu^{(2m)}, \mu^{(2t)}, \mu^{(3)}\}$, where μ is related to θ.
When $Q(h\mid v;\mu) = P(h\mid v;\theta)$, $\log P(v;\theta) = \mathcal{L}(Q(h\mid v;\mu))$, so maximizing $\log P(v;\theta)$ translates into maximizing $\mathcal{L}(Q(h\mid v;\mu))$ on the right-hand side of the inequality; when $\mathrm{KL}(Q(h\mid v;\mu)\,\|\,P(h\mid v;\theta)) = 0$, $\mathcal{L}(Q(h\mid v;\mu))$ is largest, so optimizing $\log P(v;\theta)$ can be translated into approximating $P(h\mid v;\theta)$ with $Q(h\mid v;\mu)$.
The embodiment of the invention uses a naive mean-field approximation to approximate $P(h\mid v;\theta)$;
when the dual-mode or multi-mode fusion process is represented by the Gaussian-Bernoulli restricted Boltzmann machine, $Q(h\mid v;\mu)$ is constructed, and the μ obtained by the learning process is as follows:
wherein $Q(h\mid v;\mu)$ takes the form of a product over the hidden-layer neurons, since this simplifies the approximation to the posterior distribution.
The components $\{\mu^{(1m)}, \mu^{(1t)}, \mu^{(2m)}, \mu^{(2t)}, \mu^{(3)}\}$ of μ are then obtained respectively as follows:
wherein $\mu^{(1m)}$ and $\mu^{(2m)}$ correspond to the first and second hidden layers of the m mode, $\mu^{(1t)}$ and $\mu^{(2t)}$ to the first and second hidden layers of the t mode, and $\mu^{(3)}$ to the mixing layer, i.e., the third hidden layer.
When the dual-mode or multi-mode fusion process is instead represented by the {0,1} restricted Boltzmann machine, a corresponding $Q(h\mid v;\mu)$ is constructed;
in this case the components $\{\mu^{(1n)}, \mu^{(1a)}, \mu^{(2n)}, \mu^{(2a)}, \mu^{(3)}\}$ are obtained respectively as follows:
wherein $\mu^{(1n)}$ and $\mu^{(2n)}$ correspond to the first and second hidden layers of the n mode, $\mu^{(1a)}$ and $\mu^{(2a)}$ to the first and second hidden layers of the a mode, and $\mu^{(3)}$ to the mixing layer, i.e., the third hidden layer.
Then:
sampling from $Q(v,h;\theta)$ yields the model expectation $P(v,h\mid\theta)$;
$v_i$, $h_j$ denote samples on $Q(v,h;\theta)$, $N_v$ denotes the number of visible-layer units, and $N_h$ the number of hidden-layer neurons;
the data expectation $E_{P(h\mid v)}$ is then calculated from the obtained μ;
a full sample is calculated according to the conditional distribution of the current neuron relative to the other neurons and the collected partial sample;
specifically, with Q samples, the initial state of the s-th sample is $(v^{0,s}, h^{0,s})$; a Markov chain is obtained after Gibbs sampling, a stationary-state sample is obtained after t = 50 sampling steps, and the Q samples form Q Markov chains.
Updating the connection weight w between each layer of the neural network and the deviant b of hidden layer neuron according to the collected new sample and the obtained mu;
specifically, when θ is set, the energy function P (v, h) is calculated with respect to θ and W(1)、W(i)A, b, obtaining a difference value corresponding to the desired case with respect to P (h | v; theta) and with respect to P (v, h; theta);
and updating the weight by using a gradient descent method according to the obtained expected difference value so as to achieve the aim of learning by using the obtained multimode data.
The first derivatives of the energy function P(v, h) with respect to θ, W(1), W(i), a and b are prior art; the embodiment of the present invention only lists the first derivatives of the energy function with respect to the connection weights W(1) and W(i), and the rest will not be described in detail.
The first derivative of the energy function with respect to the connection weight W(1) is:
W(1) represents the weight matrix between the visible layer and the first hidden layer; P(v; θ) is the marginal probability of v;
the first derivative of the energy function with respect to the connection weight W(i) is:
W(i) represents the weight matrix between the (i-1)-th hidden layer and the i-th hidden layer.
Finally, when Q(h | v; μ) is equal to P(h | v; θ), the right-hand side of the above inequality is maximized and the optimal μ is obtained.
Furthermore, the multimode data model finally obtained by the embodiment of the invention can also infer lost or incomplete data; that is, the first mode data can be inferred from multimode data containing the first mode data together with the corresponding single-mode or multimode data not containing the first mode data;
for example, suppose the first mode data is the C mode data. If the C mode data is missing, it can be inferred by guiding with the dual-mode data P(AC), P(BC) or P(CD) and then sampling correspondingly on the single-mode data P(A), P(B) or P(D); the C mode data can also be inferred by guiding with the three-mode data P(ABC), P(ACD) or P(BCD) and then sampling correspondingly on the dual-mode data P(AB), P(AD) or P(BD); the C mode data can further be inferred by guiding with the four-mode data P(ABCD) and then sampling on the three-mode data P(ABD). The inference when two or more modes are missing is similar and is not repeated herein.
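A minimal sketch of this guided inference, assuming a trained joint RBM over the modes: the known-mode visible units are clamped while the hidden layer and the missing-mode visible units are alternately Gibbs-sampled, so the chain converges to samples of the missing mode conditioned on the known modes. The shapes and names (infer_missing_mode, W_known, W_miss) are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def infer_missing_mode(v_known, W_known, W_miss, b, c_miss, steps=50):
    """Estimate a missing modality by clamping the known modalities.

    v_known is held fixed throughout; only the hidden layer h and the
    missing-mode visible units v_miss are resampled each sweep.
    """
    v_miss = (rng.random((1, W_miss.shape[0])) < 0.5).astype(float)
    for _ in range(steps):
        ph = sigmoid(v_known @ W_known + v_miss @ W_miss + b)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W_miss.T + c_miss)
        v_miss = (rng.random(pv.shape) < pv).astype(float)
    return v_miss

# illustrative demo with random (untrained) weights
W_known = rng.normal(scale=0.1, size=(4, 3))   # known-mode units -> hidden
W_miss = rng.normal(scale=0.1, size=(5, 3))    # missing-mode units -> hidden
v_known = np.ones((1, 4))
v_c = infer_missing_mode(v_known, W_known, W_miss, np.zeros(3), np.zeros(5))
```

With trained weights, v_c would be a sample of the missing C mode conditioned on the clamped modes, in the spirit of the guided sampling described above.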
In the embodiment of the invention, the N mode data are each vectorized; each of the vectorized mode data is then modeled, pairs of the resulting single-mode data are fused to obtain dual-mode data, the dual-mode data are fused to obtain three-mode data, and so on until N-mode data are finally obtained, thereby realizing the fusion of multiple mode data including sensor data. Furthermore, when vectorizing high-dimensional data such as sensor data, the learning threshold of each layer is gradually increased, so that the intermediate hidden layers activate the fewest neurons while the last layer activates the limited set of neurons with the greatest activation relevance; layer-by-layer learning and layer-by-layer correction can thus be realized, and the high-dimensional data are distributed over a limited number of dimensions, achieving data compression, i.e. a sparse representation of high-dimensional data such as sensor data is learned, which facilitates subsequent processing. Furthermore, when the dual-mode or multimode data are fused based on a Gaussian-Bernoulli restricted Boltzmann machine, the number of activated hidden-layer neurons is limited, so that a representation of the high-dimensional data in a limited dimension is obtained and the fusion process is simplified. Further, in the event that one or more modes are missing, the missing or incomplete data can be inferred by bringing the known mode data into the multimode data model.
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (7)

1. A multi-mode data fusion method based on deep learning is characterized by comprising the following steps:
vectorizing the N pieces of mode data respectively; n is a natural number, and the N pieces of mode data comprise sensor data;
modeling each mode data in the N mode data to obtain N single mode data;
fusing any two obtained single-mode data to obtain dual-mode data;
fusing any two dual-mode data containing the same mode data, and fusing any one dual-mode data with single-mode data different from the dual-mode data to obtain three-mode data;
by analogy, performing N-mode data fusion according to the obtained N-1 mode data to obtain N-mode data;
N is 4, the four mode data are audio data, sensor data, image data and text data, and the sparsification and vectorization processing of the audio data specifically includes:
according to the mean activation of the j-th hidden-layer neuron, obtaining the relative-entropy sparsity term, wherein m is the number of audio data and x(i) represents the i-th audio data;
wherein KL(ρ ‖ ρ̂j) denotes the relative entropy between two Bernoulli distributions with means ρ and ρ̂j, ρ is the sparsity parameter, ρ̂j is the activation degree of hidden neuron j, and n is the number of hidden neurons;
setting a truncated nuclear norm;
then sparse self-coding learning is carried out to obtain sparse and vectorized audio data Jsparse(W,b);
wherein hW,b(x(i)) represents the reconstructed x(i), β denotes the weight of the sparsification penalty factor, and W(1) represents the weights from the visible layer to the first hidden layer;
performing sparsification and vectorization processing on the sensor data and the image data, specifically:
setting the neural network to k layers, the sensor data and image data being composed of N data samples, each data sample being a D-dimensional vector, and the data vector of the k-th layer being
presetting learning threshold values (b1, …, bK) for the layers, the learning threshold value of each layer being gradually increased;
learning from a visible layer to a first hidden layer to obtain a vector of the first hidden layer;
learning from the ith hidden layer to the (i + 1) th hidden layer according to the obtained vector of the first hidden layer to obtain the vector of the (i + 1) th hidden layer, wherein 0< i < k-2;
and learning from the (k-2)-th hidden layer to the (k-1)-th hidden layer according to the vector of the (k-2)-th hidden layer to obtain the sparsified and vectorized sensor data and image data.
2. The method of claim 1, wherein, for the vector of the first hidden layer, {v1, ..., vm} indicates that there are m training samples in the training set, ρ is the sparsity parameter, D represents the dimension, Wij represents the contribution degree of the i-th unit of the visible layer to the j-th neuron of the first hidden layer, and bj is the offset value of the j-th neuron.
3. The method of claim 1, wherein the vector of the (k-2)-th hidden layer is
wherein hj(i) represents the j-th vector of the i-th hidden layer, hj(k-2) represents the j-th vector of the (k-2)-th hidden layer, Wsj(i) represents the contribution degree of the s-th neuron of the (i-1)-th hidden layer to the j-th neuron of the i-th hidden layer, and bj(i) represents the offset value of the j-th neuron from the (i-1)-th layer to the i-th layer; Wsj(k-2) represents the contribution degree of the s-th neuron of the (k-3)-th hidden layer to the j-th neuron of the (k-2)-th hidden layer, and bj(k-2) represents the offset value of the j-th neuron from the (k-3)-th layer to the (k-2)-th layer.
4. The method of claim 1, wherein the sparsified and vectorized sensor data and image data are taken as h(k-1), wherein hj(k-1) represents the j-th vector of the (k-1)-th hidden layer, Wsj(k-1) represents the contribution degree of the s-th neuron of the (k-2)-th hidden layer to the j-th neuron of the (k-1)-th hidden layer, and bj(k-1) represents the offset value of the j-th neuron from the (k-2)-th layer to the (k-1)-th layer.
5. The method according to claim 1, wherein any two dual-mode data containing the same mode data are fused, and any one dual-mode data and a single-mode data different from the dual-mode data are fused to obtain three-mode data, specifically:
combining any two dual-mode data containing the same mode data, called a first combination; combining any one dual-mode data with a single-mode data different from it, called a second combination;
and modeling the first hidden layer and the second hidden layer of any one of the first combination and the second combination respectively by using a restricted Boltzmann machine to obtain three-mode data.
6. The method according to claim 1, wherein N-mode data fusion is performed according to the obtained N-1-mode data to obtain N-mode data, specifically:
combining any one of the obtained three-mode data with a single-mode data different from it, called a third combination; combining any one dual-mode data with a dual-mode data different from it, called a fourth combination;
and modeling the first hidden layer and the second hidden layer of any one of the third combination and the fourth combination respectively by using a restricted Boltzmann machine to obtain four-mode data.
7. The method of any of claims 1 to 6, further comprising:
the first mode data is inferred from multi-mode data that includes the first mode data and corresponding single mode or multi-mode data that does not include the first mode data.
CN201611243618.6A 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning Active CN106650817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611243618.6A CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611243618.6A CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Publications (2)

Publication Number Publication Date
CN106650817A CN106650817A (en) 2017-05-10
CN106650817B true CN106650817B (en) 2019-09-20

Family

ID=58835902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611243618.6A Active CN106650817B (en) 2016-12-29 2016-12-29 A kind of multimode data fusion method based on deep learning

Country Status (1)

Country Link
CN (1) CN106650817B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107280697A (en) * 2017-05-15 2017-10-24 北京市计算中心 Lung neoplasm grading determination method and system based on deep learning and data fusion
CN109993291B (en) * 2017-12-30 2020-07-07 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN108768585B (en) * 2018-04-27 2021-03-16 南京邮电大学 Multi-user detection method of uplink signaling-free non-orthogonal multiple access (NOMA) system based on deep learning

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101118610A (en) * 2007-09-10 2008-02-06 东北大学 Sparseness data process modeling approach
CN104766433A (en) * 2015-04-23 2015-07-08 河南理工大学 Electrical fire warning system based on data fusion
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CA2951723A1 (en) * 2014-06-10 2015-12-17 Sightline Innovation Inc. System and method for network based application development and implementation

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN101118610A (en) * 2007-09-10 2008-02-06 东北大学 Sparseness data process modeling approach
CA2951723A1 (en) * 2014-06-10 2015-12-17 Sightline Innovation Inc. System and method for network based application development and implementation
CN104766433A (en) * 2015-04-23 2015-07-08 河南理工大学 Electrical fire warning system based on data fusion
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data

Non-Patent Citations (2)

Title
"A Multi-Source Heterogeneous Data Fusion Method and Its Application Research"; Jiang Jianhua et al.; Electronic Design Engineering; 20160620; Vol. 24, No. 12; full text *
"A Neural-Network-Based Data Fusion Model in Wireless Sensor Networks"; Yu Liyang et al.; Computer Science; 20081215; Vol. 35, No. 12; full text *

Also Published As

Publication number Publication date
CN106650817A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN107516110B (en) Medical question-answer semantic clustering method based on integrated convolutional coding
Sirat et al. Neural trees: a new tool for classification
CN111461322B (en) Deep neural network model compression method
WO2019155064A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
Khodayar et al. Robust deep neural network for wind speed prediction
Hawkins et al. Towards compact neural networks via end-to-end training: A Bayesian tensor approach with automatic rank determination
CN106650817B (en) A kind of multimode data fusion method based on deep learning
Rajan et al. Is cross-attention preferable to self-attention for multi-modal emotion recognition?
CN113064968A (en) Social media emotion analysis method and system based on tensor fusion network
Yi et al. Expanded autoencoder recommendation framework and its application in movie recommendation
Vardhan et al. Deep learning based fea surrogate for sub-sea pressure vessel
CN110991601A (en) Neural network recommendation method based on multi-user behaviors
Sento Image compression with auto-encoder algorithm using deep neural network (DNN)
Raksarikorn et al. Facial expression classification using deep extreme inception networks
Varshitha et al. Natural language processing using convolutional neural network
CN112541541B (en) Lightweight multi-modal emotion analysis method based on multi-element layering depth fusion
Cho et al. Non-contrastive self-supervised learning of utterance-level speech representations
Gkoumas et al. An entanglement-driven fusion neural network for video sentiment analysis
CN112734025B (en) Neural network parameter sparsification method based on fixed base regularization
Rehman et al. Performance evaluation of MLPNN and NB: a comparative study on Car Evaluation Dataset
Sevakula et al. Fuzzy rule reduction using sparse auto-encoders
Shabalov et al. Automatized design application of intelligent information technologies for data mining problems
Ren The advance of generative model and variational autoencoder
CN206961151U (en) A kind of multimode data fusing device based on deep learning
CN112686306B (en) ICD operation classification automatic matching method and system based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant