CN109902714B - Multi-modal medical image retrieval method based on multi-graph regularization depth hashing - Google Patents

Multi-modal medical image retrieval method based on multi-graph regularization depth hashing

Info

Publication number
CN109902714B
CN109902714B
Authority
CN
China
Prior art keywords
modal
data
medical image
depth
matrix
Prior art date
Legal status
Active
Application number
CN201910048281.0A
Other languages
Chinese (zh)
Other versions
CN109902714A (en)
Inventor
Zeng Xianhua (曾宪华)
Guo Jiang (郭姜)
Current Assignee
Guangzhou Dayu Chuangfu Technology Co ltd
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201910048281.0A priority Critical patent/CN109902714B/en
Publication of CN109902714A publication Critical patent/CN109902714A/en
Application granted granted Critical
Publication of CN109902714B publication Critical patent/CN109902714B/en

Abstract

The invention claims a multi-modal medical image retrieval method based on multi-graph regularization depth hashing, which particularly comprises: extracting the features of a multi-modal medical image group simultaneously through a multi-channel depth model; constructing a plurality of graph regularization matrices correspondingly from the features of the multi-modal medical image group; fusing the graph regularization matrices and learning the hash codes of the multi-modal medical image group with a modality-adaptive restricted Boltzmann machine; and measuring the distance between a single-modality data hash code and the multi-modal medical image group hash codes with the Hamming distance, sorting the distances in ascending order, and returning the n groups of multi-modal medical images with the smallest distances to the user, thereby realizing multi-modal medical image retrieval. The method can help doctors quickly find, from the data of one modality, the data of the other modalities in multi-modal medical images such as ultrasound images, diagnostic texts and magnetic resonance images, thereby aiding medical diagnosis, reducing doctors' workload and improving working efficiency.

Description

Multi-modal medical image retrieval method based on multi-graph regularization depth hashing
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a multi-graph regularization depth hashing method for realizing multi-modal medical image retrieval.
Background
Multi-modal medical image retrieval refers to retrieving matched medical images of the same and of different modalities from a multi-modal medical image library according to input data of a certain modality. Existing multi-modal retrieval technology mainly comprises three categories: text-based image retrieval, text-based video retrieval, and image-based text retrieval. Most existing multi-modal retrieval techniques only retrieve data mutually between two modalities; however, the growing volume of multi-modal medical images means the existing techniques cannot meet the requirement of mutual retrieval between arbitrary modalities.
Cross-modal hash retrieval algorithms have become a research hotspot in recent years and have achieved good results. However, some technical drawbacks remain: (1) most existing methods learn the hash code from hand-crafted features of the data; compared with learning the hash code from the internal structural features of the data, hash codes learned from hand-crafted features significantly harm retrieval precision; (2) most existing cross-modal hash algorithms based on deep learning only realize mutual retrieval between two modalities; (3) existing methods do not consider the internal manifold structure of the data when mapping data to hash codes, so the learned hash codes cannot preserve the local manifold structure of the data, which affects retrieval precision.
In view of the above problems, although many scholars have invested much time and effort in research, no multi-modal retrieval method that adapts to the data modalities has yet appeared. By the RBM principle, the hidden-layer result obtained by learning can be used directly as the hash code of the data; and adding manifold-structure preservation keeps the local manifold structure of the data while mapping it to hash codes.
The invention aims to solve the problems that hand-crafted features cannot meet the requirement of high-precision retrieval, that cross-modal hash algorithms mostly retrieve between only two modalities, and that the mapping from data to hash codes does not preserve the local manifold structure of the data. The method extracts depth features with a depth model in place of hand-crafted features, overcoming the inability of hand-crafted features to mine the internal structure of the data and greatly improving the precision of hash retrieval; the adaptive RBM hash algorithm overcomes the limitation of most existing multi-modal retrieval to two modalities and enables mutual retrieval among arbitrary multi-modal data; and manifold-structure preservation keeps the local manifold structure of the data during the mapping from data to hash codes, further improving retrieval precision.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a multi-modal medical image retrieval method based on multi-graph regularization depth hashing that improves retrieval accuracy. The technical scheme of the invention is as follows:
a multi-modal medical image retrieval method based on multi-graph regularization depth hashing comprises the following steps:
step 1, extracting depth features of a multi-modal medical image by using a multi-channel depth model, and standardizing the depth features;
step 2, constructing a plurality of neighbor graph matrices from the features of the different modal data extracted in step 1, so as to maintain the local manifold structure of the data, and constructing a label matrix;
step 3, fusing the constructed neighbor graph matrices and the label matrix into one graph matrix;
step 4, learning with a modality-adaptive restricted Boltzmann machine (RBM) combined with the fused graph matrix to obtain a common hash code of the multi-modal medical image;
step 5, generating a hash code for modal data to be retrieved through a depth channel and a modal adaptive RBM;
and 6, calculating the distance between the data of a certain modality to be retrieved and the multi-modal medical image library by using a Hamming distance measurement method, sequencing in an ascending order, and returning the n groups of nearest multi-modal medical images with the minimum distance to the user.
Further, the step 1 of extracting the depth features of the multi-modal medical image with the multi-channel depth model specifically comprises: first adaptively determining the number of channels of the depth model according to the number of modalities of the data set, then connecting the depth channels to the same classification layer as a whole, training the whole multi-channel depth model, and taking the result of the penultimate layer of each channel after training as the depth feature of the corresponding modal data.
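For illustration, a non-limiting sketch of such a multi-channel depth model follows (PyTorch is an assumed framework; the class names, layer sizes and two-convolution backbone are illustrative choices, not taken from the patent). Each channel's penultimate fully connected layer supplies the depth feature, and all channels feed one shared classification layer.

```python
import torch
import torch.nn as nn

class ModalityChannel(nn.Module):
    """One channel per modality; its penultimate FC layer yields the depth feature."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.feature = nn.Linear(32 * 4 * 4, feat_dim)  # penultimate layer -> depth feature

    def forward(self, x):                                # x: (batch, 1, H, W)
        return self.feature(self.conv(x).flatten(1))

class MultiChannelModel(nn.Module):
    """Channel count adapts to the number of modalities; one shared classification layer."""
    def __init__(self, num_modalities, num_classes, feat_dim=256):
        super().__init__()
        self.channels = nn.ModuleList(ModalityChannel(feat_dim) for _ in range(num_modalities))
        self.classifier = nn.Linear(feat_dim * num_modalities, num_classes)

    def forward(self, xs):                               # xs: one image batch per modality
        feats = [c(x) for c, x in zip(self.channels, xs)]
        return self.classifier(torch.cat(feats, dim=1)), feats  # logits + depth features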
Further, in step 1, after the depth features of the multi-modal medical image group are extracted, they are standardized with the Z-score, so that the standardized features follow a standard normal distribution. The standardization formula is:

$$x' = \frac{x - \mu}{\delta}$$

where x denotes a depth feature, μ the mean of the feature and δ its standard deviation.
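A minimal NumPy sketch of this standardization (the function name is illustrative); the statistics are returned so the same μ and δ can later be reused on query data, as step 5 requires:

```python
import numpy as np

def z_score(features):
    """features: (n_samples, n_dims); returns the standardized copy plus (mu, delta)
    so the training-set statistics can be reused on query data."""
    mu = features.mean(axis=0)
    delta = features.std(axis=0)
    return (features - mu) / delta, mu, delta
```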
Further, in the neighbor-graph construction of step 2, a graph is regarded as a set of n vectors describing the geometry of the data, where each vector corresponds to one data point and has length p, recording the p data points nearest to that point. The distance measure of the m neighbor matrices $G^{(1)}, \dots, G^{(m)}$ adopts the Gaussian heat-kernel distance, the Manhattan distance or the Chebyshev distance, where m denotes the number of modalities of the multi-modal medical image, i denotes the i-th row of a neighbor matrix constructed from a certain modality's data, and j the j-th column; $G^{(t)}$ denotes the neighbor matrix constructed from the data of modality t. After the neighbor graphs are constructed from the depth features, an additional label neighbor graph is constructed from the labels. Further, constructing the additional label neighbor graph from the labels specifically comprises: constructing an n-dimensional matrix from the labels according to the rule

$$S_{ij} = a,$$

where $x_i$ denotes one image group of the multi-modal data, $x_j$ denotes any one of the remaining n−1 image groups, and a denotes the number of labels shared by $x_i$ and $x_j$. After the m+1 matrices are constructed, multi-graph regularization fusion is performed according to

$$\Psi = \sum_{t=1}^{m+1} \mu_t G^{(t)}, \qquad G^{(m+1)} = S,$$

where $\mu_t$ denotes the weight coefficient of each matrix in the fusion and Ψ denotes the fused matrix.
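A sketch of the graph construction and fusion under stated assumptions: the Gaussian heat kernel is used for the feature graphs, labels are multi-hot vectors so that the label-graph entry equals the shared-label count a, and the fusion weights μ are supplied by the caller; the kernel width and all names are illustrative.

```python
import numpy as np

def knn_graph(feats, p, sigma=1.0):
    """feats: (n, d). Gaussian heat-kernel weights on the p nearest neighbors."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)  # squared distances
    G = np.zeros_like(d2)
    for i in range(len(feats)):
        nbrs = np.argsort(d2[i])[1:p + 1]                # skip the point itself
        G[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(G, G.T)                            # symmetrize

def label_graph(labels):
    """labels: (n, c) multi-hot. Entry (i, j) = number of shared labels a."""
    return labels @ labels.T

def fuse(graphs, mu):
    """Weighted sum of the m feature graphs plus the label graph."""
    return sum(w * G for w, G in zip(mu, graphs))
```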
Further, the step 4 of obtaining the common hash code of the multi-modal medical image by learning with the modality-adaptive RBM combined with the fused graph matrix specifically comprises:
the modality-adaptive RBM is improved from the original Gaussian RBM; the number of visible layers is adaptively determined by the number of data modalities, and the visible layers are connected to the same hidden layer; a manifold-preservation matrix is added when the visible and hidden layers are generated through conditional probabilities, so that the generated hidden-layer hash codes retain the local neighbor structure of the data; the energy function of the improved RBM model is as follows:
$$U(f,h;\theta)=\sum_{m=1}^{M}\sum_{r=1}^{N_1}\frac{(f_r^m-a_r^m)^2}{2(\sigma_r^m)^2}-\sum_{s=1}^{N_2}b_s h_s-\sum_{m=1}^{M}\sum_{r=1}^{N_1}\sum_{s=1}^{N_2}\frac{f_r^m}{\sigma_r^m}w_{rs}^m h_s+\frac{\lambda}{2}\sum_{i,j}\Psi_{ij}\sum_{s=1}^{N_2}(h_{is}-h_{js})^2$$

wherein U denotes the energy function of the whole improved RBM model; $f_i^1, f_i^2, \dots, f_i^M$ denote a node of the 1st, 2nd, …, M-th visible layers, and $h_i$ a node of the hidden layer. M denotes the number of modalities, and $N_1, N_2$ denote the number of nodes of each visible layer and of the hidden layer, respectively; θ denotes the parameter set of the RBM, comprising the visible-layer bias a, the hidden-layer bias b, and the connection weights w between the visible layers and the hidden layer. $a_r^m$ denotes the bias of the r-th node of the m-th visible layer, $b_s$ the bias of the s-th node of the hidden layer, and $w_{rs}^m$ the connection weight between the r-th node of the m-th visible layer and the s-th node of the hidden layer; $f_r^m$ denotes the r-th node of the m-th visible layer and $h_s$ the s-th node of the hidden layer; $\sigma_r^m$ denotes the normal-distribution standard deviation of the r-th node of the m-th visible layer, a positive value that is generally not trained and is fixed to 1; λ denotes the regularization weight parameter controlling the smoothness of the hidden-layer representation. $h_{is}$ denotes the s-th node of the hidden code of sample i, and $h_{js}$ that of a neighbor of i under the fused matrix. m indexes the visible layers, r the nodes of a visible layer, and s the nodes of the hidden layer.
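A sketch of evaluating this energy (NumPy; σ fixed to 1 as stated). How the graph term enters, via the fused matrix Ψ over the hidden codes H of all samples, follows our reading of the variable list above and should be treated as an assumption.

```python
import numpy as np

def rbm_energy(fs, h, a, b, w, lam, Psi, H):
    """fs: list of M visible vectors f^m; a, w: per-modality biases / (N1, N2)
    weight matrices; h: hidden vector; Psi: fused graph over the n samples;
    H: (n, N2) hidden codes of all samples; lam: regularization weight lambda."""
    # Gaussian visible terms and visible-hidden interaction (sigma = 1)
    e = sum(((f - am) ** 2).sum() / 2 - f @ wm @ h for f, am, wm in zip(fs, a, w))
    e -= b @ h                                           # hidden bias term
    # graph regularization: Psi-neighbors should receive similar hidden codes
    i, j = np.nonzero(Psi)
    e += lam / 2 * (Psi[i, j] * ((H[i] - H[j]) ** 2).sum(axis=1)).sum()
    return float(e)
```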
Further, the step 5 of generating a hash code for the data of a certain modality to be retrieved through a depth channel and the modality-adaptive RBM specifically comprises: the medical image of the modality to be retrieved is put into the feature-extraction channel of the corresponding modality used during training, and the values of the penultimate layer of the convolutional neural network are taken as its feature values; the obtained feature values are standardized with the mean μ and standard deviation δ of the feature matrix of the same modal data obtained during training; the standardized feature values are then input into the trained RBM as visible-layer input, taking care to use the visible layer of the same modality as in training, while the other visible layers receive zero matrices of the same shape as the feature values; the result of multiplying these inputs by the connection weights is passed through the sign function sign to obtain the hash code of the data to be retrieved.
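A sketch of this query-time hashing (NumPy; function and parameter names are illustrative): the query's feature is standardized with the training statistics of its own modality, fed into the matching visible layer while the remaining visible layers receive zeros, and the sign of the weighted sum gives the hash code.

```python
import numpy as np

def query_hash(feat, modality, w, b, mu, delta):
    """feat: raw depth feature of the query; modality: index of its channel;
    w: list of per-modality (N1, N2) weight matrices; b: hidden bias;
    mu, delta: per-modality training-set feature means / standard deviations."""
    z = (feat - mu[modality]) / delta[modality]        # standardize with training stats
    vis = [z if m == modality else np.zeros_like(z) for m in range(len(w))]
    act = sum(v @ wm for v, wm in zip(vis, w)) + b     # zero layers contribute nothing
    return np.sign(act)                                # code in {-1, +1}; 0 only at exact ties
```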
Further, in step 6, the Hamming distance is used to measure the distance between the data of the modality to be retrieved and the multi-modal medical image library; the distances are sorted in ascending order, and the n groups of nearest multi-modal medical images with the smallest distance are returned to the user. The Hamming distance is:

$$d_H(x, y) = \sum_{r=1}^{k} h_r(x) \oplus h_r(y)$$

where k denotes the length of the hash code, $h_r(x)$ denotes the r-th bit of the hash code of sample x, $h_r(y)$ the r-th bit of the hash code of sample y, and ⊕ denotes the exclusive-or operation.
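A minimal NumPy sketch of this measure for hash codes stored as {0, 1} bit arrays (codes produced with sign() would first be mapped via h > 0):

```python
import numpy as np

def hamming_distance(hx, hy):
    """hx, hy: (k,) uint8 arrays of bits; XOR the bits and count the ones."""
    return int(np.bitwise_xor(hx, hy).sum())
```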
The invention has the following advantages and beneficial effects:
1. The deep-learning-based depth feature extraction structure proposed in step 1 avoids the problem that hand-crafted features cannot mine the internal structure of the data well, and overcomes the low precision of traditional hash methods based on hand-crafted features.
2. The modality-adaptive feature extraction structure in step 1 and the adaptive RBM visible layers in step 4 solve the problem that most existing methods can only retrieve between two modalities, realizing mutual retrieval between any multi-modal medical images.
3. The RBM combined with multi-graph regularization in steps 2, 3 and 4 solves the problem that the local manifold structure of the data cannot be maintained during hash mapping, further improving the retrieval precision.
Drawings
FIG. 1 is a flow chart of the operation of the present invention in providing a preferred embodiment;
FIG. 2 is a diagram of an overall model of mapping a multi-modal medical image group to a hash code in the present invention;
FIG. 3 is a diagram of a modal adaptive RBM model in the present invention;
FIG. 4 is a graphical representation of the experimental setup and actual retrieval results (bimodal data set) in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
According to the different data types in the multi-modal medical image, different types of depth models are selected to extract the depth features, for example a convolutional neural network for image data and a recurrent neural network for text; the choice of feature-extraction model is not a key task of the invention.
Because the numerical scales of the features are inconsistent, and the visible layers of the subsequent modality-adaptive RBM model used for hash-code learning require Gaussian-distributed input, the features are standardized with the Z-score after the depth features of the multi-modal medical image group are extracted, so that they follow a standard normal distribution:

$$x' = \frac{x - \mu}{\delta}$$

where μ and δ denote the mean and standard deviation of a feature, respectively.
A neighbor graph is constructed from the original features so that the local neighbor structure of the data is preserved while the hash codes are learned. For the neighbor-graph construction, the distance measures include the Gaussian heat-kernel distance, the Manhattan distance, the Chebyshev distance and the like; the choice of measure is not a key task of the invention. After the neighbor graphs are constructed from the depth features, an additional neighbor graph is constructed from the labels; since the labels strongly influence the internal manifold structure of the data, the label neighbor graph receives a large weight in the subsequent graph fusion.
The original Gaussian RBM model can map real-valued data to binary data. Exploiting this property, the number of visible layers is adaptively adjusted to the number of modalities of the multi-modal medical image, the visible layers are connected to the same hidden layer, and a graph-regularization term is added to the energy function to constrain similar data to generate similar hash codes, so that the internal manifold structure of the data is preserved and the retrieval precision is improved.
Existing multi-modal hash methods mainly generate hash codes from hand-crafted features and mostly perform mutual retrieval between two modalities only. This greatly reduces retrieval accuracy and does not meet users' requirement of mutual retrieval between data of arbitrary modalities. The multi-graph regularization depth hashing retrieval method of the invention can greatly improve retrieval accuracy and satisfy mutual retrieval between arbitrary data modalities.
The technical scheme of the invention is explained in detail as follows:
the method comprises the following steps: multi-modality medical image group depth feature extraction
In order to obtain the hash code of the multi-modal medical image group, depth features are extracted from the data. First, the number of channels of the depth model is adaptively determined according to the number of modalities of the data set, and the depth channels are then connected to the same classification layer as a whole, as shown in FIG. 3. The whole multi-channel depth model is trained, and after training the result of the penultimate layer of each channel is taken as the depth feature of the corresponding modal data.
Step two: constructing and fusing multi-neighbor graph matrix
In order to obtain the hash code of a data group while keeping the close local relations of the data, neighbor graphs are constructed from the depth features of the different modal data. A graph is regarded as a set of n vectors describing the geometry of the data, where each vector corresponds to one data point and has length p, recording the p data points nearest to that point. The distance measure of the m neighbor matrices $G^{(1)}, \dots, G^{(m)}$ can be chosen among the Gaussian heat-kernel distance, the Manhattan distance, the Chebyshev distance and the like, where m denotes the number of modalities of the multi-modal medical image. An n-dimensional matrix is additionally constructed from the labels according to the rule

$$S_{ij} = a,$$

where a is the number of labels shared by image groups $x_i$ and $x_j$. After the neighbor matrices are constructed, they are fused into one matrix:

$$\Psi = \sum_{t=1}^{m+1} \mu_t G^{(t)}, \qquad G^{(m+1)} = S,$$

where a suitable set of fusion coefficients $\mu_t$ is found by traversal (grid search).
Step three: depth feature normalization
We must normalize the resulting depth features for two reasons:
1. different characteristics often have different dimensions, which affects the results of subsequent data analysis and hash code learning, and data standardization processing is required to eliminate the dimension influence between indexes.
2. The Gaussian-Bernoulli restricted Boltzmann machine assumes that its input follows a Gaussian distribution, whereas the data in the original feature matrix do not conform to a Gaussian distribution.
For these two reasons, the obtained original feature matrices are standardized with the Z-score, after which the features follow a standard normal distribution:

$$x' = \frac{x - \mu}{\delta}$$

where μ and δ denote the mean and standard deviation of a feature, respectively.
Step four: hash code learning
The common hash code of the multi-modal medical images is obtained by learning with the modality-adaptive RBM combined with the fused graph matrix. The modality-adaptive RBM is improved from the original Gaussian RBM: the number of visible layers is adaptively determined by the number of data modalities, and the visible layers are connected to the same hidden layer; a manifold-preservation matrix is added when the visible and hidden layers are generated through conditional probabilities, so that the generated hidden-layer hash codes retain the local neighbor structure of the data. After standardization, the depth features are input into the visible layers by modality, and the modality-adaptive RBM is trained with a multi-input contrastive divergence algorithm combined with the fused matrix. After training, the shared hash code of the multi-modal medical data is obtained.
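A heavily simplified CD-1 sketch of this training step (NumPy). How the fused matrix Ψ enters the gradient, as a graph-Laplacian penalty on the hidden probabilities, is our assumption about the multi-input contrastive divergence described above; the learning rate and λ are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(fs, w, a, b, Psi, lam=0.01, lr=1e-3):
    """fs: list of M (n, N1) standardized feature matrices, one per modality;
    w: list of (N1, N2) weights; a: list of (N1,) visible biases; b: (N2,)."""
    n = fs[0].shape[0]
    # positive phase: hidden probabilities driven by all visible layers jointly
    h0 = sigmoid(sum(f @ wm for f, wm in zip(fs, w)) + b)
    hs = (np.random.rand(*h0.shape) < h0).astype(float)   # sampled hidden states
    # negative phase: Gaussian visible means (sigma = 1), then hidden again
    fr = [hs @ wm.T + am for wm, am in zip(w, a)]
    h1 = sigmoid(sum(f @ wm for f, wm in zip(fr, w)) + b)
    # graph penalty: gradient of (lam/2) * sum_ij Psi_ij ||h_i - h_j||^2,
    # chained through the sigmoid
    L = np.diag(Psi.sum(axis=1)) - Psi                    # graph Laplacian
    grad_h = lam * (L @ h0) * h0 * (1.0 - h0)
    for m in range(len(w)):
        w[m] += lr * ((fs[m].T @ h0 - fr[m].T @ h1) / n - fs[m].T @ grad_h / n)
        a[m] += lr * (fs[m] - fr[m]).mean(axis=0)
    b += lr * ((h0 - h1).mean(axis=0) - grad_h.mean(axis=0))
```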
Step five: returning n (taking n to be 3 in the invention) groups of neighbor graphs of the image to be retrieved
An image of one modality is taken from the multi-modal medical test set; its depth feature is obtained with the trained model, and from it its hash code. The distance between the obtained test-image hash code and the shared hash codes of the multi-modal data is then measured: the Hamming distances between samples are computed and sorted in ascending order, the n hash codes with the smallest distance are taken, and the n image groups corresponding to them are returned to the user.
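A sketch of this ranking step (NumPy; names illustrative), returning the indices of the n nearest groups:

```python
import numpy as np

def retrieve(query_code, library_codes, n=3):
    """query_code: (k,) bits in {0, 1}; library_codes: (N, k); returns the
    indices of the n library groups with the smallest Hamming distance."""
    dists = np.bitwise_xor(library_codes, query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")[:n]
```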
Specific experimental verification
To verify the validity of the method, experiments are conducted on a bimodal brain medical data set. The data set contains 323 groups of bimodal brain images in 10 classes; one modality is MRI data and the other is PET. 290 groups are taken as the training set and the remaining 32 groups as the test set.
The specific experimental steps are as follows:
1. and extracting image depth features by using a convolutional neural network. The number of nodes of the last but one full connection layer of the two convolutional networks is set to be 256, the same classification layer is shared, bimodal training data are input into the two-channel deep network, and the classification task is utilized to train the whole network. After the training is finished, 290 groups of training data and 32 groups of test data are put into the depth channels of the corresponding modes, at this time, the numerical value of the penultimate layer is obtained as the depth feature of the data, and two (256 × 290) feature matrixes and two (256 × 32) feature matrixes are respectively obtained.
2. Two neighbor matrices (290 × 290) are constructed from the obtained training-set depth feature matrices, and a label matrix is then constructed from the labels of the training data set. The three matrices are fused with the fusion rule above to obtain the fused matrix.
3. The resulting 4 depth feature matrices are standardized according to the standardization rule above. Note that the mean and standard deviation used to standardize a test-set feature matrix are those of the training-set feature matrix of the corresponding modality.
4. The standardized depth feature matrices are used as the input of a double-visible-layer restricted Boltzmann machine (RBM), which is trained in combination with the fused matrix. After training, the training-set and test-set feature matrices are input, multiplied by the connection weights, and passed through the sign function (sign) to obtain the hash codes of the training set and the test set.
5. Two MRI images labeled Glioma and Dementia are taken as images to be retrieved. The Hamming distances between their hash codes and the training-set hash codes are computed and ranked, the first 3 hash codes with the smallest distance are taken, and the 3 corresponding bimodal image groups are returned; the results are shown in FIG. 4.
6. All MRI-modality test images are taken as images to be retrieved, and the mean average precision (mAP) is computed over the first 10 returned results, giving 0.6765.
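For reference, a sketch of computing such an mAP over the first 10 returns (NumPy; taking relevance as "same class label" is our assumption, and this is one common AP@k variant):

```python
import numpy as np

def average_precision_at_k(relevant, k=10):
    """relevant: booleans over the ranked returns of one query; mean precision
    at each relevant rank within the top k, divided by relevant hits in top k."""
    rel = np.asarray(relevant[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at * rel).sum() / rel.sum())

def mean_ap(all_relevant, k=10):
    """Mean of AP@k over all queries."""
    return float(np.mean([average_precision_at_k(r, k) for r in all_relevant]))
```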
The experimental results can prove the feasibility of the method.
In summary, the innovation and advantages of the invention are as follows:
1. The problem that hand-crafted features cannot mine the internal structure of the data well is avoided, and the low precision of traditional hash methods based on hand-crafted features is overcome.
2. The method solves the problem that most existing methods can only search between two modes, and realizes mutual search between any multi-mode medical images.
3. The problem that the local manifold structure of the data cannot be maintained during hash mapping is solved, further improving the retrieval precision.
According to the depth hash multi-modal medical image retrieval method based on multi-graph regularization, the traditional manual features are replaced by the depth features, and the hash retrieval precision can be improved.
According to the depth hash multi-modal medical image retrieval method based on multi-graph regularization provided by the invention, manifold preservation keeps the local manifold structure of the data while the hash codes are obtained, which can further improve the retrieval precision.
The depth hash multi-modal medical image retrieval method based on multi-graph regularization provided by the invention realizes mutual retrieval of data in any mode, and greatly meets the user requirements.
The depth hash multi-modal medical image retrieval method based on multi-graph regularization provided by the invention has clear steps and strong pertinence.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (4)

1. A multi-modal medical image retrieval method based on multi-graph regularization depth hashing is characterized by comprising the following steps:
step 1, extracting depth features of a multi-modal medical image by using a multi-channel depth model, and standardizing the depth features;
step 2, constructing a plurality of neighbor graph matrices from the features of the different modal data extracted in step 1, so as to maintain the local manifold structure of the data, and constructing a label matrix;
step 3, fusing the constructed neighbor graph matrices and the label matrix into one graph matrix;
step 4, learning with a modality-adaptive restricted Boltzmann machine (RBM) combined with the fused graph matrix to obtain a common hash code of the multi-modal medical image;
Step 5, generating a hash code for modal data to be retrieved through a depth channel and a modal adaptive RBM;
step 6, calculating the distance between the data of a certain modality to be retrieved and the multi-modal medical image library by using a Hamming distance measurement method, sequencing the distance in an ascending order, and returning n groups of nearest multi-modal medical images with the minimum distance to the user;
the step 1 of extracting the depth features of the multi-modal medical image with the multi-channel depth model specifically comprises: first adaptively determining the number of channels of the depth model according to the number of modalities of the data set, connecting the depth channels to the same classification layer as a whole, training the whole multi-channel depth model, and taking the result of the penultimate layer of each channel after training as the depth feature of the corresponding modal data;
in the neighbor-graph construction of step 2, a graph is regarded as a set of n vectors describing the geometry of the data, wherein each vector corresponds to one data point and has length ρ, recording the ρ data points nearest to that point; the distance measure of the neighbor matrices $G^{(1)}, \dots, G^{(m)}$ adopts the Gaussian heat-kernel distance, the Manhattan distance or the Chebyshev distance, wherein m denotes the number of modalities of the multi-modal medical image, i denotes the i-th row of a neighbor matrix constructed from a certain modality's data, and j the j-th column; $G^{(t)}$ denotes the neighbor matrix constructed from the data of modality t; after the neighbor graphs are constructed from the depth features, an additional label neighbor graph is constructed from the labels;
the step 3 of constructing the additional label neighbor graph from the labels specifically comprises: constructing an n-dimensional matrix from the labels according to the rule

$$S_{ij} = a,$$

wherein $x_i$ denotes an image group of the multi-modal data, $x_j$ denotes any one of the remaining n−1 image groups, and a denotes the number of labels shared by $x_i$ and $x_j$; after the m+1 matrices are constructed, multi-graph regularization fusion is performed according to

$$\Psi = \sum_{t=1}^{m+1} \mu_t G^{(t)}, \qquad G^{(m+1)} = S,$$

wherein $\mu_t$ denotes the weight coefficient of each matrix in the fusion and Ψ denotes the fused matrix;
the step 4 of obtaining the common hash code of the multi-modal medical image by learning with the modality-adaptive RBM combined with the fused graph matrix specifically comprises:
the modality-adaptive RBM is improved from the original Gaussian RBM; the number of visible layers is adaptively determined by the number of data modalities, and the visible layers are connected to the same hidden layer; a manifold-preservation matrix is added when the visible and hidden layers are generated through conditional probabilities, so that the generated hidden-layer hash codes retain the local neighbor structure of the data; the energy function of the improved RBM model is:

$$U(f,h;\theta)=\sum_{m=1}^{M}\sum_{r=1}^{N_1}\frac{(f_r^m-a_r^m)^2}{2(\sigma_r^m)^2}-\sum_{s=1}^{N_2}b_s h_s-\sum_{m=1}^{M}\sum_{r=1}^{N_1}\sum_{s=1}^{N_2}\frac{f_r^m}{\sigma_r^m}w_{rs}^m h_s+\frac{\lambda}{2}\sum_{i,j}\Psi_{ij}\sum_{s=1}^{N_2}(h_{is}-h_{js})^2$$

wherein U denotes the energy function of the whole improved RBM model; $f_i^1, f_i^2, \dots, f_i^M$ denote a node of the 1st, 2nd, …, M-th visible layers, and $h_i$ a node of the hidden layer; M denotes the number of modalities, and $N_1, N_2$ denote the number of nodes of each visible layer and of the hidden layer, respectively; θ denotes the parameter set of the RBM, comprising the visible-layer bias a, the hidden-layer bias b and the connection weights w between the visible layers and the hidden layer; $a_r^m$ denotes the bias of the r-th node of the m-th visible layer, $b_s$ the bias of the s-th node of the hidden layer, and $w_{rs}^m$ the connection weight between the r-th node of the m-th visible layer and the s-th node of the hidden layer; $f_r^m$ denotes the r-th node of the m-th visible layer and $h_s$ the s-th node of the hidden layer; $\sigma_r^m$ denotes the normal-distribution standard deviation of the r-th node of the m-th visible layer, a positive value that is generally not trained and is fixed to 1; λ denotes the regularization weight parameter controlling the smoothness of the hidden-layer representation; $h_{is}$ denotes the s-th node of the hidden code of sample i, and $h_{js}$ that of a neighbor of i under the fused matrix; m indexes the visible layers, r the nodes of a visible layer, and s the nodes of the hidden layer.
2. The multi-modal medical image retrieval method based on multi-graph regularization depth hashing according to claim 1, wherein in step 1, after the depth features of the multi-modal medical image group are extracted, they are standardized with the Z-score so that the standardized features follow a standard normal distribution, the standardization formula being:

$$x' = \frac{x - \mu}{\delta}$$

where x denotes a depth feature, μ the mean of the feature and δ its standard deviation.
3. The multi-modal medical image retrieval method based on multi-graph regularization depth hashing according to claim 1, wherein the step 5 of generating a hash code for the data of a certain modality to be retrieved through a depth channel and the modality-adaptive RBM specifically comprises: putting the medical image of the modality to be retrieved into the feature-extraction channel of the corresponding modality used during training, and taking the values of the penultimate layer of the convolutional neural network as its feature values; standardizing the obtained feature values with the mean μ and standard deviation δ of the feature matrix of the same modal data obtained during training; and inputting the standardized feature values into the trained RBM as visible-layer input, taking care to use the visible layer of the same modality as in training, while the other visible layers receive zero matrices of the same shape as the feature values; the result of multiplying these inputs by the connection weights is passed through the sign function sign to obtain the hash code of the data to be retrieved.
4. The multi-modal medical image retrieval method based on multi-graph regularization depth hashing according to claim 3, wherein step 6 uses the Hamming distance to measure the distance between the data of the modality to be retrieved and the multi-modal medical image library, sorts the distances in ascending order, and returns the n groups of nearest multi-modal medical images with the smallest distance to the user, the Hamming distance being:

$$d_H(x, y) = \sum_{r=1}^{k} h_r(x) \oplus h_r(y)$$

where k denotes the length of the hash code, $h_r(x)$ denotes the r-th bit of the hash code of sample x, $h_r(y)$ the r-th bit of the hash code of sample y, and ⊕ denotes the exclusive-or operation.
CN201910048281.0A 2019-01-18 2019-01-18 Multi-modal medical image retrieval method based on multi-graph regularization depth hashing Active CN109902714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910048281.0A CN109902714B (en) 2019-01-18 2019-01-18 Multi-modal medical image retrieval method based on multi-graph regularization depth hashing


Publications (2)

Publication Number Publication Date
CN109902714A CN109902714A (en) 2019-06-18
CN109902714B (en) 2022-05-03

Family

ID=66943792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910048281.0A Active CN109902714B (en) 2019-01-18 2019-01-18 Multi-modal medical image retrieval method based on multi-graph regularization depth hashing

Country Status (1)

Country Link
CN (1) CN109902714B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444277B (en) * 2019-07-19 2023-03-28 重庆邮电大学 Multi-mode brain MRI image bidirectional conversion method based on multi-generation and multi-confrontation
CN110990596B (en) * 2019-12-04 2020-09-25 山东师范大学 Multi-mode hash retrieval method and system based on self-adaptive quantization
CN111428072A (en) * 2020-03-31 2020-07-17 南方科技大学 Ophthalmologic multimodal image retrieval method, apparatus, server and storage medium
CN111666442B (en) * 2020-06-02 2023-04-18 腾讯科技(深圳)有限公司 Image retrieval method and device and computer equipment
CN111881979B (en) * 2020-07-28 2022-05-13 复旦大学 Multi-modal data annotation device and computer-readable storage medium containing program
CN112800260B (en) * 2021-04-09 2021-08-20 北京邮电大学 Multi-label image retrieval method and device based on deep hash energy model
CN113658683A (en) * 2021-08-05 2021-11-16 重庆金山医疗技术研究院有限公司 Disease diagnosis system and data recommendation method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968626B (en) * 2012-12-19 2016-04-06 中国电子科技集团公司第三研究所 A kind of method of facial image coupling
US20150178786A1 (en) * 2012-12-25 2015-06-25 Catharina A.J. Claessens Pictollage: Image-Based Contextual Advertising Through Programmatically Composed Collages
CN104484666A (en) * 2014-12-17 2015-04-01 中山大学 Advanced image semantic parsing method based on human-computer interaction
CN104899253B (en) * 2015-05-13 2018-06-26 复旦大学 Towards the society image across modality images-label degree of correlation learning method
CN105701800B (en) * 2015-12-31 2019-01-25 上海交通大学 Multi-mode image matching process
CN106484782B (en) * 2016-09-18 2019-11-12 重庆邮电大学 A kind of large-scale medical image retrieval based on the study of multicore Hash
CN106777318B (en) * 2017-01-05 2019-12-10 西安电子科技大学 Matrix decomposition cross-modal Hash retrieval method based on collaborative training
CN109166615B (en) * 2018-07-11 2021-09-10 重庆邮电大学 Medical CT image storage and retrieval method based on random forest hash

Also Published As

Publication number Publication date
CN109902714A (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN109902714B (en) Multi-modal medical image retrieval method based on multi-graph regularization depth hashing
CN109918532B (en) Image retrieval method, device, equipment and computer readable storage medium
CN111353076B (en) Method for training cross-modal retrieval model, cross-modal retrieval method and related device
JP7360497B2 (en) Cross-modal feature extraction method, extraction device, and program
CN113535984B (en) Knowledge graph relation prediction method and device based on attention mechanism
CN112015868B (en) Question-answering method based on knowledge graph completion
CN110647904B (en) Cross-modal retrieval method and system based on unmarked data migration
CN107832458B (en) Character-level text classification method based on nested deep network
CN112199532B (en) Zero sample image retrieval method and device based on Hash coding and graph attention machine mechanism
CN110097096B (en) Text classification method based on TF-IDF matrix and capsule network
CN112949740B (en) Small sample image classification method based on multilevel measurement
Chen et al. Deep quadruple-based hashing for remote sensing image-sound retrieval
CN113343125B (en) Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
CN110941734A (en) Depth unsupervised image retrieval method based on sparse graph structure
CN112487822A (en) Cross-modal retrieval method based on deep learning
CN114661933A (en) Cross-modal retrieval method based on fetal congenital heart disease ultrasonic image-diagnosis report
Peng et al. Sequential diagnosis prediction with transformer and ontological representation
WO2024001104A1 (en) Image-text data mutual-retrieval method and apparatus, and device and readable storage medium
CN113537304A (en) Cross-modal semantic clustering method based on bidirectional CNN
CN111026887B (en) Cross-media retrieval method and system
CN116129141A (en) Medical data processing method, apparatus, device, medium and computer program product
CN112182275A (en) Trademark approximate retrieval system and method based on multi-dimensional feature fusion
Su et al. Semi-supervised knowledge distillation for cross-modal hashing
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN112559741B (en) Nuclear power equipment defect record text classification method, system, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240123

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Guangzhou Dayu Chuangfu Technology Co.,Ltd.

Country or region after: China

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

Country or region before: China