CN117036894A - Multi-modal data classification method and device based on deep learning and computer equipment


Info

Publication number
CN117036894A
Authority
CN
China
Prior art keywords
features
data
image
training
feature
Prior art date
Legal status
Granted
Application number
CN202311297044.0A
Other languages
Chinese (zh)
Other versions
CN117036894B (en)
Inventor
Zhao Botao
Zhang Yu
Jiang Tianzi
Liu Shengfeng
Cheng Luqi
Qian Haotian
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311297044.0A
Publication of CN117036894A
Application granted
Publication of CN117036894B
Legal status: Active
Anticipated expiration


Classifications

    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V 10/811 Fusion of classification results, the classifiers operating on different input data, e.g. multi-modal recognition
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/74 Image or video pattern matching; proximity measures in feature spaces
    • G06V 10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Recognition or understanding using neural networks


Abstract

The application relates to a multi-modal data classification method and device based on deep learning, and to computer equipment. The method comprises the following steps: acquiring data to be classified, wherein the data to be classified at least comprises a medical image and demographic information; determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model; and inputting the data features into a trained classifier to determine a classification result. Because the image features are extracted by the feature extraction model, the image features and the demographic information features are considered jointly, and the classification result is determined by the classifier, the data to be classified can be classified even when they comprise multi-modal data of multiple data types, which effectively improves the accuracy of the classification result.

Description

Multi-modal data classification method and device based on deep learning and computer equipment
Technical Field
The present application relates to the field of data classification technologies, and in particular to a multi-modal data classification method and apparatus based on deep learning, and a computer device.
Background
In recent years, deep learning, one of the fastest-growing artificial intelligence technologies, has been widely used in the field of data classification, for example in medical data classification.
In the conventional technology, acquired medical images are generally classified based on deep learning alone. However, owing to the diversity of diseases and the complexity of clinical practice, demographic information may also influence the classification result; if only the classification of medical images is considered, the accuracy of the classification result is low.
Based on this, in view of the limitations of existing clinical image feature extraction and the diversity of clinically acquired data, there is a need in the art for a method capable of classifying multi-modal data from multiple data source types.
Disclosure of Invention
In view of the above technical problems, the present application provides a deep-learning-based multi-modal data classification method, apparatus and computer device capable of classifying multi-modal data from multiple data source types.
In a first aspect, the present application provides a multi-modal data classification method based on deep learning. The method comprises the following steps:
acquiring data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model;
and inputting the data features into a trained classifier, and determining a classification result.
In one embodiment, the feature extraction model is trained by:
acquiring a training set, wherein the training set comprises training medical images and corresponding feature labels;
and training a first initial model and a second initial model based on the training set, and taking the trained first initial model as the feature extraction model.
In one embodiment, the training the first initial model based on the training set to obtain the feature extraction model includes:
inputting the training medical image into a second initial model to obtain auxiliary image features;
classifying the auxiliary image features based on the types of the training medical images, and determining feature sets of different types;
and training a first initial model based on the training set and the feature set to obtain the feature extraction model.
In one embodiment, classifying the auxiliary image features based on the types of the medical images and determining the feature sets of different types comprises:
determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the medical images and the similarity, and determining feature sets of different types.
In one embodiment, after training the first initial model based on the training set and the feature set, the method comprises:
updating the second initial model based on parameters of the first initial model.
In one embodiment, the training set further includes a lesion segmentation label, and training the first initial model based on the training set to obtain the feature extraction model includes:
inputting the medical image into the first initial model to obtain a prediction feature;
inputting the predicted features into a lesion segmentation model to determine a predicted lesion segmentation result;
determining a first loss function based on the predicted lesion segmentation result and the lesion segmentation label;
determining a second loss function based on the feature label and the predicted features;
and updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
In one embodiment, the medical image comprises images of at least two image modality types, the image features comprise image features corresponding to at least two image modality types, and the determining the data features based on the data to be classified comprises:
judging whether the image modality type of the medical image matches a preset modality type;
if matched, inputting the medical image into the feature extraction model to obtain the image features;
if not matched, determining the image features corresponding to the image modality type to be zero-set features.
In one embodiment, the inputting the data features into a trained classifier, and determining the classification result includes:
carrying out standardization processing on the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features;
and performing dimension stitching on the demographic information features and the standardized image features to obtain stitching features, inputting the stitching features into a trained classifier, and determining classification results.
In one embodiment, the demographic information includes gender information and age information, and the normalizing the data features to obtain normalized features includes:
performing binarization processing on the gender information, and performing min-max normalization processing on the age information.
In a second aspect, the application further provides a multi-modal data classification device based on deep learning. The device comprises:
The data acquisition module is used for acquiring data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
the feature extraction module is used for determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical images into a feature extraction model;
and the classification module is used for inputting the data features into the trained classifier and determining classification results.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of any of the deep learning based multi-modal data classification methods of the first aspect described above when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the deep learning based multi-modal data classification methods of the first aspect described above.
According to the multi-modal data classification method, device and computer equipment based on deep learning, data to be classified comprising at least medical images and demographic information are obtained; data features are determined based on the data to be classified, the data features comprising image features and demographic information features, the image features being obtained by inputting the medical images into a feature extraction model; and the data features are input into a trained classifier to determine a classification result. Because the image features are extracted by the feature extraction model, the image features and the demographic information features are considered jointly, and the classification result is determined by the classifier, the data to be classified can be classified even when they comprise multi-modal data of multiple data types, and the accuracy of the classification result is effectively improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is an application environment diagram of a multi-modal data classification method based on deep learning in one embodiment;
FIG. 2 is a flow diagram of a multi-modal data classification method based on deep learning in one embodiment;
FIG. 3 is a schematic diagram of a feature extraction model training process in one embodiment;
FIG. 4 is a schematic diagram of a multi-modal data classification flow based on deep learning in one embodiment;
FIG. 5 is a schematic diagram of a test evaluation result of a multi-modal data classification method based on deep learning in one embodiment;
FIG. 6 is a block diagram of a multi-modal data classification device based on deep learning in one embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application are not intended to be limiting in number, but may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used herein, are intended to encompass non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association relationship between associated objects, meaning that there may be three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. Typically, the character "/" indicates that the associated objects are in an "or" relationship. The terms "first," "second," "third," and the like, as used in this disclosure, merely distinguish similar objects and do not represent a particular ordering of objects.
The terms "module," "unit," and the like are used below as a combination of software and/or hardware that can perform a predetermined function. While the means described in the following embodiments are preferably implemented in hardware, implementations of software, or a combination of software and hardware, are also possible and contemplated.
The multi-modal data classification method based on deep learning provided by the embodiment of the application can be applied to the application environment shown in fig. 1, wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process, and may be integrated on the server 104 or located on a cloud or other network server. The terminal 102 obtains data to be classified, wherein the data to be classified at least comprises medical images and demographic information; the server 104 determines data features based on the data to be classified and inputs the data features into a trained classifier to determine a classification result, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical images into a feature extraction model. It will be appreciated that one or more of the steps described above may also be performed by the terminal 102 or the server 104 alone, and the application is not limited in this regard. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices; the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices and the like, and the portable wearable devices may be smart watches, smart bracelets, headsets and the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a multi-modal data classification method based on deep learning is provided, and the application of the method to the application scenario in fig. 1 is taken as an example for explanation. The method includes the following steps:
S201: And obtaining data to be classified, wherein the data to be classified at least comprises medical images and demographic information.
In the embodiment of the application, the medical image comprises three-dimensional medical image data, and may include, for example, a T2-enhanced magnetic resonance image of a tumor, a T1-weighted image of a heart disease and the like. Demographic information may include age, gender and the like. It will be appreciated that the medical image corresponds to the demographic information; for example, the medical image may include medical image data of person A, and the demographic information may include the gender information and age information of person A. Acquiring the data to be classified may include acquiring a medical image by a medical image generating device and acquiring the demographic information corresponding to the medical image, wherein the medical image generating device may include a CT device, a magnetic resonance device, a digital radiography device and the like.
In some embodiments, acquiring the data to be classified further comprises preprocessing the data to be classified, the preprocessing including preprocessing the medical image and/or preprocessing the demographic information. Preprocessing the medical image comprises gray-level correction and size normalization of the medical image based on preset algorithms, wherein the preset algorithm may include the N4 bias field correction algorithm, and the size normalization may include interpolation, size cropping and the like. Preprocessing the demographic information comprises replacing and completing missing information in the demographic information: if the missing information comprises age information, the age information is set to the age mean of the history type corresponding to the data to be classified; if the missing information comprises gender information, the gender information is set to a random gender determined based on the gender ratio of the history type corresponding to the data to be classified. In other embodiments, the preprocessing of the data to be classified further includes anonymization, specifically removing components containing personal information, such as name information, from all the data.
In other embodiments, the preprocessing of the medical image further comprises preprocessing the medical image based on a trained pre-segmentation model. In some specific embodiments, if the data to be classified includes brain tumor data and the brain tumor disease only occurs in a cerebellum or brain stem region, a pre-trained cerebellum-brain stem pre-segmentation model may be used to extract an image of the target region of interest, so as to obtain a medical image to be classified.
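The preprocessing described above can be illustrated with a short sketch. The following is a minimal example, assuming SimpleITK for the N4 bias field correction and resampling; the target size, the Otsu mask generation and all helper names are illustrative assumptions rather than the exact pipeline of the application:

```python
import random

import SimpleITK as sitk

TARGET_SIZE = [128, 128, 128]  # assumed fixed image size

def preprocess_image(path: str) -> sitk.Image:
    image = sitk.ReadImage(path, sitk.sitkFloat32)
    # Gray-level correction with the N4 bias field correction algorithm.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    image = sitk.N4BiasFieldCorrection(image, mask)
    # Interpolate to a fixed size (size normalization).
    spacing = [sp * sz / tgt for sp, sz, tgt
               in zip(image.GetSpacing(), image.GetSize(), TARGET_SIZE)]
    return sitk.Resample(image, TARGET_SIZE, sitk.Transform(), sitk.sitkLinear,
                         image.GetOrigin(), spacing, image.GetDirection(),
                         0.0, sitk.sitkFloat32)

def impute_demographics(age, gender, age_mean: float, male_ratio: float):
    # Missing age -> historical age mean; missing gender -> random gender
    # drawn according to the historical gender ratio (assumed encoding).
    if age is None:
        age = age_mean
    if gender is None:
        gender = "M" if random.random() < male_ratio else "F"
    return age, gender
```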
S203: determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model.
In the embodiment of the application, the image features can comprise feature vectors extracted from the medical image based on a feature extraction model, wherein the feature extraction model is used for extracting features in the medical image and representing the image features by the output feature vectors. In some embodiments, the feature extraction model may be used to extract image features of a three-dimensional medical image. The demographic information features may include digitized features corresponding to demographic information, and may also include demographic information features obtained by normalizing demographic information, where the digitized features may include age digital features, gender information features based on digital representations, and the like.
In the conventional art, features are generally extracted from medical images either by manually set extraction rules or in a deep-learning-based manner. However, manually set extraction rules lack generality and extract features poorly, while deep-learning-based extraction typically requires a large amount of training data; in practical application scenarios, especially for rare samples, it is often difficult to obtain a large amount of training data. Based on this, in some embodiments, the feature extraction model is trained by:
S2031: A training set is obtained, the training set comprising training medical images and corresponding feature labels.
S2033: and training a first initial model based on the training set to obtain the feature extraction model.
In embodiments of the application, the training medical image may comprise a historical medical image with a determined classification result, for example a magnetic resonance medical image of a determined cancer type. The corresponding feature labels may include classification result labels corresponding to the training medical images, such as coronary heart disease with coronary artery stenosis.
In an embodiment of the present application, the first initial model may include a query encoder. In some specific embodiments, the first initial model may use a ResNet50 convolutional neural network as the network backbone, but differs from a conventional ResNet50 in that the vector after the convolutional layers is first average-pooled and then compressed by a linear layer into a set target extraction dimension, yielding the extracted intermediate feature vector. The intermediate feature vector is then passed through a further linear layer that compresses the dimension to 1/2 of its input. With a first initial model of this structure, the corresponding feature vector can be extracted after the dimension of the three-dimensional medical image is reduced, which suits application scenarios in which the medical image is three-dimensional. In other embodiments, the first initial model may select other networks as the network backbone depending on the classification task, which the present application does not specifically limit. It can be appreciated that after the first initial model is trained based on the training set, the trained first initial model serves as the feature extraction model.
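As an illustration of the structure described above, the following is a minimal sketch of such a query encoder head in PyTorch, assuming a generic 3D convolutional backbone in place of the ResNet50 trunk; the dimensions and names are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Backbone features -> average pooling -> linear compression -> 1/2-dim output."""

    def __init__(self, backbone: nn.Module, backbone_channels: int, target_dim: int = 256):
        super().__init__()
        self.backbone = backbone                    # stand-in for the ResNet50 trunk
        self.pool = nn.AdaptiveAvgPool3d(1)         # average-pool the convolutional output
        self.compress = nn.Linear(backbone_channels, target_dim)  # intermediate feature vector
        self.head = nn.Linear(target_dim, target_dim // 2)        # compress to 1/2 of input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)                     # (B, C, D, H, W)
        vec = self.pool(feat).flatten(1)            # (B, C)
        intermediate = self.compress(vec)           # (B, target_dim)
        return self.head(intermediate)              # (B, target_dim // 2)
```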
The feature extraction model provided by the embodiment of the application can thus be trained, and features in the medical image effectively extracted, without a large amount of training data. This addresses the technical difficulty of extracting three-dimensional medical image features in the conventional technology while effectively improving the effectiveness and practicality of feature extraction.
S205: and inputting the data characteristics into a trained classifier, and determining a classification result.
In the embodiment of the application, the classifier is used for classifying the data features and determining a classification result, wherein the classification result may include the determined disease type. The classifier may include a two-layer fully connected neural network, and may also include a conventional machine learning model, such as a support vector machine (Support Vector Machine, SVM) or an extreme gradient boosting tree (eXtreme Gradient Boosting, XGBoost), which the present application does not specifically limit.
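For the two-layer fully connected variant, a minimal sketch follows; the input and hidden dimensions are illustrative assumptions:

```python
import torch.nn as nn

# Input: the stitched feature vector (image features + demographic features);
# output: one logit per class. Dimensions here are assumed for illustration.
classifier = nn.Sequential(
    nn.Linear(130, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
)
```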
The multi-modal data classification method based on deep learning provided by the embodiment of the application can be applied to classification tasks with different classification targets. For example, if the medical image includes a tumor image to be classified and the demographic information includes gender information and age information matched with the tumor image, then after the data features of the data to be classified are determined, classification results corresponding to different lesion areas may be output by the classifier; in other embodiments, classification results corresponding to tumors at different stages may be output by the classifier.
According to the multi-modal data classification method based on deep learning, data to be classified comprising at least medical images and demographic information are obtained; data features are determined based on the data to be classified, the data features comprising image features and demographic information features, the image features being obtained by inputting the medical images into a feature extraction model; and the data features are input into a trained classifier to determine a classification result. Because the image features are extracted by the feature extraction model, the image features and the demographic information features are considered jointly, and the classification result is determined by the classifier, the data to be classified can be classified even when they comprise multiple data types, and the accuracy of the classification result is effectively improved.
In order to further improve the effectiveness of the image features extracted by the feature extraction model, in the embodiment of the present application, as shown in fig. 3, training the first initial model based on the training set to obtain the feature extraction model includes:
S301: And inputting the training medical image into a second initial model to obtain auxiliary image features.
S303: classifying the auxiliary image features based on the types of the training medical images, and determining different types of feature sets.
S305: and training a first initial model based on the training set and the feature set to obtain the feature extraction model.
In the embodiment of the present application, the second initial model may include a key encoder, and the network structure of the second initial model may follow the structure of the query encoder in the foregoing embodiment of the present application, which is not described here again. Similarly, the second initial model may also select other networks as the network backbone depending on the classification task, which the present application does not specifically limit. In some embodiments, the order in which the training medical images are input into the second initial model differs from the order in which they are input into the first initial model; for example, if the training medical images include 100 three-dimensional images with determined classification results, numbered 1-100, training medical images with different numbers can be input into the first initial model and the second initial model respectively during training of the feature extraction model. Inputting different images of the same classification result into the first initial model and the second initial model respectively, and then training the first initial model based on the results obtained by the second initial model, can effectively improve the effectiveness of the feature vectors output by the first initial model, and hence of the image features output by the feature extraction model.
In the embodiment of the application, before the auxiliary image features are classified based on the types of the training medical images, preset feature sets can be created for the different feature labels and used to dynamically store the auxiliary image features output by the second initial model, so that more data are available for adjusting the parameters of the feature extraction model in each iterative optimization. The auxiliary image features are classified based on the types of the training medical images and added into the corresponding preset feature sets according to the classification results, thereby determining the feature sets of different types, wherein the type of a training medical image matches its corresponding feature label. In a specific embodiment, as shown in fig. 3, the feature sets are divided into class 1 and class 2; it should be understood that fig. 3 shows only one specific embodiment of the present application, and the number of feature sets is not limited to two but should be set according to the actual classification task of the application scenario.
It should be noted that training the first initial model and the second initial model is a continuously iterating optimization process. During model training, in order to improve the training effect, the first initial model and the second initial model can be trained alternately; that is, the application does not limit the order in which the first initial model and the second initial model are trained.
Each iterative training of the feature extraction model includes adding the auxiliary image features into the preset feature sets to obtain feature sets of different kinds and extracting features from those sets. In order that the feature extraction model may converge better, in some embodiments, classifying the auxiliary image features based on the kinds of the medical images and determining the feature sets of different kinds includes:
S401: And determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the medical images and the similarity, and determining feature sets of different types.
In the embodiment of the present application, the similarity may include cosine similarity. In order to ensure that an auxiliary image feature added to a preset feature set accurately represents its category, the feature sets of different kinds may be determined according to formulas (1)-(3):

$S_{same}(k_i)=\frac{1}{N_{same}}\sum_{j=1}^{N_{same}}\cos\left(k_i,k_j^{same}\right)$  (1)

$S_{diff}(k_i)=\frac{1}{N_{diff}}\sum_{j=1}^{N_{diff}}\cos\left(k_i,k_j^{diff}\right)$  (2)

$s\leftarrow s\cup\{k_i\},\quad\text{if}\ S_{same}(k_i)\ge S_{diff}(k_i)$  (3)

In formula (1), $i$ represents a training medical image, $k_i$ represents its auxiliary image feature, $N_{same}$ represents the number of auxiliary image features contained in the feature set of the category corresponding to training medical image $i$, $k_j^{same}$ represents an auxiliary image feature in that feature set, and $\cos(\cdot,\cdot)$ represents the cosine similarity between auxiliary image features of the same category.

In formula (2), $N_{diff}$ represents the number of auxiliary image features contained in the feature sets of categories different from that of training medical image $i$, $k_j^{diff}$ represents an auxiliary image feature in those feature sets, and $\cos(\cdot,\cdot)$ represents the cosine similarity between $k_i$ and auxiliary image features of different categories.

In formula (3), $s$ represents the feature set of the category corresponding to training medical image $i$: the auxiliary image feature $k_i$ is added to $s$ only when it is on average more similar to its own category than to the other categories.
In some embodiments, if the auxiliary image features in a feature set reach the upper limit on the number of features, the earliest added auxiliary image features are removed from the feature set according to the first-in-first-out principle as new outputs of the second initial model are added.
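A minimal sketch of such class-wise feature sets is given below, assuming the admission criterion of formulas (1)-(3) and a first-in-first-out upper limit; all names and the queue size are illustrative:

```python
from collections import deque

import torch
import torch.nn.functional as F

class FeatureSets:
    def __init__(self, num_classes: int, max_len: int = 1024):
        # One FIFO queue per category; deque(maxlen=...) drops the oldest entry.
        self.sets = [deque(maxlen=max_len) for _ in range(num_classes)]

    @staticmethod
    def _mean_cos(k: torch.Tensor, feats) -> float:
        if not feats:
            return 0.0
        stacked = torch.stack(list(feats))                   # (N, d)
        return F.cosine_similarity(stacked, k.unsqueeze(0)).mean().item()

    def maybe_add(self, k: torch.Tensor, label: int) -> None:
        same = self._mean_cos(k, self.sets[label])           # formula (1)
        others = [f for c, s in enumerate(self.sets) if c != label for f in s]
        diff = self._mean_cos(k, others)                     # formula (2)
        if same >= diff:                                     # formula (3)
            self.sets[label].append(k.detach())
```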
In the embodiment of the application, training the first initial model based on the training set and the feature sets may include determining a supervised contrast loss function based on the training set and the feature sets, and training the first initial model based on the supervised contrast loss function to obtain the feature extraction model. The supervised contrast loss function can be derived according to formula (4):

$\mathcal{L}_{con}=-\log\frac{\exp\left(q\cdot k_{+}/\tau\right)}{\exp\left(q\cdot k_{+}/\tau\right)+\sum_{j\in P(q)}\sum_{k_{j}\in s_{j}}\exp\left(q\cdot k_{j}/\tau\right)}$  (4)

In formula (4), $\mathcal{L}_{con}$ represents the supervised contrast loss function, $q$ represents the predicted feature obtained after inputting a training medical image into the first initial model, $k_{+}$ represents the corresponding auxiliary image feature output by the second initial model (in some embodiments, an auxiliary image feature output from the feature set of the same category), $P(q)$ represents all feature-set categories other than the category corresponding to the training medical image, $s_{j}$ represents the feature set corresponding to category $j$, $k_{j}$ represents an auxiliary image feature in that feature set, and $\tau$ represents a scalar temperature parameter.
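A minimal sketch of this loss follows, assuming the form given in formula (4) and L2-normalized feature vectors; the variable names and default temperature are illustrative:

```python
import torch

def supervised_contrast_loss(q: torch.Tensor, k_pos: torch.Tensor,
                             neg_sets: list, tau: float = 0.07) -> torch.Tensor:
    """q: (d,) predicted feature; k_pos: (d,) same-class auxiliary feature;
    neg_sets: one (N_j, d) tensor per other category."""
    pos = torch.exp(torch.dot(q, k_pos) / tau)
    neg = sum(torch.exp(feats @ q / tau).sum() for feats in neg_sets)
    return -torch.log(pos / (pos + neg))
```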
In order to reduce the parameter variation of the second initial model and further improve the training efficiency of the feature extraction model, in the embodiment of the present application, after training the first initial model based on the training set and the feature set, the method includes:
S501: Updating the second initial model based on parameters of the first initial model.
In the embodiment of the present application, the second initial model may be updated based on the parameters of the first initial model according to formula (5):

$\theta_{k}\leftarrow m\,\theta_{k}+(1-m)\,\theta_{q}$  (5)

In formula (5), $\theta_{q}$ and $\theta_{k}$ represent the parameters of the first initial model and of the second initial model respectively, and $m$ represents a momentum factor with a value range of $[0,1]$. Updating the parameters of the second initial model from the parameters of the first initial model by this momentum updating method can effectively improve the training efficiency of the feature extraction model.
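In code, the momentum update of formula (5) reduces to a few lines; this sketch assumes the two encoders share an identical architecture and that PyTorch is used:

```python
import torch

@torch.no_grad()
def momentum_update(query_enc, key_enc, m: float = 0.999):
    # theta_k <- m * theta_k + (1 - m) * theta_q, per formula (5).
    for p_q, p_k in zip(query_enc.parameters(), key_enc.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```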
In order to further improve the effectiveness of the image features output by the feature extraction model, in the embodiment of the present application, the training set further includes a lesion segmentation label, and training the first initial model based on the training set to obtain the feature extraction model includes:
S601: And inputting the medical image into the first initial model to obtain predicted features.
S603: And inputting the predicted features into a lesion segmentation model to determine a predicted lesion segmentation result.
S605: A first loss function is determined based on the predicted lesion segmentation result and the lesion segmentation label.
S607: A second loss function is determined based on the feature label and the predicted features.
S609: And updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
In the embodiment of the application, the lesion segmentation label comprises a preset lesion area label corresponding to the medical image. In some specific embodiments, the lesion segmentation label may indicate that a region of the medical image is a lesion area or is not a lesion area. The lesion segmentation model is used for obtaining a predicted lesion segmentation result based on the predicted features output by the first initial model. In some specific embodiments, as shown in fig. 3, the lesion segmentation model may include a segmentation head, which is used to decode the coding features extracted by the query encoder of the first initial model into lesion segmentation results, wherein the lesion segmentation result may include determining that a region of the medical image is or is not a lesion region.
In some embodiments, the lesion segmentation model comprises three resolution levels. The first two levels each consist of a three-dimensional transposed convolution layer, a batch normalization layer, a LeakyReLU activation function layer, a three-dimensional convolution layer, a batch normalization layer and a LeakyReLU activation function layer, wherein the stride of the three-dimensional transposed convolution layer is 2. Compared with the first two levels, the last resolution level omits the final batch normalization layer and LeakyReLU activation function layer, and its output channel number equals the number of categories.
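A minimal sketch of this three-level segmentation head follows; the channel counts and class number are illustrative assumptions:

```python
import torch.nn as nn

def up_level(in_ch: int, out_ch: int) -> nn.Sequential:
    # Transposed 3D conv (stride 2) -> BN -> LeakyReLU -> 3D conv -> BN -> LeakyReLU.
    return nn.Sequential(
        nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2),
        nn.BatchNorm3d(out_ch),
        nn.LeakyReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.LeakyReLU(inplace=True),
    )

num_classes = 2  # assumed
segmentation_head = nn.Sequential(
    up_level(256, 128),
    up_level(128, 64),
    # Last level: no trailing BN/LeakyReLU; output channels = number of categories.
    nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2),
    nn.BatchNorm3d(32),
    nn.LeakyReLU(inplace=True),
    nn.Conv3d(32, num_classes, kernel_size=3, padding=1),
)
```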
In the embodiment of the present application, the first loss function may be determined according to formula (6) based on the lesion segmentation result and the lesion segmentation label:

$\mathcal{L}_{Dice}=1-\frac{2\left|X\cap Y\right|+\epsilon}{\left|X\right|+\left|Y\right|+\epsilon}$  (6)

In formula (6), $\mathcal{L}_{Dice}$ is the first loss function, $X$ and $Y$ respectively represent the lesion segmentation result and the lesion segmentation label, and $\epsilon$ is a smoothing factor that prevents the denominator from being 0.
In the embodiment of the present application, the second loss function may be determined according to formula (7) based on the feature label and the predicted feature:

$\mathcal{L}_{CE}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M}w_{c}\,y_{i,c}\log p_{i,c}$  (7)

In formula (7), $\mathcal{L}_{CE}$ represents the second loss function, $y_{i,c}$ and $p_{i,c}$ respectively represent the feature label and the predicted feature at medical image voxel $i$ for feature category $c$, $N$ represents the number of voxels of all medical images, $M$ represents the number of all feature categories, and $w_{c}$ represents the weight corresponding to category $c$.
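A minimal sketch of the two losses of formulas (6) and (7) follows; the tensor shapes and smoothing value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Formula (6). pred, target: (B, D, H, W) soft prediction and binary label."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def weighted_ce_loss(logits: torch.Tensor, target: torch.Tensor,
                     class_weights: torch.Tensor) -> torch.Tensor:
    """Formula (7). logits: (B, M, D, H, W); target: (B, D, H, W) integer labels;
    class_weights: (M,) per-category weights. Averages over all voxels."""
    return F.cross_entropy(logits, target, weight=class_weights)
```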
In other embodiments, training the first initial model based on the training set to obtain the feature extraction model may further include training the first initial model based on the supervised contrast loss function, the first loss function and the second loss function together. In some embodiments, the feature extraction model loss function may be obtained according to formula (8), and the first initial model is trained based on this loss function:

$\mathcal{L}=\lambda_{1}\,\mathcal{L}_{con}+\lambda_{2}\,\mathcal{L}_{Dice}+\lambda_{3}\,\mathcal{L}_{CE}$  (8)

In formula (8), $\mathcal{L}_{con}$ is the supervised contrast loss function, $\mathcal{L}_{Dice}$ is the first loss function, $\mathcal{L}_{CE}$ is the second loss function, and $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ represent the coefficients corresponding to the different loss functions. Based on this loss function, training of the feature extraction model can be achieved by stochastic gradient descent. During model training, in order to prevent overfitting, the training medical images can be randomly rotated, translated, flipped and noised for data augmentation.
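A minimal sketch of the combined loss of formula (8) and of the random augmentation mentioned above follows; the coefficients, noise scale and flip axis are assumptions, and random rotation and translation would be added analogously:

```python
import torch

def total_loss(l_con, l_dice, l_ce, lambdas=(1.0, 1.0, 1.0)):
    # Formula (8): weighted sum of the three losses.
    return lambdas[0] * l_con + lambdas[1] * l_dice + lambdas[2] * l_ce

def augment(volume: torch.Tensor) -> torch.Tensor:
    """volume: (C, D, H, W). Random flip plus additive Gaussian noise."""
    if torch.rand(1).item() < 0.5:
        volume = torch.flip(volume, dims=[-1])
    return volume + 0.01 * torch.randn_like(volume)
```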
In an embodiment of the present application, the medical image comprises images of at least two image modality types, the image features comprise image features corresponding to the at least two image modality types, and determining the data features based on the data to be classified comprises:
S701: And judging whether the image modality type of the medical image matches a preset modality type.
S703: And if matched, inputting the medical image into the feature extraction model to obtain image features.
S705: If not matched, determining the image features corresponding to the image modality type to be zero-set features.
In the embodiment of the application, the image modality type may include the imaging information type corresponding to the medical image, and may include, for example, a DWI modality type, a T2-FLAIR modality type, a T1CE modality type and the like. Whether the image modality type of the medical image matches the preset modality types is judged; if matched, the image modality types of the medical image are complete, and the medical image is input into the feature extraction model to obtain image features; if not matched, some image modality type of the medical image is missing, and the image features corresponding to that image modality type are determined to be zero-set features. Determining the image features of the corresponding image modality type to be zero-set features may include setting those image features to an all-zero vector. Zero-setting the image features of missing image modality types complements the missing modalities of the medical image and effectively solves the problem of missing medical image modalities.
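The zero-setting of missing modalities can be sketched as follows; the modality list and feature dimension are illustrative assumptions:

```python
import torch

EXPECTED_MODALITIES = ["DWI", "T2-FLAIR", "T1CE"]  # assumed preset modality types
FEATURE_DIM = 128                                   # assumed per-modality feature dimension

def collect_image_features(extracted: dict) -> list:
    """extracted: modality name -> (FEATURE_DIM,) feature tensor for present modalities."""
    feats = []
    for modality in EXPECTED_MODALITIES:
        if modality in extracted:           # matched: feature from the extraction model
            feats.append(extracted[modality])
        else:                               # missing: zero-set feature (all-zero vector)
            feats.append(torch.zeros(FEATURE_DIM))
    return feats
```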
In the embodiment of the present application, inputting the data features into the trained classifier and determining the classification result includes:
S801: And carrying out standardization processing on the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features.
S803: And performing dimension stitching on the demographic information features and the standardized image features to obtain stitching features, inputting the stitching features into a trained classifier, and determining classification results.
In the embodiment of the application, the demographic information includes gender information and age information, and standardizing the data features to obtain standardized features may include standardizing the image features to obtain standardized image features, binarizing the gender information, and performing min-max normalization on the age information.
In some specific embodiments, as shown in fig. 4, binarizing the gender information may include encoding it as 0 and 1; min-max normalizing the age information may include scaling it to the range 0 to 1; and standardizing the image features may include scaling them to the range 0-1. Dimension stitching is then performed on the demographic information features and the standardized image features to obtain stitching features, the stitching features are input into the trained classifier, and the diagnosis type is determined.
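A minimal sketch of these standardization and stitching steps follows; the age range and encoding conventions are illustrative assumptions:

```python
import torch

def build_classifier_input(image_feats: list, gender: str, age: float,
                           age_min: float = 0.0, age_max: float = 100.0) -> torch.Tensor:
    gender_feat = torch.tensor([1.0 if gender == "M" else 0.0])       # binarized to 0/1
    age_feat = torch.tensor([(age - age_min) / (age_max - age_min)])  # min-max to [0, 1]
    normalized = []
    for f in image_feats:
        span = (f.max() - f.min()).clamp(min=1e-8)
        normalized.append((f - f.min()) / span)                       # scaled to 0-1
    return torch.cat([gender_feat, age_feat, *normalized])            # dimension stitching
```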
In the embodiment of the application, the classifier can be trained on a classifier training set, which may comprise the training set of the feature extraction model together with a demographic information training set. During classifier training, optimal parameters can be determined within a given range by grid search over the hyper-parameters, and the parameters of the classifier are updated based on the optimal parameters. The searched hyper-parameters may include the learning rate, the optimal number of iterations, the maximum sub-tree depth and the like.
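As an illustration of the grid search, the following sketch assumes scikit-learn's GridSearchCV over an XGBoost classifier; the parameter grid and variable names are assumptions:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [50, 100, 200],  # number of boosting iterations
    "max_depth": [3, 5, 7],          # maximum sub-tree depth
}
search = GridSearchCV(XGBClassifier(), param_grid, cv=5)
# search.fit(stitching_features, labels)  # hypothetical training data
```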
In the conventional technology, because the parameter count of a deep network is often huge, massive training data are needed to train the model and prevent overfitting. This holds in particular for the classification of three-dimensional images, where the raw data must be mapped from a very high data dimension (the size of the three-dimensional image) to a distribution over a few dimensions (the number of classes). During back propagation, however, the label carries little learnable information, so overfitting occurs very easily and the classification and diagnosis effect is poor. In addition, clinically collected data are often not limited to image data but include various data types such as demographic information (age, gender and the like); owing to the diversity of diseases and the complexity of clinical practice, the collected information often has missing modalities, such as a missing magnetic resonance modality. This further reduces the usable data volume, which makes model training difficult.
Therefore, in the multi-modal data classification method, device and computer equipment based on deep learning provided by the embodiments of the application: on the one hand, extracting low-dimensional features after reducing the dimension of the three-dimensional image solves the problem of extracting three-dimensional image features; on the other hand, in training the feature extraction network, supervised contrastive learning improves the utilization of label information while the segmentation task constrains the feature extraction model to attend to the lesion area, so that effective features of a medical image can be extracted with only small-sample training data, the overfitting phenomenon in back propagation is avoided, and the accuracy of the classification result is effectively improved; in addition, establishing a patient-level multi-modal fusion framework and complementing the missing modalities effectively solves the problem of missing modalities.
In the embodiment of the application, the classification effect of the feature extraction model and the classifier is tested after they are trained. First, the acquired data set is divided into training and test sets based on five-fold cross-validation; for each fold, the feature extraction model and the classifier are trained by the method in the above embodiments and the model is then tested on the corresponding data. Finally, each fold's classification results are analyzed quantitatively by computing several indexes, namely classification accuracy, precision, recall, F1-score and area under the curve, and the final classification result of the model is evaluated by averaging. Meanwhile, the classification results are further evaluated by plotting the receiver operating characteristic (ROC) curve. The test evaluation results are shown in Table 1 and fig. 5.
TABLE 1
As can be seen from Table 1, each index of the classification results of the application is higher than those of other classification methods of the same type. In fig. 5, the abscissa represents the false positive rate and the ordinate the true positive rate; the closer the curve is to the upper left, the more accurate the classification result.
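The per-fold metrics can be computed as in the following sketch, assuming scikit-learn and binary labels; the variable names are placeholders:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate_fold(y_true, y_pred, y_score) -> dict:
    """y_pred: hard class predictions; y_score: positive-class probabilities."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # area under the ROC curve
    }
```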
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a multi-modal data classification device based on deep learning, which is used for realizing the multi-modal data classification method based on deep learning. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiment of one or more multi-modal data classification devices based on deep learning provided below may be referred to the limitation of the multi-modal data classification method based on deep learning hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 6, there is provided a deep learning-based multi-modal data classification apparatus 900 comprising:
the data acquisition module 901 is configured to acquire data to be classified, where the data to be classified at least includes a medical image and demographic information;
a feature extraction module 902, configured to determine data features based on the data to be classified, where the data features include image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model;
the classification module 903 is configured to input the data features into a trained classifier, and determine a classification result.
In one embodiment, the deep learning based multi-modal data classification apparatus 900 further includes a feature extraction training module, the feature extraction training module configured to,
acquiring a training set, wherein the training set comprises training medical images and corresponding feature labels; and training a first initial model based on the training set to obtain the feature extraction model.
In one embodiment, the feature extraction training module is further configured to,
inputting the training medical image into a second initial model to obtain auxiliary image features;
classifying the auxiliary image features based on the types of the training medical images, and determining feature sets of different types;
and training a first initial model based on the training set and the feature set to obtain the feature extraction model.
In one embodiment, the feature extraction training module is further configured to,
and determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the medical images and the similarity, and determining feature sets of different types.
In one embodiment, the feature extraction training module is further configured to,
after training a first initial model based on the training set and the feature set, the second initial model is updated based on parameters of the first initial model.
In one embodiment, the training set further comprises a lesion segmentation label, the feature extraction training module is further configured to,
inputting the medical image into the first initial model to obtain a prediction feature;
inputting the prediction features into a focus segmentation model to determine a prediction focus segmentation result;
determining a first loss function based on the lesion segmentation result and a lesion segmentation label;
determining a second loss function based on the feature label and the predicted feature;
and updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
In one embodiment, the medical image comprises images of at least two image modality types, the image features comprise image features corresponding to at least two image modality types, the feature extraction module 902 is further configured to,
judging whether the image modality type of the medical image matches a preset modality type;
if matched, inputting the medical image into the feature extraction model to obtain the image features;
if not matched, determining the image features corresponding to the image modality type to be zero-set features.
In one embodiment, the classification module 903 is further configured to,
carrying out standardization processing on the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features;
and performing dimension stitching on the demographic information features and the standardized image features to obtain stitching features, inputting the stitching features into a trained classifier, and determining classification results.
In one embodiment, the demographic information includes gender information and age information, and the classification module 903 is further configured to:
binarize the gender information and apply min-max normalization to the age information.
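Putting the preprocessing pieces together, a minimal sketch of the standardization and concatenation steps might read as follows; the age range, the gender encoding, and the downstream classifier interface are assumptions for illustration.

```python
# Minimal sketch, assuming a 0-100 age range, male=1/female=0 encoding, and a
# scikit-learn-style classifier; none of these specifics come from the application.
import numpy as np

def build_classifier_input(gender: str, age: float, img_feats: np.ndarray,
                           age_min: float = 0.0, age_max: float = 100.0,
                           feat_mean: np.ndarray = None,
                           feat_std: np.ndarray = None) -> np.ndarray:
    g = 1.0 if gender.lower() == "male" else 0.0        # binarized gender
    a = (age - age_min) / (age_max - age_min)           # min-max normalized age
    if feat_mean is not None and feat_std is not None:
        img_feats = (img_feats - feat_mean) / feat_std  # standardized image features
    return np.concatenate([[g, a], img_feats])          # dimension-wise concatenation

# usage sketch, with clf a trained classifier and mu/sigma training-set statistics:
# x = build_classifier_input("female", 57, img_feats, feat_mean=mu, feat_std=sigma)
# result = clf.predict(x[None, :])
```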
Each module in the deep learning-based multi-modal data classification device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication may be implemented through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the deep learning-based multi-modal data classification method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of part of the structure relevant to the present solution and does not limit the computer device to which the present solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model;
and inputting the data features into a trained classifier, and determining a classification result.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a training set, wherein the training set comprises training medical images and corresponding feature labels;
and training a first initial model based on the training set to obtain the feature extraction model.
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting the training medical image into a second initial model to obtain auxiliary image features;
classifying the auxiliary image features based on the types of the training medical images, and determining feature sets of different types;
and training the first initial model based on the training set and the feature sets to obtain the feature extraction model.
In one embodiment, the processor when executing the computer program further performs the steps of:
and determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the training medical images and the similarity, and determining feature sets of different types.
In one embodiment, the processor when executing the computer program further performs the steps of:
after training the first initial model based on the training set and the feature sets, updating the second initial model based on parameters of the first initial model.
In one embodiment, the processor when executing the computer program further performs the steps of:
the training set further comprises lesion segmentation labels;
inputting the training medical image into the first initial model to obtain predicted features;
inputting the predicted features into a lesion segmentation model to determine a predicted lesion segmentation result;
determining a first loss function based on the predicted lesion segmentation result and the lesion segmentation label;
determining a second loss function based on the feature label and the predicted features;
and updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
In one embodiment, the processor when executing the computer program further performs the steps of:
the medical image comprises images of at least two image modality types, and the image features comprise image features corresponding to the at least two image modality types;
determining whether the image modality type of the medical image matches a preset modality type;
if matched, inputting the medical image into the feature extraction model to obtain the corresponding image features;
if not matched, determining that the image features corresponding to that image modality type are zero-valued features.
In one embodiment, the processor when executing the computer program further performs the steps of:
standardizing the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features;
and concatenating the demographic information features and the standardized image features along the feature dimension to obtain concatenated features, inputting the concatenated features into the trained classifier, and determining the classification result.
In one embodiment, the processor when executing the computer program further performs the steps of:
the demographic information includes gender information and age information;
and binarizing the gender information and applying min-max normalization to the age information.
In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, performs the steps of:
acquiring data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model;
and inputting the data features into a trained classifier, and determining a classification result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a training set, wherein the training set comprises training medical images and corresponding feature labels;
and training a first initial model based on the training set to obtain the feature extraction model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the training medical image into a second initial model to obtain auxiliary image features;
classifying the auxiliary image features based on the types of the training medical images, and determining feature sets of different types;
and training the first initial model based on the training set and the feature sets to obtain the feature extraction model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the training medical images and the similarity, and determining feature sets of different types.
In one embodiment, the computer program when executed by the processor further performs the steps of:
after training the first initial model based on the training set and the feature sets, updating the second initial model based on parameters of the first initial model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the training set further comprises lesion segmentation labels;
inputting the training medical image into the first initial model to obtain predicted features;
inputting the predicted features into a lesion segmentation model to determine a predicted lesion segmentation result;
determining a first loss function based on the predicted lesion segmentation result and the lesion segmentation label;
determining a second loss function based on the feature label and the predicted features;
and updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the medical image comprises images of at least two image modality types, and the image features comprise image features corresponding to the at least two image modality types;
determining whether the image modality type of the medical image matches a preset modality type;
if matched, inputting the medical image into the feature extraction model to obtain the corresponding image features;
if not matched, determining that the image features corresponding to that image modality type are zero-valued features.
In one embodiment, the computer program when executed by the processor further performs the steps of:
standardizing the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features;
and concatenating the demographic information features and the standardized image features along the feature dimension to obtain concatenated features, inputting the concatenated features into the trained classifier, and determining the classification result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the demographic information includes gender information and age information;
and binarizing the gender information and applying min-max normalization to the age information.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) involved in the present application are information and data authorized by the user or fully authorized by the parties concerned.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided in the present application may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, or the like, but are not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (12)

1. A multi-modal data classification method based on deep learning, the method comprising:
acquiring data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
determining data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical image into a feature extraction model;
and inputting the data features into a trained classifier, and determining a classification result.
2. The method of claim 1, wherein the feature extraction model is trained by:
acquiring a training set, wherein the training set comprises training medical images and corresponding feature labels;
and training a first initial model based on the training set to obtain the feature extraction model.
3. The method of claim 2, wherein training the first initial model based on the training set to obtain the feature extraction model comprises:
inputting the training medical image into a second initial model to obtain auxiliary image features;
classifying the auxiliary image features based on the types of the training medical images, and determining feature sets of different types;
and training the first initial model based on the training set and the feature sets to obtain the feature extraction model.
4. The method according to claim 3, wherein classifying the auxiliary image features based on the types of the training medical images and determining feature sets of different types comprises:
and determining the similarity between the auxiliary image features, classifying the auxiliary image features based on the types of the training medical images and the similarity, and determining feature sets of different types.
5. The method according to claim 3, wherein, after training the first initial model based on the training set and the feature sets, the method further comprises:
updating the second initial model based on parameters of the first initial model.
6. The method of claim 2, wherein the training set further comprises a lesion segmentation label, and wherein training the first initial model based on the training set to obtain the feature extraction model comprises:
inputting the training medical image into the first initial model to obtain predicted features;
inputting the predicted features into a lesion segmentation model to determine a predicted lesion segmentation result;
determining a first loss function based on the predicted lesion segmentation result and the lesion segmentation label;
determining a second loss function based on the feature label and the predicted features;
and updating parameters of the first initial model based on the first loss function and the second loss function to obtain the feature extraction model.
7. The method of claim 1, wherein the medical image comprises images of at least two image modality types, the image features comprise image features corresponding to the at least two image modality types, and determining data features based on the data to be classified comprises:
determining whether the image modality type of the medical image matches a preset modality type;
if matched, inputting the medical image into the feature extraction model to obtain the corresponding image features;
if not matched, determining that the image features corresponding to that image modality type are zero-valued features.
8. The method of claim 1, wherein inputting the data features into a trained classifier and determining a classification result comprises:
standardizing the data features to obtain standardized features, wherein the standardized features comprise demographic information features and standardized image features;
and concatenating the demographic information features and the standardized image features along the feature dimension to obtain concatenated features, inputting the concatenated features into the trained classifier, and determining the classification result.
9. The method of claim 8, wherein the demographic information includes gender information and age information, and wherein standardizing the data features to obtain standardized features comprises:
and binarizing the gender information and applying min-max normalization to the age information.
10. A deep learning-based multi-modal data classification apparatus, the apparatus comprising:
a data acquisition module, configured to acquire data to be classified, wherein the data to be classified at least comprises medical images and demographic information;
a feature extraction module, configured to determine data features based on the data to be classified, wherein the data features comprise image features and demographic information features, and the image features are obtained by inputting the medical images into a feature extraction model;
and a classification module, configured to input the data features into a trained classifier and determine a classification result.
11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 9.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN202311297044.0A 2023-10-09 2023-10-09 Multi-mode data classification method and device based on deep learning and computer equipment Active CN117036894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311297044.0A CN117036894B (en) 2023-10-09 2023-10-09 Multi-mode data classification method and device based on deep learning and computer equipment

Publications (2)

Publication Number Publication Date
CN117036894A true CN117036894A (en) 2023-11-10
CN117036894B CN117036894B (en) 2024-03-26

Family

ID=88630450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311297044.0A Active CN117036894B (en) 2023-10-09 2023-10-09 Multi-mode data classification method and device based on deep learning and computer equipment

Country Status (1)

Country Link
CN (1) CN117036894B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390695A1 (en) * 2019-06-28 2021-12-16 Tencent Technology (Shenzhen) Company Limited Image classification method, apparatus, and device, storage medium, and medical electronic device
WO2023073346A1 (en) * 2021-10-29 2023-05-04 Panakeia Technologies Limited A method of classification
CN114298234A (en) * 2021-12-31 2022-04-08 深圳市铱硙医疗科技有限公司 Brain medical image classification method and device, computer equipment and storage medium
CN114596467A (en) * 2022-03-10 2022-06-07 山东大学 Multimode image classification method based on evidence deep learning
CN114332547A (en) * 2022-03-17 2022-04-12 浙江太美医疗科技股份有限公司 Medical object classification method and apparatus, electronic device, and storage medium
CN114998247A (en) * 2022-05-30 2022-09-02 深圳市联影高端医疗装备创新研究院 Abnormality prediction method, abnormality prediction device, computer apparatus, and storage medium
CN116129200A (en) * 2023-04-17 2023-05-16 厦门大学 Bronchoscope image benign and malignant focus classification device based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAIBO YANG et al.: "Denoising of 3D MR images using a voxel-wise hybrid residual MLP-CNN model to improve small lesion diagnostic confidence", arXiv *
LIU Mingqian; LAN Jun; CHEN Xu; YU Guangjun; YANG Xiujun: "Deep learning bone age assessment model based on multi-dimensional feature fusion", Academic Journal of Second Military Medical University, no. 08 *
WU Zhiyuan; MA Yuan; TANG Hao; YAO Erlin; GUO Xiuhua: "Construction of a multimodal lung image classification and diagnosis model based on deep convolutional neural networks", Chinese Journal of Health Statistics, no. 06 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558414A (en) * 2023-11-23 2024-02-13 之江实验室 System, electronic device and medium for predicting early recurrence of multi-tasking hepatocellular carcinoma
CN117558414B (en) * 2023-11-23 2024-05-24 之江实验室 System, electronic device and medium for predicting early recurrence of multi-tasking hepatocellular carcinoma

Also Published As

Publication number Publication date
CN117036894B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN110807495B (en) Multi-label classification method, device, electronic equipment and storage medium
Jin et al. A deep 3D residual CNN for false‐positive reduction in pulmonary nodule detection
Meng et al. Liver tumor segmentation based on 3D convolutional neural network with dual scale
Alex et al. Semisupervised learning using denoising autoencoders for brain lesion detection and segmentation
JP2022538866A (en) System and method for image preprocessing
Xu et al. Texture-specific bag of visual words model and spatial cone matching-based method for the retrieval of focal liver lesions using multiphase contrast-enhanced CT images
Alqahtani et al. Breast cancer pathological image classification based on the multiscale CNN squeeze model
WO2018107371A1 (en) Image searching system and method
CN115953665B (en) Target detection method, device, equipment and storage medium
Feng et al. Supervoxel based weakly-supervised multi-level 3D CNNs for lung nodule detection and segmentation
CN117036894B (en) Multi-mode data classification method and device based on deep learning and computer equipment
Shubham et al. Identify glomeruli in human kidney tissue images using a deep learning approach
Peng et al. Automated mammographic mass detection using deformable convolution and multiscale features
Galshetwar et al. Local energy oriented pattern for image indexing and retrieval
Bhattacharjee et al. An efficient lightweight CNN and ensemble machine learning classification of prostate tissue using multilevel feature analysis
Bonkhoff et al. Reclassifying stroke lesion anatomy
Naga Srinivasu et al. Variational Autoencoders-Based Self-Learning Model for Tumor Identification and Impact Analysis from 2-D MRI Images
El Kader et al. Brain tumor detection and classification by hybrid CNN-DWA model using MR images
Routray et al. Ensemble Learning with Symbiotic Organism Search Optimization Algorithm for Breast Cancer Classification & Risk Identification of Other Organs on Histopathological Images
Hema et al. Region‐Based Segmentation and Classification for Ovarian Cancer Detection Using Convolution Neural Network
Alyami et al. Automatic skin lesions detection from images through microscopic hybrid features set and machine learning classifiers
Wang et al. Segment medical image using U-Net combining recurrent residuals and attention
Anusha et al. Parkinson’s disease identification in homo sapiens based on hybrid ResNet-SVM and resnet-fuzzy svm models
Arunachalam et al. An effective tumor detection in MR brain images based on deep CNN approach: i-YOLOV5
El-Shafai et al. Efficient classification of different medical image multimodalities based on simple CNN architecture and augmentation algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant