CN115408551A - Medical image-text data mutual retrieval method, apparatus, device and readable storage medium - Google Patents


Info

Publication number: CN115408551A
Application number: CN202210760827.7A
Authority: CN (China)
Prior art keywords: text, image, feature, information
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 赵雅倩, 王立, 范宝余
Assignee (current and original): Suzhou Inspur Intelligent Technology Co Ltd
Priority application: CN202210760827.7A
Related PCT application: PCT/CN2022/141374 (WO2024001104A1)

Classifications

    • G06F16/583: Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content
    • G06F16/332: Information retrieval of unstructured textual data; query formulation
    • G06F16/353: Information retrieval of unstructured textual data; clustering or classification into predefined classes
    • G06F16/532: Information retrieval of still image data; query formulation, e.g. graphical querying
    • G06F16/5866: Information retrieval of still image data; retrieval using manually generated metadata, e.g. tags, keywords, comments
    • G06N3/084: Neural networks; learning methods using backpropagation, e.g. gradient descent
    • G16H10/60: Healthcare informatics; ICT for patient-specific data, e.g. electronic patient records
    • G16H30/20: Healthcare informatics; ICT for handling medical images, e.g. DICOM, HL7 or PACS


Abstract

The invention provides a medical image-text data mutual retrieval method, which comprises the following steps: performing multi-level classification of the text information in image-text data in a predetermined manner, and generating text features for the classified text information through a first neural network model in a cascaded fashion according to the classification relationships; generating image features from the image information in the image-text data, as an image sequence, through a second neural network model; iteratively training with a predetermined loss function on the text features and the image features to generate an image-text mutual retrieval model; and retrieving, through the mutual retrieval model, the text information and/or image information corresponding to input image-text data. The loss functions provided by the invention train the mutual retrieval model on the basis of the text features and the image features, so that medical image-text data can be mutually retrieved accurately and quickly.

Description

Medical image-text data mutual retrieval method, apparatus, device and readable storage medium
Technical Field
The invention belongs to the field of computers, and particularly relates to a medical image-text data mutual retrieval method, apparatus, device and readable storage medium.
Background
With the continuing improvement of informatization in the medical industry, the volume of medical image data keeps expanding. A common situation in the industry is the lack of effective management and retrieval schemes for multi-modal medical image data, and multi-modal data retrieval has become a problem in urgent need of a solution.
Existing medical retrieval tasks are mainly single-modal. Single-modal retrieval can only query information of the same modality, such as text retrieving text or images retrieving images. Cross-modal retrieval uses a sample of one modality to retrieve semantically similar samples of another modality, such as images retrieving text or text retrieving images.
The cross-domain heterogeneity addressed by the invention is mainly embodied as follows: image and text data live in different feature spaces and are heterogeneous data. A correct retrieval method must therefore have a cross-domain retrieval capability, realizing alignment and ranking across modalities.
Compared with single-modal retrieval, cross-modal retrieval must model not only the relationships within each modality's data but also the correlations between different modalities, so as to realize cross-domain retrieval between them. Cross-modal retrieval is highly flexible, has broad application scenarios and strong user demand, is an important research topic in cross-modal machine learning, and has significant academic value and meaning.
For example, with the explosive development of medical informatization, hospital information systems have become increasingly complete and collect all kinds of medical data. Medical data has gradually become a distinctive cross-modal data type alongside natural data sets. Radiologists typically diagnose by direct visual inspection, relying on experience and on the characteristics of cases they have seen before. Because of the large data volume and the limits of individual experience, misdiagnoses and missed diagnoses cannot be avoided, leaving great hidden dangers for the accuracy of patient treatment. If doctors could quickly query a medical database for similar information to assist diagnosis, misdiagnoses could be reduced and working efficiency improved.
Disclosure of Invention
In order to solve the above problems, the present invention provides a medical image-text data mutual retrieval method, comprising:
performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through a first neural network model in a cascaded fashion according to the classification relationships;
generating image features from the image information in the image-text data, as an image sequence, through a second neural network model;
iteratively training with a predetermined loss function on the text features and the image features to generate an image-text mutual retrieval model; and
retrieving, through the image-text mutual retrieval model, the text information and/or image information corresponding to the text information and/or image information in the input image-text data.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, comprises:
classifying the text information by text structure type, and computing the feature vector of each piece of classified structured text information through the first neural network model.
In some embodiments of the invention, classifying the text information by text structure type comprises:
classifying the text information by text structure and/or by time type.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, further comprises:
sorting the text content of each piece of classified structured text information by the order of its sentences, and inputting each sorted sentence as a parameter into the first neural network model to compute the text feature of the structured text information.
In some embodiments of the present invention, sorting the sentences by their order and inputting each sorted sentence as a parameter into the first neural network model to compute the text feature of the structured text information comprises:
adding, to the word vectors of each sentence, the sequence value of the word and the sentence's number within the text-structure class, and inputting the sums into the first neural network model to compute the text feature of the structured text information.
In some embodiments of the invention, the method further comprises:
selecting the computation result output by the first neural network model for any one of the sentences as the text feature of the structured text information; or
weighting and averaging the computation results output by the first neural network model for the several sentences to obtain the text feature of the structured text information.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, further comprises:
inputting the text features of the several pieces of structured text information into the first neural network model to obtain the text feature of the text information.
In some embodiments of the present invention, inputting the feature vectors of the several pieces of structured text information into the first neural network model to obtain the text feature of the text information comprises:
adding, to the text feature of each piece of structured text information, the sequence value and the classification number of the corresponding structured text, and inputting the sums into the first neural network model to compute the text feature of the text information.
In some embodiments of the invention, the method further comprises:
selecting the computation result output by the first neural network model for any one of the pieces of structured text information as the text feature of the text information; or
weighting and averaging the computation results output by the first neural network model for the several pieces of structured text information to obtain the text feature of the text information; or
concatenating the text features of the several pieces of structured text information into a long vector, and passing the concatenated long vector through a fully connected layer to obtain the text feature of the text information.
In some embodiments of the invention, generating image features from the image information in the image-text data, as an image sequence, through the second neural network model comprises:
inputting the image sequence into the second neural network model and computing the image-sequence feature vectors corresponding to the image sequence;
computing the weight of each image-sequence feature vector, and multiplying the weight by the feature vector to obtain the image-sequence feature weight vector; and
adding the image feature weight vector to the image-sequence feature vector to obtain the image features.
In some embodiments of the invention, computing the weights of the image-sequence feature vectors comprises:
passing the image-sequence feature vector through a first fully connected layer to obtain a first fully-connected vector;
passing the first fully-connected vector through a pooling layer to obtain a pooled vector;
passing the pooled vector through a second fully connected layer to obtain a second fully-connected vector;
and normalizing the second fully-connected vectors to obtain the weights of the corresponding image sequence.
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model comprises:
for any text feature, computing the Euclidean distance between the text feature and its corresponding image feature, and the minimum Euclidean distance between the text feature and the other text features and/or image features, and taking the difference between the two distances as the text loss value;
for any image feature, computing the Euclidean distance between the image feature and its corresponding text feature, and the minimum Euclidean distance between the image feature and the other text features or image features, and taking the difference between the two distances as the image loss value;
and summing the text loss value and the image loss value to obtain the first loss value, and training the mutual retrieval model with the first loss value.
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model further comprises:
converting the text features into the image feature space by a first conversion method to obtain image text features, and converting the image text features into the text feature space by a second conversion method to obtain text transformation features;
converting the image features into the text feature space by the second conversion method to obtain text image features, and converting the text image features into the image feature space by the first conversion method to obtain image transformation features;
and taking the minimum of the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features as the second loss value, and training the mutual retrieval model with the second loss value.
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model further comprises:
computing, by a third conversion method, the loss values corresponding to the corresponding text features and image features respectively, judging the difference between the loss value corresponding to the text features and the loss value corresponding to the image features, taking that difference as the third loss value, and iteratively training the mutual retrieval model with the third loss value.
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model further comprises:
iteratively training the mutual retrieval model with the sum of the first loss value, the second loss value and the third loss value as the loss value.
In another aspect of the present invention, a medical image-text data mutual retrieval apparatus is further provided, comprising:
a preprocessing module configured to perform multi-level classification of the text information in the image-text data in a predetermined manner, and to generate text features for the classified text information through a first neural network model in a cascaded fashion according to the classification relationships;
a first model computation module configured to generate image features from the image information in the image-text data, as an image sequence, through a second neural network model;
a second model computation module configured to iteratively train with a predetermined loss function on the text features and the image features to generate an image-text mutual retrieval model;
and an image-text mutual retrieval module configured to retrieve, through the image-text mutual retrieval model, the text information and/or image information corresponding to the input image-text data.
Yet another aspect of the present invention also provides a computer apparatus, including:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any of the above embodiments.
Yet another aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method in any one of the above embodiments.
According to the image-text mutual retrieval method provided by the invention, the text features of the text data are computed by a cascade of multi-level Transformer models, the image features of the image sequence are computed through a residual network, and the image-text mutual retrieval model is trained with the several loss functions provided by the invention on the basis of the text features and the image features. The trained model is then used to predict, for input text data or image data, the corresponding text data or image data to retrieve.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the medical image-text mutual retrieval method provided by the present invention;
Fig. 2 is a schematic diagram of medical text data provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a partial model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a partial model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a partial model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a partial model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a partial model structure of the medical image-text retrieval method provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a medical image-text data mutual retrieval apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a computer device provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present invention;
Fig. 12 is a diagram of the semantic alignment adversarial loss function provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities of the same name that are not the same entity, or parameters that are not the same parameter. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and subsequent embodiments will not repeat this note.
The invention aims to realize the mutual retrieval task between medical images and medical text. The task data consists of two parts, medical images and medical text. The medical images include a wide variety of modalities, such as nuclear magnetic resonance images, CT and ultrasound images, all of which are sequential images. The medical text includes medical records, reports, and the like. This is by way of example only and does not mean that the method of the invention can only be used in this field.
In traditional solutions, a single-modal retrieval method is mostly adopted to analyze a patient's condition. For example, medical image detection models based on image processing technology can only provide the detection result for the corresponding medical image, i.e., the medical image alone yields the result of whether the patient is sick or not, and it is difficult to analyze the patient comprehensively. A comprehensive analysis requires acquiring various kinds of patient information, such as medical history, family medical history, age and living habits, most of which is recorded as text; the traditional technology therefore lacks an intelligent assistance technique for determining the condition from such varied patient information and cannot provide medical staff with a comprehensive analysis of disease conditions and etiology.
As shown in fig. 1, to solve the above problems, the present invention provides a medical image-text data mutual retrieval method, comprising:
Step S1, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through a first neural network model in a cascaded fashion according to the classification relationships;
Step S2, generating image features from the image information in the image-text data, as an image sequence, through a second neural network model;
Step S3, iteratively training with a predetermined loss function on the text features and the image features to generate an image-text mutual retrieval model; and
Step S4, retrieving, through the image-text mutual retrieval model, the text information and/or image information corresponding to the text information and/or image information in the input image-text data.
In the embodiment of the present invention, in step S1, the image-text data refers to the text data and image data corresponding to medical images; that is, image-text data in the present invention means medical sequence images together with the corresponding disease description and other patient information related to the condition, such as the patient's physical state; see the content shown in fig. 2 for a concrete example.
Further, the text data in the image-text data, i.e., the description of the condition, is divided into several classes, as shown in fig. 2, and each class of text is input as a unit into the first neural network model. In this embodiment, the first neural network model is a Transformer model; that is, in step S1 corresponding feature vectors are first computed for the classified texts by several Transformer models, the feature vectors output by these Transformer models are then fed as input into one upper-level Transformer model, and the output of the upper-level Transformer model is taken as the text feature.
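The cascade just described can be made concrete with a short sketch. The following PyTorch code only illustrates the two-level structure of fig. 3: the dimensions, layer counts, the single shared lower-level encoder (standing in for the per-section Transformer models) and the mean-pooling are assumptions made for brevity, not values fixed by the invention.

```python
import torch
import torch.nn as nn

class CascadedTextEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=8, num_layers=2):
        super().__init__()
        lower = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Lower level: encodes the sentences of one classified section.
        self.section_encoder = nn.TransformerEncoder(lower, num_layers)
        upper = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Upper level: fuses the per-section features into one text feature.
        self.fusion_encoder = nn.TransformerEncoder(upper, num_layers)

    def forward(self, sections):
        # sections: list of (num_sentences, d_model) tensors, one tensor per
        # classified section (personal history, past history, ...), already
        # word-vectorized.
        section_feats = []
        for s in sections:
            out = self.section_encoder(s.unsqueeze(0))      # (1, n_sent, d)
            section_feats.append(out.mean(dim=1))           # pool to (1, d)
        stacked = torch.cat(section_feats).unsqueeze(0)     # (1, n_sec, d)
        fused = self.fusion_encoder(stacked)                # (1, n_sec, d)
        return fused.mean(dim=1)                            # (1, d)
```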
In step S2, the medical images in the image-text data are computed by the residual network model ResNet to obtain the corresponding image features; an image feature is a vector of a specified size.
In some embodiments of the present invention, the number of medical images in the image-text data is at least 1 and usually several: in practice, modalities such as nuclear magnetic resonance or CT generally scan a lesion from multiple angles or at multiple positions. When there are several medical images, the corresponding image features must therefore be generated from all of them.
In step S3, similarity matching is performed on the text features and the image features, the corresponding similarity loss values are computed according to the predetermined loss functions and back-propagated into the Transformer models and the residual network model for repeated iterative training, until the loss values meet the accuracy requirement; the Transformer models, the residual network model and the corresponding parameters of the loss functions are then saved as the mutual retrieval model.
In step S4, when the mutual retrieval model is used for analysis or prediction, the text description of a case or condition and/or the corresponding medical images are input into the model, and the model returns a matched examination report for the input text or images, or screens out the diagnosis content of the corresponding condition from the corresponding medical images. Mutual retrieval of medical images and text is thus realized, helping medical workers reduce their workload.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, comprises:
classifying the text information by text structure type, and computing the feature vector of each piece of classified structured text information through the first neural network model.
In this embodiment, as shown in fig. 2, fig. 2 shows the disease description of a patient at a hospital, including personal information such as age, marital status and occupation, and sections such as allergy history, present medical history, personal history, past history, family history and present condition. The text information of fig. 2 is divided into several structured texts according to this classification; for example, the text content of the personal-history class, "born and raised in the place of origin, good living environment …" and so on, is taken as one piece of text information. This content is input into a Transformer model as its input data, and the Transformer model produces the feature vector of the text information under that class. That is, the content of the personal history is represented by the feature vector produced by the Transformer model for use in subsequent model computation.
Further, the classified text content is not input into the Transformer model as raw text: the characters are first converted into word-vector form with an appropriate tool, and the vectors are then input into the Transformer model; the tool can be a model such as Bert for text vectorization.
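As an illustration of this vectorization step, the sketch below uses the Hugging Face transformers library; the specific checkpoint bert-base-chinese and the mean-pooling over tokens are assumptions made for the example, since the invention only requires "a model such as Bert".

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def sentence_vectors(sentences):
    """Turn the clauses of one classified section into one vector each."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        out = bert(**batch)
    # Mean-pool the token embeddings into a per-sentence vector; the
    # invention does not fix the pooling scheme.
    return out.last_hidden_state.mean(dim=1)    # (n_sentences, hidden_dim)
```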
In some embodiments of the invention, classifying the text information by text structure type comprises:
classifying the text information by text structure and/or by time type.
In some embodiments of the present invention, the text information may also be classified by combining time with structure. For example, the causes of some conditions are influenced not by past medical history but by the patient's recent living habits or by other previously occurring conditions; if such content is mixed with the patient's present medical history, past history or family history, a large number of irrelevant factors would affect the judgment of the mutual retrieval model. Classifying the text information with time factors therefore highlights, in the model's judgment, the effect of the text describing certain conditions.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, further comprises:
sorting the text content of each piece of classified structured text information by the order of its sentences, and inputting each sorted sentence as a parameter into the first neural network model to compute the text feature of the structured text information.
In this embodiment, the classified text features are likewise generated by cascading Transformer models. Specifically, each classified text is divided into segments according to punctuation or semantics, called clauses in the invention; that is, each class is represented by several clauses whose contents, in natural language, together form the classified text content.
Further, each clause is taken as the input of a Transformer model that computes its feature vector, one clause per Transformer model; the several clause feature vectors are then input into one Transformer model, which computes over them, and its result is the text feature of the classified text content. For example, all the content of the personal history in fig. 2 is computed sentence by sentence (a comma-separated segment may also be treated as a sentence) by corresponding Transformer models; the outputs for the several sentences are then input into one overall Transformer model, which outputs the text feature of the personal history. See the cascade structure of fig. 3: the text features of several classified texts are input in cascade into an overall Transformer model, the first text information in fig. 3 being converted into corresponding clauses from which the overall Transformer model outputs the text feature.
In some embodiments of the present invention, sorting the sentences by their order and inputting each sorted sentence as a parameter into the first neural network model to compute the text feature of the structured text information comprises:
adding, to the word vectors of each sentence, the sequence value of the word and the sentence's number within the text-structure class, and inputting the sums into the first neural network model to compute the text feature of the structured text information.
In this embodiment, as described above, when the text features of the classified text information are computed clause by clause, the several clauses are input into several Transformer models. When a clause is input into its Transformer model, each word of the clause (represented by its word vector, i.e., a number) is added to the position number of the clause within the classified text information and to the position number of the word. For example, if the word vector value of the first word is 0.3 and the clause is the first clause of the classified text, the computation is 0.3 + 1; and if the position number of the word is also 1, the result is 0.3 + 1 + 1 = 2.3, with 2.3 being the first input datum of the Transformer model. By analogy, if the word vector value of the second word is 0.4, the value input into the Transformer model is 0.4 + 1 + 2. The feature vector of each clause is computed in this manner, and the feature vectors of the several clauses are finally input into an overall Transformer model to obtain the text feature value of the class.
It should be noted that when the feature vectors of several clauses are fed into the upper Transformer model, the computation proceeds as in the clause-level computation. If the feature vector of the first clause is A = [0.1, 0.2, 0.3] (no longer a single word vector), then when it is input into the Transformer model, 1 is added because A is the feature vector of the first clause, and 1 is added again because the classification number is that of the first text class, i.e., A + 1 + 1 = [2.1, 2.2, 2.3]; the computation of the cascaded Transformer models for each text class is completed likewise. See fig. 4: emb in the lower part of fig. 4 denotes one input datum of the Transformer model; before being input, every datum is added to the number of the text class to which it belongs (the text-type value in the second-to-last row of the lower part of the figure) and then to the input sequence number of the clause (the position information in fig. 4), and the final value is input into the Transformer model.
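The addition of position and class numbers can be expressed in a few lines; the sketch below assumes the features are PyTorch tensors and reproduces the A + 1 + 1 = [2.1, 2.2, 2.3] example above.

```python
import torch

def add_structural_offsets(clause_feats, class_id):
    """clause_feats: (num_clauses, d) feature vectors of one text class.
    Adds the 1-based clause position and the 1-based class number to every
    component, reproducing A + 1 + 1 = [2.1, 2.2, 2.3]."""
    positions = torch.arange(1, clause_feats.size(0) + 1).unsqueeze(1)
    return clause_feats + positions + class_id

feats = torch.tensor([[0.1, 0.2, 0.3]])           # feature vector of clause 1
print(add_structural_offsets(feats, class_id=1))  # tensor([[2.1, 2.2, 2.3]])
```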
In some embodiments of the invention, the method further comprises:
selecting the computation result output by the first neural network model for any one of the sentences as the text feature of the structured text information; or
weighting and averaging the computation results output by the first neural network model for the several sentences to obtain the text feature of the structured text information.
In some embodiments of the present invention, the structured text information is the classified text information. The output of the Transformer model at any level can be chosen in two ways. The first relies on the Transformer's computation principle: the model computes each input datum against the other input data and outputs a result for that datum, i.e., a feature vector of the input (different from the original input value), so the output result for any one input datum can be used as the output of the Transformer model at that level; that is, for a piece of classified text information, the value the overall Transformer model computes for one clause can be used as the text feature of the classified text.
The Transformer output value of a certain clause is thus taken as the text feature of the whole classified text; alternatively, the output values of the clause Transformer models are weighted and averaged to obtain the text feature of the corresponding structured text information.
In some embodiments of the present invention, performing multi-level classification of the text information in the image-text data in a predetermined manner, and generating text features for the classified text information through the first neural network model in a cascaded fashion according to the classification relationships, further comprises:
inputting the text features of the several pieces of structured text information into the first neural network model to obtain the text feature of the text information.
As shown in fig. 3, fig. 3 is a schematic diagram of the Transformer cascade of the present invention: several lower-level Transformer models compute several classified texts to obtain the corresponding classified text features, which are then input into a final-stage Transformer model to obtain the text feature of the whole text.
In some embodiments of the present invention, inputting the feature vectors of the several pieces of structured text information into the first neural network model to obtain the text feature of the text information comprises:
adding, to the text feature of each piece of structured text information, the sequence value and the classification number of the corresponding structured text, and inputting the sums into the first neural network model to compute the text feature of the text information.
In this embodiment, a structured text is a classified text, classified according to structure. Similar to the cascaded feature computation for clauses within a classified text, computing the text feature of the whole text information requires first adding the corresponding classification number to each classified text feature and then adding the corresponding sequence number, except that in some scenarios the two added values are the same.
In some embodiments of the invention, the method further comprises:
selecting the computation result output by the first neural network model for any one of the pieces of structured text information as the text feature of the text information; or
weighting and averaging the computation results output by the first neural network model for the several pieces of structured text information to obtain the text feature of the text information; or
concatenating the text features of the several pieces of structured text information into a long vector and passing the concatenated long vector through a fully connected layer to obtain the text feature of the text information.
In this embodiment, similar to determining the text feature of a classified text above, when determining the text feature of the whole text, the feature output by the Transformer model for one classified text may be selected as the feature of the whole text; that is, the output result of the overall Transformer model corresponding to one of the classes can be chosen as the text feature of the image-text data.
Alternatively, the text feature of the whole text information may be obtained by weighted averaging of the text features of the several pieces of structured (classified) text information.
In some embodiments of the present invention, the text features of the several pieces of structured text information may be concatenated end to end, and the concatenated feature passed through a fully connected layer to obtain a feature vector of a new dimension as the text feature of the whole text information.
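The three aggregation options above can be sketched as follows; the shapes and module layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

def pick_one(section_feats, idx=0):
    # Option 1: take the output for one section as the whole-text feature.
    return section_feats[idx]                        # (d,)

def weighted_average(section_feats, weights):
    # Option 2: weighted average over the section features.
    w = torch.softmax(weights, dim=0)                # (n_sections,)
    return (w.unsqueeze(1) * section_feats).sum(0)   # (d,)

class ConcatFuse(nn.Module):
    # Option 3: concatenate end to end, then a fully connected layer.
    def __init__(self, n_sections, d, d_out):
        super().__init__()
        self.fc = nn.Linear(n_sections * d, d_out)

    def forward(self, section_feats):                # (n_sections, d)
        return self.fc(section_feats.reshape(-1))    # (d_out,)
```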
In some embodiments of the invention, generating image features from the image information in the image-text data, as an image sequence, through the second neural network model comprises:
inputting the image sequence into the second neural network model and computing the image-sequence feature vectors corresponding to the image sequence;
computing the weight of each image-sequence feature vector, and multiplying the weight by the feature vector to obtain the image-sequence feature weight vector; and
adding the image feature weight vector to the image-sequence feature vector to obtain the image features.
In this embodiment, as shown in fig. 5, the image sequence of fig. 5 shows only 3 images. Specifically, the image sequence is computed through the residual network, giving the feature vector of each image.
The weight of the feature vector of each image is then computed and multiplied by the corresponding feature vector; the products are added to the feature vectors of the corresponding images, and the several image feature weight vectors are transformed by a linear transformation to a new dimension as the image feature of the image sequence.
In some embodiments of the invention, computing the weights of the image-sequence feature vectors comprises:
passing the image-sequence feature vector through a first fully connected layer to obtain a first fully-connected vector;
passing the first fully-connected vector through a pooling layer to obtain a pooled vector;
passing the pooled vector through a second fully connected layer to obtain a second fully-connected vector;
and normalizing the second fully-connected vectors to obtain the weights of the corresponding image sequence. In this embodiment, as shown in fig. 7, fig. 7 is a sub-graph of the overall network structure of the invention and illustrates our weight-computation structure, which contains two fully connected layers (FC) and one ReLU layer. The image features first pass through a backbone network to obtain embedded features, i.e., the feature vectors of the images; these pass through a fully connected layer to obtain the final embedded feature e of each image. The final embedded feature e then passes through the attention structure, which computes the weight of each feature; each weight is a single number, normalized by a sigmoid layer.
The weights of the features of all the image sequences are passed together through a softmax layer to determine which medical image in the sequence is important. Finally, the per-image feature weights after the softmax layer are multiplied by the final embedded feature e of each corresponding image. Meanwhile, we introduce the idea of the residual network: for each medical image sequence, the output of the attention structure is

$$\hat{e}_i = e_i + a_i \cdot e_i$$

where $e_i$ is the embedded feature of the i-th image and $a_i$ its attention weight; that is, $\hat{e}_i$ is the original image-sequence feature vector added to itself after multiplication by the weight (the attention value computed by the attention model shown in fig. 7). The final medical image feature is obtained by passing these weighted, self-added feature vectors of the image sequence through the linear fully connected layer fc to a feature vector of a new dimension:

$$e_{img} = \mathrm{fc}(\hat{e}_1, \ldots, \hat{e}_n)$$

where $e_{img}$ denotes the finally computed image feature and fc denotes the fully connected layer.
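A possible reading of the image-sequence encoder of figs. 5 and 7 is sketched below: a ResNet backbone, an FC-ReLU-FC weight head, sigmoid and softmax normalization, the residual combination ê = e + a·e, and a final fully connected layer. The backbone choice (resnet18), the layer sizes and the mean-pooling over the sequence are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ImageSequenceEncoder(nn.Module):
    def __init__(self, d=512, d_out=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()              # expose the 512-d features
        self.backbone = backbone
        self.embed = nn.Linear(512, d)           # final embedded feature e
        self.weight_head = nn.Sequential(        # attention head: FC-ReLU-FC
            nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))
        self.fc = nn.Linear(d, d_out)            # final linear fc layer

    def forward(self, images):                   # (n, 3, H, W), one sequence
        e = self.embed(self.backbone(images))    # (n, d)
        a = torch.sigmoid(self.weight_head(e))   # per-image scalar weight
        a = torch.softmax(a, dim=0)              # compare across the sequence
        e_hat = e + a * e                        # residual attention
        return self.fc(e_hat.mean(dim=0))        # (d_out,) image feature
```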
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model comprises:
for any text feature, computing the Euclidean distance between the text feature and its corresponding image feature, and the minimum Euclidean distance between the text feature and the other text features and/or image features, and taking the difference between the two distances as the text loss value;
for any image feature, computing the Euclidean distance between the image feature and its corresponding text feature, and the minimum Euclidean distance between the image feature and the other text features or image features, and taking the difference between the two distances as the image loss value;
and summing the text loss value and the image loss value to obtain the first loss value, and training the mutual retrieval model with the first loss value.
In this embodiment, the present invention proposes a new generative contrastive (triplet-style) loss function to evaluate the above model loss. The formula is as follows:

$$L_1 = \frac{1}{N}\sum_{a=1}^{N}\Big(\Delta + d(s_a^{img}, s_p^{txt}) - \min_{np}\, d(s_a^{img}, s_{np})\Big) + \frac{1}{N}\sum_{a=1}^{N}\Big(\Delta + d(s_a^{txt}, s_p^{img}) - \min_{np}\, d(s_a^{txt}, s_{np})\Big)$$

In the design of this loss function, as shown in fig. 8, the invention averages the loss over such paired data through each image-group feature encoding (the image feature described above) and text feature encoding (the text feature corresponding to the whole text information), as in the formula above.

Each iteration traverses N samples, where N is the number of sample pairs in the batch. First, the image-group features $s^{img}$ are traversed (N in total); the selected one, written $s_a^{img}$, is the anchor sample, a standing for anchor. The text feature encoding paired with the anchor sample is written $s_p^{txt}$, p standing for positive. All samples of the batch that remain unpaired with $s_a^{img}$ are denoted $s_{np}$. Δ is a hyperparameter, fixed during training and set to 0.4 in the present invention.

Similarly, the invention performs the same traversal over the text features: $s_a^{txt}$ denotes the sample selected in the traversal, the positive image-group feature sample corresponding to it is written $s_p^{img}$, and the non-corresponding samples are again denoted $s_{np}$. This loss function is used for gradient back-propagation during training, updating the parameters of the cascaded Transformer models and the ResNet network.

In this embodiment, the text loss value refers to the following formula:

$$L_{txt} = \frac{1}{N}\sum_{a=1}^{N}\Big(\Delta + d(s_a^{txt}, s_p^{img}) - \min_{np}\, d(s_a^{txt}, s_{np})\Big)$$

where $d(s_a^{txt}, s_p^{img})$ is the Euclidean distance between a text feature and its corresponding image feature, and $\min_{np} d(s_a^{txt}, s_{np})$ is the minimum Euclidean distance between the text feature and the other text features and/or image features.

The image loss value refers to the following formula:

$$L_{img} = \frac{1}{N}\sum_{a=1}^{N}\Big(\Delta + d(s_a^{img}, s_p^{txt}) - \min_{np}\, d(s_a^{img}, s_{np})\Big)$$

where $d(s_a^{img}, s_p^{txt})$ is the Euclidean distance between an image feature and its corresponding text feature, and $\min_{np} d(s_a^{img}, s_{np})$ is the minimum Euclidean distance between the image feature and the other text features and/or image features. The first loss value, i.e., the loss value of the loss function after one iteration of computation, is $L_1 = L_{txt} + L_{img}$.
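A hedged sketch of this first loss follows. For simplicity it mines the hardest negative only across modalities, while the description above also allows same-modality negatives; the clamp to zero is a common convention that the text does not state explicitly.

```python
import torch

def mutual_triplet_loss(img_feats, txt_feats, delta=0.4):
    """img_feats, txt_feats: (N, d) batch features, row i of each paired."""
    d = torch.cdist(img_feats, txt_feats)         # (N, N) Euclidean distances
    pos = d.diag()                                # distance to paired sample
    masked = d + torch.eye(len(d), device=d.device) * 1e9
    hardest_for_img = masked.min(dim=1).values    # nearest unpaired text
    hardest_for_txt = masked.min(dim=0).values    # nearest unpaired image
    l_img = (delta + pos - hardest_for_img).clamp(min=0).mean()
    l_txt = (delta + pos - hardest_for_txt).clamp(min=0).mean()
    return l_img + l_txt                          # first loss value L1
```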
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model further comprises:
converting the text features into the image feature space by a first conversion method to obtain image text features, and converting the image text features into the text feature space by a second conversion method to obtain text transformation features;
converting the image features into the text feature space by the second conversion method to obtain text image features, and converting the text image features into the image feature space by the first conversion method to obtain image transformation features;
and taking the minimum of the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features as the second loss value, and training the mutual retrieval model with the second loss value. In this embodiment, to align the multi-structure text features and the image features, i.e., to make the two features describe the same semantic space, the invention designs the semantic alignment adversarial loss function shown in fig. 12:
where F denotes the first conversion method and G the second conversion method. As shown in fig. 12, X denotes $e_{csi}$, our image-group feature, and Y denotes $e_{rec}$, our medical text feature. We want the two features X and Y to map into a common space.

To impose this constraint, the invention performs the following steps:

1. Map the feature X with method G to $G(X)$, the feature mapped from the image feature into the text feature space.
2. Map $G(X)$ with method F to $F(G(X))$.
3. Require $F(G(X))$ to be as close as possible to the original feature X.

By the same principle:

4. Map the feature Y with method F to $F(Y)$, the feature mapped from the text feature into the image feature space.
5. Map $F(Y)$ with method G to $G(F(Y))$.
6. Require $G(F(Y))$ to be as close as possible to the original feature Y.

The constraint equation is therefore:

$$L_c = \min\big\{\, E(\|F(G(X)) - X\|_2) + E(\|G(F(Y)) - Y\|_2) \,\big\}$$

where $E(\|F(G(X)) - X\|_2)$ is the mean difference between the image transformation features and the original image features, obtained after the image features are mapped into the text feature space and back into the image feature space; and $E(\|G(F(Y)) - Y\|_2)$ is the mean difference between the text transformation features and the original text features, obtained after the text features are mapped into the image feature space (giving the image text features) and back into the text feature space (giving the text transformation features).

$L_c$ thus represents the minimum of the sum of the distance between the text transformation features and the text features and the distance between the image transformation features and the image features, i.e., the minimum of the second loss function. In this embodiment, the mutual retrieval model is iteratively trained with $L_c$ as a loss function.
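A minimal sketch of this cycle-consistency constraint follows, under the assumption that F and G are single linear layers (the invention does not fix their form):

```python
import torch
import torch.nn as nn

class CycleAlignment(nn.Module):
    def __init__(self, d_img, d_txt):
        super().__init__()
        self.G = nn.Linear(d_img, d_txt)   # image feature space -> text space
        self.F = nn.Linear(d_txt, d_img)   # text feature space -> image space

    def loss(self, X, Y):
        """X: (N, d_img) image features; Y: (N, d_txt) text features."""
        img_cycle = self.F(self.G(X))      # X -> text space -> back to image
        txt_cycle = self.G(self.F(Y))      # Y -> image space -> back to text
        return ((img_cycle - X).norm(dim=1).mean()
                + (txt_cycle - Y).norm(dim=1).mean())
```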
In some embodiments of the present invention, iteratively training with the predetermined loss function on the text features and the image features to generate the image-text mutual retrieval model further comprises:
computing, by a third conversion method, the loss values corresponding to the corresponding text features and image features respectively, judging the difference between the loss value corresponding to the text features and the loss value corresponding to the image features, taking that difference as the third loss value, and iteratively training the mutual retrieval model with the third loss value.
In this embodiment, the invention aims to make the features of X (the image features) and Y (the text features) as close as possible, and therefore designs a discriminant loss function:
that is, X is mapped by the D method (the third conversion method) to the scalar feature D(X), i.e., a given image feature is computed into one scalar; Y is likewise mapped by the D method to the scalar feature D(Y). The purpose is to make the D(X) and D(Y) features as close as possible, so that one cannot determine whether a value came from D(X) or D(Y).
The formula is as follows:

$$L_d = E[\log D(Y)] + E[\log(1 - D(X))]$$

where $\log D(Y)$ is the logarithm of the scalar into which a text feature is transformed by the third conversion method D, and $E[\log D(Y)]$ denotes the average of all such logarithms over the corresponding batch of samples; $\log(1 - D(X))$ is the logarithm taken after an image feature is transformed into a scalar by the third conversion method D, and $E[\log(1 - D(X))]$ denotes the average of the logarithms over all image data in one batch after transformation by D.

$L_d$ then expresses the loss value of the discriminant loss function in each batch of samples, i.e., the loss value of the function under one batch of iterative training. If the third conversion method D obtains suitable parameter values through iterative training, D(Y) and D(X) should become indistinguishably close; if $L_d$ is infinitely close to 0 (a super-ideal condition), the mutual retrieval model transforms the text features and the image features into the same space, and a text feature and its corresponding image feature have very similar, almost identical, meanings, so that the text feature can be represented by the image feature. The practical meaning is that the corresponding text description is found for a certain group of medical images, or the corresponding images are found and displayed for certain text information, i.e., image-text mutual retrieval is realized.
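The discriminant loss can be sketched as follows; the two-layer discriminator head and the equal feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                 nn.Linear(d, 1), nn.Sigmoid())

    def forward(self, feats):              # (N, d) -> (N,) scalars in (0, 1)
        return self.net(feats).squeeze(1)

def discriminant_loss(D, img_feats, txt_feats, eps=1e-8):
    # L_d = E[log D(Y)] + E[log(1 - D(X))]
    return (torch.log(D(txt_feats) + eps).mean()
            + torch.log(1 - D(img_feats) + eps).mean())
```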
iteratively training the mutual inspection model by taking the sum of the first loss value, the second loss value, and the third loss value as the loss value.
In this embodiment, the three loss functions may be superimposed to train the mutual inspection model provided by the present invention, i.e., described by the following formula:
L = L_t + L_c + L_d
where L_t denotes the first (triplet) loss value, L_c the second loss value, and L_d the third loss value.
Example:
Referring to fig. 6, the training process of the present invention constructs a cascaded-Transformer-based medical image text retrieval network, comprising a text information feature encoder and a medical image sequence feature encoder.
Establish the generative adversarial loss function:
L_d = E[log D(Y)] + E[log(1 - D(X))] (as defined above)
The network is trained to convergence according to the loss function described above.
The network training process is as follows. Training of the neural network is divided into two phases. The first phase is the forward propagation phase, in which data propagates from the lower layers to the higher layers. The second phase is the back propagation phase, in which, when the result of forward propagation does not match the expectation, the error is propagated from the higher layers back to the lower layers for training. The steps are as follows (a code sketch summarizing these steps follows the list):
1. Initialize all network layer weights, generally by random initialization;
2. Propagate the input image and text data forward through the layers of the neural network (convolution layers, down-sampling layers, fully connected layers and the like) to obtain the output value;
3. Compute the output value of the network, and from it compute, according to the calculation formulas above, the sum of the universal triplet loss function and the semantic alignment loss functions (L_c + L_d);
4. Propagate the error back through the network, computing in turn the back-propagation error of each layer: the Transformer layers, the fully connected layers, the convolution layers and the like;
5. Adjust all weight coefficients in the network according to the back-propagation error of each layer, i.e., update the weights;
6. Randomly select a new batch of image-text data and return to step 2, obtaining the output value by forward propagation through the network;
7. Iterate in this way repeatedly; end the training when the error between the network output value and the target value (label) falls below a certain threshold or the number of iterations exceeds a certain threshold;
8. Save the trained network parameters of all layers.
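The numbered procedure above can be condensed into the following sketch of one training loop, under stated assumptions: text_encoder, image_encoder, the data loader, and the three loss helpers (triplet_loss, cycle_loss, discriminant_loss) are placeholders standing in for the components described in this document, and the optimizer, learning rate, and stopping thresholds are illustrative choices rather than the patent's values.

```python
import torch

def train(text_encoder, image_encoder, loader,
          triplet_loss, cycle_loss, discriminant_loss,
          max_steps=100_000, tol=1e-3):
    # Step 1 (weight initialization) is assumed to happen inside the encoders.
    params = list(text_encoder.parameters()) + list(image_encoder.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-4)  # assumed optimizer and lr

    for step, (images, texts) in enumerate(loader):  # step 6: new random batch
        y = text_encoder(texts)     # step 2: forward propagation (text branch)
        x = image_encoder(images)   # step 2: forward propagation (image branch)
        # Step 3: triplet loss plus the semantic alignment losses (L_c + L_d)
        loss = triplet_loss(x, y) + cycle_loss(x, y) + discriminant_loss(x, y)
        optimizer.zero_grad()
        loss.backward()             # step 4: back-propagate the error per layer
        optimizer.step()            # step 5: update all weight coefficients
        if loss.item() < tol or step >= max_steps:  # step 7: stopping criteria
            break

    # Step 8: save the trained parameters of all layers
    torch.save({"text": text_encoder.state_dict(),
                "image": image_encoder.state_dict()}, "checkpoint.pt")
```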
The network inference process, i.e., the retrieval and matching process, is briefly described below:
During inference, the weight coefficients obtained by training are pre-loaded. Feature extraction is performed on each medical text or medical image sequence, and the resulting features are stored in the data set to be retrieved. The user then supplies arbitrary medical text data or medical image sequence data, which we call the query data. The features of the query data are extracted with our cascaded-Transformer medical image text retrieval network, and are distance-matched against the features of all samples in the data set to be retrieved, i.e., the vector distance is computed; the present invention computes the Euclidean distance. For example, if the query data is medical text data, the Euclidean distance to all medical image sequence features in the data set to be retrieved is computed; the same applies when the query data is medical image sequence data. The sample with the minimum distance is the recommended sample, and it is output.
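As a sketch of the matching step only, assuming the features have already been extracted by the trained network, the Euclidean-distance retrieval can look like the following; the function and variable names are hypothetical.

```python
import torch

def retrieve(query_feat: torch.Tensor, gallery_feats: torch.Tensor) -> int:
    """query_feat: (dim,) feature of the query text or image sequence.
    gallery_feats: (N, dim) features of the data set to be retrieved.
    Returns the index of the recommended (minimum-distance) sample."""
    # Euclidean (p=2) distance from the query to every candidate sample
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return int(torch.argmin(dists).item())
```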
According to the image-text mutual inspection method provided by the invention, the text features of text data are calculated through a cascade of multi-stage Transformer models, the image features of an image sequence are calculated through a residual network, and, based on the text features and the image features, the image-text data mutual inspection model is trained through the several loss functions provided by the invention. The trained model is then used to predict, for input text data or image data, the corresponding image data or text data to be retrieved.
As shown in fig. 9, another aspect of the present invention further provides a medical image-text data mutual inspection apparatus, including:
the preprocessing module 1 is configured to perform multi-level classification on the text information in the image-text data in a predetermined manner, and to generate text features in a cascade manner through a first neural network model according to the classification relation;
the first model calculation module 2 is configured to generate image features of the image information in the image-text data through a second neural network model in the manner of an image sequence;
the second model calculation module 3 is configured to iteratively train, based on a predetermined loss function, an image-text data mutual inspection model according to the text features and the image features;
and the image-text mutual inspection module 4 is configured to retrieve, for the input image-text data, the corresponding text information and/or image information through the image-text data mutual inspection model.
As shown in fig. 10, another aspect of the present invention also provides a computer device, including:
at least one processor 21; and
a memory 22, wherein the memory 22 stores computer instructions 23 executable on the processor 21, and when executed by the processor 21, the instructions 23 implement the steps of the method according to any one of the above embodiments.
As shown in fig. 11, a further aspect of the present invention also provides a computer-readable storage medium 401, where the computer-readable storage medium 401 stores a computer program 402, and the computer program 402 when executed by a processor implements the steps of the method in any one of the above-mentioned embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant only to be exemplary, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of an embodiment of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (18)

1. A medical image-text data mutual detection method is characterized by comprising the following steps:
performing multi-level classification on text information in the image-text data according to a predetermined mode, and respectively generating text characteristics of the classified text information in a cascade mode according to a classification relation through a first neural network model;
generating image characteristics of image information in the image-text data through a second neural network model in an image sequence mode;
iteratively training based on a predetermined loss function according to the text features and the image features to generate an image-text data mutual inspection model;
and retrieving, for the input image-text data, the corresponding text information and/or image information through the image-text data mutual inspection model.
2. The method of claim 1, wherein performing multi-level classification on the text information in the image-text data in a predetermined manner, and generating, through the first neural network model, the text features of the classified text information respectively in a cascade manner according to the classification relation, comprises:
classifying the text information according to text structure types, and calculating the characteristic vector of each classified structure text information through the first neural network model.
3. The method of claim 2, wherein classifying the textual information by text structure type comprises:
and classifying the text information according to a text structure and/or a time type.
4. The method of claim 2, wherein performing multi-level classification on the text information in the image-text data in a predetermined manner, and generating the text features of the classified text information respectively in a cascade manner according to the classification relation through the first neural network model, further comprises:
and sequencing the text contents in the classified structural text information according to the occurrence times of the sentences, and inputting each sequenced sentence as a parameter into the first neural network model to calculate the text characteristics of the structural text information.
5. The method according to claim 4, wherein sorting the sentences according to their occurrence times and inputting each sorted sentence as a parameter into the first neural network model to calculate the text features of the structural text information comprises:
and adding the sequence number value corresponding to the word in each sentence and the sentence number in the text structure classification, and inputting the sum to the first neural network model to calculate the text characteristics of the structural text information.
6. The method of claim 4, further comprising:
selecting any one of the calculation results of the plurality of sentences output by the first neural network model and corresponding to the sentences as the text feature of the structural text information; or
And weighting and averaging the calculation results output by the first neural network model and corresponding to the sentences to obtain the text features of the structural text information.
7. The method of claim 4, wherein performing multi-level classification on the text information in the image-text data in a predetermined manner, and generating the text features of the classified text information respectively in a cascade manner according to the classification relation through the first neural network model, further comprises:
and inputting a plurality of text features of the structural text information into the first neural network model to obtain the text features of the text information.
8. The method of claim 7, wherein inputting a plurality of feature vectors of the structural text information into the first neural network model to obtain the text features of the text information comprises:
and adding the text features of each piece of structural text information, the sequence values of the structural text corresponding to the text features and the classification numbers, and inputting the sum to the first neural network model to calculate the text features of the text information.
9. The method of claim 7, further comprising:
selecting any one of the calculation results output by the first neural network model and corresponding to a plurality of structural text messages as the text feature of the text message; or
Weighting and averaging the calculation results output by the first neural network model and the plurality of structural text messages to obtain the text features of the text messages; or
And splicing the text features of the plurality of structural text messages into long vectors, and obtaining the text features of the text messages through the spliced long vectors through a full connection layer.
10. The method of claim 1, wherein generating image features of the image information in the image-text data through the second neural network model in the manner of an image sequence comprises:
inputting the image sequence into the second neural network model and calculating an image sequence feature vector corresponding to the image sequence;
calculating the weight of the image sequence feature vector, and multiplying the weight by the image sequence feature vector to obtain an image sequence feature weight vector; and
and adding the image feature weight vector and the image sequence feature vector to obtain the image feature.
11. The method of claim 10, wherein the computing the weight of the image sequence feature vector comprises:
enabling the image sequence feature vector to pass through a first full-connection layer to obtain a first full-connection layer vector;
obtaining a pooling layer vector by the first full-connection layer vector through a pooling layer;
obtaining a second full-connection layer vector by the pooling layer vector through a second full-connection layer;
and normalizing the second full-connection layer vector to obtain the weight of the corresponding image sequence feature vector.
12. The method of claim 1, wherein iteratively training based on a predetermined loss function according to the text features and the image features to generate the image-text data mutual inspection model comprises:
calculating the Euclidean distance between the text feature and the corresponding image feature and the minimum Euclidean distance between the text feature and other text features and/or image features for any text feature, and taking the difference between the Euclidean distance and the minimum Euclidean distance as a text loss value;
calculating the Euclidean distance between the image feature and the corresponding text feature and the minimum Euclidean distance between the image feature and other text features or image features for any image feature, and taking the difference between the Euclidean distance and the minimum Euclidean distance as an image loss value;
and summing the text loss value and the image loss value to obtain a first loss value, and training the mutual inspection model through the first loss value.
13. The method of claim 1, wherein iteratively training based on the predetermined loss function according to the text features and the image features to generate the image-text data mutual inspection model further comprises:
converting the text features to an image feature space by a first conversion method to obtain image text features, and converting the image text features to a text feature space by a second conversion method to obtain text conversion features;
converting the image characteristics to a text characteristic space by a second conversion method to obtain text image characteristics, and converting the text image characteristics to an image characteristic space by a first conversion method to obtain image conversion characteristics;
and taking the minimum value of the sum of the distance between the text transformation feature and the text feature and the distance between the image transformation feature and the image feature as a second loss value, and training the mutual inspection model through the second loss value.
14. The method of claim 1, wherein iteratively training based on the predetermined loss function according to the text features and the image features to generate the image-text data mutual inspection model further comprises:
and calculating loss values corresponding to the corresponding text features and image features respectively through a third conversion method, judging the difference between the loss value corresponding to the text features and the loss value corresponding to the image features, taking the difference as a third loss value, and iteratively training the mutual inspection model through the third loss value.
15. The method of any one of claims 12-14, wherein iteratively training based on the predetermined loss function according to the text features and the image features to generate the image-text data mutual inspection model further comprises:
and iteratively training the mutual inspection model by taking the sum of the first loss value, the second loss value and the third loss value as a loss value.
16. A medical image-text data mutual inspection apparatus, comprising:
the preprocessing module is configured for carrying out multi-level classification on the text information in the image-text data according to a predetermined mode and respectively generating text characteristics in a cascading mode through a first neural network model according to a classification relation;
the first model calculation module is configured to generate image features of image information in the image-text data through a second neural network model in an image sequence mode;
the second model calculation module is configured to iteratively train, based on a predetermined loss function, an image-text data mutual inspection model according to the text features and the image features;
and the image-text mutual inspection module is configured to retrieve corresponding text information and/or image information from the input image-text data through the image-text data mutual inspection model.
17. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 15.
18. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 15.
CN202210760827.7A 2022-06-30 2022-06-30 Medical image-text data mutual detection method, device, equipment and readable storage medium Pending CN115408551A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210760827.7A CN115408551A (en) 2022-06-30 2022-06-30 Medical image-text data mutual detection method, device, equipment and readable storage medium
PCT/CN2022/141374 WO2024001104A1 (en) 2022-06-30 2022-12-23 Image-text data mutual-retrieval method and apparatus, and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210760827.7A CN115408551A (en) 2022-06-30 2022-06-30 Medical image-text data mutual detection method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115408551A true CN115408551A (en) 2022-11-29

Family

ID=84158085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210760827.7A Pending CN115408551A (en) 2022-06-30 2022-06-30 Medical image-text data mutual detection method, device, equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN115408551A (en)
WO (1) WO2024001104A1 (en)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11809822B2 (en) * 2020-02-27 2023-11-07 Adobe Inc. Joint visual-semantic embedding and grounding via multi-task training for image searching
CN113239153B (en) * 2021-05-26 2022-11-29 清华大学深圳国际研究生院 Text and image mutual retrieval method based on example masking
CN114357148A (en) * 2021-12-27 2022-04-15 之江实验室 Image text retrieval method based on multi-level network
CN114661933A (en) * 2022-03-08 2022-06-24 重庆邮电大学 Cross-modal retrieval method based on fetal congenital heart disease ultrasonic image-diagnosis report
CN114612749B (en) * 2022-04-20 2023-04-07 北京百度网讯科技有限公司 Neural network model training method and device, electronic device and medium
CN115408551A (en) * 2022-06-30 2022-11-29 苏州浪潮智能科技有限公司 Medical image-text data mutual detection method, device, equipment and readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024001104A1 (en) * 2022-06-30 2024-01-04 苏州元脑智能科技有限公司 Image-text data mutual-retrieval method and apparatus, and device and readable storage medium
CN117407518A (en) * 2023-12-15 2024-01-16 广州市省信软件有限公司 Information screening display method and system based on big data analysis
CN117407518B (en) * 2023-12-15 2024-04-02 广州市省信软件有限公司 Information screening display method and system based on big data analysis

Also Published As

Publication number Publication date
WO2024001104A1 (en) 2024-01-04

Similar Documents

Publication Publication Date Title
CN112015868B (en) Question-answering method based on knowledge graph completion
EP3567605A1 (en) Structured report data from a medical text report
CN112149414B (en) Text similarity determination method, device, equipment and storage medium
WO2018176035A1 (en) Method and system of building hospital-scale chest x-ray database for entity extraction and weakly-supervised classification and localization of common thorax diseases
CN115408551A (en) Medical image-text data mutual detection method, device, equipment and readable storage medium
CN110612522B (en) Establishment of solid model
CN109378066A (en) A kind of control method and control device for realizing disease forecasting based on feature vector
WO2023029506A1 (en) Illness state analysis method and apparatus, electronic device, and storage medium
US11354599B1 (en) Methods and systems for generating a data structure using graphical models
CN113486667A (en) Medical entity relationship joint extraction method based on entity type information
CN113779996B (en) Standard entity text determining method and device based on BiLSTM model and storage medium
CN112488301A (en) Food inversion method based on multitask learning and attention mechanism
US20220375576A1 (en) Apparatus and method for diagnosing a medical condition from a medical image
Zhang et al. Natural language generation and deep learning for intelligent building codes
CN113761192B (en) Text processing method, text processing device and text processing equipment
CN114758742A (en) Medical record information processing method and device, electronic equipment and storage medium
US11783244B2 (en) Methods and systems for holistic medical student and medical residency matching
CN115936014B (en) Medical entity code matching method, system, computer equipment and storage medium
CN116844731A (en) Disease classification method, disease classification device, electronic device, and storage medium
Mohamed et al. ImageCLEF 2020: An approach for Visual Question Answering using VGG-LSTM for Different Datasets.
CN114708952B (en) Image annotation method and device, storage medium and electronic equipment
CN115132372A (en) Term processing method, apparatus, electronic device, storage medium, and program product
CN112182253B (en) Data processing method, data processing equipment and computer readable storage medium
CN114461085A (en) Medical input recommendation method, device, equipment and storage medium
CN110633363A (en) Text entity recommendation method based on NLP and fuzzy multi-criterion decision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination