CN114334068B - Radiology report generation method, device, terminal and storage medium - Google Patents


Info

Publication number
CN114334068B
CN114334068B (application CN202111346347.8A)
Authority
CN
China
Prior art keywords
character
features
report
radiology report
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111346347.8A
Other languages
Chinese (zh)
Other versions
CN114334068A (en)
Inventor
张灵艳
陈志鸿
李米芳
万翔
朱记超
谢尚煌
孙崎元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Longgang Central Hospital; Shenzhen Longgang Central Hospital Group; Shenzhen Ninth People's Hospital; Acupuncture Research Institute Of Shenzhen Longgang Central Hospital
Shenzhen Research Institute of Big Data SRIBD
Original Assignee
Shenzhen Longgang Central Hospital; Shenzhen Longgang Central Hospital Group; Shenzhen Ninth People's Hospital; Acupuncture Research Institute Of Shenzhen Longgang Central Hospital
Shenzhen Research Institute of Big Data SRIBD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Longgang Central Hospital Shenzhen Longgang Central Hospital Group Shenzhen Ninth People's Hospital Acupuncture Research Institute Of Shenzhen Longgang Central Hospital, Shenzhen Research Institute of Big Data SRIBD filed Critical Shenzhen Longgang Central Hospital Shenzhen Longgang Central Hospital Group Shenzhen Ninth People's Hospital Acupuncture Research Institute Of Shenzhen Longgang Central Hospital
Priority to CN202111346347.8A priority Critical patent/CN114334068B/en
Publication of CN114334068A publication Critical patent/CN114334068A/en
Application granted granted Critical
Publication of CN114334068B publication Critical patent/CN114334068B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The invention discloses a radiology report generation method, device, terminal and storage medium. In the radiology report generation method, an image to be processed is input into a trained report generation model comprising a visual feature encoder, a potential feature encoder and a layered decoder. After the visual features of the image to be processed are extracted by the visual feature encoder, the potential features are extracted by the potential feature encoder. A multi-layer attention mechanism is adopted in the layered decoder, so that the character features and sentence features of the report are alternately aggregated and distributed and the potential features and visual features are encoded into the semantic features of the report, which ensures the accuracy of the next character predicted from the existing characters of the report. Generating the radiology report of the image to be processed with a deep learning model improves the efficiency of compiling radiology reports.

Description

Radiology report generation method, device, terminal and storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for generating a radiology report.
Background
Radiological images are widely used in the medical field, and diagnostic reports describing them must be compiled; however, compiling a radiology report is generally time-consuming and requires comprehensive knowledge and experience to interpret the radiological image.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a radiology report generation method, aiming to solve the problem that compiling radiology reports in the prior art is time-consuming.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect of the present invention, a method for generating a radiology report is provided, where the method includes:
acquiring an image to be processed, and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer;
acquiring the visual features of the image to be processed through the visual feature encoder, inputting the visual features of the image to be processed into the potential feature encoder, and acquiring the potential features corresponding to the image to be processed output by the potential feature encoder;
acquiring an embedded feature of each character in a current radiology report according to the target embedded matrix, inputting each embedded feature into the first attention layer, and acquiring a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer;
inputting the first aggregation feature of each sentence of the current radiology report and the potential feature of the image to be processed into the second attention layer, and coding the potential feature of the image to be processed into the semantic feature of each sentence of the current radiology report through the second attention layer to obtain a second aggregation feature of each sentence of the current radiology report;
inputting each second aggregation feature, each first character-level feature and the visual feature of the to-be-processed image into the third attention layer, coding the visual feature of the to-be-processed image into the semantic feature of each character of the current radiology report through the third attention layer to obtain a second character-level feature corresponding to each character of the current radiology report, and obtaining the next character in the current radiology report according to each second character-level feature;
repeatedly executing the step of obtaining the embedding characteristics of each character in the current radiology report according to the target embedding matrix until a preset end character is obtained, and obtaining a target radiology report corresponding to the image to be processed;
wherein the initial content of the radiology report is a preset sentence marking character.
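The iterative generation procedure above can be sketched as follows. Here `next_char_fn` is a hypothetical stand-in for one full pass through the three attention layers, and the marker strings are purely illustrative, not the patent's actual tokens:

```python
# Sketch of the autoregressive generation loop: the report starts as
# the preset sentence-marking character, and characters are appended
# until the preset end character is produced.
SENT_MARK, END_MARK = "[sent]", "[end]"

def generate_report(next_char_fn, max_len=100):
    report = [SENT_MARK]                 # initial content: sentence marker
    for _ in range(max_len):
        nxt = next_char_fn(report)       # predict next char from all current chars
        if nxt == END_MARK:              # preset end character terminates the loop
            break
        report.append(nxt)
    return report

# Dummy predictor that emits a fixed short "report" for illustration.
def dummy_next(report):
    script = ["lung", "texture", END_MARK]
    return script[len(report) - 1]
```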
The radiology report generation method, wherein the report generation model is obtained by training on a preset data set, the preset data set comprises a plurality of groups of training samples, and each group of training samples comprises a sample image and a corresponding sample radiology report; before inputting the image to be processed into the trained report generation model, the method includes:
selecting a target training sample in the preset data set;
inputting a sample image in the target training sample into the report generation model, and acquiring the visual feature of the sample image;
inputting a sample radiology report in the target training sample into a text encoder, acquiring text features of the sample radiology report, inputting the text features into the potential feature encoder, and acquiring the potential features corresponding to the sample radiology report;
inputting the visual features of the sample image, a first character in the sample radiology report and the potential features corresponding to the sample radiology report into the layered decoder to obtain a prediction report corresponding to the sample radiology report;
obtaining the loss of the target training sample according to the prediction report, and updating the network parameters of the report generation model according to the loss of the target training sample;
and re-executing the step of selecting the target training sample in the preset data set until the parameters of the report generation model are converged.
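A minimal sketch of this training loop, assuming a hypothetical `model_step` callable that runs the forward pass, updates the network parameters, and returns the sample loss. The convergence test on the loss change is an illustrative choice; the patent only requires that the parameters converge:

```python
import random

def train(model_step, dataset, max_epochs=10, tol=1e-4):
    """Repeatedly select a target training sample, compute its loss,
    update parameters, and stop once the loss change falls below a
    convergence tolerance."""
    prev = float("inf")
    for _ in range(max_epochs):
        sample = random.choice(dataset)   # select a target training sample
        loss = model_step(sample)         # forward pass + parameter update
        if abs(prev - loss) < tol:        # parameters considered converged
            break
        prev = loss
    return prev
```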
The method for generating a radiology report, wherein the obtaining of the loss of the target training sample according to the prediction report includes:
acquiring a first probability distribution according to the prediction report, wherein the first probability distribution is the probability distribution of the prediction report being the sample radiology report under the joint condition of the potential features corresponding to the text features and the sample images;
inputting the visual features of the sample image into the potential feature encoder, and obtaining a second probability distribution according to the output of the potential feature encoder, wherein the second probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample image;
obtaining a third probability distribution according to the potential features corresponding to the text features, wherein the third probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample radiology report;
obtaining a loss of the target training sample based on the first probability distribution, the second probability distribution, and the third probability distribution.
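The three distributions are characteristic of a conditional variational formulation: the loss combines a reconstruction term (the first distribution) with a divergence tying the report-conditioned latent distribution (the third) to the image-conditioned one (the second). A sketch under the assumption of diagonal Gaussian parameterizations, which the patent does not specify:

```python
import numpy as np

def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians: here q stands for the
    third distribution (latents given the report) and p for the second
    (latents given the image). The Gaussian form is an assumption."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def sample_loss(recon_nll, mu_q, logvar_q, mu_p, logvar_p):
    # Total loss: negative log-likelihood under the first distribution
    # plus the KL term relating the second and third distributions.
    return recon_nll + kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p)
```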
The radiology report generating method, wherein the obtaining of the first character-level feature of each character and the first aggregate feature of each sentence in the current radiology report output by the first attention layer includes:
and taking the first character-level features corresponding to the preset sentence marking characters of each sentence in the current radiology report as the first aggregation features of each sentence in the radiology report.
The radiology report generating method, wherein the encoding, by the second attention layer, of the potential features of the to-be-processed image into the semantic features of each sentence of the current radiology report to obtain the second aggregate features of each sentence of the current radiology report includes:
generating a query embedding of a sentence according to the first aggregated features of the sentence;
generating key embedding and value embedding of sentences according to the potential features of the images to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each sentence to obtain the second aggregate features of each sentence.
The radiology report generation method, wherein the encoding, by the third attention layer, of the visual features of the to-be-processed image into the semantic features of each character of the current radiology report to obtain second character-level features corresponding to each character of the current radiology report includes:
generating query embeddings of the literal characters according to the first character-level features of the literal characters, and generating query embeddings of the preset sentence marking characters of the sentence according to the second aggregation features of the sentence;
generating key embedding and value embedding of characters according to the visual characteristics of the image to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each character to obtain the second character-level feature of each character.
The radiology report generating method, wherein the layered decoder further comprises a feed-forward layer, the feed-forward layer comprising at least one linear transformation layer; the obtaining of a next character in the current radiology report according to each of the second character-level features includes:
inputting each of the second character-level features to the feed-forward layer;
and obtaining the next character in the current radiology report according to the output of the feedforward layer.
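A minimal sketch of this step, assuming the feed-forward layer is two linear transformations with a ReLU between them and that the next character is the vocabulary entry with the highest output score; all weight names are illustrative, not the patent's:

```python
import numpy as np

def feed_forward(h, w1, b1, w2, b2):
    """One plausible reading of "at least one linear transformation
    layer": linear transform, ReLU, linear transform."""
    return np.maximum(h @ w1 + b1, 0.0) @ w2 + b2

def next_character(h, w1, b1, w2, b2, vocab):
    """Project a second character-level feature to vocabulary logits
    and take the most probable entry as the next report character."""
    logits = feed_forward(h, w1, b1, w2, b2)
    return vocab[int(np.argmax(logits))]
```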
In a second aspect of the present invention, there is provided a radiology report generation apparatus including:
the image acquisition module is used for acquiring an image to be processed and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer;
the potential feature extraction module is used for acquiring the visual features of the image to be processed through the visual feature encoder, inputting the visual features of the image to be processed into the potential feature encoder, and acquiring the potential features corresponding to the image to be processed output by the potential feature encoder;
a first attention module, configured to obtain an embedding feature of each character in a current radiology report according to the target embedding matrix, input each of the embedding features to the first attention layer, and obtain a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer;
a second attention module, configured to input the first aggregate features of each sentence of the current radiology report and the potential features of the to-be-processed image into the second attention layer, and encode the potential features of the to-be-processed image into semantic features of each sentence of the current radiology report through the second attention layer, so as to obtain second aggregate features of each sentence of the current radiology report;
a third attention module, configured to input each of the second aggregate features, each of the first character-level features, and the visual features of the to-be-processed image into the third attention layer, encode the visual features of the to-be-processed image into semantic features of each character of the current radiology report through the third attention layer, obtain a second character-level feature corresponding to each character of the current radiology report, and obtain a next character in the current radiology report according to each of the second character-level features;
the circulation module is used for calling the first attention module to re-execute the step of acquiring the embedded features of each character in the current radiology report after the third attention module outputs the next character of the current radiology report until a preset end character is acquired, and obtaining a target radiology report corresponding to the image to be processed;
wherein the initial content of the radiology report is a preset sentence marker character.
In a third aspect of the present invention, there is provided a terminal comprising a processor, and a computer-readable storage medium communicatively connected to the processor, the computer-readable storage medium being adapted to store a plurality of instructions, and the processor being adapted to invoke the instructions in the computer-readable storage medium to perform the steps of the radiology report generation method according to any one of the above.
In a fourth aspect of the invention, there is provided a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the radiology report generation method according to any one of the above.
Compared with the prior art, the invention provides a radiology report generation method, device, terminal and storage medium. In the radiology report generation method, an image to be processed is input into a trained report generation model comprising a visual feature encoder, a potential feature encoder and a layered decoder. After the visual features of the image to be processed are extracted by the visual feature encoder, the potential features are extracted by the potential feature encoder. A multi-layer attention mechanism is adopted in the layered decoder, so that the character features and sentence features of the report are alternately aggregated and distributed and the potential features and visual features are encoded into the semantic features of the report, ensuring the accuracy of the next character predicted from the existing characters of the report. Generating the radiology report of the image to be processed with a deep learning model improves the efficiency of compiling radiology reports.
Drawings
FIG. 1 is a flow chart of an embodiment of a radiology report generation method provided by the present invention;
FIG. 2 is a schematic diagram of the report generation model training process in the radiology report generation method provided by the present invention;
fig. 3 is a schematic diagram illustrating a similarity calculation method for sentences in an evaluation process of a generated radiology report in an embodiment of a radiology report generation method according to the present invention;
fig. 4 is an exemplary diagram of a radiology report generated by the radiology report generating method provided by the present invention;
FIG. 5 is a statistical data plot of a data set employed during an experiment for a radiology report generation method provided by the present invention;
FIG. 6 is a schematic diagram of experimental results of a radiology report generation method provided by the present invention;
FIG. 7 is a schematic diagram of a configuration of an embodiment of a radiology report generating device provided by the present invention;
fig. 8 is a schematic diagram illustrating an embodiment of a terminal according to the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
The radiology report generating method provided by the invention can be applied to a terminal with computing power, the terminal can execute the radiology report generating method provided by the invention to generate a radiology report, and the terminal can be, but is not limited to, various computers, mobile terminals, intelligent household appliances, wearable devices and the like.
Example one
As shown in fig. 1, one embodiment of the radiology report generation method includes the steps of:
s100, acquiring an image to be processed, and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer.
Specifically, the image to be processed is a radiological image, such as an X-ray image. In the prior art, after reading the radiological image a doctor needs to write a radiology report describing it, for example: 'The texture of both lungs is slightly increased, and patchy fuzzy shadows are seen in both lower lung fields…'. Such a report requires an experienced doctor to carefully read and understand the image before writing it, which is inefficient. To solve this problem, the radiology report generation method provided in this embodiment constructs and trains a report generation model; the to-be-processed image for which a radiology report is needed is input into the trained report generation model, and the radiology report output by the model is obtained.
S200, acquiring the visual characteristics of the image to be processed through the visual characteristic encoder, inputting the visual characteristics of the image to be processed into the potential characteristic encoder, and acquiring the potential characteristics corresponding to the image to be processed output by the potential characteristic encoder.
After the image to be processed is input into the report generation model, the visual features of the image are first extracted by the visual feature encoder in the report generation model. Specifically, the visual feature encoder includes an initial feature extraction layer and a Transformer encoder; the initial feature extraction layer may adopt the structure of an existing image feature extraction model, for example a CNN network. The image to be processed can be decomposed into at least one region, the features of each region are extracted by the initial feature extraction layer, and the features output by the initial feature extraction layer can be concatenated into a long vector and organized into a sequence {x_1, x_2, ..., x_l, ..., x_L}, where x_l is the feature extracted for the l-th region and L is the number of regions. After the features of each region are extracted, in order to further summarize the visual features and explore the similarity between the features of the regions, a Transformer encoder is used to encode the region features together with a visual marker as an overall representation of the image. The process can be expressed as: v = f_ve(x_[VIS], x_1, x_2, ..., x_l, ..., x_L), where x_[VIS] denotes the visual marker, f_ve(·) denotes the Transformer encoder, and v denotes the collected visual features, i.e., the visual features of the image to be processed.
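The sequence assembly described above can be sketched as follows. Mean pooling stands in for the Transformer encoder f_ve purely to keep the example self-contained; it is not the patent's encoder:

```python
import numpy as np

def encode_visual(region_feats, vis_token):
    """Organise the L region features x_1..x_L into a sequence behind
    the visual marker x_[VIS], then aggregate them into an overall
    image representation v (here by mean pooling as a stand-in)."""
    seq = np.vstack([vis_token[None, :], region_feats])  # shape (L+1, d)
    v = seq.mean(axis=0)   # stand-in for the encoded summary feature
    return seq, v
```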
The potential feature encoder in the report generation model is used for extracting potential features according to the visual features of the images to be processed, wherein the potential features are potential representation features which are common to the images to be processed and radiology reports corresponding to the images to be processed, so that the radiology reports corresponding to the images to be processed can be obtained according to the potential features. The parameters of the visual encoder and the latent feature encoder are determined by training the report generation model in advance, and in order to enable the latent feature encoder to achieve the effect of outputting the latent features, when the report generation model is trained, embedding of a sample radiology report corresponding to a sample image is used as the input of the latent feature encoder for training, which will be described later in detail.
After the visual features are input into the potential feature encoder and the potential features output by the potential feature encoder are obtained, the radiology report generation method provided by this embodiment further includes:
s300, acquiring the embedded features of each character in the current radiology report according to the target embedded matrix, inputting each embedded feature into the first attention layer, and acquiring the first character-level features of each character and the first aggregation features of each sentence in the current radiology report output by the first attention layer.
Specifically, in this embodiment the next character is predicted from all characters in the current radiology report. The initial content of the radiology report is a preset sentence-marking character; that is, at first the embedding feature of the preset sentence-marking character is obtained from the target embedding matrix and input into the first attention layer. After each next character is obtained, the embedding features of the preset sentence-marking character and of all characters obtained so far are input into the first attention layer again, and finally a complete report is obtained.
Specifically, the first attention layer employs a self-attention mechanism: self-attention is executed according to the embedded features of the context characters of each character in the current radiology report, and the first character-level feature of the character is obtained. In the attention mechanism, a query embedding matrix (Q matrix), a key embedding matrix (K matrix) and a value embedding matrix (V matrix) are provided; the query embedding Q, key embedding K and value embedding V of each character are obtained through the corresponding matrices, and the attention output for a character is obtained from its Q together with the K and V of its context characters. The attention mechanism can be expressed as:
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
where d_k is the dimension of the key embeddings.
when the self-attention mechanism is executed, the inquiry embedding matrix, the key embedding matrix and the value embedding matrix corresponding to each character are obtained by multiplying the embedding matrix of each character with the inquiry embedding matrix, the key embedding matrix and the value embedding matrix in the first attention layer respectively.
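The attention formula and the Q/K/V projections described above can be sketched as follows (a single-head illustration, not the patent's full implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def self_attention(E, Wq, Wk, Wv):
    # The query, key and value embeddings of each character are obtained
    # by multiplying its embedding with the layer's Q, K and V matrices.
    return attention(E @ Wq, E @ Wk, E @ Wv)
```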
Specifically, the context range of a character in the self-attention mechanism adopted in the first attention layer may be the other characters belonging to the same sentence, a preset number of characters before and after the character, all other characters in the whole report, and so on.
In this embodiment, during training each sentence of the sample radiology report is preceded by the preset sentence-marking character, e.g. the character y_[sent]. After training is completed, when the report generation model generates the preset sentence-marking character as the next character of the radiology report, this indicates that one sentence has ended and a new sentence begins; that is, the preset sentence-marking character exists before each sentence in the current radiology report. The obtaining of the first character-level feature of each character and the first aggregation feature of each sentence in the current radiology report output by the first attention layer comprises:
and taking the first character-level features corresponding to the preset sentence marking characters of each sentence in the current radiology report as the first aggregation features of each sentence in the radiology report.
For example, assume that the current radiology report is {y_[sent], y_1, y_2, ..., y_[sent], ..., y_t}, where y_t represents the t-th character of the current radiology report excluding the preset sentence-marking characters, and y_[sent] is the preset sentence-marking character. First the embedding of each character is obtained through the target embedding matrix; then the embedding features of the characters are input into the first attention layer, in which, for each character, self-attention is performed according to the embedded features of its context characters. This yields the first character-level features of the characters and the first aggregation features of the sentences in the current radiology report: {c_[sent1], c_1, c_2, ..., c_[sent2], ..., c_t}, where c_t represents the first character-level feature of the t-th character of the current radiology report excluding the preset sentence-marking characters, and c_[senti] represents the first aggregation feature of the i-th sentence in the current radiology report.
The parameters of the target embedding matrix and the parameters of the first attention layer (including the query embedding matrix, the key embedding matrix, and the value embedding matrix of the self-attention mechanism) are determined by training the report generation model in advance.
S400, inputting the first aggregation feature of each sentence of the current radiology report and the potential feature of the image to be processed into the second attention layer, and coding the potential feature of the image to be processed into the semantic feature of each sentence of the current radiology report through the second attention layer to obtain the second aggregation feature of each sentence of the current radiology report.
Specifically, the encoding, by the second attention layer, of the potential features of the to-be-processed image into the semantic features of each sentence of the current radiology report to obtain the second aggregation features of each sentence of the current radiology report includes:
generating a query embedding of a sentence according to the first aggregated features of the sentence;
generating key embedding and value embedding of sentences according to the potential features of the images to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each sentence to obtain the second aggregate features of each sentence.
In the second attention layer of the layered decoder, a multi-head attention mechanism is executed and semantic features are operated on at the sentence level. Specifically, for each sentence in the current radiology report, the query embedding of the sentence in the second attention layer is obtained from the first aggregation feature of the sentence and the query embedding matrix of the second attention layer; the key embedding and value embedding of the sentence are obtained from the potential features of the image to be processed and the key embedding matrix and value embedding matrix of the second attention layer; and the multi-head attention mechanism is executed according to the query embedding, key embedding and value embedding of each sentence to obtain the second aggregation feature of each sentence.
Through the second attention layer, the potential features of the image to be processed are encoded into the semantic features of each sentence of the current radiology report, so that predicting the next character from the semantic features of the current radiology report is more accurate.
Parameters in the second attention layer (including query embedding matrix, key embedding matrix, and value embedding matrix) are determined by training the report generation model in advance.
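A single-head sketch of the second attention layer, with queries taken from the sentence aggregation features and keys/values from the image's potential features. A real model would use multiple heads, and the weight names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sentence_cross_attention(sent_aggs, latent, Wq, Wk, Wv):
    """Each sentence's first aggregation feature produces a query that
    attends over the image's potential features, yielding the second
    aggregation feature of that sentence."""
    Q = sent_aggs @ Wq              # one query per sentence
    K, V = latent @ Wk, latent @ Wv # keys/values from potential features
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V
```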
Referring to fig. 1 again, the method for generating a radiology report provided in the present embodiment further includes the steps of:
s500, inputting the second aggregation features, the first character-level features and the visual features of the to-be-processed image into the third attention layer, coding the visual features of the to-be-processed image into the semantic features of each character of the current radiology report through the third attention layer to obtain second character-level features corresponding to each character of the current radiology report, and obtaining a next character in the current radiology report according to the second character-level features.
Specifically, in step S400 the first character-level features of the preset sentence-marking characters in the current radiology report have already been processed into the second aggregation features; therefore, each of the first character-level features in step S500 refers to the first character-level feature of each literal character. The encoding, by the third attention layer, of the visual features of the to-be-processed image into the semantic features of each character of the current radiology report to obtain a second character-level feature corresponding to each character of the current radiology report includes:
generating query embeddings of the literal characters according to the first character-level features of the literal characters, and generating query embeddings of the preset sentence marking characters of the sentence according to the second aggregation features of the sentence;
generating key embedding and value embedding of characters according to the visual features of the image to be processed;
performing a multi-head attention mechanism based on query embedding, key embedding, and value embedding of each character to obtain the second character-level features of each character.
In the third attention layer of the hierarchical decoder, a multi-head attention mechanism is executed and the semantic features are operated on at the character level. Specifically, for the literal characters other than the preset sentence marking characters, the query embedding of each literal character in the third attention layer is generated from the first character-level feature corresponding to that literal character and the query embedding matrix of the third attention layer; for each preset sentence marking character, the query embedding of the preset sentence marking character in the third attention layer is generated from the second aggregation feature of the sentence to which the preset sentence marking character belongs and the query embedding matrix of the third attention layer. In this way, a corresponding query embedding is generated for each character in the current radiology report.
Generating key embedding of each character in the third attention layer according to the visual feature of the image to be processed and the key embedding matrix of the third attention layer, generating value embedding of each character in the third attention layer according to the visual feature of the image to be processed and the value embedding matrix of the third attention layer, and executing a multi-head attention mechanism according to query embedding, key embedding and value embedding of each character in the third attention layer to obtain the second character-level feature of each character.
It can be seen that, through the above steps, the visual features of the image to be processed are encoded into the second character-level features of each character of the current radiology report, so that the second character-level features of each character include both the visual features and the potential features of the image to be processed. The hierarchical decoder thus makes full use of the potential features and the visual features of the image to be processed, which improves the accuracy of predicting the next character from the current radiology report and yields a more accurate radiology report.
The parameters of the third attention layer (including the query embedding matrix, the key embedding matrix, and the value embedding matrix in the third attention layer) are determined in advance by training the report generation model.
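A minimal sketch of the character-level step is given below, under assumed names: literal characters keep their first character-level feature as the query input, while each preset sentence marking character is replaced by the second aggregation feature of its sentence, and all characters then cross-attend to the visual features (single-head here for brevity):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def third_layer_queries(char_feats, sent_agg, is_marker, sent_of_char):
    """Build per-character query inputs: literal characters use their first
    character-level feature, while each preset sentence marking character
    uses the second aggregation feature of the sentence it belongs to."""
    q_in = char_feats.copy()
    for i, marker in enumerate(is_marker):
        if marker:
            q_in[i] = sent_agg[sent_of_char[i]]
    return q_in

def char_level_attention(q_in, visual, Wq, Wk, Wv):
    """Cross-attention from character queries to the visual features,
    producing the second character-level features (one row per character)."""
    Q, K, V = q_in @ Wq, visual @ Wk, visual @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V
```

In the embodiment this would run with multiple heads and trained matrices; the sketch only shows how the two kinds of query inputs are mixed.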
The hierarchical decoder further comprises a feed-forward layer, and the acquiring the next character in the current radiology report according to each second character-level feature comprises:
inputting each of the second character-level features to the feed-forward layer;
and obtaining the next character in the current radiology report according to the output of the feedforward layer.
Specifically, the feedforward layer includes at least one linear transformation layer; for example, the feedforward layer may include two linear transformation layers with a ReLU activation function disposed between them.
The hierarchical decoder also comprises a classification layer, which is a linear transformation followed by a softmax activation function. The output of the feedforward layer is input into the classification layer to obtain a distribution over the vocabulary, that is, the probability of the next character being each preset character, and the character with the largest probability is selected as the next character of the radiology report.
The parameters of the feed-forward layer and the classification layer are determined by training the report generation model in advance.
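The feedforward layer and classification layer just described can be sketched as follows; the shapes, names and single-vector input are assumptions of this illustration, not the trained parameters of the embodiment:

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    # two linear transformations with a ReLU activation in between
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def next_char(x, W1, b1, W2, b2, W_cls, vocab):
    """Feedforward layer, then a linear classification layer with softmax:
    the character with the largest probability is taken as the next character."""
    h = feed_forward(x, W1, b1, W2, b2)
    logits = h @ W_cls
    e = np.exp(logits - logits.max())
    probs = e / e.sum()                 # distribution over the vocabulary
    return vocab[int(np.argmax(probs))]
```

Since argmax is invariant under softmax, the probabilities are computed here only to mirror the description; a beam-search decoder would keep the full distribution instead.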
S600, repeatedly executing the step of obtaining the embedding features of each character in the current radiology report according to the target embedding matrix until a preset end character is obtained, so as to obtain the target radiology report corresponding to the image to be processed.
According to steps S300 to S600, the next character of the current radiology report may be generated; the next character is added to the current radiology report to update it, and then the embedding feature of each character in the updated radiology report is obtained according to the target embedding matrix, that is, step S300 is repeatedly performed, until the next character obtained is the preset end character. The preset end character may be set as any character that does not coincide with a literal character or the preset sentence marking character, for example "。" or "end".
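The repeat-until-end-character loop of step S600 can be sketched as below; the start and end characters and the `predict_next` callable (standing in for one pass of the hierarchical decoder) are assumptions of this illustration:

```python
def generate_report(predict_next, start_char="[SEP]", end_char="。", max_len=100):
    """Repeatedly predict the next character and append it to the current
    report until the preset end character (or a length cap) is reached."""
    report = [start_char]
    for _ in range(max_len):
        ch = predict_next(report)   # one decoder pass over the current report
        report.append(ch)
        if ch == end_char:
            break
    return "".join(report[1:])      # drop the initial sentence marking character
```

A length cap is added as a safeguard that the description does not mention, since an untrained model might never emit the end character.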
The following describes the training process of the report generation model. Specifically, the report generation model is obtained by training on a preset data set, the preset data set includes a plurality of groups of training samples, and each group of training samples includes a sample image and a sample radiology report corresponding to the sample image. Before the image to be processed is input into the trained report generation model, the method includes the following steps:
selecting a target training sample in the preset data set;
inputting a sample image in the target training sample into the report generation model, and acquiring the visual features of the sample image;
inputting a sample radiology report in the target training sample into a text encoder, acquiring text features of the sample radiology report, inputting the text features into the potential feature encoder, and acquiring the potential features corresponding to the sample radiology report;
inputting the visual features of the sample image, a first character in the sample radiology report and the potential features corresponding to the sample radiology report into the layered decoder to obtain a prediction report corresponding to the sample radiology report;
obtaining the loss of the target training sample according to the prediction report, and updating the network parameters of the report generation model according to the loss of the target training sample;
and re-executing the step of selecting the target training sample in the preset data set until the parameters of the report generation model are converged.
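The steps above can be sketched as a generic training loop; the sample-wise `compute_loss` and `update` callables and the loss-plateau convergence test are assumptions standing in for the model-specific details described later:

```python
def train(model, dataset, compute_loss, update, max_epochs=20, tol=1e-4):
    """Repeatedly select a training sample, compute the loss of its
    prediction report, and update the model parameters, stopping once the
    epoch loss stops changing (a proxy for parameter convergence)."""
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for sample in dataset:
            loss = compute_loss(model, sample)   # loss of the target training sample
            update(model, sample)                # update the network parameters
            epoch_loss += loss
        if abs(prev_loss - epoch_loss) < tol:    # parameters converged
            break
        prev_loss = epoch_loss
    return model
```

In the embodiment, `compute_loss` would be the probabilistic loss derived below and `update` a gradient step of an optimizer such as Adam.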
Specifically, the sample radiology report corresponding to the sample image in each set of training samples is a radiology report compiled by a doctor according to the sample image, that is, the sample radiology report can be regarded as the correct radiology report of the corresponding sample image; the first character of the sample radiology report is the preset sentence marking character, and the last character of the sample radiology report is the preset end character.
During training, the parameters of the report generation model are updated with one group of target training samples at a time. As shown in fig. 2, a text encoder is provided during training, and the parameters of the text encoder are updated together with the parameters of the report generation model; however, after the parameters of the report generation model converge, that is, after training is completed, the text encoder is not used in the process of generating the target radiology report corresponding to the image to be processed.
For a target training sample, the sample image in the target training sample is input to the visual encoder in the report generation model, and the visual features of the sample image are acquired; the specific process is consistent with the above-described process of acquiring the visual features of the image to be processed by the visual encoder. The sample radiology report in the target training sample is input into the text encoder. In the text encoder, an embedding matrix with the same parameters as the target embedding matrix in the report generation model is first adopted to obtain the embedding sequence of the characters of the sample radiology report, and a Transformer encoder then encodes the embedding of each character in the sample radiology report to obtain the text features of the sample radiology report. In one possible implementation, a text label y[TXT] may be added before the first character of the sample radiology report; the text label is also embedded and encoded.
After the text features of the sample radiology report are obtained, the text features are input to the potential feature encoder in the report generation model, and the potential features corresponding to the sample radiology report are obtained. The visual features of the sample image, the first character in the sample radiology report and the potential features corresponding to the sample radiology report are input to the layered decoder to obtain the prediction report corresponding to the sample radiology report. Specifically, the visual features of the sample image are taken as the visual features of the image to be processed in steps S300-S500, and the first character in the sample radiology report is taken as the initial content of the radiology report in steps S300-S500; through steps S300-S500, given the first character in the sample radiology report, the next character predicted by the report generation model is obtained, and thus the radiology report of the sample image predicted according to the current parameters of the report generation model, namely the prediction report, is obtained. It should be noted that, each time the next character is generated during training, either the predicted next character may be appended to the report content from which it was predicted, or the character at the corresponding position in the original sample radiology report may be appended instead, before the following character is predicted.
Obviously, in order for the report generation model to work better, the update direction of the parameters of the report generation model should be such that the prediction report generated by the model from the sample image and the sample radiology report is as close as possible to the sample radiology report, and the potential features obtained from the sample image are as close as possible to the potential features obtained from the sample radiology report. In order to capture the uncertainty of radiology reports and improve the generalization capability of the model, in this embodiment a probabilistic modeling approach is adopted to obtain the training loss of the report generation model, so as to capture the uncertainty, diversity and complex structure of radiology reports and make the output of the model more accurate.
The obtaining the loss of the target training sample according to the prediction report includes:
acquiring a first probability distribution according to the prediction report, wherein the first probability distribution is the probability distribution of the prediction report being the sample radiology report under the joint condition of the potential features corresponding to the text features and the sample image;
inputting the visual features of the sample image into the potential feature encoder, and acquiring a second probability distribution according to the output of the potential feature encoder, wherein the second probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample image;
obtaining a third probability distribution according to the potential features corresponding to the text features, wherein the third probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample radiology report;
obtaining a loss of the target training sample based on the first probability distribution, the second probability distribution, and the third probability distribution.
Based on probabilistic modeling, the objective function of the report generation model can be constructed as follows:
L_ELBO = log p_θ(Y|Z,I) - β·KL[q_θ(Z|Y) || p_θ(Z|I)]
wherein L_ELBO represents the value of the objective function, log p_θ(Y|Z,I) represents the model output probability distribution of the sample radiology report Y under the joint condition of the corresponding potential feature Z of the sample radiology report and the sample image I, q_θ(Z|Y) denotes the probability distribution of generating the potential feature Z under the condition of the sample radiology report Y, p_θ(Z|I) denotes the probability distribution of generating the potential feature Z under the condition of the sample image I, KL[q_θ(Z|Y) || p_θ(Z|I)] denotes the KL divergence between q_θ(Z|Y) and p_θ(Z|I), and β is a hyperparameter used to control the weight of the KL divergence. As explained above, in order for the report generation model to work better, the parameters of the report generation model should be updated in a direction such that the prediction report generated by the model from the sample image and the sample radiology report is closer to the sample radiology report; therefore, the larger the value of log p_θ(Y|Z,I) the better. Likewise, the closer the potential features obtained from the sample image are to those obtained from the sample radiology report, the better; therefore, the smaller the value of KL[q_θ(Z|Y) || p_θ(Z|I)] the better. In other words, the larger the value of the objective function corresponding to the target training sample, the smaller the training loss corresponding to that training sample, so the negative of the objective function value may be taken as the loss of the target training sample.
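Assuming, as is common in this kind of probabilistic modeling (the patent text does not state the distribution family), that q_θ(Z|Y) and p_θ(Z|I) are diagonal Gaussians parameterized by means and log-variances, the negated objective can be sketched as:

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL[ N(mu_q, var_q) || N(mu_p, var_p) ] for diagonal Gaussians."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def elbo_loss(log_likelihood, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Loss = -L_ELBO = -log p(Y|Z,I) + beta * KL[q(Z|Y) || p(Z|I)], where
    q(Z|Y) comes from the sample radiology report and p(Z|I) from the image."""
    elbo = log_likelihood - beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return -elbo
```

When the two distributions coincide, the KL term vanishes and the loss reduces to the negative log-likelihood, matching the discussion above.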
The network parameters of the report generation model are updated according to the loss of the target training sample; once the parameters of the report generation model converge, the training of the report generation model is completed, and the trained report generation model can be used for predicting the radiology report of the image to be processed, that is, generating the target radiology report according to the image to be processed.
The inventor also verified the effectiveness of the radiology report generation method provided by this embodiment through experiments, including both qualitative and quantitative experimental verification. In the qualitative experimental analysis, the target radiology reports generated by the radiology report generation method provided by this embodiment are compared directly, as shown in fig. 4: the radiology reports (Generated Sample1 and Generated Sample2 in the figure) generated for the same image to be processed by report generation models trained on two training sets are compared with the real radiology report (Ground report in the figure). It can be found that, compared with the real report, the reports generated by the radiology report generation method provided by this embodiment give an accurate description and cover the important findings in the real report. In addition, it can be seen that the report generation models obtained by training on the two data sets have two different styles: the sample radiology reports in the two data sets were written by research groups of two hospitals with different writing styles, and this difference is captured by the potential features, showing that the probabilistic modeling approach in this embodiment can capture the potential uncertainty between reports.
In the quantitative experiment, the radiology report generated by the radiology report generation method provided in this embodiment (hereinafter referred to as the candidate report) and the reference report are evaluated with a rule-based and model-based evaluation method (RMM). The candidate report and the reference report are respectively expressed as

S_c = {s_c^1, s_c^2, ..., s_c^N} and S_r = {s_r^1, s_r^2, ..., s_r^M}

wherein s_c^n represents the nth sentence in the candidate report S_c, s_r^m represents the mth sentence in the reference report S_r, and N and M represent the total number of sentences in the candidate report and the reference report, respectively. First, rule-based information extraction is performed: for the candidate and reference reports, information is extracted from the sentences according to anatomical positions. In practice, a large number of anatomical positions are collated to match the anatomical positions mentioned in the report, and one anatomical position can be extracted from each sentence, thus generating anatomical position information for the candidate and reference reports, which can be expressed as

A_c = {a_c^1, a_c^2, ..., a_c^N} and A_r = {a_r^1, a_r^2, ..., a_r^M}

wherein a_c^n represents the anatomical position in the nth sentence of the candidate report, a_r^m represents the anatomical position in the mth sentence of the reference report, and N and M represent the total number of sentences in the candidate report and the reference report, respectively.
After information extraction, sentences in the candidate report and the reference report are matched, and a pre-trained sentence embedding model is then applied to calculate a similarity score between each candidate sentence and its matched reference sentence. Specifically, as shown in fig. 3, BERTScore is adopted as the pre-trained sentence embedding, using contextual embeddings (i.e. BERT) to evaluate the similarity between two sentences. Formally, given two matched sentences s_c and s_r, two vector sequences {v_c^1, ..., v_c^K} and {v_r^1, ..., v_r^L} are generated from a pre-trained BERT model. Each vector in the sequence corresponding to s_c is matched with one vector in the sequence corresponding to s_r to calculate the recall, and each vector in the sequence corresponding to s_r is matched with one vector in the sequence corresponding to s_c to calculate the precision; greedy matching is performed so as to maximize the matching similarity score, so that each vector in one sentence is matched with the most similar vector in the other sentence. Finally, the similarity between the two sentences is calculated by combining the precision and the recall. This can be formulated as:

SIM(s_c, s_r) = 2·P·R / (P + R)

wherein SIM(s_c, s_r) represents the similarity between sentences s_c and s_r, v_c^i represents the ith vector in the vector sequence corresponding to s_c, v_r^j represents the jth vector in the vector sequence corresponding to s_r, R = (1/K)·Σ_i max_j (v_c^i · v_r^j) is the recall, and P = (1/L)·Σ_j max_i (v_c^i · v_r^j) is the precision.
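Assuming the token embeddings are L2-normalised (so dot products are cosine similarities), the greedy-matching similarity can be sketched in a few lines; the names and the F1 combination of precision and recall are this sketch's reading of the description:

```python
import numpy as np

def sentence_similarity(vc, vr):
    """BERTScore-style greedy matching: rows of vc and vr are normalised
    token embeddings of the two matched sentences."""
    sim = vc @ vr.T                      # pairwise cosine similarities
    recall = sim.max(axis=1).mean()      # each vector of s_c vs. its best match in s_r
    precision = sim.max(axis=0).mean()   # each vector of s_r vs. its best match in s_c
    return 2 * precision * recall / (precision + recall)
```

Identical sentences score 1.0; partially overlapping token sets score strictly between 0 and 1.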
A matching function MATCH(s, S) is defined, which represents the sentence in report S that matches the sentence s; specifically, the matching is based on the anatomical position information, and the matched sentence is the sentence in report S whose anatomical position is consistent with the anatomical position of s. An evaluation metric F_RMM for a candidate report is constructed as follows:

F_RMM = (1/N)·Σ_{n=1}^{N} SIM(s_c^n, MATCH(s_c^n, S_r))

The difference between the candidate report and the reference report is evaluated according to F_RMM, enabling a more accurate evaluation.
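The metric above can be sketched as follows; the handling of candidate sentences with no position match (scored 0) and the best-of-several-matches rule are assumptions, since the garbled formula in the source does not pin these cases down:

```python
def f_rmm(cand_sents, cand_pos, ref_sents, ref_pos, sim):
    """Average, over candidate sentences, of the similarity to the reference
    sentence whose anatomical position matches (0 when nothing matches)."""
    scores = []
    for s, p in zip(cand_sents, cand_pos):
        matches = [r for r, q in zip(ref_sents, ref_pos) if q == p]
        scores.append(max((sim(s, r) for r in matches), default=0.0))
    return sum(scores) / len(scores)
```

Here `sim` would be the sentence-embedding similarity; any callable returning a score in [0, 1] works.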
Experiments were performed on a Chinese radiology report dataset, SRIBD X-Ray, which contains 226347 cases, each with a frontal chest film and the corresponding report. To split the dataset, 10000 cases were randomly selected for validation, 100000 cases for testing, and the rest were used for training; the statistics of the dataset, including the number of cases, the average length of the reports (Report Len), the average length of the findings (Findings Len) and the average length of the conclusions (Impression Len), are shown in fig. 5. The comparative models used in the experiments were mainly recurrent models (i.e. ST, SAT, Att2all, AdaAtt and UpDown) and non-recurrent models (i.e. Trans, AoA and M2Trans); their performance was evaluated by conventional word-overlap metrics (WOMs, including BLEU, METEOR, ROUGE-L and CIDEr) and the RMM metric described above. Before the dataset was input into the model, the reports were processed at the character level and characters with a frequency of less than 10 were filtered out. The initial feature extraction layer in the visual encoder of the model is a ResNet101 pre-trained on ImageNet, which extracts 2048-dimensional patch features; three layers and eight attention heads are adopted for the visual encoder, the text encoder and the hierarchical decoder, with a 512-dimensional hidden state and random initialization, and the model is trained under cross-entropy loss using the Adam optimizer. The learning rate of the visual encoder and that of the other parameters are set to 5×10^-5 and 1×10^-4, respectively. In the generation process, the beam size is set to 3 to balance the effectiveness and efficiency of all models. The optimal values of the hyperparameters are obtained by evaluating the model on the validation sets of both data sets.
The results are shown in fig. 6; by comparison with all the other models, the report generation model of the present invention demonstrates its superiority. Although the most competitive model, AoA, and our model are both Transformer-based, our model is a significant improvement over it. The reason may be that AoA only focuses on improving the attention structure and does not model the uncertainty in the report. The results indicate that potential topic modeling and hierarchical decoding are critical to generating high-quality radiology reports.
In summary, this embodiment provides a method for generating a radiology report in which an image to be processed is input to a trained report generation model comprising a visual feature encoder, a potential feature encoder and a layered decoder. After the visual features of the image to be processed are extracted by the visual feature encoder, the potential feature encoder extracts the potential features, and a multi-layer attention mechanism is used in the layered decoder, so that the character features and sentence features of the report are alternately aggregated and distributed, and the potential features and visual features are encoded into the semantic features of the report. This ensures the accuracy of the next character predicted from the existing characters of the report, achieves generation of the radiology report of the image to be processed by a deep learning model, and improves the efficiency of compiling radiology reports.
It should be understood that, although the steps in the flowcharts shown in the figures of the present specification are shown in order as indicated by the arrows, the steps are not necessarily performed in that order. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a portion of the steps in the flowchart may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times; the order of performing these sub-steps or stages is not necessarily sequential, and they may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by hardware instructions of a computer program, which may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), rambus (Rambus) direct RAM (RDRAM), direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
Example two
Based on the foregoing embodiments, the present invention further provides a radiology report generating device, as shown in fig. 7, the radiology report generating device includes:
an image obtaining module, configured to obtain an image to be processed, and input the image to be processed into a trained report generation model, where the report generation model includes a visual feature encoder, a target embedding matrix, a latent feature encoder, and a layered decoder, and the layered decoder includes a first attention layer, a second attention layer, and a third attention layer, which is specifically described in embodiment one;
a potential feature extraction module, configured to obtain, by the visual feature encoder, a visual feature of the to-be-processed image, input the visual feature of the to-be-processed image to the potential feature encoder, and obtain a potential feature corresponding to the to-be-processed image output by the potential feature encoder, which is specifically described in embodiment one;
a first attention module, configured to obtain an embedding feature of each character in a current radiology report according to the target embedding matrix, input each embedding feature to the first attention layer, and obtain a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer, as described in embodiment one;
a second attention module, configured to input the first aggregate features of each current sentence of the radiology report and the latent features of the to-be-processed image into the second attention layer, and encode the latent features of the to-be-processed image into semantic features of each current sentence of the radiology report through the second attention layer, so as to obtain second aggregate features of each current sentence of the radiology report, as described in embodiment one;
a third attention module, configured to input each of the second aggregate features, each of the first character-level features, and the visual features of the to-be-processed image into the third attention layer, encode the visual features of the to-be-processed image into semantic features of each character of the current radiology report through the third attention layer, obtain a second character-level feature corresponding to each character of the current radiology report, and obtain a next character in the current radiology report according to each of the second character-level features, which is specifically described in embodiment one;
a circulation module, configured to invoke the first attention module to re-execute the step of obtaining the embedding feature of each character in the current radiology report after the third attention module outputs a next character of the current radiology report until a preset end character is obtained, so as to obtain a target radiology report corresponding to the to-be-processed image, which is specifically described in embodiment one.
EXAMPLE III
Based on the above embodiment, the present invention further provides a terminal, as shown in fig. 8, where the terminal includes a processor 10 and a memory 20. Fig. 8 shows only some of the components of the terminal, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may also be an external storage device of the terminal in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various data. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 has stored thereon a radiology report generating program 30, and the radiology report generating program 30 is executable by the processor 10 to implement the radiology report generating method of the present application.
The processor 10 may be a Central Processing Unit (CPU), microprocessor or other chip in some embodiments, and is used for running the program codes stored in the memory 20 or processing data, for example executing the radiology report generation method described above.
In one embodiment, the following steps are implemented when the processor 10 executes the radiology report generation program 30 in the memory 20:
acquiring an image to be processed, and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer;
acquiring the visual features of the image to be processed through the visual feature encoder, inputting the visual features of the image to be processed into the potential feature encoder, and acquiring the potential features corresponding to the image to be processed output by the potential feature encoder;
acquiring an embedded feature of each character in a current radiology report according to the target embedded matrix, inputting each embedded feature into the first attention layer, and acquiring a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer;
inputting the first aggregation feature of each sentence of the current radiology report and the potential feature of the image to be processed into the second attention layer, and coding the potential feature of the image to be processed into the semantic feature of each sentence of the current radiology report through the second attention layer to obtain a second aggregation feature of each sentence of the current radiology report;
inputting each second aggregation feature, each first character-level feature and the visual feature of the to-be-processed image into the third attention layer, coding the visual feature of the to-be-processed image into the semantic feature of each character of the current radiology report through the third attention layer to obtain a second character-level feature corresponding to each character of the current radiology report, and obtaining a next character in the current radiology report according to each second character-level feature;
repeatedly executing the step of acquiring the embedded features of each character in the current radiology report according to the target embedding matrix until a preset end character is obtained, and obtaining a target radiology report corresponding to the image to be processed;
wherein the initial content of the radiology report is a preset sentence marking character.
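The generation steps above amount to a standard autoregressive decoding loop. The following is a minimal illustrative sketch, not the patented implementation; the component callables (`visual_encoder`, `latent_encoder`, `decoder_step`) and the marker strings are hypothetical stand-ins for the trained modules:

```python
# Illustrative sketch of the generation loop described above (hypothetical
# stand-ins for the trained modules; not the actual patented implementation).

END_CHAR = "<end>"   # preset end character
SENT_MARK = "<s>"    # preset sentence marking character (initial report content)
MAX_LEN = 512        # safety cap so the loop always terminates

def generate_report(image, visual_encoder, latent_encoder, decoder_step):
    """Start from the sentence marking character and repeatedly ask the
    layered decoder for the next character until the end character appears."""
    visual = visual_encoder(image)    # visual features of the image to be processed
    latent = latent_encoder(visual)   # potential (latent) features
    report = [SENT_MARK]              # initial content is the preset marker
    for _ in range(MAX_LEN):
        next_char = decoder_step(report, visual, latent)
        if next_char == END_CHAR:
            break
        report.append(next_char)
    return "".join(report[1:])        # target radiology report, marker stripped
```

In use, `decoder_step` would run the target embedding matrix, the three attention layers, and the feed-forward layer on the partial report at each iteration.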
The report generation model is obtained by training according to a preset data set, the preset data set comprises a plurality of groups of training samples, and each group of training samples comprises a sample image and a corresponding sample radiology report; before inputting the image to be processed into the trained report generation model, the method includes:
selecting a target training sample in the preset data set;
inputting a sample image in the target training sample into the report generation model, and acquiring the visual features of the sample image;
inputting a sample radiology report in the target training sample into a text encoder, acquiring text features of the sample radiology report, inputting the text features into the potential feature encoder, and acquiring the potential features corresponding to the sample radiology report;
inputting the visual features of the sample image, a first character in the sample radiology report, and the potential features corresponding to the sample radiology report into the layered decoder to obtain a prediction report corresponding to the sample radiology report;
obtaining the loss of the target training sample according to the prediction report, and updating the network parameters of the report generation model according to the loss of the target training sample;
and re-executing the step of selecting the target training sample in the preset data set until the parameters of the report generation model are converged.
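As a rough sketch, the training procedure above reduces to the following loop. All callables are hypothetical placeholders for the model's forward pass, parameter update, and convergence test; none of their names come from the patent:

```python
import random

def train_report_model(dataset, forward_loss, update_params, converged):
    """Training loop from the description: repeatedly select a target
    training sample (a sample image and its sample radiology report),
    compute the loss of the resulting prediction report, and update the
    network parameters until they converge. All callables are
    hypothetical stand-ins for the real modules."""
    steps = 0
    while not converged(steps):
        sample_image, sample_report = random.choice(dataset)
        loss = forward_loss(sample_image, sample_report)  # prediction report -> loss
        update_params(loss)                               # update network parameters
        steps += 1
    return steps
```

Here `forward_loss` would encode the sample report with the text encoder, obtain its latent features, and run the layered decoder, as the steps above describe.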
Wherein the obtaining a loss of the target training sample according to the prediction report comprises:
acquiring a first probability distribution according to the prediction report, wherein the first probability distribution is the probability distribution of the prediction report being the sample radiology report under the joint condition of the potential features corresponding to the text features and the sample images;
inputting the visual features of the sample image into the potential feature encoder, and acquiring a second probability distribution according to the output of the potential feature encoder, wherein the second probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample image;
obtaining a third probability distribution according to the potential features corresponding to the text features, wherein the third probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample radiology report;
obtaining a loss of the target training sample according to the first probability distribution, the second probability distribution, and the third probability distribution.
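The description does not spell out the loss formula, but the three distributions are the usual ingredients of a conditional-VAE-style evidence lower bound: a reconstruction term from the first distribution, plus a KL term pulling the report-conditioned posterior (third distribution) toward the image-conditioned prior (second distribution). The sketch below assumes diagonal-Gaussian latent distributions, which the patent does not state explicitly; it is an illustration of that common instantiation, not the patented loss:

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians given as lists of
    per-dimension means and variances."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def training_loss(recon_log_prob, post_mu, post_var, prior_mu, prior_var):
    """Hypothetical ELBO-style loss: negative log-likelihood of the sample
    report under the first distribution, plus KL between the third
    distribution (posterior, conditioned on the report) and the second
    distribution (prior, conditioned on the sample image)."""
    return -recon_log_prob + gaussian_kl(post_mu, post_var, prior_mu, prior_var)
```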
Wherein the obtaining of the first character-level feature of each character and the first aggregate feature of each sentence in the current radiology report output by the first attention layer comprises:
taking the first character-level feature corresponding to the preset sentence marking character of each sentence in the current radiology report as the first aggregation feature of each sentence in the radiology report.
Wherein the encoding, by the second attention layer, the latent features of the image to be processed into the semantic features of each sentence of the current radiology report to obtain a second aggregation feature of each sentence of the current radiology report includes:
generating a query embedding of each sentence according to the first aggregation feature of the sentence;
generating key embedding and value embedding of sentences according to the potential features of the images to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each sentence to obtain the second aggregation feature of each sentence.
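The second attention layer is thus cross-attention: queries come from the sentence-level first aggregation features, while keys and values come from the image's latent features. A minimal NumPy sketch of multi-head scaled dot-product attention follows; the learned per-head projection matrices are omitted for brevity, so this is a simplified illustration rather than the patented layer:

```python
import numpy as np

def multi_head_attention(queries, keys, values, num_heads):
    """Scaled dot-product attention run independently per head.
    queries: (num_sentences, d) first aggregation features;
    keys/values: (num_latents, d) derived from the latent features."""
    d = queries.shape[-1]
    assert d % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = d // num_heads
    out = np.zeros_like(queries, dtype=float)
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        q, k, v = queries[:, sl], keys[:, sl], values[:, sl]
        scores = q @ k.T / np.sqrt(head_dim)
        # numerically stable softmax over the key dimension
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[:, sl] = w @ v
    return out
```

The third attention layer applies the same mechanism with character-level queries against keys and values derived from the visual features.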
Wherein, the encoding, by the third attention layer, the visual feature of the to-be-processed image into the semantic feature of each character of the current radiology report to obtain a second character-level feature corresponding to each character of the current radiology report includes:
generating the query embedding of each ordinary text character according to its first character-level feature, and generating the query embedding of the preset sentence marking character of each sentence according to the second aggregation feature of that sentence;
generating key embedding and value embedding of characters according to the visual features of the image to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each character to obtain the second character-level feature of each character.
Wherein the layered decoder further comprises a feed-forward layer comprising at least one linear transformation layer; the obtaining a next character in the current radiology report according to each of the second character-level features includes:
inputting each of the second character-level features to the feed-forward layer;
and obtaining the next character in the current radiology report according to the output of the feed-forward layer.
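This final step can be sketched as a linear projection of the last position's second character-level feature onto the vocabulary, followed by a greedy pick. The weight matrix `W`, bias `b`, and `vocab` below are hypothetical learned parameters, and the single projection stands in for the "at least one linear transformation layer"; a real decoder could also sample instead of taking the argmax:

```python
import numpy as np

def next_character(second_char_features, W, b, vocab):
    """Feed the character-level features through a linear transformation
    layer and return the most probable next character for the last position.
    second_char_features: (seq_len, d); W: (d, vocab_size); b: (vocab_size,)."""
    logits = second_char_features[-1] @ W + b   # linear transformation layer
    return vocab[int(np.argmax(logits))]
```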
Example four
The present invention also provides a computer readable storage medium having stored thereon one or more programs, the one or more programs being executable by one or more processors to perform the steps of the radiology report generation method described above.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A radiology report generation method, the method comprising:
acquiring an image to be processed, and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer;
acquiring the visual features of the image to be processed through the visual feature encoder, inputting the visual features of the image to be processed into the potential feature encoder, and acquiring the potential features corresponding to the image to be processed output by the potential feature encoder;
acquiring an embedded feature of each character in a current radiology report according to the target embedded matrix, inputting each embedded feature into the first attention layer, and acquiring a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer;
inputting the first aggregation feature of each sentence of the current radiology report and the potential features of the image to be processed into the second attention layer, and encoding the potential features of the image to be processed into the semantic features of each sentence of the current radiology report through the second attention layer to obtain a second aggregation feature of each sentence of the current radiology report;
inputting each second aggregation feature, each first character-level feature and the visual features of the image to be processed into the third attention layer, encoding the visual features of the image to be processed into the semantic features of each character of the current radiology report through the third attention layer to obtain a second character-level feature corresponding to each character of the current radiology report, and obtaining the next character in the current radiology report according to each second character-level feature;
repeatedly executing the step of obtaining the embedding characteristics of each character in the current radiology report according to the target embedding matrix until a preset end character is obtained, and obtaining a target radiology report corresponding to the image to be processed;
wherein the initial content of the radiology report is a preset sentence marking character.
2. The radiology report generation method of claim 1, wherein the report generation model is trained according to a preset data set, the preset data set includes a plurality of groups of training samples, and each group of training samples includes a sample image and a corresponding sample radiology report; before inputting the image to be processed into the trained report generation model, the method includes:
selecting a target training sample in the preset data set;
inputting a sample image in the target training sample into the report generation model, and acquiring the visual features of the sample image;
inputting a sample radiology report in the target training sample into a text encoder, acquiring text features of the sample radiology report, inputting the text features into the potential feature encoder, and acquiring the potential features corresponding to the sample radiology report;
inputting the visual features of the sample image, a first character in the sample radiology report, and the potential features corresponding to the sample radiology report to the layered decoder to obtain a prediction report corresponding to the sample radiology report;
obtaining the loss of the target training sample according to the prediction report, and updating the network parameters of the report generation model according to the loss of the target training sample;
and re-executing the step of selecting the target training sample in the preset data set until the parameters of the report generation model are converged.
3. The radiology report generation method of claim 2, wherein the obtaining the loss of the target training sample from the prediction report comprises:
acquiring a first probability distribution according to the prediction report, wherein the first probability distribution is the probability distribution of the prediction report as the sample radiology report under the joint condition of the potential feature corresponding to the text feature and the sample image;
inputting the visual features of the sample image into the potential feature encoder, and acquiring a second probability distribution according to the output of the potential feature encoder, wherein the second probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample image;
obtaining a third probability distribution according to the potential features corresponding to the text features, wherein the third probability distribution is the probability distribution of the potential features corresponding to the text features under the condition of the sample radiology report;
obtaining a loss of the target training sample based on the first probability distribution, the second probability distribution, and the third probability distribution.
4. The radiology report generation method of claim 1, wherein the obtaining a first character-level feature for each character and a first aggregate feature for each sentence in the current radiology report output by the first attention layer comprises:
taking the first character-level feature corresponding to the preset sentence marking character of each sentence in the current radiology report as the first aggregation feature of each sentence in the radiology report.
5. The radiology report generation method of claim 1, wherein the encoding, by the second attention layer, the latent features of the to-be-processed image into the semantic features of each sentence of the current radiology report to obtain a second aggregation feature of each sentence of the current radiology report comprises:
generating a query embedding of each sentence according to the first aggregation feature of the sentence;
generating key embedding and value embedding of sentences according to the potential features of the images to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each sentence to obtain the second aggregation feature of each sentence.
6. The radiology report generation method of claim 1, wherein the encoding, by the third attention layer, the visual features of the to-be-processed image into semantic features of each character of the current radiology report to obtain second character-level features corresponding to each character of the current radiology report comprises:
generating the query embedding of each ordinary text character according to its first character-level feature, and generating the query embedding of the preset sentence marking character of each sentence according to the second aggregation feature of that sentence;
generating key embedding and value embedding of characters according to the visual features of the image to be processed;
performing a multi-head attention mechanism based on the query embedding, key embedding, and value embedding of each character to obtain the second character-level feature of each character.
7. The radiology report generation method of claim 1, wherein the layered decoder further comprises a feed-forward layer comprising at least one linear transformation layer; the obtaining a next character in the current radiology report according to each of the second character-level features includes:
inputting each of the second character-level features to the feed-forward layer;
and obtaining the next character in the current radiology report according to the output of the feed-forward layer.
8. A radiology report generating device, comprising:
the image acquisition module is used for acquiring an image to be processed and inputting the image to be processed into a trained report generation model, wherein the report generation model comprises a visual feature encoder, a target embedding matrix, a potential feature encoder and a layered decoder, and the layered decoder comprises a first attention layer, a second attention layer and a third attention layer;
the potential feature extraction module is used for acquiring the visual features of the image to be processed through the visual feature encoder, inputting the visual features of the image to be processed into the potential feature encoder, and acquiring the potential features corresponding to the image to be processed output by the potential feature encoder;
a first attention module, configured to obtain an embedding feature of each character in a current radiology report according to the target embedding matrix, input each embedding feature to the first attention layer, and obtain a first character-level feature of each character and a first aggregation feature of each sentence in the current radiology report output by the first attention layer;
a second attention module, configured to input the first aggregation feature of each sentence of the current radiology report and the latent features of the to-be-processed image into the second attention layer, and encode the latent features of the to-be-processed image into semantic features of each sentence of the current radiology report through the second attention layer, so as to obtain a second aggregation feature of each sentence of the current radiology report;
a third attention module, configured to input each of the second aggregate features, each of the first character-level features, and the visual features of the to-be-processed image into the third attention layer, encode the visual features of the to-be-processed image into semantic features of each character of the current radiology report through the third attention layer, obtain a second character-level feature corresponding to each character of the current radiology report, and obtain a next character in the current radiology report according to each of the second character-level features;
the circulation module is used for calling the first attention module to re-execute the step of acquiring the embedded features of each character in the current radiology report after the third attention module outputs the next character of the current radiology report until a preset end character is acquired, and obtaining a target radiology report corresponding to the image to be processed;
wherein the initial content of the radiology report is a preset sentence marking character.
9. A terminal, characterized in that the terminal comprises: a processor, a computer readable storage medium communicatively connected to the processor, the computer readable storage medium adapted to store a plurality of instructions, the processor adapted to invoke the instructions in the computer readable storage medium to perform the steps of implementing the radiology report generation method of any one of claims 1-7 above.
10. A computer readable storage medium, storing one or more programs, the one or more programs being executable by one or more processors for performing the steps of the radiology report generation method of any one of claims 1-7.
CN202111346347.8A 2021-11-15 2021-11-15 Radiology report generation method, device, terminal and storage medium Active CN114334068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111346347.8A CN114334068B (en) 2021-11-15 2021-11-15 Radiology report generation method, device, terminal and storage medium


Publications (2)

Publication Number Publication Date
CN114334068A CN114334068A (en) 2022-04-12
CN114334068B true CN114334068B (en) 2022-11-01

Family

ID=81044900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111346347.8A Active CN114334068B (en) 2021-11-15 2021-11-15 Radiology report generation method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114334068B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545302A (en) * 2018-10-22 2019-03-29 复旦大学 A kind of semantic-based medical image report template generation method
CN110111864A (en) * 2019-04-15 2019-08-09 中山大学 A kind of medical report generation model and its generation method based on relational model
CN111126024A (en) * 2018-10-12 2020-05-08 西门子医疗有限公司 Statement generation
WO2020121308A1 (en) * 2018-12-11 2020-06-18 Cvaid Ltd. Systems and methods for diagnosing a stroke condition
CN112529857A (en) * 2020-12-03 2021-03-19 重庆邮电大学 Ultrasonic image diagnosis report generation method based on target detection and strategy gradient
CN112992308A (en) * 2021-03-25 2021-06-18 腾讯科技(深圳)有限公司 Training method of medical image report generation model and image report generation method
CN113035311A (en) * 2021-03-30 2021-06-25 广东工业大学 Medical image report automatic generation method based on multi-mode attention mechanism
CN113065496A (en) * 2021-04-13 2021-07-02 湖南大学 Neural network machine translation model training method, machine translation method and device
CN113313199A (en) * 2021-06-21 2021-08-27 北京工业大学 Brain CT medical report automatic generation method based on weak supervision attention
CN113378112A (en) * 2021-06-18 2021-09-10 浙江工业大学 Point cloud completion method and device based on anisotropic convolution
CN113505701A (en) * 2021-07-12 2021-10-15 辽宁工程技术大学 Variational self-encoder zero sample image identification method combined with knowledge graph
CN113538662A (en) * 2021-07-05 2021-10-22 北京工业大学 Single-view three-dimensional object reconstruction method and device based on RGB data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021051091A1 (en) * 2019-09-13 2021-03-18 Rad Al, Inc. Method and system for automatically generating a section in a radiology report


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Hybrid retrieval-generation reinforced agent for medical image report generation";Yuan Li 等;《Advances in neural information processing systems》;20181231;第1-9页 *
"Knowledge-driven encode,retrieve,paraphrase for medical image report generation";Christy Y Li 等;《Proceedings of the AAAI Conference on Artificial Intelligence》;20191231;第6666-6673页 *
"Automatic Generation of Lung Descriptions for Chest X-ray Images Based on Deep Learning"; Huang Xin et al.; Pattern Recognition and Artificial Intelligence; 20210615; Vol. 34, No. 6; pp. 552-560 *
"Constituency Parsing Based on Deep Neural Networks"; Dai Biyun; China Masters' Theses Full-text Database (Information Science and Technology); 20210815; pp. I138-695 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant