Automatic identification method and system for a standard fetal section in an ultrasonic image
Technical Field
The invention belongs to the technical field of prenatal ultrasonic examination, and particularly relates to an automatic identification method and system for a standard fetal section in an ultrasonic image.
Background
Ultrasound imaging is now widely used in pregnancy diagnosis owing to its low cost, real-time imaging, and freedom from radiation. When performing ultrasonic diagnosis of fetal diseases with ultrasound imaging, a sonographer first needs to acquire each standard section of the fetus, then identify the standard sections and inspect the main anatomical structures through them, and then carry out further diagnosis and examination, analyzing and diagnosing according to the growth of the fetus and whether its anatomical structures are abnormal.
In the current ultrasonic diagnosis workflow, identification of the standard section mainly relies on manually marking points and lines on a standard section that is itself obtained manually, either through two-dimensional ultrasonic scanning or from three-dimensional ultrasonic volume data. However, this identification method has some non-negligible drawbacks. First, acquiring a standard section places very high demands on the sonographer, since it depends heavily on the sonographer's clinical experience and anatomical knowledge. Second, differences between the recognition results of different sonographers on the standard section lead to low accuracy and consistency of the examination results. Finally, the existing identification method suffers from overlong examination times for pregnant women, and from frequent operation and high labor intensity for the sonographer during the examination.
Disclosure of Invention
Aiming at the above defects or improvement requirements of the prior art, the invention provides an automatic identification method and system for a standard fetal section in an ultrasonic image. It aims to solve the technical problems of the existing identification method for the standard fetal section: the relatively high technical requirements placed on sonographers, the low accuracy and consistency caused by differences in the examination results of different sonographers, the overlong examination time, and the frequent operation and high labor intensity borne by sonographers during the examination.
In order to achieve the above object, according to one aspect of the present invention, there is provided a method for automatically identifying a standard fetal section in an ultrasound image, comprising the following steps:
(1) Acquiring a data set;
(2) Performing a preprocessing operation on the data set acquired in step (1) to obtain a preprocessed data set;
(3) Inputting the preprocessed data set obtained in step (2) into a trained deep convolutional neural network to obtain the category and position of the target contained in each fetal ultrasound section image.
Preferably, the data set is a plurality of frames of fetal ultrasound sectional images acquired from an ultrasound device.
Preferably, the preprocessing operation performed on the acquired data set in step (2) includes the sub-steps of:
(2-1) Deleting redundant information related to ultrasonic equipment parameters from each image in the acquired data set, scaling the image to 800 × 600 pixels, and normalizing the scaled image with a linear function to obtain a normalized image;
(2-2) Performing a random enhancement operation on each image normalized in step (2-1) to obtain a randomly enhanced image;
(2-3) Graying each image randomly enhanced in step (2-2) to obtain a grayscale image; performing histogram equalization on the grayscale image to obtain an equalized image; processing the grayscale image with a difference-of-Gaussians function to obtain a Gaussian difference image; and synthesizing each grayscale image, its corresponding equalized image, and its Gaussian difference image into a three-channel image serving as a preprocessed image, all preprocessed images forming the preprocessed data set.
Preferably, the deep convolutional neural network comprises a backbone network ResNet-50, a feature pyramid network, a classification subnet, and a localization subnet, which are connected in sequence.
Preferably, for the backbone network ResNet-50, the network structure is as follows:
the first layer is an input layer, which is a matrix of 600 × 800 × 3 pixels;
The second layer is a feature extraction layer, which uses the public feature extraction network ResNet-50 and takes the output matrices of its conv3.x, conv4.x and conv5.x layers as the extracted features C3, C4 and C5, whose sizes are 75 × 100 × 512, 38 × 50 × 1024 and 19 × 25 × 2048, respectively.
Preferably, the feature pyramid layer performs feature fusion on the features C3, C4 and C5 input from the backbone network ResNet-50 and outputs the fused features P3, P4, P5, P6 and P7 at five scales. Its network structure is as follows:
The first layer is a convolution layer based on feature C5, with convolution kernel size 1 × 1 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix size is 19 × 25 × 256;
The second layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix P5 has size 19 × 25 × 256;
The third layer is a convolution layer based on feature C4, with convolution kernel size 1 × 1 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix, denoted P4_, has size 38 × 50 × 256;
The fourth layer is an upsampling layer, which upsamples the output matrix P5 into an output matrix P5_upsample of size 38 × 50 × 256;
The fifth layer is an addition (Add) layer, which adds the output matrix P5_upsample and the output matrix P4_; its output matrix size is 38 × 50 × 256;
The sixth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1, filled using the SAME pattern; its output matrix P4 has size 38 × 50 × 256;
The seventh layer is a convolution layer based on feature C3, with convolution kernel size 1 × 1 × 256 and step size 1, filled using the SAME pattern; its output matrix, denoted P3_, has size 75 × 100 × 256;
The eighth layer is an upsampling layer, which upsamples P4 to size 75 × 100; its output matrix P4_upsample has size 75 × 100 × 256;
The ninth layer is an Add layer, which adds P4_upsample and P3_; its output matrix size is 75 × 100 × 256;
The tenth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix P3 has size 75 × 100 × 256;
The eleventh layer is a convolution layer based on C5, with convolution kernel size 3 × 3 × 256 and step size 2; this layer is filled using the SAME pattern, and its output matrix P6 has size 10 × 13 × 256;
The twelfth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 2, filled using the SAME pattern; its output matrix P7 has size 5 × 7 × 256.
Preferably, the input matrices of the classification subnet and the localization subnet are the output matrices P3, P4, P5, P6 and P7 of the aforementioned feature pyramid layer. The first to fourth layers of both subnets are sequentially connected, identical convolution layers with convolution kernel size 3 × 3 × 256 and step size 1, all filled using the SAME pattern; applied to the five pyramid levels, their output matrices have sizes 75 × 100 × 256, 38 × 50 × 256, 19 × 25 × 256, 10 × 13 × 256 and 5 × 7 × 256, respectively. For the localization subnet, the fifth layer is a convolution layer with convolution kernel size 3 × 3 × 36 and step size 1, filled using the SAME pattern; its output matrices have sizes 75 × 100 × 36, 38 × 50 × 36, 19 × 25 × 36, 10 × 13 × 36 and 5 × 7 × 36, respectively. For the classification subnet, the fifth layer is a convolution layer with convolution kernel size 3 × 3 × 80 and step size 1, filled using the SAME pattern; its output matrices have sizes 75 × 100 × 80, 38 × 50 × 80, 19 × 25 × 80, 10 × 13 × 80 and 5 × 7 × 80, respectively.
Preferably, the deep convolutional neural network is trained through the following steps:
(a1) Acquiring a data set, sending the data set to an ultrasonic expert, and acquiring the data set labeled by the ultrasonic expert;
(a2) Preprocessing the labeled data set to obtain a preprocessed data set;
(a3) Applying a K-means clustering algorithm to the data set labeled in step (a1) to obtain the 3 ratio values that best represent the length and width of the key targets in the data set, which serve as the anchor ratios in the deep convolutional neural network;
(a4) Inputting a batch of data from the training-set part of the preprocessed data set obtained in step (a2) into the deep convolutional neural network to obtain an inference output, and inputting the inference output together with the data set labeled by the ultrasonic expert in step (a1) into the loss function L_fl of the deep convolutional neural network to obtain a loss value;
(a5) Optimizing the loss function L_fl of the deep convolutional neural network according to the Adam algorithm, using the loss value obtained in step (a4);
(a6) Repeating steps (a4) and (a5) in sequence for the remaining batches of data in the training-set part of the preprocessed data set obtained in step (a2) until the number of iterations is reached, thereby obtaining the trained deep convolutional neural network.
Preferably, the loss value used in the deep convolutional neural network is calculated by the following loss function L_fl:

L_fl = -α · y · (1 - y')^γ · log(y') - (1 - α) · (1 - y) · (y')^γ · log(1 - y')

where y' represents the inference output of the deep convolutional neural network for the input image, y indicates whether the evaluated object is foreground, γ controls the rate at which the weight of easy samples is reduced, γ ∈ [0, 5], α is a weighting factor, and α ∈ [0, 1].
According to another aspect of the present invention, there is provided a system for automatically identifying a standard fetal section in an ultrasound image, comprising:
A first module for obtaining a data set;
A second module for performing a preprocessing operation on the data set acquired by the first module to obtain a preprocessed data set; and
A third module for inputting the preprocessed data set obtained by the second module into the trained deep convolutional neural network to obtain the category and position of the target contained in each fetal ultrasound section image.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) Since the data sets used in the learning process of the invention are all selected and precisely labeled by sonographers according to clinical experience, the invention can acquire the knowledge of the most experienced sonographers through machine learning, so that the whole process of patient examination can be programmed and automated. This solves the technical problem that the existing standard-section identification method depends excessively on the sonographer's clinical experience and anatomical knowledge, which makes the acquisition of a standard section highly demanding for the sonographer.
(2) The selection standard of the data set used for training is ultimately and uniquely determined by the sonographer, so the concept of a standard section in the training data set is clearly defined; that is, the standard sections of each type contained in the data set are deterministic and regular. This solves the technical problem of low accuracy and consistency caused by differences in the examination results of different sonographers in the existing standard-section identification method.
(3) The invention is fully automatic and programmed: during detection the sonographer does not need to pause to manually capture a section or check the accuracy of a captured section, and the standard section is identified in real time. This solves the technical problems of overlong examination time, frequent operation, and high labor intensity for the sonographer in the existing standard-section identification method.
Drawings
FIG. 1 is a flow chart of the method for automatically identifying a standard fetal section in an ultrasound image according to the present invention;
FIG. 2 is an architectural diagram of the deep convolutional neural network used in step (3) of the method of the present invention;
FIG. 3 is a schematic diagram of the recognition results obtained after different fetal sections are input into the deep convolutional neural network of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
The invention aims to provide an automatic identification method for the standard fetal section in an ultrasonic image that lets a computer learn autonomously while being guided to learn clinical prior knowledge, so as to automatically identify the more than 40 standard sections required clinically. A deep learning network is trained on all categories of fetal ultrasound sections; newly acquired ultrasonic images are then inferred, and the category of each section and the probability that it meets the standard are judged.
As shown in FIG. 1, the present invention provides a method for automatically identifying a standard fetal section in an ultrasound image, comprising the following steps:
(1) Acquiring a data set;
Specifically, the data set consists of multiple frames of fetal ultrasound section images acquired from ultrasound equipment of mainstream manufacturers on the market (including Samsung, Siemens, Kelly, etc.).
(2) Performing a preprocessing operation on the data set acquired in step (1) to obtain a preprocessed data set.
The preprocessing operation on the acquired data set in this step comprises the following sub-steps (a Python sketch of the whole pipeline is given after sub-step (2-3)):
(2-1) Deleting redundant information related to ultrasonic equipment parameters from each image in the acquired data set, scaling the image to 800 × 600 pixels, and normalizing the scaled image with a linear function to obtain a normalized image;
(2-2) Performing a random enhancement operation on each image normalized in step (2-1) to obtain a randomly enhanced image;
Specifically, the enhancement operation may be one of angular rotation, horizontal or vertical flipping, scaling with an edge-mirroring fill mode, and small random perturbation of brightness, or any combination thereof.
(2-3) Graying each image randomly enhanced in step (2-2) to obtain a grayscale image; performing histogram equalization on the grayscale image to obtain an equalized image; processing the grayscale image with a difference-of-Gaussians function to obtain a Gaussian difference image; and synthesizing each grayscale image, its corresponding equalized image, and its Gaussian difference image into a three-channel image serving as a preprocessed image, all preprocessed images forming the preprocessed data set.
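By way of illustration, sub-steps (2-1) to (2-3) can be sketched in Python with OpenCV roughly as follows. This is a minimal sketch: the function name, the choice of a horizontal flip as the one random enhancement, and the difference-of-Gaussians sigmas are assumptions, and the removal of the equipment-parameter overlay is omitted.

```python
import cv2
import numpy as np

def preprocess(image_bgr):
    """Sub-steps (2-1) to (2-3) for one image; names and DoG sigmas
    are illustrative assumptions, not values from the text."""
    # (2-1) Scale to 800 x 600 pixels (cv2.resize takes (width, height)).
    resized = cv2.resize(image_bgr, (800, 600))
    # (2-2) One possible random enhancement: a horizontal flip.
    if np.random.rand() < 0.5:
        resized = cv2.flip(resized, 1)
    # (2-3) Grayscale, histogram equalization, difference of Gaussians.
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    equalized = cv2.equalizeHist(gray)
    g1 = cv2.GaussianBlur(gray, (0, 0), 1.0).astype(np.float32)
    g2 = cv2.GaussianBlur(gray, (0, 0), 2.0).astype(np.float32)
    dog = cv2.normalize(g1 - g2, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Merge the three channels, then normalize linearly to [0, 1].
    merged = cv2.merge([gray, equalized, dog])
    return merged.astype(np.float32) / 255.0
```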
(3) Inputting the preprocessed data set obtained in step (2) into a trained deep convolutional neural network to obtain the category and position of the target contained in each fetal ultrasound section image.
As shown in FIG. 2, the deep convolutional neural network used in the present invention comprises a backbone network ResNet-50, a feature pyramid network (FPN), a classification subnet, and a localization subnet, which are connected in sequence.
For the backbone network ResNet-50, the network structure is as follows:
The first layer is an input layer, which is a matrix of 600 × 800 × 3 pixels;
The second layer is a feature extraction layer, which adopts the public feature extraction network ResNet-50 and takes the output matrices of its conv3.x, conv4.x and conv5.x layers as the extracted features C3, C4 and C5, whose sizes are 75 × 100 × 512, 38 × 50 × 1024 and 19 × 25 × 2048, respectively.
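For illustration, extracting C3, C4 and C5 from a stock Keras ResNet-50 could look as follows; the tf.keras layer names stand in for the conv3.x/conv4.x/conv5.x stages referred to above, and this is a sketch rather than the invention's exact implementation.

```python
import tensorflow as tf

# Build a ResNet-50 without its classification head for a 600 x 800 x 3 input.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(600, 800, 3))
c3 = backbone.get_layer("conv3_block4_out").output  # 75 x 100 x 512
c4 = backbone.get_layer("conv4_block6_out").output  # 38 x 50 x 1024
c5 = backbone.get_layer("conv5_block3_out").output  # 19 x 25 x 2048
extractor = tf.keras.Model(backbone.input, [c3, c4, c5])
```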
For the feature pyramid layer, feature fusion is performed on the features C3, C4 and C5 input from the backbone network ResNet-50, and the fused features P3, P4, P5, P6 and P7 at five scales are output. The feature pyramid layer can be further subdivided into 12 specific layers (a code sketch is given after the list), and its network structure is as follows:
The first layer is a convolution layer based on feature C5, with convolution kernel size 1 × 1 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix size is 19 × 25 × 256;
The second layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix P5 has size 19 × 25 × 256;
The third layer is a convolution layer based on feature C4, with convolution kernel size 1 × 1 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix, denoted P4_, has size 38 × 50 × 256;
The fourth layer is an upsampling layer, which upsamples the output matrix P5 into an output matrix P5_upsample of size 38 × 50 × 256;
The fifth layer is an addition (Add) layer, which adds the output matrix P5_upsample and the output matrix P4_; its output matrix size is 38 × 50 × 256;
The sixth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1, filled using the SAME pattern; its output matrix P4 has size 38 × 50 × 256;
The seventh layer is a convolution layer based on feature C3, with convolution kernel size 1 × 1 × 256 and step size 1, filled using the SAME pattern; its output matrix, denoted P3_, has size 75 × 100 × 256;
The eighth layer is an upsampling layer, which upsamples P4 to size 75 × 100; its output matrix P4_upsample has size 75 × 100 × 256;
The ninth layer is an Add layer, which adds P4_upsample and P3_; its output matrix size is 75 × 100 × 256;
The tenth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 1; this layer is filled using the SAME pattern, and its output matrix P3 has size 75 × 100 × 256;
The eleventh layer is a convolution layer based on C5, with convolution kernel size 3 × 3 × 256 and step size 2; this layer is filled using the SAME pattern, and its output matrix P6 has size 10 × 13 × 256;
The twelfth layer is a convolution layer with convolution kernel size 3 × 3 × 256 and step size 2, filled using the SAME pattern; its output matrix P7 has size 5 × 7 × 256.
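Continuing the backbone sketch above (c3, c4, c5), the twelve layers of the feature pyramid could be sketched as follows. Note that a plain ×2 upsampling of the 38 × 50 map would give 76 × 100 rather than 75 × 100, so this sketch resizes to the exact target size in place of the plain upsampling layers.

```python
import tensorflow as tf
from tensorflow.keras import layers

p5_ = layers.Conv2D(256, 1, strides=1, padding="same")(c5)          # layer 1
p5 = layers.Conv2D(256, 3, strides=1, padding="same")(p5_)          # layer 2: P5, 19x25x256
p4_ = layers.Conv2D(256, 1, strides=1, padding="same")(c4)          # layer 3: P4_, 38x50x256
p5_up = layers.Lambda(lambda t: tf.image.resize(t, (38, 50)))(p5)   # layer 4: P5_upsample
p4_sum = layers.Add()([p5_up, p4_])                                 # layer 5: Add
p4 = layers.Conv2D(256, 3, strides=1, padding="same")(p4_sum)       # layer 6: P4, 38x50x256
p3_ = layers.Conv2D(256, 1, strides=1, padding="same")(c3)          # layer 7: P3_, 75x100x256
p4_up = layers.Lambda(lambda t: tf.image.resize(t, (75, 100)))(p4)  # layer 8: P4_upsample
p3_sum = layers.Add()([p4_up, p3_])                                 # layer 9: Add
p3 = layers.Conv2D(256, 3, strides=1, padding="same")(p3_sum)       # layer 10: P3, 75x100x256
p6 = layers.Conv2D(256, 3, strides=2, padding="same")(c5)           # layer 11: P6, 10x13x256
p7 = layers.Conv2D(256, 3, strides=2, padding="same")(p6)           # layer 12: P7, 5x7x256
```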
For the classification subnet and the localization subnet, both have 5 layers, and their input matrices are the same, namely the output matrices P3, P4, P5, P6 and P7 of the feature pyramid layer. The structures of their first four layers are identical, as follows:
The first to fourth layers are sequentially connected, completely identical convolution layers with convolution kernel size 3 × 3 × 256 and step size 1, all filled using the SAME pattern; applied to the five pyramid levels, their output matrices have sizes 75 × 100 × 256, 38 × 50 × 256, 19 × 25 × 256, 10 × 13 × 256 and 5 × 7 × 256, respectively;
For the localization subnet, the fifth layer is a convolution layer with convolution kernel size 3 × 3 × 36 and step size 1, filled using the SAME pattern; its output matrices have sizes 75 × 100 × 36, 38 × 50 × 36, 19 × 25 × 36, 10 × 13 × 36 and 5 × 7 × 36, respectively. For the classification subnet, the fifth layer is a convolution layer with convolution kernel size 3 × 3 × 80 and step size 1, filled using the SAME pattern; its output matrices have sizes 75 × 100 × 80, 38 × 50 × 80, 19 × 25 × 80, 10 × 13 × 80 and 5 × 7 × 80, respectively.
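A sketch of the two five-layer heads follows, continuing the pyramid sketch above. The ReLU activations, the sharing of each head's weights across pyramid levels, and the reading of 36 as 9 anchors × 4 box offsets are assumptions; the kernel sizes and channel counts follow the description.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_head(out_channels):
    """Five-layer subnet head as described above (activations assumed)."""
    head = tf.keras.Sequential()
    for _ in range(4):  # layers 1-4: identical 3x3x256 convolutions
        head.add(layers.Conv2D(256, 3, strides=1, padding="same", activation="relu"))
    head.add(layers.Conv2D(out_channels, 3, strides=1, padding="same"))  # layer 5
    return head

loc_head = make_head(36)  # localization subnet: 36 = 9 anchors x 4 offsets (assumed)
cls_head = make_head(80)  # classification subnet: 80 channels as stated above
# Each head is applied to every pyramid level P3 ... P7.
loc_outputs = [loc_head(p) for p in (p3, p4, p5, p6, p7)]
cls_outputs = [cls_head(p) for p in (p3, p4, p5, p6, p7)]
```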
The loss value used in the deep convolutional neural network is calculated by the following loss function L_fl:

L_fl = -α · y · (1 - y')^γ · log(y') - (1 - α) · (1 - y) · (y')^γ · log(1 - y')

where y' represents the inference output of the deep convolutional neural network for the input image, y indicates whether the evaluated object is foreground, γ controls the rate at which the weight of easy samples is reduced, with γ ∈ [0, 5] and preferably 2, and α is a weighting factor for balancing the importance of positive/negative samples, with α ∈ [0, 1].
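A direct Python rendering of L_fl is sketched below; the default α = 0.25 is an assumption, since the text only constrains α to [0, 1].

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """L_fl as written above; alpha = 0.25 is an assumed default."""
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    pos = -alpha * y_true * tf.pow(1.0 - y_pred, gamma) * tf.math.log(y_pred)
    neg = -(1.0 - alpha) * (1.0 - y_true) * tf.pow(y_pred, gamma) * tf.math.log(1.0 - y_pred)
    return tf.reduce_sum(pos + neg)
```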
Specifically, the deep convolutional neural network used in this step is trained through the following steps:
(1) Acquiring a data set, sending it to an ultrasonic expert, and obtaining the data set labeled by the ultrasonic expert;
Specifically, the data set consists of 30000 fetal ultrasound section images obtained from ultrasound equipment of mainstream manufacturers on the market (including Samsung, Siemens, Kelly, etc.). These images are randomly divided into 3 parts: 80% form the training set (Train set), 10% the validation set (Validation set), and 10% the test set (Test set).
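Such an 80/10/10 split can be sketched as follows; the helper name and fixed seed are illustrative.

```python
import random

def split_dataset(paths, seed=0):
    """Shuffle and split image paths into 80% train / 10% validation / 10% test."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    return (paths[: int(0.8 * n)],              # training set
            paths[int(0.8 * n): int(0.9 * n)],  # validation set
            paths[int(0.9 * n):])               # test set
```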
(2) Preprocessing the labeled data set to obtain a preprocessed data set;
Specifically, the preprocessing in this step is exactly the same as the preprocessing described above and is not repeated here.
(3) Applying a K-means clustering algorithm to the data set labeled in step (1) to obtain the 3 ratio values that best represent the length and width of the key targets in the data set, which serve as the ratios of the anchor points (anchors) in the deep convolutional neural network (a sketch is given below);
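One simple reading of this step, sketched with scikit-learn; the helper name and the use of plain K-means on width/height ratios are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_ratios(boxes, k=3):
    """Cluster the aspect ratios of labeled boxes into k representative values.

    `boxes` is an (N, 2) array of (width, height) pairs taken from the
    expert annotations.
    """
    ratios = (boxes[:, 0] / boxes[:, 1]).reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10).fit(ratios)
    return sorted(float(c) for c in km.cluster_centers_.ravel())
```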
(4) Inputting a batch of data from the training-set part of the preprocessed data set obtained in step (2) into the deep convolutional neural network to obtain an inference output, and inputting the inference output together with the data set labeled by the ultrasonic expert in step (1) into the loss function L_fl of the deep convolutional neural network to obtain a loss value.
Specifically, one batch of data comprises 4 images;
(5) Optimizing the loss function L_fl of the deep convolutional neural network according to the Adam algorithm, using the loss value obtained in step (4), so as to gradually update the parameters of the deep convolutional neural network.
Specifically, during optimization the learning rate lr is 0.001, the momentum ξ is 0.9, and the weight decay ψ is 0.004.
(6) Repeating step (4) and step (5) in sequence for the remaining batches of data in the training-set part of the preprocessed data set obtained in step (2) until the number of iterations is reached, thereby obtaining the trained deep convolutional neural network;
Specifically, the training process in this step comprises 120 cycles, and the number of iterations in each cycle is 6000. A sketch of the corresponding optimizer setup and training loop is given below.
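The following is a highly simplified sketch of steps (4) to (6), continuing the sketches above. It treats the model output and labels as single tensors (anchor matching is omitted); `model` and `batches` are assumed to exist, Adam's beta_1 stands in for the stated momentum, and the stated weight decay of 0.004 could be applied through kernel regularizers but is omitted here.

```python
import tensorflow as tf

# Hyper-parameters from the text: lr = 0.001, momentum 0.9,
# batches of 4 images, 120 cycles of 6000 iterations each.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9)

@tf.function
def train_step(model, images, labels):
    with tf.GradientTape() as tape:
        loss = focal_loss(labels, model(images, training=True))  # steps (4)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # step (5)
    return loss

# Step (6): iterate over the remaining batches until the iteration count is reached.
for cycle in range(120):
    for _ in range(6000):
        images, labels = next(batches)  # assumed data pipeline yielding 4-image batches
        train_step(model, images, labels)
```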
(7) Verifying the trained deep convolutional neural network with the test-set part of the preprocessed data set obtained in step (2).
Test Results
In prenatal ultrasonic examination, new non-standard sections, approximately standard sections, and standard sections of each part of the fetus were input into the trained model; the model automatically identifies the type of each input section and gives the identification result, as shown in FIG. 3.
The average accuracy, standard-section detection rate, and standard-section false detection rate of the trained model on a new test set are given below; the accuracy and standard-section detection rate of the invention are quite high, while the standard-section false detection rate is very low:
Average accuracy: 98.61%
Standard section detection rate: 96.88%
Standard section false detection rate: 0.001%
it will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.