CN112950638B - Image segmentation method, device, electronic equipment and computer readable storage medium

Info

Publication number: CN112950638B (grant); CN112950638A (application publication)
Application number: CN201911258179.XA
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: model, network, sampling, image segmentation, image
Legal status: Active
Original and current assignee: BGI Shenzhen Co Ltd
Inventors: 沈涛 (Shen Tao), 郭又文 (Guo Youwen), 邹斌 (Zou Bin), 郭健 (Guo Jian), 陈芳 (Chen Fang)
Application filed by BGI Shenzhen Co Ltd; priority to CN201911258179.XA

Classifications

    • G06T7/10 — Physics; Computing; Image data processing; Image analysis; Segmentation; Edge detection
    • G06N3/045 — Physics; Computing; Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06T2207/20081 — Indexing scheme for image analysis or image enhancement; Special algorithmic details; Training; Learning
    • G06T2207/20084 — Indexing scheme for image analysis or image enhancement; Special algorithmic details; Artificial neural networks [ANN]
    • Y02T10/40 — Climate change mitigation technologies related to transportation; Internal combustion engine [ICE] based vehicles; Engine management systems

Abstract

The embodiments of the present application disclose an image segmentation method, an image segmentation apparatus, an electronic device, a processor, and a computer readable storage medium. The method comprises the following steps: performing U-shaped expansion on a preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, wherein the U-shaped sampling unit comprises a downsampling network and an upsampling network, the downsampling network is formed by the Darknet53 network of the YOLOv3 model, and the upsampling network is formed by connecting a plurality of upsampling modules of the YOLOv3 model in series; training the improved YOLOv3 model with sample images from a preset training sample set to obtain an image segmentation model; and performing image segmentation processing on an image to be processed with the image segmentation model. Because Darknet53 is adopted as the downsampling network, the sampling network is deep, giving the model strong expressive capacity while keeping the whole model easy to train, so the method is applicable to complex image segmentation scenes.

Description

Image segmentation method, device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image segmentation method and apparatus based on an improved YOLOv3 model, an electronic device, and a computer readable storage medium.
Background
Since the rise of deep learning, convolutional neural networks have been used for image processing, for example the U-Net model (shown in fig. 1) and the YOLOv3 model (shown in fig. 2). The U-Net model is a small image segmentation network. Because its depth is insufficient, increasing the depth by adding downsampling modules carries a risk of vanishing gradients and therefore greater training difficulty. U-Net is thus generally applied to medical image segmentation (where image semantics are simpler and structures more fixed); in other words, the U-Net model suits simpler image segmentation tasks, while the YOLOv3 model is generally applied to target detection. For complex image segmentation tasks, no effective solution currently exists.
Disclosure of Invention
In view of the foregoing, there is a need for an image segmentation method, apparatus, electronic device, and computer-readable storage medium based on an improved YOLOv3 model, which can perform image segmentation processing on a complex image.
An embodiment of the present invention provides an image segmentation method based on an improved YOLOv3 model, the method including:
performing U-shaped expansion on a preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, wherein the U-shaped sampling unit comprises a downsampling network and an upsampling network connected by layer-skipping connections, the downsampling network is formed by the Darknet53 network of the preset YOLOv3 model, and the upsampling network is formed by connecting a plurality of upsampling modules of the preset YOLOv3 model in series;
training the improved YOLOv3 model by using sample images of a preset training sample set to obtain an image segmentation model; and
performing image segmentation processing on an image to be processed by using the image segmentation model.
Preferably, a convolution-normalization linear module is further connected between the downsampling network and the upsampling network; the convolution-normalization linear module comprises five convolution-normalization linear units connected in series, each consisting of a convolution layer, a batch normalization layer and a leaky rectified linear layer.
Preferably, the number of residual modules included in the downsampling network is equal to the number of upsampling modules included in the upsampling network.
Preferably, the downsampling network comprises a convolution-normalization linear unit and N residual modules, and the upsampling network comprises N upsampling modules. The output end of the convolution-normalization linear unit of the downsampling network is connected by a layer-skipping connection to the Nth upsampling module; the output ends of the first to (N-1)th residual modules are connected by layer-skipping connections to the (N-1)th to first upsampling modules, respectively; and the Nth residual module is connected to the first upsampling module through the convolution-normalization linear module.
Preferably, training the improved YOLOv3 model by using sample images of a preset training sample set comprises:
dividing the sample image of the preset training sample set into a training set and a testing set;
training the improved YOLOv3 model by utilizing the training set;
testing the trained improved YOLOv3 model by using the test set, and obtaining a model segmentation accuracy according to statistics of each test result;
judging whether the model segmentation accuracy is greater than a preset threshold;
and if the model segmentation accuracy is greater than the preset threshold, taking the trained improved YOLOv3 model as the image segmentation model.
Preferably, before dividing the sample images of the preset training sample set into a training set and a testing set, the method further comprises:
performing random preprocessing and normalization processing on each sample image of the preset training sample set;
wherein the preprocessing comprises one of the following modes: flipping the sample image vertically, flipping the sample image horizontally, flipping the sample image both vertically and horizontally, or applying no flip to the sample image.
Preferably, the loss function of the improved YOLOv3 model is a binary cross entropy loss function.
An embodiment of the present invention provides an image segmentation apparatus based on an improved YOLOv3 model, the apparatus including:
an expansion module for performing U-shaped expansion on a preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, wherein the U-shaped sampling unit comprises a downsampling network and an upsampling network connected by layer-skipping connections, the downsampling network is formed by the Darknet53 network of the preset YOLOv3 model, and the upsampling network is formed by connecting a plurality of upsampling modules of the preset YOLOv3 model in series;
a training module for training the improved YOLOv3 model by using sample images of a preset training sample set to obtain an image segmentation model; and
a processing module for performing image segmentation processing on an image to be processed by using the image segmentation model.
An embodiment of the present invention provides an electronic device including:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the improved YOLOv3 model-based image segmentation method described above.
An embodiment of the present invention provides a computer readable storage medium storing a computer program, where the program when executed by a processor implements the image segmentation method based on the improved YOLOv3 model described above.
According to the image segmentation method, apparatus, electronic device and computer readable storage medium based on the improved YOLOv3 model provided by the embodiments of the present application, the Darknet53 network, which contains a residual structure, is adopted as the downsampling network. The sampling network is therefore deep, so the model has strong expressive capacity while the whole model remains easy to train; the method is thus applicable to complex image segmentation scenes, and its image segmentation effect is superior to that of the traditional U-Net model.
Drawings
FIG. 1 is a schematic diagram of a prior art U-Net model for image segmentation;
FIG. 2 is a schematic diagram of the structure of a YOLOv3 model of the prior art;
FIG. 3A is a schematic diagram of an image segmentation model according to an embodiment of the present invention;
FIG. 3B is a schematic diagram of the structure of a convolution-normalized linear unit (CBL) of an embodiment of the present invention;
FIG. 3C is a schematic diagram of a residual unit (Res_Unit) according to an embodiment of the present invention;
FIG. 3D is a schematic diagram of a residual block (Resn) according to an embodiment of the present invention;
fig. 3E is a schematic diagram of an UP-sampling module (UP) according to an embodiment of the present invention;
FIG. 4 is a flow chart of an image segmentation method based on the improved YOLOv3 model according to an embodiment of the present invention;
FIG. 5 is a functional block diagram of an image segmentation apparatus based on the improved YOLOv3 model according to an embodiment of the present invention;
fig. 6 is a functional block diagram of an electronic device according to an embodiment of the present invention.
Description of the main reference signs
Detailed Description
The invention will be further described in the following detailed description in conjunction with the above-described figures.
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
Referring to fig. 3A, an image segmentation model to which some embodiments of the image segmentation method or image segmentation apparatus of the present application may be applied is shown.
As shown in fig. 3A, the basic structure of the image segmentation model 11 is derived from the YOLOv3 model shown in fig. 2: the YOLOv3 model of fig. 2 is subjected to U-shaped expansion with reference to the U-Net model of fig. 1 to obtain an improved YOLOv3 model containing a U-shaped sampling structure, and this improved YOLOv3 model is then trained to obtain the image segmentation model 11. Like a U-Net model, the image segmentation model 11 can be used for image segmentation tasks; by using the Darknet53 network of the YOLOv3 model as the downsampling part, complex image segmentation problems can be solved.
Since the structural module of the image segmentation model 11 is derived from the YOLOv3 model and a U-shaped structural design of the U-Net model is used, the image segmentation model 11 will be simply referred to as a YUnet model hereinafter.
In an embodiment, the YUnet model includes a downsampling network 111 and an upsampling network 112. The downsampling network 111 may be formed by the Darknet53 network of the YOLOv3 model, and the upsampling network 112 may be formed by connecting in series a plurality of upsampling modules ("UP 1121" in fig. 3A) of the YOLOv3 model. As shown in fig. 3A, the downsampling network 111 includes a convolution-normalization linear unit ("CBL 1111" in fig. 3A) and five residual modules ("Res1, Res2, Res8, Res8, Res4" in fig. 3A), where Res1 contains 1 residual unit (Res_Unit), Res2 contains 2 residual units, Res8 contains 8 residual units, and Res4 contains 4 residual units. The downsampling network 111 is further connected to the upsampling network 112 through a convolution-normalization linear module ("CBL*5 1112" in fig. 3A), i.e. the last residual module Res4 of the downsampling network 111 is connected to the first upsampling module of the upsampling network 112 through CBL*5, where CBL*5 denotes five CBLs in series, i.e. the convolution-normalization linear module comprises five convolution-normalization linear units connected in series. The output of the upsampling network 112 is further connected to a CBL and a convolutional layer ("Conv 1113" in fig. 3A) to adjust the thickness (channel count) of the output feature map: the CBL may first change the thickness to n1, and Conv may then change it to n, so that the final output of the model is [416, 416, n]. The values of n1 and n can be set and adjusted according to actual requirements. For example, for two-class segmentation, the CBL may first change the thickness to 2 and Conv may then change it to 1, i.e. the final output of the model is [416, 416, 1].
As shown in fig. 3A, the CBL in the downsampling network 111 may be connected to the input of the YUnet model and Conv may be connected to the output of the YUnet model. Layer-skipping connections are also made between the downsampling network 111 and the upsampling network 112: the output of the CBL in the Darknet53 network is skip-connected to the last UP in the upsampling network 112, the output of the residual module Res1 is skip-connected to the second-to-last UP, the output of the residual module Res2 is skip-connected to the third-to-last UP, the output of the first residual module Res8 is skip-connected to the second UP, and the output of the second residual module Res8 is skip-connected to the first UP in the upsampling network 112.
As shown in fig. 3B, the convolution-normalization linear unit (CBL) may be composed of a 2D convolutional layer (Convolutional Layer), a batch normalization layer (Batch Normalization Layer), and a leaky rectified linear layer (Leaky ReLU Layer). With kernel=3, stride=1, pad=1 the convolutional layer keeps the height and width constant; with kernel=1, stride=1, pad=0 it also keeps them constant; and with kernel=3, stride=2, pad=1 it halves the height and width.
As shown in fig. 3C, the residual unit (Res_Unit) is a residual structure: the first CBL it passes through outputs half as many channels as its input, with kernel=1, stride=1, pad=0 keeping the height and width of the feature map unchanged; the second CBL outputs twice as many channels as its own input (restoring the original count), with kernel=3, stride=1, pad=1 again keeping the height and width unchanged; the module input is then added to this output to form the residual connection. The overall effect of the residual unit is that the height, width and thickness of the feature map are unchanged.
As shown in fig. 3D, in the residual module (Resn, n a positive integer), the input feature map first passes through a CBL with kernel=3, stride=2, pad=1, which halves the height and width, while the number of convolution kernels in the convolution layer is set so that the thickness of the input feature map is doubled; it then passes through n Res_Units, which change neither the height and width nor the thickness. The overall effect of the residual module is that the height and width of the feature map are halved and its thickness is doubled.
As shown in fig. 3E, the upsampling module (UP 1121) has two inputs of the same thickness: a sampling input channel and a non-sampling (skip) input channel. The sampling input first enters a CBL that halves the thickness of the input feature map, then undergoes an upsampling operation that doubles the height and width of the feature map (equivalent to enlarging an image by linear interpolation), and is then spliced (concat) along the thickness dimension with the input feature map of the non-sampling channel. The splicing result passes through a CBL*5 module that adjusts the thickness so that the output feature map has half the thickness of the input feature map. The non-sampling input channel is indicated by bold lines in fig. 3E. The overall effect of the upsampling module is that the thickness of the feature map is halved and the height and width are aligned to those of the non-sampling input feature map.
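The building blocks of figs. 3B-3E can be sketched in PyTorch roughly as follows. Class and argument names are illustrative, not from the patent; the channel and stride arithmetic follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBL(nn.Module):
    """Conv2d -> BatchNorm -> LeakyReLU (fig. 3B)."""
    def __init__(self, c_in, c_out, kernel=3, stride=1, pad=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel, stride, pad, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    """Residual unit (fig. 3C): 1x1 CBL halves channels, 3x3 CBL restores them,
    then the module input is added back. Shape and thickness are unchanged."""
    def __init__(self, c):
        super().__init__()
        self.cbl1 = CBL(c, c // 2, kernel=1, stride=1, pad=0)
        self.cbl2 = CBL(c // 2, c, kernel=3, stride=1, pad=1)

    def forward(self, x):
        return x + self.cbl2(self.cbl1(x))

class Resn(nn.Module):
    """Residual module (fig. 3D): stride-2 CBL halves H/W and doubles channels,
    followed by n residual units."""
    def __init__(self, c_in, n):
        super().__init__()
        self.down = CBL(c_in, c_in * 2, kernel=3, stride=2, pad=1)
        self.units = nn.Sequential(*[ResUnit(c_in * 2) for _ in range(n)])

    def forward(self, x):
        return self.units(self.down(x))

class UP(nn.Module):
    """Up-sampling module (fig. 3E): halve channels, 2x upsample (bilinear),
    concat with the skip input, then a CBL*5 stack that brings the thickness
    down to half the input thickness."""
    def __init__(self, c_in):
        super().__init__()
        c = c_in // 2
        self.reduce = CBL(c_in, c, kernel=1, stride=1, pad=0)
        # CBL*5 adjusts the concatenated (c + c_in) channels down to c
        self.cbl5 = nn.Sequential(
            CBL(c + c_in, c, 1, 1, 0), CBL(c, 2 * c, 3, 1, 1),
            CBL(2 * c, c, 1, 1, 0), CBL(c, 2 * c, 3, 1, 1),
            CBL(2 * c, c, 1, 1, 0))

    def forward(self, x, skip):
        x = self.reduce(x)
        # align H/W to the non-sampling (skip) input, as the text describes
        x = F.interpolate(x, size=skip.shape[2:], mode='bilinear',
                          align_corners=False)
        return self.cbl5(torch.cat([x, skip], dim=1))
```

For example, `UP(64)` maps a 8×8×64 sampling input plus a 16×16×64 skip input to a 16×16×32 output, matching the "thickness halved, height and width aligned to the skip input" rule above.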
For example, suppose the image input to the YUnet model is an RGB image with height and width 416×416; the thickness of a typical RGB image is 3, i.e. the height, width and thickness of the input image may be represented as [416, 416, 3]. The image enters the downsampling network 111 (the Darknet53 network, identical to the Darknet53 network in YOLOv3 shown in fig. 2), which extracts image features. The CBL in the Darknet53 network changes the thickness of the feature map to 32 (the thickness may also be changed to other values, such as 16 or 64, according to actual requirements) while the height and width are unchanged, yielding a 416×416×32 feature map. Each residual module in the Darknet53 network halves the height and width of the feature map and doubles its thickness; as shown in fig. 3A there are 5 residual modules in total, yielding feature maps of 208×208×64, 104×104×128, 52×52×256, 26×26×512 and 13×13×1024 in turn. The output of the last residual module (Res4) passes through a CBL*5 module, which does not change the height and width but halves the thickness, i.e. the CBL*5 output is 13×13×512. The CBL*5 output enters the upsampling network 112; each upsampling module halves the feature map thickness and aligns the height and width to those of its non-sampling input, so the height and width are restored to 416. As shown in fig. 3A, there are 5 UP modules in total, yielding feature maps of 26×26×256, 52×52×128, 104×104×64, 208×208×32 and 416×416×16 in turn. Finally a CBL and a Conv adjust the thickness of the feature map output by the last UP: for example, the CBL may first change the thickness to 2 to obtain a 416×416×2 feature map, and Conv may change the thickness to n (with n determined by actual usage requirements), obtaining a segmented feature image of 416×416×n; that is, the output of the YUnet model is 416×416×n. For example, for two-class segmentation (segmentation into foreground and background), n takes the value 1, i.e. the YUnet model outputs a segmented image of 416×416×1.
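The shape arithmetic of this walkthrough can be checked with a few lines of Python (the function name and defaults are illustrative):

```python
def yunet_shapes(h=416, w=416, stem_ch=32, n_res=5):
    """Trace (height, width, thickness) through the YUnet sampling path."""
    shapes = [(h, w, stem_ch)]     # stem CBL: H, W unchanged, thickness -> 32
    ch = stem_ch
    for _ in range(n_res):         # each Resn: H, W halved, thickness doubled
        h, w, ch = h // 2, w // 2, ch * 2
        shapes.append((h, w, ch))
    ch //= 2                       # CBL*5: H, W unchanged, thickness halved
    shapes.append((h, w, ch))
    for _ in range(n_res):         # each UP: H, W doubled, thickness halved
        h, w, ch = h * 2, w * 2, ch // 2
        shapes.append((h, w, ch))
    return shapes
```

Running `yunet_shapes()` reproduces the sequence quoted above, from 416×416×32 down to 13×13×1024, through CBL*5 to 13×13×512, and back up to 416×416×16.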
It will be appreciated that the number of upsampling modules included in the upsampling network 112 and the number of residual modules included in the downsampling network 111 may be adjusted according to actual usage requirements, and the number of upsampling modules included in the upsampling network 112 is preferably equal to the number of residual modules included in the downsampling network 111. For example, the upsampling network 112 shown in fig. 3A contains 5 upsampling modules, the downsampling network 111 contains 5 residual modules, and in other embodiments of the present application, the downsampling network 111 may contain 4 or 6 residual modules, and the corresponding upsampling network 112 may include 4 or 6 upsampling modules.
It will be appreciated that the value of n in each residual module (Resn) of the downsampling network 111 may also be modified according to actual usage requirements, for example changing all modules to Res1 or to Res4, etc.
FIG. 4 is a flowchart illustrating the steps of an embodiment of the image segmentation method based on the improved YOLOv3 model according to the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to different needs. The image segmentation method based on the improved YOLOv3 model may be applied to the image segmentation apparatus 100 based on the improved YOLOv3 model shown in fig. 5.
Referring to fig. 4, the image segmentation method based on the improved YOLOv3 model may specifically include the following steps.
Step S401, performing U-shaped expansion on a preset Yolov3 model to obtain an improved Yolov3 model comprising U-shaped sampling units.
In an embodiment, the preset YOLOv3 model may be the YOLOv3 model shown in fig. 2. U-shaped expansion is performed on the preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, whose structure is similar to the U-shaped structure of the U-Net model shown in fig. 1.
In one embodiment, the specific structure of the improved YOLOv3 model comprising the U-shaped sampling unit may be as shown in fig. 3A. The U-shaped sampling unit comprises a downsampling network 111 and an upsampling network 112. The downsampling network 111 may be formed by the Darknet53 network of the preset YOLOv3 model, and the upsampling network 112 may be formed by connecting in series a plurality of upsampling modules of the preset YOLOv3 model; the number of residual modules in the downsampling network 111 is preferably equal to the number of upsampling modules in the upsampling network 112. For example, the downsampling network 111 includes a CBL and N residual modules, and the upsampling network 112 includes N upsampling modules, where the positive integer N may be set according to practical requirements. Layer-skipping connections are made between the downsampling network 111 and the upsampling network 112: the output end of the CBL of the downsampling network 111 is skip-connected to the Nth upsampling module of the upsampling network 112, and the output ends of the first to (N-1)th residual modules of the downsampling network 111 are skip-connected to the (N-1)th to first upsampling modules of the upsampling network 112, respectively.
In an embodiment, a convolution-normalization linear module (CBL*5) is further connected between the downsampling network 111 and the upsampling network 112; that is, the Nth residual module of the downsampling network 111 is connected to the first upsampling module of the upsampling network 112 through CBL*5.
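The layer-skip wiring described above can be expressed as a simple index map (a sketch with illustrative names, for N residual modules and N upsampling modules):

```python
def skip_wiring(n):
    """Map each downsampling stage to its target in the upsampling path."""
    wiring = {"CBL": f"UP{n}"}              # stem CBL skips to the last UP
    for i in range(1, n):                   # Res1..Res(N-1) -> UP(N-1)..UP1
        wiring[f"Res{i}"] = f"UP{n - i}"
    # deepest residual module feeds UP1's sampling input through CBL*5
    wiring[f"Res{n}"] = "CBL*5 -> UP1"
    return wiring
```

For N=5 this reproduces the wiring of fig. 3A: the stem CBL skips to the fifth (last) UP, Res1 to the fourth, and so on, while the fifth residual module reaches the first UP only through CBL*5.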
Step S403, training the improved YOLOv3 model by using a sample image of a preset training sample set to obtain an image segmentation model.
In an embodiment, the preset training sample set may be collected and constructed according to the model training requirements, or an existing image sample library may be adopted; the number of sample images in the preset training sample set may be adjusted according to the actual model training requirements and is not limited here. During model training, each sample image of the preset training sample set is input in turn into the improved YOLOv3 model for training, and the image segmentation model is obtained after training is completed.
In an embodiment, the sample images of a preset training sample set may first be randomly divided into a training set and a testing set according to a preset proportion, for example 80% of the sample images as the training set and 20% as the testing set. The improved YOLOv3 model is trained with the training set; the trained model is then tested with the testing set, and a model segmentation accuracy is obtained by statistics over the test result of each test image. Finally, it is judged whether the model segmentation accuracy is greater than a preset threshold: if so, the image segmentation effect of the trained improved YOLOv3 model meets the preset requirement, and the trained model is taken as the image segmentation model. If the model segmentation accuracy is not greater than the preset threshold, the image segmentation effect does not yet meet the preset requirement; the parameters of the improved YOLOv3 model may be adjusted, and the adjusted model retrained and retested until the model segmentation accuracy is greater than the preset threshold.
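The split / train / test / threshold loop above can be sketched as follows; `train_fn`, `accuracy_fn` and `adjust_fn` are assumed placeholder callables standing in for the training pass, the per-sample accuracy test, and the parameter adjustment, none of which the patent spells out:

```python
import random

def build_segmentation_model(samples, model, train_fn, accuracy_fn, adjust_fn,
                             threshold=0.95, train_ratio=0.8, max_rounds=10):
    """Split samples, then train/test until accuracy exceeds the threshold."""
    random.shuffle(samples)
    split = int(len(samples) * train_ratio)
    train_set, test_set = samples[:split], samples[split:]
    for _ in range(max_rounds):
        model = train_fn(model, train_set)
        # statistic over every test result -> model segmentation accuracy
        accuracy = sum(accuracy_fn(model, s) for s in test_set) / len(test_set)
        if accuracy > threshold:           # meets the preset requirement
            return model
        model = adjust_fn(model)           # tune parameters, then retrain
    raise RuntimeError("accuracy threshold not reached")
```

The 0.95 threshold, 0.8 split ratio and `max_rounds` cap are illustrative defaults, not values from the patent.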
In an embodiment, when the trained improved YOLOv3 model is tested with the testing set, a model segmentation average error may instead be obtained by statistics over the test result of each test image, and it is judged whether this average error is smaller than a preset error. If so, the image segmentation effect of the trained improved YOLOv3 model meets the preset requirement, and the trained model may be taken as the image segmentation model; if not, the parameters of the improved YOLOv3 model may be adjusted before training and testing again.
In an embodiment, to further improve training efficiency and the segmentation effect of the trained model, random preprocessing and normalization processing may be performed on each sample image of the preset training sample set before it is divided into the training set and testing set, or on each sample image of the training set and testing set after the division. Normalizing the sample images makes subsequent image data processing more convenient and accelerates convergence during model training. The preprocessing comprises any one of the following modes: flipping the sample image vertically, flipping it horizontally, flipping it both vertically and horizontally, or applying no flip. That is, for each sample image of the preset training sample set, a processing mode is selected at random; for example, sample image I1 may be flipped vertically, sample image I2 may be flipped both vertically and horizontally, and sample image I3 may not be flipped at all.
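A minimal sketch of the random flip preprocessing. Applying the identical flip to the label mask is an assumption here (standard practice in segmentation, but not stated in this passage):

```python
import random
import numpy as np

def random_flip(img, mask):
    """Randomly apply one of four modes: vertical flip, horizontal flip,
    both, or no flip; the same flip is applied to image and label mask."""
    mode = random.choice(["vertical", "horizontal", "both", "none"])
    if mode in ("vertical", "both"):       # flip along the height axis
        img, mask = img[::-1], mask[::-1]
    if mode in ("horizontal", "both"):     # flip along the width axis
        img, mask = img[:, ::-1], mask[:, ::-1]
    return img, mask
```

Each call picks one of the four modes uniformly at random, matching the "select one processing mode at random per sample image" rule above.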
For example, suppose the preset training sample set is a portrait data set containing 2000 image samples, of which 1700 are used for training and 300 for testing, and each image sample is 800×600. Since the model input size is 416×416, each image sample of the portrait data set may be stretched to 832×832 and then scaled to 416×416; the image sample is then randomly preprocessed and normalized using its mean and variance.
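The stretch, scale and mean-variance normalization steps might look as follows. Nearest-neighbour resizing is assumed for brevity (the patent does not specify the interpolation method), and the intermediate/final sizes default to the 832 and 416 of the example:

```python
import numpy as np

def prepare_sample(img, inter=832, final=416):
    """Stretch to inter x inter, scale down to final x final, then apply
    mean-variance normalization."""
    h, w = img.shape[:2]
    rows = np.arange(inter) * h // inter    # nearest-neighbour stretch to
    cols = np.arange(inter) * w // inter    # inter x inter (e.g. 832 x 832)
    img = img[rows][:, cols]
    step = inter // final                   # 832 -> 416 is an exact 2x reduction
    img = img[::step, ::step].astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)   # mean-variance normalize
```

The output has zero mean and unit variance, which is what "normalized using mean variance" refers to.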
In one embodiment, when the improved YOLOv3 model is trained, the loss function may be the same as that of the U-Net model shown in fig. 1, namely the existing binary cross entropy loss function, which is not described in detail here. Training may use mini-batch stochastic gradient descent with momentum, which improves training speed; the learning rate may be 0.01, the momentum 0.9, and the weight decay (weight_decay) 0.0005. The learning rate decay factor (Factor) may be 0.5 and the patience (number of training rounds without progress) may be 5. The batch size (number of samples selected for one training step) for model training may be 8 and the number of epochs may be 100, where one epoch is defined as one forward pass and one backward pass over all training samples. The model training process may be accelerated using 4 GPUs (e.g., model 1080 Ti).
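These hyperparameters map naturally onto a PyTorch training configuration. This is a sketch: the `nn.Conv2d` stand-in for the improved YOLOv3 network and the choice of `BCEWithLogitsLoss` (binary cross entropy on logits) are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 1)           # stand-in for the improved YOLOv3 model
criterion = nn.BCEWithLogitsLoss()   # binary cross entropy loss

# mini-batch SGD with momentum, per the hyperparameters above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)

# halve the learning rate after 5 epochs without progress
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=5)

batch_size, epochs = 8, 100
```

After each epoch one would call `scheduler.step(validation_loss)` so the plateau detector sees the progress metric.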
In one embodiment, the trained image segmentation model is suitable for complex image segmentation tasks, and its segmentation effect is better than that of the U-Net model. The YOLOv3 model is itself an excellent target detection model, and the Darknet-53 network of the YOLOv3 model has strong overall performance, so the image segmentation model, as a U-shaped expansion of the YOLOv3 model, can theoretically guarantee good performance. Data experiments also show that the segmentation effect of the image segmentation model is stronger than that of the U-Net model: for example, on the portrait data of the prior paper Automatic Portrait Segmentation for Image Stylization, the Dice coefficient of the image segmentation model is 97.09%, while that of the U-Net model is 95.55%.
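The Dice coefficient used for this comparison is a standard overlap measure between a predicted and a ground-truth binary mask; the formulation below is the conventional one, not taken from the patent itself.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for binary masks;
    eps guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```

A Dice coefficient of 1.0 means perfect overlap, so 97.09% versus 95.55% corresponds to noticeably fewer mis-segmented pixels.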
And step S405, performing image segmentation processing on the image to be processed by using the image segmentation model.
In an embodiment, when the image segmentation model has been obtained through training, the image segmentation model may be used to perform image segmentation processing on the image to be processed. Specifically, an image to be processed may be input to the image segmentation model, and an image segmentation result of the image to be processed may be obtained from the output of the image segmentation model.
According to the method provided by this embodiment, U-shaped expansion is performed on the preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit; the improved YOLOv3 model is then trained with sample images of the preset training sample set to obtain an image segmentation model, which is used to perform image segmentation processing on the image to be processed. Because Darknet53, which contains a residual structure, is adopted as the downsampling network, the sampling network is deep, the model expression capacity is high, and the whole model is easy to train, so the model is applicable to complex image segmentation scenes and its segmentation effect is superior to that of the traditional U-Net model.
Referring to fig. 5, the present invention also correspondingly discloses an image segmentation apparatus 100 based on the improved YOLOv3 model based on the image segmentation method based on the improved YOLOv3 model disclosed in the above embodiment.
The image segmentation apparatus 100 based on the modified YOLOv3 model may include an expansion module 101, a training module 102, a processing module 103, and a preprocessing module 104.
The expansion module 101 is configured to perform U-shape expansion on the preset YOLOv3 model, so as to obtain an improved YOLOv3 model including a U-shaped sampling unit. The U-shaped sampling unit includes a down-sampling network 111 and an up-sampling network 112, where the down-sampling network 111 and the up-sampling network 112 are connected in a layer-skipping manner, the down-sampling network 111 may be formed by a Darknet53 network of the YOLOv3 model shown in fig. 2, and the up-sampling network 112 may be formed by connecting a plurality of up-sampling modules in the YOLOv3 model shown in fig. 2 in series.
The training module 102 is configured to train the improved YOLOv3 model by using a sample image of a preset training sample set, so as to obtain an image segmentation model.
The processing module 103 is configured to perform image segmentation processing on an image to be processed by using the image segmentation model.
In some alternative implementations of the present embodiment, a convolution-normalization linear module (CBL×5) is further connected between the downsampling network 111 and the upsampling network 112, where the convolution-normalization linear module comprises five serially connected convolution-normalization linear units (CBL), and each convolution-normalization linear unit (CBL) may comprise a 2D convolution layer (Convolutional Layer), a batch normalization layer (Batch Normalization Layer), and a leaky rectified linear layer (Leaky ReLU Layer).
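A CBL unit and the CBL×5 module can be sketched in PyTorch as follows. This is an assumed implementation for illustration; the kernel sizes, channel counts, and the LeakyReLU slope of 0.1 are conventions from common YOLOv3 implementations, not values stated in the patent.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Convolution-normalization linear unit: 2D convolution,
    batch normalization, and leaky ReLU."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

def cbl_x5(channels: int) -> nn.Sequential:
    """CBL x 5 module: five serially connected CBL units."""
    return nn.Sequential(*[CBL(channels, channels) for _ in range(5)])
```

With padding of `kernel_size // 2` and stride 1, the module preserves the spatial resolution of its input, which is what lets it sit between the downsampling and upsampling networks without changing feature-map size.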
In some alternative implementations of the present embodiment, the number of residual modules included in the downsampling network 111 is equal to the number of upsampling modules included in the upsampling network 112.
In some optional implementations of this embodiment, the downsampling network 111 comprises a convolution-normalization linear unit and N residual modules, and the upsampling network 112 comprises N upsampling modules. The output of the convolution-normalization linear unit (CBL) of the downsampling network 111 is skip-connected to the Nth upsampling module, the outputs of the first through (N-1)th residual modules are skip-connected to the (N-1)th through first upsampling modules respectively, and the Nth residual module is connected to the first upsampling module through the convolution-normalization linear module (CBL×5).
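The skip-connection wiring just described can be sketched abstractly as a forward pass, with every sub-network passed in as a callable. The function below is a hypothetical illustration of the connection order only; the actual modules are placeholders.

```python
def u_forward(x, cbl, residual_blocks, bottleneck, upsample_blocks, fuse):
    """U-shaped forward pass: the CBL output and the first N-1 residual
    outputs are skip-connected, in reverse order, to the N upsampling
    modules; the Nth residual output reaches the first upsampling module
    through the CBL x 5 bottleneck."""
    skips = []
    x = cbl(x)
    skips.append(x)              # CBL output -> Nth upsampling module
    for block in residual_blocks:
        x = block(x)
        skips.append(x)          # residual outputs 1..N (the Nth is not a skip)
    x = bottleneck(x)            # CBL x 5 after the Nth residual module
    # upsampling module i (1-based) fuses with the skip from stage N-i:
    # the first pairs with residual N-1, the last with the CBL output
    for i, up in enumerate(upsample_blocks):
        x = fuse(up(x), skips[len(residual_blocks) - 1 - i])
    return x
```

Running it with trivial arithmetic stand-ins for the modules confirms that the skips are consumed in exactly the reversed order the text specifies.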
In some optional implementations of this embodiment, the training module 102 trains the modified YOLOv3 model using the sample images of the preset training sample set, including: dividing the sample image of the preset training sample set into a training set and a testing set; training the improved YOLOv3 model by utilizing the training set; testing the trained improved YOLOv3 model by using the test set, and obtaining a model segmentation accuracy according to statistics of each test result; judging whether the model segmentation accuracy is greater than a preset threshold; and if the model segmentation accuracy is greater than the preset threshold, taking the trained improved YOLOv3 model as the image segmentation model.
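The split-train-test-threshold procedure performed by the training module can be sketched as follows. The helper callables, the 85/15 split ratio, the 0.95 threshold, and the round limit are all illustrative assumptions, not values given in the patent.

```python
import random

def train_until_accurate(model, samples, train_fn, test_fn,
                         threshold=0.95, train_ratio=0.85, max_rounds=10):
    """Divide samples into a training set and a test set, then train and
    test repeatedly until the measured segmentation accuracy exceeds the
    preset threshold (or a round limit is reached)."""
    shuffled = samples[:]
    random.shuffle(shuffled)
    split = int(len(shuffled) * train_ratio)
    train_set, test_set = shuffled[:split], shuffled[split:]
    for _ in range(max_rounds):
        model = train_fn(model, train_set)
        accuracy = sum(test_fn(model, s) for s in test_set) / len(test_set)
        if accuracy > threshold:
            return model   # the trained model becomes the segmentation model
    raise RuntimeError("accuracy never exceeded the preset threshold")
```

The per-sample `test_fn` scores stand in for the statistics over individual test results from which the overall segmentation accuracy is computed.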
In some optional implementations of the present embodiment, before the training module 102 divides the sample images of the preset training sample set into the training set and the test set, the preprocessing module 104 performs random preprocessing and normalization on each sample image of the preset training sample set, where the preprocessing comprises one of the following modes: vertically flipping the sample image, horizontally flipping the sample image, flipping the sample image both vertically and horizontally, or leaving the sample image unflipped.
In some alternative implementations of the present embodiment, the loss function of the modified YOLOv3 model is a binary cross entropy loss function.
It should be noted that, the image segmentation apparatus 100 based on the modified YOLOv3 model may be a chip, a component or a module, the image segmentation apparatus 100 based on the modified YOLOv3 model may include a processor and a memory, and the expansion module 101, the training module 102, the processing module 103, the preprocessing module 104, and the like are all stored as program units in the memory, and the processor executes the above-mentioned program units stored in the memory to implement corresponding functions.
The processor may include one or more kernels, and a kernel may call the memory to retrieve the corresponding program unit. The technical effects of the present application described above are achieved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable storage medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Referring to fig. 6, the invention also correspondingly discloses an electronic device 200 based on the image segmentation method based on the improved YOLOv3 model disclosed in the above embodiment.
The electronic device 200 may comprise a memory 10, a processor 20, a computer program 30 stored in said memory 10 and executable on said processor 20. The steps of the embodiment of the image segmentation method based on the modified YOLOv3 model described above, such as steps S401 to S405 shown in fig. 4, may be implemented by the processor 20 when executing the computer program 30. Alternatively, the processor 20, when executing the computer program 30, implements the functions of the modules in the embodiment of the image segmentation apparatus 100 based on the modified YOLOv3 model, such as the modules 101-104 in fig. 5.
The computer program 30 may be split into one or more modules that are stored in the memory 10 and executed by the processor 20 to complete the present invention. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device 200.
It is to be understood that the schematic diagram is merely an example of the electronic device 200 and does not constitute a limitation of it; the electronic device 200 may include more or fewer components than those illustrated, may combine some components, or may have different components. For example, the electronic device 200 may further include a network access device (not shown), a display device (not shown), a communication bus (not shown), etc.
The processor 20 may be a central processing unit (Central Processing Unit, CPU), or another general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor 20 may be any conventional processor or the like.
The memory 10 may be used to store the computer program 30 and/or modules, and the processor 20 implements the various functions of the electronic device 200 by running or executing the computer program 30 and/or modules stored in the memory 10 and invoking data stored in the memory 10. The memory 10 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other non-volatile solid-state storage device.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image segmentation method based on the improved YOLOv3 model provided by the above method embodiments.
An embodiment of the present invention provides a computer program product which, when executed on a data processing apparatus, causes the data processing apparatus to implement the image segmentation method based on the improved YOLOv3 model provided by the above method embodiments.
According to the image segmentation method, apparatus, electronic device, and computer-readable storage medium based on the improved YOLOv3 model, Darknet53, which contains a residual structure, is adopted as the downsampling network; the sampling network is therefore deep, the model expression capacity is high, and the whole model is easy to train, making the method applicable to complex image segmentation scenes with a segmentation effect superior to that of the traditional U-Net model.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Other corresponding changes and modifications will occur to those skilled in the art in light of the present teachings and the actual needs of the invention in connection with production, and such changes and modifications are intended to be within the scope of the present disclosure.

Claims (9)

1. An image segmentation method based on an improved YOLOv3 model, which is characterized by comprising the following steps:
performing U-shaped expansion on a preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, wherein the U-shaped sampling unit comprises a down-sampling network and an up-sampling network, the down-sampling network and the up-sampling network are connected in a layer-skipping manner, the down-sampling network is formed by a Darknet53 network of the preset YOLOv3 model, the up-sampling network is formed by connecting a plurality of up-sampling modules in the preset YOLOv3 model in series, a convolution-normalization linear module is further connected between the down-sampling network and the up-sampling network, the convolution-normalization linear module comprises a plurality of serially connected convolution-normalization linear units, the down-sampling network comprises a convolution-normalization linear unit and N residual modules, the up-sampling network comprises N up-sampling modules, the output end of the convolution-normalization linear unit of the down-sampling network is connected to the Nth up-sampling module in a layer-skipping manner, the output ends of the first residual module to the (N-1)th residual module are respectively connected to the (N-1)th up-sampling module to the first up-sampling module in a layer-skipping manner, and the Nth residual module is connected to the first up-sampling module through the convolution-normalization linear module;
training the improved YOLOv3 model by using sample images of a preset training sample set to obtain an image segmentation model; and
performing image segmentation processing on an image to be processed by using the image segmentation model.
2. The method of claim 1, wherein the convolution-normalization linear module comprises five serially connected convolution-normalization linear units, each consisting of a convolution layer, a batch normalization layer, and a leaky rectified linear layer.
3. The method of claim 2, wherein the downsampling network comprises a number of residual modules equal to a number of upsampling modules comprised by the upsampling network.
4. A method according to any one of claims 1 to 3, wherein training the modified YOLOv3 model with sample images of a preset training sample set comprises:
dividing the sample image of the preset training sample set into a training set and a testing set;
training the improved YOLOv3 model by utilizing the training set;
testing the trained improved YOLOv3 model by using the test set, and obtaining a model segmentation accuracy according to statistics of each test result;
judging whether the model segmentation accuracy is greater than a preset threshold;
and if the model segmentation accuracy is greater than the preset threshold, taking the trained improved YOLOv3 model as the image segmentation model.
5. The method of claim 4, wherein before dividing the sample images of the preset training sample set into the training set and the test set, the method further comprises:
performing random preprocessing and normalization on each sample image of the preset training sample set;
wherein the preprocessing comprises one of the following modes: vertically flipping the sample image, horizontally flipping the sample image, flipping the sample image both vertically and horizontally, or leaving the sample image unflipped.
6. A method according to any one of claims 1 to 3, characterized in that the loss function of the modified YOLOv3 model is a binary cross entropy loss function.
7. An image segmentation apparatus based on an improved YOLOv3 model, the apparatus comprising:
the expansion module is configured to perform U-shaped expansion on a preset YOLOv3 model to obtain an improved YOLOv3 model comprising a U-shaped sampling unit, wherein the U-shaped sampling unit comprises a down-sampling network and an up-sampling network, the down-sampling network and the up-sampling network are connected in a layer-skipping manner, the down-sampling network is formed by a Darknet53 network of the preset YOLOv3 model, the up-sampling network is formed by connecting a plurality of up-sampling modules in the preset YOLOv3 model in series, a convolution-normalization linear module is further connected between the down-sampling network and the up-sampling network, the convolution-normalization linear module comprises a plurality of serially connected convolution-normalization linear units, the down-sampling network comprises a convolution-normalization linear unit and N residual modules, the up-sampling network comprises N up-sampling modules, the output end of the convolution-normalization linear unit of the down-sampling network is connected to the Nth up-sampling module in a layer-skipping manner, the output ends of the first residual module to the (N-1)th residual module are respectively connected to the (N-1)th up-sampling module to the first up-sampling module in a layer-skipping manner, and the Nth residual module is connected to the first up-sampling module through the convolution-normalization linear module;
the training module is configured to train the improved YOLOv3 model by using sample images of a preset training sample set to obtain an image segmentation model; and
the processing module is configured to perform image segmentation processing on an image to be processed by using the image segmentation model.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the improved YOLOv3 model-based image segmentation method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the program, when executed by a processor, implements the image segmentation method based on the improved YOLOv3 model as claimed in any one of claims 1 to 6.
CN201911258179.XA 2019-12-10 2019-12-10 Image segmentation method, device, electronic equipment and computer readable storage medium Active CN112950638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911258179.XA CN112950638B (en) 2019-12-10 2019-12-10 Image segmentation method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112950638A CN112950638A (en) 2021-06-11
CN112950638B true CN112950638B (en) 2023-12-29

Family

ID=76225790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911258179.XA Active CN112950638B (en) 2019-12-10 2019-12-10 Image segmentation method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112950638B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628215B (en) * 2021-06-29 2022-10-04 展讯通信(上海)有限公司 Image processing method, system, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106920227A (en) * 2016-12-27 2017-07-04 北京工业大学 Based on the Segmentation Method of Retinal Blood Vessels that deep learning is combined with conventional method
CN109711326A (en) * 2018-12-25 2019-05-03 云南大学 A kind of video object detection method based on shallow-layer residual error network
CN109785334A (en) * 2018-12-17 2019-05-21 深圳先进技术研究院 Cardiac magnetic resonance images dividing method, device, terminal device and storage medium
CN109840913A (en) * 2019-01-21 2019-06-04 中南民族大学 The method and system of lump segmentation in a kind of mammography X
CN109993122A (en) * 2019-04-02 2019-07-09 中国石油大学(华东) A kind of pedestrian based on depth convolutional neural networks multiplies staircase anomaly detection method
CN110390673A (en) * 2019-07-22 2019-10-29 福州大学 Cigarette automatic testing method based on deep learning under a kind of monitoring scene
AU2019101142A4 (en) * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A combined deep-learning approach to fully automatic left ventricle segmentation in cardiac magnetic resonance imaging; Ramon A. et al.; Proc. SPIE 10953, Medical Imaging 2019: Biomedical Applications in Molecular, Structural, and Functional Imaging; full text *


Similar Documents

Publication Publication Date Title
US11113816B2 (en) Image segmentation apparatus, method and relevant computing device
CN111369563B (en) Semantic segmentation method based on pyramid void convolutional network
CN113159051B (en) Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN108875486A (en) Recongnition of objects method, apparatus, system and computer-readable medium
CN110490082B (en) Road scene semantic segmentation method capable of effectively fusing neural network features
CN109784372B (en) Target classification method based on convolutional neural network
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
CN106796716A (en) Apparatus and method for providing super-resolution for low-resolution image
CN113850824A (en) Remote sensing image road network extraction method based on multi-scale feature fusion
CN111259853A (en) High-resolution remote sensing image change detection method, system and device
CN111582007A (en) Object identification method, device and network
CN110457524B (en) Model generation method, video classification method and device
US20230177652A1 (en) Image restoration method and apparatus, and electronic device
CN112348830B (en) Multi-organ segmentation method based on improved 3D U-Net
CN111178217A (en) Method and equipment for detecting face image
CN110879984A (en) Face comparison method and device
CN112950638B (en) Image segmentation method, device, electronic equipment and computer readable storage medium
CN116342882A (en) Automatic segmentation method, system and equipment for cotton root system image
CN112651975A (en) Training method, device and equipment of lightweight network model
CN113361537B (en) Image semantic segmentation method and device based on channel attention
CN115344805A (en) Material auditing method, computing equipment and storage medium
CN103870563B (en) It is determined that the method and apparatus of the theme distribution of given text
CN114694005A (en) Target detection model training method and device, and target detection method and device
CN113160078B (en) Method, device and equipment for removing rain from traffic vehicle image in rainy day and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant