CN115690787A - Semantic segmentation method, image processing apparatus, and computer-readable storage medium


Info

Publication number
CN115690787A
CN115690787A
Authority
CN
China
Prior art keywords
feature map
fusion
semantic segmentation
module
scale
Prior art date
2022-11-07
Legal status
Pending
Application number
CN202211390885.1A
Other languages
Chinese (zh)
Inventor
陈勇全
于佳辉
马天宇
黄锐
郝祁
马睿
曾祥容
Current Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Institute of Artificial Intelligence and Robotics
Original Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Institute of Artificial Intelligence and Robotics
Priority date
2022-11-07
Filing date
2022-11-07
Publication date
2023-02-03
Application filed by Chinese University of Hong Kong Shenzhen, Shenzhen Institute of Artificial Intelligence and Robotics filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202211390885.1A
Publication of CN115690787A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a semantic segmentation method, an image processing device and a computer-readable storage medium, which are used for performing semantic segmentation on an original image while improving segmentation accuracy. The method of the embodiment of the application comprises the following steps: after an original image is obtained, the original image is input into a multi-scale feature extraction module of a semantic segmentation model to obtain first feature maps of N-level scales of the original image output by the multi-scale feature extraction module, where N is an integer greater than or equal to 2 and the semantic segmentation model is a pre-trained model; the first feature maps of the N-level scales are respectively input into a target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion process on the first feature maps of the N-level scales to obtain target fusion feature maps of the N-level scales output by the target fusion module; and a semantic segmentation image corresponding to the original image is generated according to the target fusion feature maps of the N-level scales.

Description

Semantic segmentation method, image processing apparatus, and computer-readable storage medium
Technical Field
The embodiment of the application relates to the field of image processing, and in particular to a semantic segmentation method, an image processing device and a computer-readable storage medium.
Background
The automobile industry is currently undergoing a transformation. Automatic driving allows a vehicle to drive itself without driver operation, and the related technologies are being developed and applied intensively along several paths, such as single-vehicle intelligence, vehicle-road cooperation, and networked cloud control. An automatic driving system senses the state of the vehicle and its surroundings in real time through sensors, makes planning decisions through an intelligent system, and finally executes driving operations through a control system. Although many researchers work in the related fields, segmentation for automatic driving still has many shortcomings, such as slow recognition of moving objects, low accuracy, and inaccurate prediction of static objects such as lane lines, which may cause serious traffic accidents and endanger personal and property safety. Therefore, the comprehensiveness and accuracy of image feature extraction in the semantic segmentation of automatic driving images play a key role for the whole automobile industry.
An existing semantic segmentation method is that of the Pyramid Occupancy Network (PON), which uses a residual module as the backbone feature extraction network and includes, within its deep convolutional neural network architecture, a Transformer module operating at a plurality of image scales.
However, the comprehensiveness and accuracy with which the Transformer module extracts image features are poor, resulting in low segmentation accuracy, i.e., a low intersection-over-union (IOU) between the semantic segmentation image output by the model and the labelled semantic segmentation image.
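For reference, the IOU compares the model's output with the labelled segmentation image class by class. A minimal sketch follows; representing both images as arrays of per-pixel class indices is an assumption for illustration only:

```python
import numpy as np

def class_iou(pred: np.ndarray, label: np.ndarray, cls: int) -> float:
    """Intersection-over-union of one semantic class between a predicted
    segmentation image and its labelled counterpart."""
    p = pred == cls
    g = label == cls
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else float("nan")
```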
Disclosure of Invention
The embodiment of the application provides a semantic segmentation method, an image processing device and a computer-readable storage medium, which are used for performing semantic segmentation on an original image while improving the intersection-over-union between the semantic segmentation image output by the model and the labelled semantic segmentation image.
In a first aspect, an embodiment of the present application provides a semantic segmentation method, including:
after an original image is obtained, inputting the original image into a multi-scale feature extraction module of a semantic segmentation model to obtain a first feature map of N-level scale of the original image output by the multi-scale feature extraction module; the N is an integer greater than or equal to 2; the semantic segmentation model is a pre-trained model;
respectively inputting the first feature maps of the N-level scales into a target fusion module of the semantic segmentation model, and obtaining the target fusion feature maps of the N-level scales output by the target fusion module after the target fusion module performs at least one fusion process on the first feature maps of the N-level scales;
and generating a semantic segmentation image corresponding to the original image according to the target fusion feature map with the N-level scale.
Optionally, before the original image is input into the multi-scale feature extraction module of the semantic segmentation model, the method further includes:
preprocessing the original image to obtain a preprocessed original image;
the multi-scale feature extraction module for inputting the original image into the semantic segmentation model comprises:
and inputting the preprocessed original image into the multi-scale feature extraction module.
Optionally, the target fusion module includes a first fusion module and a second fusion module;
after the target fusion module performs at least one fusion process on the first feature map with the N-level scale, the target fusion feature map with the N-level scale output by the target fusion module is obtained, which includes:
performing first fusion processing on the first feature map of the N-level scale by the first fusion module to obtain a first fusion feature map of the N-level scale output by the first fusion module;
performing second fusion processing on the first fusion feature map of the N-level scale by the second fusion module to obtain a second fusion feature map of the N-level scale output by the second fusion module; and the second fusion feature map of the N-level scale is the target fusion feature map of the N-level scale.
Optionally, the performing, by the first fusion module, the first fusion processing on the first feature map of the N-level scale to obtain the first fusion feature map of the N-level scale output by the first fusion module includes:
for the first fusion feature map of the (n-1)th-level scale, inputting the first feature map of the nth-level scale into an up-sampling module of the semantic segmentation model, and performing up-sampling processing by the up-sampling module to obtain an up-sampling feature map of the nth-level scale; wherein n is greater than or equal to 2 and less than or equal to N;
performing, by the first fusion module, first fusion processing on the up-sampling feature map of the nth-level scale and the first feature map of the (n-1)th-level scale to obtain a first fusion feature map of the (n-1)th-level scale output by the first fusion module;
and regarding the first fusion feature map of the Nth-level scale, taking the first feature map of the Nth-level scale as the first fusion feature map of the Nth-level scale.
Optionally, the performing, by the second fusion module, the second fusion processing on the first fusion feature map of the N-level scale to obtain a second fusion feature map of the N-level scale output by the second fusion module includes:
regarding the second fused feature map of the 1 st-level scale, taking the first fused feature map of the 1 st-level scale as the second fused feature map of the 1 st-level scale;
for the second fusion feature map of the nth-level scale, inputting the first feature map of the (n-1)th-level scale into a downsampling module of the semantic segmentation model, and carrying out downsampling processing by the downsampling module to obtain a downsampling feature map of the (n-1)th-level scale; wherein n is greater than or equal to 2 and less than or equal to N;
performing, by the second fusion module, second fusion processing on the down-sampled feature map of the (n-1)th-level scale and the first fusion feature map of the nth-level scale to obtain a second fusion feature map of the nth-level scale output by the second fusion module.
optionally, the generating a semantic segmentation image corresponding to the original image according to the target fusion feature map of the N-level scale includes:
inputting the target fusion feature map with the N-level scale into a global convolution module of the semantic segmentation model to obtain a second feature map with the N-level scale output by the global convolution module;
and generating the semantic segmentation image corresponding to the original image according to the second feature map of the N-level scale.
Optionally, the generating the semantic segmentation image corresponding to the original image according to the second feature map of the N-level scale includes:
inputting the second feature map of the N-level scale into an edge refining module of the semantic segmentation model to obtain a third feature map of the N-level scale output by the edge refining module;
and generating the semantic segmentation image corresponding to the original image according to the third feature map.
Optionally, before the original image is input into the multi-scale feature extraction module of the semantic segmentation model, the method further includes:
obtaining an original image sample; wherein, each original image sample is marked with a semantic segmentation image;
inputting the original image sample into a semantic segmentation model to obtain a predicted semantic segmentation image corresponding to the original image sample output by the semantic segmentation model;
and calculating the loss between the predicted semantic segmentation image and the annotated semantic segmentation image according to a regression loss function, and obtaining a trained semantic segmentation model when the loss meets a convergence condition.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the aforementioned semantic segmentation method.
In a third aspect, embodiments of the present application provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the aforementioned semantic segmentation method.
In a fourth aspect, the present application provides a computer program product containing instructions for causing a computer to perform the aforementioned semantic segmentation method when the computer program product runs on the computer.
According to the above technical solutions, the embodiment of the application has the following advantages: after the first feature maps of the N-level scales of the original image are obtained, they are respectively input into the target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion process on them, so that the comprehensiveness and accuracy of image feature extraction are improved, and the intersection-over-union between the semantic segmentation image output by the model and the labelled semantic segmentation image is improved.
Drawings
Fig. 1 is a schematic flow chart of a semantic segmentation method disclosed in an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for obtaining a semantic segmentation image by inputting an original image into a semantic segmentation model for semantic segmentation, disclosed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a global convolution module disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an edge refinement module disclosed in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an image processing apparatus disclosed in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a semantic segmentation method, an image processing device and a computer-readable storage medium, which are used for performing semantic segmentation on an original image while improving the intersection-over-union between the semantic segmentation image output by the model and the labelled semantic segmentation image.
Referring to fig. 1, fig. 1 is a schematic flow chart of a semantic segmentation method disclosed in an embodiment of the present application, where the method includes:
101. after an original image is obtained, inputting the original image into a multi-scale feature extraction module of a semantic segmentation model to obtain a first feature map of N-level scales of the original image output by the multi-scale feature extraction module; n is an integer greater than or equal to 2; the semantic segmentation model is a pre-trained model.
In the embodiment, when performing semantic segmentation, an original image can be obtained, and after the original image is obtained, the original image can be input into a multi-scale feature extraction module of a semantic segmentation model to obtain a first feature map of an N-level scale of the original image output by the multi-scale feature extraction module; n is an integer greater than or equal to 2; the semantic segmentation model is a pre-trained model.
102. And respectively inputting the first feature maps with the N-level scales into a target fusion module of the semantic segmentation model, and performing fusion processing on the first feature maps with the N-level scales at least once by using the target fusion module to obtain the target fusion feature maps with the N-level scales output by the target fusion module.
After the first feature maps of the N-level scales of the original image output by the multi-scale feature extraction module are obtained, the first feature maps of the N-level scales may be input into the target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion process on them to obtain the target fusion feature maps of the N-level scales output by the target fusion module. It can be understood that the at least one fusion process may proceed as follows: the first feature maps of the N-level scales are first subjected to a first fusion process to obtain first fusion feature maps of the N-level scales, and the first fusion feature maps of the N-level scales are then subjected to a second fusion process to obtain second fusion feature maps of the N-level scales; the second fusion feature maps of the N-level scales are the target fusion feature maps of the N-level scales. Other reasonable ways of performing at least one fusion process are also possible, which is not limited herein.
103. And generating a semantic segmentation image corresponding to the original image according to the target fusion feature map with the N-level scale.
After the target fusion feature map of N-level scale output by the target fusion module is obtained, the semantic segmentation image corresponding to the original image can be generated according to the target fusion feature map of N-level scale.
In the embodiment of the application, after the first feature maps of the N-level scales of the original image are obtained, they are respectively input into the target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion process on them, so that the comprehensiveness and accuracy of image feature extraction are improved, and the intersection-over-union between the semantic segmentation image output by the model and the labelled semantic segmentation image is improved.
In this embodiment of the application, after the target fusion module performs at least one fusion process on the first feature map with the N-level scale, there may be multiple methods for obtaining the target fusion feature map with the N-level scale output by the target fusion module, and one of the methods is described below based on the semantic segmentation method shown in fig. 1.
In this embodiment, an original image can be obtained when performing semantic segmentation. Specifically, semantic segmentation means that the many objects in the original image, such as drivable areas, lane lines, buses, cars, pedestrians and obstacles, are segmented semantically into a semantic segmentation map (a bird's-eye view, BEV). The original image may be obtained from a public large-scale automatic driving dataset (the nuScenes dataset), which was collected by acquisition vehicles in 1000 scenes across different cities; each acquisition vehicle is equipped with a full sensor suite comprising 6 cameras (CAM), 1 LiDAR, 5 millimeter-wave radars (RADAR), an IMU and a GPS. Other reasonable methods for obtaining the original image are also possible, which is not limited herein.
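As a hypothetical illustration of pulling original images from nuScenes with the official devkit, a minimal sketch follows; the dataroot, the v1.0-mini split and the choice of the front camera are assumptions not taken from the patent:

```python
from nuscenes.nuscenes import NuScenes  # pip install nuscenes-devkit

# Load a nuScenes split; version and dataroot are illustrative assumptions.
nusc = NuScenes(version="v1.0-mini", dataroot="/data/nuscenes", verbose=False)

for sample in nusc.sample:
    cam_token = sample["data"]["CAM_FRONT"]            # front-camera frame of this sample
    image_path = nusc.get_sample_data_path(cam_token)  # original image fed to the model
```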
After the original image is obtained, the original image may be preprocessed to obtain a preprocessed original image. Specifically, the preprocessing may include normalization, scaling and the like, where normalization converts the pixel values of the original image to the range 0-1, and scaling resizes the height and width of the input image using a resize() function. It should be understood that preprocessing the original image can reduce the dependence of the network model on particular features and improve the generalization and robustness of the model.
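A minimal sketch of this preprocessing, assuming OpenCV for scaling; the interpolation mode and the 196 × 200 target size (taken from the 3 × 196 × 200 example below) are assumptions:

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, height: int = 196, width: int = 200) -> np.ndarray:
    """Scale the input image and normalize pixel values to the range 0-1."""
    resized = cv2.resize(image, (width, height), interpolation=cv2.INTER_LINEAR)
    normalized = resized.astype(np.float32) / 255.0  # pixel values converted to 0-1
    return normalized.transpose(2, 0, 1)             # HWC -> CHW, giving 3 x 196 x 200
```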
After the preprocessed original image is obtained, it can be input into the multi-scale feature extraction module of the semantic segmentation model to obtain the first feature maps of the N-level scales of the original image output by the multi-scale feature extraction module; N is an integer greater than or equal to 2, and the semantic segmentation model is a pre-trained model. It can be understood that the pre-trained semantic segmentation model may be an improved FPN-style pyramid occupancy network semantic segmentation model, and the multi-scale feature extraction module may be a residual module (ResNet50 module). For example, the preprocessed original image may have a size of 3 × 196 × 200 and is input into the ResNet50 module to obtain the first feature maps of the N-level scales of the original image output by the ResNet50 module. Referring to fig. 2, a schematic flow chart of inputting the original image into the semantic segmentation model to obtain the semantic segmentation image, N is 5 in fig. 2: the first feature map of the 1st-level scale may be a 1/8-scale first feature map, the first feature map of the 2nd-level scale may be a 1/16-scale first feature map, the first feature map of the 3rd-level scale may be a 1/32-scale first feature map, the first feature map of the 4th-level scale may be a 1/64-scale first feature map, and the first feature map of the 5th-level scale may be a 1/128-scale first feature map; the higher the scale level, the smaller the size of the first feature map.
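A sketch of such a five-level extractor built on torchvision's ResNet-50 is shown below. ResNet-50 natively provides the 1/8, 1/16 and 1/32 features; the two extra strided convolutions producing the 1/64 and 1/128 levels are assumptions, since the text fixes only the backbone and the number of levels:

```python
import torch.nn as nn
from torchvision.models import resnet50

class MultiScaleExtractor(nn.Module):
    """Outputs first feature maps at 1/8, 1/16, 1/32, 1/64 and 1/128 scale."""

    def __init__(self):
        super().__init__()
        r = resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)  # -> 1/4
        self.layer2, self.layer3, self.layer4 = r.layer2, r.layer3, r.layer4
        self.down5 = nn.Conv2d(2048, 2048, 3, stride=2, padding=1)  # 1/32 -> 1/64
        self.down6 = nn.Conv2d(2048, 2048, 3, stride=2, padding=1)  # 1/64 -> 1/128

    def forward(self, x):
        c1 = self.layer2(self.stem(x))   # level 1: 1/8, 512 channels
        c2 = self.layer3(c1)             # level 2: 1/16, 1024 channels
        c3 = self.layer4(c2)             # level 3: 1/32, 2048 channels
        c4 = self.down5(c3)              # level 4: 1/64
        c5 = self.down6(c4)              # level 5: 1/128
        return [c1, c2, c3, c4, c5]      # first feature maps, levels 1..5
```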
After the first feature maps of the N-level scales of the original image output by the multi-scale feature extraction module are obtained, the first feature maps of the N-level scales can be respectively input into a target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion processing on the first feature maps of the N-level scales to obtain target fusion feature maps of the N-level scales output by the target fusion module.
The target fusion module may perform at least one fusion process on the first feature maps of the N-level scales as follows: the first fusion module performs a first fusion process on the first feature maps of the N-level scales to obtain the first fusion feature maps of the N-level scales output by the first fusion module, and the second fusion module then performs a second fusion process on the first fusion feature maps of the N-level scales to obtain the second fusion feature maps of the N-level scales output by the second fusion module. The target fusion module comprises the first fusion module and the second fusion module, and the second fusion feature maps of the N-level scales are the target fusion feature maps of the N-level scales.
The first fusion module performs the first fusion process on the first feature maps of the N-level scales as follows. For the first fusion feature map of the (n-1)th-level scale, the first feature map of the nth-level scale is input into an upsampling module of the semantic segmentation model, and the upsampling module performs upsampling to obtain an upsampled feature map of the nth-level scale, wherein n is greater than or equal to 2 and less than or equal to N; the first fusion module then performs the first fusion process on the upsampled feature map of the nth-level scale and the first feature map of the (n-1)th-level scale to obtain the first fusion feature map of the (n-1)th-level scale output by the first fusion module. For the first fusion feature map of the Nth-level scale, the first feature map of the Nth-level scale is taken as the first fusion feature map of the Nth-level scale.
With reference to fig. 2, it can be seen that, for the first fusion feature map of the 1st-level scale, the first feature map of the 2nd-level scale (the 1/16-scale first feature map) may be input into the upsampling module of the semantic segmentation model, and the upsampling module performs upsampling to obtain the upsampled feature map of the 2nd-level scale (the 1/8-scale upsampled feature map obtained by upsampling the 1/16-scale first feature map); the first fusion module then performs the first fusion process on the upsampled feature map of the 2nd-level scale and the first feature map of the 1st-level scale to obtain the first fusion feature map of the 1st-level scale output by the first fusion module. It can be understood that the first fusion feature maps of the 2nd-, 3rd- and 4th-level scales are obtained in a similar way and are not described here again. For the first fusion feature map of the 5th-level scale, the first feature map of the 5th-level scale may be used directly.
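A sketch of this top-down first fusion is given below. The 1 × 1 lateral convolutions that align channel counts, the nearest-neighbour upsampling, and addition as the fusion operation are assumptions; the text fixes only that the nth-level first feature map is upsampled and fused with the (n-1)th-level first feature map:

```python
import torch.nn as nn
import torch.nn.functional as F

class FirstFusion(nn.Module):
    """Top-down first fusion over the first feature maps of N scale levels."""

    def __init__(self, in_channels=(512, 1024, 2048, 2048, 2048), mid=256):
        super().__init__()
        # 1x1 laterals (an assumption) align all levels to `mid` channels.
        self.lateral = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in in_channels)

    def forward(self, feats):                    # feats: levels 1..N, fine -> coarse
        proj = [l(f) for l, f in zip(self.lateral, feats)]
        fused = [None] * len(proj)
        fused[-1] = proj[-1]                     # level N: its first map is reused as-is
        for n in range(len(proj) - 2, -1, -1):
            up = F.interpolate(proj[n + 1], size=proj[n].shape[-2:], mode="nearest")
            fused[n] = proj[n] + up              # fuse upsampled level n into level n-1
        return proj, fused                       # channel-aligned maps + first-fusion maps
```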
The second fusion module performs the second fusion process on the first fusion feature maps of the N-level scales as follows. For the second fusion feature map of the 1st-level scale, the first fusion feature map of the 1st-level scale is taken as the second fusion feature map of the 1st-level scale. For the second fusion feature map of the nth-level scale, the first feature map of the (n-1)th-level scale is input into a downsampling module of the semantic segmentation model, and the downsampling module performs downsampling to obtain a downsampled feature map of the (n-1)th-level scale, wherein n is greater than or equal to 2 and less than or equal to N; the second fusion module then performs the second fusion process on the downsampled feature map of the (n-1)th-level scale and the first fusion feature map of the nth-level scale to obtain the second fusion feature map of the nth-level scale output by the second fusion module.
With continuing reference to fig. 2, for the second fusion feature map of the 1st-level scale, the first fusion feature map of the 1st-level scale can be used directly; for the second fusion feature map of the 2nd-level scale, the first feature map of the 1st-level scale is input into the downsampling module of the semantic segmentation model, and the downsampling module performs downsampling to obtain the downsampled feature map of the 1st-level scale; the second fusion module then performs the second fusion process on the downsampled feature map of the 1st-level scale and the first fusion feature map of the 2nd-level scale to obtain the second fusion feature map of the 2nd-level scale output by the second fusion module. The second fusion feature maps of the 3rd-, 4th- and 5th-level scales are obtained in a similar way and are not repeated here.
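A matching sketch of the bottom-up second fusion follows; a stride-2 3 × 3 convolution as the downsampling module and addition as the fusion operation are assumptions, and the first feature maps are assumed to have been channel-aligned already (the `proj` output of the FirstFusion sketch above):

```python
import torch.nn as nn

class SecondFusion(nn.Module):
    """Bottom-up second fusion producing the target fusion feature maps."""

    def __init__(self, channels=256, levels=5):
        super().__init__()
        # One stride-2 downsampling conv per level transition (an assumption).
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(levels - 1))

    def forward(self, first_feats, first_fused):
        second = [first_fused[0]]                     # level 1: first-fusion map reused as-is
        for n in range(1, len(first_fused)):          # patent level n = index n + 1
            d = self.down[n - 1](first_feats[n - 1])  # downsample level n-1 first map
            second.append(first_fused[n] + d)         # fuse into level n first-fusion map
        return second                                 # second (target) fusion feature maps
```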
It is worth mentioning that performing the fusion process twice improves the comprehensiveness and accuracy of image feature extraction and therefore the segmentation accuracy, i.e., the intersection-over-union between the semantic segmentation image output by the model and the labelled semantic segmentation image.
After the target fusion feature map of N-level scale output by the target fusion module is obtained, the semantic segmentation image corresponding to the original image can be generated according to the target fusion feature map of N-level scale.
The semantic segmentation image corresponding to the original image may be generated according to the target fusion feature maps of the N-level scales by inputting the target fusion feature maps of the N-level scales into a global convolution module of the semantic segmentation model to obtain second feature maps of the N-level scales output by the global convolution module. Referring to fig. 3, a schematic structural diagram of the global convolution module, 14 in fig. 3 is the number of semantic categories. The global convolution module has two branches that together replace a k × k convolution with 1 × k and k × 1 convolutions, and no nonlinear activation function follows the module. This reduces the amount of computation while preserving a sufficient receptive field, improving detection speed without sacrificing accuracy and thereby improving timeliness. In addition, the global convolution module replaces fully connected layers and global pooling, reducing the possibility of losing localization information and improving localization performance.
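A sketch of such a module follows. Two symmetric branches and k = 7 are assumptions; the text fixes only the 1 × k / k × 1 factorization, the absence of a following nonlinearity, and the 14 output channels (one per semantic category):

```python
import torch.nn as nn

class GlobalConvModule(nn.Module):
    """Global convolution sketch: a large k x k kernel factorized into
    1 x k and k x 1 convolutions with no activation afterwards."""

    def __init__(self, in_ch, out_ch=14, k=7):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)  # no activation follows, as described
```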
After the second feature maps of the N-level scales output by the global convolution module are obtained, they may be input into an edge refinement module of the semantic segmentation model to obtain third feature maps of the N-level scales output by the edge refinement module, and the semantic segmentation image corresponding to the original image may then be generated according to the third feature maps. Referring to fig. 4, a schematic structural diagram of the edge refinement module, fig. 4 shows a residual connection in which the w × h × 14 input at the top is the coarse feature map; the residual connection on the side branch improves the boundaries of recognized objects to achieve an edge refinement effect, thereby increasing the IOU of the semantic segmentation image.
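A minimal sketch of the residual edge refinement described above; the 3 × 3 kernels and the single ReLU inside the side branch are assumptions beyond the patent text, which fixes only the residual connection around the coarse w × h × 14 map:

```python
import torch.nn as nn

class EdgeRefineModule(nn.Module):
    """Edge refinement sketch: a side residual branch sharpens boundaries."""

    def __init__(self, channels=14):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, coarse):
        return coarse + self.residual(coarse)  # residual connection refines boundaries
```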
It is worth mentioning that, before the original image is input into the multi-scale feature extraction module of the semantic segmentation model, the semantic segmentation model can be trained. The training may proceed as follows: first, original image samples are obtained, where each original image sample is annotated with a semantic segmentation image; the original image samples are then input into the semantic segmentation model to obtain the predicted semantic segmentation images output by the model; finally, the loss between the predicted semantic segmentation images and the annotated semantic segmentation images is calculated according to a regression loss function, and the trained semantic segmentation model is obtained when the loss meets a convergence condition. Specifically, training can be performed with an Adam optimizer at a learning rate of 0.001.
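A training sketch under these settings; Adam with learning rate 0.001 follows the text, while the use of smooth L1 as the regression loss, the epoch count and the omission of an explicit convergence check are assumptions:

```python
import torch

def train_model(model, loader, epochs=10, device="cuda"):
    """Optimize the segmentation model against annotated segmentation images."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = torch.nn.SmoothL1Loss()              # a regression loss function
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:                # labels: annotated segmentation maps
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # predicted vs. annotated images
            loss.backward()
            optimizer.step()
```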
It is worth mentioning that, compared with the prior-art Pyramid Occupancy Network (PON), the improved pyramid feature extraction network of this embodiment extracts semantics more fully and generates more accurate results; a single detection takes only 0.2 seconds on a computer using a GPU (NVIDIA RTX 3090). Compared with the monocular semantic occupancy grid mapping (VED) method based on a convolutional variational encoder-decoder network, i.e., a method that predicts the semantic segmentation image with an encoder-decoder network model taking RGB video as input, this embodiment improves the comprehensiveness of feature extraction and the accuracy of semantic segmentation, and improves the semantic segmentation of dynamic objects, so the method is suitable for actual driving scenarios. Compared with the cross-view semantic segmentation (VPN) method for sensing the surroundings, i.e., a method that models surrounding objects through a view relation module for semantic segmentation, this embodiment improves the completeness of spatial information and ensures the semantic segmentation accuracy of both small and large targets, thereby improving the overall semantic segmentation accuracy and the IOU.
It can be understood that, besides the methods described above for the target fusion module performing at least one fusion process on the first feature maps of the N-level scales, for the first fusion module performing the first fusion process, for the second fusion module performing the second fusion process, and for generating the semantic segmentation image corresponding to the original image according to the target fusion feature maps of the N-level scales, other reasonable methods are also possible, which is not limited herein.
In this embodiment, after the first feature maps of the N-level scales of the original image are obtained, they are input into the target fusion module of the semantic segmentation model, and the target fusion module performs at least one fusion process on them, so that the comprehensiveness and accuracy of image feature extraction are improved and the segmentation accuracy is improved, that is, the intersection-over-union (IOU) between the semantic segmentation image output by the model and the labelled semantic segmentation image is improved. Secondly, the target fusion feature maps of the N-level scales can be input into the global convolution module of the semantic segmentation model for further feature extraction to obtain the second feature maps of the N-level scales output by the global convolution module, which improves the comprehensiveness and accuracy of image feature extraction and remedies the incomplete use of feature layers in the model, thereby improving segmentation accuracy and the IOU. Moreover, the second feature maps of the N-level scales output by the global convolution module can be input into the edge refinement module to obtain the third feature maps of the N-level scales output by the edge refinement module, improving the degree of edge refinement and thus the segmentation accuracy. Finally, performing feature extraction through the multi-scale feature extraction module, the target fusion module, the global convolution module and the edge refinement module in sequence improves the integrity of the semantic segmentation model and increases semantic segmentation speed while extracting more semantic features.
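Putting the sketches above together, a hypothetical end-to-end assembly might look as follows; how the N per-level outputs are merged into one segmentation image is not fixed by the text, so upsampling them to the finest resolution and summing them here is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class SemanticSegmentationModel(nn.Module):
    """Chains the multi-scale extractor, two-stage fusion, global convolution
    and edge refinement sketches defined above (all assumptions as noted)."""

    def __init__(self, levels=5, channels=256, classes=14):
        super().__init__()
        self.backbone = MultiScaleExtractor()
        self.fuse1 = FirstFusion(mid=channels)
        self.fuse2 = SecondFusion(channels=channels, levels=levels)
        self.gcn = nn.ModuleList(GlobalConvModule(channels, classes) for _ in range(levels))
        self.refine = nn.ModuleList(EdgeRefineModule(classes) for _ in range(levels))

    def forward(self, x):
        first = self.backbone(x)            # first feature maps of the N-level scales
        proj, fused1 = self.fuse1(first)    # first fusion feature maps
        fused2 = self.fuse2(proj, fused1)   # target fusion feature maps
        outs = [r(g(f)) for r, g, f in zip(self.refine, self.gcn, fused2)]
        size = outs[0].shape[-2:]           # finest (level 1) resolution
        return sum(F.interpolate(o, size=size, mode="bilinear", align_corners=False)
                   for o in outs)           # per-pixel class scores
```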
Referring to fig. 5, an embodiment of the image processing apparatus 500 in the embodiment of the present application includes:
a central processing unit 501, a memory 505, an input/output interface 504, a wired or wireless network interface 503 and a power supply 502;
memory 505 is a transient storage memory or a persistent storage memory;
the central processor 501 is configured to communicate with the memory 505 and execute the instructions in the memory 505 to perform the method described in the foregoing embodiment shown in fig. 1.
The embodiment of the present application further provides a computer-readable storage medium including instructions which, when executed on a computer, cause the computer to perform the method in the foregoing embodiment shown in fig. 1.
The embodiment of the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method in the foregoing embodiment shown in fig. 1.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order illustrated and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims (10)

1. A method of semantic segmentation, comprising:
after an original image is obtained, inputting the original image into a multi-scale feature extraction module of a semantic segmentation model to obtain a first feature map of N-level scales of the original image output by the multi-scale feature extraction module; the N is an integer greater than or equal to 2; the semantic segmentation model is a pre-trained model;
respectively inputting the first feature maps of the N-level scales into a target fusion module of the semantic segmentation model, and obtaining the target fusion feature maps of the N-level scales output by the target fusion module after the target fusion module performs at least one fusion process on the first feature maps of the N-level scales;
and generating a semantic segmentation image corresponding to the original image according to the target fusion feature map with the N-level scale.
2. The method of claim 1, wherein prior to inputting the original image into a multi-scale feature extraction module of a semantic segmentation model, the method further comprises:
preprocessing the original image to obtain a preprocessed original image;
the multi-scale feature extraction module for inputting the original image into the semantic segmentation model comprises:
and inputting the preprocessed original image into the multi-scale feature extraction module.
3. The method of claim 1, wherein the target fusion module comprises a first fusion module and a second fusion module;
after the target fusion module performs at least one fusion process on the first feature map with the N-level scale, the target fusion feature map with the N-level scale output by the target fusion module is obtained, which includes:
performing first fusion processing on the first feature map of the N-level scale by the first fusion module to obtain a first fusion feature map of the N-level scale output by the first fusion module;
performing second fusion processing on the first fusion feature map of the N-level scale by the second fusion module to obtain a second fusion feature map of the N-level scale output by the second fusion module; and the second fusion feature map of the N-level scale is the target fusion feature map of the N-level scale.
4. The method according to claim 3, wherein the performing, by the first fusion module, the first fusion process on the first feature map of the N-level scale to obtain the first fused feature map of the N-level scale output by the first fusion module includes:
for the first fusion feature map of the (n-1)th-level scale, inputting the first feature map of the nth-level scale into an up-sampling module of the semantic segmentation model, and performing up-sampling processing by the up-sampling module to obtain an up-sampling feature map of the nth-level scale; wherein n is greater than or equal to 2 and less than or equal to N;
performing, by the first fusion module, first fusion processing on the up-sampling feature map of the nth-level scale and the first feature map of the (n-1)th-level scale to obtain a first fusion feature map of the (n-1)th-level scale output by the first fusion module;
and regarding the first fusion feature map of the Nth-level scale, taking the first feature map of the Nth-level scale as the first fusion feature map of the Nth-level scale.
5. The method according to claim 4, wherein the performing, by the second fusion module, the second fusion process on the first fusion feature map of the N-level scale to obtain the second fusion feature map of the N-level scale output by the second fusion module includes:
regarding the second fusion feature map of the 1 st-level scale, taking the first fusion feature map of the 1 st-level scale as the second fusion feature map of the 1 st-level scale;
for the second fusion feature map of the nth-level scale, inputting the first feature map of the (n-1)th-level scale into a down-sampling module of the semantic segmentation model, and performing down-sampling processing by the down-sampling module to obtain a down-sampling feature map of the (n-1)th-level scale; wherein n is greater than or equal to 2 and less than or equal to N;
and performing, by the second fusion module, second fusion processing on the down-sampling feature map of the (n-1)th-level scale and the first fusion feature map of the nth-level scale to obtain a second fusion feature map of the nth-level scale output by the second fusion module.
6. The method according to claim 1, wherein the generating of the semantic segmentation image corresponding to the original image according to the target fusion feature map of the N-level scale comprises:
inputting the target fusion feature map of the N-level scale into a global convolution module of the semantic segmentation model to obtain a second feature map of the N-level scale output by the global convolution module;
and generating the semantic segmentation image corresponding to the original image according to the second feature map with the N-level scale.
7. The method according to claim 6, wherein the generating the semantic segmentation image corresponding to the original image according to the second feature map of the N-level scale comprises:
inputting the second feature map of the N-level scale into an edge refining module of the semantic segmentation model to obtain a third feature map of the N-level scale output by the edge refining module;
and generating the semantic segmentation image corresponding to the original image according to the third feature map.
8. The method of claim 1, wherein prior to inputting the original image into a multi-scale feature extraction module of a semantic segmentation model, the method further comprises:
obtaining an original image sample; wherein, each original image sample is marked with a semantic segmentation image;
inputting the original image sample into a semantic segmentation model to obtain a predicted semantic segmentation image corresponding to the original image sample output by the semantic segmentation model;
and calculating the loss between the predicted semantic segmentation image and the annotated semantic segmentation image according to a regression loss function, and obtaining a trained semantic segmentation model when the loss meets a convergence condition.
9. An image processing apparatus characterized by comprising:
the system comprises a central processing unit, a memory, an input/output interface, a wired or wireless network interface and a power supply;
the memory is a transient storage memory or a persistent storage memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method of any of claims 1 to 8.
10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 8.
CN202211390885.1A 2022-11-07 2022-11-07 Semantic segmentation method, image processing apparatus, and computer-readable storage medium Pending CN115690787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211390885.1A CN115690787A (en) 2022-11-07 2022-11-07 Semantic segmentation method, image processing apparatus, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211390885.1A CN115690787A (en) 2022-11-07 2022-11-07 Semantic segmentation method, image processing apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115690787A true CN115690787A (en) 2023-02-03

Family

ID=85049602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211390885.1A Pending CN115690787A (en) 2022-11-07 2022-11-07 Semantic segmentation method, image processing apparatus, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115690787A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363161A (en) * 2023-06-02 2023-06-30 清华大学 Multi-category segmentation method and device for cement hydration image
CN116363161B (en) * 2023-06-02 2023-08-01 清华大学 Multi-category segmentation method and device for cement hydration image

Similar Documents

Publication Publication Date Title
JP7289918B2 (en) Object recognition method and device
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN112418236B (en) Automobile drivable area planning method based on multitask neural network
EP4152204A1 (en) Lane line detection method, and related apparatus
CN113343778B (en) Lane line detection method and system based on LaneSegNet
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN112861619A (en) Model training method, lane line detection method, equipment and device
CN114802261B (en) Parking control method, obstacle recognition model training method and device
CN115546768A (en) Pavement marking identification method and system based on multi-scale mechanism and attention mechanism
CN113095152A (en) Lane line detection method and system based on regression
CN111931683A (en) Image recognition method, image recognition device and computer-readable storage medium
CN114764856A (en) Image semantic segmentation method and image semantic segmentation device
CN115376089A (en) Deep learning-based lane line detection method
CN115690787A (en) Semantic segmentation method, image processing apparatus, and computer-readable storage medium
Cervera-Uribe et al. U19-Net: a deep learning approach for obstacle detection in self-driving cars
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
Gao et al. Robust lane line segmentation based on group feature enhancement
Yasmin et al. Small obstacles detection on roads scenes using semantic segmentation for the safe navigation of autonomous vehicles
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
CN117115690A (en) Unmanned aerial vehicle traffic target detection method and system based on deep learning and shallow feature enhancement
CN117372991A (en) Automatic driving method and system based on multi-view multi-mode fusion
CN113780189A (en) Lane line detection method based on U-Net improvement
Guo et al. The dynamic-static dual-branch deep neural network for urban speeding hotspot identification using street view image data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination