CN111931782A - Semantic segmentation method, system, medium, and apparatus

Info

Publication number: CN111931782A
Application number: CN202010808133.7A
Authority: CN
Inventors: 舒睿俊; 陈铭弘; 李嘉茂; 张晓林
Original assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Current assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2020-11-13
Anticipated expiration: 2040-08-12
Also published as: CN111931782B

Abstract

The invention provides a semantic segmentation method, system, medium and apparatus. The method comprises the following steps: acquiring a picture requiring semantic segmentation and recording its size as H × W × M, where H is the height of the picture, W is its width, and M is the number of channels; performing edge processing on the picture to obtain an edge connected graph S; performing semantic segmentation on the picture based on each pre-trained semantic segmentation model i to generate a semantic label map $G_i$ and a semantic probability map $P_i$; and, according to the edge connected graph S, generating the set F of four-neighborhood connected domains of pixels with value 0, traversing each connected domain k in F, finding for each connected domain k the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determining the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where R is the final semantic label map and $(h, w) \in k$. The semantic segmentation method, system, medium and apparatus improve the precision and robustness of semantic segmentation in unfamiliar scenes.

Description

Semantic segmentation method, system, medium, and apparatus

Technical Field

The present invention relates to the field of image segmentation technologies, and in particular, to a semantic segmentation method, system, medium, and apparatus.

Background

In semantic recognition systems, methods based on deep learning are currently the most widely used, and among them supervised learning is the most common. In practice, a model is usually trained under supervision on a particular data set with semantic labels, with the goal of learning a stable model that performs well in all respects. Because the expressive capability of a trained model is limited, the resulting model may be a model with preferences, that is, a biased model.

Ensemble learning aims to obtain a better and more comprehensive model by combining multiple learners (models). The core idea is that even if one learner makes a wrong prediction, the other learners can correct it, as shown in FIG. 1a. Each individual learner is generated from training data by an existing learning algorithm, such as the C4.5 decision tree algorithm or the BP neural network algorithm. Ensemble learning is thus a meta-algorithm that combines several machine learning techniques, of the same or of different kinds, into one prediction model in order to reduce variance, reduce bias, or improve prediction.

In terms of how the base learners are generated, ensemble learning can be divided into two types. The first is the sequential ensemble method, in which the base learners are generated one after another (e.g., AdaBoost). The sequential method exploits the dependency between base learners: by assigning higher weights to the samples that were mislabeled in the previous round of training, the overall prediction can be improved. The second is the parallel ensemble method, in which the base learners are generated in parallel (e.g., Random Forest). The parallel method exploits the independence between base learners, since averaging can significantly reduce errors.

In terms of combination strategy, ensemble learning can also be divided into two categories. The first is the averaging method, mainly used for regression-type tasks with numerical outputs $h_i(x) \in \mathbb{R}$. Averaging is further divided into simple averaging and weighted averaging. With T denoting the number of individual learners, the simple averaging formula is as follows:

$$H(x) = \frac{1}{T} \sum_{i=1}^{T} h_i(x)$$

The weighted averaging formula is as follows:

$$H(x) = \sum_{i=1}^{T} w_i h_i(x)$$

where $w_i$ is the weight of the individual learner $h_i$, usually required to satisfy $w_i \ge 0$ and $\sum_{i=1}^{T} w_i = 1$.

The weights of the weighted averaging method are generally learned from training data. Training samples in real tasks are usually insufficient or noisy, so the learned weights are not completely reliable. Both experiments and applications therefore show that weighted averaging is not necessarily superior to simple averaging. In general, weighted averaging is preferred when the performance of the individual learners differs greatly, while simple averaging is preferred when their performance is similar.
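The two averaging strategies can be illustrated with a minimal sketch. The function names and the sample numbers are hypothetical and serve only as an illustration; the weights are assumed to be non-negative and to sum to 1.

```python
from typing import Sequence


def simple_average(predictions: Sequence[float]) -> float:
    """Simple averaging: every learner's numerical output counts equally."""
    return sum(predictions) / len(predictions)


def weighted_average(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted averaging with non-negative weights that sum to 1."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per learner")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * h for w, h in zip(weights, predictions))


# Hypothetical outputs h_1(x), h_2(x), h_3(x) of three learners for one sample x.
outputs = [2.0, 4.0, 6.0]
print(simple_average(outputs))                       # 4.0
print(weighted_average(outputs, [0.5, 0.25, 0.25]))  # 3.5
```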

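The other combination strategy is voting, mainly used for classification tasks. In a classification task, learner $h_i$ predicts one label from the set of class labels $\{c_1, c_2, \ldots, c_N\}$, and the most common combination strategy is voting. For ease of discussion, the prediction output of $h_i$ on sample $x$ is represented as an N-dimensional vector $\left(h_i^1(x); h_i^2(x); \ldots; h_i^N(x)\right)$, where $h_i^j(x)$ is the output of $h_i$ on class label $c_j$. Voting is divided into three kinds: absolute majority voting, relative majority voting, and weighted voting. With T denoting the number of learners, the absolute majority voting formula is as follows:

$$H(x) = \begin{cases} c_j, & \text{if } \sum_{i=1}^{T} h_i^j(x) > 0.5 \sum_{k=1}^{N} \sum_{i=1}^{T} h_i^k(x) \\ \text{reject}, & \text{otherwise} \end{cases}$$

The relative majority voting formula is as follows:

$$H(x) = c_{\arg\max_j \sum_{i=1}^{T} h_i^j(x)}$$

The weighted voting formula is as follows:

$$H(x) = c_{\arg\max_j \sum_{i=1}^{T} w_i h_i^j(x)}$$

where $w_i$ is the weight of learner $h_i$.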
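The three voting rules can be illustrated with a minimal sketch. The function names are hypothetical, each learner output is assumed to be an N-dimensional vector (one-hot or class scores), class indices are 0-based, and `None` stands for the "reject" outcome of absolute majority voting.

```python
from typing import List, Optional, Sequence


def class_totals(outputs: Sequence[Sequence[float]],
                 weights: Optional[Sequence[float]] = None) -> List[float]:
    """Sum the (optionally weighted) outputs of all learners for each class j."""
    if weights is None:
        weights = [1.0] * len(outputs)
    n_classes = len(outputs[0])
    return [sum(w * h[j] for w, h in zip(weights, outputs)) for j in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda j: values[j])


def majority_vote(outputs: Sequence[Sequence[float]]) -> Optional[int]:
    """Absolute majority: accept a class only if it gets more than half of all votes."""
    totals = class_totals(outputs)
    best = argmax(totals)
    return best if totals[best] > 0.5 * sum(totals) else None


def plurality_vote(outputs: Sequence[Sequence[float]]) -> int:
    """Relative majority: the class with the most votes wins."""
    return argmax(class_totals(outputs))


def weighted_vote(outputs: Sequence[Sequence[float]], weights: Sequence[float]) -> int:
    """Weighted voting: each learner's vote is scaled by its weight."""
    return argmax(class_totals(outputs, weights))


# Hypothetical one-hot votes of four learners over three classes.
votes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(majority_vote(votes), plurality_vote(votes),
      weighted_vote(votes, [0.1, 0.6, 0.2, 0.1]))
```

In this example no class reaches more than half of the votes, so absolute majority voting rejects, while relative majority voting still returns the most frequent class.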

Among existing scene perception and understanding algorithms, relatively little work uses an edge map as auxiliary information to improve results. One example is "Classification with an Edge" by D. Marmanis, published at ISPRS in 2018. In that work, the classic edge detection network HED is added in front of a semantic segmentation network based on a fully convolutional network (FCN). The input picture and a DEM signal are first fed into the edge detection network; the edge likelihood map output by the edge detection network is then fed into the semantic segmentation network together with those two signals; and finally the whole network is trained under supervision with semantic segmentation labels, as shown in FIG. 1b. In the training phase, the HED network and the FCN network must first be trained separately on their respective tasks and then spliced together for joint training.

Existing edge-assisted semantic segmentation methods only apply a simple concatenation (that is, the two pictures are aligned and stacked) to the edge map and the input picture, and integrate the edge information by letting the network learn from the edge likelihood map on its own during training, so the way in which the edges assist segmentation is not explicit. Moreover, the method of Marmanis is tailored to a specific data set and does not consider how well the network performs semantic segmentation when placed in a real scene. Pictures collected in a real scene differ in data distribution from the pictures in the training data set. Even when both are data sets of autonomous driving scenes, a semantic segmentation network trained on the Cityscapes data set does not perform well on the ADE20k data set, let alone in a real scene.

Therefore, the problems to be solved are how to obtain a model without preference (bias) and how to improve the precision and robustness of semantic segmentation in unfamiliar scenes.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present invention is to provide a semantic segmentation method, system, medium, and apparatus that solve the prior-art problems of how to obtain a model without preference (bias) and how to improve the precision and robustness of semantic segmentation in unfamiliar scenes.

To achieve the above and other related objects, the present invention provides a semantic segmentation method comprising the following steps: acquiring a picture requiring semantic segmentation and recording its size as H × W × M, where H is the height of the picture, W is its width, and M is the number of channels; performing edge processing on the picture to obtain an edge connected graph S; performing semantic segmentation on the picture based on each pre-trained semantic segmentation model i to generate a semantic label map $G_i$ and a semantic probability map $P_i$; and, according to the edge connected graph S, generating the set F of four-neighborhood connected domains of pixels with value 0, traversing each connected domain k in F, finding for each connected domain k the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determining the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where R is the final semantic label map and $(h, w) \in k$.

In an embodiment of the present invention, performing edge processing on the picture to obtain the edge connected graph S includes: performing edge detection on the picture through an edge detection network to generate an initial edge detection map E of size H × W; thinning the initial edge detection map E to obtain a thinned edge map; binarizing the thinned edge map to obtain an initial binarized edge map T; and performing full-connection processing on the large connected blocks of the initial binarized edge map T to obtain the edge connected graph S with closed edges.

In an embodiment of the present invention, performing semantic segmentation on the picture based on the semantic segmentation model i to generate the semantic label map $G_i$ and the semantic probability map $P_i$ includes: classifying the scene of the picture with a scene recognition network to obtain the scene type of the picture; retrieving, based on the scene type, the n semantic segmentation models that match the scene attributes from a semantic segmentation model library; and traversing each semantic segmentation model, denoting the current model as i, $i \in [1, n]$, and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises: a semantic segmentation probability map $P_i$ of size $H \times W \times C_i$, where $C_i$ is the number of semantic categories that the semantic segmentation model can output; and a semantic segmentation label map $G_i$ of size $H \times W$.

In an embodiment of the present invention, traversing each connected domain k in F, finding the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determining the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where $(h, w) \in k$, includes: counting, over the set of pixel positions in the connected domain k, the semantic label votes of each semantic model i, and selecting the label with the most votes as the semantic label result of semantic model i for the pixel positions in the connected domain k; calculating the average information entropy $l_i$ of each semantic segmentation model i over the set of pixel positions in the connected domain k:

$$l_i = -\frac{1}{|k|} \sum_{(h, w) \in k} \sum_{c=1}^{C_i} P_i(h, w, c) \log P_i(h, w, c),$$

where $(h, w) \in k$, $|k|$ is the number of pixels in the connected domain k, and $C_i$ is the total number of semantic labels; finding the semantic segmentation model $i_{\min}$ with the minimum average information entropy $l_i$; and determining the final semantic labels of all pixels in the connected domain k as $R(h, w) = G_{i_{\min}}(h, w)$, where $(h, w) \in k$.

To achieve the above object, the present invention further provides a semantic segmentation system, including an acquisition module, an edge processing module, a semantic segmentation module and a label module. The acquisition module is used for acquiring a picture requiring semantic segmentation and recording its size as H × W × M, where H is the height of the picture, W is its width, and M is the number of channels. The edge processing module is used for performing edge processing on the picture to obtain an edge connected graph S. The semantic segmentation module is used for performing semantic segmentation on the picture based on each pre-trained semantic segmentation model i to generate a semantic label map $G_i$ and a semantic probability map $P_i$. The label module is used for generating, according to the edge connected graph S, the set F of four-neighborhood connected domains of pixels with value 0, traversing each connected domain k in F, finding for each connected domain k the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determining the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where R is the final semantic label map and $(h, w) \in k$.

In an embodiment of the present invention, performing semantic segmentation on the picture based on the semantic segmentation model i to generate the semantic label map $G_i$ and the semantic probability map $P_i$ includes: classifying the scene of the picture with a scene recognition network to obtain the scene type of the picture; retrieving, based on the scene type, the n semantic segmentation models that match the scene attributes from a semantic segmentation model library; and traversing each semantic segmentation model, denoting the current model as i, $i \in [1, n]$, and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises: a semantic segmentation probability map $P_i$ of size $H \times W \times C_i$, where $C_i$ is the number of semantic categories that the semantic segmentation model can output; and a semantic segmentation label map $G_i$ of size $H \times W$.

To achieve the above object, the present invention further provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor implements any of the above semantic segmentation methods.

In order to achieve the above object, the present invention further provides a semantic segmentation apparatus, including: a processor and a memory; the memory is used for storing a computer program; the processor is connected with the memory and is used for executing the computer program stored in the memory so as to enable the semantic segmentation device to execute any one of the semantic segmentation methods.

As described above, the semantic segmentation method, system, medium, and apparatus of the present invention have the following advantageous effects: the method is used for improving the precision and robustness of semantic segmentation in an unfamiliar scene.

Drawings

FIG. 1a is a schematic diagram of an ensemble learning basic system according to an embodiment of the semantic segmentation method of the present invention;

FIG. 1b is a flow chart of a Classification with an edge network according to an embodiment of the semantic segmentation method of the present invention;

FIG. 1c is a flow chart of a semantic segmentation method according to an embodiment of the invention;

FIG. 1d is a schematic diagram of an eight neighborhood region in an embodiment of the semantic segmentation method of the present invention;

FIG. 2 is a schematic diagram illustrating a semantic segmentation system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a semantic segmentation apparatus according to an embodiment of the present invention.

Description of the element reference numerals

21 acquisition module

22 edge processing module

23 semantic segmentation module

24 label module

31 processor

32 memory

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, so that the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, the type, quantity and proportion of the components in actual implementation can be changed freely, and the layout of the components can be more complicated.

The semantic segmentation method, system, medium and apparatus of the present invention improve the precision and robustness of semantic segmentation in unfamiliar scenes.

As shown in fig. 1c, in an embodiment, the semantic segmentation method of the present invention includes the following steps:

Step S11: acquire a picture requiring semantic segmentation, and record the size of the picture as H × W × M, where H is the height of the picture, W is its width, and M is the number of channels.

Specifically, this step further includes loading the modules of the system, configuring the system parameters, and the like.

Specifically, a picture requiring semantic segmentation is acquired from a perception system, and its size is recorded as H × W × 3, where H is the image height (unit: pixel), W is the image width (unit: pixel), and 3 stands for the three channels red, green and blue.

Step S12: perform edge processing on the picture to obtain an edge connected graph S.

Specifically, performing edge processing on the picture to obtain the edge connected graph S includes:

Performing edge detection on the picture through an edge detection network to generate an initial edge detection map E of size H × W. Specifically, an existing edge detection network performs edge detection on the input picture and generates the initial edge detection map E, whose size H × W matches the height and width of the input picture. Each pixel position in E stores an edge detection value, a real number in the range 0 to 255; the larger the value, the higher the probability that the pixel is an edge point of an object.

Thinning the initial edge detection map E to obtain a thinned edge map. Specifically, edges are characterized by gradient change, and regions with large gradient change are usually wide. The normal direction arctan(y/x) is therefore determined from the gradients y and x along the y axis and the x axis of the image coordinate system, and the gradient value of the current pixel is checked for being a peak (a local maximum) along that normal direction. If it is, the edge detection value (gradient value) of the current pixel is retained; if not, it is suppressed, that is, set to 0. The result is a thinned edge map whose edge lines are 1 to 2 pixels wide.
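The thinning step can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the embodiment: the gradients are approximated by central differences of the edge map itself, the normal direction is quantized to four directions, pixels outside the image are treated as 0, and the function name `thin_edges` is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def thin_edges(edge_map: Grid) -> Grid:
    """Keep a pixel only if it is a local maximum along its normal direction."""
    height, width = len(edge_map), len(edge_map[0])

    def value(h: int, w: int) -> float:
        # Assumption: positions outside the image count as 0.
        return edge_map[h][w] if 0 <= h < height and 0 <= w < width else 0.0

    thinned = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            centre = edge_map[h][w]
            if centre <= 0:
                continue
            # Central differences approximate the gradients along x and y.
            gx = (value(h, w + 1) - value(h, w - 1)) / 2.0
            gy = (value(h + 1, w) - value(h - 1, w)) / 2.0
            # Quantize the normal direction arctan(gy / gx) to 0, 45, 90 or 135 degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            if angle < 22.5 or angle >= 157.5:
                dh, dw = 0, 1
            elif angle < 67.5:
                dh, dw = 1, 1
            elif angle < 112.5:
                dh, dw = 1, 0
            else:
                dh, dw = 1, -1
            # Retain the value only if it is a peak along the normal direction.
            if centre >= value(h + dh, w + dw) and centre >= value(h - dh, w - dw):
                thinned[h][w] = centre
    return thinned


# A vertical edge that is three pixels wide is reduced to its central column.
ridge = [[0.0, 50.0, 200.0, 50.0, 0.0] for _ in range(3)]
print(thin_edges(ridge))
```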

Binarizing the thinned edge map to obtain an initial binarized edge map T. Specifically, all pixels in the thinned edge map are traversed. If the edge detection value of a pixel is greater than 0, it is set to a fixed value (for example, 255), and the pixel is called a valued pixel; otherwise the edge detection value of the pixel remains 0. After the traversal, the binarized edge map T is obtained.
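As a minimal sketch of this binarization, with the hypothetical function name `binarize` and plain nested lists standing in for the image:

```python
from typing import List


def binarize(thinned: List[List[float]], edge_value: int = 255) -> List[List[int]]:
    """Set every pixel with an edge detection value above 0 to a fixed value."""
    return [[edge_value if v > 0 else 0 for v in row] for row in thinned]


print(binarize([[0.0, 12.5], [200.0, 0.0]]))  # [[0, 255], [255, 0]]
```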

Performing full-connection processing on the large connected blocks of the initial binarized edge map T to obtain the edge connected graph S with closed edges. Specifically, the set C of connected domains of valued pixels in T is detected first. Connectivity here means eight-neighborhood connectivity as used in the field of image processing. As shown in FIG. 1d, suppose pixel 4 has a value; if any of the eight pixels surrounding pixel 4 also has a value, that pixel is said to be connected with pixel 4, and connected pixels form a line.
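Detecting the eight-neighborhood connected domains of valued pixels can be sketched with a breadth-first search. The function name `label_components_8` and the use of nested lists are assumptions made for this illustration; a production system would more likely rely on an image processing library.

```python
from collections import deque
from typing import List, Tuple

NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def label_components_8(binary: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Label valued pixels so that eight-connected pixels share a label (1, 2, ...)."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for h in range(height):
        for w in range(width):
            if binary[h][w] == 0 or labels[h][w] != 0:
                continue
            count += 1
            labels[h][w] = count
            queue = deque([(h, w)])
            while queue:
                ch, cw = queue.popleft()
                for dh, dw in NEIGHBOURS_8:
                    nh, nw = ch + dh, cw + dw
                    if (0 <= nh < height and 0 <= nw < width
                            and binary[nh][nw] != 0 and labels[nh][nw] == 0):
                        labels[nh][nw] = count
                        queue.append((nh, nw))
    return labels, count


T = [[255, 0, 0, 0],
     [0, 255, 0, 0],
     [0, 0, 0, 255],
     [0, 0, 0, 255]]
labels, count = label_components_8(T)
print(count)  # 2: the diagonal pair is one domain, the vertical pair another
```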

Next, each connected domain in the set C is traversed. The current connected domain is denoted t and is processed as follows:

a. Following the eight-neighborhood connectivity rule, traverse each pixel in the connected domain t and judge whether the number of valued pixels in its eight-neighborhood is more than 2. If so, the pixel is an interior point of a line segment; otherwise, it is an endpoint of the connected domain. When the traversal is complete, the endpoint set $S_t$ of the connected domain is obtained.
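The endpoint test of step a can be sketched as follows. The text does not say whether the count includes the pixel itself; this sketch assumes that it does (the 3 × 3 window around the pixel), so that an interior pixel of a one-pixel-wide line counts 3 and an endpoint counts at most 2. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]


def count_valued_in_window(binary: List[List[int]], h: int, w: int) -> int:
    """Count valued pixels in the 3 x 3 window centred on (h, w), centre included."""
    height, width = len(binary), len(binary[0])
    return sum(
        1
        for dh in (-1, 0, 1)
        for dw in (-1, 0, 1)
        if 0 <= h + dh < height and 0 <= w + dw < width and binary[h + dh][w + dw] != 0
    )


def find_endpoints(binary: List[List[int]], domain: List[Pixel]) -> List[Pixel]:
    """Return the pixels of one connected domain whose count is not more than 2."""
    return [(h, w) for h, w in domain if count_valued_in_window(binary, h, w) <= 2]


T = [[0, 0, 0, 0, 0],
     [0, 255, 255, 255, 0],
     [0, 0, 0, 0, 0]]
print(find_endpoints(T, [(1, 1), (1, 2), (1, 3)]))  # [(1, 1), (1, 3)]
```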

b. Traverse each endpoint in $S_t$, denoting the current endpoint as j, and search for valued pixels within the neighborhood radius r of endpoint j. All valued pixels that belong to the same connected domain as endpoint j form the in-domain pixel set $I_j$; all valued pixels that do not belong to the same connected domain as endpoint j form the out-of-domain pixel set $O_j$. If the set $O_j$ is not empty, sort the pixels in $O_j$ by their distance to endpoint j, connect the closest one to endpoint j, and update the binarized edge map T. If the set $I_j$ is not empty, take all pixels within the neighborhood radius r of endpoint j as a local image, perform connected domain detection on it to obtain its set of connected domains $L_j$, sort the pixels in $I_j$ by their distance to endpoint j, and judge in order from far to near whether each pixel in $I_j$ and endpoint j belong to $L_j$; if so, connect the two points, update the binarized edge map T, and exit the loop.
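The out-of-domain case of step b (a non-empty set $O_j$) can be sketched as follows. The sketch makes several assumptions that are not stated in the text: distances are Euclidean, the connection is drawn as a straight Bresenham line, and `labels` is a hypothetical label map of the connected domains produced by an earlier labelling pass. The in-domain case ($I_j$) is not covered.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]


def bresenham(start: Pixel, end: Pixel) -> List[Pixel]:
    """Integer pixels on the straight line from start to end, both included."""
    (h0, w0), (h1, w1) = start, end
    dw, dh = abs(w1 - w0), -abs(h1 - h0)
    sw = 1 if w0 < w1 else -1
    sh = 1 if h0 < h1 else -1
    err = dw + dh
    points = []
    while True:
        points.append((h0, w0))
        if (h0, w0) == (h1, w1):
            break
        e2 = 2 * err
        if e2 >= dh:
            err += dh
            w0 += sw
        if e2 <= dw:
            err += dw
            h0 += sh
    return points


def connect_to_nearest_outside(binary: List[List[int]], labels: List[List[int]],
                               endpoint: Pixel, radius: int,
                               edge_value: int = 255) -> bool:
    """Join endpoint j to the closest valued pixel of another connected domain.

    Modifies `binary` in place and returns True if a connection was drawn.
    """
    height, width = len(binary), len(binary[0])
    eh, ew = endpoint
    own = labels[eh][ew]
    candidates = []
    for h in range(max(0, eh - radius), min(height, eh + radius + 1)):
        for w in range(max(0, ew - radius), min(width, ew + radius + 1)):
            dist = math.hypot(h - eh, w - ew)
            if binary[h][w] != 0 and labels[h][w] != own and dist <= radius:
                candidates.append((dist, h, w))
    if not candidates:
        return False
    _, th, tw = min(candidates)  # closest out-of-domain pixel
    for h, w in bresenham(endpoint, (th, tw)):
        binary[h][w] = edge_value
    return True


T = [[255, 255, 0, 0, 0, 255, 255]]
L = [[1, 1, 0, 0, 0, 2, 2]]
print(connect_to_nearest_outside(T, L, (0, 1), radius=4), T)
```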

After this step is completed, an edge connected graph S with closed edges is obtained. The notion of closure can again be illustrated with FIG. 1d: if pixels 0, 1, 2, 3, 5, 6, 7 and 8 have values, each of them has at least two valued neighbors in its eight-neighborhood, so the line passing through these pixels is a closed line. Similarly, the line passing through pixels 1, 3, 5 and 7 is also a closed line.

A priori, an edge detection network focuses more on the lower-level semantic information in the image, while a semantic segmentation network focuses more on the higher-level semantic information. Edge detection separates the objects in the picture from one another with lines, while semantic segmentation determines which object class each pixel position in the picture belongs to. Edge detection performs better on the details of the boundary between one object and another, so the result of edge detection is used as the dividing line between objects. These dividing lines divide the whole picture into several connected domains.

However, the edge map that the edge detection network generates for a picture consists of probability values: the value at each pixel position indicates how likely that position is to be an edge (it can be understood as an H × W map in which each position holds a value between 0 and 255, and the larger the value, the more likely the position is an edge). Most areas of the map have values greater than 0 and differ only in magnitude. To separate objects from one another, the edge map should ideally consist of lines one pixel wide. Non-maximum suppression (NMS) is therefore applied to the edge map obtained by edge detection to obtain a thinned edge map. The lines in the thinned edge map are broken, however, so a method is proposed to close them, so that each object, or each block within an object, forms a connected domain.

This further improves the accuracy and robustness of segmentation. Guidance from edge detection improves the segmentation accuracy at object edges in the semantic segmentation result, and semantic fusion selects, for each connected domain k given by edge detection, the semantic segmentation model i with the best expressive ability to assign the semantic label of that domain. Because the best-performing semantic segmentation model i can be selected in each domain k for different scenes, the robustness of the whole system is improved.

Step S13: perform semantic segmentation on the picture based on each pre-trained semantic segmentation model i to generate a semantic label map $G_i$ and a semantic probability map $P_i$.

Specifically, the semantic segmentation models i have been trained in advance, and there are n of them, $i \in [1, n]$.

Specifically, performing semantic segmentation on the picture based on the semantic segmentation model i to generate the semantic label map $G_i$ and the semantic probability map $P_i$ includes:

Classifying the scene of the picture with a scene recognition network to obtain the scene type of the picture. Specifically, an existing scene recognition network classifies the scene of the picture and outputs the scene type (for example, outdoor, indoor and the like).

Retrieving, based on the scene type, the semantic segmentation models that match the scene attributes from a semantic segmentation model library, the number of such models being n. Specifically, according to the identified scene type, the semantic segmentation models that match the scene type are retrieved from the semantic segmentation model library; assume that there are n of them.

Traversing each semantic segmentation model, denoting the current model as i, $i \in [1, n]$, and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises:

A semantic segmentation probability map $P_i$ of size $H \times W \times C_i$, where $C_i$ is the number of semantic categories that the semantic segmentation model can output. $P_i(h, w, c)$ denotes the value of $P_i$ at position (h, w, c), that is, the probability that pixel position (h, w) of the input picture belongs to semantic category c, $c \in [1, C_i]$.

A semantic segmentation label map $G_i$ of size $H \times W$. $G_i(h, w)$ denotes the value of $G_i$ at position (h, w), that is, the number c of the semantic category to which pixel position (h, w) of the input picture belongs.

Each semantic segmentation model i performs semantic segmentation once, so that the best-performing semantic segmentation model can later be selected for each region k and its result used as the fused label. $G_i$ and $P_i$ are needed later for the voting and for the calculation of the information entropy.
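The relationship between the two outputs can be illustrated with a small sketch. It assumes, as is common for segmentation networks but not stated in the text, that the label map holds the most probable category of each pixel; category numbers are 1-based to match $c \in [1, C_i]$, and the function name is hypothetical.

```python
from typing import List


def label_map_from_probabilities(prob_map: List[List[List[float]]]) -> List[List[int]]:
    """Derive an H x W label map from an H x W x C_i probability map.

    Assumption: each pixel takes its most probable category (numbered from 1).
    """
    return [
        [max(range(len(p)), key=lambda c: p[c]) + 1 for p in row]
        for row in prob_map
    ]


# A 1 x 2 picture with C_i = 3 categories.
P_i = [[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]]
print(label_map_from_probabilities(P_i))  # [[1, 3]]
```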

Step S14: according to the edge connected graph S, generate the set F of four-neighborhood connected domains of pixels with value 0, traverse each connected domain k in F, find for each connected domain k the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determine the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where R is the final semantic label map and $(h, w) \in k$. For a pixel (h, w), the values $P(h, w, c)$, $c \in [1, c_{\max}]$, are $c_{\max}$ values in total, each giving the probability that the pixel belongs to the class label c. The expressive power of a model is judged by how certain (confident) it is in its decision about the target. For example, an expert whose analysis of something is ambiguous is not considered a good expert, whereas an expert who states definitely what something is is considered a good one. The same holds for the probabilities P: an entropy is calculated over the $c_{\max}$ probability values, and a low entropy means that the model is certain that the point (h, w) belongs to one particular class, while a high entropy means the opposite.
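The link between entropy and confidence can be illustrated with a short sketch. It assumes the Shannon entropy with the natural logarithm (the text does not fix the base) and treats the term for a zero probability as 0; the function name is hypothetical.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy of one pixel's class probabilities (natural logarithm)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


confident = [0.9, 0.05, 0.05]   # the model is nearly certain of one class
ambiguous = [0.4, 0.3, 0.3]     # the model cannot decide between classes
print(entropy(confident) < entropy(ambiguous))  # True
```

A one-hot probability vector has entropy 0, and a uniform vector over $c_{\max}$ classes has the largest possible entropy, $\log c_{\max}$.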

Specifically, regarding the generation, according to the edge connected graph S, of the set F of four-neighborhood connected domains of pixels with value 0: the concept of four-neighborhood connectivity is similar to that of eight-neighborhood connectivity. As shown in FIG. 1d, the four-neighborhood of pixel 4 consists of pixels 1, 3, 5 and 7. When pixel 4 has a non-zero value and a non-zero value exists at any of positions 1, 3, 5 and 7, that position is said to be four-neighborhood connected with pixel 4.
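Generating the set F can be sketched as a breadth-first search over the zero-valued pixels of S with four-neighborhood connectivity. The function name `zero_regions_4` and the nested-list representation are assumptions made for this illustration.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]


def zero_regions_4(edge_map: List[List[int]]) -> List[List[Pixel]]:
    """Group the zero-valued pixels of S into four-neighborhood connected domains."""
    height, width = len(edge_map), len(edge_map[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for h in range(height):
        for w in range(width):
            if edge_map[h][w] != 0 or seen[h][w]:
                continue
            seen[h][w] = True
            region = []
            queue = deque([(h, w)])
            while queue:
                ch, cw = queue.popleft()
                region.append((ch, cw))
                for dh, dw in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                    nh, nw = ch + dh, cw + dw
                    if (0 <= nh < height and 0 <= nw < width
                            and edge_map[nh][nw] == 0 and not seen[nh][nw]):
                        seen[nh][nw] = True
                        queue.append((nh, nw))
            regions.append(region)
    return regions


S = [[0, 255, 0],
     [0, 255, 0],
     [0, 255, 0]]
print(len(zero_regions_4(S)))  # 2: one domain on each side of the edge line
```

With four-neighborhood connectivity, two zero-valued pixels that touch only diagonally across an edge line stay in separate domains, so a one-pixel-wide diagonal edge still separates the regions on either side.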

Specifically, traversing each connected domain k in F, finding the semantic segmentation model $i_{\min}$ with the minimum average information entropy, and determining the final semantic labels of all pixels in the connected domain k as

$$R(h, w) = G_{i_{\min}}(h, w),$$

where $(h, w) \in k$, includes: traversing each connected domain k in F. Because the categories $c_i$ that the different semantic segmentation networks i assign within the connected domain k may differ, the present invention calculates the confidence of the different semantic probability maps and trusts the one with the highest confidence.

Counting, over the set of pixel positions in the connected domain k, the semantic label votes of each semantic model i, and selecting the label with the most votes as the semantic label result of semantic model i for the pixel positions in the connected domain k. Specifically, following the voting method of ensemble learning, the semantic labels that each semantic model i assigns to the pixel positions in the connected domain k are tallied, and the label with the most votes is taken as that model's semantic label result for the connected domain k.
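The per-domain vote of one model can be sketched as follows. The function name `region_vote` is hypothetical, and the handling of ties (the label counted first wins) is an assumption, since the text does not address ties.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int]


def region_vote(label_map: List[List[int]], region: List[Pixel]) -> int:
    """Return the label that model i assigns most often inside connected domain k."""
    votes = Counter(label_map[h][w] for h, w in region)
    return votes.most_common(1)[0][0]


G_i = [[1, 1],
       [2, 1]]
k = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(region_vote(G_i, k))  # 1
```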

Calculating the average information entropy $l_i$ of each semantic segmentation model i over the set of pixel positions in the connected domain k:

$$l_i = -\frac{1}{|k|} \sum_{(h, w) \in k} \sum_{c=1}^{C_i} P_i(h, w, c) \log P_i(h, w, c),$$

where $(h, w) \in k$, $|k|$ is the number of pixels in the connected domain k, and $C_i$ is the total number of semantic labels. The connected domain k is defined over the height and width dimensions of the picture. For example, if a connected domain k delimits a basketball in the picture, all the pixels that depict the basketball are the elements of that connected domain. That is, (h, w) ranges over the set of pixels contained in the connected domain k.
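The average information entropy of one model over one connected domain can be sketched as follows, assuming the per-pixel Shannon entropy with the natural logarithm averaged over the pixels of the domain. The function name and the sample values are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]


def average_entropy(prob_map: List[List[List[float]]], region: List[Pixel]) -> float:
    """Average, over the pixels of connected domain k, of the per-pixel entropy of P_i."""
    total = 0.0
    for h, w in region:
        total -= sum(p * math.log(p) for p in prob_map[h][w] if p > 0)
    return total / len(region)


# A 1 x 2 picture: the first pixel is certain, the second is undecided.
P_i = [[[1.0, 0.0], [0.5, 0.5]]]
print(average_entropy(P_i, [(0, 0), (0, 1)]))  # log(2) / 2
```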

Finding the average information entropy l_iThe smallest semantic segmentation model is denoted as i_min。

The final semantic label of all pixels in the connected domain k is determined as G_{i_min}(h, w), where (h, w) ∈ k.

Here G_{i_min} denotes the semantic segmentation label graph G_i for i = i_min, and G_{i_min}(h, w) denotes the value at position (h, w) in G_{i_min}, that is, the semantic segmentation label of the pixel (h, w).

Specifically, each connected domain in F is traversed, and a final semantic segmentation label graph R is generated.
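The traversal as a whole can be sketched as follows, using NumPy only. Several points are assumptions made for illustration: edge pixels (non-zero in S) are left at -1 because this passage does not say how they are labeled, the entropy is the standard Shannon entropy, model indices are 0-based, and the function names are hypothetical.

```python
from collections import deque
import numpy as np

def four_connected_domains(S: np.ndarray):
    """Label the 4-connected domains of zero-valued pixels in the edge graph S.
    Returns (labels, K): labels is -1 on edge pixels and 0..K-1 elsewhere."""
    H, W = S.shape
    labels = np.full((H, W), -1, dtype=int)
    count = 0
    for h0 in range(H):
        for w0 in range(W):
            if S[h0, w0] != 0 or labels[h0, w0] != -1:
                continue
            labels[h0, w0] = count
            queue = deque([(h0, w0)])
            while queue:                                   # breadth-first flood fill
                h, w = queue.popleft()
                for dh, dw in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nh, nw = h + dh, w + dw
                    if (0 <= nh < H and 0 <= nw < W
                            and S[nh, nw] == 0 and labels[nh, nw] == -1):
                        labels[nh, nw] = count
                        queue.append((nh, nw))
            count += 1
    return labels, count

def fuse(S, label_maps, prob_maps, eps=1e-12):
    """Build R by copying, per connected domain, the labels of the lowest-entropy model."""
    domains, K = four_connected_domains(S)
    R = np.full(S.shape, -1, dtype=int)                    # assumption: edges stay -1
    for k in range(K):
        mask = domains == k
        entropies = [float((-np.sum(P[mask] * np.log(P[mask] + eps), axis=1)).mean())
                     for P in prob_maps]
        i_min = int(np.argmin(entropies))
        R[mask] = label_maps[i_min][mask]                  # R(h, w) = G_{i_min}(h, w)
    return R

# A vertical edge splits a 3 x 3 picture into a left and a right domain.
S = np.array([[0, 1, 0]] * 3)
P_a = np.full((3, 3, 2), 0.5); P_a[:, 0] = [0.9, 0.1]      # model a is confident on the left
P_b = np.full((3, 3, 2), 0.5); P_b[:, 2] = [0.1, 0.9]      # model b is confident on the right
G_a, G_b = P_a.argmax(axis=2), P_b.argmax(axis=2)
R = fuse(S, [G_a, G_b], [P_a, P_b])
print(R)
```

In this toy case the left domain takes its labels from model a and the right domain from model b, which is the per-domain switching behavior the method describes.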

As shown in fig. 2, in an embodiment, the semantic segmentation system of the present invention includes: an obtaining module 21, an edge processing module 22, a semantic segmentation module 23 and a label module 24. The obtaining module 21 is configured to obtain a picture to be subjected to semantic segmentation and record the size of the picture as H × W × M, where H represents the image height, W represents the image width, and M represents the number of channels. The edge processing module 22 is configured to perform edge processing on the picture to obtain an edge connected graph S. The semantic segmentation module 23 is configured to perform semantic segmentation on the picture based on a pre-trained semantic segmentation model i and to generate a semantic label graph G_i and a semantic probability map P_i. The label module 24 is configured to generate, according to the edge connected graph S, a set F of four-neighborhood connected domains of pixels with value 0, traverse each connected domain k in F, find the semantic segmentation model i_min with the minimum average information entropy for each connected domain k, and determine the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k.

In an embodiment of the present invention, performing semantic segmentation on the picture based on a semantic segmentation model i and generating a semantic label graph G_i and a semantic probability map P_i includes: performing scene classification on the picture with a scene recognition network to obtain the scene type of the picture; retrieving, based on the scene type, n semantic segmentation models that match the scene attributes from a semantic segmentation model library; and traversing each semantic segmentation model, denoting the current semantic segmentation model as i, i ∈ [1, n], and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises: a semantic segmentation probability map P_i of size H × W × C_i, where C_i is the number of semantic categories that semantic segmentation model i can output; and a semantic segmentation label graph G_i of size H × W.
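The shapes involved in this step can be made concrete with a small sketch. Everything named here is hypothetical: the model library is modeled as a dictionary from scene type to a list of callables, `dummy_model` stands in for a pre-trained network, the scene type is passed in as a string instead of coming from a scene recognition network, and deriving G_i as the most probable class of P_i is an assumption that the text does not state.

```python
import numpy as np

def run_models(picture: np.ndarray, scene_type: str, model_library: dict):
    """Run every model registered for scene_type and return a (P_i, G_i) pair per model."""
    results = []
    for model in model_library[scene_type]:      # the n models matching the scene
        P_i = model(picture)                      # probability map, H x W x C_i
        G_i = P_i.argmax(axis=2)                  # assumption: label = most probable class
        results.append((P_i, G_i))
    return results

def dummy_model(num_classes: int):
    """Stand-in for a pre-trained network: uniform probabilities over num_classes."""
    def predict(picture: np.ndarray) -> np.ndarray:
        H, W = picture.shape[:2]
        return np.full((H, W, num_classes), 1.0 / num_classes)
    return predict

# Two models with different category counts C_i registered for one scene type.
model_library = {"indoor": [dummy_model(3), dummy_model(5)]}
picture = np.zeros((4, 6, 3))                     # H = 4, W = 6, M = 3
results = run_models(picture, "indoor", model_library)
for P_i, G_i in results:
    print(P_i.shape, G_i.shape)
```

Note that C_i may differ between models while every G_i keeps the size H × W, which is what allows the label graphs to be compared per connected domain.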

It should be noted that the structures and principles of the obtaining module 21, the edge processing module 22, the semantic segmentation module 23, and the label module 24 correspond to the steps in the semantic segmentation method one to one, and therefore are not described herein again.

It should be noted that the division of the above system into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the x module may be a separately provided processing element, may be integrated in a chip of the apparatus, or may be stored in a memory of the apparatus as program code whose function is called and executed by a processing element of the apparatus. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Microprocessors (MPUs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when one of the above modules is implemented as program code invoked by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

In an embodiment of the present invention, the present invention further includes a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement any of the above semantic segmentation methods.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

As shown in fig. 3, in an embodiment, the semantic segmentation apparatus of the present invention includes: a processor 31 and a memory 32; the memory 32 is for storing a computer program; the processor 31 is connected to the memory 32, and is configured to execute the computer program stored in the memory 32, so as to enable the semantic segmentation apparatus to execute any one of the semantic segmentation methods.

Specifically, the memory 32 includes various media that can store program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.

Preferably, the processor 31 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In summary, the semantic segmentation method, system, medium and apparatus of the present invention improve the precision and robustness of semantic segmentation in unfamiliar scenes. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A semantic segmentation method is characterized by comprising the following steps:

acquiring a picture needing semantic segmentation, and recording the size of the picture as H multiplied by W multiplied by M, wherein H represents the height of the picture, W represents the width of the picture, and M represents the number of channels;

performing edge processing on the picture to obtain an edge connected graph S;

performing semantic segmentation on the picture based on a pre-trained semantic segmentation model i, and generating a semantic label graph G_i and a semantic probability map P_i;

generating, according to the edge connected graph S, a set F of four-neighborhood connected domains of pixels with value 0, traversing each connected domain k in F, finding the semantic segmentation model i_min with the minimum average information entropy for each connected domain k, and determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k.

2. The semantic segmentation method according to claim 1, wherein performing edge processing on the picture to obtain an edge connected graph S comprises:

performing edge detection on the picture through an edge detection network to generate an initial edge detection image E, wherein the size of the initial edge detection image E is H × W;

thinning the initial edge detection image E to obtain a thinned edge map;

binarizing the thinned edge map to obtain an initial binarized edge map T;

and performing large-connected-block full-connection processing on the initial binarized edge map T to obtain a closed edge map S.

3. The semantic segmentation method according to claim 1, wherein performing semantic segmentation on the picture based on a semantic segmentation model i and generating a semantic label graph G_i and a semantic probability map P_i comprises:

carrying out scene classification on the picture based on a scene recognition network to acquire the scene type of the picture;

calling n semantic segmentation models which accord with scene attributes from a semantic segmentation model library based on the scene types;

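traversing each semantic segmentation model, denoting the current semantic segmentation model as i, i ∈ [1, n], and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises:

a semantic segmentation probability map P_i of size H × W × C_i, where C_i is the number of semantic categories that the semantic segmentation model i can output;

a semantic segmentation label graph G_i of size H × W.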

4. The semantic segmentation method according to claim 1, wherein traversing each connected domain k in F, finding the semantic segmentation model i_min with the minimum average information entropy, and determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k, comprises:

on the set of pixel positions in the connected domain k, counting the semantic label votes of the semantic model i at those positions, and selecting the label with the most votes as the semantic label result of the semantic model i for the pixel positions in the connected domain k;

calculating the average information entropy l_i of each semantic segmentation model i over the set of pixel positions in the connected domain k, where (h, w) ∈ k and C_i represents the total number of semantic labels;

finding the semantic segmentation model with the minimum average information entropy l_i, denoted i_min;

determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k.

5. A semantic segmentation system, comprising: an acquisition module, an edge processing module, a semantic segmentation module and a label module;

the acquisition module is used for acquiring a picture needing semantic segmentation, and recording the size of the picture as H multiplied by W multiplied by M, wherein H represents the height of the picture, W represents the width of the picture, and M represents the number of channels;

the edge processing module is used for carrying out edge processing on the picture to obtain an edge connected graph S;

the semantic segmentation module is used for performing semantic segmentation on the picture based on a pre-trained semantic segmentation model i, and generating a semantic label graph G_i and a semantic probability map P_i;

the label module is used for generating, according to the edge connected graph S, a set F of four-neighborhood connected domains of pixels with value 0, traversing each connected domain k in F, finding the semantic segmentation model i_min with the minimum average information entropy for each connected domain k, and determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k.

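6. The semantic segmentation system according to claim 5, wherein performing edge processing on the picture to obtain an edge connected graph S comprises:

performing edge detection on the picture through an edge detection network to generate an initial edge detection image E, wherein the size of the initial edge detection image E is H × W;

thinning the initial edge detection image E to obtain a thinned edge map;

binarizing the thinned edge map to obtain an initial binarized edge map T;

and performing large-connected-block full-connection processing on the initial binarized edge map T to obtain a closed edge map S.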

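7. The semantic segmentation system according to claim 5, wherein performing semantic segmentation on the picture based on a semantic segmentation model i and generating a semantic label graph G_i and a semantic probability map P_i comprises:

carrying out scene classification on the picture based on a scene recognition network to acquire the scene type of the picture;

calling n semantic segmentation models which accord with scene attributes from a semantic segmentation model library based on the scene types;

traversing each semantic segmentation model, denoting the current semantic segmentation model as i, i ∈ [1, n], and performing semantic segmentation on the picture to obtain the initial semantic segmentation result of model i, which comprises:

a semantic segmentation probability map P_i of size H × W × C_i, where C_i is the number of semantic categories that the semantic segmentation model i can output;

a semantic segmentation label graph G_i of size H × W.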

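8. The semantic segmentation system according to claim 5, wherein traversing each connected domain k in F, finding the semantic segmentation model i_min with the minimum average information entropy, and determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k, comprises:

on the set of pixel positions in the connected domain k, counting the semantic label votes of the semantic model i at those positions, and selecting the label with the most votes as the semantic label result of the semantic model i for the pixel positions in the connected domain k;

calculating the average information entropy l_i of each semantic segmentation model i over the set of pixel positions in the connected domain k, where (h, w) ∈ k and C_i represents the total number of semantic labels;

finding the semantic segmentation model with the minimum average information entropy l_i, denoted i_min;

determining the final semantic label of all pixels in the connected domain k as G_{i_min}(h, w), where (h, w) ∈ k.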

9. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor to implement the semantic segmentation method according to any one of claims 1 to 4.

10. A semantic segmentation apparatus, comprising: a processor and a memory;

the memory is used for storing a computer program;

the processor is connected to the memory and is configured to execute the computer program stored in the memory to cause the semantic segmentation apparatus to perform the semantic segmentation method according to any one of claims 1 to 4.