CN112084864A

CN112084864A - Model optimization method and device, electronic equipment and storage medium

Info

Publication number: CN112084864A
Application number: CN202010784049.6A
Authority: CN
Inventors: 卢凯旋; 张昆仑
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2020-08-06
Filing date: 2020-08-06
Publication date: 2020-12-15

Abstract

The embodiments of the invention provide a model optimization method, a model optimization device, electronic equipment and a storage medium. The method comprises the following steps: performing feature extraction on a remote sensing image containing a target object to obtain a feature map; inputting the feature map into a target detection model and into a semantic segmentation model; and optimizing the target detection model according to the first prediction category of the target object output by the target detection model, the predicted position of the target object in the remote sensing image, and the second prediction category of the target object output by the semantic segmentation model. The method thus optimizes the target detection model using the outputs of both the target detection model and the semantic segmentation model. Because the second prediction category output by the semantic segmentation model is produced at the pixel level, its classification granularity is finer than that of the target detection model; combining outputs of different granularities improves the optimization effect and makes the detection results of the target detection model more accurate.

Description

Model optimization method and device, electronic equipment and storage medium

Technical Field

The invention relates to the field of computer technology, and in particular to a model optimization method, a model optimization device, electronic equipment and a storage medium.

Background

Remote sensing is the sensing of the earth's surface by telemetry instruments carried on platforms such as satellites and aircraft. The resulting remote sensing images can be used for resource management and monitoring. One common monitoring scenario is land cover detection, that is, identifying whether an area of land is forest, open land, cultivated land, water, a factory, a commercial area, and so on. The detection results can be used, for example, to track forest felling and to provide a reference for urban planning.

Remote sensing images can be analyzed with a target detection model, so the accuracy of the detection results output by that model is very important. To ensure accurate detection results, the target detection model is usually optimized, and how to guarantee the effect of that optimization is a problem that urgently needs to be solved.

Disclosure of Invention

The embodiments of the invention provide a model optimization method, a model optimization device, electronic equipment and a storage medium, which are used to improve the detection performance of a target detection model.

The embodiment of the invention provides a model optimization method, which comprises the following steps:

carrying out feature extraction on a remote sensing image containing a target object to obtain a feature map;

acquiring a first prediction category of the target object and a predicted position of the target object in the remote sensing image, both determined by a target detection model according to the feature map;

obtaining a second prediction category of the target object determined by the semantic segmentation model according to the feature map;

optimizing the target detection model according to the first prediction category, the predicted location, and the second prediction category.
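The four steps above can be pictured as a single optimization step. The following is a minimal, framework-agnostic sketch; the callables `extract_features`, `detect`, `segment`, `compute_loss` and `update` are hypothetical placeholders for the feature extractor, the target detection model, the semantic segmentation model, the error computation and the parameter update, none of which the text pins to a particular implementation.

```python
from typing import Any, Callable, Tuple


def optimization_step(
    image: Any,
    extract_features: Callable[[Any], Any],
    detect: Callable[[Any], Tuple[Any, Any]],
    segment: Callable[[Any], Any],
    compute_loss: Callable[[Any, Any, Any], float],
    update: Callable[[float], None],
) -> float:
    """One pass of the four-step method; every callable is a hypothetical placeholder."""
    # Step 1: feature extraction on the remote sensing image.
    feature_map = extract_features(image)
    # Step 2: the target detection model yields the first prediction category and the predicted position.
    first_category, predicted_position = detect(feature_map)
    # Step 3: the semantic segmentation model yields the second prediction category.
    second_category = segment(feature_map)
    # Step 4: optimize the target detection model using all three predictions.
    loss = compute_loss(first_category, predicted_position, second_category)
    update(loss)
    return loss


# Toy usage with stand-in callables.
updates = []
loss = optimization_step(
    image=[[1, 2], [3, 4]],
    extract_features=lambda img: sum(sum(row) for row in img),
    detect=lambda fm: ("forest", (0, 0, 2, 2)),
    segment=lambda fm: "forest",
    compute_loss=lambda c1, pos, c2: 0.0 if c1 == c2 else 1.0,
    update=updates.append,
)
print(loss, updates)  # 0.0 [0.0]
```

Passing the models in as callables keeps the sketch independent of any deep learning framework; in practice the update would be a gradient step on the detector's parameters.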

An embodiment of the present invention provides a model optimization apparatus, including:

the extraction module is configured to perform feature extraction on the remote sensing image containing the target object to obtain a feature map;

the first obtaining module is configured to obtain a first prediction category to which the target object belongs and a predicted position of the target object in the remote sensing image, both determined by the target detection model according to the feature map;

the second obtaining module is configured to obtain a second prediction category to which the target object belongs, determined by the semantic segmentation model according to the feature map;

the optimization module is configured to optimize the target detection model according to the first prediction category, the predicted position, and the second prediction category.

An embodiment of the present invention provides an electronic device, including: a processor and a memory; wherein the memory is configured to store one or more computer instructions which, when executed by the processor, implement:

Embodiments of the present invention provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform at least the following:

In the model optimization method provided by the invention, feature extraction is first performed on the remote sensing image containing the target object to obtain a feature map, and the feature map is then input into a target detection model and into a semantic segmentation model. Based on the feature map, the target detection model outputs a first prediction category to which the target object belongs and the predicted position of the target object in the remote sensing image, while the semantic segmentation model outputs a second prediction category to which the target object belongs. Finally, the target detection model is optimized according to the first prediction category and predicted position output by the target detection model and the second prediction category output by the semantic segmentation model; that is, the errors between the outputs of the two models and the real category and real position of the target object are obtained, and these errors are used to optimize the target detection model and thereby improve its detection accuracy.

Compared with optimizing the target detection model using only its own outputs, the method provided by the invention optimizes the target detection model using the outputs of both the target detection model and the semantic segmentation model. Because the second prediction category output by the semantic segmentation model is produced at the pixel level, its classification granularity is finer than that of the target detection model; combining prediction results of two different granularities improves the optimization effect and makes the detection results of the target detection model more accurate.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a flowchart of a model optimization method according to an embodiment of the present invention;

Fig. 2 is a flowchart of an alternative implementation of step S104 in the embodiment shown in Fig. 1;

Fig. 3 is a flowchart of an alternative way of determining the third prediction error value according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a model optimization apparatus according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device corresponding to the model optimization apparatus provided in the embodiment shown in Fig. 4.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well. "Plurality" generally means at least two unless the context clearly dictates otherwise.

The word "if", as used herein, may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to determining" or "when detected (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.

It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a commodity or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such commodity or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the commodity or system that includes the element.

The model optimization method provided herein is described in detail below with reference to the following embodiments. The order of steps in the method embodiments described below is merely exemplary and not strictly limiting. In addition, where there is no conflict, the embodiments below and the features in them may be combined with each other.

Fig. 1 is a flowchart of a model optimization method according to an embodiment of the present invention. The model optimization method may be performed by an optimization device, which may specifically be an electronic device with data processing capability. As shown in Fig. 1, the method may include the following steps:

S101, performing feature extraction on the remote sensing image containing the target object to obtain a feature map.

After the remote sensing image captured by the satellite is obtained, it can be input into a convolutional neural network for convolution calculation to obtain a feature map. Optionally, the convolutional neural network generally has at least one convolutional layer, so the remote sensing image may undergo at least one convolution calculation, yielding at least one feature map. In addition, each convolution calculation can be regarded as a down-sampling of the remote sensing image, so each feature map is smaller than the original remote sensing image, and when there are several feature maps their sizes differ: the more convolution calculations have been applied, the smaller the resulting feature map.

S102, obtaining a first prediction category of the target object and a predicted position of the target object in the remote sensing image, both determined by the target detection model according to the feature map.
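The size reduction described above can be illustrated with a toy convolution. This is a minimal sketch under stated assumptions: a single-channel image, one fixed 2×2 averaging kernel (hypothetical weights), no padding, and a stride of 2 per layer. A real convolutional neural network uses learned multi-channel kernels and nonlinearities, and its stride schedule is not specified in the text.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_stride2(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D convolution (cross-correlation, as used in CNNs) with stride 2."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - kh + 1, 2):
        row = []
        for j in range(0, w - kw + 1, 2):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def extract_feature_maps(image: Matrix, kernel: Matrix, num_layers: int) -> List[Matrix]:
    """Apply the convolution repeatedly and keep every intermediate feature map."""
    maps = []
    x = image
    for _ in range(num_layers):
        x = conv2d_stride2(x, kernel)
        maps.append(x)
    return maps


image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]  # toy 16x16 "image"
kernel = [[0.25, 0.25], [0.25, 0.25]]  # hypothetical 2x2 averaging kernel
for fm in extract_feature_maps(image, kernel, 3):
    print(len(fm), "x", len(fm[0]))  # prints 8 x 8, then 4 x 4, then 2 x 2
```

Each stride-2 layer halves the spatial size, which is why deeper feature maps are smaller and why the feature maps differ in size.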

S103, obtaining a second prediction category of the target object determined by the semantic segmentation model according to the feature map.

S104, optimizing the target detection model according to the first prediction category, the predicted position, and the second prediction category.

After the feature map is obtained by convolution calculation, on the one hand, the feature map may be input into the target detection model, which outputs the category of the target object in the feature map and the position of the target object in the remote sensing image. Optionally, the model may frame the target object in the feature map with a bounding box, and the position of the framed image block in the feature map is the position of the target object. The target detection model can be trained in a supervised or unsupervised manner.

On the other hand, the feature map can be input into a semantic segmentation model, which performs pixel-level semantic segmentation on the feature map to determine the category to which each pixel belongs, thereby obtaining the second prediction category to which the target object in the feature map belongs.

It should be noted that the first prediction category, the predicted position, and the second prediction category are each output by a single model, and their accuracy depends directly on how well that model has been trained. If these predictions were used directly as the target detection result, its accuracy could not be guaranteed. Therefore, the outputs of the two models can be considered together to adjust the target detection model; that is, the target detection model is optimized with the help of the second prediction category output by the semantic segmentation model.

Optionally, the training effect of the target detection model can be expressed by a loss function: the loss value calculated from the loss function indicates how well the model has been trained. The model parameters of the target detection model are then adjusted according to the loss value, which optimizes the target detection model. The specific process of optimizing the target detection model with the loss function is described in detail in the embodiment shown in Fig. 2.

It should be noted that, as described above, there may be at least one feature map. To limit the amount of computation, any one feature map A may optionally be selected, and the set of prediction results corresponding to feature map A is used to calculate the prediction error and thereby optimize the target detection model. This set of prediction results includes the first prediction category, the predicted position, and the second prediction category obtained after feature map A is input into the target detection model and the semantic segmentation model, respectively.

Optionally, several of the feature maps may be selected, each with its own set of prediction results. In this case, the optimization of the target detection model is performed repeatedly using the prediction results corresponding to each of these feature maps, which helps ensure the optimization effect.

In this embodiment, feature extraction is performed on the remote sensing image containing the target object to obtain a corresponding feature map. The feature map is input into the target detection model and into the semantic segmentation model, and the target detection model is optimized according to the first prediction category and predicted position output by the target detection model and the second prediction category output by the semantic segmentation model. The method of this embodiment therefore optimizes the target detection model using the outputs of both models. Because the second prediction category output by the semantic segmentation model is produced at the pixel level, its classification granularity is finer than that of the target detection model; combining outputs of different granularities improves the optimization effect and makes the detection results of the target detection model more accurate.
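Pixel-level classification followed by deriving an object-level category can be sketched as follows. This is an illustration under assumptions: the segmentation model is taken to output a score per class at every pixel, each pixel gets the highest-scoring class, and the object's second prediction category is taken as the majority class of the pixels inside its region. The text does not say how per-pixel categories are aggregated, so the majority vote and the `object_category` helper are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pixel_categories(scores: List[List[Sequence[float]]]) -> List[List[int]]:
    """Assign each pixel the index of its highest class score (per-pixel argmax)."""
    return [[max(range(len(s)), key=s.__getitem__) for s in row] for row in scores]


def object_category(label_map: List[List[int]], box: Tuple[int, int, int, int]) -> int:
    """Hypothetical aggregation: majority category inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    votes = Counter(label_map[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return votes.most_common(1)[0][0]


# 2x3 feature map, 2 classes (0 = open land, 1 = forest); hypothetical scores.
scores = [
    [(0.9, 0.1), (0.2, 0.8), (0.3, 0.7)],
    [(0.6, 0.4), (0.1, 0.9), (0.4, 0.6)],
]
labels = pixel_categories(scores)
print(labels)                                 # [[0, 1, 1], [0, 1, 1]]
print(object_category(labels, (1, 0, 3, 2)))  # 1
```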
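Choosing one feature map or several can be expressed as a loop over the selected indices. In this minimal sketch, `loss_for_map` is a hypothetical placeholder for computing one map's prediction set (first category, predicted position, second category) and its combined error, and `update` stands in for one parameter update of the target detection model.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def optimize_over_feature_maps(
    feature_maps: Sequence[T],
    selected: Sequence[int],
    loss_for_map: Callable[[T], float],
    update: Callable[[float], None],
) -> List[float]:
    """Run one optimization update per selected feature map and return the losses."""
    losses = []
    for idx in selected:
        loss = loss_for_map(feature_maps[idx])
        update(loss)  # e.g. one parameter update of the target detection model
        losses.append(loss)
    return losses


# Toy usage: three feature maps represented only by their side lengths.
maps = [64, 32, 16]
applied = []
print(optimize_over_feature_maps(maps, [1], lambda m: m / 64, applied.append))        # [0.5]
print(optimize_over_feature_maps(maps, [0, 1, 2], lambda m: m / 64, applied.append))  # [1.0, 0.5, 0.25]
```

Selecting a single index corresponds to using only feature map A; selecting several trades extra computation for repeated optimization.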

Having described the model optimization process above, on the basis of the embodiment shown in Fig. 1, an alternative implementation of step S104 is shown in Fig. 2:

S201, determining a first prediction error value between the first prediction category and the real category of the target object.

Specifically, in addition to the first prediction category, the target detection model outputs a confidence that the target object belongs to the first prediction category. The magnitude of this confidence indicates how likely it is that the target object belongs to that category, and it can be any value between 0 and 1. After the remote sensing image is obtained, a user generally labels the category to which the target object belongs; the labeled category can be regarded as the real category of the target object, and the confidence that the target object belongs to the real category can be regarded as 1.

Given these two confidences, in an optional manner, the difference between the confidence corresponding to the first prediction category and the confidence obtained from manual labeling may be determined as the first prediction error value. The smaller this error value, the better the detection performance of the target detection model.
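The confidence difference can be sketched in a few lines. This assumes the detector exposes a confidence for every category (a hypothetical `confidences` vector) and takes the error as 1 minus the confidence for the labeled category. When the first prediction category equals the real category this is exactly the difference described above; reading the real category's confidence when the two differ is an assumption of this sketch.

```python
from typing import Sequence


def first_prediction_error(confidences: Sequence[float], true_category: int) -> float:
    """1 minus the predicted confidence for the labeled (real) category.

    The manual label is treated as confidence 1 for the real category, so the
    error is the gap between 1 and the model's confidence for that category.
    """
    return 1.0 - confidences[true_category]


conf = [0.1, 0.75, 0.15]  # hypothetical per-category confidences from the detector
print(first_prediction_error(conf, true_category=1))  # 0.25
```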

S202, determining a second prediction error value between the prediction position and the real position of the target object.

Optionally, after the feature map is input into the target detection model, the model may frame the target object in the feature map, that is, output the predicted position of the target object in the feature map. The framed part of the feature map containing the target object may be referred to as a first image block. Meanwhile, after the feature map is obtained, the user can frame the position of the target object in the feature map, and the framed position can be regarded as the real position of the target object in the feature map. The part of the feature map framed by the user may be referred to as a second image block.

Then, first size information of the first image block and a first pixel coordinate of its central pixel point in the feature map are obtained, as are second size information of the second image block and a second pixel coordinate of its central pixel point in the feature map. The size information of an image block may specifically be its width and height.

Finally, the height difference and the width difference between the two image blocks may be calculated, as well as the distance between the first pixel coordinate and the second pixel coordinate. Optionally, the sum of the height difference, the width difference, and the distance value may be determined as the second prediction error value.
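The position error described above can be sketched as follows. The text does not define "difference" or "distance" precisely, so this sketch assumes absolute differences for height and width and the Euclidean distance between the two central pixel coordinates; the `Box` type is introduced here for illustration.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    cx: float  # x pixel coordinate of the central pixel point
    cy: float  # y pixel coordinate of the central pixel point
    w: float   # width of the image block
    h: float   # height of the image block


def second_prediction_error(pred: Box, real: Box) -> float:
    """Height difference + width difference + distance between central pixel coordinates."""
    height_diff = abs(pred.h - real.h)
    width_diff = abs(pred.w - real.w)
    center_dist = math.hypot(pred.cx - real.cx, pred.cy - real.cy)  # Euclidean distance
    return height_diff + width_diff + center_dist


pred = Box(cx=10, cy=10, w=4, h=6)  # first image block (predicted)
real = Box(cx=13, cy=14, w=5, h=6)  # second image block (labeled)
print(second_prediction_error(pred, real))  # 6.0
```

Here the height difference is 0, the width difference is 1, and the centers are 5 pixels apart, giving 6.0.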

S203, determining a third prediction error value between the second prediction category and the real category.

After the feature map is input into the semantic segmentation model, the model outputs a corresponding semantic segmentation map. Because semantic segmentation operates at the pixel level, each pixel point is assigned to one category, and pixel points of the same category are marked with the same color in the semantic segmentation map.

Optionally, since the semantic segmentation map has the same size as the feature map, the third prediction error value may be calculated from the color values of pixel points at the same pixel coordinates in the feature map and the semantic segmentation map. This approach can also be understood as using cross entropy as the loss function to obtain the third prediction error value.
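A per-pixel cross-entropy of the kind mentioned above can be sketched as follows. The sketch assumes the segmentation model's per-pixel class probabilities and a ground-truth label map of class indices (the colors decoded back into categories) are available at the same resolution; averaging over pixels and the small `eps` guard against log(0) are choices made here, not taken from the text.

```python
import math
from typing import List, Sequence


def pixel_cross_entropy(
    probs: List[List[Sequence[float]]], labels: List[List[int]], eps: float = 1e-12
) -> float:
    """Mean over pixels of -log p(true class) at the same pixel coordinates."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            total += -math.log(max(p[y], eps))  # eps avoids log(0)
            count += 1
    return total / count


probs = [
    [(0.8, 0.2), (0.4, 0.6)],
    [(0.5, 0.5), (0.1, 0.9)],
]  # hypothetical per-pixel class probabilities (2 classes)
labels = [[0, 1], [1, 1]]  # ground-truth class index per pixel
print(round(pixel_cross_entropy(probs, labels), 4))  # 0.3831
```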

S204, optimizing the target detection model according to the first prediction error value, the second prediction error value, and the third prediction error value.

After the three prediction error values are obtained, in one optional manner, their sum may be directly determined as the loss value of the target detection model. In another optional manner, a weight coefficient can be set for each prediction error value, and the loss value of the target detection model is calculated from the weight coefficients and the corresponding prediction error values. Since the method provided by the invention uses the semantic segmentation model, whose classification granularity is finer, to optimize the target detection model, the weight coefficient of the third prediction error value can be set larger. Of course, the weight coefficients can be set in different ways according to actual requirements.

Finally, the target detection model can be optimized according to the loss value, improving the accuracy of the detection results it outputs. The larger the loss value, the more room there is to improve the target detection model.

In this embodiment, for the three prediction results output by the target detection model and the semantic segmentation model, the prediction error value of each result is calculated in a different manner, and the target detection model is optimized using the three prediction error values. The output of the semantic segmentation model is thus used in the optimization process. Because the second prediction category output by the semantic segmentation model is produced at the pixel level, its classification granularity is finer; combining outputs of different granularities improves the optimization effect and makes the detection results of the target detection model more accurate.
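Both ways of combining the three error values can be sketched with one function: equal weights give the plain sum, and unequal weights give the weighted variant. The weights `(1.0, 1.0, 2.0)` below are hypothetical and merely follow the suggestion to weight the third prediction error value more heavily.

```python
from typing import Sequence


def total_loss(errors: Sequence[float], weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the first, second and third prediction error values."""
    if len(errors) != len(weights):
        raise ValueError("need one weight per prediction error value")
    return sum(w * e for w, e in zip(weights, errors))


errors = (0.25, 6.0, 0.5)  # first, second, third prediction error values (toy numbers)
print(total_loss(errors))                           # 6.75 (plain sum)
print(total_loss(errors, weights=(1.0, 1.0, 2.0)))  # 7.25 (third error weighted more)
```

In practice the error terms can have very different scales (a pixel distance versus a confidence gap), which is one reason the weights may need tuning to actual requirements.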
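Adjusting parameters according to a loss value is usually done by gradient descent. Deep learning frameworks compute the gradients by backpropagation; to stay self-contained, this sketch swaps in central finite differences on a toy two-parameter loss (`toy_loss`, a hypothetical stand-in for the detector's combined loss). The learning rate and step count are illustrative.

```python
from typing import Callable, List


def numerical_gradient(
    loss_fn: Callable[[List[float]], float], params: List[float], h: float = 1e-6
) -> List[float]:
    """Central finite-difference estimate of d(loss)/d(param) for each parameter."""
    grads = []
    for i in range(len(params)):
        plus = params.copy()
        plus[i] += h
        minus = params.copy()
        minus[i] -= h
        grads.append((loss_fn(plus) - loss_fn(minus)) / (2 * h))
    return grads


def gradient_step(
    loss_fn: Callable[[List[float]], float], params: List[float], lr: float = 0.1
) -> List[float]:
    """Move each parameter against its gradient to reduce the loss."""
    grads = numerical_gradient(loss_fn, params)
    return [p - lr * g for p, g in zip(params, grads)]


def toy_loss(p: List[float]) -> float:
    """Hypothetical loss with its minimum at p = [3, -1]."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


params = [0.0, 0.0]
for _ in range(100):
    params = gradient_step(toy_loss, params)
print([round(p, 3) for p in params])  # [3.0, -1.0]
```

A large loss yields large gradients and therefore large parameter changes, matching the observation that a larger loss value leaves more room for optimization.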

The above embodiment provides one way of determining the third prediction error value. However, because the feature map is smaller than the remote sensing image, it contains far less detail about the target object than the remote sensing image does. As a result, a third prediction error value obtained from the feature map and the semantic segmentation map may not truly reflect the accuracy of the second prediction category output by the semantic segmentation model.

Based on this, Fig. 3 shows an optional implementation of step S203, which may specifically include the following steps:

S2031, obtaining a semantic segmentation map indicating that the target object belongs to the second prediction category.

S2032, adjusting the size of the semantic segmentation map to be the same as that of the remote sensing image.

S2033, determining the third prediction error value according to the color values of corresponding pixel points in the resized semantic segmentation map and the remote sensing image.

Specifically, after the feature map is input into the semantic segmentation model, the semantic segmentation map output by the model is obtained; this map has the same size as the feature map and is therefore smaller than the remote sensing image. The semantic segmentation map can then be up-sampled, that is, enlarged to the same size as the remote sensing image. The up-sampling may also be referred to as deconvolution. Optionally, the up-sampling of the semantic segmentation map can be implemented by bilinear interpolation. Finally, the third prediction error value is determined according to the color values of corresponding pixel points in the enlarged semantic segmentation map and the remote sensing image.

For details not described in this embodiment, reference may be made to the related description of the embodiment shown in Fig. 2; they are not repeated here.

In this embodiment, the semantic segmentation map is enlarged to the same size as the remote sensing image, and the third prediction error value is then calculated from the enlarged semantic segmentation map and the remote sensing image. Because the remote sensing image contains far more detail about the target object than the feature map, the third prediction error value calculated from the semantic segmentation map and the remote sensing image better reflects the accuracy of the second prediction category.
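Bilinear up-sampling of a segmentation map can be sketched on a single channel. Two assumptions apply: the corner-aligned sampling convention (output corners map exactly onto input corners) is a choice made here, and the interpolation is meant to run on per-class probability or score channels rather than on integer labels or colors, since blending label indices produces meaningless in-between values. After up-sampling each channel, a per-pixel argmax gives the enlarged segmentation map.

```python
from typing import List

Matrix = List[List[float]]


def bilinear_resize(src: Matrix, out_h: int, out_w: int) -> Matrix:
    """Bilinear interpolation of one channel, with input and output corners aligned."""
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        # Map the output row back to a fractional source row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


small = [[0.0, 1.0], [2.0, 3.0]]  # one class channel of a toy 2x2 segmentation map
big = bilinear_resize(small, 3, 3)
print(big)  # [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
```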

The model optimization apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that each such apparatus can be constructed from commercially available hardware components configured according to the steps taught in this scheme.

Fig. 4 is a schematic structural diagram of a model optimization apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes:

An extraction module 11, configured to perform feature extraction on the remote sensing image containing the target object to obtain a feature map.

A first obtaining module 12, configured to obtain a first prediction category to which the target object belongs and a predicted position of the target object in the remote sensing image, both determined by the target detection model according to the feature map.

And a second obtaining module 13, configured to obtain a second prediction category to which the target object belongs, which is determined by the semantic segmentation model according to the feature map.

An optimization module 14, configured to optimize the target detection model according to the first prediction category, the predicted position, and the second prediction category.

Optionally, the optimization module 14 specifically includes:

A first determining unit 141, configured to determine a first prediction error value between the first prediction category and the real category of the target object.

A second determining unit 142, configured to determine a second prediction error value between the predicted position and the real position of the target object.

A third determining unit 143, configured to determine a third prediction error value between the second prediction category and the real category.

An optimization unit 144, configured to optimize the target detection model according to the first prediction error value, the second prediction error value, and the third prediction error value.

Optionally, the first determining unit 141 is specifically configured to: obtain the confidence, output by the target detection model, that the target object belongs to the first prediction category, and the real category of the target object; and determine the first prediction error value according to the confidence and the real category.

Optionally, the second determining unit 142 is specifically configured to: determine a first image block and a second image block in the feature map corresponding to the predicted position and the real position, respectively; acquire first size information of the first image block and a first pixel coordinate of the central pixel point of the first image block; acquire second size information of the second image block and a second pixel coordinate of the central pixel point of the second image block; and determine the second prediction error value according to the first size information, the first pixel coordinate, the second size information and the second pixel coordinate.

Optionally, the third determining unit 143 is specifically configured to: acquire a semantic segmentation map indicating that the target object belongs to the second prediction category; and determine the third prediction error value according to the color values of corresponding pixel points in the semantic segmentation map and the feature map.

Optionally, the third determining unit 143 is specifically configured to: acquire a semantic segmentation map indicating that the target object belongs to the second prediction category; adjust the size of the semantic segmentation map to be the same as that of the remote sensing image; and determine the third prediction error value according to the color values of corresponding pixel points in the resized semantic segmentation map and the remote sensing image.

The feature map comprises at least one feature map, and the feature maps have different sizes.

The apparatus shown in Fig. 4 can perform the method of the embodiments shown in Figs. 1 to 3. For parts not described in detail in this embodiment, and for the implementation process and technical effects of this technical solution, reference may be made to the descriptions of the embodiments shown in Figs. 1 to 3; they are not repeated here.

Having described the internal functions and structure of the model optimization apparatus, in one possible design the apparatus may be implemented as an electronic device. As shown in Fig. 5, the electronic device may include a processor 21 and a memory 22. The memory 22 stores a program that supports the electronic device in executing the model optimization method provided in the embodiments shown in Figs. 1 to 3, and the processor 21 is configured to execute the program stored in the memory 22.

The program comprises one or more computer instructions which, when executed by the processor 21, are capable of performing the steps of:

Optionally, the processor 21 is further configured to perform all or some of the steps in the embodiments shown in Figs. 1 to 3.

The electronic device may further include a communication interface 23 for communicating with other devices or a communication network.

Additionally, embodiments of the present invention provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform at least the following:

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method of model optimization, comprising:

2. The method of claim 1, wherein optimizing the target detection model according to the first prediction category, the predicted position, and the second prediction category comprises:

determining a first prediction error value between the first prediction category and a real category of the target object;

determining a second prediction error value between the predicted position and a real position of the target object;

determining a third prediction error value between the second prediction category and the real category;

optimizing the target detection model according to the first, second, and third prediction error values.

3. The method of claim 2, wherein determining a first prediction error value between the first prediction category and the real category of the target object comprises:

obtaining the confidence, output by the target detection model, that the target object belongs to the first prediction category, and the real category of the target object;

determining the first prediction error value according to the confidence and the real category.

4. The method of claim 2, wherein determining a second prediction error value between the predicted position and the true position of the target object comprises:

determining a first image block and a second image block which respectively correspond to the predicted position and the real position in the feature map;

acquiring first size information of the first image block and a first pixel coordinate of a central pixel point of the first image block;

acquiring second size information of the second image block and second pixel coordinates of a central pixel point of the second image block;

determining the second prediction error value according to the first size information, the first pixel coordinate, the second size information, and the second pixel coordinate.

5. The method of claim 2, wherein determining a third prediction error value between the second prediction class and the true class comprises:

acquiring a semantic segmentation map indicating that the target object belongs to the second prediction category;

determining the third prediction error value according to the color values of corresponding pixel points in the semantic segmentation map and the feature map.

6. The method of claim 2, wherein determining a third prediction error value between the second prediction class and the true class comprises:

acquiring a semantic segmentation map indicating that the target object belongs to the second prediction category;

adjusting the size of the semantic segmentation map to be the same as that of the remote sensing image;

determining the third prediction error value according to the color values of corresponding pixel points in the resized semantic segmentation map and the remote sensing image.

7. The method according to any one of claims 1 to 6, wherein the feature map comprises at least one feature map, the feature maps having different sizes.

8. A model optimization apparatus, comprising:

9. An electronic device, comprising: a processor and a memory; wherein the memory is configured to store one or more computer instructions which, when executed by the processor, implement:

10. A computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform at least the following acts: