CN113449820A - Image processing method, electronic device, and storage medium - Google Patents


Info

Publication number
CN113449820A
Authority
CN
China
Prior art keywords
class
category
image
mask
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110996642.1A
Other languages
Chinese (zh)
Other versions
CN113449820B (en)
Inventor
李艺 (Yi Li)
旷章辉 (Zhanghui Kuang)
陈益民 (Yimin Chen)
张伟 (Wei Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202110996642.1A priority Critical patent/CN113449820B/en
Publication of CN113449820A publication Critical patent/CN113449820A/en
Application granted granted Critical
Publication of CN113449820B publication Critical patent/CN113449820B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20182 Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering


Abstract

The application discloses an image processing method, an electronic device, and a storage medium. The image processing method includes: acquiring a training image and its class response map; splitting the class response map to obtain a classification response map for each of a plurality of classes; performing coefficient-of-variation smoothing on each class's classification response map to obtain a smoothed classification response map per class; obtaining a foreground mask for each class using the class's smoothed classification response map and the training image; obtaining a scale matrix using the per-class foreground masks; and generating a pseudo-mask image based on the per-class foreground masks and the scale matrix. The image processing method improves the quality of the generated pseudo masks.

Description

Image processing method, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing application technologies, and in particular, to an image processing method, an electronic device, and a storage medium.
Background
Semantic segmentation is a basic computer vision task that aims to predict the pixel-level classification results of images. However, semantic segmentation requires the collection of class labels at the pixel level, which is both time consuming and expensive compared to other tasks such as classification and detection.
Recently, a great deal of research has been conducted on weakly supervised semantic segmentation, which attempts to achieve segmentation performance comparable to fully supervised methods using weak supervision such as image-level classification labels, scribbles, and bounding boxes.
Currently, weakly supervised segmentation generally relies on class response maps (CAMs) to generate pseudo masks. However, a class response map usually responds only at the most discriminative regions of an object and misses the remaining areas, i.e., the partial response problem, so the quality of the generated pseudo masks is low.
Disclosure of Invention
The application provides an image processing method, an electronic device and a storage medium.
One technical solution adopted by the present application is to provide an image processing method, including:
acquiring a training image and its class response map;
splitting the class response map to obtain a classification response map for each of a plurality of classes;
performing coefficient-of-variation smoothing on each class's classification response map to obtain a smoothed classification response map for the class;
obtaining a foreground mask for each class using the class's smoothed classification response map and the training image;
obtaining a scale matrix using the foreground masks of the classes;
and generating a pseudo-mask image based on the per-class foreground masks and the scale matrix.
In this way, coefficient-of-variation smoothing and proportional pseudo-mask generation together produce a high-quality pseudo mask, improving the quality of pseudo-mask generation.
Wherein the step of performing coefficient-of-variation smoothing on each class's classification response map to obtain a smoothed classification response map includes:
obtaining the coefficient of variation of each class's classification response map;
using the coefficient of variation of each class's classification response map as the smoothing parameter for that class;
and smoothing the pixels of each class's classification response map based on the class's smoothing parameter to obtain the class's smoothed classification response map.
In this manner, coefficient-of-variation smoothing expands the activated region of the classification response map, overcoming the partial response problem of class response maps.
Wherein obtaining the coefficient of variation of each class's classification response map comprises:
obtaining the confidence distribution of each class's classification response map;
obtaining the confidence deviation and the confidence mean of each class's classification response map based on the confidence distribution and a preset threshold;
and obtaining the coefficient of variation from the confidence deviation and the confidence mean.
In this manner, a concrete coefficient-of-variation smoothing scheme effectively expands the activated region of the target object.
Wherein obtaining the foreground mask of each class using the class's smoothed classification response map and the training image comprises:
obtaining the class-specific background of each class's smoothed classification response map;
and obtaining the foreground binary mask within each class-specific background using a preset algorithm and the training image, then combining the foreground binary masks of all classes into a foreground matrix.
In this manner, the foreground matrix represents the importance of each foreground position.
Wherein obtaining the scale matrix using the foreground mask of each class includes:
obtaining a class foreground score from the foreground binary mask within each class-specific background;
obtaining a pixel class score for each pixel of the training image;
obtaining the sum of the class foreground scores of all classes;
and obtaining the scale matrix from the pixel class scores and the sum of the class foreground scores.
In this manner, the scale matrix computes the importance of each position of each class independently.
Wherein generating the pseudo-mask image based on the foreground mask of each class and the scale matrix comprises:
multiplying the elements of the foreground matrix and the elements of the scale matrix along the channel dimension of the training image to generate the pseudo-mask image.
In this manner, proportional pseudo-mask generation optimizes the process from class response map to pseudo mask.
The image processing method further comprises:
normalizing the class response map;
and the step of splitting the class response map to obtain a plurality of classification response maps includes:
splitting the normalized class response map to obtain a classification response map for each of a plurality of classes.
In this manner, the statistical distributions of class response maps are made uniform across images.
Wherein the image processing method further comprises:
inputting the pseudo-mask image into a preset segmentation model and obtaining the loss mean produced by training on the pseudo-mask image;
processing the loss mean with a preset strategy when the loss mean is smaller than a preset loss threshold;
and training the preset segmentation model with the processed loss mean.
In this manner, the noise problem of the pseudo-mask image is handled by adjusting the loss values of the segmentation model.
Wherein processing the loss mean with a preset strategy comprises one of the following:
setting the loss mean to a preset threshold when the loss mean is greater than or equal to the preset threshold;
or scaling the loss mean using the preset threshold;
or setting the loss mean to 0 when the loss mean is greater than or equal to the preset threshold.
In this manner, the segmentation model is adjusted with an incomplete fitting strategy, improving the noise robustness of the segmentation.
Wherein the image processing method further comprises:
obtaining the pseudo-mask image output by the preset segmentation model;
and using the output pseudo-mask image as the input of the next round of training of the preset segmentation model.
In this manner, the output of the segmentation model serves as a new pseudo-mask input, i.e., a cyclic pseudo mask, improving the quality of the training annotations.
Another technical solution adopted by the present application is to provide an electronic device, including:
the acquisition module is used for acquiring a training image and a class response map thereof;
the segmentation module is used for segmenting the category response graph to obtain a plurality of category classification response graphs;
the processing module is used for carrying out coefficient of variation smoothing processing on each class classification response graph to obtain a smooth classification response graph of each class;
a computing module, configured to obtain a foreground mask of each class by using the smooth classification response map of each class and the training image, and further obtain a scaling matrix by using the foreground mask of each class;
a generating module, configured to generate a pseudo-mask image based on the foreground mask of each class and the scale matrix.
Another technical solution adopted by the present application is to provide an electronic device, which includes a memory and a processor coupled to the memory;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement the image processing method as described above.
Another technical solution adopted by the present application is to provide a computer storage medium for storing program data, which when executed by a computer, is used to implement the image processing method as described above.
The beneficial effects of the present application are as follows: the electronic device acquires a training image and its class response map; splits the class response map to obtain a classification response map for each of a plurality of classes; performs coefficient-of-variation smoothing on each class's classification response map to obtain a smoothed classification response map per class; obtains a foreground mask for each class using the class's smoothed classification response map and the training image; obtains a scale matrix using the per-class foreground masks; and generates a pseudo-mask image based on the per-class foreground masks and the scale matrix. The image processing method combines coefficient-of-variation smoothing with proportional pseudo-mask generation to produce high-quality pseudo masks, improving the quality of pseudo-mask generation.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flowchart of an embodiment of an image processing method provided in the present application;
FIG. 2 is a block diagram of an embodiment of an image processing method provided in the present application;
FIG. 3 is a schematic flow chart of step S103 of the image processing method shown in FIG. 1;
FIG. 4 is a schematic flowchart of another embodiment of an image processing method provided in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device provided in the present application;
FIG. 6 is a schematic structural diagram of another embodiment of an electronic device provided herein;
fig. 7 is a schematic structural diagram of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Specifically, referring to fig. 1 and fig. 2, fig. 1 is a schematic flowchart of an embodiment of an image processing method provided by the present application, and fig. 2 is a schematic framework diagram of an embodiment of the image processing method. The image processing method of the embodiment of the application can be applied to an electronic device, where the electronic device can be a server, a terminal device, a system in which a server and a terminal device cooperate, or a device with processing capability (such as a processor). Accordingly, the parts included in the electronic device, such as its units, sub-units, modules, and sub-modules, may all be deployed in the server, all in the terminal device, or distributed between the server and the terminal device.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein.
As shown in fig. 1, the image processing method according to the embodiment of the present application may specifically include the following steps:
step S101: and acquiring a training image and a class response map thereof.
The electronic device of the embodiment of the application first acquires the training image and then obtains its class response map; the manner of obtaining a class response map from a training image follows the prior art and is not repeated here. A class response map projects the response magnitudes of a feature map back onto the original image, making the model's behavior easier to interpret; it may take forms such as an attention map or a heat map.
The electronic device further normalizes the class response map A; specifically, min-max normalization is applied to A as:

Â(h, w, c) = (A(h, w, c) - min(A)) / (max(A) - min(A))

where h is the ordinate of the class response map, w is the abscissa, c is the channel index, A(h, w, c) is the value of the pixel at coordinates (h, w, c), min(A) is the smallest pixel value in the class response map, and max(A) is the largest pixel value in the class response map.
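As an illustration only (not code from the patent), the min-max normalization above can be sketched in NumPy; the small epsilon guarding against a constant-valued map is an added assumption:

```python
import numpy as np

def normalize_cam(cam: np.ndarray) -> np.ndarray:
    """Min-max normalize a class response map A of shape (C, H, W)
    so every value falls into [0, 1]."""
    a_min = cam.min()
    a_max = cam.max()
    # epsilon (an assumption, not in the patent) guards a constant map
    return (cam - a_min) / (a_max - a_min + 1e-8)

cam = np.array([[[0.0, 2.0], [4.0, 8.0]]])  # one class, 2x2 responses
norm = normalize_cam(cam)
```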
Step S102: and dividing the class response map to obtain a classification response map of a plurality of classes.
Step S103: and carrying out coefficient of variation smoothing treatment on each class classification response graph to obtain a smooth classification response graph of each class.
As shown in fig. 2, the electronic device performs coefficient of variation smoothing on the class response map to obtain the coefficient of variation of each class, and performs exponential transformation.
For example, the electronic device may divide the category response map into a plurality of categories of classification response maps, i.e., CAM slices in fig. 2, calculate a confidence distribution of the classification response map for each category, and smooth the classification response map according to the confidence distribution for each category. Specifically, the category response map has multi-dimensional information, each dimension is embodied as an image channel, that is, the category of the classification response map of the embodiment of the present application, and the classification response map of each category represents image information of one dimension of the category response map.
Referring to fig. 3, fig. 3 is a flowchart illustrating step S103 of the image processing method shown in fig. 1. As shown in fig. 3, step S103 in the embodiment of the present application specifically includes the following steps:
step S131: and obtaining the variation coefficient of each class classification response graph.
In the embodiment of the present application, the motivation of coefficient-of-variation smoothing is to smooth the class response map according to the variation of confidence over the spatial domain. Different images and different classes require different smoothing strengths depending on their confidence distributions. To measure a confidence distribution, the embodiment of the present application introduces the coefficient of variation cv, defined by:

cv = σ / μ

where σ is the deviation of the pixel confidences of a class's classification response map, μ is the mean of those pixel confidences, and the statistics are taken over the confidences f of the pixels in that class's classification response map.
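For illustration, the coefficient of variation of a single class's classification response map might be computed as below; the `tau` cutoff that drops near-zero background confidences is an assumed stand-in for the preset threshold mentioned earlier, not a value from the patent:

```python
import numpy as np

def coefficient_of_variation(resp: np.ndarray, tau: float = 0.1) -> float:
    """cv = sigma / mu over the confidence distribution of one class's
    classification response map; tau (assumed) drops near-zero
    background confidences before the statistics are taken."""
    fg = resp[resp > tau]
    if fg.size == 0:
        return 0.0
    sigma = fg.std()   # confidence deviation
    mu = fg.mean()     # confidence mean
    return float(sigma / (mu + 1e-8))
```

A flat map yields cv = 0 (little smoothing needed), while a sharply peaked map yields a larger cv.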
Step S132: and taking the variation coefficient of each class classification response graph as a smoothing parameter of each class.
Step S133: and smoothing the pixels in the classification response map of each class based on the smoothing parameters of the class to obtain a smooth classification response map of the class.
In an embodiment of the application, the electronic device raises each pixel of the classification response map to a power derived from the coefficient of variation cv. Since the normalized response values lie in [0, 1], an exponent below 1 reduces the differences between foreground pixels and yields a smoother classification response map.

In the embodiment of the application, the coefficient of variation serves as the smoothing parameter for every pixel of the classification response map: the exponent applied to the map is the difference between 1 and the product of the coefficient of variation and a preset scale factor s, giving the smoothed classification response map:

Â_n = (A_n)^(1 - s·cv)

where Â_n is the smoothed classification response map of class n, cv is the coefficient of variation, and s is a preset scale factor.
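A minimal sketch of the smoothing step, assuming the exponent 1 - s·cv described above; the clamp keeping the exponent non-negative is an added safeguard, not from the patent:

```python
import numpy as np

def cvs_smooth(resp: np.ndarray, cv: float, s: float = 1.0) -> np.ndarray:
    """Raise each pixel of a normalized classification response map to
    the power (1 - s*cv); an exponent below 1 pulls foreground
    confidences closer together, expanding the activated region."""
    exponent = max(1.0 - s * cv, 0.0)  # clamp is an added safeguard
    return np.power(resp, exponent)
```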
Step S104: and acquiring the foreground mask of each category by using the smooth classification response graph of each category and the training image.
In the embodiment of the application, to optimize the process from class response map to pseudo mask, proportional pseudo-mask generation is further proposed.
As shown in fig. 2, after performing coefficient-of-variation smoothing on the class response map, the electronic device obtains a foreground matrix from the smoothed classification response maps and the training image. An important point in weakly supervised semantic segmentation is that class response maps are obtained from binary classifiers trained independently under binary cross-entropy loss, so the embodiment of the present application generates a class-specific background for each class's smoothed response map through a bg function and applies a CRF (conditional random field) to it. The electronic device then introduces the training image and computes a foreground binary mask with an fg function; this can be summarized as:

F_n = fg(CRF(I, bg(Â_n)))

where I is the training image, Â_n is the smoothed classification response map of class n, and F_n is the foreground binary mask of class n.

It should be noted that the foreground matrix in the embodiment of the present application is formed by combining the foreground binary masks of all class-specific backgrounds; in its representation, each column (or each row) of the foreground matrix holds the foreground binary mask of one class-specific background.
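The bg/fg/CRF pipeline depends on a CRF implementation; as a rough, hypothetical stand-in (a constant background threshold replaces the bg function, and the CRF refinement is omitted entirely), the shape of the computation is:

```python
import numpy as np

def foreground_binary_mask(smooth_resp: np.ndarray, bg_thresh: float = 0.3) -> np.ndarray:
    """Toy fg step for one class: a pixel is foreground where the
    smoothed response beats a constant class-specific background score.
    bg_thresh is an assumed stand-in for the patent's bg function."""
    return (smooth_resp > bg_thresh).astype(np.uint8)

def foreground_matrix(smooth_resps: np.ndarray) -> np.ndarray:
    """Stack the per-class binary masks into the foreground matrix F
    of shape (C, H, W)."""
    return np.stack([foreground_binary_mask(r) for r in smooth_resps])
```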
Step S105: the scaling matrix is obtained using the foreground mask for each class.
In the embodiment of the application, the electronic device obtains the scale matrix from the foreground matrix. Specifically, the electronic device obtains a class foreground score from the foreground binary mask of each class-specific background, obtains a pixel class score for each pixel of the training image, and divides each pixel class score by the sum of the class foreground scores to obtain the scale matrix. The class response map is produced by binary classifiers: each pixel receives a pixel class score from its classifier, and the class foreground score of a foreground binary mask is the sum of the pixel class scores of all pixels inside that mask.

As shown in fig. 2, the electronic device divides the class score of each pixel by the sum of the class foreground scores, where the class scores of the pixels are output by the binary classifiers during training and the class foreground scores are provided by the foreground matrix.

Therefore, the scale matrix in the embodiment of the present application is implemented as:

P(c, h, w) = s(c, h, w) / Σ_n S_n

where S_n is the class foreground score of the foreground binary mask of class n (the sum runs over all classes) and s(c, h, w) is the class score of the pixel at (c, h, w).
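Following the description above (each pixel class score divided by the sum of the class foreground scores), the scale matrix can be sketched as follows; the epsilon guarding an empty foreground is an added assumption:

```python
import numpy as np

def scale_matrix(scores: np.ndarray, fg_masks: np.ndarray) -> np.ndarray:
    """scores: per-pixel class scores of shape (C, H, W); fg_masks:
    per-class foreground binary masks of the same shape. S_n sums the
    scores inside class n's mask; every pixel score is divided by the
    sum of all S_n."""
    s_n = (scores * fg_masks).reshape(scores.shape[0], -1).sum(axis=1)
    total = s_n.sum() + 1e-8  # epsilon guards an empty foreground
    return scores / total
```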
Step S106: a pseudo-mask image is generated based on the foreground mask and the scaling matrix for each category.
In an embodiment of the application, the electronic device selects, with an argmax function, the channel with the maximum product of the foreground matrix and the scale matrix. Specifically, the elements of the foreground binary masks and the elements of the scale matrix are multiplied along the channel dimension, and the pseudo mask takes, at each position, the class with the largest product:

M(h, w) = argmax_c [ F(c, h, w) · P(c, h, w) ]

Through the processing of the above steps, a high-quality pseudo mask is generated for the training image and can serve as the input of subsequent semantic segmentation training.
In the embodiment of the application, the electronic equipment acquires a training image and a class response map thereof; dividing the class response graph to obtain a plurality of class classification response graphs; carrying out coefficient of variation smoothing on each class classification response graph to obtain a smooth classification response graph of each class; acquiring a foreground mask of each category by using the smooth classification response image of each category and the training image; acquiring a proportional matrix by using the foreground mask of each category; a pseudo-mask image is generated based on the foreground mask and the scaling matrix for each category. The image processing method provides the proportion pseudo mask with smooth coefficient of variation to generate the high-quality pseudo mask, and the generation quality of the pseudo mask is improved.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating an image processing method according to another embodiment of the present disclosure. As shown in fig. 4, the image processing method according to the embodiment of the present application may specifically include the following steps:
step S201: a pseudo mask image of a training image is obtained.
In this embodiment of the application, the method for obtaining the high-quality pseudo mask image in the training image may be implemented in the above embodiment of the image processing method, and details are not repeated here.
Step S202: and inputting the pseudo mask image into a preset segmentation model, and acquiring a loss mean value obtained by training the pseudo mask image.
In the embodiment of the application, the electronic device inputs the training image with the pseudo mask into the preset segmentation model for training the preset segmentation model. In the segmentation training process, in order to solve the noise problem, an incomplete fitting strategy is provided in an embodiment of the present application, which specifically refers to the following steps:
step S203: and under the condition that the loss average value is smaller than a preset loss threshold value, processing the loss average value by adopting a preset strategy.
In the embodiment of the present application, a pseudo-mask image used as a supervision signal for training semantic segmentation is noisy compared with manual annotation. Current research focuses on generating higher-quality pseudo masks to reduce this noise; few works attempt to suppress the noise during model training itself.

The embodiment of the present application provides a method for re-weighting the loss values of potentially noisy pixels during weakly supervised segmentation optimization, so as to reduce the influence of noisy pixels on the training of the preset segmentation model. Specifically, an incomplete fitting strategy is provided. First, the electronic device obtains the loss mean produced by training the preset segmentation model on the pseudo-mask image, where the loss mean of the pseudo-mask image is the average of the training loss values of all pixels in the pseudo-mask image.
Then, the electronic device judges, through a preset loss threshold β, whether the losses of the pseudo-mask image need to be adjusted in this training step, with the following logic:

L̃ = pus(L̄) if L̄ < β, and L̃ = L̄ otherwise

where pus() denotes the operation of the incomplete fitting strategy and L̄ is the loss mean of the pseudo-mask image.

When the loss mean of the pseudo-mask image is greater than or equal to the preset loss threshold β, the electronic device does not need to adjust the losses of the pseudo-mask image in this training step. When the loss mean is smaller than β, the electronic device adjusts the training losses of the pseudo-mask image through an incomplete fitting strategy such as clipping, exponential scaling, or ignoring.
Specifically, the three incomplete-fit strategies provided herein are as follows:
Figure 429873DEST_PATH_IMAGE023
Figure 341635DEST_PATH_IMAGE024
Figure 360407DEST_PATH_IMAGE025
wherein,
Figure 507354DEST_PATH_IMAGE026
the loss average is set to a preset threshold k, and
Figure 269774DEST_PATH_IMAGE027
the missing values of these pixels are discarded and,
Figure 451356DEST_PATH_IMAGE028
a scaling strategy is performed on the loss means by an exponential function.
In particular, f_clip(·) keeps the loss mean value unchanged when it is smaller than the preset threshold k, and sets it to the fixed value k when it is greater than or equal to k. f_exp(·) performs an exponential transformation on the loss mean value through k to realize the loss mean value scaling. f_ignore(·) keeps the loss mean value unchanged when it is smaller than the preset threshold k, and sets it to 0 when it is greater than or equal to k.
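The judgment logic and the three strategies above can be sketched in code. This is an illustrative sketch only, not the patented implementation: the function names are assumptions, and the exact form of the exponential scaling (here taken as raising the loss mean to the power k) is one plausible reading of "exponential transformation through k".

```python
# Illustrative sketch of the incomplete fitting strategy (pus).
# Assumptions: function names are hypothetical; the exponential
# scaling form (loss_mean ** k) is not fully specified by the text.

def pus(loss_mean, beta, strategy, k):
    """Re-weight the pseudo-mask loss mean only when it is below beta."""
    if loss_mean >= beta:
        return loss_mean           # no adjustment needed
    return strategy(loss_mean, k)  # apply an incomplete fitting strategy

def f_clip(loss_mean, k):
    # unchanged below k; capped at the fixed value k otherwise
    return min(loss_mean, k)

def f_exp(loss_mean, k):
    # exponential transformation of the loss mean through k (assumed form)
    return loss_mean ** k

def f_ignore(loss_mean, k):
    # unchanged below k; discarded (set to 0) otherwise
    return loss_mean if loss_mean < k else 0.0
```

For example, with beta = 0.5 and k = 0.3, a loss mean of 0.4 falls below beta, so it would be capped to 0.3 under f_clip and zeroed under f_ignore.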
Step S204: training the preset segmentation model by using the processed loss mean value.
In the embodiment of the application, the electronic device trains the preset segmentation model by using the processed loss mean value, thereby effectively mitigating the noise problem of the pseudo mask image.
Further, the embodiment of the application also addresses the problem of low pseudo mask accuracy by using the output of the preset segmentation model as a new pseudo mask image, i.e., a cyclic pseudo mask, so as to improve the quality of the training annotations. Specifically, the electronic device may obtain a pseudo mask image output by the preset segmentation model and take the output pseudo mask image as the input of the next training of the preset segmentation model; that is, the segmentation model is retrained with its own output as a new pseudo mask image, updating the training annotation quality and thus improving the training precision of the segmentation model.
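The cyclic pseudo mask scheme can be sketched as a simple training loop. The helper names `train_one_round` and `predict_masks` below are hypothetical placeholders for the actual training and inference routines of the preset segmentation model:

```python
# Illustrative sketch of the cyclic pseudo mask scheme: after each
# round, the segmentation model's predictions replace the pseudo mask
# annotations used in the next round.

def cyclic_pseudo_mask_training(model, images, pseudo_masks,
                                train_one_round, predict_masks, rounds=3):
    for _ in range(rounds):
        # train the segmentation model on the current pseudo mask images
        model = train_one_round(model, images, pseudo_masks)
        # the model's output becomes the pseudo mask annotation for the
        # next round, updating the training annotation quality
        pseudo_masks = predict_masks(model, images)
    return model, pseudo_masks
```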
The above embodiments are merely common implementations of the present application and do not limit its technical scope; any minor modification, equivalent change, or refinement made to the above content in accordance with the essence of the present application still falls within the technical scope of the present application.
With continued reference to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of an electronic device provided in the present application. The electronic device 300 includes an obtaining module 31, a dividing module 32, a processing module 33, a calculating module 34, and a generating module 35.
The obtaining module 31 is configured to obtain a training image and a category response map thereof.
The dividing module 32 is configured to divide the category response map to obtain classification response maps of multiple categories.
The processing module 33 is configured to perform coefficient of variation smoothing on the classification response map of each category to obtain a smoothed classification response map of each category.
The calculating module 34 is configured to obtain a foreground mask of each category by using the smoothed classification response map of each category and the training image, and further obtain a scaling matrix by using the foreground mask of each category.
The generating module 35 is configured to generate a pseudo mask image based on the foreground mask of each category and the scaling matrix.
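Under stated assumptions, the pipeline implemented by these modules can be sketched as follows. This is a heavily simplified illustration, not the patented method itself: the category response map is assumed to be a C×H×W class activation map, the coefficient of variation is used directly as a per-class smoothing exponent (one possible choice), and plain thresholding stands in for the "preset algorithm" that extracts the foreground binary masks.

```python
import numpy as np

# Simplified sketch of pseudo mask generation: normalization, coefficient
# of variation smoothing, per-class foreground masks, a scaling matrix,
# and their combination along the class (channel) dimension.
# All names and simplifications are illustrative assumptions.

def generate_pseudo_mask(cam: np.ndarray, fg_thresh: float = 0.5) -> np.ndarray:
    # normalize each class response map to [0, 1]
    cam = cam / (cam.max(axis=(1, 2), keepdims=True) + 1e-8)

    # coefficient of variation per class: std / mean, used here as a
    # per-class smoothing exponent (one possible smoothing scheme)
    cv = cam.std(axis=(1, 2)) / (cam.mean(axis=(1, 2)) + 1e-8)
    smoothed = cam ** cv[:, None, None]

    # foreground binary mask per class (thresholding stands in for the
    # "preset algorithm" of the description)
    fg = (smoothed > fg_thresh).astype(np.float32)

    # scaling matrix: each pixel's class score divided by the sum of the
    # class foreground scores, so classes compete proportionally
    scores = smoothed * fg
    scale = scores / (scores.sum(axis=0, keepdims=True) + 1e-8)

    # combine foreground masks with the scaling matrix along the class
    # dimension; argmax yields the final pseudo mask labels
    weighted = fg * scale
    return weighted.argmax(axis=0)
```

For example, a 2-class activation map whose first class responds in the top half of the image and whose second class responds in the bottom half yields a pseudo mask labeled 0 on top and 1 on the bottom.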
With continued reference to fig. 6, fig. 6 is a schematic structural diagram of another embodiment of the electronic device provided in the present application. The electronic device 500 of the embodiment of the present application includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52 and the input/output device 53 are respectively connected to the bus 54, the memory 52 stores program data, and the processor 51 is used for executing the program data to implement the image processing method according to the above embodiment.
In the embodiment of the present application, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
Please refer to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application, the computer storage medium 600 stores program data 61, and the program data 61 is used to implement the image processing method according to the above embodiment when being executed by a processor.
Embodiments of the present application may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only of embodiments of the present application and is not intended to limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (13)

1. An image processing method, characterized in that the image processing method comprises:
acquiring a training image and a category response map thereof;
dividing the category response map to obtain classification response maps of a plurality of categories;
performing coefficient of variation smoothing on the classification response map of each category to obtain a smoothed classification response map of each category;
obtaining a foreground mask of each category by using the smoothed classification response map of each category and the training image;
obtaining a scaling matrix by using the foreground mask of each category;
and generating a pseudo mask image based on the foreground mask of each category and the scaling matrix.
2. The image processing method according to claim 1,
wherein the step of performing coefficient of variation smoothing on the classification response map of each category to obtain a smoothed classification response map comprises:
obtaining the coefficient of variation of the classification response map of each category;
taking the coefficient of variation of the classification response map of each category as a smoothing parameter of the category;
and smoothing the pixels in the classification response map of each category based on the smoothing parameter of the category to obtain a smoothed classification response map of the category.
3. The image processing method according to claim 2,
wherein the obtaining of the coefficient of variation of the classification response map of each category comprises:
obtaining the confidence distribution of the classification response map of each category;
obtaining a confidence deviation and a confidence average value of the classification response map of each category based on the confidence distribution and a preset threshold;
and obtaining the coefficient of variation by using the confidence deviation and the confidence average value.
4. The image processing method according to any one of claims 1 to 3,
wherein the obtaining of the foreground mask of each category by using the smoothed classification response map of each category and the training image comprises:
obtaining a class-specific background in the smoothed classification response map of each category;
and obtaining a foreground binary mask in the class-specific background of each category by using a preset algorithm and the training image, and combining the foreground binary masks of all categories to form a foreground matrix.
5. The image processing method according to claim 4,
wherein the obtaining of the scaling matrix by using the foreground mask of each category comprises:
obtaining a category foreground score by using the foreground binary mask in the class-specific background of each category;
obtaining a pixel category score of each pixel in the training image;
obtaining the sum of the category foreground scores of all categories;
and obtaining the scaling matrix based on the pixel category score and the sum of the category foreground scores.
6. The image processing method according to claim 5,
wherein the generating of the pseudo mask image based on the foreground mask of each category and the scaling matrix comprises:
multiplying the elements of the foreground matrix by the elements of the scaling matrix along the channel dimension of the training image to generate the pseudo mask image.
7. The image processing method according to claim 1,
wherein the image processing method further comprises:
normalizing the category response map;
and the step of dividing the category response map to obtain classification response maps of a plurality of categories comprises:
dividing the normalized category response map to obtain classification response maps of a plurality of categories.
8. The image processing method according to claim 1, characterized in that the image processing method further comprises:
inputting the pseudo mask image into a preset segmentation model, and obtaining a loss mean value from training on the pseudo mask image;
processing the loss mean value by adopting a preset strategy when the loss mean value is smaller than a preset loss threshold;
and training the preset segmentation model by using the processed loss mean value.
9. The image processing method according to claim 8,
wherein the processing of the loss mean value by adopting the preset strategy comprises:
setting the loss mean value to a preset threshold when the loss mean value is greater than or equal to the preset threshold;
or, scaling the loss mean value by using the preset threshold;
or, setting the loss mean value to 0 when the loss mean value is greater than or equal to the preset threshold.
10. The image processing method according to claim 8,
wherein the image processing method further comprises:
obtaining a pseudo mask image output by the preset segmentation model;
and taking the output pseudo mask image as the input of the next training of the preset segmentation model.
11. An electronic device, characterized in that the electronic device comprises:
an obtaining module, configured to obtain a training image and a category response map thereof;
a dividing module, configured to divide the category response map to obtain classification response maps of a plurality of categories;
a processing module, configured to perform coefficient of variation smoothing on the classification response map of each category to obtain a smoothed classification response map of each category;
a calculating module, configured to obtain a foreground mask of each category by using the smoothed classification response map of each category and the training image, and further obtain a scaling matrix by using the foreground mask of each category;
and a generating module, configured to generate a pseudo mask image based on the foreground mask of each category and the scaling matrix.
12. An electronic device, comprising a memory and a processor coupled to the memory;
wherein the memory is used for storing program data, and the processor is used for executing the program data to realize the image processing method according to any one of claims 1-10.
13. A computer storage medium for storing program data for implementing an image processing method according to any one of claims 1 to 10 when executed by a computer.
CN202110996642.1A 2021-08-27 2021-08-27 Image processing method, electronic device, and storage medium Active CN113449820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110996642.1A CN113449820B (en) 2021-08-27 2021-08-27 Image processing method, electronic device, and storage medium


Publications (2)

Publication Number Publication Date
CN113449820A true CN113449820A (en) 2021-09-28
CN113449820B CN113449820B (en) 2022-01-18

Family

ID=77818867


Country Status (1)

Country Link
CN (1) CN113449820B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008962A (en) * 2019-04-11 2019-07-12 福州大学 Weakly supervised semantic segmentation method based on attention mechanism
CN110870770A (en) * 2019-11-21 2020-03-10 大连理工大学 ICA-CNN classified fMRI space activation map smoothing and broadening method
CN111462163A (en) * 2020-01-03 2020-07-28 华中科技大学 Weakly supervised semantic segmentation method and application thereof
CN111915618A (en) * 2020-06-02 2020-11-10 华南理工大学 Example segmentation algorithm and computing device based on peak response enhancement
CN113096138A (en) * 2021-04-13 2021-07-09 西安电子科技大学 Weak supervision semantic image segmentation method for selective pixel affinity learning
US20210241034A1 (en) * 2020-01-31 2021-08-05 Element Al Inc. Method of and system for generating training images for instance segmentation machine learning algorithm


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jun Wei et al., "Shallow Feature Matters for Weakly Supervised Object Localization", arXiv:2108.00873v1 [cs.CV] *
Yang Yunxin, "Research on Image Semantic Segmentation Methods Based on Weakly Supervised Learning", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113449820B (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN109829448B (en) Face recognition method, face recognition device and storage medium
US11669711B2 (en) System reinforcement learning method and apparatus, and computer storage medium
CN111160407B (en) Deep learning target detection method and system
CN109726195B (en) Data enhancement method and device
CN112257738A (en) Training method and device of machine learning model and classification method and device of image
CN111178261B (en) Face detection acceleration method based on video coding technology
CN111695462A (en) Face recognition method, face recognition device, storage medium and server
EP4270247A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN112446379A (en) Self-adaptive intelligent processing method for dynamic large scene
CN110909665A (en) Multitask image processing method and device, electronic equipment and storage medium
CN110880018B (en) Convolutional neural network target classification method
CN116525517B (en) Positioning control method and system for conveying semiconductor wafers
CN114998595A (en) Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN113449820B (en) Image processing method, electronic device, and storage medium
CN115795355B (en) Classification model training method, device and equipment
CN116129496A (en) Image shielding method and device, computer equipment and storage medium
CN114172708A (en) Method for identifying network flow abnormity
CN114519675A (en) Image processing method and device, electronic equipment and readable storage medium
Kumar et al. Age Classification Based On Integrated Approach
CN112836819B (en) Neural network model generation method and device
CN111898421B (en) Regularization method for video behavior recognition
CN111832460B (en) Face image extraction method and system based on multi-feature fusion
WO2023134068A1 (en) Digit recognition model training method and apparatus, device, and storage medium
US20240161245A1 (en) Image optimization
Jia et al. Image salient object detection based on perceptually homogeneous patch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant