CN116342857A - Weak supervision target positioning method based on category correction - Google Patents
- Publication number: CN116342857A
- Application number: CN202310336796.7A
- Authority: CN (China)
- Prior art keywords: network; positioning; foreground; mask; category
- Legal status: Pending (assumed; not a legal conclusion)
Classifications
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/40 — Extraction of image or video features
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/764 — Recognition or understanding using classification, e.g. of video objects
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Recognition or understanding using neural networks
- Y02T10/40 — Engine management systems
Abstract
The invention belongs to the field of computer vision and relates to a weakly supervised object localization method based on category correction. To overcome the inaccurate localization of CAM-based techniques, class activation maps are no longer used for localization; instead, a coarse-to-fine pipeline is adopted. The model of the invention consists of a backbone network, a positioning network, and a classification network: the positioning network first generates a class-agnostic segmentation map using an unsupervised segmentation technique, thereby determining the rough position of the target object, and the classification network then performs fine-grained correction using the class labels. The category-correction-based method localizes objects accurately and recovers contour details well.
Description
Technical Field
The invention belongs to the field of computer vision, and relates to a weak supervision target positioning method based on category correction.
Background
Object localization is a fundamental perception task in computer vision that aims to locate a target object in an image and determine the category to which it belongs. In practice, however, achieving good generalization typically requires large-scale manual annotation of bounding boxes or even pixel-level labels. Because of this labeling cost, the weakly supervised object localization task trains models to locate objects using only image-level class labels, which are cheap and easy to obtain. Mainstream research on this problem, both domestic and international, is based on CAM technology, which determines the object position from the class-related highlighted regions of the activation map. However, such methods can generally locate only the most class-discriminative parts of the object, so the predicted localization box is often smaller than the target object. How to obtain an accurate localization box is therefore a key open problem in weakly supervised object localization.
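To make the limitation concrete, the CAM recipe referenced above can be sketched in a few lines: the localization map is a class-weighted sum of the final convolutional feature maps, so only regions whose features align with the class weights light up. This is an illustrative toy reconstruction (all shapes and values are made up for demonstration), not the invention's method:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """features: (C, H, W) conv features; fc_weights: (num_classes, C).
    Returns the class activation map normalized to [0, 1]."""
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # weighted sum over channels -> (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: one channel responds only on a small "discriminative part";
# the CAM highlights that part and misses the rest of the object.
feats = np.zeros((2, 4, 4))
feats[0, 1:3, 1:3] = 1.0           # discriminative region
weights = np.array([[1.0, 0.0]])   # class 0 attends to channel 0 only
cam = class_activation_map(feats, weights, class_idx=0)
```

Regions with no class-discriminative response stay at zero, which is exactly why CAM boxes tend to undershoot the full object extent.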
Disclosure of Invention
The invention provides a weakly supervised object localization method based on category correction. To overcome the inaccurate localization of CAM technology, class activation maps are no longer used for localization; instead, a coarse-to-fine pipeline is adopted. The algorithm of the invention consists of a positioning network and a classification network. First, the positioning network generates a class-agnostic segmentation map using an unsupervised segmentation technique to determine the rough location of the target object. Fine-grained correction is then performed by the classification network using the class labels. The category-correction-based method localizes objects accurately and recovers contour details well.
In the technical scheme provided by the invention, the coarse-to-fine target positioning method comprises a training stage and a testing stage, wherein the training stage comprises the following steps:
step 1, constructing a target positioning model comprising a backbone network, a classification network and a positioning network; the backbone network extracts features from the input image, and the classification network and the positioning network form a dual-head structure that performs classification and mask prediction, respectively, on the features extracted by the backbone network;
step 2, for the input image I, generating a synthetic image $I_s$ whose distribution is similar to that of the training samples, together with its foreground mask $M_s$; the synthetic image $I_s$ is then input into the target positioning model to obtain the mask $\hat{M}_s$ predicted by the positioning network;
Step 3, picture level fine positioning stage: the difference between the foreground and the background in the image hierarchy is increased, so that the positioning network can position more accurately; comprises the following substeps:
step 3.1, using the positioning network with coarse positioning capability trained in step 2, obtain the foreground mask prediction $\hat{M}_r$ of the real picture $I_r$;
Step 3.2, predicting the foreground maskAnd real picture I r Hadamard product is carried out to obtain a foreground attention image I irrelevant to category f At the same time, 0-1 conversion is performed on the foreground mask to +.>Will be true picture I r And->Hadamard product is carried out to obtain a background attention image I irrelevant to category f ;
Step 3.3, respectively comparing the foreground attention images I f And background attention image I b Feeding into a classification network for prediction to obtain predicted probability characteristicsAnd +.>
Step 4, fine positioning stage of feature level: after the foreground and background differences of the image level are amplified, the differences between the foreground and the background of the feature level are increased by using the same method as that in the step 3, so that the positioning network further corrects the details which are positioned incorrectly, and a final positioning result is output;
the test phase is as follows:
disconnecting the positioning network from the classification network, and obtaining the final positioning box by threshold screening of the positioning network's foreground mask:

$$\mathrm{Box} = \mathrm{MinBox}\big(\mathrm{select}(\hat{M}_t > \theta)\big)$$

where $\hat{M}_t$ denotes the mask of the test sample predicted by the positioning network, θ is the screening threshold, the select function selects the part of $\hat{M}_t$ greater than the threshold, and MinBox returns the minimum bounding box containing all foreground coordinates as the finally determined bounding box.
Furthermore, in step 1, the backbone network adopts a U-Net structure, and the positioning network adopts a CNN (convolutional neural network) structure.
Further, in step 2, the BigBiGAN method is used to generate the composite image and mask, and the positioning network's prediction is

$$\hat{M}_s = f(I_s;\, \theta_B, \theta_L)$$

where $\theta_B$ and $\theta_L$ represent the parameters of the backbone network and the positioning network, respectively.
Further, the positioning network is optimized with a binary cross-entropy function, with loss

$$\mathcal{L}_{mask} = -\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\Big[ M_s^{ij}\log \hat{M}_s^{ij} + \big(1-M_s^{ij}\big)\log\big(1-\hat{M}_s^{ij}\big) \Big]$$

where m and n are the width and height of the mask, $M_s^{ij}$ is the element in row i, column j of the foreground mask $M_s$, and $\hat{M}_s^{ij}$ is the corresponding element of the prediction mask $\hat{M}_s$.
Further, the probability features $p_f$ and $p_b$ predicted in step 3.3 are computed as

$$p_f = g(I_f;\, \theta_B, \theta_C), \qquad p_b = g(I_b;\, \theta_B, \theta_C)$$

where $\theta_B$ and $\theta_C$ represent the parameters of the backbone network and the classifier, respectively. The loss functions for the foreground and background attention images are

$$\mathcal{L}_f = -\sum_{k=1}^{K} y_k \log p_f^{k}, \qquad \mathcal{L}_b = \sum_{k=1}^{K} p_b^{k}\log p_b^{k}$$

where $\mathcal{L}_f$ is the cross-entropy of the foreground attention image ($y_k$ being the one-hot class label), $\mathcal{L}_b$ is the negative entropy of the background attention image, and K is the number of categories of the whole dataset. The overall loss function of the picture-level fine positioning stage can be expressed as

$$\mathcal{L}_{img} = \alpha \mathcal{L}_f + \beta \mathcal{L}_b$$

where α and β are balance parameters.
Further, the specific implementation of step 4 is as follows:
step 4.1, for the real picture $I_r$, obtain the feature map $F_r$ and the mask $\hat{M}_r$ using the positioning network trained in step 3; the feature map is computed as

$$F_r = h(I_r;\, \theta_B)$$

where $\theta_B$ is the parameter of the backbone network;
step 4.2, compute the Hadamard products of the feature map $F_r$ with the mask $\hat{M}_r$ and with its 0-1 inversion $1-\hat{M}_r$, respectively, to obtain the foreground feature map $F_f$ and the background feature map $F_b$:

$$F_f = F_r \odot \hat{M}_r, \qquad F_b = F_r \odot (1-\hat{M}_r)$$
step 4.3, fix the classification network weights trained in step 3 and use the network as a judge of mask quality; feed the foreground feature map $F_f$ and the background feature map $F_b$ separately into the classification network for prediction, obtaining the predicted probability features

$$p'_f = c(F_f;\, \theta_C), \qquad p'_b = c(F_b;\, \theta_C)$$

where $\theta_C$ represents the parameters of the classifier; the specific loss functions are

$$\mathcal{L}'_f = -\sum_{k=1}^{K} y_k \log p'^{\,k}_f, \qquad \mathcal{L}'_b = \sum_{k=1}^{K} p'^{\,k}_b \log p'^{\,k}_b$$

where $\mathcal{L}'_f$ is the cross-entropy of the foreground features, $\mathcal{L}'_b$ is the negative entropy of the background features, and K is the number of classes of the training samples; the loss function of the feature-level fine positioning stage can be expressed as

$$\mathcal{L}_{feat} = \alpha \mathcal{L}'_f + \beta \mathcal{L}'_b$$

where α and β are balance parameters.
Further, the value of the threshold value θ is 0.55±0.05.
Further, the values of α and β are 1.
Compared with the prior art, the invention has the following beneficial effects:
The invention avoids the under-sized localization boxes caused by CAM technology. CAM is trained throughout with class information to obtain a class attention map, but it ignores object regions with low class discriminability; it can therefore only perform coarse localization, and its localization performance on fine-grained datasets is very poor. The invention adopts a pipeline that combines class-agnostic and class-related information: the network is trained with a class-agnostic segmentation map, and detail correction is then performed with class information, achieving fine localization. The invention can localize the complete contour of the object, and the feature map can clearly delineate the contour information of the target object. In the fine positioning stage, the class information plays an auxiliary corrective role, so the network does not ignore foreground regions with low class discriminability, overcoming the shortcoming of CAM technology.
Drawings
Fig. 1 is a training flow chart in an embodiment of the present invention.
FIG. 2 is a flow chart of a test in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the drawings and specific examples.
The invention provides a weakly supervised object localization algorithm based on category correction. The algorithm exploits both class-agnostic and class-related information while avoiding the shortcomings of the CAM [1] technique. The invention proposes a dual-head locator-classifier network structure for learning class-agnostic and class-related information. The locator is a segmentation network that predicts the foreground mask of the input image; the classifier then corrects the locator's prediction results at the image level and the feature level, respectively.
[1] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2921–2929.
The invention optimizes the algorithm model through a coarse-to-fine training process, as shown in Fig. 1. In the coarse localization stage, for class-agnostic information, we generate a synthetic image $I_s$ and a foreground mask $M_s$ by an unsupervised method; related unsupervised methods include [2], [3]. The locator predicts the mask of the synthetic image $I_s$ under the supervision of the unsupervised-generated mask $M_s$, so that its segmentation capability is independent of category.
[2] A. Voynov, S. Morozov, and A. Babenko, "Object segmentation without labels with large-scale generative models," in International Conference on Machine Learning. PMLR, 2021, pp. 10596–10606.
[3] M. Chen, T. Artières, and L. Denoyer, "Unsupervised object segmentation by redrawing," Advances in Neural Information Processing Systems, vol. 32, 2019.
In the fine positioning stage, the locator, which now has coarse localization capability, first predicts the foreground mask $\hat{M}_r$ of the real picture $I_r$. To further increase the difference between foreground and background, we obtain foreground and background attention images by the Hadamard product of the foreground mask (and its inversion) with the original image. The classifier is optimized with a different classification task for each attention image: the foreground image is supervised by the class label, while the background image should not be classified into any category, so it is supervised by suppressing the classes to which the foreground image highly belongs. The invention then further increases the foreground-background difference at the feature level: the Hadamard product of the locator's feature map and the predicted foreground mask yields separated foreground and background feature maps. After image-level fine positioning, the classifier has learned a certain ability to classify the foreground; at this point its weights are fixed so that it no longer participates in gradient backpropagation, and it is used to judge the separation quality of the foreground and background feature maps, thereby increasing the foreground-background discriminability at the feature level.
After image-level and feature-level fine positioning training, the locator, through category correction, can better distinguish foreground and background regions from a semantic perspective, while avoiding the CAM shortcoming of missing foreground that lacks class discriminability. This is because the locator is trained on class-agnostic features and therefore already has a good grasp of contour and texture information; after category correction, the algorithm can judge foreground information that is both semantically related and semantically unrelated.
As shown in Fig. 2, in the test stage the invention uses the trained locator to complete the object localization task. The locator first predicts the mask $\hat{M}_t$ of the real picture $I_r$, which is binarized with the threshold θ to obtain a binary foreground mask in which foreground pixels are 1 and background pixels are 0. The foreground pixels may form several disconnected clusters; the largest connected cluster is selected as the predicted foreground and the rest is treated as background. For the screened foreground, the tightest bounding box containing the foreground (i.e., the minimum bounding box containing all foreground coordinates) is taken as the localization box of the target object.
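The test-stage procedure above can be sketched as follows. This is an illustrative reconstruction (function and variable names are our own, and the largest-connected-cluster filtering is omitted for brevity):

```python
import numpy as np

def mask_to_box(mask, theta=0.55):
    """Threshold a predicted foreground mask and return the tightest
    bounding box (x0, y0, x1, y1) containing all foreground pixels."""
    fg = mask > theta              # binarize: foreground = 1, background = 0
    ys, xs = np.nonzero(fg)
    if len(xs) == 0:
        return None                # no foreground found at this threshold
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

m = np.zeros((8, 8))
m[2:5, 3:7] = 0.9                  # toy predicted foreground blob
box = mask_to_box(m)               # tightest box around the blob
```

In a full implementation, a connected-component step (e.g. `scipy.ndimage.label`) would run between thresholding and the bounding-box computation to keep only the largest cluster.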
The flow provided by the embodiment specifically comprises the following steps:
step 1, constructing a target positioning model comprising a backbone network, a classification network and a positioning network; the backbone network extracts features from the input image, and the classification network and the positioning network form a dual-head structure that performs classification and mask prediction, respectively, on the features extracted by the backbone network;
Step 2: generate, through an unsupervised algorithm such as a GAN, a synthetic image $I_s$ whose distribution is similar to that of the training samples, together with its foreground mask $M_s$; in this example the BigBiGAN [2] method is used to generate the composite image and mask. The composite image $I_s$ is then input into the target positioning model, which outputs the foreground mask predicted by the positioning network:

$$\hat{M}_s = f(I_s;\, \theta_B, \theta_L) \tag{1}$$

where $\theta_B$ and $\theta_L$ represent the parameters of the backbone network and the positioning network, respectively; in this example the backbone adopts a U-Net structure and the positioning network a CNN structure, and f denotes the network's mask-prediction process. The closer a pixel value of the mask $\hat{M}_s$ is to 1, the more likely the positioning network judges it to be foreground; conversely, the closer it is to 0, the more likely it is judged background. The algorithm then optimizes the positioning network with a binary cross-entropy loss:

$$\mathcal{L}_{mask} = -\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\Big[ M_s^{ij}\log \hat{M}_s^{ij} + \big(1-M_s^{ij}\big)\log\big(1-\hat{M}_s^{ij}\big) \Big] \tag{2}$$

where m and n are the width and height of the mask, $M_s^{ij}$ is the element in row i, column j of the foreground mask $M_s$, and $\hat{M}_s^{ij}$ is the corresponding element of the prediction mask $\hat{M}_s$. Through step 2 the positioning network acquires a class-agnostic coarse localization capability; class-related correction is then applied to it using the class information.
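A minimal sketch of the binary cross-entropy supervision in step 2, assuming masks given as NumPy arrays of pixel probabilities (the names and the `eps` clipping are illustrative additions):

```python
import numpy as np

def mask_bce_loss(m_true, m_pred, eps=1e-7):
    """Pixel-averaged binary cross-entropy between the unsupervised mask M_s
    and the positioning network's prediction; eps guards against log(0)."""
    m_pred = np.clip(m_pred, eps, 1 - eps)
    return float(-np.mean(m_true * np.log(m_pred)
                          + (1 - m_true) * np.log(1 - m_pred)))

m_true = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
good = mask_bce_loss(m_true, np.array([[0.9, 0.1], [0.1, 0.9]]))  # confident, correct
bad = mask_bce_loss(m_true, np.array([[0.5, 0.5], [0.5, 0.5]]))   # uninformative
```

A prediction that agrees with the unsupervised mask yields a lower loss than an uninformative one, which is what drives the coarse class-agnostic segmentation capability.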
Step 3: picture level fine positioning stage: the disparity between foreground and background is increased from the image hierarchy to correct the image. Step 3 may be divided into the following sub-steps:
Step 3.1: obtain the predicted mask $\hat{M}_r$ of the real picture $I_r$ through the positioning network that now has coarse localization capability.
Step 3.2: evaluating a mask using category informationAnd (2) the mass ofIs corrected during the training process. Sample I r And mask->0-1 conversion of mask +.>Respectively carrying out Hadamard product->And calculating to obtain a foreground attention image and a background attention image, wherein the formula is as follows:
Step 3.3: feed the foreground attention image $I_f$ and the background attention image $I_b$ separately into the classification network for prediction, obtaining the predicted probability features

$$p_f = g(I_f;\, \theta_B, \theta_C), \qquad p_b = g(I_b;\, \theta_B, \theta_C)$$

where $\theta_B$ and $\theta_C$ represent the parameters of the backbone network and the classification network, respectively. $I_f$ is supervised by the class label, and its loss function is the cross-entropy. $I_b$ belongs to no class, so the model's class-probability prediction for $I_b$ should tend toward the average: no class should have too high or too low a probability. In this example this is realized by making the entropy of the prediction probability for $I_b$ as large as possible. The loss functions for the foreground and background attention images are as follows:
$$\mathcal{L}_f = -\sum_{k=1}^{K} y_k \log p_f^{k}, \qquad \mathcal{L}_b = \sum_{k=1}^{K} p_b^{k}\log p_b^{k}$$

where $\mathcal{L}_f$ is the cross-entropy of the foreground attention image ($y_k$ being the one-hot class label), $\mathcal{L}_b$ is the negative entropy of the background attention image, and K is the number of classes of the training samples as a whole. The overall loss function of the picture-level fine positioning stage can be expressed as

$$\mathcal{L}_{img} = \alpha \mathcal{L}_f + \beta \mathcal{L}_b$$
where α and β are balance parameters; in practice, extensive experiments show that setting both to 1 gives good results. In this step, on the one hand, class-related correction is applied to the locator at the image level; on the other hand, the classifier is trained to acquire classification capability, preparing for the class correction at the feature level in the next step.
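The two image-level classification losses can be sketched as follows, assuming softmax probability vectors as input; the helper names and toy distributions are illustrative, not the invention's code:

```python
import numpy as np

def foreground_ce(p_f, label):
    """Cross-entropy for the foreground attention image,
    supervised by the class label (one-hot index `label`)."""
    return float(-np.log(p_f[label]))

def background_neg_entropy(p_b):
    """Negative entropy of the background prediction; minimizing it pushes
    the background class distribution toward uniform, so no class is
    confidently assigned to the background."""
    return float(np.sum(p_b * np.log(p_b)))

# Toy predictions over K = 4 classes.
p_f = np.array([0.7, 0.1, 0.1, 0.1])        # foreground: confident on class 0
p_uniform = np.full(4, 0.25)                # ideal background prediction
p_peaked = np.array([0.85, 0.05, 0.05, 0.05])  # bad background prediction
loss_img = foreground_ce(p_f, 0) + 1.0 * background_neg_entropy(p_uniform)  # alpha = beta = 1
```

The uniform background distribution attains a lower negative entropy than the peaked one, so the optimizer is steered toward "the background belongs to no class".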
Step 4: after the foreground and background differences at the image level are increased, this step needs to further ensure that the foreground and background still have differences at the feature level, which is more beneficial to the foreground positioning by the positioning network. Step 4 may be subdivided into the following sub-steps:
Step 4.1: for the real picture $I_r$, obtain the feature map $F_r$ and the mask $\hat{M}_r$ using the positioning network trained in step 3; the mask is computed as in formula (1), and the feature map is computed as

$$F_r = h(I_r;\, \theta_B)$$

where $\theta_B$ is the parameter of the backbone network.
Step 4.2: map the characteristic mapAnd mask->0-1 conversion of mask +.>Respectively carrying out Hadamard products to obtain a foreground characteristic diagram +.>And background feature map->The formula is as follows:
Step 4.3: fix the classification network weights trained in step 3 and use the network as a judge of mask quality. Feed the foreground feature map $F_f$ and the background feature map $F_b$ separately into the classification network for prediction, obtaining the predicted probability features

$$p'_f = c(F_f;\, \theta_C), \qquad p'_b = c(F_b;\, \theta_C)$$

where $\theta_C$ represents the parameters of the classification network. The loss functions for the foreground and background probability features are the same in form as those in step 3.3: the algorithm optimizes the foreground probability feature with a minimized cross-entropy and the background probability feature with a maximized entropy. The specific loss functions in this example are:
$$\mathcal{L}'_f = -\sum_{k=1}^{K} y_k \log p'^{\,k}_f, \qquad \mathcal{L}'_b = \sum_{k=1}^{K} p'^{\,k}_b \log p'^{\,k}_b$$

where $\mathcal{L}'_f$ is the cross-entropy of the foreground features, $\mathcal{L}'_b$ is the negative entropy of the background features, and K is the number of classes of the training samples as a whole. The overall loss function of the feature-level fine positioning stage can be expressed as

$$\mathcal{L}_{feat} = \alpha \mathcal{L}'_f + \beta \mathcal{L}'_b$$
where α and β are balance parameters; consistent with step 3, extensive experiments show that setting both to 1 gives good results.
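A toy sketch of the feature-level judging in step 4, with an illustrative frozen linear classifier standing in for the trained classification network (all shapes, names, and the mean-pooling are assumptions for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# The classifier weights are frozen: treated as constants and excluded from
# gradient updates; they only score the separated feature maps.
W_frozen = rng.normal(size=(4, 8))        # fixed classifier: K=4 classes, C=8 channels

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

F = rng.normal(size=(8, 6, 6))            # backbone feature map F_r: (C, H, W)
M = np.zeros((6, 6))
M[2:4, 2:4] = 1.0                          # predicted mask at feature resolution

F_f = F * M                                # foreground feature map (Hadamard product)
F_b = F * (1.0 - M)                        # background feature map
p_f = softmax(W_frozen @ F_f.mean(axis=(1, 2)))   # pooled, then classified
p_b = softmax(W_frozen @ F_b.mean(axis=(1, 2)))
```

Because `W_frozen` is fixed, any loss computed on `p_f` and `p_b` can only be reduced by improving the mask and features, which is exactly how the class information is funneled into the locator.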
Although step 4 resembles step 3, it is necessary. In step 3 the classification network also participates in training, so the loss only partly corrects the features of the positioning model; a larger part goes to adapting the classification network. In step 4, by fixing the classification network and applying the correction at the feature level, the classification information can be transferred more fully to the positioning network for fine-grained correction of its localization results. Meanwhile, step 3 is equally indispensable: without it, a classification network with generalization capability cannot be obtained.
The specific implementation also has the following notes:
In the test stage, for the choice of the threshold θ, extensive experiments show that a threshold of 0.55 gives good results on the CUB dataset. Notably, the method is far less sensitive to the threshold than comparable methods: good results are obtained within a range of ±0.15, whereas the threshold tolerance interval of comparable methods is often less than ±0.05.
It should be emphasized that the described embodiments of the invention are illustrative rather than limiting. The invention therefore includes not only the examples described in the detailed description, but also other embodiments that are obvious to a person skilled in the art from the technical solution of the invention, all of which fall within the scope of protection of the invention.
Claims (9)
1. A weak supervision target positioning method based on category correction, characterized in that the training stage comprises the following steps:
step 1, constructing a target positioning model comprising a backbone network, a classification network and a positioning network; the backbone network extracts features from the input image, and the classification network and the positioning network form a dual-head structure that performs classification and mask prediction, respectively, on the features extracted by the backbone network;
step 2, for the input image I, generating a synthetic image $I_s$ whose distribution is similar to that of the training samples, together with its foreground mask $M_s$; the synthetic image $I_s$ is then input into the target positioning model to obtain the mask $\hat{M}_s$ predicted by the positioning network;
Step 3, picture level fine positioning stage: the difference between the foreground and the background in the image hierarchy is increased, so that the positioning network can position more accurately; comprises the following substeps:
step 3.1, using the positioning network with coarse positioning capability trained in step 2, obtain the foreground mask prediction $\hat{M}_r$ of the real picture $I_r$;
Step 3.2, predicting the foreground maskAnd real picture I r Hadamard product is carried out to obtain a foreground attention image I irrelevant to category f At the same time, 0-1 conversion is performed on the foreground mask to +.>Will be true picture I r And->Hadamard product is carried out to obtain a background attention image I irrelevant to category b ;
Step 3.3, respectively comparing the foreground attention images I f And background attention image I b Feeding into a classification network for prediction to obtain predicted probability characteristicsAnd +.>
Step 4, fine positioning stage of feature level: after the foreground and background differences of the image level are amplified, the differences between the foreground and the background of the feature level are increased by using the same method as that in the step 3, so that the positioning network further corrects the details which are positioned incorrectly, and a final positioning result is output;
the test phase is as follows:
disconnecting the positioning network from the classification network, and obtaining the final positioning box by threshold screening of the positioning network's foreground mask:

$$\mathrm{Box} = \mathrm{MinBox}\big(\mathrm{select}(\hat{M}_t > \theta)\big)$$

where $\hat{M}_t$ denotes the mask of the test sample predicted by the positioning network, θ is the screening threshold, the select function selects the part of $\hat{M}_t$ greater than the threshold, and MinBox returns the minimum bounding box containing all foreground coordinates as the finally determined bounding box.
2. The weak supervision target positioning method based on category correction as set forth in claim 1, wherein: in the step 1, the backbone network adopts a U-Net network structure, and the positioning network adopts a CNN convolution network structure.
3. The weak supervision target positioning method based on category correction as set forth in claim 1, wherein: in step 2, a BigBiGAN method is adopted to generate a composite image and a mask.
5. The weak supervision target positioning method based on category correction as defined in claim 4, wherein: the positioning network is optimized with a binary cross-entropy function, with loss

$$\mathcal{L}_{mask} = -\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\Big[ M_s^{ij}\log \hat{M}_s^{ij} + \big(1-M_s^{ij}\big)\log\big(1-\hat{M}_s^{ij}\big) \Big]$$

where m and n are the width and height of the mask, $M_s^{ij}$ is the element in row i, column j of the foreground mask $M_s$, and $\hat{M}_s^{ij}$ is the corresponding element of the prediction mask $\hat{M}_s$.
6. The weak supervision target positioning method based on category correction as set forth in claim 1, wherein: the probability features $p_f$ and $p_b$ predicted in step 3.3 are computed as

$$p_f = g(I_f;\, \theta_B, \theta_C), \qquad p_b = g(I_b;\, \theta_B, \theta_C)$$

where $\theta_B$ and $\theta_C$ represent the parameters of the backbone network and the classifier, respectively; the loss functions for the foreground and background attention images are

$$\mathcal{L}_f = -\sum_{k=1}^{K} y_k \log p_f^{k}, \qquad \mathcal{L}_b = \sum_{k=1}^{K} p_b^{k}\log p_b^{k}$$

where $\mathcal{L}_f$ is the cross-entropy of the foreground attention image ($y_k$ being the one-hot class label), $\mathcal{L}_b$ is the negative entropy of the background attention image, and K is the number of categories of the whole dataset; the overall loss function of the picture-level fine positioning stage can be expressed as

$$\mathcal{L}_{img} = \alpha \mathcal{L}_f + \beta \mathcal{L}_b$$

where α and β are balance parameters.
7. The weak supervision target positioning method based on category correction as set forth in claim 1, wherein: the specific implementation mode of the step 4 is as follows;
step 4.1, for a real picture I_r, obtain a feature map F and a mask M̂ with the positioning network trained in step 3; the feature map is calculated as:

F = f_B(I_r; θ_B)

where θ_B is the parameter of the backbone network;
step 4.2, perform Hadamard products between the feature map F and the 0-1 converted mask M to obtain the foreground feature map F^f and the background feature map F^b:

F^f = F ⊙ M,  F^b = F ⊙ (1 − M)
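The 0-1 conversion and Hadamard split of step 4.2 can be sketched as below (an illustrative numpy version; the binarization threshold of 0.5 is an assumption, as the claim does not state it):

```python
import numpy as np

def split_features(feat: np.ndarray, mask: np.ndarray, theta: float = 0.5):
    """Binarize the mask and split a (C, H, W) feature map into foreground
    and background parts via element-wise (Hadamard) products."""
    m = (mask > theta).astype(feat.dtype)  # 0-1 conversion of the mask
    fg = feat * m                          # F^f = F ⊙ M (broadcast over channels)
    bg = feat * (1.0 - m)                  # F^b = F ⊙ (1 − M)
    return fg, bg
```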
step 4.3, fix the classification network weights trained in step 3 and use the classification network as a judge of mask quality; feed the foreground feature map F^f and the background feature map F^b into the classification network respectively for prediction to obtain the predicted probability characterizations p^f and p^b:

p^f = f_C(F^f; θ_C),  p^b = f_C(F^b; θ_C)

where θ_C represents the parameters of the classifier; the specific loss functions are:

L^f = -Σ_{k=1}^{K} y_k · log p^f_k,  L^b = Σ_{k=1}^{K} p^b_k · log p^b_k

where L^f is the cross entropy of the foreground features, L^b is the negative of the entropy of the background features, and K is the number of classes of the training samples; the loss function of the feature-level fine positioning stage can be expressed as:

L_feat = α · L^f + β · L^b
where α and β are balance parameters.
8. The weak supervision target positioning method based on category correction as set forth in claim 1, wherein: the threshold θ takes a value of 0.55 ± 0.05.
9. The weak supervision target positioning method based on category correction as set forth in claim 6 or 7, wherein: the values of α and β are both 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310336796.7A CN116342857A (en) | 2023-03-28 | 2023-03-28 | Weak supervision target positioning method based on category correction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116342857A true CN116342857A (en) | 2023-06-27 |
Family
ID=86892823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310336796.7A Pending CN116342857A (en) | 2023-03-28 | 2023-03-28 | Weak supervision target positioning method based on category correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116342857A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116912184A (en) * | 2023-06-30 | 2023-10-20 | 哈尔滨工业大学 | Weak supervision depth restoration image tampering positioning method and system based on tampering area separation and area constraint loss |
CN116912184B (en) * | 2023-06-30 | 2024-02-23 | 哈尔滨工业大学 | Weak supervision depth restoration image tampering positioning method and system based on tampering area separation and area constraint loss |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107133569B (en) | Monitoring video multi-granularity labeling method based on generalized multi-label learning | |
CN111444939B (en) | Small-scale equipment component detection method based on weak supervision cooperative learning in open scene of power field | |
Kuznetsova et al. | Expanding object detector's horizon: Incremental learning framework for object detection in videos | |
CN113724231B (en) | Industrial defect detection method based on semantic segmentation and target detection fusion model | |
CN112836639A (en) | Pedestrian multi-target tracking video identification method based on improved YOLOv3 model | |
CN108564598B (en) | Improved online Boosting target tracking method | |
CN114648665A (en) | Weak supervision target detection method and system | |
CN111275010A (en) | Pedestrian re-identification method based on computer vision | |
CN116342857A (en) | Weak supervision target positioning method based on category correction | |
CN114818963B (en) | Small sample detection method based on cross-image feature fusion | |
CN115601307A (en) | Automatic cell detection method | |
CN115861229A (en) | YOLOv5 s-based X-ray detection method for packaging defects of components | |
CN116051479A (en) | Textile defect identification method integrating cross-domain migration and anomaly detection | |
CN112418358A (en) | Vehicle multi-attribute classification method for strengthening deep fusion network | |
CN114078106A (en) | Defect detection method based on improved Faster R-CNN | |
CN112307894A (en) | Pedestrian age identification method based on wrinkle features and posture features in community monitoring scene | |
CN116681961A (en) | Weak supervision target detection method based on semi-supervision method and noise processing | |
CN116310293A (en) | Method for detecting target of generating high-quality candidate frame based on weak supervised learning | |
CN112487927B (en) | Method and system for realizing indoor scene recognition based on object associated attention | |
CN110968735B (en) | Unsupervised pedestrian re-identification method based on spherical similarity hierarchical clustering | |
CN111401286B (en) | Pedestrian retrieval method based on component weight generation network | |
Zhao et al. | Forward vehicle detection based on deep convolution neural network | |
CN114581722A (en) | Two-stage multi-classification industrial image defect detection method based on twin residual error network | |
CN113688735A (en) | Image classification method and device and electronic equipment | |
CN112733883B (en) | Point supervision target detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||