CN116030077B - Video salient region detection method based on multi-dataset collaborative learning - Google Patents

Video salient region detection method based on multi-dataset collaborative learning

Info

Publication number
CN116030077B
CN116030077B
Authority
CN
China
Prior art keywords
data set
dataset
specific
data
domain
Prior art date
Legal status
Active
Application number
CN202310314307.8A
Other languages
Chinese (zh)
Other versions
CN116030077A (en)
Inventor
张云佐
张天
郑宇鑫
武存宇
刘亚猛
于璞泽
康伟丽
朱鹏飞
王双双
Current Assignee
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University
Priority to CN202310314307.8A
Publication of CN116030077A
Application granted
Publication of CN116030077B

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video salient region detection method based on multi-dataset collaborative learning. The method comprises the following steps: acquiring a plurality of video saliency data sets with different distributions; constructing a multi-dataset collaboration network, in which a dataset-specific unit models the statistical characteristics of each data set and a dataset adversarial module drives the network to learn common saliency features, the two components working together to alleviate the distribution differences between the data sets; and providing corresponding multi-dataset training and testing modes for different application scenarios, with a composite batch training mechanism adopted to optimize the collaborative learning process. Unlike the common single-dataset and fine-tuning training modes, the method uses the information of multiple data sets to improve both the detection accuracy of video salient regions and the generalization of the model to out-of-domain data.

Description

Video salient region detection method based on multi-dataset collaborative learning
Technical Field
The invention relates to the technical field of image communication methods, in particular to a video salient region detection method based on multi-dataset collaborative learning.
Background
Video salient region detection is one of the basic tasks in video processing and computer vision, and an important preprocessing step in perceptual video coding. It aims to simulate the human visual attention system by predicting how much attention people pay to each region of a video during free viewing, expressed in the form of a saliency map. In perceptual video coding, the salient regions of the video are captured first and more bit resources are then allocated to them, so that the salient regions stay sharp while the non-salient regions are moderately distorted; this reduces the video bit rate without changing subjective visual perception, improves the compression ratio, and in turn reduces storage space and relieves the bandwidth burden of video communication.
With the development of deep learning, the field of video salient region detection has advanced greatly, but most video saliency detection models are trained in a single-dataset or fine-tuning manner. Because the amount of data in a single data set is limited, detection accuracy on it is approaching saturation, and models trained this way lack sufficient generalization ability, which hinders their application in real life. Training with multiple data sets expands the amount of training data and seems to solve this problem, but there is usually a distribution bias between data sets, and a model trained directly on multiple data sets often performs worse than one trained on a single data set or with fine-tuning. It follows that decoupling the distribution differences between data sets while modeling the common, saliency-relevant features is the key to performing multi-dataset training effectively.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a video salient region detection method based on multi-dataset collaborative learning.
A video salient region detection method based on multi-dataset collaborative learning is characterized by comprising the following steps:
s1: acquiring a plurality of video saliency data sets with labels, wherein samples and label distributions of the plurality of data sets are different;
s2: a multi-dataset collaboration network is constructed, and a saliency map of the input video is acquired by utilizing information of the multi-dataset. The network consists of an encoder of a 3D convolution backbone network, a characteristic fusion module, a data set specific unit and a data set countermeasure moduleAnd a decoder. Wherein the data set specific unit comprises a data set specific batch normalization operation, a data set specific gaussian prior graph and a data set specific gaussian smoothing filter for modeling statistical properties of each data set; the data set countermeasure module is used for judging the data set label of the input sample, and generating classification loss
Figure SMS_1
The commonality of the salient features of the network learning is promoted in the form of the countermeasure learning; the data set specific unit and the data set countermeasure module work cooperatively, so that the statistical characteristics and the remarkable commonalities of a plurality of data sets can be modeled, and the problem of distribution difference among the plurality of data sets is relieved together;
s3: aiming at an intra-domain scene, training and testing are carried out in a general mode; training and testing a target domain without a label in a domain self-adaptive mode; training and testing an unknown target domain in a domain generalization mode; and a composite batch training mechanism is adopted to assist the multi-dataset collaborative network training.
In a further aspect, the dataset-specific unit sets a corresponding branch for each data set and automatically switches to activate that branch according to the dataset label of the input, thereby modeling the exclusive characteristics of each data set. Its specific forms are the dataset-specific batch normalization operation, the dataset-specific Gaussian prior map and the dataset-specific Gaussian smoothing filter. Because the batch-normalization parameter distributions differ across data sets, the dataset-specific batch normalization operation learns a separate batch-normalization mean and variance for each data set during training. Because the Gaussian priors differ between data sets, the dataset-specific Gaussian prior map builds a distinct two-dimensional Gaussian prior map for each data set to model its center-fixation bias. Because saliency-map sharpness differs between data sets, a Gaussian smoothing filter with learnable, dataset-specific parameters is employed to eliminate this bias.
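The patent provides no source code; the following minimal PyTorch sketch (module and argument names are assumptions of this rewrite, not taken from the patent) illustrates how per-dataset batch-normalization branches can be switched by a dataset label:

```python
import torch
import torch.nn as nn

class DatasetSpecificBN3d(nn.Module):
    """One BatchNorm3d branch per data set; the branch is selected by the
    dataset label of the incoming batch, so each data set keeps its own
    normalization statistics while all other network weights stay shared."""

    def __init__(self, num_features: int, num_datasets: int):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.BatchNorm3d(num_features) for _ in range(num_datasets)])

    def forward(self, x: torch.Tensor, dataset_id: int) -> torch.Tensor:
        # Assumes every sample in the batch comes from a single data set,
        # which holds under the composite batch mechanism described below.
        return self.branches[dataset_id](x)

# Usage: two data sets share convolution weights but not BN statistics.
bn = DatasetSpecificBN3d(num_features=64, num_datasets=2)
clip = torch.randn(4, 64, 8, 28, 28)   # (batch, channels, frames, H, W)
out = bn(clip, dataset_id=0)           # activates the first data set's branch
```

The Gaussian prior map and Gaussian smoothing filter branches switch in the same label-driven way.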
In a further aspect, the dataset adversarial module consists of a gradient reversal layer and a dataset classifier. The dataset classifier consists of a convolutional layer and a fully connected layer and predicts the data set to which the input video belongs; its loss function L_cls is the multi-class cross-entropy loss. The gradient reversal layer applies no numerical transformation in forward propagation but automatically reverses the gradient direction in backward propagation.
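As a hedged illustration only (the patent specifies the structure but not the code), a gradient reversal layer and a one-convolution-one-fully-connected dataset classifier can be sketched in PyTorch as follows; the intermediate channel width and the reversal strength `lam` are assumptions:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by
    -lam in the backward pass, so that minimizing the classification loss
    pushes the shared features toward dataset-indistinguishability."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class DatasetClassifier(nn.Module):
    """A convolutional layer plus a fully connected layer predicting which
    data set a clip came from; cross-entropy over the logits gives L_cls."""

    def __init__(self, in_channels: int, num_datasets: int, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.conv = nn.Conv3d(in_channels, 32, kernel_size=1)
        self.fc = nn.Linear(32, num_datasets)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        feat = GradReverse.apply(feat, self.lam)
        feat = self.conv(feat).mean(dim=(2, 3, 4))  # global average pooling
        return self.fc(feat)                        # dataset logits
```

Because of the reversal, the classifier learns to minimize L_cls while the shared encoder is driven to maximize it, which is the adversarial behavior described above.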
A further technical solution is that the generic mode aims to learn a unified model using the information of multiple data sets so as to improve the model's performance on each data set. In the training phase, batches of each data set are propagated forward, and the saliency prediction loss L_sal and the dataset classification loss L_cls are propagated backward. In the detection phase, the dataset adversarial module is not used, and the dataset-specific unit branch corresponding to the label of the input data set is selected.
A further technical solution is that the domain-adaptive mode aims to improve performance on an unlabeled target domain. In the training phase, batches from each source-domain data set and from one unlabeled target domain are propagated forward; for each source-domain data set, both the saliency prediction loss L_sal and the dataset classification loss L_cls are computed and back-propagated, while for the target domain only the dataset classification loss L_cls is computed and back-propagated. In the test phase, the dataset adversarial module is not used: for source-domain data sets, the corresponding dataset-specific unit branch is selected according to the data set's label, while for target-domain data, the source-domain data set with the largest amount of data is taken as its dataset label to determine the corresponding branch.
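The loss routing in this mode can be sketched as below. Everything here is an assumed interface rather than the patent's code: `model(clips, dataset_id)` is taken to return a saliency map and dataset logits, `saliency_loss` is the combined loss defined later, and treating the target domain as one extra classifier class is an implementation assumption:

```python
import torch
import torch.nn.functional as F

def domain_adaptive_step(model, saliency_loss, source_batches, target_clips):
    """source_batches: list of (clips, ground_truth) pairs, one per labeled
    source data set; target_clips: unlabeled target-domain clips."""
    loss = torch.zeros(())
    for ds_id, (clips, gt) in enumerate(source_batches):
        sal, logits = model(clips, dataset_id=ds_id)
        labels = torch.full((clips.size(0),), ds_id, dtype=torch.long)
        # Source domains: saliency supervision plus adversarial loss.
        loss = loss + saliency_loss(sal, gt) + F.cross_entropy(logits, labels)

    # Target domain: no saliency labels, so only the dataset classification
    # loss is computed and back-propagated. Which dataset-specific branch
    # the target uses during training is an implementation choice; here it
    # reuses branch 0 (cf. the test phase, which uses the largest source set).
    _, logits = model(target_clips, dataset_id=0)
    tgt_label = len(source_batches)        # target treated as an extra class
    labels = torch.full((target_clips.size(0),), tgt_label, dtype=torch.long)
    loss = loss + F.cross_entropy(logits, labels)
    return loss
```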
A further technical solution is that the domain-generalization mode aims to learn a generalizable model from multiple source-domain data sets without using any target-domain data. Because the target domain is unavailable, the training phase is identical to the generic mode and the test phase is identical to the domain-adaptive mode.
A further technical solution is that the composite batch training mechanism promotes collaborative optimization of the training process and avoids the batch jitter caused by switching between data sets. The mechanism assembles the batches from each data set into composite batches in proportion to the video counts of the source-domain data sets; the loss of each dataset batch is computed separately during forward propagation, and backward propagation to update the gradients is performed only once the losses of all dataset batches have been computed.
The beneficial effects of the above technical solution are as follows: the solution breaks the constraints of the traditional single-dataset and fine-tuning training modes and proposes a multi-dataset collaborative learning paradigm for video salient region detection. By building a unified model from the information of multiple data sets, it improves the detection accuracy of salient regions, markedly improves the model's generalization to out-of-domain data, and is therefore better suited to real-world applications.
Drawings
The invention will be described in further detail with reference to the drawings and the detailed description.
Fig. 1 is a flow chart of a video salient region detection method based on multi-dataset collaborative learning in a first embodiment of the present invention;
fig. 2 is an overall structure diagram of a video salient region detection method based on multi-dataset collaborative learning in a first embodiment of the present invention;
FIG. 3 is a network detail schematic diagram of a video salient region detection method based on multi-dataset collaborative learning in a first embodiment of the present invention;
fig. 4 (a) is a schematic structural diagram of the spatial attention-guided fusion module in the first embodiment of the present invention, and fig. 4 (b) is a schematic structural diagram of the channel attention-guided fusion module;
FIG. 5 is a schematic diagram of a data set specific unit according to a first embodiment of the present invention;
fig. 6 is a flow chart of a composite batch training method in accordance with the first embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Example 1
The embodiment of the invention provides a video salient region detection method based on multi-dataset collaborative learning, the flow chart of which is shown in fig. 1; the method comprises the following steps:
s1: acquiring a plurality of video saliency data sets with labels, wherein samples and label distributions of the plurality of data sets are different;
s2: a multi-dataset collaboration network is constructed, and a saliency map of the input video is acquired by utilizing information of the multi-dataset. The network consists of an encoder, a feature fusion module, a dataset specific unit, a dataset challenge module and a decoder of a 3D convolutional backbone network, as shown in fig. 2. Wherein the data set specific unit comprises a data set specific batch normalization operation, a data set specific gaussian prior graph and a data set specific gaussian smoothing filter for modeling statistical properties of each data set; the data set countermeasure module is used for judging the data set label of the input sample, and generating classification loss
Figure SMS_8
To combat learningThe form promotes the commonality of the salient features of the network learning; the data set specific unit and the data set countermeasure module work cooperatively, so that the statistical characteristics and the remarkable commonalities of a plurality of data sets can be modeled, and the problem of distribution difference among the plurality of data sets is relieved together;
s3: aiming at an intra-domain scene, training and testing are carried out in a general mode; training and testing a target domain without a label in a domain self-adaptive mode; training and testing an unknown target domain in a domain generalization mode; and a composite batch training mechanism is adopted to assist the multi-dataset collaborative network training.
The present invention provides a preferred embodiment for performing S1. Four commonly used video saliency data sets are employed: DHF1K, Hollywood-2, UCF-Sports and LEDOV. DHF1K is a large video fixation database covering a wide range of content types; its 1000 videos are divided into a training set, a validation set and a test set of 600, 100 and 300 videos, respectively. Hollywood-2 consists of 1707 videos from Hollywood movies, with 823 videos for training and 884 for testing. UCF-Sports is a data set of sports videos, of which 103 are used for training and 47 for testing. LEDOV collects videos from accessible public sources, including advertisements, documentaries and the like, and consists of 44 training samples, 20 validation samples and 20 test samples. The sample and label distributions of the four differ significantly.
The present invention provides a preferred embodiment to perform S2, the network architecture of which is shown in fig. 3. The whole multi-dataset collaboration network is divided into the following parts:
a 3D convolutional backbone network encoder. The encoder employs an S3D network as a backbone network that is pre-trained on a Kinetics dataset, which can continually extract multi-scale spatio-temporal features from an input frame sequence in a sliding window fashion.
A feature fusion module. The body of the module is a bidirectional spatio-temporal feature pyramid fused by an attention-guided fusion mechanism. The bidirectional spatio-temporal feature pyramid adds a bottom-up path to the top-down path of TSFP-Net. Within this framework, the semantic information of deep features propagates along the top-down path and the positional information of shallow features propagates along the bottom-up path, so the multi-scale spatio-temporal features can be fully fused into the contextual information required for accurate prediction. Furthermore, the framework uses an attention-guided fusion mechanism to fuse neighboring features instead of simply concatenating or summing them; the mechanism learns the fusion weights automatically and adapts them to different scenes. Its specific instantiations are the spatial attention-guided fusion module (SAGF) and the channel attention-guided fusion module (CAGF), whose operation is shown in fig. 4. Through this module, the multi-scale spatio-temporal features are further enhanced.
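Fig. 4 is not reproduced here, so the following PyTorch sketch shows only one plausible form of the spatial variant (SAGF), in which a small convolution predicts per-position blending weights for two adjacent pyramid features; the exact design in the patent may differ:

```python
import torch
import torch.nn as nn

class SpatialAttentionGuidedFusion(nn.Module):
    """Blends two aligned feature maps with a learned per-position weight
    instead of simply concatenating or summing them."""

    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv3d(2 * channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid())

    def forward(self, top_down: torch.Tensor, bottom_up: torch.Tensor):
        w = self.attn(torch.cat([top_down, bottom_up], dim=1))  # (B,1,T,H,W)
        return w * top_down + (1.0 - w) * bottom_up
```

A channel-attention variant (CAGF) would instead pool over space and time and predict one weight per channel.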
The dataset-specific unit. This unit is configured to model the statistical characteristics of each data set while letting the remaining, shared part of the network learn a common saliency representation. The dataset-specific unit sets a corresponding branch for each data set and automatically switches to activate that branch according to the input data set; its structure is shown in fig. 5. Its specific forms are: the dataset-specific batch normalization operation, the dataset-specific Gaussian prior map and the dataset-specific Gaussian smoothing filter. The dataset-specific batch normalization operation learns, through training, a separate batch-normalization mean and variance for each data set, addressing the differing batch-normalization parameter distributions across data sets; since the parameters of the S3D backbone come from the pre-trained model, this embodiment instead equips each multi-scale branch with its own decoder-side dataset-specific batch normalization to generate features of the same resolution. Addressing the differences in Gaussian priors between data sets, this embodiment adds a group of dataset-specific Gaussian prior maps and concatenates them with the fused features to model the center-fixation bias; the prior consists of two-dimensional Gaussian maps whose combination parameters are learnable. Addressing the differences in saliency-map sharpness between data sets, this embodiment adds a dataset-specific Gaussian smoothing filter with learnable Gaussian blur parameters before generating the final saliency map.
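As an illustration of the dataset-specific Gaussian prior, the sketch below mixes a fixed bank of two-dimensional Gaussian maps with learnable per-dataset weights; the map size, the number of basis maps and their standard deviations are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class DatasetGaussianPrior(nn.Module):
    """A bank of fixed 2-D Gaussian maps is combined by per-dataset
    learnable weights; the result is concatenated to the fused features
    to model each data set's center-fixation bias."""

    def __init__(self, num_datasets: int, size: int = 28, num_maps: int = 4):
        super().__init__()
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                                torch.linspace(-1, 1, size), indexing="ij")
        sigmas = [0.2, 0.4, 0.6, 0.8][:num_maps]
        bank = torch.stack(
            [torch.exp(-(xs ** 2 + ys ** 2) / (2 * s ** 2)) for s in sigmas])
        self.register_buffer("bank", bank)               # (num_maps, H, W)
        self.mix = nn.Parameter(torch.ones(num_datasets, num_maps) / num_maps)

    def forward(self, dataset_id: int) -> torch.Tensor:
        w = torch.softmax(self.mix[dataset_id], dim=0)
        return (w[:, None, None] * self.bank).sum(dim=0)  # (H, W) prior map
```

The dataset-specific smoothing filter can be treated analogously: a depthwise Gaussian kernel whose blur parameter is a per-dataset learnable value, applied just before the final output.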
The dataset adversarial module. The module consists of a gradient reversal layer and a dataset classifier. The dataset classifier consists of a convolutional layer and a fully connected layer and predicts the data set to which the input video belongs; its loss function L_cls is the multi-class cross-entropy loss. The gradient reversal layer applies no numerical transformation in forward propagation but automatically reverses the gradient direction in backward propagation.
The decoder. The decoder consists of four 3D convolutional layers and two upsampling layers. During decoding, the fused multi-scale features are aggregated along the temporal and channel dimensions and upsampled to the resolution of the original frames, and a sigmoid function then generates the final saliency map.
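The channel widths below are illustrative assumptions; the patent fixes only the structure (four 3D convolutions, two upsampling layers, sigmoid output):

```python
import torch.nn as nn

# Hedged sketch of the decoder described above; channel sizes are assumed.
decoder = nn.Sequential(
    nn.Conv3d(192, 96, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=(1, 2, 2), mode="trilinear", align_corners=False),
    nn.Conv3d(96, 48, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=(1, 2, 2), mode="trilinear", align_corners=False),
    nn.Conv3d(48, 24, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv3d(24, 1, kernel_size=1),
    nn.Sigmoid(),
)
```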
The present invention provides a preferred embodiment for performing S3: for in-domain scenarios, training and testing in the generic mode; for an unlabeled target domain, training and testing in the domain-adaptive mode; for an unknown target domain, training and testing in the domain-generalization mode; and adopting a composite batch training mechanism to assist the training of the multi-dataset collaboration network.
The generic mode aims to learn a unified model using the information of multiple data sets so as to improve the model's performance on each data set. In the training phase, batches of each data set are propagated forward, and the saliency prediction loss L_sal and the dataset classification loss L_cls are propagated backward. In the detection phase, the dataset adversarial module is not used, and the dataset-specific unit branch corresponding to the label of the input data set is selected. The saliency prediction loss is
L_sal = L_KL - λ1·L_CC - λ2·L_NSS, with λ1 = 0.5 and λ2 = 0.1,
where L_KL, L_CC and L_NSS denote the KL divergence loss, the linear correlation coefficient and the regularized scanpath saliency (NSS) loss, respectively.
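A sketch of this combined loss in PyTorch follows; the tensor layout and normalizations are assumptions of this rewrite, with signs chosen so that higher CC and NSS lower the loss, matching the formula above:

```python
import torch

def saliency_loss(pred, gt_map, fixations, l1=0.5, l2=0.1, eps=1e-7):
    """pred, gt_map: (B, H, W) saliency maps; fixations: binary (B, H, W)
    fixation mask used by the NSS term."""
    # KL divergence between the distributions formed by the two maps.
    p = pred / (pred.sum(dim=(1, 2), keepdim=True) + eps)
    g = gt_map / (gt_map.sum(dim=(1, 2), keepdim=True) + eps)
    kl = (g * torch.log(g / (p + eps) + eps)).sum(dim=(1, 2)).mean()

    # Linear correlation coefficient between prediction and ground truth.
    pc = pred - pred.mean(dim=(1, 2), keepdim=True)
    gc = gt_map - gt_map.mean(dim=(1, 2), keepdim=True)
    cc = (pc * gc).sum(dim=(1, 2)) / (
        pc.pow(2).sum(dim=(1, 2)).sqrt() * gc.pow(2).sum(dim=(1, 2)).sqrt() + eps)

    # Normalized scanpath saliency: mean standardized value at fixations.
    z = pc / (pred.std(dim=(1, 2), keepdim=True) + eps)
    nss = (z * fixations).sum(dim=(1, 2)) / (fixations.sum(dim=(1, 2)) + eps)

    return kl - l1 * cc.mean() - l2 * nss.mean()
```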
The domain-adaptive mode aims to improve performance on an unlabeled target domain. In the training phase, batches from each source-domain data set and from one unlabeled target domain are propagated forward; for each source-domain data set, both the saliency prediction loss L_sal and the dataset classification loss L_cls are computed and back-propagated, while for the target domain only the dataset classification loss L_cls is computed and back-propagated. In the test phase, the dataset adversarial module is not used: for source-domain data sets, the corresponding dataset-specific unit branch is selected according to the data set's label, while for target-domain data, the source-domain data set with the largest amount of data is taken as its dataset label to determine the corresponding branch.
The domain-generalization mode aims to learn a generalizable model from multiple source-domain data sets without using any target-domain data. Because the target domain is unavailable, the training phase is identical to the generic mode and the test phase is identical to the domain-adaptive mode.
The composite batch training mechanism promotes collaborative optimization of the training process and avoids the batch jitter caused by switching between data sets. The mechanism assembles the batches from each data set into composite batches in proportion to the video counts of the source-domain data sets; the loss of each dataset batch is computed separately during forward propagation, and once the losses of all dataset batches have been computed, backward propagation is performed to update the gradients, as shown in the flowchart of fig. 6. According to the characteristics of the data sets selected in S1, this embodiment builds the composite batches from DHF1K, Hollywood-2, UCF-Sports and LEDOV data in the ratio 8:3:1:4.
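A minimal sketch of one composite-batch optimization step follows, using the assumed interfaces of the earlier sketches (the patent specifies the mechanism, not the code):

```python
import torch

def composite_batch_step(model, optimizer, batch_loss_fn, composite_batch):
    """composite_batch: one sub-batch per source data set, sized in the
    8:3:1:4 ratio of DHF1K, Hollywood-2, UCF-Sports and LEDOV used in
    this embodiment."""
    optimizer.zero_grad()
    total = torch.zeros(())
    for ds_id, (clips, gt) in enumerate(composite_batch):
        pred, logits = model(clips, dataset_id=ds_id)  # dataset-specific branch
        total = total + batch_loss_fn(pred, logits, gt, ds_id)
    total.backward()   # a single backward pass once every sub-batch is seen
    optimizer.step()
    return float(total)
```

Accumulating the losses and stepping once keeps the shared weights from oscillating between dataset-specific optima, which is exactly the batch jitter the mechanism is designed to avoid.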
To verify the effectiveness of the first embodiment above, the method of the present invention is compared with other state-of-the-art methods on the DHF1K, Hollywood-2 and UCF-Sports data sets and on LEDOV, using five commonly adopted metrics: AUC-Judd (AUC-J), Similarity Metric (SIM), shuffled AUC (s-AUC), linear correlation coefficient (CC) and normalized scanpath saliency (NSS). The larger these five metrics, the more accurate the detected salient regions. The experimental results are shown in tables 1 and 2.
Table 1 Comparison of prediction accuracy on three data sets
[table provided as an image in the original publication]
Table 2 Comparison of prediction accuracy on the LEDOV dataset
[table provided as an image in the original publication]
As can be seen from tables 1 and 2, this embodiment is ahead of existing methods on multiple metrics on every data set. Furthermore, comparing this embodiment against other training modes, as shown in table 3, its accuracy is far higher than that of the earlier training modes.
Table 3 NSS comparison results for multiple training modes
[table provided as an image in the original publication]
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (6)

1. A video salient region detection method based on multi-dataset collaborative learning, characterized by comprising the following steps:
s1: acquiring a plurality of video saliency data sets with labels, wherein samples and label distributions of the plurality of data sets are different;
s2: constructing a multi-data set collaboration network, and acquiring a saliency map of an input video by utilizing information of the multi-data set; the network consists of an encoder, a characteristic fusion module, a data set specific unit, a data set countermeasure module and a decoder of the 3D convolution backbone network; wherein the data set specific unit comprises a data set specific batch normalization operation, a data set specific gaussian prior graph and a data set specific gaussian smoothing filter for modeling statistical properties of each data set; the data set batch normalization operation refers to learning a specific batch normalization mean and variance for each data set for the case that the batch normalization parameter distribution across the data is different; the specific Gaussian prior graph of the data sets is that a specific two-dimensional Gaussian prior graph is built for each data set aiming at the difference of the Gaussian prior graphs among the data sets, and the specific two-dimensional Gaussian prior graph is used for modeling the central fixation deviation of each data set; the data set specific Gaussian smoothing filter is used for eliminating the definition deviation by learning the Gaussian smoothing filter with specific parameters for each data set according to the difference of the definition of the salient graphs among the data sets; the data set countermeasure module is used for judging the data set label of the input sample, and generating classification loss
Figure QLYQS_1
The commonality of the salient features of the network learning is promoted in the form of the countermeasure learning; the data set specific unit and the data set countermeasure module work cooperatively to model the statistical characteristics and the remarkable commonalities of a plurality of data sets, and shareAnd simultaneously, the problem of distribution difference among a plurality of data sets is solved; the specific flow is as follows: firstly capturing multi-scale space-time characteristics through an encoder of a 3D convolution backbone network, fusing the multi-scale characteristics by adopting a characteristic fusion module, then transmitting the multi-scale characteristics into a convolution layer with a specific batch normalization operation of a data set to obtain characteristics with normalized deviation removed, wherein the characteristics are transmitted into a data set countermeasure module for countermeasure learning on one hand, and spliced with a specific Gaussian prior graph of the data set on the other hand, and finally output a significant graph is obtained through the encoder and a Gaussian smoothing filter of the data set;
s3: aiming at an intra-domain scene, training and testing are carried out in a general mode; training and testing a target domain without a label in a domain self-adaptive mode; training and testing an unknown target domain in a domain generalization mode; a compound batch training mechanism is adopted to assist the multi-data set collaborative network training; the composite batch training mechanism is that firstly, according to the video quantity proportion of a plurality of source domain data sets, batches from each data set are combined into composite batches, loss of each data set batch is calculated during forward transmission, and after the loss from all the data set batches is calculated, reverse transmission is carried out to update the gradient; the mechanism can promote collaborative optimization of the training process and avoid batch jitter caused by switching different data sets.
2. The video salient region detection method based on multi-dataset collaborative learning according to claim 1, wherein the dataset-specific unit sets a corresponding branch for each data set and automatically switches to activate that branch according to the dataset label of the input, thereby modeling dataset-specific features; its specific forms are the dataset-specific batch normalization operation, the dataset-specific Gaussian prior map and the dataset-specific Gaussian smoothing filter.
3. The video salient region detection method based on multi-dataset collaborative learning according to claim 1, wherein the dataset adversarial module consists of a gradient reversal layer and a dataset classifier; the dataset classifier consists of a convolutional layer and a fully connected layer and predicts the data set to which the input video belongs, its loss function L_cls being the multi-class cross-entropy loss; the gradient reversal layer applies no numerical transformation in forward propagation but automatically reverses the gradient direction in backward propagation.
4. The video salient region detection method based on multi-dataset collaborative learning according to claim 1, wherein the generic mode aims to learn a unified model using the information of multiple data sets so as to improve the model's performance on each data set; in the training phase, batches of each data set are propagated forward, and the saliency prediction loss L_sal and the dataset classification loss L_cls are propagated backward; in the detection phase, the dataset adversarial module is not used, and the dataset-specific unit branch corresponding to the label of the input data set is selected.
5. The video salient region detection method based on multi-dataset collaborative learning according to claim 1, wherein the domain-adaptive mode aims to improve performance on an unlabeled target domain; in the training phase, batches from each source-domain data set and from one unlabeled target domain are propagated forward; for each source-domain data set, both the saliency prediction loss L_sal and the dataset classification loss L_cls are computed and back-propagated, while for the target domain only the dataset classification loss L_cls is computed and back-propagated; in the test phase, the dataset adversarial module is not used: for source-domain data sets, the corresponding dataset-specific unit branch is selected according to the data set's label, while for target-domain data, the source-domain data set with the largest amount of data is taken as its dataset label to determine the corresponding branch.
6. The video salient region detection method based on multi-dataset collaborative learning according to claim 1, wherein the domain-generalization mode aims to learn a generalizable model from multiple source-domain data sets without using target-domain data; because the target domain is unavailable, the training phase is identical to the generic mode and the test phase is identical to the domain-adaptive mode.
CN202310314307.8A 2023-03-28 2023-03-28 Video salient region detection method based on multi-dataset collaborative learning Active CN116030077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310314307.8A CN116030077B (en) 2023-03-28 2023-03-28 Video salient region detection method based on multi-dataset collaborative learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310314307.8A CN116030077B (en) 2023-03-28 2023-03-28 Video salient region detection method based on multi-dataset collaborative learning

Publications (2)

Publication Number Publication Date
CN116030077A CN116030077A (en) 2023-04-28
CN116030077B (en) 2023-06-06

Family

ID=86077919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310314307.8A Active CN116030077B (en) 2023-03-28 2023-03-28 Video salient region detection method based on multi-dataset collaborative learning

Country Status (1)

Country Link
CN (1) CN116030077B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612122B (en) * 2023-07-20 2023-10-10 湖南快乐阳光互动娱乐传媒有限公司 Image significance region detection method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884730A (en) * 2021-02-05 2021-06-01 南开大学 Collaborative significance object detection method and system based on collaborative learning
CN113705811A (en) * 2021-10-29 2021-11-26 腾讯科技(深圳)有限公司 Model training method, device, computer program product and equipment
CN113902783A (en) * 2021-11-19 2022-01-07 东北大学 Three-modal image fused saliency target detection system and method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020167B2 (en) * 2018-05-17 2024-06-25 Magic Leap, Inc. Gradient adversarial training of neural networks
US20220198339A1 (en) * 2020-12-23 2022-06-23 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for training machine learning model based on cross-domain data
CN114676785A (en) * 2022-04-08 2022-06-28 云从科技集团股份有限公司 Method, system, equipment and medium for generating target detection model
CN115035346A (en) * 2022-06-23 2022-09-09 温州大学 Classification method for Alzheimer disease based on cooperative learning method enhancement


Also Published As

Publication number Publication date
CN116030077A (en) 2023-04-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant