CN113610885B - Semi-supervised target video segmentation method and system using difference contrast learning network - Google Patents

Semi-supervised target video segmentation method and system using difference contrast learning network

Info

Publication number
CN113610885B
CN113610885B · CN202110785106.7A · CN202110785106A
Authority
CN
China
Prior art keywords
target
feature
convolution
global
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110785106.7A
Other languages
Chinese (zh)
Other versions
CN113610885A (en)
Inventor
杨大伟
董美辰
毛琳
张汝波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University
Priority to CN202110785106.7A
Publication of CN113610885A
Application granted
Publication of CN113610885B
Legal status: Active (current)
Anticipated expiration legal-status

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/10 Segmentation; Edge detection › G06T7/194 involving foreground-background segmentation
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/10 Segmentation; Edge detection › G06T7/11 Region-based segmentation
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/10 Segmentation; Edge detection › G06T7/12 Edge-based segmentation
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/10 Segmentation; Edge detection › G06T7/136 involving thresholding
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/10 Image acquisition modality › G06T2207/10016 Video; Image sequence
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/20 Special algorithmic details › G06T2207/20081 Training; Learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS › Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE › Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION › Y02T10/00 Road transport of goods or passengers › Y02T10/10 Internal combustion engine [ICE] based vehicles › Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a semi-supervised target video segmentation method and system using a difference contrast learning network, and relates to the technical field of video segmentation. Global and local feature information of the target is extracted according to the initial frame mask; following the idea of contrast learning, the similarity between the target's global and local features is increased and the separation between target and background features is enlarged, yielding a more robust target feature representation. The obtained global features are then compared with the current frame at the pixel level and combined with the reference-frame segmentation result to ensure that the target and background regions are divided accurately in the video segmentation result.

Description

Semi-supervised target video segmentation method and system using difference contrast learning network
Technical Field
The application relates to the technical field of video segmentation, in particular to a semi-supervised target video segmentation method and system using a difference contrast learning network.
Background
The semi-supervised target video segmentation task finely separates the target objects in an entire video sequence from the background based on a given initial-frame mask, thereby achieving accurate target localization; it has wide application value and practical demand in fields such as video understanding, human-computer interaction, and autonomous driving. However, because the target and background change continuously within a video, and because of factors such as illumination changes and interference from similar backgrounds, single-target video segmentation still faces many challenges.
Existing semi-supervised video segmentation methods can be divided into three categories: motion-propagation-based, detection-based, and template-matching-based. Motion-propagation-based methods mainly exploit the temporal correlation of target motion and rely on the spatio-temporal relationships between pixels; when the target's position and shape change relatively smoothly, accurate segmentation can be achieved, but when temporally discontinuous factors such as occlusion or fast motion occur, drift arises. Detection-based methods do not depend on temporal information: they learn an appearance model from the target information in the initial-frame segmentation result and then detect and segment the target in each video frame; at test time, the initial-frame image and its segmentation map are augmented so that the trained model can be fine-tuned, yielding more accurate instance feature information, but this online training brings a large computational cost. Template-matching-based methods match the current video frame against the initial-frame features at the pixel level and segment pixels according to the comparison result; the segmentation result is not affected by accumulated propagation errors, but spatio-temporal information is not fully exploited and the requirements on initial-frame feature extraction are high.
Disclosure of Invention
To address the problems in the prior art, the application provides a semi-supervised target video segmentation method and system using a difference contrast learning network, which obtains robust and distinguishable target features in feature space and improves the performance of the video segmentation algorithm by adopting the idea of contrast learning combined with spatio-temporal information.
To achieve the above purpose, the technical solution of the application is as follows. A semi-supervised target video segmentation method using a difference contrast learning network includes the steps below (an illustrative end-to-end sketch of the whole procedure, under assumed interfaces, is given after step 8):
Step 1: input the initial video frame of size h×w into the backbone network to obtain general visual features with c feature channels, and apply edge-enhancement convolution to obtain visual features with clearer detail texture, which serve as the basis for the subsequent comparison network; multiply the visual features with the segmentation result, respectively, and adjust the size to obtain the target features and the background features.
Step 2: extract the global mapping feature of the target features.
Step 3: perform a pixel-level similarity comparison between the global mapping feature and the target features to obtain a similarity response map with c channels and size m×n.
Step 4: perform a pixel-level similarity comparison between the global mapping feature and the background features to obtain a difference-degree response map with c channels and size m×n.
Step 5: compare the global mapping feature with the visual features of the frame pixel by pixel, combine the reference-frame segmentation result, and, through convolution, distinguish the target from the background according to the pixel-level similarity between the global mapping feature and the target and background features, obtaining the target region and the background region.
Step 6: a better global feature mapping and the global mapping feature of the initial frame are obtained by learning through steps 3 to 5; sharing the convolution-layer parameters, repeat step 1 with a subsequent video frame of size h×w as input to obtain its visual features.
Step 7: taking the global mapping feature of the initial frame and the visual features of the subsequent frame as the basis, combine the reference-frame segmentation result and repeat step 5 to output the segmentation result of the subsequent frame.
Step 8: repeat steps 6 and 7 until the target segmentation task for the whole video is completed.
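The following end-to-end sketch illustrates how steps 1 to 8 fit together. It is a hedged sketch, not the implementation of this application: `backbone`, `edge_conv` and `branches` are placeholder names for the modules described above, and their interfaces are assumptions.

```python
import torch.nn.functional as F

def segment_video(frames, init_mask, backbone, edge_conv, branches):
    """Hedged sketch of steps 1-8. `frames` is a list of (3, h, w) tensors and
    `init_mask` the given initial-frame mask of shape (1, h, w); `backbone`,
    `edge_conv` and `branches` are stand-ins, not the modules of this application."""
    # Step 1: general visual features of the initial frame, then edge enhancement.
    feat0 = edge_conv(backbone(frames[0].unsqueeze(0)))                 # (1, c, m, n)
    mask0 = F.interpolate(init_mask.unsqueeze(0), size=feat0.shape[-2:])
    target_feat, background_feat = feat0 * mask0, feat0 * (1 - mask0)

    # Step 2: global mapping feature of the target (pooling + fully connected layer).
    global_feat = branches.global_mapping(target_feat)                  # (1, c)

    # Steps 3-5 (training) drive the similarity / difference contrast branches and the
    # reference branch from target_feat, background_feat and global_feat (sketched below).

    # Steps 6-8: segment each subsequent frame with shared convolution parameters,
    # taking the previous frame's result as the reference-frame segmentation.
    results, reference = [init_mask.unsqueeze(0)], init_mask.unsqueeze(0)
    for frame in frames[1:]:
        feat = edge_conv(backbone(frame.unsqueeze(0)))                  # shared parameters
        pred = branches.reference_branch(global_feat, feat, reference)  # (1, 1, m, n)
        results.append(pred)
        reference = pred
    return results
```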
Further, the visual features and the segmentation result are multiplied and the size is adjusted to obtain the target features and the background features.
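A minimal sketch of this multiplication follows; the displayed formula of the original is not reproduced here, so obtaining the background features with the complement of the mask is an assumption consistent with the definition of the background feature given later.

```python
import torch.nn.functional as F

def split_target_background(visual_feat, seg_mask):
    """visual_feat: (1, c, m, n) edge-enhanced visual features;
    seg_mask: (1, 1, H, W) initial-frame mask with 1 = target, 0 = background."""
    # Resize the mask to the feature resolution, then separate the two regions.
    mask = F.interpolate(seg_mask, size=visual_feat.shape[-2:], mode="nearest")
    target_feat = visual_feat * mask             # keep only the target region
    background_feat = visual_feat * (1 - mask)   # assumed complement-mask form
    return target_feat, background_feat
```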
Further, extracting the global mapping feature from the target features comprises two parts, global average pooling and a fully connected layer, which are respectively:
(1) First, global average pooling with convolution kernels J_{3×3,c} is performed on the target features to output a c-dimensional feature vector.
Here H_{average}(x, J_{k×k,c}, s, p) denotes the average pooling function applied as a convolution operation; convolution kernels with step size s = 1 and kernel size k = 3 pool the pixel features of the c feature channels in turn until the c-dimensional feature vector is output. This reduces the number of parameters and the amount of computation while preserving the feature integrity of the image content.
(2) The c-dimensional feature vector produced by global average pooling is input to the fully connected layer to obtain the global mapping feature.
In this mapping, μ is a mapping coefficient and η is a correction amount. The fully connected layer improves the purity of the global feature and reduces the influence of position on the feature expression.
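A sketch of the two-part extraction is given below; since the original formulas are not reproduced here, modelling the mapping coefficient μ and the correction amount η as the weight and bias of a linear layer is an assumption.

```python
import torch
import torch.nn as nn

class GlobalMapping(nn.Module):
    """3x3 average pooling (stride 1) over the target features, global averaging to a
    c-dimensional vector, then a fully connected layer; the mapping coefficient and
    correction amount are modelled by the linear layer's weight and bias."""
    def __init__(self, c):
        super().__init__()
        self.local_pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.fc = nn.Linear(c, c)                  # assumed to keep dimension c

    def forward(self, target_feat):                # (1, c, m, n)
        pooled = self.local_pool(target_feat)
        vector = pooled.mean(dim=(2, 3))           # c-dimensional feature vector
        return self.fc(vector)                     # global mapping feature, shape (1, c)
```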
Further, a similarity response map with c channels and size m×n is obtained, where i = 1, 2, ..., m, j = 1, 2, ..., n and l = 1, 2, ..., c index the map, and H_{standard} is a normalization function that maps the similarity score of each pixel into the interval [0, 1]. Each pixel takes its highest r scores, giving a three-channel scoring result map of size m×n, and an average pooling operation on this scoring map yields the final similar-contrast response map.
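A sketch of the per-pixel comparison used by the contrast branches follows. Using a sigmoid as a stand-in for the normalization H_{standard}, and scoring each channel by an elementwise product with the global mapping feature, are both assumptions.

```python
import torch

def contrast_response(pixel_feat, global_feat, r=3):
    """pixel_feat: (1, c, m, n); global_feat: (1, c). Each channel of every pixel is
    scored against the corresponding channel of the global mapping feature, scores are
    normalised to [0, 1], and the highest r scores per pixel are averaged."""
    scores = pixel_feat * global_feat[:, :, None, None]   # (1, c, m, n) channel-wise scores
    scores = torch.sigmoid(scores)                        # assumed stand-in for H_standard
    top_r, _ = scores.topk(r, dim=1)                      # keep the r best channel scores per pixel
    return top_r.mean(dim=1)                              # (1, m, n) response map
```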
Further, a difference-degree response map with c channels and size m×n is obtained, where i = 1, 2, ..., m, j = 1, 2, ..., n and l = 1, 2, ..., c index the map.
The highest r scores of each pixel are taken to obtain a three-channel scoring result map of size m×n, and an average pooling operation on this scoring map yields the final difference-contrast response map.
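Since the difference branch mirrors the similarity branch, the earlier `contrast_response` sketch can be reused with the background features; the dummy shapes below are illustrative only.

```python
import torch

# Reusing contrast_response from the sketch above with dummy shapes (c = 256, m = n = 4).
target_feat, background_feat = torch.rand(1, 256, 4, 4), torch.rand(1, 256, 4, 4)
global_feat = torch.rand(1, 256)

# Similarity branch: global mapping feature scored against the target features.
similarity_map = contrast_response(target_feat, global_feat, r=3)
# Difference branch: the same comparison against the background features; training then
# raises the similarity response and suppresses the difference response (contrast learning).
difference_map = contrast_response(background_feat, global_feat, r=3)
```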
further, the target areaAnd background area->The calculation formula of (2) is as follows:
and sigma is a threshold value, and is obtained through training and used for judging the target and background areas in the video frame. Setting the convolution kernel size as 1×1, and the step length s=1, performing convolution operation on the primary segmentation results of the target and the background, performing fine processing, and outputting a segmentation mapThe formula is:
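A sketch of the region decision and the 1×1-convolution refinement follows; representing the trained threshold σ as a learnable parameter and the preliminary result as a two-channel map are assumptions.

```python
import torch
import torch.nn as nn

class RegionRefiner(nn.Module):
    """Pixels whose response exceeds the threshold sigma form the preliminary target
    region, the rest the background region; a 1x1 convolution with stride 1 then
    refines the two-channel preliminary result into the output segmentation map."""
    def __init__(self):
        super().__init__()
        self.sigma = nn.Parameter(torch.tensor(0.5))         # threshold learned in training
        self.refine = nn.Conv2d(2, 1, kernel_size=1, stride=1)

    def forward(self, response):                              # (1, m, n) pixel responses
        # Hard comparison for clarity; a soft comparison would be needed for the
        # threshold to receive gradients during training.
        target_region = (response > self.sigma).float()
        background_region = 1.0 - target_region
        preliminary = torch.stack([target_region, background_region], dim=1)  # (1, 2, m, n)
        return torch.sigmoid(self.refine(preliminary))        # refined segmentation map
```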
the application also provides a semi-supervised target video segmentation system using the difference contrast learning network, comprising:
a difference contrast learning network, which obtains the general visual features of the initial video frame through backbone-network processing and then obtains, through edge-enhancement convolution processing, visual features with clearer detail texture; the visual features are multiplied with the initial-frame segmentation map to obtain the target features and the background features; the target features are processed by global average pooling to obtain a feature vector, from which the global mapping feature is then obtained;
a similarity comparison branch unit, which improves the ability of the global mapping feature to describe the target by increasing the similarity between the global mapping feature and the target's local features: the feature vector of each pixel of the target features is compared with the global mapping feature through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, i.e. each pixel comprises c channels and each channel has a corresponding similarity score; the top k scores are retained and average pooling is applied to obtain the final similarity response map. Because the receptive field of local features is limited, the similarity comparison branch improves the ability of the global mapping feature to capture information from different local regions through the comparison and learning between the local features and the global mapping feature;
a difference comparison branch unit, which improves the model's ability to divide the target from the background by enlarging the degree of distinction between the background features and the global mapping feature: the feature vector of each pixel of the background features is compared with the global mapping feature through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, i.e. each pixel comprises c channels and each channel has a corresponding similarity score; as in the similarity branch, the top k scores are retained and average pooling is applied to obtain the final difference response map;
a reference learning branch unit, which compares the global mapping feature and the visual features pixel by pixel through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, combines the reference-frame segmentation result, obtains a more accurate response map through a convolution with kernel size 3×3, and finally outputs the segmentation result of the target and the background.
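A sketch of the reference learning branch follows. The exact 1×1 comparison operator is not specified beyond a 1×1 convolution, so concatenating the broadcast global mapping feature with the frame features before that convolution is an assumption.

```python
import torch
import torch.nn as nn

class ReferenceBranch(nn.Module):
    """Pixel-wise comparison of the current frame's visual features with the global
    mapping feature via a 1x1 convolution, concatenation of the reference-frame mask,
    and a 3x3 convolution that outputs the target/background segmentation."""
    def __init__(self, c):
        super().__init__()
        self.compare = nn.Conv2d(2 * c, c, kernel_size=1)
        self.refine = nn.Conv2d(c + 1, 2, kernel_size=3, padding=1)

    def forward(self, global_feat, visual_feat, reference_mask):
        b, c, m, n = visual_feat.shape
        tiled = global_feat[:, :, None, None].expand(b, c, m, n)     # broadcast global feature
        scores = self.compare(torch.cat([visual_feat, tiled], dim=1))
        ref = nn.functional.interpolate(reference_mask, size=(m, n), mode="nearest")
        logits = self.refine(torch.cat([scores, ref], dim=1))        # target / background maps
        return logits.softmax(dim=1)[:, :1]                          # target probability map
```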
By adopting the technical scheme, the application can obtain the following technical effects:
(1) Completing video target segmentation and target tracking multi-domain tasks
The difference contrast learning network can complete the target tracking task while performing video target segmentation and at the same time improves tracking accuracy, narrowing the gap between the segmentation and tracking tasks and enlarging the application range of the network.
(2) Target segmentation task applicable to automatic driving
By combining the reference-frame segmentation result, the application effectively improves segmentation precision when the target moves rapidly or deforms; it is suitable for the autonomous-driving field, where accurate segmentation results enable accurate obstacle avoidance.
(3) Real-time tracking task suitable for automatic driving
The method and system can be applied to the tracking module in autonomous driving: bounding the target segmentation result yields a real-time tracking box for targets such as pedestrians, supporting the subsequent path planning of autonomous driving.
(4) Security monitoring system
The difference contrast learning network increases the distinction between target and background, and pixel-level comparison realizes accurate segmentation, so the target can be accurately located and segmented from the background in complex scenes; the method can therefore be applied to security monitoring systems.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for a person skilled in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic overall framework of the present method;
FIG. 2 is a schematic diagram of the real-time tracking task of the autopilot target in example 1;
FIG. 3 is a schematic illustration of an autopilot obstacle avoidance task of example 2;
fig. 4 is a schematic diagram of a security monitoring designated target task in example 3.
Detailed Description
The embodiments of the application are implemented on the premise of the technical solution of the application, and detailed implementations and specific operation processes are given, but the protection scope of the application is not limited to the following embodiments.
The embodiments provide a semi-supervised target video segmentation method and system using a difference contrast learning network: global and local feature information of the target is extracted according to the initial frame mask; following the idea of contrast learning, the similarity between the target's global and local features is increased and the separation between target and background features is enlarged, yielding a more robust target feature representation. Pixel-level comparison is then performed with the obtained global features, combined with the reference-frame segmentation result, to ensure that the target and background regions are divided accurately in the video segmentation result.
In the application, the initial frame is the first frame of the task video, for which the segmentation result of the target and the background is given. The segmentation result is the result of accurately separating the target and background regions along the target contour. The reference-frame segmentation result refers to the segmentation result of the frame preceding the current test frame. The test frame is any subsequent video frame of the task video, other than the initial frame, that needs to be segmented, i.e. the video frame currently being segmented. The general visual features are basic visual features, including color, shape and spatial relationships, extracted through the backbone network. The clear visual features are the result of enhancing the detail texture and edge expression of the image through the edge-enhancement convolution network. The target feature is the area of the overall feature map that contains the target. The background feature is the overall feature map with the region containing the target removed. The global mapping feature refers to a global feature expression that can represent the target. A feature vector is a mathematical form of feature expression. A feature channel is where the convolution layer exchanges information and is also an expression of the features' mapping regions. The similarity response map reflects the similarity relationship between the compared input features, while the difference response map reflects the difference relationship between them. The target region is the part of the image whose comparison with the target's global mapping feature scores above the set threshold and is therefore judged to be the target. The background region is the part whose comparison with the target's global mapping feature scores below the set threshold and is therefore judged to be the background.
The input video frame can be a 1280×720 RGB three-channel image; after backbone-network processing, the output general visual features can be of size 640×360. The number of output channels c of each backbone layer is {32, 64, 128, 256, 512}, so general feature maps of different sizes, (1, 640, 360, 32), (1, 640, 360, 64), (1, 640, 360, 128), (1, 640, 360, 256) and (1, 640, 360, 512), can be output as required. After processing by the CNN convolution layer, a feature map of size (1, 640, 360, 256) can be output. In the similarity comparison branch and the difference comparison branch, the highest r scores of each pixel are obtained through convolution-based similarity comparison and then average pooled, where r is taken from {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
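The quoted sizes can be checked with a small stand-in stage; the layer below is hypothetical and is not the backbone of this application.

```python
import torch
import torch.nn as nn

# Stand-in stage: a 1280x720 RGB frame reduced to 640x360 feature maps with 32 channels,
# matching the first entry of the channel list {32, 64, 128, 256, 512} above.
frame = torch.rand(1, 3, 720, 1280)                          # RGB three-channel input
stage = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
features = stage(frame)
assert features.shape == (1, 32, 360, 640)

# Candidate values of r kept per pixel in the similarity / difference branches.
r_candidates = list(range(1, 11))                            # {1, 2, ..., 10}
```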
When extracting the global mapping features of the target in the initial frame and in subsequent frames, the convolution parameters are shared. In the similarity comparison branch and the difference comparison branch, each pixel comprises c channels, each channel has a corresponding similarity score, the highest r scores of each pixel are taken, and the values of c and r are the same in the two branches.
Example 1:
real-time tracking task for automatic driving target
This embodiment is directed at the autonomous-driving target tracking task. The application is applied to a vehicle-mounted camera to locate and track the vehicle's surroundings in real time, preparing for the system's path planning and ensuring driving safety. The autonomous-driving real-time localization task is shown in FIG. 2.
Example 2:
automatic driving obstacle avoidance task
This embodiment targets the autonomous-driving process: applied to a vehicle-mounted camera, it improves the accuracy of locating and segmenting road obstacles in the captured image and realizes accurate obstacle avoidance. The autonomous-driving obstacle avoidance task is shown in FIG. 3.
Example 3:
target task appointed by security monitoring system
This embodiment is applied to a security monitoring system: it locates and segments designated targets in complex scenes and improves the efficiency of monitoring and inspection; the designated-target task is shown in FIG. 4.
The foregoing descriptions of specific exemplary embodiments of the present application are presented for purposes of illustration and description. It is not intended to limit the application to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the application and its practical application to thereby enable one skilled in the art to make and utilize the application in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the application be defined by the claims and their equivalents.

Claims (2)

1. A semi-supervised target video segmentation method using a difference contrast learning network, characterized by comprising the following steps:
step 1: inputting an initial video frame of size h×w into a backbone network to obtain general visual features with c feature channels, and performing edge-enhancement convolution to obtain visual features with clearer detail texture; multiplying said visual features with the segmentation result and adjusting the size to obtain target features and background features;
step 2: extracting the global mapping feature of the target features;
step 3: performing a pixel-level similarity comparison between the global mapping feature and the target features to obtain a similarity response map with c channels and size m×n;
step 4: performing a pixel-level similarity comparison between the global mapping feature and the background features to obtain a difference-degree response map with c channels and size m×n;
step 5: comparing the global mapping feature with the visual features pixel by pixel, combining the reference-frame segmentation result, and, through convolution, distinguishing the target from the background according to the pixel-level similarity between the global mapping feature and the target and background features, to obtain a target region and a background region;
step 6: sharing the convolution-layer parameters, repeating step 1 with a subsequent video frame of size h×w as input to obtain its visual features;
step 7: taking the global mapping feature of the initial frame and the visual features of the subsequent frame as the basis, combining the reference-frame segmentation result and repeating step 5 to output the segmentation result of the subsequent frame;
step 8: repeating steps 6 and 7 until the target segmentation task for the whole video is completed;
wherein the visual features and the segmentation result are multiplied and the size is adjusted to obtain the target features and the background features;
extracting the global mapping feature from the target features comprises two parts, global average pooling and a fully connected layer, which are respectively:
(1) first, global average pooling with convolution kernels J_{3×3,c} is performed on the target features to output a c-dimensional feature vector,
wherein H_{average}(x, J_{k×k,c}, s, p) is the average pooling function applied as a convolution operation, and the pixel features of the c feature channels are sequentially pooled using a convolution kernel with step size s = 1 and kernel size k = 3 until the c-dimensional feature vector is output;
(2) the c-dimensional feature vector produced by global average pooling is input to the fully connected layer to obtain the global mapping feature,
wherein μ is a mapping coefficient and η is a correction amount;
the similarity response map with c channels and size m×n is obtained for i = 1, 2, ..., m, j = 1, 2, ..., n and l = 1, 2, ..., c,
wherein H_{standard} is a normalization function that maps the similarity score of each pixel into the interval [0, 1]; the highest r scores of each pixel are taken to obtain a three-channel scoring result map of size m×n, and an average pooling operation on this scoring map yields the final similar-contrast response map;
the difference-degree response map with c channels and size m×n is obtained in the same manner:
the highest r scores of each pixel are taken to obtain a three-channel scoring result map of size m×n, and an average pooling operation on this scoring map yields the final difference-contrast response map;
the target region and the background region are determined as follows:
wherein σ is a threshold obtained through training and used to decide the target and background regions in the video frame; with the convolution kernel size set to 1×1 and the step size s = 1, a convolution operation is applied to the preliminary segmentation results of the target and the background for refinement, and the segmentation map is output.
2. A semi-supervised target video segmentation system using a difference contrast learning network to implement the semi-supervised target video segmentation method of claim 1, characterized by comprising:
a difference contrast learning network for obtaining the general visual features of the initial video frame through backbone-network processing and then, through edge-enhancement convolution processing, visual features with clearer detail texture, wherein the visual features are multiplied with the initial-frame segmentation map to obtain the target features and the background features, the target features are processed by global average pooling to obtain a feature vector, and the global mapping feature is then obtained from the feature vector;
a similarity comparison branch unit for comparing the feature vector of each pixel of the target features with the global mapping feature through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, i.e. each pixel comprises c channels and each channel has a corresponding similarity score, wherein the top k scores are retained and average pooling is applied to obtain the final similarity response map;
a difference comparison branch unit for comparing the feature vector of each pixel of the background features with the global mapping feature through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, i.e. each pixel comprises c channels and each channel has a corresponding similarity score, wherein, as in the similarity branch, the top k scores are retained and average pooling is applied to obtain the final difference response map; and
a reference learning branch unit for comparing the global mapping feature and the visual features pixel by pixel through a convolution with kernel size 1×1 to obtain a similarity scoring map with c channels and size m×n, combining the reference-frame segmentation result, obtaining a more accurate response map through a convolution with kernel size 3×3, and finally outputting the segmentation result of the target and the background.
CN202110785106.7A 2021-07-12 2021-07-12 Semi-supervised target video segmentation method and system using difference contrast learning network Active CN113610885B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110785106.7A CN113610885B (en) 2021-07-12 2021-07-12 Semi-supervised target video segmentation method and system using difference contrast learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110785106.7A CN113610885B (en) 2021-07-12 2021-07-12 Semi-supervised target video segmentation method and system using difference contrast learning network

Publications (2)

Publication Number Publication Date
CN113610885A CN113610885A (en) 2021-11-05
CN113610885B 2023-08-22

Family

ID=78337461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110785106.7A Active CN113610885B (en) 2021-07-12 2021-07-12 Semi-supervised target video segmentation method and system using difference contrast learning network

Country Status (1)

Country Link
CN (1) CN113610885B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800689A (en) * 2019-01-04 2019-05-24 西南交通大学 A kind of method for tracking target based on space-time characteristic fusion study
CN110163875A (en) * 2019-05-23 2019-08-23 南京信息工程大学 One kind paying attention to pyramidal semi-supervised video object dividing method based on modulating network and feature
CN111968123A (en) * 2020-08-28 2020-11-20 北京交通大学 Semi-supervised video target segmentation method
CN112861830A (en) * 2021-04-13 2021-05-28 北京百度网讯科技有限公司 Feature extraction method, device, apparatus, storage medium, and program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671855B2 (en) * 2018-04-10 2020-06-02 Adobe Inc. Video object segmentation by reference-guided mask propagation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
融合时空多特征表示的无监督视频分割算法 [Unsupervised video segmentation algorithm fusing spatio-temporal multi-feature representation]; Li Xuejun; Zhang Kaihua; Song Huihui; Computer Applications (No. 11); full text *

Also Published As

Publication number Publication date
CN113610885A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN109784333B (en) Three-dimensional target detection method and system based on point cloud weighted channel characteristics
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN109615611B (en) Inspection image-based insulator self-explosion defect detection method
CN107452015B (en) Target tracking system with re-detection mechanism
CN106780560A (en) A kind of feature based merges the bionic machine fish visual tracking method of particle filter
CN109657538B (en) Scene segmentation method and system based on context information guidance
CN113763427A (en) Multi-target tracking method based on coarse-fine shielding processing
Qing et al. A novel particle filter implementation for a multiple-vehicle detection and tracking system using tail light segmentation
CN110334703B (en) Ship detection and identification method in day and night image
CN116665097A (en) Self-adaptive target tracking method combining context awareness
CN107122756A (en) A kind of complete non-structural road edge detection method
CN113610885B (en) Semi-supervised target video segmentation method and system using difference contrast learning network
CN111914749A (en) Lane line recognition method and system based on neural network
CN113470074B (en) Self-adaptive space-time regularization target tracking method based on block discrimination
CN112801020B (en) Pedestrian re-identification method and system based on background graying
CN109934853B (en) Correlation filtering tracking method based on response image confidence region adaptive feature fusion
Ueda et al. Data Augmentation for Semantic Segmentation Using a Real Image Dataset Captured Around the Tsukuba City Hall
CN113112522A (en) Twin network target tracking method based on deformable convolution and template updating
CN112949389A (en) Haze image target detection method based on improved target detection network
Zhou et al. An anti-occlusion tracking system for UAV imagery based on Discriminative Scale Space Tracker and Optical Flow
Zhao et al. A traffic sign detection method based on saliency detection
Lin et al. Breaking of brightness consistency in optical flow with a lightweight CNN network
Han et al. A robust object detection algorithm based on background difference and LK optical flow
Tao et al. A sky region segmentation method for outdoor visual-inertial SLAM
US20240193964A1 (en) Lane line recognition method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant