CN114972422B - Image sequence motion occlusion detection method, device, memory and processor - Google Patents

Image sequence motion occlusion detection method, device, memory and processor Download PDF

Info

Publication number
CN114972422B
CN114972422B CN202210491032.0A CN202210491032A CN114972422B CN 114972422 B CN114972422 B CN 114972422B CN 202210491032 A CN202210491032 A CN 202210491032A CN 114972422 B CN114972422 B CN 114972422B
Authority
CN
China
Prior art keywords
occlusion
shielding
characteristic diagram
layer
boundary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210491032.0A
Other languages
Chinese (zh)
Other versions
CN114972422A (en
Inventor
董冲
方挺
韩家明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University Of Technology Science Park Co ltd
Original Assignee
Anhui University Of Technology Science Park Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University Of Technology Science Park Co ltd filed Critical Anhui University Of Technology Science Park Co ltd
Priority to CN202210491032.0A priority Critical patent/CN114972422B/en
Publication of CN114972422A publication Critical patent/CN114972422A/en
Application granted granted Critical
Publication of CN114972422B publication Critical patent/CN114972422B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, a device, a memory and a processor for detecting motion shielding of an image sequence, wherein the method is implemented by acquiring any two continuous frames of images; acquiring a dense optical flow field and a motion boundary region between the two frames of images; and analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model. The multi-layer accumulated loss function based on the information weight of the shielding boundary space is adopted in the semantic segmentation depth neural network model, and the network model can be converged to details such as a moving shielding boundary and the like by embedding the space correlation of the neighborhood pixels of the shielding boundary into the learning process, so that the constructed network model is suitable for moving shielding detection, and a shielding detection effect with clear boundary is obtained.

Description

Image sequence motion occlusion detection method, device, memory and processor
Technical Field
The application relates to a moving image sequence processing technology, in particular to a moving image sequence shielding detection method based on a semantic segmentation depth neural network architecture.
Background
Image sequence motion occlusion refers to the phenomenon in which a portion of pixels are visible in one frame of image and not visible in another frame of image. The method is an important task in the fields of image processing and computer vision research, and aims to guide other computer vision tasks such as optical flow estimation, image registration, target segmentation, target tracking and the like to accurately calculate by detecting the shielding areas between different objects and scenes or between different parts of different objects in an image sequence. Research results are widely applied to military science and technology, medical image processing and analysis, aerospace, satellite cloud image analysis and the like.
The traditional image sequence motion shielding detection method is to compare forward and backward motion estimation by utilizing motion symmetry or detect shielding by establishing a model such as geometric constraint, matching constraint and the like, but the method faces the problems of shielding area and shielding boundary blurring when a complex scene or complex motion.
Disclosure of Invention
The embodiment of the application provides a method, a device, a memory and a processor for detecting motion occlusion of an image sequence, which are used for at least solving the technical problems that an occlusion area and an occlusion boundary are fuzzy in the motion occlusion of the image sequence.
According to an aspect of the present application, there is provided an image sequence motion occlusion detection method comprising:
Acquiring any two continuous frames of images;
Acquiring a dense optical flow field and a motion boundary region between the two frames of images;
analyzing the dense optical flow field and the motion boundary area by using a semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;
The loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value of forming an occlusion map value after the k x is mapped to a (0, 1) interval;
o (x) represents an occlusion label of each pixel x, taking 0 or 1;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
d (σ) is a distance function based on the search window radius σ.
Further, in the present invention, the D (σ) is obtained by:
Wherein:
d 1 (x) is the distance from the pixel in the occlusion boundary region to the occlusion boundary;
d 2 (x) is the distance of the point from the occlusion border region within the search window.
Further, in the present invention, the occlusion border area is obtained by the following method:
obtaining an occlusion boundary from a real occlusion graph;
Performing mask expansion on the shielding boundary to obtain an expanded shielding region;
and subtracting the expanded shielding area from the real shielding image to obtain the shielding boundary area.
Further, in the present invention, the loss value of the semantic segmentation deep neural network model is
Where ω k represents the weight of each layer of occlusion prediction map.
Further, in the present invention, ω k is the same for each layer.
Further, in the present invention, the structure of each layer of the decoder is as follows:
4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, the normalization processing and the activation processing are carried out once after each convolution operation;
The splicing module is used for splicing the characteristic diagram generated by the corresponding layer of the encoder, the characteristic diagram obtained by the deconvolution operation of the layer of the decoder and the upsampled shielding characteristic diagram processed by the layer before the decoder to obtain a spliced characteristic diagram, and executing a 3 multiplied by 3 convolution operation on the spliced characteristic diagram to generate a shielding characteristic diagram; the occlusion feature map is for being processed to double the resolution via an upsampling operation as the upsampled occlusion feature map in the next decoding layer;
when the splicing module in the first layer of the decoder performs splicing to obtain a spliced characteristic diagram, the characteristic diagram of the coding part is spliced with the characteristic diagram after the deconvolution module is operated to obtain the spliced characteristic diagram.
Further, in the present invention, the acquiring a motion boundary region between the two frames of images includes:
Detecting a motion boundary of the dense optical flow field with an edge detector;
Expanding the motion boundary of the dense optical flow field using an expansion mask to obtain a motion boundary region.
A second aspect of the present application is to provide an image sequence motion occlusion detection device, including:
the first acquisition module is used for acquiring any two continuous frames of images;
The second acquisition module is used for acquiring a dense optical flow field and a motion boundary between the two frames of images;
the analysis output module is used for analyzing the dense optical flow field and the motion boundary by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;
The loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value of forming an occlusion map value after the k x is mapped to a (0, 1) interval;
o (x) represents an occlusion label of each pixel x, taking 0 or 1;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
d (σ) is a distance function based on the search window radius σ.
In a third aspect of the application, there is provided a memory for storing software for performing the method of the first aspect of the application.
In a fourth aspect of the application, there is provided a processor for processing software for performing the method of the first aspect of the application.
The beneficial effects are that:
The application provides an image sequence motion shielding detection method, which comprises the steps of obtaining any two continuous frames of images; acquiring a dense optical flow field and a motion boundary region between the two frames of images; and analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model. The multi-layer accumulated loss function based on the information weight of the shielding boundary space is adopted in the semantic segmentation depth neural network model, and the network model can be converged to details such as a moving shielding boundary and the like by embedding the space correlation of the neighborhood pixels of the shielding boundary into the learning process, so that the constructed network model is suitable for moving shielding detection, and a shielding detection effect with clear boundary is obtained.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method of image sequence motion occlusion detection in accordance with an embodiment of the present application;
FIG. 2 is a schematic diagram of a semantic segmentation deep neural network model according to an embodiment of the present application;
FIG. 3 is a first frame image of a sequence of images of the beam_1 in the MPI_ Sintel dataset;
FIG. 4 is a second frame image of a sequence of images of the beam_1 in the MPI_ Sintel dataset;
fig. 5 is a block diagram of a sequence of images in the mpi_ Sintel dataset bamboost_1 calculated by the method according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application provides a method for detecting moving occlusion of an image sequence, which creatively regards the moving occlusion as semantic information among the image sequences, adopts an encoder-decoder structure of a semantic segmentation depth neural network model to construct an occlusion detection neural network module, analyzes occlusion information in an optical flow field of the image sequence, and designs a loss function which is more attached to a moving occlusion scene, thereby realizing accurate detection of the moving occlusion.
As shown in fig. 1, the method for detecting the motion occlusion of an image sequence provided by the embodiment of the invention includes the following steps:
s102, acquiring any two continuous frames of images.
As shown in fig. 3 and fig. 4, 2 pictures provided by the embodiment of the present application are selected from a frame_0043 image of a frame_0044 image sequence in an mpi_ Sintel dataset as a first frame image and a frame_0044 image as a second frame image, and the gray scale of the 2 frame images is shown in the drawings.
S104, acquiring a dense optical flow field and a motion boundary region between the two frames of images.
In the embodiment, an optical flow convolutional neural network is adopted to detect the dense optical flow field, and then a Sobel edge detector is used to detect the motion boundary of the dense optical flow field; and finally, expanding the motion boundary of the dense optical flow field by using an h multiplied by h expansion mask to obtain a motion boundary region.
S106, analyzing the dense optical flow field and the motion boundary by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model.
In this embodiment, as shown in fig. 2, the feature channel of the selectively language segmentation deep neural network model is 3 layers.
Wherein each layer of the encoder has the following structure:
4 serially stacked convolution modules, each of which is used for sequentially executing a convolution operation of 3×3 once, wherein each convolution operation is followed by a normalization process and an activation process;
And the pooling module is used for executing the maximum pooling operation of 2 multiplied by 2.
Under the encoder structure described above, the number of characteristic channels doubles during each downsampling, 16, 32, 64, 128, 256 respectively.
Wherein each layer of the decoder has the following structure:
4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, normalization processing and activation processing are carried out once after each convolution operation;
The splicing module is used for splicing the characteristic diagram generated by the corresponding layer of the encoder, the characteristic diagram obtained by the deconvolution operation of the layer of the decoder and the upsampled shielding characteristic diagram processed by the layer before the decoder to obtain a spliced characteristic diagram, and executing a 3 multiplied by 3 convolution operation on the spliced characteristic diagram to generate a shielding characteristic diagram; the occlusion feature map is for being processed to double the resolution via an upsampling operation as the upsampled occlusion feature map in the next decoding layer;
when the splicing module in the first layer of the decoder performs splicing to obtain a spliced characteristic diagram, the characteristic diagram of the coding part is spliced with the characteristic diagram after the deconvolution module is operated to obtain the spliced characteristic diagram.
Each deconvolution module operation reduces the number of channels by half, 256, 128, 64, 32, 16,1 respectively, wherein the deconvolution module is not introduced into the last layer, and a single-channel occlusion feature map is generated through 3×3 convolution.
Occlusion detection is a two-class semantic problem, typically using a loss function based on binary cross entropy to train a neural network. However, the motion occlusion pixels in the image sequence generally have obvious sample deflection, and when the number of non-occlusion pixels is far greater than that of occlusion pixels, the network loss value cannot better reflect the accuracy of detection of the occlusion pixels; meanwhile, the designed network needs to be well converged to details such as a motion shielding boundary. Based on the two considerations, in this embodiment, a multi-layer accumulated loss function based on the weight of the occlusion boundary space information is designed, specifically, the loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value that forms an occlusion map value after the k x is mapped to a (0, 1) interval by using a Sigmoid function;
o (x) represents an occlusion tag of each pixel x, taking 0 or 1, for distinguishing whether it is occluded;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
d (σ) is a distance function based on the search window radius σ.
In this embodiment, the D (σ) is obtained by:
Wherein:
d 1 (x) is the distance from the pixel in the occlusion boundary region to the occlusion boundary;
d 2 (x) is the distance of the point from the occlusion border region within the search window.
The method is based on a semantic segmentation depth neural network architecture, improves the accuracy of detection of a neural network model on an occlusion region and an occlusion boundary by introducing motion boundary input and designing a multi-layer accumulation loss function based on occlusion boundary space information weight, has higher calculation precision and better adaptability to complex scenes and complex moving image sequences, and can be effectively applied to image sequence motion analysis visual tasks.
In this embodiment, the occlusion border area is obtained by the following method:
obtaining an occlusion boundary from a real occlusion graph;
Performing mask expansion on the shielding boundary to obtain an expanded shielding region;
and subtracting the expanded shielding area from the real shielding image to obtain the shielding boundary area.
In the embodiment, supervised learning is adopted, a real occlusion map is a target in machine learning, and the real occlusion map is also obtained from the MPI_ Sintel dataset. According to the embodiment of the application, the occlusion boundary area is obtained according to the real occlusion graph and the distribution of weights is acted, so that the method in the embodiment of the application can be clearly shown at the occlusion boundary.
In this embodiment, downsampling the occlusion real map according to the size of each layer of occlusion prediction map, and finally obtaining the loss value of the semantic segmentation depth neural network model as the loss value by using the defined loss function
Where ω k represents the weight of each layer of occlusion prediction map.
In this embodiment, ω k is taken as the same average region of 0.5 per layer.
According to the occlusion detection result in fig. 5, the method of the invention improves the accuracy of the detection of the moving occlusion of the image sequence, has higher detection accuracy of the moving occlusion of the complex scene and the complex moving image sequence, and has wide application prospect in the fields of medical segmentation, video monitoring and the like.
According to a second aspect of the present application, there is provided an image sequence motion occlusion detection device, comprising:
the first acquisition module is used for acquiring any two continuous frames of images;
The second acquisition module is used for acquiring a dense optical flow field and a motion boundary area between the two frames of images;
The analysis output module is used for analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;
The loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value of forming an occlusion map value after the k x is mapped to a (0, 1) interval;
o (x) represents an occlusion label of each pixel x, taking 0 or 1;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
d (σ) is a distance function based on the search window radius σ.
According to yet another aspect of the present application, a processor is provided for executing software for performing the method of image sequence motion occlusion detection.
According to a further aspect of the application, a memory is provided for storing software for performing the method of image sequence motion occlusion detection.
It should be noted that, the method for detecting the moving occlusion of the image sequence executed by the software is the same as the method for detecting the moving occlusion of the image sequence described above, and will not be described herein.
In this embodiment, there is provided an electronic device including a memory in which a computer program is stored, and a processor configured to run the computer program to perform the method in the above embodiment.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks, and corresponding steps may be implemented in different modules.
The above-described programs may be run on a processor or may also be stored in memory (or referred to as computer-readable media), including both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technique. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (9)

1. The image sequence motion shielding detection method is characterized by comprising the following steps of:
Acquiring any two continuous frames of images;
Acquiring a dense optical flow field and a motion boundary region between the two frames of images;
analyzing the dense optical flow field and the motion boundary area by using a semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;
The loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value of forming an occlusion map value after the k x is mapped to a (0, 1) interval;
o (x) represents an occlusion label of each pixel x, taking 0 or 1;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
D (σ) is a distance function based on the search window radius σ;
the structure of each layer of the decoder is as follows:
4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, normalization processing and activation processing are carried out once after each convolution operation;
The splicing module is used for splicing the characteristic diagram generated by the corresponding layer of the encoder, the characteristic diagram obtained by the deconvolution operation of the layer of the decoder and the upsampled shielding characteristic diagram processed by the layer before the decoder to obtain a spliced characteristic diagram, and executing a 3 multiplied by 3 convolution operation on the spliced characteristic diagram to generate a shielding characteristic diagram; the occlusion feature map is for being processed to double the resolution via an upsampling operation as the upsampled occlusion feature map in the next decoding layer;
when the splicing module in the first layer of the decoder performs splicing to obtain a spliced characteristic diagram, the characteristic diagram of the coding part is spliced with the characteristic diagram after the deconvolution module is operated to obtain the spliced characteristic diagram.
2. The method according to claim 1, characterized in that: the D (σ) is obtained by:
Wherein:
d 1 (x) is the distance from the pixel in the occlusion boundary region to the occlusion boundary;
d 2 (x) is the distance of the pixel within the search window to the occlusion border region.
3. The method according to claim 1, characterized in that: the shielding boundary area is obtained by the following method:
obtaining an occlusion boundary from a real occlusion graph;
Performing mask expansion on the shielding boundary to obtain an expanded shielding region;
and subtracting the expanded shielding area from the real shielding image to obtain the shielding boundary area.
4. The method according to claim 1, characterized in that: the loss value of the semantic segmentation depth neural network model is
Where ω k represents the weight of each layer of occlusion prediction map.
5. The method according to claim 4, wherein: the ω k is the same for each layer.
6. The method according to any one of claims 1 to 5, characterized in that: the acquiring the motion boundary region between the two frames of images comprises the following steps:
Detecting a motion boundary of the dense optical flow field with an edge detector;
the motion boundary of the dense optical flow field is expanded using an expansion mask to obtain a motion boundary region.
7. Image sequence motion shelters from detection device, its characterized in that: comprising the following steps:
the first acquisition module is used for acquiring any two continuous frames of images;
The second acquisition module is used for acquiring a dense optical flow field and a motion boundary area between the two frames of images;
The analysis output module is used for analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;
The loss value L k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:
In the above formula, the meanings of the parameters are as follows:
x refers to pixel coordinates, and Ω represents a real number domain;
k x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;
a (k x) represents an activation value of forming an occlusion map value after the k x is mapped to a (0, 1) interval;
o (x) represents an occlusion label of each pixel x, taking 0 or 1;
omega (x) represents a weight, and
O is a shielding area, and B is a shielding boundary area;
Omega 0 (x) is the occlusion region weight;
Omega b is the initial weight of the occlusion border region;
D (σ) is a distance function based on the search window radius σ;
the structure of each layer of the decoder is as follows:
4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, normalization processing and activation processing are carried out once after each convolution operation;
The splicing module is used for splicing the characteristic diagram generated by the corresponding layer of the encoder, the characteristic diagram obtained by the deconvolution operation of the layer of the decoder and the upsampled shielding characteristic diagram processed by the layer before the decoder to obtain a spliced characteristic diagram, and executing a 3 multiplied by 3 convolution operation on the spliced characteristic diagram to generate a shielding characteristic diagram; the occlusion feature map is for being processed to double the resolution via an upsampling operation as the upsampled occlusion feature map in the next decoding layer;
when the splicing module in the first layer of the decoder performs splicing to obtain a spliced characteristic diagram, the characteristic diagram of the coding part is spliced with the characteristic diagram after the deconvolution module is operated to obtain the spliced characteristic diagram.
8. A memory for storing software for performing the method of any one of claims 1-6.
9. A processor for processing software for performing the method of any of claims 1-6.
CN202210491032.0A 2022-05-07 2022-05-07 Image sequence motion occlusion detection method, device, memory and processor Active CN114972422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210491032.0A CN114972422B (en) 2022-05-07 2022-05-07 Image sequence motion occlusion detection method, device, memory and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210491032.0A CN114972422B (en) 2022-05-07 2022-05-07 Image sequence motion occlusion detection method, device, memory and processor

Publications (2)

Publication Number Publication Date
CN114972422A CN114972422A (en) 2022-08-30
CN114972422B true CN114972422B (en) 2024-06-07

Family

ID=82980963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210491032.0A Active CN114972422B (en) 2022-05-07 2022-05-07 Image sequence motion occlusion detection method, device, memory and processor

Country Status (1)

Country Link
CN (1) CN114972422B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102073873B1 (en) * 2019-03-22 2020-02-05 주식회사 루닛 Method for semantic segmentation and apparatus thereof
CN110992367A (en) * 2019-10-31 2020-04-10 北京交通大学 Method for performing semantic segmentation on image with shielding area
CN111401308A (en) * 2020-04-08 2020-07-10 蚌埠学院 Fish behavior video identification method based on optical flow effect
CN112347852A (en) * 2020-10-10 2021-02-09 上海交通大学 Target tracking and semantic segmentation method and device for sports video and plug-in
CN112509014A (en) * 2020-12-14 2021-03-16 南昌航空大学 Robust interpolation light stream computing method matched with pyramid shielding detection block
CN112862828A (en) * 2019-11-26 2021-05-28 华为技术有限公司 Semantic segmentation method, model training method and device
CN113888604A (en) * 2021-09-27 2022-01-04 安徽清新互联信息科技有限公司 Target tracking method based on depth optical flow

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080279478A1 (en) * 2007-05-09 2008-11-13 Mikhail Tsoupko-Sitnikov Image processing method and image processing apparatus
US10986325B2 (en) * 2018-09-12 2021-04-20 Nvidia Corporation Scene flow estimation using shared features
US20220101539A1 (en) * 2020-09-30 2022-03-31 Qualcomm Incorporated Sparse optical flow estimation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102073873B1 (en) * 2019-03-22 2020-02-05 주식회사 루닛 Method for semantic segmentation and apparatus thereof
CN110992367A (en) * 2019-10-31 2020-04-10 北京交通大学 Method for performing semantic segmentation on image with shielding area
CN112862828A (en) * 2019-11-26 2021-05-28 华为技术有限公司 Semantic segmentation method, model training method and device
CN111401308A (en) * 2020-04-08 2020-07-10 蚌埠学院 Fish behavior video identification method based on optical flow effect
CN112347852A (en) * 2020-10-10 2021-02-09 上海交通大学 Target tracking and semantic segmentation method and device for sports video and plug-in
CN112509014A (en) * 2020-12-14 2021-03-16 南昌航空大学 Robust interpolation light stream computing method matched with pyramid shielding detection block
CN113888604A (en) * 2021-09-27 2022-01-04 安徽清新互联信息科技有限公司 Target tracking method based on depth optical flow

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Better Dense Trajectories by Motion in Videos";Liu Yu 等;《 IEEE transactions on cybernetics》;20171128;第49卷(第1期);159-170 *
"基于运动优化语义分割的变分光流计算方法";葛利跃 等;《模式识别与人工智能》;20210715;第34卷(第7期);631-645 *

Also Published As

Publication number Publication date
CN114972422A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN112465828B (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN109300151B (en) Image processing method and device and electronic equipment
CN102859389A (en) Range measurement using a coded aperture
CN111797836B (en) Depth learning-based obstacle segmentation method for extraterrestrial celestial body inspection device
CN111986472B (en) Vehicle speed determining method and vehicle
CN113269722A (en) Training method for generating countermeasure network and high-resolution image reconstruction method
CN109377499B (en) Pixel-level object segmentation method and device
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
CN111415300A (en) Splicing method and system for panoramic image
US9659372B2 (en) Video disparity estimate space-time refinement method and codec
CN113807185B (en) Data processing method and device
CN111929688B (en) Method and equipment for determining radar echo prediction frame sequence
CN114972422B (en) Image sequence motion occlusion detection method, device, memory and processor
US20230298335A1 (en) Computer-implemented method, data processing apparatus and computer program for object detection
CN116468968A (en) Astronomical image small target detection method integrating attention mechanism
Yu et al. Deep learning-based RGB-thermal image denoising: review and applications
CN114913519B (en) 3D target detection method and device, electronic equipment and storage medium
Ke et al. Scale-aware dimension-wise attention network for small ship instance segmentation in synthetic aperture radar images
CN113837243B (en) RGB-D camera dynamic visual odometer method based on edge information
US11057641B1 (en) Systems and methods of motion estimation using monocular event-based sensor
CN116894959B (en) Infrared small target detection method and device based on mixed scale and focusing network
CN118072229B (en) Video salient target detection method and system based on hierarchical feature alignment
CN117475357B (en) Monitoring video image shielding detection method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant