CN114972422B

CN114972422B - Image sequence motion occlusion detection method, device, memory and processor

Info

Publication number: CN114972422B
Application number: CN202210491032.0A
Authority: CN
Inventors: 董冲; 方挺; 韩家明
Original assignee: Anhui University Of Technology Science Park Co ltd
Current assignee: Anhui University Of Technology Science Park Co ltd
Priority date: 2022-05-07
Filing date: 2022-05-07
Publication date: 2024-06-07
Anticipated expiration: 2042-05-07
Also published as: CN114972422A

Abstract

The application discloses a method, a device, a memory and a processor for detecting motion shielding of an image sequence, wherein the method is implemented by acquiring any two continuous frames of images; acquiring a dense optical flow field and a motion boundary region between the two frames of images; and analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model. The multi-layer accumulated loss function based on the information weight of the shielding boundary space is adopted in the semantic segmentation depth neural network model, and the network model can be converged to details such as a moving shielding boundary and the like by embedding the space correlation of the neighborhood pixels of the shielding boundary into the learning process, so that the constructed network model is suitable for moving shielding detection, and a shielding detection effect with clear boundary is obtained.

Description

Image sequence motion occlusion detection method, device, memory and processor

Technical Field

The application relates to a moving image sequence processing technology, in particular to a moving image sequence shielding detection method based on a semantic segmentation depth neural network architecture.

Background

Image sequence motion occlusion refers to the phenomenon in which a portion of pixels are visible in one frame of image and not visible in another frame of image. The method is an important task in the fields of image processing and computer vision research, and aims to guide other computer vision tasks such as optical flow estimation, image registration, target segmentation, target tracking and the like to accurately calculate by detecting the shielding areas between different objects and scenes or between different parts of different objects in an image sequence. Research results are widely applied to military science and technology, medical image processing and analysis, aerospace, satellite cloud image analysis and the like.

The traditional image sequence motion shielding detection method is to compare forward and backward motion estimation by utilizing motion symmetry or detect shielding by establishing a model such as geometric constraint, matching constraint and the like, but the method faces the problems of shielding area and shielding boundary blurring when a complex scene or complex motion.

Disclosure of Invention

The embodiment of the application provides a method, a device, a memory and a processor for detecting motion occlusion of an image sequence, which are used for at least solving the technical problems that an occlusion area and an occlusion boundary are fuzzy in the motion occlusion of the image sequence.

According to an aspect of the present application, there is provided an image sequence motion occlusion detection method comprising:

Acquiring any two continuous frames of images;

Acquiring a dense optical flow field and a motion boundary region between the two frames of images;

analyzing the dense optical flow field and the motion boundary area by using a semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;

The loss value L _k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

k _x is the predicted value in each channel of the occlusion feature map input by the last layer of the decoder;

a (k _x) represents an activation value of forming an occlusion map value after the k _x is mapped to a (0, 1) interval;

o (x) represents an occlusion label of each pixel x, taking 0 or 1;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

d (σ) is a distance function based on the search window radius σ.

Further, in the present invention, the D (σ) is obtained by:

Wherein:

d ₁ (x) is the distance from the pixel in the occlusion boundary region to the occlusion boundary;

d ₂ (x) is the distance of the point from the occlusion border region within the search window.

Further, in the present invention, the occlusion border area is obtained by the following method:

obtaining an occlusion boundary from a real occlusion graph;

Performing mask expansion on the shielding boundary to obtain an expanded shielding region;

and subtracting the expanded shielding area from the real shielding image to obtain the shielding boundary area.

Further, in the present invention, the loss value of the semantic segmentation deep neural network model is

Where ω ^k represents the weight of each layer of occlusion prediction map.

Further, in the present invention, ω ^k is the same for each layer.

Further, in the present invention, the structure of each layer of the decoder is as follows:

4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, the normalization processing and the activation processing are carried out once after each convolution operation;

The splicing module is used for splicing the characteristic diagram generated by the corresponding layer of the encoder, the characteristic diagram obtained by the deconvolution operation of the layer of the decoder and the upsampled shielding characteristic diagram processed by the layer before the decoder to obtain a spliced characteristic diagram, and executing a 3 multiplied by 3 convolution operation on the spliced characteristic diagram to generate a shielding characteristic diagram; the occlusion feature map is for being processed to double the resolution via an upsampling operation as the upsampled occlusion feature map in the next decoding layer;

when the splicing module in the first layer of the decoder performs splicing to obtain a spliced characteristic diagram, the characteristic diagram of the coding part is spliced with the characteristic diagram after the deconvolution module is operated to obtain the spliced characteristic diagram.

Further, in the present invention, the acquiring a motion boundary region between the two frames of images includes:

Detecting a motion boundary of the dense optical flow field with an edge detector;

Expanding the motion boundary of the dense optical flow field using an expansion mask to obtain a motion boundary region.

A second aspect of the present application is to provide an image sequence motion occlusion detection device, including:

the first acquisition module is used for acquiring any two continuous frames of images;

The second acquisition module is used for acquiring a dense optical flow field and a motion boundary between the two frames of images;

the analysis output module is used for analyzing the dense optical flow field and the motion boundary by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

o (x) represents an occlusion label of each pixel x, taking 0 or 1;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

d (σ) is a distance function based on the search window radius σ.

In a third aspect of the application, there is provided a memory for storing software for performing the method of the first aspect of the application.

In a fourth aspect of the application, there is provided a processor for processing software for performing the method of the first aspect of the application.

The beneficial effects are that:

The application provides an image sequence motion shielding detection method, which comprises the steps of obtaining any two continuous frames of images; acquiring a dense optical flow field and a motion boundary region between the two frames of images; and analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model. The multi-layer accumulated loss function based on the information weight of the shielding boundary space is adopted in the semantic segmentation depth neural network model, and the network model can be converged to details such as a moving shielding boundary and the like by embedding the space correlation of the neighborhood pixels of the shielding boundary into the learning process, so that the constructed network model is suitable for moving shielding detection, and a shielding detection effect with clear boundary is obtained.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 is a flow chart of a method of image sequence motion occlusion detection in accordance with an embodiment of the present application;

FIG. 2 is a schematic diagram of a semantic segmentation deep neural network model according to an embodiment of the present application;

FIG. 3 is a first frame image of a sequence of images of the beam_1 in the MPI_ Sintel dataset;

FIG. 4 is a second frame image of a sequence of images of the beam_1 in the MPI_ Sintel dataset;

fig. 5 is a block diagram of a sequence of images in the mpi_ Sintel dataset bamboost_1 calculated by the method according to an embodiment of the present invention.

Detailed Description

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

The embodiment of the application provides a method for detecting moving occlusion of an image sequence, which creatively regards the moving occlusion as semantic information among the image sequences, adopts an encoder-decoder structure of a semantic segmentation depth neural network model to construct an occlusion detection neural network module, analyzes occlusion information in an optical flow field of the image sequence, and designs a loss function which is more attached to a moving occlusion scene, thereby realizing accurate detection of the moving occlusion.

As shown in fig. 1, the method for detecting the motion occlusion of an image sequence provided by the embodiment of the invention includes the following steps:

s102, acquiring any two continuous frames of images.

As shown in fig. 3 and fig. 4, 2 pictures provided by the embodiment of the present application are selected from a frame_0043 image of a frame_0044 image sequence in an mpi_ Sintel dataset as a first frame image and a frame_0044 image as a second frame image, and the gray scale of the 2 frame images is shown in the drawings.

S104, acquiring a dense optical flow field and a motion boundary region between the two frames of images.

In the embodiment, an optical flow convolutional neural network is adopted to detect the dense optical flow field, and then a Sobel edge detector is used to detect the motion boundary of the dense optical flow field; and finally, expanding the motion boundary of the dense optical flow field by using an h multiplied by h expansion mask to obtain a motion boundary region.

S106, analyzing the dense optical flow field and the motion boundary by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model.

In this embodiment, as shown in fig. 2, the feature channel of the selectively language segmentation deep neural network model is 3 layers.

Wherein each layer of the encoder has the following structure:

4 serially stacked convolution modules, each of which is used for sequentially executing a convolution operation of 3×3 once, wherein each convolution operation is followed by a normalization process and an activation process;

And the pooling module is used for executing the maximum pooling operation of 2 multiplied by 2.

Under the encoder structure described above, the number of characteristic channels doubles during each downsampling, 16, 32, 64, 128, 256 respectively.

Wherein each layer of the decoder has the following structure:

4 deconvolution modules which are stacked in succession, wherein each deconvolution module is used for sequentially executing a deconvolution operation of 4×4 once and a convolution operation of 7×7 twice to obtain a characteristic diagram after deconvolution operation; wherein, normalization processing and activation processing are carried out once after each convolution operation;

Each deconvolution module operation reduces the number of channels by half, 256, 128, 64, 32, 16,1 respectively, wherein the deconvolution module is not introduced into the last layer, and a single-channel occlusion feature map is generated through 3×3 convolution.

Occlusion detection is a two-class semantic problem, typically using a loss function based on binary cross entropy to train a neural network. However, the motion occlusion pixels in the image sequence generally have obvious sample deflection, and when the number of non-occlusion pixels is far greater than that of occlusion pixels, the network loss value cannot better reflect the accuracy of detection of the occlusion pixels; meanwhile, the designed network needs to be well converged to details such as a motion shielding boundary. Based on the two considerations, in this embodiment, a multi-layer accumulated loss function based on the weight of the occlusion boundary space information is designed, specifically, the loss value L _k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

a (k _x) represents an activation value that forms an occlusion map value after the k _x is mapped to a (0, 1) interval by using a Sigmoid function;

o (x) represents an occlusion tag of each pixel x, taking 0 or 1, for distinguishing whether it is occluded;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

d (σ) is a distance function based on the search window radius σ.

In this embodiment, the D (σ) is obtained by:

Wherein:

The method is based on a semantic segmentation depth neural network architecture, improves the accuracy of detection of a neural network model on an occlusion region and an occlusion boundary by introducing motion boundary input and designing a multi-layer accumulation loss function based on occlusion boundary space information weight, has higher calculation precision and better adaptability to complex scenes and complex moving image sequences, and can be effectively applied to image sequence motion analysis visual tasks.

In this embodiment, the occlusion border area is obtained by the following method:

obtaining an occlusion boundary from a real occlusion graph;

In the embodiment, supervised learning is adopted, a real occlusion map is a target in machine learning, and the real occlusion map is also obtained from the MPI_ Sintel dataset. According to the embodiment of the application, the occlusion boundary area is obtained according to the real occlusion graph and the distribution of weights is acted, so that the method in the embodiment of the application can be clearly shown at the occlusion boundary.

In this embodiment, downsampling the occlusion real map according to the size of each layer of occlusion prediction map, and finally obtaining the loss value of the semantic segmentation depth neural network model as the loss value by using the defined loss function

Where ω ^k represents the weight of each layer of occlusion prediction map.

In this embodiment, ω ^k is taken as the same average region of 0.5 per layer.

According to the occlusion detection result in fig. 5, the method of the invention improves the accuracy of the detection of the moving occlusion of the image sequence, has higher detection accuracy of the moving occlusion of the complex scene and the complex moving image sequence, and has wide application prospect in the fields of medical segmentation, video monitoring and the like.

According to a second aspect of the present application, there is provided an image sequence motion occlusion detection device, comprising:

The second acquisition module is used for acquiring a dense optical flow field and a motion boundary area between the two frames of images;

The analysis output module is used for analyzing the dense optical flow field and the motion boundary area by using the semantic segmentation depth neural network model as input to obtain a shielding detection result output by the semantic segmentation depth neural network model;

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

o (x) represents an occlusion label of each pixel x, taking 0 or 1;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

d (σ) is a distance function based on the search window radius σ.

According to yet another aspect of the present application, a processor is provided for executing software for performing the method of image sequence motion occlusion detection.

According to a further aspect of the application, a memory is provided for storing software for performing the method of image sequence motion occlusion detection.

It should be noted that, the method for detecting the moving occlusion of the image sequence executed by the software is the same as the method for detecting the moving occlusion of the image sequence described above, and will not be described herein.

In this embodiment, there is provided an electronic device including a memory in which a computer program is stored, and a processor configured to run the computer program to perform the method in the above embodiment.

These computer programs may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks and/or block diagram block or blocks, and corresponding steps may be implemented in different modules.

The above-described programs may be run on a processor or may also be stored in memory (or referred to as computer-readable media), including both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technique. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. The image sequence motion shielding detection method is characterized by comprising the following steps of:

Acquiring any two continuous frames of images;

The loss value L ^k of the kth layer of the decoder in the semantic segmentation depth neural network model is as follows:

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

o (x) represents an occlusion label of each pixel x, taking 0 or 1;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

D (σ) is a distance function based on the search window radius σ;

the structure of each layer of the decoder is as follows:

2. The method according to claim 1, characterized in that: the D (σ) is obtained by:

Wherein:

d ₂ (x) is the distance of the pixel within the search window to the occlusion border region.

3. The method according to claim 1, characterized in that: the shielding boundary area is obtained by the following method:

obtaining an occlusion boundary from a real occlusion graph;

4. The method according to claim 1, characterized in that: the loss value of the semantic segmentation depth neural network model is

Where ω ^k represents the weight of each layer of occlusion prediction map.

5. The method according to claim 4, wherein: the ω ^k is the same for each layer.

6. The method according to any one of claims 1 to 5, characterized in that: the acquiring the motion boundary region between the two frames of images comprises the following steps:

the motion boundary of the dense optical flow field is expanded using an expansion mask to obtain a motion boundary region.

7. Image sequence motion shelters from detection device, its characterized in that: comprising the following steps:

In the above formula, the meanings of the parameters are as follows:

x refers to pixel coordinates, and Ω represents a real number domain;

o (x) represents an occlusion label of each pixel x, taking 0 or 1;

omega (x) represents a weight, and

O is a shielding area, and B is a shielding boundary area;

Omega ₀ (x) is the occlusion region weight;

Omega _b is the initial weight of the occlusion border region;

D (σ) is a distance function based on the search window radius σ;

the structure of each layer of the decoder is as follows:

8. A memory for storing software for performing the method of any one of claims 1-6.

9. A processor for processing software for performing the method of any of claims 1-6.