CN118097566B - Scene change detection method, device, medium and equipment based on deep learning - Google Patents

Scene change detection method, device, medium and equipment based on deep learning

Info

Publication number
CN118097566B
CN118097566B (application number CN202410487285.XA)
Authority
CN
China
Prior art keywords
image
images
feature map
module
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410487285.XA
Other languages
Chinese (zh)
Other versions
CN118097566A (en)
Inventor
杨国锴
卓涛
程志勇
高赞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Institute of Artificial Intelligence filed Critical Qilu University of Technology
Priority to CN202410487285.XA
Publication of CN118097566A
Application granted
Publication of CN118097566B
Legal status: Active


Abstract

The invention relates to the technical field of image recognition, and in particular to a scene change detection method, device, medium and equipment based on deep learning. The method comprises the following steps: acquiring an image pair to be detected; inputting the image pair into a homography-based alignment module to obtain images whose scenes are mutually aligned; extracting features from the aligned images and inputting them into a preliminary change detection network to obtain change information; and inputting the change information into the respective positioning networks, which output the bounding boxes of the changed regions of the two images. By aligning the two images through homography, the invention compensates for the difficulty of directly capturing change features between the two images; by capturing the correspondence between the two images with a cross-attention mechanism, it compensates for the information lost in unaligned images. The network adopts a twin (Siamese) neural network architecture that processes the two images simultaneously, and a feature fusion module improves the identification of changed regions.

Description

Scene change detection method, device, medium and equipment based on deep learning
Technical Field
The invention relates to the technical field of image recognition, in particular to a scene change detection method, device, medium and equipment based on deep learning.
Background
With the rapid development of computer vision technology, exploring scene changes plays an important role in image processing and computer graphics. Scene change research aims to develop algorithms and techniques to detect, analyze and understand changes between different scenes, driven by the need to extract useful information from dynamic scenes. The subject involves modeling, detecting and describing changes in image sequences or video to provide an understanding and analysis of scene evolution. With the popularization of digital image capturing devices and the increase in computing power, it has become easier to acquire and process image sequences and video data, so accurately understanding and analyzing changes in scenes becomes increasingly important. However, there are still knowledge gaps in understanding scene changes. Changes in a scene may involve the appearance, disappearance, movement or deformation of objects, as well as changes in the illumination, background, and so on of the scene. For example, given a pair of images, the task is to determine the locations of the changes between them. A primary difficulty is guarding against extraneous "noise" or "disturbance" variables. In a fixed-camera surveillance application, for instance, the "disturbance" parameters may be changes in scene illumination or weather conditions (e.g., rain, fog), all of which hinder the application of common methods. Furthermore, the two images may come from entirely different shooting angles, so there may be geometric variations between them in addition to photometric variations. In such cases, the effect of detecting changes in the image pair is not ideal, which shows that the current understanding and analysis of these changes is still not deep enough. Therefore, how to provide a method that can accurately and reliably detect changes in an image pair, unaffected by the external scene environment and robust to geometric changes, is a challenging problem.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a scene change detection method, device, medium and equipment based on deep learning.
The technical scheme for solving the technical problems is as follows: in one aspect, the invention provides a scene change detection method based on deep learning, which comprises the following steps:
a) Preprocessing the two original images to obtain two preprocessed images $L, R \in \mathbb{R}^{3\times h\times w}$, where L and R denote the two images, $\mathbb{R}$ indicates that the matrix elements are real numbers, $h$ is the image height, $w$ is the image width, and 3 is the number of image channels; that is, image L is represented by a real matrix of shape $3\times h\times w$, and likewise image R;
b) Constructing a homography-based alignment module and inputting the two preprocessed images L and R into the module to obtain the corresponding aligned images $L'$ and $R'$, where $L'$ is image L aligned to the coordinate system of image R and $R'$ is image R aligned to the coordinate system of image L, so that the spatial positions of $L'$ and R are consistent and the spatial positions of $R'$ and L are consistent;
c) Constructing a preliminary change detection network composed of a feature extraction module and a change extraction module. Based on the spatial consistency of the images, the two aligned images $L'$ and $R'$ are channel-combined with the corresponding preprocessed images R and L: L and $R'$ are combined along the channel dimension to obtain a 6-channel image $LR' \in \mathbb{R}^{6\times h\times w}$, where h and w are the image height and width; similarly, R and $L'$ are combined along the channel dimension to obtain a 6-channel image $RL'$ of the same size. The combined images $LR'$ and $RL'$ are respectively input into the corresponding preliminary change detection networks to obtain the change information $D_L$ and $D_R$ of images L and R, where $LR'$ is the combination of the preprocessed image L with the aligned image $R'$, $RL'$ is the combination of the preprocessed image R with the aligned image $L'$, $D_L$ is the change information of image L, and $D_R$ is the change information of image R;
d) Constructing a positioning network composed of a feature fusion module and a frame detection module, inputting the change information $D_L$ and $D_R$ obtained by the preliminary change detection network into the corresponding positioning networks, which then output the bounding boxes of the changed regions of the two images L and R;
e) Training a positioning network.
On the basis of the above deep learning-based scene change detection method, step b) comprises the following steps:
b-1) The homography-based alignment module consists of image feature point matching (feature point detection, feature point description and feature point matching) and image alignment (homography transformation matrix computation and image registration);
b-2) Inputting the preprocessed images L and R into the feature point matching stage of the alignment module, obtaining the feature points of each image, matching the feature points of the two images, and outputting the successfully matched feature points $KP_L$ and $KP_R$, where $KP_L$ are points with distinctive local structure in image L and $KP_R$ are points with distinctive local structure in image R;
b-3) Inputting the matched feature points $KP_L$ and $KP_R$ into the homography transformation matrix computation to obtain the transformation matrices $H_{L\to R}$ and $H_{R\to L}$, where $H_{L\to R}$ is the transformation matrix aligning image L to image R and $H_{R\to L}$ is the transformation matrix aligning image R to image L; the computed matrices are then applied to the corresponding images to realize image alignment, outputting the aligned images $L'$ and $R'$, where $L'$ is image L aligned to the scene of image R and $R'$ is image R aligned to the scene of image L.
On the basis of the above deep learning-based scene change detection method, step c) comprises the following steps:
c-1) The preliminary change detection network is composed of image channel concatenation, a U-Net encoder and a change information extraction module, where the change information extraction module consists of a subtraction operation and a cross-attention mechanism;
c-2) The aligned images $R'$ and $L'$ are channel-combined with the corresponding images L and R to obtain the image pairs $LR'$ and $RL'$; the two combined image pairs are respectively input into the U-Net encoder, which outputs two groups of five intermediate feature maps at different scales, $F^i_{LR'}$ and $F^i_{RL'}$, $i \in \{1,2,3,4,5\}$;
c-3) The channels of the two generated feature maps are split in half: $F^i_{LR'}$ is split into $f^i_L$ and $f^i_{R'}$, where $f^i_L$ denotes the feature map corresponding to image L and $f^i_{R'}$ denotes the feature map corresponding to image $R'$; $F^i_{RL'}$ is split into $f^i_R$ and $f^i_{L'}$, where $f^i_R$ denotes the feature map corresponding to image R and $f^i_{L'}$ denotes the feature map corresponding to image $L'$;
c-4) Using the change extraction module to process $f^i_L, f^i_{R'}$ and $f^i_R, f^i_{L'}$ to obtain the change information $D_L$ and $D_R$ corresponding to images L and R. Taking the change information of image L at the first level as an example: in the first-level intermediate feature maps, the subtraction $f^1_L - f^1_{R'}$ is performed, and the resulting feature map is fused with $f^1_L$ to obtain the change information of image L in the first-level intermediate feature map, $D^1_L = \mathrm{Fuse}(f^1_L - f^1_{R'},\, f^1_L)$, where $D^1_L$ is the change information of image L at the first level, $f^1_L$ is the first-level intermediate feature map of image L, $f^1_{R'}$ is the first-level intermediate feature map of image $R'$, and $\mathrm{Fuse}$ is the fusion mechanism. In the same way, $D^1_R = \mathrm{Fuse}(f^1_R - f^1_{L'},\, f^1_R)$ is obtained, where $D^1_R$ is the change information of image R at the first level, $f^1_R$ is the first-level intermediate feature map of image R, and $f^1_{L'}$ is the first-level intermediate feature map of image $L'$. In the second through fifth levels of the two groups of intermediate feature maps, $i \in \{2,3,4,5\}$, taking the change information of image L as an example: first the subtraction $S^i_L = f^i_L - f^i_{R'}$ is performed; then cross-attention is applied to $f^i_L$ and $f^i_{R'}$ to obtain $C^i_L = \mathrm{CA}(f^i_L, f^i_{R'})$; the sum $S^i_L + C^i_L$ is then fused with $f^i_L$ to obtain the change information of image L at level $i$, $D^i_L = \mathrm{Fuse}(S^i_L + C^i_L,\, f^i_L)$, where $\mathrm{Fuse}$ is the fusion mechanism and $\mathrm{CA}$ is the cross-attention mechanism. Similarly, the change information $D^i_R$ of image R at the second through fifth levels is obtained. The change information of image L is collectively denoted $D_L = \{D^1_L, \dots, D^5_L\}$, and the change information of image R is collectively denoted $D_R = \{D^1_R, \dots, D^5_R\}$.
On the basis of the above deep learning-based scene change detection method, step d) comprises the following steps:
d-1) The change information $D_L$ and $D_R$ generated by the preliminary change detection network is upsampled and decoded by a U-Net decoder, finally generating feature maps $M_L$ and $M_R$ at the original image resolution;
d-2) The feature maps $M_L$ and $M_R$ are input into the target bounding box prediction component, which outputs the changed regions of the two images and generates a bounding box around each region.
On the basis of the above deep learning-based scene change detection method, step e) comprises the following steps:
e-1) The preprocessed image pairs are divided into a training set, a validation set and a test set at a ratio of 20:1:2;
e-2) The network is trained with a keypoint loss and an offset loss; the overall objective is optimized with Adam (learning rate 0.00001, weight decay 0.0005) under a DDP training strategy with a batch size of 16; training runs for 200 iterations, and validation on the validation set is performed every epoch.
In another aspect, an embodiment of the present invention provides a scene change detection apparatus based on deep learning, including:
a homography-based alignment module, which processes the two preprocessed images L and R to obtain the corresponding aligned images $L'$ and $R'$; a preliminary change detection network module, comprising a feature extraction module and a change extraction module, into which the channel-combined preprocessed and aligned images are input to obtain the change information of the two images; and a positioning network module, comprising a feature fusion module and a frame detection module, which obtains the bounding boxes of the changed regions of the two images L and R.
In yet another aspect, embodiments of the present invention provide a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform steps in the scene change detection method.
In a final aspect, an embodiment of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the scene change detection method when executing the program.
The effects described in this summary are merely effects of the embodiments, not all effects of the invention; the above technical solution has the following advantages or beneficial effects:
A network combining a homography-based image registration structure with a cross-attention mechanism structure is adopted. Aligning the two images in the same coordinate system through homography compensates for the difficulty of directly capturing change features between the two images, while the cross-attention structure captures the correspondence between the two images and compensates for the information lost in unaligned images. The network adopts a twin (Siamese) neural network architecture so that the two images can be processed simultaneously, and the feature fusion module strengthens the fusion of change features, so that changed regions are better identified.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a network configuration diagram of scene change detection according to the present invention.
FIG. 3 is a diagram of the change extraction operation on the first-level intermediate feature maps according to the present invention.
Fig. 4 is a diagram of the change extraction operation on the second- through fifth-level intermediate feature maps according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
A scene change detection method based on deep learning comprises the following steps:
a) Preprocessing the two original images to obtain two preprocessed images $L, R \in \mathbb{R}^{3\times h\times w}$, where L and R denote the two images, $\mathbb{R}$ indicates that the matrix elements are real numbers, $h$ is the image height, $w$ is the image width, and 3 is the number of image channels; that is, image L is represented by a real matrix of shape $3\times h\times w$, and likewise image R;
b) Constructing a homography-based alignment module and inputting the two preprocessed images L and R into the module to obtain the corresponding aligned images $L'$ and $R'$, where $L'$ is image L aligned to the coordinate system of image R and $R'$ is image R aligned to the coordinate system of image L, so that the spatial positions of $L'$ and R are consistent and the spatial positions of $R'$ and L are consistent;
c) Constructing a preliminary change detection network composed of a feature extraction module and a change extraction module. Based on the spatial consistency of the images, the two aligned images $L'$ and $R'$ are channel-combined with the corresponding preprocessed images R and L: L and $R'$ are combined along the channel dimension to obtain a 6-channel image $LR' \in \mathbb{R}^{6\times h\times w}$, where h and w are the image height and width; similarly, R and $L'$ are combined along the channel dimension to obtain a 6-channel image $RL'$ of the same size. The combined images $LR'$ and $RL'$ are respectively input into the corresponding preliminary change detection networks to obtain the change information $D_L$ and $D_R$ of images L and R, where $LR'$ is the combination of the preprocessed image L with the aligned image $R'$, $RL'$ is the combination of the preprocessed image R with the aligned image $L'$, $D_L$ is the change information of image L, and $D_R$ is the change information of image R;
d) Constructing a positioning network composed of a feature fusion module and a frame detection module, inputting the change information $D_L$ and $D_R$ obtained by the preliminary change detection network into the corresponding positioning networks, which then output the bounding box of the changed region of each of the two images L and R, each changed region being localized by a positioning box;
e) Training a positioning network.
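For concreteness, the following sketches illustrate one possible realization of the above steps in Python; they are illustrative assumptions, not the prescribed implementation of the invention. The preprocessing of step a) might look as follows (the target resolution and the resize/normalization choices are assumptions; the method only requires a common $3\times h\times w$ representation):

```python
# Illustrative sketch of step a): load two images and convert each to a
# 3×h×w real-valued tensor. The resize dimensions and ToTensor scaling
# are assumptions, not specified by the patent.
from PIL import Image
from torchvision import transforms

def preprocess_pair(path_l: str, path_r: str, h: int = 256, w: int = 256):
    tf = transforms.Compose([
        transforms.Resize((h, w)),
        transforms.ToTensor(),  # float tensor in [0, 1], shape 3×h×w
    ])
    L = tf(Image.open(path_l).convert("RGB"))
    R = tf(Image.open(path_r).convert("RGB"))
    return L, R  # L, R each of shape 3×h×w
```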
In this embodiment, step b) includes the steps of:
b-1) The homography-based alignment module consists of image feature point matching (feature point detection, feature point description and feature point matching) and image alignment (homography transformation matrix computation and image registration);
b-2) Inputting the preprocessed images L and R into the feature point matching stage of the alignment module, obtaining the feature points of each image, matching the feature points of the two images, and outputting the successfully matched feature points $KP_L$ and $KP_R$, where $KP_L$ are points with distinctive local structure in image L and $KP_R$ are points with distinctive local structure in image R;
b-3) Inputting the matched feature points $KP_L$ and $KP_R$ into the homography transformation matrix computation to obtain the transformation matrices $H_{L\to R}$ and $H_{R\to L}$, where $H_{L\to R}$ is the transformation matrix aligning image L to image R and $H_{R\to L}$ is the transformation matrix aligning image R to image L; the computed matrices are then applied to the corresponding images to realize image alignment, outputting the aligned images $L'$ and $R'$, where $L'$ is image L aligned to the scene of image R and $R'$ is image R aligned to the scene of image L.
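A minimal sketch of steps b-1) through b-3), assuming ORB feature points, brute-force Hamming matching and RANSAC homography estimation via OpenCV (the patent does not fix a particular detector or estimator):

```python
# Sketch of the homography-based alignment module (steps b-1 to b-3).
import cv2
import numpy as np

def align_pair(img_l: np.ndarray, img_r: np.ndarray):
    # b-1)/b-2): feature point detection, description and matching.
    orb = cv2.ORB_create(2000)
    kp_l, des_l = orb.detectAndCompute(img_l, None)
    kp_r, des_r = orb.detectAndCompute(img_r, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_l, des_r)
    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])  # KP_L
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])  # KP_R
    # b-3): estimate H_{L->R} and H_{R->L}, then warp each image.
    H_lr, _ = cv2.findHomography(pts_l, pts_r, cv2.RANSAC, 5.0)
    H_rl, _ = cv2.findHomography(pts_r, pts_l, cv2.RANSAC, 5.0)
    h_r, w_r = img_r.shape[:2]
    h_l, w_l = img_l.shape[:2]
    L_prime = cv2.warpPerspective(img_l, H_lr, (w_r, h_r))  # L aligned to R
    R_prime = cv2.warpPerspective(img_r, H_rl, (w_l, h_l))  # R aligned to L
    return L_prime, R_prime
```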
In this embodiment, step c) includes the steps of:
c-1) The preliminary change detection network is composed of image channel concatenation, a U-Net encoder and a change information extraction module, where the change information extraction module consists of a subtraction operation and a cross-attention mechanism.
c-2) The aligned images $R'$ and $L'$ are channel-combined with the corresponding images L and R to obtain the image pairs $LR'$ and $RL'$; the two combined image pairs are respectively input into the U-Net encoder, which outputs two groups of five intermediate feature maps at different scales, $F^i_{LR'}$ and $F^i_{RL'}$, $i \in \{1,2,3,4,5\}$;
c-3) The channels of the two generated feature maps are split in half: $F^i_{LR'}$ is split into $f^i_L$ and $f^i_{R'}$, where $f^i_L$ denotes the feature map corresponding to image L and $f^i_{R'}$ denotes the feature map corresponding to image $R'$; $F^i_{RL'}$ is split into $f^i_R$ and $f^i_{L'}$, where $f^i_R$ denotes the feature map corresponding to image R and $f^i_{L'}$ denotes the feature map corresponding to image $L'$;
c-4) Using the change extraction module to process $f^i_L, f^i_{R'}$ and $f^i_R, f^i_{L'}$ to obtain the change information $D_L$ and $D_R$ corresponding to images L and R. Taking the change information of image L at the first level as an example: in the first-level intermediate feature maps, the subtraction $f^1_L - f^1_{R'}$ is performed, and the resulting feature map is fused with $f^1_L$ to obtain the change information of image L in the first-level intermediate feature map, $D^1_L = \mathrm{Fuse}(f^1_L - f^1_{R'},\, f^1_L)$, where $D^1_L$ is the change information of image L at the first level, $f^1_L$ is the first-level intermediate feature map of image L, $f^1_{R'}$ is the first-level intermediate feature map of image $R'$, and $\mathrm{Fuse}$ is the fusion mechanism. In the same way, $D^1_R = \mathrm{Fuse}(f^1_R - f^1_{L'},\, f^1_R)$ is obtained, where $D^1_R$ is the change information of image R at the first level, $f^1_R$ is the first-level intermediate feature map of image R, and $f^1_{L'}$ is the first-level intermediate feature map of image $L'$. In the second through fifth levels of the two groups of intermediate feature maps, $i \in \{2,3,4,5\}$, taking the change information of image L as an example: first the subtraction $S^i_L = f^i_L - f^i_{R'}$ is performed; then cross-attention is applied to $f^i_L$ and $f^i_{R'}$ to obtain $C^i_L = \mathrm{CA}(f^i_L, f^i_{R'})$; the sum $S^i_L + C^i_L$ is then fused with $f^i_L$ to obtain the change information of image L at level $i$, $D^i_L = \mathrm{Fuse}(S^i_L + C^i_L,\, f^i_L)$, where $\mathrm{Fuse}$ is the fusion mechanism and $\mathrm{CA}$ is the cross-attention mechanism. Similarly, the change information $D^i_R$ of image R at the second through fifth levels is obtained. The change information of image L is collectively denoted $D_L = \{D^1_L, \dots, D^5_L\}$, and the change information of image R is collectively denoted $D_R = \{D^1_R, \dots, D^5_R\}$.
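The channel combination of step c) and the level-wise change extraction of c-4) could be sketched as follows; the 1×1-convolution fusion and the multi-head cross-attention are assumed concrete realizations of the "fusion" and "cross-attention" mechanisms named above, not the patented designs:

```python
# Sketch of step c): 6-channel inputs and one level of change extraction.
import torch
import torch.nn as nn

def make_inputs(L, R, L_prime, R_prime):
    # Channel combination per step c): dim 0 is the channel axis (CHW).
    LR = torch.cat([L, R_prime], dim=0)  # 6×h×w image LR'
    RL = torch.cat([R, L_prime], dim=0)  # 6×h×w image RL'
    return LR, RL

class ChangeExtraction(nn.Module):
    """One level of c-4): D = Fuse(subtract [+ cross-attention], f_L)."""
    def __init__(self, channels: int, use_attention: bool = True):
        super().__init__()
        self.use_attention = use_attention  # False for the first level
        if use_attention:
            self.attn = nn.MultiheadAttention(channels, num_heads=4,
                                              batch_first=True)
        # Fusion assumed as channel concatenation followed by a 1×1 conv.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_l, f_r_aligned):
        s = f_l - f_r_aligned                   # subtraction branch S
        if self.use_attention:                  # levels 2-5 only
            b, c, h, w = f_l.shape
            q = f_l.flatten(2).transpose(1, 2)  # B×(h·w)×C queries from f_L
            kv = f_r_aligned.flatten(2).transpose(1, 2)
            ca, _ = self.attn(q, kv, kv)        # cross-attention CA
            s = s + ca.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([s, f_l], dim=1))  # change info D
```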
In this embodiment, step d) includes the steps of:
d-1) The change information $D_L$ and $D_R$ generated by the preliminary change detection network is upsampled and decoded by a U-Net decoder, finally generating feature maps $M_L$ and $M_R$ at the original image resolution;
d-2) The feature maps $M_L$ and $M_R$ are input into the target bounding box prediction component, which outputs the changed regions of the two images and generates a bounding box around each region.
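Step d) can be read as a U-Net-style decoder followed by a detection head. The sketch below assumes a CenterNet-style head (center heatmap, offset and box size), which is consistent with the keypoint and offset losses of step e) but is an assumption rather than the prescribed design:

```python
# Sketch of step d): decoded full-resolution features -> bounding boxes.
import torch.nn as nn

class BoxHead(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.heatmap = nn.Conv2d(channels, 1, kernel_size=1)  # change centers
        self.offset = nn.Conv2d(channels, 2, kernel_size=1)   # sub-pixel offset
        self.size = nn.Conv2d(channels, 2, kernel_size=1)     # box width/height

    def forward(self, feat):
        # feat: B×C×H×W decoder output at the original image resolution.
        return (self.heatmap(feat).sigmoid(),
                self.offset(feat),
                self.size(feat))
```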
In this embodiment, step e) includes the steps of:
e-1) The preprocessed image pairs are divided into a training set, a validation set and a test set at a ratio of 20:1:2;
e-2) The network is trained with a keypoint loss and an offset loss; the overall objective is optimized with Adam (learning rate 0.00001, weight decay 0.0005) under a DDP training strategy with a batch size of 16; training runs for 200 iterations, and validation on the validation set is performed every epoch.
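A training-loop sketch matching the settings of e-2); the loss functions and data loaders are injected placeholders, and the DDP wrapping assumes an already-initialized process group:

```python
# Sketch of step e): Adam, lr 1e-5, weight decay 5e-4, batch size 16,
# 200 epochs, DDP, validation every epoch. Losses are placeholders for
# the keypoint and offset losses named in e-2).
import torch

def train(model, train_loader, val_loader, keypoint_loss, offset_loss,
          validate):
    model = torch.nn.parallel.DistributedDataParallel(model)
    opt = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=5e-4)
    for epoch in range(200):
        model.train()
        for images, targets in train_loader:  # loader built with batch_size=16
            heatmap, offset, size = model(images)
            loss = keypoint_loss(heatmap, targets) + offset_loss(offset, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
        validate(model, val_loader)  # validation every epoch, per e-2)
```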
To verify the effectiveness of the present invention, evaluations were performed on the COCO-INPAINTED, Synthtext-Change, VIRAT-STD and Kubric-Change datasets. COCO-INPAINTED is a change-detection test set that we collated from the COCO test subset. In this embodiment, the test set is divided into three categories according to the size of the changed object (small, medium and large), with "all" denoting the union of the three categories; we collated 1655 image pairs for small objects, 1747 pairs for medium objects and 1006 pairs for large objects, for a total of 4408 pairs in the COCO-INPAINTED test set. The Synthtext-Change dataset adds random text to "background" images by synthesis techniques, generating 5000 image pairs in a geometrically consistent manner. To detect changes in outdoor scenes, 1000 image pairs were randomly selected from the VIRAT-STD dataset; since STD does not provide change ground truth, an automated tool was used to obtain it. Because the camera is static, there is a single, identical geometric transformation between the images, but the photometric conditions may change with time of day, weather conditions, and so on. The Kubric-Change dataset consists of 1605 changing pairs of realistic images, where each scene consists of a set of randomly selected 3D objects lying on a randomly textured ground plane; for a given scene, objects are iteratively removed and "before" and "after" image pairs are captured.
For quantitative evaluation, following prior related work, we compute the average precision (AP) as the evaluation metric based on the predicted bounding boxes and the ground-truth bounding boxes.
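For reference, a box-level AP under an IoU threshold can be computed along the following lines (an illustrative sketch; the exact protocol follows the prior related work, not this code):

```python
# Illustrative AP computation from predicted and ground-truth boxes.
import numpy as np

def iou(a, b):
    # Boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def average_precision(preds, gts, thr=0.5):
    # preds: list of (score, box); gts: list of ground-truth boxes.
    preds = sorted(preds, key=lambda p: -p[0])  # descending confidence
    matched, hits = set(), []
    for _, box in preds:
        best, best_iou = None, 0.0
        for j, g in enumerate(gts):
            v = iou(box, g)
            if j not in matched and v >= thr and v > best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
        hits.append(1.0 if best is not None else 0.0)
    tp = np.cumsum(hits)
    recall = tp / max(len(gts), 1)
    precision = tp / np.arange(1, len(tp) + 1)
    return float(np.trapz(precision, recall))  # area under the P-R curve
```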
A comparison between classical image change detection algorithms and the present invention is shown in the following table. The experiments use 200 epochs, the Adam optimizer with a default learning rate of 0.00001, and a weight decay of 0.0005; to enhance the model's ability to fit the data, random affine transformation, contrast enhancement, illumination enhancement and saturation enhancement are applied as data augmentation.
Table 1. Comparison of the currently optimal change detection model with the present invention on different datasets
The CYWS model is the optimal change detection model in the current research field. From Table 1 it can be seen that, compared with the CYWS model, the proposed model achieves excellent performance on the COCO-INPAINTED and VIRAT-STD datasets, while its performance on the other datasets remains stable.
Example 2
The embodiment of the invention provides a scene change detection device based on deep learning, comprising: a homography-based alignment module, which processes the two preprocessed images L and R to obtain the corresponding aligned images $L'$ and $R'$; a preliminary change detection network module, comprising a feature extraction module and a change extraction module, into which the channel-combined preprocessed and aligned images are input to obtain the change information of the two images; and a positioning network module, comprising a feature fusion module and a frame detection module, which obtains the bounding boxes of the changed regions of the two images L and R.
Example 3
Embodiments of the present invention provide a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the scene change detection method. The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Example 4
An embodiment of the application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the scene change detection method when executing the program. The computer device in the embodiment of the application may be a terminal or a server, and the terminal may be a terminal device such as a smartphone, a tablet computer, a notebook computer, a touch-screen device, a game console, a personal computer (PC) or a personal digital assistant (PDA).
While the foregoing embodiments of the present invention have been described with reference to the drawings, they do not limit the scope of the invention; various modifications or variations that can be made by those skilled in the art, without inventive effort, on the basis of the technical solutions of the present invention remain within its scope.

Claims (7)

1. The scene change detection method based on deep learning is characterized by comprising the following steps of:
a) preprocessing the two original images to obtain two preprocessed images $L, R \in \mathbb{R}^{3\times h\times w}$, where L and R are the two images, $\mathbb{R}$ indicates that the matrix elements are real numbers, $h$ is the image height, $w$ is the image width, and 3 is the number of image channels, so that image L is represented by a real matrix of shape $3\times h\times w$ and image R is represented by a real matrix of shape $3\times h\times w$;
b) constructing a homography-based alignment module and inputting the two preprocessed images L and R into the module to obtain the corresponding aligned images $L'$ and $R'$, where $L'$ is image L aligned to the coordinate system of image R and $R'$ is image R aligned to the coordinate system of image L, so that the spatial positions of $L'$ and R are consistent and the spatial positions of $R'$ and L are consistent;
c) constructing a preliminary change detection network composed of a feature extraction module and a change extraction module, and channel-combining the two aligned images $L'$ and $R'$ with the corresponding preprocessed images R and L: L and $R'$ are combined along the channel dimension to obtain a 6-channel image $LR' \in \mathbb{R}^{6\times h\times w}$, where h and w are the image height and width; R and $L'$ are combined along the channel dimension to obtain a 6-channel image $RL' \in \mathbb{R}^{6\times h\times w}$; the combined images $LR'$ and $RL'$ are respectively input into the corresponding preliminary change detection networks to obtain the change information $D_L$ and $D_R$ of images L and R, where $LR'$ is the combination of the preprocessed image L with the aligned image $R'$, $RL'$ is the combination of the preprocessed image R with the aligned image $L'$, $D_L$ is the change information of image L, and $D_R$ is the change information of image R;
wherein step c) comprises the following steps:
c-1) the preliminary change detection network is composed of image channel concatenation, a U-Net encoder and a change information extraction module, where the change information extraction module consists of a subtraction operation and a cross-attention mechanism;
c-2) the aligned images $R'$ and $L'$ are channel-combined with the corresponding images L and R to obtain the image pairs $LR'$ and $RL'$; the two combined image pairs are respectively input into the U-Net encoder, which outputs two groups of five intermediate feature maps at different scales, $F^i_{LR'}$ and $F^i_{RL'}$, $i \in \{1,2,3,4,5\}$;
c-3) the channels of the two generated feature maps are split in half: $F^i_{LR'}$ is split into $f^i_L$ and $f^i_{R'}$, where $f^i_L$ denotes the feature map corresponding to image L and $f^i_{R'}$ denotes the feature map corresponding to image $R'$; $F^i_{RL'}$ is split into $f^i_R$ and $f^i_{L'}$, where $f^i_R$ denotes the feature map corresponding to image R and $f^i_{L'}$ denotes the feature map corresponding to image $L'$;
c-4) using the change extraction module to process $f^i_L, f^i_{R'}$ and $f^i_R, f^i_{L'}$ to obtain the change information $D_L$ and $D_R$ corresponding to images L and R;
in the first-level intermediate feature maps, to acquire the change information of image L at the first level, the subtraction $f^1_L - f^1_{R'}$ is performed and the resulting feature map is fused with $f^1_L$,
thereby obtaining the change information of image L in the first-level intermediate feature map, $D^1_L = \mathrm{Fuse}(f^1_L - f^1_{R'},\, f^1_L)$, where $D^1_L$ is the change information of image L at the first level, $f^1_L$ is the first-level intermediate feature map of image L, $f^1_{R'}$ is the first-level intermediate feature map of image $R'$, and $\mathrm{Fuse}$ is the fusion mechanism;
by the same procedure as the acquisition of the change information of image L at the first level, $D^1_R = \mathrm{Fuse}(f^1_R - f^1_{L'},\, f^1_R)$ is obtained, where $D^1_R$ is the change information of image R at the first level, $f^1_R$ is the first-level intermediate feature map of image R, and $f^1_{L'}$ is the first-level intermediate feature map of image $L'$;
in the second through fifth levels of the intermediate feature maps, $i \in \{2,3,4,5\}$, the change information of images L and R is acquired as follows:
first the subtraction $S^i_L = f^i_L - f^i_{R'}$ is performed; then cross-attention is applied to $f^i_L$ and $f^i_{R'}$ to obtain $C^i_L = \mathrm{CA}(f^i_L, f^i_{R'})$; the sum $S^i_L + C^i_L$ is fused with $f^i_L$ to obtain the change information of image L at level $i$, $D^i_L = \mathrm{Fuse}(S^i_L + C^i_L,\, f^i_L)$, where $D^i_L$ is the change information of image L in the second- through fifth-level intermediate feature maps, $f^i_L$ is the intermediate feature map of image L at the second through fifth levels, $f^i_{R'}$ is the intermediate feature map of image $R'$ at the second through fifth levels, $\mathrm{Fuse}$ is the fusion mechanism, and $\mathrm{CA}$ is the cross-attention mechanism;
the change information $D^i_R$ of image R at the second through fifth levels is obtained by the same steps as for image L; the change information of image L is collectively denoted $D_L = \{D^1_L, \dots, D^5_L\}$ and the change information of image R is collectively denoted $D_R = \{D^1_R, \dots, D^5_R\}$;
d) constructing a positioning network composed of a feature fusion module and a frame detection module, inputting the change information $D_L$ and $D_R$ obtained by the preliminary change detection network into the corresponding positioning networks, which then output the bounding boxes of the changed regions of the two images L and R;
e) Training a positioning network.
2. The scene change detection method based on deep learning according to claim 1, wherein the step b) includes the steps of:
b-1) the homography-based alignment module consists of image feature point matching (feature point detection, feature point description and feature point matching) and image alignment (homography transformation matrix computation and image registration);
b-2) inputting the preprocessed images L and R into the feature point matching stage of the alignment module, obtaining the feature points of each image, matching the feature points of the two images, and outputting the successfully matched feature points $KP_L$ and $KP_R$, where $KP_L$ are points with distinctive local structure in image L and $KP_R$ are points with distinctive local structure in image R;
b-3) inputting the matched feature points $KP_L$ and $KP_R$ into the homography transformation matrix computation to obtain the transformation matrices $H_{L\to R}$ and $H_{R\to L}$, where $H_{L\to R}$ is the transformation matrix aligning image L to image R and $H_{R\to L}$ is the transformation matrix aligning image R to image L; the computed matrices are then applied to the corresponding images to realize image alignment, outputting the aligned images $L'$ and $R'$, where $L'$ is image L aligned to the scene of image R and $R'$ is image R aligned to the scene of image L.
3. The scene change detection method based on deep learning according to claim 1, wherein the step d) includes the steps of:
d-1) the change information $D_L$ and $D_R$ generated by the preliminary change detection network is upsampled and decoded by a U-Net decoder, finally generating feature maps $M_L$ and $M_R$ at the original image resolution;
d-2) the feature maps $M_L$ and $M_R$ are input into the target bounding box prediction component, which outputs the changed regions of the two images and generates a bounding box around each region.
4. The scene change detection method based on deep learning according to claim 1, wherein the step e) includes the steps of:
e-1) the preprocessed image pairs are divided into a training set, a validation set and a test set at a ratio of 20:1:2;
e-2) the network is trained with a keypoint loss and an offset loss; the overall objective is optimized with Adam (learning rate 0.00001, weight decay 0.0005) under a DDP training strategy with a batch size of 16; training runs for 200 iterations, and validation on the validation set is performed every epoch.
5. A scene change detection device based on deep learning, configured to perform the steps of the scene change detection method according to any one of claims 1 to 4, comprising:
a homography-based alignment module, which processes the two preprocessed images L and R to obtain the corresponding aligned images $L'$ and $R'$;
a preliminary change detection network module, comprising a feature extraction module and a change extraction module, into which the channel-combined preprocessed and aligned images are input to obtain the change information of the two images; and
a positioning network module, comprising a feature fusion module and a frame detection module, which obtains the bounding boxes of the changed regions of the two images L and R.
6. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the scene change detection method of any of claims 1 to 4.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the scene change detection method according to any of claims 1 to 4 when the program is executed.
CN202410487285.XA 2024-04-23 Scene change detection method, device, medium and equipment based on deep learning Active CN118097566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410487285.XA CN118097566B (en) 2024-04-23 Scene change detection method, device, medium and equipment based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410487285.XA CN118097566B (en) 2024-04-23 Scene change detection method, device, medium and equipment based on deep learning

Publications (2)

Publication Number Publication Date
CN118097566A CN118097566A (en) 2024-05-28
CN118097566B (en) 2024-06-28


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160291A (en) * 2021-04-12 2021-07-23 华雁智科(杭州)信息技术有限公司 Change detection method based on image registration
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device


Similar Documents

Publication Publication Date Title
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN110992238B (en) Digital image tampering blind detection method based on dual-channel network
CN107481279A (en) A kind of monocular video depth map computational methods
CN103971399A (en) Street view image transition method and device
CN107767358B (en) Method and device for determining ambiguity of object in image
CN110825900A (en) Training method of feature reconstruction layer, reconstruction method of image features and related device
CN112085835A (en) Three-dimensional cartoon face generation method and device, electronic equipment and storage medium
Swaminathan et al. Component forensics
Liu et al. Overview of image inpainting and forensic technology
CN112329771A (en) Building material sample identification method based on deep learning
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN114677722A (en) Multi-supervision human face in-vivo detection method integrating multi-scale features
Hassner et al. SIFTing through scales
Matusiak et al. Unbiased evaluation of keypoint detectors with respect to rotation invariance
CN112149528A (en) Panorama target detection method, system, medium and equipment
Ma et al. Light field image quality assessment using natural scene statistics and texture degradation
CN118097566B (en) Scene change detection method, device, medium and equipment based on deep learning
CN116188956A (en) Method and related equipment for detecting deep fake face image
CN116128919A (en) Multi-temporal image abnormal target detection method and system based on polar constraint
CN118097566A (en) Scene change detection method, device, medium and equipment based on deep learning
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information
Nasiri et al. Exposing forgeries in soccer images using geometric clues
RU2538319C1 (en) Device of searching image duplicates
CN114612798B (en) Satellite image tampering detection method based on Flow model
Borkowski 2d to 3d conversion with direct geometrical search and approximation spaces

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant