CN116091868A - Online video anti-shake device, online video anti-shake method and learning method thereof - Google Patents

Online video anti-shake device, online video anti-shake method and learning method thereof

Info

Publication number
CN116091868A
CN116091868A (application CN202310102762.1A)
Authority
CN
China
Prior art keywords
video
frame
shake
motion
inter
Prior art date
Legal status
Pending
Application number
CN202310102762.1A
Other languages
Chinese (zh)
Inventor
刘帅成
张卓凡
刘震
曾兵
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202310102762.1A priority Critical patent/CN116091868A/en
Publication of CN116091868A publication Critical patent/CN116091868A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/04 Synchronising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The invention discloses an online video anti-shake device, an online video anti-shake method and a learning method thereof, belonging to the technical field of video processing. The learning method for video anti-shake comprises the following steps: acquiring training data; and training a neural network model on the training data. Acquiring training data comprises: obtaining a shaky video and a stable video; extracting the first inter-frame motion of the shaky video; transforming each frame of the stable video based on the first inter-frame motion of the shaky video to obtain a processed video; and using the stable video and the processed video as training data. The learning method transfers the motion of a shaky video onto a stable video, synthesizing an unstable video that corresponds to the original stable video; the original stable video and its corresponding unstable video are then used as the training data required by the video anti-shake method. The invention does not require the stable video and the shaky video to be shot synchronously, and their picture content may be unrelated.

Description

Online video anti-shake device, online video anti-shake method and learning method thereof
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to an online video anti-shake device, an online video anti-shake method and a learning method thereof.
Background
Video anti-shake aims to convert a shaky video into a satisfactory stable video by smoothing the camera trajectory, and is now widely applied in fields such as smartphones, unmanned aerial vehicles and security. Video anti-shake currently falls into three main categories: mechanical anti-shake, optical anti-shake and digital anti-shake. Mechanical anti-shake typically accomplishes the task with sensors and mechanical structures. Optical anti-shake detects the angle and speed of motion through a set of lenses and sensors to achieve video stabilization. Digital anti-shake is implemented purely in software, without dedicated hardware, so digital video anti-shake can be regarded as a problem in the fields of video processing and computer vision. Because digital anti-shake relies solely on software algorithms, besides saving cost and removing the requirement for dedicated equipment, it is also the only way to stabilize video that has already been recorded.
Digital video anti-shake can be considered in two different settings: offline anti-shake and online anti-shake. In the offline case, information from all frames of the video can be used, which yields better results, especially for post-processing of recorded video. In the online case, no future frames are used for anti-shake, and the video can be stabilized immediately while it is being recorded, which makes the online anti-shake method important for real-time streaming scenarios.
The traditional digital anti-shake method detects feature points in video frames, then estimates a 2D transformation such as a homography, optical flow or mesh flow (MeshFlow), or estimates the 3D camera pose, as the motion representation, and finally smooths the camera path formed by these motions to realize video anti-shake. Conventional deep-learning-based anti-shake methods use a neural network model, such as a convolutional neural network, to directly learn the mapping from unstable video to stable video. However, the conventional approaches have the following drawbacks: 1. The traditional method is limited by the feature algorithm; feature detection and tracking can fail on low-quality video, causing anti-shake to fail. 2. Although deep-learning methods perform well on low-quality video, they depend heavily on the quality and quantity of training data, and since they usually take video frames directly as input, they are also affected by picture texture. 3. Deep-learning training data for video anti-shake is collected by dual-camera shooting: two video recording devices of identical model, one with and one without an external mechanical stabilizer, synchronously shoot stable/unstable video pairs, which is costly, inefficient and prone to path divergence.
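For illustration, the traditional feature-based pipeline described above can be sketched with standard OpenCV primitives. The following is a minimal sketch of the prior art, not the method of this application; the feature counts and smoothing radius are arbitrary choices.

```python
# Minimal sketch of the traditional feature-based anti-shake pipeline
# (illustrative prior art only; not the method of this application).
import cv2
import numpy as np

def estimate_inter_frame_transforms(frames):
    """Estimate a 2D similarity transform between consecutive frames."""
    transforms = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=30)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
        good_prev = pts[status.flatten() == 1]
        good_next = nxt[status.flatten() == 1]
        # Feature-based estimation; may fail on low-quality frames (drawback 1).
        m, _ = cv2.estimateAffinePartial2D(good_prev, good_next)
        transforms.append(m)
    return transforms

def smooth_path(transforms, radius=15):
    """Moving-average smoothing of the accumulated (dx, dy, angle) camera path."""
    path = np.cumsum([[m[0, 2], m[1, 2], np.arctan2(m[1, 0], m[0, 0])]
                      for m in transforms], axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(path, ((radius, radius), (0, 0)), mode='edge')
    return np.stack([np.convolve(padded[:, i], kernel, mode='valid')
                     for i in range(3)], axis=1)
```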
Disclosure of Invention
The invention provides an online video anti-shake device, an online video anti-shake method and a learning method thereof, which can synthesize training data for video anti-shake tasks without dual-camera shooting.
The invention is realized by the following technical scheme:
In one aspect, the present invention provides a learning method for video anti-shake, comprising the steps of: acquiring training data; and training a neural network model on the training data. Acquiring training data comprises: obtaining a shaky video and a stable video; extracting the first inter-frame motion of the shaky video; transforming each frame of the stable video based on the first inter-frame motion of the shaky video to obtain a processed video; and using the stable video and the processed video as training data.
In some of these embodiments, the loss function of the neural network model to be trained is:
L = L_MC + α·L_SC + β·L_SP

where L_MC is the motion consistency loss function, L_SC is the shape consistency loss function, L_SP is the scale-preserving loss function, and α and β are balance parameters used to balance the contributions of the three loss functions.
In some of these embodiments, the motion consistency loss function is:

L_MC: [formula given as an image in the original publication]

where B′_t and B′_{t-1} denote the transformed field maps of two adjacent frames estimated by the network, and B̂_t and B̂_{t-1} denote the ground-truth transformed field maps of the two adjacent frames;

the shape consistency loss function is:

L_SC: [formula given as an image in the original publication]

where v_i denotes the i-th mesh vertex and N denotes the total number of mesh vertices;

the scale-preserving loss function is:

L_SP: [formula given as an image in the original publication]

where s denotes a scale factor.
In another aspect, the present application provides a minimum-delay online video anti-shake method, comprising the steps of: obtaining an unstable frame in a video; extracting, through a preset neural network model, the second inter-frame motion of the video formed by the unstable frame and the consecutive frames preceding it; performing path smoothing on the unstable frame based on the second inter-frame motion and the trained neural network model to obtain a transformed field map; and resetting the unstable frame through the transformed field map.
In some of these embodiments, resetting the unstable frame through the transformed field map comprises the steps of: adjusting the positions of all pixels on the unstable frame according to the displacement vectors of all pixel points provided by the transformed field map, to obtain a stable frame.
In some of these embodiments, the neural network model being trained is a convolutional neural network model.
In some of these embodiments, the second inter-frame motion is represented in the form of a sparse mesh. After extracting the second inter-frame motion of the video formed by the unstable frame and the preceding consecutive frames, and before performing path smoothing on the unstable frame based on the second inter-frame motion and the trained neural network model to obtain a transformed field map, the method comprises the following steps: processing the input data of the convolutional neural network model, namely interpolating the sparse mesh formed by the second inter-frame motion to obtain a flow field map, the flow field map comprising a channel dimension and height and width dimensions; and concatenating the flow field maps along the channel dimension in temporal order using the sliding window to form the input data of the convolutional neural network model.
The application also provides a minimum-delay online video anti-shake device, comprising: a motion extraction device for extracting the second inter-frame motion of the video; a path smoothing device for smoothing the path of the video; a memory having a computer program stored thereon; and a processor executing the computer program to implement the minimum-delay online video anti-shake method of any of the above embodiments.
Compared with the prior art, the invention has the following advantages:
according to the learning method for video anti-shake, the motion of one shake video is transferred to one stable video, so that an unstable video corresponding to the original stable video is synthesized, and then the original stable video and the corresponding unstable video are used as training data required by the video anti-shake method. The invention does not need to carry out synchronous shooting on the stable video and the jittering video, and the picture content can be irrelevant.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly describe the drawings in the embodiments, it being understood that the following drawings only illustrate some embodiments of the present invention and should not be considered as limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a relationship between motion and a transformed field diagram of two adjacent frames in a video anti-shake method based on a deep learning method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a synthesis relationship of a processed video in a video anti-shake method based on a deep learning method according to an embodiment of the present invention;
fig. 3 is a flowchart of a video anti-shake method based on a deep learning method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the relationship between the motion and the transformed field maps of two adjacent frames in the loss function according to an embodiment of the present invention;
fig. 5 is a comparison chart of effects of a video anti-shake method based on a deep learning method according to an embodiment of the present invention;
FIG. 6 is a path diagram of a prior art dual camera video;
fig. 7 is a path diagram of a video anti-shake method based on a deep learning method according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings, or those conventionally assumed when the product of the present invention is used; they are merely used to facilitate and simplify the description of the present invention, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, terms such as "horizontal" and "vertical" in the description of the present invention, if any, do not denote absolutely horizontal or vertical structures; a "horizontal" structure may be slightly inclined, the term merely meaning that its direction is closer to horizontal than "vertical" is, not that the structure must be perfectly horizontal.
In the description of the present invention, it should also be noted that, unless explicitly stated and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" should be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
The terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
In one aspect, an embodiment of the present application provides a learning method for video anti-shake, including the following steps:
s10, training data are acquired. In S10, first, the first inter-frame motion of a jittered video is extracted using video motion estimation, the motion is expressed in the form of a grid stream, and then each frame of a stabilized video is transformed based on these first inter-frame motions, thereby obtaining a new jittered video. The first inter-frame motion of the plum is used for distinguishing from the second inter-frame motion, and the first inter-frame motion refers to the inter-frame motion of the acquired known jittering video in the process of acquiring training data; the second inter-frame motion is in the video anti-shake process, and the inter-frame motion of appointed continuous frames in the acquired video to be processed is acquired. The method does not need to carry out synchronous shooting on the stable video and the jittering video, and the picture content can be irrelevant.
S10 may specifically include the following steps:
s101, obtaining jittering video V ust And stabilizing video V stb . Wherein the video V is dithered ust And stabilizing video V stb May be uncorrelated, i.e. jittery video V ust Content and stabilized video V of (2) stb May be different.
S102: extracting the first inter-frame motion of the shaky video. In S102, a deep neural network model such as DeepMeshFlow may be used to estimate the first inter-frame motions {F_t^ust} and {F_t^stb} of the shaky video V_ust and the stable video V_stb.
s103, stabilizing the video V based on the first inter-frame motion of the jittery video stb Each frame is transformed to obtain a new processed video V syn . In S103, a video V with jitter is synthesized by migrating the first inter-frame motion of the jittered video onto a stable video ust Is a dithering effect but picture and main path and stabilizing video V stb New processed video V that remains consistent syn For convenience of description, use
Figure BDA0004073734850000063
Figure BDA0004073734850000064
Frames respectively representing the three videos are obtained by +.>
Figure BDA0004073734850000065
To->
Figure BDA0004073734850000066
Transform to synthesize +.>
Figure BDA0004073734850000067
Figure BDA0004073734850000068
With this arrangement, every stable video can be synthesized into a new processed video, and each stable video together with its corresponding synthesized processed video forms a stable/shaky video pair for network training. Referring to Fig. 2, the motions of the three videos satisfy a fixed relationship (the equation is given as an image in the original publication). Since {F_t^ust} and {F_t^stb} have been calculated beforehand, the ground-truth transformed field map B̂_t can be expressed in terms of them (the expression is given as an image in the original publication). In subsequent training, the path smoothing network takes the inter-frame motions of the synthesized video as input, and its output is supervised against the ground-truth transformed field maps.
S104: using the stable video and the corresponding processed video as training data.
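For illustration, the control flow of steps S101 to S104 can be sketched as follows. The callables estimate_inter_frame_motion and warp_fn are hypothetical stand-ins for the DeepMeshFlow-style estimator and the mesh-based warping of the embodiment, and the simple additive accumulation of motion is likewise an assumption, since the exact composition of motions is given only in the figures.

```python
def synthesize_training_pair(shaky_frames, stable_frames,
                             estimate_inter_frame_motion, warp_fn):
    """Sketch of S101-S104: transfer the inter-frame motion of a shaky
    video onto an unrelated stable video to obtain a (stable, shaky)
    training pair. `estimate_inter_frame_motion(prev, curr)` and
    `warp_fn(frame, motion)` are hypothetical helpers standing in for
    the DeepMeshFlow-style estimator and mesh-based warping."""
    processed = [stable_frames[0]]            # V_syn starts at the stable frame
    accumulated = None
    for t in range(1, min(len(shaky_frames), len(stable_frames))):
        # First inter-frame motion of the shaky video (S102).
        motion = estimate_inter_frame_motion(shaky_frames[t - 1], shaky_frames[t])
        # Simplified accumulation; the exact composition is an assumption.
        accumulated = motion if accumulated is None else accumulated + motion
        # Transform the stable frame by the transferred motion (S103).
        processed.append(warp_fn(stable_frames[t], accumulated))
    # The stable video and the processed video form the training pair (S104).
    return stable_frames, processed
```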
S20: training the neural network model on the training data.
In the deep learning method, the loss functions used by the neural network model to be trained are mainly the following:
Motion consistency loss function (Motion-consistency Loss):

L_MC: [formula given as an image in the original publication]

where B′_t and B′_{t-1} denote the transformed field maps of two adjacent frames estimated by the network, and B̂_t and B̂_{t-1} denote the ground-truth transformed field maps of the two adjacent frames. The motion consistency loss function constrains the network to learn a reasonable anti-shake result while maintaining inter-frame continuity.
Shape consistency loss function (Shape-consistency Loss):

L_SC: [formula given as an image in the original publication]

where v_i denotes the i-th mesh vertex (the different mesh vertices are shown in Fig. 4) and N denotes the total number of mesh vertices. The shape consistency loss function constrains the output of the convolutional neural network model from deviating greatly from the regular mesh shape; otherwise the resulting picture would be warped and distorted.
Scale-preserving loss function (Scale-preserving Loss):

L_SP: [formula given as an image in the original publication]

where s denotes a scale factor. Because the sparse mesh-form motion is converted into a dense flow field map while a mesh-like transformed field map is predicted, a scale-preserving loss function must be introduced to ensure that the network keeps the output consistent under this scale transformation.
This gives the final total loss function:

L = L_MC + α·L_SC + β·L_SP

where α and β are balance parameters used to balance the contributions of the three loss functions; both may take the value 0.01.
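Because the exact expressions of the three terms appear only as images in the original publication, the sketch below implements plausible stand-ins under explicit assumptions: L_MC as an L1 penalty on the predicted transformed field maps and on their temporal difference, L_SC as the deviation of mesh edges from their mean edge vector, and L_SP as a penalty keeping the scale factor near 1. The true formulas may differ.

```python
import torch

def total_loss(B_t, B_tm1, B_t_gt, B_tm1_gt, mesh_vertices, s,
               alpha=0.01, beta=0.01):
    """Hedged stand-in for L = L_MC + alpha * L_SC + beta * L_SP.
    The patent gives the exact formulas only as images; every term below
    is an assumption consistent with the surrounding description.
    B_t, B_tm1:       predicted transformed field maps, (B, 2, H, W)
    B_t_gt, B_tm1_gt: ground-truth transformed field maps, (B, 2, H, W)
    mesh_vertices:    warped mesh vertices, (B, Hm, Wm, 2)
    s:                scale factor tensor"""
    # Motion consistency (assumed): match each field and its temporal change.
    l_mc = (torch.abs(B_t - B_t_gt).mean()
            + torch.abs((B_t - B_tm1) - (B_t_gt - B_tm1_gt)).mean())

    # Shape consistency (assumed): mesh edges should stay close to the
    # average edge, i.e. the mesh should not deviate far from a regular grid.
    dx = mesh_vertices[:, :, 1:, :] - mesh_vertices[:, :, :-1, :]
    dy = mesh_vertices[:, 1:, :, :] - mesh_vertices[:, :-1, :, :]
    l_sc = ((dx - dx.mean(dim=(1, 2), keepdim=True)).abs().mean()
            + (dy - dy.mean(dim=(1, 2), keepdim=True)).abs().mean())

    # Scale preservation (assumed): keep the scale factor close to 1.
    l_sp = (s - 1.0).abs().mean()

    return l_mc + alpha * l_sc + beta * l_sp
```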
On the other hand, the application provides a video anti-shake method based on the deep learning method of any of the above embodiments. It first estimates the second inter-frame motion of the input video with a neural network model; specifically, a deep neural network model may be adopted. Preferably, the trained neural network model is a convolutional neural network model: a sliding window feeds the second inter-frame motion sequence of the input video into a convolutional neural network model with an attention mechanism for path smoothing, the network outputs the transformed field map of the last frame of the sliding window, and finally the shape and position of the last frame in the window are transformed through the transformed field map to realize anti-shake. Different motion estimation methods express motion in different forms; motions estimated by different methods are therefore converted into a unified dense flow field map according to the per-pixel offsets they induce. This solves the problem of inconsistent motion representations and makes the representation naturally suitable as input to a convolutional neural network model.
Specifically, the video anti-shake method includes the following steps:
T10: obtaining an unstable frame in the video. In T10, the unstable frame can be captured directly by existing software; take as an example the unstable frame I_t captured by the video recording device at time t.
T20: extracting, through a preset neural network model, the second inter-frame motion of the video formed by the unstable frame I_t captured at time t and the consecutive frames preceding it. The preset neural network model may be the same as the deep neural network model in step S102. A fixed window is then used to record the past r video frames of I_t, {I_t}_r = <I_t, I_{t-1}, ..., I_{t-r}>, and these are used to stabilize I_t. Since the whole process uses no frame after I_t, the frame can be stabilized and output immediately after being captured; the method is therefore a minimum-delay method. Given the second inter-frame motions {F_t} (the motion estimation itself may be handled by another deep neural network model), the path smoothing network of the present application predicts transformed field maps based solely on the estimated motion:

{B′_t} = φ({F_t}; θ)

where φ(·) denotes the camera path smoothing network and θ denotes the network parameters to be optimized.
T30: the second inter-frame motion is represented in the form of a sparse mesh, so the input data of the convolutional neural network model is processed as follows: a flow field map is obtained by interpolating the sparse mesh formed by the second inter-frame motion; the flow field map comprises a channel dimension and height and width dimensions; and the flow field maps within the sliding window are concatenated along the channel dimension in temporal order to form the input data of the convolutional neural network model.
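A minimal sketch of this preprocessing, assuming the sparse mesh motion is stored as a (2, Hm, Wm) tensor of per-vertex displacements and that the interpolation is bilinear (the patent does not fix the interpolation scheme):

```python
import torch
import torch.nn.functional as F

def mesh_to_dense_flow(mesh_motion, out_h, out_w):
    """Interpolate a sparse (2, Hm, Wm) mesh motion into a dense
    (2, out_h, out_w) flow field map (bilinear interpolation assumed)."""
    return F.interpolate(mesh_motion.unsqueeze(0), size=(out_h, out_w),
                         mode="bilinear", align_corners=True).squeeze(0)

def stack_window(mesh_motions, out_h, out_w):
    """Concatenate the window's flow field maps along the channel
    dimension in temporal order -> (2 * r, out_h, out_w) network input."""
    flows = [mesh_to_dense_flow(m, out_h, out_w) for m in mesh_motions]
    return torch.cat(flows, dim=0)
```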
and T40, performing path smoothing on the unstable frames based on the second inter-frame motion and the trained neural network model obtained by the learning method for video anti-shake, and obtaining a transformation field diagram. At T40, the continuous flow field map within the sliding window may be input into a convolutional neural network model with a channel attention mechanism, and the transformed field map of the last frame in the sliding window is estimated. The convolutional neural network model used in the method adds a channel attention mechanism to the jump connection part on the basis of the UNet structure, so that the network can set weights for flow field diagrams at different time sequence positions according to the motion mode of an input sequence, and the anti-shake effect is improved.
T50: resetting the unstable frame through the transformed field map. In T50, the elements of the transformed field map estimated in T40 correspond one-to-one with the pixels at the same positions in the original frame, each element representing the displacement vector of that pixel from its position on the original frame to its position on the stable frame. According to the displacement vectors of all pixel points provided by the transformed field map, the positions of all pixels on the original frame can be adjusted to synthesize a stable frame I′_t.
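A minimal sketch of this resetting step with torch.nn.functional.grid_sample, assuming the transformed field map is a dense (1, 2, H, W) per-pixel displacement field measured in pixels and that the resampling is backward warping:

```python
import torch
import torch.nn.functional as F

def reset_frame(frame, field_map):
    """Adjust every pixel of an unstable (1, C, H, W) frame by the per-pixel
    displacement vectors in a (1, 2, H, W) transformed field map, yielding
    the stable frame I'_t (backward warping assumed)."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)   # (1, H, W, 2), (x, y)
    grid = base + field_map.permute(0, 2, 3, 1)         # displaced sample points
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    return F.grid_sample(frame, grid, align_corners=True)
```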
The embodiment of the application also provides a minimum-delay online video anti-shake device, comprising:
a motion extraction device for extracting the second inter-frame motion of the video;
a path smoothing device for smoothing the path of the video;
a memory having a computer program stored thereon; and
a processor executing the computer program to implement the minimum-delay online video anti-shake method of any of the above embodiments.
In the above embodiment, processing efficiency can be improved by providing a dedicated device responsible for extracting motion, while the neural network model of the other device focuses on smoothing the path.
In a specific example, the training is supervised and requires ground-truth transformed field maps. In the training phase, the flow field map sequences of two consecutive windows need to be input together, because the motion consistency loss function is a temporal loss that compares the transformed field map estimates of two consecutive frames. The shape consistency loss function and the scale-preserving loss function constrain the quality of a single estimate and need no special treatment. In the inference stage, no loss function is computed, and the flow field map sequences within the windows are fed into the convolutional network sequentially in sliding-window order.
The training process adopts Adam as the optimizer, with the initial learning rate set to 1e-4 and no weight decay strategy. The three optimizer parameters β_1, β_2 and ε are set to 0.9, 0.999 and 1e-8 respectively; training runs for 100,000 iterations and takes about 20 hours in total on two NVIDIA 1080Ti graphics cards.
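These settings translate directly into the following sketch; the model, batch shapes and stand-in L1 loss are hypothetical placeholders, while the optimizer hyperparameters and iteration count are those stated above:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the path smoothing network (the real model is a
# UNet with channel attention); channel and spatial sizes are illustrative.
model = torch.nn.Conv2d(32, 2, kernel_size=3, padding=1)

# Adam with the settings stated in the text: lr 1e-4, betas (0.9, 0.999),
# eps 1e-8, and no weight decay strategy.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0)

for step in range(100_000):                       # 100,000 training iterations
    # Hypothetical batch: stacked flow field maps and ground-truth transformed
    # field maps (random here; a real loader would supply two consecutive
    # windows so the temporal motion consistency loss can be computed).
    x = torch.randn(4, 32, 64, 64)
    y = torch.randn(4, 2, 64, 64)
    loss = F.l1_loss(model(x), y)                 # stand-in for the total loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```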
Effect demonstration:
referring to fig. 5, fig. 5 shows a comparison of the proposed method with two existing online anti-shake methods (columns 1, 2: two other methods; column 3: the method; column 4 original frame). The method can obtain good anti-shake effect in different scenes (rotation, scaling and the like), and can avoid the problems of excessive clipping, distortion and the like of the result.
Referring to fig. 6 and fig. 7, which show the effect of the proposed method on shaky-video synthesis: fig. 6 is a path comparison for a video pair shot with dual cameras, and fig. 7 is a path comparison for a video pair synthesized by the proposed method; the dashed lines are shaky video paths and the solid lines are stable video paths. It can be seen that the proposed method can synthesize high-quality training samples whose paths do not diverge from those of the original stable videos.
The embodiment of the application further provides a computer storage medium on which a computer program is stored, the computer program being loaded by a processor to perform the video anti-shake method based on the deep learning method according to any of the above embodiments.
The foregoing description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any simple modification, equivalent variation, etc. made to the above embodiment according to the technical substance of the present invention falls within the scope of the present invention.

Claims (8)

1. A learning method for video anti-shake, comprising the steps of:
acquiring training data;
training a neural network model based on the training data;
the acquiring training data includes:
obtaining a shaky video and a stable video;
extracting a first inter-frame motion of the shaky video;
transforming each frame of the stable video based on the first inter-frame motion of the shaky video to obtain a processed video;
and using the stable video and the processed video as training data.
2. The learning method for video anti-shake according to claim 1, wherein, when training a neural network model, a training process is constrained by a loss function, and the loss function of the neural network model to be trained is:
L = L_MC + α·L_SC + β·L_SP

wherein L_MC is a motion consistency loss function, L_SC is a shape consistency loss function, L_SP is a scale-preserving loss function, and α and β are balance parameters used to balance the contributions of the three loss functions.
3. The learning method for video anti-shake according to claim 2, wherein the motion consistency loss function is:

L_MC: [formula given as an image in the original publication]

wherein B′_t and B′_{t-1} denote the transformed field maps of two adjacent frames estimated by the network, and B̂_t and B̂_{t-1} denote the ground-truth transformed field maps of the two adjacent frames;

the shape consistency loss function is:

L_SC: [formula given as an image in the original publication]

wherein v_i denotes the i-th mesh vertex and N denotes the total number of mesh vertices;

the scale-preserving loss function is:

L_SP: [formula given as an image in the original publication]

wherein s denotes a scale factor.
4. The minimum delay online video anti-shake method is characterized by comprising the following steps of:
obtaining an unstable frame in a video;
extracting, through a preset neural network model, a second inter-frame motion of a video formed by an unstable frame and the consecutive frames preceding it;
performing path smoothing on the unstable frame based on the second inter-frame motion and the trained neural network model to obtain a transformed field map;
resetting the unstable frame through the transformation field diagram.
5. The minimum delay online video anti-shake method according to claim 4, wherein said resetting the unstable frame through the transformed field map comprises the steps of:
and adjusting the positions of all pixels on the unstable frame according to the displacement vectors of all pixel points provided by the transformation field diagram to obtain a stable frame.
6. The minimum delay online video anti-shake method of claim 4, wherein the neural network model being trained is a convolutional neural network model.
7. The minimum delay online video anti-shake method of claim 6, wherein the second inter-frame motion is represented in the form of a sparse mesh;
after said extracting the second inter-frame motion of the video formed by the unstable frame and the consecutive frames preceding it, and before said performing path smoothing on the unstable frame based on the second inter-frame motion and the trained neural network model to obtain a transformed field map, the method comprises the following steps:
processing input data of the convolutional neural network model:
interpolating the sparse mesh formed by the second inter-frame motion to obtain a flow field map, the flow field map comprising a channel dimension and height and width dimensions;
and concatenating the flow field maps along the channel dimension in temporal order using the sliding window to form the input data of the convolutional neural network model.
8. A minimum delay online video anti-shake apparatus, comprising:
a motion extraction device for extracting a second inter-frame motion of a video;
a path smoothing device for smoothing a path of the video;
a memory having a computer program stored thereon; and
a processor executing the computer program to implement the minimum delay online video anti-shake method of any one of claims 4 to 7.
CN202310102762.1A 2023-01-17 2023-01-17 Online video anti-shake device, online video anti-shake method and learning method thereof Pending CN116091868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310102762.1A CN116091868A (en) 2023-01-17 2023-01-17 Online video anti-shake device, online video anti-shake method and learning method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310102762.1A CN116091868A (en) 2023-01-17 2023-01-17 Online video anti-shake device, online video anti-shake method and learning method thereof

Publications (1)

Publication Number Publication Date
CN116091868A 2023-05-09

Family

ID=86211852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310102762.1A Pending CN116091868A (en) 2023-01-17 2023-01-17 Online video anti-shake device, online video anti-shake method and learning method thereof

Country Status (1)

Country Link
CN (1) CN116091868A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117291252A (en) * 2023-11-27 2023-12-26 浙江华创视讯科技有限公司 Stable video generation model training method, generation method, equipment and storage medium
CN117291252B (en) * 2023-11-27 2024-02-20 浙江华创视讯科技有限公司 Stable video generation model training method, generation method, equipment and storage medium
CN117714875A (en) * 2024-02-06 2024-03-15 博大视野(厦门)科技有限公司 End-to-end video anti-shake method based on deep neural network
CN117714875B (en) * 2024-02-06 2024-04-30 博大视野(厦门)科技有限公司 End-to-end video anti-shake method based on deep neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination