CA3172605A1 - Video jitter detection method and device - Google Patents

Video jitter detection method and device

Info

Publication number
CA3172605A1
Authority
CA
Canada
Prior art keywords
frame
motion vector
feature point
sequence
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA3172605A
Other languages
French (fr)
Other versions
CA3172605C (en)
Inventor
Chong MU
Xuyang Zhou
Erlong LIU
Wenzhe GUO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd filed Critical 10353744 Canada Ltd
Publication of CA3172605A1
Application granted
Publication of CA3172605C
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/14: Picture signal circuitry for video frequency region
    • H04N 5/144: Movement detection
    • H04N 5/147: Scene change detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

Disclosed in the present invention are a video jitter detection method and device. The method comprises: framing a video requiring detection to obtain a frame sequence; performing feature point detection on the frame sequence frame by frame to obtain a feature point of each frame, and generating a frame feature point sequence matrix; performing operation on the frame feature point sequence matrix on the basis of an optical flow tracking algorithm to obtain a motion vector of each frame; obtaining the feature value of the video requiring detection according to the motion vector of each frame; and obtaining an output signal by means of operation by using the feature value of the video requiring detection as an input signal of a detection model, and determining whether jitter occurs to the video requiring detection according to the output signal. According to the present invention, the use of feature point detection and the use of an optical flow tracking algorithm for feature points effectively solve the problem of tracking failure caused by excessive changes between two adjacent frames, and the present invention has good sensitivity and robustness when performing detection on videos captured in cases such as sudden large displacement, strong shaking, and large rotation of a camera lens.

Description

VIDEO JITTER DETECTION METHOD AND DEVICE
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of computer vision technology, and more particularly to a method of and a device for detecting video jitter.
Description of Related Art
[0002] The wave of science and technology has profoundly changed everyday life. Handheld video capturing devices such as smartphones, digital cameras, mirrorless (ILDC) cameras and SLR cameras, with ever shrinking sizes and ever lower prices, have become everyday necessities for most people, and an age in which everyone takes photographs and videos has quietly arrived. While people enjoy the interesting and exhilarating moments recorded by handheld video capturing devices, the irregular jitter generated in the video by unstable lens movement, caused by the photographer walking or unintentionally shaking, makes the recorded highlights fall far short of expectation and severely hampers subsequent processing of the video. Accordingly, video jitter detection has become an indispensable, important component of video processing technology.
[0003] Video jitter detection is the basis for subsequent readjustment and processing of videos, and researchers have carried out a great deal of work based on video analysis in the fields of video processing, video image stabilization and computer vision. Although several methods of detecting video jitter have been proposed, the currently available detection algorithms are not highly precise: some are not sensitive to videos captured under conditions of large displacement and strong jitter of the lens within a short time, some are not adapted to the detection of rotational movements, and some are not applicable to scenarios in which the lens moves slowly. For example, the following commonly employed methods of detecting video jitter are more or less defective in the ways described below.
1. Block matching method: at present, this is the most commonly used algorithm in video image stabilization systems. The method divides the current frame into blocks, assumes that every pixel in a block shares the same single motion vector, and searches for the optimal match of each block within a specific range of the reference frame, thereby estimating the global motion vector of the video sequence. Because the block matching method estimates the global motion vector from the per-block motion vectors, jitter detection in certain specific scenarios is poor in effect; for instance, when a picture is divided into four grids of which three are motionless while objects in the fourth are moving. Besides, the block matching method usually requires Kalman filtering to process the calculated motion vectors, so the computational expense is large, real-time performance is poor, and the scenario of large displacement and strong jitter of the lens within a short time cannot be accommodated.
2. Grayscale projection method: based on the principle that grayscale distributions in overlapping, similar regions of an image are consistent, this algorithm uses the regional grayscale information of adjacent video frames to find the motion-vector relation, and mainly consists of correlation calculations of the grayscale projections of different regions along the two directions of rows and columns. The grayscale projection method is effective only for scenarios in which purely translational jitter exists, and cannot estimate rotational motion vectors.

SUMMARY OF THE INVENTION
[0004] In order to solve the problems pending in the state of the art, embodiments of the present invention provide a method of and a device for detecting video jitter, so as to overcome such prior-art problems as low precision of detecting algorithms, and insensitivity to videos captured under conditions of large displacement and strong jitter of lenses within a short time.
[0005] In order to solve one or more of the aforementioned technical problems, the present invention employs the following technical solutions.
[0006] According to one aspect, a method of detecting video jitter is provided, and the method comprises the following steps:
[0007] performing a framing process on a to-be-detected video to obtain a frame sequence;
[0008] performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix;
[0009] basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame;
[0010] obtaining a feature value of the to-be-detected video according to the motion vector of each frame; and
[0011] taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
[0012] Further, prior to performing feature point detection, the method further comprises the following steps of preprocessing the frame sequence:
[0013] grayscale-processing the frame sequence, and obtaining a grayscale frame sequence; and
[0014] denoising the grayscale frame sequence; wherein
[0015] the step of performing feature point detection on the frame sequence frame by frame is performing feature point detection on the preprocessed frame sequence frame by frame.
[0016] Further, the step of performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame includes:
[0017] employing a feature point detecting algorithm in which are fused FAST
features and SURF features to perform feature point detection on the frame sequence frame by frame, and obtain feature points of each frame.
[0018] Further, the step of basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame includes:
[0019] performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame;
[0020] obtaining a corresponding accumulative motion vector according to the initial motion vector;
[0021] smoothening the accumulative motion vector, and obtaining a smoothened motion vector;
and
[0022] employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
[0023] Further, the step of obtaining a feature value of the to-be-detected video according to the motion vector of each frame includes:
[0024] merging and converting the motion vectors of all frames into a matrix, and calculating unbiased standard deviations of various elements in the matrix;
[0025] weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value; and
[0026] taking the unbiased standard deviations of the various elements and the weighted value as the feature value of the to-be-detected video.
[0027] According to another aspect, a device for detecting video jitter is provided, and the device comprises:
[0028] a framing processing module, for performing a framing process on a to-be-detected video to obtain a frame sequence;
[0029] a feature point detecting module, for performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix;
[0030] a vector calculating module, for basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame;
[0031] a feature value extracting module, for obtaining a feature value of the to-be-detected video according to the motion vector of each frame; and
[0032] a jitter detecting module, for taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
[0033] Further, the device further comprises:
[0034] a data preprocessing module, for preprocessing the frame sequence;
[0035] the data preprocessing module includes:
[0036] a grayscale-processing unit, for grayscale-processing the frame sequence, and obtaining a grayscale frame sequence; and
[0037] a denoising processing unit, for denoising the grayscale frame sequence;
[0038] the feature point detecting module is employed for performing feature point detection on the preprocessed frame sequence frame by frame.
[0039] Further, the feature point detecting module is further employed for:
[0040] employing a feature point detecting algorithm in which are fused FAST
features and SURF features to perform feature point detection on the frame sequence frame by frame, and obtain feature points of each frame.
[0041] Further, the vector calculating module includes:
[0042] an optical flow tracking unit, for performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame;
[0043] an accumulation calculating unit, for obtaining a corresponding accumulative motion vector according to the initial motion vector;
[0044] a smoothening processing unit, for smoothening the accumulative motion vector, and obtaining a smoothened motion vector; and
[0045] a vector readjusting unit, for employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
[0046] Further, the feature value extracting module includes:
[0047] a matrix converting unit, for merging and converting the motion vectors of all frames into a matrix;
[0048] a standard deviation calculating unit, for calculating unbiased standard deviations of various elements in the matrix; and
[0049] a weighting and fusing unit, for weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value.
[0050] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects.
1. In the method of and device for detecting video jitter provided by the embodiments of the present invention, by basing on the optical flow tracking algorithm to obtain a motion vector of each frame according to the frame feature point sequence matrix, the present invention effectively solves the problem of failed tracking due to unduly large change between two adjacent frames, exhibits excellent toleration and adaptability in detecting jitters of videos captured under the condition in which the lens slowly moves, and achieves excellent sensitivity and robustness when videos are detected as captured under circumstances of abrupt large displacement, strong jitter, and excessive rotation of the lens.
2. In the method of and device for detecting video jitter provided by the embodiments of the present invention, a feature point detecting algorithm in which are fused FAST features and SURF features is employed, that is to say, the feature point extracting algorithm is so optimized that not only the image global feature is considered, but the local features are also retained, moreover, computational expense is small, robustness against image blurs and faint illumination is strong, and real-time property and precision of the detection are further enhanced.
3. In the method of and device for detecting video jitter provided by the embodiments of the present invention, features of at least four dimensions are extracted from the to-be-detected video, and an SVM model is used as the detection model, so that the method of detecting video jitter as provided by the embodiments of the present invention is more advantageous in terms of generality, and precision of detection is further enhanced.
[0051] Of course, implementation of any one solution according to the present application does not necessarily achieve all of the aforementioned advantages simultaneously.
BRIEF DESCRIPTION OF THE DRAWINGS
[0052] To more clearly explain the technical solutions in the embodiments of the present invention, drawings required for use in the following explanation of the embodiments are briefly described below. Apparently, the drawings described below are merely directed to some embodiments of the present invention, while it is further possible for persons ordinarily skilled in the art to base on these drawings to acquire other drawings, and no creative effort will be spent in the process.
[0053] Fig. 1 is a flowchart illustrating a method of detecting video jitter according to an exemplary embodiment;
[0054] Fig. 2 is a flowchart illustrating preprocessing of the frame sequence according to an exemplary embodiment;
[0055] Fig. 3 is a flowchart illustrating performing an operation on the frame feature point sequence matrix based on an optical flow tracking algorithm to obtain a motion vector of each frame according to an exemplary embodiment;
[0056] Fig. 4 is a flowchart illustrating obtaining a feature value of the to-be-detected video according to the motion vector of each frame according to an exemplary embodiment;
and
[0057] Fig. 5 is a view schematically illustrating the structure of a device for detecting video jitter according to an exemplary embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0058] To make more lucid and clear the objectives, technical solutions and advantages of the present invention, technical solutions in the embodiments of the present invention will be described more clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the embodiments described below are merely partial, rather than the entire, embodiments of the present invention. All other embodiments achievable by persons ordinarily skilled in the art on the basis of the embodiments in the present invention without creative effort shall all fall within the protection scope of the present invention.
[0059] Fig. 1 is a flowchart illustrating a method of detecting video jitter according to an exemplary embodiment, with reference to Fig. 1, the method comprises the following steps.
[0060] S1 - performing a framing process on a to-be-detected video to obtain a frame sequence.
[0061] Specifically, in order to facilitate subsequent calculation so as to detect the to-be-detected video, after the to-be-detected video (indicated as S) has been obtained, a framing extraction process should be firstly performed on the to-be-detected video S
to obtain a frame sequence corresponding to the to-be-detected video, and the frame sequence is expressed as Li (i = 1, 2, 3, ..., n), where Li represents the ith frame of the video, and n represents the total number of frames of the video.
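For illustration only, a minimal Python/OpenCV sketch of this framing step might look as follows; the video path and the optional frame-sampling interval are hypothetical parameters, not taken from the disclosure:

    import cv2

    def frame_video(video_path, step=1):
        """Split the to-be-detected video into the frame sequence L_1..L_n (every step-th frame)."""
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                frames.append(frame)
            idx += 1
        cap.release()
        return frames

    frames = frame_video("to_be_detected.mp4")  # frame sequence L_i, i = 1, ..., n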
[0062] S2 - performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix.
[0063] Specifically, it is required in the detection of video jitter to select the current frame and the adjacent next frame (or to extract the next frame by an interval of N
frames) from the video, corresponding feature points should be obtained from the two frames of images, and corresponding matching is subsequently performed according to the feature points of the two frames, to hence judge whether offset (jitter) occurs between the two frames.
[0064] During specific implementation, a feature point detecting algorithm is employed to perform feature point detection on the processed frame sequence Li (i = 1, 2, 3, ..., n) frame by frame, feature points of each frame are obtained (i.e., feature points of each frame of image are extracted), and a frame feature point sequence matrix is generated, which is denoted as Zi (i = 1, 2, ..., n) and can be specifically expressed as follows:

Z_i = \begin{bmatrix} a_{1,1}^i & a_{1,2}^i & \cdots & a_{1,q}^i \\ a_{2,1}^i & a_{2,2}^i & \cdots & a_{2,q}^i \\ \vdots & \vdots & \ddots & \vdots \\ a_{p,1}^i & a_{p,2}^i & \cdots & a_{p,q}^i \end{bmatrix}, \quad a_{p,q}^i \in \{0, 1\}
[0065] where a_{p,q}^i represents the feature point detection result at row p, column q of the ith frame matrix (1 denotes a feature point and 0 denotes a non-feature point), p represents the number of rows of the matrix, and q represents the number of columns of the matrix.
[0066] S3 - basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame.
[0067] Specifically, the optical flow tracking algorithm is employed to perform optical flow tracking calculation on the frame feature point sequence matrix, namely to track the change of feature points from the current frame to the next frame. For instance, the change of the feature point sequence matrix Zi in the ith frame is tracked to the next, (i+1)th frame, and a motion vector α_i is obtained, whose expression is as follows:
\alpha_i = \begin{bmatrix} dx^i \\ dy^i \\ dr^i \end{bmatrix}, \quad (i = 1, 2, \ldots, n)
[0068] where dx^i represents the Euclidean column offset from the ith frame to the (i+1)th frame, dy^i represents the Euclidean row offset from the ith frame to the (i+1)th frame, and dr^i represents the angle offset from the ith frame to the (i+1)th frame.
[0069] S4 - obtaining a feature value of the to-be-detected video according to the motion vector of each frame.
[0070] Specifically, the feature value of three dimensions is usually used in the state of the art, whereas in the embodiments of the present invention the extracted feature value at least includes the feature value of four dimensions. The addition of one dimension to the feature value as compared with prior-art technology makes the method of detecting video jitter as provided by the embodiments of the present invention more advantageous in terms of generality, and precision of detection is further enhanced.
[0071] S5 - taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
[0072] Specifically, the feature value of the to-be-detected video obtained in the previous step is taken as an input signal to be input into the detection model to perform operation to obtain an output signal, and the output signal is based on to judge whether jitter occurs to the to-be-detected video. As should be noted here, the detection model in the embodiments of the present invention is well trained in advance. During the specific training, it is possible to correspondingly process sample video data in a set of selected training data by employing the method in the embodiments of the present invention, to obtain a feature value of the sample video data. A detection model is trained according to the feature value of the sample video data and a corresponding annotation result of the sample video data, until model training is completed to obtain the final detection model.
[0073] For instance, suppose that the mth video sample in a set of jittering video data with annotations has undergone the processing specified in the above steps to extract its feature value. That is, the mth video sample is first framing-processed to obtain a frame sequence, feature point detection is then performed frame by frame on the frame sequence, feature points of each frame are obtained and a frame feature point sequence matrix is generated, the optical flow tracking algorithm is thereafter used to perform an operation on the frame feature point sequence matrix to obtain the motion vector of each frame, and the feature value of the mth video sample is finally obtained according to the motion vector of each frame. After the motion vectors have been dimensionally converted, the unbiased standard deviations of the various elements and their weighted and fused value are calculated, expressed respectively as σ[λ(dx)], σ[λ(dy)], σ[λ(dr)] and K, and the annotation result y_m of the mth video sample (y_m = 0 indicates that no jitter occurs to the video sample, y_m = 1 indicates that jitter occurs) is attached to obtain the training sample of the mth video sample, which can be expressed as follows:

\{\, \sigma[\lambda(dx)]_m, \ \sigma[\lambda(dy)]_m, \ \sigma[\lambda(dr)]_m, \ K_m, \ y_m \,\}
[0074] The video sample thus makes use of features of at least four dimensions plus its annotation, whereas prior-art technology usually uses features of three dimensions (typically the averages, variances, and included angles of the translational vectors of adjacent frames), so generality is better and precision of detection is further enhanced. In addition, as a preferred embodiment of the present invention, the detection model can be an SVM model; that is, the feature value of the to-be-detected video obtained through the previous steps is input to a well-trained SVM model to obtain an output result. If the output result is 0, no jitter occurs to the to-be-detected video; if the output result is 1, jitter occurs to the to-be-detected video. The use of a trainable SVM model as the video jitter decider enables jitter detection of videos captured in different scenarios, with better generality and a higher precision rate of detection.
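By way of a hedged illustration only, training such an SVM decider and applying it to the extracted feature value could be sketched with scikit-learn as below; the kernel choice and function names are assumptions, since the description only specifies that an SVM model is used:

    import numpy as np
    from sklearn.svm import SVC

    def train_jitter_model(features, labels):
        """features: (m, 4) array of [sigma_dx, sigma_dy, sigma_dr, K] per sample video;
        labels: annotation results, 0 = no jitter, 1 = jitter."""
        model = SVC(kernel="rbf")  # kernel is an assumption; the text only names SVM
        model.fit(np.asarray(features), np.asarray(labels))
        return model

    def detect_jitter(model, feature_value):
        """feature_value: the feature value of the to-be-detected video."""
        return int(model.predict([feature_value])[0])  # 1 -> jitter occurs, 0 -> no jitter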
[0075] Fig. 2 is a flowchart illustrating preprocessing of the frame sequence according to an exemplary embodiment, with reference to Fig. 2, as a preferred embodiment in the embodiments of the present invention, prior to performing feature point detection, the method further comprises the following steps of preprocessing the frame sequence.
[0076] S101 - grayscale-processing the frame sequence, and obtaining a grayscale frame sequence.
[0077] Specifically, since the gray space only contains luminance information and does not contain color information, the amount of information of the image is greatly reduced after grayscale-processing; accordingly, in order to reduce subsequent amount of information participating in the calculation to facilitate subsequent calculation, the frame sequence Li (i=1, 2, 3, ..., n) obtained in the previous step is further grayscale-processed in the embodiments of the present invention, and a grayscale frame sequence is obtained to be expressed as Gi (i=1, 2, 3, ..., n) , in which the grayscale conversion expression is as follows:
G = 0.299 × R + 0.587 × G + 0.114 × B
[0078] S102 - denoising the grayscale frame sequence.
[0079] Specifically, in order to effectively prevent noise points (namely non-feature points) from affecting subsequent steps and to enhance the precision of detection, it is further required to denoise the grayscale frame sequence. During specific implementation, it is possible to employ a TV denoising method based on a total variation model to denoise the grayscale frame sequence Gi (i = 1, 2, 3, ..., n) to obtain a denoised frame sequence expressed as Ti (i = 1, 2, 3, ..., n), namely the preprocessed frame sequence to which the to-be-detected video corresponds. As should be noted here, the denoising method is freely selectable in the embodiments of the present invention, and no restriction is made thereto in this context.
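A minimal preprocessing sketch in Python/OpenCV, under the assumption stated above that the concrete denoiser is freely selectable (a non-local-means call is used here as a stand-in for the total-variation denoiser):

    import cv2

    def preprocess(frames):
        """Grayscale-convert and denoise each frame L_i; returns the preprocessed sequence T_i."""
        processed = []
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)          # G = 0.299R + 0.587G + 0.114B
            denoised = cv2.fastNlMeansDenoising(gray, None, h=10)   # stand-in for the TV denoiser
            processed.append(denoised)
        return processed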
[0080] The step of performing feature point detection on the frame sequence frame by frame is performing feature point detection on the preprocessed frame sequence frame by frame.
[0081] As a preferred embodiment in the embodiments of the present invention, the step of performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame includes:
[0082] employing a feature point detecting algorithm in which are fused FAST
features and SURF features to perform feature point detection on the frame sequence frame by frame, and obtain feature points of each frame.
[0083] Specifically, since the precision of the video jitter detecting algorithm is affected by feature point extraction and the matching technique, the performance of the feature point extracting algorithm will directly affect the precision of the video jitter detecting algorithm, so the feature point extracting algorithm is optimized in the embodiments of the present invention. As a preferred embodiment, a feature point detecting algorithm in which are fused FAST features and SURF features is employed. The SURF
algorithm is an improvement over the SIFT algorithm. SIFT is a feature describing method with excellent robustness and scale invariance, while the SURF algorithm remedies the large amount of calculation, high time complexity and long computation time inherent in the SIFT algorithm while maintaining its advantages. Moreover, SURF performs better with respect to invariance to illumination change and perspective change, is particularly good at handling severe blurs and rotations of images, and describes local features of images well. FAST feature detection is a corner detection method whose most prominent advantages are its computational efficiency and its ability to describe global features of images well. Therefore, using the feature point detecting algorithm in which FAST features and SURF features are fused to perform feature point extraction not only gives consideration to global features of images but also fully retains local features of images; moreover, the computational expense is small, robustness against image blurs and faint illumination is strong, and the real-time property and precision of the detection are further enhanced.
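The exact fusion rule is not spelled out here, so the following sketch simply takes the union of FAST and SURF key points to fill the binary matrix Z_i; SURF requires the opencv-contrib build, and the threshold value is an assumption:

    import cv2
    import numpy as np

    def feature_point_matrix(gray):
        """Build the binary feature point matrix Z_i: 1 at detected key points, 0 elsewhere."""
        fast = cv2.FastFeatureDetector_create()
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)    # needs opencv-contrib-python
        keypoints = list(fast.detect(gray, None)) + list(surf.detect(gray, None))
        z = np.zeros(gray.shape, dtype=np.uint8)
        for kp in keypoints:
            col, row = int(kp.pt[0]), int(kp.pt[1])                 # kp.pt is (x, y) = (column, row)
            z[row, col] = 1
        return z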
[0084] Fig. 3 is a flowchart illustrating performing an operation on the frame feature point sequence matrix based on an optical flow tracking algorithm to obtain a motion vector of each frame according to an exemplary embodiment, with reference to Fig. 3, as a preferred embodiment in the embodiments of the present invention, the step of basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame includes the following steps.
[0085] S301 - performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame.
[0086] Specifically, while the optical flow tracking calculation is performed on the frame feature point sequence matrix, a pyramid Lucas-Kanade (LK) optical flow tracking algorithm can be employed. For instance, the change of the feature point sequence matrix Zi in the ith frame is tracked to the next, (i+1)th frame, and a motion vector α_i is obtained, whose expression is as follows:

\alpha_i = \begin{bmatrix} dx^i \\ dy^i \\ dr^i \end{bmatrix}, \quad (i = 1, 2, \ldots, n)

[0087] where dx^i represents the Euclidean column offset from the ith frame to the (i+1)th frame, dy^i represents the Euclidean row offset from the ith frame to the (i+1)th frame, and dr^i represents the angle offset from the ith frame to the (i+1)th frame.
[0088] Use of the pyramid optical flow tracking LK (Lucas-Kanade) algorithm that utilizes the pyramid iterative structure can effectively solve the problem of failed tracking due to unduly large change from feature points of frame A (which is supposed to be the current frame) to feature points of frame B (which is supposed to be the next frame), and lays the foundation for the method of detecting video jitter provided by the embodiments of the present invention to enhance its jitter detecting sensitivity and robustness when videos are processed as captured under circumstances of abrupt large displacement, strong jitter, and excessive rotation of lenses.
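As a hedged sketch of step S301, the pyramid LK tracking between two consecutive preprocessed frames could look as follows; the tracked points are taken from goodFeaturesToTrack as a stand-in for the fused FAST/SURF feature points, and the partial-affine fit used to recover (dx, dy, dr) is one reasonable choice rather than the prescribed one:

    import cv2
    import numpy as np

    def initial_motion_vector(prev_gray, next_gray):
        """Track feature points from frame i to frame i+1 and return alpha_i = [dx, dy, dr]."""
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)  # pyramid LK
        good_prev = pts[status.flatten() == 1]
        good_next = nxt[status.flatten() == 1]
        m, _ = cv2.estimateAffinePartial2D(good_prev, good_next)    # rotation + translation (+ scale)
        dx, dy = m[0, 2], m[1, 2]                                   # column / row offsets
        dr = np.degrees(np.arctan2(m[1, 0], m[0, 0]))               # angle offset
        return np.array([dx, dy, dr])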
[0089] S302 - obtaining a corresponding accumulative motion vector according to the initial motion vector.
[0090] Specifically, an accumulative integral transformation is performed on the initial motion vector α_i of each frame as obtained in step S301 to obtain an accumulative motion vector β_i of each frame, whose expression is as follows:

\beta_i = \sum_{j=1}^{i} \alpha_j = \begin{bmatrix} \sum_{j=1}^{i} dx^j \\ \sum_{j=1}^{i} dy^j \\ \sum_{j=1}^{i} dr^j \end{bmatrix}, \quad (i = 1, 2, \ldots, n)
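In code, the accumulative step S302 is simply a cumulative sum of the per-frame vectors, for example:

    import numpy as np

    def accumulate(alphas):
        """alphas: (n, 3) array of initial motion vectors [dx, dy, dr] per frame.
        Returns beta, the (n, 3) accumulative motion vectors beta_i = sum_{j<=i} alpha_j."""
        return np.cumsum(np.asarray(alphas, dtype=float), axis=0)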
[0091] S303 - smoothening the accumulative motion vector, and obtaining a smoothened motion vector.
[0092] Specifically, a sliding average window is used to smoothen the accumulative motion vector β_i obtained in step S302 to obtain a smoothened motion vector γ_i, whose expression is:

\gamma_i = \frac{1}{2r+1} \sum_{j=i-r}^{i+r} \beta_j, \quad (i = 1, 2, \ldots, n)
[0093] where n represents the total number of frames of the video; the radius of the smooth window is r, and its expression is:
r = \begin{cases} n/10, & n \leq 20 \\ \ln(1+n) / \ln(1+\mu), & n > 20 \end{cases}
[0094] where μ indicates a parameter of the sliding window and is a positive number; the specific numerical value of μ can be dynamically adjusted according to practical requirements; for instance, as a preferred embodiment, it can be set as μ = 30.
[0095] In the embodiments of the present invention, the sliding average window with extremely small computational expense is used to smoothen the motion vector, while Kalman filtering with complicated computation is not used for the process, whereby computational expense is further reduced and real-time property is further enhanced, not at the expense of losing any precision.
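A minimal sketch of the sliding-average smoothing of step S303 is given below; the window radius follows the piecewise rule given above (with a floor of 1 added so the window is never empty), and the edge-replication padding at the ends of the sequence is an implementation assumption that the description leaves open:

    import numpy as np

    def smooth(beta, mu=30):
        """Smoothen the accumulative motion vectors beta (shape (n, 3)) with a sliding average
        window of radius r; mu is the sliding window parameter (mu = 30 by default)."""
        beta = np.asarray(beta, dtype=float)
        n = len(beta)
        r = max(1, n // 10 if n <= 20 else int(np.log(1 + n) / np.log(1 + mu)))
        padded = np.pad(beta, ((r, r), (0, 0)), mode="edge")
        kernel = np.ones(2 * r + 1) / (2 * r + 1)
        return np.stack([np.convolve(padded[:, k], kernel, mode="valid") for k in range(3)], axis=1)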
[0096] S304 - employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
[0097] Specifically, β_i and γ_i obtained in the previous steps S302 and S303 are used to readjust the initial motion vector α_i obtained in step S301, to obtain a readjusted motion vector λ_i, whose expression is:

\lambda_i = \alpha_i + (\gamma_i - \beta_i), \quad (i = 1, 2, \ldots, n)
[0098] The readjusted motion vector as obtained is taken as the motion vector of each frame to participate in subsequent calculation, so that the calculation result is made more precise, i.e., the result of video jitter detection is made more precise.
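The readjustment of step S304 then reduces to lambda_i = alpha_i + (gamma_i - beta_i), applied frame by frame, for instance:

    import numpy as np

    def readjust(alpha, beta, gamma):
        """Readjusted motion vectors: lambda_i = alpha_i + (gamma_i - beta_i), per frame."""
        return np.asarray(alpha) + (np.asarray(gamma) - np.asarray(beta))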
[0099] Fig. 4 is a flowchart illustrating obtaining a feature value of the to-be-detected video according to the motion vector of each frame according to an exemplary embodiment, with reference to Fig. 4, as a preferred embodiment in the embodiments of the present invention, the step of obtaining a feature value of the to-be-detected video according to the motion vector of each frame includes the following steps.
[0100] S401 - merging and converting the motion vectors of all frames into a matrix, and calculating unbiased standard deviations of various elements in the matrix.
[0101] Specifically, the motion vectors of all frames obtained through the previous steps are first merged and converted into a matrix; for instance, the motion vectors λ_1, λ_2, ..., λ_n are converted into the form of the matrix [λ_1 λ_2 ... λ_n], and the unbiased standard deviations of its elements are calculated by rows, with the specific calculation expression as follows:

\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( \lambda_i - \bar{\lambda} \right)^2}

[0102] The unbiased standard deviations of the various elements in the matrix can be obtained through the above expression, and are respectively expressed as σ[λ(dx)], σ[λ(dy)] and σ[λ(dr)], in which \bar{\lambda} represents the average value of the samples.
[0103] S402 - weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value.
[0104] Specifically, weights are assigned to the unbiased standard deviations of the various elements according to practical requirements, and the unbiased standard deviations of the various elements are weighted and fused according to these weights, wherein the weights can be dynamically readjusted according to practical requirements. For instance, if the weight of σ[λ(dx)] is set as 3, the weight of σ[λ(dy)] as 3, and the weight of σ[λ(dr)] as 10, then the fusing expression is as follows:

K = 3\,\sigma[\lambda(dx)] + 3\,\sigma[\lambda(dy)] + 10\,\sigma[\lambda(dr)]
[0105] S403 - taking the unbiased standard deviations of the various elements and the weighted value as the feature value of the to-be-detected video.
[0106] Specifically, in the embodiments of the present invention, the feature value of the to-be-detected video S consists of the unbiased standard deviations of the various elements and their weighted value as obtained in the previous steps, which are expressed as:

\{\, \sigma[\lambda(dx)], \ \sigma[\lambda(dy)], \ \sigma[\lambda(dr)], \ K \,\}
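Assembling the feature value of steps S401 to S403 can be sketched as follows; the weights (3, 3, 10) follow the example above and are understood to be adjustable:

    import numpy as np

    def feature_value(lambdas, weights=(3, 3, 10)):
        """lambdas: (n, 3) readjusted motion vectors. Returns the feature value
        [sigma(dx), sigma(dy), sigma(dr), K], with K the weighted fusion."""
        mat = np.asarray(lambdas, dtype=float)
        sigmas = mat.std(axis=0, ddof=1)        # unbiased standard deviations of dx, dy, dr
        k = float(np.dot(weights, sigmas))      # K = 3*sigma_dx + 3*sigma_dy + 10*sigma_dr
        return np.concatenate([sigmas, [k]])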
[0107] Fig. 5 is a view schematically illustrating the structure of a device for detecting video jitter according to an exemplary embodiment, with reference to Fig. 5, the device comprises:
[0108] a framing processing module, for performing a framing process on a to-be-detected video to obtain a frame sequence;
[0109] a feature point detecting module, for performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix;
[0110] a vector calculating module, for basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame;
[0111] a feature value extracting module, for obtaining a feature value of the to-be-detected video according to the motion vector of each frame; and
[0112] a jitter detecting module, for taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
[0113] As a preferred embodiment in the embodiments of the present invention, the device further comprises:
[0114] a data preprocessing module, for preprocessing the frame sequence;
[0115] the data preprocessing module includes:
[0116] a grayscale-processing unit, for grayscale-processing the frame sequence, and obtaining a grayscale frame sequence; and
[0117] a denoising processing unit, for denoising the grayscale frame sequence;
[0118] the feature point detecting module is employed for performing feature point detection on the preprocessed frame sequence frame by frame.
[0119] As a preferred embodiment in the embodiments of the present invention, the feature point detecting module is further employed for:
[0120] employing a feature point detecting algorithm in which are fused FAST
features and SURF features to perform feature point detection on the frame sequence frame by frame, and obtain feature points of each frame.
[0121] As a preferred embodiment in the embodiments of the present invention, the vector calculating module includes:
[0122] an optical flow tracking unit, for performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame;
[0123] an accumulation calculating unit, for obtaining a corresponding accumulative motion vector according to the initial motion vector;
[0124] a smoothening processing unit, for smoothening the accumulative motion vector, and obtaining a smoothened motion vector; and
[0125] a vector readjusting unit, for employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
[0126] As a preferred embodiment in the embodiments of the present invention, the feature value extracting module includes:
[0127] a matrix converting unit, for merging and converting the motion vectors of all frames into a matrix;
[0128] a standard deviation calculating unit, for calculating unbiased standard deviations of various elements in the matrix; and
[0129] a weighting and fusing unit, for weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value.
[0130] In summary, the technical solutions provided by the embodiments of the present invention bring about the following advantageous effects.
1. In the method of and device for detecting video jitter provided by the embodiments of the present invention, by basing on the optical flow tracking algorithm to obtain a motion vector of each frame according to the frame feature point sequence matrix, the present invention effectively solves the problem of failed tracking due to unduly large change between two adjacent frames, exhibits excellent toleration and adaptability in detecting jitters of videos captured under the condition in which the lens slowly moves, and achieves excellent sensitivity and robustness when videos are detected as captured under circumstances of abrupt large displacement, strong jitter, and excessive rotation of the lens.
2. In the method of and device for detecting video jitter provided by the embodiments of the present invention, a feature point detecting algorithm in which are fused FAST features and SURF features is employed, that is to say, the feature point extracting algorithm is so optimized that not only the image global feature is considered, but the local features are also retained, moreover, computational expense is small, robustness against image blurs and faint illumination is strong, and real-time property and precision of the detection are further enhanced.
3. In the method of and device for detecting video jitter provided by the embodiments of the present invention, features of at least four dimensions are extracted from the to-be-detected video, and an SVM model is used as the detection model, so that the method of detecting video jitter as provided by the embodiments of the present invention is more advantageous in terms of generality, and precision of detection is further enhanced.
[0131] Of course, implementation of any one solution according to the present application does not necessarily achieve all of the aforementioned advantages simultaneously.
As should be noted, when the device for detecting video jitter provided by the aforementioned embodiment carries out a detection task, it is merely exemplarily described with its division into the aforementioned functional modules, whereas in actual application the aforementioned functions can be assigned to different functional modules as required; that is to say, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for detecting video jitter provided by the aforementioned embodiment pertains to the same inventive conception as the method of detecting video jitter; in other words, the device is based on the method of detecting video jitter (see the method embodiment for its specific implementation process), and no repetition is made in this context.
[0132] As comprehensible to persons ordinarily skilled in the art, the entire or partial steps in the aforementioned embodiments can be completed via hardware, or via a program instructing relevant hardware, the program can be stored in a computer-readable storage medium, and the storage medium can be a read-only memory, a magnetic disk or an optical disk, etc.
[0133] The foregoing embodiments are merely preferred embodiments of the present invention, and they are not to be construed as restrictive to the present invention. Any amendment, equivalent substitution, and improvement makeable within the spirit and principle of the present invention shall all fall within the protection scope of the present invention.

Claims (10)

What is claimed is:
1. A method of detecting video jitter, characterized in that the method comprises the following steps:
performing a framing process on a to-be-detected video to obtain a frame sequence;
performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix;
basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame;
obtaining a feature value of the to-be-detected video according to the motion vector of each frame;
and taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
2. The method of detecting video jitter according to Claim 1, characterized in that, prior to performing feature point detection, the method further comprises the following steps of preprocessing the frame sequence:
grayscale-processing the frame sequence, and obtaining a grayscale frame sequence; and denoising the grayscale frame sequence;
the step of performing feature point detection on the frame sequence frame by frame is performing feature point detection on the preprocessed frame sequence frame by frame.
3. The method of detecting video jitter according to Claim 1 or 2, characterized in that the step of performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame includes:
employing a feature point detecting algorithm in which are fused FAST features and SURF

features to perform feature point detection on the frame sequence frame by frame, and to obtain feature points of each frame.
4. The method of detecting video jitter according to Claim 1 or 2, characterized in that the step of basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame includes:
performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame;
obtaining a corresponding accumulative motion vector according to the initial motion vector;
smoothening the accumulative motion vector, and obtaining a smoothened motion vector; and employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
5. The method of detecting video jitter according to Claim 1 or 2, characterized in that the step of obtaining a feature value of the to-be-detected video according to the motion vector of each frame includes:
merging and converting the motion vectors of all frames into a matrix, and calculating unbiased standard deviations of various elements in the matrix;
weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value; and taking the unbiased standard deviations of the various elements and the weighted value as the feature value of the to-be-detected video.
6. A device for detecting video jitter, characterized in that the device comprises:
a framing processing module, for performing a framing process on a to-be-detected video to obtain a frame sequence;
a feature point detecting module, for performing feature point detection on the frame sequence frame by frame, obtaining feature points of each frame, and generating a frame feature point sequence matrix;

a vector calculating module, for basing on an optical flow tracking algorithm to perform an operation on the frame feature point sequence matrix to obtain a motion vector of each frame;
a feature value extracting module, for obtaining a feature value of the to-be-detected video according to the motion vector of each frame; and a jitter detecting module, for taking the feature value of the to-be-detected video as an input signal of a detection model to perform operation to obtain an output signal, and judging whether jitter occurs to the to-be-detected video according to the output signal.
7. The device for detecting video jitter according to Claim 6, characterized in that the device further comprises:
a data preprocessing module, for preprocessing the frame sequence;
the data preprocessing module includes:
a grayscale-processing unit, for grayscale-processing the frame sequence, and obtaining a grayscale frame sequence; and a denoising processing unit, for denoising the grayscale frame sequence;
wherein the feature point detecting module is employed for performing feature point detection on the preprocessed frame sequence frame by frame.
8. The device for detecting video jitter according to Claim 6 or 7, characterized in that the feature point detecting module is further employed for:
employing a feature point detecting algorithm in which are fused FAST features and SURF features to perform feature point detection on the frame sequence frame by frame, and to obtain feature points of each frame.
9. The device for detecting video jitter according to Claim 6 or 7, characterized in that the vector calculating module includes:
an optical flow tracking unit, for performing optical flow tracking calculation on the frame feature point sequence matrix of each frame, and obtaining an initial motion vector of each frame;

an accumulation calculating unit, for obtaining a corresponding accumulative motion vector according to the initial motion vector;
a smoothening processing unit, for smoothening the accumulative motion vector, and obtaining a smoothened motion vector; and a vector readjusting unit, for employing the accumulative motion vector and the smoothened motion vector to readjust the initial motion vector of each frame, and obtaining the motion vector of each frame.
10. The device for detecting video jitter according to Claim 6 or 7, characterized in that the feature value extracting module includes:
a matrix converting unit, for merging and converting the motion vectors of all frames into a matrix;
a standard deviation calculating unit, for calculating unbiased standard deviations of various elements in the matrix; and a weighting and fusing unit, for weighting and fusing the unbiased standard deviations of the various elements, and obtaining a weighted value.
CA3172605A 2019-06-21 2020-06-11 Video jitter detection method and device Active CA3172605C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910546465.X 2019-06-21
CN201910546465.XA CN110248048B (en) 2019-06-21 2019-06-21 Video jitter detection method and device
PCT/CN2020/095667 WO2020253618A1 (en) 2019-06-21 2020-06-11 Video jitter detection method and device

Publications (2)

Publication Number Publication Date
CA3172605A1 true CA3172605A1 (en) 2020-12-24
CA3172605C CA3172605C (en) 2024-01-02

Family

ID=67888794

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3172605A Active CA3172605C (en) 2019-06-21 2020-06-11 Video jitter detection method and device

Country Status (3)

Country Link
CN (1) CN110248048B (en)
CA (1) CA3172605C (en)
WO (1) WO2020253618A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248048B (en) * 2019-06-21 2021-11-09 苏宁云计算有限公司 Video jitter detection method and device
CN110971895B (en) * 2019-12-18 2022-07-08 北京百度网讯科技有限公司 Video jitter detection method and device
CN111614895B (en) * 2020-04-30 2021-10-29 惠州华阳通用电子有限公司 Image imaging jitter compensation method, system and equipment
CN112887708A (en) * 2021-01-22 2021-06-01 北京锐马视讯科技有限公司 Video jitter detection method and device, video jitter detection equipment and storage medium
CN113115109B (en) * 2021-04-16 2023-07-28 深圳市帧彩影视科技有限公司 Video processing method, device, electronic equipment and storage medium
CN114155254B (en) * 2021-12-09 2022-11-08 成都智元汇信息技术股份有限公司 Image cutting method based on image correction, electronic device and medium
CN116193257B (en) * 2023-04-21 2023-09-22 成都华域天府数字科技有限公司 Method for eliminating image jitter of surgical video image
CN117152214A (en) * 2023-08-30 2023-12-01 哈尔滨工业大学 Defect identification method based on improved optical flow detection
CN117576692B (en) * 2024-01-17 2024-03-29 大连云智信科技发展有限公司 Method for detecting water source pollution of animal husbandry based on image recognition

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101511024A (en) * 2009-04-01 2009-08-19 北京航空航天大学 Movement compensation method of real time electronic steady image based on motion state recognition
US9055223B2 (en) * 2013-03-15 2015-06-09 Samsung Electronics Co., Ltd. Digital image stabilization method and imaging device using the same
CN103826032B (en) * 2013-11-05 2017-03-15 四川长虹电器股份有限公司 Depth map post-processing method
CN104135597B (en) * 2014-07-04 2017-12-15 上海交通大学 A kind of video jitter automatic testing method
CN104144283B (en) * 2014-08-10 2017-07-21 大连理工大学 A kind of real-time digital video digital image stabilization method based on improved Kalman filtering
CN104301712B (en) * 2014-08-25 2016-05-18 浙江工业大学 Monitoring camera shake detection method based on video analysis
US10254845B2 (en) * 2016-01-05 2019-04-09 Intel Corporation Hand gesture recognition for cursor control
CN105681663B (en) * 2016-02-26 2018-06-22 北京理工大学 A kind of video jitter detection method based on interframe movement geometry flatness
JP6823469B2 (en) * 2017-01-20 2021-02-03 キヤノン株式会社 Image blur correction device and its control method, image pickup device, program, storage medium
JP2019020839A (en) * 2017-07-12 2019-02-07 キヤノン株式会社 Image processing apparatus, image processing method and program
US10491832B2 (en) * 2017-08-16 2019-11-26 Qualcomm Incorporated Image capture device with stabilized exposure or white balance
CN108366201B (en) * 2018-02-12 2020-11-06 天津天地伟业信息***集成有限公司 Electronic anti-shake method based on gyroscope
CN110248048B (en) * 2019-06-21 2021-11-09 苏宁云计算有限公司 Video jitter detection method and device

Also Published As

Publication number Publication date
CA3172605C (en) 2024-01-02
CN110248048A (en) 2019-09-17
WO2020253618A1 (en) 2020-12-24
CN110248048B (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CA3172605C (en) Video jitter detection method and device
US11637971B2 (en) Automatic composition of composite images or videos from frames captured with moving camera
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US11741581B2 (en) Training method for image processing model, image processing method, network device, and storage medium
CN109284738B (en) Irregular face correction method and system
CN107566688B (en) Convolutional neural network-based video anti-shake method and device and image alignment device
US20220222776A1 (en) Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution
CN107689035B (en) Homography matrix determination method and device based on convolutional neural network
US9615039B2 (en) Systems and methods for reducing noise in video streams
WO2020152521A1 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
US9773192B2 (en) Fast template-based tracking
US11704563B2 (en) Classifying time series image data
Raghavendra et al. Comparative evaluation of super-resolution techniques for multi-face recognition using light-field camera
Hong et al. Crafting Object Detection in Very Low Light.
CN104284059A (en) Apparatus and method for stabilizing image
CN114764868A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
Duan et al. Guided event filtering: Synergy between intensity images and neuromorphic events for high performance imaging
CN115633262B (en) Image processing method and electronic device
CN116486250A (en) Multi-path image acquisition and processing method and system based on embedded type
CN109784215B (en) In-vivo detection method and system based on improved optical flow method
RU2632272C1 (en) Synthetic image creating method
Ahn et al. Implement of an automated unmanned recording system for tracking objects on mobile phones by image processing method
CN109598195B (en) Method and device for processing clear face image based on monitoring video
US10282633B2 (en) Cross-asset media analysis and processing
CN116129462A (en) Action recognition method, model training method and device

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220921
