CN111741186A

CN111741186A - Video jitter detection method, device and system

Info

Publication number: CN111741186A
Application number: CN202010529379.0A
Authority: CN
Inventors: 胡东; 毛礼建; 陈媛媛
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2020-10-02
Anticipated expiration: 2040-06-11
Also published as: CN111741186B

Abstract

The invention provides a video jitter detection method, a video jitter detection device and a video jitter detection system, which are used for solving the technical problems of low sensitivity, low robustness and inapplicability to complex scenes in the existing video jitter detection technology, wherein the method comprises the following steps: acquiring any two adjacent frames of images in a video stream, and performing pixel value compression processing on characteristic images of the two frames of images; performing pixel value matching on the two frames of characteristic images after the pixel value compression processing, and acquiring a first jitter degree between the two frames of images according to a pixel value matching result; acquiring a second jitter degree between the two frames of images, wherein the second jitter degree is a value obtained after the first jitter degree is adjusted; determining the frame number of continuous frame images with the same offset direction in the video stream according to the second jitter degree between any two adjacent frame images in the video stream; and judging whether the frame number is smaller than a first preset threshold value, if so, determining that the video stream shakes, and if not, determining that the video stream does not shake.

Description

Video jitter detection method, device and system

Technical Field

The present invention relates to the field of image processing, and in particular, to a method, an apparatus, and a system for detecting video jitter.

Background

The video monitoring system is used as an important component of the visual Internet of things, is widely applied to various fields of urban safety, intelligent traffic, intelligent environmental protection, boundary security and the like, and the contradiction of routine maintenance and routing inspection is intensified day by day. According to statistics, in the current domestic running monitoring system, the proportion of cameras which can be normally used is less than 60%, and the running and maintenance work of a huge number of video monitoring systems mostly depends on manual detection and processing. In order to improve the efficiency of the operation and maintenance work of the video monitoring system and know the operation condition of the front-end video equipment in time, the construction of an intelligent video monitoring quality diagnosis system becomes an actual problem which needs to be solved urgently in the field of video monitoring.

Video jitter is an image quality anomaly that often occurs with video surveillance equipment. Normally, the transition between consecutive frames of a moving image sequence is smooth and the picture correlation is relatively continuous, but if the correlation between them fluctuates greatly, the video will appear jittered. In video monitoring, a camera is generally fixed at a certain position, so that the phenomenon of video picture shaking is mainly caused by 1) the camera regularly swings due to the interference of the environment, so that the image shakes up and down or left and right; 2) the camera is being moved by a person, causing picture jitter. In any case, the image is caused to vibrate periodically or distort irregularly, which means that the camera works abnormally, and the working utility of the video monitoring system is seriously affected, therefore, the video image of the video monitoring system needs to be analyzed and detected intelligently, the problem of video jitter fault is found timely, and real-time alarming and repairing are realized.

Currently, there are four main categories of existing video jitter detection methods: a gray projection method, an image block matching method, a feature point matching method, and an LK optical flow method. The gray projection method has the defects that short-time rapid movement of a plurality of objects in a monitoring picture is mistaken for video jitter; the image block matching method and the feature point matching method have the disadvantages that effective detection cannot be carried out on a monitoring scene with clean texture, for example, the background of a monitoring picture is a pure-color wall or floor, on one hand, feature point detection is difficult to carry out, and on the other hand, all areas in the picture are very similar; the LK optical flow method has two disadvantages, one is that the calculation speed is slow, the real-time analysis requirement of the monitoring video is not satisfied, and the other is that the LK optical flow method belongs to the sparse optical flow algorithm, which depends heavily on the feature point detection, and the effect is poor under the condition that the feature point can not be effectively positioned. It can be seen that the existing video jitter detection technology has the problems of low sensitivity, low robustness and unsuitability for complex scenes.

Disclosure of Invention

The embodiment of the application provides a video jitter detection method, a video jitter detection device and a video jitter detection system, which are used for solving the technical problems of low sensitivity, low robustness and inapplicability to complex scenes in the existing video jitter detection technology.

In a first aspect, to solve the foregoing technical problem, an embodiment of the present application provides a video jitter detection method, where a technical solution of the method is as follows:

acquiring any two adjacent frames of images in a video stream, and performing pixel value compression processing on characteristic images of the two frames of images;

performing pixel value matching on the two frames of characteristic images after the pixel value compression processing, and acquiring a first jitter degree between the two frames of images according to the pixel value matching result, wherein the first jitter degree is used for representing the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the previous frame of image;

acquiring a second jitter degree between the two frames of images, wherein the second jitter degree is a value obtained after the first jitter degree is adjusted based on an offset distance and an offset direction between a pixel point in the next frame of image and a corresponding pixel point in a preset area in the previous frame of image;

determining whether any two adjacent frame images in the video stream shift according to a second jitter degree between any two adjacent frame images in the video stream, and if so, determining the frame number of continuous frame images with the same shift direction in the video stream;

and judging whether the frame number is smaller than a first preset threshold value, if so, determining that the video stream shakes, and if not, determining that the video stream does not shake.

In the embodiment of the application, pixel value compression processing may be performed on the feature images of two frames of images, pixel value matching may be performed on the two frames of feature images after the pixel value compression processing, according to a pixel value matching result, a first jitter degree between the two frames of images may be obtained, and then a second jitter degree between the two frames of images may be obtained after the first jitter degree is adjusted based on an offset distance and an offset direction between a pixel point in a next frame of image and a corresponding pixel point in a preset region in a previous frame of image, so that the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the previous frame of image may be stably and comprehensively reflected, and the sensitivity is higher and the robustness is stronger. And finally, determining the frame number of continuous frame images with the same offset direction in the video stream according to a second jitter degree between any two adjacent frame images in the video stream, and determining whether the reason causing the offset of any two adjacent frame images in the video stream is the movement of a camera or not according to whether the frame number of the continuous frame images with the same offset direction in the video stream is smaller than a first preset threshold value or not, namely filtering false detections such as foreground movement, video angle conversion and the like through jitter periodic variation, so that the method is suitable for complex scenes.

In an optional implementation manner, before performing pixel value compression processing on the feature images of the two frames of images, the method further includes:

performing texture detection and texture expansion processing on the two frames of images to obtain texture maps and texture expansion maps corresponding to the two frames of images;

judging whether the texture number on the texture map is larger than a second preset threshold value or not;

if so, determining a texture image corresponding to the frame image as a characteristic image of the frame image;

and if not, determining that the texture expansion image corresponding to the frame image is the characteristic image of the frame image.

In an alternative embodiment, the pixel value compression processing on the feature image of the two frames of images includes:

adopting a first formula to compress the pixel values of the characteristic images of the two frames of images;

the first formula specifically includes:

wherein, CF_i,jFor the pixel value at (i, j) of the feature image after the pixel value compression processing, F_i,jFor the pixel value at (i, j) of the feature image, M is the length of the image in the x direction and N is the length of the image in the y direction.

In an optional implementation manner, performing pixel value matching on the two frames of feature images after the pixel value compression processing, and obtaining a first jitter degree between the two frames of feature images according to a pixel value matching result includes:

adopting a second formula to carry out pixel value matching on the two frames of characteristic images after the pixel value compression processing, and acquiring a first jitter degree between the two frames of images according to the pixel value matching result;

the second formula specifically includes:

wherein J is a first degree of shake between two frame images, J_x、J_yThe jitter degrees in the x direction and the y direction respectively, Dis () is that pixel value matching is carried out through a preset sliding window, pixel rows/columns in the previous frame image corresponding to the pixel rows/columns in the next frame image are determined, urban distance calculation is carried out according to the corresponding pixel rows/columns, CF_m,WThe m-th line pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_i,WThe pixel value of the ith row with the length W of the characteristic image after the pixel value compression processing of the previous frame image, CF_W,nThe n-th column pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_W,jThe length of the characteristic image after the pixel value compression processing of the previous frame image is the ith column pixel value of W, M is 1,2, …, M-W is the starting point of the image x direction, N is 1,2, …, N-W is the starting point of the image y direction, and M is the length of the image x directionAnd degree, wherein N is the length of the image in the y direction, and W is the size of the sliding window.

In an alternative embodiment, obtaining a second degree of judder between the two frame images comprises:

acquiring absolute region jitter degree or average region jitter degree between the two frames of images, wherein the absolute region jitter degree is the offset distance and offset direction between a pixel point in a next frame of image and a corresponding pixel point in an absolute region in a previous frame of image, and the average region jitter degree is the offset distance and offset direction between the pixel point in the next frame of image and the corresponding pixel point in the average region in the previous frame of image;

if the average area jitter degree is obtained, the second jitter degree between the two frames of images is the accumulation of the first jitter degree and the average area jitter degree;

and if the average area jitter degree is not obtained, the second jitter degree between the two frames of images is the accumulation of the first jitter degree and the absolute area jitter degree.

In an alternative embodiment, obtaining the absolute region shaking degree between the two frame images includes:

acquiring the absolute area jitter degree between the two frames of images by adopting a third formula;

the third formula specifically includes:

wherein, J_absFor the absolute region shake degree between two frame images, F_m,nPixel value, F ', at (m, n) of a feature image of the following frame image'_i,jPixel value at (i, j) of the feature image of the previous frame image, M is the length of the image in x direction, N is the length of the image in y direction, dis (min ()) is determined within the absolute area and F_m,nF 'with minimum difference'_i,jAnd to F_m,nAnd F'_i,jCalculating the urban distance, wherein the absolute areas are m-1, m, m +1,j is n-1, n, n +1 respectively.

In an alternative embodiment, obtaining the average region jitter degree between the two images comprises:

acquiring the average region jitter degree between the two frames of images by adopting a fourth formula;

the fourth formula is specifically as follows:

wherein, J_avgIs the average degree of regional judder between two frame images, F_m,nThe pixel value at (m, n) of the feature image of the subsequent frame image, F_i' is a set of pixel values in the i direction corresponding to (M, N) of the feature image of the previous frame image, mean () is a mean value calculation, M is a length in the x direction of the image, N is a length in the y direction of the image, dis (min ()) is a difference between F and M determined in the average region_m,nMean (F) with minimum difference_i') and F_m,nAnd F_i' city distance calculation is performed, and the average area is the directions of i, namely, upper left, upper right, upper left, right, lower left, lower right and lower right.

In a second aspect, a video jitter detection apparatus is provided, including:

the processing module is used for acquiring any two adjacent frames of images in a video stream and carrying out pixel value compression processing on the characteristic images of the two frames of images;

the first obtaining module is used for carrying out pixel value matching on the two frames of characteristic images after the pixel value compression processing, and obtaining a first jitter degree between the two frames of images according to the pixel value matching result, wherein the first jitter degree is used for representing the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the previous frame of image;

a second obtaining module, configured to obtain a second jitter degree between the two frames of images, where the second jitter degree is a value obtained after adjusting the first jitter degree based on an offset distance and an offset direction between a pixel point in the next frame of image and a corresponding pixel point in a preset region in the previous frame of image;

a first determining module, configured to determine whether an offset occurs between any two adjacent frames of images in the video stream according to a second jitter degree between any two adjacent frames of images in the video stream, and if the offset occurs, determine the number of frames of consecutive frames of images in the video stream, where the offset directions of the consecutive frames of images are the same;

and the second determining module is used for judging whether the frame number is smaller than a first preset threshold value, if so, determining that the video stream shakes, and if not, determining that the video stream does not shake.

In an optional embodiment, the apparatus further comprises a third determining module configured to:

In an optional implementation manner, the processing module is specifically configured to:

the first formula specifically includes:

In an optional implementation manner, the first obtaining module is specifically configured to:

the second formula specifically includes:

wherein J is a first degree of shake between two frame images, J_x、J_yThe jitter degrees in the x direction and the y direction respectively, Dis () is that pixel value matching is carried out through a preset sliding window, pixel rows/columns in the previous frame image corresponding to the pixel rows/columns in the next frame image are determined, urban distance calculation is carried out according to the corresponding pixel rows/columns, CF_m,WThe m-th line pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_i,WThe pixel value of the ith row with the length W of the characteristic image after the pixel value compression processing of the previous frame image, CF_W,nThe n-th column pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_W,jThe length of the characteristic image after the pixel value compression processing of the previous frame image is the ith column pixel value of W, M is 1,2, …, M-W is the starting point of the image x direction, N is 1,2, …, N-W is the starting point of the image y direction, M is the length of the image x direction, N is the length of the image y direction, and W is the sliding window size.

In an optional implementation manner, the second obtaining module is specifically configured to:

the third formula specifically includes:

wherein, J_absFor the absolute region shake degree between two frame images, F_m,nPixel value, F ', at (m, n) of a feature image of the following frame image'_i,jPixel value at (i, j) of the feature image of the previous frame image, M is the length of the image in x direction, N is the length of the image in y direction, dis (min ()) is determined within the absolute area and F_m,nF 'with minimum difference'_i,jAnd to F_m,nAnd F'_i,jAnd (5) carrying out urban distance calculation, wherein the absolute areas are i, m is-1, m is m +1, j is n-1, n is n + 1.

the fourth formula is specifically as follows:

wherein, J_avgIs the average degree of regional judder between two frame images, F_m,nThe pixel value at (m, n) of the feature image of the subsequent frame image, F_i' is a feature map of a previous frame imageThe pixel value set of i direction corresponding to (M, N) of the image, mean () is the mean value calculation, M is the length of the image in x direction, N is the length of the image in y direction, dis (min ()) is the average area determined and F_m,nMean (F) with minimum difference_i') and F_m,nAnd F_i' city distance calculation is performed, and the average area is the directions of i, namely, upper left, upper right, upper left, right, lower left, lower right and lower right.

In a third aspect, an embodiment of the present application provides a video jitter detection system, including:

a memory for storing program instructions;

and the processor is used for calling the program instructions stored in the memory and executing the steps included in any one of the implementation modes of the first aspect according to the obtained program instructions.

In a fourth aspect, embodiments of the present application provide a storage medium storing computer-executable instructions for causing a computer to perform the steps included in any one of the embodiments of the first aspect.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application.

Fig. 1 is a schematic structural diagram of a video jitter detection system according to an embodiment of the present application;

FIG. 2-1 is a flowchart illustrating a video jitter detection method according to an embodiment of the present application;

FIG. 2-2 is a diagram illustrating pixel value compression processing performed on an image according to an embodiment of the present application;

FIGS. 2-3 are schematic diagrams illustrating the obtaining of the absolute region jitter degree between two frames of images according to the embodiment of the present application;

FIGS. 2-4 are schematic diagrams illustrating obtaining the average region jitter between two images according to the embodiment of the present application;

FIG. 3 is a schematic structural diagram of an apparatus for detecting video jitter according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of a video jitter detection system in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described can be performed in an order different than here.

The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof, which are intended to cover non-exclusive protection. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

In the embodiments of the present application, "at least one" may mean at least two, for example, two, three, or more, and the embodiments of the present application are not limited.

In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document generally indicates that the preceding and following related objects are in an "or" relationship unless otherwise specified.

Currently, there are four main categories of existing video jitter detection methods: a gray projection method, an image block matching method, a feature point matching method, and an LK optical flow method. The gray projection method has the defects that short-time rapid movement of a plurality of objects in a monitoring picture is mistaken for video jitter; the image block matching method and the feature point matching method have the disadvantages that effective detection cannot be carried out on a monitoring scene with clean texture, for example, the background of a monitoring picture is a pure-color wall or floor, on one hand, feature point detection is difficult to carry out, and on the other hand, all areas in the picture are very similar; the LK optical flow method has two disadvantages, one is that the calculation speed is slow, the real-time analysis requirement of the monitoring video is not satisfied, and the other is that the LK optical flow method belongs to the sparse optical flow algorithm, which depends heavily on the feature point detection, and the effect is poor under the condition that the feature point can not be effectively positioned. Therefore, the prior art has the problems of low sensitivity, low robustness and unsuitability for complex scenes.

In view of this, an embodiment of the present application provides a video jitter detection method, which may first perform pixel value compression on feature images of two frames of images, perform pixel value matching on the two frames of feature images after the pixel value compression, obtain a first jitter degree between the two frames of images according to a pixel value matching result, and then adjust the first jitter degree based on an offset distance and an offset direction between a pixel point in a next frame of image and a corresponding pixel point in a preset region in a previous frame of image to obtain a second jitter degree between the two frames of images, so that the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the previous frame of image can be stably and comprehensively reflected, and the method has high sensitivity and high robustness. And finally, determining the frame number of continuous frame images with the same offset direction in the video stream according to a second jitter degree between any two adjacent frame images in the video stream, and determining whether the reason causing the offset of any two adjacent frame images in the video stream is the movement of a camera or not according to whether the frame number of the continuous frame images with the same offset direction in the video stream is smaller than a first preset threshold value or not, namely filtering false detections such as foreground movement, video angle conversion and the like through jitter periodic variation, so that the method is suitable for complex scenes.

In order to better understand the technical solutions, the technical solutions of the present application are described in detail below through the drawings and the specific embodiments of the specification, and it should be understood that the specific features of the embodiments and examples of the present application are detailed descriptions of the technical solutions of the present application, and are not limitations of the technical solutions of the present application, and the technical features of the embodiments and examples of the present application may be combined with each other without conflict.

Fig. 1 is a structure of a video shake detection system to which the method provided in the embodiment of the present application is applicable, but it should be understood that the video shake detection system shown in fig. 1 is a detailed description of a video shake detection system to which the method provided in the embodiment of the present application is applicable, and is not a limitation of a video shake detection system to which the method provided in the embodiment of the present application is applicable.

The video jitter detection system shown in fig. 1 comprises a memory 101, a processor 102, a bus interface 103. The memory 101 and the processor 101 are connected via a bus interface 103. The memory 101 is used to store program instructions. The processor 102 is configured to call the program instructions stored in the memory 101, and execute all steps included in the video jitter detection method according to the obtained program instructions.

Referring to fig. 2-1, a video shake detection method according to an embodiment of the present application is provided, which can be executed by the video shake detection system shown in fig. 1. The specific flow of the method is described below.

Step 201: acquiring any two adjacent frames of images in a video stream, and performing pixel value compression processing on the characteristic images of the two frames of images.

In the embodiment of the present application, any two adjacent frames of images in a video stream are obtained first, feature images of the two frames of images are determined, and specifically, the two adjacent frames of images in the video stream, for example, a first frame of image and a second frame of image in the video stream, texture detection and texture expansion processing are performed on the first frame of image and the second frame of image, and texture images and texture expansion images corresponding to the first frame of image and the second frame of image are obtained, where the texture detection may be canny texture detection, and the texture expansion processing may be canny texture expansion processing, and whether a texture number on a texture image corresponding to the first frame of image and the second frame of image is greater than a second preset threshold is determined, if so, the texture image corresponding to the frame of image is determined to be the feature image of the frame of image, and if not, the texture expansion image corresponding to the frame of image is determined to be the feature image of the frame of image. For ease of understanding, the following description is given by way of example:

for example, canny texture detection and canny texture dilation processing is performed on the first frame image: firstly, the original gray level image of the first frame image is denoised by Gaussian filtering, the denoising treatment brings the side effect of image blurring, and the image blurring is caused by the fact that the outline of an object in the image is not obvious, the gray level change of the edge of the outline is not strong, and the layering sense is not strong, so that the image is clearer due to the fact that the gray level change of the edge of the outline is obvious, the layering sense is strong, and a formula can be adopted after the denoising treatment is carried out on the original gray level image of the first frame image

Calculating the gradient strength and the direction of the first frame of image after Gaussian filtering, wherein G is the gradient strength, theta is the gradient direction, and f is the gradient direction_x、f_yDetermining the gradient in the x direction and the gradient in the y direction of the image respectively, namely determining the size and the direction of the gray change rate of the first frame image, then performing non-maximum suppression processing on the first frame image after Gaussian filtering according to the obtained gradient strength and direction to obtain a canny texture map corresponding to the first frame image, and then processing the canny texture map corresponding to the first frame image by using a sliding window to obtain a canny texture expansion map corresponding to the first frame image.

After determining the feature images of the two frames of images, as shown in fig. 2-2, a first formula is used to perform pixel value compression processing on the feature images of the two frames of images, specifically, the first formula is:

Step 202: and performing pixel value matching on the two frames of characteristic images after the pixel value compression processing, and acquiring a first jitter degree between the two frames of images according to the pixel value matching result, wherein the first jitter degree is used for representing the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the previous frame of image.

In this embodiment of the present application, a second formula is used to perform pixel value matching on two frames of feature images after pixel value compression processing, and a first jitter degree between the two frames of images is obtained according to a pixel value matching result, where the first jitter degree is used to represent an offset distance and an offset direction between a pixel point in a next frame of image and a corresponding pixel point in a previous frame of image, and specifically, the second formula is:

wherein J is a first degree of shake between two frame images, J_x、J_yThe jitter degrees in the x direction and the y direction respectively, Dis () is that pixel value matching is carried out through a preset sliding window, pixel rows/columns in the previous frame image corresponding to the pixel rows/columns in the next frame image are determined, urban distance calculation is carried out according to the corresponding pixel rows/columns, CF_m,WThe m-th line pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_i,WThe pixel value of the ith row with the length W of the characteristic image after the pixel value compression processing of the previous frame image, CF_W,nThe n-th column pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_W,jIs the former oneThe length of the characteristic image after the pixel value compression processing of the frame image is the ith column pixel value of W, M is 1,2, …, M-W is the starting point of the image x direction, N is 1,2, …, N-W is the starting point of the image y direction, M is the length of the image x direction, N is the length of the image y direction, and W is the sliding window size.

For ease of understanding, the following description is given by way of example:

for example, Dis (CF) when m is 1 and W is 16_m,W,CF′_i,W) Pixel value matching is respectively carried out on the 1 st pixel row with the length of 16 of the characteristic image after the pixel value compression processing of the next frame image and the 2 nd to 17 th pixel rows with the length of 16 of the characteristic image after the pixel value compression processing of the previous frame image, the pixel row in the previous frame image corresponding to the 1 st pixel row in the next frame image is determined as the 6 th pixel row in the previous frame image, and the city distance between the 1 st pixel row in the next frame image and the 6 th pixel row in the previous frame image is calculated.

Step 203: and acquiring a second jitter degree between the two frames of images, wherein the second jitter degree is a value obtained after the first jitter degree is adjusted based on the offset distance and the offset direction between the pixel point in the next frame of image and the corresponding pixel point in the preset area in the previous frame of image.

In the embodiment of the application, the absolute region shaking degree between two frame images is obtained by adopting a third formula or the average region shaking degree between two frame images is obtained by adopting a fourth formula, wherein, the absolute region jitter degree is the offset distance and offset direction between the pixel point in the next frame image and the corresponding pixel point in the absolute region in the previous frame image, the average region jitter degree is the offset distance and offset direction between the pixel point in the next frame image and the corresponding pixel point in the average region in the previous frame image, if the average region jitter degree is obtained, the second jitter level between the two frames is the summation of the first jitter level and the average regional jitter level, and if the average regional jitter level is not obtained, the second degree of shake between the two frame images is an accumulation of the first degree of shake and the absolute area degree of shake.

Specifically, the third formula is:

The fourth formula is:

for example, as shown in FIGS. 2-3, F_2,2Pixel value at (2,2), F, of the feature image of the subsequent frame image_2,2＝74，F′_2,1Is the pixel value at (2,1) of the feature image of the previous frame image, F'_2,174, then dis (min (F)_2,2,F′_i,j) Means to determine the characteristics of the previous frame imageNeutralization F in pixel values at (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2) and (3,3) of the image_2,2The pixel value having the smallest difference value is F'_2,1And to F_2,2And F'_2,1Performing urban distance calculation, and determining that the offset distance between a pixel point in the next frame of image and a corresponding pixel point in the absolute area in the previous frame of image is 1 and the offset direction is leftward;

as shown in FIGS. 2-4, F_2,2Pixel value at (2,2), F, of the feature image of the subsequent frame image_2,2＝74，F′_{Upper left of}Is a pixel value set, F ', in the upper left direction corresponding to (2,2) of the feature image of the previous frame image'_{Upper left of}＝{16,19,74,0}，F′_{On the upper part}Is the corresponding pixel value set of the upper direction at (2,2) of the feature image of the previous frame image, F'_{On the upper part}＝{19,0}，F′_{Upper right part}Is the corresponding pixel value set, F ', in the upper right direction at (2,2) of the feature image of the previous frame image'_{Upper right part}＝{19,11,0,17}，F′_{Left side of}Is a pixel value set, F ', in the left direction corresponding to (2,2) of the feature image of the previous frame image'_{Left side of}＝{74,0}，F′_{Right side}Is a pixel value set, F ', in the corresponding right direction at (2,2) of the feature image of the previous frame image'_{Right side}＝{0,17}，F′_{Left side of}′_{Lower part}Is a pixel value set, F ', in the corresponding lower left direction at (2,2) of the feature image of the previous frame image'_{Left lower part}＝{74,0,23,0}，F′_{Lower part}Is the corresponding pixel value set of the lower direction at (2,2) of the feature image of the previous frame image, F'_{Lower part}＝{0,0}，F′_{Lower right}Is a pixel value set, F ', in the corresponding lower-right direction at (2,2) of the feature image of the previous frame image'_{Lower right}＝{0,17,0,8}，dis(min(F_2,2,mean(F_i')) represents a plurality of pixel value sets F ' determined to be included in the average region '_{Upper left of}、F′_{On the upper part}、F′_{Upper right part}、F′_{Left side of}、F′_{Right side}、F′_{Left lower part}、F′_{Lower part}、F′_{Lower right}Respectively corresponding pixel value set mean sum F_2,2The set mean of the pixel values with the smallest difference value is mean (F'_{Left side of}) And to F_m,nAnd the set of pixel values F'_{Left side of}The urban distance calculation is carried out on the pixel values in the previous frame of image, the offset distance between the pixel point in the next frame of image and the corresponding pixel point in the average region in the previous frame of image is determined to be 1, and the offset direction is leftward, namely the average region is determined by comparing a plurality of regions to determine a reasonable region to determine the offset distance and the offset direction.

Step 204: and determining whether any two adjacent frame images in the video stream shift according to a second jitter degree between any two adjacent frame images in the video stream, and if so, determining the frame number of continuous frame images with the same shift direction in the video stream.

In the embodiment of the present application, according to a second jitter degree between any two adjacent frame images in a video stream, it is determined whether an offset occurs between any two adjacent frame images in the video stream, if no offset occurs, it is determined that the video stream is not jittered, and if an offset occurs, it is determined that the number of frames of consecutive frame images in the video stream, which have the same offset direction, is determined, and for convenience of understanding, the following description is made by way of example:

for example, if the second, third, fourth, fifth, sixth and seventh frame images in the video stream are all shifted to the left, it is determined that there are 6 consecutive frame images in the video stream with the same shift direction;

if the second, third, fourth, sixth and seventh frame images in the video stream all shift to the left and the fifth frame image shifts to the right, determining that 2 frames and 3 frames of continuous frame images with the same shift direction exist in the video stream;

and if the second frame image is shifted to the left, the third frame image is shifted to the right, the fourth frame image is shifted to the left and the fifth frame image is shifted to the right, determining that no continuous frame image with the same shifting direction exists in the video stream.

Step 205: and judging whether the frame number is smaller than a first preset threshold value, if so, determining that the video stream shakes, and if not, determining that the video stream does not shake.

In the embodiment of the present application, it is determined whether the number of frames of consecutive frame images with the same offset direction in a video stream is smaller than a first preset threshold, if yes, it is determined that the video stream is jittered, and if not, it is determined that the video stream is not jittered, so that false detections such as foreground movement and video angle conversion are filtered through jitter periodicity change, and the method is suitable for a complex scene, and for understanding, the following description is given in an exemplary manner:

for example, if the first preset threshold is 5, the number of frames of consecutive frame images with the same offset direction in the video stream is 6, that is, the number of frames of consecutive frame images with the same offset direction in the video stream is greater than the first preset threshold, it is determined that the frame images in the video stream are offset in the same direction for a long time, and the reason that any two adjacent frame images in the video stream are offset is foreground movement or video angle transformation, and the video stream is not jittered;

if the first preset threshold value is 5, the number of frames of the continuous frame images with the same offset direction in the video stream is 4, that is, the number of frames of the continuous frame images with the same offset direction in the video stream is smaller than the first preset threshold value, it is determined that the frame images in the video stream do not offset towards the same direction for a long time, and the reason that any two adjacent frame images in the video stream are offset is that the camera head moves and the video stream shakes;

if the first preset threshold is 5, the number of frames of the continuous frame images with the same offset direction in the video stream is 0, that is, the number of frames of the continuous frame images with the same offset direction in the video stream is smaller than the first preset threshold, it is determined that the frame images in the video stream do not offset in the same direction for a long time, and the reason that any two adjacent frame images in the video stream are offset is that the camera head moves and the video stream shakes.

Based on the same inventive concept, embodiments of the present application provide a video shake detection apparatus, which can implement a function corresponding to the video shake detection method. The video jitter detection means may be a hardware structure, a software module, or a hardware structure plus a software module. The video jitter detection device can be realized by a chip system, and the chip system can be formed by a chip and can also comprise the chip and other discrete devices. Referring to fig. 3, the video jitter detection apparatus includes a processing module 301, a first obtaining module 302, a second obtaining module 303, a first determining module 304, and a second determining module 305, wherein:

the processing module 301 is configured to obtain any two adjacent frames of images in a video stream, and perform pixel value compression processing on feature images of the two frames of images;

a first obtaining module 302, configured to perform pixel value matching on the two frames of feature images after the pixel value compression processing, and obtain a first jitter degree between the two frames of images according to the pixel value matching result, where the first jitter degree is used to indicate an offset distance and an offset direction between a pixel point in a next frame of image and a corresponding pixel point in a previous frame of image;

a second obtaining module 303, configured to obtain a second jitter degree between the two frames of images, where the second jitter degree is a value obtained after adjusting the first jitter degree based on an offset distance and an offset direction between a pixel point in the next frame of image and a corresponding pixel point in a preset region in the previous frame of image;

a first determining module 304, configured to determine whether an offset occurs between any two adjacent frames of images in the video stream according to a second jitter degree between any two adjacent frames of images in the video stream, and if the offset occurs, determine the number of frames of consecutive frames of images in the video stream that have the same offset direction;

a second determining module 305, configured to determine whether the frame number is smaller than a first preset threshold, if so, determine that the video stream jitters, and if not, determine that the video stream is not jittered.

In an optional implementation manner, the processing module 301 is specifically configured to:

the first formula specifically includes:

In an optional implementation manner, the first obtaining module 302 is specifically configured to:

the second formula specifically includes:

wherein J is a first degree of shake between two frame images, J_x、J_yThe jitter degrees in the x direction and the y direction respectively, Dis () is that pixel value matching is carried out through a preset sliding window, pixel rows/columns in the previous frame image corresponding to the pixel rows/columns in the next frame image are determined, urban distance calculation is carried out according to the corresponding pixel rows/columns, CF_m,WThe m-th line pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_i,WThe pixel value of the ith row with the length W of the characteristic image after the pixel value compression processing of the previous frame image, CF_W,nCompressing the length of the processed characteristic image for the pixel value of the next frame imageIs n-th column pixel value of W, CF'_W,jThe length of the characteristic image after the pixel value compression processing of the previous frame image is the ith column pixel value of W, M is 1,2, …, M-W is the starting point of the image x direction, N is 1,2, …, N-W is the starting point of the image y direction, M is the length of the image x direction, N is the length of the image y direction, and W is the sliding window size.

In an optional implementation manner, the second obtaining module 303 is specifically configured to:

the third formula specifically includes:

wherein, J_absFor the absolute region shake degree between two frame images, F_m,nPixel value, F ', at (m, n) of a feature image of the following frame image'_i,jIs the pixel value at (i, j) of the feature image of the previous frame image, M is the length of the image in the x direction, N is the length of the image in the y direction, dis (min ()) is determined within the absolute region with F_m,nF 'with minimum difference'_i,jAnd to F_m,nAnd F'_i,jAnd (5) carrying out urban distance calculation, wherein the absolute areas are i, m is-1, m is m +1, j is n-1, n is n + 1.

the fourth formula is specifically as follows:

Based on the same inventive concept, an embodiment of the present application provides a video jitter detection system, please refer to fig. 4, where the video jitter detection system includes at least one processor 402 and a memory 401 connected to the at least one processor, a specific connection medium between the processor 402 and the memory 401 is not limited in this embodiment of the present application, fig. 4 illustrates that the processor 402 and the memory 401 are connected by a bus 400, the bus 400 is represented by a thick line in fig. 4, and a connection manner between other components is only schematically illustrated and not limited thereto. The bus 400 may be divided into an address bus, a data bus, a control bus, etc., and is shown with only one thick line in fig. 4 for ease of illustration, but does not represent only one bus or type of bus.

In the embodiment of the present application, the memory 401 stores instructions executable by the at least one processor 402, and the at least one processor 402 may perform the steps included in the foregoing video jitter detection method by calling the instructions stored in the memory 401. The processor 402 is a control center of the video jitter detection system, and may be connected to various parts of the entire video jitter detection system through various interfaces and lines, and implement various functions of the video jitter detection system by executing instructions stored in the memory 401. Optionally, the processor 402 may include one or more processing units, and the processor 402 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 402. In some embodiments, processor 402 and memory 401 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.

Memory 401, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The Memory 401 may include at least one type of storage medium, and may include, for example, a flash Memory, a hard disk, a multimedia card, a card-type Memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), a charge Erasable Programmable Read Only Memory (EEPROM), a magnetic Memory, a magnetic disk, an optical disk, and so on. The memory 401 is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to such. The memory 401 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function for storing program instructions and/or data.

The processor 402 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the video jitter detection method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.

By programming the processor 402, the code corresponding to the video jitter detection method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the video jitter detection method when running, and how to program the processor 402 is a technique known by those skilled in the art, and is not described herein again.

Based on the same inventive concept, embodiments of the present application further provide a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the steps of the video jitter detection method as described above.

In some possible embodiments, the aspects of the video shake detection method provided by the present application may also be implemented in the form of a program product, which includes program code for causing a video shake detection system to perform the steps of the video shake detection method according to various exemplary embodiments of the present application described above in this specification, when the program product is run on the video shake detection system.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A video jitter detection method, comprising:

2. The method according to claim 1, wherein before the pixel value compression processing is performed on the feature images of the two frames of images, the method further comprises:

3. The method according to claim 2, wherein the pixel value compression processing is performed on the feature images of the two frame images, and comprises:

the first formula specifically includes:

wherein, CF_i，jFor the pixel value at (i, j) of the feature image after the pixel value compression processing, F_i，jFor the pixel value at (i, j) of the feature image, M is the length of the image in the x direction and N is the length of the image in the y direction.

4. The method according to claim 1, wherein performing pixel value matching on the two frames of feature images after the pixel value compression processing, and obtaining a first jitter degree between the two frames of feature images according to the pixel value matching result comprises:

the second formula specifically includes:

wherein J is a first degree of shake between two frame images, J_x、J_yThe jitter degrees in the x direction and the y direction respectively, Dis () is that pixel value matching is carried out through a preset sliding window, pixel rows/columns in the previous frame image corresponding to the pixel rows/columns in the next frame image are determined, urban distance calculation is carried out according to the corresponding pixel rows/columns, CF_m，WThe m-th line pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_i，WThe pixel value of the ith row with the length W of the characteristic image after the pixel value compression processing of the previous frame image, CF_W，nThe n-th column pixel value, CF ', of the feature image after pixel value compression processing of the subsequent frame image'_W，jThe length of a feature image after pixel value compression processing of a previous frame image is the ith row of pixel values of W, wherein M is 1, 2.

5. The method of any of claims 1-4, wherein obtaining a second degree of jitter between the two frame images comprises:

6. The method of claim 5, wherein obtaining the absolute region dither level between the two images comprises:

the third formula specifically includes:

wherein, J_absFor the absolute region shake degree between two frame images, F_m，nPixel value, F ', at (m, n) of a feature image of the following frame image'_i，jPixel value at (i, j) of the feature image of the previous frame image, M is the length of the image in x direction, N is the length of the image in y direction, dis (min ()) is determined within the absolute area and F_m，nF 'with minimum difference'_i，jAnd to F_m，nAnd F'_i，jAnd (5) carrying out urban distance calculation, wherein the absolute areas are i, m is-1, m is m +1, j is n-1, n is n + 1.

7. The method of claim 6, wherein obtaining an average region jitter level between the two images comprises:

the fourth formula is specifically as follows:

wherein, J_avgIs the average degree of regional judder between two frame images, F_m，nThe pixel value at (m, n) of the feature image of the subsequent frame image, F_i' is a set of pixel values in the i direction corresponding to (M, N) of the feature image of the previous frame image, mean () is a mean value calculation, M is a length in the x direction of the image, N is a length in the y direction of the image, dis (min ()) is a difference between F and M determined in the average region_m，nMean (F) with minimum difference_i') and F_m，nAnd F_i' city distance calculation is performed, and the average area is the directions of i, namely, upper left, upper right, upper left, right, lower left, lower right and lower right.

8. A video judder detection device, comprising:

9. A video judder detection system, comprising:

a memory for storing program instructions;

a processor for calling program instructions stored in said memory and for executing the steps comprised by the method of any one of claims 1 to 7 in accordance with the obtained program instructions.

10. A storage medium storing computer-executable instructions for causing a computer to perform the steps comprising the method of any one of claims 1-7.