CN113905147A - Method and device for removing jitter of marine monitoring video picture and storage medium


Info

Publication number
CN113905147A
CN113905147A
Authority
CN
China
Prior art keywords
video frame
frame image
video
offset
images
Prior art date
Legal status
Granted
Application number
CN202111156440.2A
Other languages
Chinese (zh)
Other versions
CN113905147B (en)
Inventor
顾骏
王珅
彭梦兰
常健杰
唐红梅
黄永珍
罗勇
Current Assignee
Guilin Changhai Development Co ltd
Original Assignee
Guilin Changhai Development Co ltd
Priority date
Filing date
Publication date
Application filed by Guilin Changhai Development Co ltd
Priority to CN202111156440.2A
Publication of CN113905147A
Application granted
Publication of CN113905147B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/14 Picture signal circuitry for video frequency region
    • H04N 5/21 Circuitry for suppressing or minimising disturbance, e.g. moiré or halo

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, a device, and a storage medium for removing jitter from a marine surveillance video picture. The method comprises the following steps: preprocessing a plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images; analyzing the offsets between the reference frame image and the video frame images to obtain offset vectors; constructing perspective projection matrices from the offset vectors; and performing rendering transformation compensation on the video frame images according to the perspective projection matrices to obtain de-jittered video pictures. The invention solves the bottlenecks of current software de-jittering techniques, namely low computation speed and an unsatisfactory de-jittering result, so that even a computer with modest performance can de-jitter video in real time, while the de-jittering quality is improved and the clarity of the processed video is enhanced.

Description

Method and device for removing jitter of marine monitoring video picture and storage medium
Technical Field
The invention relates generally to the technical field of image processing, and in particular to a method, a device, and a storage medium for removing jitter from a marine surveillance video picture.
Background
In a shipboard video surveillance system, a camera fixedly mounted on the vessel vibrates because of hull resonance, wave impact, and similar factors, so the captured video picture shakes and blurs, degrading the viewing experience. The problem can be addressed in hardware or in software. The hardware approach installs one of the various stabilizer devices on the market, which is costly, less safe, and inconvenient to mount, and is therefore unsuitable for a shipboard video surveillance system. The software approach processes the shaking picture with computer image-processing methods and automatically transforms and compensates it, visually removing the jitter from the video. Existing software de-jittering techniques, however, have two problems: first, the computational load is heavy, the hardware requirements are too high, and real-time video processing is hard to achieve; second, the de-jittering result is not ideal, and the processed video remains blurry.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art and provides a method, a device, and a storage medium for removing jitter from a marine surveillance video picture.
The technical solution of the invention for solving the above technical problems is as follows: a method for removing jitter from a marine surveillance video picture comprises the following steps:
importing video data, wherein the video data comprises a plurality of video frame images to be processed, and preprocessing the plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
analyzing offset vectors of the reference frame image and the plurality of video frame images to obtain offset vectors corresponding to the video frame images;
constructing a perspective projection matrix corresponding to each video frame image through each offset vector;
and performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
Another technical solution of the present invention for solving the above technical problems is as follows: a marine surveillance video picture de-jitter apparatus, comprising:
the image preprocessing module is used for importing video data, wherein the video data comprise a plurality of video frame images to be processed, and preprocessing the video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
the offset vector analysis module is used for analyzing offset vectors of the reference frame image and the video frame images to obtain the offset vectors corresponding to the video frame images;
the matrix construction module is used for constructing perspective projection matrixes corresponding to the video frame images through the offset vectors;
and the de-jittered video picture obtaining module, which is used for performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
Another technical solution of the present invention for solving the above technical problems is as follows: a marine surveillance video picture de-jittering apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the marine surveillance video picture de-jittering method described above is implemented when the computer program is executed by the processor.
Another technical solution of the present invention for solving the above technical problems is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the marine surveillance video picture de-jittering method described above.
The invention has the beneficial effects that a reference frame image and a plurality of video frame images are obtained by preprocessing the video frame images to be processed; offset vectors are obtained by analyzing the offsets between the reference frame image and the video frame images; perspective projection matrices are constructed from the offset vectors; and de-jittered video pictures are obtained by performing rendering transformation compensation on the video frame images according to the perspective projection matrices.
Drawings
Fig. 1 is a schematic flowchart of a method for removing jitter from a marine surveillance video picture according to an embodiment of the present invention;
Fig. 2 is a block diagram of a marine surveillance video picture de-jittering apparatus according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the accompanying drawings; the examples are given by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart illustrating a method for removing jitter from a marine surveillance video picture according to an embodiment of the present invention.
As shown in fig. 1, a method for removing jitter from a marine surveillance video picture includes the following steps:
importing video data, wherein the video data comprises a plurality of video frame images to be processed, and preprocessing the plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
analyzing offset vectors of the reference frame image and the plurality of video frame images to obtain offset vectors corresponding to the video frame images;
constructing a perspective projection matrix corresponding to each video frame image through each offset vector;
and performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
In this embodiment, the reference frame image and the video frame images are obtained by preprocessing the video frame images to be processed; the offset vectors are obtained by analyzing the offsets between the reference frame image and the video frame images; the perspective projection matrices are constructed from the offset vectors; and the de-jittered video pictures are obtained by performing rendering transformation compensation on the video frame images according to the perspective projection matrices. This solves the bottlenecks of current software de-jittering techniques, namely low computation speed and an unsatisfactory de-jittering result, so that a computer with modest performance can de-jitter video in real time, while the de-jittering quality is improved and the clarity of the processed video is enhanced.
Optionally, as an embodiment of the present invention, the plurality of video frame images to be processed are sequentially arranged, and the process of preprocessing the plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images includes:
performing image format conversion on each video frame image to be processed into a set format to obtain a converted image corresponding to each video frame image to be processed;
scaling each converted image to obtain a scaled image corresponding to each video frame image to be processed;
and performing grayscale conversion on each scaled image to obtain the video frame images, the first video frame image being taken as the reference frame image.
It should be understood that the reference frame image is used for locating feature points, serving as the reference against which the motion change of subsequent frames is calculated.
It should be understood that the set format is the BGR24 format; that is, the YUV format of the video frame image to be processed is converted into an RGB format for subsequent image processing, BGR24 being one of the RGB family of formats.
It should be understood that, after real-time video image data (i.e., the video frame images to be processed) is acquired, each acquired frame (i.e., each video frame image to be processed) is converted into the BGR24 format, scaled to 1/4 size, and converted to grayscale to obtain a plurality of image data (i.e., the video frame images); the first frame of image data (i.e., the first video frame image) is put into memory as the de-jitter reference frame (i.e., the reference frame image).
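By way of illustration, the preprocessing described above can be sketched in Python with OpenCV as follows; the function name, the I420 input layout, and the interpolation choice are assumptions of this sketch rather than details fixed by the invention.

import cv2

SCALE = 0.25  # the 1/4 scaling described above

def preprocess(frame_yuv_i420):
    """Convert a YUV frame (I420 assumed) to BGR24, scale to 1/4, grayscale."""
    bgr = cv2.cvtColor(frame_yuv_i420, cv2.COLOR_YUV2BGR_I420)  # YUV -> BGR24
    small = cv2.resize(bgr, None, fx=SCALE, fy=SCALE,
                       interpolation=cv2.INTER_AREA)            # 1/4 scaling
    return cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)              # grayscale

# The first preprocessed frame is kept in memory as the de-jitter reference:
# reference = preprocess(first_frame)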
In the above embodiment, a converted image is obtained by converting the image format of each video frame image to be processed into the set format; a scaled image is obtained by scaling each converted image; the video frame images are obtained by grayscale-converting each scaled image, with the first video frame image taken as the reference frame image. This provides accurate data for the subsequent processing, improves the de-jittering quality, enhances the clarity of the processed video, and allows a computer with modest performance to de-jitter video in real time.
Optionally, as an embodiment of the present invention, the analyzing the offset vectors of the reference frame image and the plurality of video frame images to obtain the offset vector corresponding to each of the video frame images includes:
positioning the characteristic points of the reference frame image to obtain a plurality of reference characteristic points, and collecting all the reference characteristic points to obtain global characteristic points;
and carrying out feature point iterative computation on the plurality of video frame images and the global feature points through a pyramid optical flow method and the reference frame image to obtain a plurality of offset values corresponding to the video frame images, respectively collecting the plurality of offset values corresponding to the video frame images, and obtaining offset vectors corresponding to the video frame images.
It should be understood that the global feature points are subsequently used for estimating and tracking each feature point (i.e., each reference feature point).
It should be understood that the pyramid optical flow method is the Lucas-Kanade method, a widely used differential method of optical flow estimation; it assumes that the optical flow is essentially constant in a neighborhood of each pixel and then solves the basic optical flow equations for all pixels in that neighborhood by the least squares method.
Specifically, 100-200 goodFeatures feature points are located in the reference frame (i.e., the reference frame image), and each subsequent frame image tracks these feature points (i.e., the reference feature points); the feature points are obtained according to a specified rule, and all of them together constitute the global feature points. An LK optical flow pyramid is established according to the improved optical flow method (i.e., the pyramid optical flow method), and the global feature points of the previous frame are put into the lowest layer of the pyramid (that is, the obtained feature point set, i.e., the global feature points, is fed only into the lowest pyramid layer, as the input data).
It should be understood that feature points are located in the reference frame image so that each subsequent frame can be tracked; the specified rule is to acquire a number of points at random and then screen out, as feature points meeting the requirements (i.e., the reference feature points), those whose gray values differ from the surrounding neighborhood by more than a certain range, keeping at least 100 and at most 200 of them.
It should be understood that the two frames that have completed feature point location (i.e., the reference frame image and/or the video frame image) are sent into the LK optical flow pyramid for iterative feature point computation; layer 4 is the top layer of the pyramid, and after 4 iterations the PointTrack offset vector of the adjacent frames is obtained as the global motion amount (i.e., the offset vector).
It should be understood that, using the global feature points, 4 iterative computations are performed by the pyramid optical flow method; exploiting the fact that adjacent frames change little in brightness and in content displacement, the offsets of all feature points between the two frames (i.e., the offset vectors) are obtained. The PointTrack offset vector is the set of offsets of all feature points relative to the reference frame and is used to obtain the projective transformation matrix in the next step.
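By way of illustration, this feature location and pyramid tracking step can be sketched with OpenCV's goodFeaturesToTrack and calcOpticalFlowPyrLK; the 200-point budget and the pyramid depth follow the description above, while the quality, spacing, and window parameters are assumptions of the sketch.

import cv2

def locate_features(reference_gray):
    # Up to 200 corner points in the reference frame (quality and spacing
    # thresholds are illustrative assumptions).
    return cv2.goodFeaturesToTrack(reference_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=8)

def track_offsets(reference_gray, frame_gray, ref_points):
    # LK pyramid with layer 4 as the top layer, matching the 4 iterations
    # described above.
    cur_points, status, _err = cv2.calcOpticalFlowPyrLK(
        reference_gray, frame_gray, ref_points, None,
        winSize=(21, 21), maxLevel=4)
    good = status.ravel() == 1
    # The per-point offsets relative to the reference frame form the
    # "PointTrack offset vector" of the description.
    return ref_points[good], cur_points[good]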
In this embodiment, the global feature points are obtained by locating the feature points of the reference frame image, and the offset vectors are obtained by iterative feature point computation on the video frame images and the global feature points using the pyramid optical flow method and the reference frame image, so that each subsequent frame can be tracked. This preserves the clarity of every frame image and allows a computer with modest performance to de-jitter video in real time.
Optionally, as an embodiment of the present invention, the process of constructing a perspective projection matrix corresponding to each of the video frame images by using each of the offset vectors includes:
performing matrix construction on each offset vector by using the Matrix3D matrix tool to obtain a first homogeneous matrix corresponding to each video frame image and a second homogeneous matrix corresponding to each video frame image;
restoring each first homogeneous matrix and each second homogeneous matrix according to a preset initial scaling ratio to obtain a restored first homogeneous matrix corresponding to each video frame image and a restored second homogeneous matrix corresponding to each video frame image;
successively performing cyclic subtraction conversion on each restored first homogeneous matrix and each restored second homogeneous matrix to obtain a relative offset matrix corresponding to each video frame image;
and importing an initial scaling, and analyzing the offset value of each relative offset matrix through the initial scaling to obtain a perspective projection matrix corresponding to each video frame image.
Preferably, the initial scaling may be 90% of the original scaling.
It should be understood that the Matrix3D class represents a transformation matrix that determines the position and orientation of a three-dimensional (3D) display object. The matrix can perform transformation functions including translation (repositioning along the x, y, and z axes), rotation, and scaling (resizing). The Matrix3D class can also perform perspective projection, which maps points in a 3D coordinate space to a two-dimensional (2D) view.
It should be understood that the image scaling matrix Zoomer is computed from the offset vectors of adjacent frames; the scaling ratios of the two scaling matrices are constructed into homogeneous matrices; finally, after scaling compensation according to the original 1/4 ratio, the angle, magnification, and xyz-axis displacement difference parameters produced by the image transformation can be obtained, and these parameters are finally converted into a perspective projection matrix.
Specifically, the global motion amount (i.e., the offset vector) is put into the Matrix3D matrix tool to construct a homogeneous matrix H1 (i.e., the first homogeneous matrix) and a homogeneous matrix H2 (i.e., the second homogeneous matrix); the matrices (i.e., the first homogeneous matrix and the second homogeneous matrix) are restored according to the original 1/4 scaling; H1 (i.e., the restored first homogeneous matrix) and H2 (i.e., the restored second homogeneous matrix) are converted by cyclic subtraction into a relative offset matrix H; the default scaling Zoom (i.e., the initial scaling) is set to 90% of the original; and the angle offset value and the xyz-axis offset values of the current frame can then be obtained by calculation.
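The Matrix3D construction itself is not reproduced here; as a minimal stand-in sketch, the homogeneous matrix of a frame can be estimated from the tracked point pairs with OpenCV's estimateAffinePartial2D, its translation restored by the inverse of the 1/4 preprocessing scale, and the previous frame's matrix subtracted, on the assumption that this mirrors the cyclic subtraction described above; all names are illustrative, not the patent's actual implementation.

import cv2
import numpy as np

RESTORE = 4.0  # undoes the 1/4 scaling applied during preprocessing

def relative_offset_matrix(ref_pts, cur_pts, prev_H):
    # Rotation + translation + uniform scale estimated from point pairs.
    M, _inliers = cv2.estimateAffinePartial2D(ref_pts, cur_pts)
    H = np.vstack([M, [0.0, 0.0, 1.0]])  # 3x3 homogeneous form
    H[0, 2] *= RESTORE                   # restore x translation
    H[1, 2] *= RESTORE                   # restore y translation
    # Subtracting the previous frame's matrix plays the role of the
    # cyclic subtraction, yielding the relative offset matrix.
    return H - prev_H, H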
In the above embodiment, the perspective projection matrix corresponding to each video frame image is constructed from each offset vector, which solves the bottlenecks of current software de-jittering techniques, namely low computation speed and an unsatisfactory de-jittering result, so that a computer with modest performance can de-jitter video in real time, while the de-jittering quality is improved and the clarity of the processed video is enhanced.
Optionally, as an embodiment of the present invention, the analyzing the offset value of each relative offset matrix according to the initial scaling ratio to obtain a perspective projection matrix corresponding to each video frame image includes:
calculating the angle offset value of each relative offset matrix through a first formula to obtain the angle offset value corresponding to each video frame image, wherein the first formula is:
Δθ = arcsin(H(0,1)) * 180.f / pi,
where Δθ is the angle offset value, 180.f is the floating-point constant 180, pi is the circular constant π, and H(0,1) is the element in row 0, column 1 of the relative offset matrix;
calculating the x-axis offset value of each relative offset matrix through a second formula and the initial scaling to obtain the x-axis offset value corresponding to each video frame image, wherein the second formula is:
Δx = -H(0,2) * Zoom,
where Δx is the x-axis offset value, Zoom is the initial scaling, and H(0,2) is the element in row 0, column 2 of the relative offset matrix;
calculating the y-axis offset value of each relative offset matrix through a third formula and the initial scaling to obtain the y-axis offset value corresponding to each video frame image, wherein the third formula is:
Δy = -H(1,2) * Zoom,
where Δy is the y-axis offset value, Zoom is the initial scaling, and H(1,2) is the element in row 1, column 2 of the relative offset matrix;
calculating the z-axis offset value of each relative offset matrix through a fourth formula and the initial scaling to obtain the z-axis offset value corresponding to each video frame image, wherein the fourth formula is:
Δz = -H(2,2) * Zoom,
where Δz is the z-axis offset value, Zoom is the initial scaling, and H(2,2) is the element in row 2, column 2 of the relative offset matrix;
judging whether a condition is met, wherein the condition comprises: the difference between the angle offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset angle offset threshold; the difference between the x-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset x-axis offset threshold; the difference between the y-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset y-axis offset threshold; and the difference between the z-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset z-axis offset threshold,
if the conditions are met, reversely modifying the initial scaling according to a preset first modification value to obtain a first scaling, and taking the first scaling as the modified scaling;
if any condition is not met, modifying the initial scaling according to a preset second modification value to obtain a second scaling, and taking the second scaling as the modified scaling;
and performing matrix conversion on each angle offset value, x-axis offset value, y-axis offset value, and z-axis offset value together with the modified scaling to obtain the perspective projection matrix corresponding to each video frame image.
It should be appreciated that the image is scaled in order to reduce the black borders that appear around the image when the transformation compensates for the motion. The default zoom ratio is set to 90%; when the calculated angle or xyz-axis displacement difference suddenly becomes larger or smaller, the zoom ratio is modified, and as the change levels off, the zoom ratio is increased or decreased back to 90% frame by frame.
It should be understood that, when the calculated angle offset value and xyz-axis offset values change suddenly compared with the previous frame, the Zoom value (i.e., the initial scaling) is reversely modified according to a preset modification value (i.e., the preset first modification value); when the change is flat, the zoom ratio (i.e., the initial scaling) is modified back to 90% frame by frame in steps of +1 or -1. Finally, the angle offset value, the xyz-axis offset values, and the Zoom value (i.e., the modified scaling) are converted, according to the rules of perspective projection, into the perspective projection matrix corresponding to each frame.
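By way of illustration, the four formulas and the zoom adjustment can be sketched as follows; the threshold values, the step size, and the direction of the "reverse" modification are assumptions of the sketch, since the description fixes only the 90% default and the frame-by-frame +1/-1 restoration.

import math

DEFAULT_ZOOM = 0.90  # the 90% default scaling
ZOOM_STEP = 0.01     # per-frame restoration step (assumed value)

def offsets_from_matrix(H, zoom):
    # The four formulas above; H(0,1) is clamped to arcsin's domain.
    d_theta = math.asin(max(-1.0, min(1.0, H[0][1]))) * 180.0 / math.pi
    d_x = -H[0][2] * zoom
    d_y = -H[1][2] * zoom
    d_z = -H[2][2] * zoom
    return d_theta, d_x, d_y, d_z

def adjust_zoom(zoom, offsets, prev_offsets, thresholds):
    # The condition: every offset jumps by at least its preset threshold.
    jumped = all(abs(cur - prev) >= thr
                 for cur, prev, thr in zip(offsets, prev_offsets, thresholds))
    if jumped:
        return zoom - ZOOM_STEP  # reverse modification (assumed direction)
    # Otherwise step the zoom back toward the 90% default, one step per frame.
    if zoom < DEFAULT_ZOOM:
        return min(zoom + ZOOM_STEP, DEFAULT_ZOOM)
    return max(zoom - ZOOM_STEP, DEFAULT_ZOOM)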
In this embodiment, the perspective projection matrix is obtained by analyzing the offset values of each relative offset matrix with the initial scaling, which reduces the black borders produced around the image when the transformation compensates for the motion, and solves the bottlenecks of current software de-jittering techniques, namely low computation speed and an unsatisfactory de-jittering result, so that a computer with modest performance can de-jitter video in real time, while the de-jittering quality is improved and the clarity of the processed video is enhanced.
Optionally, as an embodiment of the present invention, the process of performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain the de-jittered video picture corresponding to each video frame image includes:
performing perspective transformation on each video frame image according to its perspective projection matrix to obtain a transformed image corresponding to each video frame image;
and rendering each transformed image to obtain the de-jittered video picture corresponding to each video frame image.
It should be understood that a vertex shader is used as the rendering tool: after the perspective projection matrix is obtained, the original image (i.e., the video frame image) is perspective-transformed according to the perspective projection matrix, and the transformed image data (i.e., the transformed image) is then rendered, yielding the compensated de-jittered video image (i.e., the de-jittered video picture).
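By way of illustration, the compensation can be sketched on the CPU with cv2.warpPerspective standing in for the vertex-shader rendering described above; the function name and the output size are assumptions of the sketch.

import cv2

def compensate(frame_bgr, perspective_matrix):
    # Warp the original full-resolution frame by its 3x3 perspective
    # projection matrix; the output keeps the input size.
    h, w = frame_bgr.shape[:2]
    return cv2.warpPerspective(frame_bgr, perspective_matrix, (w, h))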
In the above embodiment, a transformed image is obtained by perspective-transforming each video frame image with its perspective projection matrix, and the de-jittered video pictures are obtained by rendering each transformed image. This realizes automatic de-jittering of the shaky, blurred real-time video captured by the cameras of a marine video surveillance system: each frame image is processed quickly, within 40 ms, and the output picture is clear and stable.
Fig. 2 is a block diagram of a marine surveillance video picture de-jittering apparatus according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a marine surveillance video picture de-jittering apparatus includes:
the image preprocessing module is used for importing video data, wherein the video data comprise a plurality of video frame images to be processed, and preprocessing the video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
the offset vector analysis module is used for analyzing offset vectors of the reference frame image and the video frame images to obtain the offset vectors corresponding to the video frame images;
the matrix construction module is used for constructing perspective projection matrixes corresponding to the video frame images through the offset vectors;
and the de-jittered video picture obtaining module, which is used for performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
Optionally, as an embodiment of the present invention, the to-be-processed video frame images are sequentially arranged, and the image preprocessing module is specifically configured to:
performing image format conversion on each video frame image to be processed into a set format to obtain a converted image corresponding to each video frame image to be processed;
scaling each converted image to obtain a scaled image corresponding to each video frame image to be processed;
and performing grayscale conversion on each scaled image to obtain the video frame images, the first video frame image being taken as the reference frame image.
Optionally, another embodiment of the present invention provides a marine surveillance video picture de-jittering apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the marine surveillance video picture de-jittering method described above is implemented when the processor executes the computer program. The apparatus may be, for example, a computer.
Optionally, another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the marine surveillance video picture de-jittering method described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for removing jitter of a marine surveillance video picture is characterized by comprising the following steps:
importing video data, wherein the video data comprises a plurality of video frame images to be processed, and preprocessing the plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
analyzing offset vectors of the reference frame image and the plurality of video frame images to obtain offset vectors corresponding to the video frame images;
constructing a perspective projection matrix corresponding to each video frame image through each offset vector;
and performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
2. The method for removing jitter from a marine surveillance video picture according to claim 1, wherein the plurality of video frame images to be processed are sequentially arranged, and the step of preprocessing the plurality of video frame images to be processed to obtain a reference frame image and a plurality of video frame images comprises:
performing image format conversion on each video frame image to be processed into a set format to obtain a converted image corresponding to each video frame image to be processed;
scaling each converted image to obtain a scaled image corresponding to each video frame image to be processed;
and performing grayscale conversion on each scaled image to obtain the video frame images, the first video frame image being taken as the reference frame image.
3. The method for removing jitter from a marine surveillance video picture according to claim 1, wherein the analyzing of the offset vectors of the reference frame image and the plurality of video frame images to obtain the offset vector corresponding to each of the video frame images comprises:
positioning the characteristic points of the reference frame image to obtain a plurality of reference characteristic points, and collecting all the reference characteristic points to obtain global characteristic points;
and carrying out feature point iterative computation on the plurality of video frame images and the global feature points through a pyramid optical flow method and the reference frame image to obtain a plurality of offset values corresponding to the video frame images, respectively collecting the plurality of offset values corresponding to the video frame images, and obtaining offset vectors corresponding to the video frame images.
4. The method for removing jitter from a marine surveillance video picture according to claim 1, wherein the process of constructing a perspective projection matrix corresponding to each of the video frame images from each of the offset vectors comprises:
performing matrix construction on each offset vector by using the Matrix3D matrix tool to obtain a first homogeneous matrix corresponding to each video frame image and a second homogeneous matrix corresponding to each video frame image;
restoring each first homogeneous matrix and each second homogeneous matrix according to a preset initial scaling ratio to obtain a restored first homogeneous matrix corresponding to each video frame image and a restored second homogeneous matrix corresponding to each video frame image;
successively performing cyclic subtraction conversion on each restored first homogeneous matrix and each restored second homogeneous matrix to obtain a relative offset matrix corresponding to each video frame image;
and importing an initial scaling, and analyzing the offset value of each relative offset matrix through the initial scaling to obtain a perspective projection matrix corresponding to each video frame image.
5. The method according to claim 4, wherein the step of analyzing the offset value of each of the relative offset matrices according to the initial scaling to obtain the perspective projection matrix corresponding to each of the video frame images comprises:
calculating the angle offset value of each relative offset matrix through a first formula to obtain the angle offset value corresponding to each video frame image, wherein the first formula is:
Δθ = arcsin(H(0,1)) * 180.f / pi,
where Δθ is the angle offset value, 180.f is the floating-point constant 180, pi is the circular constant π, and H(0,1) is the element in row 0, column 1 of the relative offset matrix;
calculating the x-axis offset value of each relative offset matrix through a second formula and the initial scaling to obtain the x-axis offset value corresponding to each video frame image, wherein the second formula is:
Δx = -H(0,2) * Zoom,
where Δx is the x-axis offset value, Zoom is the initial scaling, and H(0,2) is the element in row 0, column 2 of the relative offset matrix;
calculating the y-axis offset value of each relative offset matrix through a third formula and the initial scaling to obtain the y-axis offset value corresponding to each video frame image, wherein the third formula is:
Δy = -H(1,2) * Zoom,
where Δy is the y-axis offset value, Zoom is the initial scaling, and H(1,2) is the element in row 1, column 2 of the relative offset matrix;
calculating the z-axis offset value of each relative offset matrix through a fourth formula and the initial scaling to obtain the z-axis offset value corresponding to each video frame image, wherein the fourth formula is:
Δz = -H(2,2) * Zoom,
where Δz is the z-axis offset value, Zoom is the initial scaling, and H(2,2) is the element in row 2, column 2 of the relative offset matrix;
judging whether a condition is met, wherein the condition comprises: the difference between the angle offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset angle offset threshold; the difference between the x-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset x-axis offset threshold; the difference between the y-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset y-axis offset threshold; and the difference between the z-axis offset value corresponding to the current video frame image and that corresponding to the previous video frame image is greater than or equal to a preset z-axis offset threshold,
if the conditions are met, reversely modifying the initial scaling according to a preset first modification value to obtain a first scaling, and taking the first scaling as the modified scaling;
if any condition is not met, modifying the initial scaling according to a preset second modification value to obtain a second scaling, and taking the second scaling as the modified scaling;
and performing matrix conversion on each angle offset value, x-axis offset value, y-axis offset value, and z-axis offset value together with the modified scaling to obtain the perspective projection matrix corresponding to each video frame image.
6. The method for removing jitter from a marine surveillance video picture according to claim 1, wherein the process of performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain the de-jittered video picture corresponding to each video frame image comprises:
performing perspective transformation on each video frame image according to its perspective projection matrix to obtain a transformed image corresponding to each video frame image;
and rendering each transformed image to obtain the de-jittered video picture corresponding to each video frame image.
7. A marine surveillance video picture de-jitter apparatus, comprising:
the image preprocessing module is used for importing video data, wherein the video data comprise a plurality of video frame images to be processed, and preprocessing the video frame images to be processed to obtain a reference frame image and a plurality of video frame images;
the offset vector analysis module is used for analyzing offset vectors of the reference frame image and the video frame images to obtain the offset vectors corresponding to the video frame images;
the matrix construction module is used for constructing perspective projection matrixes corresponding to the video frame images through the offset vectors;
and the de-jittered video picture obtaining module, which is used for performing rendering transformation compensation on each video frame image according to each perspective projection matrix to obtain a de-jittered video picture corresponding to each video frame image.
8. The marine surveillance video picture de-jittering apparatus according to claim 7, wherein the plurality of video frame images to be processed are sequentially arranged, and the image preprocessing module is specifically configured to:
performing image format conversion on each video frame image to be processed into a set format to obtain a converted image corresponding to each video frame image to be processed;
scaling each converted image to obtain a scaled image corresponding to each video frame image to be processed;
and performing grayscale conversion on each scaled image to obtain the video frame images, the first video frame image being taken as the reference frame image.
9. A marine surveillance video picture de-jittering apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the method for removing jitter from a marine surveillance video picture according to any one of claims 1-6 is implemented when the computer program is executed by the processor.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for removing jitter from a marine surveillance video picture according to any one of claims 1-6.

Priority Applications (1)

Application Number: CN202111156440.2A
Priority / Filing Date: 2021-09-30
Title: Method and device for removing jitter of marine monitoring video picture and storage medium

Publications (2)

CN113905147A (application), published 2022-01-07
CN113905147B (grant), published 2023-10-03

Family

ID=79189440

Family Applications (1)

CN202111156440.2A (Active), filed 2021-09-30: Method and device for removing jitter of marine monitoring video picture and storage medium

Country Status (1)

CN (1): CN113905147B



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102598683A (en) * 2010-09-17 2012-07-18 松下电器产业株式会社 Stereoscopic video creation device and stereoscopic video creation method
CN102395043A (en) * 2011-11-11 2012-03-28 北京声迅电子股份有限公司 Video quality diagnosing method
US20130235221A1 (en) * 2012-03-06 2013-09-12 Apple Inc. Choosing optimal correction in video stabilization
CN105976330A (en) * 2016-04-27 2016-09-28 大连理工大学 Embedded foggy-weather real-time video image stabilization method
US20190197709A1 (en) * 2017-12-21 2019-06-27 Microsoft Technology Licensing, Llc Graphical coordinate system transform for video frames
CN109327712A (en) * 2018-09-18 2019-02-12 中国科学院自动化研究所 The video of fixed scene disappears fluttering method
CN113286194A (en) * 2020-02-20 2021-08-20 北京三星通信技术研究有限公司 Video processing method and device, electronic equipment and readable storage medium
CN112929562A (en) * 2021-01-20 2021-06-08 北京百度网讯科技有限公司 Video jitter processing method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Feng; Cui Jianzhu; Li Zhipeng: "Research on Video Stabilization Algorithm Based on SIFT Feature Matching", Information Security and Technology, No. 10

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546043A (en) * 2022-03-31 2022-12-30 荣耀终端有限公司 Video processing method and related equipment
CN115546043B (en) * 2022-03-31 2023-08-18 荣耀终端有限公司 Video processing method and related equipment thereof
CN114531549A (en) * 2022-04-22 2022-05-24 浙江大华技术股份有限公司 Image acquisition method, electronic device, and computer-readable storage medium
CN114531549B (en) * 2022-04-22 2022-08-09 浙江大华技术股份有限公司 Image acquisition method, electronic device, and computer-readable storage medium
CN115567658A (en) * 2022-12-05 2023-01-03 泉州艾奇科技有限公司 Method and device for keeping image not deflecting and visual earpick

Also Published As

Publication number Publication date
CN113905147B (en) 2023-10-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant