CN112597794A - Video matching method - Google Patents

Video matching method

Info

Publication number
CN112597794A
Authority
CN
China
Prior art keywords
sequence
matched
frame
video
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011125897.2A
Other languages
Chinese (zh)
Inventor
季鹏飞
季坤朋
周培明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202011125897.2A
Publication of CN112597794A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video matching method comprising the following steps. S1: preprocess the video to be matched; S2: preprocess the original video; S3: pre-train a neural network; S4: efficient matching: perform the efficient matching operation between the video to be matched and the original video. The method extracts key frames at scene-transition points in the video, refines and compresses the key-frame data through convolution and pooling operations to form a new key-frame sequence, and performs matching on that sequence. Through the efficient matching algorithm and the neural network, both matching speed and matching accuracy are greatly improved over the traditional frame-by-frame comparison.

Description

Video matching method
Technical Field
The invention relates to the technical field of video matching and screening, in particular to a video matching method.
Background
Due to the rapid development of the internet in recent years, and especially of the mobile internet, the volume of transmitted video has grown enormously and video copyright has received increasing attention; manually judging whether a video clip belongs to a given video is very slow. The traditional automatic approach is frame-by-frame comparison, in which matching is considered successful only if all frame data are equal in sequence; this is slow, and it fails when the clip's aspect ratio has been changed or the video has been degraded.
Disclosure of Invention
The invention aims to provide a video matching method that can quickly and automatically identify whether a video clip belongs to a given original video, and that matches correctly even when the clip has been degraded or its aspect ratio has been changed.
To achieve this aim, the invention provides the following technical scheme: a video matching method comprising the steps of:
S1: Preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video;
S2: Preprocessing the original video: extract key frames from the original video to form a key-frame sequence;
S3: Pre-training a neural network: pre-train a match-detection system with the key-frame sequence of the video to be matched, so that when frame data is input it outputs the matching similarity between that frame and every key frame of the video to be matched; training data is obtained by degrading the key-frame sequence of the video to be matched; the neural-network matching module is denoted L;
S4: Efficient matching: perform the efficient matching operation between the video to be matched and the original video.
Further, the preprocessing in steps S1 and S2 is the same in both cases: scene-switch points in the video to be matched or in the original video are extracted as key frames to form a key-frame sequence, and a convolution operation and a max-pooling operation are applied to each frame image to obtain a compressed key-frame sequence that retains only key information. The preprocessed key-frame sequence of the video to be matched is denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched; the preprocessed original-video key-frame sequence is denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence.
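As a concrete illustration, the sketch below implements this preprocessing under stated assumptions: scene switches are detected with a simple histogram-difference threshold, and the "convolution and max-pooling" step is approximated by a Gaussian blur followed by block-wise max pooling. The threshold, kernel, and pool size are illustrative choices, not values taken from the patent.

```python
import cv2
import numpy as np

def extract_keyframe_sequence(path, cut_thresh=0.5, pool=8):
    """Return the compressed key-frame sequence (P or O) of a video file."""
    cap = cv2.VideoCapture(path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        # A large histogram jump is treated as a scene-switch point.
        if prev_hist is None or np.abs(hist - prev_hist).sum() > cut_thresh:
            small = cv2.GaussianBlur(gray, (5, 5), 0)       # convolution step
            h, w = small.shape
            h, w = h - h % pool, w - w % pool
            blocks = small[:h, :w].reshape(h // pool, pool, w // pool, pool)
            keyframes.append(blocks.max(axis=(1, 3)))       # max-pooling step
        prev_hist = hist
    cap.release()
    return keyframes
```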
Further, S3 also provides an encapsulation module, denoted M, which wraps the neural-network module L. When frame data is input, M passes the frame to L, and L outputs the similarity between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence denoted α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched. The maximum of the similarity sequence is taken and denoted α_max. If α_max > α (α is a manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence; otherwise it returns an illegal value, which can be any value outside [0, m-1].
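A minimal sketch of this encapsulation follows, assuming L is any callable that maps one frame to m similarity scores; the illegal value -1 and the factory-function shape are illustrative assumptions:

```python
ILLEGAL = -1  # any value outside [0, m-1] qualifies as "illegal"

def make_M(L, alpha):
    """Wrap the network L into the encapsulation module M."""
    def M(frame):
        sims = L(frame)                       # similarities α_0 ... α_{m-1}
        best = max(range(len(sims)), key=lambda k: sims[k])
        return best if sims[best] > alpha else ILLEGAL
    return M
```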
Further, the step S4 includes the following matching steps:
initialization: the P sequence is aligned to the left of the O sequence, i.e., to the alignment.
From the left end of the P sequence, the following operations S41 are performed in a loop in turn, S42 and S43 are sub-operations of S41 (i.e., in each loop operation, the current key frame to be matched is P in turn0,P1,P2,P3,...,Pm-1):
S41, recording the current key frame to be matched as PjFind and PjAligned original key frame is OiAnd i and j are the key frame sequence numbers. Computing P with a common module SjAnd OiIf S (O)i,Pj) If the frame is more than alpha, the frame is matched, and the process goes to S42; if S (O) is detectedi,Pj) Alpha is not more than alpha, the frame is not matched and the process goes to S43.
S42 if PjIf the frame is the last frame of the P sequence, the P sequence is successfully matched with the O sequence (namely, the video to be matched is successfully matched with the original video, and the position aligned with the O sequence in the O sequence is the position successfully matched), and the loop is skipped, if the P sequence is the last frame of the P sequence, the loop is skippedjIs not the last frame of the P sequence, and OiAnd if the frame is the last frame of the O sequence, the matching failure of the P sequence and the O sequence is indicated, and the loop is skipped. Otherwise, the current loop is continued, that is, the similarity between the next key frame to be matched and the original key frame aligned with the next key frame to be matched is compared.
S43 finding the next frame of the key frame in the O sequence aligned with the last frame of the P sequence, i.e. Oi+m-jUsing pre-trained encapsulated neural netsA collateral M module for calculating M (O)i+m-j)Obtain a t value, if the t value is legal (i.e. at [0, m-1]]Within), description and O)i+m-jMatching, then moving the key frame sequence P to be matched to make Oi+m-jAnd PtAlign and then jump to S41 to match from the beginning; if the value of t does not exist, the key frame sequence P to be matched is directly advanced to the right by m +1, and then the process jumps to S41 to match from the beginning.
The beneficial effects of the invention are as follows:
1. The method extracts key frames at scene-transition points in the video, refines and compresses the key-frame data through convolution and pooling operations to form a new key-frame sequence, and performs matching on that sequence; through the efficient matching algorithm and the neural network, both matching speed and matching accuracy are greatly improved over the traditional frame-by-frame comparison.
2. Before matching, the P sequence is used to train a deep-neural-network matching model that quickly finds the index in the P sequence matching an input frame; compared with computing the similarity of the input frame against each P-sequence key frame one by one, this further improves efficiency.
3. The shift mechanism of the matching algorithm lets the P sequence advance by more than one position on a mismatch, which improves performance.
4. The method is highly fault-tolerant: whether the video to be matched is degraded in aspect ratio or definition, or even by frame skipping, it yields a result close to manual matching.
Drawings
FIG. 1: alignment at the initial position in the video matching method of the invention;
FIG. 2: schematic diagram of a successful video-frame match in the video matching method of the invention;
FIG. 3: schematic diagram of an unsuccessful video-frame match in the video matching method of the invention;
FIG. 4: end of the video-frame matching flow in the video matching method of the invention;
FIG. 5: schematic block diagram of the video matching method of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to FIGS. 1 to 5. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein are intended to be within the scope of the present invention.
The invention provides the following technical scheme. The common module S is a prior-art program for comparing two video key frames; the threshold α is a manually set value: if the module's output exceeds α, the match is judged successful, otherwise it fails. The neural-network matching module is a supervised-learning technique, also prior art: after the key-frame sequence of the video to be matched has been input, key frames of the video to be matched that have undergone degradation and similar processing are input to the system together with manually given match results; after repeated training, the system returns, for any input key frame, either an illegal value or the index of the corresponding key frame of the video to be matched. During training, the similarity judgment between a degraded key-frame image and its non-degraded key frame must be consistent with the judgment made when the common module S is used. The procedure is as follows: from the P-sequence key frames, several degraded frames are formed under different degradation parameters; each degraded frame is denoted B_i, and its corresponding non-degraded original frame is P_i. The similarity α_i of B_i and P_i is computed with the common module S, i.e., α_i = S(B_i, P_i), and then binarized: if α_i > α, set α_i to 1, otherwise set α_i to 0. From this, an output sequence R_i = (0, ..., α_i, ..., 0) is built for training the neural network: every position in the sequence is 0 except position i, whose value is α_i. This yields one training pair (B_i, R_i). Many degraded frames can be generated from the key frames of the P sequence, producing, by the method above, plenty of training data for the neural network. To make the network's output directly usable, the network is wrapped in an encapsulation module, denoted M, which, when frame data is input, directly outputs the index of the matching P-sequence frame: upon receiving input frame data, M passes the frame to the neural-network module, which outputs the similarities between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched. The maximum of the similarity sequence is taken and denoted α_max. If α_max > α (α is the manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence; otherwise it returns an illegal value, which can be any value outside [0, m-1].
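The sketch below illustrates this training-data generation under stated assumptions: the degradation operations (brightness shift, crude blur) and the per-frame sample count are illustrative stand-ins, since the text only requires that the labels agree with the common module S.

```python
import numpy as np

def degrade(frame, rng):
    """Produce one degraded copy B_i of a key frame (illustrative degradations)."""
    out = frame.astype(np.float32) + rng.uniform(-20, 20)   # brightness shift
    if rng.random() < 0.5:                                  # crude blur
        out = (out + np.roll(out, 1, axis=0) + np.roll(out, 1, axis=1)) / 3
    return np.clip(out, 0, 255).astype(np.uint8)

def make_training_data(P, S, alpha, per_frame=10, seed=0):
    """Build (B_i, R_i) pairs: R_i is zero except position i, binarized via S."""
    rng = np.random.default_rng(seed)
    data = []
    for i, P_i in enumerate(P):
        for _ in range(per_frame):
            B_i = degrade(P_i, rng)
            a_i = 1.0 if S(B_i, P_i) > alpha else 0.0   # binarize as in the text
            R_i = np.zeros(len(P), dtype=np.float32)
            R_i[i] = a_i
            data.append((B_i, R_i))
    return data
```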
A method of video matching, comprising the steps of:
S1: Preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video by extracting its scene-switch points as key frames to form a key-frame sequence, then applying a convolution operation and a max-pooling operation to each frame image to obtain a compressed key-frame sequence retaining only key information, denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched; preprocessing the video to be matched in advance improves performance;
S2: Preprocessing the original video: similarly, extract a key-frame sequence from the original video and apply a convolution operation and a max-pooling operation to each frame image, obtaining a compressed key-frame sequence retaining only key information, denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence;
S3: Pre-training a neural network: pre-train the match-detection system with the P sequence and pair it with the encapsulation module, so that, when frame data is input, it outputs the index of the matching P-sequence frame; training data can be obtained by degrading the P sequence, i.e., degrading key frames of the video to be matched; the neural-network matching module integrated with the encapsulation module is denoted M; if no matching index can be found, an illegal value is returned, which can be any value outside [0, m-1]; if several indices match, only the largest is returned;
S4: Efficient matching: the video to be matched is matched efficiently against the original video; the efficient matching algorithm and the neural network greatly increase matching speed, and both speed and precision far exceed the traditional frame-by-frame comparison. For convenience, the common module is denoted S, and S(O_i, P_j) denotes computing the similarity of the two key frames O_i and P_j.
Initialization: align the P sequence with the left end of the O sequence, i.e., align P_0 with O_0.
Starting from the left end of the P sequence, operation S41 is performed in a loop (S42 and S43 are sub-operations of S41); in each loop iteration the current key frame to be matched is, in turn, P_0, P_1, P_2, P_3, ..., P_{m-1}:
S41: Denote the current key frame to be matched as P_j and the original key frame aligned with it as O_i, where i and j are key-frame indices. Compute the similarity of P_j and O_i with the common module S. If S(O_i, P_j) > α, the frame matches; go to S42. If S(O_i, P_j) ≤ α, the frame does not match; go to S43.
S42: If P_j is the last frame of the P sequence, the P sequence has matched the O sequence successfully (i.e., the video to be matched matches the original video, and the position in the O sequence currently aligned with the P sequence is the match position); exit the loop. If P_j is not the last frame of the P sequence but O_i is the last frame of the O sequence, the match of the P sequence against the O sequence has failed; exit the loop. Otherwise continue the current loop, i.e., compare the similarity of the next key frame to be matched with the original key frame aligned with it.
S43: Take the frame in the O sequence that follows the key frame aligned with the last frame of the P sequence, i.e., O_{i+m-j}. Using the pre-trained encapsulated neural-network module M, compute M(O_{i+m-j}) to obtain a value t. If t is legal (i.e., within [0, m-1]), O_{i+m-j} matches P_t; shift the key-frame sequence P so that O_{i+m-j} aligns with P_t, then jump to S41 and match from the beginning. If no legal t exists, advance the key-frame sequence P to the right by m+1 positions, then jump to S41 and match from the beginning.
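Putting S41 to S43 together, a minimal sketch of the matching loop follows; S, M, and the sequence representations are the assumed interfaces from the earlier sketches, and `start` is the O-index aligned with P_0. Note that a legal t always moves the alignment forward, since t ≤ m-1 implies start + m - t > start.

```python
def match(P, O, S, M, alpha):
    """Return the O-index where P matches, or -1 on failure (sketch of S4)."""
    m, n = len(P), len(O)
    start = 0                                # index in O aligned with P[0]
    while start + m <= n:
        j = 0
        while j < m and S(O[start + j], P[j]) > alpha:
            j += 1                           # S41/S42: frame matched, continue
        if j == m:
            return start                     # S42: all of P matched at `start`
        probe = start + m                    # S43: frame after P[m-1]'s partner
        if probe >= n:
            break                            # ran off the end of O: failure
        t = M(O[probe])
        if 0 <= t < m:
            start = probe - t                # align O[probe] with P[t]
        else:
            start += m + 1                   # no legal t: shift P right by m+1
    return -1
```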
The above process description is relatively abstract, and a specific example will now be set forth:
1. Suppose the key-frame sequence P to be matched has only 4 key frames P_0, P_1, P_2, P_3, and the original video sequence O has only 11 key frames: O_0, O_1, O_2, O_3, O_4, O_5, O_6, O_7, O_8, O_9, O_10. Key frames drawn with the same marker symbol have similarity above the threshold α, i.e., they match. For example, O_0 and P_0, O_1 and P_1, O_4 and P_2, and O_7 and P_0 all match.
2. As shown in FIG. 1, the O and P sequences are first aligned at the left end. Matching sequentially from the start position, O_2 and P_2 are found not to match. We then examine the frame in the O sequence following the key frame aligned with the last frame of the P sequence, i.e., O_4, match it against the P sequence, and find that it matches P_2; the P sequence is therefore shifted so that P_2 aligns with O_4, as shown in FIG. 2.
3. In FIG. 2, matching restarts from the head of the P sequence, i.e., whether P_0 matches O_2 is checked.
4. As shown in FIG. 3, P_0 and O_2 do not match, so the frame in the O sequence following the key frame aligned with the last frame of the P sequence, i.e., O_6, is checked. No matching position is found, so the P sequence moves directly to the right by m + 1 positions along the O sequence, where m is the length of the P sequence, namely 4, so the shift is 4 + 1 = 5. After the move, P_0 aligns with O_7, as shown in FIG. 4.
5. As shown in FIG. 4, starting from the aligned positions P_0 and O_7, the P and O sequences are matched in order until the last frame P_3 matches O_10; the recorded match index is 7.
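Run against this worked example, the matching sketch above reproduces the trace; the stub frames and the stub S and M below are assumptions purely for illustration.

```python
# Frames are stubbed as labels; matches mirror the example:
# O_0~P_0, O_1~P_1, O_4~P_2, and O_7..O_10 ~ P_0..P_3.
P = ["p0", "p1", "p2", "p3"]
O = ["p0", "p1", "x", "x", "p2", "x", "x", "p0", "p1", "p2", "p3"]

S = lambda o, p: 1.0 if o == p else 0.0      # stub common module S
M = lambda o: P.index(o) if o in P else -1   # stub encapsulated network M
print(match(P, O, S, M, alpha=0.5))          # prints 7, as in step 5
```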
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (4)

1. A video matching method, comprising the steps of:
S1: preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video;
S2: preprocessing the original video: preprocess the original video;
S3: pre-training a neural network: a match-detection system is pre-trained with the key-frame sequence of the video to be matched so that, when frame data is input, the matching similarity between that frame and every key frame of the video to be matched can be output; training data can be obtained by degrading the key-frame sequence of the video to be matched; the neural-network matching module is denoted L;
S4: efficient matching: perform the efficient matching operation between the video to be matched and the original video.
2. The method of claim 1, wherein the preprocessing in steps S1 and S2 is the same in both cases: scene-switch points in the video to be matched or in the original video are extracted as key frames to form a key-frame sequence, and a convolution operation and a max-pooling operation are applied to each frame image to obtain a compressed key-frame sequence retaining only key information; the preprocessed key-frame sequence of the video to be matched is denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched, and the preprocessed original-video key-frame sequence is denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence.
3. The video matching method of claim 2, wherein S3 further provides an encapsulation module, denoted M, which wraps the neural-network module L; when frame data is input, M passes the frame to L, and L outputs the similarity between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence denoted α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched; the maximum of the similarity sequence is taken and denoted α_max; if α_max > α (α is a manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence, otherwise it returns an illegal value, which can be any value outside [0, m-1].
4. The video matching method of claim 3, wherein step S4 includes the following matching steps:
initialization: align the P sequence with the left end of the O sequence, i.e., align P_0 with O_0;
starting from the left end of the P sequence, operation S41 is performed in a loop (S42 and S43 are sub-operations of S41); in each loop iteration the current key frame to be matched is, in turn, P_0, P_1, P_2, P_3, ..., P_{m-1}:
S41: denote the current key frame to be matched as P_j and the original key frame aligned with it as O_i, where i and j are key-frame indices; compute the similarity of P_j and O_i with the common module S; if S(O_i, P_j) > α, the frame matches and the process goes to S42; if S(O_i, P_j) ≤ α, the frame does not match and the process goes to S43;
S42: if P_j is the last frame of the P sequence, the P sequence has matched the O sequence successfully (i.e., the video to be matched matches the original video, and the position in the O sequence currently aligned with the P sequence is the match position), and the loop exits; if P_j is not the last frame of the P sequence but O_i is the last frame of the O sequence, the match of the P sequence against the O sequence has failed, and the loop exits; otherwise the current loop continues, i.e., the next key frame to be matched is compared with the original key frame aligned with it;
S43: take the frame in the O sequence that follows the key frame aligned with the last frame of the P sequence, i.e., O_{i+m-j}; using the pre-trained encapsulated neural-network module M, compute M(O_{i+m-j}) to obtain a value t; if t is legal (i.e., within [0, m-1]), O_{i+m-j} matches P_t, so shift the key-frame sequence P so that O_{i+m-j} aligns with P_t, then jump to S41 and match from the beginning; if no legal t exists, advance the key-frame sequence P to the right by m+1 positions, then jump to S41 and match from the beginning.
CN202011125897.2A 2020-10-20 2020-10-20 Video matching method Pending CN112597794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011125897.2A CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011125897.2A CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Publications (1)

Publication Number Publication Date
CN112597794A true CN112597794A (en) 2021-04-02

Family

ID=75180366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011125897.2A Pending CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Country Status (1)

Country Link
CN (1) CN112597794A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115243073A (en) * 2022-07-22 2022-10-25 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN115243073B (en) * 2022-07-22 2024-05-14 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Lin et al. Bsn: Boundary sensitive network for temporal action proposal generation
CN109919032B (en) Video abnormal behavior detection method based on motion prediction
CN111079646A (en) Method and system for positioning weak surveillance video time sequence action based on deep learning
CN110135386B (en) Human body action recognition method and system based on deep learning
CN108805036B (en) Unsupervised video semantic extraction method
CN109711380A (en) A kind of timing behavior segment generation system and method based on global context information
CN108921032B (en) Novel video semantic extraction method based on deep learning model
CN111091839B (en) Voice awakening method and device, storage medium and intelligent device
CN112329794A (en) Image description method based on double self-attention mechanism
CN115022711B (en) System and method for ordering shot videos in movie scene
CN111079539A (en) Video abnormal behavior detection method based on abnormal tracking
CN112801068A (en) Video multi-target tracking and segmenting system and method
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN110602504A (en) Video decompression method and system based on YOLOv2 target detection algorithm
CN109949217A (en) Video super-resolution method for reconstructing based on residual error study and implicit motion compensation
Zhang et al. Multiscale adaptation fusion networks for depth completion
CN112597794A (en) Video matching method
CN114996495A (en) Single-sample image segmentation method and device based on multiple prototypes and iterative enhancement
CN117197727B (en) Global space-time feature learning-based behavior detection method and system
Fujitake et al. Temporally-aware convolutional block attention module for video text detection
CN112348033B (en) Collaborative saliency target detection method
CN112508121A (en) Method and system for sensing outside by industrial robot
CN115496134B (en) Traffic scene video description generation method and device based on multi-mode feature fusion
CN110753228A (en) Garage monitoring video compression method and system based on Yolov1 target detection algorithm
CN115222959A (en) Lightweight convolutional network and Transformer combined human body key point detection method

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210402)