CN112597794A - Video matching method - Google Patents

Video matching method

Info

Publication number
CN112597794A
Authority
CN
China
Prior art keywords
sequence
matched
frame
video
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011125897.2A
Other languages
Chinese (zh)
Inventor
季鹏飞
季坤朋
周培明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202011125897.2A
Publication of CN112597794A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video matching method comprising the following steps. S1: preprocess the video to be matched; S2: preprocess the original video; S3: pre-train a neural network; S4: efficient matching: perform the efficient matching operation between the video to be matched and the original video. The method extracts key frames at scene-transition points in the video, refines and compresses the key-frame data through convolution and pooling operations to form a new key-frame sequence, and performs matching on that sequence. Through the efficient matching algorithm and the neural network, both matching speed and matching accuracy are greatly improved over the traditional frame-by-frame comparison.

Description

Video matching method
Technical Field
The invention relates to the technical field of video matching and screening, in particular to a video matching method.
Background
Due to the rapid development of the internet in recent years, and especially of the mobile internet, the volume of transmitted video has grown enormously and video copyright has received increasing attention; manually judging whether a video clip belongs to a given video is very slow. The traditional automatic approach is frame-by-frame comparison, in which matching is considered successful only if all frame data are equal in sequence; this is slow, and it fails when the clip's aspect ratio has been changed or the video has been degraded.
Disclosure of Invention
The invention aims to provide a video matching method that can quickly and automatically identify whether a video clip belongs to a given original video, and that matches correctly even when the clip has been degraded or its aspect ratio has been changed.
To achieve this aim, the invention provides the following technical scheme: a video matching method comprising the steps of:
S1: Preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video;
S2: Preprocessing the original video: extract key frames from the original video to form a key-frame sequence;
S3: Pre-training a neural network: pre-train a match-detection system with the key-frame sequence of the video to be matched, so that when frame data is input it outputs the matching similarity between that frame and every key frame of the video to be matched; training data is obtained by degrading the key-frame sequence of the video to be matched; the neural-network matching module is denoted L;
S4: Efficient matching: perform the efficient matching operation between the video to be matched and the original video.
Further, the preprocessing in steps S1 and S2 is the same in both cases: scene-switch points in the video to be matched or in the original video are extracted as key frames to form a key-frame sequence, and a convolution operation and a max-pooling operation are applied to each frame image to obtain a compressed key-frame sequence that retains only key information. The preprocessed key-frame sequence of the video to be matched is denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched; the preprocessed original-video key-frame sequence is denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence.
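As a concrete illustration, the sketch below implements this preprocessing under stated assumptions: scene switches are detected with a simple histogram-difference threshold, and the "convolution and max-pooling" step is approximated by a Gaussian blur followed by block-wise max pooling. The threshold, kernel, and pool size are illustrative choices, not values taken from the patent.

```python
import cv2
import numpy as np

def extract_keyframe_sequence(path, cut_thresh=0.5, pool=8):
    """Return the compressed key-frame sequence (P or O) of a video file."""
    cap = cv2.VideoCapture(path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        # A large histogram jump is treated as a scene-switch point.
        if prev_hist is None or np.abs(hist - prev_hist).sum() > cut_thresh:
            small = cv2.GaussianBlur(gray, (5, 5), 0)       # convolution step
            h, w = small.shape
            h, w = h - h % pool, w - w % pool
            blocks = small[:h, :w].reshape(h // pool, pool, w // pool, pool)
            keyframes.append(blocks.max(axis=(1, 3)))       # max-pooling step
        prev_hist = hist
    cap.release()
    return keyframes
```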
Further, S3 also provides an encapsulation module, denoted M, which wraps the neural-network module L. When frame data is input, M passes the frame to L, and L outputs the similarity between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence denoted α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched. The maximum of the similarity sequence is taken and denoted α_max. If α_max > α (α is a manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence; otherwise it returns an illegal value, which can be any value outside [0, m-1].
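A minimal sketch of this encapsulation follows, assuming L is any callable that maps one frame to m similarity scores; the illegal value -1 and the factory-function shape are illustrative assumptions:

```python
ILLEGAL = -1  # any value outside [0, m-1] qualifies as "illegal"

def make_M(L, alpha):
    """Wrap the network L into the encapsulation module M."""
    def M(frame):
        sims = L(frame)                       # similarities α_0 ... α_{m-1}
        best = max(range(len(sims)), key=lambda k: sims[k])
        return best if sims[best] > alpha else ILLEGAL
    return M
```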
Further, the step S4 includes the following matching steps:
initialization: the P sequence is aligned to the left of the O sequence, i.e., to the alignment.
From the left end of the P sequence, the following operations S41 are performed in a loop in turn, S42 and S43 are sub-operations of S41 (i.e., in each loop operation, the current key frame to be matched is P in turn0,P1,P2,P3,...,Pm-1):
S41, recording the current key frame to be matched as PjFind and PjAligned original key frame is OiAnd i and j are the key frame sequence numbers. Computing P with a common module SjAnd OiIf S (O)i,Pj) If the frame is more than alpha, the frame is matched, and the process goes to S42; if S (O) is detectedi,Pj) Alpha is not more than alpha, the frame is not matched and the process goes to S43.
S42 if PjIf the frame is the last frame of the P sequence, the P sequence is successfully matched with the O sequence (namely, the video to be matched is successfully matched with the original video, and the position aligned with the O sequence in the O sequence is the position successfully matched), and the loop is skipped, if the P sequence is the last frame of the P sequence, the loop is skippedjIs not the last frame of the P sequence, and OiAnd if the frame is the last frame of the O sequence, the matching failure of the P sequence and the O sequence is indicated, and the loop is skipped. Otherwise, the current loop is continued, that is, the similarity between the next key frame to be matched and the original key frame aligned with the next key frame to be matched is compared.
S43 finding the next frame of the key frame in the O sequence aligned with the last frame of the P sequence, i.e. Oi+m-jUsing pre-trained encapsulated neural netsA collateral M module for calculating M (O)i+m-j)Obtain a t value, if the t value is legal (i.e. at [0, m-1]]Within), description and O)i+m-jMatching, then moving the key frame sequence P to be matched to make Oi+m-jAnd PtAlign and then jump to S41 to match from the beginning; if the value of t does not exist, the key frame sequence P to be matched is directly advanced to the right by m +1, and then the process jumps to S41 to match from the beginning.
The beneficial effects of the invention are as follows:
1. The method extracts key frames at scene-transition points in the video, refines and compresses the key-frame data through convolution and pooling operations to form a new key-frame sequence, and performs matching on that sequence; through the efficient matching algorithm and the neural network, both matching speed and matching accuracy are greatly improved over the traditional frame-by-frame comparison.
2. Before matching, the P sequence is used to train a deep-neural-network matching model that quickly finds the index in the P sequence matching an input frame; compared with computing the similarity of the input frame against each P-sequence key frame one by one, this further improves efficiency.
3. The shift mechanism of the matching algorithm lets the P sequence advance by more than one position on a mismatch, which improves performance.
4. The method is highly fault-tolerant: whether the video to be matched is degraded in aspect ratio or definition, or even by frame skipping, it yields a result close to manual matching.
Drawings
FIG. 1: alignment at the initial position in the video matching method of the invention;
FIG. 2: schematic diagram of a successful video-frame match in the video matching method of the invention;
FIG. 3: schematic diagram of an unsuccessful video-frame match in the video matching method of the invention;
FIG. 4: end of the video-frame matching flow in the video matching method of the invention;
FIG. 5: schematic block diagram of the video matching method of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to FIGS. 1 to 5. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments disclosed herein are intended to be within the scope of the present invention.
The invention provides the following technical scheme. The common module S is a prior-art program for comparing two video key frames; the threshold α is a manually set value: if the module's output exceeds α, the match is judged successful, otherwise it fails. The neural-network matching module is a supervised-learning technique, also prior art: after the key-frame sequence of the video to be matched has been input, key frames of the video to be matched that have undergone degradation and similar processing are input to the system together with manually given match results; after repeated training, the system returns, for any input key frame, either an illegal value or the index of the corresponding key frame of the video to be matched. During training, the similarity judgment between a degraded key-frame image and its non-degraded key frame must be consistent with the judgment made when the common module S is used. The procedure is as follows: from the P-sequence key frames, several degraded frames are formed under different degradation parameters; each degraded frame is denoted B_i, and its corresponding non-degraded original frame is P_i. The similarity α_i of B_i and P_i is computed with the common module S, i.e., α_i = S(B_i, P_i), and then binarized: if α_i > α, set α_i to 1, otherwise set α_i to 0. From this, an output sequence R_i = (0, ..., α_i, ..., 0) is built for training the neural network: every position in the sequence is 0 except position i, whose value is α_i. This yields one training pair (B_i, R_i). Many degraded frames can be generated from the key frames of the P sequence, producing, by the method above, plenty of training data for the neural network. To make the network's output directly usable, the network is wrapped in an encapsulation module, denoted M, which, when frame data is input, directly outputs the index of the matching P-sequence frame: upon receiving input frame data, M passes the frame to the neural-network module, which outputs the similarities between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched. The maximum of the similarity sequence is taken and denoted α_max. If α_max > α (α is the manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence; otherwise it returns an illegal value, which can be any value outside [0, m-1].
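The sketch below illustrates this training-data generation under stated assumptions: the degradation operations (brightness shift, crude blur) and the per-frame sample count are illustrative stand-ins, since the text only requires that the labels agree with the common module S.

```python
import numpy as np

def degrade(frame, rng):
    """Produce one degraded copy B_i of a key frame (illustrative degradations)."""
    out = frame.astype(np.float32) + rng.uniform(-20, 20)   # brightness shift
    if rng.random() < 0.5:                                  # crude blur
        out = (out + np.roll(out, 1, axis=0) + np.roll(out, 1, axis=1)) / 3
    return np.clip(out, 0, 255).astype(np.uint8)

def make_training_data(P, S, alpha, per_frame=10, seed=0):
    """Build (B_i, R_i) pairs: R_i is zero except position i, binarized via S."""
    rng = np.random.default_rng(seed)
    data = []
    for i, P_i in enumerate(P):
        for _ in range(per_frame):
            B_i = degrade(P_i, rng)
            a_i = 1.0 if S(B_i, P_i) > alpha else 0.0   # binarize as in the text
            R_i = np.zeros(len(P), dtype=np.float32)
            R_i[i] = a_i
            data.append((B_i, R_i))
    return data
```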
A method of video matching, comprising the steps of:
S1: Preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video by extracting its scene-switch points as key frames to form a key-frame sequence, then applying a convolution operation and a max-pooling operation to each frame image to obtain a compressed key-frame sequence retaining only key information, denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched; preprocessing the video to be matched in advance improves performance;
S2: Preprocessing the original video: similarly, extract a key-frame sequence from the original video and apply a convolution operation and a max-pooling operation to each frame image, obtaining a compressed key-frame sequence retaining only key information, denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence;
S3: Pre-training a neural network: pre-train the match-detection system with the P sequence and pair it with the encapsulation module, so that, when frame data is input, it outputs the index of the matching P-sequence frame; training data can be obtained by degrading the P sequence, i.e., degrading key frames of the video to be matched; the neural-network matching module integrated with the encapsulation module is denoted M; if no matching index can be found, an illegal value is returned, which can be any value outside [0, m-1]; if several indices match, only the largest is returned;
S4: Efficient matching: the video to be matched is matched efficiently against the original video; the efficient matching algorithm and the neural network greatly increase matching speed, and both speed and precision far exceed the traditional frame-by-frame comparison. For convenience, the common module is denoted S, and S(O_i, P_j) denotes computing the similarity of the two key frames O_i and P_j.
Initialization: align the P sequence with the left end of the O sequence, i.e., align P_0 with O_0.
Starting from the left end of the P sequence, operation S41 is performed in a loop (S42 and S43 are sub-operations of S41); in each loop iteration the current key frame to be matched is, in turn, P_0, P_1, P_2, P_3, ..., P_{m-1}:
S41: Denote the current key frame to be matched as P_j and the original key frame aligned with it as O_i, where i and j are key-frame indices. Compute the similarity of P_j and O_i with the common module S. If S(O_i, P_j) > α, the frame matches; go to S42. If S(O_i, P_j) ≤ α, the frame does not match; go to S43.
S42: If P_j is the last frame of the P sequence, the P sequence has matched the O sequence successfully (i.e., the video to be matched matches the original video, and the position in the O sequence currently aligned with the P sequence is the match position); exit the loop. If P_j is not the last frame of the P sequence but O_i is the last frame of the O sequence, the match of the P sequence against the O sequence has failed; exit the loop. Otherwise continue the current loop, i.e., compare the similarity of the next key frame to be matched with the original key frame aligned with it.
S43: Take the frame in the O sequence that follows the key frame aligned with the last frame of the P sequence, i.e., O_{i+m-j}. Using the pre-trained encapsulated neural-network module M, compute M(O_{i+m-j}) to obtain a value t. If t is legal (i.e., within [0, m-1]), O_{i+m-j} matches P_t; shift the key-frame sequence P so that O_{i+m-j} aligns with P_t, then jump to S41 and match from the beginning. If no legal t exists, advance the key-frame sequence P to the right by m+1 positions, then jump to S41 and match from the beginning.
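Putting S41 to S43 together, a minimal sketch of the matching loop follows; S, M, and the sequence representations are the assumed interfaces from the earlier sketches, and `start` is the O-index aligned with P_0. Note that a legal t always moves the alignment forward, since t ≤ m-1 implies start + m - t > start.

```python
def match(P, O, S, M, alpha):
    """Return the O-index where P matches, or -1 on failure (sketch of S4)."""
    m, n = len(P), len(O)
    start = 0                                # index in O aligned with P[0]
    while start + m <= n:
        j = 0
        while j < m and S(O[start + j], P[j]) > alpha:
            j += 1                           # S41/S42: frame matched, continue
        if j == m:
            return start                     # S42: all of P matched at `start`
        probe = start + m                    # S43: frame after P[m-1]'s partner
        if probe >= n:
            break                            # ran off the end of O: failure
        t = M(O[probe])
        if 0 <= t < m:
            start = probe - t                # align O[probe] with P[t]
        else:
            start += m + 1                   # no legal t: shift P right by m+1
    return -1
```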
The above process description is relatively abstract, and a specific example will now be set forth:
1. Suppose the key-frame sequence P to be matched has only 4 key frames P_0, P_1, P_2, P_3, and the original video sequence O has only 11 key frames: O_0, O_1, O_2, O_3, O_4, O_5, O_6, O_7, O_8, O_9, O_10. Key frames drawn with the same marker symbol have similarity above the threshold α, i.e., they match. For example, O_0 and P_0, O_1 and P_1, O_4 and P_2, and O_7 and P_0 all match.
2. As shown in FIG. 1, the O and P sequences are first aligned at the left end. Matching sequentially from the start position, O_2 and P_2 are found not to match. We then examine the frame in the O sequence following the key frame aligned with the last frame of the P sequence, i.e., O_4, match it against the P sequence, and find that it matches P_2; the P sequence is therefore shifted so that P_2 aligns with O_4, as shown in FIG. 2.
3. In FIG. 2, matching restarts from the head of the P sequence, i.e., whether P_0 matches O_2 is checked.
4. As shown in FIG. 3, P_0 and O_2 do not match, so the frame in the O sequence following the key frame aligned with the last frame of the P sequence, i.e., O_6, is checked. No matching position is found, so the P sequence moves directly to the right by m + 1 positions along the O sequence, where m is the length of the P sequence, namely 4, so the shift is 4 + 1 = 5. After the move, P_0 aligns with O_7, as shown in FIG. 4.
5. As shown in FIG. 4, starting from the aligned positions P_0 and O_7, the P and O sequences are matched in order until the last frame P_3 matches O_10; the recorded match index is 7.
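Run against this worked example, the matching sketch above reproduces the trace; the stub frames and the stub S and M below are assumptions purely for illustration.

```python
# Frames are stubbed as labels; matches mirror the example:
# O_0~P_0, O_1~P_1, O_4~P_2, and O_7..O_10 ~ P_0..P_3.
P = ["p0", "p1", "p2", "p3"]
O = ["p0", "p1", "x", "x", "p2", "x", "x", "p0", "p1", "p2", "p3"]

S = lambda o, p: 1.0 if o == p else 0.0      # stub common module S
M = lambda o: P.index(o) if o in P else -1   # stub encapsulated network M
print(match(P, O, S, M, alpha=0.5))          # prints 7, as in step 5
```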
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description refers to embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (4)

1. A video matching method, comprising the steps of:
S1: preprocessing the video to be matched: preprocess the video that is to be matched and compared against the original video;
S2: preprocessing the original video: preprocess the original video;
S3: pre-training a neural network: a match-detection system is pre-trained with the key-frame sequence of the video to be matched so that, when frame data is input, the matching similarity between that frame and every key frame of the video to be matched can be output; training data can be obtained by degrading the key-frame sequence of the video to be matched; the neural-network matching module is denoted L;
S4: efficient matching: perform the efficient matching operation between the video to be matched and the original video.
2. The method of claim 1, wherein the preprocessing in steps S1 and S2 is the same in both cases: scene-switch points in the video to be matched or in the original video are extracted as key frames to form a key-frame sequence, and a convolution operation and a max-pooling operation are applied to each frame image to obtain a compressed key-frame sequence retaining only key information; the preprocessed key-frame sequence of the video to be matched is denoted P = {P_0, P_1, P_2, P_3, ..., P_{m-1}}, where m is the length of the key-frame sequence to be matched, and the preprocessed original-video key-frame sequence is denoted O = {O_0, O_1, O_2, ..., O_{n-1}}, where n is the length of the original key-frame sequence.
3. The video matching method of claim 2, wherein S3 further provides an encapsulation module, denoted M, which wraps the neural-network module L; when frame data is input, M passes the frame to L, and L outputs the similarity between the frame and the key-frame sequence of the video to be matched (i.e., the P sequence), forming a similarity sequence denoted α_0, α_1, α_2, ..., α_{m-1}, where m is the length of the key-frame sequence to be matched; the maximum of the similarity sequence is taken and denoted α_max; if α_max > α (α is a manually set constant), the encapsulation module M returns the index of α_max in the similarity sequence, otherwise it returns an illegal value, which can be any value outside [0, m-1].
4. The video matching method of claim 3, wherein step S4 includes the following matching steps:
initialization: align the P sequence with the left end of the O sequence, i.e., align P_0 with O_0;
starting from the left end of the P sequence, operation S41 is performed in a loop (S42 and S43 are sub-operations of S41); in each loop iteration the current key frame to be matched is, in turn, P_0, P_1, P_2, P_3, ..., P_{m-1}:
S41: denote the current key frame to be matched as P_j and the original key frame aligned with it as O_i, where i and j are key-frame indices; compute the similarity of P_j and O_i with the common module S; if S(O_i, P_j) > α, the frame matches and the process goes to S42; if S(O_i, P_j) ≤ α, the frame does not match and the process goes to S43;
S42: if P_j is the last frame of the P sequence, the P sequence has matched the O sequence successfully (i.e., the video to be matched matches the original video, and the position in the O sequence currently aligned with the P sequence is the match position), and the loop exits; if P_j is not the last frame of the P sequence but O_i is the last frame of the O sequence, the match of the P sequence against the O sequence has failed, and the loop exits; otherwise the current loop continues, i.e., the next key frame to be matched is compared with the original key frame aligned with it;
S43: take the frame in the O sequence that follows the key frame aligned with the last frame of the P sequence, i.e., O_{i+m-j}; using the pre-trained encapsulated neural-network module M, compute M(O_{i+m-j}) to obtain a value t; if t is legal (i.e., within [0, m-1]), O_{i+m-j} matches P_t, so shift the key-frame sequence P so that O_{i+m-j} aligns with P_t, then jump to S41 and match from the beginning; if no legal t exists, advance the key-frame sequence P to the right by m+1 positions, then jump to S41 and match from the beginning.
CN202011125897.2A 2020-10-20 2020-10-20 Video matching method Pending CN112597794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011125897.2A CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011125897.2A CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Publications (1)

Publication Number Publication Date
CN112597794A true CN112597794A (en) 2021-04-02

Family

ID=75180366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011125897.2A Pending CN112597794A (en) 2020-10-20 2020-10-20 Video matching method

Country Status (1)

Country Link
CN (1) CN112597794A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115243073A (en) * 2022-07-22 2022-10-25 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN115243073B (en) * 2022-07-22 2024-05-14 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Lin et al. Bsn: Boundary sensitive network for temporal action proposal generation
CN109919032B (en) Video abnormal behavior detection method based on motion prediction
CN111079646A (en) Method and system for positioning weak surveillance video time sequence action based on deep learning
CN110135386B (en) Human body action recognition method and system based on deep learning
CN108805036B (en) Unsupervised video semantic extraction method
CN109711380A (en) A kind of timing behavior segment generation system and method based on global context information
CN108921032B (en) Novel video semantic extraction method based on deep learning model
CN111091839B (en) Voice awakening method and device, storage medium and intelligent device
CN112329794A (en) Image description method based on double self-attention mechanism
CN115022711B (en) System and method for ordering shot videos in movie scene
CN111079539A (en) Video abnormal behavior detection method based on abnormal tracking
CN112801068A (en) Video multi-target tracking and segmenting system and method
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN110602504A (en) Video decompression method and system based on YOLOv2 target detection algorithm
CN109949217A (en) Video super-resolution method for reconstructing based on residual error study and implicit motion compensation
Zhang et al. Multiscale adaptation fusion networks for depth completion
CN112597794A (en) Video matching method
CN114996495A (en) Single-sample image segmentation method and device based on multiple prototypes and iterative enhancement
CN117197727B (en) Global space-time feature learning-based behavior detection method and system
Fujitake et al. Temporally-aware convolutional block attention module for video text detection
CN112348033B (en) Collaborative saliency target detection method
CN112508121A (en) Method and system for sensing outside by industrial robot
CN115496134B (en) Traffic scene video description generation method and device based on multi-mode feature fusion
CN110753228A (en) Garage monitoring video compression method and system based on Yolov1 target detection algorithm
CN115222959A (en) Lightweight convolutional network and Transformer combined human body key point detection method

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210402)