CN117058581A - Video duplication detection method and device - Google Patents

Video duplication detection method and device

Info

Publication number
CN117058581A
CN117058581A (Application CN202311000145.7A)
Authority
CN
China
Prior art keywords
video
detected
candidate
frame
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311000145.7A
Other languages
Chinese (zh)
Inventor
黄泱柯
陈劲
张彪
余意
杨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan MgtvCom Interactive Entertainment Media Co Ltd
Original Assignee
Hunan MgtvCom Interactive Entertainment Media Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan MgtvCom Interactive Entertainment Media Co Ltd filed Critical Hunan MgtvCom Interactive Entertainment Media Co Ltd
Priority to CN202311000145.7A priority Critical patent/CN117058581A/en
Publication of CN117058581A publication Critical patent/CN117058581A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a video duplication detection method and device, comprising the following steps: extracting at least two detection images from the video to be detected according to a preset period; performing feature extraction on the at least two detection images based on a preset feature extraction model to obtain detection frame features of the at least two detection images; obtaining video features of the video to be detected based on the detection frame features of the at least two detection images; searching a preset video feature database based on the video features to obtain a video set, wherein the video set comprises candidate videos, and each candidate video and the video to be detected meet a first similarity condition; obtaining candidate frame features corresponding to the candidate videos from a preset frame feature database; and determining, in the video set, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.

Description

Video duplication detection method and device
Technical Field
The present application relates to the field of information technologies, and in particular, to a method and apparatus for detecting video duplication.
Background
Video duplication detection is a method for identifying similar videos: a user inputs a video, and other videos with similar content are automatically searched for in a video library.
Video duplication detection can be understood as computing a video fingerprint that uniquely identifies a video and can be used for similarity comparison of video content. On a short-video platform, a large number of short videos are produced every day, and duplicates inevitably appear among them.
When a user sees two duplicate videos within a short time while watching, the viewing experience is clearly degraded. Therefore, the videos in a video library need to be deduplicated.
Disclosure of Invention
In view of this, the present application provides a video duplication detection method and apparatus, as follows:
a video duplication detection method comprising:
extracting at least two frames of detection images from the video to be detected according to a preset period;
performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
obtaining video features of the video to be detected based on the detection frame features of the at least two detection images;
searching a preset video feature database based on the video features to obtain a video set, wherein the video set comprises at least two candidate videos, and each candidate video and the video to be detected meet a first similarity condition;
obtaining candidate frame features corresponding to the candidate videos from a preset frame feature database;
and determining, in the video set, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.
Optionally, in the above method, the obtaining the video features of the video to be detected based on the detection frame features of the at least two detection images includes:
averaging the detection frame features of the at least two detection images to obtain the video features of the video to be detected.
Optionally, in the above method, after extracting at least two frames of detection images from the video to be detected according to a preset period, the method further includes:
performing scaling processing on the detection images to obtain detection images meeting the input requirement of the preset feature extraction model.
Optionally, in the above method, the determining, based on the detection frame features and the candidate frame features, a target video that meets a second similarity condition with the video to be detected among the at least two candidate videos includes:
sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, a similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video;
and determining, based on the similarity matrix, a target video having a segment similar to the video to be detected.
Optionally, in the above method, the sequentially obtaining, according to the detected frame feature of the video to be detected and the candidate frame feature corresponding to each candidate video in the at least two candidate videos, a similarity matrix of the detected frame feature and the candidate frame feature corresponding to each candidate video includes:
combining the detection frame characteristics of at least two frames of detection images to obtain a detection frame characteristic matrix;
combining the candidate frame characteristics corresponding to each candidate video of the video set in sequence to obtain a candidate frame characteristic matrix corresponding to each candidate video;
and calculating the similarity matrix of the detection frame feature matrix and each candidate frame feature matrix in turn.
Optionally, in the above method, the calculating a similarity matrix of the detection frame feature matrix and the candidate frame feature matrix includes:
multiplying the detection frame feature matrix by the transpose of the candidate frame feature matrix to obtain the similarity matrix.
Optionally, in the above method, the determining, based on the similarity matrix, a target video having a segment similar to the video to be detected includes:
screening the similarity matrix based on a preset threshold to obtain at least two target elements in the similarity matrix, wherein the target elements are elements greater than the preset threshold;
and if the at least two target elements meet a preset position continuity condition, determining that the target video exists among the at least two candidate videos.
Optionally, the method further comprises:
analyzing, based on the at least two target elements and the positions of the target elements in the similarity matrix, to obtain a target segment in the target video, wherein the target segment is a segment in the target video that is similar to the video to be detected.
Optionally, in the above method, before extracting at least two frames of detection images from the video to be detected according to a preset period, the method further includes:
training the original feature extraction model based on a preset training image to obtain a preset feature extraction model.
A video duplication detection apparatus comprising:
the image extraction module is used for extracting at least two frames of detection images from the video to be detected according to a preset period;
the feature extraction module is used for carrying out feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
The video feature obtaining module is used for obtaining the video feature of the video to be detected based on the detected frame features of the at least two frames of detected images;
the searching module is used for searching a preset video feature database based on the video features to obtain a video set, wherein the video set comprises at least two candidate videos, and each candidate video and the video to be detected meet a first similarity condition;
the obtaining module is used for obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
and the determining module is used for determining, in the video set, a target video that meets a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.
In summary, the present application provides a video duplication detection method and apparatus, comprising: extracting at least two detection images from the video to be detected according to a preset period; performing feature extraction on the at least two detection images based on a preset feature extraction model to obtain detection frame features of the at least two detection images; obtaining video features of the video to be detected based on the detection frame features of the at least two detection images; searching a preset video feature database based on the video features to obtain a video set, wherein the video set comprises at least two candidate videos, and each candidate video and the video to be detected meet a first similarity condition; obtaining candidate frame features corresponding to the candidate videos from a preset frame feature database; and determining, in the video set, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features. In this embodiment, the videos are first screened at coarse granularity based on video features and then at fine granularity based on frame features to determine the target video; this combination of coarse and fine granularity achieves fast retrieval while maintaining high detection precision for similar segments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of embodiment 1 of a video duplication detection method provided by the present application;
FIG. 2 is a flowchart of embodiment 2 of a video duplication detection method provided by the present application;
FIG. 3 is a flowchart of embodiment 3 of a video duplication detection method provided by the present application;
FIG. 4 is a flowchart of embodiment 4 of a video duplication detection method provided by the present application;
FIG. 5 is a schematic diagram of a similarity matrix in embodiment 4 of a video duplication detection method provided by the present application;
FIG. 6 is a flowchart of embodiment 5 of a video duplication detection method provided by the present application;
FIG. 7 is a schematic diagram of a similarity matrix in embodiment 5 of a video duplication detection method provided by the present application;
FIG. 8 is a flowchart of embodiment 6 of a video duplication detection method provided by the present application;
FIG. 9 is a schematic structural diagram of a preset feature extraction model in embodiment 6 of a video duplication detection method provided by the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a video duplication detection apparatus provided by the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
FIG. 1 shows a flowchart of embodiment 1 of the video duplication detection method provided by the present application. The method is applied to an electronic device and includes the following steps:
step S101: extracting at least two frames of detection images from the video to be detected according to a preset period;
To reduce the data volume of subsequent processing, frames are sampled from the video to be detected, and the subsequent analysis is performed on the extracted frame images.
Specifically, the value of the preset period is chosen in view of the frame rate and the picture redundancy of the video to be detected.
For example, if the preset period is 1 second, one detection image is extracted from the video to be detected every second.
It should be noted that, in this embodiment, a uniform frame extraction manner is adopted, and according to a preset period, a detection image is extracted from a video to be detected, so that uniform frame extraction is implemented for the video to be detected, and a basis is provided for a subsequent processing process.
The number of frames of the detected image is related to the length of the video to be detected, and the longer the length of the video to be detected is, the more the number of frames of the detected image is.
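For illustration, the following is a minimal sketch of this uniform frame sampling, assuming OpenCV is available; the one-second default period and the function name are illustrative, not part of the original disclosure:

```python
import cv2

def sample_frames(video_path: str, period_s: float = 1.0):
    """Uniformly sample one frame every `period_s` seconds from a video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0    # fall back if FPS is unreported
    step = max(int(round(fps * period_s)), 1)  # frames between two samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)               # BGR image of shape (H, W, 3)
        idx += 1
    cap.release()
    return frames
```

As described above, a longer video yields proportionally more sampled detection images.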
Step S102: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
the electronic equipment is also provided with a preset feature extraction model.
The preset feature extraction model performs feature extraction on the input image to obtain the feature of the detection frame.
After the video to be detected is subjected to image extraction to obtain a detection image, the detection image is sequentially input into the preset feature extraction model, so that the preset feature extraction model respectively performs feature extraction on the multi-frame detection image to obtain detection frame features corresponding to each frame of detection image.
In implementation, all detection images may first be extracted from the video to be detected, and after frame extraction is finished, each extracted detection image is input into the preset feature extraction model to obtain the corresponding detection frame features; alternatively, each detection image may be input into the preset feature extraction model as soon as it is extracted, so that the model finishes extracting the detection frame features of the last detection image shortly after frame extraction ends.
For example, the preset feature extraction model extracts a 256-dimensional feature from one detection image; correspondingly, the feature of each detection image is 256-dimensional. If m (m is an integer greater than 1) detection images are extracted from the video to be detected, the extracted features have dimension (m, 256), where 256 is the feature vector dimension of each frame.
After extracting and obtaining the detection frame characteristics of each frame of detection image, inserting the detection frame characteristics into a preset frame characteristic database for storage.
In a specific implementation, the preset frame feature database adopts a MySQL database.
In specific implementation, index information is set in a preset frame feature database, and the stored frame features can be searched and inquired according to the index information.
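As a hedged sketch of this step, the following shows how the extracted frames might be batched through the preset feature extraction model to obtain the (m, 256) feature array, assuming PyTorch and frames already scaled to the model input size (see embodiment 2); the pixel normalization is an illustrative assumption:

```python
import numpy as np
import torch

@torch.no_grad()
def extract_frame_features(model: torch.nn.Module, frames) -> np.ndarray:
    """Run the preset feature extraction model over m frames -> (m, 256) array.
    `model` is assumed to map a (B, 3, H, W) float tensor to (B, 256) features."""
    model.eval()
    batch = torch.stack([
        torch.from_numpy(f).permute(2, 0, 1).float() / 255.0  # HWC uint8 -> CHW float
        for f in frames
    ])
    return model(batch).cpu().numpy()
```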
Step S103: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
and processing the detection frame characteristics based on the multi-frame detection images respectively to obtain the video characteristics of the whole video to be detected.
Specifically, the detection frame features of the at least two detection images are averaged to obtain the video features of the video to be detected.
The video features may be represented by a video feature vector obtained directly by averaging the frame features: the m 256-dimensional frame feature vectors are averaged to obtain a single 256-dimensional feature representing the entire video.
After the video features are obtained, the video features are inserted into a video feature database for storage.
In particular embodiments, the video feature database may employ a Milvus database.
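A minimal sketch of this averaging step, assuming the m frame features are stacked in a NumPy array; the final L2 normalization is an assumption consistent with the cosine-distance comparison described below, not an explicit requirement of the method:

```python
import numpy as np

def video_feature(frame_feats: np.ndarray) -> np.ndarray:
    """Average m frame features of shape (m, 256) into one (256,) video feature."""
    v = frame_feats.mean(axis=0)
    # L2-normalize so that the inner product equals cosine similarity downstream
    return v / np.linalg.norm(v)
```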
Step S104: searching in a preset video feature database based on the video features to obtain a video set;
The video set comprises candidate videos, and each candidate video and the video to be detected meet a first similarity condition.
The first similarity condition means that the candidate video and the video to be detected are similar as a whole, for example, the similarity between the video features of the candidate video and the video to be detected is greater than a certain threshold, or the similarity ranks among the top few, etc.
The electronic device is also provided with a preset video feature database, in which the video features of a massive number of videos/video segments are stored.
Specifically, a video set similar to the video to be detected is determined based on the video feature database and the video features.
Specifically, the similarity between the video features of the video to be detected and each set of video features in the preset video feature database is calculated, and based on the similarity comparison, several videos with higher similarity, i.e., videos meeting the first similarity condition with the video to be detected, are determined as candidate videos.
Specifically, the similarity of the video features is calculated by adopting cosine distance between the two features.
In a specific implementation, the video feature database may adopt a Milvus database, a high-performance vector search database that automatically computes cosine distances and sorts the results, so that an agreed number of candidate videos (such as 5 or 10) can be selected from high to low similarity and combined into the video set.
The application does not limit the number of candidate videos.
Specifically, if no candidate video meeting the first similarity condition with the video to be detected is found in the video feature database, the process ends.
It should be noted that the candidate videos obtained by searching the preset video feature database based on video features are videos that are similar to the video to be detected as a whole; this step realizes coarse-grained detection.
For example, among the massive number of videos, the 10 videos with the highest similarity are selected as candidate videos.
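As a hedged sketch of this coarse-grained search, assuming a Milvus (pymilvus 2.x) deployment in which the video features are already stored; the collection name, field names, and connection parameters below are illustrative assumptions:

```python
from pymilvus import connections, Collection

def coarse_search(video_feat, top_k: int = 10):
    """Retrieve the top-k most similar stored videos by cosine similarity (sketch)."""
    connections.connect(host="localhost", port="19530")  # assumed deployment
    coll = Collection("video_features")                  # hypothetical collection
    results = coll.search(
        data=[list(video_feat)],
        anns_field="feature",            # hypothetical 256-dim vector field
        param={"metric_type": "IP"},     # inner product == cosine on unit vectors
        limit=top_k,
        output_fields=["video_id"],      # hypothetical scalar field
    )
    return [(hit.entity.get("video_id"), hit.distance) for hit in results[0]]
```

Using inner product on L2-normalized features is equivalent to ranking by cosine similarity, which matches the cosine-distance comparison described above.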
Step S105: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
the frame characteristics of a large number of videos are stored in a preset frame characteristic database, and particularly the frame characteristics of the same video are uniformly stored.
Correspondingly, after the candidate video is determined, the candidate frame characteristics corresponding to the candidate video are obtained in a preset frame characteristic database.
It should be noted that the shapes of the candidate frame features may differ from one candidate video to another.
The number of frames extracted from a candidate video is not fixed, so the number of frame features obtained by feature extraction on the frame images of different candidate videos also differs.
For example, if n frames are extracted from the candidate video, the extracted frame feature dimension is (n, 256).
Step S106: determining, in the video set, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.
Based on the detection frame features of the video to be detected and the candidate frame features of each candidate video, the similarity between the two videos is judged; the plurality of candidate videos in the video set are compared with the video to be detected in turn, and the target video meeting the second similarity condition is finally determined.
The second similarity condition may be that the similarity is greater than a threshold, or that the similarity ranks first (or among the first several) in the similarity ordering, etc.
It should be noted that judging the similarity between a candidate video and the video to be detected based on frame features examines the videos at the frame level; this step realizes fine-grained detection.
In this embodiment, video duplication detection is performed on the video to be detected from two directions of coarse granularity and fine granularity, so that higher detection precision can be ensured.
In the following embodiments, the determining process will be described in detail, which is not described in detail in this embodiment.
In summary, the video duplication detection method provided in this embodiment includes: extracting at least two detection images from the video to be detected according to a preset period; performing feature extraction on the at least two detection images based on a preset feature extraction model to obtain detection frame features of the at least two detection images; obtaining video features of the video to be detected based on the detection frame features of the at least two detection images; searching a preset video feature database based on the video features to obtain a video set, wherein the video set comprises at least two candidate videos, and each candidate video and the video to be detected meet a first similarity condition; obtaining candidate frame features corresponding to the candidate videos from a preset frame feature database; and determining, in the video set, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features. In this embodiment, the videos are first screened at coarse granularity based on video features and then at fine granularity based on frame features to determine the target video; this combination of coarse and fine granularity achieves fast retrieval while maintaining high detection precision for similar segments.
FIG. 2 shows a flowchart of embodiment 2 of the video duplication detection method provided by the present application; the method includes the following steps:
step S201: extracting at least two frames of detection images from the video to be detected according to a preset period;
step S201 is identical to the corresponding steps in embodiment 1, and is not described in detail in this embodiment.
Step S202: performing scaling processing on the detection image to obtain a detection image meeting the input requirement of the preset feature extraction model;
the format of the video to be detected may be any format, such as 1080P, 2K, 720P, or even other formats, which is not limited in the present application.
Because the input format of the preset feature extraction model is fixed, in order to successfully input the detected image into the preset feature extraction model to perform feature extraction, the detected image needs to be scaled to obtain an image meeting the input format requirement of the preset feature extraction model.
For example, if the input requirement of the preset feature extraction model is 512 pixels by 512 pixels, the detected image is scaled to obtain a detected image in 512 by 512 format.
In a specific implementation, bilinear interpolation may be adopted to scale the detection image.
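A one-line sketch of this scaling step, assuming OpenCV; the 512x512 target size follows the example above:

```python
import cv2

def scale_for_model(image, size=(512, 512)):
    """Bilinearly resize a detection image to the model input size.
    Note that cv2.resize expects (width, height)."""
    return cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
```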
Step S203: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
step S204: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
step S205: searching in a preset video feature database based on the video features to obtain a video set;
step S206: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
step S207: determining, among the at least two candidate videos, a target video meeting a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.
Steps S203-207 are identical to the corresponding steps in embodiment 1, and are not described in detail in this embodiment.
In summary, the video duplication detection method provided in this embodiment further includes: performing scaling processing on the detection image to obtain a detection image meeting the input requirement of the preset feature extraction model. In this embodiment, the detection image is scaled so that it meets the input requirement of the preset feature extraction model and can be input into the model, providing a basis for extracting the frame features of the detection image.
FIG. 3 shows a flowchart of embodiment 3 of the video duplication detection method provided by the present application; the method includes the following steps:
step S301: extracting at least two frames of detection images from the video to be detected according to a preset period;
step S302: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
step S303: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
step S304: searching in a preset video feature database based on the video features to obtain a video set;
step S305: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
steps S301-305 are identical to the corresponding steps in embodiment 1, and are not described in detail in this embodiment.
Step S306: sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, a similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video;
The obtained video set comprises a plurality of candidate videos; the candidate frame features corresponding to each candidate video are obtained from the preset frame feature database, and then the candidate frame features of each candidate video and the detection frame features are processed in turn to obtain a similarity matrix of the detection frame features and the candidate frame features.
Specifically, the similarity matrix may be calculated sequentially for the candidate frame features and the detected frame features corresponding to the candidate video according to the ranking.
The similarity matrix can represent the similarity between a certain frame in the candidate video and a certain frame in the video to be detected.
The process of obtaining the similarity matrix will be described in detail in the following embodiments, which will not be described in detail in this embodiment.
Step S307: and determining a target video with similar fragments with the video to be detected based on the similarity matrix.
Each element in the similarity matrix characterizes the similarity between a certain frame image in the candidate video and a certain frame image in the video to be detected.
Correspondingly, based on the similarity matrix, the similarity conditions of the candidate videos and the videos to be detected are determined, and different similarity matrices represent the similarity conditions of different candidate videos and the videos to be detected.
Therefore, based on the similarity matrix, the similarity condition of each candidate video and the video to be detected is determined, and whether the video to be detected has a similar segment or not is determined.
If one or more candidate videos contain a segment similar to the video to be detected, those candidate videos are taken as target videos; alternatively, the candidate videos with more similar segments and higher similarity are taken as target videos.
If no candidate video contains a segment similar to the video to be detected, a result indicating that no similar video exists is generated and fed back.
The process of determining the target video will be described in detail in the following embodiments, which will not be described in detail in this embodiment.
In summary, the video duplication detection method provided in this embodiment includes: sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, a similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video; and determining, based on the similarity matrix, a target video having a segment similar to the video to be detected. In this embodiment, the detection frame features of the video to be detected and the frame features of each candidate video are processed in turn to obtain the similarity matrix between the video to be detected and each candidate video; the similarity between them is then judged based on the similarity matrix, and a candidate video in which a similar segment exists is determined as the target video, thereby realizing duplication detection for the video to be detected.
FIG. 4 shows a flowchart of embodiment 4 of the video duplication detection method provided by the present application; the method includes the following steps:
Step S401: extracting at least two frames of detection images from the video to be detected according to a preset period;
step S402: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
step S403: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
step S404: searching in a preset video feature database based on the video features to obtain a video set;
step S405: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
steps S401 to 405 are identical to the corresponding steps in embodiment 3, and are not described in detail in this embodiment.
Step S406: combining the detection frame characteristics of at least two frames of detection images to obtain a detection frame characteristic matrix;
The detection frame features of the multiple detection images extracted from the video to be detected are combined to obtain the detection frame feature matrix corresponding to the video to be detected.
For example, if each detection frame feature is 256-dimensional and m detection images are extracted, the m detection frame features are combined into an (m, 256)-dimensional matrix.
Step S407: combining the candidate frame characteristics corresponding to each candidate video of the video set in sequence to obtain a candidate frame characteristic matrix corresponding to each candidate video;
Each candidate video has likewise been subjected to frame extraction, and the extracted frame images correspond to a plurality of candidate frame features; these candidate frame features are combined to obtain the candidate frame feature matrix corresponding to that candidate video.
For example, the candidate frame features corresponding to n frames stored in the preset frame feature database are combined to obtain an (n, 256) dimensional matrix.
In a specific implementation, the frame feature matrix of the candidate video is stored in the preset frame feature database, and step S407 may be omitted.
Step S408: sequentially calculating the similarity matrix of the detection frame feature matrix and each candidate frame feature matrix;
The similarity matrix is calculated for the detection frame feature matrix and the candidate frame feature matrix of each candidate video in turn, yielding as many similarity matrices as there are candidate videos.
Specifically, the similarity matrix is obtained by multiplying the detection frame feature matrix with the transpose of the candidate frame feature matrix.
As shown in FIG. 5, a schematic diagram of the similarity matrix: m detection images are extracted from video A to be detected, and feature extraction on them yields a detection frame feature matrix of dimension (m, 256), where m denotes m frames; n frames are extracted from candidate video B, and feature extraction on them yields a candidate frame feature matrix of dimension (n, 256), where n denotes n frames and 256 is the feature dimension of each frame. Multiplying the detection frame feature matrix with the candidate frame feature matrix gives an (m, n)-dimensional similarity matrix sim, where sim(i, j) represents the similarity between the i-th detection image of video A and the j-th frame image of candidate video B, with i not exceeding m and j not exceeding n.
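A minimal NumPy sketch of steps S406 to S408, assuming the per-frame features are L2-normalized so that the matrix product directly yields cosine similarities:

```python
import numpy as np

def similarity_matrix(det_feats: np.ndarray, cand_feats: np.ndarray) -> np.ndarray:
    """det_feats: (m, 256) detection frame features; cand_feats: (n, 256).
    Returns the (m, n) matrix where sim[i, j] compares frame i of the query
    video with frame j of the candidate video."""
    # Normalize rows so the inner product equals cosine similarity
    det = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cand = cand_feats / np.linalg.norm(cand_feats, axis=1, keepdims=True)
    return det @ cand.T
```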
Step S409: and determining a target video with similar fragments with the video to be detected based on the similarity matrix.
Step S409 is identical to the corresponding step in embodiment 3, and is not described in detail in this embodiment.
In summary, the video duplication detection method provided in this embodiment includes: combining the detection frame features of the at least two detection images to obtain a detection frame feature matrix; sequentially combining the candidate frame features corresponding to each candidate video of the video set to obtain a candidate frame feature matrix corresponding to each candidate video; and calculating the similarity matrix of the detection frame feature matrix and each candidate frame feature matrix in turn. In this embodiment, the detection frame feature matrix is obtained by combining the detection frame features of the detection images, the candidate frame feature matrix is obtained by combining the candidate frame features of each candidate video, and the similarity matrix is obtained from the two; each element of the similarity matrix represents the similarity between a detection image of the video to be detected and an image of the candidate video, providing a basis for subsequently determining candidate videos that have segments similar to the video to be detected.
FIG. 6 shows a flowchart of embodiment 5 of the video duplication detection method provided by the present application; the method includes the following steps:
step S601: extracting at least two frames of detection images from the video to be detected according to a preset period;
step S602: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
step S603: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
step S604: searching in a preset video feature database based on the video features to obtain a video set;
step S605: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
step S606: sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, a similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video;
steps S601-605 are identical to the corresponding steps in embodiment 3, and are not described in detail in this embodiment.
Step S607: screening the similarity matrix based on a preset threshold value to obtain at least two target elements in the similarity matrix;
Wherein the target element is an element greater than the preset threshold.
The preset threshold is used for judging whether the two images are similar or not.
Specifically, if the value of an element in the similarity matrix is greater than a preset threshold, the corresponding detection image of the element in the video to be detected is characterized as being more similar to the corresponding image in the candidate video.
Specifically, the preset threshold is any number between 0 and 1, such as 0.8 or 0.75. The larger the preset threshold, the more similar two images must be to pass the screening, and the more similar the finally determined target video is to the video to be detected; the smaller the preset threshold, the more similar images are screened out and the more target videos are finally determined, which helps avoid omissions.
In specific implementation, the value of the preset threshold value can be determined according to practical situations, and the value of the preset threshold value is not limited in the application.
If a similarity matrix has no target element, or has only one target element greater than the preset threshold, it is judged that the video to be detected and the candidate video associated with that similarity matrix share no similar segment content.
Step S608: if the at least two target elements meet a preset position continuity condition, determining that the target video exists among the at least two candidate videos.
If a similar segment exists between a candidate video and the video to be detected, the elements corresponding to the similar segment form a line segment in the similarity matrix.
The preset position continuity condition means that the target elements form a continuous line segment in the similarity matrix, the value of every element on the segment being greater than the preset threshold; that is, for each element on the segment, the corresponding image in the candidate video is highly similar to the corresponding detection image in the video to be detected.
If the positions of the plurality of target elements in the similarity matrix form a continuous line segment, it is determined that the candidate video corresponding to the similarity matrix contains a segment similar to the video to be detected.
Typically, the line segments extend in an oblique direction (e.g., from top left to bottom right, top right to bottom left, etc.) in the similarity matrix.
FIG. 7 is a schematic diagram of a similarity matrix: the matrix is m x n, and a line segment 701 formed by target elements exists in the region outlined by the dotted line.
The plurality of target elements meeting the preset position continuity condition may form one continuous line segment or several continuous line segments.
For example, consider candidate videos 1-3 and analyze their similarity matrices with the video to be detected. If candidate video 1 has no target element, it shares no similar segment with the video to be detected; if candidate video 2 has target elements but their positions do not meet the preset position continuity condition, it shares no similar segment with the video to be detected; if candidate video 3 has multiple target elements and the positions of some or all of them meet the preset position continuity condition, it shares a similar segment with the video to be detected.
Specifically, the method further comprises: analyzing, based on the at least two target elements and their positions in the similarity matrix, to obtain a target segment in the target video, wherein the target segment is a segment in the target video that is similar to the video to be detected.
In this embodiment, the positions in the similarity matrix of the target elements meeting the preset position continuity condition can also be analyzed to obtain the corresponding detection images in the video to be detected and the corresponding images in the candidate video. Further, the position of the similar segment in the video to be detected is determined based on the detection images corresponding to the target elements, giving its start time and end time in the video to be detected; and the position of the similar segment in the candidate video is determined based on the images in the candidate video corresponding to the target elements, giving its start time and end time in the candidate video.
Specifically, a TN algorithm may be used to process the similarity matrix containing the target elements and determine the start and end times of the similar segments in the video to be detected and the candidate video.
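For illustration, the following is a simplified stand-in for this continuity check (not the TN algorithm itself), which scans only straight diagonal runs of above-threshold elements; the minimum run length is an assumed parameter:

```python
import numpy as np

def has_similar_segment(sim: np.ndarray, thresh: float = 0.8, min_len: int = 3) -> bool:
    """Return True if the (m, n) similarity matrix contains a diagonal run of
    at least `min_len` consecutive elements that all exceed `thresh`."""
    hits = sim > thresh
    m, n = hits.shape
    for offset in range(-(m - 1), n):   # every top-left -> bottom-right diagonal
        run = 0
        for above in np.diagonal(hits, offset=offset):
            run = run + 1 if above else 0
            if run >= min_len:
                return True
    return False
```

The row and column indices of the qualifying run, multiplied by the frame-sampling period, would give the start and end times of the similar segment in each video.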
In summary, the video duplication detection method provided in this embodiment includes: screening the similarity matrix based on a preset threshold to obtain at least two target elements in the similarity matrix that are greater than the preset threshold; and if the at least two target elements meet a preset position continuity condition, determining that the target video exists among the at least two candidate videos. In this embodiment, the elements of each similarity matrix are screened against a preset threshold to obtain the target elements of each matrix; if the positions of the target elements in the same similarity matrix form a continuous line segment, it is determined that the candidate video corresponding to that similarity matrix contains a segment similar to the video to be detected, thereby obtaining the target video.
FIG. 8 shows a flowchart of embodiment 6 of the video duplication detection method provided by the present application; the method includes the following steps:
Step S801: training the original feature extraction model based on a preset training image to obtain a preset feature extraction model;
the preset feature extraction model may be a neural network model.
FIG. 9 is a schematic structural diagram of a preset feature extraction model, which includes: an input layer 901, a convolution layer 902, a feature dimension reduction layer 903, a feature normalization layer 904, and an output embedding layer 905.
Wherein the convolution layer is a ResNet-50, the feature dimension reduction layer is a fully connected (fc) layer, and the feature normalization layer is bn + l2norm.
It should be noted that no classification layer is set in the preset feature extraction model; the backbone outputs 2048-dimensional features, but directly using 2048-dimensional features would obviously cause high storage pressure, so an fc layer is added for feature dimension reduction, outputting 256-dimensional features. Finally, bn and l2norm are added for feature normalization.
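A minimal PyTorch sketch of the structure in FIG. 9, assuming a recent torchvision ResNet-50 backbone; the class name and the choice to start from random weights are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class FrameFeatureExtractor(nn.Module):
    """ResNet-50 backbone -> fc dimension reduction -> bn -> l2norm, as in FIG. 9."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)   # no classification head is used
        backbone.fc = nn.Identity()         # expose the 2048-dim pooled feature
        self.backbone = backbone
        self.fc = nn.Linear(2048, out_dim)  # feature dimension reduction layer
        self.bn = nn.BatchNorm1d(out_dim)   # feature normalization (bn)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.fc(self.backbone(x))          # (batch, 256)
        return F.normalize(self.bn(feat), dim=1)  # l2norm: unit-length output
```

The L2-normalized output makes inner products equal to cosine similarities, which fits the cosine-distance comparisons used in the earlier embodiments.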
For training the original feature extraction model, a self-supervised training method is adopted: positive samples are generated through strong data augmentation, and additional negative sample data is added during training to strengthen the image similarity discrimination capability.
Specifically, the data augmentation method adopts the ISC algorithm, i.e., an image similarity feature extraction model whose internal parameters can be set as required; the real-time-updated memory bank proposed in the MoCo v3 algorithm is adopted to increase the number of training negative samples.
Specifically, the InfoNCE loss function is used to determine whether training is complete.
Wherein the loss function is as follows:

$$\mathcal{L}_q = -\log \frac{\exp(q \cdot k^{+}/\tau)}{\exp(q \cdot k^{+}/\tau) + \sum_{i} \exp(q \cdot k_i/\tau)}$$

wherein q represents the picture feature obtained by applying the feature extraction model to the current input picture, k⁺ represents the positive sample obtained after data augmentation of the current input picture, kᵢ represents the features extracted by the feature extraction model from other pictures, and τ represents a temperature coefficient, a hyperparameter whose optimal value is obtained through experiments. The optimization objective of the loss function is to make the cosine distance between q and k⁺ as small as possible and the cosine distance between q and kᵢ as large as possible, so that the trained model possesses image similarity discrimination capability. The more numerous and stronger the data augmentation categories, the more discriminative the feature extraction model.
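A hedged PyTorch sketch of the InfoNCE loss above, assuming all features are L2-normalized so that inner products equal cosine similarities; the default temperature is an assumption borrowed from common contrastive-learning practice, not a value stated here:

```python
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, k_pos: torch.Tensor, k_neg: torch.Tensor,
             tau: float = 0.07) -> torch.Tensor:
    """q: (B, D) anchor features; k_pos: (B, D) augmented positives;
    k_neg: (N, D) negatives, e.g. drawn from the memory bank.
    All inputs are assumed L2-normalized."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)  # (B, 1) similarity to positive
    l_neg = q @ k_neg.T                           # (B, N) similarities to negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)        # the positive sits at index 0
```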
Step S802: extracting at least two frames of detection images from the video to be detected according to a preset period;
step S803: performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
step S804: obtaining video features of the video to be detected based on the detected frame features of the at least two detected images;
step S805: searching in a preset video feature database based on the video features to obtain a video set;
Step S806: obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
step S807: and determining target video meeting a second similar condition with the video to be detected in the video set based on the detected frame characteristic and the candidate frame characteristic.
Steps S802-807 are identical to the corresponding steps in embodiment 1, and are not described in detail in this embodiment.
In summary, the method for detecting video duplication provided in this embodiment further includes: training the original feature extraction model based on a preset training image to obtain a preset feature extraction model. In this embodiment, the preset feature extraction model is trained in advance, so as to provide a basis for extracting features of the detected image.
Corresponding to the embodiment of the video duplication detection method provided by the application, the application also provides an embodiment of a device applying the video duplication detection method.
FIG. 10 is a schematic structural diagram of an embodiment of a video duplication detection apparatus according to the present application, where the apparatus includes the following structures: an image extraction module 1001, a feature extraction module 1002, a video feature module 1003, a search module 1004, an acquisition module 1005, and a determination module 1006;
The image extracting module 1001 is configured to extract at least two frames of detection images from a video to be detected according to a preset period;
the feature extraction module 1002 is configured to perform feature extraction on the at least two frames of detection images based on a preset feature extraction model, so as to obtain detection frame features of the at least two frames of detection images;
the video feature module 1003 is configured to obtain video features of the video to be detected based on the detection frame features of the at least two detection images;
the searching module 1004 is configured to search a preset video feature database based on the video features to obtain a video set, where the video set includes candidate videos, and each candidate video and the video to be detected meet a first similarity condition;
the obtaining module 1005 is configured to obtain candidate frame features corresponding to the candidate video in a preset frame feature database;
the determining module 1006 is configured to determine, in the video set, a target video that meets a second similarity condition with the video to be detected based on the detection frame features and the candidate frame features.
Optionally, the video feature module is specifically configured to:
averaging the detection frame features of the at least two detection images to obtain the video features of the video to be detected.
Optionally, the method further comprises:
and the scaling module is used for scaling the detection image to obtain the detection image meeting the input requirement of the preset feature extraction model.
Optionally, the determining module includes:
the matrix unit is used for sequentially obtaining the similarity matrix of the detected frame characteristics and the candidate frame characteristics corresponding to each candidate video according to the detected frame characteristics of the video to be detected and the candidate frame characteristics corresponding to each candidate video in the video set;
and the determining unit is used for determining a target video with a similar segment with the video to be detected based on the similarity matrix.
Optionally, the matrix unit specifically includes:
a first combination subunit, configured to combine the detected frame features of at least two detected images to obtain a detected frame feature matrix;
the second combination subunit is used for sequentially combining the candidate frame characteristics corresponding to each candidate video of the video set to obtain a candidate frame characteristic matrix corresponding to each candidate video;
and the calculating subunit is used for sequentially calculating the similarity matrix of the detection frame feature matrix and each candidate frame feature matrix.
Optionally, the computing subunit is specifically configured to:
multiplying the detection frame feature matrix by the transpose of the candidate frame feature matrix to obtain the similarity matrix.
Optionally, the determining unit includes:
a screening subunit, configured to screen the similarity matrix based on a preset threshold, so as to obtain at least two target elements in the similarity matrix that are greater than the preset threshold;
and a determining subunit, configured to determine that the target video exists among the at least two candidate videos if the at least two target elements meet a preset position continuity condition.
Optionally, the determining unit further includes:
a target segment subunit, configured to parse out the target segment in the target video based on the at least two first elements and their positions in the similarity matrix, where the target segment is a segment of the target video that is similar to the video to be detected.
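One plausible reading of the positional continuity condition is a run of above-threshold elements along a diagonal of the similarity matrix, since frames that match consecutively in both videos advance one row and one column at a time. The sketch below, with hypothetical names, threshold, and run length, finds such a run and converts it back to time spans using the sampling period:

```python
import numpy as np

def find_similar_segment(sim, threshold=0.8, min_run=2, period_s=1.0):
    """Scan every diagonal of the (F, C) similarity matrix for a run of at
    least min_run consecutive elements above threshold. Returns the first
    qualifying run as (detect_start_s, detect_end_s, cand_start_s,
    cand_end_s), or None if no diagonal contains such a run."""
    F, C = sim.shape
    for offset in range(-(F - 1), C):          # every diagonal of sim
        diag = np.diagonal(sim, offset=offset)
        run = 0
        for k, v in enumerate(diag):
            run = run + 1 if v > threshold else 0
            if run >= min_run:
                start = k - run + 1
                i0 = start - min(offset, 0)    # row (detection frame) index
                j0 = start + max(offset, 0)    # column (candidate frame) index
                return (i0 * period_s, (i0 + run) * period_s,
                        j0 * period_s, (j0 + run) * period_s)
    return None
```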
Optionally, the apparatus further includes:
a training module, configured to train an original feature extraction model based on preset training images to obtain the preset feature extraction model.
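The application does not specify a training recipe. Purely as an illustration, the sketch below fine-tunes a standard CNN backbone with a classification loss and then discards the classifier head, using the pooled features as the extractor; the backbone, loss, and every hyperparameter here are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10                        # hypothetical label count
model = models.resnet18(weights=None)   # backbone choice is an assumption
model.fc = nn.Linear(model.fc.in_features, num_classes)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Stand-in for one batch of preset training images and their labels.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_classes, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()

# The trained backbone minus its classifier head then plays the role of
# the preset feature extraction model (it outputs pooled conv features).
feature_extractor = nn.Sequential(*list(model.children())[:-1])
```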
It should be noted that, for an explanation of the function of each component of the video duplication detection apparatus provided in this embodiment, reference may be made to the foregoing method embodiment; the details are not repeated here.
In summary, the video duplication detection apparatus provided in this embodiment first screens videos at coarse granularity based on video features and then matches the candidate videos at fine granularity based on frame features to determine the target video. Combining coarse and fine granularity in this way enables fast retrieval while maintaining high detection accuracy for similar segments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and reference may be made to the description of the method for relevant details.
The foregoing description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A video duplication detection method, comprising:
extracting at least two frames of detection images from the video to be detected according to a preset period;
performing feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
obtaining video features of the video to be detected based on the detection frame features of the at least two frames of detection images;
searching in a preset video feature database based on the video features to obtain a video set, wherein the video set comprises candidate videos, and the candidate videos and the video to be detected meet a first similar condition;
obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
and determining, in the video set, a target video that satisfies a second similar condition with the video to be detected based on the detection frame features and the candidate frame features.
2. The method according to claim 1, wherein the obtaining the video features of the video to be detected based on the detection frame features of the at least two frames of detection images comprises:
averaging the data corresponding to the detection frame features of the at least two frames of detection images to obtain the video features of the video to be detected.
3. The method according to claim 1, wherein after extracting at least two frames of detection images from the video to be detected according to a preset period, the method further comprises:
performing scaling processing on the detection images to obtain detection images that meet the input requirement of the preset feature extraction model.
4. The method of claim 1, wherein the determining, in the video set, a target video that satisfies a second similar condition with the video to be detected based on the detection frame features and the candidate frame features comprises:
sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, a similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video;
and determining, based on the similarity matrix, a target video that has a segment similar to the video to be detected.
5. The method according to claim 4, wherein the sequentially obtaining, according to the detection frame features of the video to be detected and the candidate frame features corresponding to each candidate video in the video set, the similarity matrix of the detection frame features and the candidate frame features corresponding to each candidate video comprises:
combining the detection frame features of the at least two frames of detection images to obtain a detection frame feature matrix;
combining, in turn, the candidate frame features corresponding to each candidate video in the video set to obtain a candidate frame feature matrix for each candidate video;
and calculating, in turn, the similarity matrix of the detection frame feature matrix and each candidate frame feature matrix.
6. The method of claim 5, wherein the calculating the similarity matrix of the detection frame feature matrix and a candidate frame feature matrix comprises:
performing dot multiplication of the detection frame feature matrix and the candidate frame feature matrix to obtain the similarity matrix.
7. The method of claim 4, wherein the determining, based on the similarity matrix, a target video that has a segment similar to the video to be detected comprises:
screening the similarity matrix based on a preset threshold to obtain at least two target elements of the similarity matrix, wherein a target element is an element greater than the preset threshold;
and determining, if the at least two target elements satisfy a preset positional continuity condition, that the target video exists among the candidate videos.
8. The method as recited in claim 7, further comprising:
parsing out the target segment in the target video based on the at least two target elements and their positions in the similarity matrix, wherein the target segment is a segment of the target video that is similar to the video to be detected.
9. The method according to claim 1, wherein before the extracting at least two frames of detection images from the video to be detected according to a preset period, the method further comprises:
training an original feature extraction model based on preset training images to obtain the preset feature extraction model.
10. A video duplication detection apparatus, comprising:
the image extraction module is used for extracting at least two frames of detection images from the video to be detected according to a preset period;
the feature extraction module is used for carrying out feature extraction on the at least two frames of detection images based on a preset feature extraction model to obtain detection frame features of the at least two frames of detection images;
the video feature obtaining module is used for obtaining the video feature of the video to be detected based on the detected frame features of the at least two frames of detected images;
the searching module is used for searching in a preset video feature database based on the video features to obtain a video set, wherein the video set comprises candidate videos, and the candidate videos and the video to be detected meet a first similar condition;
the obtaining module is used for obtaining candidate frame characteristics corresponding to the candidate video in a preset frame characteristic database;
and the determining module is used for determining, in the video set, a target video that satisfies a second similar condition with the video to be detected based on the detection frame features and the candidate frame features.
CN202311000145.7A 2023-08-09 2023-08-09 Video duplication detection method and device Pending CN117058581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311000145.7A CN117058581A (en) 2023-08-09 2023-08-09 Video duplication detection method and device

Publications (1)

Publication Number Publication Date
CN117058581A 2023-11-14

Family

ID=88656572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311000145.7A Pending CN117058581A (en) 2023-08-09 2023-08-09 Video duplication detection method and device

Country Status (1)

Country Link
CN (1) CN117058581A (en)

Similar Documents

Publication Publication Date Title
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
EP2676224B1 (en) Image quality assessment
KR101457284B1 (en) Methods and apparatuses for facilitating content-based image retrieval
CN103608826B (en) Annotated using product in the video of Web information mining
US20120027295A1 (en) Key frames extraction for video content analysis
Zhang et al. A joint compression scheme of video feature descriptors and visual content
KR20190082593A (en) System and Method for Reidentificating Object in Image Processing
Li et al. Detection of blotch and scratch in video based on video decomposition
CN115115856A (en) Training method, device, equipment and medium for image encoder
Ge et al. WGI-Net: A weighted group integration network for RGB-D salient object detection
Kaur et al. Content-Based Video and Image Retrieval in the Modern Era: Apprehensions and Scope
Bekhet et al. Video Matching Using DC-image and Local Features
CN117058581A (en) Video duplication detection method and device
CN112100412B (en) Picture retrieval method, device, computer equipment and storage medium
Bhaumik et al. Towards redundancy reduction in storyboard representation for static video summarization
CN112487943B (en) Key frame de-duplication method and device and electronic equipment
Jabnoun et al. Video-based assistive aid for blind people using object recognition in dissimilar frames
Guru et al. Histogram based split and merge framework for shot boundary detection
CN111680722B (en) Content identification method, device, equipment and readable storage medium
Antony et al. Copy Move Image Forgery Detection Using Adaptive Over-Segmentation and Brute-Force Matching
RU2693994C1 (en) Video processing method for visual search purposes
CN113766311A (en) Method and device for determining number of video segments in video
CN118035494A (en) Information determination method, apparatus, device and computer readable storage medium
Sunuwar et al. A comparative analysis on major key-frame extraction techniques
Sakpal et al. Revenge Pornography Matching Using Computer Vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination