CN116188821A - Copyright detection method, system, electronic device and storage medium - Google Patents


Info

Publication number
CN116188821A
CN116188821A (application CN202310450421.3A)
Authority
CN
China
Prior art keywords
image
video
event
similar
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310450421.3A
Other languages
Chinese (zh)
Other versions
CN116188821B (en)
Inventor
刘世章
汪昭辰
王全宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Chenyuan Technology Information Co ltd
Original Assignee
Qingdao Chenyuan Technology Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Chenyuan Technology Information Co ltd filed Critical Qingdao Chenyuan Technology Information Co ltd
Priority to CN202310450421.3A
Publication of CN116188821A
Application granted
Publication of CN116188821B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a copyright detection method, system, electronic device and storage medium in the technical field of video processing. The copyright detection method comprises the following steps: acquiring a target object whose copyright is to be detected, the target object comprising an image to be detected or a video to be detected; preprocessing the target object to obtain either a target image corresponding to the image to be detected, or a video event sequence corresponding to the video to be detected, the video event sequence comprising at least one target video event; searching an image information network for images similar to the target image to obtain a first detection result, or searching a video event information network for video events similar to the target video event to obtain a second detection result; and performing infringement processing on the target object according to the detection result. The invention can quickly find, in a copyright sample library, the set of images or videos similar to the target object so as to judge whether the target object infringes, and improves the accuracy and efficiency of infringement comparison.

Description

Copyright detection method, system, electronic device and storage medium
Technical Field
The present invention relates to the field of video processing, and in particular, to a copyright detection method, a copyright detection system, an electronic device, and a storage medium.
Background
With the development of self-media, the ways in which images and videos are infringed have multiplied: copyrighted images or videos are republished directly; long videos are cut into many short videos for distribution; the opening and closing credits of an original work are removed and the core footage is re-cut or spliced into a new video; original videos are re-edited as secondary creations; or pictures are mosaicked, scaled, given a different aspect ratio, or re-encoded at a different resolution. These practices seriously damage the legal rights of copyright holders and hinder the development of cultural undertakings.
Existing copyright protection techniques rely on digital image watermarking, alone or combined with machine learning, and extract the embedded copyright information with a watermark extraction algorithm as the main evidence of ownership. However, watermarks are vulnerable to representation, robustness, and interpretation attacks, which can destroy part or even all of the watermark information, making extraction difficult and undermining copyright protection. Approaches that combine machine learning with digital watermarking depend on training over a sample library, are costly and energy-intensive, and struggle to keep up with complex and fast-changing video content.
Therefore, a new copyright detection method is needed.
Disclosure of Invention
In view of the above, the present invention aims to provide a copyright detection method, system, electronic device and storage medium that address the low accuracy and low efficiency of existing copyright detection.
Based on the above object, in a first aspect, the present invention provides a copyright detection method, comprising: acquiring a target object to be detected, the target object comprising an image to be detected or a video to be detected; preprocessing the target object to obtain a target image corresponding to the image to be detected, or a video event sequence corresponding to the video to be detected, wherein the video event sequence comprises at least one target video event, a video event is the set of all content frames in a shot, and the content frames are the frames representing the shot content, namely the first frame, the last frame, and N intermediate frames (N a natural number), where an intermediate frame is selected whenever the difference rate between a frame of the shot (other than the first and last frames) and the previous content frame exceeds a preset threshold; searching an image information network for images similar to the target image to obtain a first detection result, or searching a video event information network for video events similar to the target video event to obtain a second detection result; wherein the image information network is a forest structure built on an image information space from a set of image multi-level trees, each image multi-level tree comprises a root node and child nodes, the difference rate between the images of any two root nodes exceeds the preset threshold, and the difference rate between each child node and its root node's image is at most the preset threshold; the image information space is the multi-dimensional vector space of the image feature vectors, which are computed from feature matrices extracted from the images under a common coordinate system; the video event information network is a forest structure built on a video event information space from a set of event multi-level trees, and the video event information space is the multi-dimensional vector space of the video event feature vectors, which are computed from feature matrices extracted from the content frame sets under a common coordinate system; and obtaining infringement data characterizing infringement by the target object according to the first detection result or the second detection result, and outputting the infringement data.
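The content-frame rule above — always keep a shot's first and last frames, and keep an intermediate frame whenever its difference rate from the most recent content frame exceeds the preset threshold — can be sketched as follows. This is a minimal illustration, not the patent's implementation; `diff_rate` stands in for whatever frame-difference measure the patent intends, and the threshold is a free parameter:

```python
def extract_content_frames(frames, diff_rate, threshold):
    """Return the content frames of one shot: the first frame, the last
    frame, and every intermediate frame whose difference rate from the
    previous content frame exceeds the threshold."""
    if not frames:
        return []
    if len(frames) == 1:
        return [frames[0]]
    content = [frames[0]]            # the first frame is always a content frame
    for frame in frames[1:-1]:       # scan intermediate frames only
        if diff_rate(frame, content[-1]) > threshold:
            content.append(frame)    # large change: new content frame
    content.append(frames[-1])       # the last frame is always a content frame
    return content
```

With frames modeled as numbers and `diff_rate` as absolute difference, `extract_content_frames([0, 1, 5, 6, 20], lambda a, b: abs(a - b), 3)` keeps `[0, 5, 20]`: frame 5 differs from the previous content frame 0 by more than the threshold, while frames 1 and 6 do not.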
Optionally, before acquiring the target object to be detected, the method comprises: acquiring original data and preprocessing it to obtain a copyright sample library, the copyright sample library comprising the original images and original videos corresponding to the original data; and constructing the video event information network from the video event sequences of the original videos, and constructing the image information network from the original images and the content frames of the original videos.
Optionally, preprocessing the target object comprises: when the target object is an image to be detected, normalizing the image to obtain the target image; when the target object is a video to be detected, normalizing the video to obtain a normalized video, granulating the normalized video to obtain a shot sequence and a content frame sequence, and obtaining the video event sequence of the video to be detected from the shot sequence and the content frame sequence.
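The preprocessing branch above can be sketched as a small dispatcher. The `normalize`, `granulate`, and `build_events` callables are placeholders for the patent's normalization, granulation, and event-grouping steps, which are not specified in detail here:

```python
def preprocess(target, normalize, granulate, build_events):
    """Dispatch preprocessing: an image is normalized into a target image;
    a video is normalized, granulated into shots and content frames, and
    grouped into a video event sequence. All callables are placeholders."""
    if target["type"] == "image":
        return {"target_image": normalize(target["data"])}
    video = normalize(target["data"])             # normalized video
    shots, content_frames = granulate(video)      # shot + content-frame sequences
    return {"event_sequence": build_events(shots, content_frames)}
```

For example, with toy callables, an image input yields only a `target_image`, while a video input yields an `event_sequence` built from its shots and content frames.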
Optionally, searching the image information network for images similar to the target image to obtain the first detection result comprises: calculating the image features of the target image; traversing the root nodes of the image information network and screening them by the feature quantity of the target image to obtain candidate image root nodes; calculating the image feature difference rate between the target image and each candidate image (the image corresponding to a candidate image root node) from their respective image features, and judging from this difference rate whether the two are similar; and outputting a similar image set as the first detection result, the set comprising all images in the image information network that are similar to the target image.
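The root-node search can be sketched as a two-stage filter: a coarse screen on feature quantity, then a difference-rate comparison. The screening rule (a relative tolerance on the feature count) and the `count_tolerance` parameter are assumptions for illustration; the patent only states that roots are screened by the target image's feature quantity:

```python
def search_similar_images(target_feat, roots, feat_diff_rate, threshold,
                          count_tolerance=0.2):
    """Scan the root nodes of the image information network: screen by
    feature count, then keep roots whose feature difference rate from
    the target is within the threshold."""
    similar = []
    for root in roots:
        # coarse screen: skip roots whose feature count differs too much
        if abs(root["n_features"] - target_feat["n_features"]) \
                > count_tolerance * target_feat["n_features"]:
            continue
        rate = feat_diff_rate(target_feat["vec"], root["vec"])
        if rate <= threshold:            # fine comparison on feature vectors
            similar.append((root["id"], rate))
    return similar
```

A root with a wildly different feature count is never compared vector-to-vector, which is what makes the forest search cheap over a large sample library.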
Optionally, the method further comprises: when the target image is similar to a candidate image, traversing all child nodes associated with the corresponding candidate image root node to obtain candidate image child nodes; judging from the image feature difference rate between the target image and the image of each candidate image child node whether the two are similar, and, if so, adding that child node's image to the similar image set.
Optionally, searching the video event information network for video events similar to a video event of the video to be detected to obtain the second detection result comprises: calculating feature data of the target video event from each of its content frames; traversing the root nodes of the video event information network and screening them by the number of content frames of the target video event to obtain candidate event root nodes; calculating the similarity ratio between the target video event and each candidate event (the video event corresponding to a candidate event root node) from their respective feature data, and judging from this similarity ratio whether the two are similar; and outputting a similar event set as the second detection result, the set comprising all video events in the video event information network that are similar to the target video event.
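The event search mirrors the image search: screen root events by content-frame count, then score a similarity ratio. The exact-count screen and the mean of per-frame similarities are illustrative assumptions; the patent does not pin down how the ratio is computed:

```python
def search_similar_events(target_event, roots, frame_sim, sim_threshold):
    """Compare the target video event against event-tree roots: screen by
    content-frame count, then judge similarity by the average similarity
    of aligned content frames."""
    similar = []
    for root in roots:
        if len(root["frames"]) != len(target_event["frames"]):
            continue  # coarse screen on content-frame count
        pairs = zip(target_event["frames"], root["frames"])
        ratio = sum(frame_sim(a, b) for a, b in pairs) / len(root["frames"])
        if ratio >= sim_threshold:
            similar.append((root["id"], ratio))
    return similar
```

With a binary `frame_sim`, an event matching all three content frames scores 1.0, while one matching two of three scores about 0.67 and falls below a 0.8 threshold.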
Optionally, the method further comprises: when the target video event is similar to a candidate event, traversing all child nodes associated with the corresponding candidate event root node to obtain candidate event child nodes; judging from the similarity ratio between the target video event and the video event of each candidate event child node whether the two are similar, and, if so, adding that child node's video event to the similar event set.
Optionally, when the target object comprises a video to be detected, the method further comprises: acquiring the content frames of each video event of the video to be detected, and searching the image information network for images similar to those content frames to obtain the first detection result.
Optionally, after obtaining the first detection result or the second detection result, the method comprises: when the first detection result contains more than one similar image, sorting the similar image set by the image difference rate between the target image and each similar image; or, when the second detection result contains more than one similar video event, sorting the similar event set by the similarity ratio between the target video event and each similar video event.
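The two orderings point in opposite directions: a lower difference rate means a more similar image, while a higher similarity ratio means a more similar event. A minimal sketch, assuming "most similar first" is the intended presentation order (the patent does not state the direction):

```python
def rank_results(similar_images=None, similar_events=None):
    """Order detection results by score, most similar first.
    Images: ascending by difference rate. Events: descending by
    similarity ratio. Sorting applies only when there is more than one hit."""
    ranked = {}
    if similar_images and len(similar_images) > 1:
        ranked["images"] = sorted(similar_images, key=lambda x: x[1])
    if similar_events and len(similar_events) > 1:
        ranked["events"] = sorted(similar_events, key=lambda x: x[1], reverse=True)
    return ranked
```

Each entry is an `(id, score)` pair as produced by the search sketches; a result set with a single hit is left unranked, matching the claim's "greater than 1" condition.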
Optionally, obtaining infringement data characterizing infringement by the target object according to the first detection result or the second detection result comprises: when the first detection result includes an image similar to the image to be detected, taking the image to be detected, the similar image, and the position of the similar image in its original video as the infringement data of the image to be detected, where the similar image is either an original image or a content frame of an original video; when the second detection result includes a video event similar to one of the video to be detected, taking the video to be detected, the similar video event, and the position of the similar video event in its original video as the infringement data of the video to be detected.
optionally, the method further comprises: acquiring infringement confirmation information of infringement monitoring personnel based on infringement data of the image to be detected or the video to be detected; and storing infringement data corresponding to the infringement confirmation information to form an evidence library.
In a second aspect, there is also provided a copyright detection system, comprising: a data acquisition module for acquiring a target object to be detected, the target object comprising an image to be detected or a video to be detected; a processing module for preprocessing the target object to obtain a target image corresponding to the image to be detected, or a video event sequence corresponding to the video to be detected, wherein the video event sequence comprises at least one target video event, a video event is the set of all content frames in a shot, and the content frames are the first frame, the last frame, and N intermediate frames (N a natural number), an intermediate frame being selected whenever the difference rate between a frame of the shot (other than the first and last frames) and the previous content frame exceeds a preset threshold; a detection module for searching the image information network for images similar to the target image to obtain a first detection result, or searching the video event information network for video events similar to the target video event to obtain a second detection result, wherein the image information network is a forest structure built on an image information space from a set of image multi-level trees, each tree comprising a root node and child nodes, the difference rate between the images of any two root nodes exceeding the preset threshold and the difference rate between each child node and its root node's image being at most the preset threshold, the image information space is the multi-dimensional vector space of the image feature vectors, the video event information network is a forest structure built on a video event information space from a set of event multi-level trees, the video event information space is the multi-dimensional vector space of the video event feature vectors, and both kinds of feature vectors are computed from feature matrices extracted under a common coordinate system; and an output module for obtaining infringement data characterizing infringement by the target object according to the first detection result or the second detection result and outputting the infringement data.
Optionally, the system further comprises an infringement evidence module for acquiring infringement confirmation information entered by infringement-monitoring personnel on the basis of the infringement data of the image or video to be detected, and for storing the corresponding infringement data to form an evidence library. The processing module further comprises: a normalization module for normalizing the image or video to be detected; a granulating module for granulating the video to be detected into a video event sequence; a feature extraction module for calculating the image features of images and the feature data of video events; an image information network creation module for constructing the image information network from the original images and the content frames of the original videos; and a video event information network creation module for constructing the video event information network from the video event sequences of the original videos.
In a third aspect, there is also provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor runs the computer program to implement the method of the first aspect.
In a fourth aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect.
In general, the present invention provides at least the following benefits: by acquiring an image or video to be detected and preprocessing it, the target image corresponding to the image to be detected, or the video event sequence (comprising at least one target video event) corresponding to the video to be detected, is obtained; images similar to the target image are searched in the image information network to obtain a first detection result, or video events similar to the target video event are searched in the video event information network to obtain a second detection result; infringement data characterizing infringement by the target object is then obtained from the first or second detection result and output. This supplies evidence for infringement judgment, yields infringement analysis results accurately and rapidly, and improves the accuracy of infringement analysis.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope. The exemplary embodiments of the present invention and the descriptions thereof are for explaining the present invention and do not constitute an undue limitation of the present invention. In the drawings:
Fig. 1 is a schematic diagram of an application environment of a copyright detection method according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of an application environment of another copyright detection method according to the second embodiment of the present invention;
Fig. 3 is a flowchart of the steps of a copyright detection method according to the third embodiment of the present invention;
Fig. 4 shows the process of creating an image tree structure according to the third embodiment of the present invention;
Fig. 5 shows the process of creating an image information network according to the third embodiment of the present invention;
Fig. 6 shows the process of creating a video event information network according to the third embodiment of the present invention;
Fig. 7 is a schematic diagram of the granulating structure according to the third embodiment of the present invention;
Fig. 8 is a schematic diagram of content frame extraction according to the third embodiment of the present invention;
Fig. 9 shows a content-frame infringement effect diagram of a video to be detected provided in the third embodiment of the present invention;
Fig. 10 shows a video-event infringement effect diagram of a video to be detected according to the third embodiment of the present invention;
Fig. 11 is an overall flowchart of the copyright detection method according to the third embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a copyright detection system according to the fourth embodiment of the present invention;
Fig. 13 is a schematic structural diagram of another copyright detection system according to the fifth embodiment of the present invention;
Fig. 14 shows an overall configuration diagram of a copyright detection system provided in the sixth embodiment of the present invention;
Fig. 15 shows a schematic diagram of an electronic device according to the seventh embodiment of the present invention.
Reference numerals: terminal device 102, network 104, server 106, user 108, man-machine interaction screen 1022, first processor 1024, first memory 1026, server 106, database 1062, processing engine 1064, user device 204, second memory 206, second processor 208, copyright detection system 700, data acquisition module 701, processing module 702, detection module 703, output module 704, infringement evidence module 705, normalization module 7021, granulating module 7022, feature extraction module 7023, image information network creation module 7024, video event information network creation module 7025, electronic device 800, memory 801, processor 802, transmission means 803, display 804, connection bus 805.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In one aspect of the embodiments of the present invention, a copyright detection method is provided. As an alternative implementation, the copyright detection method may be applied to, but is not limited to, the application environment shown in Fig. 1, which comprises a terminal device 102 for human-machine interaction with the user, a network 104, and a server 106. The user 108 interacts with the terminal device 102, in which a copyright detection application runs. The terminal device 102 includes a man-machine interaction screen 1022, a first processor 1024, and a first memory 1026. The man-machine interaction screen 1022 displays video; the first processor 1024 obtains the target object to be detected and performs copyright detection on it; the first memory 1026 stores the copyright sample library and the target object to be detected.
In addition, the server 106 includes a database 1062 and a processing engine 1064. The database 1062 stores the above-mentioned copyright sample library and the target object to be detected. The processing engine 1064 is configured to: obtain the target object to be detected; preprocess it to obtain the target image corresponding to the image to be detected, or the video event sequence corresponding to the video to be detected; search the image information network for images similar to the target image to obtain a first detection result, or search the video event information network for video events similar to the target video event to obtain a second detection result; and obtain and output infringement data characterizing infringement by the target object according to the first or second detection result.
Example two
In one or more embodiments, the copyright detection described above may be applied in the application environment shown in Fig. 2. As shown in Fig. 2, the user 108 interacts with the user device 204, which includes a second memory 206 and a second processor 208. In this embodiment, the user device 204 may, but need not, perform copyright detection on the target object by carrying out the operations described above for the terminal device 102.
Optionally, the terminal device 102 and the user device 204 include, but are not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, a vehicle-mounted electronic device, a wearable device, and the like. The network 104 may include, but is not limited to, a wireless network or a wired network, where the wireless network comprises WiFi and other networks enabling wireless communication, and the wired network may include, but is not limited to, a wide area network, a metropolitan area network, or a local area network. The server 106 may be any hardware device capable of computation: a single server, a server cluster composed of multiple servers, or a cloud server. The above is merely an example and does not limit this embodiment.
As noted in the Background, related-art copyright protection relies on digital image watermarking, alone or combined with machine learning: watermarks are vulnerable to representation, robustness, and interpretation attacks that can destroy part or all of the embedded information and hinder extraction, while machine-learning approaches require costly, energy-intensive training over a sample library and struggle with complex, fast-changing video content.
To solve these technical problems, as an alternative implementation, an embodiment of the present invention provides a copyright detection method.
Example III
Fig. 3 shows a flowchart of the steps of a copyright detection method according to an embodiment of the present invention. As shown in FIG. 3, the copyright detection method comprises the following steps S301-S304:
S301, acquiring a target object to be detected.
In this embodiment, the target object includes an image to be detected or a video to be detected, that is, the copyright detection object in this embodiment may be an image or a video clip, and the target object may be an image or a video from one or more resource libraries, an image or a video from the internet, or an image or a video uploaded by a user.
In addition, before acquiring the target object to be detected, the embodiment includes: and acquiring the original data, and preprocessing the original data to obtain a copyright sample library, wherein the copyright sample library comprises an original image and an original video corresponding to the original data. Therefore, a copyright sample library consisting of copyrighted images or videos can be formed, and when whether the target object of any source is infringed or not needs to be detected, whether the target object is infringed or not can be judged only by comparing the target object with the images or videos in the copyright sample library.
Considering that target objects from different sources differ in data format and size, and that searching the images or videos in a copyright sample library directly would involve a huge amount of calculation, this embodiment further builds a video event information network from the video event sequences of the original videos, and builds an image information network from the original images and the content frames of the original videos, so as to speed up content comparison between the target object and massive numbers of images or videos. The original images and original videos refer to the copyrighted images and videos in this embodiment.
In this embodiment, the image information network is a forest structure built on an image information space, with a set of image multi-level trees as its basic structure. Each image multi-level tree includes a root node and child nodes; the difference rate between the images corresponding to any two root nodes is greater than a preset threshold, while the difference rate between each child node and the image corresponding to its root node is less than or equal to the preset threshold. The image information space is the multidimensional vector space in which the image feature vectors lie, and an image feature vector is computed from the feature matrix extracted from an image under a common coordinate system.
Fig. 4 shows the process of creating an image tree structure. The four points A, B, C, and D in Fig. 4 are the center positions of their respective circular areas; the radius of each circle represents the maximum distance from the center in the image information space, and the images at A, B, C, and D represent the main content of their areas. C1 and C2 are images similar in content to image C, B1 and B2 are images similar in content to image B, and D1, D2, and D3 are images similar in content to image D; the distance from each of C1, C2, B1, B2, D1, D2, and D3 to the center of its area is no larger than the radius. Based on the image information space shown in Fig. 4, the whole space can be partitioned into areas by selecting center points and specified radii, and a tree structure can be established from this partitioning to record the relationships among areas, i.e., the image multi-level tree set.
The tree structure can be divided into two levels according to the relationships among areas in the image information space: the first level is the root node, corresponding to the circle center of a space area, and the second level consists of the child nodes, corresponding to the non-center points of the space area. If a space area is subdivided into multiple sub-areas, the tree structure generates corresponding further levels of child nodes; the number of levels of the tree structure corresponds to the number of levels of space areas in the information space. In this embodiment, a 2-level tree structure is described as an example. As shown in Fig. 4, multiple image multi-level tree structures can be obtained from an image information space, each including a root node and child nodes. The forest structure built from the image multi-level tree set formed by these multi-level trees is the image information network; each child node in the image information network belongs to at least one root node, and a root node may have no child nodes.
Fig. 5 shows the process of creating an image information network. As shown in Fig. 5, the acquired images are preprocessed to obtain normalized images, so that every image can be computed under a unified coordinate system. The image content of a large number of images is analyzed by computing the image features of the normalized images and extracting feature matrices. A validity check is performed on the number of image features to ensure that only effective images enter the image information space; for example, a normalized image is checked for validity according to the modulus of its image feature matrix. If the number of features is greater than a preset number, the image information network is traversed to find a similar root node; if no similar root node exists, the current image is added as a root node to form the root node set, and if a similar root node exists, the current image is added as a child node of that root node to form the child node set.
In the constructed image information network, the difference rate between the images of any two root nodes is greater than the preset threshold, and the difference rate between each child node and its root node's image is less than or equal to the preset threshold; thus root-node images are mutually dissimilar, while each root node is similar to its own child nodes, and the images in the network are associated by image similarity. After image content analysis over massive numbers of images, an image information network covering those images is obtained, in which a root-node image carries most of the image features of its child-node images. When the network is used for similarity comparison of a target image, the target image is first compared with the root nodes; if it is similar to a root node, it is then compared with all child nodes of that root node. In this way all images in the network similar to the target image can be obtained quickly, the analysis result is produced quickly, and image analysis efficiency is improved.
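By way of a simplified illustration only — not the patented implementation — the root/child insertion logic described above can be sketched as follows. The `diff_rate` function and the 0.3 threshold are placeholders standing in for the UniformLBP-based difference rate and the preset threshold described in the text, and images are represented by scalars:

```python
# Minimal sketch of the two-level image "forest": root nodes are mutually
# dissimilar; an image similar to a root becomes that root's child node.
THRESHOLD = 0.3  # placeholder for the preset difference-rate threshold

def diff_rate(a, b):
    # Placeholder difference rate; the patent computes this from image
    # feature matrices. Here: normalized absolute difference of scalars.
    return abs(a - b) / max(abs(a), abs(b), 1e-9)

class Node:
    def __init__(self, image):
        self.image = image
        self.children = []

def insert(roots, image):
    for root in roots:
        if diff_rate(root.image, image) <= THRESHOLD:
            root.children.append(Node(image))   # similar -> child node
            return
    roots.append(Node(image))                   # dissimilar to all roots

roots = []
for img in [1.0, 1.1, 5.0, 5.2, 9.0]:
    insert(roots, img)
```

After the loop, 1.0, 5.0, and 9.0 are roots, with 1.1 and 5.2 attached as children of the first two.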
In this embodiment, the video event information network is a forest structure built on a video event information space, with a set of event multi-level trees as its basic structure. The video event information space is the multidimensional vector space in which the feature vectors of video events lie, and the feature vector of a video event is computed from the feature matrices extracted from its content frame set under the same coordinate system. The forest structure built on the event multi-level tree set is similar to the forest structure of the image multi-level tree set shown in Fig. 4.
Fig. 6 shows the process of creating a video event information network. As shown in Fig. 6, videos are preprocessed and granulated to obtain video events, and the feature vector of each video event is computed. The root node set is traversed to search for a root node similar to the video event; if a similar root node exists, the video event is added as a child node of that root node, and if no similar root node exists, the video event is added as a root node, thereby constructing the video event information network in the video event information space. Because the video events in the created network are associated by video event similarity, content analysis over massive numbers of video events yields a video event information network covering them, in which a root-node video event carries most of the video event features of its child nodes. When the network is used for similarity comparison of a target video event, the target video event is first compared with the root nodes; if it is similar to a root node, it is then compared with all child nodes of that root node. In this way all video events in the network similar to the target video event can be obtained quickly, the analysis result is produced quickly, and analysis efficiency is improved.
S302, preprocessing a target object to obtain a target image corresponding to the image to be detected, or obtaining a video event sequence corresponding to the video to be detected.
In this embodiment, preprocessing the target object includes: and when the target object is the image to be detected, carrying out normalization processing on the image to be detected to obtain a target image. When the target object is a video to be detected, carrying out normalization processing on the video to be detected to obtain a normalized video, carrying out granulation processing on the normalized video to obtain a lens sequence and a content frame sequence, and obtaining a video event sequence corresponding to the video to be detected according to the lens sequence and the content frame sequence.
The normalization of the image to be detected includes, but is not limited to, normalizing its aspect ratio, resolution, and color space, so that every image to be processed has the same image dimensions. This facilitates analyzing the content of a large number of images under the same coordinate system and performing similarity analysis between different image contents from the image pixels, thereby accelerating image content analysis.
Normalizing the video to be detected to obtain a normalized video includes, but is not limited to, decomposing the video into frames, extracting picture-in-picture images, and removing borders from the images in the video, as well as normalizing the resolution, aspect ratio, and color space of the video images, so that the resulting normalized video has the same dimensions and is convenient to granulate.
Fig. 7 is a schematic diagram of the granulation structure. Referring to Fig. 7, the granulation structure of a video comprises the video, a frame sequence, shots, and content frames. The frame sequence is all the frames representing the video content; a shot is the continuous picture segment captured by a camera between one turn-on and turn-off, and is the basic unit of video composition; a content frame is a frame representing shot content.
In this embodiment, granulation refers to performing shot segmentation on a video to obtain its granulation structure, based on the following principle. Video content is composed of a continuous frame sequence, which can be divided into several groups according to the continuity of the content; shot detection is performed on the frame sequence, and each group of continuous frames is a shot, the shot sequence comprising at least one shot. By analyzing the differences of content within a shot, a small number of frames are selected from the continuous frame sequence to represent the shot content; these frames are the content frames. That is, content frame extraction is performed on the video frame sequence of each shot in the shot sequence to obtain each shot's content frame sequence, and the video event sequence is then obtained from the shot sequence and the content frame sequences. The content frames include at least the first and last frames of the shot, so the content frame number of one shot is greater than or equal to 2.
In this embodiment, the video event sequence includes at least one target video event. A video event refers to the set of all content frames in one shot; the content frames are the frames representing the shot content, including the first frame, the last frame, and N intermediate frames, where N is a natural number. An intermediate frame is obtained by sequentially computing, for each frame of the shot other than the first and last frames, the difference rate against the previous content frame; a frame becomes a content frame when its difference rate is greater than a preset threshold.
Fig. 8 is a schematic diagram of content frame extraction according to an embodiment of the present invention. As shown in Fig. 8, the first frame is the first content frame, and the difference rates of the 2nd and 3rd frames against it are computed in turn until a difference rate exceeds the preset threshold — say at the 4th frame, which then becomes the second content frame. The difference rates of the 5th and subsequent frames are then computed against the 4th frame; if the difference rates of the 5th, 6th, and 7th frames are smaller than the preset threshold and that of the 8th frame is larger, the 8th frame is the third content frame. By analogy, the content frames among all frames between the first frame and the last frame are computed. The last frame is selected directly as the final content frame, without computing its difference rate against the previous content frame. The difference rate here is the computed difference rate between two frames of images.
For example, consider a surveillance video: at night there are few people and cars, the video picture changes little, and there will be few content frames — perhaps only a single-digit number extracted over 10 hours. In the daytime there are many people and vehicles, people and objects in the picture change frequently, and the content frames computed by this method are far more numerous than at night. Compared with key frames, which may lose part of the shot content, content frames are thus guaranteed not to lose any of the content information of the shot video. And compared with schemes that compute over every frame of the video, content frame selection uses only part of the video image frames, greatly reducing the amount of image computation without losing content.
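Under the stated rule — first frame, last frame, plus any frame whose difference rate against the previous content frame exceeds the threshold — extraction can be sketched as follows. This is an illustrative simplification in which frames are scalars and `diff_rate` stands in for the image difference rate:

```python
THRESHOLD = 0.5  # placeholder for the preset difference-rate threshold

def diff_rate(a, b):
    return abs(a - b)  # stand-in for the two-frame image difference rate

def extract_content_frames(frames):
    # The first frame is always the first content frame; the last frame is
    # always appended, so every shot yields at least two content frames.
    content = [0]
    for i in range(1, len(frames) - 1):
        if diff_rate(frames[content[-1]], frames[i]) > THRESHOLD:
            content.append(i)  # difference exceeds threshold: content frame
    content.append(len(frames) - 1)
    return content  # indices of the content frames

frames = [0.0, 0.1, 0.2, 0.9, 1.0, 1.0, 1.1, 1.9, 2.0]
```

For this frame sequence, frames 0, 3, 7, and 8 are selected: frame 3 is the first to differ enough from frame 0, frame 7 the first to differ enough from frame 3, and frame 8 is the last frame.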
By the method, the normalized target image and the target video event comprising the content frame can be obtained, so that the image or video event similar to the target image or the target video event can be quickly searched in the image information network or the video event information network.
S303, searching an image similar to the target image in the image information network to obtain a first detection result; or searching the video event similar to the target video event in the video event information network to obtain a second detection result.
The creation process of the image information network and the video event information network has been described in the above step S301, and will not be described here again.
In one example, searching the image information network for images similar to the target image to obtain the first detection result includes: computing the image features of the target image; traversing the root nodes of the image information network and screening them by the feature quantity of the target image to obtain alternative image root nodes; computing the image feature difference rate between the target image and each alternative image (the image corresponding to an alternative image root node) from their image features, and judging from this difference rate whether the target image and the alternative image are similar; and outputting a similar image set as the first detection result, the similar image set comprising all images in the image information network that are similar to the target image.
In this embodiment, the image features of the target image include, but are not limited to, features composed of UniformLBP features; UniformLBP features are highly sensitive to changes in image texture, so this embodiment adopts the UniformLBP features of an image as its image features to better reflect the content features of the image. In an alternative example, other image features may also be used, such as histogram features, SIFT features, HOG features, or Haar features, which are not listed here one by one.
Taking UniformLBP features as the image features, in this embodiment the image features include an image feature matrix and the modulus of that matrix. To compute the image feature matrix of the target image, sixteen-bit feature data may be obtained from the low-eight-bit feature data and the high-eight-bit feature data of the target image, and the image feature matrix is then obtained from the sixteen-bit feature data.
In this embodiment, the feature quantity of the target image may be measured by the modulus of its image feature matrix. In the modulus calculation, i is the YUV component, w_i and h_i are respectively the width and height under that component, (m, n) is the pixel coordinate with m and n non-negative integers, v is the feature dimension, and f_i(m, n, v) is the feature value of the pixel coordinate point (m, n) in dimension v.
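The exact modulus formula appears in the source only as an embedded image; a plausible reconstruction from the variable definitions above, assuming a Euclidean norm over all components, pixel coordinates, and feature dimensions, is:

```latex
\|F\| \;=\; \sqrt{\sum_{i\in\{Y,U,V\}}\ \sum_{m=0}^{w_i-1}\ \sum_{n=0}^{h_i-1}\ \sum_{v} f_i(m,n,v)^2}
```

where f_i(m, n, v) is the feature value of pixel (m, n) in feature dimension v under YUV component i.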
In one example, the smaller the modulus of the image feature matrix of the target image, the fewer the features of the current image; the larger the modulus, the more the features. The root nodes can therefore be screened by the feature quantity of the target image to obtain the alternative image root nodes. For example, if the modulus of the image feature matrix of the target image is A, the root nodes whose image feature matrix modulus lies within (A − t, A + t) are taken as alternative root nodes, and the root nodes whose modulus lies outside this range are screened out, where t denotes the preset threshold. In this way most of the images dissimilar to the target image can be screened out quickly.
In this embodiment, the image feature difference rate Div(s, t) between the target image t and the alternative image s is computed as follows: for each YUV component i, a corresponding feature difference value D_i(s, t) of the target image and the alternative image is computed, and the sum over the components is normalized by the moduli ‖F_s‖ and ‖F_t‖ of the feature matrices of the alternative image and the target image. The denominator cannot be 0; when ‖F_s‖ and ‖F_t‖ are both 0, Div(s, t) is taken as 0. The feature difference value D_i(s, t) under component i is accumulated, over the pixel coordinate points (m, n) — where i is the YUV component, w_i and h_i are respectively the width and height under that component, and m and n are non-negative integers — and over the feature dimensions v, from the feature values f_i(m, n, v) of the corresponding pixel coordinate points of the two images.
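The exact difference-rate formulas likewise appear in the source only as embedded images; one reconstruction consistent with the surrounding variable definitions (the squared-norm normalization and squared per-pixel differences are assumptions) is:

```latex
Div(s,t) \;=\; \frac{\sum_{i\in\{Y,U,V\}} D_i(s,t)}{\max\!\left(\|F_s\|^2,\ \|F_t\|^2\right)},\qquad
D_i(s,t) \;=\; \sum_{m=0}^{w_i-1}\sum_{n=0}^{h_i-1}\sum_{v}\bigl(f_i^{s}(m,n,v)-f_i^{t}(m,n,v)\bigr)^2
```

with Div(s, t) defined as 0 when ‖F_s‖ and ‖F_t‖ are both 0, so the denominator is never 0.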
In this embodiment, the preset difference-rate condition for judging whether the target image and the alternative image are similar from the image feature difference rate involves e, the inherent error; δ, the preset threshold for calculating the error; and T, the preset image feature difference rate threshold. When the image feature difference rate Div(s, t) between the target image and the alternative image satisfies the preset difference-rate condition, the difference between the target image and the alternative image is small, the condition for image similarity is met, and the two are determined to be similar. If Div(s, t) does not satisfy the preset difference-rate condition, the difference between the target image and the alternative image is large, and they are determined to be dissimilar.
After traversing the root nodes of the image information network in this way, all root-node images similar to the target image are obtained. It can be understood that a root node in the image information network may also have child nodes, and when a root-node image is similar to the target image, the images corresponding to its child nodes may be similar to the target image as well. Therefore this embodiment further includes: when the target image is similar to an alternative image, traversing all child nodes associated with the corresponding alternative image root node to obtain the alternative image child nodes; judging, from the image feature difference rate between the target image and the image corresponding to each alternative image child node, whether the two are similar; and, when they are similar, adding the image corresponding to the alternative image child node to the similar image set.
Therefore, in this embodiment, all images similar to the target image can be found in the image information network, so as to form a similar image set, where the similar image set includes all images similar to the target image in the image information network. And the image information network is created according to the copyrighted original image and the content frame of the original video, so that a similar image set can be used as the first detection result.
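The two-stage search just described — compare the target against root nodes, then descend only into the children of similar roots — can be sketched as follows. This is an illustrative simplification: images are scalars, `diff_rate` stands in for the feature difference rate, and the modulus pre-filter is represented by a simple numeric band:

```python
THRESHOLD = 0.3  # placeholder preset difference-rate threshold
MOD_BAND = 4.0   # placeholder half-width of the modulus pre-filter band

class Node:
    def __init__(self, image, children=()):
        self.image = image
        self.children = list(children)

def diff_rate(a, b):
    return abs(a - b) / max(abs(a), abs(b), 1e-9)

def find_similar(roots, target):
    similar = []
    for root in roots:
        # Pre-filter: skip roots whose "modulus" is far from the target's.
        if abs(root.image - target) > MOD_BAND:
            continue
        if diff_rate(root.image, target) <= THRESHOLD:
            similar.append(root.image)
            # Only the children of a similar root are then compared.
            for child in root.children:
                if diff_rate(child.image, target) <= THRESHOLD:
                    similar.append(child.image)
    return similar

roots = [Node(1.0, [Node(1.1)]), Node(5.0, [Node(5.2)]), Node(9.0)]
```

Searching for a target of 5.1 in this small forest returns 5.0 and its child 5.2, while the dissimilar roots 1.0 and 9.0 (and the children of 1.0) are never descended into.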
Further, searching for a video event similar to the video event of the video to be detected in the video event information network to obtain a second detection result, including: calculating feature data of the target video event based on each content frame in the target video event; traversing the root node of the video event information network, and screening the root node through the content frame number of the target video event to obtain an alternative event root node; calculating the similarity of the target video event and the alternative event according to the feature data of the target video event and the feature data of the alternative event, and judging whether the target video event and the alternative event are similar or not according to the similarity of the target video event and the alternative event, wherein the alternative event is a video event corresponding to an alternative event root node; and outputting a similar event set as a second detection result, wherein the similar event set comprises all video events similar to the target video event in the video event information network.
In this embodiment, the feature data of the target video event includes the content frame number of the video event and the feature matrices of its content frames, the latter comprising the feature matrix of each content frame included in the video event. The feature matrix of each content frame may be obtained from the UniformLBP features of the content frame, which better reflect the content features of the video event; in an alternative example, the feature matrix of a content frame may be obtained from other features of the content frame, such as histogram features, SIFT features, HOG features, or Haar features, which are not listed here one by one.
In this embodiment, analysis and calculation are performed on a content frame of a video event to obtain feature data of the video event, including: obtaining the number of the content frames of the video event according to the number of the frames of the content frames in the content frame set of the video event, and obtaining the feature vector of the video event according to the feature matrix of each content frame in the content frame set of the video event.
For example, if the content frame set of video event A contains 5 frames, the content frame number of video event A is 5. Because a video event refers to the set of all content frames in one shot, the feature vector of the video event can be obtained from the feature matrix of each content frame in its content frame set.
In one example, when the content frame number of the target video event and the content frame number corresponding to a root node differ by more than a preset number, it can be preliminarily judged that the content contained in the two events differs greatly — for example, if a first video event has 9 content frames and a second video event has 20, the second contains considerably more content, and the two are directly determined to be dissimilar. When the difference between the content frame number of the target video event and that of a root node is smaller than the preset number, the two may be similar; therefore the root nodes are screened by the content frame number of the target video event to obtain the alternative event root nodes, and video events dissimilar to the target video event can be screened out quickly.
After the alternative event root nodes are obtained, the similarity rate Sim(p, q) between the target video event q and an alternative event p is computed from their feature data, where n_q is the content frame number of the target video event, n_p is the content frame number of the alternative event, and B_{ij} is the content frame difference value between the i-th content frame of the target video event and the j-th content frame of the alternative event; the similarity rate is accumulated from the content frame difference values B_{ij} and normalized by the content frame numbers. The content frame difference value B_{ij} of the i-th content frame q_i of the target video event q (i = 1, …, n_q) and the j-th content frame p_j of the alternative event p (j = 1, …, n_p) is computed from the original difference rate between p_j and q_i, where e is the inherent error, δ is the preset threshold for calculating the error, and T is the content frame difference rate threshold. ‖p_j‖ is the modulus of the feature matrix of the j-th content frame of the alternative event and ‖q_i‖ is the modulus of the feature matrix of the i-th content frame of the target video event; the denominator cannot be 0, and when ‖p_j‖ and ‖q_i‖ are both 0 the difference value is taken as 0.
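The event-similarity formulas also survive only as embedded images in the source; a reconstruction consistent with the variable definitions — treating B_{ij} as a 0/1 match indicator and normalizing by the larger content frame count, both of which are assumptions — is:

```latex
Sim(p,q) \;=\; \frac{\sum_{i=1}^{n_q}\sum_{j=1}^{n_p} B_{ij}}{\max(n_p,\ n_q)},\qquad
B_{ij} \;=\;
\begin{cases}
1, & D(p_j, q_i) \le T\\[2pt]
0, & \text{otherwise}
\end{cases}
```

where D(p_j, q_i) is the original difference rate between content frames p_j and q_i after error correction (the exact role of the inherent error e and the error threshold δ is not recoverable from the source), and T is the content frame difference rate threshold.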
In this embodiment, whether the target video event and the alternative event are similar is judged from their similarity rate, which may be determined by comparing the similarity rate with a preset similarity threshold. When the similarity rate of the target video event and the alternative event is greater than or equal to the preset similarity threshold, the similarity between the two within the allowable error range is high, and they are determined to be similar; when the similarity rate is smaller than the preset similarity threshold, the difference between the two is large, and they are determined to be dissimilar.
After traversing the root nodes of the video event information network by the above method, all root-node video events similar to the target video event are obtained. It can be understood that a root node in the video event information network may also have child nodes, and when a root-node video event is similar to the target video event, the video events corresponding to its child nodes may be similar to the target video event as well. Therefore, when the target video event is similar to an alternative event, all child nodes associated with the corresponding alternative event root node are traversed to obtain the alternative event child nodes; whether the target video event is similar to the video event corresponding to an alternative event child node is judged from their similarity rate; and when they are similar, the video event corresponding to the alternative event child node is added to the similar event set.
Therefore, in this embodiment, all video events in the video event information network that are similar to the target video event can be found, forming a similar event set. Since the video event information network is created from copyrighted original videos, this similar event set can serve as the second detection result.
In one example, a video may consist of a fixed image accompanied by rolling text and spoken narration. In this case the content frames in the video event sequence of the video are essentially identical, and detecting infringement of such video events through the video event information network alone is less effective.
Therefore, in this embodiment, when the target object includes a video to be detected, the method further includes: acquiring a content frame of each video event of the video to be detected, and searching the image information network for images similar to the content frame to obtain a first detection result. That is, the image information network can also be used for infringement comparison of the content frames of video events, which improves the accuracy of the comparison.
After the first detection result or the second detection result is obtained, the method of this embodiment further includes: when the total number of similar images contained in the first detection result is greater than 1, sorting the images in the similar image set by the image difference rate between the target image and each similar image, so that the image most similar to the target image is ranked first, which facilitates subsequent infringement judgment.
Likewise, when the total number of similar video events contained in the second detection result is greater than 1, the video events in the similar event set are sorted by the similarity between the target video event and each similar video event, so that the video event most similar to the target video event is ranked first.
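A minimal sketch of the two orderings: images sort by ascending difference rate (a smaller difference means a closer match), while video events sort by descending similarity. The pair layouts are assumed for illustration.

```python
def rank_similar_images(image_diffs):
    """Sort (image_id, difference_rate) pairs so the most similar
    image, i.e. the smallest difference rate, comes first."""
    return sorted(image_diffs, key=lambda p: p[1])

def rank_similar_events(event_sims):
    """Sort (event_id, similarity) pairs so the most similar event,
    i.e. the largest similarity, comes first."""
    return sorted(event_sims, key=lambda p: p[1], reverse=True)
```

Note the opposite sort directions: the image metric measures distance, the event metric measures closeness.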
S304, obtaining infringement data representing infringement of the target object according to the first detection result or the second detection result, and outputting the infringement data.
It can be understood that when the first detection result contains a similar image of the image to be detected, or the second detection result contains a similar video event of the video to be detected, the target image or target video event may be infringing; whether infringement has actually occurred still requires relevant evidence and human judgment. In this embodiment, infringement data characterizing infringement of the target object is obtained from the first detection result or the second detection result and output, providing data support for the subsequent infringement judgment.
In this embodiment, obtaining infringement data characterizing infringement of the target object according to the first detection result or the second detection result includes: when the first detection result includes a similar image of the image to be detected, taking the image to be detected, the similar image, and the position information of the similar image in the original video as the infringement data of the image to be detected, where the similar image is either an original image similar to the image to be detected or a content frame of an original video similar to the image to be detected.
For example, if the image to be detected is M and the similar images in the first detection result are N, P and Q, where N and P are original images and Q is a content frame of an original video, the obtained infringement data includes the frame position or playing time of Q in that original video, for example, that content frame Q appears at the 20th minute of the original video.
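One possible shape for a piece of infringement data, populated with the example above; all field names are illustrative assumptions, not part of this embodiment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InfringementRecord:
    query_image: str                     # image to be detected, e.g. "M"
    similar_image: str                   # matched image: "N", "P" or "Q"
    source_kind: str                     # "original_image" or "content_frame"
    source_video: Optional[str] = None   # only set for a content frame
    play_minute: Optional[int] = None    # position of the frame in the video

# The three matches of the example: N and P are original images,
# Q is a content frame appearing at the 20th minute of video V.
records = [
    InfringementRecord("M", "N", "original_image"),
    InfringementRecord("M", "P", "original_image"),
    InfringementRecord("M", "Q", "content_frame",
                       source_video="V", play_minute=20),
]
```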
In this embodiment, when the second detection result includes a similar video event of the video to be detected, the video to be detected, the similar video event, and the position information of the similar video event in the original video are used as the infringement data of the video to be detected.
Fig. 9 shows a content frame infringement effect diagram of a video to be detected. Referring to fig. 9, the video frame in this video stays unchanged throughout while a film work is narrated through changing subtitles; because that video frame is similar to a copyrighted image in the image information network, infringement data for the frame can still be obtained. Fig. 9 also shows the relationship among the copyrighted image in the image information network, the video to be detected, and the content frame in that video which is similar to the copyrighted image, clearly demonstrating the infringement detection effect of this embodiment.
Fig. 10 shows a video event infringement effect diagram of a video to be detected. Referring to fig. 10, the video to be detected is spliced from copyrighted video 1 and copyrighted video 2, and the detection process yields an infringement detection result showing that it is similar to both copyrighted videos at once. Fig. 10 also shows the location of each infringing fragment in the video to be detected that is similar to a copyrighted video; it can be understood that the location of the corresponding fragment within the copyrighted video may likewise be obtained (not shown).
It can be understood that, since the target object may come from any resource platform, the target object may itself be licensed video data. For example, video A is stored in the copyright sample library and its playing right has been granted to playing platform Q; when video A comes from platform Q, platform Q cannot be judged to infringe. Therefore, after the infringement data is output, infringement monitoring personnel can review it manually.
In one example, this embodiment may provide a selection control for each piece of infringement data on the display interface, and when the infringement monitoring personnel determine that a certain video or image infringes, clicking the selection control submits the infringement confirmation information.
Further, this embodiment also includes: acquiring the infringement confirmation information submitted by infringement monitoring personnel based on the infringement data of the image to be detected or the video to be detected; and storing the infringement data corresponding to the infringement confirmation information to form an evidence library.
For example, the infringement data includes infringement data of image M to be detected, video N to be detected, and video T to be detected, where video T comes from an authorized platform that is allowed to play it. The infringement monitoring personnel confirm that image M and video N infringe, and the infringement data of image M and video N are stored in the evidence library to facilitate later proof of infringement.
Fig. 11 is an overall flowchart of the copyright detection method of one embodiment. As shown in fig. 11, taking copyrighted video samples as input to create the sample library, video preprocessing is performed first, followed by granulation. Granulation includes segmenting the video into shots, extracting the content frames within each video event, and calculating the features of each video event. Shot segmentation yields a video event sequence, from which the video event information network is created. The content frames are extracted and their features calculated to obtain the event features of the video events; the content frames of the video also form a content frame sequence, from which the image information network is established. Meanwhile, the copyright sample library is built from the granulated copyrighted video samples.
When infringement detection is performed, video preprocessing and granulation are applied to the detection video: shot segmentation yields the event sequence of the detection video, and event-level infringement detection (that is, judging whether a video event in the detection video infringes) is performed by comparing the video events in this sequence with those in the video event information network.
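The shot segmentation step described above can be sketched as follows; the boundary detector is a caller-supplied assumption standing in for the actual shot-segmentation algorithm of this embodiment.

```python
def segment_shots(frames, is_boundary):
    """Cut the frame sequence wherever the boundary detector fires,
    yielding one shot per segment; each shot later becomes a video
    event in the event sequence."""
    shots, start = [], 0
    for i in range(1, len(frames)):
        if is_boundary(frames[i - 1], frames[i]):
            shots.append(frames[start:i])
            start = i
    shots.append(frames[start:])   # close the final shot
    return shots
```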
Meanwhile, to improve detection efficiency, the content frames of the detection video are extracted to obtain its content frame sequence, and frame-level infringement detection (that is, judging whether an image in the detection video infringes) is performed by comparing these content frames with the images in the image information network.
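Frame-level detection operates on the content frames selected per shot. The selection rule defined in this embodiment (keep the first frame, the last frame, and any intermediate frame whose difference rate from the previous content frame exceeds a preset threshold) can be sketched as follows; the difference-rate callback is an illustrative assumption.

```python
def extract_content_frames(shot, diff_rate, threshold):
    """Keep the first frame, every intermediate frame that differs
    enough from the previous content frame, and the last frame."""
    if len(shot) < 2:
        return list(shot)
    content = [shot[0]]
    for frame in shot[1:-1]:
        # A subframe becomes a content frame only when it differs
        # from the previous content frame by more than the threshold.
        if diff_rate(frame, content[-1]) > threshold:
            content.append(frame)
    content.append(shot[-1])
    return content
```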
In addition, a single image may itself infringe, so when a detection image is processed, its image features are calculated after image preprocessing, and frame-level infringement detection is performed according to those features to determine whether the image infringes; the detection image may be any image from any resource library.
The above is the copyright detection method provided in this embodiment: an image or video to be detected is acquired; the target object is preprocessed to obtain a target image corresponding to the image to be detected, or a video event sequence, containing at least one target video event, corresponding to the video to be detected; images similar to the target image are searched in the image information network to obtain a first detection result, or video events similar to the target video event are searched in the video event information network to obtain a second detection result; and infringement data characterizing infringement of the target object is obtained from the first or second detection result and output. This provides infringement data for infringement judgment, yields infringement analysis results accurately and quickly, and improves the accuracy of infringement analysis.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
The following is an embodiment of the copyright detection system of the present invention, which may be used to perform an embodiment of the method of the present invention. For details not disclosed in the embodiments of the copyright detection system of the present invention, please refer to the embodiments of the method of the present invention.
Example IV
Fig. 12 is a schematic diagram showing the structure of a copyright detection system provided in an exemplary embodiment of the present invention. The copyright detection system may be implemented as all or part of a terminal by software, hardware, or a combination of both. The copyright detection system 700 includes:
the data acquisition module 701 is configured to acquire a target object to be detected, where the target object includes an image to be detected or a video to be detected.
The processing module 702 is configured to preprocess the target object to obtain a target image corresponding to the image to be detected, or a video event sequence corresponding to the video to be detected, where the video event sequence includes at least one target video event. A video event refers to the set of all content frames in a shot; the content frames are frames representing the shot content and comprise a first frame, a last frame and N intermediate frames, where N is a natural number. The intermediate frames are obtained by calculating, for every subframe of the shot other than the first and last frames, the difference rate against its previous content frame; a subframe becomes a content frame when that difference rate is greater than a preset threshold.
The detection module 703 is configured to search the image information network for images similar to the target image to obtain a first detection result, or to search the video event information network for video events similar to the target video event to obtain a second detection result. The image information network is a forest structure constructed based on an image information space and an image multi-level tree set; an image multi-level tree comprises root nodes and child nodes, the difference rate between the images corresponding to any two root nodes is greater than a preset threshold, and the difference rate between each child node and the image corresponding to its root node is less than or equal to the preset threshold. The image information space is the multi-dimensional vector space in which the image feature vectors lie, each image feature vector being calculated after extracting a feature matrix from the image under the same coordinate system. The video event information network is a forest structure constructed based on a video event information space and an event multi-level tree set; the video event information space is the multi-dimensional vector space in which the video event feature vectors lie, each video event feature vector being calculated after extracting feature matrices from the content frame set under the same coordinate system.
The output module 704 is configured to obtain infringement data representing infringement of the target object according to the first detection result or the second detection result, and output the infringement data.
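The root/child constraints of the image multi-level tree described for the detection module 703 (any two roots differ by more than the threshold; a child differs from its root by at most the threshold) suggest an insertion rule like the following sketch; the dict layout and the difference-rate callback are illustrative assumptions, not the embodiment's actual structures.

```python
def insert_image(roots, features, diff_rate, threshold):
    """Attach the image as a child of the first sufficiently close
    root; otherwise start a new tree, which keeps every pair of
    roots more than `threshold` apart."""
    for root in roots:
        if diff_rate(features, root["features"]) <= threshold:
            root["children"].append({"features": features, "children": []})
            return roots
    roots.append({"features": features, "children": []})
    return roots
```

Inserting every image this way maintains both invariants without ever comparing two child nodes directly.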
Example five
Fig. 13 is a schematic diagram showing another configuration of a copyright detection system provided in an exemplary embodiment of the present invention.
The system further comprises: the infringement evidence obtaining module 705, configured to acquire the infringement confirmation information submitted by infringement monitoring personnel based on the infringement data of the image to be detected or the video to be detected, and store the infringement data corresponding to the infringement confirmation information to form an evidence library.
The processing module 702 further includes:
The normalization module 7021 is configured to normalize the image to be detected or the video to be detected.
The granulating module 7022 is configured to granulate the video to be detected to obtain the video event sequence.
The feature extraction module 7023 is configured to calculate image features of an image and feature data of a video event.
The image information network creation module 7024 is configured to construct an image information network from the original image and the content frame of the original video.
The video event information network creation module 7025 is configured to construct a video event information network according to the video event sequence of the original video.
It should be noted that when the copyright detection system provided in the above embodiment executes the copyright detection method, the division into the above functional modules is used only for illustration; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the copyright detection system and the copyright detection method provided in the foregoing embodiments belong to the same concept; the detailed implementation process is embodied in the method embodiments and is not described here again.
Fig. 14 shows an overall architecture diagram of a copyright detection system provided by an exemplary embodiment of the present invention.
Referring to fig. 14, the overall architecture of the copyright detection system is divided into four layers: the infrastructure layer, the information production layer, the platform technology layer, and the application display layer.
The infrastructure layer is the basis for the reliable, stable and efficient operation of the copyright detection system, and comprises an operating system, a file system, a communication link, a network, a server and a database, so that data storage and dynamic management service are realized.
The information production layer stores, normalizes, and granulates the accessed images or videos, autonomously constructs the information feature network, learns video content autonomously without samples to realize video content analysis, and provides data support for system applications. It comprises an information access module, an information granulation module, an information monitoring module, a message subscription module, and an information management module.
The platform technology layer provides an adaptive service interface for diversified applications through image feature space analysis, using the characteristics of the self-constructed information feature network. It comprises a resource management unit, a data processing unit, and an infringement analysis unit. The resource management unit includes a film and television copyright sample library, a content frame database, a video event database, an infringement evidence library, and the like. The data processing unit comprises a data normalization module, a shot/content frame extraction module, an image feature extraction module, and an event feature extraction module. The infringement analysis unit includes the video event information network, an image similarity analysis module, and a video event similarity analysis module.
The application display layer provides functions of video image comparison result output, resource configuration and the like according to the requirement of video copyright monitoring, and comprises an infringement monitoring module, an infringement evidence obtaining module, an evidence library management module, a sample library management module and a resource configuration module. The copyright detection method implemented by the overall architecture of the copyright detection system corresponds to the copyright detection method provided by the above embodiment.
The embodiment of the invention also provides an electronic device corresponding to the copyright detection method provided by the previous embodiment, so as to execute the copyright detection method.
Example six
Fig. 15 shows a schematic diagram of an electronic device according to an embodiment of the invention. As shown in fig. 15, the electronic device 800 includes: a memory 801 and a processor 802, the memory 801 storing a computer program executable on the processor 802, the processor 802 executing the method provided by any of the preceding embodiments of the invention when the computer program is executed.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the steps of the above-described copyright detection method by a computer program.
Alternatively, as will be appreciated by those skilled in the art, the structure shown in fig. 15 is merely illustrative; the electronic device may be a smart phone (such as an Android mobile phone, an iOS mobile phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, etc. Fig. 15 does not limit the structure of the electronic device described above. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 15, or have a different configuration than shown in fig. 15.
The memory 801 may be used to store software programs and modules, such as the program instructions/modules corresponding to the copyright detection method and apparatus in the embodiment of the present invention; the processor 802 executes the software programs and modules stored in the memory 801, thereby performing various functional applications and data processing, that is, implementing the copyright detection method described above. The memory 801 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 801 may further include memory remotely located relative to the processor 802, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 801 may be used to store, but is not limited to, the video event information network. As an example, the memory 801 may include, but is not limited to, the video processing module, granulating module, calculating module, and constructing module in the copyright detection system. Other module units of the above-mentioned copyright detection system may also be included, which are not described in detail in this example.
Optionally, the electronic device comprises transmission means 803, the transmission means 803 being adapted to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission means 803 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 803 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 804, configured to display an analysis result of the copyright detection; and a connection bus 805 for connecting the respective module parts in the above-described electronic apparatus.
Example seven
The present embodiments provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer program is configured to, when executed, perform the steps of any of the method embodiments described above.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for executing the steps of the above-described copyright detection method.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program for instructing a terminal device to execute the steps, where the program may be stored in a computer readable storage medium, and the storage medium may include: flash disk, read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), magnetic or optical disk, and the like.

Claims (15)

1. A copyright detection method, characterized by comprising:
acquiring a target object to be detected, wherein the target object comprises an image to be detected or a video to be detected;
preprocessing the target object to obtain a target image corresponding to the image to be detected, or obtaining a video event sequence corresponding to the video to be detected, wherein the video event sequence comprises at least one target video event, the video event refers to a set of all content frames in a shot, the content frames refer to frames representing the shot content and comprise a first frame, a last frame and N intermediate frames, N is a natural number, and the intermediate frames are obtained when the difference rate is larger than a preset threshold value by calculating the difference rate of all subframes of a shot except the first frame and the last frame and the previous content frame;
searching an image similar to the target image in an image information network to obtain a first detection result; or searching a video event similar to the target video event in the video event information network to obtain a second detection result; the image information network is a forest structure constructed based on an image information space and an image multi-level tree set, the image multi-level tree comprises root nodes and child nodes, the difference rate between images corresponding to any two root nodes is larger than a preset threshold, the difference rate between the child node of each root node and the image corresponding to the root node is smaller than or equal to the preset threshold, the image information space is a multi-dimensional vector space in which image feature vectors are located, and the image feature vectors are obtained by calculating after feature matrices are extracted from the images under the same coordinate system; the video event information network is a forest structure constructed based on a video event information space and an event multi-level tree set, the video event information space is a multi-dimensional vector space in which video event feature vectors are located, and the video event feature vectors are obtained by calculating after extracting feature matrixes from a content frame set under the same coordinate system;
obtaining infringement data representing infringement of the target object according to the first detection result or the second detection result, and outputting the infringement data.
2. The method according to claim 1, characterized in that before acquiring the target object to be detected, the method comprises:
acquiring original data, and preprocessing the original data to obtain a copyright sample library, wherein the copyright sample library comprises an original image and an original video corresponding to the original data;
and constructing the video event information network according to the video event sequence of the original video, and constructing the image information network according to the original image and the content frame of the original video.
3. The method of claim 1, wherein preprocessing the target object comprises:
when the target object is an image to be detected, carrying out normalization processing on the image to be detected to obtain the target image;
when the target object is a video to be detected, carrying out normalization processing on the video to be detected to obtain a normalized video; granulating the normalized video to obtain a shot sequence and a content frame sequence; and obtaining a video event sequence corresponding to the video to be detected according to the shot sequence and the content frame sequence.
4. The method of claim 1, wherein searching for an image similar to the target image in the image information network to obtain a first detection result comprises:
calculating image characteristics of the target image;
traversing root nodes of the image information network, and screening the root nodes through the feature quantity of the target image to obtain alternative image root nodes;
calculating the image feature difference rate of the target image and the alternative image according to the image features of the target image and the image features of the alternative image, and judging whether the target image is similar to the alternative image or not according to the image feature difference rate, wherein the alternative image is an image corresponding to a root node of the alternative image;
and outputting a similar image set as the first detection result, wherein the similar image set comprises all images similar to the target image in the image information network.
5. The method according to claim 4, wherein the method further comprises:
under the condition that the target image is similar to the alternative image, traversing all sub-nodes associated with the alternative image root node corresponding to the alternative image to obtain alternative image sub-nodes;
judging whether the target image is similar to the corresponding image of the alternative image sub-node according to the image characteristic difference rate of the target image and the corresponding image of the alternative image sub-node, and adding the corresponding image of the alternative image sub-node into the similar image set under the condition that the target image is similar to the corresponding image of the alternative image sub-node.
6. The method of claim 1, wherein searching for a video event in the video event information network that is similar to the video event of the video under test to obtain a second detection result comprises:
calculating feature data of the target video event based on each content frame in the target video event;
traversing root nodes of the video event information network, and screening the root nodes according to the number of the content frames of the target video event to obtain alternative event root nodes;
calculating the similarity ratio of the target video event and the alternative event according to the characteristic data of the target video event and the characteristic data of the alternative event, and judging whether the target video event and the alternative event are similar or not according to the similarity ratio of the target video event and the alternative event, wherein the alternative event is a video event corresponding to the alternative event root node;
and outputting a similar event set as the second detection result, wherein the similar event set comprises all video events similar to the target video event in the video event information network.
7. The method of claim 6, wherein the method further comprises:
under the condition that the target video event and the alternative event are similar, traversing all sub-nodes associated with the alternative event root node corresponding to the alternative event to obtain alternative event sub-nodes;
judging whether the target video event is similar to the video event corresponding to the alternative event child node according to the similarity of the target video event and the video event corresponding to the alternative event child node, and adding the video event corresponding to the alternative event child node into the similar event set under the condition that the target video event is similar to the video event corresponding to the alternative event child node.
8. The method of claim 7, wherein when the target object comprises a video to be measured, the method further comprises:
acquiring a content frame of each video event of the video to be detected;
and searching an image similar to the content frame in an image information network to obtain the first detection result.
9. The method of claim 1, wherein after obtaining the first detection result or the second detection result, the method further comprises:
when the total number of similar images contained in the first detection result is greater than 1, sorting the images in the similar image set according to the image difference rate between the target image and each similar image; or
under the condition that the total number of similar video events contained in the second detection result is greater than 1, sorting the video events in the similar event set according to the similarity ratio between the target video event and each similar video event.
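The two ranking branches of claim 9 can be sketched as plain sorts. Note the opposite sort directions: a *difference* rate ranks best-first when ascending, while a *similarity* ratio ranks best-first when descending. The metric functions are assumed to be supplied by the caller.

```python
def rank_similar_images(target, similar_images, difference_rate):
    # Ascending by difference rate: the most similar image comes first.
    return sorted(similar_images, key=lambda img: difference_rate(target, img))

def rank_similar_events(target, similar_events, similarity_ratio):
    # Descending by similarity ratio: the most similar event comes first.
    return sorted(similar_events,
                  key=lambda ev: similarity_ratio(target, ev), reverse=True)
```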
10. The method according to claim 1, wherein deriving infringement data characterizing infringement of the target object based on the first detection result or the second detection result comprises:
under the condition that the first detection result comprises a similar image of the image to be detected, taking the image to be detected, the similar image and the position information of the similar image in an original video as infringement data of the image to be detected, wherein the similar image comprises an original image similar to the image to be detected or a content frame of the original video similar to the image to be detected;
and under the condition that the second detection result comprises a similar video event of the video to be detected, taking the video to be detected, the similar video event and the position information of the similar video event in the original video as infringement data of the video to be detected.
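Claim 10 assembles the infringement data from three parts: the object under test, the matched original, and the match's position in the original video. One illustrative record shape for the image branch (the field names and types are assumptions for the sketch, not the patent's data format):

```python
from dataclasses import dataclass

@dataclass
class ImageInfringementRecord:
    """Infringement data for an image under test (claim 10, first branch)."""
    image_under_test: str    # identifier of the suspect image
    similar_image: str       # matched original image or content frame
    source_video: str        # original video containing the matched frame
    position_seconds: float  # position of the matched frame in that video

# Hypothetical example record for illustration only.
record = ImageInfringementRecord("suspect.jpg", "frame_000123.jpg",
                                 "original.mp4", 41.2)
```

The video branch would carry the same three parts, with the matched video event and its start/end positions in place of the single frame.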
11. The method according to claim 10, wherein the method further comprises:
acquiring infringement confirmation information from infringement monitoring personnel based on the infringement data of the image to be detected or the video to be detected;
and storing the infringement data corresponding to the infringement confirmation information to form an evidence library.
12. A copyright detection system, the system comprising:
the data acquisition module is used for acquiring a target object to be detected, wherein the target object comprises an image to be detected or a video to be detected;
the processing module is used for preprocessing the target object to obtain a target image corresponding to the image to be detected, or a video event sequence corresponding to the video to be detected, wherein the video event sequence comprises at least one target video event; a video event refers to the set of all content frames in a shot, and the content frames are frames representing the content of the shot, comprising a first frame, a last frame and N intermediate frames, N being a natural number; an intermediate frame is obtained by calculating, for each subframe of the shot other than the first and last frames, the difference rate between that subframe and the previous content frame, the subframe being taken as an intermediate frame when the difference rate is greater than a preset threshold;
the detection module is used for searching the image information network for an image similar to the target image to obtain a first detection result, or searching the video event information network for a video event similar to the target video event to obtain a second detection result; the image information network is a forest structure constructed based on an image information space and an image multi-level tree set, the image multi-level tree comprises root nodes and child nodes, the difference rate between the images corresponding to any two root nodes is larger than a preset threshold, the difference rate between the image corresponding to each child node and the image corresponding to its root node is smaller than or equal to the preset threshold, the image information space is a multi-dimensional vector space in which image feature vectors are located, and the image feature vectors are obtained by calculation after feature matrices are extracted from the images under the same coordinate system; the video event information network is a forest structure constructed based on a video event information space and an event multi-level tree set, the video event information space is a multi-dimensional vector space in which video event feature vectors are located, and the video event feature vectors are obtained by calculation after feature matrices are extracted from the content frame set under the same coordinate system;
The output module is used for obtaining infringement data representing infringement of the target object according to the first detection result or the second detection result and outputting the infringement data.
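The content-frame rule stated in claim 12 — keep the first and last frames of a shot, plus every subframe whose difference rate from the previous content frame exceeds the threshold — can be sketched as follows. The frame representation and the difference-rate metric are left abstract here, since the patent does not fix them in this claim.

```python
def extract_content_frames(shot_frames, difference_rate, threshold):
    """Select content frames from one shot: the first and last frames always,
    plus every intermediate subframe whose difference rate from the previous
    content frame exceeds the threshold.

    difference_rate: callable(frame_a, frame_b) -> float, supplied by caller.
    """
    if len(shot_frames) < 2:
        return list(shot_frames)
    content = [shot_frames[0]]          # the first frame is always kept
    for frame in shot_frames[1:-1]:
        # Compare against the most recently kept content frame, not the
        # immediately preceding subframe.
        if difference_rate(frame, content[-1]) > threshold:
            content.append(frame)
    content.append(shot_frames[-1])     # the last frame is always kept
    return content
```

Because each subframe is compared against the last *kept* frame, long stretches of near-identical frames collapse into a single content frame, which is what makes the event-level feature data compact enough to index.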
13. The copyright detection system of claim 12, wherein said system further comprises: the infringement evidence obtaining module, used for acquiring infringement confirmation information from infringement monitoring personnel based on the infringement data of the image to be detected or the video to be detected, and storing the infringement data corresponding to the infringement confirmation information to form an evidence library;
the processing module further includes:
the normalization module is used for carrying out normalization processing on the image to be detected or the video to be detected;
the granulating module is used for granulating the video to be detected to obtain a video event sequence;
the feature extraction module is used for calculating image features of the images and feature data of the video event;
the image information network creation module is used for constructing an image information network according to the original image and the content frame of the original video;
and the video event information network creation module is used for constructing a video event information network according to the video event sequence of the original video.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor runs the computer program to implement the method of any one of claims 1-11.
15. A computer readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to implement the method of any of claims 1-11.
CN202310450421.3A 2023-04-25 2023-04-25 Copyright detection method, system, electronic device and storage medium Active CN116188821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310450421.3A CN116188821B (en) 2023-04-25 2023-04-25 Copyright detection method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN116188821A true CN116188821A (en) 2023-05-30
CN116188821B CN116188821B (en) 2023-08-01

Family

ID=86438720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310450421.3A Active CN116188821B (en) 2023-04-25 2023-04-25 Copyright detection method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116188821B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117119143A (en) * 2023-06-07 2023-11-24 青岛尘元科技信息有限公司 Video investigation system, method, equipment and storage medium based on holographic video
CN117112815A (en) * 2023-06-06 2023-11-24 青岛尘元科技信息有限公司 Personal attention video event retrieval method and system, storage medium and electronic device
CN117112831A (en) * 2023-06-09 2023-11-24 青岛尘元科技信息有限公司 Massive video comparison method, device and equipment based on video event information network
CN117112837A (en) * 2023-06-07 2023-11-24 青岛尘元科技信息有限公司 Video public opinion monitoring system and method, storage medium and electronic equipment
CN117119255A (en) * 2023-06-09 2023-11-24 青岛尘元科技信息有限公司 Monitoring method, system, equipment and storage medium for illegal video playing
CN117112821A (en) * 2023-06-09 2023-11-24 青岛尘元科技信息有限公司 Image retrieval system and method, storage medium, and electronic device
CN117115477A (en) * 2023-06-09 2023-11-24 青岛尘元科技信息有限公司 Mass image comparison method, device and equipment based on image information network
CN117156200A (en) * 2023-06-06 2023-12-01 青岛尘元科技信息有限公司 Method, system, electronic equipment and medium for removing duplication of massive videos
CN117177012A (en) * 2023-06-09 2023-12-05 青岛尘元科技信息有限公司 Video broadcasting monitoring method, system, equipment and storage medium
CN117216317A (en) * 2023-06-09 2023-12-12 青岛尘元科技信息有限公司 Video event retrieval system and method, storage medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336957A (en) * 2013-07-18 2013-10-02 中国科学院自动化研究所 Network coderivative video detection method based on spatial-temporal characteristics
CN104142946A (en) * 2013-05-08 2014-11-12 阿里巴巴集团控股有限公司 Method and system for aggregating and searching service objects of same type
CN104199933A (en) * 2014-09-04 2014-12-10 华中科技大学 Multi-modal information fusion football video event detection and semantic annotation method
CN112084954A (en) * 2020-09-10 2020-12-15 腾讯科技(深圳)有限公司 Video target detection method and device, electronic equipment and storage medium
CN112200067A (en) * 2020-10-09 2021-01-08 宁波职业技术学院 Intelligent video event detection method, system, electronic equipment and storage medium
CN113793366A (en) * 2021-08-31 2021-12-14 北京达佳互联信息技术有限公司 Image processing method, device, equipment and storage medium
WO2022037343A1 (en) * 2020-08-21 2022-02-24 腾讯科技(深圳)有限公司 Video information processing method and apparatus, electronic device, and storage medium
CN115567736A (en) * 2022-09-19 2023-01-03 咪咕文化科技有限公司 Video content detection method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIWON CHOI; CHANHO JUNG; JAEHO LEE; CHANGICK KIM: "Determining the Existence of Objects in an Image and Its Application to Image Thumbnailing", IEEE SIGNAL PROCESSING LETTERS, pages 957-961 *
XU ZE; CAO SANSHENG: "Research on Blockchain-Based Copyright Protection Methods for All-Media Digital Images", China Media Technology, pages 11-16 *
ZENG FANZHI; CHENG YONG; ZHOU YAN: "A Video Spatio-Temporal Feature Extraction Algorithm and Its Applications", Journal of Foshan University (Natural Science Edition), no. 03, pages 21-28 *

Also Published As

Publication number Publication date
CN116188821B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN116188821B (en) Copyright detection method, system, electronic device and storage medium
US10438050B2 (en) Image analysis device, image analysis system, and image analysis method
KR102660052B1 (en) Optimization of media fingerprint retention to improve system resource utilization
CN110853033B (en) Video detection method and device based on inter-frame similarity
CN105144141A (en) Systems and methods for addressing a media database using distance associative hashing
US11042991B2 (en) Determining multiple camera positions from multiple videos
CN112233428A (en) Traffic flow prediction method, traffic flow prediction device, storage medium and equipment
CN114356712B (en) Data processing method, apparatus, device, readable storage medium, and program product
CN116958267B (en) Pose processing method and device, electronic equipment and storage medium
CN111368128B (en) Target picture identification method, device and computer readable storage medium
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium
CN116962612A (en) Video processing method, device, equipment and storage medium applied to simulation system
CN115391596A (en) Video archive generation method and device and storage medium
CN111553408B (en) Automatic test method for video recognition software
CN117152650B (en) Video content analysis method and video event information network for massive videos
CN116662645B (en) Video event tracing analysis method and system, storage medium and electronic equipment
CN117112815B (en) Personal attention video event retrieval method and system, storage medium and electronic device
CN117112831A (en) Massive video comparison method, device and equipment based on video event information network
CN117119255B (en) Monitoring method, system, equipment and storage medium for illegal video playing
CN117176979B (en) Method, device, equipment and storage medium for extracting content frames of multi-source heterogeneous video
CN104574343A (en) Method and equipment for extracting images from videos
CN117112821A (en) Image retrieval system and method, storage medium, and electronic device
CN117333790A (en) Similarity judging method and device for video events and electronic equipment
CN117216317A (en) Video event retrieval system and method, storage medium and electronic device
CN117156200A (en) Method, system, electronic equipment and medium for removing duplication of massive videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant