CN110163079A - Video detecting method and device, computer-readable medium and electronic equipment - Google Patents


Info

Publication number
CN110163079A
CN110163079A (application number CN201910226798.4A)
Authority
CN
China
Prior art keywords
video
hash value
global feature
to be detected
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910226798.4A
Other languages
Chinese (zh)
Inventor
龚国平
徐敘遠
吴韬
杨喻茸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910226798.4A
Publication of CN110163079A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video detection method and apparatus, a computer-readable medium, and an electronic device, relating to the technical field of image processing. The video detection method includes: obtaining video frame samples in advance, determining a first global feature and a second global feature of each video frame sample, converting the second global feature into a first hash value, building an index from the first hash value, and storing the first global feature under the index; determining the first global feature and the second global feature of a keyframe of a video to be detected, and converting the second global feature of the keyframe into a second hash value; and, when the similarity between the second hash value and a first hash value exceeds a first similarity threshold, extracting the first global feature corresponding to the second hash value, comparing it with the global feature stored under that first hash value, and determining the detection result of the video to be detected according to the comparison result. The disclosure enables fast and accurate video detection.
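The two-stage lookup the abstract describes (a coarse hash index for recall, floating-point features for fine comparison) can be sketched as follows. The function names, the sign-based binarization, and the 0.75 Hamming-similarity threshold are illustrative assumptions, not taken from the patent.

```python
# Sketch of the patent's two-level lookup: a binary hash of the second global
# feature serves as the index key, and the floating-point first global feature
# is stored under that key for fine-grained comparison later.

from collections import defaultdict

def to_hash(feature):
    """Binarize a real-valued feature vector into a hash code (sign function)."""
    return tuple(1 if x >= 0 else 0 for x in feature)

def hamming_similarity(h1, h2):
    """Fraction of matching bits between two equal-length hash codes."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)

def build_index(samples):
    """samples: list of (first_feature, second_feature) pairs from frame samples."""
    index = defaultdict(list)
    for first, second in samples:
        index[to_hash(second)].append(first)
    return index

def query(index, second_feature, hash_threshold=0.75):
    """Return stored first-features whose hash key is similar to the query's."""
    q_hash = to_hash(second_feature)
    candidates = []
    for key, stored in index.items():
        if hamming_similarity(q_hash, key) > hash_threshold:
            candidates.extend(stored)
    return candidates
```

The candidates returned by `query` would then be compared against the query keyframe's own first global feature to produce the final detection result.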

Description

Video detecting method and device, computer-readable medium and electronic equipment
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a video detection method, a video detection apparatus, a computer-readable medium, and an electronic device.
Background technique
With the continuous development of machine learning, machine-learning techniques are widely used in many fields. For example, when machine learning is applied to video detection, images are extracted from a video to be detected, and a computer system automatically compares each extracted image with other images to determine which images are similar to it; the detection of the video is then completed according to the image comparison results.
At present, classification models based on convolutional neural networks can detect images effectively. However, each detection is a complex and time-consuming process, so images cannot be detected quickly and accurately, especially in video-image detection scenarios, and good video detection results are therefore difficult to obtain.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The present disclosure aims to provide a video detection method, apparatus, computer-readable medium, and electronic device, thereby overcoming, at least to some extent, the problem that images cannot be detected quickly and accurately due to the limitations and defects of the related art.
According to one aspect of the present disclosure, a video detection method is provided, including: obtaining video frame samples in advance, determining a first global feature and a second global feature of each video frame sample, converting the second global feature into a first hash value, building an index from the first hash value, and storing the first global feature under the index; extracting keyframes of a video to be detected, determining the first global feature and the second global feature of each keyframe, and converting the second global feature of the keyframe into a second hash value; and, when the similarity between the second hash value and a first hash value exceeds a first similarity threshold, extracting the first global feature corresponding to the second hash value, comparing it with the global feature stored under that first hash value, and determining the detection result of the video to be detected according to the comparison result.
According to one aspect of the present disclosure, a video detection apparatus is provided, including a detection preparation module, a feature extraction module, and a feature comparison module.
Specifically, the detection preparation module obtains video frame samples in advance, determines the first and second global features of each sample, converts the second global feature into a first hash value, builds an index from the first hash value, and stores the first global feature under the index. The feature extraction module extracts keyframes of the video to be detected, determines the first and second global features of each keyframe, and converts the second global feature of the keyframe into a second hash value. The feature comparison module, when the similarity between the second hash value and a first hash value exceeds the first similarity threshold, extracts the first global feature corresponding to the second hash value, compares it with the global feature stored under that first hash value, and determines the detection result of the video to be detected according to the comparison result.
Optionally, the first and second global features of a video frame are determined using a convolutional neural network, where the global features are fully connected layer features.
Specifically, the detection preparation module obtains video frame samples in advance, determines the first fully connected layer feature and the second fully connected layer feature of each sample, converts the second fully connected layer feature into a first hash value, builds an index from the first hash value, and stores the floating-point features of the first fully connected layer under the index. The feature extraction module extracts keyframes of the video to be detected, determines the floating-point first fully connected layer feature and the second fully connected layer feature of each keyframe, and converts the second fully connected layer feature into a second hash value. The feature comparison module, when the similarity between the second hash value and a first hash value exceeds the first similarity threshold, compares the floating-point first fully connected layer feature corresponding to the second hash value with the floating-point fully connected layer feature stored under that first hash value, and determines the detection result of the video to be detected according to the comparison result.
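As a rough illustration of the dual fully connected heads described above, the sketch below substitutes a toy NumPy forward pass for a real convolutional backbone; all layer sizes, the random initialization, and the tanh activations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoHeadNet:
    """Backbone producing a shared embedding, with two fully connected heads:
    fc1 -> floating-point feature kept for fine comparison,
    fc2 -> feature squashed toward +/-1, later binarized into the hash key."""

    def __init__(self, in_dim, emb_dim, fc1_dim, fc2_dim):
        self.w_emb = rng.standard_normal((in_dim, emb_dim)) * 0.1
        self.w_fc1 = rng.standard_normal((emb_dim, fc1_dim)) * 0.1
        self.w_fc2 = rng.standard_normal((emb_dim, fc2_dim)) * 0.1

    def forward(self, frame):
        emb = np.tanh(frame @ self.w_emb)   # stand-in for the CNN backbone
        first = emb @ self.w_fc1            # first global feature (float)
        second = np.tanh(emb @ self.w_fc2)  # second global feature, in (-1, 1)
        return first, second
```

Keeping the hash head's outputs near +/-1 (here via tanh; in the patent via the quantization loss) means the subsequent sign-based binarization discards little information.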
Optionally, the video frame samples include positive sample pairs and negative samples, and the detection preparation module further includes a sample acquisition unit.
Specifically, the sample acquisition unit is configured to: obtain a positive sample pair in advance, the pair including a first image and a second image of the same category; and obtain in advance a third image whose category differs from that of the positive sample pair, as the negative sample. The positive sample pair and the negative sample are used to train the convolutional neural network and adjust its parameters.
Optionally, the sample acquisition unit is further configured to: obtain an image from a video sample as the first image, and use an image within a preset time period of the first image in the same video sample as the second image.
Optionally, the sample acquisition unit is further configured to: obtain an image from a video sample as the first image, and perform a form conversion on the first image to produce the second image.
Optionally, the sample acquisition unit is further configured to: obtain an image from a video sample as a first candidate image; use an image within a preset time period of the first candidate image in the same video sample as a second candidate image; perform form conversions on the first and second candidate images respectively to build a candidate image set; and obtain two images from the candidate image set as the first image and the second image.
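The sampling strategy above (a positive partner drawn from a time window around the anchor frame, a negative drawn from a different video) can be sketched as below. The window size and the use of frame indices as stand-ins for images are illustrative assumptions; real usage would apply form conversions such as flips or crops to the chosen frames.

```python
import random

def make_triplet(video_a, video_b, window=2, rng=random.Random(0)):
    """Build one training triplet: anchor and positive come from the same
    video within `window` frames of each other; the negative comes from
    another video. Frames may be any objects (arrays, paths, ...)."""
    i = rng.randrange(len(video_a))
    anchor = video_a[i]
    lo, hi = max(0, i - window), min(len(video_a), i + window + 1)
    positive = video_a[rng.randrange(lo, hi)]
    negative = video_b[rng.randrange(len(video_b))]
    return anchor, positive, negative
```

The temporal window exploits the fact that nearby frames of one video almost always share content, giving positive pairs without manual labeling.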
Optionally, the feature extraction module includes a network training unit and a feature extraction unit.
Specifically, the network training unit inputs the video frame samples into the convolutional neural network, computes the loss functions of the network, and trains until the losses converge, yielding the trained convolutional neural network. The feature extraction unit inputs the keyframes of the video to be detected into the trained network, takes the feature determined by the first fully connected layer as the first global feature of the keyframe, and takes the feature determined by the second fully connected layer as the second global feature of the keyframe.
Optionally, the loss function of the convolutional neural network includes a distance loss function, a quantization loss function, and an information-entropy loss function:
Distance loss: L_t = max{ ||x_a - x_p|| + m - ||x_a - x_n||, 0 }
Quantization loss: L_q = || |x^(i)| - 1 ||_1
Information-entropy loss: L_e = sum_d (u_d)^2
where L_t denotes the distance loss; x_a, x_p, and x_n respectively denote the features of the two samples of the positive pair and of the negative sample; m denotes the margin on the feature distance; L_q denotes the quantization loss, with x^(i) denoting any feature of a video frame sample; and L_e denotes the information-entropy loss, with u_d denoting the mean of the features of the video frame samples.
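The three losses can be written out in plain Python over toy feature vectors, as below. The entropy term here is a common bit-balance formulation (sum of squared per-dimension means); the exact expression in the original filing may differ, so treat it as an assumption.

```python
def distance_loss(xa, xp, xn, m=0.5):
    """Triplet-style margin loss: pull anchor and positive together,
    push anchor and negative at least margin m further apart."""
    d = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return max(d(xa, xp) + m - d(xa, xn), 0.0)

def quantization_loss(x):
    """L1 distance of each |x_i| from 1, driving features toward +/-1
    so that sign-based hashing loses little information."""
    return sum(abs(abs(v) - 1.0) for v in x)

def entropy_loss(features):
    """Penalize unbalanced bits: per-dimension means over the batch
    should be near 0 so each hash bit carries maximal information."""
    dims = len(features[0])
    means = [sum(f[d] for f in features) / len(features) for d in range(dims)]
    return sum(u * u for u in means)
```

In training the three terms would be summed (possibly with weights) and minimized jointly over batches of triplets.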
Optionally, the feature extraction module further includes a keyframe extraction unit.
Specifically, the keyframe extraction unit extracts a video frame from the video to be detected at preset time intervals, as the keyframes to be detected.
Optionally, the video detection apparatus further includes a copyright detection module.
Specifically, the copyright detection module is configured to: if the similarity between the first global feature corresponding to the second hash value and the global feature corresponding to the first hash value exceeds a second similarity threshold, determine the uploader identifier of the video to be detected and the uploader identifier of the video corresponding to the first hash value; and, when the two uploader identifiers differ, send an alert.
Optionally, the video detection apparatus further includes a tag addition module.
Specifically, the tag addition module is configured to: if the similarity between the first global feature corresponding to the second hash value and the global feature corresponding to the first hash value exceeds a third similarity threshold, determine the tags of the video corresponding to the first hash value, and add those tags to the video to be detected.
Optionally, the video detection apparatus further includes a video upload module.
Specifically, the video upload module is configured to: if the similarity between the first global feature corresponding to the second hash value and the global feature corresponding to the first hash value is below a fourth similarity threshold, upload the video to be detected.
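Taken together, the copyright, tag, and upload modules gate different actions on the same fine-grained feature similarity. A minimal sketch of that dispatch, with all threshold values assumed for illustration:

```python
def decide(similarity, same_uploader,
           t_copyright=0.95, t_tag=0.9, t_upload=0.8):
    """Map a fine-grained feature similarity to the actions the three
    optional modules take (thresholds are illustrative placeholders)."""
    actions = []
    if similarity > t_copyright and not same_uploader:
        actions.append("copyright_alert")   # likely infringement
    if similarity > t_tag:
        actions.append("copy_tags")         # reuse the matched video's tags
    if similarity < t_upload:
        actions.append("allow_upload")      # no duplicate found
    return actions
```

Note the thresholds need not be ordered in any particular way in the patent; here t_copyright > t_tag > t_upload is simply one plausible configuration.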
According to one aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; when the computer program is executed by a processor, any one of the video detection methods above is implemented.
According to one aspect of the present disclosure, an electronic device is provided, including one or more processors and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement any one of the video detection methods above.
In the technical solutions provided by some embodiments of the present disclosure, a video fingerprint database is first built from the first global features of video frame samples, and an index into that database is built from their second global features. Next, the first and second global features of each keyframe of the video to be detected are extracted; based on the built index and the keyframe's second global feature, the stored features to be compared against the keyframe's first global feature are determined, and the feature comparison yields the detection result. By providing this indexing capability, the exemplary embodiments of the present disclosure can quickly locate the global features potentially relevant to the video to be detected, then compare them with the video's own global features, and thus accurately determine whether a video similar to the video to be detected exists.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Detailed description of the invention
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain its principles. Evidently, the drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the video detection method or apparatus of an embodiment of the present disclosure can be applied;
Fig. 2 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of a video detection method according to an exemplary embodiment of the present disclosure;
Fig. 4 schematically shows the network structure of the convolutional neural network used in training according to an exemplary embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of determining a positive sample pair according to one exemplary embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of determining a positive sample pair according to another exemplary embodiment of the present disclosure;
Fig. 7 schematically shows the generation of tags for a video using the video detection method of an exemplary embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of realizing video detection by combining offline and online schemes according to an exemplary embodiment of the present disclosure;
Fig. 9 schematically shows a block diagram of a video detection apparatus according to an exemplary embodiment of the present disclosure;
Fig. 10 schematically shows a block diagram of a detection preparation module according to an exemplary embodiment of the present disclosure;
Fig. 11 schematically shows a block diagram of a feature extraction module according to an exemplary embodiment of the present disclosure;
Fig. 12 schematically shows a block diagram of a feature extraction module according to another exemplary embodiment of the present disclosure;
Fig. 13 schematically shows a block diagram of a video detection apparatus according to another exemplary embodiment of the present disclosure;
Fig. 14 schematically shows a block diagram of a video detection apparatus according to another exemplary embodiment of the present disclosure;
Fig. 15 schematically shows a block diagram of a video detection apparatus according to yet another exemplary embodiment of the present disclosure.
Specific embodiment
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. Those skilled in the art will recognize, however, that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or using other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. Identical reference numerals in the drawings denote identical or similar parts, and their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative and need not include all steps. For example, some steps may be decomposed while others may be merged wholly or partially, so the actual order of execution may change according to the situation. It should be understood that the terms "first", "second", "third", "fourth", and so on are used only for the purpose of distinction and should not be taken as limiting the corresponding content of the present disclosure.
For detecting images, and particularly in scenarios of detecting video images, some techniques use a scheme based on a VGG (Visual Geometry Group) network combined with sparse coding. However, this scheme requires an intensive feature extraction process with high time complexity and a complicated pipeline, so it cannot readily be applied at scale to complex video detection tasks. Other techniques use an unsupervised hashing algorithm based on DeepBit, taking the hash feature as the final feature (or fingerprint) of the image (video). However, converting floating-point features into hashes loses some effective information, making the results inaccurate; moreover, for complex video scenes with possible transformations, such schemes also cannot be applied at scale.
In view of this, the exemplary embodiments of the present disclosure provide a video detection method and a video detection apparatus to solve the above problems.
The embodiments of the present disclosure can be applied to video copyright detection scenarios. Specifically, the features of a video are extracted, and hash indexing combined with those features is used to determine whether a duplicate of the video exists; if a duplicate is found and the uploaders of the two videos differ, the video's copyright may have been infringed.
The embodiments of the present disclosure can also be applied to scenarios of adding tags to videos. Specifically, the features of a video are extracted, video tags relevant to the video are determined based on those features, and the tags are added to the video.
The embodiments of the present disclosure can also be applied to scenarios of preventing duplicate video uploads. Specifically, if it is determined that content identical to the video already exists online, the upload of the video can be blocked, which saves resources and spares users from viewing duplicate content.
In addition, the embodiments of the present disclosure can also be applied to many other scenarios, such as video surveillance, target tracking, detection and identification of specific information, and detection of video openings and closings.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the video detection method or apparatus of an embodiment of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, and 1003, a network 1004, and a server 1005. The network 1004 is the medium providing communication links between the terminal devices 1001, 1002, 1003 and the server 1005, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs. For example, the server 1005 may be a server cluster composed of multiple servers.
Users may use the terminal devices 1001, 1002, 1003 to interact with the server 1005 through the network 1004, for example to receive or send messages. The terminal devices 1001, 1002, 1003 may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, portable computers, and desktop computers.
The server 1005 may be a server providing various services. For example, the server 1005 may obtain a video to be detected from the terminal devices 1001, 1002, 1003, extract keyframes from it, and determine the first and second global features of each keyframe. It compares the keyframe's second global feature against the index built in advance from the second global features of the video frame samples, determines the first hash values under the index that correspond to the video to be detected, uses those hash values to locate the global features potentially relevant to the video, and compares the video's first global features with them, thereby detecting the video to be detected. Based on the comparison result, it may further perform operations such as copyright detection, duplicate detection, or tag addition, and feed the results of these operations back to the terminal devices 1001, 1002, 1003, so as to notify users of the corresponding content.
For the process of extracting the first and second global features, the server 1005 may input the keyframes of the video to be detected into a pre-built convolutional neural network, and determine the first and second global features from the network's first and second fully connected layers. Furthermore, the convolutional neural network may be trained on the video frame samples to obtain the above index and to store the video fingerprint library holding the samples' global features.
In addition, the training of the convolutional neural network may be performed by the server 1005. It should be understood, however, that servers or processing devices other than the server 1005 may also perform the training; the server 1005 may then directly obtain the trained convolutional neural network and still carry out the video detection process of the present disclosure.
It should be noted that the video detection method provided by the exemplary embodiments of the present disclosure is generally executed by the server 1005, and accordingly, the video detection apparatus is generally provided in the server 1005.
However, the video detection methods below may also be implemented by the terminal devices 1001, 1002, 1003; the present disclosure imposes no particular limitation in this regard.
Fig. 2 shows a schematic structural diagram of a computer system of an electronic device suitable for implementing the exemplary embodiments of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data required for system operation. The CPU 201, ROM 202, and RAM 203 are connected to one another through a bus 204, to which an input/output (I/O) interface 205 is also connected.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read therefrom is installed into the storage section 208 as needed.
In particular, according to the embodiments of the present disclosure, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the system of the present application are executed.
It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, capable of sending, propagating, or transmitting a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the disclosure. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in a block diagram or flowchart, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium. This medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the following embodiments.
Fig. 3 schematically shows a flowchart of a video detection method according to an exemplary embodiment of the disclosure. With reference to Fig. 3, the video detection method may comprise the following steps:
S32. Obtain video frame samples in advance, determine first global features and second global features of the video frame samples, convert the second global features into first hash values, build an index from the first hash values, and store the first global features under the index.
In exemplary embodiments of the disclosure, a video frame can be regarded as the still image presented when a video is paused. A video frame sample may be a single frame image extracted from an existing video; it should be understood, however, that an image not extracted from a video may also be chosen as a video frame sample of the disclosure.
The global features of a video frame represent the global information of the video frame image. They are pixel-level low-level features and may relate to the overall color features, overall texture features, and overall spatial distribution features of the video frame image. Global features may be generated by aggregating local features that express local image information, or may be extracted directly from the pixel layer of the image.
Both the first global features and the second global features described in the disclosure represent the global information of the corresponding video frame image. When global features are expressed as vectors reflecting the image information, the dimension of the first global features differs from that of the second global features. For example, the second global features may be determined by further extracting features from the first global features; in this case, the dimension of the first global features may be 512, while the dimension of the second global features may be 32.
The disclosure may use methods such as Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), or deep learning networks to extract the global features of the video frame samples in advance.
The process of extracting the first global features and the second global features of a video frame sample is illustrated below, taking Convolutional Neural Networks (CNN) in deep learning as an example.
The server may input the pre-obtained video frame sample into a convolutional neural network containing a first fully connected layer and a second fully connected layer. In a convolutional neural network, the purpose of a fully connected layer is to make maximal use of the feature information obtained after convolution and pooling: it aggregates the features extracted by the preceding layers to obtain the high-level meaning of the image, which facilitates any subsequent classification.
Weighing processing speed against accuracy, the dimension of the first fully connected layer in the convolutional neural network of the disclosure may be 512, and the dimension of the second fully connected layer may be 32. However, other dimensions can also implement the concept of the invention; for example, the dimension of the first fully connected layer may be 1024 and that of the second fully connected layer may be 64. The disclosure places no limitation on this.
Exemplary embodiments of the disclosure take the features determined by the first fully connected layer as the first global features, and the features determined by the second fully connected layer as the second global features.
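As a concrete illustration of the two fully connected layers, the sketch below uses small hypothetical dimensions (8 and 4 in place of 512 and 32) and random weights in place of a trained backbone; the weight matrices, the input size, and the `relu` helper are assumptions for illustration, not the patent's trained parameters.

```python
import random

def relu(v):
    return [x if x > 0.0 else 0.0 for x in v]

def fc(weights, v):
    # One fully connected layer: each output is the dot product of a weight row with the input.
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

random.seed(0)
D_IN, D_FC1, D_FC2 = 16, 8, 4          # stand-ins for the backbone output, 512, and 32
W1 = [[random.uniform(-1, 1) for _ in range(D_IN)] for _ in range(D_FC1)]
W2 = [[random.uniform(-1, 1) for _ in range(D_FC1)] for _ in range(D_FC2)]

pooled = [random.uniform(-1, 1) for _ in range(D_IN)]   # feature after convolution and pooling
first_global = relu(fc(W1, pooled))     # first global features (floating point, higher dim)
second_global = fc(W2, first_global)    # second global features (lower dim, later hashed)

print(len(first_global), len(second_global))  # 8 4
```

Both feature vectors thus come from the same forward pass, the second being a further projection of the first.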
The training process of the convolutional neural network is described below.
With reference to Fig. 4, exemplary embodiments of the invention use a Triplet convolutional neural network (Triplet CNN) structure to train the network. The Triplet CNN structure contains three convolutional neural networks, and the three networks share parameters. Those skilled in the art will readily understand that during training, the three convolutional neural networks receive different training samples as input.
For the training of the Triplet CNN, training samples must first be obtained; exemplary embodiments of the disclosure may use video frame samples as the training samples of the network. Each group of video frame samples may include a positive sample pair and a negative sample.
The server may determine a positive sample pair, which may include two images of the same category, denoted as a first image and a second image. For example, the first image may serve as the positive sample and the second image as the anchor.
According to one embodiment of the disclosure, the spatio-temporal continuity characteristic of video can be used to obtain the first image and the second image. Specifically, an image may first be obtained from an existing video as the first image; next, an image within a preset time period of the first image in the video may be taken as the second image. The preset time period may be a very short time, for example 0.5 s, so that the first image and the second image belong to the same category.
With reference to Fig. 5, two images may be determined in a video stream as the first image and the second image respectively. Although the first image and the second image shown in the figure are only one video frame apart, it should be understood that the first image and the second image may also be multiple video frames apart.
In addition to determining the second image by means of a preset time period, the second image may also be determined using a frame count. For example, after the first image is determined, the second frame after the first image in the video may be extracted as the second image; this exemplary embodiment places no particular limitation on this.
According to another embodiment of the disclosure, it is considered that some persons apply form conversions to images in order to evade copyright issues; this behavior may be called a copy attack. The server may apply a form conversion to the first image to determine the second image. The form conversion may include, but is not limited to, pose conversion, scale conversion, modality conversion, illumination conversion, and color temperature conversion. It can be seen that an image subjected to the form conversions described in the disclosure changes only in form, not substantially in content; the two images can therefore be considered to belong to the same category.
Fig. 6 schematically shows several examples of images subjected to form conversion. It is easy to understand that, besides the conversion results shown in Fig. 6, those skilled in the art can conceive of other conversion results, all of which fall within the concept of the invention.
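As a minimal sketch of such form conversions, the snippet below applies two of the listed transform types, a horizontal flip (a simple geometric change) and a brightness shift (an illumination change), to a tiny grayscale image represented as a list of rows; the example pixel values and the clamping to the 0-255 range are illustrative assumptions, not part of the patent.

```python
def flip_horizontal(img):
    # Mirror each row: a geometric form conversion that leaves the content intact.
    return [list(reversed(row)) for row in img]

def shift_brightness(img, delta):
    # Add a constant to every pixel, clamped to the 0-255 gray range: an illumination conversion.
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

img = [[10, 20, 30],
       [40, 50, 60]]

flipped = flip_horizontal(img)
brighter = shift_brightness(img, 100)
print(flipped[0])   # [30, 20, 10]
print(brighter[1])  # [140, 150, 160]
```

Either converted image would count as the same category as the original for the purpose of forming a positive pair.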
According to yet another embodiment of the disclosure, the above two situations may be combined to determine the first image and the second image. Specifically, the server may first obtain an image from a video as a first candidate image, and take an image within a preset time period of the first candidate image in the video as a second candidate image. Next, the server may apply form conversions to the first candidate image and the second candidate image respectively to determine a candidate image set; it is easy to understand that all images in the candidate image set belong to the same category. Then, the server may choose two images from the candidate image set as the first image and the second image respectively, where the specific choice may be made randomly.
After the first image and the second image are determined, that is, after the positive sample pair is determined, the server may determine a negative sample, which may be a third image whose category differs from that of the positive sample pair. That is, a third image whose scene differs from the first image or the second image may be chosen as the negative sample. The negative sample may also be determined first and the positive sample pair afterwards; the disclosure places no particular limitation on the order in which the sample images are determined.
With the positive sample pair and the negative sample determined, that is, with the positive sample, the anchor, and the negative sample determined, they can be taken as one group of training samples and, as shown in Fig. 4, input into the convolutional neural networks to perform one round of model training.
It is easy to understand that the above process of determining training samples can be used to determine multiple groups of training samples, with which the convolutional neural network is trained iteratively.
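The sampling procedure above can be sketched as follows; the frames are simple string labels standing in for images, the frame rate and the 0.5 s offset follow the example in the text, and drawing the negative at random from a different video is an illustrative assumption.

```python
import random

def make_triplet(video, other_video, anchor_idx, fps=10, offset_s=0.5):
    """Build one (anchor, positive, negative) training group from two videos."""
    anchor = video[anchor_idx]
    # Positive: a frame a short preset time period after the anchor (same category).
    positive = video[anchor_idx + int(offset_s * fps)]
    # Negative: a frame from a different video (different category/scene).
    negative = random.choice(other_video)
    return anchor, positive, negative

video_a = [f"a{i}" for i in range(100)]   # stand-in frames of one video
video_b = [f"b{i}" for i in range(100)]   # frames of an unrelated video

random.seed(1)
triplets = [make_triplet(video_a, video_b, i) for i in range(0, 50, 10)]
print(triplets[0][:2])  # ('a0', 'a5')
```

Repeating the call over many anchors yields the multiple groups of training samples used to train the network iteratively.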
During network training, a loss function is needed for convergence. A loss function, also known as a cost function, maps a random event or the values of its associated random variables to non-negative real numbers representing the risk or loss of that random event. While the model is trained on samples, the model parameters are adjusted according to the error between the predicted probability and the actual probability; the process of inputting samples and adjusting parameters is repeated until the loss function reaches a minimum, that is, until the loss function stabilizes. In this case the loss function is considered to have converged, and model training can end.
The loss function used in embodiments of the disclosure may include a distance loss function, a quantization loss function, and an information entropy loss function. Specifically, a weighted sum of the distance loss function, the quantization loss function, and the information entropy loss function may be determined as the loss function of the convolutional neural network.
The loss function L used by embodiments of the disclosure may be built in the form of Formula 1:

L = α·L_t + β·L_q + λ·L_e  (Formula 1)

where L_t denotes the distance loss function, L_q denotes the quantization loss function, L_e denotes the information entropy loss function, and α, β, λ denote the respective weights (coefficients) of L_t, L_q, and L_e.
The distance loss function L_t may be built in the form of Formula 2:

L_t = max{(x_a − x_p) + m − (x_a − x_n), 0}  (Formula 2)

where x_a, x_p, and x_n denote the features of the anchor, the positive sample, and the negative sample respectively, and m denotes the confidence margin of the feature spacing, which may be a floating-point number.
The quantization loss function L_q may be built in the form of Formula 3:

L_q = ‖ |x^(i)| − 1 ‖_1  (Formula 3)

where x^(i) denotes any feature of a training sample.
The information entropy loss function L_e may be built in the form of Formula 4, where u_d denotes the mean of the training-sample features; specifically, in the two-dimensional array formed by the features, u_d characterizes the mean of the d-th column of bits.
It should be understood that constructing the loss function with Formula 1 above is merely an exemplary description of the disclosure. In other embodiments, the loss function used in network training may also be formed from the information entropy loss of Formula 4 alone.
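Under the standard triplet reading of Formula 2, with (x_a − x_p) and (x_a − x_n) interpreted as Euclidean distances between the feature vectors, and the L1 form of Formula 3, a two-term instance of Formula 1 can be sketched as below. The weights, the margin, and the example feature vectors are illustrative assumptions, and the information entropy term is left out because its exact formula is not reproduced in the text.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(xa, xp, xn, m):
    # Formula 2: hinge on (anchor-positive distance) + margin - (anchor-negative distance).
    return max(euclidean(xa, xp) + m - euclidean(xa, xn), 0.0)

def quantization_loss(x):
    # Formula 3: L1 distance of |x| from 1, pushing each component toward +/-1.
    return sum(abs(abs(v) - 1.0) for v in x)

def combined_loss(xa, xp, xn, m=0.2, alpha=1.0, beta=0.1):
    # A two-term instance of Formula 1 (entropy term omitted): L = alpha*Lt + beta*Lq.
    return alpha * triplet_loss(xa, xp, xn, m) + beta * quantization_loss(xa)

xa = [0.9, -1.1]   # anchor close to the positive, far from the negative
xp = [1.0, -1.0]
xn = [-1.0, 1.0]
print(round(combined_loss(xa, xp, xn), 4))  # 0.02
```

Here the triplet term is already zero (the negative is well separated), so only the quantization term contributes, nudging components toward ±1 for hashing.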
For the convolutional neural network structure shown in Fig. 4, those skilled in the art will readily understand that, besides the first fully connected layer and the second fully connected layer, the structure also includes various convolutional and pooling layers. For example, the structure may also include n Inception modules; weighing calculation speed against accuracy, n here may be 5. An Inception module can be regarded as a convolutional layer on steroids: it provides different receptive fields and can output feature maps that capture complex patterns at various scales, and its convolutional layers may use the ReLU activation function. The specific CNN network structure can be adjusted by developers, and the disclosure places no special restriction on it.
Thus, the convolutional neural network can be trained on the determined video frame samples using the constructed loss function, yielding the trained convolutional neural network. It should be noted that when the trained network is applied in practice, since the three convolutional neural networks shown in Fig. 4 share parameters, only one of them is used to carry out the specific processing of an image when extracting global features.
It should be understood that the above network training process may be executed by the server that performs the video detection method. However, the model may also be trained by another processing device, with the server directly obtaining the trained convolutional neural network.
With the convolutional neural network trained, the first global features and the second global features of the video frame samples can be determined based on its first fully connected layer and second fully connected layer respectively. That is, if the features determined by the first fully connected layer are called first fully connected layer features and those determined by the second fully connected layer are called second fully connected layer features, then the first fully connected layer features can be taken as the above first global features and the second fully connected layer features as the above second global features. It should also be noted that the first global features and the second global features are of floating-point type.
The server may apply hash processing to the second global features of a video frame sample to determine the hash value corresponding to those second global features, denoted as the first hash value. Hash processing, also called hashing, maps information of arbitrary length to a shorter binary value of fixed length; this shorter binary value is called the hash value. The server may then use the first hash value determined for the video frame sample to construct an index, and store the first global features of the video frame sample under that index.
For example, suppose the hash value determined from the second global features of sample A is 000010. A bucket is constructed from 000010, and the first global features of sample A, which are floating-point features, can be stored under the 000010 bucket. If the hash value determined for sample B is also 000010, the first global features of sample B can likewise be stored under the 000010 bucket. An index correspondence is thus formed.
It is easy to understand that each index value may correspond to one bucket, and each bucket stores the floating-point features of one or more samples. That is, each index value may correspond to a feature set against which global features are compared in the detection phase.
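The bucketed index described above can be sketched as a plain dictionary mapping each hash string to the list of floating-point first global features stored under it. The sign-based binarization of the second global features used here is an illustrative assumption (consistent with a quantization loss that pushes components toward ±1), not the patent's specific hash function.

```python
def to_hash(second_global):
    # Assumed binarization: one bit per component, by sign.
    return "".join("1" if v > 0 else "0" for v in second_global)

index = {}  # hash value -> bucket of first global features (floating point)

def add_sample(index, first_global, second_global):
    index.setdefault(to_hash(second_global), []).append(first_global)

# Sample A and sample B happen to share the hash 0010 and land in the same bucket.
add_sample(index, [0.5, 0.1, 0.9], [-0.8, -0.9, 0.7, -1.0])
add_sample(index, [0.4, 0.2, 0.8], [-1.1, -0.7, 0.9, -0.6])

print(sorted(index))        # ['0010']
print(len(index["0010"]))   # 2
```

At detection time, looking up a query hash in `index` directly yields the feature set for the fine-grained comparison.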
S34. Extract the to-be-detected video key frames of the video to be detected, determine first global features and second global features of the key frames, and convert the second global features of the key frames into second hash values.
For the process of extracting the key frames of the video to be detected, the server may extract a video frame from the video at every preset time interval as a key frame, where the preset time interval may be set manually. Alternatively, extraction may proceed by a fixed frame count: regardless of the duration of the video to be detected, a fixed number of key frames can be extracted from it. For example, 50 video frames may be extracted at uniform spacing from a 1-minute video to be detected as its key frames.
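Both extraction strategies can be sketched in a few lines; frames are integer indices standing in for decoded images, and the 25 fps rate is an illustrative assumption chosen so the 50-frames-from-one-minute example works out to round numbers.

```python
def keyframes_by_interval(frames, step):
    # Take one frame every `step` frames (a fixed time interval at a constant frame rate).
    return frames[::step]

def keyframes_fixed_count(frames, count):
    # Take `count` evenly spaced frames regardless of the video's duration.
    step = len(frames) / count
    return [frames[int(i * step)] for i in range(count)]

one_minute = list(range(60 * 25))      # a 1-minute video at 25 fps, frames as indices
keys = keyframes_fixed_count(one_minute, 50)
print(len(keys), keys[:3])             # 50 [0, 30, 60]
```

The fixed-count variant gives every video the same index cost, while the fixed-interval variant keeps the temporal density constant.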
In addition, the server may perform a complexity analysis on each frame image of the video to be detected; if the complexity meets a preset complexity requirement, the frame image may be taken as a key frame. Specifically, the second moment of the gray-level histogram of a video frame image may be used to determine the complexity of each frame image. This exemplary embodiment places no special limitation on this.
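One common reading of "second moment of the gray-level histogram" is the histogram's energy (angular second moment), Σ p_i², which is maximal for a flat, single-tone image and drops as gray levels spread out; treating low energy as high complexity is an assumption here, not the patent's stated criterion.

```python
def gray_histogram(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return [c / n for c in hist]   # normalized: probability of each gray level

def second_moment(pixels):
    # Angular second moment (energy) of the normalized histogram: 1.0 for a
    # constant image, smaller as gray levels spread out.
    return sum(p * p for p in gray_histogram(pixels))

flat = [128] * 64           # a constant (low-complexity) frame
busy = list(range(64))      # 64 distinct gray levels (higher complexity)

print(second_moment(flat))            # 1.0
print(round(second_moment(busy), 4))  # 0.0156
```

A frame would then qualify as a key frame when its second moment falls below (or its derived complexity exceeds) the preset requirement.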
After the key frames of the video to be detected have been extracted, their first global features and second global features are determined. Again taking the convolutional neural network approach as an example, the key frames may be input into the trained convolutional neural network described above, and the first global features and the second global features of the key frames may be determined from the first fully connected layer and the second fully connected layer of the network respectively.
It should be understood, however, that methods such as Histogram of Oriented Gradients or Scale-Invariant Feature Transform may also be used to determine the first global features and the second global features of the key frames.
Next, the server may apply hash processing to the second global features of a key frame to obtain the corresponding hash value, taken as the second hash value.
S36. When the similarity between the second hash value and a first hash value is greater than a first similarity threshold, extract the first global features corresponding to the second hash value, compare them with the global features stored under that first hash value, and determine the detection result of the video to be detected according to the comparison result.
After the second hash value corresponding to a key frame has been determined, the server may calculate the similarity between the second hash value and the first hash values determined in step S32. If the similarity between the second hash value and a first hash value is greater than the first similarity threshold, the global features stored under that first hash value are determined. Note that "correspondingly stored" here refers to the storing of the first global features of the video frame samples under the index in step S32; that is, the correspondingly stored global features are the first global features of the video frame samples.
For the similarity calculation, the Hamming distance between the second hash value and the first hash value may be computed: the smaller the Hamming distance, the greater the similarity. On this basis it can be determined whether the similarity between the second hash value and the first hash value exceeds the first similarity threshold. The first similarity threshold may be set manually in advance, and the disclosure places no special limitation on its value. In other embodiments of the disclosure, the similarity between the second hash value and the indexed first hash value may also be determined using cosine distance, Euclidean distance, and the like; this exemplary embodiment places no particular limitation on this.
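The Hamming-distance test can be sketched as below; translating "similarity greater than the first similarity threshold" into "Hamming distance at most `max_distance`" is an illustrative assumption, and the hash strings reuse the 000010 example from the text.

```python
def hamming(h1, h2):
    # Number of differing bit positions between two equal-length hash strings.
    assert len(h1) == len(h2)
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def candidate_buckets(query_hash, index_hashes, max_distance):
    # First hash values close enough to the query to warrant a fine-grained
    # comparison of the floating-point features stored under them.
    return [h for h in index_hashes if hamming(query_hash, h) <= max_distance]

index_hashes = ["000010", "000110", "111111"]
print(hamming("000010", "000110"))                  # 1
print(candidate_buckets("000010", index_hashes, 1))  # ['000010', '000110']
```

Only the buckets that pass this cheap test go on to the floating-point feature comparison, which is what makes the index fast.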
After the global features stored under the first hash value have been determined, the server may compare the first global features corresponding to the second hash value with the global features stored under the first hash value.
In exemplary embodiments of the disclosure, for the feature comparison, the server may calculate the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value, and take the similarity result as the comparison result. For example, the Euclidean distance between them may be calculated, with the calculated Euclidean distance characterizing the similarity. However, similarity determination methods other than Euclidean distance may also be used; this exemplary embodiment places no special limitation on this.
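The fine-grained comparison can be sketched as scanning one bucket for the stored feature closest to the query in Euclidean distance; the `max_distance` acceptance threshold and the example feature values are illustrative assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def best_match(query, bucket, max_distance):
    # Compare the query's first global features with every stored feature in the
    # bucket; return the closest one if it is near enough, else None.
    best = min(bucket, key=lambda feat: euclidean(query, feat))
    return best if euclidean(query, best) <= max_distance else None

bucket = [[0.5, 0.1, 0.9], [0.4, 0.2, 0.8]]   # features stored under one index value
query = [0.41, 0.19, 0.81]
print(best_match(query, bucket, max_distance=0.1))            # [0.4, 0.2, 0.8]
print(best_match([9.0, 9.0, 9.0], bucket, max_distance=0.1))  # None
```

A non-None result indicates that the key frame matches an existing indexed frame, feeding into the detection result of the video.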
The server may determine the detection result of the video to be detected according to the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value. Specifically, it can be determined whether the video to be detected is present among the existing videos.
In addition, scene-specific operations can be executed according to the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value.
According to some embodiments of the disclosure, the video detection method of the exemplary embodiments can be applied to a copyright detection scene. Specifically, if the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value is greater than a second similarity threshold, the uploader identifier of the video to be detected and the uploader identifier of the video corresponding to the first hash value are determined.
If the uploader identifier of the video to be detected differs from the uploader identifier of the video corresponding to the first hash value, the same video may exist with two different uploaders. In this case, the server may send a warning message to one or more of the two uploaders and a video copyright regulatory agency.
According to another embodiment of the disclosure, the video detection method of the exemplary embodiments can be applied to the scene of adding tags to videos. Specifically, if the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value is greater than a third similarity threshold, the tag of the video corresponding to the first hash value is determined. The tag may have been added to the existing video in advance, either manually or by means of machine learning. Next, the tag of the video corresponding to the first hash value can be added to the video to be detected. For example, if the video to be detected is an episode of a TV series and the corresponding video tag is the title of the series, that title can be used as the tag of the video to be detected. Fig. 7 shows the effect after the tag is added.
According to yet another embodiment of the disclosure, the video detection method of the exemplary embodiments can be applied to the scene of avoiding repeated uploads. Specifically, if the similarity between the first global features corresponding to the second hash value and the global features stored under the first hash value is less than a fourth similarity threshold, the video to be detected is uploaded to the corresponding platform. If the similarity is greater than or equal to the fourth similarity threshold, the upload of the video is blocked and a prompt message is sent to the uploader to inform them that the video content is a duplicate. Repeated uploads of a video can thus be avoided, saving resources.
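The three application scenes can be sketched as small decision helpers driven by the feature similarity; the threshold values, the action strings, and the uploader identifiers are all illustrative assumptions, and each scene uses its own independent threshold.

```python
def copyright_check(similarity, uploader_a, uploader_b, t2=0.9):
    # Copyright scene: same video, different uploaders -> warn.
    return "warn" if similarity > t2 and uploader_a != uploader_b else "ok"

def tag_video(similarity, existing_tag, t3=0.8):
    # Tagging scene: similar enough -> reuse the existing video's tag.
    return existing_tag if similarity > t3 else None

def upload_decision(similarity, t4=0.95):
    # Duplicate-upload scene: below the threshold -> upload, else block and prompt.
    return "upload" if similarity < t4 else "block_and_prompt"

print(copyright_check(0.97, "uploader1", "uploader2"))  # warn
print(tag_video(0.85, "Some TV Series"))                # Some TV Series
print(upload_decision(0.97))                            # block_and_prompt
```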
It should be noted that the first, second, third, and fourth similarity thresholds above are applied to different similarity judgment scenes; there is therefore no association between them, and they may take different values.
Fig. 8 shows a schematic diagram of implementing video detection by combining offline and online schemes according to an exemplary embodiment of the disclosure.
In the offline stage, global features can be extracted from the video frame samples using the convolutional neural network, where the extracted global features include the first global features and the second global features. On the one hand, hash processing is applied to the second global features to obtain the first hash values, from which the index is constructed; on the other hand, the corresponding first global features are stored in a video fingerprint database, so that the first global features can be found from the first hash values of the second global features.
In the online stage, the key frames of the video to be detected are extracted, and the first global features and the second global features are extracted from the key frames using the convolutional neural network. Using the second global features together with the index constructed offline, the global features of existing videos corresponding to the key frames can be determined; the first global features of the key frames are then compared with the corresponding global features in the video fingerprint database to determine the detection result of the video to be detected.
It should be noted that although the steps of the method in the disclosure are described in the drawings in a particular order, this neither requires nor implies that the steps must be executed in that particular order, or that all of the steps shown must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
Further, this exemplary embodiment also provides a video detection device.
Fig. 9 schematically shows a block diagram of a video detection device of the exemplary embodiments of the disclosure. With reference to Fig. 9, the video detection device 9 according to an exemplary embodiment of the disclosure may include a detection preparation module 91, a feature extraction module 93, and a feature comparison module 95.
Specifically, the detection preparation module 91 can be used to obtain video frame samples in advance, determine the first global features and the second global features of the video frame samples, convert the second global features into first hash values, build an index from the first hash values, and store the first global features under the index. The feature extraction module 93 can be used to extract the key frames of the video to be detected, determine the first global features and the second global features of the key frames, and convert the second global features of the key frames into second hash values. The feature comparison module 95 can be used to, when the similarity between a second hash value and a first hash value is greater than the first similarity threshold, extract the first global features corresponding to the second hash value, compare them with the global features stored under the first hash value, and determine the detection result of the video to be detected according to the comparison result.
With the video detection device of the exemplary embodiments of the disclosure, the index makes it possible to quickly find global features that may be relevant to the video to be detected; the video's own global features are then compared with these potentially relevant global features, so that it can be accurately determined whether a video similar to the video to be detected exists.
According to exemplary embodiments of the disclosure, a convolutional neural network is used to determine the first global features and the second global features of a video frame, the global features being fully connected layer features.
Specifically, the detection preparation module 91 can be used to obtain video frame samples in advance, determine the first fully connected layer features and second fully connected layer features of the video frame samples, convert the second fully connected layer features into first hash values, build an index from the first hash values, and store the floating-point features of the first fully connected layer under the index. The feature extraction module 93 can be used to extract key frames from a video to be detected, determine the first fully connected layer floating-point features and second fully connected layer features of the key frames, and convert the second fully connected layer features of the key frames into second hash values. The feature comparison module 95 can be used to, when the similarity between a second hash value and a first hash value exceeds a first similarity threshold, compare the first fully connected layer floating-point features corresponding to the second hash value with the fully connected layer floating-point features stored under the first hash value, and determine the detection result for the video to be detected from the comparison result.
Because the first and second global features are extracted with a convolutional neural network, the video detection procedure of the exemplary embodiments of the present disclosure is broadly applicable and easy to adopt.
According to an exemplary embodiment of the present disclosure, the video frame samples include positive sample pairs and negative samples; with reference to Figure 10, the detection preparation module 91 further includes a sample acquisition unit 101.
Specifically, the sample acquisition unit 101 can be configured to: obtain a positive sample pair in advance, the positive sample pair comprising a first image and a second image of the same category; and obtain in advance a third image whose category differs from that of the positive sample pair, as a negative sample. The convolutional neural network is trained with the positive sample pairs and negative samples so as to adjust its parameters.
The network is trained with a Triplet CNN structure built from positive and negative samples; as a form of Siamese convolutional neural network (SCNN), this structure determines similarity more accurately.
According to an exemplary embodiment of the present disclosure, the sample acquisition unit 101 can be further configured to: obtain an image from a video sample as the first image, and take an image of the video sample within a preset time period of the first image as the second image.
According to an exemplary embodiment of the present disclosure, the sample acquisition unit 101 can be further configured to: obtain an image from a video sample as the first image, and apply a form conversion to the first image to determine the second image.
According to an exemplary embodiment of the present disclosure, the sample acquisition unit 101 can be further configured to: obtain an image from a video sample as a first candidate image; take an image of the video sample within a preset time period of the first candidate image as a second candidate image; apply form conversions to the first candidate image and the second candidate image respectively to determine a candidate image set; and obtain two images from the candidate image set as the first image and the second image respectively.
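The third positive-pair strategy above (temporal neighbours plus form conversions) can be sketched as follows. The toy list-based "frames" and the particular conversions (horizontal mirror, centre crop) are illustrative assumptions, not specified by the disclosure:

```python
import random

def form_variants(frame):
    """Toy 'form conversions' on a frame given as a list of pixel rows:
    identity, horizontal mirror, and centre crop."""
    mirrored = [row[::-1] for row in frame]
    cropped = [row[1:-1] for row in frame[1:-1]]
    return [frame, mirrored, cropped]

def positive_pair(video, i, offset=1, rng=random):
    """Build the candidate image set from frames i and i+offset of a video
    sample, then draw any two candidates as a positive pair."""
    candidates = form_variants(video[i]) + form_variants(video[i + offset])
    return rng.sample(candidates, 2)
```

Because both frames show the same scene moments apart, any two members of the candidate set depict the same category, which is exactly what the positive pair requires.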
According to an exemplary embodiment of the present disclosure, with reference to Figure 11, the feature extraction module 93 may include a network training unit 1101 and a feature extraction unit 1103.
Specifically, the network training unit 1101 can be used to input the video frame samples into the convolutional neural network, compute the loss function of the convolutional neural network, and make the loss function converge so as to determine the trained convolutional neural network. The feature extraction unit 1103 can be used to input the key frames of the video to be detected into the trained convolutional neural network, take the features determined by the first fully connected layer as the first global features of the key frames, and take the features determined by the second fully connected layer as the second global features of the key frames.
According to an exemplary embodiment of the present disclosure, the loss function of the convolutional neural network comprises a distance loss function, a quantization loss function and an information-entropy loss function;
the distance loss function is: L_t = max{(x_a − x_p) + m − (x_a − x_n), 0};
the quantization loss function is: L_q = || |x^(i)| − 1 ||_1;
the information-entropy loss function is as follows:
where L_t denotes the distance loss function; x_a, x_p and x_n denote the features of the two samples of the positive pair and of the negative sample respectively; m denotes the confidence margin of the feature spacing; L_q denotes the quantization loss function; x^(i) denotes any feature of a video frame sample; L_e denotes the information-entropy loss function; and u_d denotes the mean of the features of the video frame samples.
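The two loss terms whose formulas are given in the text can be sketched as plain Python. Reading (x_a − x_p) as an L2 distance between feature vectors is an assumption (the common triplet-loss convention); the information-entropy term L_e is omitted because its formula is not reproduced in the text:

```python
def l2(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(xa, xp, xn, m=0.5):
    """Distance loss L_t = max(||x_a - x_p|| + m - ||x_a - x_n||, 0):
    zero once the negative is at least margin m farther than the positive."""
    return max(l2(xa, xp) + m - l2(xa, xn), 0.0)

def quantization_loss(x):
    """Quantization loss L_q = || |x^(i)| - 1 ||_1: pushes each activation
    toward +/-1 so that sign-based binarization (hashing) loses little."""
    return sum(abs(abs(v) - 1.0) for v in x)
```

The triplet term shapes the feature space for similarity comparison, while the quantization term keeps the floating-point features close to the binary codes stored in the index.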
According to an exemplary embodiment of the present disclosure, with reference to Figure 12, the feature extraction module 93 may further include a key-frame extraction unit 1201.
Specifically, the key-frame extraction unit 1201 can be used to extract a video frame from the video to be detected at every preset time interval, as a key frame of the video to be detected.
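A minimal sketch of this fixed-interval key-frame extraction, assuming the video has already been decoded into a frame sequence; the frame rate and interval values are illustrative assumptions:

```python
def extract_key_frames(frames, fps=25, interval_s=1.0):
    """Take one frame every interval_s seconds from a decoded frame list."""
    step = max(1, int(fps * interval_s))
    return frames[::step]
```

Sampling at a fixed stride keeps the number of frames to hash proportional to the video's duration rather than its frame count.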
According to an exemplary embodiment of the present disclosure, with reference to Figure 13, the video detection apparatus 13 may further include a copyright detection module 131 in addition to the components of the video detection apparatus 9.
Specifically, the copyright detection module 131 can be configured to: if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value exceeds a second similarity threshold, determine the uploader identifier of the video to be detected and the uploader identifier of the video corresponding to the first hash value, and issue a warning message when the two uploader identifiers differ.
Through the video detection procedure of the present exemplary embodiment, video copyright infringement can be detected.
According to an exemplary embodiment of the present disclosure, with reference to Figure 14, the video detection apparatus 14 may further include a label adding module 141 in addition to the components of the video detection apparatus 9.
Specifically, the label adding module 141 can be configured to: if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value exceeds a third similarity threshold, determine the label of the video corresponding to the first hash value and add that label to the video to be detected.
Through the video detection procedure of the present exemplary embodiment, labels can be added to video clips.
According to an exemplary embodiment of the present disclosure, with reference to Figure 15, the video detection apparatus 15 may further include a video upload module 151 in addition to the components of the video detection apparatus 9.
Specifically, the video upload module 151 can be used to upload the video to be detected if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value is below a fourth similarity threshold.
Through the video detection procedure of the present exemplary embodiment, repeated uploads of the same video can be avoided.
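The three post-comparison policies of the preceding embodiments (copyright warning, label propagation, and duplicate-avoiding upload) can be combined into one hedged sketch; the threshold values and the tuple-based interface are assumptions for illustration:

```python
def post_check(similarity, query_uploader, hit_uploader, hit_labels,
               t_copyright=0.95, t_label=0.90, t_duplicate=0.85):
    """For one matched library video, return (alert, labels_to_add,
    allow_upload): warn on a cross-uploader copy, propagate the matched
    video's labels, and block upload of a near-duplicate."""
    alert = similarity > t_copyright and query_uploader != hit_uploader
    labels = list(hit_labels) if similarity > t_label else []
    allow_upload = similarity < t_duplicate
    return alert, labels, allow_upload
```

In practice the three thresholds are independent, so one comparison result can simultaneously trigger a warning, propagate labels, and reject the upload.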
Since each functional module of the apparatus of the embodiments of the present invention corresponds to the method embodiments described above, the details are not repeated here.
Through the above description of the embodiments, those skilled in the art will readily appreciate that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure can therefore be embodied as a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes instructions causing a computing device (which may be a personal computer, server, terminal apparatus or network device, etc.) to execute the method according to the embodiments of the present disclosure.
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present invention, and are not intended to be limiting. It will be readily understood that the processing shown in the drawings does not indicate or limit the temporal order of these operations, and that these operations may, for example, be executed synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. Indeed, according to the embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Those skilled in the art will readily conceive of other embodiments of the disclosure after considering the specification and practising the invention disclosed here. This application is intended to cover any variations, uses or adaptations of the disclosure that follow its general principles and include common knowledge or customary techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only; the true scope and spirit of the disclosure are indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A video detection method, comprising:
obtaining video frame samples in advance, determining first global features and second global features of the video frame samples, converting the second global features into first hash values, building an index from the first hash values, and storing the first global features under the index;
extracting key frames from a video to be detected, determining first global features and second global features of the key frames, and converting the second global features of the key frames into second hash values;
when the similarity between a second hash value and a first hash value exceeds a first similarity threshold, retrieving the first global features corresponding to the second hash value, comparing the first global features corresponding to the second hash value with the global features stored under the first hash value, and determining a detection result for the video to be detected from the comparison result.
2. The video detection method according to claim 1, wherein the video frames are processed with a convolutional neural network and the global features are fully connected layer features; the method specifically comprises:
obtaining video frame samples in advance, determining first fully connected layer features and second fully connected layer features of the video frame samples, converting the second fully connected layer features into first hash values, building an index from the first hash values, and storing the floating-point features of the first fully connected layer under the index;
extracting key frames from a video to be detected, determining first fully connected layer floating-point features and second fully connected layer features of the key frames, and converting the second fully connected layer features of the key frames into second hash values;
when the similarity between a second hash value and a first hash value exceeds a first similarity threshold, comparing the first fully connected layer floating-point features corresponding to the second hash value with the fully connected layer floating-point features stored under the first hash value, and determining a detection result for the video to be detected from the comparison result.
3. The video detection method according to claim 2, wherein the video frame samples comprise positive sample pairs and negative samples, and obtaining video frame samples in advance comprises:
obtaining a positive sample pair in advance, the positive sample pair comprising a first image and a second image, the first image and the second image being of the same category;
obtaining in advance a third image whose category differs from that of the positive sample pair, as a negative sample;
wherein the convolutional neural network is trained with the positive sample pair and the negative sample so as to adjust the parameters of the convolutional neural network.
4. The video detection method according to claim 3, wherein obtaining a positive sample pair comprises:
obtaining an image from a video sample as the first image;
taking an image of the video sample within a preset time period of the first image as the second image.
5. The video detection method according to claim 3, wherein obtaining a positive sample pair comprises:
obtaining an image from a video sample as the first image;
applying a form conversion to the first image to determine the second image.
6. The video detection method according to claim 3, wherein obtaining a positive sample pair comprises:
obtaining an image from a video sample as a first candidate image;
taking an image of the video sample within a preset time period of the first candidate image as a second candidate image;
applying form conversions to the first candidate image and the second candidate image respectively to determine a candidate image set;
obtaining two images from the candidate image set as the first image and the second image respectively.
7. The video detection method according to any one of claims 2 to 6, wherein determining the first global features and second global features of the key frames comprises:
inputting the video frame samples into the convolutional neural network, computing the loss function of the convolutional neural network, and making the loss function of the convolutional neural network converge so as to determine the trained convolutional neural network;
inputting the key frames of the video to be detected into the trained convolutional neural network, taking the features determined by the first fully connected layer as the first global features of the key frames, and taking the features determined by the second fully connected layer as the second global features of the key frames.
8. The video detection method according to claim 7, wherein the loss function of the convolutional neural network comprises a distance loss function, a quantization loss function and an information-entropy loss function;
the distance loss function is: L_t = max{(x_a − x_p) + m − (x_a − x_n), 0};
the quantization loss function is: L_q = || |x^(i)| − 1 ||_1;
the information-entropy loss function is as follows:
where L_t denotes the distance loss function; x_a, x_p and x_n denote the features of the two samples of the positive pair and of the negative sample respectively; m denotes the confidence margin of the feature spacing; L_q denotes the quantization loss function; x^(i) denotes any feature of a video frame sample; L_e denotes the information-entropy loss function; and u_d denotes the mean of the features of the video frame samples.
9. The video detection method according to claim 1, wherein extracting key frames from the video to be detected comprises:
extracting a video frame from the video to be detected at every preset time interval, as a key frame of the video to be detected.
10. The video detection method according to claim 1, further comprising:
if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value exceeds a second similarity threshold, determining the uploader identifier of the video to be detected and the uploader identifier of the video corresponding to the first hash value, and issuing a warning message when the two uploader identifiers differ; or
if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value exceeds a third similarity threshold, determining the label of the video corresponding to the first hash value and adding that label to the video to be detected; or
if the similarity between the first global features corresponding to the second hash value and the global features corresponding to the first hash value is below a fourth similarity threshold, uploading the video to be detected.
11. A video detection apparatus, comprising:
a detection preparation module, configured to obtain video frame samples in advance, determine first global features and second global features of the video frame samples, convert the second global features into first hash values, build an index from the first hash values, and store the first global features under the index;
a feature extraction module, configured to extract key frames from a video to be detected, determine first global features and second global features of the key frames, and convert the second global features of the key frames into second hash values;
a feature comparison module, configured to, when the similarity between a second hash value and a first hash value exceeds a first similarity threshold, retrieve the first global features corresponding to the second hash value, compare the first global features corresponding to the second hash value with the global features stored under the first hash value, and determine a detection result for the video to be detected from the comparison result.
12. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the video detection method according to any one of claims 1 to 10.
13. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video detection method according to any one of claims 1 to 10.
CN201910226798.4A 2019-03-25 2019-03-25 Video detecting method and device, computer-readable medium and electronic equipment Pending CN110163079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910226798.4A CN110163079A (en) 2019-03-25 2019-03-25 Video detecting method and device, computer-readable medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN110163079A true CN110163079A (en) 2019-08-23

Family

ID=67638871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910226798.4A Pending CN110163079A (en) 2019-03-25 2019-03-25 Video detecting method and device, computer-readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110163079A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101464909A (en) * 2009-01-20 2009-06-24 清华大学 Fast robust approximately same video detection and exclusion method
CN101711392A (en) * 2007-04-13 2010-05-19 艾法罗媒体有限责任公司 Video detection system and methods
US20120321181A1 (en) * 2011-06-20 2012-12-20 Microsoft Corporation Near-duplicate video retrieval
CN103336795A (en) * 2013-06-09 2013-10-02 华中科技大学 Video indexing method based on multiple features
CN103617233A (en) * 2013-11-26 2014-03-05 烟台中科网络技术研究所 Method and device for detecting repeated video based on semantic content multilayer expression
US20180307706A1 (en) * 2015-12-30 2018-10-25 Huawei Technologies Co., Ltd. Image Query Method and Apparatus
CN109241327A (en) * 2017-07-03 2019-01-18 北大方正集团有限公司 Image search method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Naiguang; SHI Huijie; ZHU Xiaobin: "A multi-modal video copy detection method based on deep hashing", Radio and Television Information (广播电视信息), no. 1, 11 October 2018 (2018-10-11) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712090A (en) * 2019-10-24 2021-04-27 北京易真学思教育科技有限公司 Image processing method, device, equipment and storage medium
CN110996123A (en) * 2019-12-18 2020-04-10 广州市百果园信息技术有限公司 Video processing method, device, equipment and medium
CN111182364A (en) * 2019-12-27 2020-05-19 杭州趣维科技有限公司 Short video copyright detection method and system
CN111294610A (en) * 2020-02-20 2020-06-16 北京奇艺世纪科技有限公司 Video processing method and device
CN111476868B (en) * 2020-04-07 2023-06-23 哈尔滨工业大学 Animation generation model training and animation generation method and device based on deep learning
CN111476868A (en) * 2020-04-07 2020-07-31 哈尔滨工业大学 Animation generation model training and animation generation method and device based on deep learning
CN111507289A (en) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 Video matching method, computer device and storage medium
CN111967303A (en) * 2020-06-30 2020-11-20 西安电子科技大学 Fingerprint template protection method based on maximum-minimum index hash
CN111967303B (en) * 2020-06-30 2023-08-22 西安电子科技大学 Fingerprint template protection method based on maximum-minimum index hash
CN112115286A (en) * 2020-08-06 2020-12-22 国网安徽省电力有限公司电力科学研究院 Robot environment identification method and system based on deep reinforcement learning
CN112115295A (en) * 2020-08-27 2020-12-22 广州华多网络科技有限公司 Video image detection method and device and electronic equipment
CN112215908A (en) * 2020-10-12 2021-01-12 国家计算机网络与信息安全管理中心 Compressed domain-oriented video content comparison system, optimization method and comparison method
CN112381151A (en) * 2020-11-17 2021-02-19 有米科技股份有限公司 Similar video determination method and device
CN112381151B (en) * 2020-11-17 2024-03-22 有米科技股份有限公司 Method and device for determining similar videos
CN113347082A (en) * 2021-08-06 2021-09-03 深圳康易世佳科技有限公司 Method and device for intelligently displaying shared messages containing short videos
CN113559506A (en) * 2021-09-24 2021-10-29 深圳易帆互动科技有限公司 Automatic testing method and device for frame synchronization and readable storage medium

Similar Documents

Publication Publication Date Title
CN110163079A (en) Video detecting method and device, computer-readable medium and electronic equipment
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
CN108197532B Face recognition method, apparatus and computer device
CN108304835A (en) character detecting method and device
CN108171191B (en) Method and apparatus for detecting face
CN109117831A (en) The training method and device of object detection network
US20120027252A1 (en) Hand gesture detection
CN109446990A (en) Method and apparatus for generating information
CN111275784B (en) Method and device for generating image
TW202207077A (en) Text area positioning method and device
CN109308490A (en) Method and apparatus for generating information
CN108564102A Image clustering result evaluation method and apparatus
CN113434716B (en) Cross-modal information retrieval method and device
CN111881777B (en) Video processing method and device
US11804043B2 (en) Detecting objects in a video using attention models
CN110443824A (en) Method and apparatus for generating information
CN109389096A (en) Detection method and device
CN109871749A Pedestrian re-identification method and apparatus based on deep hashing, and computer system
CN110019939A Video popularity prediction method, apparatus, terminal device and medium
WO2023024413A1 (en) Information matching method and apparatus, computer device and readable storage medium
CN109325539B (en) Insulator string drop detection method and device
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN115131698A (en) Video attribute determination method, device, equipment and storage medium
CN113762314A (en) Smoke and fire detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination