CN108615043A - A kind of video classification methods and system - Google Patents
- Publication number: CN108615043A
- Application number: CN201611137674.1A
- Authority
- CN
- China
- Prior art keywords
- classification
- video
- frame
- key frame
- input video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/758—Involving statistics of pixels or of feature values, e.g. histogram matching
Abstract
The embodiment of the invention discloses a video classification method and system. The method includes: the system obtains at least one video segment from an input video; the system obtains the key frame corresponding to each video segment according to the distance characteristics within each video segment; the system performs image classification on each key frame, obtaining the static classification set corresponding to each key frame; and the system obtains the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter. Performing video classification using key frames solves the problem of poor video classification accuracy.
Description
Technical field
The present invention relates to the technical field of video image processing, and in particular to a video classification method and system.
Background technology
With the explosive growth of video data, processing massive video data and extracting the effective information in video content has become a current research hotspot. Video classification technology is one of the key technologies for video content recognition and retrieval.
Current video classification technology is based on low-level image features and motion features of video images. For videos with salient features it can achieve good classification results, but for videos with less distinctive features the classification effect is unsatisfactory. It is therefore necessary to process video data from the perspective of video content, to further improve the accuracy of classification.
Invention content
In order to solve the above technical problems, an embodiment of the present invention is intended to provide a video classification method and system to improve the accuracy of video classification.
The technical solution of the present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a video classification method, the method being used in a video classification system, and the method including:
the system obtains at least one video segment from an input video;
the system obtains the key frame corresponding to each video segment according to the distance characteristics within each video segment;
the system performs image classification on each key frame, obtaining the static classification set corresponding to each key frame;
the system obtains the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter.
In the above embodiment, the system obtaining at least one video segment from the input video specifically includes:
the system segments the input video according to the relationship between the HS two-dimensional color histogram distance of adjacent image frames in the input video and a preset first threshold, and the relationship between the perceptual hash vector distance of adjacent image frames in the input video and a preset second threshold, obtaining at least one video segment.
Further, the system segmenting the input video according to the relationship between the HS two-dimensional color histogram distance of adjacent image frames and the preset first threshold, and the relationship between the perceptual hash vector distance of adjacent image frames and the preset second threshold, to obtain at least one video segment, specifically includes:
the system computes the hue-saturation (HS) two-dimensional color histogram and the perceptual hash vector of adjacent image frames in the input video;
the system computes the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames in the input video from the HS two-dimensional color histograms and the perceptual hash vectors;
when the HS two-dimensional color histogram distance of adjacent image frames in the input video exceeds the preset first threshold, and the perceptual hash vector distance of adjacent image frames in the input video exceeds the preset second threshold, the system segments the input video at that point, obtaining at least one video segment.
Further, the process by which the system computes the HS two-dimensional color histogram specifically includes:
the system converts the image frame from the RGB color space to the hue-saturation-value (HSV) color space, obtaining the hue (H) component and the saturation (S) component;
the system divides the H channel corresponding to the H component and the S channel corresponding to the S component into a intervals each, and computes the HS two-dimensional color histogram of the image frame by statistics.
Further, the process by which the system computes the perceptual hash vector specifically includes:
the system scales the image frame to a preset size, obtaining a first image matrix;
the system performs grayscale conversion on the first image matrix, obtaining a second image matrix;
the system performs a two-dimensional discrete cosine transform (DCT) on the second image matrix, obtaining a third image matrix;
the system selects a preset submatrix of the third image matrix as a fourth image matrix;
the system computes the average pixel value of the fourth image matrix;
the system compares the pixel value of each pixel in the fourth image matrix with the average pixel value, obtaining an updated fourth image matrix;
the system vectorizes the updated fourth image matrix, obtaining the final perceptual hash vector.
In the above embodiment, when the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames do not jointly satisfy the condition that the HS two-dimensional color histogram distance exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the system does not split the input video at that point.
In the above embodiment, the system obtaining the key frame corresponding to each video segment according to the distance characteristics within each video segment specifically includes:
the system obtains the average HS two-dimensional color histogram of each video segment, and obtains the key frame corresponding to each video segment according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segment.
Further, the system compares the average HS two-dimensional color histogram of a video segment with the HS two-dimensional color histograms of all image frames in that video segment, and selects the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segment as the key frame of that video segment.
In the above embodiment, the system performing image classification on each key frame to obtain the static classification set corresponding to each key frame specifically includes:
the system performs image classification on each key frame through an image classifier, and obtains the static classification set corresponding to each key frame according to a preset key frame classification parameter; wherein the image classifier is generated by a deep neural network.
Further, the preset key frame classification parameter is used to limit the number of key frame categories in the static classification set corresponding to each key frame.
In the above embodiment, the system obtaining the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and the preset video classification parameter specifically includes:
the system calculates the time weight of each video segment according to the number of image frames contained in the input video and the number of image frames contained in each video segment;
the system obtains the video classification set of the input video according to the static classification set corresponding to each key frame;
the system obtains the video classification coefficient of each video category in the video classification set according to the relationship between each video category in the video classification set and the static classification set corresponding to each key frame;
the system calculates the video classification weight of each video category in the video classification set according to the time weight of each video segment and the video classification coefficient of each video category in the video classification set;
the system obtains the final classification result of the input video according to the video classification weight of each video category in the video classification set and the preset video classification parameter.
Further, the system takes the union of the static classification sets corresponding to the key frames as the video classification set of the input video.
Further, the preset video classification parameter is used to limit the number of classification results of the input video.
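The aggregation described above can be sketched in code as follows. This is a minimal illustration, not the patented implementation: the function and parameter names are invented for this sketch, and the video classification coefficient is modeled as a simple membership indicator (1 when a category appears in a key frame's static classification set, 0 otherwise), which is only one plausible reading of the "relationship" the text describes.

```python
def classify_video(segment_frame_counts, static_sets, top_n=3):
    """Hypothetical sketch: time-weighted aggregation of per-key-frame
    static classification sets into a video-level classification result."""
    total = sum(segment_frame_counts)
    # time weight of each segment: its frame count over the whole video's
    time_weights = [n / total for n in segment_frame_counts]
    # video classification set: union of the key frames' static sets
    categories = set().union(*static_sets)
    scores = {}
    for c in categories:
        # indicator-style classification coefficient, weighted by time
        scores[c] = sum(w for w, s in zip(time_weights, static_sets) if c in s)
    # the preset video classification parameter limits the result count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For a video of two segments (60 and 40 frames) whose key frames were classified as {sports, outdoor} and {sports}, "sports" scores 1.0 and "outdoor" 0.6, so "sports" ranks first.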
In a second aspect, an embodiment of the present invention provides a video classification system, the system comprising: a first acquisition module, a second acquisition module, a first classification module and a second classification module, wherein:
the first acquisition module is configured to obtain at least one video segment from an input video;
the second acquisition module is configured to obtain the key frame corresponding to each video segment according to the distance characteristics within each video segment;
the first classification module is configured to perform image classification on each key frame, obtaining the static classification set corresponding to each key frame;
the second classification module is configured to obtain the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter.
In the above embodiment, the first acquisition module is specifically configured to segment the input video according to the relationship between the HS two-dimensional color histogram distance of adjacent image frames in the input video and a preset first threshold, and the relationship between the perceptual hash vector distance of adjacent image frames and a preset second threshold, obtaining at least one video segment.
Further, the first acquisition module is specifically configured to:
compute the HS two-dimensional color histogram and the perceptual hash vector of adjacent image frames in the input video;
compute the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames in the input video from the HS two-dimensional color histograms and the perceptual hash vectors;
when the HS two-dimensional color histogram distance of adjacent image frames in the input video exceeds the preset first threshold, and the perceptual hash vector distance of adjacent image frames exceeds the preset second threshold, segment the input video, obtaining at least one video segment.
Further, for the process of computing the HS two-dimensional color histogram, the first acquisition module includes a first conversion submodule and a statistics submodule, wherein:
the first conversion submodule is configured to convert the image frame from the RGB color space to the HSV color space, obtaining the H component and the S component;
the statistics submodule is configured to divide the H channel corresponding to the H component and the S channel corresponding to the S component into a intervals each, and compute the HS two-dimensional color histogram of the image frame by statistics.
Further, for the process of computing the perceptual hash vector, the first acquisition module further includes a scaling submodule, a grayscale processing submodule, a second conversion submodule, a selection submodule, a computation submodule, a comparison submodule and a vectorization submodule, wherein:
the scaling submodule is configured to scale the image frame to a preset size, obtaining a first image matrix;
the grayscale processing submodule is configured to perform grayscale conversion on the first image matrix, obtaining a second image matrix;
the second conversion submodule is configured to perform a two-dimensional DCT on the second image matrix, obtaining a third image matrix;
the selection submodule is configured to select a preset submatrix of the third image matrix as a fourth image matrix;
the computation submodule is configured to compute the average pixel value of the fourth image matrix;
the comparison submodule is configured to compare the pixel value of each pixel in the fourth image matrix with the average pixel value, obtaining an updated fourth image matrix;
the vectorization submodule is configured to vectorize the updated fourth image matrix, obtaining the final perceptual hash vector.
Further, when the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames do not jointly satisfy the condition that the HS two-dimensional color histogram distance exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the first acquisition module does not split the input video at that point.
In the above embodiment, the second acquisition module is specifically configured to obtain the average HS two-dimensional color histogram of each video segment, and obtain the key frame corresponding to each video segment according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segment.
Further, the second acquisition module is specifically configured to compare the average HS two-dimensional color histogram of a video segment with the HS two-dimensional color histograms of all image frames in that video segment, and select the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segment as the key frame of that video segment.
In the above embodiment, the first classification module is specifically configured to perform image classification on each key frame through an image classifier, and obtain the static classification set corresponding to each key frame according to a preset key frame classification parameter; wherein the image classifier is generated by a deep neural network.
Further, the preset key frame classification parameter is used to limit the number of key frame categories in the static classification set corresponding to each key frame.
In the above embodiment, the second classification module includes a first acquisition submodule, a second acquisition submodule, a third acquisition submodule, a fourth acquisition submodule and a fifth acquisition submodule, wherein:
the first acquisition submodule is configured to calculate the time weight of each video segment according to the number of image frames contained in the input video and the number of image frames contained in each video segment;
the second acquisition submodule is configured to obtain the video classification set of the input video according to the static classification set corresponding to each key frame;
the third acquisition submodule is configured to obtain the video classification coefficient of each video category in the video classification set according to the relationship between each video category in the video classification set and the static classification set corresponding to each key frame;
the fourth acquisition submodule is configured to calculate the video classification weight of each video category in the video classification set according to the time weight of each video segment and the video classification coefficient of each video category in the video classification set;
the fifth acquisition submodule is configured to obtain the final classification result of the input video according to the video classification weight of each video category in the video classification set and the preset video classification parameter.
Further, the second acquisition submodule is specifically configured to take the union of the static classification sets corresponding to the key frames as the video classification set of the input video.
Further, in the fifth acquisition submodule, the preset video classification parameter is used to limit the number of classification results of the input video.
An embodiment of the present invention provides a video classification method and system. The method includes: the system obtains at least one video segment from an input video; the system obtains the key frame corresponding to each video segment according to the distance characteristics within each video segment; the system performs image classification on each key frame, obtaining the static classification set corresponding to each key frame; and the system obtains the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter. Performing video classification using key frames solves the problem of poor video classification accuracy.
Description of the drawings
Fig. 1 is a flow chart of a video classification method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of obtaining video segments provided by an embodiment of the present invention;
Fig. 3 is a flow chart of calculating an HS two-dimensional color histogram provided by an embodiment of the present invention;
Fig. 4 is a flow chart of calculating a perceptual hash vector provided by an embodiment of the present invention;
Fig. 5 is a flow chart of obtaining the classification result of an input video provided by an embodiment of the present invention;
Fig. 6 is a flow chart of a specific video classification method provided by an embodiment of the present invention;
Fig. 7 is a detailed flow chart of calculating an HS two-dimensional color histogram provided by an embodiment of the present invention;
Fig. 8 is a detailed flow chart of calculating a perceptual hash vector provided by an embodiment of the present invention;
Fig. 9 is a structural diagram of a video classification system provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of a first acquisition module provided by an embodiment of the present invention;
Fig. 11 is a structural diagram of another first acquisition module provided by an embodiment of the present invention;
Fig. 12 is a structural diagram of a second classification module provided by an embodiment of the present invention.
Specific implementation mode
The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the drawings in the embodiments of the present invention.
The basic idea of the embodiments of the present invention is: the system obtains at least one video segment from an input video; the system obtains the key frame corresponding to each video segment according to the distance characteristics within each video segment; the system performs image classification on each key frame, obtaining the static classification set corresponding to each key frame; and the system obtains the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter. Performing video classification using key frames solves the problem of poor video classification accuracy.
Embodiment one
Referring to Fig. 1, there is shown a video classification method, the method being used in a video classification system, and the method including:
S101: the system obtains at least one video segment from an input video;
S102: the system obtains the key frame corresponding to each video segment according to the distance characteristics within each video segment;
S103: the system performs image classification on each key frame, obtaining the static classification set corresponding to each key frame;
S104: the system obtains the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter.
For step S101, the system obtaining at least one video segment from the input video specifically includes:
the system segments the input video according to the relationship between the HS two-dimensional color histogram distance of adjacent image frames in the input video and a preset first threshold, and the relationship between the perceptual hash vector distance of adjacent image frames in the input video and a preset second threshold, obtaining at least one video segment.
As shown in Fig. 2, the system segmenting the input video according to the relationship between the HS two-dimensional color histogram distance of adjacent image frames and the preset first threshold, and the relationship between the perceptual hash vector distance of adjacent image frames and the preset second threshold, to obtain at least one video segment, specifically includes:
S1011: the system computes the HS two-dimensional color histogram and the perceptual hash vector of adjacent image frames in the input video;
S1012: the system computes the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames in the input video from the HS two-dimensional color histograms and the perceptual hash vectors;
S1013: when the HS two-dimensional color histogram distance of adjacent image frames in the input video exceeds the preset first threshold, and the perceptual hash vector distance of adjacent image frames in the input video exceeds the preset second threshold, the system segments the input video at that point, obtaining at least one video segment.
For step S1011, taking one image frame of a pair of adjacent image frames in the input video as an example, and referring to Fig. 3, the process by which the system computes the HS two-dimensional color histogram is step S301 to step S302, specifically including:
S301: the system converts the image frame from the red-green-blue (RGB, Red Green Blue) color space to the hue-saturation-value (HSV, Hue Saturation Value) color space, obtaining the hue (H, Hue) component and the saturation (S, Saturation) component;
S302: the system divides the H channel corresponding to the H component and the S channel corresponding to the S component into a intervals each, and computes the hue-saturation (HS, Hue Saturation) two-dimensional color histogram of the image frame by statistics.
Specifically, for step S301, the formulas for converting from the RGB color space to the HSV color space to obtain the H component and the S component are shown as formula (1) and formula (2) respectively (the standard RGB-to-HSV conversion, consistent with the constraints stated below):

H = 60×(G−B)/(V−min(R,G,B)) when V = R; H = 120 + 60×(B−R)/(V−min(R,G,B)) when V = G; H = 240 + 60×(R−G)/(V−min(R,G,B)) when V = B    (1)

S = (V − min(R,G,B)) / V    (2)

In formula (1) and formula (2), R denotes the red component of the input video, G denotes the green component of the input video, B denotes the blue component of the input video, H denotes the hue component of the input video, S denotes the saturation component of the input video, and V denotes the value (lightness) component of the input video, with V = max(R, G, B) and 0 ≤ V ≤ 1.
In formula (1), 0 ≤ H ≤ 360; if H < 0, then H = H + 360. In formula (2), 0 ≤ S ≤ 1.
For step S302, preferably, the system divides the H channel corresponding to the H component and the S channel corresponding to the S component into 32 intervals each.
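Steps S301 and S302 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name and the normalization of the histogram are our own choices, and a production system would typically use an image library's built-in color conversion instead of the hand-written one here.

```python
import numpy as np

def hs_histogram(rgb, bins=32):
    """HS two-dimensional color histogram of an RGB image (values in [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)                  # V = max(R, G, B)
    delta = v - rgb.min(axis=-1)
    h = np.zeros_like(v)
    nz = delta > 0
    # piecewise hue formula, result mapped into [0, 360)
    idx = nz & (v == r)
    h[idx] = 60.0 * (g[idx] - b[idx]) / delta[idx]
    idx = nz & (v == g) & (v != r)
    h[idx] = 120.0 + 60.0 * (b[idx] - r[idx]) / delta[idx]
    idx = nz & (v == b) & (v != r) & (v != g)
    h[idx] = 240.0 + 60.0 * (r[idx] - g[idx]) / delta[idx]
    h = np.where(h < 0, h + 360.0, h)
    s = np.where(v > 0, delta / np.maximum(v, 1e-12), 0.0)
    # divide the H and S channels into `bins` intervals each and count
    hist, _, _ = np.histogram2d(h.ravel(), s.ravel(),
                                bins=bins, range=[[0, 360], [0, 1]])
    return hist / hist.sum()              # normalized for comparability
```

For a uniformly red image (H = 0, S = 1), all mass falls in the first H bin and the last S bin.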
For step S1011, taking one image frame of a pair of adjacent image frames in the input video as an example, and referring to Fig. 4, the process by which the system computes the perceptual hash vector is step S401 to step S407, specifically including:
S401: the system scales the image frame to a preset size, obtaining a first image matrix;
S402: the system performs grayscale conversion on the first image matrix, obtaining a second image matrix;
S403: the system performs a two-dimensional DCT on the second image matrix, obtaining a third image matrix;
S404: the system selects a preset submatrix of the third image matrix as a fourth image matrix;
S405: the system computes the average pixel value of the fourth image matrix;
S406: the system compares the pixel value of each pixel in the fourth image matrix with the average pixel value, obtaining an updated fourth image matrix;
S407: the system vectorizes the updated fourth image matrix, obtaining the final perceptual hash vector.
For step S401, preferably, the system scales the image frame to a size of 32 × 32 pixels.
Specifically, for step S403, the discrete cosine transform (DCT, Discrete Cosine Transform) formula is shown in formula (3):

F(u, v) = c(u)·c(v)·Σ(x=0..N−1) Σ(y=0..N−1) I(x, y)·cos[(2x+1)uπ/(2N)]·cos[(2y+1)vπ/(2N)], with c(0) = √(1/N) and c(u) = √(2/N) for u > 0    (3)

where u, v = 0, 1, 2, …, N−1, N is the size to which the system scales the image frame, I(x, y) denotes the pixel value of the image at point (x, y), and F(u, v) denotes the DCT result at point (u, v).
For step S404, preferably, the system selects the 8 × 8 submatrix in the upper-left corner of the third image matrix as the fourth image matrix.
Specifically, for step S406, when the pixel value of a pixel in the fourth image matrix is greater than the average pixel value, the system marks that pixel value as 0; when the pixel value of a pixel in the fourth image matrix is less than the average pixel value, the system marks that pixel value as 1, thereby obtaining the updated fourth image matrix.
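Steps S401 to S407 can be sketched as follows. This is a self-contained NumPy illustration under stated assumptions: the nearest-neighbor resize and the orthonormal DCT matrix are our own implementation choices standing in for whatever image library the system would actually use, and the function names are invented.

```python
import numpy as np

def dct2(a):
    """2-D DCT per formula (3), via the orthonormal transform matrix."""
    n = a.shape[0]
    k = np.arange(n)
    # C[u, x] = c(u) * cos((2x + 1) * u * pi / (2n))
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ a @ c.T

def perceptual_hash(rgb, size=32, sub=8):
    """64-bit perceptual hash vector of an RGB frame (H, W, 3)."""
    # S401: scale to size x size (nearest-neighbor resize)
    h, w = rgb.shape[:2]
    m1 = rgb[np.arange(size) * h // size][:, np.arange(size) * w // size]
    # S402: grayscale
    m2 = m1.astype(float).mean(axis=-1)
    # S403: two-dimensional DCT
    m3 = dct2(m2)
    # S404: keep the low-frequency upper-left sub x sub block
    m4 = m3[:sub, :sub]
    # S405-S406: threshold against the mean (0 above the mean, 1 below)
    bits = (m4 < m4.mean()).astype(int)
    # S407: flatten to the final hash vector
    return bits.ravel()
```

The result is a vector of 8 × 8 = 64 binary elements, ready for the element-wise comparison of step S1012.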
For step S1012, the HS two-dimensional color histogram distance is calculated as shown in formula (4);
where DH denotes the two-dimensional color histogram distance of adjacent image frames, Ht(x, y) denotes the statistical value of the HS two-dimensional color histogram of one image frame of the pair at point (x, y), and Ht+1(x, y) denotes the statistical value of the HS two-dimensional color histogram of the other image frame at point (x, y).
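Formula (4) itself is rendered as an image in the original publication and is not reproduced here. A common instantiation consistent with the description (accumulating a per-bin comparison of Ht(x, y) and Ht+1(x, y) over all points) is the L1 distance, sketched below as an assumption rather than the patent's exact definition:

```python
import numpy as np

def histogram_distance(h_t, h_t1):
    """Assumed L1 form of formula (4): sum over all (x, y) bins of the
    absolute difference between the two frames' HS histogram values."""
    return float(np.abs(h_t - h_t1).sum())
```

Two fully disjoint normalized histograms then yield the maximum distance of 2.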
For step S1012, the perceptual hash vector distance is calculated as shown in formula (5) and formula (6):
in formula (5), Dp denotes the perceptual hash vector distance, Pt(i) denotes the i-th element of the perceptual hash vector of one image frame of the pair, and Pt+1(i) denotes the i-th element of the perceptual hash vector of the other image frame; in formula (6), D(x, x') is defined such that when Pt(i) = Pt+1(i), D(x, x') = 1, and when Pt(i) ≠ Pt+1(i), D(x, x') = 0.
For step S1013, it should be noted that when the HS two-dimensional color histogram distance and the perceptual hash vector distance of adjacent image frames do not jointly satisfy the condition that the HS two-dimensional color histogram distance exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the system does not split the input video at that point.
For step S102, the system obtaining the key frame corresponding to each video segment according to the distance characteristics within each video segment specifically includes:
the system obtains the average HS two-dimensional color histogram of each video segment, and obtains the key frame corresponding to each video segment according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segment.
Specifically, for step S102, the system compares the average HS two-dimensional color histogram of a video segment with the HS two-dimensional color histograms of all image frames in that video segment, and selects the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segment as the key frame of that video segment.
For step S103, the system performs image classification on each key frame and obtains the static classification set corresponding to each key frame, which specifically includes:
The system performs image classification on each key frame through an image classifier, and obtains the static classification set corresponding to each key frame according to a preset key frame classification parameter; wherein the image classifier is generated by a deep neural network.
Specifically, for the generation of the image classifier, the system collects representative image data of each video class as the training data of the classification model; it may also further increase the amount of training data by applying preprocessing operations such as image enhancement, rotation and random cropping to the representative image data. The system inputs the training data into the deep neural network, which may use a deep convolutional neural network model such as AlexNet, GoogLeNet or a similar model to extract the features of the training data, and through continuous iterative training obtains an accurate image classification model as the image classifier.
For step S103, the preset key frame classification parameter is used to limit the number of key frame classes in the static classification set corresponding to each key frame.
The preset key frame classification parameter needs to be determined according to the actual situation; for example, with a deep neural network trained through multiple iterations, the top k classes with the largest probabilities in the classification results output by the deep neural network may be taken as the static classification result of a key frame of the input video.
For step S104, referring to Fig. 5, the system obtains the classification result of the input video according to the key frame corresponding to each video segmentation, the static classification set corresponding to each key frame and the preset video classification parameter, which specifically includes:
S1041: The system calculates the time weight of each video segmentation according to the number of image frames of the input video and the number of image frames of each video segmentation;
S1042: The system obtains the video classification set of the input video according to the static classification set corresponding to each key frame;
S1043: The system obtains the video classification coefficient of each video classification in the video classification set according to the relationship between each video classification in the video classification set and the static classification set corresponding to each key frame;
S1044: The system calculates the video classification weight of each video classification in the video classification set according to the time weight of each video segmentation and the video classification coefficient of each video classification in the video classification set;
S1045: The system obtains the final classification result of the input video according to the video classification weight of each video classification in the video classification set and the preset video classification parameter.
For step S1041, assume the input video consists of N image frames, and among the key frames KF_1, KF_2, ..., KF_n, the video segmentation S_i corresponding to KF_i consists of n_i image frames; the time weight w_i of the video segmentation S_i corresponding to KF_i is calculated as shown in formula (7).
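Formula (7) itself is not reproduced in this text; a minimal Python sketch of step S1041, under the assumption that each segmentation's time weight is simply its share of the input video's frames (w_i = n_i / N), is:

```python
def time_weights(segment_frame_counts):
    """Step S1041 sketch: given n_i, the number of image frames of each video
    segmentation S_i, return the time weight w_i of each segmentation.

    Assumption (formula (7) is not shown in this text): w_i = n_i / N, where N
    is the total number of image frames of the input video."""
    total = sum(segment_frame_counts)  # N
    return [n / total for n in segment_frame_counts]
```

For example, a 10-frame video split into segmentations of 2, 6 and 2 frames would receive time weights 0.2, 0.6 and 0.2.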
For step S1042, specifically, the system takes the union of the static classification sets corresponding to all key frames as the video classification set of the input video.
For step S1043, the video classification coefficient indicates whether the static classification result of key frame KF_i includes the video classification C_x of the video classification set: when the static classification result of key frame KF_i of the input video includes the video classification C_x of the video classification set, S_i(C_x) = 1; when it does not include C_x, S_i(C_x) = 0.
For step S1044, the video classification weight of C_x in the video classification set is calculated as shown in formula (8), where w_i is the time weight of the video segmentation S_i corresponding to KF_i, and S_i(C_x) denotes the video classification coefficient of C_x.
For step S1045, the preset video classification parameter is used to limit the number of classes in the classification result of the input video.
The preset video classification parameter needs to be determined according to the actual situation; for example, the k' video classifications with the largest video classification weights in the video classification set of the input video may be taken as the final classification result of the input video.
This embodiment provides a video classification method: the system obtains at least one video segmentation from the input video; the system obtains the key frame corresponding to each video segmentation according to the distance characteristics within each video segmentation; the system performs image classification on each key frame and obtains the static classification set corresponding to each key frame; the system obtains the classification result of the input video according to the key frame corresponding to each video segmentation, the static classification set corresponding to each key frame and the preset video classification parameter. Performing video classification via key frames solves the problem of poor video classification accuracy.
Embodiment two
Based on the same technical concept as the previous embodiment, referring to Fig. 6, which illustrates a specific video classification method, the method includes:
S601: The system calculates the HS two-dimensional color histogram and the perceptual hash vector of adjacent image frames in the input video;
S602: The system calculates the HS two-dimensional color histogram distance and the perceptual hash vector distance of the adjacent image frames in the input video according to the HS two-dimensional color histograms and the perceptual hash vectors of the adjacent image frames in the input video;
S603: When the HS two-dimensional color histogram distance of the adjacent image frames in the input video exceeds the preset first threshold, and the perceptual hash vector distance of the adjacent image frames in the input video exceeds the preset second threshold, the system segments the input video, obtaining at least one video segmentation;
S604: The system obtains the average HS two-dimensional color histogram of each video segmentation, and obtains the key frame corresponding to each video segmentation according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segmentation;
S605: The system performs image classification on each key frame through the image classifier, and obtains the static classification set corresponding to each key frame according to the preset key frame classification parameter; wherein the image classifier is generated by a deep neural network;
S606: The system calculates the time weight of each video segmentation according to the number of image frames of the input video and the number of image frames of each video segmentation;
S607: The system obtains the video classification set of the input video according to the static classification set corresponding to each key frame;
S608: The system obtains the video classification coefficient of each video classification in the video classification set according to the relationship between each video classification in the video classification set and the static classification set corresponding to each key frame;
S609: The system calculates the video classification weight of each video classification in the video classification set according to the time weight of each video segmentation and the video classification coefficient of each video classification in the video classification set;
S610: The system obtains the final classification result of the input video according to the video classification weight of each video classification in the video classification set and the preset video classification parameter.
For step S601, taking one image frame of the adjacent image frames in the input video as an example and referring to Fig. 7, the process by which the system statistically computes the HS two-dimensional color histogram is steps S701 to S702, specifically:
S701: The system transforms the image frame from the RGB color space into the HSV color space, obtaining the H component and the S component;
S702: The system divides the H channel corresponding to the H component and the S channel corresponding to the S component into a intervals each, and obtains the HS two-dimensional color histogram of the image frame by statistics; preferably, the system divides the H channel corresponding to the H component and the S channel corresponding to the S component into 32 intervals each.
Specifically, for step S701, the formulas for obtaining the H component and the S component when transforming from the RGB color space into the HSV color space are shown in formula (1) and formula (2), respectively.
In formula (1) and formula (2), R denotes the red component of the input video, G denotes the green component of the input video, B denotes the blue component of the input video, H denotes the hue component of the input video, S denotes the saturation component of the input video, and V denotes the lightness component of the input video, with V = max(R, G, B) and 0 ≤ V ≤ 1; in formula (1), 0 ≤ H ≤ 360, and if H < 0, then H = H + 360; in formula (2), 0 ≤ S ≤ 1.
For step S702, for example, the H channel is divided into 32 intervals over [0, 360], namely [0, 11.25), [11.25, 22.5), ..., [348.75, 360]; the S channel is divided into 32 intervals over [0, 1], namely [0, 0.03125), [0.03125, 0.0625), ..., [0.96875, 1]. When the value of the H component falls into one of the intervals [0, 11.25), [11.25, 22.5), ..., [348.75, 360], the statistical value of the corresponding interval is incremented by 1; when the value of the S component falls into one of the intervals [0, 0.03125), [0.03125, 0.0625), ..., [0.96875, 1], the statistical value of the corresponding interval is incremented by 1; the final statistics yield the HS two-dimensional color histogram of the image frame.
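Steps S701 to S702 can be sketched in Python with NumPy (illustrative only, not part of the claims). This sketch uses the standard piecewise max/min hue formula, which may differ in form from the exact formulas (1)-(2) reproduced above, and uses the preferred bin count of 32:

```python
import numpy as np

def hs_histogram(rgb, bins=32):
    """Sketch of steps S701-S702: compute the HS two-dimensional color
    histogram of an RGB image (floats in [0, 1], shape H x W x 3).

    H ([0, 360)) and S ([0, 1]) are each quantized into `bins` equal intervals,
    and each pixel increments the statistical value of its (H, S) interval."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)                                  # V = max(R, G, B)
    c = v - rgb.min(axis=-1)                              # chroma = max - min
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)    # saturation
    h = np.zeros_like(v, dtype=float)
    cc = np.maximum(c, 1e-12)
    rmax = (c > 1e-12) & (v == r)
    gmax = (c > 1e-12) & (v == g) & ~rmax
    bmax = (c > 1e-12) & ~rmax & ~gmax
    h[rmax] = (60.0 * (g - b) / cc)[rmax] % 360.0         # H = H + 360 if H < 0
    h[gmax] = (60.0 * (b - r) / cc + 120.0)[gmax]
    h[bmax] = (60.0 * (r - g) / cc + 240.0)[bmax]
    hi = np.minimum((h / 360.0 * bins).astype(int), bins - 1)
    si = np.minimum((s * bins).astype(int), bins - 1)
    hist = np.zeros((bins, bins), dtype=np.int64)
    np.add.at(hist, (hi, si), 1)                          # one count per pixel
    return hist
```

A frame of pure red pixels (H = 0, S = 1), for instance, places every pixel in the first H interval and the last S interval.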
For step S601, taking one image frame of the adjacent image frames in the input video as an example and referring to Fig. 8, the process of calculating the perceptual hash vector is steps S801 to S807, specifically:
S801: The system scales the image frame to a preset size, obtaining the first image matrix; preferably, the preset size is 32 × 32 pixels;
S802: The system performs image graying processing on the first image matrix, obtaining the second image matrix;
S803: The system performs a two-dimensional DCT transform on the second image matrix, obtaining the third image matrix;
S804: The system selects a preset sub-matrix in the third image matrix as the fourth image matrix; preferably, the preset sub-matrix is the 8 × 8 image matrix in the upper left corner of the third image matrix;
S805: The system calculates the average pixel value of the fourth image matrix;
S806: The system compares the pixel value of each pixel in the fourth image matrix with the average pixel value, obtaining an updated fourth image matrix;
S807: The system vectorizes the updated fourth image matrix, obtaining the final perceptual hash vector.
Specifically, for step S803, the DCT formula is shown in formula (3), where u, v = 0, 1, 2, ..., N-1, N is taken as the size value to which the system scales the image frame (the preset size), I(x, y) denotes the pixel value of the image at point (x, y), and F(u, v) denotes the DCT result at point (u, v).
Specifically, for step S806, when the pixel value of a pixel in the fourth image matrix is greater than the average pixel value, the system marks the pixel value of that pixel in the fourth image matrix as 0; when the pixel value of a pixel in the fourth image matrix is less than the average pixel value, the system marks it as 1, thereby obtaining the updated fourth image matrix.
Specifically, for step S807, the system converts the updated fourth image matrix from an 8 × 8 binary matrix into a 1 × 8² row matrix or an 8² × 1 column matrix; the resulting 1 × 8² row matrix or 8² × 1 column matrix is the final perceptual hash vector.
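Steps S803 to S807 can be sketched as follows (illustrative only; the DCT here is the standard unnormalized DCT-II, assumed to match formula (3), and the input is taken to be the already-scaled and grayed 32 × 32 second image matrix from steps S801-S802):

```python
import numpy as np

def perceptual_hash(gray):
    """Sketch of steps S803-S807 on a 32 x 32 grayscale matrix: apply a 2D DCT,
    keep the upper-left 8 x 8 low-frequency block as the fourth image matrix,
    mark entries above the block's average pixel value as 0 and below it as 1,
    then flatten the 8 x 8 binary matrix into a 64-element hash vector."""
    n = gray.shape[0]
    k = np.arange(n)
    # DCT-II basis: basis[u, x] = cos(pi * (2x + 1) * u / (2N))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    dct = basis @ gray @ basis.T                # F(u, v), step S803
    block = dct[:8, :8]                         # preset 8 x 8 sub-matrix, S804
    bits = (block < block.mean()).astype(int)   # S805-S806: above mean -> 0
    return bits.reshape(-1)                     # S807: 1 x 64 row-vector form
```

On a constant frame only the DC coefficient F(0, 0) exceeds the block average, so the hash is a 0 followed by 63 ones.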
For step S602, the calculation formula of the HS two-dimensional color histogram distance is shown in formula (4), where D_H denotes the two-dimensional color histogram distance of the adjacent image frames, H_t(x, y) denotes the statistical value at point (x, y) of the HS two-dimensional color histogram of one image frame of the adjacent image frames, and H_t+1(x, y) denotes the statistical value at point (x, y) of the HS two-dimensional color histogram of the other image frame.
Specifically, it can be seen from formula (4) that D_H is the sum of the squared differences between the statistical values of the two image frames' HS two-dimensional color histograms at the same points (x, y).
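Formula (4), as described, reduces to a sum of squared per-bin differences; a one-line sketch:

```python
import numpy as np

def hist_distance(h1, h2):
    """Formula (4) sketch: D_H is the sum over all bins (x, y) of the squared
    difference of the two HS two-dimensional color histograms' values."""
    d = np.asarray(h1, dtype=float) - np.asarray(h2, dtype=float)
    return float((d ** 2).sum())
```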
For step S602, the calculation formula of the perceptual hash vector distance is shown in formula (5) and formula (6).
In formula (5), D_p denotes the perceptual hash vector distance, P_t(i) denotes the i-th element of the perceptual hash vector of one image frame of the adjacent image frames, and P_t+1(i) denotes the i-th element of the perceptual hash vector of the other image frame; in formula (6), D(x, x') expresses the relationship between P_t(i) and P_t+1(i): when P_t(i) = P_t+1(i), D(x, x') = 1; when P_t(i) ≠ P_t+1(i), D(x, x') = 0.
Specifically, it can be seen from formula (6) that when the i-th elements of the perceptual hash vectors of the adjacent image frames are equal, D(x, x') = 1, and when they are unequal, D(x, x') = 0; that is, the perceptual hash vectors of the adjacent image frames are compared element by element, the comparison results are summed, and the obtained result is the perceptual hash vector distance D_p.
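Taken literally, formulas (5)-(6) sum an element-wise equality indicator over the two hash vectors (so a larger D_p means more matching elements); a sketch:

```python
def hash_distance(p1, p2):
    """Formulas (5)-(6) sketch: D(x, x') = 1 when the i-th elements of the two
    perceptual hash vectors are equal, else 0; D_p sums these results."""
    return sum(1 if a == b else 0 for a, b in zip(p1, p2))
```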
For step S603, it should be noted that when the HS two-dimensional color histogram distance and the perceptual hash vector distance of the adjacent image frames do not jointly satisfy the condition that the HS two-dimensional color histogram distance exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the system does not split the input video at that position.
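The dual-threshold cut rule of step S603 can be sketched as follows, where t1 and t2 stand for the preset first and second thresholds (their concrete values are implementation choices not fixed by this text):

```python
def cut_points(dh_list, dp_list, t1, t2):
    """Step S603 sketch: dh_list[i] and dp_list[i] are the HS histogram
    distance and perceptual hash vector distance between frames i and i + 1;
    a cut is made after frame i only when BOTH distances exceed their
    thresholds (hypothetical t1, t2)."""
    return [i for i, (dh, dp) in enumerate(zip(dh_list, dp_list))
            if dh > t1 and dp > t2]
```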
Specifically, for step S604, the system compares the average HS two-dimensional color histogram of a video segmentation with the HS two-dimensional color histograms of all image frames in the corresponding video segmentation, and selects the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segmentation as the key frame of the video segmentation.
Illustratively, for steps S602 to S604: when the HS two-dimensional color histogram distance D_H between the adjacent image frames F_t and F_t+1 obtained by the system exceeds the two-dimensional color histogram threshold T1, and the perceptual hash vector distance D_p exceeds the perceptual hash vector threshold T2, the system splits the input video, obtaining one video segmentation; proceeding in the same way, the system finally obtains all video segmentations S_1, S_2, ..., S_n of the input video. The system computes the average HS two-dimensional color histogram of each video segmentation S_i, compares it with the HS two-dimensional color histograms of all image frames in S_i, and selects the image frame whose HS two-dimensional color histogram is closest to the average as the key frame KF_i of the video segmentation S_i; in the same way, the system finally obtains the key frames KF_1, KF_2, ..., KF_n corresponding to all video segmentations S_1, S_2, ..., S_n.
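The key-frame rule of step S604 (the frame whose histogram is nearest, by formula (4), to the segmentation's average histogram) can be sketched as:

```python
import numpy as np

def key_frame_index(segment_hists):
    """Step S604 sketch: segment_hists holds the HS two-dimensional color
    histogram of every image frame of one video segmentation; returns the
    index of the frame whose histogram is closest to the average histogram."""
    hists = np.stack(segment_hists).astype(float)
    avg = hists.mean(axis=0)                       # average HS histogram
    dists = ((hists - avg) ** 2).sum(axis=(1, 2))  # formula (4) to the average
    return int(np.argmin(dists))
```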
For step S605, the preset key frame classification parameter is used to limit the number of key frame classes in the static classification set corresponding to each key frame.
The preset key frame classification parameter needs to be determined according to the actual situation; for example, with a deep neural network trained through multiple iterations, the top k classes with the largest probabilities in the classification results output by the deep neural network may be taken as the static classification result of a key frame of the input video.
For step S605, specifically, for the generation of the image classifier, the system collects representative image data of each video class as the training data of the classification model; it may also further increase the amount of training data by applying preprocessing operations such as image enhancement, rotation and random cropping to the representative image data. The system inputs the training data into the deep neural network, which may use a deep convolutional neural network model such as AlexNet, GoogLeNet or a similar model to extract the features of the training data, and through continuous iterative training obtains an accurate image classification model as the image classifier.
For step S606, assume the input video consists of N image frames, and among the key frames KF_1, KF_2, ..., KF_n, the video segmentation S_i corresponding to KF_i consists of n_i image frames; the time weight of the video segmentation S_i corresponding to KF_i is calculated as shown in formula (7).
For step S607, specifically, the system takes the union of the static classification sets corresponding to all key frames as the video classification set of the input video.
For example, with the preset key frame classification parameter being 3, if the static classification set corresponding to KF_1 is {C_1, C_2, C_5}, the static classification set corresponding to KF_2 is {C_2, C_4, C_6}, ..., and the static classification set corresponding to KF_n is {C_3, C_4, C_5}, then the video classification set of the input video is {C_1, C_2, C_3, C_4, C_5, C_6}.
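The union rule of step S607 is direct to express in code; using the class names of the example above:

```python
def video_class_set(static_sets):
    """Step S607 sketch: the video classification set of the input video is
    the union of the static classification sets of all key frames."""
    union = set()
    for s in static_sets:
        union |= set(s)
    return union
```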
Specifically, for step S608, assume the video classification set of the input video is {C_1, C_2, C_3, C_4, C_5, C_6}, 6 classes in total, and let S_i(C_x) denote the video classification coefficient of C_x; specifically, S_i(C_x) indicates whether the static classification result of key frame KF_i includes the video classification C_x of the video classification set, where C_x ∈ {C_1, C_2, C_3, C_4, C_5, C_6}. When the static classification result of key frame KF_i of the input video includes the video classification C_x of the video classification set, S_i(C_x) = 1; when it does not include C_x, S_i(C_x) = 0.
For step S609, the video classification weight of C_x in the video classification set is calculated as shown in formula (8), where w_i is the time weight of the video segmentation S_i corresponding to KF_i, and S_i(C_x) denotes the video classification coefficient of C_x.
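Since formula (8) is not reproduced in this text, the sketch below assumes the natural reading of steps S608-S609: the weight of C_x is the time-weighted sum of its coefficients, i.e. the sum of w_i · S_i(C_x):

```python
def class_weights(time_weights, static_sets, video_classes):
    """Steps S608-S609 sketch (assumed form of formula (8)):
    W(C_x) = sum_i w_i * S_i(C_x), where S_i(C_x) = 1 iff class C_x appears
    in the static classification set of key frame KF_i, else 0."""
    return {c: sum(w for w, s in zip(time_weights, static_sets) if c in s)
            for c in video_classes}
```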
For step S610, the preset video classification parameter is used to limit the number of classes in the classification result of the input video.
The preset video classification parameter needs to be determined according to the actual situation; for example, the 3 video classifications with the largest weights among the video classification weights corresponding to the video classification set {C_1, C_2, C_3, C_4, C_5, C_6} of the input video may be taken as the final classification result of the input video.
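Step S610's selection of the highest-weighted classes can be sketched as:

```python
def final_classification(weights, k):
    """Step S610 sketch: the preset video classification parameter k limits
    the result to the k video classifications with the largest weights."""
    return sorted(weights, key=weights.get, reverse=True)[:k]
```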
This embodiment provides a specific video classification method: the system calculates the HS two-dimensional color histogram and the perceptual hash vector of the adjacent image frames in the input video; the system calculates the HS two-dimensional color histogram distance and the perceptual hash vector distance of the adjacent image frames according to those histograms and vectors; when the HS two-dimensional color histogram distance of the adjacent image frames exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the system segments the input video, obtaining at least one video segmentation; the system obtains the average HS two-dimensional color histogram of each video segmentation, and obtains the key frame corresponding to each video segmentation according to the distance relationship between each average histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segmentation; the system performs image classification on each key frame through the image classifier and obtains the static classification set corresponding to each key frame according to the preset key frame classification parameter, the image classifier being generated by a deep neural network; the system calculates the time weight of each video segmentation according to the number of image frames of the input video and of each video segmentation; the system obtains the video classification set of the input video according to the static classification sets of the key frames; the system obtains the video classification coefficient of each video classification according to the relationship between each video classification in the video classification set and the static classification sets of the key frames; the system calculates the video classification weight of each video classification according to the time weights and the video classification coefficients; and the system obtains the final classification result of the input video according to the video classification weights and the preset video classification parameter. Performing video classification via key frames solves the problem of poor video classification accuracy.
Embodiment three
Referring to Fig. 9, which illustrates the structure of a video classification system 90, the system comprises: a first acquisition module 901, a second acquisition module 902, a first sort module 903 and a second sort module 904, wherein:
The first acquisition module 901 is configured to obtain at least one video segmentation from the input video;
The second acquisition module 902 is configured to obtain the key frame corresponding to each video segmentation according to the distance characteristics within each video segmentation;
The first sort module 903 is configured to perform image classification on each key frame, obtaining the static classification set corresponding to each key frame;
The second sort module 904 is configured to obtain the classification result of the input video according to the key frame corresponding to each video segmentation, the static classification set corresponding to each key frame and the preset video classification parameter.
The first acquisition module 901 is specifically configured to segment the input video according to the correspondence between the HS two-dimensional color histogram distance of the adjacent image frames in the input video and the preset first threshold and the correspondence between the perceptual hash vector distance of the adjacent image frames in the input video and the preset second threshold, obtaining at least one video segmentation.
Further, the first acquisition module 901 is specifically configured to: calculate the HS two-dimensional color histogram and the perceptual hash vector of the adjacent image frames in the input video; calculate the HS two-dimensional color histogram distance and the perceptual hash vector distance of the adjacent image frames in the input video according to those histograms and vectors; and, when the HS two-dimensional color histogram distance of the adjacent image frames in the input video exceeds the preset first threshold and the perceptual hash vector distance of the adjacent image frames in the input video exceeds the preset second threshold, segment the input video, obtaining at least one video segmentation.
For the process of calculating the HS two-dimensional color histogram, taking one image frame of the adjacent image frames in the input video as an example, the first acquisition module 901 includes a first transformation submodule 9011 and a statistic submodule 9012; Figure 10 is a structure diagram of the first acquisition module 901, wherein:
The first transformation submodule 9011 is configured to transform the image frame from the RGB color space into the HSV color space, obtaining the H component and the S component;
The statistic submodule 9012 is configured to divide the H channel corresponding to the H component and the S channel corresponding to the S component into a intervals each, and obtain the HS two-dimensional color histogram of the image frame by statistics.
Specifically, for the first transformation submodule 9011, the formulas for obtaining the H component and the S component when transforming from the RGB color space into the HSV color space are shown in formula (1) and formula (2), respectively.
In formula (1) and formula (2), R denotes the red component of the input video, G denotes the green component of the input video, B denotes the blue component of the input video, H denotes the hue component of the input video, S denotes the saturation component of the input video, and V denotes the lightness component of the input video, with V = max(R, G, B) and 0 ≤ V ≤ 1; in formula (1), 0 ≤ H ≤ 360, and if H < 0, then H = H + 360; in formula (2), 0 ≤ S ≤ 1.
For the statistic submodule 9012, preferably, the first acquisition module divides the H channel corresponding to the H component and the S channel corresponding to the S component into 32 intervals each.
For the calculation process of the perceptual hash vector, taking one image frame of the adjacent image frames in the input video as an example, the first acquisition module 901 further includes a scaling submodule 9013, a gray processing submodule 9014, a second transformation submodule 9015, a selection submodule 9016, a computation submodule 9017, a comparison submodule 9018 and a vectorization submodule 9019; Figure 11 is another structure diagram of the first acquisition module 901, wherein:
The scaling submodule 9013 is configured to scale the image frame to the preset size, obtaining the first image matrix;
The gray processing submodule 9014 is configured to perform image graying processing on the first image matrix, obtaining the second image matrix;
The second transformation submodule 9015 is configured to perform a two-dimensional DCT transform on the second image matrix, obtaining the third image matrix;
The selection submodule 9016 is configured to select the preset sub-matrix in the third image matrix as the fourth image matrix;
The computation submodule 9017 is configured to calculate the average pixel value of the fourth image matrix;
The comparison submodule 9018 is configured to compare the pixel value of each pixel in the fourth image matrix with the average pixel value, obtaining the updated fourth image matrix;
The vectorization submodule 9019 is configured to vectorize the updated fourth image matrix, obtaining the final perceptual hash vector.
For the scaling submodule 9013, preferably, the image frame is scaled to a size of 32 × 32 pixels.
Specifically, for the second transformation submodule 9015, the DCT formula is shown in formula (3), where u, v = 0, 1, 2, ..., N-1, N is taken as the size value to which the system scales the image frame (the preset size), I(x, y) denotes the pixel value of the image at point (x, y), and F(u, v) denotes the DCT transform result at point (u, v).
For the selection submodule 9016, preferably, the 8 × 8 sub-matrix in the upper left corner of the third image matrix is selected as the fourth image matrix.
Specifically, for the comparison submodule 9018, when the pixel value of a pixel in the fourth image matrix is greater than the average pixel value, the system marks the pixel value of that pixel in the fourth image matrix as 0; when the pixel value of a pixel in the fourth image matrix is less than the average pixel value, the system marks it as 1, thereby obtaining the updated fourth image matrix.
For the first acquisition module 901, the calculation formula of the HS two-dimensional color histogram distance is shown in formula (4), where D_H denotes the two-dimensional color histogram distance of the adjacent image frames, H_t(x, y) denotes the statistical value at point (x, y) of the HS two-dimensional color histogram of one image frame of the adjacent image frames, and H_t+1(x, y) denotes the statistical value at point (x, y) of the HS two-dimensional color histogram of the other image frame.
For the first acquisition module 901, the calculation formula of the perceptual hash vector distance is shown in formula (5) and formula (6). In formula (5), D_p denotes the perceptual hash vector distance, P_t(i) denotes the i-th element of the perceptual hash vector of one image frame of the adjacent image frames, and P_t+1(i) denotes the i-th element of the perceptual hash vector of the other image frame; in formula (6), D(x, x') is defined so that when P_t(i) = P_t+1(i), D(x, x') = 1, and when P_t(i) ≠ P_t+1(i), D(x, x') = 0.
For the first acquisition module 901, it should be noted that when the HS two-dimensional color histogram distance and the perceptual hash vector distance of the adjacent image frames do not jointly satisfy the condition that the HS two-dimensional color histogram distance exceeds the preset first threshold and the perceptual hash vector distance exceeds the preset second threshold, the first acquisition module 901 does not split the input video at that position.
The second acquisition module 902 is specifically configured to obtain the average HS two-dimensional color histogram of each video segmentation, and obtain the key frame corresponding to each video segmentation according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames of the corresponding video segmentation.
Further, the second acquisition module 902 is specifically configured to compare the average HS two-dimensional color histogram of a video segmentation with the HS two-dimensional color histograms of all image frames in the corresponding video segmentation, and select the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segmentation as the key frame of the video segmentation.
The first classification module 903 is specifically configured to perform image classification on each key frame through an image classifier and to obtain the static classification set corresponding to each key frame according to a preset key frame classification parameter, wherein the image classifier is generated by a deep neural network.
Specifically, for the generation of the image classifier, the first classification module 903 is mainly configured to collect representative image data for each video class as the training data of the classification model; the representative image data may also be subjected to preprocessing operations such as image enhancement, rotation and random cropping to further increase the amount of training data. The training data is then fed into a deep neural network; the deep neural network may extract features of the training data using a deep convolutional neural network model such as AlexNet, GoogLeNet or a similar architecture, and an accurate image classification model is obtained through repeated iterative training and used as the image classifier.
For the first classification module 903, the preset key frame classification parameter is used to limit the number of key frame classes in the static classification set corresponding to each key frame. The preset key frame classification parameter needs to be determined according to the actual situation; for example, for a deep neural network trained over many iterations, the k classes with the largest probabilities in the classification result output by the deep neural network may be taken as the static classification result of an input video key frame.
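The top-k selection just described can be sketched as follows, taking the classifier output as a mapping from class name to probability; k is the preset key frame classification parameter.

```python
def static_classification(class_probs, k):
    """Keep the k classes with the largest probabilities as the key frame's
    static classification set. `class_probs` maps class name -> probability,
    as would come from the deep neural network's softmax output."""
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    return set(ranked[:k])
```

For example, with k = 2 a key frame whose classifier output favors "football" and "grass" would receive the static classification set {"football", "grass"}.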
As shown in figure 12, the second classification module 904 comprises a first acquisition submodule 9041, a second acquisition submodule 9042, a third acquisition submodule 9043, a fourth acquisition submodule 9044 and a fifth acquisition submodule 9045, wherein:
the first acquisition submodule 9041 is configured to calculate the time weight of each video segment according to the number of image frames in the input video and the number of image frames in each video segment;
the second acquisition submodule 9042 is configured to obtain the video classification set of the input video according to the static classification set corresponding to each key frame;
the third acquisition submodule 9043 is configured to obtain the video classification coefficient of each video class in the video classification set according to the relationship between each video class in the video classification set and the static classification set corresponding to each key frame;
the fourth acquisition submodule 9044 is configured to calculate the video classification weight of each video class in the video classification set according to the time weight of each video segment and the video classification coefficient of each video class in the video classification set; and
the fifth acquisition submodule 9045 is configured to obtain the final classification result of the input video according to the video classification weight of each video class in the video classification set and the preset video classification parameter.
For the first acquisition submodule 9041, assume that the input video consists of N image frames and that, among the key frames KF1, KF2, ..., KFn, key frame KFi corresponds to a video segment Si consisting of ni image frames; the time weight wi of the video segment Si corresponding to KFi is calculated as shown in formula (7).
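Formula (7) is not reproduced in this excerpt; given that wi depends only on ni and N, the proportional weighting wi = ni / N sketched below is an assumption consistent with those named inputs.

```python
def time_weights(segment_lengths):
    """Assumed form of formula (7): w_i = n_i / N, i.e. each video segment's
    share of the input video's N image frames. `segment_lengths` is the list
    of n_i values; the weights sum to 1."""
    total = sum(segment_lengths)  # N, the total number of image frames
    return [n / total for n in segment_lengths]
```

Under this reading, longer segments contribute proportionally more to the video-level classification weights of formula (8).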
For the second acquisition submodule 9042, specifically, the union of the static classification sets corresponding to the key frames is taken as the video classification set of the input video.
For the third acquisition submodule 9043, Si(Cx) denotes the video classification coefficient of Cx; specifically, Si(Cx) indicates whether the static classification result of key frame KFi contains the video class Cx of the video classification set: when the static classification result of input video key frame KFi contains the video class Cx of the video classification set, Si(Cx) = 1; when it does not, Si(Cx) = 0.
For the fourth acquisition submodule 9044, the video classification weight WCx of a video class Cx in the video classification set is calculated as shown in formula (8), where wi is the time weight of the video segment Si corresponding to KFi, and Si(Cx) denotes the video classification coefficient of Cx.
For the fifth acquisition submodule 9045, the preset video classification parameter is used to limit the number of classification results of the input video. The preset video classification parameter needs to be determined according to the actual situation; for example, the k' video classes with the largest video classification weights in the video classification set of the input video may be taken as the final classification result of the input video.
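Submodules 9044 and 9045 can be sketched together. Formula (8) is not reproduced in this excerpt; the time-weighted sum WCx = Σi wi · Si(Cx) below is an assumption consistent with the named inputs wi and Si(Cx).

```python
def video_class_weights(weights, static_sets, class_set):
    """Assumed formula (8): W_{C_x} = sum_i w_i * S_i(C_x), a time-weighted
    vote for each video class. `weights` are the segment time weights w_i,
    `static_sets` the key frames' static classification sets."""
    return {
        c: sum(w for w, s in zip(weights, static_sets) if c in s)
        for c in class_set
    }

def final_classification(class_weights, k_prime):
    """Submodule 9045: keep the k' classes with the largest weights
    (k' is the preset video classification parameter)."""
    ranked = sorted(class_weights, key=class_weights.get, reverse=True)
    return ranked[:k_prime]
```

A class appearing in the key frames of long segments thus outweighs one appearing only in short segments.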
Specifically, in the present embodiment, the functions of the first acquisition module 901, the second acquisition module 902, the first classification module 903 and the second classification module 904 can be realized by a processor of the system 90 calling a program or prestored data in a memory. In practical applications, the above processor may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller and a microprocessor. It is to be appreciated that, for different systems, the electronic device used to realize the above processor functions may also be of another type, which is not specifically limited in the embodiments of the present invention.
The present embodiment provides a video classification system in which the first acquisition module 901 is configured to obtain at least one video segment from an input video; the second acquisition module 902 is configured to obtain, according to a distance characteristic within each video segment, the key frame corresponding to each video segment; the first classification module 903 is configured to perform image classification on each key frame to obtain the static classification set corresponding to each key frame; and the second classification module 904 is configured to obtain the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame and a preset video classification parameter. Performing video classification by means of key frames solves the problem of poor video classification accuracy.
Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the scope of the present invention.
Claims (26)
1. A video classification method, applied to a video classification system, the method comprising:
the system obtaining at least one video segment from an input video;
the system obtaining, according to a distance characteristic within each video segment, a key frame corresponding to each video segment;
the system performing image classification on each key frame to obtain a static classification set corresponding to each key frame; and
the system obtaining a classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter.
2. The method according to claim 1, wherein the system obtaining at least one video segment from the input video specifically comprises:
the system segmenting the input video according to a correspondence between an HS two-dimensional color histogram distance of adjacent image frames in the input video and a preset first threshold, and a correspondence between a perceptual hash vector distance of adjacent image frames in the input video and a preset second threshold, to obtain at least one video segment.
3. The method according to claim 2, wherein the system segmenting the input video according to the correspondence between the HS two-dimensional color histogram distance of adjacent image frames in the input video and the preset first threshold and the correspondence between the perceptual hash vector distance of adjacent image frames in the input video and the preset second threshold, to obtain at least one video segment, specifically comprises:
the system calculating the hue-saturation (HS) two-dimensional color histograms and perceptual hash vectors of adjacent image frames in the input video;
the system calculating, according to the HS two-dimensional color histograms and perceptual hash vectors of the adjacent image frames in the input video, the HS two-dimensional color histogram distance and perceptual hash vector distance of the adjacent image frames in the input video; and
when the HS two-dimensional color histogram distance of the adjacent image frames in the input video exceeds the preset first threshold and the perceptual hash vector distance of the adjacent image frames in the input video exceeds the preset second threshold, the system segmenting the input video to obtain at least one video segment.
4. The method according to claim 3, wherein the process of the system calculating an HS two-dimensional color histogram specifically comprises:
the system transforming an image frame from the red-green-blue (RGB) color space into the hue-saturation-value (HSV) color space to obtain a hue (H) component and a saturation (S) component; and
the system dividing the H channel corresponding to the H component and the S channel corresponding to the S component into a sections each, and obtaining the HS two-dimensional color histogram of the image frame by counting pixels.
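The histogram computation recited in claim 4 can be sketched as follows; the choice of a = 8 sections per channel, the standard RGB-to-HSV conversion, and the normalization of the counts are illustrative assumptions not fixed by the claim.

```python
import numpy as np

def hs_histogram(rgb, a=8):
    """HS two-dimensional color histogram of one image frame: convert RGB to
    HSV, divide the H and S channels into `a` sections each, and count the
    pixels falling in each (H, S) cell. `a` = 8 is an arbitrary choice."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc, minc = rgb.max(axis=-1), rgb.min(axis=-1)
    # Standard RGB -> HSV: saturation, then hue mapped into [0, 1)
    s = np.where(maxc > 0, (maxc - minc) / np.maximum(maxc, 1e-12), 0.0)
    delta = np.maximum(maxc - minc, 1e-12)
    h = np.select(
        [maxc == r, maxc == g],
        [((g - b) / delta) % 6, (b - r) / delta + 2],
        default=(r - g) / delta + 4,
    ) / 6.0
    h = np.where(maxc == minc, 0.0, h) % 1.0
    hist, _, _ = np.histogram2d(h.ravel(), s.ravel(),
                                bins=a, range=[[0, 1], [0, 1]])
    return hist / hist.sum()  # normalized a x a HS histogram
```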
5. The method according to claim 3, wherein the process of the system calculating a perceptual hash vector specifically comprises:
the system scaling an image frame to a preset size to obtain a first image matrix;
the system performing grayscale processing on the first image matrix to obtain a second image matrix;
the system performing a two-dimensional discrete cosine transform (DCT) on the second image matrix to obtain a third image matrix;
the system selecting a preset sub-matrix of the third image matrix as a fourth image matrix;
the system calculating the average pixel value of the fourth image matrix;
the system comparing the value of each pixel in the fourth image matrix with the average pixel value to obtain an updated fourth image matrix; and
the system vectorizing the updated fourth image matrix to obtain the final perceptual hash vector.
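The steps recited in claim 5 can be sketched as follows. The 32x32 preset size, the top-left 8x8 low-frequency sub-matrix, and nearest-neighbour scaling are common perceptual-hash choices and are assumptions here, since the claim does not fix them; the input is assumed to already be a grayscale matrix, so the grayscale step is implicit.

```python
import numpy as np

def dct2(m):
    """Two-dimensional DCT-II of a square matrix via matrix products."""
    n = m.shape[0]
    k = np.arange(n)[:, None]
    c = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k / n)
    return c @ m @ c.T

def perceptual_hash(gray, size=32, sub=8):
    """Claim 5 sketch: scale to a preset size, apply the 2-D DCT, keep a
    preset sub-matrix, compare each coefficient with the block's average,
    and flatten to a 0/1 vector."""
    gray = np.asarray(gray, dtype=float)
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    scaled = gray[np.ix_(rows, cols)]   # first/second image matrix
    coeffs = dct2(scaled)               # third image matrix
    block = coeffs[:sub, :sub]          # fourth image matrix (assumed top-left)
    bits = (block > block.mean()).astype(int)  # updated fourth image matrix
    return bits.ravel()                 # final perceptual hash vector
```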
6. The method according to claim 2, wherein, when the correspondence between the HS two-dimensional color histogram distance of the adjacent image frames and the preset first threshold and the correspondence between the perceptual hash vector distance of the adjacent image frames and the preset second threshold do not satisfy the condition that the HS two-dimensional color histogram distance of the adjacent image frames exceeds the preset first threshold and the perceptual hash vector distance of the adjacent image frames exceeds the preset second threshold, the system does not split the input video.
7. The method according to claim 1, wherein the system obtaining, according to the distance characteristic within each video segment, the key frame corresponding to each video segment specifically comprises:
the system obtaining the average HS two-dimensional color histogram of each video segment, and obtaining the key frame corresponding to each video segment according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames in the corresponding video segment.
8. The method according to claim 7, wherein the system compares the average HS two-dimensional color histogram of a video segment with the HS two-dimensional color histograms of all image frames in that video segment, and selects as the key frame of the video segment the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segment.
9. The method according to claim 1, wherein the system performing image classification on each key frame to obtain the static classification set corresponding to each key frame specifically comprises:
the system performing image classification on each key frame through an image classifier, and obtaining the static classification set corresponding to each key frame according to a preset key frame classification parameter, wherein the image classifier is generated by a deep neural network.
10. The method according to claim 9, wherein the preset key frame classification parameter is used to limit the number of key frame classes in the static classification set corresponding to each key frame.
11. The method according to claim 1, wherein the system obtaining the classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and the preset video classification parameter specifically comprises:
the system calculating the time weight of each video segment according to the number of image frames in the input video and the number of image frames in each video segment;
the system obtaining the video classification set of the input video according to the static classification set corresponding to each key frame;
the system obtaining the video classification coefficient of each video class in the video classification set according to the relationship between each video class in the video classification set and the static classification set corresponding to each key frame;
the system calculating the video classification weight of each video class in the video classification set according to the time weight of each video segment and the video classification coefficient of each video class in the video classification set; and
the system obtaining the final classification result of the input video according to the video classification weight of each video class in the video classification set and the preset video classification parameter.
12. The method according to claim 11, wherein the system takes the union of the static classification sets corresponding to the key frames as the video classification set of the input video.
13. The method according to claim 11, wherein the preset video classification parameter is used to limit the number of classification results of the input video.
14. A video classification system, comprising a first acquisition module, a second acquisition module, a first classification module and a second classification module, wherein:
the first acquisition module is configured to obtain at least one video segment from an input video;
the second acquisition module is configured to obtain, according to a distance characteristic within each video segment, a key frame corresponding to each video segment;
the first classification module is configured to perform image classification on each key frame to obtain a static classification set corresponding to each key frame; and
the second classification module is configured to obtain a classification result of the input video according to the key frame corresponding to each video segment, the static classification set corresponding to each key frame, and a preset video classification parameter.
15. The system according to claim 14, wherein the first acquisition module is specifically configured to segment the input video according to a correspondence between an HS two-dimensional color histogram distance of adjacent image frames in the input video and a preset first threshold, and a correspondence between a perceptual hash vector distance of adjacent image frames in the input video and a preset second threshold, to obtain at least one video segment.
16. The system according to claim 15, wherein the first acquisition module is specifically configured to:
calculate the HS two-dimensional color histograms and perceptual hash vectors of adjacent image frames in the input video;
calculate, according to the HS two-dimensional color histograms and perceptual hash vectors of the adjacent image frames in the input video, the HS two-dimensional color histogram distance and perceptual hash vector distance of the adjacent image frames in the input video; and
when the HS two-dimensional color histogram distance of the adjacent image frames in the input video exceeds the preset first threshold and the perceptual hash vector distance of the adjacent image frames in the input video exceeds the preset second threshold, segment the input video to obtain at least one video segment.
17. The system according to claim 16, wherein, for calculating an HS two-dimensional color histogram, the first acquisition module comprises a first transformation submodule and a statistics submodule, wherein:
the first transformation submodule is configured to transform an image frame from the RGB color space into the HSV color space to obtain an H component and an S component; and
the statistics submodule is configured to divide the H channel corresponding to the H component and the S channel corresponding to the S component into a sections each, and to obtain the HS two-dimensional color histogram of the image frame by counting pixels.
18. The system according to claim 16, wherein, for calculating a perceptual hash vector, the first acquisition module further comprises a scaling submodule, a grayscale processing submodule, a second transformation submodule, a selection submodule, a calculation submodule, a comparison submodule and a vectorization submodule, wherein:
the scaling submodule is configured to scale an image frame to a preset size to obtain a first image matrix;
the grayscale processing submodule is configured to perform grayscale processing on the first image matrix to obtain a second image matrix;
the second transformation submodule is configured to perform a two-dimensional DCT on the second image matrix to obtain a third image matrix;
the selection submodule is configured to select a preset sub-matrix of the third image matrix as a fourth image matrix;
the calculation submodule is configured to calculate the average pixel value of the fourth image matrix;
the comparison submodule is configured to compare the value of each pixel in the fourth image matrix with the average pixel value to obtain an updated fourth image matrix; and
the vectorization submodule is configured to vectorize the updated fourth image matrix to obtain the final perceptual hash vector.
19. The system according to claim 15, wherein, when the correspondence between the HS two-dimensional color histogram distance of the adjacent image frames and the preset first threshold and the correspondence between the perceptual hash vector distance of the adjacent image frames and the preset second threshold do not satisfy the condition that the HS two-dimensional color histogram distance of the adjacent image frames exceeds the preset first threshold and the perceptual hash vector distance of the adjacent image frames exceeds the preset second threshold, the first acquisition module does not split the input video.
20. The system according to claim 14, wherein the second acquisition module is specifically configured to obtain the average HS two-dimensional color histogram of each video segment and to obtain the key frame corresponding to each video segment according to the distance relationship between each average HS two-dimensional color histogram and the HS two-dimensional color histograms of the image frames in the corresponding video segment.
21. The system according to claim 20, wherein the second acquisition module is specifically configured to compare the average HS two-dimensional color histogram of a video segment with the HS two-dimensional color histograms of all image frames in that video segment, and to select as the key frame of the video segment the image frame whose HS two-dimensional color histogram is closest to the average HS two-dimensional color histogram of the video segment.
22. The system according to claim 14, wherein the first classification module is specifically configured to perform image classification on each key frame through an image classifier and to obtain the static classification set corresponding to each key frame according to a preset key frame classification parameter, wherein the image classifier is generated by a deep neural network.
23. The system according to claim 22, wherein the preset key frame classification parameter is used to limit the number of key frame classes in the static classification set corresponding to each key frame.
24. The system according to claim 14, wherein the second classification module comprises a first acquisition submodule, a second acquisition submodule, a third acquisition submodule, a fourth acquisition submodule and a fifth acquisition submodule, wherein:
the first acquisition submodule is configured to calculate the time weight of each video segment according to the number of image frames in the input video and the number of image frames in each video segment;
the second acquisition submodule is configured to obtain the video classification set of the input video according to the static classification set corresponding to each key frame;
the third acquisition submodule is configured to obtain the video classification coefficient of each video class in the video classification set according to the relationship between each video class in the video classification set and the static classification set corresponding to each key frame;
the fourth acquisition submodule is configured to calculate the video classification weight of each video class in the video classification set according to the time weight of each video segment and the video classification coefficient of each video class in the video classification set; and
the fifth acquisition submodule is configured to obtain the final classification result of the input video according to the video classification weight of each video class in the video classification set and the preset video classification parameter.
25. The system according to claim 24, wherein the second acquisition submodule is specifically configured to take the union of the static classification sets corresponding to the key frames as the video classification set of the input video.
26. The system according to claim 24, wherein, in the fifth acquisition submodule, the preset video classification parameter is used to limit the number of classification results of the input video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611137674.1A CN108615043B (en) | 2016-12-12 | 2016-12-12 | Video classification method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611137674.1A CN108615043B (en) | 2016-12-12 | 2016-12-12 | Video classification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108615043A true CN108615043A (en) | 2018-10-02 |
CN108615043B CN108615043B (en) | 2021-05-25 |
Family
ID=63643854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611137674.1A Active CN108615043B (en) | 2016-12-12 | 2016-12-12 | Video classification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108615043B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109840509A (en) * | 2019-02-15 | 2019-06-04 | 北京工业大学 | The multi-level collaboration recognition methods of bad main broadcaster and device in network direct broadcasting video |
CN110414335A (en) * | 2019-06-20 | 2019-11-05 | 北京奇艺世纪科技有限公司 | Video frequency identifying method, device and computer readable storage medium |
WO2020107625A1 (en) * | 2018-11-27 | 2020-06-04 | 北京微播视界科技有限公司 | Video classification method and apparatus, electronic device, and computer readable storage medium |
CN112380954A (en) * | 2020-11-10 | 2021-02-19 | 四川长虹电器股份有限公司 | Video classification intercepting system and method based on image recognition |
CN112861609A (en) * | 2020-12-30 | 2021-05-28 | 中国电子科技集团公司信息科学研究院 | Method for improving multi-thread content key frame identification efficiency |
CN115205768A (en) * | 2022-09-16 | 2022-10-18 | 山东百盟信息技术有限公司 | Video classification method based on resolution self-adaptive network |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6381363B1 (en) * | 1999-03-15 | 2002-04-30 | Grass Valley (U.S.), Inc. | Histogram-based segmentation of images and video via color moments |
US20070183497A1 (en) * | 2006-02-03 | 2007-08-09 | Jiebo Luo | Extracting key frame candidates from video clip |
CN101604325A (en) * | 2009-07-17 | 2009-12-16 | 北京邮电大学 | Method for classifying sports video based on key frame of main scene lens |
CN102663015A (en) * | 2012-03-21 | 2012-09-12 | 上海大学 | Video semantic labeling method based on characteristics bag models and supervised learning |
US20120237126A1 (en) * | 2011-03-16 | 2012-09-20 | Electronics & Telecommunications Research Institute | Apparatus and method for determining characteristic of motion picture |
CN103593464A (en) * | 2013-11-25 | 2014-02-19 | 华中科技大学 | Video fingerprint detecting and video sequence matching method and system based on visual features |
US20140210944A1 (en) * | 2013-01-30 | 2014-07-31 | Samsung Electronics Co., Ltd. | Method and apparatus for converting 2d video to 3d video |
CN104123396A (en) * | 2014-08-15 | 2014-10-29 | 三星电子(中国)研发中心 | Soccer video abstract generation method and device based on cloud television |
CN105069042A (en) * | 2015-07-23 | 2015-11-18 | 北京航空航天大学 | Content-based data retrieval methods for unmanned aerial vehicle spying images |
CN106156284A (en) * | 2016-06-24 | 2016-11-23 | 合肥工业大学 | Video retrieval method is closely repeated based on random the extensive of various visual angles Hash |
-
2016
- 2016-12-12 CN CN201611137674.1A patent/CN108615043B/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6381363B1 (en) * | 1999-03-15 | 2002-04-30 | Grass Valley (U.S.), Inc. | Histogram-based segmentation of images and video via color moments |
US20070183497A1 (en) * | 2006-02-03 | 2007-08-09 | Jiebo Luo | Extracting key frame candidates from video clip |
CN101604325A (en) * | 2009-07-17 | 2009-12-16 | 北京邮电大学 | Method for classifying sports video based on key frame of main scene lens |
US20120237126A1 (en) * | 2011-03-16 | 2012-09-20 | Electronics & Telecommunications Research Institute | Apparatus and method for determining characteristic of motion picture |
CN102663015A (en) * | 2012-03-21 | 2012-09-12 | 上海大学 | Video semantic labeling method based on characteristics bag models and supervised learning |
US20140210944A1 (en) * | 2013-01-30 | 2014-07-31 | Samsung Electronics Co., Ltd. | Method and apparatus for converting 2d video to 3d video |
CN103593464A (en) * | 2013-11-25 | 2014-02-19 | 华中科技大学 | Video fingerprint detecting and video sequence matching method and system based on visual features |
CN104123396A (en) * | 2014-08-15 | 2014-10-29 | 三星电子(中国)研发中心 | Soccer video abstract generation method and device based on cloud television |
CN105069042A (en) * | 2015-07-23 | 2015-11-18 | 北京航空航天大学 | Content-based data retrieval methods for unmanned aerial vehicle spying images |
CN106156284A (en) * | 2016-06-24 | 2016-11-23 | 合肥工业大学 | Video retrieval method is closely repeated based on random the extensive of various visual angles Hash |
Non-Patent Citations (3)
Title |
---|
SHIYAN HU: ""Efficient video retrieval by locality sensitive hashing"", 《PROCEEDINGS. (ICASSP "05). IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2005》 * |
YIFENG LIU等: ""Segmentation by weighted aggregation and perceptual hash for pedestrian detection"", 《JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION》 * |
XIHUA UNIVERSITY: ""Color feature matching search combined with cropping techniques"", 《JOURNAL OF YIBIN UNIVERSITY (宜宾学院学报)》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020107625A1 (en) * | 2018-11-27 | 2020-06-04 | 北京微播视界科技有限公司 | Video classification method and apparatus, electronic device, and computer readable storage medium |
CN109840509A (en) * | 2019-02-15 | 2019-06-04 | 北京工业大学 | The multi-level collaboration recognition methods of bad main broadcaster and device in network direct broadcasting video |
CN110414335A (en) * | 2019-06-20 | 2019-11-05 | 北京奇艺世纪科技有限公司 | Video frequency identifying method, device and computer readable storage medium |
CN112380954A (en) * | 2020-11-10 | 2021-02-19 | 四川长虹电器股份有限公司 | Video classification intercepting system and method based on image recognition |
CN112861609A (en) * | 2020-12-30 | 2021-05-28 | 中国电子科技集团公司信息科学研究院 | Method for improving multi-thread content key frame identification efficiency |
CN112861609B (en) * | 2020-12-30 | 2024-04-09 | 中国电子科技集团公司信息科学研究院 | Method for improving the efficiency of multi-thread content key frame identification |
CN115205768A (en) * | 2022-09-16 | 2022-10-18 | 山东百盟信息技术有限公司 | Video classification method based on resolution self-adaptive network |
Also Published As
Publication number | Publication date |
---|---|
CN108615043B (en) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108615043A (en) | A kind of video classification methods and system | |
CN107767413B (en) | Image depth estimation method based on convolutional neural network | |
Li et al. | Investigation on cost assignment in spatial image steganography | |
CN109493350A (en) | Portrait dividing method and device | |
EP1280107A2 (en) | Quality based image compression | |
CN106851437A (en) | A kind of method for extracting video frequency abstract | |
Rathore et al. | Colour based image segmentation using L* a* b* colour space based on genetic algorithm | |
Diana Andrushia et al. | Saliency-based image compression using Walsh–Hadamard transform (WHT) | |
CN110136166B (en) | Automatic tracking method for multi-channel pictures | |
CN107481236A (en) | A kind of quality evaluating method of screen picture | |
CN110136083B (en) | Base map updating method and device combined with interaction | |
CN110415207A (en) | A method of the image quality measure based on image fault type | |
CN113301408A (en) | Video data processing method and device, electronic equipment and readable storage medium | |
CN111179370B (en) | Picture generation method and device, electronic equipment and storage medium | |
JP6387026B2 (en) | Book searching apparatus, method and program | |
CN111641822A (en) | Method for evaluating quality of repositioning stereo image | |
Miyata et al. | Novel inverse colorization for image compression | |
CN105678718A (en) | Method and device for image denoising | |
US6510242B1 (en) | Color image enhancement during YCbCr to RGB color conversion | |
KR100340035B1 (en) | Image similarity weight adjustment apparatus and method, and content-based image search system and method using same | |
CN108230411B (en) | Method and device for detecting tampered image | |
CN104574320B (en) | A kind of image super-resolution restored method based on sparse coding coefficients match | |
CN110647898B (en) | Image processing method, image processing device, electronic equipment and computer storage medium | |
CN105989029A (en) | Image searching method and image searching system | |
CN113395407A (en) | Image processing apparatus, image processing method, and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Building A01, No. 1600 Yuhangtang Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 310012
Applicant after: CHINA MOBILE (HANGZHOU) INFORMATION TECHNOLOGY Co.,Ltd.
Applicant after: China Mobile Communications Corp.
Address before: No. 14, Building 3, Chang Torch Hotel, No. 259 Wensanlu Road, Xihu District, Hangzhou, Zhejiang, 310012
Applicant before: CHINA MOBILE (HANGZHOU) INFORMATION TECHNOLOGY Co.,Ltd.
Applicant before: China Mobile Communications Corp.
GR01 | Patent grant | ||