CN110377787A - Video classification method, apparatus, and computer-readable storage medium - Google Patents

Video classification method, apparatus, and computer-readable storage medium

Info

Publication number
CN110377787A
CN110377787A (application CN201910545220.5A)
Authority
CN
China
Prior art keywords
attitude data
video
target
posture
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910545220.5A
Other languages
Chinese (zh)
Other versions
CN110377787B (en)
Inventor
杨洋 (Yang Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910545220.5A
Publication of CN110377787A
Application granted
Publication of CN110377787B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/73 — Querying
    • G06F16/735 — Filtering based on additional data, e.g. user or group profiles
    • G06F16/75 — Clustering; Classification
    • G06F16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 — Retrieval characterised by using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a video classification method, apparatus, and computer-readable storage medium, comprising: extracting video pose data from a target video; matching the video pose data against seed pose data contained in a preset pose retrieval library, which stores correspondences between pose data and pose classification categories, to determine the pose classification category corresponding to the video pose data; and taking that pose classification category as the classification category of the target video. By matching video pose data against seed pose data in the pose dimension, the invention classifies the target video in the pose dimension alone. Interference from information in other dimensions is reduced, which reduces the amount of computation and addresses the heavy computational load of feature-similarity calculation. In addition, matching on pose data avoids frame-by-frame feature extraction and improves the efficiency of classifying the target video.

Description

Video classification method, apparatus, and computer-readable storage medium
Technical field
The invention belongs to the field of computer technology, and in particular relates to a video classification method, an apparatus, and a computer-readable storage medium.
Background technique
In long-form videos, characters in certain scenes often perform exaggerated limb movements, and the video segments containing such content are often the highlight segments. For video recommendation, highlight extraction, and video tagging, being able to efficiently and accurately locate the video segments corresponding to these exaggerated poses, and to classify those segments, is of significant value to the video business.
In the prior art, many highlight frames appear in a video, and the segments corresponding to these frames are extracted and classified. Current approaches typically use a classification model that analyzes features of static images. Specifically, the model contains manually defined templates, i.e., sets of highlight image features, with a manually defined class label for each set, and each set contains multiple highlight image features. The model extracts image features from every video frame, matches these image features against the preset image features in the highlight feature sets, and determines the category of the video and its highlight segments according to the class labels of the matched feature sets.
However, this video classification process requires frame-by-frame feature extraction and feature-similarity computation on the video frames, which involves a large amount of calculation and therefore low processing efficiency.
Summary of the invention
In view of this, the present invention provides a video classification method, apparatus, and computer-readable storage medium that, to some extent, address the heavy computation and low processing efficiency of video classification in current schemes.
According to a first aspect of the present invention, a video classification method is provided, which may include:
extracting video pose data from a target video;
matching the video pose data against seed pose data contained in a preset pose retrieval library to determine the pose classification category corresponding to the video pose data, the pose retrieval library containing correspondences between the seed pose data and pose classification categories;
taking the pose classification category corresponding to the video pose data as the classification category of the target video.
According to a second aspect of the present invention, a video classification apparatus is provided, which may include:
a first extraction module, configured to extract video pose data from a target video;
a matching module, configured to match the video pose data against seed pose data contained in a preset pose retrieval library to determine the pose classification category corresponding to the video pose data, the pose retrieval library containing correspondences between the seed pose data and pose classification categories;
a determination module, configured to take the pose classification category corresponding to the video pose data as the classification category of the target video.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the video classification method of the first aspect are implemented.
Compared with the prior art, the present invention has the following advantages:
In the video classification method provided by the invention, video pose data is extracted from a target video; the video pose data is matched against a preset pose retrieval library, which contains correspondences between pose data and pose classification categories, to determine the pose classification category corresponding to the video pose data; and that category is determined to be the classification category of the target video. In the present invention, pose data is information concerned only with the pose action itself; compared with the image features of video frames, its data volume is smaller, and the computation generated during matching is correspondingly smaller. By matching the video pose data of the target video against the seed pose data of the preset pose retrieval library in the pose dimension, the invention classifies the target video in that dimension. During the matching of pose data, interference from information in other dimensions is reduced, which reduces the amount of computation and addresses the heavy load caused by feature-similarity calculation between image features. In addition, matching on pose data avoids frame-by-frame feature extraction, improving the efficiency of classifying the target video.
The above is merely an overview of the technical solution of the present invention. To make the technical means of the invention clearer and implementable according to the contents of the specification, and to make the above and other objects, features, and advantages of the invention more comprehensible, specific embodiments of the invention are set forth below.
Detailed description of the invention
Other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings are provided only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flow chart of the steps of a video classification method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of human pose data provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of another set of human pose data provided by an embodiment of the present invention;
Fig. 4 is a flow chart of the steps of another video classification method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of another set of human pose data provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of another set of human pose data provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of another set of human pose data provided by an embodiment of the present invention;
Fig. 8 is a flow chart of the steps of another video classification method provided by an embodiment of the present invention;
Fig. 9 is a block diagram of a video classification apparatus provided by an embodiment of the present invention;
Fig. 10 is a block diagram of another video classification apparatus provided by an embodiment of the present invention;
Fig. 11 is a block diagram of a matching module provided by an embodiment of the present invention.
Specific embodiment
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention will be thoroughly understood and its scope fully conveyed to those skilled in the art.
Fig. 1 is a flow chart of the steps of a video classification method provided by an embodiment of the present invention, applied to a terminal. As shown in Fig. 1, the method may include:
Step 101: extract video pose data from a target video.
In an embodiment of the present invention, when an object in a video frame performs a relevant action, the object generally undergoes a physical deformation, and pose data can be used to express that deformation. Pose data may include multiple keypoints on the object and relative position information between those keypoints; connection relationships can be established between the keypoints according to the structure of the object. Because pose data contains only keypoints with connection relationships, its data volume is usually small, so the amount of computation generated when processing pose data is relatively small.
Compared with the image feature information of video frames, pose data has a smaller data volume and is targeted more directly at the pose action itself: a pose action can be expressed accurately by several keypoints and the relative position vectors between them. Moreover, pose data eliminates other, irrelevant dimensions of information in the video frame, such as hue and contrast. Because these irrelevant dimensions are removed, the amount of computation during video classification is reduced and classification accuracy is improved.
In this step, to extract the video pose data from the target video, video frame images of the target video may first be extracted; then, according to the structure of the object in each frame image, several keypoints on the object and the relative position information between those keypoints are determined.
For example, Fig. 2 shows a schematic diagram of human pose data. Suppose Fig. 2 shows a video frame image depicting a kneeling person. Six keypoints — A, B, C, D, E, F — may first be determined in the frame image. Then, according to the structure of the human limbs, connection relationships between the six keypoints are established: line AB determines the head pose, line BC the arm pose, line BD the torso pose, and lines DE and EF the leg pose. The six keypoints contained in the pose data, together with the connections between them, can therefore accurately express the kneeling pose of the person in the current frame image.
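The keypoint-and-connection representation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the keypoint names (A–F), the coordinates, and the dictionary layout are all invented for the Fig. 2 example.

```python
# Hypothetical encoding of the Fig. 2 pose. The limb connections follow the
# example in the text (AB head, BC arm, BD torso, DE/EF legs); the patent
# prescribes no concrete data structure, so this layout is an assumption.
SKELETON = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E"), ("E", "F")]

def pose_vectors(keypoints):
    """Relative position vector (dx, dy) for each connected keypoint pair."""
    return {
        (p, q): (keypoints[q][0] - keypoints[p][0], keypoints[q][1] - keypoints[p][1])
        for p, q in SKELETON
    }

if __name__ == "__main__":
    # Illustrative coordinates for a kneeling figure (invented values).
    kneel = {"A": (0.0, 4.0), "B": (0.0, 3.0), "C": (1.0, 3.0),
             "D": (0.0, 2.0), "E": (1.0, 2.0), "F": (1.0, 1.0)}
    print(pose_vectors(kneel))
```

Because only the connected pairs and their relative vectors are kept, the representation discards hue, contrast, and every other non-pose dimension of the frame, which is exactly what keeps the later matching cheap.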
Step 102: match the video pose data against the seed pose data contained in the preset pose retrieval library to determine the pose classification category corresponding to the video pose data, the pose retrieval library containing correspondences between the seed pose data and pose classification categories.
In an embodiment of the present invention, objects such as characters in video frames often strike exaggerated or distinctive poses, and the video segments containing these pose actions are often relatively highlight-worthy content. Quickly and accurately determining the classification of this highlight content, and extracting the corresponding segments, has important application value in business. An embodiment of the present invention provides a method for classifying video based on the pose dimension: through matching between pose data, the pose classification categories of highlight content in a video can be found accurately and quickly, thereby classifying the video.
To classify video based on the pose dimension, the video pose data of the target video can be matched in the pose dimension against the seed pose data in the preset pose retrieval library. This avoids feature extraction and subsequent feature-matching computation on the target video, reduces interference from dimensions other than pose, and greatly reduces the amount of computation in the classification process.
For example, Fig. 2 shows a kneeling pose action; the human pose data in Fig. 2 contains six keypoints A, B, C, D, E, F and the relative position information between them. Suppose the pose retrieval library stores kneeling pose data for the kneeling action; as shown in Fig. 3, that data contains the six keypoints A', B', C', D', E', F' of a standard kneeling pose and the relative position information between them. When the human pose data in Fig. 2 is matched against the kneeling pose data in Fig. 3, the two have the same number of keypoints and highly similar relative positions between keypoints, so through matching between pose data, the human pose data in Fig. 2 can be directly matched to the kneeling pose classification category.
Step 103: take the pose classification category corresponding to the video pose data as the classification category of the target video.
In an embodiment of the present invention, if the target video is a longer video containing multiple pose actions, the pose data corresponding to each of these pose actions is matched against the pose retrieval library, so that one pose classification category is determined for each pose action and the longer target video has multiple pose classification categories.
In addition, when the video pose data of a pose action is extracted from the target video, the video pose data can also be associated with the video frames containing that pose action. In this way, while the pose classification category of the pose action is determined, the video frames corresponding to the action are also identified; these frames can be extracted and assembled into a separate video clip, thereby extracting from the target video the clip corresponding to the pose action.
In summary, the video classification method provided by an embodiment of the present invention comprises: extracting video pose data from a target video; matching the video pose data against a preset pose retrieval library, which contains correspondences between pose data and pose classification categories, to determine the pose classification category corresponding to the video pose data; and determining that category to be the classification category of the target video. By matching the video pose data of the target video against the seed pose data of the preset pose retrieval library in the pose dimension, the invention classifies the target video in that dimension. During the matching of pose data, interference from information in other dimensions is reduced, which reduces the amount of computation and addresses the heavy load caused by feature-similarity calculation between image features. In addition, matching on pose data avoids frame-by-frame feature extraction, improving the efficiency of classifying the target video.
Fig. 4 is a flow chart of the steps of another video classification method provided by an embodiment of the present invention. As shown in Fig. 4, the method may include:
Step 401: take multiple video frame images, extracted from the target video according to a first preset period, as the video pose data.
Specifically, suppose the video classification method provided by an embodiment of the present invention serves a client; then the target video comes from the client and is highly relevant to the client's business. In this step, multiple video frame images are extracted from the target video according to the first preset period, and the multiple keypoints in the video pose data, together with the relative position vectors between them, can be determined using the OpenPose algorithm. OpenPose is an open-source pose estimation framework based on deep learning; through the OpenPose algorithm, pose data can be extracted from images.
Furthermore, when extracting from the target video, a shorter first preset period can be used so that more information is extracted from the target video and key information is not missed; for example, the first preset period may be one second.
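The periodic sampling of step 401 can be sketched as follows; frame decoding itself (e.g. with a video library) is omitted, and the helper name and the one-second default are assumptions based on the example above.

```python
def sample_frame_indices(total_frames: int, fps: float, period_s: float = 1.0) -> list:
    """Indices of the frames to extract when one frame is taken every
    `period_s` seconds (the 'first preset period' of step 401)."""
    step = max(1, int(round(fps * period_s)))
    return list(range(0, total_frames, step))

if __name__ == "__main__":
    # A 10-second clip at 25 fps, sampled once per second -> 10 frames.
    print(sample_frame_indices(250, 25.0))
```

Shortening `period_s` trades more pose-extraction work for a lower chance of missing a brief pose action, which is the tradeoff the paragraph above describes.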
Step 402: according to the similarity values between the video pose data and the seed pose data, select target video pose data from the video pose data, where the similarity value between the target video pose data and the target seed pose data is greater than or equal to a first similarity threshold.
In an embodiment of the present invention, the pose retrieval library contains representative, high-value seed pose data, while the number of video pose data items obtained in step 401 can be very large. The video pose data can therefore be screened with the seed pose data according to the similarity values between them: the higher-value target video pose data is filtered out of the video pose data, along with the target seed pose data whose similarity with the target video pose data is greater than or equal to the first preset similarity threshold, and the pose classification category corresponding to the target seed pose data is determined to be the pose classification category of the target video pose data.
Optionally, each item of video pose data and each item of seed pose data contains multiple keypoints with connection relationships and the relative position vectors between those keypoints. In one implementation of the embodiment of the present invention, step 402 may specifically include:
Sub-step 4021: obtain, from the relative position vectors between the first keypoints in the video pose data, the first angle corresponding to each target first keypoint.
In an embodiment of the present invention, referring to Fig. 2, suppose Fig. 2 shows the diagram of one item of video pose data containing six keypoints A, B, C, D, E, F and the relative position information between them. The target first keypoints B, D, E, which are not end points, correspond to the first angles ∠ABC, ∠BDE, ∠DEF respectively; these three first angles can be determined from the relative position information between the first keypoints in the video pose data.
Sub-step 4022: obtain, from the relative position vectors between the second keypoints in the seed pose data, the second angle corresponding to each target second keypoint.
In an embodiment of the present invention, referring to Fig. 3, suppose Fig. 3 shows the diagram of one item of seed pose data containing six keypoints A', B', C', D', E', F' and the relative position information between them. The target second keypoints B', D', E', which are not end points, correspond to the second angles ∠A'B'C', ∠B'D'E', ∠D'E'F' respectively; these three second angles can be determined from the relative position information between the second keypoints in the seed pose data.
Sub-step 4023: take the video pose data corresponding to the target first angles among the first angles as the target video pose data.
Sub-step 4024: take the seed pose data corresponding to the target second angles among the second angles as the target seed pose data.
Here, the difference between a target first angle and its corresponding target second angle is less than or equal to the absolute value of a preset angle value, and the first keypoints and second keypoints are keypoints that are not end points.
In an embodiment of the present invention, for the first angles ∠ABC, ∠BDE, ∠DEF given in the example of sub-step 4021 and the second angles ∠A'B'C', ∠B'D'E', ∠D'E'F' given in the example of sub-step 4022, the differences between ∠ABC and ∠A'B'C', between ∠BDE and ∠B'D'E', and between ∠DEF and ∠D'E'F' can be calculated.
When each of these differences is less than or equal to the absolute value of the preset angle value, the seed pose data shown in Fig. 3 can be determined to be highly similar to the video pose data shown in Fig. 2. The first angles can then be determined to be target first angles, and the video pose data of Fig. 2 corresponding to those target first angles is taken as the target video pose data; likewise, the second angles can be determined to be target second angles, and the seed pose data of Fig. 3 corresponding to those target second angles is taken as the target seed pose data. The absolute value of the preset angle value is preferably 10 to 40 degrees.
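Sub-steps 4021–4024 reduce to computing the interior angle at each non-end keypoint and keeping a pose pair when every angle difference stays within the preset tolerance. A minimal sketch, assuming 2D keypoints and a tolerance in the preferred 10–40 degree range; the function names are invented:

```python
import math

def joint_angle(a, b, c):
    """Interior angle at keypoint b (degrees) formed by rays b->a and b->c,
    e.g. the first angle ∠ABC of sub-step 4021."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def angles_match(video_angles, seed_angles, tol_deg=30.0):
    """Sub-steps 4023/4024: every corresponding angle pair differs by at most
    the preset angle value."""
    return all(abs(va - sa) <= tol_deg for va, sa in zip(video_angles, seed_angles))
```

For the Fig. 2 / Fig. 3 example, `video_angles` would hold ∠ABC, ∠BDE, ∠DEF and `seed_angles` would hold ∠A'B'C', ∠B'D'E', ∠D'E'F'.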
Since the pose retrieval library contains correspondences between the seed pose data and pose classification categories, in one implementation of the embodiment of the present invention, after the target video pose data and the target seed pose data have been determined, the pose classification category corresponding to the target seed pose data can be taken as the pose classification category corresponding to the video pose data.
Optionally, in one implementation of the embodiment of the present invention, after sub-step 4024, the method may further include:
Sub-step 4025: establish a first matrix according to the target video pose data.
In an embodiment of the present invention, the target video pose data and the target seed pose data have been obtained through the matching of angles; for higher classification precision, however, the target video pose data needs to be screened further. An embodiment of the present invention therefore provides an affine transformation calculation for screening out, from the target video pose data, the items whose similarity to the seed pose data is low.
Specifically, the affine transformation calculation is a linear transformation process. To reduce its amount of computation, a first matrix can be established from the target video pose data, so that the target video pose data is expressed in matrix form.
Sub-step 4026: establish a second matrix according to the target seed pose data.
In this step, to reduce the amount of computation of the affine transformation calculation, a second matrix can likewise be established from the target seed pose data, so that the target seed pose data is expressed in matrix form.
Sub-step 4027: perform an affine transformation calculation on the first matrix and the second matrix to obtain a calculation result.
In practical applications, as the shooting angle and shooting distance change during filming, the pose action presented in the video is a 3D pose action; when a 3D pose action is converted frame by frame into video frames, it can have many different presentations in the 2D frames. In an embodiment of the present invention, the preset pose retrieval library stores correspondences between pose data and pose classification categories, and the video pose data of the target video can be compared with the pose data stored in the library through an affine transformation calculation, which eliminates the spatial differences between them in the 2D/3D angle dimension. Therefore, even if the shooting angle or shooting distance changed during filming, as long as the relative position information between the keypoints of the video pose data has not changed, the affine transformation calculation enables same-dimension matching between the video pose data and the seed pose data in the pose retrieval library, so that the more valuable video pose data can be further screened out of the video pose data. This provides a matching method during video classification that is more robust to shooting angle and shooting distance.
Specifically, in an embodiment of the present invention, an affine transformation, also called an affine map, is defined geometrically as a map between two vector spaces consisting of a linear transformation followed by a translation: one vector space undergoes a linear transformation and then a translation to become another vector space. Affine transformations include rotation, translation, and scaling. An affine transformation preserves the "straightness" of a 2D figure (a straight line remains a straight line after the transformation) and its "parallelism" (the relative positional relationships between lines are unchanged: parallel lines remain parallel, and the order of points on a line does not change).
In this step, the purpose of the affine calculation is to determine whether an affine matrix can be solved for in the calculation result. During the affine calculation, one vector space undergoes a linear transformation plus a translation to become another vector space; the affine matrix is the matrix expression of this transformation and reflects the spatial mapping relationship between the two vector spaces.
Sub-step 4028, if the calculated result includes an affine matrix being obtained, and the average distance between the second matrix and the product matrix obtained by multiplying the affine matrix with the first matrix is greater than or equal to a preset threshold, then the target video attitude data corresponding to the first matrix is deleted from all the target video attitude data.
For example, in the embodiment of the present invention, assume the first matrix is S and the second matrix is M. The second matrix M undergoes an affine transformation, and an affine matrix T is solved by way of optimization. If the average distance between the first matrix S and the product matrix obtained by multiplying the affine matrix T with the second matrix M is greater than or equal to the preset threshold, the similarity between the target seed attitude data corresponding to the second matrix M and the target video attitude data corresponding to the first matrix S is considered small, so the target video attitude data corresponding to the first matrix S can be deleted from all the target video attitude data. If the average distance between the product matrix and the first matrix is less than the preset threshold, the similarity between the target seed attitude data corresponding to the second matrix M and the target video attitude data corresponding to the first matrix S is considered large, so the target video attitude data corresponding to the first matrix S can be retained among all the target video attitude data.
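One way to read sub-step 4028 in code is the following sketch. It is illustrative only, not the patented implementation: the function names, the (N, 2) key-point layout, and the 0.1 threshold are all assumptions. The affine matrix T is solved by least squares so that the seed pose, mapped through T, lands as close as possible to the video pose, and the mean residual distance plays the role of the "average distance" compared against the preset threshold:

```python
import numpy as np

def affine_residual(seed_pose, video_pose):
    """Solve the affine matrix T minimizing ||[seed|1] @ T - video||
    and return T together with the mean per-point distance.

    seed_pose, video_pose: (N, 2) arrays of corresponding key points.
    """
    n = seed_pose.shape[0]
    m_aug = np.hstack([seed_pose, np.ones((n, 1))])        # N x 3 homogeneous coords
    t, *_ = np.linalg.lstsq(m_aug, video_pose, rcond=None)  # 3 x 2 affine matrix
    mapped = m_aug @ t                                      # seed pose in the video view
    avg_dist = np.linalg.norm(mapped - video_pose, axis=1).mean()
    return t, avg_dist

def poses_match(seed_pose, video_pose, dist_threshold=0.1):
    # Keep the video pose only when the mapped seed lands close to it on average.
    _, avg_dist = affine_residual(seed_pose, video_pose)
    return avg_dist < dist_threshold
```

A pure rotation plus translation of a pose is fit exactly by the least-squares solve, so the residual distance is near zero and the poses match, mirroring the Fig. 5 / Fig. 2 example below.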
For example, referring to Fig. 2 and Fig. 5, Fig. 2 shows the kneeling posture action of a person shot at a conventional shooting angle, where the kneeling person is shot from the side; Fig. 2 can be understood as the target video attitude data in the embodiment of the present invention. Suppose that, in order to shoot the kneeling person from another angle, the shooting angle of the camera is deflected, thereby obtaining the posture action in Fig. 5; Fig. 5 can be understood as the target seed attitude data in the embodiment of the present invention. Affine transformation calculation is then performed according to the target seed attitude data included in Fig. 5 and the target video attitude data in Fig. 2. If an affine matrix can be obtained, and the average distance between the first matrix and the product matrix obtained by multiplying this affine matrix with the second matrix is less than the preset threshold, the similarity between the target seed attitude data included in Fig. 5 and the target video attitude data in Fig. 2 is considered large. After mapping through the affine matrix, the target seed attitude data included in Fig. 5 can be converted to be consistent with the shooting angle of the target video attitude data of Fig. 2; that is, Fig. 5 can be converted into Fig. 6. Finally, the average distance between the product matrix and the first matrix is computed, which amounts to computing the similarity between Fig. 6 and Fig. 2. As can be seen from the figures, Fig. 6 and Fig. 2 are highly similar, so the target video attitude data corresponding to the first matrix can be retained among all the target video attitude data.
Sub-step 4029, if the calculated result includes an affine matrix being obtained, and the rotation component value obtained according to the affine matrix is greater than or equal to a preset component value, the target video attitude data corresponding to the first matrix is deleted from all the target video attitude data.
In the embodiment of the present invention, some posture actions suffer from a confusable whole-body-angle matching problem. The side-view kneeling posture of the person in Fig. 2, after being rotated 90 degrees clockwise, yields Fig. 7, and the action posture in Fig. 7 is easily confused with a cycling posture. Such confusion causes a reduction in video classification precision. For example, Fig. 7 is the figure obtained after rotating Fig. 2 clockwise by 90 degrees; the picture still shows a person's kneeling action, yet during video classification the picture corresponding to Fig. 7 would usually be determined as the cycling category, and that determination would be wrong.
In the embodiment of the present invention, a preset component value can be defined in advance for a posture action. The preset component value represents the degree of rotation that the posture action can tolerate, i.e., the legitimate degree. When the rotation component value of a posture action is greater than or equal to the preset component value, the posture action can be determined to be illegitimate; when the rotation component value is less than the preset component value, the posture action can be determined to be legitimate.
From the affine matrix solved by the affine transformation calculation according to the first matrix and the second matrix, the rotation component value can be further acquired. If the rotation component value is greater than or equal to the preset component value, the target video attitude data corresponding to the first matrix can be determined to be illegitimate, so the target video attitude data corresponding to the first matrix can be deleted from all the target video attitude data.
For example, if the preset component value set for the kneeling posture is 100 degrees, then since Fig. 7 is merely the figure obtained after rotating Fig. 2 clockwise by 90 degrees, its rotation component value is less than 100 degrees, and the action posture in Fig. 7 can still be considered a legitimate kneeling posture.
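If the affine matrix is represented as a 2x3 matrix [A|t], its rotation component can be read off the linear part. The sketch below is illustrative only and rests on that representation assumption; the 100-degree preset component value follows the kneeling example above, and the function names are not from the patent:

```python
import numpy as np

def rotation_component_deg(affine):
    """Rotation angle (degrees) of the linear part of a 2x3 affine matrix [A|t]."""
    return abs(np.degrees(np.arctan2(affine[1, 0], affine[0, 0])))

def rotation_legal(affine, preset_component_deg=100.0):
    # A pose whose rotation stays below the preset component value is accepted;
    # at or beyond it, the pose is treated as illegitimate and deleted.
    return rotation_component_deg(affine) < preset_component_deg
```

Under this reading, the 90-degree rotation of Fig. 7 passes the 100-degree preset, while a 120-degree rotation would not.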
Sub-step 40210, if the calculated result includes no affine matrix being obtained, the target video attitude data corresponding to the first matrix is deleted from all the target video attitude data.
In this step, referring to the example in sub-step 4028, if the calculated result includes no affine matrix being obtained, i.e., the affine calculation cannot solve a corresponding affine matrix, or the average distance between the solved product matrix and the first matrix is greater than or equal to the preset threshold, then the similarity between the target seed attitude data included in Fig. 5 and the target video attitude data in Fig. 2 is considered small, and the target video attitude data shown in Fig. 2 needs to be deleted from all the target video attitude data.
Step 403, from the seed attitude data, target seed attitude data corresponding to the target video attitude data is chosen.
In this step, when the target video attitude data has been determined, the target seed attitude data whose similarity value with the target video attitude data is greater than or equal to the first similarity threshold can further be determined from the seed attitude data.
Step 404, the posture classification category corresponding to the target seed attitude data is taken as the posture classification category corresponding to the video attitude data.
In this step, the posture classification category corresponding to the target seed attitude data is determined as the posture classification category of the target video attitude data, i.e., the posture classification category of the posture action corresponding to the video attitude data in the target video is determined.
Optionally, after step 404, the method can further include:
Step 405, a video clip is established according to the target video attitude data.
In the embodiment of the present invention, according to the determined target video attitude data, the video frame images corresponding to the target video attitude data can also be found in the target video. Extracting these video frame images produces a video clip. Such a video clip often corresponds to the posture action of a piece of highlight content, thereby achieving the purpose of extracting highlight content clips from the target video according to posture actions.
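Once the frames whose poses matched are known, assembling a clip reduces to grouping their frame indices into contiguous runs. The following is a small illustrative sketch only; the function name and the gap parameter are assumptions, and the patent does not prescribe this grouping:

```python
def frames_to_clips(frame_indices, max_gap=1):
    """Group sorted frame indices into (start, end) clips, splitting at gaps."""
    clips = []
    start = prev = frame_indices[0]
    for f in frame_indices[1:]:
        if f - prev > max_gap:
            clips.append((start, prev))  # close the current contiguous run
            start = f
        prev = f
    clips.append((start, prev))
    return clips
```

Each (start, end) pair then delimits one candidate highlight clip to cut from the target video.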
Step 406, the posture classification category of the target video attitude data is determined as the posture classification category of the video clip.
In this step, the posture classification category of the target video attitude data is determined as the posture classification category of the video clip, i.e., the posture classification category of the posture action corresponding to the video clip in the target video is determined.
Optionally, referring to Fig. 8, before step 402 the method can further include:
Step 407, sample attitude data in video samples is extracted.
In the embodiment of the present invention, the current prior art requires highlight posture actions to be defined manually in advance. However, because human imagination is limited, mining by manual work leads to mined attitude data that are close to one another, and the types of attitude data are relatively few. In the embodiment of the present invention, a massive number of video samples can be used to mine posture actions in an unsupervised manner, achieving an automated and comprehensive mining effect.
Specifically, since the video samples are massive, the quantity of sample attitude data extracted from the video samples is also very large. The present invention can screen the sample attitude data in an unsupervised, automatic manner, so as to filter out the valuable sample attitude data from all the sample attitude data for establishing the posture search library. In this way, posture actions are defined automatically by pure mathematical calculation, solving the problem that posture action definition relies on manual definition. Since the definition of posture actions is unsupervised and automatic, a huge quantity of video samples can be introduced, realizing comprehensive mining of posture actions and reducing the probability of missing high-value posture actions.
In addition, the video samples can be chosen from the same video sample library, or the required video samples can be extracted from different video sample libraries respectively; the embodiment of the present invention does not limit this.
Optionally, step 407 can specifically include:
Sub-step 4071, according to a second preset period of time, multiple video frame images extracted from a first video sample are taken as first sample attitude data.
In this step, a certain quantity (e.g., 50,000 to 100,000) of first video samples can be chosen from a massive video sample library, and a video frame image extracted from a first video sample every second preset period of time is taken as first sample attitude data. Specifically, the OpenPose algorithm can be used to extract, from the first sample attitude data, the key points reflecting the primary object content and the relative position vectors between the key points.
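Sampling one frame per preset period reduces, for a known frame rate, to simple index arithmetic. The sketch below is illustrative only (actual frame decoding would use a video library such as OpenCV, which is not shown, and the function name is an assumption):

```python
def sample_frame_indices(total_frames, fps, period_seconds):
    """Indices of the frames to extract: one frame every `period_seconds`."""
    step = max(1, int(round(fps * period_seconds)))
    return list(range(0, total_frames, step))
```

For a 25 fps video, a 1-second period yields every 25th frame, while the slower 10-second period used for general samples yields every 250th frame.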
Sub-step 4072, the first sample attitude data are classified to obtain multiple sample classes.
The first sample attitude data obtained through sub-step 4071 are numerous, and a large portion of the attitude data are highly similar to one another. Therefore, in this step, the first sample attitude data are classified to obtain multiple sample classes.
Specifically, the classification of the first sample attitude data can be realized by a clustering algorithm model. The first sample attitude data are imported into a preset clustering algorithm model, which classifies them; each obtained sample class can serve as a cluster. Among the clusters obtained by the clustering algorithm, a cluster containing more first sample attitude data after clustering is considered to often correspond to a common posture action, such as standing or walking, while the clusters containing fewer first sample attitude data most probably correspond to rarer posture actions. These rarer postures are likely to be exaggerated, eye-catching postures, such as kneeling or handstands, and often correspond to highlight content.
Specifically, a clustering algorithm, also known as cluster analysis, is a statistical analysis method for studying classification problems and an important algorithm of data mining. A clustering algorithm operates on a number of patterns; in general, a pattern is a vector of measurements, or a point in a multi-dimensional space. Clustering is based on similarity: there is more similarity between patterns within one cluster than between patterns not in the same cluster. In the embodiment of the present invention, the clustering algorithm model can analyze the first sample attitude data as patterns, study the similarity between individual first sample attitude data, and cluster the first sample attitude data with higher similarity into one cluster.
Sub-step 4073, second sample attitude data are chosen from the first sample attitude data included in each sample class.
In this step, the obtained clusters include both clusters corresponding to common posture actions and clusters corresponding to rare posture actions, and both kinds of clusters generally contain a number of first sample attitude data. One or more second sample attitude data can be chosen from the first sample attitude data included in a cluster; these second sample attitude data can be the posture actions in the cluster that are representative or of high value.
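The clustering of sub-steps 4072-4073 could, among many possible choices, use a plain k-means. The sketch below is an assumption-laden illustration, not the patented model: the deterministic initialization, the cluster-size heuristic for identifying "rare" (potentially highlight) poses, and the 0.25 fraction are all assumptions:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's iteration with deterministic, evenly spaced initialization."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each pose vector to its nearest center, then recompute centers.
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return labels, centers

def rare_cluster_ids(labels, frac=0.25):
    # Clusters holding fewer than `frac` of all samples ~ rare / highlight poses.
    counts = np.bincount(labels)
    return [j for j, c in enumerate(counts) if c < frac * len(labels)]
```

Large clusters would then stand for common postures (standing, walking), while the small clusters flag candidates for exaggerated, high-value postures (kneeling, handstands).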
Sub-step 4074, according to a third preset period of time, multiple video frame images extracted from a second video sample are taken as third sample attitude data.
Wherein, the second preset period of time is greater than the third preset period of time.
In this step, assume the video classification method provided in the embodiment of the present invention needs to serve a customer. The first video samples may come from the Internet, or from one's own video sample library; the second video samples may come from the customer and be highly relevant to the customer's business, i.e., the second video samples may come from a task video sample set provided by the customer, or from the customer's video sample library. At this time, multiple video frame images extracted from a second video sample according to the third preset period of time can be taken as third sample attitude data, and specifically the OpenPose algorithm can be used to extract, from the third sample attitude data, the key points reflecting the primary object content and the relative position vectors between the key points.
In addition, for the first video samples, video frame images can be extracted according to the slower second preset period of time, because in the above application scenario the first video samples are general video samples with no specific, limited source, so an excessively high frame extraction rate is not needed. The second video samples, however, are often strongly correlated with the customer's business demands; therefore, when extracting from the second video samples, the faster third preset period of time can be followed so as to extract more information from the second video samples and avoid missing key information. For example, the second preset period of time can be 10 seconds, and the third preset period of time can be 1 second.
It should be noted that in some other video classification scenarios, the first video samples and the second video samples may come from the same video sample library; the embodiment of the present invention does not limit this.
Step 408, the seed attitude data is chosen from the sample attitude data.
In the embodiment of the present invention, since a large amount of sample attitude data has been obtained by extraction from the video samples, further screening can filter out the representative or high-value sample attitude data from the larger set of sample attitude data for establishing the posture search library, guaranteeing the validity of the data in the posture search library.
Optionally, step 408 can specifically be realized by choosing, from the third sample attitude data, the third sample attitude data whose similarity with the second sample attitude data is greater than or equal to a second similarity threshold, to serve as the seed attitude data.
Specifically, for the realization process of choosing, from the third sample attitude data, the third sample attitude data whose similarity with the second sample attitude data is greater than or equal to the second similarity threshold to serve as the seed attitude data, reference may be made to the description of step 402 and its related sub-steps; details are not described herein again.
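As one possible reading of this selection (the cosine similarity measure, the 0.9 threshold, and the function name are assumptions; the patent only specifies a similarity greater than or equal to the second similarity threshold):

```python
import numpy as np

def choose_seeds(third_samples, second_samples, sim_threshold=0.9):
    """Keep each third-sample pose vector whose cosine similarity with any
    second-sample pose vector reaches the second similarity threshold."""
    seeds = []
    for t in third_samples:
        t = np.asarray(t, dtype=float)
        for s in second_samples:
            s = np.asarray(s, dtype=float)
            sim = t @ s / (np.linalg.norm(t) * np.linalg.norm(s))
            if sim >= sim_threshold:
                seeds.append(t)  # matches a high-value reference pose
                break
    return seeds
```

The surviving vectors are the seed attitude data that carry over the high-value postures mined from the general samples into the customer-specific library.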
Step 409, the posture search library including the seed attitude data is established.
In this step, the seed attitude data can be stored into the posture search library, so that during subsequent realization of the video classification method, the seed attitude data therein can be obtained by calling the posture search library.
Optionally, step 409 can specifically include:
Sub-step 4091, corresponding posture classification categories are added for the seed attitude data.
For the seed attitude data, feature recognition can be performed based on image analysis or manual analysis, and corresponding posture classification categories are further added, e.g., posture classification categories such as running, kneeling, and dancing.
Sub-step 4092, the seed attitude data and the posture classification categories are added to the posture search library.
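Sub-steps 4091 and 4092 amount to storing labeled seed poses in a structure keyed by category. A minimal sketch under that assumption (a production posture search library could equally be a database or a similarity index; the dictionary layout here is purely illustrative):

```python
from collections import defaultdict

def build_pose_library(labeled_seeds):
    """labeled_seeds: iterable of (category, pose_vector) pairs."""
    library = defaultdict(list)
    for category, pose in labeled_seeds:
        library[category].append(pose)  # one category -> many seed poses
    return library
```

Matching at classification time then consists of looking up which category's seed poses a video pose is most similar to.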
In the application scenario where the video classification method serves a customer, the seed attitude data are obtained as follows: second sample attitude data of representative or high-value posture actions are obtained from the general first video samples; third sample attitude data are further extracted from the second video samples provided by the customer; and through similarity calculation between the second sample attitude data and the third sample attitude data, the third sample attitude data whose similarity with the second sample attitude data is greater than or equal to the second similarity threshold are chosen from the third sample attitude data to serve as the seed attitude data. The posture search library is then established according to the seed attitude data and the corresponding posture classification categories, so that when the video classification business is subsequently performed for the customer, the posture search library can provide a more targeted classification service.
In conclusion video classification methods provided in an embodiment of the present invention, comprising: extract the video posture in target video Data;Video attitude data is matched with preset posture search library, determines the corresponding posture classification of video attitude data Classification, posture search library include the corresponding relationship of attitude data Yu posture class categories;Posture class categories are determined as target The class categories of video, in the present invention, attitude data are a kind of information for being concerned only with posture movement, the data volume of attitude data Smaller, the calculation amount that when matching generates is also smaller.The present invention can be and preset by the video attitude data in target video Seed attitude data in posture search library carries out the matching of posture dimension, realizes the mesh classified in posture dimension to target video 's.The present invention reduces interference caused by the information of other dimensions in the matching process of attitude data, to reduce data Calculation amount solves the problems, such as to cause calculation amount larger due to characteristic similarity calculating between characteristics of image.In addition, passing through appearance The matching way of state data avoids the operation that feature extraction frame by frame is carried out to video, improves the effect to target video classification Rate.In addition, its corresponding attitude data and posture search library are being carried out matched mistake for the posture movement in video Cheng Zhong, can eliminate in shooting process because shooting angle, shooting distance change due to caused by Spatial Dimension difference, thus for should Posture movement is matched to accurate posture class categories, improves classification accuracy.
Fig. 9 is a block diagram of a video classification device provided in an embodiment of the present invention. As shown in Fig. 9, the video classification device 90 may include:
a first extraction module 901, configured to extract video attitude data in a target video;
Optionally, the first extraction module 901 is specifically configured to take multiple video frame images, extracted from the target video according to a first preset period of time, as the video attitude data.
a matching module 902, configured to match the video attitude data with the seed attitude data included in a preset posture search library and determine the posture classification category corresponding to the video attitude data, the posture search library including the correspondence between the seed attitude data and posture classification categories;
a determining module 903, configured to take the posture classification category corresponding to the video attitude data as the classification category of the target video.
Optionally, referring to Fig. 10, the device further includes:
a second extraction module 904, configured to extract sample attitude data in video samples;
Optionally, the second extraction module 904 is specifically configured to:
take multiple video frame images, extracted from a first video sample according to a second preset period of time, as first sample attitude data;
classify the first sample attitude data to obtain multiple sample classes;
choose second sample attitude data from the first sample attitude data included in each sample class;
take multiple video frame images, extracted from a second video sample according to a third preset period of time, as third sample attitude data;
wherein the second preset period of time is greater than the third preset period of time.
a choosing module 905, configured to choose the seed attitude data from the sample attitude data;
Optionally, the choosing module 905 is specifically configured to:
choose, from the third sample attitude data, the third sample attitude data whose similarity with the second sample attitude data is greater than or equal to a second similarity threshold, as the seed attitude data;
an establishing module 906, configured to establish the posture search library including the seed attitude data.
Optionally, the establishing module 906 is specifically configured to:
add corresponding posture classification categories for the seed attitude data;
add the seed attitude data and the posture classification categories to the posture search library.
Optionally, referring to Fig. 11, the matching module 902 includes:
a first choosing submodule 9021, configured to choose target video attitude data from the video attitude data according to the similarity values between the video attitude data and the seed attitude data, the similarity value between the target video attitude data and the target seed attitude data being greater than or equal to a first similarity threshold;
Optionally, each video attitude data and each seed attitude data include multiple key points having connection relationships and the relative position vectors between the key points; the first choosing submodule 9021 is specifically configured to:
obtain, according to the relative position vectors between first key points in the video attitude data, the first angles corresponding to target first key points;
obtain, according to the relative position vectors between second key points in the seed attitude data, the second angles corresponding to target second key points;
take the video attitude data corresponding to a target first angle among the first angles as the target video attitude data;
take the seed attitude data corresponding to a target second angle among the second angles as the target seed attitude data;
wherein the absolute value of the difference between the target first angle and the target second angle is less than or equal to a preset angle value, and the first key points and the second key points are key points that are not end points.
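The angle comparison described for the first choosing submodule can be sketched as follows. This is an assumption-laden illustration: the joint-angle formula over two limb vectors, the 15-degree preset angle value, and the function names are not taken from the patent:

```python
import numpy as np

def joint_angle_deg(v1, v2):
    """Angle (degrees) at a non-end key point between its two limb vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def angles_match(first_angle, second_angle, preset_angle=15.0):
    # A video/seed pair survives when the angle difference stays within
    # the preset angle value.
    return abs(first_angle - second_angle) <= preset_angle
```

Because the angle at a joint is invariant to translation and uniform rotation of the whole pose, this pre-filter cheaply discards pairs before the heavier affine calculation.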
Optionally, the first choosing submodule 9021 is specifically configured to:
establish a first matrix according to the target video attitude data;
establish a second matrix according to the target seed attitude data;
perform affine transformation calculation on the first matrix and the second matrix to obtain a calculated result;
if the calculated result includes an affine matrix being obtained, and the average distance between the second matrix and the product matrix obtained by multiplying the affine matrix with the first matrix is greater than or equal to a preset threshold, delete the target video attitude data corresponding to the first matrix from all the target video attitude data;
if the calculated result includes an affine matrix being obtained, and the rotation component value obtained according to the affine matrix is greater than or equal to a preset component value, delete the target video attitude data corresponding to the first matrix from all the target video attitude data;
if the calculated result includes no affine matrix being obtained, delete the target video attitude data corresponding to the first matrix from all the target video attitude data.
a second choosing submodule 9022, configured to choose, from the seed attitude data, target seed attitude data corresponding to the target video attitude data;
a first determining submodule 9023, configured to take the posture classification category corresponding to the target seed attitude data as the posture classification category corresponding to the video attitude data;
an establishing submodule 9024, configured to establish a video clip according to the target video attitude data;
a second determining submodule 9025, configured to determine the posture classification category of the target video attitude data as the posture classification category of the video clip.
In conclusion visual classification device provided in an embodiment of the present invention, comprising: extract the video posture in target video Data;Video attitude data is matched with preset posture search library, determines the corresponding posture classification of video attitude data Classification, posture search library include the corresponding relationship of attitude data Yu posture class categories;Posture class categories are determined as target The class categories of video, the present invention can be by the video attitude datas in target video, with preset posture search library Seed attitude data carries out the matching of posture dimension, realizes the purpose classified in posture dimension to target video.The present invention is in appearance In the matching process of state data, interference caused by the information of other dimensions is reduced, to reduce data calculation amount, is solved The problem for causing calculation amount larger due to characteristic similarity calculating between characteristics of image.In addition, the matching for passing through attitude data Mode avoids the operation that feature extraction frame by frame is carried out to video, improves the efficiency to target video classification.
As for the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple; for relevant parts, reference may be made to the partial explanation of the method embodiment.
Preferably, an embodiment of the present invention also provides a terminal, including a processor, a memory, and a computer program stored in the memory and runnable on the processor. When executed by the processor, the computer program realizes each process of the above video classification method embodiment and can reach the same technical effect; to avoid repetition, details are not described herein again.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program realizes each process of the above video classification method embodiment and can reach the same technical effect; to avoid repetition, details are not described herein again. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
All the embodiments in this specification are described in a progressive manner; each embodiment highlights its differences from the other embodiments, and the same or similar parts between the embodiments can be referred to each other.
It will readily occur to a person skilled in the art that any combination of the above embodiments is feasible, so any combination between the above embodiments is an embodiment of the present invention; as space is limited, this specification does not detail them one by one.
The video classification methods provided herein are not inherently related to any particular computer, virtual system, or other equipment. Various general-purpose systems can also be used together with the teachings herein. As described above, the structure required to construct a system with the solution of the present invention is obvious. In addition, the present invention is not directed to any particular programming language. It should be understood that various programming languages can be used to realize the content of the invention described herein, and the above description of a specific language is made in order to disclose the best mode of carrying out the invention.
In the description provided here, numerous specific details are set forth. It is to be appreciated, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail, so as not to obscure the understanding of this description.
Similarly, it should be understood that in order to simplify the present disclosure and help understand one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention, individual features of the invention are sometimes grouped together into a single embodiment, figure, or the description thereof. However, the disclosed method should not be interpreted as reflecting the intention that the claimed invention requires more features than those expressly recited in each claim. More precisely, as the claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Therefore, the claims following the specific embodiments are hereby expressly incorporated into the specific embodiments, wherein each claim stands on its own as a separate embodiment of the invention.
Those skilled in the art will understand that can be carried out adaptively to the module in the equipment in embodiment Change and they are arranged in one or more devices different from this embodiment.It can be the module or list in embodiment Member or component are combined into a module or unit or component, and furthermore they can be divided into multiple submodule or subelement or Sub-component.Other than such feature and/or at least some of process or unit exclude each other, it can use any Combination is to all features disclosed in this specification (including adjoint claim, abstract and attached drawing) and so disclosed All process or units of what method or apparatus are combined.Unless expressly stated otherwise, this specification is (including adjoint power Benefit require, abstract and attached drawing) disclosed in each feature can carry out generation with an alternative feature that provides the same, equivalent, or similar purpose It replaces.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the video classification method according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for carrying out a part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.

Claims (19)

1. A video classification method, characterized in that the method comprises:
extracting video attitude data from a target video;
matching the video attitude data against seed attitude data contained in a preset posture search library to determine a posture classification category corresponding to the video attitude data, the posture search library containing correspondences between the seed attitude data and posture classification categories; and
taking the posture classification category corresponding to the video attitude data as the classification category of the target video.
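Read as an algorithm, claim 1 describes a three-step pipeline: extract pose data from the video, look it up in a posture search library, and inherit the matched category. A minimal Python sketch follows; the seed vectors, category names, cosine-similarity matcher, and 0.8 threshold are illustrative assumptions, since the claims fix neither a representation nor a similarity measure.

```python
import numpy as np

# Hypothetical posture search library: seed pose vector -> posture category.
# The vectors, categories, and the cosine matcher with its 0.8 threshold
# are assumptions of this sketch, not details taken from the claims.
SEED_LIBRARY = [
    (np.array([0.9, 0.1, 0.0]), "dance"),
    (np.array([0.1, 0.9, 0.2]), "run"),
]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_video(pose_vectors, threshold=0.8):
    """Match each extracted pose vector against the seed library and
    return the category of the best match at or above the threshold."""
    best_cat, best_sim = None, threshold
    for pose in pose_vectors:
        for seed, category in SEED_LIBRARY:
            sim = cosine_similarity(pose, seed)
            if sim >= best_sim:
                best_cat, best_sim = category, sim
    return best_cat

# A pose close to the first seed maps to that seed's category.
print(classify_video([np.array([0.85, 0.15, 0.05])]))  # dance
```

Claims 3 to 5 refine the matching step, and claims 7 to 9 describe how such a library is built.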
2. The method according to claim 1, characterized in that the step of extracting the video attitude data from the target video comprises:
taking a plurality of video frame images extracted from the target video according to a first preset period as the video attitude data.
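As a hedged illustration of the periodic sampling in claim 2: the claim does not fix how the "first preset period" maps to frame indices, so the fps-based arithmetic below is an assumption.

```python
def sample_frame_indices(total_frames, fps, period_s):
    """Indices of the frames grabbed when one frame is taken every
    period_s seconds from a clip of total_frames frames at fps."""
    step = max(1, int(round(fps * period_s)))
    return list(range(0, total_frames, step))

# 10 s of 25 fps video sampled every 2 s -> frames 0, 50, 100, 150, 200.
print(sample_frame_indices(250, 25, 2.0))
```

The selected indices would then be passed to a video decoder to grab the actual frame images.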
3. The method according to claim 2, characterized in that the step of matching the video attitude data against the seed attitude data contained in the preset posture search library to determine the posture classification category corresponding to the video attitude data comprises:
selecting target video attitude data from the video attitude data according to similarity values between the video attitude data and the seed attitude data, the similarity value between the target video attitude data and the target seed attitude data being greater than or equal to a first similarity threshold;
selecting, from the seed attitude data, target seed attitude data corresponding to the target video attitude data; and
taking the posture classification category corresponding to the target seed attitude data as the posture classification category corresponding to the video attitude data.
4. The method according to claim 3, characterized in that each item of video attitude data and each item of seed attitude data comprises a plurality of key points having connection relationships and relative position vectors between the key points, and the step of selecting the target video attitude data from the video attitude data according to the similarity values between the video attitude data and the seed attitude data comprises:
obtaining a first angle corresponding to a target first key point according to relative position vectors between first key points in the video attitude data;
obtaining a second angle corresponding to a target second key point according to relative position vectors between second key points in the seed attitude data;
taking the video attitude data corresponding to a target first angle among the first angles as the target video attitude data; and
taking the seed attitude data corresponding to a target second angle among the second angles as the target seed attitude data;
wherein the absolute value of the difference between the target first angle and the target second angle is less than or equal to a preset angle value, and the first key point and the second key point are non-end key points.
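The angle comparison of claim 4 can be sketched as follows. The function names, the 2-D vectors, and the 15-degree tolerance are assumptions for illustration; the claim only requires that the angle at a non-end key point be derived from the relative position vectors of its incident skeleton edges and that the two angles differ by at most a preset value.

```python
import math

def joint_angle(v_in, v_out):
    """Angle in degrees at a key point, given the relative position
    vectors of the two skeleton edges meeting at that point."""
    dot = v_in[0] * v_out[0] + v_in[1] * v_out[1]
    cos_a = dot / (math.hypot(*v_in) * math.hypot(*v_out))
    return math.degrees(math.acos(cos_a))

def angles_match(video_angle, seed_angle, tol_deg=15.0):
    """Claim 4's test: keep the pair when the absolute difference
    between the two angles is at most the preset angle value."""
    return abs(video_angle - seed_angle) <= tol_deg

# A right-angle elbow in the video frame vs. a slightly straighter one
# in the seed pose: the difference (about 11 degrees) is within tolerance.
video_angle = joint_angle((1.0, 0.0), (0.0, 1.0))  # 90 degrees
seed_angle = joint_angle((1.0, 0.0), (0.2, 1.0))   # about 78.7 degrees
print(angles_match(video_angle, seed_angle))       # True
```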
5. The method according to claim 4, characterized in that, after the step of taking the video attitude data corresponding to the target first angle among the first angles as the target video attitude data, the method further comprises:
establishing a first matrix according to the target video attitude data;
establishing a second matrix according to the target seed attitude data;
performing an affine transformation calculation on the first matrix and the second matrix to obtain a calculation result;
if the calculation result includes an affine matrix being obtained, and the average distance between the second matrix and the product matrix obtained by multiplying the affine matrix with the first matrix is greater than or equal to a preset threshold, deleting the target video attitude data corresponding to the first matrix from all of the target video attitude data;
if the calculation result includes an affine matrix being obtained, and a rotational component value obtained from the affine matrix is greater than or equal to a preset component value, deleting the target video attitude data corresponding to the first matrix from all of the target video attitude data; and
if the calculation result includes no affine matrix being obtained, deleting the target video attitude data corresponding to the first matrix from all of the target video attitude data.
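One plausible reading of the affine step in claim 5, using a least-squares fit: map the candidate key points (first matrix) onto the seed key points (second matrix), then discard the candidate if the residual distance or the rotation component of the fit exceeds its threshold. The lstsq-based fit and the concrete thresholds are assumptions of this sketch, not details fixed by the claim.

```python
import math
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src (N x 2) onto dst (N x 2)."""
    A = np.hstack([src, np.ones((src.shape[0], 1))])  # rows: [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T                                         # 2 x 3

def keep_pose(src, dst, dist_thresh=5.0, rot_thresh_deg=30.0):
    """Keep the candidate pose only when the affine fit both lands close
    to the seed points and involves little rotation (claim 5's two tests)."""
    M = fit_affine(src, dst)
    mapped = src @ M[:, :2].T + M[:, 2]
    mean_dist = float(np.mean(np.linalg.norm(mapped - dst, axis=1)))
    rot_deg = abs(math.degrees(math.atan2(M[1, 0], M[0, 0])))
    return mean_dist < dist_thresh and rot_deg < rot_thresh_deg

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(keep_pose(src, src + np.array([2.0, 3.0])))  # pure translation: True
```

A pure translation passes both tests, while a 90-degree rotation is rejected by the rotation-component test even though it fits the points exactly.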
6. The method according to claim 3, characterized in that, after the step of taking the posture classification category corresponding to the target seed attitude data as the posture classification category corresponding to the video attitude data, the method further comprises:
establishing a video clip according to the target video attitude data; and
determining the posture classification category of the target video attitude data as the posture classification category of the video clip.
7. The method according to claim 1, characterized in that, before the step of matching the video attitude data against the seed attitude data contained in the preset posture search library to determine the posture classification category corresponding to the video attitude data, the method further comprises:
extracting sample attitude data from video samples;
selecting the seed attitude data from the sample attitude data; and
establishing the posture search library containing the seed attitude data.
8. The method according to claim 7, characterized in that the extracting of the attitude data from the video samples comprises:
taking a plurality of video frame images extracted from a first video sample according to a second preset period as first sample attitude data;
classifying the first sample attitude data to obtain a plurality of sample classes;
selecting second sample attitude data from the first sample attitude data contained in each sample class; and
taking a plurality of video frame images extracted from a second video sample according to a third preset period as third sample attitude data;
wherein the second preset period is greater than the third preset period.
9. The method according to claim 8, characterized in that the selecting of the seed attitude data from the sample attitude data comprises:
selecting, from the third sample attitude data, third sample attitude data whose similarity to the second sample attitude data is greater than or equal to a second similarity threshold, as the seed attitude data;
and the establishing of the posture search library containing the seed attitude data comprises:
adding corresponding posture classification categories to the seed attitude data; and
adding the seed attitude data and the posture classification categories to the posture search library.
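Claims 8 and 9 build the library by sampling poses coarsely to obtain class representatives (second sample attitude data) and densely to obtain candidate seeds (third sample attitude data), then keeping the dense samples that closely resemble a representative. A toy sketch, with cosine similarity, the 0.9 threshold, and all data invented for illustration:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_posture_library(second_samples, categories, third_samples,
                          sim_thresh=0.9):
    """File each densely sampled (third) pose under the posture category
    of the first coarse class representative (second sample) whose
    similarity reaches the threshold; poses matching no representative
    are discarded."""
    library = []
    for pose in third_samples:
        for rep, cat in zip(second_samples, categories):
            if cosine(pose, rep) >= sim_thresh:
                library.append((pose, cat))
                break
    return library

reps = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cats = ["jump", "sit"]
dense = [np.array([0.95, 0.05]), np.array([0.5, 0.5])]
print([cat for _, cat in build_posture_library(reps, cats, dense)])  # ['jump']
```

The ambiguous pose `[0.5, 0.5]` matches neither representative at the 0.9 threshold, so it never becomes a seed, which is the filtering effect claim 9 describes.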
10. A video classification device, characterized in that the device comprises:
a first extraction module, configured to extract video attitude data from a target video;
a matching module, configured to match the video attitude data against seed attitude data contained in a preset posture search library to determine a posture classification category corresponding to the video attitude data, the posture search library containing correspondences between the seed attitude data and posture classification categories; and
a determining module, configured to take the posture classification category corresponding to the video attitude data as the classification category of the target video.
11. The device according to claim 10, characterized in that the first extraction module is specifically configured to take a plurality of video frame images extracted from the target video according to a first preset period as the video attitude data.
12. The device according to claim 11, characterized in that the matching module comprises:
a first selection sub-module, configured to select target video attitude data from the video attitude data according to similarity values between the video attitude data and the seed attitude data, the similarity value between the target video attitude data and the target seed attitude data being greater than or equal to a first similarity threshold;
a second selection sub-module, configured to select, from the seed attitude data, target seed attitude data corresponding to the target video attitude data; and
a first determining sub-module, configured to take the posture classification category corresponding to the target seed attitude data as the posture classification category corresponding to the video attitude data.
13. The device according to claim 12, characterized in that each item of video attitude data and each item of seed attitude data comprises a plurality of key points having connection relationships and relative position vectors between the key points;
and the first selection sub-module is specifically configured to:
obtain a first angle corresponding to a target first key point according to relative position vectors between first key points in the video attitude data;
obtain a second angle corresponding to a target second key point according to relative position vectors between second key points in the seed attitude data;
take the video attitude data corresponding to a target first angle among the first angles as the target video attitude data; and
take the seed attitude data corresponding to a target second angle among the second angles as the target seed attitude data;
wherein the absolute value of the difference between the target first angle and the target second angle is less than or equal to a preset angle value, and the first key point and the second key point are non-end key points.
14. The device according to claim 13, characterized in that the first selection sub-module is specifically configured to:
establish a first matrix according to the target video attitude data;
establish a second matrix according to the target seed attitude data;
perform an affine transformation calculation on the first matrix and the second matrix to obtain a calculation result;
if the calculation result includes an affine matrix being obtained, and the average distance between the second matrix and the product matrix obtained by multiplying the affine matrix with the first matrix is greater than or equal to a preset threshold, delete the target video attitude data corresponding to the first matrix from all of the target video attitude data;
if the calculation result includes an affine matrix being obtained, and a rotational component value obtained from the affine matrix is greater than or equal to a preset component value, delete the target video attitude data corresponding to the first matrix from all of the target video attitude data; and
if the calculation result includes no affine matrix being obtained, delete the target video attitude data corresponding to the first matrix from all of the target video attitude data.
15. The device according to claim 12, characterized in that the matching module further comprises:
an establishing sub-module, configured to establish a video clip according to the target video attitude data; and
a second determining sub-module, configured to determine the posture classification category of the target video attitude data as the posture classification category of the video clip.
16. The device according to claim 10, characterized in that the device further comprises:
a second extraction module, configured to extract sample attitude data from video samples;
a selection module, configured to select the seed attitude data from the sample attitude data; and
an establishing module, configured to establish the posture search library containing the seed attitude data.
17. The device according to claim 16, characterized in that the second extraction module is specifically configured to:
take a plurality of video frame images extracted from a first video sample according to a second preset period as first sample attitude data;
classify the first sample attitude data to obtain a plurality of sample classes;
select second sample attitude data from the first sample attitude data contained in each sample class; and
take a plurality of video frame images extracted from a second video sample according to a third preset period as third sample attitude data;
wherein the second preset period is greater than the third preset period.
18. The device according to claim 17, characterized in that the selection module is specifically configured to:
select, from the third sample attitude data, third sample attitude data whose similarity to the second sample attitude data is greater than or equal to a second similarity threshold, as the seed attitude data;
and the establishing module is specifically configured to:
add corresponding posture classification categories to the seed attitude data; and
add the seed attitude data and the posture classification categories to the posture search library.
19. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the video classification method according to any one of claims 1 to 9.
CN201910545220.5A 2019-06-21 2019-06-21 Video classification method and device and computer readable storage medium Active CN110377787B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910545220.5A CN110377787B (en) 2019-06-21 2019-06-21 Video classification method and device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110377787A true CN110377787A (en) 2019-10-25
CN110377787B CN110377787B (en) 2022-03-25

Family

ID=68250503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910545220.5A Active CN110377787B (en) 2019-06-21 2019-06-21 Video classification method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110377787B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001049028A1 (en) * 1999-12-27 2001-07-05 Diamondback Vision, Inc. Scene model generation from video for use in video processing
CN101383899A (en) * 2008-09-28 2009-03-11 北京航空航天大学 Video image stabilizing method for space based platform hovering
CN109376268A (en) * 2018-11-27 2019-02-22 北京微播视界科技有限公司 Video classification methods, device, electronic equipment and computer readable storage medium
CN109508656A (en) * 2018-10-29 2019-03-22 重庆中科云丛科技有限公司 A kind of dancing grading automatic distinguishing method, system and computer readable storage medium
CN109710802A (en) * 2018-12-20 2019-05-03 百度在线网络技术(北京)有限公司 Video classification methods and its device


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781345A (en) * 2019-10-31 2020-02-11 北京达佳互联信息技术有限公司 Video description generation model acquisition method, video description generation method and device
CN110851653A (en) * 2019-11-08 2020-02-28 上海摩象网络科技有限公司 Method and device for shooting material mark and electronic equipment
CN113392741A (en) * 2021-06-04 2021-09-14 北京格灵深瞳信息技术股份有限公司 Video clip extraction method and device, electronic equipment and storage medium
CN113705386A (en) * 2021-08-12 2021-11-26 北京有竹居网络技术有限公司 Video classification method and device, readable medium and electronic equipment
CN114492668A (en) * 2022-02-16 2022-05-13 平安科技(深圳)有限公司 Attitude similarity judgment method, data processing device and storage medium
CN114492668B (en) * 2022-02-16 2024-05-14 平安科技(深圳)有限公司 Method for judging attitude similarity, data processing device and storage medium


Similar Documents

Publication Publication Date Title
CN110377787A (en) A kind of video classification methods, device and computer readable storage medium
US8861884B1 (en) Training classifiers for deblurring images
CN106650662B (en) Target object shielding detection method and device
US9235759B2 (en) Detecting text using stroke width based text detection
KR101469398B1 (en) Text-based 3d augmented reality
CN108734185B (en) Image verification method and device
Choi et al. Depth analogy: Data-driven approach for single image depth estimation using gradient samples
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
US11620730B2 (en) Method for merging multiple images and post-processing of panorama
EP3352138A1 (en) Method and apparatus for processing a 3d scene
CN113220251B (en) Object display method, device, electronic equipment and storage medium
CN113408566A (en) Target detection method and related equipment
GB2579262A (en) Space-time memory network for locating target object in video content
US20090110277A1 (en) Method and apparatus for analysing a plurality of stored images
Yates et al. Evaluation of synthetic aerial imagery using unconditional generative adversarial networks
CN114022804A (en) Leakage detection method, device and system and storage medium
CN109906458B (en) Method and system for generating surface feature codes
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN115222602A (en) Image splicing method, device, equipment and storage medium
Huang et al. Visual attention learning and antiocclusion-based correlation filter for visual object tracking
Pul et al. Automatic functionality verification of hybrid set-top boxes with dynamic user interface
CN112396551A (en) Watermark embedding method and device
CN111325194A (en) Character recognition method, device and equipment and storage medium
WO2021190655A1 (en) Method for merging multiple images and post-processing of panorama

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant