CN113128448B - Video matching method, device, equipment and storage medium based on limb identification - Google Patents


Info

Publication number
CN113128448B
CN113128448B (application CN202110473266.8A)
Authority
CN
China
Prior art keywords
video
matching
matched
limb
sequence
Prior art date
Legal status
Active
Application number
CN202110473266.8A
Other languages
Chinese (zh)
Other versions
CN113128448A (en)
Inventor
刘静
Current Assignee
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202110473266.8A
Publication of CN113128448A
Application granted
Publication of CN113128448B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/48: Matching video sequences
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video matching method, apparatus, computer device and storage medium based on limb identification. The method comprises: obtaining a video to be matched and a standard video; inputting the video to be matched into a preset limb classification model to determine whether it is a normal matching video; if so, inputting the standard video and the video to be matched respectively into a preset gesture recognition model to obtain the standard limb sequences and the limb sequences to be matched; determining a key frame index sequence corresponding to the standard video from the standard limb sequences through an extremum key frame identification method; determining, in the limb sequences to be matched through a maximum average value video segment identification method, a plurality of matching index sequences matched with the key frame index sequence and the sequence matching value corresponding to each matching index sequence; and determining the video matching result of the video to be matched according to the sequence matching values. The invention improves the efficiency and accuracy of video matching based on limb identification.

Description

Video matching method, device, equipment and storage medium based on limb identification
Technical Field
The present invention relates to the field of image matching technologies, and in particular, to a video matching method and apparatus based on limb identification, a computer device, and a storage medium.
Background
In the prior art, a gesture recognition model is introduced to judge limb action matching; for example, in a dance video scoring system, the score of a dance is estimated from the similarity of the positions of recognized limb points. However, this method has the following disadvantages: first, when a landscape video without people appears, human limb points may still be falsely recognized; second, because an approximate similarity of limb point positions is calculated, the difference information between the two videos is weakened, so the gaps between the finally estimated dance scores are small; third, the start frames of the two videos cannot be aligned, that is, the starting times of the actions in the two videos cannot be brought into uniform alignment, so the final limb action matching accuracy is low.
Disclosure of Invention
The embodiment of the invention provides a video matching method, a device, computer equipment and a storage medium based on limb identification, which are used for solving the problem of low accuracy of limb action matching.
A video matching method based on limb identification, comprising:
acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video or not;
If the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
Determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence;
determining a plurality of matching index sequences matched with the key frame index sequences and sequence matching values corresponding to the matching index sequences in the limb sequences to be matched through a maximum average value video segment identification method;
and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
A video matching device based on limb identification, comprising:
The standard video acquisition module is used for acquiring videos to be matched and standard videos corresponding to the videos to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
The matching video judging module is used for inputting the video to be matched into a preset limb classification model so as to determine whether the video to be matched is a normal matching video or not;
the gesture recognition module is used for inputting the standard video and the video to be matched into a preset gesture recognition model respectively if the video to be matched is a normal matching video to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
The key frame identification module is used for determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence;
the sequence matching value determining module is used for determining a plurality of matching index sequences matched with the key frame index sequences and sequence matching values corresponding to the matching index sequences in the limb sequences to be matched through a maximum average value video segment identification method;
And the video matching result determining module is used for determining the video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the video matching method based on limb identification described above when executing the computer program.
A computer readable storage medium storing a computer program which when executed by a processor implements the video matching method based on limb identification described above.
The video matching method, the device, the computer equipment and the storage medium based on limb identification are characterized in that videos to be matched and standard videos corresponding to the videos to be matched are obtained; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image; inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video or not; if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched; determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence; determining a plurality of matching index sequences matched with the key frame index sequences and sequence matching values corresponding to the matching index sequences in the limb sequences to be matched through a maximum average value video segment identification method; and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
The method aligns the two videos by combining an extremum key frame identification method with a maximum average value video segment identification method, and performs low-dimensional corresponding-frame similarity calculation after the aligned segments are found through the maximum and minimum limb elements, which improves the accuracy of video matching. The invention also introduces a preset limb classification model: when the video to be matched is not a normal matching video, the subsequent steps need not be executed to determine the video matching result, which reduces the computational complexity and improves the efficiency and accuracy of video matching.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a video matching method based on limb identification according to an embodiment of the present invention;
FIG. 2 is a flow chart of a video matching method based on limb identification in an embodiment of the invention;
FIG. 3 is a flowchart of step S40 in a video matching method based on limb identification according to an embodiment of the present invention;
FIG. 4 is a flowchart of step S50 in a video matching method based on limb identification according to an embodiment of the present invention;
FIG. 5 is a flowchart of step S503 in a video matching method based on limb identification according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a video matching device based on limb identification in an embodiment of the invention;
FIG. 7 is a schematic block diagram of a key frame identification module in a limb identification-based video matching apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of a sequence matching value determining module in a limb identification-based video matching apparatus in accordance with an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a slide matching unit in a limb identification based video matching apparatus in accordance with an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The video matching method based on limb identification provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. Specifically, the method is applied to a video matching system based on limb identification, which comprises a client and a server as shown in fig. 1; the client and the server communicate through a network, thereby solving the problem of low accuracy of limb action matching. The client, also called the user side, refers to the program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a video matching method based on limb identification is provided, and the method is applied to the server in fig. 1 for illustration, and includes the following steps:
S10: acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
The standard video can be understood as a video verified to contain no errors; the video to be matched refers to the video whose actions are to be compared against the standard video. For example, in a dance video scoring scene, the standard video may be an exemplary video of dance actions without errors, and the videos to be matched may be videos shot by different students imitating the dance actions in the standard video. An image to be matched refers to an image containing limb actions in the video to be matched; in this embodiment, at least one frame of the video to be matched contains limb actions, that is, videos to be matched that do not contain any limb actions are either screened out manually or removed in advance by the preset limb classification model in step S20. A standard image refers to an image containing limb actions in the standard video.
S20: inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video or not;
It can be understood that the preset limb classification model is a neural-network-based binary classification model that can identify whether an image to be matched contains a human body or a limb action. For example, if the image to be matched contains a human body or a limb action, the preset limb classification model marks it as 1, i.e., classifies it into the category containing a human body or limb action; if the image to be matched does not contain a human body or limb action, the model marks it as 0, i.e., classifies it into the category not containing a human body or limb action.
Further, the preset limb classification model in the embodiment may be trained based on the preset gesture recognition model in step S30, that is, after a plurality of standard videos are input into the preset gesture recognition model, the preset gesture recognition model outputs a standard limb sequence corresponding to a standard image including a limb motion, and then inputs the standard limb sequence into the preset limb classification model, so that the preset limb classification model recognizes and marks the standard limb sequence.
In one embodiment, step S20 includes:
determining whether each frame of images to be matched in the video to be matched contains limb actions or not according to the preset limb classification model;
acquiring a first total number of images to be matched containing limb actions and a second total number of all the images to be matched in the video to be matched;
As can be appreciated, the first total number refers to the total number of images to be matched that contain limb actions in the video to be matched; the second total number refers to the total number of all images to be matched in the video to be matched.
And when the ratio between the first total number and the second total number is greater than or equal to the preset number ratio, determining that the video to be matched is a normal matched video.
The preset number ratio may be determined according to the accuracy requirement of the application scene, and may be set to 80%,90% or the like, for example.
Specifically, after determining through the preset limb classification model whether each frame of image to be matched contains limb actions, the first total number of images containing limb actions and the second total number of all images to be matched are recorded; the ratio between the first total number and the second total number is then compared with the preset number ratio. If this ratio is greater than or equal to the preset number ratio, the images containing limb actions meet the quantity requirement, i.e., the video to be matched is determined to be a normal matching video. If the ratio is smaller than the preset number ratio, the video contains too few images with limb actions to meet the quantity requirement, so the video to be matched is determined not to be a normal matching video.
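The quantity check above can be sketched as follows (a minimal illustration; the function name and the per-frame boolean input format are assumptions, not part of the patent):

```python
def is_normal_matching_video(frame_has_limbs, preset_ratio=0.8):
    """Return True when the video counts as a normal matching video.

    frame_has_limbs: one boolean per image to be matched, True when the
    preset limb classification model marked that frame as containing a
    human body or limb action (hypothetical input format).
    """
    second_total = len(frame_has_limbs)    # all images to be matched
    if second_total == 0:
        return False
    first_total = sum(frame_has_limbs)     # images containing limb actions
    # Normal matching video when the ratio reaches the preset number ratio
    return first_total / second_total >= preset_ratio
```

With the illustrative 80% preset number ratio, a 10-frame video with 9 limb-action frames passes the check, while one with only 7 such frames does not.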
Further, if the video to be matched is not the normal matched video, a video uploading error instruction is sent to a preset receiver, so that the preset receiver updates the video to be matched.
If the ratio between the first total number and the second total number is smaller than the preset number ratio, the video to be matched is determined to be an abnormal matching video, i.e., it contains too few image frames with limb actions; therefore, a video-upload error instruction is sent to the preset receiver so that the preset receiver updates the video to be matched, for example by re-uploading a new video to be matched. The preset receiver may be the party that sent the video to be matched.
S30: if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
It can be appreciated that the preset gesture recognition model is used to recognize the limb actions of the images to be matched in the video to be matched, and the limb actions of the standard images in the standard video. The standard limb sequence refers to the sequence combination of the coordinate positions of all limb elements in the standard image, and the limb sequence to be matched refers to the sequence combination of the coordinate positions of all limb elements in the image to be matched.
Further, the limb elements in this embodiment include, but are not limited to, a head limb element, left and right shoulder limb elements, left and right elbow limb elements, left and right wrist limb elements, left and right hip limb elements, left and right knee limb elements, and left and right ankle limb elements. Each frame of standard image corresponds to a standard limb sequence containing the coordinate positions of these 13 limb elements; illustratively, the standard limb sequence may be: [(x00, y00), (x01, y01), (x02, y02), (x03, y03), (x04, y04), (x05, y05), …, (x013, y013)].
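As an illustration of this data structure, a limb sequence can be held as a fixed-order list of coordinate pairs, one per category. The category names and their ordering below are assumptions made for the sketch only:

```python
# Hypothetical fixed ordering of the 13 limb element categories
LIMB_CATEGORIES = [
    "head", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def make_limb_sequence(coords):
    """Build a (standard or to-be-matched) limb sequence: one (x, y)
    coordinate pair per limb element category, in the fixed order above."""
    if len(coords) != len(LIMB_CATEGORIES):
        raise ValueError("expected one coordinate pair per limb category")
    return [tuple(pt) for pt in coords]
```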
Specifically, if the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model, so that the preset gesture recognition model recognizes a to-be-matched image and a standard image containing limb actions, limb element coordinate position labeling is performed on each standard image and each to-be-matched image containing limb actions, and then a standard limb sequence corresponding to each frame of standard image and a to-be-matched limb sequence corresponding to each frame of to-be-matched image are output.
S40: determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence;
it will be appreciated that the extremum key frame identifying method refers to a method of determining the maximum and minimum values of each limb element from all standard limb sequences.
In one embodiment, as shown in fig. 3, in step S40, the standard limb sequence associates an image frame tag; that is, the determining, according to each standard limb sequence, the key frame index sequence corresponding to the standard video by the extremum key frame identification method includes:
S401: obtaining standard limb elements in all the standard limb sequences, wherein each standard limb sequence comprises standard limb elements of a preset number of categories, and each standard limb element of each category in each standard limb sequence is only one;
It is to be understood that, as indicated in step S30, the limb elements in this embodiment include, but are not limited to, the head, left and right shoulder, left and right elbow, left and right wrist, left and right hip, left and right knee, and left and right ankle limb elements; the standard limb elements in the standard limb sequences are these same limb elements. In this embodiment, the preset number of categories is 13. Further, each frame of standard image containing limb actions (this embodiment only discusses standard images containing a single individual) corresponds to one standard limb sequence, so each category of standard limb element occurs exactly once in a standard limb sequence, that is, the coordinate information of each category of standard limb element appears once and only once in each frame of standard image.
S402: extracting the maximum limb element and the minimum limb element in each type of standard limb elements;
It will be understood that the maximum limb element refers to the maximum value of the two-dimensional coordinate information within each category of standard limb element, and the minimum limb element refers to the minimum value. Further, different types of standard video and video to be matched may call for different ways of extracting the maximum and minimum limb elements. For example, when the limb actions in the standard video and the video to be matched involve few squatting or jumping actions, as in square dancing, only the horizontal coordinate of each standard limb element need be considered: the maximum value of the horizontal coordinate is extracted as the maximum limb element and the minimum value as the minimum limb element. If there are many squatting and jumping actions in the standard video and the video to be matched, as in a martial arts action video, the horizontal and vertical coordinates in the two-dimensional coordinate information can be considered together, so that the maximum and minimum limb elements of each category are determined from both coordinates.
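A minimal sketch of the extremum extraction under the simpler horizontal-coordinate-only variant described above (the input layout is an assumption: one list of (x, y) pairs per frame, ordered by category):

```python
def extremum_limb_elements(standard_sequences):
    """For each limb element category, find the frames holding the
    maximum and minimum horizontal coordinate across all frames.

    standard_sequences: list indexed by frame number; each entry is a
    list of (x, y) tuples, one per limb element category.
    Returns two lists of frame indices (image frame tags), one entry
    per category: the frame of the maximum and of the minimum element.
    """
    num_categories = len(standard_sequences[0])
    max_frames, min_frames = [], []
    for c in range(num_categories):
        xs = [seq[c][0] for seq in standard_sequences]  # horizontal coords
        max_frames.append(xs.index(max(xs)))
        min_frames.append(xs.index(min(xs)))
    return max_frames, min_frames
```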
S403: generating a maximum frame index sequence according to all the extracted maximum limb elements and the image frame labels corresponding to the maximum limb elements, and generating a minimum frame index sequence according to all the extracted minimum limb elements and the image frame labels corresponding to the minimum limb elements;
It will be appreciated that the maximum limb element is determined from standard limb elements in all standard limb sequences, and each standard limb sequence corresponds to a frame of standard image, so that the maximum limb element is associated with a frame of standard image, and thus the image frame tag corresponding to each maximum limb element refers to the number of frames of standard image corresponding to each maximum limb element, i.e. the frame ordering of the standard image in standard video. Illustratively, assuming that one type of maximum limb element is one limb element in a 16 th frame standard image from a standard video, an image frame label corresponding to the maximum limb element is 16 frames. Similarly, the image frame label corresponding to each minimum limb element refers to the number of frames of the standard image corresponding to each minimum limb element. Further, since the extremum key frame identification method is adopted in the present embodiment, each type of standard limb element may appear in different standard images, but at the same time, there may be image frame labels corresponding to multiple standard limb elements that are the same, that is, multiple types of maximum limb elements or multiple types of minimum limb elements may appear in one frame of standard image at the same time.
S404: and carrying out sequence merging processing on the maximum frame index sequence and the minimum frame index sequence to obtain the key frame index sequence.
Specifically, after generating the maximum frame index sequence according to all the extracted maximum limb elements and the image frame labels corresponding to the maximum limb elements, and generating the minimum frame index sequence according to all the extracted minimum limb elements and the image frame labels corresponding to the minimum limb elements, merging the maximum frame index sequence and the minimum frame index sequence to obtain a merged key frame index sequence, that is, the key frame index sequence contains all the maximum limb elements and the minimum limb elements, and simultaneously carries the image frame labels corresponding to the maximum limb elements, and the image frame labels are taken as frame indexes, so that the matching index sequence is conveniently determined in step S50.
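The merge step can be sketched as a sorted union of the two index sequences, since, as noted above, one frame may host several extremum limb elements at once:

```python
def key_frame_index_sequence(max_frames, min_frames):
    """Merge the maximum and minimum frame index sequences into a single
    sorted, de-duplicated key frame index sequence of image frame tags."""
    return sorted(set(max_frames) | set(min_frames))
```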
S50: determining a plurality of matching index sequences matched with the key frame index sequences and sequence matching values corresponding to the matching index sequences in the limb sequences to be matched through a maximum average value video segment identification method;
It can be understood that the matching index sequence refers to a sequence in the limb sequences to be matched that matches the maximum and minimum limb elements in the key frame index sequence; the sequence matching value refers to the degree of matching between a matching index sequence and the corresponding limb element video segment, and can be determined by a cosine similarity algorithm.
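Since the text names cosine similarity as one way to compute the sequence matching value, here is a sketch of that computation over two limb sequences; flattening each sequence into a single coordinate vector is an assumption of this illustration:

```python
import math

def cosine_similarity(seq_a, seq_b):
    """Cosine similarity between two limb sequences, each a list of
    (x, y) pairs, flattened into plain coordinate vectors."""
    a = [v for pt in seq_a for v in pt]
    b = [v for pt in seq_b for v in pt]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Degenerate all-zero sequences are given similarity 0
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```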
In one embodiment, as shown in fig. 4, in step S50, the method includes:
S501: selecting a first limb element video segment corresponding to each maximum limb element from the standard video according to the image frame label corresponding to each maximum limb element; the first limb element video segment comprises standard images of a first preset number of frames;
for example, assuming that the image frame tag corresponding to the maximum limb element of one type is the 16 th frame standard image in the standard video, the selected first limb element video segment may be a video segment formed by 20 frames of standard images in the standard video, which take the 16 th frame standard image as a starting frame, and namely, a video segment formed by the standard images of the 16 th frame to the 35 th frame in the standard video is the first limb element video segment.
S502: selecting a second limb element video segment corresponding to each maximum limb element from the video to be matched according to the image frame label corresponding to each maximum limb element; the second limb element video segment comprises images to be matched of a second preset number of frames; the second preset number of frames is greater than the first preset number of frames;
For example, assuming that the image frame tag corresponding to one type of maximum limb element is the 16th frame standard image in the standard video, the selected second limb element video segment may be a segment of 40 frames of images to be matched spanning that tag: the 20 frames starting from the 6th frame together with the 20 frames starting from the 26th frame. That is, the images to be matched of frames 6 to 45 in the video to be matched form the second limb element video segment.
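The two worked examples above can be sketched together (an illustrative sketch; frame labels are treated as list indices, and the 20-frame length and 10-frame margin mirror the example rather than being fixed parameters of the method):

```python
def select_limb_element_segments(standard_frames, candidate_frames, frame_tag,
                                 first_len=20, margin=10):
    """Select the first limb element video segment (first_len standard frames
    starting at the frame tag) and the second limb element video segment
    (the same span widened by `margin` frames on each side)."""
    first_segment = standard_frames[frame_tag:frame_tag + first_len]
    second_start = max(frame_tag - margin, 0)          # clamp at video start
    second_segment = candidate_frames[second_start:frame_tag + first_len + margin]
    return first_segment, second_segment
```

With frame_tag = 16 this reproduces the example: the first segment covers frames 16 to 35 and the second covers frames 6 to 45, so the second preset number of frames (40) is greater than the first (20).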
S503: sliding and matching the first limb element video segment on the second limb element video segment to determine a sliding and matching sequence corresponding to each first limb element video segment in the second limb element video segment;
It can be understood that the sliding matching process is a matching process of the first limb element video segment and the second limb element video segment, so as to determine a sliding matching sequence corresponding to each first limb element video segment in the second limb element video segment.
In one embodiment, as shown in fig. 5, step S503 includes:
S5031: displaying the images to be matched of each frame in the video segment of the second limb element on a preset video time axis according to a time sequence;
S5032: the first limb element video segment is used as a matching sliding window, and after the matching sliding window is aligned with the initial frame of the second limb element video segment, the second limb element video segment which is the same as the first preset number of frames is recorded as a first matching sequence;
It can be understood that, after the first limb element video segment is taken as the matching sliding window, the length of the matching sliding window equals the first preset number of frames of the first limb element video segment. The images to be matched in the second limb element video segment are displayed on a preset video time axis in chronological order (that is, in the order in which the images to be matched were acquired, from earliest to latest). The matching sliding window is then aligned with the starting frame of the second limb element video segment, and the portion of the second limb element video segment covered by the window, namely the image sequence starting from the starting frame and containing the first preset number of images to be matched, is recorded as the first matching sequence.
S5033: moving the matched sliding window on the second limb element video segment by a preset frame number step length in a direction away from the initial frame;
Optionally, the preset frame number step may be determined according to the number of images to be matched in the second limb element video segment. For example, when that number is small, the preset frame number step may be set to 1; when it is large, the preset frame number step may be set to 3 or another larger value.
Specifically, after the first limb element video segment is taken as the matching sliding window, the window is aligned with the starting frame of the second limb element video segment and the frames it covers are recorded as the first matching sequence; the matching sliding window is then moved on the second limb element video segment by the preset frame number step in the direction away from the starting frame.
S5034: adding the images to be matched with the preset frame step length after the last image to be matched in the first matching sequence into the first matching sequence, and deleting the images to be matched with the preset frame step length in the first matching sequence from the initial frame to obtain a second matching sequence;
Specifically, after the matching sliding window is moved on the second limb element video segment by the preset frame number step in the direction away from the starting frame, the images to be matched within the preset frame number step after the last image of the first matching sequence are added to the first matching sequence, and the same number of images to be matched are deleted from the start of the first matching sequence, yielding a second matching sequence whose number of images equals the first preset number of frames. For example, assuming the preset frame number step is 1, after the first matching sequence is obtained the sliding window is moved backward by one frame, so that the window is now aligned starting from the second frame of images to be matched in the second limb element video segment.
S5035: detecting whether the frame number of the images to be matched after a preset frame number step length after the last image to be matched in the second matching sequence is larger than or equal to the preset frame number step length;
S5036: and if the frame number of the images to be matched after the preset frame number step length is smaller than the preset frame number step length after the last image to be matched in the second matching sequence, recording the first matching sequence and the second matching sequence as the sliding matching sequence.
Specifically, after the second matching sequence is obtained, it is detected whether the number of frames of images to be matched remaining after the last image of the second matching sequence is greater than or equal to the preset frame number step. For example, assuming the preset frame number step is 2, after the second matching sequence is obtained the sliding window would need to move backward by two more frames on the second limb element video segment; if only 0 or 1 images to be matched remain after the current window, a third matching sequence of the full window length cannot be generated. Therefore, if the number of remaining frames after the last image of the second matching sequence is smaller than the preset frame number step, no new matching sequence can be generated, and the first matching sequence and the second matching sequence are directly recorded as the sliding matching sequences.
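The sliding process of steps S5031 to S5036 can be sketched as follows (an illustrative sketch; the function and parameter names are assumptions). A window of the first preset number of frames moves over the second limb element video segment by the preset frame number step, producing one matching sequence per position until fewer than `step` frames remain past the current window:

```python
def sliding_match_sequences(second_segment, window_len, step=1):
    """Enumerate the first, second, third, ... matching sequences."""
    sequences = []
    start = 0
    while start + window_len <= len(second_segment):
        # frames currently covered by the matching sliding window
        sequences.append(second_segment[start:start + window_len])
        start += step  # move the window away from the starting frame
    return sequences
```

With a 40-frame second segment, a 20-frame window, and a step of 1, this yields 21 sliding matching sequences; with a step of 2 it yields 11, stopping exactly when the remaining frames fall below the step, as described above.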
In one embodiment, after step S5035, the method further includes:
If the frame number of the images to be matched after the preset frame number step length of the last image to be matched in the second matching sequence is larger than or equal to the preset frame number step length, adding the images to be matched with the preset frame number step length after the last image to be matched in the second matching sequence into the second matching sequence, and deleting the images to be matched with the preset frame number step length in the second matching sequence from the initial frame to obtain a third matching sequence;
Specifically, if the number of frames of images to be matched remaining after the last image of the second matching sequence is greater than or equal to the preset frame number step, the images to be matched within the preset frame number step after the last image of the second matching sequence are added to it, and the same number of images to be matched are deleted from its start, obtaining a third matching sequence.
Detecting whether the frame number of the images to be matched after a preset frame number step length after the last image to be matched in the third matching sequence is larger than or equal to the preset frame number step length;
And if the frame number of the images to be matched after the preset frame number step length is smaller than the preset frame number step length after the last image to be matched in the third matching sequence, recording the first matching sequence, the second matching sequence and the third matching sequence as the sliding matching sequence.
Specifically, adding an image to be matched with a preset frame number step length after the last image to be matched in the second matching sequence to the second matching sequence, deleting the image to be matched with the preset frame number step length in the second matching sequence from a starting frame to obtain a third matching sequence, and detecting whether the frame number of the image to be matched with the preset frame number step length after the last image to be matched in the third matching sequence is larger than or equal to the preset frame number step length; and if the frame number of the images to be matched after the preset frame number step length is smaller than the preset frame number step length after the last image to be matched in the third matching sequence, recording the first matching sequence, the second matching sequence and the third matching sequence as sliding matching sequences.
Further, if the number of frames of images to be matched remaining after the last image of the third matching sequence is greater than or equal to the preset frame number step, the sliding matching window continues to move to obtain a fourth matching sequence; a fifth matching sequence, a sixth matching sequence, and so on may follow in the same way, each determined by the method described above, so no further description is given here.
S504: determining sequence similarity scores between the sliding matching sequences and the corresponding first limb element video segments through a preset similarity algorithm;
Specifically, after the first limb element video segments are slid and matched on the second limb element video segments to determine the sliding matching sequences corresponding to each first limb element video segment, the sequence similarity score between each sliding matching sequence and its corresponding first limb element video segment is determined through a preset similarity algorithm (such as a cosine similarity algorithm). The sequence similarity score represents the degree of similarity between a sliding matching sequence and the first limb element video segment: the higher the score, the higher the similarity, and the lower the score, the lower the similarity.
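A hedged sketch of step S504 follows. The patent names cosine similarity as the preset similarity algorithm; averaging the per-frame cosine similarities over the sequence is an assumed aggregation, and the frame feature vectors are assumed to come from the limb sequences:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sequence_similarity_score(sliding_sequence, first_segment):
    """Average per-frame cosine similarity between two equal-length
    sequences of limb feature vectors."""
    scores = [cosine(a, b) for a, b in zip(sliding_sequence, first_segment)]
    return sum(scores) / len(scores)
```

Identical sequences score 1.0 and orthogonal per-frame features score 0.0, matching the text's reading that a higher score means a higher degree of similarity.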
S505: and recording the maximum sequence similarity score corresponding to the same maximum limb element as the sequence matching value, and recording the sliding matching sequence corresponding to the sequence matching value as the matching index sequence.
It can be understood that there are a plurality of sliding matching sequences for one maximum limb element, and sequence similarity scores corresponding to each sliding matching sequence, so that only the maximum sequence similarity score corresponding to the same maximum limb element needs to be recorded as a sequence matching value, and the sliding matching sequence corresponding to the sequence matching value is recorded as a matching index sequence.
S60: and determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
Specifically, after a plurality of matching index sequences matched with the key frame index sequences in each limb sequence to be matched and sequence matching values corresponding to the matching index sequences are determined through a maximum average value video segment identification method, determining a video matching result of the video to be matched according to the sequence matching values corresponding to each matching index sequence.
In one embodiment, step S60 includes:
Obtaining the total number of index sequences of the matching index sequences;
and determining the video matching result through an average algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of the index sequences.
It is understood that the total number of index sequences is the total number of matching index sequences. Specifically, after the matching index sequences matched with the key frame index sequence in each limb sequence to be matched and their sequence matching values are determined through the maximum average value video segment identification method, the total number of index sequences is obtained, and the ratio of the sum of the sequence matching values corresponding to each matching index sequence to the total number of index sequences, that is, their average, is recorded as the video matching result. The video matching result characterizes the degree of matching between the video to be matched and the standard video; if applied to a dance video scoring scene, the video matching result can serve as the score of the dance in the video to be matched.
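The average algorithm above can be sketched as follows (the function name is an assumption): the video matching result is the sum of the sequence matching values divided by the total number of index sequences.

```python
def video_matching_result(sequence_match_values):
    """Average of the sequence matching values over the
    total number of matching index sequences."""
    return sum(sequence_match_values) / len(sequence_match_values)
```

For example, sequence matching values of 0.8, 0.9, and 1.0 over three matching index sequences give a video matching result of 0.9.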
In this embodiment, the two videos are aligned by combining the extremum key frame identification method and the maximum average value video segment identification method: aligned segments are located through the maximum limb elements and minimum limb elements, and a low-dimensional frame-by-frame similarity calculation is then performed, which improves the accuracy of video matching. This embodiment also introduces a preset limb classification model: when the video to be matched is not a normal matching video, the subsequent steps for determining the video matching result need not be executed, which reduces computational complexity while improving the efficiency and accuracy of video matching.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not limit the implementation of the embodiments of the present invention.
In an embodiment, a video matching device based on limb identification is provided, where the video matching device based on limb identification corresponds to the video matching method based on limb identification in the above embodiment one by one. As shown in fig. 6, the video matching device based on limb recognition includes a standard video acquisition module 10, a matching video judgment module 20, a gesture recognition module 30, a key frame recognition module 40, a sequence matching value determination module 50, and a video matching result determination module 60. The functional modules are described in detail as follows:
The standard video acquisition module 10 is used for acquiring videos to be matched and standard videos corresponding to the videos to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
The matching video judging module 20 is configured to input the video to be matched to a preset limb classification model to determine whether the video to be matched is a normal matching video;
The gesture recognition module 30 is configured to, if the video to be matched is a normal matching video, input the standard video and the video to be matched into a preset gesture recognition model respectively, to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
The key frame identification module 40 is configured to determine a key frame index sequence corresponding to the standard video according to each standard limb sequence by using an extremum key frame identification method;
the sequence matching value determining module 50 is configured to determine, by using a maximum average video segment identification method, a plurality of matching index sequences matched with the key frame index sequences in each limb sequence to be matched and sequence matching values corresponding to the matching index sequences;
The video matching result determining module 60 is configured to determine a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
Preferably, the matching video judging module 20 includes:
The limb action recognition unit is used for determining whether limb actions are contained in each frame of images to be matched in the video to be matched or not through the preset limb classification model;
The image quantity acquisition unit is used for acquiring a first total quantity of images to be matched containing limb actions and a second total quantity of all the images to be matched in the video to be matched;
And the image quantity comparison unit is used for determining that the video to be matched is a normal matched video when the ratio between the first total quantity and the second total quantity is larger than or equal to a preset quantity ratio.
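The gate implemented by the three units above can be sketched as follows (an illustrative sketch; the function name and the 0.8 preset quantity ratio are assumptions, and the per-frame limb action flags stand in for the output of the preset limb classification model):

```python
def is_normal_matching_video(has_limb_action_flags, preset_ratio=0.8):
    """The video is a normal matching video when the ratio of the first
    total number (frames containing limb actions) to the second total
    number (all frames to be matched) reaches the preset quantity ratio."""
    first_total = sum(1 for flag in has_limb_action_flags if flag)
    second_total = len(has_limb_action_flags)
    return second_total > 0 and first_total / second_total >= preset_ratio
```

Only videos passing this check proceed to gesture recognition and matching, which is how the embodiment avoids unnecessary computation on non-matching videos.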
Preferably, as shown in fig. 7, the key frame identification module 40 includes:
A limb element obtaining unit 401, configured to obtain standard limb elements in all the standard limb sequences, where each standard limb sequence includes a preset number of standard limb elements; and only one of each type of said standard limb element in one of said standard limb sequences;
A limb element extracting unit 402, configured to extract a maximum limb element and a minimum limb element in each type of standard limb elements;
A frame index sequence generating unit 403, configured to generate a maximum frame index sequence according to all the extracted maximum limb elements and image frame labels corresponding to the maximum limb elements, and generate a minimum frame index sequence according to all the extracted minimum limb elements and image frame labels corresponding to the minimum limb elements;
and a sequence merging unit 404, configured to perform sequence merging processing on the maximum frame index sequence and the minimum frame index sequence, so as to obtain the key frame index sequence.
Preferably, as shown in fig. 8, the sequence matching value determining module 50 includes:
A first video segment selection unit 501, configured to select, from the standard video, a first limb element video segment corresponding to each maximum limb element according to an image frame tag corresponding to each maximum limb element; the first limb element video segment comprises standard images of a first preset number of frames;
A second video segment selection unit 502, configured to select, from the video to be matched, a second limb element video segment corresponding to each maximum limb element according to an image frame tag corresponding to each maximum limb element; the second limb element video segment comprises images to be matched of a second preset number of frames; the second preset number of frames is greater than the first preset number of frames;
A sliding matching unit 503, configured to slide-match the first limb element video segment on the second limb element video segment, so as to determine a sliding matching sequence corresponding to each first limb element video segment in the second limb element video segment;
A similarity matching unit 504, configured to determine, by using a preset similarity algorithm, a sequence similarity score between each of the sliding matching sequences and the first limb element video segment corresponding to the sliding matching sequence;
and the matching index sequence determining unit 505 is configured to record, as the sequence matching value, a maximum sequence similarity score corresponding to the same maximum limb element, and record, as the matching index sequence, a sliding matching sequence corresponding to the sequence matching value.
Preferably, the slide matching unit 503 includes:
A matching image display subunit 5031, configured to display each frame of image to be matched in the second limb element video segment on a preset video time axis according to a time sequence;
A first matching sequence recording subunit 5032, configured to record, as a first matching sequence, the second limb element video segment that is the same as the first preset number of frames after the first limb element video segment is used as a matching sliding window and the matching sliding window is aligned with a start frame of the second limb element video segment;
A window moving subunit 5033, configured to move the matched sliding window on the second limb element video segment by a preset frame number step in a direction away from the start frame;
A second matching sequence determining subunit 5034, configured to add an image to be matched with a preset frame number step size after a last image to be matched in the first matching sequence to the first matching sequence, and delete the image to be matched with the preset frame number step size in the first matching sequence from the start frame to obtain a second matching sequence;
A first image frame number detection subunit 5035, configured to detect whether the frame number of the image to be matched after the preset frame number step after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step;
A first sliding matching sequence recording subunit 5036, configured to record the first matching sequence and the second matching sequence as the sliding matching sequence if the number of frames of the images to be matched after the preset number of frame steps after the last image to be matched in the second matching sequence is smaller than the preset number of frame steps.
Preferably, the slide matching unit 503 further includes:
A third matching sequence determining subunit, configured to add, if the number of frames of the image to be matched after the preset number of frames step after the last image to be matched in the second matching sequence is greater than or equal to the preset number of frames step, the image to be matched after the preset number of frames step after the last image to be matched in the second matching sequence to the second matching sequence, and delete the image to be matched with the preset number of frames step in the second matching sequence from the start frame, to obtain a third matching sequence;
A second image frame number detection subunit, configured to detect whether the frame number of the image to be matched after a preset frame number step after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step;
And the second sliding matching sequence recording subunit is used for recording the first matching sequence, the second matching sequence and the third matching sequence as the sliding matching sequence if the frame number of the images to be matched after the preset frame number step length is smaller than the preset frame number step length after the last image to be matched in the third matching sequence.
Preferably, the video matching result determination module 60 includes:
an index sequence total number obtaining unit, configured to obtain the index sequence total number of the matching index sequence;
And the video matching result determining unit is used for determining the video matching result through an average algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of the index sequences.
For specific limitations on the limb identification-based video matching apparatus, reference may be made to the above limitations on the limb identification-based video matching method, and no further description is given here. The various modules in the video matching device based on limb identification can be implemented in whole or in part by software, hardware and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data used by the video matching method based on limb identification in the above embodiment. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a video matching method based on limb identification.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the limb identification-based video matching method of the above embodiments when the computer program is executed by the processor.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the limb identification based video matching method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. A video matching method based on limb identification, comprising:
acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video or not;
If the video to be matched is a normal matching video, the standard video and the video to be matched are respectively input into a preset gesture recognition model to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
Determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence;
determining a plurality of matching index sequences matched with the key frame index sequences and sequence matching values corresponding to the matching index sequences in the limb sequences to be matched through a maximum average value video segment identification method;
Determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence;
The determining the key frame index sequence corresponding to the standard video according to each standard limb sequence through an extremum key frame identification method comprises the following steps:
obtaining standard limb elements in all the standard limb sequences, wherein each standard limb sequence comprises standard limb elements of a preset number of categories, and each standard limb element of each category in each standard limb sequence is only one;
Extracting the maximum limb element and the minimum limb element in each type of standard limb elements; the maximum limb element refers to the maximum value of the two-dimensional coordinate information in each type of standard limb element, and the minimum limb element refers to the minimum value of the two-dimensional coordinate information in each type of standard limb element;
Generating a maximum frame index sequence according to all the extracted maximum limb elements and the image frame labels corresponding to the maximum limb elements, and generating a minimum frame index sequence according to all the extracted minimum limb elements and the image frame labels corresponding to the minimum limb elements;
Performing sequence combination processing on the maximum frame index sequence and the minimum frame index sequence to obtain the key frame index sequence;
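The extremum key-frame steps above can be sketched as follows. The data layout (one dict of `{category: (x, y)}` per frame) and the function name are illustrative assumptions, not the patented implementation; the claim does not fix how coordinate values are compared, so tuples are compared lexicographically here.

```python
def extremum_key_frames(limb_sequences):
    """Return a sorted key-frame index sequence.

    limb_sequences: list indexed by frame number; each entry is an
    assumed {category: (x, y)} dict with the same categories per frame.
    A frame is a key frame if any category attains its maximum or
    minimum coordinate value there.
    """
    categories = limb_sequences[0].keys()
    max_frames, min_frames = [], []
    for cat in categories:
        values = [frame[cat] for frame in limb_sequences]
        # Image frame label of the extremum limb element for this category.
        max_frames.append(max(range(len(values)), key=values.__getitem__))
        min_frames.append(min(range(len(values)), key=values.__getitem__))
    # Sequence combination processing: merge, deduplicate, and sort the
    # maximum and minimum frame index sequences.
    return sorted(set(max_frames) | set(min_frames))
```

With one tracked category, the frames holding its largest and smallest coordinates become the key frames.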
The determining, through the maximum average value video segment identification method, the plurality of matching index sequences matched with the key frame index sequence and the sequence matching values corresponding to the matching index sequences in each limb sequence to be matched comprises:
selecting a first limb element video segment corresponding to each maximum limb element from the standard video according to the image frame label corresponding to each maximum limb element; the first limb element video segment comprises standard images of a first preset number of frames;
selecting a second limb element video segment corresponding to each maximum limb element from the video to be matched according to the image frame label corresponding to each maximum limb element; the second limb element video segment comprises images to be matched of a second preset number of frames; the second preset number of frames is greater than the first preset number of frames;
sliding the first limb element video segment over the corresponding second limb element video segment for matching, so as to determine the sliding matching sequences corresponding to each first limb element video segment in the second limb element video segment;
determining a sequence similarity score between each sliding matching sequence and the corresponding first limb element video segment through a preset similarity algorithm;
and recording the maximum sequence similarity score corresponding to the same maximum limb element as the sequence matching value, and recording the sliding matching sequence corresponding to the sequence matching value as the matching index sequence.
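The sliding matching and maximum-score selection described above can be sketched with plain number sequences standing in for limb sequences. The similarity function here (negative mean absolute difference) is a hypothetical stand-in for the unspecified "preset similarity algorithm", and all names are illustrative:

```python
def best_sliding_match(template, candidate, similarity):
    """Slide `template` over `candidate` one position at a time, score
    each aligned window, and keep the best one: the highest score is the
    sequence matching value and its window is the matching index sequence."""
    m = len(template)
    best_score, best_window = float("-inf"), None
    for start in range(len(candidate) - m + 1):
        window = candidate[start:start + m]
        score = similarity(template, window)
        if score > best_score:
            best_score, best_window = score, window
    return best_score, best_window

def neg_mad(a, b):
    # Example similarity: negative mean absolute difference, so that a
    # perfect match scores 0 and worse matches score lower.
    return -sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

A template that appears verbatim inside the candidate is returned with the maximal score of 0.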
2. The method for matching video based on limb identification according to claim 1, wherein the inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video comprises:
determining, through the preset limb classification model, whether each frame of image to be matched in the video to be matched contains a limb action;
acquiring a first total number of images to be matched containing limb actions and a second total number of all the images to be matched in the video to be matched;
and when the ratio of the first total number to the second total number is greater than or equal to a preset number ratio, determining that the video to be matched is a normal matching video.
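Assuming the limb classification model yields one boolean per frame, the ratio test of claim 2 reduces to the sketch below; the 0.8 threshold and all names are illustrative assumptions, not values from the patent:

```python
def is_normal_matching_video(frame_has_limb_action, preset_ratio=0.8):
    """frame_has_limb_action: one boolean per frame of the video to be
    matched, True when the (assumed) limb classification model detects
    a limb action in that frame."""
    flags = list(frame_has_limb_action)
    first_total = sum(flags)   # images to be matched containing limb actions
    second_total = len(flags)  # all images to be matched
    return second_total > 0 and first_total / second_total >= preset_ratio
```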
3. The method for matching video based on limb identification according to claim 1, wherein the sliding matching of the first limb element video segment over the second limb element video segment to determine the sliding matching sequences corresponding to each first limb element video segment in the second limb element video segment comprises:
displaying each frame of image to be matched in the second limb element video segment on a preset video time axis in chronological order;
taking the first limb element video segment as a matching sliding window, aligning the matching sliding window with the initial frame of the second limb element video segment, and recording the first preset number of frames of the second limb element video segment covered by the window as a first matching sequence;
moving the matching sliding window over the second limb element video segment by a preset frame number step in the direction away from the initial frame;
adding, to the first matching sequence, the preset frame number step of images to be matched that follow the last image to be matched in the first matching sequence, and deleting from the first matching sequence the preset frame number step of images to be matched counted from the initial frame, to obtain a second matching sequence;
detecting whether the number of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step;
and if the number of images to be matched remaining after the last image to be matched in the second matching sequence is smaller than the preset frame number step, recording the first matching sequence and the second matching sequence as the sliding matching sequences.
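The window enumeration of claim 3 (and its continuation in claim 4) can be sketched as follows, with frames represented by a plain list; the stop condition — fewer than one step of frames remaining past the current window — follows the claim, while the names are illustrative:

```python
def sliding_matching_sequences(frames, window_len, step):
    """Record the first window_len frames, then repeatedly advance the
    window by `step` frames, stopping once fewer than `step` frames
    remain past the current window; return every window visited."""
    sequences = [frames[:window_len]]  # the first matching sequence
    start = 0
    while len(frames) - (start + window_len) >= step:
        start += step
        # Equivalent to appending `step` new frames and dropping the
        # `step` oldest ones, as the claim describes.
        sequences.append(frames[start:start + window_len])
    return sequences
```

For seven frames, a three-frame window and a two-frame step, three matching sequences are produced before fewer than two frames remain.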
4. The method for matching video based on limb recognition according to claim 3, wherein the detecting whether the number of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step comprises:
if the number of images to be matched remaining after the last image to be matched in the second matching sequence is greater than or equal to the preset frame number step, adding, to the second matching sequence, the preset frame number step of images to be matched that follow the last image to be matched in the second matching sequence, and deleting from the second matching sequence the preset frame number step of images to be matched counted from the initial frame, to obtain a third matching sequence;
detecting whether the number of images to be matched remaining after the last image to be matched in the third matching sequence is greater than or equal to the preset frame number step;
and if the number of images to be matched remaining after the last image to be matched in the third matching sequence is smaller than the preset frame number step, recording the first matching sequence, the second matching sequence and the third matching sequence as the sliding matching sequences.
5. The method for matching video based on limb identification according to claim 1, wherein the determining the video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence comprises:
obtaining the total number of index sequences of the matching index sequences;
and determining the video matching result through an average algorithm according to the sequence matching value corresponding to each matching index sequence and the total number of the index sequences.
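Read literally, the "average algorithm" of claim 5 reduces to a mean over the per-key-frame sequence matching values; a minimal sketch under that reading, with illustrative names:

```python
def video_matching_result(sequence_matching_values):
    """Average the sequence matching value of every matching index
    sequence into one overall video matching score."""
    total = len(sequence_matching_values)  # total number of index sequences
    return sum(sequence_matching_values) / total
```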
6. A limb-identification-based video matching apparatus for performing the limb-identification-based video matching method according to any one of claims 1 to 5, the limb-identification-based video matching apparatus comprising:
a standard video acquisition module, used for acquiring a video to be matched and a standard video corresponding to the video to be matched; the video to be matched comprises at least one frame of image to be matched; the standard video comprises at least one frame of standard image;
a matching video judging module, used for inputting the video to be matched into a preset limb classification model to determine whether the video to be matched is a normal matching video;
a gesture recognition module, used for inputting, if the video to be matched is a normal matching video, the standard video and the video to be matched respectively into a preset gesture recognition model to obtain a standard limb sequence corresponding to each frame of standard image and a limb sequence to be matched corresponding to each frame of image to be matched;
a key frame identification module, used for determining a key frame index sequence corresponding to the standard video through an extremum key frame identification method according to each standard limb sequence;
a sequence matching value determining module, used for determining, in each limb sequence to be matched through a maximum average value video segment identification method, a plurality of matching index sequences matched with the key frame index sequence and a sequence matching value corresponding to each matching index sequence;
and a video matching result determining module, used for determining a video matching result of the video to be matched according to the sequence matching value corresponding to each matching index sequence.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the limb identification based video matching method according to any of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the limb identification based video matching method of any one of claims 1 to 5.
CN202110473266.8A 2021-04-29 2021-04-29 Video matching method, device, equipment and storage medium based on limb identification Active CN113128448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473266.8A CN113128448B (en) 2021-04-29 2021-04-29 Video matching method, device, equipment and storage medium based on limb identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110473266.8A CN113128448B (en) 2021-04-29 2021-04-29 Video matching method, device, equipment and storage medium based on limb identification

Publications (2)

Publication Number Publication Date
CN113128448A CN113128448A (en) 2021-07-16
CN113128448B true CN113128448B (en) 2024-05-24

Family

ID=76780598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473266.8A Active CN113128448B (en) 2021-04-29 2021-04-29 Video matching method, device, equipment and storage medium based on limb identification

Country Status (1)

Country Link
CN (1) CN113128448B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926910A (en) * 2022-07-18 2022-08-19 科大讯飞(苏州)科技有限公司 Action matching method and related equipment thereof
CN116055684B (en) * 2023-01-18 2023-12-12 广州乐体科技有限公司 Online physical education system based on picture monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107707839A (en) * 2017-09-11 2018-02-16 广东欧珀移动通信有限公司 Image processing method and device
CN109035145A (en) * 2018-08-02 2018-12-18 广州市鑫广飞信息科技有限公司 Video frequency image self adaption joining method and device based on video frame match information
US10361802B1 (en) * 1999-02-01 2019-07-23 Blanding Hovenweep, Llc Adaptive pattern recognition based control system and method
CN110321754A (en) * 2018-03-28 2019-10-11 西安铭宇信息科技有限公司 A kind of human motion posture correcting method based on computer vision and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9740293B2 (en) * 2009-04-02 2017-08-22 Oblong Industries, Inc. Operating environment with gestural control and multiple client devices, displays, and users
CN103971400B (en) * 2013-02-06 2018-02-02 阿里巴巴集团控股有限公司 A kind of method and system of the three-dimension interaction based on identification code


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of a classification algorithm workflow for sign language video databases based on hand shape recognition; 朱兆松; 张胜伟; 杨锴; 胡作进; Industrial Innovation Research (12); full text *

Also Published As

Publication number Publication date
CN113128448A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN109389030B (en) Face characteristic point detection method and device, computer equipment and storage medium
CN110070030B (en) Image recognition and neural network model training method, device and system
CN109472213B (en) Palm print recognition method and device, computer equipment and storage medium
CN111062215A (en) Named entity recognition method and device based on semi-supervised learning training
CN113128448B (en) Video matching method, device, equipment and storage medium based on limb identification
CN109285105A (en) Method of detecting watermarks, device, computer equipment and storage medium
CN111160275B (en) Pedestrian re-recognition model training method, device, computer equipment and storage medium
CN110263847B (en) Track acquisition method and device, computer equipment and storage medium
CN111126339A (en) Gesture recognition method and device, computer equipment and storage medium
CN110660078B (en) Object tracking method, device, computer equipment and storage medium
CN113705685B (en) Disease feature recognition model training, disease feature recognition method, device and equipment
CN110705489B (en) Training method and device for target recognition network, computer equipment and storage medium
CN113435330B (en) Video-based micro-expression recognition method, device, equipment and storage medium
CN112989962A (en) Track generation method and device, electronic equipment and storage medium
WO2021169642A1 (en) Video-based eyeball turning determination method and system
CN110059688B (en) Picture information identification method, device, computer equipment and storage medium
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN111008621A (en) Object tracking method and device, computer equipment and storage medium
CN110781818A (en) Video classification method, model training method, device and equipment
CN112836682B (en) Method, device, computer equipment and storage medium for identifying object in video
CN114519401A (en) Image classification method and device, electronic equipment and storage medium
CN116959216A (en) Experimental operation monitoring and early warning method, device and system
US20220375202A1 (en) Hierarchical sampling for object identification
CN116805522A (en) Diagnostic report output method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant