WO2021174697A1 - Human body posture evaluation method, apparatus, computer device, and storage medium - Google Patents

Human body posture evaluation method, apparatus, computer device, and storage medium

Info

Publication number
WO2021174697A1
WO2021174697A1 (PCT/CN2020/093332)
Authority
WO
WIPO (PCT)
Prior art keywords
evaluated
image
standard
coordinate
human body
Prior art date
Application number
PCT/CN2020/093332
Other languages
English (en)
French (fr)
Inventor
陈嘉莉
周超勇
刘玉宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174697A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Definitions

  • This application relates to the field of image processing in the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for evaluating human posture.
  • the embodiments of the present application provide a human body posture evaluation method, device, computer equipment, and storage medium to solve the problem that the human body posture cannot be quickly and accurately evaluated.
  • a method for evaluating human posture including:
  • the standard comparison data includes standard coordinates and a confidence level corresponding to each of the standard coordinates
  • the similarity between the coordinate to be evaluated and the standard coordinate is calculated by the confidence level to obtain the evaluation information of the image to be evaluated.
  • a method for evaluating human posture including:
  • to-be-processed video data is video data including a user's posture recorded by a video capture device
  • an evaluation score of the video data to be processed is calculated.
  • a human body posture evaluation device includes:
  • An image acquisition module to be evaluated configured to acquire an image to be evaluated, where the image to be evaluated is an image that includes a user's posture
  • the input module is configured to input the image to be evaluated into a preset human body pose estimation network to obtain human body key point data, where the human body key point data includes key point coordinates and a human body enclosing frame;
  • a zoom processing module configured to perform zoom processing on the image to be evaluated through the human body enclosing frame, and perform coordinate transformation on the coordinates of the key points according to the image to be evaluated after the zoom processing, to obtain the coordinates to be evaluated;
  • the standard comparison data acquisition module is used to acquire standard comparison data, where the standard comparison data includes standard coordinates and a confidence level corresponding to each of the standard coordinates;
  • the similarity calculation module is configured to calculate the similarity between the coordinate to be evaluated and the standard coordinate according to the confidence, to obtain the evaluation information of the image to be evaluated.
  • a human body posture evaluation device includes:
  • the to-be-processed video data acquisition module is used to acquire the to-be-processed video data, and the to-be-processed video data is the video data including the user's posture recorded by the video acquisition device;
  • the extraction module is used to extract a set of images to be evaluated from the video data to be processed according to a preset time node, and the set of images to be evaluated includes N images to be evaluated;
  • the evaluation module is used to use the human posture evaluation method to evaluate each image to be evaluated in the image set to be evaluated, and obtain the evaluation information of each image to be evaluated;
  • the evaluation score calculation module is used to calculate the evaluation score of the video data to be processed according to the evaluation information of each image to be evaluated.
  • a computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and when the processor executes the computer-readable instructions, the following steps are implemented:
  • the standard comparison data includes standard coordinates and a confidence level corresponding to each of the standard coordinates
  • the similarity between the coordinate to be evaluated and the standard coordinate is calculated by the confidence level to obtain the evaluation information of the image to be evaluated.
  • a computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, and when the processor executes the computer-readable instructions, the following steps are implemented:
  • to-be-processed video data is video data including a user's posture recorded by a video capture device
  • an evaluation score of the video data to be processed is calculated.
  • One or more readable storage media storing computer readable instructions, when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
  • the standard comparison data includes standard coordinates and a confidence level corresponding to each of the standard coordinates
  • the similarity between the coordinate to be evaluated and the standard coordinate is calculated by the confidence level to obtain the evaluation information of the image to be evaluated.
  • This application evaluates and analyzes the similarity between each coordinate to be evaluated and the standard coordinate in the image to be evaluated, and realizes a rapid and accurate assessment of the user's human body posture, thereby facilitating the provision of accurate exercise guidance and targeted training feedback to the user.
  • FIG. 1 is a schematic diagram of an application environment of the human body posture evaluation method in an embodiment of the present application
  • FIG. 2 is an example diagram of a human body posture evaluation method in an embodiment of the present application
  • FIG. 3 is another example diagram of a human body posture evaluation method in an embodiment of the present application.
  • FIG. 4 is another example diagram of a human body posture evaluation method in an embodiment of the present application.
  • FIG. 5 is another example diagram of a human body posture evaluation method in an embodiment of the present application.
  • Fig. 6 is another example diagram of a human body posture evaluation method in an embodiment of the present application.
  • Fig. 7 is a functional block diagram of a human body posture evaluation device in an embodiment of the present application.
  • FIG. 8 is another principle block diagram of the human body posture evaluation device in an embodiment of the present application.
  • Fig. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the human body posture evaluation method provided by the embodiment of the present application can be applied in the application environment as shown in FIG. 1.
  • the human body posture evaluation method is applied to a human body posture evaluation system.
  • the human body posture evaluation system includes a client and a server as shown in FIG. 1.
  • the client, also called the user end, refers to the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented with an independent server or a server cluster composed of multiple servers.
  • a method for evaluating human posture is provided.
  • the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S11 Obtain an image to be evaluated, and the image to be evaluated is an image including a user's posture.
  • the image to be evaluated refers to the image whose human body posture is to be evaluated.
  • the image to be evaluated includes an image of the user's posture, that is, the image to be evaluated includes the user's standing posture, sitting posture, kneeling posture, or any other posture.
  • an image containing the user's posture can be collected in real time through a camera as the image to be evaluated, or an image containing the user's posture can be collected in advance as the image to be evaluated, or the user's posture image can be obtained directly from the user posture library as the image to be evaluated.
  • the user's posture image can also be obtained from the data set disclosed by the Internet or a third-party organization/platform as the image to be evaluated, such as a fitness guidance App.
  • S12 Input the image to be evaluated into a preset human body posture estimation network to obtain human body key point data.
  • the human body key point data includes key point coordinates and a human body enclosing frame.
  • the human body pose estimation network refers to a pre-built network framework that can recognize the user's pose in the image to be evaluated, and output a recognition result, that is, the key point data of the human body.
  • the human body pose estimation network uses the OpenPose framework.
  • OpenPose is an open-source library based on convolutional neural networks and supervised learning, using Caffe as its framework. It can track human facial expressions, torso, limbs, and even fingers, and can output the locations of key points of the human face, key points of the human hands, and the various joints of the human body. The OpenPose framework is suitable not only for a single person but also for multiple people, and has good robustness.
  • the image to be evaluated is input into a preset human body pose estimation network, which connects the human body's joints (neck, shoulder, elbow, etc.) and calculates the relative positions of the human body's key points in three-dimensional space to observe how the key point positions change, thereby estimating the posture of the human body and obtaining the human body key point data.
  • the key point data of the human body includes the coordinates of the key point and the enclosing frame of the human body.
  • the key point coordinate refers to the coordinate position of the key point of the human body posture recognized from the image to be evaluated.
  • the key points of human body posture can be 14 key points such as head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle.
  • the key point coordinates can be represented by a plane two-dimensional coordinate.
  • each image to be evaluated includes multiple key point coordinates.
  • the human body enclosing frame refers to the outer enclosing bounding box surrounding the key points of the human body.
  • the human body enclosing frame can be represented by the coordinate values of four points on the enclosing frame.
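As an illustrative sketch only (not the patent's implementation), the human body key point data described above, i.e. the 14 key point coordinates plus a four-corner human body enclosing frame, could be represented as follows; all names are assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The 14 human-posture key points named in this section.
KEYPOINT_NAMES = [
    "head", "neck", "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow", "left_wrist", "right_wrist",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

@dataclass
class HumanKeypointData:
    # key point name -> plane two-dimensional coordinate (x, y)
    keypoints: Dict[str, Tuple[float, float]]
    # human body enclosing frame, represented by its four corner coordinates
    enclosing_frame: List[Tuple[float, float]]

data = HumanKeypointData(
    keypoints={name: (0.0, 0.0) for name in KEYPOINT_NAMES},
    enclosing_frame=[(0.0, 0.0), (100.0, 0.0), (100.0, 200.0), (0.0, 200.0)],
)
```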
  • S13 Perform zoom processing on the image to be evaluated through the human body enclosing frame, and perform coordinate transformation on the key point coordinates according to the zoomed image to be evaluated, to obtain the coordinates to be evaluated.
  • the coordinates to be evaluated refer to coordinate information obtained after coordinate transformation of the key point coordinates in the image to be evaluated.
  • scaling the image to be evaluated through the human body enclosing frame refers to cropping the image to be evaluated according to the human body enclosing frame so that only the part of the image within the human body enclosing frame is retained, and then scaling the cropped image to be evaluated according to the preset standard size. It is understandable that the image to be evaluated after the scaling process only includes the image part within the human body enclosing frame.
  • the size of the image to be evaluated after the scaling process may be the same or different from the size of the image to be evaluated before the scaling process.
  • first crop the image to be evaluated according to the human body enclosing frame, that is, cut off the part outside the human body enclosing frame in the image to be evaluated, and then scale the cropped image to be evaluated to an image of the same size as the image to be evaluated before cropping.
  • the coordinates of the key points in the image to be evaluated change correspondingly; that is, the coordinates of the corresponding key points are transformed at the same zoom ratio as the image to be evaluated, to obtain the coordinates to be evaluated.
  • for example, the key point coordinates of human body key point A in the image to be evaluated are (x1, y1). If the cropped image to be evaluated is magnified by 2 times, the coordinates to be evaluated of human body key point A are (2x1, 2y1).
  • S14 Obtain standard comparison data. The standard comparison data refers to the pre-collected standard data used to evaluate whether the posture of the human body in the image to be evaluated meets the requirements.
  • the standard comparison data may be a reference image of the human body posture of a fitness coach, or a reference image of the human body posture that meets the requirements after evaluation and screening of the human body posture estimation network.
  • the standard comparison data obtained in this step and the human body posture in the image to be evaluated belong to the same posture type data.
  • the fitness video of the fitness trainer can be input into the preset human posture estimation network in advance, and the multiple output reference images of the human posture, the standard coordinates included in each reference image of the human posture, and the confidence level corresponding to each standard coordinate are stored in the database of the server.
  • the standard comparison data includes standard coordinates and the confidence level corresponding to each standard coordinate.
  • Standard coordinates refer to the position information corresponding to each key point of the human body in the standard comparison data.
  • standard coordinates can also be represented by a plane two-dimensional coordinate. Confidence is the probability value used to show that each standard coordinate position is correct. Understandably, the degree of importance of each standard coordinate can be judged according to the degree of confidence corresponding to each standard coordinate.
  • for example, the standard comparison data includes standard coordinates (x1, y1) with corresponding confidence level PA, standard coordinates (x2, y2) with corresponding confidence level PB, and so on. If PA > PB, the importance of standard coordinates (x1, y1) is greater than that of standard coordinates (x2, y2).
  • S15 Calculate the similarity between the coordinate to be evaluated and the standard coordinate by the confidence, to obtain the evaluation information of the image to be evaluated.
  • the evaluation information refers to information used to evaluate the accuracy of the user's posture in the image to be evaluated.
  • the evaluation information may be a specific evaluation score. The higher the evaluation score, the more accurate the user posture in the image to be evaluated.
  • the weight of each standard coordinate can be set according to the confidence corresponding to each standard coordinate.
  • the standard coordinate with high confidence corresponds to a higher weight
  • the standard coordinate with low confidence corresponds to a lower weight
  • a similarity algorithm, such as the cosine similarity algorithm, is used to calculate the similarity between each coordinate to be evaluated and the corresponding standard coordinate to obtain initial similarity values; all initial similarity values are then weighted and combined to obtain the evaluation information of the image to be evaluated.
  • the evaluation calculation formula can be defined in advance according to the confidence and similarity algorithm, and then the evaluation calculation formula is directly used to calculate the similarity between each coordinate to be evaluated and the corresponding standard coordinate, so as to obtain the evaluation information of the image to be evaluated.
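The confidence-weighted calculation described above can be sketched as follows, assuming cosine similarity as the per-key-point similarity algorithm and the confidence values used directly as weights (the function names and the exact weighting scheme are illustrative, not the patent's defined formula):

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def cosine_similarity(a: Point, b: Point) -> float:
    """Cosine similarity between two 2D coordinate vectors."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb)

def weighted_evaluation(to_evaluate: List[Point],
                        standard: List[Point],
                        confidence: List[float]) -> float:
    """Weight each initial per-key-point similarity by the confidence of
    the corresponding standard coordinate, then combine into one value."""
    sims = [cosine_similarity(f, g) for f, g in zip(to_evaluate, standard)]
    return sum(w * s for w, s in zip(confidence, sims)) / sum(confidence)

# Identical coordinates yield a combined similarity of (about) 1.0,
# regardless of the confidence weights.
score = weighted_evaluation([(1.0, 2.0), (3.0, 4.0)],
                            [(1.0, 2.0), (3.0, 4.0)],
                            [0.9, 0.5])
```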
  • in the embodiment of the present application, the image to be evaluated is obtained, which is an image including the posture of the user; the image to be evaluated is input into a preset human posture estimation network to obtain human body key point data, which includes key point coordinates and a human body enclosing frame; the image to be evaluated is scaled through the human body enclosing frame, and the coordinates of the key points are transformed according to the scaled image to be evaluated to obtain the coordinates to be evaluated; the standard comparison data is obtained, which includes standard coordinates and the confidence level corresponding to each standard coordinate; and the similarity between the coordinates to be evaluated and the standard coordinates is calculated by the confidence level to obtain the evaluation information of the image to be evaluated. By evaluating and analyzing the similarity between each coordinate to be evaluated and the standard coordinate in the image to be evaluated, a quick and accurate assessment of the user's human body posture is achieved, facilitating the provision of accurate exercise guidance and targeted feedback to the user.
  • calculating the similarity between the coordinate to be evaluated and the standard coordinate by the confidence degree, to obtain the evaluation information of the image to be evaluated specifically includes the following steps:
  • S151 Calculate the similarity between the coordinates to be evaluated and the standard coordinates by the preset formula D(F, G) = Σ_{k=1}^{K} P(g_k) · ||f_k − g_k||, where D(F, G) is the similarity between the coordinates to be evaluated and the standard coordinates, f_k is the k-th coordinate to be evaluated, g_k is the k-th standard coordinate, P(g_k) is the confidence level corresponding to the k-th standard coordinate, and K is the number of standard coordinates. That is, the distance ||f_k − g_k|| between the k-th coordinate to be evaluated and the k-th standard coordinate is multiplied by the confidence level corresponding to the k-th standard coordinate to obtain the similarity between the k-th coordinate to be evaluated and the k-th standard coordinate. The preset formula above can be used directly to obtain the similarity between all the coordinates to be evaluated and the corresponding standard coordinates in the image to be evaluated.
  • the similarity between the coordinate to be evaluated and the standard coordinate can be represented by a specific numerical value, and a similarity of 1 means that the coordinate to be evaluated and the standard coordinate are exactly the same.
  • the similarity between the coordinate to be evaluated and the standard coordinate can be 0.8, 0.85 or 0.9.
  • S152 Transform the similarity to obtain evaluation information of the image to be evaluated.
  • the similarity can be converted according to a preset conversion rule to obtain the evaluation information of the image to be evaluated.
  • the conversion rule may be to first convert the similarity between the coordinate to be evaluated and the standard coordinate into a specific evaluation score, and then give specific evaluation suggestions based on the evaluation score, so as to obtain the evaluation information of the image to be evaluated. For example: if the similarity between the coordinate to be evaluated and the standard coordinate is 0.9, the corresponding evaluation score can be 90 points.
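The conversion rule in this example (similarity 0.9 corresponding to 90 points, plus an evaluation suggestion keyed to the score) might look like the following sketch; the thresholds and suggestion texts are invented for illustration:

```python
def to_evaluation_info(similarity: float) -> dict:
    """Convert a similarity in [0, 1] into an evaluation score plus a
    suggestion. Thresholds and messages are illustrative only."""
    score = round(similarity * 100)
    if score >= 90:
        suggestion = "Posture is accurate; keep it up."
    elif score >= 70:
        suggestion = "Posture is close to standard; minor adjustments needed."
    else:
        suggestion = "Posture deviates from the standard; please adjust."
    return {"score": score, "suggestion": suggestion}

info = to_evaluation_info(0.9)  # score: 90
```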
  • the similarity between the coordinates to be evaluated and the standard coordinates is calculated by the following formula: D(F, G) = Σ_{k=1}^{K} P(g_k) · ||f_k − g_k||, where D(F, G) is the similarity between the coordinates to be evaluated and the standard coordinates, f_k is the k-th coordinate to be evaluated, g_k is the k-th standard coordinate, P(g_k) is the confidence level corresponding to the k-th standard coordinate, and K is the number of standard coordinates;
  • the similarity is transformed to obtain the evaluation information of the image to be evaluated, thereby improving the accuracy of the generated evaluation information of the image to be evaluated.
  • in an embodiment, the image to be evaluated is scaled through the human body enclosing frame, and the coordinates of the key points are transformed according to the scaled image to be evaluated to obtain the coordinates to be evaluated, which specifically includes the following steps:
  • S131 Crop the image to be evaluated through the human body enclosing frame, and scale the cropped image to be evaluated according to a preset standard size.
  • cropping the image to be evaluated through the human body enclosing frame refers to the process of cutting off the part outside the human body enclosing frame in the image to be evaluated and retaining only the part inside the human body enclosing frame.
  • an image cropping tool can be used to implement cropping processing of the image to be evaluated.
  • the image cropping tool can be jQuery Jcrop image cropping tool or FOTOE image cropping tool, etc.
  • the image segmentation algorithm of opencv can also be used to automatically realize the cropping of the image to be evaluated.
  • the preset standard size refers to a preset standard image size.
  • the preset standard size can be 600*600, 750*750 or 800*800, etc.
  • the standard size is set to the same size as the image size of the image to be evaluated before cropping.
  • an image scaling algorithm can be used to scale the cropped image to be evaluated; or an image scaling tool can be used to scale the cropped image to be evaluated to obtain the image to be evaluated with the same size as the preset standard size.
  • the image scaling algorithm may be a bilinear interpolation algorithm or a trilinear convolution interpolation algorithm.
  • the image zoom tool can be photoshop, iResizer or FastStone Photo Resizer.
  • S132 Perform coordinate transformation on the key point coordinates through the scaling parameters. The scaling parameter refers to the parameter obtained from the ratio between the standard size and the image size of the cropped image to be evaluated. For example: if the image size of the cropped image to be evaluated is 600*600 and the standard size is 800*1000, the scaling parameter obtained is (4/3, 5/3), where 4/3 is the scaling ratio in the x-axis direction and 5/3 is the scaling ratio in the y-axis direction. Specifically, the coordinate transformation of the key points is performed at the same ratio using the obtained scaling parameter.
  • for example, if the key point coordinates are (12, 15) and the scaling parameter is (4/3, 5/3), the key point coordinates after coordinate transformation are (16, 25); that is, the x-axis value 12 in the key point coordinates is scaled by 4/3 times, and the y-axis value 15 in the key point coordinates is scaled by 5/3 times.
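The worked example above (cropped size 600*600, standard size 800*1000, key point (12, 15)) can be reproduced with a minimal sketch:

```python
from typing import Tuple

def scaling_parameters(cropped: Tuple[int, int],
                       standard: Tuple[int, int]) -> Tuple[float, float]:
    """Per-axis scaling ratios from the cropped image size to the standard size."""
    return standard[0] / cropped[0], standard[1] / cropped[1]

def transform_keypoint(point: Tuple[float, float],
                       scale: Tuple[float, float]) -> Tuple[float, float]:
    """Transform a key point coordinate at the same per-axis ratio."""
    return point[0] * scale[0], point[1] * scale[1]

scale = scaling_parameters((600, 600), (800, 1000))  # (4/3, 5/3)
pt = transform_keypoint((12.0, 15.0), scale)         # close to (16.0, 25.0)
```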
  • S133 Perform L1 normalization or L2 normalization processing on the key point coordinates after coordinate transformation to obtain the coordinates to be evaluated.
  • the key point coordinates after coordinate transformation are regarded as a vector array, and then L1 or L2 normalization processing is performed on the vectors in the vector array to obtain the coordinates to be evaluated.
  • L1 normalization of the vectors in the vector array scales each vector so that the sum of the absolute values of its components equals 1, while L2 normalization scales each vector to unit Euclidean norm.
  • the user can choose either normalization method according to the actual situation; this solution does not impose specific restrictions.
  • the key point coordinates after coordinate transformation are subjected to L1 or L2 normalization processing, thereby ensuring the accuracy of the generated coordinates to be evaluated.
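A minimal pure-Python sketch of the two normalization options, treating the transformed key point coordinates as one flattened vector (the flat layout is an assumption; a per-vector scheme is equally possible):

```python
import math
from typing import List

def l1_normalize(vec: List[float]) -> List[float]:
    """Scale the vector so the sum of absolute values of its components is 1."""
    norm = sum(abs(v) for v in vec)
    return [v / norm for v in vec] if norm else list(vec)

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale the vector to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

coords = [16.0, 25.0, 40.0, 19.0]  # flattened (x, y) key point pairs
l1 = l1_normalize(coords)
l2 = l2_normalize(coords)
```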
  • in the embodiment of the present application, the image to be evaluated is cropped through the human body enclosing frame, and the cropped image to be evaluated is scaled according to a preset standard size to generate scaling parameters; the key point coordinates are transformed through the scaling parameters; and the key point coordinates after coordinate transformation are subjected to L1 or L2 normalization to obtain the coordinates to be evaluated. This ensures the accuracy of the generated coordinates to be evaluated, and further improves the accuracy of the subsequent similarity calculation using the coordinates to be evaluated and the standard data.
  • a method for evaluating a human body posture is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:
  • the video data to be processed is video data including a user's posture recorded by a video capture device.
  • the video data to be processed is the original video data to be processed.
  • the video data to be processed is video data including the user's posture recorded by a video capture device.
  • the video capture device sends the recorded video data to be processed to the server, and the server obtains the video data to be processed.
  • the preset time node refers to a preset time point for extracting the image to be evaluated for the video data to be processed.
  • the time node can be 1 minute 23 seconds, 2 minutes 23 seconds, and so on. Understandably, there may be one or more preset time nodes.
  • a time node can be set in the initial stage, intermediate stage and final stage of the video data to be processed. Specifically, according to the preset time node, the image to be evaluated corresponding to each time node is extracted from the video data to be processed, and then the image to be evaluated corresponding to each time node is formed into the image set to be evaluated.
  • the image set includes N images to be evaluated.
  • the filter function in FFmpeg can be used to extract the image of the video data to be processed.
  • FFmpeg is an open-source suite of programs that can be used to record and convert digital audio and video, and to stream them.
  • the crop function in filter is used to extract the image of the video data to be processed.
  • the number of images to be evaluated corresponding to each time node extracted from the video data to be processed is at least two.
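Frame extraction at preset time nodes could be sketched by building an FFmpeg command line (shown here without executing it; the file names and time nodes are assumptions, and this sketch uses FFmpeg's standard -ss seek and -frames:v options rather than the crop filter named above):

```python
from typing import List

def ffmpeg_frame_command(video: str, time_node: str, out_image: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that grabs a single frame
    at the given time node; run it with subprocess.run(cmd, check=True)."""
    return ["ffmpeg", "-ss", time_node, "-i", video, "-frames:v", "1", out_image]

# Hypothetical video file; time nodes taken from the example in this section.
cmds = [ffmpeg_frame_command("workout.mp4", t, f"frame_{i}.png")
        for i, t in enumerate(["00:01:23", "00:02:23"])]
```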
  • each image to be evaluated in the image set to be evaluated is evaluated using the human posture evaluation method described above, and the evaluation information of each image to be evaluated is obtained; the details are not repeated here.
  • S24 Calculate the evaluation score of the video data to be processed according to the evaluation information of each image to be evaluated.
  • the evaluation information of each image to be evaluated is integrated to obtain the evaluation score of the video data to be processed. Since the evaluation information of each image to be evaluated includes a corresponding evaluation score, in this step the evaluation scores in the evaluation information of each image to be evaluated are summed and then averaged to obtain the evaluation score of the video data to be processed.
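The statistics in S24 (sum the per-image evaluation scores, then average) can be sketched as:

```python
from typing import List

def video_evaluation_score(image_scores: List[float]) -> float:
    """Sum the evaluation scores of all extracted images, then average."""
    if not image_scores:
        raise ValueError("no images to be evaluated were extracted")
    return sum(image_scores) / len(image_scores)

score = video_evaluation_score([90.0, 85.0, 80.0])  # 85.0
```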
  • in the embodiment of the present application, the video data to be processed is obtained, which is the video data including the user's posture recorded by the video capture device; the image set to be evaluated is extracted from the video data to be processed according to a preset time node, and the image set to be evaluated includes N images to be evaluated; the human posture evaluation method is used to evaluate each image to be evaluated in the image set to be evaluated to obtain the evaluation information of each image to be evaluated; and the evaluation score of the video data to be processed is calculated according to the evaluation information of each image to be evaluated. By extracting the images to be evaluated from the video data to be processed and then calculating the evaluation score of the video data to be processed according to their evaluation information, a fast and accurate assessment of the user's body posture in the video data to be processed is realized.
  • the time node includes M sub-time nodes, and each sub-time node corresponds to at least one image to be evaluated;
  • Extracting the image set to be evaluated from the video data to be processed according to the preset time node specifically includes the following steps:
  • a preset number of images to be evaluated are extracted from the video data to be processed according to each sub-time node, and the images to be evaluated corresponding to each sub-time node are formed into a set of images to be evaluated.
  • the preset number refers to the preset number of images to be evaluated corresponding to each sub-time node extracted from the video data to be processed.
  • the preset number is greater than or equal to two. Understandably, the greater the preset number, that is, the greater the number of images to be evaluated, the higher the accuracy of the subsequent evaluation of the video data to be processed, but the higher the computational load on the server. The specific number can be set as required by different application scenarios: if recognition accuracy is the focus, the preset number can be increased; if recognition efficiency is the focus, the preset number can be appropriately reduced.
  • the time node includes M sub-time nodes, and each sub-time node corresponds to at least one image to be evaluated.
  • a preset number of images to be evaluated are extracted from the video data to be processed according to each sub-time node, and then the images to be evaluated corresponding to each sub-time node are formed into a set of images to be evaluated.
  • calculating the evaluation score of the video data to be processed according to the evaluation information of each image to be evaluated includes the following steps:
  • S241 Determine target evaluation information from the preset number of images to be evaluated corresponding to each sub-time node, where the target evaluation information is the evaluation information indicating the highest similarity to the corresponding standard coordinate.
  • the most representative image can be selected as the target evaluation image from the preset number of images to be evaluated corresponding to each sub-time node;
  • the evaluation information corresponding to each target evaluation image is determined as the target evaluation information.
  • the target evaluation information is the evaluation information indicating the highest similarity with the corresponding standard coordinates.
  • selecting the target evaluation image from the preset number of images to be evaluated corresponding to each sub-time node can be implemented by pre-training a corresponding neural network model to obtain a pose recognition model; that is, a large amount of image data representing different poses is labeled and then input into a neural network model for training, yielding the pose recognition model.
  • S242 Calculate the evaluation score of the video data to be processed according to the target evaluation information corresponding to each sub-time node.
  • the target evaluation information corresponding to each sub-time node is integrated to obtain the evaluation score of the video data to be processed. Understandably, since each piece of target evaluation information includes a corresponding evaluation score, in this step the evaluation scores in all target evaluation information are summed and then averaged to obtain the evaluation score of the video data to be processed.
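The selection-and-averaging step above can be sketched as follows, assuming each sub-time node yields a list of (similarity, score) pairs; the data layout and the sample values are illustrative assumptions.

```python
# Sketch: per sub-time node, keep the evaluation with the highest
# similarity as the target evaluation information, then average the
# target scores to obtain the video-level evaluation score.

def video_score(per_node_evaluations):
    # each element of per_node_evaluations is a list of (similarity, score)
    targets = [max(evals, key=lambda e: e[0]) for evals in per_node_evaluations]
    return sum(score for _sim, score in targets) / len(targets)

nodes = [
    [(0.70, 70.0), (0.90, 90.0)],  # sub-time node 1: two candidate images
    [(0.85, 85.0), (0.80, 80.0)],  # sub-time node 2
]
# the highest-similarity scores are 90 and 85, so the video scores 87.5
```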
  • the target evaluation information is determined from the preset number of images to be evaluated corresponding to each sub-time node, where the target evaluation information is the evaluation information indicating the highest similarity to the corresponding standard coordinates;
  • the evaluation score of the to-be-processed video data is calculated according to the target evaluation information corresponding to each sub-time node; this further improves the accuracy of the calculated evaluation score of the to-be-processed video data.
  • a human body posture evaluation device is provided, and it corresponds one-to-one to the human body posture evaluation method in the above-mentioned embodiment.
  • the human posture evaluation device includes an image acquisition module 11 to be evaluated, an input module 12, a zoom processing module 13, a standard comparison data acquisition module 14, and a similarity calculation module 15.
  • the detailed description of each functional module is as follows:
  • the image to be evaluated acquisition module 11 is used to acquire an image to be evaluated, and the image to be evaluated is an image that includes a user's posture;
  • the input module 12 is used to input the image to be evaluated into a preset human body pose estimation network to obtain human body key point data.
  • the human body key point data includes key point coordinates and a human body bounding box;
  • the zoom processing module 13 is used to scale the image to be evaluated using the human body bounding box, and to transform the key point coordinates according to the scaled image, to obtain the coordinates to be evaluated;
  • the standard comparison data acquisition module 14 is used to acquire standard comparison data, and the standard comparison data includes standard coordinates and the confidence level corresponding to each standard coordinate;
  • the similarity calculation module 15 is used to calculate the similarity between the coordinates to be evaluated and the standard coordinates using the confidence levels, to obtain the evaluation information of the image to be evaluated.
  • the similarity calculation module 15 includes:
  • the similarity calculation unit 151 is used to calculate the similarity between the coordinates to be evaluated and the standard coordinates by the following formula: D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖, where D(F,G) is the similarity between the coordinates to be evaluated and the standard coordinates, c_k is the confidence corresponding to the kth standard coordinate, f_k is the kth coordinate to be evaluated, g_k is the kth standard coordinate, and K is the number of standard coordinates
  • the conversion unit 152 is configured to convert the similarity to obtain evaluation information of the image to be evaluated.
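One plausible reading of the similarity computation, per the variable definitions in the description, is a confidence-weighted sum of keypoint distances, sketched below. The conversion from the raw value to a bounded score is left to a preset rule in the method, so the `to_score` mapping here is purely illustrative.

```python
import math

# Sketch of D(F, G): the distance between each evaluated keypoint f_k
# and its standard keypoint g_k is weighted by the standard keypoint's
# confidence c_k, and the weighted distances are summed.

def weighted_distance(coords_f, coords_g, confidences):
    return sum(
        c * math.dist(f, g)
        for f, g, c in zip(coords_f, coords_g, confidences)
    )

def to_score(d):
    # illustrative conversion rule only: identical poses (d == 0) map to 100
    return 100.0 / (1.0 + d)

d = weighted_distance([(0.0, 0.0), (1.0, 1.0)],
                      [(0.0, 0.0), (1.0, 1.0)],
                      [0.6, 0.4])
# identical keypoints give d == 0.0, which the sample rule maps to 100.0
```

Higher-confidence standard keypoints thus contribute more to the result, matching the description's weighting of important keypoints.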
  • the scaling processing module 13 includes:
  • the cropping and scaling unit is used to crop the image to be evaluated using the human body bounding box, and to scale the cropped image according to a preset standard size;
  • the coordinate transformation unit is used to transform the key point coordinates using the scaling parameters;
  • the normalization processing unit is used to apply L1 or L2 normalization to the transformed key point coordinates, to obtain the coordinates to be evaluated.
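The crop-scale-transform-normalize pipeline of this module can be sketched as follows, reusing the worked numbers from the description (a 600x600 crop scaled to an 800x1000 standard size gives scaling parameters (4/3, 5/3), which map the keypoint (12, 15) to roughly (16, 25)). The helper names are assumptions for illustration.

```python
import math

# Sketch of the scaling module: the scaling parameters are the ratio of
# the standard size to the cropped size; keypoints are transformed with
# the same parameters and then L2-normalized as one flat vector.

def scaling_params(crop_wh, std_wh):
    return (std_wh[0] / crop_wh[0], std_wh[1] / crop_wh[1])

def transform(point, params):
    return (point[0] * params[0], point[1] * params[1])

def l2_normalize(points):
    # treat all keypoints as one vector and scale it to unit norm
    norm = math.sqrt(sum(x * x + y * y for x, y in points))
    return [(x / norm, y / norm) for x, y in points]

params = scaling_params((600, 600), (800, 1000))  # approximately (4/3, 5/3)
moved = transform((12, 15), params)               # approximately (16.0, 25.0)
```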
  • the human body posture evaluation device further includes:
  • the to-be-processed video data acquisition module is used to acquire the to-be-processed video data, and the to-be-processed video data is the video data including the user's posture recorded by the video acquisition device;
  • the extraction module is used to extract a set of images to be evaluated from the video data to be processed according to a preset time node, and the set of images to be evaluated includes N images to be evaluated;
  • the evaluation module is used to use the human posture evaluation method to evaluate each image to be evaluated in the image set to be evaluated, and obtain the evaluation information of each image to be evaluated;
  • the evaluation score calculation module is used to calculate the evaluation score of the video data to be processed according to the evaluation information of each image to be evaluated.
  • the extraction module includes:
  • the extraction unit is configured to extract a preset number of images to be evaluated from the video data to be processed according to each sub-time node, and compose the to-be-evaluated images corresponding to each sub-time node into a set of images to be evaluated.
  • the evaluation score calculation module includes:
  • the target evaluation information determining unit is configured to determine the target evaluation information from the preset number of images to be evaluated corresponding to each sub-time node, where the target evaluation information is the evaluation information indicating the highest similarity with the corresponding standard coordinate;
  • the evaluation score calculation unit is used to calculate the evaluation score of the video data to be processed according to the target evaluation information corresponding to each sub-time node.
  • the various modules in the above-mentioned human body posture assessment device can be implemented in whole or in part by software, hardware and a combination thereof.
  • the above-mentioned modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used in the human body posture evaluation method in the above embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a human body posture assessment method.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the human body posture assessment method in the above embodiment is implemented.
  • one or more readable storage media storing computer readable instructions are provided.
  • the readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
  • the readable storage medium stores computer readable instructions, and when the computer readable instructions are executed by one or more processors, the one or more processors implement the human body posture assessment method in the above-mentioned embodiment.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Relating to the field of artificial intelligence: a human body posture evaluation method, apparatus, computer device, and storage medium. The human body posture evaluation method includes: acquiring an image to be evaluated (S11); inputting the image to be evaluated into a preset human body pose estimation network to obtain human body key point data, the key point data including key point coordinates and a human body bounding box (S12); scaling the image to be evaluated using the bounding box, and transforming the key point coordinates according to the scaled image to obtain coordinates to be evaluated (S13); acquiring standard comparison data, which includes standard coordinates and a confidence corresponding to each standard coordinate (S14); and calculating the similarity between the coordinates to be evaluated and the standard coordinates using the confidences to obtain evaluation information for the image to be evaluated (S15). By evaluating and analyzing the similarity between each coordinate to be evaluated and the standard coordinates, the user's body posture is assessed quickly and accurately.

Description

人体姿态评估方法、装置、计算机设备及存储介质
本申请以2020年3月6日提交的申请号为202010152307.9,名称为“人体姿态评估方法、装置、计算机设备及存储介质”的中国申请专利申请为基础,并要求其优先权。
技术领域
本申请涉及人工智能技术领域的图像处理领域,尤其涉及一种人体姿态评估方法、装置、计算机设备及存储介质。
背景技术
通过图像分析人体的姿态是计算机视觉研究的重要问题。目前，人体姿态评估被广泛应用于人机交互、电影特效以及智能监控系统等诸多领域。例如：在运动领域，越来越多的人通过健身指导类app进行健身运动。然而，发明人意识到，用户在对照视频教学进行运动练习时，可能由于动作不标准而导致运动效果降低甚至受伤。因此，如何快速准确地评估用户的动作姿态的正确性，给用户提供更加科学的健身指导，成为目前亟待解决的问题。
申请内容
本申请实施例提供一种人体姿态评估方法、装置、计算机设备及存储介质,以解决无法对人体姿态进行快速和准确评估的问题。
一种人体姿态评估方法,包括:
获取待评估图像,所述待评估图像为包括用户姿态的图像;
将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
一种人体姿态评估方法,包括:
获取待处理视频数据,所述待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,所述待评估图像集包括N个待评估图像;
采用所述人体姿态评估方法,对所述待评估图像集中的每一待评估图像进行评估,得到每一所述待评估图像的评估信息;
根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数。
一种人体姿态评估装置,包括:
待评估图像获取模块,用于获取待评估图像,所述待评估图像为包括用户姿态的图像;
输入模块,用于将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
缩放处理模块，用于通过所述人体包围边框对所述待评估图像进行缩放处理，并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换，得到待评估坐标；
标准比对数据获取模块,用于获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
相似度计算模块,用于通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
一种人体姿态评估装置,包括:
待处理视频数据获取模块,用于获取待处理视频数据,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
提取模块,用于根据预设的时间节点从待处理视频数据中提取出待评估图像集,待评估图像集包括N个待评估图像;
评估模块,用于采用人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,得到每一待评估图像的评估信息;
评估分数计算模块,用于根据每一待评估图像的评估信息,计算待处理视频数据的评估分数。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取待评估图像,所述待评估图像为包括用户姿态的图像;
将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取待处理视频数据,所述待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,所述待评估图像集包括N个待评估图像;
采用人体姿态评估方法,对所述待评估图像集中的每一待评估图像进行评估,得到每一所述待评估图像的评估信息;
根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数。
一个或多个存储有计算机可读指令的可读存储介质,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
获取待评估图像,所述待评估图像为包括用户姿态的图像;
将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
本申请通过评估与分析待评估图像中每一待评估坐标和标准坐标的相似度,实现了对用户的人体姿态进行快速和准确的评估,从而便于给用户提供精准的运动指导和有针对性的反馈。本申请的一个或多个实施例的细节在下面的附图和描述中提出,本申请的其他特征和优点将从说明书、附图以及权利要求变得明显。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例中人体姿态评估方法的一应用环境示意图;
图2是本申请一实施例中人体姿态评估方法的一示例图;
图3是本申请一实施例中人体姿态评估方法的另一示例图;
图4是本申请一实施例中人体姿态评估方法的另一示例图;
图5是本申请一实施例中人体姿态评估方法的另一示例图;
图6是本申请一实施例中人体姿态评估方法的另一示例图;
图7是本申请一实施例中人体姿态评估装置的一原理框图;
图8是本申请一实施例中人体姿态评估装置的另一原理框图;
图9是本申请一实施例中计算机设备的一示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的人体姿态评估方法,该人体姿态评估方法可应用如图1所示的应用环境中。具体地,该人体姿态评估方法应用在人体姿态评估***中,该人体姿态评估***包括如图1所示的客户端和服务端,客户端与服务端通过网络进行通信,用于解决无法对人体姿态进行快速和准确评估的问题。其中,客户端又称为用户端,是指与服务端相对应,为客户提供本地服务的程序。客户端可安装在但不限于各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备上。服务端可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
在一实施例中,如图2所示,提供一种人体姿态评估方法,以该方法应用在图1中的服务端为例进行说明,包括如下步骤:
S11:获取待评估图像,待评估图像为包括用户姿态的图像。
其中,待评估图像指待进行人体姿态评估的图像。待评估图像包括用户姿态的图像,即待评估图像中包括有用户的站姿、坐姿、跪姿或者其它任意一种姿态等。可选地,获取待评估图像可通过摄像头实时采集包含有用户姿态的图像作为待评估图像,或者预先采集包含有用户姿态的图像作为待评估图像,或者直接从用户姿态库中获取用户姿态图像作为待评估图像,还可以从互联网或第三方机构/平台所公开的数据集中获取用户姿态图像作为待评估图像,例如:健身指导App。
S12:将待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,人体关键点数据包括关键点坐标和人体包围边框。
其中，人体姿态估计网络是指预先搭建的、可对待评估图像中的用户姿态进行识别，并输出一个识别结果，即人体关键点数据的网络框架。在本实施例中，人体姿态估计网络使用OpenPose框架。OpenPose是基于卷积神经网络和监督学习并以caffe为框架写成的开源库，可以实现人的面部表情、躯干和四肢甚至手指的跟踪，可输出包含人脸关键点的定位，人手的关键点的定位以及人身体的各个关节的定位；OpenPose框架不仅适用于单人也适用于多人，同时具有较好的鲁棒性。
具体地,将待评估图像输入到预设的人体姿态估计网络中,通过将人体的关节(颈部,肩膀,肘部等)联系起来,对人体关键点在三维空间相对位置的计算,观察人体关键点的位置变化,从而估计人体姿态,得到人体关键点数据。
其中,人体关键点数据包括关键点坐标和人体包围边框。关键点坐标是指从待评估图像中识别出的人体姿态关键点所在的坐标位置。人体姿态关键点可以为头部、脖子、左肩、右肩、左肘、右肘、左手腕、右手腕、左臀、右臀、左膝、右膝、左脚踝和右脚踝等14个关键点。关键点坐标可以用一个平面二维坐标来表示。例如:头部关键点所对应的关键点坐标为(x 1,y 1)、脖子关键点所对应的关键点坐标为(x 2,y 2)和左肩关键点所对应的关键点坐标为(x 3,y 3)等。可以理解地,每一待评估图像包括有多个关键点坐标。人体包围边框是指包围人体关键点的外部包围边界框。优选地,人体包围边框可通过包围边框上的四个点的坐标值表示。
S13:通过人体包围边框对待评估图像进行缩放处理,并且根据缩放处理后的待评估图像对关键点坐标进行坐标变换,得到待评估坐标。
其中,待评估坐标指对待评估图像中的关键点坐标进行坐标变换后得到的坐标信息。具体地,通过人体包围边框对待评估图像进行缩放处理是指根据人体包围边框对待评估图像进行裁剪,只保留人体包围边框内的图像部分;然后按照预设的标准尺寸对裁剪后的待评估图像进行缩放的过程。可以理解地,缩放处理后的待评估图像只包括有人体包围边框内的图像部分。缩放处理后的待评估图像的尺寸大小,与缩放处理前的待评估图像的尺寸大小可能相同或者不同。例如:先根据人体包围边框对待评估图像进行裁剪,裁剪掉待评估图像中人体包围边框外部的部分,然后再将裁剪后的待评估图像缩放至与裁剪前的待评估图像同等尺寸大小的图像。
进一步地,再根据缩放处理后的待评估图像对待评估图像中的关键点坐标进行同等坐标变化,即根据与待评估图像同等的缩放比例,对对应的关键点坐标进行同等坐标变换,得到待评估坐标。例如:待评估图像中人体关键点A的关键点坐标为(x 1,y 1),若对裁剪后的待评估图像进行了放大2倍处理,则得到待评估图像中人体关键点A的待评估坐标为(2x 1,2y 1)。
S14:获取标准比对数据,标准比对数据包括标准坐标和每一标准坐标对应的置信度。
其中,标准比对数据是指预先采集的用于评估待评估图像中的人体姿态是否符合要求的标准数据。例如:标准比对数据可以为健身教练的人体姿态参考图,或者为经人体姿态估计网络评估筛选后的满足要求的人体姿态参考图等。需要说明的是,在本步骤中获取的标准比对数据与待评估图像中的人体姿态属于相同姿态类型的数据。具体地,可以预先将健身教练的健身视频输入预设的人体姿态估计网络中,然后将输出的多个人体姿态参考图以及每一人体姿态参考图所包括的标准坐标,和每一标准坐标对应的置信度存储于服务端的数据库中,当得到待评估图像的待评估坐标之后,直接从数据库中获取与待评估图像相对应的标准比对数据。
其中,标准比对数据包括标准坐标和每一标准坐标对应的置信度。标准坐标是指标准比对数据中各个人体关键点所对应的位置信息。同样地,标准坐标也可以用一个平面二维坐标来表示。置信度是用于表明每一标准坐标位置为正确的概率值。可以理解地,可以根据每一标准坐标对应的置信度判断每一标准坐标的重要程度。例如:标准比对数据包括标准坐标(x 1,y 1)以及对应的置信度PA,标准坐标(x 2,y 2)以及对应的置信度PB等,若PA>PB,则表示标准坐标(x 1,y 1)的重要程度大于标准坐标(x 2,y 2)。
S15:通过置信度计算待评估坐标和标准坐标的相似度,得到待评估图像的评估信息。
其中,评估信息是指用于评估待评估图像中的用户姿态的准确度的信息。优选地,评估信息可以为一具体的评估分数,评估分数越高,表明该评估待评估图像中的用户姿态越准确。具体地,可以先根据每一标准坐标对应的置信度,对每一标准坐标进行权重设置,置信度高的标准坐标对应设置较高权重,置信度低的标准坐标对应设置较低权重;然后采用相似度算法,比如余弦相似度算法计算每一待评估坐标与对应的标准坐标之间的相似度,得到初始相似度值,最后再根据每一标准坐标所对应的权重值,将所有初始相似度值进行加权计算统计,从而得到待评估图像的评估信息。
优选地,还可以预先根据置信度和相似度算法定义好评估计算公式,然后直接采用该评估计算公式计算每一待评估坐标和对应的标准坐标的相似度,从而得到待评估图像的评估信息。
在本实施例中,获取待评估图像,待评估图像为包括用户姿态的图像;将待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,人体关键点数据包括关键点坐标和人体包围边框;通过人体包围边框对待评估图像进行缩放处理,并且根据缩放处理后的待评估图像对关键点坐标进行坐标变换,得到待评估坐标;获取标准比对数据,标准比对数据包括标准坐标和每一标准坐标对应的置信度;通过置信度计算待评估坐标和标准坐标的相似度,得到待评估图像的评估信息;通过评估与分析待评估图像中每一待评估坐标和标准坐标的相似度,实现了对用户的人体姿态进行快速和准确的评估,从而便于给用户提供精准的运动指导和有针对性的反馈。
在一实施例中,如图3所示,通过置信度计算待评估坐标和标准坐标的相似度,得到待评估图像的评估信息,具体包括如下步骤:
S151：通过如下公式计算所述待评估坐标和所述标准坐标的相似度：
D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖
其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量。
具体地，‖f_k − g_k‖为第k个待评估坐标与第k个标准坐标之间的距离；将该距离乘以第k个标准坐标对应的置信度，即可得到第k个待评估坐标与第k个标准坐标之间的相似度。可以理解地，可直接采用预先设定的上述公式，得到待评估图像中所有待评估坐标和对应的标准坐标的相似度。待评估坐标和标准坐标的相似度可以通过一具体的数值表示，相似度为1则表示待评估坐标和标准坐标完全相同。例如：得到待评估坐标和标准坐标的相似度可以为0.8、0.85或者0.9等。
S152:将相似度进行转化,得到待评估图像的评估信息。
具体地,在确定了待评估坐标和所述标准坐标的相似度之后,可根据预设的转化规则将相似度进行转化,得到待评估图像的评估信息。优选地,将转化规则可以为先将待评估坐标和标准坐标的相似度化成一具体的评估分值,然后再根据评估分值给出具体的评估建议,从而得到待评估图像的评估信息。例如:若待评估坐标和标准坐标的相似度为0.9,则对应的评估分值可以为90分。
在本实施例中，通过如下公式计算所述待评估坐标和所述标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；将相似度进行转化，得到待评估图像的评估信息；从而提高了生成的待评估图像的评估信息的准确性。
在一实施例中,如图4所示,通过人体包围边框对待评估图像进行缩放处理,并且根据缩放处理后的待评估图像对关键点坐标进行坐标变换,得到待评估坐标,具体包括如下步骤:
S131:通过人体包围边框对待评估图像进行裁剪,并按照预设的标准尺寸对裁剪后的待评估图像进行缩放。
具体地,通过人体包围边框对待评估图像进行裁剪是指裁剪掉待评估图像中人体包围边框外部部分、只保留人体包围边框内部部分的过程。具体地,可采用图像裁剪工具实现对待评估图像的裁剪处理。可选地,图像裁剪工具可以为jQuery Jcrop图像裁剪工具或FOTOE图像裁剪工具等。优选地,还可以采用opencv的图像分割算法自动实现对待评估图像的裁剪。
其中,预设的标准尺寸指预先设定的标准图像尺寸。例如:预设的标准尺寸可以为600*600、750*750或800*800等。优选地,在本实施例中,为了提高后续的评估精度,将标准尺寸设定为与裁剪前的待评估图像的图像尺寸大小相同的尺寸。具体地,可采用图像缩放算法对裁剪后的待评估图像进行缩放;或者采用图像缩放工具对裁剪后的待评估图像进行缩放,得到与预设的标准尺寸大小相同的待评估图像。可选地,图像缩放算法可以为双线性内插值算法或三线性卷积插值算法。图像缩放工具可以为photoshop、iResizer或FastStone Photo Resizer。
S132:通过缩放参数对关键点坐标进行坐标变换。
其中,缩放参数是指将标准尺寸与裁剪后的待评估图像的图像尺寸进行比例化后得到的参数。例如:若裁剪后的待评估图像的图像尺寸为600*600,标准尺寸为800*1000;则得到的缩放参数为(4/3,5/3);其中,4/3为x轴方向上的缩放比例,5/3为y轴方向上的缩放比例。具体地,通过得到的缩放参数对关键点坐标进行同等比例的坐标变换。例如:若关键点坐标为(12,15),缩放参数为(4/3,5/3),则进行坐标变换后的关键点坐标为(16,25),即将关键点坐标中x轴方向上的值12缩放4/3倍,将关键点坐标中y轴方向上的值15缩放5/3倍。
S133:将经过坐标变换后的关键点坐标进行L1归一化或者L2归一化处理,得到待评估坐标。
具体地，将经过坐标变换后的关键点坐标视为向量数组，然后对向量数组中的向量进行L1或者L2归一化处理，得到待评估坐标。对向量进行L1归一化处理是指将向量按其各分量绝对值之和进行缩放，使缩放后各分量绝对值之和为1；对向量进行L2归一化处理是指将向量按其欧几里得范数（各分量平方和的平方根）进行缩放，使缩放后的向量为单位范数。用户可根据实际情况自定义选择任意一种归一化处理方式，本方案不做具体限制。在本步骤中，通过将经过坐标变换后的关键点坐标进行L1或者L2归一化处理，从而保证了生成的待评估坐标的准确性。
在本实施例中，通过人体包围边框对待评估图像进行裁剪，并按照预设的标准尺寸对裁剪后的待评估图像进行缩放，生成缩放参数；通过缩放参数对关键点坐标进行坐标变换；将经过坐标变换后的关键点坐标进行L1或者L2归一化处理，得到待评估坐标；从而保证了生成的待评估坐标的准确性，进一步提高了后续采用待评估坐标与标准数据进行相似度计算的准确精度。
在一实施例中,如图5所示,提供一种人体姿态评估方法,以该方法应用在图1中的服务端为例进行说明,包括如下步骤:
S21:获取待处理视频数据,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据。
其中,待处理视频数据为原始的待处理的视频数据。具体地,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据。视频采集设备将录制的该待处理视频数据发送到服务端,服务端即获取到待处理视频数据。
S22:根据预设的时间节点从待处理视频数据中提取出待评估图像集,待评估图像集包括N个待评估图像。
其中，预设的时间节点指预先设定的对待处理视频数据进行待评估图像提取的时间点。例如：时间节点可以为1分23秒、2分23秒和3分23秒等。可以理解地，预设的时间节点可以为一个或者多个。优选地，可以在待处理视频数据中的起始阶段、中间阶段和最后阶段中分别设置一个时间节点。具体地，根据预设的时间节点，从待处理视频数据中提取出每一时间节点所对应的待评估图像，然后，将每一时间节点所对应的待评估图像组成待评估图像集，待评估图像集包括N个待评估图像。
具体地,可以采用FFmpeg中的滤镜(filter)功能来实现对待处理视频数据的图像提取。其中,FFmpeg是一套可以用来记录、转换数字音频、视频,并能将其转化为流的开源计算机可读指令。采用filter中的crop函数实现对待处理视频数据的图像提取。具体地,通过crop=width:height:x:y来实现对待处理视频数据进行图像截取。优选地,为了避免提取出的某一时间节点所对应的待评估图像出现失真或者模糊现象,在本实施例中,从待处理视频数据中提取出的每一时间节点所对应的待评估图像至少为两个。
S23:采用人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,得到每一待评估图像的评估信息。
具体地,采用上述实施例中的人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,即可得到每一待评估图像的评估信息。在此不做冗余赘述。
S24:根据每一待评估图像的评估信息,计算待处理视频数据的评估分数。
具体地,在确定了每一待评估图像的评估信息之后,将每一待评估图像的评估信息进行整合处理,即可得到待处理视频数据的评估分数。由于每一待评估图像的评估信息中包括有对应的评估分数,因此,在本步骤中,将每一待评估图像的评估信息中的评估分数进行统计求和后再求平均值,即可得到待处理视频数据的评估分数。优选地,还可以预先对每一待评估图像进行权重设置,然后根据每一待评估图像的权重值,对每一待评估图像的评估信息中的评估分数进行加权计算统计求和后再求平均值,从而得到待处理视频数据的评估分数。
在本实施例中,通过获取待处理视频数据,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;根据预设的时间节点从待处理视频数据中提取出待评估图像集,待评估图像集包括N个待评估图像;采用人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,得到每一待评估图像的评估信息;根据每一待评估图像的评估信息,计算待处理视频数据的评估分数;通过提取待处理视频数据中的待评估图像,然后根据待评估图像的评估信息计算待处理视频数据的评估分数,从而实现了对待处理视频数据中用户的人体姿态进行快速和准确的评估。
在一实施例中,时间节点包括M个子时间节点,每一子时间节点对应至少一个待评估图像;
根据预设的时间节点从待处理视频数据中提取出待评估图像集,具体包括如下步骤:
根据每一子时间节点从所述待处理视频数据中提取预设数量的待评估图像,将每一子时间节点对应的所述待评估图像组成待评估图像集。
其中,预设数量指预先设定的从待处理视频数据中提取出每一子时间节点所对应的待评估图像的数量。在本实施例中,预设数量大于或等于2。可以理解地,预设数量越多,即待评估图像的数量越多,后续对待处理视频数据进行评估的精度就越高,然而服务端计算复杂度也会越高,具体的数量可以根据不同应用场景需要而设定。若侧重于识别精度,可以提高预设数量,若侧重于识别效率,可以适当降低待预设数量。
具体地,时间节点包括M个子时间节点,每一子时间节点对应至少一个待评估图像。根据每一子时间节点从待处理视频数据中提取预设数量的待评估图像,然后将每一子时间节点对应的待评估图像组成待评估图像集。
在一实施例中,如图6所示,根据每一待评估图像的评估信息,计算待处理视频数据的评估分数,具体包括如下步骤:
S241:从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,目标评估信息为指示和对应的标准坐标相似度最高的评估信息。
具体地,由于每一子时间节点中对应的待评估图像至少为两个,则可以从每一子时间节点中对应的预设数量的待评估图像中,选取最具代表性的一个待评估图像作为目标评估图像,并将每一目标评估图像所对应的评估信息确定为目标评估信息。目标评估信息为指示和对应的标准坐标相似度最高的评估信息。具体地,从每一子时间节点中对应的预设数量的待评估图像中选取目标评估图像可以通过预先训练对应的神经网络模型,得到一个姿态识别模型来实现。即通过将大量代表不同姿态的图像数据进行标注之后输入到一个神经网络模型中进行训练,即得到姿态识别模型。
S242:根据每一子时间节点对应的目标评估信息,计算待处理视频数据的评估分数。
具体地,在确定了每一子时间节点对应的目标评估信息之后,再将每一子时间节点对应的目标评估信息进行整合处理,即可得到待处理视频数据的评估分数。可以理解地,由于每一目标评估信息中包括有对应的评估分数,因此,在本步骤中,将每一目标评估信息中的评估分数进行统计求和后再求平均值,即可得到待处理视频数据的评估分数。
在本实施例中,从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,目标评估信息为指示和对应的标准坐标相似度最高的评估信息;根据每一子时间节点对应的目标评估信息,计算待处理视频数据的评估分数;从而进一步提高了计算得到的待处理视频数据的评估分数的准确性。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种人体姿态评估装置,该人体姿态评估装置与上述实施例中人体姿态评估方法一一对应。如图7所示,该人体姿态评估装置包括待评估图像获取模块11、输入模块12、缩放处理模块13、标准比对数据获取模块14和相似度计算模块15。各功能模块详细说明如下:
待评估图像获取模块11,用于获取待评估图像,待评估图像为包括用户姿态的图像;
输入模块12,用于将待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,人体关键点数据包括关键点坐标和人体包围边框;
缩放处理模块13,用于通过人体包围边框对待评估图像进行缩放处理,并且根据缩放处理后的待评估图像对关键点坐标进行坐标变换,得到待评估坐标;
标准比对数据获取模块14,用于获取标准比对数据,标准比对数据包括标准坐标和每一标准坐标对应的置信度;
相似度计算模块15，用于通过置信度计算待评估坐标和标准坐标的相似度，得到待评估图像的评估信息。
优选地,如图8所示,相似度计算模块15包括:
相似度计算单元151，用于通过如下公式计算待评估坐标和标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为待评估坐标和标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；
转化单元152,用于将相似度进行转化,得到待评估图像的评估信息。
优选地,缩放处理模块13包括:
裁剪缩放单元,用于通过人体包围边框对待评估图像进行裁剪,并按照预设的标准尺寸对裁剪后的待评估图像进行缩放;
坐标变换单元,用于通过缩放参数对关键点坐标进行坐标变换;
归一化处理单元,用于将经过坐标变换后的关键点坐标进行L1归一化或者L2归一化处理,得到待评估坐标。
优选地,人体姿态评估装置还包括:
待处理视频数据获取模块,用于获取待处理视频数据,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
提取模块,用于根据预设的时间节点从待处理视频数据中提取出待评估图像集,待评估图像集包括N个待评估图像;
评估模块,用于采用人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,得到每一待评估图像的评估信息;
评估分数计算模块,用于根据每一待评估图像的评估信息,计算待处理视频数据的评估分数。
优选地,提取模块包括:
提取单元,用于根据每一子时间节点从待处理视频数据中提取预设数量的待评估图像,将每一子时间节点对应的待评估图像组成待评估图像集。
优选地,评估分数计算模块包括:
目标评估信息确定单元,用于从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,目标评估信息为指示和对应的标准坐标相似度最高的评估信息;
评估分数计算单元,用于根据每一子时间节点对应的目标评估信息,计算待处理视频数据的评估分数。
关于人体姿态评估装置的具体限定可以参见上文中对于人体姿态评估方法的限定,在此不再赘述。上述人体姿态评估装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中，提供了一种计算机设备，该计算机设备可以是服务器，其内部结构图可以如图9所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括可读存储介质、内存储器。该可读存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为可读存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储上述实施例人体姿态评估方法中使用到的数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种人体姿态评估方法。本实施例所提供的可读存储介质包括非易失性可读存储介质和易失性可读存储介质。
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现上述实施例中的人体姿态评估方法。
在一个实施例中提供了一个或多个存储有计算机可读指令的可读存储介质,本实施例所提供的可读存储介质包括非易失性可读存储介质和易失性可读存储介质;该可读存储介质上存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器实现上述实施例中的人体姿态评估方法。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质或易失性可读存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种人体姿态评估方法,其中,包括:
    获取待评估图像,所述待评估图像为包括用户姿态的图像;
    将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
    通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
    获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
    通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
  2. 如权利要求1所述的人体姿态评估方法,其中,所述通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息,包括:
    通过如下公式计算所述待评估坐标和所述标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；
    将所述相似度进行转化,得到所述待评估图像的评估信息。
  3. 如权利要求1所述的人体姿态评估方法,其中,所述通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标,包括:
    通过所述人体包围边框对所述待评估图像进行裁剪,并按照预设的标准尺寸对所述裁剪后的待评估图像进行缩放;
    通过缩放参数对所述关键点坐标进行坐标变换;
    将经过坐标变换后的所述关键点坐标进行L1归一化或者L2归一化处理,得到待评估坐标。
  4. 一种人体姿态评估方法,其中,包括:
    获取待处理视频数据,所述待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
    根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,所述待评估图像集包括N个待评估图像;
    采用权利要求1所述的人体姿态评估方法,对所述待评估图像集中的每一待评估图像进行评估,得到每一所述待评估图像的评估信息;
    根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数。
  5. 如权利要求4所述的人体姿态评估方法,其中,所述时间节点包括M个子时间节点,每一所述子时间节点对应至少一个待评估图像;
    所述根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,包括:
    根据每一子时间节点从所述待处理视频数据中提取预设数量的待评估图像,将每一子时间节点对应的所述待评估图像组成待评估图像集。
  6. 如权利要求5所述的人体姿态评估方法,其中,所述根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数,包括:
    从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,所述目标评估信息为指示和对应的标准坐标相似度最高的评估信息;
    根据每一子时间节点对应的所述目标评估信息,计算所述待处理视频数据的评估分数。
  7. 一种人体姿态评估装置,其中,包括:
    待评估图像获取模块,用于获取待评估图像,所述待评估图像为包括用户姿态的图像;
    输入模块,用于将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
    缩放处理模块,用于通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
    标准比对数据获取模块,用于获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
    相似度计算模块,用于通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
  8. 如权利要求7所述的人体姿态评估装置,其中,所述相似度计算模块包括:
    相似度计算单元，用于通过如下公式计算所述待评估坐标和所述标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；
    转化单元,用于将所述相似度进行转化,得到所述待评估图像的评估信息。
  9. 如权利要求7所述的人体姿态评估装置,其中,所述缩放处理模块包括:
    裁剪缩放单元,用于通过人体包围边框对待评估图像进行裁剪,并按照预设的标准尺寸对裁剪后的待评估图像进行缩放;
    坐标变换单元,用于通过缩放参数对关键点坐标进行坐标变换;
    归一化处理单元,用于将经过坐标变换后的关键点坐标进行L1归一化或者L2归一化处理,得到待评估坐标。
  10. 一种人体姿态评估装置,其中,包括:
    待处理视频数据获取模块,用于获取待处理视频数据,待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
    提取模块,用于根据预设的时间节点从待处理视频数据中提取出待评估图像集,待评估图像集包括N个待评估图像;
    评估模块,用于采用人体姿态评估方法,对待评估图像集中的每一待评估图像进行评估,得到每一待评估图像的评估信息;
    评估分数计算模块,用于根据每一待评估图像的评估信息,计算待处理视频数据的评估分数。
  11. 如权利要求10所述的人体姿态评估装置,其中,所述时间节点包括M个子时间节点,每一所述子时间节点对应至少一个待评估图像;
    所述提取模块包括:
    提取单元,用于根据每一子时间节点从待处理视频数据中提取预设数量的待评估图像,将每一子时间节点对应的待评估图像组成待评估图像集。
  12. 如权利要求11所述的人体姿态评估装置,其中,所述评估分数计算模块包括:
    目标评估信息确定单元,用于从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,目标评估信息为指示和对应的标准坐标相似度最高的评估信息;
    评估分数计算单元,用于根据每一子时间节点对应的目标评估信息,计算待处理视频数据的评估分数。
  13. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取待评估图像,所述待评估图像为包括用户姿态的图像;
    将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
    通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
    获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
    通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
  14. 如权利要求13所述的计算机设备,其中,所述通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息,包括:
    通过如下公式计算所述待评估坐标和所述标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；
    将所述相似度进行转化,得到所述待评估图像的评估信息。
  15. 如权利要求13所述的计算机设备,其中,所述通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标,包括:
    通过所述人体包围边框对所述待评估图像进行裁剪,并按照预设的标准尺寸对所述裁剪后的待评估图像进行缩放;
    通过缩放参数对所述关键点坐标进行坐标变换;
    将经过坐标变换后的所述关键点坐标进行L1归一化或者L2归一化处理,得到待评估坐标。
  16. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取待处理视频数据,所述待处理视频数据为通过视频采集设备录制的包括用户姿态的视频数据;
    根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,所述待评估图像集包括N个待评估图像;
    采用人体姿态评估方法,对所述待评估图像集中的每一待评估图像进行评估,得到每一所述待评估图像的评估信息;
    根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数。
  17. 如权利要求16所述的计算机设备,其中,所述时间节点包括M个子时间节点,每一所述子时间节点对应至少一个待评估图像;
    所述根据预设的时间节点从所述待处理视频数据中提取出待评估图像集,包括:
    根据每一子时间节点从所述待处理视频数据中提取预设数量的待评估图像,将每一子时间节点对应的所述待评估图像组成待评估图像集。
  18. 如权利要求17所述的计算机设备,其中,所述根据每一所述待评估图像的所述评估信息,计算所述待处理视频数据的评估分数,包括:
    从每一子时间节点中对应的预设数量的待评估图像中,确定目标评估信息,其中,所述目标评估信息为指示和对应的标准坐标相似度最高的评估信息;
    根据每一子时间节点对应的所述目标评估信息,计算所述待处理视频数据的评估分数。
  19. 一个或多个存储有计算机可读指令的可读存储介质,其中,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
    获取待评估图像,所述待评估图像为包括用户姿态的图像;
    将所述待评估图像输入预设的人体姿态估计网络,得到人体关键点数据,所述人体关键点数据包括关键点坐标和人体包围边框;
    通过所述人体包围边框对所述待评估图像进行缩放处理,并且根据缩放处理后的所述待评估图像对所述关键点坐标进行坐标变换,得到待评估坐标;
    获取标准比对数据,所述标准比对数据包括标准坐标和每一所述标准坐标对应的置信度;
    通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息。
  20. 如权利要求19所述的可读存储介质,其中,所述通过所述置信度计算所述待评估坐标和所述标准坐标的相似度,得到所述待评估图像的评估信息,包括:
    通过如下公式计算所述待评估坐标和所述标准坐标的相似度：D(F,G) = Σ_{k=1}^{K} c_k·‖f_k − g_k‖，其中，D(F,G)为所述待评估坐标和所述标准坐标的相似度，c_k为第k个标准坐标对应的置信度，f_k为待评估坐标中第k个待评估坐标，g_k为标准坐标中第k个标准坐标，K为标准坐标的数量；
    将所述相似度进行转化,得到所述待评估图像的评估信息。
PCT/CN2020/093332 2020-03-06 2020-05-29 人体姿态评估方法、装置、计算机设备及存储介质 WO2021174697A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010152307.9A CN111476097A (zh) 2020-03-06 2020-03-06 人体姿态评估方法、装置、计算机设备及存储介质
CN202010152307.9 2020-03-06

Publications (1)

Publication Number Publication Date
WO2021174697A1 true WO2021174697A1 (zh) 2021-09-10

Family

ID=71747201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093332 WO2021174697A1 (zh) 2020-03-06 2020-05-29 人体姿态评估方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN111476097A (zh)
WO (1) WO2021174697A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838133A (zh) * 2021-09-23 2021-12-24 上海商汤科技开发有限公司 一种状态检测方法、装置、计算机设备和存储介质
CN113947614A (zh) * 2021-10-25 2022-01-18 北京影谱科技股份有限公司 一种人体3d姿态估计方法、装置及系统
CN114140832A (zh) * 2022-01-30 2022-03-04 西安华创马科智能控制***有限公司 井下行人越界风险检测方法、装置、电子设备及存储介质
CN114241595A (zh) * 2021-11-03 2022-03-25 橙狮体育(北京)有限公司 数据处理方法、装置、电子设备及计算机存储介质
CN115909394A (zh) * 2022-10-25 2023-04-04 珠海视熙科技有限公司 一种坐姿识别的方法、装置、智能台灯及计算机存储介质
CN117086519A (zh) * 2023-08-22 2023-11-21 江苏凯立达数据科技有限公司 基于工业互联网的联网设备数据分析及评估系统、方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329527B (zh) * 2020-09-29 2023-12-08 深圳大学 一种姿态估计方法、装置、电子设备及存储介质
CN112200074A (zh) * 2020-10-09 2021-01-08 广州健康易智能科技有限公司 一种姿态对比的方法和终端
CN112418153B (zh) * 2020-12-04 2024-06-11 上海商汤科技开发有限公司 图像处理方法、装置、电子设备和计算机存储介质
CN112800923A (zh) * 2021-01-22 2021-05-14 北京市商汤科技开发有限公司 人体图像质量检测方法及装置、电子设备、存储介质
CN113297963A (zh) * 2021-05-24 2021-08-24 网易(杭州)网络有限公司 多人姿态的估计方法、装置、电子设备以及可读存储介质
CN113239849B (zh) * 2021-05-27 2023-12-19 数智引力(厦门)运动科技有限公司 健身动作质量评估方法、系统、终端设备及存储介质
CN113420719B (zh) * 2021-07-20 2022-07-22 北京百度网讯科技有限公司 生成动作捕捉数据的方法、装置、电子设备以及存储介质
CN113611387B (zh) * 2021-07-30 2023-07-14 清华大学深圳国际研究生院 一种基于人***姿估计的运动质量评估方法及终端设备
CN115222871B (zh) * 2021-08-31 2023-04-18 达闼科技(北京)有限公司 模型评估方法、装置、存储介质及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684943A (zh) * 2018-12-07 2019-04-26 北京首钢自动化信息技术有限公司 Athlete auxiliary training data acquisition method and apparatus, and electronic device
US20190179420A1 (en) * 2009-01-29 2019-06-13 Sony Corporation Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data
US20190175121A1 (en) * 2016-02-24 2019-06-13 Preaction Technology Corporation, dba/4c Sports Corporation Method and System for Determining Physiological Status of Users Based on Marker-Less Motion Capture and Generating Appropriate Remediation Plans
CN110059522A (zh) * 2018-01-19 2019-07-26 北京市商汤科技开发有限公司 Human body contour key point detection method, image processing method, apparatus and device
CN110354480A (zh) * 2019-07-26 2019-10-22 南京邮电大学 Golf swing scoring estimation method based on posture comparison
CN110575663A (zh) * 2019-09-25 2019-12-17 郑州大学 Artificial-intelligence-based sports auxiliary training method


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838133A (zh) * 2021-09-23 2021-12-24 上海商汤科技开发有限公司 State detection method and apparatus, computer device and storage medium
CN113947614A (zh) * 2021-10-25 2022-01-18 北京影谱科技股份有限公司 Human body 3D pose estimation method, apparatus and system
CN114241595A (zh) * 2021-11-03 2022-03-25 橙狮体育(北京)有限公司 Data processing method and apparatus, electronic device and computer storage medium
CN114140832A (zh) * 2022-01-30 2022-03-04 西安华创马科智能控制***有限公司 Underground pedestrian boundary-crossing risk detection method and apparatus, electronic device and storage medium
CN115909394A (zh) * 2022-10-25 2023-04-04 珠海视熙科技有限公司 Sitting posture recognition method and apparatus, smart desk lamp and computer storage medium
CN115909394B (zh) * 2022-10-25 2024-04-05 珠海视熙科技有限公司 Sitting posture recognition method and apparatus, smart desk lamp and computer storage medium
CN117086519A (zh) * 2023-08-22 2023-11-21 江苏凯立达数据科技有限公司 Industrial-internet-based networked device data analysis and evaluation system and method
CN117086519B (zh) * 2023-08-22 2024-04-12 京闽数科(北京)有限公司 Industrial-internet-based networked device data analysis and evaluation system and method

Also Published As

Publication number Publication date
CN111476097A (zh) 2020-07-31

Similar Documents

Publication Publication Date Title
WO2021174697A1 (zh) Human body posture evaluation method and apparatus, computer device and storage medium
US11450080B2 (en) Image processing method and apparatus, and storage medium
Rao et al. Deep convolutional neural networks for sign language recognition
WO2021114892A1 (zh) Human behavior recognition method and apparatus based on environmental semantic understanding, device and storage medium
WO2022000420A1 (zh) Human action recognition method, human action recognition system and device
CN111582141B (zh) Face recognition model training method, face recognition method and apparatus
WO2021052375A1 (zh) Target image generation method and apparatus, server and storage medium
Liu et al. Bayesian model adaptation for crowd counts
Christa et al. CNN-based mask detection system using openCV and MobileNetV2
US20210110146A1 (en) Action recognition method and apparatus and electronic equipment
WO2023001063A1 (zh) Target detection method and apparatus, electronic device and storage medium
CN110874865A (zh) Three-dimensional skeleton generation method and computer device
CN112164091B (zh) Mobile device human pose estimation method based on three-dimensional skeleton extraction
CN110751039A (zh) Multi-view 3D human pose estimation method and related apparatus
CN111723707A (zh) Gaze point estimation method and apparatus based on visual saliency
Qian et al. Joint optimal transport with convex regularization for robust image classification
WO2021217937A1 (zh) Posture recognition model training method and device, and posture recognition method and device
Hu et al. Face restoration via plug-and-play 3D facial priors
CN113269013B (zh) Object behavior analysis method, information display method and electronic device
CN115471863A (zh) Three-dimensional pose acquisition method, model training method and related devices
Sakurai et al. Restoring aspect ratio distortion of natural images with convolutional neural network
CN113298052B (zh) Face detection apparatus and method based on Gaussian attention, and storage medium
CN112633224B (zh) Social relationship recognition method and apparatus, electronic device and storage medium
CN114841851A (zh) Image generation method and apparatus, electronic device and storage medium
Hsu et al. A PSO-SVM lips recognition method based on active basis model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923612

Country of ref document: EP

Kind code of ref document: A1