CN110781765A

CN110781765A - Human body posture recognition method, device, equipment and storage medium

Info

Publication number: CN110781765A
Application number: CN201910942510.3A
Authority: CN
Inventors: 姚永强; 葛彦昊; 张伟; 曹煊; 倪辉; 汪铖杰; 李季檩; 黄飞跃
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-02-11
Anticipated expiration: 2039-09-30
Also published as: CN110781765B

Abstract

The embodiments of the invention provide a human body posture recognition method, a human body posture recognition device, human body posture recognition equipment and a storage medium. The method comprises the following steps: acquiring an image to be recognized, the image to be recognized containing imaging information of a human body; performing human body detection on the image to be recognized to obtain at least one piece of human body information; performing multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points; performing multi-scale fusion on the multi-scale key points to obtain target key points; and distributing the target key points to each piece of human body information to obtain human body posture information, thereby completing human body posture recognition, wherein the human body posture information is a set formed by the posture of each piece of human body information. The embodiments of the invention can improve the accuracy of human body posture recognition.

Description

Human body posture recognition method, device, equipment and storage medium

Technical Field

The invention relates to a computer vision technology in the field of artificial intelligence, in particular to a human body posture recognition method, a human body posture recognition device, human body posture recognition equipment and a storage medium.

Background

Artificial Intelligence (AI) is a new science and technology that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. In recent years, with the development of artificial intelligence, computer vision technology based on artificial intelligence has also developed rapidly; human body posture recognition, as an important method within it, has broad application prospects in many fields such as smart retail, security protection and monitoring.

Generally, when human body posture recognition is performed, key points in an image to be recognized are detected and then distributed to each person, so that the posture information of each person is obtained. However, the key points are obtained from the information output by the last layer of the network model, and the image to be recognized is successively downsampled as it passes from the first layer to the last layer of the network model, so pixel information is lost. As a result, the accuracy of the key points is low, and the accuracy of the recognition result is low when human body posture recognition is performed according to these key points.

Disclosure of Invention

The embodiment of the invention provides a human body posture recognition method, a human body posture recognition device and a storage medium, which can improve the accuracy of human body posture recognition.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides a human body posture identification method, which comprises the following steps:

acquiring an image to be identified; the image to be recognized comprises imaging information of a human body;

carrying out human body detection on the image to be identified to obtain at least one piece of human body information;

carrying out multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points;

performing multi-scale fusion on the multi-scale key points to obtain target key points;

distributing the target key points to each piece of human body information to obtain human body posture information and finish human body posture recognition; the human body posture information is a set formed by the postures of the human body information.

In the above scheme, after the original keypoint recognition model is trained by using the target sample image to obtain the preset keypoint recognition model, the method further includes:

acquiring a new target sample image;

optimizing the preset key point identification model by using the new target sample image;

correspondingly, the performing multi-scale recognition on the human body posture key points in the image to be recognized by using a preset key point recognition model to obtain the multi-scale key points includes:

and carrying out multi-scale recognition on the human body posture key points in the image to be recognized by using the optimized preset key point recognition model to obtain the multi-scale key points.

The embodiment of the invention provides a human body posture recognition device, which comprises:

the image acquisition module is used for acquiring an image to be identified; the image to be recognized comprises imaging information of a human body;

the human body detection module is used for carrying out human body detection on the image to be identified to obtain at least one piece of human body information;

the key point identification module is used for carrying out multi-scale identification on the human body posture key points in the image to be identified to obtain multi-scale key points;

the key point fusion module is used for carrying out multi-scale fusion on the multi-scale key points to obtain target key points;

the key point distribution module is used for distributing the target key points to each piece of human body information to obtain human body posture information and finish human body posture recognition; the human body posture information is a set formed by the postures of the human body information.

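The embodiment of the invention provides human body posture recognition equipment, which comprises:

a memory for storing executable instructions;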

and the processor is used for realizing the method provided by the embodiment of the invention when executing the executable instructions stored in the memory.

Embodiments of the present invention provide a storage medium storing executable instructions for causing a processor to execute the method provided by the embodiments of the present invention.

The embodiment of the invention has the following beneficial effects: the target key points used for determining the posture of each piece of human body information in the image to be recognized are obtained by fusing key points at at least two scales of the image to be recognized, so the accuracy of the target key points is high; consequently, when the human body posture is recognized according to the target key points, the recognition result is accurate, and the accuracy of human body posture recognition is improved.

Drawings

FIG. 1 is an optional architecture diagram of a human body posture recognition system 100 provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a server 200 provided by an embodiment of the present invention;

FIG. 3 is an optional schematic flow chart of a human body posture recognition method provided by an embodiment of the present invention;

FIG. 4 is an exemplary diagram of human body detection provided by an embodiment of the present invention;

FIG. 5 is a schematic processing flow diagram of a human body posture recognition method provided by an embodiment of the present invention;

FIG. 6 is a schematic flow chart of extracting multi-scale features provided by an embodiment of the present invention;

FIG. 7 is an exemplary diagram of multi-scale fusion provided by an embodiment of the present invention;

FIG. 8 is an exemplary diagram for obtaining key points of target individual types provided by an embodiment of the present invention;

FIG. 9 is an optional schematic flow chart of obtaining a target sample image provided by an embodiment of the present invention;

FIG. 10 is an exemplary diagram of human body part recognition results provided by an embodiment of the present invention;

FIG. 11 is an exemplary diagram of a human body database provided by an embodiment of the present invention;

FIG. 12 is an exemplary fusion schematic provided by an embodiment of the present invention;

FIG. 13 is another optional schematic flow chart of a human body posture recognition method provided by an embodiment of the present invention;

FIG. 14 is yet another optional schematic flow chart of a human body posture recognition method provided by an embodiment of the present invention;

FIG. 15 is an exemplary flow diagram of human body posture recognition provided by an embodiment of the present invention;

FIG. 16 is a first exemplary diagram of a human body posture application provided by an embodiment of the present invention;

FIG. 17 is a second exemplary diagram of a human body posture application provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the embodiments of the present invention is for the purpose of describing the embodiments of the present invention only and is not intended to be limiting of the present invention.

Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to these terms and expressions.

1) An artificial neural network is a mathematical model simulating the structure and function of a biological neural network, and comprises an input layer, an intermediate layer and an output layer, wherein each layer is formed by connecting a large number of processing units, each node processes input data by using an excitation function and outputs the processed data to other nodes, and exemplary types of the excitation function comprise a threshold type, a linear type, an S growth curve (Sigmoid) type and the like.

2) Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior so as to acquire new knowledge or skills and to reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental approach for making computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.

3) Deep Learning (DL) is a new research direction in the field of machine learning; it learns the inherent rules and representation levels of sample data, with the ultimate aim of enabling a machine to analyze and learn like a human and to recognize data such as text, images and sounds; deep learning is a complex machine learning algorithm.

4) A Convolutional Neural Network (CNN) is a feed-forward neural network that includes convolution computation and has a deep structure; it is one of the representative algorithms of deep learning.

5) A Part Affinity Field (PAF) refers to a set of flow fields that encode the unstructured pairwise relationships between the body parts of a variable number of people; the PAF is used to provide, for each pair of body parts, a confidence measure that the two body parts belong to the same human body.

6) The heat map refers specifically, in the embodiments of the invention, to an image in which the human body posture key points are displayed as hot spots.
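As a rough illustration of terms 5) and 6), the sketch below builds a small Gaussian heat map for one key point, reads the key point back as the hottest pixel, and scores a candidate limb by averaging how well a flow field aligns with the limb direction. The function names, the Gaussian shape and the sampling scheme are assumptions made for illustration only; they are not taken from the embodiments.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Return a height x width grid whose values peak at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def hottest_pixel(heatmap):
    """Return (x, y, score) of the maximum value in the heat map."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best

def paf_score(paf_x, paf_y, p, q, samples=10):
    """Average dot product between the field and the unit vector from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = int(round(p[0] + t * dx))
        y = int(round(p[1] + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / samples

heatmap = gaussian_heatmap(16, 12, cx=5, cy=7)
peak = hottest_pixel(heatmap)

# Hypothetical field pointing right everywhere: it supports a horizontal
# limb and gives no support to a vertical one.
paf_x = [[1.0] * 16 for _ in range(12)]
paf_y = [[0.0] * 16 for _ in range(12)]
horizontal = paf_score(paf_x, paf_y, (2, 3), (10, 3))
vertical = paf_score(paf_x, paf_y, (2, 3), (2, 9))
```

In this toy setting the hottest pixel coincides with the key point centre, and the horizontal candidate scores higher than the vertical one because it is aligned with the field.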

With the research and progress of artificial intelligence technology, artificial intelligence technology has been researched and applied in many fields, for example, common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. With the development of the technology, artificial intelligence technology will be applied in more fields and will deliver increasingly important value; for example, artificial intelligence can also be applied in the field of computer vision.

Here, it should be noted that artificial intelligence is a comprehensive technique of computer science, which attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

In addition, artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, voice processing technology, natural language processing technology, machine learning/deep learning and the like.

In the process of implementing the invention, the inventors found that, in the schemes provided by the related art, human body posture recognition generally adopts either a top-down recognition method or a bottom-up recognition method. The top-down recognition method first locates the approximate position of each human body and then recognizes the human body posture of each piece of human body information; for example, a position box of each piece of human body information in the image to be recognized is detected, human body posture key point detection is then performed for each piece of human body information on the basis of its position box, and the human body posture of each piece of human body information is finally obtained. The bottom-up recognition method first detects all human body posture key points in the image to be recognized and then clusters them onto different human bodies; for example, each human body posture key point is computed from a human body posture heat map, the computed key points are connected by using part affinity fields, and the human body posture of each piece of human body information is finally obtained by solving a bipartite graph matching problem from graph theory.
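To make the bottom-up grouping step more concrete, the following sketch pairs candidate body parts of two types using affinity scores. It uses a greedy highest-score-first rule, which is only a simple stand-in for the bipartite graph solution mentioned above; the part labels and score values are made up for illustration.

```python
def greedy_bipartite_match(scores):
    """Pair parts of type A with parts of type B.

    scores maps (a, b) to an affinity value. Pairs are accepted from the
    highest score downwards, and each part is used at most once.
    """
    used_a, used_b, matches = set(), set(), []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (a, b), _score in ranked:
        if a not in used_a and b not in used_b:
            matches.append((a, b))
            used_a.add(a)
            used_b.add(b)
    return matches

# Hypothetical affinities between two shoulders and two elbows.
scores = {
    ("shoulder_0", "elbow_0"): 0.9,
    ("shoulder_0", "elbow_1"): 0.2,
    ("shoulder_1", "elbow_0"): 0.3,
    ("shoulder_1", "elbow_1"): 0.8,
}
limbs = greedy_bipartite_match(scores)
```

Here each shoulder ends up connected to the elbow it has the strongest affinity with, so the two upper arms are attributed to two different human bodies.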

However, the top-down recognition method detects poorly when human bodies in the image to be recognized are close to one another and interfere spatially. Since human body posture recognition mostly involves scenes with multiple human bodies, a bottom-up recognition method is often selected. The bottom-up recognition method first detects all human body posture key points in the image and then allocates them to each piece of human body information, so the acquisition of the human body posture key points is crucial to the recognition result. That acquisition depends on the recognition capability and the structure of the network model. The network model generally has poor ability to distinguish easily confused limbs; moreover, when it processes the image to be recognized, it downsamples the data from layer to layer and takes the output of its last layer as the output of the model, so pixel information is lost. As a result, the accuracy of the obtained human body posture key points is poor, and the accuracy of the recognition result is low when human body posture recognition is performed according to these key points.
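The pixel loss caused by downsampling can be illustrated with simple arithmetic. Assuming, for illustration only, that a network reduces resolution by a fixed stride and that a key point is read from the coarse grid and mapped back to the centre of its cell, the recovered coordinate can deviate from the true one by up to about half a stride. The stride values below are hypothetical.

```python
def roundtrip(coord, stride):
    """Map a pixel coordinate to its coarse grid cell and back to the cell centre."""
    cell = coord // stride
    return cell * stride + stride // 2

def worst_case_error(stride, width=64):
    """Largest localisation error over all pixel positions in one row."""
    return max(abs(x - roundtrip(x, stride)) for x in range(width))

errors = {stride: worst_case_error(stride) for stride in (1, 2, 4, 8)}
```

The error is zero at full resolution and grows with the stride, which is the motivation for not relying only on the most heavily downsampled output.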

Based on this, embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for human body posture recognition, which can realize human body posture recognition based on artificial intelligence, and improve accuracy of human body posture recognition. In addition, the scheme provided by the embodiment of the invention relates to the computer vision technology of artificial intelligence, for example, human posture information is obtained according to a model; details will be explained below.

An exemplary application of the human body posture recognition device provided by the embodiment of the present invention is described below, and the human body posture recognition device provided by the embodiment of the present invention may be implemented as various types of user terminals such as a smart phone, a tablet computer, and a notebook computer, and may also be implemented as a server. Next, an exemplary application when the human body posture recognition apparatus is implemented as a server will be explained.

Referring to fig. 1, fig. 1 is an optional architecture diagram of a human body posture recognition system 100 provided by an embodiment of the present invention. To support a human body posture recognition application, the server 200 is the human body posture recognition device provided by the embodiment of the present invention, and a terminal 400 (a terminal 400-1 and a terminal 400-2 are shown as examples) is connected to the server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two.

The terminal 400 is used for collecting an image to be recognized, transmitting the image to be recognized to the server 200 through the network 300, and displaying human body posture information obtained from the server 200 through the network 300 on a graphical interface 410 (a graphical interface 410-1 and a graphical interface 410-2 are exemplarily shown); the server 200 is configured to obtain an image to be recognized from the terminal 400 through the network 300 for human body posture recognition, and send the recognized human body posture information to the terminal 400 through the network 300 for display.

Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present invention, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.

The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments of the invention is intended to comprise any suitable type of memory. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.

In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.

An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 252 for reaching other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), etc.;

a display module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;

an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.

In some embodiments, the human body posture recognition apparatus provided by the embodiments of the present invention may be implemented in software. Fig. 2 shows a human body posture recognition apparatus 255 stored in the memory 250, which may be software in the form of programs and plug-ins and includes the following software modules: an image acquisition module 2551, a human body detection module 2552, a key point identification module 2553, a key point fusion module 2554, a key point assignment module 2555, a training module 2556, an identification module 2557 and an application module 2558, the functions of which will be described below.

In other embodiments, the human body posture recognition apparatus provided by the embodiments of the present invention may be implemented in hardware. For example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the human body posture recognition method provided by the embodiments of the present invention; such a processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

In the following, the human body posture recognition method provided by the embodiment of the present invention is described in conjunction with an exemplary application and implementation in which the human body posture recognition apparatus provided by the embodiment of the present invention is implemented as a server.

Referring to fig. 3, fig. 3 is an optional schematic flow chart of a human body posture recognition method provided by an embodiment of the present invention, which is described below with reference to the steps shown in fig. 3.

S101, acquiring an image to be identified; the image to be recognized contains imaging information of a human body.

In the embodiment of the invention, when the human body posture recognition equipment performs human body posture recognition, the object it acquires is the image to be recognized; here, the image to be recognized contains imaging information of a human body, for example, an image of a person.

It should be noted that the image to be recognized may be an image captured by a capturing device, or an image in the image sequence corresponding to a video captured by the capturing device, and the like, which is not specifically limited in the embodiments of the present invention. In addition, human body posture recognition refers to the process of detecting key points of a human body (such as joints and facial features) and describing the skeletal information of the human body by means of the detected key points.

S102, carrying out human body detection on the image to be recognized to obtain at least one piece of human body information.

It should be noted that the image to be recognized contains imaging information of at least one human body, so that when the human body posture recognition device performs human body detection on the image to be recognized, at least one piece of human body information can be obtained.

In the embodiment of the present invention, when the human body posture recognition device performs human body detection on the image to be recognized, it returns the position information (for example, a region box) of each human body in the image to be recognized; that is, it returns at least one piece of human body information (for example, at least one region box).

Illustratively, referring to fig. 4, fig. 4 is an exemplary schematic diagram of human body detection provided by the embodiment of the present invention, and as shown in fig. 4, two pieces of human body information, 4-21 and 4-22, are detected on the image 4-1 to be recognized.

S103, carrying out multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points.

In the embodiment of the invention, after the human body posture recognition equipment obtains the image to be recognized, the human body posture key points in the image to be recognized are recognized from multiple scales, so that the key points recognized under each scale are obtained, and a set formed by the key points recognized under each scale is a multi-scale key point.

It should be noted that the human body posture key points, that is, the key points of the human body described in S101, are preset for the human body posture recognition device; correspondingly, the preset key point recognition model is trained for these preset human body posture key points. The multi-scale key points are therefore the information of the human body posture key points at at least two scales, and the human body posture key points comprise at least one key point. For example, the human body posture key points include 14 key points: the head, the neck, the left shoulder, the left elbow, the left wrist, the right shoulder, the right elbow, the right wrist, the left hip, the left knee, the left ankle, the right hip, the right knee and the right ankle. When the human body posture key points of the image to be recognized are recognized at 3 scales, the head key point, for example, corresponds to key point information at 3 scales, and the right ankle likewise corresponds to key point information at 3 scales.
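One possible way to hold such multi-scale key points in memory is a mapping from key point type to per-scale information, sketched below. The 14 key point names follow the example above; the scale factors, the container layout and the coordinate values are hypothetical and are not specified by the embodiment.

```python
KEYPOINT_NAMES = [
    "head", "neck",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
]
SCALES = (1.0, 0.5, 0.25)  # hypothetical scale factors for the 3 scales

def empty_multiscale_keypoints():
    """One slot per (key point type, scale), to be filled in by recognition."""
    return {name: {scale: None for scale in SCALES} for name in KEYPOINT_NAMES}

multi_scale = empty_multiscale_keypoints()
# Made-up detections of the head at each scale, as (x, y, confidence).
multi_scale["head"][1.0] = [(120, 40, 0.93)]
multi_scale["head"][0.5] = [(60, 20, 0.90)]
multi_scale["head"][0.25] = [(30, 10, 0.81)]
```

With this layout, every key point type, such as the head or the right ankle, carries exactly one entry per scale.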

S104, performing multi-scale fusion on the multi-scale key points to obtain target key points.

In the embodiment of the invention, after the human body posture identification equipment obtains the multi-scale key points, the multi-scale key points comprise key point information under at least two scales corresponding to the human body posture key points; therefore, the human body posture recognition equipment fuses the key point information under at least two scales, and the recognition result of the human body posture key points in the image to be recognized, namely the target key points, can be obtained. That is, the target keypoints are keypoints of the human body determined by the human body posture recognition device through recognition of the human body posture keypoints of the image to be recognized from at least two scales.
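A minimal sketch of one possible fusion is given below, assuming that the key point information at each scale is a heat map, that coarser maps are brought to the finest resolution by nearest-neighbour upsampling, and that the maps are then averaged. These choices and the helper names are assumptions for illustration; this passage does not specify the fusion operator.

```python
def upsample_nearest(heatmap, factor):
    """Repeat every cell `factor` times along both axes."""
    out = []
    for row in heatmap:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def fuse(heatmaps_by_factor):
    """Average heat maps after upsampling each by its factor.

    heatmaps_by_factor maps an upsampling factor to a heat map; all maps
    are assumed to have the same size once upsampled.
    """
    resized = [upsample_nearest(h, f) for f, h in heatmaps_by_factor.items()]
    rows, cols = len(resized[0]), len(resized[0][0])
    return [[sum(m[y][x] for m in resized) / len(resized) for x in range(cols)]
            for y in range(rows)]

# Hypothetical heat maps for one key point at two scales.
fine = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.8, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
coarse = [[0.0, 0.6],
          [0.0, 0.0]]
fused = fuse({1: fine, 2: coarse})
```

In the fused map the strongest response stays at the position where both scales agree (row 1, column 2), while cells supported by only one scale are attenuated.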

It should be noted that there is no fixed execution order between S102 and S103-S104: they may be executed simultaneously, in an interleaved manner, with a delay, or sequentially, which is not specifically limited in the embodiments of the present invention. Fig. 3 shows the case in which S102 and S103-S104 are executed sequentially.
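Because the two branches are independent, they can, for instance, be submitted to a thread pool and joined before the distribution step. The sketch below uses Python's standard `concurrent.futures` module; the three stage functions are hypothetical placeholders that return fixed values, not the models of the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_humans(image):
    """Placeholder for S102: returns made-up region boxes."""
    return [{"box": (0, 0, 50, 100)}]

def recognize_keypoints(image):
    """Placeholder for S103: returns made-up per-scale key point positions."""
    return {"head": {1.0: (20, 10), 0.5: (10, 5)}}

def fuse_keypoints(multi_scale):
    """Placeholder for S104: simply keeps the full-resolution estimate."""
    return {name: by_scale[1.0] for name, by_scale in multi_scale.items()}

def keypoint_branch(image):
    return fuse_keypoints(recognize_keypoints(image))

def run_branches(image):
    """Run S102 and S103-S104 concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        humans = pool.submit(detect_humans, image)
        keypoints = pool.submit(keypoint_branch, image)
        return humans.result(), keypoints.result()

humans, target_keypoints = run_branches(image=None)
```

Calling the two branches one after the other would give the same results; only the scheduling differs.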

S105, distributing the target key points to each piece of human body information to obtain human body posture information and finish human body posture recognition; the human body posture information is a set formed by the postures of each human body information.

In the embodiment of the invention, the target key points are the set formed by the human body posture key points of every human body recognized in the image to be recognized. Therefore, after the human body posture recognition device obtains the target key points and the at least one piece of human body information in the image to be recognized, it determines which of the target key points belong to each piece of human body information, which completes the distribution of the target key points to each piece of human body information. At this point the device has the key points corresponding to each piece of human body information and therefore the posture of each piece of human body information; the postures of the at least one piece of human body information together form the human body posture information, and human body posture recognition is complete. That is, the human body posture information is a set formed by the posture of each piece of human body information.

Here, "each piece of human body information" refers to every piece of human body information in the at least one piece of human body information.
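The distribution in S105 can be sketched as follows, under the simplifying assumption that a target key point belongs to the first region box that contains it and does not yet hold a key point of that type. This containment rule, the function names and the coordinates are illustrative assumptions; this passage does not state how membership is decided.

```python
def contains(box, point):
    """True if point (x, y) lies inside box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1

def assign_keypoints(boxes, target_keypoints):
    """Distribute key points to region boxes.

    boxes: list of (x0, y0, x1, y1); target_keypoints: list of (name, x, y).
    Returns one {name: (x, y)} posture per box.
    """
    postures = [{} for _ in boxes]
    for name, x, y in target_keypoints:
        for index, box in enumerate(boxes):
            if contains(box, (x, y)) and name not in postures[index]:
                postures[index][name] = (x, y)
                break
    return postures

# Two hypothetical region boxes and four hypothetical target key points.
boxes = [(0, 0, 50, 100), (60, 0, 110, 100)]
target_keypoints = [
    ("head", 25, 10), ("neck", 25, 25),
    ("head", 85, 12), ("neck", 85, 27),
]
posture_info = assign_keypoints(boxes, target_keypoints)
```

The returned list plays the role of the human body posture information: one posture per piece of human body information.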

Referring to fig. 5, fig. 5 is a schematic flow processing diagram of the human body posture recognition method provided by the embodiment of the present invention, and as shown in fig. 5, on one hand, an image 5-1 to be recognized is processed by multi-scale recognition 5-2 and multi-scale fusion 5-3 to obtain a target key point 5-4; on the other hand, after the human body detection 5-5 processing, at least one piece of human body information 5-6 is obtained; and then, binding 5-7 the target key point 5-4 with at least one piece of human body information 5-6 to obtain human body posture information 5-8, thereby completing human body posture recognition of the image to be recognized 5-1.

It can be understood that the human body posture recognition device obtains all the human body key points in the image to be recognized, namely the target key points, by fusing the human body posture key points recognized from the image to be recognized at multiple scales. In this way, the pixels of the image to be recognized are used effectively, and pixel loss in the process of obtaining the target key points is avoided. The target key points obtained therefore represent the human body posture key points in the image to be recognized effectively and with high accuracy, so that when the human body posture is recognized according to the target key points, the accuracy of human body posture recognition is improved.

Further, in the embodiment of the present invention, S103 may be implemented by S1031 to S1032; that is, the human body pose recognition device performs multi-scale recognition on the human body pose key points in the image to be recognized, and obtains multi-scale key points, including S1031 to S1032, which will be described below with reference to each step.

S1031, performing multi-scale feature extraction on the image to be recognized to obtain target multi-scale features.

In the embodiment of the invention, the human body posture recognition device obtains versions of the image to be recognized at multiple scales, performs feature extraction on the image at each scale, and combines the features extracted at all scales into the target multi-scale features.

S1032, identifying the key points of the human body posture of the target multi-scale features to obtain multi-scale key points.

In the embodiment of the invention, after the human body posture recognition device obtains the target multi-scale features, it recognizes the human body posture key points in the target multi-scale features, and thereby obtains the multi-scale key points, which have a multi-scale property. Here, the multi-scale property means that each key point corresponds to key point information at multiple scales.
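By way of illustration only, the idea in S1031 of obtaining the image at several scales and extracting features at each scale can be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, a 2x2 average-pooling downsampler, and a placeholder extractor `mean_intensity`; all of these names and choices are hypothetical stand-ins for the convolutional network branches described in the embodiment.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image, num_scales):
    """Return the image at `num_scales` scales, largest first."""
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


def extract_multi_scale_features(image, num_scales, extract):
    """Apply one feature extractor per scale and collect the results."""
    return [extract(scaled) for scaled in build_pyramid(image, num_scales)]


def mean_intensity(image):
    """Placeholder extractor (an assumption): the mean pixel value."""
    return sum(sum(row) for row in image) / (len(image) * len(image[0]))


# A 4x4 toy image with pixel values 0..15, processed at three scales.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, 3)
features = extract_multi_scale_features(image, 3, mean_intensity)
```

In a real system the per-scale extractor would be a network branch rather than a scalar statistic; the sketch only shows how the per-scale results are collected into one multi-scale feature set.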

Further, in the embodiment of the present invention, the multi-scale recognition in S103, in which the human body posture recognition device performs multi-scale recognition on the human body posture key points in the image to be recognized to obtain the multi-scale key points, can also be implemented as follows: performing multi-scale recognition on the human body posture key points in the image to be recognized by using a preset key point recognition model to obtain the multi-scale key points, where the preset key point recognition model is used for recognizing the human body posture key points in the image to be recognized from at least two scales.

It should be noted that, a preset key point recognition model is obtained in advance through training in the human body posture recognition device and is used for recognizing the human body posture key points in the image to be recognized from at least two scales; therefore, after the human body posture recognition device obtains the image to be recognized, the preset key point recognition model can be used for carrying out multi-scale recognition on the human body posture key points in the image to be recognized, and the obtained recognition result is the multi-scale key points.

In addition, the preset key point identification model is used for identifying the human posture key points in the image to be identified from at least two scales, so that each key point in the identified multi-scale key points corresponds to key point information under at least two scales.

The preset key point identification model comprises a multi-scale feature extraction model and a multi-scale key point output model, wherein the multi-scale feature extraction model is used for extracting features of the image to be identified from at least two scales, and the multi-scale key point output model is used for outputting key point information under at least two scales.

Correspondingly, in S1031, the human body posture recognition device performs multi-scale feature extraction on the image to be recognized by using the multi-scale feature extraction model to obtain the target multi-scale features.

In the embodiment of the invention, the preset key point recognition model is multi-task and multi-stage, and the multi-task or multi-stage property is realized by the multi-scale feature extraction model. Through the multi-scale feature extraction model, the human body posture recognition device performs feature extraction on the image to be recognized at each scale, thereby realizing multi-scale feature extraction on the image to be recognized; the set formed by the features extracted at each scale is the target multi-scale features.

It should be noted that the multi-scale feature extraction model is a convolutional neural network with a pyramid branch structure, and is used for extracting features of an image to be identified from multiple scales; therefore, the multi-scale feature extraction model comprises a plurality of network branches, wherein each network branch is used for extracting features of the image to be recognized under one scale; therefore, the target multi-scale features are obtained through interaction of a plurality of network branches.

Exemplarily, referring to fig. 6, fig. 6 is a schematic flowchart of a process for extracting multi-scale features provided by an embodiment of the present invention, and as shown in fig. 6, a multi-scale feature extraction model 6-2 performs feature extraction on an image to be recognized 6-1 at three scales (6-21, 6-22, and 6-23) to obtain a target multi-scale feature 6-3.

It can be understood that, by extracting features of the image to be recognized from multiple scales through the multi-scale feature extraction model, richer image features can be extracted at the initial stage of human body posture recognition, and the effective pixels of the image to be recognized are used more fully, which provides a precondition for improving the accuracy of subsequent human body posture recognition. Meanwhile, because the accuracy of human body posture recognition is high, the problem of inaccurate recognition results caused by body part confusion is solved.

In addition, in S1032, the human body posture recognition device uses the multi-scale key point output model to recognize the human body posture key points of the target multi-scale features, and obtains the multi-scale key points.

In the embodiment of the invention, the preset key point recognition model further comprises a multi-scale key point output model, which is used for outputting the heat maps corresponding to the human body posture key points recognized in the image to be recognized at multiple scales, namely the multi-scale key points. Therefore, after the human body posture recognition device obtains the target multi-scale features, it uses the multi-scale key point output model to recognize the human body posture key points of the target multi-scale features, and the multi-scale key points are obtained.

It should be noted that the multi-scale key point output model is a model formed by at least two hourglass structure models, where the number of scales in the at least two scales equals the number of hourglass structure models in the multi-scale key point output model. Each hourglass structure model is divided into an upper branch and a lower branch; the upper branch result is used as the input of the next hourglass structure model, and the lower branch result is used as the output result. Therefore, after the target multi-scale features pass through the multi-scale key point output model, at least two lower branch results are obtained, and these are the multi-scale key points. Here, each lower branch result is a heat map corresponding to the recognized human body posture key points, so that heat maps corresponding to the human body posture key points at at least two scales are obtained.
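By way of illustration only, the data flow between the hourglass structure models (upper branch result passed on, lower branch result collected as output) can be sketched as follows. The function names and the toy stages are hypothetical; each real stage would be a trained hourglass network that returns an upper branch result and a heat map.

```python
def run_stacked_hourglasses(features, hourglasses):
    """Chain hourglass stages and collect each stage's lower branch result.

    Each element of `hourglasses` is a callable that takes the current input
    and returns a pair (upper_branch_result, lower_branch_result).
    """
    heat_maps = []
    current = features
    for hourglass in hourglasses:
        upper, lower = hourglass(current)
        heat_maps.append(lower)  # lower branch result: output heat map
        current = upper          # upper branch result: input of the next stage
    return heat_maps


def make_toy_stage(gain):
    """Build a toy stage (an assumption) so the chaining can be observed."""
    def stage(values):
        upper = [v + 1 for v in values]
        lower = [v * gain for v in values]
        return upper, lower
    return stage


# Two stages give two outputs, matching "at least two scales".
outputs = run_stacked_hourglasses([1, 2], [make_toy_stage(1), make_toy_stage(10)])
```

The number of collected outputs equals the number of stages, which mirrors the statement that the number of scales equals the number of hourglass structure models.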

Further, in the embodiment of the present invention, S104 may be implemented by S1041 to S1043; that is, the human body posture recognition device performs multi-scale fusion on the multi-scale key points to obtain target key points, including S1041-S1043, which will be described below with reference to each step.

S1041, obtaining a local maximum value of the multi-scale key point to obtain a predicted multi-scale key point.

In the embodiment of the invention, the multi-scale key points are heat maps at multiple scales. Therefore, after the human body posture recognition device obtains the multi-scale key points, it calculates the local maxima of the multi-scale key points based on the position information of each key point in each heat map, and thereby predicts the human body posture key points of the image to be recognized; that is, the predicted multi-scale key points are obtained.
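By way of illustration only, taking the local maxima of one heat map can be sketched as follows. The sketch assumes the heat map is a list of rows of scores, compares each cell with its eight neighbours, and ignores cells below a hypothetical `min_score`; these choices are assumptions and are not specified by the embodiment.

```python
def local_maxima(heat_map, min_score=0.1):
    """Return (row, col, score) for each cell strictly above all its neighbours."""
    rows, cols = len(heat_map), len(heat_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heat_map[r][c]
            if score < min_score:
                continue  # skip low responses (assumed noise floor)
            neighbours = [
                heat_map[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


heat_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.7],
]
peaks = local_maxima(heat_map)  # two peaks: (1, 1) and (3, 3)
```

Applying the same function to the heat map of every scale, for example `[local_maxima(h) for h in heat_maps]`, would give one set of predicted key points per scale.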

S1042, based on the multi-scale key points, linear combination processing is carried out on the predicted multi-scale key points to obtain initial multi-scale key points.

In the embodiment of the invention, after the human body posture recognition device obtains the predicted multi-scale key points, it uses the multi-scale key points to improve the accuracy of the predicted multi-scale key points, so as to determine the human body posture key points of the image to be recognized more precisely. Here, the human body posture recognition device performs linear combination processing on the predicted multi-scale key points based on the multi-scale key points, and the predicted multi-scale key points whose accuracy has been improved in this way are the initial multi-scale key points. That is, the accuracy of the initial multi-scale key points is higher than that of the predicted multi-scale key points.

S1043, performing multi-scale fusion processing on the initial multi-scale key points to obtain target key points.

In the embodiment of the invention, after the human body posture identification device obtains the initial multi-scale key points, the multi-scale fusion is carried out by taking each key point in the initial multi-scale key points as a unit, and then the target key points are obtained.

Further, in the embodiment of the present invention, S1042 may be implemented by S10421 to S10422; that is, the human body posture identifying device performs linear combination processing on the predicted multi-scale key points based on the multi-scale key points to obtain initial multi-scale key points, including S10421-S10422, which will be described below with reference to each step.

S10421, acquiring a preset number of adjacent key points with the minimum distance to the predicted multi-scale key points from the multi-scale key points.

In the embodiment of the invention, to improve the accuracy of the predicted multi-scale key points, the human body posture recognition device compares each predicted multi-scale key point with the key points near it among the multi-scale key points. Here, the human body posture recognition device takes a preset number (e.g., 4) of adjacent key points among the multi-scale key points that have the smallest distance to the predicted multi-scale key point as the key points near the predicted multi-scale key point.

S10422, carrying out linear combination on the predicted multi-scale key points and the adjacent key points to obtain initial multi-scale key points.

It should be noted that, after the human body posture recognition device obtains the adjacent key points, it linearly combines the predicted multi-scale key point with the adjacent key points, and the linear combination mode adopted depends on the result of comparing the predicted multi-scale key point with the adjacent key points. For example, if the predicted multi-scale key point is close to the right key point among the adjacent key points (the distance is smaller than or equal to a threshold) and far from the other adjacent key points (the distance is larger than the threshold), the linear combination mode is to take the average of the predicted multi-scale key point and the right key point among the adjacent key points, which gives the initial multi-scale key point.
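By way of illustration only, S10421 and S10422 can be sketched as follows. The sketch assumes key points are (x, y) tuples and that the linear combination is an equal-weight average of the predicted key point with every adjacent key point whose distance does not exceed the threshold; this reduces to the example given above when exactly one adjacent key point is close, but the general rule, the function names and the parameter values are assumptions.

```python
import math


def nearest_neighbours(predicted, candidates, count=4):
    """Return the `count` candidates with the smallest distance to `predicted`."""
    return sorted(candidates, key=lambda p: math.dist(predicted, p))[:count]


def refine_keypoint(predicted, candidates, threshold, count=4):
    """Average the predicted point with its close neighbours (assumed rule)."""
    close = [
        p for p in nearest_neighbours(predicted, candidates, count)
        if math.dist(predicted, p) <= threshold
    ]
    points = [predicted] + close
    return (
        sum(p[0] for p in points) / len(points),
        sum(p[1] for p in points) / len(points),
    )


predicted = (10.0, 10.0)
candidates = [(11.0, 10.0), (10.0, 14.0), (6.0, 10.0), (10.0, 5.0), (30.0, 30.0)]
# Only the right-hand neighbour (11, 10) lies within the threshold of 1.5.
refined = refine_keypoint(predicted, candidates, threshold=1.5)
```

When no adjacent key point lies within the threshold, this sketch returns the predicted key point unchanged.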

Further, in the embodiment of the present invention, the initial multi-scale key points include at least one multi-scale target key point, and each multi-scale target key point of the at least one multi-scale target key point is composed of a set of key points corresponding to each scale in at least two scales; here, S1043 may be implemented by S10431-S10433; that is, the human body posture identifying device performs multi-scale fusion processing on the initial multi-scale key points to obtain target key points, including S10431 to S10433, which will be described below with reference to each step.

S10431, obtaining a target center point of each multi-scale target key point in the initial multi-scale key points.

In the embodiment of the invention, for each multi-scale target key point in the initial multi-scale key points, the human body posture recognition device calculates the central position of the key points corresponding to each of the at least two scales of that multi-scale target key point, and thereby obtains the target center point of each multi-scale target key point.

It should be noted that, since the initial multi-scale keypoints include at least one multi-scale target keypoint, after the target center point corresponding to each multi-scale target keypoint is obtained, at least one target center point is obtained. In addition, the number of the at least one multi-scale target key point is more than or equal to the number of the preset human posture key points; for example, when the number of preset human pose keypoints is 14, the number of at least one multi-scale target keypoints may be 14 or 15.

Exemplarily, referring to fig. 7, fig. 7 is an exemplary schematic diagram of multi-scale fusion provided by the embodiment of the present invention. As shown in fig. 7, in the position coordinates, the 11 star points 7-1 form one multi-scale target key point of the at least one multi-scale target key point, and the solid dot 7-2 is the target center point corresponding to that multi-scale target key point.

S10432, calculating the distance from each scale corresponding key point in each multi-scale target key point to the target center point to obtain the target distance.

In the embodiment of the invention, after the human body posture recognition device obtains the target center point, in order to improve the accuracy of the obtained target key points, it calculates the distance from the key point corresponding to each scale in each multi-scale target key point to the target center point, and thereby obtains the target distances, so that outliers can be excluded based on the target distances.

S10433, taking the average value of the key points with the target distance smaller than the preset distance threshold as the sub-target key points corresponding to each multi-scale target key point, thereby obtaining the target key points corresponding to the initial multi-scale key points.

In the embodiment of the invention, a preset distance threshold is set in the human body posture recognition device and is used for eliminating outliers in each multi-scale target key point. The human body posture recognition device excludes from each multi-scale target key point the key points whose target distance is greater than or equal to the preset distance threshold, and takes the average of the key points whose target distance is smaller than the preset distance threshold as the sub-target key point corresponding to that multi-scale target key point.

It should be noted that after the sub-target keypoints corresponding to each multi-scale target keypoint are obtained, at least one sub-target keypoint corresponding to at least one multi-scale target keypoint, that is, a target keypoint, is also obtained.

Illustratively, with continued reference to fig. 7, 7-3 in fig. 7 is the preset distance threshold, and the circular area 7-4 is the area centered on the target center point 7-2 with the preset distance threshold 7-3 as its radius. It can be seen that the star points among the 11 star points 7-1 that fall within the circular area 7-4 are the required star points, and the sub-target key point 7-5 is obtained by averaging the star points that fall within the circular area 7-4.

It can be understood that the human body posture recognition device integrates the key points recognized under the multi-scale, the confidence coefficient of each key point under the multi-scale is considered, outliers are effectively eliminated, and the accuracy of the target key points is improved.
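By way of illustration only, S10431 to S10433 can be sketched for a single multi-scale target key point as follows. The sketch assumes that the target center point is the arithmetic mean of the per-scale key points and that, if every point were excluded, the center point itself is returned; both choices and the function name `fuse_keypoint` are assumptions.

```python
import math


def fuse_keypoint(points, distance_threshold):
    """Fuse the per-scale detections (x, y) of one key point into one point."""
    # S10431: target center point (assumed here to be the mean position).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # S10432: target distance of every per-scale key point to the center.
    # S10433: keep only points closer than the preset distance threshold.
    kept = [p for p in points if math.dist(p, (cx, cy)) < distance_threshold]
    if not kept:
        return (cx, cy)  # assumed fallback when all points are excluded
    return (
        sum(x for x, _ in kept) / len(kept),
        sum(y for _, y in kept) / len(kept),
    )


# Four consistent detections and one outlier at (31, 31).
detections = [(10, 10), (12, 10), (10, 12), (12, 12), (31, 31)]
fused = fuse_keypoint(detections, distance_threshold=8.0)  # (11.0, 11.0)
```

The outlier pulls the center point to (15, 15), but it lies far outside the threshold radius and therefore does not contribute to the final average.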

Further, in the embodiment of the present invention, S105 may be implemented by S1051-S1054; that is, the human body posture identifying apparatus assigns the target key points to each piece of human body information, and obtains human body posture information, including S1051 to S1054, which will be described below with reference to each step.

S1051, determining individual key points belonging to each human body information from the target key points.

In an embodiment of the present invention, the human body posture recognition device determines the individual key points belonging to each piece of human body information in the at least one piece of human body information based on the position information of each piece of human body information and the position information of the target key points. Here, the human body posture recognition device divides the target key points into at least one set of individual key points, one for each piece of human body information. Moreover, because human bodies may overlap, the same key point may appear in more than one set of individual key points; therefore, each set of individual key points needs to be screened.

Exemplarily, when the at least one piece of human body information is at least one human body region box, the target key points falling within each human body region box are the individual key points of the corresponding human body.
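By way of illustration only, assigning target key points to human body region boxes can be sketched as follows. The sketch assumes key points are (x, y, type) tuples and boxes are (x_min, y_min, x_max, y_max) tuples; these layouts and the function name are hypothetical.

```python
def assign_to_boxes(keypoints, boxes):
    """Return, for each box, the key points whose position falls inside it."""
    assignments = [[] for _ in boxes]
    for keypoint in keypoints:
        x, y = keypoint[0], keypoint[1]
        for index, (x_min, y_min, x_max, y_max) in enumerate(boxes):
            if x_min <= x <= x_max and y_min <= y <= y_max:
                assignments[index].append(keypoint)
    return assignments


# Two overlapping boxes: the key point at (9, 3) lies in both of them.
boxes = [(0, 0, 10, 20), (8, 0, 18, 20)]
keypoints = [(5, 2, "head"), (9, 3, "head"), (15, 2, "head")]
individual_keypoints = assign_to_boxes(keypoints, boxes)
```

Because the boxes overlap, the same key point is assigned to both human bodies, which is why a subsequent screening step is needed.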

S1052, acquiring individual type key points corresponding to each key point type in the human body posture key points from the individual key points.

In the embodiment of the invention, the human body posture key points in the human body posture recognition device comprise at least one key point type, and the individual key points obtained also carry key point type information, which indicates the key point type of each of the individual key points. Therefore, the human body posture recognition device can obtain, from the individual key points, the key points corresponding to each key point type in the human body posture key points, namely the individual type key points; thus, for the individual key points of one human body, at least one group of individual type key points is obtained.

S1053, obtaining target individual type key points from the individual type key points to obtain individual posture key points corresponding to each human body information.

In the embodiment of the present invention, each human body should have only one key point of each key point type; for example, a human body has only one head, one neck, one left shoulder, one left elbow, one left wrist, one right shoulder, one right elbow, one right wrist, one left hip, one left knee, one left ankle, one right hip, one right knee, and one right ankle. Therefore, the human body posture recognition device needs to delete the redundant key points among the individual type key points of each type; that is, a target individual type key point is obtained from the individual type key points, so that only one key point remains under each key point type. After the target individual type key point corresponding to each key point type is obtained for each piece of human body information, the at least one target individual type key point forms the individual posture key points corresponding to that piece of human body information; here, the number of the at least one target individual type key point is consistent with the number of types of human body posture key points.

Here, the human body posture recognition device may screen the target individual type key points by means of a trained model: during training, a plurality of key points are input for each key point type, and the target individual type key point is labeled among the input key points, so that a model for screening the target individual type key points is obtained.

Referring to fig. 8, fig. 8 is an exemplary schematic diagram of obtaining target individual type key points according to an embodiment of the present invention, as shown in fig. 8, there are 3 head key points (black solid circles) of human body information 8-1, and through screening, target individual type key points 8-2 corresponding to the head are also obtained.
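By way of illustration only, the screening that keeps one key point per key point type can be sketched as follows. The embodiment describes a trained screening model; the sketch substitutes a much simpler rule, namely keeping the candidate with the highest confidence score, and assumes key points are (x, y, type, score) tuples. The rule, the score field and the function name are assumptions.

```python
def select_one_per_type(individual_keypoints):
    """Keep, for each key point type, the candidate with the highest score."""
    best = {}
    for keypoint in individual_keypoints:
        keypoint_type, score = keypoint[2], keypoint[3]
        if keypoint_type not in best or score > best[keypoint_type][3]:
            best[keypoint_type] = keypoint
    return best


# Three head candidates, as in the fig. 8 example, and one neck candidate.
candidates = [
    (5, 2, "head", 0.6),
    (6, 2, "head", 0.9),
    (7, 3, "head", 0.4),
    (6, 5, "neck", 0.8),
]
pose_keypoints = select_one_per_type(candidates)
```

After screening, exactly one key point remains per type present in the input, here one head and one neck.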

S1054, obtaining the individual posture corresponding to each piece of human body information according to the individual posture key points, thereby obtaining the human body posture information corresponding to the at least one piece of human body information.

It should be noted that the individual posture key points include at least one target individual type key point, and the human body posture recognition device connects the at least one target individual type key point according to a preset connection sequence, so that the posture corresponding to each piece of human body information is obtained, and thus the human body posture information corresponding to the at least one piece of human body information is obtained. Here, the preset connection order is a connection order between key points of the body posture, such as head-to-neck.
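By way of illustration only, connecting the target individual type key points in a preset connection order can be sketched as follows. Only the head-to-neck connection is stated above; the remaining pairs in `PRESET_CONNECTIONS`, the key point names and the function name are assumptions chosen to cover the 14 key point types listed earlier.

```python
# Assumed connection order; only ("head", "neck") comes from the description.
PRESET_CONNECTIONS = [
    ("head", "neck"),
    ("neck", "left_shoulder"), ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
    ("neck", "right_shoulder"), ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("neck", "left_hip"), ("left_hip", "left_knee"),
    ("left_knee", "left_ankle"),
    ("neck", "right_hip"), ("right_hip", "right_knee"),
    ("right_knee", "right_ankle"),
]


def build_pose(pose_keypoints, connections=PRESET_CONNECTIONS):
    """Return the line segments of one posture as pairs of (x, y) positions.

    `pose_keypoints` maps a key point type to its (x, y) position; a
    connection is skipped when either of its end points is missing.
    """
    return [
        (pose_keypoints[start], pose_keypoints[end])
        for start, end in connections
        if start in pose_keypoints and end in pose_keypoints
    ]


person = {"head": (6, 2), "neck": (6, 5), "left_shoulder": (4, 6)}
segments = build_pose(person)
```

Applying `build_pose` to the individual posture key points of every human body would give the set of postures that forms the human body posture information.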

Further, in the embodiment of the present invention, the human body posture identifying device further includes a process of training to obtain a preset keypoint identifying model, that is, before S102, S106-S108 is further included; that is, before the human body gesture recognition device performs human body detection on the image to be recognized to obtain at least one piece of human body information, the human body gesture recognition method further includes S106-S108, which will be described below with reference to each step.

S106, obtaining a target sample image; and the target sample image is used for training to obtain a preset key point identification model.

In the embodiment of the invention, when the human body posture recognition device obtains the preset key point recognition model through training, it first needs to obtain a training sample, namely the target sample image. Here, the target sample image, like the image to be recognized, also contains imaging information of a human body.

S107, building an original key point identification model; the original key point identification model is a multi-scale hierarchical structure model.

In the embodiment of the invention, the preset key point recognition model finally obtained through training is used for recognizing the key points of the human posture in the image to be recognized from at least two scales; therefore, the human body posture recognition equipment needs to build a model with a multi-scale hierarchical structure, namely the convolutional neural network with the pyramid branch structure, and an original key point recognition model is obtained.

It should be noted that the preset key point identification model includes a multi-scale feature extraction model and a multi-scale key point output model; thus, the original keypoint identification model comprises an original multi-scale feature extraction model and an original multi-scale keypoint output model.

S108, training the original key point recognition model by using the target sample image to obtain a preset key point recognition model.

In the embodiment of the invention, after the human body posture recognition device obtains the target sample image and the original key point recognition model, it can continuously train the original key point recognition model by using the target sample image until the trained original key point recognition model converges, at which point the training ends; the training of the original key point recognition model is then complete, and the preset key point recognition model is obtained.

In addition, the training process is similar to the use process described in S103, and is not described again in this embodiment of the present invention.

It should be noted that the human body posture recognition device performs a training process on the original key point recognition model, that is, a training process on the original multi-scale feature extraction model and the original multi-scale key point output model.

Further, referring to fig. 9, fig. 9 is an optional flowchart illustrating the obtaining of the target sample image according to an embodiment of the present invention, as shown in fig. 9, in the embodiment of the present invention, S106 may be implemented by S1061-S1064; that is, the human gesture recognition apparatus acquires the target sample image, including S1061-S1064, as described below in conjunction with the steps shown in fig. 9.

S1061, acquiring a human body part sample image and a sample image to be identified.

In the embodiment of the present invention, the target sample image is an image subjected to data enhancement processing, where the image used in the data enhancement processing includes a human body part sample image and a sample image to be recognized, and therefore, the human body posture recognition device needs to obtain the human body part sample image and the sample image to be recognized.

The human body part sample image is an image used for performing data enhancement processing on a sample image to be identified to obtain a target sample image; and the sample image to be recognized is a sample image initially obtained by the human body posture recognition equipment.

S1062, identifying the human body part of the human body part sample image to obtain a human body part identification result.

In the embodiment of the invention, when the human body posture recognition equipment performs data enhancement processing on a sample image to be recognized by using a human body part sample image, each human body part in the human body part sample image is used; therefore, the human posture recognition apparatus needs to recognize the human body part of the human body part sample image, and each recognized human body part is the human body part recognition result.

Here, the human body posture identifying apparatus may identify the human body part of the human body part sample image based on pixel characteristics of the human body part; for example, the identification of the human body part sample image is realized based on the pixel characteristics of the human body part through a convolutional neural network.

Exemplarily, referring to fig. 10, fig. 10 is an exemplary schematic diagram of a human body part recognition result provided by an embodiment of the present invention, and as shown in fig. 10, the recognition result of the human body part sample image 10-1 is a human body part recognition result 10-2.

S1063, extracting the human body part from the human body part sample image based on the human body part identification result to obtain a human body part database.

In the embodiment of the invention, after the human body posture recognition device obtains the human body part recognition result, the recognized human body part recognition result is used as a mask, the human body part is segmented in the human body part sample image, and the segmented human body parts form the human body part database.

Illustratively, referring to fig. 11, fig. 11 is an exemplary schematic diagram of a human body part database provided by the embodiment of the present invention. As shown in fig. 11, the human body part database 11-1 includes a head 11-11, upper limbs 11-12, feet 11-13, legs 11-14 and an upper body 11-15.

S1064, fusing the human body part in the human body part database with the corresponding human body part in the sample image to be recognized to obtain the target sample image.

In the embodiment of the invention, after the human body posture recognition equipment obtains the human body part database, the human body parts in the human body part database are fused with the corresponding human body parts in the sample image to be recognized, and then the target sample image is obtained.

It should be noted that each human body part in the human body part database corresponds to a type, the human body posture recognition device recognizes the human body part of the sample image to be recognized based on the pixel characteristics of the human body part, and according to the recognition result of the sample image to be recognized and the type corresponding to each human body part, fusion of the human body part in the human body part database and the corresponding human body part in the sample image to be recognized is achieved.

Referring to fig. 12, fig. 12 is an exemplary fusion diagram provided by an embodiment of the present invention. As shown in fig. 12, an upper limb 12-1 is attached near the upper limb of the human body in the sample image to be recognized 12-2 (at a position whose distance from that upper limb is smaller than a preset attachment distance), a foot 12-3 is attached near the foot of the human body in the sample image to be recognized 12-2, and a leg 12-4 is attached near the leg of the human body in the sample image to be recognized 12-2, so that a target sample image 12-5 is obtained.
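By way of illustration only, attaching a segmented human body part to a sample image can be sketched as follows. The sketch assumes grayscale images stored as lists of rows and a binary mask of the same size as the part, where 1 marks pixels that belong to the body part; how the attachment position is chosen near the corresponding body part is not shown, and all names are hypothetical.

```python
def paste_part(image, part, mask, top, left):
    """Return a copy of `image` with the masked pixels of `part` pasted in.

    The top-left corner of `part` is placed at (top, left); pixels that fall
    outside the image are ignored.
    """
    result = [row[:] for row in image]  # leave the input image unchanged
    for r, (part_row, mask_row) in enumerate(zip(part, mask)):
        for c, (pixel, keep) in enumerate(zip(part_row, mask_row)):
            rr, cc = top + r, left + c
            if keep and 0 <= rr < len(result) and 0 <= cc < len(result[0]):
                result[rr][cc] = pixel
    return result


sample = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
part = [[7, 8], [9, 7]]
mask = [[1, 0], [1, 1]]  # the pixel with value 8 is background
augmented = paste_part(sample, part, mask, top=1, left=1)
```

Repeating this for parts of several types drawn from the human body part database would yield an augmented image in the manner of the target sample image 12-5.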

It can be understood that, because the target sample image is obtained through data enhancement processing, the preset key point recognition model trained with the target sample image detects easily confused human body key points with high accuracy and precision.

Further, in the embodiment of the present invention, S108 is followed by S109-S110; that is, after the human body posture identifying device trains the original key point identifying model by using the target sample image to obtain the preset key point identifying model, the human body posture identifying method further includes S109-S110, which will be described below with reference to each step.

S109, acquiring a new target sample image.

In the embodiment of the invention, in order to improve the generalization capability of the preset key point identification model, the human body posture identification device can also acquire a new target sample image; here, the new target sample image, similar to the target sample image, contains imaging information of the human body; in addition, the new target sample image is used to optimize the pre-set keypoint identification model.

S110, optimizing a preset key point identification model by using the new target sample image.

It should be noted that, after the human body posture identifying device obtains a new target sample image, the preset key point identifying model may be continuously optimized by using the new target sample image until the optimization is finished when an optimization stopping condition is reached (for example, the loss function is lower than a preset loss value), and at this time, the optimized preset key point identifying model is obtained.
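By way of illustration only, an optimization loop that stops once the loss falls below a preset loss value can be sketched as follows. The sketch uses a toy one-parameter model fitted by gradient descent in place of the key point recognition model; the model, the learning rate, the round limit and all names are assumptions.

```python
def optimize_until(params, samples, loss_fn, step_fn, preset_loss, max_rounds=1000):
    """Repeat optimization steps until the loss is below `preset_loss`."""
    loss = loss_fn(params, samples)
    rounds = 0
    while loss >= preset_loss and rounds < max_rounds:
        params = step_fn(params, samples)
        loss = loss_fn(params, samples)
        rounds += 1
    return params, loss, rounds


def squared_error(weight, samples):
    """Mean squared error of the toy model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


def gradient_step(weight, samples, learning_rate=0.05):
    """One gradient descent step on the mean squared error."""
    gradient = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
    return weight - learning_rate * gradient


# Toy samples generated by y = 2x; the optimum is weight = 2.
new_samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight, final_loss, rounds = optimize_until(
    0.0, new_samples, squared_error, gradient_step, preset_loss=1e-6
)
```

The `max_rounds` limit is a safeguard added in this sketch so that the loop also ends when the preset loss value cannot be reached.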

In addition, the optimization process is the same as the training process in S108 and is not described again here.

Correspondingly, in S103, the step in which the human body posture recognition device performs multi-scale recognition on the human body posture key points in the image to be recognized by using the preset key point recognition model to obtain multi-scale key points includes: performing multi-scale recognition on the human body posture key points in the image to be recognized by using the optimized preset key point recognition model to obtain the multi-scale key points. That is, the human body posture recognition device performs multi-scale recognition using the latest model.

The human body posture recognition equipment can realize the optimization of the preset key point recognition model by acquiring a new target sample image, and the generalization capability of the preset key point recognition model is improved; therefore, when the human posture recognition equipment carries out multi-scale recognition on the human posture key points in the image to be recognized by using the optimized preset key point recognition model, the accuracy of the obtained multi-scale key points is further improved.

Further, in the embodiment of the present invention, S106 is followed by S111-S112; that is, after the human posture identifying apparatus obtains the target sample image, the human posture identifying method further includes S111-S112, which will be described below in conjunction with the respective steps.

S111, building an original human body detection model; the original human body detection model is an initial model used for identifying each human body information in the image to be identified.

In the embodiment of the invention, the human body posture recognition equipment builds the initial model for recognizing each human body information in the image to be recognized, and thus the building of the original human body detection model is completed. Here, the original human body detection model is, for example, a deep neural network model.

S112, training an original human body detection model by using the target sample image to obtain a human body detection model; the human body detection model is used for identifying each human body information in the image to be identified.

In the embodiment of the invention, the target sample image is also used to train a model for identifying each piece of human body information in the image to be recognized. Therefore, after building the original human body detection model, the human body posture recognition device trains it with the target sample image until convergence is reached; the human body detection model obtained at that point can then be used to identify each piece of human body information in the image to be recognized.

Correspondingly, in S102, the step in which the human body posture recognition device performs human body detection on the image to be recognized to obtain at least one piece of human body information can be implemented as follows: the human body posture recognition device performs human body detection on the image to be recognized by using the human body detection model to obtain the at least one piece of human body information.

Further, referring to fig. 13, fig. 13 is another optional flowchart of the human body posture recognition method provided by the embodiment of the present invention. As shown in fig. 13, the human body posture recognition device may also recognize the human body posture in a top-down manner. That is, S102 is followed by S113-S114: after the human body posture recognition device performs human body detection on the image to be recognized to obtain at least one piece of human body information, the human body posture recognition method further includes S113-S114, which are described below with reference to the steps shown in fig. 13.

S113, carrying out multi-scale recognition on the human posture key points of each human body by using a preset key point recognition model to obtain individual posture key points, so as to obtain at least one individual posture key point corresponding to at least one piece of human body information.

In the embodiment of the invention, after the human body posture identification device obtains at least one piece of human body information, the preset key point identification model is used for carrying out multi-scale identification on the human body posture key points of each piece of human body information of the at least one piece of human body information one by one, so that the individual posture key points corresponding to each piece of human body information are obtained, and at least one individual posture key point corresponding to the at least one piece of human body information is obtained.

It should be noted that, for each individual posture key point, the human body posture recognition device obtains the human body posture key points corresponding to each piece of human body information by applying the multi-scale fusion processing and the screening processing (S1052-S1053) provided by the embodiment of the present invention; details are not repeated here.
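The top-down variant of S113 can be sketched as follows. This is a minimal illustration under stated assumptions: each piece of human body information is taken to be a bounding box `(x_min, y_min, x_max, y_max)`, the image is a 2D list, and `recognize_keypoints` is a hypothetical callable standing in for the preset key point recognition model (including its fusion and screening); none of these representations is specified by the source.

```python
def top_down_keypoints(image, boxes, recognize_keypoints):
    """Run key point recognition once per detected human body.

    image: 2D list of pixels, indexed as image[y][x].
    boxes: list of (x_min, y_min, x_max, y_max), one per human body.
    recognize_keypoints: callable(crop) -> {type: (x, y)} in crop coordinates.
    Returns one {type: (x, y)} dict per box, in full-image coordinates.
    """
    results = []
    for x_min, y_min, x_max, y_max in boxes:
        # Crop the region that belongs to this human body.
        crop = [row[x_min:x_max] for row in image[y_min:y_max]]
        local = recognize_keypoints(crop)
        # Shift crop coordinates back into the coordinate system of the image.
        results.append({
            kp_type: (x + x_min, y + y_min)
            for kp_type, (x, y) in local.items()
        })
    return results
```

Because each crop contains a single human body, the key points returned for a box already belong to that body, so no separate allocation step is needed in this variant.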

And S114, determining human body posture information according to at least one individual posture key point, and finishing human body posture identification.

In the embodiment of the invention, after the human body posture recognition device obtains the at least one individual posture key point, it connects the key points according to the preset connection order among the human body posture key points, thereby obtaining the posture corresponding to each piece of human body information; the human body posture information corresponding to the at least one piece of human body information is thus obtained, and the human body posture recognition is completed.
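Connecting key points in a preset order can be sketched as below. The key point type names and the contents of `CONNECTION_ORDER` are hypothetical examples; the source does not list the key point types or the actual connection order.

```python
# Hypothetical connection order: pairs of key point types joined by a limb.
CONNECTION_ORDER = [
    ("head", "neck"),
    ("neck", "left_shoulder"),
    ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("right_shoulder", "right_elbow"),
]


def build_pose(individual_keypoints, connection_order=CONNECTION_ORDER):
    """Turn one person's key points into a list of line segments.

    individual_keypoints: {type: (x, y)} for a single human body.
    Pairs with a missing end point are skipped (an assumption of this sketch).
    """
    return [
        (individual_keypoints[a], individual_keypoints[b])
        for a, b in connection_order
        if a in individual_keypoints and b in individual_keypoints
    ]
```

Applying `build_pose` to the key points of every detected human body gives a set of postures, which corresponds to the human body posture information in the text.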

Further, referring to fig. 14, fig. 14 is a schematic diagram of still another optional flow of the human body posture recognition method provided by the embodiment of the present invention. As shown in fig. 14, based on fig. 3, S105 is followed by S115; that is, after the human body posture recognition device determines the human body posture information and completes the human body posture recognition, the human body posture recognition method further includes S115: determining an individual motion trajectory according to the human body posture information, where the individual motion trajectory is used to determine decision information.

It should be noted that, when the image to be recognized is at least one image, the human body posture recognition device can obtain the motion trajectory of each piece of human body information, that is, the individual motion trajectory, after obtaining the human body posture information. Here, the individual motion trajectory is used to determine decision information; for example, the orientation of the face of a target user standing beside a supermarket shelf is obtained from the human body posture information, so as to determine the goods the target user is paying attention to. In addition, after the human body posture recognition device obtains the individual motion trajectory, the individual motion trajectory can be displayed through a display device.
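One simple way to derive an individual motion trajectory from per-image posture information is sketched below. It assumes that the same person carries the same identifier in every image (the source does not describe how identities are associated across images) and uses the mean of a person's key points as the tracked position; both choices, and the name `motion_trajectory`, are assumptions of this sketch.

```python
def motion_trajectory(poses_per_frame, person_id):
    """Collect one reference point per image for a given person.

    poses_per_frame: list with one {person_id: {type: (x, y)}} dict per image.
    Returns a list of (frame_index, x, y); images without the person are skipped.
    """
    trajectory = []
    for frame_index, poses in enumerate(poses_per_frame):
        keypoints = poses.get(person_id)
        if not keypoints:
            continue
        xs = [x for x, _ in keypoints.values()]
        ys = [y for _, y in keypoints.values()]
        # Assumption: the mean of all key points represents the position.
        trajectory.append((frame_index, sum(xs) / len(xs), sum(ys) / len(ys)))
    return trajectory
```

A downstream component could then derive decision information from the returned sequence, for example dwell time in front of a shelf.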

Referring to fig. 15, fig. 15 is a schematic view of an exemplary flow of human body posture recognition provided by an embodiment of the present invention. As shown in fig. 15, on the one hand, a multi-scale feature extraction model 15-2 performs feature extraction on an image to be recognized 15-1 at three scales, a multi-scale key point output model 15-3 then outputs a plurality of heat maps at each scale, and a multi-scale fusion module 15-4 fuses the key points on the heat maps of all scales to obtain target key points. On the other hand, the human body detection module 15-5 detects the human bodies in the image to be recognized 15-1 to obtain two pieces of human body information. The human body posture recognition device then allocates the target key points to the two pieces of human body information and screens the key points for each piece of human body information, so that only one key point is retained for each key point type of the human body posture key points in each piece of human body information; the human body posture information 15-6 is thus obtained.
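The two branches of fig. 15 and their merge can be summarized in a short orchestration sketch. Every stage is passed in as a callable, because the source does not define programming interfaces for the models and modules 15-2 to 15-5; the function name and all parameter names are hypothetical.

```python
def recognize_poses(image, extract_features, output_keypoints, fuse,
                    detect_humans, assign):
    """Wire the stages of fig. 15 together.

    extract_features(image) -> list of features, one per scale (15-2)
    output_keypoints(feature) -> key points for one scale (15-3)
    fuse(keypoints_per_scale) -> target key points (15-4)
    detect_humans(image) -> human body information (15-5)
    assign(target_keypoints, humans) -> human body posture information (15-6)
    """
    # Branch 1: multi-scale key point recognition followed by fusion.
    features_per_scale = extract_features(image)
    keypoints_per_scale = [output_keypoints(f) for f in features_per_scale]
    target_keypoints = fuse(keypoints_per_scale)
    # Branch 2: human body detection on the same image.
    humans = detect_humans(image)
    # Merge: allocate and screen the target key points per human body.
    return assign(target_keypoints, humans)
```

The two branches do not depend on each other, so in a real system they could run in parallel before the allocation step.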

It should be noted that human body posture recognition is a research topic in computer vision and plays a fundamental role in related areas of computer vision such as behavior recognition, person tracking and gait recognition. Its applications mainly include intelligent video monitoring, patient monitoring systems, human-computer interaction, virtual reality, human body animation, smart home, intelligent security, auxiliary training for athletes, and the like.

It can be understood that, through human body posture recognition, the embodiment of the present invention can, on the one hand, realize motion recognition, that is, tracking the change of a human body posture over a period of time. This can be applied to motion, gesture and gait recognition, for example, to detect whether a person has fallen or is ill, to teach fitness, sports or dance automatically, to understand whole-body language (such as airport runway signals and traffic police signals), or to enhance security and monitoring. On the other hand, motion capture and augmented reality can be realized; that is, graphics, styles, special effects, equipment, artistic modeling and the like are overlaid on the detected human body posture information, and by tracking the change of the human body posture (its motion trajectory), the rendered graphics blend naturally with the person as the person moves. In yet another aspect, robot training may also be implemented.

It should be noted that the human body posture recognition method provided by the embodiment of the present invention may be applied to other living bodies to obtain posture information of other living bodies.

In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.

Referring to fig. 16, fig. 16 is a first exemplary schematic diagram of a human body posture application provided by the embodiment of the present invention. As shown in fig. 16, the individual motion trajectory of the target user (human body) 16-1 on the street 16-2 consists of 16-31, 16-32, 16-33, 16-34, 16-35 and 16-36. The display area 16-4 displays basic information of the target user 16-1, the display area 16-5 displays detailed information of the individual motion trajectory of the target user 16-1, and the display area 16-6 displays the individual motion trajectory of the target user 16-1 on the street 16-2; when the play track button in the display area 16-6 is triggered, the individual motion trajectory of the target user 16-1 is played. Therefore, for example, by tracking human bodies and analyzing the trajectories of customers after they enter a shopping mall, the customers' actual shopping interests and purchasing behavior can be analyzed and reconstructed, which supports subsequent customized end-to-end services and advertisement pushing, improves the customers' shopping experience, and further increases revenue.

Referring to fig. 17, fig. 17 is a second exemplary schematic diagram of a human body posture application provided in the embodiment of the present invention. As shown in fig. 17, an individual motion trajectory may also assist tracking in which a human head is bound to the corresponding human body: target user (human body) 17-1, target user 17-2 and target user 17-3 are leaving the elevator, and target user 17-4 and target user 17-5 are entering the elevator. Therefore, rich and accurate human body motion trajectories can be obtained through human body posture recognition.

Continuing with the exemplary structure in which the human body posture recognition device 255 provided by the embodiments of the present invention is implemented as software modules, in some embodiments, as shown in fig. 2, the software modules of the human body posture recognition device 255 stored in the memory 250 may include:

an image obtaining module 2551, configured to obtain an image to be identified; the image to be recognized comprises imaging information of a human body;

a human body detection module 2552, configured to perform human body detection on the image to be identified, so as to obtain at least one piece of human body information;

a key point identification module 2553, configured to perform multi-scale identification on the human body posture key points in the image to be identified, so as to obtain multi-scale key points;

a keypoint fusion module 2554, configured to perform multi-scale fusion on the multi-scale keypoints to obtain target keypoints;

a key point distribution module 2555, configured to distribute the target key points to each piece of human body information, to obtain human body posture information, and complete human body posture recognition; the human body posture information is a set formed by the postures of the human body information.

Further, the keypoint identification module 2553 is further configured to perform multi-scale feature extraction on the image to be identified to obtain a target multi-scale feature; and identifying the human body posture key points of the target multi-scale features to obtain the multi-scale key points.

Further, the key point identification module 2553 is further configured to perform multi-scale identification on the human posture key points in the image to be identified by using a preset key point identification model to obtain the multi-scale key points; the preset key point identification model is used for identifying the human body posture key points in the image to be identified from at least two scales.

Further, the keypoint fusion module 2554 is further configured to obtain a local maximum of the multi-scale keypoints, so as to obtain a predicted multi-scale keypoint; based on the multi-scale key points, performing linear combination processing on the predicted multi-scale key points to obtain initial multi-scale key points; and performing multi-scale fusion processing on the initial multi-scale key points to obtain the target key points.

Further, the keypoint fusion module 2554 is further configured to obtain, from the multi-scale keypoints, a preset number of adjacent keypoints with the smallest distance from the predicted multi-scale keypoints; and to linearly combine the predicted multi-scale keypoints with the adjacent keypoints to obtain the initial multi-scale keypoints.
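The local-maximum and linear-combination steps can be sketched on a single heat map as follows. The source does not give the combination weights, so this sketch assumes they are proportional to the heat map scores; the 8-neighbour definition of a local maximum, the `min_score` cut-off and the function names are likewise assumptions.

```python
def local_maxima(heatmap, min_score=0.1):
    """Return (row, col) of pixels strictly larger than all their neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < min_score:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def refine_peak(heatmap, peak, num_adjacent=4):
    """Linearly combine a peak with its `num_adjacent` nearest pixels.

    Assumption: weights are the heat map scores, normalized to sum to 1.
    Returns a sub-pixel (row, col) position.
    """
    r0, c0 = peak
    others = [
        (r, c)
        for r in range(len(heatmap))
        for c in range(len(heatmap[0]))
        if (r, c) != peak
    ]
    # Nearest pixels first; ties keep row-major order (stable sort).
    others.sort(key=lambda p: (p[0] - r0) ** 2 + (p[1] - c0) ** 2)
    chosen = [peak] + others[:num_adjacent]
    total = sum(heatmap[r][c] for r, c in chosen)
    row = sum(r * heatmap[r][c] for r, c in chosen) / total
    col = sum(c * heatmap[r][c] for r, c in chosen) / total
    return row, col
```

With score-proportional weights the refined position is pulled towards the neighbouring pixels that have higher responses, which gives sub-pixel key point coordinates.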

Further, the initial multi-scale key points comprise at least one multi-scale target key point, and each multi-scale target key point is composed of a set formed by key points corresponding to each scale under at least two scales; the keypoint fusion module 2554 is further configured to obtain a target center point of each multi-scale target keypoint in the initial multi-scale keypoints; calculating the distance from the key point corresponding to each scale in each multi-scale target key point to the target center point to obtain a target distance; and taking the average value of the key points with the target distance smaller than a preset distance threshold value as a sub-target key point corresponding to each multi-scale target key point, so as to obtain the target key point corresponding to the initial multi-scale key point.
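The fusion rule for one multi-scale target key point can be sketched as follows. The source does not say how the target center point is computed; this sketch assumes it is the mean of the per-scale key points, and it falls back to that center when no key point lies within the threshold. The name `fuse_scales` and both choices are assumptions.

```python
import math


def fuse_scales(points, distance_threshold):
    """Fuse the key points found at different scales into one key point.

    points: list of (x, y), one per scale, already in image coordinates.
    distance_threshold: stands in for the preset distance threshold.
    """
    # Assumption: the target center point is the mean of all scales.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Keep only key points whose target distance is below the threshold.
    kept = [
        (x, y) for x, y in points
        if math.hypot(x - cx, y - cy) < distance_threshold
    ]
    if not kept:
        return (cx, cy)  # fallback not described in the source
    return (
        sum(x for x, _ in kept) / len(kept),
        sum(y for _, y in kept) / len(kept),
    )
```

For example, with points `[(10, 10), (12, 10), (40, 10)]` and a threshold of 15, the outlier at x = 40 is discarded and the fused key point is the mean of the remaining two.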

Further, the key point allocating module 2555 is further configured to determine, from the target key points, individual key points belonging to each piece of human body information; acquiring individual type key points corresponding to each key point type in the human body posture key points from the individual key points; acquiring target individual type key points from the individual type key points to obtain individual posture key points corresponding to each piece of human body information; and obtaining the individual posture corresponding to each piece of human body information according to the individual posture key points, thereby obtaining the human body posture information corresponding to the at least one piece of human body information.
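The allocation and screening performed by module 2555 can be sketched as follows. It assumes that each piece of human body information is a bounding box `(x_min, y_min, x_max, y_max)` and that the retained key point of each type is the one with the highest score; the source does not specify either the representation or the screening criterion, and the name `assign_keypoints` is hypothetical.

```python
def assign_keypoints(target_keypoints, boxes):
    """Allocate key points to human bodies and keep one per key point type.

    target_keypoints: list of (type, x, y, score).
    boxes: list of (x_min, y_min, x_max, y_max), one per human body.
    Returns one {type: (x, y, score)} dict per box.
    """
    poses = []
    for x_min, y_min, x_max, y_max in boxes:
        best = {}
        for kp_type, x, y, score in target_keypoints:
            # Individual key points: those inside this body's box.
            if not (x_min <= x <= x_max and y_min <= y <= y_max):
                continue
            # Screening (assumption): keep the highest-scoring key point.
            if kp_type not in best or score > best[kp_type][2]:
                best[kp_type] = (x, y, score)
        poses.append(best)
    return poses
```

When boxes overlap, a key point can fall inside more than one box under this simple rule; a practical system would need an additional tie-breaking criterion.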

Further, the human body posture recognition device 255 further includes a training module 2556, configured to obtain a target sample image; the target sample image is used for training to obtain the preset key point identification model; building an original key point identification model; the original key point identification model is a multi-scale hierarchical structure model; and training the original key point identification model by using the target sample image to obtain the preset key point identification model.

Further, the training module 2556 is further configured to obtain a human body part sample image and a sample image to be identified; identifying the human body part of the human body part sample image to obtain a human body part identification result; extracting a human body part from the human body part sample image based on the human body part identification result to obtain a human body part database; and fusing the human body part in the human body part database with the corresponding human body part in the sample image to be identified to obtain the target sample image.

Further, the training module 2556 is further configured to build an original human body detection model; the original human body detection model is an initial model used for identifying each human body information in the image to be identified; training the original human body detection model by using the target sample image to obtain a human body detection model; the human body detection model is used for identifying each human body information in the image to be identified.

Correspondingly, the human body detection module 2552 is further configured to perform human body detection on the image to be recognized by using the human body detection model, so as to obtain the at least one piece of human body information.

Further, the human body posture identifying device 255 further includes an identifying module 2557, configured to perform multi-scale identification on the human body posture key point of each piece of the at least one piece of human body information by using the preset key point identifying model to obtain an individual posture key point, so as to obtain at least one individual posture key point corresponding to the at least one piece of human body information; and determining the human body posture information according to the at least one individual posture key point to finish human body posture identification.

Further, the human body posture recognition device 255 further includes an application module 2558, configured to determine an individual motion trajectory according to the human body posture information; the individual motion trajectories are used to determine decision information.

Embodiments of the present invention provide a storage medium storing executable instructions, which when executed by a processor, will cause the processor to execute a human body posture recognition method provided by embodiments of the present invention, for example, the human body posture recognition method shown in fig. 3.

In some embodiments, the storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

In summary, according to the embodiments of the present invention, the target key points for determining the posture of each human body information in the image to be recognized are obtained by fusing key points of at least two scales of the image to be recognized; therefore, the accuracy of the target key points is high, so that when the human body posture recognition is realized according to the target key points, the accuracy of the recognition result is high, and the accuracy of the human body posture recognition is improved.

The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims

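1. A human body posture recognition method is characterized by comprising the following steps:

acquiring an image to be recognized; the image to be recognized comprises imaging information of a human body;

performing human body detection on the image to be recognized to obtain at least one piece of human body information;

performing multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points;

performing multi-scale fusion on the multi-scale key points to obtain target key points;

and distributing the target key points to each piece of human body information to obtain human body posture information and complete human body posture recognition; the human body posture information is a set formed by the postures of each piece of human body information.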

2. The method according to claim 1, wherein the performing multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points comprises:

performing multi-scale feature extraction on the image to be identified to obtain target multi-scale features;

and identifying the human body posture key points of the target multi-scale features to obtain the multi-scale key points.

3. The method according to claim 1 or 2, wherein the performing multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points comprises:

performing multi-scale recognition on the human body posture key points in the image to be recognized by using a preset key point recognition model to obtain the multi-scale key points;

the preset key point identification model is used for identifying the human body posture key points in the image to be identified from at least two scales.

4. The method according to claim 1 or 2, wherein the performing multi-scale fusion on the multi-scale key points to obtain target key points comprises:

obtaining a local maximum value of the multi-scale key point to obtain a predicted multi-scale key point;

based on the multi-scale key points, performing linear combination processing on the predicted multi-scale key points to obtain initial multi-scale key points;

and performing multi-scale fusion processing on the initial multi-scale key points to obtain the target key points.

5. The method of claim 4, wherein the performing linear combination processing on the predicted multi-scale keypoints based on the multi-scale keypoints to obtain initial multi-scale keypoints comprises:

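acquiring a preset number of adjacent key points with the minimum distance from the predicted multi-scale key points from the multi-scale key points;

and performing linear combination on the predicted multi-scale key points and the adjacent key points to obtain the initial multi-scale key points.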

6. The method of claim 4, wherein the initial multi-scale keypoints comprise at least one multi-scale target keypoint, each multi-scale target keypoint consisting of a set of keypoints corresponding to each scale of at least two scales; the performing multi-scale fusion processing on the initial multi-scale key points to obtain the target key points comprises:

acquiring a target central point of each multi-scale target key point in the initial multi-scale key points;

calculating the distance from the key point corresponding to each scale in each multi-scale target key point to the target center point to obtain a target distance;

and taking the average value of the key points with the target distance smaller than a preset distance threshold value as a sub-target key point corresponding to each multi-scale target key point, so as to obtain the target key point corresponding to the initial multi-scale key point.

7. The method according to claim 1 or 2, wherein the assigning the target key point to each human body information, obtaining human body posture information, comprises:

determining individual key points belonging to each piece of human body information from the target key points;

acquiring individual type key points corresponding to each key point type in the human body posture key points from the individual key points;

acquiring target individual type key points from the individual type key points to obtain individual posture key points corresponding to each piece of human body information;

and obtaining the individual posture corresponding to each piece of human body information according to the individual posture key points, thereby obtaining the human body posture information corresponding to the at least one piece of human body information.

8. The method according to claim 3, wherein before the human body detection is performed on the image to be recognized to obtain at least one piece of human body information, the method further comprises:

acquiring a target sample image; the target sample image is used for training to obtain the preset key point identification model;

building an original key point identification model; the original key point identification model is a multi-scale hierarchical structure model;

and training the original key point identification model by using the target sample image to obtain the preset key point identification model.

9. The method of claim 8, wherein the obtaining a target sample image comprises:

acquiring a human body part sample image and a sample image to be identified;

identifying the human body part of the human body part sample image to obtain a human body part identification result;

extracting a human body part from the human body part sample image based on the human body part identification result to obtain a human body part data base;

and fusing the human body part in the human body part data base with the corresponding human body part in the sample image to be identified to obtain the target sample image.

10. The method of claim 7, wherein after the obtaining the target sample image, the method further comprises:

building an original human body detection model; the original human body detection model is an initial model used for identifying each human body information in the image to be identified;

training the original human body detection model by using the target sample image to obtain a human body detection model; the human body detection model is used for identifying each human body information in the image to be identified;

correspondingly, the human body detection is performed on the image to be recognized to obtain at least one piece of human body information, and the method comprises the following steps:

and carrying out human body detection on the image to be recognized by utilizing the human body detection model to obtain the at least one piece of human body information.

11. The method according to claim 3, wherein after the human body detection is performed on the image to be recognized to obtain at least one piece of human body information, the method further comprises:

performing multi-scale recognition on the human posture key points of each piece of human body information by using the preset key point recognition model to obtain individual posture key points, so as to obtain at least one individual posture key point corresponding to the at least one piece of human body information;

and determining the human body posture information according to the at least one individual posture key point to finish human body posture identification.

12. The method according to claim 1 or 11, wherein after obtaining the body posture information and completing body posture recognition, the method further comprises:

determining an individual motion track according to the human body posture information; the individual motion trajectories are used to determine decision information.

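13. A human body posture identifying device, comprising:

an image acquisition module, configured to acquire an image to be recognized; the image to be recognized comprises imaging information of a human body;

a human body detection module, configured to perform human body detection on the image to be recognized to obtain at least one piece of human body information;

a key point identification module, configured to perform multi-scale recognition on the human body posture key points in the image to be recognized to obtain multi-scale key points;

a key point fusion module, configured to perform multi-scale fusion on the multi-scale key points to obtain target key points;

and a key point distribution module, configured to distribute the target key points to each piece of human body information to obtain human body posture information and complete human body posture recognition; the human body posture information is a set formed by the postures of each piece of human body information.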

14. A human body posture recognition apparatus, comprising:

a memory for storing executable instructions;

a processor for implementing the method of any one of claims 1 to 12 when executing executable instructions stored in the memory.

15. A storage medium having stored thereon executable instructions for causing a processor to perform the method of any one of claims 1 to 12 when executed.