CN114077892A

CN114077892A - Human body skeleton sequence extraction and training method, device and storage medium

Info

Publication number: CN114077892A
Application number: CN202010816192.9A
Authority: CN
Inventors: 王凯; 王晴
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2022-02-22

Abstract

The invention discloses a human body skeleton sequence extraction and training method, a device and a storage medium. The training method comprises the following steps: obtaining a training sample set, the training sample set comprising label information and a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension, wherein the label information is the three-dimensional coordinates of each node in a human skeleton sequence of a target object in the time dimension, the first thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the horizontal direction, and the second thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the vertical direction; and training the human skeleton sequence extraction model based on the training sample set to obtain a trained human skeleton sequence extraction model, wherein the human skeleton sequence extraction model comprises a local information coding module, a global information coding module and an attention pooling module. Human skeleton sequences can thus be extracted based on millimeter wave radar signals, with good prediction efficiency and accuracy.

Description

Human body skeleton sequence extraction and training method, device and storage medium

Technical Field

The invention relates to the field of behavior recognition, in particular to a human skeleton sequence extraction and training method, a human skeleton sequence extraction and training device and a storage medium.

Background

With the development of computer vision technology, the behavior of an object in a visual scene can be perceived, so that behavior recognition is realized. However, video-based human skeleton sequence extraction, for example, has the following defects: 1. video is sensitive to changes in illumination and shooting angle, which makes subsequent recognition difficult; 2. video data involves a large amount of calculation and places high demands on computing resources.

Compared with traditional video technology, the millimeter wave radar signal is robust to uncertain factors such as background change, illumination change, viewing angle change and human appearance change, does not intrude on user privacy, and can greatly improve the intelligence level of equipment. In the related art, there are two types of millimeter wave radar: high-resolution millimeter wave radar and low-resolution millimeter wave radar. The high-resolution millimeter wave radar acquires rich information through a large number of transmitting and receiving antennas, and its point cloud imaging reveals the outline and details of the human body directly, offering no privacy protection. Because it uses dozens of antennas, the equipment is bulky, consumes much power and poses a radiation hazard to the human body; it is not suitable for home scenes and is often used for airport and subway security inspection. The low-resolution millimeter wave radar has fewer antennas and obtains limited information, so it offers better privacy protection, is low in cost and power consumption, and is suitable for home scenes. In the related art, however, the low-resolution millimeter wave radar is mostly used for gesture recognition, which makes it difficult to meet the requirements of behavior recognition.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, and a storage medium for extracting and training a human skeleton sequence, and aim to extract a human skeleton sequence based on millimeter wave radar signals of a millimeter wave radar.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides a training method of a human body bone sequence extraction model, which comprises the following steps:

obtaining a training sample set, the training sample set comprising label information and a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension, wherein the label information is the three-dimensional coordinates of each node in a human skeleton sequence of a target object in the time dimension, the first thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the horizontal direction, and the second thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the vertical direction;

training the human body bone sequence extraction model based on the training sample set to obtain a trained human body bone sequence extraction model;

wherein the human skeleton sequence extraction model comprises a local information coding module, a global information coding module and an attention pooling module; the local information coding module is used for learning local information of the millimeter wave radar signals, the global information coding module is used for learning global information of the millimeter wave radar signals based on an enlarged network receptive field, and the attention pooling module is used for pooling the output results of the local information coding module and the output results of the global information coding module.

The embodiment of the invention also provides a human body bone sequence extraction method, which comprises the following steps:

acquiring a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension;

and inputting the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension into the human skeleton sequence extraction model obtained by training with the training method of the human skeleton sequence extraction model in the embodiment of the invention to obtain the three-dimensional coordinates of each node in the human skeleton sequence.

The embodiment of the invention also provides a training device for the human body bone sequence extraction model, which comprises:

a first obtaining module, configured to obtain a training sample set, the training sample set comprising label information and a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension, wherein the label information is the three-dimensional coordinates of each node in a human skeleton sequence of a target object in the time dimension, the first thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the horizontal direction, and the second thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the vertical direction;

a training module, configured to train the human skeleton sequence extraction model based on the training sample set to obtain a trained human skeleton sequence extraction model.

The embodiment of the invention also provides a human skeleton sequence extraction device, which comprises:

the second acquisition module is used for acquiring a first thermodynamic diagram and a second thermodynamic diagram of the millimeter wave radar signal with a time dimension;

the identification module is used for inputting the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension into the human skeleton sequence extraction model obtained by training of the training device in the embodiment of the invention, and obtaining the three-dimensional coordinates of each node in the human skeleton sequence.

The embodiment of the invention also provides training equipment for the human body bone sequence extraction model, which comprises: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor, when running the computer program, is configured to execute the steps of the training method for the human bone sequence extraction model according to the embodiment of the present invention.

The embodiment of the invention also provides human skeleton sequence extraction equipment, which comprises: a processor and a memory for storing a computer program capable of running on the processor, wherein the processor, when running the computer program, is configured to execute the steps of the human bone sequence extraction method according to the embodiment of the present invention.

The embodiment of the present invention further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of any one of the methods in the embodiment of the present invention are implemented.

According to the technical scheme provided by the embodiment of the invention, the training sample set comprises label information and a first thermodynamic diagram and a second thermodynamic diagram of millimeter wave radar signals with a time dimension, and the human skeleton sequence extraction model comprises a local information coding module, a global information coding module and an attention pooling module. The training sample set is used to train the human skeleton sequence extraction model to obtain the trained human skeleton sequence extraction model, so that the human skeleton sequence can be extracted based on the millimeter wave radar signal. In addition, by introducing the global information coding module and the attention pooling module, the model can address the degradation problem caused by network deepening that arises when a local information coding module alone learns local information of the input millimeter wave radar signal: global information is learned through the global information coding module, and attention-enhanced learning is carried out by the attention pooling module, so that key information can be adaptively strengthened. Therefore, the human skeleton sequence extraction model provided by the embodiment of the invention has good prediction efficiency and accuracy.

Drawings

FIG. 1 is a schematic flow chart of a training method for a human skeleton sequence extraction model according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an experimental system for extracting human skeleton sequences based on millimeter-wave radar signals in an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating a deviation between a millimeter wave radar coordinate system and a coordinate system of a Kinect device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of each node of a human skeleton sequence identified by a Kinect device in the embodiment of the present invention;

FIG. 5 is a schematic flow chart of a training method for a human skeleton sequence extraction model in an application embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a human skeleton sequence extraction model in an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a training apparatus for a human skeleton sequence extraction model according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a human skeleton sequence extraction apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a training device for a human skeleton sequence extraction model according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a human skeleton sequence extraction device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The embodiment of the invention provides a training method of a human body bone sequence extraction model, which comprises the following steps of:

step 101, obtaining a training sample set, the training sample set comprising label information and a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension, wherein the label information is the three-dimensional coordinates of each node in a human skeleton sequence of a target object in the time dimension, the first thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the horizontal direction, and the second thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in the vertical direction;

step 102, training the human skeleton sequence extraction model based on the training sample set to obtain the trained human skeleton sequence extraction model.

In the embodiment of the present invention, the human skeleton sequence extraction model comprises a local information coding module, a global information coding module and an attention pooling module; the local information coding module is used for learning local information of the millimeter wave radar signals, the global information coding module is used for learning global information of the millimeter wave radar signals based on an enlarged network receptive field, and the attention pooling module is used for pooling the output results of the local information coding module and the output results of the global information coding module.

Therefore, the human skeleton sequence extraction model of the embodiment of the invention forms a multi-angle convolutional network. A common local information coding module that only learns local information of an input point cloud data stream suffers from the degradation problem caused by network deepening. In contrast, the model here performs global information learning through the global information coding module and attention-enhanced learning through the attention pooling module, so that key information can be adaptively strengthened and the model has good prediction efficiency and accuracy.

In some embodiments, obtaining the training sample set comprises obtaining the label information of the training sample set. Illustratively, the label information may be obtained by a Kinect device: the Kinect device can output the three-dimensional coordinates of 25 nodes of a human skeleton sequence, and these coordinates can be converted, through geometric coordinate registration, into a coordinate system with the millimeter wave radar as the origin and then used as the label information of the millimeter wave radar signals. It can be understood that the millimeter wave radar signal and the label information are generated for the same target object and have the same time dimension. The label information of the training sample set may also be generated in other manners, for example, by generating the three-dimensional coordinates of each node of the human skeleton sequence based on the millimeter wave radar signal, as long as the three-dimensional coordinates correspond to the millimeter wave radar signal having the time dimension; the generation manner of the label information is not specifically limited in the embodiment of the present invention.

In some embodiments, obtaining the training sample set comprises: obtaining millimeter wave radar signals with a time dimension, namely a millimeter wave radar signal data stream (for example, a data stream with three spatial dimensions plus the time dimension), and converting the data stream into two three-dimensional space-time sequences (one consists of two-dimensional thermodynamic diagrams in the horizontal direction plus the time dimension, and the other consists of two-dimensional thermodynamic diagrams in the vertical direction plus the time dimension). In other examples, two two-dimensional millimeter wave radars may be used to generate millimeter wave radar signals in the horizontal direction and the vertical direction, respectively. That is, in the embodiment of the present invention, the first thermodynamic diagram and the second thermodynamic diagram may be generated by converting millimeter wave radar signals acquired by one three-dimensional millimeter wave radar, or by converting millimeter wave radar signals acquired by two two-dimensional millimeter wave radars, which is not specifically limited in the embodiment of the present invention. In the embodiment of the invention, the thermodynamic diagram (heat map) of the millimeter wave radar signal in the horizontal direction is called the first thermodynamic diagram, the thermodynamic diagram of the millimeter wave radar signal in the vertical direction is called the second thermodynamic diagram, the three-dimensional space-time sequence corresponding to the first thermodynamic diagram is called the first three-dimensional space-time sequence, and the three-dimensional space-time sequence corresponding to the second thermodynamic diagram is called the second three-dimensional space-time sequence.

In some embodiments, the training the human bone sequence extraction model based on the training sample set to obtain a trained human bone sequence extraction model includes:

respectively carrying out information encoding on a first thermodynamic diagram and a second thermodynamic diagram of millimeter wave radar signals with time dimensions in the training sample set on the basis of the local information encoding module, the global information encoding module and the attention pooling module;

decoding and predicting the coding information corresponding to the first thermodynamic diagram and the coding information corresponding to the second thermodynamic diagram based on a fully connected layer to obtain the predicted three-dimensional coordinates of each node in the human skeleton sequence;

determining a network loss value of the human body skeleton sequence extraction model based on the predicted three-dimensional coordinates of each node in the human body skeleton sequence and the three-dimensional coordinates of each node in the label information;

based on the network loss value, adjusting model parameters of the human body skeleton sequence extraction model by adopting a back propagation algorithm;

continuing to adjust the model parameters based on the adjusted human skeleton sequence extraction model and the training sample set;

and, when it is determined that the number of adjustments of the model parameters of the human skeleton sequence extraction model reaches a set number, or that the network loss value is less than or equal to a preset threshold value, obtaining the trained human skeleton sequence extraction model.
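The iterative procedure above can be sketched as a loop with the two stopping conditions. The following is only a toy illustration under stated assumptions, not the model of this embodiment: a one-parameter linear predictor stands in for the network, and the names `train`, `max_steps`, `loss_threshold` and `lr` are hypothetical.

```python
def train(samples, max_steps=1000, loss_threshold=1e-6, lr=0.1):
    """Toy gradient-descent loop with the two stopping rules described above.

    samples: list of (input, label) pairs. The model y = w * x is a stand-in
    (an assumption) for the skeleton sequence extraction network.
    """
    w = 0.0                  # model parameter to be adjusted
    loss = float("inf")
    steps = 0                # number of parameter adjustments so far
    while steps < max_steps:
        # Forward pass and sum-of-squares loss over the sample set.
        loss = sum((w * x - y) ** 2 for x, y in samples)
        if loss <= loss_threshold:
            break            # loss is at or below the preset threshold
        # Gradient of the loss with respect to w (back-propagation for this toy model).
        grad = sum(2.0 * (w * x - y) * x for x, y in samples)
        w -= lr * grad       # adjust the model parameter
        steps += 1
    return w, loss, steps
```

Training stops either because `steps` reaches `max_steps` (the set number of adjustments) or because the loss falls to `loss_threshold` or below, whichever happens first.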

Therefore, the obtained trained human body bone sequence extraction model can support extraction of human body bone sequences of millimeter wave radar signals collected by the millimeter wave radar, and has good prediction efficiency and accuracy.

In some embodiments, the performing information encoding on the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signals with a time dimension in the training sample set, respectively, based on the local information encoding module, the global information encoding module, and the attention pooling module comprises:

processing a first three-dimensional space-time sequence formed by a first thermodynamic diagram of millimeter wave radar signals with time dimensions in the training sample set by the local information coding module and the global information coding module respectively, and pooling an output result of the local information coding module and an output result of the global information coding module based on the attention pooling module to obtain a first pooling result;

processing a second three-dimensional space-time sequence formed by a second thermodynamic diagram of millimeter wave radar signals with time dimensions in the training sample set by the local information coding module and the global information coding module respectively, and pooling an output result of the local information coding module and an output result of the global information coding module based on the attention pooling module to obtain a second pooling result;

correspondingly, the decoding and predicting the coding information corresponding to the first thermodynamic diagram and the coding information corresponding to the second thermodynamic diagram based on a fully connected layer to obtain the predicted three-dimensional coordinates of each node in the human skeleton sequence includes:

decoding and predicting the first pooling result and the second pooling result based on a fully connected layer to obtain the predicted three-dimensional coordinates of each node in the human skeleton sequence.
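As a rough illustration of this decoding step, the sketch below fuses the two pooling results and maps them to one (x, y, z) triple per node with a single fully connected layer. Fusing by concatenation, the single-layer decoder, and the names `fully_connected` and `decode_skeleton` are all assumptions; the text only states that a fully connected layer decodes the two pooling results.

```python
def fully_connected(features, weights, bias):
    """Compute W @ features + bias with plain lists (one output per weight row)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def decode_skeleton(first_pooled, second_pooled, weights, bias, num_joints):
    """Map the two pooled branch features to predicted 3D coordinates.

    first_pooled / second_pooled: pooled features of the horizontal and
    vertical branches. Concatenation is an assumed fusion rule.
    """
    fused = list(first_pooled) + list(second_pooled)
    flat = fully_connected(fused, weights, bias)
    if len(flat) != 3 * num_joints:
        raise ValueError("the layer must output 3 values per joint")
    # Group the flat output into one (x, y, z) tuple per skeleton node.
    return [tuple(flat[3 * k: 3 * k + 3]) for k in range(num_joints)]
```

With 25 nodes, for example, the weight matrix would need 75 rows, one per output coordinate.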

In some embodiments, the local information encoding module is specifically configured to:

and explicitly learning local information for the input first three-dimensional space-time sequence and the input second three-dimensional space-time sequence respectively. By encoding the spatial geometry information at different times, the network can better learn the spatial geometry structure from the relative position and distance information of each point and the optical flow trend.

Illustratively, the local information encoding module may use a resnet block (residual network block) as a base module.
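To make the role of the residual block concrete, the sketch below shows the skip connection that lets a deep stack fall back to the identity mapping, which is how residual networks counter degradation as depth grows. It uses small dense layers in place of convolutions, which is a simplifying assumption; the names `residual_block`, `w1` and `w2` are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]

def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def residual_block(x, w1, w2):
    """Return relu(x + F(x)), where F is two linear layers with a ReLU between.

    Dense layers stand in for the convolutions of a real residual block
    (an assumption made to keep the sketch dependency-free).
    """
    hidden = relu(linear(w1, x))
    residual = linear(w2, hidden)
    # Skip connection: the input is added back before the final activation.
    return relu([a + r for a, r in zip(x, residual)])
```

When the weights of `F` are zero, the block reduces to `relu(x)`, so adding more blocks cannot make the representation worse than passing the input through.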

In some embodiments, the global information encoding module is specifically configured to:

learning the connection relation between skeleton nodes in the millimeter wave radar signal based on the similarity between any two points in the first three-dimensional space-time sequence to obtain first global coding information implying the connection relation;

and learning the connection relation between all skeleton nodes in the millimeter wave radar signal based on the similarity between any two points in the second three-dimensional space-time sequence to obtain second global coding information implying the connection relation.

Here, the global information encoding module is configured to perform global information learning based on an enlarged network receptive field. In the related art, since the convolution operation focuses only on a local receptive field, enlarging the receptive field is generally implemented by stacking convolution layers and pooling layers, but this increases the amount and complexity of calculation and reduces the size of the feature map. A fully connected layer can learn global information, but it brings a large number of parameters, making optimization difficult. In the task of extracting the skeleton sequence, considering the sparsity of the point cloud, the network is expected to learn the connection relation between skeleton nodes; if global information is introduced in the early learning stage, the problem that global information cannot be seen by the convolution operation can be well solved, and richer information is brought to the later layers. In the embodiment of the present invention, a global information encoding module is introduced, and exemplarily, the global information encoding module is defined as follows:

wherein x is the input, y is the output, i is the current point, and j denotes the points other than the current point in the input point cloud; that is, x_i is the input of the current point, x_j is the input of a point other than the current point in the input point cloud, and y_i is the output of the current point. The f function calculates the similarity between two points and can be defined as a variant of a Gaussian function, i.e., a function that calculates the similarity in an embedding space. The g function is a mapping function that maps a point to a vector, which can be understood as the feature of the point.

Illustratively, the g function may be a convolution function with a convolution kernel of 1.

Exemplarily,

m(x_i) = W_m x_i

n(x_j) = W_n x_j

wherein W_m and W_n are the functions used in the similarity encoding process; W_m or W_n may be a linear function or a convolution function, which is not particularly limited in this embodiment of the present invention.
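The symbol definitions above match the general shape of a non-local operation. The sketch below assumes the form y_i = (1/C) Σ_j f(x_i, x_j) g(x_j), with f(x_i, x_j) = exp(m(x_i) · n(x_j)) and C the sum of the similarities; the normalisation by C, the exponential form of f, and the names `global_encode`, `w_m`, `w_n` and `w_g` are assumptions rather than the formula of this embodiment. Following the text, j ranges over the points other than the current point.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def global_encode(points, w_m, w_n, w_g):
    """Assumed non-local operation over a list of point feature vectors."""
    m = [matvec(w_m, x) for x in points]  # embedding m(x_i) = W_m x_i
    n = [matvec(w_n, x) for x in points]  # embedding n(x_j) = W_n x_j
    g = [matvec(w_g, x) for x in points]  # point feature g(x_j)
    outputs = []
    for i in range(len(points)):
        others = [j for j in range(len(points)) if j != i]
        y = [0.0] * len(g[i])
        if others:
            dots = [sum(a * b for a, b in zip(m[i], n[j])) for j in others]
            peak = max(dots)  # subtract the maximum for numerical stability
            sims = [math.exp(value - peak) for value in dots]
            total = sum(sims)  # normalisation factor C (assumed)
            for sim, j in zip(sims, others):
                for k in range(len(y)):
                    y[k] += (sim / total) * g[j][k]
        outputs.append(y)
    return outputs
```

Because every output mixes features from all other points, one such layer gives each point a receptive field covering the whole input, without stacking convolution and pooling layers.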

In some embodiments, the attention pooling module is specifically configured to:

performing attention-enhanced learning over a neighborhood point set on first local coding information, output by the local information coding module based on the first three-dimensional space-time sequence, and first global coding information, output by the global information coding module based on the first three-dimensional space-time sequence, to obtain a first pooling result;

and performing attention-enhanced learning over a neighborhood point set on second local coding information, output by the local information coding module based on the second three-dimensional space-time sequence, and second global coding information, output by the global information coding module based on the second three-dimensional space-time sequence, to obtain a second pooling result.

Here, the attention pooling module is used to aggregate the features of the neighborhood point set of each unit. In the related art, heuristics such as max or mean pooling are often used to aggregate neighborhood points in a hard manner, which may cause much useful information to be lost. In the embodiment of the invention, the attention pooling module automatically learns and aggregates the useful information in the neighborhood point set through attention-enhanced learning. Assume that the neighborhood point set of each window in the pooling process is defined as f_i, the neighborhood point set being formed by the points 1 to k, { X1, … Xk }; an attention score is calculated for each point in the neighborhood point set by the following formula:

s_i＝a(f_i,Q)

where Q is the input parameter of the pooling, the a function is a shared function used to calculate the attention score of each point, and s_i is the attention score of each point. The attention score can thus be used as a soft indicator that automatically selects key features, and the final feature is a weighted sum over the neighborhood point set, defined as follows:

wherein the result is the feature output of the neighborhood point set, and Σ_k(f_i, s_i) is the weighted sum over the neighborhood point set using the attention scores.
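The attention pooling described above can be sketched as follows. The sketch assumes that the shared function a is a linear map by Q followed by a softmax over the k neighbors, computed per feature channel, and that the output is the element-wise weighted sum of the neighbor features; these choices and the name `attention_pool` are assumptions, since the text does not fix the form of a.

```python
import math

def attention_pool(neighbors, q):
    """Aggregate k neighbor feature vectors into one vector by learned weights.

    neighbors: list of k feature vectors f_1..f_k, each of length d.
    q: d x d shared weight matrix (the pooling parameter Q; its shape is assumed).
    """
    k, d = len(neighbors), len(neighbors[0])
    # Raw score vector for each neighbor: a(f_i, Q) before normalisation.
    raw = [[sum(q[r][c] * f[c] for c in range(d)) for r in range(d)]
           for f in neighbors]
    pooled = []
    for r in range(d):
        column = [raw[i][r] for i in range(k)]
        peak = max(column)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        # Attention scores s_i = exps[i] / total; output is the weighted sum.
        pooled.append(sum(neighbors[i][r] * exps[i] / total for i in range(k)))
    return pooled
```

With Q set to zero all scores are equal and the result is mean pooling; with large weights the result approaches max pooling, so the two hard heuristics appear as limiting cases of the learned soft selection.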

In some embodiments, the determining a network loss value of the human bone sequence extraction model based on the predicted three-dimensional coordinates of each node in the human bone sequence and the three-dimensional coordinates of each node in the tag information includes:

and counting the sum of squares of coordinate differences of the predicted three-dimensional coordinates of each node in the human body skeleton sequence and the corresponding three-dimensional coordinates in the label information, and determining the network loss value based on the sum of squares.

Illustratively, the loss function is defined as follows:

wherein c is the number of skeleton nodes in the human skeleton sequence; x_i, y_i, z_i are the coordinate values of the predicted three-dimensional coordinates of skeleton node i; and x'_i, y'_i, z'_i are the coordinate values of the three-dimensional coordinates of skeleton node i in the label information. For example, the value of c may be 17 or 25, and the value of c is not specifically limited in the embodiment of the present invention.
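A minimal sketch of a loss consistent with this description is given below. It assumes a plain sum of squared coordinate differences over the c nodes, with no averaging or square root, which may differ from the exact formula of this embodiment; the name `skeleton_loss` is hypothetical.

```python
def skeleton_loss(predicted, labels):
    """Sum of squared coordinate differences over all skeleton nodes.

    predicted: list of c predicted (x, y, z) tuples.
    labels: list of c label (x', y', z') tuples in the same node order.
    """
    if len(predicted) != len(labels):
        raise ValueError("predicted and label node counts must match")
    return sum((px - lx) ** 2 + (py - ly) ** 2 + (pz - lz) ** 2
               for (px, py, pz), (lx, ly, lz) in zip(predicted, labels))
```

For a sequence with a time dimension, the same quantity would be accumulated over frames.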

The present invention will be described in further detail with reference to the following application examples.

As shown in fig. 2, in this application embodiment, an experimental system for extracting a human bone sequence based on a millimeter wave radar signal includes: the system comprises a teacher network and a student network, wherein the teacher network is used for synchronously acquiring a real value of a human skeleton sequence based on Kinect equipment, so that label information is provided for the student network of millimeter wave radar signals.

During the experiment, the Kinect device is set up so that its lens plane is parallel to the sensor plane of the millimeter wave radar. A commercial millimeter wave radar from a common vendor such as TI, NXP, content and the like may be used; the antenna configuration is required to be 3 transmit and 4 receive or more, and it is recommended to use the static clutter removal algorithm built into the millimeter wave radar system so that only the information of the moving human body is retained. The 3D point cloud output by the millimeter wave radar is acquired as the point cloud to be identified and is processed to generate a horizontal thermodynamic diagram and a vertical thermodynamic diagram, which serve as the input of the student network. The Kinect device can output the three-dimensional coordinate data of 25 nodes of a human skeleton sequence; this data is converted, through geometric coordinate registration, into a coordinate system with the millimeter wave radar as the origin and can be used as the label information of the 3D point cloud.

The foregoing geometric registration is exemplified below:

Exemplarily, the geometric coordinate registration needs to be modeled. Firstly, the angular deviation between the coordinate system of the Kinect device and the coordinate system of the millimeter wave radar in the three dimensions x, y and z is small and can be ignored. Secondly, taking the xy plane as an example, the geometric registration problem can be abstracted as the offset between the millimeter wave radar coordinate system and the coordinate system of the Kinect device, as shown in FIG. 3.

As shown in FIG. 3, O(0,0) is the origin of the millimeter wave radar coordinate system, and A(x'_o, y'_o) is the coordinate of the origin of the coordinate system of the Kinect device in the millimeter wave radar coordinate system. For the same target point M, its coordinates in the millimeter wave radar coordinate system and in the coordinate system of the Kinect device are (x₁, y₁) and (x'₁, y'₁), respectively. The coordinates (x'_i, y'_i) of a target point in the coordinate system of the Kinect device are converted to coordinates in the millimeter wave radar coordinate system by a coordinate conversion formula whose parameters are Δx and Δy,

Wherein n is a representative point in the human skeleton sequence recognized by the Kinect device.

Illustratively, based on each node of the human skeleton sequence identified by the Kinect device, as shown in fig. 4, the embodiment of the present invention proposes 5 representative point schemes, as shown in table 1 below.

TABLE 1

Based on the error distance and angle of the corrected representative points over about 6000 samples, the scheme "millimeter wave radar: strongest signal reflection point; Kinect: upper-body mean value, without counting the spine midpoint" is selected. The geometric registration parameters, that is, the parameters Δx and Δy in the coordinate conversion formula, are then determined based on the selected representative points.
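The registration described above can be illustrated with a minimal Python sketch. It assumes, as stated above, that rotation between the two coordinate systems is negligible, so that the conversion reduces to a translation by Δx and Δy. Estimating the offsets as the mean difference between paired representative points is an assumption of this sketch, and the function names `estimate_offsets` and `kinect_to_radar` are hypothetical.

```python
from statistics import mean


def estimate_offsets(radar_points, kinect_points):
    """Estimate (dx, dy) from paired representative points.

    radar_points:  (x, y) strongest-reflection points in the radar frame.
    kinect_points: (x', y') representative points in the Kinect frame.
    Averaging the per-sample differences is an assumption of this sketch.
    """
    if len(radar_points) != len(kinect_points) or not radar_points:
        raise ValueError("need equally sized, non-empty point lists")
    dx = mean(r[0] - k[0] for r, k in zip(radar_points, kinect_points))
    dy = mean(r[1] - k[1] for r, k in zip(radar_points, kinect_points))
    return dx, dy


def kinect_to_radar(points, dx, dy):
    """Translate Kinect-frame (x', y') points into the radar frame (rotation ignored)."""
    return [(x + dx, y + dy) for x, y in points]


# Two paired samples of representative points (illustrative values only).
radar_rep = [(0.5, 2.0), (0.7, 2.1)]
kinect_rep = [(0.1, 1.8), (0.3, 1.9)]
dx, dy = estimate_offsets(radar_rep, kinect_rep)
registered = kinect_to_radar([(0.0, 0.0), (1.0, 1.0)], dx, dy)
```

The same translation would be applied to all 25 Kinect nodes of a frame to obtain tag information in the radar coordinate system.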

As shown in fig. 5, in an embodiment of the present invention, a training method for a human skeleton sequence extraction model includes the following steps:

Step 501, acquiring a training data set;

4D point cloud data of a space-time sequence (3 spatial dimensions + time dimension) acquired by the millimeter wave radar, and the above-described tag information converted by geometric coordinate registration, are acquired.

The 4D point cloud data are converted into two sets of 3D data (one is the 2D horizontal thermodynamic diagram + time dimension, the other is the 2D vertical thermodynamic diagram + time dimension). The specific conversion method is as follows:

Converting the 3D point cloud into 2D thermodynamic diagrams: the position of the human body is determined based on the strongest reflection point of the millimeter wave radar, and the starting points of the horizontal thermodynamic diagram and the vertical thermodynamic diagram and the coordinates of each node are then determined by taking lengths a and b as the step lengths in the horizontal direction and the vertical direction, respectively. For the point cloud received by the millimeter wave radar in a single reception, the nearest node of each point in the horizontal and vertical diagrams is determined according to the 3D coordinates of the point, and the reflection intensity value of the point is assigned to that nearest node.

Thermodynamic diagram normalization: for each diagram, all nodes are normalized by taking the maximum value as the denominator:

$$p'_i = \frac{p_i}{p_{\max}}$$

where p'_i is the normalized value of a node in the thermodynamic diagram, p_i is the current value of the node, and p_max is the maximum value of the nodes in the thermodynamic diagram.
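The conversion and normalization described above can be sketched in Python as follows. The sketch rests on several assumptions that the description does not fix: the grid is centred on the strongest reflection point, the horizontal diagram uses the (x, y) plane and the vertical diagram uses the (x, z) plane, step a applies to the x and y axes and step b to the z axis, and when several points fall on the same node the larger intensity is kept. The function names are hypothetical.

```python
def point_cloud_to_heatmaps(points, height, width, a, b):
    """Rasterize one radar frame into a horizontal and a vertical diagram.

    points: list of (x, y, z, intensity) tuples for a single frame.
    Returns two height x width nested lists (horizontal, vertical).
    """
    # Locate the body at the strongest reflection point (grid centre, assumed).
    cx, cy, cz, _ = max(points, key=lambda p: p[3])
    horizontal = [[0.0] * width for _ in range(height)]
    vertical = [[0.0] * width for _ in range(height)]
    for x, y, z, intensity in points:
        # Nearest grid node along each axis, using steps a (x, y) and b (z).
        col = round((x - cx) / a) + width // 2
        row_h = round((y - cy) / a) + height // 2
        row_v = round((z - cz) / b) + height // 2
        for grid, row in ((horizontal, row_h), (vertical, row_v)):
            if 0 <= row < height and 0 <= col < width:
                grid[row][col] = max(grid[row][col], intensity)
    return horizontal, vertical


def normalize(grid):
    """Divide every node by the maximum value of the diagram (p'_i = p_i / p_max)."""
    p_max = max(max(row) for row in grid)
    if p_max <= 0:
        return [row[:] for row in grid]  # empty diagram: nothing to scale
    return [[p / p_max for p in row] for row in grid]


frame = [(0.0, 0.0, 0.0, 10.0), (0.5, 0.0, 1.0, 5.0)]
h_map, v_map = point_cloud_to_heatmaps(frame, height=5, width=5, a=0.5, b=0.5)
h_norm, v_norm = normalize(h_map), normalize(v_map)
```

Stacking the normalized diagrams of T consecutive frames would give the two H × W × T inputs used for training.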

Step 502, training the human skeleton sequence extraction model based on the training sample set;

The human skeleton sequence extraction model comprises: a local information coding module, a global information coding module and an attention pooling module. For the specific training process, reference may be made to the foregoing description, which is not repeated here.

Step 503, calculating a network loss value of the skeleton sequence extraction model based on the predicted 3D coordinates of the nodes and the corresponding real coordinates;

Step 504, updating the model parameters of the skeleton sequence extraction model by a back propagation algorithm based on the network loss value obtained in the current training;

Step 505, determining whether the number of adjustments of the model parameters has reached a set number, or whether the network loss value is less than or equal to a preset threshold; if so, executing step 506; if not, randomly selecting a batch of training samples from the training sample set and repeating steps 502 to 505;

Step 506, obtaining the trained human skeleton sequence extraction model.
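The control flow of steps 502 to 506 can be sketched as follows. This is only an illustration of the loop and its two stopping conditions: the "model" is a toy stand-in (one learnable 3D offset per node) rather than the network of the embodiment, the mean squared error is an assumed loss, and the gradient is written out analytically instead of being obtained by back propagation through a network. All names and default values are hypothetical.

```python
import random


def mse_loss(pred, target):
    """Mean squared error over all nodes and axes (assumed loss for this sketch)."""
    n = len(pred) * 3
    return sum((p - t) ** 2
               for pj, tj in zip(pred, target)
               for p, t in zip(pj, tj)) / n


def train(samples, num_joints, lr=0.1, max_updates=1000,
          loss_threshold=1e-6, batch_size=4, seed=0):
    """samples: list of labels, each a list of num_joints [x, y, z] coordinates."""
    rng = random.Random(seed)
    # Toy parameters standing in for the real network weights.
    params = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    updates = 0
    while True:
        # Randomly select a batch of training samples.
        batch = rng.sample(samples, min(batch_size, len(samples)))
        # Steps 502 and 503: predict and compute the network loss value.
        loss = sum(mse_loss(params, label) for label in batch) / len(batch)
        # Step 504: update parameters along the negative gradient of the loss.
        scale = len(batch) * num_joints * 3
        for j in range(num_joints):
            for k in range(3):
                grad = sum(2.0 * (params[j][k] - label[j][k]) for label in batch) / scale
                params[j][k] -= lr * grad
        updates += 1
        # Step 505: stop on the update budget or on a small enough loss.
        if updates >= max_updates or loss <= loss_threshold:
            break
    # Step 506: the trained parameters.
    return params, loss, updates


labels = [[[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]] for _ in range(8)]
trained, final_loss, n_updates = train(labels, num_joints=2)
```

In a real implementation the forward pass, loss and parameter update would be delegated to a deep learning framework; only the loop structure and the stopping test of step 505 carry over.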

The trained human body bone sequence extraction model provided by the embodiment of the invention supports extracting the human body bone sequence from the millimeter wave radar signal.

In the embodiment of the invention, the global information coding module addresses the problems caused by the sparsity and discontinuity of the low-resolution millimeter wave point cloud signal. Because the low-resolution millimeter wave point cloud signal is sparse and irregular, and the correspondence between points across frames is not continuous, the network is expected to learn the connection relationship between skeleton nodes.

In addition, the attention pooling module of the embodiment of the invention reduces the loss of effective information during network learning. To protect privacy, a low-resolution millimeter wave radar can be adopted, so the number of information points that can be collected in each frame is limited. To avoid discarding useful information during pooling in training, the embodiment of the invention introduces an attention pooling module that automatically learns the effective information.
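The idea of letting pooling weights be learned, rather than keeping only a maximum or a plain average, can be illustrated with a generic attention pooling sketch. This is a common softmax-weighted formulation and is not necessarily the exact structure of the attention pooling module of the embodiment; the scoring vector `w` stands in for learned parameters and is hypothetical.

```python
import math


def attention_pool(features, w):
    """Pool a list of feature vectors into one vector with softmax attention.

    features: list of equally sized feature vectors (e.g. one per position).
    w:        scoring vector of the same size (would be learned in practice).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per feature vector.
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum keeps a contribution from every input instead of discarding it.
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


feats = [[1.0, 0.0], [0.0, 1.0]]
pooled_uniform, weights_uniform = attention_pool(feats, [0.0, 0.0])
pooled_focus, weights_focus = attention_pool(feats, [10.0, 0.0])
```

With a zero scoring vector the result equals average pooling; as the scores separate, the pooled vector moves toward the highest-scoring feature, which is how sparse but informative points can be emphasized.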

Therefore, the human skeleton sequence extraction model provided by the embodiment of the invention adopts a multi-angle convolution network and can efficiently extract spatial structure features with high precision requirements. Specifically, the local information coding module is used to learn local information of the input point cloud image stream; the global information coding module is used to expand the range of the convolution receptive field and extract the structural features of the global point cloud, obtaining a more complete and richer representation; and the attention pooling module is used to enhance the adaptive selection of important information by the space-time convolution and to strengthen the relationship among key points of the point cloud in the space-time dimension, obtaining a more robust representation. The prediction accuracy of the whole skeleton sequence is thereby improved.

For extraction, the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension are input into the human skeleton sequence extraction model obtained by training with the training method of the human skeleton sequence extraction model in the foregoing embodiment, to obtain the three-dimensional coordinates of each node in the human skeleton sequence.

Here, for acquiring the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension, reference may be made to the corresponding description in the foregoing step 501, which is not repeated here.

In an application example, the structure of the human bone sequence extraction model is shown in fig. 6. The input of the human bone sequence extraction model is two three-dimensional space-time sequences obtained based on millimeter wave radar signals, namely the horizontal-direction thermodynamic diagram + time dimension (N × H × W × T) and the vertical-direction thermodynamic diagram + time dimension (N × H × W × T), where H and W are the dimensions of the thermodynamic diagram, T is the time dimension, and N can be understood as the number of frames, which takes the value of the configured batch size in actual training. The horizontal point cloud image is processed by a convolutional layer and then output to the local information coding module and the global information coding module to obtain first local coding information and first global coding information; these are learned by a residual network (ResNet) and then output to the attention pooling module, which outputs a first pooling result. Likewise, the vertical point cloud image is processed by a convolutional layer and then output to the local information coding module and the global information coding module to obtain second local coding information and second global coding information; these are learned by a ResNet and then output to the attention pooling module, which outputs a second pooling result. A fully connected layer connects the first pooling result and the second pooling result, and the spatial three-dimensional position of each node in the human body skeleton sequence is obtained through regression prediction. Illustratively, the number of bone nodes may be 17, 25 or another number, which is not specifically limited in the embodiments of the present invention.

For example, the label information of the training data set may come from the spatial positions of the 25 bone nodes provided by the Kinect device, after data preprocessing and coordinate axis matching.
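The final stage of this structure, in which the two pooling results are connected by a fully connected layer and regressed to node coordinates, can be sketched as follows. The feature dimension, the random weights and the function names are hypothetical; in the trained model the weights would be learned, and the pooling results would come from the two branches described above.

```python
import random


def fully_connected(x, weights, bias):
    """Plain linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_joints(pooled_h, pooled_v, weights, bias, num_joints):
    """Concatenate both pooling results and regress num_joints x 3 coordinates."""
    fused = pooled_h + pooled_v  # list concatenation of the two branches
    flat = fully_connected(fused, weights, bias)
    return [flat[3 * j:3 * j + 3] for j in range(num_joints)]


rng = random.Random(0)
num_joints, feat_dim = 25, 8  # 25 nodes as provided by Kinect; feat_dim is arbitrary
weights = [[rng.uniform(-0.1, 0.1) for _ in range(2 * feat_dim)]
           for _ in range(3 * num_joints)]
bias = [0.0] * (3 * num_joints)
joints = predict_joints([0.5] * feat_dim, [0.2] * feat_dim,
                        weights, bias, num_joints)
```

Here `joints` is a list of 25 `[x, y, z]` triples; changing `num_joints` to 17 only changes the number of output rows of the fully connected layer.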

In order to implement the training method of the human bone sequence extraction model of the embodiment of the present invention, an embodiment of the present invention further provides a training apparatus of a human bone sequence extraction model, the training apparatus of the human bone sequence extraction model corresponds to the training method of the human bone sequence extraction model, and each step in the embodiment of the training method of the human bone sequence extraction model is also completely applicable to the embodiment of the training apparatus of the human bone sequence extraction model.

As shown in fig. 7, the training apparatus for the human bone sequence extraction model includes: a first acquisition module 701 and a training module 702; wherein,

the first obtaining module 701 is configured to obtain a training sample set, where the training sample set includes: tag information, and a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal with a time dimension, wherein the tag information is the three-dimensional coordinates of each node in a human skeleton sequence of a target object in the time dimension, the first thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in a horizontal direction, and the second thermodynamic diagram is a thermodynamic diagram of the millimeter wave radar signal in a vertical direction;

the training module 702 is configured to train the human skeleton sequence extraction model based on the training sample set to obtain a trained human skeleton sequence extraction model;

the human skeleton sequence extraction model comprises: a local information coding module, a global information coding module and an attention pooling module; the local information coding module is used for learning local information of the millimeter wave radar signals, the global information coding module is used for learning global information of the millimeter wave radar signals based on an increased network receptive field, and the attention pooling module is used for pooling the output results of the local information coding module and the output results of the global information coding module.

In some embodiments, training module 702 is specifically configured to:

In some embodiments, the training module 702 performs information encoding on the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal having the time dimension in the training sample set based on the local information encoding module, the global information encoding module, and the attention pooling module, respectively, and includes:

In some embodiments, the training module 702 determines the network loss value of the human bone sequence extraction model based on the predicted three-dimensional coordinates of each node in the human bone sequence and the three-dimensional coordinates of each node in the label information, including:

In practical applications, the first obtaining module 701 and the training module 702 may be implemented by a processor in the training apparatus of the human bone sequence extraction model. Of course, the processor needs to run a computer program in memory to implement its functions.

It should be noted that: in the training device for the human bone sequence extraction model provided in the above embodiment, when the training of the human bone sequence extraction model is performed, only the division of the above program modules is taken as an example, and in practical applications, the above processing may be distributed to different program modules according to needs, that is, the internal structure of the device may be divided into different program modules to complete all or part of the above-described processing. In addition, the training device for the human bone sequence extraction model provided in the above embodiments and the training method embodiment for the human bone sequence extraction model belong to the same concept, and the specific implementation process is described in detail in the method embodiments and is not described herein again.

In order to implement the human bone sequence extraction method of the embodiment of the present invention, an embodiment of the present invention further provides a human bone sequence extraction device, which corresponds to the human bone sequence extraction method, and each step in the human bone sequence extraction method is also completely applicable to the embodiment of the human bone sequence extraction device.

As shown in fig. 8, the human bone sequence extraction apparatus according to the embodiment of the present invention includes: a second obtaining module 801 and an identifying module 802; wherein,

the second obtaining module 801 is configured to obtain a first thermodynamic diagram and a second thermodynamic diagram of a millimeter wave radar signal having a time dimension;

the recognition module 802 is configured to input the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension into a human skeleton sequence extraction model obtained by training with the training apparatus according to the embodiment of the present invention, so as to obtain three-dimensional coordinates of each node in the human skeleton sequence.

In practical applications, the second obtaining module 801 and the identifying module 802 may be implemented by a processor in the human bone sequence extracting apparatus. Of course, the processor needs to run a computer program in memory to implement its functions.

It should be noted that: in the human bone sequence extraction device provided in the above embodiment, when extracting the human bone sequence, only the division of the above program modules is taken as an example, and in practical applications, the above processing allocation may be completed by different program modules according to needs, that is, the internal structure of the device is divided into different program modules to complete all or part of the above-described processing. In addition, the human bone sequence extraction device provided by the above embodiment and the human bone sequence extraction method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment and is not described herein again.

Based on the hardware implementation of the program module, and in order to implement the training method of the human body bone sequence extraction model in the embodiment of the present invention, the embodiment of the present invention further provides a training device of the human body bone sequence extraction model. Fig. 9 shows only an exemplary structure of the training apparatus of the human bone sequence extraction model, not a whole structure, and a part of or the whole structure shown in fig. 9 may be implemented as necessary.

As shown in fig. 9, a training apparatus 900 of the human bone sequence extraction model according to an embodiment of the present invention includes: at least one processor 901, a memory 902, a user interface 903, and at least one network interface 904. The various components of the training apparatus 900 of the human bone sequence extraction model are coupled together by a bus system 905. It will be appreciated that the bus system 905 is used to enable connection and communication among these components. The bus system 905 includes a power bus, a control bus, and a status signal bus, in addition to a data bus. For clarity of illustration, however, the various buses are all labeled in fig. 9 as the bus system 905.

The user interface 903 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, a touch screen, or the like, among others.

The memory 902 in embodiments of the present invention is used to store various types of data to support the operation of the training apparatus of the human bone sequence extraction model. Examples of such data include: any computer program for operating on the training apparatus of the human bone sequence extraction model.

The training method for the human body bone sequence extraction model disclosed by the embodiment of the invention can be applied to the processor 901, or can be realized by the processor 901. The processor 901 may be an integrated circuit chip having signal processing capabilities. In the implementation process, the steps of the training method for the human bone sequence extraction model may be completed by the integrated logic circuit of hardware in the processor 901 or instructions in the form of software. The Processor 901 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. Processor 901 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the memory 902, and the processor 901 reads information in the memory 902, and completes the steps of the training method for the human bone sequence extraction model provided in the embodiment of the present invention in combination with hardware thereof.

In an exemplary embodiment, the training device of the human bone sequence extraction model may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the aforementioned methods.

Based on the hardware implementation of the program module, in order to implement the human bone sequence extraction method in the embodiment of the present invention, an embodiment of the present invention further provides a human bone sequence extraction device. Fig. 10 shows only an exemplary structure of the human bone sequence extraction apparatus, not the entire structure, and a part or the entire structure shown in fig. 10 may be implemented as necessary.

As shown in fig. 10, a human bone sequence extraction apparatus 1000 according to an embodiment of the present invention includes: at least one processor 1001, a memory 1002, a user interface 1003 and at least one network interface 1004. The various components in the human bone sequence extraction apparatus 1000 are coupled together by a bus system 1005. It will be appreciated that the bus system 1005 is used to enable connection and communication among these components. The bus system 1005 includes a power bus, a control bus, and a status signal bus, in addition to a data bus. For clarity of illustration, however, the various buses are all labeled in fig. 10 as the bus system 1005.

The user interface 1003 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, a touch screen, or the like, among others.

The memory 1002 in the present embodiment is used to store various types of data to support the operation of the human bone sequence extraction apparatus. Examples of such data include: any computer program for operating on a human bone sequence extraction device.

The human skeleton sequence extraction method disclosed by the embodiment of the invention can be applied to the processor 1001, or can be realized by the processor 1001. The processor 1001 may be an integrated circuit chip having signal processing capabilities. In the implementation process, the steps of the human skeleton sequence extraction method may be implemented by an integrated logic circuit of hardware in the processor 1001 or instructions in the form of software. The Processor 1001 may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 1001 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the memory 1002, and the processor 1001 reads the information in the memory 1002, and completes the steps of the human skeleton sequence extraction method provided by the embodiment of the present invention in combination with hardware thereof.

In an exemplary embodiment, the human skeletal sequence extraction device may be implemented by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, general purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the foregoing methods.

It will be appreciated that the memories 902, 1002 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.

In an exemplary embodiment, the embodiment of the present invention further provides a storage medium, that is, a computer storage medium, which may specifically be a computer readable storage medium, for example, a memory 902 storing a computer program, where the computer program is executable by the processor 901 of the training apparatus of the human bone sequence extraction model, so as to complete the steps of the training method of the human bone sequence extraction model according to the embodiment of the present invention; as another example, a memory 1002 storing a computer program, where the computer program is executable by the processor 1001 of the human bone sequence extraction apparatus to complete the steps of the human bone sequence extraction method according to the embodiment of the present invention. The computer readable storage medium may be a ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM, among others.

It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

In addition, the technical solutions described in the embodiments of the present invention may be arbitrarily combined without conflict.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A training method for a human bone sequence extraction model is characterized by comprising the following steps:

2. The method of claim 1, wherein the training the human bone sequence extraction model based on the training sample set to obtain a trained human bone sequence extraction model comprises:

3. The method of claim 2, wherein the encoding information based on the local information encoding module, the global information encoding module, and the attention pooling module for the first thermodynamic diagram and the second thermodynamic diagram, respectively, of millimeter wave radar signals having a time dimension in the training sample set comprises:

4. The method of claim 3, wherein the global information encoding module is specifically configured to:

5. The method of claim 3, wherein the attention pooling module is specifically configured to:

6. The method of claim 2, wherein determining the network loss value of the human bone sequence extraction model based on the predicted three-dimensional coordinates of each node in the human bone sequence and the three-dimensional coordinates of each node in the tag information comprises:

7. A human bone sequence extraction method is characterized by comprising the following steps:

inputting the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension into the human skeleton sequence extraction model obtained by training according to the method of any one of claims 1 to 6, and obtaining the three-dimensional coordinates of each node in the human skeleton sequence.

8. A training device for a human bone sequence extraction model comprises:

9. A human bone sequence extraction device, comprising:

a recognition module, configured to input the first thermodynamic diagram and the second thermodynamic diagram of the millimeter wave radar signal with the time dimension into the human skeleton sequence extraction model trained by the training apparatus according to claim 8, so as to obtain three-dimensional coordinates of each node in the human skeleton sequence.

10. A training device for a human bone sequence extraction model is characterized by comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein,

the processor, when executing the computer program, is adapted to perform the steps of the method of any of claims 1 to 6.

11. A human bone sequence extraction apparatus, comprising: a processor and a memory for storing a computer program capable of running on the processor, wherein,

the processor, when executing the computer program, performs the steps of the method of claim 7.

12. A storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of the method of any one of claims 1 to 7.