CN113283380A - Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network - Google Patents

Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network

Info

Publication number
CN113283380A
CN113283380A · CN202110652217.0A
Authority
CN
China
Prior art keywords
image
videos
motion
long short-term memory network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110652217.0A
Other languages
Chinese (zh)
Inventor
庄悦阳
李睿
张洁欣
江涛
冯安豫
陈经纬
柳维林
陈宪泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110652217.0A
Publication of CN113283380A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/23: Recognition of whole body movements, e.g. for sport training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for automatically recognizing children's motion posture based on a 3D convolutional long short-term memory network, comprising the following steps. Step S1: with a camera, record three segments of video of the subject walking naturally indoors; the camera is fixed level with the subject's iliac spine, placed 5 m from the subject, and the three segments are shot from the front, the back, and the side respectively. Step S2: apply Kalman filtering to the captured video to optimally estimate the moving-target information, realizing optimal acquisition of the moving-target tracking images. Step S3: preprocess the optimal moving-target tracking images to obtain enhanced images. The invention detects whole-body disorders from dynamic video captured by a camera, supports diverse response modes, and serves early warning and corrective training.

Description

Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network
Technical Field
The invention relates to the field of intelligent medical technology, in particular to a method for automatically recognizing children's motion posture based on a 3D convolutional long short-term memory network.
Background
The Long Short-Term Memory network (LSTM) is a recurrent neural network specially designed to solve the long-term dependence problem of the ordinary RNN (recurrent neural network). All RNNs take the form of a chain of repeated neural-network modules; in the standard RNN, this repeated block has a very simple structure, e.g. a single tanh layer.
At present, technology for diagnosing human disorders through computer vision is gradually maturing, especially for postural disorders such as kyphosis (humpback) and lumbar hyperlordosis, which are appearing at ever younger ages among children. Meanwhile, most existing techniques rely on sensors worn on the body, assisted by a wearable system, which makes the procedure complex, time-consuming, and labour-intensive. Most of the prior art also targets adults, and adult kinematic parameters and models cannot be applied to children. Furthermore, conventional network structures mostly operate on still photographs of the subject; extracting the human posture from a single picture is difficult, so accurate posture judgements are hard to make, diagnostic errors become large, and diagnosis through computer vision becomes unreliable. It is therefore necessary to design a method for automatically recognizing children's motion posture based on a 3D convolutional long short-term memory network.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for automatically recognizing children's motion posture based on a 3D convolutional long short-term memory network.
To achieve this purpose, the invention provides the following technical scheme. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network comprises the following steps:
Step S1: with a camera, record three segments of video of the subject walking naturally indoors; the camera is fixed level with the subject's iliac spine, placed 5 m from the subject, and the three segments are shot from the front, the back, and the side respectively;
Step S2: apply Kalman filtering to the captured video to optimally estimate the moving-target information, realizing optimal acquisition of the moving-target tracking images;
Step S3: preprocess the optimal moving-target tracking images to obtain enhanced images;
Step S4: process the captured video with the DeepPose network (implemented in Python) to obtain human-posture templates;
Step S5: predict from the human-posture templates using Google's Keras deep-learning framework in Python; for each disorder, build a separate ConvLSTM-based classification network to perform binary prediction;
Step S6: after analysis, output a report of the possible disorders and suggest an early-warning or correction method.
Preferably, acquiring the optimal moving-target tracking image by Kalman filtering in step S2 comprises the following steps:
1) obtain the target's initial position and information from the target image sequence of the video, and use this information to initialize a Kalman filter;
2) use Kalman filtering to estimate the target's position in the next video frame, obtaining a predicted value that narrows the search range for the next frame's detection;
3) compare the actually detected target value with the predicted value, update the filter's parameters in preparation for predicting the next frame, and obtain the optimal moving-target tracking images through repeated prediction and updating.
Preferably, the Kalman filter is described by two parts, a state equation and an observation equation. For a discrete dynamic system, these correspond to an n-dimensional dynamic system and a p-dimensional (p ≤ n) observation system, where U_k is a one-dimensional input vector, W_k an m-dimensional noise vector, and X_k (X ∈ R^n) the n-dimensional state vector. The state of the system is described by the difference equation:
X_k = A X_{k-1} + B U_k + W_{k-1}
where the n × n matrix A is the system's state-transition matrix, governing the transition from state X_{k-1} to X_k; the n × 1 matrix B is the input transition matrix; and W_k is white process noise. The observation equation of the system is:
Z_k = H_k X_k + V_k
where H is the observation matrix, generally a constant, and V_k is a white-noise sequence with mean 0.
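The state and observation equations above can be sketched as a standard predict/update Kalman loop. The constant-velocity state model, the matrix values, and the simulated detections below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def kalman_step(x, P, z, A, B, u, H, Q, R):
    """One predict/update cycle for X_k = A X_{k-1} + B U_k + W, Z_k = H X_k + V."""
    x_pred = A @ x + B @ u                  # predict the next state
    P_pred = A @ P @ A.T + Q                # predict covariance (Q = process noise)
    S = H @ P_pred @ H.T + R                # innovation covariance (R = measurement noise)
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)   # correct with the detection z
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative constant-velocity model: state [px, py, vx, vy], observe position only.
dt = 1.0
A = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
B = np.zeros((4, 1)); u = np.zeros(1)       # no control input
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q = 0.01 * np.eye(4); R = 0.1 * np.eye(2)

x, P = np.zeros(4), np.eye(4)
for z in ([1.0, 1.0], [2.0, 2.1], [3.1, 2.9]):   # simulated per-frame detections
    x, P = kalman_step(x, P, np.array(z), A, B, u, H, Q, R)
```

Each iteration realizes steps 2) and 3) above: predict, then correct against the detected target value.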
Preferably, preprocessing the optimal moving-target tracking image in step S3 comprises:
assuming the noise on the image is additive, uncorrelated, and zero-mean, a noisy digital image consists of the original image signal f(x, y) and the noise n(x, y), i.e. g(x, y) = f(x, y) + n(x, y); the smoothed image obtained after local averaging is:
ḡ(x, y) = (1/M) Σ_{(i,j)∈S} g(i, j)
where S is the set of points in the neighbourhood of (x, y), M is the total number of points in S, and after smoothing the noise variance is reduced to 1/M of its original value.
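A minimal NumPy rendering of the local-averaging formula above; the 3 × 3 window (so M = 9) and the synthetic noisy image are assumptions for illustration:

```python
import numpy as np

def local_average(g, k=3):
    """Replace each pixel with the mean of its k x k neighbourhood S (M = k*k points)."""
    pad = k // 2
    padded = np.pad(g, pad, mode="edge")        # replicate the borders
    out = np.zeros(g.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + g.shape[0], dx:dx + g.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                    # original image signal f(x, y)
g = f + rng.normal(0.0, 10.0, f.shape)          # g(x, y) = f(x, y) + n(x, y)
smoothed = local_average(g)                     # noise variance drops to roughly 1/M
```

Averaging M independent zero-mean noise samples divides the noise variance by about M, which is exactly the 1/M reduction the text claims.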
Preferably, the 3D convolution in step S5 is computed as follows:
Input: (N, C_in, D_in, H_in, W_in);
Output: (N, C_out, D_out, H_out, W_out), where:
D_out = ⌊(D_in + 2 × padding[0] − kernel[0]) / stride[0]⌋ + 1;
H_out = ⌊(H_in + 2 × padding[1] − kernel[1]) / stride[1]⌋ + 1;
W_out = ⌊(W_in + 2 × padding[2] − kernel[2]) / stride[2]⌋ + 1;
where N is the batch size, C the number of channels, D the image depth, H the image height, W the image width, padding the margin added around the input, kernel the convolution kernel size, and stride the step length in each image dimension;
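The three output-size formulas reduce to a single helper. The clip size, kernel, stride, and padding values below are example choices (dilation = 1 is assumed, since the formula above omits it):

```python
def conv3d_out_shape(in_shape, kernel, stride, padding):
    """Apply out = floor((in + 2*padding - kernel) / stride) + 1 to each of D, H, W."""
    return tuple((s + 2 * p - k) // st + 1
                 for s, k, st, p in zip(in_shape, kernel, stride, padding))

# A 16-frame clip of 112 x 112 images, 3x3x3 kernel, padding 1 in every dimension:
same = conv3d_out_shape((16, 112, 112), (3, 3, 3), (1, 1, 1), (1, 1, 1))   # stride 1
half = conv3d_out_shape((16, 112, 112), (3, 3, 3), (2, 2, 2), (1, 1, 1))   # stride 2
print(same, half)   # (16, 112, 112) (8, 56, 56)
```

Stride 1 with padding 1 preserves every dimension, while stride 2 roughly halves each one, which is why stacked 3D convolutions shrink the clip progressively.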
ConvLSTM is computed as:
i_t = σ(W_xi ∗ X_t + W_hi ∗ H_{t−1} + W_ci ∘ C_{t−1} + b_i);
f_t = σ(W_xf ∗ X_t + W_hf ∗ H_{t−1} + W_cf ∘ C_{t−1} + b_f);
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_xc ∗ X_t + W_hc ∗ H_{t−1} + b_c);
o_t = σ(W_xo ∗ X_t + W_ho ∗ H_{t−1} + W_co ∘ C_t + b_o);
H_t = o_t ∘ tanh(C_t);
where ∗ denotes convolution and ∘ the Hadamard (element-wise) product;
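The five gate equations can be checked numerically with a single-channel NumPy cell, treating ∗ as a same-padded 2-D convolution and ∘ as the element-wise product. The 8 × 8 feature map, 3 × 3 kernels, and random weights are illustrative assumptions; a real model would use a library layer such as Keras's ConvLSTM2D rather than this loop:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """'Same'-padded 2-D convolution of an (H, W) map with a (k, k) kernel, stride 1."""
    k = w.shape[0]; pad = k // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def convlstm_step(X, H_prev, C_prev, W, b):
    """One step of the gate equations: '*' is convolution, 'o' the Hadamard product."""
    i = sigmoid(conv2d_same(X, W["xi"]) + conv2d_same(H_prev, W["hi"]) + W["ci"] * C_prev + b["i"])
    f = sigmoid(conv2d_same(X, W["xf"]) + conv2d_same(H_prev, W["hf"]) + W["cf"] * C_prev + b["f"])
    C = f * C_prev + i * np.tanh(conv2d_same(X, W["xc"]) + conv2d_same(H_prev, W["hc"]) + b["c"])
    o = sigmoid(conv2d_same(X, W["xo"]) + conv2d_same(H_prev, W["ho"]) + W["co"] * C + b["o"])
    H = o * np.tanh(C)
    return H, C

rng = np.random.default_rng(1)
shape, k = (8, 8), 3
W = {n: 0.1 * rng.standard_normal((k, k)) for n in ("xi", "hi", "xf", "hf", "xc", "hc", "xo", "ho")}
W.update({n: 0.1 * rng.standard_normal(shape) for n in ("ci", "cf", "co")})
b = {n: 0.0 for n in ("i", "f", "c", "o")}
H = C = np.zeros(shape)
for _ in range(4):                               # a short sequence of input frames
    H, C = convlstm_step(rng.standard_normal(shape), H, C, W, b)
```

Because H_t = o_t ∘ tanh(C_t) with o_t ∈ (0, 1), every entry of the hidden state stays strictly inside (−1, 1).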
Since training builds a separate network for each disorder and analyses each condition independently, the loss function used in training is the binary classification (cross-entropy) loss:
L = −(1/m) Σ_{i=1}^{m} [ y_i log Π(t_i) + (1 − y_i) log(1 − Π(t_i)) ]
where m is the number of samples, y is the true label from the training set, t is the output of the network's last Dense layer, and Π(t) is the sigmoid activation function:
Π(t) = 1 / (1 + e^{−t})
where t is the output result of the last layer of the network;
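The sigmoid and the binary cross-entropy above take only a few lines; the sample labels and logits are made up for illustration:

```python
import numpy as np

def sigmoid(t):
    """Pi(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def binary_cross_entropy(y, t):
    """L = -(1/m) * sum(y*log(Pi(t)) + (1-y)*log(1-Pi(t))) over m samples."""
    y = np.asarray(y, float)
    p = sigmoid(np.asarray(t, float))           # last Dense output pushed through sigmoid
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confident correct logit is penalized less than an uncertain one:
print(binary_cross_entropy([1], [4.0]), binary_cross_entropy([1], [0.0]))
```

At t = 0 the sigmoid is exactly 0.5, so the loss on a single sample is log 2, which is a handy sanity check.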
The update formulas of the Adam optimization algorithm are:
g_t = ∇_θ J(θ_{t−1});
m_t = β₁ m_{t−1} + (1 − β₁) g_t;
v_t = β₂ v_{t−1} + (1 − β₂) g_t²;
m̂_t = m_t / (1 − β₁ᵗ);
v̂_t = v_t / (1 − β₂ᵗ);
θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε);
where g_t is the gradient at time step t, m_t the first-order gradient momentum, v_t the second-order gradient momentum, m̂_t and v̂_t the bias-corrected momenta, θ_t the model parameters at iteration t, η the learning rate, ε a small value preventing a zero denominator, and β₁, β₂ the momentum parameters, taken as 0.9 and 0.999 respectively.
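The six Adam update formulas, run on a toy one-parameter objective J(θ) = θ². The learning rate η = 0.1 and the step count are illustrative choices (β₁ = 0.9 and β₂ = 0.999 as stated above):

```python
import numpy as np

def adam_minimize(grad, theta, steps=200, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)                          # g_t = gradient of J at theta_{t-1}
        m = beta1 * m + (1 - beta1) * g          # first-order gradient momentum
        v = beta2 * v + (1 - beta2) * g ** 2     # second-order gradient momentum
        m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
        theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# J(theta) = theta^2, so grad J = 2*theta and the minimum is at theta = 0.
theta = adam_minimize(lambda th: 2.0 * th, 5.0)
```

The bias corrections matter early on: without them, m and v start near zero and the first steps would be far too small.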
The beneficial effects of the invention are:
1. The invention detects whole-body disorders from dynamic video captured by a camera, supports diverse response modes, and serves early warning and corrective training; in particular, the technology collects children's motion-posture data, providing a new method for early warning, auxiliary diagnosis, and corrective training of disorders in young children;
2. The invention applies Kalman filtering to the captured video to optimally estimate the moving-target information, realizing optimal acquisition of the moving-target tracking images, and denoises those images, which effectively improves the quality of the captured pictures and facilitates higher diagnostic accuracy later on.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a children motion gesture automatic identification method based on a 3D convolution long-short term memory network;
FIG. 2 is a flow chart of the present invention for optimal moving object tracking image acquisition based on Kalman filtering technique.
Detailed Description
The technical scheme of the invention is clearly and completely described in the following with reference to the accompanying drawings. In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance.
Example one
As shown in fig. 1-2, the invention provides the following technical scheme. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network comprises the following steps:
Step S1: with a camera, record three segments of video of the subject walking naturally indoors; the camera is fixed level with the subject's iliac spine, placed 5 m from the subject, and the three segments are shot from the front, the back, and the side respectively;
Step S2: apply Kalman filtering to the captured video to optimally estimate the moving-target information, realizing optimal acquisition of the moving-target tracking images;
Step S3: preprocess the optimal moving-target tracking images to obtain enhanced images;
Step S4: process the captured video with the DeepPose network (implemented in Python) to obtain human-posture templates;
Step S5: predict from the human-posture templates using Google's Keras deep-learning framework in Python; for each disorder, build a separate ConvLSTM-based classification network to perform binary prediction;
Step S6: after analysis, output a report of the possible disorders and suggest an early-warning or correction method.
Preferably, acquiring the optimal moving-target tracking image by Kalman filtering in step S2 comprises the following steps:
1) obtain the target's initial position and information from the target image sequence of the video, and use this information to initialize a Kalman filter;
2) use Kalman filtering to estimate the target's position in the next video frame, obtaining a predicted value that narrows the search range for the next frame's detection;
3) compare the actually detected target value with the predicted value, update the filter's parameters in preparation for predicting the next frame, and obtain the optimal moving-target tracking images through repeated prediction and updating.
Preferably, the Kalman filter is described by two parts, a state equation and an observation equation. For a discrete dynamic system, these correspond to an n-dimensional dynamic system and a p-dimensional (p ≤ n) observation system, where U_k is a one-dimensional input vector, W_k an m-dimensional noise vector, and X_k (X ∈ R^n) the n-dimensional state vector. The state of the system is described by the difference equation:
X_k = A X_{k-1} + B U_k + W_{k-1}
where the n × n matrix A is the system's state-transition matrix, governing the transition from state X_{k-1} to X_k; the n × 1 matrix B is the input transition matrix; and W_k is white process noise. The observation equation of the system is:
Z_k = H_k X_k + V_k
where H is the observation matrix, generally a constant, and V_k is a white-noise sequence with mean 0.
Preferably, preprocessing the optimal moving-target tracking image in step S3 comprises:
assuming the noise on the image is additive, uncorrelated, and zero-mean, a noisy digital image consists of the original image signal f(x, y) and the noise n(x, y), i.e. g(x, y) = f(x, y) + n(x, y); the smoothed image obtained after local averaging is:
ḡ(x, y) = (1/M) Σ_{(i,j)∈S} g(i, j)
where S is the set of points in the neighbourhood of (x, y), M is the total number of points in S, and after smoothing the noise variance is reduced to 1/M of its original value.
Preferably, the 3D convolution in step S5 is computed as follows:
Input: (N, C_in, D_in, H_in, W_in);
Output: (N, C_out, D_out, H_out, W_out), where:
D_out = ⌊(D_in + 2 × padding[0] − kernel[0]) / stride[0]⌋ + 1;
H_out = ⌊(H_in + 2 × padding[1] − kernel[1]) / stride[1]⌋ + 1;
W_out = ⌊(W_in + 2 × padding[2] − kernel[2]) / stride[2]⌋ + 1;
where N is the batch size, C the number of channels, D the image depth, H the image height, W the image width, padding the margin added around the input, kernel the convolution kernel size, and stride the step length in each image dimension;
ConvLSTM is computed as:
i_t = σ(W_xi ∗ X_t + W_hi ∗ H_{t−1} + W_ci ∘ C_{t−1} + b_i);
f_t = σ(W_xf ∗ X_t + W_hf ∗ H_{t−1} + W_cf ∘ C_{t−1} + b_f);
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_xc ∗ X_t + W_hc ∗ H_{t−1} + b_c);
o_t = σ(W_xo ∗ X_t + W_ho ∗ H_{t−1} + W_co ∘ C_t + b_o);
H_t = o_t ∘ tanh(C_t);
where ∗ denotes convolution and ∘ the Hadamard (element-wise) product;
Since training builds a separate network for each disorder and analyses each condition independently, the loss function used in training is the binary classification (cross-entropy) loss:
L = −(1/m) Σ_{i=1}^{m} [ y_i log Π(t_i) + (1 − y_i) log(1 − Π(t_i)) ]
where m is the number of samples, y is the true label from the training set, t is the output of the network's last Dense layer, and Π(t) is the sigmoid activation function:
Π(t) = 1 / (1 + e^{−t})
where t is the output result of the last layer of the network;
The update formulas of the Adam optimization algorithm are:
g_t = ∇_θ J(θ_{t−1});
m_t = β₁ m_{t−1} + (1 − β₁) g_t;
v_t = β₂ v_{t−1} + (1 − β₂) g_t²;
m̂_t = m_t / (1 − β₁ᵗ);
v̂_t = v_t / (1 − β₂ᵗ);
θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε);
where g_t is the gradient at time step t, m_t the first-order gradient momentum, v_t the second-order gradient momentum, m̂_t and v̂_t the bias-corrected momenta, θ_t the model parameters at iteration t, η the learning rate, ε a small value preventing a zero denominator, and β₁, β₂ the momentum parameters, taken as 0.9 and 0.999 respectively.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network, characterized by comprising the following steps:
Step S1: with a camera, record three segments of video of the subject walking naturally indoors; the camera is fixed level with the subject's iliac spine, placed 5 m from the subject, and the three segments are shot from the front, the back, and the side respectively;
Step S2: apply Kalman filtering to the captured video to optimally estimate the moving-target information, realizing optimal acquisition of the moving-target tracking images;
Step S3: preprocess the optimal moving-target tracking images to obtain enhanced images;
Step S4: process the captured video with the DeepPose network (implemented in Python) to obtain human-posture templates;
Step S5: predict from the human-posture templates using Google's Keras deep-learning framework in Python; for each disorder, build a separate ConvLSTM-based classification network to perform binary prediction;
Step S6: after analysis, output a report of the possible disorders and suggest an early-warning or correction method.
2. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network as claimed in claim 1, characterized in that acquiring the optimal moving-target tracking image by Kalman filtering in step S2 comprises the following steps:
1) obtain the target's initial position and information from the target image sequence of the video, and use this information to initialize a Kalman filter;
2) use Kalman filtering to estimate the target's position in the next video frame, obtaining a predicted value that narrows the search range for the next frame's detection;
3) compare the actually detected target value with the predicted value, update the filter's parameters in preparation for predicting the next frame, and obtain the optimal moving-target tracking images through repeated prediction and updating.
3. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network as claimed in claim 2, characterized in that: the Kalman filter is described by two parts, a state equation and an observation equation. For a discrete dynamic system, these correspond to an n-dimensional dynamic system and a p-dimensional (p ≤ n) observation system, where U_k is a one-dimensional input vector, W_k an m-dimensional noise vector, and X_k (X ∈ R^n) the n-dimensional state vector. The state of the system is described by the difference equation:
X_k = A X_{k-1} + B U_k + W_{k-1}
where the n × n matrix A is the system's state-transition matrix, governing the transition from state X_{k-1} to X_k; the n × 1 matrix B is the input transition matrix; and W_k is white process noise. The observation equation of the system is:
Z_k = H_k X_k + V_k
where H is the observation matrix, generally a constant, and V_k is a white-noise sequence with mean 0.
4. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network as claimed in claim 1, characterized in that preprocessing the optimal moving-target tracking image in step S3 comprises:
assuming the noise on the image is additive, uncorrelated, and zero-mean, a noisy digital image consists of the original image signal f(x, y) and the noise n(x, y), i.e. g(x, y) = f(x, y) + n(x, y); the smoothed image obtained after local averaging is:
ḡ(x, y) = (1/M) Σ_{(i,j)∈S} g(i, j)
where S is the set of points in the neighbourhood of (x, y), M is the total number of points in S, and after smoothing the noise variance is reduced to 1/M of its original value.
5. The method for automatically recognizing children's motion posture based on the 3D convolutional long short-term memory network as claimed in claim 1, characterized in that the 3D convolution in step S5 is computed as follows:
Input: (N, C_in, D_in, H_in, W_in);
Output: (N, C_out, D_out, H_out, W_out), where:
D_out = ⌊(D_in + 2 × padding[0] − kernel[0]) / stride[0]⌋ + 1;
H_out = ⌊(H_in + 2 × padding[1] − kernel[1]) / stride[1]⌋ + 1;
W_out = ⌊(W_in + 2 × padding[2] − kernel[2]) / stride[2]⌋ + 1;
where N is the batch size, C the number of channels, D the image depth, H the image height, W the image width, padding the margin added around the input, kernel the convolution kernel size, and stride the step length in each image dimension;
ConvLSTM is computed as:
i_t = σ(W_xi ∗ X_t + W_hi ∗ H_{t−1} + W_ci ∘ C_{t−1} + b_i);
f_t = σ(W_xf ∗ X_t + W_hf ∗ H_{t−1} + W_cf ∘ C_{t−1} + b_f);
C_t = f_t ∘ C_{t−1} + i_t ∘ tanh(W_xc ∗ X_t + W_hc ∗ H_{t−1} + b_c);
o_t = σ(W_xo ∗ X_t + W_ho ∗ H_{t−1} + W_co ∘ C_t + b_o);
H_t = o_t ∘ tanh(C_t);
where ∗ denotes convolution and ∘ the Hadamard (element-wise) product;
Since training builds a separate network for each disorder and analyses each condition independently, the loss function used in training is the binary classification (cross-entropy) loss:
L = −(1/m) Σ_{i=1}^{m} [ y_i log Π(t_i) + (1 − y_i) log(1 − Π(t_i)) ]
where m is the number of samples, y is the true label from the training set, t is the output of the network's last Dense layer, and Π(t) is the sigmoid activation function:
Π(t) = 1 / (1 + e^{−t})
where t is the output result of the last layer of the network;
The update formulas of the Adam optimization algorithm are:
g_t = ∇_θ J(θ_{t−1});
m_t = β₁ m_{t−1} + (1 − β₁) g_t;
v_t = β₂ v_{t−1} + (1 − β₂) g_t²;
m̂_t = m_t / (1 − β₁ᵗ);
v̂_t = v_t / (1 − β₂ᵗ);
θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε);
where g_t is the gradient at time step t, m_t the first-order gradient momentum, v_t the second-order gradient momentum, m̂_t and v̂_t the bias-corrected momenta, θ_t the model parameters at iteration t, η the learning rate, ε a small value preventing a zero denominator, and β₁, β₂ the momentum parameters, taken as 0.9 and 0.999 respectively.
CN202110652217.0A 2021-06-11 2021-06-11 Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network Pending CN113283380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110652217.0A CN113283380A (en) 2021-06-11 2021-06-11 Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110652217.0A CN113283380A (en) 2021-06-11 2021-06-11 Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network

Publications (1)

Publication Number Publication Date
CN113283380A (en) 2021-08-20

Family

ID=77284273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110652217.0A Pending CN113283380A (en) 2021-06-11 2021-06-11 Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network

Country Status (1)

Country Link
CN (1) CN113283380A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780542A (en) * 2016-12-29 2017-05-31 北京理工大学 A kind of machine fish tracking of the Camshift based on embedded Kalman filter
US20170263005A1 (en) * 2016-03-10 2017-09-14 Sony Corporation Method for moving object detection by a kalman filter-based approach
CN110321937A (en) * 2019-06-18 2019-10-11 哈尔滨工程大学 A kind of moving human hand tracking method of Faster-RCNN combination Kalman filtering
CN111079656A (en) * 2019-12-18 2020-04-28 林晓莹 Children motion attitude automatic identification technology based on 3D convolution long-term and short-term memory network


Similar Documents

Publication Publication Date Title
Chambers et al. Computer vision to automatically assess infant neuromotor risk
US11017547B2 (en) Method and system for postural analysis and measuring anatomical dimensions from a digital image using machine learning
Khokhlova et al. Normal and pathological gait classification LSTM model
CN108305283B (en) Human behavior recognition method and device based on depth camera and basic gesture
CN107666853A (en) Beat signals are determined according to video sequence
CN111339942A (en) Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN106846372B (en) Human motion quality visual analysis and evaluation system and method thereof
CN110956141B (en) Human body continuous action rapid analysis method based on local recognition
CN114999646B (en) Newborn exercise development assessment system, method, device and storage medium
CN112101235B (en) Old people behavior identification and detection method based on old people behavior characteristics
CN111079656A (en) Children motion attitude automatic identification technology based on 3D convolution long-term and short-term memory network
CN116597940A (en) Modeling method of movement disorder symptom quantitative evaluation model
Yin et al. Accurate estimation of body height from a single depth image via a four-stage developing network
CN115346272A (en) Real-time tumble detection method based on depth image sequence
Hafeez et al. Multi-Sensor-Based Action Monitoring and Recognition via Hybrid Descriptors and Logistic Regression
Wairagkar et al. A novel approach for modelling and classifying sit-to-stand kinematics using inertial sensors
CN113283380A (en) Children motion attitude automatic identification method based on 3D convolution long-term and short-term memory network
CN116999057A (en) Hemiplegia gait recognition and hemiplegia gait evaluation method based on wearable sensor
WO2024036825A1 (en) Attitude processing method, apparatus and system, and storage medium
Dogra et al. Toward automating Hammersmith pulled-to-sit examination of infants using feature point based video object tracking
CN115331153B (en) Posture monitoring method for assisting vestibule rehabilitation training
CN114469072B (en) Method for automatically predicting psychological development of infants by using camera
Endres et al. Graph-based action models for human motion classification
CN112102358B (en) Non-invasive animal behavior characteristic observation method
Khokhlova et al. Kinematic covariance based abnormal gait detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination