CN111275694B

CN111275694B - Attention mechanism guided progressive human body division analysis system and method

Info

Publication number: CN111275694B
Application number: CN202010081219.4A
Authority: CN
Inventors: 邵杰; 黄茜; 曹坤涛; 徐行
Original assignee: Research Institute Of Yibin University Of Electronic Science And Technology; University of Electronic Science and Technology of China
Current assignee: Research Institute Of Yibin University Of Electronic Science And Technology; University of Electronic Science and Technology of China
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2020-10-23
Anticipated expiration: 2040-02-06
Also published as: CN111275694A

Abstract

The invention discloses a system and a method for analyzing a human body by progressive division guided by an attention mechanism, wherein the proposed system mainly explores the enhancement effect of significance detection on human body analysis and the effectiveness of the attention mechanism on human body analysis. On the network structure, a feature extraction module is constructed, feature information is effectively extracted, multi-dimensional features are fused, and the human body analysis effect is enhanced; an adaptive attention module is designed to carry out position attention weighting on the features, and an effective fusion idea for fusing different levels of features is provided; finally, the significance detection and the human body analysis are integrated into an end-to-end network structure in a bottom-to-top mode, and the modules are applied to all branches, so that a unified effective structure is obtained. The performance exceeds the performance of the existing known method, and the optimal human body analysis effect is shown.

Description

Attention mechanism guided progressive human body division analysis system and method

Technical Field

The invention belongs to the field of image processing, and particularly relates to a system and a method for analyzing a progressively divided human body guided by an attention mechanism.

Background

Understanding human anatomy is a crucial but challenging topic in computer vision, and human body interpretation is one of the tasks to achieve this goal. Human body parsing is a dense prediction task aimed at accurately locating the human body and further dividing it into multiple semantic regions at the pixel level. In recent years, human body analysis is widely applied to other tasks also aimed at analyzing human body, such as pedestrian re-recognition, posture estimation, and human body image generation.

In recent work, researchers have proposed various methods to improve the expressiveness of the human body analysis network. One typical approach is to utilize additional domain information provided by other related tasks. For example, some work (fanging Xia, Peng Wang, Xianjie Chen and Alan L.Yuille. Joint Multi-person position estimation and magnetic part segmentation [ C ]. CVPR,2017: 6080-6089. and XuechENIE, JianshiFeng and Shuiching Yang.Multi free to adaptation for joint human positioning and position estimation [ C ]. ECCV,2018: 519-534) investigated the guidance of pose structure to human body interpretation by adding joint structure losses or dynamically updating model constraints learned from the pose estimation task. There have been other works (KeGong, Xiaoodan Liang, Yiche Li, Yimin Chen, Ming Yang, Liang Lin. instant-level Human matching via part grouping network [ C ]. ECCV,2018: 805-822. and Tao Ruan, Ting Liu, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhua. devil in the Details: TodsAccurate Single and Multiple Human matching [ C ]. AAAI,2019: 4814-. Although these information fusions provide a satisfactory improvement, there may be incompatibilities with training multiple tasks in the same network due to inconsistent optimization objectives, which somewhat weakens the predictive power of the overall structure.

In previous work (Ke Gong, Xiaoodan Liang, Dongyu Zhang, XiaohuiShen and Liang Lin. Look into Person: Self-Supervised Structure-Sensitive Learning and aNew Benchmark for Human matching [ C ] CVPR,2017: 6757-6765. and Xiaoodan Liang, KeGong, XiaohuiShen and Liang Lin. Look into Person: Joint Body matching & Poseestimation network world a New Benchk [ J ] TPAMI,2019:41(4)871 885), the approach of applying the attention machine did not explore the adaptive attention module for Human analysis tasks, but simply used some attention modules along the semantic meaning of Human Body, and thus did not refine the Human Body well.

Disclosure of Invention

Aiming at the defects in the prior art, the attention mechanism guided progressive human body analysis dividing system and method provided by the invention solve the problem that the prior art cannot accurately predict and analyze human body parts and analyze significance.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that:

an attention mechanism guided progressive compartmentalization human body interpretation system comprising: the system comprises a residual error neural network ResNet-101, a significance detection subsystem and a human body analysis subsystem;

the residual error neural network ResNet-101 is a structural neural network and is used for processing a human body image to obtain a shallow layer low-level feature map and a deep layer high-level feature map; the output Block1 and the output Block2 are connected with the significance detection subsystem in a communication mode and used for inputting the shallow low-level feature map into the significance detection subsystem; the output Block3 and the output Block4 are in communication connection with the human body analysis subsystem and are used for inputting the deep high-level feature map into the human body analysis subsystem;

the significance detection subsystem is used for carrying out significance prediction on the shallow low-level feature map to obtain two classification significance prediction maps;

the human body analysis subsystem is used for carrying out human body analysis prediction on the deep-layer high-level characteristic diagram to obtain a human body analysis prediction diagram.

Further, the significance detection subsystem includes: convolutional layer Conv1, convolutional layer Conv2, convolutional layer Conv3, convolutional layer Conv4, adaptive attention module GAM1, upsampling module 1, and upsampling module 2;

the convolutional layer Conv1 is a 1 × 1 convolutional layer, is used for performing dimension reduction processing on a shallow layer low-level feature map transmitted by an output Block1 of a residual neural network ResNet-101, and has an input end in communication connection with the output Block1 of the residual neural network ResNet-101 and an output end in communication connection with an input end A of an adaptive attention module GAM 1;

the convolutional layer Conv2 is a 1 × 1 convolutional layer, is used for performing dimension reduction processing on a shallow layer low-level feature map transmitted by an output Block2 of a residual neural network ResNet-101, and has an input end in communication connection with the output Block2 of the residual neural network ResNet-101 and an output end in communication connection with an input end of an up-sampling module 1;

the up-sampling module 1 is used for performing up-sampling processing on image data subjected to dimensionality reduction processing and transmitted by a shallow layer low-level feature map from an output Block2 of a residual neural network ResNet-101, and the output end of the up-sampling module is in communication connection with an input end B of an adaptive attention module GAM 1;

the adaptive attention module GAM1 is used for extracting attention features, and the output end of the adaptive attention module GAM1 is in communication connection with the convolutional layer Conv3 and is in communication connection with the human body analysis subsystem for feature enhancement;

the convolutional layer Conv3, the convolutional layer Conv4 and the up-sampling module 2 are used for processing the attention features extracted by the adaptive attention module GAM1 to obtain a two-classification significance prediction graph; the convolutional layer Conv3 is a 3 × 3 convolutional layer, the output of which is communicatively connected to the input of convolutional layer Conv 4; the convolutional layer Conv4 is a 1 × 1 convolutional layer, and the output end of the convolutional layer is in communication connection with the input end of the up-sampling module 2; the output end of the up-sampling module 2 is used as the processing result output end of the significance detection subsystem to output the two-classification significance prediction graph obtained by the system operation.

Further, the human body analysis subsystem includes: a feature extraction module FEM1, a feature extraction module FEM2, an adaptive attention module GAM2, an upsampling module 3, an upsampling module 4, an adding module 1, a convolutional layer Conv5, and a convolutional layer Conv 6;

the feature extraction module FEM1 is used for carrying out multi-dimensional feature extraction on a deep high-level feature map transmitted by an output Block Block3 of a residual neural network ResNet-101 to obtain multi-dimensional context information, the input end of the feature extraction module FEM1 is in communication connection with the output Block Block3 of the residual neural network ResNet-101, and the output end of the feature extraction module FEM1 is in communication connection with the input end A of the adaptive attention module GAM 2;

the feature extraction module FEM2 is used for carrying out multi-dimensional feature extraction on a deep high-level feature map transmitted by an output Block Block4 of a residual neural network ResNet-101 to obtain multi-dimensional context information, the input end of the feature extraction module FEM2 is in communication connection with the output Block Block4 of the residual neural network ResNet-101, and the output end of the feature extraction module FEM2 is in communication connection with the input end B of the adaptive attention module GAM 2;

the adaptive attention module GAM2 is used for processing multi-dimensional context information to obtain effective weighting characteristics, and the output end of the adaptive attention module GAM2 is in communication connection with the input end of the up-sampling module 3;

the up-sampling module 3 is used for performing up-sampling processing on the effective weighting characteristics, and the output end of the up-sampling module is in communication connection with the input end A of the addition module 1;

the addition module 1 is used for adding the attention characteristics extracted by the adaptive attention module GAM1 and the effective weighting characteristics obtained by the adaptive attention module GAM2 according to elements so as to fuse the characteristic diagrams provided by the adaptive attention module GAM1 and the adaptive attention module GAM2, highlight a target area and improve compactness among classes; its input B is communicatively connected to the output of the adaptive attention module GAM1, and its output is communicatively connected to the input of the convolutional layer Conv 5;

the convolutional layer Conv5, the convolutional layer Conv6 and the up-sampling module 4 are used for processing the attention characteristics obtained by adding the elements by the addition module 1 to obtain a human body analysis prediction graph; the convolutional layer Conv5 is a 3 × 3 convolutional layer, the output of which is communicatively connected to the input of convolutional layer Conv 6; the convolutional layer Conv6 is a 1 × 1 convolutional layer, and the output end of the convolutional layer is connected with the input end of the up-sampling module 4 in a communication manner; the output end of the up-sampling module 4 is used as the processing result output end of the human body analysis subsystem to output the human body analysis prediction graph obtained by the system operation.

Further, the feature extraction module FEM1 and the feature extraction module FEM2 each include: convolutional layer Conv11, convolutional layer Conv12, convolutional layer Conv13, convolutional layer Conv14, convolutional layer Conv15, convolutional layer Conv16, convolutional layer Conv17, and addition module 11;

an input of the convolutional layer Conv11 communicatively connected to an input of convolutional layer Conv12, a convolutional layer Conv13 input, and an input of convolutional layer Conv14, and acting as an input of feature extraction module FEM1 and an input of feature extraction module FEM 2; an output of the convolutional layer Conv11 is communicatively connected with an input of convolutional layer Conv 15; an output of the convolutional layer Conv12 is communicatively connected with an input of convolutional layer Conv 16; an output of the convolutional layer Conv13 is communicatively connected with an input of convolutional layer Conv 17; an output of the convolutional layer Conv14 is communicatively connected to an input a of the summing module 11, an output of the convolutional layer Conv15 is communicatively connected to an input B of the summing module 11, an output of the convolutional layer Conv16 is communicatively connected to an input C of the summing module 11, and an output of the convolutional layer Conv17 is communicatively connected to an input D of the summing module 11; the output end of the addition module 11 is used as the output end of the feature extraction module FEM1 and the output end of the feature extraction module FEM 2;

the convolutional layer Conv11 is a 3 × 3 void convolutional layer, and the void convolution rate is 3;

the convolutional layer Conv12 is a 3 × 3 void convolutional layer, and the void convolution rate is 8;

the convolutional layer Conv13 is a 3 × 3 void convolutional layer, and the void convolution rate is 12;

the convolutional layers Conv14, Conv15, Conv16 and Conv17 were all 1 × 1 convolutional layers.

Further, the adaptive attention module GAM1 and the adaptive attention module GAM2 each include: convolutional layer Conv21, convolutional layer Conv22, global mean pooling layer 21, global mean pooling layer 22, addition module 21, Softmax layer, and multiplication module 21;

the convolutional layer Conv21 is a 1 × 1 convolutional layer with inputs as input a of adaptive attention module GAM1 and input a of adaptive attention module GAM2, and outputs communicatively connected to inputs of the global mean pooling layer 21;

the convolutional layer Conv22 is a 1 × 1 convolutional layer with inputs as input B of adaptive attention module GAM1 and input B of adaptive attention module GAM2, and outputs communicatively connected to inputs of global mean pooling layer 22;

the output end of the global pooling layer 21 is in communication connection with the input end A of the adding module 21, and the output end of the global mean pooling layer 22 is in communication connection with the input end B of the adding module 21;

an output of the summing module 21 is communicatively coupled to an input of a Softmax layer;

an output of the Softmax layer is communicatively connected to an input of a multiplication module 21;

the output of the multiplication module 21 serves as the output of the adaptive attention module GAM1 and the output of the adaptive attention module GAM 2.

The adaptive attention module focuses on selectively extracting location information and fusing different levels of weighted attention features to achieve mutual information fusion. Featuring input data of adaptive attention module

Wherein C, H, W represents the number of characteristic channels, height and width, respectively, and i represents the ith operation. The inputs to the attention module are two different levels of profiles A and B, denoted respectively

And

feature(s)

And

after passing through the convolutional layer Conv21 and the convolutional layer Conv22, respectively, the number of channels is reduced to C/2;

newly acquired features

And

the number of channels is further reduced by the global mean pooling layer 21 and the global mean pooling layer 22, and the processing flow can be expressed as the following expression:

and

after the two feature maps a and B at different levels are processed as described above, the fusion is completed by the addition module 21 by adding elements, which is done to keep more residual attention weight information. Then, it is passed through a normalization operation so that the weight values are between (0, 1), which is implemented by the Softmax layer.

Such as formula

Shown;

finally, the original features are connected

And

as S ∈ R^2C×H×WMultiply it by element with the weight obtained from the previous operation to obtain the final weighted feature map, such as

As shown.

An attention mechanism guided progressive human body division analysis method comprises the following steps:

s1, obtaining human body images of the known corresponding two-classification significance prediction image and human body analysis prediction image from the big data platform to form a training data set and a test data set;

s2, training the attention mechanism guided progressive dividing human body analysis system through the training data set to obtain a trained attention mechanism guided progressive dividing human body analysis system;

s3, verifying the trained attention mechanism guided progressive dividing human body analysis system through the test data set to obtain a verified attention mechanism guided progressive dividing human body analysis system;

and S4, predicting and analyzing the human body image through the verified attention mechanism guided progressive dividing human body analysis system to obtain a two-classification significance prediction graph and a human body analysis prediction graph corresponding to the human body image.

Further, the step S2 includes the following steps:

s21, preprocessing the training data set;

s22, setting the initial parameters and training rules of the human body analysis system by the progressive division guided by the attention mechanism;

and S23, performing parameter iteration on each module in the attention mechanism guided progressive division human body analysis system according to the preprocessed training data set through a back propagation method.

Further, the step S21 includes the following steps: and carrying out random scaling processing of 0.5-1.5 on the data in the training data set and carrying out operations of cutting and left-right turning on the data in the training data set.

Further, the initial parameters and the training rules in step S22 include the following expressions:

L_APPNet＝L_parsing+αL_sailency(2)

α＝1 (3)

power＝0.9 (4)

base_lr＝0.007 (5)

wherein, formula 1 is a learning rate iteration rule, lr is a current learning rate, base _ lr is an initial learning rate, iter is a current iteration number, max _ iter is a total iteration number, and power is an index parameter; equation 2 is the loss function of the training rule, L_parsingFor the cross-entropy loss of the segmentation prediction graph and the segmentation annotation graph, L_sailencyThe cross entropy loss of the significance prediction graph and the real labeling graph is shown, and alpha is a proportion parameter used for balancing the segmentation loss and the significance loss.

The invention has the beneficial effects that: the system provided by the invention mainly explores the enhancement effect of significance detection on human body analysis and the effectiveness of attention mechanism on human body analysis. On the network structure, a feature extraction module is constructed, feature information is effectively extracted, multi-dimensional features are fused, and the human body analysis effect is enhanced; an adaptive attention module is designed to carry out position attention weighting on the features, and an effective fusion idea for fusing different levels of features is provided; finally, the significance detection and the human body analysis are integrated into an end-to-end network structure in a bottom-to-top mode, and the modules are applied to all branches, so that a unified effective structure is obtained. The performance exceeds the performance of the existing known method, and the optimal human body analysis effect is shown.

Drawings

FIG. 1 is a block diagram of an attention mechanism guided progressive partition body analysis system;

FIG. 2 is a block diagram of a feature extraction module architecture;

FIG. 3 is a block diagram of an adaptive attention module architecture;

FIG. 4 is a schematic flow chart of a human body analysis method using progressive segmentation guided by attention mechanism;

FIG. 5 is a graph showing the effect of the experiment.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate the understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and it will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined and defined in the appended claims, and all matters produced by the invention using the inventive concept are protected.

As shown in fig. 1: an attention mechanism guided progressive compartmentalization human body interpretation system comprising: the system comprises a residual error neural network ResNet-101, a significance detection subsystem and a human body analysis subsystem;

The significance detection subsystem includes: convolutional layer Conv1, convolutional layer Conv2, convolutional layer Conv3, convolutional layer Conv4, adaptive attention module GAM1, upsampling module 1, and upsampling module 2;

The human body analysis subsystem comprises: a feature extraction module FEM1, a feature extraction module FEM2, an adaptive attention module GAM2, an upsampling module 3, an upsampling module 4, an adding module 1, a convolutional layer Conv5, and a convolutional layer Conv 6;

As shown in fig. 2: the feature extraction module FEM1 and the feature extraction module FEM2 each include: convolutional layer Conv11, convolutional layer Conv12, convolutional layer Conv13, convolutional layer Conv14, convolutional layer Conv15, convolutional layer Conv16, convolutional layer Conv17, and addition module 11;

As shown in fig. 3: the adaptive attention module GAM1 and the adaptive attention module GAM2 each include: convolutional layer Conv21, convolutional layer Conv22, global mean pooling layer 21, global mean pooling layer 22, addition module 21, Softmax layer, and multiplication module 21;

The adaptive attention module focuses on selectively extracting location informationAnd fusing different levels of weighted attention features to achieve mutual information fusion. Featuring input data of adaptive attention module

And

feature(s)

And

newly acquired features

And

and

after the two feature maps a and B at different levels are processed as described above, the fusion is completed by the addition module 21 by adding elements, which is done to keep more residual attention weight information. Then, let it passAnd normalizing operation to make the weight value between (0, 1), wherein the operation is realized by a normalization module Softmax. Such as formula

Finally, the original features are connected

And

As shown.

As shown in fig. 4: an attention mechanism guided progressive human body division analysis method comprises the following steps:

in this example, three mainstream human body analytic data sets including LIP, CIHP, and PPSS were selected for the experiment.

LIP is a current maximum number of human body analysis data sets, which comprises 50462 pictures, wherein 30462 pictures are used for training, 10000 pictures are used for verification, and the remaining 10000 pictures are used for testing. The data set contains 20 categories in total, and most pictures contain only a single human body.

The CIHP is a data set for instance body analysis, each picture contains multiple instances, and the pictures are more complex and challenging compared with the existing mainstream data set. The data set contains 38280 pictures, 28280 pictures for training, 5000 pictures in the test set and 5000 pictures in the validation set, and 20 pictures in the category classification.

PPSS is a small human body analytic data set, mainly composed of pedestrian pictures, with the complexity of a real scene. The data set was collected from 171 video sequences and contained 3673 pictures in total. Wherein, the training set consists of the first 100 sequences, and the test set consists of the last 71 sequences. The data set contains a total of 8 categories.

The three data sets are selected to verify the adaptability and robustness of the system to different types of data sets, and the LIP and the CIHP both contain 20 classes, which belong to a complex multi-class analysis problem. Meanwhile, the CIHP comprises a plurality of examples, and the difficulty of analysis is increased. In addition, the PPSS is a data set with a small classification number, mainly consists of pedestrian pictures, has a different picture style from the first two data sets, and can be used for detecting the robustness of the system.

The step S2 includes the steps of:

s21, preprocessing the training data set;

The step S21 includes the following: and carrying out random scaling processing of 0.5-1.5 on the data in the training data set and carrying out operations of cutting and left-right turning on the data in the training data set. The significance labeling graph in the training data set is obtained by unifying non-background pixels in the labeling graph, and finally, the background class is marked by '0' and the edge is marked by '1'.

The initial parameters and the training rules in step S22 include the following expressions:

L_APPNet＝L_parsing+αL_sailency(2)

α＝1 (3)

power＝0.9 (4)

base_lr＝0.007 (5)

In the training process of this embodiment, different picture input sizes are adopted because of differences in data of the three platforms LIP, CIHP, and PPSS. For LIP, the input size is 473 × 473; for CIHP, the input size used is 512 × 512; for PPSS, the input size is 256 × 256. The three data set classification cases also have differences, the number of LIP and CIHP categories K is set to 20, and the number of PPSS categories K is 8.

The system provided by the invention is trained and verified on the three data sets mentioned in the steps. In the verification process, an edge label graph does not need to be generated. All experiments take the average cross-over ratio mIoU as an evaluation standard, and the formula is

Where K +1 represents the total number of categories of the data set (corresponding to the number of categories K), p_ijRepresenting class i is recognizedTotal number of pixels classified as class j, p_jiRepresenting the total number of pixels, p, for which class j is identified as class i_iiIndicating that the correct total number of pixels was identified. The experimental results show that the mIoU realized by the system on LIP, CIHP and PPSS is 54.08%, 59.88% and 60.2% respectively. The performance on all three data sets outperformed the existing methods. This proves that the system provided by the invention has effectiveness, robustness and universality in solving the human body analysis of the actual scene. Fig. 5 shows a comparison of the effect of the human segmentation map generated by the human body analysis system proposed by the present invention. In the verification process, in order to prove the effectiveness of the feature extraction module and the attention module provided by the invention, a series of experiments of eliminating the above modules from the original system are performed on the LIP data set, and the specific experimental results are shown in the following table, wherein GAM1 represents the attention module used in the significance detection subsystem, and GAM2 represents the attention module used in the human body analysis subsystem. A comparison of the segmentation maps generated with the original system is also shown in FIG. 5, in which CE2P is a paper (Tao Ruan, Ting Liu, Zilong Huang, Yunchao Wei, Shikui Wei, Yao zhao. device in the Details: Towards Accurate Single and Multiple HumanParsing [ C)]AAAI,2019: 4814-. The comparison shows that the two modules provided by the invention have outstanding enhancement effect and application value.

TABLE 1 comparison of the performance of mIoU of the present invention with the method described in each article

Claims

1. An attention mechanism guided progressive compartmentalization human body analysis system comprising: the system comprises a residual error neural network ResNet-101, a significance detection subsystem and a human body analysis subsystem;

the convolutional layer Conv3, the convolutional layer Conv4 and the up-sampling module 2 are used for processing the attention features extracted by the adaptive attention module GAM1 to obtain a two-classification significance prediction graph; the convolutional layer Conv3 is a 3 × 3 convolutional layer, the output of which is communicatively connected to the input of convolutional layer Conv 4; the convolutional layer Conv4 is a 1 × 1 convolutional layer, and the output end of the convolutional layer is in communication connection with the input end of the up-sampling module 2; the output end of the up-sampling module 2 is used as a processing result output end of the significance detection subsystem to output two classification significance prediction graphs obtained by the system operation; the human body analysis subsystem is used for carrying out human body analysis prediction on the deep-layer high-level characteristic diagram to obtain a human body analysis prediction diagram;

the convolutional layer Conv5, the convolutional layer Conv6 and the up-sampling module 4 are used for processing the attention characteristics obtained by adding the elements by the addition module 1 to obtain a human body analysis prediction graph; the convolutional layer Conv5 is a 3 × 3 convolutional layer, the output of which is communicatively connected to the input of convolutional layer Conv 6; the convolutional layer Conv6 is a 1 × 1 convolutional layer, and the output end of the convolutional layer is connected with the input end of the up-sampling module 4 in a communication manner; the output end of the up-sampling module 4 is used as the processing result output end of the human body analysis subsystem to output the human body analysis prediction graph obtained by the operation of the subsystem.

2. The attention mechanism-guided progressive segmentation human body analysis system as claimed in claim 1, wherein the feature extraction module FEM1 and the feature extraction module FEM2 each comprise: convolutional layer Conv11, convolutional layer Conv12, convolutional layer Conv13, convolutional layer Conv14, convolutional layer Conv15, convolutional layer Conv16, convolutional layer Conv17, and addition module 11;

3. The attention mechanism-guided progressive segmentation human body interpretation system of claim 1, wherein the adaptive attention module GAM1 and adaptive attention module GAM2 each comprise: convolutional layer Conv21, convolutional layer Conv22, global mean pooling layer 21, global mean pooling layer 22, addition module 21, Softmax layer, and multiplication module 21;

the output end of the global mean pooling layer 21 is in communication connection with the input end A of the adding module 21, and the output end of the global mean pooling layer 22 is in communication connection with the input end B of the adding module 21;

4. A human body analysis method by progressive division guided by attention mechanism is characterized by comprising the following steps:

5. The attention mechanism-guided progressive segmentation human body analysis method according to claim 4, wherein the step S2 includes the steps of:

s21, preprocessing the training data set;

6. The attention mechanism-guided progressive segmentation human body analysis method according to claim 5, wherein the step S21 includes the following steps: and carrying out random scaling processing of 0.5-1.5 on the data in the training data set and carrying out operations of cutting and left-right turning on the data in the training data set.

7. The attention mechanism-guided progressive segmentation human body analysis method according to claim 5, wherein the initial parameters and the training rules in the step S22 include the following expressions:

L_APPNet＝L_parsing+αL_sailency(2)

α＝1(3)

power＝0.9(4)

base_lr＝0.007(5)

wherein, the formula (1) is a learning rate iteration rule, lr is a current learning rate, base _ lr is an initial learning rate, iter is a current iteration number, max _ iter is a total iteration number, and power is an index parameter; equation (2) is a loss function of the training rule, L_parsingFor the cross-entropy loss of the segmentation prediction graph and the segmentation annotation graph, L_sailencyThe cross entropy loss of the significance prediction graph and the real labeling graph is shown, and alpha is a proportion parameter used for balancing the segmentation loss and the significance loss.