CN110807434B

CN110807434B - Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Info

Publication number: CN110807434B
Application number: CN201911078998.6A
Authority: CN
Inventors: 陈彬; 赵聪聪; 白雪峰; 于水; 胡明亮; 朴铁军
Original assignee: Weihai Ruowei Information Technology Co ltd
Current assignee: Weihai Ruowei Information Technology Co ltd
Priority date: 2019-11-06
Filing date: 2019-11-06
Publication date: 2023-08-15
Anticipated expiration: 2039-11-06
Also published as: CN110807434A

Abstract

A pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module; the parameter pre-training initialization module is used for carrying out parameter pre-training initialization network in the public data set to obtain a pedestrian re-identification network model; the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module; compared with the prior art, the invention has the beneficial effects that: in the aspect of human body analysis, the neural network model is designed in a mode of combining coarse granularity and fine granularity, so that human semantics of different layers are emphasized, pedestrian characteristics with more discrimination are extracted, and the accuracy is improved; and moreover, a loss function is designed by combining the knowledge distillation idea, so that the recognition time of pedestrian re-recognition is effectively reduced by optimizing the training of the network, and the efficiency is improved.

Description

Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Technical Field

The invention relates to the field of pedestrian re-identification, in particular to a pedestrian re-identification system based on human body analysis coarse-fine granularity combination.

Background

Faced with massive amounts of video, traditional manual analysis is highly labor-intensive, and prolonged viewing easily causes visual fatigue in workers and therefore errors. To address the problems of traditional manual search, attention has turned to using computer vision to search for pedestrians of interest in large volumes of video accurately and efficiently: Person Re-Identification technology in computer vision is used to analyze pedestrians in videos from different cameras, assisting or even replacing workers.

Unlike the face recognition technology which requires the cooperation of the person to be recognized and requires high-quality pictures, the technology does not require the cooperation of the person to be recognized and can recognize the low-resolution pedestrian image in a complex scene, and the appearance of the interested pedestrian in the monitoring camera network can be quickly inquired by the technology, so that the pedestrian recognition technology has wide application prospects in the intelligent security field, the man-machine interaction field and the new retail field.

Current research on pedestrian re-identification mainly focuses on extracting features from pedestrian pictures, seeking robust features that cope with the complex variations across camera scenes so that target pedestrians can be matched accurately. Research on traditional pedestrian re-identification methods falls into two areas: 1) feature representation learning, which handles the appearance changes of pedestrians under different camera angles by designing feature representations with a certain invariance with respect to pedestrian identity; 2) metric learning, which learns a mapping from high-dimensional features to a new feature space in which features of the same person are closer together and features of different persons are farther apart. In 2014, researchers introduced deep learning into the pedestrian re-identification field, so that feature representation learning and metric learning could be jointly optimized end to end by a convolutional neural network; performance exceeded that of traditional methods, and deep learning has gradually become the mainstream approach in the field.

Pedestrian re-identification has evolved from the two stages of traditional methods (feature extraction and metric learning) to end-to-end learning based on deep learning, which is data-driven and improves the robustness and discriminative power of features against the variations of pedestrian pictures across cameras. Current deep-learning-based methods achieve good results on most public data sets. However, the pedestrian pictures in those data sets are usually obtained by manual cropping and screening; in current pedestrian re-identification technology, the human body structure prior information obtained by pre-training on the large ImageNet data set often has a large domain deviation from the monitoring scene, and dividing pedestrian pictures according to incorrect prediction results increases the error rate of re-identification. In addition, regarding how much attention is paid to the features of different regions of the image to be identified, pedestrian re-identification often suffers from recognition defects caused by differing illumination, differing camera angles and the like.

Disclosure of Invention

The invention solves the problem of providing a pedestrian re-identification system and a pedestrian re-identification method based on the combination of human body analysis coarse-fine granularity, which can effectively enhance the accuracy and efficiency of pedestrian re-identification under different visual angles, postures and illumination changes.

The pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity is characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian characteristic extraction module, a human body re-recognition model loading module and a user retrieval module;

the parameter pre-training initialization module is used for initializing a network by parameter pre-training in the public data set to obtain a pedestrian re-identification network model;

the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;

the user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module;

the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding and processing the video data uploaded by the monitoring video data reading module into a processable image; the image preprocessing sub-module is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;

the pedestrian characteristic extraction module is used for designing a coarse-fine granularity combined neural network, and coarse-granularity branches and fine-granularity branches in the coarse-fine granularity combined neural network are used for respectively extracting pedestrian characteristics of the video decoded image and the human body image to be searched and storing the images;

the human body re-recognition model loading module is used for carrying out retrieval matching by utilizing the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body images to be retrieved, and calculating to obtain the similarity.

In the above technical solution, further, the user search module is further configured to set a similarity threshold. By setting the similarity threshold, the pedestrian pictures with different degrees of similarity can be identified, so that the identification standard is more flexible.

In the above technical solution, further, the human body re-recognition model loading module is further configured to feed back the calculated similarity to the user retrieval module.

A method of a pedestrian re-identification system as described, comprising the steps of:

step A: performing parameter pre-training on the public data set to initialize a network and obtain a pedestrian re-identification network model;

step B: uploading and reading video data in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format and image preprocessing is carried out; a designed neural network model combining coarse and fine granularity is then used, the model comprising a coarse-granularity branch and a fine-granularity branch; for the coarse-granularity branch, a knowledge distillation loss function is used to enhance the extraction of global features, and for the fine-granularity branch, a knowledge distillation loss function and a triplet loss function are used to enhance the extraction of detail features; the learned features are spliced to obtain a pedestrian feature set f_i; an SE Block is then used to learn a feature vector importance weight W to selectively enhance the features with strong discrimination and inhibit the features with weak discrimination;

W = Sigmoid(FC(ReLU(FC(f_i))))

wherein the two FC layers, from inside to outside, are used for compression and activation;

after the importance weight W of the pedestrian feature vector is obtained, the pedestrian feature f_o is output:

f_o = f_i * W + f_i

and stored;

step C: uploading the human body image to be searched in the searching module, and calculating and outputting the pedestrian characteristic of the human body image to be searched by utilizing the step B;

step D: according to the pedestrian features of the human body image to be searched, the pedestrian re-recognition network model extracts and detects the pedestrian features in the video decoded images and calculates their similarity; if the similarity is higher than a threshold value, the result is stored, and the results are returned sorted by similarity.

In the above technical solution, further, in step B, the picture format may be JPG or PNG. Supporting multiple picture formats broadens the range of adaptation.

In the above technical solution, further, in step B, the video data is from a monitoring camera.

In the above technical solution, further, in step B, the image preprocessing refers to performing distortion processing on the image. The quality of the image is improved, and the influence of interference information on the extraction of pedestrian characteristics is reduced.

In the above technical solution, further, in step B, the pedestrian feature f_o is stored in a mat file, which facilitates later queries.

In the above technical solution, in step D, the decoded video image is detected using FPN-Person.

In the above technical solution, further, in step D, if the decoded image of the video has already undergone pedestrian feature extraction, CFNet is used to extract pedestrian features from the uploaded image of the human body to be retrieved, and the pedestrian features appearing in the corresponding mat file are read.

Compared with the prior art, the invention has the beneficial effects that: in the aspect of human body analysis, the neural network model is designed in a mode of combining coarse granularity and fine granularity, and human body semantics of different layers are emphasized, so that pedestrian characteristics with more discrimination are extracted, and the accuracy is improved; and moreover, the loss function is designed by combining knowledge distillation thought, so that the recognition time of pedestrian re-recognition is effectively reduced and the efficiency is improved by optimizing the training of the network.

Drawings

Fig. 1 is a block diagram of a pedestrian re-recognition system according to the present invention.

Fig. 2 is a flow chart of a method of the pedestrian re-recognition system according to the present invention.

Fig. 3 is a schematic illustration of human semantic attention in the method of the pedestrian re-recognition system of the present invention.

Fig. 4 is a schematic diagram of a pedestrian re-recognition network model of the pedestrian re-recognition system according to the present invention.

Fig. 5 is a schematic diagram of three classification of similarity information in the pedestrian re-recognition system according to the present invention.

Fig. 6 is a schematic diagram of a user search module starting process in the pedestrian re-recognition system according to the present invention.

Detailed Description

The invention is further described in the following examples with reference to the accompanying drawings.

As shown in Figs. 1-6, a pedestrian re-recognition system based on human body analysis coarse-fine granularity combination comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module;

the pedestrian re-recognition network model is used for computing the image feature similarity between the uploaded pedestrian image to be searched and the video data;

the monitoring video data reading module is responsible for managing the input and output of images and video data, including reading the pedestrian picture to be searched uploaded by the user and the monitoring video data for the designated time period and camera number.

The user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module; after the user uploads the pedestrian picture to be queried, specifies the video to be compared and searched, and clicks the query button, the module reads and displays the uploaded pedestrian picture to be searched, reads the video data for the user-specified time period and camera number, and, once system processing is finished, stores and returns the processed result.

video decoding is mature prior art and is not described in detail in this embodiment;

the main purpose of image preprocessing is to eliminate irrelevant information in the image, recover useful real information, enhance the detectability of relevant information, simplify data to the maximum extent, and thereby improve the reliability of feature extraction, image segmentation, matching and recognition.

The invention relates to a method for a pedestrian re-identification system, which comprises the following steps:

first, the network is initialized by parameter pre-training on the large public ImageNet data set. Neural network models generally rely on stochastic gradient descent for training and parameter updating; the final performance of the network is directly related to the solution reached at convergence, and the convergence result in fact depends heavily on the initialization of the network parameters. A good parameter initialization lets model training achieve twice the result with half the effort, whereas a poor initialization scheme not only affects network convergence but can also lead to vanishing or exploding gradients. During parameter pre-training initialization, Batch Normalization (BN) is used to transform the distribution of the input data toward a Gaussian distribution, which ensures that the input of each layer of the neural network keeps the same distribution. Without it, the distribution gradually shifts as the number of network layers increases, so convergence is slow and the overall distribution approaches the upper and lower limits of the value range of the nonlinear function, which causes the gradient to vanish during back propagation. BN uses normalization to pull the distribution of the input values of every neuron in each layer back to a standard normal distribution with mean 0 and variance 1, so that the inputs to the activation fall in the region where the nonlinear function is more sensitive; the gradients become larger and convergence is greatly accelerated;

after the pedestrian re-identification network model is obtained, video data is uploaded and read in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format, and image preprocessing is then performed. In this embodiment, the image preprocessing may be distortion processing of the image: when the image preprocessing submodule performs the preprocessing operation, an image enhancement operation is used to enhance the useful information in the image. This is a distortion process whose purpose is to improve the visual effect of the image. For the application of a given image, it purposefully emphasizes the global or local characteristics of the image, makes an originally unclear image clear or emphasizes certain features of interest, enlarges the differences between the features of different objects in the image, and suppresses features that are not of interest, thereby improving image quality, enriching the information content, strengthening image interpretation and recognition, and meeting the needs of analysis;
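The normalization step described above can be illustrated with a minimal sketch. It assumes scalar activations in a single batch; the names `batch_norm`, `gamma`, `beta` and `eps` are illustrative choices (the usual learnable scale, shift and numerical-stability constant) and are not taken from the patent.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalar activations to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def sigmoid_grad(x):
    """Derivative of the sigmoid, largest near 0 and tiny in saturated regions."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Activations that have drifted toward the saturated region of a sigmoid.
drifted = [8.0, 9.0, 10.0, 11.0, 12.0]
normalized = batch_norm(drifted)

# The middle activation moves from x = 10 (near-zero gradient) to x = 0.
grad_before = sigmoid_grad(drifted[2])
grad_after = sigmoid_grad(normalized[2])
```

Comparing `grad_before` with `grad_after` shows why pulling the inputs back toward zero mean keeps the gradient from vanishing.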

a Coarse-Fine granularity combined neural network model (CFNet) is designed so that the network extracts pedestrian features at different granularities. ResNet-50 is selected as the backbone network, and the part after the Res Block2 convolution module is divided into two types of branches: a coarse-granularity branch (Coarse Branch) and a fine-granularity branch (Fine Branch). The fine-granularity branch is further divided into two sub-branches: an upper body branch and a lower body branch. As shown in Fig. 3, the human body analysis attention mechanism operation is performed. Geometric transformation using the acquired human body analysis key points computes the regions shared across viewing angles between two pedestrian images, and this has a certain similarity to the currently popular attention maps. Therefore, the probability maps of 20 human body parts are combined to generate human semantic attention maps of 7 body regions at different levels:

M_{Shoes} = {Socks, LeftShoe, RightShoe},
M_{Head} = {Hat, Hair, Sunglasses, Face},
M_{Upper body} = {Glove, UpperClothes, Coat, Scarf, LeftArm, RightArm},
M_{Lower body} = {Dress, Pants, Jumpsuits, Skirt, LeftLeg, RightLeg},
M_{Upper half} = M_{Upper body} + M_{Head},
M_{Lower half} = M_{Lower body} + M_{Shoes},
M_{Whole body} = M_{Upper half} + M_{Lower half}.

With these semantic attention maps, different parts of the human body can be located. The outputs of different layers of a convolutional neural network carry different semantic information, so an attention-like mechanism is used to combine the human semantic attention maps with the features of different layers of the convolutional network at different stages, making the network focus on local regions of the body: a more macroscopic semantic map is provided at the shallow layers to capture more detail features, and progressively higher-level semantic information is provided at the deep layers to enhance the capture of abstract features. The formal definition is given by the following formula:

F_attention = F_i * M + F_i

wherein M ∈ {M_{Whole body}, M_{Upper half}, M_{Lower half}, M_{Upper body}, M_{Lower body}, M_{Head}, M_{Shoes}} is the semantic attention map for the different levels, F_i is the feature map output by each layer of the network, and F_attention is the feature map with enhanced attention to the local region;

when the model cannot output a good segmentation result at very low resolution, M approaches 0, so that F_attention approaches F_i. In this way, poor segmentation results do not produce a negative effect, while good segmentation results provide sufficient information to improve recognition accuracy. The semantic attention generated by human body analysis is combined with the network through an attention-like mechanism; compared with other methods, this makes full use of the human body prior information without harming model performance;
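The construction of the seven semantic attention maps and the residual attention formula can be sketched as follows. This is only an illustration under stated assumptions: maps are flattened to per-pixel lists, part probability maps are merged by a clipped element-wise sum (the patent does not specify the merge operator), and the names `PART_GROUPS`, `add_maps`, `build_attention_maps` and `apply_attention` are hypothetical.

```python
# Part names follow the groups listed in the description.
PART_GROUPS = {
    "shoes": ["Socks", "LeftShoe", "RightShoe"],
    "head": ["Hat", "Hair", "Sunglasses", "Face"],
    "upper_body": ["Glove", "UpperClothes", "Coat", "Scarf", "LeftArm", "RightArm"],
    "lower_body": ["Dress", "Pants", "Jumpsuits", "Skirt", "LeftLeg", "RightLeg"],
}

def add_maps(*maps):
    """Element-wise sum of flattened probability maps, clipped to [0, 1] (assumed merge rule)."""
    return [min(1.0, sum(pixel)) for pixel in zip(*maps)]

def build_attention_maps(part_probs, num_pixels):
    """part_probs maps a part name to its flattened per-pixel probability list."""
    zeros = [0.0] * num_pixels
    maps = {}
    for region, parts in PART_GROUPS.items():
        maps[region] = add_maps(zeros, *(part_probs.get(p, zeros) for p in parts))
    maps["upper_half"] = add_maps(maps["upper_body"], maps["head"])
    maps["lower_half"] = add_maps(maps["lower_body"], maps["shoes"])
    maps["whole_body"] = add_maps(maps["upper_half"], maps["lower_half"])
    return maps

def apply_attention(feature_map, attention_map):
    """F_attention = F_i * M + F_i, element-wise."""
    return [f * m + f for f, m in zip(feature_map, attention_map)]

# Toy 4-pixel image: hair, coat, pants, background.
part_probs = {
    "Hair": [0.9, 0.0, 0.0, 0.0],
    "Coat": [0.0, 0.8, 0.1, 0.0],
    "Pants": [0.0, 0.1, 0.7, 0.0],
}
maps = build_attention_maps(part_probs, num_pixels=4)

features = [0.5, -1.2, 2.0, 0.3]
enhanced = apply_attention(features, maps["whole_body"])
unchanged = apply_attention(features, [0.0] * 4)  # failed segmentation: M = 0
```

With an all-zero map the features pass through untouched, which is the property the description relies on when segmentation fails at low resolution.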

in the training process of the pedestrian re-recognition network model, many works treat it as a classification task and train with a cross-entropy loss function that uses One-Hot encoded labels. However, a one-hot encoded label generally contains no similarity information between categories.

For the pedestrian re-identification task, the training stage treats it as a classification task and predicts with the cross-entropy loss function using one-hot encoded labels; in the testing stage the classification layer is discarded, and the feature vector after the global pooling layer is used directly as the pedestrian feature representation for similarity calculation. The goals of training and testing therefore differ considerably: the ultimate goal of pedestrian re-identification is to judge the similarity of pictures of different pedestrians of unknown identity, not simply to classify the training set. One-hot encoding marks the class to which a sample belongs as 1 and all other classes as 0, which ignores the similarity information between pedestrian pictures and is prone to overfitting on the training set, so this approach may not be optimal. Drawing on the idea of knowledge distillation, we aim to introduce more similarity information in the training stage to optimize network training and thereby reduce the gap between training and testing; we propose a knowledge distillation loss function (Knowledge Distillation Loss) to improve on the cross-entropy loss function with one-hot encoded labels.

First, CFNet is trained as a teacher model for classification on the re-identification data set, so as to predict soft labels containing pedestrian picture similarity information; the model is then retrained with a knowledge distillation loss function formed from the soft labels and the one-hot encoded labels, whose mathematical expression is shown in the following formula:

wherein H(·) is the cross entropy, p_t is the soft label output by the teacher model, p_s is the standard softmax output of the student model, τ is the temperature parameter controlling the smoothness of the probability distribution, and α is the balance factor weighting the two terms.
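The expression itself is not legible in this text, so the sketch below assumes one common form of a distillation loss built from the symbols defined above: α·H(y, p_s) + (1 − α)·H(p_t, p_s^τ), where y is the one-hot label and the superscript τ denotes a softmax taken at temperature τ. The exact weighting used in the patent may differ; the function names and the default values of `tau` and `alpha` are illustrative.

```python
import math

def softmax(logits, tau=1.0):
    """Softmax with temperature tau; larger tau gives a smoother distribution."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def kd_loss(student_logits, teacher_logits, one_hot, tau=4.0, alpha=0.5):
    """Assumed form: alpha * H(y, p_s) + (1 - alpha) * H(p_t, p_s at temperature tau)."""
    p_s = softmax(student_logits)             # standard softmax of the student
    p_s_soft = softmax(student_logits, tau)   # softened student output
    p_t = softmax(teacher_logits, tau)        # soft label from the teacher
    hard_term = cross_entropy(one_hot, p_s)
    soft_term = cross_entropy(p_t, p_s_soft)
    return alpha * hard_term + (1.0 - alpha) * soft_term

teacher = [5.0, 1.0, 0.0]
label = [1.0, 0.0, 0.0]
loss_agree = kd_loss([5.0, 1.0, 0.0], teacher, label)
loss_disagree = kd_loss([0.0, 1.0, 5.0], teacher, label)
```

A student whose logits match the teacher receives the smaller loss, and setting `alpha=1.0` reduces the expression to plain cross entropy with the one-hot label.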

Meanwhile, in order to enable the network to learn complementary features, different branches are learned by using different loss functions so as to focus on feature extraction in different aspects. For coarse-grained branches, knowledge distillation loss functions are adopted to pay attention to global feature extraction; for fine-grained branches, knowledge distillation loss functions and triplet loss functions are employed to enhance extraction of detail features.
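For the fine-granularity branch, the triplet loss term can be sketched as below. The patent does not give the distance metric or margin, so Euclidean distance and `margin=0.3` are assumptions, and the function names are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
same_person = [0.1, 0.0]
easy_negative = [1.0, 0.0]   # far from the anchor: no penalty
hard_negative = [0.2, 0.0]   # nearly as close as the positive: penalized

easy = triplet_loss(anchor, same_person, easy_negative)
hard = triplet_loss(anchor, same_person, hard_negative)
```

The loss is zero once a different person's feature is farther from the anchor than the same person's feature by at least the margin, which is what pushes the branch toward fine, identity-specific detail.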

The pedestrian feature extraction process is as follows: first, a Backbone network (ResNet-50 is adopted as the Backbone network in this embodiment) is trained on a pedestrian re-identification data set, with a cross-entropy loss based on pedestrian ID; then, the Backbone network is combined with the obtained pedestrian component region prediction results to obtain pedestrian component feature maps, by point-wise multiplying the feature map of the Backbone network with the feature map of the pedestrian component prediction region; global average pooling is applied to the feature map output by the Backbone network, the pedestrian component feature maps and the component region feature maps to obtain the global feature, the component region feature vectors and the component visible probabilities; the component region feature vectors and the component visible probabilities are passed through a 1×1 convolution to obtain the component feature weights, which are point-wise multiplied with the component region feature vectors to obtain the final component local features;
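The masking and pooling steps in this process can be sketched as follows. The sketch assumes a feature map stored as a list of channels, each a flattened list of pixels, and a single part mask of the same spatial size; the 1×1 convolution that produces the component feature weights is omitted, and all names are illustrative.

```python
def global_average_pool(feature_map):
    """Average every channel over its pixels, giving one value per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]

def part_feature(feature_map, part_mask):
    """Point-wise multiply each channel by the part mask, then pool."""
    masked = [[v * m for v, m in zip(channel, part_mask)] for channel in feature_map]
    return global_average_pool(masked)

def visible_probability(part_mask):
    """Pooled mask value, used here as a stand-in for the component visible probability."""
    return sum(part_mask) / len(part_mask)

# Two channels over four pixels; the part occupies the last two pixels.
feature_map = [[1.0, 2.0, 3.0, 4.0],
               [0.0, 0.0, 4.0, 4.0]]
part_mask = [0.0, 0.0, 1.0, 1.0]

global_feature = global_average_pool(feature_map)
part_vector = part_feature(feature_map, part_mask)
visibility = visible_probability(part_mask)
```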

after the pedestrian features learned by the different branches are spliced, a feature selection module (Feature Select Module, FSM) highlights the more discriminative features and yields the final pedestrian feature representation. Directly splicing the feature vectors of different branches may ignore the differing importance of different features. Inspired by Hu et al., we consider that the elements of the learned pedestrian feature vector differ in importance, and use SE Block to learn a feature vector importance weight W that selectively enhances the features with strong discrimination and inhibits the features with weak discrimination. This operation is shown in the following formula:

W = Sigmoid(FC(ReLU(FC(f_i))))

wherein the two FC layers, from inside to outside, perform the compression and activation operations. After the feature vector importance weight W is obtained, the output feature f_o is calculated as follows:

f_o = f_i * W + f_i

where the * and + operations are element-wise; adding the enhanced feature to the original feature vector further strengthens the discriminative power of the feature.

In order to make the two types of branches attend to information of different granularities of the human body, human semantic attention maps (Semantic Attention Map, SAM) of different levels are generated by a human body analysis model, providing different semantic information to guide the learning of the network in the different branches. In addition, based on an analysis of the shortcomings of the cross-entropy loss function commonly used when training pedestrian re-identification models, the knowledge distillation idea is adopted to design a knowledge distillation loss function (Knowledge Distillation Loss, KD Loss) that provides the network with soft labels containing pedestrian identity similarity information to optimize model training. Meanwhile, to make the pedestrian features learned by the two types of branches as complementary as possible, for the coarse-granularity branch we supervise with the knowledge distillation loss function alone so that it focuses on extracting global features, and for the fine-granularity branch we supervise jointly with the triplet loss function (Triplet Loss) and the knowledge distillation loss function to strengthen the attention of the network to fine-grained features. Fig. 4 is a schematic diagram of the pedestrian re-recognition network model based on the combination of human body analysis coarse-fine granularity.
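The feature selection step, W = Sigmoid(FC(ReLU(FC(f_i)))) followed by f_o = f_i * W + f_i, can be sketched with plain lists. The layer sizes (8 reduced to 2 and back to 8), the random weights and the helper names are assumptions for illustration; in a trained network the FC weights would be learned.

```python
import math
import random

def fc(weights, bias, x):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def se_select(f_i, w1, b1, w2, b2):
    """Return (f_o, W) with W = Sigmoid(FC(ReLU(FC(f_i)))) and f_o = f_i * W + f_i."""
    weight = sigmoid(fc(w2, b2, relu(fc(w1, b1, f_i))))
    f_o = [f * g + f for f, g in zip(f_i, weight)]
    return f_o, weight

random.seed(0)
dim, reduced = 8, 2  # assumed sizes: compress 8 -> 2, then expand 2 -> 8
w1 = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(reduced)]
b1 = [0.0] * reduced
w2 = [[random.uniform(-0.5, 0.5) for _ in range(reduced)] for _ in range(dim)]
b2 = [0.0] * dim

f_i = [random.uniform(-1.0, 1.0) for _ in range(dim)]
f_o, weight = se_select(f_i, w1, b1, w2, b2)
```

Because every weight lies strictly between 0 and 1, each output element keeps the sign of the input and is scaled by a factor between 1 and 2, so no feature is erased and the more important ones are amplified most.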

Uploading the human body image to be searched in the searching module, and calculating and outputting the pedestrian characteristic of the human body image to be searched by utilizing the step B;

Based on the pedestrian features of the human body image to be searched, the pedestrian re-recognition network model then uses FPN-Person to detect the pedestrians appearing in the video data at a certain frame interval.

After detection is finished, for monitoring video data that is being queried for the first time, CFNet is used to extract features from both the pedestrian picture to be searched uploaded by the user and the pedestrian pictures obtained by detection, and the extracted features are stored in a mat file to facilitate subsequent queries.

For monitoring video data whose features have already been extracted, CFNet is used only to extract the features of the pedestrian picture to be searched uploaded by the user, and the pedestrian features of the corresponding video are then read directly from the mat file.
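The extract-once, read-later behaviour described in the two paragraphs above can be sketched as a small cache. To stay within the standard library, this sketch swaps the mat file for a JSON file; `load_or_extract`, `fake_extract` and the video identifier are hypothetical names, and `fake_extract` merely stands in for detection plus CFNet feature extraction.

```python
import json
import os
import tempfile

def load_or_extract(video_id, cache_dir, extract_fn):
    """Return (features, was_cached); extract and store only on the first query."""
    path = os.path.join(cache_dir, f"{video_id}.json")  # JSON stands in for the mat file
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh), True
    features = extract_fn(video_id)
    with open(path, "w") as fh:
        json.dump(features, fh)
    return features, False

calls = []

def fake_extract(video_id):
    """Placeholder for pedestrian detection and feature extraction on a video."""
    calls.append(video_id)
    return {"person_0": [0.1, 0.2], "person_1": [0.3, 0.4]}

cache_dir = tempfile.mkdtemp()
first, first_cached = load_or_extract("cam3_0800", cache_dir, fake_extract)
second, second_cached = load_or_extract("cam3_0800", cache_dir, fake_extract)
```

The second call returns the stored features without invoking the extractor again, so only the query picture needs fresh feature extraction on repeat searches.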

Calculating the similarity between the features of the pedestrians to be searched and the features of the detected pedestrian pictures, and returning the pedestrian pictures with the similarity larger than a given threshold value to the user after sorting according to the similarity;
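The thresholding and sorting step can be sketched as follows. The patent does not name the similarity measure, so cosine similarity is an assumption here, as are the function names, the gallery keys and the default threshold.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, gallery, threshold=0.5, top_k=None):
    """Keep gallery entries above the threshold, sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in gallery.items()]
    kept = [item for item in scored if item[1] > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if top_k is None else kept[:top_k]

gallery = {
    "cam1_frame10": [1.0, 0.0],
    "cam1_frame20": [0.8, 0.6],
    "cam2_frame05": [0.0, 1.0],
}
results = retrieve([1.0, 0.1], gallery, threshold=0.5)
names = [name for name, _ in results]
```

Raising `threshold` narrows the result list to closer matches, and `top_k` caps how many results are returned.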

for similarity calculation, as shown in Fig. 5, when a model is trained on the three-class task of car, horse and zebra, the labels are [1,0,0], [0,1,0] and [0,0,1], and the prediction of a trained network is usually a probability distribution generated by the Softmax function, whose basic form is shown in the formula:

p_i = exp(z_i) / Σ_j exp(z_j)

where z is the logits value output by the last layer of the network, and p is the probability value of the corresponding class after processing by the softmax function.

For the car in Fig. 5 a) the network may predict a class probability distribution of [0.95, 0.03, 0.02], for the horse in Fig. 5 b) [0.06, 0.73, 0.21], and for the zebra in Fig. 5 c) [0.09, 0.19, 0.72]. The predicted distribution for Fig. 5 b) shows that the picture has a probability of 0.21 of being a zebra and 0.06 of being a car, indicating that the horse looks more like a zebra than like a car; this predicted value contains similarity information between classes. The final purpose of the pedestrian re-identification task is to identify by comparing the feature similarity of pedestrian pictures, and the analysis above shows that training with one-hot encoding ignores the similarity information between pedestrian identities; drawing on the knowledge distillation idea, we introduce labels containing pedestrian similarity information to optimize the training and feature extraction of the network.

The user retrieval module is responsible for user query interaction, including uploading the pedestrian picture to be searched, designating the time period and camera number, and displaying and browsing the search results. The user can select a pedestrian picture to upload, specify the time period and camera number to search, and finally view and browse the search results returned by the system; the implementation flow chart of the module is shown in Fig. 6;

the user firstly selects the pedestrian picture to be inquired through the Choose File button, and then reads in and displays the pedestrian picture through the input and output module.

And then selecting the camera number to be queried from the camera list, and designating the time period to be queried in the time input box.

After clicking the query button, the monitoring video reading module reads video data under the appointed time period and the camera number and sends the video data to the video image analysis module, the pedestrian characteristic extraction module and the human body re-identification model loading module;

and finally, displaying the result returned by the human body weight recognition model loading module on a search result display interface for browsing by a user, and displaying the pedestrian pictures which are arranged in the first 30 bits from the large to the small according to the similarity in the search library under the serial number of the camera designated by the user and are more than a given threshold value as the search result.
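
The selection of search results described above (similarity above a threshold, sorted from high to low, at most 30 pictures) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the picture identifiers and the similarity scores are hypothetical, and the sketch says nothing about how the similarity itself is computed.

```python
def select_results(similarities, threshold, top_k=30):
    """Keep pictures whose similarity exceeds the threshold,
    sort them from most to least similar, and return at most top_k."""
    kept = [(image_id, score)
            for image_id, score in similarities.items()
            if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical similarities for pictures from the camera chosen by the user.
gallery_scores = {
    "cam3_0001.jpg": 0.91,
    "cam3_0002.jpg": 0.42,
    "cam3_0003.jpg": 0.77,
    "cam3_0004.jpg": 0.65,
}

results = select_results(gallery_scores, threshold=0.6)
for image_id, score in results:
    print(image_id, score)
```

Filtering before sorting keeps the sort small when most gallery pictures fall below the threshold.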

The present invention is not limited to the above-described embodiments, and various changes can be made by those skilled in the art within the scope of the present invention without departing from the spirit of the present invention.

Claims

1. A pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity, characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module; the parameter pre-training initialization module is used for pre-training on a public data set to initialize the network parameters and obtain a pedestrian re-identification network model;

the user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module; the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, wherein the video decoding submodule is used for decoding and processing the video data uploaded by the monitoring video data reading module into a processable image; the image preprocessing sub-module is used for improving the visual effects of the images after video decoding and the human body images to be retrieved;

the pedestrian feature extraction module is used for designing a coarse-fine granularity combined neural network, in which coarse-granularity branches and fine-granularity branches respectively extract pedestrian features from the video decoded image and from the human body image to be retrieved, and for storing them; a human body analysis attention mechanism operation locates different parts of the human body through semantic attention and combines the human body semantic attention with the features of different layers of the convolution network at different stages, so that the network pays attention to local areas of the human body; for coarse-granularity branches, a knowledge distillation loss function is adopted to enhance the extraction of global features, and for fine-granularity branches, a knowledge distillation loss function and a triplet loss function are adopted to enhance the extraction of detail features; the learned features are spliced to obtain a pedestrian feature set f_i; then, an SEBlock is used to learn a feature vector importance weight W to selectively enhance features with strong discrimination and suppress features with weak discrimination;

2. The pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity according to claim 1, wherein the user retrieval module is further configured to set a similarity threshold.

3. The pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity according to claim 1, wherein the human body re-recognition model loading module is further configured to feed the calculated similarity back to the user retrieval module.

4. A method of a pedestrian re-identification system as in claim 1, comprising the steps of:

Step B: video data is uploaded and read in the monitoring video data reading module; the video decoding submodule decodes the video data and processes it into a usable picture format, and image preprocessing is performed on the resulting pictures; pedestrian features are extracted using the designed neural network model combining coarse granularity and fine granularity; a human body analysis attention mechanism operation is performed, in which different parts of the human body are located through semantic attention and the human body semantic attention is combined with the features of different layers of the convolution network at different stages, so that the network pays attention to local areas of the human body; the neural network model combining coarse granularity and fine granularity comprises coarse-granularity branches and fine-granularity branches; for coarse-granularity branches, a knowledge distillation loss function is adopted to enhance the extraction of global features, and for fine-granularity branches, a knowledge distillation loss function and a triplet loss function are adopted to enhance the extraction of detail features; the learned features are spliced to obtain a pedestrian feature set f_i; then, an SEBlock is used to learn a feature vector importance weight W to selectively enhance features with strong discrimination and suppress features with weak discrimination;

W = Sigmoid(FC(ReLU(FC(f_i))))

after the importance weight W of the pedestrian feature vector is obtained, the pedestrian feature f₀ is output as

f₀ = f_i * W + f_i

and stored;

Step D: the pedestrian re-recognition network model detects pedestrians in the video decoded image, extracts their features, and calculates their similarity to the pedestrian features of the human body image to be retrieved; if the similarity is higher than a threshold value, the result is stored, and the results are returned ranked by similarity.

5. The method of a pedestrian re-recognition system according to claim 4, wherein in step B, the picture format may be JPG or PNG.

6. The method of a pedestrian re-identification system as in claim 4 wherein in step B, the video data is from a monitoring camera.

7. The method of a pedestrian re-recognition system according to claim 4, wherein in the step B, the image preprocessing means a distortion process of the image.

8. The method of a pedestrian re-identification system as set forth in claim 4, wherein,

in step B, the pedestrian feature f₀ is stored in a mat file.

9. The method of a pedestrian re-identification system as set forth in claim 4, wherein,

in step D, the video decoded image is detected using FPN-Person.

10. The method of a pedestrian re-identification system of claim 8 wherein,

in step D, if the decoded image of the video has been subjected to pedestrian feature extraction, the CFNet is used to extract pedestrian features from the uploaded human body image to be searched, and the pedestrian features appearing in the corresponding mat file are read.
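
The feature reweighting recited in claims 1 and 4, W = Sigmoid(FC(ReLU(FC(f_i)))) followed by f₀ = f_i * W + f_i, can be illustrated with a minimal pure-Python sketch. It is not the claimed implementation: the layer sizes, the weight values and all function names are assumptions chosen for illustration, and a real SEBlock would learn its weights during training.

```python
import math


def fc(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def se_reweight(f_i, w1, b1, w2, b2):
    """Return (W, f_o) with W = Sigmoid(FC(ReLU(FC(f_i))))
    and f_o = f_i * W + f_i (element-wise)."""
    w = sigmoid(fc(relu(fc(f_i, w1, b1)), w2, b2))
    f_o = [v * g + v for v, g in zip(f_i, w)]
    return w, f_o


# Hypothetical 4-dimensional feature, squeezed to 2 dimensions and expanded back.
f_i = [1.0, -2.0, 0.5, 3.0]
w1 = [[0.2, -0.1, 0.4, 0.3],
      [-0.3, 0.5, 0.1, 0.2]]            # 4 -> 2
b1 = [0.0, 0.1]
w2 = [[0.6, -0.2], [0.1, 0.3],
      [-0.4, 0.5], [0.2, 0.2]]          # 2 -> 4
b2 = [0.0, 0.0, 0.0, 0.0]

W, f_o = se_reweight(f_i, w1, b1, w2, b2)
print("W   =", W)
print("f_o =", f_o)
```

Because every entry of W lies strictly between 0 and 1, each output component keeps the sign of the input component and is scaled by a factor between 1 and 2; the residual term + f_i ensures that no feature is suppressed below its original magnitude.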