CN110807434B - Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination - Google Patents



Publication number
CN110807434B
CN110807434B CN201911078998.6A CN201911078998A CN110807434B CN 110807434 B CN110807434 B CN 110807434B CN 201911078998 A CN201911078998 A CN 201911078998A CN 110807434 B CN110807434 B CN 110807434B
Authority
CN
China
Prior art keywords
pedestrian
human body
module
image
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911078998.6A
Other languages
Chinese (zh)
Other versions
CN110807434A (en
Inventor
陈彬
赵聪聪
白雪峰
于水
胡明亮
朴铁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weihai Ruowei Information Technology Co ltd
Original Assignee
Weihai Ruowei Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weihai Ruowei Information Technology Co ltd filed Critical Weihai Ruowei Information Technology Co ltd
Priority to CN201911078998.6A priority Critical patent/CN110807434B/en
Publication of CN110807434A publication Critical patent/CN110807434A/en
Application granted granted Critical
Publication of CN110807434B publication Critical patent/CN110807434B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A pedestrian re-recognition system based on human body analysis with combined coarse and fine granularity comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module. The parameter pre-training initialization module pre-trains parameters on a public data set to initialize the network and obtain a pedestrian re-identification network model. The monitoring video data reading module uploads and reads video data and sends the video data to the video image analysis module. Compared with the prior art, the invention has the following beneficial effects: for human body analysis, the neural network model is designed by combining coarse granularity and fine granularity, so that human semantics at different levels are emphasized, more discriminative pedestrian features are extracted, and accuracy is improved; in addition, the loss function is designed using the idea of knowledge distillation, so that optimizing the training of the network effectively reduces the recognition time of pedestrian re-recognition and improves efficiency.

Description

Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
Technical Field
The invention relates to the field of pedestrian re-identification, in particular to a pedestrian re-identification system based on human body analysis coarse-fine granularity combination.
Background
Faced with massive volumes of video, traditional manual analysis is very labor-intensive, and prolonged viewing easily causes visual fatigue, which leads to errors. To address these problems with manual search, attention has turned to using computer vision to search large volumes of video for pedestrians of interest accurately and efficiently. Person re-identification (Person Re-Identification), a computer vision technique, analyzes pedestrians in videos from different cameras to assist or even replace human operators.
Unlike face recognition, which requires the cooperation of the person being recognized and high-quality pictures, person re-identification requires no such cooperation and can recognize low-resolution pedestrian images in complex scenes. It can quickly find the appearances of a pedestrian of interest across a network of monitoring cameras, so it has broad application prospects in intelligent security, human-computer interaction and new retail.
Current research on pedestrian re-identification mainly aims to extract features from pedestrian pictures that are robust to the complex variations between camera scenes, so that target pedestrians can be matched accurately. Research on traditional pedestrian re-identification methods falls into two areas: 1) feature representation learning, which handles the change in pedestrian appearance across camera views by designing feature representations that are invariant, to a certain degree, with respect to pedestrian identity; 2) metric learning, which learns a mapping from high-dimensional features to a new feature space in which features of the same person are closer together and features of different persons are farther apart. In 2014, researchers introduced deep learning into the pedestrian re-identification field, which allowed feature representation learning and metric learning to be optimized jointly, end to end, by a convolutional neural network. Performance exceeded that of the traditional methods, and deep learning has gradually become the mainstream approach in the field.
In its development from the two-stage traditional approach (feature extraction followed by metric learning) to end-to-end deep learning, pedestrian re-identification has come to rely on data-driven end-to-end learning to improve the robustness and discriminative power of features against the variations of pedestrian pictures across cameras. Current deep-learning-based methods achieve good results on most public data sets. However, the pedestrian pictures in those data sets are usually obtained by manual cropping and screening, and the human body structure priors obtained from pre-training on the large ImageNet data set often show a large domain gap with respect to the monitoring scene, so partitioning pedestrian pictures according to incorrect prediction results can increase the re-identification error rate. In addition, with respect to the attention paid to the features of different regions of the image to be identified, pedestrian re-identification is also often degraded by differences in illumination, camera angle and the like.
Disclosure of Invention
The problem to be solved by the invention is to provide a pedestrian re-identification system and method based on human body analysis with combined coarse and fine granularity, which can effectively improve the accuracy and efficiency of pedestrian re-identification under changes in viewing angle, posture and illumination.
The pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity is characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian characteristic extraction module, a human body re-recognition model loading module and a user retrieval module;
the parameter pre-training initialization module is used for initializing a network by parameter pre-training in the public data set to obtain a pedestrian re-identification network model;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module;
the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding and processing the video data uploaded by the monitoring video data reading module into a processable image; the image preprocessing sub-module is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;
the pedestrian characteristic extraction module is used for designing a coarse-fine granularity combined neural network, and coarse-granularity branches and fine-granularity branches in the coarse-fine granularity combined neural network are used for respectively extracting pedestrian characteristics of the video decoded image and the human body image to be searched and storing the images;
the human body re-recognition model loading module is used for carrying out retrieval matching by utilizing the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body images to be retrieved, and calculating to obtain the similarity.
In the above technical solution, further, the user search module is further configured to set a similarity threshold. By setting the similarity threshold, the pedestrian pictures with different degrees of similarity can be identified, so that the identification standard is more flexible.
In the above technical solution, further, the human body re-recognition model loading module is further configured to feed back the calculated similarity to the user retrieval module.
A method of a pedestrian re-identification system as described, comprising the steps of:
step A: performing parameter pre-training on a public data set to initialize the network and obtain a pedestrian re-identification network model;
step B: uploading and reading video data in the monitoring video data reading module; the video decoding submodule decodes the video data into a usable picture format, and the image preprocessing submodule preprocesses the images; pedestrian features are then extracted with the designed neural network model combining coarse and fine granularity, which comprises a coarse-granularity branch and a fine-granularity branch; for the coarse-granularity branch, a knowledge distillation loss function is used to strengthen the extraction of global features, and for the fine-granularity branch, a knowledge distillation loss function and a triplet loss function are used to strengthen the extraction of detail features; the learned features are concatenated to obtain a pedestrian feature set $f_i$; an SE Block is then used to learn an importance weight $W$ for the feature vector, selectively enhancing strongly discriminative features and suppressing weakly discriminative ones:
$W = \mathrm{Sigmoid}(\mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(f_i))))$
wherein the two FC layers, from inside to outside, perform the compression and excitation operations respectively;
after the importance weight $W$ of the pedestrian feature vector is obtained, the pedestrian feature $f_o$ is output as
$f_o = f_i * W + f_i$
and stored;
step C: uploading the human body image to be retrieved in the user retrieval module, and computing and outputting the pedestrian features of the human body image to be retrieved as in step B;
step D: the pedestrian re-recognition network model detects pedestrians in the video decoded images, extracts their features, and calculates their similarity to the pedestrian features of the human body image to be retrieved; results whose similarity is higher than the threshold are stored and returned in order of similarity.
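As a minimal illustrative sketch of the thresholding and ranking in step D: the description does not state which similarity measure is used, so cosine similarity is an assumption here, and the names `cosine_similarity`, `retrieve` and the gallery keys are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no direction
    return dot / (norm_a * norm_b)

def retrieve(query_feature, gallery, threshold):
    """Return (gallery_id, similarity) pairs above the threshold, best first."""
    scored = [(gid, cosine_similarity(query_feature, feat))
              for gid, feat in gallery.items()]
    kept = [(gid, s) for gid, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy stored features (hypothetical identifiers and values).
gallery = {
    "cam1_frame10": [1.0, 0.0, 0.0],
    "cam2_frame42": [0.6, 0.8, 0.0],
    "cam3_frame07": [0.0, 0.0, 1.0],
}
results = retrieve([1.0, 0.0, 0.0], gallery, threshold=0.5)
```

Raising or lowering `threshold` corresponds to the adjustable similarity threshold set in the user retrieval module.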
In the above technical solution, further, in step B, the picture format may be JPG or PNG. Supporting pictures in multiple formats broadens the range of inputs the system can handle.
In the above technical solution, further, in step B, the video data is from a monitoring camera.
In the above technical solution, further, in step B, the image preprocessing refers to performing distortion processing on the image. This improves image quality and reduces the influence of interfering information on the extraction of pedestrian features.
In the above technical solution, further, in step B, the pedestrian feature $f_o$ is stored in a mat file, which facilitates later queries.
In the above technical solution, in step D, the decoded video image is detected using FPN-Person.
In the above technical solution, further, in step D, if pedestrian features have already been extracted from the video decoded images, CFNet is used to extract pedestrian features only from the uploaded human body image to be retrieved, and the pedestrian features stored in the corresponding mat file are read.
Compared with the prior art, the invention has the beneficial effects that: in the aspect of human body analysis, the neural network model is designed in a mode of combining coarse granularity and fine granularity, and human body semantics of different layers are emphasized, so that pedestrian characteristics with more discrimination are extracted, and the accuracy is improved; and moreover, the loss function is designed by combining knowledge distillation thought, so that the recognition time of pedestrian re-recognition is effectively reduced and the efficiency is improved by optimizing the training of the network.
Drawings
Fig. 1 is a block diagram of a pedestrian re-recognition system according to the present invention.
Fig. 2 is a flow chart of a method of the pedestrian re-recognition system according to the present invention.
Fig. 3 is a schematic illustration of human semantic attention in the method of the pedestrian re-recognition system of the present invention.
Fig. 4 is a schematic diagram of a pedestrian re-recognition network model of the pedestrian re-recognition system according to the present invention.
Fig. 5 is a schematic diagram of three classification of similarity information in the pedestrian re-recognition system according to the present invention.
Fig. 6 is a schematic diagram of a user search module starting process in the pedestrian re-recognition system according to the present invention.
Detailed Description
The invention is further described in the following examples with reference to the accompanying drawings.
As shown in Figs. 1-6, a pedestrian re-recognition system based on human body analysis coarse-fine granularity combination comprises a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module;
the parameter pre-training initialization module is used for initializing a network by parameter pre-training in the public data set to obtain a pedestrian re-identification network model;
the pedestrian re-recognition network model is used for computing the similarity between the features of the uploaded pedestrian image to be retrieved and the features of the images in the video data;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the monitoring video data reading module is responsible for managing the input and output of image and video data, including reading the monitoring video data uploaded by the user for pedestrian picture search and specifying the time period and camera number.
The user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module. After the user uploads the pedestrian picture to be queried, specifies the video to be compared and searched, and clicks the query button, the module reads and displays the uploaded pedestrian picture, reads the video data for the user-specified time period and camera number, and, once system processing is finished, returns and stores the processed result.
The video image analysis module comprises a video decoding submodule and an image preprocessing submodule, and the video decoding submodule is used for decoding and processing the video data uploaded by the monitoring video data reading module into a processable image; the image preprocessing sub-module is used for improving the visual effects of the image after video decoding and the human body image to be retrieved;
video decoding is mature prior art and is not described in detail in this embodiment;
the main purpose of image preprocessing is to eliminate irrelevant information in the image, recover useful real information, enhance the detectability of relevant information, simplify data to the maximum extent, and thereby improve the reliability of feature extraction, image segmentation, matching and recognition.
The pedestrian characteristic extraction module is used for designing a coarse-fine granularity combined neural network, and coarse-granularity branches and fine-granularity branches in the coarse-fine granularity combined neural network are used for respectively extracting pedestrian characteristics of the video decoded image and the human body image to be searched and storing the images;
the human body re-recognition model loading module is used for carrying out retrieval matching by utilizing the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body images to be retrieved, and calculating to obtain the similarity.
The invention relates to a method for a pedestrian re-identification system, which comprises the following steps:
First, the network is initialized by parameter pre-training on the large public ImageNet data set. Neural network models generally rely on stochastic gradient descent for training and parameter updates; the final performance of the network is directly related to the solution reached at convergence, and the convergence result in turn depends heavily on how the network parameters are initialized. A good initialization lets training achieve twice the result with half the effort, whereas a poor initialization scheme not only impairs convergence but can also cause vanishing or exploding gradients. During pre-training initialization, Batch Normalization (BN) is used to transform the input data distribution into a Gaussian distribution so that the input of every layer keeps the same distribution. Without it, the distribution gradually drifts as the number of layers grows and the overall distribution approaches the upper and lower limits of the value range of the nonlinear function, which slows convergence and makes gradients vanish during back-propagation. BN uses normalization to force the distribution of the input values of every neuron in each layer back to a standard normal distribution with mean 0 and variance 1, so that the inputs to the activation fall in the region where the nonlinear function is sensitive; the gradients become larger and convergence is greatly accelerated;
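The normalization step described above can be sketched as follows. This is an illustrative simplification under stated assumptions: it normalizes each feature over a batch to mean 0 and variance 1, omits the learnable scale and shift parameters and the running statistics that practical BN layers also keep, and the name `batch_norm` and the `eps` constant are hypothetical.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of a batch to zero mean, unit variance."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dims)]
    # eps guards against division by zero for constant features.
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dims)]
            for row in batch]

# Two features on very different scales end up on the same scale.
normalized = batch_norm([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
```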
After the pedestrian re-identification network model is obtained, video data is uploaded and read in the monitoring video data reading module. The video decoding submodule decodes the video data into a usable picture format, and the images are then preprocessed. In this embodiment, the image preprocessing may be distortion processing of the image: the image preprocessing submodule applies image enhancement to strengthen the useful information in the image, which is a distortion process. Its purpose is to improve the visual effect of the image and, for the given application, to purposefully emphasize global or local characteristics of the image: making an originally unclear image clear, emphasizing features of interest, enlarging the differences between the features of different objects, and suppressing features that are not of interest. This improves image quality, enriches the information content, strengthens image interpretation and recognition, and meets the needs of analysis;
A Coarse-Fine granularity combined neural network model (CFNet) is designed so that the network can extract pedestrian features of different granularities. ResNet-50 is selected as the backbone network, and the part after the Res Block2 convolution module is divided into two types of branches: a coarse-granularity branch (Coarse Branch) and a fine-granularity branch (Fine Branch). The fine-granularity branch is further divided into two sub-branches: an upper body branch and a lower body branch. As shown in fig. 3, the human body analysis attention mechanism operates as follows: a geometric transformation based on the acquired human body analysis key points is used to compute the region shared across views by two pedestrian images, which bears some similarity to the attention maps popular at present. The probability maps of the 20 human body parts are therefore combined to generate human semantic attention maps for 7 body regions at different levels:
$M_{\mathrm{Shoes}}$ = {Socks, LeftShoe, RightShoe},
$M_{\mathrm{Head}}$ = {Hat, Hair, Sunglasses, Face},
$M_{\mathrm{UpperBody}}$ = {Glove, UpperClothes, Coat, Scarf, LeftArm, RightArm},
$M_{\mathrm{LowerBody}}$ = {Dress, Pants, Jumpsuits, Skirt, LeftLeg, RightLeg},
$M_{\mathrm{UpperHalf}} = M_{\mathrm{UpperBody}} + M_{\mathrm{Head}}$,
$M_{\mathrm{LowerHalf}} = M_{\mathrm{LowerBody}} + M_{\mathrm{Shoes}}$,
$M_{\mathrm{WholeBody}} = M_{\mathrm{UpperHalf}} + M_{\mathrm{LowerHalf}}$.
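The grouping above can be sketched as follows. This is a minimal illustration, assuming that each part probability map is a nested list of the same height and width and that combining maps (the "+" above) means element-wise summation; the names `GROUPS`, `add_maps` and `build_attention_maps` are hypothetical.

```python
# Part names grouped as listed in the description.
GROUPS = {
    "shoes": ["Socks", "LeftShoe", "RightShoe"],
    "head": ["Hat", "Hair", "Sunglasses", "Face"],
    "upper_body": ["Glove", "UpperClothes", "Coat", "Scarf",
                   "LeftArm", "RightArm"],
    "lower_body": ["Dress", "Pants", "Jumpsuits", "Skirt",
                   "LeftLeg", "RightLeg"],
}

def add_maps(a, b):
    """Element-wise sum of two H x W maps; returns a new map."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def build_attention_maps(part_probs, height, width):
    """Merge per-part probability maps into the 7 semantic attention maps."""
    zero = [[0.0] * width for _ in range(height)]
    maps = {}
    for group, parts in GROUPS.items():
        total = zero
        for part in parts:
            # Parts missing from the parser output contribute nothing.
            total = add_maps(total, part_probs.get(part, zero))
        maps[group] = total
    maps["upper_half"] = add_maps(maps["upper_body"], maps["head"])
    maps["lower_half"] = add_maps(maps["lower_body"], maps["shoes"])
    maps["whole_body"] = add_maps(maps["upper_half"], maps["lower_half"])
    return maps

# Toy 2 x 1 image: a face in the top pixel, pants in the bottom pixel.
part_probs = {"Face": [[0.9], [0.0]], "Pants": [[0.0], [0.8]]}
maps = build_attention_maps(part_probs, height=2, width=1)
```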
Through these semantic attention maps, the different parts of the human body can be located. The outputs of different layers of a convolutional neural network carry different semantic information, so the human semantic attention maps are combined, in the manner of an attention mechanism, with the features of the convolutional network at different stages and layers, which makes the network focus on local regions of the body: a more macroscopic semantic map is provided at the shallow layers to capture more detail features, and progressively higher-level semantic information is provided at the deeper layers to strengthen the capture of abstract features. The formal definition is given by the following formula:
$F_{\mathrm{attention}} = F_i * M + F_i$
wherein $M \in \{M_{\mathrm{WholeBody}}, M_{\mathrm{UpperHalf}}, M_{\mathrm{LowerHalf}}, M_{\mathrm{UpperBody}}, M_{\mathrm{LowerBody}}, M_{\mathrm{Head}}, M_{\mathrm{Shoes}}\}$ is the semantic attention map for the given level, $F_i$ is the feature map output by each layer of the network, and $F_{\mathrm{attention}}$ is the feature map with enhanced attention to the local region;
when the model cannot output a good segmentation result at very low resolution, $M$ approaches 0, so that $F_{\mathrm{attention}}$ approaches $F_i$. In this way, bad segmentation results have no negative effect, while good segmentation results provide sufficient information to improve recognition accuracy. The semantic attention generated by human body analysis is combined with the network in the manner of an attention mechanism, and compared with other methods this makes full use of the human body prior information without harming model performance;
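The residual form of the attention formula, and its fallback behaviour when the attention map is close to zero, can be illustrated with a short sketch. The data layout (a feature map as channels x height x width nested lists, an attention map as height x width) and the function name `apply_semantic_attention` are assumptions made for illustration.

```python
def apply_semantic_attention(feature_map, attention_map):
    """Compute F_attention = F_i * M + F_i for every channel of F_i."""
    return [
        [[f * m + f for f, m in zip(f_row, m_row)]
         for f_row, m_row in zip(channel, attention_map)]
        for channel in feature_map
    ]

features = [[[1.0, 2.0], [3.0, 4.0]]]      # one channel, 2 x 2
failed_mask = [[0.0, 0.0], [0.0, 0.0]]     # segmentation failed: M ~ 0
confident_mask = [[1.0, 1.0], [0.0, 0.0]]  # top row belongs to the body part

unchanged = apply_semantic_attention(features, failed_mask)
enhanced = apply_semantic_attention(features, confident_mask)
```

With the zero mask the features pass through unchanged, and with the confident mask only the masked region is amplified, which is the property the description relies on.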
In training pedestrian re-recognition network models, many works treat the problem as a classification task and train with a cross entropy loss function that uses one-hot encoded labels. A one-hot encoded label, however, generally contains no similarity information between categories.
For the pedestrian re-identification task, the training stage treats the problem as classification and predicts with the cross entropy loss function using one-hot encoded labels, whereas the testing stage discards the classification layer and uses the feature vector after the global pooling layer directly as the pedestrian feature representation for similarity calculation. The goals of training and testing therefore differ considerably: the ultimate goal of pedestrian re-recognition is to judge the similarity of pictures of pedestrians of unknown identity, not simply to classify the training set. One-hot encoding marks the class to which a sample belongs as 1 and every other class as 0, which ignores the similarity information between pedestrian pictures and is prone to overfitting on the training set, so this approach may not be optimal. By drawing on the idea of knowledge distillation, we expect to introduce more similarity information during the training stage to optimize the network training process and thereby reduce the gap between training and testing. We therefore propose a knowledge distillation loss function (Knowledge Distillation Loss) to improve on the cross entropy loss function with one-hot encoded labels.
First, CFNet is trained for classification on the re-identification data set as the teacher model, which predicts soft labels containing similarity information between pedestrian pictures. The model is then retrained with a knowledge distillation loss function formed from the soft labels and the one-hot encoded labels, whose mathematical expression is shown in the following formula:
where $H(\cdot)$ is the cross entropy, $p_t$ is the soft label output by the teacher model, $p_s$ is the standard softmax output of the student model, $\tau$ is a temperature parameter that controls the smoothness of the probability distribution, and $\alpha$ is a balance factor that weights the two terms.
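For illustration only, one widely used form of a knowledge distillation loss built from the quantities defined above is sketched below. The exact expression used by the invention is not reproduced in this text, so the weighting of the two terms by α and 1 − α, the τ² scaling of the soft term, the default values of `tau` and `alpha`, and all function names are assumptions.

```python
import math

def softmax(logits, tau=1.0):
    """Softmax with temperature tau; larger tau gives a smoother distribution."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """H(target, predicted) = -sum_k target_k * log(predicted_k)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def kd_loss(student_logits, teacher_logits, label, tau=4.0, alpha=0.5):
    """Assumed form: alpha * H(one_hot, p_s) + (1 - alpha) * tau^2 * H(p_t, p_s^tau)."""
    one_hot = [1.0 if k == label else 0.0 for k in range(len(student_logits))]
    hard_term = cross_entropy(one_hot, softmax(student_logits))
    soft_term = cross_entropy(softmax(teacher_logits, tau),
                              softmax(student_logits, tau))
    return alpha * hard_term + (1.0 - alpha) * tau * tau * soft_term

# A student that agrees with the teacher and the label is penalized less.
good = kd_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], label=0)
bad = kd_loss([0.0, 5.0, 0.0], [5.0, 0.0, 0.0], label=0)
```

Setting `alpha=1.0` in this sketch recovers the plain cross entropy loss with one-hot labels, which is the baseline the description seeks to improve.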
Meanwhile, in order to enable the network to learn complementary features, different branches are learned by using different loss functions so as to focus on feature extraction in different aspects. For coarse-grained branches, knowledge distillation loss functions are adopted to pay attention to global feature extraction; for fine-grained branches, knowledge distillation loss functions and triplet loss functions are employed to enhance extraction of detail features.
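The description does not spell out the triplet loss used for the fine-granularity branch. As a hedged sketch, the standard margin-based form with Euclidean distance is shown below; the margin value of 0.3 and the function names are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0,
               euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Same identity close, different identity far: the constraint is satisfied.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Roles swapped: the positive is farther away than the negative.
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

The loss is zero once the positive sample is closer to the anchor than the negative sample by at least the margin, which pushes the branch toward features that separate similar-looking identities.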
The pedestrian feature extraction process is as follows. First, a backbone network (ResNet-50 in this embodiment) is trained on a pedestrian re-identification data set with a cross entropy loss based on pedestrian ID. Then the backbone network is combined with the obtained pedestrian part region prediction results to obtain pedestrian part feature maps: the feature map of the backbone network is multiplied element-wise (dot multiplication) with the feature map of the predicted pedestrian part regions. Global average pooling is applied to the feature map output by the backbone network, the pedestrian part feature maps and the part region feature maps to obtain the global feature, the part region feature vectors and the part visibility probabilities. The part region feature vectors and the part visibility probabilities are passed through a 1×1 convolution to obtain part feature weights, which are multiplied element-wise with the part region feature vectors to obtain the final local part features;
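The masking and pooling steps in this process can be sketched for a single body part as follows. The sketch covers only the element-wise multiplication and global average pooling; the 1×1 convolution that produces the part feature weights is omitted, and the data layout and the names `global_average_pool` and `part_features` are assumptions.

```python
def global_average_pool(channel):
    """Average all values of one H x W map."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def part_features(feature_map, part_mask):
    """Return (global feature, part feature vector, part visibility)."""
    global_feature = [global_average_pool(c) for c in feature_map]
    # Element-wise product of each channel with the predicted part region.
    masked = [[[f * m for f, m in zip(f_row, m_row)]
               for f_row, m_row in zip(c, part_mask)]
              for c in feature_map]
    part_feature = [global_average_pool(c) for c in masked]
    # Pooling the mask itself gives the fraction of the image the part covers.
    visibility = global_average_pool(part_mask)
    return global_feature, part_feature, visibility

features = [[[1.0, 3.0], [5.0, 7.0]]]   # one channel, 2 x 2
upper_mask = [[1.0, 1.0], [0.0, 0.0]]   # the part occupies the top row
global_f, part_f, visible = part_features(features, upper_mask)
```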
After the pedestrian features learned by the different branches are concatenated, a feature selection module (Feature Select Module, FSM) highlights the more discriminative features and produces the final pedestrian feature representation. Directly concatenating the feature vectors of the different branches may ignore the differing importance of the features. Inspired by Hu et al., we consider that the elements of the learned pedestrian feature vector differ in importance, and use an SE Block to learn a feature importance weight W that selectively enhances strongly discriminative features and suppresses weakly discriminative ones, as shown in the following formula:
W = Sigmoid(FC(ReLU(FC(f_i))))
where the two FC layers, from inside to outside, perform the compression and activation operations, respectively. After the feature vector importance weight W is obtained, the output feature f_o is calculated as:
f_o = f_i * W + f_i
where * and + are element-wise operations; adding the enhanced feature to the original feature vector further strengthens the discriminative ability of the feature.
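The two formulas above can be sketched directly in plain Python. The weight matrices and biases passed in are hypothetical placeholders for learned parameters, and the function names are illustrative.

```python
import math

def linear(x, weight, bias):
    """Fully connected layer: weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def feature_select(f_i, w1, b1, w2, b2):
    """W = Sigmoid(FC(ReLU(FC(f_i)))); f_o = f_i * W + f_i (element-wise)."""
    hidden = relu(linear(f_i, w1, b1))       # inner FC: compression
    gate = sigmoid(linear(hidden, w2, b2))   # outer FC: activation, values in (0, 1)
    f_o = [x * g + x for x, g in zip(f_i, gate)]
    return f_o, gate
```

Because the gate lies in (0, 1), every element of f_o keeps the sign of f_i and is scaled by a factor between 1 and 2; the residual term means no element is ever suppressed to zero.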
To make the two types of branches attend to human body information at different granularities, human body semantic attention maps (Semantic Attention Map, SAM) at different levels are generated by a human body analysis model, providing different semantic information to guide the learning of the network in the different branches. In addition, after analyzing the shortcomings of the cross-entropy loss function commonly used in training pedestrian re-identification models, a knowledge distillation loss function (Knowledge Distillation Loss, KD Loss) is designed following the idea of knowledge distillation; it provides the network with soft labels containing pedestrian identity similarity information to optimize model training. Meanwhile, to make the pedestrian features learned by the two types of branches as complementary as possible, the coarse-granularity branch is supervised only by the knowledge distillation loss function so that it extracts features that emphasize global information, while the fine-granularity branch is jointly supervised by the triplet loss function (Triplet Loss) and the knowledge distillation loss function to strengthen the network's attention to fine-grained features. Fig. 4 is a schematic diagram of the pedestrian re-recognition network model based on the combination of human body analysis coarse and fine granularity.
The human body image to be retrieved is uploaded in the retrieval module, and its pedestrian features are calculated and output using step B;
then, based on the pedestrian features of the human body image to be retrieved, the pedestrian re-recognition network model uses FPN-Person to detect the pedestrians appearing in the video data at a fixed frame interval.
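Detection at a fixed frame interval can be sketched as below. The `detector` argument is a hypothetical callable standing in for FPN-Person (its real interface is not described in the text), and the box format is an assumption.

```python
def sample_frame_indices(num_frames, interval):
    """Indices of the frames that are passed to the detector."""
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    return list(range(0, num_frames, interval))

def detect_pedestrians(frames, interval, detector):
    """Run a detector on every interval-th frame.

    detector is a hypothetical callable: frame -> list of bounding boxes.
    Returns a list of (frame_index, box) pairs.
    """
    detections = []
    for idx in sample_frame_indices(len(frames), interval):
        for box in detector(frames[idx]):
            detections.append((idx, box))
    return detections
```

A larger interval reduces the number of detector calls roughly in proportion, at the cost of possibly missing pedestrians who are visible only briefly.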
After detection, for monitoring video data that is queried for the first time, CFNet is used to extract the features of both the pedestrian picture to be retrieved uploaded by the user and the detected pedestrian pictures, and the extracted features are stored in a .mat file to facilitate subsequent queries.
For monitoring video data whose features have already been extracted, CFNet is only used to extract the features of the pedestrian picture to be retrieved uploaded by the user, and the corresponding pedestrian features of the video are then read directly from the .mat file.
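The first-query versus repeat-query behaviour amounts to a cache keyed by video. The sketch below uses an in-memory dictionary as a stand-in for the .mat file, and `extract` is a hypothetical callable standing in for CFNet; both substitutions are assumptions made to keep the example self-contained.

```python
class FeatureCache:
    """In-memory stand-in for the per-video feature file described in the text."""

    def __init__(self):
        self._store = {}

    def gallery_features(self, video_id, crops, extract):
        """Return features for the detected pedestrian crops of one video.

        extract is a hypothetical callable (crop -> feature vector).
        Features are computed only on the first query for a given video_id.
        """
        if video_id not in self._store:
            self._store[video_id] = [extract(crop) for crop in crops]
        return self._store[video_id]
```

The query picture is always passed through the extractor, since it changes from request to request; only the gallery side benefits from caching.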
The similarity between the features of the pedestrian to be retrieved and the features of the detected pedestrian pictures is calculated, and the pedestrian pictures whose similarity is greater than a given threshold are sorted by similarity and returned to the user;
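A minimal sketch of the thresholding and ranking step. The text does not name the similarity measure, so cosine similarity is assumed here; the function names and the gallery layout (a mapping from picture name to feature vector) are also illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_matches(query, gallery, threshold, top_k=30):
    """Keep gallery items whose similarity exceeds threshold, sorted high to low."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in gallery.items()]
    kept = [(name, score) for name, score in scored if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

For example, `rank_matches([1.0, 0.0], {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}, threshold=0.5)` keeps "a" and "b" in that order and drops "c".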
Regarding the similarity calculation, as shown in fig. 5, when a model is trained on a three-class task of car, horse and zebra, the labels are [1,0,0], [0,1,0] and [0,0,1]. The prediction of a trained network is usually a probability distribution generated by the softmax function, whose basic form is:
p_i = exp(z_i) / Σ_j exp(z_j)
where z_i is the logits value output by the last layer of the network for class i, and p_i is the probability of the corresponding class after processing by the softmax function.
For example, the class probability distributions predicted by the network may be [0.95,0.03,0.02] for the car in fig. 5 a), [0.06,0.73,0.21] for the horse in fig. 5 b) and [0.09,0.19,0.72] for the zebra in fig. 5 c). From the predicted distribution for fig. 5 b), the picture is a zebra with probability 0.21 and a car with probability 0.06, which indicates that the horse resembles a zebra more than it resembles a car; the predicted values therefore contain similarity information between classes. The ultimate purpose of the pedestrian re-identification task is to identify pedestrians by comparing the similarity of pedestrian picture features. The analysis above shows that training with one-hot codes ignores the similarity information between pedestrian identities, so, drawing on the idea of knowledge distillation, we introduce labels containing pedestrian similarity information to optimize the training and feature extraction of the network.
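The contrast between a one-hot label and a softmax prediction can be sketched as follows. The logits are hypothetical values chosen for a horse picture, not numbers from the patent.

```python
import math

def softmax(z):
    """Basic softmax: exponentiate the logits and normalize them to sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["car", "horse", "zebra"]
one_hot_horse = [0, 1, 0]   # says nothing about car versus zebra
logits = [0.5, 3.0, 1.8]    # hypothetical network output for a horse picture
probs = softmax(logits)

# Sorting by probability exposes the inter-class similarity the one-hot label lacks.
ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
```

Here the one-hot label treats "car" and "zebra" as equally wrong, whereas the softmax output ranks "zebra" above "car", which is the similarity information that soft labels carry into training.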
The user retrieval module handles the interaction for user queries, including uploading the pedestrian picture to be retrieved, specifying the time period and camera number, and displaying and browsing the retrieval results. The user selects the pedestrian picture to be retrieved and uploads it, specifies the time period and camera number to be searched, and finally views and browses the retrieval results returned by the system; the implementation flow chart of the module is shown in fig. 5;
the user first selects the pedestrian picture to be queried through the Choose File button, and the picture is then read in and displayed by the input and output module.
The user then selects the camera number to be queried from the camera list and specifies the time period to be queried in the time input box.
After the query button is clicked, the monitoring video reading module reads the video data for the specified time period and camera number and sends it to the video image analysis module, the pedestrian feature extraction module and the human body re-recognition model loading module;
finally, the results returned by the human body re-recognition model loading module are displayed on the retrieval result interface for the user to browse: among the pictures in the retrieval library under the camera number specified by the user, the pedestrian pictures whose similarity exceeds the given threshold are sorted by similarity in descending order, and the top 30 are displayed as the retrieval result.
The present invention is not limited to the above-described embodiments, and various changes can be made by those skilled in the art within the scope of the present invention without departing from the spirit of the present invention.

Claims (10)

1. A pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity, characterized by comprising a parameter pre-training initialization module, a monitoring video data reading module, a video image analysis module, a pedestrian feature extraction module, a human body re-recognition model loading module and a user retrieval module; the parameter pre-training initialization module is used for pre-training parameters on a public data set to initialize the network and obtain a pedestrian re-recognition network model;
the monitoring video data reading module is used for uploading and reading video data and sending the video data to the video image analysis module;
the user retrieval module is used for uploading the human body image to be retrieved and sending the human body image to the video image analysis module; the video image analysis module comprises a video decoding submodule and an image preprocessing submodule, wherein the video decoding submodule is used for decoding and processing the video data uploaded by the monitoring video data reading module into a processable image; the image preprocessing sub-module is used for improving the visual effects of the images after video decoding and the human body images to be retrieved;
the pedestrian feature extraction module is used for designing a coarse-fine granularity combined neural network, in which the coarse-granularity branches and fine-granularity branches are used to respectively extract the pedestrian features of the video-decoded images and of the human body image to be retrieved, which are then stored; the human body analysis attention mechanism operation locates different parts of the human body through semantic attention and combines the human body semantic attention with the features of different layers of the convolutional network at different stages, so that the network pays attention to local regions of the body; for the coarse-granularity branches, a knowledge distillation loss function is adopted to enhance the extraction of global features, and for the fine-granularity branches, a knowledge distillation loss function and a triplet loss function are adopted to enhance the extraction of detail features; the learned features are concatenated to obtain a pedestrian feature set f_i; then an SE Block is used to learn a feature vector importance weight W to selectively enhance strongly discriminative features and suppress weakly discriminative features;
the human body re-recognition model loading module is used for carrying out retrieval matching by utilizing the pedestrian re-recognition network model according to the stored pedestrian characteristics and the human body images to be retrieved, and calculating to obtain the similarity.
2. The pedestrian re-recognition system based on the combination of human body resolution coarse and fine granularity according to claim 1, wherein the user retrieval module is further configured to set a similarity threshold.
3. The pedestrian re-recognition system based on the combination of human body analysis coarse and fine granularity according to claim 1, wherein the human body re-recognition model loading module is further configured to feed the calculated similarity back to the user retrieval module.
4. A method of a pedestrian re-identification system as in claim 1, comprising the steps of:
step A: performing parameter pre-training on a public data set to initialize the network and obtain a pedestrian re-recognition network model;
step B: uploading and reading video data in the monitoring video data reading module; the video decoding submodule decodes the video data and processes it into a usable picture format, and image preprocessing is performed on it; pedestrian features are extracted using the designed neural network model with combined coarse and fine granularity, and the human body analysis attention mechanism operation is performed, which locates different parts of the human body through semantic attention and combines the human body semantic attention with the features of different layers of the convolutional network at different stages, so that the network pays attention to local regions of the body; the neural network model with combined coarse and fine granularity comprises coarse-granularity branches and fine-granularity branches; for the coarse-granularity branches, the knowledge distillation loss function is adopted to enhance the extraction of global features, and for the fine-granularity branches, the knowledge distillation loss function and the triplet loss function are adopted to enhance the extraction of detail features; the learned features are concatenated to obtain a pedestrian feature set f_i; then an SE Block is used to learn a feature vector importance weight W to selectively enhance strongly discriminative features and suppress weakly discriminative features;
W = Sigmoid(FC(ReLU(FC(f_i))))
wherein the two FC layers, from inside to outside, are used for compression and activation, respectively;
after the importance weight W of the pedestrian feature vector is obtained, the pedestrian feature f_o is output as
f_o = f_i * W + f_i
and stored;
step C: uploading the human body image to be retrieved in the retrieval module, and calculating and outputting the pedestrian features of the human body image to be retrieved using step B;
step D: according to the pedestrian features of the human body image to be retrieved, the pedestrian re-recognition network model detects and extracts the pedestrian features in the video-decoded images and calculates the similarity; results whose similarity is higher than a threshold are stored and returned in order of similarity.
5. The method of a pedestrian re-recognition system according to claim 4, wherein in step B, the picture format may be JPG or PNG.
6. The method of a pedestrian re-identification system as in claim 4 wherein in step B, the video data is from a monitoring camera.
7. The method of a pedestrian re-recognition system according to claim 4, wherein in step B, the image preprocessing refers to distortion processing of the image.
8. The method of a pedestrian re-identification system as set forth in claim 4, wherein in step B, the pedestrian feature f_o is stored in a .mat file.
9. The method of a pedestrian re-identification system as set forth in claim 4, wherein in step D, the video-decoded images are detected using FPN-Person.
10. The method of a pedestrian re-identification system of claim 8, wherein in step D, if pedestrian features have already been extracted from the video-decoded images, CFNet is used to extract pedestrian features only from the uploaded human body image to be retrieved, and the pedestrian features of the video are read from the corresponding .mat file.
CN201911078998.6A 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination Active CN110807434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911078998.6A CN110807434B (en) 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911078998.6A CN110807434B (en) 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Publications (2)

Publication Number Publication Date
CN110807434A CN110807434A (en) 2020-02-18
CN110807434B true CN110807434B (en) 2023-08-15

Family

ID=69501407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911078998.6A Active CN110807434B (en) 2019-11-06 2019-11-06 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination

Country Status (1)

Country Link
CN (1) CN110807434B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553205B (en) * 2020-04-12 2022-11-15 西安电子科技大学 Vehicle weight recognition method, system, medium and video monitoring system without license plate information
CN111753092B (en) * 2020-06-30 2024-01-26 青岛创新奇智科技集团股份有限公司 Data processing method, model training method, device and electronic equipment
CN111832514B (en) * 2020-07-21 2023-02-28 内蒙古科技大学 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111950411B (en) * 2020-07-31 2021-12-28 上海商汤智能科技有限公司 Model determination method and related device
CN111738362B (en) * 2020-08-03 2020-12-01 成都睿沿科技有限公司 Object recognition method and device, storage medium and electronic equipment
CN112233776A (en) * 2020-11-09 2021-01-15 江苏科技大学 Dermatosis self-learning auxiliary judgment system based on visual asymptotic cavity network
CN113277388B (en) * 2021-04-02 2023-05-02 东南大学 Data acquisition control method for electric hanging basket
CN113269117B (en) * 2021-06-04 2022-12-13 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN114049609B (en) * 2021-11-24 2024-05-31 大连理工大学 Multi-stage aggregation pedestrian re-identification method based on neural architecture search
CN116052220B (en) * 2023-02-07 2023-11-24 北京多维视通技术有限公司 Pedestrian re-identification method, device, equipment and medium
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising
CN116935447B (en) * 2023-09-19 2023-12-26 华中科技大学 Self-adaptive teacher-student structure-based unsupervised domain pedestrian re-recognition method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354548A (en) * 2015-10-30 2016-02-24 武汉大学 Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN109165738A (en) * 2018-09-19 2019-01-08 北京市商汤科技开发有限公司 Optimization method and device, electronic equipment and the storage medium of neural network model
CN109271895A (en) * 2018-08-31 2019-01-25 西安电子科技大学 Pedestrian's recognition methods again based on Analysis On Multi-scale Features study and Image Segmentation Methods Based on Features
CN109871821A (en) * 2019-03-04 2019-06-11 中国科学院重庆绿色智能技术研究院 The pedestrian of adaptive network recognition methods, device, equipment and storage medium again
CN109919246A (en) * 2019-03-18 2019-06-21 西安电子科技大学 Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN110188611A (en) * 2019-04-26 2019-08-30 华中科技大学 A kind of pedestrian recognition methods and system again introducing visual attention mechanism
CN110245592A (en) * 2019-06-03 2019-09-17 上海眼控科技股份有限公司 A method of for promoting pedestrian's weight discrimination of monitoring scene
CN110414368A (en) * 2019-07-04 2019-11-05 华中科技大学 A kind of unsupervised pedestrian recognition methods again of knowledge based distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhong Zhang et al..Coarse-Fine Convolutional Neural Network for Person Re-Identification in Camera Sensor Networks.《IEEE Access》.2019,正文第1、3-4章,图3-4. *

Also Published As

Publication number Publication date
CN110807434A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN110807434B (en) Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
Hong et al. Fine-grained shape-appearance mutual learning for cloth-changing person re-identification
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
He et al. Dynamic feature matching for partial face recognition
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN110580460A (en) Pedestrian re-identification method based on combined identification and verification of pedestrian identity and attribute characteristics
CN111666843A (en) Pedestrian re-identification method based on global feature and local feature splicing
CN109583482A (en) A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN111563452A (en) Multi-human body posture detection and state discrimination method based on example segmentation
CN112149538A (en) Pedestrian re-identification method based on multi-task learning
CN101996308A (en) Human face identification method and system and human face model training method and system
Ji et al. Pedestrian attribute recognition based on multiple time steps attention
CN109492528A (en) A kind of recognition methods again of the pedestrian based on gaussian sum depth characteristic
CN111738048A (en) Pedestrian re-identification method
CN113139501A (en) Pedestrian multi-attribute identification method combining local area detection and multi-level feature capture
CN113989577B (en) Image classification method and device
CN111882000A (en) Network structure and method applied to small sample fine-grained learning
Chen et al. A multi-scale fusion convolutional neural network for face detection
Yuan et al. Birds of a feather flock together: Category-divergence guidance for domain adaptive segmentation
Yang et al. Bottom-up foreground-aware feature fusion for practical person search
Galiyawala et al. Person retrieval in surveillance videos using deep soft biometrics
CN113435329B (en) Unsupervised pedestrian re-identification method based on video track feature association learning
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
Li et al. Foldover features for dynamic object behaviour description in microscopic videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant