CN115187906A - Pedestrian detection and re-identification method, device and system - Google Patents

Pedestrian detection and re-identification method, device and system

Info

Publication number
CN115187906A
Authority
CN
China
Prior art keywords
pedestrian
identification
model
data set
pedestrian detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210838515.3A
Other languages
Chinese (zh)
Inventor
杨浩
张海睿
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210838515.3A
Publication of CN115187906A
Legal status: Pending

Classifications

    • G06V 20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/764 — Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/20132 — Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian detection and re-identification method, device and system, comprising: 1) a data processing module that acquires a pedestrian data set; 2) a prediction module that sets the network structures and trains the network parameters for pedestrian detection and pedestrian re-identification; 3) a network compression module that triggers training of the pedestrian detection and re-identification network models and quantizes the weight and activation values of the trained models; 4) a deployment module that converts the trained models into a format supported by the embedded platform and builds the detection and re-identification system. With this technical scheme, pedestrian targets are recognized in real time on an embedded platform with limited computing resources; the model size is compressed by multiple means while the model's prediction accuracy is preserved, effectively increasing the speed of target detection and re-identification.

Description

Pedestrian detection and re-identification method, device and system
Technical Field
The invention belongs to the technical field of target detection and re-identification, and relates to a pedestrian detection and re-identification method, device and system, in particular to a pedestrian detection and re-identification method, device and system based on an embedded platform with limited computing resources.
Background
With the continued spread of deep learning and artificial intelligence technologies, video surveillance equipment, as an important component of urban security systems, plays an important role in safeguarding public safety. Yet within the massive data generated by current surveillance equipment, target objects are still located by manual inspection of video data, which costs a great deal of time and effort, has low accuracy, and leaves serious safety risks. In addition, artificial intelligence technology is not yet as mature in practice as imagined: the translation of research results into engineering is constrained by the computing power of embedded devices, the accuracy of algorithms, the generalization ability of models, and other factors. In video surveillance applications, the detection and re-identification of pedestrian targets is an important requirement.
Disclosure of Invention
Pedestrian target detection is a basic capability in intelligent security monitoring, used to detect, analyze, locate and identify pedestrian targets in surveillance video. Pedestrian re-identification, built on top of pedestrian detection, is currently a popular research direction in computer vision; as an auxiliary means of pedestrian identity recognition in intelligent security monitoring, it considers the whole-body characteristics of a pedestrian during recognition, provides more cues for identification, and makes recognition possible in non-ideal environments.
As network models grow in capability, they become deeper and more branched, and therefore consume more storage space and computing resources during inference. The pedestrian detection and re-identification method and device of the present invention, based on an embedded platform with limited computing resources, aim to detect multiple pedestrians in pedestrian images or videos and to re-identify pedestrians across camera views; on the basis of meeting the functional and real-time requirements, they account for the limited computing resources of edge devices through a lightweight network design, meet practical needs in intelligent security, public safety and related fields, and have high engineering application value.
Purpose of the invention: to overcome the limited computing resources and storage space of existing embedded edge devices, the invention provides a pedestrian detection and re-identification method and device based on an embedded platform with limited computing resources, which can process pedestrian image and video data with a small storage footprint, high accuracy and low power consumption, solving the problems in the prior art.
In order to achieve the purpose, the invention adopts the technical scheme that:
in a first aspect, a method for detecting and re-identifying a pedestrian is provided, which includes:
acquiring a video image;
preprocessing a video image to obtain a preprocessed video image;
inputting the preprocessed video image into a pre-trained pedestrian detection model, and determining a pedestrian target and coordinate information according to the output of the pedestrian detection model;
cutting out a pedestrian target image on the preprocessed video image according to the determined pedestrian target and the coordinate information;
extracting features from the pedestrian target image using a pre-trained pedestrian re-identification model to obtain the pedestrian target features;
acquiring pedestrian characteristics in a pedestrian database, wherein the pedestrian database comprises a plurality of pedestrian characteristics with pedestrian ID tags;
calculating Euclidean distances between the pedestrian target features and pedestrian features in a pedestrian database, and selecting a pedestrian ID which has the minimum Euclidean distance with the pedestrian target features and meets the condition of an identification threshold value as a pedestrian re-identification result;
and drawing a rectangular frame in the region where the pedestrian target is located on the original video image and identifying the pedestrian ID according to the pedestrian target, the coordinate information and the pedestrian re-identification result.
In some embodiments, the video image is pre-processed, including:
and decoding the video stream, acquiring and storing the original video image, and scaling the original video image to match the input size of the pedestrian detection model.
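As an illustrative sketch (not part of the patent text), the scaling step can be computed as below; the aspect-preserving "fit the longer side" policy and the function name are assumptions, since the patent only states that the image is scaled to the detection model's input size:

```python
def compute_resize(src_w, src_h, dst=640):
    """Scale factor and new size so the longer side fits the dst input
    edge while the aspect ratio is preserved (letterbox-style resize)."""
    r = dst / max(src_w, src_h)          # recorded so boxes can be mapped back
    return r, round(src_w * r), round(src_h * r)

# A 1920x1080 surveillance frame shrinks to 640x360 before padding to 640x640.
ratio, new_w, new_h = compute_resize(1920, 1080)
```

Recording the ratio allows predicted boxes to be mapped back onto the original frame when results are drawn.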
In some embodiments, the method for constructing the pedestrian detection model includes:
the pedestrian detection model is constructed on the YOLO network model framework and comprises a backbone feature extraction network, an enhanced feature extraction network and a prediction network;
the backbone feature extraction network adopts CSPNet and MobileNetV3 in place of the original CSPDarkNet, and is configured to abstract and extract features of the picture, generating three deep feature maps;
the enhanced feature extraction network adopts an FPN to construct a three-level feature pyramid, and is configured to up-sample the three deep feature maps for feature fusion and then down-sample for further feature fusion, obtaining feature information at three different scales;
the prediction network replaces the original coupled head with a decoupled prediction head composed of three detection heads with different functions: a classification head, a regression head and an anchor-point prediction head; it is configured to perform classification, bounding-box regression and anchor-point prediction on the feature information at the three scales to obtain the prediction result.
In some embodiments, the method for constructing the pedestrian re-identification model includes:
the pedestrian re-identification model selects ResNet18 as the backbone feature extraction network, extracting 512-dimensional pedestrian features from the input image;
during training, a Neck network is added after the backbone feature extraction network of the pedestrian re-identification model: a BatchNorm1d operation is applied to the features output by the backbone, and the output features are finally raised to 751 dimensions to predict 751 pedestrian IDs.
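The Neck described above can be sketched at inference time with NumPy; this is a minimal illustration, with random placeholder parameters and assumed function names, not the patent's implementation:

```python
import numpy as np

def bn_neck(feat, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BatchNorm1d applied to a 512-d re-id feature."""
    return gamma * (feat - mean) / np.sqrt(var + eps) + beta

def id_head(feat, weight, bias):
    """Linear layer raising the 512-d feature to 751 pedestrian-ID logits."""
    return feat @ weight + bias

rng = np.random.default_rng(0)
feat = rng.standard_normal(512)                    # stand-in for backbone output
neck = bn_neck(feat, np.ones(512), np.zeros(512),  # identity-initialised BN
               np.zeros(512), np.ones(512))
logits = id_head(neck, rng.standard_normal((512, 751)), np.zeros(751))
```

At deployment only the 512-dimensional neck feature is used for matching; the 751-way head exists purely to supervise training.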
In some embodiments, the training method of the pedestrian detection model includes:
acquiring a pedestrian detection image data set with a label, and dividing the pedestrian detection image data set into a training data set and a verification data set according to a proportion;
performing data enhancement on samples in the training data set, where the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
and, taking minimization of the composite loss function as the objective, performing iterative optimization training on the pedestrian detection model with the data-enhanced training data set until the iteration stop condition is met, to obtain the trained pedestrian detection model.
The pedestrian detection model adopts a composite loss function that is the sum of three parts: category prediction loss, confidence prediction loss and bounding-box regression loss. The bounding-box regression loss adopts the GIoU loss:

$$L_{GIoU} = 1 - \mathrm{GIoU} = 1 - \left(\mathrm{IoU} - \frac{|C - (A \cup B)|}{|C|}\right)$$

where IoU is the ratio of the intersection to the union of the predicted box and the ground-truth box, C is the smallest enclosing rectangle of the ground-truth box A given by the label and the predicted box B given by the model, A ∪ B is the union of the real and predicted boxes, and C − (A ∪ B) is the area of C not covered by A and B.
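The GIoU loss above can be sketched in Python as follows; the corner-format `(x1, y1, x2, y2)` boxes and the function name are assumptions for illustration, not taken from the patent:

```python
def giou_loss(box_a, box_b):
    """GIoU loss between two boxes given as (x1, y1, x2, y2); returns 1 - GIoU."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union of the two boxes
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # C: smallest rectangle enclosing both boxes
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1 - giou
```

Unlike plain IoU loss, the enclosing-rectangle term still gives a gradient when the boxes do not overlap at all.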
In some embodiments, the training method of the pedestrian re-identification model includes:
acquiring a pedestrian re-identification data set with a label, and dividing the pedestrian re-identification data set into a training data set and a verification data set according to a proportion;
performing data enhancement on samples in the training data set, where the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
and, taking minimization of the identity loss function as the objective, performing iterative optimization training on the pedestrian re-identification model with the data-enhanced training data set until the iteration stop condition is met, to obtain the trained pedestrian re-identification model.
The loss function adopted by the pedestrian re-identification model is the identity loss:

$$L_{id} = -\frac{1}{n}\sum_{i=1}^{n} \log p(y_i \mid x_i)$$

where ID denotes the pedestrian category label, n is the total number of pedestrian categories in the re-identification training set, and p(y_i | x_i) is the probability that input image x_i is predicted as pedestrian label y_i after softmax classification.
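A minimal sketch of the identity loss, assuming the softmax probabilities have already been computed (the function name and list-based batch format are illustrative only):

```python
import math

def identity_loss(probs, labels):
    """Mean negative log-likelihood of the true pedestrian ID over a batch.
    probs: per-sample softmax distributions; labels: true ID indices."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
```

A perfectly confident correct prediction contributes zero loss, while a uniform guess over 751 IDs contributes log(751) per sample.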
In some embodiments, the trained pedestrian detection model and the pedestrian re-identification model are quantized by the network compression module, including:
the weight and activation values of the trained model are quantized and compressed: each 32-bit floating-point weight or activation value is replaced with an 8-bit fixed-point number, compressing the original signed continuous values into a discrete value domain of only 2^8 fixed-point numbers. The quantization is:

$$x_Q = \mathrm{round}\left(\frac{\mathrm{clamp}(x, X_{min}, X_{max}) - X_{min}}{\Delta}\right)$$

where x is a weight or activation value before quantization, x_Q is the value after quantization, and X_max and X_min are the maximum and minimum values of the 32-bit floating-point range. The clamp function limits the weight and activation values to a given interval:

$$\mathrm{clamp}(x, a, b) = \min(\max(x, a), b)$$

where a and b are the given limit values. Δ is the scaling factor for compressing from the 32-bit floating-point range to an 8-bit fixed-point number:

$$\Delta = \frac{X_{max} - X_{min}}{2^8 - 1}$$
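The quantization scheme above can be sketched as below, assuming unsigned 8-bit codes and a uniform affine mapping (the function names are illustrative, and real NPU toolchains handle rounding modes and per-channel scales differently):

```python
def quantize(x, x_min, x_max, bits=8):
    """Map a 32-bit float to an integer code in [0, 2**bits - 1]."""
    delta = (x_max - x_min) / (2 ** bits - 1)   # scaling factor Δ
    x_c = min(max(x, x_min), x_max)             # clamp(x, X_min, X_max)
    return round((x_c - x_min) / delta)

def dequantize(q, x_min, x_max, bits=8):
    """Recover the approximate float value from its integer code."""
    delta = (x_max - x_min) / (2 ** bits - 1)
    return x_min + q * delta
```

The round-trip error is bounded by one quantization step Δ, which is the price paid for the 4x reduction in storage.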
In a second aspect, the present invention provides a pedestrian detection and re-identification apparatus, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.
In a third aspect, a pedestrian detection and re-identification system includes the pedestrian detection and re-identification device of the second aspect.
The pedestrian detection and re-identification system further comprises:
the training data set building module is configured to acquire labeled pedestrian detection and pedestrian re-identification data sets and divide each into a training data set and a validation data set in proportion, and to perform data enhancement on samples in the training data set, where the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
the model construction training module is configured to perform iterative optimization training on the pedestrian detection model and the pedestrian re-recognition model by using the training data set subjected to data enhancement processing until an iteration stop condition is met, so as to obtain a trained pedestrian detection model and a trained pedestrian re-recognition model;
the network compression module is configured to quantize the trained pedestrian detection model and the trained pedestrian re-identification model, and acquire and store the quantized pedestrian detection model and the quantized pedestrian re-identification model;
and the deployment module is configured to convert the quantized pedestrian detection and re-identification models into a model format supported by the NPU of the embedded platform, for use on the embedded platform with limited computing resources.
Beneficial effects: the invention discloses a pedestrian detection and re-identification method, device and system. Through the technical scheme of the invention, pedestrian targets are recognized in real time on an embedded platform with limited computing resources; the model size is compressed by multiple means while the model's prediction accuracy is preserved, effectively increasing the speed of target detection and re-identification. The invention has the following advantages:
1. the invention provides a pedestrian detection and re-identification method and device based on an embedded platform with limited computing resources, which can realize real-time detection and re-identification of pedestrians in videos or images on an embedded hardware platform.
2. The invention improves on the YOLOX algorithm for pedestrian detection, increasing both the speed and the accuracy of pedestrian target detection.
3. The invention realizes a pedestrian re-identification algorithm based on the ResNet18 network, and realizes effective utilization of the representation information of the pedestrian target picture.
4. The invention applies quantization compression to the network model, reducing the model size and increasing the model's inference speed.
Drawings
Fig. 1 is a schematic diagram of a pedestrian detection and re-identification system according to an embodiment of the invention.
Fig. 2 is a flow chart of implementing mosaic data enhancement in the data preprocessing module according to the embodiment of the present invention.
Fig. 3 is a network structure of a pedestrian detection model improved based on YOLOX in the embodiment of the present invention.
Fig. 4 is a network structure of a pedestrian re-identification model designed based on ResNet18 in the embodiment of the present invention.
FIG. 5 is a flow chart of a method for pedestrian detection and re-identification on an embedded device by a deployment module according to the present invention;
FIG. 6 shows the accuracy and parameter count of the improved YOLOX-based pedestrian detection model in the prediction module on the WiderPerson data set;
FIG. 7 shows the accuracy and precision of a pedestrian re-identification model designed based on ResNet18 in a prediction module on a Market1501 data set;
fig. 8 is a diagram illustrating a result of predicting a frame in a surveillance video by the method and apparatus for pedestrian detection and re-identification based on an embedded platform with limited computing resources according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; "above", "below", "exceeding" and the like are understood as excluding the stated number, while "at least", "within" and the like are understood as including it. Where "first" and "second" are used to distinguish technical features, they are not to be understood as indicating or implying relative importance, the number of the indicated features, or their precedence.
In the description of the present invention, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Example 1
A pedestrian detection and re-identification method, comprising:
acquiring a video image;
preprocessing a video image to obtain a preprocessed video image;
inputting the preprocessed video image into a pre-trained pedestrian detection model, and determining a pedestrian target and coordinate information according to the output of the pedestrian detection model;
cutting out a pedestrian target image on the preprocessed video image according to the determined pedestrian target and the coordinate information;
extracting features from the pedestrian target image using a pre-trained pedestrian re-identification model to obtain the pedestrian target features;
acquiring pedestrian characteristics in a pedestrian database, wherein the pedestrian database comprises a plurality of pedestrian characteristics with pedestrian ID tags;
calculating Euclidean distances between the pedestrian target features and pedestrian features in a pedestrian database, and selecting a pedestrian ID which has the minimum Euclidean distance with the pedestrian target features and meets the condition of an identification threshold value as a pedestrian re-identification result;
and drawing a rectangular frame in the region where the pedestrian target is located on the original video image and identifying the pedestrian ID according to the pedestrian target, the coordinate information and the pedestrian re-identification result.
Preprocessing a video image, comprising:
and decoding the video stream, acquiring and storing the original video image, and scaling the original video image to match the input size of the pedestrian detection model.
In some embodiments, the method for constructing the pedestrian detection model includes:
the pedestrian detection model is constructed on the YOLO network model framework and comprises a backbone feature extraction network, an enhanced feature extraction network and a prediction network;
the backbone feature extraction network adopts CSPNet and MobileNetV3 in place of the original CSPDarkNet, and is configured to abstract and extract features of the picture, generating three deep feature maps;
the enhanced feature extraction network adopts an FPN to construct a three-level feature pyramid, and is configured to up-sample the three deep feature maps for feature fusion and then down-sample for further feature fusion, obtaining feature information at three different scales;
the prediction network replaces the original coupled head with a decoupled prediction head composed of three detection heads with different functions: a classification head, a regression head and an anchor-point prediction head; it is configured to perform classification, bounding-box regression and anchor-point prediction on the feature information at the three scales to obtain the prediction result.
In some embodiments, the method for constructing the pedestrian re-identification model includes:
the pedestrian re-identification model selects ResNet18 as the backbone feature extraction network, extracting 512-dimensional pedestrian features from the input image;
during training, a Neck network is added after the backbone feature extraction network of the pedestrian re-identification model: a BatchNorm1d operation is applied to the features output by the backbone, and the output features are finally raised to 751 dimensions to predict 751 pedestrian IDs.
In some embodiments, the training method of the pedestrian detection model includes:
acquiring a pedestrian detection image data set with a label, and dividing the pedestrian detection image data set into a training data set and a verification data set according to a proportion;
performing data enhancement on samples in the training data set, where the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
and, taking minimization of the composite loss function as the objective, performing iterative optimization training on the pedestrian detection model with the data-enhanced training data set until the iteration stop condition is met, to obtain the trained pedestrian detection model.
The pedestrian detection model adopts a composite loss function that is the sum of three parts: category prediction loss, confidence prediction loss and bounding-box regression loss. The bounding-box regression loss adopts the GIoU loss:

$$L_{GIoU} = 1 - \mathrm{GIoU} = 1 - \left(\mathrm{IoU} - \frac{|C - (A \cup B)|}{|C|}\right)$$

where IoU is the ratio of the intersection to the union of the predicted box and the ground-truth box, C is the smallest enclosing rectangle of the ground-truth box A given by the label and the predicted box B given by the model, A ∪ B is the union of the real and predicted boxes, and C − (A ∪ B) is the area of C not covered by A and B.
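The mosaic data enhancement mentioned in the training steps above can be sketched as below; real mosaic augmentation samples a random stitching center and remaps the box labels accordingly, so this fixed-quadrant version is a simplified assumption, not the patent's exact procedure:

```python
import numpy as np

def mosaic(imgs, out_size=640):
    """Stitch four training images into one mosaic sample (fixed quadrants)."""
    assert len(imgs) == 4
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    half = out_size // 2
    for img, (y, x) in zip(imgs, [(0, 0), (0, half), (half, 0), (half, half)]):
        h, w = min(half, img.shape[0]), min(half, img.shape[1])
        canvas[y:y + h, x:x + w] = img[:h, :w]   # top-left crop into its quadrant
    return canvas
```

Mosaic mixes contexts and scales within one sample, which is why it helps a detector trained on limited pedestrian data.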
In some embodiments, the training method of the pedestrian re-identification model includes:
acquiring a pedestrian re-identification data set with a label, and dividing the pedestrian re-identification data set into a training data set and a verification data set according to a proportion;
performing data enhancement on samples in the training data set, where the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
and, taking minimization of the identity loss function as the objective, performing iterative optimization training on the pedestrian re-identification model with the data-enhanced training data set until the iteration stop condition is met, to obtain the trained pedestrian re-identification model.
The loss function adopted by the pedestrian re-identification model is the identity loss:

$$L_{id} = -\frac{1}{n}\sum_{i=1}^{n} \log p(y_i \mid x_i)$$

where ID denotes the pedestrian category label, n is the total number of pedestrian categories in the re-identification training set, and p(y_i | x_i) is the probability that input image x_i is predicted as pedestrian label y_i after softmax classification.
In some embodiments, the trained pedestrian detection model and the pedestrian re-identification model are quantized by the network compression module, including:
the weight values and activation values of the trained model are quantized and compressed; the quantization process replaces the 32-bit-wide floating-point weight and activation values with 8-bit-wide fixed-point numbers, quantizing and compressing the original signed continuous values to a discrete value domain consisting of only 2^8 fixed-point numbers. The quantization process is as follows:
x_Q = \mathrm{round}\left(\frac{\mathrm{clamp}(x, X_{min}, X_{max})}{\Delta}\right)
wherein x represents a weight or activation value before quantization, x_Q represents the value after quantization, and X_{max} and X_{min} represent the maximum and minimum values of the 32-bit-wide floating-point numbers; the clamp function limits the weight and activation values to a given interval, formulated as:
\mathrm{clamp}(x, a, b) = \min(\max(x, a), b)
the Δ function represents the scaling factor for compressing a 32-bit-wide floating-point number to an 8-bit-wide fixed-point number, formulated as:
\Delta = \frac{X_{max} - X_{min}}{2^{8} - 1}
wherein a and b are given limit values.
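As a concrete illustration of the quantization scheme above, here is a minimal NumPy sketch. The function names (`quantize_int8`, `dequantize`) and the zero-point-free mapping are assumptions for illustration, not the patent's exact implementation:

```python
import numpy as np

def quantize_int8(x, x_min, x_max):
    """Quantize 32-bit float values to 8-bit fixed point: clamp, then scale by delta and round."""
    delta = (x_max - x_min) / (2**8 - 1)           # scaling factor delta
    x_clamped = np.clip(x, x_min, x_max)           # clamp to [X_min, X_max]
    x_q = np.round(x_clamped / delta).astype(np.int32)
    return x_q, delta

def dequantize(x_q, delta):
    """Recover an approximate float value from the fixed-point representation."""
    return x_q * delta

weights = np.array([-1.5, 0.0, 0.7, 2.3], dtype=np.float32)
x_q, delta = quantize_int8(weights, float(weights.min()), float(weights.max()))
approx = dequantize(x_q, delta)
max_err = float(np.max(np.abs(weights - approx)))  # error is bounded by delta/2
```

Since each value is rounded to the nearest of 2^8 levels, the reconstruction error per value is at most Δ/2, which is why 8-bit post-training quantization usually costs little accuracy.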
In some embodiments, as shown in fig. 5, the method specifically includes:
Step 1: video image acquisition. Acquire the real-time video stream shot by the camera module on the embedded device and preprocess it. The preprocessing comprises decoding the video stream, acquiring and storing the original video image, calling the image signal processor (ISP) to scale the original video image while recording the scaling ratio, and scaling the original video image from 1920×1080 to 640×640 as the input image of the pedestrian detection model.
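The preprocessing in step 1 can be sketched as follows; a real deployment would use the ISP or an optimized resize, so the pure-NumPy nearest-neighbour resize and the function name `preprocess` are illustrative assumptions:

```python
import numpy as np

def preprocess(frame, dst_h=640, dst_w=640):
    """Nearest-neighbour resize of a decoded frame to the detector input size,
    returning the scale factors needed later to map boxes back (step 5)."""
    src_h, src_w = frame.shape[:2]
    rows = (np.arange(dst_h) * src_h / dst_h).astype(int)   # source row per output row
    cols = (np.arange(dst_w) * src_w / dst_w).astype(int)   # source col per output col
    resized = frame[rows][:, cols]
    return resized, src_w / dst_w, src_h / dst_h

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # a decoded 1920x1080 camera frame
img, sx, sy = preprocess(frame)
print(img.shape, sx, sy)   # (640, 640, 3) 3.0 1.6875
```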
Step 2: run inference on the real-time video image obtained in step 1 using the pedestrian detection network model deployed on the embedded device, obtain the inference result (x, y, w, h, c, cls) for each detected target, judge whether the confidence reaches the set threshold, and treat detection results meeting the condition as pedestrian targets.
Step 3: acquire the inference information (x, y, w, h) of the pedestrian targets detected in step 2, crop the pedestrian target images out of the 640×640 image, and scale them to a size of 256×128.
Step 4: acquire the pedestrian target images and coordinate information processed in step 3, call the pedestrian re-recognition model deployed on the embedded platform to extract the features of each pedestrian target, compare them with the pedestrian features in the pedestrian database, and calculate the Euclidean distance between the features, formulated as:
d(f_1, f_2) = \sqrt{\sum_{k=1}^{512} (f_{1,k} - f_{2,k})^2}
Select the pedestrian ID with the smallest Euclidean distance to the pedestrian target features, compare that distance with the set identification threshold, and package pedestrian IDs meeting the judgment condition into the re-identification inference result (x, y, w, h, ID).
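Step 4's nearest-neighbour matching against the pedestrian database can be sketched as below; the helper name `match_identity`, the toy 2-D features (standing in for the 512-D vectors) and the threshold value are assumptions for illustration:

```python
import numpy as np

def match_identity(query, gallery, ids, threshold):
    """Return the gallery ID whose feature has the smallest Euclidean distance
    to the query feature, or None if even the best distance exceeds the threshold."""
    dists = np.sqrt(np.sum((gallery - query) ** 2, axis=1))
    best = int(np.argmin(dists))
    if dists[best] <= threshold:
        return ids[best], float(dists[best])
    return None, float(dists[best])

gallery = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy 2-D stand-ins for 512-D features
ids = [7, 42]
pid, d = match_identity(np.array([0.9, 0.1]), gallery, ids, threshold=0.5)
print(pid, round(d, 4))   # 7 0.1414
```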
Step 5: acquire the original video image frame stored in step 1, compute the scaling ratio, and render the output of step 4 on the original video image in coordinate form to obtain a visualized inference image: display the region where each pedestrian target is located on the original video image, draw a rectangular box and the pedestrian ID, and output the inference image to the display device frame by frame.
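Mapping detections back to the original frame in step 5 is a simple rescaling by the ratios recorded during preprocessing; the helper `to_original_coords` is hypothetical:

```python
def to_original_coords(box, sx, sy):
    """Map a detection (x, y, w, h) from the 640x640 model input back to the
    original frame using the horizontal/vertical scale factors from step 1."""
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)

sx, sy = 1920 / 640, 1080 / 640     # scales recorded in step 1
out = to_original_coords((100, 200, 50, 80), sx, sy)
print(out)   # (300.0, 337.5, 150.0, 135.0)
```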
Embodiment 2
In a second aspect, the present embodiment provides a pedestrian detection and re-identification apparatus, including a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to embodiment 1.
Embodiment 3
A pedestrian detection and re-identification system comprises the pedestrian detection and re-identification device and further comprises:
the training data set building module is configured to acquire a labeled pedestrian detection data set and a labeled pedestrian re-identification data set and divide each into a training data set and a verification data set according to a proportion; and to perform data enhancement processing on samples in the training data set, wherein the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
the model construction training module is configured to perform iterative optimization training on the pedestrian detection model and the pedestrian re-identification model by using the training data set subjected to data enhancement until an iteration stop condition is met, so as to obtain a trained pedestrian detection model and a trained pedestrian re-identification model;
the network compression module is configured to quantize the trained pedestrian detection model and the trained pedestrian re-identification model, and acquire and store the quantized pedestrian detection model and the quantized pedestrian re-identification model;
and the deployment module is configured to convert the quantized pedestrian detection model and pedestrian re-identification model into a model format supported by the NPU of the embedded platform, for use on embedded platforms with limited computing resources.
In some embodiments, fig. 1 shows a schematic diagram of the system.
The invention relates to a pedestrian detection and re-identification system based on an embedded platform with limited computing resources, which comprises a data processing module, a prediction module, a network compression module and a deployment module, and specifically comprises the following steps:
Step 1: the data processing module acquires the data sets for training the pedestrian detection network and the pedestrian re-identification network and performs data preprocessing operations (random horizontal flipping, changing image attributes, and mosaic data enhancement) on the input images, comprising the following steps:
step 1-1, acquiring a data set of pedestrian images required by a pedestrian detection model, carrying out pedestrian labeling on the images in the data set, and converting a labeled pedestrian image file into a data format conforming to a YOLOX network model.
And 1-2, acquiring a data set of the pedestrian image required by pedestrian re-identification, and classifying the data set according to the pedestrian ID and the camera number.
Through the steps 1-1 and 1-2, the obtained pedestrian detection data set and the pedestrian re-identification data set are divided into a training data set and a verification data set according to the proportion of 4:1 respectively, and data preprocessing operation is carried out on the training data set.
Step 1-3: during model training, the data processing module performs data preprocessing operations (random horizontal flipping, changing image attributes, and mosaic data enhancement) on the pedestrian detection data set and the pedestrian re-identification data set to expand the training set.
Referring to fig. 2, mosaic data enhancement takes a batch of data from the data set, each time randomly selects 4 images from the batch, picks a random position for image stitching, synthesizes a new image, and stores it in the data set, repeating this batch-size times.
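The mosaic step above can be sketched as follows. This is a simplified illustration: real mosaic augmentation also remaps the bounding-box labels into the stitched image, which is omitted here, and the function name and fixed RNG seed are assumptions:

```python
import numpy as np

def mosaic(images, out_size=640):
    """Stitch 4 equally-sized images into one canvas around a random centre point."""
    rng = np.random.default_rng(0)
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))   # random stitch centre
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    regions = [(slice(0, cy), slice(0, cx)), (slice(0, cy), slice(cx, out_size)),
               (slice(cy, out_size), slice(0, cx)), (slice(cy, out_size), slice(cx, out_size))]
    for img, (rs, cs) in zip(images, regions):
        h, w = rs.stop - rs.start, cs.stop - cs.start
        canvas[rs, cs] = img[:h, :w]          # crop each source image to its quadrant
    return canvas

imgs = [np.full((640, 640, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
m = mosaic(imgs)
print(m.shape)   # (640, 640, 3)
```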
And step 2, a prediction module is used for setting a network model for carrying out pedestrian target detection on the image and a network model for carrying out re-identification on the identity of the pedestrian.
Step 2-1: the pedestrian detection network model adopted by this module is based on the YOLOX network model and is decomposed into a backbone feature extraction network, an enhanced feature extraction network and a prediction network. The computationally expensive backbone CSPDarknet53 in YOLOX is replaced with a CSPNet network combined with the lightweight MobileNetV3 network, extracting pedestrian features while improving the model's detection speed. An FPN network is selected as the enhanced feature extraction network, constructing a 3-level feature pyramid to strengthen the model's ability to detect pedestrians of different sizes. The coupled head of the prediction network is replaced with a decoupled head of three prediction branches, used respectively for classification, boundary regression and anchor point prediction, improving the detection precision of the pedestrian detection model. The structure of the pedestrian detection model is shown in fig. 3.
Step 2-2: a larger input resolution improves detection accuracy, but too large a resolution increases the network's computation, so the network input is adjusted to an appropriate size; in the pedestrian detection network model, the input image size is enlarged from 460×460 to 640×640, i.e. the input resolution is fixed at 640×640.
Step 2-3: after the pedestrian image passes through the pedestrian detection network (backbone feature extraction network, enhanced feature extraction network and prediction network), the three detection heads output tensors of 400×6, 1600×6 and 6400×6 respectively; after the fully connected layer and one transposition operation, the output tensor dimension is 6×8400, where 8400 is the number of prediction boxes and 6 is the information carried by each prediction box (the classification head predicts the class confidence cls_conf, the regression head predicts the box coordinate regression bbox_reg, and the anchor head predicts the objectness confidence obj_conf). In addition, each layer of the pedestrian detection network model uses the SiLU activation function: f(x) = x · sigmoid(x).
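The SiLU activation mentioned above is a one-liner; a NumPy sketch:

```python
import numpy as np

def silu(x):
    """SiLU activation used throughout the detector: f(x) = x * sigmoid(x),
    written as x / (1 + exp(-x))."""
    return x / (1.0 + np.exp(-x))

print(silu(np.array([0.0])))   # [0.]
```

For large positive inputs SiLU approaches the identity, while for large negative inputs it approaches zero, giving a smooth alternative to ReLU.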
Step 2-4: fuse the prediction results of the three detection heads to obtain a group of bboxes, each consisting of (x, y, w, h, c, cls), where x and y are the center-point coordinates and w and h the width and height of the regression box, and c is the confidence that the target is foreground. The confidence c also represents the intersection-over-union of the bbox with the ground-truth box, formulated as:
c = \Pr(\text{object}) \times IoU^{truth}_{pred}
cls represents the predicted class value of the detected object; since the pedestrian detection model detects only the pedestrian target, there is a single class:
cls = \Pr(\text{person} \mid \text{object})
During training, if an object is detected, the loss function value of the pedestrian detection network model is the sum of the class prediction loss, the confidence prediction loss and the bounding box regression loss, where the bounding box regression loss adopts the GIoU loss:
L_{GIoU} = 1 - IoU + \frac{|C - (A \cup B)|}{|C|}
wherein IoU is the intersection-over-union of the predicted box and the ground-truth box, C is the minimum enclosing rectangle of the ground-truth box given by the label and the predicted box given by the model, A∪B is the union of the ground-truth and predicted boxes, and C − (A∪B) is the area of C not covered by A and B.
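The GIoU loss above can be sketched for axis-aligned boxes as follows; the corner-coordinate box format and function name are assumptions for illustration:

```python
def giou_loss(box_a, box_b):
    """GIoU loss for boxes given as (x1, y1, x2, y2):
    GIoU = IoU - |C \\ (A U B)| / |C|, loss = 1 - GIoU."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # C: smallest enclosing rectangle of both boxes
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou

print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # 0.0 for a perfect match
```

Unlike plain IoU, the enclosing-rectangle penalty gives a nonzero gradient even when the boxes do not overlap, which is why GIoU is preferred for box regression.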
Step 2-5: the backbone of the pedestrian re-identification network model adopted by this module is ResNet18; the last two layers of ResNet18 (the average pooling layer and the fully connected layer) are removed to obtain a more effective feature map, and a 2-D adaptive average pooling layer (AdaptiveAvgPool2d) is connected, efficiently extracting a pedestrian target feature tensor of dimension 1×512 from the pedestrian target image.
And 2-6, considering the difference of the aspect ratio of the pedestrian target images in the pedestrian re-identification data set, performing cluster analysis on the aspect ratio, and selecting the size of the pedestrian target image as 256 × 128, namely fixing the resolution of the input image of the pedestrian re-identification model as 256 × 128.
Step 2-7: when the network compression module triggers training, a hidden enhanced feature extraction network of the pedestrian re-recognition model adopted by this module is activated; this network is a fully connected head composed of a BatchNorm1d layer followed by a Linear layer, finally producing a 751-dimensional feature vector. The model structure is shown in fig. 4.
Step 2-8: the pedestrian re-identification model adopted by this module regards pedestrian re-identification as an image classification problem; each ID is a pedestrian target class, the training set contains 751 pedestrian IDs in total, and the adopted loss function is the Identity loss, which gives the predicted probability between the current pedestrian target and the 751 pedestrian category labels:
L_{id} = -\sum_{i=1}^{n} \log p(y_i \mid x_i)
wherein ID represents the pedestrian category label, n represents the total number of pedestrian categories in the pedestrian re-identification training set, and p(y_i | x_i) represents the probability that an input image x_i is predicted as pedestrian label y_i after softmax classification.
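A minimal NumPy sketch of this softmax cross-entropy ID loss for a single image; the toy 3-class scores stand in for the 751-dimensional head, and the function name is an assumption:

```python
import numpy as np

def identity_loss(logits, label):
    """Cross-entropy ID loss: softmax over the pedestrian classes followed by
    the negative log-probability of the ground-truth ID."""
    z = logits - logits.max()                 # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()           # softmax: p(y_i | x_i)
    return -np.log(p[label])

logits = np.array([2.0, 0.5, -1.0])           # toy scores over 3 IDs (751 in the text)
loss = float(identity_loss(logits, 0))
print(round(loss, 4))
```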
Through step 2, the pedestrian detection network model and pedestrian re-identification network model designed by this module are easy to construct, and the model structures are comparatively simple.
Step 3: the network compression module triggers the data processing module to read the training-set data so as to train the parameters of the pedestrian detection and re-identification network models set in the prediction module, acquires the weight values and activation values of the trained pedestrian detection network and pedestrian re-identification network from the prediction module for quantization compression, and stores the quantized network models.
Step 3-1: set the training algorithm and training parameters for the preset network models in this module. Both models are trained with batch gradient descent (BGD); the total number of iterations (epochs) is 60, the training batch size is 32, the initial learning rate (lr) is 0.003, the learning rate is adjusted on a fixed interval schedule with a decay rate (gamma) of 0.0005, and the learning rate is updated every 5 epochs.
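The interval-based schedule above can be sketched as a step decay; note that the gamma value used here (0.1) is a hypothetical multiplicative factor chosen for illustration, since the text's 0.0005 may denote a different decay convention:

```python
def step_lr(epoch, base_lr=0.003, gamma=0.1, step=5):
    """Step-decay schedule: multiply the learning rate by gamma every `step` epochs."""
    return base_lr * (gamma ** (epoch // step))

schedule = [step_lr(e) for e in (0, 4, 5, 10)]
print(schedule)
```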
Step 3-2: the weight values and activation values of the trained model are quantized and compressed; the quantization process can be understood as replacing 32-bit-wide floating-point numbers with 8-bit-wide fixed-point numbers, quantizing and compressing the original signed continuous values to a discrete value domain consisting of only 2^8 fixed-point numbers. The quantization process is as follows:
x_Q = \mathrm{round}\left(\frac{\mathrm{clamp}(x, X_{min}, X_{max})}{\Delta}\right)
wherein x represents a weight or activation value before quantization, x_Q represents the value after quantization, and X_{max} and X_{min} represent the maximum and minimum values of the 32-bit-wide floating-point numbers; the clamp function limits the weight and activation values to a given interval, formulated as:
\mathrm{clamp}(x, a, b) = \min(\max(x, a), b)
wherein a and b are given limit values;
the Δ function represents the scaling factor for compressing a 32-bit-wide floating-point number to an 8-bit-wide fixed-point number, formulated as:
\Delta = \frac{X_{max} - X_{min}}{2^{8} - 1}
and 3-3, testing the quantized network model on the corresponding data set to obtain the detection precision and the recognition precision of the model, and when the precision of the model cannot meet the system requirement, modifying the training method according to the steps 3-1 and 3-2, increasing the total training iteration times, and adjusting the quantization parameters until a pedestrian detection model and a pedestrian re-recognition model meeting the system requirement are obtained for the subsequent deployment module to deploy in the embedded equipment.
Step 4: the deployment module converts the quantized pedestrian detection model and pedestrian re-identification model from the PT model format into a model format that the NPU (neural processing unit) can process, and builds the pedestrian detection and re-identification apparatus on the embedded device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (10)

1. A pedestrian detection and re-identification method is characterized by comprising the following steps:
acquiring a video image;
preprocessing a video image to obtain a preprocessed video image;
inputting the preprocessed video image into a pre-trained pedestrian detection model, and determining a pedestrian target and coordinate information according to the output of the pedestrian detection model;
cutting out a pedestrian target image from the preprocessed video image according to the determined pedestrian target and the coordinate information;
extracting the pedestrian target image by using a pre-trained pedestrian re-recognition model to obtain pedestrian target features;
acquiring pedestrian characteristics in a pedestrian database, wherein the pedestrian database comprises a plurality of pedestrian characteristics with pedestrian ID labels;
calculating Euclidean distances between the pedestrian target features and pedestrian features in a pedestrian database, and selecting a pedestrian ID which has the minimum Euclidean distance with the pedestrian target features and meets the condition of an identification threshold value as a pedestrian re-identification result;
and drawing a rectangular frame in the region where the pedestrian target is located on the original video image and identifying the pedestrian ID according to the pedestrian target, the coordinate information and the pedestrian re-identification result.
2. The pedestrian detection and re-identification method according to claim 1, wherein preprocessing the video image comprises:
and decoding the video stream, acquiring and storing an original video image, and zooming the original video image to a size which accords with the input size of the pedestrian detection model.
3. The pedestrian detection and re-identification method according to claim 1, wherein the construction method of the pedestrian detection model comprises:
the pedestrian detection model is constructed based on a YOLO network model framework and comprises a trunk feature extraction network, a reinforced feature extraction network and a prediction network;
the backbone feature extraction network adopts CSPNet and MobileNetV3 in place of the original CSPDarkNet, and is configured to abstract and extract features of the picture to generate three deep feature maps;
the enhanced feature extraction network adopts an FPN network to construct a third-order pyramid, and is configured to up-sample the three deep feature maps for feature fusion and then down-sample for further feature fusion, obtaining feature information at three different scales;
the prediction network adopts a decoupled head of prediction branches in place of the original coupled head, the decoupled head consisting of three detection heads with different functions, namely a classification head, a regression head and an anchor point prediction head; it is configured to classify, perform boundary regression on, and perform anchor point prediction on the feature information at the three different scales to obtain a prediction result.
4. The pedestrian detection and re-identification method according to claim 1, wherein the construction method of the pedestrian re-identification model comprises:
the pedestrian re-identification model selects ResNet18 as a main feature extraction network, and 512-dimensional pedestrian features are extracted from the input image;
during training, a Neck network is added after a main feature extraction network of the pedestrian re-recognition model, batchNorm1d operation is carried out on features output by the main feature extraction network, and finally the output features are subjected to dimension raising to 751 dimension for predicting 751 pedestrian IDs.
5. The pedestrian detection and re-identification method according to claim 1, wherein the training method of the pedestrian detection model comprises:
acquiring a pedestrian detection image data set with a label, and dividing the pedestrian detection image data set into a training data set and a verification data set according to a proportion;
performing data enhancement processing on samples in the training data set, wherein the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
performing iterative optimization training on the pedestrian detection model by using the training data set subjected to data enhancement processing until an iteration stop condition is met by taking the minimized composite loss function as a target to obtain a trained pedestrian detection model;
and/or the training method of the pedestrian re-recognition model comprises the following steps:
acquiring a pedestrian re-identification data set with a label, and dividing the pedestrian re-identification data set into a training data set and a verification data set according to a proportion;
performing data enhancement processing on samples in the training data set, wherein the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
taking minimization of the Identity loss function as the goal, performing iterative optimization training on the pedestrian re-recognition model with the data-enhanced training data set until an iteration stop condition is met, to obtain a trained pedestrian re-recognition model.
6. The pedestrian detection and re-identification method of claim 1, wherein the pedestrian detection model employs a composite loss function that is the sum of three parts, namely the class prediction loss, the confidence prediction loss and the bounding box regression loss, wherein the bounding box regression loss function employs the GIoU loss;
and/or the loss function adopted by the pedestrian re-identification model is the Identity loss, formulated as:
L_{id} = -\sum_{i=1}^{n} \log p(y_i \mid x_i)
wherein ID represents the pedestrian category label, n represents the total number of pedestrian categories in the pedestrian re-identification training set, and p(y_i | x_i) represents the probability that an input image x_i is predicted as pedestrian label y_i after softmax classification.
7. The pedestrian detection and re-identification method according to claim 1, wherein the trained pedestrian detection model and pedestrian re-identification model are quantized by a network compression module, comprising:
the weight values and activation values of the trained model are quantized and compressed; the quantization process replaces the 32-bit-wide floating-point weight and activation values with 8-bit-wide fixed-point numbers, quantizing and compressing the original signed continuous values to a discrete value domain consisting of only 2^8 fixed-point numbers. The quantization process is as follows:
x_Q = \mathrm{round}\left(\frac{\mathrm{clamp}(x, X_{min}, X_{max})}{\Delta}\right)
wherein x represents a weight or activation value before quantization, x_Q represents the value after quantization, and X_{max} and X_{min} represent the maximum and minimum values of the 32-bit-wide floating-point numbers; the clamp function limits the weight and activation values to a given interval, formulated as:
\mathrm{clamp}(x, a, b) = \min(\max(x, a), b)
the Δ function represents the scaling factor for compressing a 32-bit-wide floating-point number to an 8-bit-wide fixed-point number, formulated as:
\Delta = \frac{X_{max} - X_{min}}{2^{8} - 1}
wherein a and b are given limit values.
8. A pedestrian detection and re-identification device is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 7.
9. A pedestrian detection and re-identification system comprising the pedestrian detection and re-identification device of claim 8.
10. The pedestrian detection and re-identification system of claim 9, further comprising:
the training data set construction module is configured to acquire a labeled pedestrian detection data set and a labeled pedestrian re-identification data set, and divide each into a training data set and a verification data set according to a proportion; and to perform data enhancement processing on samples in the training data set, wherein the data enhancement comprises: random horizontal flipping, changing image attributes, and mosaic data enhancement;
the model construction training module is configured to perform iterative optimization training on the pedestrian detection model and the pedestrian re-identification model by using the training data set subjected to data enhancement until an iteration stop condition is met, so as to obtain a trained pedestrian detection model and a trained pedestrian re-identification model;
the network compression module is configured to quantize the trained pedestrian detection model and the trained pedestrian re-identification model, and acquire and store the quantized pedestrian detection model and the quantized pedestrian re-identification model;
and the deployment module is configured to convert the quantized pedestrian detection model and pedestrian re-identification model into a model format supported by the NPU (neural processing unit) of the embedded platform, for use on embedded platforms with limited computing resources.
CN202210838515.3A 2022-07-18 2022-07-18 Pedestrian detection and re-identification method, device and system Pending CN115187906A (en)

Publications (1)

Publication Number Publication Date
CN115187906A true CN115187906A (en) 2022-10-14


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination