CN117173794B - Pedestrian re-identification method suitable for edge equipment deployment - Google Patents


Info

Publication number
CN117173794B
CN117173794B (application CN202311450373.4A)
Authority
CN
China
Prior art keywords
pedestrian
frame
target
model
filtering
Prior art date
Legal status
Active
Application number
CN202311450373.4A
Other languages
Chinese (zh)
Other versions
CN117173794A (en)
Inventor
区英杰
符桂铭
董万里
谭焯康
Current Assignee
Guangzhou Embedded Machine Tech Co ltd
Original Assignee
Guangzhou Embedded Machine Tech Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Embedded Machine Tech Co ltd filed Critical Guangzhou Embedded Machine Tech Co ltd
Priority to CN202311450373.4A priority Critical patent/CN117173794B/en
Publication of CN117173794A publication Critical patent/CN117173794A/en
Application granted granted Critical
Publication of CN117173794B publication Critical patent/CN117173794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a pedestrian re-identification method suitable for edge device deployment, comprising the following steps: obtain a pedestrian envelope frame on each frame of image, assign an id to each pedestrian envelope frame, and track it; filter the pedestrian envelope frames sharing the same id; extract pedestrian features and pedestrian attributes, fuse the features and attributes belonging to the same id, and import the fused pedestrian features into a database; in the database, apply mean-shift clustering to further fuse pedestrian features across different ids and remove redundant ids, finally obtaining pedestrian feature and attribute information per id; finally, perform pedestrian re-identification. The proposed pedestrian re-identification method reduces the amount of computation while guaranteeing the accuracy of the final query, making it better suited to edge-device deployment.

Description

Pedestrian re-identification method suitable for edge equipment deployment
Technical Field
The invention relates to the field of edge calculation, in particular to a pedestrian re-identification method suitable for edge equipment deployment.
Background
Pedestrian re-identification is a research hotspot in the field of computer vision. It enables retrieval of the same pedestrian across different surveillance cameras and can be used for lost-person search, pedestrian trajectory generation and activity-area localization, with broad application prospects in mall and campus management.
Existing deep-learning-based pedestrian re-identification methods mainly compare pedestrians by fusing multiple kinds of information, such as face information, pedestrian attribute information, and global or local abstract pedestrian features; a pair with sufficiently high similarity is judged to be the same person. A pedestrian re-identification system processes a large number of surveillance videos and generates huge amounts of data; transmitting that data directly to a central server would occupy a large amount of bandwidth and consume heavy computation. With the arrival of the Internet-of-Everything era, edge computing devices are widely used to take over part or all of the central server's tasks, relieving the server's bandwidth pressure and computing energy consumption.
The technical difficulty is that traditional multi-information fusion for pedestrian re-identification requires several algorithm models and fairly complex business logic to cooperate, while the computing power of mainstream edge devices is currently limited: even when such a scheme can be implemented, its latency is high, and the proposed schemes struggle to balance accuracy and efficiency on edge devices.
For example, the invention with application number 201910099228.3, an adaptive city-wide person search method based on edge computing, reduces computation by filtering redundant video frames; it performs pedestrian detection on the uploaded video frames, runs pedestrian re-identification on a certain proportion of the detected pedestrians, and outputs their features; the edge server periodically uploads the K pedestrians most similar to the search target to the cloud server; the remaining proportion of pedestrians is re-identified by the pedestrian re-identification module on the cloud server, the recognition results of the cloud and edge servers are merged, and the K pedestrians most similar to the search target are periodically returned to the terminal. This scheme has the following problems: (1) the model extracts a single kind of pedestrian feature information, which suffers from considerable interference in practical applications; (2) redundant-frame filtering mainly uses image-histogram similarity as a global feature, easily ignoring differences in local features and thus losing effective information; (3) the feature extraction task is split proportionally between the server and the edge devices, so when their computing power or bandwidth differs greatly, synchronization problems still arise.
As another example, the invention with application number 202211708842.3, a pedestrian re-identification method, system and device with distributed edge collaborative inference, obtains the image to be processed through a distributed edge collaborative inference architecture; extracts identity features and attribute features of the image through ResNet-50 to obtain a first classification result for the pedestrian's identity; obtains attribute classification results for all pedestrians in the image from the attribute features; fuses the attribute features according to the attribute classification results to obtain fused attribute features and, from them, a second classification result for the pedestrian's identity; and performs pedestrian re-identification from the first and second classification results. This scheme has the following problems: (1) ResNet-50 is used to extract the identity and attribute features, so the model is large and edge-side inference takes long; (2) no filtering mechanism is proposed, and processing the full video sequence is computationally expensive.
As a further example, the invention with application number CN202211138091.6, a pedestrian re-identification and tracking method based on edge computing, first performs multi-angle feature extraction on pedestrians and single-camera feature detection to build a multi-angle initial feature set; then acquires pedestrian features and tracks target pedestrians; finally searches the pedestrian features and re-identifies pedestrians to obtain target pedestrian data. Multi-angle pedestrian features are collected at the scene entrance and exit, and the fused pedestrian re-identification and tracking algorithm is accelerated and deployed at the edge, providing re-identification and tracking results with good real-time performance. This scheme has the following problems: (1) it is limited in scope, since the authors constrain both the initial feature set and the collection locations, so the scheme does not apply to wider scenes; (2) FairMOT is used to perform pedestrian tracking and pedestrian feature extraction simultaneously, but FairMOT is a multi-target detection and tracking algorithm with weak ability to capture local features, so using its extracted features directly for pedestrian re-identification can cause misidentification.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provide a pedestrian re-identification method suitable for edge device deployment, starting from both model improvement and business logic, and reducing the amount of computation while guaranteeing accuracy.
The aim of the invention is achieved by the following technical scheme:
a pedestrian re-identification method suitable for edge equipment deployment comprises the following steps:
s1, obtaining pedestrian envelope frames on each frame of image through a target detection model, endowing each pedestrian envelope frame with an id by utilizing a sort multi-target tracking model, and tracking all the pedestrian envelope frames; filtering the pedestrian envelope frames with the same id;
s2, simultaneously extracting pedestrian features and pedestrian attributes from the pedestrian images through a fastreid model, fusing multi-frame pedestrian features and pedestrian attributes of the same id, importing the fused pedestrian features into a database, adopting mean shift clustering in the database, further fusing the pedestrian features among different ids, removing redundant ids, and finally obtaining pedestrian features and attribute information of different ids;
s3, performing cosine distance calculation on pedestrian features of different ids to obtain K similar pedestrians, and then performing secondary filtering by using pedestrian attribute information of the K similar pedestrians to filter attribute mismatch results in the K similar pedestrians so as to obtain the most similar pedestrian re-recognition result.
In step S1, the target detection model is the yolox_s model after lightweight processing.
The lightweight process of the yolox_s model comprises the following steps:
(1) Performing batchnorm weight-sparsity training on the yolox_s model, then pruning the channels whose bn-layer weights are smaller than a preset value;
(2) Performing finetune fine-tuning training on the pruned yolox_s model;
(3) Applying int8 quantization when the yolox_s model is ported to the edge device.
The bn layer of the yolox_s model is its feature normalization operator, here a BatchNorm2d operator, with the following formula:

y = γ · (x − μ_B) / √(σ_B² + ε) + β

where x is the input data requiring normalization, μ_B and σ_B² are the mean and variance of the batch data, ε is a small variable added to prevent the denominator from being zero, and γ and β perform the affine operation, i.e. a linear transformation of the normalized value. From the formula, the weights of the bn layer are γ and β; each channel of the output y has its own corresponding γ and β, and γ is used to judge the importance of the different channels: when γ is smaller than the preset value, the data of that channel has little downstream influence on the model, so the channel is pruned. The goal of the batchnorm weight-sparsity training procedure is to obtain a portion of smaller γ values; the overall channel pruning ratio is 40%, i.e. the channels corresponding to the smallest 40% of γ values are pruned.
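The channel-selection rule described above (cut the channels with the smallest bn-layer γ weights, 40% overall) can be sketched as follows; the helper name and the use of a plain NumPy array in place of real model weights are assumptions:

```python
import numpy as np

def channels_to_prune(bn_gamma, prune_ratio=0.40):
    """Select the channel indices whose |gamma| falls in the smallest
    `prune_ratio` fraction; these contribute least to the normalized
    output and are the ones cut during model slimming."""
    g = np.abs(np.asarray(bn_gamma, dtype=float))
    n_prune = int(len(g) * prune_ratio)
    order = np.argsort(g)            # ascending |gamma|
    return sorted(order[:n_prune].tolist())
```

In a real pipeline the returned indices would drive the structural pruning of the convolution preceding the bn layer, after which the fine-tuning step recovers accuracy.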
In step S1, the working process of the target detection model is as follows:
firstly, image data is acquired from the camera terminal, preprocessed, and input into the target detection model yolox_s for inference: the image data undergoes convolution operations in the backbone network of the target detection model yolox_s to obtain an output feature map; the feature map is then input to the output head of the target detection model yolox_s for regression and classification of the target frames.
The output head of the target detection model yolox_s has three branches in total: cls_output predicts the class and score of the target frame, with output size W×H×C, where W is the width of the output feature map, H is the height of the output feature map, and C is the number of detection classes; obj_output judges whether the target frame is foreground or background, with output size W×H×1; reg_output predicts the coordinate information (x, y, w, h) of the target frame, with output size W×H×4.
The three branches of the output head of the target detection model yolox_s are combined to obtain the final candidate prediction frame information, of size pred_num × dim, where pred_num = W×H is the number of prediction frames and dim = C+1+4 is the feature vector dimension of each prediction frame. Each prediction frame contains a feature vector of dimension dim: [cls_1, cls_2, …, cls_C, obj, x, y, w, h], where x and y are the coordinates of the target frame's center point, w and h are the width and height of the target frame, obj is the score of the target frame, and cls_1 through cls_C are the scores of target frame classes 1 through C.
Post-processing of the candidate prediction frame results: the normalized prediction frame coordinates are first rescaled to the actual pixel size, redundant prediction frames are then filtered out with non-maximum suppression, and the final pedestrian envelope frames are output.
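The non-maximum suppression used in the post-processing step can be sketched as a standard greedy NMS on corner-format boxes; the 0.5 IoU threshold is an illustrative assumption, not a value from the patent:

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression.
    boxes: (N, 4) array of (x1, y1, x2, y2); returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]  # drop heavily overlapping boxes
    return keep
```

Note the prediction frames must already be in pixel-space corner coordinates, i.e. after the center-format (x, y, w, h) outputs have been rescaled and converted.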
The preprocessing includes normalized scaling of image size and normalization of data.
In step S1, the working process of the sort multi-target tracking model is as follows:
(1) An id is assigned to each pedestrian envelope frame obtained by the target detection model;
(2) Initializing a target tracker: for each detected pedestrian target, creating a Kalman filtering target tracker for each detected pedestrian target, wherein the Kalman filtering target tracker is used for tracking the motion of the pedestrian target in the subsequent image frames;
(3) Multi-frame tracking: for each frame, the following operations are performed:
1) Data association: the Kalman-filter-predicted pedestrian envelope frame bboxprediction is matched against the pedestrian envelope frame bboxmeasurement obtained by the target detection model, using the Hungarian algorithm;
2) State update: the current state is updated using the predicted pedestrian envelope frame bboxprediction and the detected pedestrian envelope frame bboxmeasurement, obtaining the pedestrian tracking envelope frame bboxoptimal as the tracking result.
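The data-association step can be illustrated with a simplified greedy IoU matcher. The real SORT pipeline uses the Hungarian algorithm for a globally optimal assignment, so this greedy version is only a stand-in showing the bookkeeping; the names and the 0.3 threshold are assumptions:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(predictions, detections, iou_thr=0.3):
    """Greedy IoU matching between tracker predictions and detections.
    Returns a list of (prediction_index, detection_index) matches."""
    pairs = sorted(((iou(p, d), i, j)
                    for i, p in enumerate(predictions)
                    for j, d in enumerate(detections)), reverse=True)
    matched_p, matched_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < iou_thr:
            break                        # remaining pairs overlap too little
        if i not in matched_p and j not in matched_d:
            matched_p.add(i); matched_d.add(j)
            matches.append((i, j))
    return matches
```

Unmatched detections would spawn new trackers (new ids), and unmatched predictions age out after a few frames.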
In step S1, the filtering comprises space-time filtering, quality filtering and posture assessment, carried out in that order. Space-time filtering means that, for a pedestrian tracking result of an existing id, only results whose frame interval from the previous frame image is larger than a time threshold and whose spatial interval is larger than a space threshold are kept; the kept pedestrian tracking results then enter quality filtering. Quality filtering means filtering out pedestrian tracking results that do not meet the quality indices, while scoring each quality index of a pedestrian tracking result; the weighted sum of all the quality index scores is the quality score S1 of the pedestrian tracking result, and the kept results then enter posture assessment. Posture assessment divides the pedestrian posture into several attributes; the probability values of the pedestrian postures output by an attribute classification model are weighted and summed with the quality score S1 to obtain the final pedestrian quality score Sfinal.
A pedestrian tracking result that has passed space-time filtering, quality filtering and posture assessment is stored as follows: if n pedestrian tracking results are already stored under the corresponding id and n is smaller than the frame capacity for that id, the result is stored directly; if n is greater than or equal to the capacity, its pedestrian quality score Sfinal is compared with those of the stored results, and if it is larger than the minimum of the n stored scores, it replaces that minimum.
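The replace-the-minimum buffering rule for the per-id result store might look like this sketch; the function name and default capacity are assumptions:

```python
def update_buffer(buffer, new_score, new_result, n=5):
    """Keep at most n (score, result) entries for one id.
    Below capacity: always store. At capacity: replace the current
    minimum only when the new result scores higher. Returns True if stored."""
    if len(buffer) < n:
        buffer.append((new_score, new_result))
        return True
    min_idx = min(range(len(buffer)), key=lambda k: buffer[k][0])
    if new_score > buffer[min_idx][0]:
        buffer[min_idx] = (new_score, new_result)
        return True
    return False
```

This keeps each id's stored crops bounded while monotonically improving their quality over the pedestrian's lifetime in the video.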
In step S2, pedestrian features and pedestrian attributes are extracted simultaneously from the pedestrian image through the fastreid model, specifically as follows: first, the pedestrians in the image are cropped out using the kept pedestrian envelope frames and then preprocessed into standard data; the standard data is imported into the fastreid model, pedestrian features are obtained after the convolution operations, and the pedestrian attribute classification results are output by passing the pedestrian features through a linear classifier.
In step S2, the multi-frame pedestrian features are fused by weighted summation; the multi-frame pedestrian attributes are judged with a high-low double-threshold scheme, result statistics are accumulated for each attribute, and the state with the highest accumulated count is the attribute fusion result. Through fusion, one id corresponds to exactly one pedestrian feature result and one pedestrian attribute result.
Meanwhile, the invention provides:
a server comprising a processor and a memory, wherein at least one section of program is stored in the memory, and the program is loaded and executed by the processor to realize the pedestrian re-identification method suitable for edge device deployment.
A computer readable storage medium having stored therein at least one program loaded and executed by a processor to implement the pedestrian re-identification method described above as being suitable for edge device deployment.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. For quality evaluation and filtering of pedestrian pictures, multiple filters (space-time, quality and posture) and multiple quality evaluation indices (resolution, sharpness, integrity, brightness and aspect ratio) are used to evaluate and filter the pedestrian pictures. On the one hand this removes distorted pictures that hurt accuracy; on the other it filters redundant pictures that hurt computational efficiency, covering spatial, temporal and posture redundancy, and using posture as a scoring reference avoids the recognition loss caused by discarding side-posture information. Unlike the prior art, redundancy is not judged globally on the whole image frame, so local information is not lost.
2. To suit edge-side deployment, the invention uses a multi-task model to extract pedestrian features and pedestrian attributes; serving several functions at once, it effectively reduces computation while guaranteeing a richer information extraction capability. In addition, lightweight processing of the model further improves inference speed. In contrast, prior-art models are not lightened or otherwise designed for edge-side deployment, and their feature extraction results are single-purpose with limited extraction capability.
3. The invention fuses information within the same id, including both features and attributes: this captures space-time information on the one hand and reduces the amount of data to compute on the other. Across different ids, clustering fusion is also performed, reducing the redundancy caused by pedestrian tracking errors between ids. The prior art mainly fuses features; attribute fusion is rarely seen.
4. The query process of the invention makes a secondary judgment; the pedestrian query comprises two flows, feature comparison and attribute comparison, which search for and filter similar pedestrians from both the global-feature and local-feature angles, improving the accuracy of the query result.
Drawings
Fig. 1 is a flowchart of a pedestrian feature acquisition phase according to the present invention.
Fig. 2 is a flowchart of the pedestrian re-recognition phase according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
A pedestrian re-recognition method suitable for edge equipment deployment comprises two stages, namely a pedestrian characteristic acquisition stage and a pedestrian re-recognition stage.
1. Pedestrian feature acquisition phase (see FIG. 1)
1.1 pedestrian target detection
The target detection algorithm adopted is yolox_s. YOLOX takes yolov3_spp as its baseline model and improves it with tricks such as a Decoupled Head and an Anchor-free design, finally obtaining the YOLOX-Darknet53 network structure.
The pedestrian target detection algorithm flow is as follows: firstly, image data is acquired from the camera terminal and preprocessed, the preprocessing comprising standardized scaling of the image size and normalization of the data. The preprocessed image data is input into the yolox_s model for inference. The model mainly comprises a backbone network and an output head: in the backbone network the data passes through the convolution operations of operators such as convolution layers, normalization layers and activation layers to obtain an output feature map; the feature map is then input to the output head for regression and classification of the target frames. The output head has three branches in total: cls_output mainly predicts the class and score of the target frame, with output size W×H×C; obj_output mainly judges whether the target frame is foreground or background, with output size W×H×1; reg_output mainly predicts the coordinate information (x, y, w, h) of the target frame, with output size W×H×4; where W is the width of the output feature map, H is its height, and C is the number of detection classes (for example, when pedestrians are the only detection class, C is 1). Finally, the three outputs are combined to obtain the final candidate prediction frame information, of size pred_num × dim, where pred_num = W×H is the number of prediction frames and dim = C+1+4 is the feature vector dimension of each prediction frame.
Each prediction frame contains a feature vector of dimension dim: [cls_1, cls_2, …, cls_C, obj, x, y, w, h], where x and y are the coordinates of the target frame's center point, w and h are the width and height of the target frame, obj is the score of the target frame, and cls_1 through cls_C are the scores of target frame classes 1 through C. The candidate prediction frames are numerous and require filtering, so a post-processing step follows: the normalized prediction frame coordinates are first rescaled to the actual pixel size, redundant prediction frames are then filtered out with non-maximum suppression, and the final pedestrian target detection envelope frames are output.
To improve the running efficiency of the pedestrian target detection algorithm, the invention also applies lightweight processing to the yolox_s model, comprising the following operations: (1) performing weight-sparsity training on the batchnorm layers, then pruning the channels with smaller batchnorm-layer weights, with an overall channel pruning ratio of 40%; (2) performing finetune fine-tuning training on the pruned model; (3) applying int8 quantization when the model is ported to the edge device. The target detection algorithm after this lightweight processing can produce pedestrian detection envelope frames in real time.
The batchnorm layer is the feature normalization operator in the network, here a BatchNorm2d operator, with the following formula:

y = γ · (x − μ_B) / √(σ_B² + ε) + β

where x is the input data requiring normalization, μ_B and σ_B² are the mean and variance of the batch data, ε is a small variable added to prevent the denominator from being zero, and γ and β perform the affine operation, i.e. a linear transformation, on the input value; during training these values encode the model's normalization statistics over the training data of the current scene. From the formula, the weights of the bn layer are γ and β; each channel of the output y has its own corresponding γ and β, and γ is used to judge the importance of the different channels: when γ is smaller than the preset value, the data of that channel has little downstream influence on the model, so the channel is pruned. The goal of the batchnorm weight-sparsity training procedure is to obtain a portion of smaller γ values; the overall channel pruning ratio is 40%, i.e. the channels corresponding to the smallest 40% of γ values are pruned. The pruned pedestrian detection yolox_s model has a smaller memory footprint and a clearly improved speed.
The pedestrian detection envelope frame on each frame of image can be obtained by transmitting the image frames obtained by image acquisition to a pedestrian detection yolox_s model.
1.2 pedestrian target tracking
To improve the accuracy of pedestrian re-identification, the space-time information of pedestrians can be introduced, i.e. target detection results of the same pedestrian over consecutive frames are collected. This is realized through a tracking algorithm, with the following flow:
The pedestrian target detection envelope frames of the current image are obtained with the pedestrian target detection algorithm of 1.1. The tracking algorithm mainly assigns an id to each envelope frame and predicts each envelope frame's position in the next frame; the pedestrian target detection envelope frames of the next frame are matched one-to-one with the predicted envelope frames, and the tracking envelope frame result is finally obtained through an update step.
The invention adopts the concise SORT multi-target tracking algorithm to assign ids to pedestrian detection results and track them continuously; detection results with the same id are regarded as the same pedestrian. The working process is as follows:
(1) Pedestrian target detection: using a target detection algorithm of 1.1 to detect the pedestrian target of the input image frame, and endowing each envelope frame with a unique id;
(2) Initializing a target tracker: for each detected target, a Kalman filter target tracker (Tracker) is created for it, responsible for tracking the target's motion in subsequent image frames.
(3) Multi-frame tracking: for each frame, the following operations are performed:
1) Data association: the Kalman-filter-predicted envelope frame bboxprediction is matched against the envelope frame bboxmeasurement obtained by the target detection algorithm, using the Hungarian algorithm;
2) State update: the current state is updated using bboxprediction and bboxmeasurement, obtaining the tracking envelope frame bboxoptimal as the tracking result.
1.3 pedestrian image scoring and filtering
After pedestrian target tracking, target detection results for the same pedestrian over consecutive frames are obtained. Among these results there are distorted cases (resolution too small, motion blur, occlusion, overly dark lighting) which would harm recognition accuracy if used directly for subsequent feature extraction and comparison; the results must therefore be filtered and the information-rich ones kept. Meanwhile, over the whole life cycle of the same pedestrian in the surveillance video there is spatial, temporal and posture redundancy; that is, only pedestrian detection results obtained at different places, at different times and in different postures are free of information redundancy. Handling the distortion and redundancy cases improves subsequent recognition accuracy on the one hand, and reduces the data volume and downstream processing pressure on the other. The pedestrian image scoring and filtering scheme mainly comprises triple filtering: space-time filtering, quality filtering and posture filtering. The filtering results are stored as id sets, each id storing at most m pedestrian detection results from non-consecutive frames.
1.3.1 space-time filtration
For an image frame n, all pedestrian tracking results are traversed. For a pedestrian tracking result P_i with position coordinates (x_i, y_i): if its id has never had a previous-frame pedestrian detection result stored in the filtering results, it enters the next round of filtering directly. If the id exists, the frame number m and coordinates (x_{i-1}, y_{i-1}) of the last kept result P_{i-1} are retrieved; if the frame interval n - m is smaller than the time threshold T_time, the pedestrian tracking result P_i is filtered out. Similarly, if the coordinate distance between (x_i, y_i) and (x_{i-1}, y_{i-1}) is smaller than the spatial threshold T_dist, P_i is filtered out. Only when both thresholds are satisfied does the result enter the next filtering stage.
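The space-time filter reduces to two threshold checks per id. A sketch, with illustrative threshold values (the patent leaves T_time and T_dist unspecified):

```python
import math

def passes_spacetime(frame_no, xy, last, t_time=10, t_dist=50.0):
    """last is (frame_no, (x, y)) of the previous kept result for this id,
    or None if the id is new. Keep only results far enough apart in BOTH
    time (frames) and space (pixels) from the last kept one."""
    if last is None:
        return True                      # new id: keep unconditionally
    last_frame, (lx, ly) = last
    if frame_no - last_frame < t_time:
        return False                     # temporal redundancy
    if math.hypot(xy[0] - lx, xy[1] - ly) < t_dist:
        return False                     # spatial redundancy
    return True
```

Results failing either check are near-duplicates of an already stored crop and carry little new information about the pedestrian.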
1.3.2 Quality filtering
Each P_i entering this round undergoes quality assessment on five indices: resolution, sharpness, integrity, brightness, and aspect ratio. Each index produces a score, and the weighted sum of all scores is the final quality score S_1.
Resolution: compute the width w and height h of P_i; when w or h is smaller than the minimum resolution threshold T_size, P_i is filtered out.
Definition: calculation of P using Reblur secondary blurring sharpness algorithm i Compared with other definition algorithms based on gradients, the definition value sharp of (2) is better monotonically. In order to reduce the calculation amount, only the definition value of the image of the head and shoulder area is calculated, when the sharp is smaller than the definition threshold T sharp Filtering the pedestrian tracking result P i
Integrity: p (P) i Intersection area A is formed by the rectangular frame of the pedestrian tracking result and the rectangular frames of the rest of the image frame n inner Calculated to have a maximum value of max (A inner ) Let P be i If the rectangular frame area of (a) is a, the integrity is defined as 1-max (ain)/a, and if the integrity is smaller than the integrity threshold Teclipse, the pedestrian tracking result Pi is filtered.
Brightness: calculation of P i Average gray value gray of (1) when gray is greater than maximum brightness threshold T maxgray Or less than the minimum brightness threshold T mingray Filtering the pedestrian tracking result P i
Aspect ratio: assuming that the aspect ratio of a rectangular pedestrian frame accords with normal Gaussian distribution, obtaining the mean value u and standard deviation std of a Gaussian model through big data statistics, and calculating P i The aspect ratio score is higher the smaller the absolute value of the difference of aspect from the mean u, and vice versa. When the aspect ratio score is less than the minimum aspect ratio score threshold T aspect Filtering the pedestrian tracking result P i
Only when all quality indices meet the requirements does P_i enter the next filtering round.
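The five indices above can be combined into one scoring pass. The thresholds, weights, and Gaussian parameters below are placeholder values for illustration; the sharpness value is taken as a precomputed input, since the Reblur algorithm itself is not shown here:

```python
import math

# Illustrative thresholds and weights, not the patent's actual values.
T_SIZE, T_SHARP, T_ECLIPSE = 32, 0.3, 0.5
T_MAXGRAY, T_MINGRAY, T_ASPECT = 230, 25, 0.2
WEIGHTS = {"sharp": 0.4, "integrity": 0.3, "aspect": 0.3}

def aspect_score(w, h, u=0.41, std=0.1):
    """Gaussian-shaped score: higher when w/h is close to the statistical mean u."""
    return math.exp(-((w / h - u) ** 2) / (2 * std ** 2))

def quality_filter(w, h, sharp, max_a_inner, gray):
    """Return (passed, S1) after the five quality checks of section 1.3.2."""
    if w < T_SIZE or h < T_SIZE:
        return False, 0.0                       # resolution check
    if sharp < T_SHARP:
        return False, 0.0                       # sharpness check
    integrity = 1.0 - max_a_inner / (w * h)     # 1 - max(A_inner)/A
    if integrity < T_ECLIPSE:
        return False, 0.0                       # occlusion check
    if not (T_MINGRAY < gray < T_MAXGRAY):
        return False, 0.0                       # brightness check
    asp = aspect_score(w, h)
    if asp < T_ASPECT:
        return False, 0.0                       # aspect-ratio check
    s1 = (WEIGHTS["sharp"] * sharp + WEIGHTS["integrity"] * integrity
          + WEIGHTS["aspect"] * asp)
    return True, s1
```

A result that fails any hard check is discarded; otherwise its weighted score S_1 is carried into pose filtering.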
1.3.3 Pose filtering
Pedestrian pose evaluation is performed on each P_i entering this round. It is defined as a pedestrian attribute classification task: a 3-attribute classification model with the lightweight MobileNet as its backbone is trained, the 3 attributes being the front, back, and side poses, and the model outputs a probability value for each pose. In this round P_i is not filtered directly; instead, the probabilities of the front and back poses are weight-summed with the quality score S_1 to obtain the final pedestrian quality score S_final. Thus pedestrian pictures in a front or back pose obtain a higher score, while side-pose pictures score lower.
If P_i passes all three filtering rounds, it is written into the filtering result. If the result entry for its id already stores n detections: when n < m, P_i is added directly; when n >= m, quality scores are compared, and if the S_final of P_i is greater than the minimum score among the n stored detections, that minimum is replaced.
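The per-id buffer with at most m entries and minimum-score replacement can be sketched as follows (the function name, buffer layout, and default m are illustrative):

```python
def update_id_buffer(buffer, pid, detection, score, m=5):
    """Keep at most m best-scoring detections per id.

    buffer maps a pedestrian id to a list of (score, detection) pairs.
    Returns True if the detection was stored, False if it was discarded.
    """
    entries = buffer.setdefault(pid, [])
    if len(entries) < m:                  # fewer than m stored: add directly
        entries.append((score, detection))
        return True
    worst = min(range(len(entries)), key=lambda i: entries[i][0])
    if score > entries[worst][0]:         # beats current minimum: replace it
        entries[worst] = (score, detection)
        return True
    return False
```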
1.4 Extraction of pedestrian features and pedestrian attributes
The invention provides a multi-task model that extracts pedestrian features and pedestrian attributes simultaneously. From the pedestrian detection results filtered in section 1.3, the pedestrian image is first cropped out by its envelope box and then preprocessed into standard data, which includes image resizing and numerical normalization. The standard data are then fed into the model; after passing through operators such as convolution layers, normalization layers, and activation layers, the output features are the required pedestrian features, and these features yield pedestrian attribute classification results after a linear classifier.
More specifically, the multi-task model adopts the fastreid architecture as its baseline, with ResNet34 as the backbone network, non-local modules introduced, and the IBN operators replaced by BN operators. In the output head, besides the 512-dimensional pedestrian feature, an additional linear classifier branch outputs 15 pedestrian attributes. The model output formula is as follows:
(f, a) = M(x), where x is the input image, M is the multi-task model, (f, a) is the joint output, f is the pedestrian feature, and a is the pedestrian attribute vector. The multi-task model is lightweighted in the same way as in section 1.1, in three steps: (1) weight-sparse training of the batchnorm layers, followed by pruning the channels with smaller batchnorm weights, with an overall channel pruning ratio of 30%; (2) finetune training of the pruned model; (3) int8 quantization when the model is ported to the edge device. This yields a lightweight multi-task model: one model serves multiple purposes, producing the global abstract feature and the local attribute outputs at the same time, so strong feature extraction capability is maintained on the edge device together with high computational efficiency.
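Step (1) of the lightweighting — choosing which channels to prune after batchnorm weight-sparse training — amounts to ranking channels by the magnitude of their batchnorm scale weights. A toy sketch, using the 30% ratio from the text as the default (the function name and input layout are assumptions):

```python
def select_pruned_channels(bn_gammas, prune_ratio=0.30):
    """Indices of channels to cut: the prune_ratio fraction with smallest |gamma|.

    After sparse training, channels whose batchnorm scale weight is near zero
    contribute little to later layers and are safe to remove.
    """
    n_prune = int(len(bn_gammas) * prune_ratio)
    order = sorted(range(len(bn_gammas)), key=lambda i: abs(bn_gammas[i]))
    return sorted(order[:n_prune])
```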
The multi-task model extracts pedestrian features and pedestrian attributes from the pedestrian pictures in the filtering result. The pedestrian attributes include: whether wearing a hat, whether wearing glasses, whether wearing short sleeves, whether wearing a long shirt, whether wearing trousers, whether wearing a skirt, whether carrying a bag, whether carrying a backpack, whether elderly, whether an adult, whether a child, and whether female. These attributes are unambiguous and reflect local characteristics of the pedestrian.
1.5 Multi-frame pedestrian feature and attribute fusion
For multi-frame pedestrian detection results with the same id in the filtering result, the extracted pedestrian features and attributes are fused; this reduces the data transmission volume, yields cross-spatio-temporal features, and improves subsequent recognition accuracy. The pedestrian feature fusion result is obtained by weighted summation. For pedestrian attributes, since the attribute branch outputs a probability value for each attribute, a high/low double-threshold scheme is used to make the decisions robust. For example, for the hat-wearing probability p with high threshold P_high and low threshold P_low: when p >= P_high, the state is determined as wearing a hat (True); when P_low < p < P_high, the state is unknown; when p <= P_low, the state is determined as not wearing a hat (False). Attribute decisions are made for the multi-frame detection results, the outcomes are tallied per attribute, and whichever of the three states has the highest count becomes the attribute fusion result. After fusion, each id corresponds to exactly one pedestrian feature result and one pedestrian attribute result, and this id's information is imported into the database.
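The high/low double-threshold decision and the per-attribute vote across frames might look like this; the threshold values 0.7/0.3 are illustrative, not from the patent:

```python
from collections import Counter

def attribute_state(p, p_high=0.7, p_low=0.3):
    """Map a per-frame probability to True / False / 'unknown'."""
    if p >= p_high:
        return True
    if p <= p_low:
        return False
    return "unknown"

def fuse_attribute(probs, p_high=0.7, p_low=0.3):
    """Fuse one attribute across frames: the most frequent state wins."""
    states = Counter(attribute_state(p, p_high, p_low) for p in probs)
    return states.most_common(1)[0][0]
```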
1.6 Data clustering in the database
The current SORT tracking algorithm can switch the id of the same pedestrian under occlusion or fast motion, i.e., one pedestrian may end up with several ids. In the database this appears as multiple id entries that actually describe the same person. To reduce database redundancy, a mean shift clustering operation is performed in the database and the data are further fused.
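As a rough illustration of the clustering step, a minimal flat-kernel mean shift over feature points; in practice a tuned library implementation over the stored 512-d feature vectors would be used, and the bandwidth here is arbitrary:

```python
def mean_shift(points, bandwidth=1.0, iters=50):
    """Toy flat-kernel mean shift; returns the mode each point converges to.

    Points that converge to the same mode can then have their ids merged.
    """
    def shift(p):
        for _ in range(iters):
            neigh = [q for q in points
                     if sum((a - b) ** 2 for a, b in zip(p, q)) <= bandwidth ** 2]
            p_new = tuple(sum(col) / len(neigh) for col in zip(*neigh))
            if p_new == p:                    # converged to a mode
                break
            p = p_new
        return tuple(round(c, 6) for c in p)
    return [shift(p) for p in points]
```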
2. Pedestrian re-identification stage (as in Figure 2)
For the person to be queried, the pedestrian feature F_query and pedestrian attributes A_query are obtained through the same flow as sections 1.1-1.5, and pedestrian re-identification then proceeds in two steps:
2.1 Feature comparison
Cosine similarity is computed between F_query and the pedestrian features of all ids in the database, and the results are sorted to obtain the K most similar pedestrians, which effectively reduces the amount of data processed subsequently.
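The retrieval step reduces to cosine similarity against every stored feature followed by a sort. A pure-Python sketch (the gallery layout and the value of k are assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_similar(f_query, gallery, k=3):
    """gallery: {id: feature}; returns the k ids with highest cosine similarity."""
    ranked = sorted(gallery, key=lambda pid: cosine(f_query, gallery[pid]),
                    reverse=True)
    return ranked[:k]
```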
2.2 Attribute comparison
For the top K similar pedestrians, their pedestrian attribute results are retrieved and compared against A_query. For the k-th pedestrian P_k, each of its 15 attribute results is compared with the corresponding attribute of A_query. If either side of an attribute is unknown, that attribute is skipped; if both are known and the states agree, the attribute is likewise passed over; if they disagree, P_k is filtered out. The similar pedestrians that remain constitute the final pedestrian re-identification result.
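The attribute comparison rule — skip unknowns, pass on agreement, reject on any hard mismatch — can be captured in a few lines (the state encoding as True/False/"unknown" is an assumption carried over from the fusion step):

```python
def attribute_match(a_query, a_candidate):
    """Second-stage check between two attribute-state lists.

    'unknown' on either side skips that attribute; any mismatch of two
    known states rejects the candidate.
    """
    for q, c in zip(a_query, a_candidate):
        if q == "unknown" or c == "unknown":
            continue                 # insufficient evidence: ignore attribute
        if q != c:
            return False             # hard mismatch: filter out the candidate
    return True
```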
Compared with the prior art, the advantage of the invention is that the proposed measures reduce computation while preserving the accuracy of the final query, making the scheme well suited to edge device deployment. The measures are: (1) evaluating and filtering pedestrian pictures with triple filtering and several quality-evaluation indices; (2) applying lightweight algorithms: a lightweight target detection model and a multi-task feature extraction model, with SORT as the tracking algorithm; (3) multiple fusions of features and attributes within and between ids, reducing both data transmission and computation; (4) a two-stage query that retrieves and then filters similar pedestrians by global features and local attributes respectively, improving the accuracy of the query result.
The above examples are preferred embodiments of the present invention, but the embodiments are not limited to them; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (10)

1. The pedestrian re-identification method suitable for the edge equipment deployment is characterized by comprising the following steps of:
s1, obtaining pedestrian envelope frames on each frame of image through a target detection model, assigning an id to each pedestrian envelope frame by using the SORT multi-target tracking model, and tracking all pedestrian envelope frames; filtering the pedestrian envelope frames with the same id;
the filtering comprises spatio-temporal filtering, quality filtering and pose evaluation carried out in sequence; the spatio-temporal filtering means that, for a pedestrian tracking result whose id already exists, only results whose frame interval from the previous frame image is larger than the time threshold and whose spatial distance is larger than the spatial threshold are retained, and the retained tracking results then enter the quality filtering; the quality filtering means filtering out pedestrian tracking results that fail the quality indices while scoring each quality index of the tracking result, the weighted sum of all quality index scores being the quality score S_1 of the tracking result, after which the retained results enter the pose evaluation; the pose evaluation divides the pedestrian pose into several attributes, and the probability values of the pedestrian poses output by an attribute classification model are weight-summed with the quality score S_1 to obtain the final pedestrian quality score S_final;
a pedestrian tracking result that passes the spatio-temporal filtering, quality filtering and pose evaluation is written into the filtering result: if n pedestrian tracking results are already stored for the corresponding id, the result is written directly when n is smaller than the storage limit for that id; when n is greater than or equal to the storage limit, the pedestrian quality scores S_final are compared, and if the S_final of the tracking result is larger than the minimum score among the n stored results, that minimum is replaced;
s2, simultaneously extracting pedestrian features and pedestrian attributes from the pedestrian images through a fastreid model, fusing the multi-frame pedestrian features and pedestrian attributes of the same id, importing the fused pedestrian features into a database, applying mean shift clustering in the database to further fuse pedestrian features across different ids and remove redundant ids, and finally obtaining the pedestrian feature and attribute information of the different ids;
the multi-frame pedestrian feature fusion is obtained by a weighted summation mode; the multi-frame pedestrian attribute adopts a high-low double-threshold mode to judge the attribute, and performs result statistics on each attribute, and the state with the highest accumulated value is the pedestrian attribute fusion result; through fusion, one id corresponds to only one pedestrian characteristic result and one pedestrian attribute result;
s3, performing cosine distance calculation on pedestrian features of different ids to obtain K similar pedestrians, and then performing secondary filtering by using pedestrian attribute information of the K similar pedestrians to filter attribute mismatch results in the K similar pedestrians so as to obtain the most similar pedestrian re-recognition result.
2. The pedestrian re-recognition method suitable for edge equipment deployment according to claim 1, wherein in step S1, the target detection model is a lightweight yolox_s model.
3. The pedestrian re-identification method suitable for edge device deployment according to claim 2, wherein the lightweight processing of the yolox_s model comprises:
(1) Performing batch norm weight sparse training on the yolox_s model, and then cutting a channel with the bn layer weight of the yolox_s model smaller than a preset value;
(2) Performing finetune fine tuning training on the trimmed yolox_s model;
(3) Int8 quantization is performed on the yolox_s model when it is ported to the edge device.
4. The pedestrian re-recognition method suitable for edge equipment deployment according to claim 3, wherein the bn layer of the yolox_s model is its feature normalization operator, the BatchNorm2d operator, with the formula:

y = γ · (x − μ) / sqrt(σ² + ε) + β

wherein x is the input data to be normalized, μ and σ² are the mean and variance of the batch data, ε is a small variable added to prevent a zero denominator, and γ and β perform the affine operation, i.e., a linear transformation of the input value; from the formula, the weights of the bn layer are γ and β, each channel of the output y has its corresponding γ and β, and γ is used to judge the importance of the different channels; when γ is smaller than the preset value, the data of that channel has little influence on the subsequent model, and the channel is pruned.
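The formula of claim 4 can be checked numerically with a per-channel batchnorm sketch; note how γ = 0 collapses a channel to the constant β, which is why small-γ channels are safe to prune (the function itself is an illustration, not the model's implementation):

```python
import math

def batchnorm_channel(xs, gamma, beta, eps=1e-5):
    """y = gamma * (x - mean) / sqrt(var + eps) + beta for one channel of a batch."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
```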
5. The pedestrian re-recognition method suitable for edge equipment deployment according to claim 1, wherein in step S1, the working process of the target detection model is as follows:
firstly, image data are acquired from the camera terminal, preprocessed, and input into the target detection model yolox_s for inference: the image data undergo convolution operations in the backbone network of yolox_s to obtain output feature maps; the feature maps are then fed into the output head of yolox_s for target-box regression and classification;
the output head of the target detection model yolox_s has three branches in total: cls_output predicts the class and score of the target box, with output size W*H*C, where W is the width of the output feature map, H its height, and C the number of detection classes; obj_output judges whether the target box is foreground or background, with output size W*H*1; reg_output predicts the coordinate information (x, y, w, h) of the target box, with output size W*H*4;
the three branches of the output head are combined to obtain the final candidate prediction box information of size pred_num*dim, where pred_num = W*H is the number of prediction boxes and dim = C+1+4 is the feature-vector dimension of each prediction box; each prediction box contains a feature vector of dimension dim, [cls_1 cls_2 ... cls_C obj x y w h], where x and y are the coordinates of the center point of the target box, w and h are its width and height, obj is its score, and cls_1 through cls_C are the scores of classes 1 through C;
post-processing of the candidate prediction box results: the normalized coordinates of the prediction boxes are first restored to actual pixel scale, redundant prediction boxes are then filtered by non-maximum suppression, and the final pedestrian envelope boxes are output.
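The non-maximum suppression mentioned in the post-processing step can be sketched greedily; the IoU threshold of 0.5 is a common default, not a value given in the patent:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)           # highest-scoring remaining box survives
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep
```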
6. The pedestrian re-identification method suitable for edge device deployment of claim 5, wherein the pre-processing includes normalized scaling of image size and normalization of data.
7. The pedestrian re-recognition method suitable for edge equipment deployment according to claim 1, wherein in step S1, the working process of the sort multi-target tracking model is as follows:
(1) An id is assigned to each pedestrian envelope frame obtained by the target detection model;
(2) Initializing target trackers: for each detected pedestrian target, a Kalman filter target tracker is created to track that pedestrian's motion in the subsequent image frames;
(3) Multi-frame tracking: for each frame, the following operations are performed:
1) Data association: the pedestrian envelope frame bbox_prediction predicted by Kalman filtering and the pedestrian envelope frame bbox_measurement obtained by the target detection model are matched using the Hungarian algorithm;
2) State update: the current state is updated from the Kalman-filtered bbox_prediction and the detector's bbox_measurement, yielding the pedestrian tracking envelope frame bbox_optimal as the tracking result.
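The data-association step pairs predicted track boxes with detections. SORT proper solves this with the Hungarian algorithm over an IoU cost matrix; the sketch below substitutes a greedy pass, which behaves identically when matches are unambiguous (all names and the threshold are illustrative):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def greedy_match(tracks, detections, iou_thresh=0.3):
    """Associate predicted track boxes with detections by descending IoU.

    Returns a list of (track_idx, det_idx) pairs; unmatched tracks and
    detections are simply absent from the result.
    """
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh or ti in used_t or di in used_d:
            continue
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    return matches
```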
8. The pedestrian re-recognition method suitable for edge equipment deployment according to claim 1, wherein in step S2, pedestrian features and pedestrian attributes are simultaneously extracted from the pedestrian image through a fastreid model, specifically as follows: firstly, intercepting pedestrians in an image through a reserved pedestrian envelope frame, and then, preprocessing to obtain standard data; and then importing the standard data into a fastreid model, obtaining pedestrian characteristics after convolution operation, and outputting pedestrian attribute classification results after the pedestrian characteristics pass through a linear classifier.
9. A server comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the pedestrian re-identification method of any one of claims 1 to 8 that is suitable for edge device deployment.
10. A computer readable storage medium, characterized in that at least one program is stored in the storage medium, which program is loaded and executed by a processor to implement the pedestrian re-identification method suitable for edge device deployment according to any one of claims 1 to 8.
CN202311450373.4A 2023-11-03 2023-11-03 Pedestrian re-identification method suitable for edge equipment deployment Active CN117173794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311450373.4A CN117173794B (en) 2023-11-03 2023-11-03 Pedestrian re-identification method suitable for edge equipment deployment

Publications (2)

Publication Number Publication Date
CN117173794A CN117173794A (en) 2023-12-05
CN117173794B true CN117173794B (en) 2024-03-08

Family

ID=88939869


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115837A (en) * 2020-09-11 2020-12-22 中国电子科技集团公司第五十四研究所 Target detection method based on YoloV3 and dual-threshold model compression
CN112818917A (en) * 2021-02-24 2021-05-18 复旦大学 Real-time pedestrian detection and re-identification method and device
CN114863366A (en) * 2022-05-30 2022-08-05 上海数川数据科技有限公司 Cross-camera multi-target tracking method based on prior information
CN115049607A (en) * 2022-06-10 2022-09-13 四川轻化工大学 Insulating plate defect identification method based on YOLOx _ s enhanced target feature detection
CN116012880A (en) * 2022-12-29 2023-04-25 天翼物联科技有限公司 Pedestrian re-identification method, system and device for distributed edge collaborative reasoning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4062364A4 (en) * 2020-01-10 2023-11-22 Sportlogiq Inc. System and method for identity preservative representation of persons and objects using spatial and appearance attributes


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Is Faster R-CNN doing well for pedestrian detection?; Zhang Liliang et al.; European Conference on Computer Vision; 2016; pp. 443-457 *
Multi-scale SSD pedestrian detection algorithm based on feature enhancement; Han Jun et al.; Information Technology and Informatization; 2020-02-10 (No. 1); pp. 117-121 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant