CN114581858A

CN114581858A - Method for identifying group of people with small shares and model training method

Info

Publication number: CN114581858A
Application number: CN202210486758.5A
Authority: CN
Inventors: 李星光; 张德馨
Original assignee: Zhongkezhiwei Technology Tianjin Co ltd
Current assignee: Zhongkezhiwei Technology Tianjin Co ltd
Priority date: 2022-05-06
Filing date: 2022-05-06
Publication date: 2022-06-03
Anticipated expiration: 2042-05-06
Also published as: CN114581858B

Abstract

The invention discloses a method for re-identifying small-share crowd and a model training method, belongs to the technical field of pedestrian re-identification, and can solve the problem of low accuracy rate of re-identification of small-share crowd. The method comprises the following steps: acquiring a first sample image and a second sample image, and determining a first crowd area and a second crowd area in the first sample image and the second sample image; inputting the first crowd region into a first backbone network to obtain individual features and global features of a first crowd, and inputting the second crowd region into a second backbone network to obtain individual features and global features of a second crowd; inputting global characteristics of a first population and a second population into a first neural network for training to obtain a population similarity judgment network; inputting the individual characteristics of the first population and the second population into a second neural network for training to obtain an individual similarity judgment network; and constructing a small-share crowd re-identification model according to the crowd similarity judgment network and the individual similarity judgment network. The pedestrian re-identification method is used for pedestrian re-identification.

Description

Method for re-identifying small-share crowd and model training method

Technical Field

The invention relates to a method for identifying the heavy crowd at a small stock and a model training method, belonging to the technical field of pedestrian heavy identification.

Background

In recent years, with the continuous progress of artificial intelligence technology in the field of computer vision, the intelligent video monitoring technology has received extensive attention and research from academic circles and industrial circles, and has been applied to various actual complex scenes. Corresponding to a clear image of a face, an iris, a fingerprint or a body required for accurate identification, a pedestrian re-identification (Person re-identification) technology is a human-centered research field in a monitoring scene, and the field has very important practical significance and a commercial transformation prospect. The pedestrian re-identification is realized by means of a widely distributed video monitoring system to realize cross-camera pedestrian feature extraction and retrieval, and an efficient technical means is provided for public safety guarantee.

Most of the traditional pedestrian re-identification research methods focus on the problems of human body posture, background, illumination and the like under the condition of visible light, and research mainly focuses on two aspects of feature expression and measurement learning. In addition, there have been some studies that have begun to focus on the use of pedestrian population matching to assist in individual matching, studying visual descriptors of people on the same video frame or on adjacent video frames. In practical application, the pedestrian population can improve the detection matching rate of pedestrian re-identification, and has important effects on the research of the pedestrian population and the information analysis of the pedestrian population. Meanwhile, for places with dense crowds, the traditional pedestrian re-identification is easy to lose effectiveness due to interference of adjacent people shielding and the like, and the pedestrian group re-identification is often higher in robustness. However, the existing pedestrian population weight recognition network is difficult to consider both pedestrian population matching and individual matching, and mismatching is easy to occur, so that the accuracy of the small-share crowd weight recognition is low.

Disclosure of Invention

The invention provides a method for re-identifying small-share crowd and a model training method, which can solve the problem of low accuracy of re-identifying small-share crowd in the prior art.

In one aspect, the invention provides a method for training a re-recognition model of a small share crowd, which comprises the following steps:

s1, acquiring a first sample image and a second sample image, and determining a first crowd region in the first sample image and a second crowd region in the second sample image;

s2, inputting the first crowd region into a first backbone network to obtain the individual characteristic of each pedestrian in the first crowd and the global characteristic of the first crowd, and inputting the second crowd region into a second backbone network to obtain the individual characteristic of each pedestrian in the second crowd and the global characteristic of the second crowd;

s3, inputting the global features of the first population and the global features of the second population into a first neural network for training to obtain a population similarity judgment network;

s4, inputting the individual characteristics of the first population and the individual characteristics of the second population into a second neural network for training to obtain an individual similarity judgment network;

s5, constructing a small share crowd re-identification model according to the first backbone network, the second backbone network, the crowd similarity judgment network and the individual similarity judgment network.

Optionally, the S4 specifically includes:

selecting the coordinate of the pedestrian closest to the center of the first crowd area as a first center target, and selecting the coordinate of the pedestrian closest to the center of the second crowd area as a second center target;

respectively acquiring corresponding individual features of the first central target and the second central target according to the coordinates of the first central target and the second central target;

inputting the two individual characteristics into a second neural network for training to obtain an individual similarity judgment network.

Optionally, the determining the first crowd region in the first sample image and the second crowd region in the second sample image in S1 specifically includes:

s11, carrying out pedestrian target detection on the sample image, and marking a target frame of each target pedestrian;

s12, acquiring a target pedestrian closest to the center position of the sample image as a first target pedestrian;

s13, calculating coordinate distances between the other target pedestrians and the first target pedestrian, and sequencing the other target pedestrians according to the coordinate distances from small to large to form a target pedestrian sequence;

s14, dividing the first N target pedestrians with the coordinate distance smaller than the preset distance in the target pedestrian sequence and the first target pedestrian into a same group, wherein the region surrounded by the circumscribed rectangles of all the target frames of the same group is a crowd region;

s15, taking a target pedestrian which does not enter the crowd area as an updated first target pedestrian;

s16, calculating coordinate distances between the rest target pedestrians which do not enter the crowd area and the first target pedestrian, and sequencing the rest target pedestrians which do not enter the crowd area according to the coordinate distances from small to large to form an updated target pedestrian sequence;

s17, repeating the steps S14 to S16 until all target pedestrians in the sample image are drawn into a certain crowd area;

if the sample image is the first sample image, the crowd region is the first crowd region; and if the sample image is the second sample image, the crowd area is the second crowd area.

Optionally, after S1, the method further includes:

adjusting the size of the first population area or the second population area such that the size of the first population area and the second population area are the same.

Optionally, the first backbone network and/or the second backbone network are/is a CvT network structure with attention mechanism introduced.

Optionally, the loss function of the small group re-identification model

；

Wherein L is_qIs a population loss function; l is_gAs a function of individual losses; alpha is a preset hyper-parameter.

In another aspect, the present invention provides a method for re-identifying a small group of people, including:

s101, acquiring a first image to be detected and a second image to be detected, determining a first crowd area to be detected in the first image to be detected and a second crowd area to be detected in the second image to be detected, and combining two by two to form a comparison crowd pair;

s102, inputting a first to-be-detected crowd region in the comparison crowd pair into a first main network of a small crowd re-identification model to obtain individual features of each pedestrian in a first to-be-detected crowd and global features of the first to-be-detected crowd, and inputting a second to-be-detected crowd region in the comparison crowd pair into a second main network of the small crowd re-identification model to obtain individual features of each pedestrian in a second to-be-detected crowd and global features of the second to-be-detected crowd;

s103, inputting the global features of the first to-be-detected crowd and the global features of the second to-be-detected crowd into a crowd similarity judgment network of the small crowd re-identification model to obtain a crowd similarity score of the comparison crowd pair;

s104, combining the individual characteristics of all pedestrians in the first to-be-detected crowd and the individual characteristics of all pedestrians in the second to-be-detected crowd in pairs and inputting the combined individual characteristics into an individual similarity judgment network of the small-strand crowd re-identification model to obtain individual similarity scores of each pair of pedestrians in the comparison crowd pair;

s105, obtaining the final similarity score of the comparison crowd pair and the re-identification result of the same pedestrian in the comparison crowd pair according to the crowd similarity score and the individual similarity score;

the small-share crowd re-identification model is trained by adopting any one of the methods.

Optionally, the S105 specifically includes:

s1051, calculating an individual similarity statistical score of each pedestrian according to the individual similarity score of each pair of pedestrians;

s1052, calculating to obtain a final similarity score of the comparison crowd pair according to a first formula, and taking a pedestrian pair successfully matched in the comparison crowd pair as a re-identification result of the same pedestrian in the comparison crowd pair;

the first formula is

；

Wherein the content of the first and second substances,

(ii) a final similarity score for the pairwise population of pairs;

a population similarity score for the comparative demographic pair;

is a constant coefficient;

the statistical score of the individual similarity of the kth pedestrian in the first to-be-detected crowd region is obtained;

，

and matching the number of the same person in the comparison population pair.

Optionally, the S1051 specifically includes:

if the kth pedestrian in the first crowd area to be detected is in the pedestrian pair formed by each pedestrian in the second crowd area to be detected, only one individual similarity score of the pedestrian pair

Greater than a predetermined threshold

Then, then

(ii) a Wherein the content of the first and second substances,

if the kth pedestrian in the first crowd area to be detected is aligned with the pedestrian in the second crowd area to be detected, the individual similarity scores of a plurality of pedestrian pairs

Greater than a predetermined threshold

Then the pedestrian pairs meeting the requirement are ranked from big to small according to the individual similarity scores thereof to obtain

Then, then

(ii) a Wherein the content of the first and second substances,

is a constant coefficient of the number of the light emitting elements,

k is satisfied for pedestrian

The number of pedestrians.

Optionally, the total number of successfully matched pedestrian pairs in the comparison crowd pair

(ii) a Wherein, the T1 is the total number of pairs of pedestrians in the first detected crowd area whose individual similarity score is greater than the preset threshold; the T2 indicates that the individual similarity score in the second to-be-detected crowd region is larger than the preset threshold valueThe total number of pairs of pedestrians.

The invention can produce the beneficial effects that:

(1) according to the training method of the small-share crowd re-recognition model, the global features aiming at the crowd and the individual features aiming at all the pedestrians can be extracted at the same time, the consistency analysis is carried out on the individual features of all the individuals in the crowd, and meanwhile, the analysis result of crowd consistency judgment is combined, so that the accuracy of the small-share crowd re-recognition can be improved, and an individual relation data basis can be provided for the next semantic analysis.

(2) According to the training method of the small-share crowd re-identification model, the CvT network with the dynamic attention model is adopted to extract the features of the small-share crowd target area, and the feature description expression of a plurality of targets is promoted.

(3) The small-share crowd re-identification model training method provided by the invention designs a novel individual loss function calculated by comparing central target characteristics, and is combined with the crowd loss function judged by whether the same pedestrian exists or not to realize the integral training of the network, so that the network gives consideration to the description capacity of the crowd and the description capacity of the individual.

(4) According to the method for identifying the heavy crowd at the small stock, provided by the invention, the scheme of multi-target feature description extraction and comparison analysis is designed, and the accuracy of identifying the heavy crowd at the small stock is improved by being proved with the judgment of crowd consistency.

Drawings

FIG. 1 is a flowchart of a small segment crowd re-recognition model training method according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a small segment crowd re-identification model according to an embodiment of the present invention;

fig. 3 is a flowchart of a small group re-identification method according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to examples, but the present invention is not limited to these examples.

The embodiment of the invention provides a training method of a small share crowd re-recognition model, wherein the small share crowd re-recognition comprises two layers of meanings: judging whether two video frame images from different cameras contain the same person; second is re-identification of the same person in the group.

The method is mainly used for re-identifying the pedestrians under the condition of high crowd density, and the pedestrians with high probability are overlapped and shielded or are accompanied. Typical scenes include subway exits and exits, pedestrian crossings, shopping malls, crowd-gathering squares, and the like. The method mainly comprises the steps of performing feature analysis on a small group consisting of a plurality of persons (more than or equal to 2 persons) appearing simultaneously in a single video frame acquired by one camera, and performing similarity calculation on features of the persons appearing in the single or simultaneously in the single video frame acquired by the other camera.

Taking fig. 2 as an example, when the right object in the upper left row image of fig. 2 is used as an independent person to be compared with the object in the lower left row image, due to the similarity of colors, the problem of occlusion exists, and mismatching is easy to occur. When the target in the upper row image on the left side is taken as a crowd target and compared with the crowd in the lower row image on the left side, the difference is more obvious, and the mistaken identification is not easy to occur.

Referring to fig. 2, the present invention respectively inputs the crowd Regions (ROIs) in two images collected in different spaces and different viewing angles into two twin backbone networks, combines the characteristics of the two networks together, and judges whether the two ROIs contain the same target through a small number of convolutional layers and full-link layers. Meanwhile, a feature layer with a proper size in the middle of the backbone network obtains the description features of the individual targets according to the coordinate sampling of the independent individual targets, and the description features are used for feature extraction of the subsequent individual targets.

Specifically, as shown in fig. 1, the method includes:

and S1, acquiring the first sample image and the second sample image, and determining a first crowd region in the first sample image and a second crowd region in the second sample image.

The first sample image and the second sample image are sample images for training a re-identification model of a small group of people, and the sample images in the training data set are paired for network training. The training optimization of the network parameters is realized by calculating the similarity degree of the crowd and the similarity degree of the individuals contained in the first sample image and the second sample image in the whole training process.

And (3) selecting a crowd region (namely a small crowd ROI region) from two video sample images from different cameras to obtain R1 and R2.

S2, inputting the first crowd area into the first trunk network to obtain the individual characteristic of each pedestrian in the first crowd and the global characteristic of the first crowd, and inputting the second crowd area into the second trunk network to obtain the individual characteristic of each pedestrian in the second crowd and the global characteristic of the second crowd.

And inputting the R1 and the R2 determined in the step S1 into the first backbone network and the second backbone network respectively for feature extraction.

The embodiment of the invention does not limit the specific structure types of the two backbone networks, and the structure types of the two backbone networks can be the same or different; for different application scenarios and modes, the two backbone networks can selectively adopt two different networks or the same network. The specific selection and network training is as follows.

Two different networks are used: the two backbone networks are respectively adaptive to different types of cameras or different scenes for learning and are used for a special scene, and the cameras and the scene are relatively fixed, so that the precision of actual deployment and use is improved conveniently. For example, the camera frame corresponding to the input of the first trunk network is arranged at the elevator entrance and faces the elevator for shooting, and the camera frame corresponding to the input of the second trunk network is arranged at one corner of a certain open space for shooting, so that the learned first trunk network and the learned second trunk network have obvious difference and cannot be used interchangeably. The two backbone networks are trained or fine-tuned on the basis of data collected in a scene, the training is carried out synchronously, and the parameters of the two backbone networks are different.

The same network is used: when two paths of cameras in some deployments are not clearly distinguished and are deployed in a large scale, the same backbone network can be adopted. During the training process, the backbone network is actually a network for parameter sharing. When the same network is adopted, the similarity of two targets, such as Euclidean distance and cosine distance, can be calculated by adopting a mode of calculating the distance between two pedestrian feature vectors without using an individual similarity judgment network.

In practical application, the two backbone networks may be deep network models with attention mechanism introduced, and are preferably CvT network structures.

Transformer was a model proposed in 2017, based mainly on Self-Attention structure. The method has high calculation efficiency and good expansibility, and can support the training of models with more than 100B parameters. In recent years, transformers have been introduced into computer vision. Vision Transformer (ViT) classifies images directly using transformers without the need for convolutional networks. In order for the ViT model to handle pictures, the picture is first divided into a number of tiles (Token), and then the sequence of tiles is passed into ViT. Convolution vision transformer (cvt) is a new architecture that improves performance and efficiency by introducing a convolution into ViT to produce the best effect of the combination of the two designs.

The embodiment of the invention adopts the CvT network with the dynamic attention model as the main network to extract the characteristics of the target area of the small group of people, thereby being beneficial to improving the description expression of the characteristics of a plurality of targets. In fig. 2, the dashed box is a normal operation in the process of extracting features from the backbone network, which is not described herein again.

And S3, inputting the global features of the first population and the global features of the second population into the first neural network for training to obtain a population similarity judgment network.

The first neural network is a preset initial network, and is trained by using the global features of the first population and the global features of the second population, so that a population similarity judgment network can be obtained.

After the feature extraction of the two main networks, the two main networks are combined for carrying out associated feature extraction, namely, the crowd similarity judgment network in the graph judges whether the same target people exist in R1 and R2.

And S4, inputting the individual characteristics of the first population and the individual characteristics of the second population into a second neural network for training to obtain an individual similarity judgment network.

The second neural network is a preset initial network, and is trained by using the individual features of the first population and the individual features of the second population, so that an individual similarity judgment network can be obtained.

In the training stage, in order to facilitate network training, only the features of the target closest to the center position of the ROI area can be extracted, and whether the individuals of the targets at the centers of the two images are the same person or not is compared.

Specifically, the method comprises the following steps: selecting the coordinate of the pedestrian closest to the center of the first crowd area as a first central target, and selecting the coordinate of the pedestrian closest to the center of the second crowd area as a second central target;

and inputting the two individual characteristics into a second neural network for training to obtain an individual similarity judgment network.

By sampling the individual characteristics of the central target (the pedestrian target closest to the most central position of the ROI in the crowd area), connecting the central target characteristics of the ROI of the two images to extract the associated characteristics and judge whether the central target characteristics are the same person, the individual loss function is calculated and combined with the crowd loss function, and the synchronous training of the network is realized.

In the testing stage, only the features of the central target may be extracted for comparison, and the features of all targets in the ROI may also be extracted for comparison and analysis, which is not limited in the embodiment of the present invention.

Concat → conv → FC in the population similarity determination network and the individual similarity determination network in fig. 2 refers to concatenation → convolution → full join operation for extracted features, where conv (convolution) may be one convolution layer or multiple convolution layers, which is not limited in the embodiment of the present invention.

S5, constructing a small stock crowd re-identification model according to the first backbone network, the second backbone network, the crowd similarity judgment network and the individual similarity judgment network.

From the above, the input to the model is the region of interest (ROI) of the crowd in the video frames from two different cameras, after pedestrian detection: the set of pedestrian target boxes in the first sample image is

，

The smaller the i serial number is, the closer the target is to the center of the image; the set of pedestrian target boxes in the second sample image is

，

The number of the targets is the number of the targets,

smaller sequence numbers indicate that the object is closer to the center of the image. To be provided with

For example, the method for determining the ROI of the population is as follows:

and S11, detecting the pedestrian target of the sample image, and marking the target frame of each target pedestrian. If the sample image is a first sample image, the crowd area is a first crowd area; if the sample image is the second sample image, the crowd region is the second crowd region.

And S12, acquiring the target pedestrian closest to the center position of the sample image as a first target pedestrian.

And S13, calculating coordinate distances between the other target pedestrians and the first target pedestrian, and sequencing the other target pedestrians according to the coordinate distances from small to large to form a target pedestrian sequence.

Selecting the target pedestrian at the most central position of the sample image as the center

Traversing all other pedestrian targets, calculating the sum of the pedestrian targets

Is a distance of

(calculated from the coordinate values on the graph) if

<

Then it is combined with

Forming a population, labeled

，

Is a predetermined distance threshold.

S14, dividing the first N target pedestrians and the first target pedestrian in the target pedestrian sequence, wherein the coordinate distance of the first N target pedestrians and the first target pedestrian are smaller than the preset distance, and the region surrounded by the circumscribed rectangles of all the target frames of the same group is a crowd region.

And S15, taking one target pedestrian which does not enter the crowd area as the updated first target pedestrian.

And S16, calculating coordinate distances between the other target pedestrians which do not enter the crowd area and the first target pedestrian, and sequencing the other target pedestrians which do not enter the crowd area according to the coordinate distances from small to large to form an updated target pedestrian sequence.

And S17, repeatedly executing the steps S14 to S16 until all target pedestrians in the sample image are drawn into a certain crowd area.

Setting a maximum number N of people, when the number of people exceeds N,

and finishing the calculation of the ROI of the group, wherein the circumscribed rectangles of all the target frames are the ROI areas. Find backward no-entry

First object of (2)

Repeating the operations of steps S14 to S16, the search does not proceed

To form the next ROI. And repeatedly finishing the judgment of the last pedestrian target.

Further, after the S1, the method further includes:

the size of the first crowd area or the second crowd area is adjusted so that the size of the first crowd area and the second crowd area are the same.

When the aspect ratio of a region of a certain ROI is greater than or equal to 1.5, the ROI is subjected to classification and blocking processing to form two regions close to squares

And

and the result is prevented from being influenced by serious deformation caused by adjusting the image into a square with a fixed size to adapt to the condition that the network input is a fixed square. When one or more pedestrians appear on the image cutting line, whether the new area formed by the pedestrians is included or not can be selected according to how close to the square the new area is formed by the pedestrians is included or not, and the new area is obtained by parting and dividing processing

And

there may be overlapping portions of the regions; when the ROI area aspect ratio is less than 1.5, the image size (Resize) is directly adjusted to a fixed size square as input.

Thus, a series of ROI regions of the human population are obtained from the two sample images respectively and are recorded as

And

，

and

all equal to 1, the ROI degenerates into the detection frame area of one pedestrian, and the problem degenerates into the single pedestrian re-identification problem. The result can be obtained by directly adopting individual characteristic comparison without carrying out crowd-related analysis.

When the distance threshold value

When the setting is larger than the image size, the target of analysis is the whole image range, namely all individuals appearing in the visual field range of the same camera are considered to form a group.

In the training stage, the loss function of the whole network model consists of two parts, namely a crowd loss function

Second is the individual loss function

。

The crowd loss function calculation adopts the commonly used two-class or multi-class problem loss functions, such as a mean square error loss function and a cross entropy loss function. The crowd similarity may model a two-class problem or a multi-class problem. Two categories of the second category are: [ not including the same person, including at least the same person ]; multiple classifications [ no identity, 1 identity, 2 identity, N-1 identity, greater than or equal to N identity ]; n is a positive integer predetermined according to actual use. To a certain extent, different categories represent different degrees of similarity of the population, and the larger the category number is, the higher the degree of similarity is represented.

The individual loss function calculation adopts a common two-classification problem loss function, and the individual similarity judgment is a two-classification problem, wherein the two types are respectively different people and the same person.

Loss function of small stock crowd re-identification model training whole

(ii) a Wherein the content of the first and second substances,

as a population loss function;

is an individual loss function;

the hyper-parameters determined for the experiments.

Another embodiment of the present invention provides a method for re-identifying a small group of people, as shown in fig. 3, the method includes:

s101, a first image to be detected and a second image to be detected are obtained, a first crowd area to be detected in the first image to be detected and a second crowd area to be detected in the second image to be detected are determined, and the first crowd area to be detected and the second crowd area to be detected are combined in pairs to form a comparison crowd pair.

S102, inputting a first to-be-detected crowd region in the comparison crowd pair into a first trunk network of the small crowd weight identification model to obtain individual features of each pedestrian in the first to-be-detected crowd and global features of the first to-be-detected crowd, and inputting a second to-be-detected crowd region in the comparison crowd pair into a second trunk network of the small crowd weight identification model to obtain individual features of each pedestrian in the second to-be-detected crowd and global features of the second to-be-detected crowd.

S103, inputting the global features of the first to-be-detected crowd and the global features of the second to-be-detected crowd into a crowd similarity judgment network of the small crowd re-identification model to obtain a crowd similarity score of a comparison crowd pair.

And S104, combining the individual characteristics of all the pedestrians in the first to-be-detected crowd and the individual characteristics of all the pedestrians in the second to-be-detected crowd in pairs and inputting the combined individual characteristics into an individual similarity judgment network of the small-share crowd re-identification model to obtain individual similarity scores of each pair of pedestrians in the comparison crowd pair.

For a pair of small crowd regions, which are abbreviated as R1 and R2, respectively, m pedestrians are contained in R1, n pedestrians are contained in R2, in the feature map of the main network, the features of all pedestrians in R1 and all pedestrians in R2 are sampled and extracted in sequence, and all pedestrians in R1 are traversed (R2)

All pedestrians (m) and R2

n) inputting the individual similarity judgment network in a pairwise combination manner, judging whether the pair of pedestrians are the same person, and recording the result as

。

And S105, obtaining a final similarity score of the comparison crowd pair and a re-identification result of the same pedestrian in the comparison crowd pair according to the crowd similarity score and the individual similarity score.

Wherein, the small stock crowd re-identification model is trained by adopting any one of the methods.

Wherein, the S105 specifically includes:

s1051, calculating the individual similarity statistical score of each pedestrian according to the individual similarity score of each pair of pedestrians.

Specifically, the method comprises the following steps: if the kth pedestrian in the first crowd area to be detected is paired with each pedestrian in the second crowd area to be detected, only one individual similarity score of the pedestrian pair

Greater than a predetermined threshold

Then, then

(ii) a Wherein the content of the first and second substances,

the individual similarity statistical score of the kth pedestrian in the first to-be-detected crowd region is obtained;

if the kth pedestrian in the first crowd region to be detected and the pedestrian pair consisting of each pedestrian in the second crowd region to be detected have individual similarity scores of a plurality of pedestrian pairs

Greater than a predetermined threshold

Then, the pedestrian pairs meeting the requirement are ranked from large to small according to the individual similarity scores thereof to obtain

Then, then

(ii) a Wherein the content of the first and second substances,

is a constant coefficient of the number of the light emitting elements,

k is satisfied for pedestrian

The number of pedestrians.

S1052, calculating to obtain a final similarity score of the comparison crowd pair according to a first formula, and taking the pedestrian pair successfully matched in the comparison crowd pair as a re-identification result of the same pedestrian in the comparison crowd pair;

the first formula is

；

Wherein the content of the first and second substances,

is the final similarity score to the crowd pair;

a population similarity score to a population pair;

is a constant coefficient;

，

the number of the same person is matched in the comparison crowd pair.

Further, the number of pedestrians who are judged to be the same person in R2 among R1 is T1. The number of pedestrians judged to be the same person in R1 among R2 is T2. To avoid the effect of mismatching, the total number of successfully matched pedestrian pairs is summed

(ii) a Wherein T1 is that the individual similarity score in the first human group region R1 is larger than the preset thresholdA pedestrian pair total of value; and T2 is the total number of the pedestrian pairs in the second detected crowd region R2, of which the individual similarity scores are larger than the preset threshold value.

Therefore, a small crowd similarity score S and a re-recognition result of the same pedestrian in the two crowd ROIs can be obtained, and the larger the score S is, the larger the probability that the same pedestrian is contained in the two crowds is.

After the two ROI areas are analyzed, comprehensive analysis and output for a full image or a sequence image can be further performed according to the actual application requirements, and the comprehensive analysis and output are not specifically designed.

Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims

1. A training method for a small group re-recognition model is characterized by comprising the following steps:

2. The method according to claim 1, wherein the S4 specifically includes:

3. The method according to claim 1, wherein the determining of the first crowd region in the first sample image and the second crowd region in the second sample image in S1 specifically includes:

4. The method according to claim 1 or 3, wherein after the S1, the method further comprises:

5. The method according to claim 1, wherein the first backbone network and/or the second backbone network is a CvT network structure introducing a mechanism of attention.

6. The method of claim 1, wherein the loss function of the small segment population re-identification model

；

Wherein the content of the first and second substances,

as a population loss function;

as a function of individual losses;

is a preset hyper-parameter.

7. A method for re-identifying a small group of people, the method comprising:

s103, inputting the global features of the first to-be-detected crowd and the second to-be-detected crowd into a crowd similarity judgment network of the small crowd re-identification model to obtain a crowd similarity score of the comparison crowd pair;

wherein the small segment crowd re-identification model is trained by the method of any one of claims 1 to 6.

8. The method according to claim 7, wherein the S105 specifically includes:

the first formula is

；

Wherein, the first and the second end of the pipe are connected with each other,

(ii) a final similarity score for the pairwise population of pairs;

a crowd similarity score for the comparative crowd pair;

is a constant coefficient;

，

and matching the number of the same person in the comparison crowd pair.

9. The method according to claim 8, wherein the S1051 specifically includes:

if the kth pedestrian in the first crowd area to be detected and the second crowd to be detectedIndividual similarity scores for only one of the pedestrian pairs of each pedestrian in the area

Greater than a predetermined threshold

Then, then

(ii) a Wherein the content of the first and second substances,

counting the individual similarity of the kth pedestrian in the first to-be-detected crowd region;

if the kth pedestrian in the first to-be-detected crowd region and the pedestrian pair consisting of each pedestrian in the second to-be-detected crowd region have individual similarity scores of a plurality of pedestrian pairs

Greater than a predetermined threshold

Then the pedestrian pairs meeting the requirement are ranked from large to small according to the individual similarity scores

Then, then

(ii) a Wherein, the first and the second end of the pipe are connected with each other,

is a constant coefficient of the number of the light emitting elements,

to a rowHuman k satisfies

The number of pedestrians.

10. The method of claim 9, wherein the total number of successfully matched pairs of pedestrians in the aligned pairs of people is determined

；

Wherein, the T1 is the total number of pairs of pedestrians in the first detected crowd area whose individual similarity score is greater than the preset threshold; and the T2 is the total number of the pedestrian pairs with the individual similarity scores larger than the preset threshold value in the second to-be-detected crowd area.