CN114581858A - Small-group crowd re-identification method and model training method - Google Patents

Small-group crowd re-identification method and model training method

Info

Publication number
CN114581858A
CN114581858A (application CN202210486758.5A)
Authority
CN
China
Prior art keywords
crowd
pedestrian
target
individual
detected
Prior art date
Legal status
Granted
Application number
CN202210486758.5A
Other languages
Chinese (zh)
Other versions
CN114581858B (en)
Inventor
李星光
张德馨
Current Assignee
Zhongkezhiwei Technology Tianjin Co ltd
Original Assignee
Zhongkezhiwei Technology Tianjin Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongkezhiwei Technology Tianjin Co ltd filed Critical Zhongkezhiwei Technology Tianjin Co ltd
Priority to CN202210486758.5A priority Critical patent/CN114581858B/en
Publication of CN114581858A publication Critical patent/CN114581858A/en
Application granted granted Critical
Publication of CN114581858B publication Critical patent/CN114581858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a small-group crowd re-identification method and a model training method, belongs to the technical field of pedestrian re-identification, and solves the problem of low accuracy in small-group crowd re-identification. The training method comprises: acquiring a first sample image and a second sample image, and determining a first crowd region and a second crowd region in them; inputting the first crowd region into a first backbone network to obtain the individual features and global features of the first crowd, and inputting the second crowd region into a second backbone network to obtain the individual features and global features of the second crowd; inputting the global features of the first and second crowds into a first neural network for training to obtain a crowd similarity judgment network; inputting the individual features of the first and second crowds into a second neural network for training to obtain an individual similarity judgment network; and constructing a small-group crowd re-identification model from the crowd similarity judgment network and the individual similarity judgment network. The method is used for pedestrian re-identification.

Description

Small-group crowd re-identification method and model training method
Technical Field
The invention relates to a small-group crowd re-identification method and a model training method, belonging to the technical field of pedestrian re-identification.
Background
In recent years, with the continuous progress of artificial intelligence in computer vision, intelligent video surveillance has received extensive attention and research from academia and industry and has been applied in many complex real-world scenes. In contrast to techniques that require a clear image of a face, iris, fingerprint or body for accurate identification, pedestrian re-identification (person re-identification) is a human-centered research field in surveillance scenes, with very important practical significance and prospects for commercial application. Pedestrian re-identification relies on widely deployed video surveillance systems to extract and retrieve pedestrian features across cameras, providing an efficient technical means for safeguarding public safety.
Most traditional pedestrian re-identification research addresses problems such as human pose, background and illumination under visible light, focusing mainly on feature representation and metric learning. In addition, some studies have begun to use pedestrian group matching to assist individual matching, studying visual descriptors of people in the same or adjacent video frames. In practical applications, pedestrian groups can improve the detection and matching rate of pedestrian re-identification, and are important both for research on pedestrian groups and for the analysis of group information. Moreover, in densely crowded places, traditional pedestrian re-identification easily fails due to interference such as occlusion by neighboring people, whereas pedestrian group re-identification is often more robust. However, existing pedestrian group re-identification networks struggle to handle group matching and individual matching at the same time, and mismatching occurs easily, so the accuracy of small-group crowd re-identification is low.
Disclosure of Invention
The invention provides a small-group crowd re-identification method and a model training method, which can solve the prior-art problem of low accuracy in small-group crowd re-identification.
In one aspect, the invention provides a training method for a small-group crowd re-identification model, comprising the following steps:
s1, acquiring a first sample image and a second sample image, and determining a first crowd region in the first sample image and a second crowd region in the second sample image;
s2, inputting the first crowd region into a first backbone network to obtain the individual characteristic of each pedestrian in the first crowd and the global characteristic of the first crowd, and inputting the second crowd region into a second backbone network to obtain the individual characteristic of each pedestrian in the second crowd and the global characteristic of the second crowd;
s3, inputting the global features of the first population and the global features of the second population into a first neural network for training to obtain a population similarity judgment network;
s4, inputting the individual characteristics of the first population and the individual characteristics of the second population into a second neural network for training to obtain an individual similarity judgment network;
s5, constructing a small share crowd re-identification model according to the first backbone network, the second backbone network, the crowd similarity judgment network and the individual similarity judgment network.
Optionally, the S4 specifically includes:
selecting the coordinate of the pedestrian closest to the center of the first crowd area as a first center target, and selecting the coordinate of the pedestrian closest to the center of the second crowd area as a second center target;
respectively acquiring corresponding individual features of the first central target and the second central target according to the coordinates of the first central target and the second central target;
inputting the two individual characteristics into a second neural network for training to obtain an individual similarity judgment network.
Optionally, the determining the first crowd region in the first sample image and the second crowd region in the second sample image in S1 specifically includes:
s11, carrying out pedestrian target detection on the sample image, and marking a target frame of each target pedestrian;
s12, acquiring a target pedestrian closest to the center position of the sample image as a first target pedestrian;
s13, calculating coordinate distances between the other target pedestrians and the first target pedestrian, and sequencing the other target pedestrians according to the coordinate distances from small to large to form a target pedestrian sequence;
s14, dividing the first N target pedestrians with the coordinate distance smaller than the preset distance in the target pedestrian sequence and the first target pedestrian into a same group, wherein the region surrounded by the circumscribed rectangles of all the target frames of the same group is a crowd region;
s15, taking a target pedestrian which does not enter the crowd area as an updated first target pedestrian;
s16, calculating coordinate distances between the rest target pedestrians which do not enter the crowd area and the first target pedestrian, and sequencing the rest target pedestrians which do not enter the crowd area according to the coordinate distances from small to large to form an updated target pedestrian sequence;
s17, repeating the steps S14 to S16 until all target pedestrians in the sample image are drawn into a certain crowd area;
if the sample image is the first sample image, the crowd region is the first crowd region; and if the sample image is the second sample image, the crowd area is the second crowd area.
Optionally, after S1, the method further includes:
adjusting the size of the first crowd region or the second crowd region so that the two regions have the same size.
Optionally, the first backbone network and/or the second backbone network is a CvT (Convolutional vision Transformer) network structure incorporating an attention mechanism.
Optionally, the loss function of the small-group crowd re-identification model is L = L_q + α·L_g, wherein L_q is the crowd loss function, L_g is the individual loss function, and α is a preset hyper-parameter.
In another aspect, the invention provides a small-group crowd re-identification method, comprising:
s101, acquiring a first image to be detected and a second image to be detected, determining a first crowd area to be detected in the first image to be detected and a second crowd area to be detected in the second image to be detected, and combining two by two to form a comparison crowd pair;
s102, inputting a first to-be-detected crowd region in the comparison crowd pair into a first main network of a small crowd re-identification model to obtain individual features of each pedestrian in a first to-be-detected crowd and global features of the first to-be-detected crowd, and inputting a second to-be-detected crowd region in the comparison crowd pair into a second main network of the small crowd re-identification model to obtain individual features of each pedestrian in a second to-be-detected crowd and global features of the second to-be-detected crowd;
s103, inputting the global features of the first to-be-detected crowd and the global features of the second to-be-detected crowd into a crowd similarity judgment network of the small crowd re-identification model to obtain a crowd similarity score of the comparison crowd pair;
s104, combining the individual characteristics of all pedestrians in the first to-be-detected crowd and the individual characteristics of all pedestrians in the second to-be-detected crowd in pairs and inputting the combined individual characteristics into an individual similarity judgment network of the small-strand crowd re-identification model to obtain individual similarity scores of each pair of pedestrians in the comparison crowd pair;
s105, obtaining the final similarity score of the comparison crowd pair and the re-identification result of the same pedestrian in the comparison crowd pair according to the crowd similarity score and the individual similarity score;
the small-share crowd re-identification model is trained by adopting any one of the methods.
Optionally, the S105 specifically includes:
s1051, calculating an individual similarity statistical score of each pedestrian according to the individual similarity score of each pair of pedestrians;
s1052, calculating to obtain a final similarity score of the comparison crowd pair according to a first formula, and taking a pedestrian pair successfully matched in the comparison crowd pair as a re-identification result of the same pedestrian in the comparison crowd pair;
the first formula is

S = S_q + λ · (s_1 + s_2 + … + s_N)

wherein S is the final similarity score of the comparison crowd pair; S_q is the crowd similarity score of the comparison crowd pair; λ is a constant coefficient; s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region, k = 1, 2, …, N; and N is the number of pedestrians matched as the same person in the comparison crowd pair.
Optionally, S1051 specifically includes:
if, among the pedestrian pairs formed by the k-th pedestrian in the first to-be-detected crowd region and each pedestrian in the second to-be-detected crowd region, only one pair has an individual similarity score s_{k,j} greater than a preset threshold θ, then s_k = s_{k,j}; wherein s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region;
if several of those pedestrian pairs have individual similarity scores s_{k,j} greater than the preset threshold θ, the qualifying pairs are ranked by individual similarity score from large to small to obtain s_{k,(1)} ≥ s_{k,(2)} ≥ … ≥ s_{k,(M_k)}, and s_k is then obtained from the ranked scores discounted by a constant coefficient; wherein M_k is the number of pedestrians in the second region for which pedestrian k satisfies s_{k,j} > θ.
Optionally, the total number of successfully matched pedestrian pairs in the comparison crowd pair is N = min(T1, T2); wherein T1 is the total number of pedestrians in the first to-be-detected crowd region whose individual similarity score is greater than the preset threshold, and T2 is the total number of pedestrians in the second to-be-detected crowd region whose individual similarity score is greater than the preset threshold.
The invention can produce the following beneficial effects:
(1) The small-group crowd re-identification model training method extracts global features of the crowd and individual features of every pedestrian simultaneously, performs consistency analysis on the individual features of all individuals in the crowd, and combines this with the result of the crowd consistency judgment; this improves the accuracy of small-group crowd re-identification and provides an individual-relation data basis for subsequent semantic analysis.
(2) The training method uses a CvT network with a dynamic attention model to extract features from the small-group target region, improving the feature description of multiple targets.
(3) The training method designs a novel individual loss function, computed by comparing central-target features, and combines it with a crowd loss function based on judging whether the same pedestrian is present, realizing whole-network training in which the network balances crowd-level and individual-level description capacity.
(4) The small-group crowd re-identification method designs a multi-target feature extraction and comparative analysis scheme, and improves re-identification accuracy by corroborating it with the crowd consistency judgment.
Drawings
FIG. 1 is a flowchart of the small-group crowd re-identification model training method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the small-group crowd re-identification model according to an embodiment of the present invention;
FIG. 3 is a flowchart of the small-group crowd re-identification method according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to examples, but the present invention is not limited to these examples.
The embodiment of the invention provides a training method for a small-group crowd re-identification model, where small-group crowd re-identification carries two meanings: first, judging whether two video frame images from different cameras contain the same person; second, re-identifying that same person within the group.
The method is mainly intended for pedestrian re-identification under high crowd density, where pedestrians are very likely to overlap, occlude one another, or walk together. Typical scenes include subway entrances and exits, pedestrian crossings, shopping malls, and crowded squares. The method performs feature analysis on a small group of several persons (2 or more) appearing simultaneously in a single video frame from one camera, and computes feature similarity against the persons appearing, singly or together, in a single video frame from another camera.
Taking fig. 2 as an example: when the right-hand target in the upper-left image is compared as an independent person against the target in the lower-left image, mismatching easily occurs because of similar colors and occlusion. When the targets in the upper-left image are compared as a crowd against the crowd in the lower-left image, the differences are more pronounced and misidentification is less likely.
Referring to fig. 2, the invention inputs the crowd regions (ROIs) of two images captured in different places and from different viewing angles into two twin backbone networks, combines the features of the two branches, and judges through a small number of convolutional and fully connected layers whether the two ROIs contain the same targets. Meanwhile, from a feature layer of suitable size in the middle of the backbone, the descriptive features of each individual target are sampled according to that target's coordinates and used for subsequent individual-level feature extraction.
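By way of illustration, the following minimal PyTorch sketch shows this twin-backbone arrangement with the concat → conv → FC crowd similarity head; the layer sizes, the plain CNN standing in for the patent's CvT backbone, and all names are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class CrowdSimilarityNet(nn.Module):
    """Twin-backbone crowd comparison: two crowd ROIs are embedded,
    their feature maps concatenated, and a small conv + FC head scores
    whether the two ROIs contain the same target persons."""

    def __init__(self, share_backbone: bool = False):
        super().__init__()

        def make_backbone() -> nn.Sequential:
            # A tiny CNN stands in for the patent's CvT backbone.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )

        self.backbone1 = make_backbone()
        # Sharing parameters gives a siamese network; separate weights
        # suit fixed, visually distinct camera pairs (see below).
        self.backbone2 = self.backbone1 if share_backbone else make_backbone()
        self.head = nn.Sequential(                        # concat -> conv -> FC
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2),   # two classes: no shared person / shared person
        )

    def forward(self, roi1: torch.Tensor, roi2: torch.Tensor) -> torch.Tensor:
        f1 = self.backbone1(roi1)   # global features of the first crowd
        f2 = self.backbone2(roi2)   # global features of the second crowd
        return self.head(torch.cat([f1, f2], dim=1))

# Example: a batch of two 224x224 crowd ROI pairs.
net = CrowdSimilarityNet()
logits = net(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```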
Specifically, as shown in fig. 1, the method includes:
and S1, acquiring the first sample image and the second sample image, and determining a first crowd region in the first sample image and a second crowd region in the second sample image.
The first sample image and the second sample image are sample images for training a re-identification model of a small group of people, and the sample images in the training data set are paired for network training. The training optimization of the network parameters is realized by calculating the similarity degree of the crowd and the similarity degree of the individuals contained in the first sample image and the second sample image in the whole training process.
Crowd regions (i.e., small-group ROI regions) are selected from two video sample images from different cameras, giving R1 and R2.
S2, the first crowd region is input into the first backbone network to obtain the individual features of each pedestrian in the first crowd and the global features of the first crowd, and the second crowd region is input into the second backbone network to obtain the individual features of each pedestrian in the second crowd and the global features of the second crowd.
R1 and R2, as determined in step S1, are input into the first backbone network and the second backbone network respectively for feature extraction.
The embodiment of the invention does not limit the specific structures of the two backbone networks: they may be the same or different, and for different application scenarios and modes the two backbones may be two different networks or one and the same network. The selection and training are as follows.
Two different networks are used: each backbone adapts to a different camera type or scene, for a dedicated deployment in which cameras and scenes are relatively fixed, which improves accuracy in actual use. For example, if the camera feeding the first backbone is mounted at an elevator entrance facing the elevator, while the camera feeding the second backbone is mounted at a corner of an open square, the two learned backbones will differ markedly and cannot be interchanged. The two backbones are trained or fine-tuned on data collected in the scene; training is synchronous, and the parameters of the two backbones differ.
The same network is used: when the two camera channels are not clearly distinguished, or the deployment is large-scale, the same backbone can be used for both branches; during training it is then effectively a parameter-shared network. When the same network is adopted, the similarity of two targets can also be computed directly as a distance between the two pedestrian feature vectors, such as the Euclidean distance or cosine distance, without using the individual similarity judgment network.
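For the parameter-shared case, a minimal sketch of such direct feature-vector comparison (PyTorch; the feature dimension and function name are our own):

```python
import torch
import torch.nn.functional as F

def direct_similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> dict:
    """Compare two pedestrian feature vectors from a parameter-shared
    backbone without a learned individual similarity network."""
    cosine = F.cosine_similarity(feat_a, feat_b, dim=0)  # higher = more similar
    euclidean = torch.dist(feat_a, feat_b, p=2)          # lower = more similar
    return {"cosine": cosine.item(), "euclidean": euclidean.item()}

print(direct_similarity(torch.randn(256), torch.randn(256)))
```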
In practical application, the two backbone networks may be deep network models incorporating an attention mechanism, preferably CvT network structures.
The Transformer is a model proposed in 2017, based mainly on the self-attention structure. It is computationally efficient and scalable, and can support training models with more than 100B parameters. In recent years, Transformers have been introduced into computer vision. The Vision Transformer (ViT) classifies images directly with a Transformer, without convolutional networks; to let ViT handle pictures, an image is first divided into a number of patches (tokens), and the patch sequence is passed into ViT. The Convolutional vision Transformer (CvT) is a newer architecture that introduces convolutions into ViT, improving performance and efficiency by combining the strengths of both designs.
The embodiment of the invention adopts a CvT network with a dynamic attention model as the backbone to extract features of the small-group target region, which helps improve the feature description of multiple targets. In fig. 2, the dashed box denotes standard operations in backbone feature extraction, not described again here.
S3, the global features of the first crowd and the global features of the second crowd are input into the first neural network for training to obtain the crowd similarity judgment network.
The first neural network is a preset initial network, and is trained by using the global features of the first population and the global features of the second population, so that a population similarity judgment network can be obtained.
After feature extraction by the two backbone networks, the features are combined for associated feature extraction; that is, the crowd similarity judgment network in the figure judges whether R1 and R2 contain the same target persons.
S4, the individual features of the first crowd and the individual features of the second crowd are input into the second neural network for training to obtain the individual similarity judgment network.
The second neural network is a preset initial network, and is trained by using the individual features of the first population and the individual features of the second population, so that an individual similarity judgment network can be obtained.
In the training stage, to simplify network training, only the features of the target closest to the center of each ROI may be extracted, comparing whether the central targets of the two images are the same person.
Specifically, the method comprises the following steps: selecting the coordinate of the pedestrian closest to the center of the first crowd area as a first central target, and selecting the coordinate of the pedestrian closest to the center of the second crowd area as a second central target;
respectively acquiring corresponding individual features of the first central target and the second central target according to the coordinates of the first central target and the second central target;
and inputting the two individual characteristics into a second neural network for training to obtain an individual similarity judgment network.
By sampling the individual features of the central target (the pedestrian target closest to the very center of the crowd ROI) and concatenating the central-target features of the two ROIs to extract associated features and judge whether they are the same person, the individual loss function is computed and combined with the crowd loss function, realizing synchronous training of the network.
In the testing stage, only the features of the central target may be extracted for comparison, or the features of all targets in the ROI may be extracted for comparison and analysis; the embodiment of the present invention does not limit this.
Concat → conv → FC in the crowd similarity judgment network and the individual similarity judgment network of fig. 2 denotes concatenation → convolution → fully connected operations on the extracted features, where conv may be one or more convolutional layers; the embodiment of the present invention does not limit this.
S5, the small-group crowd re-identification model is constructed from the first backbone network, the second backbone network, the crowd similarity judgment network and the individual similarity judgment network.
From the above, the input to the model is the crowd regions of interest (ROIs) in the video frames from two different cameras. After pedestrian detection, the set of pedestrian target boxes in the first sample image is B1 = {b_1, b_2, …, b_m} and the set in the second sample image is B2 = {b'_1, b'_2, …, b'_n}, where m and n are the numbers of detected targets and a smaller index indicates a target closer to the image center. Taking the first sample image as an example, the crowd ROI is determined as follows:
and S11, detecting the pedestrian target of the sample image, and marking the target frame of each target pedestrian. If the sample image is a first sample image, the crowd area is a first crowd area; if the sample image is the second sample image, the crowd region is the second crowd region.
And S12, acquiring the target pedestrian closest to the center position of the sample image as a first target pedestrian.
And S13, calculating coordinate distances between the other target pedestrians and the first target pedestrian, and sequencing the other target pedestrians according to the coordinate distances from small to large to form a target pedestrian sequence.
The target pedestrian at the most central position of the sample image is selected as the center target b_1. All other pedestrian targets are traversed and the distance d_i between each target b_i and b_1 is calculated from the coordinate values on the image; if d_i < d_T, the target is placed with b_1 in one group, labeled G_1, where d_T is a preset distance threshold.
S14, the first N target pedestrians in the sequence whose coordinate distance is smaller than the preset distance are grouped together with the first target pedestrian, and the region enclosed by the circumscribed rectangle of all the group's target boxes is a crowd region.
S15, one target pedestrian not yet assigned to any crowd region is taken as the updated first target pedestrian.
S16, the coordinate distances between the remaining unassigned target pedestrians and the updated first target pedestrian are calculated, and those pedestrians are sorted by coordinate distance from small to large to form an updated target pedestrian sequence.
S17, steps S14 to S16 are repeated until every target pedestrian in the sample image has been assigned to a crowd region.
A maximum group size N is also set. When the number of grouped targets exceeds N, the grouping of G_1 ends and its crowd ROI is computed: the circumscribed rectangle of all the group's target boxes is the ROI region. Searching onward, the first target not assigned to G_1 becomes the new center target, and the operations of steps S14 to S16 are repeated over the targets not yet assigned to any group to form the next ROI. This repeats until the last pedestrian target has been assigned.
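The following plain-Python sketch illustrates one reading of this greedy grouping procedure (steps S11 to S17); the box format, threshold value, and group-size cap are illustrative assumptions:

```python
import math

def group_crowd_rois(boxes, image_center, dist_thresh, max_group=5):
    """Greedily cluster pedestrian target boxes into crowd ROIs.

    boxes: list of (cx, cy, w, h) pedestrian target boxes.
    Returns a list of ROIs, each the circumscribed rectangle
    (x1, y1, x2, y2) of one group's boxes.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    remaining = list(boxes)
    rois = []
    while remaining:
        # S12/S15: the ungrouped target closest to the image center seeds a group.
        remaining.sort(key=lambda b: dist((b[0], b[1]), image_center))
        seed, rest = remaining[0], remaining[1:]
        # S13/S16: rank the rest by coordinate distance to the seed.
        rest.sort(key=lambda b: dist((b[0], b[1]), (seed[0], seed[1])))
        # S14: take up to max_group - 1 neighbours within the distance threshold.
        group = [seed] + [b for b in rest[:max_group - 1]
                          if dist((b[0], b[1]), (seed[0], seed[1])) < dist_thresh]
        remaining = [b for b in remaining if b not in group]
        # Crowd ROI = circumscribed rectangle of the group's target boxes.
        rois.append((min(cx - w / 2 for cx, cy, w, h in group),
                     min(cy - h / 2 for cx, cy, w, h in group),
                     max(cx + w / 2 for cx, cy, w, h in group),
                     max(cy + h / 2 for cx, cy, w, h in group)))
    return rois

print(group_crowd_rois([(100, 100, 40, 80), (130, 110, 40, 80), (400, 300, 40, 80)],
                       image_center=(320, 240), dist_thresh=80))
```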
Further, after the S1, the method further includes:
the size of the first crowd area or the second crowd area is adjusted so that the size of the first crowd area and the second crowd area are the same.
When the aspect ratio of an ROI region is greater than or equal to 1.5, the ROI is split into two near-square sub-regions R_a and R_b; this avoids the severe deformation, and the resulting distortion of results, that would be caused by resizing an elongated region into the fixed-size square the network expects as input. When one or more pedestrians fall on the cutting line, whether they are included in a sub-region is decided by how close the resulting region is to a square, so the sub-regions R_a and R_b obtained by splitting may overlap. When the aspect ratio of the ROI region is less than 1.5, the image is directly resized to a fixed-size square as input.
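A small sketch of this aspect-ratio rule (plain Python); the 1.5 threshold follows the text, while the exact split geometry, which widens each half so the two crops cover the ROI and overlap when the region is only moderately elongated, is our assumption:

```python
def split_or_keep_roi(x1, y1, x2, y2, max_aspect=1.5):
    """Return the crowd ROI unchanged when it is roughly square, or two
    near-square sub-regions (which may overlap) when it is elongated;
    the caller then resizes each region to the fixed square network input."""
    w, h = x2 - x1, y2 - y1
    if max(w, h) / min(w, h) < max_aspect:
        return [(x1, y1, x2, y2)]          # squarish enough: resize directly
    if w >= h:
        crop_w = max(w / 2, h)             # halves cover the ROI; overlap when w < 2h
        return [(x1, y1, x1 + crop_w, y2), (x2 - crop_w, y1, x2, y2)]
    crop_h = max(h / 2, w)
    return [(x1, y1, x2, y1 + crop_h), (x1, y2 - crop_h, x2, y2)]

print(split_or_keep_roi(0, 0, 160, 100))   # 1.6:1 -> two overlapping crops
print(split_or_keep_roi(0, 0, 120, 100))   # 1.2:1 -> kept whole
```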
A series of crowd ROI regions is thus obtained from each of the two sample images, denoted {R1_i} and {R2_j}. When m and n both equal 1, each ROI degenerates into the detection box of a single pedestrian, and the problem degenerates into single-pedestrian re-identification; the result can then be obtained directly by comparing individual features, without crowd-related analysis.
When the distance threshold d_T is set larger than the image size, the target of analysis is the whole image; that is, all individuals appearing within the field of view of the same camera are considered to form one group.
In the training stage, the loss function of the whole network model consists of two parts: the crowd loss function L_q and the individual loss function L_g.
The crowd loss is a commonly used two-class or multi-class classification loss, such as the mean square error loss or the cross-entropy loss, because crowd similarity may be modeled as either a two-class or a multi-class problem. The two classes are [does not contain the same person, contains at least one same person]; the multi-class labels are [no same person, 1 same person, 2 same persons, …, N−1 same persons, N or more same persons], where N is a positive integer predetermined according to actual use. To a certain extent the different classes represent different degrees of crowd similarity, a larger class index representing a higher degree of similarity.
The individual loss is a commonly used two-class classification loss, since individual similarity judgment is a two-class problem whose classes are different persons and the same person.
The overall loss for small-group crowd re-identification model training is L = L_q + α·L_g, where L_q is the crowd loss function, L_g is the individual loss function, and α is a hyper-parameter determined experimentally.
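As a sketch of this two-part objective (PyTorch), with cross-entropy standing in for both classification losses and an illustrative value of α:

```python
import torch
import torch.nn as nn

crowd_criterion = nn.CrossEntropyLoss()       # L_q: two- or multi-class crowd loss
individual_criterion = nn.CrossEntropyLoss()  # L_g: different person / same person
alpha = 0.5                                   # hyper-parameter set experimentally

def total_loss(crowd_logits, crowd_labels, indiv_logits, indiv_labels):
    """Overall training loss L = L_q + alpha * L_g, as described above."""
    l_q = crowd_criterion(crowd_logits, crowd_labels)
    l_g = individual_criterion(indiv_logits, indiv_labels)
    return l_q + alpha * l_g

# Example: a batch of 4 comparison pairs, two classes for each head.
crowd_logits = torch.randn(4, 2, requires_grad=True)
indiv_logits = torch.randn(4, 2, requires_grad=True)
loss = total_loss(crowd_logits, torch.randint(0, 2, (4,)),
                  indiv_logits, torch.randint(0, 2, (4,)))
loss.backward()
```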
Another embodiment of the present invention provides a small-group crowd re-identification method, as shown in fig. 3, comprising:
S101, a first to-be-detected image and a second to-be-detected image are acquired, the first to-be-detected crowd regions in the first image and the second to-be-detected crowd regions in the second image are determined, and they are combined pairwise to form comparison crowd pairs.
S102, the first to-be-detected crowd region of a comparison crowd pair is input into the first backbone network of the small-group crowd re-identification model to obtain the individual features of each pedestrian in the first to-be-detected crowd and the global features of that crowd, and the second to-be-detected crowd region of the pair is input into the second backbone network of the model to obtain the individual features of each pedestrian in the second to-be-detected crowd and the global features of that crowd.
S103, the global features of the first and second to-be-detected crowds are input into the crowd similarity judgment network of the small-group crowd re-identification model to obtain the crowd similarity score of the comparison crowd pair.
S104, the individual features of the pedestrians in the first to-be-detected crowd are combined pairwise with those of the pedestrians in the second to-be-detected crowd and input into the individual similarity judgment network of the model to obtain the individual similarity score of each pedestrian pair in the comparison crowd pair.
For a pair of small crowd regions, abbreviated R1 and R2, containing m and n pedestrians respectively, the features of all pedestrians in R1 and all pedestrians in R2 are sampled and extracted in turn from the backbone feature maps. All pedestrians in R1 (i = 1, …, m) and all pedestrians in R2 (j = 1, …, n) are combined pairwise and input into the individual similarity judgment network, which judges whether each pair is the same person; the result is recorded as s_{i,j}.
S105, the final similarity score of the comparison crowd pair and the re-identification result of the same pedestrians in the pair are obtained from the crowd similarity score and the individual similarity scores.
The small-group crowd re-identification model here is trained with any of the training methods described above.
S105 specifically includes:
s1051, calculating the individual similarity statistical score of each pedestrian according to the individual similarity score of each pair of pedestrians.
Specifically: if, among the pedestrian pairs formed by the k-th pedestrian in the first to-be-detected crowd region and each pedestrian in the second to-be-detected crowd region, only one pair has an individual similarity score s_{k,j} greater than the preset threshold θ, then s_k = s_{k,j}; wherein s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region.
If several of those pedestrian pairs have individual similarity scores s_{k,j} greater than the preset threshold θ, the qualifying pairs are ranked by individual similarity score from large to small to obtain s_{k,(1)} ≥ s_{k,(2)} ≥ … ≥ s_{k,(M_k)}, and s_k is then obtained from the ranked scores discounted by a constant coefficient; wherein M_k is the number of pedestrians in the second region for which pedestrian k satisfies s_{k,j} > θ.
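A plain-Python sketch of this statistical-score rule follows. The single-match branch follows the text directly; the multi-match branch, which discounts the top score by a constant coefficient per extra above-threshold match, is our assumption, since the exact formula image is lost:

```python
def individual_statistical_scores(pair_scores, theta=0.5, mu=0.05):
    """pair_scores[k][j]: individual similarity of pedestrian k in R1
    vs pedestrian j in R2. Returns one statistical score per k
    (None when no pair for k exceeds the threshold theta)."""
    stats = []
    for row in pair_scores:
        hits = sorted((s for s in row if s > theta), reverse=True)
        if not hits:
            stats.append(None)
        elif len(hits) == 1:
            stats.append(hits[0])            # single confident match: keep its score
        else:
            # Several above-threshold matches are ambiguous: discount the top
            # score by a constant coefficient per extra match (assumed form).
            stats.append(hits[0] - mu * (len(hits) - 1))
    return stats

print(individual_statistical_scores([[0.9, 0.2, 0.1],    # one clear match
                                     [0.8, 0.7, 0.3],    # two matches -> discounted
                                     [0.2, 0.1, 0.05]])) # no match
```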
S1052, calculating to obtain a final similarity score of the comparison crowd pair according to a first formula, and taking the pedestrian pair successfully matched in the comparison crowd pair as a re-identification result of the same pedestrian in the comparison crowd pair;
the first formula is

S = S_q + λ · (s_1 + s_2 + … + s_N)

wherein S is the final similarity score of the comparison crowd pair; S_q is the crowd similarity score of the comparison crowd pair; λ is a constant coefficient; s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region, k = 1, 2, …, N; and N is the number of pedestrians matched as the same person in the comparison crowd pair.
Further, let T1 be the number of pedestrians in R1 judged to be the same person as some pedestrian in R2, i.e., the total number in the first to-be-detected crowd region whose individual similarity score exceeds the preset threshold, and let T2 be the corresponding number in the second to-be-detected crowd region R2. To avoid the effect of mismatching, the total number of successfully matched pedestrian pairs is taken as N = min(T1, T2).
The small-group crowd similarity score S and the re-identification result of the same pedestrians in the two crowd ROIs are thus obtained; the larger the score S, the higher the probability that the two crowds contain the same pedestrians.
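A sketch of this final scoring step (plain Python), assuming N = min(T1, T2) for the matched-pair count and an illustrative coefficient λ, with the exact formula images lost:

```python
def final_similarity(crowd_score, stats_r1, stats_r2, lam=0.1):
    """Final comparison score S = S_q + lam * sum of the top-N individual
    statistical scores, with N = min(T1, T2) to suppress mismatches."""
    t1 = [s for s in stats_r1 if s is not None]  # confident matches seen from R1
    t2 = [s for s in stats_r2 if s is not None]  # confident matches seen from R2
    n = min(len(t1), len(t2))                    # matched identical pedestrians
    top = sorted(t1, reverse=True)[:n]
    return crowd_score + lam * sum(top), n

score, n_matched = final_similarity(0.7, [0.9, 0.75, None], [0.85, None, 0.8])
print(score, n_matched)   # higher S -> more likely the same small group
```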
After the two ROI regions are analyzed, a comprehensive analysis and output over the full image or an image sequence can further be performed according to the actual application requirements; this is not specifically addressed here.
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (10)

1. A training method for a small-group crowd re-identification model, characterized by comprising the following steps:
s1, acquiring a first sample image and a second sample image, and determining a first crowd region in the first sample image and a second crowd region in the second sample image;
s2, inputting the first crowd region into a first backbone network to obtain the individual characteristic of each pedestrian in the first crowd and the global characteristic of the first crowd, and inputting the second crowd region into a second backbone network to obtain the individual characteristic of each pedestrian in the second crowd and the global characteristic of the second crowd;
s3, inputting the global features of the first population and the global features of the second population into a first neural network for training to obtain a population similarity judgment network;
s4, inputting the individual characteristics of the first population and the individual characteristics of the second population into a second neural network for training to obtain an individual similarity judgment network;
s5, constructing a small share crowd re-identification model according to the first backbone network, the second backbone network, the crowd similarity judgment network and the individual similarity judgment network.
2. The method according to claim 1, wherein the S4 specifically includes:
selecting the coordinate of the pedestrian closest to the center of the first crowd area as a first center target, and selecting the coordinate of the pedestrian closest to the center of the second crowd area as a second center target;
respectively acquiring corresponding individual features of the first central target and the second central target according to the coordinates of the first central target and the second central target;
inputting the two individual characteristics into a second neural network for training to obtain an individual similarity judgment network.
3. The method according to claim 1, wherein the determining of the first crowd region in the first sample image and the second crowd region in the second sample image in S1 specifically includes:
s11, carrying out pedestrian target detection on the sample image, and marking a target frame of each target pedestrian;
s12, acquiring a target pedestrian closest to the center position of the sample image as a first target pedestrian;
s13, calculating coordinate distances between the other target pedestrians and the first target pedestrian, and sequencing the other target pedestrians according to the coordinate distances from small to large to form a target pedestrian sequence;
s14, dividing the first N target pedestrians with the coordinate distance smaller than the preset distance in the target pedestrian sequence and the first target pedestrian into a same group, wherein the region surrounded by the circumscribed rectangles of all the target frames of the same group is a crowd region;
s15, taking a target pedestrian which does not enter the crowd area as an updated first target pedestrian;
s16, calculating coordinate distances between the rest target pedestrians which do not enter the crowd area and the first target pedestrian, and sequencing the rest target pedestrians which do not enter the crowd area according to the coordinate distances from small to large to form an updated target pedestrian sequence;
s17, repeating the steps S14 to S16 until all target pedestrians in the sample image are drawn into a certain crowd area;
if the sample image is the first sample image, the crowd region is the first crowd region; and if the sample image is the second sample image, the crowd area is the second crowd area.
4. The method according to claim 1 or 3, wherein after the S1, the method further comprises:
adjusting the size of the first crowd region or the second crowd region so that the two regions have the same size.
5. The method according to claim 1, wherein the first backbone network and/or the second backbone network is a CvT network structure incorporating an attention mechanism.
6. The method of claim 1, wherein the loss function of the small-group crowd re-identification model is L = L_q + α·L_g, wherein L_q is the crowd loss function, L_g is the individual loss function, and α is a preset hyper-parameter.
7. A small-group crowd re-identification method, characterized in that the method comprises:
S101, acquiring a first to-be-detected image and a second to-be-detected image, determining first to-be-detected crowd regions in the first image and second to-be-detected crowd regions in the second image, and combining them pairwise to form comparison crowd pairs;
S102, inputting the first to-be-detected crowd region of a comparison crowd pair into the first backbone network of a small-group crowd re-identification model to obtain the individual features of each pedestrian in the first to-be-detected crowd and the global features of that crowd, and inputting the second to-be-detected crowd region of the pair into the second backbone network of the model to obtain the individual features of each pedestrian in the second to-be-detected crowd and the global features of that crowd;
S103, inputting the global features of the first and second to-be-detected crowds into the crowd similarity judgment network of the model to obtain the crowd similarity score of the comparison crowd pair;
S104, combining the individual features of the pedestrians in the first to-be-detected crowd pairwise with those of the pedestrians in the second to-be-detected crowd and inputting them into the individual similarity judgment network of the model to obtain the individual similarity score of each pedestrian pair in the comparison crowd pair;
S105, obtaining the final similarity score of the comparison crowd pair and the re-identification result of the same pedestrians in the pair from the crowd similarity score and the individual similarity scores;
wherein the small-group crowd re-identification model is trained by the method of any one of claims 1 to 6.
8. The method according to claim 7, wherein the S105 specifically includes:
s1051, calculating an individual similarity statistical score of each pedestrian according to the individual similarity score of each pair of pedestrians;
s1052, calculating to obtain a final similarity score of the comparison crowd pair according to a first formula, and taking a pedestrian pair successfully matched in the comparison crowd pair as a re-identification result of the same pedestrian in the comparison crowd pair;
the first formula is

S = S_q + λ · (s_1 + s_2 + … + s_N)

wherein S is the final similarity score of the comparison crowd pair; S_q is the crowd similarity score of the comparison crowd pair; λ is a constant coefficient; s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region, k = 1, 2, …, N; and N is the number of pedestrians matched as the same person in the comparison crowd pair.
9. The method according to claim 8, wherein S1051 specifically includes:
if, among the pedestrian pairs formed by the k-th pedestrian in the first to-be-detected crowd region and each pedestrian in the second to-be-detected crowd region, only one pair has an individual similarity score s_{k,j} greater than a preset threshold θ, then s_k = s_{k,j}; wherein s_k is the individual similarity statistical score of the k-th pedestrian in the first to-be-detected crowd region;
if several of those pedestrian pairs have individual similarity scores s_{k,j} greater than the preset threshold θ, the qualifying pairs are ranked by individual similarity score from large to small to obtain s_{k,(1)} ≥ s_{k,(2)} ≥ … ≥ s_{k,(M_k)}, and s_k is then obtained from the ranked scores discounted by a constant coefficient; wherein M_k is the number of pedestrians in the second region for which pedestrian k satisfies s_{k,j} > θ.
10. The method of claim 9, wherein the total number of successfully matched pedestrian pairs in the comparison crowd pair is N = min(T1, T2); wherein T1 is the total number of pedestrians in the first to-be-detected crowd region whose individual similarity score is greater than the preset threshold, and T2 is the total number of pedestrians in the second to-be-detected crowd region whose individual similarity score is greater than the preset threshold.
CN202210486758.5A 2022-05-06 2022-05-06 Small-group crowd re-identification method and model training method Active CN114581858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210486758.5A CN114581858B (en) 2022-05-06 2022-05-06 Small-group crowd re-identification method and model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210486758.5A CN114581858B (en) 2022-05-06 2022-05-06 Small-group crowd re-identification method and model training method

Publications (2)

Publication Number Publication Date
CN114581858A true CN114581858A (en) 2022-06-03
CN114581858B CN114581858B (en) 2022-08-23

Family

ID=81769351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210486758.5A Active CN114581858B (en) 2022-05-06 2022-05-06 Small-group crowd re-identification method and model training method

Country Status (1)

Country Link
CN (1) CN114581858B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810476A (en) * 2014-02-20 2014-05-21 中国计量学院 Method for re-identifying pedestrians in video monitoring network based on small-group information correlation
CN110135233A (en) * 2019-01-24 2019-08-16 刘赏 The common Assembling Behavior recognition methods of terminal passenger based on video analysis
CN110751018A (en) * 2019-09-03 2020-02-04 上海交通大学 Group pedestrian re-identification method based on mixed attention mechanism
CN110765841A (en) * 2019-09-03 2020-02-07 上海交通大学 Group pedestrian re-identification system and terminal based on mixed attention mechanism
CN111666843A (en) * 2020-05-25 2020-09-15 湖北工业大学 Pedestrian re-identification method based on global feature and local feature splicing
CN111914642A (en) * 2020-06-30 2020-11-10 浪潮电子信息产业股份有限公司 Pedestrian re-identification method, device, equipment and medium
CN113469080A (en) * 2021-07-08 2021-10-01 中国科学院自动化研究所 Individual, group and scene interactive collaborative perception method, system and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FUYAN MA ET AL.: "Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion", ARXIV *
许琪羚: "Research on Key Technologies of Group-based Pedestrian Re-identification", China Excellent Doctoral and Master's Theses Full-text Database (Master), Information Science and Technology *

Also Published As

Publication number Publication date
CN114581858B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN109670528B (en) Data expansion method facing pedestrian re-identification task and based on paired sample random occlusion strategy
Kuo et al. How does person identity recognition help multi-person tracking?
CN101630363B (en) Rapid detection method of face in color image under complex background
WO2020114118A1 (en) Facial attribute identification method and device, storage medium and processor
US9367730B2 (en) Method and system for automated face detection and recognition
US11017215B2 (en) Two-stage person searching method combining face and appearance features
CN104504362A (en) Face detection method based on convolutional neural network
CN110598535B (en) Face recognition analysis method used in monitoring video data
CN109816689A (en) A kind of motion target tracking method that multilayer convolution feature adaptively merges
CN111553193A (en) Visual SLAM closed-loop detection method based on lightweight deep neural network
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN109800624A (en) A kind of multi-object tracking method identified again based on pedestrian
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN111027377B (en) Double-flow neural network time sequence action positioning method
CN112258559B (en) Intelligent running timing scoring system and method based on multi-target tracking
CN109344842A (en) A kind of pedestrian's recognition methods again based on semantic region expression
CN112115838B (en) Face classification method based on thermal infrared image spectrum fusion
CN111539351A (en) Multi-task cascaded face frame selection comparison method
CN113947814A (en) Cross-visual angle gait recognition method based on space-time information enhancement and multi-scale saliency feature extraction
CN110348366B (en) Automatic optimal face searching method and device
Ishii et al. Face detection based on skin color information in visual scenes by neural networks
CN109711232A (en) Deep learning pedestrian recognition methods again based on multiple objective function
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN114581858B (en) Small-group crowd re-identification method and model training method
CN114399731B (en) Target positioning method under supervision of single coarse point

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant