CN115205779A - People number detection method based on crowd image template - Google Patents


Info

Publication number
CN115205779A
Authority
CN
China
Prior art keywords
crowd
image
people
template
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210699257.5A
Other languages
Chinese (zh)
Inventor
柳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202210699257.5A
Publication of CN115205779A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a people number detection method based on a crowd image template. The method acquires a crowd image to be detected, whose people are to be counted; determines, from a plurality of candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected; takes that candidate as the target crowd template image; and determines the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image. Because the application directly obtains the target crowd template image with the highest similarity to the crowd image to be detected and determines the number of people from the number of people corresponding to that template image, the efficiency of people number detection can be improved.

Description

People number detection method based on crowd image template
Technical Field
The application relates to the technical field of people number detection, in particular to a people number detection method based on a crowd image template.
Background
At present, detecting the number of people in an image by means of computer vision techniques is a popular research direction. The research goal is, given an image or a video of a crowd scene, to generate a corresponding crowd density map through an algorithm and to detect the number of people from the crowd density map. In the related art, the efficiency of detecting the number of people in an image is low.
Disclosure of Invention
The embodiment of the application provides a people number detection method based on a crowd image template, which can improve the efficiency of people number detection of crowds in an image.
In a first aspect, an embodiment of the present application provides a people number detection method based on a crowd image template, including:
acquiring a crowd image to be detected, whose people are to be counted;
determining, from a plurality of candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected;
taking the candidate crowd template image with the highest similarity as a target crowd template image;
and determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
In a second aspect, an embodiment of the present application further provides a people number detection apparatus based on a crowd image template, including:
an acquisition module, used for acquiring a crowd image to be detected, whose people are to be counted;
a first determining module, used for determining, from a plurality of candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected;
a second determining module, used for taking the candidate crowd template image with the highest similarity as a target crowd template image;
and a third determining module, used for determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
In a third aspect, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, which, when executed on a processor, causes the computer to execute the people number detection method based on the crowd image template as provided in any of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory, where the memory stores a computer program, and the processor is configured to execute, by calling the computer program, the people number detection method based on the crowd image template as provided in any embodiment of the present application.
According to the technical solution of the present application, a crowd image to be detected, whose people are to be counted, is acquired; the candidate crowd template image with the highest similarity to the crowd image to be detected is determined from a plurality of candidate crowd template images; that candidate is taken as the target crowd template image; and the number of people in the crowd image to be detected is determined according to the number of people corresponding to the target crowd template image. Because the application directly obtains the target crowd template image with the highest similarity to the crowd image to be detected and determines the number of people from the number of people corresponding to that template image, the efficiency of people number detection can be improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a scene schematic diagram of a people number detection system according to an embodiment of the present application.
Fig. 2 is a first flowchart of a people number detection method based on a crowd image template according to an embodiment of the present application.
Fig. 3 is a second flowchart of the people number detection method based on the crowd image template according to the embodiment of the present application.
Fig. 4 is a schematic structural diagram of a people number detection model provided in the embodiment of the present application.
Fig. 5 is a schematic structural diagram of an attention scale network of a people number detection method based on a crowd image template according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a people number detection apparatus based on a crowd image template according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are intended to be within the scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
At present, detecting the number of people in an image by means of computer vision techniques is a popular research direction. The research goal is, given an image or a video of a crowd scene, to generate a corresponding crowd density map through an algorithm and to detect the number of people from the crowd density map. In the related art, the efficiency of detecting the number of people in an image is low.
In order to improve the efficiency of detecting the number of people in an image, an embodiment of the present application provides a people number detection method based on a crowd image template. The method may be executed by the people number detection apparatus based on the crowd image template provided in the embodiment of the present application, or by an electronic device in which that apparatus is integrated, where the apparatus may be implemented in hardware or software.
Referring to fig. 1, the present application further provides a people number detection system. As shown in fig. 1, the system includes an electronic device 10, in which the people number detection apparatus based on the crowd image template provided by the present application is integrated. For example, when the electronic device 10 is configured with a camera, it can directly photograph the crowd to be counted through the camera to obtain the crowd image to be detected, then determine the target crowd template image matching the crowd image to be detected, and determine the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image. Because the target crowd template image matching the crowd image to be detected is obtained directly and the number of people is determined from the number of people corresponding to that template image, the efficiency of detecting the number of people in an image can be improved.
The electronic device 10 may be any device equipped with a processor and having processing capability, such as a mobile electronic device equipped with a processor, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, etc., or a stationary electronic device equipped with a processor, such as a desktop computer, a television, a server, etc.
In addition, as shown in fig. 1, the people number detecting system may further include a memory 20 for storing data, for example, the electronic device 10 may store the obtained data of the number of people in the image of the people to be detected, the template image of the target people, and the image of the people to be detected in the memory 20.
It should be noted that the scene schematic diagram of the people number detection system shown in fig. 1 is only an example. The people number detection system and the scene described in the embodiment of the present application are intended to illustrate the technical solution more clearly and do not limit it. A person skilled in the art will appreciate that, as people number detection systems evolve and new business scenarios appear, the technical solution provided in the embodiment of the present application is equally applicable to similar technical problems.
Detailed descriptions are given below. The numbering of the following embodiments is not intended to limit their order of preference.
Referring to fig. 2, fig. 2 is a first flowchart of a people number detection method based on a crowd image template according to an embodiment of the present disclosure. The specific flow of the people number detection method based on the crowd image template provided by the embodiment of the application can be as follows:
101. Acquire a crowd image to be detected, whose people are to be counted.
The image of the crowd to be detected refers to an image needing to be subjected to people number detection. The image of the crowd to be detected can be obtained by shooting through image acquisition equipment such as a camera and the like, can also be obtained through a pre-shot image stored in the local electronic equipment, and can also be obtained by obtaining image resources stored in a server side.
For example, the crowd image to be detected can be obtained by photographing the place where the number of people needs to be counted. Such places may be shopping malls, banks, stations, scenic spots on holidays and the like.
For example, in a bank, the people inside need to be counted in order to avoid the safety hazards caused by overcrowding and to help bank staff manage the flow of people. Cameras can be installed in the bank to photograph each area, and the captured pictures are used as the crowd images to be detected.
102. Determine, from a plurality of candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected.
Wherein, the candidate crowd template image refers to the selectable crowd template image. The crowd template image comprises crowd images of different crowd distribution conditions, and can be obtained through multiple ways, for example, the crowd template image can be obtained through shooting by image acquisition equipment such as a camera and can also be obtained through obtaining image resources stored by a server.
In this embodiment, after the crowd image to be detected is acquired, the candidate crowd template image with the highest similarity to the crowd image to be detected is determined from the plurality of candidate crowd template images.
It should be noted that the image content in the target crowd template image has a higher similarity with the image content in the crowd image to be detected.
Exemplarily, a candidate crowd template image with the highest similarity with the image of the crowd to be detected can be determined from the plurality of candidate crowd template images as the target crowd template image through an image similarity algorithm.
103. Take the candidate crowd template image with the highest similarity as the target crowd template image.
In this embodiment, after the candidate crowd template image with the highest similarity to the crowd image to be detected is determined from the plurality of candidate crowd template images, that candidate crowd template image is used as the target crowd template image.
104. Determine the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
For example, in this embodiment, the number of people corresponding to the target people template image may be directly used as the number of people in the to-be-detected people image.
In this embodiment, people number detection may be performed on each crowd template image in advance to obtain its number of people, and a correspondence between each crowd template image and its number of people may then be established. When the number of people corresponding to the target crowd template image is needed, it can be looked up quickly in this pre-established correspondence.
When people number detection is performed on a crowd template image, a density map corresponding to the crowd template image can be obtained, and the number of people in the crowd template image is determined from the density map. Alternatively, the faces in the crowd template image can be identified through face recognition, and the number of people is determined from the number of recognized faces. The number of people in a crowd template image can be obtained in various ways, and the way is not specifically limited here.
In particular implementation, the present application is not limited by the execution sequence of the described steps, and some steps may be performed in other sequences or simultaneously without conflict.
As can be seen from the above, in the people number detection method based on the crowd image template provided in the embodiment of the application, a crowd image to be detected, whose people are to be counted, is acquired; the candidate crowd template image with the highest similarity to the crowd image to be detected is determined from the candidate crowd template images; that candidate is taken as the target crowd template image; and the number of people in the crowd image to be detected is determined according to the number of people corresponding to the target crowd template image. Because the application directly obtains the target crowd template image with the highest similarity to the crowd image to be detected and determines the number of people from the number of people corresponding to that template image, the efficiency of people number detection can be improved.
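Steps 101 to 104 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `count_by_template` and `toy_similarity`, the representation of an image as a short feature list, and the toy similarity measure are all assumptions, since the application does not prescribe a particular image similarity algorithm.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def count_by_template(
    query: Image,
    templates: Sequence[Tuple[Image, int]],
    similarity: Callable[[Image, Image], float],
) -> int:
    """Return the head count stored with the template most similar to `query`."""
    if not templates:
        raise ValueError("at least one candidate template is required")
    # Steps 102-103: pick the candidate with the highest similarity score.
    _, best_count = max(templates, key=lambda t: similarity(query, t[0]))
    # Step 104: reuse the pre-computed count of the target template.
    return best_count


def toy_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Hypothetical stand-in: negative sum of absolute differences,
    # so that a larger value means "more similar".
    return -sum(abs(x - y) for x, y in zip(a, b))


# Each candidate pairs a (toy) image feature list with its known count.
templates = [([0.1, 0.2, 0.1], 3), ([0.8, 0.9, 0.7], 40), ([0.4, 0.5, 0.5], 15)]
estimated = count_by_template([0.75, 0.85, 0.8], templates, toy_similarity)
```

In this sketch the per-query cost is one similarity evaluation per candidate template plus a lookup of a stored count; no density map is computed for the query image.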
The method according to the previous embodiment is further illustrated in detail by way of example.
Referring to fig. 3, fig. 3 is a second flowchart of the people detection method based on the crowd image template according to the embodiment of the present disclosure. The method comprises the following steps:
201. Acquire a crowd image to be detected, whose people are to be counted.
The image of the crowd to be detected refers to an image needing to be subjected to people number detection. The image of the crowd to be detected can be obtained by shooting through image acquisition equipment such as a camera and the like, can also be obtained through a pre-shot image stored in the local electronic equipment, and can also be obtained by acquiring image resources stored in a server side.
For example, in a bank, the people inside need to be counted in order to avoid the safety hazards caused by overcrowding and to help bank staff manage the flow of people. Cameras can be installed in the bank to photograph each area, and the captured pictures are used as the crowd images to be detected.
202. Construct virtual crowd images with different crowd distributions.
For example, crowd scenes with different crowd distributions are constructed using objects in a GTA5 virtual scene, stable images are then captured from the constructed scenes by a data collector to obtain virtual crowd images with different crowd distributions, and the virtual crowd images are used as crowd template images.
In this embodiment, rendering data corresponding to each virtual crowd image may be obtained, and the number of people in each crowd template image may be determined from the rendering data. A correspondence between each crowd template image and its number of people is then established, and the number of people corresponding to the target crowd template image is determined according to this correspondence.
Since the number of people in the crowd image to be detected is determined according to the number of people in the target crowd template image, obtaining the template counts from rendering data ensures that those counts are accurate, which improves the accuracy of the people number detection method based on the crowd image template provided by the present application.
203. Take the virtual crowd images as crowd template images.
For example, after virtual crowd images with different crowd distributions are constructed, the constructed virtual crowd images can be used as crowd template images.
204. Determine, from a plurality of candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected.
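The correspondence between template images and head counts derived from rendering data might be built along the following lines. The application does not describe the format of the rendering data, so the record layout (a list of rendered characters per image, each with a `visible` flag), the image identifiers, and the function name `build_count_index` are purely hypothetical.

```python
from typing import Dict, List


def build_count_index(render_records: Dict[str, List[dict]]) -> Dict[str, int]:
    """Map each template image ID to the number of people rendered in it."""
    index: Dict[str, int] = {}
    for image_id, people in render_records.items():
        # Assumed convention: count only characters flagged as visible.
        index[image_id] = sum(1 for p in people if p.get("visible", False))
    return index


# Hypothetical rendering records for three virtual scenes.
records = {
    "scene_001": [{"visible": True}, {"visible": True}, {"visible": False}],
    "scene_002": [{"visible": True}],
    "scene_003": [],
}
count_index = build_count_index(records)
```

Once such an index exists, obtaining the count for the target template is a dictionary lookup, for example `count_index["scene_001"]`.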
Wherein, the candidate crowd template image refers to the selectable crowd template image. The crowd template image comprises crowd images of different crowd distribution conditions, and can be obtained through multiple ways, for example, the crowd template image can be obtained through shooting by image acquisition equipment such as a camera and can also be obtained through obtaining image resources stored by a server.
In one embodiment, when the candidate crowd template image with the highest similarity to the crowd image to be detected is determined from the plurality of candidate crowd template images, the crowd image to be detected can be input into the crowd image similarity model to obtain the similarity between the crowd image to be detected and each candidate crowd template image, and the candidate crowd template image with the highest similarity is then determined from these similarities.
The crowd image similarity model is configured to obtain similarity between an input crowd image to be detected and a plurality of candidate crowd template images. The crowd image similarity model can extract image features in crowd images and compare the image features, so that the similarity between the images is obtained.
Specifically, the crowd image similarity model may use an image similarity algorithm, and the image similarity algorithm is not specifically limited herein.
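As one possible illustration of "extract image features and compare them", the sketch below uses a normalized grayscale histogram as the feature and cosine similarity as the comparison. Both choices, along with the function names and the nested-list image representation, are assumptions made for the example; the application leaves the similarity algorithm open.

```python
import math
from typing import List, Sequence

GrayImage = Sequence[Sequence[int]]  # rows of 0-255 intensities


def histogram_feature(image: GrayImage, bins: int = 8) -> List[float]:
    """Normalized intensity histogram used as a simple image feature."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image must contain at least one pixel")
    return [c / total for c in counts]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_index(query: GrayImage, candidates: Sequence[GrayImage]) -> int:
    """Index of the candidate whose feature is closest to the query's."""
    q = histogram_feature(query)
    scores = [cosine_similarity(q, histogram_feature(c)) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


query = [[10, 20], [200, 210]]
candidates = [
    [[250, 250], [250, 250]],
    [[15, 25], [205, 215]],
    [[100, 100], [100, 100]],
]
best = most_similar_index(query, candidates)
```

A histogram ignores where people are located in the frame, so a practical similarity model would likely use spatially aware or learned features; the histogram is used here only to keep the example self-contained.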
205. Take the crowd template image with the highest similarity as the target crowd template image.
It can be understood that the higher the similarity between a candidate crowd template image and the crowd image to be detected, the closer their image content. In this embodiment, the crowd template image with the highest similarity is used as the target crowd template image, which helps ensure the accuracy of the number of people subsequently obtained for the crowd image to be detected.
206. Determine the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
For example, in this embodiment, the number of people corresponding to the target people template image may be directly used as the number of people in the to-be-detected people image.
In addition, in one implementation, when the number of people in the crowd image to be detected is determined according to the number of people corresponding to the target crowd template image, the crowd image to be detected can be input into the people number detection model to obtain a reference number of people, and the number of people in the crowd image to be detected is then determined from both the number of people corresponding to the target crowd template image and the reference number of people.
The people number detection model is configured to perform people number detection on the crowd image to be detected and obtain the reference number of people in the crowd image to be detected.
Specifically, the average of the number of people corresponding to the target crowd template image and the reference number of people can be computed and used as the number of people in the crowd image to be detected. That is, the number of people corresponding to the target crowd template image serves as one reference value, the reference number of people obtained from the people number detection model serves as another, and the number of people in the crowd image to be detected is determined from the two reference values.
In one embodiment, when the number of people in the crowd image to be detected is determined according to the number of people corresponding to the target crowd template image, the number of people corresponding to the target crowd template image and the reference number of people can be weighted and summed according to a preset strategy, and the weighted sum is used as the number of people in the crowd image to be detected.
The preset strategy may be as follows: a neural network model computes a confidence for the method used to obtain the number of people corresponding to the target crowd template image and a confidence for the people number detection model; the two confidences are used as the corresponding weights, and the number of people corresponding to the target crowd template image and the reference number of people are weighted and summed to obtain the weighted sum.
For example, when the number of people corresponding to the target crowd template image is determined by face recognition, the confidence of the face recognition method may be obtained.
It should be noted that the confidence here refers to the degree to which a measured value can be trusted.
The preset policy is not limited in the present application, and can be set by a person skilled in the art according to actual conditions.
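The averaging and the confidence-weighted summation can be expressed with one small function. The sketch below assumes the two confidences are non-negative and normalizes them so that the weights sum to one; the application does not state whether the weights are normalized, so that step, like the function name `fuse_counts`, is an assumption.

```python
def fuse_counts(
    template_count: float,
    reference_count: float,
    template_conf: float = 1.0,
    model_conf: float = 1.0,
) -> float:
    """Confidence-weighted combination of the two head-count estimates."""
    if template_conf < 0 or model_conf < 0:
        raise ValueError("confidences must be non-negative")
    total = template_conf + model_conf
    if total == 0:
        raise ValueError("at least one confidence must be positive")
    # Assumed normalization: divide by the sum of the weights.
    return (template_conf * template_count + model_conf * reference_count) / total


# Equal confidences reduce to the plain average of the two estimates.
plain_average = fuse_counts(40, 46)
# Unequal confidences pull the result toward the more trusted estimate.
weighted = fuse_counts(40, 50, template_conf=0.75, model_conf=0.25)
```

With equal confidences the result for counts 40 and 46 is 43.0; with confidences 0.75 and 0.25 the result for counts 40 and 50 is 42.5, closer to the template count.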
The embodiment of the present application further provides a people number detection system that includes a people number detection model. Referring to fig. 4, fig. 4 is a schematic structural diagram of an optional people number detection model provided in the embodiment of the present application. The people number detection model comprises a density segmentation network, an attention scale network, a density map correction module, a density map fusion module and a people number detection module. When the people number detection model processes a crowd image to be detected, the image is input into the density segmentation network to obtain attention masks for the different density levels, and into the attention scale network to obtain density maps and scale factors for the different density levels. The attention masks, density maps and scale factors of the different density levels are then input into the density map correction module to obtain a corrected density map for each density level. The corrected density maps are input into the density map fusion module to obtain a target density map of the crowd image to be detected. Finally, the target density map is input into the people number detection module to obtain the reference number of people corresponding to the crowd image to be detected.
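The data flow through the correction, fusion and counting modules can be sketched with plain nested lists. The concrete rules used here are assumptions made for illustration: correction multiplies each level's density map by its mask and scale factor, fusion sums the corrected maps element-wise, and counting sums the fused density map (the usual convention that a density map integrates to the head count). The outputs of the two networks are replaced by hand-written toy values.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def correct_density(mask: Map2D, density: Map2D, scale: float) -> Map2D:
    """Keep density only inside the level's mask and rescale it (assumed rule)."""
    return [
        [m * d * scale for m, d in zip(mask_row, density_row)]
        for mask_row, density_row in zip(mask, density)
    ]


def fuse_density(maps: Sequence[Map2D]) -> Map2D:
    """Element-wise sum of the per-level corrected maps (assumed rule)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def count_people(density: Map2D) -> float:
    """Head count as the sum over the density map."""
    return sum(sum(row) for row in density)


# Toy stand-ins for the network outputs at two density levels.
mask_low, mask_high = [[1, 1], [0, 0]], [[0, 0], [1, 1]]
density_low = [[0.5, 0.5], [0.25, 0.25]]
density_high = [[1.0, 1.0], [2.0, 2.0]]
scale_low, scale_high = 1.0, 0.5

corrected = [
    correct_density(mask_low, density_low, scale_low),
    correct_density(mask_high, density_high, scale_high),
]
target_density = fuse_density(corrected)
reference_count = count_people(target_density)
```

In this toy case the fused map is `[[0.5, 0.5], [1.0, 1.0]]`, which sums to a reference count of 3.0.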
The density segmentation network can use VGG-19 as a backbone network, perform density level semantic segmentation on the crowd image to be detected, classify each pixel into a specific density level, and the pixels with the same density level form an attention mask region.
In an embodiment, before inputting the crowd image to be detected into the density segmentation network to obtain the attention masks corresponding to different density levels, the method may further include: acquiring crowd sample images with different density distributions, and training the density segmentation network on the crowd sample images using a loss function.
Wherein the loss function may comprise an adaptive pyramid loss function.
For example, the loss function used in this embodiment may be as follows. The mean square error (MSE) loss function alone ignores the effect of different density levels on the network training process. Because low-density and high-density regions are usually quite unbalanced, the corresponding estimation errors can bias the trained counting network and weaken its generalization capability. In this embodiment, an Adaptive Pyramid Loss (APLoss) is added to the mean square error loss, which can relieve the training bias and at the same time enhance the generalization capability of the counting network.
$$L = L_{MSE} + \lambda L_{AP}$$
Figure BDA0003703336840000091
When the loss function is computed, the density map is divided into a 2×2 grid and the number of people in each grid cell is counted. If the local count of a cell is greater than a threshold T, that cell is again divided into a 2×2 grid and the operation is repeated; if the threshold T is not exceeded, the cell is not divided further. The local loss of each sub-region obtained by the division is calculated as follows, where $X_k$ is the k-th input image, $D_k$ is its ground-truth density map, the symbol shown in Figure BDA0003703336840000092 is the predicted density map, and the symbol shown in Figure BDA0003703336840000093 is the sub-region after the n-th division.
Figure BDA0003703336840000101
The total $L_{AP}$ is given by the following formula, where M is the training batch size:
Figure BDA0003703336840000102
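The recursive 2×2 division described above can be sketched as follows. This is a minimal illustration only: the formulas for the local loss and the total loss are given as figures that are not reproduced in this text, so the sketch just enumerates the sub-regions on which a local loss would be evaluated. The function names, the `(top, left, bottom, right)` region tuple and the `min_size` stopping rule are assumptions made for the example and do not appear in the application.

```python
from typing import List, Tuple

# Hypothetical region type: (top, left, bottom, right), bottom/right exclusive.
Region = Tuple[int, int, int, int]


def region_count(density: List[List[float]], region: Region) -> float:
    """Sum the density values inside a region, i.e. its local people count."""
    top, left, bottom, right = region
    return sum(sum(row[left:right]) for row in density[top:bottom])


def adaptive_pyramid_regions(density, threshold, region=None, min_size=2):
    """Split a density map 2x2, and re-split every cell whose count exceeds threshold.

    Returns all cells produced at every division level, in depth-first order.
    """
    if region is None:
        region = (0, 0, len(density), len(density[0]))
    top, left, bottom, right = region
    # Assumed stopping rule: a region smaller than min_size cannot be split.
    if bottom - top < min_size or right - left < min_size:
        return []
    mid_r = (top + bottom) // 2
    mid_c = (left + right) // 2
    cells = [
        (top, left, mid_r, mid_c),
        (top, mid_c, mid_r, right),
        (mid_r, left, bottom, mid_c),
        (mid_r, mid_c, bottom, right),
    ]
    regions = []
    for cell in cells:
        regions.append(cell)
        # Only cells whose local count is above the threshold are divided again.
        if region_count(density, cell) > threshold:
            regions.extend(
                adaptive_pyramid_regions(density, threshold, cell, min_size)
            )
    return regions
```

With this structure, dense areas of the map contribute more and finer sub-regions than sparse areas, which is the property the text relies on to balance low-density and high-density regions.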
In one embodiment, when acquiring the crowd sample images with different density distributions, a virtual crowd image with different density distributions may be constructed as the crowd sample image.
For example, a data collector and labeler may be designed based on a data set of a game GTA5, which may synthesize crowd scenes and automatically label them. Thanks to the excellent game engine, the scene rendering, the texture details, the weather effect and the like of the game engine are very close to the real world situation, so that in the application, a complex crowded crowd scene can be constructed by using the objects in the GTA5 virtual scene, and then the data collector captures a stable image from the constructed scene, thereby obtaining virtual crowd images with different density distributions. And finally, automatically marking the head position of the person by analyzing the data from the game rendering template. By means of the designed collector and labeler, a large-scale and diversified synthetic crowd scene data set can be constructed.
In addition, in this embodiment, the density level labels of the ground truth may be generated as follows:
(1) All local counts are obtained by scanning the ground-truth (gt) crowd maps in the training set pixel by pixel with a 64 × 64 sliding window.
(2) The average value $AvgCnt_{11}$ of the counts over all non-zero regions is calculated, and the minimum count MinCnt and the maximum count MaxCnt are found.
(3) This yields a threshold set of density levels {MinCnt, $AvgCnt_{11}$, MaxCnt}, which divides the density into two levels: low density and high density. The average $AvgCnt_{21}$ of all low-density counts and the average $AvgCnt_{22}$ of all high-density counts can then be calculated iteratively, giving a new threshold set {MinCnt, $AvgCnt_{21}$, $AvgCnt_{11}$, $AvgCnt_{22}$, MaxCnt} that divides the density into four levels, and so on.
(4) Labels for training are thus obtained: given N density levels, there are N + 1 density labels, including one additional background label. The density level of each pixel is then marked according to its count in the gt crowd map.
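Steps (1) to (4) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of the application: the function names are hypothetical, interval boundaries are treated as inclusive on both sides when averaging, and label 0 is used for the background. The window size is a parameter so that the example also runs on small maps; the text uses 64.

```python
import bisect


def sliding_window_counts(density, window=64):
    """Step (1): local count (sum of density) at every window position, stride 1."""
    rows, cols = len(density), len(density[0])
    counts = []
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            counts.append(
                sum(sum(density[i][c:c + window]) for i in range(r, r + window))
            )
    return counts


def density_thresholds(counts, rounds):
    """Steps (2)-(3): start from {MinCnt, MaxCnt} and, in every round, insert
    the mean of the non-zero counts that fall inside each current interval.
    One round gives 2 density levels, two rounds give 4, and so on."""
    nonzero = sorted(x for x in counts if x > 0)
    thresholds = [nonzero[0], nonzero[-1]]
    for _ in range(rounds):
        refined = [thresholds[0]]
        for lo, hi in zip(thresholds, thresholds[1:]):
            inside = [x for x in nonzero if lo <= x <= hi]
            refined.append(sum(inside) / len(inside))
            refined.append(hi)
        thresholds = refined
    return thresholds


def density_label(count, thresholds):
    """Step (4): 0 is the background label, 1..N are the density levels."""
    if count <= 0:
        return 0
    interior = thresholds[1:-1]  # MinCnt and MaxCnt only bound the range
    return 1 + bisect.bisect_left(interior, count)
```

For example, `density_thresholds(counts, 1)` returns the three-element set corresponding to {MinCnt, $AvgCnt_{11}$, MaxCnt}, and `density_thresholds(counts, 2)` returns the five-element set that defines four levels.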
Referring to fig. 5, fig. 5 is a schematic structural diagram of the attention scale network used in the people number detection method based on a crowd image template according to an embodiment of the present application. The attention scale network comprises a feature extraction backbone, an Attention Scaling (AS) branch and a Density Estimation (DE) branch. The feature extraction backbone is used for extracting image features of the crowd image to be detected, and VGG-19 can be used as the feature extraction backbone. The attention scaling branch is used for learning scale factors, which are then used to automatically adjust the estimated density of each corresponding sub-region, so that local estimation errors are reduced. The density estimation branch is used for outputting the density maps that need correction.
The density estimation branch comprises a spatial pyramid pooling layer and a convolution layer. When the image features are input into the density estimation branch to obtain density maps corresponding to different density levels, the image features can be input into the spatial pyramid pooling layer to obtain spatial pyramid pooling features, and the spatial pyramid pooling features are then input into the convolution layer to obtain the density maps corresponding to the different density levels.
In the present application, a spatial pyramid pooling layer is added to the density estimation branch. This layer calculates features of different scales and, on top of the features extracted by the feature extraction backbone, extracts multi-scale context information. The crowd density is therefore predicted with context information taken into account, so that the number of people is detected more accurately.
It should be noted that a limitation of the VGG network used in the feature extraction backbone is that it encodes the same receptive field over the whole image. To address this problem, in the present embodiment a spatial pyramid pooling layer is added to the density estimation branch: features of different scales are calculated by Spatial Pyramid Pooling (SPP), and multi-scale context information is extracted on top of the features extracted by the VGG network. The calculation formula is as follows:
$$f_j = U_{bi}\left(F_j\left(P_{ave}(f_v, j), \theta_j\right)\right)$$
where, for each scale j, $P_{ave}(f_v, j)$ divides the feature $f_v$ extracted by the VGG network into $k(j) \times k(j)$ blocks by average pooling; $F_j$ is a convolutional network with a convolution kernel size of 1 and parameters $\theta_j$, used for combining context features across channels without changing their size; and $U_{bi}$ denotes bilinear interpolation, which upsamples the feature map so that the output $f_j$ has the same size as $f_v$. The system adopts 4 different scales, corresponding to $k(j) \in \{1, 2, 3, 4\}$, and experiments show that this setting is the most effective for improving network performance. Finally, the context features of the different scales are fused with the original VGG features to output the density maps of the different density levels.
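To make the formula concrete, the following sketch applies the three operations (block average pooling, a kernel-size-1 convolution, bilinear upsampling) to a single-channel feature map stored as a list of lists. Several simplifications are assumptions made for the example: with one channel the 1×1 convolution reduces to a scalar weight per scale (in a real network it would be a learned channel-mixing layer), the upsampling uses corner-aligned bilinear interpolation, and fusion is represented by simply returning the original map alongside the per-scale maps. All function names are hypothetical.

```python
def avg_pool_blocks(feat, k):
    """P_ave: average-pool an H x W map into k x k blocks (requires H, W >= k)."""
    h, w = len(feat), len(feat[0])
    pooled = []
    for bi in range(k):
        r0, r1 = bi * h // k, (bi + 1) * h // k
        row = []
        for bj in range(k):
            c0, c1 = bj * w // k, (bj + 1) * w // k
            vals = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        pooled.append(row)
    return pooled


def bilinear_upsample(grid, out_h, out_w):
    """U_bi: corner-aligned bilinear interpolation of a grid to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def spatial_pyramid_features(feat, scales=(1, 2, 3, 4), weights=None):
    """Return [f_v, f_1, ..., f_J]: the input map plus one context map per scale."""
    h, w = len(feat), len(feat[0])
    weights = weights or [1.0] * len(scales)
    outputs = [feat]
    for k, theta in zip(scales, weights):
        pooled = avg_pool_blocks(feat, k)
        # Single-channel stand-in for the 1x1 convolution F_j with parameter theta_j.
        mixed = [[theta * v for v in row] for row in pooled]
        outputs.append(bilinear_upsample(mixed, h, w))
    return outputs
```

At scale k = 1 the context map is the global average of the input everywhere, while larger k keeps progressively more local detail, which is how the pyramid supplies context at several receptive-field sizes.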
When the density map of each density level is corrected according to the attention mask and the scale factor corresponding to each density level, the density map correction module multiplies the attention mask, the scale factor and the density map corresponding to each density level respectively to obtain a corrected density map corresponding to each density level.
When the density map fusion module fuses the corrected density maps of the density levels, the corrected density maps of the density levels can be added, so that a target density map of the image of the crowd to be detected is obtained.
The people number detection module can perform people number detection processing on the target density map corresponding to the crowd image to be detected, so as to obtain the reference number of people of the crowd image to be detected.
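The correction, fusion and counting steps described above amount to element-wise arithmetic on the per-level maps, as the following sketch shows. Two points are assumptions rather than statements of the application: the scale factor is treated as one scalar per density level, and the reference number of people is taken to be the sum of the target density map, which is the usual convention for density-map counting. The function names are hypothetical.

```python
def correct_density_map(mask, scale, density):
    """Multiply attention mask, scale factor and density map element-wise."""
    return [
        [m * scale * d for m, d in zip(mask_row, density_row)]
        for mask_row, density_row in zip(mask, density)
    ]


def fuse_density_maps(corrected_maps):
    """Add the corrected density maps of all density levels element-wise."""
    rows, cols = len(corrected_maps[0]), len(corrected_maps[0][0])
    return [
        [sum(m[r][c] for m in corrected_maps) for c in range(cols)]
        for r in range(rows)
    ]


def reference_count(target_density):
    """Assumed counting rule: integrate (sum) the target density map."""
    return sum(sum(row) for row in target_density)


# Two density levels on a 2 x 2 image: the left column is low density,
# the right column is high density.
low = correct_density_map([[1, 0], [1, 0]], 1.0, [[0.5, 0.5], [0.5, 0.5]])
high = correct_density_map([[0, 1], [0, 1]], 0.5, [[2.0, 2.0], [2.0, 2.0]])
target = fuse_density_maps([low, high])
```

Because the attention masks of different levels do not overlap, each pixel of the target density map comes from exactly one level's scaled estimate.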
The people number detection model provided by the embodiment of the application obtains the attention masks of different density levels corresponding to the crowd image to be detected, and obtains the density maps and scale factors of the different density levels corresponding to the crowd image to be detected. It then corrects the density map of each density level according to the attention mask and the scale factor corresponding to that level to obtain the corrected density maps, fuses the corrected density maps to obtain the target density map of the crowd image to be detected, and finally determines the reference number of people of the crowd image to be detected according to the target density map. The people number detection model can avoid the influence that crowds of different densities in different image regions have on people number detection, so that the people number measurement obtained by the model is more accurate.
As can be seen from the above, the people number detection method based on the crowd image template provided in the embodiment of the application acquires the crowd image to be detected, constructs virtual crowd images with different crowd distribution conditions and uses them as crowd template images, determines from the candidate crowd template images the one with the highest similarity to the crowd image to be detected and takes it as the target crowd template image, and determines the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image. Because the application directly obtains the target crowd template image with the highest similarity to the crowd image to be detected and determines the number of people from the number of people corresponding to that template image, the efficiency of people number detection can be improved. In addition, when the number of people in the crowd image to be detected is determined, the number of people corresponding to the target crowd template image can be used as one reference value, and the reference number of people obtained by inputting the crowd image to be detected into the people number detection model can be used as another reference value. Determining the number of people in the crowd image to be detected according to the two reference values improves the accuracy of people number detection.
In one embodiment, a people number detection apparatus based on the crowd image template is further provided. Referring to fig. 6, fig. 6 is a schematic structural diagram of a people number detection apparatus 300 based on a crowd image template according to an embodiment of the present disclosure. The people number detection apparatus 300 based on the crowd image template is applied to an electronic device, and includes an obtaining module 301, a first determining module 302, a second determining module 303, and a third determining module 304, as follows:
the obtaining module 301, configured to acquire a crowd image to be detected, for which the number of people needs to be counted;
a first determining module 302, configured to determine, from multiple candidate crowd template images, a candidate crowd template image that has the highest similarity with the crowd image to be detected;
a second determining module 303, configured to use the candidate crowd template image with the highest similarity as a target crowd template image;
and a third determining module 304, configured to determine the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
In one embodiment, the first determining module 302 may be configured to: input the crowd image to be detected into a crowd image similarity model to obtain the similarity between the crowd image to be detected and each candidate crowd template image; and determine, according to the similarity, the candidate crowd template image with the highest similarity with the crowd image to be detected.
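The selection performed by the first and second determining modules is an arg-max over similarity scores. The sketch below illustrates this with cosine similarity between feature vectors as a stand-in for the crowd image similarity model, whose internals are not specified in this passage. The feature vectors, template identifiers and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity measure; the application's similarity model may differ."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_template(query_features, template_features):
    """Return (template_id, similarity) of the most similar candidate template."""
    scores = {
        template_id: cosine_similarity(query_features, features)
        for template_id, features in template_features.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-dimensional features for two candidate templates.
templates = {"sparse_plaza": [1.0, 0.0], "dense_plaza": [0.0, 1.0]}
target_id, similarity = select_target_template([0.1, 0.9], templates)
```

Only one similarity evaluation per candidate template is needed, which is the source of the efficiency argument made for the template-based approach.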
In one embodiment, the obtaining module 301 may further be configured to: constructing virtual crowd images with different crowd distribution conditions; and taking the virtual crowd image as a crowd template image.
In one embodiment, the obtaining module 301 may further be configured to: acquiring rendering data corresponding to each constructed virtual crowd image; determining the number of people in the crowd template image according to rendering data; establishing a corresponding relation between the crowd template image and the number of people according to the crowd template image and the number of people in the crowd template image; and determining the number of people corresponding to the target crowd template image according to the corresponding relation.
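The correspondence between crowd template images and people counts can be held in a simple lookup table, as sketched below. The format of the rendering data is an assumption made for the example: each virtual crowd image is taken to come with a list of head positions exported by the renderer, so that the number of people is the length of that list. The identifiers and function names are hypothetical.

```python
def build_template_counts(rendering_data):
    """Map each template id to the number of people found in its rendering data.

    rendering_data: {template_id: [(x, y), ...]} with one head position per person
    (assumed format).
    """
    return {
        template_id: len(head_positions)
        for template_id, head_positions in rendering_data.items()
    }


def count_for_template(template_counts, target_template_id):
    """Look up the number of people corresponding to the target template image."""
    return template_counts[target_template_id]


rendering_data = {
    "sparse_plaza": [(12, 40), (88, 17)],
    "dense_plaza": [(5, 5), (9, 14), (30, 22), (41, 8), (60, 63)],
}
template_counts = build_template_counts(rendering_data)
```

Since the counts are derived from the renderer rather than from manual annotation, the table can be built once, offline, for every constructed template.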
In one embodiment, the third determining module 304 may be configured to: acquire the average value of the number of people corresponding to the target crowd template image and the reference number of people, and take the average value as the number of people in the crowd image to be detected.
In one embodiment, the third determining module 304 may be configured to: perform weighted summation processing on the number of people corresponding to the target crowd template image and the reference number of people according to a preset strategy to obtain a weighted sum value, and take the weighted sum value as the number of people in the crowd image to be detected.
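The two ways of combining the template count with the reference count can be written as follows. The preset strategy for choosing the weights is left open by the application, so the single `template_weight` parameter, and the convention that the two weights sum to 1, are assumptions made for this sketch.

```python
def average_count(template_count, reference_count):
    """Average of the template-based count and the model-based reference count."""
    return (template_count + reference_count) / 2


def weighted_count(template_count, reference_count, template_weight=0.5):
    """Weighted sum of the two counts; the weights are assumed to sum to 1."""
    if not 0.0 <= template_weight <= 1.0:
        raise ValueError("template_weight must be in [0, 1]")
    return (
        template_weight * template_count
        + (1.0 - template_weight) * reference_count
    )
```

For instance, `weighted_count(100, 120, 0.75)` gives 105.0, and a weight of 0.5 reproduces the plain average. A preset strategy could derive the weight from the confidence level of each measurement.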
It should be noted that the people number detection apparatus based on the crowd image template provided in the embodiment of the present application and the people number detection method based on the crowd image template in the foregoing embodiment belong to the same concept, and the people number detection apparatus based on the crowd image template can implement any method provided in the embodiments of the people number detection method based on the crowd image template, and the specific implementation process thereof is detailed in the embodiments of the people number detection method based on the crowd image template, and is not described herein again.
Therefore, in the people number detection apparatus based on the crowd image template provided by the embodiment of the application, the obtaining module 301 acquires the crowd image to be detected, the first determining module 302 determines, from the candidate crowd template images, the candidate crowd template image with the highest similarity to the crowd image to be detected, the second determining module 303 takes the candidate crowd template image with the highest similarity as the target crowd template image, and the third determining module 304 determines the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image. Because the target crowd template image matching the crowd image to be detected is acquired directly, and the number of people in the crowd image to be detected is determined according to the number of people corresponding to the target crowd template image, the efficiency of people number detection can be improved.
The embodiment of the application also provides an electronic device. The electronic device can be a smart phone, a tablet computer and the like. Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 400 comprises a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device 400, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by running or calling a computer program stored in the memory 402 and calling data stored in the memory 402, thereby integrally monitoring the electronic device.
Memory 402 may be used to store computer programs and data. The memory 402 stores computer programs containing instructions executable in the processor. The computer program may constitute various functional modules. The processor 401 executes various functional applications and data processing by calling a computer program stored in the memory 402.
In this embodiment, the processor 401 in the electronic device 400 loads instructions corresponding to one or more processes of the computer program into the memory 402 according to the following steps, and the processor 401 runs the computer program stored in the memory 402, so as to implement various functions:
acquiring a crowd image to be detected, for which the number of people needs to be counted;
determining a candidate crowd template image with the highest similarity degree with the image of the crowd to be detected from a plurality of candidate crowd template images;
taking the candidate crowd template image with the highest similarity as a target crowd template image;
and determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
In an implementation manner, referring to fig. 8, fig. 8 is a second schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 400 further comprises: a radio frequency circuit 403, a display screen 404, a control circuit 405, an input unit 406, an audio circuit 407, a sensor 408, and a power supply 409. The processor 401 is electrically connected to the radio frequency circuit 403, the display screen 404, the control circuit 405, the input unit 406, the audio circuit 407, the sensor 408, and the power supply 409.
The radio frequency circuit 403 is used for transceiving radio frequency signals to communicate with a network device or other electronic devices through wireless communication.
The display screen 404 may be used to display information input by or provided to the user as well as various graphical user interfaces of the electronic device, which may be made up of images, text, icons, video, and any combination thereof.
The control circuit 405 is electrically connected to the display screen 404, and is configured to control the display screen 404 to display information.
The input unit 406 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 406 may include a fingerprint recognition module.
The audio circuit 407 may provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 407 comprises a microphone, which is electrically connected to the processor 401 and is used for receiving voice information input by a user.
The sensor 408 is used to collect external environmental information. The sensor 408 may include one or more of an ambient light sensor, an acceleration sensor, a gyroscope, and the like.
The power supply 409 is used to power the various components of the electronic device 400. In one embodiment, the power supply 409 may be logically connected to the processor 401 through a power management system, so that the power management system may perform functions of managing charging, discharging, and power consumption.
Although not shown in the drawings, the electronic device 400 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
In one embodiment, when determining the candidate crowd template image with the highest similarity to the crowd image to be detected from the plurality of candidate crowd template images, the processor 401 may perform: inputting the crowd image to be detected into a crowd image similarity model to obtain the similarity between the crowd image to be detected and each candidate crowd template image; and determining the candidate crowd template image with the highest similarity with the crowd image to be detected according to the similarity.
In one embodiment, before determining, from a plurality of candidate crowd template images, the candidate crowd template image having the highest similarity to the crowd image to be detected, the processor 401 may further perform: acquiring rendering data corresponding to each constructed virtual crowd image; determining the number of people in the crowd template image according to rendering data; establishing a corresponding relation between the crowd template image and the number of people according to the crowd template image and the number of people in the crowd template image; and determining the number of people corresponding to the target crowd template image according to the corresponding relation.
In one embodiment, when determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image, the processor 401 may perform: inputting the crowd image to be detected into a people number detection model to obtain a reference number of people corresponding to the crowd image to be detected; and determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image and the reference number of people.
In one embodiment, when determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image and the reference number of people, the processor 401 may perform: acquiring the average value of the number of people corresponding to the target crowd template image and the reference number of people, and taking the average value as the number of people in the crowd image to be detected.
In one embodiment, when determining the number of people in the to-be-detected people image according to the number of people corresponding to the target people template image and the reference number, the processor 401 may perform: and carrying out weighted summation processing on the number of people corresponding to the target crowd template image and the reference number of people according to a preset strategy to obtain a weighted sum value, and taking the weighted sum value as the number of people in the crowd image to be detected.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program runs on a processor, the computer executes the people number detection method based on the crowd image template according to any of the above embodiments.
It should be noted that, all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, which may include, but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.
In addition, the terms "first", "second", and "third", etc. in this application are used to distinguish different objects, and are not used to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules recited, but rather, some embodiments include additional steps or modules not recited, or inherent to such process, method, article, or apparatus.
The people number detection method based on the crowd image template provided by the embodiment of the application is described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A people number detection method based on a crowd image template, characterized by comprising the following steps:
acquiring a crowd image to be detected, for which the number of people needs to be counted;
determining a candidate crowd template image with the highest similarity degree with the image of the crowd to be detected from a plurality of candidate crowd template images;
taking the candidate crowd template image with the highest similarity as a target crowd template image;
and determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
2. The people number detection method based on the crowd image template as claimed in claim 1, wherein the determining the candidate crowd template image with the highest similarity degree with the crowd image to be detected from the plurality of candidate crowd template images comprises:
inputting the crowd image to be detected into a crowd image similarity model to obtain the similarity between the crowd image to be detected and each candidate crowd template image;
and determining the candidate crowd template image with the highest similarity with the crowd image to be detected according to the similarity.
3. The method as claimed in claim 1, further comprising, before determining the candidate crowd template image having the highest similarity with the crowd image to be detected from the plurality of candidate crowd template images:
constructing virtual crowd images with different crowd distribution conditions;
and taking the virtual crowd image as a crowd template image.
4. The method as claimed in claim 3, further comprising, before determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image:
acquiring rendering data corresponding to each constructed virtual crowd image;
determining the number of people in the crowd template image according to rendering data;
establishing a corresponding relation between the crowd template image and the number of people according to the crowd template image and the number of people in the crowd template image;
and determining the number of people corresponding to the target crowd template image according to the corresponding relation.
5. The people number detection method based on the crowd image template as claimed in claim 4, wherein the determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image comprises:
inputting the crowd image to be detected into a people number detection model to obtain a reference number of people corresponding to the crowd image to be detected;
and determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image and the reference number of people.
6. The people number detection method based on the crowd image template as claimed in claim 5, wherein the determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image and the reference number of people comprises:
acquiring the average value of the number of people corresponding to the target crowd template image and the reference number of people, and taking the average value as the number of people in the crowd image to be detected.
7. The people number detection method based on the crowd image template as claimed in claim 5, wherein the determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image and the reference number of people comprises:
and carrying out weighted summation processing on the number of people corresponding to the target crowd template image and the reference number of people according to a preset strategy to obtain a weighted sum value, and taking the weighted sum value as the number of people in the image of the crowd to be detected.
8. A people number detection apparatus based on a crowd image template, characterized by comprising:
an acquisition module, used for acquiring a crowd image to be detected, for which the number of people needs to be counted;
a first determining module, used for determining, from a plurality of candidate crowd template images, a candidate crowd template image with the highest similarity with the crowd image to be detected;
a second determining module, used for taking the candidate crowd template image with the highest similarity as a target crowd template image;
and a third determining module, used for determining the number of people in the crowd image to be detected according to the number of people corresponding to the target crowd template image.
9. A computer-readable storage medium, on which a computer program is stored, which, when run on a processor, causes the computer to carry out the method for detecting a number of people based on a crowd image template according to any one of claims 1 to 7.
10. An electronic device comprising a processor and a memory, said memory storing a computer program, wherein said processor is configured to execute the method for detecting a number of people based on a crowd image template according to any one of claims 1 to 7 by calling said computer program.
CN202210699257.5A 2022-06-20 2022-06-20 People number detection method based on crowd image template Pending CN115205779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210699257.5A CN115205779A (en) 2022-06-20 2022-06-20 People number detection method based on crowd image template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210699257.5A CN115205779A (en) 2022-06-20 2022-06-20 People number detection method based on crowd image template

Publications (1)

Publication Number Publication Date
CN115205779A true CN115205779A (en) 2022-10-18

Family

ID=83576624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210699257.5A Pending CN115205779A (en) 2022-06-20 2022-06-20 People number detection method based on crowd image template

Country Status (1)

Country Link
CN (1) CN115205779A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116563371A (en) * 2023-03-28 2023-08-08 北京纳通医用机器人科技有限公司 Method, device, equipment and storage medium for determining key points

Similar Documents

Publication Publication Date Title
CN109086709B (en) Feature extraction model training method and device and storage medium
TWI773189B (en) Method of detecting object based on artificial intelligence, device, equipment and computer-readable storage medium
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
US8750573B2 (en) Hand gesture detection
US8792722B2 (en) Hand gesture detection
CN110807361B (en) Human body identification method, device, computer equipment and storage medium
CN111178183B (en) Face detection method and related device
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
CN111325271B (en) Image classification method and device
CN106874826A (en) Face key point-tracking method and device
GB2555136A (en) A method for analysing media content
CN107679448A (en) Eyeball action-analysing method, device and storage medium
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN111079739A (en) Multi-scale attention feature detection method
CN110222572A (en) Tracking, device, electronic equipment and storage medium
CN111444826A (en) Video detection method and device, storage medium and computer equipment
CN103105924A (en) Man-machine interaction method and device
CN111028276A (en) Image alignment method and device, storage medium and electronic equipment
CN113869282A (en) Face recognition method, hyper-resolution model training method and related equipment
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
CN111507416A (en) Smoking behavior real-time detection method based on deep learning
CN115205779A (en) People number detection method based on crowd image template
CN111008589A (en) Face key point detection method, medium, device and computing equipment
CN112070035A (en) Target tracking method and device based on video stream and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination