CN111898578B - Crowd density acquisition method and device and electronic equipment

Info

Publication number: CN111898578B
Application number: CN202010797785.5A
Authority: CN
Inventors: 王昌安
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2023-09-19
Anticipated expiration: 2040-08-10
Also published as: CN111898578A

Abstract

The embodiment of the application provides a crowd density acquisition method, a crowd density acquisition device and electronic equipment, and relates to the technical field of image processing, wherein the method comprises the steps of obtaining an initial crowd density map of an image to be detected, dividing the initial crowd density map according to at least two sizes, and obtaining a plurality of subareas for each size; carrying out quantization processing on the number of people in the plurality of subareas to obtain quantization processing results respectively corresponding to at least two sizes of the initial crowd density map; determining crowd density categories of the images to be detected according to quantization processing results corresponding to at least two sizes respectively; determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected. Compared with a prediction model in the prior art, the embodiment of the application improves the specificity capability, and the output crowd density map has higher accuracy.

Description

Crowd density acquisition method and device and electronic equipment

Technical Field

The application relates to the technical field of image processing, in particular to a crowd density acquisition method, a crowd density acquisition device, electronic equipment and a computer program.

Background

Crowd density estimation is a technology for automatically estimating the total number of people in a given image, and plays an important role in the fields of people flow statistics, public safety, monitoring and the like.

The crowd density estimation algorithm at the current stage mainly utilizes a deep learning technology, extracts crowd density features rich in advanced semantic information from images through a convolutional neural network, and then carries out regression on crowd density graphs to obtain an estimation result of crowd density.

In the prior art, the estimated total number of people in an image is generally taken as the crowd density characteristic. The main idea is to first calculate the average distance between each head point and its N nearest neighbors, and then take the mean of these values over all heads in the image as the measure of crowd density in the image. This approach, however, reduces crowd density to a single scalar and therefore cannot accurately reflect the crowd density in the image.

Disclosure of Invention

Embodiments of the present invention provide a crowd density acquisition method, apparatus, electronic device, and computer program that overcome or at least partially solve the above-described problems.

In a first aspect, a method for obtaining crowd density is provided, the method comprising:

obtaining an initial crowd density map of an image to be detected, dividing the initial crowd density map according to at least two sizes, and obtaining a plurality of subareas according to each size;

carrying out quantization processing on the number of people in the plurality of subareas to obtain quantization processing results respectively corresponding to at least two sizes of the initial crowd density map;

determining crowd density categories of the images to be detected according to quantization processing results corresponding to at least two sizes respectively;

determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected.

Further, the quantization processing is performed on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map, including:

for any size of subareas, determining the number of people in the subareas, and carrying out quantization processing on the number of people in the subareas to obtain a quantized value of the number of people in the subareas;

and carrying out normalization processing on the frequency of the quantized value of the number of people in the subarea to obtain a quantized processing result corresponding to any size of the initial crowd density map.

Further, the quantization processing is performed on the number of people in the subarea to obtain a quantized value of the number of people in the subarea, including:

determining a maximum number of people in a single sub-area from the number of people in each sub-area of any size;

and carrying out quantization processing on the number of people in each sub-area of the same size according to the maximum number of people in a single sub-area, to obtain a quantized value of the number of people in each sub-area.

Further, determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes, including:

obtaining crowd density characteristics according to quantization processing results respectively corresponding to at least two sizes;

and inputting the crowd density characteristics into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Further, after inputting the image to be detected into the target prediction model to obtain the target crowd density map of the image to be detected, the method further comprises:

determining at least one prediction model which does not correspond to the crowd density category from a plurality of trained prediction models, and taking the at least one prediction model as an auxiliary prediction model;

inputting the image to be detected into an auxiliary prediction model to obtain an auxiliary crowd density map of the image to be detected;

and obtaining a final crowd density map of the image to be detected according to the target crowd density map and the auxiliary crowd density map.

Further, the training method for the prediction model corresponding to any crowd density category comprises the following steps:

acquiring a plurality of sample images of the crowd density category and a crowd density map of each sample image;

training an initial neural network model by taking the sample images as samples and the crowd density map of each sample image as its sample label until a loss function of the initial neural network model converges, to obtain the prediction model corresponding to the crowd density category;

wherein the dilation rate of the dilated convolution layer of the prediction model is negatively correlated with the crowd density category matched with the prediction model.

In a second aspect, there is provided a crowd density acquisition device, the device comprising:

the dividing module is used for obtaining an initial crowd density image of the image to be detected, dividing the initial crowd density image according to at least two sizes, and obtaining a plurality of subareas according to each size;

the quantization module is used for carrying out quantization processing on the number of people in the plurality of subareas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map respectively;

the category determining module is used for determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

the prediction module is used for determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected.

Further, the quantization module includes:

the subarea quantification submodule is used for determining the number of people in the subarea according to any size of subareas, and quantifying the number of people in the subarea to obtain a quantified value of the number of people in the subarea;

the normalization processing sub-module is used for carrying out normalization processing on the frequency of the quantized value of the number of people in the sub-area to obtain a quantized processing result corresponding to any size of the initial crowd density map.

Further, the sub-region quantization sub-module includes a quantization value acquisition unit for performing quantization processing on the number of people in the sub-region to obtain a quantization value of the number of people in the sub-region, and the quantization value acquisition unit includes:

a maximum number acquisition subunit configured to determine a maximum number of persons in a single sub-area from the numbers of persons in the respective sub-areas of any one size;

the number quantization subunit is used for carrying out quantization processing on the number of people in each sub-area with the same size according to the maximum number of people in each sub-area, and obtaining a number quantization value in each sub-area.

Further, the category determination module includes:

the density characteristic obtaining submodule is used for obtaining crowd density characteristics according to quantization processing results which correspond to at least two sizes respectively;

the clustering sub-module is used for inputting the crowd density characteristics into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Further, the crowd density obtaining device further includes:

the auxiliary model acquisition module is used for determining at least one prediction model which does not correspond to the crowd density category from a plurality of trained prediction models and taking the at least one prediction model as an auxiliary prediction model;

the auxiliary density map acquisition module is used for acquiring an auxiliary crowd density map of the image to be detected according to the input of the image to be detected to the auxiliary prediction model;

the final density map acquisition module is used for acquiring a final crowd density map of the image to be detected according to the target crowd density map and the auxiliary crowd density map.

Further, the crowd density obtaining device further includes a training module for training a prediction model corresponding to any crowd density category, and the training module includes:

the sample acquisition sub-module is used for acquiring a plurality of sample images of any crowd density category and a crowd density map of each sample image;

the training sub-module is used for training the initial neural network model by taking the sample images as samples and the crowd density map of each sample image as its sample label until the loss function of the initial neural network model converges, to obtain the prediction model corresponding to the crowd density category.

In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when the program is executed.

In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.

In a fifth aspect, embodiments of the present invention provide a computer program comprising computer instructions stored in a computer readable storage medium, which when read from the computer readable storage medium by a processor of a computer device, cause the computer device to perform the steps of the method as provided in the first aspect.

According to the crowd density acquisition method, device, electronic equipment and computer program provided by the embodiments of the invention, an initial crowd density map of the image to be detected is acquired and divided according to at least two sizes, so that a plurality of subareas is obtained for each size. The number of people in the subareas is quantized to obtain quantization processing results respectively corresponding to the at least two sizes of the initial crowd density map. Taken together, these quantization processing results usually form a feature vector of several hundred dimensions, so the crowd distribution in the image can be characterized in fine detail. The crowd density category of the image to be detected is then obtained from the quantization processing results, and a prediction model matched with that category is selected to predict the crowd density map of the image to be detected. Compared with the prediction model in the prior art, the selected model is more specialized, and the output crowd density map has higher accuracy.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.

FIG. 1 illustrates a schematic diagram of a hardware environment comprised of an image acquisition terminal and an electronic device;

FIG. 2 is a flow chart of one implementation of a method for obtaining a crowd density map as disclosed in an embodiment of the application;

FIG. 3 is a schematic diagram of a network structure of a crowd density map prediction model according to an embodiment of the application;

FIG. 4 is a flowchart of a method for obtaining a crowd density map according to another embodiment of the application;

FIG. 5 is a schematic structural diagram of a crowd density map obtaining device according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combination of one or more of the associated listed items.

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.

The application provides a crowd density acquisition method, a crowd density acquisition device, electronic equipment and a computer program, and aims to solve the technical problems in the prior art.

The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

The crowd density map acquisition device may include a crowd density map acquisition device operating in an electronic device and a crowd density map acquisition device operating in a background server/platform.

The electronic device may be, but is not limited to, an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The electronic device and the background server/platform may be directly or indirectly connected by wired or wireless communication, and the application is not limited herein. In one example, the crowd density map acquisition device operating in the electronic device may be a client operating in a smartphone. The client may be an application client or a web client.

The crowd density map acquisition device running in the background server/platform can be a hardware component part of the server/platform, and can also be a functional module or a component.

The background server or the platform can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service.

Cloud computing is a computing model that distributes computing tasks across a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's perspective, resources in the cloud appear infinitely expandable: they can be acquired at any time, used on demand, expanded at any time, and paid for according to use.

A basic capability provider of cloud computing establishes a cloud computing resource pool (a cloud platform for short, generally referred to as an IaaS (Infrastructure as a Service) platform), in which multiple types of virtual resources are deployed for external clients to select and use.

According to the logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer can be deployed above the PaaS layer, or the SaaS layer can be deployed directly on the IaaS layer. PaaS is a platform on which software runs, such as a database or a web container. SaaS is a wide variety of business software, such as web portals and bulk SMS senders. Generally, SaaS and PaaS are upper layers relative to IaaS.

In an alternative embodiment, the crowd density map acquisition method described above may be applied to the hardware environment formed by the image acquisition terminal 11 and the electronic device 12 shown in FIG. 1. As shown in FIG. 1, the image acquisition terminal 11 may be connected with the electronic device 12 wirelessly. The wireless connection includes, but is not limited to: a wide area network, a metropolitan area network, a local area network, Bluetooth, or NFC (Near Field Communication). The image acquisition terminal 11 may also be connected to the electronic device 12 by a wired connection.

The image acquisition terminal 11 may be used to monitor a crowd in a crowd activity scene and to acquire crowd images 111. The image acquisition terminal 11 may also send the acquired crowd image 111 to the electronic device 12.

The electronic device 12 may acquire the crowd density map 121 corresponding to the crowd image 111 based on the obtained crowd image 111 by using the crowd density map acquisition method provided by the present application. The electronic device 12 may predict potential safety hazards, such as trampling accidents, that may occur based on the crowd density map 121.

Alternatively, the electronic device 12 may count the number of people in the currently acquired crowd image 111 based on the crowd density map 121. For example, if each black dot in the crowd density map 121 represents one person, the crowd density map 121 shown in FIG. 1 contains 10 people.

Of course, the foregoing merely illustrates an alternative way of displaying the crowd density map 121, and other ways of displaying the crowd density map 121 are also possible, for example, a black dot in the crowd density map 121 may represent 10 people.

Referring next to fig. 2, fig. 2 is a flowchart of an implementation manner of the crowd density map obtaining method disclosed in the embodiment of the application. As shown in fig. 2, the method comprises the steps of:

S101, obtaining an initial crowd density map of an image to be detected, dividing the initial crowd density map according to at least two sizes, and obtaining a plurality of subareas according to each size.

Specifically, inputting an image to be detected into an initial prediction model, and obtaining an initial crowd density map of the image to be detected output by the initial prediction model; the initial prediction model is obtained by taking a plurality of sample images with at least two crowd density categories as samples and taking a crowd density map of each sample image as a sample label for training.

The initial prediction model in the embodiment of the application can be an existing single deep convolutional neural network. The crowd density map predicted by this network is used as the initial crowd density map, which predicts the number of people at the position in the actual scene corresponding to each pixel of the image to be detected. No particular accuracy is required of the initial crowd density map, so the number of people it predicts can be understood as a rough estimate.

The crowd density map generated by the initial prediction model in the embodiment of the application follows the principle below:

A head center point $x_i$ can be represented as a map of the same size as the original image in which only position $x_i$ is 1 and all other positions are 0, denoted by the function $\delta(x - x_i)$. A crowd image with N head annotations can therefore be expressed as the function $H(x)$:

$H(x) = \sum_{i=1}^{N} \delta(x - x_i)$

Integrating this map gives the total number of people. The map is then convolved with a Gaussian kernel $G_\sigma$ to obtain the density map $D = G_\sigma * H(x)$. Since the Gaussian kernel is normalized, integrating the density map $D$ obtained after convolution also gives the total number of people.
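The principle above can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions, not the implementation of the embodiment: the function name `density_map`, the value of `sigma`, the (row, column) point format, and the per-head renormalization on the image grid are all hypothetical choices made here so that the integral of the map equals the number of annotated heads.

```python
import numpy as np

def density_map(head_points, height, width, sigma=4.0):
    """Build D = G_sigma * H for head centers given as (row, col) pairs."""
    rows = np.arange(height)[:, None]   # column vector of row indices
    cols = np.arange(width)[None, :]    # row vector of column indices
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in head_points:
        # One delta at (r, c) convolved with a Gaussian is a Gaussian bump there.
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        # Assumption: renormalize on the image grid so each head integrates to 1,
        # even when the bump is clipped by the image border.
        density += bump / bump.sum()
    return density

heads = [(10, 12), (40, 90), (100, 30)]   # hypothetical head annotations
D = density_map(heads, height=128, width=128)
estimated_count = D.sum()                 # integral of the density map
```

Summing the map plays the role of the integral in the text, so `estimated_count` recovers the number of annotated heads up to floating-point error.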

According to the embodiment of the application, a second sample set containing sample images of at least two crowd density categories can be constructed, and the initial prediction model is trained on the sample images in the second sample set without regard to crowd density category. It is understood that, for an image to be detected of a given crowd density category, the initial prediction model predicts the crowd density map less accurately than a prediction model dedicated to that category; nevertheless, it can provide a reasonably accurate crowd density map across a large number of images to be detected, which meets the needs of practical application.

For the initial crowd density map of the image to be detected, the embodiment of the application can divide the map into at least two grid maps according to different sizes, with all sub-areas within one grid map having the same size. For example, if the size of the initial crowd density map is 128×128, gridding it by 4×4 gives a grid map of 1024 sub-areas of size 4×4, gridding it by 8×8 gives a grid map of 256 sub-areas of size 8×8, gridding it by 16×16 gives a grid map of 64 sub-areas of size 16×16, and gridding it by 32×32 gives a grid map of 16 sub-areas of size 32×32.

After the grid map of each size is obtained, the crowd density values of the sub-areas can be integrated, the number of people in each sub-area is obtained, and the density information of the grid map can be obtained by counting the number of people in all the sub-areas in one grid map.

It should be understood that, since the grid map is only divided into specific sizes on the initial crowd density map, the crowd number in the grid can be obtained by determining the crowd density value recorded in the initial crowd density map by the pixel points in the sub-area and integrating.
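The gridding and per-sub-area integration described above can be sketched as follows. This is a minimal sketch assuming the map size is an exact multiple of each sub-area size, as in the 128×128 example; the function name `grid_counts` and the uniform toy map are hypothetical.

```python
import numpy as np

def grid_counts(density, cell):
    """Sum a density map over non-overlapping cell x cell sub-areas."""
    h, w = density.shape
    if h % cell or w % cell:
        raise ValueError("map size must be a multiple of the sub-area size")
    # Axes 1 and 3 index the pixels inside each sub-area; summing them
    # integrates the density over that sub-area.
    return density.reshape(h // cell, cell, w // cell, cell).sum(axis=(1, 3))

# Toy map: 50 people spread uniformly over a 128x128 density map.
density = np.full((128, 128), 50.0 / (128 * 128))
counts_by_size = {cell: grid_counts(density, cell) for cell in (4, 8, 16, 32)}
```

Each entry of `counts_by_size` is one grid map; its elements are the numbers of people in the sub-areas of that size, and every grid map sums to the same total.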

S102, carrying out quantization processing on the number of people in the plurality of subareas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map.

It should be noted that the existing approach commonly takes the total number of people in an image as the crowd density feature. Its main idea is to first calculate the average distance between each head point and its N nearest neighbors, and then take the mean over all heads in the image as the measure of crowd density in the image. Analysis shows, however, that this approach actually quantifies the crowd density category as a single scalar and cannot accurately reflect the crowd density category of the image.

In order to overcome the problems in the prior art, the embodiment of the application divides the initial crowd density map according to at least two sizes to obtain the subareas corresponding to each size, where each subarea is a grid cell, and performs quantization processing on the number of people in the subareas of the same size to obtain the quantization processing result of the initial crowd density map for that size. Because subareas of different sizes contain different numbers of pixels, the number of people they contain differs; the number of subareas in the grid maps divided according to different sizes also differs. Each grid map therefore yields its own distinct density information. The crowd density features formed in this way are typically feature vectors of several hundred dimensions and can characterize the crowd distribution in an image in fine detail.
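One possible reading of the quantization step is sketched below: the number of people in each sub-area is scaled by the largest count found among sub-areas of the same size, mapped to an integer level, and the frequencies of the levels are normalized. The number of levels (64), the floor-based level mapping, and the concatenation across sizes are assumptions made for illustration; the text does not fix them.

```python
import numpy as np

def grid_counts(density, cell):
    """Sum a density map over non-overlapping cell x cell sub-areas."""
    h, w = density.shape
    return density.reshape(h // cell, cell, w // cell, cell).sum(axis=(1, 3))

def quantized_histogram(counts, levels=64):
    """Quantize sub-area counts against the largest count, return normalized frequencies."""
    flat = counts.ravel()
    max_count = flat.max()
    if max_count <= 0:
        q = np.zeros(flat.size, dtype=np.int64)   # empty scene: everything in level 0
    else:
        # Assumed mapping: scale by the maximum and floor to an integer level.
        q = np.floor(flat / max_count * (levels - 1)).astype(np.int64)
        q = np.clip(q, 0, levels - 1)
    hist = np.bincount(q, minlength=levels).astype(np.float64)
    return hist / flat.size                        # normalized frequency of each level

def density_feature(density, cells=(4, 8, 16, 32), levels=64):
    """Concatenate the per-size histograms into one crowd density feature vector."""
    return np.concatenate(
        [quantized_histogram(grid_counts(density, c), levels) for c in cells]
    )

rng = np.random.default_rng(0)
density = rng.random((128, 128)) * 0.01            # toy initial crowd density map
feature = density_feature(density)
```

With four sizes and 64 levels the feature has 256 dimensions, which is in line with the "several hundred dimensions" mentioned in the text.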

S103, determining crowd density categories of the images to be detected according to quantization processing results corresponding to at least two sizes respectively.

The crowd density category indicates the level reached by the crowd density in the image to be detected, and is the result of a qualitative analysis of the crowd density in the image to be detected.

According to the embodiment of the application, the crowd density characteristics of the image to be detected can be obtained according to the quantization processing results respectively corresponding to at least two sizes, the crowd density characteristics of a certain number of sample images are combined for clustering, and the crowd density category of the image to be detected is determined according to the clustering result of the image to be detected and the crowd density category of the sample images which are clustered into the same category with the image to be detected.

Specifically, the crowd density characteristics of the image to be detected are input into a pre-constructed crowd density classification model to obtain the crowd density category of the image to be detected output by the crowd density classification model.

The method for constructing the crowd density classification model comprises the following steps:

collecting a plurality of sample images and obtaining crowd density characteristics of the plurality of sample images;

based on crowd density characteristics of a plurality of sample images, a crowd density classification model is constructed by adopting a clustering algorithm.

After the crowd density features of the image to be detected are obtained, the crowd density categories of the image to be detected can be obtained by inputting the crowd density features into a pre-trained clustering model.

The clustering model may be constructed according to sample data, where the sample data includes crowd density features corresponding to different crowd density categories, such as crowd density features corresponding to sparsity, crowd density features corresponding to moderate, and crowd density features corresponding to dense, and so on.

The clustering algorithm adopted by the clustering model is not particularly limited and can be, for example, the K-Means clustering algorithm, the mean shift clustering algorithm, the density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise), expectation-maximization (EM) clustering using a Gaussian mixture model (GMM), an agglomerative hierarchical clustering algorithm, and the like.

The process of constructing a cluster model according to the present application will be described below by taking the K-Means algorithm as an example:

1) First, determine the crowd density categories and their number, and randomly initialize a center point for each category, where each center point is a vector of the same length as each data point (crowd density feature);

2) Calculate the distance from each data point to every center point, and assign the data point to the category whose center point is nearest;

3) Recalculate the center point of each category and take it as the new center point;

4) Repeat the above steps until the center of each category changes little between iterations. Alternatively, the center points may be randomly initialized several times, and the run with the best result selected.
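The four steps above can be sketched in NumPy as follows. This is a minimal sketch rather than the construction used in the embodiment: initializing the centers from randomly chosen data points, the Euclidean distance, the tolerance `tol`, and the helper `predict_category` are assumptions made for illustration, and the toy features stand in for real crowd density features.

```python
import numpy as np

def kmeans(features, k, n_iter=100, tol=1e-6, seed=0):
    """Plain K-Means over crowd density feature vectors (one row per image)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the centers with k distinct, randomly chosen data points.
    idx = rng.choice(len(features), size=k, replace=False)
    centers = features[idx].astype(np.float64)
    for _ in range(n_iter):
        # Step 2: assign each data point to the nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once the centers barely move.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Final assignment against the final centers.
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

def predict_category(feature, centers):
    """Return the index of the nearest center for one feature vector."""
    return int(np.linalg.norm(centers - feature, axis=1).argmin())

rng = np.random.default_rng(1)
sparse_like = rng.normal(0.0, 0.05, size=(20, 8))   # toy features, first group
dense_like = rng.normal(1.0, 0.05, size=(20, 8))    # toy features, second group
X = np.vstack([sparse_like, dense_like])
centers, labels = kmeans(X, k=2)
category = predict_category(X[0], centers)
```

Once the centers are fixed, determining the crowd density category of a new image reduces to the nearest-center lookup in `predict_category`.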

S104, determining a target prediction model corresponding to the crowd density category from the plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected.

Because the crowd distribution density in actual scenes varies greatly, when a single deep convolutional network is trained on sample images of scenes with widely differing distributions, it has to balance its performance across all scenes, so its performance at any single scale or density is often not ideal, which limits the accuracy of overall crowd density estimation. That is, when the prior art predicts crowd density maps for a large number of images to be detected, the images are input directly into a single deep convolutional network, which leads to low crowd density estimation accuracy. The embodiment of the application therefore first determines the crowd density category of the image to be detected, which lays a foundation for the subsequent selection of a prediction model matched with that category, so that a crowd density map with higher accuracy can be output when the matched prediction model is used for prediction.

It should be understood that, once crowd density is described qualitatively, a large number of sample images can be collected for each crowd density category in a targeted manner, and a prediction model can then be trained on the sample images of each crowd density category. Thus, for any image to be detected, once its crowd density category is determined, the matching prediction model can be found among the plurality of prediction models trained in advance.

According to the embodiment of the application, before the prediction model matched with the crowd density category of the image to be detected is obtained, at least two prediction models corresponding to different crowd density categories are required to be trained. After the crowd density type of the image to be detected is determined, a prediction model matched with the crowd density type of the image to be detected is obtained and used as a target prediction model.

The target prediction model is obtained by taking the sample images in a first sample set as samples (the crowd density category of the sample images in the first sample set is the same as that of the image to be detected), taking the crowd density maps of those sample images as sample labels, and training an initial neural network model until its loss function converges. That is, the prediction model matching the crowd density category of the image to be detected in step S104 is a prediction model whose training samples have the same crowd density category as the image to be detected. Because the target prediction model is trained only on sample images of the same crowd density category as the image to be detected, it is used to predict the crowd density maps of images with the same degree of crowding. This makes the model more specialized and improves its counting precision on images of a given crowd density category.

In the embodiment of the application, an initial crowd density map of the image to be detected is obtained and divided according to at least two sizes, with a plurality of sub-areas obtained for each size. The number of people in the sub-areas is quantized to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map. These results are usually feature vectors of hundreds of dimensions and can finely characterize the crowd distribution in the image. The crowd density category of the image to be detected is then determined from the quantization processing results corresponding to the at least two sizes, and a prediction model matched with that category is selected to predict the crowd density map of the image. Compared with the prediction model in the prior art, specialization is improved and the output crowd density map has higher accuracy.

By applying the embodiment of the application to the scenario of obtaining crowd density maps for massive images to be detected, collecting sample images of different crowd density categories in advance, and training a prediction model for each crowd density category, the problems of large-scale crowd density estimation and of varying crowd density distribution can be addressed.

Based on the above embodiments, the embodiments of the present application may also count the number of people based on the crowd density map output by the target prediction model.

The embodiment of the application adopts a crowd density map to predict potential safety hazards, and can accurately obtain the number of people contained in the image to be detected.
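A minimal sketch of counting people from a density map follows. It assumes the usual convention that a crowd density map integrates to the head count, so the count is the sum of all pixel values; the source does not spell out this step, and the function name `count_people` is hypothetical.

```python
def count_people(density_map):
    """Estimate the head count as the sum of all values in a 2D density map.

    Assumption: the density map is normalized so that it sums to the number
    of people in the image.
    """
    return sum(sum(row) for row in density_map)
```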

Based on the embodiments, the embodiment of the application can also obtain a heat map based on the finally obtained crowd density map, and the color of each pixel point in the heat map can be more striking along with the increase of the number of people in the actual scene corresponding to the pixel point. The density distribution of the crowd is visualized through the heat map, so that the judgment of potential safety hazard events is facilitated.

On the basis of the above embodiments, after the crowd density category of the image to be detected output by the crowd density classification model is obtained, the method further includes: judging whether the crowd density of the image to be detected is abnormal according to the crowd density category of the image to be detected.

As can be seen from the foregoing embodiments, the crowd density category in the embodiments of the present application is a qualitative description of crowd density, for example, the crowd density category may be sparse, moderate, crowded, etc., so after determining the crowd density category of the image to be measured, whether the crowd density is abnormal may be directly determined according to the crowd density category, for example, when the crowd density category is crowded, the crowd density category is determined to be abnormal, and when the crowd density category is sparse or moderate, the crowd density category is determined to be normal.

On the basis of the above embodiments, as an optional embodiment, performing quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results respectively corresponding to at least two sizes of the initial crowd density map includes:

S1021, for sub-areas of any size, determining the number of people in each sub-area, and carrying out quantization processing on the number of people in the sub-area to obtain a quantized value of the number of people in the sub-area.

S1022, carrying out normalization processing on the frequency of the quantized values of the number of people in the sub-areas to obtain the quantization processing result corresponding to that size of the initial crowd density map.

In the embodiment of the application, for sub-areas of any size, the number of people in a sub-area is discrete rather than continuous, and there are many sub-areas, so the counted numbers of people in the individual sub-areas are widely scattered. The number of people in each sub-area is therefore quantized.
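Determining the number of people in each sub-area can be sketched as below. The sketch assumes that the number of people in a sub-area is the sum of the density values inside it and that a "size" is given as the number of sub-areas per side; both are assumptions for illustration, and `subarea_counts` is a hypothetical name.

```python
def subarea_counts(density_map, grid):
    """Split a 2D density map into grid x grid sub-areas and sum each one.

    Returns the per-sub-area sums in row-major order.
    """
    height, width = len(density_map), len(density_map[0])
    counts = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            counts.append(sum(density_map[y][x]
                              for y in range(y0, y1) for x in range(x0, x1)))
    return counts
```

Calling the function once per size (for example with `grid` equal to 2 and then 4) gives one list of sub-area counts per size.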

Because the quantized values of the number of people in different sub-areas may be the same, the quantized values themselves are not used directly as the index of crowd density in the image. Instead, the frequency of each quantized value is counted first and then normalized, giving the quantization processing result corresponding to each size of the initial crowd density map. The frequency of a quantized value is the number of times that value appears in a grid. For example, if a grid has 16 sub-areas and two of them have a quantized value of 3, the frequency of the quantized value 3 is 2. Counting the frequencies of the different quantized values over the 16 sub-areas and then normalizing them gives a processing result that, when used as model input, can improve the precision and the convergence rate of the model.
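The frequency counting and normalization can be sketched as follows. The source does not state how the frequencies are normalized; dividing by the number of sub-areas, and fixing the number of quantization levels in advance, are assumptions made for this example.

```python
from collections import Counter


def normalized_frequencies(quantized_values, levels):
    """Return, for each level 0..levels-1, the share of sub-areas at that level.

    Assumption: normalization divides each frequency by the number of sub-areas.
    """
    counts = Counter(quantized_values)
    total = len(quantized_values)
    return [counts.get(level, 0) / total for level in range(levels)]
```

With 16 sub-areas of which two have the quantized value 3, the entry for level 3 is 2 / 16 = 0.125.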

On the basis of the above embodiments, as an optional embodiment, the quantization processing is performed on the number of people in the sub-area to obtain a quantized value of the number of people in the sub-area, including:

It should be understood that the maximum number of people in a single sub-area in the embodiment of the present application refers to the largest number of people found in any one sub-area of a grid map of a given size.

Optionally, in the embodiment of the present application, the number of people in each sub-area in the grid chart is quantized according to the maximum number of people in the single sub-area, and the number of people in each sub-area in the grid chart is calculated according to the following quantization formula:

$$c = \operatorname{int}\left(\frac{N \cdot \log(p - q)}{\log(C_{Max})}\right)$$

where c represents the quantized value; N represents the quantization level; int represents a rounding operation; p is the number of people in the sub-area; q is a quantization coefficient, a positive number less than the minimum number of people in a single sub-area; and C_Max is the maximum number of people in a single sub-area. The values of N and q can be adjusted according to practical conditions; optionally, N is 25 and q is 2.99.
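The quantization formula can be sketched in code as below. The function name and the input checks are illustrative assumptions, and the rounding operation is taken here to be Python's `int`, which truncates toward zero; the source only says that int is a rounding operation.

```python
import math


def quantize_count(p, c_max, n_levels=25, q=2.99):
    """Quantize a sub-area head count p via c = int(N * log(p - q) / log(C_Max)).

    n_levels is N and c_max is C_Max. The defaults follow the optional values
    N = 25 and q = 2.99 given in the text.
    """
    if p <= q:
        raise ValueError("p must exceed q so that log(p - q) is defined")
    if c_max <= 1:
        raise ValueError("c_max must exceed 1 so that log(c_max) is positive")
    return int(n_levels * math.log(p - q) / math.log(c_max))
```

The check `p > q` reflects the stated condition that q is less than the minimum number of people in a single sub-area.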

On the basis of the foregoing embodiments, as an optional embodiment, determining, according to quantization processing results corresponding to at least two sizes, a crowd density category of an image to be measured includes:

inputting the crowd density features into a pre-constructed clustering model to obtain the crowd density category of the image to be detected output by the clustering model.

Specifically, the quantization processing results of the initial crowd density map divided according to the at least two sizes may be sorted in a preset order, for example from small to large or from large to small, and each quantization processing result is then spliced with the next one in that order to obtain the crowd density feature of the image to be detected. For example, let C8, C16, C32 and C64 denote the quantization processing results of the grid maps obtained by dividing an initial crowd density map by 8 x 8, 16 x 16, 32 x 32 and 64 x 64 respectively, where C8 is (0.15, 0.40, 0.45), C16 is (0.25, 0.35, 0.40), C32 is (0.15, 0.15, 0.3, 0.4) and C64 is (0.2, 0.3, 0.50). The crowd density feature obtained after splicing is then (0.15, 0.40, 0.45, 0.25, 0.35, 0.40, 0.15, 0.15, 0.3, 0.4, 0.2, 0.3, 0.50).
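The splicing step can be illustrated with the example values above. The function name `splice_features` and the use of a dictionary keyed by size are assumptions for this sketch; the order is small to large, one of the preset orders mentioned in the text.

```python
def splice_features(results_by_size):
    """Concatenate per-size quantization results in ascending size order."""
    feature = []
    for size in sorted(results_by_size):
        feature.extend(results_by_size[size])
    return feature


# Example values C8, C16, C32 and C64 taken from the text.
results = {
    8: (0.15, 0.40, 0.45),
    16: (0.25, 0.35, 0.40),
    32: (0.15, 0.15, 0.3, 0.4),
    64: (0.2, 0.3, 0.50),
}
feature = splice_features(results)  # 13-dimensional crowd density feature
```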

On the basis of the above embodiments, as an optional embodiment, after the prediction model set is constructed and the crowd density map of an image to be detected needs to be predicted, the embodiment of the application may additionally select, according to the crowd density category of the image, several prediction models from the prediction model set that do not match that category, predict the crowd density map of the image with these non-matched prediction models, and obtain the final crowd density map by jointly considering the crowd density maps output by the matched and the non-matched prediction models. The advantage of this operation is that it can improve the accuracy of the crowd density map when the crowd density category is determined inaccurately.

On the basis of the above embodiments, as an optional embodiment, after inputting the image to be measured into the target prediction model to obtain the target crowd density map of the image to be measured, the method further includes:

It should be understood that, in the prediction model that does not correspond to the crowd density category of the image to be measured, the training sample is a sample image that is different from the crowd density category of the image to be measured.

Specifically, a weighted average crowd density map of an image to be detected is obtained by carrying out weighted average on a target crowd density map and an auxiliary crowd density map, and the weighted average crowd density map is taken as a final crowd density map of the image to be detected.
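The weighted averaging of the target and auxiliary crowd density maps can be sketched as follows. The source does not give the weights; the default `target_weight` of 0.7 and the equal split of the remaining weight among the auxiliary maps are hypothetical choices for illustration.

```python
def fuse_density_maps(target_map, auxiliary_maps, target_weight=0.7):
    """Weighted average of the target map and one or more auxiliary maps.

    The auxiliary maps share the remaining weight (1 - target_weight) equally.
    All maps are 2D lists of the same shape.
    """
    if not auxiliary_maps:
        return [row[:] for row in target_map]
    aux_weight = (1.0 - target_weight) / len(auxiliary_maps)
    fused = []
    for y, row in enumerate(target_map):
        fused_row = []
        for x, value in enumerate(row):
            aux_sum = sum(aux[y][x] for aux in auxiliary_maps)
            fused_row.append(target_weight * value + aux_weight * aux_sum)
        fused.append(fused_row)
    return fused
```

Because the weights sum to one, the fused map keeps the scale of a density map, so its sum can still be read as a head count.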

The training method for the prediction model corresponding to any crowd density category comprises the following steps:

and training the initial neural network model by taking the sample image as a sample and taking the crowd density map of the sample image as a sample label until the loss function of the initial neural network model converges to obtain a prediction model corresponding to any crowd density type.

Referring to fig. 3, fig. 3 is a schematic diagram of the network structure of a prediction model according to an embodiment of the present application. The prediction model of the embodiment of the present application is a CSRNet network model, which is mainly divided into a front-end network and a back-end network. A VGG-16 network with the fully connected layers removed is adopted as the front-end network of CSRNet. For a receptive field of the same size, smaller convolution kernels with more convolution layers perform better than larger convolution kernels with fewer convolution layers. In order to balance accuracy and resource overhead, the VGG-16 network of the embodiment of the application adopts a combination of 10 convolution layers and 3 pooling layers (max-pooling). Specifically, the VGG-16 network comprises, in order of operation, two first convolution layers, a first pooling layer, two second convolution layers, a second pooling layer, three third convolution layers, a third pooling layer and three fourth convolution layers. The first convolution layer is denoted 3-64-1 in fig. 3, meaning that the convolution kernel size is 3*3, the number of channels is 64 and the dilation rate (void ratio) is 1. Similarly, the second convolution layer has a convolution kernel size of 3*3, 128 channels and a dilation rate of 1; the third convolution layer has a convolution kernel size of 3*3, 256 channels and a dilation rate of 1; and the fourth convolution layer has a convolution kernel size of 3*3, 512 channels and a dilation rate of 1.
The back-end network adopts six dilated (void) convolution layers with the same dilation rate α (α is a positive number), and finally adopts one ordinary 1 x 1 convolution layer to output the result. Dilated convolution layers enlarge the receptive field of the convolution kernel without increasing the amount of computation. Specifically, the back-end network comprises 3 first dilated convolution layers, 1 second dilated convolution layer, 1 third dilated convolution layer, 1 fourth dilated convolution layer and 1 fully connected part, where the first dilated convolution layer has a convolution kernel size of 3*3 and 512 channels, the second dilated convolution layer has a convolution kernel size of 3*3 and 256 channels, the third dilated convolution layer has a convolution kernel size of 3*3 and 128 channels, and the fourth dilated convolution layer has a convolution kernel size of 3*3 and 64 channels.
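The layer layout described above can be written down as a small configuration and checked. This is only a bookkeeping sketch, not a trainable network: the list encoding, the `summarize` helper, the 256 channels of the third block (taken from the standard VGG-16 layout) and the pooling stride of 2 are assumptions.

```python
# Front end: VGG-16 without fully connected layers; "M" marks a max-pooling layer.
FRONT_END = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512]
# Back end: output channels of the six dilated 3x3 convolution layers.
BACK_END = [512, 512, 512, 256, 128, 64]


def summarize(front_end, back_end, pool_stride=2):
    """Count layers and compute the spatial downsampling factor of the network."""
    conv_layers = sum(1 for item in front_end if item != "M")
    pool_layers = sum(1 for item in front_end if item == "M")
    return {
        "front_conv_layers": conv_layers,
        "pool_layers": pool_layers,
        "back_dilated_layers": len(back_end),
        # Each stride-2 pooling layer halves the width and the height.
        "downsampling": pool_stride ** pool_layers,
    }
```

Under these assumptions the output density map is one eighth of the input width and height, since only the three pooling layers reduce resolution.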

Because the CSRNet network model is an end-to-end network, model training can be carried out in the most direct way: the 10 convolution layers in the front-end network come from a VGG-16 that has already been trained, so they only need fine-tuning. The parameters of the other convolution layers are initialized from a Gaussian distribution with a standard deviation of 0.01. During training, the learning rate of stochastic gradient descent is fixed at 1e-6. The Euclidean distance is used to measure the difference between the generated density map and the true value. The loss function L_reg is defined as follows:

where N represents the total number of pixels in the training image, the true-value term is the density true value of the i-th pixel, and z_i is the density prediction value of the i-th pixel. The prediction model is optimized so that the crowd density map it predicts is finally as close as possible to the real crowd density map.
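A pixel-wise Euclidean loss of the kind described can be sketched as follows. The exact formula is not reproduced in the text, so the normalization by the number of pixels N is an assumption, as is the function name `euclidean_loss`.

```python
def euclidean_loss(predicted, ground_truth):
    """Mean squared per-pixel difference between two flattened density maps.

    Assumption: the squared differences are averaged over the N pixels; the
    source formula may use a different constant factor.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("maps must have the same number of pixels")
    n = len(predicted)
    return sum((z - g) ** 2 for z, g in zip(predicted, ground_truth)) / n
```

The loss is zero only when the predicted density equals the true density at every pixel.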

On the basis of the above embodiments, when the prediction model corresponding to any crowd density category is trained, the void ratio (dilation rate) of the void convolution layers of the prediction model is negatively correlated with the crowd density category corresponding to the prediction model. That is, a prediction model trained on sample images of a high crowd density category uses a smaller void ratio α in its void convolution layers, and a prediction model trained on sample images of a low crowd density category uses a larger void ratio α in its void convolution layers.
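The negative correlation can be illustrated with a small lookup table and the standard formula for the span of a dilated kernel. The specific rates 4, 2 and 1 and the category names used as keys are hypothetical values chosen for the example; the source gives no concrete rates.

```python
# Hypothetical mapping: denser categories get smaller dilation (void) rates.
DILATION_BY_CATEGORY = {"sparse": 4, "moderate": 2, "crowded": 1}


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)
```

With a 3*3 kernel, a rate of 1 covers a 3-pixel span while a rate of 4 covers a 9-pixel span, which fits the idea that sparse scenes, where heads tend to appear larger, benefit from a wider receptive field.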

Fig. 4 is a flow chart of a method for obtaining a crowd density map according to another embodiment of the application, as shown in fig. 4, the method includes:

S20, acquiring a second sample set, wherein the second sample set comprises sample images of N crowd density categories, and a sample subset is constructed according to the sample images of each crowd density category, so that N sample subsets are obtained, and N is an integer not less than 2;

S21, respectively training a prediction model corresponding to each crowd density category according to the N sample subsets in the second sample set to obtain N prediction models;

S22, constructing a prediction model set, wherein the prediction model set comprises the N prediction models;

S23, determining the crowd density category of the image to be detected;

S24, according to the crowd density category of the image to be detected, obtaining a prediction model matched with the crowd density category of the image to be detected from the prediction model set, and taking the prediction model as a target prediction model;

S25, inputting the image to be detected into the target prediction model, and obtaining a target crowd density map of the image to be detected output by the target prediction model.
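Steps S21 to S25 amount to building a lookup from crowd density category to prediction model and dispatching on the category of the image. The sketch below shows that control flow only; `train_fn` and `classify_fn` are hypothetical callables standing in for the training procedure and the category determination described in the other embodiments.

```python
def build_model_set(sample_subsets, train_fn):
    """S21-S22: train one prediction model per crowd density category.

    sample_subsets maps a category name to its sample subset; train_fn is a
    caller-supplied function that returns a trained model (a callable).
    """
    return {category: train_fn(samples)
            for category, samples in sample_subsets.items()}


def predict_density_map(image, model_set, classify_fn):
    """S23-S25: classify the image, pick the matching model, and run it."""
    category = classify_fn(image)       # S23: determine the crowd density category
    target_model = model_set[category]  # S24: select the matching prediction model
    return target_model(image)          # S25: predict the target crowd density map
```

Keeping the model set as a dictionary makes adding a further crowd density category a matter of training one more model and adding one more entry.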

Fig. 5 is a schematic structural diagram of a crowd density acquisition device according to an embodiment of the application. As shown in fig. 5, the crowd density acquisition device includes a dividing module 101, a quantization module 102, a category determination module 103 and a prediction module 104. Specifically:

the dividing module 101 is configured to obtain an initial crowd density map of an image to be detected, divide the initial crowd density map according to at least two sizes, and obtain a plurality of sub-areas for each size;

the quantization module 102 is configured to perform quantization processing on the number of people in the plurality of sub-areas, so as to obtain quantization processing results respectively corresponding to the at least two sizes of the initial crowd density map;

the category determination module 103 is configured to determine the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

the prediction module 104 is configured to determine a target prediction model corresponding to the crowd density category from the plurality of trained prediction models, and input the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected.

The embodiment of the present invention provides a crowd density acquisition device that executes the flow of the method embodiments above; for details, refer to the embodiments of the crowd density acquisition method, which are not repeated here. The crowd density acquisition device provided by the embodiment of the invention obtains an initial crowd density map of the image to be detected and divides it according to at least two sizes, obtaining a plurality of sub-areas for each size. It quantizes the number of people in the sub-areas to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map; these results are usually feature vectors of hundreds of dimensions, so they can characterize the crowd distribution in the image in a refined manner. It then determines the crowd density category of the image to be detected from the quantization processing results corresponding to the at least two sizes, and selects a prediction model matched with that category to predict the crowd density map of the image. Compared with the prediction model in the prior art, specialization is improved and the output crowd density map has higher accuracy.

On the basis of the above embodiments, as an alternative embodiment, the quantization module includes:

On the basis of the above embodiments, as an optional embodiment, the sub-region quantization submodule includes a quantization value acquisition unit for performing quantization processing on the number of people in the sub-region to obtain a quantization value of the number of people in the sub-region, the quantization value acquisition unit including:

On the basis of the above embodiments, as an alternative embodiment, the category determining module includes:

On the basis of the above embodiments, as an optional embodiment, the crowd density obtaining device further includes:

the auxiliary density map acquisition module is used for inputting the image to be detected into the auxiliary prediction model to obtain an auxiliary crowd density map of the image to be detected;

On the basis of the foregoing embodiments, as an optional embodiment, the crowd density obtaining device further includes a training module for training a prediction model corresponding to any crowd density category, where the training module includes:

The embodiment of the application provides an electronic device, which comprises a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program performs the following: an initial crowd density map of the image to be detected is obtained and divided according to at least two sizes, with a plurality of sub-areas obtained for each size. The number of people in the sub-areas is quantized to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map; these results are usually feature vectors of hundreds of dimensions, so they can characterize the crowd distribution in the image in a refined manner. The crowd density category of the image to be detected is then determined from the quantization processing results corresponding to the at least two sizes, and a prediction model matched with that category is selected to predict the crowd density map of the image. Compared with the prediction model in the prior art, specialization is improved and the output crowd density map has higher accuracy.

In an alternative embodiment, an electronic device is provided. As shown in fig. 6, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiment of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.

Bus 4002 may include a path to transfer information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.

Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used for storing application program codes for executing the inventive arrangements, and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute application program codes stored in the memory 4003 to realize what is shown in the foregoing method embodiment.

Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, an initial crowd density map of the image to be detected is obtained and divided according to at least two sizes, with a plurality of sub-areas obtained for each size. The number of people in the sub-areas is quantized to obtain quantization processing results corresponding to the at least two sizes of the initial crowd density map; these results are usually feature vectors of hundreds of dimensions, so they can characterize the crowd distribution in the image in a refined manner. The crowd density category of the image to be detected is then determined from the quantization processing results corresponding to the at least two sizes, and a prediction model matched with that category is selected to predict the crowd density map of the image. Compared with the prediction model in the prior art, specialization is improved and the output crowd density map has higher accuracy.

Embodiments of the present application provide a computer program comprising computer instructions stored in a computer-readable storage medium, which, when read from the computer-readable storage medium and executed by a processor of a computer device, cause the computer device to perform the method shown in the foregoing method embodiments.

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

The foregoing is only a partial embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations should and are intended to be comprehended within the scope of the present invention.

Claims

1. The crowd density acquisition method is characterized by comprising the following steps:

obtaining an initial crowd density map of an image to be detected, dividing the initial crowd density map according to at least two sizes, and obtaining a plurality of subareas for each size;

determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes;

determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected;

the determining the crowd density category of the image to be detected according to the quantization processing results respectively corresponding to the at least two sizes includes:

obtaining crowd density characteristics according to the quantization processing results respectively corresponding to the at least two sizes;

and inputting the crowd density features into a pre-constructed clustering model to obtain the crowd density category of the image to be detected, which is output by the clustering model.

2. The method of claim 1, wherein the performing quantization processing on the number of people in the plurality of sub-areas to obtain quantization processing results corresponding to at least two sizes of the initial crowd density map respectively includes:

for any size subarea, determining the number of people in the subarea, and carrying out quantization processing on the number of people in the subarea to obtain a quantized value of the number of people in the subarea;

3. The method of claim 2, wherein the quantifying the number of people in the sub-area to obtain the quantized value of the number of people in the sub-area comprises:

determining a maximum number of people in a single sub-area from the number of people in each sub-area of any one size;

and carrying out quantization processing on the number of people in each sub-area with the same size according to the maximum number of people in the single sub-area, and obtaining a quantized value of the number of people in each sub-area.

4. The crowd density acquisition method according to claim 1, wherein after the inputting the image to be measured to the target prediction model to obtain a target crowd density map of the image to be measured, the method further comprises:

determining at least one prediction model which does not correspond to the crowd density category from a plurality of trained prediction models and taking the at least one prediction model as an auxiliary prediction model;

inputting the image to be detected into the auxiliary prediction model to obtain an auxiliary crowd density map of the image to be detected;

5. The crowd density acquisition method according to claim 1 or 4, characterized in that the training method for the prediction model corresponding to any crowd density category comprises:

training an initial neural network model by taking the sample image as a sample and taking a crowd density map of the sample image as a sample label until a loss function of the initial neural network model converges to obtain a prediction model corresponding to any crowd density type;

and the size of the void ratio of the void convolution layer of the prediction model is in negative correlation with the crowd density category matched with the prediction model.

6. An apparatus for obtaining crowd density, comprising:

the prediction module is used for determining a target prediction model corresponding to the crowd density category from a plurality of trained prediction models, and inputting the image to be detected into the target prediction model to obtain a target crowd density map of the image to be detected;

wherein the category determination module comprises:

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the crowd density acquisition method according to any one of claims 1 to 5 when the program is executed by the processor.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the crowd density acquisition method according to any one of claims 1 to 5.