CN113822111B

CN113822111B - Crowd detection model training method and device and crowd counting method and device

Info

Publication number: CN113822111B
Application number: CN202110067279.5A
Authority: CN
Inventors: 谷爱国
Original assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Current assignee: Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date: 2021-01-19
Filing date: 2021-01-19
Publication date: 2024-05-24
Anticipated expiration: 2041-01-19
Also published as: CN113822111A

Abstract

The application discloses a crowd detection model training method and device and a crowd counting method and device. The model training method comprises the following steps: acquiring a sample data set, in which the head and the five sense organs of each person in the sample pictures are marked with detection frames; and training a pre-constructed crowd detection model with the sample data set to obtain a target crowd detection model. The training comprises: detecting the heads of persons and their five sense organs in the sample picture to obtain head candidate detection frames and five-sense-organ candidate detection frames; generating an attention feature vector for each corresponding head with a heuristic attention weighting network, based on the head candidate detection frame and the five-sense-organ candidate detection frames; identifying, based on the attention feature vector, the authenticity of the corresponding head with a classification network; and adjusting the parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture. The application can improve crowd counting accuracy.

Description

Crowd detection model training method and device and crowd counting method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a crowd detection model training method and apparatus, and a crowd counting method and apparatus.

Background

Crowd counting is an important computer vision technique in security applications. In the intelligent security field, an unmanned inspection vehicle can use crowd counting to assess whether a crowd is gathering, issue an early warning, and help prevent abnormal behavior.

Head detection is a common crowd counting approach: it counts people by identifying the heads in a crowd.

Disclosure of Invention

In view of this, the main objective of the present invention is to provide a crowd detection model training method and apparatus, and a crowd counting method and apparatus, which can improve crowd counting accuracy.

In order to achieve the above purpose, the technical solution provided by the embodiment of the present invention is as follows:

a crowd detection model training method, the method comprising:

Acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;

Training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises:

Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;

generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame;

Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.

In one embodiment, detecting the head of the person and the five sense organs in the sample picture to obtain the head candidate detection frame and the five-sense-organ candidate detection frames includes:

Detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame;

obtaining a sub-image of the corresponding head based on the head candidate detection frame;

and detecting each of the five sense organs in the sub-image by using a five-sense-organ detection model to obtain the five-sense-organ candidate detection frames.

In one embodiment, generating the attention feature vector of the corresponding head includes:

extracting a corresponding head subarea feature matrix based on a first head candidate detection frame by using a first region of interest extraction layer of the heuristic attention weighting network;

Performing global average pooling on the head sub-region feature matrix by using a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector;

Extracting, by using a second region-of-interest extraction layer of the heuristic attention weighting network, a corresponding five-sense-organ sub-region feature matrix based on each five-sense-organ candidate detection frame within the first head candidate detection frame;

Average-pooling each five-sense-organ sub-region feature matrix by using a second global pooling layer of the heuristic attention weighting network to obtain the average feature vector of the corresponding sense organ;

calculating an attention weight vector of each of the five sense organs in the corresponding head based on the head average feature vector and the average feature vector of the corresponding five sense organs;

Point-multiplying (element-wise) the head average feature vector by the attention weight vector of each of the five sense organs, and summing the results to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.

In one embodiment, said calculating the attention weight vector for each of said five sense organs in the respective head comprises:

If a sense organ has an average feature vector, point-multiplying (element-wise) that average feature vector by the head average feature vector to obtain the corresponding attention weight vector;

if a sense organ has no average feature vector, the corresponding attention weight vector is zero.

In one embodiment, the adjusting parameters of the crowd detection model includes:

Parameters in the head detection model, the five sense organs detection model, the heuristic attention weighting network, and the classification network are adjusted.

In one embodiment, the five sense organs include:

Left eye, right eye, left ear, right ear and mouth.

A crowd counting method, comprising:

acquiring a target detection picture;

detecting the head of a person in the target detection picture based on a crowd detection model, and counting the detected head to obtain the number of people in the target detection picture;

wherein only heads whose confidence is greater than a preset threshold are counted, and the crowd detection model is trained in advance by any of the crowd detection model training methods described above.

A crowd detection model training device, comprising:

the sample data acquisition module is used for acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;

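the model training module is used for training a pre-constructed crowd detection model with the sample data set to obtain a target crowd detection model, wherein the training comprises detecting the heads and five sense organs in the sample pictures, generating attention feature vectors with the heuristic attention weighting network, identifying the authenticity of each head with the classification network, and adjusting the model parameters, as in the training method described above.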

A crowd counting apparatus, comprising:

The detection target acquisition module is used for acquiring a target detection picture;

the head detection module is used for detecting the heads in the target detection picture based on the crowd detection model and counting the detected heads to obtain the number of people in the target detection picture, wherein only heads whose confidence is greater than a preset threshold are counted, and the crowd detection model is trained by any of the crowd detection model training methods described above.

A crowd detection model training device comprising a processor and a memory;

The memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 6.

A computer readable storage medium having stored therein computer readable instructions for performing a crowd detection model training method as described above.

An embodiment of the invention also provides crowd counting equipment comprising a processor and a memory;

the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method as described above.

Embodiments of the present invention also provide a computer readable storage medium having stored therein computer readable instructions for performing the crowd counting method as described above.

As can be seen from the above technical solutions, in the model training method and apparatus and the crowd counting method and apparatus provided by the embodiments of the present invention, five-sense-organ detection is introduced on top of head detection while the crowd detection model is trained with the sample pictures, and the head detection and five-sense-organ detection results are jointly processed by a heuristic attention weighting mechanism to generate an attention feature vector for each detected head. Applying the heuristic attention weighting mechanism to the five-sense-organ detection results enlarges the difference between human heads and similar-looking objects, which makes the attention feature vectors fed to the classification network more accurate. Objects falsely detected as human heads can then be screened out of the head detection results, improving the detection accuracy of the crowd detection model and, correspondingly, the accuracy of crowd counting performed with it.

Drawings

FIG. 1 is a schematic flow chart of a method according to a first embodiment of the invention;

FIG. 2 is a schematic diagram of a crowd detection model network according to an embodiment of the invention;

FIG. 3 is a flow chart of a second embodiment of the present invention;

FIG. 4 is a schematic diagram of a device according to a third embodiment of the present invention;

FIG. 5 is a schematic diagram of a device structure according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and the embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent.

In the course of realizing the invention, the inventor found that existing schemes that count people via head detection suffer from large counting errors. After careful research and analysis, the specific cause of the problem was found to be as follows:

Existing human head detection schemes are built on general-purpose object detection frameworks: target features are first extracted, and the category and position of each target are then obtained through classification and regression. In real application scenes, people face the camera at different angles, and the features of a human head vary greatly with the angle; for example, a face seen from the front differs greatly from the back of the head. Because of this large variability, objects whose features differ only slightly from those of a human head are easily misdetected as heads. In a real scene it is unavoidable that, at some angle, the feature difference between a human head and another object is smaller than the feature difference between the same head seen from two different angles. For example, the head of a plush toy differs very little from the back of a human head, so existing head detection techniques easily identify it as a human head. Existing head detection schemes are therefore prone to false detections, which makes crowd counting errors relatively large.

FIG. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention. As shown in FIG. 1, the model training method of this embodiment mainly includes:

Step 101, acquiring a sample data set.

Wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames.

In practical applications, a person skilled in the art can select the five sense organs to be detected as needed. Preferably, to effectively screen out false results of human head detection, the five sense organs to be detected include the left eye, right eye, left ear, right ear and mouth. Any set may be chosen, so long as it ensures that, regardless of the shooting angle, the image of a human head contains at least one sense organ from the set. False head detections can then be screened out based on the five sense organs, improving detection accuracy.

In this step, the head and the preset five sense organs in each sample picture are marked with detection frames, so that during training the model parameters can be adjusted based on the marked detection frames and the detection results output by the model.
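As an illustrative sketch only, the annotation of one sample picture could be held in a structure like the one below. The class names, the (x1, y1, x2, y2) pixel box convention, the file name and the organ keys are assumptions, not part of the patent. The helper flags annotated heads with no preset sense organ marked, i.e. heads that would break the "at least one sense organ at any angle" requirement described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]

# Preset five sense organs from the preferred embodiment (key names are hypothetical).
ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")


@dataclass
class HeadAnnotation:
    head_box: Box
    # Organ name -> marked detection frame; organs not visible at this angle are omitted.
    organ_boxes: Dict[str, Box] = field(default_factory=dict)


@dataclass
class SamplePicture:
    image_path: str
    heads: List[HeadAnnotation] = field(default_factory=list)


def heads_without_any_organ(sample: SamplePicture) -> List[int]:
    """Return indices of annotated heads that have no preset sense organ marked."""
    return [
        idx
        for idx, head in enumerate(sample.heads)
        if not any(name in head.organ_boxes for name in ORGANS)
    ]


# Usage: the second head has no organ marked, so it is flagged for review.
sample = SamplePicture(
    "crowd_0001.jpg",  # hypothetical file name
    [
        HeadAnnotation((10, 10, 60, 70), {"left_ear": (12, 30, 20, 45)}),
        HeadAnnotation((100, 10, 150, 70)),
    ],
)
flagged = heads_without_any_organ(sample)
```

A head seen from behind would typically still carry ear boxes, which is why the check only asks for at least one organ rather than all five.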

Step 102, training a pre-constructed crowd detection model with the sample data set to obtain a target crowd detection model.

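Based on the sample pictures in the sample data set, training the crowd detection model can specifically be achieved by the following steps:

Detecting the head of each person in the sample picture and the five sense organs to obtain a head candidate detection frame and five-sense-organ candidate detection frames;

Generating the attention feature vector of the corresponding head by using a heuristic attention weighting network, based on the head candidate detection frame and the five-sense-organ candidate detection frames of each head;

Identifying, based on the attention feature vector, the authenticity of the corresponding head by using a classification network, and adjusting the parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture.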

The specific method of adjusting the parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture is known to those skilled in the art and is not described here.

In this training method, not only the heads in the sample picture but also the five sense organs within each head are detected. By using the five-sense-organ detection results together with the heuristic attention weighting network, objects misjudged as human heads in the head detection results can be effectively screened out. This improves the accuracy of the head attention feature vectors fed to the classification network, and hence of the head identification results output by the classification network, which in turn improves the detection accuracy of the crowd detection model.

In addition, because the heuristic attention weighting mechanism improves classification accuracy, once the attention feature vector of a head has been generated it only needs to be fed to the model's classification network to identify whether the head is real. Unlike existing head detection methods, there is no need to fine-tune the detection frame through regression to improve detection accuracy, so detection speed can be effectively improved compared with existing head detection methods.

The classification network identifies the authenticity of the corresponding head based on the attention feature vector of each head. It may be implemented with an existing classifier, for example two fully connected layers followed by a softmax activation function, but it is not limited to this and may use one or more fully connected layers.
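A minimal NumPy sketch of the two-fully-connected-layer variant, assuming a ReLU between the layers, a hidden width chosen freely, and two output classes (0: not a head, 1: real head). None of these choices are specified in the patent; the class and parameter names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class HeadClassifier:
    """Two fully connected layers + softmax over {0: not a head, 1: real head}."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.01, (hidden_dim, 2))
        self.b2 = np.zeros(2)

    def forward(self, a: np.ndarray) -> np.ndarray:
        """a: (N, in_dim) attention feature vectors -> (N, 2) class probabilities."""
        hidden = np.maximum(0.0, a @ self.W1 + self.b1)  # ReLU between the layers (assumed)
        return softmax(hidden @ self.W2 + self.b2)


clf = HeadClassifier(in_dim=256, hidden_dim=64)
probs = clf.forward(np.zeros((3, 256)))  # three all-zero attention feature vectors
```

With untrained (zero) biases, an all-zero attention feature vector yields [0.5, 0.5]; during training the biases can learn to map such vectors to the "not a head" class.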

In one embodiment, in the above model training method, the head and the five sense organs in the sample picture may be detected by the following method to obtain a head candidate detection frame and a five sense organs candidate detection frame:

Step a1, detecting the heads in the sample picture by using a pre-trained head detection model to obtain head candidate detection frames.

In this step, the detection frame of each head detected in the picture by the head detection model is used as a head candidate detection frame, whose authenticity is verified in subsequent steps.

Here, the head detection model is a model that detects the head of a person in a picture.

Step a2, obtaining a sub-image of the corresponding head based on the head candidate detection frame.

In this step, the image inside the head candidate detection frame is taken as the sub-image of the corresponding head, so that the five sense organs can be identified in this sub-image in the subsequent step and the sub-region feature map of each sense organ obtained.

Step a3, detecting each of the five sense organs in the sub-image by using a five-sense-organ detection model to obtain the five-sense-organ candidate detection frames.

In this step, each preset sense organ in the head sub-image is detected, and the resulting detection frame is used as the candidate detection frame of that sense organ. For example, if the five sense organs to be detected are the left eye, right eye, left ear, right ear and mouth, this step detects these five organs in the sub-image, producing a left-eye, a right-eye, a left-ear, a right-ear and a mouth detection frame.

It should be noted that, because shooting angles differ in practice, a head sub-image may not contain all of the preset five sense organs; that is, the detection frames of some preset sense organs may be absent from the sub-image.

In the above method, the head detection model and the five-sense-organ detection model may be implemented with existing object detection methods, for example using a region proposal network (RPN).
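Steps a1 to a3 can be sketched as a small driver that takes the two detectors as plain callables. The callable signatures, the box format and the shift of organ boxes back into full-image coordinates (so that the later ROI steps can use them) are assumptions for illustration; the stub detectors merely stand in for trained RPN-based models.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention
# Hypothetical detector interfaces; in the embodiment both would be RPN-based detectors.
HeadDetector = Callable[[np.ndarray], List[Box]]
OrganDetector = Callable[[np.ndarray], Dict[str, Box]]


def detect_heads_and_organs(
    image: np.ndarray, head_detector: HeadDetector, organ_detector: OrganDetector
) -> List[Tuple[Box, Dict[str, Box]]]:
    """Steps a1-a3: head candidates, head sub-images, then organ candidates per head."""
    img_h, img_w = image.shape[:2]
    results = []
    for head_box in head_detector(image):  # step a1
        x1, y1, x2, y2 = head_box
        # Step a2: crop the head sub-image, rounding outward and clipping to the image.
        cx1, cy1 = max(0, int(np.floor(x1))), max(0, int(np.floor(y1)))
        cx2, cy2 = min(img_w, int(np.ceil(x2))), min(img_h, int(np.ceil(y2)))
        if cx2 <= cx1 or cy2 <= cy1:
            continue  # box lies outside the image; nothing to crop
        sub_image = image[cy1:cy2, cx1:cx2]
        # Step a3: organs missing at this angle are simply absent from the dict.
        organs = organ_detector(sub_image)
        # Assumption: shift organ boxes back to full-image coordinates.
        organs_in_image = {
            name: (bx1 + cx1, by1 + cy1, bx2 + cx1, by2 + cy1)
            for name, (bx1, by1, bx2, by2) in organs.items()
        }
        results.append((head_box, organs_in_image))
    return results


def stub_head_detector(img: np.ndarray) -> List[Box]:
    return [(10, 20, 50, 60), (100, 10, 140, 50)]


def stub_organ_detector(sub: np.ndarray) -> Dict[str, Box]:
    # Pretends to find a mouth only in bright sub-images.
    return {"mouth": (5, 25, 15, 30)} if sub.mean() > 0 else {}


image = np.zeros((100, 200), dtype=float)
image[20:60, 10:50] = 1.0  # a bright "real head" region
detections = detect_heads_and_organs(image, stub_head_detector, stub_organ_detector)
```

The second stub head yields an empty organ dict, which is exactly the case the attention weighting later turns into an all-zero feature vector.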

In one embodiment, in the model training method, for each detected head, the generating the attention feature vector of the head may include:

Step b1, extracting a corresponding head sub-region feature matrix based on a first head candidate detection frame by using a first region-of-interest extraction layer (ROI Pooling) of the heuristic attention weighting network.

In this step, for each head candidate detection frame detected in step a1, the corresponding head sub-region feature matrix (i.e., head sub-region feature map) is extracted, from which the head average feature vector of that head is then obtained. The first head candidate detection frame denotes any one of the head candidate detection frames detected in step a1.
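The patent names an ROI Pooling layer but gives no parameters, so the following is only a simplified NumPy sketch of ROI max pooling. The feature-map stride of 16, the 7x7 default output size and the bin-rounding scheme are assumptions and differ in detail from library implementations.

```python
import numpy as np


def roi_pool(feature_map, box, output_size=(7, 7), stride=16):
    """Simplified ROI max pooling.

    feature_map: (C, H, W) backbone feature map; box: (x1, y1, x2, y2) in image
    pixels; stride: assumed image-to-feature-map downsampling factor.
    Returns a fixed-size (C, out_h, out_w) sub-region feature matrix.
    """
    _, fm_h, fm_w = feature_map.shape
    # Project the image-space box onto the feature-map grid and clip it.
    x1 = min(max(int(np.floor(box[0] / stride)), 0), fm_w - 1)
    y1 = min(max(int(np.floor(box[1] / stride)), 0), fm_h - 1)
    x2 = max(min(int(np.ceil(box[2] / stride)), fm_w), x1 + 1)
    y2 = max(min(int(np.ceil(box[3] / stride)), fm_h), y1 + 1)
    out_h, out_w = output_size
    ys = np.linspace(y1, y2, out_h + 1)
    xs = np.linspace(x1, x2, out_w + 1)
    out = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output cell max-pools one (possibly overlapping) bin of the region.
            r0 = int(np.floor(ys[i]))
            r1 = max(r0 + 1, int(np.ceil(ys[i + 1])))
            c0 = int(np.floor(xs[j]))
            c1 = max(c0 + 1, int(np.ceil(xs[j + 1])))
            out[:, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out


# A 2-channel 8x8 feature map; the box covers the whole map at stride 16.
fmap = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
pooled = roi_pool(fmap, (0, 0, 128, 128), output_size=(2, 2), stride=16)
```

The fixed output size is what lets heads of very different pixel sizes produce sub-region feature matrices of the same shape.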

Step b2, performing global average pooling on the head sub-region feature matrix by using a first global pooling layer (Global Pooling) of the heuristic attention weighting network to obtain the corresponding head average feature vector.
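Global average pooling simply averages each channel of the (C, h, w) sub-region feature matrix. A minimal sketch follows; the function name is hypothetical, and the output plays the role of the head average feature vector h.

```python
import numpy as np


def global_average_pool(sub_region: np.ndarray) -> np.ndarray:
    """Average each channel of a (C, h, w) sub-region feature matrix -> (C,) vector."""
    if sub_region.ndim != 3:
        raise ValueError("expected a (C, h, w) array")
    return sub_region.mean(axis=(1, 2))


# Example: a 3-channel 7x7 head sub-region whose channels are constant 1, 2 and 3.
head_region = np.stack([np.full((7, 7), c) for c in (1.0, 2.0, 3.0)])
h = global_average_pool(head_region)
```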

Step b3, extracting, by using a second region-of-interest extraction layer of the heuristic attention weighting network, a corresponding five-sense-organ sub-region feature matrix based on each five-sense-organ candidate detection frame within the first head candidate detection frame.

In this step, the sub-region feature matrix of each preset sense organ within the first head candidate detection frame is extracted; if a sense organ has no candidate detection frame, it has no corresponding sub-region feature matrix.

Step b4, average-pooling each five-sense-organ sub-region feature matrix by using a second global pooling layer of the heuristic attention weighting network to obtain the average feature vector of the corresponding sense organ.

In this step, the sub-region feature matrix of each sense organ within the first head candidate detection frame is average-pooled to obtain that organ's average feature vector. Attention weighting is then performed on these average feature vectors so as to screen out the features of objects misdetected as human heads.
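Steps b3 and b4 can be sketched together: every preset organ gets either an average feature vector or `None` when it has no candidate frame. For brevity this sketch assumes the organ boxes are already expressed on the feature-map grid and uses a plain crop in place of the second ROI Pooling layer; the organ key names and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

import numpy as np

ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")  # hypothetical keys
GridBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) already on the feature-map grid


def organ_average_vectors(
    feature_map: np.ndarray, organ_boxes: Dict[str, GridBox]
) -> Dict[str, Optional[np.ndarray]]:
    """Steps b3-b4 sketch: one average feature vector per preset organ, or None if missing."""
    vectors: Dict[str, Optional[np.ndarray]] = {}
    for name in ORGANS:
        box = organ_boxes.get(name)
        if box is None:
            vectors[name] = None  # no candidate frame -> no sub-region feature matrix
            continue
        x1, y1, x2, y2 = box
        sub_region = feature_map[:, y1:y2, x1:x2]  # plain crop stands in for ROI pooling
        if sub_region.size == 0:
            raise ValueError(f"empty sub-region for {name}: {box}")
        vectors[name] = sub_region.mean(axis=(1, 2))  # global average pooling
    return vectors


fmap = np.ones((4, 10, 10))
vecs = organ_average_vectors(fmap, {"left_eye": (1, 1, 3, 3)})
```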

Step b5, calculating the attention weight vector of each sense organ in the head corresponding to the first head candidate detection frame, based on the head average feature vector and the average feature vector of that sense organ.

In one embodiment, the attention weight vector of each sense organ in the corresponding head can be calculated as

$$w_i = \begin{cases} m_i \odot h, & \text{if sense organ } i \text{ has an average feature vector} \\ \mathbf{0}, & \text{otherwise} \end{cases}$$

where w_i denotes the attention weight vector of sense organ i, m_i denotes the average feature vector of sense organ i, h denotes the head average feature vector, ⊙ denotes point (element-wise) multiplication, and w_i has the same dimension as h.

In this calculation, if a sense organ has an average feature vector, that vector is point-multiplied by the head average feature vector to obtain the corresponding attention weight vector; if it does not, the corresponding attention weight vector is zero. Thus, for an object misdetected as a human head, no five-sense-organ detection frame is found in its sub-image, so the attention weight vectors of all its sense organs are zero.
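A direct NumPy transcription of the step-b5 rule, reading "point multiplication" as the element-wise product (consistent with w_i having the same dimension as h). The dictionary-of-organs input format and the function name are assumptions.

```python
from typing import Dict, Optional

import numpy as np


def attention_weights(
    h: np.ndarray, organ_vectors: Dict[str, Optional[np.ndarray]]
) -> Dict[str, np.ndarray]:
    """Step b5: w_i = m_i * h (element-wise) if organ i was detected, else the zero vector."""
    weights = {}
    for name, m in organ_vectors.items():
        if m is None:
            weights[name] = np.zeros_like(h)
        elif m.shape != h.shape:
            raise ValueError(f"{name}: m_i and h must have the same dimension")
        else:
            weights[name] = m * h
    return weights


h = np.array([1.0, 2.0, 3.0])
w = attention_weights(h, {"mouth": np.array([0.5, 0.5, 0.5]), "left_ear": None})
```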

Step b6, point-multiplying the head average feature vector by the attention weight vector of each sense organ and summing the results to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.

As described in the above steps, the attention weight vectors of the five sense organs of an object resembling a human head are zero, so each point product with the head average feature vector is also a zero vector. The attention feature vector of such an object is therefore the zero vector, which enlarges the difference between real human heads and similar-looking objects; step b6 thus allows objects misdetected as human heads to be effectively screened out.
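Step b6 as a short NumPy sketch (function name hypothetical, element-wise products assumed as above). The two example calls contrast a real head where only the mouth was found with a misdetected object where no organ was found, whose attention feature vector collapses to zero.

```python
from typing import Dict

import numpy as np


def attention_feature_vector(h: np.ndarray, weights: Dict[str, np.ndarray]) -> np.ndarray:
    """Step b6: sum over organs of the element-wise product h * w_i."""
    a = np.zeros_like(h, dtype=float)
    for w_i in weights.values():
        a += h * w_i
    return a


h = np.array([1.0, 2.0, 3.0])
# Real head: only the mouth was detected (w_mouth = m_mouth * h with m_mouth = 0.5).
real = attention_feature_vector(h, {"mouth": np.array([0.5, 1.0, 1.5]), "left_ear": np.zeros(3)})
# Misdetected object: no organ detected, so every weight vector is zero.
fake = attention_feature_vector(h, {name: np.zeros(3) for name in ("left_eye", "mouth")})
```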

In one embodiment, in the model training method, when the parameters of the crowd detection model are adjusted according to the result output by the classification network, the parameters of the head detection model, the five sense organs detection model, the heuristic attention weighting network and the classification network in the model are optimized and adjusted. The specific adjustment methods are known to those skilled in the art and will not be described in detail herein.

In one embodiment, the heuristic attention weighting network and the classification network may be trained with a cross-entropy loss function optimized by stochastic gradient descent, but the training is not limited to this.
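To make the loss and update rule concrete, here is a hedged NumPy sketch of cross-entropy with gradient-descent steps. For brevity it trains only a single linear softmax layer on synthetic attention vectors and uses the whole toy set as one batch; in the patent's setting the same gradients would be backpropagated through the classification network and the other sub-networks. The data, learning rate and names are illustrative assumptions.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy; labels are 0 (not a head) or 1 (real head)."""
    eps = 1e-12  # avoid log(0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


def sgd_step(W, b, a, labels, lr=0.1):
    """One gradient step on a linear softmax classifier; updates W, b in place, returns loss."""
    logits = a @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = cross_entropy(probs, labels)
    # Gradient of the mean cross-entropy w.r.t. the logits is (probs - one_hot) / N.
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0
    grad /= len(labels)
    W -= lr * (a.T @ grad)
    b -= lr * grad.sum(axis=0)
    return loss


rng = np.random.default_rng(0)
# Synthetic toy data: real heads have non-zero attention vectors, false ones are all zero.
a = np.vstack([rng.normal(1.0, 0.5, (32, 8)), np.zeros((32, 8))])
labels = np.array([1] * 32 + [0] * 32)
W, b = np.zeros((8, 2)), np.zeros(2)
losses = [sgd_step(W, b, a, labels) for _ in range(100)]
```

In a real training loop the batches would be sampled randomly from the data set, which is what makes the descent stochastic.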

To make the structure of the crowd detection model provided by the embodiment of the invention easier to understand, FIG. 2 shows the network structure of the crowd detection model obtained with the above model training method. As shown in FIG. 2, the model includes a head detection model, a five-sense-organ detection model, a heuristic attention weighting network and a classification network. In this example network structure, both the head detection model and the five-sense-organ detection model are implemented with RPNs.

Based on the above embodiment of the model training method, the embodiment of the present invention further provides a crowd counting method, as shown in fig. 3, where the crowd counting method includes:

Step 301, obtaining a target detection picture.

Step 302, detecting the heads of persons in the target detection picture based on the crowd detection model, and counting the detected heads to obtain the number of people in the target detection picture.

Only heads whose confidence is greater than a preset threshold are counted, and the crowd detection model is trained in advance with the crowd detection model training method described above.

In step 302, the crowd is counted from the heads detected by the crowd detection model according to their confidence scores, i.e., only heads whose confidence is greater than the preset threshold are counted. The confidence of a detection result can be computed with existing methods.
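The counting rule of step 302 reduces to a strict threshold comparison. A minimal sketch, assuming the model's output is available as (box, confidence) pairs; the function name and the example threshold of 0.5 are illustrative, since the patent leaves the threshold to the practitioner.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]


def count_people(detections: Iterable[Tuple[Box, float]], threshold: float) -> int:
    """Count detected heads whose confidence is strictly greater than the threshold."""
    return sum(1 for _box, confidence in detections if confidence > threshold)


detections = [
    ((0, 0, 10, 10), 0.90),
    ((5, 5, 15, 15), 0.50),  # equal to the threshold, so not counted
    ((20, 20, 30, 30), 0.51),
]
num_people = count_people(detections, threshold=0.5)  # illustrative threshold value
```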

As analyzed above, the crowd detection model used in this step introduces five-sense-organ detection combined with a heuristic attention mechanism, so false head detections can be effectively screened out and the detection accuracy of the model is ensured. Therefore, in step 302, detecting the heads in the target detection picture with the crowd detection model trained in the first embodiment of the present invention and counting according to the detection results improves the accuracy of crowd counting.

Here, the threshold specifies the condition a detected head must satisfy to be included in the count; a person skilled in the art may set a suitable value.

Corresponding to the above embodiment of the model training method, the embodiment of the present invention further provides a model training device, as shown in fig. 4, where the device includes:

A sample data acquisition module 401, configured to acquire a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames.

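The model training module 402 is configured to train a pre-constructed crowd detection model with the sample data set to obtain a target crowd detection model, wherein the training comprises detecting the heads and five sense organs in the sample pictures, generating attention feature vectors with the heuristic attention weighting network, identifying the authenticity of each head with the classification network, and adjusting the model parameters, as in the training method described above.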

Corresponding to the above embodiment of the crowd counting method, the embodiment of the present invention further provides a crowd counting device, as shown in fig. 5, where the crowd counting device includes:

A detection target obtaining module 501, configured to obtain a target detection picture;

The head detection module 502 is configured to detect the heads of persons in the target detection picture based on a crowd detection model and count the detected heads to obtain the number of people in the target detection picture, wherein only heads whose confidence is greater than a preset threshold are counted, and the crowd detection model is trained with the crowd detection model training method described above.

As can be seen from the above embodiments, while the crowd detection model is trained with the sample pictures, five-sense-organ detection is performed on top of head detection, and the head detection and five-sense-organ detection results are jointly processed by a heuristic attention weighting mechanism to generate the attention feature vector of each detected head. Applying this mechanism to the five-sense-organ detection results enlarges the difference between human heads and similar-looking objects, so the attention feature vectors fed to the classification network become more accurate, false head detections can be screened out, and the detection accuracy of the trained crowd detection model improves. Correspondingly, the accuracy of crowd counting with this model also improves.

The crowd detection model provided by the embodiment of the invention can effectively overcome the influence of the shooting angle on detection accuracy, so the crowd counting method based on it suits a wide range of scenes, such as dense crowds, sparse crowds and scenes with diverse shooting angles.

Corresponding to the crowd detection model training method embodiment, the embodiment of the invention also provides crowd detection model training equipment, which comprises a processor and a memory;

the memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method as described above.

Embodiments of the present invention also provide a computer readable storage medium having stored therein computer readable instructions for performing the crowd detection model training method as described above.

Corresponding to the crowd counting method embodiment, the embodiment of the invention also provides crowd counting equipment which comprises a processor and a memory;

In the above embodiments, the memory may be embodied as various storage media such as an electrically erasable programmable read-only memory (EEPROM), a flash memory, or a programmable read-only memory (PROM). The processor may be implemented to include one or more central processing units or one or more field programmable gate arrays (FPGAs), where the FPGAs integrate one or more central processing unit cores. In particular, the central processing unit or core may be implemented as a CPU or MCU.

It should be noted that not all the steps and modules in the above processes and the structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted as required. The division of the modules is merely for convenience of description and the division of functions adopted in the embodiments, and in actual implementation, one module may be implemented by a plurality of modules, and functions of a plurality of modules may be implemented by the same module, and the modules may be located in the same device or different devices.

The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include specially designed permanent circuits or logic devices (e.g., special-purpose processors such as FPGAs or ASICs) for performing particular operations. A hardware module may also include programmable logic devices or circuits (e.g., a general-purpose processor or other programmable processor) temporarily configured by software to perform particular operations. Whether a hardware module is implemented mechanically, with dedicated permanent circuits, or with temporarily configured circuits (e.g., configured by software) may be decided by cost and time considerations.

Storage medium implementations for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD+RWs), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or cloud by a communications network.

In this document, "schematic" means "serving as an example, instance, or illustration", and no illustration or embodiment described herein as "schematic" should be construed as a more preferred or advantageous solution. For simplicity, the drawings show only schematically the parts relevant to the present invention and do not represent the actual structure of the product. Additionally, to keep the drawings simple and easy to understand, where several components in a drawing have the same structure or function, only one of them may be depicted schematically or labeled. In this document, "a" or "an" neither limits the number of the relevant parts of the present invention to "only one" nor excludes the case where there is "more than one" of the relevant parts. In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used merely to indicate relative positional relationships between the relevant parts and do not limit their absolute positions.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A crowd detection model training method, the method comprising:

identifying, based on the attention feature vector, the authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frames marked in the sample picture;

wherein generating the attention feature vector of the corresponding head comprises:

2. The method according to claim 1, wherein detecting the head and the five sense organs of the person in the sample picture to obtain the head candidate detection frame and the five sense organ candidate detection frames comprises:

3. The method of claim 1, wherein said calculating an attention weight vector for each of said five sense organs in the corresponding head comprises:

4. The method of claim 2, wherein said adjusting parameters of said crowd detection model comprises:

5. The method of claim 1, wherein the five sense organs comprise:

the left eye, the right eye, the left ear, the right ear, and the mouth.

6. A method of crowd counting comprising:

acquiring a target detection picture;

wherein the confidence of each counted head is greater than a preset threshold, and the crowd detection model is trained in advance by the method according to any one of claims 1 to 5.

7. A crowd detection model training device, comprising:

generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frames; wherein generating the attention feature vector of the corresponding head comprises: extracting a corresponding head sub-region feature matrix based on a first head candidate detection frame by using a first region-of-interest extraction layer of the heuristic attention weighting network; performing global average pooling on the head sub-region feature matrix by using a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector; extracting a corresponding sense organ sub-region feature matrix based on each five sense organ candidate detection frame within the first head candidate detection frame by using a second region-of-interest extraction layer of the heuristic attention weighting network; performing global average pooling on each sense organ sub-region feature matrix by using a second global pooling layer of the heuristic attention weighting network to obtain the average feature vector of the corresponding sense organ; calculating an attention weight vector of each of the five sense organs in the corresponding head based on the head average feature vector and the average feature vector of the corresponding sense organ; and performing point-wise multiplication of the head average feature vector with the attention weight vector of each of the five sense organs, and summing the multiplication results to obtain the attention feature vector of the head corresponding to the first head candidate detection frame;
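The attention aggregation recited above can be illustrated with a minimal pure-Python sketch. It is not the patented implementation and rests on several assumptions: the feature map is a nested list indexed `[channel][row][col]`; region-of-interest extraction is simplified to an integer crop (no RoI Align interpolation); boxes are half-open `(x1, y1, x2, y2)` in feature-map coordinates and lie inside the map; and because the weighting formula of claim 3 is not reproduced in this text, the hypothetical `attention_weights` uses an element-wise sigmoid of the product of the head and organ vectors. "Point-wise multiplication" is read as element-wise multiplication, since summing the products must yield a vector. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Assumed feature-map layout: [channel][row][col]
FeatureMap = List[List[List[float]]]
# Assumed box format: half-open (x1, y1, x2, y2) in feature-map coordinates
Box = Tuple[int, int, int, int]


def crop_roi(fmap: FeatureMap, box: Box) -> FeatureMap:
    """Simplified region-of-interest extraction: an integer crop of every
    channel (no RoI Align interpolation or resizing)."""
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"empty box: {box}")
    return [[row[x1:x2] for row in channel[y1:y2]] for channel in fmap]


def global_avg_pool(sub: FeatureMap) -> List[float]:
    """Global average pooling: one mean value per channel."""
    pooled = []
    for channel in sub:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def attention_weights(head_vec: Sequence[float],
                      organ_vec: Sequence[float]) -> List[float]:
    """HYPOTHETICAL weighting rule: element-wise sigmoid(head * organ).
    The actual formula of claim 3 is not given in this text."""
    return [1.0 / (1.0 + math.exp(-h * o)) for h, o in zip(head_vec, organ_vec)]


def head_attention_feature(fmap: FeatureMap, head_box: Box,
                           organ_boxes: Sequence[Box]) -> List[float]:
    """Attention feature vector for one head candidate detection frame."""
    # Head branch: crop the head region, then global-average-pool it
    head_vec = global_avg_pool(crop_roi(fmap, head_box))
    feature = [0.0] * len(head_vec)
    # Organ branch: e.g. left eye, right eye, left ear, right ear, mouth
    for organ_box in organ_boxes:
        organ_vec = global_avg_pool(crop_roi(fmap, organ_box))
        weights = attention_weights(head_vec, organ_vec)
        # Element-wise product of the head vector with this organ's
        # weight vector, accumulated (summed) over all organs
        for c, (h, w) in enumerate(zip(head_vec, weights)):
            feature[c] += h * w
    return feature
```

In a real network the crop and pooling would typically be an RoI Align layer followed by adaptive average pooling on a backbone feature map, and the weighting rule would be learned or defined as in claim 3; the sketch only makes the data flow (pool head, pool each organ, weight, multiply, sum) concrete.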

8. A crowd counting apparatus, comprising:

a head detection module, configured to detect heads in the target detection picture based on the crowd detection model and to count the detected heads to obtain the number of people in the target detection picture; wherein the confidence of each counted head is greater than a preset threshold, and the crowd detection model is trained by the method according to any one of claims 1 to 5.
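The counting step amounts to filtering the detector's head outputs by confidence and counting the survivors. The minimal sketch below assumes the trained crowd detection model has already produced a list of detections; the `HeadDetection` container, the `count_people` function, and the box format are hypothetical, and no particular threshold value is implied by the claims. The comparison is strict, matching "greater than a preset threshold".

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class HeadDetection:
    """One head detected by the trained crowd detection model (hypothetical container)."""
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in picture coordinates
    confidence: float


def count_people(detections: Iterable[HeadDetection], threshold: float) -> int:
    """Number of people = number of heads whose confidence is strictly
    greater than the preset threshold."""
    return sum(1 for d in detections if d.confidence > threshold)
```

Because the inequality is strict, a head whose confidence equals the threshold exactly is not counted.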

9. A crowd detection model training device, comprising a processor and a memory;

The memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 5.

10. A computer readable storage medium having stored therein computer readable instructions for performing the crowd detection model training method of any one of claims 1 to 5.

11. A crowd counting device, comprising a processor and a memory;

the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method of claim 6.

12. A computer readable storage medium having stored therein computer readable instructions for performing the crowd counting method of claim 6.