CN113822111B - Crowd detection model training method and device and crowd counting method and device - Google Patents


Info

Publication number
CN113822111B
Authority
CN
China
Legal status
Active
Application number
CN202110067279.5A
Other languages
Chinese (zh)
Other versions
CN113822111A (en)
Inventor
谷爱国
Current Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202110067279.5A
Publication of CN113822111A
Application granted
Publication of CN113822111B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a crowd detection model training method and device and a crowd counting method and device, wherein the model training method comprises the following steps: acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames; training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises: detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame; generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame; based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture. By adopting the application, the crowd counting accuracy can be improved.

Description

Crowd detection model training method and device and crowd counting method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a crowd detection model training method and apparatus, and a crowd counting method and apparatus.
Background
Crowd counting is an important computer vision technique in the security field. In intelligent security applications, an unmanned inspection vehicle can use crowd counting to effectively judge crowd gathering conditions, give early warning, and prevent abnormal behavior.
Head detection is a common crowd counting method, which calculates the number of people by identifying the heads in the crowd.
Disclosure of Invention
Therefore, the main objective of the present invention is to provide a crowd detection model training method and apparatus, and a crowd counting method and apparatus, which can improve crowd counting accuracy.
In order to achieve the above purpose, the technical solution provided by the embodiment of the present invention is as follows:
a crowd detection model training method, the method comprising:
Acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;
Training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.
In one embodiment, the detecting the head of the person and the five sense organs in the sample picture, and obtaining the head candidate detection frame and the five sense organs candidate detection frame includes:
Detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame;
obtaining a sub-graph of the corresponding head based on the head candidate detection frame;
and detecting each five sense organs in the subgraph by using a five sense organ detection model to obtain the five sense organ candidate detection frame.
In one embodiment, the generating the attention feature vector of the corresponding head includes:
extracting a corresponding head sub-region feature matrix based on a first head candidate detection frame by using a first region-of-interest extraction layer of the heuristic attention weighting network;
performing global average pooling on the head sub-region feature matrix by using a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector;
extracting a corresponding five sense organs sub-region feature matrix based on each of the five sense organs candidate detection frames in the first head candidate detection frame by using a second region-of-interest extraction layer of the heuristic attention weighting network;
performing average pooling on each five sense organs sub-region feature matrix by using a second global pooling layer of the heuristic attention weighting network to obtain an average feature vector of the corresponding five sense organs;
calculating an attention weight vector of each of the five sense organs in the corresponding head based on the head average feature vector and the average feature vector of the corresponding five sense organs;
and multiplying the head average feature vector element-wise by the attention weight vector of each of the five sense organs, and summing the products to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.
In one embodiment, said calculating the attention weight vector for each of said five sense organs in the corresponding head comprises:
if an average feature vector exists for a given one of the five sense organs, multiplying that average feature vector element-wise by the head average feature vector to obtain the corresponding attention weight vector;
if no average feature vector exists for a given one of the five sense organs, the corresponding attention weight vector is a zero vector.
In one embodiment, the adjusting parameters of the crowd detection model includes:
Parameters in the head detection model, the five sense organs detection model, the heuristic attention weighting network, and the classification network are adjusted.
In one embodiment, the five sense organs include:
Left eye, right eye, left ear, right ear and mouth.
A crowd counting method, comprising:
acquiring a target detection picture;
detecting the head of a person in the target detection picture based on a crowd detection model, and counting the detected head to obtain the number of people in the target detection picture;
wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is trained by adopting any crowd detection model training method as described above in advance.
A crowd detection model training device, comprising:
the sample data acquisition module is used for acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;
the model training module is used for training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.
A crowd counting apparatus comprising:
The detection target acquisition module is used for acquiring a target detection picture;
the head detection module is used for detecting the head in the target detection picture based on the crowd detection model, and counting the detected head to obtain the number of people in the target detection picture; wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is obtained by training by adopting any crowd detection model training method.
A crowd detection model training device comprising a processor and a memory;
The memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 6.
A computer readable storage medium having stored therein computer readable instructions for performing a crowd detection model training method as described above.
The embodiment of the invention also provides crowd counting equipment which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method as described above.
Embodiments of the present invention also provide a computer readable storage medium having stored therein computer readable instructions for performing the crowd counting method as described above.
As can be seen from the above technical solutions, in the model training method and apparatus and the crowd counting method and apparatus provided by the embodiments of the present invention, five sense organs detection is introduced on the basis of head detection during the training of the crowd detection model with the sample pictures, and the results of the head detection and the five sense organs detection are comprehensively processed by a heuristic attention weighting mechanism to generate the attention feature vector of each head found by the head detection. In this way, applying the heuristic attention weighting mechanism to the results of the five sense organs detection increases the difference between the heads of people and other objects of similar appearance, which improves the accuracy of the attention feature vectors input into the classification network, allows objects falsely detected as heads in the head detection results to be screened out, and thus improves the detection accuracy of the crowd detection model. Correspondingly, the accuracy of crowd counting by using the crowd detection model is also improved.
Drawings
FIG. 1 is a schematic flow chart of a method according to a first embodiment of the invention;
FIG. 2 is a schematic diagram of a crowd detection model network according to an embodiment of the invention;
FIG. 3 is a flow chart of a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a device according to a third embodiment of the present invention;
fig. 5 is a schematic diagram of a device structure according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and the embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent.
In the process of realizing the invention, the inventor found that the existing scheme for counting people by using head detection has the problem of a large counting error. Through careful research and analysis, the specific cause of the problem was found to be as follows:
The existing human head detection scheme is realized based on a general target detection framework. In such a scheme, the target features are extracted first, and then the category and the position of the target are obtained through classification and regression. In an actual application scene, people are at different angles relative to the camera, and the features of a human head vary greatly across angles; for example, the front of the face differs greatly from the back of the head. Such large variability makes it easy for objects whose features differ little from those of a human head to be falsely detected as human heads, because in a real scene it is unavoidable that, at a certain angle, the feature difference between a human head and another object is smaller than the feature difference between human heads at different angles. For example, the head of a plush toy differs very little from the back of a human head. Thus, the head of the plush toy is easily identified as a human head by existing head detection technology. Therefore, false detection easily occurs in the existing human head detection scheme, so that the crowd counting error is relatively large.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention, as shown in fig. 1, the model training method implemented by this embodiment mainly includes:
step 101, acquiring a sample data set.
Wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames.
In practical application, a person skilled in the art can select the five sense organs to be detected according to actual needs. Preferably, in order to effectively screen out false detection results of human head detection, the five sense organs to be detected may include: left eye, right eye, left ear, right ear and mouth. The set of five sense organs to be detected may be set as needed, as long as it is ensured that, regardless of the shooting angle, the image of a person's head contains at least one of the five sense organs in the set. In this way, falsely detected heads can be screened out based on the five sense organs, so that the detection accuracy is improved.
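The selection condition above can be checked mechanically. The following is a minimal sketch only: the function name `covers_all_angles`, the angle names, and the visibility table in the usage note are hypothetical and are not given in the source.

```python
from typing import Dict, Iterable, Set


def covers_all_angles(organ_set: Iterable[str],
                      visible_by_angle: Dict[str, Set[str]]) -> bool:
    """Return True if, for every shooting angle, at least one organ
    of the chosen set is visible in the head image.

    visible_by_angle is a hypothetical table mapping an angle name to
    the organs that can be seen from that angle.
    """
    chosen = set(organ_set)
    # A non-empty intersection means at least one chosen organ is visible.
    return all(bool(chosen & visible) for visible in visible_by_angle.values())
```

For instance, with an assumed table in which only the ears are visible from behind, a set made up of the eyes and the mouth fails the check, while the five-organ set of the preferred embodiment passes it.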
In this step, the head and the preset five sense organs in each sample picture need to be marked with a detection frame identifier, so that when the model is trained, the model parameters are adjusted based on the detection frame identifier in the sample picture and the detection result output by the model.
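As an illustration only, one annotated sample picture could be represented as sketched below. The class and field names (`SampleAnnotation`, `HeadAnnotation`, `image_path`, `organs`), the `(x1, y1, x2, y2)` box convention, and the check that each organ frame lies inside its head frame are assumptions made for this sketch; the source does not specify an annotation format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]

# Label set taken from the preferred embodiment; the identifiers are hypothetical.
ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")


@dataclass
class HeadAnnotation:
    box: Box
    # Only the organs visible at this shooting angle are annotated.
    organs: Dict[str, Box] = field(default_factory=dict)


@dataclass
class SampleAnnotation:
    image_path: str
    heads: List[HeadAnnotation] = field(default_factory=list)


def validate(sample: SampleAnnotation) -> bool:
    """Check that organ labels are known and organ frames lie inside the head frame."""
    for head in sample.heads:
        hx1, hy1, hx2, hy2 = head.box
        for name, (x1, y1, x2, y2) in head.organs.items():
            if name not in ORGANS:
                return False
            if not (hx1 <= x1 < x2 <= hx2 and hy1 <= y1 < y2 <= hy2):
                return False
    return True
```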
Step 102, training a pre-constructed crowd detection model by using the sample data set to obtain a target crowd detection model.
Based on sample pictures in a sample data set, training the crowd detection model can be specifically achieved by the following steps:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
Generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame of each head;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.
Here, the specific method for adjusting the parameters of the crowd detection model according to the identification result and the detection frame identified in the sample picture is known to those skilled in the art, and will not be described herein.
According to the training method, not only the head of a person in a sample picture but also the five sense organs in the head are detected. By utilizing the detection result of the five sense organs and the heuristic attention weighting network, objects misjudged as human heads in the head detection result can be effectively screened out, so that the accuracy of the head attention feature vector input to the classification network is effectively improved, the accuracy of the head identification result output by the classification network is improved, and the detection accuracy of the crowd detection model is further improved.
In addition, in the training method, the accuracy of classification is improved by introducing a heuristic attention weighting mechanism. After the attention feature vector of a head is generated, it therefore only needs to be input into the classification network of the model to identify the authenticity of the head, and the detection frame does not need to be fine-tuned through regression processing to improve the detection accuracy, as in the existing head detection method. Compared with the existing head detection method, the detection speed can thus be effectively improved.
The classification network is used for identifying the authenticity of the head of the corresponding person based on the attention characteristic vector of each head. The specific structure may be implemented using an existing classifier, for example, may include two fully connected layers and a softmax activation function, but is not limited thereto, and may be implemented using one fully connected layer or multiple fully connected layers.
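A minimal sketch of such a classifier, with two fully connected layers followed by softmax, is given below in plain Python. The ReLU between the two layers, the two-class output order, and all function names are assumptions for illustration; the source only names the fully connected layers and the softmax activation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: y = W x + b (one row of W per output unit)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Vector) -> Vector:
    # Assumed hidden activation; the source does not name one.
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Return [P(not a real head), P(real head)] for one attention feature vector."""
    hidden = relu(linear(feature, w1, b1))
    return softmax(linear(hidden, w2, b2))
```

The second probability can serve as the confidence of the head when counting, although the source leaves the confidence calculation to existing methods.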
In one embodiment, in the above model training method, the head and the five sense organs in the sample picture may be detected by the following method to obtain a head candidate detection frame and a five sense organs candidate detection frame:
Step a1, detecting the head in the sample picture by utilizing a pre-trained head detection model to obtain a head candidate detection frame.
In this step, the detection frame of each head in the picture detected by the head detection model is used as a head candidate detection frame to verify the authenticity in the subsequent step.
Here, the head detection model is a model that detects the head of a person in a picture.
Step a2, obtaining a sub-graph of the corresponding head based on the head candidate detection frame.
In this step, the image in the head candidate detection frame is taken as the sub-graph of the corresponding head, so that the five sense organs are identified in the sub-graph in the subsequent step and the sub-region feature map of each of the five sense organs is obtained.
Step a3, detecting each of the five sense organs in the sub-graph by using a five sense organs detection model to obtain the five sense organs candidate detection frames.
In this step, each preset five sense organs in the head subgraph is detected, and the detected detection frame is used as a candidate detection frame of the corresponding five sense organs. For example, if the five sense organs to be detected include left eye, right eye, left ear, right ear and mouth, then this step would require detecting these five sense organs from the subgraph, resulting in a left eye detection frame, a right eye detection frame, a left ear detection frame, a right ear detection frame and a mouth detection frame.
It should be noted that, due to different angles of shooting in practical applications, it is possible that one head sub-image cannot include all images of preset five sense organs, that is, there may be no detection frame of part of preset five sense organs in the sub-image.
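Steps a1 to a3 can be sketched as follows. The two detectors are passed in as plain callables because the source leaves their implementation open; the names `detect_candidates`, `head_detector` and `organ_detector`, the integer `(x1, y1, x2, y2)` box convention, and the mapping of organ frames back to full-picture coordinates are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # assumed convention: (x1, y1, x2, y2), end-exclusive
Image = List[List[float]]         # rows of pixel values; grayscale for brevity


def crop(image: Image, box: Box) -> Image:
    """Cut the sub-graph enclosed by a detection frame out of the picture."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def detect_candidates(
    image: Image,
    head_detector: Callable[[Image], List[Box]],
    organ_detector: Callable[[Image], Dict[str, Box]],
) -> List[Tuple[Box, Dict[str, Box]]]:
    """Return, for each head candidate frame, the organ candidate frames found in it."""
    results = []
    for head_box in head_detector(image):          # step a1
        sub_graph = crop(image, head_box)          # step a2
        local = organ_detector(sub_graph)          # step a3; may miss some organs
        hx, hy = head_box[0], head_box[1]
        # Shift organ frames from sub-graph coordinates to picture coordinates
        # (an assumption, convenient for later feature extraction).
        organs = {name: (x1 + hx, y1 + hy, x2 + hx, y2 + hy)
                  for name, (x1, y1, x2, y2) in local.items()}
        results.append((head_box, organs))
    return results
```

A head whose sub-graph yields an empty dictionary simply has no organ candidate frames, which is the case exploited later by the attention weighting.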
In the above method, the head detection model and the five sense organs detection model may be implemented by using existing target detection methods, for example, by using a region proposal network (RPN).
In one embodiment, in the model training method, for each detected head, the generating the attention feature vector of the head may include:
Step b1, extracting a corresponding head sub-region feature matrix based on a first head candidate detection frame by using a first region-of-interest extraction layer (ROI Pooling) of the heuristic attention weighting network.
In this step, for each head candidate detection frame detected in step a1, a corresponding head sub-region feature matrix (i.e., a head sub-region feature map) is extracted based on the head candidate detection frame, so as to obtain the head average feature vector of the corresponding head. The first head candidate detection frame represents any one of the head candidate detection frames detected in step a1.
Step b2, performing global average pooling on the head sub-region feature matrix by using a first global pooling layer (Global Pooling) of the heuristic attention weighting network to obtain a corresponding head average feature vector.
Step b3, extracting a corresponding five sense organs sub-region feature matrix based on each five sense organs candidate detection frame in the first head candidate detection frame by utilizing a second region-of-interest extraction layer of the heuristic attention weighting network.
In this step, the sub-region feature matrix of each preset one of the five sense organs in the first head candidate detection frame is extracted; if no candidate detection frame exists for a certain one of the five sense organs, the corresponding sub-region feature matrix does not exist.
Step b4, performing average pooling on each five sense organs sub-region feature matrix by utilizing a second global pooling layer of the heuristic attention weighting network to obtain an average feature vector of the corresponding five sense organs.
In this step, the sub-region feature matrix of each of the five sense organs in the first head candidate detection frame is average-pooled to obtain the average feature vector of the corresponding five sense organs, so that attention weighting processing can be performed based on the average feature vectors to screen out objects falsely detected as human heads in the head detection.
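The region extraction and averaging used for both the head and the five sense organs can be sketched as below. This is a simplification under stated assumptions: a real ROI Pooling layer quantizes the region into a fixed grid of bins and max-pools each bin, whereas this sketch takes a plain crop of the feature map, which is sufficient when a global average follows. The `[channel][row][column]` layout and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]          # (x1, y1, x2, y2) in feature-map cells, end-exclusive
FeatureMap = List[List[List[float]]]     # assumed layout: [channel][row][column]


def roi_crop(feature_map: FeatureMap, box: Box) -> FeatureMap:
    """Extract the sub-region feature matrix enclosed by a candidate frame."""
    x1, y1, x2, y2 = box
    return [[row[x1:x2] for row in channel[y1:y2]] for channel in feature_map]


def global_average_pool(region: FeatureMap) -> List[float]:
    """Average every channel over its spatial extent, giving one value per channel."""
    pooled = []
    for channel in region:
        values = [v for row in channel for v in row]
        if not values:
            raise ValueError("empty region: the candidate frame covers no cells")
        pooled.append(sum(values) / len(values))
    return pooled
```

Applying the two functions to a head candidate frame gives the head average feature vector; applying them to an organ candidate frame gives the average feature vector of that organ, with the same number of channels.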
Step b5, calculating the attention weight vector of each of the five sense organs in the head corresponding to the first head candidate detection frame based on the head average feature vector and the average feature vector of the corresponding five sense organs.
In one embodiment, the attention weight vector of each of the five sense organs in the corresponding head can be calculated as follows:
$$w_i = \begin{cases} m_i \odot h, & \text{if } m_i \text{ exists} \\ \mathbf{0}, & \text{otherwise} \end{cases}$$
wherein $w_i$ represents the attention weight vector of the five sense organs $i$, $m_i$ represents the average feature vector of the five sense organs $i$, $h$ represents the head average feature vector, $\odot$ denotes element-wise multiplication, and $w_i$ has the same dimension as $h$.
In the above calculation method, if an average feature vector exists for one of the five sense organs, that average feature vector is multiplied element-wise by the head average feature vector to obtain the corresponding attention weight vector. If no average feature vector exists for one of the five sense organs, the corresponding attention weight vector is a zero vector. Thus, for an object that is falsely detected as a person's head, since no five sense organs detection frame is detected in its sub-graph, the attention weight vectors of all the five sense organs corresponding to it are zero vectors.
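The rule described above can be sketched in a few lines. The function name `attention_weights`, the organ identifiers, and the use of `None` (or a missing key) to mark an organ without an average feature vector are assumptions for illustration.

```python
from typing import Dict, List, Optional

Vector = List[float]

# Organ set of the preferred embodiment; the identifiers are hypothetical.
ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")


def attention_weights(h: Vector,
                      organ_means: Dict[str, Optional[Vector]]) -> Dict[str, Vector]:
    """Compute one attention weight vector per organ.

    h is the head average feature vector. organ_means maps an organ name
    to its average feature vector, or to None (or no entry) if the organ
    was not detected in the head sub-graph.
    """
    weights = {}
    for name in ORGANS:
        m = organ_means.get(name)
        if m is None:
            weights[name] = [0.0] * len(h)                      # undetected organ
        else:
            weights[name] = [mi * hi for mi, hi in zip(m, h)]   # element-wise product
    return weights
```

Every weight vector has the same dimension as the head average feature vector, and an object with no detected organs receives only zero vectors.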
Step b6, multiplying the head average feature vector element-wise by the attention weight vector of each of the five sense organs, and summing the products to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.
Here, as described in the above steps, the attention weight vectors of the five sense organs of an object that merely resembles a human head are zero vectors, and multiplying them element-wise by the head average feature vector again yields zero vectors. In this way, the attention feature vector of an object that resembles a human head is a zero vector, which increases the difference between human heads and other objects of similar appearance; therefore, objects falsely detected as human heads can be effectively screened out by step b6.
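Steps b5 and b6 together can be sketched as a single function that maps the head average feature vector and the organ average feature vectors to the attention feature vector. The function name `attention_feature` and the use of `None` for an undetected organ are assumptions of this sketch.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def attention_feature(h: Vector, organ_means: Sequence[Optional[Vector]]) -> Vector:
    """Sum of h * w_i over all organs, where w_i = m_i * h (element-wise)
    if organ i was detected and the zero vector otherwise.

    organ_means holds one entry per preset organ; None marks an organ
    for which no candidate detection frame was found.
    """
    feature = [0.0] * len(h)
    for m in organ_means:
        if m is None:
            continue  # zero weight vector contributes nothing to the sum
        w = [mi * hi for mi, hi in zip(m, h)]                            # step b5
        feature = [f + hi * wi for f, hi, wi in zip(feature, h, w)]     # step b6
    return feature
```

With `h = [1.0, 2.0]` and two detected organs whose average feature vectors are `[3.0, 4.0]` and `[1.0, 1.0]`, the result is `[4.0, 20.0]`; when every entry is `None`, as for a plush toy with no detected organs, the result is the zero vector.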
In one embodiment, in the model training method, when the parameters of the crowd detection model are adjusted according to the result output by the classification network, the parameters of the head detection model, the five sense organs detection model, the heuristic attention weighting network and the classification network in the model are optimized and adjusted. The specific adjustment methods are known to those skilled in the art and will not be described in detail herein.
In one embodiment, the heuristic attention weighting network and the classification network may be trained by stochastic gradient descent using a cross-entropy loss function, but the training is not limited thereto.
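The combination of a cross-entropy loss and stochastic gradient descent can be illustrated on a deliberately small stand-in: a single linear layer followed by softmax. This is not the network of the embodiment, and the function names and the learning rate are assumptions; the sketch only shows how one sample's loss and one parameter update are computed.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Vector, label: int) -> float:
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def sgd_step(weight: Matrix, bias: Vector, x: Vector, label: int, lr: float) -> float:
    """One SGD update on one sample; updates weight and bias in place.

    Returns the loss measured before the update.
    """
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]
    probs = softmax(logits)
    loss = cross_entropy(probs, label)
    for k in range(len(weight)):
        # Gradient of cross-entropy with respect to logit k: p_k - 1[k == label].
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(x)):
            weight[k][j] -= lr * grad_logit * x[j]
        bias[k] -= lr * grad_logit
    return loss
```

Calling `sgd_step` repeatedly on the same sample lowers the returned loss, which is the behavior expected of gradient descent on a convex objective with a small learning rate.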
To facilitate a clear understanding of the crowd detection model structure provided by the embodiment of the invention, Fig. 2 shows a network structure diagram of the crowd detection model obtained based on the model training method. As shown in fig. 2, the model includes a head detection model, a five sense organs detection model, a heuristic attention weighting network, and a classification network. In the network structure example, the head detection model and the five sense organs detection model are both implemented by adopting RPN.
Based on the above embodiment of the model training method, the embodiment of the present invention further provides a crowd counting method, as shown in fig. 3, where the crowd counting method includes:
Step 301, obtaining a target detection picture.
Step 302, detecting the head of the person in the target detection picture based on the crowd detection model, and counting the detected head to obtain the number of the person in the target detection picture.
Wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is obtained by training the crowd detection model training method in advance.
In step 302, crowd counting is performed according to the confidence corresponding to each head detected by the crowd detection model, i.e., only the heads whose confidence is greater than the preset threshold are counted. The confidence of a detection result can be calculated by an existing method.
As described in the above analysis, in the crowd detection model used in the step, since the five sense organs detection means are introduced and the heuristic attention mechanism is combined, the false detection result of the head detection can be effectively screened out, so that the detection accuracy of the crowd detection model can be ensured. Therefore, in step 302, the crowd detection model obtained by training in the first embodiment of the present invention is used to detect the head of the person in the target detection picture, and the accuracy of crowd detection can be improved by counting according to the detection result.
Here, the threshold defines the condition that a detected head must satisfy to be included in the count, and a suitable value may be set by a person skilled in the art.
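The counting rule of step 302 amounts to a filtered count, sketched below. The function name `count_heads`, the `(box, confidence)` pair layout, and the default threshold of 0.5 are assumptions; the source leaves the threshold value to the practitioner.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]
Detection = Tuple[Box, float]   # (head detection frame, confidence); assumed layout


def count_heads(detections: List[Detection], threshold: float = 0.5) -> int:
    """Count the detected heads whose confidence is strictly greater than the threshold."""
    return sum(1 for _box, confidence in detections if confidence > threshold)
```

A detection whose confidence equals the threshold is not counted, matching the "greater than" wording of the method.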
Corresponding to the above embodiment of the model training method, the embodiment of the present invention further provides a model training device, as shown in fig. 4, where the device includes:
A sample data acquisition module 401, configured to acquire a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames.
The model training module 402 is configured to train a crowd detection model that is built in advance by using the sample data set to obtain a target crowd detection model; wherein the training comprises:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.
Corresponding to the above embodiment of the crowd counting method, the embodiment of the present invention further provides a crowd counting device, as shown in fig. 5, where the crowd counting device includes:
A detection target obtaining module 501, configured to obtain a target detection picture;
The head detection module 502 is configured to detect the head of a person in the target detection picture based on a crowd detection model, and count the detected head to obtain the number of persons in the target detection picture; wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is obtained by training by adopting the crowd detection model training method.
As can be seen from the above embodiments, in the model training method, while the crowd detection model is trained with the sample pictures, the five sense organs are detected in addition to the head, and the head detection results and the five sense organs detection results are processed jointly by a heuristic attention weighting mechanism, so that an attention feature vector is generated for each detected head. By using the five sense organs detection results together with the heuristic attention weighting mechanism, the distinction between a person's head and other objects of similar appearance can be enhanced. This improves the accuracy of the attention feature vectors input into the classification network, allows false head detections to be screened out, and improves the detection accuracy of the trained crowd detection model. Correspondingly, the accuracy of crowd counting performed with the crowd detection model is also improved.
The crowd detection model provided by the embodiment of the invention can effectively overcome the influence of the shooting angle on detection accuracy, so that the crowd counting method based on it is applicable to a wider range of scenes, such as densely crowded scenes, sparsely crowded scenes, and scenes with diverse shooting angles.
Corresponding to the crowd detection model training method embodiment, the embodiment of the invention also provides crowd detection model training equipment, which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method as described above.
Embodiments of the present invention also provide a computer readable storage medium having stored therein computer readable instructions for performing the crowd detection model training method as described above.
Corresponding to the crowd counting method embodiment, the embodiment of the invention also provides crowd counting equipment which comprises a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method as described above.
Embodiments of the present invention also provide a computer readable storage medium having stored therein computer readable instructions for performing the crowd counting method as described above.
In the above embodiments, the memory may be embodied as any of various storage media, such as an electrically erasable programmable read-only memory (EEPROM), a flash memory, or a programmable read-only memory (PROM). The processor may be implemented to include one or more central processors or one or more field programmable gate arrays, where a field programmable gate array integrates one or more central processor cores. In particular, the central processor or central processor core may be implemented as a CPU or an MCU.
It should be noted that not all the steps and modules in the above processes and structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted as required. The division into modules is merely a functional division adopted for convenience of description; in actual implementation, one module may be implemented by a plurality of modules, the functions of a plurality of modules may be implemented by the same module, and the modules may be located in the same device or in different devices.
The hardware modules in the various embodiments may be implemented mechanically or electronically. For example, a hardware module may include specially designed permanent circuits or logic devices (e.g., special purpose processors such as FPGAs or ASICs) for performing a particular operation. A hardware module may also include programmable logic devices or circuits (e.g., including a general purpose processor or other programmable processor) temporarily configured by software for performing particular operations. Whether a hardware module is implemented mechanically, with dedicated permanent circuits, or with temporarily configured circuits (e.g., configured by software) may be determined by cost and time considerations.
Storage medium implementations for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-Rs, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD+RWs), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or a cloud over a communications network.
In this document, "schematic" means "serving as an example, instance, or illustration," and any illustrations, embodiments described herein as "schematic" should not be construed as a more preferred or advantageous solution. For simplicity of the drawing, the parts relevant to the present invention are shown only schematically in the drawings, and do not represent the actual structure thereof as a product. Additionally, in order to simplify the drawing for ease of understanding, components having the same structure or function in some of the drawings are shown schematically with only one of them, or only one of them is labeled. In this document, "a" does not mean to limit the number of relevant portions of the present invention to "only one thereof", and "an" does not mean to exclude the case where the number of relevant portions of the present invention is "more than one". In this document, "upper", "lower", "front", "rear", "left", "right", "inner", "outer", and the like are used merely to indicate relative positional relationships between the relevant portions, and do not limit the absolute positions of the relevant portions.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A crowd detection model training method, the method comprising:
Acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;
Training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; according to the identification result and the detection frame marked in the sample picture, adjusting parameters of the crowd detection model;
The generating the attention feature vector of the corresponding head includes:
extracting a corresponding head subarea feature matrix based on a first head candidate detection frame by using a first region of interest extraction layer of the heuristic attention weighting network;
Performing global average sampling on the head sub-region feature matrix by using a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector;
Extracting a corresponding five sense organs subarea feature matrix based on each of the five sense organs candidate detection frames in the first head candidate detection frame by using a second region-of-interest extraction layer of the heuristic attention weighting network;
using a second global pooling layer of the heuristic attention weighting network to average sample each feature matrix of the five sense organs subareas to obtain average feature vectors of the corresponding five sense organs;
calculating an attention weight vector of each of the five sense organs in the corresponding head based on the head average feature vector and the average feature vector of the corresponding five sense organs;
And respectively carrying out point multiplication on the head average feature vector and the corresponding attention weight vector of each five sense organs, and summing the point multiplication results to obtain the attention feature vector of the head corresponding to the first head candidate detection frame.
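The pooling and summation steps recited above can be sketched in plain Python. This is an illustrative simplification under stated assumptions, not the claimed network: the region-of-interest extraction is reduced to a direct crop of a feature map indexed as [channel][row][col], boxes are assumed to lie inside the map with exclusive right and bottom edges, "point multiplication" is read as an element-wise product, and the names `roi_average` and `attention_feature` are hypothetical. The attention weight vectors are taken as given inputs here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2), x2 and y2 exclusive
FeatureMap = List[List[List[float]]]   # indexed as [channel][row][col]


def roi_average(feature_map: FeatureMap, box: Box) -> List[float]:
    """Crop the box from every channel and average it to one value per channel."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        raise ValueError("box must have positive area")
    return [
        sum(sum(row[x1:x2]) for row in channel[y1:y2]) / area
        for channel in feature_map
    ]


def attention_feature(head_vec: Sequence[float],
                      weights: Sequence[Sequence[float]]) -> List[float]:
    """Sum, over all sense organs, the element-wise product head_vec * weight."""
    return [sum(h * w[c] for w in weights) for c, h in enumerate(head_vec)]


# Toy feature map with 2 channels of size 4x4 (made-up values)
feature_map = [
    [[1.0] * 4 for _ in range(4)],                     # channel 0: all ones
    [[float(c) for c in range(4)] for _ in range(4)],  # channel 1: column index
]
head_vec = roi_average(feature_map, (0, 0, 4, 4))      # [1.0, 1.5]
organ_vec = roi_average(feature_map, (1, 1, 3, 3))     # [1.0, 1.5]
weights = [[1.0, 2.0], [0.0, 0.0]]                     # placeholder weight vectors
print(attention_feature(head_vec, weights))            # prints [1.0, 3.0]
```

A production region-of-interest layer would typically resample each region to a fixed size before pooling; the direct crop is used here only to keep the sketch short.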
2. The method according to claim 1, wherein the detecting the head of the person and the five sense organs in the sample picture to obtain a head candidate detection frame and a five sense organs candidate detection frame includes:
Detecting the head in the sample picture by using a pre-trained head detection model to obtain a head candidate detection frame;
obtaining a sub-graph of the corresponding head based on the head candidate detection frame;
and detecting each five sense organs in the subgraph by using a five sense organ detection model to obtain the five sense organ candidate detection frame.
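The sub-graph step of this claim can be illustrated as follows. The head and five sense organs detection models themselves are not modeled; the sketch only shows cropping the head region and, on the assumption that later stages need the boxes in full-picture coordinates, shifting a box found inside the sub-graph back. The image layout and the names `crop` and `to_image_coords` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive
Image = List[List[int]]          # grayscale image indexed as [row][col]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-graph of the image covered by the head candidate box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def to_image_coords(organ_box: Box, head_box: Box) -> Box:
    """Shift a box found inside the head sub-graph back to full-picture coordinates."""
    ox1, oy1, ox2, oy2 = organ_box
    hx1, hy1, _, _ = head_box
    return (ox1 + hx1, oy1 + hy1, ox2 + hx1, oy2 + hy1)


# 6x6 toy picture whose pixel value encodes its position as row * 10 + col
image = [[r * 10 + c for c in range(6)] for r in range(6)]
head_box = (2, 1, 5, 4)
sub_graph = crop(image, head_box)
print(sub_graph)                                     # [[12, 13, 14], [22, 23, 24], [32, 33, 34]]
print(to_image_coords((1, 0, 3, 2), head_box))       # (3, 1, 5, 3)
```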
3. The method of claim 1, wherein said calculating an attention weight vector for each of said five sense organs in the respective head comprises:
If an average feature vector exists for a given one of the five sense organs, point-multiplying the average feature vector of that sense organ with the head average feature vector to obtain the corresponding attention weight vector;
if no average feature vector exists for that sense organ, setting the corresponding attention weight vector to zero.
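The rule of this claim can be sketched as below. Two points are assumptions: "point multiplication" is read as an element-wise product, because the claim calls the result a vector (a scalar dot product is another possible reading of the translated term), and an undetected sense organ is represented by `None`. The function name `attention_weight` and the organ keys are hypothetical; the organ list follows claim 5.

```python
from typing import Dict, List, Optional, Sequence

ORGANS = ("left_eye", "right_eye", "left_ear", "right_ear", "mouth")


def attention_weight(head_vec: Sequence[float],
                     organ_vec: Optional[Sequence[float]]) -> List[float]:
    """Element-wise product of organ and head vectors, or zeros if the organ is missing."""
    if organ_vec is None:
        return [0.0] * len(head_vec)
    if len(organ_vec) != len(head_vec):
        raise ValueError("head and organ vectors must have the same length")
    return [h * o for h, o in zip(head_vec, organ_vec)]


# Made-up average feature vectors; only two of the five organs were detected
head_vec = [0.5, 2.0]
organ_vecs: Dict[str, Sequence[float]] = {
    "left_eye": [2.0, 1.0],
    "mouth": [4.0, 0.5],
}
weights = {name: attention_weight(head_vec, organ_vecs.get(name)) for name in ORGANS}
print(weights["left_eye"])   # [1.0, 2.0]
print(weights["mouth"])      # [2.0, 1.0]
print(weights["right_ear"])  # [0.0, 0.0]
```

With zero weights for undetected organs, a head candidate with no detected sense organs contributes nothing to the summed attention feature vector, which is consistent with the stated aim of screening out head-like false detections.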
4. The method of claim 2, wherein said adjusting parameters of said crowd detection model comprises:
Parameters in the head detection model, the five sense organs detection model, the heuristic attention weighting network, and the classification network are adjusted.
5. The method of claim 1, wherein the five sense organs comprise:
Left eye, right eye, left ear, right ear and mouth.
6. A method of crowd counting comprising:
acquiring a target detection picture;
detecting the head of a person in the target detection picture based on a crowd detection model, and counting the detected head to obtain the number of people in the target detection picture;
Wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is trained in advance by any one of the methods of claims 1-5.
7. A crowd detection model training device, comprising:
the sample data acquisition module is used for acquiring a sample data set; wherein, the head and the five sense organs of the person in the sample picture are marked with detection frames;
the model training module is used for training a crowd detection model constructed in advance by utilizing the sample data set to obtain a target crowd detection model; wherein the training comprises:
Detecting the head of a person in the sample picture and the five sense organs to obtain a head candidate detection frame and a five sense organs candidate detection frame;
generating attention feature vectors of the corresponding heads by using a heuristic attention weighting network based on the head candidate detection frame and the five sense organ candidate detection frame; wherein the generating of the attention feature vector of the corresponding head includes:
extracting a corresponding head subarea feature matrix based on a first head candidate detection frame by using a first region of interest extraction layer of the heuristic attention weighting network;
performing global average sampling on the head sub-region feature matrix by using a first global pooling layer of the heuristic attention weighting network to obtain a corresponding head average feature vector;
extracting a corresponding five sense organs subarea feature matrix based on each of the five sense organs candidate detection frames in the first head candidate detection frame by using a second region-of-interest extraction layer of the heuristic attention weighting network;
using a second global pooling layer of the heuristic attention weighting network to average sample each feature matrix of the five sense organs subareas to obtain average feature vectors of the corresponding five sense organs;
calculating an attention weight vector of each of the five sense organs in the corresponding head based on the head average feature vector and the average feature vector of the corresponding five sense organs;
respectively carrying out point multiplication on the head average feature vector and the corresponding attention weight vector of each five sense organs, and summing the point multiplication results to obtain the attention feature vector of the head corresponding to the first head candidate detection frame;
Based on the attention feature vector, identifying authenticity of the corresponding head by using a classification network; and adjusting parameters of the crowd detection model according to the identification result and the detection frame marked in the sample picture.
8. A crowd counting apparatus, comprising:
The detection target acquisition module is used for acquiring a target detection picture;
The head detection module is used for detecting the head in the target detection picture based on the crowd detection model, and counting the detected head to obtain the number of people in the target detection picture; wherein the confidence of the counted head is greater than a preset threshold; the crowd detection model is trained by any one of the methods of claims 1-5.
9. A crowd detection model training device, comprising a processor and a memory;
The memory has stored therein an application executable by the processor for causing the processor to perform the crowd detection model training method of any one of claims 1 to 5.
10. A computer readable storage medium having stored therein computer readable instructions for performing the crowd detection model training method of any one of claims 1 to 5.
11. A crowd counting device comprising a processor and a memory;
the memory has stored therein an application executable by the processor for causing the processor to perform the crowd counting method of claim 6.
12. A computer readable storage medium having stored therein computer readable instructions for performing the crowd counting method of claim 6.
CN202110067279.5A 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device Active CN113822111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067279.5A CN113822111B (en) 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110067279.5A CN113822111B (en) 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device

Publications (2)

Publication Number Publication Date
CN113822111A (en) 2021-12-21
CN113822111B (en) 2024-05-24

Family

ID=78912375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067279.5A Active CN113822111B (en) 2021-01-19 2021-01-19 Crowd detection model training method and device and crowd counting method and device

Country Status (1)

Country Link
CN (1) CN113822111B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303193A (en) * 2015-09-21 2016-02-03 重庆邮电大学 People counting system for processing single-frame image
WO2016183766A1 (en) * 2015-05-18 2016-11-24 Xiaogang Wang Method and apparatus for generating predictive models
CN106960195A (en) * 2017-03-27 2017-07-18 深圳市丰巨泰科电子有限公司 A kind of people counting method and device based on deep learning
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108985256A (en) * 2018-08-01 2018-12-11 曜科智能科技(上海)有限公司 Based on the multiple neural network demographic method of scene Density Distribution, system, medium, terminal
CN111046747A (en) * 2019-11-21 2020-04-21 北京金山云网络技术有限公司 Crowd counting model training method, crowd counting method, device and server
WO2020207038A1 (en) * 2019-04-12 2020-10-15 深圳壹账通智能科技有限公司 People counting method, apparatus, and device based on facial recognition, and storage medium
CN111950507A (en) * 2020-08-25 2020-11-17 北京猎户星空科技有限公司 Data processing and model training method, device, equipment and medium
CN112232140A (en) * 2020-09-25 2021-01-15 浙江远传信息技术股份有限公司 Crowd counting method and device, electronic equipment and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10009579B2 (en) * 2012-11-21 2018-06-26 Pelco, Inc. Method and system for counting people using depth sensor
CN110490177A (en) * 2017-06-02 2019-11-22 腾讯科技(深圳)有限公司 A kind of human-face detector training method and device

Also Published As

Publication number Publication date
CN113822111A (en) 2021-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant