CN112380960A - Crowd counting method, device, equipment and storage medium - Google Patents
Info
- Publication number
- CN112380960A CN112380960A CN202011254152.6A CN202011254152A CN112380960A CN 112380960 A CN112380960 A CN 112380960A CN 202011254152 A CN202011254152 A CN 202011254152A CN 112380960 A CN112380960 A CN 112380960A
- Authority
- CN
- China
- Prior art keywords
- head
- shoulder detection
- frame
- image
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Abstract
The application discloses a crowd counting method, device, equipment and storage medium. The method comprises: sequentially inputting each frame of image in an acquired target video into a preset head and shoulder detection model for head and shoulder detection, and outputting a head and shoulder detection frame for each frame of image; matching the head and shoulder detection frames in two consecutive frames of images, and judging two successfully matched head and shoulder detection frames to be the same target; tracking the same target in the target video to obtain tracking tracks; and counting the number of tracking tracks to obtain the crowd counting result for the target video. This solves the technical problem that existing pedestrian detection methods produce large counting errors when the crowd is dense and pedestrians seriously occlude one another.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for people counting.
Background
After the analog and digital video monitoring eras, video monitoring systems have entered the era of intelligent video monitoring. In an intelligent video monitoring system, crowd density detection is a core task. In scenes such as parks and stations in particular, crowd images are collected through cameras and the number of people is rapidly analyzed and counted, so that an alarm can be raised for high-density crowd scenes to avoid safety accidents such as overcrowding and even stampedes.
In the prior art, people are counted by pedestrian detection. This approach suffers from large detection errors when the crowd is dense and pedestrians seriously occlude one another.
Disclosure of Invention
The application provides a crowd counting method, device, equipment and storage medium, which are used for solving the technical problem that existing pedestrian detection methods produce large detection errors when the crowd is dense and pedestrians seriously occlude one another.
In view of the above, the present application provides, in a first aspect, a crowd counting method, including:
sequentially inputting each frame of image in the acquired target video to a preset head and shoulder detection model for head and shoulder detection, and outputting a head and shoulder detection frame of each frame of image;
matching each head and shoulder detection frame in two continuous frames of the images, and judging that the two successfully matched head and shoulder detection frames are the same target;
tracking the same target in the target video to obtain a tracking track;
and calculating the number of the tracking tracks to obtain the people counting result in the target video.
Optionally, the preset head and shoulder detection model includes: a feature map reduction module and a multi-scale receptive field expansion module connected to the feature map reduction module;
correspondingly, the sequentially inputting each frame of image in the acquired target video into a preset head and shoulder detection model for head and shoulder detection and outputting a head and shoulder detection frame for each frame of image includes:
sequentially inputting each frame of image in the acquired target video into the preset head and shoulder detection model, so that the feature map reduction module performs feature extraction on the input image and reduces the size of the extracted feature map, and the multi-scale receptive field expansion module performs multi-scale processing on the reduced feature map, performs head and shoulder detection frame prediction based on the extracted multi-scale features, and outputs a head and shoulder detection frame for each frame of image.
Optionally, the feature map reduction module includes: a first convolutional layer, a second convolutional layer, a third convolutional layer and a fourth convolutional layer;
wherein the convolution kernel size of the first convolution layer is 7 × 7, and the convolution kernel size of the second convolution layer, the third convolution layer, and the fourth convolution layer is 3 × 3.
Optionally, the multi-scale receptive field expansion module includes: an Inception layer, a convolution layer and 3 prediction layers.
Optionally, the matching each of the head and shoulder detection frames in the two consecutive frames of images, and determining that the two successfully matched head and shoulder detection frames are the same target, includes:
calculating the intersection ratio between the head and shoulder detection frames in two consecutive frames of the images, and when the maximum intersection ratio is greater than a preset threshold value, successfully matching the two head and shoulder detection frames corresponding to the maximum intersection ratio;
and judging that the two head and shoulder detection frames which are successfully matched are the same target.
A second aspect of the present application provides a people counting device comprising:
the output unit is used for sequentially inputting each frame of image in the acquired target video into a preset head and shoulder detection model for head and shoulder detection and outputting a head and shoulder detection frame of each frame of image;
the matching unit is used for matching each head and shoulder detection frame in the two continuous frames of images and judging that the two successfully matched head and shoulder detection frames are the same target;
the tracking unit is used for tracking the same target in the target video to obtain a tracking track;
and the calculating unit is used for calculating the number of the tracking tracks to obtain the people counting result in the target video.
Optionally, the preset head and shoulder detection model includes: a feature map reduction module and a multi-scale receptive field expansion module connected to the feature map reduction module;
correspondingly, the output unit is specifically configured to:
sequentially inputting each frame of image in the acquired target video to a preset head and shoulder detection model, enabling the feature map reduction module to perform feature extraction on the input image and reduce the size of the extracted feature map, performing multi-scale processing on the reduced feature map by the multi-scale receptive field expansion module, performing head and shoulder detection frame prediction based on the extracted multi-scale features, and outputting a head and shoulder detection frame of each frame of image.
Optionally, the matching unit is specifically configured to:
calculating the intersection ratio between the head and shoulder detection frames in two consecutive frames of the images, and when the maximum intersection ratio is greater than a preset threshold value, successfully matching the two head and shoulder detection frames corresponding to the maximum intersection ratio;
and judging that the two head and shoulder detection frames which are successfully matched are the same target.
A third aspect of the application provides a people counting device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the people counting method according to any of the first aspect according to instructions in the program code.
A fourth aspect of the present application provides a computer readable storage medium for storing program code for performing the people counting method of any one of the first aspect.
According to the technical scheme, the method has the following advantages:
the application provides a crowd counting method, which comprises the following steps: sequentially inputting each frame of image in the acquired target video to a preset head and shoulder detection model for head and shoulder detection, and outputting a head and shoulder detection frame of each frame of image; matching each head and shoulder detection frame in two continuous frames of images, and judging that the two successfully matched head and shoulder detection frames are the same target; tracking the same target in the target video to obtain a tracking track; and calculating the number of the tracking tracks to obtain the people counting result in the target video.
According to the method and the device, the head and the shoulder of each frame of image in the target video are detected through the preset head and shoulder detection model, so that false detection and missing detection caused by mutual shielding of people are avoided; the method comprises the steps of matching head and shoulder detection frames in two continuous frames of images, determining the head and shoulder detection frames belonging to the same target in the two continuous frames of images, tracking the head and shoulder detection frames, and finally determining the people counting result in a target video through the number of tracking tracks.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a crowd counting method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a preset head and shoulder detection model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a crowd counting apparatus according to an embodiment of the present disclosure.
Detailed Description
The application provides a crowd counting method, device, equipment and storage medium, which are used for solving the technical problem that existing pedestrian detection methods produce large detection errors when the crowd is dense and pedestrians seriously occlude one another.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, referring to fig. 1, the present application provides an embodiment of a people counting method, comprising:
and 101, sequentially inputting each frame of image in the acquired target video into a preset head and shoulder detection model for head and shoulder detection, and outputting a head and shoulder detection frame of each frame of image.
Video of a crowd-dense area is captured by a camera to obtain the target video, and the target video is split into individual frames of image.
Existing pedestrian detection networks are large and costly in computational resources. To address this, the preset head and shoulder detection model in the embodiment of the application is a lightweight neural network model consisting mainly of two parts: a feature map reduction module and a multi-scale receptive field expansion module. Each frame of image in the target video is input into the preset head and shoulder detection model in sequence, so that the feature map reduction module extracts features from the input image and reduces the size of the extracted feature map, and the multi-scale receptive field expansion module performs multi-scale processing on the reduced feature map, predicts head and shoulder detection frames based on the extracted multi-scale features, and outputs a head and shoulder detection frame for each frame of image.
Further, the preset head and shoulder detection model can be structured as shown in fig. 2. The feature map reduction module rapidly reduces the spatial size of the feature map and increases network speed. It comprises a first convolutional layer, a second convolutional layer, a third convolutional layer and a fourth convolutional layer; the convolution kernel size of the first convolutional layer Conv1 is 7 × 7, and the convolution kernel sizes of the second convolutional layer Conv2, the third convolutional layer Conv3 and the fourth convolutional layer Conv4 are 3 × 3.
The feature map reduction module first uses a convolutional layer with a 7 × 7 kernel and a stride of 4 to rapidly shrink the input image, greatly reducing the size of the feature maps processed by subsequent convolutional layers and thus the amount of computation. At the same time, the 7 × 7 kernel has a relatively large parameter count and receptive field, so the extracted features are richer, which compensates for the feature information lost through the rapid size reduction. After the first convolution, a convolutional layer with a 3 × 3 kernel and a stride of 2 further reduces the feature size. A convolutional layer with a 3 × 3 kernel and a stride of 1 follows, which on one hand slows the loss of feature information and on the other hand deepens the network so that it extracts more accurate deep features. Finally, the fourth convolutional layer Conv4 quickly reduces the feature map size to 1/16 of the input.
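The stated 1/16 overall reduction can be sanity-checked with standard convolution output-size arithmetic. The sketch below is an illustration only: the input resolution (224) and the per-layer strides and paddings (stride 4/2/1/2 with "same"-style padding) are assumptions inferred from the description, not values taken from the patent figure.

```python
def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical input resolution; the patent does not fix one.
size = 224
# (name, kernel, stride, pad) per layer; strides chosen so Conv4's output is 1/16 of the input.
layers = [("Conv1", 7, 4, 3), ("Conv2", 3, 2, 1), ("Conv3", 3, 1, 1), ("Conv4", 3, 2, 1)]
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    print(name, size)  # 56, 28, 28, 14

assert size == 224 // 16  # overall 1/16 spatial reduction, as the description states
```

With these assumed strides the feature map shrinks 224 → 56 → 28 → 28 → 14, matching the claimed 1/16 factor.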
This rapid feature map reduction greatly increases speed while limiting the precision loss caused by discarded feature information, so the model runs fast yet remains accurate. In addition, the network sets the numbers of convolution kernels of Conv1, Conv2, Conv3 and Conv4 to 12, 24 and 48 respectively, reducing parameter redundancy and further improving operation efficiency.
Further, referring to fig. 2, the multi-scale receptive field expansion module includes: an Inception layer, a convolutional layer (a dilated convolutional layer) and 3 prediction layers. The multi-scale receptive field expansion module enlarges the receptive field associated with the target and, in combination with multi-scale receptive fields, provides rich contextual semantic information for the head-shoulder target features. The module in the embodiment of the application uses a dilation rate suited to the distribution of head-shoulder data (a dilation rate of 3 is preferred), which greatly reduces the precision loss caused by multi-branch dilated convolutional layers; the dilated convolutional layer with a dilation rate of 3 enlarges the scale of the receptive field.
After receptive field enlargement and multi-scale feature generation, the multi-scale receptive field expansion module predicts head-shoulder targets separately on 3 different convolutional layers: the 1st prediction layer is set after the Inception layer, the second after Conv6_1, and the third after Conv9_1, with prior boxes of different scales. Because the aspect ratio of head-shoulder targets in head-shoulder detection is close to 1:1, the embodiment of the application uses prior boxes with an aspect ratio of 1:1 to regress the targets efficiently and save computation. Layered prediction combined with the multi-scale prior box design effectively improves the robustness of the detector. The loss functions of the preset head and shoulder detection model in the embodiment of the application comprise a Softmax loss and a Smooth L1 loss: the Softmax loss computes the loss on the predicted target category, and the Smooth L1 loss regresses the predicted detection boxes against the actual ones.
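The box-regression loss mentioned above can be made concrete. This is a minimal pure-Python sketch of the standard Smooth L1 definition applied to one coordinate residual; it illustrates the textbook function, not code from the patent.

```python
def smooth_l1(x: float) -> float:
    """Smooth L1 loss on a residual x: quadratic near zero (stable gradients),
    linear for large residuals (robust to outlier boxes)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

# Residuals between a predicted and a ground-truth box coordinate.
print(smooth_l1(0.5))   # 0.125 (quadratic branch)
print(smooth_l1(2.0))   # 1.5   (linear branch)
```

In training, this is summed over the four box offsets of each matched prior box, while Softmax cross-entropy scores the head-shoulder/background classification.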
For the head and shoulder detection frame results of the three prediction layers, non-maximum suppression is used to screen the head and shoulder detection frames and output the optimal ones.
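The screening step can be sketched as greedy non-maximum suppression over scored boxes. The `(x1, y1, x2, y2)` box format and the 0.5 overlap threshold are illustrative assumptions; the patent does not specify them.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep boxes in descending score order, dropping any box that overlaps
    an already-kept box by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the second box overlaps the first heavily and is suppressed
```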
And 102, matching each head and shoulder detection frame in two continuous frames of images, and judging that the two successfully matched head and shoulder detection frames are the same target.
After detection, each head-shoulder target in the current frame corresponds to one head and shoulder detection frame, but the detector's finite precision means the results may contain missed and false detections. Therefore, to improve the accuracy of the crowd counting method, a multi-target tracking algorithm is added on top of detection to correct the detection results, yielding the tracking track of each head-shoulder target across consecutive video frames.
The embodiment of the application uses the IoU (intersection over union) between the head and shoulder detection frames of the preceding and following frames as the association criterion, directly matching all head and shoulder detection frames between the two frames without modeling the appearance of the detected targets or predicting motion trajectories.
Further, the matching process may be: calculating the intersection over union between the head and shoulder detection frames of two consecutive frames; when the maximum intersection over union exceeds a preset threshold, the two head and shoulder detection frames corresponding to that maximum are successfully matched. Specifically, the IoU between each head and shoulder detection frame in the current frame and each head and shoulder detection frame in the previous frame is calculated; for each tracked target, the detection frame in the current frame with the maximum IoU against the target's previous position is selected, and if that maximum IoU is larger than the preset threshold, the two head and shoulder detection frames corresponding to it are judged to match and to be the same target; otherwise the matching fails.
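The greedy IoU association between consecutive frames can be sketched as follows. The 0.3 matching threshold is an illustrative assumption, since the patent leaves the preset threshold unspecified.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def match_frames(prev_boxes, curr_boxes, thresh=0.3):
    """For each tracked box from the previous frame, pick the unused current-frame
    box with the highest IoU; accept the pair as the same target only if that
    maximum IoU exceeds thresh."""
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        best_j, best_iou = -1, 0.0
        for j, c in enumerate(curr_boxes):
            if j not in used:
                v = iou(p, c)
                if v > best_iou:
                    best_j, best_iou = j, v
        if best_j >= 0 and best_iou > thresh:
            matches.append((i, best_j))
            used.add(best_j)
    return matches

prev = [(0, 0, 10, 10), (50, 50, 60, 60)]
curr = [(2, 2, 12, 12), (80, 80, 90, 90)]
print(match_frames(prev, curr))  # → [(0, 0)]: only the first target is re-identified
```

Note that, as the text says, no appearance features or motion prediction are involved: overlap alone decides identity.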
And 103, tracking the same target in the target video to obtain a tracking track.
The same targets in the target video are tracked, and each target yields a tracking track (tracklet). If a tracklet fails to match, the target is considered to have left the scene. If a head and shoulder detection frame matches no tracklet, it is considered a newly appearing target and a new tracklet is created for it.
The embodiment of the application tracks the head and shoulder detection frames as follows: when the same target is detected in N consecutive frames (e.g. 3 consecutive frames), tracking of the target starts; if the target is then not detected in M consecutive frames after its last detection (e.g. 10 consecutive frames), tracking ends.
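This birth/death policy can be sketched as a small per-track state machine. N=3 and M=10 follow the preferred values given in the text; the bookkeeping fields (`hits`, `misses`, `confirmed`, `finished`) are illustrative names, not from the patent.

```python
class Track:
    """One head-shoulder target: confirmed after N consecutive detections,
    terminated after M consecutive missed frames."""
    def __init__(self, n_confirm: int = 3, m_terminate: int = 10):
        self.hits = 0          # consecutive frames in which the target was detected
        self.misses = 0        # consecutive frames since the last detection
        self.confirmed = False # tracking has started
        self.finished = False  # tracking has ended

        self.n_confirm = n_confirm
        self.m_terminate = m_terminate

    def update(self, detected: bool) -> None:
        if self.finished:
            return
        if detected:
            self.hits += 1
            self.misses = 0
            if self.hits >= self.n_confirm:
                self.confirmed = True
        else:
            self.misses += 1
            if self.misses >= self.m_terminate:
                self.finished = True

t = Track()
for _ in range(3):        # detected in 3 consecutive frames -> tracking starts
    t.update(True)
print(t.confirmed)        # True
for _ in range(10):       # missed in 10 consecutive frames -> tracking ends
    t.update(False)
print(t.finished)         # True
```

Counting the confirmed tracks created over the whole video then yields the crowd count of step 104.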
And step 104, calculating the number of the tracking tracks to obtain the people counting result in the target video.
Finally, the number of people in the target video is determined from the number of tracking tracks. Compared with a crowd counting strategy based on single-frame target detection, this tracking-plus-detection strategy further improves the precision and robustness of crowd counting.
In the embodiment of the application, the head and the shoulder of each frame of image in the target video are detected through the preset head and shoulder detection model, so that false detection and missing detection caused by mutual shielding of people are avoided; the method comprises the steps of matching head and shoulder detection frames in two continuous frames of images, determining the head and shoulder detection frames belonging to the same target in the two continuous frames of images, tracking the head and shoulder detection frames, and finally determining the people counting result in a target video through the number of tracking tracks.
The above is a crowd counting method provided by the present application, and the following is a crowd counting device provided by the embodiment of the present application.
Referring to fig. 3, an embodiment of a crowd counting apparatus provided in the present application includes:
the output unit 201 is configured to sequentially input each frame of image in the acquired target video to a preset head and shoulder detection model for head and shoulder detection, and output a head and shoulder detection frame of each frame of image;
a matching unit 202, configured to match each head and shoulder detection frame in two consecutive frames of images, and determine that two successfully matched head and shoulder detection frames are the same target;
the tracking unit 203 is configured to track the same target in the target video to obtain a tracking track;
and the calculating unit 204 is used for calculating the number of the tracking tracks to obtain the people counting result in the target video.
As a further improvement, the preset head and shoulder detection model comprises: a feature map reduction module and a multi-scale receptive field expansion module connected to the feature map reduction module;
correspondingly, the output unit 201 is specifically configured to:
and sequentially inputting each frame of image in the acquired target video into a preset head and shoulder detection model, so that a feature map reduction module performs feature extraction on the input image and reduces the size of the extracted feature map, a multi-scale receptive field expansion module performs multi-scale processing on the reduced feature map, performs head and shoulder detection frame prediction based on the extracted multi-scale features, and outputs a head and shoulder detection frame of each frame of image.
As a further improvement, the matching unit 202 is specifically configured to:
calculating the intersection ratio between the head and shoulder detection frames in two consecutive frames of images, and when the maximum intersection ratio is greater than a preset threshold value, successfully matching the two head and shoulder detection frames corresponding to the maximum intersection ratio;
and judging that the two head and shoulder detection frames which are successfully matched are the same target.
The embodiment of the application also provides crowd counting equipment, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is adapted to perform the people counting method of the aforementioned embodiments of the people counting method according to instructions in the program code.
An embodiment of the present application further provides a computer-readable storage medium, which is used for storing program codes, and the program codes are used for executing the crowd counting method in the aforementioned crowd counting method embodiment.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for executing all or part of the steps of the method described in the embodiments of the present application through a computer device (which may be a personal computer, a server, or a network device). And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A crowd counting method, comprising:
sequentially inputting each frame of image in the acquired target video to a preset head and shoulder detection model for head and shoulder detection, and outputting a head and shoulder detection frame of each frame of image;
matching each head and shoulder detection frame in two continuous frames of the images, and judging that the two successfully matched head and shoulder detection frames are the same target;
tracking the same target in the target video to obtain a tracking track;
and calculating the number of the tracking tracks to obtain the people counting result in the target video.
2. The people counting method of claim 1, wherein the preset head and shoulder detection model comprises: a feature map reduction module and a multi-scale receptive field expansion module connected to the feature map reduction module;
correspondingly, the sequentially inputting each frame of image in the acquired target video into a preset head and shoulder detection model for head and shoulder detection and outputting the head and shoulder detection frames of each frame of image comprises:
sequentially inputting each frame of image in the acquired target video into the preset head and shoulder detection model, so that the feature map reduction module performs feature extraction on the input image and reduces the size of the extracted feature map, the multi-scale receptive field expansion module performs multi-scale processing on the reduced feature map and performs head and shoulder detection frame prediction based on the extracted multi-scale features, and the head and shoulder detection frames of each frame of image are output.
3. The people counting method of claim 2, wherein the feature map reduction module comprises: a first convolution layer, a second convolution layer, a third convolution layer and a fourth convolution layer;
wherein the convolution kernel size of the first convolution layer is 7 × 7, and the convolution kernel sizes of the second, third and fourth convolution layers are 3 × 3.
4. The people counting method of claim 2, wherein the multi-scale receptive field expansion module comprises: an Inception layer, a convolution layer and 3 prediction layers.
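As a numeric illustration of claim 3: the claim fixes the kernel sizes (one 7 × 7 convolution followed by three 3 × 3 convolutions) but not the strides or padding, so the stride-2, half-kernel-padding choices in the sketch below are assumptions made only to show how such a module could shrink the feature map before multi-scale processing.

```python
# Output-size arithmetic for a feature map reduction module of the
# claimed shape. Kernel sizes (7x7 then three 3x3) come from claim 3;
# stride 2 and half-kernel padding per layer are assumptions.

def conv_out(size, kernel, stride, padding):
    # standard convolution output-size formula
    return (size + 2 * padding - kernel) // stride + 1

def reduction_module_out(size):
    size = conv_out(size, kernel=7, stride=2, padding=3)  # first conv layer
    for _ in range(3):                                    # second to fourth layers
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size

# under these assumptions a 512x512 input is reduced to 32x32 (16x reduction)
print(reduction_module_out(512))  # -> 32
```

With these assumed strides, each layer halves the spatial size, which is consistent with the module's stated purpose of reducing the feature map before the multi-scale receptive field expansion module processes it.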
5. The people counting method of claim 1, wherein the matching the head and shoulder detection frames between two consecutive frames of the images and determining that two successfully matched head and shoulder detection frames correspond to the same target comprises:
calculating the intersection-over-union ratio between the head and shoulder detection frames in two consecutive frames of the images, and when the maximum intersection-over-union ratio is greater than a preset threshold, determining that the two head and shoulder detection frames corresponding to the maximum ratio are successfully matched;
and determining that the two successfully matched head and shoulder detection frames correspond to the same target.
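The matching-and-counting flow of claims 1 and 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the box format `(x1, y1, x2, y2)`, the greedy per-track matching order, and the 0.5 threshold are assumptions not fixed by the claims.

```python
# Sketch of IoU-based matching across consecutive frames and trajectory
# counting, per claims 1 and 5. Box format (x1, y1, x2, y2), greedy
# matching order, and the 0.5 threshold are assumptions.

def iou(a, b):
    """Intersection-over-union ("intersection ratio") of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def count_people(frames, thr=0.5):
    """frames: per-frame lists of head-shoulder boxes; returns track count."""
    tracks = []  # each track holds the boxes of one target
    for boxes in frames:
        unmatched = list(boxes)
        for track in tracks:
            if not unmatched:
                break
            # match each track to the detection with the maximum IoU,
            # but only when that maximum exceeds the preset threshold
            best = max(unmatched, key=lambda b: iou(track[-1], b))
            if iou(track[-1], best) > thr:
                track.append(best)
                unmatched.remove(best)
        # detections matched to no existing track start new trajectories
        tracks.extend([b] for b in unmatched)
    return len(tracks)  # people count = number of trajectories
```

For example, two targets whose boxes drift by roughly a pixel per frame keep a high overlap with their own previous box and a zero overlap with the other's, so `count_people` returns 2 for them rather than counting each detection separately.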
6. A people counting device, comprising:
an output unit, configured to sequentially input each frame of image in an acquired target video into a preset head and shoulder detection model for head and shoulder detection, and output the head and shoulder detection frames of each frame of image;
a matching unit, configured to match the head and shoulder detection frames between two consecutive frames of the images, and determine that two successfully matched head and shoulder detection frames correspond to the same target;
a tracking unit, configured to track the same target through the target video to obtain a tracking track;
and a calculating unit, configured to count the number of tracking tracks to obtain a people counting result for the target video.
7. The people counting device of claim 6, wherein the preset head and shoulder detection model comprises: a feature map reduction module and a multi-scale receptive field expansion module connected to the feature map reduction module;
correspondingly, the output unit is specifically configured to:
sequentially input each frame of image in the acquired target video into the preset head and shoulder detection model, so that the feature map reduction module performs feature extraction on the input image and reduces the size of the extracted feature map, the multi-scale receptive field expansion module performs multi-scale processing on the reduced feature map and performs head and shoulder detection frame prediction based on the extracted multi-scale features, and the head and shoulder detection frames of each frame of image are output.
8. The people counting device of claim 6, wherein the matching unit is specifically configured to:
calculate the intersection-over-union ratio between the head and shoulder detection frames in two consecutive frames of the images, and when the maximum intersection-over-union ratio is greater than a preset threshold, determine that the two head and shoulder detection frames corresponding to the maximum ratio are successfully matched;
and determine that the two successfully matched head and shoulder detection frames correspond to the same target.
9. A people counting device, wherein the device comprises a processor and a memory;
the memory is configured to store program code and transmit the program code to the processor;
and the processor is configured to perform the people counting method according to any one of claims 1 to 5 in accordance with instructions in the program code.
10. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store program code for performing the people counting method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011254152.6A CN112380960A (en) | 2020-11-11 | 2020-11-11 | Crowd counting method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112380960A | 2021-02-19 |
Family
ID=74582675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011254152.6A Pending CN112380960A (en) | 2020-11-11 | 2020-11-11 | Crowd counting method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380960A (en) |
2020-11-11: Application CN202011254152.6A filed; published as CN112380960A; status Pending.
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697499A (en) * | 2017-10-24 | 2019-04-30 | 北京京东尚科信息技术有限公司 | Pedestrian's flow funnel generation method and device, storage medium, electronic equipment |
CN109284670A (en) * | 2018-08-01 | 2019-01-29 | 清华大学 | A kind of pedestrian detection method and device based on multiple dimensioned attention mechanism |
CN110738160A (en) * | 2019-10-12 | 2020-01-31 | 成都考拉悠然科技有限公司 | human face quality evaluation method combining with human face detection |
CN111611878A (en) * | 2020-04-30 | 2020-09-01 | 杭州电子科技大学 | Method for crowd counting and future people flow prediction based on video image |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113033353A (en) * | 2021-03-11 | 2021-06-25 | 北京文安智能技术股份有限公司 | Pedestrian trajectory generation method based on overlook image, storage medium and electronic device |
CN113128430A (en) * | 2021-04-25 | 2021-07-16 | 科大讯飞股份有限公司 | Crowd gathering detection method and device, electronic equipment and storage medium |
CN113128430B (en) * | 2021-04-25 | 2024-06-04 | 科大讯飞股份有限公司 | Crowd gathering detection method, device, electronic equipment and storage medium |
CN114119648A (en) * | 2021-11-12 | 2022-03-01 | 史缔纳农业科技(广东)有限公司 | Pig counting method for fixed channel |
CN113988111A (en) * | 2021-12-03 | 2022-01-28 | 深圳佑驾创新科技有限公司 | Statistical method for pedestrian flow of public place and computer readable storage medium |
CN114463378A (en) * | 2021-12-27 | 2022-05-10 | 浙江大华技术股份有限公司 | Target tracking method, electronic device and storage medium |
CN114463378B (en) * | 2021-12-27 | 2023-02-24 | 浙江大华技术股份有限公司 | Target tracking method, electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112380960A (en) | Crowd counting method, device, equipment and storage medium | |
CN110942009B (en) | Fall detection method and system based on space-time hybrid convolutional network | |
CN109272509B (en) | Target detection method, device and equipment for continuous images and storage medium | |
Haines et al. | Background subtraction with Dirichlet processes |
CN111539290B (en) | Video motion recognition method and device, electronic equipment and storage medium | |
CN110245579B (en) | People flow density prediction method and device, computer equipment and readable medium | |
CN103929685A (en) | Video abstract generating and indexing method | |
CN111104925B (en) | Image processing method, image processing apparatus, storage medium, and electronic device | |
CN107563299B (en) | Pedestrian detection method using RecNN to fuse context information | |
CN109446967B (en) | Face detection method and system based on compressed information | |
CN110633643A (en) | Abnormal behavior detection method and system for smart community | |
CN111652181B (en) | Target tracking method and device and electronic equipment | |
CN112016461A (en) | Multi-target behavior identification method and system | |
CN109697393B (en) | Person tracking method, person tracking device, electronic device, and computer-readable medium | |
CN114926791A (en) | Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment | |
CN111950507B (en) | Data processing and model training method, device, equipment and medium | |
CN110956097A (en) | Method and module for extracting occluded human body and method and device for scene conversion | |
CN113642442B (en) | Face detection method and device, computer readable storage medium and terminal | |
CN111383245A (en) | Video detection method, video detection device and electronic equipment | |
CN115690732A (en) | Multi-target pedestrian tracking method based on fine-grained feature extraction | |
CN112907623A (en) | Statistical method and system for moving object in fixed video stream | |
CN112966136A (en) | Face classification method and device | |
CN113554685A (en) | Method and device for detecting moving target of remote sensing satellite, electronic equipment and storage medium | |
CN104732558B (en) | moving object detection device | |
CN112598707A (en) | Real-time video stream object detection and tracking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||