CN110020572B - People counting method, device and equipment based on video image and storage medium - Google Patents

People counting method, device and equipment based on video image and storage medium

Info

Publication number
CN110020572B
CN110020572B CN201810014369.6A CN201810014369A CN110020572B CN 110020572 B CN110020572 B CN 110020572B CN 201810014369 A CN201810014369 A CN 201810014369A CN 110020572 B CN110020572 B CN 110020572B
Authority
CN
China
Prior art keywords
image
pixel
acquiring
shooting scene
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810014369.6A
Other languages
Chinese (zh)
Other versions
CN110020572A (en
Inventor
胡香敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BYD Co Ltd
Original Assignee
BYD Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BYD Co Ltd filed Critical BYD Co Ltd
Priority to CN201810014369.6A priority Critical patent/CN110020572B/en
Publication of CN110020572A publication Critical patent/CN110020572A/en
Application granted granted Critical
Publication of CN110020572B publication Critical patent/CN110020572B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a method and a device for counting people based on video images. The method comprises the following steps: acquiring setting parameters of a camera device in a shooting scene; acquiring, according to the setting parameters, an imaging weight matrix of a human body when the human body is imaged in the shooting scene; acquiring a pixel matrix of an image mask of the shooting scene; acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix; and acquiring the statistical number of people in the shooting scene according to the current weighted pixel value and a mapping relation between preset weighted pixel values and numbers of people. The method can effectively improve applicability. In addition, it avoids the computational complexity of feature detection, reduces the demand on system resources, and thereby improves the real-time performance of the system.

Description

People counting method, device and equipment based on video image and storage medium
Technical Field
The invention relates to the technical field of video image processing, in particular to a people counting method and device based on video images.
Background
With the continuous development of video image processing technology, the real-time people counting can be carried out on places such as supermarkets, banks, subways and the like through video monitoring, so that the statistical information such as passenger flow distribution, crowding degree and the like can be obtained, and effective reference data can be provided for works such as public area management, resource scheduling and the like.
Conventionally, people counting is performed by extracting features of pedestrians from an image captured by an imaging device, for example head-shoulder shape features, face features, or Histogram of Oriented Gradients (HOG) features of pedestrians. Specifically, feature extraction and pedestrian positioning can be realized through machine learning. For example, a convolutional neural network can be used to train a pedestrian image detector; an image acquired by the camera device is then input into the trained detector to obtain the positions of pedestrians, from which the number of people is counted.
However, the features selected by a machine learning method differ greatly between shooting scenes, so a person in the image may not be recognized. For example, a detection algorithm based on face features may miss a person who wears a mask or whose cap covers key features of the face. Therefore, when the shooting scene changes, samples must be collected again and the pedestrian image detector retrained before the system can work normally in the new scene, which is a large workload. In addition, feature detection has high computational complexity: it needs to extract a Region of Interest (ROI) and run a feature extraction algorithm and a classification algorithm, which places a high demand on computing resources and makes real-time operation difficult to achieve.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide a method for counting people based on video images. In a different shooting scene, a new imaging weight matrix can be calculated from only the installation height, the installation inclination angle, and the field angle of the camera device, so the method adapts to the new scene without extra work such as resampling and retraining a detector, which reduces workload and effectively improves the applicability of the method. Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted summation over the image mask of the current shooting scene, and the statistical number of people in the shooting scene is then obtained by looking up the current weighted pixel value in the pre-established mapping relation between preset weighted pixel values and numbers of people. The operation is simple, avoids the computational complexity of feature detection, reduces the demand on system resources, and improves the real-time performance of the system.
The second purpose of the invention is to provide a people counting device based on video images.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, a first embodiment of the present invention provides a method for counting people based on video images, including:
acquiring setting parameters of a camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;
acquiring an imaging weight matrix of a human body when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;
acquiring a pixel matrix of an image mask of the shooting scene; the elements in the pixel matrix correspond to the pixel points in the image one by one, and the value of each element is the pixel value of the corresponding pixel point;
acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;
and acquiring the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
According to the method for counting the number of people based on the video images, the setting parameters of the camera device in a shooting scene are obtained; acquiring an imaging weight matrix of a human body when the human body is imaged in a shooting scene according to the setting parameters; acquiring a pixel matrix of an image mask of a shooting scene; acquiring a current weighted pixel value of a shooting scene according to a pixel matrix and an imaging weight matrix of an image mask; and acquiring the number of the statistical people of the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people. In the embodiment, under different shooting scenes, a new imaging weight matrix can be obtained by calculation only by knowing the installation height, the installation inclination angle and the field angle of the camera, so that the method is suitable for the new scene, the extra work of resampling, retraining the detector and the like is avoided, the workload is reduced, and meanwhile, the applicability of the method can be effectively improved. Furthermore, the current weighted pixel value of the shooting scene is obtained by weighting and summing the image mask of the current shooting scene, then the number of the statistical people of the shooting scene is obtained by inquiring the mapping relation between the preset weighted pixel value and the number of people, which is established in advance, of the current weighted pixel value, the operation is simple, the operation complexity caused by feature detection is avoided, the system resource requirement is reduced, and the real-time performance of the system is improved.
In order to achieve the above object, a second embodiment of the present invention provides a people counting device based on video images, including:
the parameter acquisition module is used for acquiring the setting parameters of the camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;
the weight matrix acquisition module is used for acquiring an imaging weight matrix when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;
the pixel matrix acquisition module is used for acquiring a pixel matrix of an image mask of the shooting scene;
the pixel value acquisition module is used for acquiring the current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;
and the number obtaining module is used for obtaining the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
According to the device for counting the number of people based on the video images, the setting parameters of the camera device in a shooting scene are obtained; acquiring an imaging weight matrix of a human body when the human body is imaged in a shooting scene according to the setting parameters; acquiring a pixel matrix of an image mask of a shooting scene; acquiring a current weighted pixel value of a shooting scene according to a pixel matrix and an imaging weight matrix of an image mask; and acquiring the number of the statistical people of the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people. In the embodiment, under different shooting scenes, a new imaging weight matrix can be obtained by calculation only by knowing the installation height, the installation inclination angle and the field angle of the camera, so that the method is suitable for the new scene, the extra work of resampling, retraining the detector and the like is avoided, the workload is reduced, and meanwhile, the applicability of the method can be effectively improved. Furthermore, the current weighted pixel value of the shooting scene is obtained by weighting and summing the image mask of the current shooting scene, then the number of the statistical people of the shooting scene is obtained by inquiring the mapping relation between the preset weighted pixel value and the number of people, which is established in advance, of the current weighted pixel value, the operation is simple, the operation complexity caused by feature detection is avoided, the system resource requirement is reduced, and the real-time performance of the system is improved.
To achieve the above object, a third embodiment of the present invention provides a computer device, including: a processor and a memory;
wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the video image-based people counting method according to the embodiment of the first aspect of the present invention.
In order to achieve the above object, a fourth embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements a video image-based people counting method according to an embodiment of the first aspect of the present invention.
In order to achieve the above object, a fifth embodiment of the present invention provides a computer program product, wherein instructions of the computer program product, when executed by a processor, implement the video image-based people counting method according to the first embodiment of the present invention.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart illustrating a method for counting people based on video images according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of perspective theory;
FIG. 3 is a schematic view of a stereo imaging system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an imaging effect in an embodiment of the present invention;
FIG. 5 is a flow chart illustrating another method for counting people based on video images according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a people counting device based on video images according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another apparatus for counting people based on video images according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The video image-based people counting method and apparatus according to the embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a method for counting people based on video images according to an embodiment of the present invention.
The method for counting the number of people based on the video images, provided by the embodiment of the invention, can be used for processing the video pre-recorded by the camera device, performing offline analysis and counting the number of people in the video images, or can be used for processing the video played online in real time and counting the number of people in the video images, and is not limited in this respect.
As shown in fig. 1, the method for counting people based on video images comprises the following steps:
Step 101, acquiring the setting parameters of the camera device in a shooting scene, wherein the setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.
It should be noted that, to prevent a person from occluding a large area of the imaged picture, the installation height of the camera device in the embodiment of the present invention should be greater than the height of a human body. Further, the installation height differs between scenes; for example, it may be 2.5 m when the camera device is installed indoors and 3.5 m when it is installed outdoors.
Specifically, the installation height of the image pickup device may be obtained by measurement, for example, the installation height of the image pickup device may be obtained by using a length measurement sensor, or the installation height of the image pickup device may be directly measured by a scale, which is not limited thereto.
The installation inclination angle of the camera device may be measured directly, for example with a protractor. Alternatively, it may be obtained indirectly by calculation: the central point of an image acquired by the camera device is determined, the distance between the ground point corresponding to that central point and the installation position of the camera device is obtained, and the installation inclination angle is then calculated from this distance and the installation height.
The size of the photosensitive element differs between camera devices, so lenses with the same focal length give different imaging angles on camera devices whose photosensitive elements differ in size. The shooting ranges of different camera devices therefore cannot be compared using the actual focal length of the lens. For this reason, in the embodiment of the invention, the angle of view of the camera device can be calculated from the equivalent focal length of the camera device.
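The indirect calculation can be sketched as follows. This is a minimal illustration, not the patent's formula: it assumes the inclination angle is measured as the depression of the optical axis below the horizontal, and it covers both readings of "distance" (the straight line from the camera to the ground point, or the horizontal distance along the ground). The function names are hypothetical.

```python
import math


def tilt_from_slant_distance(mount_height_m, slant_distance_m):
    """Depression angle in degrees, assuming the distance is the straight
    line from the camera to the ground point seen at the image centre."""
    return math.degrees(math.asin(mount_height_m / slant_distance_m))


def tilt_from_ground_distance(mount_height_m, ground_distance_m):
    """Depression angle in degrees, assuming the distance is measured
    horizontally along the ground from the point below the camera."""
    return math.degrees(math.atan2(mount_height_m, ground_distance_m))


# Example values only: a camera mounted at 2.5 m whose image centre falls
# on a ground point 5 m away in a straight line.
print(tilt_from_slant_distance(2.5, 5.0))
```

Which of the two functions applies depends on how the distance was measured on site.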
Optionally, a manufacturer of the image capturing apparatus provides an equivalent focal length of the image capturing apparatus, so in the embodiment of the present invention, the equivalent focal length of the image capturing apparatus may be directly read, and then the angle of view may be calculated according to the equivalent focal length, for example, the angle of view may be calculated according to the following formula:
[Formula GDA0003022254440000051: angle of view θ in terms of image height h and equivalent focal length L]
where θ denotes the angle of view, h denotes the image height, and L denotes the equivalent focal length.
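The formula itself survives only as an image reference in this text. A commonly used relation that matches the symbols defined here is θ = 2·arctan(h / (2L)); the sketch below assumes that relation and should not be read as the patent's exact expression. The function name and the example numbers are illustrative.

```python
import math


def field_angle_deg(image_height_mm, equivalent_focal_length_mm):
    """Angle of view in degrees, assuming theta = 2 * atan(h / (2 * L))."""
    return math.degrees(
        2.0 * math.atan(image_height_mm / (2.0 * equivalent_focal_length_mm))
    )


# Example values only: a 24 mm image height with a 12 mm equivalent focal length.
print(field_angle_deg(24.0, 12.0))
```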
Step 102, acquiring, according to the setting parameters, an imaging weight matrix of a human body when the human body is imaged in the shooting scene. Elements in the imaging weight matrix correspond one to one to pixel points in the formed image, and the value of each element is the imaging weight of the corresponding pixel point.
It should be noted that when the distance or the orientation of the person from the imaging device is different, the size and the position of the person in the imaged image are different due to the perspective effect.
Specifically, in planar imaging, the relationship between the size of a real object and its size in the imaged image can be obtained from perspective theory. For example, FIG. 2 is a schematic diagram of perspective theory. From the similar triangles AOB and A'OB', the size A'B' of the real object AB in the imaged image is obtained as follows:
[Formula GDA0003022254440000052: size of A'B' derived from the similar triangles]
In stereo imaging, the mapping relationship between an object in the imaged image and the real object is referred to as a perspective relationship. The real object and the object in the imaged image are related not only in size but also in angle. For example, FIG. 3 is a schematic diagram of stereo imaging in an embodiment of the present invention. In reality, the railings in areas 31 and 32 stand at the same height above the road surface and are perpendicular to it; in the picture captured by the camera device, however, their heights differ noticeably and they no longer appear perpendicular to the road surface.
Alternatively, referring to fig. 4, fig. 4 is a schematic diagram of an imaging effect in an embodiment of the present invention. Here, the pedestrian 1 is farther from the imaging device, and therefore, the area occupied by the pedestrian 1 in the imaged image is smaller, and the pedestrian 2 is closer to the imaging device, and therefore, the area occupied by the pedestrian 2 in the imaged image is larger.
Therefore, in the embodiment of the present invention, in order to make the sizes of the people in the imaging images corresponding to the shooting scene consistent, the imaging weight of the pixel point corresponding to each imaging point in the imaging image can be obtained by using the perspective theory. Thus, the imaging point can be multiplied by the imaging weight corresponding to the imaging point, so that the sizes of the people in the imaging image corresponding to the shooting scene are consistent.
Specifically, under the assumption that all imaging points in the shooting scene lie on the same horizontal plane, the distance between each imaging point and the camera device can be calculated using perspective theory. Then, according to the distance corresponding to each imaging point, the values of the corresponding pixel point in the horizontal direction and in the vertical direction of the formed image can be calculated respectively. The two values are multiplied, and the product is used as the weight of the pixel point. The weights of all pixel points then form the imaging weight matrix.
By introducing perspective theory, once the imaging weights are obtained, each imaging point can be multiplied by its imaging weight to compensate for the imaging characteristic that near objects appear large and far objects appear small, so that the sizes of people in the imaged image of the shooting scene become consistent. For example, in FIG. 4, weight denotes the imaging weight: the larger a pedestrian appears, the smaller the weight, and the smaller a pedestrian appears, the larger the weight. Since pedestrian 1 appears small and pedestrian 2 appears large, the imaging weight of pedestrian 1 is larger than that of pedestrian 2, and after weighting the sizes of pedestrian 1 and pedestrian 2 in FIG. 4 can be made the same.
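The construction of the weight matrix can be sketched as follows. The text does not give the exact expressions for the horizontal and vertical values, so this sketch makes several labeled assumptions: a pinhole camera, an inclination angle measured as the depression of the optical axis below the horizontal, a known vertical field angle, and horizontal and vertical values that are both proportional to the distance from the camera to the ground point. Weights are normalized to 1.0 at the image centre. All names are hypothetical.

```python
import math


def imaging_weight_matrix(mount_height, tilt_deg, vertical_fov_deg, width, height):
    """Return a height x width nested list of imaging weights.

    Assumptions (not from the source): pinhole model, flat ground,
    tilt_deg > 0 is the depression of the optical axis, and both
    directional values scale linearly with camera-to-ground distance.
    """
    tilt = math.radians(tilt_deg)
    # Focal length in pixel units, derived from the vertical field angle.
    focal = (height / 2.0) / math.tan(math.radians(vertical_fov_deg) / 2.0)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    # Distance to the ground point seen along the optical axis.
    ref_distance = mount_height / math.sin(tilt)

    weights = []
    for v in range(height):
        row = []
        for u in range(width):
            x, y = u - cx, v - cy  # pixel offsets; y grows downward
            # Downward component of the viewing ray after tilting the camera.
            down = y * math.cos(tilt) + focal * math.sin(tilt)
            if down <= 0:
                row.append(0.0)  # ray at or above the horizon: no ground point
                continue
            ray_length = math.sqrt(x * x + y * y + focal * focal)
            distance = mount_height * ray_length / down
            scale = distance / ref_distance
            horizontal_value = scale  # assumed proportional to distance
            vertical_value = scale    # assumed proportional to distance
            row.append(horizontal_value * vertical_value)
        weights.append(row)
    return weights


# Example values only: 3 m mounting height, 30 degree tilt, 40 degree vertical field angle.
w = imaging_weight_matrix(3.0, 30.0, 40.0, 5, 5)
print(w[0][2] > w[2][2] > w[4][2])
```

Under these assumptions, rows near the top of the image (farther ground points) receive larger weights than rows near the bottom, which matches the behavior described for pedestrian 1 and pedestrian 2.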
Step 103, a pixel matrix of an image mask of a shooting scene is acquired.
As a possible implementation, a preset foreground extraction algorithm may be used to determine the image mask of the shooting scene. The preset foreground extraction algorithm may be an inter-frame difference algorithm, a static difference algorithm, or another foreground extraction algorithm. In this embodiment, the image mask of the shooting scene is used to mask regions of no interest in subsequently acquired images, for example the background portion of an image. Masking these regions reduces the amount of computation needed to extract the region of interest, or foreground, from the image, where the foreground may be the people in the image.
Taking the static difference algorithm as an example of the preset foreground extraction algorithm, the image captured by the camera device for the current shooting scene is differenced with a background image and then binarized, which yields an edge mask of the moving human bodies in the image of the current shooting scene.
The background image may be the image captured by the camera device at the previous moment, an image of a designated shooting scene with no people in it, or the image acquired by the camera device at the initial time. The background image may also be obtained by denoising the image captured at the previous moment, for example by Gaussian filtering. The embodiment of the present invention is not limited in this respect.
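The difference-and-binarize step can be sketched as follows, assuming grayscale images stored as nested lists of integers. The binarization threshold of 30 is an arbitrary illustrative value, and the function name is hypothetical.

```python
def static_difference_mask(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background
    by more than the threshold, 0 elsewhere. The threshold is illustrative."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Example values only: a 2 x 3 background and a frame in which two pixels changed.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 10], [12, 10, 90]]
print(static_difference_mask(frame, background))
```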
In this embodiment, the first mask may be used to block a subsequently acquired image, and when a person in the subsequently acquired image changes, the person may be extracted through the first mask.
Step 104, acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix.
In the embodiment of the invention, the pixel matrix of the image mask and the imaging weight matrix can be multiplied and summed to obtain the current weighted pixel value of the shooting scene.
In the embodiment of the invention, the current weighted pixel value has a corresponding mapping relation with the statistical number of people. After the current weighted pixel value of the shooting scene is obtained, this mapping relation can be queried to obtain the statistical number of people in the shooting scene, which is simple to operate and easy to implement.
Step 105, acquiring the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
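The multiply-and-sum of step 104 can be sketched as an element-by-element weighted sum. This assumes the mask and the weight matrix are nested lists of identical shape; the function name is hypothetical.

```python
def weighted_pixel_value(mask, weights):
    """Sum of mask[i][j] * weights[i][j] over all pixel points."""
    if len(mask) != len(weights) or any(
        len(m_row) != len(w_row) for m_row, w_row in zip(mask, weights)
    ):
        raise ValueError("mask and weights must have the same shape")
    return sum(
        m * w
        for m_row, w_row in zip(mask, weights)
        for m, w in zip(m_row, w_row)
    )


# Example values only: far (top) pixels carry larger weights than near (bottom) pixels.
mask = [[0, 1], [1, 1]]
weights = [[4.0, 4.0], [1.0, 0.5]]
print(weighted_pixel_value(mask, weights))
```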
In the embodiment of the invention, the mapping relation between the preset weighted pixel value and the number of people is established in advance.
Optionally, the weighted pixel value and the number of people follow a curve relation, and the curve may be fitted in advance using sample images, for example by polynomial fitting. The fitted curve gives the mapping relation between the weighted pixel value and the number of people. After the current weighted pixel value is obtained, the number of people corresponding to it can be obtained by querying this mapping relation and used as the statistical number of people in the shooting scene, which is simple to operate and easy to implement.
It should be noted that when the population density in the shooting scene is large, the occlusion is serious, and therefore, the curve obtained by the fitting is nonlinear.
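One way to build such a mapping is a least-squares polynomial fit over labeled sample images. The sketch below solves the normal equations directly with the standard library; the polynomial degree, the sample values, and the function names are all illustrative assumptions, and the sample data is invented purely to exercise the code.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c with y ~ c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def estimate_people(weighted_value, coeffs):
    """Evaluate the fitted curve and round to a non-negative head count."""
    value = sum(c * weighted_value ** k for k, c in enumerate(coeffs))
    return max(0, round(value))


# Invented sample data: weighted pixel values and the matching fitted targets.
sample_values = [0.0, 10.0, 20.0, 30.0, 40.0]
sample_counts = [0.0, 1.5, 4.0, 7.5, 12.0]
coeffs = fit_polynomial(sample_values, sample_counts, degree=2)
print(estimate_people(25.0, coeffs))
```

A second-degree polynomial is used here only because the text notes the relation is nonlinear under heavy occlusion; the appropriate degree would have to be chosen from real sample images.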
According to the people counting method based on the video images, the setting parameters of the camera device in the shooting scene are obtained; acquiring an imaging weight matrix of a human body when the human body is imaged in a shooting scene according to the setting parameters; acquiring a pixel matrix of an image mask of a shooting scene; acquiring a current weighted pixel value of a shooting scene according to a pixel matrix and an imaging weight matrix of an image mask; and acquiring the number of the statistical people of the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
In the embodiment, under different shooting scenes, a new imaging weight matrix can be obtained by calculation only by knowing the installation height, the installation inclination angle and the field angle of the camera, so that the method is suitable for the new scene, the extra work of resampling, retraining the detector and the like is avoided, the workload is reduced, and meanwhile, the applicability of the method can be effectively improved. Furthermore, the current weighted pixel value of the shooting scene is obtained by weighting and summing the image mask of the current shooting scene, then the number of the statistical people of the shooting scene is obtained by inquiring the mapping relation between the preset weighted pixel value and the number of people, which is established in advance, of the current weighted pixel value, the operation is simple, the operation complexity caused by feature detection is avoided, the system resource requirement is reduced, and the real-time performance of the system is improved.
To clearly illustrate the above embodiment, this embodiment provides another people counting method based on video images, and fig. 5 is a schematic flow chart of the another people counting method based on video images according to the embodiment of the present invention.
As shown in fig. 5, based on the embodiment shown in fig. 1, step 103 specifically includes the following sub-steps:
Step 201, acquiring the setting parameters of the camera device in a shooting scene.
The setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.
Step 202, acquiring an imaging weight matrix of the human body when the human body is imaged in the shooting scene according to the setting parameters.
Elements in the imaging weight matrix correspond one to one to pixel points in the formed image, and the value of each element is the imaging weight of the corresponding pixel point.
The execution processes of steps 201 to 202 can refer to the execution processes of steps 101 to 102 in the above embodiments, which are not described herein again.
Step 203, at the initial time, taking the image collected by the camera device as the background image.
Optionally, the image acquired by the camera device at the initial time may be used as the background image; for example, the background image is denoted image0.
Step 204, acquiring, by the camera device, two consecutive frames of images of the shooting scene.
The two consecutive frames of images comprise a first frame image and a second frame image, and the acquisition time of the second frame image is later than that of the first frame image.
Optionally, two consecutive frames of images of the shooting scene, namely a first frame image and a second frame image, may be acquired by the camera device; for example, the first frame image is denoted image1 and the second frame image is denoted image2, where image2 is acquired later than image1.
Step 205, performing inter-frame differencing on the two consecutive frames of images and performing binarization processing to obtain a first pixel matrix of a first mask.
Optionally, inter-frame differencing is performed on the two consecutive frames of images, followed by binarization processing, to obtain an edge mask of the moving human body in the second frame image; in the embodiments of the present invention, this is referred to as the first pixel matrix of the first mask.
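Step 205 can be sketched as follows for grayscale frames. The function name and the binarization threshold of 25 gray levels are assumptions; the text does not state how the binarization threshold is chosen.

```python
import numpy as np

def frame_difference_mask(frame_a, frame_b, diff_threshold=25):
    """Absolute inter-frame difference followed by binarization to a 0/1 matrix."""
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    a = np.asarray(frame_a, dtype=np.int16)
    b = np.asarray(frame_b, dtype=np.int16)
    diff = np.abs(b - a)
    return (diff > diff_threshold).astype(np.uint8)

image1 = np.array([[10, 10, 10],
                   [10, 10, 10]], dtype=np.uint8)
image2 = np.array([[10, 200, 10],
                   [10, 10, 40]], dtype=np.uint8)
first_mask = frame_difference_mask(image1, image2)
```

Only the two pixels whose gray level changed by more than the assumed threshold are set to 1 in `first_mask`.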
Step 206, multiplying the first pixel matrix by the imaging weight matrix and summing to obtain a pixel sum value.
Optionally, the first pixel matrix is multiplied element by element by the imaging weight matrix and the products are summed to obtain the pixel sum value, so that the imaged sizes of people in the second frame image are normalized.
Step 207, determining whether the pixel sum value is greater than a preset threshold; if yes, performing steps 208 to 211; otherwise, performing step 212.
In the embodiments of the present invention, the preset threshold is set in advance; for example, it may be 10% of the pixel sum value of human body imaging.
Optionally, when the pixel sum value is less than or equal to the preset threshold, this indicates that the numbers of people in the first frame image and the second frame image are consistent and that no additional people are present. In this case, the background image image0 may be updated to the second frame image image2 (step 212), so that the background is kept up to date. Further, the number of people in the second frame image may be set to 0, and the process proceeds to step 213. When the pixel sum value is greater than the preset threshold, this indicates that the number of people in the first frame image differs from that in the second frame image, and step 208 may be triggered to count the number of people in the second frame image.
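The decision in steps 206 and 207 can be sketched as below. The function name, the uniform weight matrix and the value 20.0 used as the pixel sum value of one imaged human body are hypothetical; only the 10% ratio comes from the example in the text.

```python
import numpy as np

def scene_changed(first_mask, weights, preset_threshold):
    """Return (changed, pixel_sum) for the weighted sum of the first mask."""
    pixel_sum = float(np.sum(np.asarray(first_mask, dtype=float) *
                             np.asarray(weights, dtype=float)))
    return pixel_sum > preset_threshold, pixel_sum

weights = np.full((2, 3), 0.5)        # hypothetical imaging weight matrix
person_pixel_sum = 20.0               # hypothetical pixel sum of one imaged person
preset_threshold = 0.10 * person_pixel_sum

quiet_mask = np.array([[0, 1, 0],
                       [0, 0, 1]])    # weighted sum 1.0, below the threshold
busy_mask = np.ones((2, 3))           # weighted sum 3.0, above the threshold

quiet_changed, quiet_sum = scene_changed(quiet_mask, weights, preset_threshold)
busy_changed, busy_sum = scene_changed(busy_mask, weights, preset_threshold)
```

With these numbers `quiet_changed` is false, which corresponds to the background update branch, and `busy_changed` is true, which corresponds to the counting branch.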
Step 208, performing inter-frame differencing on the second frame image and the current background image, and performing binarization processing to obtain a second pixel matrix of a second mask.
Optionally, inter-frame differencing may be performed on the second frame image and the current background image, followed by binarization processing, to obtain a foreground mask of the second frame image; in the embodiments of the present invention, this is referred to as the second pixel matrix of the second mask.
In the embodiments of the present invention, the second mask is used to mask out the background of subsequently acquired images, so that the foreground of the image, namely the people, can be extracted.
Step 209, using the second pixel matrix of the second mask as the pixel matrix of the image mask.
Further, in the embodiments of the present invention, the second pixel matrix may be combined with the first pixel matrix, for example by an OR operation, to obtain a third pixel matrix, and the third pixel matrix is then used as the pixel matrix of the image mask, so that a more reliable pixel matrix of the image mask can be obtained.
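Claim 3 combines the second pixel matrix and the first pixel matrix with an OR operation. Assuming that this is also the combination intended in this step, a minimal sketch is given below; the function name is hypothetical.

```python
import numpy as np

def combine_masks(first_mask, second_mask):
    """Element-wise OR of two binary masks, returned as a 0/1 matrix."""
    return np.logical_or(first_mask, second_mask).astype(np.uint8)

first = np.array([[1, 0, 0],
                  [0, 0, 0]], dtype=np.uint8)   # moving edges
second = np.array([[0, 1, 1],
                   [1, 0, 0]], dtype=np.uint8)  # foreground against the background
third = combine_masks(first, second)
```

A pixel is kept in `third` if it is set in either mask, so motion edges missed by the background difference are still counted.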
Step 210, obtaining a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix.
Step 211, acquiring the counted number of people in the shooting scene according to the current weighted pixel value and the preset mapping relationship between weighted pixel values and numbers of people.
The execution processes of steps 210 to 211 can refer to the execution processes of steps 104 to 105 in the above embodiments, which are not described herein again.
Step 212, updating the background image to the second frame image.
Objects in a scene tend to change over time while the scene is being shot. When step 207 finds that the image changes little between two adjacent sampling moments, indicating that the current scene is essentially unchanged, the background image can be updated with the second frame image. The background of the shooting scene is thus kept up to date as time passes, which makes the image recognition more accurate.
Step 213, determining whether the video has ended; if yes, performing step 214; otherwise, returning to step 204.
Optionally, when the video has not ended, the process returns to step 204 for the next round of people counting; when the video has ended, the processing flow may end.
Step 214, ending.
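The flow of steps 203 to 214 can be summarized in a short sketch. Several points are assumptions that the text leaves open: frames are grayscale arrays, consecutive pairs overlap (frame k with frame k+1), the frame at the initial time serves both as the background and as the first frame of the first pair, the binarization threshold is 25, and the mapping relationship is a small interpolated table. All function and parameter names are hypothetical.

```python
import numpy as np

def binarized_difference(a, b, diff_threshold=25):
    """Absolute difference of two grayscale frames, binarized to 0/1."""
    diff = np.abs(np.asarray(a, dtype=np.int16) - np.asarray(b, dtype=np.int16))
    return (diff > diff_threshold).astype(np.uint8)

def count_people(frames, weights, preset_threshold, table_values, table_counts):
    """Return one people count per consecutive frame pair."""
    frames = list(frames)
    if len(frames) < 2:
        return []
    background = frames[0]                                   # step 203
    counts = []
    for image1, image2 in zip(frames[:-1], frames[1:]):      # step 204
        first_mask = binarized_difference(image1, image2)    # step 205
        pixel_sum = float(np.sum(first_mask * weights))      # step 206
        if pixel_sum > preset_threshold:                     # step 207
            second_mask = binarized_difference(background, image2)  # step 208
            value = float(np.sum(second_mask * weights))     # steps 209 and 210
            estimate = np.interp(value, table_values, table_counts)
            counts.append(int(round(float(estimate))))       # step 211
        else:
            background = image2                              # step 212
            counts.append(0)
    return counts                                            # steps 213 and 214

empty = np.zeros((2, 2), dtype=np.uint8)
occupied = np.array([[255, 255],
                     [0, 0]], dtype=np.uint8)
weights = np.ones((2, 2))
counts = count_people([empty, empty, occupied, occupied], weights,
                      preset_threshold=1.0,
                      table_values=[0.0, 2.0, 4.0], table_counts=[0, 1, 2])
```

In this toy run the counts are `[0, 1, 0]`: the person is counted when entering, and once the scene stops changing the second frame becomes the new background and the count is set to 0, as described for step 207.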
In this embodiment, for a different shooting scene, a new imaging weight matrix can be calculated simply from the installation height, the installation inclination angle and the field angle of the camera device, so the method adapts to the new scene without extra work such as resampling or retraining a detector; this reduces the workload and effectively improves the applicability of the method.
Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted summation over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up the current weighted pixel value in the pre-established mapping relationship between weighted pixel values and numbers of people. The operation is simple, the computational complexity of feature detection is avoided, the demand on system resources is reduced, and the real-time performance of the system is improved.
In order to implement the above embodiments, the present invention further provides a people counting device based on video images.
Fig. 6 is a schematic structural diagram of a people counting device based on video images according to an embodiment of the present invention.
As shown in fig. 6, the video image-based people counting apparatus 100 includes: a parameter obtaining module 110, a weight matrix obtaining module 120, a pixel matrix obtaining module 130, a pixel value obtaining module 140, and a people number obtaining module 150, wherein:
The parameter obtaining module 110 is configured to obtain the setting parameters of the camera device in the shooting scene, wherein the setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.
As a possible implementation, the parameter obtaining module 110 is specifically configured to: obtain the installation height and the installation inclination angle by measurement; read the equivalent focal length of the camera device; and calculate the field angle from the equivalent focal length.
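The field angle calculation can be sketched as follows, assuming that the equivalent focal length is the 35 mm equivalent focal length of a rectilinear lens, referred to a 36 mm by 24 mm frame. The text does not give the formula; the standard relation field angle = 2 * arctan(d / (2f)) is used here, and the function and constant names are hypothetical.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0    # reference frame width for the 35 mm equivalent
FULL_FRAME_HEIGHT_MM = 24.0   # reference frame height for the 35 mm equivalent

def field_angles_deg(equivalent_focal_length_mm):
    """Return (horizontal, vertical) field angles in degrees."""
    if equivalent_focal_length_mm <= 0:
        raise ValueError("equivalent focal length must be positive")
    f = equivalent_focal_length_mm
    horizontal = 2.0 * math.degrees(math.atan(FULL_FRAME_WIDTH_MM / (2.0 * f)))
    vertical = 2.0 * math.degrees(math.atan(FULL_FRAME_HEIGHT_MM / (2.0 * f)))
    return horizontal, vertical

# An 18 mm equivalent focal length gives a horizontal field angle of about 90 degrees.
hfov_deg, vfov_deg = field_angles_deg(18.0)
```

A camera whose sensor aspect ratio differs from 3:2 would need the vertical reference dimension adjusted accordingly.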
As another possible implementation, the parameter obtaining module 110 is specifically configured to: obtain the installation height by measurement; determine the central point of the image acquired by the camera device; obtain the distance between the ground point corresponding to the central point and the installation position of the camera device; determine the installation inclination angle according to the distance and the installation height; and calculate the field angle from the equivalent focal length.
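Determining the installation inclination angle from the distance and the installation height can be sketched with basic trigonometry. Two assumptions are made here that the text does not confirm: the distance is measured horizontally along the ground from the point directly below the camera device, and the inclination angle is expressed as the angle of the optical axis below the horizontal. The function name is hypothetical.

```python
import math

def installation_inclination_deg(installation_height_m, ground_distance_m):
    """Angle of the optical axis below the horizontal, in degrees."""
    if installation_height_m <= 0 or ground_distance_m <= 0:
        raise ValueError("height and distance must be positive")
    return math.degrees(math.atan2(installation_height_m, ground_distance_m))

# A camera mounted 3 m high whose image centre falls on the ground 3 m away.
tilt_deg = installation_inclination_deg(3.0, 3.0)
```

If the measured distance were instead the straight-line distance from the camera device to the ground point, the arcsine of height over distance would be used in place of the arctangent.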
The weight matrix obtaining module 120 is configured to obtain, according to the setting parameters, an imaging weight matrix of the human body when the human body is imaged in the shooting scene; the elements in the imaging weight matrix correspond one-to-one to the pixel points in the formed image, and the value of each element is the imaging weight of the corresponding pixel point.
As a possible implementation, the weight matrix obtaining module 120 is specifically configured to: obtain the distance between each imaging point in the shooting scene and the camera device; obtain, according to the distance, the values in the horizontal direction and the vertical direction of the pixel point corresponding to the imaging point in the formed image; multiply the values in the horizontal direction and the vertical direction to obtain the weight of the pixel point; and form the imaging weight matrix from the weights of all the pixel points.
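The structure of this computation (distance, then a horizontal and a vertical value, then their product) can be sketched as below. The formulas for the two directional values are not given in this passage, so the sketch assumes a simple model: the apparent size of a person is inversely proportional to the distance of the ground point from the camera device, so that both values equal the distance divided by the distance at the image centre. Equal angular spacing per pixel and the function name are further assumptions.

```python
import numpy as np

def imaging_weight_matrix(height_m, tilt_deg, hfov_deg, vfov_deg, rows, cols):
    """Distance-based weights, normalized to 1 at the image centre."""
    # Angular offset of each pixel row and column from the optical axis
    # (equal angle per pixel, a simplification of the pinhole model).
    row_offset = np.radians(np.linspace(vfov_deg / 2.0, -vfov_deg / 2.0, rows))
    col_offset = np.radians(np.linspace(-hfov_deg / 2.0, hfov_deg / 2.0, cols))
    # Angle of each row's viewing ray below the horizontal; top rows look farther.
    depression = np.radians(tilt_deg) - row_offset
    if np.any(depression <= 0.0):
        raise ValueError("some rows look at or above the horizon")
    # Distance from the camera device to the ground point seen by each pixel.
    distance = height_m / (np.sin(depression)[:, None] * np.cos(col_offset)[None, :])
    reference = height_m / np.sin(np.radians(tilt_deg))
    horizontal_value = distance / reference   # assumed horizontal scale factor
    vertical_value = distance / reference     # assumed vertical scale factor
    return horizontal_value * vertical_value

weights = imaging_weight_matrix(3.0, 45.0, 60.0, 40.0, rows=5, cols=5)
```

Pixels imaging distant ground points receive larger weights, which compensates for people appearing smaller there.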
A pixel matrix obtaining module 130, configured to obtain a pixel matrix of an image mask of a shooting scene.
As a possible implementation, the pixel matrix obtaining module 130 is specifically configured to: acquire, by the camera device, two consecutive frames of images of the shooting scene; perform inter-frame differencing on the two consecutive frames of images and perform binarization processing to obtain a first pixel matrix of a first mask; and use the first pixel matrix of the first mask as the pixel matrix of the image mask.
As another possible implementation, the pixel matrix obtaining module 130 is specifically configured to: multiply the first pixel matrix by the imaging weight matrix and sum to obtain a pixel sum value; if the pixel sum value is greater than a preset threshold, perform inter-frame differencing on the second frame image and the background image and perform binarization processing to obtain a second pixel matrix of a second mask; and use the second pixel matrix of the second mask as the pixel matrix of the image mask.
Optionally, the pixel matrix obtaining module 130 is further configured to update the background image to the second frame image when the pixel sum value is less than or equal to the preset threshold.
Optionally, the pixel matrix obtaining module 130 is further configured to use, at the initial time, the image collected by the camera device as the background image.
As another possible implementation, the pixel matrix obtaining module 130 is specifically configured to: perform Gaussian filtering processing on the image captured by the camera device at the immediately preceding moment to obtain a background image of the shooting scene; and perform inter-frame differencing on the image acquired at the current moment and the background image and perform binarization processing to obtain the pixel matrix of the image mask.
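The Gaussian-filtered background variant can be sketched with NumPy alone. The kernel width (sigma of 1.0), the edge padding, the binarization threshold of 25 and all function names are assumptions; the text does not specify the filter parameters.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized one-dimensional Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian filter with edge padding; output has the input shape."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    blurred_rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, blurred_rows)

def mask_from_previous_frame(previous, current, diff_threshold=25):
    """Difference the current frame against the blurred previous frame, then binarize."""
    background = gaussian_blur(previous)
    diff = np.abs(np.asarray(current, dtype=float) - background)
    return (diff > diff_threshold).astype(np.uint8)

previous = np.full((5, 5), 50.0)
current = previous.copy()
current[2, 2] = 200.0                 # one pixel changes between the two moments
mask = mask_from_previous_frame(previous, current)
```

Here only the changed pixel is set in `mask`; the filtering mainly suppresses sensor noise in the image used as background.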
The pixel value obtaining module 140 is configured to obtain a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix.
The people number obtaining module 150 is configured to obtain a statistical number of people in the shooting scene according to the current weighted pixel value and a mapping relationship between a preset weighted pixel value and the number of people.
Further, in a possible implementation of the embodiments of the present invention, referring to fig. 7, on the basis of the embodiment shown in fig. 6, the video image-based people counting apparatus 100 may further include: an updating module 160.
The updating module 160 is configured to, after inter-frame differencing is performed on the second frame image and the current background image and binarization processing is performed to obtain the second pixel matrix of the second mask, perform an OR operation on the second pixel matrix and the first pixel matrix to obtain the pixel matrix of the image mask.
It should be noted that the foregoing explanation of the embodiment of the method for counting people based on video images is also applicable to the apparatus 100 for counting people based on video images of this embodiment, and will not be described herein again.
The people counting device based on video images of this embodiment acquires the setting parameters of the camera device in the shooting scene; acquires an imaging weight matrix of the human body when imaged in the shooting scene according to the setting parameters; acquires a pixel matrix of an image mask of the shooting scene; acquires a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix; and acquires the counted number of people in the shooting scene according to the current weighted pixel value and a preset mapping relationship between weighted pixel values and numbers of people. In this embodiment, for a different shooting scene, a new imaging weight matrix can be calculated simply from the installation height, the installation inclination angle and the field angle of the camera device, so the device adapts to the new scene without extra work such as resampling or retraining a detector; this reduces the workload and effectively improves applicability. Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted summation over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up the current weighted pixel value in the pre-established mapping relationship between weighted pixel values and numbers of people. The operation is simple, the computational complexity of feature detection is avoided, the demand on system resources is reduced, and the real-time performance of the system is improved.
In order to implement the foregoing embodiments, the present invention further provides a computer device, including: a processor and a memory; wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the video image-based people counting method proposed by the foregoing embodiments of the present invention.
In order to achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a video image-based people counting method as proposed by the foregoing embodiments of the present invention.
In order to implement the above embodiments, the present invention further proposes a computer program product, wherein instructions of the computer program product, when executed by a processor, implement the video image-based people counting method as proposed by the foregoing embodiments of the present invention.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples described in this specification, and features of different embodiments or examples, can be combined by one skilled in the art provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (13)

1. A people counting method based on video images is characterized by comprising the following steps:
acquiring setting parameters of a camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;
acquiring an imaging weight matrix of a human body when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;
acquiring a pixel matrix of an image mask of the shooting scene, the acquiring the pixel matrix of the image mask of the shooting scene comprising: acquiring two continuous frames of images of the shooting scene by the camera device; performing interframe difference on the two continuous frames of images, performing binarization processing to obtain a first pixel matrix of a first mask, multiplying the first pixel matrix by the imaging weight matrix, and summing to obtain a pixel sum value; if the pixel sum value is larger than a preset threshold value, performing interframe difference on the second frame image and the background image and performing binarization processing to obtain a second pixel matrix of a second mask; using a second pixel matrix of the second mask as a pixel matrix of the image mask;
acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;
and acquiring the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
2. The method of claim 1, wherein the acquiring a pixel matrix of an image mask of the captured scene comprises:
if the pixel sum value is less than or equal to the preset threshold value, the first pixel matrix of the first mask is used as the pixel matrix of the image mask.
3. The method according to claim 1, wherein after the inter-frame difference between the second frame image and the current background image is performed and the binarization processing is performed to obtain the second pixel matrix of the second mask, the method further comprises:
and performing an OR operation on the second pixel matrix and the first pixel matrix to obtain the pixel matrix of the image mask.
4. The method of claim 1, further comprising:
and if the pixel sum value is less than or equal to the preset threshold value, updating the background image into the second frame image.
5. The method according to claim 1, wherein before the inter-frame differencing and binarizing the two consecutive images to obtain the first pixel matrix of the first mask, the method further comprises:
and at the initial moment, taking the image collected by the camera device as the background image.
6. The method of claim 1, wherein the acquiring a pixel matrix of an image mask of the captured scene comprises:
performing Gaussian filtering processing on the image captured by the camera device at the immediately preceding moment to obtain a background image of the shooting scene;
and performing interframe difference on the image acquired at the current moment and the background image, and performing binarization processing to obtain a pixel matrix of the image mask.
7. The method according to any one of claims 1 to 6, wherein the obtaining an imaging weight matrix of the human body when the human body is imaged in the shooting scene according to the setting parameters comprises:
acquiring the distance between each imaging point and the camera device in the shooting scene;
according to the distance, acquiring numerical values of the imaging point in the horizontal direction and the vertical direction of the corresponding pixel point in the formed image;
multiplying the numerical values in the horizontal direction and the vertical direction to obtain the weight of the pixel point;
and forming the imaging weight matrix by using the weight of each pixel point.
8. The method according to any one of claims 1 to 6, wherein the acquiring of the setting parameters of the camera in the shooting scene comprises:
acquiring the installation height and the installation inclination angle in a measuring mode;
reading the equivalent focal length of the camera device;
and calculating to obtain the field angle according to the equivalent focal length.
9. The method according to any one of claims 1 to 6, wherein the acquiring of the setting parameters of the camera in the shooting scene comprises:
acquiring the installation height in a measuring mode;
determining a central point of an image acquired by the camera device;
acquiring the distance between the ground point corresponding to the central point and the installation position of the camera device;
determining the installation inclination angle according to the distance and the installation height;
and calculating the angle of view according to the equivalent focal length.
10. A video image-based people counting device, comprising:
the parameter acquisition module is used for acquiring the setting parameters of the camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;
the weight matrix acquisition module is used for acquiring an imaging weight matrix when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;
the pixel matrix acquisition module is used for acquiring a pixel matrix of an image mask of the shooting scene, and is specifically used for: acquiring two continuous frames of images of the shooting scene by the camera device; performing interframe difference on the two continuous frames of images, performing binarization processing to obtain a first pixel matrix of a first mask, multiplying the first pixel matrix by the imaging weight matrix, and summing to obtain a pixel sum value; if the pixel sum value is larger than a preset threshold value, performing interframe difference on the second frame image and the background image and performing binarization processing to obtain a second pixel matrix of a second mask; using a second pixel matrix of the second mask as a pixel matrix of the image mask;
the pixel value acquisition module is used for acquiring the current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;
and the number obtaining module is used for obtaining the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.
11. A computer device comprising a processor and a memory;
wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, for implementing the video image-based people counting method according to any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the video image-based people counting method according to any one of claims 1 to 9.
13. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, implement the video image-based people counting method according to any one of claims 1 to 9.
CN201810014369.6A 2018-01-08 2018-01-08 People counting method, device and equipment based on video image and storage medium Active CN110020572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810014369.6A CN110020572B (en) 2018-01-08 2018-01-08 People counting method, device and equipment based on video image and storage medium

Publications (2)

Publication Number Publication Date
CN110020572A CN110020572A (en) 2019-07-16
CN110020572B true CN110020572B (en) 2021-08-10

Family

ID=67187315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810014369.6A Active CN110020572B (en) 2018-01-08 2018-01-08 People counting method, device and equipment based on video image and storage medium

Country Status (1)

Country Link
CN (1) CN110020572B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508239B (en) * 2020-04-16 2022-03-01 成都旸谷信息技术有限公司 Intelligent vehicle flow identification method and system based on mask matrix
CN116506473B (en) * 2023-06-29 2023-09-22 北京格林威尔科技发展有限公司 Early warning method and device based on intelligent door lock

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982341A (en) * 2012-11-01 2013-03-20 南京师范大学 Self-intended crowd density estimation method for camera capable of straddling
CN105096292A (en) * 2014-04-30 2015-11-25 株式会社理光 Object quantity estimation method and device
CN106127812A (en) * 2016-06-28 2016-11-16 中山大学 A kind of passenger flow statistical method of non-gate area, passenger station based on video monitoring
CN106250828A (en) * 2016-07-22 2016-12-21 中山大学 A kind of people counting method based on the LBP operator improved

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI620148B (en) * 2016-04-28 2018-04-01 新加坡商雲網科技新加坡有限公司 Device and method for monitoring, method for counting people at a location

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ON PIXEL COUNT BASED CROWD DENSITY ESTIMATION FOR VISUAL SURVEILLANCE;Ruihua MA, Liyuan LI, Weimin HUANG, Qi TIAN;《Conference on Cybernetics and Intelligent Systems Singapore》;20041203;pp. 170-173 *

Also Published As

Publication number Publication date
CN110020572A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN109949347B (en) Human body tracking method, device, system, electronic equipment and storage medium
EP2549738B1 (en) Method and camera for determining an image adjustment parameter
JP6464337B2 (en) Traffic camera calibration update using scene analysis
CN108229475B (en) Vehicle tracking method, system, computer device and readable storage medium
US20180151063A1 (en) Real-time detection system for parked vehicles
Carr et al. Monocular object detection using 3d geometric primitives
KR101787542B1 (en) Estimation system and method of slope stability using 3d model and soil classification
WO2018052547A1 (en) An automatic scene calibration method for video analytics
KR20130030220A (en) Fast obstacle detection
US10692225B2 (en) System and method for detecting moving object in an image
KR20150027291A (en) Optical flow tracking method and apparatus
CN111524091B (en) Information processing apparatus, information processing method, and storage medium
JP7354767B2 (en) Object tracking device and object tracking method
CN110020572B (en) People counting method, device and equipment based on video image and storage medium
JP4691570B2 (en) Image processing apparatus and object estimation program
CN112053397A (en) Image processing method, image processing device, electronic equipment and storage medium
KR101290517B1 (en) Photographing apparatus for tracking object and method thereof
JP7243372B2 (en) Object tracking device and object tracking method
US10757318B2 (en) Determination of a contrast value for a digital image
CN109242900B (en) Focal plane positioning method, processing device, focal plane positioning system and storage medium
Bravo et al. Outdoor vacant parking space detector for improving mobility in smart cities
KR101241813B1 (en) Apparatus and method for detecting objects in panoramic images using gpu
CN107818287B (en) Passenger flow statistics device and system
JP6348020B2 (en) Image processing apparatus, image processing method, and inspection method using the same
CN110826455A (en) Target identification method and image processing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant