CN110020572B - People counting method, device and equipment based on video image and storage medium

Info

Publication number: CN110020572B
Application number: CN201810014369.6A
Authority: CN
Inventors: 胡香敏
Original assignee: BYD Co Ltd
Current assignee: BYD Co Ltd
Priority date: 2018-01-08
Filing date: 2018-01-08
Publication date: 2021-08-10
Anticipated expiration: 2038-01-08
Also published as: CN110020572A

Abstract

The invention provides a method and a device for counting people based on video images. The method comprises the following steps: acquiring setting parameters of a camera device in a shooting scene; acquiring, according to the setting parameters, an imaging weight matrix of a human body imaged in the shooting scene; acquiring a pixel matrix of an image mask of the shooting scene; acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix; and acquiring the counted number of people in the shooting scene according to the current weighted pixel value and a mapping relation between preset weighted pixel values and numbers of people. The method effectively improves applicability. In addition, it avoids the computational complexity of feature detection, reduces the demand on system resources, and thereby improves the real-time performance of the system.

Description

People counting method, device and equipment based on video image and storage medium

Technical Field

The invention relates to the technical field of video image processing, in particular to a people counting method and device based on video images.

Background

With the continuous development of video image processing technology, the real-time people counting can be carried out on places such as supermarkets, banks, subways and the like through video monitoring, so that the statistical information such as passenger flow distribution, crowding degree and the like can be obtained, and effective reference data can be provided for works such as public area management, resource scheduling and the like.

Conventionally, people counting is performed by extracting pedestrian features from an image captured by a camera device, such as head-shoulder shape features, face features, and histogram of oriented gradients (HOG) features. Specifically, feature extraction and pedestrian positioning can be realized by machine learning. For example, a convolutional neural network can be used to train a pedestrian image detector; an image acquired by the camera device is then input into the trained detector to obtain the positions of pedestrians, from which the number of people is counted.

However, the features selected by a machine learning method differ greatly between shooting scenes, so a person in the image may fail to be recognized. For example, a detection algorithm based on face features may miss a person whose key facial features are covered by a mask or a cap. Therefore, when the shooting scene changes, samples must be collected again and the pedestrian image detector retrained before the system can work normally in the new scene, which involves a large workload. In addition, feature detection has high computational complexity: a Region of Interest (ROI) must be extracted and a feature extraction algorithm and a classification algorithm must be run, which places a high demand on computing resources and makes real-time operation difficult.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the present invention is to provide a method for counting people based on video images in which adapting to a different shooting scene only requires the installation height, installation inclination angle and field angle of the camera device, from which a new imaging weight matrix is calculated. No resampling or detector retraining is needed, which reduces the workload and improves the applicability of the method. Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted sum over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up this value in the pre-established mapping relation between preset weighted pixel values and numbers of people. The operation is simple, avoids the computational complexity of feature detection, reduces the demand on system resources, and improves the real-time performance of the system.

The second purpose of the invention is to provide a people counting device based on video images.

A third object of the invention is to propose a computer device.

A fourth object of the invention is to propose a non-transitory computer-readable storage medium.

A fifth object of the invention is to propose a computer program product.

In order to achieve the above object, a first embodiment of the present invention provides a method for counting people based on video images, including:

acquiring setting parameters of a camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;

acquiring an imaging weight matrix of a human body when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;

acquiring a pixel matrix of an image mask of the shooting scene; the elements in the pixel matrix correspond one to one to the pixel points in the image, and the value of each element is the pixel value of the corresponding pixel point;

acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;

and acquiring the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.

According to the method for counting people based on video images of the embodiment of the invention, the setting parameters of the camera device in a shooting scene are acquired; an imaging weight matrix of a human body imaged in the shooting scene is acquired according to the setting parameters; a pixel matrix of an image mask of the shooting scene is acquired; a current weighted pixel value of the shooting scene is acquired according to the pixel matrix of the image mask and the imaging weight matrix; and the counted number of people in the shooting scene is acquired according to the current weighted pixel value and the mapping relation between preset weighted pixel values and numbers of people. In this embodiment, adapting to a different shooting scene only requires the installation height, installation inclination angle and field angle of the camera device, from which a new imaging weight matrix is calculated. No resampling or detector retraining is needed, which reduces the workload and improves the applicability of the method. Furthermore, the current weighted pixel value is obtained by a weighted sum over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up this value in the pre-established mapping relation. The operation is simple, avoids the computational complexity of feature detection, reduces the demand on system resources, and improves the real-time performance of the system.

In order to achieve the above object, a second embodiment of the present invention provides a people counting device based on video images, including:

the parameter acquisition module is used for acquiring the setting parameters of the camera device in a shooting scene; wherein the setting parameters include: the installation height and the installation inclination angle of the camera device and the field angle of the camera device;

the weight matrix acquisition module is used for acquiring an imaging weight matrix when the human body is imaged in the shooting scene according to the setting parameters; elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the elements is the imaging weight of the pixel points;

the pixel matrix acquisition module is used for acquiring a pixel matrix of an image mask of the shooting scene;

the pixel value acquisition module is used for acquiring the current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix;

and the number obtaining module is used for obtaining the statistical number of people in the shooting scene according to the current weighted pixel value and the mapping relation between the preset weighted pixel value and the number of people.

According to the device for counting people based on video images of the embodiment of the invention, the setting parameters of the camera device in a shooting scene are acquired; an imaging weight matrix of a human body imaged in the shooting scene is acquired according to the setting parameters; a pixel matrix of an image mask of the shooting scene is acquired; a current weighted pixel value of the shooting scene is acquired according to the pixel matrix of the image mask and the imaging weight matrix; and the counted number of people in the shooting scene is acquired according to the current weighted pixel value and the mapping relation between preset weighted pixel values and numbers of people. In this embodiment, adapting to a different shooting scene only requires the installation height, installation inclination angle and field angle of the camera device, from which a new imaging weight matrix is calculated. No resampling or detector retraining is needed, which reduces the workload and improves the applicability of the device. Furthermore, the current weighted pixel value is obtained by a weighted sum over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up this value in the pre-established mapping relation. The operation is simple, avoids the computational complexity of feature detection, reduces the demand on system resources, and improves the real-time performance of the system.

To achieve the above object, a third embodiment of the present invention provides a computer device, including: a processor and a memory;

wherein the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the video image-based people counting method according to the embodiment of the first aspect of the present invention.

In order to achieve the above object, a fourth embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements a video image-based people counting method according to an embodiment of the first aspect of the present invention.

In order to achieve the above object, a fifth embodiment of the present invention provides a computer program product, wherein instructions of the computer program product, when executed by a processor, implement the video image-based people counting method according to the first embodiment of the present invention.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart illustrating a method for counting people based on video images according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of perspective theory;

FIG. 3 is a schematic view of a stereo imaging system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an imaging effect in an embodiment of the present invention;

FIG. 5 is a flow chart illustrating another method for counting people based on video images according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a people counting device based on video images according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of another apparatus for counting people based on video images according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.

The video image-based people counting method and apparatus according to the embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of a method for counting people based on video images according to an embodiment of the present invention.

The method for counting people based on video images provided by the embodiment of the invention can be used to process video pre-recorded by the camera device, for offline analysis and counting of the people in the video images, or to process video streamed online in real time and count the people in the video images; this is not limited here.

As shown in fig. 1, the method for counting people based on video images comprises the following steps:

Step 101, acquiring setting parameters of a camera device in a shooting scene; wherein the setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.

It should be noted that, to prevent a person from blocking a large area of the picture, the installation height of the camera device in the embodiment of the present invention should be greater than the height of a human body. Further, the installation height differs between scenes; for example, it may be 2.5 m when the camera device is installed indoors and 3.5 m when it is installed outdoors.

Specifically, the installation height of the image pickup device may be obtained by measurement, for example, the installation height of the image pickup device may be obtained by using a length measurement sensor, or the installation height of the image pickup device may be directly measured by a scale, which is not limited thereto.

The installation inclination angle of the camera device may be measured directly, for example with a protractor. Alternatively, it may be obtained indirectly by calculation: the central point of an image acquired by the camera device is determined, the distance between the ground point corresponding to that central point and the installation position of the camera device is obtained, and the installation inclination angle is then calculated from this distance and the installation height.

Since the size of the photosensitive element differs between camera devices, lenses with the same focal length give different field angles on camera devices whose photosensitive elements differ in size. The shooting ranges of different camera devices therefore cannot be compared by the real focal length of the lens. For this reason, in the embodiment of the invention, the field angle of the camera device can be calculated from its equivalent focal length.
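The indirect calculation can be sketched as follows. The text does not say whether the distance is measured along the ground or along the line of sight, nor from which reference the inclination angle is measured, so both function names and both conventions below are assumptions: the angle is taken between the optical axis and the vertical, and one helper is given for each reading of "distance".

```python
import math

def tilt_from_ground_distance(height_m, ground_distance_m):
    """Angle (degrees) between the optical axis and the vertical.

    Assumes ground_distance_m is the horizontal distance from the point
    directly below the camera to the ground point seen at the image centre.
    """
    return math.degrees(math.atan2(ground_distance_m, height_m))

def tilt_from_slant_distance(height_m, slant_distance_m):
    """Same angle, assuming the distance is the straight line from the
    camera to the ground point seen at the image centre."""
    return math.degrees(math.acos(height_m / slant_distance_m))

# A camera 2.5 m high whose image centre lands 2.5 m away along the ground.
print(tilt_from_ground_distance(2.5, 2.5))
```

Under the other common convention, in which the angle is measured downward from the horizontal, the result is 90 degrees minus the value returned here.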

Optionally, the manufacturer of the camera device provides its equivalent focal length, so in the embodiment of the present invention the equivalent focal length may be read directly and the field angle calculated from it, for example according to the following formula:

$$\theta = 2\arctan\left(\frac{h}{2L}\right)$$

where θ denotes the field angle, h denotes the image height, and L denotes the equivalent focal length.
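A minimal sketch of this calculation, assuming the standard pinhole relation θ = 2·arctan(h / (2L)) with h and L in the same unit. The function name is hypothetical, and the 24 mm image height used in the example is the frame height of the 35 mm format to which equivalent focal lengths conventionally refer.

```python
import math

def field_angle_deg(image_height_mm, equivalent_focal_length_mm):
    """Field angle in degrees from image height h and equivalent focal length L."""
    half_angle = math.atan(image_height_mm / (2.0 * equivalent_focal_length_mm))
    return math.degrees(2.0 * half_angle)

# Vertical field angle for a 24 mm image height and a 12 mm equivalent focal length.
print(field_angle_deg(24.0, 12.0))
```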

Step 102, acquiring an imaging weight matrix of a human body imaged in the shooting scene according to the setting parameters; the elements in the imaging weight matrix correspond one to one to the pixel points in the formed image, and the value of each element is the imaging weight of the corresponding pixel point.

It should be noted that when the distance or the orientation of the person from the imaging device is different, the size and the position of the person in the imaged image are different due to the perspective effect.

Specifically, in planar imaging, the relationship between the size of a real object and its size in the image follows from perspective theory. For example, fig. 2 is a schematic diagram of perspective theory. Because triangles AOB and A′OB′ are similar, the size A′B′ of the real object AB in the image is obtained as follows:

In stereo imaging, the mapping relationship between a real object and its counterpart in the image is referred to as a perspective relationship. The real object and its image are related not only in size but also in angle. For example, fig. 3 is a schematic diagram of stereo imaging in an embodiment of the present invention. In reality, the balustrades in areas 31 and 32 have the same height above the road surface and are perpendicular to it, but in the picture captured by the camera device the balustrades in areas 31 and 32 differ markedly in height and do not appear perpendicular to the road surface.

Alternatively, referring to fig. 4, fig. 4 is a schematic diagram of an imaging effect in an embodiment of the present invention. Here, the pedestrian 1 is farther from the imaging device, and therefore, the area occupied by the pedestrian 1 in the imaged image is smaller, and the pedestrian 2 is closer to the imaging device, and therefore, the area occupied by the pedestrian 2 in the imaged image is larger.

Therefore, in the embodiment of the present invention, to make the sizes of people consistent across the image of the shooting scene, perspective theory is used to obtain the imaging weight of the pixel point corresponding to each imaging point. Each imaging point can then be multiplied by its imaging weight so that people appear with a consistent size.

Specifically, under the assumption that all imaging points in the shooting scene lie on the same horizontal plane, the distance between each imaging point and the camera device can be calculated using perspective theory. From that distance, a value in the horizontal direction and a value in the vertical direction are calculated for the pixel point corresponding to each imaging point. The two values are multiplied, and the product is used as the weight of the pixel point. The weights of all pixel points then form the imaging weight matrix.

After the imaging weights are obtained through perspective theory, each imaging point can be multiplied by its imaging weight to compensate for the imaging characteristic that near objects appear large and distant objects appear small, so that people in the image of the shooting scene have a consistent size. For example, in fig. 4, weight denotes the imaging weight, illustrated graphically: the larger the pedestrian appears, the smaller the weight; the smaller the pedestrian appears, the larger the weight. Since pedestrian 1 appears small and pedestrian 2 appears large, the imaging weight of pedestrian 1 is larger than that of pedestrian 2, and after weighting, pedestrian 1 and pedestrian 2 in fig. 4 have the same size.
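The construction of the weight matrix can be illustrated with a simplified sketch. The text does not give the exact expressions for the horizontal and vertical values, so everything below is an assumption made for illustration: a pinhole camera, a flat ground plane, an inclination angle measured downward from the horizontal, and a scale factor in each direction proportional to the slant distance of the ground point (because apparent width and height both shrink roughly as 1/distance). The function name and parameters are hypothetical.

```python
import numpy as np

def imaging_weight_matrix(height_m, depression_deg, vfov_deg, rows, cols):
    """Hypothetical per-pixel perspective weights under a flat-ground model."""
    # Focal length in pixels derived from the vertical field angle.
    f_px = (rows / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    # Offset of each pixel row from the image centre (positive = lower in image).
    row_offsets = np.arange(rows) - (rows - 1) / 2.0
    # Angle of each row's viewing ray below the horizontal.
    ray_depression = np.radians(depression_deg) + np.arctan(row_offsets / f_px)
    weights_per_row = np.zeros(rows)
    hits_ground = ray_depression > 1e-6  # rows at or above the horizon get weight 0
    if not hits_ground.any():
        return np.zeros((rows, cols))
    # Slant distance from the camera to the ground point seen by each row.
    distance = height_m / np.sin(ray_depression[hits_ground])
    # Assumed scale factor per direction, normalised to the nearest row.
    scale = distance / distance.min()
    weights_per_row[hits_ground] = scale * scale  # horizontal value x vertical value
    # In this simplified model every pixel in a row shares the same weight.
    return np.tile(weights_per_row[:, None], (1, cols))

W = imaging_weight_matrix(height_m=2.5, depression_deg=30.0, vfov_deg=40.0,
                          rows=120, cols=160)
print(W.shape, W[0, 0] > W[-1, 0])
```

Rows near the top of the image see more distant ground and therefore receive larger weights, which matches the behaviour described for pedestrian 1 and pedestrian 2.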

Step 103, a pixel matrix of an image mask of a shooting scene is acquired.

As a possible implementation, a preset foreground extraction algorithm may be used to determine an image mask of a shooting scene. The preset foreground extraction algorithm may be an interframe difference algorithm, a static difference algorithm, or other foreground extraction algorithms. In this embodiment, the image mask of the captured scene is used to block an uninteresting region in a subsequently acquired image, for example, the uninteresting region may be a background portion in the image, and the amount of operation for extracting an interesting region or a foreground from the image can be reduced by blocking the uninteresting region in the image, where the foreground may be a person in the image.

Taking the static difference algorithm as the preset foreground extraction algorithm as an example, the image captured by the shooting device for the current shooting scene is differenced against a background image and binarized, so that an edge mask of the moving human body in the image of the current shooting scene is obtained.

The background image may be the image captured by the shooting device at the previous moment, an image of the specified shooting scene with no people present, the image acquired by the shooting device at the initial time, or a denoised version of the image captured at the previous moment (for example, obtained by Gaussian filtering); this is not limited in this embodiment of the present invention.

In this embodiment, the image mask may be used to mask subsequently acquired images; when the people in a subsequently acquired image change, they can be extracted through the mask.
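Differencing against a background image followed by binarization can be sketched as below. The frames are assumed to be single-channel 8-bit arrays of equal size; the function name and the threshold value of 25 are illustrative choices, not values from the text.

```python
import numpy as np

def foreground_mask(frame, background, diff_threshold=25):
    """Binary mask (1 = changed pixel) from a frame and a background image."""
    # Work in a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # Binarization: pixels that differ by more than the threshold become 1.
    return (diff > diff_threshold).astype(np.uint8)

background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1, 1] = 200          # one pixel brightened by a moving object
print(foreground_mask(frame, background))
```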

Step 104, acquiring a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix.

In the embodiment of the invention, the pixel matrix of the image mask and the imaging weight matrix can be multiplied and summed to obtain the current weighted pixel value of the shooting scene.
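The multiply-and-sum operation is an element-wise product followed by a sum over all pixels. In the toy example below (all values are illustrative), a distant person covering 2 pixels with weight 4 and a nearby person covering 8 pixels with weight 1 contribute equally to the result.

```python
import numpy as np

def weighted_pixel_value(mask, weights):
    """Sum over all pixels of mask value times imaging weight."""
    return float(np.sum(mask.astype(np.float64) * weights))

# Illustrative 4x4 weight matrix: top row far away (weight 4), other rows near (weight 1).
weights = np.ones((4, 4))
weights[0, :] = 4.0

mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, :2] = 1            # far person: 2 pixels, contributes 2 * 4 = 8
mask[2:, :] = 1            # near person: 8 pixels, contributes 8 * 1 = 8
print(weighted_pixel_value(mask, weights))
```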

In the embodiment of the invention, the current weighted pixel value has a corresponding mapping relation with the counted number of people. After the current weighted pixel value of the shooting scene is obtained, this mapping relation can be queried to obtain the counted number of people in the shooting scene, which is simple to operate and easy to implement.

Step 105, acquiring the counted number of people in the shooting scene according to the current weighted pixel value and the mapping relation between preset weighted pixel values and numbers of people.

In the embodiment of the invention, the mapping relation between the preset weighted pixel value and the number of people is established in advance.

Optionally, the weighted pixel value and the number of people follow a curve, which may be fitted in advance using sample images, for example by polynomial fitting. The fitted curve gives the mapping relation between the weighted pixel value and the number of people. After the current weighted pixel value is obtained, the number of people corresponding to it is looked up in this mapping relation and taken as the counted number of people in the shooting scene, which is simple to operate and easy to implement.

It should be noted that when the population density in the shooting scene is large, the occlusion is serious, and therefore, the curve obtained by the fitting is nonlinear.
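The polynomial fitting can be sketched with NumPy. The calibration pairs below are synthetic values generated from an assumed quadratic relation purely for illustration; they are not measurements, and in practice they would come from sample images with known head counts. The degree of 2 and the function name are likewise assumptions.

```python
import numpy as np

# Synthetic calibration data (illustrative only): weighted pixel values and
# the head counts assumed to correspond to them. The quadratic term mimics
# the nonlinearity caused by occlusion at high density.
sample_weighted = np.array([0.0, 2000.0, 5000.0, 10000.0, 15000.0, 20000.0])
sample_counts = 0.001 * sample_weighted + 2e-8 * sample_weighted ** 2

# Fit the mapping from weighted pixel value to number of people.
coeffs = np.polyfit(sample_weighted, sample_counts, deg=2)

def estimate_count(weighted_value, coeffs=coeffs):
    """Evaluate the fitted curve and round to a non-negative whole number."""
    return max(0, int(round(float(np.polyval(coeffs, weighted_value)))))

print(estimate_count(10000.0))
```

A lookup table with interpolation would serve the same purpose; the polynomial is used here only because the text names polynomial fitting as an example.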

According to the people counting method based on video images of this embodiment, the setting parameters of the camera device in the shooting scene are acquired; an imaging weight matrix of a human body imaged in the shooting scene is acquired according to the setting parameters; a pixel matrix of an image mask of the shooting scene is acquired; a current weighted pixel value of the shooting scene is acquired according to the pixel matrix of the image mask and the imaging weight matrix; and the counted number of people in the shooting scene is acquired according to the current weighted pixel value and the mapping relation between preset weighted pixel values and numbers of people.

In this embodiment, adapting to a different shooting scene only requires the installation height, installation inclination angle and field angle of the camera device, from which a new imaging weight matrix is calculated. No resampling or detector retraining is needed, which reduces the workload and improves the applicability of the method. Furthermore, the current weighted pixel value is obtained by a weighted sum over the image mask of the current shooting scene, and the counted number of people is then obtained by looking up this value in the pre-established mapping relation. The operation is simple, avoids the computational complexity of feature detection, reduces the demand on system resources, and improves the real-time performance of the system.

To clearly illustrate the above embodiment, this embodiment provides another people counting method based on video images, and fig. 5 is a schematic flow chart of the another people counting method based on video images according to the embodiment of the present invention.

As shown in fig. 5, on the basis of the embodiment shown in fig. 1, the method includes the following steps, in which step 103 is refined into steps 203 to 209:

Step 201, acquiring setting parameters of the camera device in a shooting scene.

The setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.

Step 202, acquiring an imaging weight matrix of the human body when the human body is imaged in a shooting scene according to the setting parameters.

The elements in the imaging weight matrix correspond one to one to the pixel points in the formed image, and the value of each element is the imaging weight of the corresponding pixel point.

The execution processes of steps 201 to 202 can refer to the execution processes of steps 101 to 102 in the above embodiments, which are not described herein again.

Step 203, taking the image acquired by the camera device at the initial time as the background image.

Optionally, the image acquired by the camera device at the initial time may be used as the background image; for example, the background image is marked as image0.

Step 204, acquiring two consecutive frames of images of the shooting scene through the camera device.

The two continuous frames of images comprise a first frame of image and a second frame of image, and the acquisition time of the second frame of image is later than that of the first frame of image.

Optionally, two consecutive frames of the shooting scene, namely a first frame image and a second frame image, may be acquired by the camera device. For example, the first frame image is marked as image1 and the second frame image as image2, where the acquisition time of image2 is later than that of image1.

Step 205, performing inter-frame difference on the two consecutive frames of images and performing binarization processing to obtain a first pixel matrix of a first mask.

Optionally, inter-frame difference is performed on the two consecutive frames of images, followed by binarization processing, to obtain an edge mask of the moving human body in the second frame image; in the embodiment of the present invention this is referred to as the first pixel matrix of the first mask.

Step 206, multiplying the first pixel matrix and the imaging weight matrix and summing to obtain a pixel sum value.

Optionally, the first pixel matrix is multiplied by the imaging weight matrix and summed to obtain the pixel sum value, so that the sizes of the people in the second frame image are made consistent.

Step 207, determining whether the pixel sum value is greater than a preset threshold; if yes, performing steps 208 to 211; otherwise, performing step 212.

In the embodiment of the present invention, the threshold is set in advance and may be, for example, 10% of the pixel sum value of an imaged human body.

Optionally, when the pixel sum value is less than or equal to the preset threshold, the number of people in the first frame image and the second frame image is consistent and no additional people have appeared. In this case, the background image image0 may be updated to the second frame image image2, which keeps the background up to date. Further, the number of people in the second frame image may be set to 0, and the process proceeds to step 213. When the pixel sum value is greater than the preset threshold, the number of people in the first frame image differs from that in the second frame image, and step 208 may be triggered to count the number of people in the second frame image.
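The decision in step 207 can be sketched as follows, combining the inter-frame difference, the weighting, and the comparison against a threshold taken as 10% of the weighted pixel sum of one imaged person. The function name, the difference threshold of 25, and the value passed as `person_weighted_sum` are assumptions for illustration; frames are assumed to be single-channel 8-bit arrays.

```python
import numpy as np

def scene_changed(image1, image2, weights, person_weighted_sum,
                  diff_threshold=25, ratio=0.10):
    """Return True when the weighted inter-frame change exceeds the threshold."""
    # Step 205: inter-frame difference and binarization give the first mask.
    diff = np.abs(image2.astype(np.int16) - image1.astype(np.int16))
    first_mask = (diff > diff_threshold).astype(np.float64)
    # Step 206: multiply by the imaging weights and sum.
    pixel_sum = float(np.sum(first_mask * weights))
    # Step 207: compare with the preset threshold.
    return pixel_sum > ratio * person_weighted_sum

image1 = np.zeros((8, 8), dtype=np.uint8)
image2 = image1.copy()
image2[2:4, 2:4] = 255     # four pixels changed between the two frames
print(scene_changed(image1, image2, np.ones((8, 8)), person_weighted_sum=20.0))
```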

Step 208, performing inter-frame difference between the second frame image and the current background image and performing binarization processing to obtain a second pixel matrix of a second mask.

Optionally, inter-frame difference may be performed between the second frame image and the current background image, followed by binarization processing, to obtain the foreground mask of the second frame image; in the embodiment of the present invention this is referred to as the second pixel matrix of the second mask.

In the embodiment of the invention, the second mask is used for shielding the background of the subsequently acquired image, so that the foreground of the image, namely a person, can be extracted.

Step 209, using the second pixel matrix of the second mask as the pixel matrix of the image mask.

Further, in the embodiment of the present invention, the second pixel matrix may be compared with the first pixel matrix to obtain a third pixel matrix, and then the third pixel matrix is used as the pixel matrix of the image mask, so that the pixel matrix of the image mask with higher reliability may be obtained.

Step 210, obtaining a current weighted pixel value of the shooting scene according to the pixel matrix and the imaging weight matrix of the image mask.

Step 211, obtaining the counted number of people in the shooting scene according to the current weighted pixel value and a preset mapping relation between weighted pixel values and numbers of people.

For the execution of steps 210 to 211, reference may be made to steps 104 to 105 in the above embodiments, and details are not repeated here.
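The form of the preset mapping relation is not restated in this part of the description. As a hedged sketch only, the lookup of step 211 could be realized with a calibration table and linear interpolation; the table values, the function name people_from_weighted_value, and the interpolation itself are assumptions made for illustration.

```python
from bisect import bisect_right


def people_from_weighted_value(weighted_value, calibration):
    """Map a weighted pixel value to a head count.

    calibration is an assumed table of (weighted_pixel_value, people)
    pairs sorted by weighted_pixel_value; values between entries are
    linearly interpolated and rounded to the nearest integer.
    """
    values = [v for v, _ in calibration]
    if weighted_value <= values[0]:
        return calibration[0][1]
    if weighted_value >= values[-1]:
        return calibration[-1][1]
    i = bisect_right(values, weighted_value)
    (v0, n0), (v1, n1) = calibration[i - 1], calibration[i]
    estimate = n0 + (n1 - n0) * (weighted_value - v0) / (v1 - v0)
    return round(estimate)


# Hypothetical calibration: about 100 weighted pixels per person.
calibration_table = [(0.0, 0), (100.0, 1), (200.0, 2), (400.0, 4)]
estimated_people = people_from_weighted_value(290.0, calibration_table)
```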

Step 212, updating the background image to a second frame image.

Objects in a scene tend to change over time while the scene is being shot. When step 207 shows that the image has changed little between two adjacent sampling moments, the current scene is essentially static, and the background image can be updated with the second frame image. The background of the shooting scene is thus kept up to date over time, which makes the image recognition more accurate.

Step 213, determining whether the video has ended; if yes, performing step 214; otherwise, returning to step 204.

Optionally, when the video has not ended, the process returns to step 204 for the next round of people counting; when the video has ended, the processing flow ends.

Step 214, ending.
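The flow of steps 204 to 214 can be summarized in the following sketch. It assumes grayscale frames as nested lists, a sliding pair of consecutive frames, and a caller-supplied value_to_people function standing in for the preset mapping relation; all names and thresholds are hypothetical.

```python
def binary_diff(a, b, diff_threshold=25):
    """Binarized absolute difference of two equally sized frames."""
    return [[1 if abs(x - y) > diff_threshold else 0 for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def weighted_sum(mask, weights):
    """Element-wise product of mask and weight matrix, summed."""
    return sum(m * w for rm, rw in zip(mask, weights) for m, w in zip(rm, rw))


def count_people_over_video(frames, weights, motion_threshold, value_to_people):
    """Return one head count per frame after the initial frame."""
    frames = iter(frames)
    try:
        # The image captured at the initial moment serves as the background.
        background = previous = next(frames)
    except StopIteration:
        return []
    counts = []
    for current in frames:                                    # steps 204, 213
        first_mask = binary_diff(previous, current)           # step 205
        if weighted_sum(first_mask, weights) > motion_threshold:  # 206, 207
            second_mask = binary_diff(current, background)    # steps 208, 209
            value = weighted_sum(second_mask, weights)        # step 210
            counts.append(value_to_people(value))             # step 211
        else:
            background = current                              # step 212
            counts.append(0)  # count set to 0, as the embodiment describes
        previous = current
    return counts                                             # step 214


empty = [[0, 0, 0], [0, 0, 0]]
person = [[0, 200, 0], [0, 200, 0]]
uniform_weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
# Hypothetical mapping: one person covers a weighted value of 2.0.
video_counts = count_people_over_video(
    [empty, empty, person, person], uniform_weights, 0.5,
    lambda value: round(value / 2.0))
```

In this toy run the person is counted on entering the scene; once the scene stops changing, the second frame becomes the new background and the count is reset to 0, which follows the branch described for step 207.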

In this embodiment, for a different shooting scene, a new imaging weight matrix can be calculated from just the installation height, the installation inclination angle, and the field angle of the camera, so the method adapts to the new scene without extra work such as resampling or retraining a detector. This reduces the workload and effectively improves the applicability of the method.

Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted summation over the image mask of the current shooting scene, and the counted number of people in the shooting scene is then obtained by looking up the current weighted pixel value in the pre-established mapping relation between weighted pixel values and numbers of people. The operation is simple, avoids the computational complexity caused by feature detection, reduces the system resource requirement, and improves the real-time performance of the system.

In order to implement the above embodiments, the invention further provides a people counting device based on video images.

Fig. 6 is a schematic structural diagram of a people counting device based on video images according to an embodiment of the present invention.

As shown in fig. 6, the video image-based people counting apparatus 100 includes: a parameter obtaining module 110, a weight matrix obtaining module 120, a pixel matrix obtaining module 130, a pixel value obtaining module 140, and a people number obtaining module 150, which are described as follows.

The parameter obtaining module 110 is configured to obtain setting parameters of the camera device in a shooting scene, wherein the setting parameters include: the installation height of the camera device, the installation inclination angle, and the field angle of the camera device.

As a possible implementation manner, the parameter obtaining module 110 is specifically configured to obtain the installation height and the installation inclination angle through a measurement manner; reading an equivalent focal length of a camera device; and calculating the angle of view according to the equivalent focal length.
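The embodiment does not give the formula used to derive the field angle here. One common relation, shown only as a sketch, assumes that the equivalent focal length is a 35 mm equivalent value and that the horizontal field angle is wanted, so that the reference frame width is 36 mm; the function name field_angle_deg is hypothetical.

```python
import math


def field_angle_deg(equivalent_focal_length_mm, frame_size_mm=36.0):
    """Field angle in degrees from a 35 mm equivalent focal length.

    frame_size_mm = 36.0 is the width of a 35 mm film frame (assumed
    reference); pass 24.0 for the vertical angle of the same frame.
    """
    if equivalent_focal_length_mm <= 0:
        raise ValueError("focal length must be positive")
    half_angle = math.atan(frame_size_mm / (2.0 * equivalent_focal_length_mm))
    return math.degrees(2.0 * half_angle)


horizontal_angle = field_angle_deg(18.0)  # 18 mm equivalent focal length
```

With an 18 mm equivalent focal length the half frame width equals the focal length, so this relation yields a horizontal field angle of 90 degrees.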

As another possible implementation manner, the parameter obtaining module 110 is specifically configured to obtain the installation height through a measurement manner; determining a central point of an image acquired by a camera device; acquiring the distance between the ground point corresponding to the central point and the installation position of the camera device; determining an installation inclination angle according to the distance and the installation height; and calculating the angle of view according to the equivalent focal length.
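How the inclination angle follows from the distance and the installation height depends on conventions that this passage does not fix. The sketch below assumes the distance is measured along the ground from the point directly below the camera device, and that the inclination angle is the angle of the optical axis below the horizontal; both are assumptions, as is the name installation_tilt_deg.

```python
import math


def installation_tilt_deg(mount_height_m, ground_distance_m):
    """Angle of the optical axis below the horizontal, in degrees.

    Assumes ground_distance_m is the horizontal distance from the point
    directly below the camera to the ground point seen at the image center.
    """
    if mount_height_m <= 0 or ground_distance_m < 0:
        raise ValueError("height must be positive and distance non-negative")
    return math.degrees(math.atan2(mount_height_m, ground_distance_m))


tilt = installation_tilt_deg(3.0, 3.0)  # camera 3 m high, center point 3 m away
```

If the measured distance is instead the straight-line distance from the camera device to the ground point, the same angle would be obtained from the arcsine of the height divided by that distance.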

The weight matrix obtaining module 120 is configured to obtain an imaging weight matrix when the human body is imaged in the shooting scene according to the setting parameter; and elements in the imaging weight matrix correspond to pixel points in the formed image one by one, and the value of the element is the imaging weight of the pixel point.

As a possible implementation manner, the weight matrix obtaining module 120 is specifically configured to obtain a distance between each imaging point and the camera in a shooting scene; according to the distance, acquiring numerical values of the imaging point in the horizontal direction and the vertical direction of the corresponding pixel point in the formed image; multiplying the numerical values in the horizontal direction and the vertical direction to obtain the weight of the pixel point; and forming an imaging weight matrix by using the weight of each pixel point.
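The final two operations of this implementation, multiplying the horizontal and vertical values of each pixel point and assembling the results into a matrix, can be sketched as follows. How the horizontal and vertical values are derived from the distance is not reproduced here, so they are taken as given inputs; the example numbers are made up.

```python
def imaging_weight_matrix(horizontal_values, vertical_values):
    """Build the weight matrix as the element-wise product of two matrices.

    horizontal_values[i][j] and vertical_values[i][j] are the per-pixel
    values in the horizontal and vertical directions (assumed precomputed).
    """
    if len(horizontal_values) != len(vertical_values):
        raise ValueError("inputs must have the same number of rows")
    return [[h * v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal_values, vertical_values)]


# Made-up values: the top row stands for points farther from the camera.
horizontal = [[2.0, 2.0], [1.0, 1.0]]
vertical = [[3.0, 3.0], [1.5, 1.5]]
weight_matrix = imaging_weight_matrix(horizontal, vertical)
```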

A pixel matrix obtaining module 130, configured to obtain a pixel matrix of an image mask of a shooting scene.

As a possible implementation manner, the pixel matrix obtaining module 130 is specifically configured to obtain two continuous frames of images of a shooting scene by the camera; performing frame-to-frame difference on two continuous frames of images, and performing binarization processing to obtain a first pixel matrix of a first mask; the first pixel matrix of the first mask is used as the pixel matrix of the image mask.

As another possible implementation manner, the pixel matrix obtaining module 130 is specifically configured to multiply and sum the first pixel matrix and the imaging weight matrix to obtain a pixel sum value; if the pixel sum value is larger than a preset threshold value, performing interframe difference on the second frame image and the background image and performing binarization processing to obtain a second pixel matrix of a second mask; the second pixel matrix of the second mask is used as the pixel matrix of the image mask.

Optionally, the pixel matrix obtaining module 130 is further configured to update the background image to the second frame image when the pixel sum value is less than or equal to the preset threshold.

Optionally, the pixel matrix obtaining module 130 is further configured to use an image captured by the camera as a background image at an initial time.

As another possible implementation manner, the pixel matrix obtaining module 130 is specifically configured to perform Gaussian filtering processing on the image captured by the camera device at the previous moment to obtain a background image of the shooting scene; and perform inter-frame difference on the image acquired at the current moment and the background image, and perform binarization processing to obtain the pixel matrix of the image mask.
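This variant can be sketched as below, assuming a 3x3 Gaussian kernel with edge replication and a binarization threshold of 25. The kernel size, the border handling, the threshold, and the function names are not specified by the embodiment and are chosen only for illustration.

```python
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # 3x3 Gaussian kernel, sums to 16


def gaussian_blur_3x3(image):
    """Smooth a grayscale image with a 3x3 Gaussian kernel (edges replicated)."""
    rows, cols = len(image), len(image[0])
    blurred = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp to image border
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += KERNEL[dr + 1][dc + 1] * image[rr][cc]
            out_row.append(acc / 16)
        blurred.append(out_row)
    return blurred


def mask_against_blurred_background(previous_image, current_image,
                                    diff_threshold=25):
    """Blur the previous image into a background, then difference and binarize."""
    background = gaussian_blur_3x3(previous_image)
    return [[1 if abs(c - b) > diff_threshold else 0
             for c, b in zip(cur_row, bg_row)]
            for cur_row, bg_row in zip(current_image, background)]


previous = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
current = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
image_mask = mask_against_blurred_background(previous, current)
```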

The pixel value obtaining module 140 is configured to obtain a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix.

The people number obtaining module 150 is configured to obtain a statistical number of people in the shooting scene according to the current weighted pixel value and a mapping relationship between a preset weighted pixel value and the number of people.

Further, in a possible implementation manner of the embodiment of the present invention, referring to fig. 7, on the basis of the embodiment shown in fig. 6, the apparatus 100 for counting people based on video images may further include: an updating module 160.

The updating module 160 is configured to, after the inter-frame difference between the second frame image and the current background image is performed and the binarization processing is performed to obtain the second pixel matrix of the second mask, perform an OR operation on the second pixel matrix and the first pixel matrix to obtain the pixel matrix of the image mask.
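Claim 3 recites combining the second pixel matrix with the first pixel matrix by an OR operation. A minimal sketch of such an element-wise OR on binary masks, with the hypothetical name or_masks, is:

```python
def or_masks(second_mask, first_mask):
    """Element-wise logical OR of two binary masks of the same shape."""
    if len(second_mask) != len(first_mask):
        raise ValueError("masks must have the same number of rows")
    return [[1 if (a or b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(second_mask, first_mask)]


foreground_mask = [[0, 1, 0], [0, 1, 0]]   # second mask (foreground)
motion_edge_mask = [[1, 0, 0], [0, 1, 0]]  # first mask (moving edges)
combined_mask = or_masks(foreground_mask, motion_edge_mask)
```

A pixel is kept in the combined mask when either mask marks it, so foreground pixels missed by one of the two differences are still retained.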

It should be noted that the foregoing explanation of the embodiment of the method for counting people based on video images is also applicable to the apparatus 100 for counting people based on video images of this embodiment, and will not be described herein again.

The people counting device based on video images of this embodiment acquires the setting parameters of the camera device in the shooting scene; acquires, according to the setting parameters, an imaging weight matrix of a human body when the human body is imaged in the shooting scene; acquires a pixel matrix of an image mask of the shooting scene; acquires a current weighted pixel value of the shooting scene according to the pixel matrix of the image mask and the imaging weight matrix; and acquires the counted number of people in the shooting scene according to the current weighted pixel value and a preset mapping relation between weighted pixel values and numbers of people. In this embodiment, for a different shooting scene, a new imaging weight matrix can be calculated from just the installation height, the installation inclination angle, and the field angle of the camera, so the method adapts to the new scene without extra work such as resampling or retraining a detector. This reduces the workload and effectively improves the applicability of the method. Furthermore, the current weighted pixel value of the shooting scene is obtained by a weighted summation over the image mask of the current shooting scene, and the counted number of people in the shooting scene is then obtained by looking up the current weighted pixel value in the pre-established mapping relation between weighted pixel values and numbers of people. The operation is simple, avoids the computational complexity caused by feature detection, reduces the system resource requirement, and improves the real-time performance of the system.

In order to implement the foregoing embodiment, the present invention further provides a computer device, including: a processor and a memory; wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the video image-based people counting method as proposed by the aforementioned embodiment of the present invention.

In order to achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a video image-based people counting method as proposed by the foregoing embodiments of the present invention.

In order to implement the above embodiments, the present invention further proposes a computer program product, wherein instructions of the computer program product, when executed by a processor, implement the video image-based people counting method as proposed by the foregoing embodiments of the present invention.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A people counting method based on video images is characterized by comprising the following steps:

acquiring a pixel matrix of an image mask of the shooting scene, the acquiring the pixel matrix of the image mask of the shooting scene comprising: acquiring two continuous frames of images of the shooting scene by the camera device; performing interframe difference on the two continuous frames of images, performing binarization processing to obtain a first pixel matrix of a first mask, multiplying the first pixel matrix by the imaging weight matrix, and summing to obtain a pixel sum value; if the pixel sum value is larger than a preset threshold value, performing interframe difference on the second frame image and the background image and performing binarization processing to obtain a second pixel matrix of a second mask; using a second pixel matrix of the second mask as a pixel matrix of the image mask;

2. The method of claim 1, wherein the acquiring a pixel matrix of an image mask of the captured scene comprises:

if the pixel sum value is less than or equal to the preset threshold value, the first pixel matrix of the first mask is used as the pixel matrix of the image mask.

3. The method according to claim 1, wherein after the inter-frame difference between the second frame image and the current background image is performed and the binarization processing is performed to obtain the second pixel matrix of the second mask, the method further comprises:

and performing an OR operation on the second pixel matrix and the first pixel matrix to obtain a pixel matrix of the image mask.

4. The method of claim 1, further comprising:

and if the pixel sum value is less than or equal to the preset threshold value, updating the background image into the second frame image.

5. The method according to claim 1, wherein before the inter-frame differencing and binarizing the two consecutive images to obtain the first pixel matrix of the first mask, the method further comprises:

and at the initial moment, taking the image collected by the camera device as the background image.

6. The method of claim 1, wherein the acquiring a pixel matrix of an image mask of the captured scene comprises:

performing Gaussian filtering processing on the image captured by the camera device at the previous moment to obtain a background image of the shooting scene;

and performing interframe difference on the image acquired at the current moment and the background image, and performing binarization processing to obtain a pixel matrix of the image mask.

7. The method according to any one of claims 1 to 6, wherein the obtaining an imaging weight matrix of the human body when the human body is imaged in the shooting scene according to the setting parameters comprises:

acquiring the distance between each imaging point and the camera device in the shooting scene;

according to the distance, acquiring numerical values of the imaging point in the horizontal direction and the vertical direction of the corresponding pixel point in the formed image;

multiplying the numerical values in the horizontal direction and the vertical direction to obtain the weight of the pixel point;

and forming the imaging weight matrix by using the weight of each pixel point.

8. The method according to any one of claims 1 to 6, wherein the acquiring of the setting parameters of the camera in the shooting scene comprises:

acquiring the installation height and the installation inclination angle in a measuring mode;

reading the equivalent focal length of the camera device;

and calculating to obtain the field angle according to the equivalent focal length.

9. The method according to any one of claims 1 to 6, wherein the acquiring of the setting parameters of the camera in the shooting scene comprises:

acquiring the installation height in a measuring mode;

determining a central point of an image acquired by the camera device;

acquiring the distance between the ground point corresponding to the central point and the installation position of the camera device;

determining the installation inclination angle according to the distance and the installation height;

and calculating the angle of view according to the equivalent focal length.

10. A video image-based people counting device, comprising:

a pixel matrix acquisition module, configured to acquire a pixel matrix of an image mask of the shooting scene, wherein the pixel matrix acquisition module is specifically configured to: acquire two continuous frames of images of the shooting scene by the camera device; perform inter-frame difference on the two continuous frames of images and perform binarization processing to obtain a first pixel matrix of a first mask; multiply the first pixel matrix by the imaging weight matrix and sum to obtain a pixel sum value; if the pixel sum value is larger than a preset threshold value, perform inter-frame difference on the second frame image and the background image and perform binarization processing to obtain a second pixel matrix of a second mask; and use the second pixel matrix of the second mask as the pixel matrix of the image mask;

11. A computer device comprising a processor and a memory;

wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the video image-based people counting method as claimed in any one of claims 1 to 9.

12. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the video image-based people counting method according to any one of claims 1 to 9.

13. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, implement the video image-based people counting method according to any one of claims 1 to 9.