CN114581854A - Crowd counting method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114581854A
Authority
CN
China
Prior art keywords: target, people, image, crowd, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210273018.3A
Other languages
Chinese (zh)
Inventor
杨昆霖
刘诗男
侯军
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202210273018.3A priority Critical patent/CN114581854A/en
Publication of CN114581854A publication Critical patent/CN114581854A/en
Priority to PCT/CN2022/100179 priority patent/WO2023173616A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure relates to a crowd counting method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a plurality of crowd images, where each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area; determining the number of people in the preset statistical area corresponding to each image acquisition device based on head key point positioning performed on each of the plurality of crowd images; and determining the number of people in a target area according to the number of people in the preset statistical area corresponding to each image acquisition device, where the target area represents a spatial range in which crowd statistics need to be performed and includes the preset statistical area corresponding to each image acquisition device. The embodiments of the present disclosure can effectively improve the accuracy of crowd counting in scenes with a large area and dense crowds.

Description

Crowd counting method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a crowd counting method and apparatus, an electronic device, and a storage medium.
Background
With population growth and the acceleration of urbanization, large crowd gatherings occur more and more frequently and their scale keeps growing. Crowd analysis is of great significance for public safety and city planning. Common crowd analysis tasks include crowd counting, crowd behavior analysis, crowd positioning and the like, among which crowd counting plays a very important role in the management and control of crowd gathering behaviors. For example, counting the number of people in a dining hall during each time period allows diners to reasonably arrange their own dining time, effectively reducing unnecessary waiting and crowding. Therefore, how to perform crowd statistics accurately has become an urgent problem to be solved.
Disclosure of Invention
The disclosure provides a crowd counting method and device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a crowd counting method, including: acquiring a plurality of crowd images, where each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area; determining the number of people in the preset statistical area corresponding to each image acquisition device based on head key point positioning performed on each of the plurality of crowd images; and determining the number of people in a target area according to the number of people in the preset statistical area corresponding to each image acquisition device, where the target area represents a spatial range in which crowd statistics need to be performed and includes the preset statistical area corresponding to each image acquisition device.
In the embodiments of the present disclosure, a plurality of crowd images are acquired by a plurality of image acquisition devices, which improves the coverage of the acquired images and effectively reduces missed detection of people; meanwhile, the positions of the target head key points in each crowd image are obtained through head key point positioning, which effectively reduces missed detection of people caused by occlusion by other people or by objects such as dining tables. Therefore, the embodiments of the present disclosure can effectively improve the accuracy of crowd counting in scenes with a large area and dense crowds.
In one possible implementation, the method further includes: configuring a preset statistical area of each image acquisition device; the number of people in the target area is determined according to the number of people in the preset statistical area corresponding to each image acquisition device, and the method comprises the following steps: and determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device.
Therefore, the number of people in the target region is counted conveniently by configuring the preset counting region of each image acquisition device.
In a possible implementation manner, in a case that the preset statistical regions of the image capturing devices do not overlap with each other and form the target region after being spliced, determining the number of people in the target region based on the number of people in the preset statistical region corresponding to each image capturing device includes: taking the sum of the number of people in the preset statistical region corresponding to each image capturing device as the number of people in the target region.
In this way, neither repeated or erroneous counting nor missed counting occurs, and the statistical result is accurate.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and can cover the target area after being spliced, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes: determining the number of people in the target area according to the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeatedly counted number of people in the target area; wherein the repeatedly counted number of people in the target area is determined by: for the preset statistical area corresponding to each image acquisition device, mapping the head key points in the preset statistical area to the target area according to the mapping relationship between the preset statistical area and the target area, to obtain the positions, in the target area, of the head key points in the preset statistical area; and determining the repeatedly counted number of people in the target area according to the positions, in the target area, of the head key points in each preset statistical area.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and form the target area after being spliced, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes: determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeatedly counted number of people in the target area. In a case that the preset statistical areas of the image capturing devices at least partially overlap and the spliced area is larger than the target area, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes: determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device, on the one hand, and the repeatedly counted number of people in the target area together with the number of people outside the target area, on the other hand; wherein the number of people outside the target area is determined based on the positions of the head key points in each preset statistical area relative to the target area.
In this way, neither repeated or erroneous counting nor missed counting occurs, and the statistical result is accurate.
In one possible implementation, the method further includes: acquiring a preset seat number of the target area; and determining the seating rate in the target area according to the number of people in the target area and the preset seat number.
In a possible implementation manner, the obtaining a preset number of seats of the target area includes: acquiring a plurality of spatial images, wherein each spatial image corresponds to an image acquisition device, and the spatial images represent images acquired in an unmanned scene; determining the number of seats in a preset statistical area corresponding to each image acquisition device based on target recognition respectively performed on the plurality of space images; and determining the preset seat number of the target area according to the seat number in the preset statistical area corresponding to each image acquisition device.
In a possible implementation manner, the determining the number of people in the preset statistical area corresponding to each image capturing device based on the head key point positioning performed on the plurality of people images respectively includes: obtaining a target positioning diagram corresponding to each crowd image based on the head key point positioning of the crowd images, wherein the target positioning diagram is used for indicating the position of a target head key point included in the corresponding crowd image; and determining the number of people in a preset statistical area corresponding to each image acquisition device based on the target positioning map corresponding to each crowd image.
In a possible implementation manner, the determining, based on the target location map corresponding to each of the crowd images, the number of people in a preset statistical area corresponding to each of the image capturing devices includes: determining a target head key point in each preset statistical area according to the position of each preset statistical area in the corresponding crowd image and the position of the target head key point in the crowd image corresponding to each preset statistical area; and determining the number of the key points of the head of the target person in each preset statistical area as the number of people in each preset statistical area.
In a possible implementation manner, the obtaining a target positioning map corresponding to each crowd image based on the head key point positioning performed on each of the plurality of crowd images includes: performing head key point positioning on a first crowd image, and determining a predicted positioning map corresponding to the first crowd image, where the first crowd image represents a crowd image acquired by a first image acquisition device among the plurality of crowd images, the first image acquisition device represents any one of the image acquisition devices, and the predicted positioning map is used for indicating the prediction confidence that each pixel point in the first crowd image is a head key point; performing image processing on the predicted positioning map based on a preset confidence threshold to obtain an initial positioning map, where the initial positioning map is used for indicating the positions of the initial head key points included in the first crowd image; determining a target neighborhood corresponding to each initial head key point in the initial positioning map; and filtering, based on the predicted positioning map, the target neighborhood corresponding to each initial head key point to obtain the target positioning map corresponding to the first crowd image.
In one possible implementation manner, the determining a target neighborhood corresponding to each of the initial head key points in the initial positioning map includes: determining the target neighborhood corresponding to each initial head key point according to a preset neighborhood radius, where the preset neighborhood radius is determined based on the position of the initial head key point in the first crowd image and a preset perspective mapping relationship corresponding to the first crowd image, and the preset perspective mapping relationship corresponding to the first crowd image is used for indicating the image scales corresponding to different positions in the first crowd image.
In a possible implementation manner, the filtering, based on the predicted positioning map, the target neighborhood corresponding to each initial head key point to obtain the target positioning map includes: for any initial head key point i, determining whether at least one other initial head key point exists in the target neighborhood corresponding to the initial head key point i; in a case that at least one other initial head key point j exists in the target neighborhood corresponding to the initial head key point i, determining, based on the predicted positioning map, the prediction confidence corresponding to the initial head key point i and the prediction confidence corresponding to the at least one other initial head key point j; and determining, as the target head key point in the target neighborhood corresponding to the initial head key point i, the key point with the maximum prediction confidence among the initial head key point i and the at least one other initial head key point j.
According to an aspect of the present disclosure, there is provided a crowd counting apparatus, comprising:
a first acquisition module, configured to acquire a plurality of crowd images, wherein each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area;
the first determining module is used for determining the number of people in a preset statistical area corresponding to each image acquisition device based on the head key point positioning of the plurality of crowd images;
the second determining module is used for determining the number of people in a target area according to the number of people in a preset statistical area corresponding to each image acquisition device, wherein the target area represents a space range in which people statistics needs to be carried out, and the target area comprises the preset statistical area corresponding to each image acquisition device.
In one possible implementation, the apparatus further includes:
the configuration module is used for configuring a preset statistical area of each image acquisition device;
the second determining module is further configured to determine the number of people in the target area based on the number of people in a preset statistical area corresponding to each image acquisition device.
In a possible implementation manner, in a case that the preset statistical regions of the image capturing devices do not overlap with each other and form the target region after being spliced, determining the number of people in the target region based on the number of people in the preset statistical region corresponding to each image capturing device includes:
and taking the sum of the number of people in the preset statistical area corresponding to each image acquisition device as the number of people in the target area.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and can cover the target area after being spliced, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes:
determining the number of people in the target area according to the sum of the number of people in a preset counting area corresponding to each image acquisition device and the repeated counting number of people in the target area;
wherein the number of repeated statistics in the target area is determined by:
for a preset statistical area corresponding to each image acquisition device, mapping head key points in the preset statistical area to the target area according to the mapping relation between the preset statistical area and the target area to obtain the position of the head key points in the preset statistical area in the target area;
and determining the repeated statistical number in the target area according to the positions of the head key points in the preset statistical areas in the target area.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and are spliced to form the target area, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes:
determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeated statistical number of people in the target area;
determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device under the condition that the preset statistical areas of the image acquisition devices are at least partially overlapped and the spliced area is larger than the target area, and the method comprises the following steps:
determining the number of people in the target area according to the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the difference between the number of people repeatedly counted in the target area and the number of people outside the target area;
and determining the number of people outside the target area based on the positions of the head key points in each preset statistical area relative to the target area.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a preset seat number of the target area;
and the third determining module is used for determining the seating rate in the target area according to the number of people in the target area and the preset seat number.
In a possible implementation manner, the second obtaining module is further configured to:
acquiring a plurality of spatial images, wherein each spatial image corresponds to an image acquisition device, and the spatial images represent images acquired in an unmanned scene;
determining the number of seats in a preset statistical area corresponding to each image acquisition device based on target recognition respectively performed on the plurality of space images;
and determining the preset seat number of the target area according to the seat number in the preset statistical area corresponding to each image acquisition device.
In one possible implementation manner, the first determining module is further configured to:
obtaining a target positioning diagram corresponding to each crowd image based on the head key point positioning of the crowd images, wherein the target positioning diagram is used for indicating the position of a target head key point included in the corresponding crowd image;
and determining the number of people in a preset statistical area corresponding to each image acquisition device based on the target positioning map corresponding to each crowd image.
In a possible implementation manner, the determining, based on the target location map corresponding to each of the crowd images, the number of people in a preset statistical area corresponding to each of the image capturing devices includes:
determining a target head key point in each preset statistical area according to the position of each preset statistical area in the corresponding crowd image and the position of the target head key point in the crowd image corresponding to each preset statistical area;
and determining the number of the key points of the head of the target person in each preset statistical area as the number of people in each preset statistical area.
In a possible implementation manner, the obtaining a target positioning map corresponding to each crowd image based on the head key point positioning performed on the plurality of crowd images respectively includes:
performing head key point positioning on a first crowd image, and determining a predicted positioning map corresponding to the first crowd image, wherein the first crowd image represents a crowd image acquired by first image acquisition equipment in the crowd images, the first image acquisition equipment represents any one of the image acquisition equipment, and the predicted positioning map is used for indicating the prediction confidence that each pixel point in the first crowd image is a head key point;
based on a preset confidence threshold value, carrying out image processing on the predicted positioning map to obtain an initial positioning map, wherein the initial positioning map is used for indicating the position of an initial head key point included in the first crowd image;
determining a target neighborhood corresponding to each initial head key point in the initial positioning map;
and based on the predicted positioning map, filtering a target neighborhood corresponding to each initial human head key point to obtain a target positioning map corresponding to the first crowd image.
In one possible implementation manner, the determining a target neighborhood corresponding to each of the initial head keypoints in the initial positioning map includes:
and determining a target neighborhood corresponding to each initial head key point according to a preset neighborhood radius, wherein the preset neighborhood radius is determined based on the position of the initial head key point in the first crowd image and a preset perspective mapping relationship corresponding to the first crowd image, and the preset perspective mapping relationship corresponding to the first crowd image is used for indicating the image scales corresponding to different positions in the first crowd image.
In a possible implementation manner, the filtering, based on the predicted location map, a target neighborhood corresponding to each initial human head key point to obtain the target location map includes:
for any initial head key point i, determining whether at least one other initial head key point exists in the target neighborhood corresponding to the initial head key point i;
in a case that at least one other initial head key point j exists in the target neighborhood corresponding to the initial head key point i, determining, based on the predicted positioning map, the prediction confidence corresponding to the initial head key point i and the prediction confidence corresponding to the at least one other initial head key point j;
and determining, as the target head key point in the target neighborhood corresponding to the initial head key point i, the key point with the maximum prediction confidence among the initial head key point i and the at least one other initial head key point j.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a crowd counting method in accordance with an embodiment of the present disclosure.
Fig. 2 shows an exemplary schematic diagram of the preset statistical region.
Fig. 3 shows an exemplary schematic diagram of a preset statistical region and a corresponding crowd image.
FIG. 4 is a schematic diagram illustrating a first crowd image and its corresponding preset perspective mapping relationship according to an embodiment of the present disclosure.
FIG. 5 shows a block diagram of a crowd counting apparatus in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure.
FIG. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, an image of a certain spatial range is generally acquired by a single image acquisition device, and the acquired image is input into a counting model to obtain the number of people in the image. However, for a spatial range with a large area and dense crowds, such as a dining hall dining area, a railway station waiting hall or a square, images acquired in this way may have incomplete coverage, and pedestrians in the images may be severely occluded by other people and/or objects (e.g., dining tables), so that missed detection is serious and the accuracy of crowd statistics is low.
The embodiments of the present disclosure provide a crowd counting method, which can be applied to crowd counting in a spatial range with a large area and dense crowds. In the embodiments of the present disclosure, a plurality of crowd images are acquired by a plurality of image acquisition devices, which improves the coverage of the acquired images and effectively reduces missed detection of people; meanwhile, the positions of the target head key points in each crowd image are obtained through head key point positioning, which effectively reduces missed detection of people caused by occlusion by other people or by objects such as dining tables. Therefore, the embodiments of the present disclosure can effectively improve the accuracy of crowd counting in scenes with a large area and dense crowds.
FIG. 1 shows a flow diagram of a crowd counting method in accordance with an embodiment of the present disclosure. The crowd counting method may be performed by an electronic device such as a terminal device or a server, with a processor calling computer readable instructions stored in a memory, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. Alternatively, the crowd counting method may be performed by a server. As shown in FIG. 1, the crowd counting method may include:
in step S11, a plurality of crowd images are obtained, where each crowd image corresponds to an image capturing device, and each image capturing device corresponds to a preset statistical area.
In the embodiments of the present disclosure, a plurality of image acquisition devices are arranged for the spatial range (which may be referred to as the target area) in which crowd counting needs to be performed. For example, the image acquisition device includes, but is not limited to, a camera, a video camera, a surveillance camera, or another device with a photographing or video recording function. Each image acquisition device may be configured to acquire crowd images. A crowd image is an image containing a dense crowd; it may be obtained by an image acquisition device performing image acquisition on a dense crowd in a certain spatial range, may be a key image frame containing a dense crowd extracted from a video, or may be obtained in other ways, which is not specifically limited by the present disclosure.
The preset statistical Region may be used to represent a Region Of Interest (ROI) Of the image capturing device. The shape of the preset statistical region may be set as required, for example, the preset statistical region may be a rectangle, a circle, an ellipse, or other irregular polygons, which is not limited in this disclosure.
In the embodiment of the present disclosure, a corresponding preset statistical region may be configured for each image capturing device.
In a possible implementation manner, there is no overlap between the preset statistical regions corresponding to any two image acquisition devices, and the preset statistical regions corresponding to the image acquisition devices are spliced to form the target region. The preset statistical regions corresponding to any two image acquisition devices are not overlapped, so that repeated statistics of a certain region in the target region can be avoided. The preset statistical regions corresponding to the image acquisition devices can be spliced to obtain the target region, and a certain region in the target region can be prevented from being omitted. Therefore, the accuracy of the crowd statistics can be effectively improved.
Fig. 2 shows an exemplary schematic diagram of the preset statistical region. As shown in fig. 2, the target area is provided with four image capturing devices, which are respectively: an image capturing apparatus 1, an image capturing apparatus 2, an image capturing apparatus 3, and an image capturing apparatus 4. The preset statistical regions corresponding to the four image acquisition devices are respectively as follows: a preset statistical region 1, a preset statistical region 2, a preset statistical region 3, and a preset statistical region 4. As shown in fig. 2, any two of the preset statistical area 1 to the preset statistical area 4 are not overlapped, and the preset statistical area 1 to the preset statistical area 4 can be spliced to obtain the target area.
In another possible implementation manner, the preset statistical regions of the image acquisition devices at least partially overlap and can cover the target region after being spliced. In this case, the region obtained by splicing the preset statistical regions may exactly form the target region, or may be larger than the target region. Because the preset statistical regions corresponding to the image acquisition devices can be spliced to cover the target region, even though they at least partially overlap, no region within the target region is omitted, which improves the accuracy of crowd statistics.
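For illustration, the following is a minimal sketch (not taken from the patent text) of how the preset statistical region of each image acquisition device might be configured; the polygon representation, coordinate frame, and homography field are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class CameraConfig:
    camera_id: str
    # Preset statistical region, given as a polygon in the camera's image coordinates.
    roi_polygon_image: List[Point]
    # Assumed 3x3 homography mapping image coordinates to target-area (floor-plan)
    # coordinates; used later when overlapping regions must be de-duplicated.
    homography_to_target: List[List[float]]

# Example layout with four devices, as in Fig. 2 (all numeric values are made up).
configs = [
    CameraConfig("camera_1",
                 [(0.0, 0.0), (960.0, 0.0), (960.0, 540.0), (0.0, 540.0)],
                 [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    # camera_2, camera_3 and camera_4 would be configured analogously.
]
```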
In step S12, determining the number of people in the preset statistical area corresponding to each image capturing device based on the head key point positioning performed on each of the plurality of crowd images.
In one possible implementation, step S12 may include: obtaining a target positioning diagram corresponding to each crowd image based on the head key point positioning of the crowd images, wherein the target positioning diagram is used for indicating the position of a target head key point included in the corresponding crowd image; and determining the number of people in a preset statistical area corresponding to each image acquisition device based on the target positioning map corresponding to each crowd image.
By performing head key point positioning on a crowd image, a target positioning map indicating the positions of the target head key points included in the crowd image can be obtained end-to-end. The specific process of head key point positioning will be described in detail later in conjunction with possible implementations of the present disclosure, and is not repeated here. A head key point may be the center point of a human head, or may be another preset key point of the human head, which is not limited by the present disclosure.
Each image acquisition device acquires a crowd image, and each image acquisition device corresponds to a preset statistical area. Correspondingly, each crowd image corresponds to a preset statistical area. The target positioning graph corresponding to the crowd image indicates the positions of the key points of the head of the target person included in the crowd image. Therefore, the position of each person in the crowd image can be determined based on the target positioning map corresponding to the crowd image.
Considering that the preset statistical areas corresponding to the image acquisition devices can jointly cover the target area, when counting people in the embodiments of the present disclosure, only the people located within the preset statistical area of each crowd image are counted; therefore, no region within the target area is omitted during people counting.
In a possible implementation manner, determining, based on the target positioning map corresponding to each of the crowd images, the number of people in the preset statistical area corresponding to each of the image capturing devices may include: determining a target head key point in each preset statistical area according to the position of each preset statistical area in the corresponding crowd image and the position of the target head key point in the crowd image corresponding to each preset statistical area; and determining the number of the key points of the head of the target person in each preset statistical area as the number of people in each preset statistical area.
Fig. 3 shows an exemplary schematic diagram of a preset statistical region and a corresponding crowd image. As shown in fig. 3, for any crowd image, 9 target head key points are identified in the crowd image through head key point positioning; according to the position of the preset statistical area in the crowd image and the positions of the target head key points in the crowd image, it can be determined that 5 target head key points fall within the preset statistical area and 4 target head key points fall outside it, so the number of people in the preset statistical area can be determined to be 5.
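As an illustration of this counting step, the following minimal sketch (an assumption-laden example, not the patent's implementation) counts how many target head key points fall inside a preset statistical region given as a polygon in image coordinates, using a standard ray-casting point-in-polygon test.

```python
def point_in_polygon(x, y, polygon):
    """Return True if point (x, y) lies inside the polygon given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_people_in_region(head_keypoints, roi_polygon):
    """head_keypoints: (x, y) target head key points taken from the target positioning map;
    roi_polygon: preset statistical region of one image acquisition device."""
    return sum(point_in_polygon(x, y, roi_polygon) for x, y in head_keypoints)
```

In the Fig. 3 example, 9 key points would be passed in and the function would return 5, since 5 of them fall inside the region.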
In step S13, determining the number of people in a target area according to the number of people in the preset statistical area corresponding to each image capturing device, where the target area represents a spatial range in which people statistics needs to be performed, and the target area includes the preset statistical area corresponding to each image capturing device.
In one possible implementation, before performing step S13, the method further includes: and configuring a preset statistical area of each image acquisition device. Based on this, step S13 may include: and determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device.
In a possible implementation manner, the configured preset statistical regions of the image acquisition devices do not overlap with each other and form the target region after being spliced. In this case, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device may include: determining the sum of the number of people in the preset statistical area corresponding to each image acquisition device as the number of people in the target area. In this case, neither repeated counting nor missed counting occurs, and the statistical result is accurate.
In a possible implementation manner, the configured preset statistical regions of the image acquisition devices at least partially overlap and can cover the target region after being spliced. In this case, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device may include: determining the number of people in the target area according to the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeatedly counted number of people in the target area.
Wherein, the repeated statistical population of the target area can be determined by the following steps: for a preset statistical area corresponding to each image acquisition device, mapping head key points in the preset statistical area to the target area according to the mapping relation between the preset statistical area and the target area to obtain the position of the head key points in the preset statistical area in the target area; and determining the repeated statistical number in the target area according to the positions of the head key points in the preset statistical areas in the target area.
It can be understood that, after an image acquisition device has been installed and commissioned, the spatial region corresponding to the images it acquires is fixed. Therefore, in the embodiments of the present disclosure, the position, in the image acquired by the image acquisition device, of each pixel point of the target region can be obtained, that is, the mapping relationship between the image acquired by the image acquisition device and the target region can be obtained. Based on this mapping relationship and the position of the preset statistical area in the image acquired by the image acquisition device, the mapping relationship between the preset statistical area corresponding to the image acquisition device and the target area can be obtained. Then, according to this mapping relationship, the head key points in the preset statistical area can be mapped into the target area. If a plurality of head key points are mapped to the same position in the target area, repeated counting has occurred, and the repeatedly counted number of people in the target area can be increased by one. If a head key point cannot be mapped into the target area, the head key point is located outside the target area and needs to be removed when counting the number of people in the target area; in this case, the number of people outside the target area can be increased by one.
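A minimal sketch of this mapping step is given below, under two assumptions not stated in the text: that the mapping relationship is available as a 3x3 homography per camera, and that two mapped key points closer than a small merge radius (in target-area units) are treated as the same person. It reuses the point_in_polygon helper from the earlier sketch.

```python
import numpy as np

def map_to_target(points_xy, homography):
    """Project (x, y) image points into target-area coordinates via a 3x3 homography."""
    pts = np.hstack([np.asarray(points_xy, dtype=float), np.ones((len(points_xy), 1))])
    mapped = pts @ np.asarray(homography, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]

def duplicate_and_outside_counts(per_camera_keypoints, homographies,
                                 target_polygon, merge_radius=0.3):
    """Count key points repeated across cameras and key points that map outside the
    target area (merge_radius is an illustrative threshold)."""
    accepted, duplicates, outside = [], 0, 0
    for keypoints, H in zip(per_camera_keypoints, homographies):
        if not keypoints:
            continue
        for p in map_to_target(keypoints, H):
            if not point_in_polygon(p[0], p[1], target_polygon):
                outside += 1          # located outside the target area
            elif any(np.linalg.norm(p - q) < merge_radius for q in accepted):
                duplicates += 1       # already counted from another camera
            else:
                accepted.append(p)
    return duplicates, outside
```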
In the case that the preset statistical areas of the image acquisition devices at least partially overlap, repeated counting of people may occur; therefore, the repeatedly counted number of people in the target area needs to be determined and removed.
In one example, the preset statistical regions of the image capturing devices at least partially overlap and are spliced to exactly form the target region. In this case, no head key point falls outside the target area, and therefore the number of people in the target area can be determined as the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeatedly counted number of people in the target area. In this way, neither repeated or erroneous counting nor missed counting occurs, and the statistical result is accurate.
In another example, the preset statistical regions of the image capturing devices at least partially overlap, and the region obtained after splicing them is larger than the target region. In this case, some head key points may be located outside the target area, and therefore the number of people in the target area can be determined according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device, on the one hand, and the repeatedly counted number of people in the target area together with the number of people outside the target area, on the other hand. The number of people outside the target area is determined based on the positions of the head key points in each preset statistical area relative to the target area, as introduced above, and details are not repeated here. In this way, neither repeated or erroneous counting nor missed counting occurs, and the statistical result is accurate.
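The three configurations described above can be aggregated with one formula, sketched below; region_counts, duplicates and outside denote the quantities produced by the previous sketches.

```python
def total_people(region_counts, duplicates=0, outside=0):
    # Non-overlapping regions that exactly tile the target area: duplicates and
    # outside are both zero, so the total is simply the sum of per-region counts.
    # Overlapping regions that exactly cover the target area: subtract duplicates.
    # Overlapping regions whose union is larger than the target area: additionally
    # subtract the key points that fell outside the target area.
    return sum(region_counts) - duplicates - outside
```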
In the embodiment of the disclosure, a plurality of image acquisition devices are adopted to acquire a plurality of crowd images, so that the coverage of the acquired images is improved, and missing detection of personnel is effectively reduced; meanwhile, the number of people in the preset statistical area corresponding to each image acquisition device is determined by adopting the head key point positioning, so that the missing detection of people caused by the shielding of other people or a dining table is effectively reduced. Therefore, in the embodiment of the disclosure, the accuracy of crowd counting in a scene with a large area and dense crowds can be effectively improved.
In one possible implementation manner, the demographic method provided by the present disclosure may further include: acquiring a preset seat number of the target area; and determining the seating rate in the target area according to the number of people in the target area and the preset seat number.
The seating rate refers to the usage rate of seats. In the embodiments of the present disclosure, the ratio of the number of people in the target area to the preset seat number of the target area may be determined as the seating rate of the target area.
The preset seat number of the target area may be manually input by a user, acquired from a database of the target area, or automatically detected from images acquired by the image acquisition devices, which is not limited in the embodiments of the present disclosure.
In one possible implementation, a plurality of spatial images are acquired, each spatial image corresponding to an image acquisition device, wherein the spatial images represent images acquired in an unmanned scene; determining the number of seats in a preset statistical area corresponding to each image acquisition device based on target recognition respectively performed on the plurality of space images; and determining the preset seat number of the target area according to the seat number in the preset statistical area corresponding to each image acquisition device.
For a scene in which chairs may be occluded by tables (e.g., a restaurant), the target recognition performed on the spatial image may be to identify the tables in the spatial image. The number of seats in the preset statistical area of the spatial image can then be obtained from the number of tables in the preset statistical area and the number of chairs provided for each table. For a scene without tables (such as a waiting hall), the target recognition performed on the spatial image may be to identify the chairs in the spatial image, based on which the number of seats in the preset statistical area of the spatial image can be obtained. For the target recognition method applied to the spatial image, reference may be made to the related art, and details are not described here.
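A minimal sketch of this seat-counting step follows, under the assumptions that a table detector returns one bounding box per table in the unmanned spatial image and that every table is equipped with a fixed number of chairs (chairs_per_table is an illustrative parameter, and point_in_polygon is the helper from the earlier sketch).

```python
def seats_in_region(table_boxes, roi_polygon, chairs_per_table=4):
    """table_boxes: (x_min, y_min, x_max, y_max) table detections in the spatial image.
    A table is attributed to the region when its box centre falls inside the ROI polygon."""
    centres = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in table_boxes]
    tables = sum(point_in_polygon(cx, cy, roi_polygon) for cx, cy in centres)
    return tables * chairs_per_table
```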
For the process of determining the preset seat number of the target area according to the number of seats in the preset statistical area corresponding to each image acquisition device, reference may be made to the process of determining the number of people in the target area according to the number of people in the preset statistical area corresponding to each image acquisition device, and details are not repeated here.
By counting the seating rate of the target area, users can conveniently arrange their time of arrival at the target area, thereby reducing waiting time, avoiding crowding, and improving comfort.
Take the dining area of the dining room as an example. The image acquisition equipment can be erected at a proper position (such as the periphery) in the dining room, and a corresponding preset statistical area is configured for each image acquisition equipment, wherein the preset statistical areas corresponding to any two image acquisition equipment are not overlapped, and the preset statistical areas corresponding to the image acquisition equipment can be spliced to obtain dining areas of the dining room. Then, acquiring a plurality of crowd images through each image acquisition device; and respectively positioning the head key points of the plurality of crowd images to obtain a target positioning map corresponding to each crowd image (the target positioning map is used for indicating the positions of the target head key points included in the crowd images). Then, for each image acquisition device, determining the number of people in the preset statistical area corresponding to the image acquisition device according to the position of the preset statistical area corresponding to the image acquisition device in the crowd image and the position of the target head key point in the crowd image acquired by the image acquisition device, wherein the number of people is the effective number of people for dining shot by the image acquisition device. And finally, summing the effective number of the dining people shot by each image acquisition device, so as to obtain the current number M of the dining people in the dining area of the dining hall. On the basis, according to the preset seat number N of the dining area of the dining room and the current number M of the dining people, the seating rate M/N of the dining area of the current dining room can be obtained. Wherein M and N are positive integers.
In this way, users can arrange their own dining time according to how high the seating rate of the current dining area is, thereby reducing waiting time, avoiding crowding, and improving comfort.
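The seating-rate computation itself is a single ratio; the sketch below illustrates it with the M and N of the dining-hall example (the sample numbers are made up).

```python
def seating_rate(num_people: int, num_seats: int) -> float:
    """Seating rate of the target area: current number of people M over preset seat number N."""
    if num_seats <= 0:
        raise ValueError("the preset seat number must be positive")
    return num_people / num_seats

# For example, 180 diners in a 300-seat dining area give seating_rate(180, 300) == 0.6.
```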
In step S12, head key point positioning is performed on each of the plurality of crowd images to obtain a target positioning map corresponding to each crowd image. The following describes the head key point positioning process by taking as an example the positioning performed on a first crowd image to obtain the target positioning map corresponding to the first crowd image. The first crowd image represents a crowd image acquired by a first image acquisition device among the plurality of crowd images, and the first image acquisition device represents any one of the image acquisition devices that are arranged.
In one possible implementation, the performing head key point positioning on the first crowd image to obtain the target positioning map corresponding to the first crowd image may include: performing head key point positioning on the first crowd image, and determining a predicted positioning map corresponding to the first crowd image, where the predicted positioning map is used for indicating the prediction confidence that each pixel point in the first crowd image is a head key point; performing image processing on the predicted positioning map based on a preset confidence threshold to obtain an initial positioning map, where the initial positioning map is used for indicating the positions of the initial head key points included in the first crowd image; determining a target neighborhood corresponding to each initial head key point in the initial positioning map; and filtering, based on the predicted positioning map, the target neighborhood corresponding to each initial head key point to obtain the target positioning map corresponding to the first crowd image.
By performing head key point positioning on the first crowd image, the prediction confidence that each pixel point in the first crowd image is a head key point can be determined end-to-end; the predicted positioning map is then segmented using the preset confidence threshold to determine an initial positioning map indicating the positions of the initial head key points included in the first crowd image; a target neighborhood of each initial head key point in the initial positioning map is further determined; and the target neighborhood corresponding to each initial head key point is filtered based on the predicted positioning map, so that a high-precision target positioning map is obtained, which can accurately indicate the positions of the target head key points included in the first crowd image.
In an example, the first crowd image may be subjected to head key point positioning using a trained head key point positioning neural network. Specifically, the first crowd image is input into the trained head key point positioning neural network, and the predicted positioning map is output directly by the network. For the specific network structure and training process of the trained head key point positioning neural network, the network structures and training processes in the related art may be adopted, which is not specifically limited by the present disclosure.
In one example, the pixel value of each pixel point in the predicted positioning map represents the prediction confidence of the pixel point, that is, the probability that the pixel point is a head key point. A sigmoid operation is performed on the predicted positioning map so that the pixel value of each pixel point in the predicted positioning map lies between 0 and 1. For example, if the pixel value of a certain pixel point in the predicted positioning map is 0.7, the probability that the pixel point is a head key point is 0.7.
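As an illustration, the following sketch runs a PyTorch-style head key point positioning network on a crowd image and applies the sigmoid described above; the network architecture, its assumed output shape of (1, 1, H, W), and the variable names are assumptions, not details given in the text.

```python
import torch

def predict_location_map(model, crowd_image_tensor):
    """crowd_image_tensor: float tensor of shape (1, 3, H, W).
    Returns an (H, W) map of per-pixel confidences in [0, 1] that each pixel is a head key point."""
    model.eval()
    with torch.no_grad():
        logits = model(crowd_image_tensor)   # assumed raw output of shape (1, 1, H, W)
        confidence = torch.sigmoid(logits)   # squash pixel values into [0, 1]
    return confidence[0, 0]
```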
Because the predicted positioning map is only used for indicating the prediction confidence coefficient that each pixel point in the first crowd image is the head key point, the predicted positioning map is subjected to threshold segmentation by presetting a confidence coefficient threshold, so that the initial positioning map used for indicating the position of the initial head key point included in the first crowd image can be effectively obtained. The specific value of the preset confidence level threshold may be flexibly set according to the actual situation, which is not specifically limited by the present disclosure.
The pixel value of each pixel point in the predicted positioning map is compared with the preset confidence threshold; in a case that the pixel value of a pixel point in the predicted positioning map is greater than or equal to the preset confidence threshold, the pixel value of the pixel point at the corresponding position in the initial positioning map is set to 1; in a case that the pixel value of a pixel point in the predicted positioning map is smaller than the preset confidence threshold, the pixel value of the pixel point at the corresponding position in the initial positioning map is set to 0.
The initial positioning map has the same size as the first crowd image, and the positions of the pixel points with a pixel value of 1 in the initial positioning map are used for indicating the positions of the initial head key points included in the first crowd image. For example, in a case that the pixel value of the pixel point with image coordinates (x, y) in the initial positioning map is 1, it may be determined that the pixel point with image coordinates (x, y) in the first crowd image is an initial head key point; in a case that the pixel value of the pixel point with image coordinates (x, y) in the initial positioning map is 0, it may be determined that the pixel point with image coordinates (x, y) in the first crowd image is not an initial head key point.
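A minimal sketch of this threshold segmentation is given below; the confidence threshold of 0.5 is illustrative, not a value prescribed by the text.

```python
import numpy as np

def initial_location_map(pred_map: np.ndarray, conf_threshold: float = 0.5) -> np.ndarray:
    """pred_map: (H, W) array of per-pixel confidences in [0, 1]. Pixels at or above
    the threshold become 1 (initial head key points); all other pixels become 0."""
    return (pred_map >= conf_threshold).astype(np.uint8)

def initial_keypoints(init_map: np.ndarray):
    """Image coordinates (x, y) of the initial head key points in the binary map."""
    ys, xs = np.nonzero(init_map)
    return list(zip(xs.tolist(), ys.tolist()))
```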
In order to avoid false detections in which the same human head corresponds to a plurality of initial head key points, a target neighborhood corresponding to each initial head key point in the initial positioning map is further determined, and the target neighborhoods corresponding to the initial head key points are filtered to obtain a higher-precision target positioning map, in which each human head corresponds to one target head key point.
In one possible implementation, determining a target neighborhood corresponding to each initial head keypoint in the initial positioning map includes: and determining a target neighborhood corresponding to each initial human head key point according to the preset neighborhood radius.
In one example, the preset neighborhood radius may be fixed; in this case, the preset neighborhood radius may be referred to as a first preset neighborhood radius. In another example, the preset neighborhood radius may be determined according to the position of the initial head key point in the first crowd image and the preset perspective mapping relationship corresponding to the first crowd image; in this case, a second preset neighborhood radius or a third preset neighborhood radius is selected for determining the target neighborhood according to the head frame height. The following respectively describes the process of determining the target neighborhood corresponding to each initial head key point according to the first preset neighborhood radius, and the process of determining the target neighborhood corresponding to an initial head key point based on the position of the initial head key point in the first crowd image and the preset perspective mapping relationship corresponding to the first crowd image (corresponding to the second preset neighborhood radius and the third preset neighborhood radius).
In one possible implementation, determining a target neighborhood corresponding to each initial head keypoint in the initial positioning map includes: and determining a target neighborhood corresponding to each initial head key point according to the first preset neighborhood radius.
The target neighborhood corresponding to each initial head key point can be quickly determined from the preset, fixed first preset neighborhood radius. The specific value of the first preset neighborhood radius can be flexibly set according to actual conditions, which is not specifically limited by the present disclosure.
For example, if the first preset neighborhood radius is 2, then for any initial head key point i, the target neighborhood corresponding to the initial head key point i includes the pixel points whose pixel distance from the initial head key point i is no more than 2 pixels.
In one possible implementation, determining a target neighborhood corresponding to each initial head key point in the initial positioning map includes: for any initial head key point, determining a target neighborhood corresponding to the initial head key point based on the position of the initial head key point in the first crowd image and the preset perspective mapping relationship corresponding to the first crowd image.
The preset perspective mapping relationship is used for indicating the image scales corresponding to different positions in the first crowd image. Because the installation angles of different image acquisition devices differ, the image scales of the crowd images acquired by different image acquisition devices also differ. In the embodiment of the present disclosure, the preset perspective mapping relationship corresponding to the crowd image acquired by each image acquisition device therefore needs to be determined separately. The following describes the process of determining the preset perspective mapping relationship corresponding to the first crowd image.
In a possible implementation manner, a plurality of labeled human body frames obtained by labeling the human body frames of pedestrians at different positions in the first crowd image can be obtained; and determining a preset perspective mapping relation corresponding to the first crowd image based on the plurality of marked human body frames.
Pedestrians at different positions (far, middle and near) are selected from the first crowd image for human body frame labeling, yielding a plurality of labeled human body frames in the first crowd image. Based on the proportional relationship between the heights of the labeled human body frames and the actual heights of the pedestrians, the image scales corresponding to a limited number of positions (the labeled human body frame positions) in the first crowd image are determined; by fitting on these image scales, the image scale corresponding to every position in the first crowd image can be effectively obtained, yielding the preset perspective mapping relationship corresponding to the first crowd image.
Fig. 4 shows a schematic diagram of a first crowd image and its corresponding preset perspective mapping relationship according to an embodiment of the present disclosure. As shown in fig. 4, pedestrians at different positions (far, middle and near) are selected from the first crowd image for human body frame labeling, yielding four labeled human body frames A, B, C, D at different positions in the first crowd image; based on the four labeled human body frames A, B, C, D, the preset perspective mapping relationship corresponding to the first crowd image is then obtained by fitting.
In one possible implementation manner, determining a preset perspective mapping relationship corresponding to the first crowd image based on a plurality of labeled human body frames includes: determining a reference image scale corresponding to a reference human body key point in any one labeled human body frame; and fitting to obtain a preset perspective mapping relation corresponding to the first crowd image according to the third image coordinate of the reference human body key point in each labeled human body frame and the reference image scale corresponding to the reference human body key point in each labeled human body frame.
Since the image scale varies linearly along the column direction of the crowd image, once the image scales corresponding to the reference human body key points at a limited number of positions have been determined from the labeled human body frames, the image scale corresponding to every position in the first crowd image can be effectively obtained through linear function fitting, yielding the preset perspective mapping relationship corresponding to the first crowd image.
Because pedestrians stand vertically, the foot key point is used as the reference human body key point, and the height of a labeled human body frame can be regarded as the height of the pedestrian in the crowd image. The height of a labeled human body frame can be represented by the number of pixel rows it occupies. For example, if a labeled human body frame occupies 17 rows of pixels in the crowd image, its height is 17. Assuming that the actual height of the pedestrian corresponding to the labeled human body frame is 1.7 m, then at the position of the reference foot key point in that labeled human body frame, 1.7 m in the real world corresponds to 17 rows of pixels. Assuming a unit height of 1 m, 1 m in the real world therefore corresponds to 10 rows of pixels at that position, that is, the reference image scale corresponding to the reference foot key point in the labeled human body frame is 10. The actual height assumed for the pedestrian corresponding to a labeled human body frame can be chosen appropriately according to the actual situation, which is not specifically limited by the present disclosure.
The reference foot key point in a human body frame can be the midpoint of the bottom edge of the human body frame, or another pixel point in the human body frame, which is not specifically limited by the present disclosure.
Still taking the above fig. 4 as an example, after the four labeled human body frames A, B, C, D are labeled in the first crowd image, the reference image scale corresponding to the reference foot key point in each labeled human body frame is determined in the above manner. Linear function fitting is then performed according to the third image coordinates of the reference foot key points in the four labeled human body frames and the corresponding reference image scales, yielding a linear mapping function p = a × y + b.
The image coordinates refer to position coordinates in the pixel coordinate system of the crowd image. For example, the pixel coordinate system of the crowd image is constructed by taking the upper left corner of the crowd image as the coordinate origin (0, 0), the direction parallel to the row direction of the image as the x axis, and the direction parallel to the column direction of the image as the y axis; both the abscissa and the ordinate of an image coordinate are in pixels. For example, if the image coordinates of a reference foot key point are (10, 15), the reference foot key point is the pixel point located in column 10 and row 15 of the crowd image.
The linear mapping function p = a × y + b is the functional representation of the preset perspective mapping relationship corresponding to the crowd image, where a and b are parameters obtained by linear function fitting, y is the ordinate of the image coordinate of a position in the crowd image, and p is the image scale corresponding to that position. Using the linear mapping function p = a × y + b, the image scale corresponding to every position in the crowd image can be determined.
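As an illustrative sketch only, the linear function fitting described above could be carried out as follows, assuming that each labeled human body frame is given by the y coordinate of its reference foot key point and the number of pixel rows it occupies, and taking 1.7 m as the pedestrians' assumed actual height (both assumptions are merely for illustration):

import numpy as np

def fit_perspective_mapping(labeled_frames, person_height_m=1.7, unit_height_m=1.0):
    """Fit the linear mapping p = a * y + b from labeled human body frames.

    labeled_frames: list of (foot_y, frame_rows) pairs, where foot_y is the y
        image coordinate of the reference foot key point and frame_rows is the
        number of pixel rows occupied by the labeled human body frame.
    Returns (a, b) such that p = a * y + b gives the image scale (pixel rows
    per unit height) at ordinate y.
    """
    ys = np.array([foot_y for foot_y, _ in labeled_frames], dtype=np.float64)
    # Reference image scale: pixel rows corresponding to one unit height,
    # e.g. 17 rows for a 1.7 m pedestrian gives a scale of 10 rows per metre.
    scales = np.array([rows / person_height_m * unit_height_m
                       for _, rows in labeled_frames], dtype=np.float64)
    # Least-squares fit of the linear mapping p = a * y + b.
    a, b = np.polyfit(ys, scales, deg=1)
    return a, b

# Hypothetical example with four labeled frames A, B, C, D (values invented for illustration):
# a, b = fit_perspective_mapping([(120, 10), (240, 17), (360, 24), (480, 31)])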
Before or after the preset perspective mapping relationship corresponding to the first crowd image is determined, head key point positioning is performed on the first crowd image to obtain the initial positioning map corresponding to the first crowd image.
After the positions of the initial head key points included in the first crowd image are determined based on the initial positioning map, the target neighborhood matching each initial head key point may be determined based on the preset perspective mapping relationship.
In a possible implementation manner, determining a target neighborhood corresponding to an initial human head key point based on a position of the initial human head key point in a crowd image and a preset perspective mapping relationship includes: determining a target image scale corresponding to the position of the initial head key point in the crowd image based on a preset perspective mapping relation; determining the height of a human head frame corresponding to the initial human head key point based on the target image scale; and determining a target neighborhood corresponding to the initial head key point based on the head frame height corresponding to the initial head key point.
Based on the preset perspective mapping relation, the target image scale corresponding to the position of the initial head key point in the crowd image can be quickly determined, and then the head frame height corresponding to the initial head key point is determined based on the target image scale, so that the target neighborhood matched with the head frame height can be further determined according to the head frame height.
For example, for an initial head key point i in the initial positioning map whose image coordinates in the first crowd image are (hx, hy), according to the preset perspective mapping relationship corresponding to the first crowd image (the linear mapping function p = a × y + b), the target image scale corresponding to the initial head key point i can be determined as pi = a × hy + b. Assuming that the actual head frame height of a pedestrian in the first crowd image is 0.4 m, the head frame height corresponding to the initial head key point i in the first crowd image is si = 0.4 × pi. The target neighborhood matching the initial head key point is then determined according to the head frame height si = 0.4 × pi corresponding to the initial head key point.
In one possible implementation, determining a target neighborhood corresponding to the initial head key point based on the head frame height corresponding to the initial head key point includes: under the condition that the head frame height is greater than a preset head frame height threshold, determining the target neighborhood corresponding to the initial head key point based on a second preset neighborhood radius; or under the condition that the head frame height is less than or equal to the preset head frame height threshold, determining the target neighborhood corresponding to the initial head key point based on a third preset neighborhood radius, wherein the second preset neighborhood radius is larger than the third preset neighborhood radius.
When the head frame height is greater than the preset head frame height threshold, the head frame can be determined to be large, and the larger second preset neighborhood radius is therefore adopted for the subsequent filtering processing; when the head frame height is less than or equal to the preset head frame height threshold, the head frame is determined to be small, and the smaller third preset neighborhood radius is adopted for the subsequent filtering processing. Flexibly determining the neighborhood radius in this way improves the accuracy of the filtering operation. The specific values of the preset head frame height threshold, the second preset neighborhood radius and the third preset neighborhood radius can be flexibly set according to actual conditions, which is not specifically limited by the present disclosure.
In an example, the preset head frame height threshold is 32. For a certain initial head key point i, when its corresponding head frame height si is greater than 32, its target neighborhood is determined based on the second preset neighborhood radius of 2; when its corresponding head frame height si is less than or equal to 32, its target neighborhood is determined based on the third preset neighborhood radius of 1.
When the second preset neighborhood radius is 2, the target neighborhood corresponding to the initial head key point i includes the pixel points whose pixel distance from the initial head key point i is no more than 2 pixels. When the third preset neighborhood radius is 1, the target neighborhood corresponding to the initial head key point i includes the pixel points whose pixel distance from the initial head key point i is no more than 1 pixel.
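As an illustrative sketch only, the selection of the neighborhood radius from the head frame height could be written as follows, reusing the example values above (0.4 m head frame, threshold 32, radii 2 and 1), none of which are fixed by the present disclosure:

def neighborhood_radius(hy, a, b,
                        head_height_m=0.4,
                        head_frame_threshold=32,
                        second_radius=2, third_radius=1):
    """Choose the preset neighborhood radius for one initial head key point.

    hy: y image coordinate of the initial head key point in the first crowd image.
    a, b: parameters of the fitted perspective mapping p = a * y + b.
    The remaining arguments mirror the illustrative values used in the text.
    """
    p_i = a * hy + b              # target image scale at this position
    s_i = head_height_m * p_i     # head frame height in pixels, si = 0.4 * pi
    # A larger head frame gets the larger second radius, otherwise the third.
    return second_radius if s_i > head_frame_threshold else third_radius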
After the target neighborhood corresponding to each initial head key point in the initial positioning map is determined in the above manner, the target neighborhood corresponding to each initial head key point is used to filter the initial positioning map so as to obtain a more accurate target positioning map.
In a possible implementation manner, filtering the target neighborhood corresponding to each initial head key point based on the predicted positioning map to obtain the target positioning map includes: for any initial head key point i, determining whether at least one other initial head key point exists in the target neighborhood corresponding to the initial head key point i; in a case where at least one other initial head key point j exists in the target neighborhood corresponding to the initial head key point i, determining, based on the predicted positioning map, the prediction confidence corresponding to the initial head key point i and the prediction confidence corresponding to the at least one other initial head key point j; and determining, as the target head key point in the target neighborhood corresponding to the initial head key point i, the key point with the maximum prediction confidence among the initial head key point i and the at least one other initial head key point j.
For an initial head key point i with image coordinates (xi, yi) in the initial positioning map, it is detected whether other initial head key points exist in its target neighborhood. If another initial head key point j with image coordinates (xj, yj) exists, the prediction confidence corresponding to the initial head key point i and the prediction confidence corresponding to the initial head key point j are determined from the predicted positioning map. When the prediction confidence of the initial head key point i is greater than that of the initial head key point j, the pixel value of the pixel point with image coordinates (xi, yi) is kept as 1 and the pixel value of the pixel point with image coordinates (xj, yj) is updated to 0, i.e., the initial head key point j is filtered out of the initial positioning map. By analogy, each initial head key point in the initial positioning map is traversed to obtain the final target positioning map.
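As an illustrative sketch only, the neighborhood-based filtering described above could look as follows; a square window of the given radius is used here to approximate the neighborhood of pixel distance no more than the radius, which is an implementation assumption rather than a requirement of the present disclosure:

import numpy as np

def filter_initial_keypoints(predicted_map, initial_map, radius_fn):
    """Keep at most one target head key point per target neighborhood.

    predicted_map: H x W prediction confidences.
    initial_map: H x W binary map of initial head key points.
    radius_fn: maps a key point's y coordinate to its neighborhood radius,
        e.g. lambda y: neighborhood_radius(y, a, b) from the sketch above.
    """
    target_map = initial_map.copy()
    ys, xs = np.nonzero(initial_map)
    for y, x in zip(ys, xs):
        if target_map[y, x] == 0:            # already filtered out
            continue
        r = radius_fn(y)
        y0, y1 = max(0, y - r), min(target_map.shape[0], y + r + 1)
        x0, x1 = max(0, x - r), min(target_map.shape[1], x + r + 1)
        ny, nx = np.nonzero(target_map[y0:y1, x0:x1])
        for dy, dx in zip(ny, nx):
            j = (y0 + dy, x0 + dx)
            if j == (y, x):
                continue
            # Keep only the key point with the higher prediction confidence.
            if predicted_map[j] <= predicted_map[y, x]:
                target_map[j] = 0            # filter out key point j
            else:
                target_map[y, x] = 0         # key point i itself is filtered out
                break
    return target_map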
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principle and logic; owing to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure also provides a crowd counting apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any crowd counting method provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions in the method section, which are not repeated here.
FIG. 5 shows a block diagram of a crowd counting apparatus in accordance with an embodiment of the present disclosure. As shown in fig. 5, the apparatus 50 includes:
the first obtaining module 51 is configured to obtain a plurality of crowd images, where each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area;
a first determining module 52, configured to determine, based on the head key point locations performed on the plurality of crowd images, the number of people in a preset statistical area corresponding to each image acquisition device;
the second determining module 53 is configured to determine the number of people in a target area according to the number of people in a preset statistical area corresponding to each image capturing device, where the target area indicates a spatial range in which people statistics needs to be performed, and the target area includes the preset statistical area corresponding to each image capturing device.
In one possible implementation, the apparatus further includes:
the configuration module is used for configuring a preset statistical area of each image acquisition device;
the second determining module is further configured to determine the number of people in the target area based on the number of people in a preset statistical area corresponding to each image acquisition device.
In a possible implementation manner, in a case that the preset statistical regions of the image capturing devices do not overlap with each other and form the target region after being spliced, determining the number of people in the target region based on the number of people in the preset statistical region corresponding to each image capturing device includes:
and taking the sum of the number of people in the preset statistical area corresponding to each image acquisition device as the number of people in the target area.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and can cover the target area after being spliced, determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes:
determining the number of people in the target area according to the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeated statistical number of people in the target area;
wherein the number of repeated statistics in the target area is determined by:
for a preset statistical area corresponding to each image acquisition device, mapping head key points in the preset statistical area to the target area according to the mapping relation between the preset statistical area and the target area to obtain the position of the head key points in the preset statistical area in the target area;
and determining the repeated statistical number in the target area according to the positions of the head key points in the preset statistical areas in the target area.
In a possible implementation manner, in a case that the preset statistical areas of the image capturing devices at least partially overlap and are spliced to form the target area, the determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image capturing device includes:
determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeated statistical number of people in the target area;
under the condition that the preset statistical areas of the image acquisition devices are at least partially overlapped and the area formed after splicing is larger than the target area, determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device includes:
determining the number of people in the target area by subtracting, from the sum of the number of people in the preset statistical areas corresponding to the image acquisition devices, the repeated statistical number of people in the target area and the number of people outside the target area;
and determining the number of people outside the target area based on the positions of the head key points in each preset statistical area relative to the target area.
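As an illustrative sketch only, one way the per-device counts could be combined in the overlapping case is the following, where the head key points are assumed to have already been mapped into the target-area coordinate system, and the merge radius used to decide that two mapped key points belong to the same person is purely an example value:

import numpy as np

def count_target_area(mapped_keypoints_per_device, merge_radius=1.0, outside_count=0):
    """Combine per-device head key points into a target-area people count.

    mapped_keypoints_per_device: one list of (x, y) positions per image
        acquisition device, already mapped from its preset statistical area
        into the target area (the mapping itself is outside this sketch).
    merge_radius: mapped key points from different devices closer than this
        are treated as the same person (illustrative assumption).
    outside_count: number of people whose key points fall outside the target
        area, used when the stitched area is larger than the target area.
    """
    total = sum(len(points) for points in mapped_keypoints_per_device)
    kept = []                                # accepted key points from earlier devices
    duplicates = 0                           # repeated statistical number
    for device_points in mapped_keypoints_per_device:
        new_points = []
        for point in device_points:
            point = np.asarray(point, dtype=np.float64)
            if any(np.linalg.norm(point - q) <= merge_radius for q in kept):
                duplicates += 1              # same person already counted by another device
            else:
                new_points.append(point)
        kept.extend(new_points)              # key points within one device are never merged
    # People in the target area: sum minus duplicates minus people outside it.
    return total - duplicates - outside_count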
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a preset seat number of the target area;
and the third determining module is used for determining the seating rate in the target area according to the number of people in the target area and the preset seat number.
In a possible implementation manner, the second obtaining module is further configured to:
the method comprises the steps of acquiring a plurality of spatial images, wherein each spatial image corresponds to an image acquisition device, and the spatial images represent images acquired in an unmanned scene;
determining the number of seats in a preset statistical area corresponding to each image acquisition device based on target recognition respectively performed on the plurality of space images;
and determining the preset seat number of the target area according to the seat number in the preset statistical area corresponding to each image acquisition device.
In a possible implementation manner, the first determining module is further configured to:
obtaining a target positioning map corresponding to each crowd image based on the head key point positioning performed on the plurality of crowd images respectively, wherein the target positioning map is used for indicating the position of a target head key point included in the corresponding crowd image;
and determining the number of people in a preset statistical area corresponding to each image acquisition device based on the target positioning map corresponding to each crowd image.
In a possible implementation manner, the determining, based on the target positioning map corresponding to each of the crowd images, the number of people in a preset statistical area corresponding to each of the image capturing devices includes:
determining a target head key point in each preset statistical area according to the position of each preset statistical area in the corresponding crowd image and the position of the target head key point in the crowd image corresponding to each preset statistical area;
and determining the number of the key points of the head of the target person in each preset statistical area as the number of people in each preset statistical area.
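As an illustrative sketch only, counting the target head key points that fall inside one preset statistical area could look as follows, assuming for simplicity that the preset statistical area is given as an axis-aligned rectangle in the corresponding crowd image (the rectangular representation is an assumption made here for illustration):

import numpy as np

def count_in_statistical_area(target_map, area_bbox):
    """Count target head key points inside one preset statistical area.

    target_map: H x W binary target positioning map of a crowd image.
    area_bbox: (x_min, y_min, x_max, y_max) of the preset statistical area in
        the same image coordinates.
    """
    ys, xs = np.nonzero(target_map)
    x_min, y_min, x_max, y_max = area_bbox
    inside = (xs >= x_min) & (xs <= x_max) & (ys >= y_min) & (ys <= y_max)
    # The number of target head key points inside the area is the people count
    # for the preset statistical area.
    return int(inside.sum())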
In a possible implementation manner, the obtaining a target positioning map corresponding to each crowd image based on the head key point positioning performed on the plurality of crowd images respectively includes:
performing head key point positioning on a first crowd image, and determining a predicted positioning map corresponding to the first crowd image, wherein the first crowd image represents a crowd image acquired by first image acquisition equipment in the crowd images, the first image acquisition equipment represents any one of the image acquisition equipment, and the predicted positioning map is used for indicating the prediction confidence that each pixel point in the first crowd image is a head key point;
based on a preset confidence threshold value, carrying out image processing on the predicted positioning map to obtain an initial positioning map, wherein the initial positioning map is used for indicating the position of an initial head key point included in the first crowd image;
determining a target neighborhood corresponding to each initial head key point in the initial positioning map;
and based on the predicted positioning map, filtering a target neighborhood corresponding to each initial human head key point to obtain a target positioning map corresponding to the first crowd image.
In one possible implementation manner, the determining a target neighborhood corresponding to each of the initial head keypoints in the initial positioning map includes:
and determining a target neighborhood corresponding to each initial head key point according to a preset neighborhood radius, wherein the preset neighborhood radius is determined based on the position of the initial head key point in the first crowd image and the preset perspective mapping relationship corresponding to the first crowd image, and the preset perspective mapping relationship corresponding to the first crowd image is used for indicating image scales corresponding to different positions in the first crowd image.
In a possible implementation manner, the filtering, based on the predicted location map, a target neighborhood corresponding to each initial human head key point to obtain the target location map includes:
for any one initial head key point i, determining whether at least one other initial head key point exists in a target neighborhood corresponding to the initial head key point i;
under the condition that at least one other initial human head key point j exists in a target neighborhood corresponding to the initial human head key point i, determining a prediction confidence coefficient corresponding to the initial human head key point i and a prediction confidence coefficient corresponding to the at least one other initial human head key point j based on the prediction positioning diagram;
and determining, as a target head key point in the target neighborhood corresponding to the initial head key point i, the key point with the maximum prediction confidence among the initial head key point i and the at least one other initial head key point j.
In the embodiment of the present disclosure, a plurality of crowd images are acquired by a plurality of image acquisition devices, which improves the coverage of the acquired images and effectively reduces missed detections of people; meanwhile, the positions of the target head key points in each crowd image are obtained through head key point positioning, which effectively reduces missed detections caused by occlusion by other people or by dining tables. Therefore, in the embodiment of the present disclosure, the accuracy of crowd counting in large-area, densely crowded scenes can be effectively improved.
The method has specific technical relevance with the internal structure of the computer system, and can solve the technical problems of how to improve the hardware operation efficiency or the execution effect (including reducing data storage capacity, reducing data transmission capacity, improving hardware processing speed and the like), thereby obtaining the technical effect of improving the internal performance of the computer system according with the natural law.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or other terminal device.
Referring to fig. 6, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G), a long term evolution of universal mobile communication technology (LTE), a fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server or terminal device. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the Apple graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device, such as punch cards or in-groove raised structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
It will be understood by those skilled in the art that, in the above methods, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

1. A crowd counting method, the method comprising:
acquiring a plurality of crowd images, wherein each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area;
determining the number of people in a preset statistical area corresponding to each image acquisition device based on the head key point positioning of the plurality of crowd images;
determining the number of people in a target area according to the number of people in a preset statistical area corresponding to each image acquisition device, wherein the target area represents a spatial range in which people statistics needs to be carried out, and the target area comprises the preset statistical area corresponding to each image acquisition device.
2. The method of claim 1, further comprising:
configuring a preset statistical area of each image acquisition device;
the number of people in the target area is determined according to the number of people in the preset statistical area corresponding to each image acquisition device, and the method comprises the following steps: and determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device.
3. The method of claim 2,
under the condition that the preset statistical regions of the image acquisition devices are not overlapped and the target region is formed after splicing, determining the number of people in the target region based on the number of people in the preset statistical regions corresponding to the image acquisition devices comprises the following steps:
and taking the sum of the number of people in the preset statistical area corresponding to each image acquisition device as the number of people in the target area.
4. The method of claim 2,
under the condition that the preset statistical areas of the image acquisition devices are at least partially overlapped and can cover the target area after being spliced, determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device comprises:
determining the number of people in the target area according to the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeated statistical number of people in the target area;
wherein the number of repeated statistics in the target area is determined by:
for a preset statistical area corresponding to each image acquisition device, mapping head key points in the preset statistical area to the target area according to the mapping relation between the preset statistical area and the target area to obtain the position of the head key points in the preset statistical area in the target area;
and determining the repeated statistical number in the target area according to the positions of the head key points in the preset statistical areas in the target area.
5. The method of claim 4,
under the condition that the preset statistical areas of the image acquisition devices are at least partially overlapped and spliced to form the target area, determining the number of people in the target area based on the number of people in the preset statistical areas corresponding to the image acquisition devices comprises:
determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device and the repeated statistical number of people in the target area;
determining the number of people in the target area based on the number of people in the preset statistical area corresponding to each image acquisition device under the condition that the preset statistical areas of the image acquisition devices are at least partially overlapped and the spliced area is larger than the target area, and the method comprises the following steps:
determining the number of people in the target area according to the difference between the sum of the number of people in the preset statistical area corresponding to each image acquisition device, the repeated statistical number of people in the target area and the number of people outside the target area;
and determining the number of people outside the target area based on the positions of the head key points in each preset statistical area relative to the target area.
6. The method according to any one of claims 1 to 5, further comprising:
acquiring a preset seat number of the target area;
and determining the seating rate in the target area according to the number of people in the target area and the preset seat number.
7. The method of claim 6, wherein the obtaining the preset number of seats of the target area comprises:
acquiring a plurality of spatial images, wherein each spatial image corresponds to an image acquisition device, and the spatial images represent images acquired in an unmanned scene;
determining the number of seats in a preset statistical area corresponding to each image acquisition device based on target recognition respectively performed on the plurality of space images;
and determining the preset seat number of the target area according to the seat number in the preset statistical area corresponding to each image acquisition device.
8. The method according to any one of claims 1 to 7, wherein the determining the number of people in the preset statistical area corresponding to each image capturing device based on the head key point positioning performed on the plurality of crowd images respectively comprises:
obtaining a target positioning map corresponding to each crowd image based on the head key point positioning of the crowd images, wherein the target positioning map is used for indicating the position of a target head key point included in the corresponding crowd image;
and determining the number of people in a preset statistical area corresponding to each image acquisition device based on the target positioning map corresponding to each crowd image.
9. The method according to claim 8, wherein the determining the number of people in the preset statistical area corresponding to each image capturing device based on the target positioning map corresponding to each crowd image comprises:
determining a target head key point in each preset statistical area according to the position of each preset statistical area in the corresponding crowd image and the position of the target head key point in the crowd image corresponding to each preset statistical area;
and determining the number of the key points of the head of the target person in each preset statistical area as the number of people in each preset statistical area.
10. The method of claim 8, wherein the obtaining a target location map corresponding to each of the plurality of crowd images based on the respective head keypoints locations of the plurality of crowd images comprises:
performing head key point positioning on a first crowd image, and determining a predicted positioning map corresponding to the first crowd image, wherein the first crowd image represents a crowd image acquired by first image acquisition equipment in the crowd images, the first image acquisition equipment represents any one of the image acquisition equipment, and the predicted positioning map is used for indicating the prediction confidence that each pixel point in the first crowd image is a head key point;
based on a preset confidence threshold value, carrying out image processing on the predicted positioning map to obtain an initial positioning map, wherein the initial positioning map is used for indicating the position of an initial head key point included in the first crowd image;
determining a target neighborhood corresponding to each initial head key point in the initial positioning map;
and based on the predicted positioning map, filtering a target neighborhood corresponding to each initial human head key point to obtain a target positioning map corresponding to the first crowd image.
11. The method of claim 10, wherein the determining a target neighborhood corresponding to each initial head key point in the initial positioning map comprises:
determining the target neighborhood corresponding to each initial head key point according to a preset neighborhood radius, wherein the preset neighborhood radius is determined based on the position of the initial head key point in the first crowd image and a preset perspective mapping relation corresponding to the first crowd image, and the preset perspective mapping relation corresponding to the first crowd image is used for indicating image scales corresponding to different positions in the first crowd image.
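A sketch of how the preset neighborhood radius of claim 11 could depend on image position through a perspective mapping; the linear scale model, the callable interface, and the real-world head size are purely illustrative assumptions.

```python
from typing import Callable, Tuple

def neighborhood_radius(
    point: Tuple[float, float],
    perspective_scale: Callable[[float, float], float],  # image position -> pixels per metre at that position
    head_size_m: float = 0.25,                            # assumed real-world head width in metres
) -> float:
    """Radius (in pixels) of the target neighborhood around an initial head key point."""
    x, y = point
    return perspective_scale(x, y) * head_size_m

# Assumed perspective mapping: the image scale grows linearly towards the bottom of the image.
radius = neighborhood_radius((320.0, 400.0), perspective_scale=lambda x, y: 0.1 * y)  # -> 10.0 pixels
```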
12. The method according to claim 10, wherein the filtering, based on the predicted positioning map, the target neighborhood corresponding to each initial head key point to obtain the target positioning map comprises:
for any initial head key point i, determining whether at least one other initial head key point exists in the target neighborhood corresponding to the initial head key point i;
in a case that at least one other initial head key point j exists in the target neighborhood corresponding to the initial head key point i, determining, based on the predicted positioning map, the prediction confidence corresponding to the initial head key point i and the prediction confidence corresponding to the at least one other initial head key point j;
and determining, as the target head key point in the target neighborhood corresponding to the initial head key point i, the key point with the maximum prediction confidence among the initial head key point i and the at least one other initial head key point j.
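Claim 12 amounts to keeping, inside each target neighborhood, only the key point with the highest prediction confidence. A brute-force sketch with a fixed radius (the per-point radius of claim 11 would normally be used instead) might read:

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, prediction confidence)

def filter_keypoints(initial: List[Keypoint], radius: float) -> List[Keypoint]:
    """Keep an initial key point only if no key point in its neighborhood has higher confidence."""
    kept: List[Keypoint] = []
    for x_i, y_i, c_i in initial:
        # confidences of all key points (including i itself) inside i's target neighborhood
        neighbors = [c_j for (x_j, y_j, c_j) in initial
                     if (x_j - x_i) ** 2 + (y_j - y_i) ** 2 <= radius ** 2]
        if all(c_i >= c_j for c_j in neighbors):  # i has the maximum confidence in its neighborhood
            kept.append((x_i, y_i, c_i))
    return kept
```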
13. A crowd counting device, the device comprising:
a first acquisition module, configured to acquire a plurality of crowd images, wherein each crowd image corresponds to an image acquisition device, and each image acquisition device corresponds to a preset statistical area;
a first determining module, configured to determine the number of people in the preset statistical area corresponding to each image acquisition device based on head key point positioning performed on the plurality of crowd images respectively;
and a second determining module, configured to determine the number of people in a target area according to the number of people in the preset statistical area corresponding to each image acquisition device, wherein the target area represents a spatial range in which people statistics needs to be carried out, and the target area comprises the preset statistical area corresponding to each image acquisition device.
14. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 12.
15. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any one of claims 1 to 12.
CN202210273018.3A 2022-03-18 2022-03-18 Crowd counting method and device, electronic equipment and storage medium Pending CN114581854A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210273018.3A CN114581854A (en) 2022-03-18 2022-03-18 Crowd counting method and device, electronic equipment and storage medium
PCT/CN2022/100179 WO2023173616A1 (en) 2022-03-18 2022-06-21 Crowd counting method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210273018.3A CN114581854A (en) 2022-03-18 2022-03-18 Crowd counting method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114581854A true CN114581854A (en) 2022-06-03

Family

ID=81782141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210273018.3A Pending CN114581854A (en) 2022-03-18 2022-03-18 Crowd counting method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114581854A (en)
WO (1) WO2023173616A1 (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7844417B2 (en) * 2003-02-21 2010-11-30 First American Real Estate Solutions, Llc GIS-based rapid population assessment tool
CN112560829B (en) * 2021-02-25 2021-06-04 腾讯科技(深圳)有限公司 Crowd quantity determination method, device, equipment and storage medium
CN113537172B (en) * 2021-09-16 2021-12-10 长沙海信智能***研究院有限公司 Crowd density determination method, device, equipment and storage medium
CN114581854A (en) * 2022-03-18 2022-06-03 上海商汤智能科技有限公司 Crowd counting method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023173616A1 (en) * 2022-03-18 2023-09-21 上海商汤智能科技有限公司 Crowd counting method and apparatus, electronic device and storage medium
CN116863868A (en) * 2023-08-31 2023-10-10 深圳市银幕光电科技有限公司 Display control method, device and display system of large-size LED display screen for curtain wall
CN116863868B (en) * 2023-08-31 2023-11-10 深圳市银幕光电科技有限公司 Display control method, device and display system of large-size LED display screen for curtain wall

Also Published As

Publication number Publication date
WO2023173616A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
CN106651955B (en) Method and device for positioning target object in picture
US11288531B2 (en) Image processing method and apparatus, electronic device, and storage medium
EP3163498B1 (en) Alarming method and device
US10452890B2 (en) Fingerprint template input method, device and medium
EP3226204A1 (en) Method and apparatus for intelligently capturing image
US9924226B2 (en) Method and device for processing identification of video file
US20210166040A1 (en) Method and system for detecting companions, electronic device and storage medium
EP3147819A1 (en) Method and device for fingerprint image alignment
US20130307766A1 (en) User interface system and method of operation thereof
US10735578B2 (en) Method for unlocking a mobile terminal, devices using the same, and computer-readable storage media encoding the same
CN114581854A (en) Crowd counting method and device, electronic equipment and storage medium
CN106557759B (en) Signpost information acquisition method and device
CN109671051B (en) Image quality detection model training method and device, electronic equipment and storage medium
CN111860373B (en) Target detection method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN112860061A (en) Scene image display method and device, electronic equipment and storage medium
CN114187498A (en) Occlusion detection method and device, electronic equipment and storage medium
KR20220123218A (en) Target positioning method, apparatus, electronic device, storage medium and program
CN113762169A (en) People flow statistical method and device, electronic equipment and storage medium
CN112541971A (en) Point cloud map construction method and device, electronic equipment and storage medium
CN113011291A (en) Event detection method and device, electronic equipment and storage medium
CN111753611A (en) Image detection method, device and system, electronic equipment and storage medium
CN113344999A (en) Depth detection method and device, electronic equipment and storage medium
WO2023155350A1 (en) Crowd positioning method and apparatus, electronic device, and storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination