CN111242023A - Statistical method and statistical device suitable for complex light passenger flow


Info

Publication number
CN111242023A
CN111242023A
Authority
CN
China
Prior art keywords
target
detection
area
tracking
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010028677.1A
Other languages
Chinese (zh)
Inventor
刘东海
沈修平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Original Assignee
SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD filed Critical SHANGHAI ULUCU ELECTRONIC TECHNOLOGY CO LTD
Priority to CN202010028677.1A priority Critical patent/CN111242023A/en
Publication of CN111242023A publication Critical patent/CN111242023A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a statistical method and a statistical device for passenger flow under complex lighting, comprising the following steps: 1) under complex lighting, passenger-flow detection accuracy drops and the tracking algorithm cannot correctly match and associate the same target; in this case, the imaging is adapted automatically to the head-shoulder detection region according to the detection area, which markedly improves the salience of detected objects under complex lighting; 2) an improved one-stage model balancing accuracy and real-time performance is better suited to running on embedded platforms; 3) with the improved correlation-filtering method, a relocation step is added to the algorithm after target tracking fails, since lost targets are otherwise hard to recover in practice. In practical application scenes, for example where shelves cause occlusion, customers are occluded to varying degrees when they bend down or lower their heads to pick up goods, and conventional algorithms easily lose track of them. With the improved algorithm, relocating after a tracking failure yields a clear improvement in occluded scenes.

Description

Statistical method and statistical device suitable for complex light passenger flow
Technical Field
The invention relates to passenger-flow statistics, in particular to a passenger-flow statistical method and a passenger-flow statistical device suitable for complex lighting.
Background
The customer flow of a physical store is among the most important business data: it reveals how efficiently the store is operated and managed, and operating practices can be adjusted on that basis to improve performance. Store passenger-flow statistics have therefore become an important commercial market-research tool, providing accurate and timely data for operators' decisions and overall management. Application scenarios include stores in shopping malls, chain stores, supermarkets, hotels, airports, subways, scenic spots and the like. Cameras for passenger-flow counting are often mounted near the entrance gate, where they are easily disturbed by light, or by specular reflection from a glossy floor, leaving some regions over-bright or over-dark. A camera's imaging control system typically performs exposure control and gain adjustment over the global area according to global brightness. In surveillance video, the head region of a person entering the store may fall exactly in an over-exposed or under-exposed area, or motion blur under poor light may be severe. Detection targets then become hard to distinguish, causing missed or false detections.
Passenger-flow detection methods generally obtain moving targets by background modeling, whose accuracy is easily affected by lighting changes; or they rely on deep-learning models with very large parameter counts, such as feature extraction based on VGG16. Model size scales with the required computing power and memory, so large models are hard to deploy on front-end embedded devices. The model proposed in this patent trims and optimizes the network architecture for performance, adopting a lightweight yet efficient model that remains usable on embedded equipment.
In current application scenarios, most cameras are installed near the doorway of street-front stores and are strongly affected by changes in sunlight; in other cases a light source within the field of view leaves parts of the ground over-bright. A complex lighting environment degrades the quality of the images captured by the camera, which in turn degrades the passenger-flow detection algorithm and thus the accuracy of detection and tracking.
In general, a camera adjusts its exposure using the brightness of the global area. If the detected target enters a dim region, it may become hard to see clearly; or, at low exposure, motion blur of a moving target becomes severe. When the discriminability of the detected target is low, raising the exposure brightness of the effective target region improves detection accuracy.
Disclosure of Invention
The invention aims to provide a passenger-flow statistical method that is practical under complex lighting: based on head detection in different environments and on the target region of the previous frame, programmed control parameters for the next frame are computed, improving head-detection accuracy. It also provides a detection algorithm suited to embedded devices and an improved tracking method, raising the overall accuracy and practicality of the passenger-flow system.
The specific technical scheme of the invention is as follows:
a passenger-flow statistical method suitable for complex lighting comprises the following steps:
1) under complex lighting, passenger-flow detection accuracy drops and the tracking algorithm cannot correctly match and associate the same target; in this case, the imaging is adapted automatically to the head-shoulder detection region according to the detection area, which markedly improves the salience of detected objects under complex lighting;
2) an improved one-stage model balancing accuracy and real-time performance is better suited to running on embedded platforms;
3) with the improved correlation-filtering method, lost targets are hard to recover in practice, so a relocation step is added to the algorithm after target tracking fails.
The invention also provides a passenger-flow statistical device suitable for complex lighting, comprising: a human head detection module, an adaptive detection-target-area imaging control module, an improved DCF tracking module and a passenger-flow counting module;
the human head detection module adopts a one-stage model, so the detection image need not be converted into a multi-scale pyramid; the input is an image of fixed scale. Images at different scales are fed into a convolutional neural network trained on training images with calibrated head positions, the network comprising: a rapid transform convolution module, a cascaded feature extraction module, a convolutional dimensionality-reduction module and a multi-target loss function;
in the multi-target loss function, the regression layer of the classification convolutional neural network is replaced by a regression layer that maps the second feature output by the converted convolutional layer to a head position and a corresponding confidence; the convolutional neural network comprising the front layers, the converted convolutional layer and the replaced regression layer is trained on training images with calibrated head positions;
filtering the head position of each sub-image by its confidence to obtain the head positions detected in the image under test comprises: screening out, for each sub-image, the head positions whose confidence is greater than or equal to a confidence threshold; selecting, from the head positions of each sub-image, those that intersect the screened head positions in the image under test; and determining the detected head positions from the screened and the selected head positions;
the adaptive detection-target-area imaging control module uses a manually calibrated detection area; while no detection target is present, program control follows that area. When a detection target appears, the camera's program control system is adjusted according to the brightness distribution of the detected target region; that is, the shutter and gain for the camera's next frame are adjusted according to the brightness distribution of the target region;
for the current detection area, the brightness distribution is compared against the previously accumulated target-brightness values and the current values, and a weighted adjustment is applied so that targets become more distinguishable within the detection area; the brightness-adjustment algorithm accounts for how local over-bright or over-dark regions affect the overall image adjustment of the detection area;
the improved DCF tracking module adopts a DCF tracking algorithm; although its tracking performance and real-time performance are relatively high, a lost target is hard to recover in practice, so a relocation algorithm is added after target tracking fails;
the target tracking state is evaluated through the PSR value, and when the target is lost a target relocation algorithm is invoked to determine the target position so that accurate tracking can continue;
PSR is defined as: τ = (g_max − μ_s1)/σ_s1, where g_max is the maximum of the correlation output, and μ_s1 and σ_s1 are the mean and standard deviation of the remaining region after excluding the 11×11 window centered on that maximum. Two thresholds τ_a and τ_b denote failed and successful tracking respectively: when τ ≤ τ_a the target is judged lost and must be relocated; when τ ≥ τ_b the target is tracked correctly and tracking continues on the next frame.
The coarse-to-fine temporal target relocation algorithm: once target loss is judged from the PSR value τ, a fast coarse-to-fine relocation is performed using the temporal model γ_prev and the MOSSE filter H; γ_prev is the temporal model extracted when the previous frame satisfied τ ≥ τ_b according to the tracking-result PSR. The algorithm proceeds as follows: first, a variance filter performs a coarse check to quickly discard most regions where the target cannot exist; the remaining regions are then further screened by histogram intersection; finally, the screened regions are finely checked with the MOSSE filter:
(1) Coarse check with the variance filter: a standard deviation is computed from γ_prev, candidate boxes satisfying |σ − σ0| ≤ σ_min are extracted, and non-maximum suppression (NMS) is applied to obtain the candidate set C1, where σ and σ0 are the standard deviations of the current candidate box and of γ_prev respectively, and σ_min is the standard-deviation threshold. The standard deviation is computed as

σ = √( (1/N) ∑_{i=1}^{N} (x_i − m)² )    (4)

where N is the number of pixels, x_i is the gray value of pixel i, and m is the mean of the image. The standard deviation σ can be computed with an integral image, accelerating the coarse check;
(2) Screening by histogram intersection: from the candidate set C1, the candidate boxes most likely to contain the target are screened out, giving the candidate set C2. The histogram intersection is

d(H1, H2) = ∑_i min(H1(i), H2(i))    (5)

If d ≥ d0 the candidate box is retained, otherwise it is discarded, where d0 is the histogram-intersection threshold and d is the intersection value after histogram normalization;
(3) Fine check with the MOSSE filter H, whose definition and correlation output are

H* = (G ⊙ F*) / (F ⊙ F*)    (6)

R = F ⊙ H*    (7)

where F is the Fourier transform of the training image f, G is the Fourier transform of a two-dimensional Gaussian g peaked at the center of f, * denotes the conjugate operation, and R is the correlation output. By correlation theory, the larger the correlation value of two images, the more similar they are; on this basis the fine-check target is obtained as

C′ = argmax_{C ∈ C2} max(F_C ⊙ H*)    (8)

where C′ is the best candidate box after the fine check and F_C is the Fourier transform of the image extracted from candidate box C.
The passenger-flow counting module builds on the head detection module: head regions appearing in the area are detected first, and the tracking module associates the target regions detected in adjacent frames to obtain the track sequence of each person walking through the area. The area is divided in two by a midline, and the direction of entry into the detection area is counted from where the person's track appears in the coordinate area.
The technical effects are as follows:
The invention adopts a lightweight model that balances accuracy and real-time performance and is well suited to product deployment on front-end embedded devices, achieving practical usability. Under complex lighting, where target detection may fail, the tracking algorithm is improved accordingly. The invention thus improves the accuracy and practicality of the passenger-flow statistical method and device under complex lighting. In practical application scenes, for example where shelves cause occlusion, customers are occluded to varying degrees when they bend down or lower their heads to pick up goods, and conventional algorithms easily lose track of them. With the improved statistical method, relocating after a tracking failure yields a clear improvement in occluded scenes.
Drawings
FIG. 1: fast transform convolution module schematic.
FIG. 2: schematic diagram of an Inception v2 module.
FIG. 3: schematic of the Inception plus residual-layer model.
FIG. 4 is a schematic diagram of cascaded Inception modules.
Fig. 5 is a flow diagram of a head detection model.
FIG. 6 is a diagram of a multi-objective loss function loss branch.
Fig. 7 is a flow chart of an imaging control for adaptively detecting a target area.
FIG. 8 is a target relocation flow chart.
Detailed Description
Examples
A passenger-flow statistical method suitable for complex lighting comprises the following steps:
1) under complex lighting, passenger-flow detection accuracy drops and the tracking algorithm cannot correctly match and associate the same target; in this case, the imaging is adapted automatically to the head-shoulder detection region according to the detection area, which markedly improves the salience of detected objects under complex lighting;
2) an improved one-stage model balancing accuracy and real-time performance is better suited to running on embedded platforms;
3) with the improved correlation-filtering method, lost targets are hard to recover in practice, so a relocation step is added to the algorithm after target tracking fails.
The invention also provides a passenger-flow statistical device suitable for complex lighting, comprising: a human head detection module, an adaptive detection-target-area imaging control module, an improved DCF tracking module and a passenger-flow counting module;
the human head detection module adopts a one-stage model, so the detection image need not be converted into a multi-scale pyramid; the input can be an image of fixed scale. Images at different scales are fed into a convolutional neural network trained on training images with calibrated head positions, the network comprising: a rapid transform convolution module, a cascaded feature extraction module (cascaded Inception modules), a convolutional dimensionality-reduction module and a multi-target loss function (loss branch);
the rapid transform convolution module quickly reduces the size of the feature map. Its design points are:
(1) the strides of the convolution layer (Conv1), pooling layer (Pool1), convolution layer (Conv2) and pooling layer (Pool2) are relatively large, e.g. 2 or 4, so the overall stride of the module is large and the feature map shrinks quickly;
(2) an over-large convolution (or pooling) kernel is slow, while an over-small kernel covers too little information; as a trade-off, the kernels of Conv1, Pool1, Conv2 and Pool2 are also set relatively large, not limited to 7x7, 5x5 and 3x3;
(3) a concatenated activation function (CReLU) reduces the number of convolution kernels while keeping the output dimensionality unchanged;
The rapid transform convolution module (comprising convolution (Convolution), batch normalization (BatchNorm), negation and concatenation (the CReLU structure), scaling (Scale) and activation (ReLU) layers) is shown in fig. 1.
The cascaded feature extraction module is not limited to one of Inception v1, v2, v3 and v4, or a cascaded combination of several of them.
For example, fig. 2 shows the network structure of an Inception v2 module (comprising a base layer (Base) and convolution layers (Conv) of different sizes).
Residual layer (residual) modules from the ResNet network can also be borrowed; the combination of Inception plus a residual layer, including the activation layer (ReLU) and activation scaling layer (Activation Scaling), is shown in fig. 3.
Different Inception network layers are grouped together; fig. 4 shows three cascaded Inception layers.
By cascading the Inception layers and their convolution layers, further features can be extracted, yielding head target regions at the smallest (tiny) scale; then, through dimensionality-reduction convolution, target features in the medium and large scale ranges are found. The flow chart of the whole detection model is shown in fig. 5.
The multi-target loss function (loss branch) replaces the regression layer of the classification convolutional neural network with a regression layer that maps the second feature output by the converted convolutional layer to a head position and a corresponding confidence; the convolutional neural network comprising the front layers, the converted convolutional layer and the replaced regression layer is trained on training images with calibrated head positions.
Filtering the head position of each sub-image by its confidence to obtain the head positions detected in the image under test comprises:
screening out, for each sub-image, the head positions whose confidence (classification score) is greater than or equal to a confidence threshold; selecting, from the head positions (Bbox regression) of each sub-image, those that intersect the screened head positions in the image under test; and determining the detected head positions from the screened and the selected head positions, as shown in fig. 6.
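A minimal numpy sketch of this post-processing: boxes clearing the confidence threshold are kept, and lower-confidence boxes that intersect them are merged in by averaging. The IoU threshold and the merge-by-averaging rule are illustrative assumptions, since the patent does not specify them.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def filter_heads(boxes, scores, conf_thresh=0.5, iou_thresh=0.3):
    """Keep confident head boxes, then average each with the
    lower-confidence boxes that intersect it."""
    kept = [i for i, s in enumerate(scores) if s >= conf_thresh]
    merged = []
    for i in kept:
        group = [boxes[j] for j in range(len(boxes))
                 if j == i or (scores[j] < conf_thresh
                               and iou(boxes[i], boxes[j]) > iou_thresh)]
        merged.append(np.mean(group, axis=0))     # determined head position
    return merged
```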
The adaptive detection-target-area imaging control module uses a manually calibrated detection area; while no detection target is present, program control follows that area. When a detection target appears, the camera's program control system is adjusted according to the brightness distribution of the detected target region; that is, the shutter and gain for the camera's next frame are adjusted according to the brightness distribution of the target region.
For the current detection area, the brightness distribution is compared against the previously accumulated target-brightness values and the current values, and a weighted adjustment is applied so that targets become more distinguishable within the detection area.
The brightness-adjustment algorithm accounts for how local over-bright or over-dark regions affect the overall image adjustment of the detection area; the control flow chart of adaptive detection-target-area imaging is shown in fig. 7.
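The following is a minimal sketch of the weighted brightness update described above. The blend weight, target brightness level and dead-band are illustrative assumptions, and the returned adjustment step would have to be mapped onto the vendor-specific shutter/gain controls of a real camera ISP.

```python
import numpy as np

def update_region_exposure(frame_gray, boxes, accumulated_mean,
                           w=0.7, target_level=120, deadband=15):
    """Blend accumulated target-region brightness with the current
    detections and decide a shutter/gain step for the next frame.

    Returns (new_accumulated_mean, adjustment); adjustment > 0 means
    raise exposure (slower shutter / higher gain), < 0 means lower it."""
    if not boxes:              # no target: leave global program control alone
        return accumulated_mean, 0
    # mean brightness over all detected target regions of this frame
    vals = [frame_gray[y1:y2, x1:x2].mean() for (x1, y1, x2, y2) in boxes]
    current_mean = float(np.mean(vals))
    # weighted blend of accumulated history and the current measurement
    blended = w * accumulated_mean + (1 - w) * current_mean
    error = target_level - blended
    adjustment = 0 if abs(error) < deadband else int(np.sign(error))
    return blended, adjustment
```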
Improved DCF tracking module
Although the DCF tracking algorithm offers relatively high tracking performance and real-time performance, a lost target is hard to recover in practice. A relocation algorithm is therefore added after target tracking fails.
The target tracking state is evaluated through the PSR value, and when the target is lost a target relocation algorithm is invoked to determine the target position so that accurate tracking can continue.
PSR definition
The PSR value is defined as τ = (g_max − μ_s1)/σ_s1, where g_max is the maximum of the correlation output, and μ_s1 and σ_s1 are the mean and standard deviation of the remaining region after excluding the 11×11 window centered on that maximum. Two thresholds τ_a and τ_b denote failed and successful tracking respectively: when τ ≤ τ_a the target is judged lost and must be relocated; when τ ≥ τ_b the target is tracked correctly and tracking continues on the next frame.
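A minimal numpy sketch of the PSR computation; the 11x11 sidelobe exclusion follows the definition above, while the threshold values given are illustrative assumptions.

```python
import numpy as np

def psr(response, exclude=11):
    """Peak-to-sidelobe ratio tau = (g_max - mu_s1) / sigma_s1 of a
    correlation response map, excluding the 11x11 window around the
    peak from the sidelobe statistics."""
    py, px = np.unravel_index(np.argmax(response), response.shape)
    g_max = response[py, px]
    mask = np.ones_like(response, dtype=bool)
    h = exclude // 2
    mask[max(0, py - h):py + h + 1, max(0, px - h):px + h + 1] = False
    sidelobe = response[mask]
    return (g_max - sidelobe.mean()) / (sidelobe.std() + 1e-9)

# illustrative thresholds: tau <= TAU_A -> target lost, tau >= TAU_B -> tracked
TAU_A, TAU_B = 4.0, 7.0
```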
Coarse-to-fine temporal target relocation algorithm
Once target loss is judged from the PSR value τ, a fast coarse-to-fine relocation is performed using the temporal model γ_prev and the MOSSE filter H, where γ_prev is the temporal model extracted when the previous frame satisfied τ ≥ τ_b according to the tracking-result PSR. The algorithm proceeds as follows: first, a variance filter performs a coarse check to quickly discard most regions where the target cannot exist; the remaining regions are then further screened by histogram intersection; finally, the screened regions are finely checked with the MOSSE filter. The target relocation flow chart is shown in fig. 8.
(1) Coarse check with the variance filter: a standard deviation is computed from γ_prev, candidate boxes satisfying |σ − σ0| ≤ σ_min are extracted, and non-maximum suppression (NMS) is applied to obtain the candidate set C1, where σ and σ0 are the standard deviations of the current candidate box and of γ_prev respectively, and σ_min is the standard-deviation threshold. The standard deviation is computed as

σ = √( (1/N) ∑_{i=1}^{N} (x_i − m)² )    (4)

where N is the number of pixels, x_i is the gray value of pixel i, and m is the mean of the image. The standard deviation σ can be computed with an integral image, accelerating the coarse check.
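A minimal OpenCV/numpy sketch of the coarse check: cv2.integral2 returns both the integral image and the integral of squared pixels, from which the standard deviation of any box follows in O(1). The sigma_min threshold is illustrative, and the NMS step is omitted for brevity.

```python
import cv2
import numpy as np

def region_std(integ, integ_sq, x1, y1, x2, y2):
    """Standard deviation of gray[y1:y2, x1:x2] in O(1) via integral
    images: sigma = sqrt(E[x^2] - E[x]^2), matching eq. (4)."""
    n = (y2 - y1) * (x2 - x1)
    s = integ[y2, x2] - integ[y1, x2] - integ[y2, x1] + integ[y1, x1]
    sq = integ_sq[y2, x2] - integ_sq[y1, x2] - integ_sq[y2, x1] + integ_sq[y1, x1]
    m = s / n
    return np.sqrt(max(sq / n - m * m, 0.0))

def coarse_check(gray, candidates, sigma0, sigma_min=10.0):
    """Keep candidate boxes (x1, y1, x2, y2) whose standard deviation is
    close to that of the temporal model: |sigma - sigma0| <= sigma_min."""
    integ, integ_sq = cv2.integral2(gray)      # both have shape (H+1, W+1)
    return [c for c in candidates
            if abs(region_std(integ, integ_sq, *c) - sigma0) <= sigma_min]
```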
(2) Screening by histogram intersection: from the candidate set C1, the candidate boxes most likely to contain the target are screened out, giving the candidate set C2. The histogram intersection is

d(H1, H2) = ∑_i min(H1(i), H2(i))    (5)

If d ≥ d0 the candidate box is retained, otherwise it is discarded, where d0 is the histogram-intersection threshold and d is the intersection value after histogram normalization.
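A minimal sketch of the histogram-intersection screen of eq. (5); normalizing each histogram to sum to 1 makes d fall in [0, 1] and directly comparable to the threshold d0, whose value here is an illustrative assumption.

```python
import cv2
import numpy as np

def hist_intersection(patch_a, patch_b, bins=32):
    """Normalized histogram intersection d(H1, H2) = sum_i min(H1(i), H2(i));
    d = 1 means identical gray-level distributions (8-bit grayscale input)."""
    h1 = cv2.calcHist([patch_a], [0], None, [bins], [0, 256]).ravel()
    h2 = cv2.calcHist([patch_b], [0], None, [bins], [0, 256]).ravel()
    h1 /= h1.sum() + 1e-9
    h2 /= h2.sum() + 1e-9
    return float(np.minimum(h1, h2).sum())

def histogram_screen(gray, model_patch, candidates, d0=0.6):
    """Retain candidates whose patch histogram intersects the temporal
    model's histogram with d >= d0 (illustrative threshold)."""
    return [c for c in candidates
            if hist_intersection(gray[c[1]:c[3], c[0]:c[2]], model_patch) >= d0]
```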
(3) Fine check with the MOSSE filter H, whose definition and correlation output are

H* = (G ⊙ F*) / (F ⊙ F*)    (6)

R = F ⊙ H*    (7)

where F is the Fourier transform of the training image f, G is the Fourier transform of a two-dimensional Gaussian g peaked at the center of f, * denotes the conjugate operation, and R is the correlation output. By correlation theory, the larger the correlation value of two images, the more similar they are; on this basis the fine-check target is obtained as

C′ = argmax_{C ∈ C2} max(F_C ⊙ H*)    (8)

where C′ is the best candidate box after the fine check and F_C is the Fourier transform of the image extracted from candidate box C.
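A minimal numpy sketch of the MOSSE fine check: H* is trained from the temporal-model patch against a centered 2-D Gaussian (eq. 6), and each remaining candidate is scored by the peak of its correlation response (eqs. 7-8). Training from a single patch, with no online update or regularization term, is a simplifying assumption.

```python
import cv2
import numpy as np

def gaussian_peak(shape, sigma=2.0):
    """2-D Gaussian g peaked at the patch center; its FFT is G."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))

def train_mosse(f):
    """Eq. (6): H* = (G . conj(F)) / (F . conj(F)) from one training patch."""
    F = np.fft.fft2(np.float32(f))
    G = np.fft.fft2(gaussian_peak(f.shape))
    return (G * np.conj(F)) / (F * np.conj(F) + 1e-5)  # small eps for stability

def fine_check(gray, candidates, H_star, filt_hw):
    """Eq. (8): return the candidate whose correlation response
    R = F_C . H* (eq. 7) peaks highest."""
    best, best_peak = None, -np.inf
    fh, fw = filt_hw
    for (x1, y1, x2, y2) in candidates:
        patch = cv2.resize(np.float32(gray[y1:y2, x1:x2]), (fw, fh))
        R = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_star))
        if R.max() > best_peak:
            best, best_peak = (x1, y1, x2, y2), R.max()
    return best
```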
Passenger flow counting module
The head detection module first detects the head regions appearing in the area, and the tracking module associates the target regions detected in adjacent frames to obtain the track sequence of each person walking through the area. The area is divided in two by a midline, and the direction of entry into the detection area is counted from where the person's track appears in the coordinate area.
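A minimal sketch of the midline counting rule: a track contributes one entry or exit event when consecutive track points fall on opposite sides of the line. A horizontal midline at y = mid_y is assumed; a real deployment would calibrate it to the doorway.

```python
def count_crossings(tracks, mid_y):
    """Count entries/exits from head-track sequences.

    tracks: list of tracks, each a list of (x, y) head-center points
    produced by the tracking module. A track counts as 'entered' when
    it moves from above the midline (y < mid_y) to below it, and as
    'exited' on the opposite transition."""
    entered = exited = 0
    for track in tracks:
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            if y0 < mid_y <= y1:
                entered += 1
            elif y1 < mid_y <= y0:
                exited += 1
    return entered, exited
```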
The technical effects are as follows:
The invention adopts a lightweight model that balances accuracy and real-time performance and is well suited to product deployment on front-end embedded devices, achieving practical usability. Under complex lighting, where target detection may fail, the tracking algorithm is improved accordingly. The invention thus improves the accuracy and practicality of the passenger-flow statistical method and device under complex lighting. In practical application scenes, for example where shelves cause occlusion, customers are occluded to varying degrees when they bend down or lower their heads to pick up goods, and conventional algorithms easily lose track of them. With the improved algorithm, relocating after a tracking failure yields a clear improvement in occluded scenes.

Claims (4)

1. A passenger-flow statistical method suitable for complex lighting, comprising the following steps:
1) under complex lighting, passenger-flow detection accuracy drops and the tracking algorithm cannot correctly match and associate the same target; in this case, the imaging is adapted automatically to the head-shoulder detection region according to the detection area, which markedly improves the salience of detected objects under complex lighting;
2) an improved one-stage model balancing accuracy and real-time performance is better suited to running on embedded platforms;
3) with the improved correlation-filtering method, because lost targets are hard to recover in practice, a relocation step is added to the algorithm after target tracking fails.
2. A passenger-flow statistical device suitable for complex lighting, characterized by comprising a human head detection module, an adaptive detection-target-area imaging control module, an improved DCF tracking module and a passenger-flow counting module;
the adaptive detection-target-area imaging control module uses a manually calibrated detection area; while no detection target is present, program control follows that area; when a detection target appears, the camera's program control system is adjusted according to the brightness distribution of the detected target region; that is, the shutter and gain for the camera's next frame are adjusted according to the brightness distribution of the target region;
for the current detection area, the brightness distribution is compared against the previously accumulated target-brightness values and the current values, and a weighted adjustment is applied so that targets become more distinguishable within the detection area; the brightness-adjustment algorithm accounts for how local over-bright or over-dark regions affect the overall image adjustment of the detection area;
the improved DCF tracking module adopts a DCF tracking algorithm and adds a relocation algorithm after target tracking fails;
the target tracking state is evaluated through the PSR value, and when the target is lost a target relocation algorithm is invoked to determine the target position so that accurate tracking can continue;
PSR is defined as: τ = (g_max − μ_s1)/σ_s1, where g_max is the maximum of the correlation output, and μ_s1 and σ_s1 are the mean and standard deviation of the remaining region after excluding the 11×11 window centered on that maximum; thresholds τ_a and τ_b denote failed and successful tracking respectively; when τ ≤ τ_a the target is judged lost and must be relocated; when τ ≥ τ_b the target is tracked correctly and tracking continues on the next frame;
the coarse-to-fine temporal target relocation algorithm: once target loss is judged from the PSR value τ, a fast coarse-to-fine relocation is performed using the temporal model γ_prev and the MOSSE filter H; γ_prev is the temporal model extracted when the previous frame satisfied τ ≥ τ_b according to the tracking-result PSR; the algorithm proceeds as follows: first, a variance filter performs a coarse check to quickly discard most regions where the target cannot exist; the remaining regions are then further screened by histogram intersection; finally, the screened regions are finely checked with the MOSSE filter:
(1) coarse check with the variance filter: a standard deviation is computed from γ_prev, candidate boxes satisfying |σ − σ0| ≤ σ_min are extracted, and non-maximum suppression (NMS) is applied to obtain the candidate set C1, where σ and σ0 are the standard deviations of the current candidate box and of γ_prev respectively, and σ_min is the standard-deviation threshold; the standard deviation is computed as

σ = √( (1/N) ∑_{i=1}^{N} (x_i − m)² )    (4)

where N is the number of pixels, x_i is the gray value of pixel i, and m is the mean of the image; the standard deviation σ can be computed with an integral image, accelerating the coarse check;
(2) screening by histogram intersection: from the candidate set C1, the candidate boxes most likely to contain the target are screened out, giving the candidate set C2; the histogram intersection is

d(H1, H2) = ∑_i min(H1(i), H2(i))    (5)

if d ≥ d0 the candidate box is retained, otherwise it is discarded, where d0 is the histogram-intersection threshold and d is the intersection value after histogram normalization;
(3) fine check with the MOSSE filter H, whose definition and correlation output are

H* = (G ⊙ F*) / (F ⊙ F*)    (6)

R = F ⊙ H*    (7)

where F is the Fourier transform of the training image f, G is the Fourier transform of a two-dimensional Gaussian g peaked at the center of f, * denotes the conjugate operation, and R is the correlation output; the fine-check target is obtained as

C′ = argmax_{C ∈ C2} max(F_C ⊙ H*)    (8)

where C′ is the best candidate box after the fine check and F_C is the Fourier transform of the image extracted from candidate box C.
3. The device of claim 2, wherein the head detection module adopts a one-stage model, so the detection image need not be converted into a multi-scale pyramid; the input is an image of fixed scale; images at different scales are fed into a convolutional neural network trained on training images with calibrated head positions, the network comprising: a rapid transform convolution module, a cascaded feature extraction module, a convolutional dimensionality-reduction module and a multi-target loss function;
in the multi-target loss function, the regression layer of the classification convolutional neural network is replaced by a regression layer that maps the second feature output by the converted convolutional layer to a head position and a corresponding confidence; the convolutional neural network comprising the front layers, the converted convolutional layer and the replaced regression layer is trained on training images with calibrated head positions;
filtering the head position of each sub-image by its confidence to obtain the head positions detected in the image under test comprises: screening out, for each sub-image, the head positions whose confidence is greater than or equal to a confidence threshold; selecting, from the head positions of each sub-image, those that intersect the screened head positions in the image under test; and determining the detected head positions from the screened and the selected head positions.
4. The device according to claim 2, wherein in the passenger-flow counting module the head detection module first detects the head regions appearing in the area, and the tracking module associates the target regions detected in adjacent frames according to the detected target regions to obtain the track sequence of people walking through the area; the area is divided in two by a midline, and the direction of entry into the detection area is counted from where the person's track appears in the coordinate area.
CN202010028677.1A 2020-01-11 2020-01-11 Statistical method and statistical device suitable for complex light passenger flow Pending CN111242023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010028677.1A CN111242023A (en) 2020-01-11 2020-01-11 Statistical method and statistical device suitable for complex light passenger flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010028677.1A CN111242023A (en) 2020-01-11 2020-01-11 Statistical method and statistical device suitable for complex light passenger flow

Publications (1)

Publication Number Publication Date
CN111242023A true CN111242023A (en) 2020-06-05

Family

ID=70870806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010028677.1A Pending CN111242023A (en) 2020-01-11 2020-01-11 Statistical method and statistical device suitable for complex light passenger flow

Country Status (1)

Country Link
CN (1) CN111242023A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308000A (en) * 2020-11-06 2021-02-02 安徽清新互联信息科技有限公司 High-altitude parabolic detection method based on space-time information
CN112330743A (en) * 2020-11-06 2021-02-05 安徽清新互联信息科技有限公司 High-altitude parabolic detection method based on deep learning
CN112308000B (en) * 2020-11-06 2023-03-07 安徽清新互联信息科技有限公司 High-altitude parabolic detection method based on space-time information
CN112330743B (en) * 2020-11-06 2023-03-10 安徽清新互联信息科技有限公司 High-altitude parabolic detection method based on deep learning
CN112418120A (en) * 2020-11-27 2021-02-26 湖南师范大学 Crowd detection method based on peak confidence map
CN112418120B (en) * 2020-11-27 2021-09-28 湖南师范大学 Crowd detection method based on peak confidence map
KR20220118239A (en) * 2021-02-18 2022-08-25 군산대학교산학협력단 Method for robust visual object tracking using context-based spatial variation via multi-feature fusion
KR102600929B1 (en) 2021-02-18 2023-11-09 군산대학교산학협력단 Method for robust visual object tracking using context-based spatial variation via multi-feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination