CN111160101B - Video personnel tracking and counting method based on artificial intelligence - Google Patents

Video personnel tracking and counting method based on artificial intelligence

Info

Publication number
CN111160101B
CN111160101B (application CN201911200873.6A)
Authority
CN
China
Prior art keywords
pedestrian
pedestrians
video
samples
matching
Prior art date
Legal status
Active
Application number
CN201911200873.6A
Other languages
Chinese (zh)
Other versions
CN111160101A (en)
Inventor
邹建红
高元荣
陈雯珊
王辉
陈哲
张兴
王宇奇
陈彬
陈凡千
孙建锋
Current Assignee
Fujian Nebula Big Data Application Service Co ltd
Original Assignee
Fujian Nebula Big Data Application Service Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Nebula Big Data Application Service Co., Ltd.
Priority to CN201911200873.6A
Publication of CN111160101A
Application granted
Publication of CN111160101B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/754 - Organisation of the matching processes involving a deformation of the sample pattern or of the reference pattern; Elastic matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an artificial-intelligence-based video person tracking and counting method. The method combines learned features extracted by a convolutional neural network with hand-crafted features obtained by geometric calculation, performs multi-target matching between frames of the video image sequence with a tracker whose network parameters can be updated online, and computes the person-count increment from the change of the inside/outside-of-door label of the same pedestrian in adjacent frames. A set of features learned from massive public video data sets with a sparse autoencoder serves as the filters of the convolutional neural network, which improves the efficiency of online updating of the network. Common person-occlusion patterns are also considered, and the counting errors caused by occlusion are compensated. The method is robust, runs in real time, achieves relatively high accuracy and strong occlusion resistance, is suitable for people counting on large-scale video data, and can be integrated into a video surveillance software system.

Description

Video personnel tracking and counting method based on artificial intelligence
[ technical field ]
The invention belongs to the technical field of intelligent video monitoring and analysis, and particularly relates to a video personnel tracking and counting method based on artificial intelligence.
[ background of the invention ]
Generally, there are two approaches to detecting and counting the number of people inside a building. The first accumulates and sums the numbers of people detected in the surveillance videos of every floor and every area of the building and takes the sum as the total number of people in the building. This approach requires full video-surveillance coverage of the building, and because the count from each camera carries a certain error, the summed error is large. The second subtracts the accumulated number of people detected leaving from the accumulated number detected entering in the surveillance videos of all building entrances and exits to obtain the total number of people in the building. This approach involves far fewer network cameras, its accumulated error is relatively small, and its feasibility is good.
In fact, the second approach, which analyzes the surveillance videos at the entrances and exits of public buildings in real time to produce passenger-flow statistics, is a technical solution that has received much attention and has gradually been put into use in recent years. It generally requires a network camera with a vertically downward, top-view angle installed above each entrance of the building to capture video of people entering and leaving; the heads of passers-by are then detected and counted by an intelligent front end or a back end to obtain the passenger flow. In many cases, however, building owners do not want to deploy an additional network video surveillance system dedicated to passenger-flow statistics; they prefer to add a video-analysis software module on top of the security surveillance system already deployed, since this both simplifies system deployment and avoids extra hardware cost.
However, to obtain a large monitoring range, the network cameras of a security video surveillance system are usually mounted near the ceiling and look obliquely down at the monitored area at a certain angle. In this scenario, people cannot be detected and counted simply by detecting their heads. In surveillance video captured from a vertically downward, top-view angle, head features are simple and consistent and mutual occlusion rarely occurs, so the video-analysis algorithm is relatively simple. In surveillance video observed from an oblique viewing angle, head features are complex and pedestrians are often occluded by, or overlap with, other people, which greatly increases the difficulty of video analysis.
[ summary of the invention ]
The technical problem the invention aims to solve is to provide an artificial-intelligence-based video person tracking and counting method that selects appropriate features and establishes a reasonable pedestrian-occlusion model to effectively improve the accuracy of person detection and counting, uses a pedestrian tracking-matching algorithm to achieve continuous and robust tracking, and meets the real-time, long-duration counting requirements of video surveillance.
The invention is realized by the following technical scheme:
a video personnel tracking and counting method based on artificial intelligence comprises the following steps:
step 1: initializing a video frame number n =1, and segmenting an nth frame video object to obtain a pedestrian connected domain set
P^(n) = {p_1^(n), ..., p_k^(n)}; calculate the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set its longest untracked-match count λ_j^(n);
the feature vector and the motion vector of a pedestrian are calculated as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
(centroid and area formula image in the source, computed from the binary silhouette f_j(x, y) over the bounding rectangle of p_j)
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
(binary-image formula image in the source; f_j(x, y) equals 1 on pixels belonging to p_j and 0 elsewhere)
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the inside/outside-of-door label of the pedestrian, with l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j = λ(p_j) is the longest untracked-match count of the jth pedestrian;
Step 2: dividing the (n + 1) th frame video object
P^(n+1) = {p_1^(n+1), ..., p_k^(n+1)}; calculate v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
Step 3: search P^(n) for the pedestrians that tracking-match those in P^(n+1):
for each pedestrian p_i^(n+1) in P^(n+1), i = 1, ..., k, find its tracking-matched pedestrian in P^(n);
if the matching is successful, calculate the people-count increment in: (1) if the pedestrian moved from inside the building to outside the building, the increment is -1; (2) if the pedestrian moved from outside the building to inside the building, the increment is 1; (3) if the pedestrian stayed inside the building, the increment is 0; (4) if the pedestrian stayed outside the building, the increment is 0;
every successfully matched p_i has its longest untracked-match count λ_i reset to zero;
if the matching is successful, it is also necessary to check whether p_i^(n+1) satisfies the judgment condition of merged occlusion; if it does, the detected increment in must be compensated;
if the matching fails, it is necessary to judge whether p_i^(n+1) is a pedestrian that was occluded in the nth frame; if p_i^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_i^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0;
Step 4: check the pedestrians in P^(n) that were not successfully matched to P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1; if such a pedestrian obtains a match in the (n + 2)th frame, it is judged that intermittent occlusion has occurred and in changes accordingly; otherwise the longest untracked-match count is increased by 1 again, and once it reaches the threshold the pedestrian is judged to have left the monitored area.
If the pedestrian satisfies the judgment condition of convergent occlusion, compensate in;
and 5: rejecting pedestrians and misdetected pedestrians who have left the monitored area, for P (n+1) Checking whether the longest untracked matching frequency of each pedestrian exceeds a threshold value;
if the pedestrian is larger than the threshold value, the pedestrian is considered to leave the monitoring area and should be abandoned;
otherwise, the pedestrian is considered to be temporarily shielded and should be reserved;
meanwhile, whether the area of the pedestrian exceeds the range is checked, if the area of the pedestrian is not within the range, the pedestrian is considered to be detected wrongly and should be discarded;
updating P (n+1)
Step 6: let n = n + 1 and return to step 2 until the analysis of the whole video image sequence is completed.
Further, the tracking matching in step 3 specifically includes the following steps:
step 31: initializing a video frame number n =1, tracker T (W);
step 32: handle
the centroid (x_i, y_i) of p_i^(1) by translating it to 16 new coordinate positions: with the centroid as the center, move it, in each of the 8 neighbourhood directions, to the pixel whose D_8 (chessboard) distance from the centroid equals d, where d = 5 and d = 10; together with p_i^(1), 17 samples of the ith class (all labeled i) are obtained;
(sample-coordinate table image in the source)
Step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1;
Step 34: detect the (n + 1)th frame to obtain P^(n+1), and set C^(n+1) = C^(n);
Step 35: input each p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output; compare the maximum output value o_m with an upper threshold σ_1 and a lower threshold σ_2 (σ_1 ≥ σ_2):
(1) If o_m is less than the lower threshold σ_2, p_j is considered a pedestrian newly appearing in the (n + 1)th frame and the tracking match fails. Translate the centroid of p_j to the 16 coordinate positions defined by the 8 neighbourhood directions and the D_8 distances d = 5, 10; together with p_j this gives 17 samples in total, which are added to C^(n+1) as a new class of samples;
(2) If o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered highly matched;
(3) If o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered matched. Translate the centroid of p_j to the 16 coordinate positions as above; together with p_j this gives 17 samples in total, which are added to the sample set labeled m. If the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool first are removed;
Step 36: update the sample set and remove the samples of pedestrians that have left the monitored area or were falsely detected; the update covers 3 cases:
(1) For a newly appearing pedestrian, create a new pedestrian class.
(2) For a pedestrian whose appearance has changed, collect and add new samples. If, while adding samples, the number of samples exceeds the per-class sample-pool capacity V, the sample set is updated by a first-in-first-out rule and the samples that entered the pool earliest are replaced by the newly added ones. V = 34 was determined experimentally.
(3) For pedestrians that have left the monitored area or were falsely detected, remove the samples of the classes they belong to. After updating, the new sample set C^(n+1) is obtained;
Step 37: update the parameters of the tracker T(W): train T(W) with C^(n+1) and determine the parameters as W_{n+1}; when training T(W), the initial value of the network parameters is W_n.
Further, the tracker comprises a filter, a convolutional neural network, a discriminant classifier, and online parameter updating;
after moving-object segmentation of the nth frame image, a pedestrian set containing the moving targets is obtained; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs the tracking-result vector and gives the probability that each pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a pedestrian whose appearance features have changed, a pedestrian that has left the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained, and new network parameters are determined; pedestrian tracking of the (n + 1)th frame then begins.
Furthermore, the filter in the tracker is a set of features pre-trained by a sparse autoencoder; it is obtained by training on a massive unsupervised auxiliary training set, so the filter has good generality and completeness; the feature pre-training is an offline process, and the trained features are not updated while the target-tracking algorithm is executed.
Further, the convolution kernels used by the convolutional neural network in the tracker form a filter composed of 100 pre-trained features of size 10 × 10.
Further, the mathematical model of the discriminant classifier in the tracker is the SoftMax function.
The invention has the following advantages: the method is robust, runs in real time, achieves relatively high accuracy and strong occlusion resistance, and can meet the requirement of long-duration, uninterrupted operation of video surveillance. It is suitable for people counting on large-scale video data and can be integrated into a video surveillance software system.
[ description of the drawings ]
The invention is further described below with reference to embodiments and the accompanying drawings.
FIG. 1 is a flow chart of a video people tracking and counting method based on artificial intelligence of the present invention.
FIG. 2 is a schematic view of the same side type occlusion of the present invention.
FIG. 3 is a schematic diagram of distributed occlusion according to the present invention.
FIG. 4 is a schematic diagram of the convergent occlusion of the present invention.
FIG. 5 is a schematic view of the intermittent occlusion of the present invention.
FIG. 6 is a schematic diagram of a merged occlusion of the present invention.
FIG. 7 is a table of occlusion mode determination and compensation according to the present invention.
FIG. 8 is a flow chart of the trace matching algorithm of the present invention.
FIG. 9 is a block diagram of a convolutional neural network-based tracker of the present invention.
Fig. 10 is a network structure diagram of the sparse autoencoder of the present invention.
FIG. 11 is a sparse self-encoder training result visualization diagram of the present invention.
[ detailed description ]
FIG. 1 shows the artificial-intelligence-based video person tracking and counting method of the invention. The method calculates the person-count increment from the change of the inside/outside-of-door label of the same pedestrian in adjacent frames, matches the multiple pedestrians of adjacent frames with a tracker based on convolutional-neural-network feature extraction and online parameter updating, and detects common occlusion patterns to compensate the person-count increment. The method comprises the following steps:
step 1: initializing a video frame number n =1, and segmenting an nth frame video object to obtain a pedestrian connected domain set
P^(n) = {p_1^(n), ..., p_k^(n)}; calculate the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and set its longest untracked-match count λ_j^(n).
The calculation of the feature vector and the motion vector of a pedestrian is described below, with an illustrative code sketch after the definitions.
The feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
(centroid and area formula image in the source, computed from the binary silhouette f_j(x, y) over the bounding rectangle of p_j)
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
(binary-image formula image in the source; f_j(x, y) equals 1 on pixels belonging to p_j and 0 elsewhere)
The motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the inside/outside-of-door label of the pedestrian, with l_j = 0 denoting the inside of the door (in the building) and l_j = 1 denoting the outside of the door (outside the building); λ_j = λ(p_j) is the longest untracked-match count of the jth pedestrian.
Step 2: dividing the (n + 1) th frame video object
P^(n+1) = {p_1^(n+1), ..., p_k^(n+1)}; calculate v_j^(n+1) and m_j^(n+1), j = 1, ..., k.
Step 3: search P^(n) for the pedestrians that tracking-match those in P^(n+1):
for each pedestrian p_i^(n+1) in P^(n+1), i = 1, ..., k, find its tracking-matched pedestrian in P^(n).
If the matching is successful, calculate the people-count increment in: (1) if the pedestrian moved from inside the building to outside the building, the increment is -1; (2) if the pedestrian moved from outside the building to inside the building, the increment is 1; (3) if the pedestrian stayed inside the building, the increment is 0; and (4) if the pedestrian stayed outside the building, the increment is 0.
In either case, every successfully matched p_i has its longest untracked-match count λ_i cleared. If the matching is successful, it is also necessary to check whether p_i^(n+1) satisfies the judgment condition of merged occlusion; if it does, the detected increment in is compensated.
If the matching fails, it is necessary to judge whether p_i^(n+1) is a pedestrian that was occluded in the nth frame. If p_i^(n+1) satisfies the judgment condition of distributed occlusion, compensate in; otherwise, regard p_i^(n+1) as a pedestrian newly appearing in the monitored area and set λ_i = 0.
Step 4: check the pedestrians in P^(n) that were not successfully matched to P^(n+1), add them to P^(n+1), and increase their longest untracked-match count by 1. If such a pedestrian obtains a match in the (n + 2)th frame, it is judged that intermittent occlusion has occurred and in changes accordingly; otherwise the longest untracked-match count is increased by 1 again, and once it reaches the threshold the pedestrian is judged to have left the monitored area.
If the pedestrian satisfies the judgment condition of convergent occlusion, compensate in.
Step 5: remove pedestrians that have left the monitored area and falsely detected pedestrians. For each pedestrian in P^(n+1), check whether its longest untracked-match count exceeds the threshold: if it does, the pedestrian is considered to have left the monitored area and is discarded; otherwise the pedestrian is considered temporarily occluded and is retained. At the same time, check whether the pedestrian's area is out of range; if the area is not within the valid range, the detection is considered false and the pedestrian is discarded. Then update P^(n+1).
And 6: let n = n +1 and jump to step 2 until the analysis of the entire sequence of video images is completed.
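To make the bookkeeping of Steps 3 to 6 concrete, the following Python sketch (an illustration, not the patented implementation) shows how the people-count increment in and the untracked-match counts could be updated for one frame; the dictionary layout and the single compensation term standing in for the occlusion formulas of FIG. 7 are assumptions.

```python
# Minimal sketch of the per-frame counting logic of Steps 3-6 (assumptions noted above).
def count_increment(prev_label: int, curr_label: int) -> int:
    """l = 0 means inside the building, l = 1 means outside (Step 3)."""
    if prev_label == 0 and curr_label == 1:
        return -1            # moved from inside to outside the building
    if prev_label == 1 and curr_label == 0:
        return 1             # moved from outside to inside the building
    return 0                 # stayed inside, or stayed outside

def update_count(total: int, matches, unmatched_prev, compensation: int = 0) -> int:
    """matches: (prev, curr) pedestrian dicts with an 'l' label and a 'lam' untracked-match count;
    unmatched_prev: pedestrians from P(n) with no match, carried into P(n+1) as possibly occluded."""
    for prev, curr in matches:
        total += count_increment(prev["l"], curr["l"])
        curr["lam"] = 0                       # Step 3: reset on a successful match
    for ped in unmatched_prev:
        ped["lam"] += 1                       # Step 4: one more frame without a match
    return total + compensation               # occlusion compensation per FIG. 7

# Example: one pedestrian walks in, one previously seen pedestrian is momentarily unmatched.
total = update_count(10, [({"l": 1}, {"l": 0, "lam": 3})], [{"l": 0, "lam": 0}])
print(total)   # 11
```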
FIGS. 2, 3, 4, 5 and 6 are schematic diagrams of same-side occlusion, distributed occlusion, convergent occlusion, intermittent occlusion and merged occlusion, respectively. FIG. 7 lists the judgment conditions and the person-count error-compensation formulas for these five common occlusion patterns.
The anti-occlusion design is as follows: a pedestrian that appeared in the previous frame but does not appear in the current frame is by default regarded as occluded; it is added to the pedestrian set of the current frame, its occlusion is recorded with the longest untracked-match count, and it still takes part in the matching of the next frame's pedestrian set. If such a pedestrian is detected again within the next few frames, its longest untracked-match count is reset; otherwise it is considered not occluded but to have left the monitored area. Thus, if the longest untracked-match count λ_i of a pedestrian p_i exceeds a threshold λ_0, the pedestrian is considered to have left the monitored area (including walking into the building interior from the inner side of the door and walking away from the outer side of the door); if λ_i does not exceed λ_0 and is nonzero, the pedestrian is considered occluded in the nth frame. The relationship between λ_i and the state of p_i is: if λ_i = 0, p_i is inside the monitored area and successfully detected; if 0 < λ_i < λ_0, p_i is inside the monitored area but occluded; if λ_i ≥ λ_0, p_i has left the monitored area.
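The following Python sketch restates this three-way state rule; it is illustrative only, and the concrete value of the threshold λ_0 (set in Table 1) is not reproduced here.

```python
# Sketch of the lambda-based state rule described above (lambda_0 is an assumed parameter).
from enum import Enum

class PedestrianState(Enum):
    DETECTED = "inside the monitored area and successfully detected"
    OCCLUDED = "inside the monitored area but occluded"
    LEFT = "has left the monitored area"

def classify(lam: int, lam_0: int) -> PedestrianState:
    """Map the longest untracked-match count lambda_i to the pedestrian state."""
    if lam == 0:
        return PedestrianState.DETECTED
    if lam < lam_0:
        return PedestrianState.OCCLUDED
    return PedestrianState.LEFT

assert classify(0, 4) is PedestrianState.DETECTED
assert classify(2, 4) is PedestrianState.OCCLUDED
assert classify(4, 4) is PedestrianState.LEFT
```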
FIG. 8 is a flow chart of the tracking-matching algorithm used in Step 3 of the method of the invention. How the individual steps of the tracking-matching algorithm are implemented is detailed below:
step 31: initializing video frame number n =1, tracker T (W).
Step 32: handle
the centroid (x_i, y_i) of p_i^(1) by translating it to 16 new coordinate positions: with the centroid as the center, move it, in each of the 8 neighbourhood directions, to the pixel whose D_8 (chessboard) distance from the centroid equals d, where d = 5 and d = 10. Together with p_i^(1), a total of 17 samples of the ith class (all labeled i) are obtained, as illustrated by the sketch below.
(sample-coordinate table image in the source)
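A minimal Python sketch of this sample generation follows; it only computes the 17 centroid positions, and cropping the corresponding image patches is left out. The function and variable names are illustrative assumptions.

```python
# Sketch of the Step 32 sample generation: the centroid is shifted in the 8 neighbourhood
# directions to points at chessboard (D_8) distance d = 5 and d = 10, giving 16 shifted
# positions plus the original one, i.e. 17 samples per pedestrian class.
from itertools import product

DIRECTIONS = [(dx, dy) for dx, dy in product((-1, 0, 1), repeat=2) if (dx, dy) != (0, 0)]

def sample_centroids(x: float, y: float, distances=(5, 10)):
    """Return the 17 centroid positions (original first) used to build one class of samples."""
    points = [(x, y)]
    for d in distances:
        for dx, dy in DIRECTIONS:
            points.append((x + dx * d, y + dy * d))   # D_8 distance from (x, y) is exactly d
    return points

centroids = sample_centroids(120.0, 80.0)
assert len(centroids) == 17
```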
Step 33: form the sample set C^(1) from the obtained samples and train the tracker T(W), determining its parameters as W_1.
Step 34: detect the (n + 1)th frame to obtain P^(n+1), and set C^(n+1) = C^(n).
Step 35: input each p_j ∈ P^(n+1) into the tracker T(W_n) and obtain its output. Compare the maximum output value o_m with an upper threshold σ_1 and a lower threshold σ_2 (σ_1 ≥ σ_2); a sketch of this decision rule follows the three cases below:
(1) If o_m is less than the lower threshold σ_2, p_j is considered a pedestrian newly appearing in the (n + 1)th frame and the tracking match fails. Translate the centroid of p_j to the 16 coordinate positions defined by the 8 neighbourhood directions and the D_8 distances d = 5, 10; together with p_j this gives 17 samples in total, which are added to C^(n+1) as a new class of samples.
(2) If o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered highly matched.
(3) If o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered matched. Translate the centroid of p_j to the 16 coordinate positions as above; together with p_j a total of 17 samples is obtained and added to the sample set labeled m. If the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool first are removed.
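The sketch below expresses the Step 35 decision rule in Python; the concrete threshold values are set in Table 1 of the original and are replaced here by assumed numbers.

```python
# Sketch of the Step 35 matching decision with upper/lower thresholds sigma_1 >= sigma_2.
def match_decision(outputs, sigma_1: float = 0.8, sigma_2: float = 0.5):
    """outputs: classifier probabilities of p_j over the existing pedestrian classes.
    Returns ('new', None) to start a new class, or ('high'/'weak', m) for a match with class m."""
    m = max(range(len(outputs)), key=outputs.__getitem__)   # arg max over classes
    o_m = outputs[m]
    if o_m < sigma_2:
        return "new", None     # tracking match failed: newly appearing pedestrian
    if o_m > sigma_1:
        return "high", m       # p_m in P(n) and p_j in P(n+1) are highly matched
    return "weak", m           # matched, but fresh samples of class m should be collected

print(match_decision([0.10, 0.75, 0.15]))   # ('weak', 1) with the assumed thresholds
```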
Step 36: update the sample set, removing the samples of pedestrians that have left the monitored area or were falsely detected. The update covers 3 cases (a sketch of the per-class sample pool follows this list):
(1) For a newly appearing pedestrian, create a new pedestrian class.
(2) For a pedestrian whose appearance has changed, collect and add new samples. If, while adding samples, the number of samples exceeds the per-class sample-pool capacity V, the sample set is updated by a first-in-first-out rule and the samples that entered the pool earliest are replaced by the newly added ones. V = 34 was determined experimentally.
(3) For pedestrians that have left the monitored area or were falsely detected, remove the samples of the classes they belong to.
After updating, the new sample set C^(n+1) is obtained.
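A per-class sample pool with the first-in-first-out rule can be sketched in Python as follows; the deque-based layout and the string labels are illustrative assumptions, and only the capacity V = 34 comes from the text above.

```python
# Sketch of the Step 36 per-class FIFO sample pool with capacity V = 34.
from collections import defaultdict, deque

V = 34  # per-class sample-pool capacity determined experimentally

class SamplePool:
    def __init__(self, capacity: int = V):
        # deque(maxlen=...) evicts the oldest samples automatically (first in, first out)
        self.pools = defaultdict(lambda: deque(maxlen=capacity))

    def add(self, label, samples):
        self.pools[label].extend(samples)     # oldest entries are dropped once the pool is full

    def remove_class(self, label):
        self.pools.pop(label, None)           # pedestrian left the area or was a false detection

pool = SamplePool()
pool.add("pedestrian_3", [f"patch_{i}" for i in range(40)])
assert len(pool.pools["pedestrian_3"]) == V   # only the 34 most recent samples are kept
```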
Step 37: update the parameters of the tracker T(W). Train T(W) with C^(n+1) and determine the parameters as W_{n+1}; when training T(W), the initial value of the network parameters is W_n.
FIG. 9 shows the structure of the tracker used in the tracking-matching algorithm. The tracker T(W) mainly comprises a filter, a convolutional neural network, a discriminant classifier, and an online parameter-update module.
After moving-object segmentation of the nth frame image, a pedestrian set containing the moving targets is obtained; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network. The convolutional neural network feeds the extracted features into the discriminant classifier, which outputs the tracking-result vector giving the probability that each pedestrian in the current frame belongs to each class. If the tracking result indicates a newly appearing pedestrian, a pedestrian whose appearance features have changed, a pedestrian that has left the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained, and new network parameters are determined; pedestrian tracking of the (n + 1)th frame then begins.
The design and training methods of the filter, the convolutional neural network, the discriminant classifier, etc. are described below.
1. Filter
The filter is a set of features pre-trained by a sparse autoencoder and used as convolution kernels. It is obtained by training on a massive unsupervised auxiliary training set, so the feature set has good generality and completeness. Feature pre-training is an offline process, and the trained features are not updated while the target-tracking algorithm is executed. FIG. 10 shows the network structure of the sparse autoencoder. L_1 is the input layer; for a 10 × 10 image the input is x = [x_1, x_2, ..., x_100]. L_2 is the hidden layer, containing 100 hidden neurons. L_3 is the output layer, which outputs h_{W,b}(x). Let W_ij^(l) be the connection weight between the jth unit of layer l and the ith unit of layer l + 1, and b_i^(l) the bias term of the ith unit of layer l + 1. The parameters of the sparse autoencoder are (W, b) = (W^(1), b^(1), W^(2), b^(2)), where W^(l) (l = 1, 2) is the 100 × 100 matrix with elements W_ij^(l), and b^(l) (l = 1, 2) is the 100-dimensional vector with elements b_i^(l).
The training process of the sparse autoencoder is as follows: (1) set the gradients of the weights and bias terms to 0, and use random values drawn from the normal distribution N(0, 0.01²) as the initial values of the network parameters (W, b); (2) compute the partial derivatives: calculate and accumulate them with the back-propagation algorithm; (3) update the weight parameters; (4) repeat steps (1)-(3) until convergence.
One million pictures are randomly selected from the public Tiny Images Dataset, which contains a large number of pictures of real-life objects, pedestrians, backgrounds and the like, as auxiliary unsupervised training data, and the parameters (W, b) are calculated and determined from them. A minimal training sketch is given below.
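The following NumPy sketch shows, under simplifying assumptions, how such an autoencoder can be trained with the N(0, 0.01²) initialisation and back-propagation described above; the sparsity penalty, the learning rate and the random-patch stand-in for the Tiny Images data are all assumptions of the sketch.

```python
# Minimal sketch of the 100-100-100 sparse autoencoder described above (sparsity penalty omitted).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 100, 100                      # 10x10 patches, 100 hidden neurons

# (1) initialise parameters (W, b) from N(0, 0.01^2)
W1 = rng.normal(0.0, 0.01, (n_hid, n_in));  b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, 0.01, (n_in, n_hid));  b2 = np.zeros(n_in)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def step(x, lr=0.1):
    """One back-propagation step on a batch x of shape (batch, 100)."""
    global W1, b1, W2, b2
    a1 = sigmoid(x @ W1.T + b1)             # hidden activations (layer L_2)
    a2 = sigmoid(a1 @ W2.T + b2)            # reconstruction h_{W,b}(x) (layer L_3)
    # (2) partial derivatives of the squared reconstruction error via backprop
    d2 = (a2 - x) * a2 * (1 - a2)
    d1 = (d2 @ W2) * a1 * (1 - a1)
    # (3) gradient-descent update of the weights and biases
    W2 -= lr * d2.T @ a1 / len(x);  b2 -= lr * d2.mean(0)
    W1 -= lr * d1.T @ x / len(x);   b1 -= lr * d1.mean(0)
    return float(np.mean((a2 - x) ** 2))

patches = rng.random((256, n_in))           # stand-in for normalised 10x10 image patches
for epoch in range(5):                      # (4) repeat until convergence
    loss = step(patches)
print("reconstruction MSE:", loss)
```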
If the input g (a 100-dimensional vector) satisfies the norm constraint ||g||² = g_1² + g_2² + ... + g_100² ≤ 1, then the input that maximally excites the ith unit of the hidden layer has components g_j^(i) = W_ij^(1) / sqrt( Σ_{j=1}^{100} (W_ij^(1))² ), j = 1, ..., 100.
Setting each unit i (i = 1, 2, ..., 100) of the hidden layer in turn to its maximum excitation and computing the corresponding g^(i) yields 100 input images of size 10 × 10, as shown in FIG. 11. These 100 images can be regarded as the "bases" of the training sample set, and any given image sample can be approximately represented by a combination of these bases. Using these bases as the convolution kernels of the convolutional neural network allows the features of an input picture to be extracted effectively.
2. Convolutional neural network
The convolution kernels form a filter consisting of 100 pre-trained features of size 10 × 10. The filter extracts features of the input image. The stride of the filter is set to 5, and each filter convolves the input image to produce a feature map of size 9 × 21. Average pooling is then performed over each 3 × 3 region of the feature map, yielding a feature map of size 3 × 7. All 2100 nodes of the pooled feature maps are input into a neural network (the hidden layer) containing 350 nodes, which reduces the dimensionality while extracting higher-level features for the classifier to judge. A sketch of the resulting network is given below.
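The layer sizes above can be checked with the following PyTorch sketch; loading the sparse-autoencoder filters into the convolution weights, the sigmoid activations and the number of pedestrian classes are assumptions of the sketch rather than details given in the text.

```python
# Sketch of the tracker's network: a 50x110 pedestrian patch, 100 fixed 10x10 filters with
# stride 5 (9x21 feature maps), 3x3 average pooling (3x7), a 350-unit hidden layer, SoftMax output.
import torch
import torch.nn as nn

class TrackerNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.conv = nn.Conv2d(1, 100, kernel_size=10, stride=5)  # pre-trained filters (assumed loaded)
        self.conv.weight.requires_grad_(False)                   # filters are not updated online
        self.pool = nn.AvgPool2d(3)                              # average pooling over 3x3 regions
        self.hidden = nn.Linear(100 * 7 * 3, 350)                # 2100 -> 350 hidden units
        self.classifier = nn.Linear(350, n_classes)              # SoftMax discriminant classifier

    def forward(self, x):                                        # x: (batch, 1, 110, 50)
        f = self.pool(torch.sigmoid(self.conv(x)))
        h = torch.sigmoid(self.hidden(f.flatten(1)))
        return torch.softmax(self.classifier(h), dim=1)          # class probabilities

probs = TrackerNet(n_classes=8)(torch.rand(1, 1, 110, 50))
print(probs.shape)   # torch.Size([1, 8])
```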
3. Discriminant classifier
The mathematical model of the discriminant classifier is the SoftMax function. The minimum value of the cost function of the SoftMax regression algorithm can be solved by a gradient descent method, and a unique optimal solution is obtained.
4. Training of hidden layer and discriminant classifier cascade network
When the parameters need to be updated, the tracker is retrained. The filter parameters are not updated; only the parameters of the hidden layer and the discriminant classifier are. The network formed by cascading the hidden layer and the discriminant classifier is trained as a whole by gradient descent. The training algorithm is: (1) perform a feed-forward pass, computing the feature maps after convolution and pooling, the hidden-layer weighted sums, the activation vector, and the classification probability vector; (2) compute the residuals; (3) compute the partial derivatives; (4) update the parameters; (5) repeat steps (1)-(4) until convergence.
The relevant parameter settings of the method of the invention are shown in Table 1.
TABLE 1 Parameter settings
(table image in the source; values not reproduced)
The method of the invention was compared with the IVT (Incremental Visual Tracking), SCM (Sparse Collaborative Model) and MIL (Multiple Instance Learning) methods on a self-built data set of building-entrance surveillance videos; the performance is shown in Table 2. The results show that the tracking accuracy of the method is close to that of the other algorithms, its execution efficiency is slightly higher, and it is better than the other algorithms in robustness and counting accuracy.
TABLE 2 Performance comparison of people-count increment detection methods based on motion tracking
(table image in the source; values not reproduced)
The method of the invention designs occlusion-pattern detection and compensation for the common occlusion patterns and therefore has strong occlusion resistance. The convolution filter is trained offline in advance, and the simple convolutional-neural-network structure, hidden layer and regression layer only need to be retrained when the parameters are updated online, so the method can meet the requirement of long-duration, uninterrupted operation of video surveillance. The method is robust, runs in real time and achieves relatively high accuracy; it is suitable for people counting on large-scale video data and can be integrated into a video surveillance software system.
The above description is only an example of the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A video personnel tracking and counting method based on artificial intelligence is characterized in that: the method comprises the following steps:
step 1: initializing a video frame number n =1, and segmenting an nth frame video object to obtain a pedestrian connected domain set
P^(n) = {p_1^(n), ..., p_k^(n)}; calculating the feature vector v_j^(n) and the motion vector m_j^(n) of the jth pedestrian, and setting its longest untracked-match count λ_j^(n);
the feature vector and the motion vector of a pedestrian are calculated as follows:
the feature vector of the jth pedestrian is v_j = (x_j, y_j, S_j), where (x_j, y_j) is the centroid coordinate of p_j and S_j is the area of p_j:
(centroid and area formula image in the source, computed from the binary silhouette f_j(x, y) over the bounding rectangle of p_j)
where y_h is the height of the surveillance video image, N_j and M_j are the numbers of pixels of the circumscribed rectangle of p_j in the length and width directions, and f_j(x, y) is the binary image of p_j:
(binary-image formula image in the source; f_j(x, y) equals 1 on pixels belonging to p_j and 0 elsewhere)
the motion vector of the jth pedestrian is m_j = (l_j, λ_j), where l_j = l(p_j) is the inside/outside-of-door label of the pedestrian, with l_j = 0 denoting the inside of the door, i.e. in the building, and l_j = 1 denoting the outside of the door, i.e. outside the building; λ_j = λ(p_j) is the longest untracked-match count of the jth pedestrian;
Step 2: dividing the (n + 1) th frame video object
P^(n+1) = {p_1^(n+1), ..., p_k^(n+1)}; calculating v_j^(n+1) and m_j^(n+1), j = 1, ..., k;
step 3: searching P^(n) for the pedestrians that tracking-match those in P^(n+1):
for each pedestrian p_i^(n+1) in P^(n+1), i = 1, ..., k, finding its tracking-matched pedestrian in P^(n);
if the matching is successful, calculating the people-count increment in: (1) if the pedestrian moved from inside the building to outside the building, the increment is -1; (2) if the pedestrian moved from outside the building to inside the building, the increment is 1; (3) if the pedestrian stayed inside the building, the increment is 0; (4) if the pedestrian stayed outside the building, the increment is 0;
every successfully matched p_i has its longest untracked-match count λ_i reset;
if the matching is successful, it is also necessary to check whether p_i^(n+1) satisfies the judgment condition of merged occlusion; if it does, the detected increment in must be compensated;
if the matching fails, it is necessary to judge whether p_i^(n+1) is a pedestrian that was occluded in the nth frame; if p_i^(n+1) satisfies the judgment condition of distributed occlusion, compensating in; otherwise, regarding p_i^(n+1) as a pedestrian newly appearing in the monitored area and setting λ_i = 0;
step 4: checking the pedestrians in P^(n) that were not successfully matched to P^(n+1), adding them to P^(n+1), and increasing their longest untracked-match count by 1; if such a pedestrian obtains a match in the (n + 2)th frame, judging that intermittent occlusion has occurred and changing in accordingly; otherwise increasing the longest untracked-match count by 1 again, and once it reaches the threshold, judging that the pedestrian has left the monitored area;
if the pedestrian satisfies the judgment condition of convergent occlusion, compensating in;
and 5: rejecting pedestrians who have left the monitored area and misdetected pedestrians, for P (n+1) Checking whether the longest untracked matching frequency of each pedestrian exceeds a threshold value;
if the pedestrian is larger than the threshold value, the pedestrian is considered to leave the monitoring area and should be abandoned;
otherwise, the pedestrian is considered to be temporarily shielded and should be reserved;
meanwhile, whether the area of the pedestrian exceeds the range is checked, if the area of the pedestrian is not within the range, the pedestrian is considered to be detected wrongly and should be discarded;
updating P (n+1)
Step 6: letting n = n +1, and skipping to step 2 until the analysis of the whole video image sequence is completed;
the tracking matching in the step 3 specifically comprises the following steps:
step 31: initializing a video frame number n =1, tracker T (W);
step 32: handle
the centroid (x_i, y_i) of p_i^(1) by translating it to 16 new coordinate positions: with the centroid as the center, moving it, in each of the 8 neighbourhood directions, to the pixel whose D_8 (chessboard) distance from the centroid equals d, where d = 5 and d = 10; together with p_i^(1), 17 samples of the ith class, all labeled i, are obtained;
(sample-coordinate table image in the source)
step 33: forming the sample set C^(1) from the obtained samples and training the tracker T(W), determining its parameters as W_1;
step 34: detecting the (n + 1)th frame to obtain P^(n+1), and setting C^(n+1) = C^(n);
step 35: inputting each p_j ∈ P^(n+1) into the tracker T(W_n) and obtaining its output; comparing the maximum output value o_m with an upper threshold σ_1 and a lower threshold σ_2, where σ_1 ≥ σ_2:
(1) if o_m is less than the lower threshold σ_2, p_j is considered a pedestrian newly appearing in the (n + 1)th frame and the tracking match fails; translating the centroid of p_j to the 16 coordinate positions defined by the 8 neighbourhood directions and the D_8 distances d = 5, 10; together with p_j, 17 samples in total are obtained and added to C^(n+1) as a new class of samples;
(2) if o_m is greater than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered highly matched;
(3) if o_m is greater than the lower threshold σ_2 but less than the upper threshold σ_1, p_m ∈ P^(n) and p_j ∈ P^(n+1) are considered matched; translating the centroid of p_j to the 16 coordinate positions as above; together with p_j, 17 samples in total are obtained and added to the sample set labeled m; if the number of samples labeled m then exceeds the per-class sample-pool capacity V, the 17 samples labeled m that entered the pool first are removed;
step 36: updating the sample set and removing the samples of pedestrians that have left the monitored area or were falsely detected, the update covering 3 cases:
(1) for a newly appearing pedestrian, creating a new pedestrian class;
(2) for a pedestrian whose appearance features have changed, collecting and adding new samples; if, while adding samples, the number of samples exceeds the per-class sample-pool capacity V, updating the sample set by a first-in-first-out rule, i.e. replacing the samples that entered the pool earliest with the newly added ones, V = 34 being determined experimentally;
(3) for pedestrians that have left the monitored area or were falsely detected, removing the samples of the classes they belong to;
after updating, the new sample set C^(n+1) is obtained;
step 37: updating the parameters of the tracker T(W): training T(W) with C^(n+1) and determining the parameters as W_{n+1}; when training T(W), the initial value of the network parameters is W_n.
2. The artificial intelligence based video personnel tracking and counting method according to claim 1, wherein the tracker comprises: a filter, a convolutional neural network, a discriminant classifier, and online parameter updating;
after moving-object segmentation of the nth frame image, a pedestrian set containing the moving targets is obtained; each pedestrian's rectangular region is resized to 50 × 110 and input into the convolutional neural network;
the convolutional neural network feeds the extracted features into the discriminant classifier, which outputs the tracking-result vector and gives the probability that each pedestrian in the current frame belongs to each class;
if the tracking result indicates a newly appearing pedestrian, a pedestrian whose appearance features have changed, a pedestrian that has left the monitored area, or a false detection, the sample set is updated, the hidden layer and the classifier are retrained, and new network parameters are determined; pedestrian tracking of the (n + 1)th frame then begins.
3. The artificial intelligence based video personnel tracking and counting method according to claim 2, characterized in that: the filter in the tracker is a set of features pre-trained by a sparse autoencoder, obtained by training on a massive unsupervised auxiliary training set, and has good generality and completeness.
4. The artificial intelligence based video personnel tracking and counting method according to claim 3, wherein: the convolution kernels used by the convolutional neural network in the tracker form a filter composed of 100 pre-trained features of size 10 × 10.
5. The artificial intelligence based video personnel tracking and counting method according to claim 2, characterized in that: and a mathematical model of a discriminant classifier in the tracker adopts a SoftMax function.
CN201911200873.6A 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence Active CN111160101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911200873.6A CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN111160101A CN111160101A (en) 2020-05-15
CN111160101B (en) 2023-04-18

Family

ID=70556257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911200873.6A Active CN111160101B (en) 2019-11-29 2019-11-29 Video personnel tracking and counting method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111160101B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906590A (en) * 2021-03-02 2021-06-04 东北农业大学 FairMOT-based multi-target tracking pedestrian flow monitoring method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013189464A2 (en) * 2012-11-28 2013-12-27 中兴通讯股份有限公司 Pedestrian tracking and counting method and device for near-front top-view monitoring video
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN105224912A (en) * 2015-08-31 2016-01-06 电子科技大学 Based on the video pedestrian detection and tracking method of movable information and Track association
CN105989615A (en) * 2015-03-04 2016-10-05 江苏慧眼数据科技股份有限公司 Pedestrian tracking method based on multi-feature fusion
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 A kind of pedestrian target tracking based on deep learning


Also Published As

Publication number Publication date
CN111160101A (en) 2020-05-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant