CN116895047B - Rapid people flow monitoring method and system - Google Patents

Rapid people flow monitoring method and system Download PDF

Info

Publication number
CN116895047B
CN116895047B (application CN202310906960.3A)
Authority
CN
China
Prior art keywords
image
target
head
task
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310906960.3A
Other languages
Chinese (zh)
Other versions
CN116895047A (en)
Inventor
黄丰喜
关丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Panoramic Youtu Technology Co ltd
Original Assignee
Beijing Panoramic Youtu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Panoramic Youtu Technology Co ltd filed Critical Beijing Panoramic Youtu Technology Co ltd
Priority to CN202310906960.3A priority Critical patent/CN116895047B/en
Publication of CN116895047A publication Critical patent/CN116895047A/en
Application granted granted Critical
Publication of CN116895047B publication Critical patent/CN116895047B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a rapid people flow monitoring method, which comprises: determining a central area and an edge area of a target scene image; collecting an image sequence of the same target scene; calculating the color difference between the current-frame target scene image and the adjacent-frame target scene image in the image sequence to obtain a color difference image of the current frame; determining key frame images according to the information entropy of the color difference images; performing head recognition on the key frame images with a multi-task convolutional neural network to obtain head targets; tracking the head targets in the image sequence with a multi-scale spatial tracking algorithm; merging the recognized head targets with the tracked head targets and tracking the merged targets; and analyzing the merged targets together with the central area and the edge area to obtain the people flow information. By recognizing only in key frames while tracking and partitioning in all frames, the method improves the accuracy of recognition and tracking and better identifies occluded, rear-facing and small-sized head targets.

Description

Rapid people flow monitoring method and system
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a rapid people flow monitoring method and system.
Background
A traditional people flow monitoring method slides a window over the image, extracts HOG features inside the window, and trains an SVM classifier on those features to distinguish human heads from non-heads. This approach has the following problems: the amount of computation is large, and missed detections or duplicate detections occur easily when the crowd density is high and people occlude one another.
Another conventional people flow monitoring method uses a deep learning network to detect human heads. Such networks perform well in object detection in general, but their structures are complex, the amount of computation is large, and small targets are not detected well.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a rapid people flow monitoring method and system which overcome the missed-detection and duplicate-detection problems of the prior art.
In a first aspect, a rapid traffic monitoring method includes:
determining a central region and an edge region of the target scene image;
collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of target scene images;
calculating the color difference between the current frame target scene image and the adjacent frame target scene image in the image sequence to obtain a color difference image of the current frame;
determining a key frame image according to the information entropy of the color difference image;
performing head recognition on the key frame image by adopting a multitask convolutional neural network so as to obtain a head target;
tracking a human head target in an image sequence by adopting a multi-scale space tracking algorithm;
merging the head targets output by head recognition with the head targets being tracked to obtain merged targets, and tracking the merged targets in the image sequence;
and analyzing the combined target by combining the central area and the edge area to obtain the people flow information.
Further, the central region is located in the middle of the target scene image; the edge region is located at the periphery of the central region.
Further, determining the key frame images according to the information entropy of the color difference images specifically includes: marking the image frames whose entropy H is larger than a threshold T among the color difference images as key frame images.
Further, the head recognition includes:
an improved fast multi-task convolutional neural network, whose basic structure is the MTCNN network, is adopted to identify head targets;
the binary classification loss function is replaced with a multi-class cross-entropy loss function; the multi-class cross-entropy loss L_i is:
L_i = -Σ_c y_{i,c} · log(p_{i,c})
where p_{i,c} is the probability output by the network that sample i belongs to class c (face, non-face head, or non-head), and y_{i,c} ∈ {0,1} is the indicator function that takes the value 1 when sample i belongs to class c and 0 otherwise;
the bounding-box regression task objective function is:
L_i^box = ‖ ŷ_i^box − y_i^box ‖₂²
where ŷ_i^box is the regression target position output by the network and y_i^box is the annotated position, given by four coordinates: the upper-left corner, the height and the width;
the face landmark task objective function is:
L_i^landmark = ‖ ŷ_i^landmark − y_i^landmark ‖₂²
where ŷ_i^landmark are the face landmark coordinates output by the network and y_i^landmark are the annotated face landmark coordinates of the left eye, right eye, nose, left mouth corner and right mouth corner;
the overall task objective function of the network is:
min Σ_{i=1..N} Σ_j α_j · β_i^j · L_i^j
where i indexes samples, j indexes tasks (the bounding-box regression task and the face landmark task), N is the number of training samples, β_i^j ∈ {0,1} is the sample-type indicator, α_j denotes the importance of task j, and L_i^j is the objective function of task j;
the key frame images are input into a modified fast multitasking convolutional neural network to identify a human head target.
Further, the training set of the improved fast multitasking convolutional neural network is obtained by the following method:
acquiring a history monitoring video under a target scene;
extracting multi-frame images in the historical monitoring video to serve as training images;
selecting a head range in a training image, and marking a plurality of positioning points in the head range;
generating an image range with random positions and random sizes in the training image; the ratio of the overlapping area of the image range and the human head range to the human head range is smaller than 0.3;
and obtaining a training set according to the training image.
Further, the traffic information includes the number of people's head targets in the center area of the target scene image, and the directions of the individual people's head targets.
Further, when a head target enters the central area from the edge area, its direction is recorded as entering;
when a head target enters the edge area from the central area, its direction is recorded as leaving.
In a second aspect, a rapid traffic monitoring system includes:
the acquisition unit: for determining a center region and an edge region of the target scene image; collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of target scene images;
color difference calculation unit: the method comprises the steps of calculating color difference between a current frame target scene image and an adjacent frame target scene image in an image sequence to obtain a color difference image of a current frame; determining a key frame image according to the information entropy of the color difference image;
identification tracking unit: the method comprises the steps of performing head recognition on a key frame image by adopting a multitasking convolutional neural network so as to obtain a head target; tracking a human head target in an image sequence by adopting a multi-scale space tracking algorithm; tracking the combined target in the image sequence;
a target updating unit: used for merging the head targets output by head recognition with the head targets being tracked to obtain merged targets;
a data analysis unit: and the device is used for analyzing the combined target by combining the central area and the edge area so as to obtain the people flow information.
Further, the color difference calculating unit is specifically configured to: and marking the image frames with entropy value H larger than threshold value T in the color difference images as key frame images.
Further, the identification tracking unit is specifically configured to:
an improved fast multi-task convolutional neural network, whose basic structure is the MTCNN network, is adopted to identify head targets;
the binary classification loss function is replaced with a multi-class cross-entropy loss function; the multi-class cross-entropy loss L_i is:
L_i = -Σ_c y_{i,c} · log(p_{i,c})
where p_{i,c} is the probability output by the network that sample i belongs to class c (face, non-face head, or non-head), and y_{i,c} ∈ {0,1} is the indicator function that takes the value 1 when sample i belongs to class c and 0 otherwise;
the bounding-box regression task objective function is:
L_i^box = ‖ ŷ_i^box − y_i^box ‖₂²
where ŷ_i^box is the regression target position output by the network and y_i^box is the annotated position, given by four coordinates: the upper-left corner, the height and the width;
the face landmark task objective function is:
L_i^landmark = ‖ ŷ_i^landmark − y_i^landmark ‖₂²
where ŷ_i^landmark are the face landmark coordinates output by the network and y_i^landmark are the annotated face landmark coordinates of the left eye, right eye, nose, left mouth corner and right mouth corner;
the overall task objective function of the network is:
min Σ_{i=1..N} Σ_j α_j · β_i^j · L_i^j
where i indexes samples, j indexes tasks (the bounding-box regression task and the face landmark task), N is the number of training samples, β_i^j ∈ {0,1} is the sample-type indicator, α_j denotes the importance of task j, and L_i^j is the objective function of task j;
the key frame images are input into a modified fast multitasking convolutional neural network to identify a human head target.
Head target tracking is performed with the fast multi-scale spatial tracking algorithm DSST, and the position and direction of each head target are determined.
According to the technical scheme, the rapid people flow monitoring method and system provided by the invention collect target scene images, divide each target scene image into a central area and an edge area, and complete data analysis such as counting people and determining their direction of travel in the target scene through the position changes of head targets. By recognizing only in key frames while tracking and partitioning in all frames, the method improves the computational efficiency of the algorithm, improves the accuracy of recognition and tracking, better identifies occluded, rear-facing and small-sized head targets, and overcomes the missed-detection and duplicate-detection problems of the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
Fig. 1 is a flowchart of a rapid traffic monitoring method according to an embodiment.
Fig. 2 is a schematic diagram of a target scene image partition provided by an embodiment.
Detailed Description
Embodiments of the technical scheme of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and thus are merely examples, and are not intended to limit the scope of the present invention. It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
Examples:
a rapid traffic monitoring method, see fig. 1, comprising:
determining a central region and an edge region of the target scene image;
collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of target scene images;
calculating the color difference between the current frame target scene image and the adjacent frame target scene image in the image sequence to obtain a color difference image of the current frame;
determining a key frame image according to the information entropy of the color difference image;
performing head recognition on the key frame image by adopting a multitask convolutional neural network so as to obtain a head target;
tracking a human head target in an image sequence by adopting a multi-scale space tracking algorithm;
combining the head target which is recognized and output by the head and the head target which is being tracked to obtain a combined target, and tracking the combined target in an image sequence;
and analyzing the combined target by combining the central area and the edge area to obtain the people flow information.
In this embodiment, the method may use a short-focus wide-angle camera to collect images of the target scene; the camera resolution may be set to 1024×768 pixels with a frame rate of 24 frames per second, and the method generates an image sequence from the continuously collected target scene images.
In this embodiment, the method may partition the acquired target scene image. For example, referring to fig. 2, the target scene image is divided into a central region located in the middle of the image and an edge region located at its periphery; the central region may be one third of the size of the entire target scene image. The central and edge regions are mainly used to determine the number of persons in the monitored area (i.e. the central region) and the direction of each person: the number of head targets in the central region is the number of people in the monitored area, a person whose historical track moves from the central region to the edge region is judged to be leaving, and a person moving from the edge region to the central region is judged to be entering.
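As a minimal illustration of this partition-and-direction logic, the following Python sketch assumes a centered rectangular central region whose width and height are one third of the image (one possible reading of "one third of the size"); the function names and the region shape are hypothetical and not fixed by the patent.

```python
def in_central_region(cx, cy, img_w, img_h, ratio=1.0 / 3):
    """True if point (cx, cy) lies in the centered rectangle covering
    `ratio` of the image width and height (assumed region shape)."""
    w, h = img_w * ratio, img_h * ratio
    x0, y0 = (img_w - w) / 2, (img_h - h) / 2
    return x0 <= cx <= x0 + w and y0 <= cy <= y0 + h


def head_direction(prev_center, curr_center, img_w, img_h):
    """Judge 'enter' / 'leave' / None from the region change of a tracked head."""
    was_central = in_central_region(*prev_center, img_w, img_h)
    is_central = in_central_region(*curr_center, img_w, img_h)
    if not was_central and is_central:
        return "enter"   # edge region -> central region
    if was_central and not is_central:
        return "leave"   # central region -> edge region
    return None          # no region change
```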
In this embodiment, since the target scene is fixed, the background variation in the target scene images of the adjacent frames acquired for the same target scene is small, so the method obtains the color difference image of the current frame by calculating the color difference between the target scene image of the current frame and the target scene image of the adjacent frames in the image sequence, and removes the influence of the background, thereby obtaining the region needing to be tracked and analyzed with emphasis. And determining the key frame image according to the information entropy of the color difference image.
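A short sketch of this frame-differencing step follows; the patent does not fix the exact difference metric, so a per-pixel absolute difference collapsed to a single channel is assumed here.

```python
import cv2
import numpy as np

def color_difference_image(curr_frame, prev_frame):
    """Per-pixel absolute color difference between adjacent frames of the
    same scene, collapsed to a single channel (assumed formulation)."""
    diff = cv2.absdiff(curr_frame, prev_frame)                 # per-channel |I_t - I_{t-1}|
    return np.max(diff, axis=2) if diff.ndim == 3 else diff    # max over the B, G, R channels
```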
In this embodiment, the method adopts a multi-task convolutional neural network to perform head recognition on the key frame images to obtain head targets, and adopts a multi-scale spatial tracking algorithm to track the head targets in the image sequence. For example, target tracking can run on every frame while head recognition is performed only on key frame images, so that each tracking step starts from the most recent recognition result; this improves both the accuracy and the real-time performance of tracking and recognition. Target tracking mainly determines the position and direction of each head target. The method uses the fast discriminative scale space tracking algorithm (DSST) to track head targets. The color difference image, the target scene image and the head targets to be tracked are input into DSST. For the current frame, DSST extracts a region Z that is twice the size of the target in the previous frame, centered at the previous target position; a 28-dimensional feature (27-dimensional FHOG features plus a 1-dimensional raw gray value) is computed for each pixel in region Z, and the new target position in the current frame is the location where the correlation response y is maximal. Regions are then extracted at S = 33 scales centered on the new position, with sizes obtained by scaling P and R (the length and width of the target box in the previous frame) by a set of scale factors; each region is scaled to a fixed size, 31-dimensional FHOG features are extracted from each, the FHOG features of each sample are concatenated into a feature vector, and the scale whose response y is maximal is taken as the scale at the new position. The position of each head target in the current frame is then computed from the obtained new position and scale. By combining the improved MTCNN network model with the fast DSST tracker, the method avoids duplicate detections, improves the real-time performance and accuracy of head target tracking, and makes the people flow statistics more accurate.
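The scale search step can be illustrated with the sketch below. It only shows the DSST-style extraction of 33 scale samples around the newly estimated position; the correlation filtering and FHOG extraction are omitted, and the scale step `a` and template size are assumed values, so this is not a full tracker implementation.

```python
import cv2
import numpy as np

def extract_scale_samples(frame, center, base_w, base_h,
                          num_scales=33, a=1.02, template_size=(32, 32)):
    """Crop patches of size (a**n * base_w, a**n * base_h) around `center`
    for n = -(S-1)/2 .. (S-1)/2 and resize each to a fixed template size
    (the fixed-size samples would then feed the FHOG scale filter)."""
    cx, cy = int(center[0]), int(center[1])
    half = (num_scales - 1) // 2
    samples = []
    for n in range(-half, half + 1):
        w = max(1, int(base_w * a ** n))
        h = max(1, int(base_h * a ** n))
        x0, y0 = max(0, cx - w // 2), max(0, cy - h // 2)
        patch = frame[y0:y0 + h, x0:x0 + w]
        if patch.size == 0:                        # center outside the frame: use a blank patch
            patch = np.zeros((h, w) + frame.shape[2:], dtype=frame.dtype)
        samples.append(cv2.resize(patch, template_size))
    return samples
```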
In this embodiment, the method merges the head targets output by head recognition with the head targets currently being tracked to obtain merged targets, and tracks the merged targets in the image sequence. In other words, the head target list is updated by combining the results of head recognition and target tracking, and subsequent tracking is carried out on the updated head targets. Finally, whether a head target enters or leaves the central area is judged from the current position of the tracked head target, whether it lies in the central area, the number of head targets in the central area, its position in the previous frame, and so on, thereby obtaining the direction of each head target and the people flow information.
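One way to realise the merge step is sketched below. The patent does not specify the association criterion between freshly recognised heads and heads already being tracked, so IoU matching with a hypothetical threshold is assumed.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix0, iy0 = max(ax, bx), max(ay, by)
    ix1, iy1 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_targets(detected, tracked, iou_thresh=0.5):
    """Detections that match an existing track refresh it; unmatched detections
    start new head targets; unmatched tracks are kept (assumed policy)."""
    merged = list(tracked)
    for det in detected:
        matches = [i for i, trk in enumerate(merged) if iou(det, trk) >= iou_thresh]
        if matches:
            merged[matches[0]] = det      # update the matched track with the new detection
        else:
            merged.append(det)            # newly appeared head target
    return merged
```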
According to the method, target scene images are acquired, each image is divided into a central area and an edge area, and data analysis such as counting people and determining their direction of travel in the target scene is completed through the position changes of head targets. By recognizing only in key frames while tracking and partitioning in all frames, the method improves the computational efficiency of the algorithm, improves the accuracy of recognition and tracking, better identifies occluded, rear-facing and small-sized head targets, and overcomes the missed-detection and duplicate-detection problems of the prior art.
Further, in some embodiments, determining the key frame image according to the information entropy of the color difference image specifically includes: and marking the image frames with entropy value H larger than threshold value T in the color difference images as key frame images.
In this embodiment, the method marks key frame images among the obtained color difference images; for example, an image frame whose entropy H is larger than a threshold T is marked as a key frame image, where the threshold T can be set according to the actual situation. The information contained in a color difference image can be represented by its entropy H. The one-dimensional entropy of an image measures the information contained in the aggregate gray-level distribution of the image: letting p_i denote the proportion of pixels with gray value i in the image, the entropy H is
H = -Σ_i p_i · log(p_i).
Because only the image frames whose entropy H exceeds the threshold T are marked as key frame images, the method performs head recognition only on key frame images, and non-key frame images do not undergo head recognition, which greatly reduces the amount of computation.
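A sketch of the key-frame test using the entropy definition above; the logarithm base and the threshold T are implementation choices left open by the text.

```python
import numpy as np

def is_key_frame(diff_image, threshold_T):
    """Mark a color difference image as a key frame when its one-dimensional
    gray-level entropy H = -sum_i p_i * log2(p_i) exceeds the threshold T."""
    hist = np.bincount(diff_image.astype(np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                              # drop empty bins so the logarithm is defined
    entropy_H = -np.sum(p * np.log2(p))
    return entropy_H > threshold_T
```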
Further, in some embodiments, the human head identification comprises:
the improved fast multi-task convolutional neural network is adopted, and the basic structure of the fast multi-task convolutional neural network can be an MTCNN network. The MTCNN network is a multitasking convolutional neural network combining face detection and facial key point detection, and has good recognition effect and high speed. The MTCNN network adopts a cascaded CNN architecture, and the image processing process is distributed from coarse to fine; performance is improved by exploring relationships between various tasks; an online difficult sample mining (OHEM) strategy is used in the training process; joint learning of face alignment is added. The MTCNN Network is mainly composed of three sub-networks, namely a Propos Network (P-Net), a finer Network (R-Net), and an Output Network (O-Net). The method comprises the steps of performing scale transformation of an initial image by using an image pyramid at an input layer, generating a large number of candidate target area frames by using P-Net, performing first concentration and frame regression on the target area frames by using R-Net to exclude most of non-target areas, performing discrimination and area frame regression on the rest target area frames by using a more complex and high-precision network O-Net, and finally obtaining a detected target. However, since MTCNN network is initially intended for face detection, five key points (eyes, nose, mouth, etc.) of the face are detected with emphasis in addition to the outline of the head. Therefore, when the head of a person is shielded, faces away from the lens and faces with small size, misjudgment and missed judgment are easy to generate. And people flow monitoring only focuses on the number and direction of people in the scene, and does not focus on who the target is. The method therefore makes the following improvements to the MTCNN network:
in the original MTCNN network, the final classification task of the P-Net, R-Net and O-Net networks is two classifications, and the loss function is L i =-(y i log(p i )+(1-y i )(1-log(p i ) A) is set forth; where i represents the ith sample, p i Representing the probability that the sample is a face, y i E {0,1} represents a sign function, and is valued at 1 when the sign function is a face and 0 when the sign function is not a face. But the rapid traffic monitoring method of the present application is much more rapidThe task convolution neural network needs to detect three types of faces, heads and non-faces, so that three classification processing is needed, the condition of missing detection under the condition that the faces cannot be seen is avoided, and the method uses a multi-classification cross entropy loss function to replace the original two-classification loss function, wherein the multi-classification cross loss function is as follows:
the bounding box of the method returns the task objective functionThe method comprises the following steps: />Wherein->Is the regression target position of the network output; />For example, 4 positions are noted, including the upper left corner, height and width. Face anchor point task objective function>The method comprises the following steps: /> Is the coordinates of the locating points of the face output by the network, < + >>The coordinates of the locating points of the face to be marked comprise left eyes, right eyes, noses, left mouth corners and right mouth corners. The overall task objective function of the network is: />j represents tasks (classification task, bounding box regression task and face anchor point task), N is training sample number, ++>Sample type indicator, alpha j Representing the importance of a task->For objective functions of tasks, including multi-classification task L i Bounding box regression task->Face anchor point task->The multi-classification task of the method is three classification, and when the head or the non-face head is identified, the face positioning point task alpha is realized j When the human face is identified, all alpha are marked as the importance of the task of the human face locating point is lower j A smaller value will be assigned.
The method improves the MTCNN network, reduces the weight of the face key points in the recognition result, improves the recognition rate of dense, back-facing, shielded and small-size heads, and avoids the condition of missed detection.
Further, in some embodiments, the training set of the improved fast multitasking convolutional neural network is obtained by:
acquiring a history monitoring video under a target scene;
extracting multi-frame images in the historical monitoring video to serve as training images;
selecting a head range in a training image, and marking a plurality of positioning points in the head range;
generating an image range with random positions and random sizes in the training image; the ratio of the overlapping area of the image range and the human head range to the human head range is smaller than 0.3;
and obtaining a training set according to the training image.
In this embodiment, the method builds a training set prior to training the network. The specific construction method comprises the following steps: and extracting multi-frame images in a historical monitoring video of the target scene as training images, and selecting a head range in the training images, wherein the head range comprises a human face and a human head, and a plurality of positioning point coordinates (such as left eye, right eye, nose, left mouth angle, right mouth angle and the like) can be marked on the human face. The method may then further produce a negative sample, generating a randomly positioned and sized image range in the training image, the overlapping area of the image range and the head range having an area ratio of less than 0.3 relative to the entire head range.
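The negative-sample generation can be sketched as follows; note that the 0.3 criterion is the overlap area divided by the head box area (not IoU), and the random size range is an assumed choice.

```python
import random

def random_negative_box(img_w, img_h, head_box, max_ratio=0.3,
                        min_size=12, max_tries=100):
    """Draw a random square box whose overlap with the head box, relative to
    the head box area, stays below `max_ratio` (size range is assumed)."""
    hx, hy, hw, hh = head_box
    head_area = float(hw * hh)
    for _ in range(max_tries):
        size = random.randint(min_size, min(img_w, img_h) // 2)
        x = random.randint(0, img_w - size)
        y = random.randint(0, img_h - size)
        overlap_w = max(0, min(x + size, hx + hw) - max(x, hx))
        overlap_h = max(0, min(y + size, hy + hh) - max(y, hy))
        if overlap_w * overlap_h / head_area < max_ratio:
            return (x, y, size, size)
    return None   # no valid negative box found within max_tries
```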
In this embodiment, after the training set is obtained, the method starts training the network. First, face alignment is performed on the training images, then an image pyramid is produced by scaling each training image to different scales; scaling stops once the minimum side length of the scaled image is smaller than 12. The image pyramid is fed into the P-Net network in turn, and back propagation is used to update the network parameters; one pass of all training images through the network is called one epoch. After a number of epoch iterations, the best training result is selected as the initial network parameters for a new round of training, and finally the network with the best result is chosen as the final P-Net parameters. The outputs of P-Net on the training images serve as the input of the R-Net network; R-Net is likewise iterated several times and the best-performing result is kept as the final R-Net. After the further screening by R-Net, its output serves as the input of the O-Net network, and the network parameters achieving the best value of the objective function over the iterations are selected as the final O-Net parameters.
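The image-pyramid construction described above might look like the following sketch; the scale factor of 0.709 is the value commonly used with MTCNN and is an assumption here, as the text only fixes the stopping condition (minimum side length below 12).

```python
import cv2

def build_image_pyramid(image, scale_factor=0.709, min_side=12):
    """Repeatedly downscale the image; stop once the shorter side would drop below `min_side`."""
    pyramid = [image]
    h, w = image.shape[:2]
    while True:
        h, w = int(h * scale_factor), int(w * scale_factor)
        if min(h, w) < min_side:
            break                                   # stopping condition from the text
        pyramid.append(cv2.resize(pyramid[-1], (w, h)))
    return pyramid
```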
In the embodiment, after the network training is completed, the method acquires the image sequence of the target scene in real time, inputs the image sequence into the trained network, recognizes and tracks the head target, and can output the current number of people in the central area and the number of people entering and leaving the central area in real time by combining the central area and the edge area.
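Putting the earlier sketches together, the real-time loop of this embodiment could be organised as below. `detect_heads` and `dsst_track_step` stand for the trained MTCNN-style detector and the DSST tracking step and are hypothetical interfaces; track identity handling is deliberately simplified.

```python
def monitor_people_flow(frames, detect_heads, dsst_track_step, threshold_T, img_w, img_h):
    """Per-frame loop: difference -> key-frame test -> recognition (key frames only)
    -> tracking -> merge -> count heads and judge directions in the central region."""
    prev_frame, tracks, prev_centers = None, [], {}
    for frame in frames:
        if prev_frame is not None:
            diff = color_difference_image(frame, prev_frame)
            tracks = dsst_track_step(frame, diff, tracks)      # tracking runs on every frame
            if is_key_frame(diff, threshold_T):                # recognition only on key frames
                tracks = merge_targets(detect_heads(frame), tracks)
        inside = 0
        for tid, (x, y, w, h) in enumerate(tracks):            # NOTE: index-based identity, simplified
            center = (x + w / 2.0, y + h / 2.0)
            if in_central_region(*center, img_w, img_h):
                inside += 1
            if tid in prev_centers:
                direction = head_direction(prev_centers[tid], center, img_w, img_h)
                if direction:
                    print(f"head target {tid} direction: {direction}")
            prev_centers[tid] = center
        print(f"people currently in the central area: {inside}")
        prev_frame = frame
```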
A rapid traffic monitoring system comprising:
the acquisition unit: for determining a center region and an edge region of the target scene image; collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of target scene images;
color difference calculation unit: the method comprises the steps of calculating color difference between a current frame target scene image and an adjacent frame target scene image in an image sequence to obtain a color difference image of a current frame; and determining the key frame image according to the information entropy of the color difference image.
Identification tracking unit: the method comprises the steps of performing head recognition on a key frame image by adopting a multitasking convolutional neural network so as to obtain a head target; tracking a human head target in an image sequence by adopting a multi-scale space tracking algorithm; tracking the combined target in the image sequence;
a target updating unit: the head recognition system is used for combining the head target which is recognized and output by the head and the head target which is being tracked to obtain a combined target;
a data analysis unit: and the device is used for analyzing the combined target by combining the central area and the edge area so as to obtain the people flow information.
Further, in some embodiments, the color difference calculating unit is specifically configured to: and marking the image frames with entropy value H larger than threshold value T in the color difference images as key frame images.
Further, in some embodiments, the identification tracking unit is specifically configured to:
an improved fast multi-task convolutional neural network, whose basic structure is the MTCNN network, is adopted;
the binary classification loss function is replaced with a multi-class cross-entropy loss function; the multi-class cross-entropy loss L_i is:
L_i = -Σ_c y_{i,c} · log(p_{i,c})
where p_{i,c} is the probability output by the network that sample i belongs to class c (face, non-face head, or non-head), and y_{i,c} ∈ {0,1} is the indicator function that takes the value 1 when sample i belongs to class c and 0 otherwise;
the bounding-box regression task objective function is:
L_i^box = ‖ ŷ_i^box − y_i^box ‖₂²
where ŷ_i^box is the regression target position output by the network and y_i^box is the annotated position, given by four coordinates: the upper-left corner, the height and the width;
the face landmark task objective function is:
L_i^landmark = ‖ ŷ_i^landmark − y_i^landmark ‖₂²
where ŷ_i^landmark are the face landmark coordinates output by the network and y_i^landmark are the annotated face landmark coordinates of the left eye, right eye, nose, left mouth corner and right mouth corner;
the overall task objective function of the network is:
min Σ_{i=1..N} Σ_j α_j · β_i^j · L_i^j
where i indexes samples, j indexes tasks (the bounding-box regression task and the face landmark task), N is the number of training samples, β_i^j ∈ {0,1} is the sample-type indicator, α_j denotes the importance of task j, and L_i^j is the objective function of task j;
the key frame images are input into a modified fast multitasking convolutional neural network to identify a human head target.
Head target tracking is performed with the fast multi-scale spatial tracking algorithm DSST, and the position and direction of each target are determined.
For a brief description of the system provided by the embodiments of the present invention, reference may be made to the corresponding content in the foregoing embodiments where the description of the embodiments is not mentioned.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims (8)

1. A rapid traffic monitoring method, comprising:
determining a central region and an edge region of the target scene image;
collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of the target scene images;
calculating the color difference between the current frame target scene image and the adjacent frame target scene image in the image sequence to obtain a color difference image of the current frame;
determining a key frame image according to the information entropy of the color difference image;
performing head recognition on the key frame image by adopting a multitask convolutional neural network so as to obtain a head target; the head recognition comprises:
adopting an improved fast multi-task convolutional neural network, whose basic structure is the MTCNN network, to identify the head target;
replacing the binary classification loss function with a multi-class cross-entropy loss function, the multi-class cross-entropy loss L_i being:
L_i = -Σ_c y_{i,c} · log(p_{i,c})
wherein p_{i,c} is the probability output by the network that sample i belongs to class c (face, non-face head, or non-head), and y_{i,c} ∈ {0,1} is the indicator function taking the value 1 when sample i belongs to class c and 0 otherwise;
the bounding-box regression task objective function is:
L_i^box = ‖ ŷ_i^box − y_i^box ‖₂²
wherein ŷ_i^box is the regression target position output by the network and y_i^box is the annotated position, given by four coordinates comprising the upper-left corner, the height and the width;
the face landmark task objective function is:
L_i^landmark = ‖ ŷ_i^landmark − y_i^landmark ‖₂²
wherein ŷ_i^landmark are the face landmark coordinates output by the network and y_i^landmark are the annotated face landmark coordinates comprising the left eye, right eye, nose, left mouth corner and right mouth corner;
the overall task objective function of the network is:
min Σ_{i=1..N} Σ_j α_j · β_i^j · L_i^j
wherein i denotes a sample, j denotes a task, the tasks comprise the bounding-box regression task and the face landmark task, N is the number of training samples, β_i^j ∈ {0,1} is the sample-type indicator, α_j denotes the importance of task j, and L_i^j is the objective function of each task; when a head or a non-face head is identified, the face landmark task weight α_j is assigned 0; when a face is identified, α_j is assigned a smaller value;
inputting the key frame image into an improved fast multitasking convolutional neural network to identify the human head target;
tracking the head target in the image sequence by adopting a multi-scale space tracking algorithm;
combining the head target which is recognized and output by the head and the head target which is being tracked to obtain a combined target, and tracking the combined target in the image sequence;
and analyzing the merging targets by combining the central area and the edge area to obtain the people flow information.
2. The rapid traffic monitoring method of claim 1, wherein the central region is located in the middle of the target scene image; the edge region is located at the periphery of the central region.
3. The rapid traffic monitoring method according to claim 1, wherein the determining the key frame image according to the information entropy of the color difference image specifically comprises: and marking the image frames with entropy value H larger than threshold value T in the color difference images as the key frame images.
4. The rapid traffic monitoring method according to claim 1, wherein the training set of the improved rapid multitasking convolutional neural network is obtained by:
acquiring a history monitoring video under the target scene;
extracting multi-frame images in the history monitoring video to serve as training images;
selecting a head range from the training image, and marking a plurality of positioning points in the head range;
generating an image range with random positions and random sizes in the training image; the ratio of the overlapping area of the image range and the human head range to the human head range is smaller than 0.3;
and obtaining the training set according to the training image.
5. The rapid traffic monitoring method according to claim 1, wherein the traffic information includes the number of people's head objects in the central area of the object scene image, and the destination of each of the people's head objects.
6. The rapid traffic monitoring method according to claim 5, wherein,
when the head target enters the central area from the edge area, the direction of the head target is entering;
and when the head target enters the edge area from the central area, the direction of the head target is leaving.
7. A rapid traffic monitoring system, comprising:
the acquisition unit: for determining a center region and an edge region of the target scene image; collecting an image sequence under the same target scene; the image sequence comprises a plurality of frames of the target scene images;
color difference calculation unit: the method comprises the steps of calculating color difference between a current frame target scene image and an adjacent frame target scene image in an image sequence to obtain a color difference image of a current frame; determining a key frame image according to the information entropy of the color difference image;
identification tracking unit: the method comprises the steps of performing head recognition on the key frame image by adopting a multitasking convolutional neural network so as to obtain a head target; tracking the head target in the image sequence by adopting a multi-scale space tracking algorithm; tracking a combined target in the image sequence;
a target updating unit: the head recognition system is used for combining the head target which is recognized and output by the head and the head target which is being tracked to obtain a combined target;
a data analysis unit: the method comprises the steps of analyzing the merging targets by combining the central area and the edge area to obtain people flow information;
the identification tracking unit is specifically configured to:
adopting an improved fast multi-task convolutional neural network, whose basic structure is the MTCNN network, to identify the head target;
replacing the binary classification loss function with a multi-class cross-entropy loss function, the multi-class cross-entropy loss L_i being:
L_i = -Σ_c y_{i,c} · log(p_{i,c})
wherein p_{i,c} is the probability output by the network that sample i belongs to class c (face, non-face head, or non-head), and y_{i,c} ∈ {0,1} is the indicator function taking the value 1 when sample i belongs to class c and 0 otherwise;
the bounding-box regression task objective function is:
L_i^box = ‖ ŷ_i^box − y_i^box ‖₂²
wherein ŷ_i^box is the regression target position output by the network and y_i^box is the annotated position, given by four coordinates comprising the upper-left corner, the height and the width;
the face landmark task objective function is:
L_i^landmark = ‖ ŷ_i^landmark − y_i^landmark ‖₂²
wherein ŷ_i^landmark are the face landmark coordinates output by the network and y_i^landmark are the annotated face landmark coordinates comprising the left eye, right eye, nose, left mouth corner and right mouth corner;
the overall task objective function of the network is:
min Σ_{i=1..N} Σ_j α_j · β_i^j · L_i^j
wherein i denotes a sample, j denotes a task, the tasks comprise the bounding-box regression task and the face landmark task, N is the number of training samples, β_i^j ∈ {0,1} is the sample-type indicator, α_j denotes the importance of task j, and L_i^j is the objective function of each task; when a head or a non-face head is identified, the face landmark task weight α_j is assigned 0; when a face is identified, α_j is assigned a smaller value;
inputting the key frame image into an improved fast multitasking convolutional neural network to identify the human head target;
and tracking the human head target by adopting a rapid multi-scale space tracking algorithm dsst, and judging the position and the direction of the human head target.
8. The rapid traffic monitoring system of claim 7, wherein the color difference calculation unit is specifically configured to:
and marking the image frames with entropy value H larger than threshold value T in the color difference images as the key frame images.
CN202310906960.3A 2023-07-24 2023-07-24 Rapid people flow monitoring method and system Active CN116895047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310906960.3A CN116895047B (en) 2023-07-24 2023-07-24 Rapid people flow monitoring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310906960.3A CN116895047B (en) 2023-07-24 2023-07-24 Rapid people flow monitoring method and system

Publications (2)

Publication Number Publication Date
CN116895047A CN116895047A (en) 2023-10-17
CN116895047B true CN116895047B (en) 2024-01-30

Family

ID=88314704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310906960.3A Active CN116895047B (en) 2023-07-24 2023-07-24 Rapid people flow monitoring method and system

Country Status (1)

Country Link
CN (1) CN116895047B (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250850B (en) * 2016-07-29 2020-02-21 深圳市优必选科技有限公司 Face detection tracking method and device, and robot head rotation control method and system
CN110263774B (en) * 2019-08-19 2019-11-22 珠海亿智电子科技有限公司 A kind of method for detecting human face
US11647294B2 (en) * 2021-05-25 2023-05-09 Shanghai Bilibili Technology Co., Ltd. Panoramic video data process

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477641A (en) * 2009-01-07 2009-07-08 北京中星微电子有限公司 Demographic method and system based on video monitoring
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
WO2016011433A2 (en) * 2014-07-17 2016-01-21 Origin Wireless, Inc. Wireless positioning systems
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system
CN110236511A (en) * 2019-05-30 2019-09-17 云南东巴文健康管理有限公司 A kind of noninvasive method for measuring heart rate based on video
WO2021068323A1 (en) * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
CN111160243A (en) * 2019-12-27 2020-05-15 深圳云天励飞技术有限公司 Passenger flow volume statistical method and related product
CN111209845A (en) * 2020-01-03 2020-05-29 平安科技(深圳)有限公司 Face recognition method and device, computer equipment and storage medium
CN111832413A (en) * 2020-06-09 2020-10-27 天津大学 People flow density map estimation, positioning and tracking method based on space-time multi-scale network
WO2021151296A1 (en) * 2020-07-22 2021-08-05 平安科技(深圳)有限公司 Multi-task classification method and apparatus, computer device, and storage medium
CN112991399A (en) * 2021-03-23 2021-06-18 上海工程技术大学 Bus passenger number detection system based on RFS
CN114708554A (en) * 2022-04-12 2022-07-05 南京邮电大学 Intelligent library people flow monitoring method and device based on face detection
CN114926422A (en) * 2022-05-11 2022-08-19 西南交通大学 Method and system for detecting boarding and alighting passenger flow

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
面向视频监控场景的人脸检测与跟踪方法研究 (Research on face detection and tracking methods for video surveillance scenes); 严秋实; 《长江信息通信》 (Changjiang Information & Communication); Vol. 34, No. 9; pp. 52-55 *

Also Published As

Publication number Publication date
CN116895047A (en) 2023-10-17

Similar Documents

Publication Publication Date Title
Yang et al. Class-agnostic few-shot object counting
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
US5442716A (en) Method and apparatus for adaptive learning type general purpose image measurement and recognition
CN111932583A (en) Space-time information integrated intelligent tracking method based on complex background
CN111985348B (en) Face recognition method and system
CN113297956B (en) Gesture recognition method and system based on vision
CN109993061A (en) A kind of human face detection and tracing method, system and terminal device
Hebbale et al. Real time COVID-19 facemask detection using deep learning
CN107590427A (en) Monitor video accident detection method based on space-time interest points noise reduction
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
Chhadikar et al. Image processing based tracking and counting vehicles
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
Sakthimohan et al. Detection and Recognition of Face Using Deep Learning
CN116895047B (en) Rapid people flow monitoring method and system
Alsaedi et al. Design and Simulation of Smart Parking System Using Image Segmentation and CNN
CN112597842B (en) Motion detection facial paralysis degree evaluation system based on artificial intelligence
Afrin et al. AI based facial expression recognition for autism children
Sonthi et al. A Deep learning Technique for Smart Gender Classification System
Rao et al. Convolutional Neural Network Model for Traffic Sign Recognition
CN111832475A (en) Face false detection screening method based on semantic features
Raja et al. A novel deep learning based approach for object detection using mask R-CNN in moving images
Sinha et al. Ensemble based feature extraction and deep learning classification model with depth vision
Babu et al. An Automatic Student Attendance Monitoring System Using an Integrated HAAR Cascade with CNN for Face Recognition with Mask.
Ahuja et al. Object Detection and Classification for Autonomous Drones

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant