CN117037035A - Student data intelligent acquisition method and device based on human eyes - Google Patents


Info

Publication number: CN117037035A
Application number: CN202311030823.4A
Authority: CN (China)
Prior art keywords: visitor, track, personnel, DCMC, human body
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 刘海, 张昭理, 吴晨, 吴砥, 代书铭, 郭惠敏
Current assignee: Central China Normal University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Central China Normal University
Application filed by Central China Normal University
Priority to CN202311030823.4A
Publication of CN117037035A

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V 10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/30232: Surveillance
    • G06T 2207/30241: Trajectory
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a student data intelligent acquisition method and device based on human eyes, wherein the method comprises the following steps: acquiring video streams of visitors collected by cameras deployed in different exhibition areas of an exhibition hall, and generating a detection frame for each visitor through image detection to form a detection frame set to be queried; extracting appearance features, human body posture features and a virtual track for each visitor from the video stream using a trained feature extraction model; matching the detection frames with the visitors to be identified based on the appearance features, the human body posture features and the virtual tracks, and generating personnel track matching information; and generating activity data of the visitors, comprising identity information and activity track data, according to the personnel track matching information and a pre-established exhibition hall camera link topology. The application can effectively identify visitors in the exhibition hall and generate their movement tracks, and can overcome problems such as illumination change and occlusion.

Description

Student data intelligent acquisition method and device based on human eyes
Technical Field
The application relates to the technical field of personnel re-identification, in particular to an intelligent student data acquisition method and device based on human eyes.
Background
Data acquisition and identification of visitors in a science and technology museum are important components of exhibition area management. The traditional manual supervision and management mode can hardly meet the demands of modern science and technology museum management. To better grasp and manage the visiting activity information of personnel, a video acquisition system and a personnel re-identification system need to be set up in the exhibition areas to realize accurate personnel positioning and data acquisition.
However, existing video acquisition systems installed in the exhibition areas of science and technology museums often lack functions such as automatic detection and personnel re-identification. Light changes caused by the exhibited content introduce noise into the video stream, and the images lose detail and definition, which degrades the results of personnel detection and personnel re-identification in the exhibition areas. In addition, crowd aggregation in the exhibition areas often causes occlusion by non-target pedestrians or by non-pedestrian objects, reducing the efficiency and accuracy of personnel re-identification and creating potential adverse consequences for traffic management of the museum and visit management of the exhibition areas.
Disclosure of Invention
Aiming at at least one defect or improvement requirement of the prior art, the invention provides an intelligent student data acquisition method and device based on human eyes, which resist illumination change and occlusion, accurately identify personnel, and analyze the activity paths of visitors across different exhibition areas of a science and technology museum.
To achieve the above object, according to a first aspect of the present invention, there is provided an intelligent student data collection method based on human eyes, comprising:
acquiring video streams of visitors collected by cameras deployed in different exhibition areas in an exhibition hall, and generating detection frames of each visitor through image detection to form a detection frame set to be queried;
extracting appearance features, human body posture features and virtual tracks of each visitor from the video stream by using a trained feature extraction model;
matching the detection frame with the visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track, and generating personnel track matching information;
and generating activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor.
Further, in the student data intelligent acquisition method, the extraction process of the appearance characteristics of the visitor comprises the following steps:
personnel global feature extraction: preliminary feature extraction is performed on the video stream images of the visitors to generate feature vectors; after the feature vectors are converted into a one-dimensional feature sequence, they are processed by a self-attention mechanism, and the final global feature vector is generated after a fully connected layer;
personnel local feature extraction: a learnable set of local prototypes, acting as a local classifier, is introduced; pixels of the global feature vector are assigned to the i-th local part, foreground local features are extracted from the global feature vector with a cross-attention mechanism, and the final local feature vector is obtained after a fully connected layer;
and the global feature vector and the local feature vectors are concatenated to generate the appearance features of the visitor.
Further, in the student data intelligent acquisition method, the extraction process of the human body posture characteristics of the visitor comprises the following steps:
based on the detection frame of each visitor, the human body posture key points of the corresponding person are predicted, generating the human body posture features K = {(x_i, y_i, s_i)}, where (x_i, y_i) denotes the position of the i-th human body key point and s_i denotes its confidence.
Further, in the student data intelligent acquisition method, the virtual track acquisition process of the visitor is as follows:
for each detection frame of a visitor, a Kalman filtering algorithm is adopted to generate a virtual track and its predicted state vector value, specifically:
(1) Virtual track generation: based on the Kalman filtering algorithm, the posterior state prediction value τ, the posterior covariance matrix P, the state transition matrix F, the observation matrix O and the noise matrix N between the two modalities are generated; in each frame t, the last observed value of the target is denoted o_last and the re-triggered associated observation is denoted o_re, and the virtual track is expressed as the sequence of predicted states connecting o_last and o_re;
(2) Virtual track iteration: the posterior state prediction value τ is iterated along the virtual track through prediction and re-update operations of the Kalman filter,
until the observations on the virtual track match the state vector value calibrated by the latest real observation.
Further, in the student data intelligent acquisition method, matching the detection frame with the visitor to be identified based on the appearance features, the human body posture features and the virtual track, and generating the personnel track matching information, comprises the following steps:
Calculating the distance between the feature vector extracted based on the prediction state vector value and the appearance feature, and generating the similarity of the appearance feature of the person;
the positions and the confidence degrees of the key points of the human body in the human body posture characteristics are used as vectors and state vectors of a Kalman filtering algorithm, and the similarity of key information of the human body posture is calculated;
constructing a distance matrix from the personnel appearance feature similarity and the human body posture key information similarity over the virtual track set and the detection frame set to be queried, and matching the detection frames in the current frame with each virtual track existing in the previous frame according to the distance matrix; if the matching succeeds, the virtual track and the detection frame have the same target identity, and the personnel track matching information corresponding to the visitor in the detection frame is generated.
Further, in the student data intelligent acquisition method, a network-flow-based Ford-Fulkerson algorithm is adopted to match the virtual tracks with the detection frames, calculated as: G_{t,t-1} = FFA(D_{t,t-1});
where D_{t,t-1} denotes the distance matrix, and G_{t,t-1} is the best match between the virtual track set U_{t-1} and the detection frame set to be queried Ω_t; if a solved entry of G_{t,t-1} equals 1, the corresponding virtual track in the set and the corresponding detection frame have the same target identity.
Further, according to the student data intelligent acquisition method, the establishment method of the exhibition hall camera link topology comprises the following steps:
(1) A link topology structure between different cameras is established, expressed as G_DCMC = (V_DCMC, E_DCMC), where V_DCMC denotes the cameras, V_DCMC = {d_i | 1 ≤ i ≤ N_DCMC}, and E_DCMC denotes the transfer distributions between different cameras, E_DCMC = {p_{i,j}(Δt) | 1 ≤ i ≤ N_DCMC, 1 ≤ j ≤ N_DCMC, i ≠ j};
where N_DCMC is the total number of cameras in the link topology, d_i denotes the i-th camera, d_j denotes the j-th camera, and p_{i,j}(Δt) denotes the transfer distribution between d_i and d_j;
(2) A topology structure between different exhibition areas is established, expressed as G_EA = (V_EA, E_EA), with V_EA = {d_i(k) | 1 ≤ i ≤ N_DCMC, 1 ≤ k ≤ A_i} and E_EA = {p_{i(k),j(k)}(Δt)};
where A_i denotes the number of exhibition areas covered by the i-th camera, d_i(k) denotes the k-th exhibition area covered by the i-th camera, and p_{i(k),j(k)}(Δt) is the transfer distribution between d_i(k) and d_j(k);
(3) Performing iterative optimization of the topological structure, wherein the iterative optimization comprises two steps of updating a time window and repositioning vanishing personnel;
updating the time window T: for a transfer distribution p(Δt) ~ N(μ, σ²) between any two exhibition areas, the lower time limit T_min and the upper time limit T_max of p(Δt) are adjusted using the prior topology, with μ derived from the fitted distribution; with the calculated time limits, the time window T is updated as a function of the Gaussian fitting error rate α(p(Δt)): the larger the fitting error, the larger the updated window;
repositioning vanished personnel: according to the topology, a visitor who disappears at time t in an exit area is expected to appear around time (t+T) in the entrance area of another camera; the person correspondences whose time-slot centers are close to (t+T) are searched in the topology structure, and when the appearance feature similarity and/or the human body posture key information similarity of a correspondence is greater than a preset value, it is taken as a reliable correspondence and the transfer distribution is updated;
the above steps are repeated until the transfer distributions converge; the process is executed between all exhibition areas in the link topology until the topology no longer changes, or its change over several iterations stays within a preset range, at which point the iteration stops.
Further, in the student data intelligent acquisition method, the generating the activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology includes:
path association between different cameras is executed according to the personnel track matching information; the paths to be associated are defined as a path set, and the virtual track and identity ID of the target visitor are put into a search pool, after which the following procedure is executed:
a candidate path library is generated according to the virtual track of the visitor and the prior exhibition hall camera link topology;
the path with the highest similarity is selected from the candidate path library and associated with the paths in the search pool, generating the activity path information of the visitor, which comprises a timestamp, an ID and a visiting area number.
According to a second aspect of the present invention, there is also provided a student data intelligent acquisition device based on human eyes, comprising:
the data acquisition unit comprises a plurality of cameras which are respectively arranged in different exhibition areas in the exhibition hall, and the cameras are used for acquiring video streams of visitors;
the target detection unit is configured to detect images of the video streams of the visitors acquired by the cameras, generate detection frames of each visitor, and form a detection frame set to be queried;
a feature extraction unit configured to extract appearance features, human body posture features, and virtual trajectories of each visitor from the video stream using a trained feature extraction model;
the target matching unit is configured to match the detection frame with a visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track, and generate personnel track matching information;
And the activity path generation unit is configured to generate activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor.
Further, the data acquisition unit of the student data intelligent acquisition device also comprises a control module and a light intensity sensing module;
the control module is used for controlling the operation of the camera and adaptively adjusting the working parameters of the camera according to the environmental parameters of the exhibition area;
the light intensity sensing module is used for collecting illumination intensity in the exhibition area and switching the camera to work in a visible light mode collection mode or an infrared collection mode according to the illumination intensity.
Further, the student data intelligent acquisition device further comprises a user interface unit for visually displaying the visitor activity path data, and providing inquiry operation buttons and setting options.
According to a third aspect of the present invention there is also provided a computer device comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the student data intelligent acquisition method of any one of the above.
In general, compared with the prior art, the above technical solutions conceived by the present application achieve the following beneficial effects:
The student data intelligent acquisition method and device based on human eyes can effectively acquire data from, identify, and generate activity tracks for visitors in an exhibition hall, and can overcome problems such as illumination change and occlusion. This is beneficial to the management and service of visitors in the exhibition hall, makes it possible to understand the behavior patterns and visit preferences of the visitors, and provides data support for comprehensive literacy evaluation of the visitors.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a student data intelligent acquisition method based on human eyes according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of a student data intelligent acquisition method based on human eyes according to an embodiment of the present application;
FIG. 3 is a schematic view of a tripod mount for deployment of a binocular multimodal camera in an exhibition hall;
FIG. 4 is a high-level wall-mount schematic of a binocular multimodal camera deployed in an exhibition hall;
FIG. 5 is a schematic diagram of a network structure of a personnel appearance feature extractor according to an embodiment of the present application;
fig. 6 is a logic block diagram of an intelligent student data acquisition device based on human eyes according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
The terms first, second, third and the like in the description and in the claims and in the above drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Furthermore, well-known or widely-used techniques, elements, structures, and processes may not be described or shown in detail in order to avoid obscuring the understanding of the present invention by the skilled artisan. Although the drawings represent exemplary embodiments of the present invention, the drawings are not necessarily to scale and certain features may be exaggerated or omitted in order to better illustrate and explain the present invention.
Fig. 1 is a flow chart of an intelligent student data collection method based on human eyes provided in this embodiment, and fig. 2 is a flow chart of an implementation of the intelligent student data collection method based on human eyes provided in this embodiment, please refer to fig. 1 and 2, and the method mainly includes the following steps:
s1, acquiring video streams of visitors collected by cameras deployed in different exhibition areas in an exhibition hall, and generating a detection frame of each visitor through image detection to form a detection frame set to be queried;
In the method, a plurality of cameras are deployed in different exhibition areas of the exhibition hall. To ensure image quality under different illumination environments, binocular multimodal cameras (DCMC) are preferably adopted, each able to work in a visible light acquisition mode or an infrared acquisition mode, expressed as: DCMC = {(CVIS_1, CIR_1); (CVIS_2, CIR_2); …; (CVIS_N, CIR_N)}, where N denotes the number of DCMC. In the different exhibition areas, the audience flow lines and the main observation points of a specific exhibition hall are first determined, and the DCMC are deployed facing the audience flow lines and main observation points. According to the area of the exhibition hall and its building structure, the installation mode is either tripod erection as shown in Fig. 3 or high-position wall mounting as shown in Fig. 4. In this embodiment, tripod-mounted DCMC are deployed in exhibition areas smaller than 150 m². The support plate is made of copper-aluminum alloy, which effectively resists abrupt magnetic field interference in the exhibition space; the support plate is 1500 mm above the ground, and the two cameras of a DCMC pair are fixed in parallel on the plate 150 mm apart, so that the parallax between the two cameras can determine object distance and depth, achieving a binocular stereoscopic imaging effect. High-position wall-mounted DCMC are deployed in exhibition areas larger than 150 m²; the mounting gasket between the DCMC and the wall is made of copper-aluminum alloy, and the DCMC are fixed in parallel on the wall 2500 mm above the ground and 150 mm apart, achieving the binocular stereoscopic imaging effect over a large field of view.
The binocular multimodal cameras DCMC collect video streams of visitors in the different exhibition areas of the exhibition hall. The illumination intensity sensing module of each DCMC acquires the illumination intensity in the current exhibition area and compares it with a preset illumination threshold: when the illumination intensity exceeds 40% of the full scale, the illumination in the current exhibition area is sufficient and the visible light collector CVIS starts to collect the video stream; when it falls below 40%, the illumination is weak and the infrared collector CIR in the DCMC is activated to collect the video stream.
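As an illustration of the switching logic just described, the sketch below encodes the 40% rule in Python; the normalized intensity scale and the function name are assumptions for illustration, not part of the patent.

```python
# Minimal sketch of the DCMC illumination-based mode switch described above.
# The normalized [0, 1] intensity scale is an assumption of this sketch.

VISIBLE_THRESHOLD = 0.40  # 40% illumination threshold from the embodiment

def select_capture_mode(light_level: float) -> str:
    """Return which DCMC collector should capture the video stream."""
    # Above the threshold the visible-light collector CVIS runs;
    # below it the infrared collector CIR is activated.
    return "visible" if light_level > VISIBLE_THRESHOLD else "infrared"
```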
After the binocular multimodal cameras DCMC acquire multiple synchronized video streams, a multimodal target detector performs image detection on the image sequences of the video streams; the return value is the detection frame position information of a visitor, expressed as e_m = {x, y, h, w}, m ∈ {visible, infrared}, where (x, y) denotes the top-left pixel coordinates of the detection frame, (h, w) denotes its height and width, and m denotes the image modality of the current detection frame: visible light or infrared.
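A minimal sketch of the detection record e_m and of grouping detections into the per-frame query set follows; the field names and the grouping key are illustrative assumptions, and the detector itself is abstracted away.

```python
from dataclasses import dataclass

@dataclass
class DetectionBox:
    """One visitor detection e_m = {x, y, h, w} with its modality tag m."""
    x: int            # top-left pixel x-coordinate
    y: int            # top-left pixel y-coordinate
    h: int            # detection frame height
    w: int            # detection frame width
    mode: str         # "visible" or "infrared"
    camera_id: int    # index of the DCMC that produced the frame
    timestamp: float

def build_query_set(detections: list[DetectionBox]) -> dict:
    """Group raw detections into per-frame query sets (Omega_t per camera)."""
    omega = {}
    for det in detections:
        omega.setdefault((det.camera_id, det.timestamp), []).append(det)
    return omega
```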
S2, extracting appearance features, human body posture features and virtual tracks of each visitor from the video stream by using a trained feature extraction model;
(1) Appearance feature extraction
The extraction process of the appearance characteristics of the visitor comprises global characteristic extraction and local characteristic extraction;
personnel global feature extraction: preliminary feature extraction is performed on the video stream images of the visitors to generate feature vectors; after the feature vectors are converted into a one-dimensional feature sequence, they are processed by a self-attention mechanism, and the final global feature vector is generated after a fully connected layer;
personnel local feature extraction: a learnable set of local prototypes, acting as a local classifier, is introduced; pixels of the global feature vector are assigned to the i-th local part, foreground local features are extracted from the global feature vector with a cross-attention mechanism, and the final local feature vector is obtained after a fully connected layer;
And then, the global feature vector and the local feature vector are connected in series, so that appearance features of the visitor can be generated.
In this embodiment, appearance feature extraction of visitors is realized by a personnel appearance feature extractor which, when a multimodal camera is used for image acquisition, comprises a multimodal fusion device, a human global feature extraction module and a human local feature extraction module. When training the personnel appearance feature extractor, the training set comprises a visible light image set X_vis = {x_i^vis | 1 ≤ i ≤ N} and an infrared image set X_ir = {x_i^ir | 1 ≤ i ≤ N}, where x_i^ir and x_i^vis respectively denote the infrared and visible images of the same visitor in the same video frame, and N denotes the number of images.
Fig. 5 is a schematic diagram of the network structure of the personnel appearance feature extractor provided in this embodiment. Referring to Fig. 5, the backbone of the multimodal feature fusion device is a cross-connected dual-stream CNN with an embedded modality fusion module, each CNN stream having a three-layer downsampling structure; the multimodal feature fusion device fuses the infrared and visible images to generate the feature vectors F_vis and F_ir.
The multimodal fusion process can be divided into the following steps:
Step 1-1: in the first stage, the two modality images pass through a 3×3 convolution layer and a BN+ReLU layer, then enter a DenseNet layer for preliminary feature extraction, yielding the preliminary features X_vis and X_ir.
Step 1-2: in the second stage, X_vis and X_ir simultaneously enter the embedded modality fusion attention module, where a global average pooling operation compresses each feature map into a channel descriptor: V_vis = GAP(X_vis), V_ir = GAP(X_ir).
Step 1-3: the compressed vectors V_vis and V_ir enter a fully connected layer; with weight ω and bias term b, the fused vector V_fuse is computed as V_fuse = ω[V_vis, V_ir] + b, where [·] denotes a channel concatenation operation.
Step 1-4: the vector V_fuse enters two separate fully connected layers to generate two channel correction vectors R_vis and R_ir, computed as R_vis = ω₁·V_fuse + b₁ and R_ir = ω₂·V_fuse + b₂, where ω₁, ω₂ and b₁, b₂ are the weights and biases of the two fully connected layers.
Step 1-5: R_vis and R_ir are integrated back into the two input modalities by channel-wise multiplication, finally yielding the modality-fused feature vectors F_vis and F_ir:
F_vis = σ(R_vis) * X'_vis, F_ir = σ(R_ir) * X'_ir    (2)
where σ denotes the activation function and "*" denotes channel-wise multiplication. Through multimodal image feature fusion, the channels of each modality can be calibrated and the image background feature information extracted.
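The following PyTorch sketch mirrors steps 1-2 through 1-5 under stated assumptions (channel count, sigmoid as the activation σ); it illustrates the channel-recalibration idea, not the patented network itself.

```python
import torch
import torch.nn as nn

class ModalFusionAttention(nn.Module):
    """Sketch of steps 1-2 to 1-5: pool both modalities, fuse, re-weight channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global average pooling (step 1-2)
        self.fuse = nn.Linear(2 * channels, channels)  # V_fuse = w[V_vis, V_ir] + b (step 1-3)
        self.fc_vis = nn.Linear(channels, channels)    # correction vector R_vis (step 1-4)
        self.fc_ir = nn.Linear(channels, channels)     # correction vector R_ir (step 1-4)

    def forward(self, x_vis: torch.Tensor, x_ir: torch.Tensor):
        v_vis = self.pool(x_vis).flatten(1)            # (B, C) channel descriptor
        v_ir = self.pool(x_ir).flatten(1)
        v_fuse = self.fuse(torch.cat([v_vis, v_ir], dim=1))
        r_vis = torch.sigmoid(self.fc_vis(v_fuse))[..., None, None]
        r_ir = torch.sigmoid(self.fc_ir(v_fuse))[..., None, None]
        return x_vis * r_vis, x_ir * r_ir              # F = sigma(R) * X' (step 1-5)
```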
Personnel global feature extraction: the modality-fused vectors F_vis and F_ir can each be expressed as a feature map F_m ∈ R^{h×w×d}, m ∈ {visible, infrared}. Global feature extraction is identical for the two modalities; taking the visible light modality as an example, it is divided into the following steps:
Step 2-1: feature vector dimension reduction. The encoder requires a one-dimensional sequence as input, so the two-dimensional feature map F ∈ R^{h×w×d} is flattened into a 1-dimensional feature sequence of size hw×d: F_vis = [f_{1,vis}; f_{2,vis}; …; f_{hw,vis}], f_{i,vis} ∈ R^{1×d}.
Step 2-2: self-attention mechanism. F_vis generates the corresponding queries, keys and values through linear mappings: q_i = f_{i,vis}·W_Q, k_j = f_{j,vis}·W_K, v_j = f_{j,vis}·W_V, where i, j ∈ {1, 2, …, hw} and W_Q, W_K, W_V respectively denote the weight matrices of the linear mappings for the queries, keys and values. The attention weight is based on the dot-product similarity of query and key:
ξ_{i,j} = softmax(q_i·k_jᵀ / √d)
where √d is the corresponding scaling coefficient; the output of the self-attention operation is the feature vector f̂_i = Σ_j ξ_{i,j}·v_j. The self-attention weight ξ_{i,j} models the interdependence between the i-th and j-th pixels in the feature input sequence.
Step 2-3: multi-head attention mechanism. All outputs computed by the self-attention mechanism are combined, containing the information of globally neighboring pixels, and then processed by a feed-forward network FFN(·) consisting of two fully connected layers; each sub-layer is normalized and wrapped in a residual connection before operation to mitigate overfitting, and the final global feature sequence is generated after the attention mechanism.
Step 2-4: identity classification. After the global feature extraction encoder, an identity classifier ID_g predicts the class distribution, yielding the probability p_g = softmax(ID_g(f_g)); the cross-entropy loss L_cls against the real identity labels and the triplet loss L_tri serve as the targets of classification training.
Step 2-5: encoder training total loss function. The trained encoder module performs context-aware extraction of personnel global features, and global average pooling constrains the global feature to f_g ∈ R^{1×d}. The total loss function of the personnel global feature extraction training stage is defined as:
L_en = μ_c·L_cls + μ_tri·L_tri
where μ_c and μ_tri are hyper-parameters controlling the identity classification loss and the triplet loss; in L_tri, the distance between a positive sample pair (f_g, f_g⁺) of the same identity is contrasted against the distance of a negative sample pair (f_g, f_g⁻) of different identities, and α denotes the margin parameter.
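A compact PyTorch sketch of steps 2-1 to 2-3 follows (flatten, self-attention, residual FFN, pooled global feature); the head count, FFN width and layer-normalization placement are assumptions, and the identity classifier and losses of steps 2-4/2-5 are omitted.

```python
import torch
import torch.nn as nn

class GlobalFeatureEncoder(nn.Module):
    """Sketch of steps 2-1 to 2-3: flatten the fused map, self-attention + FFN."""
    def __init__(self, d: int, heads: int = 8):   # d must be divisible by heads
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, f_map: torch.Tensor) -> torch.Tensor:
        seq = f_map.flatten(2).transpose(1, 2)   # step 2-1: (B, d, h, w) -> (B, hw, d)
        attn_out, _ = self.attn(seq, seq, seq)   # step 2-2: Q, K, V from the same sequence
        seq = self.norm1(seq + attn_out)         # residual connection + normalization
        seq = self.norm2(seq + self.ffn(seq))    # step 2-3: two-layer feed-forward network
        return seq.mean(dim=1)                   # global average pooling -> f_g (B, d)
```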
Personnel local feature extraction: in this embodiment, the personnel local features are localized in a weakly supervised manner and extracted based on the encoder. The extraction process comprises the following steps:
Step 3-1: cross-attention mechanism. A learnable local paradigm set P = {p_i} is first introduced; it acts as a local classifier that determines whether a pixel of the feature sequence belongs to the i-th part, and is used to extract the foreground local features for encoder training. The cross-attention operation uses q_i = p_i·W_Q, k_j = f_j·W_K, v_j = f_j·W_V, where i indexes the local paradigms, j ∈ {1, 2, …, hw}, and W_Q, W_K, W_V respectively denote the weight matrices of the linear mappings for the queries, keys and values.
Step 3-2: local feature generation. For each local paradigm p_i, its local mask is computed as
σ_{i,j} = softmax(q_i·k_jᵀ / √d)
where σ_{i,j} denotes the probability that feature vector f_j belongs to the i-th foreground part. The attention weights of all hw positions are computed to form the mask set M_i = [σ_{i,1}, σ_{i,2}, …, σ_{i,hw}], and the i-th local feature is then obtained as the weighted sum of all values: l_i = Σ_j σ_{i,j}·v_j. The obtained l_i passes through two fully connected layers to produce the final local feature.
Step 3-3: decoder training total loss function. The local feature extraction model decoder is trained using the local classification loss and the triplet loss; the total loss function is denoted L_loc.
In the extraction of global and local personnel features, the personnel appearance feature extractor model is trained by minimizing the overall objective:
L_cmgl = L_en + L_loc    (13)
For each unseen-identity image processed through the above steps, the global feature f_g and the local features l_i are obtained and concatenated as f_cmgl = [f_g, l_1, …, l_P], where [·] denotes the concatenation operation and f_cmgl is the final personnel appearance feature obtained in the feature extraction stage.
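A sketch of the local paradigm set and the cross-attention masking of steps 3-1/3-2 follows; the number of parts and the two-layer head are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class LocalPrototypeExtractor(nn.Module):
    """Sketch of steps 3-1/3-2: learnable local prototypes attend over the pixels."""
    def __init__(self, d: int, num_parts: int = 4):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_parts, d))  # local paradigm set
        self.w_q = nn.Linear(d, d)
        self.w_k = nn.Linear(d, d)
        self.w_v = nn.Linear(d, d)
        self.head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, seq: torch.Tensor) -> torch.Tensor:   # seq: (B, hw, d)
        q = self.w_q(self.prototypes)                        # (P, d) part queries
        k, v = self.w_k(seq), self.w_v(seq)                  # (B, hw, d)
        logits = torch.einsum("pd,bnd->bpn", q, k) / k.shape[-1] ** 0.5
        masks = logits.softmax(dim=-1)                       # M_i over the hw positions
        parts = torch.einsum("bpn,bnd->bpd", masks, v)       # weighted sum of values
        return self.head(parts)                              # local features l_1..l_P
```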
(2) Human body posture feature extraction
For the DCMC = {(CVIS_1, CIR_1); (CVIS_2, CIR_2); …; (CVIS_N, CIR_N)} deployed in the exhibition hall, let v_i = (CVIS_i, CIR_i) denote the video stream acquired by the i-th binocular multimodal camera. Using the multimodal target detector of step S1, the detection frame set to be queried Ω_t^i is generated for the current image frame, where t denotes the timestamp and i indicates data from the i-th DCMC. This embodiment predicts the human body posture key points of the visitor in each detection frame, and the generated human body posture key information set is expressed as K = {(x_i, y_i, s_i)},
where (x_i, y_i) denotes the position of the i-th human body key point and s_i denotes its confidence.
(3) Virtual track generation
During personnel tracking, a visitor activity track set is established. For the detection frame set to be queried Ω_t corresponding to each image frame in the video stream v_i, a Kalman filtering algorithm based on CmaoSORT generates a virtual track and its predicted state vector value, specifically:
Virtual track generation: the CmaoSORT-based Kalman filtering algorithm generates the posterior state prediction value τ, the posterior covariance matrix P, the state transition matrix F, the observation matrix O and the noise matrix N between the two modalities; in each frame t, the last observed value of the target is denoted o_last and the re-triggered associated observation is denoted o_re, and the virtual track is expressed as the sequence of predicted states connecting o_last and o_re.
Virtual track iteration: the posterior prediction value τ is iterated along the virtual track through prediction and re-update operations, following the Kalman recursion:
prediction: τ⁻_t = F·τ_{t−1}, P⁻_t = F·P_{t−1}·Fᵀ + N;
update: K_t = P⁻_t·Oᵀ·(O·P⁻_t·Oᵀ + N)⁻¹, τ_t = τ⁻_t + K_t·(o_t − O·τ⁻_t), P_t = (I − K_t·O)·P⁻_t.
Because the observations on the virtual track are matched against the state vector calibrated by the latest real observation, the update is not affected by the error that repeated Kalman filter updates would otherwise accumulate.
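The iteration above is the standard Kalman predict/update loop over the quantities the text names (τ, P, F, O and the noise matrix). A NumPy sketch follows; splitting the noise into separate process and observation matrices Q and R is an assumption of this sketch.

```python
import numpy as np

class VirtualTrack:
    """Sketch of the virtual-track Kalman loop: predict each frame, update on observation."""
    def __init__(self, tau0, P0, F, O, Q, R):
        self.tau, self.P = tau0, P0   # posterior state vector and covariance
        self.F, self.O = F, O         # state transition and observation matrices
        self.Q, self.R = Q, R         # assumed process / observation noise split

    def predict(self):
        """Prediction step: propagate the state and covariance one frame forward."""
        self.tau = self.F @ self.tau
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.tau               # predicted state vector used for matching

    def update(self, z):
        """Re-update step: correct the prediction with an observation z."""
        S = self.O @ self.P @ self.O.T + self.R
        K = self.P @ self.O.T @ np.linalg.inv(S)          # Kalman gain
        self.tau = self.tau + K @ (z - self.O @ self.tau)
        self.P = (np.eye(len(self.tau)) - K @ self.O) @ self.P
```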
S3, matching the detection frame with the visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track of the visitor to generate personnel track matching information;
in an alternative embodiment, it comprises:
(1) Calculating the distance between the feature vector extracted based on the prediction state vector value and the appearance feature, and generating the similarity of the appearance feature of the person;
In one specific example, the cosine distance is used to compute the feature distance between the personnel appearance feature vector extracted from the predicted state vector value and the appearance feature extracted in step S2, and this distance is taken as the appearance similarity:
the smaller the feature distance between the appearance feature of the target person in the track and that of the detected image, the greater the matching similarity.
(2) The positions and the confidence degrees of the key points of the human body in the human body posture characteristics are used as vectors and state vectors of a Kalman filtering algorithm, and the similarity of key information of the human body posture is calculated;
In a specific example, the positions and confidences of the human body key points in the human body posture features extracted in step S2 are used as the vectors and state vectors of the Kalman filtering algorithm, and the similarity between the track keypoints and the detection keypoints is computed as a confidence-weighted sum over the key points:
χ_pose = Σ_p γ_p · κ(s_p) · c(k_p^track, k_p^det)
where c(·) denotes the cosine distance between corresponding key points, γ_p denotes the weight parameter of each human body key point, and the κ(·) function judges whether the corresponding human body key point is valid;
the overall similarity χ_{i,j} is expressed as a combination of the appearance feature similarity and the human body posture key information similarity.
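A sketch of the two similarity terms and their combination follows; the confidence-product weights, the validity threshold standing in for κ(·), and the equal-weight combination lam are illustrative assumptions.

```python
import numpy as np

def appearance_similarity(f_track: np.ndarray, f_det: np.ndarray) -> float:
    """Cosine similarity between a track's appearance feature and a detection's."""
    denom = np.linalg.norm(f_track) * np.linalg.norm(f_det) + 1e-8
    return float(np.dot(f_track, f_det) / denom)

def pose_similarity(kpts_track, kpts_det, conf_thresh: float = 0.3) -> float:
    """Confidence-weighted keypoint similarity; kpts are lists of (x, y, s)."""
    score, weight = 0.0, 0.0
    for (x1, y1, s1), (x2, y2, s2) in zip(kpts_track, kpts_det):
        if s1 < conf_thresh or s2 < conf_thresh:   # kappa(.): drop invalid keypoints
            continue
        a, b = np.array([x1, y1]), np.array([x2, y2])
        c = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        gamma = s1 * s2                            # assumed per-keypoint weight gamma_p
        score += gamma * c
        weight += gamma
    return score / weight if weight > 0 else 0.0

def overall_similarity(f_track, f_det, kpts_track, kpts_det, lam: float = 0.5) -> float:
    """Combine both terms; the equal weighting lam is an assumption."""
    return (lam * appearance_similarity(f_track, f_det)
            + (1 - lam) * pose_similarity(kpts_track, kpts_det))
```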
(3) A distance matrix is constructed from the personnel appearance feature similarity and the human body posture key information similarity over the virtual track set and the detection frame set to be queried, and the detection frames in the current frame are matched with each virtual track existing in the previous frame according to the distance matrix; if the matching succeeds, the virtual track and the detection frame have the same target identity, and the personnel track matching information corresponding to the visitor in the detection frame is generated.
In an alternative embodiment, the matching between virtual tracks and detection frames is performed using a network-flow-based Ford-Fulkerson algorithm. From the personnel appearance feature similarity and the human body posture key information similarity, a distance matrix D_{t,t-1} is built between the track set U_{t-1} and the detection set to be queried Ω_t. The data association task between the candidate detection frames in the current frame and the existing track set in the previous frame can be regarded as a maximum-weight matching problem and solved with the network-flow-based Ford-Fulkerson algorithm (FFA). Specifically, each detection frame is regarded as a left node and each existing track as a right node, with the edge weight being the distance value between the two nodes; solving the maximum-weight matching then yields the best-matching track in the previous frame for each detection frame in the current frame:
G_{t,t-1} = FFA(D_{t,t-1})    (22)
where G_{t,t-1} is the best match between the virtual track set U_{t-1} and the detection frame set to be queried Ω_t; if a solved entry of G_{t,t-1} equals 1, the corresponding virtual track and detection frame have the same target identity.
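The sketch below solves the same assignment on D_{t,t-1}; SciPy's Hungarian solver (linear_sum_assignment) stands in for the Ford-Fulkerson network-flow matcher named in the text, and the gating threshold is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks_to_detections(distance_matrix: np.ndarray, max_dist: float = 0.7):
    """Assign detections to tracks by minimizing total distance; the Hungarian
    solver is a stand-in for the network-flow Ford-Fulkerson matcher (FFA)."""
    rows, cols = linear_sum_assignment(distance_matrix)
    # Gate out pairs whose distance is too large to count as the same identity.
    return [(r, c) for r, c in zip(rows, cols) if distance_matrix[r, c] <= max_dist]

# Example: 3 existing virtual tracks vs. 2 new detection boxes
D = np.array([[0.10, 0.90],
              [0.80, 0.20],
              [0.95, 0.85]])
print(match_tracks_to_detections(D))   # [(0, 0), (1, 1)] -- track 2 stays unmatched
```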
S4, generating activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor.
In this step, the prior matching information is first used to generate the track data of the visiting students, and the link topology of the exhibition hall cameras is established, denoted G = (V, E), where V represents the multimodal camera pairs (CVIS_i, CIR_i) of the DCMC and E represents the transfer distributions between different camera pairs. The specific establishment method comprises:
(1) Establishing a link topology structure between different cameras, including:
and 4-1, initializing the corresponding relation of the association personnel.
The entire population is first divided into a plurality of subgroups using timestamps, and a series of personnel feature classifiers with overlapping time windows T are trained. The personnel feature classifiers of the other camera pairs (CVIS_i, CIR_i) are used to search for person correspondences: when a visitor disappears from a certain (CVIS_i, CIR_i), the correspondence of that person is searched in the other camera pairs within the time range [t−T, t+T]. If several personnel feature classifiers overlap this time range, all of them are tested and the most reliable one is selected as the best correspondence, with the personnel appearance feature similarity score or the overall similarity as the criterion: when χ_{i,j} > θ_sim, the correspondence is the most reliable one. θ_sim denotes a preset similarity threshold parameter; in one specific example, θ_sim = 0.8.
Step 4-2: calculating the transfer distributions among the DCMC.
Firstly, the time differences of the correspondences are computed and collected into a histogram, which is normalized by the total number of reliable correspondences; the transfer distribution is denoted p(Δt). For strongly connected DCMC, the correspondences in the histogram are clearly dense around a certain time difference; for weakly connected or unconnected DCMC, the correspondences are sparse over the time differences.
Step 4-3: performing the transfer distribution check.
If a topological connection exists between two pairs of DCMC, the transfer distribution follows a normal distribution; a Gaussian model N(μ, σ²) is fitted to the distribution p(Δt), and the connection confidence of the two pairs of DCMC is defined as:
conf(p(Δt)) = exp(−σ) · (1 − α(p(Δt)))    (23)
where the connection confidence lies in the range [0, 1] and α(p(Δt)) denotes the model fitting error; when conf(p(Δt)) > θ_conf, the two pairs of DCMC are defined as actively connected.
Step 4-4: establishing the topology structure.
With the above steps, the link topology between the cameras in the exhibition hall can be expressed as:
G_DCMC = (V_DCMC, E_DCMC), V_DCMC = {d_i | 1 ≤ i ≤ N_DCMC}, E_DCMC = {p_{i,j}(Δt) | 1 ≤ i ≤ N_DCMC, 1 ≤ j ≤ N_DCMC, i ≠ j}
where N_DCMC is the total number of DCMC in the camera link topology, d_i denotes the i-th DCMC, and p_{i,j}(Δt) denotes the transfer distribution between d_i and d_j.
(2) Establishing topology between different exhibition areas
For the different exhibition areas, the DCMC deployment phase takes the spatial prior information (entrance/exit areas) of each exhibition area into account by default, and only exit-to-entrance area pairs are considered when two exhibition areas belong to different DCMC. A visitor who disappears at an exit area at time t is likely to appear at the entrance area of a different DCMC after a certain time interval T; therefore, the correspondences of missing persons are searched in the entrance areas of the other DCMC within the time range [t, t+T]. Likewise, the personnel feature classifiers trained for the entrance areas measure the connection confidence of all possible area pairs with reliable correspondences, and the inter-area topology is expressed as:
G_EA = (V_EA, E_EA), V_EA = {d_i(k) | 1 ≤ i ≤ N_DCMC, 1 ≤ k ≤ A_i}, E_EA = {p_{i(k),j(k)}(Δt)}
where A_i denotes the number of exhibition areas covered by the i-th DCMC, d_i(k) denotes the k-th exhibition area covered by the i-th DCMC, and p_{i(k),j(k)}(Δt) denotes the transfer distribution between d_i(k) and d_j(k).
(3) Topology iterative optimization
After obtaining the link topological structures among different cameras and the topological structure diagrams among different exhibition areas, carrying out iterative updating on the personnel re-identification result and the network topological structure, wherein the specific steps are as follows:
step 5-1, updating the time window T.
For a transfer distribution p(Δt) ~ N(μ, σ²) between two areas, the lower and upper time limits of p(Δt) are adjusted using the prior topology: the lower limit T_min and the upper limit T_max are derived from the fitted mean μ. With the calculated time limits, the time window T is updated as a function of the Gaussian fitting error rate α(p(Δt)); when the fitting error is large, the time window T becomes large.
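The source states the window-update rule only qualitatively (a larger fitting error yields a larger window T). The sketch below fills in one plausible realization under explicit assumptions: μ ± kσ time limits and a linear widening in the error rate α.

```python
import numpy as np

def update_time_window(deltas, base_window: float, k: float = 2.0):
    """Sketch of step 5-1: fit N(mu, sigma^2) to the transfer samples and widen
    the search window with the fitting error. Both the mu +/- k*sigma limits and
    the linear widening rule are assumptions; the source only states that a
    larger fitting error yields a larger window T."""
    deltas = np.asarray(deltas, dtype=float)
    mu, sigma = deltas.mean(), deltas.std()
    t_min, t_max = mu - k * sigma, mu + k * sigma
    # Gaussian fitting error rate: fraction of samples outside the fitted limits
    alpha = float(np.mean((deltas < t_min) | (deltas > t_max)))
    return base_window * (1.0 + alpha), (t_min, t_max)
```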
Step 5-2: repositioning vanished personnel.
According to the topology, a student who disappears at time t in an exit area is expected to appear around time (t+T) in the entrance area of another camera. Using the topology information, the person correspondences whose time-slot centers are close to (t+T) are searched from the feature classification model; a correspondence with similarity score χ_{i,j} > θ_sim is regarded as a reliable correspondence, and the transfer distribution is updated.
The above steps are repeated until the transfer distributions converge; the process is executed for all exhibition areas in the DCMC network topology, and the iteration stops once the topology no longer changes, or changes only slightly over several consecutive iterations.
Then, personnel activity path association is carried out using the personnel track matching information and the DCMC network topology to generate the activity data of the visitors. Specifically:
Path association between different cameras is executed according to the personnel track matching information. The paths to be associated are defined as a path set, and the virtual track and identity ID of the target visitor are put into a search pool; then the following procedure is executed:
a candidate path library is generated according to the virtual track of the visitor and the prior exhibition hall camera link topology;
the path with the highest similarity is selected from the candidate path library and associated with the paths in the search pool, generating the activity path information of the visitor, which comprises a timestamp, an ID and a visiting area number.
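A sketch of the association loop just described follows; every name here (the candidate-path helper, the record fields, the search pool layout) is an illustrative assumption, since the translation does not preserve the patent's symbols.

```python
def associate_activity_path(track, search_pool: dict, topology, similarity_fn):
    """Sketch of the path-association loop: generate candidate paths from the
    camera link topology, keep the most similar one, and append the resulting
    record (timestamp, ID, visiting area number) to the visitor's path.
    `topology.candidate_paths`, `track.visitor_id` etc. are hypothetical names."""
    candidates = topology.candidate_paths(track)          # hypothetical helper
    if not candidates:
        return None
    best = max(candidates, key=lambda path: similarity_fn(track, path))
    record = {
        "timestamp": track.timestamp,
        "id": track.visitor_id,
        "area": best.area_number,                         # visiting area number
    }
    search_pool.setdefault(track.visitor_id, []).append(record)
    return record
```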
It should be noted that while in the above-described embodiments the operations of the methods of the embodiments of the present specification are described in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
The embodiment provides an intelligent student data acquisition device based on human eyes, which can be realized in a software and/or hardware mode and can be integrated on electronic equipment; FIG. 6 is a logic block diagram of the student data intelligent acquisition device provided in the present embodiment, as shown in FIG. 6, the device includes a data acquisition unit, a target detection unit, a feature extraction unit, a target matching unit, an activity path generation unit, and a user interface unit; wherein,
The data acquisition unit comprises a plurality of cameras which are respectively arranged in different exhibition areas in the exhibition hall, and the cameras are used for acquiring video streams of visitors; the camera preferably adopts a binocular multimode camera, and can work in a visible light mode acquisition mode and an infrared acquisition mode respectively. The mounting mode of the camera is shown in fig. 3 and 4.
Further preferably, the data acquisition unit further comprises a control module and a light intensity sensing module;
the control module is used for controlling the operation of the camera and adaptively adjusting the working parameters of the camera according to the environmental parameters of the exhibition area;
the light intensity sensing module is used for collecting illumination intensity in the exhibition area and switching the camera to work in a visible light mode collection mode or an infrared collection mode according to the illumination intensity.
The target detection unit is configured to perform image detection on the video stream of the visitor acquired by the camera, generate a detection frame of each visitor, and form a detection frame set to be queried;
the feature extraction unit is configured to extract appearance features, human body posture features and virtual tracks of each visitor from the video stream by using the trained feature extraction model;
The target matching unit is configured to match the detection frame with the visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track, and generate personnel track matching information;
the activity path generating unit is configured to generate activity data of the visitor according to the personnel track matching information and a pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor;
the user interface unit is configured to visually display visitor activity path data, provide inquiry operation buttons, setting options, and the like.
For specific limitation of the student data intelligent acquisition device, reference may be made to the limitation of the student data intelligent acquisition method hereinabove, and the description thereof will not be repeated here. All or part of each module in the student data intelligent acquisition device can be realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The embodiment also provides an electronic device, which includes at least one processor and at least one memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the student data intelligent acquisition method, and the specific steps are referred to above and are not repeated herein; in the present embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing terminal, display, etc.), with one or more terminals that enable a user to interact with the electronic device, and/or with any terminal (e.g., network card, modem, etc.) that enables the electronic device to communicate with one or more other computing terminals. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via a network adapter.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be completed by a program instructing the related hardware, the program being stored in a computer-readable memory, which may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing is merely an exemplary embodiment of the present disclosure and is not intended to limit its scope; equivalent changes and modifications made according to the teachings of this disclosure fall within that scope. Other embodiments of the disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.
The technical features of the above embodiments may be combined arbitrarily; for brevity of description, not all possible combinations are described. As long as a combination of these technical features contains no contradiction, however, it should be considered as within the scope of this specification.
Those skilled in the art will readily appreciate that the foregoing is merely a preferred embodiment of the invention and is not intended to limit it; any modifications, equivalents, improvements, or alternatives falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (10)

1. The student data intelligent acquisition method based on human eyes is characterized by comprising the following steps:
acquiring video streams of visitors collected by cameras deployed in different exhibition areas in an exhibition hall, and generating detection frames of each visitor through image detection to form a detection frame set to be queried;
extracting appearance features, human body posture features and virtual tracks of each visitor from the video stream by using a trained feature extraction model;
matching the detection frame with the visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track, and generating personnel track matching information;
and generating activity data of the visitor according to the personnel track matching information and a pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor.
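Read as four stages, claim 1 is a detect → extract → match → generate pipeline. The following minimal Python sketch shows only that control flow; every function name, stub body and data shape is an illustrative assumption, not the patented implementation.

```python
# Minimal, self-contained sketch of the four claimed stages; every name
# and every stub body is an illustrative placeholder, not the patent's code.
from dataclasses import dataclass, field

@dataclass
class Track:
    identity: int
    boxes: list = field(default_factory=list)   # (camera_id, frame_idx, box)

def detect_persons(frame):
    # Stage 1 stub: a real system would run a person detector here.
    return frame.get("boxes", [])

def extract_features(frame, box):
    # Stage 2 stub: appearance + posture features for one detection box.
    return {"appearance": box, "posture": box}

def match_to_tracks(tracks, detections, features):
    # Stage 3 stub: trivial identity assignment; real matching is claims 5-6.
    return list(enumerate(detections))

def run_pipeline(camera_streams):
    tracks = {}
    for cam_id, frames in camera_streams.items():
        for t, frame in enumerate(frames):
            detections = detect_persons(frame)
            features = [extract_features(frame, b) for b in detections]
            for tid, box in match_to_tracks(tracks, detections, features):
                tracks.setdefault(tid, Track(tid)).boxes.append((cam_id, t, box))
    return tracks  # Stage 4 (cross-camera path generation) is claim 8.

print(run_pipeline({"cam0": [{"boxes": [(10, 20, 50, 120)]}]}))
```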
2. The student data intelligent acquisition method as claimed in claim 1, wherein the extraction process of the appearance characteristics of the visitors comprises:
personnel global feature extraction: preliminary feature extraction is performed on the video stream images of the visitors to generate a feature vector; after the feature vector is converted into a one-dimensional feature sequence, it is processed by a self-attention mechanism and passed through a fully connected layer to generate the final global feature vector;
personnel local feature extraction: a learnable set of local prototypes representing local classifiers is introduced, the pixels of the global feature vector are assigned to the i-th local part, foreground local features in the global feature vector are extracted using a cross-attention mechanism, and the final local feature vector is obtained after a fully connected layer;
and connecting the global feature vector and the local feature vector in series to generate the appearance features of the visitor.
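Claim 2 describes a two-branch appearance extractor: self-attention over a flattened feature map for the global vector, learnable local prototypes querying the map via cross-attention for the local vectors, and the two connected in series. The NumPy sketch below illustrates that flow under assumed dimensions (a 7×7×64 map, 4 prototypes, one shared fully connected weight); none of these values or names come from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention, used for both branches below.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq = rng.normal(size=(49, 64))          # flattened 7x7 feature map, dim 64
w_fc = rng.normal(size=(64, 64)) * 0.1   # stand-in for the fully connected layer

# Global branch: self-attention over the 1-D feature sequence, then FC.
global_feat = attention(seq, seq, seq).mean(axis=0) @ w_fc

# Local branch: K learnable prototypes query the sequence (cross-attention),
# assigning foreground pixels to each local part, then FC per part.
prototypes = rng.normal(size=(4, 64))    # K=4 prototypes (learnable in training)
local_feat = (attention(prototypes, seq, seq) @ w_fc).reshape(-1)

# Final appearance feature: global and local vectors connected in series.
appearance = np.concatenate([global_feat, local_feat])
print(appearance.shape)                  # (64 + 4*64,) = (320,)
```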
3. The student data intelligent acquisition method as claimed in claim 1, wherein the extraction process of the human body posture characteristics of the visitor comprises:
predicting human body posture key points of the corresponding person based on the detection frames of the visitors, and generating human body posture features comprising the position of each human body key point and its corresponding confidence level.
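Claim 3's posture feature is simply a position plus confidence per keypoint. A minimal sketch of that data shape follows; the 17-keypoint layout and the 0.3 confidence gate are assumptions for illustration, not values from the patent.

```python
from typing import NamedTuple

class Keypoint(NamedTuple):
    x: float      # pixel coordinates inside the detection box
    y: float
    conf: float   # predictor's confidence for this keypoint

# One pose feature per detection box: one entry per body keypoint.
# The 17-point layout is an assumed convention, purely for illustration.
pose = [Keypoint(0.0, 0.0, 0.0)] * 17

def reliable_points(pose, thresh=0.3):
    # Downstream matching would typically ignore low-confidence keypoints.
    return [p for p in pose if p.conf >= thresh]
```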
4. The student data intelligent acquisition method as claimed in claim 1, wherein the acquisition process of the virtual track of the visitor is:
for each detection frame of the visitors, a Kalman filtering algorithm is adopted to generate a virtual track $\tilde{z}$ with predicted state vector value $\tau$, as follows:
virtual track generation: a posterior state predicted value $\tau$, a posterior covariance matrix $P$, a state transition matrix $F$, an observation matrix $O$ and a noise matrix $N$ between the two modes are generated based on the Kalman filtering algorithm; in each frame $t$, the observed value at which the target was last observed is denoted $z_{t_1}$, and the observed value of the re-triggered association is denoted $z_{t_2}$; the virtual track is expressed as:
$$\tilde{z}_t = z_{t_1} + \frac{t - t_1}{t_2 - t_1}\,\bigl(z_{t_2} - z_{t_1}\bigr), \qquad t_1 < t < t_2;$$
virtual track iteration: the posterior state predicted value $\tau$ performs an iteration of prediction and re-update along the virtual track $\tilde{z}_t$; the prediction and re-update operations are:
$$\tau_{t\mid t-1} = F\,\tau_{t-1\mid t-1}, \qquad P_{t\mid t-1} = F\,P_{t-1\mid t-1}\,F^{\mathsf{T}} + N;$$
$$K_t = P_{t\mid t-1}\,O^{\mathsf{T}}\bigl(O\,P_{t\mid t-1}\,O^{\mathsf{T}} + N\bigr)^{-1}, \qquad \tau_{t\mid t} = \tau_{t\mid t-1} + K_t\bigl(\tilde{z}_t - O\,\tau_{t\mid t-1}\bigr), \qquad P_{t\mid t} = (I - K_t\,O)\,P_{t\mid t-1};$$
until the observed value on the virtual track matches the state vector value calibrated by the latest real observed value.
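Claim 4's virtual track linearly bridges the last observation and the re-triggered observation, and the filter is re-run along the bridge (the "prediction and re-update" iteration). The sketch below implements that idea for an assumed 1-D constant-velocity state; the noise values, the dimensions, and the split of the claim's single noise matrix N into separate process and observation terms are illustrative assumptions.

```python
import numpy as np

# 1-D constant-velocity Kalman filter, matching the claim's symbols:
# F state transition, O observation matrix, P covariance, tau state.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
O = np.array([[1.0, 0.0]])
Q = np.eye(2) * 1e-2      # assumed process-noise part of the claim's N
R = np.array([[1e-1]])    # assumed observation-noise part of the claim's N

def predict(tau, P):
    return F @ tau, F @ P @ F.T + Q

def update(tau, P, z):
    S = O @ P @ O.T + R
    K = P @ O.T @ np.linalg.inv(S)
    return tau + K @ (z - O @ tau), (np.eye(2) - K @ O) @ P

def reupdate_along_virtual_track(tau, P, z1, t1, z2, t2):
    """Re-run predict/update over interpolated 'virtual' observations."""
    for t in range(t1 + 1, t2 + 1):
        z_virt = z1 + (t - t1) / (t2 - t1) * (z2 - z1)  # virtual track point
        tau, P = predict(tau, P)
        tau, P = update(tau, P, np.array([z_virt]))
    return tau, P

tau, P = np.array([0.0, 1.0]), np.eye(2)
tau, P = reupdate_along_virtual_track(tau, P, z1=1.0, t1=1, z2=9.0, t2=9)
print(tau)  # state dragged along the virtual track to the re-trigger point
```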
5. The student data intelligent acquisition method according to claim 4, wherein matching the detection frame with the visitor to be identified based on the appearance features, the human body posture features and the virtual track to generate the personnel track matching information comprises the following steps:
calculating the distance between the feature vector extracted based on the predicted state vector value and the appearance features, and generating the personnel appearance feature similarity;
taking the positions and confidence levels of the human body key points in the human body posture features as the observation vectors and state vectors of the Kalman filtering algorithm, and calculating the human body posture key information similarity;
constructing a distance matrix from the personnel appearance feature similarity and the human body posture key information similarity over the virtual track set and the detection frame set to be queried, and matching the detection frames in the current frame with each existing virtual track in the previous frame according to the distance matrix; if the matching succeeds, the virtual track and the detection frame have the same target identity, and the personnel track matching information corresponding to the visitor in the detection frame is generated.
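A sketch of claim 5's fusion step: appearance similarity and posture-keypoint similarity are combined into one distance matrix D, from which detections are assigned to virtual tracks. The equal weighting, the cosine/exponential similarity forms and the 0.7 gate are assumptions, and SciPy's Hungarian solver stands in for the assignment step (claim 6 solves the same assignment with network flow).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_sim(a, b):
    # Appearance similarity between track features (rows of a) and
    # detection features (rows of b).
    return (a @ b.T) / (np.linalg.norm(a, axis=1, keepdims=True)
                        * np.linalg.norm(b, axis=1, keepdims=True).T)

def pose_sim(pa, pb):
    # Illustrative posture similarity: confidence-weighted keypoint proximity.
    # pa: (N, K, 3) tracks, pb: (M, K, 3) detections, each row (x, y, conf).
    d = np.linalg.norm(pa[:, None, :, :2] - pb[None, :, :, :2], axis=-1)
    w = pa[:, None, :, 2] * pb[None, :, :, 2]
    return (w * np.exp(-d)).sum(-1) / (w.sum(-1) + 1e-9)

def match(track_app, det_app, track_pose, det_pose, w_app=0.5):
    sim = w_app * cosine_sim(track_app, det_app) \
        + (1 - w_app) * pose_sim(track_pose, det_pose)
    dist = 1.0 - sim                      # the claim's distance matrix D
    rows, cols = linear_sum_assignment(dist)
    # Matched pairs inherit the same target identity (claim 5's last step).
    return [(r, c) for r, c in zip(rows, cols) if dist[r, c] < 0.7]

rng = np.random.default_rng(0)
app = rng.normal(size=(2, 8))
pose = rng.random(size=(2, 17, 3))
print(match(app, app, pose, pose))        # tracks match themselves: [(0, 0), (1, 1)]
```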
6. The student data intelligent acquisition method according to claim 5, wherein the matching between the virtual tracks and the detection frames is performed by adopting the network-flow-based Ford-Fulkerson algorithm, calculated as: $G_{t,t-1} = \mathrm{FFA}(D_{t,t-1})$;
wherein $D_{t,t-1}$ represents the distance matrix, and $G_{t,t-1}$ is the optimal matching between the virtual track set $U_{t-1}$ and the detection frame set to be queried $\Omega_t$; if an entry of the solved $G_{t,t-1}$ equals 1, the corresponding virtual track in the set and the corresponding detection frame have the same target identity.
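On a unit-capacity assignment network, Ford-Fulkerson reduces to augmenting-path bipartite matching, which is what the sketch below implements for the FFA step of claim 6. Gating admissible edges by a distance threshold is an assumption; the claim itself fixes only the solver and the 0/1 output G.

```python
def ffa_match(D, max_dist=0.7):
    """Maximum bipartite matching between virtual tracks (rows of the
    distance matrix D) and detection frames (columns) via augmenting
    paths, i.e. Ford-Fulkerson on the unit-capacity assignment network.
    Returns G with G[i][j] == 1 iff track i and detection j share identity."""
    n, m = len(D), len(D[0])
    # Admissible edges: pairs whose distance passes the (assumed) gate.
    adj = [[j for j in range(m) if D[i][j] < max_dist] for i in range(n)]
    match_of_det = [-1] * m

    def augment(i, seen):
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match_of_det[j] == -1 or augment(match_of_det[j], seen):
                    match_of_det[j] = i   # push one unit of flow along i -> j
                    return True
        return False

    for i in range(n):
        augment(i, [False] * m)
    G = [[0] * m for _ in range(n)]
    for j, i in enumerate(match_of_det):
        if i != -1:
            G[i][j] = 1
    return G

print(ffa_match([[0.2, 0.9], [0.3, 0.4]]))   # [[1, 0], [0, 1]]
```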
7. The student data intelligent acquisition method of claim 1, wherein the establishment method of the exhibition hall camera link topology is as follows:
(1) establishing a link topology structure among the different cameras, expressed as: $G_{\mathrm{DCMC}} = (V_{\mathrm{DCMC}}, E_{\mathrm{DCMC}})$, wherein $V_{\mathrm{DCMC}}$ represents the cameras, $V_{\mathrm{DCMC}} = \{d_i \mid 1 \le i \le N_{\mathrm{DCMC}}\}$, and $E_{\mathrm{DCMC}}$ represents the transfer distributions between different cameras, $E_{\mathrm{DCMC}} = \{p_{i,j}(\Delta t) \mid 1 \le i \le N_{\mathrm{DCMC}},\ 1 \le j \le N_{\mathrm{DCMC}},\ i \ne j\}$;
wherein $N_{\mathrm{DCMC}}$ is the total number of cameras in the link topology structure, $d_i$ denotes the $i$-th camera, $d_j$ denotes the $j$-th camera, and $p_{i,j}(\Delta t)$ denotes the transfer distribution between $d_i$ and $d_j$;
(2) establishing a topology structure between different exhibition areas, expressed as: $G_{\mathrm{EA}} = (V_{\mathrm{EA}}, E_{\mathrm{EA}})$, with $V_{\mathrm{EA}} = \{d_{i(k)} \mid 1 \le i \le N_{\mathrm{DCMC}},\ 1 \le k \le A_i\}$ and $E_{\mathrm{EA}} = \{p_{i(k),j(k)}(\Delta t)\}$;
wherein $A_i$ is the number of exhibition areas covered by the $i$-th camera, $d_{i(k)}$ denotes the $k$-th exhibition area covered by the $i$-th camera, and $p_{i(k),j(k)}(\Delta t)$ is the transfer distribution between $d_{i(k)}$ and $d_{j(k)}$;
(3) Performing iterative optimization of the topological structure, wherein the iterative optimization comprises two steps of updating a time window and repositioning vanishing personnel;
updating the time window $T$: for a transfer distribution $p(\Delta t) \sim N(\mu, \sigma^2)$ between any two exhibition areas, the lower time limit $T_{\min}$ and the upper time limit $T_{\max}$ of the transfer distribution $p(\Delta t)$ are adjusted using the a priori topology, wherein $\mu$ is a constant; with the calculated time limits, the time window $T$ is updated, wherein $\alpha(p(\Delta t))$ is the Gaussian fitting error rate;
repositioning vanished personnel: according to the topology structure, a visitor disappearing at time $t$ in the exit area of one camera is expected to appear around time $(t+T)$ in the entrance area of another camera; candidate correspondences of the person whose time slot centers are close to $(t+T)$ are searched in the topology structure; a correspondence is taken as reliable, and the transfer distribution is updated, when its appearance feature similarity and/or human body posture key information similarity is larger than a preset value;
the above steps are repeated until the transfer distribution converges; this process is executed between all exhibition areas in the link topology structure until the topology structure no longer changes, or the amount of change over several iterations stays within a preset range, at which point the iteration stops.
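Because the patent's exact time-limit formulas are rendered as images in this text, the sketch below substitutes a plain k-sigma window around the fitted Gaussian and an α-based widening rule, both explicitly assumptions; only the overall shape (fit p(Δt) ~ N(μ, σ²), derive T_min/T_max, update T using the fitting error rate α, then gate relocation candidates around t + T) follows claim 7.

```python
import numpy as np

def update_time_window(delays, k=3.0):
    """Fit observed transfer delays to N(mu, sigma^2) and derive window T.
    The k-sigma limits and the alpha-based widening are illustrative
    assumptions; the patent's exact formulas are not recoverable here."""
    delays = np.asarray(delays, dtype=float)
    mu, sigma = delays.mean(), delays.std()
    t_min, t_max = mu - k * sigma, mu + k * sigma        # assumed limits
    # Gaussian fitting error rate: fraction of samples the fit misses.
    alpha = np.mean((delays < t_min) | (delays > t_max))
    T = (t_max - t_min) * (1.0 + alpha)                  # assumed update rule
    return t_min, t_max, T

def relocate_vanished(t_disappear, T, candidates, sim_threshold=0.8):
    # A visitor leaving one camera at t should reappear around t + T;
    # keep only candidates near that slot with high enough similarity.
    expected = t_disappear + T
    return [c for c in candidates
            if abs(c["slot_center"] - expected) < T / 2
            and c["similarity"] > sim_threshold]
```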
8. The student data intelligent acquisition method according to claim 7, wherein generating the activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology comprises:
executing path association among the different cameras according to the personnel track matching information, wherein the paths to be associated are defined, the virtual track and identity ID of the target visitor are put into a search pool, and the following procedure is executed:
generating a candidate path library according to the virtual track of the visitor and the a priori exhibition hall camera link topology;
selecting the path with the highest similarity from the candidate path library and associating it with the paths in the search pool to generate the activity path information of the visitor, wherein the activity path information comprises a time stamp, an ID and a visiting area number.
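Claim 8's association step can be read as: enumerate topology-feasible candidate areas for a vanished track, score each candidate, and append the best one to the visitor's activity record. The sketch below assumes a dict-based topology and an externally supplied scoring function; the record fields beyond the claimed time stamp, ID and visiting area number are illustrative.

```python
def associate_path(track, topology, candidate_scorer):
    """topology: dict mapping area -> list of reachable areas (the link
    topology's edges); candidate_scorer: assumed similarity function
    scoring how well a candidate area continues this track."""
    last_area = track["areas"][-1]
    candidates = topology.get(last_area, [])   # candidate path library
    if not candidates:
        return track
    best = max(candidates, key=lambda area: candidate_scorer(track, area))
    # Activity path entry: time stamp, ID, visiting area number (claim 8).
    track["path"].append({"timestamp": track["last_seen"],
                          "id": track["id"],
                          "area": best})
    return track
```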
9. The student data intelligent acquisition device based on human eyes is characterized by comprising:
the data acquisition unit comprises a plurality of cameras which are respectively arranged in different exhibition areas in the exhibition hall, and the cameras are used for acquiring video streams of visitors;
the target detection unit is configured to detect images of the video streams of the visitors acquired by the cameras, generate detection frames of each visitor, and form a detection frame set to be queried;
a feature extraction unit configured to extract appearance features, human body posture features, and virtual trajectories of each visitor from the video stream using a trained feature extraction model;
the target matching unit is configured to match the detection frame with a visitor to be identified based on the appearance characteristics, the human body posture characteristics and the virtual track, and generate personnel track matching information;
And the activity path generation unit is configured to generate activity data of the visitor according to the personnel track matching information and the pre-established exhibition hall camera link topology, wherein the activity data comprises identity information and activity track data of the visitor.
10. The student data intelligent acquisition device according to claim 9, wherein the data acquisition unit further comprises a control module and a light intensity sensing module;
the control module is used for controlling the operation of the camera and adaptively adjusting the working parameters of the camera according to the environmental parameters of the exhibition area;
the light intensity sensing module is used for collecting the illumination intensity in the exhibition area and switching the camera to work in a visible light collection mode or an infrared collection mode according to the illumination intensity.
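Claim 10's light-driven switching is a threshold rule on measured illuminance. A minimal sketch follows; the lux thresholds and the hysteresis band are assumptions, added so the camera does not oscillate between modes near the switching point.

```python
def select_capture_mode(lux, current_mode, low=10.0, high=15.0):
    """Switch between visible-light and infrared capture based on measured
    illuminance. The thresholds and the hysteresis band are assumed values,
    not taken from the patent."""
    if current_mode == "visible" and lux < low:
        return "infrared"
    if current_mode == "infrared" and lux > high:
        return "visible"
    return current_mode
```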