CN113033468A - Specific person re-identification method based on multi-source image information


Info

Publication number
CN113033468A
CN113033468A (application CN202110397163.8A)
Authority
CN
China
Prior art keywords: pedestrian, image, network, camera, identification
Prior art date
Legal status
Pending
Application number
CN202110397163.8A
Other languages
Chinese (zh)
Inventor
庄杰栋
郑恩辉
Current Assignee
China Jiliang University
Original Assignee
China Jiliang University
Priority date
Filing date
Publication date
Application filed by China Jiliang University
Priority to CN202110397163.8A
Publication of CN113033468A
Legal status: Pending

Classifications

    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/08 Neural networks; Learning methods
    • H04N7/181 Closed-circuit television [CCTV] systems for receiving images from a plurality of remote sources


Abstract

The invention discloses a specific-person re-identification method based on multi-source image information. Several fixed cameras and an unmanned aerial vehicle carrying a pan-tilt (gimbal) camera are deployed in the same scene, and image cache queues are set up to acquire and store the multiple camera video streams synchronously. A one-stage object detection network detects pedestrians in the video frames in real time and stores the detections in a buffer. A pedestrian re-identification network then processes the buffered detections to obtain re-identification Euclidean distances, which are sorted, and the position box of the matched person is drawn and displayed. Because the image cache queues are configured according to the measured efficiency of the neural network algorithms, the method obtains surveillance video images effectively and stably in real scenes, can be adapted to different deep learning algorithms, and improves re-identification efficiency.

Description

Specific person re-identification method based on multi-source image information
Technical Field
The invention belongs to the field of video surveillance, and particularly relates to a specific-person re-identification method based on multi-source image information.
Background
Pedestrian re-identification is a very active research topic of recent years in computer vision and can be seen as a sub-problem of image retrieval: given an image of a pedestrian from one surveillance camera, the goal is to retrieve images of the same pedestrian captured by other devices. Traditional methods rely on hand-crafted features and cannot cope with complex environments and large data volumes. With the development of deep learning in recent years, many deep-learning-based pedestrian re-identification methods have been proposed.
According to the training loss, pedestrian re-identification methods can be divided into representation learning and metric learning. Each has its own advantages and disadvantages, and academia and industry have gradually begun to combine the two losses: on the basis of the traditional metric-learning method, a fully connected layer is added after the feature layer to perform ID classification. The network then optimizes the representation-learning loss and the metric-learning loss simultaneously, jointly optimizing the feature layer.
According to the type of output features, pedestrian re-identification methods can be divided into those based on global features and those based on local features. Fusing global and local features is a very common means of improving network performance; the usual approach is to extract features with a global module and a local module separately and then concatenate them as the final feature. Spindle Net extracts global features and local features at n different scales and fuses them into the final image feature for the final similarity measurement. AlignedReID offers another fusion method: the global feature distance and the local feature distance between two images are computed separately, and their weighted sum is taken as the final distance between the two images in feature space.
According to the network input, pedestrian re-identification methods can be divided into those based on single-frame images and those based on video sequences. Video-sequence methods compensate for the limited information in a single frame and can incorporate motion information to improve robustness, but they are computationally less efficient because multiple images are processed at a time. Most video-sequence methods are extensions of single-frame methods, so advances in single-frame methods also benefit video-sequence methods.
In practice, both the requirements and the performance of pedestrian re-identification are constrained by various factors. First, re-identification operates on panoramic images from a surveillance scene: a detection network must first detect the pedestrian targets in the frame before similarity matching between pedestrians can be performed, and the combined running time of existing detection and re-identification networks is not sufficient to support real-time pedestrian re-identification on video. Second, recognition performance is limited by intrinsic and environmental factors. On the intrinsic side, high-resolution cameras and images shot in complex situations lengthen detection time, while low-resolution cameras, distant pedestrian targets, and camera shake reduce detection accuracy; gradual pixel-value changes caused by illumination and occlusion also degrade detection accuracy. On the environmental side, similarly dressed moving pedestrians, similar backgrounds, and severe occlusion greatly weaken the differences between the features of identified objects and further limit re-identification capability.
Some existing pedestrian re-identification methods still cannot process surveillance video in real time. When such methods are applied to a real scene, unprocessed surveillance video accumulates, eventually causing problems in storage or in the re-identification results (such as delay or memory overflow), so they are difficult to deploy in practice. Processing the surveillance video with multiple recognition devices would significantly increase hardware cost. In addition, the number of surveillance cameras in a real scene is limited and often cannot cover a given area, which greatly increases the difficulty of re-identifying and locating a specific person.
Disclosure of Invention
In order to solve the above problems, the invention provides a specific-person re-identification method based on multi-source image information, which performs real-time pedestrian re-identification on surveillance video and unmanned aerial vehicle video in a real scene.
The technical scheme adopted by the invention is as follows:
step 1, deploying several fixed cameras and an unmanned aerial vehicle in the same scene, wherein a pan-tilt camera is mounted on the unmanned aerial vehicle and the fixed cameras and the pan-tilt camera together form the camera set; setting up image cache queues, and acquiring and storing the videos of the multiple cameras synchronously;
step 2, performing real-time pedestrian detection on the frames acquired from the multiple camera videos with a one-stage object detection network and storing the results in a buffer;
step 3, building a pedestrian re-identification network, reading the pedestrian data in the buffer, computing the re-identification Euclidean distances, sorting them, and drawing and displaying the person position box according to the ranking result.
The multi-source image information of the invention comes from surveillance cameras fixed on the ground and from the camera of an unmanned aerial vehicle moving in the air; the real-time position of a specific person is located by combining the ground surveillance video with the unmanned aerial vehicle video.
In the invention, the specific person may be, for example, a person of a specific kind such as a thief, or any other designated individual.
Step 1 comprises the following substeps:
step 1-1, acquiring the surveillance videos of the several fixed cameras at the PC over the RTSP protocol and decoding them; meanwhile the unmanned aerial vehicle transmits the pan-tilt camera video to the mobile phone via its image link, where it is encoded and then sent to the PC over a custom TCP protocol for decoding; the PC is a personal computer.
step 1-2, creating a thread and an image cache queue for each camera video at the PC; storing each frame of each camera's video in that camera's image cache queue in time order; each camera's thread processes its own queue; at intervals, the image cache queue with the fewest frames among all queues is read as the reference image cache queue, which synchronizes the video time of all cameras;
step 1-3, unifying the image resolution of all cameras using bilinear interpolation;
step 1-4, grouping all threads into a thread pool so that the videos of all cameras are processed together and displayed at fixed positions in the display interface.
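The queue-based synchronization of steps 1-1 to 1-4 can be sketched as follows. This is an illustrative simplification, not part of the patent text: the class name `CameraQueues` is invented, and the capture threads that would feed the queues are omitted.

```python
from collections import deque

class CameraQueues:
    """One buffer queue per camera; frames are appended in time order
    by that camera's capture thread (threading omitted in this sketch)."""

    def __init__(self, n_cameras):
        self.queues = [deque() for _ in range(n_cameras)]

    def push(self, cam_id, frame):
        self.queues[cam_id].append(frame)

    def read_synchronized(self):
        """Use the shortest queue as the reference: pop the same number
        of frames from every queue so all cameras stay time-aligned."""
        q_min = min(len(q) for q in self.queues)
        batches = []
        for q in self.queues:
            batches.append([q.popleft() for _ in range(q_min)])
        return batches  # batches[i][t] = frame t of camera i
```

Reading only as many frames as the shortest queue holds means a slow or stalled camera throttles the whole read, which is exactly what keeps the streams time-aligned.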
Step 2 comprises the following substeps:
step 2-1, selecting the YOLOv3 neural network as the base model of the object detection network, and adding a spatial pyramid pooling (SPP) structure to it to form the final object detection network, so that targets of different scales can be detected and detection accuracy is improved;
the spatial pyramid pooling structure is connected between the backbone and the head of the YOLOv3 network; the backbone is a Darknet network and the head is a classification network.
step 2-2, pre-training the weights of the object detection network on the COCO 2017 dataset; preprocessing the frames from the multiple camera videos acquired in step 1 (resizing, normalization, etc.); feeding them into the network model for prediction to obtain the prediction boxes of the pedestrian class; cropping the prediction boxes out as pedestrian images; and storing the pedestrian images in the buffer.
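The cropping of predicted pedestrian boxes into the buffer (step 2-2) can be sketched as below. The `(x1, y1, x2, y2)` pixel box format and the function name are assumptions for illustration; the patent does not specify them.

```python
import numpy as np

def crop_pedestrians(frame, boxes, buffer):
    """Extract each predicted pedestrian box from the frame and append
    the crop to the shared buffer for the re-identification stage.

    frame : H x W x 3 image array
    boxes : iterable of (x1, y1, x2, y2) pixel coordinates (assumed format)
    """
    h, w = frame.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp to image bounds before slicing; skip degenerate boxes.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 > x1 and y2 > y1:
            buffer.append(frame[y1:y2, x1:x2].copy())
    return buffer
```

The `.copy()` matters: the buffer is read by a different thread, so each crop must not alias the frame being overwritten by the capture loop.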
Step 3 comprises the following substeps:
step 3-1, selecting the lightweight backbone OSNet as the backbone of the pedestrian re-identification network; dividing the head of the network into a local branch and a global branch, which extract local features and global features respectively; the local branch adopts a PCB (Part-based Convolutional Baseline) structure, which evenly divides the feature map output by the local branch into four horizontal parts matching the human body topology from top to bottom (head, upper torso, lower torso, and feet); finally the outputs of the global and local branches are concatenated;
the head of the pedestrian re-identification network thus fuses local and global features, encouraging the neural network to extract more robust features and improving re-identification accuracy.
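The global/local fusion of step 3-1 can be sketched with plain array operations. This is a shape-level simplification of the OSNet + PCB head, not the actual network: the convolutional layers are omitted and only the pooling and concatenation are shown.

```python
import numpy as np

def fuse_global_local(feature_map, n_parts=4):
    """feature_map: C x H x W output of the backbone.
    The global branch average-pools the whole map; the PCB-style local
    branch splits the map into n_parts horizontal stripes (head, upper
    torso, lower torso, feet for n_parts=4) and pools each stripe.
    The final descriptor concatenates the global and local vectors."""
    c, h, w = feature_map.shape
    global_feat = feature_map.mean(axis=(1, 2))            # (C,)
    stripes = np.array_split(feature_map, n_parts, axis=1) # split along H
    local_feats = [s.mean(axis=(1, 2)) for s in stripes]   # n_parts x (C,)
    return np.concatenate([global_feat] + local_feats)     # (C*(1+n_parts),)
```

For a backbone output of C channels, the fused descriptor has C*(1+4) dimensions: one global vector plus one vector per body part.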
step 3-2, classifying the concatenated features, using cross entropy as the classification loss and the triplet loss as the metric loss, to jointly supervise training of the pedestrian re-identification network;
step 3-3, processing the pedestrian images in the buffer of step 2 with the trained pedestrian re-identification network to obtain a feature vector for each pedestrian, and computing the Euclidean distance between each feature vector and the feature vector of the specific person obtained in advance;
step 3-4, sorting the Euclidean distances obtained by re-identification, and drawing and displaying the person position box according to the ranking result.
The feature vector of the specific person is obtained in advance by feeding a known picture or portrait of the person through the object detection network and then through the pedestrian re-identification network.
In the invention, the object detection of step 2-2 and the re-identification of step 3-3 each run in a separate thread; that is, detection and re-identification run concurrently and exchange data through the buffer. This multithreading improves operating efficiency and greatly improves real-time performance: the frame rate of real-time video processing reaches 15 fps, exceeding existing methods.
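The concurrent detection / re-identification arrangement above, with the two threads exchanging data through the buffer, follows a standard producer-consumer pattern. The sketch below replaces the actual networks with placeholder callables; function and variable names are illustrative, not from the patent.

```python
import queue
import threading

def run_pipeline(frames, detect, reidentify):
    """Detection thread (producer) pushes pedestrian crops into the
    buffer; re-identification thread (consumer) pulls them out.
    `detect` and `reidentify` stand in for the neural networks."""
    buffer = queue.Queue()
    results = []

    def producer():
        for frame in frames:
            for crop in detect(frame):
                buffer.put(crop)
        buffer.put(None)  # sentinel: no more data

    def consumer():
        while True:
            crop = buffer.get()
            if crop is None:
                break
            results.append(reidentify(crop))

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

`queue.Queue` is thread-safe, so the two stages overlap in time without explicit locking, which is the source of the claimed throughput gain.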
The video information comes from the ground surveillance cameras and the aerial unmanned aerial vehicle; the real-time position of the specific person is located by combining the ground surveillance video with the unmanned aerial vehicle video.
The invention has the following functions and effects:
In the specific-person re-identification method based on multi-source image information, because the image cache queues are configured according to the measured efficiency of the neural network algorithms, surveillance video images can be obtained effectively and stably in a real scene, and the scheme can be adapted to and improved with different deep learning algorithms.
Furthermore, compared with other mainstream re-identification networks, the invention adopts a lightweight backbone and a head network that fuses global and local features, ensuring efficiency while maintaining accuracy; with multithreading, the method achieves a re-identification rate of 15 fps while processing 8 video streams simultaneously, greatly exceeding current mainstream pedestrian re-identification networks and guaranteeing real-time recognition.
By combining pedestrian detection with pedestrian re-identification, the method realizes real-time re-identification of a specific person and is of great practical value for pedestrian re-identification in everyday image and video scenes and even in unmanned aerial vehicle aerial video.
Drawings
FIG. 1 is a flow chart of a method for re-identifying a particular person in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a pedestrian re-identification network in an embodiment of the present invention;
FIG. 3 is a flow chart of a specific person re-identification method algorithm in an embodiment of the present invention;
FIG. 4 is a diagram illustrating re-recognition effects of a specific person according to an embodiment of the present invention.
Detailed Description
To make the technical means, features, objectives, and effects of the invention easy to understand, the specific-person re-identification method is described in detail below with reference to an embodiment and the drawings.
An embodiment of the invention is as follows:
In this embodiment, the specific-person re-identification method is implemented on a computer connected to eight surveillance cameras and one unmanned aerial vehicle camera. The computer acquires, in real time, the surveillance videos shot by the eight cameras and the video shot by the unmanned aerial vehicle, and runs a real-time pedestrian re-identification algorithm to process them. The computer obtains the real-time surveillance videos via the RTSP addresses of the eight cameras and receives the aerial video returned by the unmanned aerial vehicle over a TCP protocol.
Fig. 1 is a flowchart of a method for re-identifying a specific person in an embodiment of the present invention.
As shown in fig. 1, the implementation of the method for re-identifying a specific person in this embodiment includes the following steps:
Step 1: set up the image cache queues, synchronize the eight camera streams, and simultaneously acquire the unmanned aerial vehicle video. In this embodiment, step 1 comprises the following substeps:
step 1-1, acquiring the eight surveillance video streams at the PC over the RTSP protocol and decoding them; meanwhile the unmanned aerial vehicle transmits its images to the mobile phone via its image link, where they are encoded and then sent to the PC over a custom TCP protocol for decoding;
step 1-2, creating a thread and an image cache queue for each camera video at the PC; storing each frame of each camera's video in that camera's queue in time order; each camera's thread processes its own queue; at each interval, the queue with the fewest frames among all queues is read as the reference queue. Specifically, eight queues Q1, Q2, ..., Q8 buffer the video images in real time; the main thread obtains the length of each queue, finds the shortest frame count Qmin, and then takes Qmin frames from each of the eight queues, thereby synchronizing the time of the eight surveillance videos.
step 1-3, resizing each acquired frame using bilinear interpolation so that the output pictures of all videos have the same size.
step 1-4, starting a thread pool whose capacity equals the number of videos; a new empty image is created and evenly divided into several parts, each filled with the image acquired by one thread.
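The display composition of step 1-4 (an empty image evenly divided, each part filled by one thread's frame) amounts to a simple grid montage. The sketch below assumes a 2 x 4 layout for eight streams; the patent does not fix the layout.

```python
import numpy as np

def tile_frames(frames, rows, cols):
    """Place equally sized camera frames into a rows x cols grid image,
    each frame at a fixed position (row-major order)."""
    h, w = frames[0].shape[:2]
    canvas = np.zeros((rows * h, cols * w, 3), dtype=frames[0].dtype)
    for i, f in enumerate(frames):
        r, c = divmod(i, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = f
    return canvas
```

Because every camera is resized to the same resolution in step 1-3, each thread can write its tile into a fixed slice of the canvas without coordinating with the others.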
Step 2: perform real-time pedestrian detection on the images with a one-stage object detection network and store the results in the buffer.
In step 2 of this embodiment, the one-stage detection network extracts pedestrian image features with the mainstream YOLOv3 (other one-stage detection networks may be used in other embodiments). On this basis, an SPP (spatial pyramid pooling) structure is added so that the network can detect more object instances with larger size differences. At the same time, the specific image region of each pedestrian in the frame is obtained for similarity matching between pedestrians in the re-identification stage. The detection speed of the one-stage network reaches 30 fps while detecting 8 video streams simultaneously, which guarantees real-time detection for pedestrian re-identification.
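A minimal numerical sketch of the SPP block as used in YOLOv3-SPP variants: the feature map is max-pooled at several kernel sizes with stride 1 and 'same' padding, and the results are concatenated along the channel axis. The kernel sizes 5, 9, 13 are those of the common YOLOv3-SPP variant, an assumption here since the patent does not state them.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pooling with 'same' padding on a C x H x W map."""
    c, h, w = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp(x, kernels=(5, 9, 13)):
    """Concatenate the input with its pooled versions along channels,
    giving the detector receptive fields at several scales."""
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)
```

The concatenation multiplies the channel count by 1 + len(kernels), which is why the module helps with targets of widely differing sizes: each pooled copy summarizes a different neighborhood scale around every location.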
Step 3: build the pedestrian re-identification model and read the pedestrian data in the buffer for inference. In this embodiment, the structure of the pedestrian re-identification network model is shown in Fig. 2, and step 3 comprises the following substeps:
step 3-1, selecting the lightweight backbone OSNet as the backbone of the re-identification network and fusing local and global features at the network head: the global branch and the local branch extract global and local features respectively; the local branch adopts the mainstream PCB structure, which divides the feature map into four horizontal parts from top to bottom matching the human body topology (head, upper torso, lower torso, and feet); finally the feature vectors output by the global and local branches are concatenated.
step 3-2, classifying the concatenated features, using cross entropy as the classification loss and the triplet loss as the metric loss, and jointly supervising the training of the pedestrian re-identification network.
step 3-3, feeding the pedestrian images in the buffer of step 2 into the re-identification model to infer a feature vector for each pedestrian, and computing the Euclidean distance between each feature vector and the feature vector of the specific person obtained in advance.
This embodiment measures similarity with the Euclidean distance: the distances between the normalized re-identification features are computed one by one and ranked; based on the triplet loss, the distance between persons with the same ID is minimized and the difference between pedestrians with different IDs is maximized, and the parameters of the re-identification network are learned continuously.
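The joint loss of step 3-2 (cross entropy for ID classification plus triplet loss for the metric) can be sketched numerically as below. The margin value 0.3 is a common choice and an assumption here, not a value given by the patent.

```python
import numpy as np

def cross_entropy(logits, label):
    """ID-classification loss on one feature's class logits."""
    z = logits - logits.max()                  # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Pull same-ID features together, push different-ID features apart."""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

def total_loss(logits, label, anchor, positive, negative):
    """Joint supervision: classification loss + metric loss."""
    return cross_entropy(logits, label) + triplet_loss(anchor, positive, negative)
```

The hinge in the triplet term is what produces the behavior the embodiment describes: it is zero once same-ID pairs are already closer than different-ID pairs by the margin, and positive otherwise.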
In step 3 of this embodiment, the pedestrian re-identification network and the attached triplet loss are built with the PyTorch deep learning framework, and a private dataset containing 23 cameras and thousands of pedestrians is used as the training set. The accuracy of the algorithm is evaluated by computing the top-1 accuracy and mAP on the test split of the dataset.
Step 4: sort the distance results, set a distance threshold, and select the best-matching person.
In step 4 of this embodiment, after the distance-ranked re-identification result is obtained, the pedestrian with the smallest distance is taken as the best match for the specific person. The minimum distance is compared with the distance threshold set in the experiment; if it is below the threshold, that pedestrian is regarded as the designated specific person.
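The ranking and threshold test of step 4 can be sketched as follows. The threshold value is illustrative; the patent determines it experimentally, and the function name is invented.

```python
import numpy as np

def match_specific_person(query_feat, gallery_feats, threshold=1.0):
    """Rank detected pedestrians by Euclidean distance to the specific
    person's feature vector; accept the nearest one only if its
    distance is below the experimentally chosen threshold."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)          # ranking of all pedestrians
    best = order[0]
    matched = dists[best] < threshold
    return order, (int(best) if matched else None)
```

Returning `None` when the minimum distance exceeds the threshold captures the open-set nature of the task: the specific person may simply not be present in the current frames.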
Step 5: draw and display the person position box according to the ranking result.
In step 5 of this embodiment, according to the ranking result and the corresponding pedestrian detection boxes, the distances of all pedestrians to the specific person and their result boxes are drawn on the image, and pedestrians whose distance is below the threshold are marked with an ellipse, as shown in Fig. 4.
In addition, in this embodiment, using the closed-loop area formed by the multiple cameras and the position information attached to each matched person, the movement trajectory of the specific person can be drawn on a prepared map, and an alarm can be raised promptly when the specific person appears in a camera.
The algorithm in the specific-person re-identification method can be packaged as a computer program, achieving plug-and-play use and strong portability. A flow chart of the algorithm is shown in Fig. 3.
Embodiment effects
Because the image cache queues are configured according to the measured efficiency of the neural network algorithms, surveillance video images can be obtained effectively and stably in a real scene, and the scheme can be adapted to and improved with different deep learning algorithms. Furthermore, compared with other mainstream re-identification networks, the invention adopts a lightweight backbone and a head network that fuses global and local features, ensuring efficiency while maintaining accuracy; with multithreading, the method achieves a re-identification rate of 15 fps while processing 8 video streams simultaneously, greatly exceeding current mainstream pedestrian re-identification networks and guaranteeing real-time recognition. By combining pedestrian detection with pedestrian re-identification, the method realizes real-time re-identification of a specific person and is of great practical value for pedestrian re-identification in everyday image and video scenes and even in unmanned aerial vehicle aerial video.
The above embodiment merely illustrates a specific implementation of the invention, and the invention is not limited to the scope described above.

Claims (4)

1. A specific-person re-identification method based on multi-source image information, characterized by comprising the following steps:
step 1, deploying several fixed cameras and an unmanned aerial vehicle in the same scene, wherein a pan-tilt camera is mounted on the unmanned aerial vehicle and the fixed cameras and the pan-tilt camera together form the camera set; setting up image cache queues, and acquiring and storing the videos of the multiple cameras synchronously;
step 2, performing real-time pedestrian detection on the frames acquired from the multiple camera videos with a one-stage object detection network and storing the results in a buffer;
step 3, building a pedestrian re-identification network, reading the pedestrian data in the buffer, computing the re-identification Euclidean distances, sorting them, and drawing and displaying the person position box according to the ranking result.
2. The specific-person re-identification method based on multi-source image information according to claim 1, characterized in that step 1 comprises the following substeps:
step 1-1, acquiring the surveillance videos of the several fixed cameras at the PC over the RTSP protocol and decoding them, while the unmanned aerial vehicle transmits the pan-tilt camera video to the mobile phone via its image link, where it is encoded and then sent to the PC over a TCP protocol for decoding;
step 1-2, creating a thread and an image cache queue for each camera video at the PC; storing each frame of each camera's video in that camera's image cache queue in time order; each camera's thread processes its own queue; and at intervals reading the image cache queue with the fewest frames among all queues as the reference image cache queue;
step 1-3, unifying the image resolution of all cameras using bilinear interpolation;
step 1-4, grouping all threads into a thread pool so that the videos of all cameras are processed together and displayed at fixed positions in the display interface.
3. The method for re-identifying a specific person based on multi-source image information as claimed in claim 1, wherein step 2 comprises the following substeps:
step 2-1, selecting the YOLOv3 neural network as the base model of the object detection network, and adding a spatial pyramid pooling structure to it to form the final object detection network;
and step 2-2, pre-training the weights of the object detection network on the COCO 2017 dataset; preprocessing the images from the multi-camera videos acquired in step 1 (resizing, normalization, and the like), feeding them to the network model for prediction to obtain bounding boxes of the pedestrian class, cropping those boxes out as pedestrian images, and storing the pedestrian images in the buffer.
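A minimal sketch of the step 2-2 inference path, assuming the detector itself is already trained and returns `(x1, y1, x2, y2, confidence, class)` tuples per image; the preprocessing here uses nearest-neighbour index resizing for brevity where the actual pipeline would resize with interpolation, and all names are illustrative rather than taken from the patent:

```python
import numpy as np

PERSON_CLASS = 0  # index of the 'person' category in the COCO label set

def preprocess(frame: np.ndarray, size: int = 416) -> np.ndarray:
    """Resize the frame to the network's square input and scale pixels to [0, 1]."""
    h, w = frame.shape[:2]
    ys = np.arange(size) * h // size            # source rows to sample
    xs = np.arange(size) * w // size            # source columns to sample
    resized = frame[ys][:, xs]
    return resized.astype(np.float32) / 255.0

def crop_pedestrians(frame, detections, conf_thresh=0.5):
    """Keep boxes of the pedestrian class above the confidence threshold
    and cut them out of the original frame for the re-ID buffer."""
    crops = []
    for x1, y1, x2, y2, conf, cls in detections:
        if cls == PERSON_CLASS and conf >= conf_thresh:
            crops.append(frame[int(y1):int(y2), int(x1):int(x2)].copy())
    return crops
```

Crops are taken from the original frame rather than the resized network input, so the buffered pedestrian images keep their native resolution for the re-identification stage.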
4. The method for re-identifying a specific person based on multi-source image information as claimed in claim 1, wherein step 3 comprises the following substeps:
step 3-1, selecting OSNet as the backbone of the pedestrian re-identification network; dividing the head of the network into a local branch and a global branch that extract local features and global features respectively; dividing the feature map output by the local branch evenly, from top to bottom, into four parts matching the topology of the human body; and finally concatenating the outputs of the global branch and the local branch;
step 3-2, classifying the concatenated features, using the cross-entropy loss and the triplet loss together as the classification loss to train and supervise the pedestrian re-identification network;
step 3-3, processing the pedestrian images buffered in step 2 with the trained re-identification network to obtain a feature vector for each pedestrian, and computing the Euclidean distance between each feature vector and the feature vector of the specific person obtained in advance;
and step 3-4, sorting the Euclidean distances obtained by re-identification, and drawing and displaying the person bounding boxes according to the sorted result.
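The head and matching logic of these substeps can be sketched in NumPy: a global average-pooled descriptor plus four top-to-bottom stripe descriptors from the local branch are concatenated (step 3-1), a single-triplet margin term illustrates the supervision of step 3-2, and matching ranks gallery pedestrians by Euclidean distance to the specific person (steps 3-3 and 3-4). The backbone feature map is assumed given, and all function names are illustrative:

```python
import numpy as np

def head_descriptor(feat: np.ndarray) -> np.ndarray:
    """feat: (C, H, W) backbone output. Concatenate the global descriptor
    with four top-to-bottom stripe descriptors -> (5*C,) vector."""
    global_part = feat.mean(axis=(1, 2))                 # global branch (GAP)
    stripes = np.array_split(feat, 4, axis=1)            # four horizontal parts
    local_parts = [s.mean(axis=(1, 2)) for s in stripes] # GAP per stripe
    return np.concatenate([global_part] + local_parts)

def triplet_term(anchor, positive, negative, margin=0.3):
    """Single-triplet form of the triplet loss used alongside cross-entropy."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def rank_gallery(gallery: np.ndarray, target: np.ndarray):
    """gallery: (N, D) pedestrian features; target: (D,) specific-person
    feature. Return gallery indices sorted by ascending Euclidean distance."""
    dists = np.linalg.norm(gallery - target[None, :], axis=1)
    order = np.argsort(dists)
    return order, dists[order]
```

The first index returned by `rank_gallery` is the detection most likely to be the specific person, whose bounding box is then drawn in the display interface.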
CN202110397163.8A 2021-04-13 2021-04-13 Specific person re-identification method based on multi-source image information Pending CN113033468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110397163.8A CN113033468A (en) 2021-04-13 2021-04-13 Specific person re-identification method based on multi-source image information


Publications (1)

Publication Number Publication Date
CN113033468A true CN113033468A (en) 2021-06-25

Family

ID=76456576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110397163.8A Pending CN113033468A (en) 2021-04-13 2021-04-13 Specific person re-identification method based on multi-source image information

Country Status (1)

Country Link
CN (1) CN113033468A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291633A (en) * 2020-01-17 2020-06-16 Fudan University Real-time pedestrian re-identification method and device
CN111507217A (en) * 2020-04-08 2020-08-07 Nanjing University of Posts and Telecommunications Pedestrian re-identification method based on local resolution feature fusion
CN112364791A (en) * 2020-11-17 2021-02-12 South-Central Minzu University Pedestrian re-identification method and system based on generative adversarial network
CN112396002A (en) * 2020-11-20 2021-02-23 Chongqing University of Posts and Telecommunications Lightweight remote sensing target detection method based on SE-YOLOv3


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657467A (en) * 2021-07-29 2021-11-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Model pre-training method and device, electronic equipment and storage medium
CN113727078A (en) * 2021-10-13 2021-11-30 Glodon Co., Ltd. Engineering monitoring method and device
CN117499601A (en) * 2024-01-02 2024-02-02 Shanghai Lichi Semiconductor Co., Ltd. Method for calling multi-camera data for SoC
CN117499601B (en) * 2024-01-02 2024-04-05 Shanghai Lichi Semiconductor Co., Ltd. Method for calling multi-camera data for SoC

Similar Documents

Publication Publication Date Title
CN111767882B (en) Multi-mode pedestrian detection method based on improved YOLO model
CN107624189B (en) Method and apparatus for generating a predictive model
CN113033468A (en) Specific person re-identification method based on multi-source image information
CN105447459B (en) A kind of unmanned plane detects target and tracking automatically
CN111951212A (en) Method for identifying defects of contact network image of railway
CN106534616B (en) A kind of video image stabilization method and system based on characteristic matching and motion compensation
CN111291633B (en) Real-time pedestrian re-identification method and device
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
US10558844B2 (en) Lightweight 3D vision camera with intelligent segmentation engine for machine vision and auto identification
CN112446270A (en) Training method of pedestrian re-identification network, and pedestrian re-identification method and device
CN110163041A (en) Video pedestrian recognition methods, device and storage medium again
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN110390294B (en) Target tracking method based on bidirectional long-short term memory neural network
WO2021218671A1 (en) Target tracking method and device, and storage medium and computer program
CN112207821B (en) Target searching method of visual robot and robot
CN109063549A (en) High-resolution based on deep neural network is taken photo by plane video moving object detection method
CN111046821A (en) Video behavior identification method and system and electronic equipment
CN101826155B (en) Method for identifying act of shooting based on Haar characteristic and dynamic time sequence matching
CN109919246A (en) Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN114241379A (en) Passenger abnormal behavior identification method, device and equipment and passenger monitoring system
Gopal et al. Tiny object detection: Comparative study using single stage CNN object detectors
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
WO2024037660A1 (en) Method and apparatus for determining abnormal sorting areas, electronic device, and storage medium
CN113378782A (en) Vehicle-mounted fire identification and automatic tracking method
CN117576149A (en) Single-target tracking method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210625