CN114998815A - Traffic vehicle identification tracking method and system based on video analysis - Google Patents


Info

Publication number
CN114998815A
CN114998815A (application CN202210931367.XA; granted as CN114998815B)
Authority
CN
China
Prior art keywords
image
license plate
video
vehicle
video frame
Prior art date
Legal status
Granted
Application number
CN202210931367.XA
Other languages
Chinese (zh)
Other versions
CN114998815B (en)
Inventor
岳建明
杨睿
杨冬俊
Current Assignee
Jiangsu Sanleng Smartcity&iot System Co ltd
Original Assignee
Jiangsu Sanleng Smartcity&iot System Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Sanleng Smartcity&iot System Co ltd
Priority to CN202210931367.XA
Publication of CN114998815A
Application granted
Publication of CN114998815B
Legal status: Active

Classifications

    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/764: Image or video recognition using classification, e.g. of video objects
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/625: License plates
    • G06V2201/08: Detecting or categorising vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic vehicle identification and tracking method and system based on video analysis, wherein the method comprises the following steps: step 1, extracting a first video and a second video, wherein the first video and the second video have space-time relevance; step 2, performing vehicle identification on the first video; step 3, performing vehicle identification on the second video; and step 4, marking the successfully identified vehicles with the same license plate on a map, connecting the marked points on the map according to the space-time relevance of the first and second videos to obtain the vehicle travel track, and completing vehicle tracking according to that track. By combining an improved Swin Transformer deep learning model with an AlexNet model trained under the MapReduce framework, the method improves the efficiency of vehicle detection, identification and tracking.

Description

Traffic vehicle identification tracking method and system based on video analysis
Technical Field
The invention relates to the technical field of image processing, and in particular to a traffic vehicle identification and tracking method and system based on video analysis.
Background
In conventional vehicle recognition technology, vehicle-related image data are collected and then statistically analyzed and processed by a single trained model, and most such methods track the vehicle by recognizing the whole license plate image. For example, CN107273896A discloses a license plate detection and recognition method based on image recognition whose recognition process comprises moving-vehicle detection, image preprocessing, license plate positioning, character segmentation and character recognition. In that method, vehicle detection uses a background-difference method in which the background is updated with a Gaussian mixture background model; image binarization preprocessing uses an edge-detection binarization method; license plate positioning uses a method based on edge detection and prior knowledge; and the recognition of all license plate characters is completed with a BP neural network. However, that method uses a single model and cannot overcome the defects inherent in it; in addition, directly detecting the whole license plate greatly reduces efficiency, so the vehicle cannot be detected and tracked quickly and in real time, and the accuracy is not high.
Disclosure of Invention
To overcome the defects of conventional vehicle recognition and tracking technology, namely reliance on a single model, low recognition efficiency, low accuracy and poor real-time performance, the invention provides a traffic vehicle recognition and tracking method and system based on video analysis.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
a traffic vehicle identification tracking method based on video analysis comprises the following steps:
Step 1, extracting a first video and a second video, wherein the first video and the second video have space-time relevance.
Step 2, performing vehicle identification on the first video, comprising the following steps:
Step 2.1, acquiring video frame images of the first video at a rate of 1 frame per second and normalizing them to obtain first video frame images;
Step 2.2, automatically recognizing the first video frame image from step 2.1 with an improved Swin Transformer deep learning model to obtain grayscale image 1 containing the vehicle reference area;
Step 2.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 1 of the vehicle reference region to obtain license plate reference grayscale image 1 of the video image;
Step 2.4, training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying the license plate reference images 1 corresponding to the partition images of license plate reference grayscale image 1 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the first video frame image; license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame;
Step 2.5, when the symbol mark of a license plate target appears continuously at the same or a nearby position in the first video stream for a preset number of frames, the vehicle is successfully identified.
Step 3, performing vehicle identification on the second video, comprising the following steps:
Step 3.1, acquiring video frame images of the second video at a rate of 1 frame per second and normalizing them to obtain second video frame images;
Step 3.2, automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2 containing the vehicle reference area;
Step 3.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 2 of the vehicle reference region to obtain license plate reference grayscale image 2 of the video image;
Step 3.4, training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying the license plate reference images 2 corresponding to the partition images of license plate reference grayscale image 2 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image; license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame;
Step 3.5, when the symbol mark of a license plate target appears continuously at the same or a nearby position in the second video stream for a preset number of frames, the vehicle is successfully identified.
Step 4, marking the vehicles with the same license plate successfully identified in steps 2.5 and 3.5 on a map, connecting the marked points on the map according to the space-time relevance of the first and second videos to obtain the vehicle travel track, and completing vehicle tracking according to that track.
Further, the improved Swin Transformer deep learning model is specifically as follows: an attention mechanism module is introduced into the Swin Transformer, and a multiscale mixed convolution is introduced into the PatchEmbed of the Swin Transformer.
Further, in step 2.2, automatically recognizing the first video frame image from step 2.1 with the improved Swin Transformer deep learning model to obtain grayscale image 1 containing the vehicle reference region specifically comprises: inputting the normalized first video frame image into the improved Swin Transformer deep learning model for detection to obtain the image of the vehicle reference region in the video frame, and then binarizing it, with the pixel values of the vehicle region set to 1 and the pixel values of the non-vehicle region set to 0 in the binary image.
In step 3.2, automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2 containing the vehicle reference region specifically comprises: inputting the normalized second video frame image into the improved Swin Transformer deep learning model for detection to obtain the image of the vehicle reference region in the video frame, and then binarizing it, with the pixel values of the vehicle region set to 1 and the pixel values of the non-vehicle region set to 0 in the binary image.
Further, the spatiotemporal relevance specifically includes a temporal order and a spatial position relationship.
Further, the image segmentation and fusion model consists of a VGG network and a U-Net, wherein the VGG network consists of layer1 through layer5 of VGG16.
Further, the loss function Loss of the image segmentation and fusion model is customized as follows:
(the formula is given only as an image in the original publication)
The optimal network parameters are obtained by minimizing this loss function, where pred denotes the set of predicted values, true denotes the set of true values, α and γ are adjustment coefficients with α = 0.5, y denotes the label, n denotes the number of categories, y_i = 1 if the sample belongs to category i and y_i = 0 otherwise, p_i denotes the probability output for category i, and L_1 denotes the mean square error.
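The formula itself survives only as an unreproduced image. Based solely on the variable definitions above, one plausible reading, offered purely as an assumption, combines a focal-loss-style classification term (with adjustment coefficients α and γ) and the mean-square-error term L_1:

```latex
\mathrm{Loss} = -\alpha \sum_{i=1}^{n} y_i \,(1 - p_i)^{\gamma} \log p_i \;+\; L_1,
\qquad
L_1 = \frac{1}{n}\sum_{i=1}^{n}\bigl(\mathrm{pred}_i - \mathrm{true}_i\bigr)^2 .
```

The exact form used by the inventors cannot be recovered from the text; this sketch only matches the stated roles of α, γ, y_i, p_i and L_1.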
Further, recognizing and classifying the license plate reference images 1 corresponding to the partition images of license plate reference grayscale image 1 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the first video frame image, specifically comprises: finding the license plate reference images 1 at the positions of the partition images of license plate reference grayscale image 1 in the first video frame, normalizing all license plate reference images 1 and inputting them into the AlexNet model, recognizing all license plate target positions, and marking all license plate target positions with symbols in the first video frame image.
Recognizing and classifying the license plate reference images 2 corresponding to the partition images of license plate reference grayscale image 2 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image, specifically comprises: finding the license plate reference images 2 at the positions of the partition images of license plate reference grayscale image 2 in the second video frame, normalizing all license plate reference images 2 and inputting them into the AlexNet model, recognizing all license plate target positions, and marking all license plate target positions with symbols in the second video frame image.
Based on the same inventive concept, the invention discloses a traffic vehicle identification and tracking system based on video analysis, which is used to implement the above traffic vehicle identification and tracking method based on video analysis and specifically comprises:
The extraction module is used for extracting a first video and a second video, the first video and the second video having space-time relevance.
The identification module 1 is used for performing vehicle identification on the first video, and comprises:
The acquisition module 1 is used for acquiring video frame images of the first video at a rate of 1 frame per second and normalizing them to obtain first video frame images.
The improvement module 1 is used for automatically recognizing the first video frame image obtained by the acquisition module 1 with the improved Swin Transformer deep learning model to obtain grayscale image 1 containing the vehicle reference area.
The segmentation module 1 is used for constructing an image segmentation and fusion model and using it to rapidly partition and segment grayscale image 1 of the vehicle reference region to obtain license plate reference grayscale image 1 of the video image.
The classification module 1 is used for training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification, recognizing and classifying the license plate reference images 1 corresponding to the partition images of license plate reference grayscale image 1 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the first video frame image; license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame.
The detection module 1 is used for determining that a vehicle is successfully identified when the symbol mark of a license plate target appears continuously at the same or a nearby position in the first video stream for a preset number of frames.
The identification module 2 is used for performing vehicle identification on the second video, and comprises:
The acquisition module 2 is used for acquiring video frame images of the second video at a rate of 1 frame per second and normalizing them to obtain second video frame images.
The improvement module 2 is used for automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2 containing the vehicle reference area.
The segmentation module 2 is used for constructing an image segmentation and fusion model and using it to rapidly partition and segment grayscale image 2 of the vehicle reference region to obtain license plate reference grayscale image 2 of the video image.
The classification module 2 is used for training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification, recognizing and classifying the license plate reference images 2 corresponding to the partition images of license plate reference grayscale image 2 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image; license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame.
The detection module 2 is used for determining that a vehicle is successfully identified when the symbol mark of a license plate target appears continuously at the same or a nearby position in the second video stream for a preset number of frames.
The tracking module is used for marking the vehicles with the same license plate successfully identified by the detection modules on a map, connecting the marked points on the map according to the space-time relevance of the first and second videos to obtain the vehicle travel track, and completing vehicle tracking according to that track.
Based on the same inventive concept, the invention discloses a traffic vehicle identification and tracking system based on video analysis, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the above traffic vehicle identification and tracking method based on video analysis.
Compared with the prior art, the invention has the beneficial effects that:
1. The method creatively utilizes the Swin Transformer deep learning model, which enlarges the receptive field and network depth, suppresses interfering background information and extracts richer feature information, thereby enhancing visual representation capability and improving recognition speed and accuracy.
2. By fusing the improved Swin Transformer deep learning model with the AlexNet model, the defects of interference and of relying on a single model are effectively overcome.
3. Because the Swin Transformer is capable of high throughput and large-scale parallel processing, and the MapReduce framework provides distributed computing capability, combining the two technologies enables distributed parallel processing of images at every stage of image processing, greatly improving the efficiency of vehicle/license plate detection, identification and tracking.
4. The grayscale images of the vehicle reference regions are partitioned by the image segmentation and fusion model, and the license plate reference images corresponding to the partition images are then recognized and classified by the AlexNet model, thereby achieving vehicle detection.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of an image segmentation fusion model.
Detailed Description
An embodiment of the present invention is described in detail below with reference to the accompanying drawings, but it should be understood that the scope of the present invention is not limited to this embodiment.
As shown in fig. 1-2, this embodiment provides a traffic vehicle identification and tracking method based on video analysis, which comprises the following steps:
step 1, extracting a first video and a second video, wherein the first video and the second video have space-time relevance.
Specifically, the first video and the second video are both road surveillance videos, which can be cut to appropriate lengths to facilitate subsequent processing and analysis; the space-time relevance specifically includes temporal order and spatial position relationship.
Step 2, performing vehicle identification on the first video, comprising the following steps:
Step 2.1, acquiring video frame images of the first video at a rate of 1 frame per second and normalizing them to obtain first video frame images;
Specifically, the method further includes preprocessing the acquired video frame images of the first video before the normalization, and the preprocessing may specifically include Gaussian filtering.
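As a minimal sketch of this acquisition step, assuming frames arrive as 8-bit arrays (the Gaussian filtering itself would typically use a library routine such as OpenCV's `GaussianBlur`, omitted here to keep the sketch self-contained):

```python
import numpy as np

def sample_one_per_second(frames: np.ndarray, fps: int) -> np.ndarray:
    """Keep one frame per second of video, matching the method's
    acquisition of video frame images at a rate of 1 frame per second."""
    return frames[::fps]

def normalize_frame(frame: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values into [0, 1] before the detector sees them."""
    return frame.astype(np.float32) / 255.0
```

For example, a 4-second clip at 25 fps yields 4 sampled frames, each then normalized.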
Step 2.2, automatically recognizing the first video frame image from step 2.1 with an improved Swin Transformer deep learning model to obtain grayscale image 1 containing the vehicle reference area;
Specifically, the improved Swin Transformer deep learning model is as follows: channel attention modules are added to the MSA and MLP layers of the Swin Transformer, and a multiscale mixed convolution is introduced into the PatchEmbed of the Swin Transformer.
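The patent does not give the attention module's internal structure. As an illustrative sketch only, a squeeze-and-excitation style channel attention of the kind commonly attached to MSA/MLP outputs can be written in plain NumPy; the weights `w1` and `w2` stand in for learned parameters and are assumptions of this sketch:

```python
import numpy as np

def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style gate over a (C, H, W) feature map:
    global-average-pool each channel, pass the C-vector through a small
    two-layer network, squash to (0, 1) with a sigmoid, and rescale channels."""
    c = feat.shape[0]
    squeeze = feat.reshape(c, -1).mean(axis=1)      # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # per-channel weight in (0, 1)
    return feat * gate[:, None, None]               # reweight the channels
```

In the same spirit, the multiscale mixed convolution in PatchEmbed would run parallel convolutions with different kernel sizes and merge their outputs before tokenization.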
Step 2.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 1 of the vehicle reference region to obtain license plate reference grayscale image 1 of the video image;
Specifically, the image segmentation and fusion model is composed of a VGG network and a U-Net, wherein the VGG network is composed of layer1 through layer5 of VGG16.
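The text gives only the composition (VGG16's layer1 through layer5 as the encoder, U-Net as the decoder). A shapes-only NumPy sketch of the encoder/decoder data flow, with pooling and nearest-neighbour upsampling standing in for the convolutional blocks, illustrates how skip connections fuse encoder features back in:

```python
import numpy as np

def max_pool2(x: np.ndarray) -> np.ndarray:
    """Halve spatial resolution, as the pooling between VGG16 blocks does."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling, the U-Net decoder's expansion step."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_flow(img: np.ndarray, depth: int = 2) -> np.ndarray:
    """Encoder stores a skip at each scale; decoder upsamples and fuses it.
    Averaging stands in for convolution over the concatenated feature maps."""
    skips, x = [], img
    for _ in range(depth):         # contracting path (VGG16 encoder blocks)
        skips.append(x)
        x = max_pool2(x)
    for skip in reversed(skips):   # expanding path with skip connections
        x = (upsample2(x) + skip) / 2.0
    return x
```

The output keeps the input resolution, as required for a per-pixel segmentation of the license plate region.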
Step 2.4, training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying the license plate reference images 1 corresponding to the partition images of license plate reference grayscale image 1 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the first video frame image; license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame.
Specifically, MapReduce, as a parallel program design model and method, provides a simple approach to parallel programming: the two functions Map and Reduce realize the basic parallel computing tasks, and an abstract operation and parallel programming interface is provided so that programming and computation over large-scale data can be completed simply and conveniently. The distributed computing capability provided by the MapReduce framework is fully exploited to improve the efficiency of vehicle/license plate detection, identification and tracking.
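As a toy illustration of the Map/Reduce split described here (a real deployment would distribute the trained AlexNet over Hadoop-style workers; `classify` below is a hypothetical stand-in for that model):

```python
from collections import Counter
from functools import reduce

def map_phase(partition_images):
    """Map: each worker labels its share of partition images independently."""
    def classify(img):
        # hypothetical stand-in for the trained AlexNet classifier
        return "license_plate" if img.get("has_plate_chars") else "background"
    return [(classify(img), 1) for img in partition_images]

def reduce_phase(mapped_pairs):
    """Reduce: merge the (label, count) pairs emitted by all mappers."""
    def merge(acc, pair):
        acc[pair[0]] += pair[1]
        return acc
    return reduce(merge, mapped_pairs, Counter())
```

Splitting the partition images across several mappers and reducing their outputs gives the same totals as a single pass, which is what makes the distribution transparent to the rest of the pipeline.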
Step 2.5, when the symbol mark of a license plate target appears continuously at the same or a nearby position in the first video stream for a preset number of frames, the vehicle is successfully identified.
Specifically, if the same symbol mark appears in 20 consecutive frame images, the vehicle identification is successful.
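This confirmation rule can be sketched as a simple run-length check over the per-frame detections (20 frames is the value this embodiment uses; the spatial tolerance for "nearby" positions is omitted here):

```python
def vehicle_confirmed(detections, plate, threshold=20):
    """True once the same plate symbol has appeared in `threshold`
    consecutive frames; any interruption resets the run."""
    run = 0
    for detected_plate in detections:
        run = run + 1 if detected_plate == plate else 0
        if run >= threshold:
            return True
    return False
```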
Step 3, performing vehicle identification on the second video, comprising the following steps:
Step 3.1, acquiring video frame images of the second video at a rate of 1 frame per second and normalizing them to obtain second video frame images;
Specifically, the method further includes preprocessing the acquired video frame images of the second video before the normalization, and the preprocessing may specifically include Gaussian filtering.
Step 3.2, automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2 containing the vehicle reference area;
Step 3.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 2 of the vehicle reference region to obtain license plate reference grayscale image 2 of the video image;
Step 3.4, training an AlexNet model under the MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying the license plate reference images 2 corresponding to the partition images of license plate reference grayscale image 2 with the trained AlexNet model in combination with MapReduce to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image; license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame;
Step 3.5, when the symbol mark of a license plate target appears continuously at the same or a nearby position in the second video stream for a preset number of frames, the vehicle is successfully identified.
Specifically, if the same symbol mark appears in 20 consecutive frame images, the vehicle identification is successful.
Step 4, marking the vehicles with the same license plate successfully identified in steps 2.5 and 3.5 on a map, connecting the marked points on the map according to the space-time relevance of the first and second videos to obtain the vehicle travel track, and completing vehicle tracking according to that track.
Specifically, the successfully identified vehicle positions are marked on a Baidu map or a Gaode map, and the marked points of the vehicle are connected in sequence based on temporal order and spatial position relationship to obtain the vehicle travel track; the vehicle is then tracked according to that track.
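A minimal sketch of this track-building step, assuming each successful identification yields a (timestamp, latitude, longitude) sighting of the plate (the map API call itself is omitted):

```python
def build_track(sightings):
    """Order one plate's map marks by time and return the polyline of
    (lat, lon) points; consecutive point pairs are the segments drawn on the map."""
    ordered = sorted(sightings, key=lambda s: s[0])      # sort by timestamp
    points = [(lat, lon) for _, lat, lon in ordered]     # marked points in time order
    segments = list(zip(points, points[1:]))             # connected track segments
    return points, segments
```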
Based on the same inventive concept, an embodiment of the invention discloses a traffic vehicle identification and tracking system based on video analysis, used to implement the above traffic vehicle identification and tracking method based on video analysis. The system specifically comprises:
an extraction module, configured to extract a first video and a second video, the first video and the second video having spatio-temporal correlation;
an identification module 1, configured to perform vehicle identification on the first video, comprising:
an acquisition module 1, configured to acquire video frame images of the first video at a frame rate of 1 frame per second and normalize them to obtain the first video frame image;
an improvement module 1, configured to automatically recognize the first video frame image with an improved Swin Transformer deep learning model to obtain grayscale image 1, which contains the vehicle reference region;
a segmentation module 1, configured to construct an image segmentation and fusion model and use it to rapidly partition and segment grayscale image 1 of the vehicle reference region, obtaining license plate reference grayscale image 1 of the video image;
a classification module 1, configured to train an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; to recognize and classify, using MapReduce combined with the trained AlexNet model, the license plate reference images 1 corresponding to the partition images in license plate reference grayscale image 1, obtaining license plate targets; and to mark the license plate targets with symbols in the first video frame image, where license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame;
a detection module 1, configured to judge the vehicle as successfully identified when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the first video stream;
an identification module 2, configured to perform vehicle identification on the second video, comprising:
an acquisition module 2, configured to acquire video frame images of the second video at a frame rate of 1 frame per second and normalize them to obtain the second video frame image;
an improvement module 2, configured to automatically recognize the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2, which contains the vehicle reference region;
a segmentation module 2, configured to construct an image segmentation and fusion model and use it to rapidly partition and segment grayscale image 2 of the vehicle reference region, obtaining license plate reference grayscale image 2 of the video image;
a classification module 2, configured to train an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; to recognize and classify, using MapReduce combined with the trained AlexNet model, the license plate reference images 2 corresponding to the partition images in license plate reference grayscale image 2, obtaining license plate targets; and to mark the license plate targets with symbols in the second video frame image, where license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame;
a detection module 2, configured to judge the vehicle as successfully identified when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the second video stream;
and a tracking module, configured to mark the vehicle bearing the same license plate, as successfully identified by detection module 1 and detection module 2, on a map, connect the marker points on the map according to the spatio-temporal correlation of the first video and the second video to obtain the vehicle travel track, and complete vehicle tracking according to the vehicle travel track.
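The persistence rule applied by the detection modules can be sketched as follows. This is a minimal illustration assuming pixel-space plate-mark centers per sampled frame; the patent fixes neither the distance measure nor the threshold values, so `min_frames` and `max_jump` are placeholders:

```python
import math

def vehicle_confirmed(plate_centers, min_frames=5, max_jump=20.0):
    """plate_centers: per-frame (x, y) of the license plate's symbol mark,
    or None for frames where no mark was produced.
    Returns True once the mark has appeared at the same or a nearby
    position (within max_jump pixels) for min_frames consecutive frames."""
    run, prev = 0, None
    for center in plate_centers:
        if center is None:
            run, prev = 0, None  # mark disappeared: restart the count
            continue
        if prev is None or math.hypot(center[0] - prev[0], center[1] - prev[1]) <= max_jump:
            run += 1
        else:
            run = 1  # plate jumped too far to be "the same or nearby"
        if run >= min_frames:
            return True
        prev = center
    return False
```

For example, five steady sightings confirm the vehicle, while a gap or a large positional jump restarts the count.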
Based on the same inventive concept, an embodiment of the invention discloses a traffic vehicle identification and tracking system based on video analysis, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when loaded into the processor, the computer program implements the traffic vehicle identification and tracking method based on video analysis.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that it may be embodied in other specific forms without departing from its spirit or essential attributes. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes which come within the meaning and range of equivalency of the claims are intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, although this description is presented in terms of embodiments, not every embodiment contains only a single independent technical solution; the description is written this way merely for clarity. Those skilled in the art should treat the description as a whole, and the embodiments may be combined appropriately to form further embodiments they would understand.

Claims (10)

1. A traffic vehicle identification and tracking method based on video analysis, characterized by comprising the following steps:
step 1, extracting a first video and a second video, wherein the first video and the second video have spatio-temporal correlation;
step 2, performing vehicle identification on the first video, comprising:
step 2.1, acquiring video frame images of the first video at a frame rate of 1 frame per second, and normalizing them to obtain the first video frame image;
step 2.2, automatically recognizing the first video frame image of step 2.1 with an improved Swin Transformer deep learning model to obtain grayscale image 1, which contains the vehicle reference region;
step 2.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 1 of the vehicle reference region, obtaining license plate reference grayscale image 1 of the video image;
step 2.4, training an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying, using MapReduce combined with the trained AlexNet model, the license plate reference images 1 corresponding to the partition images in license plate reference grayscale image 1 to obtain license plate targets, and marking the license plate targets with symbols in the first video frame image, where license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame;
step 2.5, when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the first video stream, judging the vehicle as successfully identified;
step 3, performing vehicle identification on the second video, comprising:
step 3.1, acquiring video frame images of the second video at a frame rate of 1 frame per second, and normalizing them to obtain the second video frame image;
step 3.2, automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2, which contains the vehicle reference region;
step 3.3, constructing an image segmentation and fusion model, and using it to rapidly partition and segment grayscale image 2 of the vehicle reference region, obtaining license plate reference grayscale image 2 of the video image;
step 3.4, training an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; recognizing and classifying, using MapReduce combined with the trained AlexNet model, the license plate reference images 2 corresponding to the partition images in license plate reference grayscale image 2 to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image, where license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame;
step 3.5, when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the second video stream, judging the vehicle as successfully identified;
step 4, marking the vehicle bearing the same license plate, as successfully identified in step 2.5 and step 3.5, on a map, connecting the marker points on the map according to the spatio-temporal correlation of the first video and the second video to obtain the vehicle travel track, and completing vehicle tracking according to the vehicle travel track.
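Step 2.1 / step 3.1 can be sketched as below. This is a minimal illustration assuming a known native frame rate and min-max normalization; the claim specifies neither the frame-selection rule nor the normalization scheme, so both are assumptions:

```python
def sample_indices(total_frames, native_fps, sample_fps=1):
    """Indices of the frames to grab so that roughly sample_fps frames
    per second are taken from a native_fps video stream."""
    step = max(1, round(native_fps / sample_fps))
    return list(range(0, total_frames, step))

def normalize(pixels):
    """Min-max normalize a flat list of pixel values into [0, 1]
    (one common reading of 'normalization'; the patent does not say)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0] * len(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]
```

For a 25 fps stream sampled at 1 fps, every 25th frame is taken; each sampled frame's pixels are then rescaled before being fed to the detector.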
2. The traffic vehicle identification and tracking method based on video analysis according to claim 1, characterized in that the improved Swin Transformer deep learning model is obtained by introducing an attention mechanism module into the Swin Transformer and introducing multiscale mixed convolution into the PatchEmbed of the Swin Transformer.
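One plausible reading of "multiscale mixed convolution" in PatchEmbed is mixing the responses of several kernel sizes before patch projection. The sketch below is a naive single-channel illustration using mean-filter kernels; the patent discloses neither the actual kernels, the scales, nor the mixing rule, so all of these are assumptions:

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive single-channel 2-D convolution with zero 'same' padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def multiscale_mixed_conv(img, scales=(3, 5, 7)):
    """Average the responses of mean filters at several receptive-field
    sizes -- one simple way of mixing multiscale convolution outputs."""
    responses = [conv2d_same(img, np.ones((s, s)) / (s * s)) for s in scales]
    return sum(responses) / len(responses)

# On a constant image, every interior response equals the constant.
img = np.ones((9, 9))
mixed = multiscale_mixed_conv(img)
```

In an actual Swin Transformer this mixing would feed the patch-projection layer; only the multiscale response combination is illustrated here.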
3. The traffic vehicle identification and tracking method based on video analysis according to claim 1, characterized in that step 2.2, automatically recognizing the first video frame image of step 2.1 with the improved Swin Transformer deep learning model to obtain grayscale image 1 containing the vehicle reference region, specifically comprises: inputting the normalized first video frame image into the improved Swin Transformer deep learning model for detection to obtain the image of the vehicle reference region of that video frame, and then binarizing it, with the pixel values of the vehicle region set to 1 and the pixel values of the non-vehicle region set to 0 in the binary image;
and step 3.2, automatically recognizing the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2 containing the vehicle reference region, specifically comprises: inputting the normalized second video frame image into the improved Swin Transformer deep learning model for detection to obtain the image of the vehicle reference region of that video frame, and then binarizing it, with the pixel values of the vehicle region set to 1 and the pixel values of the non-vehicle region set to 0 in the binary image.
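The binarization described in claim 3 can be sketched as follows; the threshold value is illustrative, and the detector's output is assumed to be a grayscale confidence map:

```python
def binarize_vehicle_mask(gray, thresh=128):
    """Set vehicle-region pixels to 1 and non-vehicle pixels to 0 by
    thresholding the grayscale detection output (gray: 2-D list)."""
    return [[1 if p >= thresh else 0 for p in row] for row in gray]

# Pixels at or above the threshold are treated as vehicle region.
mask = binarize_vehicle_mask([[0, 200], [130, 10]])
```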
4. The traffic vehicle identification and tracking method based on video analysis according to claim 1, characterized in that the spatio-temporal correlation specifically comprises a temporal sequence and a spatial position relationship.
5. The traffic vehicle identification and tracking method based on video analysis according to claim 1, characterized in that the image segmentation and fusion model is composed of a VGG network and a U-Net, wherein the VGG network consists of layer1, layer2, layer3, layer4 and layer5 of VGG16.
6. The traffic vehicle identification and tracking method based on video analysis according to claim 1 or 5, characterized in that the loss function Loss of the image segmentation and fusion model is custom-defined as:
(loss-function equation published as an image in the original document; not reproduced here)
The optimal network parameters are obtained by minimizing the loss function, where pred denotes the set of predicted values, true denotes the set of true values, α and γ are adjustment coefficients with α = 0.5, y denotes the label and n the number of classes (y_i = 1 if the sample belongs to class i, otherwise y_i = 0), p_i denotes the probability output for class i, and L_1 denotes the mean square error.
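Since the loss equation itself appears only as an image, the following is a hedged reconstruction from the symbol descriptions in claim 6: a focal-style classification term built from α, γ, the one-hot labels y_i and class probabilities p_i, plus a mean-square-error term L_1. The exact arrangement in the patent may differ:

```python
import math

def custom_loss(probs, labels, alpha=0.5, gamma=2.0):
    """probs: predicted class probabilities p_i; labels: one-hot y_i.
    Focal-style term (alpha, gamma) plus mean square error L_1 --
    a plausible reconstruction, not the patent's exact formula."""
    focal = -sum(alpha * (1.0 - p) ** gamma * math.log(max(p, 1e-12))
                 for y, p in zip(labels, probs) if y == 1)
    mse = sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(probs)
    return focal + mse
```

A perfect prediction yields zero loss; as the probability assigned to the true class drops, the focal term grows, with γ down-weighting already-easy examples.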
7. The traffic vehicle identification and tracking method based on video analysis according to claim 1, characterized in that recognizing and classifying, using MapReduce combined with the trained AlexNet model, the license plate reference images 1 corresponding to the partition images in license plate reference grayscale image 1 to obtain license plate targets, and marking the license plate targets in the first video frame image, specifically comprises: finding, in the first video frame, the license plate reference images 1 at the positions corresponding to the partition images of license plate reference grayscale image 1, normalizing each license plate reference image 1 and inputting it into the AlexNet model, recognizing each license plate target position, and marking each license plate target position with symbols in the first video frame image;
and recognizing and classifying, using MapReduce combined with the trained AlexNet model, the license plate reference images 2 corresponding to the partition images in license plate reference grayscale image 2 to obtain license plate targets, and marking the license plate targets with symbols in the second video frame image, specifically comprises: finding, in the second video frame, the license plate reference images 2 at the positions corresponding to the partition images of license plate reference grayscale image 2, normalizing each license plate reference image 2 and inputting it into the AlexNet model, recognizing each license plate target position, and marking each license plate target position with symbols in the second video frame image.
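Claim 7's mapping of partition images back to the original frame can be sketched as below, assuming each partition is described by an axis-aligned bounding box (the box format and the downstream AlexNet call are illustrative assumptions):

```python
def crop_plate_references(frame, boxes):
    """frame: 2-D list of pixels (the original video frame);
    boxes: (y0, y1, x0, x1) bounding box of each partition image.
    Returns the license plate reference images at the corresponding
    positions, ready for normalization and AlexNet classification."""
    return [[row[x0:x1] for row in frame[y0:y1]] for (y0, y1, x0, x1) in boxes]
```

Each crop would then be normalized and passed to the trained classifier; only the position-to-crop mapping is shown here.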
8. A traffic vehicle identification and tracking system based on video analysis, used to implement the traffic vehicle identification and tracking method based on video analysis according to any one of claims 1-7, and characterized by comprising:
an extraction module, configured to extract a first video and a second video, the first video and the second video having spatio-temporal correlation;
an identification module 1, configured to perform vehicle identification on the first video, comprising:
an acquisition module 1, configured to acquire video frame images of the first video at a frame rate of 1 frame per second and normalize them to obtain the first video frame image;
an improvement module 1, configured to automatically recognize the first video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 1, which contains the vehicle reference region;
a segmentation module 1, configured to construct an image segmentation and fusion model and use it to rapidly partition and segment grayscale image 1 of the vehicle reference region, obtaining license plate reference grayscale image 1 of the video image;
a classification module 1, configured to train an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; to recognize and classify, using MapReduce combined with the trained AlexNet model, the license plate reference images 1 corresponding to the partition images in license plate reference grayscale image 1, obtaining license plate targets; and to mark the license plate targets with symbols in the first video frame image, where license plate reference image 1 is the image at the position of license plate reference grayscale image 1 in the original first video frame;
a detection module 1, configured to judge the vehicle as successfully identified when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the first video stream;
an identification module 2, configured to perform vehicle identification on the second video, comprising:
an acquisition module 2, configured to acquire video frame images of the second video at a frame rate of 1 frame per second and normalize them to obtain the second video frame image;
an improvement module 2, configured to automatically recognize the second video frame image with the improved Swin Transformer deep learning model to obtain grayscale image 2, which contains the vehicle reference region;
a segmentation module 2, configured to construct an image segmentation and fusion model and use it to rapidly partition and segment grayscale image 2 of the vehicle reference region, obtaining license plate reference grayscale image 2 of the video image;
a classification module 2, configured to train an AlexNet model under a MapReduce framework to generate an AlexNet for image set classification; to recognize and classify, using MapReduce combined with the trained AlexNet model, the license plate reference images 2 corresponding to the partition images in license plate reference grayscale image 2, obtaining license plate targets; and to mark the license plate targets with symbols in the second video frame image, where license plate reference image 2 is the image at the position of license plate reference grayscale image 2 in the original second video frame;
a detection module 2, configured to judge the vehicle as successfully identified when the symbol mark of a license plate target has appeared continuously at the same or a nearby position for a preset number of frames of the second video stream;
and a tracking module, configured to mark the vehicle bearing the same license plate, as successfully identified by detection module 1 and detection module 2, on a map, connect the marker points on the map according to the spatio-temporal correlation of the first video and the second video to obtain the vehicle travel track, and complete vehicle tracking according to the vehicle travel track.
9. The traffic vehicle identification and tracking system based on video analysis according to claim 8, characterized in that the improved Swin Transformer deep learning model is obtained by introducing an attention mechanism module into the Swin Transformer and introducing multiscale mixed convolution into the PatchEmbed of the Swin Transformer.
10. A traffic vehicle identification and tracking system based on video analysis, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, characterized in that, when loaded into the processor, the computer program implements the traffic vehicle identification and tracking method based on video analysis according to any one of claims 1-7.
CN202210931367.XA 2022-08-04 2022-08-04 Traffic vehicle identification tracking method and system based on video analysis Active CN114998815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210931367.XA CN114998815B (en) 2022-08-04 2022-08-04 Traffic vehicle identification tracking method and system based on video analysis


Publications (2)

Publication Number Publication Date
CN114998815A true CN114998815A (en) 2022-09-02
CN114998815B CN114998815B (en) 2022-10-25

Family

ID=83023197


Country Status (1)

Country Link
CN (1) CN114998815B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836699A (en) * 2020-11-30 2021-05-25 爱泊车美好科技有限公司 Long-time multi-target tracking-based berth entrance and exit event analysis method
CN113705577A (en) * 2021-04-23 2021-11-26 中山大学 License plate recognition method based on deep learning
CN113936222A (en) * 2021-09-18 2022-01-14 北京控制工程研究所 Mars terrain segmentation method based on double-branch input neural network
CN114724131A (en) * 2022-04-01 2022-07-08 北京明略昭辉科技有限公司 Vehicle tracking method and device, electronic equipment and storage medium
CN114782901A (en) * 2022-06-21 2022-07-22 深圳市禾讯数字创意有限公司 Sand table projection method, device, equipment and medium based on visual change analysis


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171029A (en) * 2022-09-09 2022-10-11 山东省凯麟环保设备股份有限公司 Unmanned-driving-based method and system for segmenting instances in urban scene
CN115171029B (en) * 2022-09-09 2022-12-30 山东省凯麟环保设备股份有限公司 Unmanned-driving-based method and system for segmenting instances in urban scene
CN115844424A (en) * 2022-10-17 2023-03-28 北京大学 Sleep spindle wave grading identification method and system
CN115844424B (en) * 2022-10-17 2023-09-22 北京大学 Sleep spindle wave hierarchical identification method and system

Also Published As

Publication number Publication date
CN114998815B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN108171112B (en) Vehicle identification and tracking method based on convolutional neural network
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN114998815B (en) Traffic vehicle identification tracking method and system based on video analysis
CN110119726B (en) Vehicle brand multi-angle identification method based on YOLOv3 model
Zhang et al. Ripple-GAN: Lane line detection with ripple lane line detection network and Wasserstein GAN
Dehghan et al. View independent vehicle make, model and color recognition using convolutional neural network
Zhang et al. Study on traffic sign recognition by optimized Lenet-5 algorithm
Nandi et al. Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study
CN103093201B (en) Vehicle-logo location recognition methods and system
CN112966631A (en) License plate detection and identification system and method under unlimited security scene
Ye et al. A two-stage real-time YOLOv2-based road marking detector with lightweight spatial transformation-invariant classification
Naufal et al. Preprocessed mask RCNN for parking space detection in smart parking systems
CN111008574A (en) Key person track analysis method based on body shape recognition technology
Hatolkar et al. A survey on road traffic sign recognition system using convolution neural network
Zaarane et al. Real‐Time Vehicle Detection Using Cross‐Correlation and 2D‐DWT for Feature Extraction
Gad et al. Real-time lane instance segmentation using SegNet and image processing
Latha et al. Image understanding: semantic segmentation of graphics and text using faster-RCNN
Lee et al. License plate detection via information maximization
CN117475353A (en) Video-based abnormal smoke identification method and system
CN114359493B (en) Method and system for generating three-dimensional semantic map for unmanned ship
Zhang et al. A front vehicle detection algorithm for intelligent vehicle based on improved gabor filter and SVM
CN111986233A (en) Large-scene minimum target remote sensing video tracking method based on feature self-learning
Kodwani et al. Automatic license plate recognition in real time videos using visual surveillance techniques
Lokkondra et al. DEFUSE: deep fused end-to-end video text detection and recognition
CN111178158B (en) Rider detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant