CN109190444B - Method for realizing video-based toll lane vehicle feature recognition system - Google Patents

Method for realizing video-based toll lane vehicle feature recognition system

Info

Publication number
CN109190444B
Authority
CN
China
Prior art keywords
vehicle
target
feature
feature map
video
Prior art date
Legal status
Active
Application number
CN201810705071.XA
Other languages
Chinese (zh)
Other versions
CN109190444A (en
Inventor
阮雅端
赵博睿
陈林凯
葛嘉琦
陈启美
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201810705071.XA
Publication of CN109190444A
Application granted
Publication of CN109190444B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G07: CHECKING-DEVICES
    • G07B: TICKET-ISSUING APPARATUS; FARE-REGISTERING APPARATUS; FRANKING APPARATUS
    • G07B15/00: Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points
    • G07B15/06: Arrangements for road pricing or congestion charging of vehicles or vehicle users, e.g. automatic toll systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for realizing a video-based toll lane vehicle feature recognition system comprising three modules: a vehicle detection module, a vehicle tracking module and a vehicle feature recognition module. The invention uses an SSD target detector for detection, tracks vehicles by comparing feature map histograms and positions, and performs vehicle feature recognition on the feature maps with a convolutional neural network. The method recognizes vehicle features effectively, runs in real time, reduces repeated consumption of computing resources and improves the accuracy of the system.

Description

Method for realizing video-based toll lane vehicle feature recognition system
Technical Field
The invention belongs to the technical field of image processing and computer vision detection, relates to the application of target detection and deep learning algorithms to vehicle detection, and discloses a method for realizing a video-based toll lane vehicle feature recognition system.
Background
Expressway construction in China has developed rapidly, and expressway transportation has become one of the main modes of land freight. It offers advantages such as high speed and stability. However, toll evasion on expressway toll lanes is increasingly serious: many vehicles are clearly buses, yet they are fitted with ETC toll devices registered for cars and are charged at the car rate when passing through a toll lane. As deep learning and target detection technologies mature, automatic detection and feature recognition of toll lane vehicles has become an important research topic for intelligent traffic systems; in expressway toll lane management it can effectively reduce manpower and efficiently curb toll evasion. However, vehicle detection and feature recognition in toll lanes place high demands on the real-time performance and accuracy of the system. If real-time performance is inadequate, the system cannot be used normally; if accuracy is inadequate, the system easily produces a large number of misjudgments and disrupts the normal operation of the toll lane. Improving real-time performance and accuracy simultaneously is therefore very important; it is also a major direction of current research, with significant value for toll lane intelligent traffic systems.
At present, most vehicle feature recognition systems model the background of the toll lane video with a Gaussian mixture background subtraction algorithm (GMBSD) to realize vehicle detection and tracking, but this method has low accuracy under congestion and lacks generality. Many deep-learning target detection algorithms, such as Fast R-CNN and SSD, achieve better detection accuracy, but their real-time performance is low and they cannot be deployed on a large scale effectively and economically. Moreover, without a subsequent vehicle tracking algorithm, such a system easily performs repeated feature recognition on the same vehicle; even if tracking and feature recognition algorithms are simply appended, the real-time performance of the system remains low and large-scale deployment is still difficult.
Disclosure of Invention
The problem the invention aims to solve is: for toll lane vehicle feature recognition, the methods adopted by existing systems cannot balance accuracy, real-time performance and economy, and cannot meet the requirements of large-scale deployment with accurate, real-time recognition. The invention aims to improve the real-time performance of existing vehicle feature recognition systems without sacrificing accuracy; to realize a target tracking method for the vehicle feature recognition task that reduces repeated feature recognition; and to perform vehicle feature recognition directly on the feature maps obtained during detection, further improving the real-time performance of the system.
The technical scheme of the invention is as follows: a toll lane vehicle feature recognition method based on videos comprises the following three steps of vehicle detection, vehicle tracking and vehicle feature recognition:
step S1, vehicle detection is carried out on the toll lane video based on the deep learning method; the feature map of each detected vehicle is normalized, pooled and stored, and the position and category information of each vehicle is stored at the same time:
s1.1) training a convolutional neural network for vehicle detection, and classifying the detected vehicles into 3 types, namely a bus, a truck and a car;
s1.2) each frame of the toll lane video is detected with the convolutional neural network; the detection output comprises the position and category of each vehicle, where the position refers to the vehicle's center-point coordinates, width and height, and the category is one of the 3 classes;
s1.3) the feature maps of the video images of the detected vehicles are normalized and pooled to obtain sub-feature maps, the vehicle positions and the vehicle types are saved as detection information, an ID is used as an index for each vehicle, and the saved information is expressed as:
content(id)={featuremap,loc,class} (1)
in the formula, featuremap represents the feature map, a 3x3x256-dimensional tensor; loc = (x, y, w, h) represents the position information, the four items being the center-point abscissa, center-point ordinate, vehicle width and vehicle height, all valued between 0 and 1; class = (cls1, cls2, cls3) represents the vehicle category, the three items being the cumulative number of frames so far in which the target has been identified as a car, a bus and a truck, respectively;
step S2, comparing the feature map similarity and the position of the detection information of the current frame and the detection information of the previous frame, marking the vehicles with similar comparison results as the same vehicle, and realizing the vehicle tracking function:
s2.1) the vehicle detection information of the previous frame is compared one by one with that of the current frame; a target whose feature map similarity and position distance both meet the set thresholds is regarded as the same vehicle target. For such a target, the vehicle's ID in the current frame is changed to its ID in the previous frame, and the record is updated with the current frame's detection information; this continues until the vehicle target no longer appears in the video frames, realizing target tracking. At that point the same vehicle corresponds to the same ID across multiple frames, and its detection information is that of the last video frame in which it was detected. If a target in the current frame did not appear in the previous frame, it is regarded as a vehicle newly appearing in the video, its ID in the current frame is taken as the vehicle ID, and a new track is started;
s2.2) carrying out weighted average on the current category and the categories of all historical frames belonging to the same vehicle target to obtain the final category of the vehicle, wherein the category average method is represented as follows:
cls=argmax(cls1,cls2,cls3) (3)
where argmax returns the index of the maximum value;
step S3, when a tracked vehicle passes through a polygonal region of interest marked in the video in advance, the normalized sub-feature map corresponding to the vehicle target is input into two deep learning sub-networks for vehicle type recognition and color recognition respectively, and all feature information is stored, realizing the toll lane vehicle feature recognition function:
s3.1) the position information of all vehicle targets in the current frame is examined; if a target lies in the region of interest, the sub-feature map corresponding to that target is extracted. Whether a target lies in the region of interest is judged as follows: traverse the vertices of the region-of-interest polygon in order; if the total area of the sub-triangles formed by the region-of-interest vertices and the vehicle center point equals the area of the polygon itself, the point lies inside the region of interest; otherwise it lies outside. The discriminant is expressed as:
area = Area(P, R_1, R_2) + Area(P, R_2, R_3) + … + Area(P, R_n, R_1)    (4)
area′ = Area(R_1, R_2, R_3) + Area(R_1, R_3, R_4) + … + Area(R_1, R_{n-1}, R_n)
in the formula, Area denotes the area of a triangle, P denotes the center point of the target, R_i denotes the ith vertex of the polygon in clockwise order, and n denotes the number of polygon vertices; if area equals area′, the target lies inside the polygon;
s3.2) the obtained sub-feature map is passed through two convolutional neural networks to respectively obtain color information and vehicle type classification information, and the two convolutional neural networks use the collected toll lane video as training data for identifying the vehicle color and vehicle type information;
and S3.3) the color and vehicle type information are stored under the corresponding vehicle ID; if a target with the same ID appears in subsequent frames, the color and vehicle type information will not be stored again, completing the recognition of toll lane vehicle features.
Preferably, the deep learning method used in step S1 is specifically:
carrying out vehicle detection on the frame images of the toll lane video with a Single Shot MultiBox Detector algorithm, the input being a 300x300 color image, wherein the convolutional neural network structure is specifically:
(1) the feature map scales used for detection are 10x10, 5x5, 3x3 and 1x1;
(2) convolution kernels of sizes 5x5, 3x3 and 1x1 are used for detection and connected in parallel; the kernels at the three scales are zero-padded so that the feature maps after convolution have the same size, the corresponding zero-padding scales being 2, 1 and 0 respectively;
(3) the loss function used in training is divided into a location loss and a category loss, and the loss function is expressed as:
loss = loss_loc * 0.8 + loss_class * 0.2    (5)
In the formula, loss_loc represents the position regression loss, computed with the smoothL1 function; loss_class represents the classification loss, computed with the SoftMax function.
Further, the method for normalizing and pooling the feature map used in step S1 specifically includes:
firstly, the feature map with scale 38x38 is selected as the reference feature map, and the vehicle size is mapped onto it to obtain a sub-feature map; the mapped sub-feature map is pooled with a variable pooling step and pooling kernel so that the pooled output feature maps have a unified size of 3x3; the pooling step and pooling kernel size are uniquely determined by the size of the sub-feature map, determined as:
s_w = ⌊W/3⌋, s_h = ⌊H/3⌋    (6)
In the formula, W and H are the width and height of the sub-feature map; the horizontal pooling step equals the pooling kernel width, both being s_w; the vertical pooling step equals the pooling kernel height, both being s_h; ⌊ ⌋ denotes rounding down.
Further, in S2.1), the comparison for target tracking includes feature map similarity comparison and position distance comparison. The feature map similarity comparison computes the feature histogram distance: the smaller the distance, the higher the similarity. The position distance is the Euclidean distance between center points, expressed as:
d = √((x1 − x2)² + (y1 − y2)²)    (2)
In the formula, (x1, y1) and (x2, y2) are the center-point coordinates of a vehicle in the current frame and of a vehicle in the previous frame, respectively.
The invention has the following beneficial effects:
To recognize the important features of vehicles passing through a toll lane, a deep-learning target detection algorithm is combined with a target tracking algorithm based on feature map comparison to obtain a sub-feature map of each vehicle, and features such as color and category are recognized from the sub-feature map with deep learning algorithms;
The invention improves the vehicle detection algorithm: in the SSD target detection algorithm, feature maps with low utilization are removed, saving detection time and improving the real-time performance of the system; the training loss and the convolution kernel scales are modified for this task, improving the accuracy of the system.
The method considers the real-time performance and the accuracy of the system together: redundant parts of the target detector are cut out, and the accuracy of vehicle detection is improved by modifying the network structure and the loss function; a tracking algorithm for detected vehicles is realized by combining feature map histogram comparison with position comparison, giving better robustness; finally, the system recognizes the color, vehicle type and other features of each vehicle with a unique ID, and the input used is not a picture but the sub-feature map obtained from the detection network, improving the real-time performance of the system and the utilization of the network parameters. The system therefore has good real-time performance and effectiveness.
Drawings
FIG. 1 is a system framework diagram of the present invention.
Fig. 2 is a schematic diagram of the deep learning method used in step S1, i.e., the SSD network structure.
FIG. 3 is a schematic diagram of the structure of the SSD detection convolution kernel in step S1 according to the present invention.
FIG. 4 is a schematic diagram of the normalized pooling algorithm of the present invention.
FIG. 5 is a schematic diagram of a method for determining a relationship between a point and a polygon, where (a) is a point outside a graph and (b) is a point inside a graph.
FIG. 6 is a schematic diagram of a convolutional neural network for color and vehicle type recognition in the present invention.
FIG. 7 illustrates the effect of each step of the invention: (a) input image; (b) detection result; (c) tracking result; (d) pictures corresponding to the vehicle sub-feature maps; (e) vehicle feature recognition result.
Detailed Description
The invention provides a method for realizing a video-based toll lane vehicle feature recognition system, which can effectively realize vehicle detection and tracking, avoid repeated feature recognition on the same vehicle and further improve the accuracy and the real-time performance of vehicle feature recognition.
The invention is further illustrated with reference to the figures and examples.
The technical scheme of the invention is an implementation method for a video-based toll lane vehicle feature recognition system. As shown in fig. 1, the method comprises three parts, vehicle detection, vehicle tracking and vehicle feature recognition, with the following steps:
step S1: as shown in fig. 7(a), vehicle detection based on the deep learning method is performed on each frame; the feature maps of detected vehicles are normalized, pooled and stored, together with each vehicle's position information:
s1.1) frames are captured and vehicles annotated from part of the collected toll lane videos; the resulting picture data are used to train a convolutional neural network, which divides detected vehicles into 3 types: buses, trucks and cars;
s1.2) each frame is detected with the trained convolutional neural network; the result is the position and category of each vehicle, where the position is represented by the center-point coordinates, width and height, and the category is one of the 3 classes, as shown in FIG. 7(b);
s1.3) the feature maps of the video images of detected vehicles are normalized and pooled to obtain sub-feature maps; the vehicle position and category are saved as detection information, with an ID as the index for each vehicle. The saved information is expressed as:
content(id)={featuremap,loc,class} (1)
in the formula, featuremap represents the feature map, a 3x3x256-dimensional tensor; loc = (x, y, w, h) represents the position information, the four items being the center-point abscissa, center-point ordinate, vehicle width and vehicle height, all valued between 0 and 1; class = (cls1, cls2, cls3) represents the vehicle category, the three items being the cumulative number of frames so far in which the target has been identified as a car, a bus and a truck, respectively.
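For illustration only, a minimal Python sketch of how the record of formula (1) might be organized; the names VehicleRecord and save_detection, and the dictionary keyed by tracking ID, are hypothetical, not part of the patent:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class VehicleRecord:
    """Mirrors content(id) = {featuremap, loc, class} from formula (1)."""
    featuremap: np.ndarray   # pooled sub-feature map, 3x3x256
    loc: tuple               # (x, y, w, h), all normalized to [0, 1]
    cls_counts: np.ndarray = field(default_factory=lambda: np.zeros(3))  # frames seen as [car, bus, truck]

records: dict = {}           # tracking ID -> VehicleRecord

def save_detection(vid, featuremap, loc, cls_index):
    """Create or refresh the record for vehicle `vid` with the current frame."""
    if vid not in records:
        records[vid] = VehicleRecord(featuremap, loc)
    rec = records[vid]
    rec.featuremap, rec.loc = featuremap, loc   # keep the latest detection
    rec.cls_counts[cls_index] += 1              # accumulate one class vote
```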
Step S2: the detected-vehicle pictures are ordered according to the video, and vehicle tracking compares the current frame with the previous frame: the feature map similarity and position of the two frames' detection information are compared, and vehicles with similar comparison results are marked as the same vehicle, realizing the vehicle tracking function as shown in fig. 7(c);
s2.1) the vehicle detection information of the previous frame is compared one by one with that of the current frame; a target whose feature map similarity and position distance both meet the set thresholds is regarded as the same vehicle target. For such a target, the vehicle's ID in the current frame is changed to its ID in the previous frame, and the record is updated with the current frame's detection information; this continues until the vehicle target no longer appears in the video frames, realizing target tracking. At that point the same vehicle corresponds to the same ID across multiple frames, and its detection information is that of the last video frame in which it was detected. If a target in the current frame did not appear in the previous frame, it is regarded as a vehicle newly appearing in the video, its ID in the current frame is taken as the vehicle ID, and a new track is started;
the feature map similarity contrast method is to calculate the distance of the feature histogram, and the smaller the distance, the higher the similarity, the feature histogram statistical method is similar to the color histogram statistics, and the difference is that the statistical channel component is changed from the color three-channel value to the 256-channel feature value. The distance calculation method is Euclidean distance and is expressed as:
d = √((x1 − x2)² + (y1 − y2)²)    (2)
In the formula, (x1, y1) and (x2, y2) are the center-point coordinates of a vehicle in the current frame and of a vehicle in the previous frame, respectively. After the comparison results are obtained, targets whose similarity and position comparisons both meet the set thresholds are regarded as the same vehicle target;
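A sketch of the matching test, reusing the hypothetical VehicleRecord above; the histogram binning (one bin per feature channel) and the threshold values are assumptions, since the patent specifies neither:

```python
import numpy as np

def feature_histogram(featuremap):
    """Aggregate the 3x3x256 sub-feature map into a 256-bin histogram,
    one bin per feature channel (analogous to a color histogram with 256
    channels instead of 3; the exact binning is an assumption)."""
    hist = featuremap.reshape(-1, featuremap.shape[-1]).sum(axis=0)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def is_same_vehicle(prev, cur, hist_thresh=0.5, dist_thresh=0.1):
    """Match two detections when both the histogram distance and the
    center-point Euclidean distance of formula (2) fall below thresholds
    (the threshold values here are placeholders)."""
    hist_dist = np.linalg.norm(feature_histogram(prev.featuremap)
                               - feature_histogram(cur.featuremap))
    d = np.hypot(prev.loc[0] - cur.loc[0], prev.loc[1] - cur.loc[1])
    return hist_dist < hist_thresh and d < dist_thresh
```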
s2.2) carrying out weighted average on the current category and the categories of all historical frames belonging to the same target to obtain the final category of the vehicle, wherein the category average method is represented as:
cls=argmax(cls1,cls2,cls3) (3)
in the formula, argmax returns the index of the maximum value, and cls1, cls2 and cls3 are the three components of class in formula (1).
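A one-line realization of formula (3) over the accumulated frame counts, with hypothetical names:

```python
import numpy as np

def final_class(cls_counts):
    """Formula (3): the final label is the class with the most frame votes,
    cls = argmax(cls1, cls2, cls3)."""
    return ("car", "bus", "truck")[int(np.argmax(cls_counts))]

# e.g. a target seen as a bus in 17 frames and as a truck in 2:
print(final_class(np.array([0, 17, 2])))   # -> "bus"
```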
Step S3: when the tracked vehicle passes through a polygonal region of interest marked manually in advance, the normalized sub-feature map (step S1.3) corresponding to the vehicle target is input into two deep learning sub-networks for vehicle type recognition and color recognition respectively, and all feature information is stored. This realizes the toll lane vehicle feature recognition function:
and S3.1) the position information of all vehicle targets in the current frame is examined; if a target lies in the region of interest, the sub-feature map corresponding to that target is extracted (the original image corresponding to a sub-feature map is shown in FIG. 7(d)); the sub-feature map is regarded as a feature representation of the vehicle information, analogous to a color histogram. As shown in fig. 5, whether the target lies in the region of interest is judged by computing sub-triangle areas: traverse the vertices of the region-of-interest polygon in order; if the total area of the sub-triangles formed by the polygon vertices and the vehicle center point equals the area of the polygon, the point lies inside the region of interest, otherwise outside. The discriminant is expressed as:
area = Area(P, R_1, R_2) + Area(P, R_2, R_3) + … + Area(P, R_n, R_1)    (4)
area′ = Area(R_1, R_2, R_3) + Area(R_1, R_3, R_4) + … + Area(R_1, R_{n-1}, R_n)
In the formula, Area denotes the area of a triangle, P the center point of the target, R_i the ith vertex of the polygon in clockwise order, and n the number of polygon vertices. If area equals area′, the target lies inside the polygon;
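A sketch of the triangle-area test; the shoelace area formula is standard, and the test as stated assumes a convex region of interest:

```python
def triangle_area(a, b, c):
    """Area of triangle abc via the cross-product (shoelace) formula."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def inside_region(p, poly, eps=1e-9):
    """Formula (4): sum the fan of triangles (P, R_i, R_{i+1}) over all edges
    and compare with the polygon's own area area'; equality (within eps)
    means P lies inside."""
    n = len(poly)
    area = sum(triangle_area(p, poly[i], poly[(i + 1) % n]) for i in range(n))
    area_ref = sum(triangle_area(poly[0], poly[i], poly[i + 1])
                   for i in range(1, n - 1))
    return abs(area - area_ref) < eps
```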
and S3.2) as shown in FIG. 7(e), the obtained sub-feature map is passed through two convolutional neural networks to obtain color information and vehicle type classification information respectively; both networks are trained on the collected toll lane video. The procedure is similar to S1.1, except that the inputs of these two convolutional neural networks are not images but feature maps. The convolutional neural network structure is shown in fig. 6. There are 8 color types: black, white, red, yellow, blue, green, brown and silver; and 76 vehicle types: BMW, Volkswagen, etc. (a sketch of such a sub-network follows);
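A minimal sketch of one such sub-network, consuming the 3x3x256 sub-feature map rather than an image; the layer sizes are illustrative assumptions, since fig. 6 is not reproduced here:

```python
import torch
import torch.nn as nn

class FeatureClassifier(nn.Module):
    """Classifies a pooled 3x3x256 sub-feature map; num_classes = 8 for the
    color network and 76 for the vehicle-type network. Hidden sizes are
    assumptions (the actual structure is given in fig. 6)."""
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1),  # mix channels at each cell
            nn.ReLU(inplace=True),
            nn.Flatten(),                        # 128 * 3 * 3 = 1152 features
            nn.Linear(128 * 3 * 3, num_classes),
        )

    def forward(self, x):                        # x: (N, 256, 3, 3)
        return self.net(x)

color_net = FeatureClassifier(num_classes=8)     # black, white, red, ...
type_net = FeatureClassifier(num_classes=76)     # BMW, Volkswagen, ...
```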
s3.3) all feature information is stored, each vehicle having a unique ID; if targets with the same ID appear in subsequent frames, they will not be stored again.
Further, in the above scheme, the deep learning algorithm used in step S1 is specifically as follows:
A Single Shot MultiBox Detector (SSD) algorithm performs vehicle detection on the toll lane video frame images. The input to the SSD is a 300x300 color image. As shown in fig. 2, the SSD network structure is modified for the toll lane vehicle detection problem as follows:
(1) The feature map scales used for detection are 10x10, 5x5, 3x3 and 1x1; the 19x19 and 38x38 feature maps originally used are deleted. Toll lane vehicle feature recognition only needs to detect vehicles passing through the toll lane, and these vehicles occupy a dominant part of the camera's view, so their scale is generally large. The 19x19 and 38x38 feature maps serve small-target detection, so they can be deleted in this vehicle detection task, improving the real-time performance of the model.
(2) As shown in fig. 3, the convolution kernels used for detection are changed to parallel 5x5, 3x3 and 1x1 kernels; padding of different sizes is applied to the three kernel scales so that the feature maps after convolution have the same size and can be fused. The corresponding padding for the three scales is 2, 1 and 0 respectively. This structure is similar to the Inception structure; its purpose is to better extract feature information over multiple receptive fields and improve the accuracy of the network (a code sketch follows item (3) below).
(3) The loss function used in training is divided into a position loss and a category loss, and the weight of the position loss is increased so that the detected positions are more accurate. The redefined loss function can be expressed as:
loss = loss_loc * 0.8 + loss_class * 0.2    (5)
In the formula, loss_loc represents the position regression loss, computed with the smoothL1 function; loss_class represents the classification loss, computed with the SoftMax function.
Since there are only 3 classification targets, the classification task is simpler than the position regression task; appropriately reducing the weight of the classification loss therefore does not reduce detection accuracy, while the increased weight of the position loss improves localization accuracy.
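For concreteness, a minimal PyTorch sketch of modifications (2) and (3); the channel widths are illustrative assumptions, and the matching of default boxes to ground truth is assumed to happen upstream, as in standard SSD training:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelDetectionHead(nn.Module):
    """Modification (2): Inception-like head with 5x5, 3x3 and 1x1 branches
    in parallel, zero-padded by 2, 1 and 0 so every branch keeps the input's
    spatial size and the outputs can be fused by channel concatenation."""
    def __init__(self, in_ch=256, branch_ch=128):
        super().__init__()
        self.branch5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)
        self.branch3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.branch1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1, padding=0)

    def forward(self, x):
        # identical spatial sizes, so fusion is a simple channel concat
        return torch.cat([self.branch5(x), self.branch3(x), self.branch1(x)], dim=1)

def detection_loss(loc_pred, loc_gt, cls_logits, cls_gt):
    """Modification (3), formula (5): loss = loss_loc * 0.8 + loss_class * 0.2,
    with smoothL1 for position regression and SoftMax cross-entropy for
    classification."""
    loss_loc = F.smooth_l1_loss(loc_pred, loc_gt)
    loss_cls = F.cross_entropy(cls_logits, cls_gt)   # softmax + NLL
    return 0.8 * loss_loc + 0.2 * loss_cls

# a 10x10 detection feature map keeps its 10x10 size on every branch:
y = ParallelDetectionHead()(torch.randn(1, 256, 10, 10))   # shape (1, 384, 10, 10)
```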
The method for normalizing and pooling the feature map used in the step S1 specifically comprises the following steps:
Firstly, the feature map with scale 38x38 is selected as the reference feature map, and the size of the target, i.e. the vehicle size, is mapped onto the reference feature map to obtain a sub-feature map. This scale is chosen because: (1) the semantic level of this feature map is low, so vehicles of the same category can still be effectively distinguished; (2) the largest feature map used for detection is 10x10, and taking the 38x38 feature map as reference ensures that the sub-feature map corresponding to a target is no smaller than 3x3. The mapped sub-feature map is pooled with a variable pooling step and pooling kernel so that the pooled output feature maps have a unified size of 3x3. As shown in fig. 4, the pooling step and pooling kernel size are uniquely determined by the size of the sub-feature map, as follows:
s_w = ⌊W/3⌋, s_h = ⌊H/3⌋    (6)
In the formula, W and H are the width and height of the sub-feature map; the horizontal pooling step equals the pooling kernel width, both being s_w; the vertical pooling step equals the pooling kernel height, both being s_h; ⌊ ⌋ denotes rounding down.
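A sketch of the normalized pooling under formula (6); max pooling is an assumption, since the patent does not name the pooling operator, and the axis layout (W, H, 256) is illustrative:

```python
import numpy as np

def normalized_pool(sub_map):
    """Formula (6): pool a (W, H, 256) sub-feature map down to (3, 3, 256)
    with stride equal to kernel size, s_w = floor(W/3), s_h = floor(H/3).
    The reference-map choice above guarantees W, H >= 3."""
    W, H, C = sub_map.shape
    sw, sh = W // 3, H // 3
    out = np.empty((3, 3, C), dtype=sub_map.dtype)
    for i in range(3):
        for j in range(3):
            win = sub_map[i * sw:(i + 1) * sw, j * sh:(j + 1) * sh, :]
            out[i, j, :] = win.max(axis=(0, 1))   # max pooling over the window
    return out

# e.g. a vehicle covering 7x11 cells of the 38x38 reference map:
print(normalized_pool(np.random.rand(7, 11, 256)).shape)   # (3, 3, 256)
```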
Through the above implementation, vehicle feature recognition for the toll lane video is realized.

Claims (4)

1. A method for realizing a video-based toll lane vehicle feature recognition system is characterized by comprising the following three steps of vehicle detection, vehicle tracking and vehicle feature recognition:
step S1, vehicle detection is carried out on the toll lane video based on the deep learning method; the feature map of each detected vehicle is normalized, pooled and stored, and the position and category information of each vehicle is stored at the same time:
s1.1) training a convolutional neural network for vehicle detection, and classifying the detected vehicles into 3 types, namely a bus, a truck and a car;
s1.2) each frame of the toll lane video is detected with the convolutional neural network; the detection output comprises the position and category of each vehicle, where the position refers to the vehicle's center-point coordinates, width and height, and the category is one of the 3 classes;
s1.3) the feature maps of the video images of the detected vehicles are normalized and pooled to obtain sub-feature maps, the vehicle positions and the vehicle types are saved as detection information, an ID is used as an index for each vehicle, and the saved information is expressed as:
content(id)={featuremap,loc,class} (1)
in the formula, featuremap represents the feature map, a 3x3x256-dimensional tensor; loc = (x, y, w, h) represents the position information, the four items being the center-point abscissa, center-point ordinate, vehicle width and vehicle height, all valued between 0 and 1; class = (cls1, cls2, cls3) represents the vehicle category, the three items being the cumulative number of frames so far in which the target has been identified as a car, a bus and a truck, respectively;
step S2, comparing the feature map similarity and the position of the detection information of the current frame and the detection information of the previous frame, marking the vehicles with similar comparison results as the same vehicle, and realizing the vehicle tracking function:
s2.1) the vehicle detection information of the previous frame is compared one by one with that of the current frame; a target whose feature map similarity and position distance both meet the set thresholds is regarded as the same vehicle target. For such a target, the vehicle's ID in the current frame is changed to its ID in the previous frame, and the record is updated with the current frame's detection information; this continues until the vehicle target no longer appears in the video frames, realizing target tracking. At that point the same vehicle corresponds to the same ID across multiple frames, and its detection information is that of the last video frame in which it was detected. If a target in the current frame did not appear in the previous frame, it is regarded as a vehicle newly appearing in the video, its ID in the current frame is taken as the vehicle ID, and a new track is started;
s2.2) carrying out weighted average on the current category and the categories of all historical frames belonging to the same vehicle target to obtain the final category of the vehicle, wherein the category average method is represented as follows:
cls=argmax(cls1,cls2,cls3) (3)
where argmax returns the index of the maximum value;
step S3, when a tracked vehicle passes through a polygonal region of interest marked in the video in advance, the normalized sub-feature map corresponding to the vehicle target is input into two deep learning sub-networks for vehicle type recognition and color recognition respectively, and all feature information is stored, realizing the toll lane vehicle feature recognition function:
s3.1) the position information of all vehicle targets in the current frame is examined; if a target lies in the region of interest, the sub-feature map corresponding to that target is extracted; whether the target lies in the region of interest is judged as follows: traverse the vertices of the region-of-interest polygon in order; if the total area of the sub-triangles formed by the region-of-interest vertices and the vehicle center point equals the area′ of the polygon, the point lies inside the region of interest, otherwise outside; the discriminant is expressed as:
area = Area(P, R_1, R_2) + Area(P, R_2, R_3) + … + Area(P, R_n, R_1)    (4)
area′ = Area(R_1, R_2, R_3) + Area(R_1, R_3, R_4) + … + Area(R_1, R_{n-1}, R_n)
where Area denotes the area of a triangle, P denotes the center point of the target, R_i denotes the ith vertex of the polygon in clockwise order, and n denotes the number of polygon vertices; if area equals area′, the target lies inside the polygon;
s3.2) the obtained sub-feature map is passed through two convolutional neural networks to respectively obtain color information and vehicle type classification information, and the two convolutional neural networks use the collected toll lane video as training data for identifying the vehicle color and vehicle type information;
and S3.3) the color and vehicle type information are stored under the corresponding vehicle ID; if a target with the same ID appears in subsequent frames, the color and vehicle type information will not be stored again, completing the recognition of toll lane vehicle features.
2. The method for implementing a video-based toll lane vehicle feature recognition system as claimed in claim 1, wherein the deep learning method used in step S1 is specifically:
carrying out vehicle detection on the frame images of the toll lane video with a Single Shot MultiBox Detector algorithm, the input being a 300x300 color image, wherein the convolutional neural network structure is specifically:
(1) the feature map scales used for detection are 10x10, 5x5, 3x3 and 1x1;
(2) convolution kernels of sizes 5x5, 3x3 and 1x1 are used for detection and connected in parallel; the kernels at the three scales are zero-padded so that the feature maps after convolution have the same size, the corresponding zero-padding scales being 2, 1 and 0 respectively;
(3) the loss function used in training is divided into a position regression loss and a classification loss, and the loss function is expressed as:
loss = loss_loc * 0.8 + loss_class * 0.2    (5)
In the formula, loss_loc represents the position regression loss, computed with the smoothL1 function; loss_class represents the classification loss, computed with the SoftMax function.
3. The method for implementing a video-based toll lane vehicle feature recognition system as claimed in claim 1, wherein the feature map normalization pooling method used in step S1 is specifically:
firstly, the feature map with scale 38x38 is selected as the reference feature map, and the vehicle size is mapped onto it to obtain a sub-feature map; the mapped sub-feature map is pooled with a variable pooling step and pooling kernel so that the pooled output feature maps have a unified size of 3x3; the pooling step and pooling kernel size are uniquely determined by the size of the sub-feature map, determined as:
s_w = ⌊W/3⌋, s_h = ⌊H/3⌋    (6)
In the formula, W and H are the width and height of the sub-feature map; the horizontal pooling step equals the pooling kernel width, both being s_w; the vertical pooling step equals the pooling kernel height, both being s_h; ⌊ ⌋ denotes rounding down.
4. The method for implementing the video-based toll lane vehicle feature recognition system according to claim 1, wherein in S2.1) the comparison for target tracking includes feature map similarity comparison and position distance comparison; the feature map similarity comparison computes the feature histogram distance, where the smaller the distance, the higher the similarity; the position distance is the Euclidean distance between center points, expressed as:
d = √((x1 − x2)² + (y1 − y2)²)    (2)
In the formula, (x1, y1) and (x2, y2) are the center-point coordinates of a vehicle in the current frame and of a vehicle in the previous frame, respectively.
CN201810705071.XA 2018-07-02 2018-07-02 Method for realizing video-based toll lane vehicle feature recognition system Active CN109190444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810705071.XA CN109190444B (en) 2018-07-02 2018-07-02 Method for realizing video-based toll lane vehicle feature recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810705071.XA CN109190444B (en) 2018-07-02 2018-07-02 Method for realizing video-based toll lane vehicle feature recognition system

Publications (2)

Publication Number Publication Date
CN109190444A CN109190444A (en) 2019-01-11
CN109190444B true CN109190444B (en) 2021-05-18

Family

ID=64948776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810705071.XA Active CN109190444B (en) 2018-07-02 2018-07-02 Method for realizing video-based toll lane vehicle feature recognition system

Country Status (1)

Country Link
CN (1) CN109190444B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886312B (en) * 2019-01-28 2023-06-06 同济大学 Bridge vehicle wheel detection method based on multilayer feature fusion neural network model
CN109902733A (en) * 2019-02-22 2019-06-18 北京三快在线科技有限公司 The method, apparatus and storage medium of typing Item Information
CN110223279B (en) * 2019-05-31 2021-10-08 上海商汤智能科技有限公司 Image processing method and device and electronic equipment
CN110516703A (en) * 2019-07-18 2019-11-29 平安科技(深圳)有限公司 Vehicle identification method, device and storage medium based on artificial intelligence
CN110517293A (en) * 2019-08-29 2019-11-29 京东方科技集团股份有限公司 Method for tracking target, device, system and computer readable storage medium
CN110555867B (en) * 2019-09-05 2023-07-07 杭州智爱时刻科技有限公司 Multi-target object tracking method integrating object capturing and identifying technology
CN110569785B (en) * 2019-09-05 2023-07-11 杭州智爱时刻科技有限公司 Face recognition method integrating tracking technology
CN111523419A (en) * 2020-04-13 2020-08-11 北京巨视科技有限公司 Video detection method and device for motor vehicle exhaust emission
CN112668497B (en) * 2020-12-30 2022-05-20 南京佑驾科技有限公司 Vehicle accurate positioning and identification method and system
CN113033449A (en) * 2021-04-02 2021-06-25 上海国际汽车城(集团)有限公司 Vehicle detection and marking method and system and electronic equipment
CN113371035B (en) * 2021-08-16 2021-11-23 山东矩阵软件工程股份有限公司 Train information identification method and system
TWI783723B (en) * 2021-10-08 2022-11-11 瑞昱半導體股份有限公司 Character recognition method, character recognition device and non-transitory computer readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2532075A (en) * 2014-11-10 2016-05-11 Lego As System and method for toy recognition and detection based on convolutional neural networks
CN105868700A (en) * 2016-03-25 2016-08-17 哈尔滨工业大学深圳研究生院 Vehicle type recognition and tracking method and system based on monitoring video
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN107133974A (en) * 2017-06-02 2017-09-05 南京大学 The vehicle type classification method that Gaussian Background modeling is combined with Recognition with Recurrent Neural Network
CN108171112A (en) * 2017-12-01 2018-06-15 西安电子科技大学 Vehicle identification and tracking based on convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157441B2 (en) * 2016-12-27 2018-12-18 Automotive Research & Testing Center Hierarchical system for detecting object with parallel architecture and hierarchical method thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2532075A (en) * 2014-11-10 2016-05-11 Lego As System and method for toy recognition and detection based on convolutional neural networks
CN105868700A (en) * 2016-03-25 2016-08-17 哈尔滨工业大学深圳研究生院 Vehicle type recognition and tracking method and system based on monitoring video
CN107066953A (en) * 2017-03-22 2017-08-18 北京邮电大学 It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN107133974A (en) * 2017-06-02 2017-09-05 南京大学 The vehicle type classification method that Gaussian Background modeling is combined with Recognition with Recurrent Neural Network
CN108171112A (en) * 2017-12-01 2018-06-15 西安电子科技大学 Vehicle identification and tracking based on convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Fusion Feature for Vehicle Classification and Recognition; Hong Qiao et al.; 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference; 2018-05-27; 1364-1371 *
Moving Vehicle Video Detection Method Based on Convolutional Neural Networks; 陈林凯 et al.; Proceedings of the 2016 National Conference on Communications Software; 2016-06-24; 52-57 *
Transfer Learning Algorithm for Visual Vehicle Recognition; 蔡英凤 et al.; Journal of Southeast University (Natural Science Edition); 2015-04-30; 275-280 *

Also Published As

Publication number Publication date
CN109190444A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
CN109190444B (en) Method for realizing video-based toll lane vehicle feature recognition system
CN107563372B (en) License plate positioning method based on deep learning SSD frame
CN110069986B (en) Traffic signal lamp identification method and system based on hybrid model
CN109977782B (en) Cross-store operation behavior detection method based on target position information reasoning
CN110097044B (en) One-stage license plate detection and identification method based on deep learning
WO2017190574A1 (en) Fast pedestrian detection method based on aggregation channel features
CN105809184B (en) Method for real-time vehicle identification and tracking and parking space occupation judgment suitable for gas station
CN109902806A (en) Method is determined based on the noise image object boundary frame of convolutional neural networks
WO2021238019A1 (en) Real-time traffic flow detection system and method based on ghost convolutional feature fusion neural network
CN109726717B (en) Vehicle comprehensive information detection system
CN103824081B (en) Method for detecting rapid robustness traffic signs on outdoor bad illumination condition
CN106682586A (en) Method for real-time lane line detection based on vision under complex lighting conditions
Zhang et al. Study on traffic sign recognition by optimized Lenet-5 algorithm
CN107315998B (en) Vehicle class division method and system based on lane line
Shi et al. A vision system for traffic sign detection and recognition
Li et al. Robust vehicle detection in high-resolution aerial images with imbalanced data
CN112651293B (en) Video detection method for road illegal spreading event
CN111915583A (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN112560852A (en) Single-stage target detection method with rotation adaptive capacity based on YOLOv3 network
CN112381870A (en) Ship identification and navigational speed measurement system and method based on binocular vision
CN111860509A (en) Coarse-to-fine two-stage non-constrained license plate region accurate extraction method
CN114049572A (en) Detection method for identifying small target
Xu et al. Convolutional neural network based traffic sign recognition system
Liu et al. A large-scale benchmark for vehicle logo recognition
Wang et al. Vehicle key information detection algorithm based on improved SSD

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant