CN110991447A - Train number accurate positioning and identification method based on deep learning - Google Patents

Train number accurate positioning and identification method based on deep learning Download PDF

Info

Publication number
CN110991447A
CN110991447A
Authority
CN
China
Prior art keywords
train number
train
network
positioning
panoramic image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911166263.9A
Other languages
Chinese (zh)
Other versions
CN110991447B (en)
Inventor
邹琪
艾鑫
罗常津
杨文冠
丁正刚
胡宸瀚
周通
阳勇杰
徐嫣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Beijing Jingwei Information Technology Co Ltd
Original Assignee
Beijing Jiaotong University
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University, Beijing Jingwei Information Technology Co Ltd filed Critical Beijing Jiaotong University
Priority to CN201911166263.9A priority Critical patent/CN110991447B/en
Publication of CN110991447A publication Critical patent/CN110991447A/en
Application granted granted Critical
Publication of CN110991447B publication Critical patent/CN110991447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a train number accurate positioning and identification method based on deep learning, comprising the following steps: collecting a panoramic image of the train and adjusting its size; constructing a train number positioning network and training it with the adjusted panoramic images as a training set; training the train number recognition network with the train number regions output by the train number positioning network; adjusting the size of the panoramic image of the train to be identified and inputting it into the trained train number positioning network to obtain an accurately positioned train number region; and inputting the train number region into the trained train number recognition network for recognition to obtain the train number digit recognition result. The method can handle car-number sequences of any length, overcomes the low positioning accuracy of hand-crafted features and existing deep learning methods in complex scenes and their difficulty in distinguishing small-sized car numbers, and achieves whole-sequence recognition without any character segmentation.

Description

Train number accurate positioning and identification method based on deep learning
Technical Field
The invention relates to the technical field of intelligent transportation, in particular to a train number accurate positioning and identification method based on deep learning.
Background
Automating train inspection and freight inspection is an important goal of railway informatization, and positioning and recognizing the train number is one of the basic tasks of inspection automation: it allows train number information to be recorded automatically and avoids consuming enormous manpower and material resources. When an abnormal condition of a freight or passenger train is detected, the train number automatically serves as the identity mark of the train, is linked with the safety state information of the equipment, and triggers an alarm to the control center, so train number recognition is particularly important in the automation of train and freight inspection.
The conventional train number identification system (Automatic Train Identification System, ATIS) is mainly based on radio frequency identification (RFID) technology, but its accuracy depends on RFID tags installed at the bottom of the train; these tags are easily damaged or lost, so the accuracy of train number identification is difficult to guarantee.
In recent years, computer vision technology has been used to recognize the train number automatically; the train can be monitored automatically without installing additional special devices on it, which brings great convenience to train number recognition. However, recognizing car numbers from images or videos also faces challenges. First, the car-number region occupies an extremely small proportion of the original train image, and a large amount of text that is not the car number interferes, so the task amounts to detecting and recognizing a small target in an oversized image containing interfering information (the carriage images range from 5847 × 2048 at the smallest to 12693 × 2048 at the largest, i.e. ultra-high-resolution images). Conventional text detection methods cannot achieve good results under these conditions. Such methods are validated on ICDAR15, a public natural-scene text detection data set comprising two subsets: the less difficult FSTD set and the more difficult ISTD set. In the FSTD data set the text region accounts for about 8.1% of the original image, and in the ISTD data set about 0.49%, whereas in the train number recognition task the number region accounts for only 0.21% to 0.40% of the original image. Second, the complex environment also makes train number recognition difficult; the complex scene involves three factors. First, the background environment is complex: under 24-hour all-weather monitoring, the train images captured by the camera cover different seasons (sunny days, rain and snow) and different illumination (day and night). Second, the appearance of the monitored objects varies greatly: different types of trains (flat cars, open cars, boxcars, tank cars, hopper cars and the like) have very different appearance characteristics, the position of the car number on the carriage differs, and some open cars are covered by canvas, which makes car-number positioning difficult. Third, there are interference factors specific to car-number recognition: broken characters and large spacing between digits, as well as shadows, graffiti, symbols and carriage fouling, all interfere with recognition.
Existing deep-learning-based text positioning methods succeed in natural-scene text detection, but applying them directly to this task gives unsatisfactory results. On the one hand, they are not suited to detecting car numbers whose region occupies such a small proportion of the image; on the other hand, word-level natural-scene text detection is not suited to the large spacing between car-number digits: such methods easily split the car number into two separate parts and cannot locate the complete car number.
Therefore, in the case of the above-described complicated scene, accurate identification of the train number becomes an urgent problem to be solved.
Disclosure of Invention
The invention provides a train number accurate positioning and identifying method based on deep learning, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
The invention provides a train number accurate positioning and identifying method based on deep learning, which comprises the following steps:
collecting a panoramic image of the train, and adjusting the size of the panoramic image;
constructing a train number positioning network, and training the train number positioning network by taking the adjusted panoramic image as a training set;
training the train number recognition network according to the train number area output by the train number positioning network;
adjusting the size of a panoramic image of the train to be identified, and inputting the panoramic image into a trained train number positioning network to obtain a train number area accurately positioned;
and inputting the train number area into a trained train number identification network for identification to obtain a train number digital identification result.
Preferably, resizing the panorama comprises: the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
Preferably, the objective function of the constructed train number positioning network is shown in the following formula (1), and minimizing this objective function is the final target:

$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
Preferably, j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than a set threshold; and k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
Preferably, training the train number positioning network with the adjusted panoramic images as a training set includes: according to a difficult sample mining strategy, the GroundTruth classification score $s_i^{*}$ of difficult samples is set to 1; a difficult sample is a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds a set threshold, or whose IoU with that small text-segment GroundTruth is the maximum.
When predicting the abscissa $c_x$ of the center point of the car-number candidate box, a boundary-sensitive fine-grained text-box accurate positioning strategy is used: the manually annotated whole car-number text-line GroundTruth is divided into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths; when the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries is calculated, the small GroundTruth is generated by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth.
Preferably, the threshold is set to a fixed value within the interval [0.5, 0.7].
Preferably, the train number recognition network is an Attention-OCR network.
Preferably, training the train number positioning network with the adjusted panoramic images as a training set includes: when the proportion of the train number region in the train panoramic image is less than 0.3%, adding a feature pyramid network during feature extraction.
According to the technical scheme of the deep-learning-based train number accurate positioning and identification method provided above, in the train-number text positioning stage the method constructs a train number positioning network that generates the minimum rectangular box containing the text, designs a horizontal-direction regression layer for that box according to the layout characteristics of the train number, and during network training adopts a boundary-sensitive fine-grained text-box accurate positioning strategy together with a difficult sample mining strategy to obtain more accurate positioning. The car-number text recognition adopts a deep learning method based on an attention mechanism, which recognizes the sequence as a whole, can handle car-number sequences of any length, involves no character segmentation, and thus avoids the error accumulation that character-segmentation mistakes would otherwise cause in recognition; real-time accurate positioning and accurate recognition are thereby achieved, overcoming the low positioning accuracy of hand-crafted features in complex scenes.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for accurately locating and identifying train numbers based on deep learning according to an embodiment;
fig. 2 is a schematic diagram of the GroundTruth division manner in the HEM and BSF strategies, where (a) and (b) illustrate the division of the left- and right-boundary GroundTruth in the HEM strategy, and (c) and (d) illustrate the division of the right- and left-boundary GroundTruth in the BSF strategy;
FIG. 3 is a schematic diagram illustrating the comparison of the train number positioning and recognition results of the embodiment;
FIG. 4 is a schematic diagram illustrating another train number positioning and identification result comparison according to the embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps and/or operations, but do not preclude the presence or addition of one or more other features, integers, steps and/or operations. It should be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. In the present invention, the area Intersection over Union (IoU), also called the overlap degree, is the ratio of the intersection area to the union area: the numerator is the overlapping area between the detection box to be evaluated and the GroundTruth, and the denominator is the area of their union.
To facilitate understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the accompanying drawings.
Examples
Fig. 1 is a schematic flow chart of a method for accurately locating and identifying a train number based on deep learning according to this embodiment, and with reference to fig. 1, the method includes:
s1, collecting a panoramic image of the train, and adjusting the size of the panoramic image.
Collecting a train panoramic image collected by a monitoring camera, and scaling the panoramic image to a picture with the height of 600 pixels or a picture with the width of 1000 pixels according to the same proportion.
The specific method is as follows: a panorama of arbitrary size is input, and its width and height are scaled by the same factor until the height is 600 pixels; if the width then exceeds 1000 pixels, the panorama continues to be reduced until its width is 1000 pixels.
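This scaling rule can be sketched as follows (a minimal illustration assuming OpenCV is available; the function and variable names are not from the patent):

```python
import cv2


def resize_panorama(image, target_height=600, max_width=1000):
    """Scale a panorama so its height becomes 600 px; if the width then
    exceeds 1000 px, shrink further until the width is 1000 px."""
    h, w = image.shape[:2]
    scale = target_height / float(h)             # bring the height to 600 px
    if w * scale > max_width:                     # still too wide: bound the width instead
        scale = max_width / float(w)
    new_size = (int(round(w * scale)), int(round(h * scale)))
    return cv2.resize(image, new_size), scale
```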
S2, a train number positioning network is constructed, and the adjusted panoramic image is used as a training set to train the train number positioning network.
The train number positioning network of this embodiment comprises feature extraction, text box detection and boundary regression. It is mainly obtained by optimizing a base network derived from the Connectionist Text Proposal Network (CTPN). The objective function of the constructed train number positioning network is designed as shown in formula (1) below, and minimizing this objective function is the final target:
$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
The above formula constructs the objective function from a three-part multi-task loss: the confidence prediction loss $L^{cl}$ for judging whether a reference box is a car number, the vertical-direction coordinate prediction loss $L_v^{re}$ of the car-number candidate region, and the horizontal-direction coordinate prediction loss $L_x^{re}$ of the car-number candidate region. The reference boxes are obtained as follows: after features are extracted from the input picture, candidate text boxes (candidate boxes for short) are extracted on the feature map, and several reference boxes are set for each candidate box; a reference box has the same abscissa and width as its candidate box, while the heights of the reference boxes vary within a certain interval. Each reference box has 6 attributes: the score of belonging to a car number, the score of not belonging to a car number, the ordinate, the height, the abscissa and the width.
Here j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than the set threshold; k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
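As a rough illustration of how the three-part objective of formula (1) might be computed, the following sketch assumes a PyTorch implementation; the tensor layouts, mask variables and function name are assumptions for illustration, not the patent's code:

```python
import torch.nn.functional as F


def train_number_location_loss(cls_scores, cls_targets,
                               v_pred, v_target, v_mask,
                               x_pred, x_target, x_mask,
                               lambda1=1.0, lambda2=2.0):
    """Sketch of the three-part objective of formula (1).

    cls_scores : (Ns, 2) car-number / non-car-number scores of all reference boxes
    cls_targets: (Ns,)   0/1 GroundTruth labels
    v_pred, v_target: (Ns, 2) ordinate/height regression values
    v_mask : (Ns,) bool, True for boxes inside the y-direction regression range
    x_pred, x_target: (Ns,) horizontal center regression values
    x_mask : (Ns,) bool, True for boxes inside the x-direction regression range
    """
    # Softmax (cross-entropy) confidence loss, averaged over the Ns reference boxes
    l_cls = F.cross_entropy(cls_scores, cls_targets)

    # Smooth-L1 vertical regression, averaged over the Nv boxes within range
    l_v = F.smooth_l1_loss(v_pred[v_mask], v_target[v_mask])

    # Smooth-L1 horizontal (side) regression, averaged over the Nd boxes within range
    l_x = F.smooth_l1_loss(x_pred[x_mask], x_target[x_mask])

    return l_cls + lambda1 * l_v + lambda2 * l_x
```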
According to the hard sample mining strategy (HEM), the GroundTruth classification score $s_i^{*}$ of difficult samples is set to 1, which ensures that positive samples containing only a small fraction of the car-number region are not missed. A difficult sample is a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds the set threshold, or whose IoU with that small text-segment GroundTruth is the maximum. The small GroundTruths corresponding to the difficult samples are generated as follows: the manually annotated whole car-number text line is divided into small GroundTruths of fixed width (a fixed value in the interval [12, 18] may be chosen). Fig. 2(a) and (b) are schematic diagrams of the division of the left- and right-boundary GroundTruths in the HEM strategy: the text-line GroundTruth is divided from left to right in the manner of Fig. 2(b) and from right to left in the manner of Fig. 2(a); the small GroundTruth corresponding to the difficult sample is the inner box of the double-line box, and a reference box whose IoU with this double-line box exceeds the threshold, or whose IoU with it is the maximum, is a difficult sample. This ensures that positive samples at the edges are not missed, achieving difficult-sample mining at the edges; if the text-line GroundTruth were divided in the manner of Fig. 2(c) and (d), some positive samples at the edges might be discarded. Therefore, the HEM strategy adopted in this embodiment effectively alleviates the deficiency of the left and right boundaries caused by missing edge positive samples.
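A simplified sketch of this hard-sample marking rule follows (boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples; the helper names are illustrative and not from the patent):

```python
import numpy as np


def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def mark_hard_positives(reference_boxes, boundary_small_gts, iou_thresh=0.5):
    """Force reference boxes near the left/right boundary small GroundTruths
    to be positive samples: label 1 if the IoU exceeds the threshold, and always
    label the box with the maximum IoU for each boundary small GroundTruth."""
    labels = np.zeros(len(reference_boxes), dtype=np.int64)
    for gt in boundary_small_gts:
        ious = np.array([box_iou(box, gt) for box in reference_boxes])
        labels[ious > iou_thresh] = 1
        labels[int(np.argmax(ious))] = 1   # never miss the best-matching box
    return labels
```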
In the third term of the objective function, the horizontal boundary regression term $L_x^{re}$, a Boundary-Sensitive Fine-grained text-box (BSF) accurate positioning strategy is used when predicting the abscissa $c_x$ of the center point of the car-number candidate box: the manually annotated whole car-number text line is divided into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths. When the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries is calculated, the small GroundTruth is generated by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth, which ensures that the car-number information is tightly enclosed.
Illustratively, taking a fixed small-GroundTruth width of 16 pixels as an example: when the whole text-line GroundTruth is divided from left to right and the last small GroundTruth is less than 16 pixels wide, that small GroundTruth is instead generated by extending 16 pixels leftwards from the right boundary of the whole text-line GroundTruth. Fig. 2(c) is a schematic diagram of the GroundTruth division obtained with the boundary-sensitive fine-grained text-box accurate positioning (BSF) strategy of this embodiment, where the dashed box is the small GroundTruth at the right boundary; similarly, the dashed box in Fig. 2(d) is the small GroundTruth at the left boundary. It can be seen that this approach effectively alleviates left- and right-boundary redundancy.
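The division of a whole text-line GroundTruth into fixed-width small GroundTruths, with the end segment anchored to the boundary, could be sketched as follows (an illustration under the assumptions stated in the comments, not the patent's code):

```python
def split_text_line_gt(x1, x2, seg_width=16, from_left=True):
    """Boundary-sensitive split of a text-line GroundTruth spanning [x1, x2]
    into small GroundTruths of width seg_width (horizontal coordinates only).

    from_left=True divides left to right; if the remainder is narrower than
    seg_width, it is replaced by a segment extending seg_width pixels leftwards
    from the right boundary. from_left=False is the mirrored case."""
    n_full = int((x2 - x1) // seg_width)
    if from_left:
        segs = [(x1 + i * seg_width, x1 + (i + 1) * seg_width) for i in range(n_full)]
        if x1 + n_full * seg_width < x2:
            segs.append((x2 - seg_width, x2))    # right-boundary small GroundTruth
    else:
        segs = [(x2 - (i + 1) * seg_width, x2 - i * seg_width) for i in range(n_full)]
        if x2 - n_full * seg_width > x1:
            segs.append((x1, x1 + seg_width))    # left-boundary small GroundTruth
    return segs
```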
Here, the set threshold is a fixed value in the interval [0.5, 0.7].
Therefore, different GroundTruth division modes are adopted in the two stages of regression and positive-sample selection, which both guarantees the accuracy of the regression boundary and selects reasonable positive samples.
Further, when the proportion of the train number region in the train panoramic image is less than 0.3%, a feature pyramid network is added during feature extraction.
In the real data set collected from the marshalling station, the images are large and of varied size. During both training and testing, the long edge is scaled to 1000 pixels and the height is scaled by the same factor. After scaling, an object occupying less than 0.3% of the original image is usually only a few to a dozen pixels wide or high, which conventional object positioning methods can hardly detect. Since the size of the car number varies, the positioning model must detect both large and small car numbers. In this embodiment, the multi-scale problem is handled by adding a feature pyramid network and fusing its middle- and high-level features; preferably, the 4th and 5th layers are used to predict the coordinates and scores of the car-number region, which addresses car-number positioning at different resolutions.
Specifically, the method comprises the following steps:
a: and performing calculation of feature extraction through a convolutional network to obtain features of different levels from a bottom layer to a high layer.
B: the high-level features are up-sampled, and then convolution operation is carried out on the adjacent low levels of the high-level features, and then the high-level features and the adjacent low levels of the high-level features are fused to obtain a new feature map.
And generating reference frames with different sizes on different fusion layers by utilizing the fusion of the high layer and the low layer, and performing regression and classification operations respectively to generate the candidate regions of the train number text.
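Steps A and B can be sketched roughly as follows (a PyTorch-style illustration in which the channel numbers and layer names are assumptions, not values given in the patent):

```python
import torch.nn as nn
import torch.nn.functional as F


class TwoLevelFeaturePyramid(nn.Module):
    """Sketch of the fusion described above: a 1x1 convolution turns the
    conv5 features into newP5, which is up-sampled and added to the
    1x1-projected conv4 features to give newP4."""

    def __init__(self, c4_channels=512, c5_channels=512, out_channels=256):
        super().__init__()
        self.lateral5 = nn.Conv2d(c5_channels, out_channels, kernel_size=1)
        self.lateral4 = nn.Conv2d(c4_channels, out_channels, kernel_size=1)

    def forward(self, c4, c5):
        new_p5 = self.lateral5(c5)                                    # high-level features
        up = F.interpolate(new_p5, size=c4.shape[-2:], mode="nearest")
        new_p4 = self.lateral4(c4) + up                               # fuse with the adjacent lower level
        return new_p4, new_p5
```

Car-number candidate regions would then be regressed and classified on both new_p4 and new_p5.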
S3, training the train number recognition network with the train number regions output by the train number positioning network.
The car number recognition network is an Attention-OCR network.
S4, the size of the panoramic image of the train to be identified is adjusted and input into the trained train number positioning network, and the accurately positioned train number area is obtained.
The resizing in this step is the same as in step S1, and the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
Accurate positioning means that the located region box tightly encloses the train number: no digit or part of a digit is omitted, and the region contains no redundant area.
S5, inputting the train number region into the trained train number recognition network for recognition to obtain the train number recognition result.
The obtained train number region is input into the trained Attention-OCR network for recognition; the feature sequence is processed through a convolutional layer, an encoding layer and a decoding layer to obtain the specific train number content, and a 7-digit train number recognition result is output.
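The patent names the Attention-OCR network but gives no implementation details; the following is only a heavily reduced sketch of an attention-based encoder-decoder recognizer of this general kind (the layer sizes, vocabulary and structure are assumptions):

```python
import torch
import torch.nn as nn


class AttentionRecognizerSketch(nn.Module):
    """Reduced sketch: CNN features -> BiLSTM encoder -> attention decoder
    emitting one character per step (illustrative, not the patented network)."""

    def __init__(self, vocab_size=11, enc_hidden=256, dec_hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(                 # turn the 32-px-high crop into a feature sequence
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))
        self.encoder = nn.LSTM(128, enc_hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * enc_hidden + dec_hidden, 1)
        self.decoder = nn.LSTMCell(2 * enc_hidden, dec_hidden)
        self.classifier = nn.Linear(dec_hidden, vocab_size)

    def forward(self, image, max_len=7):
        feats = self.cnn(image).squeeze(2).permute(0, 2, 1)          # (B, W', 128)
        enc, _ = self.encoder(feats)                                 # (B, W', 2*enc_hidden)
        b, t, _ = enc.shape
        h = enc.new_zeros(b, self.decoder.hidden_size)
        c = enc.new_zeros(b, self.decoder.hidden_size)
        logits = []
        for _ in range(max_len):                                     # one digit per decoding step
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand(b, t, -1)], dim=-1))
            alpha = torch.softmax(scores, dim=1)                     # attention weights over the width axis
            context = (alpha * enc).sum(dim=1)                       # weighted context vector
            h, c = self.decoder(context, (h, c))
            logits.append(self.classifier(h))
        return torch.stack(logits, dim=1)                            # (B, max_len, vocab_size)
```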
The following are simulation experiments performed using the above method:
since the continuous 7 digital texts on the surface of the train are the required train number information, other letters, Chinese characters and identifiers are interference texts. Therefore, only the vehicle number information needs to be identified in the positioning and identifying process, and interference of other information is avoided.
In this simulation embodiment, the acquired train panoramas are divided into a training set and a test set in a ratio of 2:1. The test set is used to test and evaluate the method and has no intersection with the training set.
During the experiments, 4352 panoramic train images captured by a monitoring camera at a freight train station were randomly annotated to serve as the training set, each image containing a train number region. The test set uses 2109 randomly annotated images that have no intersection with the training set.
Three public evaluation metrics are used for the car-number positioning and recognition results: precision, recall and F1-Measure (F1 value). F1-Measure is the harmonic mean of precision and recall. Precision is the ratio of the number of correct positioning or recognition results to the total number of results, and recall is the ratio of the number of correct results to the number of manually annotated GroundTruth results. All three metrics lie between 0 and 1; the closer to 1, the better the effect.
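These metrics can be computed directly from the counts (a trivial sketch; the function name is illustrative):

```python
def precision_recall_f1(num_correct, num_predicted, num_groundtruth):
    """Precision, recall and F1-Measure as defined above."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_groundtruth if num_groundtruth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```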
The train number positioning uses the train number positioning network of this embodiment. The experimental environment is Ubuntu 16.04 on a computer equipped with a Tesla K40c graphics card. The input image may be of any size; its width and height are scaled by the same factor until the height is 600 pixels, and if the width then exceeds 1000 pixels, the image continues to be reduced until its width is 1000 pixels. When the reference boxes are generated, their width is fixed at 16 pixels, and the IoU threshold is set to 0.5.
In the feature pyramid network used in this simulation example, the newP5 layer is obtained by applying a 1 × 1 convolution to the 5th layer of the feature extraction network.
The high-level feature map (newP5) is up-sampled by a factor of 2, a 1 × 1 convolution is applied to the 4th layer, and the two are fused to obtain a new feature map (newP4). On the newP5 features, reference boxes are generated with the width fixed at 16 and heights changing geometrically by a factor of 0.7 from 283 down to 11; on the newP4 features, reference boxes are generated with the width fixed at 16 and heights changing geometrically by a factor of 0.7 from 142 down to 6.
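The reference-box heights described above can be generated roughly as follows (a sketch; the rounding and stopping rule are assumptions about how the geometric 0.7 progression is realized, beyond the ranges listed in the patent):

```python
def reference_box_heights(h_max, h_min, ratio=0.7):
    """Heights shrinking geometrically by `ratio` from h_max down to about h_min."""
    heights = []
    h = float(h_max)
    while round(h) >= h_min:
        heights.append(int(round(h)))
        h *= ratio
    return heights


# Illustrative usage for the two fused feature levels (width is fixed at 16 px)
p5_heights = reference_box_heights(283, 11)   # reference boxes on the newP5 features
p4_heights = reference_box_heights(142, 6)    # reference boxes on the newP4 features
```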
In the boundary-sensitive fine-grained text-box strategy and the difficult sample mining strategy, a complete text-line GroundTruth is divided into 6 small GroundTruths with a fixed width of 16.
When the prior-art base network is used to position the car number, the precision, recall and F1 obtained are 0.86, 0.81 and 0.83 respectively.
Adding only the horizontal-direction regression layer to the base network raises the precision, recall and F1 to 0.89, 0.84 and 0.86 respectively; adding the boundary-sensitive fine-grained text-box accurate positioning strategy and the difficult sample mining strategy to the horizontal regression layer raises them to 0.99, 0.92 and 0.94; adding the feature pyramid network further raises the recall to 0.93.
The test set contains 58 small-car-number images (the car-number region accounts for 0.21%-0.3% of the whole image); the positioning method with the feature pyramid network detects 43 of them, whereas the method without the feature pyramid network detects only 6, demonstrating the effectiveness of the feature pyramid network on small car numbers.
The train number recognition uses the train number recognition network of this embodiment. The experimental environment is Ubuntu 16.04 on a computer equipped with a Tesla K40c graphics card. The input of the recognition network is the output of the car number positioning network in the previous step. During training and testing of the car number recognition network, the height of the input car-number region image is scaled to 32 and the width is scaled by the same proportion. The encoder hidden layer has 256 features, the decoder hidden layer has 512 features, and the training process is iterated 20000 times. Compared with the base network, the F1 index of car-number character recognition rises from 0.89 to 0.94.
Fig. 3 and Fig. 4 show two groups of comparisons of train number positioning and recognition results. In each group, the upper row is the positioning result of the basic deep learning network and the lower row is the positioning result of this embodiment; the positioned region is indicated by the dashed box. Referring to Fig. 3, the car number recognized by this embodiment is 6281442, whereas the Baidu Cloud general text recognition system returns three results: (1) g70; (2) 6261442 has been modified; (3) light oil. Referring to Fig. 4, the car number recognized by this embodiment is 5073546, whereas the Baidu Cloud general text recognition system returns two results: (1) a first-class buttercup; (2) N17AK 5075546.
It will be appreciated by those skilled in the art that the above-described exemplary embodiments are merely examples, and that other types of applications, which may occur or become known in the future, such as those applicable to the embodiments of the present invention, are also encompassed within the scope of the present invention and are hereby incorporated by reference.
It will be appreciated by those skilled in the art that the various network elements shown in fig. 1 for simplicity only may be fewer in number than in an actual network, but such omissions are clearly not to be considered as a prerequisite for a clear and complete disclosure of the inventive embodiments.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A train number accurate positioning and identification method based on deep learning is characterized by comprising the following steps:
collecting a panoramic image of the train, and adjusting the size of the panoramic image;
constructing a train number positioning network, and training the train number positioning network by taking the adjusted panoramic image as a training set;
training the train number recognition network according to the train number area output by the train number positioning network;
adjusting the size of a panoramic image of the train to be identified, and inputting the panoramic image into a trained train number positioning network to obtain a train number area accurately positioned;
and inputting the train number area into a trained train number identification network for identification to obtain a train number digital identification result.
2. The method of claim 1, wherein the resizing the panorama comprises: the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
3. The method according to claim 1, wherein the objective function of the constructed train number positioning network is shown in the following formula (1), and minimizing this objective function is the final target:

$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
4. The method according to claim 3, wherein j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than a set threshold; and k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
5. The method according to claim 3, wherein training the train number positioning network with the adjusted panoramic images as a training set comprises: according to a difficult sample mining strategy, setting the GroundTruth classification score $s_i^{*}$ of difficult samples to 1, a difficult sample being a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds a set threshold, or whose IoU with that small text-segment GroundTruth is the maximum;
when predicting the abscissa $c_x$ of the center point of the car-number candidate box, using a boundary-sensitive fine-grained text-box accurate positioning strategy, namely dividing the manually annotated whole car-number text-line GroundTruth into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths; and, when calculating the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries, generating the small GroundTruth by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth.
6. The method according to claim 4 or 5, wherein the set threshold is a fixed value within the interval [0.5, 0.7].
7. The method of claim 1 wherein said train number identification network is an Attention-OCR network.
8. The method according to claim 3, wherein training the train number positioning network with the adjusted panoramic images as a training set comprises: when the proportion of the train number region in the train panoramic image is less than 0.3%, adding a feature pyramid network during feature extraction.
CN201911166263.9A 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning Active CN110991447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911166263.9A CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911166263.9A CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Publications (2)

Publication Number Publication Date
CN110991447A true CN110991447A (en) 2020-04-10
CN110991447B CN110991447B (en) 2024-05-17

Family

ID=70086514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911166263.9A Active CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Country Status (1)

Country Link
CN (1) CN110991447B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN113283418A (en) * 2021-04-15 2021-08-20 南京大学 Text detection attack method
CN113327426A (en) * 2021-05-26 2021-08-31 国能朔黄铁路发展有限责任公司 Vehicle type code identification method and device and vehicle number identification method and device
CN113371035A (en) * 2021-08-16 2021-09-10 山东矩阵软件工程股份有限公司 Train information identification method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120106802A1 (en) * 2010-10-29 2012-05-03 National Chiao Tung University Vehicle license plate recognition method and system thereof
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110472633A (en) * 2019-08-15 2019-11-19 南京拓控信息科技股份有限公司 A kind of detection of train license number and recognition methods based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120106802A1 (en) * 2010-10-29 2012-05-03 National Chiao Tung University Vehicle license plate recognition method and system thereof
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110472633A (en) * 2019-08-15 2019-11-19 南京拓控信息科技股份有限公司 A kind of detection of train license number and recognition methods based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MICHAELLIU_DEV: "CTPN (Detecting Text in Natural Image with Connectionist Text Proposal Network) Algorithm Explained", pages 1 - 2, Retrieved from the Internet <URL:https://blog.csdn.net/michaelshare/article/details/86176989> *
YUN ZHAO等: "Training Cascade Compact CNN With Region-IoU for Accurate Pedestrian", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》, vol. 21, no. 9, 13 August 2019 (2019-08-13) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN112926637B (en) * 2021-02-08 2023-06-09 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN113283418A (en) * 2021-04-15 2021-08-20 南京大学 Text detection attack method
CN113283418B (en) * 2021-04-15 2024-04-09 南京大学 Text detection attack method
CN113327426A (en) * 2021-05-26 2021-08-31 国能朔黄铁路发展有限责任公司 Vehicle type code identification method and device and vehicle number identification method and device
CN113371035A (en) * 2021-08-16 2021-09-10 山东矩阵软件工程股份有限公司 Train information identification method and system
CN113371035B (en) * 2021-08-16 2021-11-23 山东矩阵软件工程股份有限公司 Train information identification method and system

Also Published As

Publication number Publication date
CN110991447B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
Bang et al. Encoder–decoder network for pixel‐level road crack detection in black‐box images
CN110991447B (en) Train number accurate positioning and identifying method based on deep learning
CN109447018B (en) Road environment visual perception method based on improved Faster R-CNN
Siriborvornratanakul An automatic road distress visual inspection system using an onboard in‐car camera
CN110390251B (en) Image and character semantic segmentation method based on multi-neural-network model fusion processing
CN110555433B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN111709416B (en) License plate positioning method, device, system and storage medium
CN113033604B (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN113468967A (en) Lane line detection method, device, equipment and medium based on attention mechanism
CN110008900B (en) Method for extracting candidate target from visible light remote sensing image from region to target
CN102902974A (en) Image based method for identifying railway overhead-contact system bolt support identifying information
CN114092917B (en) MR-SSD-based shielded traffic sign detection method and system
CN112633149A (en) Domain-adaptive foggy-day image target detection method and device
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN114742799A (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN110008899A (en) A kind of visible remote sensing image candidate target extracts and classification method
Li et al. Pixel‐Level Recognition of Pavement Distresses Based on U‐Net
CN111881984A (en) Target detection method and device based on deep learning
CN111611933A (en) Information extraction method and system for document image
Zhang et al. Image-based approach for parking-spot detection with occlusion handling
CN114463205A (en) Vehicle target segmentation method based on double-branch Unet noise suppression
Mei et al. A conditional wasserstein generative adversarial network for pixel-level crack detection using video extracted images
CN113158954A (en) Automatic traffic off-site zebra crossing area detection method based on AI technology
CN115346206B (en) License plate detection method based on improved super-resolution deep convolution feature recognition
CN116128866A (en) Power transmission line insulator fault detection method based on USRNet and improved MobileNet-SSD algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant