CN110991447A - Train number accurate positioning and identification method based on deep learning - Google Patents

Train number accurate positioning and identification method based on deep learning Download PDF

Info

Publication number
CN110991447A
CN110991447A
Authority
CN
China
Prior art keywords
train number
train
network
positioning
panoramic image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911166263.9A
Other languages
Chinese (zh)
Other versions
CN110991447B (en)
Inventor
邹琪
艾鑫
罗常津
杨文冠
丁正刚
胡宸瀚
周通
阳勇杰
徐嫣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Beijing Jingwei Information Technology Co Ltd
Original Assignee
Beijing Jiaotong University
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University, Beijing Jingwei Information Technology Co Ltd filed Critical Beijing Jiaotong University
Priority to CN201911166263.9A priority Critical patent/CN110991447B/en
Publication of CN110991447A publication Critical patent/CN110991447A/en
Application granted granted Critical
Publication of CN110991447B publication Critical patent/CN110991447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a train number accurate positioning and identification method based on deep learning, comprising the following steps: collecting a panoramic image of the train and adjusting its size; constructing a train number positioning network and training it with the adjusted panoramic images as a training set; training the train number recognition network with the train number regions output by the train number positioning network; adjusting the size of the panoramic image of the train to be identified and inputting it into the trained train number positioning network to obtain an accurately positioned train number region; and inputting the train number region into the trained train number recognition network for recognition to obtain the train number digit recognition result. The method can handle car-number sequences of any length, overcomes the low positioning accuracy of hand-crafted features and existing deep learning methods in complex scenes and their difficulty in distinguishing small-sized car numbers, and achieves whole-sequence recognition without any character segmentation.

Description

Train number accurate positioning and identification method based on deep learning
Technical Field
The invention relates to the technical field of intelligent transportation, in particular to a train number accurate positioning and identification method based on deep learning.
Background
Automating train inspection and freight inspection is an important goal of railway informatization, and positioning and recognizing the train number is one of the basic tasks of inspection automation: it allows train number information to be recorded automatically and avoids consuming enormous manpower and material resources. When an abnormal condition of a freight or passenger train is detected, the train number automatically serves as the identity mark of the train, is linked with the safety state information of the equipment, and triggers an alarm to the control center, so train number recognition is particularly important in the automation of train and freight inspection.
The conventional train number identification system (Automatic Train Identification System, ATIS) is mainly based on radio frequency identification (RFID) technology, but its accuracy depends on RFID tags installed at the bottom of the train; these tags are easily damaged or lost, so the accuracy of train number identification is difficult to guarantee.
In recent years, computer vision technology has been used to recognize the train number automatically; the train can be monitored automatically without installing additional special devices on it, which brings great convenience to train number recognition. However, recognizing car numbers from images or videos also faces challenges. First, the car-number region occupies an extremely small proportion of the original train image, and a large amount of text that is not the car number interferes, so the task amounts to detecting and recognizing a small target in an oversized image containing interfering information (the carriage images range from 5847 × 2048 at the smallest to 12693 × 2048 at the largest, i.e. ultra-high-resolution images). Conventional text detection methods cannot achieve good results under these conditions. Such methods are validated on ICDAR15, a public natural-scene text detection data set comprising two subsets: the less difficult FSTD set and the more difficult ISTD set. In the FSTD data set the text region accounts for about 8.1% of the original image, and in the ISTD data set about 0.49%, whereas in the train number recognition task the number region accounts for only 0.21% to 0.40% of the original image. Second, the complex environment also makes train number recognition difficult; the complex scene involves three factors. First, the background environment is complex: under 24-hour all-weather monitoring, the train images captured by the camera cover different seasons (sunny days, rain and snow) and different illumination (day and night). Second, the appearance of the monitored objects varies greatly: different types of trains (flat cars, open cars, boxcars, tank cars, hopper cars and the like) have very different appearance characteristics, the position of the car number on the carriage differs, and some open cars are covered by canvas, which makes car-number positioning difficult. Third, there are interference factors specific to car-number recognition: broken characters and large spacing between digits, as well as shadows, graffiti, symbols and carriage fouling, all interfere with recognition.
Existing deep-learning-based text positioning methods succeed in natural-scene text detection, but applying them directly to this task gives unsatisfactory results. On the one hand, they are not suited to detecting car numbers whose region occupies such a small proportion of the image; on the other hand, word-level natural-scene text detection is not suited to the large spacing between car-number digits: such methods easily split the car number into two separate parts and cannot locate the complete car number.
Therefore, in the case of the above-described complicated scene, accurate identification of the train number becomes an urgent problem to be solved.
Disclosure of Invention
The invention provides a train number accurate positioning and identifying method based on deep learning, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
The invention provides a train number accurate positioning and identifying method based on deep learning, which comprises the following steps:
collecting a panoramic image of the train, and adjusting the size of the panoramic image;
constructing a train number positioning network, and training the train number positioning network by taking the adjusted panoramic image as a training set;
training the train number recognition network according to the train number area output by the train number positioning network;
adjusting the size of a panoramic image of the train to be identified, and inputting the panoramic image into a trained train number positioning network to obtain a train number area accurately positioned;
and inputting the train number area into a trained train number identification network for identification to obtain a train number digital identification result.
Preferably, resizing the panorama comprises: the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
Preferably, the objective function of the constructed train number positioning network is shown in the following formula (1), and minimizing this objective function is the final target:

$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
Preferably, j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than a set threshold; and k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
Preferably, training the train number positioning network with the adjusted panoramic images as a training set includes: according to a difficult sample mining strategy, the GroundTruth classification score $s_i^{*}$ of difficult samples is set to 1; a difficult sample is a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds a set threshold, or whose IoU with that small text-segment GroundTruth is the maximum.
When predicting the abscissa $c_x$ of the center point of the car-number candidate box, a boundary-sensitive fine-grained text-box accurate positioning strategy is used: the manually annotated whole car-number text-line GroundTruth is divided into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths; when the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries is calculated, the small GroundTruth is generated by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth.
Preferably, the threshold is set to a fixed value within the interval [0.5, 0.7].
Preferably, the train number recognition network is an Attention-OCR network.
Preferably, training the train number positioning network with the adjusted panoramic images as a training set includes: when the proportion of the train number region in the train panoramic image is less than 0.3%, adding a feature pyramid network during feature extraction.
According to the technical scheme of the deep-learning-based train number accurate positioning and identification method provided above, in the train-number text positioning stage the method constructs a train number positioning network that generates the minimum rectangular box containing the text, designs a horizontal-direction regression layer for that box according to the layout characteristics of the train number, and during network training adopts a boundary-sensitive fine-grained text-box accurate positioning strategy together with a difficult sample mining strategy to obtain more accurate positioning. The car-number text recognition adopts a deep learning method based on an attention mechanism, which recognizes the sequence as a whole, can handle car-number sequences of any length, involves no character segmentation, and thus avoids the error accumulation that character-segmentation mistakes would otherwise cause in recognition; real-time accurate positioning and accurate recognition are thereby achieved, overcoming the low positioning accuracy of hand-crafted features in complex scenes.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for accurately locating and identifying train numbers based on deep learning according to an embodiment;
fig. 2 is a schematic diagram of the GroundTruth division manner in the HEM and BSF strategies, where (a) and (b) illustrate the division of the left- and right-boundary GroundTruth in the HEM strategy, and (c) and (d) illustrate the division of the right- and left-boundary GroundTruth in the BSF strategy;
FIG. 3 is a schematic diagram illustrating the comparison of the train number positioning and recognition results of the embodiment;
FIG. 4 is a schematic diagram illustrating another train number positioning and identification result comparison according to the embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps and/or operations, but do not preclude the presence or addition of one or more other features, integers, steps and/or operations. It should be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein. In the present invention, the area Intersection over Union (IoU), also called the overlap degree, is the ratio of the intersection area to the union area: the numerator is the overlapping area between the detection box to be evaluated and the GroundTruth, and the denominator is the area of their union.
To facilitate understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the accompanying drawings.
Examples
Fig. 1 is a schematic flow chart of a method for accurately locating and identifying a train number based on deep learning according to this embodiment, and with reference to fig. 1, the method includes:
s1, collecting a panoramic image of the train, and adjusting the size of the panoramic image.
Collecting a train panoramic image collected by a monitoring camera, and scaling the panoramic image to a picture with the height of 600 pixels or a picture with the width of 1000 pixels according to the same proportion.
The specific method is as follows: a panorama of arbitrary size is input, and its width and height are scaled by the same factor until the height is 600 pixels; if the width then exceeds 1000 pixels, the panorama continues to be reduced until its width is 1000 pixels.
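This scaling rule can be sketched as follows (a minimal illustration assuming OpenCV is available; the function and variable names are not from the patent):

```python
import cv2


def resize_panorama(image, target_height=600, max_width=1000):
    """Scale a panorama so its height becomes 600 px; if the width then
    exceeds 1000 px, shrink further until the width is 1000 px."""
    h, w = image.shape[:2]
    scale = target_height / float(h)             # bring the height to 600 px
    if w * scale > max_width:                     # still too wide: bound the width instead
        scale = max_width / float(w)
    new_size = (int(round(w * scale)), int(round(h * scale)))
    return cv2.resize(image, new_size), scale
```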
S2, a train number positioning network is constructed, and the adjusted panoramic image is used as a training set to train the train number positioning network.
The train number positioning network of this embodiment comprises feature extraction, text box detection and boundary regression. It is mainly obtained by optimizing a base network derived from the Connectionist Text Proposal Network (CTPN). The objective function of the constructed train number positioning network is designed as shown in formula (1) below, and minimizing this objective function is the final target:
$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
The above formula constructs the objective function from a three-part multi-task loss: the confidence prediction loss $L^{cl}$ for judging whether a reference box is a car number, the vertical-direction coordinate prediction loss $L_v^{re}$ of the car-number candidate region, and the horizontal-direction coordinate prediction loss $L_x^{re}$ of the car-number candidate region. The reference boxes are obtained as follows: after features are extracted from the input picture, candidate text boxes (candidate boxes for short) are extracted on the feature map, and several reference boxes are set for each candidate box; a reference box has the same abscissa and width as its candidate box, while the heights of the reference boxes vary within a certain interval. Each reference box has 6 attributes: the score of belonging to a car number, the score of not belonging to a car number, the ordinate, the height, the abscissa and the width.
Here j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than the set threshold; k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
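As a rough illustration of how the three-part objective of formula (1) might be computed, the following sketch assumes a PyTorch implementation; the tensor layouts, mask variables and function name are assumptions for illustration, not the patent's code:

```python
import torch.nn.functional as F


def train_number_location_loss(cls_scores, cls_targets,
                               v_pred, v_target, v_mask,
                               x_pred, x_target, x_mask,
                               lambda1=1.0, lambda2=2.0):
    """Sketch of the three-part objective of formula (1).

    cls_scores : (Ns, 2) car-number / non-car-number scores of all reference boxes
    cls_targets: (Ns,)   0/1 GroundTruth labels
    v_pred, v_target: (Ns, 2) ordinate/height regression values
    v_mask : (Ns,) bool, True for boxes inside the y-direction regression range
    x_pred, x_target: (Ns,) horizontal center regression values
    x_mask : (Ns,) bool, True for boxes inside the x-direction regression range
    """
    # Softmax (cross-entropy) confidence loss, averaged over the Ns reference boxes
    l_cls = F.cross_entropy(cls_scores, cls_targets)

    # Smooth-L1 vertical regression, averaged over the Nv boxes within range
    l_v = F.smooth_l1_loss(v_pred[v_mask], v_target[v_mask])

    # Smooth-L1 horizontal (side) regression, averaged over the Nd boxes within range
    l_x = F.smooth_l1_loss(x_pred[x_mask], x_target[x_mask])

    return l_cls + lambda1 * l_v + lambda2 * l_x
```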
According to the hard sample mining strategy (HEM), the GroundTruth classification score $s_i^{*}$ of difficult samples is set to 1, which ensures that positive samples containing only a small fraction of the car-number region are not missed. A difficult sample is a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds the set threshold, or whose IoU with that small text-segment GroundTruth is the maximum. The small GroundTruths corresponding to the difficult samples are generated as follows: the manually annotated whole car-number text line is divided into small GroundTruths of fixed width (a fixed value in the interval [12, 18] may be chosen). Fig. 2(a) and (b) are schematic diagrams of the division of the left- and right-boundary GroundTruths in the HEM strategy: the text-line GroundTruth is divided from left to right in the manner of Fig. 2(b) and from right to left in the manner of Fig. 2(a); the small GroundTruth corresponding to the difficult sample is the inner box of the double-line box, and a reference box whose IoU with this double-line box exceeds the threshold, or whose IoU with it is the maximum, is a difficult sample. This ensures that positive samples at the edges are not missed, achieving difficult-sample mining at the edges; if the text-line GroundTruth were divided in the manner of Fig. 2(c) and (d), some positive samples at the edges might be discarded. Therefore, the HEM strategy adopted in this embodiment effectively alleviates the deficiency of the left and right boundaries caused by missing edge positive samples.
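A simplified sketch of this hard-sample marking rule follows (boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples; the helper names are illustrative and not from the patent):

```python
import numpy as np


def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def mark_hard_positives(reference_boxes, boundary_small_gts, iou_thresh=0.5):
    """Force reference boxes near the left/right boundary small GroundTruths
    to be positive samples: label 1 if the IoU exceeds the threshold, and always
    label the box with the maximum IoU for each boundary small GroundTruth."""
    labels = np.zeros(len(reference_boxes), dtype=np.int64)
    for gt in boundary_small_gts:
        ious = np.array([box_iou(box, gt) for box in reference_boxes])
        labels[ious > iou_thresh] = 1
        labels[int(np.argmax(ious))] = 1   # never miss the best-matching box
    return labels
```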
In the third term of the objective function, the horizontal boundary regression term $L_x^{re}$, a Boundary-Sensitive Fine-grained text-box (BSF) accurate positioning strategy is used when predicting the abscissa $c_x$ of the center point of the car-number candidate box: the manually annotated whole car-number text line is divided into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths. When the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries is calculated, the small GroundTruth is generated by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth, which ensures that the car-number information is tightly enclosed.
Illustratively, taking a fixed small-GroundTruth width of 16 pixels as an example: when the whole text-line GroundTruth is divided from left to right and the last small GroundTruth is less than 16 pixels wide, that small GroundTruth is instead generated by extending 16 pixels leftwards from the right boundary of the whole text-line GroundTruth. Fig. 2(c) is a schematic diagram of the GroundTruth division obtained with the boundary-sensitive fine-grained text-box accurate positioning (BSF) strategy of this embodiment, where the dashed box is the small GroundTruth at the right boundary; similarly, the dashed box in Fig. 2(d) is the small GroundTruth at the left boundary. It can be seen that this approach effectively alleviates left- and right-boundary redundancy.
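The division of a whole text-line GroundTruth into fixed-width small GroundTruths, with the end segment anchored to the boundary, could be sketched as follows (an illustration under the assumptions stated in the comments, not the patent's code):

```python
def split_text_line_gt(x1, x2, seg_width=16, from_left=True):
    """Boundary-sensitive split of a text-line GroundTruth spanning [x1, x2]
    into small GroundTruths of width seg_width (horizontal coordinates only).

    from_left=True divides left to right; if the remainder is narrower than
    seg_width, it is replaced by a segment extending seg_width pixels leftwards
    from the right boundary. from_left=False is the mirrored case."""
    n_full = int((x2 - x1) // seg_width)
    if from_left:
        segs = [(x1 + i * seg_width, x1 + (i + 1) * seg_width) for i in range(n_full)]
        if x1 + n_full * seg_width < x2:
            segs.append((x2 - seg_width, x2))    # right-boundary small GroundTruth
    else:
        segs = [(x2 - (i + 1) * seg_width, x2 - i * seg_width) for i in range(n_full)]
        if x2 - n_full * seg_width > x1:
            segs.append((x1, x1 + seg_width))    # left-boundary small GroundTruth
    return segs
```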
Here, the set threshold is a fixed value in the interval [0.5, 0.7].
Therefore, different GroundTruth division modes are adopted in the two stages of regression and positive-sample selection, which both guarantees the accuracy of the regression boundary and selects reasonable positive samples.
Further, when the proportion of the train number region in the train panoramic image is less than 0.3%, a feature pyramid network is added during feature extraction.
In the real data set collected from the marshalling station, the images are large and of varied size. During both training and testing, the long edge is scaled to 1000 pixels and the height is scaled by the same factor. After scaling, an object occupying less than 0.3% of the original image is usually only a few to a dozen pixels wide or high, which conventional object positioning methods can hardly detect. Since the size of the car number varies, the positioning model must detect both large and small car numbers. In this embodiment, the multi-scale problem is handled by adding a feature pyramid network and fusing its middle- and high-level features; preferably, the 4th and 5th layers are used to predict the coordinates and scores of the car-number region, which addresses car-number positioning at different resolutions.
Specifically, the method comprises the following steps:
a: and performing calculation of feature extraction through a convolutional network to obtain features of different levels from a bottom layer to a high layer.
B: the high-level features are up-sampled, and then convolution operation is carried out on the adjacent low levels of the high-level features, and then the high-level features and the adjacent low levels of the high-level features are fused to obtain a new feature map.
And generating reference frames with different sizes on different fusion layers by utilizing the fusion of the high layer and the low layer, and performing regression and classification operations respectively to generate the candidate regions of the train number text.
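Steps A and B can be sketched roughly as follows (a PyTorch-style illustration in which the channel numbers and layer names are assumptions, not values given in the patent):

```python
import torch.nn as nn
import torch.nn.functional as F


class TwoLevelFeaturePyramid(nn.Module):
    """Sketch of the fusion described above: a 1x1 convolution turns the
    conv5 features into newP5, which is up-sampled and added to the
    1x1-projected conv4 features to give newP4."""

    def __init__(self, c4_channels=512, c5_channels=512, out_channels=256):
        super().__init__()
        self.lateral5 = nn.Conv2d(c5_channels, out_channels, kernel_size=1)
        self.lateral4 = nn.Conv2d(c4_channels, out_channels, kernel_size=1)

    def forward(self, c4, c5):
        new_p5 = self.lateral5(c5)                                    # high-level features
        up = F.interpolate(new_p5, size=c4.shape[-2:], mode="nearest")
        new_p4 = self.lateral4(c4) + up                               # fuse with the adjacent lower level
        return new_p4, new_p5
```

Car-number candidate regions would then be regressed and classified on both new_p4 and new_p5.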
S3, training the train number recognition network with the train number regions output by the train number positioning network.
The car number recognition network is an Attention-OCR network.
S4, the size of the panoramic image of the train to be identified is adjusted and input into the trained train number positioning network, and the accurately positioned train number area is obtained.
The resizing in this step is the same as in step S1, and the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
Accurate positioning means that the located region box tightly encloses the train number: no digit or part of a digit is omitted, and the region contains no redundant area.
S5, inputting the train number region into the trained train number recognition network for recognition to obtain the train number recognition result.
The obtained train number region is input into the trained Attention-OCR network for recognition; the feature sequence is processed through a convolutional layer, an encoding layer and a decoding layer to obtain the specific train number content, and a 7-digit train number recognition result is output.
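The patent names the Attention-OCR network but gives no implementation details; the following is only a heavily reduced sketch of an attention-based encoder-decoder recognizer of this general kind (the layer sizes, vocabulary and structure are assumptions):

```python
import torch
import torch.nn as nn


class AttentionRecognizerSketch(nn.Module):
    """Reduced sketch: CNN features -> BiLSTM encoder -> attention decoder
    emitting one character per step (illustrative, not the patented network)."""

    def __init__(self, vocab_size=11, enc_hidden=256, dec_hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(                 # turn the 32-px-high crop into a feature sequence
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))
        self.encoder = nn.LSTM(128, enc_hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * enc_hidden + dec_hidden, 1)
        self.decoder = nn.LSTMCell(2 * enc_hidden, dec_hidden)
        self.classifier = nn.Linear(dec_hidden, vocab_size)

    def forward(self, image, max_len=7):
        feats = self.cnn(image).squeeze(2).permute(0, 2, 1)          # (B, W', 128)
        enc, _ = self.encoder(feats)                                 # (B, W', 2*enc_hidden)
        b, t, _ = enc.shape
        h = enc.new_zeros(b, self.decoder.hidden_size)
        c = enc.new_zeros(b, self.decoder.hidden_size)
        logits = []
        for _ in range(max_len):                                     # one digit per decoding step
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand(b, t, -1)], dim=-1))
            alpha = torch.softmax(scores, dim=1)                     # attention weights over the width axis
            context = (alpha * enc).sum(dim=1)                       # weighted context vector
            h, c = self.decoder(context, (h, c))
            logits.append(self.classifier(h))
        return torch.stack(logits, dim=1)                            # (B, max_len, vocab_size)
```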
The following are simulation experiments performed using the above method:
since the continuous 7 digital texts on the surface of the train are the required train number information, other letters, Chinese characters and identifiers are interference texts. Therefore, only the vehicle number information needs to be identified in the positioning and identifying process, and interference of other information is avoided.
In this simulation embodiment, the acquired train panoramas are divided into a training set and a test set in a ratio of 2:1. The test set is used to test and evaluate the method and has no intersection with the training set.
During the experiments, 4352 panoramic train images captured by a monitoring camera at a freight train station were randomly annotated to serve as the training set, each image containing a train number region. The test set uses 2109 randomly annotated images that have no intersection with the training set.
Three public evaluation metrics are used for the car-number positioning and recognition results: precision, recall and F1-Measure (F1 value). F1-Measure is the harmonic mean of precision and recall. Precision is the ratio of the number of correct positioning or recognition results to the total number of results, and recall is the ratio of the number of correct results to the number of manually annotated GroundTruth results. All three metrics lie between 0 and 1; the closer to 1, the better the effect.
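These metrics can be computed directly from the counts (a trivial sketch; the function name is illustrative):

```python
def precision_recall_f1(num_correct, num_predicted, num_groundtruth):
    """Precision, recall and F1-Measure as defined above."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_groundtruth if num_groundtruth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```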
The train number positioning uses the train number positioning network of this embodiment. The experimental environment is Ubuntu 16.04 on a computer equipped with a Tesla K40c graphics card. The input image may be of any size; its width and height are scaled by the same factor until the height is 600 pixels, and if the width then exceeds 1000 pixels, the image continues to be reduced until its width is 1000 pixels. When the reference boxes are generated, their width is fixed at 16 pixels, and the IoU threshold is set to 0.5.
In the feature pyramid network used in this simulation example, the newP5 layer is obtained by applying a 1 × 1 convolution to the 5th layer of the feature extraction network.
The high-level feature map (newP5) is up-sampled by a factor of 2, a 1 × 1 convolution is applied to the 4th layer, and the two are fused to obtain a new feature map (newP4). On the newP5 features, reference boxes are generated with the width fixed at 16 and heights changing geometrically by a factor of 0.7 from 283 down to 11; on the newP4 features, reference boxes are generated with the width fixed at 16 and heights changing geometrically by a factor of 0.7 from 142 down to 6.
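The reference-box heights described above can be generated roughly as follows (a sketch; the rounding and stopping rule are assumptions about how the geometric 0.7 progression is realized, beyond the ranges listed in the patent):

```python
def reference_box_heights(h_max, h_min, ratio=0.7):
    """Heights shrinking geometrically by `ratio` from h_max down to about h_min."""
    heights = []
    h = float(h_max)
    while round(h) >= h_min:
        heights.append(int(round(h)))
        h *= ratio
    return heights


# Illustrative usage for the two fused feature levels (width is fixed at 16 px)
p5_heights = reference_box_heights(283, 11)   # reference boxes on the newP5 features
p4_heights = reference_box_heights(142, 6)    # reference boxes on the newP4 features
```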
In the boundary-sensitive fine-grained text-box strategy and the difficult sample mining strategy, a complete text-line GroundTruth is divided into 6 small GroundTruths with a fixed width of 16.
When the prior-art base network is used to position the car number, the precision, recall and F1 obtained are 0.86, 0.81 and 0.83 respectively.
Adding only the horizontal-direction regression layer to the base network raises the precision, recall and F1 to 0.89, 0.84 and 0.86 respectively; adding the boundary-sensitive fine-grained text-box accurate positioning strategy and the difficult sample mining strategy to the horizontal regression layer raises them to 0.99, 0.92 and 0.94; adding the feature pyramid network further raises the recall to 0.93.
The test set contains 58 small-car-number images (the car-number region accounts for 0.21%-0.3% of the whole image); the positioning method with the feature pyramid network detects 43 of them, whereas the method without the feature pyramid network detects only 6, demonstrating the effectiveness of the feature pyramid network on small car numbers.
The train number recognition uses the train number recognition network of this embodiment. The experimental environment is Ubuntu 16.04 on a computer equipped with a Tesla K40c graphics card. The input of the recognition network is the output of the car number positioning network in the previous step. During training and testing of the car number recognition network, the height of the input car-number region image is scaled to 32 and the width is scaled by the same proportion. The encoder hidden layer has 256 features, the decoder hidden layer has 512 features, and the training process is iterated 20000 times. Compared with the base network, the F1 index of car-number character recognition rises from 0.89 to 0.94.
Fig. 3 and Fig. 4 show two groups of comparisons of train number positioning and recognition results. In each group, the upper row is the positioning result of the basic deep learning network and the lower row is the positioning result of this embodiment; the positioned region is indicated by the dashed box. Referring to Fig. 3, the car number recognized by this embodiment is 6281442, whereas the Baidu Cloud general text recognition system returns three results: (1) g70; (2) 6261442 has been modified; (3) light oil. Referring to Fig. 4, the car number recognized by this embodiment is 5073546, whereas the Baidu Cloud general text recognition system returns two results: (1) a first-class buttercup; (2) N17AK 5075546.
It will be appreciated by those skilled in the art that the above-described exemplary embodiments are merely examples, and that other types of applications, which may occur or become known in the future, such as those applicable to the embodiments of the present invention, are also encompassed within the scope of the present invention and are hereby incorporated by reference.
It will be appreciated by those skilled in the art that the various network elements shown in fig. 1 for simplicity only may be fewer in number than in an actual network, but such omissions are clearly not to be considered as a prerequisite for a clear and complete disclosure of the inventive embodiments.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A train number accurate positioning and identification method based on deep learning is characterized by comprising the following steps:
collecting a panoramic image of the train, and adjusting the size of the panoramic image;
constructing a train number positioning network, and training the train number positioning network by taking the adjusted panoramic image as a training set;
training the train number recognition network according to the train number area output by the train number positioning network;
adjusting the size of a panoramic image of the train to be identified, and inputting the panoramic image into a trained train number positioning network to obtain a train number area accurately positioned;
and inputting the train number area into a trained train number identification network for identification to obtain a train number digital identification result.
2. The method of claim 1, wherein the resizing the panorama comprises: the panorama is scaled to a picture with a height of 600 pixels or a picture with a width of 1000 pixels.
3. The method according to claim 1, wherein the objective function of the constructed train number positioning network is shown in the following formula (1), and minimizing this objective function is the final target:

$$L\left(s_i, v_j, c_x^k\right)=\frac{1}{N_s}\sum_i L^{cl}\left(s_i, s_i^{*}\right)+\frac{\lambda_1}{N_v}\sum_j L_v^{re}\left(v_j, v_j^{*}\right)+\frac{\lambda_2}{N_d}\sum_k L_x^{re}\left(c_x^{k}, c_x^{k*}\right)\qquad(1)$$

where $L^{cl}\left(s_i, s_i^{*}\right)$ is the confidence prediction loss for judging whether a reference box is a car number; $L_v^{re}\left(v_j, v_j^{*}\right)$ is the vertical-direction coordinate prediction loss of the car-number candidate region; $L_x^{re}\left(c_x^{k}, c_x^{k*}\right)$ is the horizontal-direction coordinate prediction loss of the car-number candidate region; $s_i$ is the score of the $i$-th reference box, i.e. the predicted probability values that it belongs and does not belong to a car number; $s_i^{*}$ is the classification score of the GroundTruth; $i$ is the index of a training sample, i.e. of a reference box; $v_j$ is the vector formed by the ordinate and height of the $j$-th reference box; $v_j^{*}$ is the vector formed by the ordinate and height of the GroundTruth; $j$ is the index of a reference box lying within the y-direction regression range; $c_x$ is the abscissa of the center point of the predicted car-number candidate box; $c_x^{*}$ is the abscissa of the center point of the GroundTruth; $c_x^{a}$ is the abscissa of the center point of the reference box; $w^{(a)}$ is the width of the reference box; $k$ is the index of a reference box lying within the x-direction regression range; $L^{cl}$ adopts the Softmax loss; $L_v^{re}$ and $L_x^{re}$ adopt the smooth-L1 loss; $\lambda_1$ and $\lambda_2$ are empirical parameters, 1.0 and 2.0 respectively; $N_s$ denotes the number of reference boxes in the iterative process of optimizing the objective function; $N_v$ denotes the number of reference boxes lying within the y-direction regression range; and $N_d$ denotes the number of reference boxes lying within the x-direction regression range.
4. The method according to claim 3, wherein j is the index of a reference box lying within the y-direction regression range, i.e. a reference box whose area intersection-over-union with the GroundTruth is greater than a set threshold; and k is the index of a reference box lying within the x-direction regression range, i.e. any reference box located within a band of $w^{(a)}$ pixels extending rightwards from the right boundary of the car-number GroundTruth, or within a band of $w^{(a)}$ pixels extending leftwards from its left boundary.
5. The method according to claim 3, wherein training the train number positioning network with the adjusted panoramic images as a training set comprises: according to a difficult sample mining strategy, setting the GroundTruth classification score $s_i^{*}$ of difficult samples to 1, a difficult sample being a reference box whose IoU with a small text-segment GroundTruth lying within a certain range of the left or right boundary of the car-number region exceeds a set threshold, or whose IoU with that small text-segment GroundTruth is the maximum;
when predicting the abscissa $c_x$ of the center point of the car-number candidate box, using a boundary-sensitive fine-grained text-box accurate positioning strategy, namely dividing the manually annotated whole car-number text-line GroundTruth into small GroundTruths of fixed width, the text-line GroundTruth of one car number being divided into 6 to 10 such small GroundTruths; and, when calculating the center abscissa $c_x^{*}$ of the small GroundTruths that determine the left and right boundaries, generating the small GroundTruth by extending the fixed pixel width leftwards from the right boundary of the whole text-line GroundTruth, or by extending the fixed pixel width rightwards from the left boundary of the whole text-line GroundTruth.
6. The method according to claim 4 or 5, wherein the set threshold is a fixed value within the interval [0.5, 0.7].
7. The method of claim 1 wherein said train number identification network is an Attention-OCR network.
8. The method according to claim 3, wherein training the train number positioning network with the adjusted panoramic images as a training set comprises: when the proportion of the train number region in the train panoramic image is less than 0.3%, adding a feature pyramid network during feature extraction.
CN201911166263.9A 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning Active CN110991447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911166263.9A CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911166263.9A CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Publications (2)

Publication Number Publication Date
CN110991447A true CN110991447A (en) 2020-04-10
CN110991447B CN110991447B (en) 2024-05-17

Family

ID=70086514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911166263.9A Active CN110991447B (en) 2019-11-25 2019-11-25 Train number accurate positioning and identifying method based on deep learning

Country Status (1)

Country Link
CN (1) CN110991447B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN113283418A (en) * 2021-04-15 2021-08-20 南京大学 Text detection attack method
CN113327426A (en) * 2021-05-26 2021-08-31 国能朔黄铁路发展有限责任公司 Vehicle type code identification method and device and vehicle number identification method and device
CN113371035A (en) * 2021-08-16 2021-09-10 山东矩阵软件工程股份有限公司 Train information identification method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120106802A1 (en) * 2010-10-29 2012-05-03 National Chiao Tung University Vehicle license plate recognition method and system thereof
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110472633A (en) * 2019-08-15 2019-11-19 南京拓控信息科技股份有限公司 A kind of detection of train license number and recognition methods based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120106802A1 (en) * 2010-10-29 2012-05-03 National Chiao Tung University Vehicle license plate recognition method and system thereof
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110472633A (en) * 2019-08-15 2019-11-19 南京拓控信息科技股份有限公司 A kind of detection of train license number and recognition methods based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MICHAELLIU_DEV: "CTPN (Detecting Text in Natural Image with Connectionist Text Proposal Network) Algorithm Explained", pages 1 - 2, Retrieved from the Internet <URL:https://blog.csdn.net/michaelshare/article/details/86176989> *
YUN ZHAO等: "Training Cascade Compact CNN With Region-IoU for Accurate Pedestrian", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》, vol. 21, no. 9, 13 August 2019 (2019-08-13) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926637A (en) * 2021-02-08 2021-06-08 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN112926637B (en) * 2021-02-08 2023-06-09 天津职业技术师范大学(中国职业培训指导教师进修中心) Method for generating text detection training set
CN113283418A (en) * 2021-04-15 2021-08-20 南京大学 Text detection attack method
CN113283418B (en) * 2021-04-15 2024-04-09 南京大学 Text detection attack method
CN113327426A (en) * 2021-05-26 2021-08-31 国能朔黄铁路发展有限责任公司 Vehicle type code identification method and device and vehicle number identification method and device
CN113371035A (en) * 2021-08-16 2021-09-10 山东矩阵软件工程股份有限公司 Train information identification method and system
CN113371035B (en) * 2021-08-16 2021-11-23 山东矩阵软件工程股份有限公司 Train information identification method and system

Also Published As

Publication number Publication date
CN110991447B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
Bang et al. Encoder–decoder network for pixel‐level road crack detection in black‐box images
CN110991447B (en) Train number accurate positioning and identifying method based on deep learning
CN109447018B (en) Road environment visual perception method based on improved Faster R-CNN
Siriborvornratanakul An automatic road distress visual inspection system using an onboard in‐car camera
CN110390251B (en) Image and character semantic segmentation method based on multi-neural-network model fusion processing
CN110555433B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN111709416B (en) License plate positioning method, device, system and storage medium
CN113033604B (en) Vehicle detection method, system and storage medium based on SF-YOLOv4 network model
CN113468967A (en) Lane line detection method, device, equipment and medium based on attention mechanism
CN110008900B (en) Method for extracting candidate target from visible light remote sensing image from region to target
CN102902974A (en) Image based method for identifying railway overhead-contact system bolt support identifying information
CN114092917B (en) MR-SSD-based shielded traffic sign detection method and system
CN112633149A (en) Domain-adaptive foggy-day image target detection method and device
CN115239644B (en) Concrete defect identification method, device, computer equipment and storage medium
CN114742799A (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
CN110008899A (en) A kind of visible remote sensing image candidate target extracts and classification method
Li et al. Pixel‐Level Recognition of Pavement Distresses Based on U‐Net
CN111881984A (en) Target detection method and device based on deep learning
CN111611933A (en) Information extraction method and system for document image
Zhang et al. Image-based approach for parking-spot detection with occlusion handling
CN114463205A (en) Vehicle target segmentation method based on double-branch Unet noise suppression
Mei et al. A conditional wasserstein generative adversarial network for pixel-level crack detection using video extracted images
CN113158954A (en) Automatic traffic off-site zebra crossing area detection method based on AI technology
CN115346206B (en) License plate detection method based on improved super-resolution deep convolution feature recognition
CN116128866A (en) Power transmission line insulator fault detection method based on USRNet and improved MobileNet-SSD algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant