JP2022025008A

JP2022025008A - License plate recognition method based on text line recognition

Info

Publication number: JP2022025008A
Application number: JP2021105233A
Authority: JP
Inventors: 黄徳双; De Shuang Huang; 秦魏; Wei Qin
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-07-28
Filing date: 2021-06-24
Publication date: 2022-02-09
Anticipated expiration: 2041-06-24
Also published as: CN111914838A; CN111914838B; JP7246104B2

Abstract

To provide a license plate recognition method based on text line recognition having high robustness and high performance.

SOLUTION: The license plate recognition method includes: a step of acquiring an original image; a license plate detection step of detecting a license plate part in the original image to obtain a license plate image; a text line detection step of detecting a text line on the license plate through a text detection network to obtain a license plate text line image; and a text line recognition step of inputting the license plate text line image into a license plate text line recognition network, and finally outputting a character sequence of the license plate text line to complete license plate recognition.

SELECTED DRAWING: Figure 1

Description

The present invention relates to license plate recognition technology based on image processing and pattern recognition, and more particularly to a license plate recognition method based on text line recognition.

License plate detection and recognition is a typical computer vision task with broad application prospects in intelligent transportation systems. As modern transportation systems develop, traffic volume is increasing rapidly, and license plate recognition systems can assist traffic management, public safety, and similar functions.

For more than a decade, the license plate recognition problem has received wide attention in the industry. Given the many factors that affect image quality, such as the shooting environment (lighting, position, defocus blur, etc.), picture quality (resolution, etc.), and complex backgrounds, license plate recognition in arbitrary scenes remains difficult.

The recognition methods of some existing license plate recognition systems mainly comprise the steps of license plate detection, character segmentation, and scene text recognition. License plate recognition can be summarized in two parts: detecting the position of the license plate in a natural image, and recognizing the text information on the detected license plate. Among the workflows of existing systems, some focus on realizing a complete workflow from the input natural image to the output text content, while others add vehicle detection before license plate detection to improve recognition accuracy.

Existing license plate recognition methods can be divided into deep-learning-based methods and non-deep-learning-based methods. Before the development of deep learning, license plates were generally recognized roughly on the basis of color information, text information, or license plate edge information. The methods used were generally restricted Boltzmann machines or support vector machines.

In recent years, with the development of deep learning, license plate recognition methods based on character segmentation have become relatively popular. These methods require pre-segmented training data, which makes the training data difficult to label, and because they generate images automatically from font files, their recognition performance and robustness are relatively low.

An object of the present invention is to overcome the above defects of the prior art by providing a license plate recognition method based on text line recognition that improves recognition performance and robustness.

The object of the present invention may be achieved by the following technical solution.

A license plate recognition method based on text line recognition, comprising:
S1: acquiring an original image;
S2: a license plate detection step of detecting the license plate portion in the original image to obtain a license plate image;
S3: a text line detection step of detecting the text lines on the license plate with a CTPN network to obtain license plate text line images; and
S4: a text line recognition step of inputting the license plate text line image into a license plate text line recognition network and finally outputting the character sequence of the license plate text line, thereby completing license plate recognition.
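The four steps S1 to S4 can be sketched as a small pipeline. This is only an illustrative skeleton: the function name `recognize_plates` and its three callable parameters are hypothetical placeholders for the detection, text line detection, and recognition networks, not interfaces defined by the source.

```python
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")  # hypothetical placeholder for any image type


def recognize_plates(
    original: Image,                                         # S1: the acquired image
    detect_plates: Callable[[Image], Sequence[Image]],       # S2: plate detector
    detect_text_lines: Callable[[Image], Sequence[Image]],   # S3: text line detector
    recognize_line: Callable[[Image], str],                  # S4: line recognizer
) -> List[List[str]]:
    """Return, for each detected plate, the recognized text of each of its lines."""
    results = []
    for plate in detect_plates(original):        # S2: crop plate sub-images
        lines = detect_text_lines(plate)         # S3: split the plate into line images
        results.append([recognize_line(line) for line in lines])  # S4: read each line
    return results
```

Passing the networks in as callables keeps the skeleton independent of any particular framework; a two-line plate simply yields a list with two strings.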

Further, in step S2, the license plate portion in the original image is detected by a YOLOv3 network.

Further, the YOLOv3 network reduces the feature map dimensions of the original image five times, obtaining a first, second, third, fourth, and fifth feature map in turn; the third, fourth, and fifth feature maps are then upsampled and concatenated, and a feature tensor is finally output to complete detection of the license plate.

Further, step S3 specifically comprises:
S31: predicting the vertical detection frames of the CTPN network with a regression model;
S32: applying boundary optimization to the predicted vertical detection frames, which prevents the inaccurate horizontal localization that can occur in the CTPN network;
S33: merging vertical detection frames whose overlap in the vertical direction reaches a set threshold into a single detection frame to obtain the final vertical detection frames, which prevents the CTPN network from splitting text on the same line into two parts; and
S34: detecting text lines with the CTPN network by means of the vertical detection frames to obtain license plate text line images.

Further, the center position t_c and the height t_h of a vertical detection frame are calculated as follows.
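The formula image is not reproduced in this text. Assuming the standard CTPN vertical-coordinate parameterization, which is consistent with the symbols defined in the surrounding text but is not confirmed by the source, the two quantities would read:

```latex
t_c = \frac{c_y^{b} - c_y^{a}}{h^{a}}, \qquad
t_h = \log\!\left(\frac{h^{b}}{h^{a}}\right)
```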

Here, c_y^b is the center position of the bounding box, h^b is the height of the bounding box, c_y^a is the center of the anchor box, and h^a is the height of the anchor box. In the boundary optimization, a horizontal offset t_w is calculated for each vertical detection frame, as follows.
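The offset formula image is likewise missing. Under the assumption that it follows the side-refinement offset of the original CTPN model, with the symbols defined in the surrounding text, it would take the form:

```latex
t_w = \frac{x_{side}^{a} - c_x^{a}}{w^{a}}
```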

Here, x_side^a is the coordinate closest to the actual horizontal boundary of the license plate, c_x^a is the x-coordinate of the center of the vertical detection frame, and w^a is the width of the vertical detection frame.

Further, the license plate text line recognition network comprises a rectification network and a text recognition network, which respectively rectify the license plate text line image and recognize its characters. The rectification network corrects distorted and skewed text by a two-dimensional transformation, and the text recognition network is a seq2seq network with a built-in attention mechanism following the encoder-decoder paradigm.

Further, the rectification network includes a localization network. The localization network predicts the control point vector group A of the original text line and, through back-propagated gradients, regresses the control point vector group A^r of the rectified text line. Based on the relationship between A and A^r, the rectification network applies a two-dimensional transformation to the original text line image to obtain the rectified text line image.

More preferably, the localization network comprises six convolutional filter layers, five max-pooling layers, and two fully connected layers. There are five control points: the four vertices of the license plate text line and the intersection of its diagonals.

Further, the text recognition network performs character recognition on the license plate text line image as follows:
S41: the encoder uses a convolutional neural network to extract a text feature map from the rectified license plate text line image, then splits the text feature map and feeds it into a bidirectional LSTM network to obtain a text feature sequence; and

S43: the decoder uses the context vector, the decoder internal state, and the output of the previous step, and through the attention mechanism and a GRU recurrent network unit outputs the probability of each character and of the end-of-sequence token, thereby predicting the current text symbol.

Further, the decoder is computed as follows.

Here, Equation 4 is the prediction probability of the current text symbol output by the decoder at step t_2, Equation 5 is the internal state vector of the decoder at step t_2, Equation 6 is the internal state vector of the decoder at step t_2 - 1, Equation 7 is the context vector of the decoder at step t_2, Equation 8 is the output of the decoder at step t_2 - 1, rnn is the GRU recurrent network unit, Equation 9 is the character classification probability map, W_0 is a fully connected network parameter, and b_0 is a fully connected network bias.

The context vector (Equation 7) of the decoder at step t_2 is obtained by the attention mechanism, as follows.

Here, W_conv is a convolutional network parameter, Equation 11 is a fully connected network parameter, Equation 12 is the text feature sequence of the encoder at step t_1, Equation 13 is a weighting parameter, T is the length of the input sequence, Equation 14 is the weighting parameter at time k, v, W, and V are all fully connected network parameters, BLSTM is the bidirectional LSTM network, Equation 15 is the vector sequence obtained by splitting the encoder's text feature map at step t_1, and Equation 16 is the text feature sequence of the encoder at step t_1 - 1.
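Because the attention equations themselves are not reproduced in this text, the following NumPy sketch shows only the general shape of such a step using plain additive attention. It is an assumption, not the patented formula: the parameters `W`, `V`, and `v` mirror the fully connected parameters named above, the convolutional term `W_conv` is omitted, and the function name `attention_context` is hypothetical.

```python
import numpy as np


def attention_context(s_prev, H, W, V, v):
    """One step of additive attention (illustrative sketch).

    s_prev: (d_s,)    previous decoder state
    H:      (T, d_h)  encoder feature sequence
    W:      (d_a, d_s), V: (d_a, d_h), v: (d_a,)  projection parameters
    Returns the context vector (d_h,) and the attention weights (T,).
    """
    scores = np.tanh(s_prev @ W.T + H @ V.T) @ v    # (T,) one score per encoder step
    scores = scores - scores.max()                  # stabilise the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()   # weights sum to 1 over the T steps
    context = alpha @ H                             # weighted sum of encoder features
    return context, alpha
```

In a full decoder, the context vector would be combined with the previous output and passed through the GRU unit, followed by a fully connected layer and a softmax over the character set.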

Compared with the prior art, the present invention has the following advantages.
1) The present invention proposes an entirely new license plate recognition method without character segmentation, introducing license plate text line detection in its place. Compared with character segmentation, the invention detects continuous text on the same line as a whole, which improves the training effectiveness of the subsequent recognition model. Compared with existing license plate recognition methods, the method improves both the robustness and the recognition accuracy of the model.
2) The present invention converts the license plate recognition problem into a classical computer vision problem, image-based sequence recognition. The training data therefore needs only the two-dimensional coordinates of the license plate and the character sequence to be recognized, which saves model training time and cost.
3) Because license plate text line detection makes the method applicable to license plates with multiple lines of text, the invention can recognize the differing license plates of multiple countries and regions. It can therefore be used not only for everyday urban traffic management but also for intercity and even international traffic management, can become an important component of smart city construction, and actively promotes the combination of artificial intelligence technology with China's urban construction, road construction, and traffic management.

FIG. 1 is an overall flowchart of the method of the present invention.
FIG. 2 is a schematic diagram of the YOLOv3 network structure for license plate detection.
FIG. 3 is a schematic diagram of the network model for license plate text line detection.
FIG. 4 is a schematic diagram of the network model for license plate text line rectification.
FIG. 5 is a schematic diagram of the localization network for predicting license plate text line control points.
FIG. 6 is a schematic diagram of the network model for license plate text line recognition.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings and specific embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Embodiment:
As shown in FIG. 1, the present invention provides a license plate recognition method based on text line recognition that can recognize license plates in natural scenes. The method is based on convolutional neural networks and recurrent neural networks and mainly comprises three steps: license plate detection (LPD), license plate text detection (LP Text Detection), and license plate text recognition (LP Text Rectification and Recognition).

In the license plate detection step, the YOLOv3 network detects the license plate portion in the original image. For example, in FIG. 1 the original image shows a person riding a motorcycle; after license plate detection, the sub-image containing the license plate is extracted from the original image.

In the license plate text line detection step, the CTPN network separates the text lines on the license plate. Text on license plates commonly seen internationally may be single-line or multi-line. For multi-line text, the text must first be split into several single lines to facilitate subsequent recognition. For single-line text, this step is also indispensable because, as can be seen directly from the images, the license plate detection of the previous step cannot always localize the text line on the plate precisely. As in FIG. 1, a two-line license plate text is split into an upper and a lower line, and each is sent separately to the subsequent network for text recognition.

The license plate text line recognition step recognizes the characters of each text line with a TPS-based rectification network and a recognition network based on a Seq2Seq model with an attention mechanism, completing license plate recognition. Because the license plate may appear skewed in the image owing to factors such as the shooting angle, the text line must be rectified before recognition to improve recognition performance. In FIG. 1, the two text lines are each rectified and recognized, and the complete license plate recognition result is then obtained.

The specific execution of the three steps is as follows.

(1) License plate detection step
When the YOLOv3 network is used to detect the license plate portion in the original image, the original input image is first divided into a grid; if the center of a license plate falls within a grid cell, that cell is responsible for detecting the license plate.
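The grid assignment rule can be illustrated with a few lines of Python. This is a minimal sketch: the function name `responsible_cell` and the pixel-coordinate convention are assumptions made for illustration, not part of the source.

```python
def responsible_cell(cx, cy, img_w, img_h, grid):
    """Return (row, col) of the grid cell that contains the box centre (cx, cy).

    cx, cy are pixel coordinates; grid is the number of cells per side.
    Centres lying exactly on the right or bottom edge are clamped to the last cell.
    """
    col = min(int(cx * grid / img_w), grid - 1)
    row = min(int(cy * grid / img_h), grid - 1)
    return row, col
```

For a 608 x 608 input and a 19 x 19 grid, each cell covers 32 x 32 pixels, so a plate centred at pixel (304, 304) is assigned to cell (9, 9).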

As shown in FIG. 2, the backbone of the YOLOv3 network is the classic Darknet-53, which consists mainly of a 53-layer convolutional network and includes a bottom-up path, a top-down path, and lateral connections.

The present invention sets the input image resolution to 608*608 and, following the Darknet-53 network structure, reduces the feature map dimension five times: 304, 152, 76, 38, 19. To improve the network's performance on targets of different sizes, the YOLOv3 network detects license plates on feature maps at three different scales, namely 76, 38, and 19. The feature tensors of different sizes are upsampled and then concatenated, so that the final output feature tensor has both high precision and rich semantics. To reduce the regression complexity of the bounding boxes, the invention introduces the Anchor Box concept of Faster R-CNN, or the Prior Box concept of SSD, and obtains the prior boxes by k-means clustering.
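The sequence of feature map sizes follows from halving the resolution at each of the five downsampling stages. The short sketch below reproduces that arithmetic; the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size=608, num_downsamples=5):
    """Side length of the feature map after each stride-2 downsampling stage."""
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        size //= 2              # each stage halves the spatial resolution
        sizes.append(size)
    return sizes


sizes = feature_map_sizes()     # [304, 152, 76, 38, 19]
detection_scales = sizes[-3:]   # [76, 38, 19], the three scales used for detection
```

The coarsest map (19) suits plates that fill much of the image, while the finest of the three (76) helps with small, distant plates.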

(2) License plate text line detection step
The license plate recognition method of the present invention applies to license plates of multiple countries and regions. As is well known, the text on commonly seen domestic license plates is always a single line, but considering that license plates of other countries carry multiple lines of text, the text on the plate needs to be detected line by line to facilitate subsequent character recognition. For license plates with a single line of text, this step can raise the IoU between the detected region and the actual region.

Unlike general detection targets, a text line is a character sequence with coherent meaning. Because it is relatively difficult for a region proposal network (RPN) to localize the start and end positions of a license plate text line, the CTPN model is adopted to detect license plate text lines.

The CTPN network introduces vertical frames to detect text lines. The vertical frames are a set of detection frames of equal width but differing heights, and each vertical frame is determined by two values, its center position and its height. In the CTPN network, a regression model is used to predict the vertical frames. The center position t_c and the height t_h of a vertical frame are calculated as follows.

Here, c_y^b and h^b denote the center position and the height of the bounding box, respectively, while c_y^a and h^a can be computed in advance from the input image to assist the calculation. However, because the image is divided horizontally into regions with a fixed width of 16 pixels, there is no guarantee that the text line detection frames fully cover the actual license plate region in the horizontal direction, so horizontal localization in the CTPN model may be inaccurate. To solve this problem, a boundary optimization method is introduced: a horizontal offset is calculated for each vertical frame, as follows.

Here, x_side^a denotes the coordinate closest to the actual horizontal boundary of the license plate, c_x^a denotes the x-coordinate of the center of the vertical frame, and w^a denotes the width of the vertical frame.
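The regression quantities t_c, t_h, and the horizontal offset can be illustrated in code. Since the formula images are not reproduced in this text, the sketch assumes the standard CTPN parameterization (center offset normalized by anchor height, log height ratio, side offset normalized by anchor width); all function names are hypothetical.

```python
import math


def encode_vertical(cy_b, h_b, cy_a, h_a):
    """Regression targets for a bounding box (b) relative to an anchor (a)."""
    t_c = (cy_b - cy_a) / h_a      # centre offset in units of anchor height
    t_h = math.log(h_b / h_a)      # log ratio of box height to anchor height
    return t_c, t_h


def decode_vertical(t_c, t_h, cy_a, h_a):
    """Invert encode_vertical: recover the box centre and height."""
    return t_c * h_a + cy_a, math.exp(t_h) * h_a


def side_offset(x_side, cx_a, w_a=16.0):
    """Horizontal refinement target; w_a defaults to the 16-pixel frame width."""
    return (x_side - cx_a) / w_a
```

Encoding followed by decoding returns the original center and height, which is a quick sanity check that the two directions agree.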

As shown in FIG. 3, the backbone of the CTPN model is a VGG16 network. The input image may be of any size, and the size of the feature map output by VGG16 depends on the size of the input image. Features are extracted through multiple convolutions, finally yielding a W*H*N feature map, where N is the number of feature channels and W and H are the width and height of the feature map. Next, 256 convolution kernels of size 3*3 slide over the feature map and extract a 256-dimensional feature vector at each pixel position. The 256-dimensional vectors extracted within the same row of the picture are treated as one sequence and fed into a BLSTM module, which is followed by a 512-dimensional fully connected layer and an output layer.

The CTPN network may split text on the same line into two parts. The present invention therefore introduces detection frame merging as a post-processing step: when the overlap of two detections in the vertical direction reaches a certain degree, they are merged into a single detection frame. Specifically, a threshold is set, and when the vertically overlapping portion exceeds the threshold, the two are merged.
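A minimal sketch of this merging rule follows. The source does not specify how the overlap is measured or what threshold is used, so the overlap ratio (intersection of the vertical extents divided by the shorter height), the default threshold of 0.7, and the function names are all assumptions.

```python
def vertical_overlap(a, b):
    """Overlap of the y-ranges of two (x1, y1, x2, y2) boxes, relative to the shorter box."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(inter, 0.0) / shorter if shorter > 0 else 0.0


def merge_same_line(boxes, threshold=0.7):
    """Greedily merge boxes whose vertical overlap reaches the threshold."""
    merged = []
    for box in sorted(boxes):                      # process boxes left to right
        for i, m in enumerate(merged):
            if vertical_overlap(m, box) >= threshold:
                # Replace the group with the union of the two rectangles.
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(tuple(box))              # no match: start a new text line
    return merged
```

With two fragments of the top line and one box for the bottom line of a two-line plate, the fragments collapse into one box while the bottom line stays separate.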

(3) License plate text line recognition step
This step must recognize the text lines already detected on the license plate, but the text lines need to be rectified before recognition. Because of the shooting angle, the characters in the picture may appear distorted; a degree of rectification makes the distorted characters as regular as possible and thereby improves recognition accuracy.

The present invention performs text recognition with a Seq2Seq network that includes a classical attention mechanism. Text rectification is realized by embedding an STN network into the text recognition network, which corrects distorted and skewed text by a 2D transformation.

As shown in FIG. 4, the main idea of the STN network is to model the spatial transformation operation as a neural network. In the image to be rectified, five control points are determined, located at the four vertices of the rectangular frame and at the intersection of its diagonals. Let the input picture be I and the output rectified image be I_r. The vector group formed by the coordinates of the five control points of the original image is denoted A, and the vector group formed by the five control points of the output rectified image is denoted A^r; the coordinates of each control point in the control point vector group A of the original text line are given by Equation 19. The essence of the two-dimensional transformation is to approximate an interpolation function f that satisfies A^r = f(A). The TPS (Thin-Plate-Spline) model has proven very effective for rectifying distorted text, and the license plate picture rectification task can be reduced to the task of predicting the positions of the five control points. A localization network predicts the control points on image I; through back-propagated gradients it regresses the control points of the output image, so that the five control points of the output image are labeled automatically.

The localization network consists of six convolutional filter layers, five max-pooling layers, and two fully connected layers. It outputs a 10-dimensional vector, which is reshaped into five 2-dimensional vectors corresponding to the coordinates of the five control points. The control point coordinates are normalized, so that the top-left vertex is (0, 0) and the bottom-right vertex is (1, 1).
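The reshaping of the 10-dimensional output into five normalized control points can be sketched with NumPy. The ordering of the points and the names `RECTIFIED_POINTS` and `to_control_points` are assumptions for illustration; the source states only that the top-left vertex is (0, 0) and the bottom-right vertex is (1, 1).

```python
import numpy as np

# Canonical positions of the five control points in the rectified image,
# in normalised coordinates. The ordering is an assumption.
RECTIFIED_POINTS = np.array([
    [0.0, 0.0],  # top-left vertex
    [1.0, 0.0],  # top-right vertex
    [1.0, 1.0],  # bottom-right vertex
    [0.0, 1.0],  # bottom-left vertex
    [0.5, 0.5],  # intersection of the diagonals
])


def to_control_points(output_vector):
    """Reshape a 10-dimensional network output into five (x, y) control points."""
    vec = np.asarray(output_vector, dtype=float)
    if vec.shape != (10,):
        raise ValueError("expected a 10-dimensional vector")
    return vec.reshape(5, 2)
```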

Let the coordinates of a point p be [x_p, y_p]; the coordinates of the corresponding rectified point p' can be calculated as follows.

Here, Φ(x) = x^2 log(x) is the kernel function applied to the Euclidean distance between the point p and the k-th control point.

After the TPS parameters are obtained by solving a linear system, the final output rectified image is given by the following formula.

Here, V is the downsampler, I is the input picture, and I_r is the rectified picture; the pixels of the original and rectified images are passed through downsampling to obtain the final rectified image.
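How TPS parameters can be obtained from a linear system is illustrated below with the textbook thin-plate-spline formulation and the kernel Φ(r) = r^2 log(r). This is a sketch under assumptions, not the patent's exact equations, which are not reproduced in this text; the function names are hypothetical, and the mapping direction (source control points onto target control points) is chosen for illustration only. Practical implementations often fit the reverse direction so that each output pixel can look up its source location.

```python
import numpy as np


def tps_kernel(r):
    """Phi(r) = r^2 * log(r), with Phi(0) taken as 0."""
    r = np.asarray(r, dtype=float)
    safe = np.where(r > 0, r, 1.0)       # log(1) = 0, so zero distances contribute 0
    return safe ** 2 * np.log(safe)


def fit_tps(src, dst):
    """Solve the TPS linear system mapping control points src (K, 2) onto dst (K, 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    k = len(src)
    dists = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    P = np.hstack([np.ones((k, 1)), src])            # affine basis [1, x, y]
    system = np.zeros((k + 3, k + 3))
    system[:k, :k] = tps_kernel(dists)
    system[:k, k:] = P
    system[k:, :k] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(system, rhs)              # K kernel weights, then 3 affine rows


def apply_tps(params, src, points):
    """Map arbitrary points (N, 2) with parameters returned by fit_tps."""
    src = np.asarray(src, dtype=float)
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - src[None, :, :], axis=-1)
    basis = np.hstack([tps_kernel(dists), np.ones((len(points), 1)), points])
    return basis @ params
```

By construction the fitted transform interpolates the control points exactly, and when source and target control points coincide it reduces to the identity.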

As shown in FIG. 6, the license plate text identification network outputs the character sequence of the license plate text line. The network is a seq2seq framework with a built-in attention mechanism that follows the encoder-decoder paradigm.

First, the encoder uses a convolutional neural network to extract features from the corrected license plate text line image, whose size is 32 × 100. The feature-extraction network is a modification of ResNet-50 in which the convolution kernels of the last three downsampling layers use a stride of (2, 1). This ensures that the feature map on each feature channel is a single vector, so the final feature map has size 1 × 25 × 512 (h × w × n). The feature map is then split into a sequence of vectors X = [x_1, x_2, ..., x_T], where T = 25 is the feature width w, and each vector in the sequence is 512-dimensional, which is the number of feature channels n.
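The size arithmetic above can be checked with a short sketch. It assumes five downsampling stages in total, the first two with stride (2, 2) and the last three with stride (2, 1); only the last three strides are stated in the text, so the first two are an assumption chosen to reproduce the stated 1 × 25 output. The function names are hypothetical.

```python
from typing import List, Tuple


def feature_map_size(height: int, width: int,
                     strides: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Apply each (stride_h, stride_w) stage using ceiling division."""
    for sh, sw in strides:
        height = -(-height // sh)
        width = -(-width // sw)
    return height, width


def split_columns(fmap: List[List[List[float]]]) -> List[List[float]]:
    """Split an h x w x n feature map with h == 1 into w vectors of length n."""
    if len(fmap) != 1:
        raise ValueError("expected a feature map of height 1")
    return [column for column in fmap[0]]


# Assumed stage strides: two (2, 2) stages followed by three (2, 1) stages.
STRIDES = [(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)]
h, w = feature_map_size(32, 100, STRIDES)

# A dummy 1 x 25 x 512 feature map, split into the sequence X = [x_1, ..., x_T].
dummy = [[[0.0] * 512 for _ in range(w)] for _ in range(h)]
X = split_columns(dummy)
```

The height collapses 32 → 16 → 8 → 4 → 2 → 1 while the width stops shrinking at 25, which is why each of the T = 25 columns can be treated as one 512-dimensional vector.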

A bidirectional LSTM (BLSTM) network can capture long-range dependencies of the feature sequence in both directions, so applying a BLSTM to the feature sequence obtained in the previous step yields a feature sequence with richer contextual relationships. The new feature sequence output by the BLSTM is written H = [h_1, h_2, ..., h_T], where any one element (Equation 22) may be expressed as Equation 23.

At any given step, the decoder outputs a probability map based on the context vector C, the decoder's internal state s, and the output y of the previous step. This probability map gives the probability of each character and of the end-of-sequence (EOS) symbol. The context vector C aggregates the information in H and is written C = [c_1, c_2, ..., c_T], C = q(H), where q is the attention mechanism, which may be expressed as Equation 24.
Here, Equation 25 is computed from the encoder hidden state at step t_1 (Equation 26) and the decoder hidden state at step t_2 - 1 (Equation 27), and W, V, and b are all trainable weights.
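How an attention mechanism turns the encoder states H into a context vector can be sketched as follows. The score function here is the standard additive form v·tanh(W s + V h + b), which is an assumption: the text names W, V, and b as trainable weights but does not reproduce Equation 24, so this should not be read as the exact formula of this document. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat: Matrix, vec: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, vec)) for row in mat]


def additive_score(s_prev: Vector, h_t: Vector,
                   W: Matrix, V: Matrix, b: Vector, v: Vector) -> float:
    """Assumed score: v . tanh(W s_prev + V h_t + b)."""
    ws, vh = matvec(W, s_prev), matvec(V, h_t)
    hidden = [math.tanh(x + y + z) for x, y, z in zip(ws, vh, b)]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def context_vector(s_prev: Vector, H: List[Vector],
                   W: Matrix, V: Matrix, b: Vector,
                   v: Vector) -> Tuple[Vector, Vector]:
    """Return the context vector (weighted sum of H) and attention weights."""
    alphas = softmax([additive_score(s_prev, h, W, V, b, v) for h in H])
    dim = len(H[0])
    context = [sum(a * h[d] for a, h in zip(alphas, H)) for d in range(dim)]
    return context, alphas
```

The attention weights sum to 1, so the context vector is a convex combination of the encoder states; when all scores are equal it reduces to their plain average.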

The output of the encoder is also fed into the decoder as input, and the decoder computes an output vector z and a new state vector s.

Here, y is in one-hot form, rnn denotes a GRU recurrent network unit, and the output z is used to predict the current text symbol.

Following the idea of maximum likelihood estimation, the objective function to be optimized in order to maximize the conditional probability of the output sequence is as follows.

The output sequence ends when the output exceeds the maximum length or when the EOS symbol is produced, at which point the identification result for the license plate text line in the image has been obtained. This embodiment uses the beam search algorithm with the beam size parameter set to 5.
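The decoding loop with a beam size of 5 and the two stopping conditions (EOS or maximum length) can be sketched as below. The interface `step_fn`, which maps a token prefix to next-symbol probabilities, the `"<eos>"` marker, and the toy model are all hypothetical stand-ins for the decoder; the maximum length of 10 is likewise an assumed value.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Hypothesis = Tuple[List[str], float]  # (tokens, accumulated log-probability)


def beam_search(step_fn: Callable[[List[str]], Dict[str, float]],
                beam_size: int = 5, max_len: int = 10) -> List[str]:
    """Keep the beam_size best prefixes per step; stop at EOS or max_len."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [tok], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at the maximum length
    if not finished:
        return []
    best = max(finished, key=lambda c: c[1])
    return [t for t in best[0] if t != EOS]


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Toy model that favors the plate 'AB12' and then emits EOS."""
    target = "AB12"
    if len(prefix) >= len(target):
        return {EOS: 1.0}
    return {target[len(prefix)]: 0.9, "X": 0.1}


plate = "".join(beam_search(toy_step, beam_size=5))
```

Compared with greedy decoding, keeping several prefixes lets a hypothesis that starts with a lower-probability character still win if its continuation scores well.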

The method proposed by the present invention was trained and tested on the AOLP and UFPR-ALPR datasets to verify its high robustness and high performance.

In the license plate detection step, a license plate is considered successfully detected if the IoU value is greater than 0.5. The IoU is computed as follows.

IoU = area(R_det ∩ R_gt) / area(R_det ∪ R_gt)

Here, R_det is the detection frame and R_gt is the annotated (ground-truth) frame.
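The IoU criterion can be illustrated for axis-aligned boxes. The `(x1, y1, x2, y2)` box representation and the function names are assumptions made for this sketch; the text does not specify how frames are encoded.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(r_det: Box, r_gt: Box) -> float:
    """Intersection over union of a detection frame and an annotated frame."""
    ix1, iy1 = max(r_det[0], r_gt[0]), max(r_det[1], r_gt[1])
    ix2, iy2 = min(r_det[2], r_gt[2]), min(r_det[3], r_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(r_det) + area(r_gt) - inter
    return inter / union if union > 0.0 else 0.0


def is_detected(r_det: Box, r_gt: Box, threshold: float = 0.5) -> bool:
    """A plate counts as detected when the IoU exceeds the threshold."""
    return iou(r_det, r_gt) > threshold
```

For example, two 2 × 2 boxes offset by one unit in each direction overlap in a 1 × 1 region, giving an IoU of 1/7, which would not count as a successful detection.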

In the license plate text line detection task, IoU is used to evaluate detection accuracy. In addition, the F1-score is used to evaluate performance in the license plate text identification task and in some license plate text detection tasks. It is computed as follows.

F1 = 2 × precision × recall / (precision + recall)

This metric takes both precision and recall into account.
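A minimal sketch of the F1-score computed from counts of true positives, false positives, and false negatives follows. The function name and the count-based interface are assumptions; the text only states that precision and recall are combined.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, or 0.0 if undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a model cannot score well by maximizing precision or recall alone.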

In this embodiment, validation is performed separately on the two datasets. After each step is completed, its results are checked to ensure that every step achieves high performance and high robustness. Unlike the UFPR-ALPR dataset, the AOLP dataset does not come divided into a training set and a test set, so two of its three subsets may be used as the training set and the remaining one as the test set; for example, the license plate identification model is trained on the LE and AC subsets and tested on the RP subset. Detailed test results for the three main steps on each of the two datasets are given in Tables 1 to 6.

The above describes only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and all such modifications or replacements shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

A license plate identification method based on text line identification, comprising:
a step S1 of acquiring an original image;
a license plate detection step S2 of detecting the license plate portion in the original image to obtain a license plate image;
a text line detection step S3 of detecting the text line on the license plate through a text detection network to obtain a license plate text line image; and
a text line identification step S4 of inputting the license plate text line image into a license plate text line identification network and finally outputting the character sequence of the license plate text line to complete license plate identification.

The license plate identification method based on text line identification according to claim 1, wherein in step S2 the license plate portion in the original image is detected by a YOLOv3 network.

The license plate identification method based on text line identification according to claim 2, wherein the YOLOv3 network reduces the dimensions of the feature map of the original image five times to obtain a first feature map, a second feature map, a third feature map, a fourth feature map, and a fifth feature map, respectively; then concatenates the third, fourth, and fifth feature maps after upsampling; and finally outputs a feature tensor to complete detection of the license plate.

The license plate identification method based on text line identification according to claim 1, wherein step S3 specifically comprises:
S31 of predicting the vertical detection frames of the CPTN network using a regression model;
S32 of performing boundary optimization on the predicted vertical detection frames;
S33 of merging vertical detection frames whose degree of overlap in the vertical direction reaches a set threshold into one detection frame to obtain the final vertical detection frames; and
S34 in which the CPTN network detects the text line using the vertical detection frames to obtain the license plate text line image.

The license plate identification method based on text line identification according to claim 4, wherein the center position t_c and the height t_h of the vertical detection frame are calculated according to Equation 1 and Equation 2, where c^b_y is the center position of the bounding frame, h^b is the height of the bounding frame, c^a_y is the center of the anchor frame, and h^a is the height of the anchor frame;

in the boundary optimization, a horizontal offset t_w is calculated for each vertical detection frame, the offset t_w being calculated according to Equation 3;

where x^a_side is the coordinate closest to the actual horizontal boundary of the license plate, c^a_x is the x coordinate of the center position of the vertical detection frame, and w^a is the width of the vertical detection frame.

The license plate identification method based on text line identification according to claim 1, wherein the license plate text identification network includes a correction network and a text identification network, which respectively perform correction and character identification on the license plate text line image; the correction network corrects distorted and warped text by a two-dimensional transformation; and the text identification network is a seq2seq network with a built-in attention mechanism following the encoder-decoder paradigm.

The license plate identification method based on text line identification according to claim 6, wherein the correction network includes a positioning network; the positioning network predicts the control point vector group A of the original text line and obtains the control point vector group A^r of the corrected text line by regression through backpropagated gradients; and the correction network performs a two-dimensional transformation on the original text line image, based on the relationship between the original text line control point vector group A and the corrected text line control point vector group A^r, to obtain the corrected text line image.

The license plate identification method based on text line identification according to claim 7, wherein the positioning network includes six convolutional layers, five max-pooling layers, and two fully connected layers, and there are five control points, namely the four vertices of the license plate text line and the intersection of its diagonals.

The license plate identification method based on text line identification according to claim 6, wherein the character identification performed by the text identification network on the license plate text line image specifically comprises:
S41 in which the encoder uses a convolutional neural network to extract a text feature map from the corrected license plate text line image, then splits the text feature map and inputs it into a bidirectional LSTM network to obtain a text feature sequence h_t;
S42 of inputting the text feature sequence h_t into the decoder; and
S43 in which the decoder uses the context vector, the decoder internal state, and the output of the previous step to output, through the attention mechanism and a GRU recurrent network unit, the probability of each character and of the sequence terminator, and predicts the current text symbol.

The license plate identification method based on text line identification according to claim 9, wherein the decoder is computed according to Equation 4,

where Equation 5 is the prediction probability of the current text symbol output by the decoder at step t_2, Equation 6 is the internal state vector of the decoder at step t_2, Equation 7 is the internal state vector of the decoder at step t_2 - 1, Equation 8 is the context vector of the decoder at step t_2, Equation 9 is the output of the decoder at step t_2 - 1, rnn is a GRU recurrent network unit, Equation 10 is the character classification probability map, W_0 is a fully connected network parameter, and b_0 is a fully connected network bias;

the context vector (Equation 8) of the decoder at step t_2 is obtained by the attention mechanism and is given by Equation 11,

where W_conv is a convolutional network parameter, Equation 12 is a fully connected network parameter, Equation 13 is the text feature sequence of the encoder at step t_1, Equation 14 is a weighting parameter, T is the length of the input sequence, Equation 15 is the weighting parameter at time k, v, W, and V are all fully connected network parameters, BLSTM is the bidirectional LSTM network, Equation 16 is the vector sequence obtained after splitting the text feature map of the encoder at step t_1, and Equation 17 is the text feature sequence of the encoder at step t_1 - 1.