JP2023523745A

JP2023523745A - Character string recognition method, apparatus, equipment and medium based on computer vision

Info

Publication number: JP2023523745A
Application number: JP2022564797A
Authority: JP
Inventors: 志成楊; 睿宇李
Original assignee: Shenzhen Smartmore Technology Co Ltd
Current assignee: Shenzhen Smartmore Technology Co Ltd
Priority date: 2020-07-03
Filing date: 2021-07-02
Publication date: 2023-06-07
Anticipated expiration: 2041-07-02
Also published as: WO2022002262A1; CN111832561B; CN111832561A; JP7429307B2

Abstract

A computer vision-based character string recognition method, apparatus, computer device, and storage medium are provided. The method includes: acquiring an image bearing a character string to be recognized; acquiring, based on a pre-built position detection model, a target area image in which the character string to be recognized is located within the image; horizontally correcting the target area image to obtain a horizontal target area image; acquiring, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and, if the standing state of the character string is upright, inputting the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized. [Selected drawing] FIG. 1

Description

The present application relates to a computer vision-based character string recognition method, apparatus, computer device, and storage medium.

This application claims priority to Chinese application No. 202010630553.0, filed on July 3, 2020 and entitled "Character string recognition method, apparatus, equipment and medium based on computer vision", the disclosure of which is incorporated herein by reference in its entirety.

With the development of computer vision technology, character string recognition has become one of its practical everyday applications. In industrial scenes, for example, it is used to recognize character strings such as serial numbers, production dates, stencils, and inscriptions. In general, a recognition process takes one of three forms. It first detects the position of the character string, crops the character string at the detected position, and finally performs angle determination and recognition on the cropped character string image to obtain the corresponding text content; or it treats character strings as special targets, detects them with a classifier, and aggregates them into a word based on a model of image structure; or it uses a neural network algorithm to establish a matching relationship between image features and character string positions and the corresponding content.

According to several embodiments, a first aspect of the present application provides a computer vision-based character string recognition method, including:
acquiring an image bearing a character string to be recognized;
acquiring, based on a pre-built position detection model, a target area image in which the character string to be recognized is located within the image;
horizontally correcting the target area image to obtain a horizontal target area image;
acquiring, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
if the standing state of the character string is upright, inputting the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

According to several embodiments, a second aspect of the present application provides a computer vision-based character string recognition apparatus, including:
an image acquisition module that acquires an image bearing a character string to be recognized;
a position detection module that acquires, based on a pre-built position detection model, a target area image in which the character string to be recognized is located within the image;
a horizontal correction module that horizontally corrects the target area image to obtain a horizontal target area image;
an angle determination module that acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
a content recognition module that, if the standing state of the character string is upright, inputs the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

According to several embodiments, a third aspect of the present application provides a computer device including a memory in which a computer program is stored and a processor that implements the steps of the above method when executing the computer program.

According to several embodiments, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.

The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the specification, the drawings, and the claims.

To describe the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for that description are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can obviously derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a computer vision-based character string recognition method in one embodiment.
FIG. 2 is a schematic flowchart of acquiring the standing state of the character string in the horizontal target area image based on a pre-built angle determination model in one embodiment.
FIG. 3 is a schematic flowchart of acquiring the target area image in which the character string to be recognized is located within the image based on a pre-built position detection model in one embodiment.
FIG. 4 is a schematic flowchart of inputting the horizontal target area image into a pre-built content recognition model and acquiring the character string content corresponding to the character string to be recognized in one embodiment.
FIG. 5 is a schematic flowchart of a computer vision-based character string recognition method in another embodiment.
FIG. 6 is a schematic flowchart of algorithm training and prediction in one application example.
FIG. 7 is a schematic structural diagram of an image feature pyramid in one application example.
FIG. 8 is a schematic flowchart of a character string angle determination algorithm in one application example.
FIG. 9 is a schematic flowchart of a character string content recognition algorithm in one application example.
FIG. 10 is a structural block diagram of a computer vision-based character string recognition apparatus in one embodiment.
FIG. 11 is an internal structure diagram of a computer device in one embodiment.

Current character string recognition methods are all based on low-dimensional hand-crafted features and lack the ability to adapt to changes in image capture angle in industrial scenes, so their recognition accuracy for character strings is low.

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the present application and do not limit it.

In one embodiment, as shown in FIG. 1, a computer vision-based character string recognition method is provided. This embodiment is described using the example of the method being applied to a terminal, but it will be understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the two. In this embodiment, the method includes steps S101 to S105.

In step S101, the terminal acquires an image bearing a character string to be recognized.

Here, the character string to be recognized is the character string that the user needs to obtain from the image, and the image may be one captured in an industrial scene. Specifically, the user may record images bearing the character string to be recognized from various scenes using a mobile phone camera, video capture equipment, or the like, and store them on the terminal, so that the terminal obtains the image bearing the character string to be recognized.

In step S102, the terminal acquires, based on a pre-built position detection model, the target area image in which the character string to be recognized is located within the image.

Here, the position detection model mainly detects the region occupied by the characters to be recognized in the image, and the target area image is the image of the region that the character string to be recognized occupies in that image. Specifically, the terminal may use the pre-built position detection model to perform character string position detection on the image bearing the character string to be recognized, thereby determining the target area image in which the character string is located.

In step S103, the terminal horizontally corrects the target area image to obtain a horizontal target area image.

Because users commonly photograph the character string to be recognized from various shooting angles, the character string in the image obtained by the terminal is often not arranged horizontally but lies at some angle to the horizontal. Therefore, to improve recognition accuracy, after obtaining the target area image in step S102 the terminal needs to correct it to the horizontal to obtain a horizontal target area image, in which the character string to be recognized is arranged horizontally. Specifically, the terminal may perform the horizontal correction by applying an affine transformation to the target area image.
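The affine correction described above can be illustrated with a small sketch. It is not the patent's implementation: it assumes the detected region is given as four corner points ordered top-left, top-right, bottom-right, bottom-left along the reading direction (a hypothetical convention), and it builds a rotation about the region center that makes the top edge horizontal. The function names are illustrative only.

```python
import math


def horizontal_correction_matrix(box):
    """Return a 2x3 affine matrix that rotates `box` so its top edge is horizontal.

    `box` holds four (x, y) corners ordered top-left, top-right,
    bottom-right, bottom-left (an assumed convention, not from the source).
    """
    (x0, y0), (x1, y1) = box[0], box[1]
    angle = math.atan2(y1 - y0, x1 - x0)        # tilt of the top edge
    cx = sum(p[0] for p in box) / 4.0           # region center
    cy = sum(p[1] for p in box) / 4.0
    c, s = math.cos(-angle), math.sin(-angle)   # rotate by -angle to undo the tilt
    # Rotation about (cx, cy): p' = R (p - center) + center
    return [
        [c, -s, cx - c * cx + s * cy],
        [s, c, cy - s * cx - c * cy],
    ]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to one (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )
```

In an image library, the same matrix would typically be passed to a warping routine to resample the pixels; here only the geometry is shown.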

In step S104, the terminal acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image.

After the terminal completes the horizontal correction of the target area image in step S103, the character string in the resulting horizontal target area image may be either upright or inverted, depending on the angle at which the user originally captured the image. If it is inverted, the wrong orientation affects the final recognition result. Therefore, after obtaining the horizontal target area image, the terminal needs to determine the standing state of its character string. Specifically, the terminal may do so by inputting the horizontal target area image into the pre-built angle determination model.

In step S105, if the standing state of the character string is upright, the terminal inputs the horizontal target area image into a pre-built content recognition model and obtains the character string content corresponding to the character string to be recognized.

If the terminal determines that the standing state of the character string is upright, it may input the horizontal target area image directly into the pre-built content recognition model. The content recognition model mainly recognizes the content of the character string in the target area image, so the terminal may use it to obtain the character string content corresponding to the character string to be recognized.

In the computer vision-based character string recognition method described above, the terminal acquires an image bearing a character string to be recognized; acquires, based on a pre-built position detection model, the target area image in which the character string is located within the image; horizontally corrects the target area image to obtain a horizontal target area image; acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and, if the standing state is upright, inputs the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized. By having the terminal horizontally correct the target area image, the present application adapts to changes in image capture angle in industrial scenes and improves the recognition accuracy for character strings.

In one embodiment, as shown in FIG. 2, step S104 includes steps S201 and S202.

In step S201, the terminal acquires the standing angle of the horizontal target area image based on the angle determination model.

Here, the angle determination model mainly determines the angle of the horizontal target area image. Because the standing state of the character string depends mainly on the angle at which the user originally captured the image, the terminal may use the angle determination model to determine the standing angle of the horizontal target area image and then use that angle to determine the standing state of the character string.

In step S202, the terminal determines the standing state of the character string from the standing angle interval to which the standing angle belongs.

To tolerate slight deviations between the standing angle of the horizontal target area image obtained by the terminal and the standard horizontal angle, after determining the standing angle with the angle determination model in step S201, the terminal may select from a preset table of standing angle intervals the interval that fits the standing angle, take it as the interval to which the standing angle belongs, and use that interval to determine the standing state of the character string.

Further, the standing angle intervals may include a first angle interval and a second angle interval, and the standing state of the character string may be upright or inverted. Step S202 may further include: if the standing angle interval is the first angle interval, the terminal determines that the standing state of the character string is upright; and if the standing angle interval is the second angle interval, the terminal determines that the standing state of the character string is inverted.

Here, the first and second angle intervals are two different angle intervals, each representing one of the two standing states of the character string. Specifically, if the standing angle of the horizontal target area image obtained by the terminal falls in the first angle interval, the terminal may determine that the horizontal target area image is upright; if it falls in the second angle interval, the terminal may determine that the horizontal target area image is inverted.
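The interval lookup of step S202 can be sketched as follows. The source does not state the bounds of the two intervals, so the values here are assumptions chosen for illustration: angles within 90° of 0° form the first (upright) interval and all other angles form the second (inverted) interval. The function name `standing_state` is hypothetical.

```python
def standing_state(angle_deg: float) -> str:
    """Map a standing angle in degrees to 'upright' or 'inverted'.

    Assumed intervals (not given in the source):
      first interval  [-90, 90)  -> upright
      second interval otherwise  -> inverted
    """
    # Normalise any angle into [-180, 180) so that 357 and -3 are treated alike.
    normalised = (angle_deg + 180.0) % 360.0 - 180.0
    return "upright" if -90.0 <= normalised < 90.0 else "inverted"
```

Using intervals rather than exact values means a model output of, say, 3° or 178° is still assigned cleanly to one of the two states.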

If the standing state of the character string is inverted, the horizontal target area image is rotated to the upright state and then input into the content recognition model to obtain the character string content.

If the terminal were to input an inverted horizontal target area image into the content recognition model as is, the character string content produced by the model could deviate from the actual character content. Therefore, before being input into the content recognition model, the horizontal target area image needs to be rotated into the upright state. For example, the terminal may rotate the horizontal target area image by 180° about its center to make it upright, and then input the rotated image into the content recognition model to obtain the character string content of the character string to be recognized.

In the above embodiment, the terminal may use the angle determination model to obtain the standing angle of the horizontal target area image and determine the standing state of the character string. If the standing state is inverted, the terminal may rotate the horizontal target area image into the upright state and input the upright image into the content recognition model to obtain the character string content, which helps further improve the accuracy of the obtained content.
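The overall flow of steps S101 to S105, including the 180° rotation of inverted regions, can be sketched as follows. This is a minimal sketch rather than the patent's implementation: the position detection, horizontal correction, angle determination, and content recognition stages are passed in as hypothetical callables, and the image is modelled as a nested list of pixel values.

```python
from typing import Callable, List

Image = List[List[int]]  # toy image: rows of pixel values


def rotate_180(image: Image) -> Image:
    """Rotate an image by 180 degrees about its center."""
    return [row[::-1] for row in image[::-1]]


def recognize_string(
    image: Image,
    detect_region: Callable[[Image], Image],       # stands in for the position detection model
    correct_horizontal: Callable[[Image], Image],  # stands in for the affine correction
    judge_upright: Callable[[Image], bool],        # stands in for the angle determination model
    recognize_content: Callable[[Image], str],     # stands in for the content recognition model
) -> str:
    region = detect_region(image)             # S102: locate the target area image
    horizontal = correct_horizontal(region)   # S103: horizontal correction
    if not judge_upright(horizontal):         # S104: upright or inverted?
        horizontal = rotate_180(horizontal)   # inverted regions are turned upright first
    return recognize_content(horizontal)      # S105: read the character string content
```

Keeping the stages as separate callables mirrors the module split in the apparatus aspect, where each stage is its own module.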

In one embodiment, as shown in FIG. 3, step S102 includes steps S301 to S303.

In step S301, the terminal uses the position detection model to extract character region image features from the image.

Here, character region image features are image features used to determine the position of the character string. Specifically, the terminal may use the position detection model to extract these features from the acquired image bearing the character string to be recognized.

In step S302, the terminal acquires a prediction mask of the target area image according to the character region image features.

Here, a mask is a selected image, shape, or object used to occlude the image being processed, in whole or in part, so as to control the region or process of image processing. Specifically, the terminal may use the character region image features to obtain the corresponding prediction mask.

In step S303, the terminal computes the connected domains and the minimum bounding rectangle of the prediction mask to obtain the target area image.

After obtaining the prediction mask of the target area image in step S302, the terminal may compute the connected domains and the minimum bounding rectangle of the mask to obtain the target area image.
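The post-processing of the prediction mask can be sketched as follows, under stated simplifications. The sketch labels 4-connected foreground regions of a binary mask with a breadth-first search and returns one bounding rectangle per region. Note that it computes an axis-aligned bounding rectangle, which is a simplification: the minimum bounding rectangle in the text may be rotated to fit tilted text, which this sketch does not attempt. The function name and the `(top, left, bottom, right)` format are illustrative assumptions.

```python
from collections import deque


def connected_region_boxes(mask):
    """Return (top, left, bottom, right) for each 4-connected region of a non-empty binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected region, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Each returned rectangle can then be used to crop one candidate target area image from the original image.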

Further, to avoid low recognition accuracy caused by insufficient sharpness or excessively weak illumination in the image obtained by the terminal, in one embodiment step S301 may further include: the terminal preprocesses the image and extracts high-dimensional image features from the preprocessed image; and the terminal uses an image feature pyramid to perform first feature enhancement processing on the high-dimensional image features, yielding the character region image features.

Here, the preprocessing may consist of the terminal filtering out small or hard-to-see character string regions from the image bearing the character string to be recognized, so that high-dimensional image features can be extracted from it. The terminal may then use the image feature pyramid to perform the first feature enhancement processing on the extracted high-dimensional image features. This helps improve the representational power of the character region image features and makes it possible to generate an accurate prediction mask of the target area image even when features are indistinct.
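One common way an image feature pyramid enhances features is top-down fusion, in which a coarse feature map is upsampled and added to the next finer one. The sketch below illustrates that idea only; the source does not describe the pyramid's structure in this passage, so the fusion rule, the factor-of-two scaling between levels, and the function names are all assumptions.

```python
def upsample2x(feature_map):
    """Nearest-neighbour upsampling of a 2D feature map by a factor of two."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in (0, 1)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                           # repeat each row twice
    return out


def top_down_fuse(levels):
    """Fuse pyramid levels ordered fine to coarse, each half the size of the previous.

    The coarsest level is kept as is; every finer level receives the
    upsampled, already-fused level above it by element-wise addition.
    """
    fused = [levels[-1]]
    for finer in reversed(levels[:-1]):
        upsampled = upsample2x(fused[0])
        fused.insert(0, [
            [a + b for a, b in zip(fine_row, up_row)]
            for fine_row, up_row in zip(finer, upsampled)
        ])
    return fused
```

In this scheme the fine levels keep their spatial detail while inheriting context from the coarse levels, which is one plausible reading of how the enhancement helps when features are indistinct.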

In the above embodiment, the terminal may obtain an accurate target area image by extracting character region image features from the image, generating the corresponding prediction mask, and computing the connected domains and minimum bounding rectangle of that mask. In addition, to avoid missed or incorrect recognition of character strings caused by indistinct features, the terminal may use the image feature pyramid to perform the first feature enhancement processing on the extracted image features, improving the representational power of the character region image features and thereby further improving the accuracy of character string recognition.

In one embodiment, as shown in FIG. 4, step S105 includes steps S401 to S403.

In step S401, the terminal uses the content recognition model to perform global image feature extraction on the horizontal target area image, obtaining the character string image features corresponding to the horizontal target area image.

Here, the content recognition model mainly recognizes the character content of the character string to be recognized contained in the horizontal target area image. Specifically, the terminal may use the content recognition model to perform global image feature extraction on the obtained horizontal target area image, obtaining the corresponding character string image features.

In step S402, the terminal uses a row-vector convolution kernel to perform second feature enhancement processing on the character string image features along the horizontal direction.

Here, the second feature enhancement processing is feature enhancement applied to the character string image features. Specifically, after the character string image features are obtained in step S401, a row-vector convolution kernel may be used to perform the second feature enhancement processing on them along the horizontal direction, that is, along the direction of the character string.
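A row-vector (1 × k) kernel mixes each feature only with its left and right neighbours, which matches the reading direction of a horizontal character string. The sketch below shows such an operation on a single-channel 2D feature map; the zero padding, the odd kernel length, and the function name are assumptions for illustration, and as in most deep learning frameworks the kernel is applied without flipping (cross-correlation).

```python
def row_conv(feature_map, kernel):
    """Apply a 1 x k kernel (odd k) along each row, zero-padded to keep the width."""
    k = len(kernel)
    pad = k // 2
    out = []
    for row in feature_map:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(row))
        ])
    return out
```

Because every row is processed independently, features are aggregated along the string but never across vertically adjacent rows.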

In step S403, the terminal predicts the character string to be recognized in parallel, based on the character string image features produced by the second feature enhancement processing, to obtain the character string content.

To further improve recognition efficiency, the terminal may recognize the character string content from the character string image features produced by the second feature enhancement processing. The recognition process is a parallel prediction that can predict multiple character strings at once, which makes prediction of the character string content efficient.
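One plausible reading of parallel prediction is that every output position is decoded independently of the others, instead of each character waiting for the previous one. The sketch below illustrates that reading only and is not taken from the source: it assumes the model emits one score vector per position over a character set plus a padding symbol, and the names `parallel_decode`, `charset`, and `pad` are hypothetical.

```python
def parallel_decode(scores, charset, pad="-"):
    """Decode every position independently by taking its highest-scoring symbol.

    `scores[i][j]` is the score of symbol j at position i; the symbol table is
    `charset` followed by one padding symbol, which is dropped from the output.
    """
    symbols = list(charset) + [pad]
    # No position depends on another, so all of them could be evaluated at once.
    best = [
        symbols[max(range(len(row)), key=row.__getitem__)]
        for row in scores
    ]
    return "".join(ch for ch in best if ch != pad)
```

For example, four score vectors over the symbols `A`, `B`, `1`, and the padding symbol decode to a three-character string when the last position selects padding.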

In this embodiment, the terminal can recognize the content of the character string accurately with the content recognition model. Performing the second feature enhancement processing on the character string image features improves their representational power and thus the accuracy of character string content recognition, and predicting all character strings with the parallel prediction method further improves the efficiency of character string content recognition.

In one embodiment, as shown in FIG. 5, a computer vision-based character string recognition method is provided. This embodiment is described using the example of the method being applied to a terminal. In this embodiment, the method includes steps S501 to S510.

In step S501, the terminal acquires an image bearing a character string to be recognized.

In step S502, the terminal preprocesses the image, extracts high-dimensional image features from the preprocessed image, and uses an image feature pyramid to perform first feature enhancement processing on the high-dimensional image features, yielding the character region image features.

ステップＳ５０３において、端末は、文字領域画像特徴に従って、目標領域画像の予測マスクを取得し、予測マスクについて連通ドメイン及び最小外接矩形を求め、目標領域画像を得る。 In step S503, the terminal obtains the prediction mask of the target area image according to the character area image features, determines the connected domain and the minimum bounding rectangle for the prediction mask, and obtains the target area image.

ステップＳ５０４において、端末は、目標領域画像を横方向補正して、横方向の目標領域画像を得る。 In step S504, the terminal laterally corrects the target area image to obtain a lateral target area image.

ステップＳ５０５において、端末は、角度判定モデルに基づいて、横方向の目標領域画像の起立角度を取得する。 In step S505, the terminal acquires the standing angle of the target area image in the horizontal direction according to the angle determination model.

ステップＳ５０６において、起立角度区間が前記第１角度区間である場合、端末は、文字列の起立状態を正立状態として決定し、起立角度区間が第２角度区間である場合、端末は、文字列の起立状態を倒立状態として決定する。 In step S506, if the standing angle section is the first angle section, the terminal determines the standing state of the character string as the upright state; if the standing angle section is the second angle section, the terminal determines the character string The upright state of is determined as the inverted state.

ステップＳ５０７において、文字列の起立状態が正立状態である場合、端末は、予め構築されたコンテンツ認識モデルに横方向の目標領域画像を入力し、文字列の起立状態が倒立状態である場合、端末は、横方向の目標領域画像を正立状態に回転させてコンテンツ認識モデルに入力する。 In step S507, if the standing state of the character string is the upright state, the terminal inputs the horizontal target area image to the pre-built content recognition model; The terminal rotates the horizontal target area image to an upright state and inputs it to the content recognition model.

ステップＳ５０８において、端末は、コンテンツ認識モデルを利用して、横方向の目標領域画像に対してグローバル画像特徴抽出を行い、横方向の目標領域画像に対応する文字列画像特徴を得る。 In step S508, the terminal uses the content recognition model to perform global image feature extraction on the horizontal target area image to obtain the character string image feature corresponding to the horizontal target area image.

ステップＳ５０９において、端末は、行ベクトル畳み込みカーネルを用いて横方向に沿って文字列画像特徴に対して第２特徴強調処理を行う。 In step S509, the terminal performs a second feature enhancement process on the character string image feature along the horizontal direction using the row vector convolution kernel.

ステップＳ５１０において、端末は、第２特徴強調処理により得られた文字列画像特徴に基づいて、認識対象文字列を並列予測して、文字列コンテンツを得る。 In step S510, the terminal parallel-predicts the recognition target character string based on the character string image feature obtained by the second feature enhancement process to obtain the character string content.

In the above embodiment, the terminal horizontally corrects the target area image, which adapts the processing to changes in the image capture angle in industrial scenes and improves the accuracy of character string recognition. The terminal may also obtain the standing angle of the horizontal target area image from the angle determination model and use it to determine the standing state of the character string; if the standing state is the inverted state, the terminal rotates the horizontal target area image into the upright state, which further improves the accuracy of the resulting character string content. In addition, the terminal may use the image feature pyramid to perform the first feature enhancement processing on the extracted high-dimensional image features and perform the second feature enhancement processing on the character string image features, which improves the expressive power of the features and further improves the accuracy of content recognition. Finally, predicting all character strings with the parallel prediction method further improves the efficiency of content recognition.

In one application example, an arbitrary-angle character string recognition algorithm for industrial scenes is further provided. It aims to effectively solve the missed recognitions and misrecognitions that character recognition algorithms currently suffer in industrial scenes under blur, illumination changes, angle changes and similar conditions, and to raise the recognition accuracy. The present application can be deployed in industrial environments where the camera's imaging conditions are poor, ensures the efficiency and accuracy of the recognition algorithm, and supports recognition of characters at multiple angles, including inverted characters. The training and prediction flow of the algorithm is shown in FIG. 6. The flow is divided mainly into two processes, training and prediction. In the training process, three different models must be trained: one for detecting the character string position, one for determining the character string angle, and one for recognizing the character string content. In the prediction process, a test image is input to the trained models and processed in the order of position detection, angle determination and content recognition, finally yielding the position of the character string and its corresponding content.

The processing flow of each module is as follows.

(1) Training process
1.1 Character string position detection algorithm
A training sample is a whole sample image containing a character string. The corresponding annotation is the position box of the character string in the image, which contains coordinate information for the character string position, for example the upper-left corner at the start point of the character string and the lower-right corner at its end point. Because scale and color distribution differ between training samples, the samples must be normalized, and position boxes that are small or hard to see must be filtered out. The preprocessed image data is the input to the position detection algorithm, which passes it through a deep neural network and performs feature enhancement in combination with an image feature pyramid structure. As shown in FIG. 7, conv denotes the different convolutional layers and stride denotes the different step sizes; the extracted features at each scale are upsampled and added to the features obtained earlier in the network to give the final image features. These features retain semantic information in addition to spatial information. The image features obtained by the position detection algorithm are used to predict the final mask for the character string regions of the image. Finding the connected components and minimum bounding rectangle of this mask yields the character string position box.
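The last step of the position detection algorithm, turning a predicted mask into position boxes, can be sketched as follows. This is a minimal illustration and not the implementation of the application: the function name `find_text_boxes`, the `min_area` parameter, and the use of 4-connectivity are assumptions, and the box computed here is the axis-aligned bounding rectangle of each connected component, a simplification of a minimum bounding rectangle that may be rotated.

```python
from collections import deque


def find_text_boxes(mask, min_area=1):
    """Return one (left, top, right, bottom) box per 4-connected region of 1s.

    `mask` is a binary mask given as a list of rows of 0/1 values.
    Regions with fewer than `min_area` pixels are dropped (hypothetical filter).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects one connected component.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((left, top, right, bottom))
    return boxes
```

A production system would more likely rely on an image processing library for component labeling and rotated rectangles; the pure-Python version only makes the logic explicit.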
1.2 Character string angle determination algorithm
As shown in FIG. 8, when the angle of the character string is greater than 0 degrees and less than 180 degrees, an affine transformation produces a horizontally corrected character string image. Depending on the original shooting angle, however, horizontal correction alone cannot guarantee whether the corrected character string is upright or inverted. An angle determination algorithm is therefore added to decide whether the corrected character string is inverted: if it is inverted, the character string is rotated 180 degrees about its center; if it is upright, it is output directly without further processing. This ensures that the final character string image is upright before it is passed on to the next stage, which outputs the character string content.
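The 180-degree rotation about the center described above amounts to reversing both the row order and the pixel order within each row. The sketch below assumes the image is a plain list of rows; the names `rotate_180` and `make_upright` are illustrative only, and the inverted/upright decision is taken as a given boolean rather than computed by an angle determination model.

```python
def rotate_180(image):
    """Rotate a 2-D image (list of rows) by 180 degrees about its center."""
    return [list(reversed(row)) for row in reversed(image)]


def make_upright(image, is_inverted):
    """Return an upright image: rotate inverted crops, pass upright ones through."""
    return rotate_180(image) if is_inverted else image
```

Applying `rotate_180` twice returns the original image, which is a convenient sanity check for the correction step.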
1.3 Character string content recognition algorithm
As shown in FIG. 9, the content of a character string image is recognized by using a deep neural network to learn character string features. To capture features of the whole string, the extracted image features are finally enhanced along the direction of the character string, using a row vector as the convolution kernel. This allows the character string content to be predicted in parallel and efficiently.
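To make the idea of a row vector used as a convolution kernel concrete, the sketch below slides a 1 x k kernel along each row of a single-channel feature map, so every output value mixes information from horizontal neighbors only. The function name `row_conv`, the zero padding, and the odd-length restriction are assumptions for illustration; the application does not specify the kernel size, padding, or channel layout, and a real network would learn the kernel weights.

```python
def row_conv(feature_map, kernel):
    """Apply a 1 x k kernel along each row of a 2-D feature map.

    Uses zero padding so the output has the same width as the input, and
    computes cross-correlation (no kernel flip), as deep learning layers do.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for same-width output")
    pad = k // 2
    output = []
    for row in feature_map:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        output.append([
            sum(kernel[j] * padded[x + j] for j in range(k))
            for x in range(len(row))
        ])
    return output
```

Because the kernel has height 1, rows never mix; context is aggregated only along the direction in which the horizontally corrected character string runs.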

(2) Prediction process
A test image is input, and the character string position detection algorithm first detects the positions of the character strings in it. Each detected image region is then cropped and affine-transformed, and the transformed crop is supplied to the character string angle determination algorithm: if the crop is determined to be inverted, it is rotated 180 degrees about its center; if it is determined to be upright, it is left unchanged. The image regions processed by the position detection and angle determination algorithms are the input to the character string content recognition network, which finally yields the position of each character string in the image and the corresponding text content.
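The prediction flow above is a cascade of three stages, which can be sketched as a small driver function. The three models are passed in as callables; `detect`, `rectify`, `is_inverted` and `recognize` are hypothetical placeholders for the trained position detection model, the crop-and-affine step, the angle determination model and the content recognition network, none of which are specified at this level in the source.

```python
def recognize_strings(image, detect, rectify, is_inverted, recognize):
    """Run detection, horizontal correction, angle check and recognition in order.

    Returns a list of (box, text) pairs, one per detected character string.
    All four callables are assumptions standing in for trained models.
    """
    results = []
    for box in detect(image):                 # stage 1: position detection
        crop = rectify(image, box)            # crop + horizontal correction
        if is_inverted(crop):                 # stage 2: angle determination
            # Rotate 180 degrees about the center: reverse rows and columns.
            crop = [row[::-1] for row in crop[::-1]]
        results.append((box, recognize(crop)))  # stage 3: content recognition
    return results
```

Passing the stages as parameters keeps the sketch independent of any particular model framework and makes each stage easy to replace or test in isolation.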

In the above application example, cascading three algorithm stages (character string position detection, character string angle determination and character string content recognition) yields an algorithm that recognizes character strings stably and efficiently even in typical industrial scenes where image sharpness, angle and illumination vary, laying the foundation for applying character string recognition in industrial scenes.

Although the steps in the flowcharts of the present application are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated otherwise, there is no strict restriction on the execution order, and the steps may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be executed at different times, and they are not necessarily executed sequentially: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 10, a computer vision-based character string recognition apparatus is provided, which includes an image acquisition module 1001, a position detection module 1002, a horizontal correction module 1003, an angle determination module 1004 and a content recognition module 1005.
The image acquisition module 1001 is used to acquire an image bearing a character string to be recognized.
The position detection module 1002 is used to obtain, based on a pre-built position detection model, the target area image of the image in which the character string to be recognized is located.
The horizontal correction module 1003 is used to horizontally correct the target area image to obtain a horizontal target area image.
The angle determination module 1004 is used to obtain, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image.
The content recognition module 1005 is used to input the horizontal target area image into a pre-built content recognition model when the standing state of the character string is the upright state, and to obtain the character string content corresponding to the character string to be recognized.

In one embodiment, the angle determination module 1004 is further used to obtain the standing angle of the horizontal target area image based on the angle determination model, and to determine the standing state of the character string from the standing angle interval to which the standing angle belongs.

In one embodiment, the standing angle interval includes a first angle interval and a second angle interval, and the standing state of the character string includes an upright state and an inverted state. The angle determination module 1004 is further used to determine the standing state of the character string as the upright state when the standing angle interval is the first angle interval, and as the inverted state when the standing angle interval is the second angle interval.
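The mapping from a standing angle to a standing state can be sketched as a simple interval test. The interval boundaries are not given in this passage, so the choice below is purely an assumption: the first angle interval is taken as every angle within `upright_half_width` degrees of 0 (wrapping around 360), and the second angle interval as everything else. The function name `standing_state` and the parameter are likewise hypothetical.

```python
def standing_state(angle_deg, upright_half_width=90.0):
    """Classify a standing angle as "upright" or "inverted".

    Assumed intervals (not from the source): angles closer than
    `upright_half_width` degrees to 0 count as the first (upright) interval;
    all other angles fall in the second (inverted) interval.
    """
    a = angle_deg % 360.0
    deviation = min(a, 360.0 - a)  # angular distance from 0 degrees
    return "upright" if deviation < upright_half_width else "inverted"
```

Since the crop has already been horizontally corrected, the angle is expected to lie near 0 or near 180 degrees, so a wide threshold such as this one is enough to separate the two states.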

In one embodiment, the content recognition module 1005 is further used, when the standing state of the character string is the inverted state, to rotate the horizontal target area image into the upright state, input it into the content recognition model, and obtain the character string content.

In one embodiment, the position detection module 1002 is further used to extract character region image features from the image using the position detection model, obtain a prediction mask of the target area image from the character region image features, and find the connected components and minimum bounding rectangle of the prediction mask to obtain the target area image.

In one embodiment, the position detection module 1002 is further used to preprocess the image, extract high-dimensional image features from the preprocessed image, and use an image feature pyramid to perform first feature enhancement processing on the high-dimensional image features, yielding the character region image features.
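One common way an image feature pyramid enhances features is a top-down merge: a coarse, semantically rich feature map is upsampled and added to a finer map that carries more spatial detail. The sketch below shows one such merge on single-channel maps. It assumes nearest-neighbor upsampling by a factor of 2 and plain element-wise addition; the names `upsample2x` and `merge_top_down` are illustrative, and lateral convolutions and multiple channels used in practical pyramid networks are omitted.

```python
def upsample2x(feature_map):
    """Nearest-neighbor upsampling of a 2-D feature map by a factor of 2."""
    output = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        output.append(wide)
        output.append(list(wide))  # duplicate the row as a separate list
    return output


def merge_top_down(fine, coarse):
    """Add an upsampled coarse map to a fine map of twice the resolution."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or (up and len(up[0]) != len(fine[0])):
        raise ValueError("fine map must be exactly twice the size of coarse map")
    return [[f + u for f, u in zip(fine_row, up_row)]
            for fine_row, up_row in zip(fine, up)]
```

Repeating this merge from the coarsest level down to the finest gives a feature map that combines semantic and spatial information, which matches the stated purpose of the first feature enhancement.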

In one embodiment, the content recognition module 1005 is further used to perform global image feature extraction on the horizontal target area image using the content recognition model, obtaining the character string image features corresponding to the horizontal target area image; to perform second feature enhancement processing on the character string image features along the horizontal direction using a row-vector convolution kernel; and to predict the character string to be recognized in parallel, based on the character string image features obtained by the second feature enhancement processing, to obtain the character string content.
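Parallel prediction means that every character position is decided independently from the enhanced features, rather than one character after another. The source does not describe the output head or the decoding rule, so the following is only one plausible sketch under stated assumptions: the network is assumed to emit a score vector per horizontal position, the best-scoring symbol is taken at each position, and a hypothetical blank symbol marks positions with no character. The names `decode_parallel`, `alphabet` and `blank` are all illustrative.

```python
def decode_parallel(scores, alphabet, blank="-"):
    """Pick the best-scoring symbol at every position independently.

    `scores` holds one score list per position, aligned with `alphabet`.
    Positions whose best symbol is `blank` are dropped from the result.
    """
    chars = []
    for position_scores in scores:
        # Each position is decoded on its own; no position depends on another.
        best = max(range(len(position_scores)), key=position_scores.__getitem__)
        chars.append(alphabet[best])
    return "".join(c for c in chars if c != blank)
```

Because no position depends on the result of another, all positions could be evaluated at the same time on suitable hardware, which is where the efficiency gain of parallel prediction would come from.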

For specific limitations on the computer vision-based character string recognition apparatus, reference may be made to the limitations on the computer vision-based character string recognition method above, so they are not described in detail here. All or part of each module in the apparatus may be implemented by software or by a combination of hardware and software. Each module may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software so that the processor can call it and execute the operations corresponding to that module.

In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment in which the operating system and the computer program in the non-volatile storage medium run. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented by WIFI, a carrier network, NFC (near-field communication) or other technologies. When executed by the processor, the computer program implements the computer vision-based character string recognition method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen; a button, trackball or touchpad provided on the housing of the computer device; or an external keyboard, touchpad, mouse or the like.

As will be clear to those skilled in the art, the structure shown in FIG. 11 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory storing a computer program and a processor that implements the steps of the above method embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided, which stores a computer program that implements the steps of the above method embodiments when executed by a processor.

As will be clear to those skilled in the art, all or part of the flow of the above method embodiments may be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments of the present application may include at least one of non-volatile memory and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of non-limiting illustration, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any combination of them should be regarded as within the scope of this specification as long as it involves no contradiction.

The embodiments described above are only some embodiments of the present application. Although their description is specific and detailed, they should not be understood as limiting the patent scope of the invention. Those skilled in the art can make a number of modifications and improvements without departing from the spirit of the present application, and all of these fall within its patent scope. The patent scope of the present application is therefore governed by the appended claims.

According to several embodiments, a first aspect of the present application provides a computer vision-based character string recognition method, including:
acquiring an image bearing a character string to be recognized;
obtaining, based on a pre-built position detection model, the target area image in which the character string to be recognized is located within the image bearing the character string to be recognized;
horizontally correcting the target area image to obtain a horizontal target area image;
obtaining, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
if the standing state of the character string is the upright state, inputting the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

According to several embodiments, a second aspect of the present application provides a computer vision-based character string recognition apparatus, including:
an image acquisition module that acquires an image bearing a character string to be recognized;
a position detection module that obtains, based on a pre-built position detection model, the target area image in which the character string to be recognized is located within the image bearing the character string to be recognized;
a horizontal correction module that horizontally corrects the target area image to obtain a horizontal target area image;
an angle determination module that obtains, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
a content recognition module that, if the standing state of the character string is the upright state, inputs the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

Claims

A computer vision-based character string recognition method, comprising:
acquiring an image bearing a character string to be recognized;
obtaining, based on a pre-built position detection model, a target area image of the image in which the character string to be recognized is located;
horizontally correcting the target area image to obtain a horizontal target area image;
obtaining, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
if the standing state of the character string is the upright state, inputting the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

2. The method of claim 1, wherein the step of obtaining the standing state of the character string in the horizontal target area image based on the pre-constructed angle determination model comprises:
obtaining a standing angle of the horizontal target area image based on the angle determination model; and
determining the standing state of the character string from the standing angle interval to which the standing angle belongs.

3. The method of claim 2, wherein the standing angle interval includes a first angle interval and a second angle interval, and the standing state of the character string includes an upright state and an inverted state; and
the step of determining the standing state of the character string from the standing angle interval to which the standing angle belongs comprises:
determining the standing state of the character string as the upright state when the standing angle interval is the first angle interval.

4. The method of claim 2, wherein the standing angle interval includes a first angle interval and a second angle interval, and the standing state of the character string includes an upright state and an inverted state; and
the step of determining the standing state of the character string from the standing angle interval to which the standing angle belongs comprises:
determining the standing state of the character string as the inverted state when the standing angle interval is the second angle interval.

5. The method of claim 3 or 4, further comprising: when the standing state of the character string is the inverted state, rotating the horizontal target area image to the upright state and inputting it into the content recognition model to obtain the character string content.

6. The method of claim 1, wherein the step of obtaining, based on the pre-constructed position detection model, the target area image of the image in which the character string to be recognized is located comprises:
extracting character region image features from the image by using the position detection model;
obtaining a prediction mask of the target area image according to the character region image features; and
determining the connected domains and minimum bounding rectangle of the prediction mask to obtain the target area image.

7. The method of claim 6, wherein the step of extracting character region image features from the image comprises:
preprocessing the image and extracting high-dimensional image features from the preprocessed image; and
performing first feature enhancement processing on the high-dimensional image features by using an image feature pyramid to obtain the character region image features.

8. The method of claim 7, wherein the preprocessing comprises:
filtering character string region images that are small or difficult to see in the image bearing the character string to be recognized, thereby extracting the high-dimensional image features in the image bearing the character string to be recognized.

9. The method of claim 1, wherein the step of inputting the horizontal target area image into the pre-constructed content recognition model to obtain the character string content corresponding to the character string to be recognized comprises:
performing global image feature extraction on the horizontal target area image by using the content recognition model to obtain character string image features corresponding to the horizontal target area image;
performing second feature enhancement processing on the character string image features along the horizontal direction by using a row vector convolution kernel; and
predicting the character string to be recognized in parallel, based on the character string image features obtained by the second feature enhancement processing, to obtain the character string content.

10. A character string recognition apparatus based on computer vision, comprising:
an image acquisition module that obtains an image bearing a character string to be recognized;
a position detection module that obtains, based on a pre-constructed position detection model, a target area image of the image in which the character string to be recognized is located;
a horizontal correction module that horizontally corrects the target area image to obtain a horizontal target area image;
an angle determination module that obtains a standing state of the character string in the horizontal target area image based on a pre-constructed angle determination model; and
a content recognition module that, when the standing state of the character string is an upright state, inputs the horizontal target area image into a pre-constructed content recognition model to obtain character string content corresponding to the character string to be recognized.

11. Computer equipment, comprising:
a memory in which a computer program is stored; and
a processor that, when executing the computer program, implements the steps of the method of any one of claims 1 to 9.

12. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.