JP2019148940A - Learning processing method, server device, and reflection detection system

Info

Publication number: JP2019148940A
Application number: JP2018032594A
Authority: JP
Inventors: 祐長谷川; Yu Hasegawa
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2018-02-26
Filing date: 2018-02-26
Publication date: 2019-09-05

Abstract

PROBLEM TO BE SOLVED: To generate a highly accurate reflection detection model that can detect a reflection image region, indicating where light is reflected in a captured image, even when an arbitrary captured image is input.

SOLUTION: A learning processing method in an AI server includes: a step of generating a fake image B' of an original image A, based on the original image A, which includes a reflection region indicating where light is reflected; a step of evaluating the authenticity of the fake image B' according to a comparison between the fake image B' and learning images B1, B2, B3, which are generated so that a reflection region g1 in the original image A is distinguishable from the other image regions; a step of generating a fake image A' of the original image A based on the fake image B'; a step of evaluating the authenticity of the fake image A' according to a comparison between the fake image A' and the original image A; and a step of generating, based on the authenticity evaluation results for the fake image B' and the fake image A', a learned model used to detect the reflection region g1 in an arbitrary captured image.

SELECTED DRAWING: Figure 7

Description

The present disclosure relates to a learning processing method, a server device, and a reflection detection system.

Patent Document 1 discloses an image evaluation device that, in order to identify pixels usable for processing such as character reading, acquires a luminance image indicating the luminance value of each pixel in a captured image, determines a luminance threshold based on the frequency distribution of those luminance values, and generates a high-luminance-part-emphasized image by applying to the luminance image a process that emphasizes pixels with high luminance values. For each pixel in the high-luminance-part-emphasized image, the image evaluation device determines whether the luminance value exceeds the luminance threshold, and identifies the pixels whose luminance exceeds the threshold as high-luminance pixels.
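The thresholding idea attributed to Patent Document 1 can be sketched as follows. This is only an illustration: the text does not say how the threshold is derived from the frequency distribution, so the percentile rule, the function names `luminance_threshold` and `high_luminance_pixels`, and the toy data are all assumptions, and the emphasis step is omitted.

```python
def luminance_threshold(luminances, percent=95):
    """Pick the smallest level whose cumulative histogram covers `percent` of pixels.

    The percentile rule is an assumption made for illustration only.
    """
    histogram = [0] * 256  # frequency distribution of 8-bit luminance values
    for value in luminances:
        histogram[value] += 1
    total = len(luminances)
    cumulative = 0
    for level, count in enumerate(histogram):
        cumulative += count
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if cumulative * 100 >= percent * total:
            return level
    return 255


def high_luminance_pixels(luminances, threshold):
    """Return the indices of pixels whose luminance exceeds the threshold."""
    return [i for i, value in enumerate(luminances) if value > threshold]


# Toy 8-bit luminance image, flattened to one list (hypothetical data).
luminances = [10, 10, 10, 10, 200, 250, 10, 10, 10, 10]
threshold = luminance_threshold(luminances, percent=80)
bright = high_luminance_pixels(luminances, threshold)
```

In this toy case the two bright samples are singled out, which is the kind of per-pixel decision the cited device makes.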

Patent Document 1: JP 2017-162030 A

However, Patent Document 1 does not consider detecting a light-reflection image region that arises in a captured image when, for example, an image captured by a mobile terminal such as a smartphone contains a portion where illumination light, external light, or other light is reflected.

The present disclosure has been devised in view of the conventional circumstances described above. Its object is to provide a learning processing method, a server device, and a reflection detection system that can generate a highly accurate reflection detection model capable of detecting, even when an arbitrary captured image is input, a reflection image region indicating where light is reflected in that captured image, and that accurately ensure the reliability of the reflection image region detected in an arbitrary captured image.

The present disclosure provides a learning processing method in a server, the method including: a step of generating a first similar image of a captured image to be learned, based on the captured image, which includes a reflection image region indicating where light is reflected; a step of evaluating the authenticity of the first similar image according to a comparison between the first similar image and a learning image generated so that the reflection image region in the captured image is distinguishable from other image regions; a step of generating a second similar image of the captured image based on the first similar image; a step of evaluating the authenticity of the second similar image according to a comparison between the second similar image and the captured image; and a step of generating, based on the authenticity evaluation results for the first similar image and the second similar image, a reflection detection model used to detect the reflection image region in an arbitrary captured image.

The present disclosure also provides a server device that holds a captured image to be learned, the captured image including a reflection image region indicating where light is reflected. The server device includes a processor and a memory. The processor, in cooperation with the memory, generates a first similar image of the captured image based on the captured image; evaluates the authenticity of the first similar image according to a comparison between the first similar image and a learning image generated so that the reflection image region in the captured image is distinguishable from other image regions; generates a second similar image of the captured image based on the first similar image; evaluates the authenticity of the second similar image according to a comparison between the second similar image and the captured image; and generates, based on the authenticity evaluation results for the first similar image and the second similar image, a reflection detection model used to detect the reflection image region in an arbitrary captured image.

The present disclosure further provides a reflection detection system in which the above server and a mobile terminal having an imaging unit and a display unit are communicably connected to each other. On acquiring an arbitrary captured image captured by the imaging unit, the server uses the reflection detection model to detect the reflection image region in the captured image, generates an output image in which the reflection image region is processed so as to be distinguishable from other image regions, and transmits the output image to the mobile terminal. Using the output image transmitted from the server, the mobile terminal displays on the display unit the result of character recognition performed on the image regions of the output image other than the reflection image region.
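One way the terminal side could restrict character recognition to the regions outside the reflection image region is sketched below. The summary above does not specify how this restriction is implemented, so the binary mask representation, the bounding-box format `(top, left, bottom, right)` with exclusive bottom and right edges, and the function names are assumptions made purely for illustration.

```python
def overlaps_reflection(box, mask):
    """Return True if any pixel inside `box` is flagged as reflection in `mask`."""
    top, left, bottom, right = box
    return any(mask[r][c] for r in range(top, bottom) for c in range(left, right))


def recognizable_boxes(boxes, mask):
    """Keep only the text boxes that lie entirely outside the reflection region."""
    return [box for box in boxes if not overlaps_reflection(box, mask)]


# Hypothetical 3x4 mask: 1 marks a pixel in the detected reflection image region.
reflection_mask = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# Hypothetical candidate text boxes found in the output image.
text_boxes = [(0, 0, 1, 2), (0, 2, 2, 4), (2, 0, 3, 4)]
kept = recognizable_boxes(text_boxes, reflection_mask)
```

Only the boxes in `kept` would then be passed to character recognition; the box that touches the masked column is excluded.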

According to the present disclosure, even when an arbitrary captured image is input, a highly accurate reflection detection model can be generated that is capable of detecting a reflection image region indicating where light is reflected in that captured image, and the reliability of the reflection image region detected in an arbitrary captured image can be accurately ensured.

Block diagram showing the hardware configuration of the reflection detection system according to Embodiment 1 (FIG. 1)
Flowchart illustrating an example of the operation procedure for preparing and preprocessing the original image A
Flowchart illustrating an example of the operation procedure for generating the learning image B1
Flowchart illustrating an example of the operation procedure for generating the learning image B2
Flowchart illustrating an example of the operation procedure for generating the learning image B3
Diagram showing the original image A, the preprocessed image B0, and the learning images B1, B2, B3
Flowchart illustrating an example of the learning operation procedure of the AI server
Flowchart illustrating an example of the operation procedure by which the AI server detects reflection locations
Flowchart illustrating an example of the translation operation procedure of the smartphone
Diagram showing an example of the smartphone's shooting screen on which a captured image is displayed
Diagram showing an example of the smartphone's confirmation screen on which a superimposed image is displayed
Diagram showing an example of a translation result screen displayed on the smartphone
Diagram showing another example of a translation result screen displayed on the smartphone
Diagram showing an example of the smartphone's shooting screen on which another captured image is displayed
Diagram showing an example of the smartphone's confirmation screen on which a superimposed image including a partially character-recognizable range is displayed
Diagram showing an example of the confirmation screen after the partially character-recognizable range has been changed
Diagram showing an example of a translation result screen displayed on the smartphone
Diagram showing another example of a translation result screen displayed on the smartphone

(Background leading to Embodiment 1)
For example, a traveler such as a foreign visitor may use a mobile terminal such as his or her own smartphone at a travel destination to capture an image of a subject containing text whose content the traveler wants to check. In response to the traveler's operation, the mobile terminal performs character recognition on the text contained in the captured image, and a pre-installed translation application converts the recognition result into the traveler's native language. The traveler can thus check the content of text contained in an arbitrary image captured with the mobile terminal.

However, as noted above, if a light-reflection image region is present in the captured image, the text in that region cannot be recognized. Accordingly, when a region in which characters cannot be recognized (that is, a light-reflection image region) is detected in the translation result for the text of an arbitrary captured image displayed on the mobile terminal, clearly indicating that region in the captured image is considered to make it possible to provide applications, such as translation, that are user-friendly for travelers such as foreign visitors.

Embodiment 1 below therefore describes examples of a learning processing method, a server device, and a reflection detection system that can generate a highly accurate reflection detection model capable of detecting, even when an arbitrary captured image is input, a reflection image region indicating where light is reflected in that captured image, and that accurately ensure the reliability of the reflection image region detected in an arbitrary captured image.

Hereinafter, an embodiment that specifically discloses the learning processing method, server device, and reflection detection system according to the present disclosure is described in detail with reference to the drawings as appropriate. Unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and duplicate descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming needlessly redundant and to make it easier for those skilled in the art to understand. The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter recited in the claims.

FIG. 1 is a block diagram showing the hardware configuration of the reflection detection system 5 according to Embodiment 1. The reflection detection system 5 includes an AI (artificial intelligence) server 10, a smartphone 30, and a translation server 50. The AI server 10, the smartphone 30, and the translation server 50 are communicably connected to one another via a network 70.

The AI server 10, as an example of a server device, includes a processor 11, an AI processing unit 13, a memory 15, a storage 17, and a communication unit 18.

The processor 11 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a DSP (Digital Signal Processor), or an FPGA (Field Programmable Gate Array). The processor 11 functions as a controller that governs the operation of the AI server 10, and performs control processing for overall supervision of the operation of each unit of the AI server 10, data input/output processing with each unit of the AI server 10, data computation processing, and data storage processing. The processor 11 operates according to the programs and data stored in the memory 15. The processor 11 uses the memory 15 during operation and may temporarily store in the memory 15 data or information that it generates or acquires.

The AI processing unit 13 is a processor configured using, for example, a GPU (Graphics Processing Unit) suited to real-time image processing of an arbitrary captured image transmitted from the smartphone 30 (for example, detecting where light is reflected in the captured image and generating an output image using a learned model, both described later). Using an original image and learning images, described later, the AI processing unit 13 performs machine learning based on the CycleGAN technique to generate a learned model, and stores the data of the learned model (that is, learned model data) in the storage 17. The AI processing unit 13 has a memory 13z. When executing, for example, the process of detecting where light is reflected in an arbitrary captured image transmitted from the smartphone 30, it reads the learned model data stored in the storage 17 and temporarily loads and holds the learned model in the memory 13z. The AI processing unit 13 takes as input an arbitrary captured image captured by the smartphone 30 and, using some of the functions of the learned model (for example, the function of a fake image generator that generates, from an original image, a fake image resembling that original image, and the function of a fake image discriminator that evaluates the authenticity of the generated fake image; details are given later), outputs a visualized image that includes the image region of the detected light reflection.

The memory 15 is configured using, for example, a RAM (Random Access Memory) and a ROM (Read Only Memory), and temporarily holds the programs and data needed to execute the operation of the AI server 10 as well as data or information generated during operation. The RAM is, for example, a work memory used while the AI server 10 is operating. The ROM stores and holds in advance, for example, the programs and data for controlling the AI server 10.

The storage 17 is a recording device configured using, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The storage 17 stores, for example, data or information generated or acquired by the processor 11 or the AI processing unit 13. The storage 17 stores the learned model data generated by the AI processing unit 13 (see FIG. 1).

The communication unit 18 is connected to the network 70 using, for example, a wired LAN (Local Area Network) or a wireless LAN. The communication unit 18 can communicate with the translation server 50 connected to the network 70, and can also communicate with the smartphone 30 carried by a traveler such as a foreign visitor (an example of a user). The communication unit 18 receives an arbitrary captured image transmitted from the smartphone 30 (that is, a captured image of an arbitrary subject bearing text whose content the traveler wants to check, as described above). The communication unit 18 transmits the output image (described later) generated as the result of the light-reflection detection process to the smartphone 30 or the translation server 50.

The smartphone 30 includes a processor 31, an imaging unit 32, a display unit 33, an input unit 34, a communication unit 35, and a memory 36. The smartphone 30 is carried by, for example, a traveler such as a foreign visitor, and is held in the hand during use. The smartphone 30 has pre-installed, in an executable state, at least an application capable of executing character recognition processing (a character recognition application) and an application capable of executing translation processing (a translation application).

The processor 31 is configured using, for example, a CPU, an MPU, a DSP, or an FPGA. The processor 31 functions as a controller that governs the operation of the smartphone 30, and performs control processing for overall supervision of the operation of each unit of the smartphone 30, data input/output processing with each unit of the smartphone 30, data computation processing, and data storage processing. The processor 31 operates according to the programs and data stored in the memory 36. The processor 31 uses the memory 36 during operation and may temporarily store in the memory 36 data or information that it generates or acquires.

The imaging unit 32 includes a condensing lens and a solid-state image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. While the smartphone 30 is powered on, the imaging unit 32 continuously outputs to the processor 31 the captured video data of the subject obtained by the solid-state image sensor. The subject is, for example, an information medium such as a signboard or an advertisement containing text whose content a traveler such as a foreign visitor wants to check, but is of course not limited to such media.

The display unit 33 is configured using, for example, an LCD (Liquid Crystal Display) or an organic EL (Electroluminescence) display. In addition to indicating the current state of the smartphone 30, it displays various screens (for example, the shooting screen shown while the imaging unit 32 is capturing (the so-called preview screen), a confirmation screen described later, and a screen showing the translation result).

The input unit 34 accepts various input operations by the user (for example, the traveler described above) and outputs a signal corresponding to the input operation to the processor 31. The display unit 33 and the input unit 34 may be configured as a known touch panel TP.

The communication unit 35 is configured using a communication circuit capable of wireless communication with the AI server 10 and the translation server 50 connected to the network 70. The communication unit 35 is connected to the network 70 via a mobile communication network (not shown; for example, 4G (fourth-generation mobile communication system) or 5G (fifth-generation mobile communication system)). The communication unit 35 transmits the data of an arbitrary captured image captured by the imaging unit 32 to the AI server 10.

The memory 36 is configured using, for example, a RAM and a ROM, and temporarily holds the programs and data needed to execute the operation of the smartphone 30 as well as data or information generated during operation. The RAM is, for example, a work memory used while the smartphone 30 is operating. The ROM stores and holds in advance, for example, the programs and data for controlling the smartphone 30.

The smartphone 30 is one example of a device having an imaging function and a communication function. The device is not limited to a smartphone and may be a camera, a tablet terminal, a notebook PC, a surveillance camera, or the like that can connect to the network 70.

The translation server 50 includes a processor 51, a memory 52, a storage 53, and a communication unit 54. The translation server 50 may be, for example, a cloud server connected to the network 70, or may be configured as an on-premises server installed at the premises (not shown) of the operator where the AI server 10 is located. The translation server 50 translates the character information corresponding to the text in a captured image or output image transmitted from the smartphone 30 or the AI server 10 into a predetermined language (for example, a language set in advance by the user of the smartphone 30), and returns the character information corresponding to the translation result to the smartphone 30.

The processor 51 is configured using, for example, a CPU, an MPU, a DSP, or an FPGA. The processor 51 functions as a controller that governs the operation of the translation server 50, and performs control processing for overall supervision of the operation of each unit of the translation server 50, data input/output processing with each unit of the translation server 50, data computation processing, and data storage processing. The processor 51 operates according to the programs and data stored in the memory 52. The processor 51 uses the memory 52 during operation and may temporarily store in the memory 52 data or information that it generates or acquires.

The memory 52 is configured using, for example, a RAM and a ROM, and temporarily holds the programs and data needed to execute the operation of the translation server 50 as well as data or information generated during operation. The RAM is, for example, a work memory used while the translation server 50 is operating. The ROM stores and holds in advance, for example, the programs and data for controlling the translation server 50.

The storage 53 is a recording device configured using, for example, an HDD or an SSD. The storage 53 stores, for example, data or information generated or acquired by the processor 51. The storage 53 also includes a dictionary DB 53z in which dictionary data for the official language of each country, referred to during translation processing, is registered in advance. The translation server 50 may periodically update the contents of the dictionary DB 53z by communicating at regular intervals with a dedicated dictionary data management server (not shown) connected via the network 70 or another network (not shown).

The communication unit 54 is connected to the network 70 using a wired LAN, a wireless LAN, or the like. The communication unit 54 can communicate with the AI server 10 and the smartphone 30 connected to the network 70. When the communication unit 54 receives the character information resulting from character recognition processing from the smartphone 30, the received character information is translated into a predetermined language, set in advance or set each time so as to correspond to the official language of the user of the smartphone 30, and the character information of the translation result is returned to the smartphone 30.

Embodiment 1 describes the case in which the translation server 50 translates the character information of the character recognition result, but the smartphone 30 may instead launch its installed translation application and translate the character information of the character recognition result into the predetermined language.

Next, the operation of the reflection detection system 5 according to Embodiment 1 described above is explained with reference to the drawings.

The reflection detection system 5 according to Embodiment 1 performs character recognition on the character information contained in an image, captured by the smartphone 30, of an advertisement or the like bearing text, and translates the recognized character information into a predetermined language. When the image captured by the smartphone 30 contains reflections of light such as illumination light or external light, the reflection detection system 5 detects the region containing the reflecting spot (hereinafter sometimes called the "reflection region") and outputs a visible-light image (hereinafter sometimes called the "output image") in which the reflection region is rendered more distinguishable than the regions other than the reflection region. In Embodiment 1, the AI server 10 performs machine learning using CycleGAN, an AI model that has attracted attention in recent years, and generates an AI model (that is, a learned model) for detecting the reflection region (see above) contained in a captured image of an arbitrary subject captured by the smartphone 30. Machine learning with CycleGAN uses both the captured image serving as the original image and learning images generated based on that original image.
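The two authenticity evaluations used in this kind of learning (the fake image B' judged by a discriminator, and the restored image A' compared with the original image A) can be illustrated with a toy objective. The description does not give loss formulas at this point, so the log-based adversarial term, the mean-absolute-difference cycle term, the weight `cycle_weight=10.0`, and every function and variable name below are assumptions in the style of commonly published CycleGAN formulations, not the method of this disclosure.

```python
import math


def l1_distance(img_a, img_b):
    """Mean absolute pixel difference between two equally sized flat images."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(img_a, img_b)) / len(img_a)


def adversarial_loss(realness_score):
    """Generator-side loss: small when the discriminator score is near 1 (judged real)."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(realness_score, eps))


def generator_objective(original_a, fake_b, restored_a, discriminator_b,
                        cycle_weight=10.0):
    """Combine the authenticity of B' with how well A' reproduces A."""
    adv = adversarial_loss(discriminator_b(fake_b))  # evaluation of fake image B'
    cyc = l1_distance(restored_a, original_a)        # evaluation of fake image A'
    return adv + cycle_weight * cyc


# Toy flat "images" with pixel values in [0, 1] (hypothetical data).
original_a = [0.0, 1.0, 0.5, 0.25]
fake_b = [0.0, 1.0, 1.0, 0.25]
restored_a = [0.0, 1.0, 0.0, 0.25]
toy_discriminator = lambda image: 0.5  # stand-in for a trained discriminator
loss = generator_objective(original_a, fake_b, restored_a, toy_discriminator)
```

In actual training the generators and discriminators would be neural networks updated to reduce such an objective; here the discriminator is a fixed stand-in so that the arithmetic can be followed by hand.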

(Generation of learning images)
First, generation of learning images by the AI server 10 will be described. FIG. 2 is a flowchart illustrating an example of the operation procedure for preparing and preprocessing the original image A. A user (for example, a traveler such as a foreign visitor; the same applies hereinafter) uses the smartphone 30 to capture an image of printed matter such as an advertisement (an example of a subject) and thereby prepares an original image (original image A in FIG. 6), which is the captured image (S1). For the purposes of describing Embodiment 1, the original image A is assumed to contain a light reflection region caused by external light, illumination light, or the like.

The user performs predetermined preprocessing on the original image A and obtains a preprocessed image B0 (S2). The predetermined preprocessing is, for example, a process in which the user, operating an image-editing application (described later) installed on the smartphone 30 or on a PC (not shown), fills the light reflection region appearing in part of the captured image with a predetermined color. For example, the user uses an image-editing application (for example, a drawing tool or image processing software) to fill the reflection region in red on the captured image displayed on the screen of the smartphone 30. A marker region mk filled in red is thus drawn in the preprocessed image (that is, the preprocessed image B0 in FIG. 6). The smartphone 30 transmits the data of the preprocessed image B0 to the AI server 10. The AI server 10 stores the data of the preprocessed image B0 received from the smartphone 30 in the storage 17.

The AI server 10 generates a plurality of learning images from the preprocessed image B0. An example in which the AI server 10 generates three learning images B1, B2, and B3 is described here, but any number of learning images may be generated. Preparing many learning images improves the accuracy (that is, the learning accuracy) of the process by which the AI server 10 generates the learned model (in other words, the updating of the learning parameters used in the learned model).

FIG. 3 is a flowchart illustrating an example of the operation procedure for generating the learning image B1. The process shown in FIG. 3 is executed by, for example, the AI processing unit 13 of the AI server 10. The AI processing unit 13 acquires the pixel value of one pixel from the preprocessed image B0 stored in the storage 17 (S11). To do so, the AI processing unit 13 sets two-dimensional coordinates (that is, XY coordinates) on the preprocessed image B0, which has, for example, the same size as the original image A, and acquires the pixel value of each pixel while moving the acquisition target pixel by pixel in the X and Y directions.

Based on the acquired pixel value, the AI processing unit 13 determines whether the pixel is a filled pixel (S12). If it is a filled pixel (S12, YES), the AI processing unit 13 sets the pixel value to a predetermined color (for example, red) and assigns it to the output pixel of the reflection region (that is, the corresponding pixel in the learning image B1 generated by the procedure of FIG. 3) (S13).

If, on the other hand, the acquired pixel is not a filled pixel (S12, NO), the AI processing unit 13 sets the pixel to white and assigns it to the output pixel (see above) of the non-reflection region (S14).

After step S13 or step S14, the AI processing unit 13 determines whether the pixel acquired in step S11 is the last pixel (that is, whether the last pixel of the preprocessed image B0 has been reached) (S15). If it is not the last pixel (S15, NO), the AI processing unit 13 moves the acquisition position on the preprocessed image B0 by one pixel in the X or Y direction (S16). The process then returns to step S11, acquires the pixel at the position set in step S16, and repeats the same processing.

If, on the other hand, it is the last pixel (S15, YES), the AI processing unit 13 takes the image obtained through the series of steps S11, S12, S13, S14, S15, and S16 as the learning image B1 (see FIG. 6) and saves it in the memory 13z (S17). The AI processing unit 13 then ends the process of generating the learning image B1.
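The FIG. 3 flow can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the publication: the image representation (nested lists of RGB tuples), the function names, and in particular the S12 test (an exact match with pure red) are assumptions made for the example.

```python
# Hypothetical sketch of the FIG. 3 flow (S11 to S17).
# Images are nested lists of (r, g, b) tuples; all names are illustrative.
RED = (255, 0, 0)
WHITE = (255, 255, 255)


def is_marker_pixel(pixel, marker_color=RED):
    """Assumed S12 test: a pixel counts as filled if it equals the fill color."""
    return tuple(pixel) == marker_color


def make_learning_image_b1(b0, marker_color=RED, background=WHITE):
    """Build B1: marker pixels keep the marker color, all others become background."""
    b1 = []
    for row in b0:                      # walk the image in the Y direction
        out_row = []
        for pixel in row:               # walk the image in the X direction (S11, S16)
            if is_marker_pixel(pixel, marker_color):
                out_row.append(marker_color)   # S13: reflection region
            else:
                out_row.append(background)     # S14: non-reflection region
        b1.append(out_row)
    return b1                           # S17: the finished learning image


b0 = [[(10, 20, 30), (255, 0, 0)],
      [(255, 0, 0), (200, 200, 200)]]
b1 = make_learning_image_b1(b0)
```

In practice a hand-painted marker may not be exactly pure red after image compression, so a tolerance-based test might be needed in place of the exact comparison.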

FIG. 4 is a flowchart illustrating an example of the operation procedure for generating the learning image B2. The process shown in FIG. 4 is executed by, for example, the AI processing unit 13 of the AI server 10. The AI processing unit 13 acquires the pixel value of one pixel from the original image A (S21). The AI processing unit 13 then acquires, from the preprocessed image B0, the pixel value of the pixel corresponding to that pixel of the original image A (that is, the pixel with the same XY coordinates) (S22). Based on the pixel value acquired in step S22, the AI processing unit 13 determines whether that pixel of the preprocessed image B0 is a filled pixel (that is, a pixel in the light reflection region) (S23).

If it is a filled pixel (S23, YES), the AI processing unit 13 calculates a luminance value from the pixel value of the pixel of the original image A (S24). For example, with red component r, green component g, blue component b, and luminance value y, the AI processing unit 13 can calculate the luminance value as y = 0.299r + 0.587g + 0.114b; the same applies hereinafter. The AI processing unit 13 sets the luminance value calculated in step S24 in the R channel of the output pixel corresponding to the pixel of the original image A (that is, the corresponding pixel in the learning image B2 generated by the procedure of FIG. 4) (S25). The AI processing unit 13 sets the G and B channels of the output pixel to 0 (S26).

If, on the other hand, the pixel of the preprocessed image B0 acquired in step S22 is not a filled pixel (S23, NO), the AI processing unit 13 sets the pixel value of the pixel of the original image A as the pixel value of the output pixel (see above) (S27).

After step S26 or step S27, the AI processing unit 13 determines whether the pixel acquired in step S21 is the last pixel (that is, whether the last pixel of the original image A has been reached) (S28). If it is not the last pixel (S28, NO), the AI processing unit 13 moves the acquisition position on the original image A by one pixel in the X or Y direction (S29). The process then returns to step S21, acquires the pixel at the position set in step S29, and repeats the same processing.

If, on the other hand, it is the last pixel (S28, YES), the AI processing unit 13 takes the image obtained through the series of steps S21, S22, S23, S24, S25, S26, S27, and S28 as the learning image B2 (see FIG. 6) and saves it in the memory 13z (S30). The AI processing unit 13 then ends the process of generating the learning image B2.
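The FIG. 4 flow could be sketched as below, under stated assumptions: images are nested lists of 8-bit RGB tuples, the marker test is an exact match with pure red, and the luminance is rounded to an integer. None of these details is specified in the publication, and the function names are illustrative only.

```python
# Hypothetical sketch of the FIG. 4 flow (S21 to S30).
MARKER = (255, 0, 0)  # assumed fill color used in the preprocessing step


def luminance(pixel):
    """y = 0.299 r + 0.587 g + 0.114 b (S24), rounded to an integer (assumption)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_learning_image_b2(a, b0, marker_color=MARKER):
    """Build B2: red-channel luminance inside the marker region, original pixels elsewhere."""
    if len(a) != len(b0) or any(len(ra) != len(rb) for ra, rb in zip(a, b0)):
        raise ValueError("A and B0 must have the same size")
    b2 = []
    for row_a, row_b0 in zip(a, b0):
        out_row = []
        for pixel_a, pixel_b0 in zip(row_a, row_b0):    # same XY coordinates (S21, S22)
            if tuple(pixel_b0) == marker_color:         # S23: filled pixel?
                out_row.append((luminance(pixel_a), 0, 0))  # S25, S26
            else:
                out_row.append(tuple(pixel_a))          # S27: copy the original pixel
        b2.append(out_row)
    return b2


a = [[(250, 250, 250), (10, 20, 30)]]
b0 = [[(255, 0, 0), (10, 20, 30)]]
b2 = make_learning_image_b2(a, b0)
```

Keeping the original brightness in the red channel preserves how strong the reflection was at each marked pixel, whereas B1 keeps only the shape of the marked region.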

FIG. 5 is a flowchart illustrating an example of the operation procedure for generating the learning image B3. The process shown in FIG. 5 is executed by, for example, the AI processing unit 13 of the AI server 10. The AI processing unit 13 acquires the pixel value of one pixel from the original image A (S31). The AI processing unit 13 then acquires, from the preprocessed image B0, the pixel value of the pixel corresponding to that pixel of the original image A (that is, the pixel with the same XY coordinates) (S32). The AI processing unit 13 calculates a luminance value from the pixel value of the pixel of the original image A acquired in step S31, using, for example, the formula given above (S33).

Based on the pixel value acquired in step S32, the AI processing unit 13 determines whether that pixel of the preprocessed image B0 is a filled pixel (that is, a pixel in the light reflection region) (S34). If it is a filled pixel (S34, YES), the AI processing unit 13 sets the R channel of the output pixel corresponding to the pixel of the original image A (that is, the corresponding pixel in the learning image B3 generated by the procedure of FIG. 5) to the luminance value calculated in step S33 (S35). The AI processing unit 13 sets the G and B channels of the output pixel (see above) to 0 (S36).

If, on the other hand, the pixel of the preprocessed image B0 acquired in step S32 is not a filled pixel (S34, NO), the AI processing unit 13 sets the luminance value calculated in step S33 in each of the R, G, and B channels of the output pixel (see above) (S37).

After step S36 or step S37, the AI processing unit 13 determines whether the pixel acquired in step S31 is the last pixel (that is, whether the last pixel of the original image A has been reached) (S38). If it is not the last pixel (S38, NO), the AI processing unit 13 moves the acquisition position on the original image A by one pixel in the X or Y direction (S39). The process then returns to step S31, acquires the pixel at the position set in step S39, and repeats the same processing.

If, on the other hand, it is the last pixel (S38, YES), the AI processing unit 13 takes the image obtained through the series of steps S31, S32, S33, S34, S35, S36, S37, and S38 as the learning image B3 (see FIG. 6) and saves it in the memory 13z (S40). The AI processing unit 13 then ends the process of generating the learning image B3.
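A comparable sketch of the FIG. 5 flow follows. As before, the nested-list image representation, the exact-red marker test, the integer rounding, and all names are assumptions introduced for illustration rather than details taken from the publication.

```python
# Hypothetical sketch of the FIG. 5 flow (S31 to S40).
MARKER = (255, 0, 0)  # assumed fill color used in the preprocessing step


def luminance(pixel):
    """y = 0.299 r + 0.587 g + 0.114 b (S33), rounded to an integer (assumption)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_learning_image_b3(a, b0, marker_color=MARKER):
    """Build B3: red-only luminance in the marker region, grayscale elsewhere."""
    b3 = []
    for row_a, row_b0 in zip(a, b0):
        out_row = []
        for pixel_a, pixel_b0 in zip(row_a, row_b0):   # same XY coordinates (S31, S32)
            y = luminance(pixel_a)                     # S33: always computed
            if tuple(pixel_b0) == marker_color:        # S34: filled pixel?
                out_row.append((y, 0, 0))              # S35, S36
            else:
                out_row.append((y, y, y))              # S37: gray pixel
        b3.append(out_row)
    return b3


a = [[(255, 255, 255), (0, 0, 0), (100, 100, 100)]]
b0 = [[(255, 0, 0), (0, 0, 0), (100, 100, 100)]]
b3 = make_learning_image_b3(a, b0)
```

The only difference from the B2 sketch is the non-marker branch, which discards color so that red appears nowhere except in the reflection region.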

FIG. 6 is a diagram showing the original image A, the preprocessed image B0, and the learning images B1, B2, and B3. The original image A is a captured image of a subject such as an advertisement or a restaurant menu, captured by the smartphone 30 in response to a user operation. The original image A contains a reflection region g1 caused by light such as illumination light or external light, and character recognition is not possible in the vicinity of the reflection region g1 (in other words, the character information is illegible).

The preprocessed image B0 is the image obtained by applying the preprocessing (see FIG. 2) to the original image A. The preprocessed image B0 contains the marker region mk, in which the user has filled the reflection region in red using a drawing tool or image processing software.

The learning image B1 is an image derived from the preprocessed image B0 in which the marker region mk is set to a predetermined color (here, red) and all other regions are set to a background color (white). The predetermined color set in the marker region mk is not limited to red and may be any color, such as blue. Likewise, the background color is not limited to white and may be a color that rarely appears in captured images, such as green or blue.

The learning image B2 is an image in which luminance values are calculated from the original image A and, within the marker region mk, the R component is replaced with the calculated luminance value and the G and B components are set to 0, while all other regions keep the pixel values of the original image A.

The learning image B3 is an image in which luminance values are calculated from the original image A and then, within the marker region mk, the R component is replaced with the luminance value (with the G and B components set to 0, as in step S36), while in all other regions the R, G, and B components are each replaced with the luminance value.

(Machine learning for generating the learned model)
FIG. 7 is a flowchart illustrating an example of the learning operation procedure of the AI server 10. The process shown in FIG. 7 is executed by, for example, the AI processing unit 13 of the AI server 10. The AI processing unit 13 sets the parameters (hereinafter referred to as "learning parameters") used in the AI model (for example, the CycleGAN mentioned above) (S51).

A learning parameter is, for example, the LearningRate (that is, the learning rate) used when training the neural network that forms the AI model. The machine learning of Embodiment 1 optimizes, for example, the learning parameters of an AI model using CycleGAN. The AI model using CycleGAN includes, for example, a B' generator, a fake-B evaluator, a B-B' similarity evaluator, an A' generator, a fake-A evaluator, and an A-A' similarity evaluator. The AI model using CycleGAN also uses the original image A, a fake image A' of the original image A, a learning image B, and a fake image B' of the learning image B. In this AI model, the learning parameters of the B' generator are optimized. The B' generator generates the fake image B' from the original image A or the fake image A'. The learning parameters of the A' generator are likewise optimized. The A' generator generates the fake image A' from the learning image B or the fake image B'. The learning images B1, B2, and B3 shown in FIG. 6 are used as the learning image B.

The AI processing unit 13 generates a fake image B' from the original image A (S52). That is, the AI processing unit 13 inputs the original image A to the B' generator (fake image generator) of the AI model to generate the fake image B'. The AI processing unit 13 then evaluates the generation accuracy of the fake image B' (S53). Based on the result of this evaluation, the generation accuracy index KB1, which serves as the accuracy index of the B' generator, is updated. Using the fake-B evaluator (fake image discriminator), the AI processing unit 13 evaluates the authenticity of the fake image B' generated by the B' generator (S54). That is, the fake-B evaluator judges whether the fake image B' generated by the B' generator is genuine or fake. As a result of this judgment, the discrimination accuracy index KB2, which serves as the accuracy index of the fake-B evaluator, is updated.

The AI processing unit 13 generates a fake image A' from the fake image B' (S55). That is, the AI processing unit 13 inputs the fake image B' to the A' generator of the AI model to generate the fake image A'. The AI processing unit 13 evaluates the similarity of the generated fake image A' (S56). That is, the A-A' similarity evaluator calculates the similarity between the fake image A' and the original image A. Based on the calculated similarity, the reconstruction accuracy index KA3 between the original image A and the reconstructed fake image A' is updated.

The AI processing unit 13 also generates a fake image A' from the learning image B (S57). That is, the AI processing unit 13 inputs the learning image B to the A' generator (fake image generator) to generate the fake image A'. The AI processing unit 13 then evaluates the generation accuracy of the fake image A' (S58). Based on the result of this evaluation, the generation accuracy index KA1, which serves as the accuracy index of the A' generator, is updated. Using the fake-A evaluator (fake image discriminator), the AI processing unit 13 evaluates the authenticity of the fake image A' generated by the A' generator (S59). That is, the fake-A evaluator judges whether the fake image A' generated by the A' generator is genuine or fake. As a result of this judgment, the discrimination accuracy index KA2, which serves as the accuracy index of the fake-A evaluator, is updated.

The AI processing unit 13 generates a fake image B' from the fake image A' (S60). That is, the AI processing unit 13 inputs the fake image A' to the B' generator to generate the fake image B'. The AI processing unit 13 evaluates the similarity of the generated fake image B' (S61). That is, the B-B' similarity evaluator calculates the similarity between the fake image B' and the learning image B. Based on the calculated similarity, the reconstruction accuracy index KB3 between the learning image B and the reconstructed fake image B' is updated.

Based on the generation accuracy index KA1, the discrimination accuracy index KA2, the reconstruction accuracy index KA3, the generation accuracy index KB1, the discrimination accuracy index KB2, and the reconstruction accuracy index KB3 described above, the AI processing unit 13 updates the learning parameters of the AI model (for example, the learning parameters of the B' generator and the learning parameters of the A' generator) (S62).
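One pass through steps S52 to S61 can be illustrated with the toy sketch below. The publication does not define how the six indices are computed, so the mapping used here is an assumption: the generation and discrimination indices are modeled as least-squares adversarial losses, the reconstruction indices as mean absolute (L1) differences, and the cycle weight of 10 is likewise assumed. The generators and evaluators are plain Python callables standing in for neural networks, and every name is hypothetical.

```python
# Hypothetical sketch of one pass through S52 to S61 for a single (A, B) pair.
# Images are flat lists of floats in [0, 1]; evaluators return a score in [0, 1].

def mean_abs_diff(x, y):
    """L1 similarity measure assumed for the A-A' and B-B' similarity evaluators."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


def training_indices(a, b, gen_b, gen_a, eval_b, eval_a):
    fake_b = gen_b(a)                                   # S52: A -> B'
    kb1 = (eval_b(fake_b) - 1.0) ** 2                   # S53: B' generator index
    kb2 = (eval_b(b) - 1.0) ** 2 + eval_b(fake_b) ** 2  # S54: fake-B evaluator index
    rec_a = gen_a(fake_b)                               # S55: B' -> A'
    ka3 = mean_abs_diff(rec_a, a)                       # S56: reconstruction of A

    fake_a = gen_a(b)                                   # S57: B -> A'
    ka1 = (eval_a(fake_a) - 1.0) ** 2                   # S58: A' generator index
    ka2 = (eval_a(a) - 1.0) ** 2 + eval_a(fake_a) ** 2  # S59: fake-A evaluator index
    rec_b = gen_b(fake_a)                               # S60: A' -> B'
    kb3 = mean_abs_diff(rec_b, b)                       # S61: reconstruction of B
    return {"KA1": ka1, "KA2": ka2, "KA3": ka3,
            "KB1": kb1, "KB2": kb2, "KB3": kb3}


def generator_objective(idx, cycle_weight=10.0):
    """Scalar the generators would minimize in S62 (weighting is an assumption)."""
    return idx["KB1"] + idx["KA1"] + cycle_weight * (idx["KA3"] + idx["KB3"])


# Toy stand-ins: identity generators and evaluators that always answer 0.5.
identity = lambda image: list(image)
undecided = lambda image: 0.5

a = [0.0, 0.5, 1.0]
b = [1.0, 0.5, 0.0]
idx = training_indices(a, b, identity, identity, undecided, undecided)
objective = generator_objective(idx)
```

In an actual system the callables would be trainable networks and the parameter update in S62 would be carried out by an optimizer in a deep learning framework; the sketch only shows which image feeds which component.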

The AI processing unit 13 determines whether the learning process of steps S52 to S62 has been performed using all of the original images A and learning images B (for example, the learning images B1, B2, and B3) (S63). That is, the AI processing unit 13 determines whether the data of all original images A and learning images B has been learned. The original image A and learning images B (B1, B2, B3) shown in FIG. 6 are only examples; using many original images A and learning images B is desirable for improving learning accuracy.

If there is data that has not yet been learned (S63, NO), the AI processing unit 13 acquires the next data (S64). The process then returns to step S52 and repeats the same processing (that is, the series of steps S52, S53, S54, S55, S56, S57, S58, S59, S60, S61, S62, S63, and S64).

If, on the other hand, all data has been learned (S63, YES), the AI processing unit 13 generates the learned model (that is, the trained AI model using CycleGAN) and saves the data of the generated learned model in the storage 17 (S65). The AI processing unit 13 then ends the learning process shown in FIG. 7.

FIG. 8 is a flowchart illustrating an example of the operation procedure by which the AI server 10 detects reflection locations. The process shown in FIG. 8 is executed by, for example, the AI processing unit 13 of the AI server 10. The AI processing unit 13 acquires the captured image taken by the smartphone 30 as the detection target image and stores it in the memory 13z (S71). The AI processing unit 13 reads the learned model data saved in the storage 17 and loads it into the memory 13z as an AI network (S72).

The AI processing unit 13 inputs the detection target image (captured image) to the B' generator, which is part of the learned model, and outputs an image in which the reflection region is visualized (S73). The image in which the reflection region is visualized is, for example, an image in which, when the AI processing unit 13 runs the learned model (AI model), the reflection region is drawn in red and the other regions are drawn in gray.

Based on the intensity ratio of the color components of the image, the AI processing unit 13 judges whether each region is a non-reflection region or a reflection region and obtains reflection region information (S74). As described later, the image of the non-reflection region is subjected to character recognition processing and translation processing. The image of the reflection region is excluded from character recognition processing and translation processing. The AI processing unit 13 then ends the AI reflection detection process shown in FIG. 8.
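Step S74 might look roughly like the sketch below, which classifies each pixel of the generator output by comparing the red component with the larger of the green and blue components. The publication only states that an intensity ratio of color components is used; the specific ratio test, both threshold values, and the function name are assumptions chosen for illustration.

```python
# Hypothetical sketch of S74: derive a reflection mask from the B' generator output,
# in which reflection regions appear red and other regions appear gray.

def reflection_mask(output_image, ratio_threshold=2.0, min_red=32):
    """Return a nested list of booleans, True where a pixel is judged reflective.

    ratio_threshold and min_red are illustrative values, not taken from the source.
    """
    mask = []
    for row in output_image:
        mask_row = []
        for r, g, b in row:
            strongest_other = max(g, b, 1)   # avoid treating 0 as an infinite ratio
            is_reflection = r >= min_red and r >= ratio_threshold * strongest_other
            mask_row.append(is_reflection)
        mask.append(mask_row)
    return mask


output_image = [[(200, 0, 0), (120, 120, 120)],
                [(90, 85, 88), (255, 10, 5)]]
mask = reflection_mask(output_image)
```

Gray pixels have roughly equal components and therefore fail the ratio test, which is why training targets such as B3 make the reflection region easy to separate.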

(Translation operation of the smartphone)
FIG. 9 is a flowchart illustrating an example of the translation operation procedure of the smartphone 30. The process shown in FIG. 9 is executed mainly by, for example, the processor 31 of the smartphone 30. On receiving a user operation, the processor 31 launches the character recognition and translation application (S81). When the user performs a shutter operation (that is, an operation to start imaging) on a subject such as an advertisement, the imaging unit 32 captures an image of the subject. The processor 31 acquires the captured image GZ1 (see FIG. 10) taken by the imaging unit 32 and stores it in the memory 36 (S82). The communication unit 35 transmits the captured image GZ1 stored in the memory 36 to the AI server 10 via the network 70 (S83).

FIG. 10 is a diagram showing an example of the shooting screen GM1 of the smartphone 30 on which the captured image GZ1 is displayed. Suppose that reflection regions g1 caused by illumination light appear at, for example, two places in the captured image GZ1. On the shooting screen GM1, a rectangular window wk1 is displayed superimposed on the captured image GZ1. Although hidden behind the rectangular window wk1 and not visible on the shooting screen GM1, the captured image GZ1 also contains the character information for coffee and tea (see FIG. 12B). The shooting screen GM1 also displays a shutter icon st representing the camera's shutter button (that is, the button for starting imaging).

The communication unit 18 of the AI server 10 receives the captured image from the smartphone 30. The AI processing unit 13 performs the AI reflection detection process shown in FIG. 8 on the received captured image and obtains reflection region information. The communication unit 18 transmits the reflection region information obtained by the AI processing unit 13 to the smartphone 30.

The communication unit 35 of the smartphone 30 receives the reflection region information from the AI server 10 via the network 70 (S84). Based on the received reflection region information, the processor 31 superimposes a reflection position mc, rendered in a specific color (for example, red), on the captured image stored in the memory 36, thereby generating a superimposed image GZ2, and displays it on the display unit 33 (S85).
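Step S85 can be pictured with the short sketch below. It assumes, purely for illustration, that the reflection region information arrives as a per-pixel boolean mask of the same size as the captured image; the publication does not specify the format, and the function name is hypothetical.

```python
# Hypothetical sketch of S85: paint the reflection position mc onto the captured image.

def overlay_reflection(captured, mask, color=(255, 0, 0)):
    """Return a new image in which masked pixels are replaced by the overlay color."""
    return [
        [color if flagged else tuple(pixel) for pixel, flagged in zip(row, mask_row)]
        for row, mask_row in zip(captured, mask)
    ]


captured = [[(12, 34, 56), (250, 250, 250)],
            [(78, 90, 12), (34, 56, 78)]]
mask = [[False, True],
        [False, False]]
superimposed = overlay_reflection(captured, mask)
```

The original image is left unmodified, so the untouched capture stays available if the user retakes or discards the overlay.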

FIG. 11 is a diagram showing an example of the confirmation screen GM2 of the smartphone 30 on which the superimposed image GZ2 is displayed. The processor 31 performs character recognition on the superimposed image GZ2, on which the reflection position is superimposed (S86). The processor 31 stores the result of the character recognition process in the memory 36. Recognized characters are given character shading hm as a marking indicating that they were recognized. Applying the character shading hm changes how the characters are displayed on the screen of the display unit 33. For example, the color changes from the black of the characters before recognition to gray surrounding the characters.

The processor 31 also displays a rectangular window wk2 in the lower part of the confirmation screen GM2 and shows in it a message asking whether to translate. Here, the message "Translate the display. Is it OK?" is displayed in the display area set in the lower part of the screen of the touch panel TP. A YES button 34z and a NO button 34y are also arranged in the lower part of the screen of the touch panel TP as the input unit 34. To translate the result of character recognition, the user presses the YES button 34z. If no translation is wanted, the user presses the NO button 34y.

The processor 31 receives the user operation and determines whether to start translation (S87). When translation is to start, the communication unit 35, following an instruction from the processor 31, transmits the character information obtained by character recognition and stored in the memory 36 to the translation server 50 connected to the network 70. The communication unit 54 of the translation server 50 receives the character information transmitted from the smartphone 30. The processor 51 of the translation server 50 refers to the dictionary DB 53z in the storage 53 and translates the character information into the language of a country designated in advance (for example, the foreign visitor's own native language). The communication unit 54 transmits the result of the translation process to the smartphone 30.

スマートフォン３０の通信部３５は、翻訳サーバ５０から翻訳結果を受信する。プロセッサ３１は、翻訳結果を表示部３３の画面に表示する（Ｓ８８）。なお、ここでは、翻訳サーバが翻訳を行ったが、スマートフォン３０がインストール済みの翻訳アプリを起動し、自装置で翻訳を行ってもよい。 The communication unit 35 of the smartphone 30 receives the translation result from the translation server 50. The processor 31 displays the translation result on the screen of the display unit 33 (S88). Here, the translation server performs translation, but the smartphone 30 may activate the installed translation application and perform translation on its own device.

図１２Ａは、スマートフォン３０に表示された翻訳結果画面ＧＭ３の一例を示す図である。翻訳結果画面ＧＭ３の下方に配置された、矩形窓ｗｋ３で囲まれた領域には、翻訳結果が表示される。ここでは、文字情報である「カレー」、「烏龍茶」に対し、それぞれ翻訳結果である「Ｃｕｒｒｙ」、「Ｏｏｌｏｎｇ」が表示される。また、反射位置ｍｃが重畳され、文字認識されなかった「たこ焼き」、「焼きそば」の画像に対しては、翻訳が行われないので、何も標示されない。なお、ここでは、日本語から英語へと翻訳されたが、翻訳前の言語及び翻訳後の言語は、日本語、英語、中国語、ドイツ語、フランス語等、任意の組み合わせが可能である。翻訳アプリは、スマートフォン３０に設定された所有者の国籍を判別し、該当する国の言語で翻訳を行う。 FIG. 12A is a diagram illustrating an example of the translation result screen GM3 displayed on the smartphone 30. The translation result is displayed in the area enclosed by the rectangular window wk3 in the lower part of the translation result screen GM3. Here, the translation results “Curry” and “Oolong” are displayed for the character information “カレー” (curry) and “烏龍茶” (oolong tea), respectively. The images of “たこ焼き” (takoyaki) and “焼きそば” (yakisoba), on which the reflection position mc is superimposed and whose characters were not recognized, are not translated, so nothing is displayed for them. Although the translation here is from Japanese to English, the source and target languages can be any combination of Japanese, English, Chinese, German, French, and so on. The translation application determines the nationality of the owner configured in the smartphone 30 and translates into the language of that country.

ユーザは、タッチパネルＴＰに対し、所定の操作を行うことで、翻訳結果を保存可能である。所定の操作として、例えば、翻訳結果画面ＧＭ３に表示された矩形窓ｗｋ３で囲まれた領域をダブルタップ操作することが挙げられる。 The user can save the translation result by performing a predetermined operation on the touch panel TP. As the predetermined operation, for example, a double tap operation may be performed on the area surrounded by the rectangular window wk3 displayed on the translation result screen GM3.

プロセッサ３１は、ユーザの操作を受け付け、翻訳結果を保存するか否かを判別する（Ｓ８９）。翻訳結果を保存する場合、プロセッサ３１は、メモリ３６に翻訳結果を保存する（Ｓ９０）。プロセッサ３１は、アプリ終了操作が行われたか否かを判別する（Ｓ９１）。アプリ終了操作が行われない場合、ステップＳ８２の処理に戻る。一方、アプリ終了操作が行われた場合、あるいはステップＳ８９で翻訳結果を保存しない場合、プロセッサ３１は、そのまま本処理を終了する。 The processor 31 receives a user operation and determines whether or not to save the translation result (S89). When saving the translation result, the processor 31 saves the translation result in the memory 36 (S90). The processor 31 determines whether or not an application termination operation has been performed (S91). When the application termination operation is not performed, the process returns to step S82. On the other hand, when the application ending operation is performed, or when the translation result is not stored in step S89, the processor 31 ends this processing as it is.

（他の翻訳結果画面） (Another translation result screen)
図１２Ｂは、スマートフォン３０に表示された他の翻訳結果画面ＧＭ４の一例を示す図である。この翻訳結果画面ＧＭ４には、矩形窓が表示されず、文字認識結果画像ＧＺ４と、翻訳結果画像ＧＺ５とが対比して表示される。文字認識結果画像ＧＺ４には、文字認識された文字情報である、「カレー」、「烏龍茶」、「コーヒー」、「紅茶」が含まれる。翻訳結果画像ＧＺ５には、翻訳された文字情報である、「Ｃｕｒｒｙ」、「Ｏｏｌｏｎｇ」、「Ｃｏｆｆｅｅ」、「Ｂｌａｃｋｔｅａ」が含まれる。 FIG. 12B is a diagram illustrating an example of another translation result screen GM4 displayed on the smartphone 30. On this translation result screen GM4, no rectangular window is displayed, and the character recognition result image GZ4 and the translation result image GZ5 are displayed side by side for comparison. The character recognition result image GZ4 contains the recognized character information “カレー” (curry), “烏龍茶” (oolong tea), “コーヒー” (coffee), and “紅茶” (black tea). The translation result image GZ5 contains the translated character information “Curry”, “Oolong”, “Coffee”, and “Black tea”.

（スマートフォンの他の画面表示例） (Other screen display examples on the smartphone)
別の利用例として、ユーザが、スマートフォン３０で食事メニューを撮像する場合を示す。図１３は、他の撮像画像ＧＺ６が表示されたスマートフォン３０の撮影画面ＧＭ６の一例を示す図である。図１０に示した撮影画面ＧＭ１と同様、撮影画面ＧＭ６には、撮像画像ＧＺ６、矩形窓ｗｋ６、及びシャッタアイコンｓｔが表示される。撮像画像ＧＺ６は、お食事メニュー、チキンカレー、ポークカレー、ビーフカレー、ドリングメニュー等の文字情報を含む。チキンカレー近傍の画像には、光による反射領域ｇ２がチキンカレーの「レー」部分と重畳して存在する。 As another usage example, consider a case where the user photographs a meal menu with the smartphone 30. FIG. 13 is a diagram illustrating an example of the shooting screen GM6 of the smartphone 30 on which another captured image GZ6 is displayed. As with the shooting screen GM1 shown in FIG. 10, the shooting screen GM6 displays the captured image GZ6, a rectangular window wk6, and a shutter icon st. The captured image GZ6 contains character information such as お食事メニュー (meal menu), チキンカレー (chicken curry), ポークカレー (pork curry), ビーフカレー (beef curry), and ドリングメニュー (drink menu). In the image near チキンカレー, a light reflection region g2 overlaps the “レー” portion of チキンカレー.

図１４Ａは、一部文字認識可能な範囲を含む重畳画像ＧＺ７が表示されたスマートフォン３０の確認画面ＧＭ７の一例を示す図である。撮像画像ＧＺ６に対し文字認識を行った結果、確認画面ＧＭ７では、お食事メニュー、ポークカレー、ビーフカレー、ドリングメニューが文字認識された。認識された文字には、文字認識できたことを表すマーキングとして文字掛けｈｍが施される。前述したように、文字掛けｈｍが施されると、表示部３３の画面に表示される文字の表示形態が変化する。 FIG. 14A is a diagram illustrating an example of the confirmation screen GM7 of the smartphone 30 on which a superimposed image GZ7 containing a partially recognizable range is displayed. As a result of character recognition on the captured image GZ6, お食事メニュー (meal menu), ポークカレー (pork curry), ビーフカレー (beef curry), and ドリングメニュー (drink menu) were recognized on the confirmation screen GM7. Each recognized character is given character shading hm as a marking indicating that it was recognized. As described above, when the character shading hm is applied, the display form of the character on the screen of the display unit 33 changes.

一方、チキンカレーを含む領域には、反射位置ｍｃが重畳表示される。この領域では、反射位置ｍｃが近傍に重畳表示されている。また、チキンカレー全体ではないが、その一部が文字認識可能である、一部文字認識可能な範囲が、マーカｍｒで識別可能に表示される。ここでは、一部文字認識可能な範囲は、チキンカレーのうちの「チキンカ」の部分である。「チキンカ」の範囲は、マーカｍｒとして、例えばオレンジ色の網掛け（図中、ハッチ表示）が施される。また、「チキンカ」の部分を挟むように、左右のカーソルｋｓがタッチパネルＴＰに表示される。ユーザが、例えば指でカーソルｋｓをドラッグ操作することで、一部文字認識可能な範囲が変更される。 On the other hand, the reflection position mc is superimposed on the region containing “チキンカレー” (chicken curry). In this region, the reflection position mc is superimposed nearby. In addition, a partially recognizable range, in which part of “チキンカレー” but not all of it can be recognized, is displayed so that it can be identified by the marker mr. Here, the partially recognizable range is the “チキンカ” portion of “チキンカレー”. The “チキンカ” range is given, as the marker mr, orange shading for example (shown hatched in the figure). Left and right cursors ks are displayed on the touch panel TP so as to bracket the “チキンカ” portion. When the user drags a cursor ks, for example with a finger, the partially recognizable range is changed.

図１４Ｂは、一部文字認識可能な範囲が変更された確認画面ＧＭ８の一例を示す図である。ユーザは、「チキンカ」を翻訳しても、誤訳すると判断し、指でカーソルｋｓを図中左に１文字移動させる。一部文字認識可能な範囲は、「チキン」の部分に変化する。これにより、チキンを翻訳した場合、チキンカレーが連想される。 FIG. 14B is a diagram illustrating an example of the confirmation screen GM8 in which the partially recognizable range has been changed. The user judges that translating “チキンカ” would produce a mistranslation and moves the cursor ks one character to the left in the figure with a finger. The partially recognizable range changes to the “チキン” (chicken) portion. As a result, when “チキン” is translated, the user can associate the result with chicken curry.
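
The cursor adjustment described above can be sketched as an operation on a half-open character range. This is only an illustration under stated assumptions: the function name `move_right_cursor` and the index-based representation of the range are hypothetical and do not appear in the disclosure.

```python
def move_right_cursor(text, start, end, delta):
    """Move the right cursor by `delta` characters (negative = left).

    The range is half-open, [start, end), and is clamped so that it
    never becomes empty-negative or extends past the end of `text`.
    """
    new_end = min(len(text), max(start, end + delta))
    return start, new_end


line = "チキンカレー"
start, end = 0, 4                      # "チキンカ": the partially recognizable range
start, end = move_right_cursor(line, start, end, -1)
print(line[start:end])                 # the range now covers "チキン"
```

Clamping inside the helper means a drag that overshoots either end of the line simply stops at the boundary.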

図１４Ａ及び図１４Ｂには、図１１と同様、確認画面ＧＭ７，ＧＭ８の下方に矩形窓ｗｋ７，ｗｋ８がそれぞれ表示され、翻訳の有無を確認するメッセージが表示される。ユーザが、タッチパネルＴＰの下方に表示されたＹＥＳボタン３４ｚを押下すると、確認画面ＧＭ８に対し、翻訳が行われる。 In FIGS. 14A and 14B, as in FIG. 11, rectangular windows wk7 and wk8 are displayed in the lower part of the confirmation screens GM7 and GM8, respectively, together with a message asking whether to translate. When the user presses the YES button 34z displayed in the lower part of the touch panel TP, translation is performed on the content of the confirmation screen GM8.

図１５Ａは、スマートフォン３０に表示された翻訳結果画面ＧＭ９の一例を示す図である。翻訳結果画面ＧＭ９の下方には、矩形窓ｗｋ９で囲まれた領域には、翻訳結果が表示される。ここでは、文字情報である、お食事メニュー、チキン、ポークカレー、ビーフカレー、ドリングメニューに対し、それぞれ翻訳結果である「ｆｏｏｄｍｅｎｕ」、「ｃｈｉｋｅｎ」、「ｐｏｒｋｃｕｒｒｙ」、「ｂｅｅｆｃｕｒｒｙ」、「ｄｒｉｎｋｍｅｎｕ」が表示される。 FIG. 15A is a diagram illustrating an example of the translation result screen GM9 displayed on the smartphone 30. In the lower part of the translation result screen GM9, the translation result is displayed in the area enclosed by the rectangular window wk9. Here, the translation results “food menu”, “chiken”, “pork curry”, “beef curry”, and “drink menu” are displayed for the character information お食事メニュー (meal menu), チキン (chicken), ポークカレー (pork curry), ビーフカレー (beef curry), and ドリングメニュー (drink menu), respectively.

（他の翻訳結果画面） (Another translation result screen)
図１５Ｂは、スマートフォン３０に表示された他の翻訳結果画面ＧＭ１０の一例を示す図である。翻訳結果画面ＧＭ１０の下方に表示された矩形窓ｗｋ１０で囲まれた領域は、空白である。翻訳結果画面ＧＭ１０には、文字情報である、お食事メニュー、チキンカレー、ポークカレー、ビーフカレー、ドリングメニューを上書きして、翻訳結果である「ｆｏｏｄｍｅｎｕ」、「ｃｈｉｋｅｎ」、「ｐｏｒｋｃｕｒｒｙ」、「ｂｅｅｆｃｕｒｒｙ」、「ｄｒｉｎｋｍｅｎｕ」が表示される。ただし、反射位置ｍｃの近傍の領域は、翻訳されず、そのまま表示される。 FIG. 15B is a diagram illustrating an example of another translation result screen GM10 displayed on the smartphone 30. The area enclosed by the rectangular window wk10 displayed in the lower part of the translation result screen GM10 is blank. On the translation result screen GM10, the translation results “food menu”, “chiken”, “pork curry”, “beef curry”, and “drink menu” are displayed over the character information お食事メニュー (meal menu), チキンカレー (chicken curry), ポークカレー (pork curry), ビーフカレー (beef curry), and ドリングメニュー (drink menu). However, the region near the reflection position mc is not translated and is displayed as it is.

このように、スマートフォン３０で撮像された撮像画像に反射位置が含まれていても、ユーザが判読可能なように、翻訳結果が表示される。 Thus, even if the captured image captured by the smartphone 30 includes the reflection position, the translation result is displayed so that the user can read it.

以上により、実施の形態１に係るＡＩサーバ１０における学習処理方法は、光の反射位置（反射箇所の一例）を示す反射領域ｇ１（反射画像領域の一例）を含む元画像Ａ（学習処理対象の撮像画像の一例）に基づいて、元画像Ａの偽画像Ｂ’（第１類似画像の一例）を生成するステップを有する。また、学習処理方法は、元画像Ａ（撮像画像の一例）中の反射領域ｇ１が他の画像領域と識別可能に生成された学習画像Ｂ１，Ｂ２，Ｂ３と偽画像Ｂ’との比較に応じて、偽画像Ｂ’の真偽性を評価するステップを有する。また、学習処理方法は、偽画像Ｂ’に基づいて、元画像Ａの偽画像Ａ’（第２類似画像の一例）を生成するステップを有する。また、学習処理方法は、偽画像Ａ’と元画像Ａとの比較に応じて、偽画像Ａ’の真偽性を評価するステップを有する。また、学習処理方法は、偽画像Ｂ’及び偽画像Ａ’のそれぞれの真偽性の評価結果に基づいて、任意の撮像画像における反射領域ｇ１の検知に用いる学習済みモデル（反射検知モデルの一例）を生成するステップを有する。 As described above, the learning processing method in the AI server 10 according to Embodiment 1 includes a step of generating a fake image B′ (an example of a first similar image) of an original image A (an example of a captured image subject to learning processing) based on the original image A, which includes a reflection region g1 (an example of a reflected image region) indicating a light reflection position (an example of a reflection location). The learning processing method also includes a step of evaluating the authenticity of the fake image B′ according to a comparison between the fake image B′ and the learning images B1, B2, and B3, which are generated so that the reflection region g1 in the original image A (an example of a captured image) is distinguishable from other image regions. The learning processing method also includes a step of generating a fake image A′ (an example of a second similar image) of the original image A based on the fake image B′. The learning processing method also includes a step of evaluating the authenticity of the fake image A′ according to a comparison between the fake image A′ and the original image A. Finally, the learning processing method includes a step of generating, based on the authenticity evaluation results of the fake image B′ and the fake image A′, a learned model (an example of a reflection detection model) used to detect the reflection region g1 in an arbitrary captured image.
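
The sequence of steps summarized above (A to B′, evaluate B′, B′ to A′, evaluate A′) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: images are flat lists of floats, the two generators are single-parameter scalings, and a mean absolute difference stands in for the authenticity evaluation, which in a real system would more likely be performed by a learned evaluator. None of the function names come from the disclosure.

```python
def mean_abs_diff(x, y):
    """Stand-in authenticity score: 0.0 means indistinguishable."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


def generate_b(image_a, w):
    """Toy generator A -> B' (hypothetical single-parameter model)."""
    return [w * p for p in image_a]


def generate_a(image_b, v):
    """Toy generator B' -> A' (hypothetical single-parameter model)."""
    return [v * p for p in image_b]


def evaluate(image_a, learning_b, w, v):
    """Run the four steps and return both authenticity scores."""
    fake_b = generate_b(image_a, w)                # step 1: A -> B'
    score_b = mean_abs_diff(fake_b, learning_b)    # step 2: B' vs learning image
    fake_a = generate_a(fake_b, v)                 # step 3: B' -> A'
    score_a = mean_abs_diff(fake_a, image_a)       # step 4: A' vs original A
    return score_b, score_a


original_a = [0.2, 0.8, 0.4]
learning_b = [0.1, 0.4, 0.2]
print(evaluate(original_a, learning_b, w=1.0, v=1.0))   # untrained parameters
print(evaluate(original_a, learning_b, w=0.5, v=2.0))   # parameters that fit the toy data
```

The two scores returned by `evaluate` correspond to the two evaluation results on which the learned model is then based.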

これにより、ＡＩサーバ１０は、スマートフォン３０から任意の撮像画像が入力された場合でも、その撮像画像中の光の反射箇所を示す反射画像領域を検知可能な高精度な反射検知モデルを生成でき、任意の撮像画像において検知される反射画像領域の信頼性を的確に担保できる。 Thereby, even when any captured image is input from the smartphone 30, the AI server 10 can generate a highly accurate reflection detection model that can detect a reflected image region indicating a reflected portion of light in the captured image. The reliability of the reflected image area detected in any captured image can be accurately ensured.

また、学習処理方法において、学習済みモデルを生成するステップは、偽画像Ｂ’及び偽画像Ａ’のそれぞれの真偽性の評価結果に基づいて、学習済みモデルが使用する学習パラメータ（パラメータの一例）を更新するステップと、更新された学習パラメータを用いて学習済みモデルを生成するステップとを含む。これにより、ＡＩサーバ１０は、偽画像と元画像との真偽性の評価結果に基づいて学習パラメータの更新された高精度な学習済みモデルを生成でき、学習済みモデルの学習効果を向上できる。 In the learning processing method, the step of generating the learned model includes a step of updating the learning parameters (an example of parameters) used by the learned model based on the authenticity evaluation results of the fake image B′ and the fake image A′, and a step of generating the learned model using the updated learning parameters. As a result, the AI server 10 can generate a highly accurate learned model whose learning parameters have been updated based on the authenticity evaluation results for the fake images and the original image, and can improve the learning effect of the learned model.
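
One way to picture the parameter update is a single gradient-descent step on a loss that combines both evaluation results. The sketch below is an assumption-laden toy, not the disclosed procedure: the two-parameter model, the squared-error loss, the finite-difference gradient, and the names `total_loss` and `update` are all hypothetical.

```python
def total_loss(params, image_a, learning_b):
    """Sum of the B' error (vs. learning image) and the A' error (vs. original)."""
    w, v = params
    fake_b = [w * p for p in image_a]
    fake_a = [v * p for p in fake_b]
    n = len(image_a)
    loss_b = sum((x - y) ** 2 for x, y in zip(fake_b, learning_b)) / n
    loss_a = sum((x - y) ** 2 for x, y in zip(fake_a, image_a)) / n
    return loss_b + loss_a


def update(params, image_a, learning_b, lr=0.1, eps=1e-6):
    """One gradient-descent step using central finite differences."""
    grads = []
    for i in range(len(params)):
        hi = list(params)
        lo = list(params)
        hi[i] += eps
        lo[i] -= eps
        grads.append(
            (total_loss(hi, image_a, learning_b) - total_loss(lo, image_a, learning_b))
            / (2 * eps)
        )
    return [p - lr * g for p, g in zip(params, grads)]


original_a = [0.2, 0.8, 0.4]
learning_b = [0.1, 0.4, 0.2]
params = [1.0, 1.0]
for _ in range(50):
    params = update(params, original_a, learning_b)
print(params, total_loss(params, original_a, learning_b))
```

A practical implementation would use automatic differentiation rather than finite differences; the finite-difference form is used here only to keep the sketch dependency-free.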

また、学習処理方法において、偽画像Ｂ’を生成するステップは、元画像Ａ（学習処理対象の撮像画像の一例）が複数存在する場合に、それぞれの元画像Ａ毎に対応する偽画像Ｂ’を生成するステップを含む。これにより、ＡＩサーバ１０は、複数の異なる元画像Ａに対応して複数の偽画像を生成できるので、元画像Ａ毎にそれぞれ学習パラメータを更新できるので、学習済みモデルの信頼性の精度を一層向上できる。 In the learning processing method, the step of generating the fake image B′ includes, when there are a plurality of original images A (an example of captured images subject to learning processing), a step of generating a corresponding fake image B′ for each original image A. Because the AI server 10 can thus generate a plurality of fake images corresponding to a plurality of different original images A, it can update the learning parameters for each original image A and further improve the reliability of the learned model.

また、学習処理方法は、元画像Ａ（撮像画像の一例）中の反射領域ｇ１に赤色（第１の色の一例）を付与し、元画像Ａ中の反射領域ｇ１以外の他の画像領域に白色（第２の色の一例）を付与して学習画像Ｂ１を生成するステップを更に有する。これにより、ＡＩサーバ１０は、スマートフォン３０から入力された撮像画像内に光の反射領域とそれ以外の領域とが明確に識別された学習画像を容易に生成できる。 Further, the learning processing method assigns red (an example of the first color) to the reflection region g1 in the original image A (an example of the captured image), and applies it to other image regions other than the reflection region g1 in the original image A. The method further includes the step of generating the learning image B1 by giving white (an example of the second color). Thereby, the AI server 10 can easily generate a learning image in which the light reflection region and the other region are clearly identified in the captured image input from the smartphone 30.
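
Generating the learning image B1 amounts to painting a two-color label image from a reflection mask. A minimal sketch follows, assuming the reflection region is supplied as a boolean mask (how the mask is obtained is outside this sketch) and that red and white are the 8-bit RGB values shown; the function name is hypothetical.

```python
RED = (255, 0, 0)          # first color: reflection region
WHITE = (255, 255, 255)    # second color: everything else


def make_learning_image_b1(mask):
    """mask[y][x] is True inside the reflection region; returns rows of RGB tuples."""
    return [[RED if inside else WHITE for inside in row] for row in mask]


mask = [
    [False, True, True],
    [False, False, True],
]
for row in make_learning_image_b1(mask):
    print(row)
```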

また、学習処理方法は、元画像Ａ（撮像画像の一例）中の反射領域ｇ１を構成するそれぞれのＲ画素（画素のいずれか１色の一例）の画素値に、元画像Ａ中の対応する画素の輝度値を設定し、元画像Ａ中の反射領域ｇ１以外の他の画像領域を構成するそれぞれの画素の画素値に、元画像Ａ中の対応する画素の画素値を設定して学習画像Ｂ２を生成するステップを更に有する。これにより、ＡＩサーバ１０は、スマートフォン３０から入力された撮像画像内に光の反射領域とそれ以外の領域とが明確に識別された学習画像を容易に生成できる。 In addition, the learning processing method corresponds to the pixel value of each R pixel (an example of any one color of pixels) constituting the reflection region g1 in the original image A (an example of a captured image) in the original image A. The luminance value of the pixel is set, and the pixel value of the corresponding pixel in the original image A is set as the pixel value of each pixel constituting the image area other than the reflection area g1 in the original image A. The method further includes the step of generating B2. Thereby, the AI server 10 can easily generate a learning image in which the light reflection region and the other region are clearly identified in the captured image input from the smartphone 30.

また、学習処理方法は、元画像Ａ（撮像画像の一例）中の反射領域ｇ１を構成するそれぞれのＲ画素（画素のいずれか１色の一例）の画素値に、元画像Ａ中の対応する画素の輝度値を設定し、元画像Ａ中の反射領域ｇ１以外の他の画像領域を構成するそれぞれのＲＧＢ画素の（全ての色の一例）の画素値に、元画像Ａ中の対応する画素の画素値を設定して学習画像Ｂ３を生成するステップを更に有する。これにより、ＡＩサーバ１０は、スマートフォン３０から入力された撮像画像内に光の反射領域とそれ以外の領域とが明確に識別された学習画像を容易に生成できる。 In addition, the learning processing method corresponds to the pixel value of each R pixel (an example of any one color of pixels) constituting the reflection region g1 in the original image A (an example of a captured image) in the original image A. The luminance value of the pixel is set, and the corresponding pixel value in the original image A is set to the pixel value of each RGB pixel (one example of all colors) constituting the image area other than the reflection area g1 in the original image A. And a step of generating a learning image B3 by setting the pixel value of. Thereby, the AI server 10 can easily generate a learning image in which the light reflection region and the other region are clearly identified in the captured image input from the smartphone 30.
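
The luminance-based learning image can be sketched in the same style. The text does not specify how luminance is computed or what the G and B values inside the reflection region become, so this sketch assumes Rec. 601 luma weights and sets G and B to zero there; both are assumptions, as are the function names. Pixels outside the reflection region keep all of their original RGB values, which matches the description of the learning image B3.

```python
def luminance(rgb):
    """Luma of an 8-bit RGB pixel (Rec. 601 weights: an assumption)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def make_learning_image_b3(original, mask):
    """R channel carries luminance inside the reflection region.

    G and B inside the region are set to 0 (an assumption); pixels
    outside the region are copied unchanged from the original image.
    """
    out = []
    for row, mask_row in zip(original, mask):
        out.append(
            [(luminance(px), 0, 0) if inside else px for px, inside in zip(row, mask_row)]
        )
    return out


original = [[(255, 255, 255), (10, 20, 30)]]
mask = [[True, False]]
print(make_learning_image_b3(original, mask))
```

Compared with the two-color image B1, this form preserves how bright the reflection is at each pixel, which may give the model more information than a flat label color.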

また、実施の形態１に係る反射検知システム５は、前述したＡＩサーバ１０（サーバ装置の一例）と、撮像部３２及び表示部３３を有するスマートフォン３０（携帯端末の一例）とが互いに通信可能に接続される。ＡＩサーバ１０は、撮像部３２により撮像された任意の撮像画像を取得すると、学習済みモデル（反射検知モデルの一例）を用いて、撮像画像中の光の反射領域（反射画像領域の一例）を検知するとともに、撮像画像中の光の反射領域を他の画像領域と識別可能に加工した出力画像を生成してスマートフォン３０に送信する。スマートフォン３０は、ＡＩサーバ１０から送信された出力画像を用いて、出力画像のうち光の反射領域以外の他の画像領域を文字認識した結果を表示部３３に表示する。 In the reflection detection system 5 according to Embodiment 1, the AI server 10 described above (an example of a server device) and the smartphone 30 (an example of a mobile terminal), which has the imaging unit 32 and the display unit 33, are communicably connected to each other. When the AI server 10 acquires an arbitrary captured image captured by the imaging unit 32, it uses the learned model (an example of a reflection detection model) to detect the light reflection region (an example of a reflected image region) in the captured image, generates an output image in which the light reflection region in the captured image is processed so as to be distinguishable from other image regions, and transmits the output image to the smartphone 30. Using the output image transmitted from the AI server 10, the smartphone 30 displays on the display unit 33 the result of character recognition performed on the image regions of the output image other than the light reflection region.
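
On the terminal side, restricting character recognition to regions outside the reflection region can be sketched as a bounding-box filter. This assumes that both text candidates and reflection regions are available as axis-aligned boxes `(left, top, right, bottom)`; that representation and the function names are hypothetical and are not specified in the disclosure.

```python
def overlaps(box, region):
    """True if two axis-aligned boxes (left, top, right, bottom) intersect."""
    return (
        box[0] < region[2]
        and region[0] < box[2]
        and box[1] < region[3]
        and region[1] < box[3]
    )


def boxes_to_recognize(text_boxes, reflection_regions):
    """Keep only text boxes that touch no reflection region."""
    return [
        b for b in text_boxes
        if not any(overlaps(b, r) for r in reflection_regions)
    ]


text_boxes = [(0, 0, 50, 10), (0, 20, 50, 30), (0, 40, 50, 50)]
reflection_regions = [(40, 18, 60, 32)]
print(boxes_to_recognize(text_boxes, reflection_regions))
```

A box-level filter is the coarsest option; the partially recognizable range described earlier suggests that a finer, per-character decision is also possible.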

これにより、スマートフォン３０を使用するユーザ（例えば、外国人等の旅行者）は、自ら内容確認したい広告等を被写体とする撮像画像をＡＩサーバ１０に送信しかつその撮像画像に対するＡＩサーバ１０の処理結果をスマートフォン３０において文字認識及び翻訳させることで、文字部分として認識された文字情報の翻訳結果を把握できる。言い換えると、反射検知システム５は、外国人等の旅行者をユーザに親切な文字認識及び翻訳のアプリケーションを提供でき、ユーザの利便性を的確に向上できる。 As a result, a user of the smartphone 30 (for example, a traveler such as a foreign visitor) can send to the AI server 10 a captured image of an advertisement or similar subject whose content the user wants to check, have the smartphone 30 perform character recognition and translation on the AI server 10's processing result for that image, and thereby obtain the translation of the character information recognized as text. In other words, the reflection detection system 5 can provide a user-friendly character recognition and translation application to users such as foreign travelers and can appropriately improve user convenience.

以上、添付図面を参照しながら各種の実施の形態について説明したが、本開示はかかる例に限定されない。当業者であれば、特許請求の範囲に記載された範疇内において、各種の変更例、修正例、置換例、付加例、削除例、均等例に想到し得ることは明らかであり、それらについても本開示の技術的範囲に属すると了解される。また、発明の趣旨を逸脱しない範囲において、上述した各種の実施の形態における各構成要素を任意に組み合わせてもよい。 Although various embodiments have been described above with reference to the accompanying drawings, the present disclosure is not limited to these examples. It is apparent that those skilled in the art can conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and these are understood to belong to the technical scope of the present disclosure. In addition, the constituent elements of the various embodiments described above may be combined arbitrarily without departing from the spirit of the invention.

本開示は、撮像画像中の光の反射箇所を示す反射画像領域を検知可能な高精度な反射検知モデルを生成でき、有用である。 The present disclosure is useful because it can generate a highly accurate reflection detection model that can detect a reflected image region indicating a reflected portion of light in a captured image.

５反射検知システム
１０ＡＩサーバ
１３ＡＩ処理部
３０スマートフォン
３２撮像部
３３表示部

DESCRIPTION OF SYMBOLS
5 Reflection detection system
10 AI server
13 AI processing unit
30 Smartphone
32 Imaging unit
33 Display unit

Claims

サーバ装置における学習処理方法であって、
光の反射箇所を示す反射画像領域を含む学習処理対象の撮像画像に基づいて、前記撮像画像の第１類似画像を生成するステップと、
前記撮像画像中の前記反射画像領域が他の画像領域と識別可能に生成された学習用画像と前記第１類似画像との比較に応じて、前記第１類似画像の真偽性を評価するステップと、
前記第１類似画像に基づいて、前記撮像画像の第２類似画像を生成するステップと、
前記第２類似画像と前記撮像画像との比較に応じて、前記第２類似画像の真偽性を評価するステップと、
前記第１類似画像及び前記第２類似画像のそれぞれの真偽性の評価結果に基づいて、任意の撮像画像における前記反射画像領域の検知に用いる反射検知モデルを生成するステップと、を有する、
学習処理方法。
A learning processing method in a server device, the method comprising:
generating a first similar image of a captured image subject to learning processing, based on the captured image, which includes a reflected image region indicating a light reflection location;
evaluating the authenticity of the first similar image according to a comparison between the first similar image and a learning image generated so that the reflected image region in the captured image is distinguishable from other image regions;
generating a second similar image of the captured image based on the first similar image;
evaluating the authenticity of the second similar image according to a comparison between the second similar image and the captured image; and
generating, based on the authenticity evaluation results of the first similar image and the second similar image, a reflection detection model used to detect the reflected image region in an arbitrary captured image.

前記反射検知モデルを生成するステップは、
前記第１類似画像及び前記第２類似画像のそれぞれの真偽性の評価結果に基づいて、前記反射検知モデルが使用するパラメータを更新するステップと、
更新された前記パラメータを用いて前記反射検知モデルを生成するステップと、を含む、
請求項１に記載の学習処理方法。
The learning processing method according to claim 1, wherein generating the reflection detection model includes:
updating parameters used by the reflection detection model based on the authenticity evaluation results of the first similar image and the second similar image; and
generating the reflection detection model using the updated parameters.

前記第１類似画像を生成するステップは、
前記学習処理対象の撮像画像が複数存在する場合に、それぞれの前記学習処理対象の撮像画像毎に対応する前記第１類似画像を生成するステップを含む、
請求項１に記載の学習処理方法。
The learning processing method according to claim 1, wherein generating the first similar image includes:
when there are a plurality of captured images subject to learning processing, generating a corresponding first similar image for each of the captured images subject to learning processing.

前記撮像画像中の前記反射画像領域に第１の色を付与し、前記撮像画像中の前記反射画像領域以外の他の画像領域に第２の色を付与して前記学習用画像を生成するステップ、を更に有する、
請求項１に記載の学習処理方法。
The learning processing method according to claim 1, further comprising:
generating the learning image by assigning a first color to the reflected image region in the captured image and assigning a second color to the image regions of the captured image other than the reflected image region.

前記撮像画像中の前記反射画像領域を構成するそれぞれの画素のいずれか１色の画素値に、前記撮像画像中の対応する画素の輝度値を設定し、前記撮像画像中の前記反射画像領域以外の他の画像領域を構成するそれぞれの画素の画素値に、前記撮像画像中の対応する画素の画素値を設定して前記学習用画像を生成するステップ、を更に有する、
請求項１に記載の学習処理方法。
The learning processing method according to claim 1, further comprising:
generating the learning image by setting, as the pixel value of any one color of each pixel constituting the reflected image region in the captured image, the luminance value of the corresponding pixel in the captured image, and setting, as the pixel value of each pixel constituting the image regions of the captured image other than the reflected image region, the pixel value of the corresponding pixel in the captured image.

前記撮像画像中の前記反射画像領域を構成するそれぞれの画素のいずれか１色の画素値に、前記撮像画像中の対応する画素の輝度値を設定し、前記撮像画像中の前記反射画像領域以外の他の画像領域を構成するそれぞれの画素の全ての色の画素値に、前記撮像画像中の対応する画素の画素値を設定して前記学習用画像を生成するステップ、を更に有する、
請求項１に記載の学習処理方法。
The learning processing method according to claim 1, further comprising:
generating the learning image by setting, as the pixel value of any one color of each pixel constituting the reflected image region in the captured image, the luminance value of the corresponding pixel in the captured image, and setting, as the pixel values of all colors of each pixel constituting the image regions of the captured image other than the reflected image region, the pixel values of the corresponding pixel in the captured image.

光の反射箇所を示す反射画像領域を含む学習処理対象の撮像画像を保持するサーバ装置であって、
プロセッサとメモリと、を備え、
前記プロセッサは、前記メモリと協働して、
前記撮像画像に基づいて、前記撮像画像の第１類似画像を生成し、
前記撮像画像中の前記反射画像領域が他の画像領域と識別可能に生成された学習用画像と前記第１類似画像との比較に応じて、前記第１類似画像の真偽性を評価し、
前記第１類似画像に基づいて、前記撮像画像の第２類似画像を生成し、
前記第２類似画像と前記撮像画像との比較に応じて、前記第２類似画像の真偽性を評価し、
前記第１類似画像及び前記第２類似画像のそれぞれの真偽性の評価結果に基づいて、任意の撮像画像における前記反射画像領域の検知に用いる反射検知モデルを生成する、
サーバ装置。
A server device that holds a captured image subject to learning processing, the captured image including a reflected image region indicating a light reflection location, the server device comprising:
a processor; and
a memory,
wherein the processor, in cooperation with the memory:
generates a first similar image of the captured image based on the captured image;
evaluates the authenticity of the first similar image according to a comparison between the first similar image and a learning image generated so that the reflected image region in the captured image is distinguishable from other image regions;
generates a second similar image of the captured image based on the first similar image;
evaluates the authenticity of the second similar image according to a comparison between the second similar image and the captured image; and
generates, based on the authenticity evaluation results of the first similar image and the second similar image, a reflection detection model used to detect the reflected image region in an arbitrary captured image.

請求項７に記載のサーバ装置と、撮像部及び表示部を有する携帯端末とが互いに通信可能に接続された反射検知システムであって、
前記サーバ装置は、
前記撮像部により撮像された任意の撮像画像を取得すると、前記反射検知モデルを用いて、前記撮像画像中の前記反射画像領域を検知するとともに、前記撮像画像中の前記反射画像領域を他の画像領域と識別可能に加工した出力画像を生成して前記携帯端末に送信し、
前記携帯端末は、
前記サーバ装置から送信された前記出力画像を用いて、前記出力画像のうち前記反射画像領域以外の前記他の画像領域を文字認識した結果を前記表示部に表示する、
反射検知システム。
A reflection detection system in which the server device according to claim 7 and a mobile terminal having an imaging unit and a display unit are communicably connected to each other, wherein:
the server device, upon acquiring an arbitrary captured image captured by the imaging unit, detects the reflected image region in the captured image using the reflection detection model, generates an output image in which the reflected image region in the captured image is processed so as to be distinguishable from other image regions, and transmits the output image to the mobile terminal; and
the mobile terminal uses the output image transmitted from the server device to display, on the display unit, a result of character recognition performed on the other image regions of the output image other than the reflected image region.