JP7446643B2

JP7446643B2 - Visual positioning methods, devices, equipment and readable storage media

Info

Publication number: JP7446643B2
Application number: JP2022566049A
Authority: JP
Inventors: 尊裕陳; ▲じゅえ▼其呉; 斯洋胡; 欣陳; 沛謙呉; 仲文張
Original assignee: Mapxus Technology Holding Ltd
Current assignee: Mapxus Technology Holding Ltd
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2024-03-11
Anticipated expiration: 2040-05-26
Also published as: CN111758118B; CN111758118A; JP2023523364A; WO2021237443A1

Description

The present application relates to the field of positioning technology, and in particular to a visual positioning method, device, equipment and readable storage medium.

In visual positioning based on machine learning, a neural network model whose input is a photo (an RGB numerical matrix) and whose output is a specific location is obtained by training on a large number of real-scene photos labeled with location markers. Once the trained neural network model is available, a user only needs to take one photo of the environment to obtain the specific shooting location.

Such methods require collecting a large number of photo samples of the usage environment as the training dataset. For example, according to some documents, 330 photos were collected to achieve visual positioning for a small shop 35 meters wide; more than 1,500 photos were collected to achieve visual positioning for a 140-meter street (positioning only one side); and, to achieve positioning for a certain factory, the factory was divided into 18 areas and 200 images were taken of each area. Thus, to guarantee the visual positioning effect, a large number of on-site photos must be collected as training data, and these photos must cover every corner of the scene, which is very time-consuming and labor-intensive.

As described above, how to solve problems such as the difficulty of sample collection in visual positioning is a technical problem that those skilled in the art currently need to solve.

The purpose of the present application is to provide a visual positioning method, device, equipment and readable storage medium that train a neural network model using panoramic photos from a live map, thereby solving the problem that sample collection is difficult in visual positioning.

To solve the above technical problem, the present application provides the following technical solutions.
A visual positioning method, comprising:
obtaining a wide-angle photo and randomly dividing the wide-angle photo to obtain an image set to be measured;
inputting the image set to be measured into a positioning model for positioning recognition to obtain a plurality of candidate positionings, the positioning model being a neural network model trained with panoramic photos from a live map; and
determining a final positioning from the plurality of candidate positionings.

Preferably, the step of determining a final positioning from the plurality of candidate positionings comprises:
performing clustering on the plurality of candidate positionings and screening the plurality of candidate positionings according to the clustering result;
constructing a geometric figure from the several candidate positionings that remain after screening; and
taking the geometric center of the geometric figure as the final positioning.

Preferably, the method further comprises:
calculating a standard variance of the plurality of candidate positionings with respect to the final positioning; and
taking the standard variance as the positioning error of the final positioning.

Preferably, the training process of the neural network model comprises:
obtaining several of the panoramic photos from the live map and determining the geographic location of each real-scene photo;
performing a dewarping transformation on the several panoramic photos to obtain multiple sets of planar projection photos having the same aspect ratio;
attaching to each set of planar projection photos, based on its correspondence with the panoramic photo, a geographic marker that includes a geographic location and a specific direction;
taking the planar projection photos with geographic markers as training samples; and
training the neural network model with the training samples and taking the trained neural network model as the positioning model.

Preferably, the step of performing a dewarping transformation on the several panoramic photos to obtain multiple sets of planar projection photos having the same aspect ratio comprises:
in the dewarping transformation, dividing each panoramic photo according to different focal length parameters to obtain multiple sets of planar projection photos having different viewing angles.

Preferably, the step of dividing each panoramic photo according to different focal length parameters in the dewarping transformation to obtain multiple sets of planar projection photos having different viewing angles comprises:
dividing each panoramic photo according to a number of divisions for which the coverage of the corresponding original image exceeds a predetermined percentage, to obtain multiple sets of planar projection photos in which adjacent pictures have overlapping viewing angles.

Preferably, the training process of the neural network model further comprises:
supplementing the training samples with scene photos obtained from the Internet or with environment photos collected of the positioning environment.

Preferably, the step of randomly dividing the wide-angle photo to obtain an image set to be measured comprises:
performing, according to a number of divisions, a random division of the wide-angle photo whose coverage of the original image exceeds a predetermined percentage, to obtain an image set to be measured that matches the number of divisions.

A visual positioning device, comprising:
an image set acquisition module configured to obtain a wide-angle photo and randomly divide the wide-angle photo to obtain an image set to be measured;
a candidate positioning acquisition module configured to input the image set to be measured into a positioning model for positioning recognition to obtain a plurality of candidate positionings, the positioning model being a neural network model trained with panoramic photos from a live map; and
a positioning output module configured to determine a final positioning from the plurality of candidate positionings.

Visual positioning equipment, comprising:
a memory for storing a computer program; and
a processor that implements the above visual positioning method when executing the computer program.

A readable storage medium storing a computer program which, when executed by a processor, implements the above visual positioning method.

The method provided by the embodiments of the present application comprises: obtaining a wide-angle photo and randomly dividing it to obtain an image set to be measured; inputting the image set to be measured into a positioning model for positioning recognition to obtain a plurality of candidate positionings, the positioning model being a neural network model trained with panoramic photos from a live map; and determining a final positioning from the plurality of candidate positionings.

A live map is a map in which the actual streetscape can be seen, and it contains 360-degree real scenes. The panoramic photos in a live map depict the actual streetscape and therefore overlap with the environments in which visual positioning is applied. On this basis, the present method trains a neural network model with panoramic photos from a live map to obtain a positioning model for visual positioning. After a wide-angle photo is obtained, it is randomly divided to obtain an image set to be measured. The image set to be measured is input into the positioning model for positioning recognition, which yields a plurality of candidate positionings, and the final positioning is determined from these candidate positionings. In this way, the positioning model is obtained by training a neural network model on panoramic photos from a live map, visual positioning is completed with that model, and the problem that training samples are difficult to collect in visual positioning is solved.

Correspondingly, the embodiments of the present application further provide a device, equipment and a readable storage medium corresponding to the above visual positioning method. They have the technical effects described above, which are not repeated here.

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is an implementation flowchart of a visual positioning method in an embodiment of the present application.
FIG. 2 is a schematic diagram of viewing angle division in an embodiment of the present application.
The remaining drawings are, in order: a structural schematic diagram of a visual positioning device in an embodiment of the present application; a structural schematic diagram of a visual positioning device in an embodiment of the present application; and a specific structural schematic diagram of a visual positioning device in an embodiment of the present application.

To help those skilled in the art better understand the solution of the present application, the present application is described in further detail below with reference to the drawings and specific embodiments. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present application without creative effort fall within the protection scope of the present application.

Because the neural network model may be stored in the cloud or on a local device, the visual positioning method provided by the embodiments of the present invention may be applied directly to a cloud server or to a local device. A device that needs positioning only has to be able to take photos and connect to a network; positioning can then be achieved from a single wide-angle photo.

Referring to FIG. 1, which is a flowchart of a visual positioning method in an embodiment of the present application, the method includes the following steps.
S101: Obtain a wide-angle photo and randomly divide the wide-angle photo to obtain an image set to be measured.
A wide-angle photo is a picture taken with a wide-angle lens or in panoramic mode. Simply put, the shorter the focal length, the wider the field of view and the larger the range of scenery that fits in the photo.

In the visual positioning method provided by the invention, the neural network model is trained with panoramic photos from a live map. Accordingly, for better results, the photo required when performing visual positioning with the positioning model is also a wide-angle photo. For example, at the position where positioning is needed, the user takes one wide-angle photo of the surroundings in wide-angle mode (or ultra-wide-angle mode) or panoramic mode, with a viewing angle of 120 degrees or more (other values such as 140 degrees or 180 degrees are of course also possible).

After the wide-angle photo is obtained, it is randomly divided, and the resulting photos form the image set to be measured.

The number of photos into which the wide-angle photo is divided can be set according to the training effect of the positioning model and the positioning accuracy actually required. In general, within the recognizable range (if a photo is too small, it contains no relevant positioning features and cannot be recognized effectively), the larger the number of divisions, the higher the positioning accuracy; correspondingly, the model needs more training iterations and a longer training time.

Preferably, to improve positioning accuracy, the wide-angle photo is divided by performing, according to the number of divisions, a random division whose coverage of the original image exceeds a predetermined percentage, which yields an image set to be measured that matches the number of divisions. Specifically, the wide-angle photo is randomly divided into N images with an aspect ratio of 1:1 (other aspect ratios are possible, provided the ratio is the same as that of the training samples used to train the positioning model) and a height of 1/3 to 1/2 of the height of the wide-angle photo; these N images form the image set to be measured. N is set according to the training effect and the required positioning accuracy: if the training effect is somewhat poor and the accuracy requirement is high, a larger N is chosen. In general N may be set to 100 (other values such as 50 or 80 may of course be chosen; they are not listed one by one here). In general, the random division result is required to cover more than 95% of the original image, that is, the wide-angle photo (other percentages may of course be set; they are not listed one by one here).
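The random division described above could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `random_division` and `coverage`, the retry loop, and the uniform sampling of crop size and position are all assumptions. Coverage is counted on a per-pixel grid in pure Python, so the sketch is only practical for small (for example, downscaled) dimensions.

```python
import random

def coverage(boxes, width, height):
    """Fraction of the width x height image covered by at least one box."""
    covered = [[False] * width for _ in range(height)]
    for left, top, side in boxes:
        for y in range(top, top + side):
            row = covered[y]
            for x in range(left, left + side):
                row[x] = True
    return sum(map(sum, covered)) / (width * height)

def random_division(width, height, n=100, min_coverage=0.95,
                    seed=None, max_tries=100):
    """Draw n square (1:1) crops whose side is 1/3 to 1/2 of the image height.

    Returns a list of (left, top, side) boxes. The draw is repeated
    (an assumed strategy) until the crops cover at least min_coverage
    of the original image.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        boxes = []
        for _ in range(n):
            side = rng.randint(height // 3, height // 2)
            left = rng.randint(0, width - side)
            top = rng.randint(0, height - side)
            boxes.append((left, top, side))
        if coverage(boxes, width, height) >= min_coverage:
            return boxes
    raise RuntimeError("coverage target not reached; increase n or max_tries")
```

Uniformly placed crops cover the image borders less often than the interior, which is one reason a coverage check with retries (or a larger N) may be needed to reach a threshold such as 95%.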

S102: Input the image set to be measured into the positioning model for positioning recognition to obtain a plurality of candidate positionings.

The positioning model is a neural network model trained with panoramic photos from a live map.

To obtain a more accurate positioning result, in this embodiment each photo produced by the division in the image set to be measured is input into the positioning model separately for positioning recognition, and one positioning output is obtained for each photo. In this embodiment, the positioning result corresponding to each divided photo is called a candidate positioning.

Before actual use, the positioning model must be obtained by training. Training the neural network model includes the following steps:
Step 1: Obtain several panoramic photos from the live map and determine the geographic location of each real-scene photo;
Step 2: Perform a dewarping transformation on the several panoramic photos to obtain multiple sets of planar projection photos having the same aspect ratio;
Step 3: Based on the correspondence with the panoramic photo, attach to each set of planar projection photos a geographic marker that includes a geographic location and a specific direction;
Step 4: Take the planar projection photos with geographic markers as training samples;
Step 5: Train the neural network model with the training samples and take the trained neural network model as the positioning model.

For ease of description, the above five steps are explained together below.

Because the viewing angle of a panoramic photo is close to 360 degrees, in this embodiment a dewarping transformation is performed on the panoramic photo to obtain multiple sets of planar projection photos having the same aspect ratio. Because each panoramic photo in the live map is associated with a geographic location, in this embodiment the set of planar projection photos divided from one panoramic photo is assigned the geographic location of that panoramic photo. In addition, because the panoramic photo is divided according to viewing angle, the direction of each divided photo is known, and in this embodiment the geographic location and the specific direction are added as the geographic marker. In other words, every planar projection photo has a corresponding geographic location and specific direction.
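The text does not specify the projection used by the dewarping transformation. As one possible illustration, the sketch below assumes the panorama is stored in equirectangular form (columns map linearly to longitude, rows to latitude) and renders a square pinhole-camera view for a given heading. The function name, the nearest-neighbor sampling, and the convention that heading 0 points at the panorama's center column are all assumptions. The viewing angle is passed in directly because the source does not give the formula relating the focal length parameter F to the angle.

```python
import math

def equirect_to_perspective(pano, fov_deg, heading_deg, out_size):
    """Render a square planar view from an equirectangular panorama.

    pano is a list of rows (each a list of pixel values). heading_deg
    rotates the virtual camera about the vertical axis; fov_deg is the
    horizontal viewing angle of the output image.
    """
    h, w = len(pano), len(pano[0])
    # Focal length in pixels of a pinhole camera with the requested angle.
    f = 0.5 * out_size / math.tan(math.radians(fov_deg) / 2)
    th = math.radians(heading_deg)
    cos_t, sin_t = math.cos(th), math.sin(th)
    c = (out_size - 1) / 2  # image center in pixel coordinates
    out = []
    for row in range(out_size):
        y = row - c
        line = []
        for col in range(out_size):
            x = col - c
            # Ray through this pixel, rotated about the vertical axis.
            xr = x * cos_t + f * sin_t
            zr = -x * sin_t + f * cos_t
            lon = math.atan2(xr, zr)                  # -pi .. pi
            lat = math.atan2(-y, math.hypot(xr, zr))  # up is positive
            u = int((lon / (2 * math.pi) + 0.5) * w) % w
            v = min(h - 1, max(0, int((0.5 - lat / math.pi) * h)))
            line.append(pano[v][u])  # nearest-neighbor sample
        out.append(line)
    return out
```

Each view produced this way would inherit the panorama's geographic location and carry its own heading as the direction part of the geographic marker. A practical implementation would vectorize the loops and interpolate between pixels.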

The planar projection photos with geographic markers are used as training samples, the neural network model is trained with these samples, and the trained neural network model is the positioning model. Specifically, the set of photos with specific locations and specific directions forms a data pool. A randomly drawn 80% of the data pool is used as the training set and the remaining 20% as the test set; this ratio may be adjusted according to the actual training situation. The training set is input into a neural network model that is either freshly initialized or pre-trained on a large-scale picture set, and the training result is verified with the test set. Common neural network architectures that may be chosen include the CNN (Convolutional Neural Network, a feedforward neural network containing alternating convolutional layers and pooling layers) and its derivatives, the LSTM (Long Short-Term Memory network, a type of recurrent neural network (RNN)), and hybrid architectures. The embodiments of the present application do not limit which neural network is used. When training is complete, a neural network model applicable to the area covered by the live map data source, that is, the positioning model, is obtained.
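The 80%/20% random split of the data pool can be illustrated with a few lines of standard-library Python. The function name `split_pool` and the `seed` parameter are illustrative assumptions; the text only states the ratio and that it may be adjusted.

```python
import random

def split_pool(samples, train_ratio=0.8, seed=None):
    """Randomly split labeled samples into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Usage: each sample could be a (photo, location, direction) record.
train_set, test_set = split_pool(range(100), train_ratio=0.8, seed=1)
```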

Preferably, to suit the focal lengths (that is, viewing angles) of different picture acquisition devices in practice, the panoramic photo is divided according to different focal length parameters, so that planar projection photos with different viewing angles are obtained as training samples. Specifically, in the dewarping transformation, each panoramic photo is divided according to different focal length parameters to obtain multiple sets of planar projection photos having different viewing angles. That is, the number of divisions n is determined from the focal length parameter F: the smaller the focal length parameter, the larger the viewing angle, and the smaller n can be. Referring to FIG. 2, which is a schematic diagram of viewing angle division in an embodiment of the present application, the most common focal length parameter F = 0.5 gives a viewing angle of 90 degrees, and n = 4 divisions cover the full 360 degrees. When planar projection photos with other viewing angles are needed, the focal length parameter F is changed to other values, for example 1.0 and 1.3.

Preferably, to improve the accuracy of viewing-angle positioning, the panoramic photo may be divided according to a number of divisions for which the coverage of the corresponding original image exceeds a predetermined percentage, so that, for a given viewing angle, adjacent planar projection photos overlap in angle. Specifically, each panoramic photo is divided according to a number of divisions for which the coverage of the corresponding original image exceeds a predetermined percentage, yielding multiple sets of planar projection photos in which adjacent pictures have overlapping viewing angles. In other words, to enrich the shooting angles of the photos, it is recommended that, for a fixed focal length, the number of divisions be larger than the number needed for an even division. For example, taking the axis of the panorama projection sphere perpendicular to the ground as the rotation axis, each time the line-of-sight center direction (for example, the arrow in FIG. 2) is rotated by 45 degrees, one planar projection photo with a 90-degree viewing angle is cut out; adjacent pictures then share a 45-degree overlapping viewing angle. Each planar projection photo is marked with direction data according to the angle of its line-of-sight center direction.
F may also take the values 1.0 and 1.3, for which the viewing angles are approximately 60 degrees and 30 degrees respectively, so n may be chosen as 12 and 24. Setting more values of F and increasing n can further improve the coverage of the training set. In general, a coverage greater than 95% can be guaranteed.
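The relationship between viewing angle, number of divisions and overlap in the examples above can be checked with a small sketch. It assumes the n views are spaced evenly around the vertical axis, which matches the 45-degree example in the text. The helper names and the dictionary layout of the marker are hypothetical.

```python
def view_headings(n):
    """Center directions (degrees) of n views spaced evenly over 360 degrees."""
    return [i * 360.0 / n for i in range(n)]

def overlap_deg(fov_deg, n):
    """Angle shared by adjacent views; 0 means an exact even division."""
    return fov_deg - 360.0 / n

def label_views(location, fov_deg, n):
    """Attach a geographic marker (location plus heading) to each view."""
    return [
        {"location": location, "heading_deg": hd, "fov_deg": fov_deg}
        for hd in view_headings(n)
    ]

# Values taken from the text: (viewing angle, number of divisions).
for fov, n in [(90, 4), (90, 8), (60, 12), (30, 24)]:
    print(fov, n, overlap_deg(fov, n))
```

With a 90-degree viewing angle, n = 4 gives no overlap and n = 8 gives the 45-degree overlap described above. With the approximate 60-degree and 30-degree viewing angles, n = 12 and n = 24 give overlaps of 30 and 15 degrees.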

Preferably, because training on panoramic photos alone may degrade the recognition performance of visual positioning in practice (for example, because the live map is updated infrequently), the training samples may be supplemented during training of the neural network model with scene photos obtained from the Internet or with environment photos collected of the positioning environment.

S103: Determine the final positioning from the plurality of candidate positionings.

After the plurality of candidate positionings is obtained, the final positioning is determined from them. Once the final positioning is obtained, it is output for the user to view.

Specifically, one candidate positioning may be selected at random as the final positioning, or several candidate positionings may be selected at random and the geometric center of the geometric figure they form taken as the final positioning. Several candidate positionings that largely coincide may of course also be taken as the final positioning.

Preferably, because isolated outliers may appear among the candidate positionings, the candidate positionings are screened by clustering to improve the accuracy of the final positioning: candidate positionings that lie away from the majority are removed, and the final positioning is determined from those that remain. Specifically, the process includes the following steps:
Step 1: Perform clustering on the plurality of candidate positionings and screen them according to the clustering result;
Step 2: Construct a geometric figure from the several candidate positionings that remain after screening;
Step 3: Take the geometric center of the geometric figure as the final positioning.

Specifically, a clustering algorithm such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is used to classify the candidate positionings, grouping neighboring position data into one class. The classification parameters are set to ε-neighborhood = 1 and minimum number of points minPts = 5. The class containing the most position results is regarded as the reliable result, and the geometric center of the geometric figure formed by all candidate positionings in that class is calculated as the final positioning result.
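A minimal, self-contained sketch of this screening step follows. It uses a plain-Python DBSCAN with the parameters given above (ε = 1, minPts = 5) so that it runs without third-party libraries. Two points are assumptions: candidate positionings are treated as 2D points in whatever distance unit ε is expressed in (the text does not state the unit), and the geometric center is approximated by the mean of the cluster's points rather than, for example, the centroid of their convex hull.

```python
import math
from collections import Counter

def dbscan(points, eps=1.0, min_pts=5):
    """Label each point with a cluster index, or -1 for noise."""
    labels = [None] * len(points)  # None means not yet visited

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # core point: keep expanding
                queue.extend(nb_j)
    return labels

def final_position(candidates, eps=1.0, min_pts=5):
    """Mean of the largest cluster, or None if every candidate is noise."""
    labels = dbscan(candidates, eps, min_pts)
    counts = Counter(lb for lb in labels if lb != -1)
    if not counts:
        return None
    best = counts.most_common(1)[0][0]
    kept = [p for p, lb in zip(candidates, labels) if lb == best]
    return (sum(p[0] for p in kept) / len(kept),
            sum(p[1] for p in kept) / len(kept))
```

For example, six candidates clustered around (10, 10) together with two far-away outliers yield a final position near (10.07, 10.1); the outliers are labeled as noise and ignored.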

Preferably, a positioning error may also be determined in order to present the positioning result more fully. Specifically, the standard variance of the plurality of candidate positionings is calculated with respect to the final positioning and taken as the positioning error of the final positioning. That is, the variance between each candidate positioning and the final positioning is calculated and accumulated to obtain the final positioning error.
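The text does not give an explicit formula for the "standard variance". One plausible reading, shown below purely as an assumption, is the root-mean-square distance between the candidate positionings and the final positioning. The function name is illustrative.

```python
import math

def positioning_error(candidates, final):
    """Root-mean-square distance of candidates from the final position.

    This is one interpretation of the 'standard variance' in the text;
    the source does not define the exact formula.
    """
    squared = [(x - final[0]) ** 2 + (y - final[1]) ** 2
               for x, y in candidates]
    return math.sqrt(sum(squared) / len(squared))

# Two candidates, each 1 unit from the final position, give an error of 1.0.
print(positioning_error([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0)))
```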

A live map is a map in which the actual streetscape can be viewed, and it contains 360-degree real scenes. The panoramic photographs in a live map are street-level imagery of the real environment, which coincides with the environment in which visual positioning is applied. On this basis, the present method trains a neural network model with panoramic photographs from a live map to obtain a positioning model for visual positioning. After a wide-angle photograph is acquired, it is randomly divided to obtain an image set to be measured. The image set to be measured is input into the positioning model for positioning recognition, which yields multiple candidate positionings, and the final positioning is determined from these candidate positionings. In this way, the positioning model is obtained by training the neural network model on panoramic photographs from a live map, visual positioning is completed with that model, and the problem that training samples for visual positioning are difficult to collect is solved.

Based on the above embodiments, the embodiments of the present application further provide corresponding improved solutions. In the preferred/improved embodiments, steps that are the same as or correspond to those in the above embodiments, as well as the corresponding beneficial effects, may be cross-referenced, and they are not described again in the preferred/improved embodiments herein.

Corresponding to the above method embodiments, the embodiments of the present application further provide a visual positioning apparatus. The visual positioning apparatus described below and the visual positioning method described above may be read with reference to each other.

Referring to FIG. 3, the visual positioning apparatus includes:
a to-be-measured image set acquisition module 101, which acquires a wide-angle photograph and randomly divides the wide-angle photograph to obtain an image set to be measured;
a candidate positioning acquisition module 102, which inputs the image set to be measured into a positioning model, namely a neural network model trained with panoramic photographs from a live map, and performs positioning recognition to obtain multiple candidate positionings; and
a positioning output module 103, which determines the final positioning from the multiple candidate positionings.

With the apparatus provided by the embodiments of the present application, a wide-angle photograph is acquired and randomly divided to obtain an image set to be measured; the image set to be measured is input into a positioning model, namely a neural network model trained with panoramic photographs from a live map, and positioning recognition is performed to obtain multiple candidate positionings; and the final positioning is determined from the multiple candidate positionings.
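The data flow through the three modules can be summarized in a short sketch. The function `visual_positioning` and its three callable parameters are hypothetical placeholders: the splitting strategy, the trained model, and the fusion rule are supplied by the caller and are not specified by this sketch.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Position = Tuple[float, float]


def visual_positioning(
    wide_photo: Image,
    split: Callable[[Image], Sequence[Image]],
    model: Callable[[Image], Position],
    fuse: Callable[[List[Position]], Position],
) -> Position:
    """Chain the three stages: split, per-image prediction, fusion."""
    crops = split(wide_photo)                    # role of module 101
    candidates = [model(crop) for crop in crops]  # role of module 102
    return fuse(candidates)                      # role of module 103
```

Passing the stages as callables keeps the sketch independent of any particular image library or network framework.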

A live map is a map in which the actual streetscape can be viewed, and it contains 360-degree real scenes. The panoramic photographs in a live map are street-level imagery of the real environment, which coincides with the environment in which visual positioning is applied. On this basis, the present apparatus trains a neural network model with panoramic photographs from a live map to obtain a positioning model for visual positioning. After a wide-angle photograph is acquired, it is randomly divided to obtain an image set to be measured. The image set to be measured is input into the positioning model for positioning recognition, which yields multiple candidate positionings, and the final positioning is determined from these candidate positionings. In this way, the positioning model is obtained by training the neural network model on panoramic photographs from a live map, visual positioning is completed with that model, and the problem that training samples for visual positioning are difficult to collect is solved.

In a specific embodiment of the present application, the positioning output module 103 specifically includes:
a positioning selection unit, which performs clustering on the multiple candidate positionings and filters the candidate positionings according to the clustering result;
a geometric figure construction unit, which constructs a geometric figure from the selected candidate positionings; and
a final positioning determination unit, which takes the geometric center of the geometric figure as the final positioning.

In a specific embodiment of the present application, the positioning output module 103 further includes:
a positioning error determination unit, which calculates the standard variance of the multiple candidate positionings with respect to the final positioning and takes the standard variance as the positioning error of the final positioning.

In a specific embodiment of the present application, the model training module includes:
a panoramic photograph acquisition unit, which obtains several panoramic photographs from the live map and determines the geographic location of each real-scene photograph;
a dewarping transformation unit, which performs a dewarping transformation on the several panoramic photographs to obtain multiple sets of planar projection photographs having the same aspect ratio;
a geographic marking unit, which, based on the correspondence with the panoramic photographs, attaches to each set of planar projection photographs a geographic marker containing a geographic location and a specific direction;
a training sample determination unit, which takes the planar projection photographs with geographic markers as training samples; and
a model training unit, which trains the neural network model with the training samples and takes the trained neural network model as the positioning model.

In a specific embodiment of the present application, the dewarping transformation unit specifically divides each panoramic photograph in the dewarping transformation according to different focal length parameters, to obtain multiple sets of planar projection photographs having different viewing angles.
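The link between a focal length parameter and the resulting viewing angle can be illustrated with the standard pinhole (rectilinear) camera relation, FOV = 2·atan(w / 2f). The text does not state which relation or units are used, so the formula, the pixel units, and the name `horizontal_fov_deg` are assumptions for illustration only.

```python
import math


def horizontal_fov_deg(image_width_px: float, focal_length_px: float) -> float:
    """Horizontal field of view of a pinhole projection, in degrees.

    Both arguments are in pixels; a shorter focal length gives a wider view.
    """
    return math.degrees(2.0 * math.atan(image_width_px / (2.0 * focal_length_px)))
```

Under this relation, projecting the same panorama at a fixed output width with several focal lengths yields planar views of the same aspect ratio but different viewing angles.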

In a specific embodiment of the present application, the dewarping transformation unit specifically divides each panoramic photograph according to a number of divisions for which the coverage of the corresponding original image exceeds a predetermined ratio, to obtain multiple sets of planar projection photographs in which adjacent pictures have overlapping viewing angles.
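One way to read the division-count condition is sketched below for the horizontal direction only: if views of a given field of view are spaced evenly around 360 degrees, adjacent views overlap exactly when the number of views times the field of view exceeds 360. This interpretation, the even spacing, and the names `min_views_for_overlap` and `view_headings` are assumptions, not details taken from the text.

```python
import math


def min_views_for_overlap(fov_deg: float) -> int:
    """Smallest number of evenly spaced views whose neighbors overlap."""
    return math.floor(360.0 / fov_deg) + 1


def view_headings(n_views: int, fov_deg: float):
    """Evenly spaced view directions and the overlap between neighbors.

    Returns (headings, overlap_deg). Raises ValueError when the chosen
    number of views leaves no overlap between adjacent views.
    """
    step = 360.0 / n_views
    overlap = fov_deg - step
    if overlap <= 0:
        raise ValueError("adjacent views do not overlap; use more views")
    headings = [i * step for i in range(n_views)]
    return headings, overlap
```

Each heading can then serve as the direction component of the geographic marker attached to the corresponding planar projection photograph.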

In a specific embodiment of the present application, the model training module further includes:
a sample supplementing unit, which supplements the training samples with scene photographs obtained from the Internet or environment photographs collected in the positioning environment.

In a specific embodiment of the present application, the to-be-measured image set acquisition module 101 specifically performs, according to a number of divisions, a random division of the wide-angle photograph in which the coverage of the original image exceeds a predetermined ratio, to obtain an image set to be measured that matches the number of divisions.
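A random division with a coverage constraint can be sketched as rejection sampling: draw the requested number of random crop rectangles, measure what fraction of the original image their union covers, and redraw until the fraction exceeds the threshold. The fixed crop size, the retry limit, and the names `coverage` and `random_split` are assumptions made for this sketch.

```python
import random


def coverage(boxes, width, height):
    """Fraction of a width x height image covered by the union of boxes.

    Each box is (x0, y0, x1, y1). Uses coordinate compression so the
    cost depends on the number of boxes, not on the image size.
    """
    xs = sorted({0, width, *(b[0] for b in boxes), *(b[2] for b in boxes)})
    ys = sorted({0, height, *(b[1] for b in boxes), *(b[3] for b in boxes)})
    covered = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            if any(bx0 <= x0 and x1 <= bx1 and by0 <= y0 and y1 <= by1
                   for bx0, by0, bx1, by1 in boxes):
                covered += (x1 - x0) * (y1 - y0)
    return covered / (width * height)


def random_split(width, height, n, crop_w, crop_h,
                 min_coverage=0.9, rng=None, max_tries=1000):
    """Draw n random crops whose union covers more than min_coverage."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        boxes = []
        for _ in range(n):
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            boxes.append((x, y, x + crop_w, y + crop_h))
        if coverage(boxes, width, height) > min_coverage:
            return boxes
    raise RuntimeError("no split met the coverage requirement")
```

The returned rectangles can be passed to any image library to cut the actual crops; the number of rectangles equals the requested number of divisions.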

Corresponding to the above method embodiments, the embodiments of the present application further provide a visual positioning device. The visual positioning device described below and the visual positioning method described above may be read with reference to each other.

Referring to FIG. 4, the visual positioning device includes:
a memory 410 for storing a computer program; and
a processor 420 which, when executing the computer program, implements the steps of the visual positioning method provided by the above method embodiments.

Specifically, FIG. 5 is a schematic diagram of a specific structure of the visual positioning device provided by this embodiment. The visual positioning device may differ considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 420 (for example, one or more processors) and a memory 410, which stores one or more computer application programs 413 or data 412. The memory 410 may be transient storage or persistent storage. The computer application program may include one or more modules (not shown), and each module may include a series of instruction operations for the data processing device. Furthermore, the central processing unit 420 may be configured to communicate with the memory 410 and to execute, on the visual positioning device 301, the series of instruction operations in the memory 410.

The visual positioning device 400 includes one or more power supplies 430, one or more wired or wireless network interfaces 440, one or more input/output interfaces 450, and/or one or more operating systems 411.

The steps of the visual positioning method described above can be implemented by this structure of the visual positioning device.

Corresponding to the above method embodiments, the embodiments of the present application further provide a readable storage medium. The readable storage medium described below and the visual positioning method described above may be read with reference to each other.

A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the visual positioning method provided by the above method embodiments.

Specifically, the readable storage medium may be any readable storage medium capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.

Claims

A visual positioning method, comprising:
acquiring a wide-angle photograph and randomly dividing the wide-angle photograph to obtain an image set to be measured;
inputting the image set to be measured into a positioning model, which is a neural network model trained with panoramic photographs from a live map, and performing positioning recognition to obtain a plurality of candidate positionings; and
determining a final positioning based on the plurality of candidate positionings,
wherein determining the final positioning based on the plurality of candidate positionings comprises:
performing clustering on the plurality of candidate positionings, and filtering the plurality of candidate positionings according to the clustering result;
constructing a geometric figure from the selected candidate positionings; and
taking the geometric center of the geometric figure as the final positioning.

The visual positioning method according to claim 1, further comprising:
calculating a standard variance of the plurality of candidate positionings with respect to the final positioning; and
taking the standard variance as a positioning error of the final positioning.

The visual positioning method according to claim 1, wherein the training process of the neural network model comprises:
obtaining several of the panoramic photographs from the live map and determining the geographic location of each real-scene photograph;
performing a dewarping transformation on the several panoramic photographs to obtain multiple sets of planar projection photographs having the same aspect ratio;
attaching, based on the correspondence with the panoramic photographs, a geographic marker containing a geographic location and a specific direction to each set of the planar projection photographs;
taking the planar projection photographs with geographic markers as training samples; and
training the neural network model with the training samples, and taking the trained neural network model as the positioning model.

The visual positioning method according to claim 3, wherein performing the dewarping transformation on the several panoramic photographs to obtain multiple sets of planar projection photographs having the same aspect ratio comprises:
dividing, in the dewarping transformation, each of the panoramic photographs according to different focal length parameters to obtain multiple sets of planar projection photographs having different viewing angles.

The visual positioning method according to claim 4, wherein dividing, in the dewarping transformation, each of the panoramic photographs according to different focal length parameters to obtain multiple sets of planar projection photographs having different viewing angles comprises:
dividing each of the panoramic photographs according to a number of divisions for which the coverage of the corresponding original image exceeds a predetermined ratio, to obtain multiple sets of planar projection photographs in which adjacent pictures have overlapping viewing angles.

The visual positioning method according to claim 3, wherein the training process of the neural network model further comprises:
supplementing the training samples with scene photographs obtained from the Internet or environment photographs collected in the positioning environment.

The visual positioning method according to claim 1, wherein randomly dividing the wide-angle photograph to obtain the image set to be measured comprises:
performing, according to a number of divisions, a random division of the wide-angle photograph in which the coverage of the original image exceeds a predetermined ratio, to obtain an image set to be measured that matches the number of divisions.

A visual positioning apparatus, comprising:
a to-be-measured image set acquisition module, which acquires a wide-angle photograph and randomly divides the wide-angle photograph to obtain an image set to be measured;
a candidate positioning acquisition module, which inputs the image set to be measured into a positioning model, which is a neural network model trained with panoramic photographs from a live map, and performs positioning recognition to obtain a plurality of candidate positionings; and
a positioning output module, which determines a final positioning based on the plurality of candidate positionings,
wherein the positioning output module comprises:
a positioning selection unit, which performs clustering on the plurality of candidate positionings and filters the plurality of candidate positionings according to the clustering result;
a geometric figure construction unit, which constructs a geometric figure from the selected candidate positionings; and
a final positioning determination unit, which takes the geometric center of the geometric figure as the final positioning.

A visual positioning device, comprising:
a memory for storing a computer program; and
a processor which, when executing the computer program, implements the visual positioning method according to any one of claims 1 to 7.

A readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the visual positioning method according to any one of claims 1 to 7.