CN111758118B - Visual positioning method, device, equipment and readable storage medium - Google Patents

Visual positioning method, device, equipment and readable storage medium

Info

Publication number
CN111758118B
Authority
CN
China
Prior art keywords
positioning, photos, panoramic, photo, model
Legal status: Active (the status listed is an assumption and is not a legal conclusion)
Application number
CN202080001067.0A
Other languages
Chinese (zh)
Other versions
CN111758118A (en)
Inventor
陈尊裕
吴珏其
胡斯洋
陈欣
吴沛谦
张仲文
Current Assignee
Fengtuzhi Technology Holding Co ltd
Original Assignee
Fengtuzhi Technology Holding Co ltd
Application filed by Fengtuzhi Technology Holding Co ltd
Publication of CN111758118A
Application granted
Publication of CN111758118B

Links

Classifications

    • G06T 7/97 Image analysis: determining parameters from multiple pictures
    • G06N 3/044 Neural networks, architecture: recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Neural networks, architecture: combinations of networks
    • G06N 3/08 Neural networks: learning methods
    • G06T 2207/20081 Image analysis indexing scheme: training; learning
    • G06T 2207/20084 Image analysis indexing scheme: artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A visual positioning method, apparatus, device, and readable storage medium, the method comprising: obtaining a wide-angle photograph, and randomly segmenting the wide-angle photograph to obtain an image set to be identified; inputting the image set into a positioning model for position recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photographs from a live-action map; and determining a final position using the plurality of candidate positions. In the present application, the neural network model can be trained on panoramic photographs from the live-action map to obtain a positioning model, and visual positioning can be completed with this model, solving the problem that training samples for visual positioning are difficult to collect.

Description

Visual positioning method, device, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of positioning technologies, and in particular, to a visual positioning method, device, apparatus, and readable storage medium.
Background
The principle of machine-learning-based visual positioning is as follows: a neural network model is trained on a large number of position-labeled real-scene photographs, yielding a model whose input is a photograph (an RGB value matrix) and whose output is a specific position. Once the trained neural network model is available, the user only needs to photograph the surrounding environment to obtain the specific shooting position.
This approach requires taking a large number of sample photographs of the usage environment as the training data set. For example, according to some reports, 330 photographs had to be taken to achieve visual positioning for a street-corner store 35 meters wide, and more than 1500 photographs for a 140-meter street (covering one side only); to achieve positioning in a certain factory, the plant had to be divided into 18 areas, with 200 images taken in each. Evidently, guaranteeing the visual positioning effect requires collecting a large number of real-scene photographs as training data, and these photographs must be taken at every corner of the scene, which is very time-consuming and labor-intensive.
In summary, how to solve the problem that sample collection for visual positioning is difficult is a technical problem to be addressed by those skilled in the art.
Disclosure of Invention
The aim of the present application is to provide a visual positioning method, apparatus, device, and readable storage medium that, by training the neural network model with panoramic photographs from a live-action map, solve the problem that samples for visual positioning are difficult to collect.
To solve the above technical problem, the present application provides the following technical solutions:
A visual positioning method, comprising:
obtaining a wide-angle photograph, and randomly segmenting the wide-angle photograph to obtain an image set to be identified;
inputting the image set to be identified into a positioning model for position recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photographs from a live-action map;
and determining a final position using the plurality of candidate positions.
Preferably, determining a final position using a plurality of the candidate positions includes:
clustering the candidate positions, and screening the candidate positions using the clustering result;
constructing a geometric figure from the candidate positions retained by the screening;
and taking the geometric center of the geometric figure as the final position.
Preferably, the method further comprises:
calculating the standard deviation of the plurality of candidate positions relative to the final position;
and taking the standard deviation as the positioning error of the final position.
Preferably, the process of training the neural network model includes:
acquiring a plurality of panoramic photographs from the live-action map, and determining the geographic position of each panoramic photograph;
performing a de-warping (anti-distortion) transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio;
labeling each group of planar projection photographs with geographic labels according to their correspondence with the panoramic photographs, the geographic labels including a geographic location and a specific orientation;
taking the planar projection photographs labeled with the geographic labels as training samples;
and training the neural network model with the training samples, and determining the trained neural network model as the positioning model.
Preferably, performing the de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio includes:
segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation, to obtain several groups of planar projection photographs with different viewing angles.
Preferably, segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation to obtain several groups of planar projection photographs with different viewing angles includes:
segmenting each panoramic photograph with a number of segments whose coverage of the corresponding original image is greater than a specified percentage, to obtain several groups of planar projection photographs in which adjacent pictures have overlapping viewing angles.
Preferably, the process of training the neural network model further comprises:
supplementing the training samples with scene photographs obtained from the Internet or environment photographs collected from the positioning environment.
Preferably, randomly segmenting the wide-angle photograph to obtain the image set to be identified includes:
randomly segmenting the wide-angle photograph according to a number of segments, with coverage of the original image greater than a specified percentage, to obtain an image set matching the number of segments.
A visual positioning device, comprising:
an image-set acquisition module, configured to obtain a wide-angle photograph and randomly segment it to obtain an image set to be identified;
a candidate-position acquisition module, configured to input the image set to be identified into a positioning model for position recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photographs from a live-action map;
and a position output module, configured to determine a final position using the plurality of candidate positions.
A visual positioning apparatus, comprising:
a memory for storing a computer program;
and a processor for implementing the visual positioning method described above when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the visual positioning method described above.
By applying the method provided by the embodiments of the present application, a wide-angle photograph is obtained and randomly segmented into an image set to be identified; the image set is input into a positioning model for position recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photographs from a live-action map; and a final position is determined using the plurality of candidate positions.
A live-action map is a map in which real street scenes can be viewed, containing 360-degree views of the real scene. The panoramic photographs in a live-action map are genuine street views, and therefore coincide with the application environment of visual positioning. Accordingly, in this method the neural network model is trained with panoramic photographs from the live-action map to obtain a positioning model for visual positioning. After a wide-angle photograph is obtained, it is randomly segmented to obtain the image set to be identified; inputting this image set into the positioning model for position recognition yields a plurality of candidate positions, from which a final position can be determined. Thus, in this method the neural network model is trained on panoramic photographs from the live-action map to obtain the positioning model, and visual positioning can be completed with this model, solving the problem that training samples for visual positioning are difficult to collect.
Correspondingly, the embodiments of the present application also provide an apparatus, a device, and a readable storage medium corresponding to the above visual positioning method, which have the same technical effects and are not described again herein.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. A person skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of an implementation of a visual positioning method according to an embodiment of the present application;
FIG. 2 is a view angle segmentation schematic diagram according to an embodiment of the present application;
FIG. 3 is a schematic structural view of a visual positioning device according to an embodiment of the present application;
FIG. 4 is a schematic structural view of a visual positioning apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a visual positioning apparatus according to an embodiment of the present application.
Detailed Description
To provide a better understanding of the present application, it is described in further detail below with reference to the drawings and specific embodiments. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of protection of the present application.
It should be noted that, because the neural network model may be stored in the cloud or on a local device, the visual positioning method provided by the embodiments of the present application may be applied directly on a cloud server or on a local device. A device to be positioned needs only photographing and networking capabilities, and positioning can be accomplished with a single wide-angle photograph.
Referring to FIG. 1, a flowchart of a visual positioning method according to an embodiment of the present application, the method includes the following steps:
s101, obtaining a wide-angle photo, and randomly dividing the wide-angle photo to obtain a to-be-detected atlas.
And wide angle, namely using a wide angle lens or a picture shot in a panoramic mode. In short, the smaller the focal length, the wider the field of view and the wider the range of scenes that can be accommodated within the photograph.
Because, in the visual positioning method provided by the invention, the panoramic photo in the live-action map is adopted to train the neural network model. Thus, in order to perform better visual positioning, the required photograph is also a wide-angle photograph in performing visual positioning using a positioning model. For example, a user may take a wide-angle photograph of the surrounding environment with a view angle exceeding 120 degrees (although other degrees, such as 140 degrees, 180 degrees, etc.) using a wide-angle mode (or super-wide angle mode) or a panoramic mode at a location where positioning is desired.
After the wide-angle photo is obtained, the wide-angle photo is randomly segmented, and a to-be-detected atlas formed by a plurality of segmented photos is obtained.
Particularly, how many photos are segmented from the wide-angle photo can be set according to the training effect of the world positioning model and the actual positioning accuracy requirement. Generally, in the identifiable range (the size of the photo is too small, there is a problem that no relevant positioning features exist and effective recognition cannot be performed), the larger the segmentation number is, the higher the positioning accuracy is, and of course, the more training iterations of the model are, the longer the training time is.
Preferably, in order to improve positioning accuracy, when the wide-angle photo is segmented, random segmentation with original image coverage larger than a specified percentage can be performed on the wide-angle photo according to the segmentation number, so as to obtain a to-be-detected image set matched with the segmentation number. Specifically, the wide-angle photograph can be randomly segmented into N images with an aspect ratio of 1:1 (it should be noted that the aspect ratio can be other ratios, the aspect ratio is the same as the aspect ratio of a training sample used for training the positioning model), and the height is 1/3-1/2 of the height of the wide-angle photograph, which is used as the to-be-measured atlas. The number of N is set according to the training effect and the positioning accuracy, when the training effect is slightly worse and the positioning accuracy is required to be high, a higher N value is selected, and usually the number of N can be set to 100 (of course, other values, such as 50, 80, etc., are also selected, and are not enumerated here). Typically, the random segmentation results require >95% coverage of the original (i.e., the wide-angle photograph) (although other percentages may be set and are not enumerated here).
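A minimal sketch of this random segmentation step in Python with NumPy (the function name and the retry strategy are illustrative assumptions; N = 100, the 1:1 crops at 1/3 to 1/2 of the photo height, and the >95% coverage requirement follow the values above):

```python
import random

import numpy as np

def random_crops(image, n=100, min_coverage=0.95, seed=0):
    """Randomly cut n square crops from a wide-angle photo (H x W x 3 array).

    The crop side length is drawn between 1/3 and 1/2 of the photo height,
    as in the embodiment; the whole set is re-drawn until the crops jointly
    cover more than min_coverage of the original image.
    """
    rng = random.Random(seed)
    h, w = image.shape[:2]
    for _ in range(100):                       # retry budget for the coverage test
        covered = np.zeros((h, w), dtype=bool)
        crops = []
        for _ in range(n):
            side = rng.randint(h // 3, h // 2)         # 1:1 aspect ratio
            y = rng.randint(0, h - side)
            x = rng.randint(0, w - side)
            crops.append(image[y:y + side, x:x + side])
            covered[y:y + side, x:x + side] = True
        if covered.mean() >= min_coverage:             # >95% of the original
            return crops
    raise RuntimeError("requested coverage not reached; lower min_coverage or raise n")
```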
S102: input the image set to be identified into the positioning model for position recognition, obtaining a plurality of candidate positions.
The positioning model is a neural network model trained with panoramic photographs from the live-action map.
To obtain a more accurate positioning result, in this embodiment each segmented image in the image set is input into the positioning model for position recognition, and a positioning output is obtained for each image. The positioning result corresponding to each segmented image is treated as one candidate position, as sketched below.
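A sketch of this per-image inference step (assuming a PyTorch-style model that regresses a two-dimensional position from an image tensor; `model` and `preprocess` are placeholders, not names from the patent):

```python
import torch

def candidate_positions(model, crops, preprocess):
    """Run every segmented image through the positioning model.

    model:      a torch.nn.Module assumed to regress a position (e.g. x, y)
                from a single image tensor;
    preprocess: resizes/normalizes one crop to the tensor the model expects.
    Returns one candidate position per crop.
    """
    model.eval()
    candidates = []
    with torch.no_grad():
        for crop in crops:
            x = preprocess(crop).unsqueeze(0)   # add a batch dimension
            candidates.append(model(x).squeeze(0).tolist())
    return candidates
```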
It should be noted that the positioning model must be obtained through training before practical application. The process of training the neural network model comprises:
Step 1: acquire a plurality of panoramic photographs from the live-action map, and determine the geographic position of each panoramic photograph;
Step 2: perform a de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio;
Step 3: label each group of planar projection photographs with geographic labels according to their correspondence with the panoramic photographs, the geographic labels including a geographic location and a specific orientation;
Step 4: take the planar projection photographs labeled with geographic labels as training samples;
Step 5: train the neural network model with the training samples; the trained neural network model is the positioning model.
For ease of description, the five steps above are explained together.
Because the viewing angle of a panoramic photograph is nearly 360 degrees, in this embodiment the panoramic photograph can be de-warped and then split into several groups of planar projection photographs with the same aspect ratio. Since the panoramic photographs in the live-action map correspond to geographic positions, the geographic position of a group of planar projection photographs split from the same panoramic photograph is that of the panoramic photograph itself. In addition, because a panoramic photograph is split by viewing angle, the orientation of each split photograph is also known; in this embodiment the geographic position and the specific orientation together are added as the geographic label. That is, each planar projection photograph has a corresponding geographic location and specific orientation.
The planar projection photographs carrying geographic labels are taken as training samples, and the neural network model is trained with them; the trained neural network model is the positioning model. Specifically, the set of photographs with specific locations and orientations can be used as a data pool: 80% of the pool is randomly drawn as the training set, and the remaining 20% serves as the test set (the ratio can be adjusted to the actual training situation). The training set is fed into an initialized neural network model, or one pre-trained on a large-scale picture collection, and the training result is verified on the test set. Common choices of network structure include a CNN (convolutional neural network, a feed-forward network with alternating convolutional and pooling layers) and its derivative structures, an LSTM (long short-term memory network, a kind of recurrent neural network (RNN)), and hybrid structures; the embodiments of the present application do not restrict the specific type of neural network, and the sketch below assumes one common choice. After training, a neural network model suited to the live-action-map data source, i.e., the positioning model, is obtained.
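A minimal sketch of such a training setup in PyTorch (an assumption for illustration; the patent does not prescribe a framework): a backbone pre-trained on a large-scale picture set, with its classifier replaced by a position-regression head and trained with a mean-squared-error loss. The 80%/20% split and test-set verification from the text would be applied when building the data loader.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet-18 standing in for "a neural network model
# pre-trained by a large-scale picture set"; the head regresses (x, y).
# Orientation could be added as extra outputs. All names are illustrative.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, 2)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_epoch(loader):
    """One pass over (image, position) batches from the 80% training split."""
    net.train()
    for images, positions in loader:
        optimizer.zero_grad()
        loss = loss_fn(net(images), positions)
        loss.backward()
        optimizer.step()
```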
Preferably, to adapt to the focal lengths (i.e., viewing angles) of different image-capture devices in practice, the panoramic photograph can be segmented according to different focal-length parameters, yielding planar projection photographs with different viewing angles as training samples. Specifically, each panoramic photograph can be segmented according to different focal-length parameters in the de-warping transformation to obtain several groups of planar projection photographs with different viewing angles. That is, the number of segments n is determined by the focal-length parameter F: when F is small, the viewing angle is large and n can be smaller. As shown in FIG. 2, a viewing-angle segmentation diagram of an embodiment of the present application, the most commonly used focal-length parameter F = 0.5 gives a viewing angle of 90 degrees, so n = 4 segments cover the full 360 degrees. When planar projection photographs with other viewing angles are required, F can be changed to other values, such as 1.0 or 1.3.
Preferably, to improve the accuracy of viewing-angle positioning, the panoramic photograph can be segmented with a number of segments whose coverage of the corresponding original image exceeds a specified percentage; that is, planar projection photographs are obtained in which adjacent pictures at the same viewing angle overlap. In other words, to enrich the shooting angles, at a fixed focal length it is recommended to use more segments than an equal partition would require. Taking the axis of the panoramic projection sphere perpendicular to the ground as the rotation axis, a planar projection photograph with a 90-degree viewing angle is cut every 45 degrees of the line-of-sight direction (arrows in FIG. 2), so that adjacent pictures share a 45-degree overlapping viewing angle; the resulting planar projection photographs are labeled with orientation data according to the angle of the line-of-sight center. F can also take the values 1.0 and 1.3, giving viewing angles of roughly 60 and 30 degrees, with n set to 12 and 24 respectively. More values of F can be set, and n increased, to further raise the coverage of the training set; coverage greater than 95% is generally ensured. One way to cut such projections is sketched below.
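A sketch of one way to cut such planar projections from an equirectangular panorama (the gnomonic back-projection below is a standard formulation assumed for illustration; the patent gives no formulas). Yaw steps of 45 degrees with a 90-degree field of view reproduce the overlapping views of the example:

```python
import numpy as np

def perspective_from_panorama(pano, yaw_deg, fov_deg=90.0, size=512):
    """Cut one planar (gnomonic) projection out of an equirectangular panorama.

    pano: H x W x 3 equirectangular image; yaw_deg: azimuth of the line of
    sight. Every output pixel is back-projected to a viewing ray, converted
    to longitude/latitude, and sampled (nearest neighbor) from the panorama.
    """
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)   # pinhole focal length
    u, v = np.meshgrid(np.arange(size) - size / 2,
                       np.arange(size) - size / 2)
    x, y, z = u, v, np.full(u.shape, f)                # camera-frame rays
    yaw = np.radians(yaw_deg)                          # rotate about the vertical axis
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    lon = np.arctan2(xr, zr)                           # in [-pi, pi]
    lat = np.arctan2(y, np.hypot(xr, zr))              # in [-pi/2, pi/2]
    px = ((lon / np.pi + 1) / 2 * (w - 1)).astype(int)
    py = ((lat / (np.pi / 2) + 1) / 2 * (h - 1)).astype(int)
    return pano[py, px]

# A 90-degree field of view cut every 45 degrees of yaw gives 8 views with
# 45-degree overlaps between neighbors, matching the example above.
views = [perspective_from_panorama(np.zeros((1024, 2048, 3)), yaw)
         for yaw in range(0, 360, 45)]
```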
Preferably, considering that in practical application relying only on panoramic photographs for training may yield poor visual-positioning recognition because live-action maps are updated infrequently, scene photographs obtained from the Internet, or environment photographs collected from the positioning environment, can be used to supplement the training samples during training of the neural network model.
S103: determine the final position using the plurality of candidate positions.
After the plurality of candidate positions is obtained, the final position can be determined from them; once obtained, it can be output for the user to view.
Specifically, one candidate may be selected at random as the final position; or several candidates may be selected at random and the geometric center of the geometric figure they form taken as the final position. Of course, several highly coincident candidates may also be used as the final position.
Preferably, to improve the accuracy of the final position, and considering that a few comparatively isolated positions may occur among the candidates, the candidate positions can be clustered to remove those that stray from the majority, and the final position then determined from the remaining candidates. Specifically, the implementation includes:
step one, clustering a plurality of candidate positioning, and screening the plurality of candidate positioning by using a clustering result;
step two, constructing geometric figures by utilizing a plurality of candidate positioning obtained by screening;
and thirdly, taking the geometric center of the geometric figure as the final positioning.
Specifically, candidate locations may be classified using a clustering algorithm such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise), classifying neighboring location data into one category. Wherein, the classification parameter can be set as epsilon neighborhood=1, and the minimum point minpts=5. And taking the most number of the position results as reliable results, and calculating the geometric centers of all candidate positioning corresponding geometric figures of the class as final positioning results.
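A sketch of this screening step using scikit-learn's DBSCAN (an assumed implementation choice; eps = 1 and min_samples = 5 follow the example parameters above, and the centroid of the most populous cluster stands in for the geometric center of the constructed figure):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def screen_and_locate(candidates, eps=1.0, min_samples=5):
    """Cluster candidate positions, keep the most populous class, and return
    its geometric center (centroid) as the final position.

    candidates: array-like of shape (n, 2), one position per segmented image.
    DBSCAN labels outlying candidates -1 (noise); they are discarded.
    """
    pts = np.asarray(candidates, dtype=float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    kept = labels[labels >= 0]
    if kept.size == 0:                    # no dense cluster: fall back to all points
        return pts.mean(axis=0), pts
    best = np.bincount(kept).argmax()     # class with the most position results
    cluster = pts[labels == best]
    return cluster.mean(axis=0), cluster
```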
Preferably, to present the positioning better, the positioning error is also determined. Specifically, the standard deviation of the plurality of candidate positions relative to the final position is calculated and taken as the positioning error of the final position; that is, the deviation of each candidate position from the final position is squared, and the accumulated result gives the final positioning error.
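Continuing the sketch above, the error estimate could be computed as follows (one hedged reading of "standard deviation relative to the final position", namely the RMS distance of the retained candidates from it):

```python
import numpy as np

def positioning_error(final_position, cluster):
    """Accumulated squared deviation of the retained candidates from the
    final position, reported as a standard deviation (RMS distance)."""
    d2 = np.sum((np.asarray(cluster, dtype=float) - final_position) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))
```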
By applying the method provided by the embodiments of the present application, a wide-angle photograph is obtained and randomly segmented into an image set to be identified; the image set is input into the positioning model for position recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photographs from the live-action map; and the final position is determined using the plurality of candidate positions.
A live-action map is a map in which real street scenes can be viewed, containing 360-degree views of the real scene. The panoramic photographs in a live-action map are genuine street views, and therefore coincide with the application environment of visual positioning. Accordingly, in this method the neural network model is trained with panoramic photographs from the live-action map to obtain a positioning model for visual positioning. After a wide-angle photograph is obtained, it is randomly segmented to obtain the image set to be identified; inputting this image set into the positioning model for position recognition yields a plurality of candidate positions, from which the final position can be determined. Thus the neural network model is trained on panoramic photographs from the live-action map to obtain the positioning model, and visual positioning can be completed with this model, solving the problem that training samples for visual positioning are difficult to collect.
It should be noted that the embodiments of the present application also provide corresponding improvements based on the embodiments above. Steps in the preferred/improved embodiments that are the same as, or correspond to, steps in the embodiments above can be referred to mutually, as can the corresponding beneficial effects; they are therefore not described in detail again herein.
Corresponding to the above method embodiments, the embodiments of the present application further provide a visual positioning device; the visual positioning device described below and the visual positioning method described above can be referred to correspondingly.
Referring to FIG. 3, the visual positioning device includes:
the image-set acquisition module 101, configured to obtain a wide-angle photograph and randomly segment it to obtain an image set to be identified;
the candidate-position acquisition module 102, configured to input the image set to be identified into the positioning model for position recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photographs from the live-action map;
and the position output module 103, configured to determine the final position using the plurality of candidate positions.
By applying the device provided by the embodiments of the present application, a wide-angle photograph is obtained and randomly segmented into an image set to be identified; the image set is input into the positioning model for position recognition to obtain a plurality of candidate positions, the positioning model being a neural network model trained with panoramic photographs from the live-action map; and the final position is determined using the plurality of candidate positions.
A live-action map is a map in which real street scenes can be viewed, containing 360-degree views of the real scene. The panoramic photographs in a live-action map are genuine street views, and therefore coincide with the application environment of visual positioning. Accordingly, in this device the neural network model is trained with panoramic photographs from the live-action map to obtain a positioning model for visual positioning. After a wide-angle photograph is obtained, it is randomly segmented to obtain the image set to be identified; inputting this image set into the positioning model for position recognition yields a plurality of candidate positions, from which the final position can be determined. Thus the device trains the neural network model on panoramic photographs from the live-action map to obtain the positioning model, and visual positioning can be completed with this model, solving the problem that training samples for visual positioning are difficult to collect.
In one embodiment of the present application, the position output module 103 specifically includes:
a position screening unit, configured to cluster the plurality of candidate positions and screen them using the clustering result;
a geometric-figure construction unit, configured to construct a geometric figure from the candidate positions retained by the screening;
and a final-position determination unit, configured to take the geometric center of the geometric figure as the final position.
In one embodiment of the present application, the position output module 103 further includes:
a positioning-error determination unit, configured to calculate the standard deviation of the plurality of candidate positions relative to the final position, and to take the standard deviation as the positioning error of the final position.
In one embodiment of the present application, the model training module includes:
a panoramic-photograph acquisition unit, configured to acquire a plurality of panoramic photographs from the live-action map and determine the geographic position of each panoramic photograph;
a de-warping transformation unit, configured to perform the de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio;
a geographic-labeling unit, configured to label each group of planar projection photographs with geographic labels according to their correspondence with the panoramic photographs, the geographic labels including a geographic location and a specific orientation;
a training-sample determination unit, configured to take the planar projection photographs labeled with geographic labels as training samples;
and a model training unit, configured to train the neural network model with the training samples and determine the trained neural network model as the positioning model.
In one embodiment of the present application, the de-warping transformation unit is specifically configured to segment each panoramic photograph according to different focal-length parameters in the de-warping transformation, to obtain several groups of planar projection photographs with different viewing angles.
In one embodiment of the present application, the de-warping transformation unit is specifically configured to segment each panoramic photograph with a number of segments whose coverage of the corresponding original image is greater than a specified percentage, to obtain several groups of planar projection photographs in which adjacent pictures have overlapping viewing angles.
In one embodiment of the present application, the model training module further includes:
a sample supplementing unit, configured to supplement the training samples with scene photographs obtained from the Internet or environment photographs collected from the positioning environment.
In one embodiment of the present application, the image-set acquisition module 101 is specifically configured to randomly segment the wide-angle photograph according to the number of segments, with coverage of the original image greater than a specified percentage, to obtain an image set matching the number of segments.
Corresponding to the above method embodiments, the embodiments of the present application further provide a visual positioning apparatus; the visual positioning apparatus described below and the visual positioning method described above can be referred to correspondingly.
Referring to FIG. 4, the visual positioning apparatus includes:
a memory 410, configured to store a computer program;
and a processor 420, configured to implement the steps of the visual positioning method provided by the above method embodiments when executing the computer program.
Specifically, referring to FIG. 5, which shows a specific structure of the visual positioning apparatus of this embodiment: the apparatus may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 420, a memory 410, and one or more computer applications 413 or data 412 stored therein. The memory 410 may be transient or persistent storage. The computer application may comprise one or more modules (not shown), each of which may include a series of instruction operations for the apparatus. Further, the central processor 420 may be configured to communicate with the memory 410 and to execute, on the visual positioning apparatus 400, the series of instruction operations stored in the memory 410.
The visual positioning apparatus 400 may also include one or more power supplies 430, one or more wired or wireless network interfaces 440, one or more input/output interfaces 450, and/or one or more operating systems 411.
The steps of the visual positioning method described above can be implemented by this structure of the visual positioning apparatus.
Corresponding to the above method embodiments, the embodiments of the present application further provide a readable storage medium; the readable storage medium described below and the visual positioning method described above can be referred to correspondingly.
A readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the visual positioning method provided by the above method embodiments.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to illustrate the interchangeability of hardware and software clearly, the composition and steps of the examples have been described above generally in terms of functionality. Whether these functions are executed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.

Claims (8)

1. A visual positioning method, comprising:
obtaining a wide-angle photograph, and randomly segmenting the wide-angle photograph to obtain an image set to be identified;
inputting the image set to be identified into a positioning model for position recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photographs from a live-action map;
and determining a final position using the plurality of candidate positions;
wherein the process of training the neural network model comprises:
acquiring a plurality of panoramic photographs from the live-action map, and determining the geographic position of each panoramic photograph;
performing a de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio;
labeling each group of planar projection photographs with geographic labels according to their correspondence with the panoramic photographs, the geographic labels including a geographic location and a specific orientation;
taking the planar projection photographs labeled with the geographic labels as training samples;
and training the neural network model with the training samples, and determining the trained neural network model as the positioning model;
wherein performing the de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio comprises:
segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation, to obtain several groups of planar projection photographs with different viewing angles;
and wherein segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation to obtain several groups of planar projection photographs with different viewing angles comprises:
segmenting each panoramic photograph with a number of segments whose coverage of the corresponding original image is greater than a specified percentage, to obtain several groups of planar projection photographs in which adjacent pictures have overlapping viewing angles.
2. The visual positioning method of claim 1, wherein determining a final position using a plurality of the candidate positions comprises:
clustering the candidate positions, and screening the candidate positions using the clustering result;
constructing a geometric figure from the candidate positions retained by the screening;
and taking the geometric center of the geometric figure as the final position.
3. The visual positioning method of claim 2, further comprising:
calculating the standard deviation of the plurality of candidate positions relative to the final position;
and taking the standard deviation as the positioning error of the final position.
4. The visual positioning method of claim 1, wherein the process of training the neural network model further comprises:
supplementing the training samples with scene photographs obtained from the Internet or environment photographs collected from the positioning environment.
5. The visual positioning method of claim 1, wherein randomly segmenting the wide-angle photograph to obtain the image set to be identified comprises:
randomly segmenting the wide-angle photograph according to a number of segments, with coverage of the original image greater than a specified percentage, to obtain an image set matching the number of segments.
6. A visual positioning device, comprising:
an image-set acquisition module, configured to obtain a wide-angle photograph and randomly segment it to obtain an image set to be identified;
a candidate-position acquisition module, configured to input the image set to be identified into a positioning model for position recognition to obtain a plurality of candidate positions, wherein the positioning model is a neural network model trained with panoramic photographs from a live-action map;
a position output module, configured to determine a final position using the plurality of candidate positions;
and a model training module that performs the process of training the neural network model, comprising:
a panoramic-photograph acquisition unit, configured to acquire a plurality of panoramic photographs from the live-action map and determine the geographic position of each panoramic photograph;
a de-warping transformation unit, configured to perform a de-warping transformation on the panoramic photographs to obtain several groups of planar projection photographs with the same aspect ratio;
a geographic-labeling unit, configured to label each group of planar projection photographs with geographic labels according to their correspondence with the panoramic photographs, the geographic labels including a geographic location and a specific orientation;
a training-sample determination unit, configured to take the planar projection photographs labeled with the geographic labels as training samples;
and a model training unit, configured to train the neural network model with the training samples and determine the trained neural network model as the positioning model;
wherein performing the de-warping transformation on the panoramic photographs in the de-warping transformation unit to obtain several groups of planar projection photographs with the same aspect ratio comprises:
segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation, to obtain several groups of planar projection photographs with different viewing angles;
and wherein segmenting each panoramic photograph according to different focal-length parameters in the de-warping transformation to obtain several groups of planar projection photographs with different viewing angles comprises:
segmenting each panoramic photograph with a number of segments whose coverage of the corresponding original image is greater than a specified percentage, to obtain several groups of planar projection photographs in which adjacent pictures have overlapping viewing angles.
7. A visual positioning apparatus, comprising:
a memory for storing a computer program;
and a processor for implementing the visual positioning method of any one of claims 1 to 5 when executing the computer program.
8. A readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the visual positioning method of any one of claims 1 to 5.
CN202080001067.0A 2020-05-26 2020-05-26 Visual positioning method, device, equipment and readable storage medium Active CN111758118B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/092284 WO2021237443A1 (en) 2020-05-26 2020-05-26 Visual positioning method and apparatus, device and readable storage medium

Publications (2)

Publication Number Publication Date
CN111758118A (en) 2020-10-09
CN111758118B (en) 2024-04-16

Family

Family ID: 72713357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080001067.0A Active CN111758118B (en) 2020-05-26 2020-05-26 Visual positioning method, device, equipment and readable storage medium

Country Status (3)

Country Link
JP (1) JP7446643B2 (en)
CN (1) CN111758118B (en)
WO (1) WO2021237443A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724284A (en) * 2021-09-03 2021-11-30 四川智胜慧旅科技有限公司 Position locking device, mountain type scenic spot search and rescue system and search and rescue method
CN117289626B (en) * 2023-11-27 2024-02-02 杭州维讯机器人科技有限公司 Virtual simulation method and system for industrialization

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308678B (en) * 2017-07-28 2023-10-27 株式会社理光 Method, device and equipment for repositioning by using panoramic image
CN108009588A (en) 2017-12-01 2018-05-08 深圳市智能现实科技有限公司 Localization method and device, mobile terminal
JP6676082B2 (en) 2018-01-18 2020-04-08 光禾感知科技股▲ふん▼有限公司 Indoor positioning method and system, and device for creating the indoor map
CN110298370A (en) * 2018-03-21 2019-10-01 北京猎户星空科技有限公司 Network model training method, device and object pose determine method, apparatus
US11195010B2 (en) * 2018-05-23 2021-12-07 Smoked Sp. Z O. O. Smoke detection system and method
KR102227583B1 (en) * 2018-08-03 2021-03-15 한국과학기술원 Method and apparatus for camera calibration based on deep learning
CN109829406A (en) * 2019-01-22 2019-05-31 上海城诗信息科技有限公司 A kind of interior space recognition methods
CN110636274A (en) * 2019-11-11 2019-12-31 成都极米科技股份有限公司 Ultrashort-focus picture screen alignment method and device, ultrashort-focus projector and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6671400B1 (en) * 2000-09-28 2003-12-30 Tateyama R & D Co., Ltd. Panoramic image navigation system using neural network for correction of image distortion
JP2005315746A (en) * 2004-04-28 2005-11-10 Mitsubishi Heavy Ind Ltd Own position identifying method, and device therefor
CN202818503U (en) * 2012-09-24 2013-03-20 天津市亚安科技股份有限公司 Multidirectional monitoring area early warning positioning automatic tracking and monitoring device
CN104200188A (en) * 2014-08-25 2014-12-10 北京慧眼智行科技有限公司 Method and system for rapidly positioning position detection patterns of QR code
CN109285178A (en) * 2018-10-25 2019-01-29 北京达佳互联信息技术有限公司 Image partition method, device and storage medium
CN110136136A (en) * 2019-05-27 2019-08-16 北京达佳互联信息技术有限公司 Scene Segmentation, device, computer equipment and storage medium
CN110298320A (en) * 2019-07-01 2019-10-01 北京百度网讯科技有限公司 A kind of vision positioning method, device and storage medium
CN110503037A (en) * 2019-08-22 2019-11-26 三星电子(中国)研发中心 A kind of method and system of the positioning object in region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种基于射线模型的图像定位系统 (An image positioning system based on a ray model); 邓磊; 陈宝华; 黄思远; 段岳圻; 周杰; 电子学报 (Acta Electronica Sinica), No. 01, pp. 4-10 *

Also Published As

Publication number Publication date
WO2021237443A1 (en) 2021-12-02
JP2023523364A (en) 2023-06-02
JP7446643B2 (en) 2024-03-11
CN111758118A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
CN109520500B (en) Accurate positioning and street view library acquisition method based on terminal shooting image matching
Baboud et al. Automatic photo-to-terrain alignment for the annotation of mountain pictures
US9530235B2 (en) Aligning panoramic imagery and aerial imagery
CN106529538A (en) Method and device for positioning aircraft
CN111046125A (en) Visual positioning method, system and computer readable storage medium
US8897539B2 (en) Using images to create measurements of structures through the videogrammetric process
EP3274964B1 (en) Automatic connection of images using visual features
Houshiar et al. A study of projections for key point based registration of panoramic terrestrial 3D laser scan
CN111758118B (en) Visual positioning method, device, equipment and readable storage medium
CN111383204A (en) Video image fusion method, fusion device, panoramic monitoring system and storage medium
Furnari et al. Affine covariant features for fisheye distortion local modeling
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
Castillo-Carrión et al. SIFT optimization and automation for matching images from multiple temporal sources
CN117218201A (en) Unmanned aerial vehicle image positioning precision improving method and system under GNSS refusing condition
CN111383286A (en) Positioning method, positioning device, electronic equipment and readable storage medium
CN114299230A (en) Data generation method and device, electronic equipment and storage medium
CN108447092B (en) Method and device for visually positioning marker
CN112270748A (en) Three-dimensional reconstruction method and device based on image
Božić-Štulić et al. Complete model for automatic object detection and localisation on aerial images using convolutional neural networks
CN107644394A (en) A kind of processing method and processing device of 3D rendering
Zhang et al. An UAV navigation aided with computer vision
Atik et al. An automatic image matching algorithm based on thin plate splines
CN113705304A (en) Image processing method and device, storage medium and computer equipment
Blazhko et al. Unmanned Aerial Vehicle (UAV): back to base without satellite navigation
CN109269477A (en) A kind of vision positioning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210923

Address after: P.O. Box 4519, 30 de Castro street, 1 Wickham Island, road town of Tortola, British Virgin Islands

Applicant after: Fengtuzhi Technology Holding Co.,Ltd.

Address before: Room 901, Cheung Sha Wan building, 909 Cheung Sha Wan Road, Lai Chi Kok, Hong Kong, China

Applicant before: Fengtu Technology Co.,Ltd.

GR01 Patent grant