CN115641499B - Photographing real-time positioning method, device and storage medium based on street view feature library - Google Patents

Photographing real-time positioning method, device and storage medium based on street view feature library

Info

Publication number
CN115641499B
CN115641499B (application CN202211277856.4A)
Authority
CN
China
Prior art keywords
street view
feature
features
index
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211277856.4A
Other languages
Chinese (zh)
Other versions
CN115641499A (en)
Inventor
李传广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Perception World Beijing Information Technology Co ltd
Original Assignee
Perception World Beijing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Perception World Beijing Information Technology Co ltd filed Critical Perception World Beijing Information Technology Co ltd
Priority to CN202211277856.4A priority Critical patent/CN115641499B/en
Publication of CN115641499A publication Critical patent/CN115641499A/en
Application granted granted Critical
Publication of CN115641499B publication Critical patent/CN115641499B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a photographing real-time positioning method, device and storage medium based on a street view feature library. The positioning method uses a deep local feature extraction network to extract dense deep local features from large-scale street view data, screens the features to obtain key features, builds a large-scale, efficient indexed feature vector library, and performs large-scale candidate street view image retrieval, improving image retrieval efficiency; the retrieved candidate street view images are then finely matched to obtain high-precision geolocation of the picture taken by the mobile phone.

Description

Photographing real-time positioning method, device and storage medium based on street view feature library
Technical Field
The invention relates to the technical field of image processing, and in particular to real-time geographic positioning of images.
Background
Image geolocation is the task of determining the geographic position at which an image or video was captured. The task applies to many real-world scenarios, so it has broad application prospects and attracts growing research interest. Current image geolocation relies mainly on large collections of ground-view images with GPS information as references to determine the position information of a query image. Existing methods mostly recast the geolocation problem as cross-view matching, image retrieval, classification and similar approaches. These methods are susceptible to interference and have low positioning accuracy.
Disclosure of Invention
To improve the geolocation accuracy of captured images, the invention uses a deep local feature extraction network to extract dense deep local features from large-scale street view data, screens the features to obtain key features, builds a large-scale, efficient indexed feature vector library, and performs large-scale candidate street view image retrieval, improving image retrieval efficiency; the retrieved candidate street view images are then finely matched to obtain high-precision geolocation of the picture taken by the mobile phone.
The embodiment of the invention provides a photographing real-time positioning method based on a street view feature library, which comprises the following steps:
S1, acquiring large-scale street view image data, extracting feature points to form first features, selecting key feature points based on the first features to form first key feature points, and constructing a large-scale street view image feature index library; the selecting of key feature points comprises: feature selection uses an attention-based classifier to compare the correlation among features; an attention module is attached to the output of ResNet50 conv4_x, and the K local features with the largest scores are selected according to the local feature expression score of each feature map, where K can be adjusted as required; the attention module uses a structure of two 1x1 convolution layers followed by a softplus activation function to obtain the score, where the softplus activation function is computed as:
Softplus(x) = log(1 + e^x)
S2, extracting features from the mobile phone picture to be positioned using the same feature extraction method as the image feature library, the extracted features being second features, and selecting key feature points to be matched as second key feature points;
S3, performing similarity measurement between the second key feature points and the first key feature points, and rapidly retrieving candidate street view images matching the second key feature points from the index library;
S4, matching the first features of the candidate street view images against the second features extracted from the mobile phone picture to obtain a high-precision matching street view image;
and S5, obtaining high-precision positioning of the mobile phone picture to be positioned from the high-precision matching street view image.
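For orientation only, steps S1-S5 can be read as the minimal Python sketch below. Every name in it (extract, select_keys, search_candidates, fine_match, geolocation) is an illustrative placeholder standing in for the components described above, not an API defined by the invention; step S1 is assumed to have been completed offline.

def locate_photo(phone_image, street_views, extract, select_keys, search_candidates, fine_match):
    """Sketch of steps S2-S5 against a prebuilt street view index (step S1).

    extract(img)            -> dense local features (the "first"/"second" features)
    select_keys(features)   -> attention-selected key features used for fast retrieval
    search_candidates(q, k) -> ids of the k best-matching street view images from the index
    fine_match(q, refs)     -> id of the best reference after dense fine matching
    """
    q_dense = extract(phone_image)                  # S2: dense features of the phone picture
    q_keys = select_keys(q_dense)                   # S2: second key feature points
    candidate_ids = search_candidates(q_keys, 10)   # S3: fast candidate retrieval via the index
    refs = {i: extract(street_views[i]) for i in candidate_ids}
    best = fine_match(q_dense, refs)                # S4: fine matching on all dense features
    return street_views[best].geolocation           # S5: geolocation of the best match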
In an alternative embodiment, the large-scale street view image data in S1 are virtually stitched, and the panoramic street view image is virtually restored.
In an alternative embodiment, after the panoramic street view image is restored, a pyramid index is built for the virtual panoramic street view image, an efficient similarity search tool is built, and the efficient index API is encapsulated.
In an alternative embodiment, rapidly retrieving, in S3, the candidate street view images matching the second key feature points from the index library comprises: calling the index API to quickly retrieve the matched candidate street view images.
In an alternative embodiment, the feature extraction method is an offline dense invariant feature extraction algorithm.
In an alternative embodiment, the method further comprises, in S1, vectorizing the first key feature points and the second key feature points.
In an alternative embodiment, the offline dense invariant feature extraction fine-tunes a resnet-50 network, and the network is trained with paired illumination-difference and shooting-angle-difference data, so that the feature extractor learns features invariant to illumination and geometric changes.
In an alternative embodiment, the original vector set is packaged into an index file and cached to support real-time query computation; when the index file is built for the first time, a training process and an adding process are performed; if new vectors need to be added to the index file, the add operation implements incremental indexing.
Another embodiment of the present invention also provides a computer readable storage medium storing computer program code, which when executed by a computer device, performs the method for positioning a photo based on the street view feature library according to any one of the above.
Still another embodiment of the present invention provides a computer apparatus including: a memory and a processor;
the memory is used for storing computer instructions;
and the processor executes the computer instructions stored in the memory to enable the computer equipment to execute any of the photographing real-time positioning methods based on the street view feature library.
The invention has the following technical effects:
1. The resolution of the mobile phone picture differs from that of the street view image, so matching the two is a heterogeneous image matching problem, which the invention handles.
2. The invention splits matching into fast matching and fine matching. Both use the same feature points, extracted by the same feature extractor; the earlier retrieval (fast matching) selects and screens key points from the extracted features in order to quickly obtain candidate street view images, while the later heterogeneous image matching uses all extracted feature points for fine matching, improving matching accuracy.
3. The dense feature extraction method fine-tunes a resnet-50 network; fine-tuning improves the discriminative power of the local representations, yielding deep features. The network is trained with paired data exhibiting large illumination differences (a large number of day/night image pairs) and large shooting-angle differences (landmark image pairs taken by different people), so that the feature extractor learns features invariant to illumination, geometry and other variations. Meanwhile, to cope with larger scale differences, a discrete scale pyramid is built on the street view image and features are extracted at each scale, producing features that describe regions of different sizes and receptive fields. The feature points are therefore sufficiently abstract while still achieving high positioning accuracy.
4. In the invention, key feature points are screened from both the large-scale street view image data and the mobile phone picture, improving the efficiency of preliminary screening. The feature index is built on the key points (the preliminary screening), so indexing is efficient; fine screening then matches on the first features, which contain more points, so the retrieved image can be located more accurately.
5. Feature selection uses an attention-based classifier to compare the correlation among features; the attention module is attached to the output of ResNet50 conv4_x, and the K local features with the largest scores are selected as key features according to the local feature expression score of each feature map, where K can be adjusted as required. The attention module obtains the score with two 1x1 convolution layers followed by a softplus activation function. The approach is simple, fast and accurate.
6. A pyramid index is built on the virtual panoramic street view image, which facilitates subsequent matching and indexing at different scales.
7. The invention adopts an efficient similarity search tool, improving retrieval efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a diagram of a mobile phone picture positioning technology route in an embodiment of the invention;
FIG. 2 is a schematic diagram of a dense feature extraction network in an embodiment of the invention;
FIG. 3 is a schematic diagram of an attention module structure in an embodiment of the present invention;
FIG. 4 is a schematic diagram of an index build data flow in an embodiment of the invention;
FIG. 5 is a flowchart of index API encapsulation in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In connection with the following figures:
FIG. 1 is a technical route of a photographing real-time positioning method based on a street view feature library in an embodiment of the invention;
FIG. 2 is a schematic diagram of a dense feature extraction network in an embodiment of the invention;
FIG. 3 is a schematic diagram of an attention module structure in an embodiment of the present invention;
FIG. 4 is a schematic diagram of an index build data flow in an embodiment of the invention;
FIG. 5 is a flowchart of an index API package in an embodiment of the invention.
An embodiment of the present invention provides a photographing real-time positioning method based on a street view feature library. As shown in FIG. 1, the method includes:
S1, acquiring large-scale street view image data, extracting feature points to form first features, selecting key feature points based on the first features to form first key feature points, and constructing a large-scale street view image feature index library;
In this step, large-scale street view image data are acquired and processed. Specifically, in S1 the large-scale street view image data comprise street view tile data; the acquired street view tile data are virtually stitched to restore the panoramic street view image.
After the panoramic street view image is restored, a pyramid index is constructed for the virtual panoramic street view image, which facilitates subsequent matching and indexing at different scales;
in addition, an efficient similarity search tool is built and the efficient index API is encapsulated.
Offline dense invariant feature extraction is then performed on the pyramid-indexed large-scale street view image data to extract its feature points.
Feature selection is applied to the extracted dense feature points to pick key feature points, which are vectorized to reduce the computation of subsequent indexing; an efficient index file is then built with the similarity search tool, forming a large-scale street view image feature index library that serves as the fast reference index library for the mobile phone picture to be positioned.
S2, extracting features from the mobile phone picture to be positioned using the same feature extraction method as the image feature library, the extracted features being second features, and selecting key feature points to be matched as second key feature points;
S3, performing similarity measurement between the second key feature points and the first key feature points, and rapidly retrieving candidate street view images matching the second key feature points from the index library, specifically by calling the index API to quickly retrieve the matched candidate street view images.
S4, further matching the first features of the candidate street view images against the second features extracted from the mobile phone picture to obtain a high-precision matching street view image;
and S5, obtaining high-precision positioning of the mobile phone picture to be positioned from the high-precision matching street view image.
The mobile phone picture to be positioned is finely matched against the retrieved candidate street view images: the dense features extracted from the street view images are further matched with the dense features extracted from the mobile phone picture to obtain the high-precision matching street view image, thereby achieving high-precision positioning of the mobile phone picture to be positioned.
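The text does not specify the fine-matching criterion beyond using all dense features; the sketch below assumes a mutual nearest-neighbour count over L2-normalised descriptors as one plausible way to score each candidate, and all function names are illustrative.

import numpy as np

def mutual_nn_matches(desc_q, desc_r):
    """Count mutual nearest-neighbour matches between two sets of L2-normalised
    dense descriptors (one descriptor per row). The mutual-NN rule is an assumed
    criterion; the text only states that all dense features take part in fine matching."""
    sim = desc_q @ desc_r.T                        # cosine similarity matrix
    nn_q = sim.argmax(axis=1)                      # best reference feature per query feature
    nn_r = sim.argmax(axis=0)                      # best query feature per reference feature
    mutual = nn_r[nn_q] == np.arange(len(desc_q))  # mutual agreement mask
    return int(mutual.sum())

def pick_best_candidate(desc_q, candidate_descs):
    """Return the id of the candidate street view image with the most mutual matches;
    its stored geographic position then gives the positioning result."""
    scores = {cid: mutual_nn_matches(desc_q, d) for cid, d in candidate_descs.items()}
    return max(scores, key=scores.get)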
In the method, dense feature extraction adopts a resnet-50 network for fine adjustment, and the network is trained by using paired illumination difference and shooting angle difference data, so that the feature extractor learns invariance features of illumination and geometric variation images.
The design of dense feature extraction networks is mainly based on the following points:
the previous layers of the convolutional network have very small receptive fields, and the obtained characteristics are local characteristics such as edges, corner points and the like of the relative bottom layer, but the positioning accuracy is higher; the deeper the network layer number is, the more abstract the extracted features are, the more global the information is, the more interference caused by the heterogeneous images can be resisted, but the poorer the positioning accuracy is. Therefore, in order to enable the feature points to have enough abstract and obtain higher positioning accuracy, the intensive feature extraction adopts a resnet-50 network for fine adjustment, and the discrimination capability of local expression is improved through fine adjustment, so that deep features are obtained.
Because the images shot by different persons and different angles on the same landmark have no fixed angle difference, the images shot by different persons have large difference.
Therefore, the data of large illumination difference (a large number of day and night image pairs) and large shooting angle difference (landmark image pairs shot by different people) which are already paired are utilized to train the network, so that the feature extractor can learn the invariance features of the illumination, geometry and other variable images.
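The text does not disclose the fine-tuning objective. Purely as one hedged possibility, a contrastive-style loss could pull the pooled descriptors of a matched day/night or cross-angle pair together while pushing non-matching pairs apart; everything in the sketch below (the loss form, the margin, the pooling) is an assumption.

import torch
import torch.nn.functional as F

def invariance_loss(backbone, img_a, img_b, margin=1.0):
    """Assumed contrastive objective for fine-tuning on paired images.
    img_a, img_b: batches of matched image pairs, shape (B, 3, H, W);
    backbone: the truncated resnet-50 returning (B, C, h, w) feature maps."""
    fa = F.normalize(backbone(img_a).mean(dim=(2, 3)), dim=1)  # pooled descriptor of view A
    fb = F.normalize(backbone(img_b).mean(dim=(2, 3)), dim=1)  # pooled descriptor of view B
    pos = (fa - fb).pow(2).sum(dim=1)                  # matched pairs should be close
    neg = (fa - fb.roll(1, dims=0)).pow(2).sum(dim=1)  # shifted rows act as non-matching pairs
    return (pos + F.relu(margin - neg)).mean()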
Meanwhile, to cope with larger scale differences, a discrete scale pyramid is constructed on the street view image, and feature extraction is carried out at each scale to obtain features describing regions of different sizes and different receptive fields. The scale range spans 0.25 to 2.0, using 8 different scales: 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75 and 2.0. As shown in FIG. 2:
The figure is divided into three parts: the left side shows the overall ResNet50 structure, the middle shows the concrete structure of each ResNet50 Stage, and the right side shows the concrete Bottleneck structure.
(1) The overall ResNet50 structure shows the backbone portion of ResNet, without the global average pooling layer and the fully connected layer of ResNet.
(2) ResNet is divided into 5 Stages. Stage0 has a relatively simple structure and preprocesses the input; the last 4 Stages are all composed of Bottleneck blocks and have relatively similar structures. Stage1 contains 3 Bottlenecks, and the remaining 3 Stages contain 4, 6 and 3 respectively.
(3) The network uses 2 types of Bottleneck, corresponding to 2 cases: one where the numbers of input and output channels are the same, and one where they differ.
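As a minimal sketch of the combination just described (ResNet-50 truncated at conv4_x plus the 8-level scale pyramid), the following PyTorch/torchvision code is one possible reading: the truncation point and the scales come from the text, while everything else (function names, the use of torchvision, bilinear resizing) is an assumption.

import torch
import torch.nn.functional as F
from torchvision.models import resnet50

SCALES = (0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)  # the 8 pyramid scales from the text

def make_backbone():
    """ResNet-50 kept only up to conv4_x (layer3), i.e. the backbone without the
    global average pooling layer and the fully connected layer."""
    m = resnet50(weights=None)  # weights would come from the fine-tuning described above
    return torch.nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool,
                               m.layer1, m.layer2, m.layer3)

@torch.no_grad()
def dense_features(backbone, image, scales=SCALES):
    """Extract dense local feature maps at every pyramid scale.
    image: tensor of shape (3, H, W); returns a list of (C, h, w) feature maps."""
    maps = []
    for s in scales:
        x = F.interpolate(image.unsqueeze(0), scale_factor=s,
                          mode="bilinear", align_corners=False)
        maps.append(backbone(x).squeeze(0))
    return maps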
In S1, the key feature points are selected as follows: feature selection uses an attention-based classifier to compare the correlation among features; an attention module is attached to the output of ResNet50 conv4_x, and the K local features with the largest scores are selected as key features according to the local feature expression score of each feature map, where K can be adjusted as required; the attention module obtains the score with two 1x1 convolution layers followed by a softplus activation function. The attention module structure is shown in FIG. 3, in which the softplus activation function is computed as:
Softplus(x) = log(1 + e^x)
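A minimal PyTorch sketch of this selection step follows. The two 1x1 convolutions, the softplus score and the top-K selection come from the text; the hidden width, the ReLU between the two convolutions and the channel count of 1024 (the conv4_x output of ResNet-50) are assumptions.

import torch
import torch.nn as nn

class AttentionKeypointSelector(nn.Module):
    """Scores each spatial position of the conv4_x feature map and keeps the top K."""
    def __init__(self, in_channels=1024, hidden=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),          # assumed non-linearity between the two 1x1 convs
            nn.Conv2d(hidden, 1, kernel_size=1),
            nn.Softplus(),                  # Softplus(x) = log(1 + e^x)
        )

    def forward(self, feat_map, k=1000):
        """feat_map: (C, h, w) conv4_x features; returns the k highest-scoring
        local descriptors and their attention scores."""
        scores = self.score(feat_map.unsqueeze(0)).squeeze(0).squeeze(0)  # (h, w)
        desc = feat_map.flatten(1).t()                                    # (h*w, C)
        flat = scores.flatten()
        top = torch.topk(flat, min(k, flat.numel())).indices
        return desc[top], flat[top]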
In the step of building an efficient similarity search tool and encapsulating the efficient index API, the efficient index build data flow is shown in FIG. 4, and the index API encapsulation flow is shown in FIG. 5:
before similarity searching of query vectors using similarity searching, the original vector set needs to be packaged into an index file (index) and cached in memory to provide real-time query computation. Two processes are required to be trained and added when constructing an index file for the first time. There may be an add operation to implement the incremental index if a new vector needs to be added to the index file.
Because the mobile phone picture and the street view image are heterogeneous images, a heterogeneous image matching algorithm is used as the image matching algorithm.
The matching method of the invention is divided into fast matching and fine matching. Both use the same feature points, extracted by the same feature extractor; the earlier retrieval (fast matching) selects and screens key points from the extracted features to quickly obtain candidate street view images, while the later heterogeneous image matching uses all extracted feature points for fine matching, improving matching accuracy.
In addition, the resolution of the mobile phone picture differs from that of the street view image; the pyramid is built so that matching can be performed at different levels, yielding higher accuracy for heterogeneous image matching.
The invention provides a photographing real-time positioning method based on a street view feature library, which comprises the following steps:
(1) Obtain and process large-scale street view image data: virtually stitch the acquired large-scale street view tile data to restore a panoramic street view image, then build a pyramid index on the virtual panoramic street view image to facilitate subsequent matching and indexing at different scales;
(2) Build an efficient similarity search tool to improve retrieval efficiency, and encapsulate the efficient index API;
(3) Perform offline dense invariant feature extraction on the pyramid-indexed large-scale street view image data to extract its feature points;
(4) Apply feature selection to the extracted dense feature points to pick key feature points, vectorize the key features to reduce the computation of subsequent indexing, and then build an efficient index file with the similarity search tool, forming a large-scale street view image feature index library that serves as the fast reference index library for the mobile phone picture to be positioned;
(5) Extract invariant image features from the mobile phone picture to be positioned with the same feature extraction algorithm as the image feature library, perform similarity measurement against the large-scale street view image feature library, and call the index API to quickly retrieve matching candidate street view images (a retrieval sketch follows this list);
(6) Finely match the mobile phone picture to be positioned against the retrieved candidate street view images, further matching the extracted street view dense features with the dense features extracted from the mobile phone picture to obtain a high-precision matching street view image, thereby achieving high-precision positioning of the mobile phone picture to be positioned.
High-precision geolocation of the picture taken by the mobile phone is thus obtained through the above method.
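The retrieval sketch referenced in step (5): assuming the Faiss-style index sketched earlier and an id_map recording which street view image each indexed key vector came from (a bookkeeping structure not described in the text), candidate images can be picked by a simple voting scheme.

import collections
import numpy as np

def retrieve_candidates(index, query_key_vectors, id_map, per_feature_k=5, top_images=10):
    """Each query key feature fetches its nearest indexed key features, and votes go to
    the street view images owning them; the most-voted images are returned as candidates.
    The voting scheme itself is an assumption about how 'matched candidates' are formed."""
    xq = np.ascontiguousarray(query_key_vectors, dtype=np.float32)
    _, nn_ids = index.search(xq, per_feature_k)           # nearest indexed vectors per feature
    votes = collections.Counter(id_map[i] for i in nn_ids.ravel() if i != -1)
    return [img_id for img_id, _ in votes.most_common(top_images)]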
On the other hand, the positioning method provided by the embodiment of the application can be deployed on computer equipment.
The computer device may include: an input unit, a processor unit, a communication unit, a memory unit, an output unit and a power supply.
an input unit, for inputting or loading image data;
a processor unit, for processing and computing image data;
a communication unit, for receiving and transmitting data;
a memory unit, for storing computer instructions and a database;
and an output unit, for outputting the processing result.
The computer device provided by the embodiment of the application can be used to execute the photographing real-time positioning method based on the street view feature library of the previous embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. Usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk), among others.
It will be appreciated that in addition to the foregoing, some conventional structures and conventional methods are included, and as such are well known, they will not be described in detail. But this does not mean that the structures and methods do not exist in the present invention.
It will be appreciated by those skilled in the art that while a number of exemplary embodiments of the invention have been shown and described herein in detail, many other variations or modifications which are in accordance with the principles of the invention may be directly ascertained or inferred from the present disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.

Claims (8)

1. A photographing real-time positioning method based on a street view feature library is characterized by comprising the following steps:
S1, obtaining large-scale street view image data, wherein the large-scale street view image data comprise street view tile data; virtually stitching the obtained large-scale street view tile data, virtually restoring a panoramic street view image, constructing a pyramid index for the virtual panoramic street view image, building an efficient similarity search tool, and encapsulating an API (application program interface) for the efficient index; extracting feature points from the acquired large-scale street view image data to form first features, selecting key feature points based on the first features to form first key feature points, and constructing a large-scale street view image feature index library; the selecting of key feature points comprises: feature selection uses an attention-based classifier to compare the correlation among features; an attention module is attached to the output of ResNet50 conv4_x, and the K local features with the largest scores are selected according to the local feature expression score of each feature map, where K can be adjusted as required; the attention module uses a structure of two 1x1 convolution layers followed by a softplus activation function to obtain the score, where the softplus activation function is computed as:
Softplus(x) = log(1 + e^x)
S2, extracting features from the mobile phone picture to be positioned using the same feature extraction method as the image feature library, the extracted features being second features, and selecting key feature points to be matched as second key feature points;
S3, performing similarity measurement between the second key feature points and the first key feature points, and rapidly retrieving candidate street view images matching the second key feature points from the index library;
S4, matching the first features of the candidate street view images against the second features extracted from the mobile phone picture to obtain a high-precision matching street view image;
and S5, obtaining high-precision positioning of the mobile phone picture to be positioned from the high-precision matching street view image.
2. The positioning method of claim 1, wherein the quickly retrieving, in S3, the candidate street view image matching the second key feature point according to the index base comprises: and calling an index API to quickly retrieve the matched candidate street view images.
3. The positioning method of claim 1 wherein the feature extraction method is an offline dense invariant feature extraction algorithm.
4. The positioning method of claim 1, further comprising vectorizing the first key feature point and the second key feature point in S1.
5. The positioning method of claim 3, wherein the offline dense invariant feature extraction fine-tunes a resnet-50 network, and the network is trained using paired illumination-difference and shooting-angle-difference data, so that the feature extractor learns features invariant to illumination and geometric changes.
6. The positioning method as set forth in claim 4, wherein the original vector set is packaged into an index file and cached to provide real-time query computation; when the index file is built for the first time, two processes of training and adding are carried out; if there is a new vector to be added to the index file, the add operation implements an incremental index.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores computer program code which, when executed by a computer device, performs a method of real-time positioning of photographs based on a street view feature library according to any of the preceding claims 1-6.
8. A computer device, comprising: a memory and a processor;
the memory is used for storing computer instructions;
the processor executes the computer instructions stored in the memory to cause the computer device to perform the method for real-time positioning of photographs based on a street view feature library according to any one of claims 1 to 6.
CN202211277856.4A 2022-10-19 2022-10-19 Photographing real-time positioning method, device and storage medium based on street view feature library Active CN115641499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211277856.4A CN115641499B (en) 2022-10-19 2022-10-19 Photographing real-time positioning method, device and storage medium based on street view feature library

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211277856.4A CN115641499B (en) 2022-10-19 2022-10-19 Photographing real-time positioning method, device and storage medium based on street view feature library

Publications (2)

Publication Number Publication Date
CN115641499A CN115641499A (en) 2023-01-24
CN115641499B (en) 2023-07-18

Family

ID=84944977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211277856.4A Active CN115641499B (en) 2022-10-19 2022-10-19 Photographing real-time positioning method, device and storage medium based on street view feature library

Country Status (1)

Country Link
CN (1) CN115641499B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763481A (en) * 2018-05-29 2018-11-06 清华大学深圳研究生院 A kind of picture geographic positioning and system based on extensive streetscape data
CN114743139A (en) * 2022-04-01 2022-07-12 北京易航远智科技有限公司 Video scene retrieval method and device, electronic equipment and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402448B2 (en) * 2017-06-28 2019-09-03 Google Llc Image retrieval with deep local feature descriptors and attention-based keypoint descriptors
US11361470B2 (en) * 2019-05-09 2022-06-14 Sri International Semantically-aware image-based visual localization
CN111666434B (en) * 2020-05-26 2021-11-02 武汉大学 Streetscape picture retrieval method based on depth global features
CN114860974A (en) * 2021-02-03 2022-08-05 中国人民解放军战略支援部队信息工程大学 Remote sensing image retrieval positioning method
CN114241464A (en) * 2021-11-30 2022-03-25 武汉大学 Cross-view image real-time matching geographic positioning method and system based on deep learning
CN114972506B (en) * 2022-05-05 2024-04-30 武汉大学 Image positioning method based on deep learning and street view image

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763481A (en) * 2018-05-29 2018-11-06 清华大学深圳研究生院 A kind of picture geographic positioning and system based on extensive streetscape data
CN114743139A (en) * 2022-04-01 2022-07-12 北京易航远智科技有限公司 Video scene retrieval method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN115641499A (en) 2023-01-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant