CN110443263B - Closed loop detection method, device, equipment and computer readable medium - Google Patents

Closed loop detection method, device, equipment and computer readable medium

Info

Publication number
CN110443263B
CN110443263B (application CN201810409332.3A)
Authority
CN
China
Prior art keywords
image
feature
feature vector
vector
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810409332.3A
Other languages
Chinese (zh)
Other versions
CN110443263A (en)
Inventor
门春雷
刘艳光
巴航
张文凯
徐进
韩微
郝尚荣
郑行
陈明轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201810409332.3A
Publication of CN110443263A
Application granted
Publication of CN110443263B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194 References adjustable by an adaptive method, e.g. learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a closed loop detection method and a closed loop detection device. One embodiment of the method comprises: extracting features of a target image to obtain feature points and a first feature vector of the target image; selecting image blocks in the target image according to the feature points; inputting an image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the correspondence between image blocks and feature vectors; fusing the first feature vector and the second feature vector to obtain a fusion vector; and determining whether a closed loop occurs according to the similarity between the fusion vector and historical fusion vectors and a preset threshold. This embodiment reduces calculation errors.

Description

Closed loop detection method, device, equipment and computer readable medium
Technical Field
The embodiment of the application relates to the technical field of computer vision, in particular to a closed loop detection method and device.
Background
SLAM (Simultaneous Localization and Mapping) is a fundamental problem and research hotspot in the field of mobile robot navigation, and SLAM capability is considered a precondition for a robot to achieve autonomous navigation. While moving, the robot needs to judge whether its current position lies in an environment area it has already visited, and to act on that judgment; this is the closed-loop detection problem.
Disclosure of Invention
The embodiment of the application provides a closed loop detection method and device.
In a first aspect, an embodiment of the present application provides a closed-loop detection method, including: extracting features of a target image to obtain feature points and a first feature vector of the target image; selecting image blocks in the target image according to the feature points; inputting the image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the correspondence between image blocks and feature vectors; fusing the first feature vector and the second feature vector to obtain a fusion vector; and determining whether a closed loop occurs according to the similarity between the fusion vector and historical fusion vectors and a preset threshold.
In some embodiments, the extracting features of the target image includes: detecting the feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image; and extracting the features of the target image by using a second feature extraction algorithm to obtain the first feature vector.
In some embodiments, the selecting an image block in the target image according to the detected feature points includes: determining the target image as a feature point selection area; and performing the following selection steps based on the feature point selection area: selecting a feature point in the feature point selection area; determining a first coverage area with the selected feature point as the center and a first preset length as the radius; determining an image block comprising the feature points in the first coverage area; and determining whether the number of times feature points have been selected is equal to a preset number of times.
In some embodiments, the determining the image block including the feature points in the first coverage area includes: determining the minimum circumscribed circle of the feature points in the first coverage area; and determining the image block with a preset size, centered at the center of the minimum circumscribed circle.
In some embodiments, the selecting an image block in the target image according to the detected feature points includes: in response to determining that the number of selections is not equal to the preset number of times, determining a second coverage area with the selected feature point as the center and a second preset length as the radius; and determining the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continuing to perform the selection steps.
In some embodiments, the fusing the first feature vector and the second feature vector to obtain a fused vector includes: performing hash mapping on the first feature vector to obtain a third feature vector; and fusing the third feature vector and the second feature vector to obtain the fused vector.
In some embodiments, the deep learning model is trained by the following steps: acquiring a training sample set, wherein the training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold; and training to obtain the deep learning model by taking the first image, the second image and the third image of the training sample in the training sample set as input and taking the feature vector corresponding to the input image as output.
In a second aspect, an embodiment of the present application provides a closed loop detection apparatus, including: a feature extraction unit configured to extract features of a target image to obtain feature points and a first feature vector of the target image; an image block selection unit configured to select image blocks in the target image according to the feature points; a feature vector determination unit configured to input the image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the correspondence between image blocks and feature vectors; a feature vector fusion unit configured to fuse the first feature vector and the second feature vector to obtain a fusion vector; and a closed loop detection unit configured to determine whether a closed loop occurs according to the similarity between the fusion vector and historical fusion vectors and a preset threshold.
In some embodiments, the feature extraction unit is further configured to: detect the feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image; and extract the features of the target image by using a second feature extraction algorithm to obtain the first feature vector.
In some embodiments, the image block selection unit includes: a first region determination module configured to determine the target image as a feature point selection area; and an image block selection module configured to perform the following selection steps based on the feature point selection area: selecting a feature point in the feature point selection area; determining a first coverage area with the selected feature point as the center and a first preset length as the radius; determining an image block comprising the feature points in the first coverage area; and determining whether the number of times feature points have been selected is equal to a preset number of times.
In some embodiments, the image block selection module is further configured to: determine the minimum circumscribed circle of the feature points in the first coverage area; and determine the image block with a preset size, centered at the center of the minimum circumscribed circle.
In some embodiments, the image block selection module is further configured to: in response to determining that the number of selections is not equal to the preset number of times, determine a second coverage area with the selected feature point as the center and a second preset length as the radius; and determine the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continue to perform the selection steps.
In some embodiments, the feature vector fusion unit is further configured to: performing hash mapping on the first feature vector to obtain a third feature vector; and fusing the third feature vector and the second feature vector to obtain the fused vector.
In some embodiments, the apparatus further comprises a model training unit configured to: acquiring a training sample set, wherein the training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold; and training to obtain the deep learning model by taking the first image, the second image and the third image of the training sample in the training sample set as input and taking the feature vector corresponding to the input image as output.
In a third aspect, an embodiment of the present application provides an apparatus, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method as described in any one of the embodiments of the first aspect.
The closed loop detection method and apparatus provided by the embodiments of the application first extract features of the target image to obtain feature points and a first feature vector of the target image, then select image blocks in the target image according to the feature points, input the obtained image blocks into a pre-established deep learning model to obtain second feature vectors of the image blocks, fuse the first feature vector and the second feature vectors to obtain a fusion vector, and finally determine whether a closed loop occurs according to the similarity between the fusion vector and historical fusion vectors and a preset threshold. The method and apparatus of this embodiment fuse the feature vector obtained by deep learning with the feature vector obtained by feature extraction, which reduces calculation errors.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a closed loop detection method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a closed loop detection method according to the present application;
FIG. 4a is a schematic diagram of training samples for training a deep learning model in a closed-loop detection method according to the present application;
FIG. 4b is a schematic diagram of a triplet convolutional neural network in a closed-loop detection method according to the present application;
FIG. 5 is a flow chart of determining image blocks in a closed-loop detection method according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of a closed loop detection apparatus according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the apparatus of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the closed-loop detection method or closed-loop detection apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a robot 101, a network 102, and a server 103. The network 102 is used to provide a medium for a communication link between the robot 101 and the server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The robot 101 interacts with a server 103 via a network 102 to receive or send messages or the like. Various sensors, such as a laser radar sensor, a millimeter wave radar sensor, a gyroscope, an accelerometer, etc., may be mounted on the robot 101 to collect motion information of the robot 101. The robot 101 may be an intelligent mobile robot with a positioning function, which can collect images, and may have a positioning application, an instant messaging tool, and the like installed thereon.
The server 103 may be a server that provides various services, such as a background server that provides support for images captured by the robot 101. The backend server may perform processing such as analysis on the received data such as the image, and feed back a processing result (e.g., a closed-loop detection result) to the robot 101.
The closed-loop detection method provided in the embodiment of the present application may be executed by the robot 101 or the server 103, and accordingly, the closed-loop detection apparatus may be provided in the robot 101 or the server 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.
It should be understood that the number of robots, networks, and servers in fig. 1 is merely illustrative. There may be any number of robots, networks, and servers, as desired for the implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a closed loop detection method according to the present application is shown. The closed loop detection method of the embodiment comprises the following steps:
Step 201, extracting the features of the target image to obtain feature points and a first feature vector of the target image.
In this embodiment, the robot may collect an image of a driving environment during driving, and then use the collected image as a target image. An executing body (such as a robot or a server in fig. 1) of the closed-loop detection method may extract features of the target image, resulting in feature points and a first feature vector of the target image. Here, the feature point is a point where the image gray value changes drastically or a point where the curvature is large on the edge of the image, and the first feature vector is a binary description of the feature point.
It should be noted that, when the execution subject is a server, it may acquire the target image from the robot in a wired or wireless manner. The wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
The executing subject may extract features of the target image using various feature extraction algorithms, for example the SIFT algorithm (Scale-Invariant Feature Transform), the ORB algorithm ("ORB: An efficient alternative to SIFT or SURF", proposed by Rublee et al. at ICCV 2011), or the FREAK algorithm ("FREAK: Fast Retina Keypoint", proposed at CVPR 2012). It is understood that the executing subject may employ various algorithms to extract the features of the target image, and that the feature points and the first feature vector may be obtained by different feature extraction algorithms.
In some optional implementations of this embodiment, the step 201 may be implemented by the following steps not shown in fig. 2: extracting the feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image; and extracting the features of the target image by using a second feature extraction algorithm to obtain a first feature vector.
In this implementation, the first feature extraction algorithm may be a FREAK algorithm, and features extracted by the FREAK algorithm have scale invariance and rotation invariance, and are good in robustness. The second feature extraction algorithm may be an ORB algorithm.
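For illustration only, the following is a minimal sketch of step 201, assuming OpenCV (opencv-python) is the library used. Because FREAK in OpenCV is a descriptor extractor rather than a detector, the sketch detects keypoints with ORB and shows the FREAK descriptor call as a commented alternative; this pairing is an assumption, not a statement of the exact algorithms required by this embodiment.

    import cv2

    def extract_features(target_image):
        # Convert to grayscale before detection and description.
        gray = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
        orb = cv2.ORB_create(nfeatures=500)
        # Feature points: locations where the gray value changes sharply.
        keypoints = orb.detect(gray, None)
        # First feature vector: a binary descriptor for each feature point.
        keypoints, first_feature_vectors = orb.compute(gray, keypoints)
        # FREAK descriptors would need opencv-contrib-python, e.g.:
        #   freak = cv2.xfeatures2d.FREAK_create()
        #   keypoints, first_feature_vectors = freak.compute(gray, keypoints)
        return keypoints, first_feature_vectors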
Step 202, selecting image blocks in the target image according to the feature points.
After determining the feature points of the target image, the executing entity may select image blocks in the target image according to the feature points. In this embodiment, the number and size of the image blocks may be set in advance. The size of each image block may be the same or different. For example, the number of image blocks may be set to 10 blocks, and the size may be 60 × 60. It is understood that the selected image block includes the above-described feature points.
Step 203, inputting the image block into a pre-established deep learning model to obtain a second feature vector of the image block.
After obtaining the image block, the execution subject may input the image block into a pre-established deep learning model to obtain a second feature vector of the image block. The deep learning model is used to represent the correspondence between the image blocks and the feature vectors, and may be, for example, a convolutional neural network.
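As a small sketch of step 203, assuming the deep learning model is available as a callable PyTorch module that maps an image block to its embedding (one possible form of such a module is sketched later in the training discussion), the inference might look as follows:

    import numpy as np
    import torch

    def patch_second_feature(model, image_block):
        # image_block: an HxWx3 uint8 array cut from the target image.
        x = torch.from_numpy(image_block.astype(np.float32) / 255.0)
        x = x.permute(2, 0, 1).unsqueeze(0)   # reshape to 1x3xHxW
        with torch.no_grad():
            second_feature_vector = model(x).squeeze(0).numpy()
        return second_feature_vector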
Step 204, fusing the first feature vector and the second feature vector to obtain a fused vector.
After the second feature vector of the image block is obtained, the execution subject may fuse the first feature vector of the target image and the second feature vector of the image block to obtain a fusion vector. The execution subject may derive the fusion vector by adding or multiplying the first feature vector and the second feature vector, or by other feasible means.
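As one possible reading of step 204 (the description above leaves the fusion operation open), the sketch below fuses the two vectors by concatenation and shows element-wise addition and multiplication as the alternatives mentioned; NumPy is assumed.

    import numpy as np

    def fuse_vectors(first_vec, second_vec, mode="concat"):
        # Fuse the image-level feature vector with the patch-level deep feature.
        a = np.asarray(first_vec, dtype=np.float32).ravel()
        b = np.asarray(second_vec, dtype=np.float32).ravel()
        if mode == "concat":
            return np.concatenate([a, b])
        if mode == "add":        # requires equal dimensions
            return a + b
        if mode == "multiply":   # element-wise, requires equal dimensions
            return a * b
        raise ValueError("unknown fusion mode: %s" % mode)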
Step 205, determining whether a closed loop occurs according to the distance between the fusion vector and the historical fusion vector and a preset threshold.
After the execution body obtains the fusion vector, it may calculate the distance between the fusion vector and the historical fusion vectors, and then determine whether a closed loop occurs in combination with a preset threshold. Specifically, when the distance value is smaller than the preset threshold, the similarity between the target image and the historical image is high, and it may be determined that a closed loop occurs, that is, the robot is in an environment area that it has already visited. When the distance value is greater than the preset threshold, the similarity between the target image and the historical image is low, and it may be determined that no closed loop has occurred. Here, a historical image is an image acquired by the robot before the target image, and accordingly a historical fusion vector is the fusion vector of a historical image.
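A sketch of the decision in step 205, assuming Euclidean distance over the fusion vectors (the text only requires a distance compared against a preset threshold):

    import numpy as np

    def detect_closed_loop(fusion_vector, history_vectors, threshold):
        # Compare the current fusion vector with every historical fusion vector.
        query = np.asarray(fusion_vector, dtype=np.float32)
        for idx, hist in enumerate(history_vectors):
            distance = np.linalg.norm(query - np.asarray(hist, dtype=np.float32))
            if distance < threshold:
                # High similarity: the robot is in an already-visited area.
                return True, idx
        return False, None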
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the closed-loop detection method according to this embodiment. In the application scenario of fig. 3, the robot 301 moves in the direction of the dashed line and captures an image including a wardrobe 302 as it passes the marked position 303 when entering the left room. After traveling around the left room and returning to the marked position 303, it captures an image including the wardrobe 302 again. By processing the re-captured image including the wardrobe 302 and determining that the distance between its fusion vector and the fusion vector of the image including the wardrobe 302 captured the first time is small, that is, that the similarity between the two images is high, the robot 301 may determine that the marked position 303 is a position it has traveled through before, and that a closed loop has occurred.
The closed-loop detection method provided by the embodiment of the application includes the steps of firstly extracting features of a target image to obtain feature points and a first feature vector of the target image, then selecting an image block from the target image according to the feature points, inputting the obtained image block into a pre-established deep learning model to obtain a second feature vector of the image block, fusing the first feature vector and the second feature vector of the image block to obtain a fusion vector, and finally determining whether closed-loop occurs or not according to the similarity between the fusion vector and a historical fusion vector and a preset threshold. According to the method, the feature vector obtained by deep learning and the feature vector obtained by feature extraction are fused, so that the calculation error is reduced.
In some optional implementations of this embodiment, the step 204 may further include the following steps not shown in fig. 2: performing hash mapping on the first feature vector to obtain a third feature vector; and fusing the third feature vector and the second feature vector to obtain the fused vector.
After the first feature vector is obtained, hash mapping may be performed on it to obtain a third feature vector, and the third feature vector and the second feature vector are then fused to obtain the fusion vector. Hash mapping here refers to compressing the original high-dimensional feature vector (the first feature vector) into a lower-dimensional feature vector (the third feature vector) without losing the first feature vector's ability to express the features. In this way, the fusion vector obtained by fusion improves the distinguishability of the features.
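The implementation does not fix a particular hash function; as an assumed example, the sketch below compresses the first feature vector with a random sign projection to obtain a lower-dimensional third feature vector.

    import numpy as np

    def hash_map(first_feature_vector, out_dim=128, seed=0):
        # Project the high-dimensional vector onto out_dim random directions
        # and keep only the signs, giving a compact binary code.
        rng = np.random.default_rng(seed)
        vec = np.asarray(first_feature_vector, dtype=np.float32).ravel()
        projection = rng.standard_normal((out_dim, vec.size)).astype(np.float32)
        return (projection @ vec > 0).astype(np.uint8)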
In some optional implementations of the present embodiment, the deep learning model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold; and taking the first image, the second image and the third image of the training sample in the training sample set as input, taking the feature vector corresponding to the input image as output, and training to obtain the deep learning model.
In this implementation, the executing subject may first obtain a training sample set, where each training sample includes a first image, a second image and a third image, together with the feature vector of each image. The distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, that is, the similarity between the first image and the second image is high; the distance between the feature vector of the third image and the feature vector of the first image and/or the feature vector of the second image is larger than a second preset threshold, that is, the similarity between the third image and the first image or the second image is low.
The first image and the second image may be images of the same scene taken from different perspectives. The relationship among the first image, the second image and the third image may be represented by fig. 4a. In fig. 4a, L1 and L2 are different scenes, and C1, C2 and C3 are cameras at different positions; C1, C2 and C3 capture images X1, X2 and X3 respectively. C1 and C2 shoot scene L1 from different perspectives, while C3 shoots scene L2.
In practical applications, the images can be distinguished by labeling, and then the labeled images are stored in a database. Images of the same scene from different perspectives can be labeled with the same label. In this way, two images can be selected as the first image and the second image from among the plurality of images to which the same label is applied, and one image can be selected as the third image from among the plurality of images to which different labels are applied. That is, the training sample includes two images labeled with the same label and one image labeled with a different label.
The deep learning model can be obtained by training a triplet convolutional neural network (Tri-CNN). The triplet convolutional neural network may include an input layer, a partition layer, convolutional layers, pooling layers, fully-connected layers and an output layer, and the deep learning model may include all layers of the triplet convolutional neural network other than the partition layer.
When the triplet convolutional neural network is trained, the first image, the second image and the third image form a triplet, and the triplet is input into the network as a whole. After the partition layer, the three images are fed into three sub-networks, respectively. As shown in fig. 4b, the three sub-networks share the weights W and biases b across all convolutional, pooling and fully-connected layers. After the three sub-networks output the feature vectors of their input images, the similarity between the first image and the second image and the similarity between the first image and the third image in the training sample can be measured by cosine similarity. A cost function is then determined according to these similarities, and the parameters of each sub-network are adjusted according to the cost function and preset threshold parameters. After training of the triplet convolutional neural network is completed, any one of the sub-networks can be used as the deep learning model.
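A compact sketch of one shared-weight branch of the Tri-CNN and a cosine-similarity-based triplet cost is given below, written with PyTorch as an assumed framework; the layer sizes, the 60 x 60 input assumption and the margin form of the cost are illustrative choices, not values taken from this embodiment.

    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingBranch(nn.Module):
        # One branch of the triplet network; the three branches share the
        # weights W and biases b simply by reusing this module for the
        # first, second and third image.
        def __init__(self, embed_dim=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.fc = nn.Linear(64 * 15 * 15, embed_dim)  # for 60x60 inputs

        def forward(self, x):
            return self.fc(self.features(x).flatten(1))

    def triplet_cosine_cost(f1, f2, f3, margin=0.2):
        # Pull the first/second (same scene) embeddings together and push
        # the third away, using cosine similarity as described above.
        pos = F.cosine_similarity(f1, f2)
        neg = F.cosine_similarity(f1, f3)
        return F.relu(neg - pos + margin).mean()

In training, the same branch is applied to the three images of each triplet and the cost is back-propagated through the shared weights; after training, a single branch can serve as the deep learning model.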
In the training process, images in the database that have not been used to train the triplet convolutional neural network can be selected periodically to test the trained network, and the parameters of each sub-network are then adjusted according to the cost function.
In the training process, the executing subject may also generate new training samples from the training samples input to it, which may specifically be implemented by the following steps: extracting features from each image of a training sample with the deep learning model obtained by training so far to obtain a feature vector for each image; for an image x, selecting, from all images with the same label as x, the image whose feature vector is farthest from the feature vector of x as x+; and selecting, from all images with a label different from x, the image whose feature vector is closest to the feature vector of x as x-, so that (x, x+, x-) forms a triplet. Among the distances between the feature vector of x and the feature vectors of the other images with the same label, the distance between the feature vectors of x and x+ is the maximum; among the distances between the feature vector of x and the feature vectors of images with different labels, the distance between the feature vectors of x and x- is the minimum. Even so, the distance between the feature vectors of x and x+ is much smaller than the distance between the feature vectors of x and x-.
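The hard-triplet generation just described might look as follows, assuming the current model's embeddings and the image labels are held in NumPy arrays:

    import numpy as np

    def mine_hard_triplet(anchor_idx, embeddings, labels):
        # embeddings: (N, D) feature vectors from the current model; labels: (N,).
        anchor = embeddings[anchor_idx]
        dists = np.linalg.norm(embeddings - anchor, axis=1)
        same = labels == labels[anchor_idx]
        same[anchor_idx] = False                              # exclude x itself
        diff = labels != labels[anchor_idx]
        pos_idx = np.argmax(np.where(same, dists, -np.inf))   # farthest same-label image: x+
        neg_idx = np.argmin(np.where(diff, dists, np.inf))    # closest different-label image: x-
        return anchor_idx, int(pos_idx), int(neg_idx)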
With continued reference to fig. 5, a flow 500 for determining an image block in the closed-loop detection method of the present application is shown. As shown in fig. 5, the present embodiment may determine the image block by:
step 501, determining a target image as a feature point selection area.
After the feature points of the target image are extracted, the target image may first be taken as the feature point selection area. It is to be understood that the feature point selection area includes the feature points. Then, based on the feature point selection area, the following selection steps are performed:
step 502, selecting a feature point in the feature point selection area.
After the feature point selection area is determined, a feature point may be first selected in the feature point selection area. In this embodiment, when selecting a feature point, one feature point may be randomly selected from a plurality of feature points in the feature point selection region.
Step 503, determining a first coverage area by taking the selected feature point as a circle center and the first preset length as a radius.
After the feature points are selected, the selected feature points are used as circle centers, and the first preset length is used as a radius, so that a circle is obtained. The area covered by the circle in the target image is the first coverage area. In this embodiment, after the feature points of the target image are obtained, the feature points may be clustered first because the number of the feature points is large. Then, for each type of feature points, one image block is selected, so that the number of the image blocks can be greatly reduced, and the calculation amount is reduced. In this embodiment, it is assumed that all feature points included in the first coverage area belong to the same class. Thus, an image block can be determined from the feature points in the first coverage area.
Step 504, determining an image block comprising the feature points in the first coverage area.
After the first coverage area is determined, image blocks including feature points in the first coverage area may be determined. In this embodiment, the image block may be determined in various ways. For example, the first coverage area may be used as an image block, or a minimum bounding rectangle of the first coverage area may be used as the image block.
In some optional implementations of this embodiment, the image block may be determined by the following steps not shown in fig. 5: determining the minimum circumscribed circle of the feature points in the first coverage area; and determining the image block with a preset size, centered at the center of the minimum circumscribed circle.
The number of feature points included in the first coverage area may be large or small. In order for a fixed-size image block to cover all of these feature points as far as possible, in this implementation the minimum circumscribed circle of all feature points in the first coverage area may be determined first. The image block is then determined with a preset size, centered at the center of the minimum circumscribed circle. In this way, the feature points in the first coverage area lie roughly in the central area of the image block.
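A sketch of step 504 under this implementation (the minimum circumscribed circle), assuming OpenCV and the 60 x 60 block size mentioned earlier; clamping the block to the image boundary is an added assumption:

    import cv2
    import numpy as np

    def block_from_covered_points(image, covered_points, block_size=60):
        # covered_points: (x, y) coordinates of the feature points in the first coverage area.
        pts = np.asarray(covered_points, dtype=np.float32).reshape(-1, 1, 2)
        (cx, cy), _radius = cv2.minEnclosingCircle(pts)
        half = block_size // 2
        h, w = image.shape[:2]
        x0 = int(np.clip(cx - half, 0, w - block_size))
        y0 = int(np.clip(cy - half, 0, h - block_size))
        return image[y0:y0 + block_size, x0:x0 + block_size]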
Step 505, determining whether the number of times feature points have been selected is equal to the preset number of times.
In this embodiment, an image block is obtained every time one feature point is selected, and therefore, the number of times of selecting the feature points can be limited by presetting the number of the image blocks, so as to reduce the amount of calculation. Specifically, the number of image blocks may be set to 10 in advance, that is, the preset number of times is 10.
If the number of times of selecting the feature points is equal to the preset number of times, it indicates that the image block selection is completed, and step 508 is executed. If the number of times of selecting the feature point is not equal to the preset number of times, the process continues to step 506.
Step 506, a second coverage area is determined by taking the selected feature point as a circle center and a second preset length as a radius.
When it is determined that the number of the determined image blocks is smaller than the preset number, another circle may be determined with the selected feature point as a center of a circle and the second preset length as a radius. The area covered by the circle in the target image is the second coverage area.
In some optional implementations of this embodiment, the second predetermined length is twice the first predetermined length.
Step 507, determining the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continuing to perform step 502.
After the second coverage area is determined, the area of the feature point selection area other than the second coverage area is taken as the new feature point selection area, and step 502 is then performed again.
In this way, when a feature point is next selected in the new feature point selection area, the newly determined first coverage area will not overlap the previously determined first coverage areas, which reduces overlap between the determined image blocks as far as possible, so that the determined image blocks reflect the features of the target image as comprehensively as possible.
Step 508, end.
The closed-loop detection method provided by the embodiment of the application can cluster the feature points of the target image, and then determine the image blocks according to the clustered feature points, so that the number of the image blocks is effectively reduced, the calculated amount is reduced, and the operation efficiency is improved.
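Putting flow 500 together, the sketch below repeats the pick / cover / exclude cycle until the preset number of image blocks is reached. The radii, the block count, and the deterministic choice of the first remaining point (rather than a random pick) are assumptions; keypoints are OpenCV KeyPoint objects as returned by the extraction sketch above, and block_from_covered_points is the sketch shown earlier.

    import numpy as np

    def select_image_blocks(image, keypoints, num_blocks=10,
                            first_radius=30, second_radius=60, block_size=60):
        # second_radius is twice first_radius, per the optional implementation above.
        pts = np.array([kp.pt for kp in keypoints], dtype=np.float32)
        remaining = np.ones(len(pts), dtype=bool)      # current feature point selection area
        blocks = []
        while len(blocks) < num_blocks and remaining.any():
            seed = np.flatnonzero(remaining)[0]        # step 502: pick a feature point
            d = np.linalg.norm(pts - pts[seed], axis=1)
            covered = remaining & (d <= first_radius)  # step 503: first coverage area
            blocks.append(block_from_covered_points(image, pts[covered], block_size))  # step 504
            remaining &= d > second_radius             # steps 506-507: drop the second coverage area
        return blocks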
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a closed loop detection apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 6, the closed-loop detection apparatus 600 of the present embodiment includes a feature extraction unit 601, an image block selection unit 602, a feature vector determination unit 603, a feature vector fusion unit 604, and a closed-loop detection unit 605.
The feature extraction unit 601 is configured to extract features of the target image, and obtain feature points and a first feature vector of the target image.
The image block selecting unit 602 is configured to select an image block in the target image according to the feature points.
The feature vector determination unit 603 is configured to input the image block into a pre-established deep learning model to obtain a second feature vector of the image block, where the deep learning model is used to represent a corresponding relationship between the image block and the feature vector.
A feature vector fusion unit 604 configured to fuse the first feature vector and the second feature vector to obtain a fusion vector.
And a closed loop detection unit 605 configured to determine whether a closed loop occurs according to the similarity between the fusion vector and the historical fusion vector and a preset threshold.
In some optional implementations of the present embodiment, the feature extraction unit 601 may be further configured to: detect the feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image; and extract the features of the target image by using a second feature extraction algorithm to obtain the first feature vector.
In some optional implementations of the present embodiment, the image block selecting unit 602 may further include a first area determining module and an image block selecting module, which are not shown in fig. 6.
The first area determining module is configured to determine that the target image is a feature point selection area.
The image block selecting module is configured to perform the following selection steps based on the feature point selection area: selecting a feature point in the feature point selection area; determining a first coverage area with the selected feature point as the center and a first preset length as the radius; determining an image block including the feature points in the first coverage area; and determining whether the number of times feature points have been selected is equal to the preset number of times.
In some optional implementations of this embodiment, the image block selecting module is further configured to: determine the minimum circumscribed circle of the feature points in the first coverage area; and determine the image block with a preset size, centered at the center of the minimum circumscribed circle.
In some optional implementations of this embodiment, the image block selecting module may be further configured to: in response to determining that the number of selections is not equal to the preset number of times, determine a second coverage area with the selected feature point as the center and a second preset length as the radius; and determine the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continue to perform the selection steps.
In some optional implementations of the present embodiment, the feature vector fusing unit 604 may be further configured to: performing Hash mapping on the first feature vector to obtain a third feature vector; and fusing the third feature vector and the second feature vector to obtain a fused vector.
In some optional implementations of this embodiment, the apparatus 600 may further include a model training unit not shown in fig. 6, where the model training unit is configured to: acquiring a training sample set, wherein the training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold; and taking the first image, the second image and the third image of the training sample in the training sample set as input, taking the feature vector corresponding to the input image as output, and training to obtain the deep learning model.
The closed-loop detection device provided by the above embodiment of the application fuses the feature vector obtained by deep learning and the feature vector obtained by feature extraction, thereby reducing the calculation error.
It should be understood that units 601 to 605 recited in the closed loop detection apparatus 600 correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the closed loop detection method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
Referring now to FIG. 7, a block diagram of a computer system 700 suitable for use in implementing a robot or server of an embodiment of the present application is shown. The robot or server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises a feature extraction unit, an image block selection unit, a feature vector determination unit, a feature vector fusion unit and a closed loop detection unit. Here, the names of these units do not constitute a limitation to the unit itself in some cases, and for example, the feature extraction unit may also be described as a "unit that extracts a feature of a target image".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: extracting the features of a target image to obtain feature points and a first feature vector of the target image; selecting image blocks in the target image according to the characteristic points; inputting the image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the corresponding relation between the image block and the feature vector; fusing the first feature vector and the second feature vector to obtain a fused vector; and determining whether closed loop occurs or not according to the similarity between the fusion vector and the historical fusion vector and a preset threshold.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A closed loop detection method, comprising:
extracting the features of a target image to obtain feature points and a first feature vector of the target image;
selecting image blocks in the target image according to the feature points;
inputting the image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the corresponding relation between the image block and the feature vector;
fusing the first feature vector and the second feature vector to obtain a fused vector;
determining whether a closed loop occurs according to the similarity between the fusion vector and the historical fusion vector and a preset threshold;
wherein the selecting image blocks in the target image according to the feature points comprises:
clustering the feature points, and selecting an image block from the target image for each type of feature points.
2. The method of claim 1, wherein said extracting features of the target image comprises:
detecting the feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image;
and extracting the features of the target image by using a second feature extraction algorithm to obtain a first feature vector.
3. The method according to claim 1, wherein the selecting an image block in the target image according to the detected feature points comprises:
determining a target image as a feature point selection area;
performing the following selection steps based on the feature point selection area: selecting a feature point in the feature point selection area; determining a first coverage area with the selected feature point as the center and a first preset length as the radius; determining an image block comprising the feature points in the first coverage area; and determining whether the number of times feature points have been selected is equal to a preset number of times.
4. The method of claim 3, wherein the determining image blocks comprising feature points in the first coverage area comprises:
determining a minimum circumscribed circle of feature points in the first coverage area;
and determining the image block with a preset size, centered at the center of the minimum circumscribed circle.
5. The method according to claim 3, wherein the selecting an image block in the target image according to the detected feature points comprises:
in response to determining that the number of selections is not equal to the preset number of times, determining a second coverage area with the selected feature point as the center and a second preset length as the radius; and determining the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continuing to perform the selection steps.
6. The method of claim 1, wherein said fusing the first feature vector and the second feature vector to obtain a fused vector comprises:
performing hash mapping on the first feature vector to obtain a third feature vector;
fusing the third feature vector and the second feature vector to obtain the fused vector.
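
The claim specifies neither the hash mapping nor the fusion rule. The sketch below assumes a random-projection (LSH-style) binary hash for the mapping and simple concatenation for the fusion; both choices and all names are illustrative assumptions.

```python
import numpy as np

def make_hash_projection(input_dim, bits=128, seed=0):
    """Fixed random projection matrix used by the assumed hash mapping."""
    return np.random.default_rng(seed).standard_normal((bits, input_dim))

def hash_map(first_vector, projection):
    # Sign of a random projection: a simple LSH-style binary code (third feature vector).
    return (projection @ first_vector > 0).astype(np.float32)

def fuse(third_vector, second_vector):
    # Fusion rule assumed to be concatenation into the fused vector.
    return np.concatenate([third_vector, second_vector])
```
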
7. The method of claim 1, wherein the deep learning model is trained by:
acquiring a training sample set, wherein each training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold;
taking the first image, the second image and the third image of the training samples in the training sample set as input, taking the feature vectors corresponding to the input images as expected output, and training to obtain the deep learning model.
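
The training set of claim 7 is effectively a set of triplets: the first and second images are similar, the third is dissimilar. A common way to train such a model is a triplet margin loss; the PyTorch sketch below assumes a small CNN embedding network and a margin of 0.5, neither of which is specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockEmbeddingNet(nn.Module):
    """Maps an image block to a feature vector (architecture assumed)."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        return F.normalize(self.fc(self.features(x).flatten(1)), dim=1)

def train_step(model, optimizer, first_img, second_img, third_img, margin=0.5):
    """One update: first/second are the similar pair, third is the dissimilar image."""
    loss = nn.TripletMarginLoss(margin=margin)(
        model(first_img), model(second_img), model(third_img))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
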
8. A closed loop detection apparatus comprising:
a feature extraction unit configured to extract features of a target image to obtain feature points and a first feature vector of the target image;
an image block selection unit configured to select an image block in the target image according to the feature points;
a feature vector determining unit configured to input the image block into a pre-established deep learning model to obtain a second feature vector of the image block, wherein the deep learning model is used for representing the correspondence between image blocks and feature vectors;
a feature vector fusion unit configured to fuse the first feature vector and the second feature vector to obtain a fused vector;
a closed loop detection unit configured to determine whether a closed loop occurs according to the similarity between the fused vector and a historical fused vector and a preset threshold;
wherein the image block selection unit is further configured to:
cluster the feature points, and select an image block in the target image for each cluster of feature points.
9. The apparatus of claim 8, wherein the feature extraction unit is further configured to:
detect feature points of the target image by using a first feature extraction algorithm to obtain the feature points of the target image;
extract features of the target image by using a second feature extraction algorithm to obtain the first feature vector.
10. The apparatus of claim 8, wherein the image block selection unit comprises:
a first area determination module configured to determine the target image as a feature point selection area;
an image block selection module configured to perform, based on the feature point selection area, the following selection steps: selecting a feature point in the feature point selection area; determining a first coverage area by taking the selected feature point as a circle center and a first preset length as a radius; determining an image block comprising the feature points in the first coverage area; and determining whether the number of feature point selections is equal to a preset number.
11. The apparatus of claim 10, wherein the image block selection module is further configured to:
determine a minimum circumscribed circle of the feature points in the first coverage area;
determine the image block by taking the center of the minimum circumscribed circle as its center and a preset size as its size.
12. The apparatus of claim 10, wherein the image block selection module is further configured to:
in response to determining that the number of feature point selections is not equal to the preset number, determine a second coverage area by taking the selected feature point as a circle center and a second preset length as a radius; determine the area of the feature point selection area other than the second coverage area as a new feature point selection area, and continue to execute the selection steps.
13. The apparatus of claim 8, wherein the feature vector fusion unit is further configured to:
perform hash mapping on the first feature vector to obtain a third feature vector;
fuse the third feature vector and the second feature vector to obtain the fused vector.
14. The apparatus of claim 8, wherein the apparatus further comprises a model training unit configured to:
acquire a training sample set, wherein each training sample comprises a first image, a second image, a third image and a feature vector of each image, the distance between the feature vector of the first image and the feature vector of the second image is smaller than a first preset threshold, and the distance between the feature vector of the third image and the feature vector of the first image and/or the distance between the feature vector of the third image and the feature vector of the second image is larger than a second preset threshold;
take the first image, the second image and the third image of the training samples in the training sample set as input, take the feature vectors corresponding to the input images as expected output, and train to obtain the deep learning model.
15. An apparatus, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201810409332.3A 2018-05-02 2018-05-02 Closed loop detection method, device, equipment and computer readable medium Active CN110443263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810409332.3A CN110443263B (en) 2018-05-02 2018-05-02 Closed loop detection method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810409332.3A CN110443263B (en) 2018-05-02 2018-05-02 Closed loop detection method, device, equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN110443263A (en) 2019-11-12
CN110443263B (en) 2022-06-07

Family

ID=68427623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810409332.3A Active CN110443263B (en) 2018-05-02 2018-05-02 Closed loop detection method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN110443263B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113923599B (en) * 2021-09-24 2022-11-15 江苏京芯光电科技有限公司 VSLAM closed-loop detection method based on wireless fusion signal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101750340B1 (en) * 2010-11-03 2017-06-26 엘지전자 주식회사 Robot cleaner and controlling method of the same
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
US9727800B2 (en) * 2015-09-25 2017-08-08 Qualcomm Incorporated Optimized object detection
CN105783913A (en) * 2016-03-08 2016-07-20 中山大学 SLAM device integrating multiple vehicle-mounted sensors and control method of device
CN106897666B (en) * 2017-01-17 2020-09-08 上海交通大学 Closed loop detection method for indoor scene recognition
CN107886129B (en) * 2017-11-13 2021-06-08 湖南大学 Mobile robot map closed-loop detection method based on visual word bag

Also Published As

Publication number Publication date
CN110443263A (en) 2019-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant