CN111089579A - Heterogeneous binocular SLAM method and device and electronic equipment - Google Patents

Heterogeneous binocular SLAM method and device and electronic equipment

Info

Publication number
CN111089579A
CN111089579A (application CN201811231493.4A)
Authority
CN
China
Prior art keywords
monocular
camera
slam
monocular camera
feature points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811231493.4A
Other languages
Chinese (zh)
Other versions
CN111089579B (en)
Inventor
杨帅 (Yang Shuai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201811231493.4A
Publication of CN111089579A
Application granted
Publication of CN111089579B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 Map- or contour-matching
    • G01C21/32 Structuring or formatting of map data
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a heterogeneous binocular SLAM method, which comprises the following steps: constructing a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, respectively, wherein the first monocular camera and the second monocular camera are heterogeneous cameras; correcting the scales of the first monocular SLAM and the second monocular SLAM, respectively, using feature points located within the common field of view of the first monocular camera and the second monocular camera; and performing data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM. Also disclosed are a heterogeneous binocular SLAM apparatus, an electronic device and a non-transitory storage medium. With the heterogeneous binocular SLAM method, the binocular SLAM can be constructed more accurately, a wider viewing angle and a larger depth of field can be obtained, objects at different depths of field can be detected, and the accuracy and precision of the binocular SLAM are improved.

Description

Heterogeneous binocular SLAM method and device and electronic equipment
Technical Field
The present application relates to the field of simultaneous localization and mapping (SLAM) technology, and more particularly, to a heterogeneous binocular SLAM method, a heterogeneous binocular SLAM device, and an electronic apparatus.
Background
Current binocular SLAM systems employ homogeneous cameras to capture images or video. However, because both cameras share the same field of view and depth of field, such a homogeneous binocular SLAM system is limited in the range of scene depths it can detect.
Accordingly, there is a need for an improved binocular SLAM scheme.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the present application provide a heterogeneous binocular SLAM method, a heterogeneous binocular SLAM apparatus, an electronic device, and a non-transitory storage medium, which can implement processing for different depths of a scene.
According to an aspect of the present application, there is provided a heterogeneous binocular SLAM method, comprising: constructing a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, respectively, wherein the first monocular camera and the second monocular camera are heterogeneous cameras and constitute a heterogeneous binocular camera; correcting the scales of the first monocular SLAM and the second monocular SLAM using feature points located within the common field of view of the first monocular camera and the second monocular camera; and performing data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM.
According to another aspect of the present application, there is provided a heterogeneous binocular SLAM device, including: a monocular SLAM construction module for constructing a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, respectively; a monocular SLAM correction module for correcting the scales of the first monocular SLAM and the second monocular SLAM, respectively, using feature points located within the common field of view of the first monocular camera and the second monocular camera; and a binocular SLAM construction module for performing data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM.
According to still another aspect of the present application, there is provided an electronic device, including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the heterogeneous binocular SLAM method described above.
According to yet another aspect of the present application, there is provided a non-transitory storage medium having stored thereon instructions for executing the heterogeneous binocular SLAM method described above.
Compared with the homogeneous binocular SLAM of the prior art, the technical solution of the present application achieves several beneficial technical effects. For example, by adopting a binocular camera composed of two structurally different cameras, and by using feature points within the common field of view of the first monocular camera and the second monocular camera of the heterogeneous binocular camera to correct the scales of the first monocular SLAM and the second monocular SLAM, the binocular SLAM is constructed more accurately, a wider viewing angle and a larger depth of field are obtained, objects at different depths of field can be detected, and the accuracy and precision of the binocular SLAM are improved.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 illustrates a flowchart of a heterogeneous binocular SLAM method according to an embodiment of the present application.
Fig. 2 illustrates a scene schematic of binocular vision.
Fig. 3 illustrates a functional block diagram of a heterogeneous binocular SLAM device according to an embodiment of the present application.
FIG. 4 illustrates a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, current binocular SLAM systems employ homogeneous cameras to capture images or video. However, in the automatic driving assistance system of a motor vehicle or the navigation system of an unmanned aerial vehicle, it is often necessary to detect objects located at different distances from the camera, a requirement a conventional homogeneous binocular camera cannot meet. By adopting heterogeneous cameras with different viewing angles, images with different depths of field can be captured, so that objects at different distances from the cameras can be detected.
In view of this technical problem, the present application provides a heterogeneous binocular SLAM method, a heterogeneous binocular SLAM device, and an electronic device, which acquire images or video with heterogeneous cameras of different viewing angles and correct the monocular SLAMs constructed for the respective heterogeneous cameras using the world-coordinate-system three-dimensional coordinates of common viewpoints located within the common field of view of the heterogeneous cameras, thereby detecting objects located at different depths of the scene.
Specifically, the SLAM of each of the left and right monocular cameras of the heterogeneous binocular camera uses only image observations along that camera's own time sequence: feature points are matched across multiple frames of the left camera in the time domain, and likewise across multiple frames of the right camera, with no interaction between the left and right cameras at this stage. The left and right monocular SLAMs thus each achieve a three-dimensional reconstruction of their own field of view and an estimate of their own camera pose. Because a monocular camera suffers from scale ambiguity, that is, the reconstructed map and camera pose can be scaled arbitrarily, the left SLAM has its own scale coefficient s1 and the right SLAM has its own scale coefficient s2. Computing these two coefficients requires matching the feature points within the common field of view of the left and right cameras and obtaining their three-dimensional coordinates in the physical world. Since these feature points also have three-dimensional coordinates, at the respective unknown scales, in the left and right monocular SLAMs, the ratio between their physical-world coordinates and their scale-ambiguous coordinates yields the scale coefficient of each camera. Each monocular SLAM is corrected with the scale coefficient of its camera, and the two corrected SLAMs are combined into a binocular SLAM.
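As a minimal illustrative sketch (not part of the patent disclosure), a scale coefficient such as s1 or s2 can be estimated as the ratio between metric distances in the physical world and the corresponding distances in a scale-ambiguous monocular map; the function and variable names below are assumptions:

```python
import numpy as np

def estimate_scale(points_world, points_slam):
    """Estimate a monocular SLAM scale coefficient (e.g. s1 or s2).

    points_world: (N, 3) physical-world coordinates of feature points that
                  lie in the common field of view of the two cameras.
    points_slam:  (N, 3) coordinates of the same points in one monocular
                  SLAM map, which is only defined up to scale.
    Returns s such that points_world is approximately s * points_slam,
    up to a rigid transform between the two coordinate systems.
    """
    # Inter-point distances are invariant to the unknown rotation and
    # translation, so their ratio isolates the scale.
    d_world = np.linalg.norm(points_world[1:] - points_world[:-1], axis=1)
    d_slam = np.linalg.norm(points_slam[1:] - points_slam[:-1], axis=1)
    return float(np.median(d_world / d_slam))  # median resists bad matches

# s1 = estimate_scale(common_points_world, left_map_points)
# s2 = estimate_scale(common_points_world, right_map_points)
# Each monocular map and trajectory is then multiplied by its coefficient.
```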
It should be noted that the above-mentioned basic concept of the present application can be applied to not only the automatic driving of the motor vehicle, but also the application fields such as unmanned aerial vehicle navigation.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
Fig. 1 illustrates a flowchart of a heterogeneous binocular SLAM method according to an embodiment of the present application.
As shown in fig. 1, a heterogeneous binocular SLAM method 100 according to an embodiment of the present application includes a step S110 of constructing a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, respectively, where the first monocular camera and the second monocular camera are heterogeneous cameras. The first monocular camera and the second monocular camera are cameras with different fields of view, different viewing angles, or different fields of view and depths of field, and together they constitute a heterogeneous binocular camera. It should be understood that the first monocular camera and the second monocular camera may instead be arranged independently at a suitable distance from each other without constituting a binocular camera.
Here, "camera" covers cameras and lenses with different angles of view, such as wide-angle and ultra-wide-angle cameras, as well as video cameras and webcams. Such a camera may be one applied in any industry, field or scene, such as, but not limited to, security monitoring, motion-sensing games, automatic driving and environmental 3D modeling, for example a camera of a surround-view or automatic driving system on an automobile, a camera mounted on a household sweeping robot, or a monitoring camera installed in a supermarket or shopping mall. The camera may be a monocular camera, a binocular camera, or a multi-view camera with more than two views. When the camera is a multi-view camera, the principles of the invention may be applied to each of its constituent views separately, thereby constituting a multi-view SLAM. The image received from the camera may be a single image or a video comprising multiple frames.
When constructing the monocular SLAMs for the first monocular camera and the second monocular camera, image feature matching is performed on the images acquired at different positions by each of the two cameras, yielding the matched feature points of the images acquired by the first monocular camera and of those acquired by the second monocular camera; the camera pose of each of the two cameras is computed from its matched feature points; the world-coordinate-system three-dimensional coordinates of the matched feature points are acquired; and the first monocular SLAM and the second monocular SLAM are constructed from the camera poses of the two cameras and the world-coordinate-system three-dimensional coordinates of the feature points, wherein at least one of the matched feature points is located within the common field of view of the first monocular camera and the second monocular camera.
In this embodiment of the present application, at least one of the feature points used in constructing each monocular SLAM is located within the common field of view of the two cameras. This rules out the case where all the feature points used lie outside the common field of view, thereby ensuring that the three-dimensional world represented by the images acquired by the two cameras can be reconstructed based on feature points within the common field of view. A sketch of the underlying two-view computation follows.
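For illustration only, the following is a minimal sketch of one conventional way to realize this per-camera step with OpenCV; the patent does not prescribe a particular library, and the function and variable names (monocular_two_view, pts1, pts2, K) are assumptions:

```python
import cv2
import numpy as np

def monocular_two_view(pts1, pts2, K):
    """Relative pose and 3D structure from two frames of one camera.

    pts1, pts2: (N, 2) float arrays of matched pixel coordinates from two
                images taken at different positions; K: 3x3 intrinsic matrix.
    The returned translation t (and hence the reconstructed map) is only
    known up to scale, which is the ambiguity that step S120 corrects.
    """
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

    # Projection matrices: first frame at the origin, second at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return R, t, (X_h[:3] / X_h[3]).T                    # (N, 3) points
```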
In some embodiments of the present application, an approach different from the above is provided for constructing the monocular SLAMs. Specifically, constructing the first monocular SLAM and the second monocular SLAM for the first monocular camera and the second monocular camera, respectively, comprises: performing image feature matching on the images acquired at different positions by the first monocular camera and the second monocular camera, and segmenting the images acquired by each camera into an image portion located within the common field of view of the two cameras and an image portion located outside the common field of view; extracting at least one feature point located outside the common field of view and obtaining its coordinates relative to the first monocular camera and the second monocular camera, respectively; computing the camera pose of each of the two cameras from the matched feature points located outside the common field of view; acquiring the world-coordinate-system three-dimensional coordinates of the feature points in the image portions, of at least one image acquired by each camera, that lie within the common field of view; and constructing the first monocular SLAM and the second monocular SLAM from the respective camera poses and those world-coordinate-system three-dimensional coordinates.
In this embodiment of the present application, when constructing the monocular SLAMs, the acquired images are divided into a partial image within the common field of view of the two cameras and a partial image outside it; feature points are extracted both from the portion outside the common field of view and from the portion within it, and the SLAMs are constructed accordingly. That is, dividing the images ensures that the pose of each camera is computed from feature points in the partial images entirely outside the common field of view, while the monocular SLAMs themselves are constructed from the feature points located within the common field of view together with the pose of each camera. The feature points in the partial images within the common field of view can therefore also be used to correct the scales of the two monocular SLAMs, as illustrated in the sketch below.
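A hedged sketch of the segmentation idea in this embodiment (the mask construction and all names are assumptions, not the patent's prescribed implementation):

```python
import numpy as np

def split_keypoints(keypoints, common_view_mask):
    """Separate keypoints inside vs. outside the common field of view.

    keypoints:        (N, 2) pixel coordinates (x, y) detected in one image.
    common_view_mask: (H, W) boolean array marking the image region that
                      overlaps the other camera's field of view.
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    inside = common_view_mask[ys, xs]
    # Points outside the common view drive pose estimation; points inside
    # it are reconstructed and later used to correct the map scale.
    return keypoints[inside], keypoints[~inside]
```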
Further, in some embodiments, step S110 further includes determining a common field of view of the first monocular camera and the second monocular camera according to respective camera parameters of the first monocular camera and the second monocular camera. In this embodiment of the present application, the common field of view of the two monocular cameras is determined so as to divide the images acquired by the two monocular cameras into a partial image within the common field of view and a partial image outside the common field of view.
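As one hedged example of how the camera parameters could be used, under an assumed pinhole model (the function name is illustrative), the horizontal viewing angle of each camera follows from its focal length and image width, and the overlap of the two view frusta gives the common field of view:

```python
import numpy as np

def horizontal_fov_deg(fx, image_width):
    """Horizontal field of view of a pinhole camera from its intrinsics."""
    return float(np.degrees(2.0 * np.arctan(image_width / (2.0 * fx))))

# Given the two viewing angles plus the extrinsic rotation and baseline
# between the cameras, intersecting the two view frusta yields region 1
# of Fig. 2, i.e. the common field of view used to segment the images.
```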
Next, in step S120, the scales of the first monocular SLAM and the second monocular SLAM are corrected, respectively, using the feature points located within the common field of view of the first monocular camera and the second monocular camera. Specifically, the feature points in the images at the different positions that are located within the common field of view of the two cameras are acquired; the world-coordinate-system three-dimensional coordinates of those feature points are acquired; and the scales of the first monocular SLAM and the second monocular SLAM are corrected, respectively, using those three-dimensional coordinates.
Finally, in step S130, data fusion is performed on the corrected first monocular SLAM and the second monocular SLAM to construct a binocular SLAM.
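A simplistic sketch of this fusion step, assuming the extrinsic transform between the two cameras is known from calibration (all names are illustrative; a real system would also merge trajectories and re-optimize):

```python
import numpy as np

def fuse_maps(points_a, points_b, s_a, s_b, R_ab, t_ab):
    """Merge two scale-corrected monocular maps into one binocular map.

    points_a, points_b: (N, 3) / (M, 3) map points of the two monocular SLAMs.
    s_a, s_b:           scale coefficients obtained in step S120.
    R_ab (3x3), t_ab (3,): rotation and translation taking camera-b map
                           coordinates into the camera-a frame.
    """
    corrected_a = s_a * points_a
    corrected_b = (s_b * points_b) @ R_ab.T + t_ab
    return np.vstack([corrected_a, corrected_b])  # one common point cloud
```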
The heterogeneous binocular SLAM method of the present application is further described below in conjunction with fig. 2.
Fig. 2 illustrates a scene schematic of binocular vision. As shown in fig. 2, the heterogeneous binocular SLAM of the present application is constructed using two monocular cameras, camera a and camera b. It should be understood that two monocular cameras are used here only for illustration, not limitation: camera a and camera b in fig. 2 may each also be a binocular or multi-view camera, or one may be a monocular camera while the other is a binocular or multi-view camera.
As shown in fig. 2, two rays a1 and a2 extend from camera a, and the region between rays a1 and a2 represents the field of view of camera a; two rays b1 and b2 emerge from camera b, and the region between rays b1 and b2 represents the field of view of camera b. As can be seen from fig. 2, camera a and camera b are two heterogeneous cameras, which have different fields of view, and the field of view of camera a is narrower than that of camera b. As also shown in fig. 2, there is an overlap between the field of view of camera a and the field of view of camera b, region 1 in fig. 2, referred to as the common field of view of camera a and camera b.
In step S110, a camera-a SLAM and a camera-b SLAM are constructed for camera a and camera b, respectively. Specifically, camera a and camera b each capture at least one frame image at each of at least two positions, denoted a_image1 and a_image2 for camera a, and b_image1 and b_image2 for camera b. Image feature matching is performed on a_image1 and a_image2 and on b_image1 and b_image2, respectively, to obtain the matched feature points of each image pair. The common field of view of camera a and camera b is determined from the intrinsic parameters of camera a and camera b. The camera poses of camera a and camera b are computed from the matched feature points of the images each camera acquired, and the world-coordinate-system three-dimensional coordinates of the matched feature points are acquired. Then, the camera-a SLAM and the camera-b SLAM are constructed from the respective camera poses and the world-coordinate-system three-dimensional coordinates of the matched feature points, wherein at least one of the matched feature points is located within the common field of view of camera a and camera b. This ensures that the feature points selected for constructing the camera-a SLAM and the camera-b SLAM are not all located outside the common field of view of camera a and camera b.
The construction of the monocular SLAMs is not limited to the above method. In some embodiments, the common field of view of camera a and camera b is determined from their respective intrinsic parameters; at least two frames of images acquired at different positions by camera a and camera b are matched, and the images acquired by each camera are segmented into an image portion located within the common field of view of camera a and camera b and an image portion located outside it; at least one feature point is extracted from the image portions located outside the common field of view, and its coordinates relative to camera a and camera b are acquired; from these relative coordinates, the camera poses of camera a and camera b are computed; the world-coordinate-system three-dimensional coordinates of the feature points in the at least one image portion located within the common field of view are acquired; and the camera-a SLAM and the camera-b SLAM are constructed from the respective camera poses and those world-coordinate-system three-dimensional coordinates.
Next, in step S120, at least one feature point located within the common field of view of camera a and camera b is selected from the images acquired by the two cameras, its world-coordinate-system three-dimensional coordinates are acquired, and those coordinates are used to correct the scales of the camera-a SLAM and the camera-b SLAM, respectively.
Finally, in step S130, data fusion is performed on the corrected camera-a SLAM and camera-b SLAM to construct a binocular SLAM.
Therefore, the heterogeneous binocular SLAM method of the embodiments of the present application adopts heterogeneous cameras with different viewing angles, constructs a monocular SLAM for each camera, and constructs the binocular SLAM after data fusion, so that the different depths of field of the two heterogeneous cameras are used to capture images at different depths and to detect objects both near to and far from the cameras. By pairing a large-viewing-angle camera with a small-viewing-angle camera, the method can detect objects at different depths of field in front of the cameras and observe the scene in front of the cameras more comprehensively.
Exemplary devices
Fig. 3 illustrates a schematic diagram of a heterogeneous binocular SLAM device according to an embodiment of the present application.
As shown in fig. 3, the heterogeneous binocular SLAM apparatus 200 according to an embodiment of the present application includes a monocular SLAM construction module 210, a monocular SLAM correction module 220, and a binocular SLAM construction module 230.
Monocular SLAM construction module 210 may be used to construct a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, respectively.
In some examples, the monocular SLAM construction module 210 performs image feature matching on the images acquired at different positions by the first monocular camera and the second monocular camera, obtaining the matched feature points of the images acquired by each camera; computes the camera pose of each of the two cameras; acquires the world-coordinate-system three-dimensional coordinates of the matched feature points of the images acquired by the first monocular camera and of those acquired by the second monocular camera; and constructs the first monocular SLAM and the second monocular SLAM from the respective camera poses and those world-coordinate-system three-dimensional coordinates, wherein at least one of the matched feature points is located within the common field of view of the first monocular camera and the second monocular camera.
In some examples, differing from the above, the monocular SLAM construction module 210 performs image feature matching on the images acquired at different positions by the first monocular camera and the second monocular camera, and segments the images acquired by each camera into an image portion located within the common field of view of the two cameras and an image portion located outside it; extracts feature points in the image portions, of at least one image acquired by each camera, that lie outside the common field of view, and acquires their coordinates relative to the first monocular camera or the second monocular camera; computes the camera pose of each of the two cameras; acquires the world-coordinate-system three-dimensional coordinates of the feature points in the image portions, of at least one image acquired by each camera, that lie within the common field of view; and constructs the first monocular SLAM and the second monocular SLAM from the respective camera poses and those world-coordinate-system three-dimensional coordinates.
In some examples, the monocular SLAM build module 210 also determines a common field of view for the first monocular camera and the second monocular camera from their respective camera parameters.
The monocular SLAM correction module 220 may be configured to correct the first monocular SLAM and the second monocular SLAM, respectively, using feature points within the common field of view of the first monocular camera and the second monocular camera. In some embodiments, the monocular SLAM correction module 220 includes: a common viewpoint acquisition unit for acquiring, from the images at different positions, the common viewpoints located within the common field of view of the two cameras; a common-viewpoint three-dimensional coordinate acquisition unit for acquiring the world-coordinate-system three-dimensional coordinates of those common viewpoints; and a correction unit for correcting the scales of the first monocular SLAM and the second monocular SLAM, respectively, using those three-dimensional coordinates. Here, a common viewpoint is a feature point, in an image captured by either monocular camera, that lies within the common field of view of the two monocular cameras; a non-common viewpoint is a feature point that lies outside the common field of view.
Finally, the binocular SLAM construction module 230 may be configured to perform data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM.
Since the specific functions and operations of the respective units and modules of the heterogeneous binocular SLAM device have been described in detail in the heterogeneous binocular SLAM method described above with reference to fig. 1 and 2, they are only briefly introduced here to avoid repetitive description.
Exemplary electronic device
Fig. 4 shows a block diagram of an electronic device 300 according to an embodiment of the present application. As shown in fig. 4, electronic device 300 may include a processor 310 and a memory 320.
The processor 310 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 300 to perform desired functions.
Memory 320 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 310 to implement the heterogeneous binocular SLAM method of the various embodiments of the present application described above and/or other desired functionality. The computer-readable storage medium may also store related information such as parameters of the individual cameras and driver programs.
In one example, the electronic device 300 may also include an interface 330, an input device 340, and an output device 350, which may be interconnected via a bus system and/or other form of connection mechanism (not shown).
The interface 330 may be used to connect the cameras that constitute the heterogeneous binocular camera. For example, the interface 330 may be a USB interface commonly used for cameras, or another interface such as a Type-C interface. The electronic device 300 may include one or more interfaces 330 to connect to the respective cameras and receive from them the images used to perform the heterogeneous binocular SLAM method described above.
The input device 340 may be used for receiving external input, such as physical point coordinate values input by a user. In some embodiments, input device 340 may be, for example, a keyboard, mouse, tablet, touch screen, or the like.
The output device 350 may output the obtained camera parameters or the SLAM map. For example, output devices 350 may include a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others. In some embodiments, the input device 340 and the output device 350 may be an integrated touch display screen.
For simplicity, only some of the components of the electronic device 300 that are relevant to the present application are shown in fig. 4, while some of the relevant peripheral or auxiliary components are omitted. In addition, electronic device 300 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the heterogeneous binocular SLAM method according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the heterogeneous binocular SLAM method according to various embodiments of the present application described in the "exemplary methods" section above of the present specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A heterogeneous binocular SLAM method, comprising:
respectively constructing a first monocular SLAM and a second monocular SLAM for a first monocular camera and a second monocular camera, wherein the first monocular camera and the second monocular camera are heterogeneous cameras;
respectively correcting the scales of the first monocular SLAM and the second monocular SLAM by utilizing feature points positioned in the common visual field of the first monocular camera and the second monocular camera; and carrying out data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM.
2. The heterogeneous binocular SLAM method of claim 1, wherein the constructing the first and second monocular SLAMs for the first and second monocular cameras, respectively, comprises:
respectively carrying out image feature matching on the images at different positions acquired by the first monocular camera and the second monocular camera to respectively obtain matched feature points of the image acquired by the first monocular camera and matched feature points of the image acquired by the second monocular camera;
respectively calculating to obtain respective camera pose information of the first monocular camera and the second monocular camera;
acquiring three-dimensional coordinates of a world coordinate system of matched characteristic points of the image acquired by the first monocular camera and three-dimensional coordinates of a world coordinate system of matched characteristic points of the image acquired by the second monocular camera; and
respectively constructing the first monocular SLAM and the second monocular SLAM according to the camera pose information of the first monocular camera and the camera pose information of the second monocular camera, and the world coordinate system three-dimensional coordinates of the matched feature points of the image acquired by the first monocular camera and the matched feature points of the image acquired by the second monocular camera, wherein at least one feature point of the matched feature points of the image acquired by the first monocular camera and the matched feature points of the image acquired by the second monocular camera is located in the common visual field of the first monocular camera and the second monocular camera.
3. The heterogeneous binocular SLAM method of claim 1, wherein the constructing the first and second monocular SLAMs for the first and second monocular cameras, respectively, comprises:
performing image feature matching on images at different positions acquired by the first monocular camera and the second monocular camera respectively, and segmenting the images acquired by the first monocular camera and the second monocular camera into an image part located in a common view of the first monocular camera and the second monocular camera and an image part located outside the common view respectively;
extracting feature points in image portions of at least one first monocular camera acquired image lying outside the common field of view and feature points in image portions of at least one second monocular camera acquired image lying outside the common field of view and acquiring their relative coordinates with respect to the first monocular camera or the second monocular camera, respectively;
calculating to obtain respective camera pose information of the first monocular camera and the second monocular camera;
acquiring world coordinate system three-dimensional coordinates of feature points in an image portion of the at least one first monocular camera acquired image that is within the common field of view and feature points in an image portion of the at least one second monocular camera acquired image that is within the common field of view; and
respectively constructing the first monocular SLAM and the second monocular SLAM according to the camera pose information of the first monocular camera and the second monocular camera, and the world coordinate system three-dimensional coordinates of the feature points of the image portion, of the image acquired by the at least one first monocular camera, that is positioned in the common visual field and the feature points of the image portion, of the image acquired by the at least one second monocular camera, that is positioned in the common visual field.
4. The method of claim 2 or 3, wherein the constructing the first monocular SLAM and the second monocular SLAM for the first monocular camera and the second monocular camera, respectively, further comprises:
determining a common field of view of the first monocular camera and the second monocular camera according to respective camera parameters of the first monocular camera and the second monocular camera.
5. The heterogeneous binocular SLAM method of claim 4, wherein the separately correcting the scale of the first and second monocular SLAMs using feature points located within a common field of view of the first and second monocular cameras comprises:
acquiring feature points and world coordinate system three-dimensional coordinates thereof in the images at different positions and in the common view field of the first monocular camera and the second monocular camera; and
respectively correcting the scales of the first monocular SLAM and the second monocular SLAM by utilizing the world coordinate system three-dimensional coordinates of the feature points, in the images at the different positions, that are positioned in the common visual field of the first monocular camera and the second monocular camera.
6. The heterogeneous binocular SLAM method of claim 1, wherein the first and second monocular cameras are two cameras with different fields of view.
7. A heterogeneous binocular SLAM device, comprising:
the monocular SLAM building module is used for respectively building a first monocular SLAM and a second monocular SLAM for the first monocular camera and the second monocular camera;
the monocular SLAM correcting module is used for respectively correcting the first monocular SLAM and the second monocular SLAM by utilizing the feature points in the common visual field of the first monocular camera and the second monocular camera; and
the binocular SLAM construction module is used for carrying out data fusion on the corrected first monocular SLAM and the corrected second monocular SLAM to construct a binocular SLAM.
8. The heterogeneous binocular SLAM device of claim 7, wherein the monocular SLAM correction module comprises:
a common viewpoint acquiring unit, configured to acquire feature points in the images at different positions, the feature points being located in a common field of view of the first monocular camera and the second monocular camera;
the co-viewpoint three-dimensional coordinate acquisition unit is used for acquiring the three-dimensional coordinates of the world coordinate system of the co-viewpoint; and
a correcting unit, used for respectively correcting the scales of the first monocular SLAM and the second monocular SLAM by utilizing the three-dimensional coordinates of the world coordinate system of the common viewpoint.
9. An electronic device, comprising:
a processor; and
memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 6.
10. A non-transitory storage medium having stored thereon instructions for performing the method of any one of claims 1-6.
CN201811231493.4A 2018-10-22 2018-10-22 Heterogeneous binocular SLAM method and device and electronic equipment Active CN111089579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811231493.4A CN111089579B (en) 2018-10-22 2018-10-22 Heterogeneous binocular SLAM method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111089579A (en) 2020-05-01
CN111089579B (en) 2022-02-01

Family

ID=70391737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811231493.4A Active CN111089579B (en) 2018-10-22 2018-10-22 Heterogeneous binocular SLAM method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111089579B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104641633A (en) * 2012-10-15 2015-05-20 英特尔公司 System and method for combining data from multiple depth cameras
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN106937531A (en) * 2014-06-14 2017-07-07 奇跃公司 Method and system for producing virtual and augmented reality
CN104535070A (en) * 2014-12-26 2015-04-22 上海交通大学 High-precision map data structure, high-precision map data acquiringand processing system and high-precision map data acquiringand processingmethod
CN105184784A (en) * 2015-08-28 2015-12-23 西交利物浦大学 Motion information-based method for monocular camera to acquire depth information
CN105469405A (en) * 2015-11-26 2016-04-06 清华大学 Visual ranging-based simultaneous localization and map construction method
JP2017224228A (en) * 2016-06-17 2017-12-21 パナソニックIpマネジメント株式会社 Control device, program and recording medium
CN107808407A (en) * 2017-10-16 2018-03-16 亿航智能设备(广州)有限公司 Unmanned plane vision SLAM methods, unmanned plane and storage medium based on binocular camera
CN107945265A (en) * 2017-11-29 2018-04-20 华中科技大学 Real-time dense monocular SLAM method and systems based on on-line study depth prediction network
CN108416840A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of dense method for reconstructing of three-dimensional scenic based on monocular camera
CN108537844A (en) * 2018-03-16 2018-09-14 上海交通大学 A kind of vision SLAM winding detection methods of fusion geological information
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
CN108564617A (en) * 2018-03-22 2018-09-21 深圳岚锋创视网络科技有限公司 Three-dimensional rebuilding method, device, VR cameras and the panorama camera of more mesh cameras

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINGWEI SONG et al.: "MIS-SLAM: Real-Time Large-Scale Dense Deformable SLAM System in Minimal Invasive Surgery Based on Heterogeneous Computing", IEEE Robotics and Automation Letters *
王潇榕 (Wang Xiaorong) et al.: "Real-time scene 3D reconstruction based on monocular SLAM", Agricultural Equipment & Vehicle Engineering *
谢晓佳 (Xie Xiaojia): "A binocular vision SLAM method based on combined point and line features", China Masters' Theses Full-text Database *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112577499A (en) * 2020-11-19 2021-03-30 上汽大众汽车有限公司 VSLAM feature map scale recovery method and system
CN112577499B (en) * 2020-11-19 2022-10-11 上汽大众汽车有限公司 VSLAM feature map scale recovery method and system
CN113030960A (en) * 2021-04-06 2021-06-25 陕西国防工业职业技术学院 Monocular vision SLAM-based vehicle positioning method
CN113030960B (en) * 2021-04-06 2023-07-04 陕西国防工业职业技术学院 Vehicle positioning method based on monocular vision SLAM

Also Published As

Publication number Publication date
CN111089579B (en) 2022-02-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant