CN111325792B - Method, apparatus, device and medium for determining camera pose - Google Patents


Info

Publication number
CN111325792B
Authority
CN
China
Prior art keywords
images
frames
homography matrix
plane
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010075927.7A
Other languages
Chinese (zh)
Other versions
CN111325792A (en)
Inventor
郭亨凯
杜思聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202010075927.7A priority Critical patent/CN111325792B/en
Publication of CN111325792A publication Critical patent/CN111325792A/en
Application granted granted Critical
Publication of CN111325792B publication Critical patent/CN111325792B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose methods, apparatuses, electronic devices, and computer-readable media for determining a camera pose. One embodiment of the method comprises the following steps: extracting feature points from at least two frames of images; determining a homography matrix between each two adjacent frames of images based on the extracted feature points; determining a plane normal vector from the homography matrices between adjacent images; determining a homography matrix between the plane corresponding to the plane normal vector and a target image; determining homography matrices between the at least two frames of images and the plane from the homography matrix between the plane and the target image and the homography matrices between adjacent frames; and processing the homography matrices between the at least two frames of images and the plane to obtain the camera pose corresponding to each of the at least two frames of images. This embodiment determines the camera pose by estimating the plane normal vector and tracking feature points.

Description

Method, apparatus, device and medium for determining camera pose
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, apparatus, device, and computer readable medium for determining a pose of a camera.
Background
The camera pose is the position of the camera in space together with its orientation. It can be seen as the transformation of the camera from an original position to the current position, comprising a translation and a rotation; in other words, it is the position and attitude of a rigid body in space. Currently, camera pose is determined under the assumption that the image plane normal vector is known, by fixing the horizontal plane in the image. In practice, however, the normal vector of the plane in the image is unknown.
Disclosure of Invention
This summary is provided to introduce concepts in a simplified form that are further described below in the detailed description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose methods, apparatuses, devices, and computer-readable media for determining a pose of a camera to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide a method for determining a pose of a camera, the method comprising: extracting feature points of at least two frames of images; determining a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images; determining a plane normal vector according to the homography matrix between the two adjacent frames of images; determining a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image; determining homography matrices between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and processing the homography matrices between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
In a second aspect, some embodiments of the present disclosure provide an apparatus for determining a pose of a camera, the apparatus comprising: an extraction unit configured to extract feature points of at least two frames of images; a first determination unit configured to determine a homography matrix between two adjacent frame images based on the feature points of the at least two frame images; a second determining unit configured to determine a plane normal vector according to a homography matrix between the two adjacent frames of images; a third determining unit configured to determine a homography matrix between a plane corresponding to the planar normal vector and the target image based on a rotation angle of the target image in the at least two frames of images, the planar normal vector, and a translation distance corresponding to the target image; a fourth determination unit configured to determine a homography matrix between the at least two frames of images and a plane, based on a homography matrix between the plane corresponding to the plane normal vector and the target image and a homography matrix between the two adjacent frames of images; and the processing unit is configured to process the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as in any of the first aspects.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any of the first aspects.
One of the above embodiments of the present disclosure has the following beneficial effects: feature points of the at least two frames of images are extracted, and the homography matrix between each two adjacent frames is determined by tracking the feature points. Further, the plane normal vector can be generated automatically from the homography matrices between adjacent images, which solves the problem that the plane normal vector in the at least two frames of images is unknown. The homography matrix between the plane corresponding to the plane normal vector and the target image is then determined. Taking the homography matrix between the plane and the target image as an initial homography matrix and multiplying it by the homography matrices between adjacent frames yields the homography matrices between the at least two frames of images and the plane. Finally, the rotation angle and translation distance are obtained accurately and rapidly through matrix decomposition, thereby determining the pose of the camera.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic illustration of one application scenario of a method for determining a camera pose according to some embodiments of the present disclosure;
FIG. 2 is a flow chart of some embodiments of a method for determining a camera pose according to the present disclosure;
FIG. 3 is a flow chart of still further embodiments of a method for determining a camera pose according to the present disclosure;
FIG. 4 is a schematic structural view of some embodiments of an apparatus for determining a camera pose according to the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present disclosure are shown in the drawings. Embodiments of the present disclosure and features of the embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a," "an," and "one" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic illustration of an application scenario of a method for determining a camera pose according to some embodiments of the present disclosure.
As shown in fig. 1, a server 101 acquires images 102 and extracts a number of feature points 103 from the acquired images. Then, the server 101 determines the homography matrix 104 between each two adjacent frames of images according to the matching relationship between the feature points 103 in the images. Using the homography matrices 104 between adjacent frames as constraint conditions, a plane normal vector 105 can be obtained. Based on the plane normal vector 105 and the preset translation distance 106 and rotation angle 107 between the target image and the plane, the homography matrix 108 between the plane and the target image is obtained. The homography matrices 109 between the plane and the images are then obtained from the homography matrices 104 between adjacent frames and the homography matrix 108 between the plane and the target image. Finally, the homography matrices 109 are subjected to matrix decomposition to obtain the camera pose 110 corresponding to each image.
It is to be understood that the method for determining a camera pose may be performed by a terminal device or by the server 101; the execution body may also be a device formed by integrating a terminal device and the server 101 through a network, or the method may be performed by various software programs. The execution body may thus be embodied as a server, as software, or the like. When the execution body is software, it can be installed in any of the electronic devices enumerated above, and it may be implemented as a plurality of software programs or software modules (for example, for providing distributed services) or as a single software program or module. No particular limitation is imposed here.
It should be understood that the number of servers 101 in fig. 1 is merely illustrative. There may be any number of servers and terminal devices, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a method for determining a camera pose according to the present disclosure is shown. The method for determining the pose of the camera comprises the following steps:
in step 201, feature points of at least two frames of images are extracted.
In some embodiments, an execution body of the method for determining a camera pose (e.g., the server shown in fig. 1) performs feature point extraction on the at least two frames of images that have been acquired. Feature point extraction methods include, but are not limited to, at least one of the following: FAST feature points, Harris corner points, the Kanade-Lucas-Tomasi (KLT) feature tracker, and ORB features. As an example, FAST may be used to extract the feature points of the at least two frames of images.
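The patent names several detectors but gives no implementation. As a minimal, self-contained illustration of corner-based feature extraction, the sketch below computes a simplified Harris corner response with numpy only; it stands in for (and is not equivalent to) the FAST or ORB implementations named above.

```python
import numpy as np

def box3(a):
    # 3x3 box filter via zero padding and shifted sums
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.05):
    # structure-tensor (Harris) corner response; large positive values mark corners
    Iy, Ix = np.gradient(img.astype(float))          # per-axis image gradients
    Ixx, Iyy, Ixy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Ixx * Iyy - Ixy * Ixy
    trace = Ixx + Iyy
    return det - k * trace * trace
```

On a synthetic white square, the response peaks at the square's corners and is non-positive along its edges and in flat regions, which is exactly the property a feature-point extractor exploits.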
Step 202, determining a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images.
In some embodiments, the execution body (e.g., the server shown in fig. 1) may obtain the homography matrix between two adjacent frames of images from the feature points of the at least two frames of images. Homography is a concept in projective geometry, also known as projective transformation. It maps points (represented as three-dimensional homogeneous vectors) on one projective plane to another projective plane and maps straight lines to straight lines; it is therefore line-preserving. In general, a homography is a linear transformation of three-dimensional homogeneous vectors, represented by a 3 x 3 non-singular matrix. The homography matrix between two adjacent frames of images is obtained through a series of mathematical transformations of the matched feature points between the two frames. As an example, at least 4 pairs of matched feature points are extracted from the two adjacent frames.
As an example, based on the matched feature points of two adjacent frames of images, the homography matrix between them can be obtained as follows.

Let p1 and p2 be feature points on the two adjacent images, with coordinates p1 = (x, y) and p2 = (x', y'). If p1 and p2 are a matched point pair and the homography matrix is H with entries h11 ... h33, then in homogeneous coordinates p2 ~ H·p1, which gives:

x' = (h11·x + h12·y + h13) / (h31·x + h32·y + h33)
y' = (h21·x + h22·y + h23) / (h31·x + h32·y + h33)

Each matched pair thus yields two equations. Since H is defined only up to scale, it has 8 free parameters; therefore, to recover the 8 parameters in H, at least 4 pairs of matched points are needed. Selecting 4 pairs from all the matched points at a time, a homography matrix H can be obtained.
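The 4-pair procedure above is commonly solved as a direct linear transform (DLT): each matched pair contributes two rows to a homogeneous linear system whose null vector is H. A minimal numpy sketch (the function name is illustrative, not from the patent):

```python
import numpy as np

def homography_dlt(pts1, pts2):
    """Estimate H (3x3, normalized so h33 = 1) from >= 4 matched point pairs."""
    A = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # each match contributes the two equations derived above
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    # h is the right singular vector of A with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

With exact (noise-free) matches the recovered matrix equals the true homography up to the chosen normalization; with noisy matches the SVD gives the least-squares solution.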
And 203, determining a plane normal vector according to the homography matrix between the adjacent images.
In some embodiments, the plane normal vector is further solved based on the homography matrices between adjacent images obtained in step 202. As an example, the plane normal vector may be the normal vector of the plane in the physical world corresponding to the target image; this plane is the real-world plane recovered from the region in which the target object is displayed in the image. The target image may be an object displayed in each of the adjacent images. As an example, two frames of images can be extracted arbitrarily, and the plane normal vector can be obtained by matrix decomposition using the constraint relation between the two frames, namely the homography matrix between them.
In some optional implementations of some embodiments, determining the plane normal vector according to the homography matrices between adjacent images includes: obtaining the plane normal vector based on the homography matrices between adjacent frames and a least-squares method. At least three frames of images are used; the homography matrices between each two adjacent frames among the at least three frames are combined as constraint conditions, and the plane normal vector is obtained by least squares. This method of determining the plane normal using at least three frames is more accurate than extracting two frames at random.
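The patent does not spell out the least-squares system. One illustrative formulation (an assumption, not necessarily the patented one) takes the per-frame rotations R_i and translations t_i as known, so that the constraint K⁻¹·H_i·K − R_i = t_i·(N/d)ᵀ is linear in m = N/d and has a closed-form least-squares solution:

```python
import numpy as np

def plane_normal_lsq(Hs, Rs, ts, K):
    """Least-squares plane normal from several frame-to-frame homographies,
    assuming the rotations Rs and translations ts are known (illustrative)."""
    Kinv = np.linalg.inv(K)
    num = np.zeros(3)
    den = 0.0
    for H, R, t in zip(Hs, Rs, ts):
        A = Kinv @ H @ K - R          # ideally equals outer(t_i, N/d)
        num += A.T @ t
        den += t @ t
    m = num / den                     # closed-form least squares for m = N/d
    return m / np.linalg.norm(m), 1.0 / np.linalg.norm(m)  # unit normal, distance d
```

Using at least two homographies (three frames) over-constrains m, which is why the multi-frame variant described above is more robust than a single two-frame decomposition.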
Step 204, determining a homography matrix between a plane corresponding to the plane normal vector and the target image based on the rotation angle of the target image in the at least two frames of images, the plane normal vector and a translation distance corresponding to the target image.
In some embodiments, the plane normal vector may be the normal vector of the plane in the physical world corresponding to the target image, and the homography matrix H1 between the plane corresponding to the plane normal vector and the target image is obtained according to the following formula:

H1 = K (R + T · (1/d) · N^T) K^(-1)

where H1 denotes the homography matrix between the plane corresponding to the plane normal vector and the target image, K is the intrinsic parameter matrix of the shooting camera, R is the rotation angle of the target image in the at least two frames of images, N denotes the plane normal vector, d denotes the distance from the plane to the camera, that is, the translation distance corresponding to the target image, and T denotes the translation between adjacent images. Here, the translation distance corresponding to the target image may be set in advance.
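As a numeric sketch of this formula (all values below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical values, for illustration only
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # camera intrinsics
R = np.eye(3)                      # rotation of the target image (identity here)
N = np.array([0.0, 0.0, 1.0])      # plane normal vector
d = 2.0                            # preset plane-to-camera distance
T = np.array([0.1, 0.0, 0.0])      # translation

# H1 = K (R + T * (1/d) * N^T) K^-1
H1 = K @ (R + np.outer(T, N) / d) @ np.linalg.inv(K)
```

Applied to the principal point (320, 240), this H1 shifts it horizontally by f·T_x/d = 500·0.05 = 25 pixels, which matches the intuition that a sideways camera translation slides the imaged plane across the frame.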
And 205, determining the homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images.
In some embodiments, the homography matrix between each of the at least two frames of images and the plane is obtained by multiplying the homography matrix between two adjacent frames by the homography matrix between the plane corresponding to the plane normal vector and the target image. As an example, when the homography matrix between the first frame image and the plane and the homography matrices between adjacent frames are known, the homography matrix between the second frame image and the plane can be obtained by multiplying the homography matrix between the first frame image and the plane by the homography matrix between the first and second frame images. Further, the homography matrix between the third frame image and the plane can be obtained by multiplying the homography matrix between the second and third frame images by the homography matrix between the second frame image and the plane. By analogy, the homography matrix between each of the at least two frames of images and the plane can be determined from the homography matrix between the plane corresponding to the plane normal vector and the target image together with the homography matrices between adjacent frames.
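The chaining step described above reduces to a running matrix product. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def chain_homographies(H_plane_to_first, Hs_between):
    """Accumulate plane-to-frame homographies:
    H_plane_to_k = H_{(k-1) -> k} @ H_plane_to_(k-1)."""
    Hs = [H_plane_to_first]
    for H_step in Hs_between:
        Hs.append(H_step @ Hs[-1])
    return Hs
```

For pure-translation homographies the offsets simply add up, which makes the composition easy to check by hand.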
And 206, processing the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
In some embodiments, the pose of the camera comprises the transformation of the camera from an original position to the current position, including a translation and a rotation. The rotation angle and translation distance of the camera are obtained by matrix decomposition of the homography matrix between each of the at least two frames of images and the plane. The pose of the camera under the current condition is then determined from the rotation angle and translation corresponding to each frame of image.
In some optional implementations of some embodiments, processing the homography matrices to obtain the camera pose includes: performing singular value decomposition (SVD) on the homography matrix between each image and the plane to obtain the rotation angle and translation distance of the camera. The singular value decomposition method is not only simple and effective but also straightforward, and this matrix decomposition is more efficient than other decomposition methods.
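The patent does not detail the decomposition. One standard SVD-based recovery, sketched below under the assumption that the plane lies at z = 0 so that the plane-to-image homography satisfies H ~ K·[r1 r2 t] (a simplified variant, not necessarily the patented procedure):

```python
import numpy as np

def pose_from_plane_homography(H, K):
    """Recover R, t from a plane-to-image homography H ~ K [r1 r2 t]
    (plane at z = 0; illustrative sketch)."""
    M = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(M[:, 0])   # scale fixed by the unit rotation column
    if M[2, 2] < 0:                        # keep the plane in front of the camera
        lam = -lam
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    r3 = np.cross(r1, r2)
    # project [r1 r2 r3] onto the nearest rotation matrix via SVD
    U, _, Vt = np.linalg.svd(np.stack([r1, r2, r3], axis=1))
    return U @ Vt, t
```

The final SVD step is what makes the method robust: even when noise leaves [r1 r2 r3] slightly non-orthogonal, the product U·Vᵀ is the closest true rotation matrix.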
With continued reference to FIG. 3, a flow 300 of further embodiments of a method of determining camera pose is shown. The process 300 of the method of determining camera pose includes the steps of:
step 301, extracting feature points of at least two frames of images.
In some embodiments, the specific implementation of step 301 and the technical effects thereof may refer to step 201 in the embodiment corresponding to fig. 2, which is not described herein again.
Step 302, determining a homography matrix between two adjacent frames of images based on the feature points and the rotation angles of the at least two frames of images.
In some embodiments, according to the feature points of the at least two frames of images, the homography matrix between two adjacent frames reflects the mapping relationship of the same plane in the two frames. This provides a basis for obtaining the homography matrices between the at least two frames of images and the plane. Here, taking the rotation angle of the at least two frames of images as a known condition increases the accuracy of the homography matrix obtained between two adjacent frames.
And step 303, determining a plane normal vector according to the homography matrix between the adjacent images.
Step 304, determining a homography matrix between a plane corresponding to the plane normal vector and the target image based on the rotation angle of the target image in the at least two frames of images, the plane normal vector and a translation distance corresponding to the target image.
And 305, determining the homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images.
And 306, processing the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
In some embodiments, the specific implementation of steps 303-306 and the technical effects thereof may refer to steps 203-206 in the corresponding embodiment of fig. 2, which are not described herein.
Step 307, determining a movement mode of the camera based on the pose of the camera corresponding to the at least two frames of images.
In some embodiments, by recording the camera poses of the at least two frames of images, scene restoration can be performed according to those poses. Combining the scenes corresponding to the camera poses of the at least two frames of images yields the movement mode of the camera.
With continued reference to fig. 4, as an implementation of the method described above for the various figures, the present disclosure provides some embodiments of an apparatus for determining camera pose, which apparatus embodiments correspond to those described above for fig. 2, and which apparatus is particularly applicable in a variety of electronic devices.
As shown in fig. 4, an apparatus 400 for determining a camera pose of some embodiments includes: an extraction unit 401 configured to extract feature points of at least two frame images; a first determining unit 402 configured to determine a homography matrix between two adjacent frame images based on the feature points of the at least two frame images; a second determining unit 403 configured to determine a planar normal vector based on the homography matrix between the adjacent images; a third determining unit 404 configured to determine a homography matrix between a plane corresponding to the planar normal vector and the target image based on a rotation angle of the target image in the at least two frames of images, the planar normal vector, and a translation distance corresponding to the target image; a fourth determining unit 405 configured to determine a homography matrix between the at least two frames of images and a plane according to a homography matrix between the plane corresponding to the plane normal vector and the target image and a homography matrix between the two adjacent frames of images; and a processing unit 406 configured to process the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
In some optional implementations of some embodiments, the second determining unit 403 is further configured to: and obtaining the plane normal vector based on a homography matrix and a least square method between two adjacent frames of images.
In some alternative implementations of some embodiments, the processing unit 406 is further configured to: singular value decomposition is carried out on the homography matrix between the image and the plane, so as to obtain the rotation angle and the translation distance of the camera.
In some optional implementations of some embodiments, the translation distance corresponding to the target image is preset.
In some optional implementations of some embodiments, the first determining unit 402 is further configured to: determine the homography matrix between two adjacent frames of images based on the feature points and rotation angles of the at least two frames of images.
in some alternative implementations of some embodiments, the apparatus 400 may further include: a fifth determining unit (not shown in the figure) may be configured to determine a movement pattern of the camera based on the pose of the camera corresponding to the at least two frames of images.
It will be appreciated that the elements described in the apparatus 400 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 400 and the units contained therein, and are not described in detail herein.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., server in fig. 1) 500 suitable for use in implementing some embodiments of the present disclosure is shown. The server illustrated in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 5 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communications device 509, or from the storage device 508, or from the ROM 502. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, by contrast, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium, other than a computer readable storage medium, that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be included in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: extract feature points of at least two frames of images; determine a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images; determine a plane normal vector according to the homography matrix between the two adjacent frames of images; determine a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image; determine a homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and process the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an extraction unit, a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, and a processing unit. In some cases, the names of these units do not limit the units themselves; for example, the extraction unit may also be described as "a unit that extracts feature points of at least two frames of images".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In accordance with one or more embodiments of the present disclosure, there is provided a method of determining a camera pose, comprising: extracting feature points of at least two frames of images; determining a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images; determining a plane normal vector according to the homography matrix between the two adjacent frames of images; determining a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image; determining a homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and processing the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
According to one or more embodiments of the present disclosure, the determining a plane normal vector according to the homography matrix between the two adjacent frames of images includes: obtaining the plane normal vector based on the homography matrices between adjacent frames of images and a least squares method.
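Although the disclosure does not give formulas, the least-squares step can be illustrated as follows. A plane-induced homography between two views satisfies H ≈ R + t·mᵀ, where m = n/d is the plane normal scaled by the inverse plane distance. If the per-pair rotations R and translations t are treated as known (the embodiments use per-frame rotation angles), each adjacent-pair homography contributes nine linear equations in the three unknowns of m, and stacking the constraints from several pairs yields an overdetermined system. The following numpy sketch is illustrative only (all function names are assumptions, not part of the patent):

```python
import numpy as np

def rot_x(a):
    """Rotation about the x-axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0, 0], [0, c, -s], [0, s, c]])

def estimate_plane_normal(homographies, rotations, translations):
    """Least-squares estimate of m = n/d from plane-induced homographies.

    Each pair satisfies H_i = R_i + t_i m^T, so (H_i - R_i) = t_i m^T
    gives nine linear equations in the three unknowns of m.
    """
    A_blocks, b_blocks = [], []
    for H, R, t in zip(homographies, rotations, translations):
        # flatten(t m^T)[3*r + c] = t[r] * m[c]  ->  block A = kron(t, I_3)
        A_blocks.append(np.kron(t.reshape(3, 1), np.eye(3)))
        b_blocks.append((H - R).ravel())
    A = np.vstack(A_blocks)
    b = np.concatenate(b_blocks)
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m  # n/d; normalize to unit length to get the plane normal itself

# Synthetic check: build homographies from a known plane and recover its normal.
true_m = np.array([0.0, 0.0, 1.0]) / 2.0          # normal (0,0,1), distance d = 2
Rs = [rot_x(0.01 * i) for i in range(1, 4)]
ts = [np.array([0.1 * i, 0.05, 0.02]) for i in range(1, 4)]
Hs = [R + np.outer(t, true_m) for R, t in zip(Rs, ts)]
m_est = estimate_plane_normal(Hs, Rs, ts)
```

With noisy homographies the same stacked system is solved in the least-squares sense, which is one reason the claims use at least three frames: two or more adjacent pairs over-constrain the normal.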
According to one or more embodiments of the present disclosure, the translation distance corresponding to the target image is preset.
According to one or more embodiments of the present disclosure, the processing the homography matrix to obtain a camera pose includes: performing singular value decomposition on the homography matrix between the at least two frames of images and the plane to obtain the rotation angle and the translation distance of the camera.
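For intuition, when a homography maps points on the world plane (taken as z = 0) into an image, it factors as H ∝ K[r1 r2 t], and the rotation assembled from its columns is generally not exactly orthogonal, so singular value decomposition is used to project it onto the nearest rotation matrix. The numpy sketch below shows this standard recovery under stated assumptions; the intrinsic matrix K and all names are illustrative, since the patent does not spell out the computation:

```python
import numpy as np

def pose_from_plane_homography(H, K):
    """Recover camera rotation R and translation t from a plane-to-image
    homography H ~ K [r1 r2 t] (world plane z = 0), using SVD to project
    the estimated rotation onto the nearest orthogonal matrix."""
    A = np.linalg.inv(K) @ H
    scale = 1.0 / np.linalg.norm(A[:, 0])          # fix the projective scale
    if A[2, 2] < 0:                                 # keep the plane in front of the camera
        scale = -scale
    r1, r2, t = scale * A[:, 0], scale * A[:, 1], scale * A[:, 2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R_approx)              # nearest rotation (orthogonal Procrustes)
    R = U @ Vt
    if np.linalg.det(R) < 0:                        # enforce det(R) = +1
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R, t

# Synthetic check: build H from a known pose and recover it (up to homography scale).
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
a = 0.1
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0.0,        0.0,       1]])
t_true = np.array([0.2, -0.1, 2.0])
H = K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = pose_from_plane_homography(3.7 * H, K)  # arbitrary scale on H
```

The SVD step is what makes the result a valid rotation even when the homography is estimated from noisy correspondences.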
According to one or more embodiments of the present disclosure, the determining a homography matrix between two adjacent frames of images includes: determining a homography matrix between two adjacent frames of images based on the feature points and the rotation angles of the at least two frames of images.
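A common way to estimate an inter-frame homography from matched feature points is the direct linear transform (DLT): each correspondence contributes two linear equations in the nine entries of H, and the solution is the null vector of the stacked system, obtained by SVD. A minimal noise-free numpy sketch follows; it is illustrative only (a practical implementation would normalize coordinates and use robust estimation such as RANSAC, and the patent's variant additionally incorporates rotation angles):

```python
import numpy as np

def homography_dlt(pts_a, pts_b):
    """Estimate H such that pts_b ~ H @ pts_a (in homogeneous coordinates)
    from >= 4 point matches, via the direct linear transform."""
    rows = []
    for (x, y), (u, v) in zip(pts_a, pts_b):
        # Two equations per correspondence, linear in the entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)        # null vector of A, defined up to scale
    return H / H[2, 2]              # normalize so that H[2, 2] = 1

def apply_h(H, pts):
    """Apply a homography to inhomogeneous 2-D points."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:]

# Synthetic check: map points with a known H and recover it from the matches.
H_true = np.array([[1.1, 0.02, 5.0], [-0.03, 0.98, -2.0], [1e-4, 2e-4, 1.0]])
pts = np.array([[0.0, 0], [100, 0], [100, 80], [0, 80], [50, 40], [20, 60]])
H_est = homography_dlt(pts, apply_h(H_true, pts))
```

Four matches give an exact solution; using more matches, as here, over-constrains the system and the SVD returns the best fit in the least-squares sense.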
According to one or more embodiments of the present disclosure, the above method further comprises: determining a movement mode of the camera based on the pose of the camera corresponding to the at least two frames of images.
According to one or more embodiments of the present disclosure, there is provided an apparatus for determining a pose of a camera, comprising: an extraction unit configured to extract feature points of at least two frames of images; a first determining unit configured to determine a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images; a second determining unit configured to determine a plane normal vector according to the homography matrix between the two adjacent frames of images; a third determining unit configured to determine a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image; a fourth determining unit configured to determine a homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and a processing unit configured to process the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images.
According to one or more embodiments of the present disclosure, the second determining unit is further configured to: obtain the plane normal vector based on the homography matrices between adjacent frames of images and a least squares method.
According to one or more embodiments of the present disclosure, the translation distance corresponding to the target image is preset.
According to one or more embodiments of the present disclosure, the processing unit is further configured to: perform singular value decomposition on the homography matrix between the at least two frames of images and the plane to obtain the rotation angle and the translation distance of the camera.
According to one or more embodiments of the present disclosure, the first determining unit is further configured to: determine a homography matrix between two adjacent frames of images based on the feature points and the rotation angles of the at least two frames of images.
According to one or more embodiments of the present disclosure, the apparatus further comprises: a fifth determining unit configured to determine a movement mode of the camera based on the pose of the camera corresponding to the at least two frames of images.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, there is provided a computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements a method as described in any of the embodiments above.

Claims (12)

1. A method for determining a pose of a camera, comprising:
extracting feature points of at least two frames of images;
determining a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images;
determining a plane normal vector according to the homography matrix between the two adjacent frames of images; wherein the determining a plane normal vector according to the homography matrix between the two adjacent frames of images comprises:
taking the homography matrices between each two adjacent images among at least three frames of images as constraints, and determining the plane normal vector by a least squares method;
determining a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image;
determining a homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and
processing the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images, the pose of the camera comprising a translational transformation and a rotational transformation.
2. The method of claim 1, wherein the translation distance corresponding to the target image is preset.
3. The method of claim 1, wherein the processing the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images includes:
performing singular value decomposition on the homography matrix between the at least two frames of images and the plane to obtain the rotation angle and the translation distance of the camera.
4. The method of claim 1, wherein the determining a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images comprises:
determining a homography matrix between two adjacent frames of images based on the feature points and the rotation angles of the at least two frames of images.
5. The method of any of claims 1-4, wherein the method further comprises:
determining a movement mode of the camera based on the pose of the camera corresponding to the at least two frames of images.
6. An apparatus for determining a pose of a camera, comprising:
an extraction unit configured to extract feature points of at least two frames of images;
a first determining unit configured to determine a homography matrix between two adjacent frames of images based on the feature points of the at least two frames of images;
a second determining unit configured to determine a plane normal vector according to the homography matrix between the two adjacent frames of images; wherein the determining a plane normal vector according to the homography matrix between the two adjacent frames of images comprises:
taking the homography matrices between each two adjacent images among at least three frames of images as constraints, and determining the plane normal vector by a least squares method;
a third determining unit configured to determine a homography matrix between a plane corresponding to the plane normal vector and a target image based on a rotation angle of the target image in the at least two frames of images, the plane normal vector, and a translation distance corresponding to the target image;
a fourth determining unit configured to determine a homography matrix between the at least two frames of images and the plane according to the homography matrix between the plane corresponding to the plane normal vector and the target image and the homography matrix between the two adjacent frames of images; and
a processing unit configured to process the homography matrix between the at least two frames of images and the plane to obtain the pose of the camera corresponding to the at least two frames of images, the pose of the camera comprising a translational transformation and a rotational transformation.
7. The apparatus of claim 6, wherein the translation distance corresponding to the target image is preset.
8. The apparatus of claim 6, wherein the processing unit is further configured to:
perform singular value decomposition on the homography matrix between the at least two frames of images and the plane to obtain the rotation angle and the translation distance of the camera.
9. The apparatus of claim 6, wherein the first determination unit is further configured to:
determine a homography matrix between two adjacent frames of images based on the feature points and the rotation angles of the at least two frames of images.
10. The apparatus according to any one of claims 6-9, wherein the apparatus further comprises:
a fifth determining unit configured to determine a movement mode of the camera based on the pose of the camera corresponding to the at least two frames of images.
11. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
12. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-5.
CN202010075927.7A 2020-01-23 2020-01-23 Method, apparatus, device and medium for determining camera pose Active CN111325792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010075927.7A CN111325792B (en) 2020-01-23 2020-01-23 Method, apparatus, device and medium for determining camera pose


Publications (2)

Publication Number Publication Date
CN111325792A CN111325792A (en) 2020-06-23
CN111325792B (en) 2023-09-26

Family

ID=71167063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010075927.7A Active CN111325792B (en) 2020-01-23 2020-01-23 Method, apparatus, device and medium for determining camera pose

Country Status (1)

Country Link
CN (1) CN111325792B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837373B (en) * 2021-03-03 2024-04-26 福州视驰科技有限公司 Multi-camera pose estimation method without feature point matching
CN113129211B (en) * 2021-06-16 2021-08-17 贝壳技术有限公司 Optical center alignment detection method and device, storage medium and electronic equipment
CN113643365B (en) * 2021-07-07 2024-03-19 紫东信息科技(苏州)有限公司 Camera pose estimation method, device, equipment and readable storage medium
CN114170325A (en) * 2021-12-14 2022-03-11 北京地平线信息技术有限公司 Method, apparatus, medium, device and program product for determining a homography matrix

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106023086A (en) * 2016-07-06 2016-10-12 中国电子科技集团公司第二十八研究所 Aerial photography image and geographical data splicing method based on ORB feature matching
CN106651950A (en) * 2016-12-19 2017-05-10 华中科技大学无锡研究院 Single-camera pose estimation method based on perspective projection invariance of quadratic curve
CN110648363A (en) * 2019-09-16 2020-01-03 腾讯科技(深圳)有限公司 Camera posture determining method and device, storage medium and electronic equipment
CN110717936A (en) * 2019-10-15 2020-01-21 哈尔滨工业大学 Image stitching method based on camera attitude estimation

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP6464934B2 (en) * 2015-06-11 2019-02-06 富士通株式会社 Camera posture estimation apparatus, camera posture estimation method, and camera posture estimation program


Non-Patent Citations (2)

Title
Anubhav Agarwal et al. "A Survey of Planar Homography Estimation Techniques". cvit.iiit.ac.in, 2005. Entire document. *
Brook-icv. "Introduction to SLAM — Visual Odometry: The Homography Matrix". https://www.cnblogs.com/wangguchangqing/p/8287585.html, 2018. Entire article. *

Also Published As

Publication number Publication date
CN111325792A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN111325792B (en) Method, apparatus, device and medium for determining camera pose
CN108492364B (en) Method and apparatus for generating image generation model
CN109741388B (en) Method and apparatus for generating a binocular depth estimation model
CN113255619B (en) Lane line recognition and positioning method, electronic device, and computer-readable medium
CN110728622B (en) Fisheye image processing method, device, electronic equipment and computer readable medium
CN109754464B (en) Method and apparatus for generating information
CN110111241B (en) Method and apparatus for generating dynamic image
CN114494388A (en) Three-dimensional image reconstruction method, device, equipment and medium in large-view-field environment
CN115393815A (en) Road information generation method and device, electronic equipment and computer readable medium
CN113327318A (en) Image display method, image display device, electronic equipment and computer readable medium
CN112150491B (en) Image detection method, device, electronic equipment and computer readable medium
CN111915532B (en) Image tracking method and device, electronic equipment and computer readable medium
CN109816791B (en) Method and apparatus for generating information
CN115002345B (en) Image correction method, device, electronic equipment and storage medium
CN115170395A (en) Panoramic image stitching method, panoramic image stitching device, electronic equipment, panoramic image stitching medium and program product
CN113688928B (en) Image matching method and device, electronic equipment and computer readable medium
CN112880675B (en) Pose smoothing method and device for visual positioning, terminal and mobile robot
CN112308809B (en) Image synthesis method, device, computer equipment and storage medium
CN115082516A (en) Target tracking method, device, equipment and medium
CN111768443A (en) Image processing method and device based on mobile camera
CN115841151B (en) Model training method, device, electronic equipment and computer readable medium
CN112884794B (en) Image generation method, device, electronic equipment and computer readable medium
CN110310251B (en) Image processing method and device
CN116188583B (en) Method, device, equipment and computer readable medium for generating camera pose information
CN111738899B (en) Method, apparatus, device and computer readable medium for generating watermark

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant