CN110807807A

CN110807807A - Monocular vision target positioning pattern, method, device and equipment

Info

Publication number: CN110807807A
Application number: CN201810864207.1A
Authority: CN
Inventors: 熊友军; 郭奎
Original assignee: Ubtech Robotics Corp
Current assignee: Beijing Youbixuan Intelligent Robot Co ltd; Ubtech Robotics Corp
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2020-02-18
Anticipated expiration: 2038-08-01
Also published as: CN110807807B

Abstract

A monocular vision target positioning method comprises the following steps: acquiring a monocular visual color image, and extracting a contour included in the color image; filtering the outlines through the ellipse parameters and the distance between the outlines to obtain a first set of ellipse central points; filtering the elliptic central points in the first set according to whether the number of parallelograms formed by the central points in the first set of the elliptic central points and other adjacent central points meets a preset range or not to obtain a second set with 6 elliptic central points; and calculating a rotation matrix and a translation matrix through a preset perspective projection equation. The target positioning is carried out through the preset target positioning pattern comprising the characteristics of the preset number of parallelograms and the number of the parallelograms met by the central point of the ellipse in the detected image, and the monocular vision target positioning can be accurately and effectively completed.

Description

Monocular vision target positioning pattern, method, device and equipment

Technical Field

The application belongs to the field of target positioning, and particularly relates to a monocular vision target positioning method, device and equipment.

Background

Monocular visual target positioning is used to determine the positional relationship between the target space coordinate system and the camera coordinate system, i.e. includes determining the rotation parameter R and the translation parameter T. The monocular visual target positioning method can be divided into a marked point method and a non-marked point method according to the marked point method and the non-marked point method.

The marking point method is to design a marking structure or pattern with obvious characteristics, for example, the current two-dimensional code is also a marking pattern, and the two-dimensional code is positioned by squares at three corners of the two-dimensional code. The advantage of marking point location is stable, accurate, quick, and the shortcoming is that the design is more troublesome, needs a plane to conveniently paste or place the two-dimensional code in practical application, and this has the limitation in the in-service use.

The method for positioning the non-mark points is to position the targets through the characteristics of the targets, mainly adopts a bag-of-words method and a DL method, wherein the bag-of-words method is to make target characteristic points into bags of words, search the bags of words in the positioning process to find matching points, and finally obtain the orientation of the targets through the mapping from 3D to 2D of coordinates of the matching points. The DL method takes pictures of different orientations of the target and regresses R and T by a machine learning method. The target positioning of the no-mark point scheme is convenient and direct, a mark object does not need to be marked on the target, the attractiveness of the target is not affected, the characteristics of the target are fully utilized, but the characteristics of the target are not a perfect characteristic description method and are easily affected by illumination, angles and distances, so that the no-mark point scheme is unstable and has low precision, a learning process exists, and the complexity is higher than that of the no-mark point scheme.

Disclosure of Invention

In view of this, embodiments of the present application provide a pattern, a method, a device, and an apparatus for monocular vision target positioning, so as to solve the problems of low precision and complex implementation in target positioning in the prior art.

The first aspect of the embodiment of the application provides a monocular vision target positioning pattern, the pattern comprises 6 circles with specified color values, and the circle centers of the 6 circles are sequentially distributed at the central positions of the 2 nd, 5 th, 6 th, 7 th, 8 th and 9 th lattices in the nine-square lattices with preset sizes.

With reference to the first aspect, in a first possible implementation manner of the first aspect, the RGB values of the specified color value are 255, and 255.

A second aspect of embodiments of the present application provides a monocular vision target positioning method based on the monocular vision target positioning pattern of the first aspect, the method including:

acquiring a monocular visual color image, and extracting a contour included in the color image;

filtering the outlines through the ellipse parameters and the distance between the outlines to obtain a first set of ellipse central points;

filtering the elliptic central points in the first set according to whether the number of parallelograms formed by the central points in the first set of the elliptic central points and other adjacent central points meets a preset range or not to obtain a second set with 6 elliptic central points;

and calculating a rotation matrix and a translation matrix through a preset perspective projection equation.

With reference to the second aspect, in a first possible implementation manner of the second aspect, before the step of calculating the rotation matrix and the translation matrix by the predetermined perspective projection equation, the method further includes:

and acquiring three collinear ellipse central points in the second set of the ellipse central points, and determining whether the number of parallelograms formed by the three collinear ellipse central points and the adjacent central points is respectively 1, 3 and 2.

With reference to the second aspect, in a second possible implementation manner of the second aspect, the step of calculating a rotation matrix and a translation matrix by using a preset perspective projection equation includes:

by perspective projection equation:

calculating a rotation matrix R and a translation matrix T, wherein: s is a scaling factor, physical scale of single pixelCun as dX and dY, f as focal length, α_x＝f/dX，α_y(u, v) are image coordinates (X)_W，Y_W，Z_W) For the corresponding spatial physical coordinates, (u0, v0) is the intersection point of the optical axis and the image plane, M1 × M2, M1 is the camera parameter matrix, M2 is the camera parameter matrix, PW is the matrix

With reference to the second aspect, in a third possible implementation manner of the second aspect, the step of extracting the contours included in the color image includes:

graying the color image, and performing binarization processing on the gray image through an automatic threshold algorithm to obtain a binarized image;

determining the contour included in the acquired binary image.

With reference to the second aspect, in a fourth possible implementation manner of the second aspect, the step of filtering the contours by the ellipse parameters and the distances between the contours to obtain the first set of ellipse center points includes:

carrying out ellipse fitting on the outline to obtain one or more ellipse parameters in the perimeter, the area, the inclination angle, the length of the long axis and the length of the short axis of the fitted ellipse;

clustering the ellipse parameters to obtain a set of similar contours;

and filtering the ellipses of which the distances between the center points of the ellipses and the center points of the adjacent ellipses do not accord with a preset distance threshold value to obtain a first set of the center points of the ellipses.

A third aspect of embodiments of the present application provides a monocular vision target positioning device based on the monocular vision target positioning pattern of the first aspect, the device comprising:

the contour extraction unit is used for acquiring a monocular visual color image and extracting contours included in the color image;

the first filtering unit is used for filtering the outlines through the ellipse parameters and the distance between the outlines to obtain a first set of ellipse central points;

the second filtering unit is used for filtering the elliptic central points in the first set according to whether the number of parallelograms formed by the central points in the first set of the elliptic central points and other adjacent central points meets a preset range or not to obtain a second set with 6 elliptic central points;

and the calculation unit is used for calculating a rotation matrix and a translation matrix through a preset perspective projection equation.

A fourth aspect of embodiments of the present application provides an object localization device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the monocular visual object localization method according to any one of the second aspects.

A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the monocular visual target positioning method according to any one of the second aspects.

Compared with the prior art, the embodiment of the application has the advantages that: this application is through setting for the pattern of target location, just the pattern sets for the circle of 6 appointed colour values in the 2 nd, 5, 6, 7, 8, 9 check central point of nine palace check, when the target location of monocular vision, can filter according to the profile that includes in the image that obtains and obtain the first set that constitutes by the ellipse central point, then filter the central point through the parallelogram's that central point and its adjacent central point constitute number, when obtaining the second set of 6 ellipse central points, rotation matrix and balanced matrix are calculated to the perspective projection equation of predetermineeing. The target positioning is carried out through the preset target positioning pattern comprising the characteristics of the preset number of parallelograms and the number of the parallelograms met by the central point of the ellipse in the detected image, and the monocular vision target positioning can be accurately and effectively completed.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.

FIG. 1 is a schematic diagram of a monocular visual target positioning pattern provided by an embodiment of the present application;

fig. 2 is a schematic flow chart illustrating an implementation of a monocular vision based target positioning method of a monocular vision based target positioning pattern according to an embodiment of the present application;

FIG. 3 is a schematic view of a monocular visual target positioning device provided by an embodiment of the present application;

fig. 4 is a schematic diagram of an object locating apparatus provided in an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.

Fig. 1 is a schematic diagram of a monocular vision target positioning pattern provided in an embodiment of the present application. As shown in fig. 1, the pattern includes circles of specified sizes of 6 specified color values, and centers of the 6 circles are sequentially distributed at center positions of 2 nd, 5 th, 6 th, 7 th, 8 th and 9 th grids of a nine-square grid of a predetermined size.

As shown in fig. 1, the radius of the circle can be flexibly set according to the size of the scene to be located. The distance between the center points of two adjacent circles may be twice the radius. The color of the circle may be a designated RGB color value, for example, 255 × 255, and of course, may also be flexibly set according to an environmental parameter of the scene, for example, a brighter color value may be selected along with the darkening of the scene, and a darker color value may be selected along with the lightening of the scene. When the position of the nine-square grid is named from top to bottom and from left to right in sequence, the 6 circles are distributed at the positions of the 2 nd, 5 th, 6 th, 7 th, 8 th and 9 th grids of the nine-square grid respectively. If the direction from the 5 th grid to the 6 th grid is taken as the X-axis direction, the direction from the 5 th grid to the 2 nd grid is taken as the Y-axis direction, and the cross product of the X-axis and the Y-axis is taken as the Z-axis direction, then the coordinates of the target positioning pattern in the application are respectively as follows: a first circle (0,1,0), a second circle (0,0,0), a third circle (1,0,0), a fourth circle (-1, -1,0), a fifth circle (0, -1,0), a sixth circle (1, -1, 0).

Fig. 2 is a schematic diagram illustrating a monocular vision target positioning method and an implementation flow provided in an embodiment of the present application, and as shown in fig. 2, the method includes:

in step S201, a monocular-vision color image is acquired, and a contour included in the color image is extracted;

specifically, a monocular-vision color image, such as an RGB image, may be acquired by a monocular camera. The acquired color image is subjected to graying processing, and then the grayscale image is subjected to binarization processing through an automatic threshold value method, namely the size of the threshold value is automatically adjusted according to the illumination intensity so as to perform binarization processing more accurately. Of course, the binarization processing based on the automatic threshold is only one of the preferred embodiments, and may also include binarization processing of a global threshold, for example. After binarization processing, the contours included in the color image can be obtained. The contour after the binarization process may include other patterns than the target positioning pattern, and therefore, it is necessary to further perform a filtering process to obtain a target positioning pattern.

In step S202, the contours are filtered according to the ellipse parameters and the distances between the contours to obtain a first set of ellipse center points;

after the contours are obtained through binarization processing, the contours can be filtered according to the ellipse parameters and the distance between the contours. When filtering is performed through the ellipse parameters, the method specifically includes:

and carrying out ellipse fitting processing on the outline to obtain an ellipse after each outline is fitted, and acquiring ellipse parameters corresponding to the fitted ellipse, wherein the ellipse parameters can comprise one or more of parameters such as perimeter, area of the ellipse, inclination angle of the ellipse, major axis length of the ellipse, minor axis length of the ellipse and the like.

Of course, the ellipse may be filtered out by further screening whether the perimeter and the area of the fitted ellipse belong to a predetermined range.

After the parameters of the fitted ellipse are obtained, clustering processing can be performed on the ellipses with similar parameters, so that a set with similar contours is obtained. I.e., the set of similar contours, the contours fitting ellipses having one or more of a similar perimeter, a similar area, a similar major axis, a similar minor axis, or a similar inclination.

After a set with similar contours is obtained, further screening can be performed by the distance between the contours. Of course, the step of performing the filtering by the distance of the outline may be performed before the step of performing the filtering by the ellipse parameter, and the amount of calculation of the ellipse fitting may be reduced for a scene having a high similarity of the size of the outline.

The distance between the outlines, that is, the distance between the fitted adjacent ellipses, may be represented as the distance between the center points of the fitted ellipses. Generally, ellipses with a closer distance may be clustered by the distance threshold T0 (2 × d × r)/a, i.e., ellipses with distances between ellipses greater than T0 are filtered out. Wherein d is a distance between preset circles of the pattern for target positioning, r is a radius of the preset circle of the pattern for target positioning, and a is a length of a major axis of the ellipse to be calculated.

In step S203, filtering the ellipse center points in the first set according to whether the number of parallelograms formed by the center points in the first set of the ellipse center points and other adjacent center points thereof satisfies a predetermined range, to obtain a second set with 6 ellipse center points;

after filtering through the ellipse parameters and the ellipse distance, the obtained ellipse can be judged whether to be a pattern for target positioning, and the judgment method can filter the ellipse center point in the first set by judging whether the number of parallelograms formed by the center point in the first set of the ellipse center point and other adjacent center points meets a preset range, and specifically can be:

Specifically, for the target positioning pattern shown in fig. 1, the center point of circle 1 and the center points of circles 2, 3, and 6 form a parallelogram, the center point of circle 2 and the center points of circles 1, 3, and 6, the center points of circles 3, 4, and 5, and the center points of circles 3, 5, and 6 form 3 parallelograms, and so on, so that the number of parallelograms including the center points of circles 1, 2, 3, 4, 5, and 6 is 1, 3, 1, 2, and 2 in turn. Therefore, whether the circle needs to be filtered can be judged by the number of parallelograms formed by the central point of the ellipse in the first set and the central points of the other circles adjacent to the central point.

For example, in the present application, if the number of parallelograms formed by the center point of the circle and the center point of the circle adjacent to the circle is more than 3, or less than 1, the circle is filtered, so that 6 ellipses meeting the requirement are obtained.

Of course, as a preferred embodiment of the present invention, after obtaining 6 ellipses satisfying the requirement, the following screening may be further performed:

If the number of the parallelogram formed by the center points of the collinear three ellipses and the adjacent center points can meet the requirements of 1, 3 and 2 respectively (can be in positive sequence or negative sequence), the 6 ellipse center points can be sequenced, preferably in positive sequence, a coordinate system is established by taking the point with the sequence number "2" in fig. 1 as the center of a circle, and corresponding coordinates are marked on each ellipse center point, for example: the point coordinates from sequence number "1" to sequence number "6" are in order: (0,1,0), (0,0,0), (1,0,0), (-1, -1,0), (0, -1,0), (1, -1,0), and then the 2D images of the 6 ellipse center points are mapped with the 3D position of each corresponding point in the coordinate system, thereby realizing perspective projection calculation. And when the requirements are not met, removing the 6 ellipses by screening.

In step S204, a rotation matrix and a translation matrix are calculated by preset perspective projection equations.

When the calculation is performed by the perspective projection equation, the following formula can be adopted:

calculating a rotation matrix R and a translation matrix T, wherein s is a scaling coefficient, the physical sizes of the single pixels are dX and dY, f is a focal length, α_x＝f/dX，α_y(u, v) are image coordinates (X)_W，Y_W，Z_W) For corresponding space physical coordinates, (u0, v0) is the intersection point of the optical axis and the image plane, or called the image center, M1M 2, M1 is the camera parameter matrix, M2 is the camera parameter matrix, PW is the matrix

And performing corresponding rotation and translation transformation on the calculated rotation matrix and translation matrix so as to adjust the pose of the monocular camera and realize effective positioning on the target.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Fig. 3 is a schematic structural diagram of a monocular vision target positioning device based on a monocular vision target positioning pattern according to an embodiment of the present application, which is detailed as follows:

the monocular visual target positioning device comprises:

an outline extraction unit 301 configured to acquire a monocular-vision color image and extract an outline included in the color image;

a first filtering unit 302, configured to filter the profiles according to the ellipse parameters and the distances between the profiles, so as to obtain a first set of ellipse center points;

the second filtering unit 303 is configured to filter the ellipse center points in the first set according to whether the number of parallelograms formed by the center point in the first set of the ellipse center points and other adjacent center points thereof satisfies a predetermined range, so as to obtain a second set with 6 ellipse center points;

a calculating unit 304, configured to calculate a rotation matrix and a translation matrix through a preset perspective projection equation.

The apparatus for monocular visual target positioning shown in fig. 3 corresponds to the method for monocular visual target positioning shown in fig. 2.

FIG. 4 is a schematic diagram of an object locating device according to an embodiment of the present application. As shown in fig. 4, the object positioning device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42, such as a monocular visual targeting program, stored in said memory 41 and executable on said processor 40. The processor 40, when executing the computer program 42, implements the steps in the above-described embodiments of the monocular visual object localization method, such as the steps 201 to 204 shown in fig. 2. Alternatively, the processor 40, when executing the computer program 42, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 301 to 304 shown in fig. 3.

Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 42 in the object localization device 4. For example, the computer program 42 may be divided such that the specific functions of the units are as follows:

The target positioning device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The target positioning device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that FIG. 4 is merely an example of the object-locating device 4 and is not intended to be limiting with respect to the object-locating device 4, and may include more or less components than shown, or some components in combination, or different components, e.g., the object-locating device may also include input-output devices, network access devices, buses, etc.

The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 41 may be an internal storage unit of the target positioning device 4, such as a hard disk or a memory of the target positioning device 4. The memory 41 may also be an external storage device of the target positioning device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the target positioning device 4. Further, the memory 41 may also comprise both an internal storage unit of the object localization device 4 and an external storage device. The memory 41 is used for storing the computer program and other programs and data required by the object localization device. The memory 41 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. . Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain other components which may be suitably increased or decreased as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media which may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. The pattern for monocular vision target positioning is characterized by comprising 6 circles with specified color values and specified sizes, wherein the centers of the 6 circles are sequentially distributed at the central positions of the No. 2 grid, the No. 5 grid, the No. 6 grid, the No. 7 grid, the No. 8 grid and the No. 9 grid in the Sudoku grids with preset sizes.

2. The monocular visual targeting pattern of claim 1, wherein the RGB values of the assigned color values are 255, 255.

3. A monocular visual target positioning method based on the monocular visual target positioning pattern of claim 1 or 2, the method comprising:

4. The monocular visual target positioning method of claim 3, wherein prior to the step of calculating a rotation matrix and a translation matrix by a predetermined perspective projection equation, the method further comprises:

5. The monocular visual target positioning method of claim 3, wherein the step of calculating a rotation matrix and a translation matrix by a preset perspective projection equation comprises:

by perspective projection equation:

calculating a rotation matrix R and a translation matrix T, wherein s is a scaling coefficient, the physical sizes of the single pixels are dX and dY, f is a focal length, α_x＝f/dX，α_y(u, v) are image coordinates (X)_W，Y_W，Z_W) For the corresponding spatial physical coordinates, (u0, v0) is the intersection point of the optical axis and the image plane, M1 × M2, M1 is the camera parameter matrix, M2 is the camera parameter matrix, PW is the matrix

6. The monocular visual target positioning method of claim 3, wherein the step of extracting the contour included in the color image comprises:

determining the contour included in the acquired binary image.

7. The monocular visual target positioning method of claim 3, wherein the step of filtering the outlines according to the ellipse parameters and the distance between the outlines to obtain the first set of ellipse center points comprises:

clustering the ellipse parameters to obtain a set of similar contours;

8. A monocular visual target positioning device based on the monocular visual target positioning pattern of claim 1 or 2, the device comprising:

9. An object localization device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, carries out the steps of the monocular visual object localization method of any one of claims 3 to 7.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a monocular visual target positioning method according to any one of claims 3 to 7.