CN114299230A - Data generation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114299230A
Authority
CN
China
Prior art keywords
image
dimensional
target object
target
dimensional image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111570104.2A
Other languages
Chinese (zh)
Inventor
丁华杰
朱元豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Automotive Innovation Co Ltd
Original Assignee
China Automotive Innovation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Automotive Innovation Co Ltd filed Critical China Automotive Innovation Co Ltd
Priority to CN202111570104.2A priority Critical patent/CN114299230A/en
Publication of CN114299230A publication Critical patent/CN114299230A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of visual reconstruction and discloses a data generation method, a data generation device, an electronic device, and a storage medium. The data generation method generates a two-dimensional image by directly constructing a three-dimensional model of a target object with a contour line, replacing the corresponding object in a three-dimensional image reconstructed from an acquired image with that model, and projecting the updated three-dimensional image; the resulting two-dimensional image contains an object identifier. The data generation method provided by the application is characterized by high labeling accuracy and low generation cost.

Description

Data generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of visual reconstruction technologies, and in particular, to a data generation method and apparatus, an electronic device, and a storage medium.
Background
Typically, when training an autonomous driving model, the data is divided into a training set and a test set. The training set is used to fit the model, i.e., the model is trained with the chosen parameters. The test set is used, after several models have been trained on the training set, to find the model that performs best: each model predicts on the held-out data and its accuracy is recorded; the parameters corresponding to the best-performing model are then selected, which is how the model parameters are tuned.
At present, data sets for autonomous driving models are mainly produced by manually labeling collected pictures or videos; however, the cost of labeling data is high, and adequately training an autonomous driving model generally requires investing tens of millions in manual annotation.
The other approach replaces manual labor with a deep neural network model for labeling; however, such a solution has two drawbacks: 1) the labeling accuracy of a deep learning model is still lower than that of manual annotation, especially on hard examples; 2) the data originates from the real world, so some edge cases (i.e., rare scenes) hardly ever appear in a collected data set.
Disclosure of Invention
The invention aims to solve the technical problems of high labeling cost and low accuracy in the prior art.
To solve the above technical problem, the present application discloses in one aspect a data generating method, including:
acquiring an image of a driving scene;
performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image;
determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person;
generating a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line;
replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generating an updated three-dimensional image;
the updated three-dimensional image is projected into a two-dimensional image, which contains the object identification.
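For orientation, the sketch below strings these steps together in Python pseudocode. Every helper name in it (reconstruct_3d, detect_targets, build_contour_model, replace_object, project_to_2d) is a hypothetical placeholder for an operation specified later in this description, not an API disclosed by the application.

```python
# A minimal sketch of the claimed pipeline; all helpers are hypothetical
# stand-ins for the steps detailed below.

def generate_labeled_image(image):
    # Steps 1-2: 3D reconstruction yields a 3D scene plus the 3D
    # coordinates of its feature points (the position coordinate set).
    scene_3d, position_coords = reconstruct_3d(image)

    # Step 3: find vehicles and/or persons (first target objects).
    for first_target in detect_targets(scene_3d):
        # Step 4: fetch a matching 3D model and add a contour frame.
        second_target = build_contour_model(first_target)
        # Step 5: swap the detected object for the contoured model.
        scene_3d = replace_object(scene_3d, first_target, second_target,
                                  position_coords)

    # Step 6: project back to 2D; the contour frames become object labels.
    return project_to_2d(scene_3d)
```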
Optionally, the generating a second target object based on the first target object includes:
determining identification information of the first target object;
determining a target three-dimensional model from the gallery based on the identification information;
and carrying out outline frame adding operation on the target three-dimensional model to obtain the second target object.
Optionally, the replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generating an updated three-dimensional image includes:
determining the position coordinates of the first target object from the three-dimensional image based on the position coordinate set;
removing a first target object in the three-dimensional image to generate a target three-dimensional image;
determining the position coordinates of the first target object as the position coordinates of the second target object;
and adding the second target object into the target three-dimensional image based on the position coordinate of the second target object to obtain the updated three-dimensional image.
Optionally, the acquiring the image of the driving scene includes:
acquiring an image of the driving scene by using a binocular camera; the binocular camera comprises a first camera and a second camera;
the image comprises a first image captured by the first camera and a second image captured by the second camera; the first image and the second image are acquired at the same time.
Optionally, the performing a three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set includes:
determining a first data set based on the first image; the first data set includes pixel coordinates of each of a plurality of first feature points belonging to the first image;
determining a second data set based on the second image, wherein the second data set comprises pixel coordinates of each second feature point in a plurality of second feature points, and the second feature points belong to the second image;
determining internal and external parameters of the first camera and internal and external parameters of the second camera; the internal parameters of the first camera are equal to the internal parameters of the second camera;
determining the set of position coordinates based on the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set and the second data set;
and performing three-dimensional reconstruction operation on the image based on the position coordinate set to obtain the three-dimensional image.
Optionally, the determining the position coordinate set based on the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set, and the second data set includes:
determining epipolar constraint conditions according to the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set and the second data set;
performing feature matching on the first data set and the second data set to obtain a feature point pair set; the feature point pair set comprises a plurality of feature point pairs; each of the plurality of feature point pairs includes the corresponding first feature point and second feature point;
screening out a target characteristic point pair set from the characteristic point pair set based on the epipolar constraint condition;
determining a target characteristic point set from the first data set based on the target characteristic point pair set; the set of target feature points includes pixel coordinates of each of a plurality of target first feature points;
determining the target three-dimensional coordinate set according to the target feature point set and the internal and external parameters of the first camera, wherein the target three-dimensional coordinate set comprises the three-dimensional coordinates of each target first feature point;
the target three-dimensional coordinate set is determined as the position coordinate set.
Optionally, the acquiring the image of the driving scene includes:
acquiring an image of a driving scene by using acquisition equipment;
the three-dimensional reconstruction operation is performed on the image to obtain a three-dimensional image and a position coordinate set, and the method comprises the following steps:
acquiring the position information of the acquisition equipment;
determining a point cloud data set corresponding to the image from a high-precision map based on the position information of the acquisition equipment and the image;
the three-dimensional image and the set of location coordinates are generated based on the point cloud dataset.
The present application also discloses in another aspect a data generating apparatus, comprising:
the acquisition module is used for acquiring an image of a driving scene;
the reconstruction module is used for performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image;
a determining module for determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person;
the target object generating module is used for generating a second target object based on the first target object, and the second target object is a three-dimensional model with a contour line;
a three-dimensional image generation module, configured to replace the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generate an updated three-dimensional image;
and the projection module is used for projecting the updated three-dimensional image into a two-dimensional image, and the two-dimensional image contains an object identifier.
The present application also discloses in another aspect an electronic device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the data generating method described above.
The present application also discloses a computer storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the data generating method.
By adopting the technical scheme, the data generation method provided by the application has the following beneficial effects:
the method comprises the steps of obtaining an image of a driving scene, and performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image; determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person; subsequently generating a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line; replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generating an updated three-dimensional image; the updated three-dimensional image is projected into a two-dimensional image, which contains the object identification. The data generation method provided by the application can generate the two-dimensional image by directly constructing the three-dimensional model of the target object with the contour line, replacing the object in the original three-dimensional image with the three-dimensional model and projecting the object, so that the generated two-dimensional image has the advantages of high graphic definition, high annotation accuracy and low generation cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram of an alternative application scenario of the present application;
FIG. 2 is a schematic flow chart diagram of a first alternative data generation method of the present application;
FIG. 3 is a schematic illustration of an alternative three-dimensional image provided herein;
FIG. 4 is a schematic flow chart diagram of a second alternative data generation method of the present application;
FIG. 5 is a schematic flow chart diagram of a third alternative data generation method of the present application;
fig. 6 is an alternative binocular camera model provided herein;
FIG. 7 is a schematic illustration of an alternative updated three-dimensional image provided by the present application;
FIG. 8 is a two-dimensional image labeled in the prior art;
FIG. 9 is a model of an alternative three-dimensional image of the present application projected as a two-dimensional image;
FIG. 10 is a schematic diagram of an alternative process for converting three-dimensional coordinates to pixel coordinates provided herein;
FIG. 11 is a schematic diagram of an alternative data generating device according to the present application;
fig. 12 is a block diagram of a hardware structure of a server according to an alternative data generation method of the present application.
The following is a supplementary description of the drawings:
10-a terminal; 20-data generation module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, fig. 1 is an alternative application scenario diagram of the present application. The scene comprises a terminal 10 and a data generation module 20 positioned on the terminal 10, wherein the data generation module 20 is used for acquiring an image of a driving scene and performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image; determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person; subsequently generating a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line; replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generating an updated three-dimensional image; the updated three-dimensional image is projected into a two-dimensional image, which contains the object identification. Therefore, the generated two-dimensional image has the advantages of high graphic definition, high labeling accuracy and low generation cost.
Optionally, the terminal may be a desktop computer, a notebook computer, a mobile phone, a tablet computer, a digital assistant, a smart wearable device, or another type of physical device; the smart wearable device may include a smart bracelet, a smart watch, smart glasses, a smart helmet, and the like.
The terminal may include a display screen, a storage device, and a processor connected by a data bus. The display screen is used to display the virtual image of the equipment to be monitored and the connection relations among the sub-devices within it, and may be the touch screen of a mobile phone or tablet computer. The storage device is used to store program code, data, and data from the camera, and may be the terminal's memory or a storage device such as a smart media card, a secure digital (SD) card, or a flash card. The processor may be a single-core or multi-core processor.
Specific embodiments of the data generation method of the present application are described below. Fig. 2 is a schematic flow chart of a first alternative data generation method of the present application. The present specification provides the method operation steps as in the embodiments or the flow chart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In practice, the system or server product may execute the steps sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 2, the method may include:
s201: an image of a driving scene is acquired.
In one possible implementation, step S201 can be specifically described as: and acquiring an image of the driving scene by using the acquisition equipment.
Optionally, the acquisition device includes a high-definition camera, a Global Positioning System (GPS) receiver, an Inertial Measurement Unit (IMU), a lidar, an ultrasonic radar, and a millimeter-wave radar.
The acquisition device is mounted on an acquisition vehicle, which must also carry data acquisition hardware such as an industrial personal computer, a video capture card, and Ethernet equipment; some acquisition vehicles with a real-time upload function may additionally carry a telematics box (T-Box).
Optionally, the camera includes a monocular camera and a binocular camera.
It should be noted that, in the embodiments of the present application, the acquisition vehicle may be fitted with the corresponding acquisition devices according to the type of data required subsequently.
In one possible implementation, step S201 can be specifically described as: acquiring an image of the driving scene by using a binocular camera; the binocular camera comprises a first camera and a second camera, the image comprising a first image captured by the first camera and a second image captured by the second camera; the first image and the second image are acquired at the same time. So that a three-dimensional reconstruction can subsequently be realized on the basis of the first image and the second image.
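As a concrete illustration of capturing such an image pair, the following is a minimal sketch assuming two USB cameras exposed as OpenCV devices 0 and 1; real acquisition rigs usually rely on hardware triggering for exact synchronization.

```python
import cv2

left_cam = cv2.VideoCapture(0)   # first camera of the binocular pair
right_cam = cv2.VideoCapture(1)  # second camera

# grab() buffers both frames as close together in time as this API allows,
# then retrieve() decodes them, approximating "acquired at the same time".
left_cam.grab()
right_cam.grab()
ok1, first_image = left_cam.retrieve()
ok2, second_image = right_cam.retrieve()
```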
S202: performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image.
Optionally, referring to fig. 3, fig. 3 is a schematic diagram of an alternative three-dimensional image provided in the present application.
There are multiple ways of performing the three-dimensional reconstruction based on images in step S202, such as binocular reconstruction, monocular reconstruction, and three-dimensional reconstruction based on a high-precision map. Monocular reconstruction performs three-dimensional reconstruction from a series of images over the time domain and is also called dynamic stereo vision. According to the required real-time performance, monocular reconstruction can be divided into offline reconstruction and online reconstruction; offline reconstruction includes the Structure From Motion (SFM) technique, which reconstructs a three-dimensional environment from successive images acquired over a period of time.
Online reconstruction can be classified into progressive reconstruction and direct reconstruction. Progressive reconstruction continuously fuses the image at the next moment with the three-dimensional information obtained so far, similar in spirit to Kalman filtering; since it is essentially a depth reconstruction, it is also called depth filtering. Direct reconstruction uses images from a number of moments (generally several to dozens of frames) to complete the three-dimensional reconstruction of the same scene in one pass. Direct reconstruction is also called depth fusion; it is somewhat similar to SFM, differing in that fewer images are involved in the computation, so its real-time performance is higher.
Optionally, in the embodiment of the present application, explanation is mainly performed based on binocular reconstruction and three-dimensional reconstruction of a high-precision map. In one possible implementation, referring to fig. 4, fig. 4 is a schematic flow chart of a second alternative data generation method according to the present application. Step S202 may be specifically stated as:
s401: determining a first data set based on the first image; the first data set includes pixel coordinates of each of a plurality of first feature points belonging to the first image.
S402: a second data set is determined based on the second image, the second data set including pixel coordinates of each of a plurality of second feature points belonging to the second image.
S403: determining internal and external parameters of the first camera and internal and external parameters of the second camera; the internal parameters of the first camera are equal to the internal parameters of the second camera.
Optionally, in this embodiment of the application, before the first camera and the second camera are used for image acquisition, the two cameras may first be calibrated, for example using Zhang Zhengyou's camera calibration method, so as to obtain the internal and external parameters of the first camera and the internal and external parameters of the second camera.
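A minimal sketch of Zhang's calibration with OpenCV is shown below, assuming a set of chessboard images per camera and a 9x6 inner-corner pattern; the file names are illustrative placeholders.

```python
import cv2
import numpy as np

pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in ["calib_01.png", "calib_02.png"]:  # hypothetical file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K holds the internal parameters; rvecs/tvecs the per-view external parameters.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```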
S404: the set of position coordinates is determined based on the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set, and the second data set.
In one possible implementation, referring to fig. 5, fig. 5 is a schematic flow chart of a third optional data generation method according to the present application. Step S404 may be specifically stated as:
S501: determining epipolar constraint conditions according to the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set and the second data set.
Optionally, the epipolar constraint may be:

$(x_2' \times T_2)^{\top} R_2\, x_1' = 0$ …… formula (1)

wherein $x_1' = K^{-1}x_1$ and $x_2' = K^{-1}x_2$; $x_1$ is the pixel coordinate of a first feature point in the first data set; $x_2$ is the pixel coordinate of a second feature point in the second data set; $x_2' \times T_2$ is the three-dimensional vector obtained by taking the outer product of $x_2'$ and $T_2$, and this vector is perpendicular to both $x_2'$ and $T_2$; $R_2$ is the rotation parameter in the external parameters of the second camera; and $T_2$ is the translation parameter in the external parameters of the second camera.
The determination process leading to the above formula (1) is described below:
Referring to fig. 6, fig. 6 is an alternative binocular camera model provided by the present application. Here O1 and O2 are the center of the first camera and the center of the second camera, respectively; the gray areas in the figure are the imaging plane of the first camera and the imaging plane of the second camera. $x_1$ is the pixel point on the imaging plane of the first camera corresponding to an object point P in space, whose three-dimensional coordinate is $X = (X, Y, Z)$; likewise, $x_2$ is the pixel point on the imaging plane of the second camera corresponding to P. Let the vertical distances from P to the two image planes be $s_1$ and $s_2$, respectively; let the internal parameter matrix of both cameras be $K$, the external parameter matrix of the first camera be $[R_1\ T_1]$, and the external parameter matrix of the second camera be $[R_2\ T_2]$. Then the following formula (2) and formula (3) hold:

$s_1 x_1 = K(R_1 X + T_1)$ …… formula (2)
$s_2 x_2 = K(R_2 X + T_2)$ …… formula (3)

Since $K$ is an invertible matrix, multiplying formula (2) and formula (3) by the inverse of $K$ yields formula (4) and formula (5):

$s_1 K^{-1} x_1 = R_1 X + T_1$ …… formula (4)
$s_2 K^{-1} x_2 = R_2 X + T_2$ …… formula (5)

Letting $x_1' = K^{-1}x_1$ and $x_2' = K^{-1}x_2$, formula (4) and formula (5) can be converted into formula (6) and formula (7), respectively:

$s_1 x_1' = R_1 X + T_1$ …… formula (6)
$s_2 x_2' = R_2 X + T_2$ …… formula (7)

Since the world coordinate system can be chosen arbitrarily, it can be chosen to coincide with the camera coordinate system of the first camera, in which case $R_1 = I$ and $T_1 = 0$. Formula (6) then gives:

$s_1 x_1' = X$ …… formula (8)

Substituting formula (8) into formula (7) yields:

$s_2 x_2' = s_1 R_2 x_1' + T_2$ …… formula (9)

Since $x_2'$ and $T_2$ are both three-dimensional vectors, their outer product (cross product) $x_2' \times T_2$ is another three-dimensional vector, perpendicular to both $x_2'$ and $T_2$. Taking the inner product of $x_2' \times T_2$ with both sides of formula (9) eliminates the terms in $x_2'$ and $T_2$, which yields the above formula (1).
S502: performing feature matching on the first data set and the second data set to obtain a feature point pair set; the feature point pair set comprises a plurality of feature point pairs; each of the plurality of feature point pairs includes the corresponding first feature point and second feature point.
S503: and screening out a target characteristic point pair set from the characteristic point pair set based on the epipolar constraint condition.
That is, the pixel coordinates of the first feature point in a feature point pair are substituted into the above formula (1) to determine where the corresponding second feature point should lie; if the deviation between this prediction and the second feature point actually in the pair is less than or equal to a preset threshold, the match is determined to be successful and the pair is a target feature point pair; otherwise, it is a non-target feature point pair.
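As a concrete illustration of this screening, the following is a minimal numpy sketch, assuming K, R2 and T2 have been obtained as above and that matched pixel coordinates are given in homogeneous form; the residual x2'ᵀ[T2]×R2x1' is the essential-matrix restatement of formula (1), and the threshold value is an illustrative placeholder.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def epipolar_residual(x1, x2, K, R2, T2):
    x1n = np.linalg.inv(K) @ x1            # x1' = K^-1 x1
    x2n = np.linalg.inv(K) @ x2            # x2' = K^-1 x2
    return abs(x2n @ skew(T2) @ R2 @ x1n)  # formula (1); 0 for a perfect match

def screen_pairs(pairs, K, R2, T2, threshold=1e-3):
    # keep only the pairs whose epipolar residual is within the threshold
    return [(x1, x2) for x1, x2 in pairs
            if epipolar_residual(x1, x2, K, R2, T2) <= threshold]
```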
S504: a set of target feature points is determined from the first data set based on the set of target feature point pairs. The set of target feature points includes pixel coordinates of each of the plurality of target first feature points.
S505: and determining the target three-dimensional coordinate set according to the target feature point set and the internal and external parameters of the first camera, wherein the target three-dimensional coordinate set comprises the three-dimensional coordinates of the first feature points of each target.
It should be noted that, in this embodiment of the present application, the three-dimensional coordinates involved may be three-dimensional coordinates in a world coordinate system of the acquisition device, and then step S505 may be specifically expressed as: a first coordinate conversion rule may be determined based on the first camera extrinsic parameters, based on which the pixel coordinates of each target first feature point in the set of target feature points may be converted into three-dimensional coordinates of the target first feature point.
Optionally, the three-dimensional coordinates may also be three-dimensional coordinates in the terrestrial coordinate system, and step S505 may then be specifically expressed as: determining a first coordinate conversion rule based on the internal and external parameters of the first camera; converting, based on the first coordinate conversion rule, the pixel coordinates of each target first feature point in the target feature point set into the three-dimensional coordinates of that point in the world coordinate system of the acquisition device; determining the relative position information of the target first feature point and the acquisition device from those coordinates; acquiring the three-dimensional coordinates of the acquisition device in the terrestrial coordinate system (e.g., via GPS); and determining the three-dimensional coordinates of the target first feature point in the terrestrial coordinate system from the device's terrestrial coordinates and the relative position information, thereby obtaining the target three-dimensional coordinate set.
Alternatively, the GPS information of the acquisition device may be obtained by first acquiring the three-dimensional coordinates of the acquisition vehicle in the terrestrial coordinate system together with the relative position between the center of the acquisition vehicle and the acquisition device, and then determining the three-dimensional coordinates of the acquisition device in the terrestrial coordinate system from the two; this GPS information may also be referred to hereinafter as the position information of the acquisition device.
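The chain of coordinate frames just described can be sketched as follows. This is a simplification that assumes the device-to-earth step reduces to adding a GPS-derived offset in a local ENU frame (a production system would apply a full geodetic conversion), and pixel_to_device_frame assumes the point's depth is known.

```python
import numpy as np

def pixel_to_device_frame(uv, depth, K):
    # Back-project a pixel (u, v) with known depth into the device
    # (first camera) coordinate system: X = depth * K^-1 [u, v, 1]^T.
    uv1 = np.array([uv[0], uv[1], 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

def device_to_earth_frame(p_device, device_position_enu):
    # The point's position relative to the device, plus the device's own
    # GPS-derived position, gives earth-frame coordinates.
    return p_device + device_position_enu
```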
S506: the target three-dimensional coordinate set is determined as the position coordinate set.
S405: and performing three-dimensional reconstruction operation on the image based on the position coordinate set to obtain the three-dimensional image.
It should be noted that, if the three-dimensional coordinates of each feature point in the position coordinate set of step S405 are coordinates in the world coordinate system of the acquisition device, step S405 may be specifically stated as: determining the relative position information of each feature point and the acquisition device based on that feature point's three-dimensional coordinates in the device's world coordinate system; acquiring the three-dimensional coordinates of the acquisition device in the terrestrial coordinate system (e.g., via GPS); determining the three-dimensional coordinates of each feature point in the terrestrial coordinate system from the device's terrestrial coordinates and the relative position information, thereby obtaining a target position coordinate set; and performing the three-dimensional reconstruction operation on the image based on the target position coordinate set to obtain the three-dimensional image.
Further analysis of the above formula (1): it is a relation satisfied by the images of the same point P in the two cameras, independent of the spatial coordinates of P and of the point's distance to the cameras; formula (1) is therefore called an epipolar constraint, and the matrix in it is called the essential matrix of the two cameras. It follows that if a number of corresponding points in the two images are known (e.g., at least 5 pairs), the matrix can be solved from the above equation; since it is constructed from R2 and T2, R2 and T2 can then be recovered. Subsequently, the feature point pairs may be screened based on formula (1), and the pixel coordinates of the target feature points may be converted into the three-dimensional coordinates of the target first feature points based on K, R2 and T2, yielding the position coordinate set. In another alternative embodiment, steps S403 to S404 may be specifically stated as: determining an epipolar constraint based on the first data set and the second data set; determining the external parameters of the second camera based on the epipolar constraint; performing feature matching on the first data set and the second data set to obtain a feature point pair set, the feature point pair set comprising a plurality of feature point pairs, each of which includes the corresponding first feature point and second feature point; screening out a target feature point pair set from the feature point pair set based on the epipolar constraint; determining a target feature point set from the second data set based on the target feature point pair set, the target feature point set including the pixel coordinates of each of a plurality of target second feature points; acquiring the internal parameters of the second camera; determining the target three-dimensional coordinate set according to the target feature point set and the internal and external parameters of the second camera, the target three-dimensional coordinate set including the three-dimensional coordinates of each target second feature point; and determining the target three-dimensional coordinate set as the position coordinate set. Because the external parameters of the second camera can be solved directly from the known feature point pairs in the first data set and the second data set, the pixel coordinates of the target second feature points can be converted into three-dimensional coordinates in the world coordinate system as soon as the internal parameters of the second camera are determined; this simplifies the three-dimensional coordinate conversion and improves data processing efficiency.
Optionally, an OpenCV technology may be used to obtain a coordinate set corresponding to the position coordinate set in the 3D environment; and thus three-dimensional reconstruction can be performed.
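For example, with the projection matrices implied by the derivation above (world frame fixed to the first camera), OpenCV's triangulation can recover the 3D coordinates directly. A minimal sketch, assuming K, R2, T2 from the steps above and matched pixel arrays pts1, pts2 of shape (2, N):

```python
import cv2
import numpy as np

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # [R1 T1] = [I 0]
P2 = K @ np.hstack([R2, T2.reshape(3, 1)])          # second camera projection

points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous (4, N)
points_3d = (points_4d[:3] / points_4d[3]).T            # (N, 3) coordinates
```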
Optionally, when the feature points of the first image and the second image are matched to form the feature point pair set: when the difference between the first image and the second image is large, the Scale-Invariant Feature Transform (SIFT) may be used, which ensures high accuracy of the matched feature point pairs but has the drawback of low processing efficiency. When the first image and the second image differ less, faster features may be employed, such as Speeded-Up Robust Features (SURF) or the Oriented FAST and Rotated BRIEF (ORB) algorithm.
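A minimal OpenCV matching sketch along these lines is shown below, using ORB for the fast case; cv2.SIFT_create() can be swapped in when the two views differ strongly. The variables first_image and second_image are assumed to be grayscale numpy arrays.

```python
import cv2

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(first_image, None)
kp2, des2 = orb.detectAndCompute(second_image, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps only
# mutual best matches, giving the feature point pair set.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)
pairs = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
```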
In order to further improve the data processing efficiency, in another possible implementation, step S202 may be specifically described as: acquiring the position information of the acquisition equipment, determining a point cloud data set corresponding to the image from a high-precision map based on the position information of the acquisition equipment and the image, and generating the three-dimensional image and the position coordinate set based on the point cloud data set.
Alternatively, by determining the acquisition time, a point cloud data set containing vehicles, people and road scenes can be directly determined from the high-precision map, so that a three-dimensional image and a position coordinate set can be directly generated.
In order to improve the application flexibility of the data generation method, in another possible embodiment, steps S202 to S203 may be specifically stated as: acquiring the position information of the acquisition device; determining a point cloud data set of a second target object in the image from a high-precision map based on the position information of the acquisition device and the image; determining a first target object in the image and the pixel coordinates of the first target object; acquiring a first coordinate conversion rule, and converting the pixel coordinates of the first target object into the three-dimensional coordinates of the first target object in the world coordinate system of the acquisition device based on that rule; determining the relative position information of the first target object and the acquisition device from those coordinates; determining the three-dimensional coordinates of the first target object in the terrestrial coordinate system based on that relative position information and the position information of the acquisition device; and generating the three-dimensional image and the position coordinate set based on the three-dimensional coordinates of the first target object in the terrestrial coordinate system and the point cloud data set.
Optionally, the capturing device is a camera, and the first coordinate transformation rule is determined based on internal and external parameters of the camera.
Optionally, the second target object includes a road scene, such as a road surface, a road sign, a street lamp, and the like.
S203: determining a first target object in the three-dimensional image; the first target object includes a vehicle and/or a person.
Optionally, step S203 may be specifically stated as: and determining the first target object from the three-dimensional image by using a target detection algorithm, wherein the target detection algorithm comprises Faster R-CNN, SSD and YOLO.
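Purely as an illustration of this step, the sketch below wires a detector into the pipeline; `detector` and `lift_to_3d` are hypothetical stand-ins (the application does not fix a specific model or API), with 2D detections mapped into the 3D scene via the position coordinate set.

```python
TARGET_CLASSES = {"car", "truck", "person"}

def find_first_targets(scene_3d, images, position_coords, detector):
    targets = []
    for image in images:
        for det in detector(image):        # hypothetical detector call
            if det.label in TARGET_CLASSES:
                # map the 2D detection to 3D via the feature-point coords
                targets.append(lift_to_3d(det, position_coords))  # hypothetical
    return targets
```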
S204: and generating a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line.
In one possible implementation, step S204 can be specifically described as: determining identification information of the first target object, determining a target three-dimensional model from a gallery based on the identification information, and performing a contour frame adding operation on the target three-dimensional model to obtain the second target object, referring to fig. 6, where fig. 6 is a schematic diagram of an optional target three-dimensional model provided by the present application.
It should be noted that, after the contour frame is added to the target three-dimensional model, the coordinates of each point on the contour frame can be obtained. As needed, scene elements of the three-dimensional image, such as trees, road signs, and the corresponding shadows, can be adjusted, thereby increasing the diversity of the data. The first target object may also include obstacles and riders, such as railings, road cones, and the like. Meanwhile, according to special requirements, non-naturally occurring targets, such as large numbers of pedestrians, random traffic flows, and some buildings, can be added.
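As an illustration of the contour frame, the following minimal sketch computes an axis-aligned bounding box from a model's vertices; an oriented box, which the application may equally intend, would additionally require the object's heading.

```python
import numpy as np

def add_contour_frame(vertices):
    # vertices: (N, 3) array of the target 3D model's points
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    # the 8 corners of the box; every point on its edges follows from these
    corners = np.array([[x, y, z] for x in (lo[0], hi[0])
                                  for y in (lo[1], hi[1])
                                  for z in (lo[2], hi[2])])
    return corners
```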
S205: and replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generating an updated three-dimensional image.
In a possible implementation manner, step S205 may be specifically expressed as: determining the position coordinates of the first target object from the three-dimensional image based on the position coordinate set; removing the first target object from the three-dimensional image to generate a target three-dimensional image; determining the position coordinates of the first target object as the position coordinates of the second target object; and adding the second target object to the target three-dimensional image based on the position coordinates of the second target object to obtain the updated three-dimensional image. Referring to fig. 7, fig. 7 is a schematic diagram of an optional updated three-dimensional image provided by the present application.
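A minimal sketch of this replacement follows, assuming the three-dimensional image is represented as a point cloud and the first target's extent is delimited by its contour box; the representation is an assumption, as the application does not mandate point clouds.

```python
import numpy as np

def replace_target(scene_points, box_lo, box_hi, second_target_points, position):
    # remove the first target object: drop points inside its bounding box
    inside = np.all((scene_points >= box_lo) & (scene_points <= box_hi), axis=1)
    target_scene = scene_points[~inside]
    # give the second target object the first target's position coordinates
    placed = second_target_points + position
    return np.vstack([target_scene, placed])
```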
S206: the updated three-dimensional image is projected into a two-dimensional image, which contains the object identification. Referring to fig. 8, fig. 8 is a two-dimensional image labeled in the prior art.
Optionally, referring to fig. 9, fig. 9 is a model of an alternative three-dimensional image of the present application projected as a two-dimensional image. As can be seen from fig. 9, it follows from the theory of similar triangles that the projection of $(x, y, z)$ onto the projection plane is $(x', y', -d)$ with $x' = -dx/z$ and $y' = -dy/z$; since $z$ is a variable, this is not a linear transformation. With the addition of a homogeneous term, the projection can be completed by a linear matrix operation in a homogeneous coordinate system; referring to fig. 10, fig. 10 is a schematic diagram of an alternative process for converting three-dimensional coordinates to pixel coordinates provided herein. The first matrix in that process is:

$$\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

so that $z\,(u, v, 1)^{\top} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}(x, y, z, 1)^{\top}$. The 1 in the first matrix is the homogeneous term; $(x, y, z)$ is the three-dimensional coordinate of a target first feature point; $(u, v)$ is the coordinate on the image corresponding to the target first feature point; and $f$ is the focal length of the camera.
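The homogeneous projection above translates directly into code; a minimal numpy sketch, assuming the points are already expressed in the camera coordinate system:

```python
import numpy as np

def project_points(points_3d, f):
    # homogeneous pinhole projection matrix, as shown above
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # append 1s
    uvw = (P @ homo.T).T
    return uvw[:, :2] / uvw[:, 2:3]   # divide by depth to get (u, v)
```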
Referring to fig. 11, fig. 11 is a schematic structural diagram of an alternative data generating device according to the present application. The present application also discloses in another aspect a data generating apparatus, comprising:
an acquisition module 1101 for acquiring an image of a driving scene;
a reconstruction module 1102, configured to perform a three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of the plurality of feature points; the plurality of feature points belong to the three-dimensional image;
a determining module 1103 for determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person;
a target object generation module 1104 for generating a second target object based on the first target object, the second target object being a three-dimensional model having a contour line;
a three-dimensional image generating module 1105 configured to replace the first target object in the three-dimensional image with the second target object based on the position coordinate set, and generate an updated three-dimensional image;
a projection module 1106 configured to project the updated three-dimensional image into a two-dimensional image, where the two-dimensional image includes an object identifier.
In a possible implementation, the target object generating module is configured to determine identification information of the first target object; determining a target three-dimensional model from the gallery based on the identification information; and carrying out outline frame adding operation on the target three-dimensional model to obtain the second target object.
In a possible embodiment, the three-dimensional image generation module is configured to determine the position coordinates of the first target object from the three-dimensional image based on the position coordinate set; removing a first target object in the three-dimensional image to generate a target three-dimensional image; determining the position coordinates of the first target object as the position coordinates of the second target object; and adding the second target object into the target three-dimensional image based on the position coordinate of the second target object to obtain the updated three-dimensional image.
In a possible embodiment, the acquisition module is configured to acquire an image of the driving scene by using a binocular camera; the binocular camera comprises a first camera and a second camera; the image comprises a first image captured by the first camera and a second image captured by the second camera; the first image and the second image are acquired at the same time.
In a possible embodiment, the reconstruction module is configured to determine a first data set based on the first image; the first data set includes pixel coordinates of each of a plurality of first feature points belonging to the first image; determining a second data set based on the second image, wherein the second data set comprises pixel coordinates of each second feature point in a plurality of second feature points, and the second feature points belong to the second image; determining internal and external parameters of the first camera and internal and external parameters of the second camera; the internal parameters of the first camera are equal to the internal parameters of the second camera; determining the set of position coordinates based on the internal and external parameters of the first camera, the internal and external parameters of the second camera, the first data set and the second data set; and performing three-dimensional reconstruction operation on the image based on the position coordinate set to obtain the three-dimensional image.
In a possible embodiment, the reconstruction module is configured to determine an epipolar constraint according to the epipolar parameters of the first camera, the epipolar parameters of the second camera, the first data set, and the second data set; performing feature matching on the first data set and the second data set to obtain a feature point pair set; the characteristic point pair set comprises a plurality of characteristic point pairs; each of the plurality of pairs of feature points includes the corresponding first feature point and the second feature point; screening out a target characteristic point pair set from the characteristic point pair set based on the epipolar constraint condition; determining a target characteristic point set from the first data set based on the target characteristic point pair set; the set of target feature points includes pixel coordinates of each of a plurality of target first feature points; determining the target three-dimensional coordinate set according to the target feature point set and the internal and external parameters of the first camera, wherein the target three-dimensional coordinate set comprises the three-dimensional coordinates of the first feature points of each target; the target three-dimensional coordinate set is determined as the position coordinate set.
In one possible embodiment, the acquisition module is configured to acquire an image of a driving scene by using the acquisition device;
the reconstruction module is used for acquiring the position information of the acquisition equipment; determining a point cloud data set corresponding to the image from a high-precision map based on the position information of the acquisition equipment and the image; the three-dimensional image and the set of location coordinates are generated based on the point cloud dataset.
The method provided by the embodiment of the application can be executed in a computer terminal, a server, or a similar computing device. Taking the example of the application running on a server, fig. 12 is a hardware structure block diagram of the server according to an alternative data generation method of the present application. As shown in fig. 12, the server 1200 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1210 (the CPU 1210 may include but is not limited to a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1230 for storing data, and one or more storage media 1220 (e.g., one or more mass storage devices) for storing applications 1223 or data 1222. The memory 1230 and the storage media 1220 may provide transient or persistent storage. The program stored in the storage medium 1220 may include one or more modules, each of which may include a series of instruction operations for the server. Further, the central processor 1210 may be configured to communicate with the storage medium 1220 and execute a series of instruction operations from the storage medium 1220 on the server 1200. The server 1200 may also include one or more power supplies 1260, one or more wired or wireless network interfaces 1250, one or more input-output interfaces 1240, and/or one or more operating systems 1221, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The input/output interface 1240 may be used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the server 1200. In one example, the input/output Interface 1240 includes a Network Interface Controller (NIC) that may be coupled to other Network devices via a base station to communicate with the internet. In one example, the input/output interface 1240 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
It will be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1200 may also include more or fewer components than shown in FIG. 12, or have a different configuration than shown in FIG. 12.
Embodiments of the present application also provide an electronic device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the data generating method as described above.
Embodiments of the present application further provide a computer storage medium, which may be disposed in a server to store at least one instruction, at least one program, a set of codes, or a set of instructions related to implementing a data generation method in the method embodiments, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the data generation method.
Alternatively, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of generating data, comprising:
acquiring an image of a driving scene;
performing three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set; the position coordinate set comprises three-dimensional coordinates of each of a plurality of feature points; the plurality of feature points belong to the three-dimensional image;
determining a first target object in the three-dimensional image; the first target object comprises a vehicle and/or a person;
generating a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line;
replacing the first target object in the three-dimensional image with the second target object based on the set of position coordinates, generating an updated three-dimensional image;
projecting the updated three-dimensional image into a two-dimensional image, the two-dimensional image containing an object identifier.
2. The data generation method of claim 1, wherein generating a second target object based on the first target object comprises:
determining identification information of the first target object;
determining a target three-dimensional model from a gallery based on the identification information;
and adding a contour frame to the target three-dimensional model to obtain the second target object.
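A minimal sketch of claim 2's gallery lookup and contour-frame step, assuming a hypothetical in-memory gallery keyed by identification information; the mesh vertices are random stand-ins for real model data.

```python
import numpy as np

# Hypothetical gallery: identification information -> model vertices (metres).
GALLERY = {
    "sedan": np.random.rand(500, 3) * np.array([4.5, 1.8, 1.5]),
    "pedestrian": np.random.rand(200, 3) * np.array([0.5, 0.5, 1.7]),
}

def add_contour_frame(vertices):
    """Attach an axis-aligned bounding box as the model's contour frame."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    corners = np.array([[x, y, z]                 # the 8 corners serve as the
                        for x in (lo[0], hi[0])   # contour line of the model
                        for y in (lo[1], hi[1])
                        for z in (lo[2], hi[2])])
    return {"vertices": vertices, "contour": corners}

def second_target_from(identification):
    """Claim 2 in miniature: gallery lookup, then contour-frame addition."""
    return add_contour_frame(GALLERY[identification])

second_target = second_target_from("sedan")
print(second_target["contour"].shape)   # -> (8, 3)
```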
3. The data generation method of claim 1, wherein replacing the first target object in the three-dimensional image with the second target object based on the position coordinate set to generate an updated three-dimensional image comprises:
determining position coordinates of the first target object from the three-dimensional image based on the position coordinate set;
removing the first target object from the three-dimensional image to generate a target three-dimensional image;
determining the position coordinates of the first target object as the position coordinates of the second target object;
and adding the second target object into the target three-dimensional image based on the position coordinates of the second target object to obtain the updated three-dimensional image.
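Claim 3 reduces to a coordinate hand-off: the gallery model inherits the removed object's position. A sketch, assuming the scene is a plain list of dictionaries (an assumption for illustration, not the patent's data structure):

```python
import numpy as np

def replace_target(scene, first_target, second_target):
    position = first_target["position"]               # from the coordinate set
    target_scene = [o for o in scene if o is not first_target]  # removal step
    second_target = dict(second_target, position=position)      # coordinate
    return target_scene + [second_target]             # hand-off; updated image

scene = [{"label": "vehicle", "position": np.array([5.0, 0.0, 20.0])}]
updated = replace_target(scene, scene[0], {"label": "vehicle", "contour": "box"})
print(updated[0]["position"])   # the gallery model sits where the original was
```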
4. The data generation method of any one of claims 1 to 3, wherein acquiring an image of a driving scene comprises:
acquiring the image of the driving scene by using a binocular camera, the binocular camera comprising a first camera and a second camera;
wherein the image comprises a first image captured by the first camera and a second image captured by the second camera, the first image and the second image being acquired at the same time.
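For the binocular capture of claim 4, OpenCV's grab/retrieve split is one way to approximate same-time acquisition in software; the device indices are assumptions, and production rigs typically use hardware-synchronized stereo cameras instead.

```python
import cv2

# Device indices 0 and 1 are assumptions about the attached hardware.
left, right = cv2.VideoCapture(0), cv2.VideoCapture(1)

# grab() latches a frame on each sensor as close together in time as the
# driver allows; the heavier decode is deferred to retrieve().
if left.grab() and right.grab():
    ok_l, first_image = left.retrieve()     # first image, first camera
    ok_r, second_image = right.retrieve()   # second image, second camera

left.release()
right.release()
```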
5. The data generation method of claim 4, wherein performing the three-dimensional reconstruction operation on the image to obtain the three-dimensional image and the position coordinate set comprises:
determining a first data set based on the first image; the first data set comprises pixel coordinates of each of a plurality of first feature points belonging to the first image;
determining a second data set based on the second image, wherein the second data set comprises pixel coordinates of each second feature point in a plurality of second feature points, and the second feature points belong to the second image;
determining intrinsic and extrinsic parameters of the first camera and intrinsic and extrinsic parameters of the second camera, wherein the intrinsic parameters of the first camera are equal to the intrinsic parameters of the second camera;
determining the position coordinate set based on the intrinsic and extrinsic parameters of the first camera, the intrinsic and extrinsic parameters of the second camera, the first data set, and the second data set;
and performing the three-dimensional reconstruction operation on the image based on the position coordinate set to obtain the three-dimensional image.
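A sketch of claim 5 using OpenCV: ORB keypoints provide the two per-image data sets, and cv2.triangulatePoints recovers the position coordinate set from both cameras' projection matrices. The intrinsic matrix, the 12 cm baseline, and the file names are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Shared intrinsics (the claim requires equal intrinsic parameters) and a
# 12 cm horizontal baseline between the two cameras.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                # first camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.12], [0], [0]])])   # second camera

first = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # assumed stereo pair
second = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(first, None)    # first data set (pixel coords)
kp2, des2 = orb.detectAndCompute(second, None)   # second data set

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float64([kp1[m.queryIdx].pt for m in matches]).T   # 2 x N
pts2 = np.float64([kp2[m.trainIdx].pt for m in matches]).T

homog = cv2.triangulatePoints(P1, P2, pts1, pts2)        # 4 x N homogeneous
position_coordinate_set = (homog[:3] / homog[3]).T       # N x 3 coordinates
```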
6. The data generation method of claim 5, wherein determining the position coordinate set based on the intrinsic and extrinsic parameters of the first camera, the intrinsic and extrinsic parameters of the second camera, the first data set, and the second data set comprises:
determining an epipolar constraint condition according to the intrinsic and extrinsic parameters of the first camera, the intrinsic and extrinsic parameters of the second camera, the first data set, and the second data set;
performing feature matching on the first data set and the second data set to obtain a feature point pair set, wherein the feature point pair set comprises a plurality of feature point pairs, and each feature point pair comprises a corresponding first feature point and second feature point;
screening a target feature point pair set out of the feature point pair set based on the epipolar constraint condition;
determining a target feature point set from the first data set based on the target feature point pair set, wherein the target feature point set comprises pixel coordinates of each of a plurality of target first feature points;
determining a target three-dimensional coordinate set according to the target feature point set and the intrinsic and extrinsic parameters of the first camera, wherein the target three-dimensional coordinate set comprises the three-dimensional coordinates of each target first feature point;
and determining the target three-dimensional coordinate set as the position coordinate set.
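The epipolar screening of claim 6 can be checked numerically: with known extrinsics, the essential matrix E = [t]x R must satisfy x2^T E x1 ≈ 0 for a correctly matched pair. A sketch with illustrative calibration values (not from the patent):

```python
import numpy as np

# Illustrative calibration: a 12 cm baseline along x, no rotation.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
R = np.eye(3)
t = np.array([0.12, 0.0, 0.0])

# Essential matrix E = [t]x @ R, built from the skew-symmetric form of t.
t_cross = np.array([[0.0, -t[2], t[1]],
                    [t[2], 0.0, -t[0]],
                    [-t[1], t[0], 0.0]])
E = t_cross @ R

def epipolar_residual(p1, p2):
    """|x2^T E x1| for one candidate pair given in pixel coordinates."""
    x1 = np.linalg.inv(K) @ np.array([p1[0], p1[1], 1.0])  # normalize
    x2 = np.linalg.inv(K) @ np.array([p2[0], p2[1], 1.0])
    return abs(x2 @ E @ x1)

pairs = [((640.0, 360.0), (520.0, 360.0)),   # same row: consistent with E
         ((640.0, 360.0), (520.0, 200.0))]   # off-row: violates the constraint
target_pairs = [p for p in pairs if epipolar_residual(*p) < 1e-4]
print(len(target_pairs))   # -> 1; only the consistent pair survives screening
```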
7. The data generation method of claim 1, wherein acquiring an image of a driving scene comprises:
acquiring the image of the driving scene by using an acquisition device;
and wherein performing the three-dimensional reconstruction operation on the image to obtain the three-dimensional image and the position coordinate set comprises:
acquiring position information of the acquisition device;
determining a point cloud data set corresponding to the image from a high-precision map based on the position information of the acquisition device and the image;
and generating the three-dimensional image and the position coordinate set based on the point cloud data set.
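The claim 7 variant replaces stereo triangulation with a lookup in a pre-built high-precision map around the acquisition device's pose. A sketch, where the map array, the pose, and the 50 m radius are all assumptions:

```python
import numpy as np

# Stand-in high-precision map (100k random points over a 1 km tile); a real
# system would query a tiled map service instead.
hd_map = np.random.rand(100_000, 3) * 1000.0
device_position = np.array([500.0, 500.0, 0.0])   # from the device positioning

# Points within 50 m of the device approximate the image's surroundings; a
# production system would additionally intersect with the camera frustum.
dist = np.linalg.norm(hd_map - device_position, axis=1)
point_cloud_dataset = hd_map[dist < 50.0]
position_coordinate_set = point_cloud_dataset     # 3-D coords of scene points
print(position_coordinate_set.shape)
```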
8. A data generation apparatus, comprising:
an acquisition module, configured to acquire an image of a driving scene;
a reconstruction module, configured to perform a three-dimensional reconstruction operation on the image to obtain a three-dimensional image and a position coordinate set, wherein the position coordinate set comprises three-dimensional coordinates of each of a plurality of feature points, and the plurality of feature points belong to the three-dimensional image;
a determination module, configured to determine a first target object in the three-dimensional image, wherein the first target object comprises a vehicle and/or a person;
a target object generation module, configured to generate a second target object based on the first target object, wherein the second target object is a three-dimensional model with a contour line;
a three-dimensional image generation module, configured to replace the first target object in the three-dimensional image with the second target object based on the position coordinate set to generate an updated three-dimensional image;
and a projection module, configured to project the updated three-dimensional image into a two-dimensional image, wherein the two-dimensional image contains an object identifier.
9. An electronic device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the data generation method of any one of claims 1 to 7.
10. A computer storage medium, wherein at least one instruction or at least one program is stored in the storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the data generation method of any one of claims 1 to 7.
CN202111570104.2A 2021-12-21 2021-12-21 Data generation method and device, electronic equipment and storage medium Pending CN114299230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111570104.2A CN114299230A (en) 2021-12-21 2021-12-21 Data generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111570104.2A CN114299230A (en) 2021-12-21 2021-12-21 Data generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114299230A (en) 2022-04-08

Family

ID=80968297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111570104.2A Pending CN114299230A (en) 2021-12-21 2021-12-21 Data generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114299230A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797442A (en) * 2022-12-01 2023-03-14 昆易电子科技(上海)有限公司 Simulation image reinjection method of target position and related equipment thereof
CN115797442B (en) * 2022-12-01 2024-06-07 昆易电子科技(上海)有限公司 Simulation image reinjection method of target position and related equipment thereof
CN116402967A (en) * 2023-05-31 2023-07-07 深圳大学 Scene building rapid singulation method, device, computer equipment and storage medium
CN116402967B (en) * 2023-05-31 2024-03-29 深圳大学 Scene building rapid singulation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110163064B (en) Method and device for identifying road marker and storage medium
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
Marcu et al. SafeUAV: Learning to estimate depth and safe landing areas for UAVs from synthetic data
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
CN114299230A (en) Data generation method and device, electronic equipment and storage medium
CN111721281B (en) Position identification method and device and electronic equipment
CN112967341A (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN112651881A (en) Image synthesis method, apparatus, device, storage medium, and program product
JP2022050311A (en) Method for detecting lane change of vehicle, system, electronic apparatus, storage medium, roadside machine, cloud control platform, and computer program
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN112836698A (en) Positioning method, positioning device, storage medium and electronic equipment
CN112613107A (en) Method and device for determining construction progress of tower project, storage medium and equipment
CN114663598A (en) Three-dimensional modeling method, device and storage medium
CN115205382A (en) Target positioning method and device
Bao et al. Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations
CN112396831B (en) Three-dimensional information generation method and device for traffic identification
CN115880555B (en) Target detection method, model training method, device, equipment and medium
CN116843754A (en) Visual positioning method and system based on multi-feature fusion
Karantzalos et al. Model-based building detection from low-cost optical sensors onboard unmanned aerial vehicles
CN110827340B (en) Map updating method, device and storage medium
CN116823966A (en) Internal reference calibration method and device for camera, computer equipment and storage medium
CN115773759A (en) Indoor positioning method, device and equipment of autonomous mobile robot and storage medium
WO2023283929A1 (en) Method and apparatus for calibrating external parameters of binocular camera
CN115115713A (en) Unified space-time fusion all-around aerial view perception method
CN112215205B (en) Target identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination