WO2024040964A1 - Recognition model training method and apparatus, and movable intelligent device - Google Patents

Recognition model training method and apparatus, and movable intelligent device

Info

Publication number
WO2024040964A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
shooting component
component
dimensional information
recognition model
Prior art date
Application number
PCT/CN2023/083772
Other languages
English (en)
French (fr)
Inventor
毕舒展
张洁
谢鹏寰
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2024040964A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • This application relates to the field of intelligent driving, and in particular to recognition model training methods, devices and mobile intelligent devices.
  • In movable smart devices such as intelligent driving vehicles and sweeping robots, visual sensors such as cameras are widely used for environment perception.
  • obstacle recognition based on monocular cameras is a low-cost visual perception solution.
  • The principle is to input the image collected by the monocular camera into a preset recognition model; through this recognition model, the three-dimensional (3D) information of each obstacle included in the image can be output, such as 3D coordinates, size, and orientation angle.
  • the above recognition model is trained by obtaining an image and 3D information of obstacles included in the image.
  • However, when the trained recognition model is used to recognize obstacles, the obtained 3D information of the obstacles is inaccurate.
  • This application provides a recognition model training method, device and movable intelligent device, which can make the 3D information of obstacles obtained more accurate when using the trained recognition model to identify obstacles.
  • this application provides a recognition model training method.
  • The method includes: acquiring a first image collected by a first shooting component, where the first shooting component corresponds to a first viewing angle; generating, according to the first image, a second image corresponding to a second shooting component, where the second shooting component is determined according to the first shooting component and corresponds to a second viewing angle; determining second three-dimensional information of a first object in the second image according to first three-dimensional information of the first object in the first image; and training a recognition model based on the second image and the second three-dimensional information, where the recognition model is used to recognize objects in images collected by the first shooting component.
  • Based on this method, an image corresponding to the second shooting component is generated from the image collected by the first shooting component, and the second shooting component can be determined according to the first shooting component. Because the first shooting component corresponds to the first viewing angle and the second shooting component corresponds to the second viewing angle, images from viewing angles different from that of the first shooting component can be obtained, making the obtained image data richer.
  • In addition, based on the three-dimensional information of an object in the image collected by the first shooting component, the three-dimensional information of the object in the image corresponding to the second shooting component can be determined. In this way, three-dimensional information of objects is obtained from a viewing angle different from that of the first shooting component, and together with the three-dimensional information from the first shooting component, more evenly distributed three-dimensional information is obtained. Using this more evenly distributed three-dimensional information to train the recognition model can reduce the impact of the uneven distribution of training data on the prediction results of the recognition model, so that the three-dimensional information of objects obtained when the recognition model is subsequently used for object detection is more accurate.
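  • As a rough illustration of the idea (not the application's exact procedure), for an undistorted pinhole camera a pure rotation of the shooting component's coordinate system relates the original and virtual images by a homography. The sketch below assumes a single intrinsic matrix K shared by both components and a rotation around the y-axis; all names are illustrative.

```python
import numpy as np
import cv2  # used only for the final image warp

def virtual_view(image, K, yaw_deg):
    """Generate the image a virtual (second) shooting component would see when the
    first component's coordinate system is rotated by yaw_deg around its y-axis."""
    psi = np.deg2rad(yaw_deg)
    # Rotation taking coordinates from the first component's frame to the virtual
    # component's frame (the sign convention is a choice made for this sketch).
    R = np.array([[ np.cos(psi), 0.0, np.sin(psi)],
                  [ 0.0,         1.0, 0.0        ],
                  [-np.sin(psi), 0.0, np.cos(psi)]])
    # For pinhole cameras sharing an optical center, the two images are related
    # by the homography H = K * R * K^-1 (pixel of image 1 -> pixel of image 2).
    H = K @ R @ np.linalg.inv(K)
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```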
  • The second shooting component is a virtual shooting component; before generating the second image corresponding to the second shooting component according to the first image, the method further includes: rotating the coordinate system of the first shooting component by a preset angle to obtain the second shooting component.
  • Based on this design, the second shooting component is obtained by rotating the coordinate system of the first shooting component by a preset angle. In this way, a virtual shooting component with a viewing angle different from that of the first shooting component can be obtained, and the image collected by the first shooting component and the three-dimensional information of the objects in that image can be converted into the image corresponding to the virtual shooting component and the three-dimensional information of the objects in that image. Three-dimensional information of objects from different viewing angles can therefore be obtained, making the distribution of the obtained training data more balanced, which can reduce the impact of uneven training-data distribution on the trained model.
  • the internal parameters of the first shooting component and the second shooting component are the same; or, the internal parameters of the first shooting component and the second shooting component are different.
  • Generating the second image corresponding to the second shooting component according to the first image includes: determining a first coordinate of a first pixel point in the first image on a preset reference surface, where the first pixel point is any pixel point in the first image; determining, according to the first coordinate, a second coordinate corresponding to the first pixel point and the second shooting component; determining a second pixel point according to the second coordinate; and generating the second image according to the second pixel point.
  • Based on this design, the preset reference surface is used as a reference. Since there is a corresponding relationship between the first pixel point and a point on the preset reference surface, and there is also a corresponding relationship between the second pixel point and a point on the preset reference surface, the pixel points corresponding to the second shooting component can be determined from the pixel points in the image collected by the first shooting component, and the image corresponding to the second shooting component can then be generated from these second pixel points.
  • The preset reference surface is a spherical surface with the optical center of the first photographing component as the center of the sphere. Based on this design, when the coordinate system of the first shooting component is rotated by the preset angle to obtain the second shooting component, the preset reference surface does not change: the preset reference surfaces corresponding to the first shooting component and the second shooting component are the same spherical surface. The preset reference surface can therefore be used as an intermediary to determine, from the pixel points in the image collected by the first shooting component, the pixel points in the image corresponding to the second shooting component, and to generate the image corresponding to the second shooting component.
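  • A minimal sketch of this pixel mapping under a pinhole model, with intrinsic matrices K1 and K2 for the two components and a rotation R that maps coordinates from the first component's frame to the second component's frame (variable names are illustrative, not taken from the application):

```python
import numpy as np

def map_pixel(p1, K1, K2, R):
    """Map a pixel of the first image to the second (virtual) image via the
    spherical reference surface centered at the first component's optical center."""
    u, v = p1
    # Back-project the first pixel onto a viewing ray and normalize it: this is
    # the first coordinate, a point on the unit sphere around the optical center.
    ray = np.linalg.inv(K1) @ np.array([u, v, 1.0])
    sphere_point = ray / np.linalg.norm(ray)
    # Express the same sphere point in the second component's coordinate system
    # (the sphere itself is unchanged; only the viewing frame rotates).
    sphere_in_cam2 = R @ sphere_point
    # Project into the second image to obtain the second pixel point.
    p2 = K2 @ sphere_in_cam2
    return p2[:2] / p2[2]
```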
  • Determining the second three-dimensional information of the first object in the second image based on the first three-dimensional information of the first object in the first image includes: determining the second three-dimensional information based on the first three-dimensional information and the coordinate transformation relationship between the first shooting component and the second shooting component. Based on this design, the three-dimensional information of an object in the image collected by the first shooting component can be converted into the three-dimensional information of the object in the image corresponding to the second shooting component.
  • Before determining the second three-dimensional information of the first object in the second image based on the first three-dimensional information of the first object in the first image, the method further includes: obtaining point cloud data that is collected by a sensor and corresponds to the first image, the point cloud data including third three-dimensional information of the first object; and determining the first three-dimensional information based on the third three-dimensional information and the coordinate conversion relationship between the first shooting component and the sensor. Based on this design, the three-dimensional information of the object collected by the sensor can be converted into the three-dimensional information of the object in the image collected by the first shooting component.
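  • A hedged sketch of these two conversions for the 3D center and orientation of one object, assuming the extrinsics are a rotation plus translation from the sensor (e.g. lidar) frame to the first component's frame, and a pure rotation between the first and second components (all names and the yaw convention are assumptions of this sketch):

```python
import numpy as np

def lidar_to_camera(center_lidar, R_cl, t_cl):
    """Third 3D info (sensor frame) -> first 3D info (first shooting component frame)."""
    return R_cl @ center_lidar + t_cl

def camera1_to_camera2(center_cam1, yaw_cam1, R_12):
    """First 3D info (first component frame) -> second 3D info (second component frame);
    R_12 maps coordinates from the first component's frame to the second's."""
    center_cam2 = R_12 @ center_cam1
    # For a rotation about the camera's vertical (y) axis, the object's orientation
    # angle in the new frame shifts by the rotation angle; the size is unchanged.
    psi = np.arctan2(R_12[0, 2], R_12[2, 2])
    yaw_cam2 = yaw_cam1 + psi
    return center_cam2, yaw_cam2
```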
  • the first three-dimensional information, the second three-dimensional information, or the third three-dimensional information include one or more of the following: three-dimensional coordinates, size, and orientation angle.
  • this application provides a recognition model training device.
  • the recognition model training device includes modules or units corresponding to the above methods.
  • The modules or units can be implemented by hardware, by software, or by hardware executing corresponding software.
  • The recognition model training device includes an acquisition unit (or acquisition module) and a processing unit (or processing module); the acquisition unit is used to acquire the first image collected by the first shooting component, where the first shooting component corresponds to the first viewing angle.
  • The processing unit is configured to: generate a second image corresponding to a second shooting component according to the first image, where the second shooting component is determined according to the first shooting component and corresponds to a second viewing angle; determine second three-dimensional information of a first object in the second image according to first three-dimensional information of the first object in the first image; and train a recognition model according to the second image and the second three-dimensional information, where the recognition model is used to recognize objects in images collected by the first shooting component.
  • the second shooting component is a virtual shooting component; the processing unit is also configured to rotate the coordinate system of the first shooting component by a preset angle to obtain the second shooting component.
  • the internal parameters of the first shooting component and the second shooting component are the same; or, the internal parameters of the first shooting component and the second shooting component are different.
  • The processing unit is specifically configured to: determine a first coordinate of a first pixel point in the first image on the preset reference surface, where the first pixel point is any pixel point in the first image; determine a second coordinate corresponding to the first pixel point and the second shooting component according to the first coordinate; determine a second pixel point according to the second coordinate; and generate the second image according to the second pixel point.
  • the preset reference surface is a spherical surface with the optical center of the first photographing component as the center of the sphere.
  • the processing unit is specifically configured to determine the second three-dimensional information based on the first three-dimensional information and the coordinate transformation relationship between the first shooting component and the second shooting component.
  • the acquisition unit is also used to acquire point cloud data corresponding to the first image collected by the sensor, and the point cloud data includes the third three-dimensional information of the first object;
  • the processing unit is further configured to determine the first three-dimensional information according to the third three-dimensional information and the coordinate transformation relationship between the first shooting component and the sensor.
  • the first three-dimensional information, the second three-dimensional information, or the third three-dimensional information include one or more of the following: three-dimensional coordinates, size, and orientation angle.
  • The present application provides a recognition model training device, including a processor coupled to a memory, where the processor is configured to execute a computer program stored in the memory, so that the recognition model training device performs the method described in the above first aspect and any one of its designs.
  • the memory can be coupled to the processor or independent of the processor.
  • the recognition model training device further includes a communication interface, and the communication interface can be used to communicate with other devices.
  • the communication interface may be a transceiver, an input/output interface, an interface circuit, an output circuit, an input circuit, a pin or a related circuit, etc.
  • the recognition model training device of the third or fourth aspect may be a computing platform in an intelligent driving system, and the computing platform may be a vehicle-mounted computing platform or a cloud computing platform.
  • the present application provides an electronic device, which includes the recognition model described in the above first or second aspect and any one of the designs.
  • the electronic device further includes the first photographing component as described in the above first aspect or the second aspect and any one of the designs.
  • the first photographing component may be a monocular camera.
  • the present application provides a computer-readable storage medium.
  • the computer-readable storage medium includes a computer program or instructions.
  • When the computer program or instructions are run on a recognition model training device, the recognition model training device is caused to perform the method described in the above first aspect and any one of its designs.
  • the present application provides a computer program product.
  • the computer program product includes: a computer program or instructions.
  • When the computer program or instructions are run on a computer, the computer is caused to execute the method described in the above first aspect and any one of its designs.
  • the present application provides a chip system, including at least one processor and at least one interface circuit.
  • the at least one interface circuit is used to perform transceiver functions and send instructions to at least one processor.
  • When the at least one processor executes the instructions, the at least one processor performs the method described in the above first aspect and any one of its designs.
  • The present application provides a movable smart device, including a first photographing component and the recognition model training device described in the above second aspect or third aspect and any one of the designs, where the first photographing component is used to collect a first image and transmit the first image to the recognition model training device.
  • Figure 1 is a schematic diagram of point cloud data provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of the coordinate system of a monocular camera and the coordinate system of a lidar provided by an embodiment of the present application;
  • Figure 3 is a schematic diagram of visual 3D information provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the orientation angle of an obstacle provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of the distribution of the orientation angles obtained for training the recognition model in the related solution;
  • Figure 6 is a schematic architectural diagram of a system provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • Figure 8 is a functional block diagram of a vehicle provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a camera deployed on a vehicle according to an embodiment of the present application.
  • Figure 10 is a schematic flow chart of a recognition model training method provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of a second shooting component provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of the distribution of orientation angles obtained for training the recognition model provided by the embodiment of the present application.
  • Figure 13 is a schematic flow chart of another recognition model training method provided by an embodiment of the present application.
  • Figure 14 is a schematic diagram of a preset reference plane provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of 3D information predicted by a recognition model according to an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a recognition model training device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a chip system provided by an embodiment of the present application.
  • In this application, “A/B” can mean A or B. The term “and/or” describes only an association relationship between associated objects and indicates that three relationships can exist; for example, “A and/or B” can mean: A exists alone, both A and B exist, or B exists alone, where A and B can be singular or plural.
  • plural means two or more than two.
  • “At least one of the following” or similar expressions refer to any combination of these items, including a single item or any combination of a plurality of items.
  • at least one of a, b, or c can mean: a, b, c, a and b, a and c, b and c, a and b and c, where a, b, c can be single or multiple.
  • words such as “first” and “second” are used to distinguish identical or similar items with basically the same functions and effects. Those skilled in the art can understand that words such as “first” and “second” do not limit the number or the execution order.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations or explanations. Any embodiment or design described as “exemplary” or “such as” in the embodiments of the present application is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner that is easier to understand.
  • the size of the sequence numbers of the processes does not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • Some optional features in the embodiments of this application can, in some scenarios, be implemented independently without relying on other features to solve the corresponding technical problems and achieve the corresponding effects; in other scenarios, they can also be combined with other features as needed.
  • a monocular camera refers to a camera that has only one lens and cannot directly measure depth information.
  • a bird's-eye view is a three-dimensional view based on the principle of perspective, using high-angle perspective to look down at the ground undulations from a high point, that is, the image seen when looking down at a certain area from the air.
  • the recognition model is an information processing system composed of a large number of processing units (for example, neurons) connected to each other.
  • the processing units in the recognition model contain corresponding mathematical expressions. After the data is input to the processing unit, the processing unit runs the mathematical expression it contains, performs calculations on the input data, and generates output data.
  • the input data of each processing unit is the output data of the previous processing unit connected to it, and the output data of each processing unit is the input data of the next processing unit connected to it.
  • After data is input, the recognition model selects corresponding processing units for the input data based on its own learning and training, uses these processing units to operate on the input data, and determines and outputs the final calculation result. At the same time, the recognition model can continue to learn and evolve during operation, continuously optimizing its own calculation process through feedback. The more the recognition model is trained, the more feedback it obtains and the more accurate its calculation results become.
  • the recognition model described in the embodiment of the present application is used to process the images collected by the shooting component and determine the three-dimensional information of the objects in the images.
  • these objects include, but are not limited to, various types of obstacles such as people and objects.
  • When a laser beam strikes the surface of an object, the reflected laser light carries information such as orientation and distance. If the laser beam is scanned along a certain trajectory, the reflected laser point information is recorded during scanning. Because the scanning is extremely fine, a large number of laser points can be obtained, each containing 3D coordinates, and together they form a laser point cloud; such laser point clouds can be called point cloud data. In some embodiments, these laser point clouds can be obtained by laser scanning with a lidar. Of course, the point cloud data can also be obtained by other types of sensors, and this application is not limited in this respect.
  • the point cloud data may include 3D information of one or more objects, such as 3D coordinates, size, orientation angle, etc.
  • Figure 1 shows point cloud data provided by the embodiment of the present application.
  • Each point in Figure 1 is a laser point
  • the position B of the circle is the position of the sensor that obtains point cloud data.
  • the 3D box with an arrow (such as 3D box A) is a 3D model established based on the 3D information of the scanned object.
  • the coordinates of the 3D box are the 3D coordinates of the object.
  • the length, width, and height of the 3D box are the size of the object, and the direction of the arrow of the 3D box is the orientation of the object, etc.
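  • Such per-object 3D information can be represented, for example, by a simple structure like the following (an illustrative sketch; the field names are not taken from the application):

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    """Illustrative container for the 3D information of one object."""
    x: float          # 3D coordinates of the box center
    y: float
    z: float
    length: float     # size of the object
    width: float
    height: float
    yaw: float        # orientation angle of the object
```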
  • the coordinate system of the monocular camera may be a three-dimensional coordinate system with the optical center of the monocular camera as the origin.
  • FIG. 2 shows a schematic diagram of the coordinate system of a monocular camera provided by an embodiment of the present application.
  • the coordinate system of the monocular camera can be a spatial rectangular coordinate system with the optical center o of the monocular camera as the origin, where the line of sight direction of the monocular camera is the positive z-axis direction, downward is the positive direction of the y-axis, and to the right is the positive direction of the x-axis.
  • the y-axis and x-axis can be parallel to the imaging plane of the monocular camera.
  • the coordinate system of the lidar can be a three-dimensional coordinate system with the laser emission center as the origin.
  • (2) in Figure 2 shows a schematic diagram of a coordinate system of a lidar provided by an embodiment of the present application.
  • the coordinate system of the lidar may be a spatial rectangular coordinate system with the laser emission center o of the lidar as the origin.
  • upward is the positive z-axis direction
  • forward is the positive x-axis direction
  • left is the positive y-axis direction.
  • the coordinate system of the monocular camera shown in (1) in Figure 2 and the coordinate system of the lidar shown in (2) in Figure 2 are only exemplary illustrations.
  • the coordinate system of the monocular camera and the coordinate system of the lidar can also be set in other ways, or set to other types of coordinate systems, which is not limited in this application.
  • In movable smart devices, taking autonomous vehicles as an example, visual sensors such as cameras are used to perceive the surrounding environment.
  • obstacle recognition based on monocular cameras is a low-cost visual perception solution.
  • the principle is to input the image collected by the monocular camera into a preset recognition model, and through the recognition model, the 3D information of each obstacle in the image in the monocular camera coordinate system can be output.
  • (1) in Figure 3 is a schematic diagram of visualizing the 3D information output by the recognition model on the image collected by the monocular camera provided by the embodiment of the present application.
  • each 3D box shown in (1) in Figure 3 (such as 3D box C) is drawn based on the 3D information of the obstacle where the 3D box is located.
  • (2) in Figure 3 is a schematic diagram of visualizing the 3D information output by the recognition model on a bird's-eye view provided by the embodiment of the present application.
  • each 3D box (such as 3D box D, etc.) shown in (2) in Figure 3 is also drawn based on the 3D information of the obstacle where the 3D box is located.
  • the above recognition model can be trained by obtaining an image and 3D information of obstacles in the image.
  • the 3D information of the obstacles in the image can be obtained based on the point cloud data.
  • point cloud data corresponding to the image is obtained through lidar, and the point cloud data includes 3D information of the obstacle (the obstacle is the obstacle in the image) in the lidar coordinate system.
  • Then, by calibrating the extrinsics between the lidar and the monocular camera, the 3D information in the lidar coordinate system is converted into 3D information in the monocular camera coordinate system.
  • the above-mentioned recognition model can be trained using the acquired image and the aforementioned converted 3D information of the obstacles in the image in the monocular camera coordinate system.
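  • Written out, with calibrated extrinsics consisting of a rotation R and a translation t from the lidar coordinate system to the monocular camera coordinate system, this conversion of a 3D point is the usual rigid-body transform (standard form, not quoted from the application):

    P_{\text{camera}} = R \, P_{\text{lidar}} + t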
  • However, the 3D information of obstacles obtained in this way for training the recognition model is unevenly distributed.
  • Taking the orientation angle as an example, the orientation angle is usually obtained by adding two angles, denoted here as θ and α.
  • Figure 4 shows a schematic diagram of θ and α provided by the embodiment of the present application.
  • θ is the angle, in the horizontal direction, between the z-axis of the monocular camera's coordinate system and the line connecting the center point M of the obstacle with the origin o of that coordinate system, where the direction of arrow N is the direction of the obstacle.
  • α means that, in the coordinate system of the monocular camera, with the origin o as the center and the line from the origin o to the center point M of the obstacle as the radius, the obstacle is rotated around the y-axis of the coordinate system until it reaches the z-axis (for example, the obstacle is moved from position 1 to position 2); α is then the angle between the direction of the obstacle (that is, the direction of arrow N shown at position 2) and the x-axis direction.
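  • Consistent with these definitions, for an obstacle centered at (x, y, z) in the camera coordinate system of (1) in Figure 2, the angle θ can be computed directly from the position, and the orientation angle follows from α (a standard relation; the exact expression used in the application is not reproduced in this text):

    \theta = \arctan\left(\frac{x}{z}\right), \qquad \text{orientation angle} = \theta + \alpha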
  • Figure 5 shows the distribution of θ and α of the obstacles obtained in the above solution.
  • The two coordinate axes represent θ and α, respectively.
  • The distribution of θ and α is unbalanced. For example, when θ is 0 degrees, α is densely distributed in areas such as -180 degrees, -90 degrees, 0 degrees, 90 degrees, and 180 degrees, while in the other areas α is relatively sparsely distributed.
  • Both θ and α can be predicted by the trained recognition model, but θ is usually predicted relatively accurately, while α is easily affected by the imbalance of the training data. Therefore, when the unevenly distributed 3D information of obstacles obtained above is used to train the recognition model, the trained recognition model, affected by the uneven distribution of the training data, predicts α inaccurately.
  • For example, an α lying in a sparsely distributed area of Figure 5 may be predicted as an α in a densely distributed area, which makes the predicted orientation angle of the obstacle inaccurate; that is to say, the predicted 3D information of the obstacle is inaccurate.
  • To this end, this application provides a recognition model training method that can reduce the impact of the uneven distribution of training data on the prediction results of the recognition model, so that when the trained recognition model is used to identify obstacles, the obtained 3D information of the obstacles is more accurate.
  • FIG. 6 shows a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system 60 includes a training device 10, an execution device 20, and so on.
  • the training device 10 can run the recognition model and train the recognition model.
  • the training device 10 may be a computing platform in an intelligent driving system.
  • the training device 10 may be a server, a cloud device, or any other computing device with computing capabilities.
  • The server may be a single server, or a server cluster composed of multiple servers.
  • the training device 10 may also be a vehicle-mounted computing platform. This application does not place any limitation on the specific type of the training device 10 .
  • the recognition model trained by the training device 10 is configured on the execution device 20, so that the recognition model can be used to identify various objects.
  • the objects include but are not limited to various types of things such as pedestrians, vehicles, road signs, animals, and buildings.
  • the recognition model trained by the training device 10 can be configured into multiple execution devices 20 , and each execution device 20 can use its configured recognition model to identify various obstacles, etc.
  • each execution device 20 may be configured with one or more recognition models trained by the training device 10, and each execution device 20 may use its configured recognition model(s) to identify various obstacles.
  • The execution device 20 may be an artificial intelligence (AI) device with obstacle recognition requirements, such as a sweeping robot or an intelligent vehicle, or may be a device that has the function of controlling the aforementioned devices with obstacle recognition requirements, such as a desktop computer, a handheld computer, a notebook computer, or an ultra-mobile personal computer (UMPC).
  • Road signs can contain graphic road signs or text road signs.
  • Driving reference objects can be buildings or plants. Obstacles on the road can include dynamic objects (such as animals, pedestrians, moving vehicles, etc.) or stationary objects (such as stationary vehicles).
  • the training device 10 and the execution device 20 are different processors deployed on different physical devices (such as servers or servers in a cluster).
  • The execution device 20 may be a neural network processing unit (NPU), a graphics processing unit (GPU), a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc.
  • the training device 10 may be a GPU, NPU, microprocessor, ASIC, or one or more integrated circuits used to control the execution of the program of the present application.
  • FIG. 6 is only a simplified example diagram for ease of understanding. In practical applications, the above system may also include other devices, which are not shown in the figure.
  • the training device 10 in the embodiment of the present application can be implemented by different devices.
  • the training device 10 in the embodiment of the present application can be implemented by the communication device in FIG. 7 .
  • FIG. 7 is a schematic diagram of the hardware structure of the training device 10 provided by the embodiment of the present application.
  • the training device 10 includes at least one processor 701, a communication line 702, a memory 703 and at least one communication interface 704.
  • the processor 701 may be a general-purpose CPU, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the present application.
  • Communication line 702 may include a path that carries information between the above-mentioned components.
  • the communication interface 704 may be a module, circuit, bus, interface, transceiver, or other device that can implement a communication function.
  • the transceiver can be an independently configured transmitter, which can be used to send information to other devices.
  • the transceiver can also be an independently configured receiver, which can be used to receive information from other devices.
  • the transceiver may also be a component that integrates the functions of sending and receiving information. The embodiments of this application do not limit the specific implementation of the transceiver.
  • Memory 703 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 703 may exist independently and be connected to the processor 701 through the communication line 702 . Memory 703 may also be integrated with processor 701.
  • the memory 703 is used to store computer execution instructions for implementing the solution of the present application.
  • the processor 701 is used to execute computer execution instructions stored in the memory 703, thereby implementing the methods provided by the following embodiments of the present application.
  • the computer execution instructions in the embodiments of the present application may also be called application codes, instructions, computer programs or other names, which are not specifically limited in the embodiments of the present application.
  • the processor 701 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 7 .
  • the training device 10 may include multiple processors, such as the processor 701 and the processor 705 in FIG. 7 .
  • Each of these processors may be a single-CPU processor or a multi-CPU processor.
  • a processor here may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the above-mentioned training device 10 may be a general device or a special device.
  • the embodiment of the present application does not limit the type of the training device 10 .
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the training device 10 .
  • the training device 10 may include more or less components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • FIG. 8 shows a functional block diagram of the vehicle 100 .
  • the vehicle 100 is equipped with a trained recognition model, which can use the recognition model to identify objects in the environment during driving to ensure safe and accurate driving of the vehicle 100 .
  • the objects include but are not limited to road signs, driving reference objects, obstacles on the road, etc.
  • Vehicle 100 may include various subsystems, such as travel system 110 , sensor system 120 , control system 130 , one or more peripheral devices 140 , power supply 150 , computer system 160 , and user interface 170 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each subsystem and element of vehicle 100 may be interconnected via wires or wirelessly.
  • The travel system 110 includes components that provide powered motion for the vehicle 100.
  • travel system 110 includes engine 111 , transmission 112 , energy source 113 and wheels 114 .
  • the engine 111 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, or a hybrid engine composed of an internal combustion engine and an air compression engine.
  • Engine 111 converts energy source 113 into mechanical energy.
  • Examples of energy sources 113 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. Energy source 113 may also provide energy for other systems of vehicle 100 .
  • Transmission 112 may transmit mechanical power from engine 111 to wheels 114 .
  • Transmission 112 may include a gearbox, differential, and driveshaft.
  • the transmission device 112 may also include other components, such as a clutch.
  • the drive shaft may include one or more axles that may be coupled to one or more wheels 114 .
  • Sensor system 120 may include a number of sensors that sense information about the environment surrounding vehicle 100 .
  • the sensor system 120 includes a positioning system 121 (the positioning system can be a global positioning system (GPS), Beidou system or other positioning systems), an inertial measurement unit (inertial measurement unit, IMU) 122, and a radar 123 , lidar 124 and camera 125 .
  • Sensor data from one or more of these sensors can be used to detect objects and their corresponding properties (position, shape, orientation, speed, etc.). This detection and identification is a critical function for safe operation of autonomous driving of vehicle 100 .
  • Positioning system 121 may be used to estimate the geographic location of vehicle 100 .
  • the IMU 122 is used to sense changes in position and orientation of the vehicle 100 based on inertial acceleration.
  • IMU 122 may be a combination of an accelerometer and a gyroscope.
  • Radar 123 may utilize radio signals to sense objects within the surrounding environment of vehicle 100 . In some embodiments, in addition to sensing objects, radar 123 may be used to sense the speed and/or heading of the object.
  • LiDAR 124 may utilize laser light to sense objects in the environment in which vehicle 100 is located.
  • lidar 124 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • Camera 125 may be used to capture multiple images of the surrounding environment of vehicle 100, as well as multiple images within the vehicle's cabin. Camera 125 may be a still camera or a video camera. In some embodiments of the present application, the camera 125 may be a monocular camera.
  • the monocular camera includes but is not limited to a long-range camera, a medium-range camera, a short-range camera, a fish-eye camera, etc.
  • FIG. 9 shows a schematic diagram of a camera 125 deployed on a vehicle 100 provided by an embodiment of the present application.
  • the camera 125 can be deployed at various locations such as the front of the car, the rear of the car, both sides of the car body, and the roof of the car.
  • Control system 130 may control the operation of vehicle 100 and its components.
  • the control system 130 may include various elements such as a steering system 131, a throttle 132, a braking unit 133, a computer vision system 134, a route control system 135, and an obstacle avoidance system 136.
  • Steering system 131 is operable to adjust the forward direction of vehicle 100 .
  • it may be a steering wheel system.
  • Throttle 132 is used to control the operating speed of engine 111 and thereby the speed of vehicle 100 .
  • the braking unit 133 is used to control the deceleration of the vehicle 100 .
  • Computer vision system 134 may be operable to process and analyze images captured by camera 125 in order to identify objects and/or features in the surrounding environment of vehicle 100, as well as the physical and facial features of the driver in the vehicle cockpit.
  • the objects and/or features may include traffic signals, road conditions and obstacles, and the driver's physical features and facial features may include the driver's behavior, sight lines, expressions, etc.
  • the route control system 135 is used to determine the driving route of the vehicle 100 .
  • route control system 135 may combine data from sensors, positioning system 121 and one or more predetermined maps to determine a driving route for vehicle 100 .
  • Obstacle avoidance system 136 is used to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of vehicle 100 .
  • control system 130 may additionally or alternatively include components other than those shown and described. Alternatively, some of the components shown above may be reduced.
  • Peripheral devices 140 may include a wireless communication system 141 , an onboard computer 142 , a microphone 143 and/or a speaker 144 .
  • peripheral device 140 provides a means for a user of vehicle 100 to interact with user interface 170 .
  • Wireless communication system 141 may wirelessly communicate with one or more devices directly or via a communication network.
  • Power supply 150 may provide power to various components of vehicle 100 .
  • Computer system 160 may include at least one processor 161 that executes instructions 1621 stored, for example, in data storage device 162.
  • Computer system 160 may also be a plurality of computing devices that control independent components or subsystems of vehicle 100 in a distributed fashion.
  • Processor 161 may be any conventional processor, such as a CPU or a special purpose device such as an ASIC or other hardware-based processor.
  • Although FIG. 8 functionally illustrates the processor, data storage device, and other elements as being within the same physical enclosure, one of ordinary skill in the art will understand that the processor, computer system, or data storage device may actually include multiple processors, computer systems, or data storage devices within the same physical enclosure, or multiple processors, computer systems, or data storage devices stored in different physical enclosures.
  • the data storage device may be a hard drive, or other storage medium located within a different physical enclosure.
  • Therefore, a reference to a processor or computer system will be understood to include a reference to a collection of processors, computer systems, or data storage devices that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the deceleration component, may each have their own processor that only performs calculations related to component-specific functionality.
  • the processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle and others are performed by a remote processor, including taking the steps necessary to perform a single maneuver.
  • data storage 162 may contain instructions 1621 (eg, program logic) that may be executed by processor 161 to perform various functions of vehicle 100 , including those described above.
  • Data storage 162 may also contain additional instructions, including sending data to, receiving data from, interacting with, and/or performing operations on one or more of travel system 110 , sensor system 120 , control system 130 , and peripherals 140 Control instructions.
  • data storage 162 may store data such as road maps, route information, vehicle location, direction, speed and other such vehicle data, as well as other information. This information may be used by vehicle 100 and computer system 160 during operation of vehicle 100 in autonomous, semi-autonomous and/or manual modes.
  • a trained recognition model is stored in the data storage device 162 .
  • Based on the camera 125 in the sensor system 120, an image of the surrounding environment of the vehicle can be obtained, and the 3D information of each obstacle in the image can be obtained through the recognition model stored in the data storage device 162.
  • User interface 170 is used to provide information to or receive information from a user of vehicle 100.
  • user interface 170 may interact with one or more input/output devices within a set of peripheral devices 140 , such as one or more of wireless communication system 141 , on-board computer 142 , microphone 143 , and speaker 144 .
  • Computer system 160 may control vehicle 100 based on information obtained from various subsystems (eg, travel system 110 , sensor system 120 , and control system 130 ) as well as information received from user interface 170 .
  • one or more of these components described above may be installed separately or associated with vehicle 100 .
  • data storage device 162 may exist partially or completely separate from vehicle 100 .
  • the above components may be coupled together via wired and/or wireless means for communication.
  • the above-mentioned vehicle 100 can be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, an entertainment vehicle, a playground vehicle, a construction equipment, a tram, a golf cart, a train, a trolley, etc.
  • The embodiments of this application are not particularly limited in this respect.
  • the autonomous vehicle may also include a hardware structure and/or a software module to implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether one of the above functions is performed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • Figure 10 shows a recognition model training method provided by an embodiment of the present application.
  • the execution subject of this method can be the training device 10 shown in Figure 6, or a processor in the training device 10, such as the processor shown in Figure 7.
  • the method includes the following steps:
  • the first shooting component corresponds to the first viewing angle, that is, the first image is an image corresponding to the first viewing angle.
  • the first shooting component may be a monocular camera, which includes but is not limited to a long-range camera, a medium-range camera, a short-range camera, a fish-eye camera, etc.
  • the first shooting component can also be another device with an image acquisition function.
  • the first shooting component may also be called an image acquisition device, a sensor, etc. It can be understood that its name does not constitute a limitation on its function.
  • the second shooting component can be determined based on the first shooting component, and the second shooting component corresponds to the second viewing angle, that is, the second image is an image corresponding to the second viewing angle.
  • the first angle of view is different from the second angle of view, that is to say, the shooting angle corresponding to the first shooting component is different from the shooting angle corresponding to the second shooting component, that is, the first image is different from the second image.
  • the second shooting component is a virtual shooting component.
  • the coordinate system of the first photographing component can be rotated by a preset angle to obtain the second photographing component. That is to say, the coordinate system of the first shooting component is rotated by a preset angle to obtain a new coordinate system, and the shooting component corresponding to the new coordinate system is used as the second shooting component.
  • the external parameters between the second shooting component and the first shooting component can be determined based on one or more of the preset angle of rotation, the direction of rotation, etc. It can be understood that an external parameter refers to the conversion relationship between two coordinate systems; therefore, the external parameter between the second shooting component and the first shooting component can also be described as the coordinate transformation relationship (or coordinate system transformation relationship) between the second shooting component and the first shooting component.
  • the external parameter between the second shooting component and the first shooting component is a rotation matrix related to the preset angle.
  • Taking the case where the coordinate system of the first shooting component is the coordinate system shown in (1) in Figure 2 as an example, the external parameters between the second shooting component and the first shooting component are introduced below.
  • If the second photographing component is obtained by rotating the coordinate system of the first photographing component around the x-axis of that coordinate system by a preset angle, the external parameters between the second photographing component and the first photographing component can be expressed in the form of Formula 1.
  • In Formula 1, X represents the external parameter between the second shooting component and the first shooting component, and θ represents the preset angle of rotation.
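  • Formula 1 itself is not reproduced in this text; for a rotation about the x-axis by an angle θ, the standard rotation matrix takes the form

    X = R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}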
  • the external parameters between the second shooting component and the first shooting component can also be expressed in the form of formula 1a.
  • v and u are preset constants, which can be used to eliminate errors in corresponding parameters.
  • the constants used to eliminate the errors of each parameter can be the same, such as the values of v and u are the same, or they can be different, such as the values of v and u are different.
  • the overall error of Formula 1 can also be eliminated through an error matrix.
  • the error matrix can be a 3×3 matrix, and each element in the matrix is a preset constant. In other words, this application does not limit the expression form of the preset constants used to eliminate the error in Formula 1, nor the positions where the preset constants are added in Formula 1.
  • If the coordinate system of the first shooting component is rotated by a preset angle around the y-axis of that coordinate system, the external parameters between the second shooting component and the first shooting component can be expressed in the form of Formula 2.
  • Similarly, if the coordinate system of the first shooting component is rotated by a preset angle around the z-axis of that coordinate system, the external parameters between the second photographing component and the first photographing component can be expressed in the form of Formula 3.
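  • Formulas 2 and 3 are likewise not reproduced here; the standard rotation matrices about the y-axis and the z-axis are

    R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}, \qquad R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}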
  • the above examples are all about rotating the coordinate system of the first shooting component around a coordinate axis in the coordinate system of the first shooting component to obtain the second shooting component.
  • In some other embodiments, the coordinate system of the first shooting component can also be rotated around multiple coordinate axes of that coordinate system to obtain the second shooting component. For example: first rotate by a certain angle around the x-axis, and then rotate by a certain angle around the y-axis. Another example: first rotate by a certain angle around the z-axis, and then rotate by a certain angle around the x-axis.
  • Another example first rotate a certain angle around the y-axis, then rotate a certain angle around the x-axis, and finally rotate a certain angle around the z-axis.
  • The coordinate system can also be rotated one or more times around a given coordinate axis; for example: first rotate by a certain angle around the x-axis, then by a certain angle around the y-axis, and then again by a certain angle around the x-axis, etc.
  • the angles of these three rotations can be the same or different.
  • In these cases, the external parameters between the second shooting component and the first shooting component can be expressed by multiplying several of the above Formulas 1, 2, and 3. For example, if the coordinate system is first rotated by a certain angle around the x-axis and then by a certain angle around the y-axis, the external parameters between the second shooting component and the first shooting component can be expressed by multiplying the above Formula 1 and Formula 2, as in the form of Formula 4.
  • In Formula 4, X represents the external parameter between the second shooting component and the first shooting component, β represents the angle of rotation around the x-axis (0 degrees ≤ β ≤ 360 degrees), and λ represents the angle of rotation around the y-axis (0 degrees ≤ λ ≤ 360 degrees); β and λ can be the same or different.
  • Optionally, the preset angle (i.e., the rotation angle mentioned above) can be determined based on the angles of rotation around the individual coordinate axes; in this example, the preset angle can be determined based on β and λ, and in some examples may be the sum of β and λ.
  • Similarly, when the second shooting component is obtained by rotating around several other coordinate axes one or more times, the expression of the external parameters between the second shooting component and the first shooting component can also refer to the above implementation; detailed examples are not given here. A minimal sketch of composing such rotations follows below.
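  • As an illustration of composing such rotations, a minimal NumPy sketch is given below. It is not taken from the patent: the angle values are arbitrary, and the multiplication order shown is only one possible convention for "rotate about the x-axis first, then about the y-axis".

```python
import numpy as np

def rot_x(a):
    # rotation about the x-axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    # rotation about the y-axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    # rotation about the z-axis by angle a (radians)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# In the spirit of Formula 4: rotate about the x-axis by beta, then about the
# y-axis by lam; the composed matrix plays the role of the extrinsic X between
# the second (virtual) shooting component and the first shooting component.
beta, lam = np.deg2rad(10.0), np.deg2rad(30.0)
X = rot_y(lam) @ rot_x(beta)
```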
  • Optionally, instead of rotating the coordinate system of the first shooting component around a coordinate axis of that coordinate system, it can also be rotated around other directions to obtain the second shooting component. That is to say, in the embodiments of the present application, the coordinate system of the first shooting component can be rotated in any direction and by any angle to obtain the second shooting component, which is not limited in this application.
  • Illustratively, the coordinate system of the first shooting component 1100 can be as shown in (1) in Figure 11. If the coordinate system shown in (1) in Figure 11 is rotated to the left by N degrees (N being a positive number), a new coordinate system is obtained; illustratively, the new coordinate system can be the coordinate system shown in (2) in Figure 11, and the shooting component corresponding to the coordinate system shown in (2) in Figure 11 is the second shooting component.
  • Optionally, the coordinate system of the second shooting component can be set in the same way as that of the first shooting component, for example: the optical center of the shooting component is used as the origin of the coordinate system, and the line-of-sight direction of the shooting component is used as the z-axis direction.
  • Alternatively, the coordinate system of the second shooting component can be set differently from the coordinate system of the first shooting component, for example with a different origin or different coordinate axes; this application does not place any restrictions on this.
  • In other possible implementations, the coordinate system of the first shooting component and the coordinate system of the second shooting component can be the same type of coordinate system, for example both spatial rectangular coordinate systems, or they can be different types of coordinate systems.
  • In some possible implementations, the intrinsic parameters of the first shooting component and the second shooting component are the same, for example the intrinsic parameters of the first shooting component are directly used as the intrinsic parameters of the second shooting component. Alternatively, the intrinsic parameters of the first shooting component and the second shooting component are different, for example intrinsic parameters different from those of the first shooting component are set for the second shooting component.
  • the distortion coefficients of the first photographing component and the second photographing component may be the same or different.
  • For details about the intrinsic parameters and distortion coefficients of a shooting component, please refer to the introduction of camera intrinsic parameters and distortion coefficients in conventional technology, which is not repeated in this application.
  • Optionally, the first three-dimensional information of the first object in the first image is the three-dimensional information of the first object in the coordinate system of the first shooting component, and the second three-dimensional information of the first object in the second image is the three-dimensional information of the first object in the coordinate system of the second shooting component.
  • the second three-dimensional information of the first object may be determined based on the first three-dimensional information of the first object and the coordinate transformation relationship between the first shooting component and the second shooting component. It can be understood that the coordinate transformation relationship between the first shooting component and the second shooting component can also be described as an external parameter between the first shooting component and the second shooting component.
  • Optionally, before step S1003, the method shown in Figure 10 can also include the following steps S1003a and S1003b (not shown in the figure):
  • S1003a: Acquire the point cloud data, collected by a sensor, that corresponds to the first image, where the point cloud data includes third three-dimensional information of the first object. Optionally, the third three-dimensional information of the first object is three-dimensional information in the coordinate system of the sensor.
  • the sensor can be various sensors used to collect point cloud data, such as lidar.
  • the first three-dimensional information, or the second three-dimensional information, or the third three-dimensional information includes one or more of the following: three-dimensional coordinates, size, orientation angle, etc.
  • S1003b: Determine the first three-dimensional information according to the third three-dimensional information and the coordinate transformation relationship between the first shooting component and the sensor. It can be understood that this coordinate transformation relationship can also be described as the external parameters between the first shooting component and the sensor. A minimal sketch of these coordinate transformations follows below.
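  • The two coordinate transformations above can be sketched as follows. This is only an illustration with made-up extrinsics and a single object center, not the patent's implementation; in practice the size is unchanged by a rigid transform and the orientation angle is rotated along with the coordinates.

```python
import numpy as np

def transform_point(T, p):
    # T: 4x4 homogeneous rigid transform, p: 3D point
    return (T @ np.append(p, 1.0))[:3]

# Third 3D information: object center in the sensor (e.g. lidar) coordinate system.
center_lidar = np.array([12.0, 1.5, -0.8])

# Assumed extrinsics between the first shooting component and the sensor
# (identity here only as a placeholder; in practice this comes from calibration).
T_cam1_from_lidar = np.eye(4)

# S1003b: first 3D information, i.e. the center in the first component's frame.
center_cam1 = transform_point(T_cam1_from_lidar, center_lidar)

# Extrinsic between the second (virtual) and first shooting component:
# a pure rotation about the x-axis by the preset angle (illustrative value).
omega = np.deg2rad(15.0)
c, s = np.cos(omega), np.sin(omega)
T_cam2_from_cam1 = np.eye(4)
T_cam2_from_cam1[:3, :3] = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# S1003: second 3D information, i.e. the center in the second component's frame.
center_cam2 = transform_point(T_cam2_from_cam1, center_cam1)
```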
  • S1004: Train the recognition model according to the second image and the second three-dimensional information, where the recognition model is used to recognize objects in the images collected by the first shooting component.
  • Optionally, during training, the second three-dimensional information of the first object can also be projected onto the pixels of the second image using the imaging principle, based on the intrinsic parameters and distortion coefficients of the second shooting component, in order to train the recognition model. For an introduction to the imaging principle, please also refer to conventional technology; it is not repeated in this application. A minimal projection sketch follows below.
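  • A minimal sketch of such a projection with OpenCV is given below. The intrinsic matrix, distortion coefficients and 3D box corners are made-up values, and reusing the first component's intrinsics for the second component is just one of the options mentioned above.

```python
import numpy as np
import cv2

# Assumed intrinsics and distortion coefficients of the second shooting component.
K2 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])
dist2 = np.zeros(5)

# Second 3D information of the first object, here the 8 corners of its 3D box,
# already expressed in the coordinate system of the second shooting component.
corners_cam2 = np.array([[x, y, z] for x in (-1.0, 1.0)
                         for y in (-0.5, 0.5)
                         for z in (9.0, 13.0)], dtype=np.float64)

# The points are already in the camera frame, so rotation and translation are zero.
rvec = np.zeros(3)
tvec = np.zeros(3)
pixels, _ = cv2.projectPoints(corners_cam2, rvec, tvec, K2, dist2)
pixels = pixels.reshape(-1, 2)   # pixel coordinates on the second image
```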
  • Illustratively, taking the second three-dimensional information to be the orientation angle as an example, the orientation angle is usually obtained by adding θ and α. Illustratively, Figure 12 shows the distribution of θ and α of the objects obtained by the embodiment of the present application; the abscissa represents θ and the ordinate represents α. As can be seen from Figure 12, both the θ distribution and the α distribution are balanced.
  • Based on the above technical solution, an image corresponding to the second shooting component is generated from the image collected by the first shooting component, and the second shooting component can be determined according to the first shooting component; in this way, images with viewing angles different from that of the first shooting component can be obtained, making the image data richer. The three-dimensional information of the object in the image corresponding to the second shooting component is determined according to the three-dimensional information of the object in the image collected by the first shooting component, so three-dimensional information of the object under a viewing angle different from that of the first shooting component can be obtained, yielding more evenly distributed three-dimensional information. Training the recognition model with this more evenly distributed three-dimensional information can reduce the impact of unevenly distributed training data on the prediction results of the recognition model. Subsequently, when this recognition model is used to detect objects, the obtained three-dimensional information of the objects is more accurate.
  • In some embodiments, if there are multiple first shooting components, each first shooting component may correspond to one recognition model. For example, for each first shooting component, the recognition model training device can determine its corresponding second shooting component according to that first shooting component and thereby train the recognition model corresponding to that first shooting component; the recognition model corresponding to each first shooting component is subsequently only used to identify objects in the images collected by that first shooting component.
  • In other embodiments, if there are multiple first shooting components, the multiple first shooting components may correspond to one recognition model. For example, the model training device can determine the corresponding second shooting component according to the multiple first shooting components and thereby train the recognition model corresponding to the multiple first shooting components; subsequently, the trained recognition model can be used to identify objects in the images collected by the multiple first shooting components to which it corresponds. For instance, the recognition model training device can use a method such as the one shown in Figure 10 to generate images corresponding to the second shooting component based on the images collected by the multiple first shooting components, the second shooting component being determined according to the multiple first shooting components. Likewise, the second three-dimensional information of the object in the image corresponding to the second shooting component is determined based on the first three-dimensional information of the object in the images collected by the multiple first shooting components, and the recognition model is trained according to the image corresponding to the second shooting component and the second three-dimensional information of the object in that image.
  • Based on the solution of this embodiment, the image collected by each first shooting component and the first three-dimensional information of the object in that image can all be used to train the same recognition model, which increases the training data for the recognition model, improves data utilization, and reduces the cost of collecting training data. Moreover, training a single recognition model suffices for multiple first shooting components; that is, one trained model can recognize objects in the images collected by multiple first shooting components, so there is no need to train a separate recognition model for each first shooting component, which reduces training costs.
  • Optionally, in a possible implementation, step S1002 shown in Figure 10 can be specifically implemented as steps S1005 to S1008 shown in Figure 13. S1005: Determine the first coordinates of a first pixel point in the first image on a preset reference surface. Optionally, the preset reference surface may be a spherical surface with the optical center of the first shooting component as the center of the sphere.
  • Illustratively, Figure 14 shows a schematic diagram of a preset reference surface provided by an embodiment of the present application. As shown in (1) in Figure 14, the preset reference surface may be a spherical surface K with the optical center of the first shooting component as the center of the sphere, that is, a spherical surface K centered at the origin o of the coordinate system of the first shooting component.
  • Optionally, the preset reference surface can also be set in other ways, for example: a spherical surface centered at a point that is a preset distance from the optical center of the first shooting component, or a spherical surface centered at another position of the first shooting component. This application does not limit the specific setting method of the preset reference surface.
  • the preset reference surface may be a plane or a curved surface, which is not limited in this application.
  • It can be understood that there is a mapping relationship between points in the coordinate system of the first shooting component and pixel points in the first image. Taking point P in the coordinate system of the first shooting component as an example, the pixel point in the first image corresponding to point P can be determined to be point P1 according to the imaging principle, the intrinsic parameters and the distortion coefficients of the first shooting component. Normalizing point P onto the spherical surface K, as shown in (1) in Figure 14, the point to which P is normalized on the spherical surface K is point P'. Since there is a correspondence between point P and point P1, and a correspondence between point P and point P', there is also a correspondence between point P' and point P1. It should be understood that any point in the coordinate system of the first shooting component can be normalized to a corresponding point on the spherical surface K, and different points in the coordinate system of the first shooting component may be normalized to the same point on the spherical surface K. The points on the spherical surface K obtained by normalization are in one-to-one correspondence with the pixel points in the first image.
  • Taking the first pixel point to be point P1 as an example, the first coordinates of the first pixel point on the preset reference surface are the coordinates of point P' in the coordinate system of the first shooting component, that is, the coordinates of point P' in the coordinate system shown in (1) in Figure 14.
  • Optionally, the first pixel point can be any pixel point in the first image, and for each pixel point in the first image the first coordinates on the preset reference surface can be determined according to this step. Optionally, the first pixel point may also be a pixel point obtained by performing difference processing on the pixel points in the first image; this application does not impose any limitation on the first pixel point. Since difference processing is an existing technology, reference can be made to existing solutions for details, and it is not introduced in detail here. A minimal sketch of computing the first coordinates follows below.
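  • A minimal sketch of computing the first coordinates is given below. It assumes made-up intrinsics and distortion coefficients for the first shooting component and uses a unit-radius sphere; any fixed radius gives an equivalent reference surface.

```python
import numpy as np
import cv2

# Assumed intrinsics and distortion coefficients of the first shooting component.
K1 = np.array([[1000.0, 0.0, 960.0],
               [0.0, 1000.0, 540.0],
               [0.0, 0.0, 1.0]])
dist1 = np.zeros(5)

# First pixel point P1 in the first image (illustrative pixel coordinates).
p1 = np.array([[[1200.0, 700.0]]], dtype=np.float64)

# Undistort and normalize: yields the viewing ray (x, y, 1) in the camera frame.
xy = cv2.undistortPoints(p1, K1, dist1).reshape(2)
ray = np.array([xy[0], xy[1], 1.0])

# First coordinates: point P' obtained by normalizing the ray onto the sphere K
# centered at the optical center (the origin of the first component's frame).
p_prime = ray / np.linalg.norm(ray)
```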
  • In connection with the implementation described for Figure 11, the second shooting component can be obtained by rotating the coordinate system of the first shooting component by a preset angle. Therefore, when the second shooting component is obtained in this way, the spherical surface K shown in (1) in Figure 14 is also rotated by the preset angle, resulting in the spherical surface K1 shown in (2) in Figure 14.
  • The position of the point to which point P (in the camera coordinate system of the first shooting component shown in (1) in Figure 14) is normalized on the spherical surface K1 shown in (2) in Figure 14 does not change; however, since the coordinate system of the second shooting component shown in (2) in Figure 14 differs from the coordinate system of the first shooting component shown in (1) in Figure 14, the coordinates of point P' shown in (2) in Figure 14 in the coordinate system of the second shooting component (the coordinate system shown in (2) in Figure 14) are the second coordinates.
  • the second coordinates corresponding to the first pixel point and the second shooting component may be determined based on the first coordinates and the external parameters between the second shooting component and the first shooting component.
  • In a possible implementation, taking the external parameters between the second shooting component and the first shooting component to be a rotation matrix related to the preset angle as an example, the second coordinates corresponding to the first pixel point and the second shooting component can be obtained by left-multiplying the first coordinates by the aforementioned rotation matrix. Illustratively, taking the rotation matrix to be the matrix in Formula 1 as an example, the second coordinates corresponding to the first pixel point and the second shooting component can be expressed in the form of Formula 5, that is, the second coordinates (x2, y2, z2) are the product of the rotation matrix and the first coordinates (x1, y1, z1).
  • Similarly, when the rotation matrix takes other forms (such as the matrices in Formulas 2 to 4), the expression of the second coordinates corresponding to the first pixel point and the second shooting component can also refer to the implementation of Formula 5; no further examples are given here.
  • S1007: Determine the second pixel point according to the second coordinates. Likewise, there is also a correspondence between the points on the spherical surface K1 and the pixel points of the second image corresponding to the second shooting component; according to the imaging principle, the points on the spherical surface K1 can be mapped to the pixel points of the second image.
  • Taking the second coordinates to be the coordinates of point P' shown in (2) in Figure 14 in the coordinate system shown in (2) in Figure 14 as an example, the second pixel point is P2. It can be understood that, for each first pixel point in the first image, the corresponding second pixel point can be determined according to the solution of the embodiment of the present application.
  • S1008: Generate the second image according to the second pixel points. Illustratively, the second image can be generated from the second pixel points by calling the remap function of the open source computer vision library (OpenCV). A minimal sketch of this step follows below.
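  • A minimal sketch of generating the second image with cv2.remap is given below. Note that remap expects, for every pixel of the destination (second) image, the source location in the first image, so the sketch computes the inverse of the per-pixel mapping described above; it also assumes, purely for brevity, that both shooting components share the same intrinsic matrix and have zero distortion.

```python
import numpy as np
import cv2

def virtual_view(img1, K, R):
    """Warp the first image into the view of a virtual camera rotated by R.

    Sketch only: both components share intrinsics K, distortion is ignored,
    and R rotates the first component's frame into the second component's.
    """
    h, w = img1.shape[:2]
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    rays2 = np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T
    rays1 = rays2 @ R                 # per pixel: R.T @ ray (inverse mapping)
    proj = rays1 @ K.T                # project onto the first image plane
    map_x = (proj[..., 0] / proj[..., 2]).astype(np.float32)
    map_y = (proj[..., 1] / proj[..., 2]).astype(np.float32)
    return cv2.remap(img1, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Example: second component obtained by a 15-degree rotation about the y-axis.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(15.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
img1 = np.zeros((1080, 1920, 3), dtype=np.uint8)   # stands in for the first image
img2 = virtual_view(img1, K, R)
```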
  • Illustratively, (1) in Figure 15 shows a schematic diagram of the orientations of objects predicted by a recognition model trained with the existing solution, and (2) in Figure 15 shows a schematic diagram of the orientations of the objects obtained by the recognition model trained according to the embodiment of the present application. The vehicles shown in Figure 15 are the objects.
  • The white rectangular box of each vehicle is the 3D box drawn according to the 3D information of that vehicle predicted by the recognition model, and the direction of the white arrow is the direction of the front of the vehicle, that is, the orientation of the vehicle. It can be seen from Figure 15 that the accuracy of the object orientations obtained by the recognition model trained according to the embodiment of the present application is higher than the accuracy of the corresponding object orientations predicted by the model trained with the existing solution.
  • The above mainly introduces the solution provided by the embodiments of this application from the perspective of the method. It can be understood that, in order to implement the above functions, the recognition model training device includes corresponding hardware structures and/or software modules for executing each function. In combination with the units and algorithm steps of the examples described in the embodiments disclosed in this application, the embodiments of this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered to exceed the scope of the technical solutions of the embodiments of this application.
  • In the embodiments of this application, the recognition model training device can be divided into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing unit. The above integrated units can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of units in the embodiments of this application is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • Figure 16 is a schematic structural diagram of a recognition model training device 1600 provided by an embodiment of the present application.
  • the recognition model training device 1600 can be used to implement the methods described in each of the above method embodiments.
  • the recognition model training device may be the training device 10 as shown in Figure 6, or may be a module (such as a chip) applied to a server, and the server may be located in the cloud.
  • the recognition model training device 1600 may specifically include: an acquisition unit 1601 and a processing unit 1602.
  • the acquisition unit 1601 is used to support the recognition model training device 1600 in executing step S1001 in Figure 10 . And/or, the acquisition unit 1601 is also used to support the recognition model training device 1600 to execute step S1001 in FIG. 13 . And/or, the acquisition unit 1601 is also used to support the recognition model training device 1600 to perform other steps performed by the recognition model training device in the embodiment of the present application.
  • the processing unit 1602 is used to support the recognition model training device 1600 in executing steps S1002 to S1004 in FIG. 10 . And/or, the processing unit 1602 is also used to support the recognition model training device 1600 to perform steps S1005 to S1008 and steps S1003 to S1004 in FIG. 13 . And/or, the processing unit 1602 is also used to support the recognition model training device 1600 to perform other steps performed by the recognition model training device in the embodiment of the present application.
  • Optionally, the recognition model training device 1600 shown in Figure 16 may also include a communication unit 1603, which is used to support the recognition model training device 1600 in performing the steps of communication between the recognition model training device and other devices in the embodiments of the present application.
  • Optionally, the recognition model training device 1600 shown in Figure 16 may also include a storage unit (not shown in Figure 16) that stores programs or instructions. When the processing unit 1602 executes these programs or instructions, the recognition model training device 1600 shown in Figure 16 can execute the methods described in the above method embodiments.
  • the processing unit 1602 involved in the recognition model training device 1600 shown in Figure 16 can be implemented by a processor or processor-related circuit components, and can be a processor or a processing module.
  • the communication unit 1603 can be implemented by a transceiver or a transceiver-related circuit component, and can be a transceiver or a transceiver module.
  • An embodiment of the present application also provides a chip system; as shown in Figure 17, the chip system includes at least one processor 1701 and at least one interface circuit 1702.
  • the processor 1701 and the interface circuit 1702 may be interconnected by wires.
  • interface circuitry 1702 may be used to receive signals from other devices.
  • interface circuit 1702 may be used to send signals to other devices (eg, processor 1701).
  • the interface circuit 1702 can read instructions stored in the memory and send the instructions to the processor 1701.
  • When the instructions are executed by the processor 1701, the recognition model training device can be caused to perform the various steps performed by the recognition model training device in the above embodiments.
  • the chip system may also include other discrete devices, which are not specifically limited in the embodiments of this application.
  • Optionally, there can be one or more processors in the chip system.
  • the processor can be implemented in hardware or software.
  • the processor may be a logic circuit, an integrated circuit, or the like.
  • the processor may be a general-purpose processor implemented by reading software code stored in memory.
  • Optionally, there can also be one or more memories in the chip system. The memory can be integrated with the processor or provided separately from the processor, which is not limited by this application.
  • Illustratively, the memory can be a non-transitory memory, such as a read-only memory (ROM), which can be integrated on the same chip as the processor or provided separately on different chips. This application does not specifically limit the type of the memory or the manner in which the memory and the processor are arranged.
  • Illustratively, the chip system can be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • each step in the above method embodiment can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the method steps disclosed in conjunction with the embodiments of this application can be directly implemented by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • An embodiment of the present application also provides an electronic device, which includes the recognition model described in the above method embodiment.
  • the electronic device can be the above execution device.
  • the electronic device further includes the first shooting component described in the above method embodiment.
  • Embodiments of the present application also provide a computer storage medium that stores computer instructions; when the computer instructions are run on the recognition model training device, the recognition model training device is caused to execute the methods described in the above method embodiments.
  • Embodiments of the present application also provide a computer program product, which includes a computer program or instructions; when the computer program or instructions are run on a computer, the computer is caused to execute the methods described in the above method embodiments.
  • An embodiment of the present application provides a movable smart device, including a first shooting component and the recognition model training device described in the above embodiment.
  • The first shooting component is used to collect a first image and transmit the first image to the recognition model training device.
  • the embodiment of the present application also provides a device.
  • This device may be a chip, a component or a module.
  • the device may include a connected processor and a memory.
  • The memory is used to store computer-executable instructions; when the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the device executes the methods in each of the above method embodiments.
  • The recognition model training device, movable smart device, computer storage medium, computer program product and chip provided in the embodiments are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference can be made to the beneficial effects of the corresponding methods provided above, which are not described again here.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may be one physical unit or multiple physical units, that is, it may be located in one place, or it may be distributed to multiple different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • Integrated units may be stored in a readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
  • Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solutions, can be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions to cause a device (which can be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

本申请提供一种识别模型训练方法、装置以及可移动智能设备,涉及智能驾驶技术领域,能够使得采用训练得到的识别模型进行障碍物识别时,获得的障碍物的3D信息更加准确。方法包括:获取第一拍摄组件采集的第一图像,第一拍摄组件对应第一视角,根据第一图像生成第二拍摄组件对应的第二图像,第二拍摄组件根据第一拍摄组件确定,第二拍摄组件对应第二视角。根据第一图像中的第一对象的第一三维信息确定第二图像中的该第一对象的第二三维信息,根据所述第二图像以及所述第一对象的第二三维信息训练识别模型,识别模型用于识别第一拍摄组件采集的图像中的对象。

Description

识别模型训练方法、装置以及可移动智能设备
本申请要求于2022年08月22日提交国家知识产权局、申请号为202211009619.X、发明名称为“识别模型训练方法、装置以及可移动智能设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及智能驾驶领域,尤其涉及识别模型训练方法、装置以及可移动智能设备。
背景技术
目前,可移动智能设备(例如,智能驾驶车辆、扫地机器人等)可以利用视觉传感器(例如:摄像头)等获取环境信息,以进行障碍物的识别。其中,基于单目(monocular)相机的障碍物识别是一种低成本的视觉感知方案。其原理是将单目相机采集的图像输入预置的识别模型中,通过该识别模型即可输出该图像中包括的各障碍物的三维(3dimensions,3D)信息,比如:3D坐标,尺寸、朝向角等。
相关方案中,通过获取图像以及该图像中包括的障碍物的3D信息来训练上述识别模型。但是,由于获取到的障碍物的3D信息分布的不均衡,在采用该分布不均衡的障碍物的3D信息训练得到的识别模型进行障碍物识别时,获得的障碍物的3D信息不准确。
发明内容
本申请提供一种识别模型训练方法、装置以及可移动智能设备,能够使得采用训练得到的识别模型进行障碍物识别时,获得的障碍物的3D信息更加准确。
为达到上述目的,本申请采用如下技术方案:
第一方面,本申请提供一种识别模型训练方法,该方法包括:获取第一拍摄组件采集的第一图像,该第一拍摄组件对应第一视角;根据该第一图像生成第二拍摄组件对应的第二图像,该第二拍摄组件根据第一拍摄组件确定,该第二拍摄组件对应第二视角;根据该第一图像中的第一对象的第一三维信息确定该第二图像中的该第一对象的第二三维信息;根据所述该第二图像以及该第二三维信息训练识别模型,该识别模型用于识别该第一拍摄组件采集的图像中的对象。
基于上述技术方案,通过第一拍摄组件采集的图像生成第二拍摄组件对应的图像,而第二拍摄组件可以根据第一拍摄组件确定,第一拍摄组件对应第一视角,第二拍摄组件对应第二视角,这样可以获得与第一拍摄组件不同视角的图像,使得获得的图像数据更加丰富。根据第一拍摄组件采集的图像中的对象的三维信息确定第二拍摄组件对应的图像中的该对象的三维信息,可以获得与第一拍摄组件不同视角的图像中该对象的三维信息,进而获得分布地更加均衡的三维信息,然后利用该分布地更加均衡的三维信息训练识别模型,可以降低由于训练数据分布不均衡给识别模型的预测结果带来的影响。后续,在利用该识别模型进行对象检测时,获得的对象的三维信息更加准确。
一种可能的设计中,该第二拍摄组件为虚拟拍摄组件;在该根据该第一图像生成第二拍摄组件对应的第二图像之前,该方法还包括:将该第一拍摄组件的坐标系旋转预设角度获得该第二拍摄组件。基于该设计,通过将第一拍摄组件的坐标系旋转预设角度获得第二拍摄组件,这样可以获得一个虚拟的与第一拍摄组件视角不同的拍摄组件,进而可以将第一拍摄组件采集的图像以及该图像中的对象的三维信息,转换成该虚拟的拍摄组件对应的图像以及该图像中的对象的三维信息,因此能够获得不同视角的对象的三维信息,使得获得的训练数据分布地更加均衡,由此能够降低由于训练数据分布不均衡给训练的模型带来的影响。
一种可能的设计中,第一拍摄组件与第二拍摄组件的内参相同;或者,第一拍摄组件与第二拍摄组件的内参不同。
一种可能的设计中,该根据该第一图像生成第二拍摄组件对应的第二图像,包括:确定该第一图像中的第一像素点在预设参考面上的第一坐标,该第一像素点为该第一图像中的任意一个像素点;根据该第一坐标确定该第一像素点与该第二拍摄组件对应的第二坐标;根据该第二坐标确定第二像素点;根据该第二像素点生成该第二图像。基于该设计,以预设参考面为参考,由于第一像素点与预设参考面的点是存在对应关系的,而第二像素点与预设参考面上的点也存在对应关系。因此通过以预设参考面为中介,即可根据第一拍摄组件采集的图像中的像素点确定第二拍摄组件对应的像素点,并根据第二拍摄组件对应的第二像素点生成第二拍摄组件对应的图像。
一种可能的设计中,该预设参考面为以该第一拍摄组件的光心为球心的球面。基于该设计,预设参考面为以第一拍摄组件的光心为球心的球面,这样,在将第一拍摄组件的坐标系旋转预设角度获得第二拍摄组件时,预设参考面不会发生改变,第一拍摄组件与第二拍摄组件对应的预设参考面均为同一个球面,进而可以以该预设参考面为中介,根据第一拍摄组件采集的图像中的像素点确定第二拍摄组件对应的图像中的像素点,以生成第二拍摄组件对应的图像。
一种可能的设计中,该根据该第一图像中的第一对象的第一三维信息确定该第二图像中的该第一对象的第二三维信息,包括:根据该第一三维信息、该第一拍摄组件与该第二拍摄组件的坐标转换关系确定该第二三维信息。基于该设计,可以将第一拍摄组件采集的图像中的对象的三维信息转换为第二拍摄组件对应的图像中的该对象的三维信息。
一种可能的设计中,在该根据该第一图像中的第一对象的第一三维信息确定该第二图像中的该第一对象的第二三维信息之前,该方法还包括:获取传感器采集的与该第一图像对应的点云数据,该点云数据中包括该第一对象的第三三维信息;根据该第三三维信息以及该第一拍摄组件与该传感器之间的坐标转换关系确定该第一三维信息。基于该设计,可以将传感器采集的对象的三维信息转换成第一拍摄组件采集的图像中的该对象的三维信息。
一种可能的设计中,第一三维信息、第二三维信息、或者第三三维信息包括以下一种或多种:三维坐标、尺寸、朝向角。
第二方面,本申请提供一种识别模型训练装置,所述识别模型训练装置包括实现上述方法相应的模块或单元,该模块或单元可以通过硬件实现,软件实现,或者通过 硬件执行相应的软件实现。在一种可能的设计中,该识别模型训练装置包括获取单元(或称获取模块)和处理单元(或称处理模块);该获取单元,用于获取第一拍摄组件采集的第一图像,该第一拍摄组件对应第一视角。该处理单元,用于:根据该第一图像生成第二拍摄组件对应的第二图像,该第二拍摄组件根据该第一拍摄组件确定,该第二拍摄组件对应第二视角;根据该第一图像中的第一对象的第一三维信息确定该第二图像中的该第一对象的第二三维信息;根据该第二图像以及该第二三维信息训练识别模型,该识别模型用于识别该第一拍摄组件采集的图像中的对象。
一种可能的设计中,该第二拍摄组件为虚拟拍摄组件;该处理单元,还用于将该第一拍摄组件的坐标系旋转预设角度获得该第二拍摄组件。
一种可能的设计中,该第一拍摄组件与该第二拍摄组件的内参相同;或者,该第一拍摄组件与该第二拍摄组件的内参不同。
一种可能的设计中,该处理单元,具体用于:确定该第一图像中的第一像素点在预设参考面上的第一坐标,该第一像素点为该第一图像中的任意一个像素点;根据该第一坐标确定该第一像素点与该第二拍摄组件对应的第二坐标;根据该第二坐标确定第二像素点;根据该第二像素点生成该第二图像。
一种可能的设计中,该预设参考面为以该第一拍摄组件的光心为球心的球面。
一种可能的设计中,该处理单元,具体用于根据该第一三维信息、该第一拍摄组件与该第二拍摄组件的坐标转换关系确定该第二三维信息。
一种可能的设计中,该获取单元,还用于获取传感器采集的与该第一图像对应的点云数据,该点云数据中包括该第一对象的第三三维信息;该处理单元,还用于根据该第三三维信息以及该第一拍摄组件与该传感器之间的坐标转换关系确定该第一三维信息。
一种可能的设计中,第一三维信息、第二三维信息、或者第三三维信息包括以下一种或多种:三维坐标、尺寸、朝向角。
第三方面,本申请提供一种识别模型训练装置,包括:处理器,处理器与存储器耦合;处理器,用于执行存储器中存储的计算机程序,以使得识别模型训练装置执行如上述第一方面及其中任一设计所述的方法。可选的,存储器可以与处理器耦合,也可以独立于处理器。
一种可能的设计中,识别模型训练装置还包括通信接口,该通信接口可用于识别模型训练装置与其他装置通信。示例性的,该通信接口可以为收发器、输入/输出接口、接口电路、输出电路、输入电路、管脚或相关电路等。
上述第三或第四方面的识别模型训练装置可以是智能驾驶***中的计算平台,该计算平台可以是车载计算平台或云端计算平台。
第四方面,本申请提供一种电子设备,该电子设备包括如上述第一方面或第二方面及其中任一设计中所述的识别模型。
一种可能的设计中,该电子设备还包括如上述第一方面或第二方面及其中任一设计中所述的第一拍摄组件。示例性的,第一拍摄组件可以为单目相机。
第五方面,本申请提供一种计算机可读存储介质,计算机可读存储介质包括计算机程序或指令,当计算机程序或指令在识别模型训练装置上运行的情况下,使得识别 模型训练装置执行如上述第一方面及其中任一设计所述的方法。
第六方面,本申请提供一种计算机程序产品,所述计算机程序产品包括:计算机程序或指令,当所述计算机程序或指令在计算机上运行时,使得所述计算机执行如上述第一方面及其中任一设计所述的方法。
第七方面,本申请提供一种芯片***,包括至少一个处理器和至少一个接口电路,至少一个接口电路用于执行收发功能,并将指令发送给至少一个处理器,当至少一个处理器执行指令时,至少一个处理器执行如上述第一方面极其中任一设计所述的方法。
第八方面,本申请提供一种可移动智能设备,包括:第一拍摄组件以及上述第二方面或第三方面及其中任一设计所述的识别模型训练装置,所述第一拍摄组件用于采集第一图像,并将所述第一图像传输给所述识别模型训练装置。
需要说明的是,上述第二方面至第八方面中任一设计所带来的技术效果可以参见第一方面中对应设计所带来的技术效果,此处不再赘述。
附图说明
图1为本申请实施例提供的一种点云数据的示意图;
图2为本申请实施例提供的一种单目相机的坐标系以及激光雷达的坐标系的示意图;
图3为本申请实施例提供的一种可视化3D信息的示意图;
图4为本申请实施例提供的一种障碍物的朝向角的示意图;
图5为相关方案中获取到的用于训练识别模型的朝向角的分布示意图;
图6为本申请实施例提供的一种***的架构示意图;
图7为本申请实施例提供的一种训练设备的结构示意图;
图8为本申请实施例提供的一种车辆的功能框图;
图9为本申请实施例提供的一种车辆上部署的相机的示意图;
图10为本申请实施例提供的一种识别模型训练方法的流程示意图;
图11为本申请实施例提供的一种第二拍摄组件的示意图;
图12为本申请实施例提供的获取到的用于训练识别模型的朝向角的分布示意图;
图13为本申请实施例提供的又一种识别模型训练方法的流程示意图;
图14为本申请实施例提供的一种预设参考面的示意图;
图15为本申请实施例提供一种识别模型预测得到的3D信息的示意图;
图16为本申请实施例提供的一种识别模型训练装置的结构示意图;
图17为本申请实施例提供的一种芯片***的结构示意图。
具体实施方式
在本申请的描述中,除非另有说明,“/”表示前后关联的对象是一种“或”的关系,例如,A/B可以表示A或B;本申请中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。
在本申请的描述中,除非另有说明,“多个”是指两个或多于两个。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a和b,a 和c,b和c,a和b和c,其中a,b,c可以是单个,也可以是多个。
另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
本申请中的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。在本申请的各种实施例中,各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本申请实施例中的一些可选的特征,在某些场景下,可以不依赖于其他特征,而独立实施,解决相应的技术问题,达到相应的效果,也可以在某些场景下,依据需求与其他特征进行结合。
本申请中,除特殊说明外,各个实施例之间相同或相似的部分可以互相参考。在本申请中各个实施例中,如果没有特殊说明以及逻辑冲突,不同的实施例之间的术语和/或描述具有一致性、且可以相互引用,不同的实施例中的技术特征根据其内在的逻辑关系可以组合形成新的实施例。本申请实施方式并不构成对本申请保护范围的限定。
此外,本申请实施例描述的网络架构以及业务场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着网络架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
为便于理解,下面先对本申请实施例可能涉及的技术术语和相关概念进行介绍。
1、单目相机
单目相机指的是只有一个镜头且无法直接测量深度信息的相机。
2、鸟瞰图
鸟瞰图是根据透视原理,用高视点透视法从高处某一点俯视地面起伏绘制成的立体图,也即从空中俯视某一地区所看到的图像。
3、识别模型
识别模型是由大量处理单元(例如,神经元)互相连接组成的信息处理***,识别模型中的处理单元中包含有相应的数学表达式。数据输入处理单元之后,处理单元运行其包含的数学表达式,对输入数据进行计算,生成输出数据。其中,每个处理单元的输入数据为与其连接的上一个处理单元的输出数据,每个处理单元的输出数据为与其连接的下一个处理单元的输入数据。
在识别模型中,输入数据之后,识别模型根据自身的学习训练,为输入数据选择相应的处理单元,并利用这些处理单元对对输入数据进行计算,确定并输出最终的运算结果。同时,识别模型在数据运算过程中还可以不断学习进化,根据对运算结果的 反馈不断优化自身的运算过程,识别模型运算训练次数越多,得到的结果反馈越多,计算的结果越准确。
本申请实施例中所记载的识别模型用于对拍摄组件采集到的图像进行处理,确定图像中的对象的三维信息。示例性的,这些对象包括但不限于人、物等各种类型的障碍物等。
4、点云数据
当一束激光照射到物体表面时,所反射的激光会携带方位、距离等信息。若将激光束按照某种轨迹进行扫描,便会边扫描边记录到反射的激光点信息,由于扫描极为精细,则能够得到大量的激光点,每个激光点都包含3D坐标,因而就可形成激光点云,这些激光点云可称为点云数据。在一些实施例中,这些激光点云可以通过激光雷达的激光扫描获取。当然,这些点云数据也可以通过其他类型的传感器获取,本申请不局限于此。
可选的,点云数据中可能包括一个或多个对象的3D信息,例如:3D坐标,尺寸、朝向角等。示例性的,图1示出了本申请实施例提供的点云数据。图1中的每个点即为一个激光点,圆圈所在的位置B为获取点云数据的传感器的位置。带有箭头的3D框(如3D框A)为根据扫描到的对象的3D信息建立的3D模型,该3D框的坐标即为该对象的3D坐标,该3D框的长、宽、高即为该对象的尺寸,该3D框的箭头的方向即为该对象的朝向等。
5、单目相机的坐标系
单目相机的坐标系可以为以单目相机的光心为原点的三维坐标系。示例性的,图2中(1)示出了本申请实施例提供的一种单目相机的坐标系的示意图。如图2中(1)所示,单目相机的坐标系可以是以单目相机的光心o为原点的空间直角坐标系,其中,单目相机的视线方向为z轴正方向,向下为y轴正方向,向右为x轴正方向。可选的,y轴以及x轴可以与单目相机的成像平面平行。
6、激光雷达的坐标系
激光雷达的坐标系可以是以激光发射中心为原点的三维坐标系。示例性的,图2中(2)示出了本申请实施例提供的一种激光雷达的坐标系的示意图。如图2中(2)所示,激光雷达的坐标系可以是以激光雷达的激光发射中心o为原点的空间直角坐标系。其中,该激光雷达的坐标系中,向上为z轴正方向,向前为x轴正方向,向左为y轴正方向。
可以理解,图2中(1)所示的单目相机的坐标系以及图2中(2)所示的激光雷达的坐标系仅为示例性说明,单目相机的坐标系以及激光雷达的坐标系也可以有其他的设定方式,或者设定为其他类型的坐标系,本申请对此不做限定。
目前,可移动智能设备(以自动驾驶车辆为例)可以利用视觉传感器(如:摄像头)等获取环境信息,以进行障碍物的识别。其中,基于单目相机的障碍物识别是一种低成本的视觉感知方案。其原理是将单目相机采集的图像输入预置的识别模型中,通过该识别模型即可输出该图像中的各障碍物在单目相机坐标系下的3D信息。
示例性的,图3中(1)为本申请实施例提供的将该识别模型输出的3D信息可视化在单目相机采集的图像上的示意图。其中,图3中(1)所示的每个3D框(如3D 框C等)是根据该3D框所在的障碍物的3D信息绘制的。图3中(2)为本申请实施例提供的将该识别模型输出的3D信息可视化在鸟瞰图上的示意图。同理,图3中(2)所示的每个3D框(如3D框D等)也是根据该3D框所在的障碍物的3D信息绘制的。
目前,可以通过获取图像以及该图像中的障碍物的3D信息来训练上述识别模型。其中,该图像中的障碍物的3D信息可以根据点云数据获得。具体的,通过激光雷达获取与图像对应的点云数据,该点云数据中包括障碍物(该障碍物也即图像中的障碍物)在激光雷达坐标系下的3D信息。然后标定激光雷达与单目相机之间的外参(extrinsics),即可将激光雷达坐标系下的3D信息转换成单目相机坐标系下的3D信息。后续,即可利用获取的图像以及前述转换后的该图像中的障碍物的在单目相机坐标系下的3D信息训练上述识别模型。可以理解,外参指的是不同坐标系之间的转换关系,标定激光雷达与单目相机之间的外参也即求解激光雷达的坐标系(如图2中(2)所示的坐标系)与单目相机的坐标系(如图2中(1)所示的坐标系)之间的转换关系。
该方案中,获取到的用于训练识别模型的障碍物的3D信息是不均衡的。示例性的,以障碍物的3D信息为朝向角为例,该朝向角通常由θ与α相加得到。示例性的,结合图2中(1)所示的单目相机的坐标系,图4示出了本申请实施例提供的一种θ以及α的示意图。其中,θ为障碍物的中心点M与单目相机的坐标系的原点o的连线与单目相机的坐标系的z轴在水平方向上的夹角,其中,箭头N的方向为障碍物的朝向。α表示在单目相机的坐标系下,以该坐标系的原点o为中心,该原点o到障碍物的中心点M的连线为半径,将障碍物绕着单目相机的坐标系的y轴旋转至z轴,如将障碍物从位置1移动到位置2,此时障碍物的朝向(即位置2所示的箭头N的方向)与x轴方向之间的夹角。
示例性的,图5示出了该方案获取到的障碍物的θ以及α的分布图。如图5所示,横坐标代表θ,纵坐标代表α。从图5可以看出,θ以及α分布是不均衡的。比如:θ为0度时,在-180度、-90度、0度、90度、180度等区域,α分布较为密集,而除前述区域以外的其他区域,α分布较为稀疏等。
而θ以及α均可以通过训练得到的识别模型预测获得,但是θ通常预测的较为准确,而α容易受到训练数据不均衡的影响。因此,在采用上述获得的分布不均衡的障碍物的3D信息训练识别模型时,受训练数据分布不均衡的影响,训练得到的识别模型在预测α时预测的不准确,比如:会将图5所示的分布稀疏处的α预测成分布密集处的α,进而使得预测的障碍物的朝向角不准确,也就是说,预测得到的障碍物的3D信息不准确。
基于此,本申请提供一种识别模型训练方法,能够降低由于训练数据分布不均衡给识别模型预测的结果带来的影响,使得在采用训练得到的识别模型进行障碍物识别时,获得的障碍物的3D信息更加准确。
示例性的,图6示出了本申请实施例提供的一种***架构示意图。如图6所示,***60包括训练设备10、执行设备20等。
其中,训练设备10可运行识别模型,对识别模型进行训练。示例性的,训练设备10可以是智能驾驶***中的计算平台,例如,训练设备10可以是服务器、云端设备等各种具有计算能力的计算设备,服务器可以是一台服务器,也可以是由多台服务器 组成的服务器集群。训练设备10也可以是车载计算平台。本申请对训练设备10的具体类型不做任何限定。
执行设备20上配置有训练设备10训练完成的识别模型,以利用该识别模型进行各种对象的识别等。示例性的,该对象包括但不限于行人、车辆、路标、动物、建筑物等各种类型的事物。在一些实施例中,可以将训练设备10训练完成的识别模型配置到多个执行设备20中,每个执行设备20均可以利用其配置的识别模型进行各种障碍物的识别等。在一些实施例中,每个执行设备20中可以配置有一个或多个训练设备10训练完成的识别模型,每个执行设备20可以利用其配置的一个或多个识别模型进行各种障碍物的识别。
示例性的,执行设备20可以是人工智能(artificial intelligence,AI)设备,如:扫地机器人、智能车辆等各种具有障碍物识别需求的设备,还可以是桌面型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)等各种具有控制前述具有障碍物识别需求的设备的功能的设备。
例如,对于自动驾驶场景,自动驾驶汽车依据预定路线行驶过程中,利用识别模型对环境中的路标、行驶参照物和道路上的障碍物等进行识别,以确保自动驾驶汽车安全准确地行驶。路标可以包含图形路标或文字路标。行驶参照物可以是建筑物或植物。道路上的障碍物可以包括动态物体(如:动物、行人、行驶的车辆等)或静止物体(如:静止的车辆)。
作为一种可能的示例,训练设备10和执行设备20是部署在不同物理设备(如:服务器或集群中的服务器)上的不同处理器。
例如,执行设备20可以是神经网络处理器(neural network processing unit,NPU)、图形处理单元(graphic processing unit,GPU)、中央处理器(central processing unit,CPU)、其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者是任何常规的处理器等。
训练设备10可以是GPU、NPU、微处理器、ASIC、或一个或多个用于控制本申请方案程序执行的集成电路。
应理解,图6仅为便于理解而示例的简化示例图,在实际应用中,上述***中还可以包括其他设备,图中未予以画出。
示例性的,本申请实施例中的训练设备10可通过不同的设备实现。例如,本申请实施例中的训练设备10可通过图7中的通信设备实现。图7为本申请实施例提供的训练设备10的一种硬件结构示意图。该训练设备10包括至少一个处理器701,通信线路702,存储器703以及至少一个通信接口704。
处理器701可以是一个通用CPU,微处理器,ASIC,或一个或多个用于控制本申请方案程序执行的集成电路。
通信线路702可包括一通路,在上述组件之间传送信息。
通信接口704,用于与其他设备通信。在本申请实施例中,通信接口704可以是 模块、电路、总线、接口、收发器或者其它能实现通信功能的装置。可选的,当通信接口是收发器时,该收发器可以为独立设置的发送器,该发送器可用于向其他设备发送信息,该收发器也可以为独立设置的接收器,用于从其他设备接收信息。该收发器也可以是将发送、接收信息功能集成在一起的部件,本申请实施例对收发器的具体实现不做限制。
存储器703可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储器、光碟存储器(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质。存储器703可以是独立存在,通过通信线路702与处理器701相连接。存储器703也可以和处理器701集成在一起。
其中,存储器703用于存储用于实现本申请方案的计算机执行指令。处理器701用于执行存储器703中存储的计算机执行指令,从而实现本申请下述实施例提供的方法。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码、指令、计算机程序或者其它名称,本申请实施例对此不作具体限定。
在具体实现中,作为一种实施例,处理器701可以包括一个或多个CPU,例如图7中的CPU0和CPU1。
在具体实现中,作为一种实施例,训练设备10可以包括多个处理器,例如图7中的处理器701和处理器705。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
上述的训练设备10可以是一个通用设备或者是一个专用设备,本申请实施例不限定训练设备10的类型。
可以理解的是,本申请实施例示意的结构并不构成对训练设备10的具体限定。在本申请另一些实施例中,训练设备10可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
以执行设备20为自动驾驶车辆为例,图8示出了车辆100的功能框图。在一些实施例中,车辆100中配置有训练好的识别模型,在行驶过程中,其可以利用该识别模型对环境中的对象进行识别,以确保车辆100安全准确的行驶。示例性的,该对象包括但不限于路标、行驶参照物和道路上的障碍物等。
车辆100可包括各种子***,例如行进***110、传感器***120、控制***130、一个或多个***设备140、电源150、计算机***160和用户接口170。可选地,车辆100可包括更多或更少的子***,并且每个子***可包括多个元件。另外,车辆100的每个子***和元件可以通过有线或者无线互连。
行进***110包括为车辆100提供动力运动的组件。在一个实施例中,行进***110包括引擎111、传动装置112、能量源113和车轮114。引擎111可以是内燃引擎、电动机、空气压缩引擎或其他类型的引擎组合,例如汽油发动机和电动机组成的混动引擎,内燃引擎和空气压缩引擎组成的混动引擎。引擎111将能量源113转换成机械能量。
能量源113的示例包括汽油、柴油、其他基于石油的燃料、丙烷、其他基于压缩气体的燃料、乙醇、太阳能电池板、电池和其他电力来源。能量源113也可以为车辆100的其他***提供能量。
传动装置112可以将来自引擎111的机械动力传送到车轮114。传动装置112可包括变速箱、差速器和驱动轴。在一个实施例中,传动装置112还可以包括其他器件,比如离合器。其中,驱动轴可包括可耦合到一个或多个车轮114的一个或多个轴。
传感器***120可包括感测关于车辆100周边的环境的信息的若干个传感器。例如,传感器***120包括定位***121(定位***可以是全球定位***(global positioning system,GPS),也可以是北斗***或者其他定位***)、惯性测量单元(inertial measurement unit,IMU)122、雷达123、激光雷达124以及相机125。来自这些传感器中的一个或多个的传感器数据可用于检测对象及其相应特性(位置、形状、方向、速度等)。这种检测和识别是车辆100自动驾驶的安全操作的关键功能。
定位***121可用于估计车辆100的地理位置。IMU 122用于基于惯性加速度来感测车辆100的位置和朝向变化。在一个实施例中,IMU 122可以是加速度计和陀螺仪的组合。
雷达123可利用无线电信号来感测车辆100的周边环境内的物体。在一些实施例中,除了感测物体以外,雷达123还可用于感测物体的速度和/或前进方向。
激光雷达124可利用激光来感测车辆100所位于的环境中的物体。在一些实施例中,激光雷达124可包括一个或多个激光源、激光扫描器以及一个或多个检测器,以及其他***组件。
相机125可用于捕捉车辆100的周边环境的多个图像,以及车辆驾驶舱内的多个图像。相机125可以是静态相机或视频相机。在本申请的一些实施例中,相机125可以为单目相机,示例性的,单目相机包括但不限于长距相机、中距相机、短距相机、鱼眼相机等。示例性的,图9示出了本申请实施例提供的一种车辆100上部署的相机125的示意图。示例性的,相机125可以部署在车头、车尾、车身两侧、车顶等各种位置。
控制***130可控制车辆100及其组件的操作。控制***130可包括各种元件,例如转向***131、油门132、制动单元133、计算机视觉***134、路线控制***135以及障碍规避***136。
转向***131可操作来调整车辆100的前进方向。例如在一个实施例中可以为方向盘***。
油门132用于控制引擎111的操作速度并进而控制车辆100的速度。
制动单元133用于控制车辆100减速。
计算机视觉***134可以操作来处理和分析由相机125捕捉的图像以便识别车辆 100周边环境中的物体和/或特征以及车辆驾驶舱内的驾驶员的肢体特征和面部特征。所述物体和/或特征可包括交通信号、道路状况和障碍物,所述驾驶员的肢体特征和面部特征包括驾驶员的行为、视线、表情等。
路线控制***135用于确定车辆100的行驶路线。在一些实施例中,路线控制***135可结合来自传感器、定位***121和一个或多个预定地图的数据以为车辆100确定行驶路线。
障碍规避***136用于识别、评估和避免或者以其他方式越过车辆100的环境中的潜在障碍物。
当然,在一个实例中,控制***130可以增加或替换地包括除了所示出和描述的那些以外的组件。或者也可以减少一部分上述示出的组件。
车辆100通过***设备140与外部传感器、其他车辆、其他计算机***或用户之间进行交互。***设备140可包括无线通信***141、车载电脑142、麦克风143和/或扬声器144。在一些实施例中,***设备140提供车辆100的用户与用户接口170交互的手段。
无线通信***141可以直接地或者经由通信网络来与一个或多个设备无线通信。
电源150可向车辆100的各种组件提供电力。
车辆100的部分或所有功能受计算机***160控制。计算机***160可包括至少一个处理器161,处理器161执行存储在例如数据存储装置162中的指令1621。计算机***160还可以是采用分布式方式控制车辆100的独立组件或子***的多个计算设备。
处理器161可以是任何常规的处理器,诸如CPU或ASIC或其它基于硬件的处理器的专用设备。尽管图8功能性地图示了处理器、数据存储装置、和在相同物理外壳中的其它元件,但是本领域的普通技术人员应该理解该处理器、计算机***、或数据存储装置实际上可以包括存储在相同的物理外壳内的多个处理器、计算机***、或数据存储装置,或者包括存储在不同的物理外壳内的多个处理器、计算机***、或数据存储装置。例如,数据存储装置可以是硬盘驱动器,或位于不同于物理外壳内的其它存储介质。因此,对处理器或计算机***的引用将被理解为包括对可以并行操作的处理器或计算机***或数据存储装置的集合的引用,或者可以不并行操作的处理器或计算机***或数据存储装置的集合的引用。不同于使用单一的处理器来执行此处所描述的步骤,诸如转向组件和减速组件的一些组件每个都可以具有其自己的处理器,所述处理器只执行与特定于组件的功能相关的计算。
在此处所描述的各个方面中,处理器可以位于远离该车辆并且与该车辆进行无线通信。在其它方面中,此处所描述的过程中的一些在布置于车辆内的处理器上执行而其它则由远程处理器执行,包括采取执行单一操纵的必要步骤。
在一些实施例中,数据存储装置162可包含指令1621(例如,程序逻辑),指令1621可被处理器161执行来执行车辆100的各种功能,包括以上描述的那些功能。数据存储装置162也可包含额外的指令,包括向行进***110、传感器***120、控制***130和***设备140中的一个或多个发送数据、从其接收数据、与其交互和/或对其进行控制的指令。
除了指令1621以外,数据存储装置162还可存储数据,例如道路地图、路线信息,车辆的位置、方向、速度以及其它这样的车辆数据,以及其他信息。这种信息可在车辆100在自主、半自主和/或手动模式中操作期间被车辆100和计算机***160使用。可选的,在本申请的一些实施例中,数据存储装置162中存储有训练好的识别模型。
比如,在一种可能的实施例中,数据存储装置162可以获取车辆基于传感器120中的相机125采集到的周围环境中的图像,通过存储的识别模型获得该图像中的各障碍物的3D信息。
用户接口170,用于向车辆100的用户提供信息或从其接收信息。可选地,用户接口170可与***设备140的集合内的一个或多个输入/输出设备进行交互,例如无线通信***141、车载电脑142、麦克风143和扬声器144中的一个或多个。
计算机***160可基于从各种子***(例如,行进***110、传感器***120和控制***130)获取的信息以及从用户接口170接收的信息来控制车辆100。
可选地,上述这些组件中的一个或多个可与车辆100分开安装或关联。例如,数据存储装置162可以部分或完全地与车辆100分开存在。上述组件可以通过有线和/或无线的方式耦合在一起进行通信。
可选地,上述组件只是一个示例,实际应用中,上述各个模块中的组件有可能根据实际需要增添或者删除,图8不应理解为对本申请实施例的限制。
上述车辆100可以为轿车、卡车、摩托车、公共汽车、船、飞机、直升飞机、割草机、娱乐车、游乐场车辆、施工设备、电车、高尔夫球车、火车、和手推车等,本申请实施例不做特别的限定。
在本申请的另一些实施例中,自动驾驶车辆还可以包括硬件结构和/或软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能以硬件结构、软件模块、还是硬件结构加软件模块的方式来执行,取决于技术方案的特定应用和设计约束条件。
示例性的,图10示出了本申请实施例提供的一种的识别模型训练方法,该方法的执行主体可以为图6所示的训练设备10,也可以为训练设备10中的处理器,如图7所示的处理器。该方法包括以下步骤:
S1001、获取第一拍摄组件采集的第一图像。
其中,第一拍摄组件对应第一视角,即第一图像为第一视角对应的图像。示例性的,第一拍摄组件可以为单目相机,该单目相机包括但不限于长距相机、中距相机、短距相机、鱼眼相机等。可选的,第一拍摄相机也可为其他具有图像采集功能的装置。本申请实施例中,第一拍摄组件也可称为图像采集装置、传感器等,可以理解,其名称并不构成对其功能的限制。
S1002、根据第一图像生成第二拍摄组件对应的第二图像。
其中,第二拍摄组件可以根据第一拍摄组件确定,第二拍摄组件对应第二视角,即第二图像为第二视角对应的图像。第一视角与第二视角不同,也就是说,第一拍摄组件对应的拍摄角度与第二拍摄组件对应的拍摄角度不同,也即第一图像与第二图像不同。
在一种可能的实现方式中,第一拍摄组件为虚拟拍摄组件,在步骤S1002之前, 可以将第一拍摄组件的坐标系旋转预设角度获得第二拍摄组件。也就是说,将第一拍摄组件的坐标系旋转预设角度获得一个新坐标系,将该新坐标系对应的拍摄组件作为第二拍摄组件。
可选的,第二拍摄组件与第一拍摄组件之间的外参可以根据旋转的预设角度、旋转的方向等中的一种或多种确定。可以理解,外参指的是两个坐标系之间的转换关系,因此第二拍摄组件与第一拍摄组件之间的外参也可描述为,第二拍摄组件与第一拍摄组件之间的坐标转换关系(或称坐标系转换关系)。
在一些实施例中,第二拍摄组件与第一拍摄组件之间的外参为与预设角度有关的旋转矩阵,下面以第一拍摄组件的坐标系为图2中(1)所示的坐标系为例,对第二拍摄组件与第一拍摄组件之间的外参进行介绍。
在一种可能的示例中,以第二拍摄组件为将第一拍摄组件的坐标系绕第一拍摄组件的坐标系中的x轴旋转预设角度获得为例,则第二拍摄组件与第一拍摄组件之间的外参可以表示成公式1的形式。
其中,公式1中,X表示第二拍摄组件与第一拍摄组件之间的外参,ω表示旋转的预设角度,且0度≤ω≤360度。
可选的,由于公式1中的参数(如cos(ω)、sin(ω)、-sin(ω)等中的一个或多个)可能存在误差。因此,该示例下,第二拍摄组件与第一拍摄组件之间的外参还可以表示成公式1a的形式。
其中,公式1a中,v以及u均为预设常数,可用于消除对应参数的误差。可以理解的是,分别用于消除各参数的误差的常数可以相同,如v以及u的取值相同,也可以不同,如v以及u的取值不同。当然,也可以通过一个误差矩阵来消除公式1的整体误差,比如:该误差矩阵可以为3×3的矩阵,矩阵中的每个元素均为预设常数。也就是说,本申请并不限定用于消除公式1误差的预设常数的表示形式,也不限定该预设常数在公式1中添加的位置。
在另一种可能的示例中,以第二拍摄组件为将第一拍摄组件的坐标系绕第一拍摄组件的坐标系中的y轴旋转预设角度获得为例,则第二拍摄组件与第一拍摄组件之间的外参可以表示成公式2的形式。
其中,关于公式2中各参数的介绍可参考公式1中对应参数的介绍。
在又一种可能的示例中,以第二拍摄组件为将第一拍摄组件的坐标系绕第一拍摄组件的z轴旋转预设角度获得为例,则第二拍摄组件与第一拍摄组件之间的外参可表示成公式3的形式。
其中,关于公式3中各参数的介绍可参考公式1中对应参数的介绍。
可以理解,上述示例,均是将第一拍摄组件的坐标系绕第一拍摄组件的坐标系中的一个坐标轴进行旋转获得第二拍摄组件为例的,也可以将第一拍摄组件坐标系绕第一拍摄组件的坐标系中的多个坐标轴进行旋转获得第二拍摄组件,比如:先绕x轴旋转一定角度,再绕y轴旋转一定角度。再比如:先绕z轴旋转一定角度,再绕x轴旋转一定角度。再比如:先绕y轴旋转一定角度,再绕x轴旋转一定角度,最后绕z轴旋转一定角度。可选的,可以绕一个坐标轴旋转一次或多次,比如:先绕x轴旋转一定角度,再绕y轴旋转一定角度,再绕x轴旋转一定角度等。可选的,这三次旋转的角度可以相同也可以不同。
其中,在将第一拍摄组件的坐标系绕第一拍摄组件的坐标系中的多个坐标轴进行旋转获得第二拍摄组件时,第二拍摄组件与第一拍摄组件之间的外参可以由上述公式1、公式2、公式3中的多个相乘进行表示。以先绕x轴旋转一定角度,再绕y轴旋转一定角度为例,第二拍摄组件与第一拍摄组件之间的外参可以由上述公式1与公式2相乘表示,如公式4的形式。
其中,公式4中,X表示第二拍摄组件与第一拍摄组件之间的外参。β表示绕x轴旋转的角度,0度≤β≤360度。λ表示绕y轴旋转的角度,0度≤λ≤360度。β与λ可以相同也可以不同。可选的,预设角度(即上述α)可以根据绕各个坐标轴旋转的角度确定,比如,该示例中,预设角度可以根据β与λ确定。在一些示例中,预设角度可以为β与λ之和。
同理,在绕其他多个坐标轴旋转一次或多次获得第二拍摄组件的情况下,第二拍摄组件与第一拍摄组件之间的外参的表示也可参考上述实现,此处不再详细举例说明。
同理,也可以通过添加预设常数的方式消除公式2、公式3、公式4中的参数的误差,其具体实现可参考公式1a的相关实现,此处不再举例说明。
可选的,也可以不将第一拍摄组件的坐标系绕着第一拍摄组件的坐标系中的坐标轴进行旋转,而是绕着其他方向进行旋转,获得第二拍摄组件。也即本申请实施例中,可以将第一拍摄组件的坐标系向任意方向旋转任意角度,获得第二拍摄组件,本申请对此不做限定。
示例性的,结合图2中(1)所示的单目相机的坐标系,第一拍摄组件1100的坐标系可以如图11中(1)所示,如将图11中(1)所示的坐标系向左旋转N度,N为正数,获得一个新坐标系。示例性的,该新坐标系可以如图11中(2)所示的坐标系, 与图11中(2)所示的坐标系对应的拍摄组件(如图11中(2)所示的拍摄组件1110)即为第二拍摄组件。
可选的,第二拍摄组件的坐标系的设定方式可以和第一拍摄组件的设定方式相同,如:均是以拍摄组件的光心作为坐标系的原点,拍摄组件的视线方向作为z轴方向。或者,第二拍摄组件的坐标系的设定方式和第一拍摄组件的坐标系的设定方式也可以不同,例如:坐标系原点不同、坐标轴设定不同等。本申请对此不做任何限定。
在其他可能的实现方式中,第一拍摄组件的坐标系与第二拍摄组件的坐标系可以为相同类型的坐标系,如均为空间直角坐标系,也可以为不同类型的坐标系。
在一些可能的实现方式中,第一拍摄组件与第二拍摄组件的内参(intrinsics)相同,如直接将第一拍摄组件的内参作为第二拍摄组件的内参。或者,第一拍摄组件与第二拍摄组件的内参不同,如为第二拍摄组件设定与第一拍摄组件不同的内参。可选的,第一拍摄组件的与第二拍摄组件的畸变系数可以相同也可以不同。其中,关于拍摄组件的内参以及畸变系数的详细介绍请参考常规技术中相机的内参和畸变系数的介绍,本申请不再赘述。
S1003、根据第一图像中的第一对象的第一三维信息确定第二图像中的该第一对象的第二三维信息。
可选的,第一图像中的第一对象的第一三维信息为第一对象在第一拍摄组件的坐标系下的三维信息,第二图像中的该第一对象的第二三维信息为该第一对象在第二拍摄组件的坐标系下的三维信息。
可选的,可以根据该第一对象的第一三维信息、第一拍摄组件与第二拍摄组件的坐标转换关系确定该第一对象的第二三维信息。可以理解,第一拍摄组件与第二拍摄组件的坐标转换关系也可描述为第一拍摄组件与第二拍摄组件之间的外参。
可选的,在步骤S1003之前,图10所示的方法还可以包括以下步骤S1003a、S1003b(图中未示出):
S1003a、获取传感器采集的与第一图像对应的点云数据。
其中,点云数据中包括该第一对象的第三三维信息。可选的,该第一对象的第三三维信息为在传感器的坐标系下的三维信息。示例性的,该传感器可以为激光雷达等各种用于采集点云数据的传感器。
可选的,本申请实施例中,第一三维信息、或者第二三维信息、或者第三三维信息包括以下一种或多种:三维坐标、尺寸、朝向角等。
S1003b、根据第三三维信息以及第一拍摄组件与传感器之间的坐标转换关系确定第一三维信息。
可以理解,第一拍摄组件与传感器之间的坐标转换关系也可以描述为第一拍摄组件与传感器之间的外参。
S1004、根据第二图像以及第二三维信息训练识别模型。
其中,识别模型用于识别第一拍摄组件采集的图像中的对象。
可选的,在训练过程中,识别模型还可以根据第二拍摄组件的内参以及畸变系数等,利用成像原理将该第一对象的第二三维信息投影到第二图像的像素点上,以便训练该识别模型。关于此处成像原理的介绍也请参考常规技术中的介绍,本申请不再赘 述。
示例性的,以第二三维信息为朝向角为例,结合上文所述的朝向角的定义,该朝向角由通常由θ与α相加得到。示例性的,图12示出了本申请实施例获取到的对象的θ以及α的分布图。如图12所示,横坐标代表θ,纵坐标代表α。从图12可以看出,θ以及α分布都是均衡的。
基于上述技术方案,通过第一拍摄组件采集的图像生成第二拍摄组件对应的图像,而第二拍摄组件可以根据第一拍摄组件确定,这样可以获得与第一拍摄组件不同视角的图像,使得获得图像数据更加丰富。根据第一拍摄组件采集的图像中的对象的三维信息确定第二拍摄组件对应的图像中的该对象的三维信息,可以获得与第一拍摄组件不同视角的图像中的该对象的三维信息,进而获得分布地更加均衡的三维信息,然后利用该分布地更加均衡的三维信息训练识别模型,可以降低由于训练数据分布不均衡给识别模型预测的结果带来的影响。后续,在利用该识别模型进行对象的检测时,获得的对象的三维信息更加准确。
在一些实施例中,若存在多个第一拍摄组件,每个第一拍摄组件可以对应一个识别模型。如:对于每一个第一拍摄组件,识别模型训练装置可以根据该第一拍摄组件确定其对应的第二拍摄组件,以此训练该第一拍摄组件对应的识别模型。后续每个第一拍摄组件对应的识别模型仅用于识别对应的第一拍摄组件采集的图像中的对象。
在另一些实施例中,若存在多个第一拍摄组件,多个第一拍摄组件可以对应一个识别模型。如:对于这多个第一拍摄组件,模型训练装置可以根据这多个第一拍摄组件确定对应的第二拍摄组件,以此训练这多个第一拍摄组件对应的识别模型。后续,训练获得的该识别模型可用于识别其对应的这多个第一拍摄组件采集的图像中的对象。比如:识别模型训练装置可以采用诸如图10中所述的方法,分别根据多个第一拍摄组件采集的图像生成第二拍摄组件对应的图像,该第二拍摄组件根据该多个第一拍摄组件确定。同样的,分别根据多个第一拍摄组件采集的图像中的对象的第一三维信息,确定第二拍摄组件对应的图像中的该对象的第二三维信息。根据第二拍摄组件对应的图像以及第二拍摄组件对应的图像中的对象的第二三维信息训练识别模型。
基于该实施例的方案,每个第一拍摄组件采集的图像以及该图像中的对象的第一三维信息均可用于训练同一个识别模型,增加了训练该识别模型的训练数据,提高了数据的利用率,降低了采集训练数据的成本。并且,训练一个识别模型即可适用于多个第一拍摄组件,也就是说,一个训练模型即可识别多个第一拍摄组件采集的图像中的对象。这样,无需针对每个第一拍摄组件均训练一个识别模型,降低了训练成本。
可选的,在一种可能的实现方式中,图10中所示的步骤S1002可以具体实现为图13所示的步骤S1005至步骤S1008:
S1005、确定第一图像中的第一像素点在预设参考面上的第一坐标。
可选的,预设参考面可以为以第一拍摄组件的光心为球心的球面。
示例性的,图14示出了本申请实施例提供的一种预设参考面的示意图。如图14中(1)所示,预设参考面可以为以第一拍摄组件的光心为球心的球面K,也即以第一拍摄组件的坐标系的原点o为球心的球面K。
可选的,预设参考面也可以有其他的设定方式,比如:以距离第一拍摄组件的光 心预设距离的点为球心的球面,或者,以第一拍摄组件的其他位置为球心的球面。本申请并不限定预设参考面的具体设定方式。可选的,预设参考面可以为平面也可以为曲面,本申请对此也不做限定。
可以理解,第一拍摄组件的坐标系中的点与第一图像中的像素点是存在映射关系的。以第一拍摄组件的坐标系中的P点为例,根据第一拍摄组件的成像原理、内参以及畸变系数等,可以确定P点对应的第一图像中的像素点为P1点。
将P点归一化到球面K上,如图14中(1)所示,P点归一化到球面K上的点为P’点。由于P点与P1点之间存在对应关系,而P点与P’点之间也存在对应关系,因此P’点与P1点之间也存在对应关系。应理解,第一拍摄组件的坐标系中的任一点均可归一化到球面K上的对应点,对于第一拍摄组件的坐标系中的不同点,其归一化到球面K上的点可能是相同的。其中,归一化到的球面K上的点与第一图像中的像素点是一一对应的。以第一像素点为P1点为例,则第一像素点在预设参考面上的第一坐标即为P’点在第一拍摄组件的坐标系下的坐标,也即图14中(1)所示的坐标系中P’点的坐标。
可选的,第一像素点可以为第一图像中的任意像素点,对于第一图像中的每一个像素点均可以根据该步骤确定在预设参考面上的第一坐标。可选的,第一像素点也可以是对第一图像中的像素点进行差值处理获得的像素点,本申请对第一像素点不做任何限定。鉴于差值处理为现有技术,其具体可参考现有方案,本文不再详细介绍。
S1006、根据第一坐标确定第一像素点与第二拍摄组件对应的第二坐标。
结合图11所述的实现方式,由于第二拍摄组件可以通过将第一拍摄组件的坐标系旋转预设角度获得。因此,通过将第一拍摄组件的坐标系旋转预设角度得到第二拍摄组件时,诸如图14中(1)所示的球面K也旋转预设角度,得到图14中(2)所示的球面K1。而图14中(1)所示的第一拍摄组件的相机坐标系下的P点归一化到图14中(2)所示的球面K1上的点的位置不变,但是由于诸如图14中(2)所示第二拍摄组件的坐标系与图14中(1)所示的第一拍摄组件的坐标系是不同的,因此,图14中(2)所示的P’点在第二拍摄组件的坐标系下的坐标(如图14中(2)所示的坐标系)为第二坐标。
在一些实施例中,第一像素点与第二拍摄组件对应的第二坐标可以根据第一坐标以及第二拍摄组件与第一拍摄组件之间的外参确定。
在一种可能的实现方式中,以第二拍摄组件与第一拍摄组件之间的外参为与预设角度有关的旋转矩阵为例,第一像素点与第二拍摄组件对应的第二坐标系可以由第一坐标左乘前述旋转矩阵获得。
示例性的,以该旋转矩阵为公式1中的矩阵为例,则第一像素点与第二拍摄组件对应的第二坐标可以表示成公式5的形式。
其中,公式5中,为第一像素点与第二拍摄组件对应的第二坐标(x2,y2,z2),为第一坐标(x1,y1,z1),为旋转矩阵。
同理,也可以通过添加预设常数的方式消除公式5中的参数的误差,其具体实现可参考公式1a的相关实现,此处不再举例说明。
同理,在该旋转矩阵为其他形式(如公式2至公式4中的矩阵)的情况下,第一像素点与第二拍摄组件对应的第二坐标的表示也可参考公式5的实现,此处不再举例说明。
S1007、根据第二坐标确定第二像素点。
同样的,球面K1上的点与第二拍摄组件对应的第二图像上的像素点也存在对应关系,根据成像原理,即可将球面K1上的点映射到第二图像的像素点上。以第二坐标为图14中(2)所示的P’点在图14中(2)所示的坐标系中的坐标为例,则第二像素点为P2。
可以理解,对于第一图像中的每个第一像素点,均可以根据本申请实施例的方案确定对应的第二像素点。
S1008、根据第二像素点生成第二图像。
示例性的,可以通过调用开源图像处理库(open source computer vision library,OpenCV)的remap函数根据第二像素点生成第二图像。
其中,关于图13中其他步骤的介绍可参考图10中对应步骤的介绍。
示例性的,图15中(1)示出了现有方案训练的识别模型预测的对象的朝向的示意图,图15中(2)示出了本申请实施例训练得到的识别模型获得的对象的朝向的示意图。其中,图15中所示的车辆为对象,各车辆的白色矩形框为根据识别模型预测得到的该车辆的3D信息标注的3D框,白色箭头的方向为车头的方向,也即车辆的朝向。从图15中可以看出,本申请实施例训练得到的识别模型获得的对象的朝向的准确度要高于现有方案训练的模型预测的对应的对象的朝向的准确度。
上述主要是从方法的角度对本申请实施例提供的方案进行了介绍。可以理解的是,识别模型训练装置为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。结合本申请中所公开的实施例描述的各示例的单元及算法步骤,本申请实施例能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以对每个特定的应用来使用不同的方法来实现所描述的功能,但是这种实现不应认为超过本申请实施例的技术方案的范围。
本申请是实施例可以根据上述方法示例对识别模型训练装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理单元中。上述集成的单元既可以采用硬件的形式,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种 逻辑功能划分,实际实现时可以有另外的划分方式。
如图16所示,为本申请实施例提供的一种识别模型训练装置的结构示意图,该识别模型训练装置1600可用于实现以上各个方法实施例中记载的方法。可选的,该识别模型训练装置可以是如图6中所示的训练设备10,还可以是应用于服务器的模块(如芯片),该服务器可以位于云端。示例性的,该识别模型训练装置1600具体可以包括:获取单元1601、处理单元1602。
其中,获取单元1601用于支持识别模型训练装置1600执行图10中的步骤S1001。和/或,获取单元1601还用于支持识别模型训练装置1600执行图13中的步骤S1001。和/或,获取单元1601还用于支持识别模型训练装置1600执行本申请实施例中识别模型训练装置执行的其他步骤。
处理单元1602用于支持识别模型训练装置1600执行图10中的步骤S1002至步骤S1004。和/或,处理单元1602还用于支持识别模型训练装置1600执行图13中的步骤S1005至步骤S1008、以及步骤S1003至步骤S1004。和/或,处理单元1602还用于支持识别模型训练装置1600执行本申请实施例中识别模型训练装置执行的其他步骤。
可选的,图16所示的识别模型训练装置1600还可以包括通信单元1603,该通信单元1603,用于支持识别模型训练装置1600执行本申请实施例中识别模型训练装置与其他设备之间通信的步骤。
可选的,图16所示的识别模型训练装置1600还可以包括存储单元(图16中未示出),该存储单元存储有程序或指令。当处理单元1602执行该程序或指令时,使得图16所示的识别模型训练装置1600可以执行上述方法实施例所述的方法。
图16所示的识别模型训练装置1600的技术效果可以参考上述方法实施例所述的技术效果,此处不再赘述。图16所示的识别模型训练装置1600中涉及的处理单元1602可以由处理器或处理器相关电路组件实现,可以为处理器或处理模块。通信单元1603可以由收发器或收发器相关电路组件实现,可以为收发器或收发模块。
本申请实施例还提供一种芯片***,如图17所示,该芯片***包括至少一个处理器1701和至少一个接口电路1702。处理器1701和接口电路1702可通过线路互联。例如,接口电路1702可用于从其它装置接收信号。又例如,接口电路1702可用于向其它装置(例如处理器1701)发送信号。示例性的,接口电路1702可读取存储器中存储的指令,并将该指令发送给处理器1701。当所述指令被处理器1701执行时,可使得识别模型训练装置执行上述实施例中的识别模型训练装置执行的各个步骤。当然,该芯片***还可以包含其他分立器件,本申请实施例对此不作具体限定。
可选地,该芯片***中的处理器可以为一个或多个。该处理器可以通过硬件实现也可以通过软件实现。当通过硬件实现时,该处理器可以是逻辑电路、集成电路等。当通过软件实现时,该处理器可以是一个通用处理器,通过读取存储器中存储的软件代码来实现。
可选地,该芯片***中的存储器也可以为一个或多个。该存储器可以与处理器集成在一起,也可以和处理器分离设置,本申请并不限定。示例性的,存储器可以是非瞬时性处理器,例如只读存储器ROM,其可以与处理器集成在同一块芯片上,也可以分别设置在不同的芯片上,本申请对存储器的类型,以及存储器与处理器的设置方式 不作具体限定。
示例性的,该芯片***可以是现场可编程门阵列(field programmable gate array,FPGA),可以是专用集成芯片(application specific integrated circuit,ASIC),还可以是***芯片(system on chip,SoC),还可以是中央处理器(central processor unit,CPU),还可以是网络处理器(network processor,NP),还可以是数字信号处理电路(digital signal processor,DSP),还可以是微控制器(micro controller unit,MCU),还可以是可编程控制器(programmable logic device,PLD)或其他集成芯片。
应理解,上述方法实施例中的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。结合本申请实施例所公开的方法步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
本申请实施例还提供一种电子设备,该电子设备包括上述方法实施例所述的识别模型。可选的,该电子设备可以上述执行设备。
在一种可能的设计中,该电子设备还包括上述方法实施例所述的第一拍摄组件。
本申请实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在识别模型训练装置上运行时,使得识别模型训练装置执行上述方法实施例所述的方法。
本申请实施例提供一种计算机程序产品,该计算机程序产品包括:计算机程序或指令,当计算机程序或指令在计算机上运行时,使得该计算机执行上述方法实施例所述的方法。
本申请实施例提供一种可移动智能设备,包括第一拍摄组件以及上述实施例所述的识别模型训练装置,第一拍摄组件用于采集第一图像,并将第一图像传输给识别模型训练装置。
另外,本申请实施例还提供一种装置,这个装置具体可以是芯片,组件或模块,该装置可包括相连的处理器和存储器;其中,存储器用于存储计算机执行指令,当装置运行时,处理器可执行存储器存储的计算机执行指令,以使装置执行上述各方法实施例中的方法。
其中,本实施例提供的识别模型训练装置、可移动智能设备、计算机存储介质、计算机程序产品或芯片均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
通过以上实施方式的描述,所属领域的技术人员可以了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。各实施例在不冲突的情况下可以相互结合或相互参考。以上所描述的装置实施例仅仅是示意性的,例如,模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接, 可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (16)

  1. 一种识别模型训练方法,其特征在于,所述方法包括:
    获取第一拍摄组件采集的第一图像,所述第一拍摄组件对应第一视角;
    根据所述第一图像生成第二拍摄组件对应的第二图像,所述第二拍摄组件根据所述第一拍摄组件确定,所述第二拍摄组件对应第二视角;
    根据所述第一图像中的第一对象的第一三维信息确定所述第二图像中的所述第一对象的第二三维信息;
    根据所述第二图像以及所述第二三维信息训练识别模型,所述识别模型用于识别所述第一拍摄组件采集的图像中的对象。
  2. 根据权利要求1所述的方法,其特征在于,所述第二拍摄组件为虚拟拍摄组件;在所述根据所述第一图像生成第二拍摄组件对应的第二图像之前,所述方法还包括:
    将所述第一拍摄组件的坐标系旋转预设角度获得所述第二拍摄组件。
  3. 根据权利要求1或2所述的方法,其特征在于:
    所述第一拍摄组件与所述第二拍摄组件的内参相同;或者,
    所述第一拍摄组件与所述第二拍摄组件的内参不同。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述根据所述第一图像生成第二拍摄组件对应的第二图像,包括:
    确定所述第一图像中的第一像素点在预设参考面上的第一坐标,所述第一像素点为所述第一图像中的任意一个像素点;
    根据所述第一坐标确定所述第一像素点与所述第二拍摄组件对应的第二坐标;
    根据所述第二坐标确定第二像素点;
    根据所述第二像素点生成所述第二图像。
  5. 根据权利要求4所述的方法,其特征在于,所述预设参考面为以所述第一拍摄组件的光心为球心的球面。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述根据所述第一图像中的第一对象的第一三维信息确定所述第二图像中的所述第一对象的第二三维信息,包括:
    根据所述第一三维信息、所述第一拍摄组件与所述第二拍摄组件的坐标转换关系确定所述第二三维信息。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,在所述根据所述第一图像中的第一对象的第一三维信息确定所述第二图像中的所述第一对象的第二三维信息之前,所述方法还包括:
    获取传感器采集的与所述第一图像对应的点云数据,所述点云数据中包括所述第一对象的第三三维信息;
    根据所述第三三维信息以及所述第一拍摄组件与所述传感器之间的坐标转换关系确定所述第一三维信息。
  8. 一种识别模型训练装置,其特征在于,包括获取单元和处理单元;
    所述获取单元,用于获取第一拍摄组件采集的第一图像,所述第一拍摄组件对应第一视角;
    所述处理单元,用于:
    根据所述第一图像生成第二拍摄组件对应的第二图像,所述第二拍摄组件根据所述第一拍摄组件确定,所述第二拍摄组件对应第二视角;
    根据所述第一图像中的第一对象的第一三维信息确定所述第二图像中的所述第一对象的第二三维信息;
    根据所述第二图像以及所述第二三维信息训练识别模型,所述识别模型用于识别所述第一拍摄组件采集的图像中的对象。
  9. 根据权利要求8所述的装置,其特征在于,所述第二拍摄组件为虚拟拍摄组件;所述处理单元,还用于将所述第一拍摄组件的坐标系旋转预设角度获得所述第二拍摄组件。
  10. 根据权利要求8或9所述的装置,其特征在于:
    所述第一拍摄组件与所述第二拍摄组件的内参相同;或者,
    所述第一拍摄组件与所述第二拍摄组件的内参不同。
  11. 根据权利要求8-10任一项所述的装置,其特征在于,所述处理单元,具体用于:
    确定所述第一图像中的第一像素点在预设参考面上的第一坐标,所述第一像素点为所述第一图像中的任意一个像素点;
    根据所述第一坐标确定所述第一像素点与所述第二拍摄组件对应的第二坐标;
    根据所述第二坐标确定第二像素点;
    根据所述第二像素点生成所述第二图像。
  12. 根据权利要求11所述的装置,其特征在于,所述预设参考面为以所述第一拍摄组件的光心为球心的球面。
  13. 根据权利要求8-12任一项所述的装置,其特征在于,
    所述处理单元,具体用于根据所述第一三维信息、所述第一拍摄组件与所述第二拍摄组件的坐标转换关系确定所述第二三维信息。
  14. 根据权利要求8-13任一项所述的装置,其特征在于,
    所述获取单元,还用于获取传感器采集的与所述第一图像对应的点云数据,所述点云数据中包括所述第一对象的第三三维信息;
    所述处理单元,还用于根据所述第三三维信息以及所述第一拍摄组件与所述传感器之间的坐标转换关系确定所述第一三维信息。
  15. 一种识别模型训练装置,其特征在于,包括:处理器,所述处理器与存储器耦合;所述处理器,用于执行所述存储器中存储的计算机程序,以使得所述识别模型训练装置执行如权利要求1-7中任一项所述的方法。
  16. 一种可移动智能设备,其特征在于,包括:第一拍摄组件以及权利要求8-15中任意一项所述的识别模型训练装置,所述第一拍摄组件用于采集第一图像,并将所述第一图像传输给所述识别模型训练装置。
PCT/CN2023/083772 2022-08-22 2023-03-24 识别模型训练方法、装置以及可移动智能设备 WO2024040964A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211009619.X 2022-08-22
CN202211009619.XA CN117671402A (zh) 2022-08-22 2022-08-22 识别模型训练方法、装置以及可移动智能设备

Publications (1)

Publication Number Publication Date
WO2024040964A1 true WO2024040964A1 (zh) 2024-02-29

Family

ID=90012302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083772 WO2024040964A1 (zh) 2022-08-22 2023-03-24 识别模型训练方法、装置以及可移动智能设备

Country Status (2)

Country Link
CN (1) CN117671402A (zh)
WO (1) WO2024040964A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180045164A (ko) * 2016-10-25 2018-05-04 (주)훌리악 다각도 촬영 영상을 이용한 유화 복원 방법
CN111862296A (zh) * 2019-04-24 2020-10-30 京东方科技集团股份有限公司 三维重建方法及装置、***、模型训练方法、存储介质
CN111918049A (zh) * 2020-08-14 2020-11-10 广东申义实业投资有限公司 三维成像的方法、装置、电子设备及存储介质
CN112287867A (zh) * 2020-11-10 2021-01-29 上海依图网络科技有限公司 一种多摄像头的人体动作识别方法及装置
CN112733723A (zh) * 2021-01-12 2021-04-30 南京易自助网络科技有限公司 车型识别方法及终端
CN113469091A (zh) * 2021-07-09 2021-10-01 北京的卢深视科技有限公司 人脸识别方法、训练方法、电子设备及存储介质


Also Published As

Publication number Publication date
CN117671402A (zh) 2024-03-08

Similar Documents

Publication Publication Date Title
US11915492B2 (en) Traffic light recognition method and apparatus
US11966673B2 (en) Sensor simulation and learning sensor models with generative machine learning methods
JP6441993B2 (ja) レーザー点クラウドを用いる物体検出のための方法及びシステム
US11966838B2 (en) Behavior-guided path planning in autonomous machine applications
US11726189B2 (en) Real-time online calibration of coherent doppler lidar systems on vehicles
WO2021184218A1 (zh) 一种相对位姿标定方法及相关装置
US20240127062A1 (en) Behavior-guided path planning in autonomous machine applications
US20220198706A1 (en) Positioning method, apparatus, and system
US11734935B2 (en) Transferring synthetic lidar system data to real world domain for autonomous vehicle training applications
US8989944B1 (en) Methods and devices for determining movements of an object in an environment
US11592524B2 (en) Computation of the angle of incidence of laser beam and its application on reflectivity estimation
WO2022204855A1 (zh) 一种图像处理方法及相关终端装置
US20230047094A1 (en) Image processing method, network training method, and related device
RU2767949C2 (ru) Способ (варианты) и система для калибровки нескольких лидарных датчиков
CN112512887A (zh) 一种行驶决策选择方法以及装置
CN112810603B (zh) 定位方法和相关产品
CN114240769A (zh) 一种图像处理方法以及装置
CN114332845A (zh) 一种3d目标检测的方法及设备
WO2024040964A1 (zh) 识别模型训练方法、装置以及可移动智能设备
CN114167404A (zh) 目标跟踪方法及装置
CN115100630B (zh) 障碍物检测方法、装置、车辆、介质及芯片
CN110720025B (zh) 移动物体的地图的选择方法、装置、***和车辆/机器人
CN114549610A (zh) 一种点云数据的处理方法及相关装置
WO2024108380A1 (zh) 自动泊车方法及装置
CN115082886B (zh) 目标检测的方法、装置、存储介质、芯片及车辆

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856060

Country of ref document: EP

Kind code of ref document: A1