CN115457293A - Target detection network model training method based on pixel and point cloud feature fusion - Google Patents

Target detection network model training method based on pixel and point cloud feature fusion

Info

Publication number
CN115457293A
CN115457293A (application CN202211194508.0A)
Authority
CN
China
Prior art keywords
point cloud
pixel
image
network model
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211194508.0A
Other languages
Chinese (zh)
Inventor
王发平
庄鸿杰
李南星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Haixing Zhijia Technology Co Ltd
Original Assignee
Shenzhen Haixing Zhijia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Haixing Zhijia Technology Co Ltd filed Critical Shenzhen Haixing Zhijia Technology Co Ltd
Priority to CN202211194508.0A priority Critical patent/CN115457293A/en
Publication of CN115457293A publication Critical patent/CN115457293A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention relates to a target detection network model training method based on pixel and point cloud feature fusion, which comprises the following steps: acquiring a point cloud and an image of the driving environment in which an engineering vehicle is located; projecting the point cloud onto the image to obtain a projected image; acquiring pixel channel features of the projected image; and training an initial network model based on the pixel channel features to obtain a trained target detection network model. In this method, the point cloud is controlled to bind different numbers of pixel channels to obtain projected images, and the target detection network model is trained on the projected images and their corresponding image features. Performing target detection with a model trained in this way avoids errors caused by relative motion of the sensors, making target detection more accurate.

Description

Target detection network model training method based on pixel and point cloud feature fusion
Technical Field
Embodiments of the invention relate to the field of automatic driving, and in particular to a target detection network model training method based on pixel and point cloud feature fusion.
Background
LiDAR is a sensor that uses light pulses to measure the 3D coordinates of objects in a scene. Its drawbacks are sparsity and limited range: the farther an object is from the sensor, the fewer points are returned. A distant object may therefore yield only a few points, or none at all, and may not be picked out by the LiDAR sensor alone, making it difficult to distinguish the object's class, as shown in FIG. 1. Meanwhile, the image input from a vehicle-mounted camera is very dense, which benefits semantic understanding tasks such as detection and target segmentation. Thanks to its high resolution, a camera can detect distant targets very effectively, but it is not very accurate at measuring distance. Point cloud and image fusion algorithms are therefore widely applied.
However, in engineering applications, point cloud and image fusion faces many problems. First, vehicle vibration during motion causes relative movement of the two sensors, so the extrinsic parameters change to some extent and produce vibration errors. Second, limitations of the laser algorithm cause the laser points to jump continuously, and environmental factors such as lighting introduce small errors into the camera intrinsic calibration, so the jointly calibrated extrinsic parameters carry calibration errors. In addition, because the sensors have different clock precisions and frequencies, data acquired by the two sensors at the same moment carry a certain temporal and spatial offset; in effect, the algorithm's input data are not perfectly matched to begin with and deviate more or less for a variety of reasons.
Disclosure of Invention
In view of this, to solve the above technical problems or at least some of them, embodiments of the present invention provide a target detection network model training method based on pixel and point cloud feature fusion.
In a first aspect, an embodiment of the present invention provides a target detection network model training method based on pixel and point cloud feature fusion, including:
acquiring a point cloud and an image of a driving environment where the engineering vehicle is located;
projecting the point cloud onto the image to obtain a projected image;
acquiring pixel channel characteristics of the projected image;
and training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
In one possible embodiment, the method further comprises:
and when an error exists, binding the point cloud and the pixel channel of the image based on a preset rule to obtain a projected image.
In one possible embodiment, the method further comprises:
when only a transverse error or a longitudinal error exists, controlling each point cloud to bind a first preset number of pixel channels;
when the transverse error and the longitudinal error exist at the same time, controlling each point cloud to bind a second preset number of pixel channels;
and taking the image obtained after the point cloud and the pixel channel are bound as a projection image.
In one possible embodiment, the method further comprises:
acquiring the category of a pixel channel bound by each point cloud by adopting an image semantic segmentation method;
and determining the pixel channel characteristics of the pixel areas formed by the pixel channels bound by each point cloud based on the categories.
In one possible embodiment, the method further comprises:
and inputting the pixel channel characteristics into a multi-layer perceptron to perform characteristic mixing to obtain pixel channel mixing characteristics.
In one possible embodiment, the method further comprises:
inputting the pixel channel mixed characteristics into an initial network model for model training, determining that the initial network model is trained when the result output by the initial network model meets a preset condition, and taking the trained initial network model as a target detection network model.
In a second aspect, an embodiment of the present invention provides a target detection method based on pixel and point cloud feature fusion, including:
acquiring a point cloud and an image of a running environment where a target engineering vehicle is located;
projecting the point cloud onto the image to obtain a projected image;
and inputting the projection images into a target detection network model, and detecting a plurality of targets of the running environment of the target engineering vehicle.
In a third aspect, an embodiment of the present invention provides a training apparatus for a target detection network model based on fusion of pixel and point cloud features, including:
the acquisition module is used for acquiring a point cloud and an image of a driving environment where the engineering vehicle is located;
the projection module is used for projecting the point cloud onto the image to obtain a projected image;
the acquisition module is further used for acquiring pixel channel characteristics of the projection image;
and the training module is used for training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
In a fourth aspect, an embodiment of the present invention provides a target detection apparatus based on pixel and point cloud feature fusion, including:
the acquisition module is used for acquiring point clouds and images of a running environment where the target engineering vehicle is located;
the projection module is used for projecting the point cloud onto the image to obtain a projected image;
and the detection module is used for inputting the projection image into a target detection network model and detecting a plurality of targets of the running environment of the target engineering vehicle.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory, where the processor is configured to execute a training program of a target detection network model based on fusion of pixel and point cloud features and a target detection program based on fusion of pixel and point cloud features, both stored in the memory, so as to implement the training method of the target detection network model based on fusion of pixel and point cloud features in the first aspect and the target detection method based on fusion of pixel and point cloud features in the second aspect.
In a sixth aspect, an embodiment of the present invention provides a storage medium, including: the storage medium stores one or more programs, which are executable by one or more processors to implement the training method of the target detection network model based on fusion of pixel and point cloud features in the first aspect and the target detection method based on fusion of pixel and point cloud features in the second aspect.
According to the training scheme of the target detection network model based on fusion of pixel and point cloud features, a point cloud and an image of the driving environment in which the engineering vehicle is located are obtained; the point cloud is projected onto the image to obtain a projected image; pixel channel features of the projected image are acquired; and an initial network model is trained based on the pixel channel features to obtain a trained target detection network model. Compared with prior-art point cloud and image fusion, in which relative motion of the sensors may introduce errors and make target detection results inaccurate, this scheme controls the point cloud to bind different numbers of pixel channels to obtain projected images and trains the target detection network model on the projected images and their corresponding image features; a model trained in this way avoids errors caused by relative motion of the sensors, so target detection is more accurate.
According to the target detection scheme based on fusion of pixel and point cloud features, a point cloud and an image of the driving environment in which the target engineering vehicle is located are obtained; the point cloud is projected onto the image to obtain a projected image; and the projected image is input into a target detection network model to detect a plurality of targets in the driving environment of the target engineering vehicle. Because the scheme performs target detection with a fully trained target detection network model, errors caused by relative motion of the sensors are avoided, targets can be detected accurately, and target classes can be distinguished.
Drawings
Fig. 1 is a schematic flowchart of a training method of a target detection network model based on fusion of pixel and point cloud features according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another training method for a target detection network model based on fusion of pixel and point cloud features according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a pixel channel feature according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a feature module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the experience mode of an FMN network according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a self-learning mode of an FMN network according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a target detection method based on pixel and point cloud feature fusion according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a training apparatus for a target detection network model based on fusion of pixel and point cloud features according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a target detection apparatus based on fusion of pixel and point cloud features according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
Fig. 1 is a schematic flowchart of a method for training a target detection network model based on fusion of pixel and point cloud features according to an embodiment of the present invention, and as shown in fig. 1, the method specifically includes:
s11, point cloud and an image of a running environment where the engineering vehicle is located are obtained.
In the embodiment of the invention, a point cloud and an image of the driving environment of the engineering vehicle are obtained; the point cloud may be acquired by a LiDAR sensor and the image by a vehicle-mounted camera. Point clouds and images may be acquired of the driving environment in front of the engineering vehicle or in any direction around the vehicle.
It should be noted that the point cloud and the image are data corresponding to the same target at the same time and in the same space; they may be data stored in an open-source database or data acquired on site by the engineering vehicle, which is not limited by the present invention.
And S12, projecting the point cloud onto the image to obtain a projected image.
A color image is generally composed of R, G and B channels; an image composed of a single R, G or B channel is gray, and such a grayscale image is called a single-channel image. A pixel channel refers to the pixels of each channel. Without error, the point cloud could be projected in perfect correspondence with the image, but vehicle motion causes relative movement of the sensors and produces projection errors: the point cloud is projected onto the image in a misaligned state, and each point of the point cloud cannot correspond perfectly to each pixel channel of the image.
In order to make full use of the advantages of the point cloud and the image, in the embodiment of the invention the point cloud is projected onto the image. Specifically, the point cloud is aligned with the pixel channels of the image, and when an error exists, the point cloud and the pixel channels are bound according to a preset rule to obtain the projected image.
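As a minimal illustration of the projection step (not part of the disclosure), the sketch below maps LiDAR points into pixel coordinates with a pinhole model; the calibration matrices, array shapes and function name are assumptions:

```python
import numpy as np

def project_points(points, T_lidar_to_cam, K):
    """Project (N, 3) LiDAR points into integer pixel coordinates.

    T_lidar_to_cam: (4, 4) extrinsic matrix (assumed calibrated);
    K: (3, 3) camera intrinsic matrix.
    """
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]          # keep points in front of the camera
    uv = (K @ cam.T).T                # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]       # normalize by depth
    return np.round(uv).astype(int)
```

In the presence of vibration or calibration errors these coordinates are only approximately correct, which is exactly why the binding rule described below tolerates a neighborhood of pixels around each projected point.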
And S13, acquiring pixel channel characteristics of the projection image.
In the embodiment of the invention, an enhanced channel L may be selected to assist image semantic recognition of target categories. Image semantic segmentation is used to recognize different target objects in the image; for example, people in the image are rendered red and vehicles blue, with each category corresponding to one color, yielding the pixel channel features. The pixel channel features represent the target category, color and other characteristics corresponding to each pixel channel.
And S14, training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
The initial network model is trained with the acquired pixel channel features. The network can be built in two modes: an experience mode and a self-learning mode. Fig. 5 is a schematic diagram of the FMN network experience mode: the pixel channel features are first learned through a fusion channel and an MLP; the shallow features then enter the FMN network to be fused with a feature module, which compensates for the loss of shallow semantic information as the network deepens, so that image feature information is still retained in the deep network. In addition, the pixel channels of the image are divided into an R module, a G module and a B module, so that the network learns the data characteristics of each channel separately, which further simplifies the learning of image features. At the end of the FMN, the R, G and B modules are fitted again to further optimize and calibrate the data features learned by the network, and the learned features are fused to obtain the optimal features. As shown in fig. 6, the FMN network can also be built in self-learning mode and searched with an NAS architecture: the hyperparameter m represents the total number of modules, and the order and count of each feature module are obtained by NAS self-learning, subject to the total not exceeding m; the data features of the image are thus obtained by learning in self-learning mode.
Further, a target object is determined from the features learned in either of the above modes, and a target detection result is output. Whether model training is finished can then be judged from the accuracy of the target detection result; when the accuracy of the output reaches the expected value, the initial training of the network model is complete. The model is then tested on a prepared test set; if the accuracy on the test set also reaches the expectation, the model training is finished and the trained network model is used as the target detection network model.
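A hedged sketch of this stopping rule follows; the accuracy threshold, data loaders and classification-style model outputs are illustrative assumptions, not details fixed by the disclosure:

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """Fraction of correctly predicted labels over a data loader."""
    correct = total = 0
    for x, y in loader:
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

def training_complete(model, val_loader, test_loader, expected=0.95):
    # Training counts as finished when the output accuracy reaches the
    # expected value and a held-out test set also meets it (threshold assumed).
    return (accuracy(model, val_loader) >= expected and
            accuracy(model, test_loader) >= expected)
```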
It should be noted that, because model training involves heavy computation and requires a large amount of raw data, training can be completed on a cloud server; after training, the model can be deployed on an automatic driving domain controller or computing platform at the vehicle end, and the subsequent target detection is completed on that domain controller or computing platform.
Optionally, after training is completed on the cloud server, the trained model remains deployed on the cloud server; the domain controller at the vehicle end acquires point clouds and images through the sensors, performs preliminary processing, and sends the processed data to the cloud server, which runs the model to obtain the target detection result and feeds it back to the vehicle-end domain controller.
The training method of the target detection network model based on fusion of pixel and point cloud features provided by the embodiment of the invention obtains a point cloud and an image of the driving environment of the engineering vehicle; projects the point cloud onto the image to obtain a projected image; acquires pixel channel features of the projected image; and trains an initial network model based on the pixel channel features to obtain a trained target detection network model. Compared with prior-art point cloud and image fusion, in which relative motion of the sensors may introduce errors and make target detection results inaccurate, this scheme controls the point cloud to bind different numbers of pixel channels to obtain projected images and then trains the target detection network model on the projected images and their corresponding image features; performing target detection with a model trained in this way avoids errors caused by relative motion of the sensors, making target detection more accurate.
Fig. 2 is a schematic flowchart of another training method for a target detection network model based on fusion of pixel and point cloud features according to an embodiment of the present invention, and as shown in fig. 2, the method specifically includes:
and S21, binding the point cloud and the pixel channel of the image based on a preset rule to obtain a projected image when an error exists.
In the embodiment of the invention, the influence of errors can be reduced through a hyperparameter q. Specifically, when there is no error, q is set to 1 and the point cloud is projected onto the image, i.e. each point of the point cloud is controlled to bind a unique pixel channel of the image; when only a transverse or a longitudinal error exists, q can be set to 5, i.e. each point is controlled to bind 5 pixel channels; and when transverse and longitudinal errors exist simultaneously, q can be set to 9, i.e. each point is controlled to bind 9 pixel channels. The image after the point cloud and the pixel channels are bound is taken as the projected image.
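A minimal sketch of this binding rule follows; the neighborhood layout (the pixel plus its 4-neighborhood for q = 5, and a 3×3 block for q = 9) is an assumption consistent with the counts given above, since the text fixes only the numbers:

```python
# Offsets of the pixel channels bound to one projected point, keyed by q.
BIND_OFFSETS = {
    1: [(0, 0)],                                             # no error
    5: [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)],           # one error axis
    9: [(du, dv) for du in (-1, 0, 1) for dv in (-1, 0, 1)], # both axes
}

def bind_pixels(u, v, q, width, height):
    """Return the pixel channels bound to a point projected at (u, v)."""
    bound = []
    for du, dv in BIND_OFFSETS[q]:
        uu, vv = u + du, v + dv
        if 0 <= uu < width and 0 <= vv < height:  # clip at image borders
            bound.append((uu, vv))
    return bound
```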
And S22, acquiring the category of the pixel channel bound by each point cloud by adopting an image semantic segmentation method.
In the embodiment of the invention, an enhanced channel L may be selected to assist image semantic recognition of target categories. Image semantic segmentation is used to recognize different target objects in the image; for example, people are rendered red and vehicles blue, with each category corresponding to one color. The category of each pixel channel is obtained, and from it the category of the pixel channels bound by each point of the point cloud.
And S23, determining the pixel channel characteristics of the pixel area formed by the pixel channels bound by each point cloud based on the category.
Further, the pixel channel features of the pixel regions composed of the pixel channels bound by each point can be determined from the categories of those channels, as shown in fig. 3. Specifically, when there is no error, one point binds one pixel channel, and the pixel channel feature of that single channel can be determined. When errors exist, one point binds a plurality of pixel channels, which form one or more pixel regions; the pixel channel features of a region are obtained by fusing the categories of the pixel channels that compose it.
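A hedged sketch of one plausible fusion over a bound region follows; majority voting is an assumption, since the text does not fix the fusion rule, and the array indexing and color table are likewise illustrative:

```python
from collections import Counter

def region_feature(bound_pixels, seg_classes, class_colors):
    """Fuse the categories of the bound pixel channels into one region feature.

    bound_pixels: list of (u, v) pixels bound to one projected point;
    seg_classes:  (H, W) array of semantic segmentation class IDs;
    class_colors: mapping from class ID to its display color.
    """
    classes = [seg_classes[v, u] for (u, v) in bound_pixels]
    majority = Counter(classes).most_common(1)[0][0]  # assumed fusion rule
    return {"category": majority, "color": class_colors[majority]}
```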
And S24, inputting the pixel channel characteristics into a multilayer perceptron to perform characteristic mixing to obtain pixel channel mixing characteristics.
Then, the pixel channel features are input into a multilayer perceptron for feature mixing and combined to obtain a feature module, as shown in fig. 4. The feature module fully adds the features learned by the network into the pixel channel features for mixing, yielding the pixel channel mixed features; this can serve as a prompt for network fitting and, additionally, carries semantic information unique to the shallow layers into the deep network.
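The mixing step can be sketched as a small MLP; the layer widths and feature dimensions below are illustrative assumptions:

```python
import torch.nn as nn

class ChannelMixer(nn.Module):
    """Mix pixel channel features with a multilayer perceptron (sizes assumed)."""

    def __init__(self, in_dim=16, hidden=64, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, pixel_channel_features):    # (N, in_dim)
        return self.mlp(pixel_channel_features)   # mixed features, (N, out_dim)
```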
And S25, inputting the pixel channel mixed characteristics into an initial network model for model training, determining that the initial network model is trained completely until the result output by the initial network model meets a preset condition, and taking the trained initial network model as a target detection network model.
The initial network model is trained with the obtained pixel channel mixed features. The network can be built in two modes: an experience mode and a self-learning mode. Fig. 5 is a schematic diagram of the FMN network experience mode: the pixel channel features are first learned through a fusion channel and an MLP; the shallow features then enter the FMN network to be fused with a feature module, which compensates for the loss of shallow semantic information as the network deepens, so that image feature information is still retained in the deep network. In addition, the pixel channels of the image are divided into an R module, a G module and a B module, so that the network learns the data characteristics of each channel separately, which further simplifies the learning of image features. At the end of the FMN, the R, G and B modules are fitted again to further optimize and calibrate the data features learned by the network, and the learned features are fused to obtain the optimal features. As shown in fig. 6, the FMN network can also be built in self-learning mode and searched with an NAS architecture: the hyperparameter m represents the total number of modules, and the order and count of each feature module are obtained by NAS self-learning, subject to the total not exceeding m; the data features of the image are thus obtained by learning. A target object is then determined from the learned features and a target detection result is output; when the accuracy of the output reaches the expected value, training of the network model is complete, and the trained network model is used as the target detection network model.
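The per-channel split of the experience mode can be illustrated as follows; this is a loose sketch under assumed layer shapes, since the disclosure does not give the internals of the R, G and B modules or of the terminal fusion:

```python
import torch
import torch.nn as nn

class FMNExperienceSketch(nn.Module):
    """Sketch of the FMN experience mode: one small branch per pixel channel,
    refitted and fused at the end. All widths and depths are assumptions."""

    def __init__(self, dim=32):
        super().__init__()
        # Separate R, G and B modules so each channel's data
        # characteristics are learned independently.
        self.r, self.g, self.b = (
            nn.Sequential(
                nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            )
            for _ in range(3)
        )
        # Terminal refit: fuse the three learned feature maps.
        self.fuse = nn.Conv2d(3 * dim, dim, 1)

    def forward(self, image):                      # (B, 3, H, W)
        r, g, b = image[:, 0:1], image[:, 1:2], image[:, 2:3]
        feats = torch.cat([self.r(r), self.g(g), self.b(b)], dim=1)
        return self.fuse(feats)                    # fused features
```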
The embodiment of the invention provides a training method of a target detection network model based on fusion of pixel and point cloud features: a point cloud and an image of the driving environment of the engineering vehicle are obtained; the point cloud is projected onto the image to obtain a projected image; pixel channel features of the projected image are acquired; and an initial network model is trained based on the pixel channel features to obtain a trained target detection network model. With this method, controlling the point cloud to bind different numbers of pixel channels avoids errors caused by relative movement of the sensors, so target detection is more accurate.
Fig. 7 is a schematic flowchart of a target detection method based on pixel and point cloud feature fusion according to an embodiment of the present invention, and as shown in fig. 7, the method specifically includes:
and S71, acquiring the point cloud and the image of the running environment of the target engineering vehicle.
In the embodiment of the invention, a point cloud and an image of the driving environment of the target engineering vehicle are obtained; the point cloud may be acquired by a LiDAR sensor and the image by a vehicle-mounted camera. Point clouds and images may be acquired of the driving environment in front of the target engineering vehicle or in any direction around the vehicle.
And S72, projecting the point cloud onto the image to obtain a projected image.
The point cloud is projected onto the image to obtain the projected image. Specifically, the point cloud is aligned with the pixel channels of the image, and a hyperparameter q can be introduced: when there is no error, q equals 1, i.e. each point of the point cloud is controlled to bind a unique pixel channel of the image; when only a transverse or a longitudinal error exists, q can equal 5, i.e. each point is controlled to bind 5 pixel channels; and when transverse and longitudinal errors exist simultaneously, q can equal 9, i.e. each point binds 9 pixel channels. The image after the point cloud and the pixel channels are bound is taken as the projected image.
And S73, inputting the projection image into a target detection network model, and detecting a plurality of targets of the running environment of the target engineering vehicle.
The obtained projected image is input into the trained target detection network model, which analyzes the pixel channel features of the image and detects a plurality of targets in the driving environment of the target engineering vehicle, for example pedestrians and obstacles ahead.
According to the target detection method based on fusion of pixel and point cloud features provided by the embodiment of the invention, a point cloud and an image of the driving environment of the target engineering vehicle are obtained; the point cloud is projected onto the image to obtain a projected image; and the projected image is input into a target detection network model to detect a plurality of targets in that driving environment. Because the driving environment is analyzed by a fully trained target detection network model, errors caused by relative motion of the sensors are avoided, targets can be detected accurately, and target classes can be distinguished.
Fig. 8 is a schematic structural diagram of a training apparatus for a target detection network model based on fusion of pixel and point cloud features, which includes:
the acquisition module 801 is used for acquiring a point cloud and an image of a driving environment where the engineering vehicle is located. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
A projection module 802, configured to project the point cloud onto the image to obtain a projection image. For detailed description, reference is made to the corresponding related description of the above method embodiments, and details are not repeated herein.
The obtaining module 801 is further configured to obtain pixel channel characteristics of the projection image. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
And the training module 803 is configured to train the initial network model based on the pixel channel characteristics to obtain a trained target detection network model. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
The training apparatus for a target detection network model based on fusion of pixel and point cloud features provided in this embodiment may be the apparatus shown in fig. 8, and may perform all the steps of the training method shown in figs. 1-2, thereby achieving the technical effects of that method; for details, refer to the description of figs. 1-2 above, which is not repeated here for brevity.
Fig. 9 is a schematic structural diagram of a target detection apparatus based on pixel and point cloud feature fusion, which includes:
the acquisition module 901 is used for acquiring the point cloud and the image of the running environment where the target engineering vehicle is located. For detailed description, reference is made to the corresponding related description of the above method embodiments, and details are not repeated herein.
A projection module 902, configured to project the point cloud onto the image to obtain a projection image. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
And a detection module 903, configured to input the projection image into a target detection network model, and detect multiple targets of a driving environment where the target engineering vehicle is located. For a detailed description, reference is made to the corresponding related description of the above method embodiments, which is not repeated herein.
The target detection apparatus based on fusion of pixel and point cloud features provided in this embodiment may be the apparatus shown in fig. 9, and may perform all the steps of the target detection method shown in fig. 7, thereby achieving the technical effects of that method; for details, refer to the description of fig. 7 above, which is not repeated here for brevity.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 1000 shown in fig. 10 includes: at least one processor 1001, a memory 1002, at least one network interface 1004, and other user interfaces 1003. The various components in the electronic device 1000 are coupled together by a bus system 1005, which enables communication among the connected components. In addition to a data bus, the bus system 1005 includes a power bus, a control bus and a status signal bus; for the sake of clarity, however, the various buses are all labeled in fig. 10 as the bus system 1005.
The user interface 1003 may include, among other things, a display, a keyboard or a pointing device (e.g., a mouse, trackball, touchpad, or touch screen).
It is to be understood that the memory 1002 in embodiments of the present invention may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1002 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 1002 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 10021 and applications 10022.
The operating system 10021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 10022 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. The program implementing the method according to the embodiment of the present invention may be included in the application program 10022.
In the embodiment of the present invention, by calling a program or instruction stored in the memory 1002, specifically a program or instruction stored in the application 10022, the processor 1001 is configured to execute the method steps provided by the method embodiments, for example:
acquiring a point cloud and an image of a driving environment where the engineering vehicle is located; projecting the point cloud onto the image to obtain a projected image; acquiring pixel channel characteristics of the projected image; and training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
In a possible implementation manner, when an error exists, the point cloud and a pixel channel of the image are bound based on a preset rule to obtain a projected image.
In one possible implementation, when only a transverse error or a longitudinal error exists, each point cloud is controlled to bind a first preset number of pixel channels; when the transverse error and the longitudinal error exist at the same time, controlling each point cloud to bind a second preset number of pixel channels; and taking the image after the point cloud and the pixel channel are bound as a projection image.
In one possible implementation mode, an image semantic segmentation method is adopted to obtain the category of a pixel channel bound by each point cloud; and determining the pixel channel characteristics of the pixel area formed by the pixel channels bound by each point cloud based on the categories.
In one possible implementation, the pixel channel features are input to a multi-layer perceptron for feature mixing to obtain pixel channel mixed features.
In a possible implementation manner, the pixel channel mixed features are input into an initial network model for model training, and if the result output by the initial network model meets a preset condition, it is determined that the initial network model is trained, and the trained initial network model is used as a target detection network model.
Alternatively,
acquiring a point cloud and an image of a running environment where a target engineering vehicle is located; projecting the point cloud onto the image to obtain a projected image; and inputting the projection images into a target detection network model, and detecting a plurality of targets of the running environment of the target engineering vehicle.
The method disclosed by the embodiment of the invention can be applied to the processor 1001 or implemented by the processor 1001. The processor 1001 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 1001 or by instructions in the form of software. The processor 1001 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in a hardware decoding processor, or in a combination of hardware and software elements in a decoding processor. The software elements may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 1002; the processor 1001 reads the information in the memory 1002 and completes the steps of the method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in fig. 10, and may perform all the steps of the training method of the target detection network model based on fusion of pixel and point cloud features in figs. 1-2 and of the target detection method based on fusion of pixel and point cloud features in fig. 7, thereby achieving the technical effects of those methods; for details, refer to the descriptions above, which are not repeated here for brevity.
An embodiment of the invention also provides a storage medium (a computer-readable storage medium) that stores one or more programs. The storage medium may include volatile memory, such as random access memory; it may also include nonvolatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; and it may include a combination of the above kinds of memory.
When the one or more programs in the storage medium are executed by the one or more processors, the training method of the target detection network model based on fusion of pixel and point cloud features and the target detection method based on fusion of pixel and point cloud features, performed on the electronic device side, are implemented.
The processor is used to execute a training program of a target detection network model based on fusion of pixel and point cloud features and a target detection program based on fusion of pixel and point cloud features, both stored in the memory, so as to implement the following steps of the training method and the target detection method performed on the electronic device side:
acquiring a point cloud and an image of a driving environment of the engineering vehicle; projecting the point cloud onto the image to obtain a projected image; acquiring pixel channel characteristics of the projected image; and training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
In a possible implementation manner, when an error exists, the point cloud and a pixel channel of the image are bound based on a preset rule to obtain a projected image.
In one possible implementation, when only a transverse error or a longitudinal error exists, each point cloud is controlled to bind a first preset number of pixel channels; when the transverse error and the longitudinal error exist at the same time, controlling each point cloud to bind a second preset number of pixel channels; and taking the image after the point cloud and the pixel channel are bound as a projection image.
In one possible implementation mode, an image semantic segmentation method is adopted to obtain the category of a pixel channel bound by each point cloud; and determining the pixel channel characteristics of the pixel area formed by the pixel channels bound by each point cloud based on the categories.
In one possible implementation, the pixel channel features are input to a multi-layer perceptron for feature mixing to obtain pixel channel mixed features.
In a possible implementation manner, the pixel channel mixed features are input into an initial network model for model training, and if the result output by the initial network model meets a preset condition, it is determined that the initial network model is trained, and the trained initial network model is used as a target detection network model.
Alternatively,
acquiring a point cloud and an image of a running environment where a target engineering vehicle is located; projecting the point cloud onto the image to obtain a projected image; and inputting the projection images into a target detection network model, and detecting a plurality of targets of the running environment of the target engineering vehicle.
Those skilled in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (11)

1. A training method of a target detection network model based on pixel and point cloud feature fusion is characterized by comprising the following steps:
acquiring a point cloud and an image of a driving environment where the engineering vehicle is located;
projecting the point cloud onto the image to obtain a projected image;
acquiring pixel channel characteristics of the projected image;
and training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
2. The method of claim 1, wherein projecting the point cloud onto the image results in a projected image comprising:
and when an error exists, binding the point cloud and the pixel channel of the image based on a preset rule to obtain a projected image.
3. The method of claim 2, wherein when there is an error, binding the point cloud with a pixel channel of the image based on a preset rule to obtain a projection image, comprising:
when only a transverse error or a longitudinal error exists, controlling each point cloud to bind a first preset number of pixel channels;
when the transverse error and the longitudinal error exist at the same time, controlling each point cloud to bind a second preset number of pixel channels;
and taking the image after the point cloud and the pixel channel are bound as a projection image.
4. The method of claim 3, wherein the acquiring pixel channel features of the projection image comprises:
acquiring the category of a pixel channel bound by each point cloud by adopting an image semantic segmentation method;
and determining the pixel channel characteristics of the pixel area formed by the pixel channels bound by each point cloud based on the categories.
5. The method of claim 4, further comprising:
and inputting the pixel channel characteristics into a multi-layer perceptron to perform characteristic mixing to obtain pixel channel mixing characteristics.
6. The method of claim 5, wherein training the initial network model based on the pixel channel characteristics to obtain the trained target detection network model comprises:
inputting the pixel channel mixed characteristics into an initial network model for model training, determining that the initial network model is trained when the result output by the initial network model meets a preset condition, and taking the trained initial network model as a target detection network model.
7. A target detection method based on pixel and point cloud feature fusion is characterized by comprising the following steps:
acquiring a point cloud and an image of a running environment where a target engineering vehicle is located;
projecting the point cloud onto the image to obtain a projected image;
inputting the projection image into a target detection network model trained according to the method of any one of claims 1-6, and detecting a plurality of targets of the driving environment of the target engineering vehicle.
8. A training device of a target detection network model based on pixel and point cloud feature fusion, characterized by comprising:
the acquisition module is used for acquiring point clouds and images of a driving environment where the engineering vehicle is located;
the projection module is used for projecting the point cloud onto the image to obtain a projected image;
the acquisition module is further used for acquiring pixel channel characteristics of the projection image;
and the training module is used for training the initial network model based on the pixel channel characteristics to obtain a trained target detection network model.
9. A target detection device based on pixel and point cloud feature fusion is characterized by comprising:
the acquisition module is used for acquiring point clouds and images of a running environment where the target engineering vehicle is located;
the projection module is used for projecting the point cloud onto the image to obtain a projected image;
and the detection module is used for inputting the projection image into a target detection network model and detecting a plurality of targets of the running environment of the target engineering vehicle.
10. An electronic device, comprising: a processor and a memory, wherein the processor is configured to execute a training program of a target detection network model based on fusion of pixels and point cloud features and a target detection program based on fusion of pixels and point cloud features stored in the memory to implement the training method of the target detection network model based on fusion of pixels and point cloud features according to any one of claims 1 to 6 and the target detection method based on fusion of pixels and point cloud features according to claim 7.
11. A storage medium storing one or more programs, wherein the one or more programs are executable by one or more processors to implement the method for training a network model for detecting objects based on fusion of pixels and point cloud features according to any one of claims 1 to 6 and the method for detecting objects based on fusion of pixels and point cloud features according to claim 7.
CN202211194508.0A 2022-09-28 2022-09-28 Target detection network model training method based on pixel and point cloud feature fusion Pending CN115457293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211194508.0A CN115457293A (en) 2022-09-28 2022-09-28 Target detection network model training method based on pixel and point cloud feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211194508.0A CN115457293A (en) 2022-09-28 2022-09-28 Target detection network model training method based on pixel and point cloud feature fusion

Publications (1)

Publication Number Publication Date
CN115457293A (en) 2022-12-09

Family

ID=84306686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211194508.0A Pending CN115457293A (en) 2022-09-28 2022-09-28 Target detection network model training method based on pixel and point cloud feature fusion

Country Status (1)

Country Link
CN (1) CN115457293A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197211A (en) * 2023-09-04 2023-12-08 北京斯年智驾科技有限公司 Depth image generation method, system, device and medium
CN117197211B (en) * 2023-09-04 2024-04-26 北京斯年智驾科技有限公司 Depth image generation method, system, device and medium

Similar Documents

Publication Publication Date Title
US10794718B2 (en) Image processing apparatus, image processing method, computer program and computer readable recording medium
CN111191487A (en) Lane line detection and driving control method and device and electronic equipment
CN106574961B (en) Use the object identification device of multiple objects detection unit
CN108020827A (en) It is moved into as platform alignment
US20200233061A1 (en) Method and system for creating an inverse sensor model and method for detecting obstacles
CN105335955A (en) Object detection method and object detection apparatus
US20210397907A1 (en) Methods and Systems for Object Detection
CN113484851B (en) Simulation test system and method for vehicle-mounted laser radar and complete vehicle in-loop test system
US20210081762A1 (en) Translation of training data between observation modalities
US11391825B2 (en) Sensor calibration parameter sensitivity analysis
US11636684B2 (en) Behavior model of an environment sensor
CN115457293A (en) Target detection network model training method based on pixel and point cloud feature fusion
JP2017181476A (en) Vehicle location detection device, vehicle location detection method and vehicle location detection-purpose computer program
CN114943952A (en) Method, system, device and medium for obstacle fusion under multi-camera overlapped view field
CN112241978A (en) Data processing method and device
Ostankovich et al. Application of cyclegan-based augmentation for autonomous driving at night
CN116626670B (en) Automatic driving model generation method and device, vehicle and storage medium
US20230230269A1 (en) Depth completion method and apparatus using a spatial-temporal
CN116990776A (en) Laser radar point cloud compensation method and device, electronic equipment and storage medium
CN113253256B (en) Monitoring method and device based on sensor fusion equipment and sensor fusion equipment
CN115391980A (en) Road scene and dynamic traffic flow optimization method, system and storage medium
CN115035359A (en) Point cloud data processing method, training data processing method and device
CN114755663A (en) External reference calibration method and device for vehicle sensor and computer readable storage medium
CN112815979B (en) Sensor calibration method and device
CN117474932B (en) Object segmentation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination