CN116152702A - Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle


Info

Publication number
CN116152702A
CN116152702A
Authority
CN
China
Prior art keywords
point cloud
grid
semantic segmentation
acquiring
target fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211649079.1A
Other languages
Chinese (zh)
Inventor
欧阳博骏
梁志栋
王云鹏
陈竞凯
马彧
王昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211649079.1A priority Critical patent/CN116152702A/en
Publication of CN116152702A publication Critical patent/CN116152702A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a point cloud label acquisition method and device, an electronic device and an automatic driving vehicle, and belongs to the technical field of artificial intelligence, in particular to the technical fields of deep learning, semantic segmentation and automatic driving. The specific implementation scheme is as follows: acquiring M point cloud frames including the current point cloud frame, and projecting the point clouds in the M point cloud frames into grids of a bird's eye view respectively, wherein M is an integer greater than or equal to 2; acquiring target fusion features of the M point cloud frames corresponding to each grid according to the projected bird's eye view; for each grid, acquiring the semantic segmentation label and the dynamic and static label of the grid based on the target fusion feature of the grid; and back-projecting each grid, determining the grid in which each point cloud of the current point cloud frame is located, and determining the labels of that grid as the labels of the point cloud. By fusing the features of multiple point cloud frames and obtaining the point cloud labels from the fused features, the disclosure reduces the time delay between the acquired labels of the point cloud and improves the accuracy and reliability of the point cloud labels.

Description

Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, semantic segmentation and automatic driving, and particularly relates to a point cloud label acquisition method and device, an electronic device and a storage medium.
Background
With the popularization and development of deep learning and laser radar, semantic segmentation and dynamic and static estimation of the point cloud background have gradually become feasible through deep learning methods. In the related art, the two tasks, semantic segmentation of the point cloud background and dynamic and static estimation, are independent of each other. When the semantic segmentation result and the dynamic and static estimation result are needed at the same time, two deep learning models must be used to generate them separately; in an automatic driving scenario, this tends to increase the time delay of the automatic driving perception pipeline. Therefore, how to reduce the time delay between the semantic segmentation label and the dynamic and static label of each point cloud in a point cloud frame while ensuring the accuracy and reliability of the labels has become an urgent problem to be solved.
Disclosure of Invention
The disclosure provides a method, a device, electronic equipment, a storage medium and a program product for acquiring a point cloud label.
According to a first aspect, there is provided a method for acquiring a point cloud tag, including: acquiring M point cloud frames including a current point cloud frame, and respectively projecting point clouds in the M point cloud frames into grids of a bird's eye view, wherein M is an integer greater than or equal to 2; acquiring target fusion characteristics of the M point cloud frames corresponding to each grid according to the projected aerial view; respectively acquiring semantic segmentation labels and dynamic and static labels of each grid based on target fusion characteristics of the grids; and carrying out back projection on each grid, determining the grid of the point cloud in the current point cloud frame, and determining the semantic segmentation label and the dynamic and static label of the grid of the point cloud as the label of the point cloud.
According to a second aspect, there is provided an acquisition apparatus of a point cloud tag, including: the projection module is used for acquiring M point cloud frames including the current point cloud frame, and respectively projecting the point clouds in the M point cloud frames into grids of the aerial view, wherein M is an integer greater than or equal to 2; the first acquisition module is used for acquiring target fusion characteristics of the M point cloud frames corresponding to each grid according to the projected aerial view; the second acquisition module is used for respectively acquiring semantic segmentation labels and dynamic and static labels of the grids based on target fusion characteristics of the grids aiming at each grid; and the third acquisition module is used for carrying out back projection on each grid, determining the grid of the point cloud in the current point cloud frame, and determining the semantic segmentation label and the dynamic and static label of the grid of the point cloud as the label of the point cloud.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of obtaining a point cloud tag according to the first aspect of the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of acquiring a point cloud tag according to the first aspect of the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of obtaining a point cloud tag according to the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of a method for acquiring a point cloud tag according to a first embodiment of the present disclosure;
fig. 2 is a flowchart of a method for acquiring a point cloud tag according to a second embodiment of the present disclosure;
fig. 3 is a flowchart of a method for acquiring a point cloud tag according to a third embodiment of the present disclosure;
fig. 4 is a schematic diagram of the architecture of a backbone network according to the present disclosure;
fig. 5 is a flowchart of a method for acquiring a point cloud tag according to a fourth embodiment of the present disclosure;
fig. 6 is a flowchart of a method for acquiring a point cloud tag according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a method of acquiring a point cloud tag according to the present disclosure;
FIG. 8 is a block diagram of an acquisition device for a point cloud tag used to implement an embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device for implementing a method for acquiring a point cloud tag according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence.
Deep Learning (DL) is a new research direction in the field of Machine Learning (ML); it was introduced into machine learning to bring it closer to its original goal, artificial intelligence. Deep learning learns the inherent laws and representation hierarchies of sample data, and the information obtained in this learning process greatly helps the interpretation of data such as text, images and sound. Its ultimate goal is to give machines the same analytical learning ability as humans, so that they can recognize text, image and sound data.
Semantic segmentation is a fundamental task in computer vision, in which the visual input needs to be divided into different semantically interpretable categories, i.e., categories that are meaningful in the real world.
Automatic driving here generally refers to an automatic driving system that employs advanced communication, computer, network and control technologies to achieve real-time, continuous control of a train. Using modern communication means and facing the train directly, it realizes bidirectional data communication between train and ground with a high transmission rate and a large amount of information, so that a following train and the control center can obtain the exact position of the preceding train in time; this makes operation management more flexible, control more effective, and better suits the requirements of automatic train driving.
The following describes a method for acquiring a point cloud tag according to an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for acquiring a point cloud tag according to a first embodiment of the present disclosure.
As shown in fig. 1, the method for obtaining a point cloud label according to the embodiment of the disclosure may specifically include the following steps:
s101, acquiring M point cloud frames including a current point cloud frame, and respectively projecting point clouds in the M point cloud frames into grids of a bird' S eye view, wherein M is an integer greater than or equal to 2.
Specifically, the execution body of the method for acquiring the point cloud label according to the embodiment of the present disclosure may be a processing apparatus provided by the embodiment of the present disclosure, where the processing apparatus may be a hardware device with a data information processing capability and/or software necessary for driving the hardware device to work. Alternatively, the execution body may include a workstation, a server, a computer, a user terminal, and other devices. The user terminal comprises, but is not limited to, a mobile phone, a computer, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminals and the like.
It should be noted that, the specific manner of acquiring the point cloud frame is not limited in this disclosure, and may be selected according to actual situations.
Alternatively, a laser radar acquisition device may be utilized to acquire the point cloud frame.
For example, a point cloud frame may be acquired by using an image acquisition device such as a laser line scan camera or a binocular structured light camera.
Optionally, a laser radar (Light Detection and Ranging, LiDAR for short) may be used to acquire the point cloud frames.
The bird's eye view (Bird's Eye View, BEV for short) is a view drawn according to the perspective principle, looking down at the ground from a high vantage point.
It should be noted that, the point clouds in the M point cloud frames are projected into the grid of the aerial view respectively, that is, the three-dimensional (x, y, z) coordinates of the point clouds are projected onto the two-dimensional (x, y) coordinates of the grid.
The specific arrangement of the grid in the bird's eye view is not limited in this disclosure, and may be set according to actual circumstances.
For example, each grid cell may be set to 10 m; alternatively, each grid cell may be set to 20 m.
After the three-dimensional (x, y, z) coordinates of the point cloud are obtained, the point cloud may be projected into a grid corresponding to the bird's-eye view according to the three-dimensional (x, y, z) coordinates of the point cloud and the grid value of the bird's-eye view.
Alternatively, when M is 2, the point clouds in the current point cloud frame and in the previous point cloud frame may be projected into the grids of the bird's eye view respectively.
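The following Python sketch illustrates this projection step; it is a minimal illustration rather than the patent's implementation, and names such as project_to_bev, bev_range and the 0.5 m cell size are assumptions introduced for illustration.

```python
# A minimal sketch (assumptions: grid range, cell size, function name) of
# projecting the xyz points of a point cloud frame into BEV grid indices.
import numpy as np

def project_to_bev(points, bev_range=((-50.0, 50.0), (-50.0, 50.0)), cell=0.5):
    """Map (N, 3) xyz points to integer (row, col) BEV grid indices.

    Points outside bev_range are dropped; z is ignored for the cell index
    but kept for later per-cell features such as the height difference.
    """
    (x_min, x_max), (y_min, y_max) = bev_range
    mask = (
        (points[:, 0] >= x_min) & (points[:, 0] < x_max)
        & (points[:, 1] >= y_min) & (points[:, 1] < y_max)
    )
    kept = points[mask]
    cols = ((kept[:, 0] - x_min) / cell).astype(np.int64)
    rows = ((kept[:, 1] - y_min) / cell).astype(np.int64)
    return kept, rows, cols

# Usage: project the current frame and the previous frame (M = 2) separately.
current = np.random.rand(1000, 3) * 100 - 50
previous = np.random.rand(1000, 3) * 100 - 50
cur_pts, cur_rows, cur_cols = project_to_bev(current)
prev_pts, prev_rows, prev_cols = project_to_bev(previous)
```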
S102, acquiring target fusion characteristics of M point cloud frames corresponding to each grid according to the projected aerial view.
After the point clouds in the M point cloud frames are projected to the grids of the aerial view respectively, a plurality of initial feature information can be obtained according to the aerial view after projection, and then the plurality of initial feature information can be processed to obtain the target fusion feature.
For example, according to the projected aerial view, a plurality of initial features such as the number information of the point clouds, the reflectivity information of the point clouds, the height difference information of the point clouds and the like can be obtained, then the plurality of initial features are spliced to obtain spliced features, and then feature extraction is performed on the spliced features to obtain target fusion features.
S103, based on the target fusion characteristics of the grid, semantic segmentation labels and dynamic and static labels of the grid are respectively obtained.
For example, after the target fusion feature of the grid is obtained, the target fusion feature may be input into a corresponding model to obtain the semantic segmentation label and the dynamic and static label of the grid; the model has two output branches and can output the semantic segmentation label and the dynamic and static label of the grid at the same time.
And S104, carrying out back projection on each grid, determining the grid where the point cloud is located in the current point cloud frame, and determining the semantic segmentation label and the dynamic and static label of the grid where the point cloud is located as the label of the point cloud.
In the embodiment of the disclosure, after the semantic segmentation labels and the dynamic and static labels of the grids are obtained, the grids can be back projected, that is, each grid is back projected to the current point cloud frame, so that the point cloud in the current point cloud frame covered by each grid can be determined, and the grid where each point cloud in the current point cloud frame is located can be determined. Further, determining semantic segmentation labels and dynamic and static labels of the grid where the point cloud is located as labels of the point cloud.
For example, for grid 1, when the semantic segmentation label of grid 1 is 1 and the dynamic and static label is 0, the semantic segmentation label of each point cloud of the current point cloud frame in grid 1 is 1 and the dynamic and static label is 0.
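A minimal Python sketch of this back-projection lookup is given below; the grid shape, class count and array names are illustrative assumptions and are not taken from the disclosure.

```python
# A minimal sketch: each point of the current frame already knows which grid
# cell it fell into during projection, so its labels are simply the labels of
# that cell. Grid shape (200, 200) and class count 8 are assumptions.
import numpy as np

def labels_for_points(rows, cols, grid_sem_labels, grid_dyn_labels):
    """Look up per-grid labels for every point of the current frame."""
    point_sem = grid_sem_labels[rows, cols]   # semantic segmentation label per point
    point_dyn = grid_dyn_labels[rows, cols]   # dynamic and static label per point
    return point_sem, point_dyn

# Usage with a 200x200 grid and the indices produced during projection.
grid_sem = np.random.randint(0, 8, (200, 200))
grid_dyn = np.random.randint(0, 2, (200, 200))
rows = np.random.randint(0, 200, 1000)
cols = np.random.randint(0, 200, 1000)
point_sem, point_dyn = labels_for_points(rows, cols, grid_sem, grid_dyn)
```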
In summary, the method for acquiring point cloud labels of the embodiment of the present disclosure acquires M point cloud frames including the current point cloud frame and projects the point clouds of the M point cloud frames into the grids of a bird's eye view, where M is an integer greater than or equal to 2; acquires, according to the projected bird's eye view, the target fusion feature of the M point cloud frames corresponding to each grid; acquires, based on the target fusion feature of each grid, the semantic segmentation label and the dynamic and static label of the grid; and back-projects each grid to obtain the label of each point cloud in the current point cloud frame. By projecting the point clouds of multiple point cloud frames into the grids, the grids aggregate the point clouds of the multiple frames, so that the same grid carries the characteristics of multiple point cloud frames. The semantic segmentation label and the dynamic and static label of each grid are then obtained based on the grid's target fusion feature, and back projection determines the grid in which each point cloud of the current point cloud frame is located, thereby determining the semantic segmentation label and the dynamic and static label of each point cloud in the current point cloud frame. This reduces the time delay between the acquired semantic segmentation label and dynamic and static label of the point cloud while improving the accuracy and reliability of the point cloud labels.
Fig. 2 is a flowchart of a method for acquiring a point cloud tag according to a second embodiment of the present disclosure.
As shown in fig. 2, on the basis of the embodiment shown in fig. 1, the method for obtaining the point cloud label according to the embodiment of the present disclosure specifically includes the following steps:
s201, obtaining M point cloud frames including a current point cloud frame, and respectively projecting point clouds in the M point cloud frames into grids of a bird' S eye view, wherein M is an integer greater than or equal to 2.
Specifically, step S201 in this embodiment is the same as step S101 in the above embodiment, and will not be described here again.
Step S102 "obtaining the target fusion characteristics of M point cloud frames corresponding to each grid according to the projected aerial view" in the above embodiment may specifically include the following steps S202 to S204.
S202, acquiring initial characteristic information of each point cloud frame corresponding to each grid according to the projected aerial view of the point cloud frame for each point cloud frame in the M point cloud frames.
As a possible implementation manner, as shown in fig. 3, based on the foregoing embodiment, the specific process of obtaining initial feature information of the point cloud frame corresponding to each grid according to the projected bird' S eye view of the point cloud frame in the step S202 includes the following steps:
S301, acquiring point clouds in each grid in the aerial view after projection.
Alternatively, the point cloud set located within each grid of the projected bird's eye view may be determined from the two-dimensional (x, y) coordinates of the grids, where a point cloud set is composed of a plurality of point clouds.
S302, determining initial characteristic information of each grid based on the point cloud set of each grid.
Alternatively, the number of point clouds in the point cloud set may be obtained, the height value of the point cloud set may be obtained, the height difference and/or the average height of the point cloud set may be determined according to the height value, the reflectivity of the point cloud set may be obtained, and the average reflectivity of the point cloud set may be determined according to the reflectivity.
Alternatively, the number of point clouds in the point cloud set, the height difference and/or average height of the point cloud set, and the average reflectivity of the point cloud set may be used as the initial characteristic information of each grid.
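The sketch below shows, under stated assumptions, how such per-grid initial characteristic information (point count, height difference, average height, average reflectivity) could be computed; the grid shape, feature order and function name are illustrative, not prescribed by the disclosure.

```python
# A hedged sketch of the hand-crafted per-grid features named above.
# Assumptions: 200x200 grid, feature order [count, height diff, mean height,
# mean reflectivity], simple Python loop for clarity rather than speed.
import numpy as np

def grid_features(points, reflectivity, rows, cols, grid_shape=(200, 200)):
    """Return an (H, W, 4) feature map for one point cloud frame."""
    H, W = grid_shape
    count = np.zeros((H, W))
    z_min = np.full((H, W), np.inf)
    z_max = np.full((H, W), -np.inf)
    z_sum = np.zeros((H, W))
    r_sum = np.zeros((H, W))
    for r, c, z, refl in zip(rows, cols, points[:, 2], reflectivity):
        count[r, c] += 1
        z_min[r, c] = min(z_min[r, c], z)
        z_max[r, c] = max(z_max[r, c], z)
        z_sum[r, c] += z
        r_sum[r, c] += refl
    occupied = count > 0
    height_diff = np.where(occupied, z_max - z_min, 0.0)   # height difference per grid
    mean_height = np.where(occupied, z_sum / np.maximum(count, 1), 0.0)
    mean_refl = np.where(occupied, r_sum / np.maximum(count, 1), 0.0)
    return np.stack([count, height_diff, mean_height, mean_refl], axis=-1)

# Usage with indices from the projection step and random reflectivity values.
pts = np.random.rand(500, 3) * 10
refl = np.random.rand(500)
rows = np.random.randint(0, 200, 500)
cols = np.random.randint(0, 200, 500)
feats = grid_features(pts, refl, rows, cols)   # shape (200, 200, 4)
```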
And S203, splicing the initial characteristic information of each point cloud frame of the same grid to obtain candidate splicing characteristics.
For example, the initial feature information of each point cloud frame of the same grid may be stitched in a row dimension to obtain candidate stitching features.
S204, processing the candidate splicing characteristics of the grids through the backbone network to obtain target fusion characteristics of the grids.
Optionally, an initial backbone network structure may be established and trained using a known data set and a validation set, with a total loss function set to supervise the initial backbone network structure, thereby obtaining the trained backbone network structure.
For example, as shown in fig. 4, the candidate stitching feature (concat) may be input into the backbone network, where it is processed by the convolution layers and deconvolution layers of the backbone network to output the target fusion feature.
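As a hedged illustration of the stitching and backbone processing, the PyTorch sketch below concatenates the per-frame feature maps (here along the channel dimension, purely for illustration) and passes them through convolution and deconvolution (transposed convolution) layers; the class name BevBackbone, the channel counts and the layer depths are assumptions, since the disclosure does not fix a specific architecture.

```python
# A minimal sketch, assuming a small conv/deconv backbone; not the patent's
# actual network.
import torch
import torch.nn as nn

class BevBackbone(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Encoder: strided convolutions downsample the BEV feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions restore the original resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, per_frame_features):
        # per_frame_features: list of (B, C, H, W) maps, one per point cloud frame.
        x = torch.cat(per_frame_features, dim=1)   # candidate stitching feature
        return self.decoder(self.encoder(x))       # target fusion feature

# Usage with M = 2 frames of 4-channel hand-crafted features on a 200x200 grid.
backbone = BevBackbone(in_channels=8)
frames = [torch.randn(1, 4, 200, 200), torch.randn(1, 4, 200, 200)]
fused = backbone(frames)   # (1, 32, 200, 200)
```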
Step S103 "based on the target fusion feature of the mesh, the acquisition of the semantic segmentation tags of the mesh" in the above embodiment may specifically include the following steps S205 to S207.
S205, carrying out semantic segmentation on the target fusion features of the grid to obtain a plurality of semantic segmentation probabilities of the grid.
As a possible implementation manner, as shown in fig. 5, on the basis of the foregoing embodiment, the specific process of performing semantic segmentation on the target fusion feature of the mesh in the foregoing step S205 to obtain multiple semantic segmentation probabilities of the mesh includes the following steps:
s501, performing first convolution processing on the target fusion feature to obtain a first convolution post-fusion feature.
Alternatively, the target fusion feature may be encoded with a two-dimensional convolution to obtain the first convolved fusion feature.
S502, performing first probability function mapping on the fusion features after the first convolution to obtain a plurality of semantic segmentation probabilities.
Alternatively, the first probability function may be a normalized exponential function, i.e., a softmax function, which normalizes a vector of values into a probability distribution whose entries sum to one, each entry being mapped into the interval (0, 1).
For example, a softmax function mapping is performed on the first convolved fusion feature, and a plurality of semantic segmentation probabilities can be obtained.
S206, determining the maximum semantic segmentation probability from the semantic segmentation probabilities.
It should be noted that, the specific manner of determining the maximum semantic segmentation probability from the multiple semantic segmentation probabilities is not limited in this disclosure, and may be selected according to actual situations.
Alternatively, the maximum semantic segmentation probability may be determined from the plurality of semantic segmentation probabilities by the argmax (argument of the maximum) function.
S207, determining the semantic segmentation label corresponding to the maximum semantic segmentation probability as the semantic segmentation label of the grid.
In the embodiment of the present disclosure, when determining the semantic segmentation label of the grid, the semantic segmentation label corresponding to the maximum semantic segmentation probability may be determined as the semantic segmentation label of the grid.
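A minimal sketch of this semantic segmentation branch is shown below: one two-dimensional convolution, a softmax mapping, and an argmax selection per grid. The class name SemanticHead, the channel count and the number of classes (8) are assumptions.

```python
# A hedged sketch of the semantic segmentation branch, assuming 8 classes and
# a 32-channel target fusion feature.
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    def __init__(self, in_channels=32, num_classes=8):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, fused):
        logits = self.conv(fused)              # first convolution processing
        probs = torch.softmax(logits, dim=1)   # first probability function mapping
        labels = probs.argmax(dim=1)           # maximum probability -> grid label
        return probs, labels

# Usage: fused would be the (B, 32, H, W) backbone output from the sketch above.
probs, labels = SemanticHead()(torch.randn(1, 32, 200, 200))
```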
Step S103 "acquiring the dynamic and static labels of the grid based on the target fusion characteristics of the grid" in the above embodiment may specifically include the following steps S208 to S209.
S208, classifying and identifying the target fusion characteristics of the grid, and acquiring the type identification probability of the grid.
As a possible implementation manner, as shown in fig. 6, based on the foregoing embodiment, the specific process of performing classification recognition on the target fusion feature of the grid in the step S208 to obtain the type recognition probability of the grid includes the following steps:
s601, performing second convolution processing on the target fusion feature to obtain a second convolution post-fusion feature.
Alternatively, the target fusion feature may be encoded with a two-dimensional convolution to obtain the second convolved fusion feature.
S602, performing second probability function mapping on the fusion features after the second convolution to obtain the type recognition probability of the grid.
Alternatively, the second probability function may be a sigmoid function, by which a variable may be mapped into the interval (0, 1).
For example, sigmoid function mapping is performed on the second convolved fusion feature to obtain the type recognition probability of the grid.
S209, comparing the type recognition probability with a preset probability threshold, and determining dynamic and static labels of the grid based on the comparison result.
Alternatively, the probability threshold may be preset to 0.5.
For example, if the type recognition probability is greater than 0.5, the dynamic and static label of the grid may be determined to be 1, and if the type recognition probability is less than 0.5, the dynamic and static label of the grid may be determined to be 0.
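The dynamic and static branch can be sketched analogously, as below; the convention that 1 denotes dynamic and 0 denotes static follows the example above, while the class name DynamicStaticHead and the channel count are illustrative assumptions.

```python
# A hedged sketch of the dynamic and static estimation branch: one 2D
# convolution, a sigmoid mapping, and a comparison with the 0.5 threshold.
import torch
import torch.nn as nn

class DynamicStaticHead(nn.Module):
    def __init__(self, in_channels=32, threshold=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.threshold = threshold

    def forward(self, fused):
        prob = torch.sigmoid(self.conv(fused))        # second probability function mapping
        label = (prob > self.threshold).long()        # 1 = dynamic, 0 = static (assumed convention)
        return prob, label

# Usage with the same assumed (B, 32, H, W) target fusion feature.
prob, label = DynamicStaticHead()(torch.randn(1, 32, 200, 200))
```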
S210, carrying out back projection on each grid to obtain the label of each point cloud in the current point cloud frame.
Specifically, step S210 in this embodiment is the same as step S104 in the above embodiment, and will not be described here again.
In the embodiments of the present disclosure, the point clouds of multiple point cloud frames are projected into the grids, so that the grids aggregate the point clouds of the multiple frames and the same grid carries the characteristics of multiple point cloud frames. The semantic segmentation label and the dynamic and static label of each grid are then obtained based on the grid's target fusion feature, and back projection determines the grid in which each point cloud of the current point cloud frame is located, thereby determining the semantic segmentation label and the dynamic and static label of each point cloud in the current point cloud frame and improving the accuracy and reliability of the label of each point cloud. Further, the semantic segmentation label and the dynamic and static label of each point cloud in the point cloud frame can be obtained simultaneously through a single complete neural network, which reduces the time delay between the acquired semantic segmentation label and dynamic and static label of the point cloud.
The method for acquiring the point cloud label is explained below.
For example, as shown in fig. 7, the current point cloud frame and the previous point cloud frame (point clouds) may be projected onto the Bev grid to obtain the projected bird's eye view (Bev project). The initial characteristic information (hand craft feature) of each point cloud frame corresponding to each grid is then generated, the initial characteristic information generated from the two frames is stitched (concat), and the result is input into the model. The model includes a backbone network, a semantic segmentation network and a dynamic and static estimation network. Optionally, an initial backbone network structure, an initial semantic segmentation network structure and an initial dynamic and static estimation network structure may be established and trained using known data sets and validation sets under the supervision of a total loss function, so as to obtain the trained network structures. In the output branch of semantic segmentation, the target fusion feature is encoded with a two-dimensional convolution, the output is converted into probability values by the softmax function, the maximum semantic segmentation probability is determined from the plurality of semantic segmentation probabilities by the argmax function, and the semantic segmentation label corresponding to the maximum semantic segmentation probability is determined as the semantic segmentation label of the grid. In the output branch of dynamic and static estimation, the target fusion feature is encoded with a two-dimensional convolution, the output is converted into a probability value by the sigmoid function, the type recognition probability is compared with the preset probability threshold of 0.5, and the dynamic and static label of the grid is determined based on the comparison result. After the semantic segmentation label and the dynamic and static label of each Bev grid are obtained, the label of each point cloud in the current point cloud frame can be obtained by back projection.
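Putting the pieces together, the hedged sketch below mirrors the pipeline of fig. 7 with a shared backbone and two output branches that produce the semantic segmentation labels and the dynamic and static labels of all grids in a single forward pass; the class name PointCloudLabelNet and all sizes, channel counts and class counts are illustrative assumptions.

```python
# A minimal end-to-end sketch under assumptions: one shared backbone, two
# output branches, both grid label maps from a single forward pass.
import torch
import torch.nn as nn

class PointCloudLabelNet(nn.Module):
    def __init__(self, in_channels=8, num_classes=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.sem_head = nn.Conv2d(32, num_classes, 1)   # semantic segmentation branch
        self.dyn_head = nn.Conv2d(32, 1, 1)             # dynamic and static branch

    def forward(self, concat_features):
        fused = self.backbone(concat_features)
        sem_label = torch.softmax(self.sem_head(fused), dim=1).argmax(dim=1)
        dyn_label = (torch.sigmoid(self.dyn_head(fused)) > 0.5).long().squeeze(1)
        return sem_label, dyn_label                     # both label maps at once

# Usage: concatenated hand-crafted features of the current and previous frame.
model = PointCloudLabelNet()
sem, dyn = model(torch.randn(1, 8, 200, 200))
```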
In the embodiments of the present disclosure, the point clouds of multiple point cloud frames are projected into the grids, so that the grids aggregate the point clouds of the multiple frames and the same grid carries the characteristics of multiple point cloud frames. The semantic segmentation label and the dynamic and static label of each grid are then obtained based on the grid's target fusion feature, and back projection determines the grid in which each point cloud of the current point cloud frame is located, thereby determining the semantic segmentation label and the dynamic and static label of each point cloud in the current point cloud frame and improving the accuracy and reliability of the label of each point cloud. Further, the semantic segmentation label and the dynamic and static label of each point cloud in the point cloud frame can be obtained simultaneously through a single complete neural network, which reduces the time delay between the acquired semantic segmentation label and dynamic and static label of the point cloud.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage and application of the personal information of users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
Fig. 8 is a schematic structural diagram of an acquisition device of a point cloud tag according to an embodiment of the present disclosure.
As shown in fig. 8, the device 800 for acquiring a point cloud label includes: the projection module 810, the first acquisition module 820, the second acquisition module 830, and the third acquisition module 840. Wherein:
The projection module 810 is configured to obtain M point cloud frames including a current point cloud frame, and respectively project point clouds in the M point cloud frames into grids of a bird's eye view, where M is an integer greater than or equal to 2;
a first obtaining module 820, configured to obtain target fusion features of the M point cloud frames corresponding to each grid according to the projected aerial view;
a second obtaining module 830, configured to obtain, for each grid, a semantic segmentation tag and a dynamic and static tag of the grid based on a target fusion feature of the grid, respectively;
the third obtaining module 840 is configured to perform back projection on each grid, determine the grid where the point cloud is located in the current point cloud frame, and determine the semantic segmentation tag and the dynamic and static tag of the grid where the point cloud is located as the tag of the point cloud.
The first obtaining module 820 is further configured to:
aiming at each point cloud frame in the M point cloud frames, acquiring initial characteristic information of the point cloud frames corresponding to each grid according to the projected aerial view of the point cloud frames;
splicing the initial characteristic information of each point cloud frame of the same grid to obtain candidate splicing characteristics;
and processing the candidate splicing characteristics through a backbone network to obtain the target fusion characteristics.
Wherein, the first obtaining module 820 is further configured to:
acquiring point clouds in each grid in the projected aerial view;
the initial characteristic information of each grid is determined based on the point cloud set of each grid.
The first obtaining module 820 is further configured to:
acquiring the quantity of point clouds in the point cloud set;
acquiring a height value of point clouds in the point cloud set, and determining a height difference and/or an average height of the point cloud set according to the height value;
and acquiring the reflectivity of the point cloud in the point cloud set, and determining the average reflectivity of the point cloud set according to the reflectivity.
The second obtaining module 830 is further configured to:
for each grid, carrying out semantic segmentation on target fusion features of the grid to obtain a plurality of semantic segmentation probabilities of the grid;
determining a maximum semantic segmentation probability from the plurality of semantic segmentation probabilities;
and determining the semantic segmentation label corresponding to the maximum semantic segmentation probability as the semantic segmentation label of the grid.
The second obtaining module 830 is further configured to:
performing first convolution processing on the target fusion feature to obtain a first convolution post-fusion feature;
and performing first probability function mapping on the first convolution fusion characteristic to obtain the semantic segmentation probabilities.
The second obtaining module 830 is further configured to:
aiming at each grid, carrying out classification recognition on the target fusion characteristics of the grid, and obtaining the type recognition probability of the grid;
and comparing the type recognition probability with a preset probability threshold value, and determining the dynamic and static labels of the grid based on a comparison result.
The second obtaining module 830 is further configured to:
performing second convolution processing on the target fusion feature to obtain a second convolution post-fusion feature;
and performing second probability function mapping on the fusion features after the second convolution to obtain the type recognition probability of the grid.
It should be noted that the explanation of the embodiment of the method for acquiring the point cloud label is also applicable to the device for acquiring the point cloud label in the embodiment of the present disclosure, and the specific process is not repeated here.
In the embodiments of the present disclosure, the point clouds of multiple point cloud frames are projected into the grids, so that the grids aggregate the point clouds of the multiple frames and the same grid carries the characteristics of multiple point cloud frames. The semantic segmentation label and the dynamic and static label of each grid are then obtained based on the grid's target fusion feature, and back projection determines the grid in which each point cloud of the current point cloud frame is located, thereby determining the semantic segmentation label and the dynamic and static label of each point cloud in the current point cloud frame and improving the accuracy and reliability of the label of each point cloud. Further, the semantic segmentation label and the dynamic and static label of each point cloud in the point cloud frame can be obtained simultaneously through a single complete neural network, which reduces the time delay between the acquired semantic segmentation label and dynamic and static label of the point cloud.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, the method for acquiring a point cloud label. For example, in some embodiments, the method for acquiring a point cloud label may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above-described method for acquiring a point cloud label may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the method for acquiring a point cloud label in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
The present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of obtaining a point cloud tag as described above.
The present disclosure also provides an automatic driving vehicle, which may include the electronic device of the above embodiment for executing the method for acquiring a point cloud label of the above embodiment. The automatic driving vehicle is provided with a point cloud acquisition device for acquiring point cloud frames; the acquired point cloud frames can be input into the electronic device, and the electronic device executes the method for acquiring a point cloud label of the above embodiment.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. The method for acquiring the point cloud label comprises the following steps:
acquiring M point cloud frames including a current point cloud frame, and respectively projecting point clouds in the M point cloud frames into grids of a bird's eye view, wherein M is an integer greater than or equal to 2;
acquiring target fusion characteristics of the M point cloud frames corresponding to each grid according to the projected aerial view;
aiming at each grid, acquiring a semantic segmentation label and a dynamic and static label of the grid based on target fusion characteristics of the grid;
and carrying out back projection on each grid, determining the grid where the point cloud is located in the current point cloud frame, and determining semantic segmentation labels and dynamic and static labels of the grid where the point cloud is located as labels of the point cloud.
2. The method of claim 1, wherein the obtaining, according to the projected aerial view, the target fusion features of the M point cloud frames corresponding to each grid includes:
aiming at each point cloud frame in the M point cloud frames, acquiring initial characteristic information of the point cloud frames corresponding to each grid according to the projected aerial view of the point cloud frames;
splicing the initial characteristic information of each point cloud frame of the same grid to obtain candidate splicing characteristics;
And processing the candidate splicing characteristics of the grids through a backbone network aiming at each grid to obtain target fusion characteristics of the grids.
3. The method of claim 2, wherein the obtaining initial feature information of the point cloud frame corresponding to each grid according to the projected bird's eye view of the point cloud frame comprises:
acquiring point clouds in each grid in the projected aerial view;
the initial characteristic information of each grid is determined based on the point cloud set of each grid.
4. A method according to claim 3, wherein said determining said initial characteristic information for each grid based on the point clouds of each grid comprises:
acquiring the quantity of point clouds in the point cloud set;
acquiring a height value of point clouds in the point cloud set, and determining a height difference and/or an average height of the point cloud set according to the height value;
and acquiring the reflectivity of the point cloud in the point cloud set, and determining the average reflectivity of the point cloud set according to the reflectivity.
5. The method of any of claims 1-4, wherein for each mesh, obtaining semantic segmentation tags for the mesh based on target fusion features of the mesh, comprises:
Performing semantic segmentation on the target fusion features of the grid to obtain a plurality of semantic segmentation probabilities of the grid;
determining a maximum semantic segmentation probability from the plurality of semantic segmentation probabilities;
and determining the semantic segmentation label corresponding to the maximum semantic segmentation probability as the semantic segmentation label of the grid.
6. The method of claim 5, wherein semantically segmenting the target fusion feature of the mesh to obtain a plurality of semantic segmentation probabilities for the mesh, comprises:
performing first convolution processing on the target fusion feature to obtain a first convolution post-fusion feature;
and performing first probability function mapping on the first convolution fusion characteristic to obtain the semantic segmentation probabilities.
7. The method of any of claims 1-4, wherein for each grid, obtaining an dynamic and static tag for the grid based on target fusion features of the grid, comprises:
performing classification recognition on the target fusion characteristics of the grid to obtain the type recognition probability of the grid;
and comparing the type recognition probability with a preset probability threshold value, and determining the dynamic and static labels of the grid based on a comparison result.
8. The method of claim 6, wherein the classifying the target fusion feature of the grid to obtain a type recognition probability of the grid comprises:
performing second convolution processing on the target fusion feature to obtain a second convolution post-fusion feature;
and performing second probability function mapping on the fusion features after the second convolution to obtain the type recognition probability of the grid.
9. An apparatus for acquiring a point cloud label, comprising:
a projection module configured to acquire M point cloud frames including a current point cloud frame, and to respectively project the point clouds in the M point cloud frames into grids of a bird's eye view, wherein M is an integer greater than or equal to 2;
a first acquisition module configured to acquire, according to the projected bird's eye view, target fusion features of the M point cloud frames corresponding to each grid;
a second acquisition module configured to acquire, for each grid, a semantic segmentation label and a dynamic and static label of the grid, respectively, based on the target fusion features of the grid;
and a third acquisition module configured to perform back projection on each grid, determine the grid where each point cloud in the current point cloud frame is located, and determine the semantic segmentation label and the dynamic and static label of that grid as the labels of the point cloud.
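A hypothetical sketch of the back-projection step performed by the third acquisition module: each point of the current frame is mapped back to its grid cell, and the cell's labels become the point's labels. Grid extent and cell size are the same assumptions used in the binning sketch above.

```python
import numpy as np

def labels_for_points(points, semantic_map, dynamic_map,
                      x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Assign per-grid labels to each point of the current frame.

    `semantic_map` and `dynamic_map` are (rows, cols) arrays of per-grid labels;
    `points` is an (N, >=2) array whose first two columns are x and y.
    """
    col_idx = ((points[:, 0] - x_range[0]) / cell).astype(int)
    row_idx = ((points[:, 1] - y_range[0]) / cell).astype(int)
    # Clip boundary points to a valid cell; out-of-range points would be filtered in practice.
    row_idx = np.clip(row_idx, 0, semantic_map.shape[0] - 1)
    col_idx = np.clip(col_idx, 0, semantic_map.shape[1] - 1)
    return semantic_map[row_idx, col_idx], dynamic_map[row_idx, col_idx]
```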
10. The apparatus of claim 9, wherein the first acquisition module is further configured to:
acquire, for each point cloud frame of the M point cloud frames, initial feature information of the point cloud frame corresponding to each grid according to the projected bird's eye view of the point cloud frame;
splice the initial feature information of each point cloud frame for the same grid to obtain candidate spliced features;
and process, for each grid, the candidate spliced features of the grid through a backbone network to obtain the target fusion features of the grid.
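As a rough illustration of claim 10's fusion step, the per-frame initial feature maps for the same grids could be concatenated (spliced) along the channel axis and passed through a small convolutional backbone; the backbone layout, frame count, and feature widths below are assumptions, not the patent's design.

```python
import torch
import torch.nn as nn

M = 5                  # number of point cloud frames (M >= 2)
init_channels = 4      # per-frame initial features: count, height difference, mean height, mean reflectivity
fused_channels = 64

# Assumed backbone: two 3x3 convolutions over the bird's eye view grid.
backbone = nn.Sequential(
    nn.Conv2d(M * init_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, fused_channels, kernel_size=3, padding=1), nn.ReLU(),
)

per_frame = [torch.randn(1, init_channels, 200, 200) for _ in range(M)]  # one BEV feature map per frame
candidate = torch.cat(per_frame, dim=1)     # candidate spliced features for each grid
target_fusion = backbone(candidate)         # target fusion features per grid
```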
11. The apparatus of claim 10, wherein the first acquisition module is further configured to:
acquire a point cloud set in each grid of the projected bird's eye view;
and determine the initial feature information of each grid based on the point cloud set of the grid.
12. The apparatus of claim 11, wherein the first acquisition module is further configured to:
acquire the number of point clouds in the point cloud set;
acquire height values of the point clouds in the point cloud set, and determine a height difference and/or an average height of the point cloud set according to the height values;
and acquire reflectivities of the point clouds in the point cloud set, and determine an average reflectivity of the point cloud set according to the reflectivities.
13. The apparatus of any one of claims 9-12, wherein the second acquisition module is further configured to:
perform, for each grid, semantic segmentation on the target fusion features of the grid to obtain a plurality of semantic segmentation probabilities of the grid;
determine a maximum semantic segmentation probability from the plurality of semantic segmentation probabilities;
and determine the semantic segmentation label corresponding to the maximum semantic segmentation probability as the semantic segmentation label of the grid.
14. The apparatus of claim 13, wherein the second acquisition module is further configured to:
perform a first convolution on the target fusion features to obtain first convolved fusion features;
and perform a first probability function mapping on the first convolved fusion features to obtain the plurality of semantic segmentation probabilities.
15. The apparatus of any one of claims 9-12, wherein the second acquisition module is further configured to:
perform, for each grid, classification recognition on the target fusion features of the grid to obtain a type recognition probability of the grid;
and compare the type recognition probability with a preset probability threshold, and determine the dynamic and static label of the grid based on a comparison result.
16. The apparatus of claim 15, wherein the second acquisition module is further configured to:
perform a second convolution on the target fusion features to obtain second convolved fusion features;
and perform a second probability function mapping on the second convolved fusion features to obtain the type recognition probability of the grid.
17. An electronic device, comprising a processor and a memory;
wherein the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the method according to any one of claims 1-8.
18. A computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
20. An autonomous vehicle comprising the electronic device of claim 17.
CN202211649079.1A 2022-12-21 2022-12-21 Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle Pending CN116152702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211649079.1A CN116152702A (en) 2022-12-21 2022-12-21 Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211649079.1A CN116152702A (en) 2022-12-21 2022-12-21 Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle

Publications (1)

Publication Number Publication Date
CN116152702A (en) 2023-05-23

Family

ID=86349939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211649079.1A Pending CN116152702A (en) 2022-12-21 2022-12-21 Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle

Country Status (1)

Country Link
CN (1) CN116152702A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292140A (en) * 2023-10-17 2023-12-26 小米汽车科技有限公司 Point cloud data processing method and device, vehicle and storage medium
CN117292140B (en) * 2023-10-17 2024-04-02 小米汽车科技有限公司 Point cloud data processing method and device, vehicle and storage medium

Similar Documents

Publication Publication Date Title
CN113033537B (en) Method, apparatus, device, medium and program product for training a model
CN114120253B (en) Image processing method, device, electronic equipment and storage medium
CN113392253B (en) Visual question-answering model training and visual question-answering method, device, equipment and medium
CN113191256A (en) Method and device for training lane line detection model, electronic device and storage medium
CN113947188A (en) Training method of target detection network and vehicle detection method
CN113361710A (en) Student model training method, picture processing device and electronic equipment
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113887615A (en) Image processing method, apparatus, device and medium
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN116152702A (en) Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle
CN113792876A (en) Backbone network generation method, device, equipment and storage medium
CN116402914B (en) Method, device and product for determining stylized image generation model
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113344121B (en) Method for training a sign classification model and sign classification
CN112861811B (en) Target identification method, device, equipment, storage medium and radar
CN114973333A (en) Human interaction detection method, human interaction detection device, human interaction detection equipment and storage medium
CN114429631A (en) Three-dimensional object detection method, device, equipment and storage medium
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113902898A (en) Training of target detection model, target detection method, device, equipment and medium
CN113591709A (en) Motion recognition method, motion recognition device, motion recognition apparatus, motion recognition medium, and computer program product
CN113205131A (en) Image data processing method and device, road side equipment and cloud control platform
CN114581746B (en) Object detection method, device, equipment and medium
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN116229209B (en) Training method of target model, target detection method and device
CN116778006B (en) Modeling method and device for picture encoder, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination