CN118072288A - Target object extraction method, electronic device and storage medium

Target object extraction method, electronic device and storage medium

Info

Publication number: CN118072288A
Application number: CN202410316158.3A
Authority: CN (China)
Prior art keywords: target object, pixel points, target, pixel, pixel point
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张莉, 黄明凤, 覃芳玥, 李天辉, 覃高峰
Current assignee: SAIC GM Wuling Automobile Co Ltd
Original assignee: SAIC GM Wuling Automobile Co Ltd
Application filed by SAIC GM Wuling Automobile Co Ltd
Priority to CN202410316158.3A
Publication of CN118072288A

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a target object extraction method, an electronic device, and a storage medium. The method includes: inputting an image to be detected into a deep learning model to obtain tensor information corresponding to each of N pixel points of the image to be detected; analyzing the tensor information corresponding to a plurality of pixel points at a time to determine the pixel points corresponding to each target object; determining the coordinates of the pixel points corresponding to each target object in a bird's-eye-view coordinate system according to the position information of those pixel points; and performing back-projection transformation on a plurality of pixel points at a time to convert the coordinates of the pixel points corresponding to each target object in the bird's-eye-view coordinate system into the coordinates of each target object in the vehicle coordinate system. Because a plurality of pixel points are processed simultaneously at each step, the extraction speed of target objects is improved, the real-time performance of the parking assist system is ensured, and the safety of assisted parking is enhanced.

Description

Target object extraction method, electronic device and storage medium
Technical Field
The present application relates to the field of object extraction technologies, and in particular, to an object extraction method, an electronic device, and a storage medium.
Background
With the continuous development of the automobile industry, more and more automobiles are equipped with driver assistance functions. These functions include, but are not limited to, adaptive cruise control, forward collision warning, automatic emergency braking, lane keeping assist, and parking assist. All of them require the vehicle to sense its surroundings through various sensing devices so that collisions with other vehicles or obstacles can be avoided and the safety of the vehicle and its occupants can be ensured.
During assisted parking, the vehicle needs to acquire not only the drivable area around it in real time, but also the type of each surrounding obstacle and its position relative to the vehicle, so that dynamic avoidance of various obstacles can be completed. In the related art, during assisted parking, the images acquired by the vehicle-mounted surround-view cameras are processed pixel by pixel to obtain the drivable area around the vehicle and the obstacle positions.
However, processing the pixels in the image one by one makes the data processing time long, so the parking assist system cannot run in real time and cannot meet users' requirements for the safety of automated driving.
It should be noted that the information disclosed in the background section of the present application is only for enhancement of understanding of the general background of the present application and should not be taken as an admission or any form of suggestion that this information forms the prior art that is well known to a person skilled in the art.
Disclosure of Invention
In view of this, the present application provides a target object extraction method, an electronic device, and a storage medium, to solve the problem in the prior art that processing the pixels in an image one by one during assisted parking makes the data processing time long, so that the parking assist system cannot run in real time and cannot meet users' requirements for the safety of automated driving.
In a first aspect, an embodiment of the present application provides a method for extracting a target object, including:
inputting an image to be detected into a deep learning model to obtain tensor information corresponding to each of N pixel points of the image to be detected, where the tensor information includes position information and K channel scores respectively representing the similarity between the pixel point and K target objects, N ≥ 2, and K ≥ 2;
analyzing the tensor information corresponding to a plurality of pixel points at a time, and respectively determining the pixel points corresponding to each target object, where for a pixel point corresponding to a target object the target channel score is the highest of its channel scores, the target channel score being the channel score of the channel corresponding to that target object;
determining the coordinates of the pixel points corresponding to each target object in a bird's-eye-view coordinate system according to the position information of the pixel points corresponding to each target object;
and performing back-projection transformation on a plurality of pixel points at a time, converting the coordinates of the pixel points corresponding to each target object in the bird's-eye-view coordinate system into the coordinates of each target object in the vehicle coordinate system.
In the embodiments of the present application, the tensor information corresponding to the pixel points in the image to be detected is obtained through a deep learning model; the tensor information corresponding to a plurality of pixel points is analyzed to determine the pixel points corresponding to each target object; the coordinates of those pixel points in the bird's-eye-view coordinate system are determined from their position information; and the coordinates of each target object in the vehicle coordinate system are then determined through back-projection transformation. Because the tensor information of a plurality of pixel points can be analyzed at a time, and a plurality of pixel points are processed simultaneously during back-projection transformation, the extraction speed of target objects is improved, the real-time performance of the automated driving system is ensured, and driving safety is enhanced.
In one possible implementation manner, the analyzing tensor information corresponding to the plurality of pixel points each time, respectively determining the pixel point corresponding to each target object includes:
And analyzing tensor information corresponding to the pixel points each time, and marking the pixel points corresponding to each target object as different colors.
In the embodiments of the present application, the pixel points corresponding to each target object are marked with different colors, and the target object to which each pixel belongs is determined from the correspondence between colors and target object types. The color of a pixel point is represented with YUV data: colors make different target objects easy to distinguish, YUV data is compact, and processing YUV data is therefore fast and computationally cheap.
In one possible implementation manner, the analyzing tensor information corresponding to the pixel points each time, respectively marking the pixel points corresponding to each target object as different colors, includes:
Analyzing tensor information corresponding to a plurality of pixel points each time, and judging whether the highest channel score corresponding to each pixel point is the target channel score corresponding to the ith target object;
and marking the colors of all pixel points with the highest channel score being the target channel score corresponding to the ith target object as the colors corresponding to the ith target object, wherein i is more than or equal to 1 and less than or equal to K.
In the embodiments of the present application, the tensor information corresponding to a plurality of pixel points is analyzed at a time to determine the colors of those pixel points, which increases the extraction speed of target objects, improves the real-time performance of the automated driving system, and improves driving safety.
In one possible implementation manner, the marking the color of all the pixels with the highest channel score as the target channel score corresponding to the ith target object as the color corresponding to the ith target object includes:
If the highest channel score corresponding to any pixel point is the target channel score corresponding to the ith target object, setting the target channel ID of any pixel point to be 1;
If the highest channel score corresponding to any pixel point is not the target channel score corresponding to the ith target object, setting the target channel ID of any pixel point to be 0;
And performing AND operation on the color information corresponding to the target channel ID of each pixel point and the ith target object.
In the embodiments of the present application, an AND operation is performed between the target channel ID of each pixel point and the color information corresponding to the i-th target object. If the target channel ID of a pixel point is 1, the result of the AND operation is the color information corresponding to the i-th target object; if the target channel ID is 0, the result is 0. The color corresponding to color information of 0 is preset; in one possible implementation it is gray, although a person skilled in the art may set it to another color according to actual needs, which is not particularly limited in the embodiments of the present application.
In one possible implementation, the image to be detected is a surround-view image acquired by a plurality of cameras.
In the embodiments of the present application, the surround-view images acquired by the plurality of cameras are processed in a multi-threaded, parallel manner. In one possible implementation, four surround-view cameras are mounted on the vehicle, and the four surround-view images they acquire are processed in parallel. Processing the surround-view images on multiple threads improves the extraction speed of target objects and the real-time performance of the automated driving system.
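As a rough illustration of this multi-threaded processing, the following C sketch dispatches the four camera images to parallel workers. It is a minimal sketch under stated assumptions: process_image is a hypothetical stand-in for the per-camera pipeline, and the use of POSIX threads is an implementation choice, not something the application prescribes.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_CAMERAS 4

/* Hypothetical per-camera worker: inference, tensor parsing, projection. */
static void *process_image(void *camera_image)
{
    /* ... per-camera pipeline would run here ... */
    (void)camera_image;
    return NULL;
}

/* Process the four surround-view images in parallel, one thread each. */
void process_surround_view(void *images[NUM_CAMERAS])
{
    pthread_t workers[NUM_CAMERAS];
    for (int i = 0; i < NUM_CAMERAS; ++i)
        pthread_create(&workers[i], NULL, process_image, images[i]);
    for (int i = 0; i < NUM_CAMERAS; ++i)
        pthread_join(workers[i], NULL);  /* wait until all four views are done */
}
```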
In one possible implementation, analyzing the tensor information corresponding to a plurality of pixel points at a time includes: analyzing the tensor information corresponding to a plurality of pixel points at a time using the NEON instruction set;
performing back-projection transformation on a plurality of pixel points at a time includes: performing back-projection transformation on a plurality of pixel points at a time using the NEON instruction set.
In the embodiments of the present application, the NEON instruction set is used both to analyze the tensor information of a plurality of pixel points at a time and to back-project a plurality of pixel points at a time. NEON increases the data processing speed, allows images acquired by fisheye cameras to be processed directly, and improves the efficiency of target object extraction.
In one possible implementation manner, the performing back projection transformation on the plurality of pixel points each time converts coordinates of the pixel point corresponding to each target object in a bird's eye view coordinate system into coordinates of each target object in a vehicle coordinate system, including:
and carrying out back projection transformation on a plurality of pixel points based on the camera external parameter matrix each time, and converting the coordinates of the pixel points corresponding to each target object in a bird's eye view coordinate system into the coordinates of each target object in a vehicle coordinate system.
In the embodiments of the present application, back-projection transformation is performed on a plurality of pixel points at a time, converting the coordinates of the pixel points corresponding to each target object from the bird's-eye-view coordinate system into the vehicle coordinate system. This improves the efficiency of the coordinate-system conversion and the real-time performance of the automated driving system.
In one possible implementation, the target objects include obstacles and a drivable area.
In the embodiments of the present application, the target objects include obstacles and the drivable area, so extracting the target objects yields the position information of the various objects and of the drivable area around the vehicle. This avoids collisions with obstacles during automated driving and improves the safety of the automated driving system.
In a second aspect, an embodiment of the present application provides an electronic device, including:
A processor;
A memory;
And a computer program, wherein the computer program is stored in the memory, the computer program comprising instructions that, when executed by the processor, cause the electronic device to perform the method of any of the first aspects.
In a third aspect, an embodiment of the present application provides a computer readable storage medium, where the computer readable storage medium includes a stored program, where when the program runs, the program controls a device in which the computer readable storage medium is located to execute the method of any one of the first aspects.
It will be appreciated that the electronic device provided in the second aspect and the computer-readable storage medium provided in the third aspect are both used to perform the method provided by the present application. For the advantages they achieve, reference may be made to the advantages of the corresponding method, which are not repeated here.
In the embodiments of the present application, the tensor information corresponding to the pixel points in the image to be detected is obtained through a deep learning model; the tensor information corresponding to a plurality of pixel points is analyzed to determine the pixel points corresponding to each target object; the coordinates of those pixel points in the bird's-eye-view coordinate system are determined from their position information; and the coordinates of each target object in the vehicle coordinate system are then determined through back-projection transformation. Because the tensor information of a plurality of pixel points can be analyzed at a time, and a plurality of pixel points are processed simultaneously during back-projection transformation, the extraction speed of target objects is improved, the real-time performance of the parking assist system is ensured, and the safety of assisted parking is enhanced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of a target object extraction method according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for determining a pixel point corresponding to a target object according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of another method for extracting a target object according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solution of the present application, the following detailed description of the embodiments of the present application refers to the accompanying drawings.
It should be understood that the described embodiments are merely some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects, indicating that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
For ease of understanding, a specific application scenario will be first described by way of example.
Referring to fig. 1, a schematic view of an application scenario provided in an embodiment of the present application is shown. The scenario contains an own vehicle 101, a first obstacle 102, a second obstacle 103, a parking space line 104, and a third obstacle 105. The vehicle 101 is equipped with a parking assist system, which needs to acquire the obstacles and the drivable area around the vehicle 101 in real time so that the vehicle 101 can be parked into a parking space without touching any obstacle. As shown in fig. 1, three obstacles exist around the own vehicle 101: the first obstacle 102, the second obstacle 103, and the third obstacle 105. The areas other than the obstacles are all drivable, but for the own vehicle 101 to be parked in a parking space, the parking assist system needs to recognize the parking space line 104 around the vehicle and plan a driving route based on the surrounding obstacles and the parking space line 104, thereby parking the own vehicle 101 safely in the space.
It should be noted that the parking space line 104 and the third obstacle 105 shown in fig. 1 are merely illustrative and should not be taken as limiting the scope of the present application: besides a pre-drawn parking line, the parking space line 104 may delimit a parking space built from bricks or other materials, and the third obstacle 105 may be something other than a car, such as a tree, a road sign, a pedestrian, or another type of vehicle. Likewise, the first obstacle 102 and the second obstacle 103 shown in fig. 1 may be various obstacles such as trees, signboards, pedestrians, or vehicles.
In the related art, when a vehicle parks with assistance, the parking assist system receives the images acquired by the vehicle-mounted surround-view cameras and processes the image data with a semantic segmentation model, obtaining multi-dimensional tensors of the semantic types of the obstacles and the drivable area in the images. To convert these tensors into drivable-area data in the vehicle coordinate system, the multi-dimensional data in each tensor must be traversed element by element, a semantic segmentation image is obtained from an image mapping table, and the pixel points are then classified and coordinate-converted one by one to obtain the grid map and the obstacle position information around the vehicle. In this method the pixels must be processed one by one to obtain the drivable area and the obstacle positions, which makes the data processing time long; the parking assist system then cannot run in real time and cannot meet users' requirements for the safety of automated driving.
In view of the above problems, an embodiment of the present application provides a target object extraction method: tensor information corresponding to the pixel points in an image to be detected is obtained through a deep learning model; the tensor information corresponding to a plurality of pixel points is analyzed to determine the pixel points corresponding to each target object; the coordinates of those pixel points in the bird's-eye-view coordinate system are determined from their position information; and the coordinates of each target object in the vehicle coordinate system are then determined through back-projection transformation. Because the tensor information of a plurality of pixel points can be analyzed at a time, and a plurality of pixel points are processed simultaneously during back-projection transformation, the extraction speed of target objects is improved, the real-time performance of the parking assist system is ensured, and the safety of assisted parking is enhanced. The following detailed description refers to the accompanying drawings and specific embodiments.
Referring to fig. 2, a flow chart of a target object extraction method according to an embodiment of the present application is shown. The method can be applied to the application scenario shown in fig. 1 and is performed by the target object extraction device; as shown in fig. 2, it mainly includes the following steps.
Step S201: inputting the image to be detected into a deep learning model to obtain the tensor information corresponding to each of the N pixel points of the image to be detected.
Specifically, a surround-view camera on the vehicle acquires an image and sends it to the target object extraction device. The device inputs the image to be detected into a deep learning model and obtains, through the model, the tensor information corresponding to each of N pixel points of the image, where N ≥ 2. The deep learning model has been trained in advance on a large number of labeled images. Tensor information and pixel points are in one-to-one correspondence: among the N pixel points, pixel point A corresponds to tensor information a, pixel point B corresponds to tensor information b, and so on, up to pixel point N and tensor information n. It should be noted that not every pixel in the image to be detected necessarily yields tensor information, so N is not necessarily the total number of pixels in the image. In addition, the surround-view camera of the vehicle may be a fisheye camera or any other type of camera.
The tensor information includes the position information of its pixel point and K channel scores, where K ≥ 2. The position information represents where the point lies in the image to be detected, i.e., the pixel point's horizontal and vertical coordinates in the image. The K channel scores represent the similarity between the pixel point and K target objects: since K preset target object types exist in the embodiments of the present application, the tensor information of each pixel point includes its similarity to each of the K target objects. In the embodiments of the present application K is 19, although a person skilled in the art may set the number of target object types to another value according to actual needs, which is not particularly limited here.
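To make the layout concrete, a minimal C sketch of one pixel's tensor record under the description above follows; the struct name and field names are assumptions made for illustration, not the application's actual data layout.

```c
#define K_TARGETS 19   /* K = 19 in the described embodiment */

/* One pixel's tensor information: its position in the image to be
 * detected plus one similarity score per target object channel. */
typedef struct {
    int   x, y;                /* position information (image coordinates) */
    float score[K_TARGETS];    /* channel scores: similarity to each target */
} PixelTensor;
```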
For ease of understanding, the embodiment of the present application provides a schematic chart of tensor information. As shown in Table 1, the position information of the pixel point is (x1, y1); its channel-0 score is 0.07, i.e., its similarity to the first target object is 0.07; its channel-1 score is 0.05, i.e., its similarity to the second target object is 0.05; its channel-2 score is 0.05, i.e., its similarity to the third target object is 0.05; its channel-3 score is 0.7, i.e., its similarity to the fourth target object is 0.7; and so on, its channel-K score is 0.03, i.e., its similarity to the K-th target object is 0.03. Note that in a concrete implementation the tensor information is not necessarily recorded as a table, and the channel scores are not necessarily floating-point data, so the table should not be taken as limiting the scope of the present application.
Table 1:

Position information | Channel 0 | Channel 1 | Channel 2 | Channel 3 | …… | Channel K
(x1, y1)             | 0.07      | 0.05      | 0.05      | 0.7       | …… | 0.03
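Given such a record, the "highest channel score" used below is simply the maximum over the K scores. A small illustrative C helper, hypothetical and building on the PixelTensor sketch above:

```c
#include <stddef.h>

/* Return the channel index whose score is highest for one pixel,
 * i.e., the target object this pixel most resembles. */
size_t highest_channel(const float *scores, size_t k)
{
    size_t best = 0;
    for (size_t c = 1; c < k; ++c)
        if (scores[c] > scores[best])
            best = c;
    return best;
}
```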
If the vehicle extracted only the obstacles around it and treated every obstacle-free area as drivable, it might drive into areas it is prohibited from entering. In the driver assistance functions it is therefore necessary to obtain information about the drivable area in addition to information about the surrounding obstacles; accordingly, in the embodiments of the present application the target objects include both obstacles and the drivable area.
Step S202: analyzing the tensor information corresponding to a plurality of pixel points at a time, and respectively determining the pixel points corresponding to each target object.
The embodiments of the present application can analyze the tensor information corresponding to a plurality of pixel points at a time to determine the pixel points corresponding to each target object, where for a pixel point corresponding to a target object the target channel score is its highest channel score, the target channel score being the score of the channel corresponding to that target object. Specifically, a designer may decide in advance which target object is to be extracted and assign it channel 0 or another channel; the channel corresponding to the target object to be extracted is the target channel, and the score of each pixel point's tensor information on that channel is its target channel score. If the target channel score of a pixel point is the highest of all its channel scores, the pixel point corresponds to the target object; in other words, a pixel point corresponds to the target object if the channel holding its highest score is the target channel. Although a single pixel is used as an example here, in the embodiments of the present application a plurality of pixel points are analyzed simultaneously and processed in parallel: the highest channel score is taken per pixel, and channel scores are never compared across different pixel points.
For example, suppose the target channel is channel 0. If the channel-0 score of the tensor information of pixel point A is 0.9 and all its other channel scores are below 0.9, pixel point A corresponds to the target object; similarly, if the channel-0 score of pixel point B is 0.8 and all its other channel scores are below 0.8, pixel point B also corresponds to the target object. However, if the channel-0 score of pixel point C is 0.2 while its channel-5 score is 0.5, which is greater than the target channel score of 0.2, pixel point C does not correspond to the target object. Of course, a person skilled in the art may place the target channel on any channel according to actual needs; the embodiments of the present application are not limited in this respect.
In one possible implementation, multiple target objects must be extracted at the same time. If only the channel scores in the tensor information were used to distinguish them, the amount of data to process could be large and the distinction between target objects would not be obvious enough. Therefore, to distinguish multiple target objects, the embodiments of the present application mark the pixel points corresponding to each target object with a different color.
Specifically, the tensor information corresponding to a plurality of pixel points is analyzed at a time; whether the highest channel score of each pixel point is the target channel score corresponding to the i-th target object is judged; and the colors of all pixel points whose highest channel score is the target channel score corresponding to the i-th target object are marked as the color corresponding to the i-th target object, where 1 ≤ i ≤ K and the i-th target object is one of the K target objects. In the embodiments of the present application, the tensor information of a plurality of pixel points is analyzed at a time with the NEON instruction set, which improves data processing efficiency. In one possible implementation, the NEON instruction set processes 8 or 16 pixel points at a time, as a person skilled in the art requires.
For example, if the target object to be extracted by the technician is the first target object, the target channel corresponding to the first target object is the 0 th channel, and the color corresponding to the first target object is red. The target object extraction device judges whether the highest channel score corresponding to each pixel point is the target channel score corresponding to the first target object, namely judges whether the highest channel score corresponding to each pixel point is the 0 th channel score, and marks the colors of all the pixel points with the highest channel scores being the 0 th channel score corresponding to the first target object as red corresponding to the first target object.
In one possible implementation, if the highest channel score corresponding to a pixel is different from the target channel score, the color of the pixel is marked as gray. Of course, according to actual needs, those skilled in the art may set the color of the pixel point corresponding to the non-target object to other colors, which is not particularly limited in the embodiment of the present application.
For example, suppose again that the target object to be extracted is the first target object, its target channel is channel 0, its color is red, and the color of non-target pixel points is gray. If the highest channel score of pixel point A is its channel-0 score, the target object extraction device judges that pixel point A's highest channel score is the target channel score and marks pixel point A red; if the highest channel score of pixel point D is its channel-7 score, the device judges that it is not the target channel score and marks pixel point D gray.
To conveniently record the color information of the pixel points corresponding to each target object, the embodiments of the present application proceed as follows. If the highest channel score of a pixel point is the target channel score corresponding to the i-th target object, the target channel ID of that pixel point is set to 1; otherwise it is set to 0. An AND operation is then performed between the target channel ID of each pixel point and the color information corresponding to the i-th target object. The target channel ID indicates whether a pixel point corresponds to the target object: if the channel corresponding to the target object to be extracted is channel 0 and a pixel point's channel-0 ID is 1, the pixel point corresponds to the target object; if its channel-0 ID is 0, it does not. In other words, the target object of the channel whose ID is 1 is the target object type of that pixel. If the target channel ID of a pixel point is 1, the result of the AND operation is the color information corresponding to the i-th target object; if the ID is 0, the result is 0. In the embodiments of the present application, a pixel point whose AND result is 0 is marked gray, although a person skilled in the art may mark such pixel points with another color according to actual needs, which is not particularly limited here.
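A minimal scalar C sketch of this marking step follows. One assumption is made explicit: for a bitwise AND to reproduce the full color value, the ID of "1" is realized as an all-ones byte mask (0xFF), with 0x00 for non-target pixels; all names are illustrative.

```c
#include <stdint.h>

/* Mark one pixel: returns the class color for target pixels, 0 (gray)
 * otherwise. The target channel ID acts as a bitwise mask. Exact float
 * comparison is safe here because the maximum is a copy of one score. */
uint8_t mark_pixel(float target_score, float highest_score, uint8_t class_color)
{
    uint8_t channel_id = (target_score == highest_score) ? 0xFF : 0x00;
    return channel_id & class_color;   /* AND with the color information */
}
```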
If the colors of the pixel points were marked with a conventional color standard such as RGB, the amount of data to process could be large, reducing the extraction efficiency. Therefore, the embodiments of the present application represent color with the YUV color model, where Y is luminance and UV is chrominance (which can be subdivided into hue and saturation). In the YUV color model each pixel carries Y, U, and V components; the Y component typically occupies 8 bits (1 byte), and the U and V components together typically occupy another 8 bits (1 byte), so the YUV information of one pixel typically occupies 16 bits (2 bytes). By contrast, the R, G, and B components each occupy 8 bits (1 byte), so one RGB pixel typically occupies 24 bits (3 bytes). The YUV color model therefore reduces the amount of data to process and improves the efficiency of target object extraction.
In addition, if the pixel points corresponding to the target object were determined only by comparing each pixel point's highest channel score with its target channel score, some pixels that actually belong to the background might be marked as the target object. To solve this problem, the embodiments of the present application add a threshold check: when determining the pixel points corresponding to the target object, it is first judged whether the highest channel score of each pixel point is the target channel score; if so, it is then judged whether the target channel score is greater than or equal to a preset target channel score threshold. If the target channel score reaches the threshold, the pixel point corresponds to the target object; if it is below the threshold, the pixel point does not correspond to the target object and belongs to the background. Here the background denotes a pixel that belongs to none of the K target objects; for example, it may depict the sky.
Corresponding to the above embodiment, the present application further provides a flowchart of a method for determining a pixel point corresponding to the target object.
Referring to fig. 3, a flowchart of a method for determining a pixel point corresponding to a target object according to an embodiment of the present application is provided. As shown in fig. 3, it mainly includes the following steps.
Step S301: reading the target channel scores of 8 pixel points from the corresponding tensor information.
Specifically, the embodiment of the present application uses floating-point data, so 8 pixel points can be processed simultaneously; with integer data, 16 pixel points could be processed simultaneously. A person skilled in the art may set the number of simultaneously processed pixel points to another value according to actual needs, which is not particularly limited in the embodiments of the present application.
Step S302: reading the highest channel score of each of the 8 pixel points from the corresponding tensor information.
Step S303: judging whether the highest channel score is greater than the target channel score, if so, executing step S304; if not, step S305 is performed.
Step S304: the pixel is stored as a non-target pixel.
Step S305: judging whether the target channel score is greater than a target channel score threshold, if so, executing step S307; if not, step S306 is performed.
Step S306: the target channel ID of the pixel is set to 0.
Step S307: the target channel ID of the pixel is set to 1.
Step S308: performing an AND operation between the target channel ID and the YUV components.
Step S309: obtaining the color of the pixel point corresponding to the target object.
For the specific content of this embodiment, reference may be made to the description under step S202; for brevity, it is not repeated here.
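For illustration, a hedged C sketch of steps S301 to S309 with NEON intrinsics follows, processing 8 pixel points per call as described above. It assumes the scores have been gathered into contiguous float arrays and that the channel ID of 1 is realized as an all-ones lane mask so the final AND yields the Y component of the class color; only the Y plane is shown, and all names are illustrative.

```c
#include <arm_neon.h>
#include <stdint.h>

/* Steps S301-S309 for 8 pixels: compare target-channel scores against
 * the per-pixel maxima and a threshold, build a lane mask (the target
 * channel IDs), and AND it with the class color's Y component. */
void mark_8_pixels(const float *target_scores,   /* S301: 8 target-channel scores */
                   const float *highest_scores,  /* S302: 8 per-pixel maxima      */
                   float score_threshold,
                   uint8_t class_color_y,        /* Y component of the class color */
                   uint8_t *out_y)               /* 8 marked output values         */
{
    /* NEON float vectors are 4 lanes wide, so load two halves. */
    float32x4_t tgt_lo = vld1q_f32(target_scores);
    float32x4_t tgt_hi = vld1q_f32(target_scores + 4);
    float32x4_t max_lo = vld1q_f32(highest_scores);
    float32x4_t max_hi = vld1q_f32(highest_scores + 4);
    float32x4_t thr    = vdupq_n_f32(score_threshold);

    /* S303 + S305: a target pixel has the target-channel score as its
     * per-pixel maximum AND that score reaches the threshold. */
    uint32x4_t mask_lo = vandq_u32(vceqq_f32(tgt_lo, max_lo),
                                   vcgeq_f32(tgt_lo, thr));
    uint32x4_t mask_hi = vandq_u32(vceqq_f32(tgt_hi, max_hi),
                                   vcgeq_f32(tgt_hi, thr));

    /* S306/S307: narrow the two 32-bit masks into one 8-lane byte mask;
     * all-ones lanes play the role of target channel ID = 1, zero = 0. */
    uint8x8_t ids = vmovn_u16(vcombine_u16(vmovn_u32(mask_lo),
                                           vmovn_u32(mask_hi)));

    /* S308/S309: AND the IDs with the color; non-target lanes become 0. */
    vst1_u8(out_y, vand_u8(ids, vdup_n_u8(class_color_y)));
}
```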
Step S203: determining the coordinates of the pixel points corresponding to each target object in the bird's-eye-view coordinate system according to the position information of those pixel points.
Specifically, the target object extraction device converts the position information of the pixel points corresponding to each target object in the image to be detected into coordinates in the bird's-eye-view coordinate system using the intrinsic and extrinsic parameters of the surround-view cameras. Because the embodiments of the present application can directly process images sent by fisheye cameras, the pixel points corresponding to each target object must first be undistorted before the coordinates in the bird's-eye-view coordinate system are obtained.
In practice there is usually more than one surround-view camera. If the images from the cameras were processed separately, the result would be several unrelated sets of bird's-eye-view coordinates: one set for the front of the vehicle, one for the rear, one for the left, and one for the right. Coordinates in unrelated bird's-eye-view coordinate systems would hamper the subsequent parking assist function. Therefore, in the embodiments of the present application, the image to be detected is a surround-view image acquired by a plurality of cameras, and after undistorting the pixel points corresponding to each target object, the target object extraction device stitches together the pixel points acquired by the several cameras at the same moment, obtaining the coordinates, in a single bird's-eye-view coordinate system, of the complete and continuous target objects around the vehicle at that moment.
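The undistortion step can be realized with a precomputed lookup table, in the spirit of the image mapping table mentioned for the related art. The C sketch below is an assumption-laden illustration: the table (derived offline from the camera intrinsics and extrinsics, construction not shown) stores, for each bird's-eye-view output pixel, which source pixel to sample.

```c
#include <stdint.h>

/* One lookup entry: the source-image pixel feeding a given BEV pixel. */
typedef struct { uint16_t src_x, src_y; } MapEntry;

/* Undistort/reproject one channel (e.g., the Y plane) into the
 * bird's-eye view by nearest-neighbor sampling through the table. */
void remap_to_bev(const uint8_t *src, int src_w,
                  const MapEntry *map, uint8_t *bev, int bev_w, int bev_h)
{
    for (int i = 0; i < bev_w * bev_h; ++i)
        bev[i] = src[(int)map[i].src_y * src_w + map[i].src_x];
}
```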
In one possible implementation, only the position information of the pixel corresponding to the target object is converted into coordinates in the aerial view coordinate system, that is, if one pixel is a background or the corresponding target object is not the target object to be extracted, the pixel is not processed. Only the pixel points corresponding to the target object to be extracted are in the aerial view coordinate system obtained by the method, so that the data processing speed is improved, and the efficiency of extracting the target object is improved.
In another possible implementation manner, the position information of all the pixels is converted into coordinates in the aerial view coordinate system, that is, the position information of the pixels is converted into coordinates in the aerial view coordinate system regardless of whether the object corresponding to the pixels is the object to be extracted. The aerial view coordinate system obtained by the method comprises the target object to be extracted and the background, and the method does not need to delete the pixel points and does not delete the pixel points corresponding to the target object, so that the method is less prone to error.
Step S204: performing back-projection transformation on a plurality of pixel points at a time, converting the coordinates of the pixel points corresponding to each target object in the bird's-eye-view coordinate system into the coordinates of each target object in the vehicle coordinate system.
Specifically, back-projection transformation is performed on a plurality of pixel points at a time, converting the coordinates of the pixel points corresponding to each target object from the bird's-eye-view coordinate system into the coordinates of each target object in the vehicle coordinate system. The back-projection transformation converts two-dimensional coordinates in the bird's-eye view into three-dimensional coordinates in the vehicle coordinate system. The vehicle coordinate system in the embodiments of the present application is a coordinate system fixed to the vehicle body; it generally takes the center of the rear axle as the origin, the forward direction of the vehicle as the x direction, the left side of the vehicle as the y direction, and the vertically upward direction as the z direction, forming a right-handed coordinate system.
In the embodiments of the present application, the NEON instruction set performs back-projection transformation on a plurality of pixel points at a time. Because NEON processes several pixel points at once, the extraction efficiency of target objects is improved and the real-time performance of the parking assist system is ensured. The back-projection transformation of the pixel points is based on the camera extrinsic matrix; performing it through the extrinsic matrix improves both the efficiency and the accuracy of the transformation.
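As a rough illustration of one such transformation, the following C sketch maps a bird's-eye-view point into the vehicle frame with a rotation R and translation t taken from the camera extrinsic matrix, under the common assumption that the point lies on the ground plane (z = 0). It is a scalar sketch of what the NEON code would do several lanes at a time, and all names are illustrative.

```c
typedef struct { float x, y, z; } Vec3;

/* Back-project one BEV ground-plane point (u, v) into the vehicle
 * (rear-axle) coordinate system: q = R * p + t. */
Vec3 bev_to_vehicle(float u, float v, const float R[3][3], Vec3 t)
{
    const float p[3] = { u, v, 0.0f };   /* assumed ground-plane point */
    Vec3 q = {
        R[0][0]*p[0] + R[0][1]*p[1] + R[0][2]*p[2] + t.x,
        R[1][0]*p[0] + R[1][1]*p[1] + R[1][2]*p[2] + t.y,
        R[2][0]*p[0] + R[2][1]*p[1] + R[2][2]*p[2] + t.z,
    };
    return q;
}
```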
In summary, in the embodiments of the present application, the tensor information corresponding to the pixel points in the image to be detected is obtained through a deep learning model; the tensor information corresponding to a plurality of pixel points is analyzed to determine the pixel points corresponding to each target object; the coordinates of those pixel points in the bird's-eye-view coordinate system are determined from their position information; and the coordinates of each target object in the vehicle coordinate system are then determined through back-projection transformation. Because the tensor information of a plurality of pixel points can be analyzed at a time, and a plurality of pixel points are processed simultaneously during back-projection transformation, the extraction speed of target objects is improved, the real-time performance of the parking assist system is ensured, and the safety of assisted parking is enhanced. The target object extraction method provided by the embodiments of the present application can be applied to various vehicle-mounted surround-view camera systems and raises their level of intelligence.
Corresponding to the above embodiment, the present application also provides a flow chart of another target extraction method.
Referring to fig. 4, a flow chart of another method for extracting a target object according to an embodiment of the present application is shown. As shown in fig. 4, it mainly includes the following steps.
Step S401: acquiring an image to be detected.
Step S402: inputting the image to be detected into the deep learning model to obtain tensor information.
Step S403: analyzing the tensor information corresponding to a plurality of pixel points at a time with the NEON instruction set to obtain the YUV information corresponding to the pixel points.
Step S404: undistorting and stitching the YUV information to obtain the coordinates of the pixel points corresponding to each target object in the bird's-eye-view coordinate system.
Step S405: performing back-projection transformation on a plurality of pixel points at a time with the NEON instruction set, based on the camera extrinsic matrix, to obtain the coordinates of the pixel points corresponding to each target object in the vehicle coordinate system.
Step S406: obtaining the coordinates of the obstacles and the drivable area in the vehicle coordinate system.
For the details of this embodiment, reference may be made to the description of the embodiment shown in fig. 2; for brevity, they are not repeated here.
Corresponding to the embodiment, the application also provides electronic equipment.
Referring to fig. 5, a schematic structural diagram of an electronic device according to an embodiment of the present application is provided. As shown in fig. 5, the electronic device 500 may include a processor 501, a memory 502, and a communication unit 503. These components may communicate via one or more buses. A person skilled in the art will appreciate that the structure shown in the figure does not limit the embodiments of the application: the device may have a bus structure or a star structure, include more or fewer components than shown, combine certain components, or arrange the components differently.
The communication unit 503 is configured to establish a communication channel so that the electronic device can communicate with other devices, receiving user data sent by other devices or sending user data to them.
The processor 501, as the control center of the electronic device, connects the various parts of the whole device through various interfaces and lines, and performs the functions of the electronic device and/or processes data by running or executing the software programs, instructions, and/or modules stored in the memory 502 and invoking the data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or several packaged ICs with the same or different functions connected together. For example, the processor 501 may include only a central processing unit (CPU). In the embodiments of the present application, the CPU may have a single computation core or multiple computation cores.
The memory 502 stores the instructions executed by the processor 501 and may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
When the instructions in the memory 502 are executed by the processor 501, the electronic device 500 can perform some or all of the steps of the embodiment shown in fig. 2.
In a specific implementation, an embodiment of the present application further provides a computer storage medium that can store a program; when executed, the program may perform some or all of the steps of each embodiment of the target object extraction method provided by the embodiments of the present application. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions mean any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, and c may be: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, if any of the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may essentially be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disk.
The same or similar parts between the various embodiments in this specification are referred to each other. In particular, for the device embodiment and the terminal embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and reference should be made to the description in the method embodiment for relevant points.

Claims (10)

1. A method for extracting a target object, comprising:
Inputting an image to be detected into a deep learning model, and obtaining tensor information corresponding to each pixel point in N pixel points of the image to be detected, wherein the tensor information comprises position information and K channel scores which are respectively used for representing the similarity between the pixel points and K targets, and N is more than or equal to 2,K and more than or equal to 2;
analyzing tensor information corresponding to a plurality of pixel points each time, and respectively determining the pixel points corresponding to each target object, wherein the target channel score of the pixel points corresponding to the target objects is the highest channel score, and the target channel score is the channel score corresponding to the target objects;
determining coordinates of the pixel points corresponding to each target object in a bird's eye view coordinate system according to the position information of the pixel points corresponding to each target object; and
performing back projection transformation on a plurality of pixel points each time, converting the coordinates of the pixel points corresponding to each target object in the bird's eye view coordinate system into coordinates of each target object in a vehicle coordinate system.
2. The method according to claim 1, wherein analyzing tensor information corresponding to a plurality of the pixels at a time, respectively determining pixels corresponding to each of the targets, includes:
analyzing tensor information corresponding to a plurality of the pixel points each time, and marking the pixel points corresponding to each target object with a different color.
3. The method according to claim 2, wherein analyzing tensor information corresponding to a plurality of the pixel points each time and marking the pixel points corresponding to each target object with a different color comprises:
analyzing tensor information corresponding to a plurality of pixel points each time, and judging whether the highest channel score corresponding to each pixel point is the target channel score corresponding to the i-th target object; and
marking all pixel points whose highest channel score is the target channel score corresponding to the i-th target object with the color corresponding to the i-th target object, wherein i is greater than or equal to 1 and less than or equal to K.
4. The method according to claim 3, wherein marking all pixel points whose highest channel score is the target channel score corresponding to the i-th target object with the color corresponding to the i-th target object comprises:
if the highest channel score corresponding to any pixel point is the target channel score corresponding to the i-th target object, setting the target channel ID of that pixel point to 1;
if the highest channel score corresponding to any pixel point is not the target channel score corresponding to the i-th target object, setting the target channel ID of that pixel point to 0; and
performing an AND operation between the target channel ID of each pixel point and the color information corresponding to the i-th target object.
5. The method according to claim 1, wherein the image to be detected is a surround-view image acquired by a plurality of cameras.
6. The method according to claim 1, wherein:
the analyzing tensor information corresponding to a plurality of the pixel points each time comprises: analyzing tensor information corresponding to a plurality of pixel points each time by using the NEON instruction set; and
the performing back projection transformation on a plurality of pixel points each time comprises: performing back projection transformation on a plurality of pixel points each time by using the NEON instruction set.
7. The method according to claim 1, wherein performing back projection transformation on a plurality of pixel points each time and converting the coordinates of the pixel points corresponding to each target object in the bird's eye view coordinate system into coordinates of each target object in a vehicle coordinate system comprises:
performing back projection transformation on a plurality of pixel points each time based on the camera extrinsic parameter matrix, and converting the coordinates of the pixel points corresponding to each target object in the bird's eye view coordinate system into coordinates of each target object in the vehicle coordinate system.
8. The method according to claim 1, wherein the target objects comprise an obstacle and a travelable region.
9. An electronic device, comprising:
a processor;
a memory; and
a computer program, wherein the computer program is stored in the memory, the computer program comprising instructions that, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 8.
10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program, when run, controls a device in which the computer readable storage medium is located to perform the method according to any one of claims 1 to 8.
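
To make the per-pixel assignment step of claim 1 concrete, the sketch below attributes each pixel point to the target object whose channel score is highest. This is a minimal illustration only, not the patented implementation: the record layout, the type and function names, and the use of std::vector are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-pixel tensor record: position plus K channel scores
// (layout assumed for illustration; the claims do not fix one).
struct PixelTensor {
    float u, v;                 // pixel position in the image
    std::vector<float> scores;  // K channel scores; scores[k] ~ similarity to target k
};

// Assigns each pixel point to the target object with the highest channel score,
// mirroring the "highest channel score is the target channel score" rule of claim 1.
std::vector<int> assignPixelsToTargets(const std::vector<PixelTensor>& tensors) {
    std::vector<int> targetIds(tensors.size());
    for (std::size_t n = 0; n < tensors.size(); ++n) {
        const auto& s = tensors[n].scores;
        int best = 0;
        for (std::size_t k = 1; k < s.size(); ++k) {
            if (s[k] > s[best]) best = static_cast<int>(k);
        }
        targetIds[n] = best;  // pixel n belongs to target object `best`
    }
    return targetIds;
}
```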
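Claims 3 and 4 mark pixel points by setting a 0/1 target channel ID and performing an AND operation with the target object's color information. One common way to realize that AND step, assumed here for illustration, is to widen the 0/1 ID into an all-ones or all-zeros word so a bitwise AND either keeps or clears an RGBA color:

```cpp
#include <cstdint>
#include <vector>

// Marks every pixel point whose highest channel score belongs to target i with
// that target's color, per the mask-and-AND scheme of claims 3 and 4. Widening
// the 0/1 channel ID to an all-ones / all-zeros mask is an assumed realization;
// the claim itself only specifies "an AND operation".
void markTargetColor(const std::vector<int>& targetIds,   // from the argmax step
                     int i,                               // target object index
                     std::uint32_t colorRGBA,             // color assigned to target i
                     std::vector<std::uint32_t>& canvas)  // one slot per pixel point
{
    for (std::size_t n = 0; n < targetIds.size(); ++n) {
        std::uint32_t id = (targetIds[n] == i) ? 1u : 0u;  // claim 4: ID = 1 or 0
        std::uint32_t mask = ~(id - 1u);                   // 1 -> 0xFFFFFFFF, 0 -> 0x0
        canvas[n] |= mask & colorRGBA;                     // AND, then accumulate
    }
}
```

Because each pixel point is attributed to exactly one target object, running this for i = 0..K-1 colors the whole canvas without collisions.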
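Claim 6 applies the NEON instruction set so that several pixel points are analyzed per step. A hedged sketch of the highest-channel-score search over four pixels per iteration follows; the channel-major score layout and the function name are assumptions, not taken from the patent:

```cpp
#include <arm_neon.h>
#include <cstdint>

// Processes four pixel points per iteration with NEON, in the spirit of claim 6.
// Scores are assumed channel-major: scores[k] points to N floats for channel k.
void argmaxNeon(const float* const* scores, int K, int N, std::int32_t* bestId) {
    for (int n = 0; n + 4 <= N; n += 4) {
        float32x4_t bestScore = vld1q_f32(scores[0] + n);  // channel 0 as baseline
        int32x4_t bestIdx = vdupq_n_s32(0);
        for (int k = 1; k < K; ++k) {
            float32x4_t s = vld1q_f32(scores[k] + n);
            uint32x4_t gt = vcgtq_f32(s, bestScore);          // lanes where channel k wins
            bestScore = vbslq_f32(gt, s, bestScore);          // keep the larger score
            bestIdx = vbslq_s32(gt, vdupq_n_s32(k), bestIdx); // and its channel index
        }
        vst1q_s32(bestId + n, bestIdx);  // four assignments written at once
    }
    // A scalar tail loop would handle the N % 4 leftover pixels (omitted for brevity).
}
```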
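Claim 7 converts coordinates from the bird's eye view coordinate system to the vehicle coordinate system based on the camera extrinsic parameter matrix. One possible realization, assumed here, folds the ground-plane extrinsics into a single 3x3 homography applied per point:

```cpp
#include <array>

struct Point2f { float x, y; };

// Maps a point from bird's-eye-view coordinates to vehicle coordinates through
// a 3x3 homography H derived offline from the camera extrinsic parameters.
// Treating the ground as a plane and folding the extrinsics into one homography
// is an assumed realization of the back projection transformation of claim 7.
Point2f bevToVehicle(const std::array<std::array<float, 3>, 3>& H, Point2f bev) {
    // Homogeneous transform: [x' y' w']^T = H * [x y 1]^T
    float x = H[0][0] * bev.x + H[0][1] * bev.y + H[0][2];
    float y = H[1][0] * bev.x + H[1][1] * bev.y + H[1][2];
    float w = H[2][0] * bev.x + H[2][1] * bev.y + H[2][2];
    return { x / w, y / w };  // dehomogenize to vehicle-frame coordinates
}
```

A real system would compute H once from calibration and would batch this transform over many pixel points (for example with the NEON path of claim 6) rather than call it per point.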
CN202410316158.3A 2024-03-19 2024-03-19 Target object extraction method, electronic device and storage medium Pending CN118072288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410316158.3A CN118072288A (en) 2024-03-19 2024-03-19 Target object extraction method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410316158.3A CN118072288A (en) 2024-03-19 2024-03-19 Target object extraction method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN118072288A true CN118072288A (en) 2024-05-24

Family

ID=91109408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410316158.3A Pending CN118072288A (en) 2024-03-19 2024-03-19 Target object extraction method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN118072288A (en)

Similar Documents

Publication Publication Date Title
US10429193B2 (en) Method and apparatus for generating high precision map
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN110491132B (en) Vehicle illegal parking detection method and device based on video frame picture analysis
CN111376895B (en) Around-looking parking sensing method and device, automatic parking system and vehicle
CN111539484B (en) Method and device for training neural network
CN112967283A (en) Target identification method, system, equipment and storage medium based on binocular camera
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN108197567B (en) Method, apparatus and computer readable medium for image processing
CN111582272A (en) Double-row license plate recognition method, device and equipment and computer readable storage medium
CN113011255A (en) Road surface detection method and system based on RGB image and intelligent terminal
CN115372990A (en) High-precision semantic map building method and device and unmanned vehicle
CN113723170A (en) Integrated hazard detection architecture system and method
CN111079634B (en) Method, device and system for detecting obstacle in running process of vehicle and vehicle
CN112131914B (en) Lane line attribute detection method and device, electronic equipment and intelligent equipment
CN117495847B (en) Intersection detection method, readable storage medium and intelligent device
CN113297939B (en) Obstacle detection method, obstacle detection system, terminal device and storage medium
CN112381876B (en) Traffic sign marking method and device and computer equipment
CN111191482A (en) Brake lamp identification method and device and electronic equipment
CN111160206A (en) Traffic environment element visual perception method and device
CN112912892A (en) Automatic driving method and device and distance determining method and device
CN118072288A (en) Target object extraction method, electronic device and storage medium
CN116343085A (en) Method, system, storage medium and terminal for detecting obstacle on highway
US20220318456A1 (en) Simulation method based on three-dimensional contour, storage medium, computer equipment
CN116109711A (en) Driving assistance method and device and electronic equipment
CN113449647A (en) Method, system, device and computer-readable storage medium for fitting curved lane line

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination