CN113537163B - Model training method and system for parking space detection

Publication number: CN113537163B (application CN202111077297.8A; other version CN113537163A)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: parking space, information, parking, code, aerial view
Legal status: Active (assumed status; not a legal conclusion)
Inventors: 陈宇, 李发成, 张如高, 虞正华
Assignee (current and original): Suzhou Moshi Intelligent Technology Co., Ltd.
Application filed by Suzhou Moshi Intelligent Technology Co., Ltd.; priority to CN202111077297.8A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning


Abstract

The invention discloses a model training method and system for parking space detection. The method comprises: acquiring a bird's-eye view sample of a parking space, wherein each parking space in the sample carries a coding label representing the closed-loop direction information of the parking space, the closed-loop direction information being composed of the direction information of all pixel points on the parking space lines; inputting the bird's-eye view sample into a preset model, extracting feature information of the sample through the preset model, and identifying the coding information of the parking space in the sample on the basis of that feature information; and comparing the coding label of the parking space with the identified coding information, and correcting the training parameters of the preset model according to the comparison result, so that the coding information obtained by processing the bird's-eye view sample with the corrected preset model matches the coding label. The technical scheme of the invention provides wider detection adaptability and higher parking space identification accuracy.

Description

Model training method and system for parking space detection
Technical Field
The invention relates to the technical field of image processing, in particular to a model training method and system for parking space detection.
Background
With the continuous development of artificial intelligence technology, automatic driving and driver assistance technologies have also advanced steadily. At present, the automatic parking function has been applied in some vehicle models. When implementing automatic parking, it is usually necessary to sense the environment around the vehicle body in order to detect a parking space in which parking is possible.
Existing parking space detection methods typically identify the corner points of a parking space and then classify them to recover the parking space information. Alternatively, marking points can be detected in a bird's-eye view containing the parking spaces, and a machine learning method can then infer candidate parking spaces from the detected marking points.
However, these detection methods usually depend on reliably identifying corner points or marking points, and many of them are designed for a few common parking space shapes, imposing a strong prior on the geometry of the space. If a corner point or marking point is obscured by the environment and cannot be identified, the global closed-loop information of the parking space is not exploited, and the detected parking space is inaccurate.
Disclosure of Invention
In view of this, embodiments of the present invention provide a model training method and system for parking space detection that offer wider detection adaptability and higher parking space identification accuracy.
The invention provides a model training method for parking space detection, which comprises the following steps: acquiring a bird's-eye view sample of a parking space, wherein the parking space in the bird's-eye view sample is provided with a coding label, and the coding label is used for representing the closed-loop direction information of the parking space, the closed-loop direction information being composed of the direction information of all pixel points on the parking space lines; for any parking space line in the bird's-eye view sample, identifying the direction information of that parking space line and taking the code value matched with the direction information as the code label of the parking space line; combining the code labels of the parking space lines of a parking space to serve as the code label of that parking space; inputting the bird's-eye view sample into a preset model, extracting feature information of the sample through the preset model, and identifying the coding information of the parking space in the sample on the basis of the feature information; and comparing the code label of the parking space with the identified coding information, and correcting the training parameters in the preset model according to the comparison result, so that the coding information obtained by processing the bird's-eye view sample with the corrected preset model matches the code label.
In one embodiment, if the parking spaces in the bird's eye view sample are irregular shapes, the method further comprises: identifying direction information of each pixel point on a parking space line of the parking space, and using a coded value corresponding to the direction information of each pixel point as a coded label of the parking space line; and the direction information of the pixel points represents the tangential direction.
In one embodiment, extracting the feature information of the bird's eye view sample comprises: generating a plurality of layered features of the aerial view sample, fusing at least two layered features of the plurality of layered features to generate corresponding fused features, and using the generated fused features as feature information of the aerial view sample.
In one embodiment, the code information of the parking space includes code values of each parking space line of the parking space, where the code value of each parking space line includes a first code value and a second code value, and the first code value and the second code value are output through different channels of the preset model respectively; and if the target parking space line of the parking space is overlapped with the parking space lines of other parking spaces, the first coding value and the second coding value of the target parking space line are output through two independent and different channels respectively.
In one embodiment, after identifying the coded information of the parking space in the bird's-eye view sample, the method further comprises: identifying the parking space type of the parking space, wherein the parking space type is either a parkable space or a non-parkable space, and outputting the coded information of the parking space through the channel group matched with that parking space type.
In one embodiment, after correcting the training parameters in the preset model, the method further comprises: inputting the aerial view to be detected into the corrected preset model to generate the coding information of the parking space to be detected in the aerial view to be detected; and determining each parking space line of the parking space to be detected according to the coded information of the parking space to be detected, so as to represent the parking space to be detected through the determined parking space lines.
In one embodiment, the pixel points on each parking space line of the parking spaces to be detected are determined according to the following method: determining an initial pixel point located on a parking space line in the aerial view to be detected, and identifying the direction represented by the coding information of the initial pixel point; determining a searching area in the direction, determining a representative pixel point in the searching area, and taking the representative pixel point as the next pixel point adjacent to the initial pixel point on the parking space line; and taking the representative pixel point as an initial pixel point, and continuously searching the next pixel point.
In one embodiment, the method further comprises: and if a plurality of target parking spaces with overlapping areas are detected in the aerial view to be detected, determining the confidence of each target parking space, and taking the target parking space with the maximum confidence as the detected real parking space.
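The confidence-based resolution of overlapping detections described in this embodiment can be sketched as follows; the (space, confidence) pair shape and the function name are hypothetical conveniences, not from the patent:

```python
def pick_real_space(overlapping):
    """Among target parking spaces whose areas overlap, keep the one with
    the highest confidence as the real detection.

    overlapping: list of (space, confidence) pairs for one overlap group.
    """
    return max(overlapping, key=lambda sc: sc[1])[0]
```

In practice the detections would first be grouped by mutual overlap, and this rule applied once per group.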
In another aspect of the present invention, a model training system for parking space detection is further provided. The system includes: a sample acquisition unit, configured to acquire a bird's-eye view sample of a parking space, wherein the parking space in the sample is provided with a coding label representing the closed-loop direction information of the parking space; to identify, for any parking space line in the bird's-eye view sample, the direction information of that line and take the code value matched with the direction information as the code label of the parking space line; and to combine the code labels of the parking space lines of a parking space as the code label of that parking space; a coded information identification unit, configured to input the bird's-eye view sample into a preset model, extract the feature information of the sample through the preset model, and identify the coded information of the parking space in the sample on the basis of the feature information; and a model correction unit, configured to compare the code label of the parking space with the coded information of the parking space and correct the training parameters in the preset model according to the comparison result, so that the coded information obtained by processing the bird's-eye view sample with the corrected preset model matches the code label.
According to the technical scheme, the parking spaces contained in a bird's-eye view can be detected by machine learning. When the machine learning model is trained, the parking spaces in the training samples carry coding labels, and the coding labels represent the closed-loop direction information of the parking spaces. In practical application, the closed-loop direction information may be composed of the direction information of each pixel point on the parking space lines. In this way, model training and parking space detection do not rely entirely on the corner point information of the parking space: as long as the parking space to be detected is a closed-loop structure, it can be detected by the technical scheme of the present application, which improves both the adaptability and the accuracy of parking space detection.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a schematic diagram illustrating steps of a model training method for parking space detection according to an embodiment of the present invention;
FIG. 2 shows a schematic view of a bird's eye view in one embodiment of the invention;
FIG. 3 is a schematic diagram illustrating directional information of a parking space in one embodiment of the present invention;
FIG. 4 shows a schematic diagram of a unit circle in one embodiment of the present invention;
FIG. 5 is a schematic diagram of pixel point directions in an irregular parking space according to an embodiment of the present invention;
FIG. 6 is a schematic view of a parking space on a shared line in accordance with an embodiment of the present invention;
FIG. 7 is a diagram illustrating pixel searching according to an embodiment of the invention;
FIG. 8 shows a functional block diagram of a model training system in one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, the method for training a parking space detection model provided by the present application may include the following steps.
S1: acquiring a bird's-eye view sample of a parking space, wherein the parking space in the bird's-eye view sample is provided with a coding label, and the coding label is used for representing closed-loop direction information of the parking space; the closed loop direction information is composed of direction information of all pixel points on the parking space line.
In this embodiment, the bird's-eye view sample of a parking space may be acquired by cameras mounted on the vehicle. Specifically, fisheye cameras can be installed on the vehicle facing different directions, and the environment images acquired by the individual fisheye cameras can be stitched into a bird's-eye view whose center is the vehicle. Fig. 2 shows a schematic bird's-eye view sample, in which it can be seen that the vehicle is located at the center of the sample and two parking spaces are included.
In order to effectively perform the machine learning process, training labels can be labeled for the parking spaces in the bird's-eye view sample. Generally speaking, parking spaces are closed loop structures formed by a plurality of parking space lines. The common parking spaces are in rectangular structures formed by four parking space lines, and some novel parking spaces can also be in irregular structures formed by straight lines and curves. In this embodiment, the direction information of the parking space line may be determined first, and the parking space line may be encoded based on the direction information.
Specifically, referring to fig. 3 and taking a rectangular parking space as an example, the parking space lines of all parking spaces in the bird's-eye view can follow a clockwise direction. In some scenarios, the direction of each parking space line may instead be determined from the entry line of the parking space. For example, in fig. 3, numbers may be assigned to the four vertices according to the entry line: the two vertices on the entry line are labeled 1 and 2, and the remaining two vertices are then labeled 3 and 4, continuing in the direction from 1 to 2. The direction of each parking space line then follows the vertex numbers from small to large; for example, the direction of the entry line runs from 1 to 2, and so on.
In addition, in practical applications, the direction of a parking space line may be determined according to other rules; the present application is not limited in this respect, as long as the direction of the parking space line can be determined.
After the direction information of a parking space line is determined, the direction information may be quantized in an encoded form for subsequent numerical processing. Referring to fig. 4, any direction in the plane of the image can be represented by a point on the unit circle shown in fig. 4, and the coordinate value of that point can be used as the encoded value corresponding to the direction. For example, when the vertical downward direction in fig. 3 is mapped into the unit circle of fig. 4, it corresponds to the point with coordinates (0, -1), so (0, -1) can be used as the encoded value of that direction. Likewise, the horizontal right direction maps to the point with coordinates (1, 0), so (1, 0) can be used as the encoded value of that direction.
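A minimal sketch of this unit-circle encoding; the helper name and the vector-input convention are illustrative assumptions, since the patent only specifies that a direction is represented by the point where it meets the unit circle:

```python
import math

def direction_code(dx: float, dy: float) -> tuple:
    """Map a 2-D direction vector (x right, y up) to its unit-circle
    code value: the point where the direction crosses the unit circle."""
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("zero-length direction has no code value")
    return (dx / norm, dy / norm)
```

For example, the vertical downward direction encodes to (0, -1) and the horizontal right direction to (1, 0), matching the patent's examples.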
As can be seen from the above description, for any parking space line in the bird's-eye view sample, its direction information can be identified and mapped into the unit circle shown in fig. 4, so that the code value matched with the direction information serves as the code label of the parking space line. For a parking space, the combination of the code labels of its parking space lines can be used as the code label of the parking space.
In practical application, the code label of a parking space can be expressed in units of pixel points. Specifically, a parking space line is composed of individual pixel points, and the direction information of all pixel points on the same straight parking space line is consistent. For example, in fig. 3, the direction information of each pixel point on the entry line can be represented by the encoded value (0, -1).
Of course, if the parking space is not rectangular but irregular, the direction information of each pixel point on the parking space line of the parking space can be identified, and the code value corresponding to the direction information of each pixel point is used as the code label of the parking space line. Specifically, the direction of the pixel point may be a corresponding tangential direction. For example, in the parking space shown in fig. 5, the direction of each pixel point on the straight line may be consistent with the direction of the straight line, but the direction of each pixel point on the curve is the corresponding tangential direction, and of course, the tangential direction also needs to keep the same trend as the direction of the curve.
By encoding directions pixel point by pixel point in this manner, the encoded values of all pixel points can finally be gathered to obtain the code label of the whole parking space, and this code label represents the direction information of the parking space.
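The pixel-wise labeling above can be sketched as follows. The rasterisation scheme, array layout, and the x-right/y-up code frame (so image rows grow downward) are illustrative assumptions; a real annotation tool would follow the labeled bird's-eye-view geometry:

```python
import numpy as np

def stamp_edge(label, p_from, p_to):
    """Stamp the direction code of one parking-space line onto every pixel
    it covers.  Pixels are (row, col); the code uses x right / y up, so an
    edge running down the image gets code (0, -1)."""
    (r0, c0), (r1, c1) = p_from, p_to
    dx, dy = c1 - c0, -(r1 - r0)            # image rows grow downward
    norm = (dx * dx + dy * dy) ** 0.5
    code = (dx / norm, dy / norm)
    n = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(n + 1):                   # walk the edge pixel by pixel
        r = round(r0 + (r1 - r0) * i / n)
        c = round(c0 + (c1 - c0) * i / n)
        label[r, c] = code
    return label

# label a toy 10x10 sample with one closed loop (corners 1->2->3->4,
# entry line 1->2 first)
lbl = np.zeros((10, 10, 2))
corners = [(1, 1), (8, 1), (8, 8), (1, 8)]
for a, b in zip(corners, corners[1:] + corners[:1]):
    stamp_edge(lbl, a, b)
```

Each pixel of the entry line receives the code (0, -1), in line with the example in the description.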
S3: inputting the aerial view sample into a preset model, extracting characteristic information of the aerial view sample through the preset model, and identifying the coding information of the parking space in the aerial view sample on the basis of the characteristic information.
In this embodiment, after the code labels are marked on the parking spaces in the bird's-eye view sample, the sample can be used to train the preset model. Specifically, the backbone network of the preset model may adopt ResNet, MobileNet, ShuffleNet, Xception, or a customized network structure. In practical applications, a plurality of hierarchical stages may be connected in sequence in the backbone network, so that the bird's-eye view sample is downsampled stage by stage, each downsampling producing a corresponding hierarchical feature. For example, if the backbone network contains five stages, those stages downsample the input bird's-eye view sample layer by layer, yielding five hierarchical features of different dimensions.
After the plurality of hierarchical features are generated, in order to enlarge the receptive field of the network, at least two of the hierarchical features may be fused to generate corresponding fused features, and the fused features may be used as feature information of the extracted bird's-eye view sample.
In a specific application example, assuming that 5 hierarchical features are generated in total, the last four (denoted F2, F3, F4, F5) may be selected. F2, the shallowest of the four, needs no additional processing. The features F3 to F5 may each be further processed by an ASPP (Atrous Spatial Pyramid Pooling) module; after this extraction, F3 to F5 yield the corresponding deepened features F31, F41 and F51. The final fused feature is generated by concatenating (concat) the hierarchical feature F2 with the deepened features F31, F41 and F51, and fusing the concatenated features with a 1 × 1 convolutional layer.
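A shape-level sketch of this fusion step, using NumPy as a stand-in for a real deep learning framework. The channel counts, feature resolutions, and the nearest-neighbour upsampling used to align resolutions before concatenation are assumptions the patent leaves open:

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample(x, factor):
    """Nearest-neighbour upsampling: (C, H, W) -> (C, H*factor, W*factor)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(x, weight):
    """A 1x1 convolution mixes channels independently at every pixel."""
    return np.einsum('oc,chw->ohw', weight, x)

# toy pyramid: F2 at 32x32, deepened F31/F41/F51 at 16/8/4 resolution
F2  = rng.standard_normal((64, 32, 32))
F31 = rng.standard_normal((64, 16, 16))
F41 = rng.standard_normal((64, 8, 8))
F51 = rng.standard_normal((64, 4, 4))

# bring every feature to F2's resolution, concatenate along channels,
# then fuse with a 1x1 convolution
merged = np.concatenate(
    [F2, upsample(F31, 2), upsample(F41, 4), upsample(F51, 8)], axis=0)
W = rng.standard_normal((128, merged.shape[0]))
fused = conv1x1(merged, W)               # feature information of the sample
```

The point of the sketch is the data flow (align, concat, 1 × 1 fuse), not the particular layer sizes.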
Of course, in practical applications, in order to enlarge the field of view of the network, the layered features may be fused in other ways, which is not limited here.
In the present embodiment, after the characteristic information of the bird's-eye view sample is generated, the characteristic information may be input to the final convolution layer, and the code information of the parking space in the bird's-eye view sample may be output. The encoding information has the same format as the encoding label marked in step S1, and is also an encoding value expressed in units of pixel points.
It should be noted that, since the encoded value of one pixel point includes two components X and Y, the two components may be output through different channels during machine learning in order to distinguish them: the encoded values corresponding to the X axis may be output through one channel, and those corresponding to the Y axis through another, so that the X-axis and Y-axis encoded values are finally output through two different channels. Since the encoded values are determined on a unit circle, X and Y should satisfy the relationship

X^2 + Y^2 = 1

where X represents the encoded value on the X axis and Y represents the encoded value on the Y axis.
That is to say, the code information of the parking space output by the preset model may include code values of each parking space line of the parking space, where the code value of each parking space line may include a first code value (corresponding to the X axis) and a second code value (corresponding to the Y axis), and the first code value and the second code value may be output through different channels of the preset model respectively.
In one embodiment, the same parking space line may be shared between parking spaces in some scenarios. For example, in the scenario shown in fig. 6, two parking spaces share one parking space line. Viewed from the left parking space, the direction of that line is vertically downward; viewed from the right parking space, it is vertically upward. That is, the same line has completely different directions when it is detected as part of different parking spaces. To represent these differing directions accurately, the encoded values of a shared parking space line must be output through separate channels. Taking the scenario of fig. 6 as an example, the parking space lines at the entrances are not shared, but the longer side lines may be shared with adjacent spaces. Therefore, for the four parking space lines of one space (labeled by vertex numbers as sides 1-2, 2-3, 3-4 and 4-1), sides 1-2 and 3-4 can together use two channels to output their X-direction and Y-direction encoded values, while side 2-3 needs two independent channels for its X-direction and Y-direction encoded values, and side 4-1 likewise requires two further independent channels. Thus, for one parking space, the final channel output can be expressed as

( X_{12,34}, Y_{12,34}, X_{23}, Y_{23}, X_{41}, Y_{41} )

where X_{12,34} and Y_{12,34} output the X-direction and Y-direction encoded values of sides 1-2 and 3-4, X_{23} and Y_{23} output the X-direction and Y-direction encoded values of side 2-3, and X_{41} and Y_{41} output those of side 4-1.
That is, if the target parking space line of a certain parking space overlaps with the parking space lines of other parking spaces, the first coded value and the second coded value of the target parking space line need to be output through two independent channels respectively.
In another specific application scenario, the parking spaces in the bird's-eye view sample may be marked as either parkable spaces or non-parkable spaces. To distinguish the two parking space types, the corresponding encoded information may be output through different channel groups: parkable spaces through one six-channel group of the form ( X_{12,34}, Y_{12,34}, X_{23}, Y_{23}, X_{41}, Y_{41} ) described above, and non-parkable spaces through a second, parallel channel group of the same form.
That is to say, besides identifying the coded information of each pixel point, the preset model can simultaneously determine whether that pixel point lies on a parkable space or a non-parkable space; according to the identified parking space type, the preset model then outputs the coded information of the parking space through the channel group matched with that type.
S5: and comparing the code label of the parking space with the code information of the parking space, and correcting the training parameters in the preset model according to the comparison result, so that the code information obtained by processing the aerial view sample by the corrected preset model is matched with the code label.
In this embodiment, the training parameters in the preset model may not be accurate during initialization, so that the finally output code information of the parking space is inconsistent with the labeled code label. Under the condition, the code label of the parking space and the code information of the parking space output by the preset model need to be compared, and the training parameters in the preset model are corrected according to the comparison result.
In practical applications, a loss function may be employed to represent the difference between the coding label and the coding information. In one embodiment, two loss functions may be employed jointly to characterize this difference. The first may be a binary cross-entropy (BCE) loss, which may be expressed as

L_BCE = - Σ_n [ g_n · log(p_n) + (1 - g_n) · log(1 - p_n) ]

where p_n represents the encoded value of the n-th pixel point output by the preset model, and g_n represents the encoded value of the n-th pixel point in the coding label. This loss function computes the error of the encoded values output on each channel and accumulates the total error value.
Further, the other loss function may be an L2 loss, which may be expressed as

L_2 = Σ_n ( p_n - g_n )^2
finally, the results of the two loss functions are added, so that a final error value can be obtained. Through the mode of gradient feedback, the preset model can be corrected.
Through continuous training on a large number of bird's-eye view samples, the coding information output by the corrected preset model comes to match the corresponding coding labels, where "match" may mean that the two are identical or that the error between them is within an allowable range.
In one embodiment, after a preset model with high precision is obtained through training, the preset model can be used for identifying the code information of the parking space of the bird's-eye view to be detected. Specifically, the bird's-eye view to be detected can be input into the corrected preset model, so that the encoded information of the parking space to be detected in the bird's-eye view to be detected is generated. The encoded information may be the encoded value of each pixel point output according to different channels.
In this embodiment, according to the coded information of the parking space to be detected, each parking space line of the parking space to be detected can be determined, so that the parking space to be detected is characterized by the determined parking space lines.
Specifically, the preset model outputs the X-axis and Y-axis encoded values on different channels, and the X-axis and Y-axis encoded values belonging to the same pixel point can be identified from their numerical values: if the sum of squares of an X-axis encoded value and a Y-axis encoded value equals 1, the two values belong to the same pixel point. In this way, the encoded values output by all channels of the preset model can be matched, yielding the coding information of each pixel point on the parking space lines.
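One way this sum-of-squares matching could look in code; the function name, tolerance, and nested-list data layout are hypothetical:

```python
def pair_channels(x_maps, y_maps, r, c, tol=0.05):
    """Find which (X, Y) channel pair carries a valid unit-circle code at
    pixel (r, c): the pair whose squared sum is closest to 1 within tol.

    x_maps / y_maps: parallel lists of 2-D maps, one per channel pair.
    Returns the index of the matching pair, or None if no pair is valid.
    """
    best = None
    for i, (xm, ym) in enumerate(zip(x_maps, y_maps)):
        err = abs(xm[r][c] ** 2 + ym[r][c] ** 2 - 1.0)
        if err <= tol and (best is None or err < best[1]):
            best = (i, err)
    return None if best is None else best[0]
```

A tolerance is needed in practice because the network output only approximately satisfies X^2 + Y^2 = 1.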
The code information of a pixel point represents its direction; based on this, the pixel points on a parking space line can be searched one by one. Specifically, referring to fig. 7, an initial pixel point located on a parking space line is determined in the bird's-eye view to be detected, and the direction represented by the code information of that initial pixel point is identified. A search area (the shaded rectangle in fig. 7) is then determined in that direction, and a representative pixel point is determined within the search area. In practice, several pixel points may fall inside the search area. Candidate pixel points whose pixel values exceed a specified threshold are first screened out (in theory, after processing by the preset model, the pixel values of pixel points on the parking space line are all close to 1 and those off the line are all 0), and the coordinate values of the candidate pixel points are then averaged; the resulting average coordinate is taken as the coordinate of the representative pixel point. The representative pixel point serves as the next pixel point on the parking space line adjacent to the initial one. The representative pixel point is then treated as the new initial pixel point and the search is repeated, continuing until the search returns to the original initial pixel point. In this way, every pixel point on every parking space line can be found in the bird's-eye view to be detected, and the corresponding parking spaces to be detected can be represented by the determined parking space lines.
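The search procedure above can be sketched as follows, under simplifying assumptions: `mask` holds per-pixel values (close to 1 on a parking space line, 0 elsewhere), `direction` holds a per-pixel unit vector (dy, dx), and the step size and window are illustrative rather than taken from the patent:

```python
import numpy as np

def trace_line(mask, direction, start, step=2, window=1, thresh=0.5, max_iter=50):
    """Search a parking space line pixel by pixel, starting from `start`."""
    points = [tuple(start)]
    cur = np.array(start, dtype=float)
    for _ in range(max_iter):
        r, c = int(round(float(cur[0]))), int(round(float(cur[1])))
        dy, dx = direction[r, c]
        # Search area centred `step` pixels ahead along the coded direction.
        cy, cx = cur[0] + step * dy, cur[1] + step * dx
        cand = [(y, x)
                for y in range(int(cy) - window, int(cy) + window + 1)
                for x in range(int(cx) - window, int(cx) + window + 1)
                if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                and mask[y, x] > thresh]          # candidate pixel points
        if not cand:
            break
        rep = np.mean(cand, axis=0)               # representative pixel point
        nxt = tuple(np.round(rep).astype(int))
        if nxt == points[0]:
            break                                 # back at the initial pixel
        points.append(nxt)
        cur = rep
    return points

# Synthetic example: a horizontal line of pixels, all coded with direction (0, 1).
mask = np.zeros((5, 10))
mask[2, 1:9] = 1.0
direction = np.zeros((5, 10, 2))
direction[2, 1:9] = (0.0, 1.0)

path = trace_line(mask, direction, start=(2, 1))
```

On this open synthetic line the trace simply walks to the end of the line and stops; on a real closed-loop parking space, the loop terminates when the search returns to the initial pixel point.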
It should be noted that the bird's-eye view to be detected may contain multiple parking spaces; in that case, connected-domain analysis can be performed on sides 2-3 of each parking space. If there are N mutually disconnected regions, there are N different parking spaces. The reason connectivity is judged on sides 2-3 is that these sides may serve as a side shared by two adjacent parking spaces. By judging connected domains through the shared sides, it is possible to identify accurately how many different parking spaces the current bird's-eye view contains.
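The counting of disconnected regions can be sketched as a plain flood fill over a binary mask of parking-space-line pixels (names and data are illustrative; a library routine such as `scipy.ndimage.label` would do the same job):

```python
def count_spaces(mask):
    """Count mutually disconnected regions of nonzero cells; per the text,
    N disconnected regions correspond to N different parking spaces."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                count += 1
                stack = [(r, c)]  # flood-fill one connected domain
                while stack:
                    y, x = stack.pop()
                    if (y, x) in seen:
                        continue
                    if not (0 <= y < rows and 0 <= x < cols) or not mask[y][x]:
                        continue
                    seen.add((y, x))
                    stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return count

# Two disconnected line regions -> two different parking spaces.
mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
```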
The recognized spaces may include both parking spaces and non-parking spaces, and the code information for different space types is output through different channel groups. Therefore, the parking space line search described above can be performed on the code information output by each channel group in turn, so as to determine all the parking spaces in the bird's-eye view.
In one embodiment, the accuracy of image stitching and the actual environment may affect the parking space recognition result. For example, calibration errors and/or uneven ground may produce duplicated parking space lines at the stitching seams of the four fish-eye cameras, so that the final recognition result contains several target parking spaces with overlapping areas. For partially overlapping target parking spaces, the degree of overlap between them can be computed as the Intersection over Union (IoU). If the IoU exceeds a specified threshold, the confidence of each target parking space is determined. The confidence of a target parking space may be the sum of the encoded values of the pixel points on its parking space lines; here the encoded value of a pixel point may be the sum of the squares of its X-axis and Y-axis encoded values. Adding these sums of squares over all pixel points yields the confidence of the target parking space. Finally, the target parking space with the maximum confidence is taken as the detected real parking space, and the other parking spaces whose overlap with the real parking space exceeds the specified threshold are removed.
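This de-duplication step can be sketched as a standard confidence-based suppression, with the simplifying assumption that each target parking space is an axis-aligned box `(x1, y1, x2, y2)` (real detections are quadrilaterals) and that its confidence is the sum of squared encoded values described above; the names and threshold are illustrative:

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def keep_real_spaces(spaces, thresh=0.5):
    """spaces: list of (box, confidence). Keep the highest-confidence space
    and drop any other space whose overlap with a kept space exceeds thresh."""
    kept = []
    for box, conf in sorted(spaces, key=lambda s: -s[1]):
        if all(iou(box, kept_box) <= thresh for kept_box, _ in kept):
            kept.append((box, conf))
    return kept

detections = [
    ((0, 0, 10, 10), 5.0),    # duplicated detections at a stitching seam
    ((1, 1, 11, 11), 4.0),
    ((20, 20, 30, 30), 3.0),  # a separate parking space
]
real = keep_real_spaces(detections)
```

Here the two seam duplicates overlap with IoU ≈ 0.68, so only the higher-confidence one survives, while the distant parking space is kept.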
According to the above technical solution, the parking spaces contained in a bird's-eye view can be detected by machine learning. When the machine learning model is trained, the parking spaces in the training samples carry code labels, and the code labels represent the closed-loop direction information of the parking spaces. In practical application, the closed-loop direction information may be composed of the direction information of each pixel point on the parking space lines. In this way, neither model training nor parking space detection relies entirely on the corner-point information of the parking space: as long as the parking space to be detected has a closed-loop structure, it can be detected by the technical solution of the present application, which improves both the applicability and the accuracy of parking space detection.
Referring to fig. 8, the present application further provides a model training system for parking space detection, the system includes:
the sample acquisition unit is used for acquiring a bird's-eye view sample of a parking space, wherein the parking space in the bird's-eye view sample is provided with a code label, and the code label is used for representing closed-loop direction information of the parking space; the closed-loop direction information is composed of the direction information of all pixel points on the parking space lines; for any parking space line of the parking space in the bird's-eye view sample, the direction information of that parking space line is identified, and an encoded value matching the direction information is taken as the code label of the parking space line; the code labels of the parking space lines of the parking space are combined to serve as the code label of the parking space;
the coded information identification unit is used for inputting the aerial view sample into a preset model, extracting the characteristic information of the aerial view sample through the preset model and identifying the coded information of the parking space in the aerial view sample on the basis of the characteristic information;
and the model correction unit is used for comparing the code label of the parking space with the code information of the parking space and correcting the training parameters in the preset model according to the comparison result so that the code information obtained by processing the aerial view sample by the corrected preset model is matched with the code label.
An embodiment of the present application further provides a model training device for parking space detection, where the model training device for parking space detection includes a processor and a memory, where the memory is used to store a computer program, and when the computer program is executed by the processor, the aforementioned model training method for parking space detection is implemented.
The processor may be a Central Processing Unit (CPU). The Processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or a combination thereof.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. The processor runs the non-transitory software programs, instructions, and modules stored in the memory, thereby executing various functional applications and performing data processing, that is, implementing the method in the above method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a computer program, and when the computer program is executed by a processor, the method for training a model for parking space detection is implemented.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (8)

1. A model training method for parking space detection is characterized by comprising the following steps:
acquiring a bird's-eye view sample of a parking space, wherein the parking space in the bird's-eye view sample is provided with a code label, and the code label is used for representing closed-loop direction information of the parking space; the closed-loop direction information is composed of the direction information of all pixel points on the parking space lines; for any parking space line of the parking space in the bird's-eye view sample, identifying the direction information of that parking space line, and taking an encoded value matched with the direction information as the code label of the parking space line; and combining the code labels of the parking space lines of the parking space to serve as the code label of the parking space;
inputting the aerial view sample into a preset model, extracting characteristic information of the aerial view sample through the preset model, and identifying the coding information of the parking space in the aerial view sample on the basis of the characteristic information;
comparing the code label of the parking space with the code information of the parking space, and correcting the training parameters in the preset model according to the comparison result so that the code information obtained by processing the aerial view sample by the corrected preset model is matched with the code label;
the code information of the parking spaces comprises code values of all parking space lines of the parking spaces, wherein the code values of all parking space lines comprise a first code value and a second code value, and the first code value and the second code value are respectively output through different channels of the preset model; and if the target parking space line of the parking space is overlapped with the parking space lines of other parking spaces, the first coding value and the second coding value of the target parking space line are output through two independent and different channels respectively.
2. The method of claim 1, wherein if the parking spaces in the bird's eye view sample are irregularly shaped, the method further comprises:
identifying direction information of each pixel point on a parking space line of the parking space, and using a coded value corresponding to the direction information of each pixel point as a coded label of the parking space line; and the direction information of the pixel points represents the tangential direction.
3. The method of claim 1, wherein extracting feature information of the aerial view sample comprises:
generating a plurality of layered features of the aerial view sample, fusing at least two layered features of the plurality of layered features to generate corresponding fused features, and using the generated fused features as feature information of the aerial view sample.
4. The method of claim 1, wherein after identifying the coded information of the parking space in the bird's-eye view sample, the method further comprises:
and identifying the parking space type of the parking space, wherein the parking space type comprises a parking space or a non-parking space, and outputting the coded information of the parking space through a channel group matched with the parking space type.
5. The method of claim 1, wherein after correcting the training parameters in the pre-set model, the method further comprises:
inputting the aerial view to be detected into the corrected preset model to generate the coding information of the parking space to be detected in the aerial view to be detected;
and determining each parking space line of the parking space to be detected according to the coded information of the parking space to be detected, so as to represent the parking space to be detected through the determined parking space lines.
6. The method according to claim 5, characterized in that the pixel points on each parking space line of the parking space to be detected are determined in the following manner:
determining an initial pixel point located on a parking space line in the aerial view to be detected, and identifying the direction represented by the coding information of the initial pixel point;
determining a searching area in the direction, determining a representative pixel point in the searching area, and taking the representative pixel point as the next pixel point adjacent to the initial pixel point on the parking space line;
and taking the representative pixel point as an initial pixel point, and continuously searching the next pixel point.
7. The method of claim 5, further comprising:
and if a plurality of target parking spaces with overlapping areas are detected in the aerial view to be detected, determining the confidence of each target parking space, and taking the target parking space with the maximum confidence as the detected real parking space.
8. A model training system for parking space detection, the system comprising:
the sample acquisition unit is used for acquiring a bird's-eye view sample of a parking space, wherein the parking space in the bird's-eye view sample is provided with a code label, and the code label is used for representing closed-loop direction information of the parking space; for any parking space line of the parking space in the bird's-eye view sample, the direction information of that parking space line is identified, and an encoded value matched with the direction information is taken as the code label of the parking space line; and the code labels of the parking space lines of the parking space are combined to serve as the code label of the parking space;
the coded information identification unit is used for inputting the aerial view sample into a preset model, extracting the characteristic information of the aerial view sample through the preset model and identifying the coded information of the parking space in the aerial view sample on the basis of the characteristic information;
the model correction unit is used for comparing the code label of the parking space with the code information of the parking space and correcting the training parameters in the preset model according to the comparison result so that the code information obtained by processing the aerial view sample by the corrected preset model is matched with the code label;
the code information of the parking spaces comprises code values of all parking space lines of the parking spaces, wherein the code values of all parking space lines comprise a first code value and a second code value, and the first code value and the second code value are respectively output through different channels of the preset model; and if the target parking space line of the parking space is overlapped with the parking space lines of other parking spaces, the first coding value and the second coding value of the target parking space line are output through two independent and different channels respectively.
CN202111077297.8A 2021-09-15 2021-09-15 Model training method and system for parking space detection Active CN113537163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111077297.8A CN113537163B (en) 2021-09-15 2021-09-15 Model training method and system for parking space detection

Publications (2)

Publication Number Publication Date
CN113537163A CN113537163A (en) 2021-10-22
CN113537163B true CN113537163B (en) 2021-12-28

Family

ID=78092582


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant