CN113052048A - Traffic incident detection method and device, road side equipment and cloud control platform - Google Patents


Info

Publication number
CN113052048A
CN113052048A (application CN202110290826.6A)
Authority
CN
China
Prior art keywords
image
target
traffic
image area
scene
Prior art date
Legal status
Granted
Application number
CN202110290826.6A
Other languages
Chinese (zh)
Other versions
CN113052048B (en)
Inventor
董子超
董洪义
时一峰
Current Assignee
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110290826.6A
Publication of CN113052048A
Application granted
Publication of CN113052048B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a traffic incident detection method and device, roadside equipment, and a cloud control platform, relating to data processing and in particular to the fields of artificial intelligence, intelligent transportation, computer vision, deep learning, and cloud computing. The scheme is as follows: acquire a scene image of a road scene; determine a target image area contained in the scene image, the target image area being an image area whose traffic incident occurrence probability is greater than or equal to a first threshold; and perform traffic incident detection on the target image area to obtain a detection result indicating whether a traffic incident occurs in that area. Traffic incident detection in the road scene is thus realized from the scene image, and the accuracy of traffic incident detection is improved.

Description

Traffic incident detection method and device, road side equipment and cloud control platform
Technical Field
The application relates to the field of data processing, in particular to a traffic incident detection method and device, roadside equipment and a cloud control platform, which can be used in the fields of artificial intelligence, intelligent transportation, computer vision, deep learning and cloud computing.
Background
With the gradual improvement of urban road construction and advances in transportation technology, people travel ever longer distances, and road traffic conditions greatly affect their life and work.
Road traffic conditions are mainly affected by traffic events, such as traffic jams, vehicles ignoring signal lights, and other traffic offences. Currently, traffic events are detected by inputting images from surveillance videos into a deep neural network.
Disclosure of Invention
The application provides a traffic incident detection method and device, road side equipment and a cloud control platform.
According to a first aspect of the present application, there is provided a traffic event detection method, comprising:
acquiring a scene image of a road scene;
determining a target image area contained in the scene image, wherein the target image area is an image area with a traffic incident occurrence probability greater than or equal to a first threshold;
and detecting the traffic incident in the target image area to obtain a detection result of whether the traffic incident occurs in the target image area.
According to a second aspect of the present application, there is provided a model training method comprising:
acquiring sample data;
training a feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
According to a third aspect of the present application, there is provided a traffic event detection device comprising:
an acquisition unit configured to acquire a scene image of a road scene;
the determining unit is used for determining a target image area contained in the scene image, wherein the target image area is an image area of which the traffic incident occurrence probability is greater than or equal to a first threshold value;
and the detection unit is used for detecting the traffic incident of the target image area to obtain a detection result of whether the traffic incident occurs in the target image area.
According to a fourth aspect of the present application, there is provided a model training apparatus comprising:
an acquisition unit configured to acquire sample data;
the training unit is used for training the feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
According to a fifth aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein:
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a traffic event detection method as described in the first aspect above or a model training method as described in the second aspect above.
According to a sixth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the traffic event detection method according to the first aspect described above, or the model training method according to the second aspect described above.
According to a seventh aspect of the present application, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform a traffic event detection method as described in the first aspect above or a model training method as described in the second aspect above.
According to an eighth aspect of the present application, there is provided a roadside apparatus including the electronic apparatus according to the fifth aspect described above.
According to a ninth aspect of the present application, a cloud control platform is provided, which comprises the electronic device according to the fifth aspect.
The technology according to the application improves the accuracy of traffic incident detection.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a schematic illustration according to a third embodiment of the present application;
FIG. 5(a) is a diagram illustrating an example of the distribution of a target image area in a scene image when the second threshold is 1;
FIG. 5(b) is a diagram illustrating an example of the distribution of a target image area in a scene image when the second threshold is 2;
FIG. 6 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 7 is a schematic illustration according to a fifth embodiment of the present application;
FIG. 8 is a schematic illustration according to a sixth embodiment of the present application;
FIG. 9 is a schematic illustration according to a seventh embodiment of the present application;
FIG. 10 is a schematic illustration according to an eighth embodiment of the present application;
FIG. 11 is a block diagram of an electronic device for implementing a traffic event detection method and/or a model training method of embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted for clarity and conciseness.
Detecting traffic incidents helps improve the efficiency and intelligence of traffic management. Traffic events include, for example, traffic jams, vehicles or pedestrians ignoring traffic lights, and other traffic offences.
Generally, the whole monitoring image is input into a deep neural network, image features of the whole image are extracted by the network, and a traffic incident detection result is obtained from the extracted features. This approach can detect traffic events, but it ignores the fact that traffic events usually occur only in a small area of the road scene. Feeding the whole monitoring image into the deep neural network makes it difficult for the network to focus on image features associated with traffic incidents, increases the learning difficulty, and results in low detection accuracy.
To address these problems, the application provides a traffic incident detection method and device, roadside equipment, and a cloud control platform, applied to the fields of artificial intelligence, intelligent transportation, computer vision, deep learning, and cloud computing within data processing. In the method and device, an image area with a high traffic incident occurrence probability is determined in the scene image of the road scene, and traffic incidents are detected in that area, thereby improving both the accuracy and the efficiency of traffic incident detection.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application, where the application scenario is a road scenario. As shown in fig. 1, a road scene is taken as an example of a road intersection, the road scene includes one or more image capturing devices 101 and one or more roadside devices 102 (in the figure, two image capturing devices 101 and two roadside devices 102 are taken as an example), and the image capturing devices 101 and the roadside devices 102 are in communication connection in a wired or wireless manner.
In the application scene, the image acquisition device 101 acquires a scene image of a road scene, transmits the acquired scene image to the roadside device 102, and the roadside device 102 processes the scene image.
Optionally, the image capturing device 101 is a road side camera.
Optionally, as shown in fig. 1, the application scenario further includes a remote server 103. The server 103 communicates with the image capturing apparatus 101 and/or the roadside apparatus 102 through a network. The remote server 103 may receive the scene image sent by the image capturing device 101, or may receive the scene image sent by the roadside device 102, and process the scene image.
In the system architecture of intelligent transportation vehicle-road cooperation, the roadside device comprises roadside sensing devices and roadside computing devices: a roadside sensing device (such as a roadside camera) is connected to a roadside computing device (such as a roadside computing unit, RSCU), which is in turn connected to a server device that can communicate with autonomous or assisted-driving vehicles in various ways. In another system architecture, the roadside sensing device itself includes computing capability and is directly connected to the server device. The connections may be wired or wireless. The server device in the application is, for example, a cloud control platform, a vehicle-road cooperative management platform, a central subsystem, an edge computing platform, or a cloud computing platform.
For example, the executing subject of each method embodiment of the present application may be an electronic device, and the electronic device may be a road side device (such as the road side device 102 in fig. 1), or a terminal device, or a server device (such as the server 103 in fig. 1), or a traffic event detection apparatus or device, or another apparatus or device that can execute each method embodiment of the present application.
Further, the execution subject of each method embodiment of the application may be a roadside device, for example, a roadside sensing device with computing capability, or a roadside computing device connected to a roadside sensing device. Alternatively, the execution subject may be a server device connected to the roadside computing device, a server device directly connected to the roadside sensing device, or the like.
Fig. 2 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 2, the method for detecting a traffic event according to the present embodiment includes:
s201, obtaining a scene image of a road scene.
The road scene may be that of a road intersection, i.e., a junction of two or more roads where vehicles, or vehicles and pedestrians, converge. Traffic conditions there are complex, so it is necessary to detect traffic events at road intersections. Road intersections include, for example, crossroads, T-junctions, and roundabout entrances/exits.
In this step, a scene image acquired in real time by an image acquisition device provided at a road scene may be acquired to perform real-time traffic incident detection for the road scene. Alternatively, scene images acquired at historical times by an image acquisition device stored in a preset database and set for a road scene may be acquired to detect traffic events occurring at the historical times at the road scene.
S202, determining a target image area contained in the scene image, wherein the target image area is an image area of which the traffic incident occurrence probability is greater than or equal to a first threshold value.
The first threshold is a preset probability threshold that can be set by professionals according to experience and experiment. For example, with the first threshold set to 80%, the target image area is an image area whose traffic incident occurrence probability is greater than or equal to 80%. The first threshold may also be adjusted by the user as desired; for example, the user may raise it to 95% to select only image areas whose traffic incident occurrence probability is greater than or equal to 95%.
In this step, one or more image areas whose traffic incident occurrence probability is greater than or equal to the first threshold are determined in the scene image; for clarity, such an image area is referred to as a target image area. It can be understood that the higher the occurrence probability of traffic incidents in an image area, the stronger the association between that area and traffic incidents, and the easier it is to extract incident-related image features from it. Therefore, when the traffic incident occurrence probability of the target image area is greater than or equal to the first threshold, the area is considered strongly associated with traffic incidents, and incident-related image features are more easily extracted from it during detection, which helps improve detection accuracy.
In some embodiments, one possible implementation of S202 includes: and determining a target image area with the traffic event occurrence probability being greater than or equal to a first threshold value in the scene image according to the mapping relation between the preset multiple image area types and the traffic event occurrence probability.
The mapping relationship between image area types and traffic event occurrence probabilities is either configured by professionals according to experience and experiment, or obtained by counting, over multiple sample images used as training data, how often traffic events occur in each image area type and deriving each type's occurrence probability from that frequency.
In the mapping relationship between the image area types and the traffic incident occurrence probability, one image area type can correspond to one traffic incident occurrence probability. Image area types include, for example, sidewalk areas, motor lane areas, non-motor lane areas, intersection center areas, and the like.
Specifically, sub-image areas belonging to the various image area types are identified in the scene image. For each identified image area, the traffic incident occurrence probability corresponding to its image area type is looked up in the mapping relationship and taken as the occurrence probability of that image area. The occurrence probability of each image area is then compared with the first threshold, and the image areas whose probability is greater than or equal to the first threshold are determined as target image areas, which improves the accuracy of the per-area traffic incident occurrence probability.
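The type-to-probability lookup described above can be sketched as follows; the area types, probability values, and function names are illustrative assumptions, not values taken from the application.

```python
# Hypothetical mapping from image area type to traffic event occurrence
# probability (values are illustrative assumptions).
AREA_TYPE_PROBABILITY = {
    "sidewalk": 0.30,
    "motor_lane": 0.85,
    "non_motor_lane": 0.60,
    "intersection_center": 0.90,
}

def select_target_areas(identified_areas, first_threshold=0.8):
    """identified_areas: list of (area_id, area_type) found in the scene image.
    Returns the areas whose mapped probability is >= first_threshold."""
    return [
        (area_id, area_type)
        for area_id, area_type in identified_areas
        if AREA_TYPE_PROBABILITY.get(area_type, 0.0) >= first_threshold
    ]

areas = [("a1", "sidewalk"), ("a2", "motor_lane"), ("a3", "intersection_center")]
targets = select_target_areas(areas, first_threshold=0.8)
```

With a first threshold of 0.8, only the motor-lane and intersection-center areas are selected as target image areas.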
Optionally, in addition to the possible implementation manner, the target image area may be determined based on the occurrence probability of the traffic event of each pixel, which may be specifically shown in fig. 3 and the subsequent embodiments shown in fig. 3.
S203, detecting the traffic incident in the target image area to obtain a detection result of whether the traffic incident occurs in the target image area.
In this step, after one or more target image areas whose traffic incident occurrence probability is greater than or equal to the first threshold are determined, traffic incident detection is performed on each of them to obtain a detection result indicating whether a traffic incident occurs in that area. For example, each target image area is fed to a deep neural network model, which outputs the detection result for that area. Compared with detecting traffic incidents over the whole scene image, each target image area is more strongly associated with traffic incidents, so performing detection on the target image areas can effectively improve detection accuracy.
In the embodiment of the application, a target image area whose traffic incident occurrence probability is greater than or equal to a first threshold is determined in the road scene, and traffic incident detection is performed on that area. The detection process is thus divided into two stages: the first stage screens out target image areas highly associated with traffic incidents, and the second stage detects traffic incidents within those areas. Image features related to traffic incidents are easier to extract in the target image areas, which further improves detection accuracy.
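The two-stage flow can be outlined in a minimal sketch; `screen_fn` and `detect_fn` below are hypothetical stand-ins for the first-stage screening model and the second-stage detection model, not APIs from the application.

```python
def detect_traffic_events(scene_image, screen_fn, detect_fn, first_threshold):
    # Stage 1: screen target image areas whose traffic event occurrence
    # probability is >= the first threshold.
    target_areas = [
        area for area, prob in screen_fn(scene_image) if prob >= first_threshold
    ]
    # Stage 2: run traffic event detection only on the screened areas.
    return {area: detect_fn(scene_image, area) for area in target_areas}

# Stub stage functions for illustration only.
def screen_fn(image):
    return [("crosswalk", 0.9), ("roadside", 0.3)]

def detect_fn(image, area):
    return "event" if area == "crosswalk" else "no event"

result = detect_traffic_events("scene.jpg", screen_fn, detect_fn, 0.8)
```

Here only the high-probability "crosswalk" area reaches the second stage, so the detector never spends capacity on the low-probability "roadside" area.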
Fig. 3 is a schematic diagram according to a second embodiment of the present application. As shown in fig. 3, the method for detecting a traffic event according to the present embodiment includes:
s301, acquiring a scene image of a road scene.
The implementation principle and the technical effect of S301 may refer to the foregoing embodiments, and are not described again.
S302, determining the traffic incident occurrence probability corresponding to each pixel in the scene image.
The traffic event occurrence probability corresponding to a pixel refers to the probability that a traffic event occurs at the scene position corresponding to that pixel. The greater this probability, the more likely a traffic event is to occur at that scene position; in other words, the more likely the scene position is to lie within an event area. The probability can therefore also be understood as the probability that the pixel lies in an image area where a traffic event occurs.
In the step, the traffic incident occurrence probability is predicted for the scene image, and the traffic incident occurrence probability corresponding to each pixel in the scene image is obtained.
In some embodiments, one possible implementation of S302 includes: performing convolutional encoding and feature recovery processing on the scene image to obtain a feature image, and taking each pixel value in the feature image as the traffic event occurrence probability of the corresponding pixel in the scene image, thereby predicting per-pixel occurrence probabilities.
The feature image has the same image size as the scene image, and its pixels correspond one-to-one to the pixels of the scene image. The feature image is a single-channel image, i.e., each pixel corresponds to a single value, which is the traffic event occurrence probability of the scene-image pixel at the same position.
Furthermore, the pixel values in the feature image range from 0 to 1. For example, if the value at pixel position (0, 0) in the feature image is 1, the traffic event occurrence probability of the pixel at position (0, 0) in the scene image is 1, i.e., 100%. As another example, if the value at pixel position (2, 1) in the feature image is 0.5, the traffic event occurrence probability of the corresponding scene pixel is 50%.
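A minimal sketch of this per-pixel lookup, with the feature image stored as a nested list of values in [0, 1]; the (x, y) = (column, row) convention is an assumption made to match the examples above.

```python
# Single-channel feature image, same size as the scene image; each value is
# the traffic event occurrence probability of the co-located scene pixel.
feature_image = [
    [1.0, 0.2, 0.1],
    [0.4, 0.7, 0.5],
]

def event_probability(feature_image, x, y):
    """Probability that a traffic event occurs at scene pixel (x, y)."""
    return feature_image[y][x]
```

With this layout, position (0, 0) gives probability 1.0 (100%) and position (2, 1) gives 0.5 (50%), matching the two examples in the text.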
Before the convolutional encoding and feature recovery processing are applied to scene images, they can be trained on sample data used as training data, so that the pixel values of the feature image obtained from a scene image accurately reflect the traffic event occurrence probability of each pixel in that image.
The sample data includes a plurality of sample images. The training process of the convolutional encoding process and the feature recovery process may be executed on the execution subject of this embodiment, or may be executed on other devices, for example, on other servers and computers.
Further, the sample data includes a plurality of sample images marked with traffic incident occurrence areas. For example, a plurality of historical scene images of a road scene are used as a plurality of sample images, whether a traffic event occurs in the sample images is determined manually, and traffic event occurrence areas of the sample images are marked, such as the traffic event occurrence areas are selected on the sample images.
Thus, during training, a feature image is determined for each sample image according to the traffic event occurrence area marked on it, with the same image size as the sample image. For example, the pixels of the feature image that fall inside the marked traffic event occurrence area are set to 1, and the pixels in the remaining area are set to 0. After the sample images and their corresponding feature images are obtained, supervised training of the convolutional encoding and feature recovery processing can be performed, with the sample image as input and its feature image as the label. During training, the feature image corresponding to the sample image is compared with the feature image output by the feature extraction model, and the model is adjusted based on the per-pixel difference between the two. The feature image output by the feature extraction model contains the traffic event occurrence probability predicted for each pixel of the sample image.
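The label construction described above (1 inside the marked event area, 0 elsewhere) can be sketched as follows; representing the annotations as rectangular boxes is an assumption for illustration, since the application does not fix an annotation format.

```python
def mask_label(height, width, event_boxes):
    """Build the feature-image label for one sample image.

    event_boxes: list of (top, left, bottom, right) marked traffic event
    occurrence areas, with bottom/right exclusive. Returns a binary mask
    the size of the sample image: 1.0 inside marked areas, 0.0 elsewhere.
    """
    mask = [[0.0] * width for _ in range(height)]
    for top, left, bottom, right in event_boxes:
        for r in range(top, bottom):
            for c in range(left, right):
                mask[r][c] = 1.0
    return mask

# A 4x4 sample image with one 2x2 marked event area.
label = mask_label(4, 4, [(1, 1, 3, 3)])
```

The resulting mask serves directly as the supervision target: the model's output feature image is compared against it pixel by pixel.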
In the process of performing convolution coding processing and feature recovery processing on a scene image to obtain a feature image, convolution coding processing may be performed on the scene image to obtain a coded image, where the coded image includes image features related to the occurrence probability of a traffic event. The convolution process is a process of extracting image features at different abstract levels, and more detailed features in a scene image can be lost, so that feature recovery is performed on a coded image subjected to convolution coding, and finally a feature image is obtained. Therefore, the accuracy of the obtained feature image is improved by the convolution encoding processing and the feature recovery processing.
Optionally, the number of the encoded images is multiple, and the image size of the encoded image is smaller than that of the scene image, so as to better extract image features of different image regions of the scene image.
As an example, feature extraction is performed on the scene image by convolution to obtain a coded image representing the feature encoding result of the scene image; the image size of the coded image is, for example, one sixteenth of that of the scene image. The coded image is then decoded through deconvolution, which realizes the feature recovery processing and yields a feature image with the same size as the scene image.
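The size relationship in this example can be illustrated with toy stand-ins for the two operations: average pooling and nearest-neighbour upsampling below merely mimic the shapes produced by convolutional encoding and deconvolution, not the learned operations themselves.

```python
def encode(image, factor=4):
    """Toy stand-in for convolutional encoding: average-pool factor x factor
    blocks, so the coded image has 1/16 the pixels of the input when
    factor=4 (one quarter per axis)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            / (factor * factor)
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]

def decode(coded, factor=4):
    """Toy stand-in for deconvolution: nearest-neighbour upsample back to
    the original image size."""
    return [
        [coded[r // factor][c // factor] for c in range(len(coded[0]) * factor)]
        for r in range(len(coded) * factor)
    ]

scene = [[1.0] * 8 for _ in range(8)]   # 8x8 "scene image"
coded = encode(scene)                    # 2x2 coded image (1/16 the pixels)
restored = decode(coded)                 # back to 8x8, like the feature image
```

An 8x8 input encodes to a 2x2 coded image and decodes back to 8x8, mirroring the one-sixteenth size relationship described above.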
Further, the scene image is subjected to convolutional encoding and feature recovery processing through a feature extraction model to obtain the feature image. The feature extraction model is a convolutional neural network model; for example, it includes a plurality of convolutional layers, in which convolutional encoding is performed on the scene image to obtain a plurality of coded images, and a plurality of deconvolution (upsampling) layers, in which feature recovery is performed on the coded images to obtain the feature image of the scene image.
Before the feature extraction model is used to perform convolutional coding processing and feature recovery processing on the scene image, the feature extraction model can be trained, using sample data as training data. This helps improve the prediction accuracy when the traffic event occurrence probability corresponding to each pixel in the scene image is predicted through the feature extraction model.
The sample data used for training the feature extraction model may be the sample data described above for training the convolutional coding processing and the feature recovery processing. When the sample data includes a plurality of sample images marked with traffic event occurrence areas, the feature image corresponding to each sample image is determined according to the traffic event occurrence area marked on that sample image, and its image size is the same as that of the sample image. The sample image can then be used as input and its corresponding feature image as the label to perform supervised training of the feature extraction model. The training algorithm of the feature extraction model is, for example, a gradient descent algorithm.
The training process of the feature extraction model may be executed on the execution subject of the embodiment, or may be executed on other devices, for example, on other servers and computers.
S303, determining a target image area according to the traffic incident occurrence probability corresponding to each pixel and the first threshold.
In this step, the greater the traffic incident occurrence probability corresponding to a pixel, the greater the traffic incident occurrence probability of an image area constituted by such pixels. Therefore, after the traffic incident occurrence probability corresponding to each pixel is determined, the target image area whose traffic incident occurrence probability is greater than or equal to the first threshold can be determined in the scene image according to these probabilities and the first threshold, thereby improving the accuracy of the target image area.
In some embodiments, one possible implementation of S303 includes:
determining a plurality of image areas contained in a scene image; for each image area contained in the scene image, determining the traffic incident occurrence probability corresponding to the image area according to the traffic incident occurrence probabilities corresponding to the pixels in the image area; and determining an image area whose traffic incident occurrence probability is greater than or equal to the first threshold as a target image area.
Specifically, a plurality of image areas may be determined in the scene image according to a preset area size. There may be one or more preset area sizes; for example, the image area size is one or more of 3×3, 9×9, or 16×16, in units of pixels or of length (millimeters, centimeters, etc.). When determining the plurality of image areas in the scene image, the image areas may be determined in a preset order, for example from left to right and from top to bottom, yielding a plurality of image areas of the preset area size. Alternatively, the image areas may be determined randomly in the scene image. After the plurality of image areas are determined, for each image area, for example, the average, mode, or median of the traffic incident occurrence probabilities corresponding to all pixels in the image area is determined as the traffic incident occurrence probability of the image area.
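A sketch of this region-wise averaging, assuming a per-pixel probability map and a single preset region size (the function name and the non-overlapping left-to-right, top-to-bottom scan are illustrative choices):

```python
import numpy as np

def find_target_regions(prob_map, region_size, first_threshold):
    """Scan the per-pixel probability map left to right, top to bottom, and
    keep each region whose mean probability reaches the first threshold."""
    h, w = prob_map.shape
    rh, rw = region_size
    targets = []
    for top in range(0, h - rh + 1, rh):
        for left in range(0, w - rw + 1, rw):
            region = prob_map[top:top + rh, left:left + rw]
            if region.mean() >= first_threshold:
                # region recorded as (top, left, bottom, right)
                targets.append((top, left, top + rh, left + rw))
    return targets

prob = np.zeros((6, 6))
prob[0:3, 0:3] = 0.9                              # one high-probability patch
print(find_target_regions(prob, (3, 3), 0.5))     # [(0, 0, 3, 3)]
```

The mean could be replaced by the mode or median of the region's probabilities, as the text above allows.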
Optionally, in addition to the above implementation of S303, the target image area may be determined based on a comparison between the traffic event occurrence probability corresponding to each pixel and the first threshold, as shown in the embodiment of fig. 4 below.
S304, carrying out traffic incident detection on the target image area to obtain a detection result of whether a traffic incident occurs in the target image area.
The implementation principle and the technical effect of S304 may refer to the foregoing embodiments, and are not described again.
In the embodiment of the application, in the scene image of the road scene, the target image area whose traffic event occurrence probability is greater than or equal to the first threshold is determined based on the traffic event occurrence probability corresponding to each pixel, which improves the accuracy of determining the target image area in the scene image and, in turn, the accuracy of traffic event detection performed on the target image area.
Fig. 4 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 4, the method for detecting a traffic event according to the present embodiment includes:
S401, obtaining a scene image of a road scene.
S402, determining the traffic incident occurrence probability corresponding to each pixel in the scene image.
The implementation process and the technical principle of S401 and S402 may refer to the foregoing embodiments, and are not described again.
S403, determining target pixels whose traffic incident occurrence probability is greater than or equal to a first threshold.
In this step, after the traffic event occurrence probability corresponding to each pixel in the scene image is determined, the traffic event occurrence probability of each pixel is compared with a first threshold, and if the traffic event occurrence probability of a pixel is greater than or equal to the first threshold, the pixel is determined as a target pixel. Therefore, the target pixel with a high traffic event occurrence probability is screened out from all the pixels in the scene image.
Optionally, if no pixel in the scene image has a traffic event occurrence probability greater than or equal to the first threshold, it is determined that the scene image contains no target image area; in other words, the probability of detecting a traffic event in the scene image is low, and it is therefore determined that no traffic event occurs in the scene image, which improves the detection efficiency. For example, a road scene usually contains a plurality of roadside cameras, and some of them may monitor areas with a low traffic event occurrence probability; a scene image taken by such a camera may contain no pixel whose traffic event occurrence probability is greater than or equal to the first threshold.
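The early exit described above can be sketched as follows (the return values are illustrative choices):

```python
import numpy as np

def detect_or_skip(prob_map, first_threshold):
    """If no pixel reaches the first threshold, skip region-level detection
    and report directly that no traffic event occurs in the scene image."""
    target_mask = prob_map >= first_threshold
    if not target_mask.any():
        return None                      # no target image area exists
    return np.argwhere(target_mask)      # target pixels for further grouping

print(detect_or_skip(np.zeros((4, 4)), 0.5))      # None: nothing to detect
prob = np.zeros((4, 4))
prob[2, 3] = 0.9
print(detect_or_skip(prob, 0.5).tolist())         # [[2, 3]]
```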
S404, determining a target image area according to the distribution of the target pixels in the scene image.
In this step, the target pixels are the pixels of the scene image whose traffic event occurrence probability is greater than or equal to the first threshold. Therefore, the distribution area of the target pixels in the scene image can be determined from their distribution, and this distribution area is determined as the target image area whose traffic event occurrence probability is greater than or equal to the first threshold, which improves the accuracy of determining the target image area in the scene image.
In some embodiments, one possible implementation of S404 includes: and in the scene image, carrying out aggregation processing on the target pixels to obtain a target image area, wherein the distance between adjacent target pixels in the target image area is less than or equal to a second threshold value.
Specifically, in the scene image, the target pixels with the distance less than or equal to the second threshold are determined to be located in the same image area, and the aggregation processing of the target pixels is realized. And in one or more image areas obtained after the aggregation processing, the distance between adjacent target pixels is smaller than or equal to a second threshold value. It can be seen that the one or more image areas are distribution areas of the target pixels, and most of the pixels in the one or more image areas are the target pixels. Accordingly, one or more image areas obtained after the aggregation process may be determined as the target image area.
The unit of distance here is the pixel. Different target image areas may overlap.
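A sketch of this aggregation, assuming the distance between pixels is measured as the Chebyshev distance in pixels (the patent does not fix a particular metric) and reporting each group's bounding box as a target image area:

```python
import numpy as np
from collections import deque

def aggregate_target_pixels(prob_map, first_threshold, second_threshold=1):
    """Group target pixels whose distance to a neighbouring target pixel is
    at most the second threshold; each group yields one target image area,
    given as a (top, left, bottom, right) bounding box."""
    targets = np.argwhere(prob_map >= first_threshold)
    unvisited = {tuple(p) for p in targets}
    areas = []
    while unvisited:
        queue = deque([unvisited.pop()])
        group = []
        while queue:
            y, x = queue.popleft()
            group.append((y, x))
            # pull in every unvisited target pixel within the threshold
            near = {p for p in unvisited
                    if max(abs(p[0] - y), abs(p[1] - x)) <= second_threshold}
            unvisited -= near
            queue.extend(near)
        ys, xs = zip(*group)
        areas.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return areas

prob = np.zeros((5, 5))
prob[0, 0] = prob[0, 1] = 0.9       # one pair of adjacent target pixels
prob[3, 3] = 0.9                    # one isolated target pixel
areas_strict = aggregate_target_pixels(prob, 0.5, second_threshold=1)
areas_loose = aggregate_target_pixels(prob, 0.5, second_threshold=3)
print(len(areas_strict), len(areas_loose))   # 2 1
```

With the strict threshold the isolated pixel forms its own area; with the looser threshold all three target pixels merge into one area, illustrating the accuracy/recall trade-off discussed below.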
Optionally, the second threshold is a preset constant value.
Further, when the second threshold is 1, the distance between adjacent target pixels in the target image area is less than or equal to 1 pixel; in other words, the target image area contains only mutually adjacent target pixels. In this case, the traffic event occurrence probability corresponding to every pixel in the target image area is greater than or equal to the first threshold, which improves the accuracy of determining the target image area in the scene image.
Further, when the second threshold is greater than 1, the distance between adjacent target pixels in the target image area may be greater than 1 pixel; in other words, besides a majority of target pixels, the target image area also contains pixels whose traffic event occurrence probability is less than the first threshold. This fully accounts for the fact that, in an image area highly associated with a traffic event, not every pixel necessarily has a high traffic event occurrence probability, and it improves the recall rate of target image areas in the scene image.
Optionally, the second threshold is a variable related to the total number of pixels of the scene image. In other words, the second threshold may be adjusted according to the total number of pixels of the scene image, so as to make the second threshold more reasonable and, in turn, improve the accuracy of determining the target image area in the scene image.
Further, the second threshold increases with the total number of pixels of the scene image; that is, the more pixels the scene image has, the larger the second threshold. This fully accounts for the fact that, as the number of pixels in the scene image increases, the number of pixels whose traffic event occurrence probability is less than the first threshold in an image area highly associated with a traffic event increases correspondingly, and it improves the accuracy of determining the target image area in the scene image.
As an example, fig. 5(a) is a diagram illustrating a distribution example of a target image area in a scene image when the second threshold is 1, and fig. 5(b) is a diagram illustrating a distribution example of a target image area in a scene image when the second threshold is 2. In fig. 5(a) and 5(b), each square represents a pixel, and a value in a square is 1, which indicates that the corresponding pixel is a target pixel, and a value in a square is 0, which indicates that the corresponding pixel is a non-target pixel (i.e., a pixel in the scene image whose occurrence probability of the traffic event is less than the first threshold). The diagonally marked area represents a target image area.
As can be seen from fig. 5(a), when the second threshold is 1, the target image region contains only target pixels, and the distance between adjacent target pixels is equal to 1. As can be seen from fig. 5(b), when the second threshold is 2, the target image region contains both target pixels and non-target pixels, with target pixels in the majority.
In some embodiments, another possible implementation of S404 includes: determining a plurality of image areas contained in the scene image; determining the number ratio of target pixels in each image area; and determining an image area whose number ratio of target pixels is greater than or equal to a third threshold as a target image area.
Specifically, a plurality of image regions may be determined in the scene image according to the preset region size, where the number of the preset region sizes may be one or more, and reference may be made to the description of the foregoing embodiments. When determining the plurality of image regions in the scene image, the determination of the image regions may be performed in a preset order, for example, from left to right, from top to bottom, and the plurality of image regions having the preset region size are determined in the scene image. Alternatively, the image area may be randomly determined in the scene image.
After the plurality of image areas contained in the scene image are determined, the number ratio of target pixels is determined for each image area; if this ratio is greater than or equal to the third threshold, the traffic incident occurrence probability corresponding to the image area is determined to be greater than or equal to the first threshold, and the image area is determined as a target image area. The number ratio of target pixels in an image area is the ratio of the number of target pixels in the image area to the total number of pixels in the image area.
The third threshold is a preset constant value. Setting a larger third threshold improves the accuracy of determining the target image area in the scene image. The user can adjust the number of target image areas obtained by screening by adjusting the third threshold: the smaller the third threshold, the more target image areas are obtained.
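A sketch of this ratio-based screening over non-overlapping windows (the window layout, names, and demo thresholds are illustrative):

```python
import numpy as np

def regions_by_target_ratio(prob_map, region_size, first_t, third_t):
    """Keep each region in which the fraction of target pixels (probability
    >= first threshold) reaches the third threshold."""
    h, w = prob_map.shape
    rh, rw = region_size
    mask = prob_map >= first_t           # target-pixel mask
    kept = []
    for top in range(0, h - rh + 1, rh):
        for left in range(0, w - rw + 1, rw):
            # mean of a boolean window is exactly the number ratio
            ratio = mask[top:top + rh, left:left + rw].mean()
            if ratio >= third_t:
                kept.append((top, left, top + rh, left + rw))
    return kept

prob = np.zeros((6, 6))
prob[0:2, 0:3] = 0.9                     # 6 of 9 pixels in one window
kept = regions_by_target_ratio(prob, (3, 3), 0.5, 0.5)
print(kept)                              # [(0, 0, 3, 3)]
```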
S405, carrying out traffic incident detection on the target image area to obtain a detection result of whether a traffic incident occurs in the target image area.
Wherein, the realization principle and the technical effect of S405 can refer to the description of the foregoing embodiments.
In the embodiment of the application, the traffic event occurrence probability corresponding to each pixel is determined in the scene image of the road scene, pixels whose traffic event occurrence probability is greater than or equal to the first threshold are determined as target pixels, and the target image area is determined according to the distribution of the target pixels, which improves the accuracy of determining the target image area in the scene image and, in turn, the accuracy of traffic event detection performed on the target image area.
In some embodiments, when detecting a traffic event in a target image area to obtain a detection result of whether the traffic event occurs in the target image area, one possible implementation manner (i.e., one possible implementation manner of S203, S304, or S405) includes: and carrying out traffic incident detection on the target image area through the incident detection model to obtain a detection result of whether a traffic incident occurs in the target image area, wherein the incident detection model is used for detecting whether the traffic incident occurs in the image.
The event detection model may be a semantic understanding model for performing semantic understanding on the video or the image, for example, a Temporal Segment Networks (TSN) model may be used.
Specifically, after one or more target image areas are obtained, the scene image may be cropped according to the target image areas to obtain one or more sub-images. The sub-images can be input into the event detection model, which extracts image features related to traffic events from each sub-image and determines, according to the extracted features, whether a traffic event occurs in the sub-image, i.e., whether a traffic event occurs in the corresponding target image area, thereby improving the accuracy of event detection on the target image area.
Optionally, different target image areas may differ in size. Before the sub-images corresponding to the target image areas are input into the event detection model, normalization processing may be performed on them so that their sizes are consistent, which improves the accuracy of the event detection model in detecting traffic events in the sub-images. Normalizing the sub-images means, for example, enlarging or reducing them so that the image sizes of the sub-images of all target image areas are consistent.
Before traffic event detection is performed on the target image area through the event detection model, a trained event detection model can be obtained, or the event detection model can be trained. During training, sample images marked with traffic event occurrence results are obtained; each sample image is used as input and its traffic event occurrence result as the label to perform supervised training of the event detection model, yielding the trained event detection model. The training algorithm of the event detection model is, for example, a gradient descent algorithm, and the traffic event occurrence result of a sample image indicates whether a traffic event occurs in it.
Optionally, the process of determining the target image region of the scene image through the feature extraction model (i.e., performing convolutional coding processing and feature recovery processing on the scene image to obtain the feature image, determining the traffic event occurrence probability corresponding to each pixel based on the feature image, and then determining the target image region based on these probabilities) and the process of performing event detection on the target image region through the event detection model may be combined with each other. In this case, the training process of the feature extraction model and that of the event detection model are independent: the two models may be trained on the same device or on different devices, and the sample images involved may be the same images or different images. Thus, traffic event detection in the scene image is divided into two stages by the feature extraction model and the event detection model: one stage screens out target image regions with a high traffic event occurrence probability from the scene image, and the other performs event detection on those target image regions. This effectively reduces the amount of traffic-event-related image features to be examined in the event detection stage and improves the accuracy of traffic event detection.
In some embodiments, based on the traffic event detection method shown in any of the foregoing method embodiments, after obtaining the detection result of whether the traffic event occurs in the target image area, the traffic event detection method further includes at least one of: displaying a target image area where a traffic incident occurs; displaying the number of target image areas where traffic events occur; sending a target image area of the traffic incident to a target server and/or a target terminal; and transmitting the number of the target image areas where the traffic events occur to the target server and/or the target terminal.
In an example, when the current device has a display device, for example, the roadside camera is directly connected to the display screen, after it is determined that a traffic event occurs in the target image region, the target image region where the traffic event occurs (for example, a sub-image corresponding to the target image region where the traffic event occurs is displayed, or a scene image is displayed and the target image region where the traffic event occurs is marked in the scene image) and/or the number of the target image regions where the traffic event occurs may be displayed on the display device.
Therefore, compared with the method of only outputting whether the traffic event occurs in the scene image, the method can display the target image area where the traffic event occurs and the number of the target image areas, improves the richness and the understandability (or the interpretability) of the traffic event detection result, is beneficial to a user to know the detailed area where the traffic event occurs in the road scene, and improves the user experience.
Optionally, the current device may also store the target image areas where the traffic events occur in the scene image and/or the number of target image areas where the traffic events occur.
In another example, after the detection result of whether a traffic event occurs in the target image area is obtained, the target image area where the traffic event occurs and/or the number of such target image areas may also be sent to the target server and/or the target terminal, so that the target server stores them, and/or so that the target terminal stores or displays them. In this way, a remote user can learn about the traffic event occurrence in the road scene by accessing the target server or viewing the target terminal.
The target server is, for example, a cloud control platform, a vehicle-road cooperative management platform, a central subsystem, an edge computing platform, or a cloud computing platform. The target terminal is, for example, a mobile phone, a computer, or a tablet computer.
In some embodiments, based on the traffic incident detection method shown in any one of the foregoing method embodiments, after the detection result of whether a traffic incident occurs in the target image area is obtained, if it is determined that a traffic incident occurs in the target image area, the geographic position corresponding to that target image area may also be determined. Supplementing the traffic incident detection result with this geographic position further improves the richness and interpretability of the detection result and allows the user to accurately locate the detailed position where the traffic incident occurs.
Furthermore, after the geographical position corresponding to the target image area where the traffic event occurs is determined, the geographical position can be displayed, and/or the geographical position is sent to the target server and/or the target terminal, so that a user can conveniently obtain the detailed position where the traffic event occurs from the current device or the target server and the target terminal, and the user experience is effectively improved.
According to an embodiment of the present application, a model training method is provided for training a feature extraction model in the foregoing embodiment.
Fig. 6 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 6, the model training method provided in this embodiment includes:
S601, acquiring sample data.
S602, training the feature extraction model according to the sample data.
The sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
In this step, for each sample image, a feature image corresponding to the sample image is determined according to the traffic event occurrence area marked on it; the image size of this feature image is the same as that of the sample image. The sample image can be used as input and its corresponding feature image as the label to perform supervised training of the feature extraction model. During training, the feature image corresponding to the sample image is compared with the feature image output by the feature extraction model, and the model is adjusted based on the per-pixel difference between the two. The feature image output by the feature extraction model contains the traffic event occurrence probability predicted by the model for each pixel of the sample image.
The training algorithm of the feature extraction model is, for example, a gradient descent algorithm.
In some embodiments, one possible implementation of S602 includes: determining a characteristic image corresponding to each sample image according to the traffic incident occurrence area marked on the sample image; and carrying out supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
Specifically, when the sample data includes a plurality of sample images marked with traffic incident occurrence areas, for each sample image the corresponding feature image is determined according to the traffic incident occurrence area marked on it, and the image size of this feature image is the same as that of the sample image. For example, the feature image of a sample image is obtained by setting to 1 the pixel value, in the feature image, of each pixel belonging to the marked traffic incident occurrence area, and setting to 0 the pixel value of each pixel in the remaining area. After the feature image is obtained, the sample image is used as input and its feature image as the label to perform supervised training of the feature extraction model.
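A sketch of building this 0/1 label feature image from marked traffic event occurrence areas (rectangular boxes are an assumption for the example; the patent only requires marked areas):

```python
import numpy as np

def make_label_feature_image(image_shape, event_boxes):
    """Build the supervision label: pixel value 1 inside each marked
    traffic-event occurrence area, 0 elsewhere; same size as the sample image."""
    label = np.zeros(image_shape, dtype=np.float32)
    for top, left, bottom, right in event_boxes:
        label[top:bottom, left:right] = 1.0
    return label

# a 6x8 sample image with one marked 3x3 traffic event occurrence area
label = make_label_feature_image((6, 8), [(1, 2, 4, 5)])
print(label.shape, int(label.sum()))   # (6, 8) 9
```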
In the embodiment of the application, the feature extraction model for extracting image features is trained on the sample images, so as to improve the accuracy of the per-pixel traffic event occurrence probabilities predicted by the feature extraction model and, in turn, the accuracy of traffic event detection.
According to the embodiment of the application, the application also provides a traffic incident detection device.
Fig. 7 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 7, the traffic event detecting device provided in this embodiment includes:
an acquisition unit 701 configured to acquire a scene image of a road scene;
a determining unit 702, configured to determine a target image region included in the scene image, where the target image region is an image region in which the occurrence probability of the traffic event is greater than or equal to a first threshold;
the detecting unit 703 is configured to perform traffic incident detection on the target image area to obtain a detection result of whether a traffic incident occurs in the target image area.
In one possible implementation, the determining unit 702 includes a first determining module and a second determining module. The first determining module is used for determining the traffic incident occurrence probability corresponding to each pixel in the scene image, and the second determining module is used for determining the target image area according to the traffic incident occurrence probability corresponding to each pixel and the first threshold.
In one possible implementation, the first determination module includes an image processing module and a first determination sub-module. The image processing module is used for carrying out convolution coding processing and feature recovery processing on the scene image to obtain a feature image, and the first determining submodule is used for determining each pixel value in the feature image as the traffic incident occurrence probability corresponding to the corresponding pixel in the scene image.
In one possible implementation, the image processing module is specifically configured to: carrying out convolution coding processing on the scene image to obtain a coded image, wherein the image size of the coded image is smaller than that of the scene image; and performing feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
In one possible implementation, the image processing module is specifically configured to: and carrying out convolutional coding processing and feature recovery processing on the scene image through the feature extraction model to obtain a feature image.
In one possible implementation, the second determination module includes a comparison module and a second determination submodule. The comparison module is used for determining a target pixel of which the occurrence probability of the traffic incident is greater than or equal to a first threshold, and the second determination submodule is used for determining a target image area according to the distribution of the target pixel in the scene image.
In a possible implementation manner, the second determining submodule is specifically configured to: and in the scene image, carrying out aggregation processing on the target pixels to obtain a target image area, wherein the distance between adjacent target pixels in the target image area is less than or equal to a second threshold value.
In a possible implementation manner, the second determining submodule is specifically configured to: determining a plurality of image areas contained in the scene image; determining the number ratio of target pixels in each image area; and determining an image area whose number ratio of target pixels is greater than or equal to the third threshold as the target image area.
In one possible implementation, the detection unit 703 includes a detection module. The detection module is used for detecting the traffic event in the target image area through the event detection model to obtain the detection result of whether the traffic event occurs in the target image area, and the event detection model is used for detecting whether the traffic event occurs in the image.
Fig. 8 is a schematic diagram according to a sixth embodiment of the present application. As shown in fig. 8, the traffic event detecting device provided in this embodiment includes:
an acquisition unit 801 for acquiring a scene image of a road scene;
a determining unit 802, configured to determine a target image region included in the scene image, where the target image region is an image region where a traffic event occurrence probability is greater than or equal to a first threshold;
the detecting unit 803 is configured to perform traffic incident detection on the target image area to obtain a detection result of whether a traffic incident occurs in the target image area.
In one possible implementation, the traffic event detection device further comprises at least one of:
a first display unit 804 for displaying a target image area where a traffic incident occurs;
a second display unit 805 for displaying the number of target image areas where traffic events occur;
a first sending unit 806, configured to send a target image area where a traffic event occurs to a target server and/or a target terminal;
a second transmitting unit 807 for transmitting the number of target image areas where the traffic event occurs to the target server and/or the target terminal.
Fig. 9 is a schematic diagram of a seventh embodiment according to the present application. As shown in fig. 9, the traffic event detecting device provided in this embodiment includes:
an acquiring unit 901 configured to acquire a scene image of a road scene;
a determining unit 902, configured to determine a target image region included in the scene image, where the target image region is an image region where a traffic event occurrence probability is greater than or equal to a first threshold;
the detecting unit 903 is configured to perform traffic event detection on the target image area to obtain a detection result of whether a traffic event occurs in the target image area.
And the positioning unit 904 is configured to determine a geographic location corresponding to the target image area where the traffic event occurs.
In one possible implementation, the traffic event detection device further comprises at least one of:
a third display unit 905 for displaying the geographical position;
a third sending unit 906, configured to send the geographic location to the target server and/or the target terminal.
The traffic event detection devices shown in Fig. 7, Fig. 8, and Fig. 9 are configured to execute the corresponding method embodiments described above; their implementation principles and technical effects are similar and are not repeated here.
Fig. 10 is a schematic diagram according to an eighth embodiment of the present application. As shown in fig. 10, the model training apparatus provided in this embodiment includes:
an obtaining unit 1001 configured to obtain sample data;
a training unit 1002, configured to train the feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
In a possible implementation manner, the training unit 1002 includes:
the determining module is configured to determine, for each sample image, a feature image corresponding to the sample image according to the traffic event occurrence area marked on the sample image;
and the training module is used for carrying out supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
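The supervision signal described here — a feature image derived from the annotated traffic event occurrence areas — can be sketched in NumPy, assuming the annotations are axis-aligned boxes and the supervised loss is a per-pixel binary cross-entropy (the description fixes neither choice); all names are illustrative.

```python
import numpy as np

def label_to_feature_image(shape, event_boxes):
    # Supervision target: 1 inside each annotated traffic event area, 0 elsewhere,
    # with the same size as the sample image.
    mask = np.zeros(shape, dtype=np.float32)
    for (y0, x0, y1, x1) in event_boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask

def pixelwise_bce(pred, target, eps=1e-7):
    # Per-pixel binary cross-entropy: one possible supervised training loss
    # comparing the model's feature image against the labelled one.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

mask = label_to_feature_image((16, 16), [(2, 2, 6, 6)])
good = pixelwise_bce(np.clip(mask, 0.01, 0.99), mask)       # near-perfect prediction
bad = pixelwise_bce(np.clip(1.0 - mask, 0.01, 0.99), mask)  # inverted prediction
```

The training module would then minimise this loss over the sample images, driving the feature extraction model to output high probabilities exactly where events were annotated.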
The model training apparatus provided in fig. 10 is used for executing the corresponding foregoing method embodiments, and the implementation principle and technical effect thereof are similar, and this embodiment is not described herein again.
According to an embodiment of the present application, there is also provided an electronic device including: at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the scheme as provided by any one of the embodiments described above.
There is also provided, in accordance with an embodiment of the present application, a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the aspects provided in any of the embodiments described above.
There is also provided, in accordance with an embodiment of the present application, a computer program product comprising a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the electronic device to perform the solution provided by any of the embodiments described above.
According to an embodiment of the present application, the present application further provides a roadside apparatus including the electronic apparatus provided by the above-described embodiment.
According to an embodiment of the present application, the present application further provides a cloud control platform, which includes the electronic device provided by the above embodiment.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. The RAM 1103 can also store various programs and data necessary for the operation of the electronic device 1100. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in electronic device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1101 performs the various methods and processes described above, such as the traffic event detection method and/or the model training method, which in some embodiments may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the traffic event detection method and/or the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the traffic event detection method and/or the model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that remedies the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; as long as the desired results of the technical solutions disclosed in the present application can be achieved, no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (33)

1. A traffic event detection method, comprising:
acquiring a scene image of a road scene;
determining a target image area contained in the scene image, wherein the target image area is an image area with a traffic incident occurrence probability greater than or equal to a first threshold;
and detecting the traffic incident in the target image area to obtain a detection result of whether the traffic incident occurs in the target image area.
2. The traffic event detection method according to claim 1, wherein the determining a target image area contained in the scene image comprises:
determining the occurrence probability of the traffic incident corresponding to each pixel in the scene image;
and determining the target image area according to the traffic incident occurrence probability corresponding to each pixel and the first threshold value.
3. The traffic event detecting method according to claim 2, wherein the determining the probability of occurrence of the traffic event corresponding to each pixel in the scene image comprises:
carrying out convolution coding processing and feature recovery processing on the scene image to obtain a feature image;
and determining each pixel value in the feature image as the traffic event occurrence probability corresponding to the corresponding pixel in the scene image.
4. The traffic event detection method according to claim 3, wherein the performing convolution encoding processing and feature recovery processing on the scene image to obtain a feature image comprises:
carrying out convolution coding processing on the scene image to obtain a coded image, wherein the image size of the coded image is smaller than that of the scene image;
and performing feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
5. The traffic event detection method according to claim 3, wherein the performing convolution encoding processing and feature recovery processing on the scene image to obtain a feature image comprises:
and carrying out convolution coding processing and feature recovery processing on the scene image through a feature extraction model to obtain a feature image.
6. The traffic event detection method according to any one of claims 2-5, wherein the determining the target image area according to the traffic event occurrence probability corresponding to each pixel and the first threshold value comprises:
determining a target pixel with a traffic event occurrence probability greater than or equal to the first threshold;
and determining the target image area according to the distribution of the target pixels in the scene image.
7. The traffic event detection method according to claim 6, wherein said determining the target image area according to the distribution of the target pixel in the scene image comprises:
and in the scene image, carrying out aggregation processing on the target pixels to obtain the target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
8. The traffic event detection method according to claim 6, wherein said determining the target image area according to the distribution of the target pixel in the scene image comprises:
determining a plurality of image areas contained in the scene image;
determining the number ratio of the target pixels in each image area contained in the scene image;
and determining the image area of which the number proportion of the target pixels is greater than or equal to a third threshold value as the target image area.
9. The traffic incident detection method according to any one of claims 1 to 5, wherein the detecting the traffic incident on the target image area to obtain a detection result of whether the traffic incident occurs in the target image area comprises:
and carrying out traffic incident detection on the target image area through an incident detection model to obtain a detection result of whether a traffic incident occurs in the target image area, wherein the incident detection model is used for detecting whether the traffic incident occurs in the image.
10. The traffic event detection method according to any one of claims 1-5, further comprising at least one of:
displaying a target image area where a traffic incident occurs;
displaying the number of target image areas where traffic events occur;
sending a target image area where a traffic event occurs to a target server and/or a target terminal;
and sending the number of the target image areas where the traffic events occur to the target server and/or the target terminal.
11. The traffic event detection method according to any one of claims 1-5, further comprising:
and determining the geographic position corresponding to the target image area where the traffic event occurs.
12. The traffic event detection method of claim 11, further comprising at least one of:
displaying the geographic location;
and sending the geographical position to a target server and/or a target terminal.
13. A model training method, comprising:
acquiring sample data;
training a feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
14. The model training method of claim 13, wherein said training a feature extraction model according to said sample data comprises:
for each sample image, determining a characteristic image corresponding to the sample image according to a traffic event occurrence area marked on the sample image;
and carrying out supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
15. A traffic event detection device, comprising:
an acquisition unit configured to acquire a scene image of a road scene;
the determining unit is used for determining a target image area contained in the scene image, wherein the target image area is an image area of which the traffic incident occurrence probability is greater than or equal to a first threshold value;
and the detection unit is used for detecting the traffic incident of the target image area to obtain a detection result of whether the traffic incident occurs in the target image area.
16. The traffic event detection device of claim 15, wherein the determination unit comprises:
the first determining module is used for determining the occurrence probability of the traffic incident corresponding to each pixel in the scene image;
and the second determining module is used for determining the target image area according to the traffic incident occurrence probability corresponding to each pixel and the first threshold.
17. The traffic event detection device of claim 16, wherein the first determination module comprises:
the image processing module is used for carrying out convolution coding processing and feature recovery processing on the scene image to obtain a feature image;
and the first determining submodule is used for determining each pixel value in the characteristic image as the traffic event occurrence probability corresponding to the corresponding pixel in the scene image.
18. The traffic event detection device of claim 17, wherein the image processing module is specifically configured to:
carrying out convolution coding processing on the scene image to obtain a coded image, wherein the image size of the coded image is smaller than that of the scene image;
and performing feature recovery processing on the coded image to obtain a feature image, wherein the image size of the feature image is the same as that of the scene image.
19. The traffic event detection device of claim 17, wherein the image processing module is specifically configured to:
and carrying out convolution coding processing and feature recovery processing on the scene image through a feature extraction model to obtain a feature image.
20. The traffic event detection device of any of claims 16-19, wherein the second determination module comprises:
the comparison module is used for determining a target pixel of which the occurrence probability of the traffic incident is greater than or equal to the first threshold;
and the second determining submodule is used for determining the target image area according to the distribution of the target pixel in the scene image.
21. The traffic event detection device of claim 20, wherein the second determination submodule is specifically configured to:
and in the scene image, carrying out aggregation processing on the target pixels to obtain the target image area, wherein the distance between adjacent target pixels in the target image area is smaller than or equal to a second threshold value.
22. The traffic event detection device of claim 20, wherein the second determination submodule is specifically configured to:
determining a plurality of image areas contained in the scene image;
determining the number ratio of the target pixels in each image area contained in the scene image;
and determining the image area of which the number proportion of the target pixels is greater than or equal to a third threshold value as the target image area.
23. The traffic event detection device according to any of claims 15-19, wherein the detection unit comprises:
the detection module is used for detecting the traffic event of the target image area through an event detection model to obtain a detection result of whether the traffic event occurs in the target image area, and the event detection model is used for detecting whether the traffic event occurs in the image.
24. The traffic event detection device of any of claims 15-19, further comprising at least one of:
the first display unit is used for displaying a target image area where a traffic incident occurs;
a second display unit for displaying the number of target image areas where traffic events occur;
the first sending unit is used for sending a target image area where a traffic event occurs to a target server and/or a target terminal;
and the second sending unit is used for sending the number of the target image areas with the traffic events to the target server and/or the target terminal.
25. The traffic event detection device of any of claims 15-19, further comprising:
and the positioning unit is used for determining the geographic position corresponding to the target image area where the traffic event occurs.
26. The traffic event detection device of claim 25, further comprising at least one of:
a third display unit for displaying the geographical location;
and the third sending unit is used for sending the geographic position to a target server and/or a target terminal.
27. A model training apparatus comprising:
an acquisition unit configured to acquire sample data;
the training unit is used for training the feature extraction model according to the sample data;
the sample data comprises a plurality of sample images marked with traffic incident occurrence areas, and the feature extraction model is used for extracting image features.
28. The model training apparatus as claimed in claim 27, wherein the training unit comprises:
the determining module is used for determining, for each sample image, a feature image corresponding to the sample image according to the traffic event occurrence area marked on the sample image;
and the training module is used for carrying out supervised training on the feature extraction model according to the sample image and the feature image corresponding to the sample image.
29. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a traffic event detection method as claimed in any one of claims 1 to 12 or a model training method as claimed in any one of claims 13 to 14.
30. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the traffic event detection method of any one of claims 1-12 or the model training method of any one of claims 13-14.
31. A computer program product comprising a computer program which, when executed by a processor, implements a traffic event detection method as claimed in any one of claims 1 to 12 or a model training method as claimed in any one of claims 13 to 14.
32. A roadside apparatus comprising: the electronic device of claim 29.
33. A cloud-controlled platform, comprising: the electronic device of claim 29.
CN202110290826.6A 2021-03-18 2021-03-18 Traffic event detection method and device, road side equipment and cloud control platform Active CN113052048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110290826.6A CN113052048B (en) 2021-03-18 2021-03-18 Traffic event detection method and device, road side equipment and cloud control platform


Publications (2)

Publication Number Publication Date
CN113052048A true CN113052048A (en) 2021-06-29
CN113052048B CN113052048B (en) 2024-05-10

Family

ID=76513624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110290826.6A Active CN113052048B (en) 2021-03-18 2021-03-18 Traffic event detection method and device, road side equipment and cloud control platform

Country Status (1)

Country Link
CN (1) CN113052048B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642403A (en) * 2021-07-13 2021-11-12 重庆科技学院 Crowd abnormal intelligent safety detection system based on edge calculation
CN114973165A (en) * 2022-07-14 2022-08-30 浙江大华技术股份有限公司 Event recognition algorithm testing method and device and electronic equipment

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678267A (en) * 2016-01-08 2016-06-15 浙江宇视科技有限公司 Scene recognition method and device
CN106997466A (en) * 2017-04-12 2017-08-01 百度在线网络技术(北京)有限公司 Method and apparatus for detecting road
CN108877208A (en) * 2017-05-09 2018-11-23 高德信息技术有限公司 A kind of image capturing method and device
CN109784265A (en) * 2019-01-09 2019-05-21 银河水滴科技(北京)有限公司 A kind of rail level semantic segmentation method and device
WO2019145018A1 (en) * 2018-01-23 2019-08-01 Siemens Aktiengesellschaft System, device and method for detecting abnormal traffic events in a geographical location
CN110363153A (en) * 2019-07-16 2019-10-22 广州图普网络科技有限公司 It paddles detection method, device, server and computer readable storage medium
US20200082207A1 (en) * 2018-09-07 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method and apparatus for object detection
CN110969166A (en) * 2019-12-04 2020-04-07 国网智能科技股份有限公司 Small target identification method and system in inspection scene
WO2020103893A1 (en) * 2018-11-21 2020-05-28 北京市商汤科技开发有限公司 Lane line property detection method, device, electronic apparatus, and readable storage medium
CN111209779A (en) * 2018-11-21 2020-05-29 北京市商汤科技开发有限公司 Method, device and system for detecting drivable area and controlling intelligent driving
CN111369792A (en) * 2019-11-22 2020-07-03 杭州海康威视***技术有限公司 Traffic incident analysis method and device and electronic equipment
CN111598902A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image segmentation method and device, electronic equipment and computer readable medium
CN111695483A (en) * 2020-06-05 2020-09-22 腾讯科技(深圳)有限公司 Vehicle violation detection method, device and equipment and computer storage medium
CN111753634A (en) * 2020-03-30 2020-10-09 上海高德威智能交通***有限公司 Traffic incident detection method and device
CN111753579A (en) * 2019-03-27 2020-10-09 杭州海康威视数字技术股份有限公司 Detection method and device for designated walk-substituting tool
CN111784712A (en) * 2020-07-17 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, device, equipment and computer readable medium
CN111931582A (en) * 2020-07-13 2020-11-13 中国矿业大学 Image processing-based highway traffic incident detection method
US20200388150A1 (en) * 2019-06-06 2020-12-10 Verizon Patent And Licensing Inc. Monitoring a scene to analyze an event using a plurality of image streams
CN112434627A (en) * 2020-11-30 2021-03-02 浙江大华技术股份有限公司 Method and device for detecting pedestrian crossing road guardrail and storage medium
CN112507813A (en) * 2020-11-23 2021-03-16 北京旷视科技有限公司 Event detection method and device, electronic equipment and storage medium

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678267A (en) * 2016-01-08 2016-06-15 浙江宇视科技有限公司 Scene recognition method and device
CN106997466A (en) * 2017-04-12 2017-08-01 百度在线网络技术(北京)有限公司 Method and apparatus for detecting road
CN108877208A (en) * 2017-05-09 2018-11-23 高德信息技术有限公司 A kind of image capturing method and device
WO2019145018A1 (en) * 2018-01-23 2019-08-01 Siemens Aktiengesellschaft System, device and method for detecting abnormal traffic events in a geographical location
US20200082207A1 (en) * 2018-09-07 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method and apparatus for object detection
WO2020103893A1 (en) * 2018-11-21 2020-05-28 Beijing SenseTime Technology Development Co., Ltd. Lane line property detection method, device, electronic apparatus, and readable storage medium
CN111209779A (en) * 2018-11-21 2020-05-29 Beijing SenseTime Technology Development Co., Ltd. Method, device and system for detecting drivable area and controlling intelligent driving
CN109784265A (en) * 2019-01-09 2019-05-21 Watrix Technology (Beijing) Co., Ltd. Rail surface semantic segmentation method and device
CN111753579A (en) * 2019-03-27 2020-10-09 Hangzhou Hikvision Digital Technology Co., Ltd. Detection method and device for designated walk-substituting tool
US20200388150A1 (en) * 2019-06-06 2020-12-10 Verizon Patent And Licensing Inc. Monitoring a scene to analyze an event using a plurality of image streams
CN110363153A (en) * 2019-07-16 2019-10-22 Guangzhou Tupu Network Technology Co., Ltd. Paddling detection method, device, server and computer readable storage medium
CN111369792A (en) * 2019-11-22 2020-07-03 Hangzhou Hikvision System Technology Co., Ltd. Traffic incident analysis method and device and electronic equipment
CN110969166A (en) * 2019-12-04 2020-04-07 State Grid Intelligent Technology Co., Ltd. Small target identification method and system in inspection scene
CN111753634A (en) * 2020-03-30 2020-10-09 Shanghai Goldway Intelligent Transportation System Co., Ltd. Traffic incident detection method and device
CN111598902A (en) * 2020-05-20 2020-08-28 Beijing ByteDance Network Technology Co., Ltd. Image segmentation method and device, electronic equipment and computer readable medium
CN111695483A (en) * 2020-06-05 2020-09-22 Tencent Technology (Shenzhen) Co., Ltd. Vehicle violation detection method, device and equipment and computer storage medium
CN111931582A (en) * 2020-07-13 2020-11-13 China University of Mining and Technology Image processing-based highway traffic incident detection method
CN111784712A (en) * 2020-07-17 2020-10-16 Beijing ByteDance Network Technology Co., Ltd. Image processing method, device, equipment and computer readable medium
CN112507813A (en) * 2020-11-23 2021-03-16 Beijing Megvii Technology Co., Ltd. Event detection method and device, electronic equipment and storage medium
CN112434627A (en) * 2020-11-30 2021-03-02 Zhejiang Dahua Technology Co., Ltd. Method and device for detecting pedestrian crossing road guardrail and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZICHAO DONG et al.: "DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification", Computer Science, pages 1-8 *
XU Yang; WU Chengdong; CHEN Dongyue: "A survey of automatic traffic incident detection algorithms based on video images", Application Research of Computers, no. 04, pages 12-16 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642403A (en) * 2021-07-13 2021-11-12 Chongqing University of Science and Technology Crowd abnormality intelligent safety detection system based on edge computing
CN114973165A (en) * 2022-07-14 2022-08-30 Zhejiang Dahua Technology Co., Ltd. Event recognition algorithm testing method and device and electronic equipment
CN114973165B (en) * 2022-07-14 2022-10-25 Zhejiang Dahua Technology Co., Ltd. Event recognition algorithm testing method and device and electronic equipment

Also Published As

Publication number Publication date
CN113052048B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
US20220254251A1 (en) Method of recognizing illegal parking of vehicle, device and storage medium
US11783588B2 (en) Method for acquiring traffic state, relevant apparatus, roadside device and cloud control platform
CN112863187B (en) Detection method of perception model, electronic equipment, road side equipment and cloud control platform
CN113052048B (en) Traffic event detection method and device, road side equipment and cloud control platform
CN113012176A (en) Sample image processing method and device, electronic equipment and storage medium
CN112528927A (en) Confidence determination method based on trajectory analysis, roadside equipment and cloud control platform
CN113052047B (en) Traffic event detection method, road side equipment, cloud control platform and system
CN114022865A (en) Image processing method, apparatus, device and medium based on lane line recognition model
CN113807184A (en) Obstacle detection method and device, electronic equipment and automatic driving vehicle
EP4080479A2 (en) Method for identifying traffic light, device, cloud control platform and vehicle-road coordination system
CN116311071A (en) Substation perimeter foreign matter identification method and system integrating frame difference and CA
CN115761698A (en) Target detection method, device, equipment and storage medium
CN116311166A (en) Traffic obstacle recognition method and device and electronic equipment
CN115908816A (en) Accumulated water identification method, device, equipment and storage medium based on artificial intelligence
CN114708498A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN114333409A (en) Target tracking method and device, electronic equipment and storage medium
CN113807209A (en) Parking space detection method and device, electronic equipment and storage medium
CN113806361B (en) Method, device and storage medium for associating electronic monitoring equipment with road
CN113012439B (en) Vehicle detection method, device, equipment and storage medium
CN113963300B (en) Target detection method, device, electronic equipment and storage medium
CN114842465A (en) License plate detection method and device, electronic equipment, medium and intelligent transportation equipment
CN117746126A (en) Method, system and device for detecting disease of manhole cover of slow-moving road
CN115810270A (en) Vehicle steering detection method and device, electronic equipment and storage medium
CN117078997A (en) Image processing or training method, device, equipment and medium of image processing model
CN116189110A (en) Image-based parking space information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211019

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 2/F, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
GR01 Patent grant