CN114005104A - Intelligent driving method and device based on artificial intelligence and related products - Google Patents

Intelligent driving method and device based on artificial intelligence and related products

Info

Publication number
CN114005104A
Authority
CN
China
Prior art keywords
automobile
determining
action
time
dangerous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111283225.9A
Other languages
Chinese (zh)
Inventor
艾的梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Chuang Le Hui Technology Co ltd
Original Assignee
Shenzhen Chuang Le Hui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Chuang Le Hui Technology Co ltd filed Critical Shenzhen Chuang Le Hui Technology Co ltd
Priority to CN202111283225.9A priority Critical patent/CN114005104A/en
Publication of CN114005104A publication Critical patent/CN114005104A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)

Abstract

The embodiments of the application provide an intelligent driving method and device based on artificial intelligence and a related product, wherein the method comprises the following steps: acquiring image frames collected by a camera installed in an automobile when the automobile is detected to be running; performing action recognition on a target human body in the image frames collected by the camera to obtain an action recognition result of the target human body; acquiring road condition data of the road surface on which the automobile is currently running, and determining candidate dangerous actions according to the road condition data; when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions and the time for which the dangerous action has been performed exceeds a preset duration, determining the danger level of the dangerous action and the duration of the dangerous action; determining an alarm level according to the danger level and the duration; and determining alarm information according to the alarm level, and outputting the alarm information through the alarm device of the automobile.

Description

Intelligent driving method and device based on artificial intelligence and related products
Technical Field
The application relates to the technical field of image processing, in particular to an intelligent driving method and device based on artificial intelligence and a related product.
Background
With the development of traffic systems, driving automobiles has become increasingly common and the number of automobiles keeps growing. Driving on the road must follow certain traffic rules, but because an automobile is operated manually, improper manual operation may cause dangerous traffic accidents.
Disclosure of Invention
The embodiments of the application provide an intelligent driving method and device based on artificial intelligence and a related product, so as to improve driving safety.
An artificial intelligence based intelligent driving method, the method comprising:
the method comprises the steps of acquiring image frames collected by a camera installed in an automobile when the automobile is detected to run;
performing action recognition on a target human body in an image frame acquired by the camera to obtain an action recognition result of the target human body;
acquiring road condition data of a road surface on which the automobile currently runs, and determining candidate dangerous actions according to the road condition data;
when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions, and the time for which the dangerous action has been performed exceeds a preset duration, determining the danger level of the dangerous action and the duration of the dangerous action;
determining an alarm level according to the danger level and the duration;
and determining alarm information according to the alarm grade, and outputting the alarm information through the alarm equipment of the automobile.
Further, the acquiring an image frame collected by a camera installed in the automobile when the automobile is detected to be running includes:
continuously acquiring running speed data of the automobile after the automobile is started;
when the duration that the driving speed data is continuously greater than the first speed threshold exceeds a first duration threshold, starting to acquire image frames acquired by a camera installed in an automobile;
and when the duration of the running speed data which is continuously less than the second speed threshold exceeds the second duration threshold, stopping acquiring the image frames acquired by the camera arranged in the automobile.
Further, the obtaining road condition data of the road surface on which the automobile currently runs and determining candidate dangerous actions according to the road condition data includes:
acquiring road condition data of a road surface on which the automobile currently runs and driving data of the automobile;
determining a first candidate dangerous action according to the road condition data, and determining a second candidate dangerous action according to the driving data;
and obtaining the candidate dangerous action according to the first candidate dangerous action and the second candidate dangerous action.
Further, the determining a first candidate dangerous action according to the traffic data includes:
acquiring the road type, the congestion condition and pedestrian data of the road surface on which the automobile currently runs, and determining a road condition danger coefficient according to the road type, the congestion condition and the pedestrian data;
determining a first candidate dangerous action according to the road condition danger coefficient;
the determining a second candidate dangerous action according to the driving data comprises:
acquiring the continuous running time length and the average running speed of the automobile, and determining a driving risk coefficient according to the continuous running time length and the average running speed;
and determining a second candidate dangerous action according to the driving danger coefficient.
Further, the determining an alarm level according to the risk level and the duration includes:
when the danger level exceeds a preset level and the duration exceeds a preset duration, determining an automobile control parameter according to the danger level and the duration, and controlling the driving performance of the automobile according to the automobile control parameter;
and when the danger level does not exceed a preset level or the duration does not exceed a preset duration, determining an alarm level according to the danger level and the duration.
Further, the motion recognition of the target human body in the image frame acquired by the camera to obtain a motion recognition result of the target human body includes:
extracting spatial interactive characteristics through a spatial flow convolution neural network aiming at the image frames collected by the camera, and extracting global spatial discriminative characteristics by utilizing a bidirectional LSTM;
extracting time interactive features through a time flow convolutional neural network, extracting global time features from the time interactive features through a three-dimensional convolutional neural network, and constructing a time attention model guided by optical flow to calculate global time discriminative features according to the global time features;
performing classification processing according to the global time discriminative feature to obtain a first classification result, and performing classification processing according to the global space discriminative feature to obtain a second classification result;
and fusing the first classification result and the second classification result to obtain a fusion classification result, and obtaining an action recognition result of the target human body according to the fusion classification result.
Further, the extracting the spatial interactivity features through a spatial stream convolutional neural network comprises:
inputting the image frame into a behavior significance detection network model to obtain a detection result, and obtaining a spatial interactivity characteristic according to the detection result;
constructing a mask-guided spatial attention model according to the image frame and the spatial interactive characteristics to obtain spatial discriminative characteristics;
determining a spatial interactivity characteristic according to the temporal attention weight and the spatial discriminative characteristic;
the method comprises the steps of extracting time interactive features through a time flow convolution neural network, extracting global time features from the time interactive features through a three-dimensional convolution neural network, and constructing a time attention model guided by an optical flow to calculate global time discriminative features according to the global time features, and comprises the following steps:
performing optical flow calculation on the shot image through a TVNet network to obtain an optical flow frame;
weighting the obtained optical flow frame according to the spatial attention weight to obtain the time interactive feature;
extracting global time characteristics from the time interactive characteristics through a three-dimensional convolutional neural network;
inputting the global time characteristic into a time attention model guided by optical flow to obtain a time attention weight, and weighting the global time characteristic through the time attention weight to obtain a global time discriminative characteristic;
the method for fusing the first classification result and the second classification result comprises the following steps:
S_r = ((1 + C_1^2) / (1 + C_2^2)) * S_1 + (1 - (1 + C_1^2) / (1 + C_2^2)) * S_2
where S_1 represents the first classification result, S_2 represents the second classification result, S_r represents the fusion classification result, and C_1 and C_2 represent variables defined during the fusion, with C_1 less than or equal to C_2.
An intelligent driving apparatus based on artificial intelligence, the apparatus comprising:
the system comprises an image acquisition module, a data acquisition module and a data processing module, wherein the image acquisition module is used for acquiring image frames acquired by a camera arranged in an automobile in the process of detecting that the automobile runs;
the action recognition module is used for carrying out action recognition on a target human body in the image frame acquired by the camera to obtain an action recognition result of the target human body;
the danger identification module is used for acquiring road condition data of a road surface on which the automobile runs currently and determining candidate dangerous actions according to the road condition data;
the duration obtaining module is used for determining the danger level of the dangerous action and the duration of the dangerous action when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions and the time for which the dangerous action has been performed exceeds a preset duration;
the grade acquisition module is used for determining an alarm grade according to the danger grade and the duration;
and the information output module is used for determining alarm information according to the alarm grade and outputting the alarm information through the alarm equipment of the automobile.
An electronic device comprising a memory having computer-executable instructions stored thereon and a processor that implements the method when executing the computer-executable instructions on the memory.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the above-mentioned method.
According to the intelligent driving method and device based on artificial intelligence and the related products, the action of the driver can be identified according to the image collected by the camera, the dangerous action of the driver is determined according to the current driving road condition data and the identified action, and corresponding alarm information is output according to the execution duration and the danger level of the dangerous action so as to prompt the dangerous action of the user and reduce the driving risk.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a flow diagram illustrating an artificial intelligence based intelligent driving method in one embodiment.
Fig. 2 is a schematic structural diagram of an intelligent driving device based on artificial intelligence in one embodiment.
Fig. 3 is a schematic diagram of a network structure for performing motion recognition on a target human body in one embodiment.
FIG. 4 is a diagram illustrating the hardware components of an artificial intelligence based intelligent driving system in one embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
FIG. 1 is a flow diagram illustrating an artificial intelligence based intelligent driving method in one embodiment. The intelligent driving method based on artificial intelligence comprises the following steps:
and 102, acquiring image frames collected by a camera installed in the automobile when the automobile is detected to run.
In the embodiments provided by the application, a camera can be installed inside the automobile; the camera remains on while the automobile is running, and image frames of the driver are collected through the camera. Monitoring of the driver is realized through the image frames captured of the driver.
And 104, performing motion recognition on the target human body in the image frame acquired by the camera to obtain a motion recognition result of the target human body.
By recognizing the images collected by the camera, the target human body in the image, namely the human body contour of the driver, is identified. Action recognition is then performed on the target human body to obtain an action recognition result.
In this embodiment, the specific algorithm for action recognition is not limited, and action recognition of the target human body in the image may be implemented by any method. For example, it may be recognized that the driver is performing actions such as "hands off the steering wheel", "smoking", "making a call", and the like.
And step 106, acquiring road condition data of the road on which the automobile runs currently, and determining candidate dangerous actions according to the road condition data.
Specifically, the road condition data represents the road condition of the road on which the vehicle is currently driving. For example, the current road is an expressway or an urban road, the congestion degree of the traffic of the current road section, the road surface flatness of the current road, and the like.
Candidate dangerous actions are determined according to the road condition data. For example, when the vehicle runs at high speed and the danger level is higher, more categories of candidate dangerous actions are set; for urban roads or roads with a low driving speed, fewer categories of candidate dangerous actions are set.
And step 108, when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions and the time for which the dangerous action has been performed exceeds the preset duration, determining the danger level of the dangerous action and the duration of the dangerous action.
After the candidate dangerous action is determined, the identified action of the target human body is compared with the candidate dangerous action, whether the identified action of the target human body is matched with the candidate dangerous action or not is determined, and if the action of the target human body is matched with the candidate dangerous action, the driver is indicated to execute the dangerous action.
When the identified action matches a dangerous action in the candidate dangerous actions, the danger level of the dangerous action may be determined first, and a higher danger level indicates a higher degree of danger for performing the action. The length of time for performing the hazardous action is then determined, and may specifically be determined by multiplying the number of image frames in which the hazardous action occurs by the time interval between two image frames. If the duration of continuously executing the dangerous action exceeds the preset duration, the alarm operation can be executed.
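As an illustrative sketch (not part of the original disclosure), the duration calculation and the preset-duration check described above can be expressed as follows; the function name, frame interval, and threshold are assumptions.

```python
def dangerous_action_duration(num_frames_with_action: int, frame_interval_s: float) -> float:
    """Duration of a dangerous action, estimated as the number of image frames in
    which the action is detected multiplied by the interval between adjacent frames."""
    return num_frames_with_action * frame_interval_s

# Example: the action is detected in 45 consecutive frames sampled every 0.2 s,
# and the preset duration threshold is assumed to be 5 s.
duration_s = dangerous_action_duration(45, 0.2)   # 9.0 s
should_alarm = duration_s > 5.0                   # True -> continue to the alarm logic
print(duration_s, should_alarm)
```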
And step 110, determining an alarm level according to the danger level and the duration.
Specifically, the alarm level may be determined based on the hazard level and the duration. Generally, the higher the danger level and the longer the duration, the higher the corresponding alarm level; the lower the risk level and the shorter the duration, the lower the corresponding alarm level. In the present embodiment, the specific correspondence relationship is not limited.
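One possible correspondence, sketched below under assumed thresholds (the embodiment deliberately leaves the specific mapping open), makes the alarm level grow with both the danger level and the duration.

```python
def alarm_level(danger_level: int, duration_s: float) -> int:
    """Map the danger level and the duration to an alarm level in {1, 2, 3}.
    Higher danger level and longer duration give a higher alarm level; the
    weighting and thresholds below are illustrative assumptions."""
    score = danger_level + int(duration_s // 5)   # one extra point per 5 s of duration
    if score >= 6:
        return 3
    if score >= 4:
        return 2
    return 1

print(alarm_level(danger_level=2, duration_s=12.0))   # -> 2
```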
And 112, determining alarm information according to the alarm grade, and outputting the alarm information through the alarm equipment of the automobile.
And determining alarm information according to the determined alarm grade, and outputting the alarm information through the alarm equipment of the automobile to prompt a driver to stop dangerous actions and reduce driving risks.
The artificial intelligence intelligent driving method provided by the embodiment can identify the action of the driver according to the image acquired by the camera, determine the dangerous action of the current driver according to the current driving road condition data and the identified action, and output corresponding alarm information according to the execution duration and the danger level of the dangerous action so as to prompt the dangerous behavior of the user, thereby reducing the driving risk.
In one embodiment, acquiring image frames collected by a camera installed in an automobile during the process of detecting the driving of the automobile comprises: continuously acquiring running speed data of the automobile after the automobile is started; when the duration that the driving speed data is continuously greater than the first speed threshold exceeds a first duration threshold, starting to acquire image frames acquired by a camera installed in an automobile; and when the duration of the running speed data which is continuously less than the second speed threshold exceeds the second duration threshold, stopping acquiring the image frames acquired by the camera arranged in the automobile.
Specifically, after the automobile is started, the running speed data of the automobile can be continuously acquired, for example, 80 km/h (kilometers per hour). The state of the automobile is judged according to the running speed data: if the running speed continuously exceeds a certain value, the automobile is considered to have gradually reached a stable running state after starting, so image acquisition and recognition of the driver's actions can begin. If the running speed is continuously lower than a certain value, the automobile is considered to be gradually coming to a stop, so image acquisition can be stopped and recognition of the driver's actions ended.
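A minimal sketch of this start/stop behavior as a small state machine; the speed thresholds (v1, v2) and duration thresholds (t1, t2) are hypothetical values, not taken from the embodiment.

```python
class CaptureController:
    """Start capturing when the speed stays above v1 for at least t1 seconds;
    stop capturing when it stays below v2 for at least t2 seconds."""

    def __init__(self, v1: float = 30.0, t1: float = 10.0,
                 v2: float = 5.0, t2: float = 30.0):
        self.v1, self.t1, self.v2, self.t2 = v1, t1, v2, t2
        self.capturing = False
        self.above_since = None   # moment the speed last rose above v1
        self.below_since = None   # moment the speed last dropped below v2

    def update(self, t: float, speed_kmh: float) -> bool:
        if not self.capturing:
            if speed_kmh > self.v1:
                self.above_since = self.above_since if self.above_since is not None else t
                if t - self.above_since >= self.t1:
                    self.capturing, self.below_since = True, None
            else:
                self.above_since = None
        else:
            if speed_kmh < self.v2:
                self.below_since = self.below_since if self.below_since is not None else t
                if t - self.below_since >= self.t2:
                    self.capturing, self.above_since = False, None
            else:
                self.below_since = None
        return self.capturing

ctrl = CaptureController()
print(ctrl.update(0.0, 40.0), ctrl.update(12.0, 40.0))   # False, then True
```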
In one embodiment, obtaining road condition data of a road on which an automobile is currently driving, and determining candidate dangerous actions according to the road condition data comprises: acquiring road condition data of a road surface on which an automobile currently runs and driving data of the automobile; determining a first candidate dangerous action according to the road condition data, and determining a second candidate dangerous action according to the driving data; and obtaining candidate dangerous actions according to the first candidate dangerous actions and the second candidate dangerous actions.
Specifically, road condition data of the road surface and driving data of the vehicle may also be obtained, and the driving data of the vehicle may specifically represent data such as driving speed, driving duration, driving distance, and the like of the vehicle, which is not limited herein. And then obtaining a first candidate dangerous action according to the road condition data, obtaining a second candidate dangerous action according to the driving data, and then obtaining a final result.
In one embodiment, determining a first candidate risky action based on the traffic data comprises: acquiring the road type, the congestion condition and pedestrian data of a road surface on which the automobile currently runs, and determining a road condition danger coefficient according to the road type, the congestion condition and the pedestrian data; determining a first candidate dangerous action according to the road condition danger coefficient; determining a second candidate hazardous action from the driving data, comprising: acquiring the continuous running time length and the average running speed of the automobile, and determining a driving risk coefficient according to the continuous running time length and the average running speed; and determining a second candidate dangerous action according to the driving danger coefficient.
Further, the road condition data may include a road type, a congestion condition and pedestrian data, and then the road condition risk factor of the currently driving road surface is determined according to the road type, the congestion condition and the pedestrian data. The driving data includes a continuous driving time period and an average driving speed of the vehicle, and the driving risk coefficient is determined according to the continuous driving time period and the average driving speed of the vehicle. And finally, obtaining the final candidate dangerous action according to the road condition danger coefficient and the driving danger coefficient. The higher the risk factor, the more dangerous actions are ultimately determined.
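A hedged sketch of how the two coefficients might be combined to select candidate dangerous actions; the coefficient formulas, weights, action names, and thresholds are assumptions, since the embodiment does not fix them.

```python
def road_risk_coefficient(road_type: str, congestion: float, pedestrians: int) -> float:
    """Illustrative road-condition risk coefficient from road type, congestion (0-1),
    and pedestrian count (assumed weights)."""
    base = {"expressway": 0.8, "urban": 0.5, "rural": 0.4}.get(road_type, 0.5)
    return base + 0.3 * congestion + 0.02 * min(pedestrians, 20)

def driving_risk_coefficient(continuous_hours: float, avg_speed_kmh: float) -> float:
    """Illustrative driving risk coefficient from continuous driving time and average speed."""
    return 0.2 * continuous_hours + 0.01 * avg_speed_kmh

def candidate_dangerous_actions(road_coeff: float, drive_coeff: float) -> set:
    """The higher the coefficients, the more action categories are monitored."""
    actions = {"hands off steering wheel"}                      # always monitored
    if road_coeff > 0.8 or drive_coeff > 0.8:
        actions |= {"making a call", "smoking"}
    if road_coeff > 1.2 and drive_coeff > 1.0:
        actions |= {"looking away from the road", "eating"}
    return actions

road = road_risk_coefficient("expressway", congestion=0.6, pedestrians=2)
drive = driving_risk_coefficient(continuous_hours=3.0, avg_speed_kmh=100.0)
print(candidate_dangerous_actions(road, drive))
```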
In one embodiment, determining the alert level based on the hazard level and the duration includes: when the danger level exceeds a preset level and the duration exceeds a preset duration, determining automobile control parameters according to the danger level and the duration, and controlling the driving performance of the automobile according to the automobile control parameters; and when the danger level does not exceed the preset level or the duration does not exceed the preset duration, determining the alarm level according to the danger level and the duration.
It is understood that when the danger level and the duration exceed certain values, the vehicle parameters can be appropriately controlled according to the danger level and the duration, thereby controlling the drivability of the vehicle, and thus forcibly reducing the danger level of driving.
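The branch between forcibly adjusting the vehicle and merely alarming could be sketched as below; the control parameter (a speed cap), the preset level, and the preset duration are hypothetical.

```python
def handle_dangerous_action(danger_level: int, duration_s: float,
                            preset_level: int = 3, preset_duration_s: float = 10.0) -> dict:
    """If both the danger level and the duration exceed their presets, derive a
    car-control parameter; otherwise fall back to the alarm path."""
    if danger_level > preset_level and duration_s > preset_duration_s:
        # Illustrative control parameter: cap the speed more aggressively the higher
        # the danger level and the longer the duration (values assumed).
        speed_cap_kmh = max(30.0, 120.0 - 10.0 * danger_level - duration_s)
        return {"action": "control", "speed_cap_kmh": speed_cap_kmh}
    # Otherwise determine an alarm level (a trivial placeholder mapping here).
    return {"action": "alarm", "alarm_level": min(3, 1 + danger_level // 2)}

print(handle_dangerous_action(4, 20.0))   # {'action': 'control', 'speed_cap_kmh': 60.0}
print(handle_dangerous_action(2, 20.0))   # {'action': 'alarm', 'alarm_level': 2}
```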
In one embodiment, the motion recognition of the target human body in the image frames acquired by the camera to obtain the motion recognition result of the target human body includes: extracting spatial interactive characteristics through a spatial flow convolution neural network aiming at an image collected by a camera, and extracting global spatial discriminative characteristics by utilizing a bidirectional LSTM; extracting time interactive characteristics through a time flow convolutional neural network, extracting global time characteristics from the time interactive characteristics through a three-dimensional convolutional neural network, and constructing a time attention model guided by an optical flow to calculate global time discriminative characteristics according to the global time characteristics; performing classification processing according to the global time discriminative features to obtain a first classification result, and performing classification processing according to the global space discriminative features to obtain a second classification result; and fusing the first classification result and the second classification result to obtain a fusion classification result, and obtaining an action recognition result of the target human body according to the fusion classification result.
Specifically, the action recognition process mainly obtains motion characteristics of the human body's movement from the temporal features and spatial features of consecutive images. The final action recognition result is then obtained by combining the recognition result based on the temporal features with the recognition result based on the spatial features. The action recognition result obtained in this way integrates the temporal and spatial characteristics of the human body's movement.
Specifically, the method for extracting the spatial interactivity features through the spatial stream convolutional neural network comprises the following steps:
inputting image frames into a behavior significance detection network model to obtain a detection result, and obtaining a spatial interactivity characteristic according to the detection result;
constructing a mask-guided spatial attention model according to the image frame and the spatial interactive characteristics to obtain spatial discriminative characteristics;
determining a spatial interactivity characteristic according to the temporal attention weight and the spatial discriminative characteristic;
extracting time interactive features through a time flow convolutional neural network, extracting global time features from the time interactive features through a three-dimensional convolutional neural network, and constructing a time attention model guided by optical flow to calculate global time discriminative features according to the global time features, wherein the method comprises the following steps:
performing optical flow calculation on the shot image through a TVNet network to obtain an optical flow frame;
weighting the obtained optical flow frame according to the spatial attention weight to obtain a time interactive characteristic;
extracting global time characteristics from the time interactive characteristics through a three-dimensional convolution neural network;
inputting the global time characteristic into a time attention model guided by optical flow to obtain a time attention weight, and weighting the global time characteristic through the time attention weight to obtain a global time discriminant characteristic;
the method for fusing the first classification result and the second classification result is as follows:
S_r = ((1 + C_1^2) / (1 + C_2^2)) * S_1 + (1 - (1 + C_1^2) / (1 + C_2^2)) * S_2
where S_1 denotes the first classification result, S_2 denotes the second classification result, S_r denotes the fusion classification result, and C_1 and C_2 denote variables defined during the fusion, with C_1 less than or equal to C_2.
In the embodiment provided by the present application, a network structure for performing motion recognition on a target human body is shown in fig. 3, and the motion recognition method specifically may include the following steps:
1) Acquiring RGB captured images from a continuous image stream: obtain the original RGB captured images F_RGB = {f_1, f_2, …, f_N}, where N is the number of sampled frames and f_i represents the i-th frame.
2) Calculating the optical flow graph: apply the TVNet network to the RGB captured images F_RGB, computing pairwise between adjacent frames, to obtain the optical flow frames F_OPT = {o_1, o_2, …, o_(N-1)}, where o_i represents the i-th optical flow frame.
3) Training a behavior saliency detection network model based on the Mask R-CNN segmentation technique: with each original captured image in F_RGB as input, a detection image is generated; the output form is then modified to obtain the spatial interactivity features M_RGB = {m_1, m_2, …, m_N}.
4) With the original RGB captured images F_RGB and the spatial interactivity features M_RGB as input, construct a mask-guided spatial attention model, calculate the spatial attention weight W_S, and generate the spatially discriminative features K_RGB by attention weighting.
5) Weight the optical flow frames F_OPT with the spatial attention weight W_S calculated in step 4) to obtain the temporal interactivity features I_OPT.
6) With the temporal interactivity features I_OPT as input, use a three-dimensional convolutional neural network to extract the global temporal features G_OPT.
7) With the global temporal features G_OPT as input, construct an optical-flow-guided temporal attention model, calculate the temporal attention weight W_t, and generate the global temporally discriminative features GK_OPT by attention weighting.
8) Weight the spatially discriminative features K_RGB with the temporal attention weight W_t calculated in step 7) to obtain the spatial interactivity features I_RGB.
9) With the spatial interactivity features I_RGB as input, further extract the global spatially discriminative features GK_RGB based on a bidirectional long short-term memory network, and then calculate the first classification result, namely the spatial probability score S_1, through a fully connected layer and Softmax classification.
10) With the global temporally discriminative features GK_OPT as input, calculate the second classification result, namely the temporal probability score S_2, through a fully connected layer and Softmax classification.
11) Fuse the spatial probability score S_1 and the temporal probability score S_2 to generate the final predicted result score S_r.
Step 3) of the above process modifies the output form of the detection images to compute the local mask feature maps m_i; that is, only the detected discriminative region is retained, and the pixel values of the remaining image area are set to 0. The calculation process is represented as (formula 1):
m_i(p, q) = the pixel value of the detection image at (p, q), if (p, q) lies within the detected discriminative region; otherwise m_i(p, q) = 0   (formula 1)
where (p, q) represents the position of a pixel point. For example, the data sets each contain different objects and human bodies; the foreground and background of each detection image are separated by computing the local mask feature map.
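A small numpy sketch of the masking in (formula 1), assuming the detection network yields a binary mask of the discriminative region; the array names and sizes are illustrative.

```python
import numpy as np

def local_mask_feature_map(image: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Keep the pixel values inside the detected discriminative region and set the
    remaining image area to 0, as described by (formula 1)."""
    assert image.shape[:2] == region_mask.shape
    return image * region_mask[..., None]   # broadcast the H x W mask over the channels

frame = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
mask = np.zeros((224, 224), dtype=np.uint8)
mask[80:160, 60:180] = 1                     # hypothetical detected region
m_i = local_mask_feature_map(frame, mask)
print(m_i[0, 0].tolist(), m_i[100, 100].tolist())   # background zeroed, region kept
```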
In the above process, with the RGB image frames F_RGB and the spatial interactivity features M_RGB as input, a mask-guided spatial attention model is constructed. Each spatial interactivity feature m_i is passed through an L-Net network, and each RGB image frame f_i is passed through a G-Net network. L-Net and G-Net have the same network structure, but their parameters are not shared. The two networks each generate a feature map, denoted F_L and F_G respectively. The processing of L-Net and G-Net can be expressed as (formula 2) to (formula 5):
I_i = Inc(m_i)   (formula 2)
F_L = GAP(I_i)   (formula 3)
G_i = Inc(f_i)   (formula 4)
F_G = GAP(G_i)   (formula 5)
where F_L and F_G represent the local feature and the global feature respectively; Inc denotes the Inception v3 network; GAP denotes global average pooling, which for a feature of dimension W × H × C produces an output of dimension 1 × 1 × C, i.e., the global information of each feature channel. The two features are then concatenated along the channel dimension to form F, as shown in (formula 6), which yields a richer feature representation:
F = [F_L, F_G]   (formula 6)
where [·, ·] denotes channel-wise concatenation.
Taking F as an input, constructing a spatial attention model to re-weight F to obtain a weighted feature map, wherein the weighting process can be described by the following formula:
W_S1 = γ(FC_S1(GAP(F)))   (formula 7)
W_S = σ(FC_S2(W_S1))   (formula 8)
K_RGB = F ⊙ W_S   (formula 9)
where γ denotes the ReLU activation function, σ denotes the Sigmoid activation function, FC_S1 and FC_S2 denote two fully connected layers, GAP denotes global average pooling, and ⊙ denotes channel-level multiplication. After the GAP and the first fully connected layer, W_S1 is a channel-descriptor vector, and the final weight W_S has a size matching the channel dimension of F, so that the channel-level multiplication in (formula 9) is well defined. The spatial attention weight W_S is multiplied with the original feature F to selectively highlight valid features and weaken invalid ones.
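A compact PyTorch sketch of the mask-guided spatial attention of (formula 2) to (formula 9). The small convolutional backbones stand in for the Inception v3 L-Net/G-Net, and the channel counts and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class MaskGuidedSpatialAttention(nn.Module):
    """L-Net/G-Net feature extraction, GAP, channel concatenation, and a
    two-FC-layer attention that re-weights the concatenated feature (formulas 2-9)."""

    def __init__(self, channels: int = 64, reduction: int = 4):
        super().__init__()
        def backbone():   # stand-in for Inception v3 (assumption)
            return nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.l_net = backbone()               # processes the masked frames m_i
        self.g_net = backbone()               # processes the RGB frames f_i (no parameter sharing)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(2 * channels, 2 * channels // reduction)
        self.fc2 = nn.Linear(2 * channels // reduction, 2 * channels)

    def forward(self, f_rgb: torch.Tensor, m_rgb: torch.Tensor) -> torch.Tensor:
        f_l = self.gap(self.l_net(m_rgb)).flatten(1)               # F_L (formulas 2-3)
        f_g = self.gap(self.g_net(f_rgb)).flatten(1)               # F_G (formulas 4-5)
        feat = torch.cat([f_l, f_g], dim=1)                        # F (formula 6)
        w_s = torch.sigmoid(self.fc2(torch.relu(self.fc1(feat))))  # W_S (formulas 7-8)
        return feat * w_s                                          # K_RGB (formula 9)

frames = torch.randn(2, 3, 112, 112)     # RGB frames f_i
masked = torch.randn(2, 3, 112, 112)     # masked frames / spatial interactivity features m_i
print(MaskGuidedSpatialAttention()(frames, masked).shape)   # torch.Size([2, 128])
```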
Step 7) of the above process takes the global temporal feature G_OPT as input and constructs an optical-flow-guided temporal attention model. The calculation of the temporal attention weight is converted into a calculation of channel attention. The dimensions of the feature map are changed and global average pooling is performed, compressing all information into channel descriptors whose statistics can represent the entire video. This global average pooling can be expressed as (formula 10):
F_g'(c) = (1 / (W × H)) * Σ_{i=1}^{W} Σ_{j=1}^{H} G_OPT(i, j, c),  c = 1, …, o   (formula 10)
where W and H represent the width and height respectively, and o represents the number of channels. The compressed feature map is then fed into a network consisting of two fully connected layers so as to obtain the interdependencies along time. The size of the second fully connected layer is consistent with the channel number o of the input feature map, and the newly learned weight performs a channel-level multiplication with the original feature G_OPT:
W_t1 = γ(FC_t1(F_g'))   (formula 11)
W_t = σ(FC_t2(W_t1))   (formula 12)
GK_OPT = G_OPT ⊙ W_t   (formula 13)
where W_t represents the temporal attention weight, γ denotes the ReLU activation function, σ denotes the Sigmoid activation function, and FC_t1 and FC_t2 denote two fully connected layers.
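Likewise, the optical-flow-guided temporal attention of (formula 10) to (formula 13) can be sketched in PyTorch as channel attention over the global temporal feature; the channel count and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Per-channel pooling of G_OPT into channel descriptors, two fully connected
    layers, and channel-level re-weighting (formulas 10-13)."""

    def __init__(self, channels: int = 64, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)   # size matches the channel number

    def forward(self, g_opt: torch.Tensor) -> torch.Tensor:
        # g_opt: (batch, channels, T, H, W), the global temporal feature from the 3D CNN
        f_g = g_opt.mean(dim=(2, 3, 4))                           # channel descriptors (formula 10)
        w_t = torch.sigmoid(self.fc2(torch.relu(self.fc1(f_g))))  # W_t (formulas 11-12)
        return g_opt * w_t[:, :, None, None, None]                # GK_OPT (formula 13)

g_opt = torch.randn(2, 64, 8, 14, 14)    # hypothetical G_OPT
print(TemporalAttention()(g_opt).shape)  # torch.Size([2, 64, 8, 14, 14])
```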
In step 11) of the above flow, the method for fusing the first classification result and the second classification result is as follows:
S_r = ((1 + C_1^2) / (1 + C_2^2)) * S_1 + (1 - (1 + C_1^2) / (1 + C_2^2)) * S_2   (formula 14)
where S_1 denotes the first classification result, S_2 denotes the second classification result, S_r denotes the fusion classification result, and C_1 and C_2 denote variables defined during the fusion, with C_1 less than or equal to C_2. C_1 and C_2 may be set empirically or preset in advance, which is not limited herein.
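For completeness, (formula 14) amounts to a convex combination of the two scores with weight (1 + C_1^2)/(1 + C_2^2); a minimal sketch, with hypothetical per-class scores and empirically chosen C_1, C_2:

```python
def fuse_scores(s1, s2, c1: float, c2: float):
    """Fuse the two classification results according to (formula 14); requires c1 <= c2."""
    assert c1 <= c2
    w = (1 + c1 ** 2) / (1 + c2 ** 2)
    return [w * a + (1 - w) * b for a, b in zip(s1, s2)]

s_1 = [0.10, 0.70, 0.20]                       # e.g. the spatial probability score
s_2 = [0.05, 0.60, 0.35]                       # e.g. the temporal probability score
print(fuse_scores(s_1, s_2, c1=1.0, c2=2.0))   # approximately [0.07, 0.64, 0.29]
```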
Fig. 2 is a schematic structural diagram of an intelligent driving device based on artificial intelligence in one embodiment. As shown in fig. 2, the intelligent driving apparatus based on artificial intelligence includes:
the image acquisition module 302 is configured to acquire an image frame acquired by a camera mounted in an automobile when the automobile is detected to be running;
the action recognition module 304 is configured to perform action recognition on a target human body in the image frame acquired by the camera to obtain an action recognition result of the target human body;
the danger identification module 306 is configured to obtain road condition data of a road on which the automobile currently runs, and determine candidate dangerous actions according to the road condition data;
a duration obtaining module 308, configured to determine the danger level of the dangerous action and the duration of the dangerous action when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions and the time for which the dangerous action has been performed exceeds a preset duration;
a grade obtaining module 310, configured to determine an alarm grade according to the risk grade and the duration;
and an information output module 312, configured to determine alarm information according to the alarm level, and output the alarm information through an alarm device of the automobile.
The artificial intelligence intelligent driving device provided by the embodiment can identify the action of the driver according to the image collected by the camera, determine the dangerous action of the current driver according to the current driving road condition data and the identified action, and output corresponding alarm information according to the execution duration and the danger level of the dangerous action so as to prompt the dangerous action of the user, thereby reducing the driving risk.
FIG. 4 is a diagram illustrating the hardware components of an artificial intelligence based intelligent driving system in one embodiment. It will be appreciated that fig. 4 only shows a simplified design of the electronic device. In practical applications, the electronic device may further include other necessary components, including but not limited to any number of input/output systems, processors, controllers, memories, etc., and all electronic devices that can implement the intelligent driving method based on artificial intelligence according to the embodiments of the present application are within the scope of the present application.
The memory includes, but is not limited to, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a portable read-only memory (CD-ROM), which is used for storing instructions and data.
The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.
The processor may include one or more processors, for example, one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor may also include one or more special purpose processors, which may include GPUs, FPGAs, etc., for accelerated processing.
The memory is used to store program codes and data of the network device.
The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An artificial intelligence based intelligent driving method, characterized in that the method comprises:
the method comprises the steps of acquiring image frames collected by a camera installed in an automobile when the automobile is detected to run;
performing action recognition on a target human body in an image frame acquired by the camera to obtain an action recognition result of the target human body;
acquiring road condition data of a road surface on which the automobile currently runs, and determining candidate dangerous actions according to the road condition data;
when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions, and the time for which the dangerous action has been performed exceeds a preset duration, determining the danger level of the dangerous action and the duration of the dangerous action;
determining an alarm level according to the danger level and the duration;
determining alarm information according to the alarm grade, and outputting the alarm information through alarm equipment of the automobile;
the motion recognition of the target human body in the image frame acquired by the camera to obtain the motion recognition result of the target human body includes:
extracting spatial interactive characteristics through a spatial flow convolution neural network aiming at the image frames collected by the camera, and extracting global spatial discriminative characteristics by utilizing a bidirectional LSTM;
extracting time interactive features through a time flow convolutional neural network, extracting global time features from the time interactive features through a three-dimensional convolutional neural network, and constructing a time attention model guided by optical flow to calculate global time discriminative features according to the global time features;
performing classification processing according to the global time discriminative feature to obtain a first classification result, and performing classification processing according to the global space discriminative feature to obtain a second classification result;
and fusing the first classification result and the second classification result to obtain a fusion classification result, and obtaining an action recognition result of the target human body according to the fusion classification result.
2. The method according to claim 1, wherein the acquiring image frames collected by a camera installed in the automobile during the detection of the driving of the automobile comprises:
continuously acquiring running speed data of the automobile after the automobile is started;
when the duration that the driving speed data is continuously greater than the first speed threshold exceeds a first duration threshold, starting to acquire image frames acquired by a camera installed in an automobile;
and when the duration of the running speed data which is continuously less than the second speed threshold exceeds the second duration threshold, stopping acquiring the image frames acquired by the camera arranged in the automobile.
3. The method of claim 1, wherein the obtaining road condition data of a road surface on which the vehicle is currently traveling and determining candidate dangerous actions according to the road condition data comprises:
acquiring road condition data of a road surface on which the automobile currently runs and driving data of the automobile;
determining a first candidate dangerous action according to the road condition data, and determining a second candidate dangerous action according to the driving data;
and obtaining the candidate dangerous action according to the first candidate dangerous action and the second candidate dangerous action.
4. The method of claim 3, wherein determining a first candidate risky action based on the traffic data comprises:
acquiring the road type, the congestion condition and pedestrian data of the road surface on which the automobile currently runs, and determining a road condition danger coefficient according to the road type, the congestion condition and the pedestrian data;
determining a first candidate dangerous action according to the road condition danger coefficient;
the determining a second candidate dangerous action according to the driving data comprises:
acquiring the continuous running time length and the average running speed of the automobile, and determining a driving risk coefficient according to the continuous running time length and the average running speed;
and determining a second candidate dangerous action according to the driving danger coefficient.
5. The method of claim 1, wherein said determining an alarm level based on said hazard level and said duration comprises:
when the danger level exceeds a preset level and the duration exceeds a preset duration, determining an automobile control parameter according to the danger level and the duration, and controlling the driving performance of the automobile according to the automobile control parameter;
and when the danger level does not exceed a preset level or the duration does not exceed a preset duration, determining an alarm level according to the danger level and the duration.
6. The method of claim 1, wherein the extracting spatial interactivity features through a spatial stream convolutional neural network comprises:
inputting the image frame into a behavior significance detection network model to obtain a detection result, and obtaining a spatial interactivity characteristic according to the detection result;
constructing a mask-guided spatial attention model according to the image frame and the spatial interactive characteristics to obtain spatial discriminative characteristics;
determining a spatial interactivity characteristic according to the temporal attention weight and the spatial discriminative characteristic;
the method comprises the steps of extracting time interactive features through a time flow convolution neural network, extracting global time features from the time interactive features through a three-dimensional convolution neural network, and constructing a time attention model guided by an optical flow to calculate global time discriminative features according to the global time features, and comprises the following steps:
performing optical flow calculation on the shot image through a TVNet network to obtain an optical flow frame;
weighting the obtained optical flow frame according to the spatial attention weight to obtain the time interactive feature;
extracting global time characteristics from the time interactive characteristics through a three-dimensional convolutional neural network;
inputting the global time characteristic into a time attention model guided by optical flow to obtain a time attention weight, and weighting the global time characteristic through the time attention weight to obtain a global time discriminative characteristic;
the method for fusing the first classification result and the second classification result comprises the following steps:
S_r = ((1 + C_1^2) / (1 + C_2^2)) * S_1 + (1 - (1 + C_1^2) / (1 + C_2^2)) * S_2
where S_1 represents the first classification result, S_2 represents the second classification result, S_r represents the fusion classification result, and C_1 and C_2 represent variables defined during the fusion, with C_1 less than or equal to C_2.
7. An intelligent driving device based on artificial intelligence, the device comprising:
the system comprises an image acquisition module, a data acquisition module and a data processing module, wherein the image acquisition module is used for acquiring image frames acquired by a camera arranged in an automobile in the process of detecting that the automobile runs;
the action recognition module is used for carrying out action recognition on a target human body in the image frame acquired by the camera to obtain an action recognition result of the target human body;
the danger identification module is used for acquiring road condition data of a road surface on which the automobile runs currently and determining candidate dangerous actions according to the road condition data;
the duration obtaining module is used for determining the danger level of the dangerous action and the duration of the dangerous action when it is determined according to the action recognition result that the target human body performs a dangerous action among the candidate dangerous actions and the time for which the dangerous action has been performed exceeds a preset duration;
the grade acquisition module is used for determining an alarm grade according to the danger grade and the duration;
the information output module is used for determining alarm information according to the alarm grade and outputting the alarm information through the alarm equipment of the automobile;
the action recognition module is used for carrying out action recognition on a target human body in an image frame acquired by the camera to obtain an action recognition result of the target human body, and comprises: extracting spatial interactive characteristics through a spatial flow convolution neural network aiming at the image frames collected by the camera, and extracting global spatial discriminative characteristics by utilizing a bidirectional LSTM; extracting time interactive features through a time flow convolutional neural network, extracting global time features from the time interactive features through a three-dimensional convolutional neural network, and constructing a time attention model guided by optical flow to calculate global time discriminative features according to the global time features; performing classification processing according to the global time discriminative feature to obtain a first classification result, and performing classification processing according to the global space discriminative feature to obtain a second classification result; and fusing the first classification result and the second classification result to obtain a fusion classification result, and obtaining an action recognition result of the target human body according to the fusion classification result.
8. An electronic device comprising a memory having computer-executable instructions stored thereon and a processor that, when executing the computer-executable instructions on the memory, implements the method of any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method of any one of claims 1 to 6.
CN202111283225.9A 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products Withdrawn CN114005104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111283225.9A CN114005104A (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111283225.9A CN114005104A (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products
CN202110310081.5A CN113011347B (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110310081.5A Division CN113011347B (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products

Publications (1)

Publication Number Publication Date
CN114005104A true CN114005104A (en) 2022-02-01

Family

ID=76405646

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111283225.9A Withdrawn CN114005104A (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products
CN202110310081.5A Active CN113011347B (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110310081.5A Active CN113011347B (en) 2021-03-23 2021-03-23 Intelligent driving method and device based on artificial intelligence and related products

Country Status (1)

Country Link
CN (2) CN114005104A (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201330827A (en) * 2012-01-19 2013-08-01 Utechzone Co Ltd Attention detection device based on driver's reflex action and method thereof
CN204810370U (en) * 2015-05-25 2015-11-25 浙江商业职业技术学院 A device is answered to phone for car
CN105590466A (en) * 2016-03-14 2016-05-18 重庆邮电大学 Monitoring system and monitoring method for dangerous operation behaviors of driver on cloud platform
WO2019028798A1 (en) * 2017-08-10 2019-02-14 北京市商汤科技开发有限公司 Method and device for monitoring driving condition, and electronic device
CN108189783B (en) * 2017-12-29 2021-07-09 徐州重型机械有限公司 Vehicle running state monitoring method and device and vehicle
CN110143202A (en) * 2019-04-09 2019-08-20 南京交通职业技术学院 A kind of dangerous driving identification and method for early warning and system
CN111062240B (en) * 2019-10-16 2024-04-30 中国平安财产保险股份有限公司 Monitoring method and device for automobile driving safety, computer equipment and storage medium
CN110696834B (en) * 2019-11-20 2022-01-14 东风小康汽车有限公司重庆分公司 Driver state monitoring method, device and system and controller
CN111814637A (en) * 2020-06-29 2020-10-23 北京百度网讯科技有限公司 Dangerous driving behavior recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113011347B (en) 2022-01-07
CN113011347A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111274881B (en) Driving safety monitoring method and device, computer equipment and storage medium
CN108725440B (en) Forward collision control method and apparatus, electronic device, program, and medium
CN109584507B (en) Driving behavior monitoring method, device, system, vehicle and storage medium
Omerustaoglu et al. Distracted driver detection by combining in-vehicle and image data using deep learning
CN111310562B (en) Vehicle driving risk management and control method based on artificial intelligence and related equipment thereof
CN110765807A (en) Driving behavior analysis method, driving behavior processing method, driving behavior analysis device, driving behavior processing device and storage medium
JP2011014037A (en) Risk prediction system
CN114494158A (en) Image processing method, lane line detection method and related equipment
US11250279B2 (en) Generative adversarial network models for small roadway object detection
JP2009096365A (en) Risk recognition system
JP5185554B2 (en) Online risk learning system
CN112330964B (en) Road condition information monitoring method and device
JP2011014038A (en) Online risk recognition system
CN114373189A (en) Behavior detection method and apparatus, terminal device and storage medium
CN113269111B (en) Video monitoring-based elevator abnormal behavior detection method and system
JP2011003076A (en) Risk recognition system
CN113011347B (en) Intelligent driving method and device based on artificial intelligence and related products
CN116461546A (en) Vehicle early warning method, device, storage medium and processor
CN116486334A (en) High-altitude parabolic monitoring method, system and device based on vehicle and storage medium
CN113283286A (en) Driver abnormal behavior detection method and device
CN114708498A (en) Image processing method, image processing apparatus, electronic device, and storage medium
JP2010267134A (en) Risk recognition system
CN117774992A (en) Driving intention recognition method, device and system
EP4194300A1 (en) Providing a prediction of a radius of a motorcycle turn
CN115384541A (en) Method and system for driving risk detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220201