CN110764507A - Artificial intelligence automatic driving system for reinforcement learning and information fusion - Google Patents

Publication number
CN110764507A
Authority
CN
China
Prior art keywords
module
vehicle
reinforcement learning
information fusion
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911079929.7A
Other languages
Chinese (zh)
Inventor
舒子宸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201911079929.7A
Publication of CN110764507A

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0234 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using optical markers or beacons
    • G05D1/0236 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using optical markers or beacons in combination with a laser
    • G05D1/0238 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0257 Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Electromagnetism (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to an artificial intelligence automatic driving system for reinforcement learning and information fusion. The system comprises an environment sensing system module, a central processor module and a vehicle control system module, each connected to the other two. The central processor module combines the information transmitted by the environment sensing system module, makes a processing decision and transmits a control signal to the vehicle control system module. The central processor module adopts a reinforcement learning mechanism and an information fusion mechanism: the reinforcement learning mechanism learns reasonable control measures from historical driving conditions, while the information fusion mechanism processes the information collected from different signal sources, makes a corresponding decision for each, and fuses the resulting decisions. The system can realize unmanned driving of vehicles safely, efficiently and accurately, make full use of road resources and reduce accidents.

Description

Artificial intelligence automatic driving system for reinforcement learning and information fusion
Technical Field
The invention belongs to the technical field of automatic driving, and relates to an artificial intelligence automatic driving system for reinforcement learning and information fusion.
Background
An automatic driving vehicle (self-driving automobile), also called an unmanned vehicle, a computer-driven vehicle or a wheeled mobile robot, is an intelligent vehicle that realizes unmanned driving through a computer system. Such vehicles first appeared in the 20th century, and at the beginning of the 21st century they showed a trend toward practical use.
An automatic driving vehicle relies on the cooperation of artificial intelligence, visual computing, radar, monitoring devices and the global positioning system, so that a computer can operate the motor vehicle automatically and safely without any active human operation. In mid-to-late December 2014, *** showed the finished prototype of its automatic driving vehicle for the first time, and the vehicle was fully functional; in May 2015, *** announced that the automatic driving vehicle would be tested on roads in Mountain View, California, in the summer of that year.
Automobile automatic driving technology relies mainly on video cameras, radar sensors and laser range finders to understand the surrounding traffic, and uses a detailed map (collected by human-driven vehicles) to navigate the road ahead. All of this is realized through the data center of ***, which can process the large amount of information about the surrounding terrain that the vehicle collects. In this respect, the automatic driving vehicle is equivalent to a remote-controlled or intelligent vehicle of the *** data center, and automobile automatic driving technology is one application of the Internet of Things.
Unmanned driving is divided into four stages according to the level of automation: driving assistance, partial automation, high automation and full automation:
1) Driving assistance system (DAS): aims to assist the driver, including providing important or useful driving-related information and issuing clear, concise warnings when a situation starts to become critical, for example a "lane departure warning" (LDW) system;
2) Partially automated system: a system that intervenes automatically when the driver receives a warning but fails to take appropriate action in time, for example an "automatic emergency braking" (AEB) system or an "emergency lane assist" (ELA) system;
3) Highly automated system: a system that can take over responsibility for operating the vehicle from the driver for a longer or shorter period of time, but still requires the driver to monitor the driving activity;
4) Fully automated system: this level of automation allows the occupants to do computer work, rest, sleep or engage in other leisure activities.
The control performance of a traditional automatic driving system is improved by directly adjusting controller parameters. Because a vehicle is a time-lag system with a relatively large time constant and relatively sensitive performance, it is difficult to balance the basic performance indices of the control system, such as controller adaptability, control accuracy and system response speed, and it is even harder to guarantee the comfort of automatic driving. Even if several sets of controller parameters are used, the adjustable dimensions remain limited; most controller parameters are unintuitive and hard to understand, and adding more of them creates great difficulty for software development and verification, field debugging and after-sales maintenance. Meanwhile, as running time accumulates, vehicle performance drifts slowly over time, which degrades the control effect of the on-board controller and makes the vehicle harder to control.
Therefore, research on an automatic driving system that is easy to control, highly accurate and comfortable is of great significance.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides an artificial intelligence automatic driving system for reinforcement learning and information fusion.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
an artificial intelligence automatic driving system for reinforcement learning and information fusion comprises an environment sensing system module, a central processor module and a vehicle control system module, each connected to the other two;
the central processor module combines the information transmitted by the environment sensing system module, makes a processing decision and transmits a control signal to the vehicle control system module;
the central processor module adopts a reinforcement learning mechanism and an information fusion mechanism; the reinforcement learning mechanism learns reasonable control measures from historical driving conditions, and the information fusion mechanism processes the information collected from different signal sources, makes a corresponding decision for each, and fuses the resulting decisions.
As a preferred technical scheme:
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the environment sensing system module collects and processes information about the vehicle's external environment and transmits it to the central processor module and the vehicle control system module. The module is equipped with a distributed CPU (central processing unit) that can adjust the sampling frequency and sampling content as the external environment changes and can start a standby sensor; when an emergency signal appears, it can transmit a control signal directly to the vehicle control system to take emergency evasive measures. The environment sensing system module consists of a perception processor and, connected to it, a nearby vehicle sensing and transmission system, a digital map system, a camera system, a laser ranging system, a radar ranging system and a positioning system. The nearby vehicle sensing and transmission system collects the vehicle communication signals sent by nearby vehicles, including global positioning position, road occupation, speed and vehicle state, and broadcasts the signals of the host vehicle for other unmanned vehicles to collect and use. The digital map system uses digital map information to obtain the region, road and weather information for the vehicle's route. The camera system collects external image information and provides image data for the decisions of the central processor module. Two different ranging modes, laser ranging and radar ranging, are adopted so that each can back up the other; the laser ranging system and the radar ranging system measure information such as the distance between the vehicle and other vehicles and the distance to the roadside. The positioning system obtains the current driving position of the vehicle from global positioning system information combined with the digital map. The nearby vehicle sensing and transmission system, the camera system, the laser ranging system, the radar ranging system and the positioning system each comprise an external sensor circuit system and an internal software processing system, whereas the digital map system comprises only an internal software processing system.
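The behaviour attributed to the distributed CPU of the environment sensing module (adapting the sampling rate, starting a standby sensor, and bypassing the central processor on an emergency signal) can be illustrated with a minimal Python sketch. The class names, the visibility measure, the sampling rates and the 2 m emergency threshold are all assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class VehicleControl:
    """Hypothetical stand-in for the vehicle control system module."""
    commands: list = field(default_factory=list)

    def emergency_evade(self, reason: str) -> None:
        # Record the emergency command instead of driving real actuators.
        self.commands.append(("EMERGENCY_EVADE", reason))


@dataclass
class PerceptionProcessor:
    """Hypothetical stand-in for the distributed CPU of the sensing module."""
    control: VehicleControl
    sampling_hz: float = 10.0          # assumed default sampling frequency
    backup_sensor_on: bool = False
    emergency_distance_m: float = 2.0  # assumed emergency threshold

    def adapt(self, visibility: float) -> None:
        """Adjust sampling to the environment (visibility in [0, 1], assumed)."""
        if visibility < 0.5:
            # Poor conditions: sample faster and start the standby sensor.
            self.sampling_hz = 50.0
            self.backup_sensor_on = True
        else:
            self.sampling_hz = 10.0
            self.backup_sensor_on = False

    def on_reading(self, obstacle_distance_m: float) -> str:
        """Route one range reading: emergency bypass or normal forwarding."""
        if obstacle_distance_m < self.emergency_distance_m:
            # Emergency signal: command the vehicle control system directly.
            self.control.emergency_evade(f"obstacle at {obstacle_distance_m} m")
            return "bypass"
        return "forward_to_central_processor"
```

In this sketch the emergency path never waits for the central processor, which mirrors the direct signal path described in the text; how a real system would detect an emergency is left open.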
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the vehicle control system module mainly consists of a motion processor and, connected to it, a light and horn control system, an angle control system, a speed control system, a path tracking system and a stability control system. The light and horn control system controls the lights and the horn according to current road conditions; road sections where sounding the horn is not allowed can be captured and marked using the camera system in the environment sensing system module. The module is equipped with a distributed CPU; when the current vehicle control fails, the distributed CPU can carry out emergency safety handling according to the current running condition of the vehicle, and in that case it does not accept control signals sent by the central processor module.
Let the target data output variable be $X_q$, and establish a data quality model using principal component regression:
$X_q = T\theta^{T} + F$;
where $\theta$ is the principal component regression coefficient and $F$ is the model error;
the target output data are converted into a principal component set value by the formula
$t_{sp} = x_{qsp}(\theta^{T})^{+}$
where $t_{sp}$ is the set value of the principal component score, $x_{qsp}$ is the data quality set point, and $(\theta^{T})^{+}$ is the generalized inverse of $\theta^{T}$;
$t = xP$;
where $x$ is the vector of process variables and manipulated variables at the current time;
define $\Delta t = t_{sp} - t$ as the error between the principal component set value and the principal component at the current time; this error can be mapped to the $X$ space:
$\Delta X = \Delta t\,P^{T}$
The above formula can also be written as
$[\Delta X_m \mid \Delta u] = \Delta t\,P^{T}$
When the principal component model is correct, $\Delta u$ in the above formula is the change of the manipulated variables, i.e. the control amount;
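The mapping from a quality set point to a control move can be traced numerically. The following NumPy sketch follows the formulas above step by step; the function name, the argument layout (manipulated variables stored in the last columns of x) and the toy matrices are assumptions for illustration only.

```python
import numpy as np


def control_move(x, x_qsp, P, theta_T, n_manipulated):
    """Return the change of the manipulated variables (delta u).

    x             : current process + manipulated variables, shape (n,)
    x_qsp         : data quality set point, shape (q,)
    P             : loading matrix, shape (n, k)
    theta_T       : regression coefficients theta^T, shape (k, q)
    n_manipulated : number of trailing entries of x that are manipulated
                    variables (an assumed layout)
    """
    x = np.asarray(x, dtype=float)
    x_qsp = np.asarray(x_qsp, dtype=float)
    # Score set point via the generalized (Moore-Penrose) inverse of theta^T.
    t_sp = x_qsp @ np.linalg.pinv(theta_T)
    # Current score and its error relative to the set point.
    t = x @ P
    delta_t = t_sp - t
    # Map the score error back to the X space and keep the manipulated part.
    delta_x = delta_t @ P.T
    return delta_x[-n_manipulated:]


# Toy example: 3 variables (the last one manipulated), 2 components, 1 quality variable.
P = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
theta_T = np.array([[2.0], [0.0]])
du = control_move([1.0, 5.0, 3.0], [4.0], P, theta_T, n_manipulated=1)
```

With these toy values the score set point is (2, 0), the current score is (1, 3), and the resulting move on the manipulated variable is -3.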
The invention mainly adopts principal component analysis (PCA) to reduce the dimension of $X$. The specific algorithm is as follows:
(1) Data standardization: the multi-information acquisition module acquires the working state of the GIS system in one cycle:
$X_1 = (x_1, x_2, \ldots, x_n)^{T}$
The states of $m$ cycles are written in the form of a state matrix:
$X_m = (X_1, X_2, \ldots, X_m)$;
Normalizing $X_m$ gives:
$X_m^{*} = (X_m - \bar{X}_m)/\sigma$
where $\bar{X}_m$ is the mean of $X_m$ and $\sigma$ is the standard deviation of $X_m$;
(2) Singular value decomposition:
singular value decomposition is performed on the covariance matrix of the normalized data:
$\operatorname{cov}(X_m^{*}) = U\Lambda U^{T}$
where $\Lambda$ is the diagonal matrix of singular values and the columns of $U$ are the corresponding singular vectors;
(3) Selecting the principal components:
the first $k$ principal components of $\Lambda$ are taken as analysis elements, together with the corresponding first $k$ column vectors of $U$:
$P = (u_1, u_2, \ldots, u_k)$;
(4) Obtaining the reduced-dimension form of $X$:
$T = XP$.
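The four PCA steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patented implementation: it assumes one row per sampling cycle and one column per measured variable (so that $T = XP$ is well defined), uses the sample standard deviation, and the function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Reduce X (m cycles x n variables) to its first k principal components."""
    X = np.asarray(X, dtype=float)
    # (1) Standardize each variable: subtract the mean, divide by the std.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0              # guard against constant columns
    Xs = (X - mean) / std
    # (2) Singular value decomposition of the covariance matrix.
    cov = Xs.T @ Xs / (X.shape[0] - 1)
    U, s, _ = np.linalg.svd(cov)
    # (3) Keep the first k columns of U as the loading matrix P.
    P = U[:, :k]
    # (4) Project the standardized data onto P to get the scores T.
    T = Xs @ P
    return T, P, s


# Two perfectly correlated variables collapse onto a single component.
X_demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
T_demo, P_demo, s_demo = pca_reduce(X_demo, k=1)
```

In the demo the second singular value is numerically zero, so keeping k = 1 loses no information; in practice k would be chosen from the decay of the singular values.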
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module comprises a reinforcement learning module connected with the perception processor; the reinforcement learning module processes the information collected by the nearby vehicle sensing and transmission system, the digital map system, the camera system, the laser ranging system, the radar ranging system and the positioning system to obtain camera data, current vehicle speed, vehicle driving direction, positioning data, map data, destination coordinates, distance to nearby obstacles and vehicle driving speed.
The reinforcement learning workflow is divided into five modules: external environment perception, a neural network parameter server, a reinforcement learning algorithm, a deep learning network and a historical database. The main algorithm steps are as follows:
(1) external environment perception establishes the connection between the environment and the deep learning network, linking the external environment to the internal learning model;
(2) the reinforcement learning algorithm takes the historical database as its data source for training and adopts a trial (heuristic) mechanism and a reward mechanism: each operation is advanced by a random action, and each advance is evaluated; if a trial advances well, its weight is increased as a reward; if it performs poorly, it is penalized. In this way good modes of advance are recorded and trials evaluated as poor are removed; the deep learning model parameters trained by the reinforcement learning algorithm are output to the neural network parameter server, and the neural network structure is output to the deep learning network;
(3) the deep learning network and the neural network parameter server form a learner system, which is the implementation form of the reinforcement learning algorithm; it learns the good modes of advance found by reinforcement learning, models them, and makes the optimal decision according to the current state of the object;
(4) the historical database continuously records the current decision, the state and the final result in preparation for model training;
(5) the reinforcement learning mechanism builds the feedback loop between the external environment and the internal trial mode of advance;
(6) the mechanism is applied to all algorithms of the central processor module, continuously adjusting and optimizing their parameters.
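The trial-and-reward loop in step (2) can be illustrated with a deliberately small Python sketch. It replaces the deep learning network with a plain table of action weights, so it is a simplification rather than the described learner system; the function name `train`, the learning rate, the drop threshold and the toy reward function are all assumptions.

```python
import random


def train(actions, evaluate, episodes=200, lr=0.1, drop_below=0.2, seed=0):
    """Trial-and-reward loop over a fixed set of candidate actions.

    evaluate(action) returns a positive value for a good outcome (reward)
    and a negative value for a poor one (penalty).
    """
    rng = random.Random(seed)
    weights = {a: 1.0 for a in actions}   # stand-in for model parameters
    history = []                          # stand-in for the historical database
    for _ in range(episodes):
        candidates = list(weights)
        # Trial: pick an action at random, biased by the current weights.
        action = rng.choices(
            candidates, weights=[weights[a] for a in candidates]
        )[0]
        reward = evaluate(action)
        history.append((action, reward))  # record decision and result
        # Reward raises the weight, penalty lowers it (kept strictly positive).
        weights[action] = max(1e-6, weights[action] + lr * reward)
        # Remove trial modes evaluated as poor, but keep at least one action.
        if weights[action] < drop_below and len(weights) > 1:
            del weights[action]
    return weights, history


# Toy evaluation: braking early is rewarded, braking late is penalized.
final_weights, log = train(
    ["brake_late", "brake_early"],
    lambda a: 1.0 if a == "brake_early" else -1.0,
)
```

The returned weights play the role of the parameters that would be sent to the parameter server, and the log plays the role of the historical database that later training rounds would read.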
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module further comprises a multi-agent coordination module connected with the reinforcement learning module and the motion processor; the multi-agent coordination module processes the camera data, the current vehicle speed and the vehicle driving direction to obtain the vehicle target vector speed.
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module further comprises a path decision module connected with the reinforcement learning module and the motion processor; the path decision module processes the positioning data, the map data and the destination coordinates to obtain the vehicle driving path.
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module further comprises an anti-collision algorithm module connected with the reinforcement learning module and the motion processor; the anti-collision algorithm module processes the distance to nearby obstacles and the vehicle driving speed to determine whether braking is needed.
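The source does not say how the anti-collision algorithm module turns distance and speed into a braking decision. One common approach, shown here purely as an assumed illustration, compares the obstacle distance with an estimated stopping distance (reaction distance plus braking distance). The function name and the default reaction time, deceleration and safety margin are hypothetical values.

```python
def needs_braking(distance_m, speed_mps, reaction_s=0.5,
                  max_decel_mps2=6.0, margin_m=2.0):
    """Return True if the vehicle should brake for an obstacle ahead.

    Stopping distance = reaction distance + braking distance:
        d_stop = v * t_react + v**2 / (2 * a)
    Braking is requested when the obstacle is within d_stop + margin.
    """
    if speed_mps <= 0:
        return False  # a stationary vehicle does not need to brake
    stopping_m = speed_mps * reaction_s + speed_mps ** 2 / (2 * max_decel_mps2)
    return distance_m <= stopping_m + margin_m
```

For example, at 20 m/s the estimated stopping distance is about 43.3 m, so `needs_braking(40, 20)` returns True while `needs_braking(100, 20)` returns False.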
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module further comprises a conflict reduction module connected with the multi-agent coordination module, the anti-collision algorithm module and the motion processor; the conflict reduction module compares the outputs of the multi-agent coordination module and the anti-collision algorithm module and selects one of them.
In the artificial intelligence automatic driving system for reinforcement learning and information fusion described above, the central processor module further comprises an information fusion module connected with the multi-agent coordination module, the path decision module, the conflict reduction module, the anti-collision algorithm module and the motion processor; the information fusion module integrates the outputs of the multi-agent coordination module, the path decision module, the conflict reduction module and the anti-collision algorithm module and outputs the vehicle target vector speed.
The module fuzzifies the input quantities according to the fuzzy rules of the TS fuzzy model. The main mechanism is as follows:
(1.1) Fuzzy rules:
the contribution $y_i(k+1)$ of the $i$-th TS fuzzy rule to the system output can be expressed by an "If … Then" statement as follows:
$R^{i}$: If $x_1(k)$ is $A_1^{i}$ and $x_2(k)$ is $A_2^{i}$ and … and $x_n(k)$ is $A_n^{i}$, Then $y_i(k+1) = p_0^{i} + p_1^{i}x_1(k) + \cdots + p_n^{i}x_n(k)$, $i = 1, 2, \ldots, c$
where $c$ is the number of fuzzy rules, $n$ is the number of input variables of the TS fuzzy model, $x_1(k), x_2(k), \ldots, x_n(k)$ are regression variables of the input/output data at and before time $k$, $x(k) = [x_1(k), x_2(k), \ldots, x_n(k)]$ is the input vector of the fuzzy model, $A_j^{i}$ is a fuzzy set with a linear membership function that represents a fuzzy subspace and is used for the fuzzy inference of the $i$-th rule, and $p_j^{i}$ is a consequent parameter of the $i$-th fuzzy rule;
(1.2) Output calculation:
define $\beta_i$ as the fitness of fuzzy rule $i$; the output $y(k+1)$ of the model at time $k+1$ can then be calculated by the following formula:
$y(k+1) = \dfrac{\sum_{i=1}^{c}\beta_i\,y_i(k+1)}{\sum_{i=1}^{c}\beta_i}$
Define:
$\lambda_i = \beta_i \big/ \sum_{j=1}^{c}\beta_j$, $\Phi(k) = [\lambda_1, \lambda_1 x_1(k), \ldots, \lambda_1 x_n(k), \ldots, \lambda_c, \lambda_c x_1(k), \ldots, \lambda_c x_n(k)]^{T} \in \mathbb{R}^{r}$, $\Theta(k) = [p_0^{1}, p_1^{1}, \ldots, p_n^{1}, \ldots, p_0^{c}, p_1^{c}, \ldots, p_n^{c}]^{T} \in \mathbb{R}^{r}$
where $r = c(n+1)$; one can then obtain:
$y(k+1) = \Phi(k)^{T}\Theta(k)$;
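The TS output calculation can be checked with a small pure-Python sketch. It assumes triangular (piecewise-linear) membership functions and takes the rule fitness as the product of the memberships of all inputs; the source does not fix either choice. The names `triangular` and `ts_output` and the two toy rules are hypothetical.

```python
def triangular(x, left, peak, right):
    """Piecewise-linear (triangular) membership value of x."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def ts_output(x, rules):
    """Weighted-average output of a TS fuzzy model.

    x     : list of n input values
    rules : list of (sets, params); sets holds one (left, peak, right)
            triple per input, params holds [p0, p1, ..., pn].
    """
    num = 0.0
    den = 0.0
    for sets, params in rules:
        # Rule fitness: product of the memberships of all inputs (assumed).
        beta = 1.0
        for xj, (left, peak, right) in zip(x, sets):
            beta *= triangular(xj, left, peak, right)
        # Linear consequent of this rule.
        y_i = params[0] + sum(pj * xj for pj, xj in zip(params[1:], x))
        num += beta * y_i
        den += beta
    if den == 0.0:
        raise ValueError("no rule fires for this input")
    return num / den


# Two rules on one input: "low" gives y = x, "high" gives y = 10.
demo_rules = [
    ([(-5.0, 0.0, 5.0)], [0.0, 1.0]),
    ([(0.0, 5.0, 10.0)], [10.0, 0.0]),
]
y_demo = ts_output([2.5], demo_rules)
```

At x = 2.5 both rules fire with fitness 0.5, so the output is the midpoint of their consequents, 6.25.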
The information fusion mechanism mainly resolves disagreements between the decisions of the environment sensing system and the vehicle control system. The environment sensing system obtains its decision 1 through feature extraction and distributed decision 1, and the vehicle control system obtains its decision 2 through feature extraction and distributed decision 2. Decision 1 and decision 2 may contradict each other; in that case the information fusion mechanism performs decision fusion using one of several methods, such as weighting, voting or a one-vote veto, and the fused decision is applied to all algorithms of the central processor module.
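The three fusion styles named in the text can be sketched as follows. The function names, the use of target speeds for the weighted case, and the choice of "brake" as the decision that a single vote can force are assumptions for illustration; the source does not define the exact rules.

```python
from collections import Counter


def weighted_fusion(values, weights):
    """Weighted average of numeric decisions, e.g. target speeds."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


def majority_vote(decisions):
    """Return the most frequent discrete decision."""
    return Counter(decisions).most_common(1)[0][0]


def veto_fusion(decisions, veto_value="brake"):
    """One-vote rule: a single veto_value vote decides the outcome."""
    if veto_value in decisions:
        return veto_value
    return majority_vote(decisions)
```

For instance, `weighted_fusion([10.0, 20.0], [3, 1])` gives 12.5, while `veto_fusion(["keep", "brake", "keep"])` returns "brake" even though "keep" has the majority, which is the safety-first behaviour a veto rule is meant to provide.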
Beneficial effects:
The artificial intelligence automatic driving system of the invention adopts a reinforcement learning mechanism and an information fusion mechanism. The reinforcement learning mechanism learns reasonable control measures from historical driving conditions, and the information fusion mechanism processes the information collected from different signal sources, makes a corresponding decision for each, and fuses the resulting decisions. Four algorithms are adopted: a multi-agent coordination algorithm, a path decision algorithm, a conflict reduction algorithm and an anti-collision algorithm. By adjusting the parameters of these four algorithms, the two mechanisms form an intelligent algorithm system that can drive the vehicle by combining historical driving patterns, manual driving training and other inputs; the system is easy to control, highly accurate and comfortable.
Drawings
FIG. 1 is an overall block diagram of the present invention;
FIG. 2 is a diagram illustrating a reinforcement learning mechanism according to the present invention;
FIG. 3 is a schematic diagram of an information fusion mechanism according to the present invention.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
An artificial intelligence automatic driving system for reinforcement learning and information fusion, as shown in FIG. 1, comprises an environment sensing system module, a central processor module and a vehicle control system module, each connected to the other two. The workflow is as follows:
(1) the environment sensing system module collects and processes information about the vehicle's external environment, including road conditions, vehicle running conditions, distance to the vehicles in front and behind, distance to the vehicles on the left and right, the vehicle's global positioning, traffic signal indications, road sign indications and so on, and transmits it to the central processor module and the vehicle control system module;
in addition, the environment sensing system module is equipped with a distributed CPU, which can adjust the sampling frequency and sampling content as the external environment changes and can start a standby sensor; when an emergency signal appears, it can transmit a control signal directly to the vehicle control system to take emergency evasive measures. A distributed CPU is a control logic device that can make control decisions independently, without going through the master control CPU, and is responsible for an independent subsystem such as a safety subsystem or a motor control subsystem. It obtains the running state of the system and the state of the environment from the sampled data of sensors and other devices in the subsystem, and determines the optimal operating mode of the subsystem through analysis; for example, the data acquisition mode can be switched from video image processing to laser radar data processing, or the number of data sampling frames can be changed as the environment changes. An "emergency signal" indicates that the subsystem has suffered irreversible damage or is tending toward damage;
the environment sensing system module consists of a perception processor and, connected to it, a nearby vehicle sensing and transmission system, a digital map system, a camera system, a laser ranging system, a radar ranging system and a positioning system; the subsystems operate in parallel, and each transmits its signals to the distributed CPU of the environment sensing system, which aggregates them and forwards them to the other modules. The specific steps are as follows:
(1.1) the nearby vehicle sensing and transmission system collects the vehicle communication signals sent by nearby vehicles, including global positioning position, road occupation, speed and vehicle state, and broadcasts the signals of the host vehicle for other unmanned vehicles to collect and use;
(1.2) the digital map system uses digital map information to obtain the region, road and weather information for the vehicle's route;
(1.3) the camera system collects external image information and provides image data for the decisions of the central processor module;
(1.4) the laser ranging system measures the distance between vehicles and the distance to the roadside by laser ranging;
(1.5) the radar ranging system measures the distance between vehicles and the distance to the roadside by radar ranging; this information and that of the laser ranging system back each other up;
(1.6) the positioning system obtains the current driving position of the vehicle from global positioning system information combined with the digital map;
the nearby vehicle sensing and transmission system, the camera system, the laser ranging system, the radar ranging system and the positioning system each comprise an external sensor circuit system and an internal software process, whereas the digital map system comprises only an internal software process;
(2) the central processing unit module is used for comprehensively utilizing the information transmitted by the environment sensing system module, making a processing decision and transmitting a control signal to the vehicle control system module;
(2.1) the central processor module comprises a reinforcement learning module, a multi-agent coordination module, a path decision module, an anti-collision algorithm module, a conflict reduction module and an information fusion module; four algorithms are used: a multi-agent coordination algorithm, a path decision algorithm, a conflict reduction algorithm and an anti-collision algorithm;
(2.1.1) the reinforcement learning module is connected with the perception processor and processes the information collected by the nearby vehicle perception and transmission system, the digital map system, the camera system, the laser ranging system, the radar ranging system and the positioning system to obtain camera data, current vehicle speed, vehicle running direction, positioning data, map data, destination coordinates, distance to nearby obstacles and vehicle running speed;
(2.1.2) the multi-agent coordination module is connected with the reinforcement learning module and the motion processor, and the multi-agent coordination module is used for processing the camera data, the current vehicle speed and the automobile driving direction to obtain a vehicle target vector speed;
in this module, a multi-agent coordination algorithm is applied: a reinforcement learning algorithm is used; based on the camera data, road condition information is obtained using 3D modeling technology, the state of the traffic lights ahead is determined, and quantities such as the lane position of the vehicle are taken as the state vector set $S = \{s_1, s_2, \ldots, s_n\}$; the action vector set $A = \{a_1, a_2, \ldots, a_n\}$ is defined to represent different accelerations of the vehicle;
a reward matrix R is set manually according to a historical database; the R matrix records the reward obtained when the vehicle takes action $a_t$ in state $s_t$, and thus represents the scores of the different actions of the vehicle in the current vehicle state; the Q matrix is initialized as a zero matrix and records the long-run reward q of the different actions in the different states; the Q matrix must be trained, and the trained Q matrix represents the global value of the different actions of the vehicle in the current vehicle state;
the Q matrix is trained using the Q-learning algorithm, with the following update:
$Q(s_t, a_t) = r + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$;
where $r$ is the reward value in the R matrix corresponding to $s_t$ and $a_t$; $s_t$ is an element of the set $S$ and represents a state; $a_t$ is an element of the set $A$ and represents an action; $\gamma$ is the attenuation (discount) coefficient, with a value between 0 and 1; using a Monte Carlo method, $t$ is drawn at random and the Q table is updated until convergence;
using a deep learning algorithm, $s_t$ and $a_t$ are taken as the network input and the corresponding Q value in the Q table as the output, and a deep learning model NN is trained; the action with the largest Q value is the globally optimal action;
after the deep learning model has been trained, the current vehicle state s and the vehicle action set are input to the neural network, and the action a that yields the largest q value is chosen as the action of the current vehicle (that is, in the current state all possible actions are traversed, and the action with the highest score is the optimal action);
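The tabular training step described above can be illustrated with a minimal sketch. The three-state chain, the reward values, the deterministic transition table `next_state` and the function name `train_q` are all hypothetical and chosen only to make the update rule concrete; the patent does not specify them, and the later neural-network fitting stage is not shown.

```python
import numpy as np

def train_q(R, next_state, gamma=0.8, iters=2000, seed=0):
    """Train a Q table with the update Q(s,a) = r + gamma * max Q(s', .).

    R          : (n_states, n_actions) manually set reward matrix
    next_state : (n_states, n_actions) index of the state reached (toy assumption)
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    Q = np.zeros((n_s, n_a))           # Q starts as a zero matrix
    for _ in range(iters):
        s = rng.integers(n_s)          # random draw of a state (Monte Carlo sampling)
        a = rng.integers(n_a)          # random draw of an action
        s2 = next_state[s, a]
        Q[s, a] = R[s, a] + gamma * Q[s2].max()
    return Q

# Hypothetical toy problem: action 0 = hold, action 1 = advance; state 2 is absorbing.
R = np.array([[0.0, 0.0], [0.0, 100.0], [0.0, 0.0]])
next_state = np.array([[0, 1], [1, 2], [2, 2]])
Q = train_q(R, next_state)
best_action = Q.argmax(axis=1)         # highest-scoring action per state
```

In this toy setting the table settles at a fixed point in which advancing is preferred in states 0 and 1; a real state space would be far larger, which is presumably why the text goes on to fit a neural network to the table.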
(2.1.3) the path decision module is connected with the reinforcement learning module and the motion processor, and the path decision module is used for processing the positioning data, the map data and the destination coordinate to obtain a vehicle running path;
in this module, a path decision algorithm is applied: the path decision module uses the Dijkstra algorithm; from the positioning data, the map data and the destination coordinates a weighted graph $G = (V, E)$ can be abstracted, where V is the vertex set of the weighted graph and E is the set of weighted edges between vertices; the vehicle starting point is V0 and the destination is Vi;
the shortest distance from the starting point to each vertex can be calculated by the following algorithm:
S: the set of vertices whose shortest distance has been determined (initially containing only the starting point V0);
T = V - S: the set of vertices not yet determined;
① initially, let S = {V0} and T = V - S = {the remaining vertices}; the distance value of each vertex Vi in T is the weight of the edge <V0, Vi>, or ∞ if no edge <V0, Vi> exists;
② select from T the vertex W that is connected by an edge to a vertex in S and has the smallest distance value, and add W to S;
③ update the distance values of the vertices remaining in T: if using W as an intermediate vertex shortens the distance from V0 to Vi, modify the distance value; otherwise leave it unchanged;
④ repeat steps ② and ③ until W = Vi;
in this way the shortest distance and the shortest path for <V0, Vi> are obtained, and the shortest path is the running path of the vehicle;
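Steps ① to ④ can be sketched as follows. This is an illustrative implementation using a priority queue rather than an explicit scan of T; the small road graph and its weights are invented for the example and do not come from the patent.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (shortest distance, path) from start to goal.

    graph: dict mapping vertex -> {neighbor: edge weight}
    """
    dist = {start: 0.0}
    prev = {}
    determined = set()                      # the set S of the text
    heap = [(0.0, start)]
    while heap:
        d, w = heapq.heappop(heap)          # vertex W with the smallest distance value
        if w in determined:
            continue
        determined.add(w)
        if w == goal:                       # stop once W is the destination
            break
        for v, weight in graph.get(w, {}).items():
            nd = d + weight
            if nd < dist.get(v, float("inf")):   # W as intermediate vertex shortens the path
                dist[v] = nd
                prev[v] = w
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# Hypothetical road graph; weights could be distances in kilometres.
roads = {"V0": {"V1": 4, "V2": 1}, "V2": {"V1": 2, "V3": 7}, "V1": {"V3": 1}, "V3": {}}
distance, route = dijkstra(roads, "V0", "V3")
```

Here the direct edge V0 to V1 (weight 4) loses to the detour through V2 (weight 1 + 2), which is the "modify the distance value" case of step ③.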
(2.1.4) the anti-collision algorithm module is connected with the reinforcement learning module and the motion processor, and processes the distance to nearby obstacles and the vehicle running speed to determine whether braking is needed;
in this module, an anti-collision algorithm is applied: the anti-collision module uses radar ranging and laser ranging to judge whether the vehicle is at risk of collision and calculates the time to collision; if the time to collision is less than 2 s, it outputs that braking is needed, otherwise it outputs that braking is not needed;
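A minimal sketch of the 2 s rule follows. The text does not say how the time to collision is computed; the sketch assumes it is the measured distance divided by the closing speed toward the obstacle, and the function and parameter names are hypothetical.

```python
def needs_braking(distance_m, closing_speed_mps, threshold_s=2.0):
    """Return True when the estimated time to collision is below the threshold.

    Assumes time to collision = distance / closing speed (an assumption,
    not stated in the source). A non-positive closing speed means the gap
    is not shrinking, so no braking is requested.
    """
    if closing_speed_mps <= 0:
        return False
    return distance_m / closing_speed_mps < threshold_s

brake_near = needs_braking(10.0, 10.0)   # 1 s to collision
brake_far = needs_braking(50.0, 10.0)    # 5 s to collision
```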
(2.1.5) the conflict reduction module is connected with the multi-agent coordination module, the anti-collision algorithm module and the motion processor, and the conflict reduction module is used for comparing the outputs of the multi-agent coordination module and the anti-collision algorithm module and selecting one of the outputs;
in this module, a conflict reduction algorithm is applied: the conflict reduction module uses an expert system, that is, an if ... else decision procedure, which applies expert prior knowledge to the multi-agent coordination result, the anti-collision module result and the current vehicle running state to decide which module's output to use;
(2.1.6) the information fusion module is connected with the multi-agent coordination module, the path decision module, the conflict reduction module, the anti-collision algorithm module and the motion processor, and the information fusion module is used for integrating the outputs of the multi-agent coordination module, the path decision module, the conflict reduction module and the anti-collision algorithm module and outputting the vehicle target vector speed;
in this module, a TS fuzzy model algorithm is used: the multi-agent coordination module, the path decision module, the conflict reduction module and the anti-collision algorithm module are treated as different rules, the consequent parameters of the rules are identified from historical optimal operation data, and the sum of the products of the parameters and the rules gives the final vehicle vector acceleration;
(2.2) the central processor module adopts a reinforcement learning mechanism and an information fusion mechanism; the two mechanisms are as follows:
(2.2.1) the reinforcement learning mechanism is used for learning reasonable control measures from historical driving conditions, as shown in fig. 2; it works through the following steps:
① external environment perception: a connection is established between the environment and the deep learning network, linking the external environment to the internal learning model;
② reinforcement learning algorithm: the historical database is input as the data source for training the algorithm, which uses a heuristic (trial-and-error) mechanism and a reward mechanism; each operation advances by one action, and every advance is evaluated; if a trial advances well, its weight is increased as a reward, and if it performs badly it is penalized; good ways of advancing are recorded and trials evaluated as bad are removed; the deep learning model parameters trained by the reinforcement learning algorithm are output to the neural network parameter server, and the neural network structure is output to the deep learning network;
③ the deep learning network and the neural network parameter server form a learning system, which is the realization of the reinforcement learning algorithm; it learns and models the good ways of advancing found in reinforcement learning and makes the optimal decision according to the current state of the object;
④ the historical database continuously records the current decision, state and final result in preparation for model training;
⑤ the reinforcement learning mechanism forms a feedback loop between the external environment and the internal exploratory advance;
⑥ this mechanism is used in all algorithms of the central processing unit and continuously adjusts their parameters so that they are continuously optimized;
(2.2.2) the information fusion mechanism processes the information collected by different signal sources to make corresponding decisions and fuses the resulting decisions, as shown in fig. 3; the mechanism fuzzifies the input quantities according to the fuzzy rules of the TS fuzzy model, and works as follows:
① fuzzy rules
the contribution $y_i(k+1)$ of the i-th TS fuzzy rule to the system output can be expressed by an "If ... Then" statement as follows:
Figure BDA0002263639280000101
where $c$ is the number of fuzzy rules, $n$ is the number of input variables of the TS fuzzy model, $x_1(k), x_2(k), \ldots, x_n(k)$ are regression variables of the input/output data at and before time $k$, $x(k) = [x_1(k), x_2(k), \ldots, x_n(k)]$ is the input vector of the fuzzy model,
Figure BDA0002263639280000102
is a fuzzy set with a linear membership function that represents a fuzzy subspace and is used for the fuzzy inference of the i-th rule, and
Figure BDA0002263639280000103
is the consequent parameter of the i-th fuzzy rule;
② output calculation
define $\beta_i$ as the fitness of fuzzy rule $i$; the output $y(k+1)$ of the model at time $(k+1)$ can then be calculated by the following formula:
Figure BDA0002263639280000104
defining $\Phi(k)$ and $\Theta(k)$, where $r = c(n+1)$, one obtains:
$y(k+1) = \Phi(k)^T \Theta(k)$;
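Because the formula images are not reproduced in this text, the following sketch falls back on the standard Takagi-Sugeno form as an assumption: each rule has a linear consequent with n + 1 parameters, the output is the fitness-weighted average of the rule outputs, and the same value can be written as an inner product of two vectors of length r = c(n+1). The function names, the parameter layout and the numeric values are hypothetical.

```python
import numpy as np

def ts_output(x, betas, params):
    """Fitness-weighted average of linear rule consequents (standard TS form, assumed).

    x      : (n,)     input vector x(k)
    betas  : (c,)     fitness of each rule, taken as given
    params : (c, n+1) consequent parameters, row i = [p_i0, p_i1, ..., p_in]
    """
    x_ext = np.concatenate(([1.0], x))      # prepend 1 for the constant term
    y_rules = params @ x_ext                # contribution of each rule
    weights = betas / betas.sum()           # normalized fitness
    return float(weights @ y_rules)

def ts_output_regression(x, betas, params):
    """Same output written as Phi(k)^T Theta(k) with vectors of length c*(n+1)."""
    x_ext = np.concatenate(([1.0], x))
    weights = betas / betas.sum()
    phi = np.kron(weights, x_ext)           # regression vector Phi(k)
    theta = params.ravel()                  # stacked consequent parameters Theta(k)
    return float(phi @ theta), phi.size

x = np.array([2.0, 3.0])
betas = np.array([0.25, 0.75])
params = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 2.0]])
y = ts_output(x, betas, params)
y_reg, r = ts_output_regression(x, betas, params)
```

The linear-in-parameters form is what makes it practical to identify the consequent parameters from historical data by least squares.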
the information fusion mechanism resolves the problem of the environment perception system and the vehicle control system reaching different decisions; the environment perception system obtains its decision 1 through feature extraction and distributed decision 1; the vehicle control system obtains its decision 2 through feature extraction and distributed decision 2; decision 1 and decision 2 may contradict each other, so the information fusion mechanism applies several fusion methods, such as weighting, voting and one-vote decision, to fuse the decisions, and applies the fused decision to all algorithms of the central processing unit;
by adjusting the parameters of the four algorithms, the two mechanisms form an intelligent algorithm system that can drive the vehicle by combining the current historical driving mode, manual driving training and other modes;
(3) the vehicle control system module controls the running of the vehicle using the information sent by the environment perception system module and the instructions sent by the central processor module; it comprises a motion processor and, connected to the motion processor, a light whistle control system, an angle control system, a speed control system, a path tracking system and a stability control system; the motion processor of the vehicle control system module is connected with the perception processor of the environment perception system module;
the light whistle control system controls the lights and the horn according to current road condition information; on road sections where sounding the horn is prohibited, the system can detect the no-horn sign using the camera subsystem of the environment perception system module;
the motion processor takes the output of the central processor module as input and outputs the various control commands for the motors, specifically control signals such as light whistle control, angle control, speed control, path control and stability control; the control commands are realized using a PCA controller algorithm, which is as follows:
let the target output variable be $X_q$ and build a data quality model using principal component regression:
$X_q = T\theta^T + F$;
where $\theta$ is the principal component regression model coefficient and $F$ is the model error;
the target output data is converted into a principal component set point using the formula:
$t_{sp} = x_{qsp}(\theta^T)^{\dagger}$;
where $t_{sp}$ is the set point of the principal component score, $x_{qsp}$ is the data quality set point, and $(\theta^T)^{\dagger}$ is the generalized inverse of $\theta^T$;
the principal component score at the current time is:
$t = xP$;
where $x$ contains the process variables and manipulated variables at the current time;
define $\Delta t = t_{sp} - t$ as the error between the principal component set point and the principal component score at the current time; this error can be mapped to the X space:
$\Delta X = \Delta t P^T$;
or:
$[\Delta X_m \mid \Delta u] = \Delta t P^T$;
when the principal component model is correct, $\Delta u$ in the above formula is the change in the manipulated variables, i.e. the control quantity; $\Delta u$ includes the quantities to be controlled, such as the speed control quantity and the angle control quantity, and is input to the motion controller; $X_q$ includes the target speed and the target angle;
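The controller equations can be traced numerically with the sketch below. It rests on several assumptions not stated in the source: the loading matrix P is taken from an SVD of uncentred data, the regression coefficients are fitted by least squares, the manipulated variables occupy the last columns of x, and the synthetic data are constructed so that the regression model is exact. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pcr(X, Xq, k):
    """Fit loadings P (d, k) and regression coefficients theta^T (k, q)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T                                    # first k loading vectors
    T = X @ P                                       # scores T = X P
    theta_T, *_ = np.linalg.lstsq(T, Xq, rcond=None)  # Xq = T theta^T + F
    return P, theta_T

def control_move(x, x_qsp, P, theta_T, n_u):
    """Map the score error back to X space and extract the control change."""
    t_sp = x_qsp @ np.linalg.pinv(theta_T)          # score set point via generalized inverse
    t = x @ P                                       # current score
    dX = (t_sp - t) @ P.T                           # Delta X = Delta t P^T
    return dX, dX[-n_u:]                            # assumed layout [Delta X_m | Delta u]

# Synthetic data in which the quality variables are an exact function of the scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Xq = X @ Vt[:2].T @ np.array([[1.0, 0.5], [0.2, 2.0]])

P, theta_T = fit_pcr(X, Xq, k=2)
x = X[0]
x_qsp = np.array([1.0, -1.0])                       # e.g. target speed and target angle
dX, du = control_move(x, x_qsp, P, theta_T, n_u=2)
predicted = (x + dX) @ P @ theta_T                  # model prediction after the move
```

Under these idealized conditions the predicted quality after the move equals the set point; with a model error F, the move would only approach it.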
a Principal Component Analysis (PCA) method is mainly adopted to reduce the dimension of X (X is the system output state; the output state may have many dimensions and considerable redundancy, which affects the computation time of the controller, so the system output state is reduced in dimension before it enters the system feedback controller); the specific algorithm is as follows:
① data normalization
the multi-information acquisition module acquires the working state of the GIS system in one cycle:
$X_1 = (x_1, x_2, \ldots, x_n)^T$
the states of m cycles are written as a state matrix:
$X_m = (X_1, X_2, \ldots, X_m)$;
$X_m$ is normalized to obtain:
Figure BDA0002263639280000121
where $\bar{X}_m$ denotes the mean of $X_m$ and $\sigma$ denotes the standard deviation of $X_m$;
② singular value decomposition
singular value decomposition is performed on the covariance matrix:
Figure BDA0002263639280000123
where
Figure BDA0002263639280000124
③ principal component selection
the first k principal components of $\Lambda$ are taken as the analysis components, together with the corresponding first k column vectors of the U matrix:
$P = (u_1, u_2, \ldots, u_k)$;
④ obtaining the reduced-dimension form of X
Figure BDA0002263639280000125
where $T = XP$.
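Steps ① to ④ can be sketched with NumPy as follows. The sketch assumes that each row of the data matrix is one cycle and each column one state variable (the text stacks the cycles as columns), and that normalization is done per variable; the function name and the random data are hypothetical.

```python
import numpy as np

def pca_reduce(Xm, k):
    """Normalize, decompose the covariance matrix, and project onto k components.

    Xm : (m, n) matrix, one row per cycle, one column per state variable (assumed layout)
    Returns the scores T (m, k), the loadings P (n, k) and the singular values.
    """
    mean = Xm.mean(axis=0)
    sigma = Xm.std(axis=0)
    Z = (Xm - mean) / sigma                 # step 1: normalization
    cov = np.cov(Z, rowvar=False)           # covariance matrix of the variables
    U, lam, _ = np.linalg.svd(cov)          # step 2: singular value decomposition
    P = U[:, :k]                            # step 3: first k column vectors of U
    T = Z @ P                               # step 4: reduced form T = X P
    return T, P, lam

rng = np.random.default_rng(1)
states = rng.normal(size=(20, 5))           # 20 cycles of a 5-dimensional state
T, P, lam = pca_reduce(states, k=2)
```

The number of retained components k trades controller computation time against how much of the variance in the state is preserved; the singular values returned here can guide that choice.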
In addition, the vehicle control system module is provided with a distributed CPU; when a fault occurs in the current vehicle control, the distributed CPU can carry out emergency safety handling according to the current vehicle running condition and, during that time, does not accept control signals sent by the central processing unit (because the distributed CPU can control its subsystem independently, it can deal with a fault more quickly and effectively once the fault is detected).

Claims (9)

1. An artificial intelligence automatic driving system for reinforcement learning and information fusion, characterized in that: the system comprises an environment sensing system module, a central processor module and a vehicle control system module, each connected to the other two;
the central processing unit module is used for comprehensively utilizing the information transmitted by the environment sensing system module, making a processing decision and transmitting a control signal to the vehicle control system module;
the central processor module adopts a reinforcement learning mechanism and an information fusion mechanism; the reinforcement learning mechanism is used for learning reasonable control measures according to historical driving conditions, and the information fusion mechanism is used for processing information collected by different signal sources to make corresponding decisions and for performing information fusion on the different decision results obtained.
2. The artificial intelligence autopilot system for reinforcement learning and information fusion of claim 1, wherein the environment sensing system module is mainly composed of a sensing processor and a nearby vehicle sensing and transmitting system, a digital map system, a camera system, a laser ranging system, a radar ranging system and a positioning system which are connected with the sensing processor.
3. The artificial intelligence automatic driving system for reinforcement learning and information fusion of claim 2, wherein the vehicle control system module mainly comprises a motion processor, and a light whistle control system, an angle control system, a speed control system, a path tracking system and a stability control system connected with the motion processor, wherein the motion processor of the vehicle control system module is connected with the perception processor of the environment perception system module.
4. The artificial intelligence autopilot system of reinforcement learning and information fusion of claim 3 wherein the central processing unit includes a reinforcement learning module connected to the sensing processor, the reinforcement learning module being configured to process information collected by the nearby vehicle sensing and transmitting system, the digital map system, the camera system, the laser ranging system, the radar ranging system, and the positioning system to obtain camera data, current vehicle speed, vehicle traveling direction, positioning data, map data, destination coordinates, distance to nearby obstacles, and vehicle traveling speed.
5. The system of claim 4, wherein the central processing unit further comprises a multi-agent coordination module connected to the reinforcement learning module and the motion processor, the multi-agent coordination module being configured to process the camera data, the current vehicle speed, and the vehicle traveling direction to obtain the vehicle target vector speed.
6. The artificial intelligence autopilot system of reinforcement learning and information fusion of claim 5 wherein the central processor module further comprises a path decision module connected to the reinforcement learning module and the motion processor, the path decision module for processing the positioning data, the map data and the destination coordinates to obtain the vehicle travel path.
7. The artificial intelligence automatic driving system with reinforcement learning and information fusion of claim 6, characterized in that, the central processing unit module further comprises an anti-collision algorithm module connected with the reinforcement learning module and the motion processor, the anti-collision algorithm module is used for processing the distance to nearby obstacles and the vehicle running speed to obtain whether braking is needed.
8. The system of claim 7, wherein the central processing unit further comprises a conflict reduction module connected to the multi-agent coordination module, the anti-collision algorithm module, and the motion processor, wherein the conflict reduction module is configured to compare outputs of the multi-agent coordination module and the anti-collision algorithm module, and select one of the outputs.
9. The system of claim 8, wherein the central processing unit further comprises an information fusion module connected to the multi-agent coordination module, the path decision module, the conflict reduction module, the anti-collision algorithm module, and the motion processor, wherein the information fusion module is configured to integrate outputs of the multi-agent coordination module, the path decision module, the conflict reduction module, and the anti-collision algorithm module to output the vehicle target vector velocity.
CN201911079929.7A 2019-11-07 2019-11-07 Artificial intelligence automatic driving system for reinforcement learning and information fusion Pending CN110764507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911079929.7A CN110764507A (en) 2019-11-07 2019-11-07 Artificial intelligence automatic driving system for reinforcement learning and information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911079929.7A CN110764507A (en) 2019-11-07 2019-11-07 Artificial intelligence automatic driving system for reinforcement learning and information fusion

Publications (1)

Publication Number Publication Date
CN110764507A true CN110764507A (en) 2020-02-07

Family

ID=69336778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911079929.7A Pending CN110764507A (en) 2019-11-07 2019-11-07 Artificial intelligence automatic driving system for reinforcement learning and information fusion

Country Status (1)

Country Link
CN (1) CN110764507A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108196535A (en) * 2017-12-12 2018-06-22 清华大学苏州汽车研究院(吴江) Automated driving system based on enhancing study and Multi-sensor Fusion
WO2019092150A1 (en) * 2017-11-10 2019-05-16 Knorr-Bremse Systeme für Nutzfahrzeuge GmbH System for the at least semi-autonomous operation of a motor vehicle with double redundancy
CN109906180A (en) * 2016-09-08 2019-06-18 克诺尔商用车制动***有限公司 Electrical system for vehicle
WO2019115683A1 (en) * 2017-12-15 2019-06-20 Knorr-Bremse Systeme für Nutzfahrzeuge GmbH Electrical equipment of a vehicle having redundant abs and driving dynamics control
US20190302232A1 (en) * 2018-03-30 2019-10-03 Matthew Harrison Method and apparatus for object detection using a beam steering radar and a decision network
US20190310650A1 (en) * 2018-04-09 2019-10-10 SafeAI, Inc. Techniques for considering uncertainty in use of artificial intelligence models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
任立伟等: "二自由度飞行姿态模拟器的模糊强化学习控制", 《电机与控制学报》 *
熊璐 等: "无人驾驶车辆行为决策***研究", 《汽车技术》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311945A (en) * 2020-02-20 2020-06-19 南京航空航天大学 Driving decision system and method fusing vision and sensor information
CN111340234A (en) * 2020-02-27 2020-06-26 南京星火技术有限公司 Video data processing method and device, electronic equipment and computer readable medium
CN111340234B (en) * 2020-02-27 2024-01-30 南京星火技术有限公司 Video data processing method, apparatus, electronic device and computer readable medium
CN111845741A (en) * 2020-06-28 2020-10-30 江苏大学 Automatic driving decision control method and system based on hierarchical reinforcement learning
CN111845741B (en) * 2020-06-28 2021-08-03 江苏大学 Automatic driving decision control method and system based on hierarchical reinforcement learning
CN112183288A (en) * 2020-09-22 2021-01-05 上海交通大学 Multi-agent reinforcement learning method based on model
CN112733448A (en) * 2021-01-07 2021-04-30 北京理工大学 Parameter self-learning double Q table combined agent establishing method for automatic train driving system
CN112396183A (en) * 2021-01-21 2021-02-23 国汽智控(北京)科技有限公司 Method, device and equipment for automatic driving decision and computer storage medium
CN113428540A (en) * 2021-05-28 2021-09-24 南京航空航天大学 Intelligent autonomous production system of modular production line
CN113655799A (en) * 2021-08-21 2021-11-16 山东金博电动车有限公司 Low-delay remote control automatic driving device and system connection access method
CN113837063A (en) * 2021-10-15 2021-12-24 中国石油大学(华东) Curling motion field analysis and decision-making assisting method based on reinforcement learning
CN113837063B (en) * 2021-10-15 2024-05-10 中国石油大学(华东) Reinforcement learning-based curling motion field analysis and auxiliary decision-making method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207