CN107169567A - Method and apparatus for generating a decision network model for autonomous vehicle driving - Google Patents
- Publication number
- CN107169567A CN107169567A CN201710201086.8A CN201710201086A CN107169567A CN 107169567 A CN107169567 A CN 107169567A CN 201710201086 A CN201710201086 A CN 201710201086A CN 107169567 A CN107169567 A CN 107169567A
- Authority
- CN
- China
- Prior art keywords
- sample
- default
- training
- experience database
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Traffic Control Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The present invention, applicable to the field of computer technology, provides a method and apparatus for generating a decision network model for autonomous vehicle driving. The method includes: generating a sample triple for each trial time according to the vehicle state information collected at that time, a preset vehicle action set and a preset reward function; storing all sample triples as sample data in a pre-built experience database and performing cluster analysis on all the sample data; uniformly sampling training samples, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and computing the cumulative return value of each training sample; and training the decision network model for autonomous driving from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm, thereby effectively improving the training efficiency and the generalization ability of the decision network model.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a method and apparatus for generating a decision network model for autonomous vehicle driving.
Background art
With economic development and advancing urbanization, the global number of vehicles and the total road mileage keep rising, and a series of problems that conventional vehicles cannot properly solve, such as traffic congestion, accidents, pollution and scarce land resources, have become increasingly prominent. Driverless vehicle technology is regarded as an effective solution to these problems, and its development has attracted wide attention.
A driverless vehicle travels on the road under its own automatic driving system without a human driver and possesses environment-perception capability. At present, the control methods of such driving systems are mainly rule-based control decisions, that is, expert decision systems that map situations to output control decisions are built from known driving experience. Shallow learning methods such as expert rule systems can be regarded as processes of discovering rules from labeled data; when the rules are hard to abstract into formulas or simple logic, shallow learning fails to work.
With the rapid development of deep reinforcement learning, several research institutions have proposed "end-to-end" automatic driving algorithms, in which the control decision model of the driving system is built as a deep network: the network's input is status data such as camera images, lidar readings, GPS position and speed, and its output serves directly as the actuation signal controlling the vehicle. This approach requires no rule-based interpretation of the vehicle state, but training the deep network usually requires a large number of data samples. As the dimensionality of vehicle sensor data and the complexity of the network structure grow, the computational cost of model training increases sharply, and this huge consumption of computing resources is regarded as a major obstacle to training deep network models.
Summary of the invention
It is an object of the present invention to provide a method and apparatus for generating a decision network model for autonomous vehicle driving, aiming to solve the problems that training the decision model for autonomous driving is inefficient and that the learning ability of the vehicle decision model is weak, so that it cannot adapt well to different routes and scenes.
In one aspect, the invention provides a method for generating a decision network model for autonomous vehicle driving, the method comprising the steps of:
generating a sample triple for each trial time according to the vehicle state information collected at that trial time, a preset vehicle action set and a preset reward function;
storing all sample triples as sample data in a pre-built experience database, and performing cluster analysis on all sample data in the experience database;
uniformly sampling training samples, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and computing the cumulative return value of each training sample; and
training the decision network model for autonomous driving of the current vehicle from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm.
In another aspect, the invention provides an apparatus for generating a decision network model for autonomous vehicle driving, the apparatus comprising:
a sample generation module for generating a sample triple for each trial time according to the vehicle state information collected at that trial time, a preset vehicle action set and a preset reward function;
a cluster analysis module for storing all sample triples as sample data in a pre-built experience database and performing cluster analysis on all sample data in the experience database;
a training sampling module for uniformly sampling training samples, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and computing the cumulative return value of each training sample; and
a model generation module for training the decision network model for autonomous driving of the current vehicle from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm.
According to the present invention, the sample data in the experience database, i.e. the sample triples, are obtained from the collected vehicle state information, the vehicle action set and the reward function; the sample data are classified by cluster analysis and uniformly sampled in proportion from each cluster of the classification, yielding the training samples of the deep network, from which the decision network model for autonomous driving is finally trained. The cluster analysis thus condenses the training samples, and return values together with deep learning train the decision network model, so that the decision network model can be trained quickly on the condensed data set while the learning ability and generalization ability of the decision network model in the driving loop are effectively improved.
Brief description of the drawings
Fig. 1 is a flow chart of the implementation of the method for generating a decision network model for autonomous vehicle driving provided by embodiment one of the present invention;
Fig. 2 is a schematic structural diagram of the apparatus for generating a decision network model for autonomous vehicle driving provided by embodiment two of the present invention; and
Fig. 3 is a schematic structural diagram of the apparatus for generating a decision network model for autonomous vehicle driving provided by embodiment three of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only serve to illustrate the invention and are not intended to limit it.
The implementation of the invention is described in detail below with reference to specific embodiments:
Embodiment one:
Fig. 1 shows the implementation flow of the method for generating a decision network model for autonomous vehicle driving provided by embodiment one of the present invention; for convenience of description, only the parts related to the embodiment of the invention are shown, detailed as follows:
In step S101, a sample triple is generated for each trial time according to the vehicle state information collected at that trial time, a preset vehicle action set and a preset reward function.
The invention is applied to an interaction platform built on a racing simulation platform or race simulator (e.g. TORCS, The Open Racing Car Simulator), on which driving interaction trials of the driverless vehicle are carried out. During the current interaction trial, vehicle state information is collected by multiple preset sensors on the vehicle; it may include the vehicle's distance from the road centerline, the angle between the vehicle's heading and the road tangent, the readings of the vehicle's front laser rangefinders, the vehicle's velocity component along the road tangent, and so on. Except at the initial trial time, the vehicle state at each trial time is the result, or a function, of the vehicle state and vehicle action at the previous time; for example, if St denotes the vehicle state information at trial time t, then the vehicle state information at trial time t+1 is St+1 = f(St, at) = f(f(St-1, at-1)) = ..., where at is the vehicle action information at trial time t. Specifically, vehicle actions may include going straight, braking, and so on.
In the embodiment of the invention, after the vehicle state information of the current trial time is collected, the preset vehicle action set is traversed according to the preset reward function to find the vehicle action that yields the maximum reward value, and this action is sent to the vehicle; for ease of distinction, this vehicle action is called the maximum-reward action. The vehicle state information, the maximum-reward action and the return value of the maximum-reward action are combined into a sample triple; for example, the sample triple at trial time t can be expressed as (St, at, rt), where rt is the return value of the maximum-reward action at trial time t.
As an illustration, when the current driving position of the vehicle is considered, the formula of the reward function can be:
r = Δdis · cos(α · angle) · sgn(trackPos − threshold),
where r is the return value of the reward function, Δdis is the distance covered by the vehicle between adjacent trial times, α is a preset weight scaling factor, angle is the angle between the vehicle's current heading and the road tangent, trackPos is the vehicle's distance from the road centerline, and threshold is a preset threshold value. When trackPos exceeds threshold, r is negative infinity, which can represent a penalty for the vehicle coming too close to the road boundary. In addition, the reward function may also take into account travel speed, fuel consumption, smoothness, and so on.
In step S102, all sample triples are stored as sample data in a pre-built experience database, and cluster analysis is performed on all sample data in the experience database.
In the embodiment of the invention, at the end of the current interaction trial, the sample triples corresponding to every trial time during the interaction trial are stored as sample data in the experience database. The class of each sample datum (i.e. each sample triple) is computed by an initialized classification model, so that all sample data are assigned to the corresponding clusters. The cluster analysis thereby uncovers some intrinsic properties or regularities of the samples in the experience database, reducing the dimensionality or quantity of the training samples of the decision network model and achieving the purpose of condensing the training samples.
In the embodiment of the invention, when the classification model is initialized, driving data of professional drivers collected during driving, which may be called professional driving data, are used; the professional driving data consist of "state-action" 2-tuples composed of the vehicle state information and vehicle action information recorded while a professional driver drives. Cluster analysis is performed on the professional driving data by a preset clustering algorithm to initialize the classification model. The clustering algorithm can be K-means, principal component analysis (PCA), or a similar algorithm.
As an illustration, when the classification model is initialized with the K-means algorithm, multiple cluster centres are randomly selected from the professional driving data, the class of each "state-action" 2-tuple in the professional driving data is computed, and the cluster centre of each class is updated. The formula for computing the class can be:

c(i) = argmin over j of ||x(i) − u_j||²,

where x(i) is the i-th "state-action" 2-tuple in the professional driving data and u_j is the j-th cluster centre. The formula for updating the cluster centre of each class can be:

u_j = Σ_i 1{c(i) = j} · x(i) / Σ_i 1{c(i) = j}.
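A minimal K-means sketch matching the two formulas above; using the first k points as initial centres and a fixed iteration count are illustrative choices the patent does not specify.

```python
def kmeans(points, k, iters=20):
    """K-means over 'state-action' tuples encoded as numeric vectors:
    assign each x(i) to the nearest centre u_j, then recompute each
    centre as the mean of its members."""
    centers = list(points[:k])  # illustrative deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k),
                    key=lambda m: sum((a - b) ** 2 for a, b in zip(x, centers[m])))
            clusters[j].append(x)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, clusters
```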
Preferably, whether the current interaction trial has ended is determined by detecting whether an accident occurs to the vehicle during the driving of the interaction trial or whether the vehicle completes the preset trial driving task; when an accident occurs during driving or the preset trial driving task is completed, the current interaction trial is determined to have ended, yielding a series of temporally continuous sample triples. Specifically, accidents during driving may include leaving the road, colliding, running out of fuel, and so on.
In step S103, training samples are uniformly sampled, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and the cumulative return value of each training sample is computed.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps the training samples independently and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. From the vehicle state information, the maximum-reward action and the maximum reward value in each training sample, the corresponding cumulative return value can be computed; since each vehicle state is the result of the previous vehicle state and action, the optimal policy can be determined through the cumulative return value. The cumulative return value is also the output of the policy network model.
Specifically, the cumulative return value Q(st, at) can be computed as r0 + γr1 + γ²r2 + ..., where γ is a preset parameter and 0 ≤ γ < 1.
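The uniform per-cluster sampling and the cumulative return Q(st, at) = r0 + γr1 + γ²r2 + ... can be sketched as below; interpreting the preset sampling-ratio value as a fraction of each cluster is an assumption.

```python
import random

def sample_training_set(clusters, ratio, seed=0):
    """Draw the same preset fraction of samples uniformly from every
    cluster, keeping the selection balanced across clusters."""
    rng = random.Random(seed)
    picked = []
    for cluster in clusters:
        n = max(1, int(len(cluster) * ratio))
        picked.extend(rng.sample(cluster, n))
    return picked

def cumulative_return(rewards, gamma=0.9):
    """Q = r0 + gamma*r1 + gamma^2*r2 + ..., with 0 <= gamma < 1."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))
```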
In step S104, the decision network model for autonomous driving of the current vehicle is obtained by training from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information St and vehicle action information at in each training sample, together with the cumulative return value Q of the training sample, are stored into a preset data set, and the decision network model for autonomous driving is trained from this data set with the preset deep learning algorithm. The deep learning algorithm can be resilient backpropagation (Rprop), the backpropagation algorithm, long short-term memory (LSTM), or another deep learning algorithm.
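As a stand-in for the deep-learning step, the sketch below fits a linear model mapping sample features to the cumulative return Q by stochastic gradient descent; the patent names Rprop, backpropagation and LSTM, so this is only an illustration of the (sample → Q) regression, not the patented network.

```python
def train_decision_model(samples, lr=0.05, epochs=2000):
    """Fit weights w and bias b so that w.x + b approximates the
    cumulative return Q for each (feature vector, Q) training sample."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, q in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - q
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```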
Preferably, interaction trials are performed repeatedly and the decision network model is trained multiple times; after each training round, whether the experience database satisfies a preset constraint condition is checked, and when it is not satisfied, sample data in the experience database that do not meet the requirements are rejected, so that the experience data are updated in time and the quality of the training samples is improved.
Specifically, the constraint condition can be len(DSh) < μrms && num(DSh) < Knum, where DSh is the experience database, len(·) computes the number of sample data in the experience database, num(·) counts the number of interaction trials, μrms is the maximum quantity of sample data, and Knum is the maximum number of interaction trials.
Specifically, whether a sample datum meets the requirements can be judged from the time gap between adjacent sample data in the experience data: when the time gap between the current sample datum and the previous one is smaller than a preset distance threshold, the current sample datum is determined not to meet the requirements.
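The constraint check and the time-gap culling rule can be sketched as follows; the concrete limits and the (timestamp, triple) record layout are assumptions for illustration.

```python
def update_experience(db, trial_count, max_samples=10000, max_trials=100, min_gap=0.1):
    """If len(DSh) < u_rms and num(DSh) < K_num the constraint holds and
    the database is kept as-is; otherwise samples whose time gap from
    the previously kept sample is below the preset threshold are
    rejected. Each db entry is assumed to be (timestamp, triple)."""
    if len(db) < max_samples and trial_count < max_trials:
        return db
    kept = db[:1]
    for entry in db[1:]:
        if entry[0] - kept[-1][0] >= min_gap:
            kept.append(entry)
    return kept
```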
In the embodiment of the invention, the vehicle state information at different trial times, the corresponding maximum-reward actions and the return values of those actions form the sample data in the experience database; cluster analysis is performed on all sample data, each resulting cluster is uniformly sampled to obtain the training samples for training the autonomous-driving policy network model, and the experience database is updated according to the constraint condition after each training round to reject sample data that do not meet the requirements. The representativeness of the training samples is thereby improved and their dimensionality reduced, and training the decision network model with rewards and deep learning effectively improves the training efficiency, learning ability and generalization ability of the decision network model.
One of ordinary skill in the art will appreciate that all or part of the steps in the method of the above embodiment can be completed by instructing related hardware through a program; the program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc.
Embodiment two:
Fig. 2 shows the structure of the apparatus for generating a decision network model for autonomous vehicle driving provided by embodiment two of the present invention; for convenience of description, only the parts related to the embodiment of the invention are shown, including:
a sample generation module 21 for generating a sample triple for each trial time according to the vehicle state information collected at that trial time, a preset vehicle action set and a preset reward function.
In the embodiment of the invention, after the vehicle state information of the current trial time is collected, the preset vehicle action set is traversed according to the preset reward function to find the action yielding the maximum reward value, i.e. the maximum-reward action, and the vehicle state information, the maximum-reward action and the return value of that action are combined into a sample triple.
a cluster analysis module 22 for storing all sample triples as sample data in a pre-built experience database and performing cluster analysis on all sample data in the experience database.
In the embodiment of the invention, the class of each sample datum (i.e. each sample triple) is computed by a classification model initialized with a preset clustering algorithm and professional driving data, so that all sample data are assigned to the corresponding clusters; the cluster analysis thereby uncovers some intrinsic properties or regularities of the samples in the experience database, reducing the dimensionality or quantity of the training samples of the decision network model and achieving the purpose of condensing the training samples.
a training sampling module 23 for uniformly sampling training samples, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and computing the cumulative return value of each training sample.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps them independently and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. From the vehicle state information, the maximum-reward action and the maximum reward value in each training sample, the corresponding cumulative return value can be computed; specifically, the cumulative return value Q(st, at) can be computed as r0 + γr1 + γ²r2 + ..., where γ is a preset parameter and 0 ≤ γ < 1.
a model generation module 24 for training the decision network model for autonomous driving of the current vehicle from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information St and vehicle action information at in each training sample, together with the cumulative return value Q of the training sample, are stored into a preset data set, and the decision network model for autonomous driving is trained from this data set with the preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information at different trial times, the corresponding maximum-reward actions and the return values of those actions form the sample data in the experience database; representative training samples are chosen from the sample data by cluster analysis and uniform sampling, and the decision network model is obtained by training these samples with rewards and deep learning, so that the condensation of the training samples together with rewards and deep learning effectively improves the training efficiency, learning ability and generalization ability of the decision network model.
Embodiment three:
Fig. 3 shows the structure of the apparatus for generating a decision network model for autonomous vehicle driving provided by embodiment three of the present invention, including:
a sample generation module 31 for generating a sample triple for each trial time according to the vehicle state information collected at that trial time, a preset vehicle action set and a preset reward function.
In the embodiment of the invention, after the vehicle state information of the current trial time is collected, the preset vehicle action set is traversed according to the preset reward function to find the action yielding the maximum reward value, i.e. the maximum-reward action, and the vehicle state information, the maximum-reward action and the return value of that action are combined into a sample triple.
a trial termination module 32 for ending the current interaction trial when it is detected that an accident occurs to the vehicle during trial driving or that the vehicle completes the preset trial driving task, and obtaining the sample triple of each trial time in the interaction trial.
In the embodiment of the invention, accidents during driving may include leaving the road, colliding, running out of fuel, and so on.
a cluster analysis module 33 for storing all sample triples as sample data in a pre-built experience database and performing cluster analysis on all sample data in the experience database.
In the embodiment of the invention, the class of each sample datum (i.e. each sample triple) is computed by a classification model initialized with a preset clustering algorithm and professional driving data, so that all sample data are assigned to the corresponding clusters; the cluster analysis thereby uncovers some intrinsic properties or regularities of the samples in the experience database, reducing the dimensionality or quantity of the training samples of the decision network model and achieving the purpose of condensing the training samples.
a training sampling module 34 for uniformly sampling training samples, according to a preset sampling-ratio value, from each cluster obtained by the cluster analysis of the experience database, and computing the cumulative return value of each training sample.
In the embodiment of the invention, uniform sampling within each cluster of the experience database selects representative training samples and keeps them independently and identically distributed, so that training the policy network model with these samples effectively improves its training efficiency. From the vehicle state information, the maximum-reward action and the maximum reward value in each training sample, the corresponding cumulative return value can be computed.
a model generation module 35 for training the decision network model for autonomous driving of the current vehicle from all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm.
In the embodiment of the invention, the vehicle state information St and vehicle action information at in each training sample, together with the cumulative return value Q of the training sample, are stored into a preset data set, and the decision network model for autonomous driving is trained from this data set with the preset deep learning algorithm.
an experience update module 36 for detecting whether the experience database satisfies a preset constraint condition, and updating the sample data in the experience database when the experience database does not satisfy the constraint condition.
In the embodiment of the invention, interaction trials are performed repeatedly and the decision network model is trained multiple times; after each training round, whether the experience database satisfies the preset constraint condition is checked, and when it is not satisfied, sample data that do not meet the requirements are rejected from the experience data, so that the experience data are updated in time and the quality of the training samples is improved.
Specifically, the constraint condition can be len(DSh) < μrms && num(DSh) < Knum, where DSh is the experience database, len(·) computes the number of sample data in the experience database, num(·) counts the number of interaction trials, μrms is the maximum quantity of sample data, and Knum is the maximum number of interaction trials.
Specifically, whether a sample datum meets the requirements can be judged from the time gap between adjacent sample data in the experience data: when the time gap between the current sample datum and the previous one is smaller than a preset distance threshold, the current sample datum is determined not to meet the requirements.
Preferably, the sample generation module 31 includes a vehicle state collection module 311, a vehicle action search module 312 and a sample generation submodule 313, wherein:
the vehicle state collection module 311 is used to collect the vehicle state information of the current trial time;
the vehicle action search module 312 is used to search the vehicle action set for the maximum-reward action according to the vehicle state information of the current trial time and the reward function, and to order the vehicle to perform the maximum-reward action; and
the sample generation submodule 313 is used to combine the vehicle state information of the current trial time, the maximum-reward action and the return value of the maximum-reward action into the sample triple of the current trial time.
Preferably, the cluster analysis module 33 includes a classification model initialization module 331 and a class generation module 332, wherein:
the classification model initialization module 331 is used to initialize, according to a preset clustering algorithm and pre-collected professional driving data, the classification model for performing cluster analysis on the experience database; and
the class generation module 332 is used to compute, through the classification model, the class to which the vehicle state information in each sample datum of the experience database belongs, so as to obtain the classes of all sample data in the experience database.
In the embodiment of the invention, the vehicle state information at different trial times, the corresponding maximum-reward actions and the return values of those actions form the sample data in the experience database; representative training samples are chosen from the sample data by cluster analysis and uniform sampling, the decision network model is obtained by training these samples with rewards and deep learning, multiple trials and training rounds are performed on the decision network model, and the experience database is updated after each training round, so that the condensation of the training samples together with rewards and deep learning effectively improves the training efficiency, learning ability and generalization ability of the decision network model.
In the embodiment of the invention, each module of the apparatus for generating a decision network model for autonomous vehicle driving can be realized by a corresponding hardware or software module; the modules can be independent software or hardware modules, or can be integrated into one software or hardware module, which is not intended to limit the invention.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included in the scope of protection of the invention.
Claims (10)
1. A method for generating a decision network model for automatic vehicle driving, characterized in that the method comprises the following steps:
generating a sample triple corresponding to each test moment according to vehicle state information collected at each test moment, a preset vehicle action set and a preset reward function;
storing all sample triples as sample data in a pre-built experience database, and performing cluster analysis on all the sample data in the experience database;
uniformly collecting training samples, according to a preset sampling ratio, from each cluster obtained by the cluster analysis of the experience database, and calculating a cumulative return value corresponding to each training sample;
training, according to all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm, to obtain a decision network model for automatic driving of the current vehicle.
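The per-cluster sampling and return-calculation steps of claim 1 might look like the following sketch. The helper names, the discount factor `gamma` and discounted-return formulation are illustrative assumptions; the patent does not specify how the cumulative return value is computed:

```python
import random

def sample_training_set(clusters, ratio, seed=0):
    """Uniformly draw the same fraction of samples from every cluster,
    so that rare driving situations are not under-represented."""
    rng = random.Random(seed)
    picked = []
    for cluster in clusters:
        k = max(1, int(len(cluster) * ratio))
        picked.extend(rng.sample(cluster, k))
    return picked

def cumulative_return(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards
    over one episode's reward sequence."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))
```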
2. The method according to claim 1, characterized in that, after the step of training to obtain the decision network model for automatic vehicle driving, the method further comprises:
detecting whether the experience database satisfies a preset constraint condition, and updating the sample data in the experience database when the experience database does not satisfy the constraint condition.
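The constraint condition in claim 2 is left open; one plausible reading is a capacity bound with oldest-first eviction, sketched below. The class name, the capacity constraint and the FIFO update policy are all assumptions:

```python
from collections import deque

class ExperienceDatabase:
    """Experience store with a maximum-size constraint (one possible
    reading of the claim's unspecified constraint condition)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = deque()

    def add(self, triple):
        self.samples.append(triple)

    def satisfies_constraint(self):
        return len(self.samples) <= self.capacity

    def update(self):
        # Drop the oldest samples until the constraint holds again.
        while not self.satisfies_constraint():
            self.samples.popleft()
```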
3. The method according to claim 1, characterized in that the step of generating a sample triple corresponding to each test moment according to the vehicle state information collected at each test moment, the preset vehicle action set and the preset reward function comprises:
collecting the vehicle state information at a current test moment;
searching the vehicle action set for a maximum-reward-value action according to the vehicle state information at the current test moment and the reward function, and sending the maximum-reward-value action to the vehicle;
combining the vehicle state information at the current test moment, the maximum-reward-value action and the reward value of the maximum-reward-value action to obtain the sample triple corresponding to the current test moment.
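The maximum-reward-value lookup in claim 3 amounts to a greedy argmax over the action set. A minimal sketch, with `reward_fn` standing in for the preset reward function (its signature is an assumption):

```python
def best_action(state, actions, reward_fn):
    """Evaluate the reward function for every action in the action set
    and return the highest-scoring action together with its reward value;
    with the state, these form the (state, action, reward) sample triple."""
    best_a = max(actions, key=lambda a: reward_fn(state, a))
    return best_a, reward_fn(state, best_a)
```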
4. The method according to claim 1, characterized in that, after the step of generating the sample triple corresponding to each test moment and before the step of storing all sample triples as sample data in the pre-built experience database, the method further comprises:
terminating the current interaction test when it is detected that the vehicle has an accident during the test drive or has completed a preset test-drive task, and obtaining the sample triple corresponding to each test moment of the interaction test.
5. The method according to claim 1, characterized in that the step of performing cluster analysis on all the sample data in the experience database comprises:
initializing, according to a preset clustering algorithm and professional driving data collected in advance, a classification model for performing cluster analysis on the experience database;
calculating, by means of the classification model, the class to which the vehicle state information in each sample data item of the experience database belongs, so as to obtain the classes of all the sample data in the experience database.
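The classification model of claim 5 could be seeded from the professional driving data as one centroid per hand-labelled group, with samples assigned to the nearest centroid. This toy one-dimensional version is illustrative only: real vehicle states would be feature vectors, and the patent's preset clustering algorithm is unspecified:

```python
def fit_centroids(groups):
    """One centroid per group of professional driving states
    (mean of each hand-labelled group; 1-D here for brevity)."""
    return [sum(g) / len(g) for g in groups]

def classify(centroids, sample_states):
    """Assign each sample's state to the class of its nearest centroid."""
    def nearest(x):
        return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
    return [nearest(s) for s in sample_states]
```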
6. An apparatus for generating a decision network model for automatic vehicle driving, characterized in that the apparatus comprises:
a sample generation module, configured to generate a sample triple corresponding to each test moment according to vehicle state information collected at each test moment, a preset vehicle action set and a preset reward function;
a cluster analysis module, configured to store all sample triples as sample data in a pre-built experience database, and to perform cluster analysis on all the sample data in the experience database;
a training sampling module, configured to uniformly collect training samples, according to a preset sampling ratio, from each cluster obtained by the cluster analysis of the experience database, and to calculate a cumulative return value corresponding to each training sample; and
a model generation module, configured to train, according to all the training samples, the cumulative return value of each training sample and a preset deep learning algorithm, to obtain a decision network model for automatic driving of the current vehicle.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
an experience update module, configured to detect whether the experience database satisfies a preset constraint condition, and to update the sample data in the experience database when the experience database does not satisfy the constraint condition.
8. The apparatus according to claim 6, characterized in that the sample generation module comprises:
a vehicle state collection module, configured to collect the vehicle state information at a current test moment;
a vehicle action search module, configured to search the vehicle action set for a maximum-reward-value action according to the vehicle state information at the current test moment and the reward function, and to order the vehicle to execute the maximum-reward-value action; and
a sample generation submodule, configured to combine the vehicle state information at the current test moment, the maximum-reward-value action and the reward value of the maximum-reward-value action to obtain the sample triple corresponding to the current test moment.
9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a test termination module, configured to terminate the current interaction test when it is detected that the vehicle has an accident during the test drive or has completed a preset test-drive task, and to obtain the sample triple corresponding to each test moment of the interaction test.
10. The apparatus according to claim 6, characterized in that the cluster analysis module comprises:
a classification model initialization module, configured to initialize, according to a preset clustering algorithm and professional driving data collected in advance, a classification model for performing cluster analysis on the experience database; and
a classification generation module, configured to calculate, by means of the classification model, the class to which the vehicle state information in each sample data item of the experience database belongs, so as to obtain the classes of all the sample data in the experience database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710201086.8A CN107169567B (en) | 2017-03-30 | 2017-03-30 | Method and device for generating decision network model for automatic vehicle driving |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107169567A true CN107169567A (en) | 2017-09-15 |
CN107169567B CN107169567B (en) | 2020-04-07 |
Family
ID=59849244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710201086.8A Active CN107169567B (en) | 2017-03-30 | 2017-03-30 | Method and device for generating decision network model for automatic vehicle driving |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169567B (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107544516A (en) * | 2017-10-11 | 2018-01-05 | 苏州大学 | Automated driving system and method based on relative entropy depth against intensified learning |
CN107826105A (en) * | 2017-10-31 | 2018-03-23 | 清华大学 | Translucent automatic Pilot artificial intelligence system and vehicle |
CN107862346A (en) * | 2017-12-01 | 2018-03-30 | 驭势科技(北京)有限公司 | A kind of method and apparatus for carrying out driving strategy model training |
CN108647789A (en) * | 2018-05-15 | 2018-10-12 | 浙江大学 | A kind of intelligent body deep value function learning method based on the sampling of state distributed awareness |
CN108932840A (en) * | 2018-07-17 | 2018-12-04 | 北京理工大学 | Automatic driving vehicle urban intersection passing method based on intensified learning |
CN108944944A (en) * | 2018-07-09 | 2018-12-07 | 深圳市易成自动驾驶技术有限公司 | Automatic Pilot model training method, terminal and readable storage medium storing program for executing |
CN108995655A (en) * | 2018-07-06 | 2018-12-14 | 北京理工大学 | A kind of driver's driving intention recognition methods and system |
CN109344969A (en) * | 2018-11-01 | 2019-02-15 | 石家庄创天电子科技有限公司 | Nerve network system and its training method and computer-readable medium |
CN109739216A (en) * | 2019-01-25 | 2019-05-10 | 深圳普思英察科技有限公司 | The test method and system of the practical drive test of automated driving system |
CN109747655A (en) * | 2017-11-07 | 2019-05-14 | 北京京东尚科信息技术有限公司 | Steering instructions generation method and device for automatic driving vehicle |
CN109752952A (en) * | 2017-11-08 | 2019-05-14 | 华为技术有限公司 | Method and device for acquiring multi-dimensional random distribution and strengthening controller |
CN109871010A (en) * | 2018-12-25 | 2019-06-11 | 南方科技大学 | method and system based on reinforcement learning |
CN109901446A (en) * | 2017-12-08 | 2019-06-18 | 广州汽车集团股份有限公司 | Controlling passing of road junction, apparatus and system |
CN109934171A (en) * | 2019-03-14 | 2019-06-25 | 合肥工业大学 | Driver's passiveness driving condition online awareness method based on layered network model |
CN110160804A (en) * | 2019-05-31 | 2019-08-23 | 中国科学院深圳先进技术研究院 | A kind of test method of automatic driving vehicle, apparatus and system |
CN110196587A (en) * | 2018-02-27 | 2019-09-03 | 中国科学院深圳先进技术研究院 | Vehicular automatic driving control strategy model generating method, device, equipment and medium |
CN110363295A (en) * | 2019-06-28 | 2019-10-22 | 电子科技大学 | A kind of intelligent vehicle multilane lane-change method based on DQN |
CN110378460A (en) * | 2018-04-13 | 2019-10-25 | 北京智行者科技有限公司 | Decision-making technique |
CN110478911A (en) * | 2019-08-13 | 2019-11-22 | 苏州钛智智能科技有限公司 | The unmanned method of intelligent game vehicle and intelligent vehicle, equipment based on machine learning |
CN110673602A (en) * | 2019-10-24 | 2020-01-10 | 驭势科技(北京)有限公司 | Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment |
CN110738221A (en) * | 2018-07-18 | 2020-01-31 | 华为技术有限公司 | operation system and method |
CN110764496A (en) * | 2018-07-09 | 2020-02-07 | 株式会社日立制作所 | Automatic driving assistance device and method thereof |
CN110824912A (en) * | 2018-08-08 | 2020-02-21 | 华为技术有限公司 | Method and apparatus for training a control strategy model for generating an autonomous driving strategy |
CN110850861A (en) * | 2018-07-27 | 2020-02-28 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane change depth reinforcement learning |
CN110991095A (en) * | 2020-03-05 | 2020-04-10 | 北京三快在线科技有限公司 | Training method and device for vehicle driving decision model |
CN111091020A (en) * | 2018-10-22 | 2020-05-01 | 百度在线网络技术(北京)有限公司 | Automatic driving state distinguishing method and device |
CN111325230A (en) * | 2018-12-17 | 2020-06-23 | 上海汽车集团股份有限公司 | Online learning method and online learning device of vehicle lane change decision model |
CN111426933A (en) * | 2020-05-19 | 2020-07-17 | 浙江巨磁智能技术有限公司 | Safety type power electronic module and safety detection method thereof |
CN111443621A (en) * | 2020-06-16 | 2020-07-24 | 深圳市城市交通规划设计研究中心股份有限公司 | Model generation method, model generation device and electronic equipment |
CN111899594A (en) * | 2019-05-06 | 2020-11-06 | 百度(美国)有限责任公司 | Automated training data extraction method for dynamic models of autonomous vehicles |
CN112100787A (en) * | 2019-05-28 | 2020-12-18 | 顺丰科技有限公司 | Vehicle motion prediction method, device, electronic device, and storage medium |
CN112201070A (en) * | 2020-09-29 | 2021-01-08 | 上海交通大学 | Deep learning-based automatic driving expressway bottleneck section behavior decision method |
CN112924177A (en) * | 2021-04-02 | 2021-06-08 | 哈尔滨理工大学 | Rolling bearing fault diagnosis method for improved deep Q network |
CN113511222A (en) * | 2021-08-27 | 2021-10-19 | 清华大学 | Scene self-adaptive vehicle interactive behavior decision and prediction method and device |
CN113807503A (en) * | 2021-09-28 | 2021-12-17 | 中国科学技术大学先进技术研究院 | Autonomous decision making method, system, device and terminal suitable for intelligent automobile |
CN114332500A (en) * | 2021-09-14 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Image processing model training method and device, computer equipment and storage medium |
CN114624645A (en) * | 2022-03-10 | 2022-06-14 | 扬州宇安电子科技有限公司 | Miniature rotor unmanned aerial vehicle radar reconnaissance system based on micro antenna array |
CN114880938A (en) * | 2022-05-16 | 2022-08-09 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN116757272A (en) * | 2023-07-03 | 2023-09-15 | 西湖大学 | Continuous motion control reinforcement learning framework and learning method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102109821A (en) * | 2010-12-30 | 2011-06-29 | 中国科学院自动化研究所 | System and method for controlling adaptive cruise of vehicles |
CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive cruise control method based on approximate policy iteration |
CN105109485A (en) * | 2015-08-24 | 2015-12-02 | 奇瑞汽车股份有限公司 | Driving method and system |
CN106295637A (en) * | 2016-07-29 | 2017-01-04 | 电子科技大学 | A kind of vehicle identification method based on degree of depth study with intensified learning |
Non-Patent Citations (2)
Title |
---|
WEI XIA ET AL: "A Control Strategy of Autonomous Vehicles Based on Deep Reinforcement Learning", 《2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN》 * |
MAO, Zhe: "Research on Identification Methods for Motor-Vehicle Fatigue Driving Behavior", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107544516A (en) * | 2017-10-11 | 2018-01-05 | 苏州大学 | Automated driving system and method based on relative entropy depth against intensified learning |
CN107826105A (en) * | 2017-10-31 | 2018-03-23 | 清华大学 | Translucent automatic Pilot artificial intelligence system and vehicle |
CN109747655A (en) * | 2017-11-07 | 2019-05-14 | 北京京东尚科信息技术有限公司 | Steering instructions generation method and device for automatic driving vehicle |
CN109747655B (en) * | 2017-11-07 | 2021-10-15 | 北京京东乾石科技有限公司 | Driving instruction generation method and device for automatic driving vehicle |
CN109752952B (en) * | 2017-11-08 | 2022-05-13 | 华为技术有限公司 | Method and device for acquiring multi-dimensional random distribution and strengthening controller |
CN109752952A (en) * | 2017-11-08 | 2019-05-14 | 华为技术有限公司 | Method and device for acquiring multi-dimensional random distribution and strengthening controller |
CN107862346A (en) * | 2017-12-01 | 2018-03-30 | 驭势科技(北京)有限公司 | A kind of method and apparatus for carrying out driving strategy model training |
CN107862346B (en) * | 2017-12-01 | 2020-06-30 | 驭势科技(北京)有限公司 | Method and equipment for training driving strategy model |
CN109901446A (en) * | 2017-12-08 | 2019-06-18 | 广州汽车集团股份有限公司 | Controlling passing of road junction, apparatus and system |
US11348455B2 (en) | 2017-12-08 | 2022-05-31 | Guangzhou Automobile Group Co., Ltd. | Intersection traffic control method, apparatus and system |
CN109901446B (en) * | 2017-12-08 | 2020-07-07 | 广州汽车集团股份有限公司 | Intersection passage control method, device and system |
CN110196587A (en) * | 2018-02-27 | 2019-09-03 | 中国科学院深圳先进技术研究院 | Vehicular automatic driving control strategy model generating method, device, equipment and medium |
CN110378460B (en) * | 2018-04-13 | 2022-03-08 | 北京智行者科技有限公司 | Decision making method |
CN110378460A (en) * | 2018-04-13 | 2019-10-25 | 北京智行者科技有限公司 | Decision-making technique |
CN108647789B (en) * | 2018-05-15 | 2022-04-19 | 浙江大学 | Intelligent body depth value function learning method based on state distribution sensing sampling |
CN108647789A (en) * | 2018-05-15 | 2018-10-12 | 浙江大学 | A kind of intelligent body deep value function learning method based on the sampling of state distributed awareness |
CN108995655A (en) * | 2018-07-06 | 2018-12-14 | 北京理工大学 | A kind of driver's driving intention recognition methods and system |
CN110764496A (en) * | 2018-07-09 | 2020-02-07 | 株式会社日立制作所 | Automatic driving assistance device and method thereof |
CN110764496B (en) * | 2018-07-09 | 2023-10-17 | 株式会社日立制作所 | Automatic driving assistance device and method thereof |
CN108944944A (en) * | 2018-07-09 | 2018-12-07 | 深圳市易成自动驾驶技术有限公司 | Automatic Pilot model training method, terminal and readable storage medium storing program for executing |
CN108932840A (en) * | 2018-07-17 | 2018-12-04 | 北京理工大学 | Automatic driving vehicle urban intersection passing method based on intensified learning |
CN110738221B (en) * | 2018-07-18 | 2024-04-26 | 华为技术有限公司 | Computing system and method |
CN110738221A (en) * | 2018-07-18 | 2020-01-31 | 华为技术有限公司 | operation system and method |
CN110850861A (en) * | 2018-07-27 | 2020-02-28 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane change depth reinforcement learning |
CN110850861B (en) * | 2018-07-27 | 2023-05-23 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane-changing depth reinforcement learning |
CN110824912A (en) * | 2018-08-08 | 2020-02-21 | 华为技术有限公司 | Method and apparatus for training a control strategy model for generating an autonomous driving strategy |
CN111091020A (en) * | 2018-10-22 | 2020-05-01 | 百度在线网络技术(北京)有限公司 | Automatic driving state distinguishing method and device |
CN109344969B (en) * | 2018-11-01 | 2022-04-08 | 石家庄创天电子科技有限公司 | Neural network system, training method thereof, and computer-readable medium |
CN109344969A (en) * | 2018-11-01 | 2019-02-15 | 石家庄创天电子科技有限公司 | Nerve network system and its training method and computer-readable medium |
CN111325230A (en) * | 2018-12-17 | 2020-06-23 | 上海汽车集团股份有限公司 | Online learning method and online learning device of vehicle lane change decision model |
CN111325230B (en) * | 2018-12-17 | 2023-09-12 | 上海汽车集团股份有限公司 | Online learning method and online learning device for vehicle lane change decision model |
CN109871010A (en) * | 2018-12-25 | 2019-06-11 | 南方科技大学 | method and system based on reinforcement learning |
CN109739216A (en) * | 2019-01-25 | 2019-05-10 | 深圳普思英察科技有限公司 | The test method and system of the practical drive test of automated driving system |
CN109934171B (en) * | 2019-03-14 | 2020-03-17 | 合肥工业大学 | Online perception method for passive driving state of driver based on hierarchical network model |
CN109934171A (en) * | 2019-03-14 | 2019-06-25 | 合肥工业大学 | Driver's passiveness driving condition online awareness method based on layered network model |
US11704554B2 (en) | 2019-05-06 | 2023-07-18 | Baidu Usa Llc | Automated training data extraction method for dynamic models for autonomous driving vehicles |
CN111899594A (en) * | 2019-05-06 | 2020-11-06 | 百度(美国)有限责任公司 | Automated training data extraction method for dynamic models of autonomous vehicles |
CN112100787B (en) * | 2019-05-28 | 2023-12-08 | 深圳市丰驰顺行信息技术有限公司 | Vehicle motion prediction method, device, electronic equipment and storage medium |
CN112100787A (en) * | 2019-05-28 | 2020-12-18 | 顺丰科技有限公司 | Vehicle motion prediction method, device, electronic device, and storage medium |
CN110160804A (en) * | 2019-05-31 | 2019-08-23 | 中国科学院深圳先进技术研究院 | A kind of test method of automatic driving vehicle, apparatus and system |
CN110160804B (en) * | 2019-05-31 | 2020-07-31 | 中国科学院深圳先进技术研究院 | Test method, device and system for automatically driving vehicle |
CN110363295A (en) * | 2019-06-28 | 2019-10-22 | 电子科技大学 | A kind of intelligent vehicle multilane lane-change method based on DQN |
CN110478911A (en) * | 2019-08-13 | 2019-11-22 | 苏州钛智智能科技有限公司 | The unmanned method of intelligent game vehicle and intelligent vehicle, equipment based on machine learning |
CN110673602A (en) * | 2019-10-24 | 2020-01-10 | 驭势科技(北京)有限公司 | Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment |
CN110673602B (en) * | 2019-10-24 | 2022-11-25 | 驭势科技(北京)有限公司 | Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment |
CN110991095A (en) * | 2020-03-05 | 2020-04-10 | 北京三快在线科技有限公司 | Training method and device for vehicle driving decision model |
CN110991095B (en) * | 2020-03-05 | 2020-07-03 | 北京三快在线科技有限公司 | Training method and device for vehicle driving decision model |
CN111426933A (en) * | 2020-05-19 | 2020-07-17 | 浙江巨磁智能技术有限公司 | Safety type power electronic module and safety detection method thereof |
CN111443621A (en) * | 2020-06-16 | 2020-07-24 | 深圳市城市交通规划设计研究中心股份有限公司 | Model generation method, model generation device and electronic equipment |
CN111443621B (en) * | 2020-06-16 | 2020-10-27 | 深圳市城市交通规划设计研究中心股份有限公司 | Model generation method, model generation device and electronic equipment |
CN112201070A (en) * | 2020-09-29 | 2021-01-08 | 上海交通大学 | Deep learning-based automatic driving expressway bottleneck section behavior decision method |
CN112924177A (en) * | 2021-04-02 | 2021-06-08 | 哈尔滨理工大学 | Rolling bearing fault diagnosis method for improved deep Q network |
CN113511222A (en) * | 2021-08-27 | 2021-10-19 | 清华大学 | Scene self-adaptive vehicle interactive behavior decision and prediction method and device |
CN113511222B (en) * | 2021-08-27 | 2023-09-26 | 清华大学 | Scene self-adaptive vehicle interaction behavior decision and prediction method and device |
CN114332500A (en) * | 2021-09-14 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Image processing model training method and device, computer equipment and storage medium |
CN113807503B (en) * | 2021-09-28 | 2024-02-09 | 中国科学技术大学先进技术研究院 | Autonomous decision making method, system, device and terminal suitable for intelligent automobile |
CN113807503A (en) * | 2021-09-28 | 2021-12-17 | 中国科学技术大学先进技术研究院 | Autonomous decision making method, system, device and terminal suitable for intelligent automobile |
CN114624645B (en) * | 2022-03-10 | 2022-09-30 | 扬州宇安电子科技有限公司 | Miniature rotor unmanned aerial vehicle radar reconnaissance system based on micro antenna array |
CN114624645A (en) * | 2022-03-10 | 2022-06-14 | 扬州宇安电子科技有限公司 | Miniature rotor unmanned aerial vehicle radar reconnaissance system based on micro antenna array |
CN114880938A (en) * | 2022-05-16 | 2022-08-09 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN116757272A (en) * | 2023-07-03 | 2023-09-15 | 西湖大学 | Continuous motion control reinforcement learning framework and learning method |
Also Published As
Publication number | Publication date |
---|---|
CN107169567B (en) | 2020-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107169567A (en) | The generation method and device of a kind of decision networks model for Vehicular automatic driving | |
Li et al. | Humanlike driving: Empirical decision-making system for autonomous vehicles | |
CN107229973A (en) | The generation method and device of a kind of tactful network model for Vehicular automatic driving | |
CN109709956B (en) | Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle | |
CN103364006B (en) | For determining the system and method for vehicle route | |
CN106991251B (en) | Cellular machine simulation method for highway traffic flow | |
CN109466543A (en) | Plan autokinetic movement | |
CN113044064B (en) | Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning | |
CN110196587A (en) | Vehicular automatic driving control strategy model generating method, device, equipment and medium | |
CN107310550A (en) | Road vehicles travel control method and device | |
Scheel et al. | Situation assessment for planning lane changes: Combining recurrent models and prediction | |
Bayar et al. | Impact of different spacing policies for adaptive cruise control on traffic and energy consumption of electric vehicles | |
CN115601954B (en) | Lane change judgment method, device, equipment and medium for intelligent networked fleet | |
CN113715842B (en) | High-speed moving vehicle control method based on imitation learning and reinforcement learning | |
Koenig et al. | Bridging the gap between open loop tests and statistical validation for highly automated driving | |
CN117668413A (en) | Automatic driving comprehensive decision evaluation method and device considering multiple types of driving elements | |
Jia et al. | An LSTM-based speed predictor based on traffic simulation data for improving the performance of energy-optimal adaptive cruise control | |
Wen et al. | Modeling human driver behaviors when following autonomous vehicles: An inverse reinforcement learning approach | |
CN108839655A (en) | A kind of cooperating type self-adaptation control method based on minimum safe spacing | |
CN114954498A (en) | Reinforced learning lane change behavior planning method and system based on simulated learning initialization | |
Jebessa et al. | Analysis of reinforcement learning in autonomous vehicles | |
CN115096305A (en) | Intelligent driving automobile path planning system and method based on generation of countermeasure network and simulation learning | |
Mao et al. | Deep learning based vehicle position estimation for human drive vehicle at connected freeway | |
Tang et al. | Research on decision-making of lane-changing of automated vehicles in highway confluence area based on deep reinforcement learning | |
Zhang et al. | Lane Change Decision Algorithm Based on Deep Q Network for Autonomous Vehicles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |