CN109948054A - Adaptive learning path planning system based on reinforcement learning - Google Patents
Adaptive learning path planning system based on reinforcement learning
- Publication number
- CN109948054A CN109948054A CN201910202413.0A CN201910202413A CN109948054A CN 109948054 A CN109948054 A CN 109948054A CN 201910202413 A CN201910202413 A CN 201910202413A CN 109948054 A CN109948054 A CN 109948054A
- Authority
- CN
- China
- Prior art keywords
- learning
- student
- state
- path
- ability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to an adaptive learning path planning system based on reinforcement learning, comprising three modules: environment simulation, strategy training, and path planning. The ability value of each student at each moment is obtained with an improved Item Response Theory model; the complex learning environment is simulated as a Markov decision process; a path planning strategy is trained offline with a reinforcement learning algorithm on the historical learning trajectories of students; and finally, learning paths are adaptively planned online for students according to the trained strategy. The invention frames the complex learning scenario of an online education platform within a Markov decision process, takes efficient ability improvement as the objective, provides sequential recommendation of learning resources for students, and plans optimal learning paths, thereby improving the learning effect and learning efficiency of learners.
Description
Technical field
The present invention relates to an adaptive learning path planning system based on reinforcement learning, and belongs to the field of computer application technology.
Background technique
With the growing popularity of online education, students can use a variety of e-learning resources, including e-books, after-class exercises, and instructional videos. Given the diversity and differences in students' backgrounds, learning styles, and knowledge levels, online education platforms need to introduce personalized learning resource recommendation tools to help students choose their own learning paths and meet their personalized learning needs.
Existing personalized learning resource recommendation algorithms fall broadly into two classes: rule-based recommendation and data-driven recommendation. Most intelligent tutoring systems (Intelligent Tutoring System, ITS) use rule-based methods to recommend learning resources. This requires domain experts to assess the learning scenarios of different types of students and to define corresponding recommendation rules. Clearly, such a labor-intensive method can only be applied to specific learning domains and does not scale well. For modern large-scale online education systems, designers generally adopt data-driven recommendation methods, for example recommendation algorithms implemented with collaborative filtering. These data-driven algorithms attempt to recommend suitable learning resources for students by comparing similarities between students and learning objects.
Although data-driven recommendation methods are more scalable and general than rule-based ones, existing solutions share the same shortcoming in adaptive learning resource recommendation: they can usually only retrieve learning resources with similar content, or groups of students with similar learning behavior, based on resource content or student behavior, without considering the difficulty of the learning resources or the dynamic changes of the students' learning states.
Regarding the current state of recommendation algorithms: traditional algorithms such as collaborative filtering and latent semantic models are mainly intended for commercial product recommendation or media content distribution. Their main goal is to infer users' preferences and recommend goods or content of interest, so both the user side and the content side emphasize similarity computation. Recommendation of learning resources, by contrast, values the ability improvement a resource can bring to the student, which the similarity-based computation of conventional recommendation algorithms cannot achieve; moreover, the improvement of student ability is a gradual process rather than a single step, and this involves the planning of a learning path. The present invention therefore proposes an adaptive learning path planning method based on reinforcement learning that effectively solves the above problems and yields a strategy by which students obtain the largest and fastest ability improvement.
Summary of the invention
The technical problem solved by the invention: overcoming the deficiencies of the prior art, an adaptive learning path planning system based on reinforcement learning is provided. Based on the idea of reinforcement learning, the complex learning scenario of an online education platform is framed within a Markov decision process; with efficient ability improvement as the objective, learning resources are recommended sequentially for students and optimal learning paths are planned, which can improve the learning effect and learning efficiency of learners.
The technical solution of the invention: an adaptive learning path planning system based on reinforcement learning, comprising an environment simulation module, a strategy training module, and a path planning module.
The environment simulation module converts the complex online learning environment into language a machine can understand. Based on students' historical learning records on the online learning platform and the basic information of the learning resources, and according to an improved Item Response Theory model, the five-tuple of a Markov decision process is obtained through formalization.
The strategy training module trains offline the path planning strategy under each ability state. According to the five-tuple of the Markov decision process obtained by the environment simulation module, the path planning strategy under each ability state is trained offline using the Q_learning algorithm of reinforcement learning.
The path planning module performs real-time path planning for a target student. Based on the strategy obtained by the strategy training module and the current ability state of the target student, an optimal learning path is planned in real time for the target student, finally reaching the goal of improving learning effect and efficiency.
The environment simulation module works as follows. Environment simulation is based on a Markov decision process: the complex online learning scenario is formalized as the five-tuple <S, A, T, R, γ> of a Markov decision process.
(11) S denotes the state, taken as the ability value of each student at each moment obtained by the improved Item Response Theory model. For each dimension of the student ability value, the range of ability values is divided into five intervals so that student counts follow the normal-distribution proportion 1:2:5:2:1, and each interval is represented by the mean ability value of that interval.
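The 1:2:5:2:1 discretization of step (11) can be sketched as follows; the quantile-style cut points and the synthetic ability values are illustrative assumptions, not the patent's actual data.

```python
# Sketch of the 1:2:5:2:1 ability discretization described in step (11).
# Interval boundaries are quantile cuts chosen so that student counts fall
# in the ratio 1:2:5:2:1; each value is replaced by its interval mean.

def discretize_abilities(abilities):
    """Map each ability value to the mean of its 1:2:5:2:1 interval."""
    n = len(abilities)
    order = sorted(abilities)
    # cumulative student-count fractions for the ratios 1:2:5:2:1 (total 11)
    cuts = [1 / 11, 3 / 11, 8 / 11, 10 / 11]
    bounds = [0] + [round(c * n) for c in cuts] + [n]
    interval_means = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = order[lo:hi] or [order[min(lo, n - 1)]]
        interval_means.append(sum(chunk) / len(chunk))
    # upper ability threshold of each of the first four intervals
    thresholds = [order[b - 1] for b in bounds[1:-1]]
    def to_state(x):
        for i, t in enumerate(thresholds):
            if x <= t:
                return interval_means[i]
        return interval_means[-1]
    return [to_state(x) for x in abilities]

abilities = [i / 10 for i in range(110)]   # 110 synthetic ability values
states = discretize_abilities(abilities)
print(len(set(states)))  # five discrete states
```

Replacing each interval by its mean keeps the state space small (five states per ability dimension) while preserving an interpretable ability value for each state.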
(12) A denotes the action, i.e., the set of behaviors the agent can take; in the online education environment, this is the set of learning resources a student can study.
(13) T denotes the state transition probability. Based on the state division of (11) and the learning behavior path data of a large number of students, the state transition probability T is computed statistically:
T(s, a, s') = P(S_{t+1} = s' | S_t = s, A_t = a)
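The statistical computation of step (13) amounts to counting over historical (state, resource, next-state) triples; the trajectory format and resource names below are hypothetical.

```python
# A minimal sketch of estimating T(s, a, s') by counting transitions in
# historical learning trajectories, as in step (13).

from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Return T[(s, a)][s_next] = empirical P(s_next | s, a)."""
    counts = defaultdict(Counter)
    for s, a, s_next in trajectories:
        counts[(s, a)][s_next] += 1
    return {sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
            for sa, nxt in counts.items()}

history = [
    ("low", "video1", "mid"), ("low", "video1", "mid"),
    ("low", "video1", "low"), ("mid", "quiz1", "high"),
]
T = estimate_transitions(history)
print(T[("low", "video1")]["mid"])  # 2 of the 3 observed transitions
```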
(14) R denotes the reward; rewards are divided into the immediate reward and the cumulative reward.
The immediate reward applies to the student's learning process: after a student in state s has studied resource a and transitioned to state s', an immediate reward value r(s, a, s') is obtained. This value is related to the following three factors:
P(T): the correct-completion probability, i.e., the probability that the student can correctly complete learning resource a at the current ability value, predicted by the learning effect assessment model.
F(T): the correct-transfer frequency, i.e., among all samples in students' paths that transition from state s to state s' through resource a, the fraction of transitions completed by correctly answering the learning resource.
Diff(s, s') = (s' − s) · difficulty_a: the maximum increment of the ability conversion, expressed as the dot product of the ability difference vector before and after the transition and the difficulty vector of learning resource a. The purpose is to match the student's ability value with the difficulty of the learning resource, and to scalarize the vector for convenient computation and comparison of rewards.
The immediate reward may thus be expressed as:
R(s, a, s') = ω × Diff(s, s')
ω = P(T) × F(T) + (1 − P(T)) × (1 − F(T))
where ω serves as the coefficient of the maximum ability increment. Its purpose is to differentiate the maximum ability increment according to the student's ability and the known sample distribution: a student can gain ability growth from correctly completing a learning resource, and the converse also trains the student; for example, after answering a question incorrectly, a student may perceive the knowledge points it contains from the feedback, which is also a kind of growth. This representation also maintains the consistency of P(T) and F(T).
Cumulative reward
The cumulative reward (Return, G), also called the return, is defined as a specific function of the reward sequence. If the reward sequence after step t is R_{t+1}, R_{t+2}, R_{t+3}, …, R_T, where T is the total number of steps, the return G can be expressed simply as the sum of the immediate rewards of each step:
G_t = R_{t+1} + R_{t+2} + … + R_T
However, since students' path lengths differ, if the goal were only to find the maximum cumulative reward, G would grow with the length of a student's path, which conflicts with the goal here of recommending an optimal and shortest path for the student. A discount factor is therefore added to weaken the influence of future rewards.
(15) γ denotes the discount factor. In the above expression of the cumulative reward, γ ∈ [0,1] discounts future returns:
G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + …
If γ tends to 0, only the current immediate reward matters and the behavior maximizing the current immediate reward is always executed, which is essentially greedy; if γ tends to 1, future returns are given more consideration.
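The effect of the discount factor on the return can be illustrated with a short sketch; the reward sequence here is made up.

```python
# Sketch contrasting the undiscounted return with the discounted return
# G_t = sum_k gamma^k * R_{t+k+1} from step (15).

def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma per step."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]
print(discounted_return(rewards, 1.0))   # 4.0: grows with path length
print(discounted_return(rewards, 0.5))   # 1.875: future rewards weakened
```

This is why a discount γ < 1 favors shorter paths: padding a path with extra steps contributes ever-smaller terms to the return.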
The strategy training module works as follows:
(21) Store the five-tuple <S, A, T, R, γ> of the Markov decision process obtained in the environment simulation step;
(22) Randomly select an initial ability state S_1 from the state set S;
(23) Based on the ε-greedy strategy, select a resource A_1 to study in ability state S_1. After studying A_1, the next ability state S_2 is observed from the environment and the immediate reward R_2 is obtained (behavior policy); then the maximum Q value in the current ability state is selected to update the Q function (target policy):
Q_{k+1}(S_1, A_1) = (1 − α)·Q_k(S_1, A_1) + α·[R_2 + γ·max_a Q_k(S_2, a)]
(24) Repeat (23) until the learned ability meets the requirement, i.e., the terminal state is reached; then return to (22) and select a new initial ability state;
(25) Store the optimal policy under each ability state in the form of a dictionary.
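Steps (21) to (25) can be sketched as a compact Q-learning loop. The MDP below is a toy stand-in: the states, resources, rewards and dynamics are invented for illustration, not the patent's environment model.

```python
# Compact Q-learning loop following steps (21)-(25): episodes start from a
# random state (22), actions are chosen epsilon-greedily (23), the Q table
# is updated toward R + gamma * max_a Q (23), episodes end at the terminal
# state (24), and the greedy policy is returned as a dictionary (25).

import random

def train(states, actions, step, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1):
    """step(s, a) -> (s_next, reward, done); returns {state: greedy action}."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)                          # step (22)
        done = False
        while not done:
            if random.random() < eps:                      # explore
                a = random.choice(actions)
            else:                                          # exploit
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, x)] for x in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            s = s2
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}  # (25)

random.seed(0)
states, actions = [0, 1, 2], ["easy", "hard"]

def step(s, a):
    # toy dynamics: "hard" advances the ability state, "easy" stays put
    if a == "hard" and s < 2:
        return s + 1, 1.0, s + 1 == 2
    return s, 0.0, s == 2

policy = train(states, actions, step)
print(policy[0], policy[1])  # both should converge to "hard"
```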
Further, the specific steps of the ε-greedy strategy are as follows:
(1) Specify a value ε ∈ [0,1] and draw a random number uniformly between 0 and 1;
(2) If the random number is less than ε, randomly select one of the selectable resources in the current ability state to study (each resource is selected with probability 1/|A_1|, where |A_1| is the number of selectable resources in the current state);
(3) If the random number is greater than or equal to ε, select the resource with the maximum state-action value Q in the current state to study.
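The three steps above amount to the following selection rule; the Q table and resource names are hypothetical.

```python
# Sketch of the epsilon-greedy selection: with probability epsilon, pick a
# resource uniformly (1/|A| each); otherwise pick the argmax of Q.

import random

def epsilon_greedy(Q, state, resources, eps=0.1, rng=random):
    if rng.random() < eps:                               # step (2): explore
        return rng.choice(resources)
    return max(resources, key=lambda a: Q[(state, a)])   # step (3): exploit

Q = {("s1", "video"): 0.4, ("s1", "quiz"): 0.9}
print(epsilon_greedy(Q, "s1", ["video", "quiz"], eps=0.0))  # quiz
```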
The path planning module works as follows:
(31) Obtain the current ability state s of the target student;
(32) In the strategy stored in step (25), look up the learning path l for state s;
(33) Recommend path l to the target student, and adaptively revise the planned learning path during the subsequent learning process.
Further, the steps for adaptively revising the planned path are as follows:
(1) The previous steps (31, 32) plan a learning path l according to the target student's current ability s; after the next learning stage, the target student's ability state changes to s';
(2) Repeat step (32): according to the target student's updated ability state s', plan a new recommended path l'. Compare the remainder of l with l'; if they differ, replace l with l'; if they are the same, keep l unchanged.
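The re-planning rule of steps (1) and (2) can be sketched as follows, assuming, as a simplification, that the stored strategy is a dictionary mapping each ability state to a recommended resource and the resulting state.

```python
# Sketch of the adaptive revision loop of steps (31)-(33): after each
# learning stage, re-plan from the new state and swap paths only if the
# remainder of the old path differs. The policy table is hypothetical.

def plan(policy, state):
    """Follow the stored per-state recommendations until none remain."""
    path, s = [], state
    while s in policy:
        resource, s = policy[s]
        path.append(resource)
    return path

def revise(policy, old_path, steps_done, new_state):
    """Replace the remaining path only when the re-planned one differs."""
    remainder = old_path[steps_done:]
    new_path = plan(policy, new_state)
    return new_path if new_path != remainder else remainder

policy = {"low": ("video1", "mid"), "mid": ("quiz1", "high")}
l = plan(policy, "low")              # ['video1', 'quiz1']
print(revise(policy, l, 1, "mid"))   # remainder unchanged: ['quiz1']
print(revise(policy, l, 1, "high"))  # student jumped ahead: []
```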
The advantages of the present invention over the prior art are as follows. Existing learning resource recommendation techniques broadly divide into rule-based recommendation and data-driven recommendation. Rule-based recommendation of learning resources requires domain experts to assess the learning scenarios of different types of students and to define corresponding recommendation rules; it is a labor-intensive method, applicable only to specific learning domains, and does not scale well. The present invention, based on reinforcement learning technology, plans learning paths automatically, greatly saving labor cost compared with rule-based recommendation methods. For modern large-scale online education systems, designers generally adopt data-driven recommendation; these algorithms mostly recommend learning resources by comparing the similarity between students and learning objects, which leads to learning paths containing many redundant, highly similar resources, without considering the efficiency of ability improvement. The present invention takes the historical learning trajectories of a large number of students as samples, extracts the students' ability states, and trains a recommendation strategy with the final state as the target, achieving the fastest and largest improvement of student ability. Moreover, by combining offline strategy training with online path recommendation, the invention solves the response speed problem of recommendation, realizing adaptive planning of learning paths.
Detailed description of the invention
Fig. 1 is the system architecture diagram of the learning path planning method;
Fig. 2 is the flow diagram of environment simulation;
Fig. 3 is the flow diagram of strategy training;
Fig. 4 is a schematic diagram of the reasonableness evaluation of learning paths;
Fig. 5 compares the average lengths of the recommended and non-recommended paths of this technique and the prior art;
Fig. 6 is a schematic diagram of the effectiveness evaluation of learning paths;
Fig. 7 is the data chart of path matching degree and ability gain for this technique.
Specific embodiment
The adaptive learning path planning method based on reinforcement learning proposed by the present invention is explained in detail below with reference to the accompanying drawings.
The overall system architecture is shown in Fig. 1. The system is based on the historical data of students and learning resources: the basic user information of teachers and students, the content data of different learning resources (course videos, after-class exercises, discussion areas, etc.), and the interaction data between students and learning resources. The raw data are regularly transferred to HDFS for long-term storage; since the learning path planning system also generates new student-resource interaction data while running, this batch of data likewise needs to be updated regularly. Based on these data, the environment simulation, strategy training, and path planning steps are carried out in sequence. The learning scenario of students is simulated under the Markov decision process framework: the ability estimates of students at each learning stage are extracted and discretized as states; state transition probabilities are computed statistically from historical learning behavior data; and, combined with the intrinsic resource attributes produced by the learning effect assessment module, the immediate rewards fed back by agent-environment interaction during reinforcement learning are generated. The complex online learning scenario is thereby formalized into a Markov decision process framework at the mathematical level, and the optimal learning strategy is trained by repeated trial and error with a reinforcement learning algorithm. In consideration of computation time cost, the above parts are updated offline periodically. Finally, based on the trained learning strategy, the optimal learning path is planned for a target student according to the student's current ability state. This design enables the recommender system to respond quickly, continuously recommending learning resources and planning learning paths for students; the newly generated interaction data between target students and learning resources are then stored in the database.
The present invention is based on reinforcement learning. A Markov decision process is a description of a fully observable environment, an abstraction and idealization at the mathematical level of the reinforcement learning problem. It turns a complex environment into language a machine understands, so that challenges in the actual environment can be solved with reinforcement learning algorithms. Each key element of the Markov decision process therefore needs a formal mathematical definition. According to the students' learning behavior data, the environment of students in the learning process is simulated as shown in Fig. 2: the ability values of each student at each moment, obtained by training the learning effect assessment model, serve as input, and the ability values are discretized according to a normal distribution as the state S; based on the divided states and a large amount of learning behavior data, the state transition probability T is computed statistically; according to the computation formula, the immediate reward R is calculated; based on the immediate reward, a strategy is trained with a reinforcement learning algorithm, i.e., the optimal action that can be taken in each state, which can be used to make recommendations for target students: inputting the target student's current ability state yields a planned optimal learning path. Through this process, the complex learning environment of online education is formalized as a Markov decision process, represented as a five-tuple <S, A, T, R, γ>.
The strategy training step involved in the present invention, shown in Fig. 3, proceeds as follows:
(1) Store the five-tuple <S, A, T, R, γ> of the Markov decision process obtained in the environment simulation step;
(2) Randomly select an initial ability state S_1 from the state set S;
(3) Based on the ε-greedy strategy, select a resource A_1 to study in ability state S_1; after studying A_1, observe the next ability state S_2 from the environment and obtain the immediate reward R_2 (behavior policy), then select the maximum Q value in the current ability state to update the Q function (target policy):
Q_{k+1}(S_1, A_1) = (1 − α)·Q_k(S_1, A_1) + α·[R_2 + γ·max_a Q_k(S_2, a)]
(4) Repeat (3) until the learned ability meets the requirement, i.e., the terminal state is reached; then return to (2) and select a new initial ability state;
(5) Store the optimal policy under each ability state in the form of a dictionary.
Starting from the target student's current ability state, the proposed adaptive learning path planning method plans an optimal learning path so that the student's ability gains the most effective improvement. For the recommended learning paths, the present invention is compared with the prior art through experimental evaluation, divided into two parts: the effectiveness experiment and the reasonableness experiment of recommended paths.
1. Reasonableness experiment
The reasonableness experiment of recommended paths mainly verifies whether the learning resources in a recommended path are reasonable for the target student, considered from the path length: whether the student obtains ability improvement fastest, i.e., among paths with the same initial ability and the same final ability, whether the recommended path is shorter than the actual paths. As shown in Fig. 4, the present invention recommends a path for the students of each ability state; for each path, non-recommended paths with the same initial ability and the same final ability as the recommended path are picked out from the original interaction data of a large number of students, and the difference in path length is compared. To compare this difference across students of different ability levels, the present invention clusters students by their initial ability estimates into five classes, from I to V, with overall ability from low to high. For each class, the lengths of all non-recommended learning paths with the same start and end abilities as the recommended path are counted, and the mean lengths of the recommended and non-recommended paths are compared under the following recommendation algorithms, where UCF and ICF are collaborative filtering recommendation algorithms, and PI, VI, Sarsa and Q_learning are learning path planning algorithms based on reinforcement learning. As experimental indices, the present invention directly uses the length L_rec of the recommended path and the average length L_no_rec of the non-recommended paths.
1) UCF: user-based collaborative filtering; computes the similarity of student abilities and recommends the learning paths of students similar to the target student.
2) ICF: item-based collaborative filtering; computes the similarity of learning resource attributes, searches for resources similar to those in the target student's learning history, and recommends to the target student the other learning resources of students who interacted with those resources.
3) PI: path planning algorithm based on policy iteration, a reinforcement learning algorithm based on dynamic programming.
4) VI: path planning algorithm based on value iteration, a reinforcement learning algorithm based on dynamic programming.
5) Sarsa: path planning algorithm based on Sarsa, an on-policy temporal-difference reinforcement learning algorithm.
6) Q_learning: path planning algorithm based on Q_learning, an off-policy temporal-difference reinforcement learning algorithm; the strategy training method used by the present invention.
The results of the reasonableness experiment are shown in Fig. 5. Comparing different initial ability states, the recommendation algorithms perform better when initial ability is low; when initial ability is high, the effect of recommendation differs little from no recommendation, showing that students with a higher ability value already have strong learning ability and a smaller space of selectable resources.
Under the same initial ability level, the paths recommended by the reinforcement-learning-based algorithms are overall shorter than those of the UCF and ICF algorithms. The reason is that path planning based on collaborative filtering only considers the similarity of students or learning resources, recommending the paths of similar students or similar learning resources for the target student, without considering the student's need for ability improvement during learning. ICF tends to recommend similar learning resources to the student; although repeatedly consolidating knowledge reduces forgetting and can also raise the ability value, repeatedly learning similar resources makes the learning path redundant and lowers learning efficiency. By contrast, UCF yields more reasonable path lengths, but since it only searches the learning paths already present among existing students without exploring other paths, and similar students do not necessarily have optimal learning paths, the recommended path may fail to maximize the target student's ability: for example, UCF's recommended path length in class II is 12, yet the final overall ability reached is only 72% of the attainable ability.
Comparing the four reinforcement-learning-based path planning algorithms, all reach the highest ability state attainable under the same initial ability. The policy iteration algorithm PI and the value iteration algorithm VI give almost identical recommendations, since they are essentially consistent: both find the optimal state value function in an iterative process. The difference is that policy iteration continually evaluates state values and improves the policy, whereas value iteration directly finds the optimal state value function and then computes the policy from the state values; but because policy iteration involves two nested iterations, its iteration efficiency is far below that of value iteration.
Compared with the dynamic-programming-based reinforcement learning algorithms, Sarsa and Q_learning recommend much shorter learning paths under the same initial ability state, with particularly better performance in classes I and II. The reason is that temporal-difference reinforcement learning algorithms are model-free: they do not rely on the environment state transition probabilities of the sample data, and by continual trial and error they learn from the environment while also enriching the diversity of the data.
Both being temporal-difference methods, Q_learning recommends shorter learning paths than Sarsa under lower initial ability states, i.e., starting from similar initial abilities. The main difference is that Sarsa updates the environment and the value function on-policy: the same policy is used to update state and action, and the selected action updates the value function. Q_learning updates off-policy: when updating the value function it independently selects the action value of the current maximum value function, achieving a better balance between exploration and exploitation; it is thus more likely to obtain a globally optimal path, whereas Sarsa's update tends toward a safer, locally optimal path. The resulting drawback is that Q_learning converges more slowly than Sarsa, but given the research content of this invention, the strategy is trained offline and then used for online real-time path recommendation for students; Q_learning is therefore the better choice for the present invention.
2. Effectiveness experiment
The effectiveness experiment, shown in Fig. 6, uses students' existing historical interaction data to analyze the distribution of the matching degree between true learning paths and recommended paths, and of the students' ability improvement in the real learning scenario; that is, for the same student, after completing the same number of learning resources, whether a higher match with the recommended path brings a larger improvement of the ability value.
The present invention recommends an optimal path for the students of each ability state. For each path, true learning paths with the same initial ability as the recommended path are picked out from the original interaction data of a large number of students and truncated to the length of the recommended path; the matching degree between the actual path and the recommended path is comparatively analyzed, together with the improvement of the final ability value over the initial ability value. That is, under the same initial ability state and the same path length, the distribution of the matching degree with the recommended path and of the ability improvement is analyzed.
The matching degree Match expresses, under the same initial ability state, the degree of matching between the recommended path and the truncated non-recommended path:
Match = ||Path_rec ∩ Path_no_rec|| / ||Path_rec||
where ||Path_rec ∩ Path_no_rec|| denotes the length of the longest continuous common substring of the recommended path and the non-recommended path, and ||Path_rec|| denotes the length of the recommended path.
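The Match metric can be sketched directly from its definition as a longest contiguous common run of resources; the resource identifiers below are invented.

```python
# Sketch of the Match metric: the length of the longest contiguous common
# substring of two resource sequences, divided by the recommended length.

def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by sequences a and b."""
    best = 0
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else 0
            best = max(best, dp[j])
            prev = cur
    return best

def match_degree(path_rec, path_actual):
    truncated = path_actual[:len(path_rec)]   # truncate to recommended length
    return longest_common_substring(path_rec, truncated) / len(path_rec)

rec = ["v1", "q1", "v2", "q2"]
actual = ["v1", "q1", "v3", "q2", "v4"]
print(match_degree(rec, actual))  # 2/4 = 0.5
```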
Fig. 7 presents the experimental data of the Q_learning-based path planning algorithm. Each row shows, under the same matching degree, the ability gain corresponding to different initial abilities; each column shows, under the same initial ability, the ability gain corresponding to different matching degrees. A '-' indicates that no actual path exactly matching the recommended path was found in the students' historical interaction data. The data show that under the same matching degree, the lower the initial ability, the larger the ability improvement. When the matching degree is 40% or more, under the same initial ability state the ability gain increases as the matching degree rises; as shown in Fig. 7, the more the actual path matches the recommended path, the more the student's ability improves, which fully demonstrates the effectiveness of the recommended paths in improving student ability.
For the initial ability states of classes I and II, no true path exactly matching the recommended path could be found in the actual interaction data, indicating that the Q_learning-based recommendation algorithm has mined a new globally shortest path on the basis of the existing data.
The above describes only concrete embodiments of the adaptive learning path planning method based on reinforcement learning according to the present invention. The present invention is not limited to the above embodiments. The description of the invention is illustrative and does not limit the scope of the claims. Many alternatives, improvements and variations will be obvious to those skilled in the art. All technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope claimed by this application.
Claims (8)
1. An adaptive learning path planning system based on reinforcement learning, characterized by comprising: an environment simulation module, a strategy training module and a path planning module;
the environment simulation module converts the complex online learning environment into language and text that a machine can understand; based on the students' historical learning records on the online learning platform and the basic information of the education resources, and according to the improved Item Response Theory, it formalizes the five-tuple of a Markov decision process;
the strategy training module implements the function of offline training of the path planning strategy under each ability state; according to the five-tuple of the Markov decision process obtained by the environment simulation module, it uses the Q_learning algorithm of reinforcement learning to train offline the path planning strategy under each ability state;
the path planning module implements the function of real-time path planning for a target student; according to the strategy obtained by the strategy training module and the current ability state of the target student, it obtains the optimal learning path planned in real time for the target student, finally achieving the goal of improving learning effect and efficiency.
2. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that the environment simulation module is implemented in the following steps:
(21) S denotes the set of ability states; the ability value of a student at each moment is obtained according to the improved Item Response Theory, and the student's ability value is defined as a state; to guarantee that the states are discrete, the ability values must be partitioned: each dimension of the student ability value is divided into intervals in proportion to the Gaussian (normal) distribution of the number of students, and the mean of each interval is taken as the ability value of that interval;
(22) A denotes the action set, i.e. the set of behaviors the agent can take; in the online education environment, it is the set of resources the student learns;
(23) T denotes the state transition probability; based on the states obtained by the ability partition in step (21) and the students' learning behavior path data after the partition, the state transition probability T is computed by statistics:
T(s, a, s′) = P(s_{t+1} = s′ | s_t = s, a_t = a)
where s, s′ ∈ S denote state instances and a ∈ A denotes an action instance; t denotes the moment, s_t denotes the state at moment t, and a_t denotes the action selected at moment t;
(24) R denotes the reward, divided into the immediate reward and the cumulative reward;
the immediate reward applies to the student's learning process: when a student who is in state s ∈ S at some moment has learned resource a ∈ A and transfers to state s′ ∈ S, the immediate reward value r(s, a, s′) of that moment is obtained, an instance of the reward R obtained at that moment; this reward value is related to three factors: the probability of correct completion, the correct transfer frequency, and the ability increment;
the cumulative reward (Return, G), also called the return, is defined as a specific function of the reward sequence; assuming the current moment is t, the reward sequence after moment t is R_{t+1}, R_{t+2}, R_{t+3}, … R_M, where M is the total duration; the return G is then the sum of the immediate rewards of each moment weighted by the discount factor:
G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + … + γ^{M-t-1}R_M
(25) γ denotes the discount factor; in the above expression of the cumulative reward, γ ∈ [0, 1] amounts to discounting future returns: if γ tends to 0, only the current immediate reward is considered and the behavior maximizing the current immediate reward is always executed, which is essentially greedy behavior; if γ tends to 1, future returns are given more consideration.
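The return of steps (24)–(25) can be sketched numerically; the reward sequence below is invented for illustration:

```python
def discounted_return(rewards, gamma):
    """G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# gamma near 1 weights future rewards; gamma = 0 keeps only the immediate reward
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], 0.0))  # 1.0
```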
3. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that the strategy training steps are as follows:
(31) store the five-tuple <S, A, T, R, γ> of the Markov decision process obtained in the environment simulation step;
(32) randomly select an initial ability state S_1 from the ability state set S;
(33) based on the ε-greedy strategy, select a resource A_1 to learn in ability state S_1, then observe the next ability state S_2 from the environment while obtaining the immediate reward R_2; at this point select the maximum Q value under the current ability state to update the Q table:
Q_{k+1}(S_1, A_1) = (1 − α)Q_k(S_1, A_1) + α[R_2 + γ max_a Q_k(S_2, a)]
where Q_k denotes the current Q table, Q_{k+1} denotes the updated Q table, and α denotes the update ratio: at each update, part of the old value is replaced by the new value;
(34) repeat step (33) until the learned ability meets the requirement, i.e. the terminal state is reached, then return to step (32) to reselect an initial ability state;
(35) store the optimal path under each ability state in the form of a dictionary; the strategy training is then complete.
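Steps (31)–(35) can be sketched as a tabular Q_learning loop. The toy environment below (a `step` function over a linear chain of ability states) is an assumption standing in for the environment simulation module, not the patent's actual transition model:

```python
import random

def train_q_learning(S, A, step, terminal, alpha=0.1, gamma=0.9,
                     eps=0.1, episodes=2000):
    """Tabular Q_learning over ability states S and learnable resources A.
    step(s, a) -> (next_state, immediate_reward) plays the role of the
    environment simulation module (its T and R here are assumptions)."""
    Q = {(s, a): 0.0 for s in S for a in A}
    starts = [s for s in S if s != terminal]
    for _ in range(episodes):
        s = random.choice(starts)                  # step (32): random start
        while s != terminal:
            if random.random() < eps:              # step (33): epsilon-greedy
                a = random.choice(A)
            else:
                a = max(A, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            nxt = 0.0 if s2 == terminal else max(Q[(s2, b)] for b in A)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * nxt)
            s = s2                                 # step (34): loop to terminal
    # step (35): store the best resource per ability state as a dictionary
    return {s: max(A, key=lambda a: Q[(s, a)]) for s in starts}

# Toy chain: resource "a" raises the ability state (reward 1), "b" does nothing.
def step(s, a):
    return (s + 1, 1.0) if a == "a" else (s, 0.0)

random.seed(0)
print(train_q_learning([0, 1, 2, 3], ["a", "b"], step, terminal=3))
# {0: 'a', 1: 'a', 2: 'a'}
```

As expected, the trained dictionary recommends the ability-raising resource in every non-terminal state.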
4. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that the path planning module is implemented in the following steps:
(41) obtain the current ability state s ∈ S of the target student;
(42) find, in the strategy, a learning path l for the state with ability s;
(43) recommend the learning path to the target student, and adaptively revise the planned learning path during the subsequent learning process.
5. The adaptive learning path planning system based on reinforcement learning according to claim 4, characterized in that in step (43), the steps of adaptively revising the planned path are as follows:
(51) plan a learning path for the student according to the target student's current ability s; after the next learning stage, the target student's ability state changes to s′;
(52) repeat step (42): according to the target student's updated ability state s′, plan a new recommended path l′ for the student;
(53) compare the remaining part of learning path l from step (42) with the new recommended path l′; if they differ, replace learning path l of step (42) with the new recommended path l′; if they are the same, keep it unchanged.
6. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that in step (21), the discretization of the student ability state intervals divides five intervals according to the Gaussian distribution of the number of students, in the ratio 1:2:5:2:1.
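The 1:2:5:2:1 division can be sketched as a ratio split of the sorted ability values, with each interval's mean taken as its representative ability value per step (21); the sample values are invented for illustration:

```python
def discretize_ability(values, ratios=(1, 2, 5, 2, 1)):
    """Split sorted ability values into intervals holding students in the
    ratio 1:2:5:2:1, and take each interval's mean as its ability value."""
    vals = sorted(values)
    total = sum(ratios)
    cuts, acc = [0], 0
    for r in ratios:
        acc += r
        cuts.append(round(len(vals) * acc / total))
    intervals = [vals[cuts[i]:cuts[i + 1]] for i in range(len(ratios))]
    return [sum(iv) / len(iv) for iv in intervals if iv]

print(discretize_ability(range(1, 12)))  # [1.0, 2.5, 6.0, 9.5, 11.0]
```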
7. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that in step (24), the immediate reward value is related to the following three factors:
P(T): the probability of correct completion, i.e. the probability that the student, at the ability value of the current moment, can correctly complete education resource a, predicted by the learning effect evaluation model;
F(T): the correct transfer frequency, i.e. among all samples in the students' paths that transfer from state s to state s′ through a, the probability of completing the transfer by correctly completing the education resource, expressed as the ratio of the number of correct-completion transfers to C, where C denotes the total number of samples;
Diff(s, s′) = (s′ − s) · difficulty_a: the maximum ability increment, expressed as the dot product of the ability difference vector before and after the transfer and the education resource difficulty vector, so as to match the student's ability value with the difficulty of the education resource and to scalarize the vector, which is convenient for reward comparison;
the immediate reward r is expressed as:
r(s, a, s′) = ω × Diff(s, s′)
ω = P(T) × F(T) + (1 − P(T)) × (1 − F(T))
where ω serves as the coefficient of the maximum ability increment.
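The reward of claim 7 can be sketched directly; the ability vectors, difficulty vector and probability values below are invented example numbers:

```python
def immediate_reward(s, s_next, difficulty_a, p_correct, f_transfer):
    """r(s, a, s') = omega * Diff(s, s'), where
    Diff(s, s') = (s' - s) . difficulty_a  (dot product of the ability
    increment vector with the resource difficulty vector), and
    omega = P(T)*F(T) + (1 - P(T))*(1 - F(T))."""
    diff = sum((b - a) * d for a, b, d in zip(s, s_next, difficulty_a))
    omega = p_correct * f_transfer + (1 - p_correct) * (1 - f_transfer)
    return omega * diff

# Diff = 0.3*1.0 + 0.2*0.5 = 0.4; omega = 0.8*0.7 + 0.2*0.3 = 0.62
print(immediate_reward((0.2, 0.4), (0.5, 0.6), (1.0, 0.5), 0.8, 0.7))
```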
8. The adaptive learning path planning system based on reinforcement learning according to claim 1, characterized in that in step (33), the specific steps of the ε-greedy strategy are as follows:
(71) specify a value ε ∈ [0, 1], and draw a random number between 0 and 1;
(72) if the random number is less than ε, randomly select one of the selectable resources under the current ability state to learn; the probability of each resource being selected is 1/|A₁|, where |A₁| is the number of selectable resources under the current state;
(73) if the random number is greater than or equal to ε, select the resource with the maximum state-action value Q under the current state to learn.
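Steps (71)–(73) can be sketched as follows (the Q table entries are illustrative):

```python
import random

def epsilon_greedy(Q, state, actions, eps):
    """With probability eps explore: pick uniformly at random, each resource
    with probability 1/|A1|; otherwise exploit: pick the max-Q resource."""
    if random.random() < eps:                                  # step (72)
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # step (73)

Q = {("s0", "r1"): 0.4, ("s0", "r2"): 0.9}
print(epsilon_greedy(Q, "s0", ["r1", "r2"], eps=0.0))  # always exploits: r2
```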
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910202413.0A CN109948054A (en) | 2019-03-11 | 2019-03-11 | A kind of adaptive learning path planning system based on intensified learning |
CN201910907990.XA CN110569443B (en) | 2019-03-11 | 2019-09-24 | Self-adaptive learning path planning system based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948054A true CN109948054A (en) | 2019-06-28 |
Family
ID=67008429
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910202413.0A Pending CN109948054A (en) | 2019-03-11 | 2019-03-11 | A kind of adaptive learning path planning system based on intensified learning |
CN201910907990.XA Active CN110569443B (en) | 2019-03-11 | 2019-09-24 | Self-adaptive learning path planning system based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109948054A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288878A (en) * | 2019-07-01 | 2019-09-27 | 科大讯飞股份有限公司 | Adaptive learning method and device |
CN110601973A (en) * | 2019-08-26 | 2019-12-20 | 中移(杭州)信息技术有限公司 | Route planning method, system, server and storage medium |
CN110673488A (en) * | 2019-10-21 | 2020-01-10 | 南京航空航天大学 | Double DQN unmanned aerial vehicle concealed access method based on priority random sampling strategy |
CN110738860A (en) * | 2019-09-18 | 2020-01-31 | 平安科技(深圳)有限公司 | Information control method and device based on reinforcement learning model and computer equipment |
CN110941268A (en) * | 2019-11-20 | 2020-03-31 | 苏州大学 | Unmanned automatic trolley control method based on Sarsa safety model |
CN111626489A (en) * | 2020-05-20 | 2020-09-04 | 杭州安恒信息技术股份有限公司 | Shortest path planning method and device based on time sequence difference learning algorithm |
CN111859099A (en) * | 2019-12-05 | 2020-10-30 | 马上消费金融股份有限公司 | Recommendation method, device, terminal and storage medium based on reinforcement learning |
CN111898770A (en) * | 2020-09-29 | 2020-11-06 | 四川大学 | Multi-agent reinforcement learning method, electronic equipment and storage medium |
CN111896006A (en) * | 2020-08-11 | 2020-11-06 | 燕山大学 | Path planning method and system based on reinforcement learning and heuristic search |
CN112187710A (en) * | 2020-08-17 | 2021-01-05 | 杭州安恒信息技术股份有限公司 | Method and device for sensing threat intelligence data, electronic device and storage medium |
CN112307214A (en) * | 2019-07-26 | 2021-02-02 | 株式会社理光 | Deep reinforcement learning-based recommendation method and recommendation device |
CN112446526A (en) * | 2019-09-05 | 2021-03-05 | 美商讯能集思智能科技股份有限公司台湾分公司 | Production scheduling system and method |
CN112712385A (en) * | 2019-10-25 | 2021-04-27 | 北京达佳互联信息技术有限公司 | Advertisement recommendation method and device, electronic equipment and storage medium |
CN112734142A (en) * | 2021-04-02 | 2021-04-30 | 平安科技(深圳)有限公司 | Resource learning path planning method and device based on deep learning |
CN113111907A (en) * | 2021-03-01 | 2021-07-13 | 浙江工业大学 | Individualized PEEP adjusting method based on reinforcement learning |
CN113271338A (en) * | 2021-04-25 | 2021-08-17 | 复旦大学 | Intelligent preloading algorithm for mobile augmented reality scene |
CN113467481A (en) * | 2021-08-11 | 2021-10-01 | 哈尔滨工程大学 | Path planning method based on improved Sarsa algorithm |
CN113829351A (en) * | 2021-10-13 | 2021-12-24 | 广西大学 | Collaborative control method of mobile mechanical arm based on reinforcement learning |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111123963B (en) * | 2019-12-19 | 2021-06-08 | 南京航空航天大学 | Unknown environment autonomous navigation system and method based on reinforcement learning |
CN111415048B (en) * | 2020-04-10 | 2024-04-19 | 大连海事大学 | Vehicle path planning method based on reinforcement learning |
CN113379063B (en) * | 2020-11-24 | 2024-01-05 | 中国运载火箭技术研究院 | Whole-flow task time sequence intelligent decision-making method based on online reinforcement learning model |
CN112612948B (en) * | 2020-12-14 | 2022-07-08 | 浙大城市学院 | Deep reinforcement learning-based recommendation system construction method |
CN113128611B (en) * | 2021-04-27 | 2023-06-06 | 陕西师范大学 | Model detection method based on online learning efficiency prediction of deep learning students |
CN113268611B (en) * | 2021-06-24 | 2022-11-01 | 北京邮电大学 | Learning path optimization method based on deep knowledge tracking and reinforcement learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6120300A (en) * | 1996-04-17 | 2000-09-19 | Ho; Chi Fai | Reward enriched learning system and method II |
CN105956754A (en) * | 2016-04-26 | 2016-09-21 | 北京京师乐学教育科技有限公司 | Learning path planning system and method based on students' academic big data system |
US20180253989A1 (en) * | 2017-03-04 | 2018-09-06 | Samuel Gerace | System and methods that facilitate competency assessment and affinity matching |
CN108803313B (en) * | 2018-06-08 | 2022-07-12 | 哈尔滨工程大学 | Path planning method based on ocean current prediction model |
2019
- 2019-03-11: CN CN201910202413.0A patent (CN109948054A) — status: Pending
- 2019-09-24: CN CN201910907990.XA patent (CN110569443B) — status: Active
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288878A (en) * | 2019-07-01 | 2019-09-27 | 科大讯飞股份有限公司 | Adaptive learning method and device |
CN110288878B (en) * | 2019-07-01 | 2021-10-08 | 科大讯飞股份有限公司 | Self-adaptive learning method and device |
CN112307214A (en) * | 2019-07-26 | 2021-02-02 | 株式会社理光 | Deep reinforcement learning-based recommendation method and recommendation device |
CN110601973A (en) * | 2019-08-26 | 2019-12-20 | 中移(杭州)信息技术有限公司 | Route planning method, system, server and storage medium |
CN112446526B (en) * | 2019-09-05 | 2024-03-12 | 美商讯能集思智能科技股份有限公司台湾分公司 | Production scheduling system and method |
CN112446526A (en) * | 2019-09-05 | 2021-03-05 | 美商讯能集思智能科技股份有限公司台湾分公司 | Production scheduling system and method |
CN110738860A (en) * | 2019-09-18 | 2020-01-31 | 平安科技(深圳)有限公司 | Information control method and device based on reinforcement learning model and computer equipment |
CN110738860B (en) * | 2019-09-18 | 2021-11-23 | 平安科技(深圳)有限公司 | Information control method and device based on reinforcement learning model and computer equipment |
CN110673488A (en) * | 2019-10-21 | 2020-01-10 | 南京航空航天大学 | Double DQN unmanned aerial vehicle concealed access method based on priority random sampling strategy |
CN112712385B (en) * | 2019-10-25 | 2024-01-12 | 北京达佳互联信息技术有限公司 | Advertisement recommendation method and device, electronic equipment and storage medium |
CN112712385A (en) * | 2019-10-25 | 2021-04-27 | 北京达佳互联信息技术有限公司 | Advertisement recommendation method and device, electronic equipment and storage medium |
CN110941268A (en) * | 2019-11-20 | 2020-03-31 | 苏州大学 | Unmanned automatic trolley control method based on Sarsa safety model |
CN111859099B (en) * | 2019-12-05 | 2021-08-31 | 马上消费金融股份有限公司 | Recommendation method, device, terminal and storage medium based on reinforcement learning |
CN111859099A (en) * | 2019-12-05 | 2020-10-30 | 马上消费金融股份有限公司 | Recommendation method, device, terminal and storage medium based on reinforcement learning |
CN111626489B (en) * | 2020-05-20 | 2023-04-18 | 杭州安恒信息技术股份有限公司 | Shortest path planning method and device based on time sequence difference learning algorithm |
CN111626489A (en) * | 2020-05-20 | 2020-09-04 | 杭州安恒信息技术股份有限公司 | Shortest path planning method and device based on time sequence difference learning algorithm |
CN111896006A (en) * | 2020-08-11 | 2020-11-06 | 燕山大学 | Path planning method and system based on reinforcement learning and heuristic search |
CN111896006B (en) * | 2020-08-11 | 2022-10-04 | 燕山大学 | Path planning method and system based on reinforcement learning and heuristic search |
CN112187710B (en) * | 2020-08-17 | 2022-10-21 | 杭州安恒信息技术股份有限公司 | Method and device for sensing threat intelligence data, electronic device and storage medium |
CN112187710A (en) * | 2020-08-17 | 2021-01-05 | 杭州安恒信息技术股份有限公司 | Method and device for sensing threat intelligence data, electronic device and storage medium |
CN111898770B (en) * | 2020-09-29 | 2021-01-15 | 四川大学 | Multi-agent reinforcement learning method, electronic equipment and storage medium |
CN111898770A (en) * | 2020-09-29 | 2020-11-06 | 四川大学 | Multi-agent reinforcement learning method, electronic equipment and storage medium |
CN113111907A (en) * | 2021-03-01 | 2021-07-13 | 浙江工业大学 | Individualized PEEP adjusting method based on reinforcement learning |
CN112734142A (en) * | 2021-04-02 | 2021-04-30 | 平安科技(深圳)有限公司 | Resource learning path planning method and device based on deep learning |
CN113271338A (en) * | 2021-04-25 | 2021-08-17 | 复旦大学 | Intelligent preloading algorithm for mobile augmented reality scene |
CN113467481B (en) * | 2021-08-11 | 2022-10-25 | 哈尔滨工程大学 | Path planning method based on improved Sarsa algorithm |
CN113467481A (en) * | 2021-08-11 | 2021-10-01 | 哈尔滨工程大学 | Path planning method based on improved Sarsa algorithm |
CN113829351A (en) * | 2021-10-13 | 2021-12-24 | 广西大学 | Collaborative control method of mobile mechanical arm based on reinforcement learning |
CN113829351B (en) * | 2021-10-13 | 2023-08-01 | 广西大学 | Cooperative control method of mobile mechanical arm based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN110569443B (en) | 2022-05-17 |
CN110569443A (en) | 2019-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948054A (en) | A kind of adaptive learning path planning system based on intensified learning | |
CN110555112B (en) | Interest point recommendation method based on user positive and negative preference learning | |
CN111813921B (en) | Topic recommendation method, electronic device and computer-readable storage medium | |
CN114020929B (en) | Intelligent education system platform design method based on course knowledge graph | |
CN107103384A (en) | A kind of learner's study track quantization method based on three-dimensional knowledge network | |
CN108172047B (en) | A kind of network on-line study individualized resource real-time recommendation method | |
CN109858797A (en) | The various dimensions information analysis of the students method of knowledge based network exact on-line education system | |
CN113239209A (en) | Knowledge graph personalized learning path recommendation method based on RankNet-transformer | |
CN113434563A (en) | Reinforced learning method and system in adaptive learning path recommendation | |
Wang et al. | Education Data‐Driven Online Course Optimization Mechanism for College Student | |
CN115249072A (en) | Reinforced learning path planning method based on generation of confrontation user model | |
Zhao et al. | An improved ant colony optimization algorithm for recommendation of micro-learning path | |
Dai et al. | Study of online learning resource recommendation based on improved BP neural network | |
Zhou et al. | LANA: towards personalized deep knowledge tracing through distinguishable interactive sequences | |
Ren et al. | MulOER-SAN: 2-layer multi-objective framework for exercise recommendation with self-attention networks | |
CN117035074B (en) | Multi-modal knowledge generation method and device based on feedback reinforcement | |
Hnida et al. | Adaptive teaching learning sequence based on instructional design and evolutionary computation | |
Dong | [Retracted] Teaching Design of “Three‐Dimensional” Blended Ideological and Political Courses from the Perspective of Deep Learning | |
CN111882124B (en) | Homogeneous platform development effect prediction method based on generation confrontation simulation learning | |
Youssef et al. | Optimal Combination of Imitation and Reinforcement Learning for Self-driving Cars. | |
Wu et al. | Contrastive Personalized Exercise Recommendation With Reinforcement Learning | |
Chen et al. | Adaptive Learning Path Navigation Based on Knowledge Tracing and Reinforcement Learning | |
Liu et al. | SARLR: Self-adaptive Recommendation of Learning Resources. | |
Ren et al. | Fully adaptive recommendation paradigm: top-enhanced recommender distillation for intelligent education systems | |
Xia et al. | The construction of knowledge graphs based on associated STEM concepts in MOOCs and its guidance for sustainable learning behaviors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190628 |