CN109496305A - Nash equilibrium strategy on continuous action space and social network public opinion evolution model

Info

Publication number
CN109496305A
Authority
CN
China
Prior art keywords
media
gossiper
opinion
agent
strategy
Prior art date
Legal status
Granted
Application number
CN201880001570.9A
Other languages
Chinese (zh)
Other versions
CN109496305B (en)
Inventor
侯韩旭
郝建业
张程伟
Current Assignee
Dongguan University of Technology
Original Assignee
Dongguan University of Technology
Priority date
Filing date
Publication date
Application filed by Dongguan University of Technology
Publication of CN109496305A
Application granted
Publication of CN109496305B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 - Social networking


Abstract

The invention provides a Nash equilibrium strategy on a continuous action space and a social network public opinion evolution model, belonging to the field of reinforcement learning methods. The strategy of the invention comprises the following steps: initialize the parameters; randomly select an action x_i according to a normal distribution N(u_i, σ_i) with a certain exploration rate; execute the action and then obtain the reward r_i from the environment; if the reward r_i received by agent i after executing action x_i is greater than the current cumulative average reward Q_i, then the learning rate of u_i is α_ub, otherwise the learning rate is α_us; update u_i and the variance σ_i according to the selected learning rate, and finally update the cumulative average strategy x̄_i; if the cumulative average strategy x̄_i converges, output x̄_i as the final action of agent i. The beneficial effects of the invention are: agents maximize their own interest while interacting with other agents and can ultimately learn a Nash equilibrium.

Description

Nash equilibrium strategy on a continuous action space and social network public opinion evolution model
Technical field
The present invention relates to Nash equilibrium strategies, in particular to a Nash equilibrium strategy on a continuous action space, and further to a social network public opinion evolution model based on the Nash equilibrium strategy on the continuous action space.
Background art
In a continuous action space environment, on the one hand, an agent's choice of actions is unbounded, and traditional Q-table-based algorithms cannot store estimates of infinitely many returns; on the other hand, in a multi-agent environment, a continuous action space further increases the difficulty of the problem.
In the field of multi-agent reinforcement learning, the action space of an agent may be a discrete finite set or a continuous set. Because the essence of reinforcement learning is to find the optimum through continuous trial and error, a continuous action space offers infinitely many action choices, and a multi-agent environment increases the dimensionality of the action space, general reinforcement learning algorithms find it difficult to learn a global optimum (or equilibrium).
At present, most algorithms solve continuous problems based on function approximation, and such algorithms fall into two classes: value approximation algorithms [1-5] and policy approximation algorithms [6-9]. Value approximation algorithms explore the action space and estimate the corresponding value function from the returns, while policy approximation algorithms define the policy as a probability distribution function over the continuous action space and learn the policy directly. The performance of such algorithms depends on the accuracy of the estimate of the value function or policy, and they are often inadequate for hard problems such as nonlinear control. In addition, there is a class of sampling-based algorithms [10,11] that maintain a discrete action set, select the optimal action in the set with a conventional discrete algorithm, and finally resample the action set with a new mechanism so as to learn the optimum gradually. Such algorithms combine easily with conventional discrete algorithms, but their drawback is a long convergence time. All of the above algorithms are designed to compute the optimal policy in a single-agent environment and cannot be applied directly to learning in a multi-agent environment.
In recent years much work has used agent-based simulation to study the evolution of public opinion in social networks [12-14]. Given groups with different opinion distributions, the research asks whether, through mutual contact, the groups finally reach consensus, polarize, or remain fragmented [15]. The key to this problem is understanding the dynamics of opinion evolution, and thereby the intrinsic causes that drive public opinion toward consensus [15]. For the opinion evolution problem in social networks, researchers have proposed a variety of multi-agent learning models [16-20] and studied how factors such as information sharing and the degree of communication influence opinion evolution; in particular, [21-23] studied the influence of those factors. Works such as [14, 24-28] used evolutionary game theory to study how agent behaviors (such as defection and cooperation) evolve from peer interactions. These works model the behavior of agents under the assumption that all agents are identical. In reality, however, individuals play different roles in society (for example, leader or follower), which the above methods cannot model accurately. To this end, Quattrociocchi et al. [12] divided a social group into media and the public and modeled them separately, where the opinions of the public are influenced by the media and by the other members of the public they follow, and the opinions of the media are influenced by the outstanding performers among the media. Later, Zhao et al. [29] proposed a leader-follower opinion model to explore the formation of public opinion. In both works, the opinion-adjustment strategy of an agent is to imitate a leader or a successful peer; other imitation-based work includes Local Majority [30], Conformity [31] and Imitating Neighbor [32]. In real environments, however, the strategies people adopt in decision making are far more complex than simple imitation: people combine continual interaction with an unknown environment and their acquired knowledge to decide their own behavior. Moreover, imitation-based strategies cannot guarantee that the algorithm learns a global optimum, because the quality of an agent's strategy depends on the strategy of the leader or the imitated agent, and the leader's strategy is not necessarily the best.
Summary of the invention
To solve the problems of the prior art, the present invention provides a Nash equilibrium strategy on a continuous action space, and additionally provides a social network public opinion evolution model based on the Nash equilibrium strategy on the continuous action space.
The present invention includes the following steps:
(1) Set the constants α_ub and α_us, where α_ub > α_us, and the learning rates α_Q, α_σ ∈ (0,1);
(2) Initialize the parameters, including the mean u_i of agent i's expected action, the cumulative average strategy x̄_i, the constant C, the variance σ_i and the cumulative average return Q_i;
(3) Repeat the following steps until the cumulative average strategy x̄_i of agent i's sampled actions converges:
(3.1) randomly select an action x_i according to the normal distribution N(u_i, σ_i) with a certain exploration rate;
(3.2) execute the action x_i, then obtain the return r_i from the environment;
(3.3) if the return r_i received after agent i executes action x_i is greater than the current cumulative average return Q_i, the learning rate of u_i is α_ub, otherwise it is α_us; update u_i according to the selected learning rate;
(3.4) update the variance σ_i according to the learned u_i;
(3.5) if the return r_i received after agent i executes action x_i is greater than the current cumulative average return Q_i, the learning rate is α_ub, otherwise it is α_us; update Q_i according to the selected learning rate;
(3.6) update x̄_i according to the constant C and the action x_i;
(4) Output the cumulative average strategy x̄_i as the final action of agent i.
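As a concrete illustration of steps (1)-(4), the following Python sketch implements one such learner. The exact update formulas (equations (1)-(7) of the detailed description) are not reproduced in this text, so the specific update forms below are assumptions consistent with the description, and all names are illustrative.

```python
import random

class WoLSCALAAgent:
    """Sketch of one WoLS-CALA learner; the update forms are assumed, not verbatim."""

    def __init__(self, alpha_ub=0.05, alpha_us=0.01, alpha_sigma=0.01,
                 sigma_init=0.3, sigma_lower=0.01, C=1000.0):
        self.u = random.random()     # mean of the action distribution (step (2))
        self.sigma = sigma_init      # exploration rate (standard deviation)
        self.Q = 0.0                 # cumulative average return of u
        self.x_bar = self.u          # cumulative average strategy (the output)
        self.t = 0
        self.alpha_ub, self.alpha_us = alpha_ub, alpha_us
        self.alpha_sigma, self.sigma_lower, self.C = alpha_sigma, sigma_lower, C

    def select_action(self):
        # step (3.1): sample an action from N(u, sigma)
        return random.gauss(self.u, self.sigma)

    def update(self, x, r):
        # step (3.3): WoLS rule -- a fast rate when the sampled action beats
        # the average return ("winning"), a slow rate otherwise
        alpha = self.alpha_ub if r > self.Q else self.alpha_us
        self.u += alpha * (x - self.u)                      # assumed update direction
        # step (3.4): adapt sigma toward the observed spread, bounded below (assumed form)
        self.sigma = max(self.sigma_lower,
                         self.sigma + self.alpha_sigma * (abs(x - self.u) - self.sigma))
        # step (3.5): update Q with the same step size as u
        self.Q += alpha * (r - self.Q)
        # step (3.6): cumulative average of sampled actions, windowed by C (assumed form)
        self.t += 1
        self.x_bar += (x - self.x_bar) / min(self.t, self.C)
```

A learner interacts by calling select_action(), executing the action, and feeding the received return to update(); the loop stops once x_bar stabilizes, as in step (4).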
In a further refinement of the present invention, in steps (3.3) and (3.5) the update step of Q is synchronized with the update step of u; within a neighborhood of u_i, the mapping from u_i to Q_i can be linearized as Q_i = K u_i + C, where K is the slope of the linearization.
In a further refinement of the present invention, given a positive number σ_L and a sufficiently large positive number K, the Nash equilibrium strategy on the continuous action space of two agents finally converges to a Nash equilibrium, where σ_L is the lower bound of the variance σ.
The present invention also provides a social network public opinion evolution model based on the Nash equilibrium strategy on the continuous action space. The model includes two classes of agents: Gossiper agents, which simulate ordinary people in a social network, and Media agents, which simulate media or public figures in a social network whose purpose is to attract ordinary people. The Media agents use the Nash equilibrium strategy on the continuous action space to compute the opinion that maximizes their return, update their opinion, and broadcast it in the social network.
In a further refinement, the model includes the following steps:
S1: the opinion of each Gossiper and Media is initialized to a random value on the action space [0,1];
S2: in each interaction, every agent adjusts its own opinion according to the following strategies, until no agent changes its opinion any more;
S21: any Gossiper agent selects a neighbor at random in the Gossiper network according to a set probability, and updates its opinion and the Media it follows according to the BCM (bounded confidence model) strategy;
S22: a subset G' of the Gossiper network G is randomly sampled, and the opinions of the Gossipers in G' are broadcast to all Media;
S23: each Media uses the Nash equilibrium strategy on the continuous action space to compute the opinion that maximizes its return, and broadcasts the updated opinion to the entire social network.
In a further refinement, in step S21 the Gossiper agent operates as follows:
A1: opinion initialization: x_i^τ = x_i^{τ-1};
A2: opinion update: when the difference between this agent's opinion and the opinion of the selected agent is less than a given threshold, update this agent's opinion;
A3: the agent compares the difference between its own opinion and the opinions of the Media, and follows one Media selected according to a probability.
In a further refinement, in step A2: if the currently selected neighbor is Gossiper j and |x_j^τ − x_i^τ| < d_g, then x_i^τ ← x_i^τ + α_g(x_j^τ − x_i^τ); if the currently selected neighbor is Media k and |y_k^τ − x_i^τ| < d_m, then x_i^τ ← x_i^τ + α_m(y_k^τ − x_i^τ), where d_g and d_m are the thresholds set for the opinions of the two types of neighbors, and α_g and α_m are the learning rates for the two types of neighbors.
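Step A2 can be transcribed directly; the following sketch uses the parameter values of the simulation section below (d_g = d_m = 0.1, α_g = α_m = 0.5) as defaults, with illustrative names:

```python
def bcm_update(x_i, neighbor_opinion, is_media, d_g=0.1, d_m=0.1,
               alpha_g=0.5, alpha_m=0.5):
    """Bounded-confidence (BCM) opinion update for a Gossiper, per step A2."""
    d, alpha = (d_m, alpha_m) if is_media else (d_g, alpha_g)
    if abs(neighbor_opinion - x_i) < d:          # only nearby opinions have influence
        x_i += alpha * (neighbor_opinion - x_i)  # move toward the neighbor's opinion
    return x_i
```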
In a further refinement, in step A3 the agent follows Media k according to a probability P_ik defined in terms of the weights λ_ik of algorithm 3.
In a further refinement, in step S23 the current return r_j of Media j is defined as the proportion, among the Gossipers in G', of those who choose to follow j, where P_ij denotes the probability that Gossiper i follows Media j.
In a further refinement, the presence of one Media accelerates the convergence of the Gossiper agents' opinions toward unity; in an environment with multiple competing Media, the dynamic change of each Gossiper agent's opinion is a weighted average of the influences of the Media.
Compared with the prior art, the beneficial effects of the present invention are: in a continuous action space environment, an agent can maximize its own interest while interacting with other agents, and can finally learn a Nash equilibrium.
Detailed description of the invention
Fig. 1 is a schematic diagram of two agents converging to the Nash equilibrium point when r = 0.7 > 2/3, a = 0.4, b = 0.6;
Fig. 2 is a schematic diagram of two agents converging to the Nash equilibrium point when r = 0.6 < 2/3, a = 0.4, b = 0.6;
Fig. 3 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a fully connected network without Media;
Fig. 4 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a small-world network without Media;
Fig. 5 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a fully connected network with one Media;
Fig. 6 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a small-world network with one Media;
Fig. 7 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a fully connected network with two competing Media;
Fig. 8 is a schematic diagram of the opinion evolution of the Gossiper-Media model on a small-world network with two competing Media.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The Nash equilibrium strategy on a continuous action space of the present invention extends the single-agent reinforcement learning algorithm CALA [7] (Continuous Action Learning Automata). By introducing the WoLS (Win or Learn Slow) learning mechanism, the algorithm effectively handles the learning problem in multi-agent environments; the Nash equilibrium strategy of the present invention is therefore called WoLS-CALA (Win or Learn Slow Continuous Action Learning Automaton). The CALA algorithm is first described in detail.
Continuous Action Learning Automata (CALA) [7] is a policy-gradient reinforcement learning algorithm for solving learning problems on continuous action spaces. The strategy of the agent is defined as the probability density function of a normal distribution N(u_t, σ_t) on the action space.
The policy update of a CALA agent is as follows: at time t, the agent selects an action x_t according to the normal distribution N(u_t, σ_t); it executes both x_t and u_t and then obtains the corresponding returns V(x_t) and V(u_t) from the environment, which means the algorithm must execute two actions in every interaction with the environment; finally, it updates the mean and variance of the normal distribution N(u_t, σ_t) according to equations (1) and (2).
Here α_u and α_σ are learning rates and K is a positive constant used to control the convergence of the algorithm. Specifically, the size of K is related to the number of learning steps and is typically set to the order of magnitude of 1/N, where N is the number of iterations of the algorithm; σ_L is the lower bound of the variance σ. The algorithm keeps updating the mean and variance until u is constant and σ_t tends to σ_L. After the algorithm converges, the mean u points to an optimal solution of the problem. The size of σ in equation (2) determines the exploration ability of CALA: the larger σ_t is, the more likely CALA is to find a potentially better action.
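For concreteness, the following Python sketch shows a CALA-style step. Equations (1)-(2) are the standard CALA updates from the cited literature [7], so the forms below follow that literature as an assumption; note the two return evaluations V(x) and V(u) per step, which is exactly the limitation addressed later by WoLS-CALA.

```python
import random

def cala_step(u, sigma, V, alpha_u=0.01, alpha_sigma=0.01, K=0.01, sigma_L=1e-3):
    """One CALA update; standard literature form [7], used as an assumed
    stand-in for equations (1)-(2)."""
    phi = max(sigma, sigma_L)          # effective std. deviation, bounded below by sigma_L
    x = random.gauss(u, phi)           # sample the exploratory action x_t
    delta = (V(x) - V(u)) / phi        # scaled difference of the two returns
    s = (x - u) / phi                  # standardized exploration step
    u_new = u + alpha_u * delta * s                       # mean update (cf. eq. (1))
    sigma_new = (sigma + alpha_sigma * delta * (s * s - 1.0)
                 - alpha_sigma * K * (sigma - sigma_L))   # variance update (cf. eq. (2))
    return u_new, sigma_new

# usage: maximize a smooth return function, e.g. V(x) = -(x - 0.3)**2
u, sigma = 0.5, 0.3
for _ in range(20000):
    u, sigma = cala_step(u, sigma, lambda a: -(a - 0.3) ** 2)
```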
By definition, CALA algorithm is the learning algorithm based on Policy-Gradient class.The algorithm is confirmed returning by theory In the case where reporting function V (x) smooth enough, CALA algorithm, which can be sought, looks for local optimum [7].De Jong et al. [34] passes through Reward Program is improved, CALA is extended and is applied under multiple agent environment, and its modified hydrothermal process can by experimental verification To converge to Nash Equilibrium.WoLS-CALA proposed by the present invention introduces " WoLS " mechanism and solves the problems, such as multi-agent Learning, and from Theoretically analyze and prove that algorithm can learn to arrive Nash Equilibrium in continuous motion space.
CALA requires the agent to obtain, in each learning step, the returns of both the sampled action and the expected action at once; this is infeasible in most reinforcement learning environments, where an agent can usually execute only one action per interaction with the environment. For this purpose, the present invention extends CALA in two respects, Q-value function estimation and a variable learning rate, and proposes the WoLS-CALA algorithm.
1. Q function estimation
In an independent multi-agent reinforcement learning environment, an agent selects one action at a time and then obtains a return from the environment. For the exploration mode based on a normal distribution, a natural approach is to use a Q value to estimate the average return of the expected action u. Specifically, the expected return Q_i of agent i's action u_i in equation (1) can be estimated by equation (3).
Here x_i^t is the sampled action at time t and r_i^t is the return received by agent i when selecting action x_i^t, determined by the joint action of all agents at time t; α_Q is agent i's learning rate for Q. The update in equation (3) is the usual method in reinforcement learning for estimating the value function of a single state; in essence it uses the sample average of r_i to estimate Q_i. A further advantage is that Q_i can be updated one sample at a time, and a newly received return always contributes a fraction α to the Q-value estimate.
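A one-line sketch of this incremental estimate (assuming the constant-step form described above):

```python
def update_q(Q_i, r_i, alpha_q):
    """Incremental estimate of the expected return of u (eq. (3), assumed
    constant-step form). Each new return r_i enters with fixed weight alpha_q,
    so the estimate can track a slowly changing environment."""
    return Q_i + alpha_q * (r_i - Q_i)
```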
According to equation (3), the update process of u (equation (1)) and the update process of σ (equation (2)) can be rewritten as equations (4) and (5), where x_i^t is the sampled action at time t, r_i^t is the return received by agent i when selecting x_i^t, determined by the joint action of all agents at time t, and α_u^i and α_σ^i are agent i's learning rates for u_i and σ_i.
However, directly using the Q-function estimate in a multi-agent environment brings a new problem. In a multi-agent environment the return of an agent is influenced by the other agents, and changes in the other agents' strategies make the environment non-stationary; the update in equation (4) cannot guarantee that u adapts to the dynamic changes of the environment. A simple example: suppose that at time t agent i has learned the currently optimal action u_i^t, and Q_i^t is an accurate estimate of its expected return, so that by definition, at time t, Q_i^t dominates the expected return of any action x_i. Substituting equation (3) into equation (4): if the environment remains unchanged, this continues to hold; if the environment changes, however, so that u_i^t is no longer the optimal action, then there exists an x_i whose corresponding return satisfies r_i > Q_i. In this case, continuing with the update in equation (5), u_i can move away from x_i, whereas theoretically u_i should stay close to x_i to guarantee an accurate estimate. Because Q is a statistical estimate of r, Q updates more slowly than r changes; in the subsequent updates r_i > Q_i then holds repeatedly, and after multiple samples u_i remains essentially unchanged near its old value, whereas theoretically u_i should move to find the new optimal action. The cause of these problems is chiefly the non-stationarity of the multi-agent environment, with which traditional estimation methods (such as Q-learning) cannot cope effectively.
2. WoLS rule and analysis
In order to estimate the expected return of u more accurately in a multi-agent environment, the present invention updates the expected action u with a variable learning rate. Formally, the learning rate of the expected action u_i is chosen according to equation (6), and the update of u_i is then given by equation (7).
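Equations (6) and (7) can be read, consistently with the description of algorithm 1 below, as the following assumed forms (not a verbatim reproduction):

```latex
\alpha_u^{i,t} =
\begin{cases}
  \alpha_{ub}, & r_i^t > Q_i^t \quad \text{(the sampled action beats the average: ``winning'')}\\
  \alpha_{us}, & r_i^t \le Q_i^t
\end{cases}
\qquad
u_i^{t+1} = u_i^t + \alpha_u^{i,t}\,\bigl(x_i^t - u_i^t\bigr).
```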
The WoLS rule can be intuitively interpreted as: if the return V(x) of the sampled action x is greater than the return V(u) of the expected action u, the agent should learn fast, and otherwise slowly. WoLS is thus exactly the opposite of the WoLF (Win or Learn Fast) strategy [35]. The difference is that WoLF is designed to guarantee convergence, whereas the WoLS strategy of the present invention enables the algorithm to update u in the direction of increasing return while still correctly estimating the expected return of the action u. By analyzing the intrinsic dynamics of the WoLS strategy, the following conclusion can be obtained.
Theorem 1: on a continuous action space, the learning dynamics of the CALA algorithm with the WoLS rule can be approximated by gradient ascent (GA).
Proof: by definition (equation (4)), x_t is the action selected by the agent at time t according to the normal distribution N(u_t, σ_t), and V(x_t) and V(u_t) are the returns corresponding to the actions x_t and u_t. Define f(x) = E[V(x_t) | x_t = x] as the expected return function of action x. Assuming α_u is infinitesimal, the dynamics of u_t in WoLS-CALA can be described by the ODE in equation (8), where N(u, σ_u) is the probability density function of the normal distribution (dN(a, b) denotes the differential, with respect to a, of the normal distribution with mean a and variance b²). Let x = u + y; Taylor-expanding f(x) in equation (8) at y = 0 and simplifying yields equation (9).
Note that in equation (9) the coefficient term and σ² are always positive.
The update process of the standard deviation σ is the same as in the original CALA algorithm, so the conclusion for CALA applies directly: given a sufficiently large positive number K, σ will eventually converge to σ_L. Combining this with equation (9), the following conclusion is obtained:
For a small positive number σ_L (e.g., 1/10000), after enough time the ODE for u_t can be approximated by \dot{u} = C' f'(u) (equation (10)), where C' is a small positive constant and f'(u) is the gradient of the function f at u. Equation (10) shows that u moves along the gradient direction of f(u), i.e., the direction of fastest ascent of f(u); the dynamic trajectory of u can therefore be approximated by gradient ascent.
When only one agent is present, the dynamics of u finally converge to an optimal point, because when u = u* is an optimal point, f'(u*) = 0 and therefore \dot{u} = 0.
As seen from Theorem 1, the learning dynamics of the expected action of a CALA agent with the WoLS rule approximate the gradient ascent strategy described above, i.e., the time derivatives can all be written in the form \dot{u} = C f'(u). If f(u) has multiple local optima, whether the algorithm finally converges to the global optimum depends on the algorithm's exploration-exploitation trade-off [36], which is a problem that cannot be fully resolved in the field of reinforcement learning. The usual approach to exploring for the global optimum is to give the initial exploration rate σ (i.e., the standard deviation) a large value and give σ an especially small initial learning rate, so that the algorithm has enough samples over the entire action space. Moreover, with the WoLS rule the expected action u of the CALA algorithm can converge even when the standard deviation σ itself is not 0; therefore, to ensure sufficient exploration, the lower bound σ_L of the exploration rate σ can be given a relatively large value. In summary, by choosing suitable parameters the algorithm can learn the global optimum.
Another problem is that in a multi-agent environment a pure gradient ascent strategy may cause the algorithm not to converge. The present invention therefore combines the PHC (Policy Hill Climbing) algorithm [35] and proposes an independent multi-agent reinforcement learning algorithm of Actor-Critic type, called WoLS-CALA. The main idea of the Actor-Critic framework is that policy estimation and policy update are learned in separate processes: the part handling policy estimation is called the Critic, and the part handling policy update is called the Actor. The specific learning process is as follows (algorithm 1).
Algorithm 1: the learning strategy of WoLS-CALA agent i
For simplicity, algorithm 1 uses two constants α_ub and α_us (α_ub > α_us) in place of the learning rate α_u^i: if the return r_i received after agent i executes action x_i is greater than the current cumulative average return Q_i, the learning rate of u_i is α_ub (winning), otherwise it is α_us (losing) (step 3.3). Because equations (7) and (4) contain the denominator φ(σ_i^t), even a small error can strongly affect the updates of u and σ when the denominator is small; using two fixed step sizes makes the update process easier to control in experiments and easier to implement. Note also that in step 3.5 the update step of Q is synchronized with the step of u, i.e., both are α_ub when r_i > Q_i and both are α_us otherwise. Because α_ub and α_us are two very small numbers, within a small neighborhood of u_i the mapping from u_i to Q_i can be linearized as Q_i = K u_i + C, where K is the slope: if u_i changes by Δu_i, then Q_i changes by approximately K Δu_i. This, too, serves to estimate the expected return of u more accurately. Finally (step 4), the algorithm uses the convergence of x̄_i as the loop termination condition and as its output, mainly to prevent u_i from exhibiting periodic solutions in competitive environments, which would keep the algorithm from terminating. Note that the variables x̄_i and u_i have different meanings: x̄_i is the cumulative statistical average of agent i's sampled actions, and in a multi-agent environment it finally converges to a Nash equilibrium strategy; u_i is the expected mean of agent i's policy distribution, and in a competitive environment it may oscillate periodically near the equilibrium point. A detailed explanation is given in Theorem 2 below.
Because dynamic trajectories in high-dimensional spaces may exhibit chaos, it is difficult to analyze qualitatively the dynamic behavior of the algorithm with many agents. In this field, the dynamic analysis of related multi-agent algorithms is essentially based on two agents [35,37-39]. Therefore the case of two WoLS-CALA agents is analyzed here.
Theorem 2: given a positive number σ_L and a sufficiently large positive number K, the strategies of two WoLS-CALA agents finally converge to a Nash equilibrium.
Proof: Nash equilibria can be divided into two classes by the position of the equilibrium point: equilibria on the boundary of the continuous action space (a bounded closed set), and equilibria in the interior of the continuous action space. Since a boundary equilibrium is equivalent to an interior equilibrium of a space one dimension lower, this proof focuses on the second class. The dynamic behavior of an ODE depends on the stability properties of its interior equilibrium points [40]; therefore the equilibrium points of equation (10) are computed first, and then their stability is analyzed.
Let x_i^t be the action randomly sampled by agent i at time t according to the normal distribution N(u_i^t, σ_i^t), and let f_1 and f_2 be the expected return functions corresponding to the actions u_1 and u_2. If a point eq = (u_1^{eq}, u_2^{eq}) is an equilibrium point of equation (10), then both gradients f_1' and f_2' vanish at eq. According to nonlinear dynamics theory [40], the stability of the point eq is determined by the eigenvalues of the matrix M built from the second-order partial derivatives of f_1 and f_2 at eq, with off-diagonal entries involving ∂²f_i/(∂u_i ∂u_j) for i ≠ j.
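Since equation (10) gives \dot{u}_i ≈ C_i f_i'(u_i), a plausible form of the matrix M (an assumption consistent with the surrounding derivation, with C_1, C_2 > 0 the small constants of equation (10)) is:

```latex
M \;=\;
\begin{pmatrix}
  C_1\,\dfrac{\partial^2 f_1}{\partial u_1^2} & C_1\,\dfrac{\partial^2 f_1}{\partial u_1\,\partial u_2}\\[8pt]
  C_2\,\dfrac{\partial^2 f_2}{\partial u_2\,\partial u_1} & C_2\,\dfrac{\partial^2 f_2}{\partial u_2^2}
\end{pmatrix}\Bigg|_{u = eq}.
```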
Furthermore, according to the definition of a Nash equilibrium, a Nash equilibrium point eq satisfies the optimality conditions of equation (12). Substituting equation (12) into M shows that the eigenvalues at a Nash equilibrium point fall into one of the following three cases:
(a) All eigenvalues of the matrix M have negative real parts. Such an equilibrium point is asymptotically stable, i.e., all trajectories near eq finally converge to this equilibrium point.
(b) All eigenvalues of the matrix M have non-positive real parts, and there is a pair of purely imaginary eigenvalues. Such an equilibrium point is stable, but the limit sets of the trajectories near it are periodic solutions, and these limit sets are uncountable. In addition, it is easy to prove that the time average of such a trajectory finally converges to the Nash equilibrium; since WoLS-CALA outputs the cumulative average x̄, the algorithm can also handle this class of equilibrium points.
(c) The matrix M has an eigenvalue with a positive real part, i.e., the equilibrium point is unstable. For this class, according to nonlinear dynamics theory, the trajectories around the unstable equilibrium point divide into two kinds: trajectories in the stable manifold, and all other trajectories (Shilnikov et al., 1998). The stable manifold is the subspace generated by the eigenvectors corresponding to the stable eigenvalues. Trajectories in the stable manifold theoretically converge to this equilibrium point, but owing to randomness and numerical error the probability that the algorithm stays within this subspace is 0. All trajectories not in the stable manifold move gradually away from this equilibrium point and finally converge to one of the equilibrium types analyzed above, i.e., to a boundary equilibrium or to an equilibrium point of type (a) or (b).
In addition, similar to the single-agent environment, if there are multiple equilibrium points, then according to the analysis of Theorem 1, with a suitable exploration-exploitation rate (e.g., σ_L sufficiently large, a large initial σ and a small learning rate for σ), the algorithm converges to a Nash equilibrium point (the global optimum of each agent when the other agents' strategies are fixed). In summary, the proof that the algorithm converges to a Nash equilibrium is complete.
The present invention also provides a social network public opinion evolution model based on the Nash equilibrium strategy on the continuous action space. The model includes two classes of agents: Gossiper agents, which simulate ordinary people in a social network, and Media agents, which simulate media or public figures whose purpose is to attract ordinary people; the social network public opinion evolution model of the present invention is therefore also called the Gossiper-Media model. The Media agents use the Nash equilibrium strategy on the continuous action space to compute the opinion that maximizes their return, update their opinion and broadcast it in the social network. The present invention applies the WoLS-CALA algorithm to the study of opinion evolution in real social networks: by modeling the media in the network with WoLS-CALA, it explores what influence competing media have on public opinion.
It is described in detail below:
1. The Gossiper-Media model
The present invention proposes a multi-agent reinforcement learning framework, the Gossiper-Media model, to study the evolution of group opinion. The Gossiper-Media model includes two classes of agents, Gossiper agents and Media agents. Gossiper agents simulate ordinary people in a real network, whose opinions are influenced by the Media and by other Gossipers; Media agents simulate the media or public figures in a social network whose purpose is to attract the masses, and such agents actively choose their own opinions to maximize their followers. Consider a network with N agents, where the number of Gossipers is |G| and the number of Media is |M| (N = G ∪ M). It is assumed that the connections between Gossipers and Media are complete, i.e., each Gossiper can select any Media to interact with, with equal probability; the connections among Gossipers are not assumed complete, i.e., each Gossiper can interact only with its own neighbors. The network among Gossipers is determined by their social relations. In particular, in the simulation experiments below, two kinds of Gossiper networks are defined: the fully connected network and the small-world network. The opinions of Gossiper i and Media j are denoted x_i and y_j respectively. The interaction process of the agents in the model follows algorithm 2.
Algorithm 2: the learning model of opinions in a Gossiper-Media network
First, the opinion of each Gossiper and Media is initialized to a random value on the action space [0,1] (step 1). Then, in each interaction, every agent adjusts its own opinion according to its strategy until the algorithm converges (no agent changes its opinion any more). Each Gossiper agent first selects the object to interact with: with probability ξ it selects a Gossiper at random from its neighbors, or with probability 1 − ξ it selects a Media at random (step 2.1). The Gossiper then updates its opinion according to algorithm 3 and, according to the difference between its opinion and each Media's, selects and follows the Media closest to its own opinion. It is assumed that the Media agents can obtain the opinions of a randomly sampled subset of the Gossipers, denoted G', which is broadcast to all Media (step 2.2). Then each Media plays against the others using the WoLS-CALA algorithm to compute the opinion that maximizes its own followers, and broadcasts the updated opinion to the entire network (step 2.3). In principle each Media could sample alone, so that their sets G' differ; this has little influence on the learning of WoLS-CALA, because in theory the opinion distribution of G' is identical to that of G. This assumption is made mainly for simplicity, and it also reduces the possible uncertainty caused by random sampling. A sketch of one round is given below.
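The round just described can be sketched in Python as follows, reusing the bcm_update function and the WoLSCALAAgent sketch given earlier (all names are illustrative, and the Media return is a simplified stand-in for the probabilistic follow rule of section 1.1):

```python
import random

def gossiper_media_round(gossipers, media, neighbors, xi=0.5, sample_frac=0.8):
    """One round of algorithm 2 (illustrative sketch).

    gossipers: dict id -> opinion x_i in [0, 1]
    media:     dict id -> WoLSCALAAgent whose x_bar is its broadcast opinion y_j
    neighbors: dict id -> list of neighboring Gossiper ids
    """
    d_m = 0.1
    # step 2.1: each Gossiper interacts with a random neighbor or a random Media
    for i, x_i in list(gossipers.items()):
        if random.random() < xi and neighbors[i]:
            j = random.choice(neighbors[i])
            gossipers[i] = bcm_update(x_i, gossipers[j], is_media=False)
        else:
            k = random.choice(list(media))
            gossipers[i] = bcm_update(x_i, media[k].x_bar, is_media=True)
    # step 2.2: sample the subset G' and broadcast its opinions to all Media
    sample = random.sample(list(gossipers.values()),
                           max(1, int(sample_frac * len(gossipers))))
    # step 2.3: each Media proposes an opinion with WoLS-CALA; its reward is the
    # share of sampled Gossipers for which it is the closest Media within d_m
    proposals = {k: learner.select_action() for k, learner in media.items()}
    for k, learner in media.items():
        y_k = proposals[k]
        followers = sum(1 for x in sample
                        if abs(x - y_k) <= d_m
                        and all(abs(x - y_k) <= abs(x - proposals[m]) for m in media))
        learner.update(y_k, followers / len(sample))
```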
1.1 The Gossiper strategy
The strategy of each Gossiper consists of two parts: 1) how to update its own opinion; 2) how to select the Media it follows. The details are as follows (algorithm 3):
Algorithm 3: the strategy of Gossiper i in round τ
For Gossiper i, its opinion is first initialized: x_i^τ = x_i^{τ-1} (step 1). Its opinion is then updated according to the BCM (bounded confidence model) strategy [12,33] (step 2). BCM is a typical model for describing group opinion; the opinion of a BCM-based agent is influenced only by agents with similar opinions. In algorithm 3, the Gossiper updates its opinion only when it differs from the opinion of the selected agent by less than the threshold d_g (or d_m), where d_g and d_m correspond to the selected agent being a Gossiper or a Media respectively. The size of the threshold d_g (or d_m) represents the degree to which the Gossiper accepts new opinions; intuitively, the larger d is, the more easily the Gossiper is influenced by other agents [41-43]. The Gossiper then compares the difference between its own opinion and the opinions of the Media, and follows one Media selected according to a probability (step 3). Here P_ij^τ denotes the probability that Gossiper i selects and follows Media j in round τ, and it satisfies the following properties:
(i) when |x_i − y_j| > d_m, P_ij = 0;
(ii) P_ij > 0 if and only if the opinion y_j of Media j satisfies |x_i − y_j| ≤ d_m;
(iii) P_ij decreases as the distance |x_i − y_j| between the opinions x_i and y_j increases.
Note that if |x_i − y_j| > d_m for every Media j, then Σ_j P_ij = 0; this means it is possible that a Gossiper follows no Media at all. The parameter δ in the equation for λ_ij is a small positive number that prevents the denominator of the fraction from being 0.
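The explicit expressions for λ_ij and P_ij do not survive in this text; the sketch below assumes the inverse-distance form λ_ij = 1/(|x_i − y_j| + δ) with P_ij = λ_ij / Σ_k λ_ik taken over the Media within the threshold d_m, which satisfies properties (i)-(iii):

```python
def follow_probabilities(x_i, media_opinions, d_m=0.1, delta=1e-6):
    """Follow probabilities P_ij of Gossiper i over the Media (assumed form).

    media_opinions: dict media_id -> opinion y_j.
    Returns a dict media_id -> P_ij; an empty dict means no Media is followed."""
    lam = {j: 1.0 / (abs(x_i - y_j) + delta)   # delta keeps the denominator > 0
           for j, y_j in media_opinions.items()
           if abs(x_i - y_j) <= d_m}           # property (i): distant Media get P = 0
    total = sum(lam.values())
    return {j: w / total for j, w in lam.items()} if total > 0 else {}
```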
1.2 The Media strategy
Given the sampled opinion information of a group of Gossipers, each Media can attract more Gossipers to follow it by learning to adjust its own opinion appropriately to cater to the Gossipers' preferences. In a multi-agent system with multiple Media, a Nash equilibrium is the stable state finally reached by the mutually competing agents; in this state, no agent can obtain a higher return by unilaterally changing its own strategy. Since the action space of a Media is continuous (an opinion is defined as any point on the interval [0,1]), the behavior of the Media is modeled here with the WoLS-CALA algorithm; algorithm 4 is the Media strategy built on WoLS-CALA.
Algorithm 4: the strategy of Media j in round τ
The current return r_j of Media j is defined as the ratio of the number of Gossipers in G' that choose to follow j to the total number of Gossipers in G'. Here λ_ij is defined as in algorithm 3, and P_ij denotes the probability that Gossiper i follows Media j.
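Under the assumed follow probabilities sketched above, this return can be written as (an illustrative sketch, not the patent's exact equation):

```python
def media_return(j, media_opinions, sampled_opinions, d_m=0.1):
    """Return r_j of Media j: the expected share of the sampled Gossipers G'
    that follow j, using the assumed follow_probabilities above."""
    if not sampled_opinions:
        return 0.0
    total = sum(follow_probabilities(x_i, media_opinions, d_m).get(j, 0.0)
                for x_i in sampled_opinions)
    return total / len(sampled_opinions)
```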
2. Analysis of the group opinion dynamics
Let {y_j}_{j∈M}, y_j ∈ (0,1), denote the opinions of the Media. Assuming the Gossiper network is infinitely large, the distribution of Gossiper opinions can be represented by a continuous density function; let p(x, t) denote the probability density function of the opinion distribution of the Gossiper group at time t. The opinion evolution of the Gossipers can then be expressed as the partial derivative of the probability density function p(x, t) with respect to time. First, consider the case of a single Media.
Theorem 3: in a Gossiper-Media network containing only one Media, the evolution of the Gossiper opinion distribution obeys equation (14), where the sets I_1 = {x : |x − y| < (1 − α_m) d_m} and I_2 = {x : d_m ≥ |x − y| ≥ (1 − α_m) d_m} appear in the Media term.
Proof: based on the mean-field (MF) approximation [40], the partial derivative of the probability distribution p(x, t) of the BCM-based Gossiper opinions with respect to t can be expressed as in [12] (equation (17)). Here W_{x+y→x} denotes the probability that a Gossiper with opinion x + y changes its opinion to x, and W_{x+y→x} p(x+y) dy denotes the proportion of agents whose opinions transfer from the interval (x+y, x+y+dy) to x during the time interval (t, t+dt). Similarly, W_{x→x+y} denotes the probability that an agent with opinion x changes its opinion to x + y, and W_{x→x+y} p(x) dy denotes the proportion of Gossipers with opinion equal to x that transfer to the interval (x+y, x+y+dy).
According to the definition of algorithm 3, a Gossiper agent is influenced by the opinions of the other Gossipers with probability ξ, or by the opinions of the Media with probability 1 − ξ, and then makes its own decision. Refining W_{x+y→x} and W_{x→x+y} into the part influenced by the other Gossipers' opinions and the part influenced by the Media opinions, denoted w^[g] and w^[m] respectively, W_{x→x+y} and W_{x+y→x} can be expressed as in equation (18). Substituting equation (18) into equation (17) yields equation (19).
Define Ψ_g(x, t) as in equation (20), the rate of change of the probability density function p(x, t) of the agents' opinions due to the influence of the Gossipers. Weisbuch et al. [45] proved that Ψ_g(x, t) obeys an equation involving the second-order partial derivative of p with respect to x, where α_g is a real number between 0 and 0.5 and d_g is the threshold of the Gossipers.
The term Ψ_m(x, t) represents the rate of change of the opinion density function p(x, t) due to the influence of the media. Suppose the opinion of Media j is u_j (u_j = x + d_j); then the opinion distribution of the Media can be represented by the Dirac delta function q(x) = δ(x − u_j). The Dirac delta function δ(x) [46] is commonly used to model an infinitely high and narrow peaked function (a pulse) and similar abstract concepts, such as a point charge, a point mass or an electron.
The transfer rate w^[m]_{x+y→x} from x + y to x can then be expressed as in equation (21). The term δ(x − [(x+y) + α_m((x+z) − (x+y))]) in equation (21) represents the event that the opinion x + y is influenced by the opinion x + z and transfers to x, and q(x+z) is the density of the Media at opinion x + z. Similarly, w^[m]_{x→x+y} can be expressed as in equation (22).
Combining equations (21)-(22), computing and simplifying yields equation (23), where I_1 = {x : |x − y| < (1 − α_m) d_m} and I_2 = {x : d_m ≥ |x − y| ≥ (1 − α_m) d_m}. Combining with equation (20) completes the proof.
From equation (14) it can be seen that the rate of change of p(x, t) is a weighted average of the terms Ψ_g(x, t) and Ψ_m(x, t): the former represents the part of the opinion change influenced by the Gossiper network, the latter the part influenced by the Media network. The Gossiper-only term Ψ_g(x, t) was analyzed in the work of Weisbuch [45]. An important property obtained there is that, starting from any distribution, the locally optimal points of the density are gradually reinforced, which shows that in a pure Gossiper network the development of opinion always tends gradually toward consensus. In addition, Theorem 3 shows that both Ψ_g(x, t) and Ψ_m(x, t) are independent of the specific Gossiper network, which means that when the network is infinitely large the development of opinion is not influenced by the network structure.
Next, the second part of equation (14), Ψ_m(x, t) (equation (23)), is analyzed. Assuming y is constant, analysis of equation (23) yields equation (24); intuitively, equation (24) shows that the opinions of the Gossipers close to the Media opinion all converge to that Media. The following conclusion is therefore drawn:
Corollary 1: the presence of one Media accelerates the convergence of the Gossipers' opinions toward unity.
Next, consider the case of multiple Media. Define P_j(x) as the probability that a Gossiper with opinion x is influenced by Media j (equation (25)). Then, for Gossipers in an environment with multiple competing Media, the dynamic change of opinion can be expressed as the weighted average of the influences of the Media, and the following conclusion is obtained:
Corollary 2: the dynamic change of the distribution function of the Gossipers' opinions obeys the weighted combination of the terms Ψ_g(x, t) and Ψ_m(x, t), where Ψ_g(x, t) and Ψ_m(x, t) are defined by equations (20) and (23) respectively.
3. Simulation experiments and analysis
It is first verified that the WoLS-CALA algorithm can learn a Nash equilibrium; then an experimental simulation of the Gossiper-Media model is given to verify the foregoing theoretical analysis.
3.1 Performance test of the WoLS-CALA algorithm
This embodiment considers a simplified version of the Gossiper-Media model to test whether the WoLS-CALA algorithm can learn a Nash equilibrium strategy. Specifically, the problem of two Media competing for followers is modeled as the following optimization problem:
max (f_1(x, y), f_2(x, y))
s.t. x, y ∈ [0,1] (s.t. denotes the constraint conditions, the standard notation for optimization problems.)   (26)
where the definitions of f_1(x, y) and f_2(x, y) contain a parameter r ∈ [0,1], and a, b ∈ [0,1] with |a − b| ≥ 0.2 are the opinions of the Gossipers.
Here the functions f_1(x, y) and f_2(x, y) simulate the return r in algorithm 4, representing the returns of Media 1 and Media 2 respectively when the joint action is <x, y>. This embodiment uses two WoLS-CALA agents that control x and y respectively through independent learning, each maximizing its own reward function f_1(x, y) or f_2(x, y). In this model, depending on the opinion distribution of the Gossipers, the Nash equilibria fall into two classes:
(i) when r > 2/3 the equilibrium point is (a, a), and when r < 1/3 the equilibrium point is (b, b);
(ii) when 1/3 ≤ r ≤ 2/3 the equilibrium points are the points of the set {(x, y) : |x − a| < 0.1 ∧ |y − b| < 0.1} or {(x, y) : |x − b| < 0.1 ∧ |y − a| < 0.1}.
In the simulation experiments, this embodiment takes one point from each of the two classes, i.e., r = 0.7 > 2/3 and r = 0.6 < 2/3, and then observes whether the algorithm can learn the Nash equilibrium as expected when the opinion distributions of the Gossipers differ. Table 1 gives the parameter settings of WoLS-CALA.
Table 1: parameter settings
Figs. 1 and 2 show the simulation results of the two experiments. It is evident that in both experiments the Media agents converge to a Nash equilibrium after about 3000 learning steps: they converge to <0.4, 0.4> when r = 0.7 and to <0.4, 0.57> when r = 0.6. As shown in Fig. 1, when r = 0.7 > 2/3, a = 0.4, b = 0.6, the two agents converge to the Nash equilibrium point (0.4, 0.4); as shown in Fig. 2, when r = 0.6 < 2/3, a = 0.4, b = 0.6, agent 1 converges to x = 0.4 and agent 2 converges to y = 0.57.
3.2 Experimental simulation of the Gossiper-Media model
This subsection presents the simulation results of the Gossiper-Media model. Experimental environments with 200 Gossipers and different numbers of Media are considered: (i) no Media; (ii) only one Media; (iii) two competing Media. For each environment, this embodiment considers two representative Gossiper networks, the fully connected network and the small-world network [47]. Through these comparative experiments, the influence of the Media on the evolution of Gossiper opinion is explored.
For fairness, every experimental environment uses the same parameter settings. The three environments use the same network and the same initial opinions of the Gossipers and Media. Here the small-world network is generated at random with connectivity p = 0.2 using the Watts-Strogatz construction [47]. The initial opinion of each Gossiper is sampled randomly from the uniform distribution on the interval [0,1]; the initial opinion of the Media is 0.5. Since too large a threshold would interfere with the observations of the experiments, the Gossiper-Media threshold d_m and the Gossiper-Gossiper threshold d_g are both set to a small positive number, 0.1. The Gossiper learning rates α_g and α_m are set to 0.5. The set G' is sampled at random from G and satisfies |G'| = 80% |G|.
Because each environment uses two kinds of Gossiper networks, fully connected and small-world, Figs. 3-4 show the opinion evolution of the networks without Media under the fully connected and small-world networks respectively; Figs. 5-6 show the case with one Media; and Figs. 7-8 show the case with two competing Media. From these figures it can first be seen that, under all three Media environments, the number of final convergence points is the same for the different Gossiper networks: 5 with zero Media, 4 with one Media, and 3 with two Media. This phenomenon is consistent with the conclusions of Theorem 3 and Corollary 2: the opinion dynamics of the Gossipers are independent of the topology of the Gossiper network, because the opinion dynamics of the Gossipers under different networks can be modeled by the same formula.
Second, it can be observed from Figs. 3-6 that with one Media the number of final convergence points of the Gossiper opinions in both networks drops from 5 to 4. This shows that the presence of a Media accelerates the unification of Gossiper opinion, in agreement with Corollary 1. Meanwhile, Figs. 5-8 show that when the number of Media increases from 1 to 2, the number of final convergence points of Gossiper opinion in both networks further drops from 4 to 3, which shows that competing Media further accelerate the unification of Gossiper opinion.
In addition, the experimental results also verify the performance of the WoLS-CALA algorithm. In Figs. 5 and 6, the opinion of the Media agent always stays around the opinion held by the most Gossipers (N_max = 69 in the fully connected network, N_max = 68 in the small-world network). This meets the design expectation that a WoLS-CALA agent can learn the global optimum. In Figs. 7 and 8, with two Media, the opinion of one Media stays around the opinion held by the most Gossipers (N_max = 89 in both networks), while the other Media stays around the opinion held by the second-most Gossipers (N'_max = 70 in the fully connected network, N'_max = 66 in the small-world network). This also meets the expectation of Theorem 2 that two WoLS-CALA agents finally converge to a Nash equilibrium. In the figures the opinion of a Media always oscillates up and down around the Gossiper opinions, because in the Gossiper-Media model the optimal strategy of a Media is not unique (every point within d_m of the Gossiper opinion is an optimal point for the Media).
4. Summary
The present invention proposes WoLS-CALA, an independently learning multi-agent reinforcement learning algorithm for continuous action spaces, and demonstrates both by theoretical proof and by experimental verification that the algorithm can learn a Nash equilibrium. The algorithm is then applied to the study of opinion evolution in a network environment. The individuals in a social network are divided into two classes, Gossipers and Media, modeled separately: the Gossiper class represents ordinary people, and the Media class, modeled with the WoLS-CALA algorithm, represents individuals such as social media whose purpose is to attract public attention. By modeling the two kinds of agents separately, the present invention explores the influence of competition among different numbers of Media on Gossiper opinion. Theory and experiment both show that competition among Media accelerates the convergence of public opinion.
The specific embodiments described above are preferred embodiments of the present invention and do not limit its scope of practice; the scope of the present invention includes, but is not limited to, these embodiments, and all equivalent changes made according to the present invention fall within the scope of the present invention.
The references corresponding to the labels cited in the present invention are as follows:
[1] Pazis J, Lagoudakis M G. Binary Action Search for Learning Continuous-action Control Policies[C]. In Proceedings of the 26th Annual International Conference on Machine Learning, New York, NY, USA, 2009: 793–800.
[2]Pazis J,Lagoudakis M G.Reinforcement learning in multidimensional continuous action spaces[C].In IEEE Symposiumon Adaptive Dynamic Programming& Reinforcement Learning,2011:97–104.
[3]Sutton R S,Maei H R,Precup D,et al.Fast Gradient-descent Methods for Temporal-difference Learning with Linear Function Approximation[C].In Proceedings of the 26th Annual International Conference on Machine Learning, 2009:993–1000.
[4]Pazis J,Parr R.Generalized Value Functions for Large Action Sets [C].In International Conference on Machine Learning,ICML 2011,Bellevue, Washington,USA,2011:1185–1192.
[5]Lillicrap T P,Hunt J J,Pritzel A,et al.Continuous control with deep reinforcement learning[J].Computer Science,2015,8(6):A187.
[6] Konda V R. Actor-critic algorithms[J]. SIAM Journal on Control and Optimization, 2003, 42(4).
[7]Thathachar M A L,Sastry P S.Networks of Learning Automata: Techniques for Online Stochastic Optimization[J].Kluwer Academic Publishers, 2004.
[8]Peters J,Schaal S.2008Special Issue:Reinforcement Learning of Motor Skills with Policy Gradients[J].Neural Netw.,2008,21(4).
[9]van Hasselt H.Reinforcement Learning in Continuous State and Action Spaces[M].In Reinforcement Learning:State-of-the-Art.Berlin, Heidelberg:Springer Berlin Heidelberg,2012:207–251.
[10]Sallans B,Hinton G E.Reinforcement Learning with Factored States and Actions [J].J.Mach.Learn.Res.,2004,5:1063–1088.
[11]Lazaric A,Restelli M,Bonarini A.Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods[C].In Conference on Neural Information Processing Systems,Vancouver,British Columbia,Canada,2007:833–840.
[12]Quattrociocchi W,Caldarelli G,Scala A.Opinion dynamics on interacting networks:media competition and social influence[J].Scientific Reports,2014,4(21):4938–4938.
[13]Yang H X,Huang L.Opinion percolation in structured population[J] .Computer Physics Communications,2015,192(2):124–129.
[14]Chao Y,Tan G,Lv H,et al.Modelling Adaptive Learning Behaviours for Consensus Formation in Human Societies[J].Scientific Reports,2016,6: 27626.
[15]De Vylder B.The evolution of conventions in multi-agent systems [J].Unpublished doctoral dissertation,Vrije Universiteit Brussel,Brussels, 2007.
[16]Holley R A,Liggett T M.Ergodic Theorems for Weakly Interacting Infinite Systems and the Voter Model[J].Annals of Probability,1975,3(4):643– 663.
[17] Nowak A, Szamrej J, Latan thatch B.From private attitude to public opinion:A dynamic theory of social impact.[J].Psychological Review,1990,97 (3):362–376.
[18]Tsang A,Larson K.Opinion dynamics of skeptical agents[C].In Proceedings of the 2014international conference on Autonomous agents and multi-agent systems,2014:277–284.
[19]Ghaderi J,Srikant R.Opinion dynamics in social networks with stubborn agents:Equilibrium and convergence rate[J].Automatica,2014,50(12): 3209–3215.
[20]Kimura M,Saito K,Ohara K,et al.Learning to Predict Opinion Share in Social Networks.[C].In Twenty-Fourth AAAI Conference on Artificial Intelligence,AAAI 2010,Atlanta,Georgia,Usa,July,2010.
[21]Liakos P,Papakonstantinopoulou K.On the Impact of Social Cost in Opinion Dynamics [C].In Tenth International AAAI Conference on Web and Social Media ICWSM,2016.
[22]Bond R M,Fariss C J,Jones J J,et al.A 61-million-person experiment in social influence and political mobilization[J].Nature,2012,489 (7415):295–8.
[23]Szolnoki A,Perc M.Information sharing promotes prosocial behaviour[J].New Journal of Physics,2013,15(15):1–5.
[24]Hofbauer J,Sigmund K.Evolutionary games and population dynamics [M].Cambridge;New York,NY:Cambridge University Press,1998.
[25]Tuyls K,Nowe A,Lenaerts T,et al.An Evolutionary Game Theoretic Perspective on Learning in Multi-Agent Systems[J].Synthese,2004,139(2):297– 330.
[26]Szabo B G.Fath G(2007)Evolutionary games on graphs[C].In Physics Reports,2010.
[27]Han T A,Santos F C.The role of intention recognition in the evolution of cooperative behavior[C].In International Joint Conference on Artificial Intelligence,2011:1684–1689.
[28]Santos F P,Santos F C,Pacheco J M.Social Norms of Cooperation in Small-Scale Societies[J].PLoS computational biology,2016,12(1):e1004709.
[29]Zhao Y,Zhang L,Tang M,et al.Bounded confidence opinion dynamics with opinion leaders and environmental noises[J].Computers and Operations Research,2016,74(C):205–213.
[30]Pujol J M,Delgado J,Sang,et al.The role of clustering on the emergence of efficient social conventions[C].In International Joint Conference on Artificial Intelligence,2005:965–970.
[31]Nori N,Bollegala D,Ishizuka M.Interest Prediction on Multinomial, Time-Evolving Social Graph.[C].In IJCAI 2011,Proceedings of the International Joint Conference on Artificial Intelligence,Barcelona,Catalonia,Spain,July, 2011:2507–2512.
[32]Fang H.Trust modeling for opinion evaluation by coping with subjectivity and dishonesty[C].In International Joint Conference on Artificial Intelligence,2013:3211–3212.
[33]Deffuant G,Neau D,Amblard F,et al.Mixing beliefs among interacting agents[J].Advances in Complex Systems,2011,3(1n04):87–98.
[34]De Jong S,Tuyls K,Verbeeck K.Artificial agents learning human fairness[C].In International Joint Conference on Autonomous Agents and Multiagent Systems,2008:863–870.
[35]BowlingM,Veloso.Multiagent learning using a variable learning rate[J].Artificial Intelligence,2002,136(2):215–250.
[36]Sutton R S,Barto A G.Reinforcement learning:an introduction[M] .Cambridge,Mass:MIT Press,1998.
[37]Abdallah S,Lesser V.A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics[J].J.Artif.Int.Res.,2008,33(1):521–549.
[38]Singh S P,Kearns M J,Mansour Y.Nash Convergence of Gradient Dynamics in General-Sum Games[J],2000:541–548.
[39]Zhang C,Lesser V R.Multi-agent learning with policy prediction [J],2010:927–934.
[40]Shilnikov L P,Shilnikov A L,Turaev D,et al.Methods of qualitative theory in nonlinear dynamics/[M].World Scientific,1998.
[41]Dittmer J C.Consensus formation under bounded confidence[J] .Nonlinear Analysis Theory Methods and Applications,2001,47(7):4615–4621.
[42]LORENZ J.CONTINUOUS OPINION DYNAMICS UNDER BOUNDED CONFIDENCE:A SURVEY[J].International Journal of Modern Physics C,2007,18(12):2007.
[43]Krawczyk M J,Malarz K,Korff R,et al.Communication and trust in the bounded confidence model[J].Computational Collective Intelligence.Technologies and Applications,2010,6421:90–99.
[44]Lasry J M,Lions P L.Mean field games[J].Japanese Journal of Mathematics,2007,2(1):229–260.
[45]WeisbuchG,DeffuantG,AmblardF,etal.Interacting Agents and Continuous Opinions Dynamics[M].Springer Berlin Heidelberg,2003.
[46]Hassani S.Dirac Delta Function[M].Springer New York,2000.
[47]DJ W,SH S.Collectivedynamics of’small-world’networks[C].In Nature,1998:440–442.

Claims (10)

1. A Nash equilibrium strategy on a continuous action space, characterized in that it comprises the following steps:
(1) set constants α_ub and α_us with α_ub > α_us, where the learning rates α_ub, α_us, α_Q, α_σ ∈ (0,1);
(2) initialize the parameters, which include the mean u_i of the expected action of agent i, the cumulative average strategy, the constant C, the variance σ_i and the cumulative average reward Q_i;
(3) repeat the following steps until the cumulative average strategy of the sampled actions of agent i converges:
(3.1) with a certain exploration rate, randomly select an action x_i according to the normal distribution N(u_i, σ_i);
(3.2) execute the action x_i, then obtain the reward r_i from the environment;
(3.3) if the reward r_i received after agent i executes action x_i is greater than the current cumulative average reward Q_i, the learning rate of u_i is α_ub, otherwise it is α_us; update u_i according to the selected learning rate;
(3.4) update the variance σ_i according to the learned u_i;
(3.5) if the reward r_i received after agent i executes action x_i is greater than the current cumulative average reward Q_i, the learning rate is α_ub, otherwise it is α_us; update Q_i according to the selected learning rate;
(3.6) update the cumulative average strategy according to the constant C and the action x_i;
(4) output the cumulative average strategy as the final action of agent i.
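For readability, the loop of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the exact update rules for u_i, σ_i and the cumulative average strategy are given in the description and drawings, so the CALA-style mean update, the σ decay schedule, the use of the averaging constant C, and the hypothetical reward callback `get_reward` below are all assumptions.

```python
import random

# Hedged sketch of the claim-1 WoLS-CALA loop (assumed parameter values).
ALPHA_UB, ALPHA_US = 0.1, 0.02   # "win" / "lose" learning rates, ALPHA_UB > ALPHA_US
ALPHA_SIGMA, SIGMA_L, C = 0.05, 0.01, 100.0  # sigma rate, sigma lower bound, constant C

def wols_cala(get_reward, steps=10_000):
    u = random.random()          # (2) mean of agent i's expected action
    sigma, Q = 0.5, 0.0          # (2) exploration variance, cumulative average reward
    avg_strategy, n = u, 0       # (2) cumulative average strategy
    for _ in range(steps):
        x = random.gauss(u, sigma)                # (3.1) sample action from N(u, sigma)
        r = get_reward(x)                         # (3.2) reward from the environment
        lr = ALPHA_UB if r > Q else ALPHA_US      # (3.3)/(3.5) win-or-learn-slow rate
        u += lr * (x - u) * (1.0 if r > Q else -1.0)       # assumed CALA-style mean update
        sigma = max(sigma * (1.0 - ALPHA_SIGMA), SIGMA_L)  # (3.4) shrink, floor at sigma_L
        Q += lr * (r - Q)                         # (3.5) running average of reward
        n += 1
        avg_strategy += (x - avg_strategy) / min(n, C)     # (3.6) averaging via constant C
    return avg_strategy                           # (4) final action of agent i
```

Under these assumptions, for example, `wols_cala(lambda x: -(x - 0.3) ** 2)` drifts toward an action near 0.3: actions whose reward beats the running average Q pull the mean u toward them at the faster "win" rate α_ub.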
2. The Nash equilibrium strategy on a continuous action space according to claim 1, characterized in that: in steps (3.3) and (3.5), the update step sizes of Q and u are synchronized, and in a neighborhood of u_i the mapping from u_i to Q_i can be linearized as Q_i = K·u_i + C, where K is the slope of the linearization.
3. The Nash equilibrium strategy on a continuous action space according to claim 2, characterized in that: given a positive number σ_L and a positive number K, the Nash equilibrium strategies of two agents on the continuous action space eventually converge to a Nash equilibrium, where σ_L is the lower bound of the variance σ.
4. A social network opinion evolution model based on the Nash equilibrium strategy on a continuous action space according to any one of claims 1–3, characterized in that: the model comprises two classes of agents, namely Gossiper agents, which simulate the general public in the social network, and Media agents, which simulate media outlets or public figures in the social network whose purpose is to attract the attention of the general public; the Media agents use the Nash equilibrium strategy on the continuous action space to compute the optimal idea to return, update their ideas accordingly, and broadcast them in the social network.
5. The social network opinion evolution model according to claim 4, characterized in that it comprises the following steps:
S1: initialize the idea of each Gossiper and each Media to a random value on the action space [0,1];
S2: in each interaction, every agent adjusts its own idea according to the following strategy, until no agent changes its idea any more:
S21: any Gossiper agent randomly selects, with a set probability, a neighbor in the Gossiper network, updates its idea according to the bounded confidence model (BCM), and selects a Media to follow;
S22: randomly sample a subset G' of the Gossiper network G and broadcast the Gossiper ideas in G' to all Media;
S23: any Media computes the optimal idea to return using the Nash equilibrium strategy on the continuous action space, and broadcasts the updated idea to the entire social network.
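One interaction round (steps S21–S23) might be transcribed as below. This is again a hedged sketch: the neighbor is drawn from the whole Gossiper population rather than from a real network topology, the sampling ratio for G' is arbitrary, `best_response` is an abstract stand-in for the claim-1 WoLS-CALA computation, and the threshold `D_G` and rate `ALPHA_G` anticipate claim 7 with placeholder values.

```python
import random

D_G, ALPHA_G = 0.2, 0.3   # assumed Gossiper threshold / learning rate (cf. claim 7)

def interaction_round(gossiper_ideas, media_ideas, best_response, sample_ratio=0.1):
    # S21: every Gossiper does a bounded-confidence update against one random neighbor
    for i in range(len(gossiper_ideas)):
        j = random.randrange(len(gossiper_ideas))
        if j != i and abs(gossiper_ideas[j] - gossiper_ideas[i]) < D_G:
            gossiper_ideas[i] += ALPHA_G * (gossiper_ideas[j] - gossiper_ideas[i])
    # S22: broadcast a random subset G' of the Gossiper ideas to all Media
    k = max(1, int(sample_ratio * len(gossiper_ideas)))
    g_prime = random.sample(gossiper_ideas, k)
    # S23: each Media computes its best-response idea (the claim-1 strategy in the
    #      patent; best_response is a stand-in here) and broadcasts it
    for m in range(len(media_ideas)):
        media_ideas[m] = best_response(g_prime, media_ideas, m)
```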
6. The social network opinion evolution model according to claim 5, characterized in that, in step S21, the Gossiper agent operates as follows:
A1: idea initialization: x_i^τ = x_i^(τ−1);
A2: idea update: when the difference between the agent's idea and the idea of the selected agent is smaller than a given threshold, the agent updates its idea;
A3: the agent compares the differences between its own idea and the ideas of the Media, and follows one Media selected according to probability.
7. The social network opinion evolution model according to claim 6, characterized in that, in step A2: if the currently selected neighbor is Gossiper j and |x_j^τ − x_i^τ| < d_g, then x_i^τ ← x_i^τ + α_g(x_j^τ − x_i^τ); if the currently selected neighbor is Media k and |x_k^τ − x_i^τ| < d_m, then x_i^τ ← x_i^τ + α_m(x_k^τ − x_i^τ); where d_g and d_m are the idea thresholds set for the two types of neighbors, and α_g and α_m are the learning rates for the two types of neighbors.
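The two-branch update of claim 7 transcribes directly into code; the numeric thresholds and learning rates below are placeholders, not values from the patent.

```python
def update_idea(x_i, x_nbr, nbr_is_media,
                d_g=0.2, d_m=0.3, a_g=0.3, a_m=0.3):  # placeholder parameter values
    """Claim-7 update: move toward the neighbor's idea only when it lies
    within the bounded-confidence threshold for that neighbor type."""
    d, a = (d_m, a_m) if nbr_is_media else (d_g, a_g)
    if abs(x_nbr - x_i) < d:
        x_i += a * (x_nbr - x_i)
    return x_i
```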
8. The social network opinion evolution model according to claim 7, characterized in that, in step A3, Media k is followed with probability P_ik, where P_ik is determined by the differences between the Gossiper's own idea and the ideas of the respective Media.
9. The social network opinion evolution model according to claim 8, characterized in that, in step S23, the current reward r_j of Media j is defined as the proportion, among all Gossipers in G', of the Gossipers in G' that choose to follow j, where P_ij denotes the probability that Gossiper i follows Media j.
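The reward of claim 9 is simply a follower share within the sampled subset G'. A one-line sketch, with `followed_by` as a hypothetical mapping from each sampled Gossiper id to the Media it currently follows:

```python
def media_reward(followed_by, j, g_prime):
    """Claim-9 reward: share of sampled Gossipers in G' that follow Media j."""
    return sum(1 for g in g_prime if followed_by[g] == j) / len(g_prime)
```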
10. The social network opinion evolution model according to any one of claims 4–9, characterized in that: in the presence of Media, the opinions of the Gossiper agents converge to agreement more quickly; and in an environment where multiple Media compete, the dynamics of each Gossiper agent's idea is a weighted average of the influences of the individual Media.
CN201880001570.9A 2018-08-01 2018-08-01 Social network public opinion evolution method Active CN109496305B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/098101 WO2020024170A1 (en) 2018-08-01 2018-08-01 Nash equilibrium strategy and social network consensus evolution model in continuous action space

Publications (2)

Publication Number Publication Date
CN109496305A (en) 2019-03-19
CN109496305B CN109496305B (en) 2022-05-13

Family

ID=65713809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880001570.9A Active CN109496305B (en) 2018-08-01 2018-08-01 Social network public opinion evolution method

Country Status (2)

Country Link
CN (1) CN109496305B (en)
WO (1) WO2020024170A1 (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801299B (en) * 2021-01-26 2023-12-01 西安电子科技大学 Method, system and application for constructing game model of evolution of reward and punishment mechanism
CN113572548B (en) * 2021-06-18 2023-07-07 南京理工大学 Unmanned plane network cooperative fast frequency hopping method based on multi-agent reinforcement learning
CN113645589B (en) * 2021-07-09 2024-05-17 北京邮电大学 Unmanned aerial vehicle cluster route calculation method based on inverse fact policy gradient
CN113568954B (en) * 2021-08-02 2024-03-19 湖北工业大学 Parameter optimization method and system for preprocessing stage of network flow prediction data
CN113778619B (en) * 2021-08-12 2024-05-14 鹏城实验室 Multi-agent state control method, device and terminal for multi-cluster game
CN113687657B (en) * 2021-08-26 2023-07-14 鲁东大学 Method and storage medium for multi-agent formation dynamic path planning
CN114021456A (en) * 2021-11-05 2022-02-08 沈阳飞机设计研究所扬州协同创新研究院有限公司 Intelligent agent invalid behavior switching inhibition method based on reinforcement learning
CN114065916A (en) * 2021-11-11 2022-02-18 西安工业大学 DQN-based agent training method
CN114845359A (en) * 2022-03-14 2022-08-02 中国人民解放军军事科学院战争研究院 Multi-intelligent heterogeneous network selection method based on Nash Q-Learning
CN115515101A (en) * 2022-09-23 2022-12-23 西北工业大学 Decoupling Q learning intelligent codebook selection method for SCMA-V2X system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090055268A1 (en) * 2007-08-20 2009-02-26 Ads-Vantage, Ltd. System and method for auctioning targeted advertisement placement for video audiences
CN103490413A (en) * 2013-09-27 2014-01-01 华南理工大学 Intelligent electricity generation control method based on intelligent body equalization algorithm
CN106358308A (en) * 2015-07-14 2017-01-25 北京化工大学 Resource allocation method for reinforcement learning in ultra-dense network
CN106899026A (en) * 2017-03-24 2017-06-27 三峡大学 Intelligent power generation control method based on the multiple agent intensified learning with time warp thought
CN107135224A (en) * 2017-05-12 2017-09-05 中国人民解放军信息工程大学 Cyber-defence strategy choosing method and its device based on Markov evolutionary Games
US20180033081A1 (en) * 2016-07-27 2018-02-01 Aristotle P.C. Karas Auction management system and method
CN107832882A (en) * 2017-11-03 2018-03-23 上海交通大学 A kind of taxi based on markov decision process seeks objective policy recommendation method
CN107979540A (en) * 2017-10-13 2018-05-01 北京邮电大学 A kind of load-balancing method and system of SDN network multi-controller
CN109511277A (en) * 2018-08-01 2019-03-22 东莞理工学院 The cooperative method and system of multimode Continuous action space

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106936855B (en) * 2017-05-12 2020-01-10 中国人民解放军信息工程大学 Network security defense decision-making determination method and device based on attack and defense differential game
CN108092307A (en) * 2017-12-15 2018-05-29 三峡大学 Layered distribution type intelligent power generation control method based on virtual wolf pack strategy


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Song Yujian et al.: "Multi-agent cuckoo algorithm for balanced optimization of network plan resources", Computer Engineering and Applications *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362754A (en) * 2019-06-11 2019-10-22 浙江大学 The method that social network information source is detected on line based on intensified learning
CN110362754B (en) * 2019-06-11 2022-04-29 浙江大学 Online social network information source detection method based on reinforcement learning
CN111445291A (en) * 2020-04-01 2020-07-24 电子科技大学 Method for providing dynamic decision for social network influence maximization problem
CN111445291B (en) * 2020-04-01 2022-05-13 电子科技大学 Method for providing dynamic decision for social network influence maximization problem
CN112862175A (en) * 2021-02-01 2021-05-28 天津天大求实电力新技术股份有限公司 Local optimization control method and device based on P2P power transaction

Also Published As

Publication number Publication date
WO2020024170A1 (en) 2020-02-06
CN109496305B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN109496305A (en) Nash equilibrium strategy on continuous action space and social network public opinion evolution model
Wang et al. R-MADDPG for partially observable environments and limited communication
Russell et al. Q-decomposition for reinforcement learning agents
Busoniu et al. A comprehensive survey of multiagent reinforcement learning
Zhang et al. Collective behavior coordination with predictive mechanisms
Abed-Alguni et al. A comparison study of cooperative Q-learning algorithms for independent learners
WO2019127945A1 (en) Structured neural network-based imaging task schedulability prediction method
Simões et al. Multi-agent actor centralized-critic with communication
Xu et al. Learning multi-agent coordination for enhancing target coverage in directional sensor networks
Mehta State-of-the-art reinforcement learning algorithms
CN114510012A (en) Unmanned cluster evolution system and method based on meta-action sequence reinforcement learning
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
Wang et al. Distributed reinforcement learning for robot teams: A review
Liu et al. Efficient exploration for multi-agent reinforcement learning via transferable successor features
Yun et al. Multi-agent deep reinforcement learning using attentive graph neural architectures for real-time strategy games
Juang et al. A self-generating fuzzy system with ant and particle swarm cooperative optimization
Choudhury et al. Scalable Online planning for multi-agent MDPs
Han et al. Multi-uav automatic dynamic obstacle avoidance with experience-shared a2c
Zhou et al. Strategic interaction multi-agent deep reinforcement learning
Abed-Alguni Cooperative reinforcement learning for independent learners
Dias et al. Quantum-inspired neuro coevolution model applied to coordination problems
Lima et al. Formal analysis in a cellular automata ant model using swarm intelligence in robotics foraging task
Subramanian et al. Efficient exploration in monte carlo tree search using human action abstractions
Zhan et al. Dueling network architecture for multi-agent deep deterministic policy gradient
Martín H et al. Learning autonomous helicopter flight with evolutionary reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant