CN104008203B - A user interest mining method incorporating ontology context - Google Patents
- Publication number
- CN104008203B, CN201410269562.6A
- Authority
- CN
- China
- Legal status
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A user interest mining method incorporating ontology context. First, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model based on a second-order hidden Markov model is built. Second, the contextual information that can reflect user interest is analyzed, including the user's individual information, environment information and device information. Third, a user interest model based on a context ontology is built, with fuzzy logic used to measure and express the degree of interest associated with the user's individual information. Finally, a user interest drift detection method based on a hidden semi-Markov model builds a model from the user's browse paths and uses the mean of the sequences' average log-likelihood as the threshold for judging whether interest has drifted. The invention constructs an interest model that meets user needs in order to provide personalized recommendation services, is an effective means of improving user satisfaction, and has good application value.
Description
Technical field
The present invention relates to the fields of data mining and ontology, and in particular to a user interest mining method that is especially suited to personalized information services for users.
Background art
Network applications are becoming increasingly complex and data volumes keep growing, so tasks such as e-commerce and website design have become more complicated and burdensome. On the basis of a user's existing information, the web page structure needs to be adjusted dynamically according to behaviors such as access interest, access time and access frequency, so that e-commerce can be targeted at user needs and personalized services can be provided. Personalized information service on the Internet organizes and adjusts information automatically according to each user's characteristics, interests and preferences, offering a fast, efficient and accurate way of obtaining information that addresses problems such as users losing their way in the available information. The open problems are therefore how to understand users' information needs accurately amid rapidly expanding information, how to build a user model that characterizes a network user's features, interests, goals and behavior preferences and predicts user behavior, and how to use that model to provide better personalized services. At the same time, detecting user interest drift promptly and accurately, and building a dynamically updated user interest model that serves the individual information needs of different users, has become a key issue for personalized information services.
Summary of the invention
Existing data mining approaches produce interest models that cannot meet user needs and therefore cannot support personalized recommendation services. To overcome this deficiency, the present invention provides a user interest mining method incorporating ontology context, which builds an interest model that meets user needs in order to provide personalized recommendation services and is an effective means of improving user satisfaction.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A user interest mining method incorporating ontology context, the method comprising the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model (HMM):
First, the data that can reflect user interest are collected as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and saved in a set format for later mining of user interest.
Second, user interest features are extracted with a second-order hidden Markov model, in a training part and an extraction part.
The training part preprocesses the feature information of user interest in order, forming a text document; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues (delimiters, spaces, line feeds, colons); finally, the following parameters of the second-order HMM are computed as shown in the formulas:
① Initial probability distribution vector
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N} \mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$
where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N} \mathrm{Init}(j)$ is the total number of state sequences over all starting states;
② State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$, and then moves to $S_k$ at time $t+1$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
③ Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
The extraction part comprises two steps: (a) the text whose features are to be extracted is preprocessed; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues (delimiters, spaces, line feeds, colons); (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied: the established HMM performs user interest feature extraction, taking the processed observation sequence $O = O_1 O_2 \ldots O_T$ as model input and finding the state label sequence with the maximum probability; the extracted user features are the observed text labeled with the target state labels;
2) contextual information of analysis reflection user interest:Pass through the search to user, navigation patterns and purchaser record information
Analysis, derive the true interest of user in a period of time;
3) the user interest ontology model structure of situation is incorporated:First by region, gender, age, marriage, education background and receipts
Enter the key of several influence user interests as contextual factor index, and combine the historical purchase information and user behavior of user
Feature carries out Fuzzy Processing to obtain its interest level;Then the method for expressing of body situation is used, passes through more granularity divisions, structure
Build user interest ontology model;
4) User interest drift detection based on a hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: a) the sequence of pages on the user's browse path; b) the time interval between arriving at one page and arriving at the next. The set of all states is $S = \{S_1, S_2, \ldots, S_N\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_N\}$, and time intervals are taken from the set $I = \{1, 2, \ldots\}$. For a given browsing behavior of the user, the number of links on the browse path is a random variable, and the number of observations emitted in a given state takes values in the set $\{1, \ldots, D\}$. The user's browse path sequence is expressed as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ as the user jumps from one page to the next. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with a time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$, where $p_i(d)$ is the probability that $d \in \{1, \ldots, D\}$ observations are emitted in a given state $i$; it is the state duration probability matrix of the hidden semi-Markov model and satisfies $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
An important interest behavior record of a user is defined as Uinterest = {user, background, history, behavior, timestamp, content}, where user identifies the user, for example by ID; background represents the user's specific contextual factors; history represents the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp represents the time at which the user behavior was performed; and content represents the interest topic content;
In a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, which expresses the interest weight as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the interest probability of user u in that concept over all accesses to $q_j$. Let the set of access nodes contained in access transaction $at_i$ be $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; then $Q_{i,j}$ denotes the set of all access nodes in $at_i$ that come after $q_j$, and a further set denotes the nodes in $Q_{i,j}$ that contain that concept:
The observation probability distribution of u at $q_j$ is defined as:
User u may then find a state sequence among the possible access sequences according to this distribution; a hidden semi-Markov model of user interest behavior is established so that this sequence has the maximum access probability:
To detect user interest drift, the observation sequences for the HSMM are collected first, and the data are preprocessed before the model is trained. Once the model parameters are determined, the HSMM algorithm is called to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. When the user's interest value lies within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
Further, in step 1), there are two ways of obtaining personalized user information: (a) collection through online surveys in which users take part themselves; (b) obtaining the user's interest information by tracking user behavior, using a feature extraction method for user behavior data.
Further, in step 2), the user's behavior information includes the user's search keywords, purchase history and browsing history.
Further, in step 3), when the user context ontology is built from the user's interest context information, the user context is divided into the user's individual context, environment context and device context. The ontology takes the form of a hierarchical concept tree in which each node represents one element of the user context; that is, a context ontology tree is built.
The technical concept of the present invention is as follows: for the field of user-oriented personalized services, and in view of the concept drift and problem scenarios involved, a user interest mining method incorporating ontology context is proposed; it constructs an interest model that meets user needs in order to provide personalized recommendation services and is an effective means of improving user satisfaction.
On this basis, the present invention takes personalized user information services as its research object, introduces data mining and ontology, gives full consideration to the individual characteristics of users, and proposes a user interest mining method incorporating ontology context that effectively meets users' needs for personalized services.
First, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model based on a second-order hidden Markov model (Second-Order Hidden Markov Model) is built. Second, the contextual information that can reflect user interest is analyzed, including the user's individual information, environment information and device information. Third, a user interest model based on a context ontology is built, with fuzzy logic used to measure and express the degree of interest associated with the user's individual information. Finally, a user interest drift detection method based on a hidden semi-Markov model (Hidden Semi-Markov Model, HSMM) builds a model from the user's browse paths and uses the mean of the sequences' average log-likelihood as the threshold for judging whether interest has drifted.
The beneficial effects of the present invention are as follows: the invention constructs an interest model that meets user needs in order to provide personalized recommendation services, is an effective means of improving user satisfaction, and has good application value.
Brief description of the drawings
Fig. 1 is the algorithm flowchart of interest feature extraction based on the second-order HMM.
Fig. 2 is the construction flow of the user context ontology.
Fig. 3 is the block diagram of interest drift detection.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, Fig. 2 and Fig. 3, a user interest mining method incorporating ontology context comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model: Web information extraction (Web Information Extraction) belongs to Web content mining; it extracts data from semi-structured Web documents and is an information extraction approach that uses the Web as its information source. This step covers the collection of user data and the construction of the user interest feature extraction model.
To build the user interest model, the data that can reflect user interest must first be collected. In general a great deal of user data is available, including registration information, log information, page text content, website topology, user behavior data and page hyperlink information. These data can be obtained from sources such as the client, the server and the proxy server; once obtained, the source data can be preprocessed and saved in a suitable format for later mining of user interest. In summary, there are two main ways of obtaining personalized user information: (a) collection through online surveys in which users take part themselves. This method obtains the user's interests and information needs directly, but depends on the user's active cooperation; (b) obtaining the user's interest information by tracking user behavior. Data of the first kind, such as registration information, are supplied directly by the user through forms and passed to the back-end database, so extracting interest features from them is comparatively easy. Data from which interest is inferred by tracking the user's implicit behavior cannot be obtained directly, so the method described here mainly uses feature extraction from user behavior data.
Second, user interest feature extraction belongs to text information extraction. Information extraction has become an important direction in natural language processing, and its theoretical research continues to develop. Current information extraction models fall into three main classes: dictionary-based models; rule-based models, such as ontologies; and statistical models, such as the hidden Markov model (HMM). Because the HMM has a statistical foundation well suited to natural language processing, and offers strong robustness, high extraction precision, ease of construction and good adaptability, it has attracted growing interest from researchers. Here a second-order hidden Markov model is used to extract user interest features; the flowchart is shown in Fig. 1. It has two main parts, a training part and an extraction part.
The training part preprocesses some feature information of user interest in order, forming a text document; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues such as delimiters, spaces, line feeds and colons; finally, the following parameters of the second-order HMM are computed as shown in the formulas:
① Initial probability distribution vector
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N} \mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$
where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N} \mathrm{Init}(j)$ is the total number of state sequences over all starting states.
② State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$, and then moves to $S_k$ at time $t+1$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.
③ Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.
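The count-ratio estimates described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `estimate_second_order_hmm`, the input layout (a list of labeled sequences of `(state, observation)` pairs) and the dictionary keys `pi`, `a2`, `a3`, `b1`, `b2` are all assumptions made for the example, and no smoothing of unseen events is applied.

```python
from collections import Counter


def estimate_second_order_hmm(sequences):
    """Estimate second-order HMM parameters by relative frequency.

    sequences: list of labeled sequences, each a list of (state, observation).
    Returns a dict of probability tables keyed by tuples (hypothetical layout).
    """
    init = Counter()  # Init(i): sequences starting in state i
    c2 = Counter()    # C_ij: transitions i -> j
    c3 = Counter()    # C_ijk: transitions (i, j) -> k
    e1 = Counter()    # E_j(O_k): emissions of O_k in state j
    e2 = Counter()    # E_ij(O_k): emissions of O_k in state j after state i

    for seq in sequences:
        if not seq:
            continue
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for t, (s, o) in enumerate(seq):
            e1[(s, o)] += 1
            if t >= 1:
                c2[(states[t - 1], s)] += 1
                e2[(states[t - 1], s, o)] += 1
            if t >= 2:
                c3[(states[t - 2], states[t - 1], s)] += 1

    def normalise(counts, context_len):
        # Divide each count by the total for its conditioning context.
        totals = Counter()
        for key, n in counts.items():
            totals[key[:context_len]] += n
        return {key: n / totals[key[:context_len]] for key, n in counts.items()}

    total_init = sum(init.values())
    return {
        "pi": {s: n / total_init for s, n in init.items()},
        "a2": normalise(c2, 1),
        "a3": normalise(c3, 2),
        "b1": normalise(e1, 1),
        "b2": normalise(e2, 2),
    }


# Toy labeled data: states "A"/"B", observations "x"/"y".
toy = [
    [("A", "x"), ("B", "y"), ("A", "x")],
    [("A", "y"), ("A", "x")],
]
model = estimate_second_order_hmm(toy)
```

In practice some form of smoothing would probably be needed, because any transition or emission that never occurs in the training sample receives no entry at all here.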
The extraction part comprises two steps: (a) the text whose features are to be extracted is preprocessed; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues such as delimiters, spaces, line feeds and colons; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied. The established HMM performs user interest feature extraction: the processed observation sequence $O = O_1 O_2 \ldots O_T$ is taken as model input and the state label sequence with the maximum probability is found; the extracted user features are the observed text labeled with the target state labels.
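The two extraction steps can be illustrated with a short sketch: a regular-expression segmenter for step (a) and a second-order Viterbi decoder for step (b). Everything named here is an assumption for illustration: the `segment` and `viterbi_second_order` functions, the choice of separator characters, the `model` dictionary layout (tables `pi`, `a2`, `a3`, `b1`, `b2` keyed by tuples), and the probability floor used in place of proper smoothing.

```python
import math
import re


def segment(text):
    """Split text on whitespace, line feeds, colons and a few assumed delimiters."""
    return [tok for tok in re.split(r"[\s:;,|]+", text) if tok]


def viterbi_second_order(obs, states, model, floor=1e-12):
    """Most probable state sequence under a second-order HMM (log domain).

    model tables (hypothetical layout): pi[i], a2[(i, j)], a3[(i, j, k)],
    b1[(j, o)], b2[(i, j, o)]. Missing entries fall back to `floor`.
    """
    def lp(table, key):
        return math.log(max(model[table].get(key, 0.0), floor))

    if not obs:
        return []
    first = {i: lp("pi", i) + lp("b1", (i, obs[0])) for i in states}
    if len(obs) == 1:
        return [max(first, key=first.get)]

    # delta[(i, j)]: best log score of a path ending in states (i, j).
    delta = {(i, j): first[i] + lp("a2", (i, j)) + lp("b2", (i, j, obs[1]))
             for i in states for j in states}
    back = []
    for o in obs[2:]:
        new_delta, pointers = {}, {}
        for j in states:
            for k in states:
                best_i = max(states,
                             key=lambda i: delta[(i, j)] + lp("a3", (i, j, k)))
                new_delta[(j, k)] = (delta[(best_i, j)] + lp("a3", (best_i, j, k))
                                     + lp("b2", (j, k, o)))
                pointers[(j, k)] = best_i
        back.append(pointers)
        delta = new_delta

    # Backtrack from the best final state pair.
    j, k = max(delta, key=delta.get)
    path = [j, k]
    for pointers in reversed(back):
        path.insert(0, pointers[(path[0], path[1])])
    return path
```

The tokens whose decoded label equals the target state would then be taken as the extracted interest features.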
6) contextual information of analysis reflection user interest:The interest characteristics of the network user is mainly by related to user interest
Internal factor and external factor influence.Internal factor has gender, age, occupation, personality, education, income etc., external
Factor then includes culture background, social environment, home background etc., and inherent and external many factors result in network
The generation of user's difference behavior.Just because of this reason so that different users is there are many difference, to the interest of commodity
Degree and deviation are also different.
The interest of user can usually be reflected in the behavior of itself, when they are interested in whatsit to produce
Certain tendentiousness, demand and the interest of user can be recorded in their behavioural information, therefore can be by user's
The analysis of the information such as search, navigation patterns and purchaser record, derives the true interest of user in a period of time.Here, user
Behavioural information mainly include the following aspects:User's search key, user's history purchaser record, user's history browse row
For etc..
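One simple way to turn these three kinds of behavior information into an interest estimate for a time window is a weighted count per topic. The sketch below is only an illustration under stated assumptions: the event tuple layout `(kind, topic, when)`, the `WEIGHTS` values and the 30-day window are hypothetical and do not come from the patent.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed relative weights of each behavior type (not given in the source).
WEIGHTS = {"search": 1.0, "browse": 0.5, "purchase": 3.0}


def interest_profile(events, now, window_days=30):
    """Return normalised interest scores per topic for events inside the window.

    events: iterable of (kind, topic, when) with kind in WEIGHTS.
    """
    cutoff = now - timedelta(days=window_days)
    scores = Counter()
    for kind, topic, when in events:
        if when >= cutoff:
            scores[topic] += WEIGHTS.get(kind, 0.0)
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()} if total else {}


now = datetime(2014, 6, 30)
events = [
    ("search", "camera", datetime(2014, 6, 20)),
    ("purchase", "camera", datetime(2014, 6, 25)),
    ("browse", "shoes", datetime(2014, 6, 28)),
    ("purchase", "books", datetime(2014, 1, 1)),  # outside the window
]
profile = interest_profile(events, now)
```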
3) Build the user interest ontology model incorporating context: first, region, gender, age, marital status, education background and income, several key factors that influence user interest, are taken as contextual factor indices, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the level of interest; then, using the ontology-based representation of context, the user interest ontology model is built through multi-granularity division. The flowchart for building the user context ontology model is shown in Fig. 2.
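The text does not say which membership functions are used for the fuzzy processing. As a hedged illustration, the sketch below fuzzifies the age factor with triangular membership functions and combines the memberships with per-group purchase rates into a single interest level. The function names, the age breakpoints in `AGE_SETS` and the weighted-average combination rule are all assumptions.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for the age factor (breakpoints are illustrative).
AGE_SETS = {"young": (10, 22, 35), "middle": (25, 42, 60), "senior": (50, 70, 95)}


def fuzzify_age(age):
    """Degree of membership of `age` in each fuzzy age group."""
    return {label: triangular(age, *abc) for label, abc in AGE_SETS.items()}


def interest_level(age, purchase_rate_by_group):
    """Membership-weighted average of each group's historical purchase rate."""
    mu = fuzzify_age(age)
    total = sum(mu.values())
    if total == 0:
        return 0.0
    return sum(mu[g] * purchase_rate_by_group.get(g, 0.0) for g in mu) / total
```

For example, `interest_level(30, {"young": 0.8, "middle": 0.3})` blends the two groups in proportion to how strongly a 30-year-old belongs to each.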
When the user context ontology is built from the user's interest context information, the user context is divided into the user's individual context, environment context and device context. The ontology usually takes the form of a hierarchical concept tree in which each node represents one element of the user context; that is, a context ontology tree is built.
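A hierarchical concept tree of this kind can be sketched with a small recursive data structure. The class name `ContextNode`, the node labels and the helper `build_user_context` are assumptions for illustration; the three top-level branches and the six individual factors follow the text, while the environment and device branches are left empty because the source does not enumerate their elements.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One element of the user context in the ontology tree."""
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf path of every leaf below this node."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_user_context():
    root = ContextNode("UserContext")
    individual = root.add("IndividualContext")
    for factor in ("Region", "Gender", "Age", "Marriage", "Education", "Income"):
        individual.add(factor)
    root.add("EnvironmentContext")  # sub-elements not listed in the source
    root.add("DeviceContext")       # sub-elements not listed in the source
    return root


tree = build_user_context()
leaf_paths = list(tree.paths())
```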
4) User interest drift detection based on a hidden semi-Markov model: a user's online shopping and browsing is a complex process influenced by many individual factors such as browsing purpose, cultural background and personal preferences. The user's interest is therefore considered in terms of background factors, user behavior and interest content together, and a hidden semi-Markov model (HSMM) is established to detect whether the user's interest has drifted.
Assuming that the user's browsing behavior satisfies the Markov property while browsing web pages, the following two observations are chosen to describe it: a) the sequence of pages on the user's browse path; b) the time interval between arriving at one page and arriving at the next. The set of all states is $S = \{S_1, S_2, \ldots, S_N\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_N\}$, and time intervals are taken from the set $I = \{1, 2, \ldots\}$. For a given browsing behavior of the user, the number of links on the browse path is a random variable, and the number of observations emitted in a given state takes values in the set $\{1, \ldots, D\}$. The user's browse path sequence is expressed as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ as the user jumps from one page to the next. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with a time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$, where $p_i(d)$ is the probability that $d \in \{1, \ldots, D\}$ observations are emitted in a given state $i$; it is the state duration probability matrix of the hidden semi-Markov model and satisfies $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$.
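The parameter set (π, A, B, P) and its normalisation constraints can be written down as a small container. This is a sketch under assumptions: the class name `HSMM`, the nested-dictionary layout and the `is_valid` helper are illustrative, and the toy state names and page labels are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class HSMM:
    """Parameters of a hidden semi-Markov model over browsing observations."""
    states: list
    pi: dict  # pi[i]: probability that the initial state is i
    A: dict   # A[i][j]: probability of a transition from i to j
    B: dict   # B[i][(v, q)]: probability of page v with time interval q in state i
    P: dict   # P[i][d]: probability that state i emits d observations

    def is_valid(self, tol=1e-9):
        """Check that every distribution sums to 1, as required in the text."""
        def sums_to_one(values):
            return math.isclose(sum(values), 1.0, abs_tol=tol)

        return (sums_to_one(self.pi.values())
                and all(sums_to_one(self.A[i].values()) for i in self.states)
                and all(sums_to_one(self.B[i].values()) for i in self.states)
                and all(sums_to_one(self.P[i].values()) for i in self.states))


# Toy two-state model; pages and intervals are made up for the example.
toy_hsmm = HSMM(
    states=["s1", "s2"],
    pi={"s1": 0.6, "s2": 0.4},
    A={"s1": {"s1": 0.0, "s2": 1.0}, "s2": {"s1": 1.0, "s2": 0.0}},
    B={"s1": {("home", 1): 0.5, ("item", 2): 0.5}, "s2": {("cart", 1): 1.0}},
    P={"s1": {1: 0.7, 2: 0.3}, "s2": {1: 1.0}},
)
```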
An important interest behavior record of a user is defined as Uinterest = {user, background, history, behavior, timestamp, content}, where user identifies the user, for example by ID; background represents the user's specific contextual factors; history represents the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp represents the time at which the user behavior was performed; and content represents the interest topic content.
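The six-field record maps naturally onto a typed structure. In the sketch below the field names follow the definition above, while the Python types chosen for each field and the sample values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UInterest:
    """One interest behavior record; field types are illustrative."""
    user: str            # user identifier, e.g. an ID
    background: dict     # contextual factors such as region or age
    history: tuple       # purchase history
    behavior: str        # result of the interest behavior operation
    timestamp: datetime  # when the behavior was performed
    content: str         # interest topic content


record = UInterest(
    user="u001",
    background={"region": "Zhejiang", "age": 30},
    history=("camera", "tripod"),
    behavior="search",
    timestamp=datetime(2014, 6, 17, 9, 30),
    content="lens",
)
```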
In a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, which expresses the interest weight as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the interest probability of user u in that concept over all accesses to $q_j$. Let the set of access nodes contained in access transaction $at_i$ be $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; then $Q_{i,j}$ denotes the set of all access nodes in $at_i$ that come after $q_j$, and a further set denotes the nodes in $Q_{i,j}$ that contain that concept:
The observation probability distribution of u at $q_j$ is defined as:
User u may then find a state sequence among the possible access sequences according to this distribution; a hidden semi-Markov model of user interest behavior is established so that this sequence has the maximum access probability:
To detect user interest drift, the observation sequences for the HSMM are collected first; here the user's browsing behavior data are used as the observation sequences, and the data are preprocessed before the model is trained. Once the model parameters are determined, the HSMM algorithm is called to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. When the user's interest value lies within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift. The implementation of drift detection is shown in Fig. 3.
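The threshold rule can be sketched independently of how the likelihood is computed. The code below assumes a caller-supplied `log_likelihood` function that scores a sequence under the trained model; the toy scorer shown is a stand-in unigram model over page names, not an HSMM. Treating "within the normal range" as "at or above the mean of the training sequences' average log-likelihood" is also an assumption, as are all function names.

```python
import math
from statistics import mean


def average_log_likelihood(log_likelihood, sequence):
    """Log-likelihood of the sequence divided by its length."""
    return log_likelihood(sequence) / len(sequence)


def drift_threshold(log_likelihood, training_sequences):
    """Mean of the per-sequence average log-likelihood over the training set."""
    return mean(average_log_likelihood(log_likelihood, s)
                for s in training_sequences)


def detect_drift(log_likelihood, training_sequences, new_sequence):
    """Return True if the new sequence indicates interest drift.

    A sequence in the normal range is appended to the training set so that
    the model can later be re-estimated on the enlarged data.
    """
    threshold = drift_threshold(log_likelihood, training_sequences)
    score = average_log_likelihood(log_likelihood, new_sequence)
    if score >= threshold:
        training_sequences.append(new_sequence)
        return False
    return True


# Stand-in scorer for the example only: independent page probabilities.
PAGE_PROB = {"home": 0.5, "item": 0.4, "cart": 0.1}


def toy_log_likelihood(sequence):
    return sum(math.log(PAGE_PROB.get(page, 0.01)) for page in sequence)


training = [["home", "item"], ["home", "item", "item"], ["home", "cart"]]
normal = detect_drift(toy_log_likelihood, training, ["home", "item"])
drifted = detect_drift(toy_log_likelihood, training, ["blog", "forum"])
```

Because the threshold is the mean itself, a fair share of ordinary sequences will fall below it; a real deployment might widen the normal range, for instance by subtracting a margin.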
Claims (4)
- 1. A user interest mining method incorporating ontology context, characterized in that the user interest mining method comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, the data that can reflect user interest are collected as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and saved in a set format for later mining of user interest;
Second, user interest features are extracted with a second-order hidden Markov model, in a training part and an extraction part;
The training part preprocesses the feature information of user interest in order, forming a text document; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues (delimiters, spaces, line feeds, colons); finally, the model parameters of the second-order HMM are computed according to formulas (1) to (5):
① Initial probability distribution vector
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N} \mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$
where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N} \mathrm{Init}(j)$ is the total number of state sequences over all starting states;
② State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$, and then moves to $S_k$ at time $t+1$; $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
③ Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
The extraction part comprises two steps: (a) the feature information of user interest is preprocessed in order, forming a text document; after the text is scanned, the labeled text sequence is converted into a labeled sequence of text segments using layout cues (delimiters, spaces, line feeds, colons); (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied: the established HMM performs user interest feature extraction, taking the processed observation sequence $O = O_1 O_2 \ldots O_T$ as model input and finding the state label sequence with the maximum probability; the extracted user features are the observed text labeled with the target state labels;
2) Analyze the contextual information that reflects user interest: the user's true interest over a period of time is derived by analyzing the user's search, browsing behavior and purchase record information;
3) Build the user interest ontology model incorporating context: first, region, gender, age, marital status, education background and income, several key factors that influence user interest, are taken as contextual factor indices, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the level of interest; then, using the ontology-based representation of context, the user interest ontology model is built through multi-granularity division;
4) User interest drift detection based on a hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: a) the sequence of pages on the user's browse path; b) the time interval between arriving at one page and arriving at the next; the set of all states is $S = \{S_1, S_2, \ldots, S_N\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_N\}$, and time intervals are taken from the set $I = \{1, 2, \ldots\}$; for a given browsing behavior of the user, the number of links on the browse path is a random variable, and the number of observations emitted in a given state takes values in the set $\{1, \ldots, D\}$; the user's browse path sequence is expressed as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ as the user jumps from one page to the next; the output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with a time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$; $P$ is the state duration probability matrix of the hidden semi-Markov model, $P = \{p_i(d)\}$, where $p_i(d)$ is the probability that $d \in \{1, \ldots, D\}$ observations are emitted in a given state $i$, and $\sum_d p_i(d) = 1$; the state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$; the initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
An important interest behavior record of a user is defined as Uinterest = {user, background, history, behavior, timestamp, content}, where user represents the user; background represents the user's specific contextual factors; history represents the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp represents the time at which the user behavior was performed; content represents the interest topic content;
In a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, expressed as follows:<mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> 
<mi>q</mi> <mi>i</mi> </msub> <mo>&RightArrow;</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>|</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <mfrac> <mrow> <msub> <mi>&theta;</mi> <mn>1</mn> </msub> <msub> <mi>W</mi> <mi>B</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>2</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>H</mi> <mi>I</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>3</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>I</mi> <mi>B</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>4</mn> </msub> <msub> <mi>W</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>&theta;</mi> <mn>1</mn> </msub> <msub> <mi>W</mi> <mi>B</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>2</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>H</mi> <mi>I</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>3</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>I</mi> <mi>B</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> 
<mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>4</mn> </msub> <msub> <mi>W</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>&NotEqual;</mo> <mi>j</mi> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mn>0</mn> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>=</mo> <mi>j</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>For each qjAnd its corresponding observed valueAll there are an observed value probability distributionThat is user user To qjAll access in, to observed valueInterest probabilities, can be byiThe set Q of included access node statei= {q1',…,q'f| q' ∈ IC } represent, then Qi,jRepresent atiIn in qjThe set of all access nodes afterwards,Represent Qi,jIn contain observed valueThe set of node:<mrow> <msub> <mi>Q</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <mo>{</mo> <msubsup> <mi>q</mi> <mrow> <mi>k</mi> <mo>+</mo> <mi>l</mi> </mrow> <mo>&prime;</mo> </msubsup> <mo>|</mo> <msubsup> <mi>q</mi> <mi>k</mi> <mo>&prime;</mo> </msubsup> <mo>=</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>,</mo> <mi>l</mi> <mo>=</mo> <mn>0</mn> <mo>,</mo> <mn>...</mn> <mo>,</mo> <mrow> <mo>(</mo> <mi>f</mi> <mo>-</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>}</mo> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>&Element;</mo> <msub> <mi>Q</mi> <mi>i</mi> </msub> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>N</mi> <mi>u</mi> <mi>l</mi> <mi>l</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow>By user user in qjUpper observed value probability distributionIt is defined as:Then user user according toBe possible in access sequence find a status switch, establish user's 
interest behavior Hidden semi-Markov model, make it have maximum access probability:<mrow> <msub> <mi>P</mi> <mrow> <mi>m</mi> <mi>a</mi> <mi>x</mi> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>&sigma;</mi> <mi>z</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mi>arg</mi> <mi>max</mi> <mo>&Pi;</mo> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>k</mi> </msub> <mo>&RightArrow;</mo> <msub> <mi>q</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mi>P</mi> <mrow> <mo>(</mo> <msubsup> <mi>&sigma;</mi> <mi>z</mi> <mi>k</mi> </msubsup> <mo>|</mo> <msub> <mi>q</mi> <mi>k</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow>During being detected to user interest drift, it is necessary first to gather the observation sequence in HSMM models, and Model pre-processes data before being trained, and after determining model parameter, then by calling HSMM algorithms, obtains user The constant probable value of interest, its probable value are calculated with the probable probability of average log, when the interest value of user is in normal model In enclosing, then user data is added to training data and concentrated, to update the parameter of hidden semi-Markov model;Otherwise, the user It will be considered as interest drift.
- 2. The user interest mining method incorporating ontology context as claimed in claim 1, characterized in that: in step 1), personalized user information is obtained in two ways: (a) collection through online surveys in which the users themselves participate; (b) acquisition of the user's interest information by tracking user behavior and applying a feature extraction method to the user behavior data.
- 3. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 2), the user's behavior information includes the user's search keywords, the user's purchase history, and the user's browsing history.
- 4. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 3), when the user ontology context is built from the user's interest context information, the user context is divided into user individual context, user environment context, and user device context; the ontology takes the form of a hierarchical concept tree, in which each node represents one element of the user context.
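As an illustration of the drift test in step 4) of claim 1, the following Python sketch shows one way the threshold comparison could look. It is not the claimed HSMM: a first-order Markov chain over page paths stands in for the hidden semi-Markov model, its transition probabilities are count ratios in the style of formula (2) with add-one smoothing added here as an assumption, and all function names (`fit_transitions`, `average_log_likelihood`, `drift_threshold`, `has_drifted`) are hypothetical. Treating every score below the training mean as drift is likewise an assumption; the claim only requires the value to lie within the normal range.

```python
import math
from collections import Counter
from statistics import mean


def fit_transitions(sequences, states):
    """Estimate first-order transition probabilities from labeled paths.

    Count ratio C_ij / sum_k C_ik, with add-one smoothing (an assumption
    made here so that unseen transitions keep a non-zero probability).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # consecutive page pairs
    probs = {}
    for i in states:
        total = sum(counts[(i, k)] for k in states) + len(states)
        for j in states:
            probs[(i, j)] = (counts[(i, j)] + 1) / total
    return probs


def average_log_likelihood(seq, probs):
    """Mean log-probability per transition; seq needs at least two pages."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs[step]) for step in steps) / len(steps)


def drift_threshold(training, probs):
    """Mean of the per-sequence average log-likelihoods of the training set."""
    return mean(average_log_likelihood(seq, probs) for seq in training)


def has_drifted(seq, probs, threshold):
    """Flag drift when the sequence scores below the threshold."""
    return average_log_likelihood(seq, probs) < threshold


if __name__ == "__main__":
    # Hypothetical page categories and browsing paths, for illustration only.
    states = ["home", "list", "item"]
    training = [
        ["home", "list", "item"],
        ["home", "list", "item", "list", "item"],
        ["home", "list", "item"],
    ]
    probs = fit_transitions(training, states)
    threshold = drift_threshold(training, probs)
    for path in (["home", "list", "item"], ["item", "home", "home", "item"]):
        print(path, has_drifted(path, probs, threshold))
```

In a fuller version, a path that is not flagged would be appended to the training set and the model refitted, mirroring the parameter update described in the claim.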
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410269562.6A CN104008203B (en) | 2014-06-17 | 2014-06-17 | A kind of Users' Interests Mining method for incorporating body situation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008203A CN104008203A (en) | 2014-08-27 |
CN104008203B CN104008203B (en) | 2018-04-17 |
Family
ID=51368860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410269562.6A Active CN104008203B (en) | 2014-06-17 | 2014-06-17 | A kind of Users' Interests Mining method for incorporating body situation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008203B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718471A (en) * | 2014-12-03 | 2016-06-29 | 中国科学院声学研究所 | User preference modeling method, system, and user preference evaluation method and system |
US9940362B2 (en) * | 2015-05-26 | 2018-04-10 | Google Llc | Predicting user needs for a particular context |
CN106055661B (en) * | 2016-06-02 | 2017-11-17 | 福州大学 | More interest resource recommendations based on more Markov chain models |
CN106776757B (en) * | 2016-11-15 | 2020-03-27 | 中国银行股份有限公司 | Method and device for indicating user to complete online banking operation |
CN106651517B (en) * | 2016-12-20 | 2021-11-30 | 广东技术师范大学 | Drug recommendation method based on hidden semi-Markov model |
CN109388661B (en) | 2017-08-02 | 2020-04-21 | 创新先进技术有限公司 | Model training method and device based on shared data |
CN107609063B (en) * | 2017-08-29 | 2020-03-17 | 重庆邮电大学 | Multi-label classified mobile phone application recommendation system and method thereof |
CN108134691B (en) * | 2017-12-18 | 2019-10-01 | Oppo广东移动通信有限公司 | Model building method, Internet resources preload method, apparatus, medium and terminal |
CN108038222B (en) * | 2017-12-22 | 2022-01-11 | 冶金自动化研究设计院 | System of entity-attribute framework for information system modeling and data access |
CN108596205B (en) * | 2018-03-20 | 2022-02-11 | 重庆邮电大学 | Microblog forwarding behavior prediction method based on region correlation factor and sparse representation |
CN108809955B (en) * | 2018-05-22 | 2019-05-24 | 南瑞集团有限公司 | A kind of power consumer behavior depth analysis method based on hidden Markov model |
CN109741146B (en) * | 2019-01-04 | 2022-06-28 | 平安科技(深圳)有限公司 | Product recommendation method, device, equipment and storage medium based on user behaviors |
CN109933741B (en) * | 2019-02-27 | 2020-06-23 | 京东数字科技控股有限公司 | Method, device and storage medium for extracting user network behavior characteristics |
CN110162553A (en) * | 2019-05-21 | 2019-08-23 | 南京邮电大学 | Users' Interests Mining method based on attention-RNN |
CN110297817A (en) * | 2019-06-25 | 2019-10-01 | 哈尔滨工业大学 | A method of the structure of knowledge is constructed based on personalized Bayes's knowledge tracing model |
CN110866542B (en) * | 2019-10-17 | 2021-11-19 | 西安交通大学 | Depth representation learning method based on feature controllable fusion |
CN114169869B (en) * | 2022-02-14 | 2022-06-07 | 北京大学 | Attention mechanism-based post recommendation method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102043793A (en) * | 2009-10-09 | 2011-05-04 | 卢健华 | Knowledge-service-oriented recommendation method |
CN103514289A (en) * | 2013-10-08 | 2014-01-15 | 北京百度网讯科技有限公司 | Method and device for building interest entity base |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090074108A (en) * | 2007-12-28 | 2009-07-06 | 주식회사 솔트룩스 | Method for recommending contents with context awareness |
Also Published As
Publication number | Publication date |
---|---|
CN104008203A (en) | 2014-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104008203B (en) | A kind of Users' Interests Mining method for incorporating body situation | |
CN106951422B (en) | Webpage training method and device, and search intention identification method and device | |
CN108717408B (en) | Sensitive word real-time monitoring method, electronic equipment, storage medium and system | |
CN104899273B (en) | A kind of Web Personalization method based on topic and relative entropy | |
CN102902821B (en) | The image high-level semantics mark of much-talked-about topic Network Based, search method and device | |
CN104572797A (en) | Individual service recommendation system and method based on topic model | |
CN107357793B (en) | Information recommendation method and device | |
CN112214685A (en) | Knowledge graph-based personalized recommendation method | |
CN105139237A (en) | Information push method and apparatus | |
CN105279146A (en) | Context-aware approach to detection of short irrelevant texts | |
CN103678431A (en) | Recommendation method based on standard labels and item grades | |
CN104731962A (en) | Method and system for friend recommendation based on similar associations in social network | |
CN105045931A (en) | Video recommendation method and system based on Web mining | |
WO2018112696A1 (en) | Content pushing method and content pushing system | |
CN105531701A (en) | Personalized trending image search suggestion | |
CN102270212A (en) | User interest feature extraction method based on hidden semi-Markov model | |
CN109063147A (en) | Online course forum content recommendation method and system based on text similarity | |
CN109271514A (en) | Generation method, classification method, device and the storage medium of short text disaggregated model | |
CN104077417A (en) | Figure tag recommendation method and system in social network | |
Lu | Semi-supervised microblog sentiment analysis using social relation and text similarity | |
CN112989208B (en) | Information recommendation method and device, electronic equipment and storage medium | |
CN112801425B (en) | Method and device for determining information click rate, computer equipment and storage medium | |
CN112966091A (en) | Knowledge graph recommendation system fusing entity information and heat | |
CN112288554B (en) | Commodity recommendation method and device, storage medium and electronic device | |
Ye et al. | A web services classification method based on GCN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |