CN104008203B - A user interest mining method incorporating ontology context - Google Patents

Publication number
CN104008203B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410269562.6A
Other languages
Chinese (zh)
Other versions
CN104008203A (en)
Inventor
陈庭贵
周广澜
许翀寰
封毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A user interest mining method incorporating ontology context. First, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model is built on a second-order hidden Markov model. Second, the contextual information that can reflect user interest is analyzed, including the user's individual information, environment information, and device information. Third, a user interest model based on a context ontology is built, with the degree of interest associated with the user's individual information measured and expressed using fuzzy logic. Finally, a user interest drift detection method based on a hidden semi-Markov model builds a model from the user's browsing paths and uses the mean of the sequences' average log-likelihoods as the threshold for judging whether interest has drifted. The invention builds an interest model that meets user needs in order to provide personalized recommendation services; it is an effective means of improving user satisfaction and has good practical value.

Description

A user interest mining method incorporating ontology context
Technical field
The present invention relates to the fields of data mining and ontology, and in particular to a user interest mining method that is especially suited to problems in personalized user information services.
Background
Network applications are becoming increasingly complex and data volumes keep growing, so tasks such as e-commerce and website design have become more complicated and burdensome. On the basis of existing user information, web page structure must be adjusted dynamically according to behaviors such as users' access interests, access times, and access frequency, so that e-commerce can be conducted in a targeted way that meets user needs and provides personalized services. Personalized information service on the Internet organizes and adjusts information automatically according to each user's characteristics and interests, offering a fast, efficient, and accurate way of acquiring information that addresses problems such as users becoming disoriented among information. Against this background, it has become a challenge to understand users' information needs accurately amid rapidly expanding information, to build user models that characterize the features, interests, goals, and behavioral preferences of network users, to predict user behavior, and thereby to provide better personalized services. At the same time, detecting user interest drift promptly and accurately, and building dynamically updated user interest models that meet different users' personalized information needs, has become a key problem for personalized information services.
Summary of the invention
To overcome the shortcoming that interest models produced by existing data mining approaches cannot meet user needs for personalized recommendation services, the present invention provides a user interest mining method incorporating ontology context. It builds an interest model that meets user needs in order to provide personalized recommendation services, and is an effective means of improving user satisfaction.
The technical solution adopted by the present invention to solve this technical problem is:
A user interest mining method incorporating ontology context, comprising the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, data that can reflect user interest must be collected, as follows: user source data are obtained from the client, the server, and the proxy server; once obtained, they are preprocessed and saved in a set format for later user interest mining.
Second, user interest features are extracted using a second-order hidden Markov model (HMM), which comprises a training part and an extraction part;
The training part preprocesses the sequence of user interest feature information into a text document; after the text is scanned, the marked text sequence is converted into a marked sequence of text blocks using layout cues such as delimiters, spaces, line breaks, and colons; finally, the following model parameters of the second-order HMM are computed, as given by the formulas:
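1. Initial probability distribution vector
$$\pi_i = \frac{Init(i)}{\sum_{j=1}^{N} Init(j)}, \quad 1 \le i \le N \tag{1}$$
where $Init(i)$ is the number of state sequences in the whole marked training sample that start with state $S_i$, and $\sum_{j=1}^{N} Init(j)$ is the total number of state sequences over all starting states;
2. State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of transitions to state $S_k$ at time $t+1$ given state $S_i$ at time $t-1$ and state $S_j$ at time $t$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
3. Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N, \; 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted given state $S_i$ at time $t-1$ and state $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;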
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: after the text is scanned, the marked text sequence is converted into a marked sequence of text blocks using layout cues such as delimiters, spaces, line breaks, and colons; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied to the established HMM to extract user interest features: the processed observation sequence $O = O_1 O_2 \ldots O_T$ is taken as the model input, the state label sequence with the highest probability is found, and the extracted user features are the observed text marked with the target state labels;
2) contextual information of analysis reflection user interest:Pass through the search to user, navigation patterns and purchaser record information Analysis, derive the true interest of user in a period of time;
3) the user interest ontology model structure of situation is incorporated:First by region, gender, age, marriage, education background and receipts Enter the key of several influence user interests as contextual factor index, and combine the historical purchase information and user behavior of user Feature carries out Fuzzy Processing to obtain its interest level;Then the method for expressing of body situation is used, passes through more granularity divisions, structure Build user interest ontology model;
4) User interest drift detection based on a hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: a) the browsing path sequence of web pages the user visits; b) the time interval between leaving one web page and reaching the next. The set of all states is $S = \{S_1, S_2, \ldots, S_N\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_N\}$, and time intervals take values in the set $I = \{1, 2, \ldots\}$. For a given browsing behavior of a user, the number of links on the browsing path is a random variable; the number of observations output in a given state takes values in the set $\{1, \ldots, D\}$. The user's browsing path sequence is represented as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ denotes the web content object the user browses and $\tau_t \in I$ denotes the time interval between $r_{t-1}$ and $r_t$ when the user jumps from one page to the next. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with a time interval $\tau_t = q \in I$ from the previous page, satisfying $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ gives the probability that the number of observations output in state $i$ is $d \in \{1, \ldots, D\}$; it is the state duration probability matrix of the hidden semi-Markov model and satisfies $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
An important interest behavior record of a user is defined as: $U_{interest}$ = {user, background, history, behavior, timestamp, content}, where user denotes the user, for example by an ID; background denotes the user's specific contextual factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; and content denotes the interest topic content;
In a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, which expresses the interest weight as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the probability that $u$ is interested in that concept over all accesses to $q_j$. Let the set of access nodes contained in $at_i$ be $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; then $Q_{i,j}$ denotes the set of all access nodes in $at_i$ that come after $q_j$, and a further set denotes the nodes in $Q_{i,j}$ that contain the concept:
The observation probability distribution of $u$ at $q_j$ is defined as:
Then, for user $u$, a state sequence is sought among the possible access sequences, establishing the hidden semi-Markov model of user interest behavior, such that the sequence has the maximum access probability:
To detect user interest drift, the observation sequence for the HSMM must first be collected, and the data are preprocessed before the model is trained. Once the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
Further, in step 1), there are two ways to obtain user personalized information: (a) collection through online surveys in which users participate themselves; (b) obtaining the user's interest information by tracking user behavior. The feature extraction method based on user behavior data is used.
Further, in step 2), the user's behavioral information includes the user's search keywords, historical purchase records, and historical browsing behavior.
Further, in step 3), when the user ontology context is built from the user's interest context information, the user context is divided into user individual context, user environment context, and user device context. The ontology takes the form of a hierarchical concept tree in which each node represents an element of the user context; that is, a context ontology tree is built.
The technical concept of the present invention is as follows: aimed at the field of user-oriented personalized services, and in view of the concept drift and problem scenarios involved, a user interest mining method incorporating ontology context is proposed. It builds an interest model that meets user needs in order to provide personalized recommendation services, and is an effective means of improving user satisfaction.
On this basis, the present invention takes personalized user information services as its research object, introduces data mining and ontology, gives full consideration to users' individual characteristics, and proposes a user interest mining method incorporating ontology context that effectively meets users' personalized service needs.
Specifically, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model is first built on a second-order hidden Markov model (Second-Order Hidden Markov Model). Second, the contextual information that can reflect user interest is analyzed, including the user's individual information, environment information, and device information. Third, a user interest model based on a context ontology is built, with the degree of interest associated with the user's individual information measured and expressed using fuzzy logic. Finally, a user interest drift detection method based on a hidden semi-Markov model (Hidden Semi-Markov Model, HSMM) builds a model from the user's browsing paths and uses the mean of the sequences' average log-likelihoods as the threshold for judging whether interest has drifted.
The beneficial effects of the present invention are: it builds an interest model that meets user needs in order to provide personalized recommendation services, is an effective means of improving user satisfaction, and has good practical value.
Brief description of the drawings
Fig. 1 is the flow chart of the interest feature extraction algorithm based on the second-order HMM.
Fig. 2 is the flow chart for building the user context ontology.
Fig. 3 is the block diagram of interest drift detection.
Embodiment
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, Fig. 2, and Fig. 3, a user interest mining method incorporating ontology context comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model: Web information extraction (Web Information Extraction) belongs to the category of Web content mining. It is an information extraction approach that takes the Web as its information source and extracts data from semi-structured Web documents. This step comprises collecting user data and building the user interest feature extraction model.
To build the user interest model, data that can reflect user interest must first be collected. Typically there is a great deal of user data, including user registration information, log information, page text content, website topology, user behavior data, and page hyperlink information. These data can be obtained from sources such as the client, the server, and the proxy server; once the source data are obtained, they are preprocessed and saved in an appropriate format for later user interest mining. In summary, there are two main ways to obtain user personalized information: (a) collection through online surveys in which users participate themselves, which captures the user's interests and information needs directly but requires the user's active cooperation; (b) obtaining the user's interest information by tracking user behavior. With the first approach, data such as registration information are supplied directly by the user through forms and passed to the back-end database, so extracting interest features from them is relatively easy. Data from which user interest is inferred by tracking implicit user behavior, however, cannot be obtained directly, so the feature extraction method based on user behavior data is mainly used here.
Second, extracting user interest features belongs to the category of text information extraction. Information extraction has become an important direction in natural language processing, and its theoretical study continues to develop. Current information extraction models fall mainly into three classes: dictionary-based models; rule-based models, such as ontologies; and statistical models, such as the hidden Markov model (HMM). Because the HMM rests on a statistical foundation well suited to natural language processing and offers strong robustness, high extraction precision, ease of construction, and good adaptability, it has attracted growing interest from researchers. Here, a second-order hidden Markov model is used to extract user interest features; the flow chart is shown in Fig. 1. It comprises two main parts: a training part and an extraction part.
The training part preprocesses the sequence of user interest feature information into a text document; after the text is scanned, the marked text sequence is converted into a marked sequence of text blocks using layout cues such as delimiters, spaces, line breaks, and colons; finally, the following model parameters of the second-order HMM are computed, as given by the formulas:
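1. Initial probability distribution vector
$$\pi_i = \frac{Init(i)}{\sum_{j=1}^{N} Init(j)}, \quad 1 \le i \le N \tag{1}$$
where $Init(i)$ is the number of state sequences in the whole marked training sample that start with state $S_i$, and $\sum_{j=1}^{N} Init(j)$ is the total number of state sequences over all starting states.
2. State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of transitions to state $S_k$ at time $t+1$ given state $S_i$ at time $t-1$ and state $S_j$ at time $t$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.
3. Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N, \; 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted given state $S_i$ at time $t-1$ and state $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.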
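The relative-frequency estimates described above (initial, transition, and emission probabilities as normalized counts) can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the function name `estimate_second_order_hmm`, the input format (a list of sequences of `(state, observation)` pairs), and the dictionary layout of the result are all assumptions.

```python
from collections import Counter


def estimate_second_order_hmm(sequences):
    """Estimate second-order HMM parameters by relative frequency.

    `sequences` is assumed to be a list of marked sequences, each a list of
    (state, observation) pairs; this input format is hypothetical.
    """
    init = Counter()  # Init(i): sequences that start in state i
    c2 = Counter()    # C_ij: transitions i -> j
    c3 = Counter()    # C_ijk: transitions (i, j) -> k
    e1 = Counter()    # E_j(o): state j emits o
    e2 = Counter()    # E_ij(o): state j emits o, previous state was i

    for seq in sequences:
        if not seq:
            continue
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for t, (s, o) in enumerate(seq):
            e1[(s, o)] += 1
            if t >= 1:
                c2[(states[t - 1], s)] += 1
                e2[(states[t - 1], s, o)] += 1
            if t >= 2:
                c3[(states[t - 2], states[t - 1], s)] += 1

    def normalize(counts, context_len):
        # Divide each count by the total count sharing the same context.
        totals = Counter()
        for key, n in counts.items():
            totals[key[:context_len]] += n
        return {key: n / totals[key[:context_len]] for key, n in counts.items()}

    total_init = sum(init.values())
    pi = {s: n / total_init for s, n in init.items()} if total_init else {}
    return {
        "pi": pi,                  # pi_i
        "a2": normalize(c2, 1),    # a_ij
        "a3": normalize(c3, 2),    # a_ijk
        "b1": normalize(e1, 1),    # b_j(O_k)
        "b2": normalize(e2, 2),    # b_ij(O_k)
    }
```

Events never seen in training are simply absent from the returned dictionaries; any smoothing of such zero counts is outside what the text describes and would be a separate design choice.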
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: after the text is scanned, the marked text sequence is converted into a marked sequence of text blocks using layout cues such as delimiters, spaces, line breaks, and colons; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied to the established HMM to extract user interest features. The processed observation sequence $O = O_1 O_2 \ldots O_T$ is taken as the model input, the state label sequence with the highest probability is found, and the extracted user features are the observed text marked with the target state labels.
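A minimal sketch of Viterbi decoding for a second-order HMM is given below, to illustrate how the most probable state label sequence might be found. It is written under several assumptions that are not in the source: the probability tables are plain dictionaries keyed by tuples, the first transition and first emission use the first-order tables, unseen events are floored at a small constant `floor`, and the function name `viterbi_second_order` is hypothetical.

```python
import math


def viterbi_second_order(obs, states, pi, a2, a3, b1, b2, floor=1e-12):
    """Return the most probable state sequence for `obs`.

    pi[i], a2[(i, j)], a3[(i, j, k)], b1[(j, o)], b2[(i, j, o)] are
    probability tables; missing entries are floored (an assumption).
    """
    def lg(table, key):
        return math.log(max(table.get(key, 0.0), floor))

    if not obs:
        return []
    if len(obs) == 1:
        return [max(states, key=lambda s: lg(pi, s) + lg(b1, (s, obs[0])))]

    # delta[(i, j)]: best log-score of any path whose last two states are i, j.
    delta = {}
    for i in states:
        for j in states:
            delta[(i, j)] = (lg(pi, i) + lg(b1, (i, obs[0]))
                             + lg(a2, (i, j)) + lg(b2, (i, j, obs[1])))

    backpointers = []
    for t in range(2, len(obs)):
        new_delta, back = {}, {}
        for j in states:
            for k in states:
                best_i = max(states,
                             key=lambda i: delta[(i, j)] + lg(a3, (i, j, k)))
                new_delta[(j, k)] = (delta[(best_i, j)] + lg(a3, (best_i, j, k))
                                     + lg(b2, (j, k, obs[t])))
                back[(j, k)] = best_i
        delta = new_delta
        backpointers.append(back)

    # Backtrack from the best final state pair.
    path = list(max(delta, key=delta.get))
    for back in reversed(backpointers):
        path.insert(0, back[(path[0], path[1])])
    return path
```

In the extraction step, the observations whose decoded state equals a target label would then be taken as the extracted interest features.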
6) contextual information of analysis reflection user interest:The interest characteristics of the network user is mainly by related to user interest Internal factor and external factor influence.Internal factor has gender, age, occupation, personality, education, income etc., external Factor then includes culture background, social environment, home background etc., and inherent and external many factors result in network The generation of user's difference behavior.Just because of this reason so that different users is there are many difference, to the interest of commodity Degree and deviation are also different.
The interest of user can usually be reflected in the behavior of itself, when they are interested in whatsit to produce Certain tendentiousness, demand and the interest of user can be recorded in their behavioural information, therefore can be by user's The analysis of the information such as search, navigation patterns and purchaser record, derives the true interest of user in a period of time.Here, user Behavioural information mainly include the following aspects:User's search key, user's history purchaser record, user's history browse row For etc..
3) Build a user interest ontology model incorporating context: first, region, gender, age, marital status, educational background, and income, the key factors that influence user interest, are taken as contextual factor indicators and combined with the user's historical purchase information and behavioral features for fuzzy processing to obtain the user's interest level; then, using a context ontology representation and multi-granularity partitioning, the user interest ontology model is built. The flow chart for building the user context ontology model is shown in Fig. 2.
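The text does not specify how the fuzzy processing is carried out. As one possible illustration only, the sketch below maps a normalized score in [0, 1] (for example, a score derived from purchase history and behavior, itself an assumption) to memberships in three fuzzy interest levels using triangular membership functions. The level names, breakpoints, and function names are all hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def interest_level(score):
    """Map a normalized score in [0, 1] to fuzzy interest memberships.

    The three levels and their breakpoints are illustrative assumptions.
    """
    return {
        "low": triangular(score, -0.5, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.5),
    }
```

With these breakpoints the three memberships sum to 1 for any score in [0, 1], which keeps the levels easy to compare; a real system would presumably tune the shapes per contextual factor.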
When the user ontology context is built from the user's interest context information, the user context is divided into user individual context, user environment context, and user device context. The ontology typically takes the form of a hierarchical concept tree in which each node represents an element of the user context; that is, a context ontology tree is built.
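A hierarchical concept tree of this kind can be sketched with a small node class. The three top-level categories come from the text; placing the six contextual factor indicators under the individual context, and the class and function names (`ContextNode`, `build_user_context_tree`), are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One element of the user context in the concept tree."""
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf path of every leaf below this node."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_user_context_tree():
    root = ContextNode("UserContext")
    individual = root.add("IndividualContext")
    # Assumed placement: the six factor indicators as individual-context leaves.
    for factor in ("Region", "Gender", "Age", "MaritalStatus",
                   "Education", "Income"):
        individual.add(factor)
    root.add("EnvironmentContext")
    root.add("DeviceContext")
    return root
```

Calling `list(build_user_context_tree().paths())` enumerates the leaf concepts with their full ancestry, which is one simple way to expose the different granularities of the tree.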
4) User interest drift detection based on a hidden semi-Markov model: a user's online shopping and browsing behavior is a complex process influenced by many individual factors, such as browsing purpose, cultural background, and hobbies. The user's interest is considered jointly in terms of background factors, user behavior, and interest content, and a hidden semi-Markov model (HSMM) is established to detect whether the user's interest has drifted.
Assume that the user's browsing behavior satisfies the Markov property. The following two observations are then chosen to describe the user's browsing behavior: a) the browsing path sequence of web pages the user visits; b) the time interval between leaving one web page and reaching the next. The set of all states is $S = \{S_1, S_2, \ldots, S_N\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_N\}$, and time intervals take values in the set $I = \{1, 2, \ldots\}$. For a given browsing behavior of a user, the number of links on the browsing path is a random variable; the number of observations output in a given state takes values in the set $\{1, \ldots, D\}$. The user's browsing path sequence is represented as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ denotes the web content object the user browses and $\tau_t \in I$ denotes the time interval between $r_{t-1}$ and $r_t$ when the user jumps from one page to the next. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with a time interval $\tau_t = q \in I$ from the previous page, satisfying $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ gives the probability that the number of observations output in state $i$ is $d \in \{1, \ldots, D\}$; it is the state duration probability matrix of the hidden semi-Markov model and satisfies $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$.
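The HSMM parameter set $(\pi, A, B, P)$ and its normalization constraints can be captured in a small container, sketched below. The class name `HSMMParameters`, the dictionary layout, and the extra checks that $\pi$ and each row of $A$ sum to 1 (standard for such models but not stated explicitly in the text) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class HSMMParameters:
    states: list  # S
    pi: dict      # pi[i]: probability that the initial state is i
    A: dict       # A[(i, j)]: transition probability from i to j
    B: dict       # B[i][(v, q)]: probability of page v with time interval q
    P: dict       # P[i][d]: probability of emitting d observations in state i

    def validate(self, tol=1e-9):
        """Raise ValueError if any distribution does not sum to 1."""
        def check(name, values):
            total = sum(values)
            if not math.isclose(total, 1.0, abs_tol=tol):
                raise ValueError(f"{name} sums to {total}, expected 1")

        check("pi", self.pi.values())
        for i in self.states:
            check(f"A[{i}]", (self.A.get((i, j), 0.0) for j in self.states))
            check(f"B[{i}]", self.B[i].values())  # sum over (v, q) of b_i(v, q)
            check(f"P[{i}]", self.P[i].values())  # sum over d of p_i(d)
        return True
```

Validating the tables once after training catches normalization mistakes before the model is used for scoring.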
An important interest behavior record of a user is defined as: $U_{interest}$ = {user, background, history, behavior, timestamp, content}, where user denotes the user, for example by an ID; background denotes the user's specific contextual factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; and content denotes the interest topic content.
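The six-field record can be expressed as a simple data class. The field names follow the definition above; the field types and the class name `UInterest` are assumptions, since the text does not specify how each field is encoded.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UInterest:
    user: str            # user identifier, e.g. an ID
    background: dict     # the user's specific contextual factors
    history: tuple       # the user's purchase history
    behavior: str        # result of a specific interest behavior operation
    timestamp: datetime  # time at which the behavior was performed
    content: str         # interest topic content
```

A record might then be created as `UInterest("u001", {"age": 30}, ("book",), "click", datetime(2014, 6, 17), "novels")`, where every value shown is made up for illustration.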
In a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, which can express the interest weight as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the probability that $u$ is interested in that concept over all accesses to $q_j$. Let the set of access nodes contained in $at_i$ be $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; then $Q_{i,j}$ denotes the set of all access nodes in $at_i$ that come after $q_j$, and a further set denotes the nodes in $Q_{i,j}$ that contain the concept:
The observation probability distribution of $u$ at $q_j$ is defined as:
Then, for user $u$, a state sequence is sought among the possible access sequences, establishing the hidden semi-Markov model of user interest behavior, such that the sequence has the maximum access probability:
To detect user interest drift, the observation sequence for the HSMM must first be collected; here, the user's browsing behavior data are mainly used as the observation sequence. The data are preprocessed before the model is trained. Once the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift. The implementation of drift detection is shown in Fig. 3.
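The thresholding step can be sketched as follows, assuming that the log-likelihood of each observation sequence under the trained HSMM has already been computed elsewhere (the HSMM scoring algorithm itself is not shown). Treating a sequence whose average log-likelihood falls below the threshold as drift, the optional `tolerance` margin, and all function names are assumptions; the text only states that the mean of the average log-likelihoods serves as the threshold.

```python
from statistics import mean


def average_log_likelihood(log_likelihood, length):
    """Log-likelihood of a sequence divided by its number of observations."""
    return log_likelihood / length


def drift_threshold(training_scores):
    """Mean of the average log-likelihoods of the training sequences.

    `training_scores` is a list of (log_likelihood, sequence_length) pairs.
    """
    return mean(average_log_likelihood(ll, n) for ll, n in training_scores)


def has_drifted(log_likelihood, length, threshold, tolerance=0.0):
    """Flag drift when a sequence scores below the threshold (assumed rule)."""
    return average_log_likelihood(log_likelihood, length) < threshold - tolerance
```

Sequences that are not flagged would be appended to the training data so that the model parameters can be re-estimated, as described above.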

Claims (4)

  1. A user interest mining method incorporating ontology context, characterized in that the user interest mining method comprises the following steps:
    1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
    First, data that can reflect user interest must be collected, as follows: user source data are obtained from the client, the server, and the proxy server; once obtained, they are preprocessed and saved in a set format for later user interest mining;
    Second, user interest features are extracted using a second-order hidden Markov model (HMM), which comprises a training part and an extraction part;
    The training part preprocesses the sequence of user interest feature information into a text document; after the text is scanned, the marked text sequence is converted into a marked sequence of text blocks using layout cues such as delimiters, spaces, line breaks, and colons; finally, the model parameters of the second-order HMM are computed according to formulas (1) to (5):
    1. Initial probability distribution vector
    $$\pi_i = \frac{Init(i)}{\sum_{j=1}^{N} Init(j)}, \quad 1 \le i \le N \tag{1}$$
    where $Init(i)$ is the number of state sequences in the whole marked training sample that start with state $S_i$, and $\sum_{j=1}^{N} Init(j)$ is the total number of state sequences over all starting states;
    2. State transition probabilities
    $$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
    $$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
    where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of transitions to state $S_k$ at time $t+1$ given state $S_i$ at time $t-1$ and state $S_j$ at time $t$; $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
    3. observed value discharges probability
    $$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
    $$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N, \ 1 \le k \le M \tag{5}$$
    where $E_j(O_k)$ and $E_{ij}(O_k)$ denote, respectively, the number of times observation $O_k$ is emitted in state $S_j$, and the number of times observation $O_k$ is emitted given state $S_i$ at time $t-1$ and state $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ and $\sum_{u=1}^{M} E_{ij}(O_u)$ denote, respectively, the total number of emissions of all observations in state $S_j$, and the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
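Formulas (4) and (5) are the same kind of relative-frequency estimate, applied to emissions. The sketch below assumes the training data is a list of sequences of `(state, observation)` pairs; that layout, the function name `estimate_emissions`, and the zero default for unseen contexts are illustrative assumptions rather than details from the source.

```python
from collections import Counter


def estimate_emissions(labeled_sequences, states, symbols):
    """Count-based estimates of b_j(O_k) (formula 4) and b_ij(O_k) (formula 5)."""
    e1 = Counter()  # E_j(O_k): O_k emitted in state S_j
    e2 = Counter()  # E_ij(O_k): O_k emitted in S_j at t, with S_i at t-1
    for seq in labeled_sequences:
        for t, (state, obs) in enumerate(seq):
            e1[(state, obs)] += 1
            if t > 0:
                previous_state = seq[t - 1][0]
                e2[(previous_state, state, obs)] += 1

    b1, b2 = {}, {}
    for j in states:
        total = sum(e1[(j, o)] for o in symbols)
        for o in symbols:
            # Assumption: unseen states get probability 0 (no smoothing).
            b1[(j, o)] = e1[(j, o)] / total if total else 0.0
    for i in states:
        for j in states:
            total = sum(e2[(i, j, o)] for o in symbols)
            for o in symbols:
                b2[(i, j, o)] = e2[(i, j, o)] / total if total else 0.0
    return b1, b2
```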
    The extraction part comprises two steps: (a) the feature information sequence of user interests is preprocessed to form a text document; after the text is scanned, typesetting marks such as delimiters, spaces, line breaks, and colons are used to convert the text sequence to be labeled into a sequence of text segments; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied to perform user interest feature extraction with the established model: the processed observation sequence $O = O_1 O_2 \dots O_T$ is taken as the model input, the state label sequence with the maximum probability is found, and the content of the user feature extraction is the observation text labeled with the target state label;
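The decoding in step (b) can be illustrated with a Viterbi sketch. Note the simplification: the source uses a second-order HMM, whereas the code below decodes a first-order HMM in log space to keep the example short; a second-order model can be handled the same way by treating pairs of consecutive states as the decoding state. The function name and the dictionary-based parameter layout are assumptions.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path and its log-probability
    for a first-order HMM (simplified stand-in for the second-order model)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

In the extraction setting, the observations would be the text segments from step (a), and the segments whose decoded label equals the target state label are the extracted interest features.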
    2) Analyze the contextual information that reflects user interest: by analyzing the user's search, browsing behavior, and purchase record information, the user's true interests over a period of time are derived;
    3) Construct the user interest ontology model incorporating context: first, region, gender, age, marital status, educational background, and income, the key factors influencing user interest, are taken as context factor indicators and combined with the user's historical purchase information and behavior features in a fuzzy processing step to obtain the user's interest level; then, using the ontology context representation method and multi-granularity partitioning, the user interest ontology profile is built;
    4) User interest drift detection method based on the hidden semi-Markov model:
    Two observations are chosen to describe the user's browsing behavior: a) the sequence of browsing paths along which the user accesses web pages; b) the time interval between leaving one web page and reaching another. The set of all states is $S = \{S_1, S_2, \dots, S_N\}$, the corresponding set of observations is $V = \{v_1, v_2, \dots, v_N\}$, and the time intervals form the set $I = \{1, 2, \dots\}$. For a given browsing behavior of the user, the number of links on its browsing path is a random variable; the behavior can therefore be represented by the number of observations emitted in a given state, which takes values in the set $\{1, \dots, D\}$. The user's browsing path sequence is expressed as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \dots, (r_T, \tau_T)\}$, where $r_t \in V$ denotes the web content object browsed by the user and $\tau_t \in I$ denotes the time interval between pages $r_{t-1}$ and $r_t$ when the user jumps from one page to another. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with time interval $\tau_t = q \in I$ from the previous page, satisfying $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ is the state duration probability matrix of the hidden semi-Markov model, where $p_i(d)$ is the probability that $d \in \{1, \dots, D\}$ observations are emitted in a given state $i$, satisfying $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
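As an illustration of the two-dimensional observation sequence $O = \{(r_t, \tau_t)\}$, the sketch below turns a time-ordered click log into `(page, interval)` pairs. Several details are assumptions made for the example and are not stated in the source: the log format (page identifier plus timestamp in seconds), the bucket width `unit_seconds` used to map elapsed time onto $I = \{1, 2, \dots\}$, and the use of interval 1 for the first page, which has no predecessor.

```python
import math


def to_observation_sequence(visits, unit_seconds=10.0):
    """Convert time-ordered (page_id, timestamp_seconds) visits into
    the observation sequence [(r_1, tau_1), ..., (r_T, tau_T)]."""
    sequence = []
    previous_time = None
    for page, timestamp in visits:
        if previous_time is None:
            # Assumption: the first page has no predecessor, so use the
            # smallest interval in I.
            interval = 1
        else:
            elapsed = timestamp - previous_time
            # Discretize elapsed time into positive integer buckets.
            interval = max(1, math.ceil(elapsed / unit_seconds))
        sequence.append((page, interval))
        previous_time = timestamp
    return sequence
```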
    An important interest behavior record of a user is defined as $U_{\text{interest}} = \{\text{user}, \text{background}, \text{history}, \text{behavior}, \text{timestamp}, \text{content}\}$, where user denotes the user; background denotes the user's specific context factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was executed; and content denotes the content of the interest topic;
    Within a user access transaction, there is an access transition probability $P(q_i \to q_j)$ between any two behavior operations, expressed as follows:
    $$P(q_i \to q_j) = P(q_j \mid q_i) = \frac{P(q_i q_j)}{P(q_i)} = \begin{cases} \dfrac{\theta_1 W_B(q_i, q_j) + \theta_2 W_{HI}(q_i, q_j) + \theta_3 W_{IB}(q_i, q_j) + \theta_4 W_L(q_i, q_j)}{\theta_1 W_B(q_i) + \theta_2 W_{HI}(q_i) + \theta_3 W_{IB}(q_i) + \theta_4 W_L(q_i)}, & i \ne j \\ 0, & i = j \end{cases} \tag{6}$$
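Formula (6) is a ratio of two weighted sums over four weight terms. A minimal sketch, assuming the four weights $W_B$, $W_{HI}$, $W_{IB}$, $W_L$ are available as lookup tables (dictionaries here, a hypothetical layout) and that a zero denominator yields probability 0, which the source does not specify:

```python
def access_transition_probability(q_i, q_j, theta, pair_weights, node_weights):
    """Evaluate formula (6).

    theta: the four coefficients (theta_1 .. theta_4).
    pair_weights: four dicts mapping (q_i, q_j) -> W(q_i, q_j),
        in the order W_B, W_HI, W_IB, W_L.
    node_weights: four dicts mapping q_i -> W(q_i), in the same order.
    """
    if q_i == q_j:
        return 0.0  # second case of formula (6)
    numerator = sum(
        t * w.get((q_i, q_j), 0.0) for t, w in zip(theta, pair_weights)
    )
    denominator = sum(t * w.get(q_i, 0.0) for t, w in zip(theta, node_weights))
    # Assumption: an operation with no recorded weight has no outgoing probability.
    return numerator / denominator if denominator else 0.0
```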
    For each $q_j$ and its corresponding observation, there is an observation probability distribution, namely the probability of the user's interest in that observation over all of the user's accesses to $q_j$. The access node states contained in the $i$-th access transaction can be represented by the set $Q_i = \{q'_1, \dots, q'_f \mid q' \in IC\}$; $Q_{i,j}$ then denotes the set of all access nodes in that transaction from $q_j$ onward, and a further set collects the nodes in $Q_{i,j}$ that contain the observation:
    $$Q_{i,j} = \begin{cases} \{\, q'_{k+l} \mid q'_k = q_j,\ l = 0, \dots, (f - k) \,\}, & q_j \in Q_i \\ \text{Null}, & \text{otherwise} \end{cases} \tag{7}$$
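Formula (7) selects the tail of a transaction's node list starting at $q_j$. A small sketch, assuming the transaction is an ordered list of node identifiers and that, when $q_j$ occurs more than once, the first occurrence is used (the formula does not settle this case); the function name is hypothetical and `None` stands in for Null:

```python
def nodes_from(transaction_nodes, q_j):
    """Return Q_{i,j}: the nodes of one access transaction from q_j onward
    (l = 0 includes q_j itself), or None when q_j is not in the transaction."""
    if q_j not in transaction_nodes:
        return None
    k = transaction_nodes.index(q_j)  # assumption: first occurrence of q_j
    return transaction_nodes[k:]
```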
    The observation probability distribution of the user on $q_j$ is defined as:
    Then, according to this distribution, a state sequence can be found among the user's possible access sequences, establishing the hidden semi-Markov model of the user's interest behavior such that it has the maximum access probability:
    $$P_{\max}(\sigma_z^k) = \arg\max \prod P(q_k \to q_{k+1})\, P(\sigma_z^k \mid q_k) \tag{9}$$
    During user interest drift detection, the observation sequences for the HSMM are first collected, and the data are preprocessed before the model is trained; after the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. If the user's interest value lies within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
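The decision rule at the end of step 4) can be sketched as follows. The HSMM likelihood computation itself is not reproduced: `log_likelihood` is a hypothetical callable standing in for the trained model, and `normal_range` is an assumed pair of thresholds, since the source does not say how the normal range is chosen.

```python
def detect_interest_drift(observations, log_likelihood, normal_range, training_set):
    """Return (drifted, average_log_likelihood) for one observation sequence.

    log_likelihood: callable returning the log-likelihood of the whole
        sequence under the trained model (hypothetical stand-in for the HSMM).
    normal_range: (low, high) bounds on the average log-likelihood.
    training_set: list that receives sequences judged normal, so that the
        model can later be re-estimated on them.
    """
    if not observations:
        raise ValueError("observation sequence must not be empty")
    average = log_likelihood(observations) / len(observations)
    low, high = normal_range
    if low <= average <= high:
        training_set.append(observations)  # normal: keep for parameter updates
        return False, average
    return True, average  # outside the normal range: interest drift
```

Dividing by the sequence length makes sessions of different lengths comparable, which is presumably why the average rather than the raw log-likelihood is used.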
  2. The user interest mining method incorporating ontology context as claimed in claim 1, characterized in that: in step 1), there are two ways of obtaining the user's personalized information: (a) collection through online surveys in which users themselves participate; (b) obtaining the user's interest information by tracking user behavior and applying a feature extraction method for user behavior data.
  3. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 2), the user's behavior information comprises the user's search keywords, purchase history, and browsing history.
  4. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 3), according to the user's interest context information, when the user ontology context is built, the user context is divided into user individual context, user environment context, and user device context; the ontology takes the form of a hierarchical concept tree, in which each node represents an element of the user context.
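The hierarchical concept tree of claim 4 can be pictured with a small data structure. This is only a sketch: the class name `ContextNode`, the helper `build_user_context`, and the placement of the six context factors from step 3) under the individual context are assumptions for illustration, not details given in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One node of the concept tree; each node is one element of user context."""
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf paths below this node as tuples of names."""
        current = prefix + (self.name,)
        if not self.children:
            yield current
        for child in self.children:
            yield from child.paths(current)


def build_user_context():
    root = ContextNode("user context")
    individual = root.add("individual context")
    # Assumption: the context factors of step 3) are leaves of the individual context.
    for factor in ("region", "gender", "age", "marital status", "education", "income"):
        individual.add(factor)
    root.add("environment context")
    root.add("device context")
    return root
```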
CN201410269562.6A 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation Active CN104008203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410269562.6A CN104008203B (en) 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation

Publications (2)

Publication Number Publication Date
CN104008203A CN104008203A (en) 2014-08-27
CN104008203B true CN104008203B (en) 2018-04-17

Family

ID=51368860

Country Status (1)

Country Link
CN (1) CN104008203B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718471A (en) * 2014-12-03 2016-06-29 中国科学院声学研究所 User preference modeling method, system, and user preference evaluation method and system
US9940362B2 (en) * 2015-05-26 2018-04-10 Google Llc Predicting user needs for a particular context
CN106055661B (en) * 2016-06-02 2017-11-17 福州大学 More interest resource recommendations based on more Markov chain models
CN106776757B (en) * 2016-11-15 2020-03-27 中国银行股份有限公司 Method and device for indicating user to complete online banking operation
CN106651517B (en) * 2016-12-20 2021-11-30 广东技术师范大学 Drug recommendation method based on hidden semi-Markov model
CN109388661B (en) 2017-08-02 2020-04-21 创新先进技术有限公司 Model training method and device based on shared data
CN107609063B (en) * 2017-08-29 2020-03-17 重庆邮电大学 Multi-label classified mobile phone application recommendation system and method thereof
CN108134691B (en) * 2017-12-18 2019-10-01 Oppo广东移动通信有限公司 Model building method, Internet resources preload method, apparatus, medium and terminal
CN108038222B (en) * 2017-12-22 2022-01-11 冶金自动化研究设计院 System of entity-attribute framework for information system modeling and data access
CN108596205B (en) * 2018-03-20 2022-02-11 重庆邮电大学 Microblog forwarding behavior prediction method based on region correlation factor and sparse representation
CN108809955B (en) * 2018-05-22 2019-05-24 南瑞集团有限公司 A kind of power consumer behavior depth analysis method based on hidden Markov model
CN109741146B (en) * 2019-01-04 2022-06-28 平安科技(深圳)有限公司 Product recommendation method, device, equipment and storage medium based on user behaviors
CN109933741B (en) * 2019-02-27 2020-06-23 京东数字科技控股有限公司 Method, device and storage medium for extracting user network behavior characteristics
CN110162553A (en) * 2019-05-21 2019-08-23 南京邮电大学 Users' Interests Mining method based on attention-RNN
CN110297817A (en) * 2019-06-25 2019-10-01 哈尔滨工业大学 A method of the structure of knowledge is constructed based on personalized Bayes's knowledge tracing model
CN110866542B (en) * 2019-10-17 2021-11-19 西安交通大学 Depth representation learning method based on feature controllable fusion
CN114169869B (en) * 2022-02-14 2022-06-07 北京大学 Attention mechanism-based post recommendation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043793A (en) * 2009-10-09 2011-05-04 卢健华 Knowledge-service-oriented recommendation method
CN103514289A (en) * 2013-10-08 2014-01-15 北京百度网讯科技有限公司 Method and device for building interest entity base

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090074108A (en) * 2007-12-28 2009-07-06 주식회사 솔트룩스 Method for recommending contents with context awareness

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant