CN104008203B

CN104008203B - A user interest mining method incorporating ontology context

Info

Publication number: CN104008203B
Application number: CN201410269562.6A
Authority: CN
Inventors: 陈庭贵; 周广澜; 许翀寰; 封毅
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2014-06-17
Filing date: 2014-06-17
Publication date: 2018-04-17
Anticipated expiration: 2034-06-17
Also published as: CN104008203A

Abstract

A user interest mining method incorporating ontology context. First, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model based on a second-order hidden Markov model is built. Second, the context information that can reflect user interest is analyzed, including the user's individual information, environment information, and device information. Third, a user interest model based on a context ontology is built, with the degree of interest in the user's individual information measured and expressed using fuzzy logic. Finally, a user interest drift detection method based on a hidden semi-Markov model builds a model from the user's browsing path and uses the mean of the average log-likelihood of the sequences as the threshold for judging whether interest has drifted. The invention constructs an interest model that can satisfy user needs in order to provide personalized recommendation services, offers an effective means of improving user satisfaction, and has good application value.

Description

A user interest mining method incorporating ontology context

Technical field

The present invention relates to the fields of data mining and ontology, and in particular to a user interest mining method that is especially suitable for problems in personalized user information services.

Background art

Network applications are becoming increasingly complex and data volumes keep growing, so tasks such as e-commerce and website design have become more complicated and burdensome. Building on existing user information, web page structure needs to be adjusted dynamically according to behaviors such as users' access interests, access times, and access frequencies, so that e-commerce can be conducted in a targeted way that meets user needs and provides personalized service. Personalized information service on the Internet automatically organizes and adjusts information according to each user's characteristics and interests, using a fast, efficient, and accurate mode of information acquisition to address problems such as users becoming disoriented in information. On this basis, it has become a challenge to understand users' information needs accurately amid rapidly expanding information, to build user models that characterize network users' features, interests, goals, and behavioral preferences, to predict user behavior, and thereby to provide better personalized service. At the same time, detecting user interest drift promptly and accurately, and building a dynamically updated user interest model that meets the personalized information needs of different users, has become a key problem for personalized information services.

Summary of the invention

Existing data mining approaches produce interest models that cannot satisfy user needs well enough to support personalized recommendation services. To overcome this deficiency, the present invention provides a user interest mining method incorporating ontology context. The method builds an interest model that can satisfy user needs in order to provide personalized recommendation services, and is an effective means of improving user satisfaction.

The technical solution adopted by the present invention to solve this technical problem is:

A user interest mining method incorporating ontology context, comprising the following steps:

1) Establish a user interest feature extraction model based on a second-order hidden Markov model:

First, data that can reflect user interest must be collected, as follows: user source data is obtained from the client, the server, and the proxy server; once obtained, the source data is preprocessed and saved in a set format for later user interest mining.

Second, user interest features are extracted using a second-order hidden Markov model (HMM), which comprises a training part and an extraction part;

The training part serializes and preprocesses the feature information of user interest to form a text document. After the text is scanned, layout cues (delimiters, spaces, line breaks, colons) are used to convert the labeled text sequence into a sequence of labeled text blocks. Finally, the following model parameters of the second-order HMM are computed, as given by the formulas below:

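1. Initial probability distribution vector

$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N}\mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$

where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N}\mathrm{Init}(j)$ is the total number of state sequences over all starting states;

2. State transition probabilities

$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$

$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$

where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$ and $S_j$ at time $t$ and then transfers to $S_k$ at time $t+1$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;

3. Observation emission probabilities

$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N,\ 1 \le k \le M \tag{4}$$

$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$

where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;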

The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: after the text is scanned, layout cues (delimiters, spaces, line breaks, colons) are used to convert the labeled text sequence into a sequence of labeled text blocks; (b) using the second-order HMM output by the training part, decoding is performed with the Viterbi algorithm. The trained HMM is applied to extract user interest features: the processed observation sequence O=O₁O₂...O_T is taken as the model input, the state label sequence with the maximum probability is found, and the extracted user features are the observed text labeled with the target state label;

2) Analyze the context information that reflects user interest: by analyzing the user's search, browsing behavior, and purchase record information, derive the user's true interests over a period of time;

3) Build the user interest ontology model incorporating context: first, six key factors that influence user interest (region, gender, age, marital status, educational background, and income) are taken as context factor indicators, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the user's interest level; then, using the ontology representation of context, the user interest ontology model is built through multi-granularity partitioning;

4) User interest drift detection method based on a hidden semi-Markov model:

Two observations are chosen to describe the user's browsing behavior: a) the sequence of web pages on the user's browsing path; b) the time interval between arriving at one web page and the next. The set of all states is denoted S={S₁,S₂,...,S_N}, the corresponding set of observation values is denoted V={v₁,v₂,...,v_N}, and time intervals are represented by the set I={1,2,...}. For a given browsing behavior of a user, the number of links on its browsing path is a random variable, and the number of observations output in a given state takes values in the set {1,...,D}. The user's browsing path sequence is expressed as a two-dimensional observation sequence O={(r₁,τ₁),...,(r_T,τ_T)}, where r_t∈V is the web content object browsed by the user and τ_t∈I is the time interval between pages r_t-1 and r_t when the user jumps from one page to the next. The output probability matrix of the model is denoted B={b_i(v,q)}: for a given state i∈S, b_i(v,q) is the probability that the user is on page r_t=v∈V with time interval τ_t=q∈I since the previous page, and ∑_v,q b_i(v,q)=1. P={p_i(d)} gives the probability that the number of observations output in a given state i is d∈{1,...,D}; it is the state duration probability matrix of the hidden semi-Markov model and satisfies ∑_d p_i(d)=1. The state transition probability matrix is denoted A={a_ij}, where a_ij is the probability of a transition from i∈S to j∈S. The initial probability vector is denoted π={π_i}, where π_i is the probability that the initial state is i∈S;

An important interest behavior record of a user is defined as U_interest={user, background, history, behavior, timestamp, content}, where user identifies the user, for example by ID; background denotes the user's specific context factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; and content denotes the interest topic content;

In a user access transaction, there is an access transition probability P(q_i→q_j) between any two behavior operations, which expresses the interest weight as follows:

For each q_j and its corresponding concept there is an observation probability distribution, namely the probability of u's interest in that concept over all of u's accesses to q_j. Let the set of access nodes contained in at_i be Q_i={q'₁,...,q'_f|q'∈IC}; then Q_i,j denotes the set of all access nodes in at_i after q_j, and a further set collects the nodes of Q_i,j that contain the concept:

The observation probability distribution of u at q_j is defined as:

A state sequence with the maximum access probability can then be found in the access sequence of user u, which establishes the hidden semi-Markov model of the user's interest behavior:

To detect user interest drift, the observation sequences for the HSMM must first be collected, and the data must be preprocessed before the model is trained. After the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this value is computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data is added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.

Further, in step 1), there are two ways to obtain personalized user information: (a) collection through online surveys in which users themselves participate; (b) obtaining the user's interest information by tracking user behavior, using a feature extraction method for user behavior data.

Further, in step 2), the user's behavior information includes the user's search keywords, purchase history, and browsing history.

Further, in step 3), when the user context ontology is built from the user's interest context information, the user context is divided into user individual context, user environment context, and user device context. The ontology takes the form of a hierarchical concept tree in which each node represents one element of the user context; that is, a context ontology tree is built.

The technical concept of the present invention is as follows: for the field of user-oriented personalized services, and in view of the concept drift and problem scenarios involved, a user interest mining method incorporating ontology context is proposed. It constructs an interest model that can satisfy user needs in order to provide personalized recommendation services, and is an effective means of improving user satisfaction.

On this basis, the present invention takes personalized user information services as its research object, introduces data mining and ontology, takes full account of users' individual characteristics, and proposes a user interest mining method incorporating ontology context that effectively meets the demand for personalized services.

Specifically, for the complex, multi-dimensional interest behavior of Web users on e-commerce websites, a user interest feature extraction model based on a second-order hidden Markov model (Second-Order Hidden Markov Model) is first built. Second, the context information that can reflect user interest is analyzed, including the user's individual information, environment information, and device information. Third, a user interest model based on a context ontology is built, with the degree of interest in the user's individual information measured and expressed using fuzzy logic. Finally, a user interest drift detection method based on a hidden semi-Markov model (Hidden Semi-Markov Model, HSMM) builds a model from the user's browsing path and uses the mean of the average log-likelihood of the sequences as the threshold for judging whether interest has drifted.

The beneficial effects of the present invention are as follows: the invention constructs an interest model that can satisfy user needs in order to provide personalized recommendation services, offers an effective means of improving user satisfaction, and has good application value.

Brief description of the drawings

Fig. 1 is the flow chart of the interest feature extraction algorithm based on the second-order HMM.

Fig. 2 is the flow of building the user context ontology.

Fig. 3 is the block diagram of interest drift detection.

Embodiment

The invention is further described below with reference to the accompanying drawings.

With reference to Fig. 1, Fig. 2, and Fig. 3, a user interest mining method incorporating ontology context comprises the following steps:

1) Establish a user interest feature extraction model based on a second-order hidden Markov model: Web information extraction (Web Information Extraction) belongs to the category of Web content mining. It is an information extraction approach that takes the Web as its information source and extracts data from semi-structured Web documents. This step includes the collection of user data and the building of the user interest feature extraction model.

To build a user interest model, data that can reflect user interest must first be collected. User data is usually abundant and includes user registration information, log information, page text content, website topology, user behavior data, and page hyperlink information. These data can be obtained from sources such as the client, the server, and the proxy server; once obtained, the source data is preprocessed and saved in a suitable format for later user interest mining. In summary, there are two main ways to obtain personalized user information: (a) collection through online surveys in which users themselves participate. This approach directly captures users' interests and information needs, but requires users' active cooperation; (b) obtaining the user's interest information by tracking user behavior. Data from the first approach, such as registration information, is supplied directly by the user through forms and passed to the back-end database, so extracting interest features from it is relatively easy. Data for inferring user interest from tracked implicit behavior, by contrast, cannot be obtained directly, so the feature extraction method for user behavior data is mainly used here.

Second, extracting user interest features is a text information extraction task. Information extraction has become an important direction in natural language processing, and its theory continues to develop. Current information extraction models fall into three main classes: dictionary-based models; rule-based models, such as ontologies; and statistical models, such as the hidden Markov model (HMM). Because the HMM has a statistical foundation well suited to natural language processing, together with strong robustness, high extraction accuracy, ease of construction, and good adaptability, it has attracted growing interest from researchers. Here a second-order hidden Markov model is used to extract user interest features; the flow chart is shown in Fig. 1. The method has two main parts: a training part and an extraction part.

The training part serializes and preprocesses some feature information of user interest to form a text document. After the text is scanned, layout cues such as delimiters, spaces, line breaks, and colons are used to convert the labeled text sequence into a sequence of labeled text blocks. Finally, the following model parameters of the second-order HMM are computed, as given by the formulas below:

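1. Initial probability distribution vector

$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N}\mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$

where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N}\mathrm{Init}(j)$ is the total number of state sequences over all starting states.

2. State transition probabilities

$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$

$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$

where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$ and $S_j$ at time $t$ and then transfers to $S_k$ at time $t+1$. $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.

3. Observation emission probabilities

$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N,\ 1 \le k \le M \tag{4}$$

$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$

where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$. $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$.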
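The count-based estimates of formulas (1) to (5) can be illustrated with a minimal Python sketch. It assumes the training data is already a list of labeled sequences of (state, observation) pairs; the function name `estimate_second_order_hmm` and the dictionary layout are illustrative choices, not part of the patent, and no smoothing of unseen events is applied.

```python
from collections import Counter, defaultdict


def estimate_second_order_hmm(sequences):
    """Relative-frequency estimates for a second-order HMM.

    sequences: list of labeled sequences, each a list of
    (state, observation) pairs (an assumed input format).
    """
    init = Counter()            # Init(i)
    c2 = defaultdict(Counter)   # c2[i][j]       = C_ij
    c3 = defaultdict(Counter)   # c3[(i, j)][k]  = C_ijk
    e1 = defaultdict(Counter)   # e1[j][o]       = E_j(o)
    e2 = defaultdict(Counter)   # e2[(i, j)][o]  = E_ij(o)

    for seq in sequences:
        if not seq:
            continue
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for t, (s, o) in enumerate(seq):
            e1[s][o] += 1
            if t >= 1:
                c2[states[t - 1]][s] += 1
                e2[(states[t - 1], s)][o] += 1
            if t >= 2:
                c3[(states[t - 2], states[t - 1])][s] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: n / total for key, n in counter.items()}

    pi = normalize(init)                               # formula (1)
    a2 = {i: normalize(c) for i, c in c2.items()}      # formula (2)
    a3 = {ij: normalize(c) for ij, c in c3.items()}    # formula (3)
    b1 = {j: normalize(c) for j, c in e1.items()}      # formula (4)
    b2 = {ij: normalize(c) for ij, c in e2.items()}    # formula (5)
    return pi, a2, a3, b1, b2
```

Each table is normalized row by row, so every conditional distribution sums to 1 over the events that were actually observed; a practical system would probably add smoothing for unseen transitions and emissions.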

The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: after the text is scanned, layout cues such as delimiters, spaces, line breaks, and colons are used to convert the labeled text sequence into a sequence of labeled text blocks; (b) using the second-order HMM output by the training part, decoding is performed with the Viterbi algorithm. The trained HMM is applied to extract user interest features: the processed observation sequence O=O₁O₂...O_T is taken as the model input, the state label sequence with the maximum probability is found, and the extracted user features are the observed text labeled with the target state label.
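The decoding step can be sketched in Python as a Viterbi search over state pairs, which is one common way to handle second-order dependencies. The parameter tables are assumed to be nested dictionaries shaped like π, a_ij, a_ijk, b_j and b_ij above; the `FLOOR` constant used for unseen events and the function names are assumptions made for this illustration.

```python
import math

FLOOR = 1e-12  # assumed floor probability for events missing from the tables


def _lp(table, key, sym):
    """Log-probability lookup with a floor for unseen events."""
    return math.log(max(table.get(key, {}).get(sym, 0.0), FLOOR))


def viterbi_second_order(obs, states, pi, a2, a3, b1, b2):
    """Return the most probable state sequence for obs."""
    if not obs:
        return []
    # t = 0: initial distribution and first-order emission.
    d0 = {j: math.log(max(pi.get(j, 0.0), FLOOR)) + _lp(b1, j, obs[0])
          for j in states}
    if len(obs) == 1:
        return [max(d0, key=d0.get)]
    # t = 1: first-order transition, second-order emission.
    delta = {(i, j): d0[i] + _lp(a2, i, j) + _lp(b2, (i, j), obs[1])
             for i in states for j in states}
    back = []
    # t >= 2: second-order transition over state pairs.
    for o in obs[2:]:
        new, ptr = {}, {}
        for j in states:
            for k in states:
                best_i = max(states,
                             key=lambda i: delta[(i, j)] + _lp(a3, (i, j), k))
                new[(j, k)] = (delta[(best_i, j)] + _lp(a3, (best_i, j), k)
                               + _lp(b2, (j, k), o))
                ptr[(j, k)] = best_i
        delta = new
        back.append(ptr)
    # Backtrack from the best final pair of states.
    j, k = max(delta, key=delta.get)
    path = [j, k]
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path
```

Given the decoded path, the extracted features are the observations whose label equals the target state, for example `[o for s, o in zip(path, obs) if s == target]`, where `target` is whichever label marks interest content in the training data.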

2) Analyze the context information that reflects user interest: the interest characteristics of network users are mainly influenced by internal and external factors related to user interest. Internal factors include gender, age, occupation, personality, education, and income; external factors include cultural background, social environment, and family background. These many internal and external factors give rise to differences in network users' behavior. For this reason users differ in many ways, and their degree of interest in, and preference for, commodities also differ.

A user's interests are usually reflected in the user's own behavior: when users are interested in something they show a certain tendency, and their needs and interests are recorded in their behavior information. By analyzing information such as the user's searches, browsing behavior, and purchase records, the user's true interests over a period of time can therefore be derived. Here the user's behavior information mainly includes the user's search keywords, purchase history, and browsing history.
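As a rough illustration of deriving interests over a period of time from these three kinds of behavior, the Python sketch below aggregates events inside a time window into normalized topic scores. The source does not say how behaviors are weighted or how long the period is, so the `WEIGHTS` table, the 30-day default window, and the event format are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed weights: the source does not specify how behaviours are weighted.
WEIGHTS = {"search": 1.0, "browse": 1.0, "purchase": 3.0}


def interest_profile(events, now, window_days=30):
    """Aggregate (kind, topic, timestamp) events into normalized topic scores.

    Only events within the last `window_days` days before `now` are counted.
    """
    cutoff = now - timedelta(days=window_days)
    scores = Counter()
    for kind, topic, ts in events:
        if ts >= cutoff:
            scores[topic] += WEIGHTS.get(kind, 0.0)
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else {}
```

A purchase counts for more than a search or a page view in this sketch only because of the assumed weights; any other weighting would fit the same structure.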

3) Build the user interest ontology model incorporating context: first, six key factors that influence user interest (region, gender, age, marital status, educational background, and income) are taken as context factor indicators, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the user's interest level; then, using the ontology representation of context, the user interest ontology model is built through multi-granularity partitioning. The flow chart for building the user context ontology model is shown in Fig. 2.
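The fuzzy processing step is not specified in detail in the text. One plausible reading, sketched below in Python, maps a behavior score normalized to [0, 1] onto linguistic interest levels using triangular membership functions. The three levels, their breakpoints, and the function names are assumptions chosen purely for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed linguistic levels over a behaviour score normalized to [0, 1].
LEVELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(score):
    """Membership degree of the score in each interest level."""
    return {name: triangular(score, *abc) for name, abc in LEVELS.items()}


def interest_level(score):
    """Level with the highest membership degree."""
    memberships = fuzzify(score)
    return max(memberships, key=memberships.get)
```

For instance, a score of 0.8 belongs partly to "medium" and more strongly to "high", so `interest_level(0.8)` returns `"high"` under these assumed breakpoints.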

When the user context ontology is built from the user's interest context information, the user context is divided into user individual context, user environment context, and user device context. An ontology usually takes the form of a hierarchical concept tree in which each node represents one element of the user context; that is, a context ontology tree is built.
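A hierarchical concept tree of this kind can be sketched with a small Python data structure. The three top-level branches and the six individual factors come from the text; the class name `ContextNode`, the node identifiers, and the absence of children under the environment and device branches are assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One concept in the context ontology tree."""
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf path of every leaf concept."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_user_context_tree():
    root = ContextNode("UserContext")
    individual = root.add("IndividualContext")
    for factor in ("Region", "Gender", "Age", "MaritalStatus",
                   "Education", "Income"):
        individual.add(factor)
    # The text names these branches but does not enumerate their elements.
    root.add("EnvironmentContext")
    root.add("DeviceContext")
    return root
```

Calling `list(build_user_context_tree().paths())` lists each leaf concept with its ancestors, which is one simple way to express the multi-granularity partitioning mentioned in step 3).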

4) User interest drift detection method based on a hidden semi-Markov model: a user's online shopping and browsing behavior is a complex process influenced by many individual factors, such as browsing purpose, cultural background, and hobbies. The user's interest is considered jointly through background factors, user behavior, and interest content, and a hidden semi-Markov model (HSMM) is established to detect whether the user's interest has drifted.

Assume that the user's browsing behavior satisfies the Markov property while the user browses web pages. The following two observations are then chosen to describe the user's browsing behavior: a) the sequence of web pages on the user's browsing path; b) the time interval between arriving at one web page and the next. The set of all states is denoted S={S₁,S₂,...,S_N}, the corresponding set of observation values is denoted V={v₁,v₂,...,v_N}, and time intervals are represented by the set I={1,2,...}. For a given browsing behavior of a user, the number of links on its browsing path is a random variable, and the number of observations output in a given state takes values in the set {1,...,D}. The user's browsing path sequence is expressed as a two-dimensional observation sequence O={(r₁,τ₁),...,(r_T,τ_T)}, where r_t∈V is the web content object browsed by the user and τ_t∈I is the time interval between pages r_t-1 and r_t when the user jumps from one page to the next. The output probability matrix of the model is denoted B={b_i(v,q)}: for a given state i∈S, b_i(v,q) is the probability that the user is on page r_t=v∈V with time interval τ_t=q∈I since the previous page, and ∑_v,q b_i(v,q)=1. P={p_i(d)} gives the probability that the number of observations output in a given state i is d∈{1,...,D}; it is the state duration probability matrix of the hidden semi-Markov model and satisfies ∑_d p_i(d)=1. The state transition probability matrix is denoted A={a_ij}, where a_ij is the probability of a transition from i∈S to j∈S. The initial probability vector is denoted π={π_i}, where π_i is the probability that the initial state is i∈S.
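The parameter set (π, A, B, P) and its normalization constraints can be captured in a small Python container. This is only a sketch of the data layout: the class name `HSMM`, the nested-dictionary representation, and the `validate` helper are assumptions, and no training or inference algorithm is included.

```python
from dataclasses import dataclass


@dataclass
class HSMM:
    """Parameters of a hidden semi-Markov model stored as nested dicts."""
    pi: dict  # pi[i]        : probability that the initial state is i
    A: dict   # A[i][j]      : transition probability from state i to state j
    B: dict   # B[i][(v, q)] : probability of page v with time interval q in state i
    P: dict   # P[i][d]      : probability of emitting d observations in state i

    def validate(self, tol=1e-9):
        """Check that every distribution sums to 1 within tol."""
        def sums_to_one(dist):
            return abs(sum(dist.values()) - 1.0) <= tol

        if not sums_to_one(self.pi):
            raise ValueError("pi does not sum to 1")
        for name, table in (("A", self.A), ("B", self.B), ("P", self.P)):
            for state, dist in table.items():
                if not sums_to_one(dist):
                    raise ValueError(f"{name}[{state!r}] does not sum to 1")
        return True
```

Keying B by the pair (v, q) mirrors the two-dimensional observation (r_t, τ_t), and keying P by the duration d mirrors the constraint ∑_d p_i(d)=1.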

An important interest behavior record of a user is defined as U_interest={user, background, history, behavior, timestamp, content}, where user identifies the user, for example by ID; background denotes the user's specific context factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; and content denotes the interest topic content.
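The six-field record maps naturally onto a typed structure. In the Python sketch below the field names follow U_interest, while the concrete field types (for example a dict for the context factors and a tuple for the purchase history) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InterestRecord:
    """One interest behaviour record, mirroring the fields of U_interest."""
    user: str            # user identifier, e.g. an ID
    background: dict     # context factors such as region or age (assumed layout)
    history: tuple       # purchase history (assumed to be a tuple of item names)
    behavior: str        # result of the interest behaviour operation
    timestamp: datetime  # time at which the behaviour was performed
    content: str         # interest topic content
```

A record might then be created as `InterestRecord("u001", {"age": "25-34"}, ("novel",), "add_to_cart", datetime(2014, 6, 17, 12, 0), "cookbooks")`, where every value is made up for the example.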

In a user access transaction, there is an access transition probability P(q_i→q_j) between any two behavior operations, which can express the interest weight as follows:

For each q_j and its corresponding concept there is an observation probability distribution, namely the probability of u's interest in that concept over all of u's accesses to q_j. Let the set of access nodes contained in at_i be Q_i={q'₁,...,q'_f|q'∈IC}; then Q_i,j denotes the set of all access nodes in at_i after q_j, and a further set collects the nodes of Q_i,j that contain the concept:

The observation probability distribution of u at q_j is defined as:

To detect user interest drift, the observation sequences for the HSMM must first be collected; here the user's browsing behavior data is mainly used as the observation sequence, and the data must be preprocessed before the model is trained. After the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this value is computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data is added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift. The implementation of drift detection is shown in Fig. 3.
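The threshold test can be sketched in Python under the assumption that the log-likelihood of each browsing sequence under the trained HSMM has already been computed by a forward-style algorithm, which is not shown. The threshold follows the text (the mean of the per-sequence average log-likelihoods); the `margin` tolerance and the function names are hypothetical additions, since the text does not define the width of the normal range.

```python
from statistics import mean


def avg_log_likelihood(log_likelihood, length):
    """Average log-likelihood per observation of one sequence."""
    return log_likelihood / length


def drift_threshold(training):
    """Mean of the average log-likelihoods of the training sequences.

    training: list of (log_likelihood, sequence_length) pairs.
    """
    return mean(avg_log_likelihood(ll, n) for ll, n in training)


def check_drift(sample, training, margin=0.0):
    """Return True if the sample indicates interest drift.

    A sample within the normal range is appended to the training list so
    that the model parameters can later be re-estimated on it.
    """
    threshold = drift_threshold(training)
    score = avg_log_likelihood(*sample)
    if score >= threshold - margin:
        training.append(sample)
        return False
    return True
```

With `margin=0.0` the mean itself is the cut-off, so roughly half of ordinary sequences would fall below it; a deployment would probably widen the normal range, but the text gives no value for that.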

Claims

1. A user interest mining method incorporating ontology context, characterized in that the method comprises the following steps:

1) Establish a user interest feature extraction model based on a second-order hidden Markov model:

First, data that can reflect user interest must be collected, as follows: user source data is obtained from the client, the server, and the proxy server; once obtained, the source data is preprocessed and saved in a set format for later user interest mining;

Second, user interest features are extracted using a second-order hidden Markov model (HMM), which comprises a training part and an extraction part;

The training part serializes and preprocesses the feature information of user interest to form a text document; after the text is scanned, layout cues (delimiters, spaces, line breaks, colons) are used to convert the labeled text sequence into a sequence of labeled text blocks; finally, the model parameters of the second-order HMM are computed according to formulas (1) to (5), as follows:

1. Initial probability distribution vector

$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N}\mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$

where $\mathrm{Init}(i)$ is the number of state sequences in the whole labeled training sample that start with state $S_i$, and $\sum_{j=1}^{N}\mathrm{Init}(j)$ is the total number of state sequences over all starting states;

2. State transition probabilities

$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$

$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$

where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$ and $S_j$ at time $t$ and then transfers to $S_k$ at time $t+1$; $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;

3. Observation emission probabilities

$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N,\ 1 \le k \le M \tag{4}$$

$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$

where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given state $S_i$ at time $t-1$ and state $S_j$ at time $t$;

Extraction unit is divided to including two steps, i.e.,：(a) the characteristic information sequencing of user interest is pre-processed, forms text This document, after being scanned through to text, mark is converted to using separator, space, line feed, colon typesetting by marked text sequence The text sections sequence of note；(b) the second order HMM model of combined training part output, is calculated using Viterbi algorithm, should User interest profile extraction is carried out with well-established HMM model, the state output observed value O=O after obtaining will be handled₁O₂… O_TAs mode input, maximum probability in state tag sequence is therefrom found out, the content of user characteristics extraction is exactly labeled For the observation text of dbjective state label；

2) analyzing the contextual information that reflects user interest: the user's true interest over a period of time is derived by analyzing the user's search, browsing behavior and purchase record information;

3) constructing the user interest ontology model that incorporates context: first, region, gender, age, marital status, educational background and income, the key factors influencing user interest, are taken as contextual factor indicators and, combined with the user's historical purchase information and behavior features, are subjected to fuzzy processing to obtain the user's interest level; then, using the ontology-context representation method, the user interest ontology model is constructed through multi-granularity division;

4) user interest drift detection method based on hidden semi-Markov model：

Two observed values are chosen to describe the user's browsing behavior: a) the sequence of browse paths by which the user accesses web pages; b) the time interval between leaving one web page and reaching another; the set of all states is expressed as S={S₁,S₂,…,S_N}, the corresponding set of observed values as V={v₁,v₂,…,v_N}, and the time intervals as the set I={1,2,…}; for a given browsing behavior of the user, the number of links in its browse path is a random variable, and the number of observed values output in a given state, which characterizes this browsing behavior, takes values in the set {1,…,D}; the user's browse path sequence is expressed as the two-dimensional observation sequence O={(r₁,τ₁),…,(r_T,τ_T)}, where r_t∈V represents the web content object browsed by the user and τ_t∈I represents the time interval between pages r_t and r_t-1 when the user jumps from one page to another; the output probability matrix of the model is B={b_i(v,q)}, where, for a given state i∈S, b_i(v,q) is the probability that the user is on page r_t=v∈V with time interval τ_t=q∈I from the previous page, satisfying ∑_v,q b_i(v,q)=1; P is the state-duration probability matrix of the hidden semi-Markov model, P={p_i(d)}, where p_i(d) is the probability that the number of observed values output in given state i is d∈{1,…,D}, satisfying ∑_d p_i(d)=1; the state transition probability matrix is A={a_ij}, where a_ij is the probability of a transition from i∈S to j∈S; the initial probability vector is π={π_i}, where π_i is the probability that the initial state is i∈S;
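How a raw click log might be turned into the two-dimensional observation sequence O={(r_t,τ_t)} is sketched below. The function name, the `bin_seconds` discretization and the decision to use the first click only as a time reference are assumptions; the text itself requires only that τ_t be a positive integer in I.

```python
import math


def build_observations(clicks, bin_seconds=10):
    """Convert time-ordered (page_id, unix_time) clicks into [(r_t, tau_t), ...].

    tau_t is the elapsed time since the previous click, rounded up to a whole
    number of bins and clamped to at least 1, so that it lies in I = {1, 2, ...}.
    Assumption: the first click has no predecessor, so it only provides the
    reference time and produces no observation of its own.
    """
    observations = []
    for t in range(1, len(clicks)):
        page, ts = clicks[t]
        _, prev_ts = clicks[t - 1]
        tau = max(1, math.ceil((ts - prev_ts) / bin_seconds))
        observations.append((page, tau))
    return observations


if __name__ == "__main__":
    log = [("home", 0), ("list", 5), ("item", 32)]
    print(build_observations(log))
```

The bin width trades resolution against the size of the emission table b_i(v,q): coarser bins give fewer distinct values of q and therefore fewer parameters to estimate.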

An important interest behavior record of a user is defined as: U_interest={user, background, history, behavior, timestamp, content}, where user denotes the user; background denotes the user's specific contextual factors; history denotes the user's purchase history; behavior identifies the result of a specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; content denotes the interest topic content;

In a user access transaction, there is an access transition probability P(q_i→q_j) between any two behavior operations, expressed as follows:
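The six-field record U_interest could be represented, for example, as a small data class. The field types below are assumptions chosen for illustration; the text names the fields but does not prescribe their representation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class UInterest:
    """One interest behavior record; field names follow U_interest."""
    user: str                   # user identifier
    background: Dict[str, str]  # contextual factors, e.g. region, age group
    history: List[str]          # purchase history (item identifiers, assumed)
    behavior: str               # result of a specific interest behavior operation
    timestamp: datetime         # time at which the behavior was performed
    content: str                # interest topic content


if __name__ == "__main__":
    record = UInterest(
        user="u001",
        background={"region": "Zhejiang", "education": "bachelor"},
        history=["item-17", "item-42"],
        behavior="add_to_cart",
        timestamp=datetime(2014, 6, 17, 10, 30),
        content="running shoes",
    )
    print(record)
```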

<mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>&RightArrow;</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>|</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <mfrac> <mrow> <msub> <mi>&theta;</mi> <mn>1</mn> </msub> <msub> <mi>W</mi> <mi>B</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>2</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>H</mi> <mi>I</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>3</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>I</mi> <mi>B</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>4</mn> </msub> <msub> <mi>W</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>&theta;</mi> <mn>1</mn> </msub> <msub> <mi>W</mi> <mi>B</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>2</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>H</mi> <mi>I</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>3</mn> </msub> <msub> <mi>W</mi> <mrow> <mi>I</mi> <mi>B</mi> 
</mrow> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> <mo>+</mo> <msub> <mi>&theta;</mi> <mn>4</mn> </msub> <msub> <mi>W</mi> <mi>L</mi> </msub> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>i</mi> </msub> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>&NotEqual;</mo> <mi>j</mi> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mn>0</mn> <mo>,</mo> </mrow> </mtd> <mtd> <mrow> <mi>i</mi> <mo>=</mo> <mi>j</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>6</mn> <mo>)</mo> </mrow> </mrow>
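Formula (6) is a ratio of two θ-weighted sums and can be evaluated as in the following sketch. The meaning of the four weights W_B, W_HI, W_IB and W_L is not restated here, so the code simply takes them as numbers in that order; the function name and the zero returned for an empty denominator are assumptions.

```python
def transition_probability(i, j, pair_weights, single_weights, theta):
    """Evaluate P(q_i -> q_j) as in formula (6).

    pair_weights   = (W_B(q_i,q_j), W_HI(q_i,q_j), W_IB(q_i,q_j), W_L(q_i,q_j))
    single_weights = (W_B(q_i),     W_HI(q_i),     W_IB(q_i),     W_L(q_i))
    theta          = (theta_1, theta_2, theta_3, theta_4)
    """
    if i == j:
        return 0.0  # formula (6) assigns 0 to self-transitions
    numerator = sum(t * w for t, w in zip(theta, pair_weights))
    denominator = sum(t * w for t, w in zip(theta, single_weights))
    # Assumption: an operation with zero total weight has no outgoing transitions.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    theta = (0.4, 0.3, 0.2, 0.1)
    print(transition_probability(1, 2, (2, 1, 0, 1), (4, 2, 2, 2), theta))
```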

For each q_j and its corresponding observed value there is an observed-value probability distribution, namely the probability that the user is interested in that observed value over all of the user's accesses to q_j; it can be expressed through the set Q_i={q₁',…,q'_f | q'∈IC} of access node states contained in the i-th access transaction, where Q_i,j denotes the set of all access nodes in that transaction after q_j, and the nodes of Q_i,j that contain the observed value form a further set:

<mrow> <msub> <mi>Q</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>j</mi> </mrow> </msub> <mo>=</mo> <mfenced open = "{" close = ""> <mtable> <mtr> <mtd> <mrow> <mo>{</mo> <msubsup> <mi>q</mi> <mrow> <mi>k</mi> <mo>+</mo> <mi>l</mi> </mrow> <mo>&prime;</mo> </msubsup> <mo>|</mo> <msubsup> <mi>q</mi> <mi>k</mi> <mo>&prime;</mo> </msubsup> <mo>=</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>,</mo> <mi>l</mi> <mo>=</mo> <mn>0</mn> <mo>,</mo> <mn>...</mn> <mo>,</mo> <mrow> <mo>(</mo> <mi>f</mi> <mo>-</mo> <mi>k</mi> <mo>)</mo> </mrow> <mo>}</mo> <mo>,</mo> <msub> <mi>q</mi> <mi>j</mi> </msub> <mo>&Element;</mo> <msub> <mi>Q</mi> <mi>i</mi> </msub> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <mi>N</mi> <mi>u</mi> <mi>l</mi> <mi>l</mi> </mrow> </mtd> </mtr> </mtable> </mfenced> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>7</mn> <mo>)</mo> </mrow> </mrow>
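Read literally, formula (7) selects the part of the node sequence that starts at q_j (the index l starts at 0, so q_j itself is included) and yields Null when q_j does not occur. A minimal sketch, assuming the nodes are kept as an ordered Python list and that the first occurrence of q_j is the one that matters:

```python
def nodes_from(q_i, q_j):
    """Return Q_{i,j} as defined by formula (7).

    q_i is the ordered list of access nodes q'_1 ... q'_f.
    The result is the sub-list from the first occurrence of q_j to the end,
    or None (the formula's "Null") when q_j is not in q_i.
    """
    if q_j not in q_i:
        return None
    k = q_i.index(q_j)
    return q_i[k:]


if __name__ == "__main__":
    print(nodes_from(["home", "list", "item", "cart"], "list"))
```

When q_j occurs several times, the union of the sets produced by every matching position equals the part starting at the first occurrence, which is why only that position is used.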

The observed-value probability distribution of the user at q_j is defined as:

On this basis, a state sequence can be found among all possible access sequences of the user, and the hidden semi-Markov model of the user's interest behavior is established so that this sequence has the maximum access probability:

<mrow> <msub> <mi>P</mi> <mrow> <mi>m</mi> <mi>a</mi> <mi>x</mi> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>&sigma;</mi> <mi>z</mi> <mi>k</mi> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mi>arg</mi> <mi>max</mi> <mo>&Pi;</mo> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>q</mi> <mi>k</mi> </msub> <mo>&RightArrow;</mo> <msub> <mi>q</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mo>)</mo> </mrow> <mi>P</mi> <mrow> <mo>(</mo> <msubsup> <mi>&sigma;</mi> <mi>z</mi> <mi>k</mi> </msubsup> <mo>|</mo> <msub> <mi>q</mi> <mi>k</mi> </msub> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow>

In detecting user interest drift, the observation sequences for the HSMM are first collected, and the data are preprocessed before the model is trained; after the model parameters are determined, the HSMM algorithm is invoked to obtain the probability value that the user's interest is unchanged, this value being computed as the average log-likelihood; when the user's interest value lies within the normal range, the user data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
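The decision rule can be sketched as follows, assuming the log-likelihood of each sequence has already been computed by the trained HSMM. The per-observation normalization, the use of the training mean as the threshold point, and the `tolerance` parameter that widens the "normal range" are assumptions for illustration.

```python
from statistics import mean


def drift_threshold(training):
    """Mean of the average log-likelihoods of the training sequences.

    training is a list of (log_likelihood, sequence_length) pairs.
    """
    return mean(ll / n for ll, n in training)


def has_drifted(log_likelihood, length, threshold, tolerance=0.0):
    """Flag drift when the average log-likelihood falls below the normal range.

    tolerance is a hypothetical margin below the threshold that still counts
    as normal behavior.
    """
    return log_likelihood / length < threshold - tolerance


if __name__ == "__main__":
    training = [(-10.0, 5), (-12.0, 4), (-8.0, 4)]
    threshold = drift_threshold(training)
    for ll, n in [(-8.0, 5), (-30.0, 5)]:
        if has_drifted(ll, n, threshold):
            print("interest drift detected")
        else:
            training.append((ll, n))  # normal: reuse the data for retraining
            print("normal, added to training data")
```

A sequence judged normal is appended to the training data, mirroring the parameter update described above; retraining the HSMM itself is not shown.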
2. The user interest mining method incorporating ontology context as claimed in claim 1, characterized in that: in step 1), user personalized information is obtained in two ways: (a) it is collected through online surveys in which users themselves participate; (b) the user's interest information is obtained by tracking user behavior and applying a feature extraction method to the user behavior data.
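3. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 2), the user's behavior information includes the user's search keywords, purchase history and browsing history.
4. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 3), according to the user's interest context information, when the user ontology context is constructed, the user context is divided into user individual context, user environment context and user device context; the ontology takes the form of a hierarchical concept tree, and each node in the tree represents an element of the user context.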
A kind of 4. Users' Interests Mining method for incorporating body situation as claimed in claim 1 or 2, it is characterised in that：It is described In step 3), according to the interest situation information of user, in User-ontology situation is built, user context is divided into user's individual Situation, user environment situation and user equipment situation, body use the form of level concept tree, a certain element of user context Exactly represented by each node in tree.