CN104156359A - Linking information recommendation method and device - Google Patents

Info

Publication number
CN104156359A
Authority
CN
China
Prior art keywords
document content
word
user side
document
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310174941.2A
Other languages
Chinese (zh)
Other versions
CN104156359B (en
Inventor
潘璇
程刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201310174941.2A
Publication of CN104156359A
Application granted
Publication of CN104156359B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an internal-link information recommendation method and device. The method comprises: receiving a request for document content sent by a client, and looking up the corresponding document content; identifying the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and selecting the mining model matched with that service type; and, according to the matched mining model, mining the words to be linked from the found document content, linking the mined words in the document content, and sending the document content with the linked words to the client. With this method and device, the server can actively send document content with linked words to the client in real time, which improves system efficiency and saves substantial labor cost.

Description

Internal-link information recommendation method and device
Technical field
The present invention relates to computer network technology, and in particular to an internal-link information recommendation method and device.
Background
When a client accesses products that contain text content, such as e-commerce platforms, communities, or instant messaging, the server usually selects, based on client click-through rates or recent hot topics, certain proper terms from that text content (such as personal names, place names, historical events, or web addresses) as internal-link information and sends them to the client for access. When the client clicks an item of internal-link information, such as an internal-link word, it jumps to the search results page, the encyclopedia entry page, or another page corresponding to that word.
At present, internal-link words are usually collected, counted, and ranked manually by back-end staff and then pushed to the client. Because the content accessed by network users is vast and varied, new words and social hot topics keep emerging, and information is updated quickly, pushing links after manual collection, counting, and ranking wastes considerable manpower, material resources, and time. Internal-link recommendation is therefore inefficient, and the process is not intelligent.
Summary of the invention
The main purpose of the present invention is to provide an internal-link information recommendation method and device, so that the server can actively send internal-link information to the client.
An embodiment of the invention discloses an internal-link information recommendation method, comprising the following steps:
Receiving a request for document content sent by a client, and looking up the corresponding document content;
Identifying the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and selecting the mining model adapted to that service type;
According to the mining model adapted to the service type, mining the words to be linked from the found document content, linking the mined words in the document content, and sending the document content with the linked words to the client.
An embodiment of the invention also discloses an internal-link information recommendation device, comprising:
A front-end service module, for receiving a request for document content sent by a client and looking up the corresponding document content;
A model adaptation module, for identifying the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and selecting the mining model adapted to that service type;
An online mining module, for mining the words to be linked from the found document content according to the mining model adapted to the service type, linking the mined words in the document content, and sending the document content with the linked words to the client.
The present invention receives a request for document content sent by a client and looks up the corresponding document content; identifies the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and selects the mining model adapted to that service type; and, according to that mining model, mines the words to be linked from the found document content, links the mined words in the document content, and sends the document content with the linked words to the client. The server can thus actively send document content with linked words to the client in real time, which improves system efficiency and saves substantial labor cost.
Brief description of the drawings
Fig. 1 is a flow chart of a first embodiment of the internal-link information recommendation method of the present invention;
Fig. 2 is a flow chart of a second embodiment of the internal-link information recommendation method of the present invention;
Fig. 3 is a flow chart of an embodiment of obtaining and storing the historical data of client access to all document content and the mining models in the internal-link information recommendation method of the present invention;
Fig. 4 is a flow chart of an embodiment of the interaction among the offline mining system, the real-time service system, and the user service end when internal links are mined in real time with the internal-link information recommendation method of the present invention;
Fig. 5 is a functional block diagram of a first embodiment of the internal-link information recommendation device of the present invention;
Fig. 6 is a functional block diagram of a second embodiment of the internal-link information recommendation device of the present invention;
Fig. 7 is a functional block diagram of an embodiment of the offline mining module in the internal-link information recommendation device of the present invention;
Fig. 8 is a structural diagram of an embodiment of the deployment of the functional modules when the internal-link information recommendation device of the present invention mines internal links in real time.
The realization, functional characteristics, and advantages of the present invention are further described below in connection with the embodiments and with reference to the drawings.
Detailed description
The technical solution of the present invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a flow chart of a first embodiment of the internal-link information recommendation method of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S01: receive a request for document content sent by a client, and look up the corresponding document content;
After receiving the request for document content sent by the client, the server performs data processing on the request, such as validity checking and normalization, responds to the client's valid request, and looks up the document content corresponding to the request.
Step S02: identify the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and select the mining model adapted to that service type;
After finding the document content corresponding to the client's request, the server identifies the service type of the requested document content according to the pre-stored historical records of client access to all document content and the pre-stored mining models.
The service types corresponding to document content include instant messaging, for example the document content (such as Chinese or foreign-language words) carried by instant messaging products such as QQ, Fetion, WeChat, and MSN. They also include applications presented as web pages, such as communities, e-commerce platforms, and search pages, for example the document content of Wenwen, Paipai, microblog, and Soso pages.
According to the identified service type of the document content, the server selects from the pre-stored database the internal-link-word mining model adapted to that service type.
Step S03: according to the mining model adapted to the service type, mine the words to be linked from the found document content, link the mined words in the document content, and send the document content with the linked words to the client.
According to the selected mining model adapted to the service type, the server mines internal-link words from the found document content in real time. Here, real-time mining of internal-link words means mining the words to be linked that appear in the found document content. After mining the words to be linked, the server links them in the document content and sends the document content with the linked words to the client.
In this embodiment and in the other embodiments of the internal-link information recommendation method and device of the present invention, a word that has been linked is called an internal-link word; this explanation is not repeated in the subsequent embodiments.
In this embodiment, internal-link words serve as links between the current document content and other content pages under the same site domain, which improves search-engine indexing and site weight. When browsing the document content, the client can click an internal-link word to jump directly to the page that the word links to, which improves system efficiency and saves the time the client would otherwise spend searching for the word.
Consider, for example, the following application scenario in the Wenwen Q&A community. A user's question may receive answers from several other users; if one of the answers is accepted, it becomes a satisfactory answer or featured knowledge. To improve search-engine indexing and site weight, to help the user understand other users' answers more thoroughly, and to save the user's search time, the server can link certain proper terms in the satisfactory answer (such as personal names, place names, historical events, or material names), making them internal-link words. When the server receives a trigger from the client, such as the user clicking an internal-link word, it directs the search engine to jump to the search results page, the encyclopedia entry page, or another page corresponding to that word. This improves search-engine indexing and site weight and also makes lookup easier for the user.
This embodiment receives a request for document content sent by a client and looks up the corresponding document content; identifies the service type corresponding to the document content according to stored historical data of client access to all document content and stored mining models, and selects the mining model adapted to that service type; and, according to that mining model, mines the words to be linked from the found document content, links the mined words in the document content, and sends the document content with the linked words to the client. The server can thus actively send document content with linked words to the client in real time, which improves system efficiency and saves substantial labor cost.
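Steps S01 to S03 can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the in-memory `DOCUMENTS` and `MODELS` stores, the `service_type` field stored with each document, the toy `mine_qa` model, and the `/search?q=` URL pattern are all assumptions introduced for the example.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical document store; the text only says documents are looked up.
DOCUMENTS = {
    "q1": {"service_type": "qa", "text": "A honeymoon trip needs planning."},
}

def mine_qa(text: str) -> List[str]:
    """Toy stand-in for a mining model: pick words from a fixed list."""
    return [w for w in ("honeymoon",) if w in text]

# Hypothetical registry mapping a service type to its adapted mining model.
MODELS: Dict[str, Callable[[str], List[str]]] = {"qa": mine_qa}

def link_words(text: str, words: List[str], url_for: Callable[[str], str]) -> str:
    """Wrap the first occurrence of each mined word in an anchor tag."""
    if not words:
        return text
    # Longest words first, single pass, so inserted markup is never re-matched.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    seen = set()

    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word in seen:
            return word
        seen.add(word)
        return f'<a href="{url_for(word)}">{word}</a>'

    return pattern.sub(repl, text)

def handle_request(doc_id: str) -> Optional[str]:
    doc = DOCUMENTS.get(doc_id)            # S01: look up the document
    if doc is None:
        return None
    model = MODELS[doc["service_type"]]    # S02: select the adapted model
    words = model(doc["text"])             # S03: mine the words to link
    return link_words(doc["text"], words, lambda w: f"/search?q={w}")
```

In this sketch the service type is simply read from the stored document; the text instead describes identifying it from historical access data and the stored models, which is not modelled here.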
Referring to Fig. 2, Fig. 2 is a flow chart of a second embodiment of the internal-link information recommendation method of the present invention. This embodiment differs from the embodiment of Fig. 1 only in adding step S10: obtain and store the historical data of client access to all document content and the mining models. Only step S10 is described here; for the other steps of the method, refer to the descriptions of the related embodiments.
As shown in Fig. 2, before step S01 (receiving a request for document content sent by a client and looking up the corresponding document content), the method further comprises:
Step S10: obtain and store the historical data of client access to all document content and the mining models.
The server obtains the historical data of client access to all document content and the mining models, and stores both in a database. The mining models in this embodiment include the models the server uses online to mine the words to be linked.
The historical data that the server obtains covers any product containing digital content that users have accessed in communities, e-commerce platforms, and search pages, such as user questions and satisfactory answers, featured knowledge, microblog posts, and the blogs and news of space communities such as Baidu Space and QQ Space.
One way for the server to obtain a mining model is to use high-frequency co-occurrence: for example, if two words co-occur in the same document with the highest frequency, the server identifies them as related, similar, or near-synonymous words. The server may also obtain a mining model from a general scoring algorithm model, such as a ranking model or the BM25 algorithm model. There is at least one mining model.
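The high-frequency co-occurrence idea can be illustrated with a small counting sketch. It assumes documents are already segmented into word lists and counts each unordered pair once per document; the function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

def cooccurrence_counts(docs: Iterable[List[str]]) -> Counter:
    """Count, for every unordered word pair, the documents containing both."""
    counts: Counter = Counter()
    for words in docs:
        # A set removes repeats so each pair counts once per document;
        # sorting gives every pair a canonical (a, b) order.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts

docs = [
    ["honeymoon", "love", "travel"],
    ["love", "honeymoon"],
    ["love", "work"],
]
counts = cooccurrence_counts(docs)
best_pair: Tuple[str, str] = counts.most_common(1)[0][0]
```

Under this reading, the pair with the highest count is treated as the most strongly related; any threshold or normalization would be a further design choice not specified in the text.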
Referring to Fig. 3, Fig. 3 is a flow chart of an embodiment of obtaining and storing the historical data of client access to all document content and the mining models. As shown in Fig. 3, in a preferred embodiment this comprises:
Step S11: obtain the historical data of client access to all document content;
Step S12: segment the historical data of all document content into words, and save the segmented document content in a knowledge base;
After obtaining the historical data of all document content accessed by clients, the server treats that data as a corpus and segments it. Segmentation in this embodiment means Chinese word segmentation, that is, cutting a sequence of Chinese characters into individual words.
In this embodiment, the server can segment the document content with a general segmentation dictionary and save the segmented document content in a database; for ease of lookup, the segmented content can be saved directly in the knowledge base of the database. Because products of different service types carry different document content, the server can also add the common words and specific terms of each service type to the knowledge base. For an interactive Q&A community, for example, it can add common phrases, two-part allegorical sayings, and specific terms. The server can treat the segmented words of each document as a set, and form a semantic tree from each group of sets whose words have identical or similar meanings; the semantic trees together make up the knowledge base.
Because the knowledge base differs with the document content of different service types, the segmented document content of each service type can be saved in its own knowledge base; alternatively, all segmented document content can share one knowledge base. In a preferred embodiment, the segmented document content of each service type is saved in its own corresponding knowledge base.
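Dictionary-based segmentation can be illustrated with forward maximum matching. This is one common dictionary technique chosen for the example; the text does not say which segmentation algorithm is used, and the dictionary contents and `max_len` value below are assumptions.

```python
from typing import List, Set

def segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop advances.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical general dictionary plus service-specific terms.
dictionary = {"蜜月", "旅行", "浪漫"}
tokens = segment("蜜月旅行很浪漫", dictionary)
```

Adding service-specific terms to the dictionary, as the text describes for Q&A communities, would change which pieces are recognized as whole words.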
Step S13: perform offline mining on the knowledge base according to a preset algorithm model, and obtain the offline-mined words to be linked;
According to the service type of the document content, the server can carry out offline mining independently for each service type; the products are kept separate and are fully independent of one another.
When the server performs offline mining on the knowledge base, it does so according to the relevance algorithm model for internal-link mining, and obtains the offline-mined words to be linked. The important parameters of the preset algorithm model for offline mining include relevance, user preference, and user click-through.
Taking the BM25 algorithm as an example, the larger the BM25 value of a word appearing in the document content, the more relevant the word is. The BM25 value is computed as follows:
$$\mathrm{score}(w,d) = \mathrm{IDF}(w)\cdot\frac{f(w,d)\,(k_1+1)}{f(w,d) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgDL}}\right)}$$
where
$$\mathrm{IDF}(w) = \log\frac{N - n(w) + 0.5}{n(w) + 0.5}$$
In the formulas above, N is the total number of satisfactory-answer result pages; n(w) is the number of satisfactory-answer result pages that contain the word; f(w, d) is the number of times the word appears in this satisfactory-answer result page; |D| is the length of the text of the satisfactory-answer result page, measured in this embodiment as the number of words; avgDL is the average length of all satisfactory-answer result pages; k_1 and b are constants, where k_1 takes a value in [1.2, 2.0] and b is 0.75; and IDF(w) is the inverse document frequency of the word.
Suppose internal-link mining is performed on the satisfactory answer to an emotion-related question in an interactive Q&A community (such as Wenwen). The result page of this question is denoted d_0, and its document length is 12. Suppose the total number of emotion-category satisfactory-answer result pages in Wenwen is N_0 = 99000, and the average length of emotion-category documents is 10. The server segments the satisfactory answer and removes noise to obtain the topic words. The number of occurrences of each topic word across the emotion-category satisfactory-answer result pages, and the number of times it occurs in the document, are shown in the following table:
Topic word | Honeymoon | Change-place-reflect | Love
Occurrences across all documents | 54000 | 2000 | 90000
Occurrences in the current document | 5 | 2 | 6
Suppose the constant k_1 is 1.5. According to the BM25 formula above, the ratio inside the logarithm of IDF(w) evaluates as follows for each topic word; the scores below use these ratio values directly:
IDF ratio (honeymoon) = (99000 - 54000 + 0.5) / (54000 + 0.5) = 0.83;
IDF ratio (change-place-reflect) = (99000 - 2000 + 0.5) / (2000 + 0.5) = 48.49;
IDF ratio (love) = (99000 - 90000 + 0.5) / (90000 + 0.5) = 0.10;
Calculating with the algorithm model above gives:
score of "change-place-reflect" 65.09 > score of "honeymoon" 1.54 > score of "love" 0.19. If the mining model for the document content measures only the relevance between word and document, and only two internal-link words are taken, the set of words to be linked that is mined offline for this document is {"change-place-reflect", "honeymoon"}.
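The worked example can be re-computed with a short script. It is a sketch under stated assumptions: the function name and the `use_log` flag are hypothetical, and the flag defaults to `False` because the printed IDF figures (0.83, 48.49, 0.10) equal the ratio inside the logarithm rather than its logarithm. The other inputs (N = 99000, document length 12, average length 10, k_1 = 1.5, b = 0.75) come from the text.

```python
import math

def bm25_score(tf: int, doc_len: float, avg_len: float, n_docs: int,
               doc_freq: int, k1: float = 1.5, b: float = 0.75,
               use_log: bool = False) -> float:
    """BM25 score of one word in one document."""
    ratio = (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)
    # use_log=False mirrors the printed worked example; use_log=True
    # applies the logarithm as written in the IDF formula.
    idf = math.log(ratio) if use_log else ratio
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm

N0, DOC_LEN, AVG_LEN = 99000, 12, 10
# word -> (occurrences across all documents, occurrences in current document)
stats = {
    "honeymoon": (54000, 5),
    "change-place-reflect": (2000, 2),
    "love": (90000, 6),
}
scores = {w: bm25_score(tf, DOC_LEN, AVG_LEN, N0, df)
          for w, (df, tf) in stats.items()}
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
```

Because the script does not round the intermediate ratio, its scores can differ from the printed ones in the second decimal place. With `use_log=True` the absolute values change, and some become negative, so the choice matters if scores are compared against a fixed threshold.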
Those skilled in the art will appreciate that, for other specific algorithm models among the relevance algorithm models for internal-link mining, the way of obtaining the offline-mined words to be linked can follow the computation and procedure shown above for the BM25 algorithm model; it is not repeated here.
This embodiment can also compute word relevance from feature vectors. For example, the semantic tree of a word obtained during knowledge-base mining is used as the feature vector of the word, the words of the document content are used as a second feature vector, and the cosine of the angle between the two vectors is computed; the larger the cosine, the better the relevance.
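The cosine comparison can be sketched as follows. How a semantic tree is turned into a vector is not specified in the text, so the sparse word-to-weight dictionaries and their sample values here are purely illustrative assumptions.

```python
import math
from typing import Dict

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical vectors: related words of a candidate word, and document words.
word_vector = {"honeymoon": 1.0, "wedding": 1.0, "travel": 1.0}
doc_vector = {"honeymoon": 5.0, "love": 6.0, "travel": 1.0}
similarity = cosine(word_vector, doc_vector)
```

For non-negative weights like these the value lies between 0 and 1, so candidates can be ranked by it directly.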
Step S14: adjust the algorithm model according to the client click-through on the offline-mined words to be linked within a predetermined period and their relevance to the document content obtained by the client, so as to obtain and store the corresponding mining model.
After the words to be linked are obtained with the offline algorithm model, the algorithm model is adjusted continually according to the mining-model parameters, yielding and storing the mining model adapted to the document content of each service type.
The basis for adjusting the algorithm model includes the client click-through on the offline-mined words to be linked within a predetermined period and their relevance to the document content obtained by the client; it also includes the number of views, the number of users, and the traffic that the offline-mined words to be linked receive from clients within the predetermined period.
The adjustment stage is the stage in which the algorithm model is trained to obtain the adapted mining model; the server adjusts the parameters of the algorithm model and the mining model according to the quality of the training results, so as to reach the optimum. To keep the evaluation unbiased, the server needs to draw a certain number of samples from the question result pages of each category and assess the relevance and click-through rate of the internal-link words.
For example, with the mining model corresponding to the BM25 algorithm model, the total number of documents is N_0; on receiving the first mining request the total becomes (N_0 + 1), on receiving the second request of the same category it becomes (N_0 + 2), and so on. For a word that appears in a document, n(w) as computed in the mining model is n_0; on receiving the first document that contains the word, n(w) becomes (n_0 + 1), on receiving the second such document it becomes (n_0 + 2), and so on. Part of the obtained mining model therefore needs revising, but revising it in real time is difficult for the server, so by default the server uses the stored data; when the offline-mined sample is large enough, the new data generated by real-time mining requests has very little effect on the mining model. The obtained mining model needs to be adjusted at intervals, for example revised once every day, every week, or every month.
By obtaining and storing the historical data of client access to all document content and the mining models, this embodiment provides the prerequisite for the server to mine online in real time.
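The bookkeeping described above, where the live model keeps using stored counts and new documents are folded in only at the periodic adjustment, might look like the following sketch. The class and method names are hypothetical, and how the refresh is scheduled (daily, weekly, monthly) is left outside the example.

```python
from collections import Counter
from typing import Dict, Iterable

class CorpusStats:
    """Stored N and n(w) used by the live model, plus pending real-time updates."""

    def __init__(self, total_docs: int, doc_freq: Dict[str, int]) -> None:
        self.total_docs = total_docs          # N seen by the live model
        self.doc_freq = Counter(doc_freq)     # n(w) seen by the live model
        self._pending_docs = 0
        self._pending_freq: Counter = Counter()

    def observe(self, words: Iterable[str]) -> None:
        """Record one newly requested document without touching live values."""
        self._pending_docs += 1
        # set() so a word counts once per document, matching n(w).
        self._pending_freq.update(set(words))

    def refresh(self) -> None:
        """Periodic adjustment: fold the pending counts into the live values."""
        self.total_docs += self._pending_docs
        self.doc_freq.update(self._pending_freq)
        self._pending_docs = 0
        self._pending_freq.clear()

stats = CorpusStats(99000, {"love": 90000})
stats.observe(["love", "love", "trip"])
before = (stats.total_docs, stats.doc_freq["love"])
stats.refresh()
after = (stats.total_docs, stats.doc_freq["love"], stats.doc_freq["trip"])
```

Deferring the merge keeps request handling cheap, at the cost of slightly stale counts between refreshes, which the text argues is negligible when the offline sample is large.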
The implementation of the internal-link information recommendation method of the present invention is described again below, with the offline mining system, the real-time service system, and the user service end as the respective executing parties.
Building on the embodiments of Fig. 1, Fig. 2, and Fig. 3, refer to Fig. 4, which is a flow chart of an embodiment of the interaction among the offline mining system, the real-time service system, and the user service end when internal links are mined in real time with the method of the present invention. As shown in Fig. 4, the offline mining system obtains and stores the historical data of client access to all document content and the mining models, and trains the offline-mined models. When the user service end obtains new content, the real-time service system fetches from the offline mining system the corresponding knowledge, the offline-mined models, and the parameters of each model. When the user service end sends a request to the real-time service system, the real-time service system determines the service type of the request, adapts the corresponding algorithm model to that service type, mines in real time the words to be linked, and returns the mined internal-link words to the user service end for display to the user. Mining internal links with the method of this embodiment lets the server actively send document content with linked words to the client in real time, which improves system efficiency and saves substantial labor cost.
Please refer to Fig. 5, Fig. 5 is chain information recommendation apparatus the first embodiment high-level schematic functional block diagram in the present invention; As shown in Figure 5, in the present invention, chain information recommendation apparatus comprises: front end services module 01, model adaptation module 02 and online mining module 03.
Front end services module 01, the request of obtaining document content sending for receiving user side, searches corresponding described document content;
Front end services module 01 receives after the request of obtaining document content of user side transmission, above-mentioned request is carried out after the data processings such as validity judgement, standardization processing, the legal described request of response user side, and search the described document content corresponding with described request.
The model adaptation module 02 is configured to identify, according to the stored historical data of all document content accessed by clients and the stored mining models, the service type (COS) corresponding to the document content, and to select the mining model adapted to that service type.
After the front-end service module 01 finds the document content corresponding to the client's request, the model adaptation module 02 identifies the service type of the requested document content according to the pre-stored historical data of all document content accessed by clients and the pre-stored mining models.
The service types that the model adaptation module 02 identifies include instant-messaging services, whose document content is the content carried by instant-messaging tools such as QQ, Fetion, WeChat, and MSN, for example Chinese or foreign-language words. The service types also include applications presented as web pages, such as communities, e-commerce platforms, and search pages, whose document content is the content contained in pages such as "Ask" (Q&A), "Pat", microblog, and search pages.
According to the identified service type of the document content, the model adaptation module 02 selects, from a pre-stored database, the internal-link word mining model adapted to that service type.
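The selection step described above can be sketched as a lookup from service type to stored mining model. This is only a minimal illustration: the registry, the service-type keys (`qa_community`, `instant_messaging`), and the placeholder model functions are all hypothetical names, not part of the described device.

```python
from typing import Callable, Dict, List

# Hypothetical signature for a mining model: document text in, words to link out.
MiningModel = Callable[[str], List[str]]


def bm25_model(text: str) -> List[str]:
    """Placeholder standing in for a stored BM25-based mining model."""
    return []


def cooccurrence_model(text: str) -> List[str]:
    """Placeholder standing in for a stored co-occurrence mining model."""
    return []


# Assumed registry: one stored mining model per service type.
MODEL_REGISTRY: Dict[str, MiningModel] = {
    "qa_community": bm25_model,
    "instant_messaging": cooccurrence_model,
}


def select_mining_model(service_type: str,
                        default: MiningModel = bm25_model) -> MiningModel:
    """Return the model adapted to the service type, or a default if none is stored."""
    return MODEL_REGISTRY.get(service_type, default)
```

Falling back to a default model for an unknown service type is a design choice of this sketch; the text does not say what the device does in that case.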
The online mining module 03 is configured to mine, according to the mining model adapted to the service type, the words to be internally linked in the document content that was found, to add internal links to the mined words in the document content, and to send the document content with the internally linked words to the client.
According to the selected mining model adapted to the service type, the online mining module 03 mines internal-link words from the found document content in real time; that is, it mines the words to be internally linked that appear in the found document content. After mining these words, the online mining module 03 adds internal links to them in the document content and sends the document content with the internally linked words to the client.
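The linking step can be illustrated with a short sketch that wraps each mined word in an HTML anchor. Several things here are assumptions rather than details from the text: that the document content is plain text rendered as HTML, that only the first occurrence of each word is linked, and the function name `add_internal_links` and the example URL.

```python
import html
import re
from typing import Dict


def add_internal_links(text: str, link_targets: Dict[str, str]) -> str:
    """Wrap the first occurrence of each mined word in an HTML anchor.

    link_targets maps a mined word to the URL of its same-site target page.
    """
    if not link_targets:
        return html.escape(text)
    # Try longer words first so a longer term wins over a shorter one
    # that starts at the same position.
    words = sorted(link_targets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    linked = set()
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        word = match.group(0)
        if word in linked:
            continue  # later occurrences stay as plain text
        linked.add(word)
        parts.append(html.escape(text[pos:match.start()]))
        url = html.escape(link_targets[word], quote=True)
        parts.append(f'<a href="{url}">{html.escape(word)}</a>')
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

For example, `add_internal_links("Paris is in France. Paris is large.", {"Paris": "/wiki/Paris"})` links only the first "Paris" and leaves the rest of the sentence unchanged.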
In this embodiment, an internal-link word serves as an interface linking the current document content to other content pages under the same website domain name, which improves search-engine indexing and website weight. When the client browses the document content, clicking an internal-link word in it jumps directly to the page that the word links to, which improves system efficiency and saves the time the client would otherwise spend querying the word.
In this embodiment, the internal-link information recommendation device may have multiple online mining modules 03; these may be identical or different, depending on the requirements of the specific application scenario.
Take a Q&A community ("Ask") as an example of a specific application scenario. In a Q&A community, a user's question may receive answers from several other users; if an answer is adopted, it becomes a satisfactory answer or featured ("elite") knowledge. To improve search-engine indexing and website weight, and at the same time to help the user understand other users' answers more thoroughly and save the user's search time, the server may add internal links to certain proper nouns in the satisfactory answer (such as personal names, place names, historical events, and material names), so that these proper nouns become internal-link words. When the server receives a client-triggered instruction to open the page corresponding to an internal-link word, for example when the user clicks the word, the server controls the search engine to jump to the search results page, the encyclopedia entry page, or another page corresponding to the word, which improves search-engine indexing and website weight and also makes lookup easier for the user.
In this embodiment, a request to obtain document content sent by a client is received and the corresponding document content is looked up; the service type (COS) corresponding to the document content is identified according to the stored historical data of all document content accessed by clients and the stored mining models, and the mining model adapted to that service type is selected; and, according to that mining model, the words to be internally linked in the found document content are mined, internal links are added to the mined words, and the document content with the internally linked words is sent to the client. The server can thus send document content with internally linked words to the client in real time and proactively, which improves system efficiency and saves considerable labor cost.
Referring to Fig. 6, which is a functional block diagram of a second embodiment of the internal-link information recommendation device of the present invention: this embodiment differs from the embodiment of Fig. 5 only in that an offline mining module 04 is added. Only the offline mining module 04 is described in detail here; for the other modules of the device, refer to the descriptions of the related embodiments, which are not repeated here.
As shown in Fig. 6, the internal-link information recommendation device of the present invention further comprises:
an offline mining module 04, configured to obtain and store the historical data of all document content accessed by clients and the mining models.
The offline mining module 04 obtains the historical data of all document content accessed by clients and the mining models, and stores them in a database. The mining models in this embodiment include the model that the online mining module 03 uses to mine, online, the words to be internally linked.
The historical data obtained by the offline mining module 04 includes the data corresponding to any product containing digital content that users have accessed in communities, e-commerce platforms, and search pages, such as users' questions and satisfactory answers, featured knowledge, microblog posts, and the blogs and news of space communities such as Baidu Space and QQ Space.
The mining models obtained by the offline mining module 04 include a model based on high-frequency co-occurrence: for example, if two words have the highest frequency of appearing together in a single document, the server identifies them as related, similar, or near-synonymous words. The mining models also include general scoring algorithm models, such as ranking models and the BM25 algorithm model. There is at least one mining model.
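A high-frequency co-occurrence model of the kind mentioned above could be approximated as follows. This is a minimal sketch under the assumption that co-occurrence is counted once per document for each unordered pair of distinct words; the function names and the sample documents are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_counts(documents: Iterable[List[str]]) -> Counter:
    """Count, for each unordered word pair, the documents containing both words."""
    counts: Counter = Counter()
    for words in documents:
        # Sorting the distinct words gives each pair a canonical (a, b) order.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def most_related(word: str, counts: Counter, top_n: int = 1) -> List[Tuple[str, int]]:
    """Return the words that co-occur most often with the given word."""
    scored = []
    for (a, b), count in counts.items():
        if a == word:
            scored.append((b, count))
        elif b == word:
            scored.append((a, count))
    # Highest count first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


docs = [
    ["wedding", "honeymoon", "travel"],
    ["wedding", "honeymoon"],
    ["travel", "visa"],
]
counts = cooccurrence_counts(docs)
```

With these sample documents, `most_related("wedding", counts)` returns `[("honeymoon", 2)]`, since the two words appear together in two of the three documents.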
Referring to Fig. 7, which is a functional block diagram of an embodiment of the offline mining module of the internal-link information recommendation device of the present invention: as shown in Fig. 7, in this embodiment the offline mining module 04 comprises a data acquisition unit 041, a data segmentation unit 042, an offline mining unit 043, and a model training unit 044.
The data acquisition unit 041 is configured to obtain the historical data of all document content accessed by clients.
The data segmentation unit 042 is configured to perform word segmentation on the historical data of all the document content, and to store all the segmented document content in a knowledge base.
After the data acquisition unit 041 obtains the historical data of all document content accessed by clients, the data segmentation unit 042 takes this historical data as a corpus and segments it into words. Word segmentation in this embodiment refers to Chinese word segmentation, that is, cutting a sequence of Chinese characters into individual words.
In this embodiment, the data segmentation unit 042 may use a general segmentation dictionary to segment the document content and store the segmented content in a database; to make lookup easier, the segmented content may be stored directly in a knowledge base of the database. Because products of different service types have different document content, the data segmentation unit 042 may also add the common expressions and domain-specific terms of each service type to the knowledge base. For an interactive Q&A community, for example, it may add common phrases, two-part allegorical sayings (xiehouyu), and domain-specific terms. The data segmentation unit 042 may treat the segmented words of each document as a set, and compose a semantic tree from a set of words with identical or similar meanings; the semantic trees together make up the knowledge base.
Because knowledge bases differ according to the document content of different service types, the data segmentation unit 042 may store the segmented document content of different service types separately in corresponding knowledge bases; alternatively, all segmented document content may share one knowledge base. In a preferred embodiment, the data segmentation unit 042 stores the segmented document content of each service type in its own knowledge base.
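Dictionary-based segmentation with one knowledge base per service type might look like the sketch below. The text only says that a general segmentation dictionary is used; forward maximum matching is substituted here as one simple, well-known dictionary method, and the dictionary contents, the `qa_community` key, and the in-memory `knowledge_bases` store are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position.

    Characters not covered by any dictionary word are emitted one at a time.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Assumed in-memory store: service type -> list of segmented documents.
knowledge_bases: Dict[str, List[List[str]]] = defaultdict(list)


def store_segmented(service_type: str, text: str, dictionary: Set[str]) -> None:
    """Segment a document and keep it in the knowledge base of its service type."""
    knowledge_bases[service_type].append(segment(text, dictionary))
```

Keeping a separate list per service type mirrors the preferred embodiment above, in which each service type has its own knowledge base.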
The offline mining unit 043 is configured to mine the knowledge base offline according to a preset algorithm model, to obtain the offline-mined words to be internally linked.
According to the service type of the document content, the offline mining unit 043 may mine each service type offline independently; the products are separated from one another and fully independent.
When the offline mining unit 043 mines the knowledge base offline, it does so according to the relevance algorithm model for internal-link mining, and obtains the offline-mined words to be internally linked. The important parameters of the preset algorithm model for offline mining include relevance, user preference, and user click degree.
Take the BM25 algorithm as an example: the larger the BM25 value of a word appearing in the document content, the more relevant the word is. The BM25 value is computed as follows:
$$\mathrm{score}(w,d) = \mathrm{IDF}(w)\cdot\frac{f(w,d)\,(k_1+1)}{f(w,d)+k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgDL}}\right)}$$
where
$$\mathrm{IDF}(w) = \log\frac{N-n(w)+0.5}{n(w)+0.5}$$
In the above formulas, N is the total number of satisfactory-answer result pages; n(w) is the number of satisfactory-answer result pages that contain the word w; f(w, d) is the number of times the word appears in the satisfactory-answer result page d; |D| is the length of that satisfactory-answer result page, measured in this embodiment by the number of words; avgDL is the average length of all satisfactory-answer result pages; k_1 and b are constants, where k_1 takes a value in [1.2, 2.0] and b is 0.75; and IDF(w) is the inverse document frequency of the word.
Consider internal-link mining on the satisfactory answer to an emotion-category question in an interactive Q&A community (such as "Ask"). The result page of this question is denoted d_0, and its document length is 12. Suppose that the total number of emotion-category satisfactory-answer result pages is N_0 = 99000 and that the average length of emotion-category documents is 10. The data segmentation unit 042 segments the satisfactory answer and removes noise; the resulting topic words, the number of emotion-category satisfactory-answer result pages in which each topic word occurs, and the number of times each topic word occurs in the current document are shown in the table below:
Topic word                                        honeymoon    change-place-reflect    love
Occurrences across all documents, n(w)            54000        2000                    90000
Occurrences in the current document, f(w, d_0)    5            2                       6
Suppose that the constant k_1 is 1.5 (with b = 0.75 as above). Using the BM25 formula, the offline mining unit 043 computes the following inverse-document-frequency terms; the values shown are the ratio (N - n(w) + 0.5)/(n(w) + 0.5) itself, without the logarithm applied:
IDF(honeymoon): (99000 - 54000 + 0.5)/(54000 + 0.5) = 0.83;
IDF(change-place-reflect): (99000 - 2000 + 0.5)/(2000 + 0.5) = 48.49;
IDF(love): (99000 - 90000 + 0.5)/(90000 + 0.5) = 0.10.
Based on the above algorithm model, the offline mining unit 043 obtains:
score("change-place-reflect") = 65.09 > score("honeymoon") = 1.54 > score("love") = 0.19. When the mining model corresponding to the document content is measured only by the relevance between word and document, and only two internal-link words are taken, the set of words to be internally linked that the offline mining unit 043 obtains for this document is {"change-place-reflect", "honeymoon"}.
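The worked example can be checked numerically with a short script. This is a sketch, not the patented implementation: the function and variable names are illustrative, and the `use_log` flag is an addition of this sketch. With `use_log=False` the inverse-document-frequency term is the plain ratio (N - n(w) + 0.5)/(n(w) + 0.5), which is what reproduces the figures of the example up to rounding; with `use_log=True` the natural logarithm from the IDF formula is applied.

```python
import math
from typing import List, Tuple


def bm25_score(n_docs: int, n_containing: int, term_freq: int,
               doc_len: float, avg_doc_len: float,
               k1: float = 1.5, b: float = 0.75,
               use_log: bool = True) -> float:
    """BM25 score of one word in one document."""
    ratio = (n_docs - n_containing + 0.5) / (n_containing + 0.5)
    idf = math.log(ratio) if use_log else ratio
    tf_part = term_freq * (k1 + 1) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_part


# Topic word -> (pages containing the word, occurrences in the current document).
EXAMPLE = {
    "change-place-reflect": (2000, 2),
    "honeymoon": (54000, 5),
    "love": (90000, 6),
}


def rank_example(use_log: bool = False) -> List[Tuple[str, float]]:
    """Score the three topic words with N = 99000, |D| = 12, avgDL = 10."""
    scores = {
        word: bm25_score(99000, n, f, 12, 10, use_log=use_log)
        for word, (n, f) in EXAMPLE.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


top_two = [word for word, _ in rank_example()[:2]]
```

In this particular example the ranking, and therefore the two selected words, is the same whether or not the logarithm is applied, although the score values differ.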
Those skilled in the art will understand that, where the offline mining unit 043 uses another specific algorithm model among the relevance algorithm models for internal-link mining, the words to be internally linked can be obtained by analogy with the computation and procedure described above for the BM25 algorithm model; the details are not repeated here.
In this embodiment, the offline mining unit 043 may also compute word relevance using feature vectors. For example, the semantic tree of a word obtained during knowledge-base mining is used as the feature vector of the word, and the words of the document content are used as a second feature vector; the offline mining unit 043 computes the cosine of the angle between these two feature vectors. The larger the cosine value, the better the relevance.
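The cosine measure can be sketched as follows, assuming that each feature vector is represented as a bag of words with counts. The text does not specify how a semantic tree is turned into a vector, so that representation is an assumption of this example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Hypothetical vectors: words from a semantic tree vs. words of a document.
tree_vector = Counter(["love", "honeymoon"])
doc_vector = Counter(["love", "travel"])
similarity = cosine_similarity(tree_vector, doc_vector)
```

Here the two vectors share one of their two words, so the cosine is about 0.5; identical vectors give 1 and vectors with no word in common give 0.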
The model training unit 044 is configured to adjust the algorithm model according to the client click degree, within a predetermined period, of the offline-mined words to be internally linked and according to the relevance of those words to the document content obtained by the client, so as to obtain and store the corresponding mining model.
After the offline mining unit 043 obtains the words to be internally linked according to the offline algorithm model, the model training unit 044 continually adjusts the algorithm model according to the mining-model parameters, thereby deriving and storing the mining model adapted to the document content of each service type.
The basis on which the model training unit 044 adjusts the algorithm model includes the client click degree, within a predetermined period, of the offline-mined words to be internally linked and the relevance of those words to the document content obtained by the client; it also includes the number of views, the number of users, and the traffic of those words on the client side within the predetermined period.
In the algorithm-model adjustment stage, that is, the stage in which the algorithm model is trained to obtain the adapted mining model, the model training unit 044 adjusts the parameters of the algorithm model and the mining model according to the quality of the training results, so as to reach the optimum. To keep the evaluation free of bias, the model training unit 044 needs to extract a certain number of samples from the question result pages of each category and evaluate the relevance and click-through rate of the internal-link words.
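The per-category sampling and click-through evaluation could be sketched as below. The text does not say how samples are drawn or how click-through rate is defined; uniform random sampling within each category and clicks divided by impressions are assumptions of this sketch, as are the function names and the fixed seed.

```python
import random
from typing import Dict, List


def sample_per_category(pages_by_category: Dict[str, List[str]],
                        sample_size: int, seed: int = 0) -> Dict[str, List[str]]:
    """Draw up to sample_size result pages from every category."""
    rng = random.Random(seed)  # fixed seed keeps an evaluation run repeatable
    return {
        category: rng.sample(pages, min(sample_size, len(pages)))
        for category, pages in pages_by_category.items()
    }


def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks on an internal-link word divided by the times it was shown."""
    return clicks / impressions if impressions else 0.0
```

Sampling the same number of pages from each category, rather than from the pooled pages, keeps a large category from dominating the evaluation.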
For example, when the model training unit 044 uses the mining model corresponding to the BM25 algorithm model and the total number of documents is N_0, the total becomes (N_0 + 1) when the first mining request is received, (N_0 + 2) when the second request of the same category is received, and so on. Likewise, for a word that appears in a document and whose n(w) computed in the mining model is n_0, n(w) becomes (n_0 + 1) with that document, (n_0 + 2) when a second document containing the word is received, and so on. Parts of the mining model obtained by the model training unit 044 therefore need to be revised, but revising them in real time is difficult. The model training unit 044 uses the stored data by default, on the basis that when the sample mined offline is large enough, the new data produced by real-time mining requests has very little effect on the mining model. The mining model obtained by the model training unit 044 needs to be adjusted at intervals, for example once every other day, once a week, or once a month.
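The bookkeeping described above, in which N and n(w) each grow by one per newly seen document, can be sketched with a small statistics holder. The class name `CorpusStats` and its interface are hypothetical; in line with the text, such updates would be applied in a periodic batch rather than on every real-time request.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


class CorpusStats:
    """Document-level counts used by the BM25 model: N and n(w)."""

    def __init__(self, total_docs: int,
                 doc_freq: Optional[Dict[str, int]] = None) -> None:
        self.total_docs = total_docs              # N
        self.doc_freq = Counter(doc_freq or {})   # n(w) per word

    def add_document(self, words: Iterable[str]) -> None:
        """Fold one new document into the statistics."""
        self.total_docs += 1
        # A word counts once per document, however often it occurs in it.
        for word in set(words):
            self.doc_freq[word] += 1


stats = CorpusStats(99000, {"love": 90000})
stats.add_document(["love", "love", "honeymoon"])
```

After this one update, N is 99001, n("love") is 90001, and n("honeymoon") is 1.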
By obtaining and storing the historical data of all document content accessed by clients and the mining models, this embodiment provides the prerequisite for the server to perform real-time online mining.
The working process of the internal-link information recommendation device of the present invention is described again below using a structural deployment in a practical application scenario.
Based on the embodiments described with reference to Figs. 5, 6, and 7, refer to Fig. 8, which is a schematic structural diagram of an embodiment of how the functional modules are deployed when the device performs real-time internal-link mining. The front-end service module shown in Fig. 8 is the front-end service module 01 described above; the model adaptation and distribution module has the function of the model adaptation module 02 described above; and the online mining module is the online mining module 03 described above. The log acquisition module, knowledge-base mining module, model training module, model storage module, and knowledge-base storage module shown in Fig. 8 together provide the function of the offline mining module 04 described above.
As shown in Fig. 8, in offline mining the log acquisition module obtains logs that contain document content, for example the historical access data of "Ask", microblogs, spaces, and communities, for subsequent knowledge-base mining and model training. The knowledge-base mining module segments the historical data obtained by the log acquisition module, and the knowledge-base storage module stores all the segmented document content in the knowledge base. Meanwhile, the model training module obtains the segmented document content from the knowledge-base mining module and performs model training on it according to the preset algorithm model, for example by mining it offline to obtain the offline-mined words to be internally linked, and hands the trained algorithm model to the model storage module for storage, for use by the online mining module. After the model adaptation and distribution module receives a content request sent by the front-end service module, it identifies the service type corresponding to the document content according to the request and selects the mining model adapted to that service type. According to the selected mining model, the corresponding online mining module mines the words to be internally linked in the found document content, adds internal links to the mined words in the document content, and returns the document content with the internally linked words to the model adaptation and distribution module, which then returns the internal-link words to the front-end service module that sent the content request.
As the deployment in Fig. 8 shows, the device in this embodiment may have multiple front-end service modules and multiple online mining modules, so that even when clients issue a large number of concurrent content requests, the device can still perform real-time internal-link mining efficiently and quickly and push document content containing internal-link words to the clients, which improves system efficiency and saves the time clients spend querying the corresponding words.
It should be noted that, in this document, the terms "comprise" and "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and comprises instructions that cause a device (which may be a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of its patent; any equivalent structure or equivalent process transformation made using the content of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. An internal-link information recommendation method, characterized by comprising the following steps:
receiving a request to obtain document content sent by a client, and looking up the corresponding document content;
identifying, according to stored historical data of all document content accessed by clients and stored mining models, the service type (COS) corresponding to the document content, and selecting the mining model adapted to the service type;
mining, according to the mining model adapted to the service type, the words to be internally linked in the document content that was found, adding internal links to the mined words in the document content, and sending the document content with the internally linked words to the client.
2. The method of claim 1, characterized in that, before the receiving of the request to obtain document content sent by the client, the method further comprises:
obtaining and storing the historical data of all document content accessed by clients and the mining models.
3. The method of claim 2, characterized in that the obtaining and storing of the historical data of all document content accessed by clients and the mining models comprises:
obtaining the historical data of all document content accessed by clients;
performing word segmentation on the historical data of all the document content, and storing all the segmented document content in a knowledge base;
mining the knowledge base offline according to a preset algorithm model, to obtain the offline-mined words to be internally linked;
adjusting the algorithm model according to the client click degree, within a predetermined period, of the offline-mined words to be internally linked and according to the relevance of those words to the document content obtained by the client, so as to obtain and store the corresponding mining model.
4. The method of claim 3, characterized in that, after the mining of the knowledge base offline according to the preset algorithm model to obtain the offline-mined words to be internally linked, the method further comprises:
adjusting the algorithm model according to the number of views, the number of users, and the traffic, on the client side within the predetermined period, of the offline-mined words to be internally linked, so as to obtain and store the corresponding mining model.
5. The method of claim 3 or 4, characterized in that the storing of all the segmented document content in a knowledge base comprises:
storing the segmented document content in different knowledge bases according to the type of the document content.
6. An internal-link information recommendation device, characterized by comprising:
a front-end service module, configured to receive a request to obtain document content sent by a client, and to look up the corresponding document content;
a model adaptation module, configured to identify, according to stored historical data of all document content accessed by clients and stored mining models, the service type (COS) corresponding to the document content, and to select the mining model adapted to the service type;
an online mining module, configured to mine, according to the mining model adapted to the service type, the words to be internally linked in the document content that was found, to add internal links to the mined words in the document content, and to send the document content with the internally linked words to the client.
7. The device of claim 6, characterized by further comprising:
an offline mining module, configured to obtain and store the historical data of all document content accessed by clients and the mining models.
8. The device of claim 7, characterized in that the offline mining module comprises:
a data acquisition unit, configured to obtain the historical data of all document content accessed by clients;
a data segmentation unit, configured to perform word segmentation on the historical data of all the document content, and to store all the segmented document content in a knowledge base;
an offline mining unit, configured to mine the knowledge base offline according to a preset algorithm model, to obtain the offline-mined words to be internally linked;
a model training unit, configured to adjust the algorithm model according to the client click degree, within a predetermined period, of the offline-mined words to be internally linked and according to the relevance of those words to the document content obtained by the client, so as to obtain and store the corresponding mining model.
9. The device of claim 8, characterized in that the model training unit is further configured to:
adjust the algorithm model according to the number of views, the number of users, and the traffic, on the client side within the predetermined period, of the offline-mined words to be internally linked, so as to obtain and store the corresponding mining model.
10. The device of claim 7 or 8, characterized in that the offline mining module is further configured to:
store the segmented document content in different knowledge bases according to the type of the document content.
CN201310174941.2A 2013-05-13 2013-05-13 Interior chain information recommends method and device Active CN104156359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310174941.2A CN104156359B (en) 2013-05-13 2013-05-13 Interior chain information recommends method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310174941.2A CN104156359B (en) 2013-05-13 2013-05-13 Interior chain information recommends method and device

Publications (2)

Publication Number Publication Date
CN104156359A true CN104156359A (en) 2014-11-19
CN104156359B CN104156359B (en) 2018-10-30

Family

ID=51881864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310174941.2A Active CN104156359B (en) 2013-05-13 2013-05-13 Interior chain information recommends method and device

Country Status (1)

Country Link
CN (1) CN104156359B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233671A1 (en) * 2006-03-30 2007-10-04 Oztekin Bilgehan U Group Customized Search
CN101493832A (en) * 2009-03-06 2009-07-29 辽宁般若网络科技有限公司 Website content combine recommendation system and method
CN102314454A (en) * 2010-06-30 2012-01-11 百度在线网络技术(北京)有限公司 Method and system for automatically adding internal links
CN102654875A (en) * 2011-03-04 2012-09-05 北京百度网讯科技有限公司 Method and device for automatically processing inner link of web text
CN103020090A (en) * 2011-09-27 2013-04-03 腾讯科技(深圳)有限公司 Method and device for providing link recommendation

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908767A (en) * 2017-11-29 2018-04-13 链家网(北京)科技有限公司 Chain processing method and processing device in the bottom of website
CN110543946A (en) * 2018-05-29 2019-12-06 百度在线网络技术(北京)有限公司 method and apparatus for training a model
CN110543946B (en) * 2018-05-29 2022-07-05 百度在线网络技术(北京)有限公司 Method and apparatus for training a model
CN109670030A (en) * 2018-12-30 2019-04-23 联想(北京)有限公司 Question and answer exchange method and device
CN109670030B (en) * 2018-12-30 2022-06-28 联想(北京)有限公司 Question-answer interaction method and device
CN109858528A (en) * 2019-01-10 2019-06-07 平安科技(深圳)有限公司 Recommender system training method, device, computer equipment and storage medium
WO2020143186A1 (en) * 2019-01-10 2020-07-16 平安科技(深圳)有限公司 Recommendation system training method and apparatus, and computer device and storage medium
CN109858528B (en) * 2019-01-10 2024-05-14 平安科技(深圳)有限公司 Recommendation system training method and device, computer equipment and storage medium
CN113792230A (en) * 2021-08-24 2021-12-14 北京百度网讯科技有限公司 Service linking method and device, electronic equipment and storage medium
CN113792230B (en) * 2021-08-24 2024-04-09 北京百度网讯科技有限公司 Service linking method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN104156359B (en) 2018-10-30

Similar Documents

Publication Publication Date Title
US11847612B2 (en) Social media profiling for one or more authors using one or more social media platforms
CN101551806B (en) Personalized website navigation method and system
US10049132B2 (en) Personalizing query rewrites for ad matching
CN102332006B (en) Information push control method and device
US9443019B2 (en) Optimized web domains classification based on progressive crawling with clustering
CN101641697B (en) Related search queries for a webpage and their applications
CN101119326B (en) Method and device for managing instant communication conversation record
US20080104034A1 (en) Method For Scoring Changes to a Webpage
KR20160055930A (en) Systems and methods for actively composing content for use in continuous social communication
CN103400286A (en) Recommendation system and method for user-behavior-based article characteristic marking
CN104008109A (en) User interest based Web information push service system
CN102737021B (en) Search engine and realization method thereof
CN104217031A (en) Method and device for classifying users according to search log data of server
CN102063469A (en) Method and device for acquiring relevant keyword message and computer equipment
CN103942268A (en) Method and device for combining search and application and application interface
CN104156359A (en) Linking information recommendation method and device
CN102722499A (en) Search engine and implementation method thereof
US11249993B2 (en) Answer facts from structured content
CN110188291B (en) Document processing based on proxy log
CN106407377A (en) Search method and device based on artificial intelligence
CN113297457A (en) High-precision intelligent information resource pushing system and pushing method
CN113869931A (en) Advertisement putting strategy determining method and device, computer equipment and storage medium
CN104636386A (en) Information monitoring method and device
CN101840438B (en) Retrieval system oriented to meta keywords of source document
CN104102727A (en) Query term recommending method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant