CN110309189B

CN110309189B - Method and device for acquiring heat of entity words

Info

Publication number: CN110309189B
Application number: CN201810203602.5A
Authority: CN
Inventors: 李潇; 郑孙聪
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2023-04-18
Anticipated expiration: 2038-03-13
Also published as: CN110309189A

Abstract

The invention discloses a method and a device for acquiring the heat (popularity) of entity words. The scheme comprises the following steps: acquiring a search log data set; obtaining the attention of an entity word according to the mention amount of the entity word in the search log data set; and performing multi-source fusion of the attention of the entity word with the importance of the entity word in an existing knowledge base to obtain the current heat of the entity word. Because the attention and the importance of the entity word are fused, the heat of entity words is acquired more accurately. This also avoids the waste of manpower and the slow judgment that result from people judging the heat of entity words subjectively.

Description

Method and device for acquiring heat of entity words

Technical Field

The invention relates to the technical field of data mining, in particular to a method and a device for acquiring heat of entity words.

Background

The popularity (heat) of an entity word refers to how popular the entity word is at the current stage. Entity word heat is very important in search engine processing because it directly influences the retrieval results returned for query words.

Acquiring the heat of entity words is a data mining task. Traditionally, the heat of an entity word has been judged largely by human subjective judgment. This wastes manpower, is slow, and is heavily affected by human factors.

At present, the heat of entity words is mainly obtained by computing PageRank values for the entity words from the link information among entity words in a knowledge graph. With the popularization and rapid development of the Internet, a large amount of news data emerges online every day. Network expressions are therefore becoming increasingly varied, and new words such as "rally" and "old drivers" appear endlessly. A knowledge graph is updated relatively slowly, so the link relations of such new network words are difficult to add to it quickly. As a result, the computed heat of these entity words is too low, and the accuracy of the acquired entity word heat is poor.

Disclosure of Invention

The invention provides a method for acquiring the heat of entity words. It aims to solve the problem of low accuracy in acquiring the heat of entity words in the related art.

The invention provides a method for acquiring heat of entity words, which comprises the following steps:

acquiring a search log data set;

obtaining the attention of the entity words according to the mention amount of the entity words in the search log data set;

and according to the importance degree of the entity word in the existing knowledge base, performing multi-source fusion on the attention degree and the importance degree of the entity word to obtain the current heat of the entity word.

In an exemplary embodiment, before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further includes:

matching each query statement in the search log data set that contains the entity word against expression mode templates, to obtain the number of successful matches between the query statements containing the entity word and the expression mode templates;

and accumulating the number of mentions of the entity word in the search log data set and the number of successful matches to obtain the mention amount of the entity word.

In another exemplary embodiment, before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further includes:

counting, according to the established entity attribute relationship, the number of times the entity word and its corresponding attribute words appear together in the search log data set, to obtain the forward matching count of the entity word;

and accumulating the number of mentions of the entity word in the search log data set and the forward matching count to obtain the mention amount of the entity word.

In yet another exemplary embodiment, before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further includes:

counting, according to the established entity attribute relationship, the number of times the entity word, acting as an attribute word, appears together with its corresponding entity in the search log data set, to obtain the reverse matching count of the entity word;

and accumulating the number of mentions of the entity word in the search log data set and the reverse matching count to obtain the mention amount of the entity word.

In an exemplary embodiment, the obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set includes:

normalizing the mention amount of the entity word to obtain the attention of the entity word.

In an exemplary embodiment, performing multi-source fusion on the attention and the importance of the entity word according to the importance of the entity word in the existing knowledge base, to obtain the current heat of the entity word, includes:

obtaining the cognitive popularity of the entity word through multi-source fusion of the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia website and the entity importance level of the entity word in the knowledge graph;

and fusing the cognitive popularity and the attention of the entity word to obtain the current heat of the entity word.

In an exemplary embodiment, before the cognitive popularity of the entity word is obtained through multi-source fusion of the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia website and the entity importance level of the entity word in the knowledge graph, the method further includes:

acquiring, from the encyclopedia website, the webpage address corresponding to the identification information of the entity word;

calculating, with the PageRank algorithm, the webpage importance level of the webpage at that address according to the constructed webpage link relationships in the encyclopedia website;

and calculating, with the PageRank algorithm, the entity importance level of the entity word according to the entity link relationships of the entity word in the knowledge graph.

In an exemplary embodiment, fusing the cognitive popularity of the entity word and the attention of the entity word to obtain the current heat of the entity word includes:

weighting and summing the cognitive popularity and the attention of the entity word with preset weight coefficients to obtain the current heat of the entity word.

The invention also provides a device for acquiring the heat of the entity words, which comprises:

a log acquisition module, configured to acquire a search log data set;

an attention obtaining module, configured to obtain the attention of an entity word according to the mention amount of the entity word in the search log data set;

and a heat obtaining module, configured to perform, according to the importance of the entity word in an existing knowledge base, multi-source fusion on the attention and the importance of the entity word to obtain the current heat of the entity word.

In an exemplary embodiment, the attention obtaining module further includes:

a first matching unit, configured to match each query statement in the search log data set that contains the entity word against expression mode templates, to obtain the number of successful matches between the query statements containing the entity word and the expression mode templates;

and a first accumulation unit, configured to accumulate the number of mentions of the entity word in the search log data set and the number of successful matches to obtain the mention amount of the entity word.

In an exemplary embodiment, the attention obtaining module further includes:

a second matching unit, configured to count, according to the established entity attribute relationship, the number of times the entity word and its corresponding attribute words appear together in the search log data set, to obtain the forward matching count of the entity word;

and a second accumulation unit, configured to accumulate the number of mentions of the entity word in the search log data set and the forward matching count to obtain the mention amount of the entity word.

In an exemplary embodiment, the attention obtaining module further includes:

a third matching unit, configured to count, according to the established entity attribute relationship, the number of times the entity word, acting as an attribute word, appears together with its corresponding entity in the search log data set, to obtain the reverse matching count of the entity word;

and a third accumulation unit, configured to accumulate the number of mentions of the entity word in the search log data set and the reverse matching count to obtain the mention amount of the entity word.

In an exemplary embodiment, the attention obtaining module includes:

a normalization unit, configured to normalize the mention amount of the entity word to obtain the attention of the entity word.

In an exemplary embodiment, the heat obtaining module includes:

a cognitive popularity obtaining unit, configured to obtain the cognitive popularity of the entity word through multi-source fusion of the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia website and the entity importance level of the entity word in the knowledge graph;

and a heat obtaining unit, configured to fuse the cognitive popularity and the attention of the entity word to obtain the current heat of the entity word.

In an exemplary embodiment, the cognitive popularity obtaining unit further includes:

a website acquisition subunit, configured to acquire, from the encyclopedia website, the webpage address corresponding to the identification information of the entity word;

a first calculating subunit, configured to calculate, with the PageRank algorithm, the webpage importance level of the webpage at that address according to the constructed webpage link relationships in the encyclopedia website;

and a second calculating subunit, configured to calculate, with the PageRank algorithm, the entity importance level of the entity word according to the entity link relationships of the entity word in the knowledge graph.

In an exemplary embodiment, the heat obtaining unit includes:

a weighted summation subunit, configured to weight and sum the cognitive popularity and the attention of the entity word with preset weight coefficients to obtain the current heat of the entity word.

The technical solutions provided by the embodiments of the invention can have the following beneficial effects:

According to the technical solution provided by the invention, the attention of an entity word is determined from the mention amount of the entity word in the search log data set. The attention is then fused, through multi-source fusion, with the importance of the entity word in an existing knowledge base to obtain the current heat of the entity word. Fusing attention with importance improves the accuracy of entity word heat acquisition. It also avoids the waste of manpower and the slow judgment caused by people judging entity word heat subjectively.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a schematic illustration of an implementation environment according to the present invention;

FIG. 2 is a block diagram illustrating a server in accordance with an exemplary embodiment;

FIG. 3 is a flow diagram illustrating a method for heat acquisition of entity words in accordance with an illustrative embodiment;

FIG. 4 is a flowchart of a method for obtaining heat of entity words according to another exemplary embodiment based on the corresponding embodiment in FIG. 3;

FIG. 5 is a flowchart illustrating a method for obtaining heat of entity words according to yet another exemplary embodiment based on the corresponding embodiment in FIG. 3;

FIG. 6 is a flow chart illustrating a method for obtaining heat of entity words according to still another exemplary embodiment based on the corresponding embodiment in FIG. 3;

FIG. 7 is a flowchart of step 350 of the corresponding embodiment of FIG. 3;

FIG. 8 is a flowchart illustrating a method for heat acquisition of entity words in accordance with an illustrative embodiment;

FIG. 9 is a block diagram illustrating an apparatus for obtaining a heat of an entity word in accordance with one illustrative embodiment;

FIG. 10 is a block diagram illustrating details of the attention obtaining module in the embodiment corresponding to FIG. 9;

FIG. 11 is a detailed block diagram of the heat obtaining module in the embodiment corresponding to FIG. 9.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

FIG. 1 is a schematic diagram of an implementation environment to which the present invention relates, according to an exemplary embodiment. The implementation environment includes a server 110. The database of the server 110 stores a search log data set and a knowledge base. Using the method for acquiring entity word heat provided by the present invention, the server 110 can fuse the attention of an entity word in the search log data set with the importance of the entity word in the knowledge base to obtain the current heat of the entity word.

The implementation environment may also include, as needed, a data source that provides the search log data set. Specifically, the data source may be other terminal devices 130. These send the search log data set to the server 110, which then uses the method provided by the present invention to obtain the current heat of entity words from the existing knowledge base and the search log data set.

It should be noted that the processing logic of the method for acquiring the heat of entity words provided by the present invention is not limited to being deployed in the server 110. It may also be deployed in other machines, for example in any machine with computing capability.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of the server 110 according to an embodiment of the present invention. The server 110 may vary considerably in configuration or performance. It may include one or more central processing units (CPUs) 222 (e.g., one or more processors), a memory 232, and one or more storage media 230 (e.g., one or more mass storage devices) storing applications 242 or data 244. The memory 232 and the storage medium 230 may provide transient or persistent storage. The program stored in the storage medium 230 may include one or more modules (not shown), and each module may include a series of instruction operations for the server 110. Further, the central processing unit 222 may be configured to communicate with the storage medium 230 and to execute, on the server 110, the series of instruction operations in the storage medium 230. The server 110 may also include one or more power supplies 226, one or more wired or wireless network interfaces 250, one or more input/output interfaces 258, and/or one or more operating systems 241, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like. The steps performed by the server 110 in the embodiments of FIGS. 3 to 8 below may be based on the server structure shown in FIG. 2.

It will be understood by those skilled in the art that all or part of the steps for implementing the following embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

FIG. 3 is a flowchart illustrating a method for acquiring the heat of entity words according to an exemplary embodiment. The method may, for example, be applied to the server 110 in the implementation environment shown in FIG. 1, which acts as its execution subject. As shown in FIG. 3, the method may be performed by the server 110 and may include the following steps.

In step 310, a search log data set is obtained.

A search log is the query record left after a user searches for specified content with a search tool. It includes the specific search content, the search time, and so on. A search log data set is the collection of all search logs from a recent period (for example, the last 10 days, half month, or month).
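Restricting the logs to a recent period could look like the minimal sketch below. It assumes each search log record carries only a query string and a timestamp. The `SearchLog` type, the `recent_log_dataset` function, and the 10-day default window are illustrative assumptions and do not come from the original.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SearchLog:
    """Hypothetical search log record: the query text and when it was issued."""
    query: str
    timestamp: datetime


def recent_log_dataset(logs: List[SearchLog], now: datetime,
                       window_days: int = 10) -> List[SearchLog]:
    """Keep only the logs issued within the last `window_days` days up to `now`."""
    cutoff = now - timedelta(days=window_days)
    return [log for log in logs if cutoff <= log.timestamp <= now]


logs = [
    SearchLog("warwolf 2 scenario", datetime(2018, 3, 10)),
    SearchLog("dubbing actor of warwolf 2", datetime(2018, 1, 5)),
]
recent = recent_log_dataset(logs, now=datetime(2018, 3, 13))
print([log.query for log in recent])  # ['warwolf 2 scenario']
```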

It should be noted that the search log data set may be obtained from another terminal device 130 equipped with a search engine. A search engine is a computer program that helps users find the content they want. In other words, it matches the information stored in the terminal device 130 against the user's information needs and displays the matching results. When the user queries for information through the search engine, a search log is left in the terminal device 130. Alternatively, if the server 110 itself is provided with the search engine, the search log data set may be stored on the server 110, which can then acquire it from its own database.

In step 330, the attention of the entity word is obtained according to the mention amount of the entity word in the search log data set.

An entity word is the name of an object or thing that objectively exists in the real world and can be distinguished from other objects or things. The mention amount indicates the number of times the entity word is mentioned in the search log data set. The larger the mention amount, the more often the entity word is searched for and the more attention it receives. Conversely, a lower mention amount indicates less attention. The attention measures how much attention the entity word receives and is generally a relative concept. For example, the entity word receiving the least attention may be represented by 1, an entity word receiving more attention than it by 2, and so on.

If necessary, the mention amount of an entity word may be used directly as its attention. For example, when the mention amount of an entity word is 5000, its attention is represented by the value 5000.

In an exemplary embodiment, step 330 specifically includes: normalizing the mention amount of the entity word to obtain the attention of the entity word.

Normalization uses some algorithm to limit the processed data to a range required by the user, so that the absolute values of the mention amounts become relative values. The mention amount of each entity word is an absolute quantity. Normalizing the mention amounts of all entity words converts each one into a relative value, and that relative value is taken as the attention of the entity word.

Specifically, linear (min-max) normalization can map the raw data into the range [0, 1] using the formula

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X_{norm}$ is the normalized data, $X$ is the raw mention amount, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of the raw data set, respectively. This formula scales the raw mention amounts proportionally, and the normalized mention amount of an entity word is its attention.
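The linear normalization can be sketched as follows. The function name and the mention amounts are made-up sample values. The original does not address the degenerate case where all mention amounts are equal (X_max equals X_min), so the handling of that case below is an assumption.

```python
from typing import Dict


def attention_from_mentions(mention_amounts: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize raw mention amounts into [0, 1] attention values.

    Implements X_norm = (X - X_min) / (X_max - X_min) over all entity words.
    """
    if not mention_amounts:
        return {}
    x_min = min(mention_amounts.values())
    x_max = max(mention_amounts.values())
    span = x_max - x_min
    if span == 0:
        # Every entity word has the same mention amount, so the formula would divide
        # by zero; as an assumption, treat them all as equally (fully) attended.
        return {word: 1.0 for word in mention_amounts}
    return {word: (x - x_min) / span for word, x in mention_amounts.items()}


attention = attention_from_mentions(
    {"warwolf 2": 5000, "forgetting water": 1200, "Liu Dehua": 3100}
)
print(attention)  # {'warwolf 2': 1.0, 'forgetting water': 0.0, 'Liu Dehua': 0.5}
```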

In step 350, according to the importance degree of the entity word in the existing knowledge base, multi-source fusion is performed on the attention degree and the importance degree of the entity word, and the current heat degree of the entity word is obtained.

A knowledge base is a database of knowledge that comprises an ontology and the knowledge itself. Existing knowledge bases include encyclopedia knowledge bases, the Wikipedia knowledge base, and knowledge graphs (graph-structured collections of interrelated knowledge). The importance of an entity word in an existing knowledge base represents how important and how well connected the entity word is in that knowledge base, and it can be expressed numerically. For example, importance may range from 0 to 1. The most important entity word in the knowledge base is then represented by 1, and the importance of every other entity word by a decimal between 0 and 1.

In one embodiment, the importance of entity words in an existing knowledge base can be determined in three steps. First, count the out-links and in-links of each entity word according to the relationships between entity words in the knowledge graph. Next, calculate the PageRank value of each entity word from its out-link and in-link data. Finally, normalize the PageRank values of all entity words.
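The link-counting step can be sketched minimally as shown below. The sketch assumes the knowledge graph is available as (head entity, relation, tail entity) triples and that each triple counts as one directed link from head to tail. The function name and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_structure(triples: Iterable[Tuple[str, str, str]]):
    """Derive out-links and in-links between entity words from relation triples.

    Each (head, relation, tail) triple is treated as one directed link head -> tail;
    the relation name is ignored, duplicate links are kept once, self-links are skipped.
    """
    out_links: Dict[str, Set[str]] = defaultdict(set)
    in_links: Dict[str, Set[str]] = defaultdict(set)
    for head, _relation, tail in triples:
        if head == tail:
            continue
        out_links[head].add(tail)
        in_links[tail].add(head)
    return out_links, in_links


triples = [
    ("Liu Dehua", "song", "forgetting water"),
    ("Liu Dehua", "song", "yidi"),
    ("forgetting water", "singer", "Liu Dehua"),
]
out_links, in_links = link_structure(triples)
print(len(out_links["Liu Dehua"]), len(in_links["Liu Dehua"]))  # 2 1
```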

For explanation, PageRank (also known as page rank) is a technique by which a search engine ranks web pages based on the hyperlinks between them. It is named after Larry Page, a founder of Google. Google uses it to reflect the relevance and importance of web pages, and it is often used in search engine optimization as one of the factors for evaluating how well a web page has been optimized.

The PageRank value can be calculated with the following formula:

$$PR(p_i) = \frac{1 - \alpha}{N} + \alpha \sum_{p_j \in M_{p_i}} \frac{PR(p_j)}{L(p_j)}$$

where $M_{p_i}$ is the set of all pages with out-links to page $p_i$, $L(p_j)$ is the number of out-links of page $p_j$, $N$ is the total number of pages, and $\alpha$ is generally 0.85. The PR value (i.e., the PageRank value) of each web page is computed iteratively with this formula. Once the iteration stabilizes, the values obtained are the final result.
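The iteration can be sketched in Python as below. This is a generic PageRank implementation, not the patent's own code. The formula above leaves three things open, and the sketch fills them with assumptions: the convergence tolerance, the iteration cap, and the rule that pages without out-links spread their rank evenly over all pages.

```python
from typing import Dict, Iterable


def pagerank(out_links: Dict[str, Iterable[str]], alpha: float = 0.85,
             tol: float = 1e-9, max_iter: int = 200) -> Dict[str, float]:
    """Iterate PR(p_i) = (1 - alpha) / N + alpha * sum(PR(p_j) / L(p_j)) until stable."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    links = {node: set(out_links.get(node, ())) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links (dangling pages), redistributed evenly.
        dangling = sum(rank[node] for node in nodes if not links[node])
        new_rank = {node: (1 - alpha) / n + alpha * dangling / n for node in nodes}
        for node, targets in links.items():
            for target in targets:
                # Page p_j passes PR(p_j) / L(p_j) to every page it links to.
                new_rank[target] += alpha * rank[node] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['a', 'b', 'c']
```

In this example, page c has no in-links, so it keeps only the base share (1 - 0.85) / 3 = 0.05.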

Following the formula for web page PageRank values, the PageRank value of an entity word can be calculated from the out-link and in-link data of the entity word in the knowledge graph. The PageRank values are then normalized to obtain the importance of the entity words. Here, an out-link is a hyperlink leaving a web page or website, as opposed to an in-link (inbound link). In fact, out-links and in-links are both ordinary links. The same link is an out-link for the page on which it appears and an in-link for the page it points to.
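Once PageRank values for the entity words are available, they can be normalized into the 0 to 1 importance range described above. One simple option is to divide by the largest value, so that the most important entity word receives 1. The sample values are hypothetical, and dividing by the maximum (rather than, say, min-max scaling) is an assumption.

```python
from typing import Dict


def importance_from_pagerank(pr_values: Dict[str, float]) -> Dict[str, float]:
    """Scale PageRank values so the most important entity word gets 1.0.

    PageRank values are strictly positive when the damping factor is below 1,
    so the maximum is never zero for a non-empty input.
    """
    if not pr_values:
        return {}
    top = max(pr_values.values())
    return {entity: value / top for entity, value in pr_values.items()}


print(importance_from_pagerank({"Liu Dehua": 0.5, "forgetting water": 0.25, "yidi": 0.25}))
# {'Liu Dehua': 1.0, 'forgetting water': 0.5, 'yidi': 0.5}
```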

Furthermore, multi-source fusion of the attention and the importance of an entity word means combining or correlating the attention value and the importance value to obtain a more accurate value, for example by adding or multiplying them. It should be noted that multi-source fusion involves at least two fusion objects. These may include the attention of the entity word, its importance in an encyclopedia knowledge base, its importance in the knowledge graph, its importance in the Wikipedia knowledge base, and so on. The result of fusing the attention and the importance of the entity word is its current heat (popularity). The current heat represents how popular the entity word has been recently and reflects public attention.
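One concrete form of this fusion is a weighted sum over sources, sketched below. The source names, weights, and scores are illustrative assumptions; the original only says the values may be added, multiplied, and so on. An entity word missing from a source is assumed to contribute 0 for that source, as with a new network word not yet linked in the knowledge graph.

```python
from typing import Dict


def fuse_sources(scores_by_source: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted-sum fusion of per-source scores into a current heat value per entity word."""
    entities = set()
    for scores in scores_by_source.values():
        entities.update(scores)
    return {
        entity: sum(weights[source] * scores.get(entity, 0.0)
                    for source, scores in scores_by_source.items())
        for entity in entities
    }


heat = fuse_sources(
    {
        "attention": {"old driver": 0.9, "Liu Dehua": 0.6},
        "knowledge_graph": {"Liu Dehua": 1.0},
        "encyclopedia": {"Liu Dehua": 0.8},
    },
    weights={"attention": 0.5, "knowledge_graph": 0.25, "encyclopedia": 0.25},
)
print(round(heat["Liu Dehua"], 2), round(heat["old driver"], 2))  # 0.75 0.45
```

The new word "old driver" still receives a non-zero heat from its search attention alone, even though the knowledge sources have no entry for it.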

In practical applications, a search engine can start from the word a user searches for and recommend related entity words that have relatively high current heat. Relatedness is judged by the relevance between those entity words and the searched word. This makes the search results better match the public's search needs.
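A toy sketch of this use is shown below. It assumes the heat values have already been computed and approximates relevance to the searched word by substring containment. That approximation is a placeholder, because the original does not specify how relevance is measured. `recommend` and `top_k` are hypothetical names.

```python
from typing import Dict, List


def recommend(query: str, heat: Dict[str, float], top_k: int = 3) -> List[str]:
    """Return up to top_k entity words related to the query, ordered by current heat."""
    related = [entity for entity in heat if entity in query or query in entity]
    related.sort(key=lambda entity: heat[entity], reverse=True)
    return related[:top_k]


heat = {"warwolf 2": 0.9, "warwolf": 0.4, "Liu Dehua": 0.7}
print(recommend("warwolf", heat))  # ['warwolf 2', 'warwolf']
```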

According to the technical solution provided by this embodiment of the invention, the attention of an entity word is determined from the mention amount of the entity word in the search log data set. The attention is then fused, through multi-source fusion, with the importance of the entity word in an existing knowledge base to obtain the current heat of the entity word. Fusing attention with importance improves the accuracy of entity word heat acquisition. It also avoids the waste of manpower and the slow judgment caused by people judging entity word heat subjectively.

In an exemplary embodiment, as shown in FIG. 4, the technical solution provided by the present invention further includes the following steps before step 330.

In step 401, each query statement in the search log data set that contains the entity word is matched against the expression mode templates. This yields the number of successful matches between the query statements containing the entity word and the expression mode templates.

A query statement is the statement a user enters during a search. For example, "the latest warwolf 2 photo" is a query statement. The search log data set records the query statements of all recent users. Through entity matching, all query statements in the data set that contain a given entity word can be found. For example, other query statements containing the entity word "warwolf 2" may include "dubbing actor of warwolf 2" and "warwolf 2 scenario".

It should be noted that expression mode templates for different entity words may be stored in the database in advance. Examples are "movie XX series", "dubbing actor of XX", and "XX song", where "XX" stands for a replaceable entity word. Specifically, a query statement containing a given entity word A is matched against all expression mode templates. The match is considered successful when the query statement contains one of the configured expression mode templates. Matching every query statement containing entity word A against the expression mode templates one by one gives the number of successful matches between all query statements containing A and all expression mode templates.
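The matching procedure can be sketched as below. The sketch assumes templates are plain strings with "XX" as the placeholder. It also assumes a match succeeds when a query contains the template with the entity word substituted in. The function name and sample queries are hypothetical; the templates are the examples listed above.

```python
from typing import Iterable


def template_match_count(entity: str, queries: Iterable[str],
                         templates: Iterable[str], placeholder: str = "XX") -> int:
    """Count successful (query, template) matches over queries containing `entity`."""
    templates = list(templates)
    count = 0
    for query in queries:
        if entity not in query:
            continue  # only query statements containing the entity word are matched
        for template in templates:
            # e.g. "dubbing actor of XX" -> "dubbing actor of warwolf 2"
            if template.replace(placeholder, entity) in query:
                count += 1
    return count


queries = [
    "dubbing actor of warwolf 2",
    "warwolf 2 scenario",
    "the latest warwolf 2 photo",
    "Liu Dehua song",
]
templates = ["movie XX series", "dubbing actor of XX", "XX song"]
print(template_match_count("warwolf 2", queries, templates))  # 1
```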

In step 402, the mention amount of the entity word is obtained by accumulating the number of mentions of the entity word in the search log data set and the number of successful matches.

It should be noted that matching query statements containing the entity word against expression mode templates mainly serves to disambiguate the mention amount of the entity word. The number of mentions is the number of times the entity word occurs in the search log data set, which can be obtained statistically. Adding the number of mentions and the number of successful matches gives the mention amount of the entity word. This reduces the error of simply using the number of mentions as the mention amount. It also makes it easier to distinguish the attention of different entity words by their mention amounts later on.
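Step 402 reduces to counting plus addition, as in the sketch below. The sketch counts mentions as non-overlapping substring occurrences across all query statements. A limitation of this simplification is that a short entity word such as "warwolf" would also be counted inside "warwolf 2". The number of successful matches is assumed to come from step 401, and the function names are hypothetical.

```python
from typing import Iterable


def mention_times(entity: str, queries: Iterable[str]) -> int:
    """Number of times the entity word occurs across all query statements."""
    return sum(query.count(entity) for query in queries)


def mention_amount(entity: str, queries: Iterable[str], match_successes: int) -> int:
    """Mention amount = number of mentions + number of successful template matches."""
    return mention_times(entity, queries) + match_successes


queries = ["dubbing actor of warwolf 2", "warwolf 2 scenario", "the latest warwolf 2 photo"]
print(mention_amount("warwolf 2", queries, match_successes=1))  # 4
```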

In an exemplary embodiment, as shown in FIG. 5, the technical solution provided by the present invention further includes the following steps before step 330.

In step 501, the number of times the entity word and its corresponding attribute words appear together in the search log data set is counted, according to the established entity attribute relationship. This gives the forward matching count of the entity word.

The established entity attribute relationship refers to entity words and attribute words whose relationships have been determined, where an attribute word describes an entity word. For example, the entity word "Liu Dehua" is related to the attribute word "forgetting water" through the attribute name "song".

It should be noted that the number of times an entity word and a corresponding attribute word appear together in the search log data set is the number of times they appear in the same query statement. Examples are "Liu Dehua" and "forgetting water" in the same query statement, or "Liu Dehua" and "yidi" in the same query statement. Each time the entity word and a corresponding attribute word appear together, the forward matching count is incremented by one. Accumulating these co-occurrences gives the forward matching count of the entity word.
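The forward matching count can be sketched as follows. The sketch assumes the established entity attribute relationship is stored as (entity, attribute name, attribute word) tuples. It also assumes "appearing together" means both strings occur in the same query statement. The function name and sample data are hypothetical.

```python
from typing import Iterable, Tuple

Relation = Tuple[str, str, str]  # (entity, attribute name, attribute word)


def forward_match_count(entity: str, queries: Iterable[str],
                        relations: Iterable[Relation]) -> int:
    """Add one for every (query, attribute word) co-occurrence with the entity word."""
    attribute_words = {attr for ent, _name, attr in relations if ent == entity}
    count = 0
    for query in queries:
        if entity in query:
            count += sum(1 for attr in attribute_words if attr in query)
    return count


relations = [("Liu Dehua", "song", "forgetting water"), ("Liu Dehua", "song", "yidi")]
queries = ["Liu Dehua forgetting water", "Liu Dehua yidi lyrics",
           "forgetting water", "Liu Dehua concert"]
print(forward_match_count("Liu Dehua", queries, relations))  # 2
```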

In step 502, the mention amount of the entity word is obtained by accumulating the number of mentions of the entity word in the search log data set and the forward matching count.

Adding the number of mentions of the entity word in the search log data set and the forward matching count gives the mention amount of the entity word. This refines the mention amount and reduces its error. To refine it further, the mention amount may also be the sum of the number of mentions, the forward matching count, and the number of successful matches obtained in step 401.

In an exemplary embodiment, before the step 330, as shown in fig. 6, the technical solution provided by the present invention further includes the following steps.

In step 601, according to the established entity attribute relationship, the number of times the entity word, acting as an attribute word, appears together with its corresponding entity in the search log data set is counted to obtain the reverse matching times of the entity word.

It should be noted that there is no clear formal boundary between entity words and attribute words. For example, "forgetting water" may itself be an entity word in the knowledge base while also serving as an attribute word of the entity word "Liu Dehua". To obtain the mention amount of an entity word such as "forgetting water", its reverse matching times can be obtained by counting how many times "forgetting water", acting as an attribute word, appears together with its corresponding entity such as "Liu Dehua", that is, how many times both appear in the same query sentence in the search log data set. The reverse matching times are thus the number of times an entity word, acting as an attribute word, appears in the same query sentence as its corresponding entity word.
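Reverse matching differs from forward matching only in which word receives the credit. The Python sketch below makes that explicit under the same assumptions as before: queries as plain strings, the entity attribute relationship as a dictionary of entity word to attribute words, co-occurrence approximated by substring containment, plus an assumed set of entity words known to the knowledge base. The function name `reverse_matching_times` and the sample data are hypothetical.

```python
from collections import Counter


def reverse_matching_times(queries, entity_attributes, knowledge_base_entities):
    """Credit an attribute word that is itself an entity word each time it
    co-occurs with its corresponding entity in the same query (hypothetical helper)."""
    counts = Counter()
    for query in queries:
        for entity, attributes in entity_attributes.items():
            if entity not in query:
                continue
            for attribute in attributes:
                # Only attribute words that are also entity words are counted.
                if attribute in knowledge_base_entities and attribute in query:
                    counts[attribute] += 1
    return counts


queries = ["Liu Dehua forgetting water", "forgetting water lyrics", "Liu Dehua yidi"]
relations = {"Liu Dehua": {"forgetting water", "yidi"}}
kb_entities = {"Liu Dehua", "forgetting water"}
print(reverse_matching_times(queries, relations, kb_entities))  # Counter({'forgetting water': 1})
```

Here "yidi" is deliberately left out of the illustrative knowledge base, so it earns no reverse matches even though it co-occurs with "Liu Dehua".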

In step 602, the mention amount of the entity word is obtained by accumulating the mention times of the entity word in the search log data set and its reverse matching times.

For example, the mention amount of the entity word "forgetting water" may be the sum of its mention times in the search log data set and its reverse matching times, which improves the accuracy of the mention amount and reduces its error.

In an exemplary embodiment, the mention amount of the entity word may be the sum of its mention times, matching success times, reverse matching times and forward matching times, which further reduces the error of the mention amount.
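The accumulation of the four counts can be sketched in Python as a per-entity sum. This sketch assumes each count is held in a dictionary keyed by entity word and that an entity word missing from one source simply contributes zero; the function name `mention_amount` and all numbers below are illustrative, not taken from the source.

```python
from collections import Counter


def mention_amount(mention_times, matching_success_times,
                   forward_matching_times, reverse_matching_times):
    """Sum the four per-entity counts; missing entity words contribute 0 (hypothetical helper)."""
    total = Counter()
    for component in (mention_times, matching_success_times,
                      forward_matching_times, reverse_matching_times):
        total.update(component)  # Counter.update adds counts rather than replacing them
    return total


amount = mention_amount(
    {"Liu Dehua": 10, "forgetting water": 4},  # mention times (illustrative)
    {"Liu Dehua": 3},                          # template matching success times
    {"Liu Dehua": 2},                          # forward matching times
    {"forgetting water": 2},                   # reverse matching times
)
print(amount["Liu Dehua"], amount["forgetting water"])  # 15 6
```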

In an exemplary embodiment, as shown in fig. 7, the step 350 specifically includes:

In step 351, the cognitive popularity of the entity word is acquired through multi-source fusion according to the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph.

The importance degree of the entity word in the knowledge base includes the webpage importance level of the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph. Both the encyclopedia network and the knowledge graph are forms of knowledge base. The encyclopedia network is a massive knowledge collection formed by editing and publishing human knowledge using Internet technology. The knowledge graph is a graph-structured knowledge collection in which each node represents an entity existing in the real world and each edge represents a relationship between entities. Knowledge graphs are an efficient way to represent relationships: a knowledge graph is generally a relational network obtained by connecting many kinds of heterogeneous information, and it provides the ability to analyze problems from a "relational" perspective.

For example, encyclopedia sites, Wikipedia, interactive encyclopedias and the like are all encyclopedia networks. An encyclopedia network provides a webpage address corresponding to each entity word, and the webpage content at that address explains the entity word. The webpage importance level of the entity word in the encyclopedia network represents the importance and relevance of the webpage address corresponding to the entity word within the encyclopedia network, and the entity importance level of the entity word in the knowledge graph represents the importance and relevance of the entity word within the knowledge graph. Both levels can be calculated in advance and stored.

It should be noted that, when the importance degree of the entity word in the knowledge base is fused with the attention degree of the entity word, the importance degrees of the entity word in different knowledge bases may first be fused with each other (first fusion), and the result of the first fusion is then fused with the attention degree of the entity word (second fusion). The result of the second fusion is used as the current heat of the entity word.

Specifically, the first fusion may be a multi-source fusion of the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph; its result is the cognitive popularity of the entity word. The fusion may be a weighted addition of the webpage importance level and the entity importance level according to preset weight coefficients.

In an exemplary embodiment, before step 351, the method may further include a step of calculating the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network, as follows:

The identification information of the entity word may be an entity word number, and the webpage address (URL) corresponding to the entity word number is acquired from the encyclopedia network.

The webpage importance level of that webpage address is then calculated with the PageRank algorithm according to the established webpage link relationships in the encyclopedia network.

It should be noted that the anchor links in encyclopedia webpages can be obtained in advance, and out-link and in-link relationships can be constructed from the webpage addresses these anchor links point to, thereby building the link structure of the encyclopedia network. According to these webpage link relationships, the PageRank value of the webpage address corresponding to the entity word number can be calculated with the PageRank algorithm. For the PageRank formula, refer to the above embodiment; the PageRank algorithm is prior art and is not described in detail here.

Specifically, after the PageRank values of the webpage addresses corresponding to the entity word numbers are calculated, the PageRank values of all webpage addresses are normalized, and the normalized result is used as the webpage importance level of each entity word.
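Because the description defers the PageRank formula to prior art, the following Python sketch uses the textbook power-iteration form with a damping factor of 0.85, spreading the rank of pages without out-links evenly over all pages. The damping factor, the dangling-page handling, and dividing by the maximum as the normalization step are all assumptions; the source only says the values are normalized. Function names and the three-page link graph are hypothetical.

```python
def pagerank(out_links, damping=0.85, max_iterations=100, tol=1e-10):
    """Power-iteration PageRank over {page: [pages it links to]}."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in out_links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def normalize_by_max(scores):
    """Scale scores so the largest becomes 1.0 (the normalization scheme is an assumption)."""
    top = max(scores.values())
    return {key: value / top for key, value in scores.items()}


# Hypothetical anchor-link graph between encyclopedia page URLs.
links = {"url/A": ["url/B", "url/C"], "url/B": ["url/C"], "url/C": ["url/A"]}
webpage_importance = normalize_by_max(pagerank(links))
print(webpage_importance)
```

In this toy graph "url/C" receives links from both other pages and ends up with the highest webpage importance level (1.0 after normalization).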

In an exemplary embodiment, before step 351, the method may further include a step of calculating an entity importance level of the entity word, which is as follows:

It should be noted that the knowledge graph is itself a mesh-like knowledge base formed by linking entities and attributes through relationships. The entity link relationship of an entity word in the knowledge graph refers to its out-link and in-link relationships, from which the PageRank value of the entity word can be calculated with the PageRank algorithm. The PageRank values of all entity words are then normalized, and the normalized result is the entity importance level of each entity word.
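One way to obtain the out-link and in-link relationships is to read them off the knowledge graph's triples. The Python sketch below assumes the graph is stored as (head, relation, tail) triples and treats each triple as a directed link from head to tail; skipping self-loops is also an assumption. The resulting out-link mapping can be passed to any PageRank implementation. The function name `entity_link_graph` and the "singer" and "album" triples are hypothetical.

```python
from collections import defaultdict


def entity_link_graph(triples):
    """Build out-link and in-link maps from (head, relation, tail) triples."""
    out_links = defaultdict(list)
    in_links = defaultdict(list)
    for head, _relation, tail in triples:
        if head == tail:
            continue  # self-loops are skipped (an assumption)
        out_links[head].append(tail)  # head links out to tail
        in_links[tail].append(head)   # tail receives an in-link from head
    return dict(out_links), dict(in_links)


triples = [
    ("Liu Dehua", "song", "forgetting water"),
    ("forgetting water", "singer", "Liu Dehua"),
    ("forgetting water", "album", "forgetting water album"),
]
out_links, in_links = entity_link_graph(triples)
print(out_links["forgetting water"])  # ['Liu Dehua', 'forgetting water album']
```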

In an exemplary embodiment, the normalized PageRank value of the entity word in the encyclopedia network is compared with the normalized PageRank value of the entity word in the knowledge graph. If the value in the knowledge graph is greater than the value in the encyclopedia network, then cognitive popularity = PageRank value in the knowledge graph × 0.8 + PageRank value in the encyclopedia network × 0.2. Otherwise, cognitive popularity = PageRank value in the encyclopedia network × 0.8 + PageRank value in the knowledge graph × 0.2.
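The comparison rule above translates directly into a small Python function. It is a sketch that assumes both inputs are already normalized PageRank values; the function name `cognitive_popularity` is illustrative.

```python
def cognitive_popularity(encyclopedia_level: float, graph_level: float) -> float:
    """First fusion: the larger normalized PageRank value gets weight 0.8, the other 0.2."""
    if graph_level > encyclopedia_level:
        return graph_level * 0.8 + encyclopedia_level * 0.2
    return encyclopedia_level * 0.8 + graph_level * 0.2


print(cognitive_popularity(0.5, 0.9))  # ≈ 0.82
```

Because the larger value always receives weight 0.8, the rule is equivalent to 0.8 × max + 0.2 × min; when the two values are equal, either branch gives the same result.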

In step 352, the cognitive popularity of the entity word and the attention of the entity word are fused to obtain the current heat of the entity word.

Specifically, the PageRank value of the webpage address corresponding to the entity word in the encyclopedia network is calculated and normalized to obtain the webpage importance level of the entity word; the PageRank value of the entity word is calculated from its out-link and in-link relationships in the knowledge graph and normalized to obtain the entity importance level of the entity word; and the webpage importance level and the entity importance level are fused to obtain the cognitive popularity of the entity word. The cognitive popularity and the attention of the entity word are then fused to obtain the current heat of the entity word.

The cognitive popularity and the attention of the entity word may be fused by adding or multiplying them, and the fusion result is taken as the current heat of the entity word.

In an exemplary embodiment, step 352 specifically includes: computing a weighted sum of the cognitive popularity and the attention of the entity word according to preset weight coefficients to obtain the current heat of the entity word.

In an exemplary embodiment, it is first determined whether the entity word exists in the search log data set. If it does, the attention and the cognitive popularity of the entity word are compared: if the attention is greater than the cognitive popularity, current heat = attention × 0.8 + cognitive popularity × 0.2; if the attention is less than or equal to the cognitive popularity, current heat = attention × 0.4 + cognitive popularity × 0.6. If the entity word does not exist in the search log data set, current heat = cognitive popularity.
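The three-way rule for the second fusion can be sketched in Python as below. Representing "the entity word does not exist in the search log data set" as `attention=None` is an assumption made for this sketch, and the function name `current_heat` is illustrative.

```python
from typing import Optional


def current_heat(cognitive: float, attention: Optional[float]) -> float:
    """Second fusion; attention is None when the entity word is absent from the search log."""
    if attention is None:
        return cognitive  # no search evidence: fall back to cognitive popularity
    if attention > cognitive:
        return attention * 0.8 + cognitive * 0.2
    return attention * 0.4 + cognitive * 0.6
```

For example, with cognitive popularity 0.5, an attention of 1.0 yields a heat of about 0.9, while an attention of 0.25 yields about 0.4. When attention equals cognitive popularity, both branches return that same value, so the rule has no jump at the boundary.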

Fig. 8 is a flowchart illustrating a method for obtaining the heat of entity words according to an exemplary embodiment. As shown in Fig. 8, the PageRank value of the entity word in the encyclopedia network (S801), the PageRank value of the entity word in the knowledge graph (S802), and the mention amount of the entity word in the search log data set (S803) are calculated first; S801, S802 and S803 may be performed in any order or in parallel. The values obtained in S801, S802 and S803 are each normalized, the three normalized results are fused, for example by weighted addition according to preset weight coefficients (S804), and the current heat of the entity word is output (S805).
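The Fig. 8 flow, with its single weighted addition in S804, might look like the Python sketch below. The weights (0.3, 0.3, 0.4), max-normalization, and scoring an entity word missing from one source as zero from that source are all assumptions, since the source only says "preset weight coefficient". Function names and sample values are hypothetical.

```python
def normalize_by_max(raw):
    """Scale values so the largest becomes 1.0 (normalization scheme assumed)."""
    top = max(raw.values())
    return {key: (value / top if top > 0 else 0.0) for key, value in raw.items()}


def fuse_heat(encyclopedia_pr, graph_pr, mention_amount, weights=(0.3, 0.3, 0.4)):
    """S804: weighted addition of the three normalized scores (weights are hypothetical)."""
    sources = [normalize_by_max(raw) for raw in (encyclopedia_pr, graph_pr, mention_amount)]
    entities = set().union(*sources)
    # An entity word missing from a source contributes 0 from that source (an assumption).
    return {
        entity: sum(w * source.get(entity, 0.0) for w, source in zip(weights, sources))
        for entity in entities
    }


heat = fuse_heat(
    {"Liu Dehua": 0.4, "forgetting water": 0.2},  # S801 (illustrative values)
    {"Liu Dehua": 0.3, "forgetting water": 0.3},  # S802
    {"Liu Dehua": 15, "forgetting water": 6},     # S803
)
print(heat)
```

Here "Liu Dehua" tops every source and gets a heat of about 1.0, while "forgetting water" gets 0.3 × 0.5 + 0.3 × 1.0 + 0.4 × 0.4 ≈ 0.61.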

It should be explained that the current heat of the entity word draws on the PageRank value of the webpage address corresponding to the entity word in the encyclopedia network, so that the current heat better matches public cognition; and because encyclopedia data is edited manually, its accuracy is also ensured. Fusing the PageRank value of the entity word with its attention in the search log data set keeps the robustness of the PageRank value while raising the current heat of entity words that appear more frequently. Calculating the attention of the entity word from the search log data set also overcomes the inaccurate heat obtained by traditional methods when the knowledge base is not updated in time.

The following are apparatus embodiments of the present invention, which can be used to execute the embodiments of the method for obtaining the heat of entity words executed by the server 110. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the method for obtaining the heat of entity words of the present invention.

Fig. 9 is a block diagram illustrating an apparatus for obtaining a heat degree of an entity word according to an exemplary embodiment, which may be used in the server 110 in the implementation environment shown in fig. 1 to perform all or part of the steps of the method for obtaining a heat degree of an entity word shown in any one of fig. 3 to 8. As shown in fig. 9, the apparatus includes, but is not limited to: a log obtaining module 910, an attention obtaining module 930, and a heat obtaining module 950.

A log obtaining module 910, configured to obtain a search log data set;

an attention obtaining module 930, configured to obtain an attention of an entity word according to a mention amount of the entity word in the search log data set;

the heat obtaining module 950 is configured to perform multi-source fusion on the attention and the importance of the entity word according to the importance of the entity word in the existing knowledge base, and obtain the current heat of the entity word.

The implementation processes of the functions and actions of the modules in the apparatus are specifically described in the implementation processes of the corresponding steps in the method for acquiring the heat of the entity words, and are not described herein again.

The log obtaining module 910 may be, for example, one of the physical structures of the wired or wireless network interface 250 in fig. 2.

The attention obtaining module 930 and the heat obtaining module 950 may also be functional modules, and are configured to execute corresponding steps in the method for obtaining heat of entity words. It is understood that these modules may be implemented in hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application specific integrated circuits. When implemented in software, the modules may be implemented as one or more computer programs executing on one or more processors, such as programs stored in memory 232 for execution by central processor 222 of FIG. 2.

In an exemplary embodiment, as shown in fig. 10, the attention obtaining module 930 further includes:

a first matching unit 931, configured to match each query statement containing the entity word in the search log data set against the expression templates, to obtain the number of times query statements containing the entity word successfully match an expression template (the matching success times);

a first accumulating unit 932, configured to obtain, by accumulation, a mention amount of the entity word according to the mention times of the entity word in the search log data set and the matching success times.

In an exemplary embodiment, the attention obtaining module 930 further includes:

In an exemplary embodiment, the attention obtaining module 930 includes:

In an exemplary embodiment, as shown in fig. 11, the heat obtaining module 950 includes:

a cognitive popularity obtaining unit 951, configured to obtain a cognitive popularity of the entity word through multi-source fusion according to a webpage importance level of a webpage address corresponding to the entity word in an encyclopedia network and an entity importance level of the entity word in a knowledge graph;

a heat obtaining unit 952, configured to fuse the cognitive popularity of the entity word with the attention of the entity word, so as to obtain a current heat of the entity word.

In an exemplary embodiment, the cognitive popularity obtaining unit 951 further includes:

In an exemplary embodiment, the heat obtaining unit 952 includes:

Optionally, the present invention further provides an electronic device, which may be used in the server 110 in the implementation environment shown in fig. 1 to execute all or part of the steps of the method for obtaining heat of entity words shown in any one of fig. 3 to fig. 8. The electronic device includes:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the method for acquiring the heat of entity words according to the above exemplary embodiments.

The specific manner in which the processor of the electronic device performs the operation in this embodiment has been described in detail in the embodiment of the method for acquiring heat of the entity word, and will not be described in detail here.

In an exemplary embodiment, a storage medium is also provided. The storage medium is a computer-readable storage medium including instructions, and may be a transitory or non-transitory computer-readable storage medium. It stores a computer program executable by the central processing unit 222 of the server 110 to perform the above method for acquiring the heat of entity words.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A method for acquiring heat of entity words is characterized by comprising the following steps:

acquiring a search log data set, wherein the search log data set is a data set formed by query data left after a searcher queries specified contents through a search tool;

acquiring the cognitive popularity of the entity word through multi-source fusion according to the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph;

and fusing the cognitive popularity of the entity word and the attention of the entity word to obtain the current heat of the entity word.

2. The method of claim 1, wherein before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further comprises:

3. The method of claim 1, wherein before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further comprises:

4. The method according to claim 1, wherein before obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set, the method further comprises:

5. The method according to claim 1, wherein the obtaining the attention of the entity word according to the mention amount of the entity word in the search log data set comprises:

and normalizing the mention amount of the entity words to obtain the attention of the entity words.

6. The method of claim 1, wherein before obtaining the cognitive popularity of the entity word through multi-source fusion according to the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph, the method further comprises:

7. The method according to claim 1, wherein before obtaining the cognitive popularity of the entity word through multi-source fusion according to the webpage importance level of the webpage address corresponding to the entity word in the encyclopedia network and the entity importance level of the entity word in the knowledge graph, the method further comprises:

8. The method of claim 1, wherein fusing the cognitive popularity of the entity word with the attention of the entity word to obtain the current heat of the entity word comprises:

9. An apparatus for obtaining heat of an entity word, the apparatus comprising:

the log acquisition module is used for acquiring a search log data set, wherein the search log data set is a data set formed by query data left after a searcher queries specified contents through a search tool;

the popularity obtaining module is used for obtaining the cognitive popularity of the entity words through multi-source fusion according to the webpage importance levels of the corresponding webpage addresses of the entity words in the encyclopedic network and the entity importance levels of the entity words in the knowledge graph; and fusing the cognitive popularity of the entity word and the attention of the entity word to obtain the current heat of the entity word.

10. The apparatus of claim 9, wherein the attention obtaining module further comprises:

11. The apparatus of claim 9, wherein the attention obtaining module further comprises:

12. The apparatus of claim 9, wherein the attention obtaining module further comprises:

a third matching unit, configured to count, in the search log dataset, times that the entity word appears as an attribute word simultaneously with a corresponding entity according to the established entity attribute relationship, and obtain reverse matching times of the entity word;