CN111666420A

CN111666420A - Method for intensively extracting experts based on subject knowledge graph

Info

Publication number: CN111666420A
Application number: CN202010474948.6A
Authority: CN
Inventors: 林欣; 王辰奕; 高桢; 孙琪力
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-09-15
Anticipated expiration: 2040-05-29
Also published as: CN111666420B

Abstract

The invention discloses a method for extracting experts in a concentrated manner based on a discipline knowledge graph. The method comprises the following steps: establishing a mapping from keywords to nodes of an academic knowledge graph by using the similarity of English Wikipedia hyperlinks; calculating the matching degree between each project group and each expert from the mapping results; using the matching degree scores to find, for each project group, several experts that match it relatively best; selecting the actual result from the experts found in the previous step in a concentrated manner, taking the extraction as a whole into account; and, when experts actively or passively withdraw from the review, refilling the vacancies of the affected project groups while keeping the selected experts as concentrated as possible. The invention is flexible, easy to use and widely applicable: it can select suitable experts for project groups proposed in any industry, and it keeps the selected experts concentrated to a certain extent without compromising their matching degree.

Description

Method for intensively extracting experts based on subject knowledge graph

Technical Field

The invention relates to natural language processing and to entity matching and entity mapping in databases. Specifically, it is a method that, with the aid of data from a discipline knowledge graph, obtains in batch several experts with high subject relevance for each of the project groups in one review activity, and then selects the extraction results from those experts in as concentrated a manner as possible.

Background

In recent years, with the improvement of hardware performance and the explosive increase of information on the internet, methods for processing and analyzing big data have been rapidly developed and are increasingly widely used in many fields. The knowledge graph has great advantages in the aspect of improving the information retrieval quality as a novel big data processing means.

The concept of the knowledge graph was first proposed by *** corporation, and at first the knowledge graph was mainly used to assist the search engine of *** in retrieving information. With the development of big data processing and analysis methods in recent years, knowledge graphs have been widely applied in fields such as intelligent search, intelligent question answering and intelligent recommendation. In intelligent search in particular, the knowledge graph makes up for the shortcomings of searching by keyword matching alone, so that a search engine can, to a certain extent, infer the specific intention behind a user query and thereby perform concept retrieval or semantic retrieval. With the support of a knowledge graph, a computer can better understand how human language is expressed and return retrieval results that better fit the user's needs. In addition, the structure of a knowledge graph reflects the relationships among information entities more clearly, so that information is aggregated into knowledge and becomes easier for a computer to understand, evaluate and use.

In essence, a knowledge graph can be thought of as a semantic network that translates things in the real world and the associations between them into "entity, attribute, value" tuples and "entity-relationship-entity" triples, which are easier for a computer to process. Today the concept of the knowledge graph has been generalized, and various large knowledge bases are also called knowledge graphs.

Disclosure of Invention

The invention aims to provide a method for extracting experts in a concentrated manner based on a discipline knowledge graph. The method uses data such as the Microsoft Academic Graph, the hierarchical structure of the field classification, and the links between English Wikipedia entry pages to assist the extraction task. The subject matching degree between a project group and an expert is calculated with methods such as path similarity on a tree structure.

The specific technical scheme for realizing the purpose of the invention is as follows:

A method for extracting experts in a concentrated manner based on a discipline knowledge graph comprises the following specific steps:

step 1: for the expert extraction task of a given review activity, obtaining all project groups to be reviewed, together with their fields and their Chinese and English keyword information;

step 2: calculating the matching degree between every project group and every expert, to obtain for each project group a candidate set of experts with a high matching degree;

step 3: on the premise that the number of project groups reviewed by any expert does not exceed the upper limit set by the extraction task, selecting experts from all the candidate sets in a concentrated manner as the final extraction result for all project groups in the extraction task;

step 4: if experts actively or passively withdraw from the review after receiving the notice to take part in the review activity, refilling the vacancies of the affected project groups in a concentrated manner, so that the number of experts in the final result of each project group again meets the requirement.

Wherein, the step 2 specifically comprises:

step A1: searching academic knowledge graph nodes with the highest association degree for each keyword by using Chinese and English keyword sets in the project group examples, establishing mapping, and acquiring a set of all academic knowledge graph nodes corresponding to all keywords below each project group example;

step A2: searching academic knowledge graph nodes with the highest association degree for each keyword by using Chinese and English keyword sets in expert examples, establishing mapping, and acquiring a set of all academic knowledge graph nodes corresponding to all keywords below each expert example;

step A3: calculating the pairwise matching degree on keywords between each project group instance and each expert instance, by using the academic knowledge graph node sets of the project group instances and the expert instances obtained in step A1 and step A2;

step A4: calculating the matching degree of the project group examples and the expert examples in various fields by using the various field information in the project group examples and the various field information in the expert examples;

step A5: multiplying the keyword matching degree in the step A3 and the field matching degree in the step A4 by respective weights and summing the results to serve as subject matching degrees between the project group examples and the expert examples;

step A6: according to the subject matching degree obtained in step A5, and with the candidate set of each project group instance limited to at most k experts, sorting all expert instances for each project group instance p_i by matching degree, taking the first k, and forming these k expert instances into the expert set E_i corresponding to p_i; every project group instance p_i is thus allocated the k expert instances with the highest matching degree as its candidate experts, giving candidate sets E₁～E_n in one-to-one correspondence with the project group instances p₁～p_n; wherein k is a positive integer within 100.
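Steps A5 and A6 can be illustrated with a minimal sketch. The function names, the equal default weights and the tie-break on expert id are assumptions made for the example; the text above only states that the two matching degrees are weighted and summed and that the k best experts are kept.

```python
from typing import Dict, List


def subject_match(keyword_score: float, field_score: float,
                  w_keyword: float = 0.5, w_field: float = 0.5) -> float:
    """Step A5: weighted sum of keyword and field matching degrees.

    The default weights are placeholders, not values given by the method.
    """
    return w_keyword * keyword_score + w_field * field_score


def candidate_sets(scores: Dict[str, Dict[str, float]], k: int) -> Dict[str, List[str]]:
    """Step A6: keep, for each project group p_i, the k best experts as E_i.

    scores[p][e] is the subject matching degree between group p and expert e.
    """
    result: Dict[str, List[str]] = {}
    for group_id, by_expert in scores.items():
        # Sort by descending score; ties are broken by expert id (assumed rule).
        ranked = sorted(by_expert, key=lambda e: (-by_expert[e], e))
        result[group_id] = ranked[:k]
    return result
```

For example, `candidate_sets({"p1": {"e1": 0.9, "e2": 0.4, "e3": 0.7}}, k=2)` keeps `e1` and `e3` for `p1`, in that order.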

Searching for the academic knowledge graph node with the highest degree of association for each keyword and establishing the mapping is done as follows: a mapping f: keyword → node from keywords to academic knowledge graph nodes is established by analysing the entry page data of Wikipedia; specifically, the entry page data of Wikipedia are used to find and return the academic knowledge graph node most similar to a keyword, by executing the following steps:

step ①: querying the local and online wiki databases, recording, for the wiki entry page corresponding to each graph node, the links on that page that point to wiki entries, and caching the resulting (graph node, link set) pairs in a file; once this step has been executed, it is executed again only when a new graph node is added;

step ②: when the keyword is Chinese, calling Google Translate to translate it into English and then executing step ③; when the keyword is English, executing step ③ directly;

step ③: comparing the keyword or its translation with the name strings of the graph nodes; if the name of some graph node is exactly identical to the keyword, returning that node directly as the mapping result of the keyword; otherwise, executing step ④;

step ④: querying the wiki database; if the keyword has a corresponding entry page in the wiki database, and the entry pages of some graph nodes share with the keyword's entry page hyperlinks pointing to the same entry pages, returning the graph node with the most shared links as the mapping result; if the keyword has no entry of the same name in the wiki, or the entry pages of all graph nodes share no hyperlink target with the keyword's entry page, executing step ⑤;

step ⑤: calling the Wikipedia API to obtain at most 10 Wikipedia entry pages that best correspond to the keyword, merging the entry hyperlinks on those pages into one set, and executing step ⑥;

step ⑥: if the number of pages returned by the API is not 0 and the entry pages of some graph nodes share links with the hyperlink set obtained in step ⑤, returning the graph node that shares the largest number of entry links with that link set as the mapping result; if the number of pages returned by the API is 0, or the link set obtained in step ⑤ still shares no link with the entry page of any graph node, the keyword mapping fails.
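The fallback chain described in these steps can be sketched as follows. This is an illustration under stated assumptions rather than the method's implementation: `node_links` stands for the cached link sets of step ①, `page_links` for a lookup of the links on a Wikipedia entry page, and `translate` and `search_pages` are hypothetical callables standing in for the translation service and the Wikipedia search API. Ties between nodes are broken alphabetically, which the text does not specify.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def has_chinese(text: str) -> bool:
    """Rough check for CJK unified ideographs (assumed language test)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def best_by_shared_links(links: Set[str],
                         node_links: Dict[str, Set[str]]) -> Optional[str]:
    """Graph node whose cached link set shares the most links with `links`."""
    best, best_count = None, 0
    for node in sorted(node_links):  # sorted for deterministic ties
        shared = len(links & node_links[node])
        if shared > best_count:
            best, best_count = node, shared
    return best  # None when no node shares any link


def map_keyword(keyword: str,
                node_links: Dict[str, Set[str]],
                page_links: Dict[str, Set[str]],
                search_pages: Callable[[str], Iterable[str]],
                translate: Callable[[str], str] = lambda s: s,
                max_pages: int = 10) -> Optional[str]:
    # Step 2: translate Chinese keywords into English first.
    if has_chinese(keyword):
        keyword = translate(keyword)
    # Step 3: exact match with a graph node name.
    if keyword in node_links:
        return keyword
    # Step 4: the keyword has its own entry page; compare shared hyperlinks.
    if keyword in page_links:
        node = best_by_shared_links(page_links[keyword], node_links)
        if node is not None:
            return node
    # Step 5: search for at most `max_pages` related pages and merge their links.
    titles = list(search_pages(keyword))[:max_pages]
    if not titles:
        return None  # Step 6: nothing found, mapping fails.
    merged: Set[str] = set()
    for title in titles:
        merged |= page_links.get(title, set())
    # Step 6: best node by shared links, or None when there is no overlap.
    return best_by_shared_links(merged, node_links)
```

Passing the lookups in as plain dictionaries and callables keeps the sketch independent of any particular database or network client.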

Step A3, calculating the pairwise matching degree on keywords between the project group instances and the expert instances, is specifically: calculating the path similarity on the graph between the keyword mapping results of the project group instance p and those of the expert instance e; on the knowledge graph used, the nodes are distributed in a tree shape and have a certain hierarchy, so for each node np_i (with its count cp_i) in the academic knowledge graph node set N_P of the project group instance p and each node ne_i (with its count ce_i) in the academic knowledge graph node set N_E of the expert instance e, all paths from the top layer to the node are obtained from the graph, the pair of paths sharing the most nodes is found among all path pairs of the two nodes, and the similarity of that path pair is taken as the similarity sim(np_i, ne_i) between the two nodes; from the similarities between all nodes calculated in this way, the final keyword matching degree between the instances is calculated as follows:

Step A4, calculating the matching degree in each field between the project group instance and each expert instance, is specifically: the lowest-level items of the subject field and the technical field of the project group instance are compared with the lowest-level items of the subject field and the technical field of the expert instance, and the field matching degree is given according to the number of identical lowest-level items.
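The node-to-node similarity of step A3 and the field comparison of step A4 can be sketched as below. Several details are assumptions made only for illustration: the graph is given as a child-to-parents dictionary without cycles, the similarity of a path pair is taken as the number of shared nodes divided by the length of the longer path, and the field matching degree is normalised by the number of project group fields. The aggregation of node similarities into the final keyword matching degree is not shown.

```python
from typing import Dict, Iterable, List


def all_paths(node: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """All paths from a top-level node down to `node`, root first."""
    ups = parents.get(node, [])
    if not ups:
        return [[node]]
    return [path + [node] for up in ups for path in all_paths(up, parents)]


def node_similarity(a: str, b: str, parents: Dict[str, List[str]]) -> float:
    """Similarity of the path pair that shares the most nodes (step A3).

    The ratio shared / longer-path-length is an assumed similarity measure.
    """
    best_overlap, best_sim = -1, 0.0
    for pa in all_paths(a, parents):
        for pb in all_paths(b, parents):
            overlap = len(set(pa) & set(pb))
            if overlap > best_overlap:
                best_overlap = overlap
                best_sim = overlap / max(len(pa), len(pb))
    return best_sim


def field_match(project_fields: Iterable[str], expert_fields: Iterable[str]) -> float:
    """Share of the project group's lowest-level fields the expert also has (step A4).

    Project fields look like "chemistry/analytical chemistry/spectral analysis";
    expert fields are already lowest-level names.
    """
    project_lowest = {f.split("/")[-1].strip() for f in project_fields}
    if not project_lowest:
        return 0.0
    shared = project_lowest & {f.strip() for f in expert_fields}
    return len(shared) / len(project_lowest)
```

A node with several parents yields several root paths, which is why the comparison runs over all path pairs rather than over a single ancestor chain.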

The step 3 specifically includes:

step a: among all expert instances in the candidate sets, finding those that appear in the candidate sets at least L times, L being a lower limit, to form a set E_s; the remaining expert instances form another set E_t; each time an expert is to be selected for the final result, for each expert instance e in E_s, finding all project groups whose candidate set contains e and which have not yet obtained enough experts, sorting these project groups by number of remaining vacancies from large to small, taking the top-ranked ones, in a number not exceeding the upper limit on the number of reviews per expert, and calculating the average rank of the matching degree of e over them; the expert instance with the highest average rank is added to the final results of the project groups used to calculate its average rank and is deleted from E_s; when several experts share the highest average rank, the expert instance whose project groups used to calculate the average rank have the largest total number of remaining vacancies is selected, and if several experts also share that total, one of them is selected at random; during this process, when the number of project groups available for calculating the average rank of an expert instance is less than the lower limit L, that expert instance is deleted from E_s and added to E_t; the process of selecting an expert for the final result is repeated until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;

step b: each time an expert is to be selected for the final result, for each expert instance e in E_t, finding all project groups whose candidate set contains e and which have not yet obtained enough experts, sorting these project groups by number of remaining vacancies from large to small, taking the top-ranked ones, in a number not exceeding the upper limit on the number of reviews per expert, and calculating the average rank of the matching degree of e over them; the expert instance with the highest average rank is added to the final results of the project groups used to calculate its average rank and is deleted from E_t; when several experts share the highest average rank, the expert instance whose project groups used to calculate the average rank have the largest total number of remaining vacancies is selected, and if several experts also share that total, one of them is selected at random; during this process, when the number of project groups available for calculating the average rank of an expert instance is equal to 0, that expert instance is deleted from E_t; the process of selecting an expert for the final result is repeated until E_t is empty or all project groups have obtained enough experts.
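Steps a and b can be sketched as a two-phase greedy loop. This is a simplified illustration under assumptions: `candidates` maps each project group to its candidate experts ordered best first (so the rank is the list position), `need` gives the number of experts each group requires, `cap` is the upper limit on groups per expert, and the final random tie-break is replaced by a deterministic choice by expert id. All names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional, Set


def select_concentrated(candidates: Dict[str, List[str]], need: Dict[str, int],
                        cap: int, lower_limit: int = 2) -> Dict[str, List[str]]:
    final: Dict[str, List[str]] = {g: [] for g in candidates}
    counts = Counter(e for experts in candidates.values() for e in experts)
    pool_s = {e for e, c in counts.items() if c >= lower_limit}  # E_s
    pool_t = set(counts) - pool_s                                # E_t

    def vacancy(g: str) -> int:
        return need[g] - len(final[g])

    def usable_groups(e: str) -> List[str]:
        # Groups listing e that still have vacancies, most vacancies first.
        groups = [g for g, experts in candidates.items()
                  if e in experts and vacancy(g) > 0]
        groups.sort(key=lambda g: (-vacancy(g), g))
        return groups[:cap]

    def run(pool: Set[str], minimum: int, overflow: Optional[Set[str]]) -> None:
        while pool and any(vacancy(g) > 0 for g in final):
            best = None
            for e in sorted(pool):
                groups = usable_groups(e)
                if len(groups) < minimum:
                    pool.discard(e)          # too few groups left for e
                    if overflow is not None:
                        overflow.add(e)      # step a: move from E_s to E_t
                    continue
                avg_rank = mean(candidates[g].index(e) + 1 for g in groups)
                # Smaller average rank is better; then larger total vacancy.
                key = (avg_rank, -sum(vacancy(g) for g in groups))
                if best is None or key < best[0]:
                    best = (key, e, groups)
            if best is None:
                break
            _, expert, groups = best
            for g in groups:
                final[g].append(expert)
            pool.discard(expert)

    run(pool_s, lower_limit, pool_t)  # step a
    run(pool_t, 1, None)              # step b
    return final
```

Because a selected expert is assigned to every project group used for its average rank, experts that appear in many candidate sets cover several groups at once, which is what keeps the final result concentrated.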

The step 4 specifically includes:

step (1): for all experts who actively or passively quit the review of some project groups, deleting the experts from all the alternative sets and final results of the project groups which quit the review;

step (2): finding all expert instances that are present in the final extraction result of any project group, removing those whose number of occurrences in the final extraction results has reached the upper limit, and forming the remaining expert instances into a set E_a; each time an expert is to be selected for the final result, for each expert instance e in E_a, finding all project groups whose candidate set contains e, whose final result does not yet contain e, and which have not yet obtained enough experts, sorting these project groups by number of remaining vacancies from large to small, and taking the top-ranked project groups to calculate the average rank of the matching degree; the number of project groups used to calculate the average rank is limited so that the total number of project groups reviewed by the expert does not exceed the upper limit set by the review task; after this calculation has been completed for every expert, the expert instance with the highest average rank is added to the final results of the project groups used to calculate its average rank and is deleted from E_a; when several experts share the highest average rank, the expert instance whose project groups used to calculate the average rank have the largest total number of remaining vacancies is selected, and if several experts also share that total, one of them is selected at random; during this process, when the number of project groups available for calculating the average rank of an expert instance is equal to 0, that expert instance is deleted from E_a; the process of selecting an expert for the final result is repeated until E_a is empty or all project groups have obtained enough experts;

step (3): among all expert instances that are present in the candidate sets but not contained in the final result of any project group, finding those that appear in the candidate sets at least L times, L being a lower limit, to form a set E_s; the remaining expert instances form another set E_t; each time an expert is to be selected for the final result, for each expert instance e in E_s, finding all project groups whose candidate set contains e and which have not yet obtained enough experts, sorting these project groups by number of remaining vacancies from large to small, taking the top-ranked ones, in a number not exceeding the upper limit on the number of reviews per expert, and calculating the average rank of the matching degree of e over them; the expert instance with the highest average rank is added to the final results of the project groups used to calculate its average rank and is deleted from E_s; when several experts share the highest average rank, the expert instance whose project groups used to calculate the average rank have the largest total number of remaining vacancies is selected, and if several experts also share that total, one of them is selected at random; during this process, when the number of project groups available for calculating the average rank of an expert instance is less than the lower limit L, that expert instance is deleted from E_s and added to E_t; the process of selecting an expert for the final result is repeated until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;

step (4): each time an expert is to be selected for the final result, for each expert instance e in E_t, finding all project groups whose candidate set contains e and which have not yet obtained enough experts, sorting these project groups by number of remaining vacancies from large to small, taking the top-ranked ones, in a number not exceeding the upper limit on the number of reviews per expert, and calculating the average rank of the matching degree of e over them; the expert instance with the highest average rank is added to the final results of the project groups used to calculate its average rank and is deleted from E_t; when several experts share the highest average rank, the expert instance whose project groups used to calculate the average rank have the largest total number of remaining vacancies is selected, and if several experts also share that total, one of them is selected at random; during this process, when the number of project groups available for calculating the average rank of an expert instance is equal to 0, that expert instance is deleted from E_t; the process of selecting an expert for the final result is repeated until E_t is empty or all project groups have obtained enough experts.
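The first two sub-steps of the refill procedure can be sketched as follows; the remaining two sub-steps reuse the two-set selection of step 3 on experts not yet in any final result and are omitted here. The data layout (`candidates` ordered best first, `final`, `need`, `cap`, and a `withdrawals` dictionary from expert to the project groups left) and the deterministic tie-break by expert id are assumptions for the example.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def withdraw(candidates: Dict[str, List[str]], final: Dict[str, List[str]],
             withdrawals: Dict[str, List[str]]) -> None:
    """Step (1): drop each withdrawn expert from the groups they left (in place)."""
    for expert, groups in withdrawals.items():
        for g in groups:
            if expert in candidates[g]:
                candidates[g].remove(expert)
            if expert in final[g]:
                final[g].remove(expert)


def refill_from_active(candidates: Dict[str, List[str]], final: Dict[str, List[str]],
                       need: Dict[str, int], cap: int) -> None:
    """Step (2): refill vacancies with experts already in some final result."""
    load = Counter(e for experts in final.values() for e in experts)
    pool = {e for e, n in load.items() if n < cap}  # E_a

    def vacancy(g: str) -> int:
        return need[g] - len(final[g])

    while pool and any(vacancy(g) > 0 for g in final):
        best = None
        for e in sorted(pool):
            groups = [g for g in candidates
                      if e in candidates[g] and e not in final[g] and vacancy(g) > 0]
            groups.sort(key=lambda g: (-vacancy(g), g))
            groups = groups[: cap - load[e]]  # keep the expert's total within cap
            if not groups:
                pool.discard(e)
                continue
            avg_rank = mean(candidates[g].index(e) + 1 for g in groups)
            key = (avg_rank, -sum(vacancy(g) for g in groups))
            if best is None or key < best[0]:
                best = (key, e, groups)
        if best is None:
            break
        _, expert, groups = best
        for g in groups:
            final[g].append(expert)
        load[expert] += len(groups)
        pool.discard(expert)
```

Trying experts who are already reviewing other project groups first means a withdrawal does not, where avoidable, bring a new expert into the review.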

It should also be noted that:

1. The most time-consuming part of the method is steps ① to ⑤ of searching for the academic knowledge graph node most similar to each keyword when step A1 or step A2 is executed. For this reason, each time a whole extraction task is completed, the newly obtained keywords and their mapping results are stored in the corresponding table of the database. Before each extraction task starts, all known mapping results are first read from that table into a cache, ready for use at any time. Because the expert keywords rarely change except when new experts are added, this saves a great deal of the time spent mapping expert keywords in each extraction task.
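The persistence of mapping results described in this note can be sketched with Python's built-in `sqlite3` module. The table name `keyword_mapping`, its columns and the choice of SQLite are assumptions for illustration; the text only says that results are stored in a database table and read into a cache before each task.

```python
import sqlite3
from typing import Dict, Optional


def load_mappings(conn: sqlite3.Connection) -> Dict[str, Optional[str]]:
    """Read all known keyword -> node mappings into an in-memory cache."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_mapping ("
        "keyword TEXT PRIMARY KEY, node TEXT)"
    )
    # Each row is a (keyword, node) pair, so the cursor converts directly to a dict.
    return dict(conn.execute("SELECT keyword, node FROM keyword_mapping"))


def save_mappings(conn: sqlite3.Connection,
                  new_mappings: Dict[str, Optional[str]]) -> None:
    """Store the mappings obtained during one extraction task."""
    conn.executemany(
        "INSERT OR REPLACE INTO keyword_mapping (keyword, node) VALUES (?, ?)",
        list(new_mappings.items()),
    )
    conn.commit()
```

Storing a failed mapping as a row with a NULL node, as this sketch allows, would also avoid repeating a lookup that is known to fail; whether the method does so is not stated.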

2. According to the extraction requirements, the method allows arbitrary adjustment of the number of experts to be allocated to each project group instance, the maximum number of expert instances in each candidate set, and the number of times an expert may be allocated to the final results of project group instances. The lower limit L on the number of project groups associated with an expert in step 3 can also be adjusted, so that extraction tasks of different scales can be handled to a certain extent.

3. When the mapping from keywords to academic knowledge graph nodes is established, there is no restriction on the content of the keywords: natural language keywords in the form of short texts of arbitrary content can be input.

The invention has the following advantages:

1. Ease of use: the calculation model of the subject background matching degree is divided into two independent calculations, a field relevance based on the field data and a keyword relevance based on the keyword data; when one of the two kinds of data is incomplete, a relevance result can still be obtained from the other alone. Moreover, while it makes use of relatively fixed field information that is easy to match, the method places no restriction on the content of the Chinese and English keywords, which may be natural language short texts of arbitrary content. The whole method is therefore flexible, easy to use and widely applicable.

2. Novelty: the method uses, as auxiliary knowledge, data from a discipline knowledge graph such as the Microsoft Academic Graph, the hierarchical structure of the field classification, and the links between English Wikipedia entry pages. Compared with traditional text-based similarity measures on character strings, applying a large data set such as Wikipedia greatly improves the quality of entity matching and mapping.

3. Evaluability: the expert extraction result of the method is obtained by quantitative analysis based on the relevance of subject backgrounds. During execution of the method, the knowledge graph can be used to analyse how close the subject backgrounds of the project groups within a group are to one another, and how close the subject areas in which an expert is skilled are to the subject backgrounds of the project groups that expert reviews. The method itself thus provides a criterion for evaluating the soundness of the project grouping results and of the recommended experts.

4. Practicability: the method is of great practical significance. It can automatically select suitable review experts for the project groups that scientific research and auditing organizations receive for review, which saves a great deal of manual work. At the same time, provided that the subject background matching degree is not greatly affected, the required experts can be selected or supplemented for the project groups in as concentrated a manner as possible, so that as few experts as possible are finally mobilized and the manpower and material resources consumed in the review process are reduced.

Drawings

FIG. 1 is a schematic diagram of a knowledge-graph structure.

Detailed Description

The present invention is described in further detail below with reference to specific examples. Except for the contents specifically mentioned below, the procedures, conditions, experimental methods and the like for carrying out the invention are common general knowledge in the art, and the invention places no particular limitation on them.

The invention comprises the following specific steps:

Step 1: for the extraction task corresponding to a given review activity, obtaining all project groups of that round

The complete set P of all project group instances entered manually by the project group proposers for a given expert extraction task is obtained from a database. Each project group instance p ∈ P has the attributes: project group id, project group discipline field, project group technical field, and project group keyword set. Each field has a form such as "chemistry/analytical chemistry/spectral analysis" and consists of several levels, usually three or four, separated by slashes.

Step 2: obtaining experts with high subject background matching degree for each project group as candidate experts of the project group

This step uses each project group instance p ∈ P obtained in step 1 and the complete set E of candidate expert instances obtained from an expert information database. Each expert instance e ∈ E has the attributes: expert id, discipline field, technical field, and Chinese and English keyword sets. Each field is a number of Chinese phrases whose content is the lowest-level item of the same classification used for the fields of the project group instances; for example, what is recorded as "chemistry/analytical chemistry/spectral analysis" in a project group is recorded as "spectral analysis" in an expert instance. For each project group, the matching degree with each expert is calculated from the field and keyword information, and a candidate expert set is then obtained for the project group, containing the expert instances whose calculated subject background matching degree with that project group instance is high.
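The two kinds of instance described above can be represented, for illustration, by small data classes. The class and attribute names are hypothetical, and allowing several discipline and technical fields per project group is an assumption; the text specifies only which attributes exist and that fields use a slash-separated hierarchy whose lowest level matches the expert's field entries.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProjectGroup:
    group_id: str
    discipline_fields: List[str]   # e.g. "chemistry/analytical chemistry/spectral analysis"
    technical_fields: List[str]
    keywords: Set[str] = field(default_factory=set)

    def lowest_fields(self) -> Set[str]:
        """Lowest-level item of every field path, comparable with expert fields."""
        paths = self.discipline_fields + self.technical_fields
        return {p.split("/")[-1].strip() for p in paths}


@dataclass
class Expert:
    expert_id: str
    discipline_fields: Set[str]    # already lowest-level, e.g. "spectral analysis"
    technical_fields: Set[str]
    keywords: Set[str] = field(default_factory=set)

    def all_fields(self) -> Set[str]:
        return self.discipline_fields | self.technical_fields
```

With these classes, `group.lowest_fields() & expert.all_fields()` gives the lowest-level field items a project group and an expert have in common.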

Step 3: selecting experts as the final extraction result for all project groups in the extraction task, in a unified and concentrated manner

From the experts in the candidate sets obtained in step 2 for all project groups, experts are selected for the project groups as the final result according to the number actually required. After this step, the number of experts appearing in the final result is kept as small as possible without affecting the overall matching degree. For each project group, this step yields a subset of the high-matching expert set obtained for it in step 2, which is the result of the extraction task.

Step 4: when experts actively or passively withdraw from the review, refilling the project groups in a concentrated manner

When this happens, the experts are first removed from the candidate sets and final results of the project groups they have left, and other experts are then selected from the candidate sets to fill the vacancies. As in the previous step, the total number of experts in the final result is kept as small as possible after the refill.

The step 2 specifically comprises:

step A1: searching academic knowledge graph nodes with the highest degree of association for each keyword by using Chinese and English keyword sets in the project group examples, establishing mapping, and acquiring a set of all academic knowledge graph nodes corresponding to all keywords below each project group example;

step A2: searching academic knowledge graph nodes with the highest degree of association for each keyword by using Chinese and English keyword sets in the expert examples, establishing mapping, and finally obtaining a set of all academic knowledge graph nodes corresponding to all keywords below each expert example;

step A3: calculating the matching degree of the project group instances and each expert instance on the keywords by using the academic knowledge graph node sets of the project group instances and the expert instances obtained in the step A1 and the step A2;

step A4: calculating the matching degree of the project group examples and the expert examples on various fields by using the field information in each project group example and the field information in each expert example;

step A5: multiplying the keyword matching degree in the step A3 and the field matching degree in the step A4 by respective weights and summing the results to serve as the subject background matching degree between the project group examples and the expert examples;

step A6: assuming that each project group instance is allocated at most k experts, then, according to the subject background matching degree between each project group instance and each expert instance obtained in step A5, for each project group instance p_i all expert instances are sorted by their subject background matching degree with p_i, the first k are taken, and these k expert instances form the review expert set E_i corresponding to p_i. Each project group instance is thus allocated the k expert instances with relatively the highest matching degree as its candidate experts, forming expert instance subsets E₁～E_n in one-to-one correspondence with the project group instances p₁～p_n.

In steps A1 and A2, a mapping f: keyword → node from keywords to academic knowledge graph nodes is established by analysing the data of English Wikipedia entry pages; specifically, using the hyperlinks in Wikipedia entry pages to find and return the academic knowledge graph node most similar to the current keyword or its English translation requires the following steps:

step one: query the local and online Wikipedia databases and, for the Wikipedia entry page corresponding to each graph node, record all links in the page that point to other Wikipedia entries. For example, suppose that the English Wikipedia entry "Artificial intelligence" contains only the following text: "In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals", in which intelligence, machines, humans and animals are hyperlinks to Wikipedia entry pages; these entry names are saved as the hyperlink set of the graph node "Artificial intelligence". All (graph node, link set) pairs obtained in this way are cached in a file. After step one has been executed once, it is executed again only when a new graph node is added; when the graph is unchanged, execution starts directly from step two;

step two: if the keyword to be mapped is Chinese, call the Google Translate function to translate it into English and then execute step three; if the keyword is English, execute step three directly;

step three: compare the keyword or its translation with the name strings of the graph nodes; if a graph node is found whose name string is exactly identical to the keyword, return that node directly as the mapping result of the keyword; otherwise, execute step four;

step four: query the Wikipedia database; if the keyword has a corresponding entry page, and the entry pages of some graph nodes share hyperlinks to the same entry pages with the keyword's entry page, return the graph node whose entry page has the largest number of hyperlinks in common with the keyword's entry page as the mapping result; if the keyword has no same-name entry in Wikipedia, or no graph node's entry page shares any hyperlink target with the keyword's entry page, execute step five;

step five: call the Wikipedia API to retrieve at most 10 Wikipedia entry pages most relevant to the keyword, merge the entry hyperlinks in these pages into one set, and execute step six;

step six: if the number of pages returned by the API search is not 0 and the entry pages of some graph nodes have links in common with the hyperlink set obtained in step five, return the graph node with the largest number of entry links in common with that link set as the mapping result; if the number of pages returned by the API search is 0, or the link set obtained in step five still has no link in common with the entry page of any graph node, the keyword mapping fails.
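The fallback chain of the mapping (exact name match, then link overlap of the keyword's own entry page, then link overlap of the searched pages) can be sketched in Python as below. The sketch makes no network calls: `page_links`, `search` and `translate` are hypothetical callables standing in for the Wikipedia lookup, the Wikipedia search API and the translation service, and `node_links` stands for the cached link sets from the first step.

```python
from typing import Callable, Dict, List, Optional, Set


def best_overlap(links: Set[str], node_links: Dict[str, Set[str]]) -> Optional[str]:
    """Return the graph node sharing the most hyperlinks with `links`, or None if none share any."""
    best, best_count = None, 0
    for node, cached in node_links.items():
        common = len(links & cached)
        if common > best_count:
            best, best_count = node, common
    return best


def map_keyword(keyword: str,
                node_links: Dict[str, Set[str]],
                page_links: Callable[[str], Optional[Set[str]]],
                search: Callable[[str], List[str]],
                translate: Callable[[str], str] = lambda s: s) -> Optional[str]:
    kw = translate(keyword)                  # translate when needed (identity by default)
    if kw in node_links:                     # exact match with a graph node name
        return kw
    own = page_links(kw)                     # links of the keyword's own entry page, if any
    if own:
        node = best_overlap(own, node_links)
        if node is not None:
            return node
    merged: Set[str] = set()                 # links of at most 10 searched pages
    for title in search(kw)[:10]:
        merged |= page_links(title) or set()
    return best_overlap(merged, node_links)  # None means the mapping fails


# Toy data, invented for illustration only.
node_links = {
    "Inorganic chemistry": {"Chemistry", "Molybdenum", "Sulfur"},
    "Artificial intelligence": {"Intelligence", "Machine", "Human"},
}
pages = {
    "Deep learning": {"Machine", "Intelligence", "Neuron"},
    "Some sulfide page": {"Molybdenum", "Sulfur"},
}
page_links = pages.get
search = lambda q: ["Some sulfide page"] if q == "molybdenum disulfide" else []

print(map_keyword("molybdenum disulfide", node_links, page_links, search))
```

Passing the lookups in as callables keeps the cascade testable offline; a real deployment would wrap whatever Wikipedia client is actually in use.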

In step A3, the keyword matching degree between a project group instance and an expert instance is calculated from the sum of the path similarities, on the knowledge graph, between every pair of nodes drawn from the two instances' sets of keyword mapping nodes. On the knowledge graph used, the nodes are distributed hierarchically. Therefore, for each node np_i with its count cp_i in the academic knowledge graph node set N_P of the project group instance, and each node ne_i with its count ce_i in the academic knowledge graph node set N_e of the expert instance, all paths from the top layer to each of the two nodes are first obtained from the graph; among all path pairs, the pair with the most coinciding nodes is found, and the similarity between that pair of paths is taken as the similarity sim(np_i, ne_i) between the two nodes. The final keyword matching degree between the instances is calculated as follows:
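The formula itself is not reproduced in this text. The following Python sketch is therefore only an assumption based on the worked example in the embodiment, where the path similarity of a node pair is multiplied by both node counts and the products are accumulated over all node pairs. The linear per-shared-node `weight` is hypothetical; 0.25 is chosen only so that one shared node yields the 0.25 quoted in the embodiment.

```python
from typing import Dict, List, Sequence

Path = Sequence[str]  # nodes from the top layer down to the target node


def path_similarity(paths_a: List[Path], paths_b: List[Path], weight: float) -> float:
    """Similarity of the path pair sharing the most nodes (hypothetical linear weighting)."""
    most_shared = max(len(set(pa) & set(pb)) for pa in paths_a for pb in paths_b)
    return weight * most_shared


def keyword_match(group_nodes: Dict[str, int], expert_nodes: Dict[str, int],
                  paths: Dict[str, List[Path]], weight: float) -> float:
    """Accumulate cp_i * ce_j * sim(np_i, ne_j) over all node pairs (assumed form)."""
    total = 0.0
    for np_i, cp_i in group_nodes.items():
        for ne_j, ce_j in expert_nodes.items():
            total += cp_i * ce_j * path_similarity(paths[np_i], paths[ne_j], weight)
    return total


# Placeholder node names; the two paths share only the node "medicine".
paths = {
    "np1": [["medicine", "radiology", "np1"]],
    "ne1": [["medicine", "pathology", "ne1"]],
}
score = keyword_match({"np1": 2}, {"ne1": 1}, paths, weight=0.25)
print(score)  # 0.5
```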

In step A4, the domain matching degree between a project group instance and an expert instance is calculated as follows. The lowest layer of the subject field and of the technical field of the project group instance are taken out (for example, from chemistry/analytical chemistry/spectral analysis, spectral analysis is taken) and used as the two sets Fpr_i and Fpt_i respectively; for the expert instance, the recorded subject field and technical field contents are used directly as the two sets Fer_i and Fet_i. The intersection of Fpr_i and Fer_i and the intersection of Fpt_i and Fet_i are then computed separately; the more domain information the intersections contain, the higher the domain matching degree.
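A minimal Python sketch of this comparison follows. The scoring rule `base + step * overlap` is an assumption: the text only states that more overlap gives a higher degree, and the values 12 and 2 are chosen so that the sketch reproduces the base value 12 and the highest value 16 mentioned in the embodiment. The hierarchical field strings in the usage lines are hypothetical.

```python
from typing import Iterable


def lowest_level(field: str) -> str:
    """'chemistry/analytical chemistry/spectral analysis' -> 'spectral analysis'."""
    return field.split("/")[-1].strip()


def domain_match(group_subject: Iterable[str], group_tech: Iterable[str],
                 expert_subject: Iterable[str], expert_tech: Iterable[str],
                 base: int = 12, step: int = 2) -> int:
    fpr = {lowest_level(f) for f in group_subject}  # Fpr: lowest-level subject fields of the group
    fpt = {lowest_level(f) for f in group_tech}     # Fpt: lowest-level technical fields of the group
    fer, fet = set(expert_subject), set(expert_tech)
    overlap = len(fpr & fer) + len(fpt & fet)
    return base + step * overlap                    # hypothetical scoring rule


matched = domain_match(["medicine/pharmacy"],
                       ["drug technology/novel drug delivery formulation technology"],
                       ["pharmacy"],
                       ["novel drug delivery formulation technology"])
unmatched = domain_match(["chemistry/analytical chemistry/spectral analysis"], [],
                         ["pharmacy"], [])
print(matched, unmatched)  # 16 12
```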

The method for centrally selecting experts as final results described in step 3 comprises the following specific steps:

step a: from all expert instances in the candidate sets, find those that appear in the candidate sets at least L times, L being a given lower limit, to form a set E_s; the remaining expert instances form another set E_t. Each time an expert is to be selected for the final result, the following is done for every expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking. The expert instance with the highest average ranking (the numerically smallest average rank) is then added to the final results of the project groups used to compute its average ranking and deleted from E_s. When several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random. During this process, when the number of project groups available for computing an expert instance's average ranking is less than the lower limit L, that expert instance is deleted from E_s and added to E_t. The selection is repeated until E_s is empty or all project groups have obtained enough experts;

step b: each time an expert is to be selected for the final result, the following is done for every expert instance in E_t: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking. The expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_t. When several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random. During this process, when the number of project groups available for computing an expert instance's average ranking is 0, that expert instance is deleted from E_t. The selection is repeated until E_t is empty or all project groups have obtained enough experts.
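Steps a and b can be sketched in Python as follows. This is an illustrative reading of the text rather than the patented implementation: the data layout `ranks[group][expert]`, the function names, and the injected random generator are assumptions, and "highest average ranking" is taken to mean the numerically smallest average rank. The data are the candidate rankings of the three project groups from the embodiment (named "g1" to "g3" here).

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Ranks = Dict[str, Dict[int, int]]  # ranks[group][expert] = rank in that group's candidate set


def usable_groups(expert: int, ranks: Ranks, chosen: Dict[str, Set[int]],
                  need: int, cap: int) -> List[str]:
    """Groups listing `expert` that still have vacancies, most vacancies first, at most `cap`."""
    groups = [g for g in ranks
              if expert in ranks[g] and expert not in chosen[g] and len(chosen[g]) < need]
    groups.sort(key=lambda g: need - len(chosen[g]), reverse=True)
    return groups[:cap]


def fill(pool: Set[int], min_groups: int, overflow: Optional[Set[int]], ranks: Ranks,
         chosen: Dict[str, Set[int]], need: int, cap: int, rng: random.Random) -> None:
    """Repeatedly pick the best expert from `pool` until it is empty or all groups are full."""
    while pool and any(len(chosen[g]) < need for g in ranks):
        scored: List[Tuple[float, int, int, List[str]]] = []
        for e in sorted(pool):
            groups = usable_groups(e, ranks, chosen, need, cap)
            if len(groups) < min_groups:
                pool.discard(e)              # too few usable groups: leave this pool
                if overflow is not None:
                    overflow.add(e)          # step a moves such experts to E_t
                continue
            avg = sum(ranks[g][e] for g in groups) / len(groups)
            vacancies = sum(need - len(chosen[g]) for g in groups)
            scored.append((avg, -vacancies, e, groups))
        if not scored:
            break
        best = min(s[:2] for s in scored)    # smallest average rank, then most vacancies
        _, _, e, groups = rng.choice([s for s in scored if s[:2] == best])
        for g in groups:
            chosen[g].add(e)
        pool.discard(e)


def concentrated_select(ranks: Ranks, need: int, lower: int, cap: int,
                        rng: random.Random) -> Dict[str, Set[int]]:
    chosen: Dict[str, Set[int]] = {g: set() for g in ranks}
    counts: Dict[int, int] = {}
    for listing in ranks.values():
        for e in listing:
            counts[e] = counts.get(e, 0) + 1
    e_s = {e for e, c in counts.items() if c >= lower}   # experts appearing at least L times
    e_t = set(counts) - e_s
    fill(e_s, lower, e_t, ranks, chosen, need, cap, rng)  # step a
    fill(e_t, 1, None, ranks, chosen, need, cap, rng)     # step b
    return chosen


ranks = {
    "g1": {10: 1, 18: 2, 16: 3, 25: 4, 32: 5},
    "g2": {10: 1, 18: 2, 25: 3, 16: 4, 32: 4},
    "g3": {32: 1, 12: 1, 13: 3, 16: 3, 11: 3, 88: 3},
}
result = concentrated_select(ranks, need=3, lower=2, cap=20, rng=random.Random(0))
print(result)
```

Because ties are broken at random, the third expert of the first two project groups may be either 16 or 32, so no fixed output is shown; experts 10 and 18 are selected for both of those groups in every run.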

The method in step 4 for filling vacancies when an expert withdraws from the review comprises the following specific steps:

step (2): find all expert instances that appear in the final extraction result of any project group, remove those whose number of occurrences in the final extraction results has reached the upper limit, and form the remaining expert instances into a set E_a. Each time an expert is to be selected for the final result, the following is done for every expert instance in E_a: all project groups whose candidate set contains the expert instance, whose final result does not yet include it, and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups are used to compute the expert's average matching-degree ranking; the number of project groups used is limited so that the total number of project groups reviewed by the expert does not exceed the set upper limit of review tasks. After this has been computed for every expert, the expert instance with the highest average ranking is added to the final results of the project groups used to compute its average ranking and deleted from E_a. When several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random. During this process, when the number of project groups available for computing an expert instance's average ranking is 0, that expert instance is deleted from E_a. The selection is repeated until E_a is empty or all project groups have obtained enough experts.

step (3): from all expert instances that exist in the candidate sets and are not contained in the final result of any project group, find those that appear in the candidate sets at least L times, L being a given lower limit, to form a set E_s; the remaining expert instances form another set E_t. Each time an expert is to be selected for the final result, the following is done for every expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking. The expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_s. When several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random. During this process, when the number of project groups available for computing an expert instance's average ranking is less than the lower limit L, that expert instance is deleted from E_s and added to E_t. The selection is repeated until E_s is empty or all project groups have obtained enough experts;

step (4): each time an expert is to be selected for the final result, the following is done for every expert instance in E_t: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking. The expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_t. When several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random. During this process, when the number of project groups available for computing an expert instance's average ranking is 0, that expert instance is deleted from E_t. The selection is repeated until E_t is empty or all project groups have obtained enough experts.
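Step (2), which prefers experts who already review some project group, can be sketched in Python as below; steps (3) and (4) follow the same pattern as steps a and b of step 3 and are omitted. The function name, the data layout and the injected random generator are assumptions. The data reproduce the state of the embodiment after expert 16 has withdrawn from the second and third project groups (named "g2" and "g3" here).

```python
import random
from typing import Dict, List, Set, Tuple


def refill_with_selected(ranks: Dict[str, Dict[int, int]], chosen: Dict[str, Set[int]],
                         need: int, cap: int, rng: random.Random) -> Dict[str, Set[int]]:
    """Fill vacancies using only experts already present in some final result."""
    load: Dict[int, int] = {}                        # groups currently reviewed per expert
    for members in chosen.values():
        for e in members:
            load[e] = load.get(e, 0) + 1
    pool = {e for e, n in load.items() if n < cap}   # the set E_a
    while pool and any(len(chosen[g]) < need for g in ranks):
        scored: List[Tuple[float, int, int, List[str]]] = []
        for e in sorted(pool):
            groups = [g for g in ranks
                      if e in ranks[g] and e not in chosen[g] and len(chosen[g]) < need]
            groups.sort(key=lambda g: need - len(chosen[g]), reverse=True)
            groups = groups[:cap - load[e]]          # keep the expert within the review cap
            if not groups:
                pool.discard(e)                      # nothing left for this expert
                continue
            avg = sum(ranks[g][e] for g in groups) / len(groups)
            vacancies = sum(need - len(chosen[g]) for g in groups)
            scored.append((avg, -vacancies, e, groups))
        if not scored:
            break
        best = min(s[:2] for s in scored)            # smallest average rank, then most vacancies
        _, _, e, groups = rng.choice([s for s in scored if s[:2] == best])
        for g in groups:
            chosen[g].add(e)
        load[e] += len(groups)
        pool.discard(e)
    return chosen


ranks = {
    "g1": {10: 1, 18: 2, 16: 3, 25: 4, 32: 5},
    "g2": {10: 1, 18: 2, 25: 3, 32: 4},
    "g3": {32: 1, 12: 1, 13: 3, 11: 3, 88: 3},
}
chosen = {"g1": {10, 18, 16}, "g2": {10, 18}, "g3": {32, 12}}
refill_with_selected(ranks, chosen, need=3, cap=20, rng=random.Random(0))
print(chosen["g2"])
```

In this data only expert 32 has a usable project group, so the vacancy in "g2" is filled by expert 32 and the vacancy in "g3" is left for the later steps.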

Examples

In step 1, a complete set of project group instances as shown in table 1 is obtained by reading the project group data corresponding to one extraction task in the database:

TABLE 1 Complete set of project group instances

During the execution of step 2, when step A1 is executed for a given project group instance, steps one to six are first applied to each keyword to find the academic knowledge graph node most similar to it. For example, the translation result "feature selection" of the keyword "feature screening" of the project group with id 4 has a same-name node in the graph, so the same-name graph node "laser science" is returned in step three. After the keyword "molybdenum disulfide" of project group 1 is translated into English ("molybdenum disulfide") by Google Translate, the ten entry pages returned by calling the Wikipedia API have the largest number of links in common with the entry page of the graph node "inorganic chemistry", so the graph node "inorganic chemistry" is returned in step six. After step A1 is performed, the keyword-to-node mapping shown in Table 2 and the academic knowledge graph node set of the project group shown in Table 3 are obtained (taking project group 1 as an example):

TABLE 2 Mapping f: keyword → node

TABLE 3 Academic knowledge graph node set of the project group

After step A2 is performed, the academic knowledge graph node sets of all expert instances are obtained in the same form. Assume that the academic knowledge graph node set obtained for expert instance A is as shown in Table 4.

TABLE 4 Academic knowledge graph node set of the expert

When step A3 is executed, the keyword matching degree between the project group and every expert is calculated. Taking one node of project group 1 and one node of expert A as an example, all paths from the top-layer node of the knowledge graph to each of the two nodes are obtained, as shown in fig. 1, and the path pairs are compared to find the maximum path similarity, that is, the largest number of nodes repeated on a pair of paths. In the figure each of the two nodes has a unique path, so the most similar pair of paths is "diagnostic medical knowledge" ← radiology ← medicine and "soft tissue pathology" ← pathology ← medicine, and the number of identical nodes is 1. Applying a certain weight gives a path similarity of 0.25 between the two nodes. For project group 1 and expert A, this result must further be multiplied by the node count 2 of the project group and the node count 1 of the expert to give the similarity of the project group and the expert on these two nodes. The similarity is calculated in this way for every pairing of nodes between the expert and the project group, and the accumulated total is taken as the keyword matching degree between the project group and the expert.

In step A4, assume that the subject field of expert A is "pharmacy" and the technical field is "novel drug delivery formulation technology". Expert A coincides with project group 3 in the lowest-level field of both the subject field and the technical field, so the domain matching degree between them takes the highest value, 16. No other project group shares a lowest-level field with expert A, so their domain matching degree is the base value, 12.

When step A5 is executed, the keyword matching degree calculated in step A3 and the domain matching degree obtained in step A4 by comparing the overlap of the lowest-level fields are multiplied by their respective weights and added, giving the subject background matching degree between each project group instance and every expert instance.
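As a small illustration, the weighted combination can be written in Python as below. The weights are not given in the text; the equal weights of 0.5 used here are purely hypothetical, and the inputs are toy values.

```python
def subject_background_match(keyword_score: float, domain_score: float,
                             w_keyword: float = 0.5, w_domain: float = 0.5) -> float:
    """Weighted sum of the keyword and domain matching degrees (weights are hypothetical)."""
    return w_keyword * keyword_score + w_domain * domain_score


# Toy inputs: a keyword matching degree of 0.5 and a domain matching degree of 16.
score = subject_background_match(0.5, 16)
print(score)  # 8.25
```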

When step A6 is executed, for each project group instance the expert instances can simply be ranked by the calculated matching degree score, and the top-ranked expert instances are selected as the candidate experts of the project group according to the requirements of the current extraction. In this example, the ranking range k is set to 5.

After the above steps are completed, a candidate expert set consisting of the experts ranked within the top 5 by matching degree is obtained for each project group instance. Assume that in this example, with three project groups, the lower limit L is set to 2, each project group in this review activity needs 3 experts, and the upper limit on the number of project groups each expert may review is 20. The candidate experts and fill status of each project group are shown in Tables 5 to 7:

TABLE 5 Candidate experts and fill status for project group 1

Expert id	Ranking	Selected
10	1	No
18	2	No
16	3	No
25	4	No
32	5	No

TABLE 6 Candidate experts and fill status for project group 2

Expert id	Ranking	Selected
10	1	No
18	2	No
25	3	No
16	4	No
32	4	No

TABLE 7 Candidate experts and fill status for project group 3

Expert id	Ranking	Selected
32	1	No
12	1	No
13	3	No
16	3	No
11	3	No
88	3	No

Then, during the execution of step 3, step a first finds all expert instances that appear in the candidate sets at least the lower limit of 2 times and forms them into a set; in this example this set E_s is {10,18,16,25,32}, and the set E_t of the remaining experts is {12,13,11,88}.

When one expert is to be selected from E_s for the final result, for each expert in the set, all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked ones, their number not exceeding the upper limit for this extraction, are used to compute the average matching ranking. Taking expert 10 as an example, the project groups whose candidate sets include expert 10 are project groups 1 and 2. Since both project groups have 3 vacancies, the sorted order may be either 1, 2 or 2, 1. Since the upper limit on the number of project groups per expert is 20, all project groups in the sorted result can be used directly, giving an average ranking of (1+1)/2 = 1. In the same way, the average ranking is 2 for expert 18, 3.33 for expert 16, 3.5 for expert 25 and 3.33 for expert 32. Expert 10 is therefore added to the final results of project groups 1 and 2, which took part in computing expert 10's average ranking, and is deleted from E_s. Thereafter, each time an expert must be selected for the final result, the selection is made in the same manner. In this example, expert 18 is added next to project groups 1 and 2; then, because expert 16 and expert 32 have the same average ranking and the same vacancy total, one of them is chosen at random, here expert 16, and added to project groups 1, 2 and 3. At this point, if another expert were to be selected, the experts remaining in E_s are 25 and 32, and the only project group among their candidate sets that still has vacancies is project group 3; that is, the number of project groups available for computing the average ranking is below the lower limit of 2, so experts 25 and 32 are deleted from E_s and added to E_t. Since E_s is now empty and not all project groups have obtained all the experts they need, step b is entered.
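The average rankings quoted for the first selection round can be checked with a few lines of Python. The dictionary below simply restates the rankings of Tables 5 to 7; the group names "g1" to "g3" and the function name are illustrative.

```python
ranks = {
    "g1": {10: 1, 18: 2, 16: 3, 25: 4, 32: 5},
    "g2": {10: 1, 18: 2, 25: 3, 16: 4, 32: 4},
    "g3": {32: 1, 12: 1, 13: 3, 16: 3, 11: 3, 88: 3},
}


def average_rank(expert: int) -> float:
    """Average rank over every group listing the expert (all groups are still empty here)."""
    listed = [listing[expert] for listing in ranks.values() if expert in listing]
    return sum(listed) / len(listed)


averages = {e: round(average_rank(e), 2) for e in (10, 18, 16, 25, 32)}
print(averages)  # {10: 1.0, 18: 2.0, 16: 3.33, 25: 3.5, 32: 3.33}
```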

In step b, when one expert is to be selected from E_t = {12,13,11,88,25,32} for the final result, the average ranking is computed in the same way for each expert in the set. Taking expert 12 as an example, the only project group whose candidate set includes expert 12 and which still has a vacancy is project group 3, so the average ranking of expert 12 at this point is 1. After the average rankings of all experts in the set have been computed, experts 12 and 32 have the highest average ranking and the same total of remaining vacancies over the project groups used to compute it, so either one is selected, added to the result of project group 3, and deleted from E_t. The average ranking is then computed again in the same manner, and the other of experts 12 and 32 is added to the final result of project group 3. All project groups have now obtained the experts they need, so step 3 is complete. The whole process is shown in Table 8:

TABLE 8 Operation of the example case in steps a to b

In this way, the finally selected experts are kept as concentrated as possible without unduly affecting the overall matching degree.

When an expert actively or passively withdraws from the review, the vacancies are filled according to steps (1) to (4) of step 4. Assume that expert 16 withdraws from the review of project groups 2 and 3; after step (1) is executed to delete the expert from the corresponding candidate sets and final results, the candidate experts and fill status of each project group are as shown in Tables 9 to 11:

TABLE 9 Candidate experts and fill status for project group 1

Expert id	Ranking	Selected
10	1	Yes
18	2	Yes
16	3	Yes
25	4	No
32	5	No

TABLE 10 Candidate experts and fill status for project group 2

Expert id	Ranking	Selected
10	1	Yes
18	2	Yes
25	3	No
32	4	No

TABLE 11 Candidate experts and fill status for project group 3

Expert id	Ranking	Selected
32	1	Yes
12	1	Yes
13	3	No
11	3	No
88	3	No

In step (2), all expert instances already present in the final extraction result of any project group are obtained, those whose number of occurrences in the final extraction results has reached the upper limit are removed, and the remaining expert instances form the set E_a = {10,18,16,32,12}. When an expert is to be selected for the final result, for each expert instance in E_a, all project groups whose candidate set contains the expert instance, whose final result does not yet include it and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups are used to compute the average matching-degree ranking. Taking expert 32 as an example, the only project group whose candidate set includes expert 32, whose final result does not yet include expert 32 and which still lacks experts is project group 2, and the average ranking is calculated to be 4. In this example, the number of project groups available to the other experts in E_a for computing the average ranking is 0 in every case, so these experts are removed from E_a. Expert 32 now has the highest average ranking, so this expert instance is added to the final result of project group 2, which was used to compute its average ranking, and is removed from E_a. E_a is now empty, so step (3) is entered.

In step (3), from all expert instances that exist in the candidate sets and are not contained in the final result of any project group, those appearing in the candidate sets at least the lower limit of 2 times are found and form the set E_s = {25}; the rest form the set E_t = {13,11,88}. When an expert is to be selected for the final result, there is no project group whose candidate set includes expert 25 and which still lacks experts; that is, the number of project groups for computing the average ranking is 0, below the lower limit of 2, so expert 25 is deleted from E_s. Since E_s is now empty, step (4) is entered. In step (4), the expert set E_t is {13,11,88}. When an expert is to be selected for the final result, taking expert 11 as an example, the only project group whose candidate set contains expert 11 and which still has a vacancy is project group 3, so the average ranking of expert 11 at this point is 3. By the same method, the average rankings of expert 13 and expert 88 are both calculated to be 3. Since the average rankings are the same and the totals of remaining vacancies over the project groups used to compute them are also the same, one expert is selected at random, here expert 11, and added to the final result of project group 3. All project groups have now obtained the experts they need, and step (4) ends. In this way, all project groups are replenished with experts while the addition of new experts is avoided as far as possible. The above process is shown in Table 12:

TABLE 12 Operation of the example case in steps (1) to (4)

The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims.

Claims

1. A method for intensively extracting experts based on a discipline knowledge graph is characterized by comprising the following specific steps:

2. The method for intensively extracting experts based on a discipline knowledge graph according to claim 1, wherein step 2 specifically comprises:

3. The method for intensively extracting experts based on a discipline knowledge graph according to claim 2, wherein searching for the academic knowledge graph node with the highest degree of association for each keyword and establishing the mapping is specifically: establishing a mapping f: keyword → node from keywords to academic knowledge graph nodes, realized by analyzing Wikipedia entry page data; specifically, the entry page data of Wikipedia is used to find and return the academic knowledge graph node most similar to a keyword by executing the following steps:

4. The method for intensively extracting experts based on a discipline knowledge graph according to claim 2, wherein in step A3 the keyword matching degree between each project group instance and each expert instance is calculated as follows: the path similarity on the graph between the keyword mapping results of the project group instance p and the expert instance e is calculated; on the knowledge graph used, the nodes are distributed in a tree shape with a certain hierarchy, so for each node np_i with its count cp_i in the academic knowledge graph node set N_P of the project group instance p, and each node ne_i with its count ce_i in the academic knowledge graph node set N_e of the expert instance e, all paths from the top layer to the nodes are obtained from the graph, the pair of paths with the most overlapping nodes is found among all path pairs of the two sides, and the similarity between that path pair is taken as the similarity sim(np_i, ne_i) between the two nodes; from the similarities calculated between all nodes, the final keyword matching degree between the instances is calculated as follows:

5. The method for intensively extracting experts based on a discipline knowledge graph according to claim 2, wherein in step A4 the domain matching degree between each project group instance and each expert instance is calculated as follows: the lowest-level content of the subject field and the technical field of the project group instance is compared with the lowest-level content of the subject field and the technical field of the expert instance, and the domain matching degree is given according to the number of identical lowest-level fields.

6. The method for intensively extracting experts based on a discipline knowledge graph according to claim 1, wherein step 3 specifically comprises:

step a: from all expert instances in the candidate sets, find those that appear in the candidate sets at least L times, L being a given lower limit, to form a set E_s; the remaining expert instances form another set E_t; each time an expert is to be selected for the final result, the following is done for every expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking; the expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_s; when several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected, and if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random; during this process, when the number of project groups available for computing an expert instance's average ranking is less than the lower limit L, that expert instance is deleted from E_s and added to E_t; the selection is repeated until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;

step b: each time an expert is to be selected for the final result, the following is done for every expert instance in E_t: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking; the expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_t; when several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected, and if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random; during this process, when the number of project groups available for computing an expert instance's average ranking is 0, that expert instance is deleted from E_t; the selection is repeated until E_t is empty or all project groups have obtained enough experts.

7. The method for intensively extracting experts based on a discipline knowledge graph according to claim 1, wherein step 4 specifically comprises:

step (2): find all expert instances that appear in the final extraction result of any project group, remove those whose number of occurrences in the final extraction results has reached the upper limit, and form the remaining expert instances into a set E_a; each time an expert is to be selected for the final result, the following is done for every expert instance in E_a: all project groups whose candidate set contains the expert instance, whose final result does not yet include it, and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups are used to compute the expert's average matching-degree ranking; the number of project groups used is limited so that the total number of project groups reviewed by the expert does not exceed the set upper limit of review tasks; after this has been computed for every expert, the expert instance with the highest average ranking is added to the final results of the project groups used to compute its average ranking and deleted from E_a; when several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected, and if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random; during this process, when the number of project groups available for computing an expert instance's average ranking is 0, that expert instance is deleted from E_a; the selection is repeated until E_a is empty or all project groups have obtained enough experts;

step (3): from all expert instances that exist in the candidate sets and are not contained in the final result of any project group, find those that appear in the candidate sets at least L times, L being a given lower limit, to form a set E_s; the remaining expert instances form another set E_t; each time an expert is to be selected for the final result, the following is done for every expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted in descending order of remaining vacancies, and the top-ranked project groups, their number not exceeding the upper limit for this extraction, are used to compute the expert's average matching-degree ranking; the expert instance with the highest average ranking is then added to the final results of the project groups used to compute its average ranking and deleted from E_s; when several experts share the highest average ranking, the one whose project groups used for the average ranking have the largest total of remaining vacancies is selected, and if several experts have both the same vacancy total and the highest average ranking, one of them is selected at random; during this process, when the number of project groups available for computing an expert instance's average ranking is less than the lower limit L, that expert instance is deleted from E_s and added to E_t; the selection is repeated until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;