CN111666420A - Method for intensively extracting experts based on subject knowledge graph - Google Patents

Publication number
CN111666420A
Authority
CN
China
Prior art keywords
expert
experts
project
average ranking
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010474948.6A
Other languages
Chinese (zh)
Other versions
CN111666420B (en)
Inventor
林欣
王辰奕
高桢
孙琪力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN202010474948.6A
Publication of CN111666420A
Application granted
Publication of CN111666420B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for the concentrated extraction of experts based on a discipline knowledge graph. The method comprises the following steps: establishing a mapping from keywords to academic knowledge graph nodes using the similarity of English Wikipedia hyperlinks; calculating the matching degree between each project group and each expert from the mapping result; using the matching degree scores to find, for each project group, the several experts that match it best; selecting the final experts from those candidates in a concentrated manner, according to the overall state of the extraction task; and, when an expert actively or passively withdraws from the review, refilling the vacated project groups while keeping the selected experts as concentrated as possible. The invention is flexible, easy to use, and widely applicable: it can select suitable experts for project groups proposed in any field, and keeps the selected experts concentrated to a certain extent without compromising the matching degree.

Description

Method for intensively extracting experts based on subject knowledge graph
Technical Field
The invention relates to natural language processing and to entity matching and entity mapping in databases. Specifically, it is a method that, with data from a discipline knowledge graph as an aid, obtains in batch several experts with high subject relevance for each of the project groups in a review activity, and then selects the final extraction result from among those experts in as concentrated a manner as possible.
Background
In recent years, with improvements in hardware performance and the explosive growth of information on the Internet, methods for processing and analyzing big data have developed rapidly and are increasingly widely used in many fields. As a novel means of big data processing, the knowledge graph has great advantages in improving the quality of information retrieval.
The concept of the knowledge graph was first proposed by Google, and knowledge graphs were at first used mainly to help Google's search engine retrieve information. With the development of big data processing and analysis methods in recent years, knowledge graphs have been widely applied in intelligent search, intelligent question answering, intelligent recommendation, and similar fields. In intelligent search in particular, the knowledge graph makes up for the shortcomings of searching by keyword matching alone, so that a search engine can, to a certain extent, make an educated guess at the specific intent of a user query and thereby realize concept retrieval or semantic retrieval. With the support of a knowledge graph, a computer can better understand how human language expresses meaning and return retrieval results that better fit the user's needs. In addition, the structure of a knowledge graph reflects the relationships among information entities more clearly, so that information is aggregated into knowledge that is easier for a computer to understand, evaluate, and use.
In essence, a knowledge graph can be regarded as a semantic network that converts things in the real world, and the associations between them, into "entity and entity attribute-value" tuples and "entity-relationship-entity" triples that are easier for a computer to process. Today the concept of the knowledge graph has been generalized, and various large knowledge bases are also called knowledge graphs.
Disclosure of Invention
The invention aims to provide a method for the concentrated extraction of experts based on a discipline knowledge graph. The method uses data such as the Microsoft Academic Graph, the hierarchical structure of field classifications, and the links between English Wikipedia entry pages as aids to the extraction task. The subject matching degree between a project group and an expert is calculated with methods such as path similarity on a tree structure.
The specific technical scheme for realizing the purpose of the invention is as follows:
A method for the concentrated extraction of experts based on a discipline knowledge graph comprises the following specific steps:
Step 1: For the expert extraction task of a given review activity, obtain all project groups to be reviewed, together with their fields and their Chinese and English keyword information;
Step 2: Calculate the matching degree between every project group and every expert, and obtain for each project group a candidate set of experts with a high matching degree;
Step 3: On the premise that the number of project groups reviewed by each expert does not exceed the upper limit set by the extraction task, select experts from all candidate sets in a concentrated manner as the final extraction result for all project groups in the extraction task;
Step 4: If an expert actively or passively withdraws from the review after being notified of the review activity, refill the vacated project groups in a concentrated manner, so that the number of experts in the final result of each project group again meets the requirement.
Step 2 specifically comprises:
Step A1: Using the Chinese and English keyword set of each project group instance, search for the academic knowledge graph node with the highest degree of association for each keyword, establish the mapping, and obtain the set of all academic knowledge graph nodes corresponding to the keywords of each project group instance;
Step A2: Using the Chinese and English keyword set of each expert instance, search for the academic knowledge graph node with the highest degree of association for each keyword, establish the mapping, and obtain the set of all academic knowledge graph nodes corresponding to the keywords of each expert instance;
Step A3: Using the academic knowledge graph node sets obtained in steps A1 and A2, calculate the pairwise matching degree on keywords between the project group instances and the expert instances;
Step A4: Using the field information in the project group instances and in the expert instances, calculate the matching degree on fields between the project group instances and the expert instances;
Step A5: Multiply the keyword matching degree from step A3 and the field matching degree from step A4 by their respective weights and sum the results; the sum is the subject matching degree between a project group instance and an expert instance;
Step A6: With the candidate set of each project group instance limited to at most k experts, sort all expert instances for each project group instance p_i by the subject matching degree obtained in step A5, take the first k, and form these k expert instances into the expert set E_i corresponding to p_i. Every project group instance p_i is thus assigned the k expert instances with the highest matching degree as its candidate experts, forming candidate sets E_1 to E_n in one-to-one correspondence with the project group instances p_1 to p_n, where k is a positive integer not exceeding 100.
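Steps A5 and A6 can be illustrated with a minimal Python sketch. The equal weights, the function names `subject_match` and `candidate_sets`, and the alphabetical tie-breaking are assumptions made for the illustration; the text above does not fix them.

```python
from typing import Dict, List


def subject_match(keyword_score: float, field_score: float,
                  w_keyword: float = 0.5, w_field: float = 0.5) -> float:
    """Step A5: weighted sum of the two matching degrees (weights are assumed)."""
    return w_keyword * keyword_score + w_field * field_score


def candidate_sets(scores: Dict[str, Dict[str, float]], k: int) -> Dict[str, List[str]]:
    """Step A6: keep, for each project group, the k best-matching experts."""
    result = {}
    for group_id, per_expert in scores.items():
        # Highest score first; ties broken by expert id (an assumption).
        ranked = sorted(per_expert, key=lambda e: (-per_expert[e], e))
        result[group_id] = ranked[:k]
    return result


# Hypothetical scores: scores[group][expert] = subject matching degree.
scores = {
    "p1": {"e1": subject_match(0.9, 0.5), "e2": subject_match(0.2, 1.0),
           "e3": subject_match(0.1, 0.0)},
    "p2": {"e1": subject_match(0.3, 0.0), "e2": subject_match(0.8, 1.0),
           "e3": subject_match(0.6, 0.5)},
}
candidates = candidate_sets(scores, k=2)
```

Each list in `candidates` is ordered from best to worst match, so the position of an expert in the list can later serve as that expert's matching-degree rank for the project group.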
Searching for the academic knowledge graph node with the highest degree of association for each keyword and establishing the mapping proceeds as follows: a mapping f: keyword → node from keywords to academic knowledge graph nodes is established by analyzing Wikipedia entry page data. Specifically, to find and return the academic knowledge graph node most similar to a keyword using Wikipedia entry page data, the following steps are executed:
Step ①: Query the local and online wiki databases, record the links to wiki entries found in the wiki entry page corresponding to each graph node, and cache the resulting (graph node, link set) pairs in a file. After this step has been executed once, it is executed again only when a new graph node is added;
Step ②: If the keyword is Chinese, call Google Translate to translate it into English and then execute step ③; if the keyword is English, execute step ③ directly;
Step ③: Compare the keyword or its translation with the name strings of the graph nodes. If the name of some graph node is exactly identical to the keyword, return that node directly as the mapping result of the keyword; otherwise execute step ④;
Step ④: Query the wiki database. If the keyword has a corresponding entry page in the wiki database, and the entry pages of some graph nodes and the entry page of the keyword contain hyperlinks to the same entry pages, return the graph node with the most common links as the mapping result. If the keyword has no entry of the same name in the wiki, or the entry pages of all graph nodes share no hyperlink target with the entry page of the keyword, execute step ⑤;
Step ⑤: Call the Wikipedia API to obtain at most 10 Wikipedia entry pages that best correspond to the keyword, merge the entry hyperlinks in those pages into one set, and execute step ⑥;
Step ⑥: If the number of pages found through the API is not 0 and the entry pages of some graph nodes share links with the hyperlink set obtained in step ⑤, return the graph node that shares the largest number of entry links with that set as the mapping result. If the number of pages found through the API is 0, or the link set obtained in step ⑤ still shares no link with the entry pages of any graph node, the keyword mapping fails.
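The fallback chain of steps ③ to ⑥ can be sketched offline in Python as below. This is an illustration under stated assumptions: the link sets are supplied as plain dictionaries instead of being fetched from Wikipedia, translation (step ②) is omitted, and the names `map_keyword`, `node_links`, `entry_links`, and `search_links` as well as the alphabetical tie-breaking are hypothetical.

```python
from typing import Dict, Optional, Set


def most_common_links(links: Set[str], node_links: Dict[str, Set[str]]) -> Optional[str]:
    """Return the node sharing the most links with `links`, or None if no node shares any."""
    best_node, best_count = None, 0
    for node in sorted(node_links):  # sorted for deterministic ties (an assumption)
        count = len(links & node_links[node])
        if count > best_count:
            best_node, best_count = node, count
    return best_node


def map_keyword(keyword: str,
                node_links: Dict[str, Set[str]],
                entry_links: Dict[str, Set[str]],
                search_links: Dict[str, Set[str]]) -> Optional[str]:
    # Step 3: an exact name match wins immediately.
    if keyword in node_links:
        return keyword
    # Step 4: the keyword has its own entry page; compare its links with each node's links.
    if keyword in entry_links:
        node = most_common_links(entry_links[keyword], node_links)
        if node is not None:
            return node
    # Steps 5 and 6: fall back to the merged links of the search results.
    # None means the mapping failed.
    return most_common_links(search_links.get(keyword, set()), node_links)


# Hypothetical cached data standing in for step 1 and for the wiki queries.
node_links = {
    "Spectroscopy": {"Light", "Wavelength", "Atom"},
    "Machine learning": {"Algorithm", "Statistics", "Data"},
}
entry_links = {"Raman scattering": {"Light", "Wavelength", "Photon"}}
search_links = {"deep nets": {"Algorithm", "Data", "Neuron"}}

mapped = {kw: map_keyword(kw, node_links, entry_links, search_links)
          for kw in ["Spectroscopy", "Raman scattering", "deep nets", "zzz"]}
```

In a real run the three dictionaries would be filled from the cached file of step ① and from wiki queries; keeping them as inputs makes the decision logic testable without network access.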
Step A3, calculating the pairwise matching degree on keywords between the project group instances and the expert instances, is performed as follows: the path similarity on the graph is calculated between the mapping results of the keywords of project group instance p and expert instance e. In the knowledge graph used, the nodes are distributed in a tree-like form with a definite hierarchy. Therefore, for each node np_i, with its count cp_i, in the academic knowledge graph node set N_P of project group instance p, and each node ne_i, with its count ce_i, in the academic knowledge graph node set N_E of expert instance e, all paths from the top level to the node are obtained from the graph. Among all path pairs of the two nodes, the pair whose paths share the most nodes is found, and the similarity of that path pair is taken as the similarity sim(np_i, ne_i) between the two nodes. From the similarities calculated between all nodes, the final keyword matching degree between the instances is calculated as follows:
[Formula image BDA0002515578180000031: keyword matching degree between project group instance p and expert instance e]
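The node-to-node similarity used in step A3 can be sketched in Python as follows. The formula image is not reproduced in this text, so the sketch stops at sim(np_i, ne_i) and does not attempt the final aggregation. The similarity of a path pair is assumed here to be the number of shared leading nodes divided by the length of the longer path; that ratio, the `parents` dictionary layout, and the function names are assumptions, not the patent's definition.

```python
from typing import Dict, List


def paths_to_root(node: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """All paths from a top-level node down to `node`, top level first."""
    if not parents.get(node):
        return [[node]]
    return [path + [node]
            for parent in parents[node]
            for path in paths_to_root(parent, parents)]


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading nodes the two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def node_similarity(n1: str, n2: str, parents: Dict[str, List[str]]) -> float:
    """Similarity of the path pair that shares the most nodes (ratio is an assumption)."""
    best_shared, best_sim = -1, 0.0
    for p1 in paths_to_root(n1, parents):
        for p2 in paths_to_root(n2, parents):
            shared = shared_prefix_len(p1, p2)
            sim = shared / max(len(p1), len(p2))
            if (shared, sim) > (best_shared, best_sim):
                best_shared, best_sim = shared, sim
    return best_sim


# Hypothetical hierarchy: each node lists its parent nodes; top-level nodes have none.
parents = {
    "Chemistry": [],
    "Physics": [],
    "Analytical chemistry": ["Chemistry"],
    "Optics": ["Physics"],
    "Chromatography": ["Analytical chemistry"],
    "Spectroscopy": ["Analytical chemistry", "Physics"],
}
sim_close = node_similarity("Spectroscopy", "Chromatography", parents)
sim_far = node_similarity("Spectroscopy", "Optics", parents)
```

A node with several parents yields several paths, which is why every pair of paths is compared before the best-overlapping pair is chosen.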
Step A4, calculating the matching degree on fields between a project group instance and each expert instance, is performed as follows: the lowest-level entries of the discipline field and technical field of the project group instance are compared with the lowest-level entries of the discipline field and technical field of the expert instance, and the field matching degree is assigned according to the number of identical lowest-level entries.
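A minimal Python sketch of step A4 follows. The text only states that the score depends on the number of identical lowest-level entries; dividing that number by the count of the project group's lowest-level entries is an assumption made here to keep the score in [0, 1], and the function names are hypothetical.

```python
from typing import Iterable


def lowest_level(field_path: str) -> str:
    """Last level of a slash-separated field such as 'chemistry/analytical chemistry/spectral analysis'."""
    return field_path.split("/")[-1].strip()


def field_match(project_fields: Iterable[str], expert_fields: Iterable[str]) -> float:
    """Share of the project group's lowest-level fields that the expert also lists (normalization assumed)."""
    project_lowest = {lowest_level(f) for f in project_fields}
    if not project_lowest:
        return 0.0
    shared = project_lowest & {f.strip() for f in expert_fields}
    return len(shared) / len(project_lowest)


score = field_match(
    ["chemistry/analytical chemistry/spectral analysis",
     "physics/optics/laser technology"],
    ["spectral analysis", "organic synthesis"],
)
```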
Step 3 specifically comprises:
Step a: From all expert instances in the candidate sets, find those that appear in the candidate sets at least L times, where L is a lower limit, and form them into a set E_s; the remaining expert instances form another set E_t. To select an expert to add to the final result: for each expert instance e in E_s, find all project groups whose candidate set contains e and which have not yet obtained enough experts; sort these project groups by number of remaining vacancies in descending order; take the top-ranked ones, in a number not exceeding the upper limit set for the current extraction task, and calculate the average matching-degree ranking of e over them. Select the expert instance with the highest average ranking, add it to the final results of the project groups used to calculate its average ranking, and delete it from E_s. When several experts share the highest average ranking, select the one whose project groups used in the calculation have the largest total number of remaining vacancies; if several experts are also tied on that total, select one of them at random. During this process, when the number of project groups available for calculating an expert instance's average ranking is less than the lower limit L, delete that expert instance from E_s and add it to E_t. Repeat the selection until E_s is empty or all project groups have obtained enough experts. Here L is a positive integer from 2 to 5;
Step b: To select an expert to add to the final result: for each expert instance in E_t, find all project groups whose candidate set contains the expert and which have not yet obtained enough experts; sort these project groups by number of remaining vacancies in descending order; take the top-ranked ones, in a number not exceeding the upper limit set for the current extraction task, and calculate the average matching-degree ranking over them. Select the expert instance with the highest average ranking, add it to the final results of the project groups used to calculate its average ranking, and delete it from E_t. When several experts share the highest average ranking, select the one whose project groups used in the calculation have the largest total number of remaining vacancies; if several experts are also tied on that total, select one of them at random. During this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, delete that expert instance from E_t. Repeat the selection until E_t is empty or all project groups have obtained enough experts.
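Steps a and b can be sketched as one greedy routine in Python. This is an illustration under assumptions: candidate lists are ordered best match first so that list position serves as the matching-degree rank, the final random tie-break is replaced by alphabetical order to keep the example reproducible, and the names `select_concentrated`, `need`, `cap`, and `lower` are hypothetical.

```python
from typing import Dict, List, Set


def select_concentrated(candidates: Dict[str, List[str]], need: Dict[str, int],
                        cap: int, lower: int) -> Dict[str, List[str]]:
    final: Dict[str, List[str]] = {g: [] for g in candidates}

    def vacancies(g: str) -> int:
        return need[g] - len(final[g])

    def usable_groups(e: str) -> List[str]:
        # Open groups listing e, most vacancies first, at most `cap` of them.
        groups = [g for g in candidates if e in candidates[g] and vacancies(g) > 0]
        groups.sort(key=lambda g: (-vacancies(g), g))
        return groups[:cap]

    def run_phase(pool: Set[str], min_groups: int, overflow: Set[str]) -> None:
        while pool and any(vacancies(g) > 0 for g in candidates):
            best = None
            for e in sorted(pool):
                groups = usable_groups(e)
                if len(groups) < min_groups:
                    pool.discard(e)   # too few groups left for this expert
                    overflow.add(e)
                    continue
                avg_rank = sum(candidates[g].index(e) for g in groups) / len(groups)
                total_vac = sum(vacancies(g) for g in groups)
                # Lower rank number is better; ties go to more vacancies, then name
                # (the source picks at random here; name order is an assumption).
                key = (avg_rank, -total_vac, e)
                if best is None or key < best[0]:
                    best = (key, e, groups)
            if best is None:
                break
            _, chosen, groups = best
            for g in groups:
                final[g].append(chosen)
            pool.discard(chosen)

    counts: Dict[str, int] = {}
    for experts in candidates.values():
        for e in experts:
            counts[e] = counts.get(e, 0) + 1
    e_s = {e for e, c in counts.items() if c >= lower}
    e_t = set(counts) - e_s
    run_phase(e_s, lower, e_t)   # step a: experts shared by at least `lower` groups
    run_phase(e_t, 1, set())     # step b: the remaining experts
    return final


candidates = {"p1": ["e1", "e2", "e3"], "p2": ["e2", "e1", "e4"], "p3": ["e2", "e5", "e1"]}
need = {"p1": 2, "p2": 2, "p3": 2}
result = select_concentrated(candidates, need, cap=2, lower=2)
```

With `cap=2` the two widely shared experts e1 and e2 are placed first, and only the vacancies they cannot cover are filled from the remaining experts, which is the concentrating effect the step describes.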
Step 4 specifically comprises:
Step (1): For every expert who actively or passively withdraws from the review of certain project groups, delete the expert from the candidate sets and final results of all project groups from which the expert withdrew;
Step (2): Find all expert instances present in the final extraction result of any project group, remove those whose number of occurrences in the final extraction results has reached the upper limit, and form the remaining expert instances into a set E_a. To select an expert to add to the final result: for each expert instance in E_a, find all project groups whose candidate set contains the expert, whose final result does not yet include the expert, and which have not yet obtained enough experts; sort these project groups by number of remaining vacancies in descending order and take the top-ranked ones to calculate the average matching-degree ranking. The number of project groups used to calculate the average ranking is limited so that the total number of project groups reviewed by the expert does not exceed the upper limit set by the review task. After this calculation has been completed for every expert, select the expert instance with the highest average ranking, add it to the final results of the project groups used to calculate its average ranking, and delete it from E_a. When several experts share the highest average ranking, select the one whose project groups used in the calculation have the largest total number of remaining vacancies; if several experts are also tied on that total, select one of them at random. During this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, delete that expert instance from E_a. Repeat the selection until E_a is empty or all project groups have obtained enough experts;
Step (3): From all expert instances that exist in the candidate sets and are not contained in the final result of any project group, find those that appear in the candidate sets at least L times, where L is a lower limit, and form them into a set E_s; the remaining expert instances form another set E_t. To select an expert to add to the final result: for each expert instance in E_s, find all project groups whose candidate set contains the expert and which have not yet obtained enough experts; sort these project groups by number of remaining vacancies in descending order; take the top-ranked ones, in a number not exceeding the upper limit set for the current extraction task, and calculate the average matching-degree ranking over them. Select the expert instance with the highest average ranking, add it to the final results of the project groups used to calculate its average ranking, and delete it from E_s. When several experts share the highest average ranking, select the one whose project groups used in the calculation have the largest total number of remaining vacancies; if several experts are also tied on that total, select one of them at random. During this process, when the number of project groups available for calculating an expert instance's average ranking is less than the lower limit L, delete that expert instance from E_s and add it to E_t. Repeat the selection until E_s is empty or all project groups have obtained enough experts. Here L is a positive integer from 2 to 5;
Step (4): To select an expert to add to the final result: for each expert instance in E_t, find all project groups whose candidate set contains the expert and which have not yet obtained enough experts; sort these project groups by number of remaining vacancies in descending order; take the top-ranked ones, in a number not exceeding the upper limit set for the current extraction task, and calculate the average matching-degree ranking over them. Select the expert instance with the highest average ranking, add it to the final results of the project groups used to calculate its average ranking, and delete it from E_t. When several experts share the highest average ranking, select the one whose project groups used in the calculation have the largest total number of remaining vacancies; if several experts are also tied on that total, select one of them at random. During this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, delete that expert instance from E_t. Repeat the selection until E_t is empty or all project groups have obtained enough experts.
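The bookkeeping at the start of step 4 can be sketched in Python as below: step (1) removes a withdrawn expert, and the first half of step (2) builds the set E_a of experts who are already engaged and still below the upper limit. The subsequent refilling follows the same greedy pattern as step 3 and is not repeated. The function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def withdraw(expert: str, groups: Iterable[str],
             candidates: Dict[str, List[str]], final: Dict[str, List[str]]) -> None:
    """Step (1): drop the expert from the candidate sets and results of the groups they left."""
    for g in groups:
        if expert in candidates[g]:
            candidates[g].remove(expert)
        if expert in final[g]:
            final[g].remove(expert)


def engaged_pool(final: Dict[str, List[str]], cap: int) -> Set[str]:
    """Step (2), first half: experts already in some result whose load is below the upper limit."""
    load = Counter(e for experts in final.values() for e in experts)
    return {e for e, n in load.items() if n < cap}


# Hypothetical state after an earlier extraction.
candidates = {"p1": ["e1", "e2", "e3"], "p2": ["e2", "e1", "e4"], "p3": ["e2", "e5", "e1"]}
final = {"p1": ["e1", "e2"], "p2": ["e1", "e4"], "p3": ["e2", "e5"]}

withdraw("e4", ["p2"], candidates, final)
e_a = engaged_pool(final, cap=3)
```

Trying the members of E_a first means a vacancy is filled, where possible, by an expert who is already taking part in the review, so the total number of experts mobilized does not grow.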
It should also be noted that:
1. The most time-consuming part of the method is steps ① to ⑤, which search for the academic knowledge graph node most similar to each keyword when step A1 or step A2 is executed. For this reason, each time an extraction task is completed, the newly obtained keywords and their mapping results are stored in the corresponding table of the database. Before each extraction task starts, all known mapping results are first read from that table into a cache for use at any time. Because expert keywords rarely change except when new experts are added, this saves a great deal of the time spent mapping expert keywords in each extraction task.
2. According to the extraction requirements, the method can freely adjust the number of experts to be assigned to each project group instance, the maximum number of expert instances in each candidate set, and the maximum number of times an expert may be assigned to the final results of project group instances. The lower limit on the number of project group instances related to an expert in step b can also be adjusted, so that extraction tasks of different scales can be handled to a certain extent.
3. When the mapping from keywords to academic knowledge graph nodes is established, the content of the keywords is not restricted: natural language keywords in the form of short texts with arbitrary content can be entered.
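The caching described in note 1 can be sketched with Python's standard `sqlite3` module. The table name `keyword_mapping`, its two columns, and the helper names are assumptions for the illustration; the source only says that mapping results are stored in a database table and read into a cache before each task.

```python
import sqlite3
from typing import Dict


def load_mappings(conn: sqlite3.Connection) -> Dict[str, str]:
    """Read every known keyword -> node mapping into an in-memory cache."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_mapping (keyword TEXT PRIMARY KEY, node TEXT)"
    )
    return dict(conn.execute("SELECT keyword, node FROM keyword_mapping"))


def save_mappings(conn: sqlite3.Connection, new_mappings: Dict[str, str]) -> None:
    """Persist the mappings found during the task that just finished."""
    conn.executemany(
        "INSERT OR REPLACE INTO keyword_mapping (keyword, node) VALUES (?, ?)",
        list(new_mappings.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")   # a file path would be used in practice
cache_before = load_mappings(conn)   # empty on the first run
save_mappings(conn, {"Raman scattering": "Spectroscopy"})
cache_after = load_mappings(conn)    # what the next extraction task would start with
```

Only keywords missing from the cache need to go through the slow mapping steps, so the per-task cost is driven mainly by new project group keywords.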
The invention has the following advantages:
1. Ease of use: The calculation model for the subject background matching degree is divided into two independent calculations, the field relevance based on field data and the keyword relevance based on keyword data. When the data for one of the two is incomplete, a relevance result can still be obtained from the other alone. In addition, while using field information that is relatively fixed and easy to match, the method places no restriction on the content of the Chinese and English keywords, which may be natural language keywords in the form of short texts with arbitrary content. The whole method is therefore flexible, easy to use, and widely applicable.
2. Novelty: The method uses discipline knowledge graph data such as the Microsoft Academic Graph, the hierarchical structure of field classifications, and the links between English Wikipedia entry pages as auxiliary knowledge. Compared with traditional text-based string similarity measures, applying a large data set such as Wikipedia greatly improves the quality of entity matching and mapping.
3. Evaluability: The expert extraction result of the method is obtained by quantitative analysis based on the relevance of subject backgrounds. During execution, the closeness of the subject backgrounds of the project groups within a grouping, and the closeness between the subject areas in which an expert is proficient and the subject backgrounds of the project groups reviewed, can both be analyzed from the knowledge graph. The method itself therefore provides a criterion for evaluating the soundness of the project grouping results and of the recommended experts.
4. Practicality: The method has great practical significance. It can automatically select suitable review experts for the project groups that scientific research and audit organizations receive for review, which greatly saves manpower. At the same time, provided the subject background matching degree is not greatly affected, the required experts can be selected or supplemented for the project groups in as concentrated a manner as possible, so that as few experts as possible are ultimately mobilized and the manpower and material resources consumed in the review process are reduced.
Drawings
FIG. 1 is a schematic diagram of a knowledge-graph structure.
Detailed Description
The present invention is described in further detail below with reference to specific examples. Except for the contents specifically mentioned below, the procedures, conditions, and experimental methods for carrying out the invention are general knowledge and common practice in the art, and the invention is not particularly limited in these respects.
The invention comprises the following specific steps:
Step 1: For the extraction task corresponding to a given review activity, obtain all project groups of this round
From the database, obtain the complete set P of all project group instances manually entered by project group proposers for a given expert extraction task. Each project group instance p ∈ P has the attributes: project group id, project group discipline field, project group technical field, and project group keyword set. Each field has a form such as "chemistry/analytical chemistry/spectral analysis" and consists of several levels, usually three to four, separated by slashes.
Step 2: Obtain, for each project group, experts with a high subject background matching degree as the candidate experts of the project group
For each project group instance p ∈ P obtained in step 1, the complete set E of candidate expert instances is obtained from the expert information database. Each expert instance e ∈ E has the attributes: expert id, discipline field, technical field, and Chinese and English keyword sets. Each field consists of several Chinese phrases whose content is the lowest-level entry of the same classification used for the fields of the project group instances; for example, content recorded as "chemistry/analytical chemistry/spectral analysis" in a project group is recorded as "spectral analysis" in an expert instance. For each project group, the matching degree with each expert is calculated from the field and keyword information, and a candidate expert set is then obtained for the project group, containing the expert instances with a high calculated subject background matching degree with the project group instance.
Step 3: Select experts uniformly and in a concentrated manner as the final extraction result for all project groups in the extraction task
From all experts in the candidate sets of all project groups obtained in step 2, select experts for the project groups as the final result according to the actual number required. After this step, the number of experts appearing in the final result is kept as small as possible without affecting the overall matching degree. For each project group, this step yields a subset of the high-matching expert set obtained for it in step 2 as the result of the extraction task.
Step 4: When an expert actively or passively withdraws from the review, supplement the project groups in a concentrated manner
When this occurs, the expert is first removed from the candidate sets and final results of the project groups from which the expert withdrew, and other experts are then selected from the candidate sets as replacements. As in the previous step, the total number of experts in the final result is kept as small as possible after the experts are supplemented.
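The two instance types described above can be sketched as Python data classes. The class names, attribute names, and sample values are assumptions for the illustration; the source only lists the attributes in prose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProjectGroup:
    group_id: str
    discipline_fields: List[str]   # full paths, e.g. "chemistry/analytical chemistry/spectral analysis"
    technology_fields: List[str]
    keywords: List[str] = field(default_factory=list)


@dataclass
class Expert:
    expert_id: str
    discipline_fields: List[str]   # lowest-level entries only, e.g. "spectral analysis"
    technology_fields: List[str]
    keywords: List[str] = field(default_factory=list)


def field_levels(path: str) -> List[str]:
    """Split a slash-separated field into its levels, top level first."""
    return [level.strip() for level in path.split("/") if level.strip()]


group = ProjectGroup(
    group_id="p1",
    discipline_fields=["chemistry/analytical chemistry/spectral analysis"],
    technology_fields=[],
    keywords=["Raman scattering"],
)
expert = Expert(
    expert_id="e1",
    discipline_fields=["spectral analysis"],
    technology_fields=[],
    keywords=["光谱分析"],
)
levels = field_levels(group.discipline_fields[0])
same_lowest_field = levels[-1] in expert.discipline_fields
```

Because an expert record keeps only the lowest level, a project group's field has to be reduced to its last level before the two can be compared.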
Step 2 specifically comprises:
step A1: using the Chinese and English keyword sets in the project group instances, searching for the academic knowledge graph node with the highest degree of association for each keyword, establishing a mapping, and obtaining the set of all academic knowledge graph nodes corresponding to all keywords of each project group instance;
step A2: using the Chinese and English keyword sets in the expert instances, searching for the academic knowledge graph node with the highest degree of association for each keyword, establishing a mapping, and obtaining the set of all academic knowledge graph nodes corresponding to all keywords of each expert instance;
step A3: calculating the keyword matching degree between each project group instance and each expert instance using the academic knowledge graph node sets obtained in step A1 and step A2;
step A4: calculating the field matching degree between each project group instance and each expert instance using the field information in the project group instance and the field information in the expert instance;
step A5: multiplying the keyword matching degree from step A3 and the field matching degree from step A4 by their respective weights and summing the results to give the subject background matching degree between the project group instance and the expert instance;
step A6: according to the subject background matching degree between each project group instance and each expert instance obtained in step A5, and assuming that each project group instance is allocated at most k candidate experts, for each project group instance p_i all expert instances are sorted by their subject background matching degree with p_i, the top k are taken, and these k expert instances form the candidate review expert set E_i corresponding to p_i. In this way the k expert instances with the highest matching degree are allocated to each project group instance as its candidate experts, forming candidate expert sets E_1 to E_n in one-to-one correspondence with the project group instances p_1 to p_n.
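Steps A5 and A6 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names and the default weights `w_keyword` and `w_field` are hypothetical (the text gives no weight values), and keeping every expert tied at the cut-off is an assumption read off Table 7 of the example, where six candidates appear for k = 5.

```python
def subject_match(keyword_match, field_match, w_keyword=0.5, w_field=0.5):
    """Step A5: weighted sum of the two matching degrees (weights are assumed)."""
    return w_keyword * keyword_match + w_field * field_match


def candidate_set(scores, k):
    """Step A6: rank experts by score and keep those ranked k-th or better.

    scores maps expert id -> subject background matching degree.
    Tied experts share a rank (1 + number of strictly better experts).
    """
    ranks = {e: 1 + sum(other > s for other in scores.values())
             for e, s in scores.items()}
    return {e: r for e, r in ranks.items() if r <= k}
```

The returned dictionary maps each candidate expert to its rank, which is the quantity the later selection steps average.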
In the steps A1 and A2, establishing a mapping f from the keywords to the academic knowledge map nodes, wherein the mapping f is keyword → node and is realized by analyzing data of English Wikipedia entry pages; specifically, using the hyperlink condition in the entry page of wikipedia, finding and returning an academic knowledgegraph node most similar to the current keyword or its english translation requires the following steps:
step one: querying the local and online wiki databases and, for the wiki entry page corresponding to each graph node, recording all links pointing to other wiki entries. For example, suppose that the English Wikipedia entry "Artificial intelligence" contains only the following content: "In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals", in which intelligence, machines, humans and animals are hyperlinks to Wikipedia entry pages; these names are saved as the hyperlink set of the graph node Artificial intelligence. All (graph node, link set) pairs obtained in this way are cached to a file. After step one has been executed once, it is executed again only when a new graph node is added; when the graph has not changed, step two is executed directly;
step two: when the keyword currently being mapped is Chinese, calling the Google Translate function to translate it into English and then executing step three; when the keyword is English, executing step three directly;
step three: comparing the keyword or its translation with the name strings of the graph nodes; if a graph node is found whose name string is exactly identical to the keyword, returning that node directly as the mapping result of the keyword; otherwise executing step four;
step four: querying the wiki database; if the keyword has a corresponding entry page in the wiki database, and the entry pages of some graph nodes share with the keyword's entry page hyperlinks pointing to the same entry pages, returning as the mapping result the graph node whose wiki entry page has the largest number of entry links in common with the keyword's wiki entry page; if the keyword has no same-name entry in the wiki, or the entry pages of all graph nodes have no hyperlink target in common with the keyword's entry page, executing step five;
step five: calling the Wikipedia API to obtain at most 10 Wikipedia entry pages most relevant to the keyword, merging the entry hyperlinks in these pages into one set, and executing step six;
step six: if the number of pages found through the API is not 0, and the entry pages of some graph nodes have links in common with the hyperlink set obtained in step five, returning as the mapping result the graph node having the largest number of entry links in common with that link set; if the number of pages found through the API is 0, or the link set obtained in step five still has no link in common with the entry pages of any graph node, the keyword mapping fails.
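The six mapping steps above form a fall-through cascade, which the following sketch illustrates with in-memory data only. All names are hypothetical stand-ins: `node_links` plays the role of the cached (graph node, link set) pairs from step one, `page_links` stands for the links on a keyword's own wiki entry page, and `search` and `translate` are caller-supplied callables replacing the Wikipedia search and the translation service. No real API is called.

```python
def best_overlap(links, node_links):
    """Return the node sharing the most links with `links`, or None if none share any."""
    best, best_count = None, 0
    for node, cached in node_links.items():
        count = len(links & cached)
        if count > best_count:
            best, best_count = node, count
    return best


def map_keyword(keyword, node_links, page_links, search, translate=lambda s: s):
    """Map one keyword to a graph node name, or None when the mapping fails."""
    term = translate(keyword)                 # step two: translate if needed
    if term in node_links:                    # step three: exact name match
        return term
    if term in page_links:                    # step four: keyword's own entry page
        node = best_overlap(page_links[term], node_links)
        if node is not None:
            return node
    pages = search(term)[:10]                 # step five: at most 10 related pages
    if not pages:
        return None                           # step six: nothing found
    return best_overlap(set().union(*pages), node_links)
```

Each stage returns as soon as it finds a node, so the cheaper exact match is always tried before any link comparison.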
In step A3, the keyword matching degree between a project group instance and an expert instance is calculated as the sum of the path similarities on the knowledge graph between every pair of nodes drawn from the keyword mapping node sets of the two instances. The nodes of the knowledge graph used are distributed hierarchically. Therefore, for each node np_i with count cp_i in the academic knowledge graph node set N_P of the project group instance, and each node ne_j with count ce_j in the academic knowledge graph node set N_E of the expert instance, all paths from the top layer to each of the two nodes are first obtained from the graph; among all pairs of paths, the pair with the most coinciding nodes is found, and the similarity of that pair of paths is taken as the similarity sim(np_i, ne_j) between the two nodes. The final keyword matching degree between the instances is calculated as follows:
[Formula image in the original publication: keyword matching degree between a project group instance and an expert instance]
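The formula itself is only available as an image, so the sketch below is reconstructed from the prose and from the worked example later in the description (pairwise similarity multiplied by both node counts, then accumulated). It is an assumption that no further normalization is applied; the function name and the `sim` callable are hypothetical.

```python
def keyword_match(project_nodes, expert_nodes, sim):
    """Accumulate count-weighted node similarities over all node pairs.

    project_nodes / expert_nodes map a graph node name to its count.
    sim(a, b) returns the path similarity between two graph nodes.
    """
    total = 0.0
    for np_i, cp_i in project_nodes.items():
        for ne_j, ce_j in expert_nodes.items():
            total += cp_i * ce_j * sim(np_i, ne_j)
    return total
```

With one project node of count 2, one expert node of count 1 and a path similarity of 0.25, the pair contributes 2 × 1 × 0.25 = 0.5 to the total.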
In step A4, the field matching degree between a project group instance and an expert instance is calculated as follows. The lowest levels of the subject field and the technical field in the project group instance are extracted (for example, "spectral analysis" is extracted from "chemistry/analytical chemistry/spectral analysis") and form the two sets Fpr_i and Fpt_i respectively. For the expert instance, the recorded subject field and technical field contents are used directly as the two sets Fer_i and Fet_i. The intersection of Fpr_i and Fer_i and the intersection of Fpt_i and Fet_i are then computed; the more field information the intersections contain, the higher the field matching degree.
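A minimal sketch of the field comparison follows. The text only states that larger intersections give a higher score; the linear scoring with `base=12` and `step=2` is an assumption chosen so that the two values quoted in the worked example (12 for no overlap, 16 when both the subject field and the technical field overlap) are reproduced. The function names are hypothetical.

```python
def lowest_level(field_path):
    """Return the last component of a slash-separated classification path."""
    return field_path.rsplit("/", 1)[-1]


def field_match(project_subject, project_tech, expert_subject, expert_tech,
                base=12, step=2):
    """Score grows with the size of the two lowest-level field intersections.

    base and step are assumed values, not taken from the patent text.
    """
    fpr = {lowest_level(f) for f in project_subject}
    fpt = {lowest_level(f) for f in project_tech}
    overlap = len(fpr & set(expert_subject)) + len(fpt & set(expert_tech))
    return base + step * overlap
```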
The method for centrally selecting experts as final results described in step 3 comprises the following specific steps:
step a: from all expert instances in the candidate sets, the expert instances that appear in the candidate sets at least a lower limit L times are found and form a set E_s; the remaining expert instances form another set E_t. Each time an expert is to be selected for the final result, the following is done for each expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted by number of remaining vacancies in descending order, and the top-ranked project groups, in a number not exceeding the expert's review upper limit for this extraction, are used to calculate the expert's average matching degree ranking. The expert instance with the highest average ranking (that is, the numerically smallest average rank) is added to the final results of the project groups used to calculate its average ranking, and is deleted from E_s. When several experts share the highest average ranking, the expert instance whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have the same total of remaining vacancies and the same highest average ranking, one of them is selected at random. During this process, when the number of project groups available for calculating the average ranking of an expert instance is less than the lower limit L, the expert instance is deleted from E_s and added to E_t. The above process of selecting an expert for the final result is repeated until E_s is empty or all project groups have obtained enough experts;
step b: each time an expert is to be selected for the final result, the following is done for each expert instance in E_t: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted by number of remaining vacancies in descending order, and the top-ranked project groups, in a number not exceeding the expert's review upper limit for this extraction, are used to calculate the expert's average matching degree ranking. The expert instance with the highest average ranking is added to the final results of the project groups used to calculate its average ranking, and is deleted from E_t. When several experts share the highest average ranking, the expert instance whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have the same total of remaining vacancies and the same highest average ranking, one of them is selected at random. During this process, when the number of project groups available for calculating the average ranking of an expert instance equals 0, the expert instance is deleted from E_t. The above process of selecting an expert for the final result is repeated until E_t is empty or all project groups have obtained enough experts.
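Steps a and b share one greedy loop and differ only in the pool used and in the minimum number of usable project groups (L in step a, 1 in step b). The sketch below is one possible reading of the text, not the patented implementation. The names are hypothetical, every project group is assumed to need the same number of experts (`need`), the lower limit is assumed to be at least 1, and the group count is taken after truncation to the expert's spare capacity. The `choose` parameter is the tie-breaker, `random.choice` by default as in the text.

```python
import random
from collections import Counter
from fractions import Fraction


def run_phase(pool, min_groups, cands, final, need, cap, choose):
    """Fill vacancies from `pool`; return experts dropped for having too few groups.

    cands: {group: {expert: rank}}; final: {group: set of selected experts}.
    """
    min_groups = max(min_groups, 1)
    dropped = []
    while pool and any(len(final[g]) < need for g in cands):
        scored = {}
        for e in sorted(pool):
            load = sum(e in members for members in final.values())
            groups = [g for g in cands
                      if e in cands[g] and e not in final[g] and len(final[g]) < need]
            # Most remaining vacancies first, truncated to the expert's spare capacity.
            groups.sort(key=lambda g: need - len(final[g]), reverse=True)
            groups = groups[:max(cap - load, 0)]
            if len(groups) < min_groups:
                pool.discard(e)
                dropped.append(e)
                continue
            avg = Fraction(sum(cands[g][e] for g in groups), len(groups))
            vacancies = sum(need - len(final[g]) for g in groups)
            scored[e] = (avg, -vacancies, groups)
        if not scored:
            break
        # Best = smallest average rank, then largest vacancy total.
        best = min(v[:2] for v in scored.values())
        winner = choose([e for e, v in scored.items() if v[:2] == best])
        for g in scored[winner][2]:
            final[g].add(winner)
        pool.discard(winner)
    return dropped


def select_concentrated(cands, need, cap, lower, choose=random.choice):
    """Step a on E_s, then step b on E_t plus the experts demoted in step a."""
    counts = Counter(e for ranks in cands.values() for e in ranks)
    final = {g: set() for g in cands}
    e_s = {e for e, n in counts.items() if n >= lower}
    e_t = set(counts) - e_s
    e_t |= set(run_phase(e_s, lower, cands, final, need, cap, choose))  # step a
    run_phase(e_t, 1, cands, final, need, cap, choose)                  # step b
    return final
```

Exact fractions are used for the average rank so that ties such as 10/3 versus 10/3 compare equal. Passing a deterministic `choose` (for example `min`) makes a run repeatable.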
The method for supplementing the condition that the expert quits the review in the step 4 comprises the following specific steps:
step (1): for every expert who has actively or passively withdrawn from the review of some project groups, the expert is deleted from the candidate sets and the final results of the project groups from which the expert withdrew;
step (2): all expert instances that already appear in the final extraction result of any project group are found, those whose number of occurrences in the final extraction results has reached the upper limit are removed, and the remaining expert instances form a set E_a. Each time an expert is to be selected for the final result, the following is done for each expert instance in E_a: all project groups are found whose candidate set contains the expert instance, whose final result does not yet contain the expert instance, and which have not yet obtained enough experts; these project groups are sorted by number of remaining vacancies in descending order, and the top-ranked project groups are used to calculate the expert's average matching degree ranking. The number of project groups used for the average ranking is limited so that the total number of project groups reviewed by the expert does not exceed the upper limit set by the review task. After this calculation is completed for each expert, the expert instance with the highest average ranking is added to the final results of the project groups used to calculate its average ranking, and is deleted from E_a. When several experts share the highest average ranking, the expert instance whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have the same total of remaining vacancies and the same highest average ranking, one of them is selected at random. During this process, when the number of project groups available for calculating the average ranking of an expert instance equals 0, the expert instance is deleted from E_a. The above process of selecting an expert for the final result is repeated until E_a is empty or all project groups have obtained enough experts;
step (3): from all expert instances that exist in the candidate sets and are not contained in the final result of any project group, the expert instances that appear in the candidate sets at least a lower limit L times are found and form a set E_s; the remaining expert instances form another set E_t. Each time an expert is to be selected for the final result, the following is done for each expert instance in E_s: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted by number of remaining vacancies in descending order, and the top-ranked project groups, in a number not exceeding the expert's review upper limit for this extraction, are used to calculate the expert's average matching degree ranking. The expert instance with the highest average ranking is added to the final results of the project groups used to calculate its average ranking, and is deleted from E_s. When several experts share the highest average ranking, the expert instance whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have the same total of remaining vacancies and the same highest average ranking, one of them is selected at random. During this process, when the number of project groups available for calculating the average ranking of an expert instance is less than the lower limit L, the expert instance is deleted from E_s and added to E_t. The above process of selecting an expert for the final result is repeated until E_s is empty or all project groups have obtained enough experts;
step (4): each time an expert is to be selected for the final result, the following is done for each expert instance in E_t: all project groups whose candidate set contains the expert instance and which have not yet obtained enough experts are found and sorted by number of remaining vacancies in descending order, and the top-ranked project groups, in a number not exceeding the expert's review upper limit for this extraction, are used to calculate the expert's average matching degree ranking. The expert instance with the highest average ranking is added to the final results of the project groups used to calculate its average ranking, and is deleted from E_t. When several experts share the highest average ranking, the expert instance whose project groups used for the average ranking have the largest total of remaining vacancies is selected; if several experts have the same total of remaining vacancies and the same highest average ranking, one of them is selected at random. During this process, when the number of project groups available for calculating the average ranking of an expert instance equals 0, the expert instance is deleted from E_t. The above process of selecting an expert for the final result is repeated until E_t is empty or all project groups have obtained enough experts.
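The bookkeeping of step (1) and the construction of the three pools used in steps (2) to (4) can be sketched as follows. This is an illustration under simplifying assumptions: the function names are hypothetical, and the pools E_s and E_t are computed at a single point in time, whereas the text builds them only after step (2) has finished. The selection loop itself is not repeated here.

```python
from collections import Counter


def withdraw(cands, final, expert, groups):
    """Step (1): remove a withdrawing expert from the given project groups."""
    for g in groups:
        cands[g].pop(expert, None)
        final[g].discard(expert)


def build_pools(cands, final, cap, lower):
    """Return (E_a, E_s, E_t) for the supplementation steps.

    cands: {group: {expert: rank}}; final: {group: set of selected experts}.
    """
    load = Counter(e for members in final.values() for e in members)
    # E_a: experts already used somewhere and still below the review upper limit.
    e_a = {e for e, n in load.items() if n < cap}
    # Unused experts, counted by how many candidate sets they appear in.
    seen = Counter(e for ranks in cands.values() for e in ranks if e not in load)
    e_s = {e for e, n in seen.items() if n >= lower}
    e_t = set(seen) - e_s
    return e_a, e_s, e_t
```

Trying E_a first is what keeps the supplementation concentrated: a new expert is brought in only when no expert already in the results can fill the vacancy.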
Examples
In step 1, a complete set of project group instances as shown in table 1 is obtained by reading the project group data corresponding to one extraction task in the database:
Table 1. Complete set of project group instances
[Table 1: image in the original publication]
During the execution of step 2, when step A1 is executed for a project group instance, steps one to six are first applied to each keyword to find the academic knowledge graph node most similar to it. For example, the translation result "feature selection" of the keyword "feature screening" of the project group with id 4 has a same-name node in the graph, so the same-name graph node "laser science" is returned in step three; after the Chinese keyword for "molybdenum disulfide" in project group 1 is translated into "molybdenum disulfide" by Google Translate, the ten entry pages returned by calling the Wikipedia API have the largest number of links in common with the entry page of the graph node "inorganic chemistry", so the graph node "inorganic chemistry" is returned in step six. After step A1 is performed, the mapping from keywords to academic knowledge graph nodes shown in Table 2 and the academic knowledge graph node sets of the project groups shown in Table 3 (taking project group 1 as an example) are obtained:
Table 2. Mapping f: keyword → node
[Table 2: image in the original publication]
Table 3. Academic knowledge graph node set of a project group
[Table 3: image in the original publication]
After step A2 is performed, the set of academic knowledgegraph nodes for all expert instances is obtained in the same form. Assume that a set of academic knowledge graph nodes for expert example a is obtained as shown in table 4.
Table 4. Academic knowledge graph node set of an expert
[Table 4: image in the original publication]
When step A3 is executed, the keyword matching degree between the project group and every expert is calculated. Taking one node of project group 1 and one node of expert A as an example, all paths by which each of the two nodes is reached from the top-layer node of the knowledge graph are obtained, as shown in FIG. 1; the path pairs are compared by the number of nodes that coincide on the two paths, and the pair with the maximum is taken. In the illustration each of the two nodes has a unique path, so the most similar pair of paths is "diagnostic medical knowledge ← radiology ← medicine" and "soft tissue pathology ← pathology ← medicine", with 1 identical node. Applying a certain weight gives a path similarity of 0.25 between the two nodes. For project group 1 and expert A, this result is further multiplied by the node count 2 of the project group and the node count 1 of the expert to give the similarity of the project group and the expert on these two nodes. The similarity is calculated for every pairing of nodes between the expert and the project group, and the accumulated total is taken as the keyword matching degree between the project group and the expert.
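The path comparison in this example can be sketched as follows. Paths are written from the top-layer node down to the node in question, and coinciding nodes are counted by set intersection, which is a simplifying assumption. The function name is hypothetical, the node names are as read from the example, and the weighting that turns a count of 1 into the similarity 0.25 is not specified in the text, so it is left out.

```python
from itertools import product


def max_shared_nodes(paths_a, paths_b):
    """Largest number of coinciding nodes over all pairs of root-to-node paths."""
    return max(len(set(pa) & set(pb)) for pa, pb in product(paths_a, paths_b))


project_paths = [["medicine", "radiology", "diagnostic medical knowledge"]]
expert_paths = [["medicine", "pathology", "soft tissue pathology"]]
shared = max_shared_nodes(project_paths, expert_paths)  # only "medicine" coincides
```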
In step A4, assume that the subject field of expert A is "pharmacy" and the technical field is "novel drug delivery formulation technology". Both the subject field and the technical field coincide with the lowest-level fields of project group 3, so the field matching degree with project group 3 takes the highest value, 16. The other project groups have no overlapping lowest-level field, so their field matching degree is the base value, 12.
When step A5 is executed, the keyword matching degree calculated in step A3 and the field matching degree obtained in step A4 by comparing the overlap of the lowest-level fields are multiplied by their respective weights and added, giving the subject background matching degree between each project group instance and every expert instance.
When step A6 is executed, for each project group instance the expert instances are simply ranked by the calculated matching degree score, and the top-ranked expert instances are selected as candidate experts for the project group according to the requirements of the current extraction. In this example, the ranking range k is set to 5.
After the above steps are completed, a candidate expert set consisting of the experts ranked 5th or better by matching degree is obtained for each project group instance. Assume that in this example, with three project groups, the lower limit L is set to 2, each project group in this review activity requires 3 experts, and each expert may review at most 20 project groups. The candidate experts and selection status of each project group are shown in Tables 5 to 7:
Table 5. Candidate experts and selection status for project group 1
Expert id    Ranking    Selected
10           1          No
18           2          No
16           3          No
25           4          No
32           5          No
Table 6. Candidate experts and selection status for project group 2
Expert id    Ranking    Selected
10           1          No
18           2          No
25           3          No
16           4          No
32           4          No
Table 7. Candidate experts and selection status for project group 3
Expert id    Ranking    Selected
32           1          No
12           1          No
13           3          No
16           3          No
11           3          No
88           3          No
Then, during the execution of step 3, in step a all expert instances that appear in the candidate sets at least the lower limit of 2 times are first found and formed into a set; in this example this set is E_s = {10, 18, 16, 25, 32}, and the set of remaining experts is E_t = {12, 13, 11, 88}.
When an expert is to be selected from E_s for the final result, for each expert in the set all project groups whose candidate set contains the expert and which have not yet obtained enough experts are found and sorted by number of remaining vacancies in descending order, and the top-ranked project groups, in a number not exceeding the expert's upper limit for this extraction, are used to calculate the average matching ranking. Taking expert 10 as an example, the project groups whose candidate sets include expert 10 are project groups 1 and 2. Since both project groups have 3 vacancies, the sorted order may be 1, 2 or 2, 1. Since the upper limit on the number of project groups an expert may review is 20, all project groups in the sorted result can be used directly, giving an average ranking of (1+1)/2 = 1. Similarly, the average ranking is 2 for expert 18, 3.33 for expert 16, 3.5 for expert 25, and 3.33 for expert 32. Expert 10 is therefore added to the final results of project groups 1 and 2, which took part in the calculation of expert 10's average ranking, and expert 10 is deleted from E_s. Thereafter, each time an expert needs to be selected for the final result, the selection is made in the same way. In this example, expert 18 is next added to project groups 1 and 2; then, because expert 16 and expert 32 have the same average ranking and the same vacancy total, one of them is chosen at random, here expert 16, and added to project groups 1, 2 and 3. If a further expert were now to be selected, the experts remaining in E_s are 25 and 32, and the only project group with vacancies is project group 3, so for each of them the number of project groups available for calculating the average ranking is less than the lower limit 2; experts 25 and 32 are therefore deleted from E_s and added to E_t.
At this point E_s is empty and not all project groups have obtained all the experts they require, so step b is entered.
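The average rankings quoted for the first selection round of step a can be re-derived from Tables 5 to 7 with a few lines. The sketch assumes the initial state, in which every project group still has vacancies and the upper limit of 20 is not binding, so every project group listing an expert contributes to that expert's average; the variable and function names are hypothetical.

```python
# Rank of each candidate expert per project group, copied from Tables 5 to 7.
candidates = {
    1: {10: 1, 18: 2, 16: 3, 25: 4, 32: 5},
    2: {10: 1, 18: 2, 25: 3, 16: 4, 32: 4},
    3: {32: 1, 12: 1, 13: 3, 16: 3, 11: 3, 88: 3},
}


def average_rank(expert):
    """Mean rank of `expert` over all project groups listing them as a candidate."""
    ranks = [table[expert] for table in candidates.values() if expert in table]
    return sum(ranks) / len(ranks)


for expert in (10, 18, 16, 25, 32):
    print(expert, round(average_rank(expert), 2))
```

Expert 10 has the smallest average, which is why it is selected first; experts 16 and 32 both average 10/3, which produces the tie described above.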
In step b, when an expert is to be selected from E_t = {12, 13, 11, 88, 25, 32} for the final result, the average ranking is calculated in the same way for each expert in the set. Taking expert 12 as an example, the only project group whose candidate set includes expert 12 and which still has a vacancy is project group 3, so the average ranking of expert 12 at this time is readily obtained as 1. After the average ranking of all experts in the set has been calculated, experts 12 and 32 have the highest average ranking and the same total of remaining vacancies over the project groups used for the calculation; either of them is selected, added to the result of project group 3, and deleted from E_t. The average ranking is then calculated in the same way, and the remaining one of experts 12 and 32 is added to the final result of project group 3. All project groups have now obtained the experts they require, so step 3 is complete. The whole process is shown in Table 8:
Table 8. Operation of the example in steps a and b
[Table 8: image in the original publication]
In this way, the finally selected experts are kept as concentrated as possible without excessively affecting the overall matching degree.
When an expert actively or passively withdraws from the review, supplementation is performed according to steps (1) to (4) of step 4. Assume that expert 16 withdraws from the review of project groups 2 and 3. After step (1) is executed to delete the expert from the corresponding candidate sets and final results, the candidate experts and selection status of each project group are as shown in Tables 9 to 11:
Table 9. Candidate experts and selection status for project group 1
Expert id    Ranking    Selected
10           1          Yes
18           2          Yes
16           3          Yes
25           4          No
32           5          No
Table 10. Candidate experts and selection status for project group 2
Expert id    Ranking    Selected
10           1          Yes
18           2          Yes
25           3          No
32           4          No
Table 11. Candidate experts and selection status for project group 3
Expert id    Ranking    Selected
32           1          Yes
12           1          Yes
13           3          No
11           3          No
88           3          No
In step (2), all expert instances that already appear in the final extraction result of any project group are found, those whose number of occurrences in the final extraction results has reached the upper limit are removed, and the remaining expert instances form the set E_a = {10, 18, 16, 32, 12}. When an expert is to be selected for the final result, for each expert instance in E_a all project groups are found whose candidate set contains the expert instance, whose final result does not yet contain it, and which have not obtained enough experts; these are sorted by number of remaining vacancies in descending order, and the top-ranked project groups are used to calculate the average matching ranking. Taking expert 32 as an example, the only such project group is project group 2, and the average ranking is calculated to be 4. In this example the number of project groups available for calculating the average ranking is 0 for all other experts in E_a, so those experts are removed from E_a. Expert 32 now has the highest average ranking, so this expert instance is added to the final result of project group 2, which was used to calculate its average ranking, and is removed from E_a. E_a is now empty, so step (3) is entered.
In step (3), from all expert instances that exist in the candidate sets and are not contained in the final result of any project group, those appearing in the candidate sets at least the lower limit of 2 times are found and form the set E_s = {25}; the rest form the set E_t = {13, 11, 88}. When an expert is to be selected for the final result, there is no project group whose candidate set contains expert 25 and which has not obtained enough experts; that is, the number of project groups available for calculating the average ranking is 0, which is less than the lower limit 2, so expert 25 is deleted from E_s. E_s is now empty, so step (4) is entered. In step (4), the expert set is E_t = {13, 11, 88}. When an expert is to be selected for the final result, taking expert 11 as an example, the only project group whose candidate set contains expert 11 and which still has a vacancy is project group 3, so the average ranking of expert 11 is readily obtained as 3. By the same method, the average rankings of expert 13 and expert 88 are also both 3. Since the average rankings are equal and the totals of remaining vacancies over the project groups used for the calculation are also equal, one expert is selected at random, here expert 11, and added to the final result of project group 3. All project groups have now obtained the experts they require, and step (4) ends. In this way, every project group is supplemented with experts while the addition of new experts is avoided as far as possible. The above procedure is shown in Table 12:
Table 12. Operation of the example in steps (1) to (4)
[Table 12: image in the original publication]
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art without departing from the spirit and scope of the inventive concept are included in the invention, and the scope of protection is defined by the appended claims.

Claims (7)

1. A method for intensively extracting experts based on a discipline knowledge graph is characterized by comprising the following specific steps:
step 1: for the expert extraction task of a given review activity, obtaining all project groups to be reviewed together with their fields and Chinese and English keyword information;
step 2: respectively calculating the matching degrees between all the project groups and all the experts to obtain, for each project group, an expert candidate set with a high matching degree;
step 3: on the premise that the number of project groups reviewed by each expert does not exceed the upper limit set by the extraction task, selecting experts in a concentrated manner from all expert candidate sets as the final extraction result for all the project groups in the extraction task;
step 4: if an expert actively or passively withdraws from the review after receiving the notice to participate in the review activity, supplementing the project groups left with vacancies in a concentrated manner, so that the number of experts in the final result of each project group again meets the requirement.
2. The method for intensively extracting experts based on a discipline knowledge graph according to claim 1, wherein step 2 specifically comprises:
step A1: searching academic knowledge graph nodes with the highest association degree for each keyword by using Chinese and English keyword sets in the project group examples, establishing mapping, and acquiring a set of all academic knowledge graph nodes corresponding to all keywords below each project group example;
step A2: searching academic knowledge graph nodes with the highest association degree for each keyword by using Chinese and English keyword sets in expert examples, establishing mapping, and acquiring a set of all academic knowledge graph nodes corresponding to all keywords below each expert example;
step A3: calculating the keyword matching degree between each project group instance and each expert instance by using the academic knowledge graph node sets of the project group instances and the expert instances obtained in step A1 and step A2;
step A4: calculating the matching degree of the project group examples and the expert examples in various fields by using the various field information in the project group examples and the various field information in the expert examples;
step A5: multiplying the keyword matching degree in the step A3 and the field matching degree in the step A4 by respective weights and summing the results to serve as subject matching degrees between the project group examples and the expert examples;
step A6: according to the subject matching degree obtained in the step A5, at most k experts are set in the alternative set of the project group example, and each project group example p is subjected to the classification of all the expert examplesiThe first k bits of the matching degree sequence, and the k expert examples are formed into each item group example piCorresponding expert set Ei(ii) a For each project group instance piAll allocate k expert instances with highest matching degree as candidate experts of the project group, and form and project group instances p1~pnOne-to-one correspondence alternative set E1~En(ii) a Wherein k is a positive integer within 100.
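Steps A5 and A6 can be illustrated with a minimal Python sketch. The weights 0.7 and 0.3, the function names, and the input layout (a pair of keyword and field matching degrees per project group and expert) are assumptions made for illustration; the claim only requires a weighted sum followed by a top-k selection.

```python
from typing import Dict, List, Tuple


def subject_match(keyword_score: float, field_score: float,
                  w_keyword: float = 0.7, w_field: float = 0.3) -> float:
    """Step A5: weighted sum of the two matching degrees (weights are assumed)."""
    return w_keyword * keyword_score + w_field * field_score


def candidate_sets(scores: Dict[str, Dict[str, Tuple[float, float]]],
                   k: int) -> Dict[str, List[str]]:
    """Step A6: keep, for each project group, the k best-matching experts.

    scores[group][expert] holds (keyword matching degree, field matching degree).
    Ties are broken by expert name so the output is reproducible.
    """
    result: Dict[str, List[str]] = {}
    for group, per_expert in scores.items():
        ranked = sorted(per_expert,
                        key=lambda e: (-subject_match(*per_expert[e]), e))
        result[group] = ranked[:k]
    return result


if __name__ == "__main__":
    demo = {"p1": {"e1": (0.9, 0.1), "e2": (0.5, 0.5), "e3": (0.2, 0.9)}}
    print(candidate_sets(demo, k=2))
```

Each returned list is ordered from best to worst match, so the position of an expert in the list can serve as that expert's ranking for the project group in the later extraction steps.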
3. The method for intensively extracting experts based on a subject knowledge graph according to claim 2, wherein searching for the academic knowledge graph node with the highest degree of association for each keyword and establishing the mapping comprises: establishing a mapping f: keyword → node from keywords to academic knowledge graph nodes, implemented by analyzing Wikipedia entry page data; specifically, using Wikipedia entry page data to find and return the academic knowledge graph node most similar to a keyword by executing the following steps:
step ①: querying the local and online wiki databases, recording the links that point to wiki entries in the wiki entry page corresponding to each graph node, and caching the resulting (graph node, link set) pairs in a file; after this step has been executed once, it is executed again only when a new graph node is added;
step ②: if the keyword is Chinese, calling Google Translate to translate it into English and then executing step ③; if the keyword is English, executing step ③ directly;
step ③: comparing the keyword or its translation with the name strings of the graph nodes; if the name string of a graph node is identical to the keyword, returning that node directly as the mapping result of the keyword; otherwise executing step ④;
step ④: querying the wiki database; if the keyword has a corresponding entry page in the wiki database, and the entry pages of some graph nodes share with the keyword's entry page hyperlinks pointing to the same entry pages, returning the graph node with the most common links as the mapping result; if the keyword has no entry of the same name in the wiki, or the entry pages of all graph nodes have no hyperlink target in common with the keyword's entry page, executing step ⑤;
step ⑤: calling the Wikipedia API to obtain at most 10 Wikipedia entry pages that best correspond to the keyword, merging the entry hyperlinks in those pages into one set, and executing step ⑥;
step ⑥: if the number of pages found through the API is not 0 and the entry pages of some graph nodes have links in common with the hyperlink set obtained in step ⑤, returning the graph node that shares the largest number of entry links with the link set as the mapping result; if the number of pages found through the API is 0, or the link set obtained in step ⑤ still has no link in common with the entry pages of any graph node, the keyword mapping fails.
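The fallback order of this mapping (exact name match, then shared links with the keyword's own entry page, then shared links with search results) can be sketched offline in Python. Everything here is an illustrative assumption: the in-memory dictionaries stand in for the cached (graph node, link set) pairs, the wiki database and the Wikipedia search API, the keyword is assumed to be already translated into English, and no network call is made.

```python
from typing import Dict, List, Optional, Set


def best_by_common_links(links: Set[str],
                         node_links: Dict[str, Set[str]]) -> Optional[str]:
    """Return the graph node sharing the most links with `links`, or None."""
    best, best_count = None, 0
    for node in sorted(node_links):          # sorted for reproducible ties
        count = len(links & node_links[node])
        if count > best_count:
            best, best_count = node, count
    return best


def map_keyword(keyword: str,
                node_links: Dict[str, Set[str]],
                page_links: Dict[str, Set[str]],
                search_results: Dict[str, List[str]]) -> Optional[str]:
    """Map an English keyword to a graph node; None means the mapping fails."""
    # Exact match between the keyword and a node name.
    if keyword in node_links:
        return keyword
    # The keyword has its own entry page: compare its links with each node's.
    if keyword in page_links:
        hit = best_by_common_links(page_links[keyword], node_links)
        if hit is not None:
            return hit
    # Merge the links of at most 10 search-result pages, then compare again.
    pages = search_results.get(keyword, [])[:10]
    merged = set().union(*(page_links.get(p, set()) for p in pages))
    return best_by_common_links(merged, node_links)


if __name__ == "__main__":
    nodes = {"Machine learning": {"Statistics", "Algorithm"},
             "Databases": {"SQL", "Index"}}
    pages = {"Query optimizer": {"SQL", "Index", "Cost"},
             "Neural network page": {"Statistics", "Algorithm"}}
    search = {"convnet": ["Neural network page"]}
    print(map_keyword("Query optimizer", nodes, pages, search))
    print(map_keyword("convnet", nodes, pages, search))
```

Caching the node link sets once, as the first step of the claim describes, is what makes the two overlap comparisons cheap: only the keyword side needs a lookup per query.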
4. The method for intensively extracting experts based on a subject knowledge graph according to claim 2, wherein calculating, in step A3, the keyword matching degree between each project group instance and each expert instance comprises: calculating the path similarity on the graph between the keyword mapping results of a project group instance p and an expert instance e; on the knowledge graph used, the nodes are distributed in a tree and have a definite hierarchy; therefore, for each node np_i with its count cp_i in the academic knowledge graph node set N_p of the project group instance p, and each node ne_i with its count ce_i in the academic knowledge graph node set N_e of the expert instance e, all paths from the top level to the node are obtained from the graph, the pair of paths with the most overlapping nodes is found among all path pairs of the two nodes, and the similarity of that path pair is taken as the similarity sim(np_i, ne_i) between the two nodes; from the similarities calculated between all nodes, the final keyword matching degree between the instances is calculated as follows:
Figure FDA0002515578170000021
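The node-level similarity described in this claim can be sketched in Python. Two points are assumptions, because the text does not fix them: overlap is measured as the length of the shared prefix of two root-to-node paths, and the similarity of a path pair is that overlap divided by the longer path length. The aggregation into the final keyword matching degree is given only by the formula image above and is not reproduced here.

```python
from typing import Dict, List


def all_paths(node: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """All paths from a top-level node down to `node` (top level has no parents)."""
    ups = parents.get(node, [])
    if not ups:
        return [[node]]
    return [path + [node] for up in ups for path in all_paths(up, parents)]


def shared_prefix(a: List[str], b: List[str]) -> int:
    """Number of leading nodes the two paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def node_similarity(n1: str, n2: str, parents: Dict[str, List[str]]) -> float:
    """Similarity of the path pair with the most overlapping nodes (assumed ratio)."""
    pairs = [(pa, pb)
             for pa in all_paths(n1, parents)
             for pb in all_paths(n2, parents)]
    pa, pb = max(pairs, key=lambda pair: shared_prefix(*pair))
    return shared_prefix(pa, pb) / max(len(pa), len(pb))


if __name__ == "__main__":
    tree = {"CS": [], "AI": ["CS"], "ML": ["AI"], "NLP": ["AI"], "DB": ["CS"]}
    print(node_similarity("ML", "NLP", tree))
    print(node_similarity("ML", "DB", tree))
```

In this hypothetical tree, two siblings under the same parent score higher than two nodes that only share the top-level discipline, which is the behaviour the hierarchy-based comparison is meant to capture.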
5. The method for intensively extracting experts based on a subject knowledge graph according to claim 2, wherein calculating, in step A4, the field matching degree between the project group instance and each expert instance comprises: comparing the lowest-level entries of the subject field and the technical field of the project group instance with the lowest-level entries of the subject field and the technical field of the expert instance, and assigning the field matching degree according to the number of identical lowest-level field entries.
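A minimal Python sketch of such a count-based field matching degree follows. The claim only says that the degree depends on the number of identical lowest-level entries; dividing by the number of entries of the project group, so that the result lies between 0 and 1, is an assumption, as are the function name and the example labels.

```python
from typing import Set


def field_match(group_fields: Set[str], expert_fields: Set[str]) -> float:
    """Share of the project group's lowest-level fields that the expert also has.

    Both sets are assumed to hold the lowest-level subject-field and
    technical-field entries of the respective instance.
    """
    if not group_fields:
        return 0.0
    return len(group_fields & expert_fields) / len(group_fields)


if __name__ == "__main__":
    print(field_match({"computer vision", "robotics"},
                      {"computer vision", "databases"}))
```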
6. The method for intensively extracting experts based on a subject knowledge graph according to claim 1, wherein step 3 specifically comprises:
step a: among all expert instances in the alternative sets, finding every expert instance that appears in the alternative sets at least a certain lower limit L times to form a set E_s, the remaining expert instances forming another set E_t; each time an expert is selected to be added to the final result, for every expert instance in E_s, finding all project groups whose alternative set contains that expert instance and which have not yet obtained enough of the required experts, sorting these project groups by number of remaining vacancies in descending order, taking the top-ranked project groups, in a number not exceeding the current upper limit, and calculating the expert instance's average matching-degree ranking over them; selecting the expert instance with the highest average ranking, adding it to the final results of the project groups used to calculate its average ranking, and deleting it from E_s; when several experts share the highest average ranking, selecting the expert instance whose project groups used to calculate the average ranking have the largest total number of remaining vacancies, and, if several experts have both the same total of remaining vacancies and the highest average ranking, selecting among them at random; during this process, when the number of project groups available for calculating an expert instance's average ranking is less than the lower limit L, deleting that expert instance from E_s and adding it to E_t; repeating the process of selecting an expert to add to the final result until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;
step b: each time an expert is selected to be added to the final result, for every expert instance in E_t, finding all project groups whose alternative set contains that expert instance and which have not yet obtained enough of the required experts, sorting these project groups by number of remaining vacancies in descending order, taking the top-ranked project groups, in a number not exceeding the current upper limit, and calculating the expert instance's average matching-degree ranking over them; selecting the expert instance with the highest average ranking, adding it to the final results of the project groups used to calculate its average ranking, and deleting it from E_t; when several experts share the highest average ranking, selecting the expert instance whose project groups used to calculate the average ranking have the largest total number of remaining vacancies, and, if several experts have both the same total of remaining vacancies and the highest average ranking, selecting among them at random; during this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, deleting that expert instance from E_t; repeating the process of selecting an expert to add to the final result until E_t is empty or all project groups have obtained enough experts.
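Steps a and b can be sketched as one greedy routine run twice in Python. This is a simplified reading under stated assumptions: `candidates` holds each project group's alternative set ordered from best to worst match, `need` is the number of experts each group requires, `cap` is the per-expert upper limit, and ties that the claim resolves at random are resolved here by expert name so that the result is reproducible. All names are hypothetical.

```python
from typing import Dict, List, Set


def eligible_groups(expert, candidates, result, need, cap) -> List[str]:
    """Open groups listing the expert, most vacancies first, at most `cap`."""
    groups = [g for g, ranked in candidates.items()
              if expert in ranked and expert not in result[g]
              and len(result[g]) < need[g]]
    groups.sort(key=lambda g: (-(need[g] - len(result[g])), g))
    return groups[:cap]


def run_phase(pool: Set[str], candidates, result, need, cap,
              min_groups: int) -> Set[str]:
    """Repeatedly pick the expert with the best average ranking; return drop-outs."""
    pool, dropped = set(pool), set()
    while pool and any(len(result[g]) < need[g] for g in candidates):
        best = None
        for expert in sorted(pool):
            groups = eligible_groups(expert, candidates, result, need, cap)
            if len(groups) < min_groups:      # too few usable groups: leave pool
                pool.discard(expert)
                dropped.add(expert)
                continue
            avg_rank = sum(candidates[g].index(expert) + 1
                           for g in groups) / len(groups)
            vacancies = sum(need[g] - len(result[g]) for g in groups)
            key = (avg_rank, -vacancies, expert)   # name replaces random choice
            if best is None or key < best[0]:
                best = (key, expert, groups)
        if best is None:
            break
        _, expert, groups = best
        for g in groups:
            result[g].append(expert)
        pool.discard(expert)
    return dropped


def extract(candidates: Dict[str, List[str]], need: Dict[str, int],
            cap: int, lower_limit: int) -> Dict[str, List[str]]:
    result = {g: [] for g in candidates}
    counts: Dict[str, int] = {}
    for ranked in candidates.values():
        for e in ranked:
            counts[e] = counts.get(e, 0) + 1
    e_s = {e for e, c in counts.items() if c >= lower_limit}
    e_t = set(counts) - e_s
    e_t |= run_phase(e_s, candidates, result, need, cap, lower_limit)  # step a
    run_phase(e_t, candidates, result, need, cap, 1)                   # step b
    return result


if __name__ == "__main__":
    cands = {"p1": ["a", "b", "c"], "p2": ["b", "a", "d"]}
    print(extract(cands, {"p1": 2, "p2": 2}, cap=2, lower_limit=2))
```

Serving experts who appear in at least L alternative sets first tends to concentrate reviews on fewer people, which appears to be the purpose of splitting the pool into E_s and E_t.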
7. The method for intensively extracting experts based on a subject knowledge graph according to claim 1, wherein step 4 specifically comprises:
step (1): for every expert who actively or passively withdraws from the review of some project groups, deleting that expert from the alternative sets and final results of all project groups whose review the expert has withdrawn from;
step (2): finding all expert instances present in the final extraction result of any project group, removing those whose number of occurrences in the final extraction results has reached the upper limit, and forming the remaining expert instances into a set E_a; each time an expert is selected to be added to the final result, for every expert instance in E_a, finding all project groups whose alternative set contains that expert instance, whose final result does not yet include that expert instance, and which have not yet obtained enough of the required experts, sorting these project groups by number of remaining vacancies in descending order, and taking the top-ranked project groups to calculate the average matching-degree ranking; the number of project groups used to calculate the average ranking is limited so that the total number of project groups reviewed by the expert does not exceed the upper limit set by the review task; after this calculation has been completed for each expert, selecting the expert instance with the highest average ranking, adding it to the final results of the project groups used to calculate its average ranking, and deleting it from E_a; when several experts share the highest average ranking, selecting the expert instance whose project groups used to calculate the average ranking have the largest total number of remaining vacancies, and, if several experts have both the same total of remaining vacancies and the highest average ranking, selecting among them at random; during this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, deleting that expert instance from E_a; repeating the process of selecting an expert to add to the final result until E_a is empty or all project groups have obtained enough experts;
step (3): among all expert instances that exist in the alternative sets and are not contained in the final result of any project group, finding every expert instance that appears in the alternative sets at least a certain lower limit L times to form a set E_s, the remaining expert instances forming another set E_t; each time an expert is selected to be added to the final result, for every expert instance in E_s, finding all project groups whose alternative set contains that expert instance and which have not yet obtained enough of the required experts, sorting these project groups by number of remaining vacancies in descending order, taking the top-ranked project groups, in a number not exceeding the current upper limit, and calculating the expert instance's average matching-degree ranking over them; selecting the expert instance with the highest average ranking, adding it to the final results of the project groups used to calculate its average ranking, and deleting it from E_s; when several experts share the highest average ranking, selecting the expert instance whose project groups used to calculate the average ranking have the largest total number of remaining vacancies, and, if several experts have both the same total of remaining vacancies and the highest average ranking, selecting among them at random; during this process, when the number of project groups available for calculating an expert instance's average ranking is less than the lower limit L, deleting that expert instance from E_s and adding it to E_t; repeating the process of selecting an expert to add to the final result until E_s is empty or all project groups have obtained enough experts; wherein L is a positive integer from 2 to 5;
step (4): each time an expert is selected to be added to the final result, for every expert instance in E_t, finding all project groups whose alternative set contains that expert instance and which have not yet obtained enough of the required experts, sorting these project groups by number of remaining vacancies in descending order, taking the top-ranked project groups, in a number not exceeding the current upper limit, and calculating the expert instance's average matching-degree ranking over them; selecting the expert instance with the highest average ranking, adding it to the final results of the project groups used to calculate its average ranking, and deleting it from E_t; when several experts share the highest average ranking, selecting the expert instance whose project groups used to calculate the average ranking have the largest total number of remaining vacancies, and, if several experts have both the same total of remaining vacancies and the highest average ranking, selecting among them at random; during this process, when the number of project groups available for calculating an expert instance's average ranking equals 0, deleting that expert instance from E_t; repeating the process of selecting an expert to add to the final result until E_t is empty or all project groups have obtained enough experts.
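The bookkeeping that precedes replenishment, namely step (1) and the construction of E_a in step (2), can be sketched in Python as follows. The function names and data layout are assumptions: `candidates` and `result` map each project group to its alternative set and its final result, and `cap` is the per-expert upper limit. The greedy selection that then fills the vacancies is not repeated here.

```python
from typing import Dict, List


def withdraw(expert: str, groups: List[str],
             candidates: Dict[str, List[str]],
             result: Dict[str, List[str]]) -> None:
    """Step (1): remove the expert from the groups whose review they quit."""
    for g in groups:
        if expert in candidates[g]:
            candidates[g].remove(expert)
        if expert in result[g]:
            result[g].remove(expert)


def reusable_experts(result: Dict[str, List[str]], cap: int) -> Dict[str, int]:
    """Step (2) pool E_a: already-selected experts mapped to their spare capacity."""
    load: Dict[str, int] = {}
    for members in result.values():
        for e in members:
            load[e] = load.get(e, 0) + 1
    return {e: cap - n for e, n in load.items() if n < cap}


if __name__ == "__main__":
    cands = {"p1": ["a", "b", "c"], "p2": ["b", "a", "d"]}
    final = {"p1": ["a", "b"], "p2": ["a", "b"]}
    withdraw("a", ["p2"], cands, final)
    print(final, reusable_experts(final, cap=2))
```

Because the withdrawn expert is also removed from the alternative set of the group they left, a later selection pass cannot assign them back to that group, while their spare capacity remains usable for other groups.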
CN202010474948.6A 2020-05-29 2020-05-29 Method for intensively extracting experts based on subject knowledge graph Active CN111666420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010474948.6A CN111666420B (en) 2020-05-29 2020-05-29 Method for intensively extracting experts based on subject knowledge graph

Publications (2)

Publication Number Publication Date
CN111666420A (en) 2020-09-15
CN111666420B (en) 2021-02-26

Family

ID=72385174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010474948.6A Active CN111666420B (en) 2020-05-29 2020-05-29 Method for intensively extracting experts based on subject knowledge graph

Country Status (1)

Country Link
CN (1) CN111666420B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507138A (en) * 2020-12-28 2021-03-16 医渡云(北京)技术有限公司 Method and device for constructing disease-specific knowledge map, medium and electronic equipment

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020032735A1 (en) * 2000-08-25 2002-03-14 Daniel Burnstein Apparatus, means and methods for automatic community formation for phones and computer networks
CN101388024A (en) * 2008-10-09 2009-03-18 浙江大学 Compression space high-efficiency search method based on complex network
CN101520868A (en) * 2009-02-24 2009-09-02 上海大学 Method for applying analytic hierarchy process to reviewer information database system
CN102012911A (en) * 2010-11-19 2011-04-13 清华大学 Constrained optimization-based expert matching method and system
CN102222117A (en) * 2011-07-19 2011-10-19 四川建设网有限责任公司 Method and device for extracting expert database information
US20120011156A1 (en) * 2010-06-29 2012-01-12 Indiana University Research And Technology Corporation Inter-class molecular association connectivity mapping
CN102402732A (en) * 2010-09-14 2012-04-04 中国船舶工业综合技术经济研究院 Method and system for evaluating scientific research projects
US20120131000A1 (en) * 2010-10-21 2012-05-24 inno360, Inc. Method and apparatus for identifying talent by matching with the given technical needs and building talent profile from multiple data sources
CN102663553A (en) * 2012-04-09 2012-09-12 吴溢华 Copy edit flow system and copy edit flow method for stopping one paper for multiple journals
CN102880657A (en) * 2012-08-31 2013-01-16 电子科技大学 Expert recommending method based on searcher
CN103631859A (en) * 2013-10-24 2014-03-12 杭州电子科技大学 Intelligent review expert recommending method for science and technology projects
CN103823896A (en) * 2014-03-13 2014-05-28 蚌埠医学院 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
CN107092705A (en) * 2017-05-28 2017-08-25 海南大学 A kind of Semantic Modeling Method that the data collection of illustrative plates calculated, Information Atlas and knowledge mapping framework are associated based on element multidimensional frequency
CN107145559A (en) * 2017-05-02 2017-09-08 吉林大学 Intelligent classroom Knowledge Management Platform and method based on semantic technology and gameization
CN108664615A (en) * 2017-05-12 2018-10-16 华中师范大学 A kind of knowledge mapping construction method of discipline-oriented educational resource
CN108920556A (en) * 2018-06-20 2018-11-30 华东师范大学 Recommendation expert method based on subject knowledge map
US20180366013A1 (en) * 2014-08-28 2018-12-20 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
CN109992642A (en) * 2019-03-29 2019-07-09 华南理工大学 A kind of automatic method of selecting of single task expert and system based on scientific and technological entry

Also Published As

Publication number Publication date
CN111666420B (en) 2021-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant