CN112685574B - Method and device for determining hierarchical relationship of domain terms - Google Patents

Info

Publication number
CN112685574B
Authority
CN
China
Prior art keywords
term
hierarchical relationship
terms
matrix
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110014913.9A
Other languages
Chinese (zh)
Other versions
CN112685574A (en)
Inventor
张卫
王昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110014913.9A priority Critical patent/CN112685574B/en
Publication of CN112685574A publication Critical patent/CN112685574A/en
Application granted granted Critical
Publication of CN112685574B publication Critical patent/CN112685574B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a method and a device for determining domain term hierarchical relationship. The method comprises the following steps: acquiring at least two terms in the target field and definition texts corresponding to the terms; constructing a keyword matrix based on the definition text, wherein the keyword matrix is used for representing keyword characteristics in the definition text, and performing spectral cluster analysis on the keyword matrix to obtain a first hierarchical relationship of the at least two terms; constructing a term co-occurrence matrix based on the definition text, wherein the term co-occurrence matrix is used for representing co-occurrence characteristics of the at least two terms in the definition text, and performing formal concept analysis on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms; and fusing the second hierarchical relationship and the first hierarchical relationship to obtain the target hierarchical relationship of the at least two terms. The technical scheme of the embodiment of the application can improve the accuracy of the domain term hierarchical relationship.

Description

Method and device for determining hierarchical relationship of domain terms
Technical Field
The application relates to the technical field of computers and artificial intelligence, in particular to a method and a device for determining domain term hierarchical relationship.
Background
When determining the hierarchical relationships of domain terms, semantic relationships between terms are generally extracted from the similarity of text content within the domain. One example is multi-level term clustering, which, working from a text corpus, mainly addresses the semantic representation of terms, the determination of the number of clusters, the labeling of term clusters, and the extraction of category labels.
However, existing term clustering and labeling methods struggle with the high-dimensional, sparse semantic space formed by large-scale term sets, and cannot adequately guarantee the precision and stability of the clustering and labeling. As a result, the accuracy of the domain term hierarchical relationship cannot be improved.
Based on this, how to improve the accuracy of domain term hierarchical relationship is a technical problem to be solved.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a computer program product, or a computer program, a computer readable medium, and an electronic device for determining a domain term hierarchical relationship, so that accuracy of the domain term hierarchical relationship may be improved at least to some extent.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of the embodiments of the present application, there is provided a method for determining a hierarchical relationship of domain terms, the method including: acquiring at least two terms in the target field and definition texts corresponding to the terms; constructing a keyword matrix based on the definition text, wherein the keyword matrix is used for representing keyword characteristics in the definition text, and performing spectral cluster analysis on the keyword matrix to obtain a first hierarchical relationship of the at least two terms; constructing a term co-occurrence matrix based on the definition text, wherein the term co-occurrence matrix is used for representing co-occurrence characteristics of the at least two terms in the definition text, and performing formal concept analysis on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms; and fusing the second hierarchical relationship and the first hierarchical relationship to obtain the target hierarchical relationship of the at least two terms.
According to an aspect of an embodiment of the present application, there is provided a device for determining a hierarchical relationship of domain terms, the device including: an acquisition unit, configured to acquire at least two terms in a target field and definition text corresponding to each term; the first construction unit is used for constructing a keyword matrix based on the definition text, wherein the keyword matrix is used for representing keyword characteristics in the definition text, and performing spectral cluster analysis on the keyword matrix to obtain a first hierarchical relationship of the at least two terms; the second construction unit is used for constructing a term co-occurrence matrix based on the definition text, wherein the term co-occurrence matrix is used for representing co-occurrence characteristics of the at least two terms in the definition text, and performing formal concept analysis on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms; and the fusion unit is used for fusing the second hierarchical relationship and the first hierarchical relationship to obtain the target hierarchical relationship of the at least two terms.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the method of determining the hierarchical relationship of domain terms described in the above embodiments.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method of determining a hierarchical relationship of domain terms as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining domain term hierarchical relationships as described in the above embodiments.
In the technical solutions provided in some embodiments of the present application, a keyword matrix characterizing keyword features in the definition text and a term co-occurrence matrix characterizing co-occurrence features of the at least two terms in the definition text are each constructed from the definition texts corresponding to the terms. A first hierarchical relationship and a second hierarchical relationship of the at least two terms are then obtained by spectral cluster analysis of the keyword matrix and formal concept analysis of the term co-occurrence matrix, respectively, and finally the second hierarchical relationship and the first hierarchical relationship are fused to obtain a target hierarchical relationship. Because formal concept analysis of the term co-occurrence matrix can effectively improve the accuracy of the term hierarchical relationship, fusing the second hierarchical relationship with the first hierarchical relationship can further improve the accuracy of the domain term hierarchical relationship (namely, the target hierarchical relationship) while the recall requirement is still met.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of embodiments of the present application may be applied;
FIG. 2 illustrates a domain term hierarchical relationship model rendering according to one embodiment of the present application;
FIG. 3 illustrates a flow chart of a method of determining domain term hierarchical relationships according to one embodiment of the present application;
FIG. 4 illustrates a detailed flow diagram of constructing a keyword matrix according to one embodiment of the present application;
FIG. 5 shows a detailed flow diagram of spectral cluster analysis of the keyword matrix according to one embodiment of the present application;
FIG. 6 shows a detailed flow chart of spectral cluster analysis of the keyword matrix by the number of clusters according to one embodiment of the present application;
FIG. 7 illustrates a detailed flow diagram of determining a superior term in the target term class cluster according to one embodiment of the present application;
FIG. 8 illustrates a model representation of multiple clustering of domain terms according to one embodiment of the present application;
FIG. 9 illustrates a detailed flow diagram of constructing a term co-occurrence matrix according to one embodiment of the present application;
FIG. 10 illustrates a detailed flow diagram of formal concept analysis of the term co-occurrence matrix according to one embodiment of the present application;
FIG. 11 illustrates an overall flow diagram for determining domain term hierarchical relationships according to one embodiment of the present application;
FIG. 12 illustrates a block diagram of a domain term hierarchical relationship determination apparatus according to one embodiment of the present application;
FIG. 13 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that "a plurality" herein means two or more. "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" may represent: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Also to be described is: the terms first, second and the like in the description and in the claims and drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so used may be interchanged where appropriate such that the embodiments of the present application described herein may be implemented in sequences other than those illustrated or described.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture may include terminal devices (such as one or more of the smart phone 101, tablet 102, and portable computer 103 shown in fig. 1, but may also be other terminal devices with positioning functions, such as a water meter, an electricity meter, etc.), a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, and the like.
In one embodiment of the present application, the server 105 may obtain, from a terminal device, a request for determining a domain term hierarchical relationship. After obtaining at least two terms in a target domain and the definition text corresponding to each term, the server 105 first constructs a keyword matrix based on the definition text, where the keyword matrix is used to characterize keyword features in the definition text, and performs spectral cluster analysis on the keyword matrix to obtain a first hierarchical relationship of the at least two terms. The server 105 then constructs a term co-occurrence matrix based on the definition text, where the term co-occurrence matrix is used to characterize co-occurrence features of the at least two terms in the definition text, and performs formal concept analysis on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms. Finally, the server 105 fuses the second hierarchical relationship and the first hierarchical relationship to obtain the target hierarchical relationship of the at least two terms.
The apparatus for determining the domain term hierarchical relationship is generally provided in the server 105. However, in other embodiments of the present application, the terminal device may have functions similar to those of the server and may itself execute the scheme for determining the domain term hierarchical relationship provided in the embodiments of the present application.
It should be noted that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. According to the implementation requirement, the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud services.
Before describing the determination scheme of domain term hierarchical relationship in the present application, a brief description will be given below of the domain terms and the concept of domain term hierarchical relationship in conjunction with fig. 2.
Referring to FIG. 2, a domain term hierarchical relationship model rendering diagram is shown, according to one embodiment of the present application.
First, a domain term in the present application may refer to a technical term (or specialized word) within a certain professional field. For example, "diagnosis", "treatment", "disease", and "tumor" are terms in the medical field, and "government authorities", "inspection homes", "patent authorities", "law enforcement officers", and "police" are terms in the government affairs field. The domain term hierarchical relationship may refer to the relationship between terms within a domain, such as a co-ordinate relationship or an upper-lower relationship (subordinate or parent-child relationship). For example, in the medical domain, "diagnosis" and "treatment" are in a co-ordinate relationship, while "disease" and "tumor" are in an upper-lower relationship (that is, "disease" is the upper term and "tumor" is the lower term).
In fig. 2, the domain term set 201 includes the terms "B, C, D, E, F, G, H, I, J, K, L, M, N, O", and the corresponding domain term hierarchical relationship is shown as 202 in fig. 2. For example, the terms "B, D, N" are in an upper-lower relationship, and the terms "B, E, J" are in a co-ordinate relationship.
The implementation details of the technical solutions of the embodiments of the present application are described in detail below:
Fig. 3 shows a flowchart of a method for determining domain term hierarchical relationships according to an embodiment of the present application, which may be performed by a device having a calculation processing function, such as the server 105 shown in fig. 1. Referring to fig. 3, the method for determining the domain term hierarchical relationship includes at least steps 310 to 370, which are described in detail as follows:
In step 310, at least two terms within the target field are acquired, along with the definition text corresponding to each term.
In this application, the target field may refer to a certain professional field, such as a medical professional field, a military professional field, an artificial intelligence professional field, a government professional field, etc., which is not excessively limited in this application. Further, terms in the target field may refer to a plurality of terms in a certain area of expertise, such as terms of "diagnosis", "treatment", "disease", "tumor", etc. in the medical professional field.
In this application, the definition text may be a definition or explanation of a term and may be obtained from a network platform, for example by crawling encyclopedia entries for the domain terms (such as Baidu Baike, Wikipedia, or a dictionary of literature) from the network. Each term may correspond to one definition text.
In the present application, using the definition texts of the terms as the corpus for determining the domain term hierarchical relationship enriches the corpus and improves the accuracy of the domain term hierarchical relationship.
With continued reference to fig. 3, in step 330, a keyword matrix is constructed based on the definition text, where the keyword matrix is used to characterize keyword features in the definition text, and spectral cluster analysis is performed on the keyword matrix to obtain a first hierarchical relationship of the at least two terms.
In one embodiment of the present application, constructing a keyword matrix based on the definition text may be performed in accordance with the steps shown in fig. 4.
Referring to FIG. 4, a detailed flow diagram of constructing a keyword matrix according to one embodiment of the present application is shown. Specifically, steps 331 to 332 are included:
In step 331, target keywords are extracted from the definition text through a bag-of-words model, and the weight of each target keyword in the definition text is determined, where the weight represents the importance of the target keyword in the definition text.
In step 332, the keyword matrix is constructed based on the weights.
In the application, the target keywords extracted from the definition text through the bag-of-words model can reflect the theme of the definition text to a certain extent, and further, the characteristics of the terms corresponding to the definition text can be embodied by determining the weight of the target keywords in the definition text.
It should be noted that the weight of a target keyword in the definition text may be determined according to its occurrence in that text. When a target keyword does not occur in the definition text, its weight in that text may be 0; the more frequently a target keyword occurs in the definition text, the higher its weight may be.
The keyword matrix is constructed based on the weights. For example, suppose there are two terms A and B whose corresponding definition texts are a and b, and the target keywords extracted from the definition texts a and b through the bag-of-words model are c1, c2, c3 and c4. The weights of the target keywords c1, c2, c3 and c4 in the definition text a are 0.1, 0, 0.4 and 0.5, respectively, and their weights in the definition text b are 0.7, 0.1, 0 and 0.2, respectively. The keyword matrix constructed from the definition texts a and b may then be $\{v_a, v_b\}$, where $v_a = [0.1, 0, 0.4, 0.5]$ and $v_b = [0.7, 0.1, 0, 0.2]$.
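As an illustration of steps 331 and 332, the following minimal sketch builds one weight vector per term from whitespace-tokenized definition texts. The weighting used here (the relative frequency of each keyword among the selected keywords) is an assumption made for the example; the text only requires that absent keywords receive weight 0 and that more frequent keywords receive higher weights. The function name `build_keyword_matrix` and the sample texts are hypothetical.

```python
from collections import Counter

def build_keyword_matrix(definitions, keywords):
    """Return one weight vector per term (assumed weighting: relative keyword frequency)."""
    matrix = {}
    for term, text in definitions.items():
        counts = Counter(text.lower().split())
        total = sum(counts[k] for k in keywords)
        # A keyword absent from the text gets weight 0; frequent keywords weigh more.
        matrix[term] = [counts[k] / total if total else 0.0 for k in keywords]
    return matrix

# Hypothetical definition texts for two terms A and B.
definitions = {
    "A": "tumor disease tumor tissue",
    "B": "disease diagnosis disease disease",
}
keywords = ["disease", "tumor", "tissue", "diagnosis"]
keyword_matrix = build_keyword_matrix(definitions, keywords)
print(keyword_matrix)
```

Each row of the result plays the role of a vector such as $v_a$ or $v_b$ in the example above.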
In one embodiment of the present application, performing spectral clustering analysis on the keyword matrix to obtain the first hierarchical relationship of the at least two terms may be performed according to the steps shown in fig. 5.
Referring to fig. 5, a detailed flow diagram of spectral cluster analysis of the keyword matrix is shown, according to one embodiment of the present application. Specifically, steps 333 to 334 are included:
In step 333, linear dimension reduction and nonlinear dimension reduction are applied to the keyword matrix in sequence to obtain a dimension-reduced keyword matrix.
In step 334, the number of clusters is determined from the dimension-reduced keyword matrix, and spectral cluster analysis is performed on the keyword matrix according to the number of clusters to obtain the first hierarchical relationship of the at least two terms.
In this embodiment, the linear dimension reduction of the keyword matrix can be achieved with principal component analysis (PCA). PCA maps the high-dimensional features onto low-dimensional orthogonal features and computes the variance of the data projected onto each orthogonal feature; the larger the variance, the more information that feature carries. Dimension reduction is achieved by discarding the directions with small eigenvalues.
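The linear step can be sketched with NumPy as follows. This is a generic PCA via singular value decomposition, not necessarily the implementation used in the embodiment; the function name `pca_reduce`, the sample matrix, and the choice of two components are assumptions for illustration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the n_components directions of largest variance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions,
    # ordered by decreasing singular value and therefore by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(X) - 1)
    return centered @ vt[:n_components].T, explained_variance

# Hypothetical keyword matrix: four terms, four keyword weights each.
X = [[0.1, 0.0, 0.4, 0.5],
     [0.7, 0.1, 0.0, 0.2],
     [0.2, 0.0, 0.3, 0.5],
     [0.6, 0.2, 0.1, 0.1]]
reduced, variance = pca_reduce(X, n_components=2)
print(reduced.shape, variance)
```

Directions with small variance are simply not kept. A nonlinear method could then be applied to `reduced`, for instance scikit-learn's `sklearn.manifold.TSNE`, which is mentioned here only as one possible tool.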
In this embodiment, the nonlinear dimension reduction of the keyword matrix can be achieved with the t-distributed stochastic neighbor embedding (t-SNE) algorithm. t-SNE builds a probability distribution over pairs of high-dimensional data points such that similar objects are selected with higher probability, builds a corresponding probability distribution over the points mapped into a low-dimensional space, and makes the two distributions as similar as possible, thereby achieving dimension reduction.
In the present application, applying linear dimension reduction and then nonlinear dimension reduction to the keyword matrix ensures both the efficiency and the precision of the dimension reduction.
In the present application, reducing the dimension of the keyword matrix yields the dimension-reduced keyword matrix, compressing the high-dimensional sparse semantic space of the keyword matrix into a low-dimensional dense semantic space in which the semantic features are more concentrated, so that the number of clusters can then be determined.
It should be noted that, the number of clusters proposed in the present application refers to the number of term clusters obtained after clustering the at least two terms.
It should be noted that in other embodiments of the present application, the keyword matrix may be reduced in dimension only by linear dimension reduction, or may be reduced in dimension only by nonlinear dimension reduction.
In this embodiment, according to the number of clusters, spectral clustering analysis is performed on the keyword matrix to obtain the first hierarchical relationship of the at least two terms, which may be performed according to the steps shown in fig. 6.
Referring to fig. 6, a detailed flowchart of spectral cluster analysis of the keyword matrix according to the number of clusters is shown, according to one embodiment of the present application. Specifically, steps 3341 to 3342 are included:
In step 3341, the Laplacian matrix of the keyword matrix is extracted, and cluster analysis is performed on the eigenvectors of the Laplacian matrix to obtain as many term class clusters as the number of clusters, where each term class cluster includes at least one term.
In the application, performing spectral cluster analysis on the keyword matrix according to the number of clusters means dividing the keyword matrix into term class clusters by spectral clustering under the guidance of the number of clusters. Spectral clustering is a clustering algorithm derived from graph theory: the data points in the data set are treated as vertices of an undirected weighted graph, the similarity relationships among the data points become the weighted edges of that graph, and clustering the data set becomes a partitioning problem on the undirected weighted graph. The core of spectral clustering is to cluster the eigenvectors of the Laplacian matrix of the data set so as to achieve a more accurate partition. The specific steps are as follows:
Step 1, input the keyword matrix $\{v_1, v_2, \dots, v_m\}$ and the number of clusters $l$.
Step 2, construct a graph from the keyword matrix: define a weight $w_{ij}$ between any two points $v_i$ and $v_j$ to represent the similarity between the two points. When there is an edge between the data points, $w_{ij} > 0$; otherwise $w_{ij} = 0$; and because the graph is undirected, $w_{ij} = w_{ji}$. The edge weights of the graph are obtained from the Gaussian distance.
Step 3, compute the similarity matrix $S_m$ and the diagonal matrix $D_m$ from the edge weights of the data set, thereby obtaining the Laplacian matrix $L_m$, and normalize it to obtain $L_{std}$.
Step 4, compute the $e$ largest eigenvalues of $L_{std}$ and their eigenvectors, and collect the eigenvectors as column vectors to obtain the matrix $u_{m \times e} = \{u_1, u_2, \dots, u_e\}$. Normalize it to obtain a new matrix $T_{m \times e}$, using the formula:
$$T_{ij} = \frac{u_{ij}}{\left(\sum_{k=1}^{e} u_{ik}^{2}\right)^{1/2}}$$
Step 5, cluster the row vectors of $T_{m \times e}$ using K-means and output the term class clusters $C_1, C_2, \dots, C_l$. The terms in each cluster form the term set produced by the clustering.
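The formulas for the edge weights in step 2 and for the matrices in step 3 do not appear in this text. For orientation only, commonly used forms are given below as assumptions rather than as the definitions used in the embodiment. The scale parameter $\sigma$ is not named in the text. The normalized form shown is the one used in the Ng-Jordan-Weiss variant of spectral clustering, which is consistent with selecting the largest eigenvalues in step 4.

```latex
w_{ij} = \exp\!\left(-\frac{\lVert v_i - v_j \rVert^{2}}{2\sigma^{2}}\right), \qquad
S_m = (w_{ij})_{m \times m}, \qquad
D_m = \operatorname{diag}\Big(\sum_{j} w_{1j}, \dots, \sum_{j} w_{mj}\Big)

L_m = D_m - S_m, \qquad
L_{std} = D_m^{-1/2}\, S_m\, D_m^{-1/2}
```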
In the application, the keyword matrix converted from the definition texts of large-scale term sets is high-dimensional and sparse. The core idea of spectral clustering is to map the data from the high-dimensional space into a low-dimensional one and cluster the resulting feature vectors of the sample data, which allows the high-dimensional sparse keyword matrix to be divided accurately and stably and ultimately improves the clustering of large-scale term sets.
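Steps 1 to 5 can be sketched with NumPy as follows. Several choices are assumptions made for the example and are not stated in the text: the Gaussian kernel with scale `sigma`, the normalized matrix $D^{-1/2} S D^{-1/2}$, setting the number of eigenvectors $e$ equal to the number of clusters, and the small deterministic K-means helper. The function names and the sample data are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_clusters(V, n_clusters, sigma=1.0):
    V = np.asarray(V, dtype=float)
    # Step 2: Gaussian edge weights between every pair of term vectors (assumed kernel).
    sq_dist = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Step 3: degrees and the normalized matrix D^{-1/2} S D^{-1/2} (assumed form).
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L_std = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Step 4: eigenvectors of the largest eigenvalues, then row normalization.
    _, eigvecs = np.linalg.eigh(L_std)  # eigh returns eigenvalues in ascending order
    U = eigvecs[:, -n_clusters:]
    T = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 5: K-means on the rows of T.
    return kmeans(T, n_clusters)

# Hypothetical keyword matrix with two clearly separated groups of terms.
V = [[0.90, 0.10, 0.0, 0.0], [0.80, 0.20, 0.0, 0.0], [0.85, 0.15, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5], [0.0, 0.1, 0.4, 0.5], [0.0, 0.0, 0.6, 0.4]]
labels = spectral_clusters(V, n_clusters=2, sigma=0.3)
print(labels)
```

Terms that share a label form one term class cluster. The label values themselves are arbitrary; only the grouping is meaningful.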
With continued reference to fig. 6, in step 3342, for each target term class cluster, an upper term is determined in the target term class cluster, and other terms within the target term class cluster than the upper term are determined as lower terms of the upper term.
In this embodiment, for each target term class cluster, determining a superior term in the target term class cluster may be performed according to the steps shown in fig. 7.
Referring to FIG. 7, a detailed flow diagram of determining a superior term in the target term class cluster is shown, according to one embodiment of the present application. Specifically, steps 3343 to 3345 are included:
In step 3343, a sub-keyword matrix is constructed based on the definition text corresponding to each term in the target term class cluster.
In step 3344, a social network based on term semantic similarity is constructed based on the sub-keyword matrix to calculate term centrality of each term in the target term class cluster.
In step 3345, the term with the highest term centrality is determined as the upper term.
In the present application, the term centrality of each term in the target term class cluster may be calculated from the sub-keyword matrix as follows. First, term feature vectors are constructed from each sub-keyword matrix, with terms as attributes. Next, the similarity between the term feature vectors is calculated with the cosine measure, yielding a term matrix that represents the similarity between terms. Finally, the term matrix is input into a social network analysis tool to calculate the term centrality.
In the present application, calculating the term centrality of each term in the target term class cluster and determining the term with the highest term centrality as the upper term can improve the accuracy with which the upper term is determined.
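Steps 3343 to 3345 can be sketched as follows. The text does not say which centrality measure the social network tool computes, so this example assumes weighted degree centrality, that is, the sum of a term's cosine similarities to all other terms in the cluster. The function names and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two weight vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pick_upper_term(sub_matrix):
    """sub_matrix maps each term of one cluster to its keyword weight vector."""
    # Assumed centrality: weighted degree in the term similarity network.
    centrality = {
        term: sum(cosine(vec, other) for name, other in sub_matrix.items() if name != term)
        for term, vec in sub_matrix.items()
    }
    upper = max(centrality, key=centrality.get)
    lower = [term for term in sub_matrix if term != upper]
    return upper, lower, centrality

# Hypothetical sub-keyword matrix for one term class cluster.
cluster = {
    "disease": [1.0, 1.0, 1.0],
    "tumor": [1.0, 0.0, 0.0],
    "infection": [0.0, 1.0, 0.0],
    "fracture": [0.0, 0.0, 1.0],
}
upper, lower, centrality = pick_upper_term(cluster)
print(upper, lower)
```

In this toy cluster, "disease" shares keywords with every other term and therefore has the highest centrality, so it is chosen as the upper term and the remaining terms become its lower terms.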
In an embodiment of the present application, spectral clustering analysis may be further performed on the sub-keyword matrix to obtain a third hierarchical relationship of terms in the target term class cluster, where the third hierarchical relationship is a local hierarchical relationship of the first hierarchical relationship.
For further understanding of this embodiment by those skilled in the art, an example will be set forth below in connection with fig. 8:
Referring to FIG. 8, a model representation of multiple clustering of domain terms is shown according to one embodiment of the present application.
Specifically, as shown in fig. 8, term clusters are split from left to right. The domain term set 801 is a complete term class cluster. Performing a first spectral clustering on the domain term set 801 yields 2 primary term class clusters 802. In each primary term class cluster, the largest node (the node with the greatest term centrality and the greatest semantic relevance within that cluster) is determined as the upper term of the cluster.
Furthermore, spectral clustering analysis can be performed again on the sub-keyword matrices corresponding to the two primary term class clusters. It is to be noted that the upper term of each primary term class cluster must be removed before clustering again.
In the example shown in fig. 8, after spectral cluster analysis is performed again on the 2 primary term class clusters, 4 secondary term class clusters 803 are obtained. The largest node in each secondary term class cluster 803 is determined as the upper term of that cluster, so that the third hierarchical relationship of the terms in each of the 2 primary term class clusters is obtained.
It should be noted that, in fig. 8, upper terms 1 and 2 are in a coordinate (sibling) relationship, upper terms 1-1 and 1-2 are lower terms of upper term 1, and upper terms 2-1 and 2-2 are lower terms of upper term 2. Each secondary term class cluster includes one upper term and several lower terms; for example, upper term 1-1 is the upper term of two other terms.
Those skilled in the art should understand that spectral cluster analysis may also be performed on the secondary term class clusters shown in fig. 8, so that splitting continues until the first hierarchical relationship of all terms in the domain term set 801 is finally obtained.
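The repeated splitting of fig. 8 can be sketched as a short recursive routine. This is only an illustration under stated assumptions: `split` and `pick_upper` are hypothetical callables standing in for spectral clustering and for the centrality-based choice of an upper term, `min_size` is an assumed stopping threshold, and the stand-ins `halve` and `first` exist only to make the example runnable.

```python
def build_hierarchy(terms, split, pick_upper, parent=None, edges=None, min_size=3):
    """Recursively split a term cluster and collect (upper, lower) pairs.

    split(terms) returns a list of clusters; pick_upper(cluster) returns the
    upper term of a cluster. Both are placeholders for the real analyses.
    """
    if edges is None:
        edges = []
    for cluster in split(terms):
        if not cluster:
            continue
        upper = pick_upper(cluster)
        if parent is not None:
            # The upper term of a sub-cluster is a lower term of its parent.
            edges.append((parent, upper))
        # Remove the upper term before the cluster is clustered again.
        rest = [t for t in cluster if t != upper]
        if len(rest) < min_size:
            edges.extend((upper, t) for t in rest)
        else:
            build_hierarchy(rest, split, pick_upper, upper, edges, min_size)
    return edges

def halve(terms):
    """Stand-in for spectral clustering: cut the list into two halves."""
    mid = len(terms) // 2
    return [terms[:mid], terms[mid:]]

def first(cluster):
    """Stand-in for the centrality-based choice: take the first term."""
    return cluster[0]

# Hypothetical term set laid out so that the stand-ins reproduce fig. 8.
terms = ["1", "1-1", "a", "b", "1-2", "c", "d",
         "2", "2-1", "e", "f", "2-2", "g", "h"]
edges = build_hierarchy(terms, halve, first)
print(edges)
```

With these stand-ins, terms "1" and "2" have no parent and are siblings, "1-1" and "1-2" become lower terms of "1", and each secondary cluster contributes one upper term with two lower terms.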
With continued reference to fig. 3, in step 350, a term co-occurrence matrix is constructed based on the definition text, the term co-occurrence matrix being used to characterize co-occurrence features of the at least two terms in the definition text, and formal concept analysis is performed on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms.
In one embodiment of the present application, constructing the term co-occurrence matrix based on the definition text may be performed in accordance with the steps shown in fig. 9.
Referring to fig. 9, a detailed flow diagram of constructing a term co-occurrence matrix according to one embodiment of the present application is shown. Specifically, steps 351 to 352 are included:
In step 351, for each definition text, a term co-occurrence vector is generated, wherein vector elements in the term co-occurrence vector are used to characterize the presence of the respective term in the definition text.
In step 352, the term co-occurrence matrix is constructed from the term co-occurrence vectors corresponding to the respective definition text.
In this application, each term is either present in or absent from a given definition text; presence may be represented by "1" and absence by "0". For example, if the domain term set includes four terms A, B, C, and D, and only terms A and C appear in a definition text $W_1$, then the term co-occurrence vector generated for $W_1$ is $w_1 = [1\ 0\ 1\ 0]$.
Further, when the definition texts corresponding to the at least two terms in the target field are $W_1, W_2, \ldots, W_n$, the constructed term co-occurrence matrix is $\{w_1, w_2, \ldots, w_n\}$.
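A minimal Python sketch of steps 351 and 352 follows. The source does not say how the presence of a term in a definition text is detected, so plain substring matching is an assumption here, and the four terms and two definition texts are hypothetical.

```python
def cooccurrence_vector(definition_text, terms):
    """Return a 0/1 vector marking which terms appear in one definition text.

    Presence is tested by substring matching, which is an assumption.
    """
    return [1 if term in definition_text else 0 for term in terms]

def cooccurrence_matrix(definition_texts, terms):
    """Stack one co-occurrence vector per definition text."""
    return [cooccurrence_vector(text, terms) for text in definition_texts]

# Hypothetical domain terms, playing the roles of A, B, C, D.
terms = ["ontology", "taxonomy", "thesaurus", "lexicon"]
texts = [
    "An ontology generalises a thesaurus with typed relations.",
    "A taxonomy is a tree-shaped ontology.",
]
matrix = cooccurrence_matrix(texts, terms)
print(matrix)
```

The first row reproduces the pattern of the example above: only the first and third terms occur in the first definition text.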
In one embodiment of the present application, performing formal concept analysis on the term co-occurrence matrix to obtain the second hierarchical relationship of the at least two terms may follow the steps shown in fig. 10.
Referring to fig. 10, a detailed flow diagram of formal concept analysis of the term co-occurrence matrix is shown, according to one embodiment of the present application. Specifically, steps 353 to 354 are included:
In step 353, a concept lattice structure is generated based on the term co-occurrence matrix, the concept lattice structure being used to characterize the association between the definition text and the respective terms.
In step 354, a second hierarchical relationship of the at least two terms is extracted from the concept lattice structure.
In this embodiment, the term co-occurrence matrix determines the domain term set A (attributes), the definition text set O (objects), and the relation R between them, from which a concept lattice structure can be generated. The concept lattice structure is a partially ordered set whose elements are concepts, and it can be visualized as a Hasse diagram in which each node is a concept. The concept lattice structure can thus describe the relation R between the definition text set O (objects) and the domain term set A (attributes).
Further, through the concept lattice structure, a 3-tuple $B = (A, O, R)$ can be constructed. Next, in the 3-tuple $B$, two mappings $f$ and $h$ are defined on the power sets of $O$ and $A$, for $O_i \subseteq O$ and $A_j \subseteq A$, as follows:

$$f(O_i) = \{\, a \in A \mid aRo \ \text{for all } o \in O_i \,\}, \qquad h(A_j) = \{\, o \in O \mid aRo \ \text{for all } a \in A_j \,\}$$

wherein $aRo$ indicates that object $o \in O$ has attribute $a \in A$. The mappings give, respectively, the attributes common to all objects in the object set $O_i$ and all objects that have every attribute in the attribute set $A_j$. If $f(O_i) = A_j$ and $h(A_j) = O_i$, then $C = (O_i, A_j)$ is a concept with $O_i$ as its extent and $A_j$ as its intent.
Finally, if two concepts $C_1 = (O_1, A_1)$ and $C_2 = (O_2, A_2)$ satisfy $O_1 \subseteq O_2$ (equivalently, $A_2 \subseteq A_1$), then $C_2$ is called a super-concept of $C_1$. The second hierarchical relationship of the at least two terms can be extracted by means of this ordering between concepts.
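The two mappings and the ordering between concepts can be illustrated with a small Python sketch that uses the standard derivation operators of formal concept analysis. It enumerates every attribute subset, which is only practical for tiny examples; the function names and the 3-by-4 context are hypothetical, and the way the application reads term hierarchies off the lattice is not reproduced here.

```python
from itertools import combinations

def common_attributes(objects, context, n_attrs):
    """f: attributes shared by every object (row index) in `objects`."""
    return frozenset(
        a for a in range(n_attrs) if all(context[o][a] for o in objects)
    )

def common_objects(attrs, context):
    """h: objects (row indices) that have every attribute in `attrs`."""
    return frozenset(
        o for o in range(len(context)) if all(context[o][a] for a in attrs)
    )

def formal_concepts(context):
    """Return all (extent, intent) pairs of a binary context.

    Brute force over attribute subsets: exponential, for illustration only.
    """
    n_attrs = len(context[0])
    concepts = set()
    for k in range(n_attrs + 1):
        for attrs in combinations(range(n_attrs), k):
            extent = common_objects(attrs, context)
            intent = common_attributes(extent, context, n_attrs)
            concepts.add((extent, intent))
    return concepts

def is_superconcept(c2, c1):
    """True if the extent of c1 is a proper subset of the extent of c2."""
    return c1[0] < c2[0]

# Rows are definition texts (objects); columns are terms A, B, C, D (attributes).
context = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
concepts = formal_concepts(context)
top = (frozenset({0, 1, 2}), frozenset({0}))
print(len(concepts), top in concepts)
```

In this context term A occurs in every definition text, so the concept whose intent is {A} sits above the concepts whose intents are {A, B}, {A, C}, and {A, D}.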
With continued reference to fig. 3, in step 370, the second hierarchical relationship and the first hierarchical relationship are fused to obtain a target hierarchical relationship of the at least two terms.
In one embodiment of the present application, fusing the second hierarchical relationship and the first hierarchical relationship may include two cases:
When the second hierarchical relationship contains a first local hierarchical relationship that differs from but does not conflict with the first hierarchical relationship, the first local hierarchical relationship is added to the first hierarchical relationship.
When the second hierarchical relationship contains a second local hierarchical relationship that differs from and conflicts with the first hierarchical relationship, the second local hierarchical relationship replaces the conflicting part of the first hierarchical relationship.
For further understanding of this embodiment by those skilled in the art, an example is set forth below in conjunction with table 1:
TABLE 1
Referring to table 1, when the fusion category is upper extension, lower extension, or upper-and-lower extension, the second hierarchical relationship contains a local hierarchical relationship that differs from but does not conflict with the first hierarchical relationship. In this case, the local hierarchical relationship is added to the first hierarchical relationship to obtain a local hierarchical relationship in the target hierarchical relationship.
Specifically, upper extension means that different upper terms (A, B) point to the same lower term (C), which extends the upper concepts of the lower term. Lower extension means that one upper term (A) points to different lower terms (B, C), which extends the lower concepts of the upper term. Upper-and-lower extension means that an upper term (A) and a lower term (C) are joined through the same term (B), which extends both the upper and the lower concepts.
With continued reference to table 1, when the fusion category is upper-and-lower correction, the second hierarchical relationship contains a local hierarchical relationship that differs from and conflicts with the first hierarchical relationship. In this case, that local hierarchical relationship replaces the conflicting part of the first hierarchical relationship to obtain a local hierarchical relationship in the target hierarchical relationship.
Specifically, upper-and-lower correction means replacing a local hierarchical relationship A→B in the first hierarchical relationship with B→A.
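The two fusion cases can be sketched in Python by storing each hierarchical relationship as a set of (upper term, lower term) pairs. Table 1 is not reproduced in this text, so treating a pair with reversed direction as the conflicting case is an assumption, as is the function name `fuse`.

```python
def fuse(first, second):
    """Fuse two sets of (upper, lower) pairs into a target hierarchy.

    Pairs from `second` that are new and non-conflicting are added.
    A pair whose reverse is in `first` is treated as a conflict (assumed
    definition) and replaces the reversed pair.
    """
    fused = set(first)
    for upper, lower in second:
        if (upper, lower) in fused:
            continue  # already present, nothing to do
        if (lower, upper) in fused:
            fused.discard((lower, upper))  # correction: drop the reversed pair
        fused.add((upper, lower))  # extension or corrected direction
    return fused

# Upper extension: a second upper term B is found for the lower term C.
print(sorted(fuse({("A", "C")}, {("B", "C")})))

# Upper-and-lower correction: the second hierarchy reverses A -> B.
print(sorted(fuse({("A", "B")}, {("B", "A")})))
```

Lower extension and upper-and-lower extension follow the same "add" branch, since they only contribute pairs that do not contradict an existing pair.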
For a better overall understanding of the present application, the solution proposed in the present application is briefly summarized below in connection with fig. 11:
Referring to FIG. 11, an overall flow diagram of determining domain term hierarchical relationships is shown, according to one embodiment of the present application.
In fig. 11, a definition text 1102 is searched for and crawled based on the domain terms in a domain term set 1101. A keyword matrix 1103 and a term co-occurrence matrix 1107 are then constructed based on the definition text 1102. Spectral cluster analysis is performed on the keyword matrix 1103 to obtain a first hierarchical relationship 1104, and formal concept analysis is performed on the term co-occurrence matrix 1107 to obtain a second hierarchical relationship 1105. Finally, the first hierarchical relationship 1104 and the second hierarchical relationship 1105 are fused to obtain a target hierarchical relationship 1106 of the domain terms in the domain term set 1101. Furthermore, the obtained target hierarchical relationship of the domain terms can be applied in specific application scenarios such as information retrieval, intelligent recommendation, and knowledge discovery.
It can be seen that, in the technical scheme of the application, a keyword matrix representing the keyword characteristics in the definition text and a term co-occurrence matrix representing the co-occurrence characteristics of the at least two terms in the definition text are each constructed from the definition text corresponding to each term. Spectral cluster analysis of the keyword matrix and formal concept analysis of the term co-occurrence matrix then yield a first hierarchical relationship and a second hierarchical relationship of the at least two terms, and fusing the second hierarchical relationship with the first hierarchical relationship yields the target hierarchical relationship. Because spectral cluster analysis of the keyword matrix ensures that the hierarchical relationship of the domain terms has a certain recall rate, and formal concept analysis of the term co-occurrence matrix effectively improves the accuracy of the term hierarchical relationship, fusing the second hierarchical relationship and the first hierarchical relationship further improves the accuracy of the domain term hierarchical relationship while the recall rate is maintained.
The following describes an embodiment of an apparatus of the present application, which may be used to perform a method for determining a hierarchical relationship of domain terms in the above-described embodiment of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method for determining the hierarchical relationship of domain terms described in the present application.
Fig. 12 shows a block diagram of a domain term hierarchical relationship determination apparatus according to an embodiment of the present application.
Referring to fig. 12, a device 1200 for determining a hierarchical relationship of domain terms according to an embodiment of the present application includes: an acquisition unit 1201, a first construction unit 1202, a second construction unit 1203, and a fusion unit 1204.
According to an aspect of the embodiments of the present application, there is provided a device for determining a hierarchical relationship of domain terms, where the device includes: an acquisition unit 1201, configured to acquire at least two terms in a target field and the definition text corresponding to each term; a first construction unit 1202, configured to construct a keyword matrix based on the definition text, where the keyword matrix is used to characterize keyword features in the definition text, and to perform spectral cluster analysis on the keyword matrix to obtain a first hierarchical relationship of the at least two terms; a second construction unit 1203, configured to construct a term co-occurrence matrix based on the definition text, where the term co-occurrence matrix is used to characterize co-occurrence features of the at least two terms in the definition text, and to perform formal concept analysis on the term co-occurrence matrix to obtain a second hierarchical relationship of the at least two terms; and a fusion unit 1204, configured to fuse the second hierarchical relationship and the first hierarchical relationship to obtain a target hierarchical relationship of the at least two terms.
In some embodiments of the present application, based on the foregoing solution, the first construction unit 1202 is configured to: extract a target keyword from the definition text through a bag-of-words model, and determine the weight of the target keyword in the definition text, wherein the weight is used for representing the importance degree of the target keyword in the definition text; and construct the keyword matrix based on the weight.
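As a rough illustration of a keyword matrix built from a bag-of-words model, the Python sketch below weights each keyword with TF-IDF. The passage above does not name the weighting scheme, so TF-IDF is an assumption, as are the whitespace tokenizer (unsuitable for unsegmented Chinese text), the function name, and the two sample definition texts.

```python
import math
from collections import Counter

def keyword_matrix(definition_texts):
    """Return (vocabulary, matrix) with one TF-IDF row per definition text.

    Tokens come from lower-casing and whitespace splitting (assumed).
    Weight = term frequency * log(N / document frequency) (assumed).
    """
    docs = [text.lower().split() for text in definition_texts]
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        row = [
            (counts[w] / len(doc)) * math.log(n_docs / doc_freq[w]) if doc else 0.0
            for w in vocab
        ]
        matrix.append(row)
    return vocab, matrix

# Hypothetical definition texts for two terms.
texts = [
    "a database stores data",
    "a graph database stores graphs",
]
vocab, matrix = keyword_matrix(texts)
print(vocab)
```

Keywords that occur in every definition text receive weight 0 under this scheme, so only distinguishing keywords contribute to the later similarity and clustering steps.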
In some embodiments of the present application, based on the foregoing solution, the first construction unit 1202 includes a first analysis unit, configured to sequentially perform linear dimension reduction and nonlinear dimension reduction on the keyword matrix, to obtain a dimension-reduced keyword matrix; and determining the number of clusters through the dimension reduction keyword matrix, and performing spectral cluster analysis on the keyword matrix according to the number of clusters to obtain a first hierarchical relationship of the at least two terms.
In some embodiments of the present application, based on the foregoing solution, the first analysis unit is configured to: extracting a Laplace matrix of the keyword matrix, and performing cluster analysis on feature vectors of the Laplace matrix to obtain term class clusters of the clustering number, wherein each term class cluster comprises at least one term; for each target term class cluster, determining an upper term in the target term class cluster, and determining other terms except the upper term in the target term class cluster as lower terms of the upper term.
In some embodiments of the present application, based on the foregoing solution, the first analysis unit is configured to: construct a sub-keyword matrix based on the definition texts corresponding to the terms in the target term class cluster; construct, based on the sub-keyword matrix, a social network based on term semantic similarity to calculate the term centrality of each term in the target term class cluster; and determine the term with the highest term centrality as the upper term.
In some embodiments of the present application, based on the foregoing solution, the first analysis unit is configured to: and performing spectral cluster analysis on the sub-keyword matrix to obtain a third hierarchical relationship of the terms in the target term cluster, wherein the third hierarchical relationship is a local hierarchical relationship of the first hierarchical relationship.
In some embodiments of the present application, based on the foregoing solution, the second construction unit 1203 is configured to: generating a term co-occurrence vector for each definition text, wherein vector elements in the term co-occurrence vector are used for representing the existence condition of each term in the definition text; and constructing the term co-occurrence matrix through term co-occurrence vectors corresponding to the definition texts.
In some embodiments of the present application, based on the foregoing solution, the second construction unit 1203 includes a second analysis unit, configured to generate a concept lattice structure based on the term co-occurrence matrix, where the concept lattice structure is used to characterize an association relationship between the definition text and each term; and extracting a second hierarchical relationship of the at least two terms from the concept lattice structure.
In some embodiments of the present application, based on the foregoing scheme, the fusion unit 1204 is configured to: when the second hierarchical relationship contains a first local hierarchical relationship that differs from but does not conflict with the first hierarchical relationship, add the first local hierarchical relationship to the first hierarchical relationship; and when the second hierarchical relationship contains a second local hierarchical relationship that differs from and conflicts with the first hierarchical relationship, replace the conflicting part of the first hierarchical relationship with the second local hierarchical relationship.
Fig. 13 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
It should be noted that, the computer system 1300 of the electronic device shown in fig. 13 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 13, the computer system 1300 includes a central processing unit (Central Processing Unit, CPU) 1301 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1302 or a program loaded from a storage portion 1308 into a random access Memory (Random Access Memory, RAM) 1303, for example, performing the method described in the above embodiment. In the RAM 1303, various programs and data required for the system operation are also stored. The CPU 1301, ROM 1302, and RAM 1303 are connected to each other through a bus 1304. An Input/Output (I/O) interface 1305 is also connected to bus 1304.
The following components are connected to the I/O interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage portion 1308 including a hard disk or the like; and a communication section 1309 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1309 performs communication processing via a network such as the internet. The drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed so that a computer program read therefrom is installed into the storage portion 1308 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1309 and/or installed from the removable medium 1311. When the computer program is executed by the Central Processing Unit (CPU) 1301, the various functions defined in the system of the present application are performed.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the method of determining the hierarchical relationship of domain terms described in the above embodiments.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs that, when executed by one of the electronic devices, cause the electronic device to implement the domain term hierarchical relationship determination method described in the above embodiment.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (6)

1. A method for determining a hierarchical relationship of domain terms, the method comprising:
acquiring at least two terms in the target field and definition texts corresponding to the terms;
constructing a keyword matrix based on the definition text, wherein the keyword matrix is used for representing keyword characteristics in the definition text, and sequentially performing linear dimension reduction and nonlinear dimension reduction on the keyword matrix to obtain a dimension reduction keyword matrix; determining the number of clusters through the dimension reduction keyword matrix, and performing spectral cluster analysis on the keyword matrix according to the number of clusters to obtain a first hierarchical relationship of the at least two terms;
generating a term co-occurrence vector for each definition text, wherein vector elements in the term co-occurrence vector are used for representing the existence condition of each term in the definition text; constructing a term co-occurrence matrix through term co-occurrence vectors corresponding to each definition text, wherein the term co-occurrence matrix is used for representing co-occurrence characteristics of the at least two terms in the definition text, and generating a concept lattice structure based on the term co-occurrence matrix, wherein the concept lattice structure is used for representing association relations between the definition text and each term; extracting a second hierarchical relationship of the at least two terms from the concept lattice structure;
when a first local hierarchical relationship which is different from the first hierarchical relationship and does not conflict with it exists in the second hierarchical relationship, supplementing the first local hierarchical relationship into the first hierarchical relationship; and when a second local hierarchical relationship which is different from and conflicts with the first hierarchical relationship exists in the second hierarchical relationship, replacing the conflicting part of the first hierarchical relationship with the second local hierarchical relationship, so as to obtain the target hierarchical relationship of the at least two terms.
2. The method of claim 1, wherein constructing a keyword matrix based on the definition text comprises:
extracting a target keyword from the definition text through a bag-of-words model, and determining the weight of the target keyword in the definition text, wherein the weight is used for representing the importance degree of the target keyword in the definition text;
and constructing the keyword matrix based on the weight.
3. The method of claim 1, wherein the performing spectral cluster analysis on the keyword matrix according to the number of clusters to obtain the first hierarchical relationship of the at least two terms includes:
extracting a Laplace matrix of the keyword matrix, and performing cluster analysis on feature vectors of the Laplace matrix to obtain term class clusters of the clustering number, wherein each term class cluster comprises at least one term;
for each target term class cluster, determining an upper term in the target term class cluster, and determining other terms except the upper term in the target term class cluster as lower terms of the upper term.
4. The method of claim 3, wherein said determining an upper term in the target term class cluster comprises:
constructing a sub-keyword matrix based on definition texts corresponding to terms in the target term class cluster;
based on the sub-keyword matrix, constructing a social network based on term semantic similarity to calculate term centrality of each term in the target term class cluster;
and determining the term with the highest term centrality as the upper term.
5. The method according to claim 4, wherein the method further comprises:
and performing spectral cluster analysis on the sub-keyword matrix to obtain a third hierarchical relationship of the terms in the target term cluster, wherein the third hierarchical relationship is a local hierarchical relationship of the first hierarchical relationship.
6. A device for determining hierarchical relationships of domain terms, the device comprising:
an acquisition unit, configured to acquire at least two terms in a target field and definition text corresponding to each term;
a first construction unit, configured to construct a keyword matrix based on the definition text, wherein the keyword matrix is used for representing the keyword characteristics in the definition text, and to perform linear dimension reduction and nonlinear dimension reduction on the keyword matrix in sequence to obtain a dimension reduction keyword matrix; determining the number of clusters through the dimension reduction keyword matrix, and performing spectral cluster analysis on the keyword matrix according to the number of clusters to obtain a first hierarchical relationship of the at least two terms;
a second construction unit, configured to generate a term co-occurrence vector for each definition text, where vector elements in the term co-occurrence vector are used to characterize the existence of the respective term in the definition text; constructing a term co-occurrence matrix through term co-occurrence vectors corresponding to each definition text, wherein the term co-occurrence matrix is used for representing co-occurrence characteristics of the at least two terms in the definition text, and generating a concept lattice structure based on the term co-occurrence matrix, wherein the concept lattice structure is used for representing association relations between the definition text and each term; extracting a second hierarchical relationship of the at least two terms from the concept lattice structure;
a fusion unit, configured to: when the second hierarchical relationship contains a first local hierarchical relationship that differs from but does not conflict with the first hierarchical relationship, supplement the first local hierarchical relationship into the first hierarchical relationship; and when the second hierarchical relationship contains a second local hierarchical relationship that differs from and conflicts with the first hierarchical relationship, replace the corresponding part of the first hierarchical relationship with the second local hierarchical relationship, thereby obtaining the target hierarchical relationship of the at least two terms.
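As an illustration only, the term co-occurrence vectors of the second construction unit and the supplement/replace rule of the fusion unit might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `cooccurrence_matrix` and `fuse` are hypothetical, term presence is tested by plain substring matching, each hierarchy is represented as a child-to-parent mapping, and a "conflict" is assumed to mean that the two hierarchies assign different parents to the same term. The concept lattice step is not shown.

```python
def cooccurrence_matrix(terms, definitions):
    """Build one term co-occurrence vector per definition text.

    Element j of row i is 1 if terms[j] occurs in definitions[i], else 0.
    Assumption: presence is tested by simple substring matching.
    """
    return [[1 if term in text else 0 for term in terms] for text in definitions]


def fuse(first, second):
    """Fuse two hierarchies given as {child: parent} mappings (an assumed encoding).

    Relations found only in `second` are supplemented into the result;
    relations that assign a different parent to a child already present in
    `first` are treated as conflicts, and the relation from `second` replaces
    the one from `first`.
    """
    target = dict(first)
    supplemented, replaced = [], []
    for child, parent in second.items():
        if child not in target:
            # Different from the first hierarchy and not conflicting: supplement.
            target[child] = parent
            supplemented.append((child, parent))
        elif target[child] != parent:
            # Different and conflicting: the second hierarchy takes precedence.
            target[child] = parent
            replaced.append((child, parent))
    return target, supplemented, replaced
```

For example, with `first = {"portal": "online service"}` and `second = {"portal": "e-government", "mobile app": "online service"}`, the relation for "mobile app" is supplemented and the parent of "portal" is replaced by "e-government".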
CN202110014913.9A 2021-01-06 2021-01-06 Method and device for determining hierarchical relationship of domain terms Active CN112685574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110014913.9A CN112685574B (en) 2021-01-06 2021-01-06 Method and device for determining hierarchical relationship of domain terms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110014913.9A CN112685574B (en) 2021-01-06 2021-01-06 Method and device for determining hierarchical relationship of domain terms

Publications (2)

Publication Number Publication Date
CN112685574A (en) 2021-04-20
CN112685574B (en) 2024-04-09

Family

ID=75456142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110014913.9A Active CN112685574B (en) 2021-01-06 2021-01-06 Method and device for determining hierarchical relationship of domain terms

Country Status (1)

Country Link
CN (1) CN112685574B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462253A (en) * 2014-11-20 2015-03-25 武汉数为科技有限公司 Topic detection or tracking method for network text big data
CN105335499A (en) * 2015-10-27 2016-02-17 盐城工学院 Document clustering method based on distribution-convergence model
CN107679135A (en) * 2017-09-22 2018-02-09 深圳市易图资讯股份有限公司 The topic detection of network-oriented text big data and tracking, device
CN109840325A (en) * 2019-01-28 2019-06-04 山西大学 Text semantic method for measuring similarity based on mutual information
CN110287313A (en) * 2019-05-20 2019-09-27 阿里巴巴集团控股有限公司 A kind of the determination method and server of risk subject
CN110909550A (en) * 2019-11-13 2020-03-24 北京环境特性研究所 Text processing method and device, electronic equipment and readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8566360B2 (en) * 2010-05-28 2013-10-22 Drexel University System and method for automatically generating systematic reviews of a scientific field

Non-Patent Citations (3)

Title
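An ontology automation construction scheme for Chinese e-government thesaurus optimizing; Hao Wang et al.; Wiley; 2020-10-29; 1-13 *
Research on knowledge aggregation methods for virtual health communities based on spectral clustering; Zhang Haitao; Song Tuo; Zhou Honglei; Zhang Xinrui; Library and Information Service; 2020-04-20; Vol. 64 (No. 08); 134-140 *
Research on hierarchical relationship recognition of Chinese terms in the e-government domain; Zhang Wei et al.; Journal of the China Society for Scientific and Technical Information; 2021-01-24; Vol. 40 (No. 1); 62-76 *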

Also Published As

Publication number Publication date
CN112685574A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
Serafino et al. True scale-free networks hidden by finite size effects
US11227118B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
JP2020123318A (en) Method, apparatus, electronic device, computer-readable storage medium, and computer program for determining text relevance
CN112149400B (en) Data processing method, device, equipment and storage medium
WO2014126657A1 (en) Latent semantic analysis for application in a question answer system
CN111353303B (en) Word vector construction method and device, electronic equipment and storage medium
CN110162637B (en) Information map construction method, device and equipment
CN115730597A (en) Multi-level semantic intention recognition method and related equipment thereof
CN115455169A (en) Knowledge graph question-answering method and system based on vocabulary knowledge and semantic dependence
CN115129869A (en) Text data processing method and device, computer equipment and storage medium
CN113110843A (en) Contract generation model training method, contract generation method and electronic equipment
JP2023517518A (en) Vector embedding model for relational tables with null or equivalent values
CN110287270B (en) Entity relationship mining method and equipment
CN112685574B (en) Method and device for determining hierarchical relationship of domain terms
CN114547257B (en) Class matching method and device, computer equipment and storage medium
US20220284501A1 (en) Probabilistic determination of compatible content
CN115905885A (en) Data identification method, device, storage medium and program product
CN111274818B (en) Word vector generation method and device
CN113010642B (en) Semantic relation recognition method and device, electronic equipment and readable storage medium
CN115186188A (en) Product recommendation method, device and equipment based on behavior analysis and storage medium
Ma et al. [Retracted] The Construction of Big Data Computational Intelligence System for E‐Government in Cloud Computing Environment and Its Development Impact
CN113779248A (en) Data classification model training method, data processing method and storage medium
CN114282002A (en) Knowledge generation method, device, equipment and storage medium based on artificial intelligence
CN113010759A (en) Processing method and device of cluster set, computer readable medium and electronic equipment
CN113627514A (en) Data processing method and device of knowledge graph, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant