WO2022052505A1 - Method and apparatus for extracting sentence main portion on the basis of dependency grammar, and readable storage medium - Google Patents

Info

Publication number
WO2022052505A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
dependency
extracting
analyzed
dependency syntax
Application number
PCT/CN2021/094933
Other languages
French (fr)
Chinese (zh)
Inventor
汤耀华
周楠楠
杨海军
徐倩
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022052505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the present application relates to the field of artificial intelligence in financial technology (Fintech), and in particular, to a method, device and readable storage medium for sentence stem extraction based on dependency syntax.
  • the main purpose of this application is to provide a method, device and readable storage medium for sentence stem extraction based on dependency syntax, which aims to solve the technical problem of low sentence semantic recognition accuracy in the related art.
  • the present application provides a method for extracting sentence stems based on dependency syntax.
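  • the method for extracting sentence stems based on dependency syntax is applied to a sentence stem extraction device based on dependency syntax, and includes: obtaining a to-be-analyzed sentence, and performing dependency syntax analysis on the to-be-analyzed sentence to obtain a dependency syntax analysis result; and
  • based on the dependency syntax analysis result, extracting a target sentence stem corresponding to the to-be-analyzed sentence, so as to perform semantic recognition on the to-be-analyzed sentence.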
  • the present application also provides a device for extracting sentence stems based on dependency syntax.
  • the device for extracting sentence stems based on dependency syntax is a virtual device, and is applied to sentence stem extraction equipment based on dependency syntax.
  • the device for extracting sentence stems based on dependency syntax includes:
  • a dependency syntax analysis module, used to obtain a to-be-analyzed sentence and to perform dependency syntax analysis on the to-be-analyzed sentence to obtain a dependency syntax analysis result;
  • a sentence stem extraction module, used to extract the target sentence stem corresponding to the to-be-analyzed sentence based on the dependency syntax analysis result, so as to perform semantic recognition on the to-be-analyzed sentence.
  • the present application also provides a device for extracting sentence stems based on dependency syntax.
  • the device for extracting sentence stems based on dependency syntax is a physical device.
  • the device for extracting sentence stems based on dependency syntax includes: a memory, a processor, and a program for the method for extracting sentence stems based on dependency syntax, which is stored on the memory and can run on the processor; when executed by the processor, the program implements the steps of the method for extracting sentence stems based on dependency syntax described above.
  • the present application also provides a readable storage medium on which a program for implementing the method for extracting sentence stems based on dependency syntax is stored; when executed by a processor, the program implements the steps of the method for extracting sentence stems based on dependency syntax described above.
  • the present application provides a method, device, and readable storage medium for extracting sentence trunks based on dependency syntax.
  • after obtaining the to-be-analyzed sentence, the present application performs dependency syntax analysis on it to obtain the corresponding dependency syntax analysis result, and then extracts the target sentence stem corresponding to the to-be-analyzed sentence based on that result, thereby extracting the target sentence stem of the to-be-analyzed sentence by means of dependency syntax analysis.
  • the part of the to-be-analyzed sentence other than the target sentence stem contributes little to semantic recognition and interferes with it. Removing this non-stem part by means of dependency syntax analysis brings the target sentence stem closer to the real semantics of the to-be-analyzed sentence.
  • FIG. 1 is a schematic flowchart of a first embodiment of a method for extracting sentence stems based on dependency syntax in the present application
  • FIG. 2 is a schematic diagram of a dependency tree corresponding to the to-be-analyzed sentence described in the sentence trunk extraction method based on the dependency syntax of the present application;
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for extracting sentence stems based on dependency syntax in the present application
  • FIG. 4 is a schematic diagram of a device structure of a hardware operating environment involved in the solution of the embodiment of the present application.
  • An embodiment of the present application provides a method for extracting sentence stems based on dependency syntax.
  • the method for extracting sentence stems based on dependency syntax includes:
  • Step S10, obtaining a to-be-analyzed sentence, and performing dependency syntax analysis on the to-be-analyzed sentence to obtain a dependency syntax analysis result;
  • the to-be-analyzed sentence is a preprocessed sentence on which dependency syntax analysis is to be performed, wherein the purpose of the preprocessing is to remove the background components of the sentence that interfere with dependency syntax analysis.
  • the method for extracting sentence stems based on dependency syntax is applied to a human-computer dialogue system, and the to-be-analyzed sentence is a preprocessed sentence replied by a user during a human-computer dialogue. The sentence stem extraction device based on dependency syntax includes a preset dependency syntax model, which is a pre-trained machine learning model for performing dependency syntax analysis on sentences. Dependency syntax analysis is the process of parsing the syntactic information of a sentence, wherein the syntactic information includes sentence pattern information and word component information.
  • for example, the sentence pattern information may indicate that the sentence is a subject-predicate-object sentence, and the word component information indicates the component that each word serves as in the sentence: "I" is the subject, "is" is the predicate, and "who" is the object.
  • dependency syntax analysis comprises dependency relation discrimination and dependency relation type prediction: the purpose of dependency relation discrimination is to determine whether a dependency relation exists between words, and the purpose of dependency relation type prediction is to predict the type of each dependency relation.
  • for example, suppose the to-be-analyzed sentence is the sentence "ABC", in which A, B and C are all words of the to-be-analyzed sentence. After dependency relation discrimination, it can be determined that B depends on A and C depends on B; after dependency relation type prediction, it can be determined that the dependency relation between A and B is a subject-predicate relation, and that the dependency relation between B and C is a verb-object relation.
  • the step of performing dependency relation discrimination and dependency relation type prediction on the to-be-analyzed sentence, so as to perform dependency syntax analysis on the to-be-analyzed sentence and obtain a dependency syntax analysis result, includes:
  • the dependency type prediction result can be represented by a matrix, namely the dependency type prediction probability matrix. Each element of this matrix is the dependency type label probability prediction vector between one word and another word in the to-be-analyzed sentence, and each component of that vector is the probability that the dependency relation between the two words belongs to the preset dependency type corresponding to that component, wherein the preset dependency types include subject-predicate relations, verb-object relations, etc.
  • for example, a value of 0.1 means that the probability that the relation between word A and word B is a subject-predicate relation is 10%, and a value of 0.9 means that the probability of a verb-object relation between word A and word B is 90%.
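  • as a minimal illustration of the probability matrix described above, the following sketch stores one label probability vector per word pair and reads off the most likely dependency type. The two-label set, the variable names and the helper `most_likely_label` are assumptions made for this example only; the values 0.1 and 0.9 are the ones used in the text.

```python
# Hypothetical label set: the text names only these two relation types explicitly.
LABELS = ["subject-predicate", "verb-object"]

words = ["A", "B"]

# prob[i][j] is the label probability vector for the pair (words[i], words[j]).
# Only the (A, B) entry uses values from the text; other entries are left empty.
prob = [[None for _ in words] for _ in words]
prob[0][1] = [0.1, 0.9]


def most_likely_label(matrix, i, j):
    """Return (label, probability) with the highest probability for pair (i, j)."""
    vector = matrix[i][j]
    best = max(range(len(vector)), key=lambda k: vector[k])
    return LABELS[best], vector[best]


label, p = most_likely_label(prob, 0, 1)
print(f"{words[0]} -> {words[1]}: {label} ({p:.0%})")
```

  • in this toy case the pair (A, B) is assigned the verb-object type, since 0.9 is the largest component of its vector.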
  • the step of obtaining the statement to be analyzed includes:
  • Step S11 acquiring the statement to be processed, and identifying the background components in the statement to be processed;
  • specifically, the to-be-processed sentence is acquired, and the word position of each to-be-processed word in the to-be-processed sentence is determined, wherein the word positions include sentence-prefix positions, mid-sentence positions and sentence-suffix positions. Based on the word position of each to-be-processed word, the corresponding preset position background word set is matched to that word, and the background words among the to-be-processed words are then determined based on these sets.
  • the preset position background word sets include a prefix-position background word set, a mid-sentence-position background word set and a suffix-position background word set. The prefix-position set includes background words such as "there is also that", "there is that", "for example", "I want to know", "I know", "I don't know", "I want to ask" and "xia"; the mid-sentence-position set includes background words such as "generally", "wait a minute", "that" and "trouble under"; and the suffix-position set includes "what", "do you
  • the step of determining the background words among the to-be-processed words based on the preset position background word set corresponding to each word includes performing the following for each to-be-processed word:
  • the word is compared with each preset background word in its corresponding preset position background word set to determine whether the set contains a preset background word identical to the word. If so, the word is taken as a background word; if not, the word is not a background word.
  • Step S12 removing the background component from the sentence to be processed to obtain the sentence to be analyzed.
  • specifically, the background components, which interfere with the sentence stem extraction process, are removed from the to-be-processed sentence, and the part of the to-be-processed sentence other than the background components is used as the to-be-analyzed sentence, so as to improve the accuracy of sentence stem extraction.
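  • steps S11 and S12 can be sketched as follows. This is a minimal illustration, not the application's implementation: the abbreviated word sets, the rule that the first word is the prefix position and the last word is the suffix position, and the names `BACKGROUND_WORDS`, `word_position` and `remove_background` are all assumptions made for the example.

```python
# Hypothetical, abbreviated background word sets keyed by word position.
BACKGROUND_WORDS = {
    "prefix": {"for example", "I want to know", "I want to ask"},
    "mid": {"generally", "wait a minute"},
    "suffix": {"what"},
}


def word_position(index, length):
    """Assumed rule: first word is the prefix, last word the suffix, the rest mid-sentence."""
    if index == 0:
        return "prefix"
    if index == length - 1:
        return "suffix"
    return "mid"


def remove_background(words):
    """Drop every word that appears in the background set matched to its own position."""
    kept = []
    for i, word in enumerate(words):
        position = word_position(i, len(words))
        if word not in BACKGROUND_WORDS[position]:
            kept.append(word)
    return kept


to_be_processed = ["I want to ask", "how", "generally", "repay", "loan"]
to_be_analyzed = remove_background(to_be_processed)
print(to_be_analyzed)
```

  • here "I want to ask" is dropped as a prefix-position background word and "generally" as a mid-sentence one, leaving the words "how", "repay" and "loan" as the to-be-analyzed sentence.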
  • Step S20 based on the dependency syntax analysis result, extract the target sentence trunk corresponding to the to-be-analyzed sentence, so as to perform semantic recognition on the to-be-analyzed sentence.
  • the dependency syntax analysis result includes a dependency type prediction result, wherein the dependency type prediction result is the dependency type between words in the to-be-analyzed sentence, and the dependency types include subject-predicate relation types, verb-object relation types, and verb-complement structure types.
  • based on the dependency syntax analysis result, the target sentence stem corresponding to the to-be-analyzed sentence is extracted to perform semantic recognition on the to-be-analyzed sentence. Specifically, the word component of each to-be-analyzed word is determined based on the dependency type prediction result, and the sentence pattern information of the to-be-analyzed sentence is then determined based on the part of speech and the word component of each to-be-analyzed word. The part of speech is a property of the word itself (for example, verb, noun, or quantifier), and the word component is the role of the word in the to-be-analyzed sentence (for example, subject, predicate, or object). Then, based on the sentence pattern information, the target core word is selected in the to-be-analyzed sentence; based on the dependency type prediction result, each target sentence stem word associated with the target core word is selected; and the target core word and the target sentence stem words together form the target sentence stem.
  • the dependency syntax analysis result includes a dependency relationship type prediction result
  • the step of extracting the trunk of the target sentence corresponding to the sentence to be analyzed based on the result of the dependency syntax analysis includes:
  • Step S21 determining sentence pattern information corresponding to the to-be-analyzed sentence based on the dependency type prediction result
  • specifically, the word component of each to-be-analyzed word is determined based on the dependency types between the to-be-analyzed words in the to-be-analyzed sentence, and the part of speech of each to-be-analyzed word is obtained. Sentence pattern discrimination is then performed on the to-be-analyzed sentence based on the part of speech and word component of each to-be-analyzed word, to determine the sentence pattern information of the to-be-analyzed sentence.
  • the sentence pattern information is identification information of the sentence pattern of the to-be-analyzed sentence. The sentence patterns include verb-predicate sentences, noun-predicate sentences, adjective-predicate sentences, prepositional-predicate sentences, linked (serial-verb) sentences, double-object sentences, comparison sentences, passive ("by") sentences, "ba" sentences and concurrent sentences, etc., wherein:
  • the verb-predicate sentence is a sentence in which a verb serves as the predicate;
  • the noun-predicate sentence is a sentence in which a noun serves as the predicate;
  • the adjective-predicate sentence is a sentence in which an adjective serves as the predicate;
  • the prepositional-predicate sentence is a sentence in which a preposition serves as the predicate;
  • the linked sentence is a sentence in which two consecutive verbs that do not govern each other jointly serve as the predicate;
  • the double-object sentence is a sentence in which the dependency types corresponding to the predicate include both an indirect-object relation type and a verb-object relation type;
  • the comparison sentence is a sentence in which the adverbial of the predicate includes an adverbial beginning with "compare";
  • the passive sentence is a sentence in which the adverbial of the predicate includes an adverbial beginning with "by";
  • the "ba" sentence is a sentence in which the adverbial of the predicate includes an adverbial beginning with "ba";
  • the concurrent sentence is a sentence in which the dependency type corresponding to the object is the concurrent type.
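  • a few of the sentence pattern rules above can be sketched as a small rule-based classifier. This is an illustrative sketch under stated assumptions: each token is a tuple (word, part of speech, head index, relation); the tags "v", "n", "a", "HED", "SBV" and "VOB" follow the tags named in this document, while "IOB" for the indirect-object relation and the English example words are hypothetical.

```python
# Part-of-speech tag of the core predicate -> sentence pattern (only three patterns covered).
PREDICATE_PATTERNS = {
    "v": "verb-predicate sentence",
    "n": "noun-predicate sentence",
    "a": "adjective-predicate sentence",
}


def sentence_pattern(tokens):
    """tokens: list of (word, pos, head_index, relation); the core predicate carries 'HED'."""
    core = next(i for i, t in enumerate(tokens) if t[3] == "HED")
    # Relations of the words that depend directly on the core predicate.
    child_relations = {t[3] for t in tokens if t[2] == core}
    # Double-object rule: the predicate has both an indirect object and a verb-object child.
    if {"IOB", "VOB"} <= child_relations:
        return "double-object sentence"
    return PREDICATE_PATTERNS.get(tokens[core][1], "other")


gave = [("he", "r", 1, "SBV"), ("gave", "v", -1, "HED"),
        ("me", "r", 1, "IOB"), ("a book", "n", 1, "VOB")]
red = [("the flower", "n", 1, "SBV"), ("red", "a", -1, "HED")]
print(sentence_pattern(gave))
print(sentence_pattern(red))
```

  • the double-object rule is checked before the part-of-speech rules because a double-object sentence also has a verb as its predicate.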
  • Step S22 extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result.
  • specifically, based on the sentence pattern information, the target core word is selected from the to-be-analyzed words of the to-be-analyzed sentence, wherein the target core word is the to-be-analyzed word serving as the core predicate. For example, if the sentence pattern information indicates that the to-be-analyzed sentence is a verb-predicate sentence, the verb serving as the predicate of the to-be-analyzed sentence is the target core word.
  • then, based on the preset word component priority and the dependency types between words in the to-be-analyzed sentence, the target sentence stem words corresponding to the target core word are selected, and the dependency syntax vector corresponding to the target core word and the target sentence stem words is used as the target sentence stem, wherein the preset word component priority specifies the word components that are extracted preferentially when extracting the sentence stem.
  • for example, based on the subject-predicate relation type among the dependency types, the subject corresponding to the target core word is determined as a target sentence stem word; based on the verb-object relation type among the dependency types, the object corresponding to the target core word is determined as a target sentence stem word; the target sentence stem is then composed of the subject, the predicate and the object.
  • the step of extracting the stem of the target sentence based on the sentence pattern information and the dependency type prediction result includes:
  • a dependency tree corresponding to the to-be-analyzed sentence is generated, and the target sentence stem is then extracted based on the preset word component priority and the dependency tree, wherein the preset word component priority is the priority of the word components extracted in the sentence stem extraction process.
  • in one implementation, FIG. 2 is a schematic diagram of the dependency tree corresponding to the to-be-analyzed sentence, in which ROOT denotes the root node of the tree; m, a, u, r, nt, d, v, q and n are part-of-speech tags; and ADV, RAD, ATT, HED, SBV, CMP and VOB are dependency type tags.
  • the step of extracting the backbone of the target sentence based on the sentence pattern information and the dependency type prediction result includes:
  • Step S221 based on the sentence pattern information, determine the target core word corresponding to the sentence to be analyzed;
  • specifically, the core predicate corresponding to the to-be-analyzed sentence is determined based on the sentence pattern information, wherein the core predicate is the predicate that determines the sentence pattern information of the to-be-analyzed sentence, and the to-be-analyzed word corresponding to the core predicate is used as the target core word.
  • Step S222 determining each target sentence stem word corresponding to the target core word based on the preset word component priority and the dependency type prediction result;
  • the preset word component priority is the priority of the word components extracted in the sentence stem extraction process, and includes a preset first priority, a preset second priority and a preset third priority. In one implementation, the word components corresponding to the preset first priority include the subject, predicate, object, adverbial and complement; the word components corresponding to the preset second priority include the attributive; and the word components corresponding to the preset third priority are the word components other than those corresponding to the preset first priority and the preset second priority.
  • specifically, based on the dependency types between words in the to-be-analyzed sentence, the word component of each candidate word corresponding to the target core word is determined. Priority word components are then selected from these word components based on the preset word component priority, and the to-be-analyzed words corresponding to the priority word components are used as the target sentence stem words. It should be noted that the first level of the preset word component priority is the preset first priority, the second level is the preset second priority, and the third level is the preset third priority.
  • Step S223 Generate the target sentence stem based on the sentence pattern information, the target core word and each target sentence stem word.
  • specifically, the word component and part of speech of the target core word, and the word components and parts of speech of the target sentence stem words, are obtained. The target sentence stem is then generated based on the sentence pattern information, the target core word with its part of speech and word component, and the target sentence stem words with their parts of speech and word components.
  • in one implementation, suppose the to-be-analyzed sentence is "very disappointed, he later again took the Ministry of Finance civil servant exam several times"; then the target sentence stem is {sentence pattern: subject-verb-object, subject: [he, ATT: [very disappointed]], predicate: [kao, RAD: [de]], prepositional object: [], object: [civil servant, ATT: [Ministry of Finance]], COO: [], CMP: [several times], ADV: [later, again]}, where ATT is the label of the attributive-head relation type, RAD is the label of the right-adjunct relation type, COO is the label of the coordinate relation type, CMP is the label of the verb-complement structure type, and ADV is the label of the adverbial-head structure type.
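  • steps S221 to S223 can be sketched on a hand-written parse of the example above. This is a minimal sketch under stated assumptions: the parse (head indices and relations) is written by hand for illustration rather than produced by a parser, each token is a tuple (word, head index, relation), and the function name `extract_stem` and the output key format such as "SBV.ATT" are hypothetical.

```python
# Relation labels as named in this document.
FIRST_PRIORITY = {"SBV", "VOB", "ADV", "CMP"}  # subject, object, adverbial, complement
SECOND_PRIORITY = {"ATT"}                       # attributive


def extract_stem(tokens, include_attributes=False):
    """Collect the core predicate and its first-priority dependents; optionally their attributes."""
    # Step S221: the word labeled HED is taken as the target core word.
    core = next(i for i, (_, _, rel) in enumerate(tokens) if rel == "HED")
    stem = {"predicate": [tokens[core][0]]}
    selected = {}
    # Step S222: keep direct dependents of the core word with a first-priority relation.
    for i, (word, head, rel) in enumerate(tokens):
        if head == core and rel in FIRST_PRIORITY:
            stem.setdefault(rel, []).append(word)
            selected[i] = rel
    # Second priority: attributes attached to the already selected stem words.
    if include_attributes:
        for word, head, rel in tokens:
            if rel in SECOND_PRIORITY and head in selected:
                stem.setdefault(f"{selected[head]}.ATT", []).append(word)
    return stem


# Hand-written, simplified parse of the example sentence (an assumption).
tokens = [
    ("very disappointed", 1, "ATT"),
    ("he", 4, "SBV"),
    ("later", 4, "ADV"),
    ("again", 4, "ADV"),
    ("kao", -1, "HED"),
    ("several times", 4, "CMP"),
    ("Ministry of Finance", 7, "ATT"),
    ("civil servant", 4, "VOB"),
]
print(extract_stem(tokens))
print(extract_stem(tokens, include_attributes=True))
```

  • with first-priority components only, the stem keeps the subject "he", the predicate "kao", the object "civil servant", the adverbials and the complement; enabling the second priority additionally attaches "very disappointed" to the subject and "Ministry of Finance" to the object.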
  • This embodiment provides a method, device, and readable storage medium for extracting sentence stems based on dependency syntax. Compared with the related-art approach of performing semantic recognition on sentences based on their word frequency information, this embodiment, after obtaining the to-be-analyzed sentence, performs dependency syntax analysis on it to obtain the corresponding dependency syntax analysis result, and then extracts the target sentence stem corresponding to the to-be-analyzed sentence based on that result, thereby extracting the target sentence stem by means of dependency syntax analysis.
  • it should be noted that the part of the to-be-analyzed sentence other than the target sentence stem contributes little to semantic recognition and interferes with it. Removing this non-stem part by means of dependency syntax analysis makes the target sentence stem closer to the real semantics of the to-be-analyzed sentence, so performing semantic recognition on the to-be-analyzed sentence based on the target sentence stem can improve the accuracy of sentence semantic recognition.
  • in addition, because the target sentence stem is more concise than the to-be-analyzed sentence, the amount of input data for semantic recognition is smaller, which reduces the amount of computation and improves the computational efficiency of semantic recognition.
  • this overcomes the technical defect of the related art, in which semantic recognition based on the word information of sentences has difficulty accurately identifying sentences that have the same semantics but different word information, resulting in low sentence semantic recognition accuracy.
  • the dependency syntax analysis result includes a dependency relationship type prediction result
  • the step of performing dependency syntax analysis on the to-be-analyzed statement, and obtaining the result of the dependency syntax analysis includes:
  • Step A10 vectorizing the statement to be analyzed to obtain a vectorized statement
  • specifically, the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector and the to-be-analyzed word position vector corresponding to each to-be-analyzed word in the to-be-analyzed sentence are generated, wherein the to-be-analyzed word vector is an encoding vector that uniquely represents the to-be-analyzed word, the to-be-analyzed part-of-speech vector is an encoding vector representing the part of speech of the to-be-analyzed word, and the to-be-analyzed word position vector is an encoding vector representing the position of the to-be-analyzed word in the to-be-analyzed sentence. A vectorized word corresponding to each to-be-analyzed word is then generated from its three vectors, and the matrix formed by the vectorized words is used as the vectorized sentence.
  • the to-be-analyzed sentence includes at least one to-be-analyzed word
  • the vectorized sentence includes at least one vectorized word
  • the step of vectorizing the to-be-analyzed statement to obtain the vectorized statement includes:
  • Step A11 obtaining the word vector to be analyzed corresponding to the word to be analyzed, the corresponding part of speech vector to be analyzed, and the corresponding word position vector to be analyzed;
  • specifically, the to-be-analyzed word is mapped to a preset vector space to obtain the to-be-analyzed word vector corresponding to the to-be-analyzed word, the corresponding to-be-analyzed part-of-speech vector is matched to the to-be-analyzed word, and the to-be-analyzed word position vector corresponding to the to-be-analyzed word is generated based on the position of the to-be-analyzed word in the to-be-analyzed sentence.
  • Step A12 Generate the vectorized word based on the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector, and the to-be-analyzed word position vector.
  • specifically, the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector and the to-be-analyzed word position vector are input into a preset vectorized word calculation formula to obtain the vectorized word, wherein the preset vectorized word calculation formula is as follows:
  • $X_i = E_w \oplus E_t \oplus E_p$
  • where $X_i$ is the vectorized word, $E_w$ is the to-be-analyzed word vector, $E_t$ is the to-be-analyzed part-of-speech vector, $E_p$ is the to-be-analyzed word position vector, and $\oplus$ is the concatenation operation between vectors.
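  • the concatenation in steps A11 and A12 can be illustrated with plain Python lists. The vector values, the dimensions and the function names `vectorize_word` and `vectorize_sentence` are made up for this sketch; in practice the three vectors would come from learned embeddings.

```python
def vectorize_word(e_w, e_t, e_p):
    """Concatenate the word, part-of-speech and position vectors into one vectorized word."""
    return list(e_w) + list(e_t) + list(e_p)


def vectorize_sentence(word_features):
    """Stack the vectorized words row by row to form the vectorized sentence matrix."""
    return [vectorize_word(e_w, e_t, e_p) for e_w, e_t, e_p in word_features]


# Toy inputs: (word vector, part-of-speech vector, position vector) per word.
sentence_features = [
    ([0.2, 0.7], [1.0, 0.0], [0.0]),
    ([0.5, 0.1], [0.0, 1.0], [1.0]),
]
vectorized_sentence = vectorize_sentence(sentence_features)
print(vectorized_sentence)
```

  • each row of the resulting matrix has length 2 + 2 + 1 = 5, the sum of the three vector dimensions.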
  • Step A20 based on a preset dependency relation discrimination model, perform dependency relation discrimination on the vectorized statement, and obtain a dependency relation discrimination result;
  • the preset dependency syntax model includes the preset dependency relation discrimination model, which is a machine learning model used for determining whether a dependency relation exists between words in the to-be-analyzed sentence.
  • specifically, the vectorized sentence is input into the preset dependency relation discrimination model, and dependency relation discrimination is performed on the vectorized sentence to determine whether dependency relations exist between the words of the to-be-analyzed sentence, thereby obtaining the dependency relation discrimination result.
  • the preset dependency relationship discrimination model includes a first feature extraction model, a first fully connected network, a second fully connected network and a first double affine transformation network,
  • the step of performing dependency discrimination on the vectorized statement based on the preset dependency discrimination model, and obtaining a dependency discrimination result includes:
  • Step A21 based on the first feature extraction model, perform feature extraction on the vectorized statement to obtain a first feature extraction result
  • the first feature extraction model is a neural network that performs feature extraction on the vectorized sentence, and the first feature extraction model includes a Transformer model, an RNN network, and a CNN network.
  • specifically, the vectorized sentence is input into the first feature extraction model, and feature extraction is performed on the vectorized sentence to obtain a first feature extraction matrix, which is used as the first feature extraction result.
  • Step A22: based on the first fully connected network and the second fully connected network, fully connect the first feature extraction result respectively to obtain a first sentence vector and a second sentence vector.
  • Specifically, the first feature extraction matrix is input into the first fully connected network and fully connected to obtain the first sentence vector, and the first feature extraction matrix is input into the second fully connected network and fully connected to obtain the second sentence vector.
  • It should be noted that the first sentence vector includes at least a prefix vector, which is the representation vector of the word being depended upon in a dependency relationship, and the second sentence vector includes at least a word tail vector, which is the representation vector of the dependent word in the dependency relationship. For example, assuming that word A depends on word B, the word representation vector corresponding to word B is the prefix vector, and the word representation vector corresponding to word A is the word tail vector.
  • Step A23: based on the first double affine transformation network, perform a double affine transformation on the first sentence vector and the second sentence vector to obtain a dependency score matrix.
  • Specifically, the first sentence vector and the second sentence vector are input into the first double affine transformation network, and a double affine transformation is performed on them to calculate the probability score that a dependency relationship exists between each prefix vector in the first sentence vector and each word tail vector in the second sentence vector, yielding the dependency score matrix. The dependency score matrix is the matrix composed of these probability scores, one for each pair of prefix vector and word tail vector.
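Steps A21 to A23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the feature matrix stands in for the output of the first feature extraction model, each fully connected network is assumed to be a single ReLU layer, the double affine (often called biaffine) transformation is assumed to be a bilinear form plus a head-side bias, and all names, sizes and random parameters are hypothetical.

```python
import numpy as np

def dependency_scores(features, w_head, w_dep, u, b_head):
    """Return probs[i, j]: probability score that word j depends on word i."""
    # Step A22: two separate fully connected layers (assumed: one ReLU layer each).
    head = np.maximum(features @ w_head, 0.0)  # "prefix" vectors, shape (n, k)
    dep = np.maximum(features @ w_dep, 0.0)    # "word tail" vectors, shape (n, k)
    # Step A23: bilinear term plus a bias that depends only on the head word.
    scores = head @ u @ dep.T + (head @ b_head)[:, None]  # shape (n, n)
    # Normalize over candidate heads so that each column sums to 1.
    scores = scores - scores.max(axis=0, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_words, feat_dim, hidden = 4, 8, 5              # hypothetical sizes
features = rng.normal(size=(n_words, feat_dim))  # stand-in for the step A21 output
probs = dependency_scores(
    features,
    rng.normal(size=(feat_dim, hidden)),  # first fully connected network
    rng.normal(size=(feat_dim, hidden)),  # second fully connected network
    rng.normal(size=(hidden, hidden)),    # bilinear weight
    rng.normal(size=hidden),              # head-side bias weight
)
print(probs.shape)  # (4, 4)
```

Normalizing over the head axis is one possible way to turn raw scores into the "probability scores" the text mentions; the application does not say which normalization is used.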
  • Step A24: determine the dependency discrimination result based on the dependency score matrix.
  • Specifically, based on a preset maximum spanning tree algorithm, the target probability scores whose sum is maximal and which satisfy a preset score selection condition are selected from the dependency score matrix. The dependency vector, composed of the vectorized words of the dependency relationships corresponding to that maximum probability score sum together with the corresponding target probability scores, is used as the dependency discrimination result. The preset score selection condition includes, among others, that the words to be analyzed corresponding to the target probability scores are in one-to-one correspondence with the words to be analyzed in the sentence to be analyzed.
  • For example, suppose the target probability scores are A and B, where the target probability score A is the probability score that word b is attached to word a, and the target probability score B is the probability score that word c is attached to word b. The vectorized word corresponding to word a is vector X, the vectorized word corresponding to word b is vector Y, and the vectorized word corresponding to word c is vector Z. The dependency vector is then the vector (X, 1, 0, 0, 1, Y, 1, 0, 0, 1, Z), where (1, 0, 0, 1) indicates that a dependency relationship exists between the words.
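The selection in step A24 can be illustrated with a small sketch. The application names a maximum spanning tree algorithm without giving details, so the code below swaps in an exhaustive search that is only practical for very short sentences: it enumerates every head assignment, keeps those that form a tree, and returns the one with the largest score sum. The virtual ROOT node at index 0, the function names and the score values are assumptions made for illustration.

```python
from itertools import product

def is_tree(heads):
    """heads[d - 1] is the head of word d (1-based); 0 is a virtual ROOT."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:            # walk up towards ROOT
            if node in seen:        # revisiting a word means a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(scores):
    """scores[h][d]: score that word d depends on head h (index 0 is ROOT)."""
    n = len(scores) - 1
    best_total, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if any(h == d for d, h in enumerate(heads, start=1)):
            continue                # a word cannot be its own head
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_total:
            best_total, best_heads = total, heads
    return list(best_heads), best_total

# Words a, b, c are indices 1, 2, 3. Picking the best head for each word
# independently would choose b as the head of a (0.95) and a as the head of b (0.8),
# which is a cycle; the tree constraint rules that out.
scores = [
    [0.0, 0.90, 0.1, 0.1],  # ROOT as head
    [0.0, 0.00, 0.8, 0.1],  # a as head
    [0.0, 0.95, 0.0, 0.7],  # b as head
    [0.0, 0.10, 0.1, 0.0],  # c as head
]
heads, total = best_tree(scores)
print(heads)  # [0, 1, 2]: a attaches to ROOT, b to a, c to b
```

Because every word receives exactly one head, the result also satisfies the one-to-one correspondence between selected scores and words. A production system would use an efficient algorithm such as Chu-Liu/Edmonds instead of enumeration.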
  • Step A30: based on a preset dependency type prediction model and the dependency discrimination result, perform dependency type prediction on the vectorized statement to obtain a dependency type prediction result.
  • The preset dependency syntax model includes a preset dependency type prediction model, which is a machine learning model used to predict the type of the dependency relationship between words in the sentence to be analyzed.
  • For example, suppose a dependency type probability score vector is (A, B, C), whose first position corresponds to the subject-predicate relationship, whose second position corresponds to the verb-object relationship and whose third position corresponds to the juxtaposition relationship. Then A is the probability score that the dependency relationship between the two words corresponding to the vector is a subject-predicate relationship, B is the probability score that it is a verb-object relationship, and C is the probability score that it is a juxtaposition relationship.
  • Then, based on the dependency discrimination result, each target dependency type probability score vector is selected from the dependency type probability score matrix, and the dependency type corresponding to the largest value in each target dependency type probability score vector is taken as the target dependency type. This gives the dependency types between the words of the sentence to be analyzed, that is, the dependency type prediction result.
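The selection described above can be sketched in a few lines. The sketch assumes the dependency type probability score matrix is stored as a mapping from (head word, dependent word) pairs to score vectors and that the dependency discrimination result is a list of such pairs; the container layout, function name and score values are hypothetical.

```python
DEPENDENCY_TYPES = ["subject-predicate", "verb-object", "juxtaposition"]

def predict_types(type_scores, arcs):
    """Keep only the word pairs selected by dependency discrimination and
    label each one with the type that has the largest score."""
    result = {}
    for arc in arcs:
        scores = type_scores[arc]  # the (A, B, C) vector for this word pair
        best = max(range(len(scores)), key=scores.__getitem__)
        result[arc] = DEPENDENCY_TYPES[best]
    return result

type_scores = {
    ("a", "b"): [0.7, 0.2, 0.1],
    ("b", "c"): [0.1, 0.8, 0.1],
    ("a", "c"): [0.3, 0.3, 0.4],  # scored, but not a selected dependency
}
arcs = [("a", "b"), ("b", "c")]
types = predict_types(type_scores, arcs)
print(types)
```

The pair ("a", "c") has a score vector but is ignored, because the dependency discrimination result does not contain it.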
  • The dependency discrimination result includes a dependency vector.
  • The step of performing dependency type prediction on the vectorized statement based on the preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result includes:
  • Step A31: based on the preset dependency type prediction model, perform dependency type prediction on the vectorized statement to obtain a dependency type probability score matrix.
  • The preset dependency type prediction model includes a second feature extraction model, a third fully connected network, a fourth fully connected network and a second double affine transformation network.
  • Specifically, the vectorized statement is input into the second feature extraction model, and feature extraction is performed on it to obtain a second feature extraction matrix. The second feature extraction matrix is input into the third fully connected network and the fourth fully connected network respectively to obtain a corresponding third sentence vector and a corresponding fourth sentence vector. The third sentence vector and the fourth sentence vector are then input into the second double affine transformation network, which performs a double affine transformation on them to obtain the dependency type probability score matrix.
  • Step A32: fuse the dependency type probability score matrix and the dependency vector to obtain the dependency type prediction result.
  • Specifically, based on a preset fusion rule, each dependency type probability score vector in the dependency type probability score matrix is fused with the dependency vector to obtain a dependency type probability vector corresponding to each dependency type probability score vector. The preset fusion rules include weighted averaging, splicing, summation and the like. The value at each position of a dependency type probability vector is the probability of a preset dependency type, and the preset dependency types include the subject-predicate relationship type, the verb-object relationship type, the parallel relationship type and so on.
  • The maximum probability value in each dependency type probability vector is then selected as a target dependency type probability, the dependency type corresponding to each maximum dependency type probability that meets a preset probability selection condition is determined among the target dependency type probabilities, and these dependency types are used as the dependency type prediction result, wherein the preset probability selection condition includes the selected word to be
  • The preset dependency syntax model can be obtained by training based on the following steps:
  • Step B10: acquire training data and a dependency syntax model to be trained, wherein the training data includes a training sentence and a preset dependency type label corresponding to the training sentence.
  • The preset dependency type label is a pre-annotated identifier of the dependency type between words in the training sentence, and the dependency syntax model to be trained is a dependency syntax model that has not yet been trained.
  • Specifically, an annotated dependency syntax analysis data set and the dependency syntax model to be trained are acquired. In addition, a dependency syntax analysis data set is collected and manually annotated to obtain a manually annotated dependency syntax analysis data set. The annotated data set and the manually annotated data set are then combined into a training data set, which expands the number of training samples available to the dependency syntax model to be trained.
  • Step B20: input the training data into the dependency syntax model to be trained to perform dependency syntax analysis on the training sentence and obtain a type training prediction label.
  • The training data includes at least one training sentence.
  • Specifically, the training sentence is vectorized to obtain a vectorized training sentence. Based on the dependency discrimination model in the dependency syntax model to be trained, dependency discrimination is performed on the vectorized training sentence to obtain a training dependency vector, and dependency type prediction is performed on the vectorized training sentence to obtain a training dependency type probability score matrix. The type training prediction label is then determined from the training dependency vector and the training dependency type probability score matrix, wherein the type training prediction label is the identifier of the dependency type corresponding to the training sentence.
  • Step B30: calculate a dependency syntax model error based on the type training prediction label and the preset dependency type label.
  • Specifically, the distance between the type training prediction label and the preset dependency type label is calculated to obtain the dependency syntax model error.
  • Step B40: based on the dependency syntax model error, update the dependency syntax model to be trained until it satisfies a preset update end condition, and use the resulting model as the preset dependency syntax model.
  • Specifically, gradient information is calculated from the dependency syntax model error, and the model parameters of the dependency syntax model to be trained are updated by backpropagation according to the gradient information to obtain an updated dependency syntax model to be trained. It is then determined whether the updated model satisfies the preset update end condition. If so, the updated model is used as the preset dependency syntax model. If not, training sentences are re-acquired and the model parameters are trained and updated again until the preset update end condition is satisfied. The preset update end conditions include reaching the maximum number of iterations and convergence of the loss function.
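The loop formed by steps B20 to B40 can be sketched with a deliberately tiny model. The code below is a hedged illustration, not the model of the application: a single softmax layer stands in for the dependency syntax model to be trained, cross-entropy is assumed as the "distance" between predicted and preset labels, and the feature vectors, labels, learning rate and tolerance are all made up.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(features, labels, num_types, lr=0.5, max_iters=500, tol=1e-6):
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.01, size=(features.shape[1], num_types))
    one_hot = np.eye(num_types)[labels]
    prev_loss = np.inf
    for step in range(1, max_iters + 1):       # end condition 1: max iterations
        probs = softmax(features @ weights)    # B20: predict type labels
        # B30: error between predictions and preset labels (cross-entropy)
        loss = -np.mean(np.sum(one_hot * np.log(probs + 1e-12), axis=1))
        if abs(prev_loss - loss) < tol:        # end condition 2: loss converged
            break
        # B40: gradient of the loss, then a parameter update
        grad = features.T @ (probs - one_hot) / len(labels)
        weights -= lr * grad
        prev_loss = loss
    return weights, loss, step

# Toy data: two dependency types, one indicator feature per type.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1, 0, 1])
weights, loss, steps = train(features, labels, num_types=2)
predicted = np.argmax(features @ weights, axis=1)
print(predicted.tolist())  # [0, 1, 0, 1]
```

With a multi-layer network the gradient would be obtained by backpropagation through all layers; for this single layer it reduces to the closed form used above.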
  • the preset paraphrase sentence recognition model can be trained.
  • Paraphrase sentence training data and a paraphrase sentence recognition model to be trained are obtained, wherein the paraphrase sentence training data includes a first to-be-recognized training sentence, a second to-be-recognized training sentence and a sentence label. The sentence label identifies whether the first to-be-recognized training sentence and the second to-be-recognized training sentence are paraphrases of each other. The paraphrase sentence training data can be represented by a vector.
  • For example, the paraphrase sentence training data is a vector (X1, X2, Y), where X1 is the first to-be-recognized training sentence, X2 is the second to-be-recognized training sentence and Y is the sentence label.
  • The first to-be-recognized training sentence and the second to-be-recognized training sentence are subjected to dependency syntax analysis to obtain a first training dependency vector corresponding to the first to-be-recognized training sentence and a second training dependency vector corresponding to the second to-be-recognized training sentence.
  • The vector representation of the training sentence, the first training dependency vector and the second training dependency vector are aggregated and input into the paraphrase sentence recognition model to be trained to obtain an output paraphrase identification label, and then, based on the output paraphrase identification label and the sentence label, calculate the paraphrase recognition model
  • This embodiment provides a method for dependency syntax analysis based on machine learning.
  • After the sentence to be analyzed is vectorized to obtain a vectorized statement, dependency discrimination is performed on the vectorized statement based on the preset dependency discrimination model to obtain the dependency discrimination result, which achieves the purpose of judging whether a dependency relationship exists between the words of the sentence to be analyzed. Dependency type prediction is then performed on the vectorized statement based on the preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result, which achieves the purpose of predicting the types of the dependency relationships between words in the sentence to be analyzed, with the dependency type being predicted on the basis of the dependency discrimination result.
  • The target sentence stem corresponding to the sentence to be analyzed can then be extracted to perform semantic recognition on the sentence to be analyzed.
  • The part of the sentence to be analyzed outside the target sentence stem contributes little to semantic recognition and interferes with it. Removing this non-stem part on the basis of dependency syntax analysis brings the target sentence stem closer to the real semantics of the sentence to be analyzed, thereby improving the accuracy of sentence semantic recognition.
  • FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • The device for extracting sentence stems based on dependency syntax may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002.
  • The communication bus 1002 is used to realize connection and communication between the processor 1001 and the memory 1005.
  • The memory 1005 may be high-speed RAM memory, or may be non-volatile memory, such as disk memory.
  • The memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The device for extracting sentence stems based on dependency syntax may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like.
  • The rectangular user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard); optionally, the rectangular user interface may also include a standard wired interface and a wireless interface.
  • Optionally, the network interface may include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, and a sentence trunk extraction program based on dependency syntax.
  • the operating system is a program that manages and controls the hardware and software resources of the dependency syntax-based sentence trunk extraction device, and supports the execution of the dependency syntax-based sentence trunk extraction program and other software and/or programs.
  • the network communication module is used to realize the communication between various components in the memory 1005, and communicate with other hardware and software in the sentence trunk extraction system based on the dependency syntax.
  • The processor 1001 is configured to execute the program for extracting sentence stems based on dependency syntax stored in the memory 1005, so as to implement the steps of any one of the above-mentioned methods for extracting sentence stems based on dependency syntax.
  • the specific implementation of the device for extracting sentence stems based on dependency syntax in the present application is basically the same as the above-mentioned embodiments of the method for extracting sentence stems based on dependency syntax, and details are not repeated here.
  • An embodiment of the present application further provides an apparatus for extracting sentence stems based on dependency syntax.
  • The apparatus for extracting sentence stems based on dependency syntax is applied to a device for extracting sentence stems based on dependency syntax.
  • The apparatus for extracting sentence stems based on dependency syntax includes:
  • a dependency syntax analysis module used to obtain a statement to be analyzed, and to perform a dependency syntax analysis on the to-be-analyzed statement to obtain a dependency syntax analysis result;
  • the sentence stem extraction module is used for extracting the target sentence stem corresponding to the to-be-analyzed sentence based on the dependent syntax analysis result, so as to perform semantic recognition on the to-be-analyzed sentence.
  • the sentence stem extraction module includes:
  • a determining unit configured to determine sentence pattern information corresponding to the to-be-analyzed sentence based on the dependency type prediction result
  • An extraction unit configured to extract the stem of the target sentence based on the sentence pattern information and the dependency type prediction result.
  • the extraction unit includes:
  • a first determining subunit configured to determine the target core word corresponding to the sentence to be analyzed based on the sentence pattern information
  • the second determination subunit is used to determine each target sentence stem word corresponding to the target core word based on the preset word component priority and the dependency type prediction result;
  • a first generating subunit configured to generate the target sentence stem based on the sentence pattern information, the target core word and each target sentence stem word.
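The extraction unit above can be illustrated with a small sketch that works on an already parsed sentence. It makes several assumptions that are not in the source: the target core word is taken to be the word without a head, the word component priority is reduced to a fixed tuple of relation names, the sentence pattern information is ignored, and the relation labels ("subject", "object", "adverbial") are hypothetical.

```python
def extract_stem(tokens, heads, relations, priority=("subject", "object")):
    """tokens[i] depends on tokens[heads[i]] with type relations[i];
    the core word is marked with head -1."""
    core = heads.index(-1)                # target core word
    keep = {core}
    for wanted in priority:               # word component priority
        for index, (head, relation) in enumerate(zip(heads, relations)):
            if head == core and relation == wanted:
                keep.add(index)           # first direct dependent of this type
                break
    return [tokens[i] for i in sorted(keep)]  # restore the original word order

tokens = ["I", "secretly", "like", "you"]
heads = [2, 2, -1, 2]
relations = ["subject", "adverbial", "core", "object"]
stem = extract_stem(tokens, heads, relations)
print(stem)  # ['I', 'like', 'you']
```

The adverbial "secretly" is dropped because it is not in the priority list, which mirrors the idea that words outside the stem contribute little to semantic recognition.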
  • the dependency syntax analysis module includes:
  • a vectorization unit for vectorizing the statement to be analyzed to obtain a vectorized statement
  • a dependency discrimination unit configured to perform dependency discrimination on the vectorized statement based on a preset dependency discrimination model, and obtain a dependency discrimination result
  • a dependency type prediction unit configured to perform dependency type prediction on the vectorized statement based on a preset dependency type prediction model and the dependency discrimination result, and obtain the dependency type prediction result.
  • the dependency discrimination unit includes:
  • a feature extraction subunit configured to perform feature extraction on the vectorized statement based on the first feature extraction model to obtain a first feature extraction result
  • the fully-connected subunit is configured to fully connect the first feature extraction result based on the first fully-connected network and the second fully-connected network, respectively, to obtain a first sentence vector and a second sentence vector;
  • a double affine transformation subunit is used to perform double affine transformation on the first sentence vector and the second sentence vector based on the first double affine transformation network to obtain a dependency score matrix
  • the third determination subunit is configured to determine the dependency discrimination result based on the dependency score matrix.
  • the dependency type prediction unit includes:
  • a dependency type prediction subunit configured to perform dependency type prediction on the vectorized statement based on the preset dependency type prediction model, and obtain a dependency type probability score matrix
  • a fusion subunit configured to fuse the dependency type probability score matrix and the dependency relationship vector to obtain the dependency type prediction result.
  • the vectorization unit includes:
  • an acquisition subunit used to acquire the word vector to be analyzed corresponding to the word to be analyzed, the corresponding part of speech vector to be analyzed and the corresponding word position vector to be analyzed;
  • the second generating subunit is configured to generate the vectorized word based on the word vector to be analyzed, the part of speech vector to be analyzed, and the position vector of the word to be analyzed.
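A possible reading of the vectorization unit is sketched below. The source only says that the vectorized word is generated from the word vector, the part-of-speech vector and the word position vector; concatenation is an assumption made here, and the lookup tables, tag names and dimensions are hypothetical and randomly initialized.

```python
import numpy as np

rng = np.random.default_rng(0)
WORD_DIM, POS_DIM, POSITION_DIM, MAX_LEN = 6, 3, 2, 16  # hypothetical sizes

# Stand-ins for learned embedding tables.
word_table = {w: rng.normal(size=WORD_DIM) for w in ["I", "like", "you"]}
pos_table = {p: rng.normal(size=POS_DIM) for p in ["PRON", "VERB"]}
position_table = rng.normal(size=(MAX_LEN, POSITION_DIM))

def vectorize_sentence(words, pos_tags):
    """Build one vectorized word per token by concatenating its three vectors."""
    rows = []
    for index, (word, tag) in enumerate(zip(words, pos_tags)):
        rows.append(np.concatenate(
            [word_table[word], pos_table[tag], position_table[index]]
        ))
    return np.stack(rows)  # the vectorized statement, one row per word

vectorized = vectorize_sentence(["I", "like", "you"], ["PRON", "VERB", "PRON"])
print(vectorized.shape)  # (3, 11)
```

Summing the three vectors, as is common in Transformer inputs, would be an equally plausible choice if they shared one dimension.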
  • the dependency syntax analysis module further includes:
  • an acquisition unit for acquiring a statement to be processed and identifying background components in the statement to be processed
  • a removing unit configured to remove the background component in the to-be-processed sentence to obtain the to-be-analyzed sentence.
  • An embodiment of the present application provides a readable storage medium, and the readable storage medium stores one or more programs, and the one or more programs can also be executed by one or more processors to implement The steps of any one of the above-mentioned methods for extracting sentence stems based on dependency syntax.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A method and apparatus for extracting a sentence main portion on the basis of dependency grammar, and a readable storage medium. The method comprises: acquiring a sentence to be analyzed, and performing dependency grammar analysis of the sentence, so as to obtain a dependency grammar analysis result (S10); and extracting, on the basis of the dependency grammar analysis result, a target sentence main portion corresponding to the sentence, so as to perform semantic recognition on the sentence (S20).

Description

Method, device and readable storage medium for extracting sentence stems based on dependency syntax
This application claims priority to Chinese patent application No. 202010965433.6, filed on September 14, 2020 and entitled "Method, device and readable storage medium for extracting sentence stems based on dependency syntax", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of artificial intelligence in financial technology (Fintech), and in particular to a method, device and readable storage medium for extracting sentence stems based on dependency syntax.
Background
With the continuous development of financial technology, especially Internet-based financial technology, more and more technologies (such as distributed computing, blockchain and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher requirements on these technologies, for example on the distribution of the to-do items arising in the financial industry.
With the continuous development of computer software and artificial intelligence, artificial intelligence is applied in more and more fields. In dialogue systems based on artificial intelligence, the semantics of sentences often need to be recognized correctly. At present, semantic recognition of sentences is usually performed by retrieval methods based on word frequency algorithms, that is, based on the word information of a sentence, where the word information includes word frequency information, word order information and the like. However, some sentences have the same semantics but different word information. For example, the three sentences "I like you", "I have long liked you, who are in the class next door" and "You are secretly liked by me" all express the meaning "I like you", yet their word information differs. Current semantic recognition methods therefore have difficulty accurately recognizing that these sentences share the same semantics, so the accuracy of sentence semantic recognition remains low.
Summary of the Invention
The main purpose of the present application is to provide a method, device and readable storage medium for extracting sentence stems based on dependency syntax, aiming to solve the technical problem of low accuracy of sentence semantic recognition in the related art.
To achieve the above purpose, the present application provides a method for extracting sentence stems based on dependency syntax. The method is applied to a device for extracting sentence stems based on dependency syntax, and includes:
acquiring a sentence to be analyzed, and performing dependency syntax analysis on the sentence to be analyzed to obtain a dependency syntax analysis result;
extracting, based on the dependency syntax analysis result, the target sentence stem corresponding to the sentence to be analyzed, so as to perform semantic recognition on the sentence to be analyzed.
The present application also provides an apparatus for extracting sentence stems based on dependency syntax. The apparatus is a virtual apparatus and is applied to a device for extracting sentence stems based on dependency syntax. The apparatus includes:
a dependency syntax analysis module, used to acquire a sentence to be analyzed and to perform dependency syntax analysis on the sentence to be analyzed to obtain a dependency syntax analysis result;
a sentence stem extraction module, used to extract, based on the dependency syntax analysis result, the target sentence stem corresponding to the sentence to be analyzed, so as to perform semantic recognition on the sentence to be analyzed.
The present application also provides a device for extracting sentence stems based on dependency syntax. The device is a physical device and includes a memory, a processor, and a program of the method for extracting sentence stems based on dependency syntax that is stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the method for extracting sentence stems based on dependency syntax described above.
The present application also provides a readable storage medium storing a program that implements the method for extracting sentence stems based on dependency syntax. When executed by a processor, the program implements the steps of the method for extracting sentence stems based on dependency syntax described above.
The present application provides a method, device and readable storage medium for extracting sentence stems based on dependency syntax. Compared with the related-art approach of performing semantic recognition on a sentence based on its word frequency information, the present application, after acquiring a sentence to be analyzed, performs dependency syntax analysis on it to obtain the corresponding dependency syntax analysis result, and then extracts the target sentence stem corresponding to the sentence to be analyzed based on that result. The target sentence stem is thus extracted by means of dependency syntax analysis. It should be noted that the part of the sentence to be analyzed outside the target sentence stem contributes little to semantic recognition and interferes with it. Removing this non-stem part by means of dependency syntax analysis brings the target sentence stem closer to the real semantics of the sentence to be analyzed, so performing semantic recognition on the basis of the target sentence stem can improve the accuracy of sentence semantic recognition. Moreover, because the target sentence stem is more concise than the sentence to be analyzed, the amount of input data for semantic recognition is smaller, which reduces the amount of computation and improves the computational efficiency of semantic recognition, that is, the efficiency of semantic recognition. This overcomes the technical defect in the related art that, when semantic recognition is performed based on the word information of sentences, sentences with the same semantics but different word information are difficult to recognize as having the same semantics, which leads to low accuracy of sentence semantic recognition.
Description of drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
To describe the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a first embodiment of the dependency-syntax-based sentence stem extraction method of the present application;
FIG. 2 is a schematic diagram of the dependency tree corresponding to the sentence to be analyzed in the dependency-syntax-based sentence stem extraction method of the present application;
FIG. 3 is a schematic flowchart of a second embodiment of the dependency-syntax-based sentence stem extraction method of the present application;
FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the solutions of the embodiments of the present application.
The realization of the objectives, the functional features, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
An embodiment of the present application provides a method for extracting sentence stems based on dependency syntax. In a first embodiment of the method, referring to FIG. 1, the dependency-syntax-based sentence stem extraction method includes:
Step S10: obtaining a sentence to be analyzed, and performing dependency syntax analysis on the sentence to be analyzed to obtain a dependency syntax analysis result;
In this embodiment, it should be noted that the sentence to be analyzed is a preprocessed sentence on which dependency syntax analysis is to be performed, where the purpose of the preprocessing is to remove background components that interfere with dependency syntax analysis. The dependency-syntax-based sentence stem extraction method is applied to a human-computer dialogue system, and the sentence to be analyzed is a preprocessed user reply from a human-computer dialogue. The dependency-syntax-based sentence stem extraction device includes a preset dependency syntax model, which is a pre-trained machine learning model for performing dependency syntax analysis on sentences. Dependency syntax analysis is the process of parsing the syntactic information of a sentence, where the syntactic information includes sentence pattern information and word component information. For example, for the sentence "我是谁" ("Who am I"), dependency syntax analysis yields sentence pattern information indicating that the sentence is a subject-predicate-object sentence, and word component information indicating that "我" ("I") is the subject, "是" ("am") is the predicate, and "谁" ("who") is the object.
To obtain the sentence to be analyzed and perform dependency syntax analysis on it, specifically, a sentence to be processed is obtained, where the sentence to be processed is a sentence collected in the human-computer dialogue system. The sentence to be processed is preprocessed to obtain the sentence to be analyzed, which is then input into the preset dependency syntax model. Dependency discrimination and dependency type prediction are each performed on the sentence to be analyzed, thereby performing dependency syntax analysis on it and obtaining the dependency syntax analysis result. It should be noted that dependency discrimination determines the dependencies between words, and dependency type prediction predicts the type of each dependency. For example, suppose the sentence to be analyzed is "ABC", where A, B, and C are all words in the sentence. After dependency discrimination, it can be determined that B depends on A and C depends on B; after dependency type prediction, it can be determined that the dependency between A and B is a subject-predicate relationship and the dependency between B and C is a verb-object relationship. In one implementable manner, the step of performing dependency discrimination and dependency type prediction on the sentence to be analyzed, so as to perform dependency syntax analysis on it and obtain the dependency syntax analysis result, includes:
Dependency discrimination is performed on the sentence to be analyzed to obtain a dependency discrimination result, and dependency type prediction is performed on the sentence to be analyzed to obtain a dependency type prediction result. The dependency discrimination result and the dependency type prediction result are then fused to obtain the dependency type labels between the words of the sentence to be analyzed, where a dependency type label is the identifier of a dependency type, and the dependency type labels are taken as the dependency syntax analysis result. The dependency type prediction result can be represented as a matrix, namely a dependency type prediction probability matrix. Each element of this matrix is the dependency type label probability prediction vector between one word of the sentence to be analyzed and another word. Each element of that vector is the probability that the dependency between the two words belongs to the preset dependency corresponding to that element, where the preset dependencies include the subject-predicate relationship, the verb-object relationship, and so on. For example, if the dependency type label probability prediction vector between word A and word B is (0.1, 0.9), then 0.1 means that the probability of a subject-predicate relationship between word A and word B is 10%, and 0.9 means that the probability of a verb-object relationship between word A and word B is 90%.
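The fusion step can be sketched as follows. This is a minimal illustration, not the application's implementation: the text does not say how the two results are fused, so the sketch assumes that each arc kept by dependency discrimination simply receives the most probable label from its probability vector. The function name `fuse_results`, the dictionary layout, and the two-entry label list are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed label order: entry 0 is subject-predicate (SBV), entry 1 is verb-object (VOB).
RELATION_LABELS: List[str] = ["SBV", "VOB"]


def fuse_results(
    heads: Dict[str, str],
    type_probs: Dict[Tuple[str, str], List[float]],
) -> Dict[Tuple[str, str], str]:
    """Attach a dependency type label to every arc found by dependency discrimination.

    heads maps each dependent word to the word it depends on.
    type_probs maps a (dependent, head) pair to its label probability vector.
    Picking the arg-max label is an assumption made for this sketch.
    """
    labels: Dict[Tuple[str, str], str] = {}
    for dependent, head in heads.items():
        probs = type_probs[(dependent, head)]
        best_index = max(range(len(probs)), key=lambda i: probs[i])
        labels[(dependent, head)] = RELATION_LABELS[best_index]
    return labels


# The (0.1, 0.9) vector from the text, for an arc between word A and word B.
arc_labels = fuse_results({"B": "A"}, {("B", "A"): [0.1, 0.9]})
print(arc_labels)  # {('B', 'A'): 'VOB'}
```

With the vector (0.1, 0.9), the verb-object label is selected because it carries the 90% probability.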
The step of obtaining the sentence to be analyzed includes:
Step S11: obtaining a sentence to be processed, and identifying the background components in the sentence to be processed;
In this embodiment, to obtain the sentence to be processed and identify its background components, specifically, the sentence to be processed is obtained and the word position of each word to be analyzed in it is determined, where the word positions include a prefix position, a mid-sentence position, and a suffix position. Based on its word position, each word to be analyzed is matched with the corresponding preset position background word set. Based on these sets, the background words among the words to be analyzed are determined and taken as the background components. The preset position background word sets include a prefix position background word set, a mid-sentence position background word set, and a suffix position background word set. The prefix position background word set includes prefix position background words such as "还有就是说" ("and that is to say"), "还有就是" ("and also"), "打个比方说" ("to give an example, say"), "打个比方" ("for example"), "我想知道" ("I want to know"), "我知道" ("I know"), "我不知道" ("I don't know"), "我想了解一下" and "我想了解下" ("I'd like to find out"), "我想问问" ("I'd like to ask"), "我想问" ("I want to ask"), and "我请问下" ("may I ask"). The mid-sentence position background word set includes mid-sentence position background words such as "一般情况下" ("in general"), "等一下" ("wait a moment"), "那个" ("that", used as a filler), and "麻烦下" ("sorry to trouble you"). The suffix position background word set includes suffix position background words such as "什么的" ("and so on"), "你知道吗" ("do you know"), "是这个意思吗" ("is that what it means"), "是吗" ("is it"), and "是不是" ("isn't it").
In addition, it should be noted that, in one implementable manner, the step of determining the background words among the words to be analyzed based on the preset position background word set corresponding to each word to be analyzed includes performing the following for each word to be analyzed:
The word to be analyzed is compared with each preset background word in its corresponding preset position background word set to determine whether the set contains a preset background word identical to the word to be analyzed. If so, the word to be analyzed is taken as a background word; if not, the word to be analyzed is not a background word.
Step S12: removing the background components from the sentence to be processed to obtain the sentence to be analyzed.
In this embodiment, to remove the background components from the sentence to be processed and obtain the sentence to be analyzed, specifically, the background components, which interfere with the sentence stem extraction process, are removed from the sentence to be processed, and the remainder of the sentence to be processed is taken as the sentence to be analyzed, so as to improve the accuracy of sentence stem extraction.
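Steps S11 and S12 can be sketched as follows. This is a minimal illustration under stated assumptions: the sentence is assumed to be already segmented into words, the first word is treated as the prefix position and the last word as the suffix position (the text does not define how positions are assigned), and only a few of the listed background words are included. The names `word_position` and `remove_background` are hypothetical.

```python
from typing import List

# Small excerpts of the three preset position background word sets.
PREFIX_WORDS = {"我想知道", "我想问", "打个比方"}
MIDDLE_WORDS = {"一般情况下", "等一下", "那个"}
SUFFIX_WORDS = {"什么的", "是吗", "是不是"}

BACKGROUND_SETS = {
    "prefix": PREFIX_WORDS,
    "middle": MIDDLE_WORDS,
    "suffix": SUFFIX_WORDS,
}


def word_position(index: int, length: int) -> str:
    """Assumed rule: first word is the prefix, last word is the suffix, the rest are mid-sentence."""
    if index == 0:
        return "prefix"
    if index == length - 1:
        return "suffix"
    return "middle"


def remove_background(words: List[str]) -> List[str]:
    """Drop every word found in the background word set that matches its position."""
    kept = []
    for index, word in enumerate(words):
        position = word_position(index, len(words))
        if word in BACKGROUND_SETS[position]:
            continue  # the word is a background word for this position
        kept.append(word)
    return kept


# Hypothetical segmented user reply: "I want to ask / this / loan / that / how / repay / is it".
to_process = ["我想问", "这个", "贷款", "那个", "怎么", "还", "是吗"]
print(remove_background(to_process))  # ['这个', '贷款', '怎么', '还']
```

Keeping one set per position means a word such as "那个" is only removed when it appears mid-sentence as a filler, not when it opens or closes the sentence.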
Step S20: based on the dependency syntax analysis result, extracting the target sentence stem corresponding to the sentence to be analyzed, so as to perform semantic recognition on the sentence to be analyzed.
In this embodiment, it should be noted that the dependency syntax analysis result includes a dependency type prediction result, where the dependency type prediction result gives the dependency types between the words of the sentence to be analyzed, and the dependency types include the subject-predicate relationship type, the verb-object relationship type, the verb-complement structure type, and so on.
To extract the target sentence stem based on the dependency syntax analysis result, specifically, the word component of each word to be analyzed is determined based on the dependency type prediction result, and the sentence pattern information of the sentence to be analyzed is then determined based on the part of speech and the word component of each word to be analyzed. The part of speech is an inherent property of a word, for example verb, noun, or measure word; the word component is the role the word plays in the sentence to be analyzed, for example subject, predicate, or object. Based on the sentence pattern information, a target core word is selected in the sentence to be analyzed. Based on the dependency type prediction result, the target sentence stem words associated with the target core word are then selected in the sentence to be analyzed, and the target core word and the target sentence stem words together form the target sentence stem. The target sentence stem is then used as the input of a preset sentence semantic recognition model to perform semantic recognition on the sentence to be analyzed.
The dependency syntax analysis result includes a dependency type prediction result,
and the step of extracting the target sentence stem corresponding to the sentence to be analyzed based on the dependency syntax analysis result includes:
Step S21: determining the sentence pattern information corresponding to the sentence to be analyzed based on the dependency type prediction result;
In this embodiment, to determine the sentence pattern information based on the dependency type prediction result, specifically, the word component of each word to be analyzed is determined based on the dependency types between the words of the sentence to be analyzed, and the part of speech of each word to be analyzed is obtained. Sentence pattern discrimination is then performed on the sentence to be analyzed based on the parts of speech and word components, which determines its sentence pattern information, where the sentence pattern information is the identifier of the sentence pattern of the sentence to be analyzed. The sentence patterns include the verb-predicate sentence, noun-predicate sentence, adjective-predicate sentence, preposition-predicate sentence, serial-verb sentence, double-object sentence, "比" (bi) sentence, "被" (bei) sentence, "把" (ba) sentence, pivotal sentence, and so on. A verb-predicate sentence is a sentence whose predicate is a verb; a noun-predicate sentence is a sentence whose predicate is a noun; an adjective-predicate sentence is a sentence whose predicate is an adjective; a preposition-predicate sentence is a sentence whose predicate is a preposition; a serial-verb sentence is a sentence in which two consecutive verbs that do not govern each other both serve as predicates; a double-object sentence is a sentence in which the dependency types of the predicate include both the indirect-object relationship type and the verb-object relationship type; a "比" sentence is a sentence in which the adverbials of the predicate include one beginning with "比"; a "被" sentence is a sentence in which the adverbials of the predicate include one beginning with "被"; a "把" sentence is a sentence in which the adverbials of the predicate include one beginning with "把"; and a pivotal sentence is a sentence in which the dependency type of the object is the pivotal type.
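The sentence pattern definitions above can be sketched as a small rule-based classifier. This is an illustrative subset only: the order in which the rules are checked, the part-of-speech tag "p" for prepositions, and the label names "IOB" (indirect object) and "DBL" (pivotal) are assumptions that do not appear in the text, and the serial-verb rule is omitted. The function name `classify_sentence_pattern` is hypothetical.

```python
from typing import List, Optional, Set

# Assumed mapping from the predicate's part-of-speech tag to a sentence pattern.
POS_TO_PATTERN = {
    "v": "verb-predicate sentence",
    "n": "noun-predicate sentence",
    "a": "adjective-predicate sentence",
    "p": "preposition-predicate sentence",
}


def classify_sentence_pattern(
    predicate_pos: str,
    predicate_relations: Set[str],
    adverbials: List[str],
    object_relation: Optional[str] = None,
) -> str:
    """Return a sentence pattern name from features of the core predicate.

    predicate_relations holds the dependency labels attached to the predicate.
    adverbials holds the adverbial phrases that modify the predicate.
    object_relation is the dependency label of the object, if known.
    """
    first_chars = {phrase[0] for phrase in adverbials if phrase}
    if "比" in first_chars:
        return "bi-sentence"
    if "被" in first_chars:
        return "bei-sentence"
    if "把" in first_chars:
        return "ba-sentence"
    if {"IOB", "VOB"} <= predicate_relations:
        return "double-object sentence"
    if object_relation == "DBL":
        return "pivotal sentence"
    return POS_TO_PATTERN.get(predicate_pos, "unknown")


print(classify_sentence_pattern("v", {"SBV", "VOB"}, ["后来", "又"]))
```

Checking the adverbial-based patterns first is a design choice for this sketch; a real system would need a defined precedence when several rules match the same sentence.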
Step S22: extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result.
In this embodiment, to extract the target sentence stem based on the sentence pattern information and the dependency type prediction result, specifically, a target core word is selected from the words to be analyzed based on the sentence pattern information, where the target core word is the word to be analyzed that serves as the core predicate. For example, if the sentence pattern information indicates that the sentence to be analyzed is a verb-predicate sentence, the verb serving as its predicate is the target core word. Based on the preset word component priority and the dependency types between the words of the sentence to be analyzed, the target sentence stem words corresponding to the target core word are then selected, and the dependency syntax vector jointly corresponding to the target core word and the target sentence stem words is taken as the target sentence stem. The preset word component priority specifies which word components are extracted preferentially when the sentence stem is extracted. It should be noted that, to keep the semantics of the target sentence stem clear and the stem concise, the number of word components to extract needs to be set in advance. For example, suppose the preset word components to extract are the subject, the predicate, and the object, and the predicate is a verb. Then, based on the subject-predicate relationship type among the dependency types, the subject corresponding to the target core word is determined as a target sentence stem word, and based on the verb-object relationship type among the dependency types, the object corresponding to the target core word is determined as a target sentence stem word, so that the target sentence stem consists of the subject, the predicate, and the object.
In another implementation, the step of extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result includes:
Based on the sentence pattern information, the dependency types between the words of the sentence to be analyzed, and the part of speech of each word to be analyzed, a dependency tree corresponding to the sentence to be analyzed is generated, and the target sentence stem is then extracted based on the preset word component priority and the dependency tree, where the preset word component priority is the priority with which word components are extracted during sentence stem extraction. In one implementable manner, the sentence to be analyzed is "十分失望的他后来又考了几次财政厅的公务员" (roughly, "Deeply disappointed, he later took the Department of Finance civil service exam several more times"), and FIG. 2 is a schematic diagram of the corresponding dependency tree, in which ROOT represents the sentence to be analyzed; m, a, u, r, nt, d, v, q, and n are part-of-speech labels; and ADV, RAD, ATT, HED, SBV, CMP, and VOB are dependency type labels.
The step of extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result includes:
Step S221: determining the target core word corresponding to the sentence to be analyzed based on the sentence pattern information;
In this embodiment, to determine the target core word based on the sentence pattern information, specifically, the core predicate of the sentence to be analyzed is determined based on the sentence pattern information, where the core predicate is the predicate that determines the sentence pattern information of the sentence to be analyzed, and the word to be analyzed that corresponds to the core predicate is taken as the target core word.
Step S222: determining the target sentence stem words corresponding to the target core word based on the preset word component priority and the dependency type prediction result;
In this embodiment, it should be noted that the preset word component priority is the priority with which word components are extracted during sentence stem extraction, and it includes a preset first priority, a preset second priority, and a preset third priority. In one implementable manner, the word components of the preset first priority include the subject, predicate, object, adverbial, and complement; the word components of the preset second priority include the attributive; and the word components of the preset third priority include all word components not covered by the preset first priority or the preset second priority.
To determine the target sentence stem words based on the preset word component priority and the dependency type prediction result, specifically, the word component of each candidate word corresponding to the target core word is determined based on the dependency types between the words of the sentence to be analyzed. Based on the number of levels of the preset word component priority, priority word components are then selected from the word components of the candidate words, and the words to be analyzed that correspond to the priority word components are taken as the target sentence stem words. It should be noted that the first level of the preset word component priority is the preset first priority, the second level is the preset second priority, and the third level is the preset third priority.
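The priority-based selection of step S222 can be sketched as follows. This is a minimal illustration, assuming that "the number of levels" means the highest priority level that is still kept; the component names, the `select_stem_words` function, and the phrase-level candidate list are hypothetical and only loosely follow the example sentence used in this embodiment.

```python
from typing import List, Tuple

# Preset word component priority as described in the text:
# level 1 holds the core components, level 2 the attributive, level 3 everything else.
COMPONENT_LEVEL = {
    "subject": 1,
    "predicate": 1,
    "object": 1,
    "adverbial": 1,
    "complement": 1,
    "attributive": 2,
}


def priority_level(component: str) -> int:
    """Components not listed in the first two levels fall into level 3."""
    return COMPONENT_LEVEL.get(component, 3)


def select_stem_words(candidates: List[Tuple[str, str]], max_level: int) -> List[str]:
    """Keep the candidate words whose component level does not exceed max_level."""
    return [word for word, component in candidates if priority_level(component) <= max_level]


# Candidate words around the core word "考", paired with assumed word components.
candidates = [
    ("十分失望的", "attributive"),
    ("他", "subject"),
    ("后来", "adverbial"),
    ("又", "adverbial"),
    ("了", "right adjunct"),
    ("几次", "complement"),
    ("财政厅的", "attributive"),
    ("公务员", "object"),
]

print(select_stem_words(candidates, max_level=1))
# ['他', '后来', '又', '几次', '公务员']
```

Raising `max_level` to 2 would additionally keep the two attributives, trading conciseness for detail.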
Step S223: generating the target sentence stem based on the sentence pattern information, the target core word, and the target sentence stem words.
In this embodiment, to generate the target sentence stem, specifically, the word component and part of speech of the target core word and of each target sentence stem word are obtained, and the target sentence stem is generated based on the sentence pattern information, the target core word with its part of speech and word component, and the target sentence stem words with their parts of speech and word components. In one implementable manner, if the sentence to be analyzed is "十分失望的他后来又考了几次财政厅的公务员", the target sentence stem is {sentence pattern: subject-predicate-object sentence, subject: [他, ATT: [十分失望的]], predicate: [考, RAD: [了]], preposed object: [], object: [公务员, ATT: 财政厅的]: COO: [], CMP: [几次], ADV: [后来, 又]}, where ATT is the label of the attributive-head relationship type, RAD is the label of the right-adjunct relationship type, COO is the label of the coordinate relationship type, CMP is the label of the verb-complement structure type, and ADV is the label of the adverbial-head structure type.
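The assembly of the stem in step S223 can be sketched as follows. This is an illustrative simplification: the arcs are given at phrase level and mirror the structure quoted in the text, the output is a plain dictionary rather than the exact record format of the application, and modifiers of the subject and object are ignored. The function name `build_stem` and the field names are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (dependent, dependency type label, head)


def build_stem(sentence_pattern: str, core_word: str, arcs: List[Arc]) -> Dict[str, object]:
    """Collect the direct dependents of the core predicate into a stem record."""
    stem: Dict[str, object] = {
        "sentence_pattern": sentence_pattern,
        "predicate": core_word,
        "subject": None,
        "object": None,
        "CMP": [],
        "ADV": [],
    }
    for dependent, label, head in arcs:
        if head != core_word:
            continue  # only direct dependents of the core word are used here
        if label == "SBV":
            stem["subject"] = dependent
        elif label == "VOB":
            stem["object"] = dependent
        elif label in ("CMP", "ADV"):
            stem[label].append(dependent)
    return stem


# Phrase-level arcs consistent with the example stem in the text.
arcs = [
    ("十分失望的", "ATT", "他"),
    ("他", "SBV", "考"),
    ("后来", "ADV", "考"),
    ("又", "ADV", "考"),
    ("几次", "CMP", "考"),
    ("财政厅的", "ATT", "公务员"),
    ("公务员", "VOB", "考"),
]

stem = build_stem("subject-predicate-object sentence", "考", arcs)
print(stem)
```

Because every field of the record is filled from a labeled arc, each extracted word can be traced back to the dependency that justified it, which is the interpretability property the embodiment describes.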
In addition, it should be noted that sentence stem extraction is performed based on the sentence pattern information, the word components, and the dependency types between words that dependency syntax analysis yields for the sentence to be analyzed. This embodiment can therefore explain why a given sentence stem extraction result was produced, so the result is highly interpretable and, in turn, can be held with high confidence.
This embodiment provides a method, a device, and a readable storage medium for extracting sentence stems based on dependency syntax. In contrast to related-art methods that perform semantic recognition on a sentence using its word frequency information, this embodiment, after obtaining the sentence to be analyzed, performs dependency syntax analysis on it to obtain the corresponding dependency syntax analysis result, and then extracts the target sentence stem of the sentence to be analyzed based on that result. The target sentence stem is thus extracted by means of dependency syntax analysis. It should be noted that the parts of the sentence to be analyzed other than the target sentence stem contribute little to semantic recognition and interfere with it. Removing these non-stem parts by means of dependency syntax analysis yields a target sentence stem that is free of semantically interfering words and closer to the true semantics of the sentence to be analyzed, so performing semantic recognition on the sentence to be analyzed based on the target sentence stem can improve the accuracy of sentence semantic recognition. Moreover, because the target sentence stem is more concise than the sentence to be analyzed, the amount of input data for semantic recognition is smaller, which reduces the amount of computation and improves the efficiency of semantic recognition. This overcomes the technical defect of the related art, in which semantic recognition based on the word information of sentences has difficulty recognizing that sentences with the same semantics but different word information share the same meaning, resulting in low accuracy of sentence semantic recognition.
Further, referring to FIG. 3, based on the first embodiment of the present application, in another embodiment of the present application, the dependency syntax analysis result includes a dependency type prediction result,
and the step of performing dependency syntax analysis on the sentence to be analyzed to obtain the dependency syntax analysis result includes:
Step A10: vectorize the sentence to be analyzed to obtain a vectorized sentence;
In this embodiment, the sentence to be analyzed is vectorized as follows. For each word to be analyzed in the sentence, a to-be-analyzed word vector, a to-be-analyzed part-of-speech vector, and a to-be-analyzed word position vector are generated. The to-be-analyzed word vector is an encoding vector that uniquely represents the word to be analyzed; the to-be-analyzed part-of-speech vector is an encoding vector representing the part of speech of the word; and the to-be-analyzed word position vector is an encoding vector representing the position of the word in the sentence to be analyzed. A vectorized word is then generated for each word to be analyzed from its three vectors, and the matrix formed by the vectorized words is taken as the vectorized sentence.
Wherein the sentence to be analyzed includes at least one word to be analyzed, and the vectorized sentence includes at least one vectorized word,
and the step of vectorizing the sentence to be analyzed to obtain the vectorized sentence includes:
Step A11: obtain the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector, and the to-be-analyzed word position vector corresponding to the word to be analyzed;
In this embodiment, specifically, the word to be analyzed is mapped into a preset vector space by a preset word vector generation model to obtain its to-be-analyzed word vector, and the corresponding to-be-analyzed part-of-speech vector is matched to the word. Further, the to-be-analyzed word position vector is generated from the position of the word in the sentence to be analyzed.
Step A12: generate the vectorized word based on the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector, and the to-be-analyzed word position vector.
In this embodiment, specifically, the to-be-analyzed word vector, the to-be-analyzed part-of-speech vector, and the to-be-analyzed word position vector are input into a preset vectorized word calculation formula to obtain the vectorized word, wherein the preset vectorized word calculation formula is as follows:
$$X_i = E_w \oplus E_t \oplus E_p$$
where $X_i$ is the vectorized word, $E_w$ is the to-be-analyzed word vector, $E_t$ is the to-be-analyzed part-of-speech vector, $E_p$ is the to-be-analyzed word position vector, and $\oplus$ denotes the concatenation operation between vectors.
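The concatenation in the formula above can be illustrated with a minimal Python sketch. The function names and the toy vector values are assumptions made for illustration only; in practice the three vectors would come from trained embedding tables, which the text does not specify.

```python
def vectorize_word(word_vec, pos_vec, position_vec):
    """Concatenate E_w, E_t and E_p into one vectorized word X_i."""
    return list(word_vec) + list(pos_vec) + list(position_vec)


def vectorize_sentence(words):
    """Stack the vectorized words row by row to form the vectorized sentence."""
    return [vectorize_word(w, t, p) for (w, t, p) in words]


# Toy (hypothetical) vectors: (word vector, part-of-speech vector, position vector).
sentence = [
    ([0.1, 0.2], [1.0], [0.0]),
    ([0.3, 0.4], [0.0], [1.0]),
]
X = vectorize_sentence(sentence)
```

Each row of `X` has length equal to the sum of the three vector dimensions, which is the only shape constraint that concatenation imposes.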
Step A20: perform dependency discrimination on the vectorized sentence based on a preset dependency discrimination model to obtain a dependency discrimination result;
In this embodiment, it should be noted that the preset dependency syntax model includes the preset dependency discrimination model, which is a machine learning model for determining whether a dependency exists between words in the sentence to be analyzed.
Specifically, the vectorized sentence is input into the preset dependency discrimination model, which determines whether a dependency exists between the words of the sentence to be analyzed, yielding the dependency discrimination result.
Wherein the preset dependency discrimination model includes a first feature extraction model, a first fully connected network, a second fully connected network, and a first biaffine transformation network,
and the step of performing dependency discrimination on the vectorized sentence based on the preset dependency discrimination model to obtain the dependency discrimination result includes:
Step A21: perform feature extraction on the vectorized sentence based on the first feature extraction model to obtain a first feature extraction result;
In this embodiment, it should be noted that the first feature extraction model is a neural network that performs feature extraction on the vectorized sentence; it may be, for example, a Transformer model, an RNN, or a CNN.
Specifically, the vectorized sentence is input into the first feature extraction model to obtain a first feature extraction matrix, and the first feature extraction matrix is taken as the first feature extraction result.
Step A22: apply the first fully connected network and the second fully connected network separately to the first feature extraction result to obtain a first sentence vector and a second sentence vector;
In this embodiment, specifically, the first feature extraction matrix is input into the first fully connected network to obtain the first sentence vector, and is input into the second fully connected network to obtain the second sentence vector. It should be noted that the first sentence vector includes at least one head vector, which is the representation vector of a word acting as the head (the word depended upon) in a dependency, and the second sentence vector includes at least one tail vector, which is the representation vector of a word acting as the dependent in a dependency. For example, if word A depends on word B, then the word representation vector of word B is a head vector and the word representation vector of word A is a tail vector.
Step A23: perform a biaffine transformation on the first sentence vector and the second sentence vector based on the first biaffine transformation network to obtain a dependency score matrix;
In this embodiment, specifically, the first sentence vector and the second sentence vector are input into the first biaffine transformation network, which computes, for each head vector in the first sentence vector and each tail vector in the second sentence vector, the probability score that a dependency exists between them. The dependency score matrix is the matrix formed by these probability scores.
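As a rough illustration of the biaffine scoring in step A23, the sketch below uses one common formulation, score(i, j) = h_i^T U t_j + w_h · h_i + w_t · t_j + b. The text does not give the exact form of the biaffine network, so this formula, the parameter names `U`, `w_head`, `w_tail`, `b`, and the toy values are all assumptions; normalizing the raw scores into probabilities is omitted.

```python
def dot(a, c):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, c))


def biaffine_scores(heads, tails, U, w_head, w_tail, b):
    """Return scores[i][j] = heads[i]^T U tails[j] + w_head.heads[i] + w_tail.tails[j] + b."""
    scores = []
    for h in heads:
        # Row vector h^T U: one entry per column of U.
        hU = [dot(h, col) for col in zip(*U)]
        scores.append(
            [dot(hU, t) + dot(w_head, h) + dot(w_tail, t) + b for t in tails]
        )
    return scores


# Toy (hypothetical) head vectors, tail vectors and parameters.
heads = [[1, 0], [0, 1]]
tails = [[1, 0], [0, 1]]
U = [[2, 0], [0, 3]]
scores = biaffine_scores(heads, tails, U, w_head=[0, 0], w_tail=[0, 0], b=0.5)
```

Entry `scores[i][j]` scores the pairing of head vector `i` with tail vector `j`, so the result has one row per head vector and one column per tail vector.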
Step A24: determine the dependency discrimination result based on the dependency score matrix.
In this embodiment, specifically, a preset maximum spanning tree algorithm is used to select from the dependency score matrix the maximum sum of probability scores that satisfies a preset score selection condition. The probability scores that make up this maximum sum are the target probability scores, and the dependency vector composed of the vectorized words involved in the dependencies corresponding to the target probability scores is taken as the dependency discrimination result. The preset score selection condition includes, among others, that the words to be analyzed corresponding to the target probability scores are in one-to-one correspondence with the words to be analyzed in the sentence to be analyzed. For example, suppose the target probability scores are A and B, where A is the probability score that word b depends on word a and B is the probability score that word c depends on word b, and the vectorized words of words a, b, and c are the vectors X, Y, and Z, respectively. The dependency vector is then (X, 1, 0, 0, 1, Y, 1, 0, 0, 1, Z), where (1, 0, 0, 1) indicates that a dependency exists between the two words.
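The selection in step A24 can be sketched for very short sentences with an exhaustive search that returns the highest-scoring dependency tree. This is a brute-force stand-in, not the preset maximum spanning tree algorithm itself (practical parsers typically use an algorithm such as Chu-Liu/Edmonds), and its cost grows exponentially with sentence length. The convention that `scores[h][d]` is the score for word `d` depending on word `h`, and the toy scores, are assumptions.

```python
from itertools import product


def reaches_root(heads, root):
    """Return True if every word reaches the root by following head links."""
    for start in range(len(heads)):
        seen = set()
        node = start
        while node != root:
            if node in seen:  # revisiting a word means there is a cycle
                return False
            seen.add(node)
            node = heads[node]
    return True


def best_tree(scores):
    """Exhaustively find the highest-scoring dependency tree.

    scores[h][d] is the score that word d depends on word h.
    Returns (total, root, heads); heads[d] is the head of word d and
    heads[root] is None.
    """
    n = len(scores)
    best = (float("-inf"), None, None)
    for root in range(n):
        others = [d for d in range(n) if d != root]
        for choice in product(range(n), repeat=len(others)):
            if any(h == d for d, h in zip(others, choice)):
                continue  # a word cannot depend on itself
            heads = [None] * n
            for d, h in zip(others, choice):
                heads[d] = h
            if not reaches_root(heads, root):
                continue
            total = sum(scores[heads[d]][d] for d in others)
            if total > best[0]:
                best = (total, root, heads)
    return best


# Words a, b, c (indices 0, 1, 2): b depends on a, c depends on b.
toy_scores = [
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.8],
    [0.1, 0.1, 0.0],
]
total, root, heads = best_tree(toy_scores)
```

With these toy scores the search selects word a as the root, b attached to a, and c attached to b, mirroring the example in the paragraph above.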
Step A30: perform dependency type prediction on the vectorized sentence based on a preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result;
In this embodiment, the preset dependency syntax model includes the preset dependency type prediction model, which is a machine learning model for predicting the type of dependency between words in the sentence to be analyzed.
Specifically, dependency type prediction is performed on the vectorized sentence based on the preset dependency type prediction model to obtain a dependency type probability score matrix. It should be noted that each position of the dependency type probability score matrix holds a dependency type probability score vector, and the value at each position of that vector is the probability score of one preset dependency type. For example, suppose the dependency type probability score vector is (A, B, C), where the first position corresponds to the subject-predicate relation, the second to the verb-object relation, and the third to the coordinate relation. Then A is the probability score that the dependency between the two words corresponding to the vector is a subject-predicate relation, B is the probability score that it is a verb-object relation, and C is the probability score that it is a coordinate relation. Based on the dependency discrimination result, the target dependency type probability score vectors are then selected from the dependency type probability score matrix, and the dependency type corresponding to the largest value in each target vector is taken as the target dependency type. This yields the dependency types between the words of the sentence to be analyzed, that is, the dependency type prediction result.
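The type selection described above amounts to taking, for each word pair retained by the dependency discrimination result, the dependency type with the largest score. A minimal sketch follows; the dictionary layout keyed by `(head, dependent)` index pairs, the function name, and the toy scores are assumptions for illustration.

```python
# Order of types follows the (A, B, C) example: subject-predicate, verb-object, coordinate.
DEP_TYPES = ["subject-predicate", "verb-object", "coordinate"]


def predict_types(type_scores, arcs):
    """Pick the highest-scoring dependency type for each selected arc.

    type_scores maps (head, dependent) to a score vector over DEP_TYPES;
    arcs lists the (head, dependent) pairs kept by dependency discrimination.
    """
    result = {}
    for arc in arcs:
        vec = type_scores[arc]
        best = max(range(len(vec)), key=vec.__getitem__)
        result[arc] = DEP_TYPES[best]
    return result


# Toy (hypothetical) scores for three candidate word pairs.
type_scores = {
    (0, 1): [0.7, 0.2, 0.1],
    (1, 2): [0.1, 0.8, 0.1],
    (0, 2): [0.3, 0.3, 0.4],
}
arcs = [(0, 1), (1, 2)]  # pairs judged to be in a dependency
predicted = predict_types(type_scores, arcs)
```

The pair `(0, 2)` is never typed because the discrimination result did not retain it, which is the filtering effect the paragraph describes.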
Wherein the dependency discrimination result includes a dependency vector,
and the step of performing dependency type prediction on the vectorized sentence based on the preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result includes:
Step A31: perform dependency type prediction on the vectorized sentence based on the preset dependency type prediction model to obtain a dependency type probability score matrix;
In this embodiment, it should be noted that the preset dependency type prediction model includes a second feature extraction model, a third fully connected network, a fourth fully connected network, and a second biaffine transformation network.
Specifically, the vectorized sentence is input into the second feature extraction model to obtain a second feature extraction matrix; the second feature extraction matrix is input into the third fully connected network and the fourth fully connected network, respectively, to obtain the corresponding third sentence vector and fourth sentence vector; and the third sentence vector and the fourth sentence vector are input into the second biaffine transformation network, which performs a biaffine transformation on them to obtain the dependency type probability score matrix.
Step A32: fuse the dependency type probability score matrix with the dependency vector to obtain the dependency type prediction result.
In this embodiment, specifically, based on a preset fusion rule, each dependency type probability score vector in the dependency type probability score matrix is fused with the dependency vector to obtain the dependency type probability vector corresponding to each score vector. The preset fusion rule includes weighted averaging, concatenation, summation, and the like. The value at each position of a dependency type probability vector is the probability of one preset dependency type; the preset dependency types include the subject-predicate type, the verb-object type, the coordinate type, and the like. The maximum probability value in each dependency type probability vector is then selected as a target dependency type probability, the maximum dependency type probabilities that satisfy a preset probability selection condition are determined among the target dependency type probabilities, and the dependency types corresponding to them are taken as the dependency type prediction result. The preset probability selection condition includes that the words to be analyzed corresponding to the selected maximum dependency type probabilities are in one-to-one correspondence with the words to be analyzed in the sentence to be analyzed. For example, if the sentence to be analyzed is ABC, the preset probability selection condition is that two maximum dependency type probabilities are selected and that the words to be analyzed corresponding to them can make up the sentence ABC.
In addition, it should be noted that, in one implementation, the preset dependency syntax model can be obtained by training through the following steps:
Step B10: acquire training data and a dependency syntax model to be trained, wherein the training data includes training sentences and the preset dependency type labels corresponding to the training sentences;
In this embodiment, it should be noted that a preset dependency type label is a pre-annotated identifier of the dependency type between words in a training sentence, and the dependency syntax model to be trained is a dependency syntax model that has not yet been trained.
Specifically, an annotated dependency syntax analysis dataset and the dependency syntax model to be trained are acquired. In addition, a further dependency syntax analysis dataset is collected and manually annotated to obtain a manually annotated dependency syntax analysis dataset. The annotated dataset and the manually annotated dataset are then merged to obtain the training dataset, which increases the number of training samples available for the dependency syntax model to be trained.
Step B20: input the training data into the dependency syntax model to be trained to perform dependency syntax analysis on the training sentences and obtain type training prediction labels;
In this embodiment, it should be noted that the training data includes at least one training sentence.
Specifically, the training sentence is vectorized by the vectorization network in the dependency syntax model to be trained to obtain a vectorized training sentence. Dependency discrimination is performed on the vectorized training sentence by the preset dependency discrimination model in the dependency syntax model to be trained to obtain a training dependency vector, and dependency type prediction is performed on the vectorized training sentence by the preset dependency type prediction model in the dependency syntax model to be trained to obtain a training dependency type probability score matrix. The type training prediction label is then determined from the training dependency vector and the training dependency type probability score matrix, wherein the type training prediction label is the identifier of the dependency type corresponding to the training sentence.
Step B30: calculate a dependency syntax model error based on the type training prediction label and the preset dependency type label;
In this embodiment, specifically, the distance between the type training prediction label and the preset dependency type label is calculated to obtain the dependency syntax model error.
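The text does not say which distance is used in step B30. As one possible choice, the sketch below uses cross-entropy between the predicted type distribution and the annotated type, averaged over the labeled word pairs. The choice of cross-entropy, the function names, and the toy values are assumptions.

```python
import math


def cross_entropy(pred_probs, gold_index, eps=1e-12):
    """Negative log-probability assigned to the annotated dependency type."""
    return -math.log(max(pred_probs[gold_index], eps))


def model_error(batch_probs, batch_gold):
    """Average cross-entropy over all labeled word pairs in the batch."""
    losses = [cross_entropy(p, g) for p, g in zip(batch_probs, batch_gold)]
    return sum(losses) / len(losses)


# Toy (hypothetical) predictions over three dependency types for two word pairs.
batch_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
batch_gold = [0, 1]  # indices of the annotated types
error = model_error(batch_probs, batch_gold)
```

The error is zero only when the model assigns probability 1 to every annotated type, and it grows as the predicted distribution moves away from the label.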
Step B40: update the dependency syntax model to be trained based on the dependency syntax model error until the dependency syntax model to be trained satisfies a preset update end condition, and take the dependency syntax model to be trained as the preset dependency syntax model.
In this embodiment, specifically, gradient information is calculated from the dependency syntax model error, and the model parameters of the dependency syntax model to be trained are updated according to the gradient information by backpropagation to obtain an updated model. It is then determined whether the updated model satisfies the preset update end condition. If it does, the updated model is taken as the preset dependency syntax model. If it does not, training sentences are acquired again and the model parameters of the updated model are trained and updated again, until the updated model satisfies the preset update end condition. The preset update end condition includes reaching the maximum number of iterations, convergence of the loss function, and the like.
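The update loop of step B40, with its two end conditions, can be sketched on a toy one-parameter problem. Here the gradient is supplied analytically rather than by backpropagation, and the quadratic loss, learning rate, and tolerance are hypothetical stand-ins for the real model and its settings.

```python
def train(grad_fn, loss_fn, w, lr=0.1, max_iters=1000, tol=1e-8):
    """Gradient-descent loop that stops on loss convergence or max iterations."""
    prev = loss_fn(w)
    for step in range(1, max_iters + 1):
        w = w - lr * grad_fn(w)       # parameter update from the gradient
        cur = loss_fn(w)
        if abs(prev - cur) < tol:     # end condition: loss has converged
            return w, step, "converged"
        prev = cur
    return w, max_iters, "max_iters"  # end condition: iteration budget used up


def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Analytic gradient of the toy loss."""
    return 2.0 * (w - 3.0)


w_final, steps, reason = train(grad, loss, w=0.0)
```

Lowering `max_iters` far enough makes the loop stop with the reason `"max_iters"` instead, which corresponds to the other end condition named in the text.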
Further, once the preset dependency syntax model has been trained, the preset paraphrase recognition model can be trained on top of it. Specifically, paraphrase training data and a paraphrase recognition model to be trained are acquired. The paraphrase training data includes a first training sentence to be recognized, a second training sentence to be recognized, and a sentence label, where the sentence label identifies whether the first and second training sentences to be recognized are paraphrases of each other. The paraphrase training data can be represented as a vector; for example, if the paraphrase training data is the vector (X1, X2, Y), then X1 is the first training sentence to be recognized, X2 is the second training sentence to be recognized, and Y is the sentence label. A training sentence vector representation jointly corresponding to the first and second training sentences to be recognized is then generated, and dependency syntax analysis is performed on each of the two sentences based on the preset dependency syntax model, yielding a first training dependency vector for the first training sentence to be recognized and a second training dependency vector for the second. The training sentence vector representation, the first training dependency vector, and the second training dependency vector are aggregated and input into the paraphrase recognition model to be trained to obtain an output paraphrase recognition label. A paraphrase recognition model error is calculated from the output paraphrase recognition label and the sentence label, and the paraphrase recognition model to be trained is updated based on this error until it satisfies a preset training end condition, at which point it is taken as the preset paraphrase recognition model. The preset training end condition includes convergence of the loss function, the model reaching the maximum number of iterations, and the like.
This embodiment provides a machine-learning-based dependency syntax analysis method. The sentence to be analyzed is first vectorized to obtain a vectorized sentence. Dependency discrimination is then performed on the vectorized sentence based on the preset dependency discrimination model to obtain the dependency discrimination result, which determines whether a dependency exists between the words of the sentence to be analyzed. Dependency type prediction is then performed on the vectorized sentence based on the preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result, which predicts the type of dependency between the words of the sentence to be analyzed. Because the dependency type is predicted on the basis of the dependency discrimination result, the method avoids the situation in which the probability that a dependency exists between two words is extremely low, yet the predicted probability of some preset dependency type between them is high. This improves the accuracy of dependency type prediction and therefore of dependency syntax analysis. Based on the dependency type prediction result, the target sentence stem of the sentence to be analyzed can then be extracted for semantic recognition. It should be noted that the non-stem parts of the sentence to be analyzed, that is, everything other than the target sentence stem, contribute little to semantic recognition and interfere with it. Removing these low-contribution non-stem parts by means of dependency syntax analysis makes the target sentence stem closer to the true semantics of the sentence to be analyzed, which can improve the accuracy of sentence semantic recognition. This lays a foundation for overcoming the technical defect of the related art, in which semantic recognition based on the word information of sentences has difficulty recognizing that sentences with the same semantics but different word information share the same semantics, resulting in low accuracy of sentence semantic recognition.
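How a stem might be read off the dependency type prediction result can be sketched as follows. The rule used here, keeping only the words that take part in subject-predicate or verb-object relations, is an assumption made for illustration, as are the function name, the example sentence, and the "adverbial" type; the extraction rule itself is not defined in this passage.

```python
# Assumed set of relations whose words belong to the sentence stem.
CORE_TYPES = {"subject-predicate", "verb-object"}


def extract_stem(words, typed_arcs, core_types=CORE_TYPES):
    """Keep, in original order, the words involved in core dependency relations.

    typed_arcs is a list of (head_index, dependent_index, dependency_type).
    """
    keep = set()
    for head, dep, dep_type in typed_arcs:
        if dep_type in core_types:
            keep.add(head)
            keep.add(dep)
    return [w for i, w in enumerate(words) if i in keep]


# Hypothetical parse of a four-word sentence.
words = ["I", "urgently", "need", "money"]
typed_arcs = [
    (2, 0, "subject-predicate"),  # "I" is the subject of "need"
    (2, 1, "adverbial"),          # hypothetical non-core type
    (2, 3, "verb-object"),        # "money" is the object of "need"
]
stem = extract_stem(words, typed_arcs)
```

In this toy case the modifier "urgently" is dropped and the remaining words form the shorter input that would be passed on to semantic recognition.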
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present application.
As shown in FIG. 4, the device for extracting sentence stems based on dependency syntax may include a processor 1001 (for example, a CPU), a memory 1005, and a communication bus 1002. The communication bus 1002 is used to realize connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the device for extracting sentence stems based on dependency syntax may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The rectangular user interface may include a display and an input sub-module such as a keyboard; optionally, the rectangular user interface may also include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the device for extracting sentence stems based on dependency syntax; the device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 4, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, and a program for extracting sentence stems based on dependency syntax. The operating system is a program that manages and controls the hardware and software resources of the device for extracting sentence stems based on dependency syntax, and supports the running of the sentence stem extraction program and of other software and/or programs. The network communication module is used to realize communication between the components inside the memory 1005, as well as communication with other hardware and software in the system for extracting sentence stems based on dependency syntax.
In the device for extracting sentence stems based on dependency syntax shown in FIG. 4, the processor 1001 is configured to execute the program for extracting sentence stems based on dependency syntax stored in the memory 1005, so as to implement the steps of the method for extracting sentence stems based on dependency syntax described in any one of the above.
The specific implementation of the device for extracting sentence stems based on dependency syntax in the present application is basically the same as the embodiments of the method for extracting sentence stems based on dependency syntax described above, and is not repeated here.
An embodiment of the present application further provides an apparatus for extracting sentence stems based on dependency syntax. The apparatus is applied to the device for extracting sentence stems based on dependency syntax, and includes:
a dependency syntax analysis module, configured to acquire a sentence to be analyzed and to perform dependency syntax analysis on the sentence to be analyzed to obtain a dependency syntax analysis result;
a sentence stem extraction module, configured to extract, based on the dependency syntax analysis result, the target sentence stem corresponding to the sentence to be analyzed, so as to perform semantic recognition on the sentence to be analyzed.
Optionally, the sentence stem extraction module includes:
a determining unit, configured to determine sentence pattern information corresponding to the sentence to be analyzed based on the dependency type prediction result;
an extraction unit, configured to extract the target sentence stem based on the sentence pattern information and the dependency type prediction result.
Optionally, the extraction unit includes:
a first determining subunit, configured to determine the target core word corresponding to the sentence to be analyzed based on the sentence pattern information;
a second determining subunit, configured to determine each target sentence stem word corresponding to the target core word based on a preset word component priority and the dependency type prediction result;
a first generating subunit, configured to generate the target sentence stem based on the sentence pattern information, the target core word, and each of the target sentence stem words.
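The extraction unit described above can be illustrated with a minimal sketch. It is not the implementation of the present application: the arc representation `(head, dependent, label)`, the label names `HED`, `SBV` and `VOB`, the priority table, and the `max_components` parameter are all assumptions chosen for illustration, and the sketch simply takes the word attached to the virtual root as the core word instead of deriving it from sentence pattern information.

```python
from typing import List, Tuple

# Hypothetical word component priority (lower value = selected first).
# The labels follow a common Chinese dependency scheme and are assumptions.
COMPONENT_PRIORITY = {"SBV": 0, "VOB": 1}

# (head index, dependent index, relation label); head -1 is the virtual root.
Arc = Tuple[int, int, str]


def find_core_word(arcs: List[Arc]) -> int:
    """Return the index of the word attached to the virtual root."""
    for head, dep, label in arcs:
        if head == -1 and label == "HED":
            return dep
    raise ValueError("no core word found")


def extract_stem(words: List[str], arcs: List[Arc],
                 max_components: int = 2) -> List[str]:
    """Keep the core word plus its highest-priority direct dependents."""
    core = find_core_word(arcs)
    candidates = [(COMPONENT_PRIORITY[label], dep)
                  for head, dep, label in arcs
                  if head == core and label in COMPONENT_PRIORITY]
    candidates.sort()  # highest priority (smallest value) first
    chosen = [dep for _, dep in candidates[:max_components]]
    # Restore the original word order so the stem reads as a sentence.
    return [words[i] for i in sorted(chosen + [core])]


words = ["I", "yesterday", "bought", "a", "book"]
arcs = [(-1, 2, "HED"), (2, 0, "SBV"), (2, 1, "ADV"),
        (2, 4, "VOB"), (4, 3, "ATT")]
stem = extract_stem(words, arcs)
```

With these toy inputs the adverbial "yesterday" and the attribute "a" are dropped, leaving the subject, the core verb, and the object.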
Optionally, the dependency syntax analysis module includes:
a vectorization unit, configured to vectorize the sentence to be analyzed to obtain a vectorized sentence;
a dependency discrimination unit, configured to perform dependency discrimination on the vectorized sentence based on a preset dependency discrimination model to obtain a dependency discrimination result;
a dependency type prediction unit, configured to perform dependency type prediction on the vectorized sentence based on a preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result.
Optionally, the dependency discrimination unit includes:
a feature extraction subunit, configured to perform feature extraction on the vectorized sentence based on the first feature extraction model to obtain a first feature extraction result;
a fully connected subunit, configured to apply the first fully connected network and the second fully connected network separately to the first feature extraction result to obtain a first sentence vector and a second sentence vector;
a biaffine transformation subunit, configured to perform a biaffine transformation on the first sentence vector and the second sentence vector based on the first biaffine transformation network to obtain a dependency score matrix;
a third determining subunit, configured to determine the dependency discrimination result based on the dependency score matrix.
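To make the biaffine step concrete, the following is a minimal pure-Python sketch. The scoring form `score[i][j] = dep_i · U · head_j + u · head_j` is one common biaffine formulation and is an assumption here, since this passage does not give the formula; the names `dep`, `head`, `U` and `u` are hypothetical, with `dep` and `head` standing in for the first and second sentence vectors produced by the two fully connected networks. Taking the highest-scoring head per word is likewise only one possible way to turn the score matrix into a discrimination result.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def biaffine_scores(dep: Matrix, head: Matrix,
                    U: Matrix, u: List[float]) -> Matrix:
    """score[i][j]: how strongly word i depends on word j (assumed form)."""
    bilinear = matmul(matmul(dep, U), transpose(head))            # n x n
    bias = [sum(w * h for w, h in zip(u, row)) for row in head]   # length n
    n = len(dep)
    return [[bilinear[i][j] + bias[j] for j in range(n)] for i in range(n)]


def predict_heads(scores: Matrix) -> List[int]:
    """Pick, for every word, the candidate head with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Toy 2-word example with 2-dimensional vectors (values are made up).
dep = [[1.0, 0.0], [0.0, 1.0]]
head = [[0.0, 1.0], [1.0, 0.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
u = [0.0, 0.0]
scores = biaffine_scores(dep, head, U, u)
heads = predict_heads(scores)
```

In a trained model `U` and `u` would be learned parameters; here they are fixed so that the arithmetic can be checked by hand.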
Optionally, the dependency type prediction unit includes:
a dependency type prediction subunit, configured to perform dependency type prediction on the vectorized sentence based on the preset dependency type prediction model to obtain a dependency type probability score matrix;
a fusion subunit, configured to fuse the dependency type probability score matrix with the dependency vector to obtain the dependency type prediction result.
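The fusion step is not spelled out in this passage, so the sketch below shows only one plausible reading, under stated assumptions: the dependency vector is taken to hold the predicted head index of each word, the type probability score matrix is taken to have shape `n x n x k` (dependent, head, type), and "fusion" is taken to mean reading the type scores only at the word pairs already judged to be dependent. The function name `fuse` and the label names are hypothetical.

```python
from typing import List, Tuple


def fuse(type_scores: List[List[List[float]]],
         heads: List[int],
         labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (head, dependent, label) arcs for the selected word pairs only."""
    arcs = []
    for dep, head in enumerate(heads):
        row = type_scores[dep][head]  # type scores for the pair (dep, head)
        best = max(range(len(row)), key=row.__getitem__)
        arcs.append((head, dep, labels[best]))
    return arcs


labels = ["SBV", "VOB"]
# type_scores[dep][head][k]: toy values for a 2-word sentence.
type_scores = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.3, 0.7], [0.6, 0.4]],
]
heads = [1, 0]  # word 0 depends on word 1, word 1 depends on word 0
typed_arcs = fuse(type_scores, heads, labels)
```

Under this reading, a high type score for a pair that was not selected as dependent is never consulted, which is consistent with the stated goal of not assigning a confident dependency type to words that are unlikely to be related at all.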
Optionally, the vectorization unit includes:
an acquisition subunit, configured to acquire the word vector, the part-of-speech vector, and the word position vector corresponding to the word to be analyzed;
a second generating subunit, configured to generate the vectorized word based on the word vector to be analyzed, the part-of-speech vector to be analyzed, and the word position vector to be analyzed.
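As a rough illustration of the vectorization unit, the sketch below builds a vectorized word by concatenating its word vector, part-of-speech vector, and position vector. Concatenation is an assumption (the passage does not say how the three vectors are combined), and the lookup tables and their values are made up for the example.

```python
from typing import Dict, List

Vector = List[float]


def vectorize_word(word: str, pos_tag: str, position: int,
                   word_emb: Dict[str, Vector],
                   pos_emb: Dict[str, Vector],
                   position_emb: List[Vector]) -> Vector:
    """Concatenate the word, part-of-speech and position vectors of one word."""
    # For Python lists, "+" concatenates; it does not add element-wise.
    return word_emb[word] + pos_emb[pos_tag] + position_emb[position]


def vectorize_sentence(words: List[str], pos_tags: List[str],
                       word_emb: Dict[str, Vector],
                       pos_emb: Dict[str, Vector],
                       position_emb: List[Vector]) -> List[Vector]:
    """Vectorize every word of the sentence, using its index as the position."""
    return [vectorize_word(w, t, i, word_emb, pos_emb, position_emb)
            for i, (w, t) in enumerate(zip(words, pos_tags))]


# Hypothetical toy lookup tables.
WORD_EMB = {"I": [1.0, 0.0], "run": [0.0, 1.0]}
POS_EMB = {"PRON": [0.5], "VERB": [-0.5]}
POSITION_EMB = [[0.0], [1.0]]

sentence_vectors = vectorize_sentence(
    ["I", "run"], ["PRON", "VERB"], WORD_EMB, POS_EMB, POSITION_EMB)
```

In practice the three tables would be learned embeddings of much higher dimension; the structure of the lookup and combination stays the same.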
Optionally, the dependency syntax analysis module further includes:
an acquisition unit, configured to acquire a sentence to be processed and to identify background components in the sentence to be processed;
a removing unit, configured to remove the background components from the sentence to be processed to obtain the sentence to be analyzed.
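A simple way to picture the acquisition and removing units is a pattern-based filter. This passage does not say how background components are identified, so the sketch below assumes a hypothetical, hand-written list of regular expressions for greetings and politeness fillers; the pattern list and function name are illustrative only.

```python
import re

# Hypothetical patterns for background components (greetings, fillers).
BACKGROUND_PATTERNS = [
    r"\bhello\b[,!]?",
    r"\bexcuse me\b,?",
    r"\bplease\b",
]


def remove_background(sentence: str) -> str:
    """Strip the assumed background components and normalize whitespace."""
    cleaned = sentence
    for pattern in BACKGROUND_PATTERNS:
        cleaned = re.sub(pattern, " ", cleaned, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", cleaned).strip()


to_be_analyzed = remove_background(
    "Hello, excuse me, how do I reset my password")
```

The remaining text is what would then be passed on as the sentence to be analyzed.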
The specific implementation of the apparatus for extracting sentence stems based on dependency syntax in the present application is basically the same as the embodiments of the method for extracting sentence stems based on dependency syntax described above, and is not repeated here.
An embodiment of the present application provides a readable storage medium. The readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the method for extracting sentence stems based on dependency syntax described in any one of the above.
The specific implementation of the readable storage medium of the present application is basically the same as the embodiments of the method for extracting sentence stems based on dependency syntax described above, and is not repeated here.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A method for extracting sentence stems based on dependency syntax, wherein the method comprises:
    acquiring a sentence to be analyzed, and performing dependency syntax analysis on the sentence to be analyzed to obtain a dependency syntax analysis result;
    extracting, based on the dependency syntax analysis result, a target sentence stem corresponding to the sentence to be analyzed, so as to perform semantic recognition on the sentence to be analyzed.
  2. The method for extracting sentence stems based on dependency syntax according to claim 1, wherein the dependency syntax analysis result comprises a dependency type prediction result,
    and the step of extracting, based on the dependency syntax analysis result, the target sentence stem corresponding to the sentence to be analyzed comprises:
    determining sentence pattern information corresponding to the sentence to be analyzed based on the dependency type prediction result;
    extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result.
  3. The method for extracting sentence stems based on dependency syntax according to claim 2, wherein the step of extracting the target sentence stem based on the sentence pattern information and the dependency type prediction result comprises:
    determining a target core word corresponding to the sentence to be analyzed based on the sentence pattern information;
    determining each target sentence stem word corresponding to the target core word based on a preset word component priority and the dependency type prediction result;
    generating the target sentence stem based on the sentence pattern information, the target core word, and each of the target sentence stem words.
  4. The method for extracting sentence stems based on dependency syntax according to claim 1, wherein the dependency syntax analysis result comprises a dependency type prediction result,
    and the step of performing dependency syntax analysis on the sentence to be analyzed to obtain the dependency syntax analysis result comprises:
    vectorizing the sentence to be analyzed to obtain a vectorized sentence;
    performing dependency discrimination on the vectorized sentence based on a preset dependency discrimination model to obtain a dependency discrimination result;
    performing dependency type prediction on the vectorized sentence based on a preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result.
  5. The method for extracting sentence stems based on dependency syntax according to claim 4, wherein the preset dependency discrimination model comprises a first feature extraction model, a first fully connected network, a second fully connected network, and a first biaffine transformation network,
    and the step of performing dependency discrimination on the vectorized sentence based on the preset dependency discrimination model to obtain the dependency discrimination result comprises:
    performing feature extraction on the vectorized sentence based on the first feature extraction model to obtain a first feature extraction result;
    applying the first fully connected network and the second fully connected network separately to the first feature extraction result to obtain a first sentence vector and a second sentence vector;
    performing a biaffine transformation on the first sentence vector and the second sentence vector based on the first biaffine transformation network to obtain a dependency score matrix;
    determining the dependency discrimination result based on the dependency score matrix.
  6. The method for extracting sentence stems based on dependency syntax according to claim 4, wherein the dependency discrimination result comprises a dependency vector,
    and the step of performing dependency type prediction on the vectorized sentence based on the preset dependency type prediction model and the dependency discrimination result to obtain the dependency type prediction result comprises:
    performing dependency type prediction on the vectorized sentence based on the preset dependency type prediction model to obtain a dependency type probability score matrix;
    fusing the dependency type probability score matrix with the dependency vector to obtain the dependency type prediction result.
  7. The method for extracting sentence stems based on dependency syntax according to claim 4, wherein the sentence to be analyzed comprises at least one word to be analyzed, and the vectorized sentence comprises at least one vectorized word,
    and the step of vectorizing the sentence to be analyzed to obtain the vectorized sentence comprises:
    acquiring the word vector, the part-of-speech vector, and the word position vector corresponding to the word to be analyzed;
    generating the vectorized word based on the word vector to be analyzed, the part-of-speech vector to be analyzed, and the word position vector to be analyzed.
  8. The method for extracting sentence stems based on dependency syntax according to claim 1, wherein the step of acquiring the sentence to be analyzed comprises:
    acquiring a sentence to be processed, and identifying background components in the sentence to be processed;
    removing the background components from the sentence to be processed to obtain the sentence to be analyzed.
  9. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 1.
  10. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 2.
  11. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 3.
  12. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 4.
  13. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 5.
  14. A device for extracting sentence stems based on dependency syntax, wherein the device comprises: a memory, a processor, and a program stored on the memory for implementing the method for extracting sentence stems based on dependency syntax,
    the memory is configured to store the program for implementing the method for extracting sentence stems based on dependency syntax;
    the processor is configured to execute the program for implementing the method for extracting sentence stems based on dependency syntax, so as to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 6.
  15. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 1.
  16. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 2.
  17. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 3.
  18. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 4.
  19. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 5.
  20. A readable storage medium, wherein the readable storage medium stores a program for implementing the method for extracting sentence stems based on dependency syntax, and the program is executed by a processor to implement the steps of the method for extracting sentence stems based on dependency syntax according to claim 6.
PCT/CN2021/094933 2020-09-14 2021-05-20 Method and apparatus for extracting sentence main portion on the basis of dependency grammar, and readable storage medium WO2022052505A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010965433.6 2020-09-14
CN202010965433.6A CN112069801A (en) 2020-09-14 2020-09-14 Sentence backbone extraction method, equipment and readable storage medium based on dependency syntax

Publications (1)

Publication Number Publication Date
WO2022052505A1 true WO2022052505A1 (en) 2022-03-17

Family

ID=73696751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094933 WO2022052505A1 (en) 2020-09-14 2021-05-20 Method and apparatus for extracting sentence main portion on the basis of dependency grammar, and readable storage medium

Country Status (2)

Country Link
CN (1) CN112069801A (en)
WO (1) WO2022052505A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306663A (en) * 2022-12-27 2023-06-23 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium
CN116933697A (en) * 2023-09-18 2023-10-24 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language
CN117669593A (en) * 2024-01-31 2024-03-08 山东省计算中心(国家超级计算济南中心) Zero sample relation extraction method, system, equipment and medium based on equivalent semantics

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069801A (en) * 2020-09-14 2020-12-11 深圳前海微众银行股份有限公司 Sentence backbone extraction method, equipment and readable storage medium based on dependency syntax
CN112560481B (en) * 2020-12-25 2024-05-31 北京百度网讯科技有限公司 Statement processing method, device and storage medium
CN114827360A (en) * 2021-01-27 2022-07-29 深圳市万普拉斯科技有限公司 Voice response method, device, controller and computer readable storage medium
CN113407739B (en) * 2021-07-14 2023-01-06 海信视像科技股份有限公司 Method, apparatus and storage medium for determining concept in information title

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470026A (en) * 2018-03-23 2018-08-31 北京奇虎科技有限公司 The sentence trunk method for extracting content and device of headline
CN110377903A (en) * 2019-06-24 2019-10-25 浙江大学 A kind of Sentence-level entity and relationship combine abstracting method
CN110704598A (en) * 2019-09-29 2020-01-17 北京明略软件***有限公司 Statement information extraction method, extraction device and readable storage medium
CN112069801A (en) * 2020-09-14 2020-12-11 深圳前海微众银行股份有限公司 Sentence backbone extraction method, equipment and readable storage medium based on dependency syntax

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Thesis Submitted to Ludong University for the Degree of Master", 15 December 2019, LUDONG UNIVERSITY, China, article HE, LONG: "Research on the Method of Extracting Enterprise Tax Law Entity Relationship Based on Dependency Parsing", pages: 1 - 94, XP009539053 *
LIU YUEHONG: "Question Dependency Syntax and Semantic Analysis Research", CHINESE MASTER’S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, 15 May 2012 (2012-05-15), XP055911488 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306663A (en) * 2022-12-27 2023-06-23 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium
CN116306663B (en) * 2022-12-27 2024-01-02 华润数字科技有限公司 Semantic role labeling method, device, equipment and medium
CN116933697A (en) * 2023-09-18 2023-10-24 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language
CN116933697B (en) * 2023-09-18 2023-12-08 上海芯联芯智能科技有限公司 Method and device for converting natural language into hardware description language
CN117669593A (en) * 2024-01-31 2024-03-08 山东省计算中心(国家超级计算济南中心) Zero sample relation extraction method, system, equipment and medium based on equivalent semantics
CN117669593B (en) * 2024-01-31 2024-04-26 山东省计算中心(国家超级计算济南中心) Zero sample relation extraction method, system, equipment and medium based on equivalent semantics

Also Published As

Publication number Publication date
CN112069801A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
WO2022052505A1 (en) Method and apparatus for extracting sentence main portion on the basis of dependency grammar, and readable storage medium
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN109145294B (en) Text entity identification method and device, electronic equipment and storage medium
CN106777013B (en) Conversation management method and device
WO2021051871A1 (en) Text extraction method, apparatus, and device, and storage medium
CN116775847B (en) Question answering method and system based on knowledge graph and large language model
US20160328467A1 (en) Natural language question answering method and apparatus
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
TW202020691A (en) Feature word determination method and device and server
CN107015964B (en) Intelligent robot development-oriented custom intention implementation method and device
CN110704621A (en) Text processing method and device, storage medium and electronic equipment
CN112084793B (en) Semantic recognition method, device and readable storage medium based on dependency syntax
CN112100354A (en) Man-machine conversation method, device, equipment and storage medium
CN111274267A (en) Database query method and device and computer readable storage medium
WO2022048194A1 (en) Method, apparatus and device for optimizing event subject identification model, and readable storage medium
WO2024067276A1 (en) Video tag determination method and apparatus, device and medium
CN114860942B (en) Text intention classification method, device, equipment and storage medium
CN112069799A (en) Dependency syntax based data enhancement method, apparatus and readable storage medium
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
CN111400340A (en) Natural language processing method and device, computer equipment and storage medium
CN117435716B (en) Data processing method and system of power grid man-machine interaction terminal
CN112860907B (en) Emotion classification method and equipment
CN112668341B (en) Text regularization method, apparatus, device and readable storage medium
CN113220854A (en) Intelligent dialogue method and device for machine reading understanding

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21865566

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21865566

Country of ref document: EP

Kind code of ref document: A1