CN110765765A - Contract key clause extraction method and device based on artificial intelligence and storage medium - Google Patents
- Publication number: CN110765765A
- Application number: CN201910873470.1A
- Authority: CN (China)
- Legal status: Granted (the status listed by Google Patents is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services; Handling legal documents
Abstract
The invention relates to artificial intelligence technology and discloses an artificial-intelligence-based contract key term extraction method, which comprises the following steps: receiving a contract text and preprocessing it to obtain a standard contract text; extracting a keyword set from the standard contract text and converting the keyword set into a word vector set to obtain a keyword vector set; acquiring a text set of predetermined key contract terms and converting it into a text word vector set; and inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the two, and, when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text. The invention also provides an artificial-intelligence-based contract key clause extraction device and a computer-readable storage medium. The invention enables efficient extraction of the key terms of a contract.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an artificial-intelligence-based contract key term extraction method and device, and a storage medium.
Background
With the advent of the Internet age, online contract texts have emerged and their number continues to grow sharply every day; faced with such an enormous volume of contract texts, it is important to extract their key terms efficiently. Current commercial contracts contain numerous terms, most of which are formatted or templated, while the terms carrying important information are not highlighted, which hinders understanding of the contract. How to extract the key terms of a contract text more efficiently has therefore become a significant problem.
Disclosure of Invention
The invention provides a contract key term extraction method, a contract key term extraction device and a storage medium based on artificial intelligence, and mainly aims to present an efficient extraction result to a user when the user extracts the contract key term.
In order to achieve the above purpose, the invention provides a contract key term extraction method based on artificial intelligence, which comprises the following steps:
receiving a contract text, and performing a preprocessing operation of stop word removal and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set from the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
Optionally, the pre-processing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text includes:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
Optionally, the extracting, by using a keyword extraction algorithm, a keyword set in the standard contract text includes:
calculating the dependency relevance of any two feature words W_i and W_j in the standard contract text:
wherein Dep(W_i, W_j) denotes the dependency relevance of feature words W_i and W_j, len(W_i, W_j) denotes the dependency path length between W_i and W_j, and b is a hyper-parameter;
calculating the gravity between feature words W_i and W_j:
wherein f_grav(W_i, W_j) denotes the gravity between feature words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of W_i and W_j respectively, TF denotes term frequency, IDF denotes inverse document frequency, and d is the Euclidean distance between the word vectors of W_i and W_j;
obtaining the association strength between feature words W_i and W_j from the dependency relevance and the gravity:
weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
calculating the importance score of feature word W_i from the association strength:
and obtaining the keyword set in the standard contract text according to the importance scores of the feature words.
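The formula images for the dependency relevance, the gravity and the importance score were lost in extraction. A reconstruction consistent with the definitions above is sketched below. It assumes that dependency relevance decays exponentially with the dependency path length, that gravity follows Newton's law with the TF-IDF values as masses and d as the distance, and that the importance score is a weighted TextRank update with damping coefficient η. The exact form of Dep, the score name WS and the set name C(W_i) (the vertices associated with W_i) are assumptions, not text recovered from the patent.

```latex
\begin{aligned}
\mathrm{Dep}(W_i, W_j) &= \frac{1}{b^{\mathrm{len}(W_i, W_j)}} \\
f_{\mathrm{grav}}(W_i, W_j) &= \frac{\mathrm{tfidf}(W_i)\,\mathrm{tfidf}(W_j)}{d^{2}} \\
\mathrm{weight}(W_i, W_j) &= \mathrm{Dep}(W_i, W_j)\, f_{\mathrm{grav}}(W_i, W_j) \\
WS(W_i) &= (1-\eta) + \eta \sum_{W_j \in C(W_i)}
  \frac{\mathrm{weight}(W_j, W_i)}{\sum_{W_k \in C(W_j)} \mathrm{weight}(W_j, W_k)}\, WS(W_j)
\end{aligned}
```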
Optionally, the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating, through a fully connected layer in the intelligent contract key term extraction model, the similarity between the keyword vector set and the text word vector set from which the feature vectors have been extracted, thereby obtaining the similarity set.
Optionally, calculating the similarity between the keyword vector set and the text word vector set after the feature vectors are extracted includes:
Sim_topic = Pearson(TP_S, TP_T)
wherein TP_S denotes the feature vectors in the keyword vector set, and TP_T denotes the feature vectors in the text word vector set.
In addition, to achieve the above object, the present invention further provides an artificial intelligence based contract key term extraction apparatus, including a memory and a processor, the memory storing therein an artificial intelligence based contract key term extraction program operable on the processor, the artificial intelligence based contract key term extraction program implementing the following steps when executed by the processor:
receiving a contract text, and performing a preprocessing operation of stop word removal and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set from the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
Optionally, the pre-processing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text includes:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
Optionally, the extracting, by using a keyword extraction algorithm, a keyword set in the standard contract text includes:
calculating the dependency relevance of any two feature words W_i and W_j in the standard contract text:
wherein Dep(W_i, W_j) denotes the dependency relevance of feature words W_i and W_j, len(W_i, W_j) denotes the dependency path length between W_i and W_j, and b is a hyper-parameter;
calculating the gravity between feature words W_i and W_j:
wherein f_grav(W_i, W_j) denotes the gravity between feature words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of W_i and W_j respectively, TF denotes term frequency, IDF denotes inverse document frequency, and d is the Euclidean distance between the word vectors of W_i and W_j;
obtaining the association strength between feature words W_i and W_j from the dependency relevance and the gravity:
weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
calculating the importance score of feature word W_i from the association strength:
wherein η is the damping coefficient, and the formula runs over the set of vertices related to vertex W_i;
and obtaining the keyword set in the standard contract text according to the importance scores of the feature words.
Optionally, the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating, through a fully connected layer in the intelligent contract key term extraction model, the similarity between the keyword vector set and the text word vector set from which the feature vectors have been extracted, thereby obtaining the similarity set.
Optionally, calculating the similarity between the keyword vector set and the text word vector set after the feature vectors are extracted includes:
Sim_topic = Pearson(TP_S, TP_T)
wherein TP_S denotes the feature vectors in the keyword vector set, and TP_T denotes the feature vectors in the text word vector set.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having an artificial intelligence based contract key term extraction program stored thereon, the artificial intelligence based contract key term extraction program being executable by one or more processors to implement the steps of the artificial intelligence based contract key term extraction method as described above.
With the artificial-intelligence-based contract key term extraction method and device and the computer-readable storage medium provided by the invention, when a user extracts contract key terms, the user's contract text is received and preprocessed, and the key terms of the contract text are obtained by combining the key contract terms taken from the contract term information base with the pre-constructed intelligent contract key term extraction model, so that an efficient extraction result can be presented to the user.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for extracting key terms of a contract based on artificial intelligence according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an internal structure of an artificial intelligence-based contract key term extraction apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of an artificial intelligence based contract key term extraction program in an artificial intelligence based contract key term extraction apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a contract key term extraction method based on artificial intelligence. Referring to fig. 1, a schematic flow chart of a contract key term extraction method based on artificial intelligence according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for extracting contract key terms based on artificial intelligence includes:
S1: receiving the contract text, and performing a preprocessing operation of stop word removal and word segmentation on the contract text to obtain a standard contract text.
Stop words are function words in text data that have no practical meaning. They have no influence on text classification but occur with high frequency, and therefore degrade the effect of text classification. Stop words include common pronouns, prepositions and the like; examples of common stop words are "at" and "however". Preferably, the pre-constructed stop word list is matched against the words in the contract text one by one to identify the stop words in the contract text, and those stop words are deleted; the pre-constructed stop word list is downloaded from a web page.
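The stop word deletion step can be sketched as a set-membership filter. This is a minimal Python sketch assuming the contract text is already available as a list of words and the stop word list has been loaded into memory; the function name remove_stop_words and the sample STOP_WORDS list are hypothetical, not taken from the patent.

```python
def remove_stop_words(words, stop_words):
    """Delete every word that appears in the pre-constructed stop word list.

    `words` is assumed to be an already tokenized list of contract words.
    """
    stop_set = set(stop_words)  # constant-time membership test per word
    return [w for w in words if w not in stop_set]


# Hypothetical stop word list, for illustration only.
STOP_WORDS = ["at", "however", "the", "of"]
```

For example, remove_stop_words(["the", "payment", "of", "rent"], STOP_WORDS) returns ["payment", "rent"]. Converting the list to a set keeps each lookup constant-time, which matters for long contracts.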
Further, the word segmentation in the invention comprises: matching the words of the contract text from which the stop words have been removed against the entries in a dictionary through a preset strategy to obtain the feature words of the contract text, and separating the feature words with spaces to complete the word segmentation and obtain the standard contract text. Preferably, the preset strategy is the forward maximum matching method, whose idea is to match runs of consecutive characters in the text to be segmented against a vocabulary from left to right, trying the longest run first, and to cut out a word whenever a match succeeds.
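Forward maximum matching can be illustrated with a short sketch. It assumes the dictionary is an in-memory set of entries and that an unmatched single character is emitted as its own word so the scan always advances; forward_maximum_match and to_standard_text are hypothetical names, and a segmenter for real Chinese contracts would need a far larger dictionary.

```python
def forward_maximum_match(text, dictionary, max_len=None):
    """Segment `text` left to right with forward maximum matching (FMM).

    At each position the longest window is tried first; if it is a
    dictionary entry it is cut off as a word, otherwise the window shrinks
    by one character. A lone unmatched character becomes its own word.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Longest candidate first, shrinking until a match or a single char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_standard_text(text, dictionary):
    """Join the segmented feature words with spaces (the standard contract text)."""
    return " ".join(forward_maximum_match(text, dictionary))
```

With the dictionary {"pay", "payment", "date"}, forward_maximum_match("paymentdate", ...) returns ["payment", "date"]: at each position the longest dictionary entry wins over the shorter "pay".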
S2: extracting a keyword set from the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set.
In a preferred embodiment of the present invention, the keyword extraction algorithm includes:
calculating the dependency relevance of any two feature words W_i and W_j in the standard contract text:
wherein Dep(W_i, W_j) denotes the dependency relevance of feature words W_i and W_j, len(W_i, W_j) denotes the dependency path length between W_i and W_j, and b is a hyper-parameter;
calculating the gravity between feature words W_i and W_j:
wherein f_grav(W_i, W_j) denotes the gravity between feature words W_i and W_j, tfidf(W_i) and tfidf(W_j) denote the TF-IDF values of W_i and W_j respectively, TF denotes term frequency, IDF denotes inverse document frequency, and d is the Euclidean distance between the word vectors of W_i and W_j;
obtaining the association strength between feature words W_i and W_j from the dependency relevance and the gravity:
weight(W_i, W_j) = Dep(W_i, W_j) * f_grav(W_i, W_j)
calculating the importance score of feature word W_i from the association strength:
and obtaining the keyword set in the standard contract text according to the importance scores of the feature words.
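The keyword extraction steps can be sketched as a weighted TextRank over the feature words. The formula images did not survive extraction, so this sketch assumes Dep(W_i, W_j) = 1 / b^len(W_i, W_j), f_grav(W_i, W_j) = tfidf(W_i) * tfidf(W_j) / d², and the usual TextRank update with damping coefficient η. It further assumes the graph edges (pairs of associated words) and their dependency path lengths come from a dependency parser; the input layout, the defaults b = 2 and η = 0.85, and the function names are all hypothetical.

```python
import math
from collections import defaultdict


def keyword_scores(dep_len, tfidf, vectors, b=2.0, eta=0.85, iterations=50):
    """Importance scores of feature words via a weighted TextRank (sketch).

    dep_len: {(w_i, w_j): dependency path length}, each pair an undirected edge
    tfidf:   {word: TF-IDF value}
    vectors: {word: word vector}, used for the Euclidean distance d
    """
    weight = defaultdict(dict)
    for (wi, wj), length in dep_len.items():
        d = math.dist(vectors[wi], vectors[wj])  # Euclidean distance
        if d == 0:
            continue  # identical vectors would make the gravity infinite
        dep = 1.0 / (b ** length)                # assumed Dep form
        grav = tfidf[wi] * tfidf[wj] / d ** 2    # word "gravity"
        weight[wi][wj] = weight[wj][wi] = dep * grav  # association strength

    scores = {w: 1.0 for w in weight}
    for _ in range(iterations):
        new_scores = {}
        for wi in weight:
            total = 0.0
            for wj, w_ji in weight[wi].items():
                # Share of W_j's total edge weight that flows to W_i.
                total += w_ji / sum(weight[wj].values()) * scores[wj]
            new_scores[wi] = (1 - eta) + eta * total
        scores = new_scores
    return scores


def top_keywords(scores, k):
    """The k feature words with the highest importance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

top_keywords(keyword_scores(...), k) then yields the k highest-scoring feature words as the keyword set. Because every vertex in the graph has at least one edge and distributes its score in proportion to its edge weights, the total score stays equal to the number of vertices from one iteration to the next.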
Preferably, the present invention converts the keyword set into word vectors using one-hot representation. All words in a corpus are extracted to construct a dictionary, and each word is represented by a vector whose dimension equals the size of the dictionary: the dimension corresponding to the current word has the value 1, and all other dimensions have the value 0.
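One-hot encoding as described can be sketched in a few lines. This assumes the dictionary assigns indices in first-seen order over the corpus words; the helper names are hypothetical, and plain Python lists stand in for a vector type.

```python
def build_dictionary(corpus_words):
    """Map each distinct corpus word to an index, in first-seen order."""
    index = {}
    for word in corpus_words:
        index.setdefault(word, len(index))
    return index


def one_hot(word, index):
    """Vector of length len(index): 1 at the word's position, 0 elsewhere."""
    vec = [0] * len(index)
    vec[index[word]] = 1
    return vec


def to_vector_set(words, index):
    """Convert a keyword set (or a key-term text set) into one-hot vectors."""
    return [one_hot(w, index) for w in words]
```

For the corpus ["rent", "payment", "rent", "date"], the dictionary is {"rent": 0, "payment": 1, "date": 2} and one_hot("date", ...) is [0, 0, 1]. Note that any two distinct one-hot vectors are exactly √2 apart in Euclidean distance.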
S3: acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set.
In a preferred embodiment of the present invention, the contract term information base is a database that combines contract information obtained from different enterprises with contract information downloaded from professional contract websites. The predetermined key contract terms include transaction amount, transaction time, transaction mode, transaction object, and the like. Preferably, the text set of the key contract terms is converted into the text word vector set in the same way that the keyword set is converted into the word vector set.
S4: inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
In a preferred embodiment of the present invention, the pre-constructed intelligent contract key term extraction model includes an input layer, a convolutional neural network (CNN), and an output layer. A CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a local coverage range. Its basic structure comprises two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which local features are extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The other is the feature mapping layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and all neurons on the plane share the same weights. Preferably, in the present invention, the CNN includes a convolutional layer, a pooling layer, and a fully connected layer.
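The convolution and pooling stages can be illustrated on a single vector without any deep learning library. This is a minimal, untrained sketch: the ReLU activation, the valid (unpadded) convolution, the non-overlapping max pooling and the kernel values are assumptions, since the description does not specify them, and a real model would learn its kernels during training.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in CNN layers) with ReLU.

    The output has len(signal) - len(kernel) + 1 entries, so the vector shrinks.
    """
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        acc = sum(signal[start + t] * kernel[t] for t in range(k)) + bias
        out.append(max(0.0, acc))  # ReLU activation
    return out


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def encode(vector, kernel, pool_size=2):
    """Convolution then pooling: the feature vector passed to the similarity step."""
    return max_pool1d(conv1d_valid(vector, kernel), pool_size)
```

encode([1, 2, 3, 4], [0.5, 0.5]) convolves the 4-dimensional input down to [1.5, 2.5, 3.5] and pools it to the feature vector [2.5, 3.5], showing how each stage shortens the vector.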
Preferably, in the present invention, the keyword vector set and the text word vector set are fed into the input layer; the convolutional layer performs dimension reduction on them; the pooling layer extracts feature vectors from the reduced vector sets; and the fully connected layer calculates the similarity between the keyword vector set and the text word vector set from the extracted feature vectors, yielding the similarity set. When the similarity between a keyword vector and a text word vector is greater than a preset threshold, the corresponding keyword vector is taken as a key term of the contract text and is output through the output layer, completing the extraction of the key terms of the contract text. Preferably, the preset threshold is 0.8, and the similarity is calculated as follows:
Sim_topic = Pearson(TP_S, TP_T)
wherein TP_S denotes the feature vectors in the keyword vector set, and TP_T denotes the feature vectors in the text word vector set.
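The similarity step and the 0.8 threshold can be sketched as follows. It assumes each keyword and each key contract term has already been reduced to a feature vector of equal length, and that a keyword is kept if its similarity to any key term exceeds the threshold; the function names, and returning 0.0 for a constant vector (where Pearson correlation is undefined), are choices made for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant vector; treated as no match here
    return cov / (sx * sy)


def select_key_terms(keyword_features, term_features, threshold=0.8):
    """Keep each keyword whose similarity to some key-term vector exceeds the threshold.

    keyword_features: {keyword: TP_S feature vector}
    term_features:    list of TP_T feature vectors of the key contract terms
    """
    selected = []
    for keyword, tp_s in keyword_features.items():
        if any(pearson(tp_s, tp_t) > threshold for tp_t in term_features):
            selected.append(keyword)
    return selected
```

select_key_terms({"rent": [1, 2, 3], "party": [3, 2, 1]}, [[2, 4, 6.5]]) returns ["rent"]: the first vector is almost perfectly correlated with the key-term vector, while the second is negatively correlated.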
The highlighting of the key terms in the preset manner may include, for example, displaying the key terms in bold, underlined, or in different colors.
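Highlighting can be sketched as a single regular-expression substitution. The sketch assumes HTML-style markup for the preset manner (<b> for bold, <u> for underline) and plain substring matching, which suits Chinese text without word boundaries; the function name highlight is hypothetical.

```python
import re


def highlight(text, key_terms, start="<b>", end="</b>"):
    """Wrap every occurrence of each key term in markup (bold by default).

    Longer terms are tried first, so a term contained in a longer key term
    does not split or double-wrap the longer match.
    """
    if not key_terms:
        return text
    ordered = sorted(key_terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: start + m.group(0) + end, text)
```

highlight("The rent is due at month end.", ["rent"]) returns "The <b>rent</b> is due at month end."; passing "<u>" and "</u>" underlines instead. Because matching is by substring, a term embedded in a longer word is also highlighted.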
The invention also provides a contract key clause extraction device based on artificial intelligence. Referring to fig. 2, a schematic diagram of an internal structure of an artificial intelligence based contract key term extraction apparatus according to an embodiment of the present invention is shown.
In this embodiment, the contract key term extraction device 1 based on artificial intelligence may be a PC (personal computer), or a terminal device such as a smart phone, a tablet computer, a portable computer, or the like, or may be a server or the like. The artificial intelligence based contract key term extraction device 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the artificial intelligence based contract key term extraction apparatus 1 in some embodiments, such as a hard disk of the artificial intelligence based contract key term extraction apparatus 1. The memory 11 may also be an external storage device of the artificial intelligence based contract key term extraction apparatus 1 in other embodiments, such as a plug-in hard disk provided on the artificial intelligence based contract key term extraction apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the artificial intelligence based contract key term extraction apparatus 1. The memory 11 can be used not only for storing application software installed in the artificial intelligence-based contract key term extraction apparatus 1 and various types of data, such as the code of the artificial intelligence-based contract key term extraction program 01, etc., but also for temporarily storing data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip, is configured to execute program code stored in the memory 11 or process data, such as executing the artificial intelligence based contract key term extraction program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. Among them, the display, which may also be appropriately referred to as a display screen or a display unit, displays information processed in the artificial intelligence based contract key term extraction apparatus 1 and a user interface for visualization.
While fig. 2 shows only the artificial intelligence based contract key term extraction apparatus 1 having the components 11-14 and the artificial intelligence based contract key term extraction program 01, those skilled in the art will appreciate that the configuration shown in fig. 2 does not constitute a limitation of the artificial intelligence based contract key term extraction apparatus 1, which may include fewer or more components than shown, combine some components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores therein an artificial intelligence-based contract key term extraction program 01; the processor 12 implements the following steps when executing the artificial intelligence based contract key term extraction program 01 stored in the memory 11:
Step one: receiving a contract text, and performing a preprocessing operation of stop word removal and word segmentation on the contract text to obtain a standard contract text.
Stop words are function words in text data that have no practical meaning. They have no influence on text classification but occur with high frequency, and therefore degrade the effect of text classification. Stop words include common pronouns, prepositions and the like; examples of common stop words are "at" and "however". Preferably, the pre-constructed stop word list is matched against the words in the contract text one by one to identify the stop words in the contract text, and those stop words are deleted; the pre-constructed stop word list is downloaded from a web page.
Further, the word segmentation in the invention comprises: matching the words of the contract text from which the stop words have been removed against the entries in a dictionary through a preset strategy to obtain the feature words of the contract text, and separating the feature words with spaces to complete the word segmentation and obtain the standard contract text. Preferably, the preset strategy is the forward maximum matching method, whose idea is to match runs of consecutive characters in the text to be segmented against a vocabulary from left to right, trying the longest run first, and to cut out a word whenever a match succeeds.
Step two: extracting a keyword set from the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set.
In a preferred embodiment of the present invention, the keyword extraction algorithm includes:
calculating any two characteristic words W in the standard contract texti and WjDependence relevance of (2):
wherein, Dep (W)i,Wj) Represents the feature word Wi and WjDependence degree of (2), len (W)i,Wj) Represents the aboveCharacteristic word Wi and WjB is a hyper-parameter;
calculating the feature word Wi and WjThe gravity of (2):
wherein ,fgrav(Wi,Wj) Expression of characteristic word Wi and WjGravitation of, tfidf (W)i) Expression of characteristic word WiTF-IDF value of (1), tfidf (W)j) Expression of characteristic word WjTF-IDF value of (1), TF represents word frequency, IDF represents inverse document frequency index, d is a feature word Wi and WjThe euclidean distance between the word vectors of (a);
obtaining the feature word W according to the dependency relevance and the gravityi and WjThe strength of the association between:
weight(Wi,Wj)=Dep(Wi,Wj)*fgrav(Wi,Wj)
calculating the importance score of each feature word $W_i$ from the association strengths;
obtaining the keyword set of the standard contract text according to the importance scores of the feature words.
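The text gives the definitions but does not reproduce the formulas for $\mathrm{Dep}$, $f_{\mathrm{grav}}$, or the importance score, so the sketch below fills them with assumed forms: $\mathrm{Dep} = 1/b^{\mathrm{len}}$, a Newton-style gravity $f_{\mathrm{grav}} = \mathrm{tfidf}(W_i)\,\mathrm{tfidf}(W_j)/d^2$, and a weighted TextRank importance score with damping factor 0.85. Only $\mathrm{weight} = \mathrm{Dep} \cdot f_{\mathrm{grav}}$ is taken from the text. Dependency path lengths are supplied as input (a parser is out of scope), and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def dependency_relevance(path_len: int, b: float = 2.0) -> float:
    # ASSUMED form Dep = 1 / b**len: shorter dependency paths give stronger relevance.
    return 1.0 / (b ** path_len)


def gravity(tfidf_i: float, tfidf_j: float,
            vec_i: Sequence[float], vec_j: Sequence[float]) -> float:
    # ASSUMED Newton-style form: product of TF-IDF "masses" over squared distance d.
    d = math.dist(vec_i, vec_j)
    return tfidf_i * tfidf_j / (d * d) if d > 0 else 0.0  # identical vectors: zero weight


def association_weights(words: List[str], dep_len: Dict[Pair, int],
                        tfidf: Dict[str, float], vectors: Dict[str, Sequence[float]],
                        b: float = 2.0) -> Dict[Pair, float]:
    """weight(Wi, Wj) = Dep(Wi, Wj) * f_grav(Wi, Wj), stored for both directions."""
    weights: Dict[Pair, float] = {}
    for wi, wj in combinations(words, 2):
        path = dep_len.get((wi, wj), dep_len.get((wj, wi)))
        if path is None:
            continue  # no dependency path between the two words: no edge
        w = dependency_relevance(path, b) * gravity(tfidf[wi], tfidf[wj], vectors[wi], vectors[wj])
        weights[(wi, wj)] = weights[(wj, wi)] = w
    return weights


def importance_scores(words: List[str], weights: Dict[Pair, float],
                      damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    # ASSUMED weighted-TextRank update:
    # WS(Wi) = (1 - eta) + eta * sum_j [weight(Wj, Wi) / sum_k weight(Wj, Wk)] * WS(Wj)
    neighbours = {w: [v for v in words if (w, v) in weights] for w in words}
    out_sum = {w: sum(weights[(w, v)] for v in neighbours[w]) for w in words}
    score = {w: 1.0 for w in words}
    for _ in range(iterations):
        # The whole new dict is built from the previous iteration's scores.
        score = {
            wi: (1 - damping) + damping * sum(
                weights[(wj, wi)] / out_sum[wj] * score[wj]
                for wj in neighbours[wi] if out_sum[wj] > 0)
            for wi in words
        }
    return score


def extract_keywords(words: List[str], dep_len: Dict[Pair, int],
                     tfidf: Dict[str, float], vectors: Dict[str, Sequence[float]],
                     top_k: int = 3, b: float = 2.0) -> List[str]:
    """Rank feature words by importance score and keep the top_k as keywords."""
    scores = importance_scores(words, association_weights(words, dep_len, tfidf, vectors, b))
    return sorted(words, key=lambda w: scores[w], reverse=True)[:top_k]
```

A fixed iteration count keeps the sketch short; a production version would more likely stop once scores change by less than a tolerance.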
Preferably, the present invention converts the keyword set into word vectors using one-hot representation. All words in a corpus are extracted to build a dictionary, and each word is represented by a vector whose dimension equals the size of the dictionary: the component corresponding to that word is 1 and all other components are 0.
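One-hot conversion as described can be sketched as follows. The dictionary assigns indices in first-seen order, which is an assumption (any fixed ordering works), and the function names are hypothetical. A word absent from the dictionary raises `KeyError` here; how out-of-vocabulary words are handled is not specified in the text.

```python
from typing import Dict, Iterable, List


def build_dictionary(corpus: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Assign each distinct word in the corpus a fixed index (first-seen order)."""
    index: Dict[str, int] = {}
    for doc in corpus:
        for word in doc:
            index.setdefault(word, len(index))
    return index


def one_hot(word: str, index: Dict[str, int]) -> List[int]:
    """Vector of dictionary size with a 1 at the word's index and 0 elsewhere."""
    vec = [0] * len(index)
    vec[index[word]] = 1
    return vec


def to_vector_set(words: Iterable[str], index: Dict[str, int]) -> List[List[int]]:
    """Convert a keyword set (or clause word set) into a list of one-hot vectors."""
    return [one_hot(w, index) for w in words]
```

Under pure one-hot encoding, any two distinct words are the same Euclidean distance apart (√2), which is worth keeping in mind wherever the distance $d$ between word vectors is used.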
Step three: obtain a text set of predetermined key contract clauses from a contract clause information base, and convert that text set into a text word vector set.
In a preferred embodiment of the present invention, the contract clause information base is a database that combines contract information obtained from different enterprises with contract information downloaded from professional contract websites. The predetermined key contract clauses include transaction amount, transaction time, transaction mode, transaction object, and the like. Preferably, the text set of key contract clauses is converted into the text word vector set using the same method used to convert the keyword set into word vectors.
Step four: input the keyword vector set and the text word vector set into a pre-constructed intelligent contract key clause extraction model to obtain the set of similarities between them. When the similarity set contains a similarity greater than a preset threshold, the corresponding keyword vector is taken as a key clause of the contract text, output through the output layer of the model, and highlighted in a preset manner, which completes the extraction of key clauses from the contract text.
In a preferred embodiment of the present invention, the pre-constructed intelligent contract key clause extraction model comprises an input layer, a convolutional neural network (CNN), and an output layer. A CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a local receptive field. Its basic structure comprises two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which a local feature is extracted; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computational layer of the network consists of several feature maps, each feature map is a plane, and all neurons on a plane share the same weights. Preferably, in the present invention, the CNN comprises a convolutional layer, a pooling layer, and a fully connected layer.
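The text names the layers but not their sizes, kernels, activation, or training procedure. The standard-library sketch below shows one way a convolution, pooling, and fully connected stack could map an input vector to a shorter feature vector. The kernel, stride, pool size, ReLU activation, and weights are all hypothetical and untrained, and in this sketch the fully connected layer only projects; the patent attributes the similarity computation to that layer, which is handled separately here.

```python
from dataclasses import dataclass
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as deep-learning libraries compute it)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]


def max_pool1d(x: Sequence[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(x[i:i + size]) for i in range(0, len(x), size)]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    if any(len(row) != len(x) for row in weights) or len(weights) != len(bias):
        raise ValueError("weight/bias shapes do not match the pooled input")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


@dataclass
class SharedEncoder:
    """Hypothetical untrained weights, applied identically to keyword and clause vectors."""
    kernel: List[float]
    stride: int
    pool_size: int
    fc_weights: List[List[float]]
    fc_bias: List[float]

    def encode(self, vec: Sequence[float]) -> List[float]:
        conv = [max(0.0, v) for v in conv1d(vec, self.kernel, self.stride)]  # ReLU (assumed)
        pooled = max_pool1d(conv, self.pool_size)
        return dense(pooled, self.fc_weights, self.fc_bias)
```

Sharing one encoder between both vector sets keeps the two feature spaces comparable, which any subsequent similarity measure relies on.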
Preferably, in the present invention, the keyword vector set and the text word vector set are fed into the input layer; the convolutional layer performs dimensionality reduction on both sets; the pooling layer extracts feature vectors from the reduced sets; and the fully connected layer calculates the similarity between the resulting feature vectors of the two sets to obtain the similarity set. When the similarity between a keyword vector and a text word vector is greater than the preset threshold, the corresponding keyword vector is taken as a key clause of the contract text and output through the output layer, which completes the extraction of key clauses from the contract text. Preferably, the preset threshold is 0.8, and the similarity is calculated as:
$$\mathrm{Sim}_{\mathrm{topic}} = \mathrm{Pearson}(TP_S, TP_T)$$
where $TP_S$ is a feature vector in the keyword vector set and $TP_T$ is a feature vector in the text word vector set.
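The similarity rule can be sketched with the Pearson correlation coefficient, $r = \frac{\sum_k (x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_k (x_k-\bar{x})^2}\,\sqrt{\sum_k (y_k-\bar{y})^2}}$, and the 0.8 threshold from the text. Pairing each keyword with its single best-matching clause, and returning 0 for constant vectors (where Pearson is undefined), are assumptions of this sketch; the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length feature vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated as no correlation (assumption)
    return cov / (sx * sy)


def select_key_clauses(keyword_feats: Dict[str, Sequence[float]],
                       clause_feats: Dict[str, Sequence[float]],
                       threshold: float = 0.8) -> Dict[str, str]:
    """Map each keyword to its best clause when that similarity exceeds the threshold."""
    selected: Dict[str, str] = {}
    for kw, kv in keyword_feats.items():
        best_clause: Optional[str] = None
        best_sim = -1.0
        for name, cv in clause_feats.items():
            sim = pearson(kv, cv)
            if sim > best_sim:
                best_clause, best_sim = name, sim
        if best_clause is not None and best_sim > threshold:
            selected[kw] = best_clause
    return selected
```

Pearson is invariant to shifting and scaling either vector, so it compares the shape of two feature vectors rather than their magnitudes.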
Highlighting the key clauses in the preset manner may include, for example, displaying them in bold, underlined, or in a different color.
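One way to apply the preset highlighting is to wrap each extracted clause in markup. The sketch below assumes HTML output; the tag choices, the `highlight` name, and the longest-phrase-first matching are illustrative assumptions. Text outside and inside the matches is HTML-escaped so the contract content cannot inject markup.

```python
import html
import re
from typing import Iterable


def highlight(text: str, key_clauses: Iterable[str], style: str = "bold") -> str:
    """Wrap every occurrence of a key clause in the markup for the chosen style."""
    tags = {
        "bold": ("<b>", "</b>"),
        "underline": ("<u>", "</u>"),
        "color": ('<span style="color:red">', "</span>"),
    }
    open_tag, close_tag = tags[style]
    # Longest phrases first, so an overlapping shorter clause does not win the match.
    phrases = sorted({p for p in key_clauses if p}, key=len, reverse=True)
    if not phrases:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        out.append(open_tag + html.escape(m.group(0)) + close_tag)
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```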
Alternatively, in other embodiments, the artificial intelligence based contract key clause extraction program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present invention. A module in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the artificial intelligence based contract key clause extraction program in the artificial intelligence based contract key clause extraction apparatus.
For example, Fig. 3 is a schematic diagram of the program modules of the artificial intelligence based contract key clause extraction program in an embodiment of the artificial intelligence based contract key clause extraction apparatus of the present invention. In this embodiment, the program may be divided into a text processing module 10, a text conversion module 20, and an extraction module 30, as follows:
The text processing module 10 is configured to receive a contract text and perform the preprocessing operations of stop word removal and word segmentation on it to obtain a standard contract text.
The text conversion module 20 is configured to extract a keyword set from the standard contract text using a keyword extraction algorithm, convert the keyword set into a word vector set to obtain a keyword vector set, obtain a text set of predetermined key contract clauses from a contract clause information base, and convert that text set into a text word vector set.
The extraction module 30 is configured to input the keyword vector set and the text word vector set into a pre-constructed intelligent contract key clause extraction model to obtain the similarity set between them; when the similarity set contains a similarity greater than a preset threshold, take the corresponding keyword vector as a key clause of the contract text, output it through the output layer of the model, and highlight it in a preset manner, thereby completing the extraction of key clauses from the contract text.
The functions or operation steps implemented by the text processing module 10, the text conversion module 20, the extraction module 30, and the other program modules when executed are substantially the same as those of the embodiments above and are not repeated here.
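The three-module split can be expressed as components with injected strategies, so that each module can be tested in isolation. The class names mirror modules 10, 20 and 30, but the callables (segmenter, keyword extractor, embedding, similarity) are hypothetical stand-ins for the steps described earlier, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = List[float]


@dataclass
class TextProcessingModule:  # text processing module 10
    segment: Callable[[str], List[str]]
    stop_words: Set[str] = field(default_factory=set)

    def run(self, contract_text: str) -> List[str]:
        """Receive a contract text and return its segmented, stop-word-free words."""
        return [w for w in self.segment(contract_text) if w not in self.stop_words]


@dataclass
class TextConversionModule:  # text conversion module 20
    extract_keywords: Callable[[List[str]], List[str]]
    embed: Callable[[str], Vector]
    clause_base: Sequence[str]  # predetermined key clauses from the information base

    def run(self, words: List[str]) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
        """Return (keyword vector set, text word vector set)."""
        keywords = self.extract_keywords(words)
        return ({k: self.embed(k) for k in keywords},
                {c: self.embed(c) for c in self.clause_base})


@dataclass
class ExtractionModule:  # extraction module 30
    similarity: Callable[[Vector, Vector], float]
    threshold: float = 0.8

    def run(self, keyword_vecs: Dict[str, Vector],
            clause_vecs: Dict[str, Vector]) -> List[str]:
        """Keep keywords whose similarity to some key clause exceeds the threshold."""
        return [k for k, kv in keyword_vecs.items()
                if any(self.similarity(kv, cv) > self.threshold for cv in clause_vecs.values())]


def run_pipeline(text: str, m10: TextProcessingModule,
                 m20: TextConversionModule, m30: ExtractionModule) -> List[str]:
    """Chain modules 10 -> 20 -> 30 on one contract text."""
    return m30.run(*m20.run(m10.run(text)))
```

For a quick check, `str.split` can serve as the segmenter, an order-preserving de-duplication as the keyword extractor, and a dot product over one-hot vectors as the similarity.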
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which an artificial intelligence-based contract key term extraction program is stored, where the artificial intelligence-based contract key term extraction program is executable by one or more processors to implement the following operations:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the apparatus and method for extracting key terms of a contract based on artificial intelligence, and will not be described herein again.
It should be noted that the serial numbers of the embodiments of the present invention above are for description only and do not indicate the relative merits of the embodiments. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A contract key term extraction method based on artificial intelligence is characterized by comprising the following steps:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
2. The artificial intelligence based contract key term extraction method as claimed in claim 1, wherein performing the preprocessing operations of stop word removal and word segmentation on the contract text to obtain a standard contract text comprises:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
3. The artificial intelligence based contract key term extraction method according to claim 2, wherein said extracting a keyword set in said standard contract text using a keyword extraction algorithm comprises:
calculating the dependency relevance $\mathrm{Dep}(W_i, W_j)$ between any two feature words $W_i$ and $W_j$ in the standard contract text, where $\mathrm{len}(W_i, W_j)$ denotes the dependency path length between $W_i$ and $W_j$, and $b$ is a hyperparameter;
calculating the gravity $f_{\mathrm{grav}}(W_i, W_j)$ between the feature words $W_i$ and $W_j$, where $\mathrm{tfidf}(W_i)$ and $\mathrm{tfidf}(W_j)$ are the TF-IDF values of $W_i$ and $W_j$, TF denotes term frequency, IDF denotes inverse document frequency, and $d$ is the Euclidean distance between the word vectors of $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{\mathrm{grav}}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
4. The artificial intelligence based contract key term extraction method according to any one of claims 1 to 3, wherein the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating, through a fully connected layer in the intelligent contract key clause extraction model, the similarity between the keyword vector set and the text word vector set after feature vector extraction, thereby obtaining the similarity set.
5. The artificial intelligence based contract key term extraction method according to claim 4, wherein said calculating the similarity between the keyword vector set and the text word vector set after extracting feature vectors comprises:
$$\mathrm{Sim}_{\mathrm{topic}} = \mathrm{Pearson}(TP_S, TP_T)$$
where $TP_S$ is a feature vector in the keyword vector set and $TP_T$ is a feature vector in the text word vector set.
6. An artificial intelligence based contract key term extraction apparatus, the apparatus comprising a memory and a processor, the memory having stored thereon an artificial intelligence based contract key term extraction program operable on the processor, the artificial intelligence based contract key term extraction program when executed by the processor implementing the steps of:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set; when the similarity set contains a similarity greater than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset manner to complete the extraction of the key terms of the contract text.
7. The artificial intelligence based contract key term extraction device as claimed in claim 6, wherein performing the preprocessing operations of stop word removal and word segmentation on the contract text to obtain a standard contract text comprises:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
8. The artificial intelligence based contract key term extraction apparatus according to claim 7, wherein said extracting a keyword set in said standard contract text using a keyword extraction algorithm comprises:
calculating the dependency relevance $\mathrm{Dep}(W_i, W_j)$ between any two feature words $W_i$ and $W_j$ in the standard contract text, where $\mathrm{len}(W_i, W_j)$ denotes the dependency path length between $W_i$ and $W_j$, and $b$ is a hyperparameter;
calculating the gravity $f_{\mathrm{grav}}(W_i, W_j)$ between the feature words $W_i$ and $W_j$, where $\mathrm{tfidf}(W_i)$ and $\mathrm{tfidf}(W_j)$ are the TF-IDF values of $W_i$ and $W_j$, TF denotes term frequency, IDF denotes inverse document frequency, and $d$ is the Euclidean distance between the word vectors of $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{\mathrm{grav}}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
9. The artificial intelligence based contract key term extraction device as claimed in any one of claims 6 to 8, wherein the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set comprises:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating, through a fully connected layer in the intelligent contract key clause extraction model, the similarity between the keyword vector set and the text word vector set after feature vector extraction, thereby obtaining the similarity set.
10. A computer-readable storage medium having stored thereon an artificial intelligence based contract key term extraction program executable by one or more processors to implement the steps of the artificial intelligence based contract key term extraction method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910873470.1A CN110765765B (en) | 2019-09-16 | 2019-09-16 | Contract key term extraction method, device and storage medium based on artificial intelligence |
PCT/CN2020/098950 WO2021051934A1 (en) | 2019-09-16 | 2020-06-29 | Method and apparatus for extracting key contract term on basis of artificial intelligence, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910873470.1A CN110765765B (en) | 2019-09-16 | 2019-09-16 | Contract key term extraction method, device and storage medium based on artificial intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110765765A true CN110765765A (en) | 2020-02-07 |
CN110765765B CN110765765B (en) | 2023-10-20 |
Family
ID=69329488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910873470.1A Active CN110765765B (en) | 2019-09-16 | 2019-09-16 | Contract key term extraction method, device and storage medium based on artificial intelligence |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110765765B (en) |
WO (1) | WO2021051934A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111666408A (en) * | 2020-05-26 | 2020-09-15 | 中国工商银行股份有限公司 | Method and device for screening and displaying important clauses |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743802A (en) * | 2021-09-08 | 2021-12-03 | 平安信托有限责任公司 | Work order intelligent matching method and device, electronic equipment and readable storage medium |
CN116070641B (en) * | 2023-03-13 | 2023-06-06 | 北京点聚信息技术有限公司 | Online interpretation method of electronic contract |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391963A (en) * | 2014-12-01 | 2015-03-04 | 北京中科创益科技有限公司 | Method for constructing correlation networks of keywords of natural language texts |
US9600231B1 (en) * | 2015-03-13 | 2017-03-21 | Amazon Technologies, Inc. | Model shrinking for embedded keyword spotting |
CN107122413A (en) * | 2017-03-31 | 2017-09-01 | 北京奇艺世纪科技有限公司 | A kind of keyword extracting method and device based on graph model |
CN107506347A (en) * | 2017-07-22 | 2017-12-22 | 长沙兔子代跑网络科技有限公司 | A kind of intelligence obtains the method and device for running chat record in generation |
WO2018077655A1 (en) * | 2016-10-24 | 2018-05-03 | Koninklijke Philips N.V. | Multi domain real-time question answering system |
CN109918635A (en) * | 2017-12-12 | 2019-06-21 | 中兴通讯股份有限公司 | A kind of contract text risk checking method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11514096B2 (en) * | 2015-09-01 | 2022-11-29 | Panjiva, Inc. | Natural language processing for entity resolution |
CN108319627A (en) * | 2017-02-06 | 2018-07-24 | 腾讯科技(深圳)有限公司 | Keyword extracting method and keyword extracting device |
CN109657227A (en) * | 2018-10-08 | 2019-04-19 | 平安科技(深圳)有限公司 | Contract feasibility determination method, equipment, storage medium and device |
CN110032632A (en) * | 2019-04-04 | 2019-07-19 | 平安科技(深圳)有限公司 | Intelligent customer service answering method, device and storage medium based on text similarity |
CN110163478B (en) * | 2019-04-18 | 2024-04-05 | 平安科技(深圳)有限公司 | Risk examination method and device for contract clauses |
-
2019
- 2019-09-16 CN CN201910873470.1A patent/CN110765765B/en active Active
-
2020
- 2020-06-29 WO PCT/CN2020/098950 patent/WO2021051934A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110765765B (en) | 2023-10-20 |
WO2021051934A1 (en) | 2021-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110222160B (en) | Intelligent semantic document recommendation method and device and computer readable storage medium | |
JP7302022B2 (en) | A text classification method, apparatus, computer readable storage medium and text classification program. | |
WO2020253042A1 (en) | Intelligent sentiment judgment method and device, and computer readable storage medium | |
CN114780727A (en) | Text classification method and device based on reinforcement learning, computer equipment and medium | |
CN110866098B (en) | Machine reading method and device based on transformer and lstm and readable storage medium | |
US11599727B2 (en) | Intelligent text cleaning method and apparatus, and computer-readable storage medium | |
CN110427480B (en) | Intelligent personalized text recommendation method and device and computer readable storage medium | |
CN110765761A (en) | Contract sensitive word checking method and device based on artificial intelligence and storage medium | |
CN110765765B (en) | Contract key term extraction method, device and storage medium based on artificial intelligence | |
CN111460090A (en) | Vector-based document retrieval method and device, computer equipment and storage medium | |
CN111241828A (en) | Intelligent emotion recognition method and device and computer readable storage medium | |
CN111460081B (en) | Answer generation method based on deep learning, electronic device and readable storage medium | |
CN110704687B (en) | Text layout method, text layout device and computer readable storage medium | |
CN110427453B (en) | Data similarity calculation method, device, computer equipment and storage medium | |
CN110502748B (en) | Text topic extraction method, device and computer readable storage medium | |
CN114398882A (en) | Document processing method, device, equipment and storage medium | |
CN110866042A (en) | Intelligent table query method and device and computer readable storage medium | |
CN113627797A (en) | Image generation method and device for employee enrollment, computer equipment and storage medium | |
CN110222144B (en) | Text content extraction method and device, electronic equipment and storage medium | |
CN113609847B (en) | Information extraction method, device, electronic equipment and storage medium | |
CN113360654B (en) | Text classification method, apparatus, electronic device and readable storage medium | |
CN112445862B (en) | Internet of things equipment data set construction method and device, electronic equipment and storage medium | |
CN112307175B (en) | Text processing method, text processing device, server and computer readable storage medium | |
WO2021139076A1 (en) | Intelligent text dialogue generation method and apparatus, and computer-readable storage medium | |
CN113434636A (en) | Semantic-based approximate text search method and device, computer equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |