CN110765765A - Contract key clause extraction method and device based on artificial intelligence and storage medium - Google Patents

Info

Publication number
CN110765765A
Authority
CN
China
Prior art keywords
contract
text
word
vector set
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910873470.1A
Other languages
Chinese (zh)
Other versions
CN110765765B (en)
Inventor
侯丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910873470.1A priority Critical patent/CN110765765B/en
Publication of CN110765765A publication Critical patent/CN110765765A/en
Priority to PCT/CN2020/098950 priority patent/WO2021051934A1/en
Application granted granted Critical
Publication of CN110765765B publication Critical patent/CN110765765B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The invention relates to an artificial intelligence technology, and discloses a contract key term extraction method based on artificial intelligence, which comprises the following steps: receiving a contract text, preprocessing the contract text to obtain a standard contract text, extracting a keyword set in the standard contract text, and converting the keyword set into a word vector set to obtain a keyword vector set; acquiring a text set of predetermined key contract terms, and converting the text set into a text word vector set; and inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, and taking the corresponding keyword vector as a key term of the contract text when the similarity set has similarity larger than a preset threshold value. The invention also provides a contract key clause extraction device based on artificial intelligence and a computer readable storage medium. The invention realizes the high-efficiency extraction of the key terms of the contract.

Description

Contract key clause extraction method and device based on artificial intelligence and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a contract key term extraction method and device based on artificial intelligence and a storage medium.
Background
With the advent of the network age, online contract texts have emerged and their number continues to grow sharply every day. Faced with such an enormous volume of contract text, it is important to extract the key terms of a contract efficiently. In current commercial contracts the terms are numerous, but most of them are formatted or templated terms, and the terms carrying important information are not highlighted, which hinders understanding of the contract. How to extract the key terms of a contract text more efficiently has therefore become a significant problem.
Disclosure of Invention
The invention provides a contract key term extraction method, a contract key term extraction device and a storage medium based on artificial intelligence, and mainly aims to present an efficient extraction result to a user when the user extracts the contract key term.
In order to achieve the above purpose, the invention provides a contract key term extraction method based on artificial intelligence, which comprises the following steps:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset mode to finish extraction of the contract text key term.
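The order of the four steps above can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the function name `extract_key_terms` and all of its parameters are hypothetical, each step is passed in as a callable, and the toy stand-ins at the bottom (lower-casing, whitespace splitting, character sets, Jaccard overlap) are not the preprocessing, vectorization, or similarity model described in this document.

```python
def extract_key_terms(contract_text, reference_terms, *, preprocess,
                      extract_keywords, vectorize, similarity, threshold=0.8):
    """Run the four steps in order; every callable is supplied by the caller."""
    # Step 1: preprocess the contract text into a standard contract text.
    standard_text = preprocess(contract_text)
    # Step 2: extract the keyword set and convert it into vectors.
    keywords = extract_keywords(standard_text)
    keyword_vectors = {k: vectorize(k) for k in keywords}
    # Step 3: convert the predetermined key contract terms into vectors.
    term_vectors = [vectorize(t) for t in reference_terms]
    # Step 4: keep every keyword whose similarity to some reference term
    # exceeds the preset threshold.
    return [k for k, kv in keyword_vectors.items()
            if any(similarity(kv, tv) > threshold for tv in term_vectors)]


# Toy stand-ins (hypothetical): character sets as "vectors" and
# Jaccard overlap as "similarity".
def jaccard(a, b):
    return len(a & b) / len(a | b)


result = extract_key_terms(
    "Amount Annex", ["amounts"],
    preprocess=str.lower, extract_keywords=str.split,
    vectorize=set, similarity=jaccard,
)
print(result)  # prints ['amount']
```

Passing the steps in as callables keeps the skeleton independent of any particular segmentation, ranking, or similarity implementation.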
Optionally, the pre-processing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text includes:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
Optionally, the extracting, by using a keyword extraction algorithm, a keyword set in the standard contract text includes:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure BDA0002203037780000021
wherein $\mathrm{Dep}(W_i, W_j)$ denotes the dependency relevance of the feature words $W_i$ and $W_j$, $\mathrm{len}(W_i, W_j)$ denotes the length of the dependency path between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
Figure BDA0002203037780000022
wherein $f_{grav}(W_i, W_j)$ denotes the gravity of the feature words $W_i$ and $W_j$, $\mathrm{tfidf}(W_i)$ denotes the TF-IDF value of the feature word $W_i$, $\mathrm{tfidf}(W_j)$ denotes the TF-IDF value of the feature word $W_j$, TF denotes the term frequency, IDF denotes the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{grav}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure BDA0002203037780000031
wherein
Figure BDA0002203037780000032
is the set associated with the vertex $W_i$, and $\eta$ is the damping coefficient;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
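The formulas for the dependency relevance and the gravity are available here only as figure references, so the following Python sketch rests on explicit assumptions: it ASSUMES that the dependency relevance decays as 1 / b^len and that the gravity has the Newton-like form tfidf(Wi) * tfidf(Wj) / d^2. Only the product weight = Dep * f_grav is taken from the text; the function names and the sample numbers are hypothetical.

```python
import math


def dependency_relevance(path_len, b=2.0):
    # ASSUMPTION: relevance decays exponentially with the path length.
    return 1.0 / (b ** path_len)


def euclidean(u, v):
    # Euclidean distance between two word vectors of equal length.
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))


def gravity(tfidf_i, tfidf_j, vec_i, vec_j):
    # ASSUMPTION: Newton-like form, with TF-IDF values playing the role of masses.
    d = euclidean(vec_i, vec_j)
    if d == 0:
        raise ValueError("identical word vectors: distance is zero")
    return tfidf_i * tfidf_j / (d ** 2)


def association_strength(path_len, tfidf_i, tfidf_j, vec_i, vec_j, b=2.0):
    # weight(Wi, Wj) = Dep(Wi, Wj) * f_grav(Wi, Wj), as stated in the text.
    return dependency_relevance(path_len, b) * gravity(tfidf_i, tfidf_j, vec_i, vec_j)


# Hypothetical inputs: path length 1, TF-IDF values 0.5 and 0.4, distance 5.
w = association_strength(1, 0.5, 0.4, [0.0, 0.0], [3.0, 4.0])
print(round(w, 6))  # prints 0.004
```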
Optionally, the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating the similarity between the keyword vector set and the text word vector set after extracting the feature vectors through a full connection layer in the intelligent contract key clause extraction model, thereby obtaining the similarity set.
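As a rough illustration of the convolution and pooling steps, the pure-Python sketch below applies a one-dimensional convolution followed by max pooling to a single vector. It is a simplification under stated assumptions: the averaging kernel `(0.5, 0.5)` and the pool size are hypothetical fixed values, whereas a trained model would learn its kernel weights, and the function names are not taken from this document.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation) over a list of numbers."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def max_pool(sequence, size):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(sequence[i:i + size]) for i in range(0, len(sequence), size)]


def extract_features(vector, kernel=(0.5, 0.5), pool_size=2):
    # ASSUMPTION: one fixed kernel stands in for learned convolution weights.
    return max_pool(conv1d(vector, kernel), pool_size)


features = extract_features([1.0, 0.0, 0.0, 2.0, 4.0, 0.0, 0.0])
print(features)  # prints [0.5, 3.0, 2.0]
```

The 7-dimensional input is reduced to 6 values by the convolution and to 3 values by the pooling, which is the sense in which these layers shorten the vectors before the similarity is computed.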
Optionally, the calculating the similarity between the keyword vector set and the text word vector set after extracting the feature vector includes:
$$Sim_{topic} = \mathrm{Pearson}(TP_S, TP_T)$$
wherein $TP_S$ is a feature vector in the keyword vector set, and $TP_T$ is a feature vector in the text word vector set.
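For reference, the Pearson correlation coefficient of two n-dimensional vectors is conventionally defined as below. The symbols x, y, n and the bars denoting means are notation introduced here for illustration and do not appear in the original text.

```latex
\mathrm{Pearson}(x, y) =
\frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
     {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2}\,\sqrt{\sum_{k=1}^{n} (y_k - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k,\quad
\bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k
```

With x = TP_S and y = TP_T, the value lies in [-1, 1], and values close to 1 indicate strongly linearly related feature vectors.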
In addition, to achieve the above object, the present invention further provides an artificial intelligence based contract key term extraction apparatus, including a memory and a processor, the memory storing therein an artificial intelligence based contract key term extraction program operable on the processor, the artificial intelligence based contract key term extraction program implementing the following steps when executed by the processor:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset mode to finish extraction of the contract text key term.
Optionally, the pre-processing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text includes:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
Optionally, the extracting, by using a keyword extraction algorithm, a keyword set in the standard contract text includes:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure BDA0002203037780000041
wherein $\mathrm{Dep}(W_i, W_j)$ denotes the dependency relevance of the feature words $W_i$ and $W_j$, $\mathrm{len}(W_i, W_j)$ denotes the length of the dependency path between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
Figure BDA0002203037780000042
wherein $f_{grav}(W_i, W_j)$ denotes the gravity of the feature words $W_i$ and $W_j$, $\mathrm{tfidf}(W_i)$ denotes the TF-IDF value of the feature word $W_i$, $\mathrm{tfidf}(W_j)$ denotes the TF-IDF value of the feature word $W_j$, TF denotes the term frequency, IDF denotes the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{grav}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure BDA0002203037780000051
wherein the set appearing in the above formula is the set associated with the vertex $W_i$, and $\eta$ is the damping coefficient;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
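The TF-IDF values that enter the gravity term can be computed in several ways, and the text does not say which variant is used. The Python sketch below ASSUMES one common variant, with TF as the relative frequency of a word in a document and IDF as log(N / document frequency); the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {word: score} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # ASSUMPTION: TF = count / length, IDF = log(N / document frequency).
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


docs = [["合同", "金额", "金额"], ["合同", "附件"]]
scores = tf_idf(docs)
print(scores[0])
```

In this toy corpus the word that occurs in every document receives a score of 0, while a word concentrated in one document receives a positive score.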
Optionally, the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating the similarity between the keyword vector set and the text word vector set after extracting the feature vectors through a full connection layer in the intelligent contract key clause extraction model, thereby obtaining the similarity set.
Optionally, the calculating the similarity between the standard keyword vector set and the text word vector set after feature vectors are extracted includes:
$$Sim_{topic} = \mathrm{Pearson}(TP_S, TP_T)$$
wherein $TP_S$ is a feature vector in the keyword vector set, and $TP_T$ is a feature vector in the text word vector set.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having an artificial intelligence based contract key term extraction program stored thereon, the artificial intelligence based contract key term extraction program being executable by one or more processors to implement the steps of the artificial intelligence based contract key term extraction method as described above.
According to the contract key term extraction method and device based on artificial intelligence and the computer-readable storage medium, when a user extracts the contract key term based on artificial intelligence, the contract text of the user is received, preprocessing operation is carried out on the contract text, the key term of the contract text of the user is obtained by combining the key term obtained from the contract term information base and a pre-constructed intelligent contract key term extraction model, and an efficient contract key term extraction result based on artificial intelligence can be presented to the user.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for extracting key terms of a contract based on artificial intelligence according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an internal structure of an artificial intelligence-based contract key term extraction apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of an artificial intelligence based contract key term extraction program in an artificial intelligence based contract key term extraction apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a contract key term extraction method based on artificial intelligence. Referring to fig. 1, a schematic flow chart of a contract key term extraction method based on artificial intelligence according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for extracting contract key terms based on artificial intelligence includes:
S1, receiving the contract text, and performing preprocessing operations of stop word removal and word segmentation on the contract text to obtain a standard contract text.
Stop words are function words in text data that carry no practical meaning. They do not help to classify the text, yet they occur very frequently and can therefore degrade the result of text classification. Stop words include common pronouns, prepositions and the like; for example, common stop words are "at", "not over" and the like. Preferably, the entries of a pre-constructed stop word list are matched one by one against the words in the contract text to find the stop words in the contract text, and the stop words found are deleted. The pre-constructed stop word list is downloaded from a web page.
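A minimal Python sketch of this stop word removal is given below. The function name `remove_stop_words`, the three-entry stop word list, and the sample sentence are hypothetical, and the sketch ASSUMES that matching is done by plain substring search on the unsegmented text.

```python
def remove_stop_words(text, stop_words):
    """Delete every occurrence of each stop word from the text."""
    # Longer stop words are handled first, so that a short stop word
    # cannot break up a longer one before it is matched.
    for word in sorted(stop_words, key=len, reverse=True):
        text = text.replace(word, "")
    return text


STOP_WORDS = {"的", "不过", "在"}  # hypothetical stop word list
cleaned = remove_stop_words("不过合同的交易金额在附件", STOP_WORDS)
print(cleaned)  # prints 合同交易金额附件
```

Plain substring deletion can also remove characters that belong to a legitimate longer word, so a practical implementation would need a more careful matching rule.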
Further, the word segmentation in the invention comprises: matching the words of the contract text from which the stop words have been removed against the entries in a dictionary according to a preset strategy to obtain the feature words of the contract text, and separating the feature words by space symbols, which completes the word segmentation operation and yields the standard contract text. Preferably, the preset strategy is the forward maximum matching method, whose idea is to scan the text to be segmented from left to right and match runs of consecutive characters against the vocabulary; whenever a match succeeds, one word is segmented.
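The forward maximum matching method can be sketched in Python as follows. The dictionary, the maximum word length of 4, and the single-character fallback for unmatched characters are assumptions made for this illustration and are not specified in the text.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text left to right, preferring the longest dictionary entry."""
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink it until it is found
        # in the dictionary.
        for size in range(min(max_word_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # ASSUMPTION: an unmatched single character becomes its own word.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words


DICTIONARY = {"合同", "交易", "金额", "交易金额", "附件"}  # hypothetical entries
feature_words = forward_max_match("合同交易金额附件", DICTIONARY)
standard_text = " ".join(feature_words)
print(standard_text)  # prints 合同 交易金额 附件
```

Because the longest candidate is tried first, "交易金额" is kept as one feature word rather than being split into "交易" and "金额".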
S2, extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set.
In a preferred embodiment of the present invention, the keyword extraction algorithm includes:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure BDA0002203037780000071
wherein $\mathrm{Dep}(W_i, W_j)$ denotes the dependency relevance of the feature words $W_i$ and $W_j$, $\mathrm{len}(W_i, W_j)$ denotes the length of the dependency path between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
Figure BDA0002203037780000072
wherein $f_{grav}(W_i, W_j)$ denotes the gravity of the feature words $W_i$ and $W_j$, $\mathrm{tfidf}(W_i)$ denotes the TF-IDF value of the feature word $W_i$, $\mathrm{tfidf}(W_j)$ denotes the TF-IDF value of the feature word $W_j$, TF denotes the term frequency, IDF denotes the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{grav}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure BDA0002203037780000073
wherein
Figure BDA0002203037780000074
is the set associated with the vertex $W_i$, and $\eta$ is the damping coefficient.
And obtaining a keyword set in the standard contract text set according to the importance scores of the feature words.
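The importance score formula is available here only as a figure reference. The Python sketch below therefore ASSUMES the usual weighted TextRank update, in which the score of a word is (1 - η) plus η times the weighted, normalized sum of its neighbours' scores; the function name `rank_words`, the value η = 0.85, the iteration count, and the sample weights are all hypothetical.

```python
def rank_words(weights, eta=0.85, iterations=50):
    """weights: {(wi, wj): strength} for an undirected graph, each pair listed once."""
    # Build a symmetric adjacency map from the association strengths.
    neighbours = {}
    for (wi, wj), w in weights.items():
        neighbours.setdefault(wi, {})[wj] = w
        neighbours.setdefault(wj, {})[wi] = w
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for wi, adjacent in neighbours.items():
            total = 0.0
            for wj, w_ij in adjacent.items():
                # Normalize by the total strength leaving the neighbour wj.
                total += w_ij / sum(neighbours[wj].values()) * scores[wj]
            # ASSUMPTION: weighted TextRank update with damping coefficient eta.
            new_scores[wi] = (1 - eta) + eta * total
        scores = new_scores
    return scores


weights = {("合同", "金额"): 0.8, ("金额", "附件"): 0.2}  # hypothetical strengths
scores = rank_words(weights)
keywords = sorted(scores, key=scores.get, reverse=True)[:2]
print(keywords)  # prints ['金额', '合同']
```

The keyword set is then taken as the top-scoring feature words; how many are kept is a design choice that the text leaves open.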
Preferably, the present invention converts the keyword set into word vectors using a one-hot representation. All words in a corpus are extracted to construct a dictionary, and each word is represented by one word vector whose dimension equals the size of the dictionary; the component corresponding to the current word is 1, and all other components are 0.
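A one-hot representation of this kind can be sketched in Python as follows. The function names and the tiny corpus are hypothetical, and the sketch ASSUMES that dictionary indices are assigned in order of first appearance.

```python
def build_vocabulary(corpus_words):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for word in corpus_words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Vector of dictionary size with a 1 at the word's index and 0 elsewhere."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


vocab = build_vocabulary(["合同", "交易", "金额", "交易"])
keyword_vectors = [one_hot(w, vocab) for w in ["金额", "合同"]]
print(keyword_vectors)  # prints [[0, 0, 1], [1, 0, 0]]
```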
S3, obtaining a text set of the predetermined key contract terms from the contract term information base, and converting the text set of the key contract terms into a text word vector set.
In the preferred embodiment of the present invention, the contract clause information base is a database formed by combining contract information obtained from different enterprises and contract information downloaded from professional contract websites. The predetermined key contract terms include: transaction amount, transaction time, transaction mode, transaction object, and the like. Preferably, the text set of the key contract clauses is converted into the text word vector set by adopting the method of converting the keyword set into the word vector set.
S4, inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key clause extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as the key clause of the contract text, outputting the key clause through an output layer of the intelligent contract key clause extraction model, and highlighting the key clause in a preset mode to finish extraction of the contract text key clause.
In a preferred embodiment of the present invention, the pre-constructed intelligent contract key term extraction model includes: an input layer, a Convolutional Neural Network (CNN), and an output layer. The CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage area. Its basic structure comprises two kinds of layers. The first is a feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which a local feature is extracted; once the local feature is extracted, its positional relation to other features is also determined. The second is a feature mapping layer: each computing layer of the network is composed of several feature maps, each feature map is a plane, and all neurons on a plane share the same weights. Preferably, in the present invention, the CNN includes: a convolutional layer, a pooling layer, and a fully-connected layer.
Preferably, in the present invention, the keyword vector set and the text word vector set are input into the input layer; the convolutional layer performs dimension reduction on the two sets; the pooling layer extracts feature vectors from the dimension-reduced sets; and the fully-connected layer calculates the similarity between the feature vectors of the keyword vector set and those of the text word vector set, thereby obtaining the similarity set. When the similarity between a keyword vector and a text word vector is greater than the preset threshold, the corresponding keyword vector is taken as a key clause of the contract text and is output through the output layer, which completes the extraction of the key clauses of the contract text. Preferably, the preset threshold in the present invention is 0.8, and the similarity is calculated as:
$$Sim_{topic} = \mathrm{Pearson}(TP_S, TP_T)$$
wherein $TP_S$ is a feature vector in the standard keyword vector set, and $TP_T$ is a feature vector in the text word vector set.
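The similarity test against the threshold of 0.8 can be illustrated with the following Python sketch. The function names, the sample feature vectors, and the choice to return 0.0 for a constant vector are assumptions made for this illustration; the sketch takes the feature vectors as given and does not model the network layers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # ASSUMPTION: a constant vector is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_key_terms(keyword_features, term_features, threshold=0.8):
    """Keep each keyword whose best similarity to any key term exceeds the threshold."""
    selected = []
    for keyword, kv in keyword_features.items():
        best = max(pearson(kv, tv) for tv in term_features.values())
        if best > threshold:
            selected.append(keyword)
    return selected


# Hypothetical feature vectors after the pooling step.
keyword_features = {"金额": [1.0, 2.0, 3.0], "附件": [3.0, 1.0, 2.0]}
term_features = {"transaction amount": [2.0, 4.0, 6.5]}
selected = select_key_terms(keyword_features, term_features)
print(selected)  # prints ['金额']
```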
The highlighting of the key terms in the preset manner may include, for example, displaying the key terms in bold, underlined, or in different colors.
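One possible way to mark the key terms in plain text is sketched below in Python. The function name `highlight` and the `**` markers are hypothetical; bold, underlined, or coloured rendering would replace the markers with whatever the display format requires.

```python
import re


def highlight(text, key_terms, left="**", right="**"):
    """Wrap every occurrence of a key term in the given markers."""
    if not key_terms:
        return text
    # Longer terms come first in the alternation, so that the longest
    # matching term is marked rather than one of its substrings.
    ordered = sorted(key_terms, key=len, reverse=True)
    pattern = "|".join(re.escape(term) for term in ordered)
    return re.sub(pattern, lambda m: f"{left}{m.group(0)}{right}", text)


marked = highlight("合同 交易金额 附件", ["交易金额", "金额"])
print(marked)  # prints 合同 **交易金额** 附件
```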
The invention also provides a contract key clause extraction device based on artificial intelligence. Referring to fig. 2, a schematic diagram of an internal structure of an artificial intelligence based contract key term extraction apparatus according to an embodiment of the present invention is shown.
In this embodiment, the contract key term extraction device 1 based on artificial intelligence may be a PC (personal computer), or a terminal device such as a smart phone, a tablet computer, a portable computer, or the like, or may be a server or the like. The artificial intelligence based contract key term extraction device 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the artificial intelligence based contract key term extraction apparatus 1 in some embodiments, such as a hard disk of the artificial intelligence based contract key term extraction apparatus 1. The memory 11 may also be an external storage device of the artificial intelligence based contract key term extraction apparatus 1 in other embodiments, such as a plug-in hard disk provided on the artificial intelligence based contract key term extraction apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the artificial intelligence based contract key term extraction apparatus 1. The memory 11 can be used not only for storing application software installed in the artificial intelligence-based contract key term extraction apparatus 1 and various types of data, such as the code of the artificial intelligence-based contract key term extraction program 01, etc., but also for temporarily storing data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip, is configured to execute program code stored in the memory 11 or process data, such as executing the artificial intelligence based contract key term extraction program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. Among them, the display, which may also be appropriately referred to as a display screen or a display unit, displays information processed in the artificial intelligence based contract key term extraction apparatus 1 and a user interface for visualization.
While fig. 2 shows only the artificial intelligence based contract key term extraction apparatus 1 having the components 11-14 and the artificial intelligence based contract key term extraction program 01, those skilled in the art will appreciate that the configuration shown in fig. 2 does not constitute a limitation of the artificial intelligence based contract key term extraction apparatus 1, which may include fewer or more components than shown, a combination of some components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores therein an artificial intelligence-based contract key term extraction program 01; the processor 12 implements the following steps when executing the artificial intelligence based contract key term extraction program 01 stored in the memory 11:
Step one, receiving a contract text, and performing preprocessing operations of stop word removal and word segmentation on the contract text to obtain a standard contract text.
Stop words are function words in text data that carry no practical meaning. They do not help to classify the text, yet they occur very frequently and can therefore degrade the result of text classification. Stop words include common pronouns, prepositions and the like; for example, common stop words are "at", "not over" and the like. Preferably, the entries of a pre-constructed stop word list are matched one by one against the words in the contract text to find the stop words in the contract text, and the stop words found are deleted. The pre-constructed stop word list is downloaded from a web page.
Further, the word segmentation in the invention comprises: matching the words of the contract text from which the stop words have been removed against the entries in a dictionary according to a preset strategy to obtain the feature words of the contract text, and separating the feature words by space symbols, which completes the word segmentation operation and yields the standard contract text. Preferably, the preset strategy is the forward maximum matching method, whose idea is to scan the text to be segmented from left to right and match runs of consecutive characters against the vocabulary; whenever a match succeeds, one word is segmented.
Step two, extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set.
In a preferred embodiment of the present invention, the keyword extraction algorithm includes:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure BDA0002203037780000111
wherein $\mathrm{Dep}(W_i, W_j)$ denotes the dependency relevance of the feature words $W_i$ and $W_j$, $\mathrm{len}(W_i, W_j)$ denotes the length of the dependency path between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
Figure BDA0002203037780000112
wherein $f_{grav}(W_i, W_j)$ denotes the gravity of the feature words $W_i$ and $W_j$, $\mathrm{tfidf}(W_i)$ denotes the TF-IDF value of the feature word $W_i$, $\mathrm{tfidf}(W_j)$ denotes the TF-IDF value of the feature word $W_j$, TF denotes the term frequency, IDF denotes the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$$\mathrm{weight}(W_i, W_j) = \mathrm{Dep}(W_i, W_j) \cdot f_{grav}(W_i, W_j)$$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure BDA0002203037780000121
wherein
Figure BDA0002203037780000122
is the set associated with the vertex $W_i$, and $\eta$ is the damping coefficient.
The keyword set of the standard contract text is then obtained according to the importance scores of the feature words.
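The scoring steps can be sketched as below. Only the product weight = Dep * fgrav is taken from the text. The dependency relevance and gravity values are treated as given inputs because their formulas appear only in the figures, and the iterative update is an assumed TextRank-style rule with damping coefficient η, not necessarily the formula of the invention. All function names and sample values are hypothetical.

```python
def association_weights(dep, grav):
    """weight(Wi, Wj) = Dep(Wi, Wj) * fgrav(Wi, Wj) for each pair in both maps."""
    return {pair: dep[pair] * grav[pair] for pair in dep if pair in grav}


def importance_scores(weights, eta=0.85, iterations=50):
    """Assumed TextRank-style iteration over an undirected weighted word graph."""
    neighbours = {}
    for (wi, wj), w in weights.items():
        neighbours.setdefault(wi, {})[wj] = w
        neighbours.setdefault(wj, {})[wi] = w
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        updated = {}
        for wi, adjacent in neighbours.items():
            total = 0.0
            for wj, w in adjacent.items():
                out_sum = sum(neighbours[wj].values())
                if out_sum > 0:
                    # Share of Wj's score passed to Wi, proportional to edge weight.
                    total += w / out_sum * scores[wj]
            updated[wi] = (1 - eta) + eta * total
        scores = updated
    return scores


def top_keywords(scores, k):
    """Return the k feature words with the highest importance scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical precomputed Dep and fgrav values for two word pairs.
    dep = {("amount", "payment"): 0.5, ("payment", "date"): 0.25}
    grav = {("amount", "payment"): 4.0, ("payment", "date"): 2.0}
    scores = importance_scores(association_weights(dep, grav))
    print(top_keywords(scores, 2))
```

The value 0.85 for η is a conventional default in graph-ranking methods and is an assumption here; the specification does not state a value.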
Preferably, the present invention converts the keyword set into word vectors using a one-hot representation. All words in a corpus are extracted to construct a dictionary, and each word is represented by one word vector whose dimension equals the size of the dictionary; the component corresponding to the current word is 1, and all other components are 0.
Step three, acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set.
In the preferred embodiment of the present invention, the contract clause information base is a database formed by combining contract information obtained from different enterprises and contract information downloaded from professional contract websites. The predetermined key contract terms include: transaction amount, transaction time, transaction mode, transaction object, and the like. Preferably, the text set of the key contract clauses is converted into the text word vector set by the same method used to convert the keyword set into the word vector set.
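A minimal sketch of the one-hot representation described above follows. The helper names `build_dictionary` and `one_hot` and the sample words are illustrative assumptions.

```python
def build_dictionary(corpus_words):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for word in corpus_words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Vector of dictionary size with 1 at the word's index and 0 elsewhere."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


if __name__ == "__main__":
    vocab = build_dictionary(["amount", "time", "amount", "party"])
    print(one_hot("time", vocab))
```

The vector length grows with the dictionary, so every keyword vector and every text word vector built from the same dictionary has the same dimension.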
Step four, inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key clause extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has a similarity larger than a preset threshold, taking the corresponding keyword vector as a key clause of the contract text, outputting the key clause through an output layer of the intelligent contract key clause extraction model, and highlighting the key clause in a preset mode to finish the extraction of the contract text key clause.
In a preferred embodiment of the present invention, the pre-constructed intelligent contract key clause extraction model includes an input layer, a convolutional neural network (CNN), and an output layer. The CNN is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage range. Its basic structure comprises two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, from which a local feature is extracted; once the local feature is extracted, its positional relation to the other features is also determined. The other is the feature mapping layer: each computing layer of the network is composed of a plurality of feature maps, each feature map is a plane, and all neurons on the plane share the same weights. Preferably, in the present invention, the CNN includes a convolutional layer, a pooling layer, and a fully-connected layer.
Preferably, in the present invention, the keyword vector set and the text word vector set are input into the input layer, dimension reduction is performed on the keyword vector set and the text word vector set by the convolutional layer, feature vectors are extracted from the dimension-reduced keyword vector set and text word vector set by the pooling layer, and the similarity between the keyword vector set and the text word vector set after feature vector extraction is calculated by the fully-connected layer, so as to obtain the similarity set. When the similarity between a keyword vector and a text word vector is greater than the preset threshold, the corresponding keyword vector is taken as a key clause of the contract text and is output through the output layer, thereby completing the extraction of the key clauses of the contract text. Preferably, the preset threshold in the present invention is 0.8, and the similarity is calculated as follows:
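To make the roles of the convolutional and pooling layers concrete, a one-dimensional sketch in plain Python is given below. The kernel values, stride, and pooling size are arbitrary assumptions; the specification does not disclose the actual network parameters, and a trained model would learn its kernels.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution as used in CNNs (sliding dot product, no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


if __name__ == "__main__":
    vector = [1, 2, 3, 4, 5, 6]
    reduced = conv1d(vector, [1, 1], stride=2)  # shorter than the input
    features = max_pool1d(conv1d(vector, [1, 1]), 2)
    print(reduced, features)
```

The same kernel is applied at every position, which corresponds to the shared weights of a feature map, and both the stride and the pooling window shorten the output relative to the input.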
$Sim_{topic} = \mathrm{Pearson}(TP_S, TP_T)$
wherein $TP_S$ is a feature vector in the keyword vector set, and $TP_T$ is a feature vector in the text word vector set.
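The similarity test can be illustrated with a direct computation of the Pearson correlation coefficient and the 0.8 threshold stated above. The function names, the convention of returning 0.0 for a constant vector, and the sample feature vectors are assumptions made for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 0.0
    return cov / (norm_x * norm_y)


def select_key_terms(keyword_features, clause_features, threshold=0.8):
    """Keep keywords whose similarity to any key-clause vector exceeds the threshold."""
    return [
        name
        for name, vector in keyword_features.items()
        if any(pearson(vector, clause) > threshold for clause in clause_features)
    ]


if __name__ == "__main__":
    keywords = {"amount": [1, 2, 3], "misc": [3, 1, 2]}
    clauses = [[2, 4, 6.5]]
    print(select_key_terms(keywords, clauses))
```

The comparison is strict, matching the wording "greater than the preset threshold".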
The highlighting of the key terms in the preset manner may include, for example, displaying the key terms in bold, underlined, or in different colors.
Alternatively, in other embodiments, the artificial intelligence based contract key term extraction program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention, and the module referred to in the present invention refers to a series of computer program instruction segments capable of performing a specific function for describing the execution process of the artificial intelligence based contract key term extraction program in the artificial intelligence based contract key term extraction apparatus.
For example, FIG. 3 is a schematic diagram of the program modules of the artificial intelligence based contract key term extraction program in an embodiment of the artificial intelligence based contract key term extraction apparatus according to the present invention. In this embodiment, the artificial intelligence based contract key term extraction program may be divided into a text processing module 10, a text conversion module 20, and an extraction module 30. Exemplarily:
The text processing module 10 is configured to: receive a contract text, and perform the preprocessing operations of stop word removal and word segmentation on the contract text to obtain a standard contract text.
The text conversion module 20 is configured to: extract a keyword set in the standard contract text by using a keyword extraction algorithm, convert the keyword set into a word vector set to obtain a keyword vector set, acquire a text set of predetermined key contract terms from a contract term information base, and convert the text set of the key contract terms into a text word vector set.
The extraction module 30 is configured to input the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model, obtain a similarity set between the keyword vector set and the text word vector set, when a similarity greater than a preset threshold exists in the similarity set, use a corresponding keyword vector as a key term of the contract text, output the key term through an output layer of the intelligent contract key term extraction model, and highlight the key term in a preset manner, thereby completing extraction of the contract text key term.
The functions or operation steps of the above-mentioned text processing module 10, text conversion module 20, extraction module 30 and other program modules implemented when executed are substantially the same as those of the above-mentioned embodiments, and are not described herein again.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, on which an artificial intelligence-based contract key term extraction program is stored, where the artificial intelligence-based contract key term extraction program is executable by one or more processors to implement the following operations:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset mode to finish extraction of the contract text key term.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the apparatus and method for extracting key terms of a contract based on artificial intelligence, and will not be described herein again.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A contract key term extraction method based on artificial intelligence is characterized by comprising the following steps:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset mode to finish extraction of the contract text key term.
2. The artificial intelligence based contract key term extraction method as claimed in claim 1, wherein the preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text comprises:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
3. The artificial intelligence based contract key term extraction method according to claim 2, wherein said extracting a keyword set in said standard contract text using a keyword extraction algorithm comprises:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure FDA0002203037770000021
wherein $Dep(W_i, W_j)$ represents the dependency relevance of the feature words $W_i$ and $W_j$, $len(W_i, W_j)$ represents the length between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
Figure FDA0002203037770000022
wherein $f_{grav}(W_i, W_j)$ represents the gravity of the feature words $W_i$ and $W_j$, $tfidf(W_i)$ represents the TF-IDF value of the feature word $W_i$, $tfidf(W_j)$ represents the TF-IDF value of the feature word $W_j$, TF represents the term frequency, IDF represents the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$weight(W_i, W_j) = Dep(W_i, W_j) \cdot f_{grav}(W_i, W_j)$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure FDA0002203037770000023
wherein
Figure FDA0002203037770000024
is the set of vertices related to the vertex $W_i$, and $\eta$ is the damping coefficient;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
4. The artificial intelligence based contract key term extraction method according to any one of claims 1 to 3, wherein the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set includes:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating the similarity between the keyword vector set and the text word vector set after extracting the feature vectors through a full connection layer in the intelligent contract key clause extraction model, thereby obtaining the similarity set.
5. The artificial intelligence based contract key term extraction method according to claim 4, wherein said calculating the similarity between the keyword vector set and the text word vector set after extracting feature vectors comprises:
$Sim_{topic} = \mathrm{Pearson}(TP_S, TP_T)$
wherein $TP_S$ is a feature vector in the keyword vector set, and $TP_T$ is a feature vector in the text word vector set.
6. An artificial intelligence based contract key term extraction apparatus, the apparatus comprising a memory and a processor, the memory having stored thereon an artificial intelligence based contract key term extraction program operable on the processor, the artificial intelligence based contract key term extraction program when executed by the processor implementing the steps of:
receiving a contract text, and performing preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text;
extracting a keyword set in the standard contract text by using a keyword extraction algorithm, and converting the keyword set into a word vector set to obtain a keyword vector set;
acquiring a text set of predetermined key contract terms from a contract term information base, and converting the text set of the key contract terms into a text word vector set;
inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set, when the similarity set has similarity larger than a preset threshold, taking the corresponding keyword vector as a key term of the contract text, outputting the key term through an output layer of the intelligent contract key term extraction model, and highlighting the key term in a preset mode to finish extraction of the contract text key term.
7. The artificial intelligence based contract key term extraction device as claimed in claim 6, wherein the preprocessing operation of removing stop words and word segmentation on the contract text to obtain a standard contract text comprises:
matching the pre-constructed stop words with the words in the contract text one by one to obtain stop words in the contract text, and deleting the stop words;
matching the words in the contract text without stop words with the entries in the dictionary through a preset matching strategy to obtain the feature words of the contract text set without stop words, and separating the feature words by space symbols to obtain the standard contract text.
8. The artificial intelligence based contract key term extraction apparatus according to claim 7, wherein said extracting a keyword set in said standard contract text using a keyword extraction algorithm comprises:
calculating the dependency relevance of any two feature words $W_i$ and $W_j$ in the standard contract text:
Figure FDA0002203037770000041
wherein $Dep(W_i, W_j)$ represents the dependency relevance of the feature words $W_i$ and $W_j$, $len(W_i, W_j)$ represents the length between the feature words $W_i$ and $W_j$, and $b$ is a hyper-parameter;
calculating the gravity of the feature words $W_i$ and $W_j$:
wherein $f_{grav}(W_i, W_j)$ represents the gravity of the feature words $W_i$ and $W_j$, $tfidf(W_i)$ represents the TF-IDF value of the feature word $W_i$, $tfidf(W_j)$ represents the TF-IDF value of the feature word $W_j$, TF represents the term frequency, IDF represents the inverse document frequency, and $d$ is the Euclidean distance between the word vectors of the feature words $W_i$ and $W_j$;
obtaining the association strength between the feature words $W_i$ and $W_j$ according to the dependency relevance and the gravity:
$weight(W_i, W_j) = Dep(W_i, W_j) \cdot f_{grav}(W_i, W_j)$
calculating the importance score of the feature word $W_i$ according to the association strength:
Figure FDA0002203037770000043
wherein
Figure FDA0002203037770000044
is the set of vertices related to the vertex $W_i$, and $\eta$ is the damping coefficient;
and obtaining a keyword set in the standard contract text according to the importance scores of the feature words.
9. The artificial intelligence based contract key term extraction device as claimed in any one of claims 6 to 8, wherein the inputting the keyword vector set and the text word vector set into a pre-constructed intelligent contract key term extraction model to obtain a similarity set of the keyword vector set and the text word vector set comprises:
performing dimension reduction processing on the keyword vector set and the text word vector set through a convolution layer in the intelligent contract key clause extraction model;
extracting feature vectors from the keyword vector set and the text word vector set subjected to the dimensionality reduction treatment by using a pooling layer in the intelligent contract key term extraction model;
and calculating the similarity between the keyword vector set and the text word vector set after extracting the feature vectors through a full connection layer in the intelligent contract key clause extraction model, thereby obtaining the similarity set.
10. A computer-readable storage medium having stored thereon an artificial intelligence based contract key term extraction program executable by one or more processors to implement the steps of the artificial intelligence based contract key term extraction method according to any one of claims 1 to 5.
CN201910873470.1A 2019-09-16 2019-09-16 Contract key term extraction method, device and storage medium based on artificial intelligence Active CN110765765B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910873470.1A CN110765765B (en) 2019-09-16 2019-09-16 Contract key term extraction method, device and storage medium based on artificial intelligence
PCT/CN2020/098950 WO2021051934A1 (en) 2019-09-16 2020-06-29 Method and apparatus for extracting key contract term on basis of artificial intelligence, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910873470.1A CN110765765B (en) 2019-09-16 2019-09-16 Contract key term extraction method, device and storage medium based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN110765765A true CN110765765A (en) 2020-02-07
CN110765765B CN110765765B (en) 2023-10-20

Family

ID=69329488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910873470.1A Active CN110765765B (en) 2019-09-16 2019-09-16 Contract key term extraction method, device and storage medium based on artificial intelligence

Country Status (2)

Country Link
CN (1) CN110765765B (en)
WO (1) WO2021051934A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666408A (en) * 2020-05-26 2020-09-15 中国工商银行股份有限公司 Method and device for screening and displaying important clauses

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743802A (en) * 2021-09-08 2021-12-03 平安信托有限责任公司 Work order intelligent matching method and device, electronic equipment and readable storage medium
CN116070641B (en) * 2023-03-13 2023-06-06 北京点聚信息技术有限公司 Online interpretation method of electronic contract

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
US9600231B1 (en) * 2015-03-13 2017-03-21 Amazon Technologies, Inc. Model shrinking for embedded keyword spotting
CN107122413A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 A kind of keyword extracting method and device based on graph model
CN107506347A (en) * 2017-07-22 2017-12-22 长沙兔子代跑网络科技有限公司 A kind of intelligence obtains the method and device for running chat record in generation
WO2018077655A1 (en) * 2016-10-24 2018-05-03 Koninklijke Philips N.V. Multi domain real-time question answering system
CN109918635A (en) * 2017-12-12 2019-06-21 中兴通讯股份有限公司 A kind of contract text risk checking method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11514096B2 (en) * 2015-09-01 2022-11-29 Panjiva, Inc. Natural language processing for entity resolution
CN108319627A (en) * 2017-02-06 2018-07-24 腾讯科技(深圳)有限公司 Keyword extracting method and keyword extracting device
CN109657227A (en) * 2018-10-08 2019-04-19 平安科技(深圳)有限公司 Contract feasibility determination method, equipment, storage medium and device
CN110032632A (en) * 2019-04-04 2019-07-19 平安科技(深圳)有限公司 Intelligent customer service answering method, device and storage medium based on text similarity
CN110163478B (en) * 2019-04-18 2024-04-05 平安科技(深圳)有限公司 Risk examination method and device for contract clauses

Also Published As

Publication number Publication date
CN110765765B (en) 2023-10-20
WO2021051934A1 (en) 2021-03-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant