CN111047092A - Dispute case victory rate prediction method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111047092A
Authority
CN
China
Prior art keywords
case
document
training
dispute
key elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911267505.3A
Other languages
Chinese (zh)
Inventor
何海龙
李如先
申志彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN201911267505.3A
Publication of CN111047092A
Legal status: Pending

Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/045 Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N3/08 Computing arrangements based on biological models; Neural networks; Learning methods
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q50/18 ICT specially adapted for specific business sectors; Services; Legal services


Abstract

The invention relates to a dispute case win rate prediction method and device, computer equipment and a storage medium. The method comprises: obtaining a dispute case to be predicted; extracting key elements from the dispute case to obtain case key elements; performing word vectorization on the case key elements and inputting the resulting vectors into a win rate prediction model for win rate prediction to obtain a win rate estimate; and sending the win rate estimate to a terminal for display at the terminal. The win rate prediction model is obtained by word-vectorizing key elements carrying win and loss category labels and using the resulting vectors as a sample set to train a classification model. By extracting the key elements of the dispute case to be predicted, performing feature extraction on the dispute case, matching elements with regular expressions or extracting them with an entity recognition model according to whether the document format is standard, and inputting the case key elements into the win rate prediction model for the corresponding win rate prediction, the invention improves both the accuracy and the efficiency of dispute case win rate prediction.

Description

Dispute case victory rate prediction method and device, computer equipment and storage medium
Technical Field
The invention relates to computer technology, and in particular to a dispute case win rate prediction method and device, computer equipment and a storage medium.
Background
Financial disputes typically involve financial companies and refer to disputes arising from monetary financing between financial institutions and citizens, legal persons or other organizations, or between financial institutions themselves. To reduce the judicial losses caused by financial disputes, financial companies often have to spend a great deal of manpower and material resources extracting the key elements of financial loan disputes from existing cases and then predicting the win rate based on personal experience and those key elements; predictions made this way have low accuracy and low efficiency.
Therefore, it is necessary to design a method that improves the accuracy and efficiency of dispute case win rate prediction.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a dispute case win rate prediction method and device, computer equipment and a storage medium.
To achieve this aim, the invention adopts the following technical scheme. The dispute case win rate prediction method comprises the following steps:
acquiring a dispute case to be predicted;
extracting key elements from the dispute case to be predicted to obtain case key elements;
performing word vectorization on the case key elements and inputting the resulting vectors into a win rate prediction model for win rate prediction to obtain a win rate estimate;
sending the win rate estimate to a terminal for display at the terminal;
the win rate prediction model is obtained by word-vectorizing key elements carrying win and loss category labels and using the resulting vectors as a sample set to train a classification model.
A further technical scheme is as follows: the extracting of key elements from the dispute case to be predicted to obtain case key elements comprises the following steps:
acquiring related legal documents according to dispute cases needing to be predicted to obtain document main bodies;
carrying out feature engineering extraction on the document main body to obtain a target document;
judging whether the target document is a document with a standard format;
if the target document is a document with a standard format, matching the target document through a regular expression to obtain case key elements;
if the target document is not a document with a standard format, preprocessing the target document to obtain a word vector;
inputting the word vectors into an entity recognition model for element classification to obtain key element categories;
extracting case key elements according to the key element categories and the target documents;
the entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels.
The further technical scheme is as follows: the dispute case that predicts as required acquires relevant legal documents to obtain the document main body, includes:
acquiring civil cases according to the dispute case to be predicted;
filtering the content of the civil cases to obtain a preliminary document;
and merging the first-instance case and the second-instance case of the preliminary document by case number to obtain the document main body.
The further technical scheme is as follows: the characteristic engineering extraction of the document main body to obtain the target document comprises the following steps:
filtering stop words and punctuation marks in the document main body to obtain a filtering result;
discarding the document contents with the document content length not meeting the requirement on the filtering result to obtain an intermediate document;
and performing word segmentation processing and part-of-speech tagging on the intermediate document to obtain a target document.
The further technical scheme is as follows: the entity recognition model is obtained by training a convolutional neural network through vectors obtained after word vectorization of a plurality of text data with key element classification labels, and comprises the following steps:
constructing a convolutional neural network and a first loss function;
acquiring a plurality of text data carrying key element classification labels, performing word vectorization on the text data to obtain vectors carrying the key element classification labels, and dividing the vectors into a first training set and a first test set;
inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result;
calculating the difference between the first training result and the key element classification labels using the first loss function to obtain a first loss value;
judging whether the first loss value remains unchanged;
if the first loss value has not stabilized, adjusting the parameters of the convolutional neural network and returning to the step of inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result;
if the first loss value remains unchanged, inputting the first test set into the convolutional neural network for element classification to obtain a first test result;
judging whether the first test result meets the requirement;
if the first test result does not meet the requirement, executing the step of adjusting the parameters of the convolutional neural network;
and if the first test result meets the requirement, taking the convolutional neural network as the entity recognition model.
The further technical scheme is as follows: the convolutional neural network is optimized by a stochastic gradient descent algorithm.
The further technical scheme is as follows: the victory ratio prediction model is obtained by performing word vectors on key elements with victory and failure category labels and then serving the word vectors as a sample set training classification model, and comprises the following steps:
constructing a classification model and a second loss function;
obtaining key elements carrying win and loss category labels, performing word vectorization on them to obtain vectors carrying the win and loss category labels, forming a sample set, and dividing the sample set into a second training set and a second test set;
inputting the second training set into the classification model for classification training to obtain a second training result;
calculating the difference between the second training result and the win and loss category labels using the second loss function to obtain a second loss value;
judging whether the second loss value remains unchanged;
if the second loss value has not stabilized, adjusting the parameters of the classification model and returning to the step of inputting the second training set into the classification model for classification training to obtain a second training result;
if the second loss value remains unchanged, inputting the second test set into the classification model for win rate prediction to obtain a second test result;
judging whether the second test result meets the requirement;
if the second test result does not meet the requirement, executing the step of adjusting the parameters of the classification model;
if the second test result meets the requirement, taking the classification model as the win rate prediction model;
wherein the classification model comprises a logistic regression model or a convolutional neural network model.
The invention also provides a dispute case win rate prediction device, which comprises:
a case acquiring unit, used for acquiring the dispute case to be predicted;
an extraction unit, used for extracting key elements from the dispute case to be predicted to obtain case key elements;
a prediction unit, used for performing word vectorization on the case key elements and inputting the resulting vectors into the win rate prediction model for win rate prediction to obtain a win rate estimate;
and a sending unit, used for sending the win rate estimate to the terminal for display at the terminal.
The invention also provides computer equipment comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the above method when executing the computer program.
The invention also provides a storage medium storing a computer program which, when executed by a processor, is operable to carry out the method as described above.
Compared with the prior art, the invention has the following beneficial effects: by extracting the key elements of the dispute case to be predicted, performing feature extraction on the dispute case, matching elements with regular expressions or extracting them with an entity recognition model according to whether the document format is standard, and inputting the extracted case key elements into the win rate prediction model for the corresponding win rate prediction, the invention improves the accuracy and efficiency of dispute case win rate prediction.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a dispute case win rate prediction method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a dispute case winning rate prediction method according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a dispute case winning rate prediction apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a dispute case winning rate prediction method according to an embodiment of the present invention. Fig. 2 is a schematic flow chart of a dispute case win rate prediction method according to an embodiment of the present invention. The dispute case winning rate prediction method is applied to a server. The server carries out data interaction with the terminal, extracts key elements after acquiring dispute cases needing to be predicted from the terminal, predicts the winning rate, and outputs the predicted result to the terminal for display.
Fig. 2 is a schematic flow chart of a dispute case winning rate prediction method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S140.
And S110, acquiring dispute cases needing prediction.
In the present embodiment, the dispute case refers to financial loan dispute case information input from the terminal; generally this is brief case information including details of the persons and events involved.
The dispute cases are mainly financial loan dispute cases.
And S120, extracting key elements of the dispute case to be predicted to obtain case key elements.
In this embodiment, the case key elements refer to the key contents of the dispute case, such as: 1. the borrower; 2. the lender; 3. the loan term; 4. the repayment date; 5. the due date; 6. the loan amount; 7. whether a contract exists and is valid; 8. the presence or absence of executable assets; 9. the age of the borrower; 10. the purpose of the loan; 11. bank transaction records; 12. the loan interest; 13. repayment records; and so on.
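A minimal sketch of one way to hold these thirteen elements as a structured record; the field names and types are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseKeyElements:
    """Illustrative container for the thirteen case key elements."""
    borrower: Optional[str] = None            # 1. borrower
    lender: Optional[str] = None              # 2. lender
    loan_term: Optional[str] = None           # 3. loan term
    repayment_date: Optional[str] = None      # 4. repayment date
    due_date: Optional[str] = None            # 5. due date
    loan_amount: Optional[float] = None       # 6. loan amount
    contract_valid: Optional[bool] = None     # 7. contract exists and is valid
    executable_assets: Optional[bool] = None  # 8. executable assets present
    borrower_age: Optional[int] = None        # 9. age of the borrower
    loan_purpose: Optional[str] = None        # 10. purpose of the loan
    bank_statements: Optional[str] = None     # 11. bank transaction records
    loan_interest: Optional[str] = None       # 12. loan interest
    repayment_records: Optional[str] = None   # 13. repayment records
```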
In one embodiment, the step S120 may include steps S121 to S127.
And S121, acquiring relevant legal documents according to the dispute case to be predicted to obtain a document main body.
In this embodiment, the document main body is the legal document formed by content-filtering the legal documents and merging the first-instance and second-instance cases.
In one embodiment, the step S121 may include steps S1211 to S1213.
S1211, acquiring civil cases according to the dispute case to be predicted.
In this embodiment, a civil case refers to a corresponding case retrieved from the judgment document database according to the financial dispute case; there may be one or more such cases, so further filtering is required.
And S1212, carrying out content filtering on the civil case to obtain a preliminary document.
In this embodiment, the preliminary documents refer to corresponding financial loan-related civil cases formed after preliminary filtering, such as keyword filtering.
Specifically, the content of the civil cases is filtered with keywords including, but not limited to, "finance", "lending" and "borrowing" to obtain the preliminary document, which improves the accuracy of entity recognition.
S1213, merging the first-instance case and the second-instance case of the preliminary document by case number to obtain the document main body.
In this embodiment, the document main body refers to the legal document formed by combining the information of the first-instance case with the information of the second-instance case.
The preliminary documents are merged by case number: a case in which a party refuses to accept the first-instance judgment and appeals is linked with its related first-instance case, and the second-instance judgment result is extracted. Where the appeal is rejected, the first-instance document is retained and the second-instance document is discarded; otherwise the second-instance result is taken as the final result. The document main bodies of the two instances are then merged into the final document main body, which ensures the accuracy and completeness of the document and improves the accuracy of entity recognition, that is, the accuracy of element extraction.
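A sketch of this merging rule, assuming each judgment is held as a simple dictionary with its result and body text; the dictionary keys and the appeal-rejection test are illustrative assumptions.

```python
from typing import Optional

def merge_instances(first_instance: dict, second_instance: Optional[dict]) -> dict:
    """Merge a first-instance document with its related second-instance document.

    Rule sketched above: if the appeal was rejected, keep the first-instance
    document and drop the second-instance one; otherwise take the
    second-instance result as the final result and merge both bodies.
    """
    if second_instance is None:
        return first_instance

    merged = dict(first_instance)
    # Illustrative test: treat a "驳回上诉" (appeal rejected) result as a rejected appeal.
    appeal_rejected = "驳回上诉" in second_instance.get("result", "")
    if appeal_rejected:
        merged["final_result"] = first_instance["result"]
        merged["body"] = first_instance["body"]
    else:
        merged["final_result"] = second_instance["result"]
        merged["body"] = first_instance["body"] + "\n" + second_instance["body"]
    return merged
```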
And S122, performing feature engineering extraction on the document main body to obtain a target document.
In this embodiment, the target document refers to a legal document formed by screening content meeting the requirement, and performing word segmentation processing and part-of-speech tagging on the content.
In one embodiment, the step S122 may include steps S1221 to S1223.
And S1221, filtering stop words and punctuation marks in the document main body to obtain a filtering result.
In this embodiment, the filtering result refers to the document after the stop words and punctuation marks in the document main body have been filtered.
Specifically, the words of the document main body are checked against a preset stop-word lexicon, and any word that appears in the lexicon is filtered out, completing the stop-word filtering of the document main body.
Punctuation marks are filtered out using a punctuation table (such as Python's string.punctuation), which reduces interference with, and the workload of, the subsequent word segmentation and part-of-speech tagging.
And S1222, discarding the document contents with the document content length not meeting the requirement of the filtering result to obtain an intermediate document.
In this embodiment, the intermediate document refers to content whose character length is greater than a preset value; document content whose length does not meet the requirement, for example shorter than a threshold such as 150 characters, is filtered out and discarded. This ensures that the remaining document content is suitable for word segmentation and reduces the subsequent workload, improving recognition efficiency.
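A sketch of the filtering steps above, assuming plain Python strings; string.punctuation only covers ASCII punctuation, so an illustrative set of Chinese punctuation marks and a tiny stop-word list are added here as assumptions.

```python
import string

CN_PUNCT = "，。！？；：“”‘’（）《》、"   # illustrative Chinese punctuation set
STOPWORDS = {"的", "了", "在", "是"}        # illustrative stop-word list

def filter_document(body: str, min_length: int = 150) -> str:
    """Drop punctuation and stop words, then discard documents that are too short."""
    # Remove ASCII and Chinese punctuation.
    table = str.maketrans("", "", string.punctuation + CN_PUNCT)
    text = body.translate(table)
    # Remove stop words (applied per character here for brevity; a real pipeline
    # would normally filter after word segmentation).
    text = "".join(ch for ch in text if ch not in STOPWORDS)
    # Discard content whose length does not meet the requirement.
    return text if len(text) >= min_length else ""
```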
And S1223, performing word segmentation processing and part-of-speech tagging on the intermediate document to obtain the target document.
Specifically, the word segmentation may be performed with methods including, but not limited to, LSTM, CRF and HMM, and the part-of-speech tagging may be performed with common NLP toolkits.
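The patent does not name a segmentation toolkit; the sketch below uses jieba's part-of-speech mode purely as an assumed stand-in for whichever LSTM-, CRF- or HMM-based segmenter is chosen.

```python
import jieba.posseg as pseg  # assumed toolkit; any CRF/HMM/LSTM segmenter could be used

def segment_and_tag(text: str):
    """Word segmentation plus part-of-speech tagging for the intermediate document."""
    return [(word, flag) for word, flag in pseg.cut(text)]

# Example: segment_and_tag("原告向被告出借人民币五万元")
# -> [('原告', ...), ('向', ...), ('被告', ...), ...] (tags depend on the model)
```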
And S123, judging whether the target document is a document with a standard format.
In this embodiment, legal documents follow a certain format specification: the format of each item of content, including the arrangement and order of headings, is defined by a common standard, so it is possible to determine from that standard whether the target document is a standard-format document.
And S124, if the target document is a document with a standard format, matching the target document through a regular expression to obtain case key elements.
Specifically, when the target document is a standard-format document, a uniform regular expression can be used for matching, for example patterns that capture the plaintiff company name, the defendant name (a run of one to three Chinese characters, \s[\u4e00-\u9fa5]{1,3}), the loan amount, overdue amounts and the like; the case key elements are then obtained by direct matching.
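The patterns below are an illustrative reconstruction only (not the expressions claimed in the patent) of how key elements could be matched in a standard-format document.

```python
import re

# Illustrative patterns for a standard-format loan-dispute judgment.
PATTERNS = {
    "lender":      re.compile(r"原告[：:]?\s*(.{1,20}?(?:银行|公司|支行))"),
    "borrower":    re.compile(r"被告[：:]?\s*([\u4e00-\u9fa5]{1,3})"),
    "loan_amount": re.compile(r"借款(?:金额|本金)?(?:人民币)?([\d,.]+)\s*万?元"),
    "overdue":     re.compile(r"逾期"),
}

def match_elements(document: str) -> dict:
    """Match case key elements directly with regular expressions."""
    elements = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(document)
        if m:
            elements[name] = m.group(1) if m.groups() else m.group(0)
    return elements
```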
And S125, if the target document is not the document with the standard format, preprocessing the target document to obtain a word vector.
In this embodiment, the word vector refers to a corresponding vector obtained by performing word vectorization on the word segments in the target document.
Specifically, word vectorization is performed on the target document to obtain a word vector. Word vectorization may be performed using, but not limited to tf-idf, one-hot, word2vec, etc.
Word segmentation, part-of-speech tagging and word vectorization make it easier to analyse which content in the target document corresponds to which tagged words, for example which words correspond to the borrower among the key elements.
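A sketch of the word2vec option (tf-idf or one-hot could be substituted); gensim is an assumed library choice not named in the patent, and the sample sentences are illustrative.

```python
from gensim.models import Word2Vec  # assumed library choice

# Each training sentence is a list of word segments taken from the target documents.
sentences = [["被告", "朱某", "借款", "五万元"], ["原告", "银行", "起诉"]]

w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
vector = w2v.wv["借款"]   # 100-dimensional word vector for one segment
```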
And S126, inputting the word vector into the entity recognition model for element classification to obtain a key element category.
In this embodiment, the key element category refers to the category that the word vector corresponds to, for example the borrower or the lender.
The entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels.
In an embodiment, the entity recognition model is obtained by training a convolutional neural network with vectors obtained by word vectorizing a plurality of text data with key element classification labels, and may include steps S1261 to S1269.
S1261, constructing a convolutional neural network and a first loss function.
In this embodiment, the convolutional neural network is a deep learning model, a network with an input layer, convolutional layers and an output layer. The first loss function may be a Center Loss function.
S1262, obtaining a plurality of text data with key element classification labels, performing word vectorization on the text data to obtain vectors with the key element classification labels, and dividing the vectors with the key element classification labels into a first training set and a first testing set.
In this embodiment, the first training set refers to data used for training the model, and the first test set refers to data used for testing the trained model.
The text data are formed by taking documents from the judgment document database as raw data and applying the content filtering, word segmentation and part-of-speech tagging described above; the resulting documents are annotated with the corresponding key element classification labels to serve as reference data, and the vectors obtained by word-vectorizing the reference data are used as input.
The segmented documents from the judgment document database are manually annotated: each key element to be extracted is given a class label, for example a class such as P for the borrower or the lender, and within an element the tokens are marked B for the beginning, I for the middle and E for the end, with all remaining tokens marked O. For entity recognition on non-standard documents, the annotated words (or words plus parts of speech) are vectorized as the input, and the annotation results are used as the output to train the network.
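A short illustration of this B/I/E/O tagging scheme; the class name P, the token split and the example sentence are illustrative.

```python
# Token sequence for "被告 朱 某 某 向 原告 借款" with the borrower tagged as class P.
tokens = ["被告", "朱",  "某",  "某",  "向", "原告", "借款"]
tags   = ["O",   "B-P", "I-P", "E-P", "O",  "O",   "O"]

# Training pairs for the entity recognition network: the (word, part-of-speech)
# vectors are the input, the tag sequence above is the output.
training_example = list(zip(tokens, tags))
```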
S1263, inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result.
Specifically, the convolutional training is performed in the convolutional neural network by using CRF or LSTM + CRF to obtain a first training result.
In this embodiment, the first training result refers to the probability, output after the first training set is fed into the convolutional neural network, of the class label corresponding to the first training set, that is, of each key element category. This probability is compared with a preset threshold: when the probability of a key element category exceeds the threshold, the class label is output as that key element category, otherwise it is not.
S1264, calculating a difference between the first training result and the key element classification label by using a first loss function to obtain a first loss value.
In this embodiment, the first loss value is obtained by calculating the difference between the training result and the corresponding class label using the loss function.
S1265, determining whether the first loss value remains unchanged.
In this embodiment, when the first loss value remains unchanged, the current convolutional neural network has converged, that is, the first loss value is essentially stable and very small, which indicates that it can be used as the entity recognition model. Generally the first loss value is relatively large at the start of training and becomes smaller as training proceeds. If the first loss value has not stabilized, the current convolutional neural network cannot yet be used as the entity recognition model, that is, the predicted categories are not accurate, which would make the later win rate prediction inaccurate.
S1266, if the first loss value has not stabilized, adjusting the parameters of the convolutional neural network and returning to step S1263 to obtain a new first training result.
In this embodiment, adjusting the parameter of the convolutional neural network refers to adjusting the weight value of each layer in the convolutional neural network. Through continuous training, a convolutional neural network meeting the requirements can be obtained.
S1267, if the first loss value is kept unchanged, inputting the first test set into the convolutional neural network for element classification to obtain a first test result.
In this embodiment, the first test result means that after the first test set is subjected to element classification, a corresponding element class can be obtained.
S1268, judging whether the first test result meets the requirement;
if the first test result does not meet the requirement, executing step S1266;
S1269, if the first test result meets the requirement, taking the convolutional neural network as the entity recognition model.
When the precision and recall of the first test result both meet the required conditions, the degree of fit meets the requirement and the first test result can be considered acceptable; otherwise it is not. Training stops when the convolutional neural network converges. The convolutional neural network is tested after training, and if the first test result is poor the training strategy is adjusted and the network is retrained. Of course, training and testing are interleaved during the training process so that the training state can be checked in real time; after training and testing are finished, the overall accuracy of the convolutional neural network is evaluated with the two indicators of precision and recall.
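The train / convergence-check / evaluate cycle described above can be sketched as a loop that stops once the loss stays unchanged and then reports precision and recall; `train_one_epoch` and `classify` are assumed placeholder methods for whichever CNN (or CNN+CRF) implementation is used, while the metrics come from scikit-learn.

```python
from sklearn.metrics import precision_score, recall_score

def fit_until_converged(model, train_set, test_set,
                        tol: float = 1e-4, patience: int = 3, max_epochs: int = 100):
    """Train until the loss stops changing, then check precision and recall."""
    previous_loss, still = float("inf"), 0
    for _ in range(max_epochs):
        loss = model.train_one_epoch(train_set)      # assumed method
        if abs(previous_loss - loss) < tol:
            still += 1
            if still >= patience:                    # loss has stayed unchanged
                break
        else:
            still = 0
        previous_loss = loss

    y_true = [label for _, label in test_set]
    y_pred = [model.classify(x) for x, _ in test_set]  # assumed method
    precision = precision_score(y_true, y_pred, average="macro")
    recall = recall_score(y_true, y_pred, average="macro")
    return precision, recall
```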
The key elements to be extracted change with the annotation labels, so the different requirements of business personnel can be met flexibly.
The convolutional neural network described above is optimized with a stochastic gradient descent algorithm. Because a CNN selects features automatically, its accuracy is higher; in particular, optimizing the CNN with SGD (Stochastic Gradient Descent) gives better results, and a smaller learning rate yields a better result.
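The patent does not name a deep-learning framework; the following PyTorch fragment is only an assumed illustration of a small text CNN paired with an SGD optimizer and a small learning rate, rather than the exact architecture claimed.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 100, num_classes: int = 14):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=2).values
        return self.fc(x)

model = TextCNN(vocab_size=50000)
# Small learning rate, as the text suggests for SGD-optimized CNNs.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```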
S127, extracting case key elements according to the key element categories and the target documents;
the participles in the target document are all used for determining the key requirement category, namely the content of the target document already determines the category of the key element, and the key element category and the target document can determine the key element.
And S130, performing word vectorization on the case key elements and inputting the resulting vectors into the win rate prediction model for win rate prediction to obtain a win rate estimate.
In this embodiment, the win rate estimate refers to the estimated probability of winning the dispute case, obtained from the key elements.
The win rate prediction model is obtained by word-vectorizing key elements carrying win and loss category labels and using the resulting vectors as a sample set to train a classification model.
Specifically, obtaining the win rate prediction model in this way may include steps S131 to S139.
S131, constructing a classification model and a second loss function.
In this embodiment, the classification model comprises a logistic regression model or a convolutional neural network model, the latter being a network with an input layer, convolutional layers and an output layer. The second loss function may be a Center Loss function.
S132, obtaining key elements carrying win and loss category labels, performing word vectorization on them to obtain vectors carrying the win and loss category labels, forming a sample set, and dividing the sample set into a second training set and a second test set.
In this embodiment, the second training set refers to data used for training the model, and the second test set refers to data used for testing the trained model.
The key elements are formed by taking documents from the judgment document database as raw data, applying the content filtering, word segmentation and part-of-speech tagging described above, and extracting the corresponding elements from the resulting documents; the extracted key elements are word-vectorized and labelled according to the corresponding win or loss results recorded in the judgment document database, giving the reference data that serves as input.
When labelling, the judgment result is combined with the identity of the plaintiff, i.e. whether the plaintiff is the lender or the borrower; taking the lender as the party of interest, a win is labelled 0 and a loss is labelled 1. The key elements are word-vectorized with methods such as tf-idf, one-hot or word2vec; the word vectors are the input, the class probabilities of win and loss are the output, and training uses a classification model such as LR (Logistic Regression) or a CNN (Convolutional Neural Network).
And S133, inputting the second training set into the classification model for classification training to obtain a second training result.
In this embodiment, the second training result refers to the probability, output after the second training set is fed into the classification model, of the category corresponding to the second training set, that is, the probabilities of the win and loss categories. The probability can be compared with a preset threshold: when the probability of the win category exceeds the threshold the class label is output as a win, otherwise as a loss.
Judgments whose results contain keywords such as "claim withdrawn" or "rejected" are labelled as losses, and the rest as wins; in particular, to improve the accuracy of the model, the numbers of winning and losing documents can be balanced.
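A scikit-learn sketch of the logistic-regression option: tf-idf vectors over the concatenated key elements of labelled cases, with class weighting to balance the win and loss documents. The library choice, the sample strings and the field layout are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Each sample: the concatenated key elements of one judged case; label 0 = win, 1 = loss.
texts = [
    "借款主体 朱某 贷款主体 东湖支行 借款金额 50000 借条 有效",
    "借款主体 李某 借条 无 还款记录 无",
    "借款主体 王某 贷款主体 某银行 借款金额 80000 借条 有效",
    "借款主体 赵某 合同 无效 还款记录 无",
]
labels = [0, 1, 0, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# class_weight="balanced" compensates for unequal numbers of win / loss documents.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, labels)
```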
And S134, calculating the difference between the second training result and the victory and the failure category labels by adopting a second loss function to obtain a second loss value.
In this embodiment, the second loss value is obtained by calculating the difference between the training result and the corresponding class label using the second loss function.
And S135, judging whether the second loss value is kept unchanged.
In this embodiment, when the second loss value remains unchanged, the current classification model has converged, that is, the second loss value is essentially stable and very small, which indicates that it can be used as the win rate prediction model. Generally the second loss value is relatively large at the start of training and becomes smaller as training proceeds. If the second loss value has not stabilized, the current classification model cannot yet be used as the win rate prediction model, that is, the predicted categories are not accurate, which would make later risk control inaccurate.
And S136, if the second loss value has not stabilized, adjusting the parameters of the classification model and returning to step S133.
In this embodiment, adjusting the parameters of the classification model refers to adjusting the weight values of the layers in the classification model. Through continuous training, a classification model meeting the requirements can be obtained.
And S137, if the second loss value remains unchanged, inputting the second test set into the classification model for win rate prediction to obtain a second test result.
In this embodiment, the second test result means that, after win rate prediction is performed on the second test set, the corresponding win or loss category can be obtained.
S138, judging whether the second test result meets the requirement or not;
if the second test result does not meet the requirement, executing the step S136;
and S139, if the second test result meets the requirement, taking the classification model as a success rate prediction model.
When the precision and recall of the second test result both meet the required conditions, the degree of fit meets the requirement and the second test result can be considered acceptable; otherwise it is not. Training stops when the classification model converges. The classification model is tested after training, and if the second test result is poor the training strategy is adjusted and the classification model is retrained. Of course, training and testing are interleaved during the training process so that the training state can be checked in real time; after training and testing are finished, the overall accuracy of the classification model is evaluated with the two indicators of precision and recall.
In addition, when the classification model is a convolutional neural network model, it can be optimized with a stochastic gradient descent algorithm. Because a CNN selects features automatically, its accuracy is higher; in particular, optimizing the CNN with SGD (Stochastic Gradient Descent) gives better results, and a smaller learning rate yields a better result.
And S140, sending the win rate estimate to the terminal for display at the terminal.
For example, the following passage describes the lending facts of a case, i.e. an intermediate document:
the original report Zhejiang village contract law after receiving the Zhu-third financial borrowing contract dispute one case from the institute in 2011 at 1, 24 days is to say that the Zhejiang village contract law is to say that the Zhu-first is to loan 5 ten thousand yuan to Chun east lake treatment at 2009 at 8, 3 days, 7 months at 20 days, the Anmo interest rate is 7.965 per thousand, and the Zhu-third in the Japanese is to bear the contract law and is to be reported not to pay principal and interest according to the contract agreement after the certain Zhu-third in the Japanese is to bear the contract so that the original report proposes the appeal to the institute: 1, the fact that the first defended original debt is not returned to the original report, namely the principal 5 ten thousand yuan and the fact that the third defended original report is not answered to the original report, and the fact that the third defended original report is not returned to the original report, the fact that the first defended original report has not been returned to the original report, and the fact that the third defended original report has not been answered to the original report, namely the fact that the first defended original report has not been returned to the original report, and the fact that the first defended original report has not been returned to the original report, namely the principal 5 million yuan, the interest 5992.69 yuan 2, the fact that a certain Zhu-propane of the defended original report has not been returned to the home.
The case key element extraction result is: 1. borrower: Zhu A and Zhu B; 2. lender: the East Lake sub-branch of the lending bank; 3. loan term: 1 year; 4. repayment date: 20 July 2010; 5. due date: none; 6. loan amount: 5,992.69 yuan; 7. contract and whether valid: one loan receipt; 8. presence of executable assets: none; 9. age of the borrower: none; 10. purpose of the loan: none; 11. bank transaction records: none; 12. loan interest: none; 13. repayment records: none.
The case key elements are used as the input of the win rate prediction model; the predicted probability of class 0 is 75.4%, that is, the lender is highly likely to win the lawsuit.
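Continuing the scikit-learn sketch above, inference for a new case would look roughly as follows (the element string is illustrative):

```python
new_case = "借款主体 朱某 贷款主体 东湖支行 借款期限 1年 借款金额 5992.69"
p_win = clf.predict_proba(vectorizer.transform([new_case]))[0, 0]
print(f"Predicted probability of class 0 (win for the lender): {p_win:.1%}")
```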
Throughout the prediction process no personal experience is involved and the data standard is uniform, which increases the accuracy of dispute case win rate prediction and helps financial companies control loan risk effectively. In addition, the judgment document database contains a large amount of document data related to financial disputes, so developing win rate prediction models on it improves a company's risk control capability as well as the utilization rate and value of public data.
According to the dispute case win rate prediction method, key elements are extracted from the dispute case to be predicted, features of the dispute case are extracted, elements are matched with regular expressions or extracted with an entity recognition model according to whether the document format is standard, and the extracted case key elements are input into the win rate prediction model for the corresponding win rate prediction, which improves the accuracy and efficiency of dispute case win rate prediction.
Fig. 3 is a schematic block diagram of a dispute case winning rate prediction apparatus 300 according to an embodiment of the present invention. As shown in fig. 3, the present invention further provides a dispute case winning rate predicting apparatus 300 corresponding to the dispute case winning rate predicting method. The dispute case winning rate prediction apparatus 300 includes a unit for performing the above-described dispute case winning rate prediction method, and may be configured in a server. Specifically, referring to fig. 3, the device 300 for predicting the winning rate of a dispute case includes a case obtaining unit 301, an extracting unit 302, a predicting unit 303 and a sending unit 304.
A case obtaining unit 301, configured to obtain a dispute case that needs to be predicted; an extracting unit 302, configured to perform key element extraction on a dispute case to be predicted, so as to obtain a case key element; the prediction unit 303 is configured to perform word vectorization on the case key elements, and then input the case key elements into the win rate prediction model to perform win rate prediction, so as to obtain a win rate estimation value; a sending unit 304, configured to send the win ratio estimation value to the terminal for displaying at the terminal.
In one embodiment, the extracting unit 302 includes a document acquiring subunit, an engineering extraction subunit, a judging subunit, a matching subunit, a preprocessing subunit, a category acquiring subunit, and an element extracting subunit.
The document acquiring subunit is used for acquiring related legal documents according to dispute cases needing to be predicted so as to obtain a document main body; the engineering extraction subunit is used for carrying out characteristic engineering extraction on the document main body to obtain a target document; a judging subunit, configured to judge whether the target document is a document with a standard format; the matching subunit is used for matching the target document through a regular expression to obtain case key elements if the target document is a document with a standard format; the preprocessing subunit is used for preprocessing the target document to obtain a word vector if the target document is not a document with a standard format; the category acquisition subunit is used for inputting the word vectors into the entity recognition model to perform element classification so as to obtain key element categories; and the element extraction subunit is used for extracting the case key elements according to the key element categories and the target documents.
In one embodiment, the document acquisition subunit includes a civil case acquisition module, a content filtering module, and a merging module.
The civil case acquisition module is used for acquiring civil cases according to the dispute case to be predicted; the content filtering module is used for filtering the content of the civil cases to obtain a preliminary document; and the merging module is used for merging the first-instance case and the second-instance case of the preliminary document by case number to obtain the document main body.
In one embodiment, the engineering extraction subunit includes a filtering module, a discarding module, and a labeling module.
The filtering module is used for filtering stop words and punctuation marks in the document main body to obtain a filtering result; the abandon module is used for abandoning the document contents with the document content length not meeting the requirement on the filtering result to obtain an intermediate document; and the labeling module is used for performing word segmentation processing and part-of-speech labeling on the intermediate document to obtain the target document.
In an embodiment, the extracting unit 302 further includes a model forming subunit, where the model forming subunit is configured to train a convolutional neural network through a vector obtained by performing word vectorization on a plurality of text data with key element classification labels, so as to obtain an entity recognition model.
In an embodiment, the model forming subunit includes a first building module, a data obtaining module, a first training module, a first loss calculating module, a first loss value determining module, a first adjusting module, a first testing module, and a first testing result determining module.
The first construction module is used for constructing a convolutional neural network and a first loss function; the data acquisition module is used for acquiring a plurality of text data carrying key element classification labels, performing word vectorization on the text data to obtain vectors carrying the key element classification labels, and dividing the vectors into a first training set and a first test set; the first training module is used for inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; the first loss calculation module is used for calculating the difference between the first training result and the key element classification labels using the first loss function to obtain a first loss value; the first loss value judging module is used for judging whether the first loss value remains unchanged; the first adjusting module is used for adjusting the parameters of the convolutional neural network if the first loss value has not stabilized and returning to the step of inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; the first test module is used for inputting the first test set into the convolutional neural network for element classification to obtain a first test result if the first loss value remains unchanged; the first test result judging module is used for judging whether the first test result meets the requirement, executing the step of adjusting the parameters of the convolutional neural network if the first test result does not meet the requirement, and taking the convolutional neural network as the entity recognition model if the first test result meets the requirement.
In an embodiment, the apparatus further includes a prediction model forming unit, where the prediction model forming unit is configured to word-vectorize key elements carrying win and loss category labels, use the resulting vectors as a sample set to train a classification model, and thereby form the win rate prediction model.
In an embodiment, the prediction model forming unit includes a second constructing subunit, a sample set forming subunit, a first training subunit, a first loss calculating subunit, a first loss value judging subunit, a first adjusting subunit, a first testing subunit, and a first testing result judging subunit.
The second construction subunit is used for constructing a classification model and a second loss function; the sample set forming subunit is used for acquiring key elements carrying win and loss category labels, performing word vectorization on them to obtain vectors carrying the win and loss category labels, forming a sample set, and dividing the sample set into a second training set and a second test set; the first training subunit is used for inputting the second training set into the classification model for classification training to obtain a second training result; the first loss calculating subunit is used for calculating the difference between the second training result and the win and loss category labels using the second loss function to obtain a second loss value; the first loss value judging subunit is used for judging whether the second loss value remains unchanged; the first adjusting subunit is used for adjusting the parameters of the classification model if the second loss value has not stabilized and returning to the step of inputting the second training set into the classification model for classification training to obtain a second training result; the first testing subunit is used for inputting the second test set into the classification model for win rate prediction to obtain a second test result if the second loss value remains unchanged; the first test result judging subunit is used for judging whether the second test result meets the requirement, executing the step of adjusting the parameters of the classification model if the second test result does not meet the requirement, and taking the classification model as the win rate prediction model if the second test result meets the requirement; wherein the classification model comprises a logistic regression model or a convolutional neural network model.
It should be noted that, as can be clearly understood by those skilled in the art, the detailed implementation process of the dispute case success rate prediction apparatus 300 and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, no further description is provided herein.
The dispute case winning rate prediction apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer programs 5032 include program instructions that, when executed, cause the processor 502 to perform a dispute case win ratio prediction method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be enabled to execute a dispute case win ratio prediction method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device 500 to which the present application may be applied, and that a particular computer device 500 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
acquiring the dispute case to be predicted; extracting key elements from the dispute case to be predicted to obtain case key elements; performing word vectorization on the case key elements and inputting the resulting vectors into the win rate prediction model for win rate prediction to obtain a win rate estimate; sending the win rate estimate to the terminal for display at the terminal; the win rate prediction model is obtained by word-vectorizing key elements carrying win and loss category labels and using the resulting vectors as a sample set to train a classification model.
In an embodiment, when the processor 502 implements the step of extracting the key elements of the dispute case to be predicted to obtain the case key elements, the following steps are specifically implemented:
acquiring related legal documents according to dispute cases needing to be predicted to obtain document main bodies; carrying out feature engineering extraction on the document main body to obtain a target document; judging whether the target document is a document with a standard format; if the target document is a document with a standard format, matching the target document through a regular expression to obtain case key elements; if the target document is not a document with a standard format, preprocessing the target document to obtain word vectors; inputting the word vectors into an entity recognition model for element classification to obtain key element categories; extracting case key elements according to the key element categories and the target documents; the entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels.
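A rough sketch of the branch between regular-expression matching for standard-format documents and entity recognition for other documents is given below; the regular expressions, the format heuristic, and the ner_model stand-in are illustrative assumptions only:

import re

# Illustrative patterns for key elements in standard-format documents
STANDARD_PATTERNS = {
    "plaintiff": re.compile(r"原告[:：]?\s*(\S+?)[,，。]"),
    "defendant": re.compile(r"被告[:：]?\s*(\S+?)[,，。]"),
    "claim_amount": re.compile(r"诉讼请求.*?(\d+(?:\.\d+)?)元"),
}

def is_standard_format(doc_text):
    # Assumed heuristic: treat a document as standard format if it contains
    # the usual party headings
    return "原告" in doc_text and "被告" in doc_text

def extract_key_elements(doc_text, ner_model=None):
    if is_standard_format(doc_text):
        # Standard-format documents: match key elements through regular expressions
        return {name: m.group(1)
                for name, pattern in STANDARD_PATTERNS.items()
                if (m := pattern.search(doc_text))}
    # Non-standard documents: fall back to an entity recognition model
    # (ner_model stands in for the trained network described below)
    return ner_model(doc_text) if ner_model else {}

print(extract_key_elements("原告张三，被告李四，诉讼请求偿还借款50000元。"))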
In an embodiment, when the processor 502 implements the step of acquiring the related legal documents according to the dispute case to be predicted to obtain the document main body, the following steps are specifically implemented:
acquiring a civil case according to a dispute case to be predicted; filtering the content of the civil case to obtain a primary document; and merging the first-instance document and the second-instance document of the primary document through the case number to obtain a document main body.
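One possible sketch of assembling the document main body, filtering civil-case records and merging first-instance and second-instance documents through the case number, is given below; the record fields (case_no, instance, text) are illustrative assumptions:

from collections import defaultdict

def build_document_bodies(civil_case_docs):
    # Content filtering: keep only records that actually carry document text
    grouped = defaultdict(list)
    for doc in civil_case_docs:
        if doc.get("text"):
            grouped[doc["case_no"]].append(doc)
    bodies = {}
    for case_no, docs in grouped.items():
        # Merge first-instance and second-instance documents sharing a case number
        docs.sort(key=lambda d: d.get("instance", 1))
        bodies[case_no] = "\n".join(d["text"] for d in docs)
    return bodies

docs = [
    {"case_no": "(2019)粤0305民初1号", "instance": 1, "text": "一审判决书正文"},
    {"case_no": "(2019)粤0305民初1号", "instance": 2, "text": "二审判决书正文"},
    {"case_no": "(2019)粤0305民初2号", "instance": 1, "text": ""},
]
print(build_document_bodies(docs))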
In an embodiment, when the processor 502 performs the feature engineering extraction on the document main body to obtain the target document, the following steps are specifically implemented:
filtering stop words and punctuation marks in the document main body to obtain a filtering result; discarding, from the filtering result, document content whose length does not meet the requirement, to obtain an intermediate document; and performing word segmentation processing and part-of-speech tagging on the intermediate document to obtain a target document.
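This feature engineering step can be sketched roughly as follows using the jieba library for word segmentation and part-of-speech tagging; the stop-word list, the punctuation pattern, and the minimum length threshold are illustrative assumptions:

import re
import jieba.posseg as pseg

STOP_WORDS = {"的", "了", "在", "是", "和"}   # assumed; normally loaded from a file
MIN_LENGTH = 50                                # assumed minimum content length

def feature_engineering(document_body):
    # Strip punctuation marks and collapse whitespace
    filtered = re.sub(r"[，。；：、！？“”\s]+", " ", document_body)
    # Discard document content whose length does not meet the requirement
    if len(filtered) < MIN_LENGTH:
        return None
    # Word segmentation and part-of-speech tagging, dropping stop words
    return [(pair.word, pair.flag) for pair in pseg.cut(filtered)
            if pair.word.strip() and pair.word not in STOP_WORDS]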
In an embodiment, when the processor 502 implements the step that the entity recognition model is obtained by training the convolutional neural network using a vector obtained by word vectorizing a plurality of text data with key element classification labels, the following steps are specifically implemented:
constructing a convolutional neural network and a first loss function; acquiring a plurality of text data with key element classification labels, performing word vectorization on the text data to obtain vectors with the key element classification labels, and dividing the vectors with the key element classification labels into a first training set and a first test set; inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; calculating the difference between the first training result and the key element classification labels by adopting the first loss function to obtain a first loss value; judging whether the first loss value remains unchanged; if the first loss value does not remain unchanged, adjusting parameters of the convolutional neural network, and returning to the step of inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; if the first loss value remains unchanged, inputting the first test set into the convolutional neural network for element classification to obtain a first test result; judging whether the first test result meets the requirement; if the first test result does not meet the requirement, executing the step of adjusting the parameters of the convolutional neural network; and if the first test result meets the requirement, taking the convolutional neural network as the entity recognition model.
Wherein the convolutional neural network is optimized by a stochastic gradient descent algorithm.
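A minimal PyTorch sketch of this convolutional training loop is given below; the network architecture, the dimensions, the toy data, and the loss-plateau threshold are illustrative assumptions rather than the concrete configuration of this application:

import torch
import torch.nn as nn

class ElementCNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):                       # x: (batch, seq_len) token ids
        h = self.embed(x).transpose(1, 2)       # (batch, embed_dim, seq_len)
        h = torch.relu(self.conv(h)).max(dim=2).values
        return self.fc(h)

# Toy data stands in for the word-vectorized text samples with element labels
x_train = torch.randint(0, 1000, (128, 20))
y_train = torch.randint(0, 5, (128,))
x_test = torch.randint(0, 1000, (32, 20))
y_test = torch.randint(0, 5, (32,))

model = ElementCNN()
loss_fn = nn.CrossEntropyLoss()                             # first loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)    # stochastic gradient descent

prev_loss = float("inf")
for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)                 # first loss value
    loss.backward()
    optimizer.step()
    if abs(prev_loss - loss.item()) < 1e-4:                 # loss value remains unchanged
        break
    prev_loss = loss.item()

with torch.no_grad():
    first_test_result = (model(x_test).argmax(dim=1) == y_test).float().mean().item()
# If the first test result meets the requirement, the network is kept as the entity recognition model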
In an embodiment, when the processor 502 implements the step in which the win rate prediction model is obtained by carrying out word vectorization on key elements with winning and losing category labels, using the resulting vectors as a sample set, and training a classification model, the following steps are specifically implemented:
constructing a classification model and a second loss function; acquiring key elements with winning and losing category labels, carrying out word vectorization on the key elements to obtain vectors with the winning and losing category labels to form a sample set, and dividing the sample set into a second training set and a second test set; inputting the second training set into the classification model for classification training to obtain a second training result; calculating the difference between the second training result and the winning and losing category labels by adopting the second loss function to obtain a second loss value; judging whether the second loss value remains unchanged; if the second loss value does not remain unchanged, adjusting parameters of the classification model, and returning to the step of inputting the second training set into the classification model for classification training to obtain a second training result; if the second loss value remains unchanged, inputting the second test set into the classification model for win rate prediction to obtain a second test result; judging whether the second test result meets the requirement; if the second test result does not meet the requirement, executing the step of adjusting the parameters of the classification model; if the second test result meets the requirement, taking the classification model as the win rate prediction model; wherein the classification model comprises a logistic regression model or a convolutional neural network model.
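Training the win rate prediction model with a logistic regression classification model can be sketched roughly as follows; the toy sample set, the averaged-vector features, and the accuracy threshold taken as "meeting the requirement" are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-ins for word-vectorized key elements with winning/losing labels
X = rng.normal(size=(200, 100))        # one averaged key-element vector per historical case
y = rng.integers(0, 2, size=200)       # 1 = winning, 0 = losing

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=500).fit(X_train, y_train)   # classification training
second_test_result = clf.score(X_test, y_test)                 # accuracy on the second test set

REQUIRED_ACCURACY = 0.7                                        # assumed requirement
win_rate_model = clf if second_test_result >= REQUIRED_ACCURACY else None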
It should be understood that, in the embodiment of the present application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
acquiring a dispute case to be predicted; extracting key elements of the dispute case to be predicted to obtain case key elements; carrying out word vectorization on the case key elements, and inputting the resulting vectors into a win rate prediction model to carry out win rate prediction so as to obtain a win rate estimation value; and sending the win rate estimation value to a terminal for display at the terminal; wherein the win rate prediction model is obtained by carrying out word vectorization on key elements with winning and losing category labels, using the resulting vectors as a sample set, and training a classification model with the sample set.
In an embodiment, when the processor executes the computer program to extract the key elements of the dispute case to be predicted, so as to obtain the case key elements, the following steps are specifically implemented:
acquiring related legal documents according to dispute cases needing to be predicted to obtain document main bodies; carrying out feature engineering extraction on the document main body to obtain a target document; judging whether the target document is a document with a standard format; if the target document is a document with a standard format, matching the target document through a regular expression to obtain case key elements; if the target document is not a document with a standard format, preprocessing the target document to obtain word vectors; inputting the word vectors into an entity recognition model for element classification to obtain key element categories; extracting case key elements according to the key element categories and the target documents; the entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels.
In an embodiment, when the processor executes the computer program to implement the step of acquiring the related legal documents according to the dispute case to be predicted to obtain the document main body, the following steps are specifically implemented:
acquiring a civil case according to a dispute case to be predicted; filtering the content of the civil case to obtain a primary document; and merging the first trial case and the second trial case of the primary document through the case number to obtain a document main body.
In an embodiment, when the processor executes the computer program to perform the feature engineering extraction on the document main body to obtain the target document, the following steps are specifically implemented:
filtering stop words and punctuation marks in the document main body to obtain a filtering result; discarding, from the filtering result, document content whose length does not meet the requirement, to obtain an intermediate document; and performing word segmentation processing and part-of-speech tagging on the intermediate document to obtain a target document.
In an embodiment, when the processor executes the computer program to implement the step that the entity recognition model is obtained by training a convolutional neural network with a vector obtained by word vectorizing a plurality of text data with key element classification labels, the following steps are specifically implemented:
constructing a convolutional neural network and a first loss function; acquiring a plurality of text data with key element classification labels, performing word vectorization on the text data to obtain vectors with the key element classification labels, and dividing the vectors with the key element classification labels into a first training set and a first test set; inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; calculating the difference between the first training result and the key element classification labels by adopting the first loss function to obtain a first loss value; judging whether the first loss value remains unchanged; if the first loss value does not remain unchanged, adjusting parameters of the convolutional neural network, and returning to the step of inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result; if the first loss value remains unchanged, inputting the first test set into the convolutional neural network for element classification to obtain a first test result; judging whether the first test result meets the requirement; if the first test result does not meet the requirement, executing the step of adjusting the parameters of the convolutional neural network; and if the first test result meets the requirement, taking the convolutional neural network as the entity recognition model.
The convolutional neural network is optimized by a stochastic gradient descent algorithm.
In an embodiment, when the processor executes the computer program to implement the step in which the win rate prediction model is obtained by carrying out word vectorization on key elements with winning and losing category labels, using the resulting vectors as a sample set, and training a classification model, the processor specifically implements the following steps:
constructing a classification model and a second loss function; acquiring key elements with winning and losing category labels, carrying out word vectorization on the key elements to obtain vectors with the winning and losing category labels to form a sample set, and dividing the sample set into a second training set and a second test set; inputting the second training set into the classification model for classification training to obtain a second training result; calculating the difference between the second training result and the winning and losing category labels by adopting the second loss function to obtain a second loss value; judging whether the second loss value remains unchanged; if the second loss value does not remain unchanged, adjusting parameters of the classification model, and returning to the step of inputting the second training set into the classification model for classification training to obtain a second training result; if the second loss value remains unchanged, inputting the second test set into the classification model for win rate prediction to obtain a second test result; judging whether the second test result meets the requirement; if the second test result does not meet the requirement, executing the step of adjusting the parameters of the classification model; if the second test result meets the requirement, taking the classification model as the win rate prediction model; wherein the classification model comprises a logistic regression model or a convolutional neural network model.
The storage medium may be a USB flash disk, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. The dispute case winning rate prediction method is characterized by comprising the following steps:
acquiring dispute cases needing to be predicted;
extracting key elements of dispute cases needing prediction to obtain case key elements;
carrying out word vectorization on the case key elements, and inputting the resulting vectors into a winning rate prediction model to carry out winning rate prediction so as to obtain a winning rate estimation value;
sending the estimated value of the winning rate to a terminal for displaying at the terminal;
wherein the winning rate prediction model is obtained by carrying out word vectorization on key elements with winning and losing category labels, using the resulting vectors as a sample set, and training a classification model with the sample set.
2. The dispute case winning rate prediction method according to claim 1, wherein the extracting key elements of the dispute case to be predicted to obtain case key elements comprises:
acquiring related legal documents according to dispute cases needing to be predicted to obtain document main bodies;
carrying out feature engineering extraction on the document main body to obtain a target document;
judging whether the target document is a document with a standard format;
if the target document is a document with a standard format, matching the target document through a regular expression to obtain case key elements;
if the target document is not a document with a standard format, preprocessing the target document to obtain word vectors;
inputting the word vectors into an entity recognition model for element classification to obtain key element categories;
extracting case key elements according to the key element categories and the target documents;
the entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels.
3. The dispute case winning rate prediction method according to claim 2, wherein the obtaining of the relevant legal documents from the dispute cases predicted as required to obtain the document main body comprises:
acquiring a civil case according to a dispute case to be predicted;
filtering the content of the civil case to obtain a primary document;
and merging the first trial case and the second trial case of the primary document through the case number to obtain a document main body.
4. The dispute case winning rate prediction method according to claim 2, wherein the performing feature engineering extraction on the document main body to obtain the target document comprises:
filtering stop words and punctuation marks in the document main body to obtain a filtering result;
discarding, from the filtering result, document content whose length does not meet the requirement, to obtain an intermediate document;
and performing word segmentation processing and part-of-speech tagging on the intermediate document to obtain a target document.
5. The dispute case winning rate prediction method according to claim 2, wherein the entity recognition model is obtained by training a convolutional neural network through vectors obtained by word vectorization of a plurality of text data with key element classification labels, and comprises:
constructing a convolutional neural network and a first loss function;
acquiring a plurality of text data with key element classification labels, performing word vectorization on the text data to obtain vectors with the key element classification labels, and dividing the vectors with the key element classification labels into a first training set and a first test set;
inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result;
calculating the difference between the first training result and the key element classification label by adopting a first loss function to obtain a first loss value;
judging whether the first loss value is kept unchanged;
if the first loss value does not remain unchanged, adjusting parameters of the convolutional neural network, and returning to the step of inputting the first training set into the convolutional neural network for convolutional training to obtain a first training result;
if the first loss value is kept unchanged, inputting the first test set into a convolutional neural network for element classification to obtain a first test result;
judging whether the first test result meets the requirement or not;
if the first test result does not meet the requirement, executing the step of adjusting the parameters of the convolutional neural network;
and if the first test result meets the requirement, taking the convolutional neural network as an entity recognition model.
6. The dispute scenario winning rate prediction method according to claim 5, wherein the convolutional neural network is optimized by a stochastic gradient descent algorithm.
7. The dispute case winning rate prediction method according to claim 1, wherein obtaining the winning rate prediction model by carrying out word vectorization on key elements with winning and losing category labels, using the resulting vectors as a sample set, and training a classification model comprises:
constructing a classification model and a second loss function;
obtaining key elements with winning and losing category labels, carrying out word vectorization on the key elements to obtain vectors with the winning and losing category labels to form a sample set, and dividing the sample set into a second training set and a second test set;
inputting the second training set into the classification model for classification training to obtain a second training result;
calculating the difference between the second training result and the winning and losing category labels by adopting the second loss function to obtain a second loss value;
judging whether the second loss value is kept unchanged;
if the second loss value does not remain unchanged, adjusting parameters of the classification model, and returning to the step of inputting the second training set into the classification model for classification training to obtain a second training result;
if the second loss value is kept unchanged, inputting a second test set into the classification model for carrying out the win rate prediction to obtain a second test result;
judging whether the second test result meets the requirement or not;
if the second test result does not meet the requirement, executing the step of adjusting the parameters of the classification model;
if the second test result meets the requirement, taking the classification model as the winning rate prediction model;
wherein the classification model comprises a logistic regression model or a convolutional neural network model.
8. A dispute case winning rate prediction device is characterized by comprising:
the case acquiring unit is used for acquiring dispute cases needing to be predicted;
the extraction unit is used for extracting key elements of dispute cases needing prediction to obtain case key elements;
the prediction unit is used for carrying out word vectorization on the case key elements and inputting the resulting vectors into the winning rate prediction model to carry out winning rate prediction so as to obtain a winning rate estimation value;
and the sending unit is used for sending the winning rate estimation value to the terminal so as to display the winning rate estimation value on the terminal.
9. A computer device, characterized in that the computer device comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program implements the method according to any of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201911267505.3A 2019-12-11 2019-12-11 Dispute case victory rate prediction method and device, computer equipment and storage medium Pending CN111047092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911267505.3A CN111047092A (en) 2019-12-11 2019-12-11 Dispute case victory rate prediction method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911267505.3A CN111047092A (en) 2019-12-11 2019-12-11 Dispute case victory rate prediction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111047092A true CN111047092A (en) 2020-04-21

Family

ID=70235696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911267505.3A Pending CN111047092A (en) 2019-12-11 2019-12-11 Dispute case victory rate prediction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111047092A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590131A (en) * 2017-10-16 2018-01-16 北京神州泰岳软件股份有限公司 A kind of specification document processing method, apparatus and system
CN110532538A (en) * 2018-05-24 2019-12-03 中国科学院沈阳计算技术研究所有限公司 Property dispute judgement document's critical entities extraction algorithm
CN109992782A (en) * 2019-04-02 2019-07-09 深圳市华云中盛科技有限公司 Legal documents name entity recognition method, device and computer equipment
CN110377632A (en) * 2019-06-17 2019-10-25 平安科技(深圳)有限公司 Lawsuit prediction of result method, apparatus, computer equipment and storage medium
CN110276075A (en) * 2019-06-21 2019-09-24 腾讯科技(深圳)有限公司 Model training method, name entity recognition method, device, equipment and medium
CN110490439A (en) * 2019-08-05 2019-11-22 北京市律典通科技有限公司 Litigation risk appraisal procedure, device, electronic equipment and computer can storage mediums

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541075A (en) * 2020-10-30 2021-03-23 中科曙光南京研究院有限公司 Method and system for extracting standard case time of warning situation text
CN112541075B (en) * 2020-10-30 2024-04-05 中科曙光南京研究院有限公司 Standard case sending time extraction method and system for alert text
CN112966072A (en) * 2021-03-11 2021-06-15 暨南大学 Case prediction method and device, electronic device and storage medium
JP7047231B1 (en) * 2021-06-25 2022-04-05 株式会社Robot Consulting Information processing systems, computer systems and programs
WO2022270329A1 (en) * 2021-06-25 2022-12-29 株式会社Robot Consulting Information processing system, computer system, and program

Similar Documents

Publication Publication Date Title
CN108846520B (en) Loan overdue prediction method, loan overdue prediction device and computer-readable storage medium
WO2019196546A1 (en) Method and apparatus for determining risk probability of service request event
CN108550065B (en) Comment data processing method, device and equipment
CN110163242B (en) Risk identification method and device and server
CN108256098B (en) Method and device for determining emotional tendency of user comment
CN111047092A (en) Dispute case victory rate prediction method and device, computer equipment and storage medium
CN109087163A (en) The method and device of credit evaluation
CN111062834A (en) Dispute case entity identification method and device, computer equipment and storage medium
CN112102073A (en) Credit risk control method and system, electronic device and readable storage medium
CN107705036A (en) Dynamic credit estimation method and system based on multi-dimensional data
CN113989019A (en) Method, device, equipment and storage medium for identifying risks
CN113034263A (en) Credit approval method, credit approval device, credit approval server and credit approval medium
CN110728142A (en) Method and device for identifying running files, computer storage medium and electronic equipment
CN113919432A (en) Classification model construction method, data classification method and device
CN112907371A (en) Training method of wind control model
CN112463922A (en) Risk user identification method and storage medium
CN111915312A (en) Risk identification method and device and electronic equipment
CN109635289A (en) Entry classification method and audit information abstracting method
CN112632219B (en) Method and device for intercepting junk short messages
CN110570301B (en) Risk identification method, device, equipment and medium
Xia et al. Analysis and prediction of telecom customer churn based on machine learning
Spliethöver et al. No word embedding model is perfect: Evaluating the representation accuracy for social bias in the media
CN114036923A (en) Document false identification system and method based on text similarity
CN112560843B (en) Bank bill expansion method and system based on image recognition and voice recognition
US20210217115A1 (en) System and method for matching data inputs to modules for compatability analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination