CN112685396A

CN112685396A - Financial data violation detection method and device, computer equipment and storage medium

Info

Publication number: CN112685396A
Application number: CN202011606090.0A
Authority: CN
Inventors: 徐欢 (Xu Huan)
Original assignee: Ping An Puhui Enterprise Management Co Ltd
Current assignee: Ping An Puhui Enterprise Management Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-20

Abstract

The embodiments of the invention disclose a financial data violation detection method and device, computer equipment and a storage medium, relating to the field of artificial intelligence. The method comprises the following steps:

- performing data cleaning on a historical financial data set to obtain a sample data set;
- constructing a knowledge graph from the sample data set;
- acquiring node vectors of the nodes of the knowledge graph and adding them to a preset training sample set;
- training a preset text classification model with the training sample set;
- performing data cleaning on initial financial data to obtain financial data to be detected;
- adding the financial data to be detected to the knowledge graph and acquiring its node vector;
- inputting that node vector into the trained text classification model;
- if the label of the financial data to be detected is violation, sending violation warning information to a preset supervision terminal.

Compared with manual checking, the method is more efficient and more accurate, and it greatly reduces cost.

Description

Financial data violation detection method and device, computer equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a financial data violation detection method and device, computer equipment and a storage medium.

Background

At present, during an enterprise's monthly financial close, business staff must check whether the financial data are compliant, which consumes a large amount of manpower. Manual checking is also highly subjective and prone to inaccuracy.

In the prior art, therefore, the compliance of financial data is checked manually. This approach is inefficient and inaccurate, and it incurs very high labor cost.

Disclosure of Invention

The embodiments of the invention provide a financial data violation detection method and device, computer equipment and a storage medium. They aim to solve the low efficiency, low accuracy and high cost of conventional financial data compliance checking.

In a first aspect, an embodiment of the present invention provides a method for detecting violation of financial data, including:

obtaining a historical financial data set, the historical financial data set comprising a plurality of tagged financial data, the tags comprising violations and compliance;

performing data cleaning processing on the historical financial data set to obtain a sample data set, wherein the sample data set comprises a plurality of sample data, and the sample data is obtained by performing data cleaning processing on financial data;

constructing a knowledge graph according to the sample data set, wherein nodes of the knowledge graph are sample data;

acquiring node vectors of the nodes of the knowledge graph, and adding the node vectors of the nodes of the knowledge graph to a preset training sample set;

training a preset text classification model through the training sample set;

if initial financial data are received, performing data cleaning processing on the initial financial data to obtain to-be-detected financial data;

adding the financial data to be detected into the knowledge graph, and acquiring a node vector of the financial data to be detected;

inputting the node vector of the financial data to be detected into the trained text classification model and outputting a label of the financial data to be detected;

and if the label of the financial data to be detected is violation, sending violation warning information to a preset supervision terminal.

In a second aspect, an embodiment of the present invention further provides a device for detecting violation of financial data, where the device includes:

a first obtaining unit, configured to obtain a historical financial data set, where the historical financial data set includes a plurality of tagged financial data, and the tags include violations and compliance;

the first cleaning unit is used for carrying out data cleaning processing on the historical financial data set to obtain a sample data set, wherein the sample data set comprises a plurality of sample data, and the sample data is obtained by carrying out data cleaning processing on financial data;

the construction unit is used for constructing a knowledge graph according to the sample data set, wherein nodes of the knowledge graph are sample data;

the second acquisition unit is used for acquiring the node vectors of the nodes of the knowledge graph and adding the node vectors of the nodes of the knowledge graph to a preset training sample set;

the training unit is used for training a preset text classification model through the training sample set;

the second cleaning unit is used for cleaning the initial financial data to obtain the financial data to be detected if the initial financial data is received;

the third acquisition unit is used for adding the financial data to be detected into the knowledge graph and acquiring the node vector of the financial data to be detected;

the input unit is used for inputting the node vector of the financial data to be detected into the trained text classification model and outputting the label of the financial data to be detected;

and the sending unit is used for sending violation warning information to a preset supervision terminal if the tag of the financial data to be detected is violation.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.

In a fourth aspect, the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program can implement the above method when being executed by a processor.

The embodiments of the invention provide a financial data violation detection method and device, computer equipment and a storage medium. The method comprises the following steps:

- obtaining a historical financial data set comprising a plurality of labeled financial data, the labels being violation and compliance;
- performing data cleaning on the historical financial data set to obtain a sample data set comprising a plurality of sample data, each obtained by cleaning one financial data item;
- constructing a knowledge graph from the sample data set, the nodes of the knowledge graph being sample data;
- acquiring node vectors of the nodes of the knowledge graph and adding them to a preset training sample set;
- training a preset text classification model with the training sample set;
- if initial financial data are received, performing data cleaning on them to obtain financial data to be detected;
- adding the financial data to be detected to the knowledge graph and acquiring its node vector;
- inputting that node vector into the trained text classification model, which outputs the label of the financial data to be detected;
- if the label is violation, sending violation warning information to a preset supervision terminal.

In this way, non-compliant financial data can be detected automatically. Compared with manual checking, the method is more efficient and more accurate, and it greatly reduces cost.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention. A person skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic view of an application scenario of a financial data violation detection method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a financial data violation detection method according to an embodiment of the present invention;

Fig. 3 is a sub-flow diagram of a financial data violation detection method according to an embodiment of the present invention;

Fig. 4 is a sub-flow diagram of a financial data violation detection method according to an embodiment of the present invention;

Fig. 5 is a sub-flow diagram of a financial data violation detection method according to an embodiment of the present invention;

Fig. 6 is a sub-flow diagram of a financial data violation detection method according to an embodiment of the present invention;

Fig. 7 is a sub-flow diagram of a financial data violation detection method according to an embodiment of the present invention;

Fig. 8 is a schematic block diagram of a financial data violation detection device according to an embodiment of the present invention;

Fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic view of an application scenario of a financial data violation detection method according to an embodiment of the present invention, and Fig. 2 is a schematic flowchart of the method. The financial data violation detection method is applied to a server 20, which performs the following operations:

- The server 20 obtains a historical financial data set comprising a plurality of labeled financial data, the labels being violation and compliance.
- The server 20 performs data cleaning on the historical financial data set to obtain a sample data set comprising a plurality of sample data, each obtained by cleaning one financial data item.
- The server 20 constructs a knowledge graph from the sample data set, the nodes of the knowledge graph being sample data.
- The server 20 acquires node vectors of the nodes of the knowledge graph and adds them to a preset training sample set.
- The server 20 trains a preset text classification model with the training sample set.
- If initial financial data are received, the server 20 performs data cleaning on them to obtain financial data to be detected.
- The server 20 adds the financial data to be detected to the knowledge graph and obtains its node vector.
- The server 20 inputs that node vector into the trained text classification model, which outputs the label of the financial data to be detected.
- If the label is violation, the server 20 sends violation warning information to a preset supervision terminal 10.

Fig. 2 is a schematic flowchart of a financial data violation detection method according to an embodiment of the present invention. The invention can be applied to the following scenarios, thereby promoting the construction of smart cities:

- smart government affairs;
- smart city management;
- smart communities;
- smart security;
- smart logistics;
- smart healthcare;
- smart education;
- smart environmental protection;
- smart transportation.

As shown in Fig. 2, the method includes the following steps S1-S9.

S1, obtaining a historical financial data set, wherein the historical financial data set comprises a plurality of tagged financial data, and the tags comprise violations and compliance.

In a specific implementation, the historical financial data set is a set of the enterprise's financial data from past years. For example, it may specifically be the enterprise's financial data from the last two years.

The historical financial data set includes a plurality of labeled financial data, the labels being violation and compliance. Financial data here means text data that records financial reimbursement information. For example, in one embodiment the financial data include a reimbursement request form and the corresponding electronic invoice.

A label of compliance indicates that the financial data are compliant, and a label of violation indicates that they are not. The labels of the financial data used as training data are annotated manually by those skilled in the art.

S2, performing data cleaning processing on the historical financial data set to obtain a sample data set, wherein the sample data set comprises a plurality of sample data, and the sample data are obtained by performing data cleaning processing on financial data.

Specifically, data cleansing is the process of reviewing and verifying data in order to remove duplicate information, correct errors and ensure data consistency. Data cleaning improves the accuracy of subsequent model computation.

Specifically, data cleaning is performed one by one on the financial data in the historical financial data set. Cleaning each financial data item yields one sample data item, and these sample data together form the sample data set.
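The description does not specify the cleaning rules. The sketch below assumes three illustrative rules:

- Unicode NFKC normalization, which folds full-width characters to half-width;
- collapsing runs of whitespace;
- dropping empty records and exact duplicates after cleaning.

The function names and the (text, label) record layout are hypothetical.

```python
import unicodedata
from typing import Iterable


def clean_record(text: str) -> str:
    """Normalize one financial-data text record (illustrative rules only)."""
    # NFKC folds full-width digits/letters (common in Chinese documents) to half-width.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and strip both ends.
    return " ".join(text.split())


def clean_dataset(records: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Clean (text, label) pairs one by one; keep the first copy of each cleaned text."""
    seen: set[str] = set()
    samples: list[tuple[str, str]] = []
    for text, label in records:
        cleaned = clean_record(text)
        if not cleaned or cleaned in seen:
            continue  # skip empty records and duplicates
        seen.add(cleaned)
        samples.append((cleaned, label))
    return samples
```

With these rules, two records that differ only in spacing and full-width digits collapse into one sample. Cleaning real reimbursement data would probably also need field-level checks, such as amount and date formats, which the source does not describe.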

S3, constructing a knowledge graph according to the sample data set, wherein the nodes of the knowledge graph are sample data.

In a specific implementation, a knowledge graph is a type of graph. A graph consists of a finite, non-empty set of nodes and a set of edges between the nodes. It is usually written as G = (V, E), where G denotes the graph, V is the set of nodes in G, and E is the set of edges in G.

A graph is a more complex data structure than a linear table and a tree. In the graph, the relationship between the nodes is arbitrary, and any two nodes may be related to each other.

A graph is a many-to-many data structure consisting of a node set and an edge set, where the edges express the relationships between the nodes. If two nodes of the knowledge graph are associated, an edge exists between them; otherwise, there is no edge between them.

In this embodiment, the nodes of the knowledge graph are sample data, and if there is a correlation between the sample data, an edge is established between the sample data.

Referring to fig. 3, in an embodiment, the step S3 specifically includes: S31-S34.

S31, adding the sample data in the sample data set to the node set of the knowledge graph as nodes of the knowledge graph.

S32, obtaining the cosine distance between every pair of nodes in the node set.

In a specific implementation, the similarity between texts is characterized by the cosine distance. Here the "cosine distance" is the cosine of the angle between the vectors of the two texts, that is, what is usually called cosine similarity. The larger its value, the more similar the two texts. The cosine distance is obtained for every pair of nodes in the node set and can be computed from TF-IDF vectors of the texts, which is not described in detail herein.
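For reference, one common form of the TF-IDF weighting and the cosine measure is given below. The description does not say which TF-IDF variant it uses, so the plain log(N/df) form is an assumption. The symbols are:

- N: the number of texts;
- tf(t, d): the count of term t in text d;
- df(t): the number of texts that contain t.

```latex
w_{t,d} = \mathrm{tf}(t,d)\,\log\frac{N}{\mathrm{df}(t)}, \qquad
\cos(d_i, d_j) = \frac{\sum_{t} w_{t,d_i}\, w_{t,d_j}}
{\sqrt{\sum_{t} w_{t,d_i}^{2}}\;\sqrt{\sum_{t} w_{t,d_j}^{2}}}
```

Because these weights are non-negative, the value lies in [0, 1]. A larger value means more similar texts, which matches how the description uses the term "cosine distance".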

S33, judging whether the cosine distance between two nodes in the node set is greater than a preset cosine distance threshold.

In a specific implementation, the cosine distance threshold may be set by a person skilled in the art and is not specifically limited by the present invention. For example, in one embodiment the threshold is set to 0.5.

S34, if the cosine distance between two nodes in the node set is greater than the preset cosine distance threshold, establishing an edge between the two nodes and adding that edge to the edge set of the knowledge graph.

In a specific implementation, an edge is established between two nodes whenever their cosine distance exceeds the preset threshold, and the edge is added to the edge set of the knowledge graph. The weight of the edge is set to the cosine distance between the two nodes.
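Steps S31-S34 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. It makes these assumptions:

- the sample data have already been segmented into tokens;
- TF-IDF weights use the plain form tf·log(N/df);
- the score is the larger-is-more-similar cosine value that the description calls the cosine distance;
- the threshold is the example value of 0.5.

The function names and the dictionary-based edge set are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vector (term -> weight) for each tokenized sample."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(docs: list[list[str]], threshold: float = 0.5):
    """S31-S34: nodes are sample indices; edge (i, j) exists when cos > threshold."""
    vecs = tfidf_vectors(docs)
    nodes = set(range(len(docs)))                      # S31: node set
    edges: dict[tuple[int, int], float] = {}           # edge set with weights
    for i, j in combinations(range(len(docs)), 2):     # S32: every pair of nodes
        sim = cosine(vecs[i], vecs[j])
        if sim > threshold:                            # S33: threshold test
            edges[(i, j)] = sim                        # S34: weight = cosine value
    return nodes, edges
```

Under this weighting, a term that appears in every text gets an IDF of zero; smoothed IDF variants avoid that, and the source leaves the choice open. Comparing every pair costs O(n²) in the number of samples. That is fine for a sketch, but large data sets may need approximate nearest-neighbor search instead.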

S4, acquiring the node vectors of the nodes of the knowledge graph, and adding the node vectors of the nodes of the knowledge graph to a preset training sample set.

In a specific implementation, each node in the knowledge graph is represented as a vector (i.e., a node vector is computed for each node of the knowledge graph). Node vectors support representation and reasoning in a vector space and can serve as input to a machine learning model for further data analysis and mining.

Further, the node vectors of the nodes of the knowledge graph are added to a preset training sample set for subsequent training of the text classification model.

Referring to fig. 4, in an embodiment, the step S4 specifically includes: S41-S44.

S41, starting from a node of the knowledge graph, performing a random walk along the edges between that node and other nodes to obtain a node sequence of a preset length.

In a specific implementation, a random walk starts from one node in the knowledge graph and moves along the edges between nodes, producing a node sequence of a preset fixed length, for example 5 nodes. For example, in one embodiment, a random walk from node A1 yields the node sequence (A1, A2, A1, A3, A4), visited in the order A1 → A2 → A1 → A3 → A4.
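The random walk in S41 can be sketched as below. The sketch assumes an undirected graph and a uniform choice among neighbors at each step; the description does not say whether edge weights bias the walk. The node names follow the A1-A4 example, and `adjacency` and `random_walk` are hypothetical helpers.

```python
import random
from collections import defaultdict


def adjacency(edges):
    """Turn an undirected edge set {(u, v): weight} into neighbor lists."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walk(adj, start, length=5, rng=None):
    """Uniform random walk of up to `length` nodes from `start`; revisits are allowed."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:          # isolated node: there is nowhere to walk
            break
        walk.append(rng.choice(neighbors))
    return walk
```

To bias the walk toward more similar neighbors, `rng.choice` could be replaced with `rng.choices(neighbors, weights=...)` using the edge weights. Whether the patent does this is not stated. An isolated node returns a sequence containing only itself, because the preset length cannot be reached without edges.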

S42, performing word segmentation on the node sequence of the node to obtain a first word segmentation set.

In a specific implementation, word segmentation is performed on the node sequence of the node to obtain a first word segmentation set. Word segmentation splits a sequence of Chinese characters into individual words, regrouping a continuous character sequence into a word sequence according to certain rules.

Referring to fig. 5, in an embodiment, the step S42 specifically includes: S421-S422.

S421, dividing the node sequence of the nodes into a plurality of words by a preset word segmentation tool to obtain a first initial word segmentation set.

In a specific implementation, a commonly used Chinese word segmentation tool may serve as the preset word segmentation tool. The tool divides the node sequence of the node into a plurality of words, and these words form the first initial word segmentation set.

S422, deleting the stop word in the first initial word segmentation set to obtain the first word segmentation set.

In a specific implementation, stop words are usually prepositions, adverbs, conjunctions and the like. For example, "in", "back", "also", "of", "it" and "is" are stop words. Stop words carry little actual meaning and introduce noise, so they are deleted in practical applications.

If the first initial word segmentation set comprises stop words, the stop words contained in the first initial word segmentation set are deleted to obtain a first word segmentation set.
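The description does not name its segmentation tool. The sketch below substitutes forward maximum matching, a simple dictionary-based segmentation technique, and then removes stop words as in S422. It treats the node sequence as the texts of its nodes segmented one after another, which is an interpretation. The dictionary and stop-word list are hypothetical toy examples. In practice a dedicated Chinese segmenter such as jieba would normally be used.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:   # single characters are the fallback
                words.append(piece)
                i += size
                break
    return words


def segment_walk(node_texts, dictionary, stop_words):
    """S421-S422: segment every node text of a walk, then drop stop words and blanks."""
    tokens = []
    for text in node_texts:
        tokens.extend(fmm_segment(text, dictionary))
    return [t for t in tokens if t.strip() and t not in stop_words]
```

For example, with the dictionary {"出租车", "费用", "机场"} ("taxi", "fee", "airport") and stop words {"的", "在"} ("of", "in"), segmenting "在机场的出租车费用" ("taxi fee at the airport") gives ["机场", "出租车", "费用"].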

S43, performing word vector training on the words of the first word segmentation set to obtain word vectors of the words of the first word segmentation set.

In a specific implementation, word vector training is performed on the words of the first word segmentation set using word2vec. word2vec is a natural language processing tool that converts words in natural language into word vectors that a computer can process.

Traditional word representations, such as one-hot vectors, suffer from the curse of dimensionality and treat any two words as unrelated, so they cannot reflect relationships between words. This embodiment therefore uses word2vec to obtain word vectors. The similarity between two words can then be measured by the distance between their vectors.

Alternatively, in other embodiments, other word vector tools may be used for word vector training, and the invention is not limited in this respect.
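word2vec is normally used through a library, for example gensim's `Word2Vec` class. The sketch below shows what word2vec's skip-gram variant learns. It generates (center, context) pairs within a window and trains word vectors with plain SGD on a full softmax. This is a simplification: real word2vec uses negative sampling or hierarchical softmax. All names and hyperparameters here are assumptions.

```python
import numpy as np


def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: every word within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def train_skipgram(sentences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with a full softmax; returns {word: vector}."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(0.0, 0.1, (len(vocab), dim))    # word (input) vectors
    w_out = rng.normal(0.0, 0.1, (dim, len(vocab)))   # context (output) weights
    pairs = [(idx[c], idx[o]) for s in sentences for c, o in skipgram_pairs(s, window)]
    for _ in range(epochs):
        for c, o in pairs:
            h = w_in[c].copy()
            scores = h @ w_out
            p = np.exp(scores - scores.max())
            p /= p.sum()                  # softmax over the vocabulary
            p[o] -= 1.0                   # gradient of cross-entropy w.r.t. scores
            grad_h = w_out @ p            # gradient w.r.t. the center word vector
            w_out -= lr * np.outer(h, p)
            w_in[c] -= lr * grad_h
    return {w: w_in[i] for w, i in idx.items()}
```

Each segmented walk is one "sentence". After training, words that occur in similar contexts tend to receive nearby vectors.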

S44, inputting the word vectors of the words in the first word segmentation set into a preset bidirectional GRU network, so that the bidirectional GRU network outputs the node vector of the node.

In a specific implementation, the word vectors of the words in the first word segmentation set are encoded by a bidirectional GRU (Gated Recurrent Unit) network, and the output of the network is the node vector of the node.
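A minimal NumPy sketch of a bidirectional GRU encoder for S44 follows. One GRU reads the word vectors left to right and another reads them right to left. Their final hidden states are concatenated into the node vector. Two assumptions apply:

- the weights are random and untrained;
- the node vector is the concatenated final states, whereas the source only says the network's output is the node vector.

The gate equations follow one common GRU formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Single-layer GRU, forward pass only, randomly initialized."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.Wz, self.Wr, self.Wh = (rng.normal(0, 0.1, (hidden_dim, input_dim)) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.normal(0, 0.1, (hidden_dim, hidden_dim)) for _ in range(3))
        self.bz, self.br, self.bh = (np.zeros(hidden_dim) for _ in range(3))

    def final_state(self, xs):
        h = np.zeros(self.hidden_dim)
        for x in xs:                                                   # one word vector per step
            z = sigmoid(self.Wz @ x + self.Uz @ h + self.bz)             # update gate
            r = sigmoid(self.Wr @ x + self.Ur @ h + self.br)             # reset gate
            h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h) + self.bh)  # candidate state
            h = (1.0 - z) * h + z * h_cand                               # blend old and new
        return h


class BiGRUEncoder:
    """Concatenate the final states of a forward and a backward GRU."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = GRU(input_dim, hidden_dim, rng)
        self.bwd = GRU(input_dim, hidden_dim, rng)

    def encode(self, word_vectors):
        """(sequence_length, input_dim) array -> (2 * hidden_dim,) node vector."""
        return np.concatenate([self.fwd.final_state(word_vectors),
                               self.bwd.final_state(word_vectors[::-1])])
```

Each hidden state is a convex blend of the previous state and a tanh output, so every component of the node vector stays in (-1, 1). In a real system the encoder weights would be learned rather than sampled once.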

S5, training a preset text classification model through the training sample set.

In a specific implementation, a preset text classification model is trained with the training sample set. The text classification model may specifically be BERT (Bidirectional Encoder Representations from Transformers).

When training the text classification model, the sample data may be divided into two parts: one for training and the other for validation. For example, in one embodiment the sample data set contains sample data from the last two years. The sample data from the last two months are used for validation, and the remaining data are used for training.

Once the trained model's accuracy exceeds 90%, the model is considered usable and can be put into production.
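The validation split and the 90% acceptance criterion can be sketched as below. The sketch makes three assumptions:

- each sample carries its document date;
- "the last two months" means the two most recent calendar months present in the data;
- accuracy is measured on the held-out validation part, which the description implies but does not state.

All names are hypothetical.

```python
from datetime import date


def split_by_month(samples, validation_months=2):
    """samples: (date, payload) pairs. The newest calendar months go to validation."""
    months = sorted({(d.year, d.month) for d, _ in samples})
    held_out = set(months[max(0, len(months) - validation_months):])
    train = [s for s in samples if (s[0].year, s[0].month) not in held_out]
    valid = [s for s in samples if (s[0].year, s[0].month) in held_out]
    return train, valid


def ready_for_production(predicted, actual, threshold=0.90):
    """True when validation accuracy is strictly above the threshold."""
    if not actual:
        return False
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual) > threshold


if __name__ == "__main__":
    data = [(date(2020, m, 15), f"record-{m}") for m in range(1, 13)]
    train, valid = split_by_month(data)
    print(len(train), [p for _, p in valid])  # 10 ['record-11', 'record-12']
```

Splitting by time rather than at random keeps validation closer to how the model is used: it is trained on the past and applied to newer records.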

In an embodiment, step S5 specifically includes: inputting the node vectors in the training sample set into the text classification model, and training the text classification model according to a preset back propagation algorithm.

In a specific implementation, the back propagation algorithm (BP algorithm for short) is a supervised learning algorithm commonly used to train neural networks. Training with back propagation improves the accuracy of the text classification model.

Specifically, the node vectors in the training sample set are input into the neural network model, i.e., the text classification model. The model's output labels are then compared with the labels of the sample data corresponding to those node vectors. If they differ:

1. a loss function is computed;
2. the model parameters are adjusted according to the back propagation algorithm;
3. the node vectors of the sample data are input into the model again.

These steps are repeated until the labels output by the model match the labels of the sample data.
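The description trains BERT on the node vectors. As a much smaller stand-in, the sketch below trains a single-layer logistic classifier with NumPy. It still illustrates the loop described above: forward pass, comparison with the labels, loss gradient, back-propagation, repeat. Two assumptions are added:

- the label encoding 1 = violation, 0 = compliance;
- a `max_epochs` cap, because the loop as described would never terminate on data that cannot be fitted exactly.

```python
import numpy as np


def train_until_consistent(X, y, lr=0.5, max_epochs=1000, seed=0):
    """Train w, b until every training label is reproduced or max_epochs is reached.

    X: (n, d) array of node vectors; y: (n,) array, 1 = violation, 0 = compliance.
    Returns (w, b, epochs_used).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, X.shape[1])
    b = 0.0
    for epoch in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass: P(violation)
        if np.all((p > 0.5) == (y == 1)):        # outputs consistent with labels: stop
            return w, b, epoch
        grad = p - y                             # d(cross-entropy)/d(logit)
        w -= lr * X.T @ grad / len(y)            # back-propagate to the weights
        b -= lr * grad.mean()                    # ... and to the bias
    return w, b, max_epochs
```

Stopping as soon as the training labels are reproduced follows the description literally. In practice, training is usually stopped on validation metrics instead, to limit overfitting.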

S6, if initial financial data are received, the initial financial data are subjected to data cleaning processing to obtain to-be-detected financial data.

In a specific implementation, the initial financial data are financial data that have not yet been checked. The present invention checks whether the initial financial data are in violation.

To improve accuracy, data cleaning is performed on the initial financial data to obtain the financial data to be detected.

Data cleansing is the process of re-examining and verifying data in order to remove duplicate information, correct existing errors and ensure data consistency. Data cleaning improves the accuracy of subsequent model computation and reduces the probability of misjudgment.

S7, adding the financial data to be detected into the knowledge graph, and acquiring the node vector of the financial data to be detected.

In specific implementation, the financial data to be detected is added to the knowledge graph, and a node vector of the financial data to be detected is obtained.

Referring to Fig. 6, in one embodiment, the step of adding the financial data to be detected to the knowledge graph specifically includes the following steps S71-S74.

S71, adding the financial data to be detected to the node set of the knowledge graph.

S72, obtaining the cosine distance between the financial data to be detected and each node in the node set.

In a specific implementation, the similarity between texts is characterized by the cosine distance. As in step S32, a larger value means the two texts are more similar. The cosine distance is obtained between the financial data to be detected and each node in the node set, and it can be computed from TF-IDF vectors of the texts, which is not described in detail herein.

S73, judging whether the cosine distance between the financial data to be detected and each node in the node set is greater than the preset cosine distance threshold.

S74, if the cosine distance between the financial data to be detected and a node in the node set is greater than the preset cosine distance threshold, establishing an edge between the financial data to be detected and that node, and adding the edge to the edge set of the knowledge graph.
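Steps S71-S74 amount to inserting one node into an existing graph. The minimal sketch below assumes every node already has a fixed-length vector, for example a TF-IDF vector over a fixed vocabulary. IDF is not recomputed when the new record arrives, which is a simplification. The sketch uses the example threshold of 0.5, and the names `add_node`, `vectors` and `edges` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def add_node(vectors, edges, new_id, new_vec, threshold=0.5):
    """S71-S74: insert a to-be-detected record and link it to similar existing nodes."""
    existing = list(vectors)                 # snapshot of the node set before insertion
    vectors[new_id] = new_vec                # S71: add to the node set
    for node in existing:
        sim = cosine(new_vec, vectors[node]) # S72: cosine value against each node
        if sim > threshold:                  # S73: threshold test
            edges[(node, new_id)] = sim      # S74: add weighted edge to the edge set
    return edges
```

The new node needs only n comparisons, one per existing node, rather than rebuilding the whole graph.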

Referring to Fig. 7, in one embodiment, the step of acquiring the node vector of the financial data to be detected specifically includes the following steps S701-S704.

S701, starting from the financial data to be detected, performing a random walk along the edges between the financial data to be detected and other nodes to obtain a node sequence to be detected of a preset length.

In a specific implementation, the financial data to be detected are themselves a node in the knowledge graph. A random walk starts from that node and moves along the edges between nodes, producing a node sequence of a preset fixed length, for example 5 nodes.

S702, performing word segmentation processing on the node sequence to be detected to obtain a second word segmentation set.

In a specific implementation, word segmentation is performed on the node sequence to be detected to obtain a second word segmentation set. Word segmentation splits a sequence of Chinese characters into individual words, regrouping a continuous character sequence into a word sequence according to certain rules.

S703, performing word vector training on the words in the second word segmentation set to obtain word vectors of the words in the second word segmentation set.

In a specific implementation, word vector training is performed on the words of the second word segmentation set using word2vec. word2vec is a natural language processing tool that converts words in natural language into word vectors that a computer can process.

S704, inputting the word vectors of the words in the second word segmentation set into the preset bidirectional GRU network, so that the bidirectional GRU network outputs the node vector of the financial data to be detected.

In a specific implementation, the word vectors of the words in the second word segmentation set are encoded by the bidirectional GRU (Gated Recurrent Unit) network, and the output of the network is the node vector of the financial data to be detected.

S8, inputting the node vector of the financial data to be detected into the trained text classification model and outputting the label of the financial data to be detected.

In a specific implementation, the node vector of the financial data to be detected is input into the trained text classification model, which outputs the label of the financial data to be detected. The label is either violation or compliance.

If the label of the financial data to be detected is violation, the financial data to be detected are judged to be in violation.

If the label of the financial data to be detected is compliance, the financial data to be detected are judged to be compliant.

S9, if the label of the financial data to be detected is violation, sending violation warning information to a preset supervision terminal.

In a specific implementation, if the label of the financial data to be detected is violation, violation warning information is sent to the preset supervision terminal, for example as an alarm email or an alarm text message.

It should be noted that the supervision terminal is a terminal used by a supervisor, and it may specifically be a smartphone or a computer.
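For S9, an alarm email could be composed with Python's standard `email` package, as sketched below. The label strings, addresses and message wording are hypothetical. The message is only built, not sent. Delivery would typically go through `smtplib`, for example `smtplib.SMTP(host).send_message(msg)`, or through an SMS gateway; the description specifies neither.

```python
from email.message import EmailMessage


def build_violation_alert(record_id, sender, supervisor):
    """Compose (but do not send) an alarm mail for a record labeled as a violation."""
    msg = EmailMessage()
    msg["Subject"] = f"Financial data violation: record {record_id}"
    msg["From"] = sender
    msg["To"] = supervisor
    msg.set_content(f"Record {record_id} was classified as a violation. Please review it.")
    return msg


def handle_prediction(record_id, label, sender, supervisor):
    """S8-S9: only records labeled 'violation' produce an alert; others return None."""
    if label == "violation":
        return build_violation_alert(record_id, sender, supervisor)
    return None
```

Keeping message construction separate from delivery makes the alert logic testable without a mail server.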

The technical solution of the invention works as follows:

- a historical financial data set is obtained, comprising a plurality of labeled financial data, the labels being violation and compliance;
- data cleaning is performed on the historical financial data set to obtain a sample data set comprising a plurality of sample data, each obtained by cleaning one financial data item;
- a knowledge graph is constructed from the sample data set, the nodes of the knowledge graph being sample data;
- node vectors of the nodes of the knowledge graph are acquired and added to a preset training sample set;
- a preset text classification model is trained with the training sample set;
- if initial financial data are received, data cleaning is performed on them to obtain financial data to be detected;
- the financial data to be detected are added to the knowledge graph and their node vector is acquired;
- that node vector is input into the trained text classification model, which outputs the label of the financial data to be detected;
- if the label is violation, violation warning information is sent to a preset supervision terminal.

As a result, non-compliant financial data can be detected automatically. Compared with manual checking, the solution is more efficient and more accurate, and it greatly reduces cost.

Referring to fig. 8, fig. 8 is a schematic block diagram of a financial data violation detection apparatus 70 according to an embodiment of the present invention. Corresponding to the financial data violation detection method described above, the present invention further provides a financial data violation detection apparatus 70, which comprises units for performing that method and may be configured in a server. Specifically, the financial data violation detection apparatus 70 includes a first obtaining unit 71, a first cleaning unit 72, a constructing unit 73, a second obtaining unit 74, a training unit 75, a second cleaning unit 76, a third obtaining unit 77, an input unit 78, and a sending unit 79.

A first obtaining unit 71, configured to obtain a historical financial data set, where the historical financial data set includes a plurality of labeled financial data, and the labels include violation and compliance;

a first cleaning unit 72, configured to perform data cleaning on the historical financial data set to obtain a sample data set, where the sample data set includes a plurality of sample data, each obtained by performing data cleaning on the financial data;

a constructing unit 73, configured to construct a knowledge graph from the sample data set, where the nodes of the knowledge graph are the sample data;

a second obtaining unit 74, configured to obtain the node vectors of the nodes of the knowledge graph and add them to a preset training sample set;

a training unit 75, configured to train a preset text classification model with the training sample set;

a second cleaning unit 76, configured to, when initial financial data is received, perform data cleaning on the initial financial data to obtain financial data to be detected;

a third obtaining unit 77, configured to add the financial data to be detected to the knowledge graph and obtain a node vector of the financial data to be detected;

an input unit 78, configured to input the node vector of the financial data to be detected into the trained text classification model, which outputs a label of the financial data to be detected; and

a sending unit 79, configured to send a violation warning message to a preset supervision terminal if the label of the financial data to be detected is violation.
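The detection-time units (76 to 79) form a simple chain, which can be sketched as a small class whose stages are injected callables. This is an architectural illustration under stated assumptions: the class name `ViolationDetectionPipeline` and its field names are hypothetical, the training-time units 71 to 75 are assumed to have already produced the classifier passed in as `classify`, and `node_vector` stands for the combined "add to graph, then embed" work of unit 77.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ViolationDetectionPipeline:
    """Detection-time flow of units 76-79; each stage is an injected callable."""
    clean: Callable[[str], str]                 # second cleaning unit 76
    node_vector: Callable[[str], List[float]]   # third obtaining unit 77 (graph insert + embedding)
    classify: Callable[[List[float]], str]      # input unit 78 (trained text classification model)
    send_warning: Callable[[str], None]         # sending unit 79

    def detect(self, initial_data: str) -> str:
        """Clean, embed and classify one record; warn only on "violation"."""
        data = self.clean(initial_data)
        vector = self.node_vector(data)
        label = self.classify(vector)
        if label == "violation":
            self.send_warning(data)
        return label
```

Injecting the stages keeps each unit replaceable, which matches the later remark that units may be merged, divided or deleted according to actual needs.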

In an embodiment, constructing the knowledge graph from the sample data set comprises:

taking the sample data in the sample data set as nodes of the knowledge graph and adding these nodes to the node set of the knowledge graph;

computing the cosine distance between every pair of nodes in the node set;

determining whether the cosine distance between two nodes in the node set is greater than a preset cosine distance threshold; and

if the cosine distance between two nodes in the node set is greater than the preset cosine distance threshold, establishing an edge between the two nodes and adding that edge to the edge set of the knowledge graph.
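The graph-construction steps can be sketched as follows. Two assumptions are made and should be read as such: each sample is taken to be already encoded as a numeric vector (the text does not say how), and the quantity compared against the threshold is computed as the cosine of the angle between two vectors, where larger means more similar. The text calls it a "cosine distance" yet links nodes when it is greater than the threshold; reading it as the cosine value is an interpretation, not something the source states. The names `cosine` and `build_knowledge_graph` are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat all-zero vectors as unrelated to everything
    return dot / (norm_u * norm_v)


def build_knowledge_graph(samples: Dict[str, List[float]], threshold: float
                          ) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (node set, edge set); link two samples whose cosine exceeds the threshold."""
    nodes = set(samples)
    edges: Set[Tuple[str, str]] = set()
    for a, b in combinations(sorted(samples), 2):  # every unordered pair exactly once
        if cosine(samples[a], samples[b]) > threshold:
            edges.add((a, b))
    return nodes, edges
```

The pairwise loop is quadratic in the number of samples; for large historical data sets an approximate nearest-neighbour index would usually replace it, but that goes beyond what the text describes.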

In an embodiment, obtaining the node vectors of the nodes of the knowledge graph comprises:

starting from each node of the knowledge graph, performing a random walk along the edges between that node and other nodes to obtain a node sequence of a preset length;

performing word segmentation on the node sequence of the node to obtain a first word segmentation set;

performing word vector training on the words of the first word segmentation set to obtain word vectors of those words; and

inputting the word vectors of the words of the first word segmentation set into a preset bidirectional GRU network, which outputs the node vector of the node.
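The random-walk step can be sketched with the standard `random` module. Assumptions, all hypothetical: edges are undirected, each step picks a neighbour uniformly at random, the walk stops early if it reaches a node with no edges, and the "node sequence" handed to word segmentation is the cleaned text of the visited samples joined by spaces. The helper names are illustrative only.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency lists built from the edge set."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for a, b in sorted(edges):  # sorted so that a seeded walk is reproducible
        adj[a].append(b)
        adj[b].append(a)
    return dict(adj)


def random_walk(adj: Dict[str, List[str]], start: str, length: int,
                rng: random.Random) -> List[str]:
    """Uniform random walk of up to `length` nodes; stops early at an isolated node."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def walk_to_text(walk: List[str], texts: Dict[str, str]) -> str:
    """Concatenate the cleaned text of the visited samples into one node sequence."""
    return " ".join(texts[node] for node in walk)
```

Passing an explicit `random.Random(seed)` keeps walks reproducible between training runs, which helps when comparing node vectors.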

In an embodiment, performing word segmentation on the node sequence of the node to obtain the first word segmentation set comprises:

dividing the node sequence of the node into a plurality of words with a preset word segmentation tool to obtain a first initial word segmentation set; and

deleting the stop words in the first initial word segmentation set to obtain the first word segmentation set.
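The two segmentation steps can be sketched as follows. The text does not name the "preset word segmentation tool"; here a regular-expression tokenizer stands in for it, and for Chinese text a dedicated segmenter such as jieba would normally take its place. The stop-word list and function names are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical stop-word list; a real deployment would load a curated list for its language.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "to", "for", "and"}


def segment(node_sequence: str) -> List[str]:
    """First initial word segmentation set: lower-cased runs of word characters."""
    return re.findall(r"\w+", node_sequence.lower())


def first_segmentation_set(node_sequence: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop stop words from the initial segmentation, keeping word order."""
    return [word for word in segment(node_sequence) if word not in stop_words]
```

For example, `first_segmentation_set("Payment of the invoice to supplier 42")` keeps `payment`, `invoice`, `supplier` and `42`.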

In an embodiment, training the preset text classification model with the training sample set comprises:

inputting the node vectors in the training sample set into the text classification model, and training the text classification model according to a preset back propagation algorithm.
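The text does not specify the architecture of the text classification model. As a stand-in, the sketch below trains a single-layer logistic classifier on node vectors by stochastic gradient descent, which is back propagation through one layer with a binary cross-entropy loss. The label encoding (1 = violation, 0 = compliance), learning rate, epoch count and function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

LABELS = {1: "violation", 0: "compliance"}  # assumed label encoding


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], b: float, x: List[float]) -> float:
    """Predicted probability that node vector x is a violation."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples: Sequence[Tuple[List[float], int]], epochs: int = 200,
          lr: float = 0.5) -> Tuple[List[float], float]:
    """Stochastic gradient descent on binary cross-entropy."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = score(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> str:
    """Map the probability to a label using a 0.5 cut-off."""
    return LABELS[int(score(w, b, x) >= 0.5)]
```

A deeper network (for example a convolutional or recurrent text classifier) would follow the same loop: forward pass, loss gradient, weight update.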

In an embodiment, adding the financial data to be detected to the knowledge graph comprises:

adding the financial data to be detected to the node set of the knowledge graph;

computing the cosine distance between the financial data to be detected and each node in the node set;

determining whether the cosine distance between the financial data to be detected and a node in the node set is greater than the preset cosine distance threshold; and

if the cosine distance between the financial data to be detected and a node in the node set is greater than the preset cosine distance threshold, establishing an edge between the financial data to be detected and that node, and adding the edge to the edge set of the knowledge graph.
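Adding one record to an existing graph only needs comparisons between the new record and the nodes already present, not a full rebuild. The sketch below makes the same assumptions as a full-graph construction would: records are numeric vectors and the threshold test is applied to the cosine value (larger means more similar), which is an interpretation of the text's "cosine distance". The function name `add_to_graph` is hypothetical; self-comparison is skipped.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0.0 or norm_v == 0.0 else dot / (norm_u * norm_v)


def add_to_graph(nodes: Dict[str, List[float]], edges: Set[Tuple[str, str]],
                 new_id: str, new_vec: List[float], threshold: float) -> Set[Tuple[str, str]]:
    """Insert the record to be detected, link it to similar nodes, return the new edges."""
    new_edges = {(other, new_id) for other, vec in nodes.items()
                 if other != new_id and cosine(new_vec, vec) > threshold}
    nodes[new_id] = new_vec   # the record becomes part of the node set
    edges |= new_edges        # and its edges join the edge set in place
    return new_edges
```

The returned edges are exactly what the subsequent random walk from the new record will follow.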

In an embodiment, obtaining the node vector of the financial data to be detected comprises:

starting from the financial data to be detected, performing a random walk along the edges between the financial data to be detected and other nodes to obtain a to-be-detected node sequence of a preset length;

performing word segmentation on the to-be-detected node sequence to obtain a second word segmentation set;

performing word vector training on the words of the second word segmentation set to obtain word vectors of those words; and

inputting the word vectors of the words of the second word segmentation set into the preset bidirectional GRU network, which outputs the node vector of the financial data to be detected.
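The bidirectional GRU step can be illustrated with a pure-Python forward pass. Assumptions, stated plainly: the word vectors are already available (for example from a word2vec-style model, which the text only calls "word vector training"); the weights are randomly initialised rather than trained; the gate equations follow one common GRU formulation with h' = (1 - z) * n + z * h; and the node vector is taken as the concatenation of the final forward and backward hidden states, since the text does not say how the network output is pooled. Class and method names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class GRUCell:
    """One GRU direction: gates z and r, candidate n, h' = (1 - z) * n + z * h."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.hidden_dim = hidden_dim
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}   # input weights
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}  # recurrent weights
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}           # biases

    def _gate(self, g: str, x: Vector, h: Vector, act) -> Vector:
        pre = zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])
        return [act(wx + uh + bb) for wx, uh, bb in pre]

    def step(self, x: Vector, h: Vector) -> Vector:
        z = self._gate("z", x, h, sigmoid)  # update gate
        r = self._gate("r", x, h, sigmoid)  # reset gate
        n = self._gate("n", x, [ri * hi for ri, hi in zip(r, h)], math.tanh)  # candidate state
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in seq:
            h = self.step(x, h)
        return h


class BiGRUEncoder:
    """Reads the word vectors forwards and backwards and concatenates the final states."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = GRUCell(input_dim, hidden_dim, rng)
        self.bwd = GRUCell(input_dim, hidden_dim, rng)

    def encode(self, word_vectors: List[Vector]) -> Vector:
        return self.fwd.run(word_vectors) + self.bwd.run(list(reversed(word_vectors)))
```

The resulting vector has length 2 × `hidden_dim` and, because every hidden state is a convex mix of `tanh` outputs and the previous state, each component stays strictly between -1 and 1. In practice the GRU weights would be learned rather than drawn at random.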

It should be noted that, as will be clear to those skilled in the art, for the detailed implementation of the financial data violation detection apparatus 70 and each of its units, reference may be made to the corresponding description in the foregoing method embodiment; for convenience and brevity, it is not repeated here.

The financial data violation detection apparatus 70 may be implemented in the form of a computer program that can run on a computer device as shown in fig. 9.

Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.

The computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include storage media 503 and internal memory 504.

The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, causes the processor 502 to perform a financial data violation detection method.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the operation of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to perform a financial data violation detection method.

The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the above-described structure is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or arrange the components differently.

Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:

training a preset text classification model through the training sample set;

respectively acquiring cosine distances between any two nodes in the node set;

adding the financial data to be tested to a node set of the knowledge graph;

It should be understood that in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.

Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program, when executed by a processor, causes the processor to perform the steps of:

training a preset text classification model through the training sample set;

respectively acquiring cosine distances between any two nodes in the node set;

adding the financial data to be tested to a node set of the knowledge graph;

The storage medium is a tangible, non-transitory storage medium, and may be any of various physical media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. The computer-readable storage medium may be non-volatile or volatile.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of each example have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, various elements or components may be combined or integrated into another system, or some features may be omitted or not implemented.

The steps of the method in the embodiments of the invention may be reordered, combined, or deleted according to actual needs. The units of the device in the embodiments of the invention may be merged, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the above-described embodiments, it will be understood that the invention is not limited thereto but may be embodied with various modifications and changes.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for financial data violation detection, comprising:

training a preset text classification model through the training sample set;

2. The financial data violation detection method of claim 1, wherein said constructing a knowledge-graph from said set of sample data comprises:

respectively acquiring cosine distances between any two nodes in the node set;

3. The financial data violation detection method of claim 2, wherein said obtaining node vectors for nodes of the knowledge-graph comprises:

4. The financial data violation detection method according to claim 3, wherein said performing a word segmentation on the sequence of nodes of the node to obtain a first set of words comprises:

5. The financial data violation detection method according to claim 1, wherein training a preset text classification model through the training sample set comprises:

6. The financial data violation detection method of claim 1, wherein said adding the financial data to be tested to the knowledge-graph comprises:

adding the financial data to be tested to a node set of the knowledge graph;

7. The financial data violation detection method according to claim 5, wherein said obtaining a node vector of the financial data to be tested comprises:

8. A financial data violation detection device, comprising:

9. A computer device, characterized in that the computer device comprises a memory storing a computer program and a processor that implements the method according to any one of claims 1-7 when executing the computer program.

10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.