CN110766091B - Method and system for identifying trepanning loan group partner - Google Patents

Publication number
CN110766091B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911049749.4A
Other languages
Chinese (zh)
Other versions
CN110766091A (en
Inventor
刘胜
梁淑云
马影
陶景龙
王启凡
魏国富
徐�明
殷钱安
余贤喆
周晓勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN201911049749.4A
Publication of CN110766091A
Application granted
Publication of CN110766091B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a method and a system for identifying a trap-loan (套路贷) gang. The method comprises the following steps: 1) acquiring feature data involved in the operation of a trap loan; 2) taking the keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among them; 3) contracting each non-person node in the relationship graph into the person node to which it corresponds; 4) determining the weight of each edge according to the type of the edge between person nodes, and dividing the relationship graph into a plurality of node sets; 5) for each node set, obtaining the probability that the nodes in the set are trap-loan gang members from the degree of overlap between the node set and data on predetermined trap-loan offenders, and taking the persons corresponding to any node set whose probability exceeds a preset threshold as trap-loan gang members. By applying the embodiment of the invention, the corresponding trap-loan gang can be identified from data on known trap-loan offenders.

Description

Method and system for identifying trepanning loan group partner
Technical Field
The invention relates to an identification method and system, and in particular to a method and system for identifying a trap-loan (套路贷) gang.
Background
A trap loan is nominally private lending but is in essence a criminal act. Offenders create false debts by inducing victims to sign loan-related agreements, inflating the loan amount, maliciously engineering defaults, arbitrarily declaring defaults, destroying evidence of repayment, and the like, and then illegally take possession of the victims' property through litigation, arbitration, or notarization, or by violence, threats, and other means. Trap-loan crime is knowledge-intensive; individual legal practitioners even collude with the offenders and give them professional legal guidance, which raises the success rate of sham litigation and the illegal proceeds obtained. Trap-loan activity is highly concealed, quick to turn a profit, lucrative, easy to replicate and spread, and extremely harmful. It seriously infringes the lawful rights of borrowers, disrupts the normal financial order, gives rise to a variety of other crimes, and affects social stability. Some trap loans have spread from offline to online by means of network platforms, turning traditional contact crime into a new form of non-contact crime that affects more people over a wider area and does great social harm. How to identify trap-loan gangs promptly and accurately, and thereby contribute to public safety and social stability, is therefore a technical problem that urgently needs to be solved.
Patent application No. 201810562975.1 discloses a method for monitoring the transfer of creditor's rights, which comprises: acquiring creditor's-right transfer information and creditor's-right information, wherein the transfer information comprises at least the grade of the creditor's right, together with the transferor, the transferee, and the transfer amount; updating the creditor's-right information according to the transfer information; and building and displaying a creditor's-right transfer relationship graph according to the transfer information and the creditor's-right information.
The prior art can thus identify only transfers of creditor's rights; it cannot identify trap-loan gangs.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a method and a system that can identify a trap-loan gang.
The invention solves this technical problem by the following technical means:
The embodiment of the invention provides a method for identifying a trap-loan gang, which comprises the following steps:
1) Acquiring feature data involved in the operation of a trap loan, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the trap-loan process;
2) Taking the keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among them;
3) Contracting each non-person node in the relationship graph into the person node to which it corresponds;
4) Determining the weight of each edge according to the type of the edge between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of edges include one or a combination of: employment relationships, colleague relationships, transfer relationships, charging relationships, payment relationships, call relationships, investment relationships, reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships;
5) For each node set, obtaining the probability that the nodes in the set are trap-loan gang members from the degree of overlap between the node set and data on predetermined trap-loan offenders, and taking the persons corresponding to any node set whose probability exceeds a preset threshold as trap-loan gang members.
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the operation of trap loans, and a graph containing only person-to-person relationships is derived from it. That person-only graph is divided into a plurality of node sets by iterating over the edge weights, and the probability that each node set is a trap-loan gang is judged from the number of known trap-loan offenders it contains, so that the corresponding gang can be identified from data on existing trap-loan offenders.
Optionally, step 2) includes:
extracting the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise one or a combination of: a person name, a place name, a company name, an identification card number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the company to which a number belongs.
Optionally, the relationships between the nodes in step 2) are obtained as follows:
for structured data, the relationships between nodes are obtained by direct query, wherein the structured data comprises bank transaction records, and the relationships between nodes comprise one or a combination of: transfer relationships, charging relationships, payment relationships, call relationships, and investment relationships;
for unstructured data, the relationships between nodes are extracted by using a syntactic analysis algorithm, wherein the unstructured data comprises call content and chat records, and the relationships between nodes further comprise: reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships.
Optionally, step 4) includes:
41) Randomly assigning a unique ID to each node in the relationship graph obtained after the node contraction operation, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) For each node, calculating the aggregate weight between the node and its neighbors by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the aggregate weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) Updating the ID of each node to the ID of the neighbor node with the largest aggregate weight, and returning to step 42) until the IDs of the nodes no longer change;
44) Grouping the nodes that share the same ID into one node set, thereby obtaining a plurality of node sets.
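As a worked illustration of step 42), assume (hypothetically; the source gives no preset weight values) that a transfer edge carries weight 3 and a call edge carries weight 1. If node a transferred money to node b twice and node b called node a once, the aggregate weight between them would be:

```latex
W_{ab} = \sum w_{ab} + \sum w_{ba} = (3 + 3) + 1 = 7
```

Because the sum runs over both directions, $W_{ab} = W_{ba}$, so the aggregate weight can be treated as the weight of a single undirected edge between a and b.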
Optionally, obtaining the probability that the nodes in the node set are trap-loan gang members includes:
for each node set, calculating, by using a formula [formula image not reproduced], the probability that the persons corresponding to the nodes in the node set are trap-loan gang members, wherein
S is the probability that the persons corresponding to the nodes in the node set are trap-loan gang members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data on predetermined trap-loan offenders.
The embodiment of the invention also provides a system for identifying a trap-loan gang, which comprises:
a first acquisition module, configured to acquire feature data involved in the operation of a trap loan, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the trap-loan process;
a construction module, configured to take the keywords contained in the feature data as nodes and construct a relationship graph comprising the nodes according to the relationships among them;
a contraction module, configured to contract each non-person node in the relationship graph into the person node to which it corresponds;
a dividing module, configured to determine the weight of each edge according to the type of the edge between person nodes and divide the relationship graph into a plurality of node sets; wherein the types of edges include one or a combination of: employment relationships, colleague relationships, transfer relationships, charging relationships, payment relationships, call relationships, investment relationships, reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships;
and a second acquisition module, configured to obtain, for each node set, the probability that the nodes in the set are trap-loan gang members from the degree of overlap between the node set and data on predetermined trap-loan offenders, and to take the persons corresponding to any node set whose probability exceeds a preset threshold as trap-loan gang members.
Optionally, the construction module is configured to:
extract the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise one or a combination of: a person name, a place name, a company name, an identification card number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the company to which a number belongs.
Optionally, the construction module is configured to:
for structured data, obtain the relationships between nodes by direct query, wherein the structured data comprises bank transaction records, and the relationships between nodes comprise one or a combination of: transfer relationships, charging relationships, payment relationships, call relationships, and investment relationships;
for unstructured data, extract the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data comprises call content and chat records, and the relationships between nodes further comprise: reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships.
Optionally, the dividing module is configured to:
41) Randomly assign a unique ID to each node in the relationship graph obtained after the node contraction operation, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) For each node, calculate the aggregate weight between the node and its neighbors by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the aggregate weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) Update the ID of each node to the ID of the neighbor node with the largest aggregate weight, and return to step 42) until the IDs of the nodes no longer change;
44) Group the nodes that share the same ID into one node set, thereby obtaining a plurality of node sets.
Optionally, the second acquisition module is configured to:
for each node set, calculate, by using a formula [formula image not reproduced], the probability that the persons corresponding to the nodes in the node set are trap-loan gang members, wherein
S is the probability that the persons corresponding to the nodes in the node set are trap-loan gang members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data on predetermined trap-loan offenders.
The invention has the following advantages:
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the operation of trap loans, and a graph containing only person-to-person relationships is derived from it. That person-only graph is divided into a plurality of node sets by iterating over the edge weights, and the probability that each node set is a trap-loan gang is judged from the number of known trap-loan offenders it contains, so that the corresponding gang can be identified from data on existing trap-loan offenders.
Drawings
FIG. 1 is a flow chart of a method for identifying a trap-loan gang according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system for identifying a trap-loan gang according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below in conjunction with the embodiments. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.
Example 1
FIG. 1 is a flow chart of a method for identifying a trap-loan gang according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S101: acquiring feature data involved in the operation of a trap loan, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the trap-loan process.
For example, the data needed to identify a trap-loan gang can be collected by month from the relevant databases and stored in a corresponding directory on a local server. The data should cover: call information, bank transaction records, website access records, and basic personal information related to the carrying out of the trap loan, as well as information on the offenders already known.
In practical applications, each kind of feature data contains the following fields. Call information: calling number (call_phone), called number (call_phone), call time (call_time), call duration (call_dur), call content (call_content), and the like. Bank transaction records: sender (u_transfer), recipient (u_receive), transaction time (transfer_time), transaction account number (transfer_acct). Website access records: user IP (user_ip), web address (v_url), access time (t_time), operation content (user_opt). Basic personal information: name (user_name), certificate number (identi_num), phone number (phone_num), QQ number (qq_num), WeChat number (wechat_num), and the like.
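The field lists above can be captured as typed records. The following is a minimal sketch rather than the patent's implementation: the class names, the `missing_fields` helper, the value types, and the field name `called_phone` are assumptions (the text gives `call_phone` for both the calling and the called number).

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class CallRecord:
    call_phone: str    # calling number
    called_phone: str  # called number; hypothetical name, see lead-in
    call_time: str
    call_dur: int      # duration; unit (seconds) is an assumption
    call_content: str


@dataclass
class TransferRecord:
    u_transfer: str    # sender
    u_receive: str     # recipient
    transfer_time: str
    transfer_acct: str


@dataclass
class AccessRecord:
    user_ip: str
    v_url: str
    t_time: str
    user_opt: str


@dataclass
class PersonRecord:
    user_name: str
    identi_num: str
    phone_num: str
    qq_num: str
    wechat_num: str


def missing_fields(record_type, row: Dict[str, object]) -> List[str]:
    """Return the declared field names that are absent from a raw row."""
    return [f.name for f in fields(record_type) if f.name not in row]
```

A loader could call `missing_fields(TransferRecord, row)` on each raw row and set aside rows that lack required fields before any graph is built.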
It should be emphasized that the feature data involved in the operation of a trap loan means any data that may arise at any stage of the whole trap-loan process.
S102: taking the keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among them.
Illustratively, in the first step, the keywords contained in the feature data may be extracted by using a natural language processing algorithm. The keywords include: person names, place names, company names, identification card numbers, telephone numbers, bank card numbers, QQ numbers, email addresses, IP addresses, domain names, the company to which a number belongs, key events, quantity words, action words, and the like. As required, data such as the bank card category, the telephone number operator, and the place of issue of an identification card can be added, and the keyword names are normalized. Normalization is achieved through coreference resolution, a natural language processing technique: the phrases in a sentence that refer to the same thing are found and the keyword they jointly refer to is determined, which simplifies subsequent processing and analysis.
Illustratively, named entities can be extracted from the original text of the feature data acquired in step S101 by named entity recognition, a technique from natural language processing, which identifies person names, place names, key events (or actions), and so on. In natural language processing, items such as person names, place names, and company names are called entities, or named entities. They are the core keywords scattered through the large volume of data, and they are important input to the subsequent person-relationship network analysis module.
In addition, the feature data acquired in step S101 contains another very important kind of information, including but not limited to: identification card numbers, telephone numbers, bank card numbers, QQ numbers, email addresses, and IP addresses. A person reading the feature data manually cannot immediately interpret such identifiers on encountering them. They can therefore be extracted and stored together by using a keyword extraction algorithm such as a word2vec model, and then sorted and analyzed further so that they are put to better use. For example, identifiers such as identification card numbers, mobile phone numbers, and QQ numbers can be extracted and, through the relationships in which each number appears, associated directly with a person, which gives a fuller picture when the relationships and interactions between persons are analyzed.
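For identifiers with a fixed surface form, pattern matching is one simple way to pull them out of free text. The sketch below swaps in regular expressions for the word2vec-style extraction the text mentions; it is an illustration, not the source's method. The patterns (an 11-digit mainland mobile number, an 18-character identification card number, email, IPv4), the function name, and the sample text are all assumptions.

```python
import re
from typing import Dict, List

# Assumed surface forms; lookarounds stop a pattern from matching inside a longer digit run.
PATTERNS = {
    "id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])"),
}


def extract_identifiers(text: str) -> Dict[str, List[str]]:
    """Return every match of each identifier pattern found in the text."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


sample = ("Call 13800138000, mail zhang@example.com, "
          "login from 192.168.1.10, ID 11010519491231002X.")
found = extract_identifiers(sample)
```

Each extracted identifier can then become a node, or an attribute linked to the person it belongs to. The patterns check shape only, not validity (for example, they do not verify an identification card checksum or IPv4 octet ranges).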
Further, in the feature data acquired in step S101, especially in spoken or informal text, persons and other keywords often appear under aliases or nicknames, or are referred to in many different ways. These aliases and nicknames can be identified and placed in the relationship graph of step S102 as nodes, which are connected to other nodes by edges.
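One way to make use of identified aliases is to resolve every mention to a single canonical name before the graph is analyzed. The following is a minimal sketch under the assumption that alias pairs (alias, name) have already been identified by an upstream step; the function name and the sample names are hypothetical, and the union-find structure is a choice made here, not something the source specifies.

```python
from typing import Dict, Iterable, Tuple


def build_canonical(alias_pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map every mention to one canonical name, following alias chains."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for alias, name in alias_pairs:
        root_alias, root_name = find(alias), find(name)
        if root_alias != root_name:
            parent[root_alias] = root_name  # the name's root becomes canonical
    return {mention: find(mention) for mention in parent}


# Hypothetical mentions: "Boss Wang" is an alias of "Lao Wang", itself an alias of "Wang Qiang".
canonical = build_canonical([("Lao Wang", "Wang Qiang"), ("Boss Wang", "Lao Wang")])
```

With this mapping, edges attached to any alias node can be redirected to the canonical person node.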
In addition, for identifiers such as bank card numbers and mobile phone numbers, the home location and the company to which the number belongs can be looked up. The geographic information, organization information, and other information obtained in this way are added as nodes and connected by edges, so that the different sources can be matched against and verified by one another.
In the second step, the keywords are taken as nodes in the relationship graph, and the relationships between the nodes are extracted.
After the keywords in the trap-loan data have been extracted, the next task is to extract the relationships between the keywords from the feature data acquired in step S101. Relationship extraction falls into two cases. For structured data, the relationships between nodes are obtained by direct query; the structured data comprises bank transaction records, and the relationships between nodes comprise one or a combination of: transfer relationships, charging relationships, payment relationships, call relationships, and investment relationships. For unstructured data, the relationships between nodes are extracted by using a syntactic analysis algorithm; the unstructured data comprises call content and chat records, and the relationships between nodes further comprise: reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships.
Illustratively, for structured feature data acquired in step S101, a direct query yields, for instance, the remittance relationship between a remitter and a payee. The relationships between the keywords (such as person names, company names, places, and key events) in all the structured data can be obtained in the same way. The relationship types to be extracted include: transfer relationships, charging relationships, payment relationships, call relationships, investment relationships, and the like.
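For structured records, "direct query" amounts to reading the two parties off each row. A minimal sketch, assuming rows are dictionaries keyed by the field names listed under S101; the function names, the edge tuple layout, and the field name `called_phone` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, relationship type)


def transfer_edges(rows: Iterable[Dict[str, str]]) -> List[Edge]:
    """One directed 'transfer' edge per bank transaction row."""
    return [(r["u_transfer"], r["u_receive"], "transfer") for r in rows]


def call_edges(rows: Iterable[Dict[str, str]]) -> List[Edge]:
    """One directed 'call' edge per call row (called_phone is a hypothetical field name)."""
    return [(r["call_phone"], r["called_phone"], "call") for r in rows]


def count_edges(edges: Iterable[Edge]) -> Counter:
    """Count repeated edges so that frequent contact is visible."""
    return Counter(edges)


bank_rows = [
    {"u_transfer": "A", "u_receive": "B"},
    {"u_transfer": "A", "u_receive": "B"},
    {"u_transfer": "B", "u_receive": "C"},
]
edge_counts = count_edges(transfer_edges(bank_rows))
```

Keeping one edge per record, rather than collapsing duplicates, preserves the information needed later when weights are summed over all edges between two nodes.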
Illustratively, for unstructured feature data acquired in step S101, such as call audio and chat records, the relationships between keywords can be extracted by syntactic analysis. Syntactic analysis is one of the key techniques of natural language processing: an input sentence is analyzed to obtain its syntactic structure, and further processing is carried out on syntactic information such as the subject, predicate, and object of the sentence. The relationships to be extracted here include: reporting relationships, form-of-address relationships, job relationships, behavioral relationships, intimate relationships, and the like.
It should be noted that the keyword recognition and syntactic analysis algorithms are existing algorithms; the innovation of the embodiment of the invention lies mainly in the overall technical approach.
S103: contracting each non-person node in the relationship graph into the person node to which it corresponds.
Illustratively, the keywords extracted and the relationships found in step S102 are combined into a multi-element relationship network, with the keywords as nodes and the relationships between them as edges. The feature data acquired in step S101 contains a great deal of important relationship information, the core of which is the relationships between persons. Which persons have direct contact and interaction with each other, and which are linked only indirectly through others, is probably among the most important information the data contains. There are also relationships between persons and parties (including companies, organizations, and the like), a large class of relationships that can be extracted from text, such as which persons are directly involved in a party or work for it.
This step starts from the multi-element relationship network constructed in step S102 and reduces it to a relationship graph that contains only person nodes. The network constructed in step S102 contains many kinds of keyword nodes: person names, company names, places, key events, quantity words, action words, bank card numbers, telephone numbers, identification card numbers, IP addresses, domain names, email addresses, and so on. To analyze the relationships between persons more effectively, the non-person nodes are simplified away by folding them into the attributes of the edges between person nodes. For example, if in the multi-element relationship network the first person has an employment relationship with a third party and the second person has an employment relationship with the same third party, then in the simplified person relationship graph the edge between the first person and the second person is a colleague relationship.
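The contraction can be sketched as follows. This is an illustrative reading of the step, not the patent's implementation: persons attached to the same non-person node by the same relationship type receive a derived person-to-person edge. The function name, the edge tuple layout, and the `DERIVED` mapping are assumptions; only the employment-to-colleague rule comes from the text's example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, relationship type)

# Assumed mapping from a shared relationship to the derived person-to-person type.
DERIVED = {"employment": "colleague"}


def contract(edges: Iterable[Edge], persons: Set[str]) -> List[Edge]:
    """Return person-to-person edges only, folding non-person nodes into edge types."""
    person_edges: List[Edge] = []
    attached = defaultdict(set)  # (non-person node, relationship) -> persons linked to it
    for src, dst, rel in edges:
        if src in persons and dst in persons:
            person_edges.append((src, dst, rel))
        elif src in persons:
            attached[(dst, rel)].add(src)
        elif dst in persons:
            attached[(src, rel)].add(dst)
        # edges between two non-person nodes are dropped in this sketch
    for (_, rel), people in attached.items():
        for a, b in combinations(sorted(people), 2):
            person_edges.append((a, b, DERIVED.get(rel, rel)))
    return person_edges


edges = [
    ("A", "C Co", "employment"),
    ("B", "C Co", "employment"),
    ("A", "B", "transfer"),
]
person_graph = contract(edges, persons={"A", "B"})
```

Relationship types with no entry in `DERIVED` keep their original label on the derived edge, which is a placeholder choice; a fuller mapping would need to be defined for each edge type.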
S104: determining the weight of each edge according to the type of the edge between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of edges include one or a combination of: employment relationships, colleague relationships, transfer relationships, charging relationships, payment relationships, call relationships, investment relationships, reporting relationships, form-of-address relationships, job relationships, behavioral relationships, and intimate relationships.
Specifically, this step may include the following:
41) Randomly assigning a unique ID to each node in the relationship graph obtained after the node contraction operation, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) For each node, calculating the aggregate weight between the node and its neighbors by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the aggregate weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) Updating the ID of each node to the ID of the neighbor node with the largest aggregate weight, and returning to step 42) until the IDs of the nodes no longer change;
44) Grouping the nodes that share the same ID into one node set, thereby obtaining a plurality of node sets.
Illustratively, each node's ID is updated with reference to the IDs of its neighboring nodes: among all neighbors, the ID of the neighbor joined to the node by the largest aggregate weight calculated in step 42) becomes the node's new ID. All nodes are updated simultaneously; every calculation uses the IDs from before the update, and IDs produced during the current round take no part in it.
For example, suppose node A has ID 1, node B has ID 2, there is an edge between A and B, and B is A's heaviest neighbor. After the update, A's ID becomes 2; if A was also B's heaviest neighbor before the update, B's ID becomes 1. When one round of computation is complete, a new iteration begins, and this continues until no node's ID changes. Nodes that then share an ID are closely associated and belong to the same node set.
It should be emphasized that, within a single round of updating, the ID a node adopts is the ID its neighbor held when the previous round finished.
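Steps 41) to 44) can be sketched as a weighted label propagation with simultaneous updates. The sketch departs from the text in one labeled respect: as the A/B example shows, two nodes that are each other's heaviest neighbor would swap IDs every round under a strictly simultaneous update and never settle, so here such a pair both keep the smaller of their two IDs. That rule, the weight values in `TYPE_WEIGHT`, tie-breaking by node name, sequential rather than random initial IDs, and the `max_rounds` cap are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, relationship type)

# Hypothetical preset weights per edge type; the source does not give values.
TYPE_WEIGHT = {"transfer": 3.0, "colleague": 2.0, "call": 1.0}


def aggregate_weights(edges: Iterable[Edge]) -> Dict[str, Dict[str, float]]:
    """W_ab = sum of w_ab plus sum of w_ba, stored symmetrically."""
    W: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for src, dst, rel in edges:
        if src != dst:
            W[src][dst] += TYPE_WEIGHT[rel]
            W[dst][src] += TYPE_WEIGHT[rel]
    return W


def partition(edges: Iterable[Edge], max_rounds: int = 100) -> List[List[str]]:
    W = aggregate_weights(edges)
    nodes = sorted(W)
    ids = {v: i for i, v in enumerate(nodes)}  # unique IDs (sequential here, random in the text)
    # Heaviest neighbor of each node; ties go to the alphabetically first name.
    best = {v: max(sorted(W[v]), key=lambda u: W[v][u]) for v in nodes}
    for _ in range(max_rounds):
        new_ids = {}
        for v in nodes:  # every node reads only the pre-update IDs
            u = best[v]
            if best[u] == v:
                new_ids[v] = min(ids[v], ids[u])  # assumed rule to stop mutual swapping
            else:
                new_ids[v] = ids[u]
        if new_ids == ids:
            break
        ids = new_ids
    groups = defaultdict(list)
    for v in nodes:
        groups[ids[v]].append(v)
    return sorted(groups.values())


edges = [
    ("A", "B", "transfer"), ("B", "A", "transfer"), ("B", "C", "call"),
    ("C", "D", "call"), ("D", "E", "transfer"), ("E", "F", "colleague"),
]
node_sets = partition(edges)
```

In this toy graph the weak call link between C and D does not merge the two groups, because each node follows only its heaviest neighbor. Nodes with no edges do not appear in the result.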
S105: for each node set, obtaining the probability that the nodes in the set are trap-loan gang members from the degree of overlap between the node set and data on predetermined trap-loan offenders, and taking the persons corresponding to any node set whose probability exceeds a preset threshold as trap-loan gang members.
For each node set, a formula [formula image not reproduced] may be used to calculate the probability that the persons corresponding to the nodes in the node set are trap-loan gang members, wherein S is the probability that the persons corresponding to the nodes in the node set are trap-loan gang members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data on predetermined trap-loan offenders.
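The formula for S is missing from this text. The sketch below assumes the simplest form consistent with the definitions of S, m, and n, namely the ratio S = n / m; this is an assumption, not a formula confirmed by the source. The function names and the threshold value 0.3 are likewise hypothetical.

```python
from typing import Iterable, List, Set


def gang_probability(node_set: Set[str], known_offenders: Set[str]) -> float:
    """Assumed form S = n / m: share of the set's members who are known offenders."""
    m = len(node_set)
    n = len(node_set & known_offenders)
    return n / m if m else 0.0


def flag_gangs(node_sets: Iterable[Set[str]], known_offenders: Set[str],
               threshold: float = 0.3) -> List[List[str]]:
    """Return the node sets whose probability exceeds the preset threshold."""
    return [sorted(s) for s in node_sets
            if gang_probability(s, known_offenders) > threshold]


known = {"A", "B", "E"}
sets = [{"A", "B", "C", "D"}, {"E", "F", "G", "H", "I"}]
flagged = flag_gangs(sets, known)
```

Here the first set scores 2/4 = 0.5 and is flagged, while the second scores 1/5 = 0.2 and is not. Under this assumed ratio, every person in a flagged set is treated as a suspected member, including those not previously known.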
It will be appreciated that the portion of criminals that have been mastered refer to criminals that have had clear evidence of suiting.
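A short sketch of the scoring in S105, under stated assumptions: the score is read as the ratio n/m implied by the definitions of m and n, and the function names, person labels and the threshold of 0.5 are hypothetical values chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set


def overlap_score(node_set: Iterable[str], known_criminals: Set[str]) -> float:
    """S = n / m: the share of the set's members who are known criminals."""
    members = set(node_set)
    if not members:
        return 0.0  # assumption: an empty set scores zero
    n = len(members & known_criminals)  # members already known as criminals
    m = len(members)                    # all members of the node set
    return n / m


def flag_groups(node_sets: Dict[int, List[str]],
                known_criminals: Set[str],
                threshold: float) -> Dict[int, float]:
    """Keep only node sets whose score is strictly greater than the threshold."""
    scores = {gid: overlap_score(nodes, known_criminals)
              for gid, nodes in node_sets.items()}
    return {gid: s for gid, s in scores.items() if s > threshold}


node_sets = {1: ["p1", "p2", "p3", "p4"], 2: ["p5", "p6"]}
known = {"p1", "p2", "p3"}
flagged = flag_groups(node_sets, known, threshold=0.5)  # {1: 0.75}
```

Node set 1 contains three known criminals out of four members and is flagged; node set 2 contains none and is not.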
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the operation of the trepanning loan, and a relationship graph containing only relationships between persons is derived from it; the person-only graph is divided into a plurality of node sets by iterating over the edge weights, and the probability that each node set is a trepanning loan group is judged from the number of known trepanning loan criminals it contains, so that the corresponding trepanning loan group can be identified from the data of existing trepanning loan criminals.
In addition, traditional trepanning loan group identification relies mainly on reading and analyzing transcript files from different sources, sorting out the relationships among the people involved, and identifying key persons and clues. During the analysis, personnel at different levels must repeatedly study the same case, and important clues hidden in hundreds of statement files, including multiple different statements given by the same person, are found through comparative analysis. Once the relationships among the people and objects in the trepanning loan have been sorted out, statistical analysis of the available data, such as the criminal's call information, transfer information and chat records, is used to determine whether the criminal has accomplices. The same analysis is then applied in turn to each identified group member until all group members are found.
As the forms of trepanning loans multiply, the information available for identifying trepanning loan groups also keeps growing. This poses new challenges to the traditional working mode of manually reading, understanding and analyzing data clues: the variety and volume of information involved in a single trepanning loan case exceed what the human mind can grasp, the investigative clues hidden behind the information are hard to find, a great deal of manpower is often spent screening redundant information, and truly valuable clues may be identified only at the very end or even overlooked.
The invention provides a knowledge-graph-based analysis method that can effectively display information from multiple channels and analyze the relationships among the persons involved in a trepanning loan along multiple dimensions, making the identification of trepanning loan groups more comprehensive and accurate.
Compared with the prior art, the invention has the following beneficial effects: knowledge-graph technology allows the feature data acquired from different sources in step S101 to be extracted and integrated into a single relationship graph, which greatly improves the visualization of the data and makes it easier to uncover the deeper relationships hidden in the complex network. Compared with traditional statistical analysis, group identification based on graph analysis offers higher accuracy and interpretability.
Example 2
Corresponding to embodiment 1 of the invention, this embodiment further provides a system for identifying a trepanning loan group.
Fig. 2 is a schematic structural diagram of a system for identifying a trepanning loan group according to an embodiment of the invention. As shown in Fig. 2, the system includes:
a first obtaining module 201, configured to obtain feature data involved in the operation of a trepanning loan, wherein the feature data includes: communication data, transaction records, and personal information of the persons involved in the trepanning loan;
a construction module 202, configured to take the keywords contained in the feature data as nodes and construct a relationship graph including the nodes according to the relationships among the nodes;
a contraction module 203, configured to contract the non-person nodes in the relationship graph into the person nodes corresponding to them;
a dividing module 204, configured to determine the weight of each edge according to the type of the edge between person nodes, and to divide the relationship graph into a plurality of node sets; wherein the types of edges include: one or a combination of an employment relationship, a colleague relationship, a transfer relationship, a charging relationship, a payment relationship, a call relationship, an investment relationship, a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship;
a second obtaining module 205, configured to obtain, for each node set, the probability that the nodes in the node set are trepanning loan group members from the degree of overlap between the node set and the data of predetermined trepanning loan criminals, and to take the persons corresponding to node sets whose probability is greater than a preset threshold as trepanning loan group members.
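The contraction performed by module 203 can be pictured with the sketch below. It assumes that each non-person node (for example a telephone number or bank card number) has already been matched to the person it belongs to; the `owner` mapping, the edge tuple layout and the node labels are hypothetical and are not taken from the patent text.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, relationship type)


def contract(edges: List[Edge], owner: Dict[str, str]) -> List[Edge]:
    """Rewrite every edge so that non-person endpoints become their owners."""
    contracted = []
    for src, dst, rel in edges:
        s = owner.get(src, src)  # person nodes are not in `owner` and stay as-is
        d = owner.get(dst, dst)
        if s != d:  # drop self-loops created by merging a node into its owner
            contracted.append((s, d, rel))
    return contracted


edges = [
    ("phone:111", "phone:222", "call"),
    ("Zhang", "phone:111", "owns"),
    ("card:9", "Li", "transfer"),
]
owner = {"phone:111": "Zhang", "phone:222": "Li", "card:9": "Zhang"}
person_edges = contract(edges, owner)
```

After contraction only person-to-person edges remain: the call between the two telephone numbers becomes a call edge from Zhang to Li, the card-to-person transfer becomes a transfer edge from Zhang to Li, and the ownership edge disappears because both of its endpoints collapse into Zhang.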
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the operation of the trepanning loan, and a relationship graph containing only relationships between persons is derived from it; the person-only graph is divided into a plurality of node sets by iterating over the edge weights, and the probability that each node set is a trepanning loan group is judged from the number of known trepanning loan criminals it contains, so that the corresponding trepanning loan group can be identified from the data of existing trepanning loan criminals.
In a specific implementation of the embodiment of the invention, the construction module 202 is configured to:
extract the keywords contained in the feature data using a natural language processing algorithm, wherein the keywords include: one or a combination of a person name, a place name, a company name, an identity card number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the home company of a number.
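As a rough illustration of keyword extraction, the sketch below pulls out only those keyword types that have a recognizable surface pattern. It is not the natural language processing algorithm referred to above: person, place and company names would need a named-entity recognizer, and the regular expressions here (including the assumption that a telephone number is an 11-digit mobile number starting with 1) are simplifications introduced for this example.

```python
import re
from typing import Dict, List

# Hypothetical, simplified patterns for three of the listed keyword types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ip_address": re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)"),
    "phone_number": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
}


def extract_keywords(text: str) -> Dict[str, List[str]]:
    """Return every pattern match in the text, grouped by keyword type."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


sample = "Zhang (13800138000, zhang@example.com) logged in from 192.168.1.10 twice."
keywords = extract_keywords(sample)
```

Each extracted value would then become a node in the relationship graph, labeled with its keyword type.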
In a specific implementation of the embodiment of the invention, the contraction module 203 is configured to:
for structured data, obtain the relationships among nodes by direct query, wherein the structured data includes: bank transaction records; and the relationships among the nodes include: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a call relationship, and an investment relationship;
for unstructured data, extract the relationships among nodes using a syntactic analysis algorithm, wherein the unstructured data includes: conversation content and chat records; and the relationships among the nodes further include: a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship.
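For the structured case, obtaining relationships by direct query might look like the sketch below, which loads a few bank transaction rows into an in-memory SQLite table and reads transfer edges back with a single query. The table name, column names and sample rows are assumptions made for the example; the patent does not specify a storage format.

```python
import sqlite3
from typing import List, Tuple


def transfer_edges(rows: List[Tuple[str, str, float]]) -> List[Tuple[str, str, str, int]]:
    """Return (payer, payee, 'transfer', number of transfers) for each pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (payer TEXT, payee TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    # One directed transfer edge per (payer, payee) pair found in the records.
    cursor = conn.execute(
        "SELECT payer, payee, COUNT(*) FROM transactions "
        "GROUP BY payer, payee ORDER BY payer, payee"
    )
    edges = [(payer, payee, "transfer", count) for payer, payee, count in cursor]
    conn.close()
    return edges


records = [("Zhang", "Li", 5000.0), ("Zhang", "Li", 1200.0), ("Li", "Wang", 300.0)]
edges = transfer_edges(records)
```

The two transfers from Zhang to Li collapse into a single directed edge with a count of 2, and the transfer from Li to Wang yields a second edge with a count of 1. Unstructured sources such as chat records have no such columns to query, which is why the text turns to syntactic analysis for them.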
In a specific implementation of the embodiment of the invention, the dividing module 204 is configured to:
41) randomly assign a unique ID to each node in the relationship graph obtained after the node contraction operation, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculate the weight summary of the node using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, where $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) update the ID of each node to the ID of the neighbor node with the largest weight summary value, and return to step 42) until the IDs of the nodes no longer change;
44) group the nodes having the same ID into one node set, thereby obtaining a plurality of node sets.
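Steps 42) and 44) can be illustrated with the following sketch. The per-type weights in `EDGE_TYPE_WEIGHT` are invented placeholder values (the text says only that weights are preset by edge type), and the function names and edge tuple layout are likewise assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical preset weights per edge type (step 41).
EDGE_TYPE_WEIGHT = {"transfer": 3.0, "call": 1.0, "affinity": 2.0}


def weight_summary(edges: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], float]:
    """Step 42: W_ab = sum of w_ab + sum of w_ba over all edges between a and b."""
    totals = defaultdict(float)
    for src, dst, rel in edges:
        key = tuple(sorted((src, dst)))  # a->b and b->a share one summary entry
        totals[key] += EDGE_TYPE_WEIGHT[rel]
    return dict(totals)


def group_by_id(ids: Dict[str, int]) -> Dict[int, List[str]]:
    """Step 44: nodes that end up with the same ID form one node set."""
    groups = defaultdict(list)
    for node, node_id in ids.items():
        groups[node_id].append(node)
    return {node_id: sorted(nodes) for node_id, nodes in groups.items()}


edges = [("a", "b", "transfer"), ("b", "a", "call"),
         ("a", "b", "call"), ("b", "c", "affinity")]
summary = weight_summary(edges)                    # {("a", "b"): 5.0, ("b", "c"): 2.0}
node_sets = group_by_id({"a": 7, "b": 7, "c": 4})  # {7: ["a", "b"], 4: ["c"]}
```

Because the summary adds the weights in both directions, a pair of persons linked by several kinds of relationship accumulates a larger value than a pair linked by a single edge, which is what drives the ID update in step 43).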
In a specific implementation of the embodiment of the invention, the second obtaining module 205 is configured to:
for each node set, calculate the probability that the persons corresponding to the nodes in the node set are trepanning loan group members using the formula $S = n/m$, wherein
S is the probability that the persons corresponding to the nodes in the node set are trepanning loan group members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of the predetermined trepanning loan criminals.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (6)

1. A method for identifying a trepanning loan group, the method comprising:
1) acquiring feature data involved in the operation of a trepanning loan, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the trepanning loan;
2) taking the keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among the nodes; wherein obtaining the relationships among the nodes in step 2) comprises:
for structured data, obtaining the relationships among nodes by direct query, wherein the structured data comprises: bank transaction records; and the relationships among the nodes comprise: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a call relationship, and an investment relationship;
for unstructured data, extracting the relationships among nodes using a syntactic analysis algorithm, wherein the unstructured data comprises: conversation content and chat records; and the relationships among the nodes further comprise: a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship;
3) contracting the non-person nodes in the relationship graph into the person nodes corresponding to them;
4) determining the weight of each edge according to the type of the edge between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of edges comprise: one or a combination of an employment relationship, a colleague relationship, a transfer relationship, a charging relationship, a payment relationship, a call relationship, an investment relationship, a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship;
wherein step 4) comprises:
41) randomly assigning a unique ID to each node in the relationship graph obtained after the node contraction operation, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculating the weight summary of the node using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, where $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) updating the ID of each node to the ID of the neighbor node with the largest weight summary value, and returning to step 42) until the IDs of the nodes no longer change;
44) grouping the nodes having the same ID into one node set, thereby obtaining a plurality of node sets;
5) for each node set, obtaining the probability that the nodes in the node set are trepanning loan group members from the degree of overlap between the node set and the data of predetermined trepanning loan criminals, and taking the persons corresponding to node sets whose probability is greater than a preset threshold as trepanning loan group members.
2. The method for identifying a trepanning loan group according to claim 1, wherein step 2) comprises:
extracting the keywords contained in the feature data using a natural language processing algorithm, wherein the keywords comprise: one or a combination of a person name, a place name, a company name, an identity card number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the home company of a number.
3. The method for identifying a trepanning loan group according to claim 1, wherein obtaining the probability that the nodes in the node set are trepanning loan group members comprises:
for each node set, calculating the probability that the persons corresponding to the nodes in the node set are trepanning loan group members using the formula $S = n/m$, wherein
S is the probability that the persons corresponding to the nodes in the node set are trepanning loan group members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of the predetermined trepanning loan criminals.
4. A system for identifying a trepanning loan group, the system comprising:
a first obtaining module, configured to obtain feature data involved in the operation of a trepanning loan, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the trepanning loan;
a construction module, configured to take the keywords contained in the feature data as nodes and construct a relationship graph comprising the nodes according to the relationships among the nodes; the construction module being configured to:
for structured data, obtain the relationships among nodes by direct query, wherein the structured data comprises: bank transaction records; and the relationships among the nodes comprise: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a call relationship, and an investment relationship;
for unstructured data, extract the relationships among nodes using a syntactic analysis algorithm, wherein the unstructured data comprises: conversation content and chat records; and the relationships among the nodes further comprise: a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship;
a contraction module, configured to contract the non-person nodes in the relationship graph into the person nodes corresponding to them;
a dividing module, configured to determine the weight of each edge according to the type of the edge between person nodes, and to divide the relationship graph into a plurality of node sets; wherein the types of edges comprise: one or a combination of an employment relationship, a colleague relationship, a transfer relationship, a charging relationship, a payment relationship, a call relationship, an investment relationship, a reporting relationship, a title relationship, a job relationship, a behavioral relationship, and an affinity relationship;
the dividing module being configured to:
41) randomly assign a unique ID to each node in the relationship graph obtained after the node contraction operation, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculate the weight summary of the node using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, where $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge directed from node a to node b; and $w_{ba}$ is the weight of an edge directed from node b to node a;
43) update the ID of each node to the ID of the neighbor node with the largest weight summary value, and return to step 42) until the IDs of the nodes no longer change;
44) group the nodes having the same ID into one node set, thereby obtaining a plurality of node sets;
and a second obtaining module, configured to obtain, for each node set, the probability that the nodes in the node set are trepanning loan group members from the degree of overlap between the node set and the data of predetermined trepanning loan criminals, and to take the persons corresponding to node sets whose probability is greater than a preset threshold as trepanning loan group members.
5. The system for identifying a trepanning loan group according to claim 4, wherein the construction module is configured to:
extract the keywords contained in the feature data using a natural language processing algorithm, wherein the keywords comprise: one or a combination of a person name, a place name, a company name, an identity card number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the home company of a number.
6. The system for identifying a trepanning loan group according to claim 4, wherein the second obtaining module is configured to:
for each node set, calculate the probability that the persons corresponding to the nodes in the node set are trepanning loan group members using the formula $S = n/m$, wherein
S is the probability that the persons corresponding to the nodes in the node set are trepanning loan group members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of the predetermined trepanning loan criminals.
CN201911049749.4A 2019-10-31 2019-10-31 Method and system for identifying trepanning loan group partner Active CN110766091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911049749.4A CN110766091B (en) 2019-10-31 2019-10-31 Method and system for identifying trepanning loan group partner

Publications (2)

Publication Number Publication Date
CN110766091A CN110766091A (en) 2020-02-07
CN110766091B true CN110766091B (en) 2024-02-27

Family

ID=69334905

Country Status (1)

Country Link
CN (1) CN110766091B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant