CN116151967A - Fraudulent party identification system based on transaction knowledge graph - Google Patents


Publication number
CN116151967A
Authority
CN
China
Prior art keywords
transaction
data
fraud
knowledge
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111376113.8A
Other languages
Chinese (zh)
Inventor
吴斌
李银胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202111376113.8A
Publication of CN116151967A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Finance (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a fraud group identification system based on a transaction knowledge graph. The system comprises a data acquisition module, a transaction knowledge graph construction module, a fraud group identification module and a fraud risk display module. The data acquisition module acquires and processes transaction data. The transaction knowledge graph construction module performs knowledge extraction and knowledge representation on the transaction data to construct a corresponding transaction knowledge graph. The fraud group identification module performs fraud identification on the transaction knowledge graph to obtain fraud risk data containing information such as fraud patterns. The fraud risk display module provides the user with a visual risk display of the fraud risk data.

Description

Fraudulent party identification system based on transaction knowledge graph
Technical Field
The invention relates to a fraud group identification system based on a transaction knowledge graph.
Background
Transaction fraud typically involves bad actors using various fraudulent means to obtain illegal gains from transactions, which disrupts financial order. In particular, loan fraud groups defraud financial institutions of funds by purchasing identity cards, dressing up application materials and repeatedly applying for loans. Within a fraud group, many people share a few devices, their mobile phone numbers are linked to one another, and the transferred funds ultimately converge on a single account.
However, no technical solution or system currently targets such fraud groups, and existing fraud identification technology has the following limitations:
The sources and types of transaction data are limited: the rich information involved in the transaction process is not considered, correlation analysis of multi-source heterogeneous data is absent, and no unified knowledge graph is available to support deeper correlation analysis. At present, fraud group identification models mainly rely on community detection, which divides into global and local methods. Global detection methods cannot distinguish fraud communities from normal communities, so their false alarm rate is high. Local detection methods require a batch of nodes labeled as fraudulent as input and can only detect communities around those seed nodes, so their miss rate is high.
Disclosure of Invention
In order to solve the above problems, the invention provides a fraud identification system for heterogeneous transaction data, which adopts the following technical scheme:
The invention provides a fraud group identification system based on a transaction knowledge graph, characterized by comprising a data acquisition module, a transaction knowledge graph construction module, a fraud group identification module, a fraud risk display module, a transmission control module and a storage module. The data acquisition module has a data acquisition unit and a preprocessing unit: the data acquisition unit acquires transaction object and subject data from preset data sources, and the preprocessing unit preprocesses the transaction object and subject data to generate transaction data. The transmission control module transmits the transaction data to the storage module for storage and also to the transaction knowledge graph construction module. The transaction knowledge graph construction module has a transaction knowledge modeling unit, a transaction knowledge extraction unit, a transaction knowledge representation unit and a transaction knowledge storage unit. The transaction knowledge modeling unit models the transaction domain ontology, transaction domain relationships and transaction domain attributes to generate a transaction knowledge recognition model. The transaction knowledge extraction unit performs knowledge extraction on the received transaction data based on the transaction knowledge recognition model to generate a transaction knowledge extraction result. The transaction knowledge representation unit performs form representation on the transaction knowledge extraction result to generate a transaction knowledge graph corresponding to the transaction data. The transmission control module transmits the transaction knowledge graph to the storage module, where it is stored in correspondence with the transaction data, and also to the fraud group identification module.
The fraud group identification module has a fraud group identification model, which takes a vector representation of the transaction knowledge graph as input and computes the corresponding fraud risk data. The transmission control module transmits the fraud risk data to the storage module for storage and also to the fraud risk display module. The transmission control module controls the input display unit to display the fraud risk data screen and to show the fraud risk data in that screen.
The fraud group identification system based on a transaction knowledge graph provided by the invention may further have the following technical features. The data sources comprise manual input, associated information systems, third-party interfaces and the network. The transaction object and subject data are divided into transaction, subject and news public opinion data according to data nature, and into structured, semi-structured and unstructured data according to data structure. Depending on the data source, the data acquisition unit acquires data by manual input, by calling an associated information system, by calling a third-party interface, or by crawling network data. Depending on the data source, data nature and data structure, the preprocessing unit applies data analysis, data deduplication, data fusion and tag processing.
The fraud group identification system based on a transaction knowledge graph provided by the invention may further have the following technical features. The transaction domain ontology comprises transactions, enterprises, personnel, devices, accounts, regions, industries and business scopes. The transaction domain relationships comprise person-enterprise affiliation relationships, person-person same-telephone relationships, person-place location relationships, enterprise-place location relationships, person-device usage relationships, device-device same-network relationships, device-device same-version relationships, person-account ownership relationships, account-account transfer relationships, enterprise-industry affiliation relationships, enterprise-business scope affiliation relationships and enterprise-enterprise same-telephone relationships. The transaction domain attributes comprise basic personnel information, personnel residential addresses, personnel telephone numbers, enterprise names, industries, business scopes, enterprise addresses, enterprise telephone numbers, device network information, device versions, bank information and amounts.
The fraud group identification system based on a transaction knowledge graph provided by the invention may further have the following technical features. Knowledge extraction comprises structured data knowledge extraction, semi-structured data knowledge extraction and unstructured data knowledge extraction. Structured data knowledge extraction is conversion extraction: based on a declarative language that describes the mapping between relational database schemas and the knowledge representation form, the structured data is converted into transaction knowledge. Semi-structured and unstructured data knowledge extraction comprise entity extraction, relationship extraction, attribute extraction and entity alignment. Entity extraction determines the spans of named entities in text from the semi-structured and unstructured data. Relationship extraction determines the relationships among the extracted entities. Attribute extraction extracts the attribute information that the entities carry. Entity alignment infers whether different entities from different data sets map to the same object in the physical world.
The fraud group identification system based on the transaction knowledge graph provided by the invention can also have the technical characteristics that the form representation comprises the form of entity-relation-attribute and the form of entity-relation-entity.
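The two representation forms can be pictured as typed triples. The following Python sketch is illustrative only: the class names `RelationTriple` and `AttributeTriple` and all sample identifiers are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RelationTriple:
    """entity-relation-entity form: links two entities."""
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class AttributeTriple:
    """entity-relation-attribute form: attaches a literal value to an entity."""
    entity: str
    relation: str
    value: Union[str, float]


# Hypothetical sample knowledge; the identifiers are made up for illustration.
graph = [
    RelationTriple("person:P1", "uses", "device:D1"),
    RelationTriple("account:A1", "transfers_to", "account:A2"),
    AttributeTriple("account:A1", "bank", "Bank S"),
]

relations = [t for t in graph if isinstance(t, RelationTriple)]
attributes = [t for t in graph if isinstance(t, AttributeTriple)]
```

Keeping the two forms as separate types makes it easy to load relation triples as graph edges and attribute triples as node properties.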
The fraud group identification system based on a transaction knowledge graph provided by the invention may further have the following technical feature: the fraud risk data comprises transaction object information, transaction subject information, a transaction association information graph and a fraud group graph, where the transaction association information graph and the fraud group graph comprise transaction parameter graphs, associated knowledge graphs and associated data objects.
The fraud group identification system based on a transaction knowledge graph may further comprise a retrieval module. A transaction retrieval request screen and a retrieval result display screen are stored in a screen storage unit. The input display unit displays the transaction retrieval request screen so that the user can enter transaction object information to make a transaction retrieval request. Once the user confirms the input, the transmission control module transmits the entered transaction object information to the retrieval module. The retrieval module searches the storage module according to the transaction object information in the received request and obtains the corresponding transaction data, transaction knowledge graph and fraud risk data. The transmission control module transmits the retrieved transaction data, transaction knowledge graph and fraud risk data to the fraud risk display module, and the retrieval result display screen shows them for the user to view.
The actions and effects of the invention
In the fraud group identification system based on a transaction knowledge graph, the data acquisition module acquires and processes transaction data, the transaction knowledge graph construction module performs knowledge extraction and knowledge representation on the transaction data to construct a corresponding transaction knowledge graph, the fraud group identification module performs fraud identification on the transaction knowledge graph to obtain fraud risk data containing information such as fraud patterns, and the fraud risk display module provides a visual risk display of the fraud risk data. The system provides a complete process for constructing a transaction knowledge graph from heterogeneous transaction data and can effectively identify fraud groups based on that graph. It overcomes the shortcomings of existing fraud identification models in handling multi-source heterogeneous information, completes the relationships among transactions, personnel, enterprises and related data, provides a visual risk display, and improves accuracy metrics.
Drawings
FIG. 1 is a schematic diagram of a fraud group identification system based on a transaction knowledge graph in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a transaction knowledge graph construction module according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a fraud group identification process in an embodiment of the present invention;
FIG. 4 is a workflow diagram of a fraud group identification system based on a transaction knowledge graph in an embodiment of the invention;
FIG. 5 is a flow chart of a fraud group identification system based on a transaction knowledge graph retrieving fraud risk data in an embodiment of the invention.
Detailed Description
In order to make the technical means, inventive features, objectives and effects of the present invention easy to understand, the fraud group identification system based on a transaction knowledge graph is described in detail below with reference to the embodiment and the accompanying drawings.
<Example>
The scenario for this embodiment is as follows: a loan fraud group defrauds financial institutions of funds by purchasing identity cards, dressing up application materials and repeatedly applying for loans. Many people share a few devices, their mobile phone numbers are linked to one another, and the transferred funds ultimately converge on a single account.
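The pattern in this scenario (shared devices and linked phone numbers) can be illustrated with a simple connected-components grouping. This is not the patent's identification model, which the text does not specify at this level of detail; the sample records, the field names and the `group_by_shared_resources` helper are all hypothetical.

```python
from collections import defaultdict


def group_by_shared_resources(applications):
    """Group applicants that share a device or phone number (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every person to the device and phone used in the application.
    for app in applications:
        person = ("person", app["person"])
        union(person, ("device", app["device"]))
        union(person, ("phone", app["phone"]))

    groups = defaultdict(set)
    for app in applications:
        groups[find(("person", app["person"]))].add(app["person"])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical loan applications.
applications = [
    {"person": "P1", "device": "D1", "phone": "T1"},
    {"person": "P2", "device": "D1", "phone": "T2"},
    {"person": "P3", "device": "D2", "phone": "T2"},
    {"person": "P4", "device": "D9", "phone": "T9"},
]
groups = group_by_shared_resources(applications)
```

Here P1 and P2 share a device and P2 and P3 share a phone number, so the three fall into one candidate group while P4 stays alone. A real system would still need to separate fraudulent groups from benign ones such as families or small offices.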
Fig. 1 is a schematic diagram of the structure of a fraud group identification system based on a transaction knowledge graph in the embodiment of the present invention.
As shown in fig. 1, the fraud group identification system 100 based on the transaction knowledge graph of the present embodiment includes a data acquisition module 1, a transaction knowledge graph construction module 2, a fraud group identification module 3, a fraud risk display module 4, a transmission control module 5, a storage module 6, and a retrieval module 7.
The data acquisition module 1 comprises a data acquisition unit and a preprocessing unit.
The data acquisition unit acquires transaction object and subject data from different data sources such as manual input, associated information systems, third-party interfaces and the network.
The preprocessing unit preprocesses the transaction object and subject data acquired by the data acquisition unit to generate transaction data.
In this embodiment, the transaction object and subject data are classified into transaction, subject and news public opinion data according to data nature, and into structured, semi-structured and unstructured data according to data structure.
The data acquisition unit uses different acquisition methods depending on the data source, data nature and data structure. These methods comprise manual input, calling an associated information system, calling a third-party interface, and network data crawling.
The preprocessing unit likewise chooses its preprocessing methods according to the data source, data nature and data structure. These methods comprise data analysis, data deduplication, data fusion and tag processing. Specifically:
When the data nature is transaction, the data structure is structured data, and the data sources are manual input and associated information systems, that is, information systems from which the current transaction can be extracted, including the institution's information system. The data content collected by the data acquisition unit comprises the transaction object, transaction amount, transaction account, transaction enterprise and transaction device. The acquisition methods comprise manual input and calling an associated information system, and preprocessing comprises content extraction and tag processing.
When the data nature is subject, the data structure includes structured data and semi-structured data, and the data sources include manual input and associated information systems, that is, information systems from which the current subject can be extracted, including institutional information systems and government information systems. The data content collected by the data acquisition unit comprises transaction personnel, the regions where the personnel are located, and the industries, regions and business scopes of enterprises. The acquisition methods comprise manual input and calling an associated information system, and preprocessing comprises content extraction and tag processing.
When the data nature is news public opinion, the data structure includes unstructured, semi-structured and structured data, and the data sources include the network and associated information systems. The network sources comprise websites, and the associated information systems are information systems from which information related to news media and government websites can be extracted, including social institution information systems and government information systems. The data content collected by the data acquisition unit comprises published texts and associated information: the published texts comprise news and government notices, and the associated information comprises associated text and associated content. The acquisition methods comprise network data crawling and calling an associated information system, and preprocessing comprises data analysis, data deduplication, data fusion and tag processing.
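Two of the preprocessing steps named above, data deduplication and tag processing, can be sketched as follows. The record fields, the deduplication key and the `preprocess` helper are hypothetical; the patent does not state how these steps are implemented.

```python
def preprocess(records):
    """Drop duplicate records, then tag each one with its nature and structure."""
    seen = set()
    result = []
    for rec in records:
        # Deduplicate on normalized content, regardless of source
        # (a hypothetical choice of key).
        key = (rec["nature"], rec["content"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        tagged = dict(rec)
        tagged["tags"] = [rec["nature"], rec["structure"]]
        result.append(tagged)
    return result


# Hypothetical raw records from three data sources.
raw = [
    {"source": "network", "nature": "news", "structure": "unstructured",
     "content": "Enterprise A changed its legal representative."},
    {"source": "associated_system", "nature": "news", "structure": "unstructured",
     "content": "enterprise a changed its legal representative. "},
    {"source": "manual", "nature": "transaction", "structure": "structured",
     "content": "account A1 -> account A2, 2,000,000 yuan"},
]
clean = preprocess(raw)
```

The second record repeats the first apart from case and trailing whitespace, so only two records survive, each carrying tags that later stages can use to pick an extraction method.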
In this embodiment, the third-party interface method calls a third-party interface with fields such as the full enterprise name or a person's identity card number and extracts the associated data objects from the returned results.
The network data crawling method calls a web search engine with enterprise keywords and extracts the associated data objects from the search results.
The associated information systems include banking information systems, Credit China and third-party enterprise information systems (e.g. Tianyancha).
Network sources include news media, government websites, microblogs, and forums.
Fig. 2 is a schematic diagram of a transaction knowledge graph construction module according to an embodiment of the invention.
As shown in fig. 2, the transaction knowledge graph construction module 2 includes a transaction knowledge modeling unit 21, a transaction knowledge extraction unit 22, and a transaction knowledge representation unit 23.
The transaction knowledge modeling unit 21 generates a transaction knowledge recognition model by modeling the modeling objects with a top-down method.
Because the knowledge graph in the transaction domain targets specific problems such as anomaly and risk analysis in transactions, the definitions of the ontology, relationships and attributes all serve those specific risk problems. The ontology is therefore built mainly by hand (by domain experts), that is, the knowledge graph is constructed top-down: the ontology, the ontology library and the data schema are designed first, and entity information is then populated into that framework. For the analysis of fraud risk cases in the transaction domain, the seven-step method is used to construct the ontology.
The seven-step method, developed at the Stanford University School of Medicine, is an ontology development method based on the ontology development tool Protégé. It is relatively practical and is mainly used to build domain ontologies.
Protégé supports the expression of various knowledge elements and the definition of various knowledge rules. More importantly, it offers multiple extension interfaces, supports several ontology representation languages, and provides conversion and export between formats.
The seven steps of the method are: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms in the ontology; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; and create instances.
In this embodiment, the modeling object includes a transaction domain ontology, a transaction domain relationship, and a transaction domain attribute. Specifically:
The transaction domain ontology comprises transactions, enterprises, personnel, devices, accounts, regions, industries and business scopes.
The transaction domain relationships comprise the person-enterprise affiliation relationship (employee, manager, legal representative), the person-person same-telephone relationship, the person-place location relationship, the enterprise-place location relationship, the person-device usage relationship, the device-device same-network (IP, proxy) relationship, the device-device same-version relationship, the person-account ownership relationship, the account-account transfer relationship, the enterprise-industry affiliation relationship, the enterprise-business scope affiliation relationship, and the enterprise-enterprise same-telephone relationship.
The transaction domain attributes comprise basic personnel information, personnel residential addresses, personnel telephone numbers, enterprise names, industries, business scopes, enterprise addresses, enterprise telephone numbers, device network information, device versions, the bank information of accounts, and amounts.
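A hand-built schema of this kind can be written down as plain data and used to reject extracted relations that do not fit it. The sketch below is an assumption-laden illustration: the identifier spellings (`belongs_to`, `same_phone`, and so on) are invented here, and only a subset of the listed relationships is encoded.

```python
# Entity types from the transaction domain ontology (spellings are hypothetical).
ENTITY_TYPES = {
    "transaction", "enterprise", "person", "device",
    "account", "region", "industry", "business_scope",
}

# (head type, relation, tail type); a subset of the relationships listed above.
RELATION_SCHEMA = {
    ("person", "belongs_to", "enterprise"),
    ("person", "same_phone", "person"),
    ("person", "uses", "device"),
    ("device", "same_network", "device"),
    ("person", "owns", "account"),
    ("account", "transfers_to", "account"),
    ("enterprise", "in_industry", "industry"),
}


def is_valid(head_type, relation, tail_type):
    """Check a candidate relation against the hand-built schema."""
    return (
        head_type in ENTITY_TYPES
        and tail_type in ENTITY_TYPES
        and (head_type, relation, tail_type) in RELATION_SCHEMA
    )


ok = is_valid("person", "uses", "device")
reversed_direction = is_valid("device", "uses", "person")
```

Because the schema is directional, a relation extracted with its head and tail swapped is rejected, which is one practical benefit of designing the ontology before populating it.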
The transaction knowledge extraction unit 22 performs knowledge extraction on the received transaction data based on the transaction knowledge recognition model to generate a transaction knowledge extraction result.
In this embodiment, the knowledge extraction method includes structured data knowledge extraction, semi-structured data knowledge extraction and unstructured data knowledge extraction based on the data structure of the transaction data.
Structured data knowledge extraction is conversion extraction: based on a declarative language that describes the mapping between relational database schemas and the knowledge representation form, the structured data is converted into transaction knowledge. Specifically:
Structured data mainly refers to formatted data already stored in a relational database. It already has definitions of entities, relationships and attributes, so only limited preprocessing is needed. This data is stored in the relational database MySQL and must be extracted into the OWL knowledge representation format. D2RQ is a declarative language that describes the mapping between relational database schemas and OWL ontologies, and it is used here to extract knowledge from the structured data.
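The idea of conversion extraction, turning relational rows into triples, can be shown with a small stand-in. This sketch deliberately swaps in Python's built-in `sqlite3` for MySQL and a hand-written function for a D2RQ mapping, so it does not show D2RQ syntax; the `transfer` table, its columns and the predicate names are hypothetical.

```python
import sqlite3

# In-memory database standing in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfer ("
    "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transfer (src, dst, amount) VALUES (?, ?, ?)",
    [("A1", "A2", 2000000.0), ("A3", "A2", 800000.0)],
)


def rows_to_triples(connection):
    """Map each transfer row to one relation triple and one attribute triple."""
    triples = []
    rows = connection.execute(
        "SELECT id, src, dst, amount FROM transfer ORDER BY id"
    )
    for row_id, src, dst, amount in rows:
        # Relation between two account entities.
        triples.append((f"account:{src}", "transfers_to", f"account:{dst}"))
        # Attribute attached to the transaction entity.
        triples.append((f"transaction:{row_id}", "amount", amount))
    return triples


triples = rows_to_triples(conn)
```

A declarative mapping language expresses the same table-to-class and column-to-property correspondence as configuration rather than code, which is what makes it reusable across tables.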
Semi-structured and unstructured data knowledge extraction, by contrast, comprise entity extraction, relationship extraction, attribute extraction and entity alignment.
Semi-structured data refers to data captured from web pages using BeautifulSoup and urllib2. It does not conform to the data model structure of a relational database, but it contains tags that separate semantic elements and establish hierarchies of records and fields. Unstructured data mainly refers to text data such as news public opinion. Both kinds of data therefore require the complete process of entity extraction, relationship extraction, attribute extraction and entity alignment. Specifically:
Entity extraction determines the spans of named entities in text from the semi-structured and unstructured data.
The entity extraction problem can be cast as a sequence labeling problem and solved with a sequence labeling method. Bi-LSTM-CRF is one such algorithm; here the labels follow the BIO scheme. The body of the Bi-LSTM-CRF model consists of a bidirectional long short-term memory network (Bi-LSTM) and a conditional random field (CRF). The model input is character features, and the output is a predicted label for each character. The Bi-LSTM receives the embedding of each character and predicts a score for every label at each character. The CRF layer takes these emission scores from the Bi-LSTM as input and outputs the most probable label sequence that satisfies the label transition constraints.
In this embodiment, the Bi-LSTM-CRF algorithm is used to perform entity extraction, where the character embedding representation algorithm uses BERT.
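The decoding step of a CRF layer is commonly done with the Viterbi algorithm, which the following sketch illustrates in plain Python. The scores are made-up numbers rather than outputs of a trained Bi-LSTM, the tag set is reduced to untyped `O`/`B`/`I`, and a large negative transition score stands in for the BIO constraint; none of these values come from the patent.

```python
TAGS = ["O", "B", "I"]
NEG = -1e4  # effectively forbids a transition


def viterbi(emissions, transitions, start):
    """Return the highest-scoring tag sequence.

    emissions[t][j]: score of tag j at position t
    transitions[i][j]: score of moving from tag i to tag j
    start[j]: score of starting with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [TAGS[j] for j in path]


# BIO constraint: "I" may not follow "O" and may not start a sequence.
transitions = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B
    [0.0, 0.0, 0.0],  # from I
]
start = [0.0, 0.0, NEG]
# At position 1 the emission alone prefers "I", but "O" -> "I" is forbidden.
emissions = [
    [2.0, 0.5, 0.0],
    [0.0, 1.0, 1.5],
    [0.0, 0.0, 2.0],
]
tags = viterbi(emissions, transitions, start)
```

Picking the best tag per position independently would give the invalid sequence O, I, I here; decoding with transition scores yields a sequence that respects the constraint.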
For example, for the following text:
Xue Mou, the legal representative of guarantor Enterprise B, agrees to temporarily take over Enterprise A and changes the legal representative of Enterprise A to Chen Mou, an employee of Enterprise B.
The entity extraction result is:
[guarantor Enterprise B] legal representative [Xue Mou] agrees to temporarily take over [Enterprise A] and changes [Enterprise A] legal representative to [Enterprise B] employee [Chen Mou]
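Once a sequence labeler has produced BIO tags, entity spans are recovered by grouping each `B-` tag with the `I-` tags that follow it. The sketch below shows that grouping on word tokens for readability (the embodiment labels characters); the `PER` and `ORG` type names and the `bio_to_spans` helper are assumptions for illustration.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO tags."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any open one first.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


tokens = ["Xue", "Mou", "takes", "over", "Enterprise", "A"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG"]
spans = bio_to_spans(tokens, tags)
```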
Relationship extraction determines the relationships among the extracted entities. In knowledge representation, a relationship characterizes the association between two or more named entities or attributes, so the main purpose of relationship extraction is to find and learn the associations between different entities in the information source.
The R-BERT algorithm solves the relation classification task by using a pre-trained BERT language model together with information from the target entities. The model first inserts special tokens before and after each target entity and then feeds the text into BERT for fine-tuning, so that the positions of the two target entities are marked and this information is passed to the BERT model. The positions of the two target entities are then located in BERT's output embeddings. Their embeddings and the sentence encoding (the embedding of the special [CLS] token in BERT) are used as inputs to a multi-layer neural network classifier. In this way, the model captures the semantics of both the sentence and the two target entities, which suits the relation classification task.
In this embodiment, the relation extraction is performed using the R-BERT algorithm.
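The input preparation step, inserting marker tokens around the two target entities, can be sketched without any model. The marker symbols `$` and `#` follow the published R-BERT description; the text above only says "special tokens". Whitespace tokens are used here for readability, whereas a real BERT pipeline would use its own subword tokenizer, and the `mark_entities` helper is hypothetical.

```python
def mark_entities(tokens, e1, e2):
    """Wrap two entity spans with marker tokens and prepend [CLS].

    e1 and e2 are (start, end) token index pairs with end exclusive;
    the spans are assumed not to overlap.
    """
    (s1, t1), (s2, t2) = e1, e2
    out = ["[CLS]"]
    for i, tok in enumerate(tokens):
        if i == s1:
            out.append("$")  # opens the first entity
        if i == s2:
            out.append("#")  # opens the second entity
        out.append(tok)
        if i == t1 - 1:
            out.append("$")  # closes the first entity
        if i == t2 - 1:
            out.append("#")  # closes the second entity
    return out


tokens = ["Xue", "Mou", "takes", "over", "Enterprise", "A"]
marked = mark_entities(tokens, (0, 2), (4, 6))
```

After encoding, the classifier reads the hidden states between each marker pair plus the `[CLS]` state, so the markers are what tell the model which two entities the relation label refers to.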
For example, for the following text:
Xue Mou, the legal representative of guarantor Enterprise B, agrees to temporarily take over Enterprise A and changes the legal representative of Enterprise A to Chen Mou, an employee of Enterprise B.
The relation extraction result is:
guarantor Enterprise B - legal representative - Xue Mou; Xue Mou - takes over - Enterprise A; Enterprise A - legal representative - Chen Mou; Chen Mou - employee - Enterprise B.
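The four extracted relations can be loaded into a small directed graph to see how they connect. The sketch below uses only the triples from the example; the relation identifiers and the helper names `build_graph` and `reachable` are hypothetical.

```python
from collections import defaultdict

# Relation triples from the example, with hypothetical relation identifiers.
triples = [
    ("Enterprise B", "legal_representative", "Xue Mou"),
    ("Xue Mou", "takes_over", "Enterprise A"),
    ("Enterprise A", "legal_representative", "Chen Mou"),
    ("Chen Mou", "employee_of", "Enterprise B"),
]


def build_graph(triples):
    """Build an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def reachable(graph, start):
    """Entities reachable from start by following relations (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


graph = build_graph(triples)
from_b = reachable(graph, "Enterprise B")
```

Starting from Enterprise B, the traversal returns to Enterprise B itself: the four relations form a cycle linking the two enterprises through the two individuals, which is the kind of association a graph makes visible.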
Attribute extraction extracts the attributes that an entity carries. Attributes describe an entity in detail and enrich the information available about it. In the transaction domain, both the transaction itself and the transaction subjects carry a large amount of attribute information, which is an important basis for risk judgment, so attribute extraction is required.
MetaPAD can efficiently discover high-quality typed textual patterns (meta-patterns) from massive corpora of different kinds, which helps information extraction. The MetaPAD framework has three parts. The first develops a context-aware segmentation method that determines pattern boundaries and generates meta-patterns that are frequent, complete and informative. The second groups synonymous meta-patterns. The third adjusts the entity type level of each synonymous group to an appropriate granularity so that the meta-patterns are precise.
In this embodiment, the MetaPAD algorithm is used for attribute extraction.
For example, for the following text:
A credit inquiry on enterprise A in the post-loan inspection system found that enterprise A borrowed 2 million yuan, 800,000 yuan, 400,000 yuan and 800,000 yuan from bank S at the end of December 2018 (20 days after this branch issued its loan), in January 2019, in September 2019 and in October 2019 respectively, with the collateral being the auctioned property that was about to be substituted as the mortgage to this branch.
The attribute information about loan behavior that can be extracted includes:
Loan time: end of December 2018, January 2019, September 2019, October 2019.
Loan amount: 2 million yuan, 800,000 yuan, 400,000 yuan, 800,000 yuan.
Loan collateral: auctioned property.
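Chinese financial text commonly writes amounts in units of 万元 (ten thousand yuan), so an extracted loan amount usually has to be normalized before it is stored as an attribute. The sketch below uses a plain regular expression for this normalization; it is not MetaPAD, and the sample string and the function name `parse_wan_yuan` are hypothetical.

```python
import re

WAN = 10_000  # one "wan" is ten thousand


def parse_wan_yuan(text):
    """Return every 'N万元' amount found in the text as an integer number of yuan."""
    return [round(float(m) * WAN) for m in re.findall(r"(\d+(?:\.\d+)?)\s*万元", text)]


# Hypothetical fragment in the style of the loan example
sample = "分别贷款200万元、80万元、40万元、80万元"
loan_times = ["2018-12", "2019-01", "2019-09", "2019-10"]
loans = list(zip(loan_times, parse_wan_yuan(sample)))
print(loans)
```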
Entity alignment is used to infer whether different entities from different data sets map to the same object in the physical world. Since data in a transaction scenario is multi-source heterogeneous, the same object is expressed differently in different scenarios and sources.
For example, "Xiamen You Hui Information Technology Co., Ltd." in the enterprise information base is often abbreviated to "You Hui" in data sources such as news and public opinion. The two names refer to the same object, so the two entities need to be merged.
Because the number of entities in the transaction knowledge graph is small and the entity names from different sources match closely, a relatively simple entity alignment method can be adopted. Dedupe is a Python library that uses machine learning to quickly perform fuzzy matching, deduplication and entity alignment on structured data.
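The idea of a simple alignment rule for closely matching names can be illustrated with the standard library alone. The sketch below does not use Dedupe; it substitutes a containment check plus `difflib` string similarity, and the suffix list, the threshold and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing company names
LEGAL_SUFFIXES = ("co., ltd.", "limited company", "ltd.")


def normalize(name):
    """Lower-case the name, drop legal suffixes and collapse whitespace."""
    n = name.lower()
    for suffix in LEGAL_SUFFIXES:
        n = n.replace(suffix, " ")
    return " ".join(n.split())


def same_entity(a, b, threshold=0.85):
    """Treat two names as one entity if one contains the other or they are very similar."""
    na, nb = normalize(a), normalize(b)
    short, long_ = sorted((na, nb), key=len)
    if len(short) >= 3 and short in long_:
        return True
    return SequenceMatcher(None, na, nb).ratio() >= threshold


print(same_entity("Xiamen You Hui Information Technology Co., Ltd.", "You Hui"))
print(same_entity("You Hui", "Enterprise B"))
```

A containment rule like this is only reasonable when the entity set is small, as the text states; on a large corpus it would merge unrelated names that happen to share a short substring.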
The transaction knowledge representation unit 23 formally represents the transaction knowledge extraction result to generate a transaction knowledge graph corresponding to the transaction data.
In this embodiment, the formal representation of transaction knowledge takes two forms: entity-relationship-attribute and entity-relationship-entity.
Transaction knowledge representation uses a markup language to describe semantic web resources and the relationships between them. RDF, RDFS and OWL are all knowledge representation frameworks based on the semantic web; among them, OWL offers flexible and fast graph modeling and can distinguish data properties from object properties.
OWL can be regarded as an extension of RDFS. It defines higher-level concepts on top of RDF for describing the interrelationships of resources on a network, has a more comprehensive vocabulary, and gives users powerful tools for constructing ontologies and annotating data, such as symmetry, reflexivity, property chains and self-restrictions.
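The two triple forms and the effect of declaring a property symmetric can be sketched in plain Python. This is an illustration of the concepts only, not an OWL serialization; the class `Triple`, the property names and the set `SYMMETRIC` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    literal: bool = False  # True: entity-relationship-attribute (data property)


# Hypothetical symmetric object property, in the spirit of owl:SymmetricProperty
SYMMETRIC = {"sameNetworkAs"}


def with_symmetric_closure(triples):
    """Add the reversed triple for every symmetric object property."""
    closed = set(triples)
    for t in triples:
        if t.predicate in SYMMETRIC and not t.literal:
            closed.add(Triple(t.obj, t.predicate, t.subject))
    return closed


graph = {
    Triple("XueMou", "legalRepresentativeOf", "EnterpriseB"),          # entity-relationship-entity
    Triple("EnterpriseA", "loanAmountYuan", "2000000", literal=True),  # entity-relationship-attribute
    Triple("Device1", "sameNetworkAs", "Device2"),
}
closed = with_symmetric_closure(graph)
print(len(closed))
```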
In this embodiment, the partial OWL of the transaction knowledge graph is expressed as follows:
[Figures BDA0003363907550000151, BDA0003363907550000161 and BDA0003363907550000171: OWL listing of the transaction knowledge graph, provided as images in the original publication]
The fraud partner identification module 3 is a fraud partner identification model. The model takes the transaction knowledge graph as input, calculates feature vectors of the transaction knowledge graph, identifies the corresponding fraud patterns based on the feature vectors, and generates fraud risk data.
The fraud risk data includes transaction object information, transaction subject information, a transaction association information diagram and a fraud partner diagram. The transaction association information diagram and the fraud partner diagram each comprise a transaction parameter diagram, an association knowledge graph and associated data objects.
Fraud partner scenarios share the following characteristics: (1) the subjects and relationships involved in transactions are heterogeneous, for example devices, accounts and fund transfers, so the model must be able to handle heterogeneous data; (2) subjects carry many attribute features that effectively reflect risk factors, so the model must process node attribute information while exploiting the graph network structure; (3) a fraud partner is a community with certain risk characteristics, but not every community is a fraud partner: family members and friends also form communities, and these are normal communities, so the model must be able to distinguish fraudulent communities from normal ones; (4) fraud patterns change constantly, so the model must adapt to fraud partners of different forms.
FIG. 3 is a schematic diagram of a fraudulent party identification process in an embodiment of the present invention.
In this embodiment, based on the above characteristics of fraud partner scenarios, and as shown in FIG. 3, fraud identification with the fraud partner identification model comprises four stages: fraud partner embedding learning, fraud pattern acquisition, risk node selection and partner expansion. Specifically:
Fraud partner embedding learning is used to obtain an embedded representation of fraud partners. Specifically:
The fraud partner identification model takes a batch of communities labeled as fraud partners as input and first performs representation learning on them: a node representation algorithm generates a feature vector for each node in the graph, and the mean of the feature vectors of all nodes in a community is then taken as the feature vector of that community.
In this embodiment, the node representation algorithm is the unsupervised HetGNN algorithm. HetGNN is a heterogeneous graph representation learning method that considers both the structural heterogeneity of the graph and the heterogeneity of node content.
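Aggregating node vectors into a community vector can be sketched as below, assuming the aggregation is a plain element-wise mean. The node vectors are hand-written stand-ins for what a node representation algorithm such as HetGNN would produce, and the function name `community_vector` is hypothetical.

```python
def community_vector(node_vectors, community):
    """Element-wise mean of the feature vectors of all nodes in the community."""
    vectors = [node_vectors[n] for n in community]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Stand-in embeddings; a real system would obtain these from the node representation algorithm
node_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(community_vector(node_vectors, ["a", "b"]))
```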
Fraud pattern acquisition: after the representations of the fraud partners have been obtained through fraud partner embedding learning, the feature vectors of this batch of partners are clustered with the K-means clustering algorithm, and each cluster center serves as one fraud pattern.
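Deriving fraud patterns as cluster centers can be sketched with a small K-means loop. This is a minimal pure-Python version for illustration; the deterministic initialization from the first `k` points and the toy community vectors are assumptions, and a production system would more likely use an established library implementation.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns the k cluster centers."""
    centers = [list(p) for p in points[:k]]  # assumption: initialize from the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        new_centers = [
            [sum(column) / len(cluster) for column in zip(*cluster)] if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


# Toy community feature vectors for four labeled fraud partners
community_vectors = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
fraud_patterns = kmeans(community_vectors, k=2)
print(fraud_patterns)
```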
Risk node selection: for each node in the input transaction knowledge graph, its first-order neighbors are selected and form an initial community together with the node. The similarity between this initial community and each fraud pattern is calculated, and the highest of these similarities is taken as the final similarity of the node.
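Scoring a node by the best match between its first-order community and the fraud patterns can be sketched as follows. The text does not name the similarity measure, so cosine similarity is an assumption here, as is mean pooling of the node vectors; all names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def node_similarity(node, adj, node_vectors, fraud_patterns):
    """Highest similarity between the node's initial community and any fraud pattern."""
    community = {node} | set(adj[node])  # the node plus its first-order neighbors
    vectors = [node_vectors[n] for n in community]
    pooled = [sum(column) / len(vectors) for column in zip(*vectors)]
    return max(cosine(pooled, pattern) for pattern in fraud_patterns)


adj = {"a": {"b"}, "b": {"a"}, "c": set()}
node_vectors = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
fraud_patterns = [[1.0, 0.0]]
riskiest = max(adj, key=lambda n: node_similarity(n, adj, node_vectors, fraud_patterns))
print(riskiest)
```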
Partner expansion: the node with the highest similarity is selected, and starting from the community formed by this node and its first-order neighbors, the community is expanded with an improved dynamic Beam Search method to obtain the final fraud partner.
Beam Search reduces the space and time consumed by searching when the solution space of the graph is large: at each step of depth expansion, lower-quality nodes are pruned and higher-quality nodes are retained, which lowers space consumption and improves time efficiency.
In this embodiment, the Beam Search method is applied to the partner expansion problem and adapted to the characteristics of expansion. During community expansion, node degrees differ, so the number of candidates differs at each step. Therefore, with the amount of computation kept essentially unchanged, the beam width is adjusted dynamically according to the degree of the nodes selected for the next step, so that the number of candidates at each step stays consistent and as many high-quality nodes as possible are retained.
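One possible reading of the dynamic beam width is sketched below: instead of a fixed number of beams, each step expands the best-scoring communities only until a fixed candidate budget is used up, so high-degree frontiers get a narrower beam and low-degree frontiers a wider one. This interpretation, the `budget` parameter, the stopping rule and the scoring function are all assumptions; the text does not specify them.

```python
def expand_community(adj, seed, score, budget=6, max_steps=5):
    """Grow a community from seed, keeping the per-step candidate count near budget."""
    best = frozenset(seed)
    beams = [best]  # kept sorted best-first
    for _ in range(max_steps):
        candidates = set()
        used = 0
        for community in beams:
            frontier = set().union(*(adj[v] for v in community)) - community
            # Dynamic width: stop taking beams once the candidate budget is spent
            if used > 0 and used + len(frontier) > budget:
                break
            used += len(frontier)
            candidates.update(community | {n} for n in frontier)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: (score(c), sorted(c)), reverse=True)
        if score(beams[0]) <= score(best):
            break  # no candidate improves on the current best community
        best = beams[0]
    return best


adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
target = {"a", "b", "c"}  # stand-in for "similar to a fraud pattern"
score = lambda community: len(community & target) - len(community - target)
result = expand_community(adj, {"a", "b"}, score)
print(sorted(result))
```

In a full system the `score` callable would be the similarity between the community's feature vector and the matched fraud pattern rather than this toy overlap count.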
The fraud risk display module 4 has a picture storage unit and an input display unit.
The picture storage unit stores a fraud risk data picture, a transaction search request picture, and a search result display picture.
The input display unit is used for displaying a fraud risk data picture to enable a user to view fraud risk data, and displaying a search result display picture to enable the user to view a search result of the transaction search request, wherein the search result comprises transaction data, a transaction knowledge graph and fraud risk data.
The input display unit is also used for displaying a transaction retrieval request picture to enable a user to input transaction object information to conduct a transaction retrieval request.
The transmission control module 5 is used for transmitting and sending data information between the modules of the fraud partner identification system 100 based on the transaction knowledge graph of the present embodiment.
The storage module 6 is used for storing the transaction data, the transaction knowledge graph and the fraud risk data transmitted by the transmission control module 5, each in a corresponding storage mode.
The transaction data are stored according to their data properties. Specifically:
When the data are transaction or subject data, the storage methods include database storage and blockchain storage; when the data are news and public opinion data, the storage methods include database storage, file storage and blockchain certification.
Since transaction knowledge has a graph structure, it is stored in a graph database. Storing the information in a graph database preserves the structure of the knowledge, and knowledge retrieval and inference are likewise carried out according to graph characteristics. In this embodiment, the Neo4j database is used for storage.
At the same time, the transaction field contains a large amount of structured data, and for this part of the data a relational database offers better efficiency and performance for complex knowledge retrieval and reasoning. Therefore, in this embodiment, the transaction knowledge graph is stored using a hybrid of a graph database and a relational database.
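The division of labor in hybrid storage can be illustrated with the standard library. In this sketch SQLite stands in for the relational side and an in-memory adjacency map stands in for the graph side (it is not Neo4j); the table name, columns and account identifiers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfer (src TEXT, dst TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfer VALUES (?, ?, ?)",
    [("acct1", "acct2", 100.0), ("acct1", "acct3", 200.0), ("acct2", "acct3", 50.0)],
)

# Relational side: aggregate queries over structured records
totals = dict(conn.execute("SELECT src, SUM(amount) FROM transfer GROUP BY src"))

# Graph side: relationships kept as adjacency for neighborhood traversal
graph = {}
for src, dst in conn.execute("SELECT DISTINCT src, dst FROM transfer"):
    graph.setdefault(src, set()).add(dst)

print(totals["acct1"], sorted(graph["acct1"]))
```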
The retrieving module 7 is configured to retrieve the corresponding transaction data, transaction knowledge graph and fraud risk data from the storage module 6 according to the transaction object information in the received transaction retrieval request.
FIG. 4 is a workflow diagram of a transaction knowledge-graph-based fraud partner identification system in an embodiment of the invention.
As shown in FIG. 4, the process of identifying fraud partners based on the transaction knowledge graph using the fraud partner identification system 100 of the present embodiment includes the following steps:
Step S1, the data acquisition unit of the data acquisition module 1 acquires transaction object and subject data from a predetermined data source, and the preprocessing unit preprocesses the transaction object and subject data to generate transaction data;
Step S2, the transmission control module 5 transmits the transaction data to the storage module 6 for storage and also transmits the transaction data to the transaction knowledge graph construction module 2;
Step S3, the transaction knowledge graph construction module 2 performs knowledge extraction and formal representation on the transaction data based on the transaction knowledge recognition model to generate a transaction knowledge graph corresponding to the transaction data;
Step S4, the transmission control module 5 transmits the transaction knowledge graph to the storage module 6, where it is stored in correspondence with the transaction data, and also transmits it to the fraud partner identification module 3;
Step S5, the fraud partner identification module 3, based on the fraud partner identification model, takes the transaction knowledge graph as input, calculates feature vectors of the transaction knowledge graph, identifies the corresponding fraud patterns based on the feature vectors, and generates fraud risk data;
Step S6, the transmission control module 5 transmits the fraud risk data to the storage module 6, where it is stored in correspondence with the transaction knowledge graph, and also transmits it to the fraud risk display module 4;
Step S7, the input display unit of the fraud risk display module 4 displays the fraud risk data screen and displays the fraud risk data in that screen.
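The data flow of steps S1 to S7 can be outlined as a single function in which each module is passed in as a callable. This is only a structural sketch: the function name, the dictionary used as storage and the stub callables in the usage example are hypothetical and do not reflect how the modules are actually implemented.

```python
def run_identification(raw_records, preprocess, build_graph, identify, storage, display):
    """Run the S1-S7 flow, storing each intermediate result along the way."""
    transaction_data = preprocess(raw_records)      # S1: acquisition and preprocessing
    storage["transaction_data"] = transaction_data  # S2: store transaction data
    graph = build_graph(transaction_data)           # S3: knowledge graph construction
    storage["knowledge_graph"] = graph              # S4: store the graph
    risk = identify(graph)                          # S5: fraud partner identification
    storage["fraud_risk"] = risk                    # S6: store fraud risk data
    display(risk)                                   # S7: show fraud risk data
    return risk


shown = []
storage = {}
risk = run_identification(
    raw_records=[" t1 ", "t1", "t2"],
    preprocess=lambda rows: sorted({r.strip() for r in rows}),  # toy deduplication
    build_graph=lambda data: {"nodes": data},
    identify=lambda graph: {"suspicious": graph["nodes"][:1]},
    storage=storage,
    display=shown.append,
)
print(risk)
```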
FIG. 5 is a flow chart of a fraud partner identification system retrieving fraud risk data based on a transaction knowledge-graph in an embodiment of the invention.
As shown in fig. 5, the process of fraud risk data retrieval using the fraud partner identification system 100 based on the transaction knowledge map of the present embodiment includes the steps of:
Step A1, the input display unit of the fraud risk display module 4 displays the transaction retrieval request screen so that a user can input transaction object information to make a transaction retrieval request;
Step A2, after the user confirms the input content, the transmission control module 5 transmits the input transaction object information to the retrieval module 7;
Step A3, the retrieval module 7 searches the storage module 6 according to the transaction object information in the received transaction retrieval request to obtain the corresponding transaction data, transaction knowledge graph and fraud risk data;
Step A4, the transmission control module 5 transmits the retrieved transaction data, transaction knowledge graph and fraud risk data to the fraud risk display module 4;
Step A5, the search result display screen displays the transaction data, transaction knowledge graph and fraud risk data corresponding to the transaction object information for the user to view.
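The retrieval flow depends on the three kinds of data being stored in correspondence with one another, so that one lookup by transaction object returns all of them. A minimal sketch of that keyed storage follows; the class `RiskStore`, its methods and the identifier format are hypothetical.

```python
class RiskStore:
    """Keeps transaction data, knowledge graph and fraud risk data under one key."""

    def __init__(self):
        self._records = {}

    def save(self, object_id, transaction_data, knowledge_graph, fraud_risk):
        self._records[object_id] = {
            "transaction_data": transaction_data,
            "knowledge_graph": knowledge_graph,
            "fraud_risk": fraud_risk,
        }

    def retrieve(self, object_id):
        """Return the stored bundle, or None when the transaction object is unknown."""
        return self._records.get(object_id)


store = RiskStore()
store.save("order-001", {"amount": 100}, {"nodes": ["A", "B"]}, {"pattern": 0})
result = store.retrieve("order-001")
print(result["fraud_risk"])
```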
In this embodiment, each module of the fraud group identification system 100 based on the transaction knowledge graph may be implemented by a program, where the functions and the program of each module are relatively complete and independent, so as to ensure that each module may be implemented and upgraded independently.
Operation and effects of the embodiment
In the fraud partner identification system based on a transaction knowledge graph according to this embodiment, the data acquisition module acquires and processes transaction data; the transaction knowledge graph construction module performs knowledge extraction and knowledge representation on the transaction data to construct the corresponding transaction knowledge graph; the fraud partner identification module performs fraud identification on the transaction knowledge graph to obtain fraud risk data containing information such as fraud patterns; and the fraud risk display module provides a visual risk display of the fraud risk data. The system thus provides a complete process for constructing a transaction knowledge graph from heterogeneous transaction data and can effectively identify fraud partners based on that graph. It compensates for the weakness of fraud identification models in handling heterogeneous information, enriches the associations among transactions, personnel, enterprises and related data, provides a visual risk display, and improves the accuracy metrics.
In the embodiment, the data acquisition module adopts the corresponding acquisition method and preprocessing method for data objects and acquisition contents of different properties and stores the associated data, so that data from different sources can be acquired and processed.
In the embodiment, the transaction knowledge graph construction module consists of transaction knowledge modeling, transaction knowledge extraction, transaction knowledge representation and transaction knowledge storage, forming a complete construction flow for the transaction knowledge graph. Rich node, relationship and attribute information is retained, the related information of the transaction field is used effectively, and subsequent applications benefit significantly.
In the embodiment, the fraud partner identification module compensates for the weakness of fraud partner identification models in association analysis over heterogeneous information by using the nodes, relationships, attributes and other information of the transaction knowledge graph. Because fraud partner identification uses a semi-supervised community detection method, the fraud partner identification model of the embodiment can summarize fraud patterns and effectively detect fraud partners from the whole graph while reducing both the false alarm rate and the missed detection rate. In addition, the identification algorithm uses node-based embedding learning, so it can learn the characteristics of a partner of any structure and adapts to fraud partners of different forms. By adopting the Beam Search method, the accuracy of fraud partner identification is improved while the amount of computation is kept consistent.
In the embodiment, the fraud risk display module provides data integration, visual display and interactive search of fraud risk through the display interface and human-computer interaction (that is, the retrieval request screen). It supports retrieving transaction subject information and the associated data objects starting from a transaction object, which meets the needs of transaction risk control personnel, improves the user experience, enhances interpretability, and assists the subsequent decisions of risk control personnel.
The above examples are only for illustrating the specific embodiments of the present invention, and the present invention is not limited to the description scope of the above examples.

Claims (7)

1. A fraud partner identification system based on a transaction knowledge graph, comprising:
the system comprises a data acquisition module, a transaction knowledge graph construction module, a fraud group identification module, a fraud risk display module, a transmission control module and a storage module;
wherein the data acquisition module is provided with a data acquisition unit and a preprocessing unit,
the data acquisition unit acquires transaction object and subject data based on a predetermined data source,
the preprocessing unit is used for preprocessing the transaction object and the main body data to generate transaction data,
the transmission control module transmits the transaction data to the storage module for storage and also transmits the transaction data to the transaction knowledge graph construction module,
the transaction knowledge graph construction module is provided with a transaction knowledge modeling unit, a transaction knowledge extraction unit and a transaction knowledge representation unit,
the transaction knowledge modeling unit is used for modeling based on the transaction domain ontology, the transaction domain relationship and the transaction domain attribute to generate a transaction knowledge identification model,
the transaction knowledge extraction unit performs knowledge extraction on the received transaction data based on the transaction knowledge recognition model to generate a transaction knowledge extraction result,
the transaction knowledge representation unit formally represents the transaction knowledge extraction result to generate a transaction knowledge graph corresponding to the transaction data,
the transmission control module transmits the transaction knowledge graph to the storage module, stores the transaction knowledge graph corresponding to the transaction data and also transmits the transaction knowledge graph to the fraud partner identification module,
the fraudulent party identification module is a fraudulent party identification model,
the fraud partner identification model takes the transaction knowledge graph as input, calculates feature vectors of the transaction knowledge graph, identifies corresponding fraud patterns based on the feature vectors, generates fraud risk data,
the transmission control module transmits the fraud risk data to the storage module, stores the fraud risk data corresponding to the transaction knowledge graph and also transmits the fraud risk data to the fraud risk display module,
the fraud risk display module has a picture storage unit and an input display unit,
the picture storage unit stores a fraud risk data picture,
the transmission control module controls the input display unit to display the fraud risk data picture and displays the fraud risk data in the fraud risk data picture.
2. A fraud partner identification system based on a transaction knowledge graph as claimed in claim 1, wherein:
wherein the data sources include manual inputs, associated information systems, third party interfaces, and networks,
the transaction object and subject data are classified into transaction, subject and news public opinion based on data properties,
the transaction object and the main body data are divided into structured data, semi-structured data and unstructured data based on a data structure,
the data acquisition unit adopts a data acquisition method according to the data source, the data property and the data structure, wherein the data acquisition method comprises manual input, calling of an associated information system, calling of a third party interface and network data crawling,
the preprocessing unit adopts a data preprocessing method according to the data source, the data property and the data structure, wherein the data preprocessing method comprises data parsing, data deduplication, data fusion and label processing.
3. A fraud partner identification system based on a transaction knowledge graph as claimed in claim 1, wherein:
wherein the transaction field ontology comprises transaction, enterprise, personnel, equipment, account number, region, industry and operation range,
the transaction field relationship comprises the same relationship between personnel and enterprises, the same relationship between personnel and personnel, the same relationship between personnel and places, the same relationship between enterprises and places, the same relationship between personnel and devices, the same network relationship between devices and devices, the same version relationship between devices and devices, the same relationship between personnel and accounts, the transfer relationship between accounts and accounts, the same relationship between enterprises and industries, the same relationship between enterprises and business ranges and the same relationship between enterprises and places,
the transaction field attributes comprise personnel basic information, personnel residence addresses, personnel telephones, enterprise names, industries, operating ranges, enterprise addresses, enterprise telephones, equipment network information, equipment versions, bank information where accounts are located and money.
4. A fraud partner identification system based on a transaction knowledge graph as claimed in claim 1, wherein:
wherein the knowledge extraction comprises structured data knowledge extraction, semi-structured data knowledge extraction and unstructured data knowledge extraction,
the structured data knowledge extraction is a conversion extraction,
the semi-structured data knowledge extraction and unstructured data knowledge extraction include entity extraction, relationship extraction, attribute extraction and entity alignment,
the conversion extraction converts the structured data into transaction knowledge based on a declarative language that describes the mapping between relational database rules and knowledge representation forms, thereby performing knowledge extraction,
the entity extraction is to determine the scope information of named entities in the text from the semi-structured data and the unstructured data,
the relationship extraction is to determine relationship information between entities from the entities,
the attribute extraction is to extract attribute information contained in the entity from the entity,
the entity alignment is used to infer whether different entities from different data sets map to the same object in the physical world.
5. A fraud partner identification system based on a transaction knowledge graph as claimed in claim 1, wherein:
wherein the formal representation includes the form of entity-relationship-attribute and the form of entity-relationship-entity.
6. A fraud partner identification system based on a transaction knowledge graph as claimed in claim 1, wherein:
wherein the fraud risk data includes transaction object information, transaction subject information, transaction-related information graphic representations, and fraud partner graphic representations,
the transaction association information diagram and the fraud group diagram each comprise a transaction parameter diagram, an association knowledge graph and an association data object.
7. A transaction knowledge-graph-based fraud partner identification system as claimed in claim 6, wherein:
wherein the system further comprises a retrieval module,
the picture storage unit also stores a transaction search request picture and a search result display picture,
the input display unit displays the transaction retrieval request screen to enable a user to input the transaction object information to perform transaction retrieval request,
upon confirmation of the input content by the user, the transmission control module transmits the input transaction object information to the retrieval module,
the retrieval module retrieves the storage module according to the transaction object information in the received transaction retrieval request to obtain the corresponding transaction data, the transaction knowledge graph and the fraud risk data,
the transmission control module transmits the retrieved transaction data, the transaction knowledge graph and the fraud risk data to the fraud risk display module,
and the search result display picture displays the transaction data, the transaction knowledge graph and the fraud risk data corresponding to the transaction object information for the user to check.
CN202111376113.8A 2021-11-19 2021-11-19 Fraudulent party identification system based on transaction knowledge graph Pending CN116151967A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111376113.8A CN116151967A (en) 2021-11-19 2021-11-19 Fraudulent party identification system based on transaction knowledge graph

Publications (1)

Publication Number Publication Date
CN116151967A true CN116151967A (en) 2023-05-23

Family

ID=86337604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111376113.8A Pending CN116151967A (en) 2021-11-19 2021-11-19 Fraudulent party identification system based on transaction knowledge graph

Country Status (1)

Country Link
CN (1) CN116151967A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786126A (en) * 2023-12-28 2024-03-29 永信至诚科技集团股份有限公司 Knowledge graph-based naked-touch clue analysis method and device
CN117874755A (en) * 2024-03-13 2024-04-12 中国电子科技集团公司第三十研究所 System and method for identifying hidden network threat users
CN117874755B (en) * 2024-03-13 2024-05-10 中国电子科技集团公司第三十研究所 System and method for identifying hidden network threat users
CN118193856A (en) * 2024-05-17 2024-06-14 成都无糖信息技术有限公司 Knowledge-graph-based network pollution partner tracing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination