CN114579692A

CN114579692A - Fraud data deep analysis method and system

Info

Publication number: CN114579692A
Application number: CN202011391778.1A
Authority: CN
Inventors: 俞龙�; 张云柯; 刘占峰
Original assignee: 360 Smart Technology Tianjin Co ltd
Current assignee: 360 Smart Technology Tianjin Co ltd
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2022-06-03

Abstract

The invention relates to the technical field of information processing and discloses a fraud data deep analysis method and system. The method comprises: performing data conversion on obtained fraud case source data to obtain structured data; constructing a fraud analysis knowledge graph according to the structured data, and embedding the fraud analysis knowledge graph into a corresponding fraud analysis model; and obtaining corresponding fraud feature data according to the fraud analysis knowledge graph, and inputting the fraud feature data into the fraud analysis model to obtain a fraud analysis result. Unlike the prior art, which provides only case retrieval and case export functions, or at most simple clue association analysis, the invention constructs the fraud analysis knowledge graph from fraud case source data and embeds it into the corresponding fraud analysis model, so that deep analysis of the fraud case source data is performed based on the fraud analysis model, improving case analysis efficiency and the visibility of case data.

Description

Fraud data deep analysis method and system

Technical Field

The invention relates to the technical field of information processing, in particular to a fraud data deep analysis method and system.

Background

A knowledge graph is a modern approach that combines the theories and methods of applied mathematics, graphics, information visualization, information science and other disciplines with bibliometric methods such as citation analysis and co-occurrence analysis, and uses visualized graphs to show the core structure, development history, frontier fields and overall knowledge framework of a discipline, with the aim of multi-disciplinary fusion.

However, in the prior art, most systems that store fraud case source data are case management systems that provide only case retrieval and case export functions. A small number of case management systems can perform clue association analysis based on a relational database, but they still suffer from poor visibility and low analysis efficiency. A knowledge graph, as a knowledge base built on a semantic network, can use visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the relationships among knowledge resources and their carriers. How to perform deep analysis of fraud case source data based on a knowledge graph, so as to improve case analysis efficiency and the visibility of case data, has therefore become a problem to be solved urgently.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The invention mainly aims to provide a fraud data deep analysis method and a fraud data deep analysis system, and aims to solve the technical problem of how to carry out deep analysis on fraud case source data based on a knowledge graph so as to improve case analysis efficiency and visibility of case data.

In order to achieve the above object, the present invention provides a fraud data deep analysis method, comprising:

carrying out data conversion on the obtained fraud case source data to obtain structured data;

building a fraud analysis knowledge graph according to the structured data, and embedding the fraud analysis knowledge graph into a corresponding fraud analysis model;

and acquiring corresponding fraud feature data according to the fraud analysis knowledge graph, and inputting the fraud feature data into the fraud analysis model to obtain a fraud analysis result.

Preferably, the step of performing data conversion on the obtained fraud case source data to obtain structured data specifically includes:

carrying out data cleaning on the obtained fraud case source data to obtain the cleaned fraud case source data;

performing word segmentation processing on the cleaned fraud case source data to obtain word segmentation data;

and carrying out knowledge extraction processing on the word segmentation data according to a preset structured rule to obtain structured data.

Preferably, the step of extracting knowledge from the segmented word data according to a preset structuring rule to obtain the structured data specifically includes:

performing entity extraction on the word segmentation data according to a preset structured rule to obtain entity data;

extracting the relation of the word segmentation data according to a preset structured rule to obtain relation data;

and generating the structured data based on the entity data and the relationship data.

Preferably, the steps of constructing a fraud analysis knowledge-graph according to the structured data, and embedding the fraud analysis knowledge-graph into a corresponding fraud analysis model specifically include:

performing data integration on the structured data to obtain a triple, and inputting the triple into a graph database;

generating a corresponding fraud analysis knowledge graph according to the triples and a preset knowledge graph frame in the graph database;

embedding the fraud analysis knowledge-graph into a corresponding fraud analysis model.

Preferably, the fraud analysis knowledge graph comprises a case knowledge graph, a fund flow knowledge graph and a call ticket knowledge graph;

correspondingly, the step of generating a corresponding fraud analysis knowledge graph according to the triples and a preset knowledge graph framework in the graph database specifically includes:

extracting corresponding case structure data from the triples according to a case knowledge graph framework in the graph database for data supplement, and generating a corresponding case knowledge graph;

extracting corresponding fund flow structure data from the triples according to a fund flow knowledge graph framework in the graph database for data supplement, and generating a corresponding fund flow knowledge graph;

extracting corresponding call ticket structure data from the triples according to a call ticket knowledge graph framework in the graph database for data supplement, and generating a corresponding call ticket knowledge graph;

accordingly, the step of embedding the fraud analysis knowledge-map into the corresponding fraud analysis model specifically comprises:

embedding the case knowledge graph into a corresponding fraud analysis model;

embedding the fund flow knowledge graph into a corresponding fraud analysis model;

and embedding the call ticket knowledge graph into a corresponding fraud analysis model.

Preferably, the step of obtaining corresponding fraud feature data according to the fraud analysis knowledge graph and inputting the fraud feature data into the fraud analysis model to obtain a fraud analysis result specifically comprises:

searching for strongly associated case data and/or weakly associated case data in the graph database according to the case knowledge graph;

inputting the found strongly associated case data and/or weakly associated case data into the fraud analysis model to obtain a fraud analysis result.

Preferably, before the step of searching for strongly associated case data and/or weakly associated case data in the graph database according to the case knowledge graph, the method further includes:

extracting a target question-answer mapping relation according to the case knowledge graph, and establishing a corresponding question-answer template based on the target question-answer mapping relation;

and generating a case handling plugin according to the question-answer template and the target question-answer mapping relation, and fusing the case handling plugin into the fraud analysis model.

Preferably, the step of extracting corresponding fraud feature data from the fraud analysis knowledge-graph and inputting the fraud feature data into the fraud analysis model to obtain fraud analysis results specifically comprises:

searching the fund flow knowledge graph for a target account whose historical fund flow is lower than a preset minimum fund flow;

and when it is detected that target funds larger than a preset transaction fund flow are credited to the target account, sending the transaction record corresponding to the target funds into the fraud analysis model to obtain a fraud analysis result.

acquiring the originating positions of the call ticket data from the call ticket knowledge graph, and counting the number of call tickets at each originating position;

and when a target originating position whose number of call tickets is larger than a preset number of call tickets is detected, inputting the target originating position into the fraud analysis model to obtain a fraud analysis result.
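The two threshold checks above (a low-activity account that suddenly receives a large credit, and an originating position with an unusually high number of call tickets) can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values, the record field names (`to`, `amount`, `origin`) and the function names are all assumptions, since the text only speaks of "preset" values.

```python
from collections import Counter

# Hypothetical thresholds; the source only calls them "preset" values.
MIN_HISTORICAL_FLOW = 1_000.0   # accounts below this are treated as low-activity
LARGE_CREDIT = 50_000.0         # a single credit above this is treated as large
MAX_TICKETS_PER_ORIGIN = 3      # origins above this count are flagged


def find_low_flow_accounts(historical_flow):
    """Return accounts whose historical fund flow is below the preset minimum."""
    return {acct for acct, total in historical_flow.items()
            if total < MIN_HISTORICAL_FLOW}


def flag_large_credits(transactions, target_accounts):
    """Return transaction records that credit a target account with a large amount."""
    return [t for t in transactions
            if t["to"] in target_accounts and t["amount"] > LARGE_CREDIT]


def flag_busy_origins(call_tickets):
    """Count call tickets per originating position and keep those over the limit."""
    counts = Counter(ticket["origin"] for ticket in call_tickets)
    return {origin: n for origin, n in counts.items()
            if n > MAX_TICKETS_PER_ORIGIN}
```

In this sketch the flagged transaction records and originating positions would be what is passed on to the fraud analysis model.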

Preferably, the step of embedding the fraud analysis knowledge-map into the corresponding fraud analysis model specifically comprises:

establishing an embedded representation template according to the fraud analysis knowledge graph, and establishing a corresponding model search library based on the embedded representation template;

searching the model search library for an initial-order model corresponding to the fraud analysis knowledge graph;

and training the initial-order model according to the fraud analysis knowledge graph to obtain a corresponding fraud analysis model.

Preferably, the step of training the initial-order model according to the fraud analysis knowledge-graph to obtain a corresponding fraud analysis model specifically includes:

extracting each fraud index item from the fraud analysis knowledge-graph, and calculating the grading result of each fraud index item according to a preset grading rule;

taking the fraud feature data as the input of the initial-order model, and taking the scoring result as the output of the initial-order model;

and training the initial-order model according to the input and the output to obtain a corresponding fraud analysis model.

obtaining corresponding fraud feature data according to the fraud analysis knowledge graph, and inputting the fraud feature data into the fraud analysis model to obtain a scoring result corresponding to the fraud feature data;

sorting the scoring results from high to low to obtain a scoring ranking result;

generating a fraud analysis result based on the top-ranked scoring result.

In addition, to achieve the above object, the present invention further provides a fraud data deep analysis system, which includes:

the data structuring module is used for performing data conversion on the obtained fraud case source data to obtain structured data;

the model embedding module is used for constructing a fraud analysis knowledge graph according to the structured data and embedding the fraud analysis knowledge graph into a corresponding fraud analysis model;

a fraud analysis module for obtaining corresponding fraud feature data according to the fraud analysis knowledge-map, and inputting the fraud feature data into the fraud analysis model to obtain fraud analysis results.

The data structuring module is further used for carrying out data cleaning on the obtained fraud case source data to obtain the cleaned fraud case source data;

the data structuring module is further configured to perform word segmentation processing on the cleaned fraud case source data to obtain word segmentation data;

and the data structuring module is also used for extracting knowledge from the word segmentation data according to a preset structuring rule to obtain structured data.

Preferably, the model embedding module is further configured to perform data integration on the structured data to obtain a triple, and input the triple into the graph database;

the model embedding module is further configured to generate a corresponding fraud analysis knowledge graph according to the triples and a preset knowledge graph frame in the graph database.

Preferably, the model embedding module is further configured to extract corresponding case structure data from the triplet according to a case knowledge graph framework in the graph database for data supplementation, and generate a corresponding case knowledge graph;

the model embedding module is further configured to extract corresponding fund flow structure data from the triples according to a fund flow knowledge graph framework in the graph database for data supplement, and generate a corresponding fund flow knowledge graph;

and the model embedding module is further configured to extract corresponding call ticket structure data from the triples according to a call ticket knowledge graph framework in the graph database for data supplement, and generate a corresponding call ticket knowledge graph.

Preferably, the fraud analysis module is further configured to search for strongly associated case data and/or weakly associated case data in the graph database according to the case knowledge graph;

the fraud analysis module is further configured to input the found strong association case data and/or the found weak association case data into the fraud analysis model to obtain a fraud analysis result.

Preferably, the fraud analysis module is further configured to extract a target question-answer mapping relationship according to the case knowledge graph, and establish a corresponding question-answer template based on the target question-answer mapping relationship;

the fraud analysis module is further configured to generate a case handling plugin according to the question-answer template and the target question-answer mapping relationship, and fuse the case handling plugin into the fraud analysis model.

Preferably, the fraud analysis module is further configured to search the fund flow knowledge-graph for a target account with a historical fund flow lower than a preset minimum fund flow;

the fraud analysis module is further configured to, when it is detected that target funds larger than a preset transaction fund flow are credited to the target account, send the transaction record corresponding to the target funds to the fraud analysis model to obtain a fraud analysis result.

Preferably, the fraud analysis module is further configured to obtain the originating positions of the call ticket data from the call ticket knowledge graph, and count the number of call tickets at each originating position;

the fraud analysis module is further configured to, when a target originating position whose number of call tickets is larger than a preset number of call tickets is detected, input the target originating position into the fraud analysis model to obtain a fraud analysis result.

In the invention, data conversion is performed on obtained fraud case source data to obtain structured data; a fraud analysis knowledge graph is constructed according to the structured data and embedded into a corresponding fraud analysis model; and corresponding fraud feature data is obtained according to the fraud analysis knowledge graph and input into the fraud analysis model to obtain a fraud analysis result. The prior art provides only case retrieval and case export functions, or at most simple clue association analysis, and still suffers from poor visibility and low analysis efficiency. By contrast, the invention constructs the fraud analysis knowledge graph from fraud case source data and embeds it into the corresponding fraud analysis model, so that deep analysis of the fraud case source data is performed based on the fraud analysis model, improving case analysis efficiency and the visibility of case data.

Drawings

FIG. 1 is a flowchart of a first embodiment of the fraud data deep analysis method of the present invention;

FIG. 2 is a flowchart of a second embodiment of the fraud data deep analysis method of the present invention;

FIG. 3 is a flowchart of a third embodiment of the fraud data deep analysis method of the present invention;

FIG. 4 is a flowchart of a fourth embodiment of the fraud data deep analysis method of the present invention;

FIG. 5 is a structural block diagram of a first embodiment of the fraud data deep analysis system of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

An embodiment of the present invention provides a fraud data deep analysis method. Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the fraud data deep analysis method of the present invention.

In this embodiment, the fraud data deep analysis method includes the following steps:

step S10: carrying out data conversion on the obtained fraud case source data to obtain structured data;

It should be noted that, in order to obtain structured fraud case source data from which a corresponding knowledge graph can be constructed, the fraud case source data is first obtained. The source data includes, but is not limited to, data stored in case databases, fund flow databases, call ticket databases, Excel tables of case-related data, case files, txt texts, case-related disks, memory devices and other storage media. Data cleaning is then performed on the fraud case source data to obtain cleaned fraud case source data. Data cleaning can be understood as the final procedure for finding and correcting recognizable errors in data files, including checking data consistency and handling invalid values and missing values.

Next, word segmentation processing is performed on the cleaned fraud case source data to obtain word segmentation data. Word segmentation can be understood as splitting a keyword string submitted by a user into individual words after the corresponding query processing has been performed on that string. Knowledge extraction processing is then performed on the word segmentation data according to preset structuring rules to obtain structured data, where the preset structuring rules include, but are not limited to, business principles, efficiency principles, analytics principles and redundancy principles.

The knowledge extraction processing can be understood as follows. Entity extraction, also referred to as named entity recognition, is performed on the word segmentation data according to the preset structuring rules to obtain entity data; an entity is the basic unit of a knowledge graph and an important language unit carrying information in the text, including concepts, persons, organizations, place names, times and the like. Relationship extraction, which can be understood as recognizing the semantic relationships between entities in order to obtain the key knowledge in a text, is performed on the word segmentation data according to the preset structuring rules to obtain relationship data. The structured data is then generated based on the entity data and the relationship data.

In addition, after the entity data and the relationship data are obtained, attribute extraction may be performed to obtain attribute data, and the entity data, the relationship data and the attribute data are together used as the structured data. Attribute extraction can be understood as extracting the attribute information of the entities, such as the weight of a relationship between entities; whereas a relationship reflects the external connections of an entity, an attribute reflects its internal characteristics.
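The cleaning, segmentation and rule-based extraction described for step S10 can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the regular-expression entity rules, the relation keywords, and whitespace tokenization standing in for a real word segmenter. The source only states that the structuring rules are "preset".

```python
import re

# Hypothetical rule set; the source only says the structuring rules are preset.
ENTITY_RULES = {
    "phone": re.compile(r"^\d{11}$"),
    "account": re.compile(r"^AC\d+$"),
}
RELATION_WORDS = {"called": "CALLED", "paid": "PAID"}


def clean(records):
    """Drop missing, empty and duplicate records and normalise whitespace."""
    seen, cleaned = set(), []
    for rec in records:
        text = " ".join((rec or "").split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def segment(text):
    """Whitespace tokenisation, used here as a stand-in for a word segmenter."""
    return text.split()


def extract(tokens):
    """Return (entities, relations) found in one token sequence."""
    found = []
    for i, tok in enumerate(tokens):
        for label, pattern in ENTITY_RULES.items():
            if pattern.match(tok):
                found.append((i, tok, label))
    relations = []
    for i, tok in enumerate(tokens):
        if tok in RELATION_WORDS:
            left = [e for e in found if e[0] < i]
            right = [e for e in found if e[0] > i]
            if left and right:
                # Link the nearest entity on each side of the relation word.
                relations.append((left[-1][1], RELATION_WORDS[tok], right[0][1]))
    entities = [(tok, label) for _, tok, label in found]
    return entities, relations
```

Calling `extract(segment(text))` on each cleaned record yields entity data and relationship data, which together play the role of the structured data in this sketch.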

Step S20: building a fraud analysis knowledge graph according to the structured data, and embedding the fraud analysis knowledge graph into a corresponding fraud analysis model;

In a specific implementation, in order to obtain a fraud analysis knowledge graph, so that deep analysis of fraud case source data can be performed based on the knowledge graph and case analysis efficiency and the visibility of case data can be improved, data integration may be performed on the structured data to obtain triples. The basic forms of a triple include, but are not limited to, entity 1-relationship-entity 2 and entity-attribute-value. The triples are then input into a graph database, which comprises nodes, edges and attributes. Nodes can represent entities, events and other objects, such as people, places and movies, and are analogous to records in a relational database. Edges can be understood as directed lines connecting nodes in the graph and can represent relationships between different nodes, such as spouses or co-workers. Attributes describe characteristics of a node or an edge, such as a name, or the start and end time of a marriage.

Then, a corresponding fraud analysis knowledge graph is generated according to the triples and a preset knowledge graph framework in the graph database. The preset knowledge graph framework can be understood as a framework in the schema layer of the knowledge graph that describes the relationships between concepts (such as between entities) and can be filled with data, for example a case knowledge graph framework, a fund flow knowledge graph framework or a call ticket knowledge graph framework. That is, the case structure data required by the case knowledge graph framework is extracted from the triples for data supplement, so as to generate the corresponding case knowledge graph; the fund flow structure data required by the fund flow knowledge graph framework is extracted from the triples for data supplement, so as to generate the corresponding fund flow knowledge graph; and the call ticket structure data required by the call ticket knowledge graph framework is extracted from the triples for data supplement, so as to generate the corresponding call ticket knowledge graph.
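The loading of triples into a graph store and the filling of the three frameworks can be sketched as below. The `Graph` class is a tiny in-memory stand-in for a graph database, and the relation names assigned to each framework, as well as the `"rel"`/`"attr"` tags used to tell the two triple forms apart, are assumptions for illustration only.

```python
# Hypothetical frameworks: each lists the relation types it accepts.
FRAMEWORKS = {
    "case": {"SUSPECT_OF", "VICTIM_OF"},
    "fund_flow": {"TRANSFERRED_TO"},
    "call_ticket": {"CALLED"},
}


class Graph:
    """A tiny in-memory stand-in for a graph database."""

    def __init__(self):
        self.nodes = {}   # node id -> attribute dict
        self.edges = []   # directed edges as (head, relation, tail)

    def add_relation(self, head, relation, tail):
        self.nodes.setdefault(head, {})
        self.nodes.setdefault(tail, {})
        self.edges.append((head, relation, tail))


def build_graphs(triples):
    """Fill one graph per framework from tagged triples.

    A triple is ("rel", entity1, relation, entity2)
    or ("attr", entity, attribute, value).
    """
    graphs = {name: Graph() for name in FRAMEWORKS}
    attributes = []
    for kind, a, b, c in triples:
        if kind == "rel":
            for name, relations in FRAMEWORKS.items():
                if b in relations:
                    graphs[name].add_relation(a, b, c)
        else:
            attributes.append((a, b, c))
    # Attach each attribute to every graph that already contains the entity.
    for entity, key, value in attributes:
        for graph in graphs.values():
            if entity in graph.nodes:
                graph.nodes[entity][key] = value
    return graphs
```

Routing triples by relation type is only one possible reading of "extracting the structure data required by the framework"; a production system would more likely express the frameworks as a schema inside the graph database itself.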

Further, in order to obtain a fraud analysis model, so that deep analysis of fraud case source data can be performed based on the fraud analysis model and case analysis efficiency can be further improved, an embedded representation template may be constructed according to the fraud analysis knowledge graph after the fraud analysis knowledge graph is obtained. The embedded representation template can be understood as a unified representation of a graph. A corresponding model search library is then established based on the embedded representation template. The model search library can be understood as a database storing a plurality of models whose degree of adaptation to the embedded representation template is greater than a preset adaptation degree; the preset adaptation degree can be set according to actual requirements, which is not limited in this embodiment. An initial-order model corresponding to the fraud analysis knowledge graph is then searched for in the model search library and trained according to the fraud analysis knowledge graph to obtain the corresponding fraud analysis model. During training, each fraud indicator item is extracted from the fraud analysis knowledge graph, and the scoring result of each fraud indicator item is calculated according to a preset scoring rule; the fraud feature data is then used as the input of the initial-order model and the scoring result as its output, and the initial-order model is trained on these inputs and outputs to obtain the corresponding fraud analysis model.
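The selection of an initial-order model from the model search library and its training on (fraud feature data, scoring result) pairs can be sketched as follows. The source does not specify the model family, the adaptation measure or the training procedure, so the library contents, the choice of the best-adapted candidate, and the linear scorer trained by per-sample gradient descent are all assumptions chosen for brevity.

```python
# Hypothetical model search library: model name -> adaptation degree.
MODEL_LIBRARY = {"linear_scorer": 0.9, "rule_table": 0.6, "constant": 0.2}
PRESET_ADAPTATION = 0.5   # hypothetical preset adaptation degree


def pick_initial_model(library, threshold):
    """Pick the best-adapted candidate above the preset adaptation degree."""
    eligible = {name: deg for name, deg in library.items() if deg > threshold}
    return max(eligible, key=eligible.get)


def train_linear_scorer(samples, epochs=500, lr=0.1):
    """Fit weights and a bias by per-sample gradient descent on squared error.

    Each sample is (feature_vector, target_score).
    """
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for features, target in samples:
            pred = bias + sum(w * x for w, x in zip(weights, features))
            err = pred - target
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias


def predict(model, features):
    """Score one feature vector with a trained (weights, bias) model."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))
```

Here the feature vectors stand for fraud feature data and the targets for scores produced by the preset scoring rule, so the trained model learns to reproduce that rule on new feature data.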

Step S30: and acquiring corresponding fraud feature data according to the fraud analysis knowledge-map, and inputting the fraud feature data into the fraud analysis model to obtain a fraud analysis result.

It is easy to understand that, in order to further improve the visibility of case data, the corresponding fraud feature data can be obtained according to the fraud analysis knowledge graph and input into the fraud analysis model to obtain the scoring results corresponding to the fraud feature data. The fraud feature data can be understood as the fraud indicators corresponding to each knowledge graph. The scoring results are then sorted from high to low to obtain a scoring ranking result, and a fraud analysis result is generated based on the top-ranked scoring result. For example, when locking onto a case suspect, the scores corresponding to the fraud indicators of each suspect can be added to obtain a suspicion index score for each suspect, and the suspicion index scores are ranked to obtain the scoring ranking result, so that the suspect ranked first in the scoring ranking result is the suspect with the highest degree of suspicion. The fraud analysis result is then displayed, for example by highlighting the top-ranked suspect and showing the relationships between the other suspects and that suspect. The specific display mode can be set according to actual requirements, such as an image-text mode or a voice prompt mode, which is not limited in this embodiment.
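The suspect-ranking example above amounts to summing each suspect's indicator scores and sorting the totals from high to low. A minimal sketch, in which the indicator names and the function name are hypothetical:

```python
def rank_suspects(indicator_scores):
    """Sum each suspect's fraud-indicator scores and rank from high to low.

    indicator_scores maps suspect -> {indicator name: score}.
    Returns a list of (suspect, total) pairs, highest total first.
    """
    totals = {suspect: sum(scores.values())
              for suspect, scores in indicator_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list corresponds to the suspect with the highest degree of suspicion, which is the one the display step would highlight.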

In this embodiment, data conversion is performed on the obtained fraud case source data to obtain structured data, a fraud analysis knowledge graph is constructed according to the structured data and embedded into a corresponding fraud analysis model, and corresponding fraud feature data is obtained according to the fraud analysis knowledge graph and input into the fraud analysis model to obtain a fraud analysis result. The prior art provides only case retrieval and case export functions, or at most simple clue association analysis, and still suffers from poor visibility and low analysis efficiency. By contrast, this embodiment constructs the fraud analysis knowledge graph from fraud case source data and embeds it into the corresponding fraud analysis model, so that deep analysis of the fraud case source data is performed based on the fraud analysis model, improving case analysis efficiency and the visibility of case data.

Referring to FIG. 2, FIG. 2 is a flowchart of a second embodiment of the fraud data deep analysis method of the present invention.

Based on the first embodiment described above, in the present embodiment, the step S20 includes:

step S201: performing data integration on the structured data to obtain a triple, and inputting the triple into a graph database;

step S202: extracting corresponding case structure data from the triples according to a case knowledge graph frame in the graph database to perform data supplement, and generating corresponding case knowledge graphs;

step S203: and embedding the case knowledge graph into a corresponding fraud analysis model.

It should be noted that after the structured data is obtained, data integration may be performed on it to obtain triples. The basic forms of a triple include, but are not limited to, entity 1-relationship-entity 2 and entity-attribute-value. The triples are then input into a graph database, which comprises nodes, edges and attributes. Nodes can represent entities, events and other objects, such as people, places and movies, and are analogous to records in a relational database. Edges can be understood as directed lines connecting nodes in the graph and can represent relationships between different nodes, such as spouses or co-workers. Attributes describe characteristics of a node or an edge, such as a name, or the start and end time of a marriage. A corresponding case knowledge graph is then generated according to the triples and the case knowledge graph framework in the graph database. The case knowledge graph framework can be understood as a framework in the schema layer of the knowledge graph that describes the relationships between concepts (such as between entities) and can be filled with data; that is, the case structure data required by the case knowledge graph framework is extracted from the triples for data supplement, so as to generate the corresponding case knowledge graph.

It is easy to understand that after the case knowledge graph is obtained, it can be embedded into the corresponding fraud analysis model, so that deep analysis of the fraud case source data is performed based on the fraud analysis model and case analysis efficiency is further improved. That is, an embedded representation template may be constructed according to the case knowledge graph; the embedded representation template can be understood as a unified representation of a graph. A corresponding model search library is then established based on the embedded representation template. The model search library can be understood as a database storing a plurality of models whose degree of adaptation to the embedded representation template is greater than a preset adaptation degree; the preset adaptation degree can be set according to actual requirements, which is not limited in this embodiment. An initial-order model corresponding to the case knowledge graph is then searched for in the model search library and trained according to the fraud analysis knowledge graph. During training, each fraud indicator item is extracted from the case knowledge graph, and the scoring result of each fraud indicator item is calculated according to a preset scoring rule; the fraud feature data is then used as the input of the initial-order model and the scoring result as its output, and the initial-order model is trained on these inputs and outputs to obtain the corresponding fraud analysis model.

Accordingly, the step S30 includes:

step S301: searching for strongly associated case data and/or weakly associated case data in the graph database according to the case knowledge graph;

step S302: inputting the found strong association case data and/or the weak association case data into the fraud analysis model to obtain fraud analysis results.

It should be noted that, in order to implement deep analysis of fraud case source data based on the case knowledge graph and improve case analysis efficiency and visibility of case data, after the case knowledge graph is obtained, strongly-associated case data and/or weakly-associated case data can be searched in the graph database according to the case knowledge graph, and the found strongly-associated case data and/or weakly-associated case data are input into the fraud analysis model to obtain a fraud analysis result. The strongly-associated case data can be understood as two or more pieces of case data, found in the graph database of the case knowledge graph, that have a strong association relationship, such as the same suspect, the same primary account, the same phone number, the same MAC address and the like. The weakly-associated case data can be understood as two or more pieces of case data, found in the graph database of the case knowledge graph, that have a weak association relationship, such as the same time period, the same place of occurrence, the same IP address, the same base station number, the same cell number, the same victim (name, certificate type, certificate number and the like), the same social account of the suspect, similar brief case descriptions, the same cellular number, the same account-opening institution (the unit, residence, mailbox and the like with which the suspect opened the account) and the like.
In the process of obtaining the fraud analysis result, the found strongly-associated case data and/or weakly-associated case data can be input into the fraud analysis model to obtain the scoring results corresponding to the fraud indicator items in that data. The scoring results are sorted from high to low to obtain a score ranking, the fraud analysis result is generated based on the top-ranked scoring result, and finally the fraud analysis result is displayed. The specific display mode can be set according to actual needs, such as an image-text mode, a voice reminding mode and the like, which is not limited in this embodiment.
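As an illustration of the association search and score ranking described above, the following minimal sketch compares cases pairwise on shared attribute values. The attribute keys, the split between `STRONG_KEYS` and `WEAK_KEYS`, and the in-memory dictionary standing in for the graph database are all assumptions made for the example; a real deployment would query the graph database instead.

```python
from itertools import combinations

# Hypothetical attribute names; the lists in the text are longer.
STRONG_KEYS = ("suspect", "account", "phone", "mac")
WEAK_KEYS = ("ip", "base_station", "place")


def shared_keys(a, b, keys):
    """Return the keys on which both cases carry the same non-empty value."""
    return [k for k in keys if a.get(k) is not None and a.get(k) == b.get(k)]


def associate_cases(cases):
    """Return (strong, weak) lists of (case_id_a, case_id_b, shared_keys)."""
    strong, weak = [], []
    ordered = sorted(cases.items(), key=lambda kv: kv[0])
    for (id_a, a), (id_b, b) in combinations(ordered, 2):
        strong_hit = shared_keys(a, b, STRONG_KEYS)
        weak_hit = shared_keys(a, b, WEAK_KEYS)
        if strong_hit:
            strong.append((id_a, id_b, strong_hit))
        elif weak_hit:
            weak.append((id_a, id_b, weak_hit))
    return strong, weak


def rank_scores(scores):
    """Sort indicator scores from high to low."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


cases = {
    "C1": {"suspect": "S-01", "phone": "555-0100", "ip": "203.0.113.7"},
    "C2": {"suspect": "S-01", "phone": "555-0199", "ip": "198.51.100.2"},
    "C3": {"suspect": "S-02", "phone": "555-0142", "ip": "203.0.113.7"},
}
strong, weak = associate_cases(cases)

# Made-up scores standing in for the model output.
ranking = rank_scores({"shared_suspect": 0.9, "shared_ip": 0.3, "shared_phone": 0.6})
top_indicator = ranking[0]
```

Here a pair that matches on any strong key is reported only as strongly associated; whether such a pair should also count as weakly associated is a design choice the text leaves open.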

In a specific implementation, in order to further improve visibility of case data, a target question-answer mapping relationship may be extracted according to the case knowledge graph, where the target question-answer mapping relationship may be understood as a mapping relationship between a question template and an answer. A corresponding question-answer template (such as a general question template and its answer template, or a comparison sentence template and its reply template) is established based on the target question-answer mapping relationship, a case handling plug-in is generated according to the question-answer template and the target question-answer mapping relationship, and the case handling plug-in is fused into the fraud analysis model, so as to implement diversified display of fraud analysis results.
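A question-answer template of this kind can be sketched as a pattern paired with a graph query and a reply template. Everything in the sketch is hypothetical: the `TRIPLES` data, the `has_suspect` relation name, the single English question pattern, and the functions `cases_by_suspect` and `answer` are illustrative only and are not taken from the source.

```python
import re

# Toy graph content as (head, relation, tail) triples; made-up data.
TRIPLES = [
    ("C1", "has_suspect", "S-01"),
    ("C2", "has_suspect", "S-01"),
    ("C3", "has_suspect", "S-02"),
]


def cases_by_suspect(suspect):
    """Query the toy graph for all cases linked to the given suspect."""
    return sorted(h for h, r, t in TRIPLES if r == "has_suspect" and t == suspect)


# Each entry maps a question template to a query and a reply template.
QA_TEMPLATES = [
    (
        re.compile(r"which cases involve suspect (?P<suspect>[\w-]+)\?", re.IGNORECASE),
        cases_by_suspect,
        "Suspect {suspect} appears in: {answer}.",
    ),
]


def answer(question):
    """Return a templated reply, or None when no question template matches."""
    for pattern, query, reply in QA_TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if match:
            slots = match.groupdict()
            result = query(**slots)
            return reply.format(answer=", ".join(result) or "no cases", **slots)
    return None


reply_text = answer("Which cases involve suspect S-01?")
```

Additional question types, such as comparison sentences, would be added as further entries in `QA_TEMPLATES`.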

In this embodiment, the structured data is subjected to data integration to obtain triples, the triples are input into a graph database, corresponding case structure data is extracted from the triples according to a case knowledge graph framework in the graph database for data supplement, a corresponding case knowledge graph is generated and embedded into the corresponding fraud analysis model, strongly-associated case data and/or weakly-associated case data are searched in the graph database according to the case knowledge graph, and the found strongly-associated case data and/or weakly-associated case data are input into the fraud analysis model to obtain a fraud analysis result. Because the association search and the subsequent model analysis are both driven by the generated case knowledge graph, deep analysis of fraud case source data is realized and case analysis efficiency and visibility of case data are further improved. In addition, a question-answer template is established, and the case handling plug-in built from the question-answer template is fused into the fraud analysis model to realize diversified display of the fraud analysis results.

Referring to fig. 3, fig. 3 is a flowchart illustrating a fraud data deep analysis method according to a third embodiment of the present invention.

step S211: performing data integration on the structured data to obtain a triple, and inputting the triple into a graph database;

step S212: extracting corresponding fund flow structure data from the triples according to a fund flow knowledge graph frame in the graph database for data supplement, and generating corresponding fund flow knowledge graphs;

step S213: and embedding the fund flow knowledge graph into a corresponding fraud analysis model.

It should be noted that, after the structured data is obtained, it may be subjected to data integration to obtain triples, whose basic forms include, but are not limited to, entity 1-relationship-entity 2 and entity-attribute-value. The triples are then input into a graph database, which comprises nodes, edges and attributes. Nodes can be used to represent entities, events and other objects, such as people, places, movies and the like, and are analogous to records in a relational database. Edges can be understood as the directed lines connecting nodes in the graph and can be used to represent the relationships between different nodes, such as spouse or co-worker relationships. Attributes can be used to describe characteristics of a node or an edge, such as a name, or the start and end time of a spouse relationship. Then, a corresponding fund flow knowledge graph is generated according to the triples and the fund flow knowledge graph framework in the graph database, where the fund flow knowledge graph framework can be understood as the framework in the schema layer of the knowledge graph that describes the relationships between concepts (such as between entities) and is to be filled with data. That is, the fund flow structure data required by the fund flow knowledge graph framework can be extracted from the triples for data supplement, so that the corresponding fund flow knowledge graph is generated.
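The two triple forms and the node, edge and attribute structure described above can be sketched with a small in-memory property graph. The class `TripleGraph`, its method names, and the example accounts and relation `transfers_to` are hypothetical stand-ins for whatever graph database is actually used.

```python
class TripleGraph:
    """A toy property graph: nodes with attributes, directed edges with attributes."""

    def __init__(self):
        self.node_attrs = {}   # node id -> {attribute name: value}
        self.edges = []        # (head, relation, tail, edge attributes)

    def _ensure_node(self, node):
        self.node_attrs.setdefault(node, {})

    def add_relation(self, head, relation, tail, **edge_attrs):
        """Store an 'entity 1 - relationship - entity 2' triple as a directed edge."""
        self._ensure_node(head)
        self._ensure_node(tail)
        self.edges.append((head, relation, tail, edge_attrs))

    def add_attribute(self, entity, name, value):
        """Store an 'entity - attribute - value' triple on the node."""
        self._ensure_node(entity)
        self.node_attrs[entity][name] = value

    def neighbors(self, node, relation=None):
        """Return the tails of edges leaving `node`, optionally filtered by relation."""
        return [t for h, r, t, _ in self.edges
                if h == node and (relation is None or r == relation)]


graph = TripleGraph()
graph.add_relation("account:A", "transfers_to", "account:B",
                   amount=5000, time="2020-11-30")
graph.add_attribute("account:A", "owner", "person:001")
```

Filling a knowledge graph framework then amounts to keeping only the triples whose entity and relation types appear in the framework, for example accounts and transfers for the fund flow knowledge graph.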

It is easy to understand that, after the fund flow knowledge graph is obtained, it can be embedded into the corresponding fraud analysis model so that the fraud case source data is deeply analyzed based on the fraud analysis model, further improving case analysis efficiency. Specifically, an embedded representation template, which can be understood as a uniform representation form of the graph, is constructed according to the fund flow knowledge graph. A corresponding model search library is then established based on the embedded representation template; the model search library can be understood as a database storing a plurality of models whose degree of adaptation to the embedded representation template is greater than a preset degree of adaptation. The preset degree of adaptation can be set according to actual requirements, which is not limited in this embodiment. Next, a primary model corresponding to the fund flow knowledge graph is searched for in the model search library and trained according to the fund flow knowledge graph. In the training process, each fraud indicator item is extracted from the fund flow knowledge graph and its scoring result is calculated according to a preset scoring rule; the fraud feature data is then used as the input of the primary model and the scoring results as its output, and the primary model is trained on this input and output to obtain the corresponding fraud analysis model.

Accordingly, the step S30 includes:

step S311: searching a target account with historical fund flow lower than a preset lowest fund flow from the fund flow knowledge graph;

step S312: when detecting that the target account is credited with target funds greater than a preset transaction fund flow, sending the transaction record corresponding to the target funds into the fraud analysis model to obtain a fraud analysis result.

It should be noted that, in order to implement deep analysis of fraud case source data based on the fund flow knowledge graph and improve case analysis efficiency and visibility of case data, after the fund flow knowledge graph is obtained, a target account whose historical fund flow is lower than a preset minimum fund flow may be searched from the fund flow knowledge graph. The preset minimum fund flow may be set according to actual needs, for example 1,000,000 yuan, which is not limited in this embodiment. When it is detected that the target account is credited with target funds greater than a preset transaction fund flow, the transaction record corresponding to the target funds is sent to the fraud analysis model to obtain a fraud analysis result. The preset transaction fund flow may likewise be set according to actual requirements, for example 1,000,000 yuan, which is not limited in this embodiment. In a specific implementation, in order to further improve retrieval accuracy, information such as time may be added as a retrieval index item, for example by detecting whether the target account is credited with target funds greater than the preset transaction fund flow within a preset time period. The retrieval index item may be set according to actual requirements, which is not limited in this embodiment.
In the process of obtaining the fraud analysis result, the obtained transaction records may be input into the fraud analysis model to obtain the scoring results of the fraud indicator items corresponding to those records. The scoring results are sorted from high to low to obtain a score ranking, the fraud analysis result is generated based on the top-ranked scoring result, and finally the fraud analysis result is displayed. The specific display mode may be set according to actual requirements, such as an image-text mode, a voice reminding mode and the like, which is not limited in this embodiment.
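Steps S311 and S312 can be illustrated with a minimal sketch over plain transaction records. The record fields `from`, `to` and `amount`, the function names, and the definition of historical fund flow as the sum of all incoming and outgoing amounts are assumptions; the source does not define how historical fund flow is measured.

```python
from collections import defaultdict


def find_target_accounts(history, preset_min_flow):
    """S311 (sketch): accounts whose historical fund flow is below the preset minimum.

    Assumption: historical flow is the sum of all amounts sent and received.
    Accounts that never appear in `history` are not returned by this sketch.
    """
    totals = defaultdict(float)
    for record in history:
        totals[record["from"]] += record["amount"]
        totals[record["to"]] += record["amount"]
    return {account for account, total in totals.items() if total < preset_min_flow}


def flag_large_credits(new_records, targets, preset_transaction_flow):
    """S312 (sketch): records crediting a target account above the preset amount."""
    return [r for r in new_records
            if r["to"] in targets and r["amount"] > preset_transaction_flow]


history = [
    {"from": "A", "to": "B", "amount": 200.0},
    {"from": "B", "to": "C", "amount": 2_000_000.0},
]
targets = find_target_accounts(history, preset_min_flow=1_000_000.0)

new_records = [
    {"from": "C", "to": "A", "amount": 1_500_000.0},
    {"from": "C", "to": "B", "amount": 1_500_000.0},
    {"from": "C", "to": "A", "amount": 50.0},
]
flagged = flag_large_credits(new_records, targets, preset_transaction_flow=1_000_000.0)
```

The records in `flagged` are the ones that would be passed to the fraud analysis model; a time window could be added as a further filter on `new_records`.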

In this embodiment, the structured data is subjected to data integration to obtain triples, the triples are input into a graph database, corresponding fund flow structure data is extracted from the triples according to a fund flow knowledge graph framework in the graph database for data supplement, a corresponding fund flow knowledge graph is generated and embedded into the corresponding fraud analysis model, a target account whose historical fund flow is lower than a preset minimum fund flow is searched from the fund flow knowledge graph, and when it is detected that the target account is credited with target funds greater than a preset transaction fund flow, the transaction record corresponding to the target funds is sent to the fraud analysis model to obtain a fraud analysis result. Abnormal fund flows are thus monitored in real time based on the generated fund flow knowledge graph, and a corresponding fraud analysis model is established according to the fund flow knowledge graph, so as to realize deep analysis of fraud case source data and further improve case analysis efficiency and visibility of case data.

Referring to fig. 4, fig. 4 is a flowchart illustrating a fraud data deep analysis method according to a fourth embodiment of the present invention.

step S221: performing data integration on the structured data to obtain a triple, and inputting the triple into a graph database;

step S222: extracting corresponding call ticket structure data from the triples according to a call ticket knowledge graph framework in the graph database for data supplement, and generating a corresponding call ticket knowledge graph;

step S223: embedding the call ticket knowledge graph into a corresponding fraud analysis model.

It should be noted that, after the structured data is obtained, it may be subjected to data integration to obtain triples, whose basic forms include, but are not limited to, entity 1-relationship-entity 2 and entity-attribute-value. The triples are then input into a graph database, which comprises nodes, edges and attributes. Nodes can be used to represent entities, events and other objects, such as people, places, movies and the like, and are analogous to records in a relational database. Edges can be understood as the directed lines connecting nodes in the graph and can be used to represent the relationships between different nodes, such as spouse or co-worker relationships. Attributes can be used to describe characteristics of a node or an edge, such as a name, or the start and end time of a spouse relationship. Then, a corresponding call ticket knowledge graph is generated according to the triples and the call ticket knowledge graph framework in the graph database, where the call ticket knowledge graph framework can be understood as the framework in the schema layer of the knowledge graph that describes the relationships between concepts (such as between entities) and is to be filled with data. That is, the call ticket structure data required by the call ticket knowledge graph framework can be extracted from the triples for data supplement, so that the corresponding call ticket knowledge graph is generated.

It is easy to understand that, after the call ticket knowledge graph is obtained, it can be embedded into the corresponding fraud analysis model so that the fraud case source data is deeply analyzed based on the fraud analysis model, further improving case analysis efficiency. Specifically, an embedded representation template, which can be understood as a uniform representation form of the graph, is constructed according to the call ticket knowledge graph. A corresponding model search library is then established based on the embedded representation template; the model search library can be understood as a database storing a plurality of models whose degree of adaptation to the embedded representation template is greater than a preset degree of adaptation. The preset degree of adaptation can be set according to actual requirements, which is not limited in this embodiment. Next, a primary model corresponding to the call ticket knowledge graph is searched for in the model search library and trained according to the call ticket knowledge graph. In the training process, each fraud indicator item is extracted from the call ticket knowledge graph and its scoring result is calculated according to a preset scoring rule; the fraud feature data is then used as the input of the primary model and the scoring results as its output, and the primary model is trained on this input and output to obtain the corresponding fraud analysis model.

Accordingly, the step S30 includes:

step S321: acquiring the originating positions of the call ticket data from the call ticket knowledge graph, and counting the number of call tickets at each originating position;

step S322: when it is detected that there is a target originating position whose number of call tickets is greater than a preset number of call tickets, inputting the target originating position into the fraud analysis model to obtain a fraud analysis result.

It should be noted that, in order to implement deep analysis of fraud case source data based on the call ticket knowledge graph and improve case analysis efficiency and visibility of case data, after the call ticket knowledge graph is obtained, the originating positions of the call ticket data can be obtained from the call ticket knowledge graph and the number of call tickets at each originating position counted. When a target originating position whose number of call tickets is greater than a preset number of call tickets is detected, the target originating position is input into the fraud analysis model to obtain a fraud analysis result. The preset number of call tickets may be set according to actual requirements, for example 100, which is not limited in this embodiment. In a specific implementation, in order to further improve retrieval accuracy, information such as time may be added as a retrieval index item, for example by detecting whether more than the preset number of calls are dialed out from a certain position within a preset time period. With a preset number of 100, a position from which more than 100 calls are dialed out within one week may be regarded as a suspicious position. The retrieval index item may be set according to actual requirements, which is not limited in this embodiment.
In the process of obtaining the fraud analysis result, the obtained target originating position may be input into the fraud analysis model to obtain the scoring results of the fraud indicator items corresponding to the target originating position. The scoring results are sorted from high to low to obtain a score ranking, the fraud analysis result is generated based on the top-ranked scoring result, and finally the fraud analysis result is displayed. The specific display mode may be set according to actual requirements, such as an image-text mode, a voice reminding mode and the like, which is not limited in this embodiment.
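Steps S321 and S322, including the optional time window, can be sketched as a simple count per originating position. The record fields `origin` and `time`, the function name `suspicious_origins`, and the seven-day default window are assumptions chosen to mirror the one-week example in the text.

```python
from collections import Counter
from datetime import datetime, timedelta


def suspicious_origins(tickets, preset_count, window_end, window=timedelta(days=7)):
    """Count call tickets per originating position inside the time window and
    return the positions whose count is greater than the preset number."""
    window_start = window_end - window
    counts = Counter(
        t["origin"] for t in tickets if window_start <= t["time"] <= window_end
    )
    return {origin: n for origin, n in counts.items() if n > preset_count}


end = datetime(2020, 12, 1)
tickets = (
    # Three recent calls and one old call from the same position.
    [{"origin": "cell-17", "time": end - timedelta(days=d)} for d in (1, 2, 3, 30)]
    # Two recent calls from another position.
    + [{"origin": "cell-42", "time": end - timedelta(days=d)} for d in (1, 2)]
)

# A small threshold keeps the example short; the text suggests values such as 100.
targets = suspicious_origins(tickets, preset_count=2, window_end=end)
```

Each position in `targets` would then be passed to the fraud analysis model as a target originating position.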

In this embodiment, the structured data is subjected to data integration to obtain triples, the triples are input into a graph database, corresponding call ticket structure data is extracted from the triples according to a call ticket knowledge graph framework in the graph database for data supplement, a corresponding call ticket knowledge graph is generated and embedded into the corresponding fraud analysis model, the originating positions of the call ticket data are obtained from the call ticket knowledge graph, the number of call tickets at each originating position is counted, and when a target originating position whose number of call tickets is greater than a preset number of call tickets is detected, the target originating position is input into the fraud analysis model to obtain a fraud analysis result. Suspected fraud dens are thus effectively investigated from call tickets based on the generated call ticket knowledge graph, and a corresponding fraud analysis model is established according to the call ticket knowledge graph, so as to realize deep analysis of fraud case source data and further improve case analysis efficiency and visibility of case data.

Referring to fig. 5, fig. 5 is a block diagram illustrating the structure of a first embodiment of the fraud data deep analysis system of the present invention.

As shown in fig. 5, the fraud data deep analysis system proposed by the embodiment of the invention comprises:

the data structuring module 10 is configured to perform data conversion on the obtained fraud case source data to obtain structured data;

a model embedding module 20, configured to construct a fraud analysis knowledge graph according to the structured data, and embed the fraud analysis knowledge graph into a corresponding fraud analysis model;

a fraud analysis module 30, configured to obtain corresponding fraud feature data according to the fraud analysis knowledge graph and input the fraud feature data into the fraud analysis model to obtain a fraud analysis result.
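How the three modules hand data to one another can be sketched as three functions chained in order. The function names, the `head|relation|tail` input format, the out-degree feature and the stand-in scoring model are all hypothetical placeholders; they show only the data flow between modules 10, 20 and 30, not the actual conversion, embedding or analysis logic.

```python
def data_structuring(source_lines):
    """Module 10 (sketch): convert 'head|relation|tail' lines into triples,
    dropping malformed lines."""
    triples = []
    for line in source_lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


def model_embedding(triples):
    """Module 20 (sketch): build an adjacency-map graph and return it together
    with a stand-in model. A real system would train the model on the graph."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))

    def model(out_degree):
        # Placeholder score: more outgoing relations means a higher score, capped at 1.
        return min(1.0, out_degree / 10)

    return graph, model


def fraud_analysis(graph, model):
    """Module 30 (sketch): derive feature data from the graph and score it."""
    features = {node: len(edges) for node, edges in graph.items()}
    return {node: model(value) for node, value in features.items()}


source = ["S-01 | calls | V-01", "S-01 | transfers_to | A-09", "broken line"]
graph, model = model_embedding(data_structuring(source))
result = fraud_analysis(graph, model)
```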

In this embodiment, the obtained fraud case source data is subjected to data conversion to obtain structured data, a fraud analysis knowledge graph is constructed according to the structured data and embedded into a corresponding fraud analysis model, corresponding fraud feature data is obtained according to the fraud analysis knowledge graph, and the fraud feature data is input into the fraud analysis model to obtain a fraud analysis result. Unlike the prior art, which provides only case retrieval and case export functions, or simple clue association analysis, and therefore suffers from poor visibility and low analysis efficiency, this embodiment constructs the fraud analysis knowledge graph based on fraud case source data and embeds it into the corresponding fraud analysis model, so that deep analysis of fraud case source data is realized based on the fraud analysis model, and case analysis efficiency and visibility of case data are improved.

Other embodiments or specific implementation manners of the fraud data deep analysis system of the present invention can refer to the above method embodiments, and are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., a rom/ram, a magnetic disk, an optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

The invention discloses A1, a fraud data deep analysis method, wherein the fraud data deep analysis method comprises the following steps:

A2, the fraud data deep analysis method as described in A1, wherein the step of performing data conversion on the obtained fraud case source data to obtain structured data specifically includes:

A3, the fraud data deep analysis method as described in A2, wherein the step of performing knowledge extraction processing on the segmented word data according to a preset structuring rule to obtain structured data specifically includes:

structured data is generated based on the entity data and the relationship data.

A4, the fraud data deep analysis method as defined in any one of A1-A3, the step of constructing a fraud analysis knowledge-map according to the structured data and embedding the fraud analysis knowledge-map into a corresponding fraud analysis model, specifically comprising:

A5, the fraud data deep analysis method of A4, wherein the fraud analysis knowledge graph comprises a case knowledge graph, a fund flow knowledge graph and a call ticket knowledge graph;

extracting corresponding case structure data from the triples according to a case knowledge graph framework in the graph database to supplement data, and generating corresponding case knowledge graphs;

embedding the case knowledge graph into a corresponding fraud analysis model;

A6, the fraud data depth analysis method as defined in A5, said step of acquiring corresponding fraud feature data according to said fraud analysis knowledge-map and inputting said fraud feature data into said fraud analysis model to obtain fraud analysis results, specifically comprising:

A7, the method for deep analysis of fraud data as described in A6, further comprising, before the step of finding strongly and weakly associated case data in the graph database according to the case knowledge graph:

A8, the fraud data deep analysis method as defined in A5, said step of extracting corresponding fraud feature data from said fraud analysis knowledge-graph and inputting said fraud feature data into said fraud analysis model to obtain fraud analysis results, specifically comprising:

A9, the fraud data deep analysis method as defined in A5, said step of extracting corresponding fraud feature data from said fraud analysis knowledge-graph and inputting said fraud feature data into said fraud analysis model to obtain fraud analysis results, specifically comprising:

when a target originating position whose number of call tickets is greater than the preset number of call tickets is detected, inputting the target originating position into the fraud analysis model to obtain a fraud analysis result.

A10, the fraud data depth analysis method as defined in any one of A1-A9, the step of embedding the fraud analysis knowledge-map into a corresponding fraud analysis model specifically comprises:

A11, the method for deep analysis of fraud data as defined in A10, the step of training the primary model according to the fraud analysis knowledge-graph to obtain a corresponding fraud analysis model, comprising:

A12, the fraud data depth analysis method as defined in A11, said step of acquiring corresponding fraud feature data according to said fraud analysis knowledge-map and inputting said fraud feature data into said fraud analysis model to obtain fraud analysis results, specifically comprising:

The invention also discloses B13, a fraud data deep analysis system, wherein the fraud data deep analysis system comprises:

B14, the fraud data deep analysis system as defined in B13, the data structuring module being further configured to perform data cleaning on the obtained fraud case source data, obtaining the cleaned fraud case source data;

B15, the fraud data deep analysis system as defined in B13, the model embedding module being further configured to perform data integration on the structured data, obtain triples, and input the triples into a graph database;

B16, the fraud data deep analysis system as defined in B15, the model embedding module being further configured to extract corresponding case structure data from the triples according to the case knowledge graph framework in the graph database for data supplement, and generate a corresponding case knowledge graph;

B17, the fraud data deep analysis system as defined in B16, the fraud analysis module being further configured to search for strongly-associated case data and/or weakly-associated case data in the graph database according to the case knowledge graph;

B18, the fraud data deep analysis system as defined in B17, the fraud analysis module being further configured to extract a target question-answer mapping relationship according to the case knowledge graph, and establish a corresponding question-answer template based on the target question-answer mapping relationship;

B19, the fraud data deep analysis system as defined in B16, the fraud analysis module being further configured to find, from the fund flow knowledge graph, a target account whose historical fund flow is lower than a preset minimum fund flow;

B20, the fraud data deep analysis system as defined in B16, the fraud analysis module being further configured to obtain the originating positions of call ticket data from the call ticket knowledge graph, and count the number of call tickets at each originating position;

Claims

1. A fraud data depth analysis method, characterized in that the fraud data depth analysis method comprises:

2. The fraud data depth analysis method of claim 1, wherein the step of performing data transformation on the obtained fraud case source data to obtain structured data specifically comprises:

3. The fraud data deep analysis method of claim 2, wherein the step of performing knowledge extraction processing on the segmented word data according to a preset structuring rule to obtain structured data specifically comprises:

4. The fraud data deep analysis method of any of claims 1 to 3, wherein said step of constructing a fraud analysis knowledge-graph from said structured data and embedding said fraud analysis knowledge-graph into a corresponding fraud analysis model specifically comprises:

5. The fraud data depth analysis method of claim 4, wherein the fraud analysis knowledge-graph comprises a case knowledge-graph, a fund flow knowledge-graph and a ticket knowledge-graph;

extracting corresponding fund flow structure data from the triples according to a fund flow knowledge graph frame in the graph database for data supplement to generate corresponding fund flow knowledge graphs;

embedding the case knowledge graph into a corresponding fraud analysis model;

6. The fraud data deep analysis method of claim 5, wherein said step of obtaining corresponding fraud feature data according to said fraud analysis knowledge-map and inputting said fraud feature data into said fraud analysis model to obtain fraud analysis results specifically comprises:

7. The method of deep analysis of fraud data of claim 6, wherein said step of looking up strongly and weakly associated case data in said graph database from said case knowledge-graph is preceded by further comprising:

8. The fraud data depth analysis method of claim 5, wherein said step of extracting corresponding fraud feature data from said fraud analysis knowledge-graph and inputting said fraud feature data into said fraud analysis model to obtain a fraud analysis result specifically comprises:

9. The fraud data depth analysis method of claim 5, wherein said step of extracting corresponding fraud feature data from said fraud analysis knowledge-graph and inputting said fraud feature data into said fraud analysis model to obtain a fraud analysis result specifically comprises:

10. A fraud data depth analysis system, characterized in that the fraud data depth analysis system comprises: