CN109993233B - Method and system for predicting data auditing objective based on machine learning - Google Patents

Publication number
CN109993233B
Authority
CN
China
Prior art keywords
attribute
attribute information
verification
data record
data
Legal status
Active
Application number
CN201910283383.0A
Other languages
Chinese (zh)
Other versions
CN109993233A (en)
Inventor
黄晶
涂威威
陈雨强
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201910283383.0A
Publication of CN109993233A
Application granted
Publication of CN109993233B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and system for predicting a data audit target based on machine learning are provided. The method comprises: (A) acquiring a data record to be audited and at least one verification data record; (B) calculating an attribute truth feature of the data record to be audited based on the at least one verification data record, wherein the attribute truth feature measures how truthful the at least one piece of attribute information is; and (C) obtaining, using a target prediction model trained based on machine learning, a prediction result of the data record to be audited with respect to the data audit target from the calculated attribute truth feature, wherein the target prediction model is trained to predict the data audit target for a data record to be audited based on its attribute truth feature. The method can overcome the drawback that existing data auditing approaches consume considerable manpower.

Description

Method and system for predicting data auditing target based on machine learning
The present application is a divisional application of the patent application entitled "Method and system for predicting a data audit target based on machine learning", filed on June 13, 2016 with application number 201610420714.7.
Technical Field
The present invention relates generally to the field of data science, and more particularly, to a method and system for predicting data audit targets based on machine learning.
Background
In practice, data such as customer information, project information, asset information, and transaction information often needs to be audited so that business judgments can be made from the audit results.
For example, in scenarios such as applying for or conducting a business transaction, or registering for an activity, attribute information about identity, assets, contact details, transaction characteristics, or activity characteristics may be generated (filled in manually or generated in some other way). The relevant units, organizations, institutions, and so on must then review this data to make business decisions, such as whether an application is compliant, whether a transaction is legitimate, or whether participation in an activity is permitted.
Taking a credit card application as an example, the relevant institution checks whether the information filled in by the applicant is true and accurate in order to determine whether the applicant is involved in fraud. Typically, to determine whether a credit card applicant is a fraudulent customer seeking improper gain, staff at the credit card center manually verify whether the information provided by the applicant, such as employer, job title, income, and contact number, is accurate. Specifically, a staff member first checks whether the employer information is valid, and then contacts the person in charge at that employer to ask about the applicant's actual job title, income, and so on. In addition, the staff member must manually dial the contact number of the applicant or of the applicant's emergency contact to verify whether the number really belongs to that person.
As these examples show, traditional auditing requires substantial labor cost, and auditors can easily be deceived by applicants who collude with others, allowing the applicants to obtain improper gains.
Disclosure of Invention
The exemplary embodiments of the invention aim to overcome the drawback that existing data auditing approaches are labor-intensive.
According to an exemplary embodiment of the invention, a method of predicting a data audit target based on machine learning is provided, comprising: (A) acquiring a data record to be audited and at least one verification data record, wherein the data record to be audited includes attribute information about each attribute of an item to be audited, the data audit target refers to the business judgment made by auditing the attribute information included in the data record to be audited, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of those attributes; (B) calculating an attribute truth feature of the data record to be audited based on the at least one verification data record, wherein the attribute truth feature measures how truthful the at least one piece of attribute information is; and (C) obtaining, using a target prediction model trained based on machine learning, a prediction result of the data record to be audited with respect to the data audit target from the calculated attribute truth feature, wherein the target prediction model is trained to predict the data audit target for a data record to be audited based on its attribute truth feature.
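Steps (A) to (C) form a small pipeline. The Python sketch below is a minimal illustration under stated assumptions, not the patent's implementation: records are plain dictionaries of numeric attributes, the hypothetical helper `relative_differences` stands in for step (B), and the hypothetical `toy_model` stands in for a trained target prediction model.

```python
from typing import Callable, Dict, List

# Assumption: a data record maps attribute names to numeric values.
Record = Dict[str, float]


def predict_audit_target(
    record_to_audit: Record,
    verification_records: List[Record],
    truth_feature_fn: Callable[[Record, List[Record]], List[float]],
    target_model: Callable[[List[float]], float],
) -> float:
    """Run steps (B) and (C); step (A), acquiring the records, is done by the caller."""
    # (B) Measure how well the claimed values agree with the verification data.
    truth_features = truth_feature_fn(record_to_audit, verification_records)
    # (C) Score the record with the (already trained) target prediction model.
    return target_model(truth_features)


def relative_differences(record: Record, verification_records: List[Record]) -> List[float]:
    """Hypothetical step (B): relative gap between each claimed value and every
    verification record that reports the same attribute."""
    features = []
    for name in sorted(record):
        claimed = record[name]
        for vrec in verification_records:
            if name in vrec:
                features.append(abs(claimed - vrec[name]) / max(abs(claimed), 1e-9))
    return features


def toy_model(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: the largest relative gap."""
    return max(features, default=0.0)


# Claimed income is 20000, but one verification source reports 5000.
record = {"income": 20000.0, "age": 30.0}
verification = [{"income": 5000.0}, {"age": 30.0}]
print(predict_audit_target(record, verification, relative_differences, toy_model))  # 0.75
```

Passing the feature function and the model as parameters keeps steps (B) and (C) independently replaceable, mirroring how the text separates the attribute truth feature from the target prediction model.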
Optionally, in the method, the at least one verification data record may be generated from each data source and/or extracted from the data record to be audited, respectively.
Optionally, in the method, step (B) may include: (B1) predicting, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute; and (B2) for each such attribute, calculating the differences between the attribute's attribute information and each piece of predicted attribute information, and obtaining the attribute truth feature of the data record to be audited from the set of differences calculated for the at least one attribute.
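Steps (B1) and (B2) can be sketched as follows. The choices here are illustrative assumptions rather than the patent's specification: numeric attributes are compared by absolute difference, categorical ones by a 0/1 mismatch, and the set of differences for each attribute is summarized by its minimum, mean, and maximum. The function names and the `_min_diff`/`_mean_diff`/`_max_diff` feature names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Union

Value = Union[float, str]


def difference(claimed: Value, predicted: Value) -> float:
    """Absolute gap for numeric values, 0/1 mismatch for categorical ones."""
    if isinstance(claimed, str) or isinstance(predicted, str):
        return 0.0 if claimed == predicted else 1.0
    return abs(float(claimed) - float(predicted))


def attribute_truth_features(
    claimed: Dict[str, Value], predicted: Dict[str, List[Value]]
) -> Dict[str, float]:
    """(B2): summarize, per attribute, the differences to every prediction from (B1)."""
    features: Dict[str, float] = {}
    for name, predictions in predicted.items():
        if name not in claimed or not predictions:
            continue  # nothing to compare against
        diffs = [difference(claimed[name], p) for p in predictions]
        features[f"{name}_min_diff"] = min(diffs)
        features[f"{name}_mean_diff"] = mean(diffs)
        features[f"{name}_max_diff"] = max(diffs)
    return features


claimed = {"income": 12000.0, "employer": "Acme"}
# Output of step (B1): one predicted value per verification data record.
predicted = {"income": [10000.0, 11000.0, 15000.0], "employer": ["Acme", "Acme Ltd"]}
features = attribute_truth_features(claimed, predicted)
print(features["income_mean_diff"], features["employer_mean_diff"])  # 2000.0 0.5
```

Keeping the minimum alongside the mean lets a downstream model distinguish "at least one source agrees" from "no source agrees".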
Optionally, in step (B2), the difference between each attribute's attribute information and each piece of predicted attribute information may be calculated based on the statistical distribution characteristics of such differences for that attribute.
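One way to read "statistical distribution characteristics" is to express each raw difference as its percentile among historical differences for the same attribute, which makes, for example, an income gap and an age gap comparable. The sketch below assumes such a history is available; the class name `DifferenceNormalizer` and the sample numbers are hypothetical, and a z-score against the historical mean and standard deviation would be an equally plausible choice.

```python
from bisect import bisect_right
from typing import Dict, List


class DifferenceNormalizer:
    """Maps a raw difference to its empirical percentile among historical
    differences observed for the same attribute (hypothetical helper)."""

    def __init__(self, history: Dict[str, List[float]]) -> None:
        # Sort each attribute's history once so every lookup is a binary search.
        self._sorted = {name: sorted(diffs) for name, diffs in history.items() if diffs}

    def normalize(self, attribute: str, diff: float) -> float:
        """Fraction of historical differences that are <= diff (0.0 to 1.0)."""
        diffs = self._sorted[attribute]  # KeyError if the attribute has no history
        return bisect_right(diffs, diff) / len(diffs)


normalizer = DifferenceNormalizer({
    "income": [100.0, 200.0, 500.0, 1000.0, 5000.0],
    "age": [0.0, 0.0, 1.0, 1.0, 2.0],
})
# A 3000 income gap and a 1-year age gap land at the same percentile.
print(normalizer.normalize("income", 3000.0))  # 0.8
print(normalizer.normalize("age", 1.0))        # 0.8
```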
Optionally, in step (B1), the at least one verification data record may be input into at least one attribute verification model trained based on machine learning, to predict at least one piece of predicted attribute information for each of the at least one attribute.
Optionally, in step (B1), attribute information about each of the at least one attribute that is already contained in the verification data records generated from the respective data sources may be used directly as the predicted attribute information for that attribute.
Optionally, in step (B1), when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, that attribute information may be used directly as the model's predicted attribute information; when the verification data record does not contain such attribute information, the record may be input into the attribute verification model.
Optionally, in step (B1), when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, that attribute information may be used directly as the predicted attribute information if the interval between the time at which the verification data record was generated and the time at which the data record to be audited was generated does not exceed a preset time interval; if the interval exceeds the preset time interval, the verification data record may instead be input into the attribute verification model.
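The source-selection rules of the two preceding paragraphs can be sketched as a single function: use the verification record's own value when it has one and, if a time window is configured, when it is recent enough; otherwise call the attribute verification model. The function name, the `max_age` parameter, and the toy `income_model` are assumptions for illustration only; passing `max_age=None` gives the rule without a time window.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Optional


def predicted_value(
    attribute: str,
    verification_record: Dict[str, Any],
    verification_time: datetime,
    audit_time: datetime,
    attribute_model: Callable[[Dict[str, Any]], Any],
    max_age: Optional[timedelta] = None,
) -> Any:
    """Predicted value of `attribute` derived from one verification record."""
    has_value = attribute in verification_record
    # Without a configured window, any value the record carries is trusted.
    fresh = max_age is None or abs(audit_time - verification_time) <= max_age
    if has_value and fresh:
        return verification_record[attribute]
    # Missing or stale value: let the attribute verification model predict it.
    return attribute_model(verification_record)


def income_model(rec: Dict[str, Any]) -> float:
    """Hypothetical attribute verification model: income estimated from spending."""
    return 3.0 * rec["monthly_spend"]


audit_time = datetime(2020, 3, 1)
vrec_time = datetime(2020, 1, 1)  # 60 days before the audited record
vrec = {"income": 9000.0, "monthly_spend": 2500.0}
print(predicted_value("income", vrec, vrec_time, audit_time, income_model, timedelta(days=90)))  # 9000.0
print(predicted_value("income", vrec, vrec_time, audit_time, income_model, timedelta(days=30)))  # 7500.0
```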
Preferably, after step (C), the method may further comprise: (D) presenting the prediction result and the attribute truth feature to a user.
Preferably, in step (D), the attribute information corresponding to larger differences in the attribute truth feature, together with the predicted attribute information for it, may be presented to the user; and/or the attribute information corresponding to the differences in the attribute truth feature that contribute most to the prediction, together with the predicted attribute information for it, may be presented to the user.
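Choosing what to present in step (D) can be sketched as two rankings: by the size of each attribute's (normalized) difference, and by its contribution to the prediction. The contribution ranking assumes a linear target model, where weight times feature value is a reasonable measure of influence; other model types would need a different attribution method. All names and numbers here are hypothetical.

```python
from typing import Dict, List, Tuple


def largest_differences(features: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Attributes whose truth features show the largest differences."""
    return sorted(features.items(), key=lambda item: item[1], reverse=True)[:k]


def main_contributors(
    features: Dict[str, float], weights: Dict[str, float], k: int = 3
) -> List[Tuple[str, float]]:
    """Attributes contributing most to a linear model's score (weight * feature)."""
    contributions = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


features = {"income": 0.9, "employer": 0.2, "phone": 0.7}  # normalized differences
weights = {"income": 0.5, "employer": 3.0, "phone": 0.1}   # hypothetical model weights
print([name for name, _ in largest_differences(features, k=2)])         # ['income', 'phone']
print([name for name, _ in main_contributors(features, weights, k=2)])  # ['employer', 'income']
```

The two rankings can disagree: the employer mismatch is small but heavily weighted, which is one reason to offer both criteria.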
Preferably, in step (D), the presented predicted attribute information may be displayed as a statistical distribution chart of the predicted attribute information.
Preferably, after step (D), the method may further comprise: (E) receiving feedback from the user on the actual outcome of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
Preferably, in step (C), the prediction result of the data record to be audited with respect to the data audit target may be obtained from both the calculated attribute truth feature and attribute features derived from at least part of the attribute information of the data record to be audited.
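Combining both feature groups in step (C) can be as simple as merging two dictionaries into one model input. In this sketch the hypothetical `truth:`/`attr:` name prefixes and the choice of raw attributes to pass through are assumptions, used only to keep the two groups from colliding.

```python
from typing import Dict, Iterable


def build_model_input(
    truth_features: Dict[str, float],
    record: Dict[str, float],
    raw_attributes: Iterable[str],
) -> Dict[str, float]:
    """Merge attribute truth features with selected raw attribute features."""
    merged = {f"truth:{name}": value for name, value in truth_features.items()}
    for name in raw_attributes:
        if name in record:  # skip attributes the record does not contain
            merged[f"attr:{name}"] = float(record[name])
    return merged


print(build_model_input({"income": 0.8}, {"income": 12000.0, "age": 30}, ["age"]))
# {'truth:income': 0.8, 'attr:age': 30.0}
```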
According to another exemplary embodiment of the invention, a system for predicting a data audit target based on machine learning is provided, comprising: a data acquisition device configured to acquire a data record to be audited and at least one verification data record, wherein the data record to be audited includes attribute information about each attribute of an item to be audited, the data audit target refers to the business judgment made by auditing the attribute information included in the data record to be audited, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of those attributes; an attribute verification device configured to calculate an attribute truth feature of the data record to be audited based on the at least one verification data record, wherein the attribute truth feature measures how truthful the at least one piece of attribute information is; and a target prediction device configured to obtain, using a target prediction model trained based on machine learning, a prediction result of the data record to be audited with respect to the data audit target from the calculated attribute truth feature, wherein the target prediction model is trained to predict the data audit target for a data record to be audited based on its attribute truth feature.
Preferably, in the system, the at least one verification data record may be generated from each data source and/or extracted from the data record to be audited, respectively.
Preferably, in the system, the attribute verification device may predict, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute; the attribute verification device may then calculate, for each such attribute, the differences between the attribute's attribute information and each piece of predicted attribute information, and obtain the attribute truth feature of the data record to be audited from the set of differences calculated for the at least one attribute.
Preferably, in the system, the attribute verification device may calculate the difference between each attribute's attribute information and each piece of predicted attribute information based on the statistical distribution characteristics of such differences for that attribute.
Preferably, in the system, the attribute verification device may input the at least one verification data record into at least one attribute verification model trained based on machine learning, to predict at least one piece of predicted attribute information for each of the at least one attribute.
Preferably, in the system, the attribute verification device may use attribute information about each of the at least one attribute that is already contained in the verification data records generated from the respective data sources directly as the predicted attribute information for that attribute.
Preferably, in the system, when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, the attribute verification device may use that attribute information directly as the model's predicted attribute information; when the verification data record does not contain such attribute information, the attribute verification device may input the record into the attribute verification model.
Preferably, in the system, when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, the attribute verification device may use that attribute information directly as the predicted attribute information if the interval between the time at which the verification data record was generated and the time at which the data record to be audited was generated does not exceed a preset time interval; if the interval exceeds the preset time interval, the attribute verification device may instead input the verification data record into the attribute verification model.
Preferably, the system may further comprise a presentation device configured to present the prediction result and the attribute truth feature to a user.
Preferably, in the system, the presentation device may further present to the user the attribute information corresponding to larger differences in the attribute truth feature, together with the predicted attribute information for it; and/or the presentation device may further present to the user the attribute information corresponding to the differences in the attribute truth feature that contribute most to the prediction, together with the predicted attribute information for it.
Preferably, in the system, the presentation device may display the presented predicted attribute information as a statistical distribution chart of the predicted attribute information.
Preferably, the system may further comprise a feedback device configured to receive feedback from the user on the actual outcome of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
Preferably, in the system, the target prediction device may obtain the prediction result of the data record to be audited with respect to the data audit target from both the calculated attribute truth feature and attribute features derived from at least part of the attribute information of the data record to be audited.
According to another exemplary embodiment of the present invention, a computing device for predicting a data audit target based on machine learning is provided, comprising a processor and a storage component storing a set of computer-executable instructions that, when executed by the processor, perform the following steps: (A) acquiring a data record to be audited and at least one verification data record, wherein the data record to be audited includes attribute information about each attribute of an item to be audited, the data audit target refers to the business judgment made by auditing the attribute information included in the data record to be audited, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of those attributes; (B) calculating an attribute truth feature of the data record to be audited based on the at least one verification data record, wherein the attribute truth feature measures how truthful the at least one piece of attribute information is; and (C) obtaining, using a target prediction model trained based on machine learning, a prediction result of the data record to be audited with respect to the data audit target from the calculated attribute truth feature, wherein the target prediction model is trained to predict the data audit target for a data record to be audited based on its attribute truth feature.
Optionally, in the computing device, the at least one verification data record may be generated from each data source and/or extracted from the to-be-audited data record, respectively.
Optionally, in the computing device, step (B) may include: (B1) predicting, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute; and (B2) for each such attribute, calculating the differences between the attribute's attribute information and each piece of predicted attribute information, and obtaining the attribute truth feature of the data record to be audited from the set of differences calculated for the at least one attribute.
Optionally, in the computing device, in step (B2), the difference between each attribute's attribute information and each piece of predicted attribute information may be calculated based on the statistical distribution characteristics of such differences for that attribute.
Optionally, in the computing device, in step (B1), the at least one verification data record may be input into at least one attribute verification model trained based on machine learning, to predict at least one piece of predicted attribute information for each of the at least one attribute.
Optionally, in the computing device, in step (B1), attribute information about each of the at least one attribute that is already contained in the verification data records generated from the respective data sources may be used directly as the predicted attribute information for that attribute.
Optionally, in the computing device, in step (B1), when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, that attribute information may be used directly as the model's predicted attribute information; when the verification data record does not contain such attribute information, the record may be input into the attribute verification model.
Optionally, in the computing device, in step (B1), when a verification data record generated from a data source and destined for a given attribute verification model already contains attribute information about the attribute that this model is meant to predict, that attribute information may be used directly as the predicted attribute information if the interval between the time at which the verification data record was generated and the time at which the data record to be audited was generated does not exceed a preset time interval; if the interval exceeds the preset time interval, the verification data record may instead be input into the attribute verification model.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following step may also be performed after step (C): (D) presenting the prediction result and the attribute truth feature to a user.
Optionally, in the computing device, in step (D), the attribute information corresponding to larger differences in the attribute truth feature, together with the predicted attribute information for it, may be presented to the user; and/or the attribute information corresponding to the differences in the attribute truth feature that contribute most to the prediction, together with the predicted attribute information for it, may be presented to the user.
Optionally, in the computing device, in step (D), the presented predicted attribute information may be displayed as a statistical distribution chart of the predicted attribute information.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following step may also be performed after step (D): (E) receiving feedback from the user on the actual outcome of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
Optionally, in the computing device, in step (C), the prediction result of the data record to be audited with respect to the data audit target may be obtained from both the calculated attribute truth feature and attribute features derived from at least part of the attribute information of the data record to be audited.
With the method and system for predicting a data audit target based on machine learning according to the exemplary embodiments of the invention, data records to be audited can be audited automatically by a machine learning model with the help of additional verification data records.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 illustrates a block diagram of a system for predicting data audit goals based on machine learning, according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a method of predicting a data audit target based on machine learning, according to an exemplary embodiment of the invention;
FIG. 3 illustrates an example of obtaining the attribute truth feature of a data record to be audited, according to an exemplary embodiment of the present invention; and
FIG. 4 illustrates a block diagram of a system for predicting data auditing objectives based on machine learning, according to another exemplary embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
In an exemplary embodiment of the invention, data audits are implemented by: and a target prediction model trained on the basis of a machine learning technology is utilized, and a prediction result about a data auditing target is given for the prediction sample characteristics fused with the data to be audited and the verification data. Machine learning is a necessary product of the development of artificial intelligence research to a certain stage, and aims to improve the performance of the system by means of calculation and by using experience. In computer systems, "experience" is usually in the form of "data" from which a "model" can be generated by means of a machine learning algorithm, i.e. the empirical data is supplied to the machine learning algorithm, on the basis of which a model can be generated which, in the face of new situations, provides a corresponding decision, i.e. a prediction result. Machine learning may be implemented in the form of "supervised learning," "unsupervised learning," or "semi-supervised learning," it being noted that the present invention is not particularly limited to a particular machine learning algorithm. In addition, it should be noted that during the process of training and applying the target prediction model, statistical algorithms, business rules and/or expert knowledge, etc. can be utilized to further improve the accuracy of the prediction result.
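The text leaves the learning algorithm open. As one illustrative supervised choice, not one the patent prescribes, the sketch below trains a logistic regression with full-batch gradient descent on attribute truth features labeled 1 (for example, a confirmed fraudulent application) or 0. The features, labels, and hyperparameters are made up for illustration; a production system would more likely rely on an established library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that the audit target is positive (e.g. fraud)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(
    samples: Sequence[Tuple[Sequence[float], int]], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Minimize mean log-loss with full-batch gradient descent."""
    dim, n = len(samples[0][0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            err = predict_proba(w, b, x) - y  # gradient of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical features: [normalized income difference, phone mismatch]; label 1 = fraud.
samples = [([0.1, 0.0], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([0.8, 1.0], 1)]
w, b = train_logistic(samples)
print(predict_proba(w, b, [0.85, 1.0]) > 0.5)  # True
print(predict_proba(w, b, [0.15, 0.0]) < 0.5)  # True
```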
FIG. 1 illustrates a block diagram of a system for predicting data audit goals based on machine learning, according to an exemplary embodiment of the invention. Specifically, the prediction system provides a processing architecture for predicting a data audit target based on audit data and verification data, wherein the verification data is used for verifying attribute information of the audit data. The system shown in fig. 1 may be implemented entirely by a computer program, as a software program, as a dedicated hardware device, or as a combination of software and hardware. Accordingly, each device constituting the system shown in fig. 1 may be a virtual module that realizes the corresponding function only by means of a computer program, may be a general-purpose or dedicated device that realizes the function by means of a hardware structure, or may be a processor or the like on which the corresponding computer program runs. By utilizing the system, the user can be helped to automatically audit the data, and the original information of the verification data can be fully utilized in the automatic audit process, so that the audit target can be effectively predicted.
As shown in FIG. 1, the data acquisition device 10 is configured to acquire a data record to be audited and at least one verification data record. The data record to be audited includes attribute information about each attribute of an item to be audited; the data audit target is a business judgment made by auditing the attribute information included in that record; and the at least one verification data record is used to verify at least one piece of attribute information about at least one of those attributes.
According to the exemplary embodiment of the present invention, data auditing refers to auditing the content of the attribute information in a data record to be audited. The data may be filled in manually by a customer, submitted online by a customer, stored or generated in advance, or received from an external party. Such data may concern the customer's own information, such as identity, educational background, occupation, assets and contact details. Alternatively, it may concern a business-related item: for a sales contract, for example, the transaction amount, the parties to the transaction, the subject matter and the place of the transaction. Note that the information content referred to in the exemplary embodiments may describe any aspect or property of any object or matter, and is not limited to defining or describing individuals, objects, organizations, units, items, events or the like. In fact, any information data from whose audit a business judgment can be made may be applied to the exemplary embodiments of the present invention.
Specifically, in the exemplary embodiment of the present invention, data is audited mainly in order to make a business judgment based on its attribute information. That is, the data audit target here is a particular business judgment, and that judgment rests on the audited attribute information.
As an example, the business judgment may be a conclusion about the authenticity of the attribute information, i.e., determining whether the data record to be audited involves fraud. Accordingly, the data audit target may indicate a fraud determination for a credit card application, a fraud determination for a credit card transaction, and the like. For example, a customer applying for a credit card must fill in information such as identity, occupation, educational background, income, contact address and emergency contact, and whether the customer is fraudulent can be determined by checking this information. As another example, when a customer attempts to complete a transaction by credit card, information related to the transaction may be audited to determine whether the transaction is fraudulent. Further, to identify fraudulent items among a large number of items (e.g., subsidized projects), various information about each item may be audited to determine whether it is a fraudulent item that was never actually carried out and exists only to obtain improper benefits.
As another example, the business judgment may be a further determination made on the basis of the authenticity of the attribute information (or of its authenticity combined with its content, etc.), for example determining the credit standing associated with the data record to be audited. Accordingly, the data audit target may indicate a credit limit determination for a credit card.
That is, the exemplary embodiments of the present invention are not limited to auditing information data to determine the fraud risk of the related data record; any situation in which a business judgment can be made from the result of auditing information data is applicable. Accordingly, information data of various formats and types may be audited to make corresponding business judgments. The purpose of the audit is to measure the authenticity of each item of information data, in order to determine the authenticity of the related data record or, further, to derive other business judgments from that authenticity.
The data acquisition device 10 may acquire structured or unstructured data, such as text or numerical data, from different sources. Specifically, it may acquire the data to be audited, which, as described above, includes attribute information about each attribute of the item to be audited. It may also acquire verification data that helps verify the data to be audited; in particular, it may acquire one or more verification data records for verifying the authenticity of at least one piece of attribute information of the data to be audited. The verification data may include other information data originating inside the auditing department as well as information data from external data sources (for example, data providers, the Internet (e.g., social networking sites), mobile operators, app operators, courier companies, credit agencies and so on). In addition, the verification data may consist of part of the attribute information extracted from the data to be audited, i.e., verification data obtained by excluding at least the attribute information that is the verification target from the attribute information of the data to be audited. The data to be audited and the verification data may be input to the data acquisition device 10 through an input device, generated automatically by the data acquisition device 10 from existing data, or obtained by the data acquisition device 10 from a network (e.g., from a storage medium such as a data warehouse on the network). Furthermore, an intermediate data exchange device such as a server may help the data acquisition device 10 obtain data from external data sources.
Here, the acquired data may be converted into a format that is easy to handle by a data conversion module such as a text analysis module in the data acquisition apparatus 10. It should be noted that the data acquisition device 10 may be configured as various modules comprised of software, hardware, and/or firmware, some or all of which may be integrated or cooperate together to perform a particular function.
The attribute verification apparatus 20 is configured to calculate an attribute truth degree feature of the data record to be audited based on the at least one verification data record, where the attribute truth degree feature measures the degree of truth of the at least one piece of attribute information.
Here, as an example, for verification data records generated from the various data sources, the attribute verification apparatus 20 may fuse them with the data record to be audited at the feature level of the machine learning model, so that the original information in the verification data records is fully used when predicting the data audit target. In addition, the attribute verification apparatus 20 may use verification data records extracted from the data record to be audited itself to perform a self-consistency check on that record, which also makes it easy to predict the data audit target. Note that, according to an exemplary embodiment of the present invention, these two kinds of verification data records may be used separately or together.
Specifically, the attribute verification apparatus 20 may use one or more pieces of verification data to calculate the degree of truth of at least one piece of attribute information of the data record to be audited. That is, it may use the content of the attribute information in a verification data record to measure the authenticity of certain attribute information in the data record to be audited. As an example, if the verification data record already includes attribute information about an attribute of the data record to be audited, that attribute information can be used directly to measure the degree of truth of the corresponding attribute information in the data record to be audited. For example, if the customer's gender in the data record to be audited needs to be checked and the verification data record already includes the customer's gender, the gender in the verification data record can be used directly to measure the degree of truth of the gender stated in the data record to be audited.
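A minimal sketch of this direct comparison, assuming records are plain Python dictionaries and that a categorical match scores 1.0 and a mismatch 0.0. The patent does not fix a scoring rule; the function name `direct_truth_degree` and the case/whitespace normalization are illustrative assumptions.

```python
from typing import Mapping, Optional


def direct_truth_degree(audited: Mapping[str, object],
                        verification: Mapping[str, object],
                        attribute: str) -> Optional[float]:
    """Score an audited attribute against the same attribute in a verification record.

    Returns 1.0 if the verification record confirms the value, 0.0 if it
    contradicts it, and None if the verification record cannot judge it.
    (Hypothetical scoring rule for illustration.)"""
    if attribute not in audited or attribute not in verification:
        return None  # nothing to compare against
    claimed, observed = audited[attribute], verification[attribute]
    if isinstance(claimed, str) and isinstance(observed, str):
        # Normalize case and surrounding whitespace for text fields.
        claimed, observed = claimed.strip().lower(), observed.strip().lower()
    return 1.0 if claimed == observed else 0.0


application = {"gender": "Female", "income": 20000}
bank_record = {"gender": "female"}
print(direct_truth_degree(application, bank_record, "gender"))  # 1.0
print(direct_truth_degree(application, bank_record, "income"))  # None
```

Returning `None` rather than 0.0 keeps "no evidence" distinct from "contradicting evidence", which matters once several sources are combined.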
Furthermore, according to the exemplary embodiment of the present invention, if the verification data record does not include attribute information about an attribute of the data record to be audited, the corresponding attribute information may be predicted from the verification data record by a pre-trained machine learning model, and the degree of truth of that attribute information in the data record to be audited may then be measured against the predicted value. For example, suppose customer income needs to be checked in a credit card application record to be audited, and the verification data record from the bank does not include income but does include some of the customer's asset information or daily transaction information. A pre-trained machine learning model that predicts income from asset or daily transaction information can then produce a predicted income from those features of the verification data record, and the degree of truth of the income stated in the data record to be audited is measured against that predicted value.
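One possible way to turn a predicted value into a degree of truth is sketched below, assuming a relative-deviation score clipped to [0, 1]. Both the scoring formula and the stand-in income model `toy_income_model` are hypothetical; the patent specifies neither.

```python
from typing import Callable, Mapping


def relative_truth_degree(claimed: float, predicted: float, eps: float = 1e-9) -> float:
    """Score agreement between a claimed and a predicted value in [0, 1].

    1.0 means identical; the score falls linearly with the gap relative to the
    larger magnitude and is clipped at 0.0 (an assumed scoring rule)."""
    scale = max(abs(claimed), abs(predicted), eps)
    return max(0.0, 1.0 - abs(claimed - predicted) / scale)


def truth_via_model(audited: Mapping[str, float], verification: Mapping[str, float],
                    attribute: str,
                    model: Callable[[Mapping[str, float]], float]) -> float:
    """Predict `attribute` from the verification record, then score the audited value."""
    predicted = model(verification)
    return relative_truth_degree(audited[attribute], predicted)


def toy_income_model(record: Mapping[str, float]) -> float:
    """Hypothetical stand-in for a pre-trained income model (a fixed linear rule)."""
    return record["assets"] / 50 + 1.5 * record["monthly_spend"]


application = {"income": 20000.0}
bank_record = {"assets": 300000.0, "monthly_spend": 6000.0}  # no income field
print(truth_via_model(application, bank_record, "income", toy_income_model))  # 0.75
```

Here the model predicts an income of 15000, so a claimed 20000 scores 0.75. In practice the model would be a trained regressor rather than a fixed rule.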
In this way, the attribute verification apparatus 20 may use each acquired verification data record to calculate the degree of truth of one or more corresponding pieces of attribute information in the data record to be audited, and then combine all the degrees of truth calculated from all the verification data records for the relevant attribute information into the attribute truth degree feature of the data record to be audited. This feature reflects, as a whole, the result of cross-verifying the data record to be audited against all the verification data records used.
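The combination step can be sketched as flattening per-(source, attribute) degrees of truth into a fixed-order vector. The layout below (one slot per pair plus per-attribute mean, minimum and count, with `-1.0` marking a missing value) is an assumption for illustration, not the layout prescribed by the patent.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def truth_feature_vector(truths: Dict[Tuple[str, str], Optional[float]],
                         sources: Sequence[str],
                         attributes: Sequence[str],
                         missing: float = -1.0) -> List[float]:
    """Combine per-(source, attribute) degrees of truth into one feature vector.

    Assumed layout: one slot per (source, attribute) pair, then the mean, the
    minimum and the count of available degrees of truth for each attribute.
    `missing` marks pairs for which no degree of truth could be computed."""
    features: List[float] = []
    for k in sources:
        for j in attributes:
            value = truths.get((k, j))
            features.append(missing if value is None else value)
    for j in attributes:
        seen = [truths[(k, j)] for k in sources if truths.get((k, j)) is not None]
        if seen:
            features.extend([sum(seen) / len(seen), min(seen), float(len(seen))])
        else:
            features.extend([missing, missing, 0.0])
    return features


truths = {("bank", "income"): 0.75, ("bank", "address"): 0.0, ("telecom", "address"): 1.0}
print(truth_feature_vector(truths, ["bank", "telecom"], ["income", "address"]))
# [0.75, 0.0, -1.0, 1.0, 0.75, 0.75, 1.0, 0.5, 0.0, 2.0]
```

A fixed order keeps the vector length constant across records, so the downstream model sees the same schema however many sources actually responded.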
The target prediction apparatus 30 is configured to use a target prediction model trained based on machine learning to obtain, from the calculated attribute truth degree feature, a prediction result of the data record to be audited with respect to the data audit target, where the target prediction model is trained to predict the data audit target of a data record to be audited from its attribute truth degree feature.
Specifically, after the attribute truth degree feature of the data record to be audited is obtained, the target prediction apparatus 30 may predict the data audit target of that record using a machine learning model (i.e., the target prediction model) trained in advance. The target prediction model may be trained with a particular machine learning algorithm using a large amount of historical data as training samples. The historical data consists of historical attribute truth degree features and the corresponding data audit targets: the attribute truth degree features serve as the training sample features, and the corresponding data audit targets serve as labels.
Accordingly, when the attribute truth degree feature of a new data record to be audited arrives, the target prediction apparatus 30 may input it into the target prediction model as the feature of a prediction sample and obtain the prediction result of the new data record with respect to the data audit target.
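The training and prediction steps can be sketched with plain logistic regression, fitted by batch gradient descent in pure Python. Logistic regression is only one stand-in for "a specific machine learning algorithm", chosen here because its output is a probability in [0, 1]; the historical data below is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_target_model(X: Sequence[Sequence[float]], y: Sequence[int],
                       lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent.

    X holds historical attribute truth degree features, y the matching audit
    labels (1 = fraud, 0 = not fraud)."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_fraud_probability(model: Tuple[List[float], float], x: Sequence[float]) -> float:
    """Apply the trained model to the truth degree feature of a new record."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical history: [income truth degree, address truth degree] -> fraud label.
history_X = [[0.95, 1.0], [0.90, 1.0], [0.80, 1.0], [0.20, 0.0], [0.30, 0.0], [0.10, 0.0]]
history_y = [0, 0, 0, 1, 1, 1]
model = train_target_model(history_X, history_y)
print(predict_fraud_probability(model, [0.85, 1.0]))  # low probability of fraud
print(predict_fraud_probability(model, [0.15, 0.0]))  # high probability of fraud
```

Because only truth degree features enter the model, adding a new verification source changes the feature vector but not the shape of this training loop.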
It should be understood that the above devices may each be configured as software, hardware, firmware, or any combination thereof that performs the specified function. They may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware modules. Further, one or more functions implemented by these devices may also be performed jointly by components of a single physical device (e.g., a processor, a client or a server).
It can be seen that the attribute verification apparatus 20 and the target prediction apparatus 30 form a two-layer architecture that uses verification data to predict the data audit target of the data to be audited. In this architecture, the data to be audited and the verification data are fused at the feature level of the target prediction model and/or a self-consistency check is performed, so the original information of the verification data and/or the data to be audited is fully used. Compared with conventional ways of using external data, this approach defines a uniform way of calculating the attribute truth degree feature in the verification stage, which makes adding new data sources easier. Moreover, because the verification data is used only to obtain the degree of truth of the data to be audited, rather than directly as features for predicting the data audit target, the design of the prediction model stays consistent while the original information of each data source is fully cross-fused.
In addition, when the attribute truth degree feature of the data record to be audited is obtained from verification data of other sources, a prediction approach based on machine learning may preferably be introduced: even if a verification data record generated from a data source does not include the attribute information to be audited, a corresponding predicted value can be obtained by machine learning, and the degree of truth of the attribute information to be audited can then be calculated from that predicted value. This makes it easier to bring in external data from various sources to help achieve the final data audit target.
The flow of a method for predicting a data audit target based on machine learning according to an exemplary embodiment of the present invention is described below with reference to FIG. 2. The method shown in FIG. 2 may be performed by the prediction system shown in FIG. 1, may be implemented entirely in software by a computer program, or, as an example, may be performed by a specifically configured computing device.
For convenience of description, assume that the method shown in FIG. 2 is performed by the prediction system shown in FIG. 1, and take fraud concerning the authenticity of data records as the example data audit target. The method can then be applied to detect or identify potential fraud risk in the data to be audited. Fraud risk can be expressed as the probability that a fraud situation occurs; such situations may include, but are not limited to, applying for a credit card under a false identity, fraudulent use of another person's credit card, and transacting with forged personal information of others. The fraud probability can be expressed, for example, as a value in [0, 1].
As shown in FIG. 2, in step S10 the data acquisition device 10 acquires a data record to be audited and at least one verification data record, where the data record to be audited includes attribute information about each attribute of an item to be audited, and the data audit target is a business judgment made by auditing the attribute information included in that record.
Here, as an example, each data record to be audited may correspond to an item to be audited (e.g., an event or object), and accordingly may include various attribute fields reflecting aspects or properties (i.e., attributes) of that event or object. These attribute fields may be filtered or processed to obtain sample features for machine learning. The data acquisition device 10 may collect data manually, semi-automatically or fully automatically, and may perform preliminary processing on the collected raw data to generate processed attribute information usable as sample features. As an example, the data acquisition device 10 may collect data in batches.
Here, the data acquisition device 10 may receive a data record to be audited entered manually by a user through an input device (e.g., a workstation). Alternatively, it may retrieve data records to be audited from a data source in a fully automatic manner, for example by systematically requesting the data source and obtaining the requested data from the response via a timer mechanism implemented in software, firmware, hardware or a combination thereof. The data sources may include one or more databases or other servers. Fully automatic acquisition may be carried out over an internal and/or external network, which may include transmitting encrypted data over the Internet. When servers, databases, networks, etc. are configured to communicate with one another, data collection can be automated without human intervention, although some user input operations may still be involved. The semi-automatic mode lies between the manual and fully automatic modes. It differs from the fully automatic mode in that a user-activated trigger mechanism replaces the timer mechanism, so a request to extract data is generated only when a specific user input is received. Each time data is acquired, the captured data may preferably be stored in non-volatile memory. As an example, a data warehouse may be used to store both the raw data collected during acquisition and the processed data.
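The difference between the fully automatic and semi-automatic modes can be sketched as a collector whose requests are driven either by a timer or by an explicit user trigger. The class `RecordCollector`, its methods and the in-memory `store` (standing in for non-volatile storage) are hypothetical; the sleep function is injectable so the timer can be simulated.

```python
import time
from typing import Callable, List


class RecordCollector:
    """Hypothetical collector contrasting timer-driven (fully automatic) and
    user-triggered (semi-automatic) acquisition of data records."""

    def __init__(self, fetch: Callable[[], List[dict]],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.fetch = fetch            # issues one request to the data source
        self.sleep = sleep            # injectable so the timer can be simulated
        self.store: List[dict] = []   # stands in for non-volatile storage

    def run_scheduled(self, rounds: int, interval_s: float) -> None:
        """Fully automatic mode: request the source on a fixed timer."""
        for i in range(rounds):
            self.store.extend(self.fetch())
            if i < rounds - 1:
                self.sleep(interval_s)  # wait for the next timer tick

    def on_user_trigger(self) -> None:
        """Semi-automatic mode: a request is generated only on user input."""
        self.store.extend(self.fetch())


# Example with a fake data source and a no-op timer.
counter = iter(range(1, 100))
collector = RecordCollector(lambda: [{"record_id": next(counter)}], sleep=lambda s: None)
collector.run_scheduled(rounds=3, interval_s=60.0)
collector.on_user_trigger()
print([r["record_id"] for r in collector.store])  # [1, 2, 3, 4]
```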
Unlike existing data auditing schemes that acquire and analyze only the information to be audited, according to an exemplary embodiment of the present invention the data acquisition device 10 also acquires at least one verification data record for verifying at least one piece of attribute information about at least one of the attributes.
As an example, the at least one verification data record may be generated from respective data sources. Specifically, the verification data may come from the same source as the data to be audited: for information filled in when a customer applies to a bank for a credit card, such as income, educational background, position and property, the data acquisition device 10 may acquire other records of the customer at that bank, such as loan records and daily transaction data. The verification data may also come from other private or public sources, such as data providers, the Internet (e.g., social networking sites), mobile operators, app operators, courier companies and credit agencies. Accordingly, to check the information a customer fills in when applying to a bank for a credit card, the data acquisition device 10 may further acquire the customer's records at social networking sites, courier companies, mobile operators, app operators and the like.
In the exemplary embodiment of the invention, by fusing other verification data at the feature level, the original information of the data can be fully used to jointly verify the authenticity of the data to be audited. The data acquisition device 10 may query and retrieve, for the data record to be audited, the corresponding verification data records from the data source of the data to be audited and/or from at least one other data source; alternatively, the data acquisition device 10 may query and retrieve the corresponding verification data records from a data warehouse that maintains data records from the data source of the data to be audited and/or at least one other data source.
Furthermore, as another example, the at least one verification data record may be extracted from the data record to be audited itself. The data acquisition device 10 may extract a verification data record corresponding to a predetermined attribute by excluding the attribute information of that attribute from the data record to be audited; this verification data record can then be used for a self-consistency check of the attribute information of the predetermined attribute. For example, to check the income in the information a customer fills in when applying to a bank for a credit card, assuming the data record to be audited includes four pieces of attribute information (income, educational background, position and property), the corresponding verification data record may include at least one of educational background, position and property, but not income. Thus, besides the predetermined attribute, other attribute information may also be excluded from the data record to be audited, retaining only the attribute information that can later serve as prediction sample features of the attribute verification model. In practice, data records to be audited usually contain more attribute information, and at least one verification data record can be extracted in this way for each kind of attribute information to be audited. Further, according to exemplary embodiments of the present invention, the verification data records may include both records generated from data sources and records extracted from the data record to be audited.
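The extraction for a self-consistency check can be sketched as a dictionary filter that always drops the attribute under audit and optionally keeps only the attributes a verification model uses. The function name `extract_verification_record` and the `keep` parameter are illustrative assumptions.

```python
from typing import Dict, Iterable, Mapping, Optional


def extract_verification_record(record: Mapping[str, object], target: str,
                                keep: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Build a self-consistency verification record for the `target` attribute.

    The target attribute is always dropped; if `keep` is given, only those
    attributes are retained (e.g. the features an attribute verification model uses)."""
    allowed = None if keep is None else set(keep)
    return {name: value for name, value in record.items()
            if name != target and (allowed is None or name in allowed)}


application = {"income": 20000, "education": "bachelor",
               "position": "engineer", "property": "owns_home"}
print(extract_verification_record(application, "income"))
# {'education': 'bachelor', 'position': 'engineer', 'property': 'owns_home'}

# One verification record per attribute to be audited.
per_attribute = {a: extract_verification_record(application, a) for a in application}
```

Even if `keep` lists the target attribute, it is still excluded, so a self-extracted record can never confirm the value it is meant to check.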
Optionally, the data acquisition device 10 may store and/or process the acquired data on a hardware cluster (such as a Hadoop cluster), for example for storage, sorting and other offline operations. The data acquisition device 10 may also perform online stream processing on the acquired data.
By way of example, a data conversion module, such as a text analysis module, may be included in the data acquisition device 10 for converting unstructured data, such as text, into more readily available structured data for further processing or reference. Text-based data may include emails, documents, web pages, graphics, spreadsheets, call center logs, suspicious transaction reports, and the like.
According to an exemplary embodiment of the present invention, the data acquisition device 10 may optionally perform feature engineering on the acquired data records, i.e., process the attribute field values of the records to obtain attribute information usable as sample features for machine learning. For example, the data acquisition device 10 may apply discretization, field combination, extraction of partial field values, rounding and other feature engineering operations to the original attribute fields of the received data records, converting the original attribute values into attribute information usable as machine learning features.
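The four named operations (discretization, rounding, field combination and extraction of partial field values) can be sketched on a single raw record. The field names, bucket edges and rounding granularity are hypothetical choices for illustration.

```python
import bisect
from typing import Dict, Mapping, Sequence


def discretize(value: float, edges: Sequence[float]) -> int:
    """Index of the bucket containing `value`; `edges` must be sorted ascending."""
    return bisect.bisect_right(edges, value)


def engineer(record: Mapping[str, object]) -> Dict[str, object]:
    """Hypothetical feature engineering over a raw credit card application record."""
    return {
        # Discretization: income into bands (edges are illustrative).
        "income_band": discretize(float(record["income"]), [5000, 10000, 20000, 50000]),
        # Rounding: age to the nearest multiple of 5.
        "age_rounded": 5 * round(float(record["age"]) / 5),
        # Field combination: city and position as one categorical value.
        "city_x_position": f"{record['city']}|{record['position']}",
        # Partial field value: the first three digits of the phone number.
        "phone_prefix": str(record["phone"])[:3],
    }


raw = {"income": 20000, "age": 37, "city": "Beijing",
       "position": "engineer", "phone": "13812345678"}
print(engineer(raw))
```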
In step S20, the attribute verification apparatus 20 calculates the attribute truth degree feature of the data record to be audited based on the at least one verification data record, where the attribute truth degree feature measures the degree of truth of the at least one piece of attribute information.
Specifically, the attribute verification apparatus 20 may obtain one or more degrees of truth for any attribute information of the data record to be audited based on one or more verification data records.
Assume the set of data records to be audited is $X$ and each record has $d$ pieces of attribute information, so that the $i$-th data record to be audited $X_i \in X$ can be written as $X_i = (x_{i1}; x_{i2}; \ldots; x_{id})$, where $i$ and $d$ are positive integers. For the $j$-th attribute information $x_{ij}$ of $X_i$ ($1 \le j \le d$), the attribute verification apparatus 20 may obtain the degree of truth of $x_{ij}$ based on the $i$-th verification data record $Mk_i$ among the verification data $Mk$ from the $k$-th source (a source being a data source from which verification data is generated and/or a particular way of extracting verification data from the data to be audited), where $1 \le k \le K$ and $K$ is the total number of verification data sources. $Mk_i$ and $X_i$ denote the corresponding data records of the same instance (the $i$-th instance) at different data sources (the $k$-th source and the source of the data to be audited, respectively).
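Pairing $X_i$ with $Mk_i$ presupposes that records of the same instance can be matched across sources. A minimal sketch, assuming each record is keyed by a shared instance identifier (e.g. a customer ID); the identifier and the function `align_by_instance` are hypothetical, since the patent only states that the records describe the same instance.

```python
from typing import Dict, Mapping, Optional, Tuple

Record = Dict[str, object]


def align_by_instance(audited: Mapping[str, Record],
                      sources: Mapping[str, Mapping[str, Record]]
                      ) -> Dict[str, Tuple[Record, Dict[str, Optional[Record]]]]:
    """For each audited record X_i (keyed by instance ID), collect the record Mk_i
    of the same instance from every verification source k; None if source k lacks it."""
    return {
        instance_id: (x_i, {k: records.get(instance_id) for k, records in sources.items()})
        for instance_id, x_i in audited.items()
    }


audited = {"c001": {"income": 20000, "education": "bachelor"}}
sources = {"bank": {"c001": {"assets": 300000}},
           "resume": {"c002": {"education": "master"}}}
print(align_by_instance(audited, sources)["c001"])
# ({'income': 20000, 'education': 'bachelor'}, {'bank': {'assets': 300000}, 'resume': None})
```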
As an example, the attribute verification apparatus 20 may obtain, based on the at least one verification data record, at least one piece of predicted attribute information for each of at least one attribute of the data record to be audited.
For example, the attribute verification apparatus 20 may take the attribute information about an attribute that is itself contained in a verification data record generated from a data source as predicted attribute information about that attribute. For instance, to obtain a predicted value for the educational background in the information a customer fills in when applying for a credit card, if a verification record from a recruitment website (e.g., the customer's resume) includes educational background, the attribute verification apparatus 20 may use the educational background in the resume as the predicted value for the educational background in the customer's application to be audited.
For another example, the attribute verification apparatus 20 may input the at least one verification data record into at least one attribute verification model trained based on machine learning to predict at least one piece of predicted attribute information about each of the at least one attribute. The verification data record here may originate from another data source and may or may not contain attribute information about the attribute to be predicted. Alternatively, it may be a record extracted from the data record to be audited, which preferably does not contain the attribute information to be predicted.
Specifically, the attribute verification apparatus 20 may predict the attribute information of each attribute of the data record to be audited using one or more pre-trained attribute verification models, each trained to predict a given piece of attribute information to be audited from particular verification data. As an example, for verification data records from an external data source, the attribute verification model may use hash-transformed attribute information of the verification data as its features. In that case, the verification data that the data acquisition device 10 obtains from the external data source can be hash-transformed, non-plaintext data, which helps ensure security and confidentiality in data sharing.
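The patent does not say which hash transformation is used. One common option, sketched below as an assumption, is keyed hashing (HMAC-SHA256 from the Python standard library) of each "attribute=value" token followed by the hashing trick, so the provider can share bucket indices instead of plaintext. The shared key and bucket count are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Mapping


def hash_features(record: Mapping[str, object], key: bytes,
                  n_buckets: int = 2 ** 20) -> Dict[int, float]:
    """Map plaintext attribute information to hashed, non-plaintext feature indices.

    Each "attribute=value" token is keyed-hashed with HMAC-SHA256 under a secret
    shared by the data provider and the auditor, then reduced to a bucket index
    (the hashing trick). Colliding tokens simply add up in the same bucket."""
    features: Dict[int, float] = {}
    for name, value in record.items():
        token = f"{name}={value}".encode("utf-8")
        digest = hmac.new(key, token, hashlib.sha256).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        features[index] = features.get(index, 0.0) + 1.0
    return features


provider_record = {"city": "Beijing", "employer": "ACME", "plan": "premium"}
print(hash_features(provider_record, key=b"shared-secret", n_buckets=1024))
```

A keyed hash rather than a bare hash makes it harder for a third party to recover low-cardinality values (such as a city name) by hashing candidate values; the attribute verification model is then trained on the same hashed indices.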
For example, for each $x_{ij}$, the attribute verification apparatus 20 may select a previously trained attribute verification model that predicts the $j$-th attribute information from verification data $Mk$ of the $k$-th source, and input the $i$-th verification data record $Mk_i$ into that model as a prediction sample to obtain the predicted value $Mk_{ij}$ of the $j$-th attribute information of $X_i$. Any verification data record may be used to predict one or more pieces of attribute information to be audited; preferably, a verification data record extracted from the data record to be audited is used only to predict attribute information it does not contain. Accordingly, the attribute verification apparatus 20 may obtain predicted values of the respective attribute information from the verification data of some or all of the $K$ sources; that is, for the attribute information of different attributes, it may use one or more corresponding attribute verification models that predict that attribute information from verification data of one or more sources.
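Selecting a model per source $k$ and attribute $j$ can be sketched as a registry keyed by `(k, j)`. The registry layout, the `self_extracted` parameter and the example lambdas are hypothetical stand-ins for trained attribute verification models.

```python
from typing import Callable, Collection, Dict, Mapping, Tuple

Record = Mapping[str, object]
Model = Callable[[Record], object]


def predict_attributes(verification_by_source: Mapping[str, Record],
                       models: Mapping[Tuple[str, str], Model],
                       self_extracted: Collection[str] = ()) -> Dict[Tuple[str, str], object]:
    """Return {(k, j): Mk_ij} for every trained model whose source has a record.

    `models[(k, j)]` is a hypothetical attribute verification model that predicts
    attribute j from a source-k record. Sources listed in `self_extracted` hold
    records extracted from the audited record itself and may only predict
    attributes they do not contain."""
    predictions: Dict[Tuple[str, str], object] = {}
    for (k, j), model in models.items():
        record = verification_by_source.get(k)
        if record is None:
            continue  # source k has no record for this instance
        if k in self_extracted and j in record:
            continue  # a self-extracted record must not predict a value it already holds
        predictions[(k, j)] = model(record)
    return predictions


models = {
    ("bank", "income"): lambda r: r["assets"] / 50,
    ("telecom", "address"): lambda r: r["billing_city"],
    ("self:income", "income"): lambda r: 18000.0,
    ("self:income", "education"): lambda r: "master",
}
records = {"bank": {"assets": 300000},
           "self:income": {"education": "bachelor", "position": "engineer"}}
print(predict_attributes(records, models, self_extracted={"self:income"}))
# {('bank', 'income'): 6000.0, ('self:income', 'income'): 18000.0}
```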
It should be noted that the ways in which the attribute verification apparatus 20 obtains predicted attribute information are not limited to these two. Preferably, the attribute verification apparatus 20 can effectively combine them.
As an example, when a verification data record generated from a data source, which is to be input into a given attribute verification model, itself includes attribute information about the attribute that model is meant to predict, the attribute verification apparatus 20 takes that attribute information as the predicted attribute information of the model; when the verification data record does not include such attribute information, the attribute verification apparatus 20 inputs the verification data record into the attribute verification model.
For example, in the case above where the educational background in a credit card application is predicted from resume data, there may also be an attribute verification model that predicts educational background from resume data. The attribute verification apparatus 20 can then choose the prediction mode by checking whether a particular resume record contains an educational background field. If it does, the attribute verification apparatus 20 may use that value directly as the predicted value of the educational background to be audited; if it does not, the attribute verification apparatus 20 may input the resume record into the pre-trained attribute verification model for educational background to obtain a predicted value by machine learning.
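The presence-based choice between the two modes can be sketched as a small dispatch function. Treating a `None` value as "field absent" and the stand-in `toy_education_model` are assumptions for illustration.

```python
from typing import Callable, Mapping

Record = Mapping[str, object]


def predicted_attribute(verification: Record, attribute: str,
                        model: Callable[[Record], object]) -> object:
    """Return the record's own value for `attribute` if present (not None);
    otherwise fall back to the attribute verification model."""
    value = verification.get(attribute)
    if value is not None:
        return value  # the verification record already states the attribute
    return model(verification)  # missing: predict it from the other fields


def toy_education_model(resume: Record) -> object:
    """Hypothetical stand-in for a pre-trained educational background model."""
    return "master" if int(resume.get("years_in_research", 0)) >= 3 else "bachelor"


print(predicted_attribute({"education": "PhD"}, "education", toy_education_model))    # PhD
print(predicted_attribute({"years_in_research": 4}, "education", toy_education_model))  # master
```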
Furthermore, using a verification data record to predict attribute information of the data to be audited is quite likely to run into a time mismatch: a resume record, for example, may have been created long ago and may not accurately reflect the customer's income at the time of the credit card application. The attribute verification apparatus 20 may therefore preferably choose the prediction mode based on a time factor.
Specifically, suppose a verification data record generated from a data source, and destined for a given attribute verification model, itself contains attribute information for the attribute this model is meant to predict. If the difference between the time at which the verification data record was generated and the time at which the data record to be audited was generated does not exceed a preset time interval, the attribute verification device 20 uses the attribute information contained in the verification data record as the model's predicted attribute information. If the difference exceeds the time interval, the attribute verification device 20 inputs the verification data record into the attribute verification model.
For example, when the income field in credit card application information is predicted from resume data, a particular resume data record may itself contain an income field. Even then, the attribute verification device 20 determines whether the difference between the time at which this resume data record was generated and the time at which the application information to be audited was generated exceeds the preset time interval. If it does not, the attribute verification device 20 may use the income information in the resume data record directly as the predicted value of the income information to be audited. If it does, the attribute verification device 20 may input the resume data record into a pre-trained attribute verification model that predicts income from resume data, and so obtain a predicted value of the income information through machine learning prediction.
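The time-aware variant adds one condition to the rule: the record's own value is trusted only when it exists and is recent enough. The sketch below is illustrative only. It assumes that each verification record carries a generation timestamp under a hypothetical `generated_at` key, and that the preset interval is a `timedelta`. The function and model names are also hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Mapping


def predict_attribute_time_aware(
    verification_record: Mapping[str, Any],
    attribute: str,
    audit_record_time: datetime,
    max_gap: timedelta,
    model: Callable[[Mapping[str, Any]], Any],
) -> Any:
    """Use the record's own value only if it exists and is recent enough."""
    value = verification_record.get(attribute)
    generated_at = verification_record["generated_at"]  # hypothetical field name
    if value is not None and abs(audit_record_time - generated_at) <= max_gap:
        return value
    # Missing or stale: predict the value with the attribute verification model.
    return model(verification_record)


audit_time = datetime(2019, 4, 1)
one_year = timedelta(days=365)
stub_income_model = lambda record: 12000  # hypothetical stand-in for a trained model
fresh = {"income": 9000, "generated_at": datetime(2019, 1, 1)}
stale = {"income": 5000, "generated_at": datetime(2014, 1, 1)}
print(predict_attribute_time_aware(fresh, "income", audit_time, one_year, stub_income_model))  # 9000
print(predict_attribute_time_aware(stale, "income", audit_time, one_year, stub_income_model))  # 12000
```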
After obtaining the predicted attribute information for each attribute to be audited, the attribute verification device 20 may calculate, for each attribute, the difference between that attribute's information and each of the at least one piece of predicted attribute information. It then obtains the attribute truth feature of the data record to be audited from the set of differences calculated for the at least one attribute. As a preferred mode, each difference may be expressed in terms of the statistical distribution of the differences between the corresponding attribute's information and its predicted attribute information. The resulting attribute truth feature then directly reflects the statistical characteristics of the prediction differences for each kind of attribute information. In other words, the prediction differences are converted more effectively into features that machine learning can use, which further supports the prediction performance of the model that consumes these features.
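One possible reading of "statistical distribution characteristics" is to replace a raw difference by its position in the empirical distribution of absolute differences observed on historical records. The sketch below takes that approach as an assumption; the text above does not fix a particular transform. The names `make_cdf_transform` and `income_cdf` and the reference values are hypothetical. A higher output means the difference is larger than most historical differences, i.e. the attribute looks less consistent.

```python
from bisect import bisect_right
from typing import Callable, Iterable


def make_cdf_transform(reference_differences: Iterable[float]) -> Callable[[float], float]:
    """Build f(d) = share of historical |differences| that are <= |d|."""
    ref = sorted(abs(d) for d in reference_differences)
    if not ref:
        raise ValueError("reference distribution is empty")

    def transform(difference: float) -> float:
        # Position of |difference| within the sorted historical values, scaled to [0, 1].
        return bisect_right(ref, abs(difference)) / len(ref)

    return transform


# Hypothetical historical income differences (predicted minus declared).
income_cdf = make_cdf_transform([0, 500, 1000, 1500, 2000, 3000, 4000, 6000, 8000, 10000])
print(income_cdf(-1200))  # 0.3
print(income_cdf(9000))   # 0.9
```

Sorting once in the factory keeps each lookup at O(log n).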
For example, consider the $j$-th attribute information $x_{ij}$ of the $i$-th data record to be audited, $x_i$. For each of one or more verification data records $Mk_i$, the attribute verification device 20 may calculate the difference between the attribute value $Mk_{ij}$ predicted from $Mk_i$ and $x_{ij}$, and use these one or more differences as truth measures of $x_{ij}$. After calculating the individual differences for all attributes to be audited, the attribute verification device 20 may use the set of these differences as the attribute truth feature of the data record to be audited, $x_i$.
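The set of differences $Mk_{ij} - x_{ij}$ can be collected as a dictionary keyed by (source $k$, attribute $j$). This is a minimal sketch. It assumes numeric attributes and that the predictions per source have already been produced, either by the models or taken directly from the records as described earlier. The name `attribute_differences` and the sample sources are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def attribute_differences(
    audit_record: Mapping[str, float],
    predictions: Mapping[str, Mapping[str, float]],
) -> Dict[Tuple[str, str], float]:
    """Return {(source k, attribute j): Mk_ij - x_ij} for every available prediction.

    audit_record: attribute j -> declared value x_ij
    predictions:  source k -> {attribute j -> predicted value Mk_ij}
    """
    diffs: Dict[Tuple[str, str], float] = {}
    for source, predicted in predictions.items():
        for attribute, predicted_value in predicted.items():
            if attribute in audit_record:  # only attributes that are actually audited
                diffs[(source, attribute)] = predicted_value - audit_record[attribute]
    return diffs


x_i = {"income": 10000.0, "age": 30.0}
predictions = {"resume": {"income": 8000.0}, "telecom": {"income": 10500.0, "age": 31.0}}
print(attribute_differences(x_i, predictions))
# {('resume', 'income'): -2000.0, ('telecom', 'income'): 500.0, ('telecom', 'age'): 1.0}
```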
FIG. 3 illustrates an example of obtaining the attribute truth feature of a data record to be audited according to an exemplary embodiment of the present invention. In this example, all verification data records are generated from the respective data sources, and none are extracted from the data record to be audited. This is merely an exemplary embodiment, and the verification data records used by the present invention are not limited thereto.
As shown in FIG. 3, the data record to be audited may be the personal information a customer fills in when applying for a credit card, and may include attribute information such as the customer's gender, address, and education level. In this case, the attribute verification device 20 may obtain the attribute truth feature of the data record to be audited using verification data records from various external data sources. In this process the customer's data to be audited is fully fused with the customer's other verification data, so the data record to be audited can subsequently be audited more effectively.
As an example, the attribute verification device 20 may use social networking site data to obtain estimated truth values for the customer's gender and education level. An estimated truth value represents the difference between the attribute information to be audited and the predicted attribute information. According to exemplary embodiments of the present invention, the attribute verification device 20 may use verification data from any single source, as needed, to obtain estimated truth values for one or more pieces of attribute information of the data to be audited, without limitation. For example, the attribute verification device 20 may use social networking site data to obtain only the estimated truth value of the customer's gender, or to obtain estimated truth values for all of the customer's attribute information. The same applies to verification data from other sources.
The following describes, as an example, how the attribute verification device 20 obtains the estimated truth value of the customer's gender from social networking site data.
Specifically, assume that the social networking site is the $k$-th verification data source. The goal is the estimated truth value of the $j$-th attribute $x_{ij}$ (i.e., gender) of the $i$-th customer's data record to be audited, $x_i$. To obtain it, the attribute verification device 20 may first retrieve the corresponding social networking site data record $Mk_i$ provided by the data acquisition device 10, where $x_i$ and $Mk_i$ may share the same user ID. As an example, the attribute verification device 20 may then input $Mk_i$ as a prediction sample into a pre-trained model that predicts a customer's gender from social networking site data, obtaining the predicted gender $Mk_{ij}$ of the $i$-th customer. The attribute verification device 20 may then derive an estimated truth value $Tk_{ij}$ of $x_{ij}$ from the difference between $Mk_{ij}$ and $x_{ij}$, for example $Tk_{ij} = f_{kj}(Mk_{ij} - x_{ij})$. Here the function $f_{kj}$ converts the difference between $Mk_{ij}$ and $x_{ij}$ into a numerical value that appropriately reflects the degree of truth. Preferably, $f_{kj}$ may be designed to reflect the statistical distribution of $(Mk_{ij} - x_{ij})$ over all or part of the customer sample space. The present invention is not limited thereto, however, and $f_{kj}$ may be designed in any suitable manner that meets the requirements of model computation and/or model performance. For example, for the gender attribute, $Tk_{ij} = f_{kj}(Mk_{ij} - x_{ij}) = a_1$ when $Mk_{ij}$ and $x_{ij}$ are equal (i.e., $Mk_{ij} - x_{ij} = 0$), and $Tk_{ij} = f_{kj}(Mk_{ij} - x_{ij}) = a_2$ when they are not equal (i.e., $Mk_{ij} - x_{ij} \neq 0$), where $a_1$ and $a_2$ are constants chosen to facilitate numerical computation.
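For a categorical attribute such as gender, "$Mk_{ij} - x_{ij} = 0$" can be read as "the two categories match", and $f_{kj}$ then returns one of two constants. The sketch below is illustrative only. It assumes string-valued categories. The default constants $a_1 = 1.0$ and $a_2 = -1.0$ are an arbitrary choice, since the text leaves them open, and the name `categorical_truth` is hypothetical.

```python
def categorical_truth(predicted: str, declared: str, a1: float = 1.0, a2: float = -1.0) -> float:
    """Tk_ij = f_kj(Mk_ij - x_ij) for a categorical attribute such as gender.

    A "difference of 0" is interpreted as "the predicted and declared categories match".
    a1 and a2 are arbitrary constants; these defaults are only an illustration.
    """
    # Normalise trivial formatting differences (case, surrounding spaces) first.
    same = predicted.strip().lower() == declared.strip().lower()
    return a1 if same else a2


print(categorical_truth("F", "f"))  # 1.0
print(categorical_truth("M", "F"))  # -1.0
```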
It should be noted that the above example of estimating the truth value of gender only serves to describe and explain exemplary embodiments of the present invention and does not limit its scope. In fact, for any attribute information of the data to be audited, those skilled in the art can select suitable verification data from one or more data sources to predict that attribute, and apply appropriate numerical processing to the difference between the predicted value and the actual value. This yields one or more truth values that reflect the authenticity of the attribute information.
For example, when the attribute verification device 20 uses social networking site data to obtain the estimated truth value of a user's address, the actual and predicted address values can each be expressed as geographic coordinates. The difference between them can then be converted into an appropriate truth value by the function $f_{kj}$.
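One way to turn two coordinates into a truth value is to take their great-circle (haversine) distance and map it into $(0, 1]$ with an exponential decay. The decay form and the `scale_km` parameter are assumptions made for illustration; the text only requires that $f_{kj}$ convert the difference into a suitable value. The names `haversine_km` and `address_truth` are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def address_truth(predicted: Tuple[float, float], declared: Tuple[float, float],
                  scale_km: float = 5.0) -> float:
    """Map the distance to (0, 1]: 1.0 for identical points, decaying with distance."""
    return math.exp(-haversine_km(predicted, declared) / scale_km)


beijing, shanghai = (39.9042, 116.4074), (31.2304, 121.4737)
print(round(haversine_km(beijing, shanghai)))  # great-circle distance in km
print(address_truth(beijing, beijing))         # 1.0
```

The scale parameter controls how quickly the truth value falls off. In practice it would be tuned to how precise addresses typically are in the verification source.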
In a similar manner, the attribute verification device 20 may verify one or more pieces of attribute information of the data record to be audited using data from other external data sources, such as courier company data, app data, and mobile carrier data. For example, the attribute verification device 20 may select the attribute verification model that corresponds to both the data source of the verification data and the attribute to be predicted, and use it to predict that attribute from the features of the verification data record. It then obtains a truth value reflecting the authenticity of the attribute from the difference between the predicted value and the actual value.
Alternatively, the verification data of an external data source may already contain attribute information for an attribute of the data to be audited whose authenticity is to be judged. In that case, the attribute verification device 20 may choose not to use the attribute verification model to obtain the truth value for that attribute. For example, if the social networking site data record $Mk_i$ already contains the gender of the $i$-th customer, the attribute verification device 20 may, without using the attribute verification model, directly take the gender contained in $Mk_i$ as the predicted gender $Mk_{ij}$ of the $i$-th customer. It then obtains a truth value from the difference between the predicted value $Mk_{ij}$ and the actual value $x_{ij}$. If the social networking site data record $Mk_i$ lacks the gender attribute (i.e., the data record itself is incomplete), the attribute verification device 20 may predict the gender of the $i$-th customer using a pre-trained attribute verification model. That model predicts a customer's gender from at least one attribute of the social networking site data, which may include gender or attributes other than gender (or attributes that have undergone hashing and/or feature engineering). In this case, the attribute verification device 20 may input the social networking site data record $Mk_i$ that lacks the gender value into the attribute verification model as a prediction sample, so that the customer's gender is predicted from the features of the prediction sample.
In this manner, the attribute verification device 20 can obtain, from verification data of different sources, at least one truth value for each attribute of the corresponding data record to be audited. On this basis, the attribute verification device 20 can combine all obtained truth values into the attribute truth feature of the data record to be audited.
Here, as an example, an attribute verification model that predicts a given attribute of the data to be audited from verification data of a given source may be a prediction model trained with machine learning techniques. Consider data records whose authenticity has been confirmed historically, for example records whose authenticity was confirmed through the same kind of data audit and on which a business judgment was made, such as application information filled in by credit card applicants historically confirmed as non-fraudulent. Each attribute of such data may be regarded as a true value. These true values serve as labels in supervised learning, and the attributes of the corresponding verification data record serve as the features. Together they are used to train an attribute verification model that predicts the attribute value from the verification data.
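The training setup above pairs each confirmed attribute value (the label) with the fields of the same user's verification record (the features). The sketch below builds those pairs and stops there, leaving the choice of learner open. It assumes records are dictionaries joined on a hypothetical `user_id` key. The function name and the sample field `posts_per_week` are hypothetical as well.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def build_attribute_training_set(
    confirmed_records: Iterable[Mapping[str, Any]],
    verification_by_user: Mapping[Any, Mapping[str, Any]],
    target_attribute: str,
    id_key: str = "user_id",
) -> List[Tuple[Dict[str, Any], Any]]:
    """Return (features, label) pairs for one attribute verification model.

    label    = the attribute value from a record whose authenticity was confirmed
    features = the fields of the same user's verification data record
    """
    samples: List[Tuple[Dict[str, Any], Any]] = []
    for record in confirmed_records:
        verification = verification_by_user.get(record[id_key])
        if verification is None or target_attribute not in record:
            continue  # nothing to learn from for this user
        features = {k: v for k, v in verification.items() if k != id_key}
        samples.append((features, record[target_attribute]))
    return samples


confirmed = [{"user_id": 1, "gender": "F"}, {"user_id": 2, "gender": "M"}, {"user_id": 3, "gender": "F"}]
social = {1: {"user_id": 1, "posts_per_week": 12}, 2: {"user_id": 2, "posts_per_week": 3}}
print(build_attribute_training_set(confirmed, social, "gender"))
# [({'posts_per_week': 12}, 'F'), ({'posts_per_week': 3}, 'M')]
```

User 3 is skipped because no verification record exists for that user.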
These attribute verification models may be pre-trained by the attribute verification device 20. Alternatively, they may be pre-trained by a model training device (not shown) in the system shown in FIG. 1, or by an external device outside that system. In the latter cases, the attribute verification device 20 may receive the trained attribute verification models from the model training device or the external device.
Although the above illustrates obtaining the attribute truth feature from verification data records generated from various data sources, other types of verification data records may also be used in the present invention. One example is verification data records extracted from the data record to be audited, and different types of verification data records may also be used together to obtain the attribute truth feature. As an example, for each verification data record, a corresponding attribute verification model or other means may be used to determine a predicted value for the corresponding attribute to be audited.
Referring again to FIG. 2, in step S30 the target prediction device 30 uses a target prediction model trained based on machine learning to obtain, from the calculated attribute truth feature, a prediction result of the data record to be audited with respect to the data audit target. The target prediction model is trained to predict the data audit target for a corresponding data record to be audited based on the attribute truth feature.
As described above, the exemplary embodiments of the present invention differ from the conventional approach of directly reviewing each piece of attribute information of a data record to be audited by hand. They also do not simply expand the data to be audited, for example by introducing external data about the user to predict the audit target directly. Instead, the authenticity of the data to be audited is first verified from different angles using verification data from various sources, and the data audit target is then predicted from the truth values obtained through this multi-party verification. As a result, the features of the target prediction model fully fuse information from various sources (including information from the data to be audited and/or from the various data sources), which significantly improves prediction performance for the audit target.
As an example, for the $i$-th data record to be audited, $x_i$, the attribute verification device 20 can, as described above, obtain the truth value $Tk_{ij}$ of each piece of attribute information of $x_i$ and form the attribute truth feature from these values, where $k$ is the index of the verification data source and $j$ is the index of the attribute of the data to be audited. The target prediction device 30 may then input a prediction sample composed of these truth features into the target prediction model to obtain the prediction result for the data audit target.
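Before the truth values $Tk_{ij}$ go into the target prediction model, they need a fixed layout so that the same position always means the same (source, attribute) pair. The sketch below shows one such layout. It assumes missing pairs are filled with a default value and, optionally, flagged with indicator columns; neither detail is specified in the text. The name `truth_feature_vector` and the sample keys are hypothetical.

```python
from typing import List, Mapping, Sequence, Tuple


def truth_feature_vector(
    truth: Mapping[Tuple[str, str], float],
    feature_order: Sequence[Tuple[str, str]],
    missing: float = 0.0,
    add_indicators: bool = True,
) -> List[float]:
    """Lay out Tk_ij in a fixed (source k, attribute j) order for the target model."""
    values = [truth.get(key, missing) for key in feature_order]
    if add_indicators:
        # 1.0 marks a (k, j) pair with no truth value, so the model can tell it from a real 0.
        values += [0.0 if key in truth else 1.0 for key in feature_order]
    return values


order = [("social", "gender"), ("social", "education"), ("telecom", "address")]
T_i = {("social", "gender"): 1.0, ("telecom", "address"): 0.37}
print(truth_feature_vector(T_i, order))  # [1.0, 0.0, 0.37, 0.0, 1.0, 0.0]
```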
As an example, suppose the data audit target is a business judgment of whether a credit card application is fraudulent, based on the personal information the customer filled in when applying. Specifically, it must be judged whether the information filled in by the customer contains false information. To this end, the attribute verification device 20 may obtain truth values for the customer's personal information from verification data of various sources and combine them into the attribute truth feature of the customer's record to be audited. The target prediction device 30 then inputs the attribute truth feature as a prediction sample into the target prediction model and obtains a prediction of whether the customer is suspected of application fraud.
Here, the target prediction model may be a prediction model trained with machine learning techniques. Consider data records that historically received a data audit result, for example records finally determined to be fraudulent or non-fraudulent. Their audit results serve as labels in supervised learning and the corresponding attribute truth features serve as training sample features. From these, a target prediction model that predicts the data audit target from the attribute truth feature can be trained.
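Any supervised learner fits this description. The sketch below uses plain logistic regression trained by full-batch gradient descent, written without third-party libraries. Logistic regression is a swapped-in choice for illustration; the text does not name an algorithm. Labels follow a hypothetical convention of 1 for "confirmed fraud" and 0 otherwise, and the training rows are made-up truth features in which higher values mean more consistent.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Full-batch gradient descent on log loss. y[i] = 1 means 'confirmed fraud'."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Rows are attribute truth features (higher = more consistent); labels are past audit results.
X = [[1.0, 1.0], [0.9, 0.8], [1.0, 0.9], [0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.05, 0.1]) > 0.5)  # True
```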
According to exemplary embodiments of the present invention, the data audit target is not limited to fraud questions about the authenticity of the attribute information of the data to be audited. It may also be another business judgment related to that authenticity, or a further business judgment based on the authenticity of the attribute information together with other information. In this case, the target prediction model may be trained to predict the data audit target from the attribute truth feature of the data record to be audited together with at least part of that record's attribute information. For example, the target prediction model may be trained to predict a credit card's credit limit from the attribute truth feature and the content of certain attribute information. Accordingly, in step S30, the target prediction device 30 may obtain the prediction result of the data record to be audited with respect to the data audit target from two inputs: the calculated attribute truth feature, and attribute features derived from at least part of the attribute information of the data record to be audited.
Similarly, the target prediction model may be pre-trained by the target prediction device 30. Alternatively, it may be pre-trained by a model training device (not shown) in the system shown in FIG. 1, or by an external device outside that system. In the latter cases, the target prediction device 30 may receive the trained target prediction model from the model training device or the external device.
After the prediction result for the data audit target is obtained in step S30, the target prediction device 30 may store it in a corresponding memory for later processing, or transmit it to an external investigation device. The prediction result may also be presented to the auditing user through an output device, and, optionally, feedback may be received from the auditing user.
FIG. 4 illustrates a block diagram of a system for predicting data audit targets based on machine learning according to another exemplary embodiment of the present invention. The data acquisition device 10, the attribute verification device 20, and the target prediction device 30 shown in FIG. 4 may operate in a manner similar to the corresponding devices shown in FIG. 1, except that the system of FIG. 4 further includes a presentation device 40.
Specifically, after the target prediction device 30 obtains the prediction result of the data record to be audited with respect to the data audit target, the presentation device 40 may present the prediction result to the user, for example on a display screen. The user may then make a final business decision directly from the prediction result. Alternatively, the user may first manually check certain important information in light of the prediction result and make the final business decision afterwards.
To better support business decisions, the presentation device 40 may optionally provide the user not only with the prediction result but also with auxiliary information, so that the user can focus the check on particular attribute information. For example, the presentation device 40 may present the prediction result together with the attribute truth feature. Because the attribute truth feature reflects the estimated truth values of the attribute information in the data record to be audited, the user can combine it with the prediction result and selectively verify by hand the attribute information whose truth values deviate most. The presented information may be shown as graphics and/or charts to improve readability.
Preferably, the presentation device 40 may present to the user the attribute information corresponding to large differences in the attribute truth feature, together with the predicted attribute information for that attribute information. In addition or instead, it may present to the user the attribute information corresponding to differences in the attribute truth feature that play a primary role in the prediction, together with the predicted attribute information for that attribute information.
In particular, the presentation device 40 may separately display and/or highlight the attribute information corresponding to large differences in the attribute truth feature, together with the related predicted attribute information, so that the user can readily see the attribute information with low truth values. The presentation device 40 may also display and/or highlight the differences that play a primary role in the prediction. Examples are differences whose corresponding features carry a higher weight in the objective function of the target prediction model, or, for a particular data record to be audited, differences whose corresponding features mainly drive the prediction result. As an example, the presentation device 40 may show the presented predicted attribute information within a statistical distribution chart of that predicted attribute information. This lets the user grasp the statistical characteristics of the relevant information more intuitively, which aids the user's judgment in subsequent operations.
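Choosing what to highlight can be sketched as two rankings: truth features below a threshold (the largest discrepancies), and features with the largest per-record contribution in a linear target model. The sketch assumes truth values where lower means a larger difference, and it uses $|w \cdot t|$ as the contribution measure. That measure is a crude proxy chosen for illustration; a more careful choice would compare against a baseline value. The threshold, `top_n`, and all names are hypothetical.

```python
from typing import Dict, List, Mapping


def attributes_to_highlight(
    truth: Mapping[str, float],
    weights: Mapping[str, float],
    top_n: int = 2,
    low_truth_threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Pick truth features worth showing to the auditor.

    truth:   feature name -> truth value (higher = more consistent)
    weights: feature name -> weight of that feature in a linear target model
    """
    # Largest discrepancies = lowest truth values, most suspicious first.
    low_truth = sorted((n for n, t in truth.items() if t < low_truth_threshold),
                       key=lambda n: truth[n])
    # Per-record contribution |w * t| as a simple proxy for "main predictive role".
    contribution = {n: abs(weights.get(n, 0.0) * t) for n, t in truth.items()}
    main_drivers = sorted(contribution, key=lambda n: contribution[n], reverse=True)[:top_n]
    return {"low_truth": low_truth, "main_drivers": main_drivers}


truth = {"social/gender": 1.0, "telecom/address": 0.1, "resume/income": 0.3, "resume/education": 0.9}
weights = {"social/gender": -0.2, "telecom/address": -4.0, "resume/income": -2.0, "resume/education": -0.5}
print(attributes_to_highlight(truth, weights))
# {'low_truth': ['telecom/address', 'resume/income'], 'main_drivers': ['resume/income', 'resume/education']}
```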
For example, after noticing such important attribute information, the user can review it further and make the final business decision based on the review result, i.e., the actual result with respect to the data audit target.
Optionally, the prediction system according to the exemplary embodiment of the present invention may receive feedback on these business decisions from the user. This feedback, together with the corresponding attribute truth features and other prediction features, may serve as training samples for the target prediction model. The business decision represented by the feedback serves as the label for the data audit target, and the corresponding prediction sample features serve as the features of the training sample.
In addition, attribute information whose authenticity is confirmed during business decision making may also be fed back to the attribute verification model as the label of the corresponding verification data record, with the attribute information of that verification data record used as training features. However, the way labels for the attribute verification model are obtained is not limited thereto.
Through the mode, the prediction system can effectively perform incremental learning according to the newly added to-be-audited data records and the actual audit results thereof, so that the model is updated in time.
Accordingly, the system of FIG. 1 or FIG. 4 may further comprise a feedback device (not shown) for receiving feedback from the user about the actual result with respect to the data audit target, where the feedback is used to train the target prediction model and/or the attribute verification model. Optionally, the feedback device may be an interactive device integrated with the presentation device.
Specifically, after the auditor makes the final business decision, its actual result may be entered into the feedback device. The feedback device may then use the actual business decision and/or the related attribute information as new training samples for the target prediction model and/or the attribute verification model. The models can thus keep learning from new samples, correct themselves promptly, and adapt to change. For example, this self-learning capability helps cope with constantly emerging new fraud and forgery techniques, and avoids requiring experts to keep writing new rules for each new fraud pattern.
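Incremental learning from a single feedback sample can be sketched as one stochastic gradient step on a logistic-regression target model. Both the model family and the per-sample SGD update are assumptions made for illustration; the text does not prescribe how the models are updated. The sample feature vector and the learning rate are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_update(
    w: Sequence[float], b: float, x: Sequence[float], y: int, lr: float = 0.1
) -> Tuple[List[float], float]:
    """One log-loss gradient step on a single feedback sample (x = features, y = decision)."""
    err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
    return [wj - lr * err * xj for wj, xj in zip(w, x)], b - lr * err


w, b = [0.0, 0.0], 0.0
# The auditor confirms fraud (y = 1) for a record with feature vector [1.0, 0.0].
w, b = sgd_update(w, b, [1.0, 0.0], 1)
print(w, b)  # [0.05, 0.0] 0.05
```

Each update touches only one sample, so it can run as soon as feedback arrives, without retraining on the full history.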
The system for predicting the data audit target may rely entirely on a computer program to implement the corresponding functions. In that case each device corresponds to a step in the functional architecture of the computer program, and the whole system is invoked through a dedicated software package (e.g., a lib library) to implement the prediction functions.
Alternatively, each of the means shown in fig. 1 or 4 may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
The exemplary embodiments of the present invention may also be implemented as a computing device including a storage component and a processor. The storage component stores a set of computer-executable instructions that, when executed by the processor, perform the above-described method for predicting a data audit target.
In particular, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. Further, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or another device capable of executing the above set of instructions.
The computing device need not be a single device; it can be any device or collection of circuits capable of executing the above instructions (or sets of instructions), individually or jointly. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described in the prediction method related to the data auditing target may be implemented by software, some of the operations may be implemented by hardware, and other operations may be implemented by a combination of hardware and software.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., such that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations of the above method for predicting a data audit target may be described as interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operate with non-exact boundaries.
Specifically, as described above, a computing device for predicting a data audit target based on machine learning according to an exemplary embodiment of the present invention may include a storage component and a processor. The storage component stores a set of computer-executable instructions that, when executed by the processor, perform the following steps: (A) acquiring a data record to be audited and at least one verification data record, wherein the data record to be audited includes attribute information about each attribute of an item to be audited, the data audit target refers to a business judgment made by auditing the attribute information included in the data record to be audited, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of the attributes; (B) calculating an attribute truth feature of the data record to be audited based on the at least one verification data record, wherein the attribute truth feature measures the degree of truth of the at least one piece of attribute information; and (C) acquiring, by using a target prediction model trained based on machine learning, a prediction result of the data record to be audited with respect to the data audit target according to the calculated attribute truth feature, wherein the target prediction model is trained to predict the data audit target for a corresponding data record to be audited based on the attribute truth feature.
It should be noted that the details of the processing of the prediction method regarding the data auditing target according to the exemplary embodiments of the present invention have been described above with reference to fig. 2 to 4, and the details of the processing when the computing device performs the steps will not be described herein again.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (39)

1. A method of predicting a data audit target based on machine learning, comprising:
(A) Acquiring a to-be-audited data record and at least one verification data record, wherein the to-be-audited data record comprises attribute information about each attribute of an item to be audited, the data audit target refers to a business judgment made by auditing the attribute information included in the to-be-audited data record, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of the attributes;
(B) Calculating an attribute truth characteristic of the to-be-audited data record based on the at least one verification data record, wherein the attribute truth characteristic measures the degree of truth of the at least one piece of attribute information;
(C) Obtaining, by using a target prediction model trained based on machine learning, a prediction result of the to-be-audited data record with respect to the data audit target according to the calculated attribute truth characteristic, wherein the target prediction model is trained to predict, based on the attribute truth characteristic, the data audit target for the corresponding to-be-audited data record.
2. The method of claim 1, wherein the at least one verification data record is generated from a respective data source and/or extracted from the to-be-audited data record, respectively.
3. The method of claim 1, wherein step (B) comprises:
(B1) Predicting, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute;
(B2) For each attribute, calculating the difference between the attribute information of the attribute and the at least one piece of predicted attribute information, and obtaining the attribute truth characteristic of the to-be-audited data record based on the set of differences calculated for the at least one attribute.
4. The method of claim 3, wherein in step (B2), the difference between the attribute information of each attribute and the at least one piece of predicted attribute information is calculated based on a statistical distribution characteristic of the individual differences between the attribute information of that attribute and each piece of predicted attribute information.
5. The method of claim 3, wherein in step (B1), the at least one verification data record is input into at least one attribute verification model trained based on machine learning, respectively, to predict at least one piece of predicted attribute information for each of the at least one attribute.
6. The method according to claim 3, wherein in step (B1), the at least one piece of attribute information about each of the at least one attribute that is itself included in the verification data records generated from the respective data sources is taken as the at least one piece of predicted attribute information about that attribute.
7. The method according to claim 5, wherein in step (B1), when a verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict, the attribute information included in the verification data record is taken as the predicted attribute information to be predicted by that attribute verification model; and when the verification data record itself does not include attribute information on that attribute, the verification data record is input into that attribute verification model.
8. The method according to claim 7, wherein in step (B1), when the verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict: if the difference between the time point at which the verification data record was generated and the time point at which the to-be-audited data record was generated does not exceed a preset time interval, the attribute information included in the verification data record is taken as the predicted attribute information to be predicted by that attribute verification model; and if that difference exceeds the time interval, the verification data record is input into that attribute verification model.
9. The method of claim 1 or 5, further comprising, after step (C):
(D) Presenting the prediction result and the attribute truth characteristic to a user.
10. The method of claim 9, wherein in step (D), attribute information corresponding to a larger difference in the attribute truth characteristic, together with the predicted attribute information about that attribute information, is presented to the user; and/or attribute information corresponding to a difference in the attribute truth characteristic that plays a primary predictive role, together with the predicted attribute information about that attribute information, is presented to the user.
11. The method of claim 10, wherein, in the step (D), the presented prediction attribute information is displayed in a statistical distribution diagram of the prediction attribute information.
12. The method of claim 9, further comprising, after step (D):
(E) Receiving feedback from the user regarding the actual result of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
13. The method according to claim 1, wherein in step (C), the predicted result of the data record to be audited about the data audit target is obtained according to the calculated attribute truth characteristic and the attribute characteristic obtained based on at least a part of the attribute information of the data record to be audited.
14. A system for predicting a data audit target based on machine learning, comprising:
a data acquisition device configured to acquire a to-be-audited data record and at least one verification data record, wherein the to-be-audited data record comprises attribute information about each attribute of an item to be audited, the data audit target refers to a business judgment made by auditing the attribute information included in the to-be-audited data record, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of the attributes;
an attribute verification device configured to calculate an attribute truth characteristic of the to-be-audited data record based on the at least one verification data record, wherein the attribute truth characteristic measures the degree of truth of the at least one piece of attribute information;
and a target prediction device configured to obtain, by using a target prediction model trained based on machine learning, a prediction result of the to-be-audited data record with respect to the data audit target according to the calculated attribute truth characteristic, wherein the target prediction model is trained to predict, based on the attribute truth characteristic, the data audit target for the corresponding to-be-audited data record.
15. The system of claim 14, wherein the at least one verification data record is generated from each data source and/or extracted from the to-be-audited data record, respectively.
16. The system of claim 14, wherein the attribute verification device predicts, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute; and, for each attribute, calculates the difference between the attribute information of that attribute and the at least one piece of predicted attribute information, and obtains the attribute truth characteristic of the to-be-audited data record based on the set of differences calculated for the at least one attribute.
17. The system according to claim 16, wherein, for each attribute, the attribute verification device calculates the difference between the attribute information of that attribute and the at least one piece of predicted attribute information based on a statistical distribution characteristic of the individual differences between the attribute information of that attribute and each piece of predicted attribute information.
18. The system of claim 16, wherein the attribute verification device inputs the at least one verification data record into at least one attribute verification model trained based on machine learning, respectively, to predict at least one piece of predicted attribute information for each of the at least one attribute.
19. The system according to claim 16, wherein the attribute verification device takes the at least one piece of attribute information about each of the at least one attribute that is itself included in the verification data records generated from the respective data sources as the at least one piece of predicted attribute information about that attribute.
20. The system according to claim 18, wherein, when a verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict, the attribute verification device takes the attribute information included in the verification data record as the predicted attribute information to be predicted by that attribute verification model; and when the verification data record itself does not include attribute information on that attribute, the attribute verification device inputs the verification data record into that attribute verification model.
21. The system according to claim 20, wherein, when the verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict: if the difference between the time point at which the verification data record was generated and the time point at which the to-be-audited data record was generated does not exceed a preset time interval, the attribute verification device takes the attribute information included in the verification data record as the predicted attribute information to be predicted by that attribute verification model; and if that difference exceeds the time interval, the attribute verification device inputs the verification data record into that attribute verification model.
22. The system of claim 14 or 18, further comprising:
a presentation device configured to present the prediction result and the attribute truth characteristic to a user.
23. The system of claim 22, wherein the presentation device further presents to the user attribute information corresponding to a larger difference in the attribute truth characteristic, together with the predicted attribute information about that attribute information; and/or the presentation device further presents to the user attribute information corresponding to a difference in the attribute truth characteristic that plays a primary predictive role, together with the predicted attribute information about that attribute information.
24. The system of claim 23, wherein the presentation device displays the presented predicted attribute information in a statistical distribution diagram of the predicted attribute information.
25. The system of claim 22, further comprising:
a feedback device configured to receive feedback from the user regarding the actual result of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
26. The system as claimed in claim 14, wherein the target prediction device obtains the prediction result of the data record to be audited about the data audit target according to the calculated attribute truth characteristics and the attribute characteristics obtained based on at least a part of the attribute information of the data record to be audited.
27. A computing device for predicting a data audit target based on machine learning, comprising a storage component storing a set of computer-executable instructions that, when executed by a processor, perform the following steps:
(A) Acquiring a to-be-audited data record and at least one verification data record, wherein the to-be-audited data record comprises attribute information about each attribute of an item to be audited, the data audit target refers to a business judgment made by auditing the attribute information included in the to-be-audited data record, and the at least one verification data record is used to verify at least one piece of attribute information about at least one of the attributes;
(B) Calculating an attribute truth characteristic of the to-be-audited data record based on the at least one verification data record, wherein the attribute truth characteristic measures the degree of truth of the at least one piece of attribute information;
(C) Obtaining, by using a target prediction model trained based on machine learning, a prediction result of the to-be-audited data record with respect to the data audit target according to the calculated attribute truth characteristic, wherein the target prediction model is trained to predict, based on the attribute truth characteristic, the data audit target for the corresponding to-be-audited data record.
28. The computing device of claim 27, wherein the at least one verification data record is generated from a respective data source and/or extracted from the to-be-audited data record, respectively.
29. The computing device of claim 27, wherein step (B) comprises:
(B1) Predicting, based on the at least one verification data record, at least one piece of predicted attribute information for each of the at least one attribute;
(B2) For each attribute, calculating the difference between the attribute information of the attribute and the at least one piece of predicted attribute information, and obtaining the attribute truth characteristic of the to-be-audited data record based on the set of differences calculated for the at least one attribute.
30. The computing device of claim 29, wherein in step (B2), the difference between the attribute information of each attribute and the at least one piece of predicted attribute information is calculated based on a statistical distribution characteristic of the individual differences between the attribute information of that attribute and each piece of predicted attribute information.
31. The computing device of claim 29, wherein in step (B1), the at least one verification data record is input into at least one attribute verification model trained based on machine learning, respectively, to predict at least one piece of predicted attribute information for each of the at least one attribute.
32. The computing device according to claim 29, wherein in step (B1), the at least one piece of attribute information about each of the at least one attribute that is itself included in the verification data records generated from the respective data sources is taken as the at least one piece of predicted attribute information about that attribute.
33. The computing device according to claim 31, wherein in step (B1), when a verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict, the attribute information included in the verification data record is taken as the predicted attribute information to be predicted by that attribute verification model; and when the verification data record itself does not include attribute information on that attribute, the verification data record is input into that attribute verification model.
34. The computing device according to claim 33, wherein in step (B1), when the verification data record generated from a data source and to be input into any one of the attribute verification models itself includes attribute information on the attribute whose predicted attribute information that attribute verification model is to predict: if the difference between the time point at which the verification data record was generated and the time point at which the to-be-audited data record was generated does not exceed a preset time interval, the attribute information included in the verification data record is taken as the predicted attribute information to be predicted by that attribute verification model; and if that difference exceeds the time interval, the verification data record is input into that attribute verification model.
35. The computing device of claim 27 or 31, wherein the set of computer-executable instructions, when executed by the processor, further performs the following step after step (C):
(D) Presenting the prediction result and the attribute truth characteristic to a user.
36. The computing device of claim 35, wherein in step (D), attribute information corresponding to a larger difference in the attribute truth characteristic, together with the predicted attribute information about that attribute information, is presented to the user; and/or attribute information corresponding to a difference in the attribute truth characteristic that plays a primary predictive role, together with the predicted attribute information about that attribute information, is presented to the user.
37. The computing device of claim 36, wherein in step (D), the presented predicted attribute information is displayed in a statistical distribution diagram of the predicted attribute information.
38. The computing device of claim 35, wherein the set of computer-executable instructions, when executed by the processor, further performs the following step after step (D):
(E) Receiving feedback from the user regarding the actual result of the data audit target, wherein the feedback is used to train the target prediction model and/or the attribute verification model.
39. The computing device of claim 27, wherein in step (C), the prediction result of the to-be-audited data record with respect to the data audit target is obtained according to the calculated attribute truth characteristic and an attribute feature obtained from at least a part of the attribute information of the to-be-audited data record.
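The refinements in claims 4, 7, 8 and 10 (and their system and device counterparts) can be sketched in the same hedged way. The snippet assumes numeric attribute information and one hypothetical verification model per attribute; it reads the "statistical distribution characteristic" as summary statistics of the individual differences (mean, mean absolute value, minimum, maximum, population standard deviation) and picks the "larger differences" to present by mean absolute difference. These readings, the function names and the `max_gap` parameter are assumptions, not details taken from the patent.

```python
import statistics
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a verification record is (time it was generated, attribute -> value).
VerificationRecord = Tuple[datetime, Dict[str, float]]
# Hypothetical attribute verification model for a single attribute.
Verifier = Callable[[Dict[str, float]], Optional[float]]


def predicted_values(attr: str, audit_time: datetime,
                     records: List[VerificationRecord],
                     verifier: Verifier, max_gap: timedelta) -> List[float]:
    """Claims 7-8: use a record's own value only if it is recent enough."""
    out = []
    for made_at, fields in records:
        recent = abs(made_at - audit_time) <= max_gap
        if attr in fields and recent:
            out.append(fields[attr])      # stated value, generated close in time
        else:
            guess = verifier(fields)      # value missing or stale: ask the model
            if guess is not None:
                out.append(guess)
    return out


def difference_stats(value: float, predictions: List[float]) -> Dict[str, float]:
    """Claim 4: describe the individual differences by their distribution."""
    if not predictions:
        return {}
    diffs = [value - p for p in predictions]
    return {
        "mean": statistics.mean(diffs),
        "mean_abs": statistics.mean([abs(d) for d in diffs]),
        "min": min(diffs),
        "max": max(diffs),
        "stdev": statistics.pstdev(diffs),
    }


def largest_discrepancies(stats_by_attr: Dict[str, Dict[str, float]],
                          k: int = 1) -> List[str]:
    """Claim 10: attributes with the largest mean absolute difference first."""
    scored = [a for a, s in stats_by_attr.items() if s]
    scored.sort(key=lambda a: stats_by_attr[a]["mean_abs"], reverse=True)
    return scored[:k]
```

With `max_gap=timedelta(days=30)`, an income value stated in a record generated 12 days before the audited record is used as-is, whereas the same attribute in a record from the previous year is re-estimated by the model, which mirrors the two branches of claim 8.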

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910283383.0A CN109993233B (en) 2016-06-13 2016-06-13 Method and system for predicting data auditing objective based on machine learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910283383.0A CN109993233B (en) 2016-06-13 2016-06-13 Method and system for predicting data auditing objective based on machine learning
CN201610420714.7A CN106096657B (en) 2016-06-13 2016-06-13 Method and system for predicting data audit target based on machine learning

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610420714.7A Division CN106096657B (en) 2016-06-13 2016-06-13 Method and system for predicting data audit target based on machine learning

Publications (2)

Publication Number Publication Date
CN109993233A CN109993233A (en) 2019-07-09
CN109993233B true CN109993233B (en) 2022-11-01

Family

ID=57846540

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610420714.7A Active CN106096657B (en) 2016-06-13 2016-06-13 Method and system for predicting data audit target based on machine learning
CN201910283383.0A Active CN109993233B (en) 2016-06-13 2016-06-13 Method and system for predicting data auditing objective based on machine learning

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201610420714.7A Active CN106096657B (en) 2016-06-13 2016-06-13 Method and system for predicting data audit target based on machine learning

Country Status (1)

Country Link
CN (2) CN106096657B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886425A (en) * 2017-10-25 2018-04-06 上海壹账通金融科技有限公司 Credit evaluation method, apparatus, equipment and computer-readable recording medium
WO2019193665A1 (en) 2018-04-03 2019-10-10 Global Mobility Service株式会社 Credit checking assistance system, vehicle-mounted device, vehicle, server, credit checking assistance method, credit checking assistance program, and storage medium
US11580419B2 (en) 2018-05-18 2023-02-14 Kyndryl, Inc. Computer environment infrastructure compliance audit result prediction
CN109492981A (en) * 2018-09-14 2019-03-19 龙马智芯(珠海横琴)科技有限公司 The checking method and device of information
CN109492135B (en) * 2018-10-27 2024-03-19 平安科技(深圳)有限公司 Data auditing method and device based on data processing
CN110187945A (en) * 2019-04-26 2019-08-30 平安科技(深圳)有限公司 Indicate information generating method, device, terminal and storage medium
CN112435035A (en) * 2019-08-09 2021-03-02 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN110837527B (en) * 2019-11-14 2022-03-22 深圳市超算科技开发有限公司 Safe application method and system of machine learning model
CN111091350A (en) * 2019-12-12 2020-05-01 中国银行股份有限公司 Method, device and equipment for auditing and processing service data and storage medium
CN113313279A (en) * 2020-02-27 2021-08-27 北京沃东天骏信息技术有限公司 Document auditing method and device
CN111898885B (en) * 2020-07-16 2024-05-24 广东金宇恒软件科技有限公司 Collective economic management and supervision system

Citations (11)

Publication number Priority date Publication date Assignee Title
CN101118434A (en) * 2006-08-03 2008-02-06 松下电工株式会社 Anomaly monitoring device
JP2010072779A (en) * 2008-09-17 2010-04-02 Mitsubishi Electric Corp Data classifying device, computer program, and data classification method
CN103123634A (en) * 2011-11-21 2013-05-29 北京百度网讯科技有限公司 Copyright resource identification method and copyright resource identification device
CN103383723A (en) * 2012-08-10 2013-11-06 眼验有限责任公司 Method and system for spoof detection for biometric authentication
CN103425845A (en) * 2013-08-26 2013-12-04 中国科学院合肥物质科学研究院 Safe reliability evaluating system and method for nuclear power device
CN104091122A (en) * 2014-06-17 2014-10-08 北京邮电大学 Detection system of malicious data in mobile internet
CN104426884A (en) * 2013-09-03 2015-03-18 深圳市腾讯计算机系统有限公司 Method for authenticating identity and device for authenticating identity
CN104598500A (en) * 2013-10-30 2015-05-06 三星Sds株式会社 apparatus and method for classifying data, and a system for collecting data using apparatus and method
CN104636912A (en) * 2015-02-13 2015-05-20 银联智惠信息服务(上海)有限公司 Identification method and device for withdrawal of credit cards
CN105225979A (en) * 2014-06-19 2016-01-06 中芯国际集成电路制造(上海)有限公司 A kind of semiconductor device processing procedure prognoses system and method
CN105279382A (en) * 2015-11-10 2016-01-27 成都数联易康科技有限公司 Medical insurance abnormal data on-line intelligent detection method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9195910B2 (en) * 2013-04-23 2015-11-24 Wal-Mart Stores, Inc. System and method for classification with effective use of manual data input and crowdsourcing
CN103793484B (en) * 2014-01-17 2017-03-15 五八同城信息技术有限公司 The fraud identifying system based on machine learning in classification information website
US20150242856A1 (en) * 2014-02-21 2015-08-27 International Business Machines Corporation System and Method for Identifying Procurement Fraud/Risk

Non-Patent Citations (1)

Title
Data mining applications based on audit data quality control; Chen Ailin et al.; 《财会月刊》 (Finance and Accounting Monthly); 2009-12-31; pp. 57-58 *

Also Published As

Publication number Publication date
CN109993233A (en) 2019-07-09
CN106096657B (en) 2019-04-30
CN106096657A (en) 2016-11-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant