CN115689298A

CN115689298A - Telecommunication fraud risk prediction method, system, equipment and readable storage medium

Info

Publication number: CN115689298A
Application number: CN202211712830.8A
Authority: CN
Inventors: 徐涛; 吴楠; 蒋修强; 胡大明; 卢小军; 王金涛; 王方舟
Original assignee: Beijing Ma Niu Technology Co ltd
Current assignee: Beijing Ma Niu Technology Co ltd
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-02-03

Abstract

The invention relates to a telecommunication fraud risk prediction method, system, device and readable storage medium. The method comprises: acquiring case data, regional risk coefficients and target personnel information, the target personnel information forming a personnel feature data set; determining a target data set according to the personnel feature data set, the case data and a data processing rule; determining a test data set and a training data set according to the target data set and a data classification rule; determining a plurality of prediction models according to a preset training framework and the training data set; determining a model score for each prediction model according to the plurality of prediction models, the test data set and a model scoring rule; determining a model joint-score victimization risk value according to the model scores, the prediction models and a victimization coefficient calculation rule; and determining a target risk value according to the regional risk coefficient, the model joint-score victimization risk value and a risk prediction rule. The invention improves the accuracy of identifying users susceptible to telecommunication fraud.

Description

Telecommunication fraud risk prediction method, system, equipment and readable storage medium

Technical Field

The present application relates to the field of risk assessment technology, and in particular, to a method, a system, a device and a readable storage medium for predicting a risk of telecommunication fraud.

Background

Telecommunication fraud is a criminal act in which offenders fabricate false information and use telephone calls, the Internet and text messages to defraud victims remotely and without contact, inducing them to pay or transfer money to the offenders.

Telecommunication fraud seriously threatens the security of people's property. At present, public security organs can obtain a large amount of fraud early-warning information through various channels, analyze it based on experience or a limited set of indicators to roughly classify vulnerable groups, and take dissuasion measures according to the classification results. However, because police resources for fraud prevention are insufficient, fraud prevention remains difficult. Telecommunication fraud early warning therefore needs more accurate risk assessment: users who are more susceptible to telecommunication fraud must be identified more accurately, so that targeted dissuasion can be carried out according to each user's likelihood of future victimization. The problem to be solved is how to improve the accuracy of identifying users susceptible to telecommunication fraud.

Disclosure of Invention

In order to improve the accuracy of identifying users vulnerable to telecommunication fraud, the application provides a telecommunication fraud risk prediction scheme.

In a first aspect of the present application, a telecommunications fraud risk prediction method is provided. The method comprises the following steps:

acquiring case data, regional risk coefficients and target personnel information, wherein the target personnel information forms a personnel feature data set;

determining a target data set according to the personnel feature data set, the case data and the data processing rule;

determining a test data set and a training data set according to the target data set and the data classification rule, wherein the training data set is used for training a model, and the test data set is used for scoring the model;

determining a plurality of prediction models according to a preset training framework and the training data set;

determining a model score corresponding to each prediction model according to the plurality of prediction models, the test data set and a model score rule;

determining a model joint-score victimization risk value according to the model scores, the prediction models and a victimization coefficient calculation rule;

and determining a target risk value according to the regional risk coefficient, the model joint-score victimization risk value and a risk prediction rule.

According to this technical scheme, target personnel information is obtained and formed into a personnel feature data set; the data in the personnel feature data set are labeled according to the data processing rule, and the labeled data form the target data set, which is divided into a test data set and a training data set. Inputting the training data set into the preset training framework yields a plurality of trained prediction models, which are then scored according to the test data set and the model scoring rule, providing a basis for the subsequent calculation. The model joint-score victimization risk value is then calculated according to the model scores, the prediction models and the victimization coefficient calculation rule. Finally, the target risk value is determined from the model joint-score victimization risk value, the regional risk coefficient and the risk prediction rule. By training a plurality of prediction models and applying the victimization coefficient calculation rule to obtain the model joint-score victimization risk value, and combining it with the regional risk coefficient of the person's region, the method determines the probability that the person will be defrauded via telecommunication in that region. This improves the accuracy of identifying users susceptible to telecommunication fraud and enables police officers to carry out targeted dissuasion according to that probability, thereby reducing the incidence of telecommunication fraud cases to a certain extent.

In a possible implementation manner, determining the target data set according to the personnel feature data set, the case data and the data processing rule includes:

determining a victimization period threshold according to the personnel feature data set, the case data and a victimization period calculation rule;

marking, according to the personnel feature data set and the case data, the target personnel information of persons who have reported a case as defrauded, the target personnel information with the defrauded mark forming a first data set;

marking, according to the personnel feature data set and the case data, the target personnel information of persons who have not reported a case within the victimization period threshold as not defrauded, the target personnel information with the non-defrauded mark forming a second data set;

the first data set and the second data set constitute a target data set.

In one possible implementation, the determining a model score corresponding to each prediction model according to the plurality of prediction models, the test data set, and a model score rule includes:

the test data set contains information on b defrauded persons, where b is a positive integer greater than 1;

the information on the b defrauded persons forms a third data set;

inputting the test data set into the prediction model to obtain information on c persons predicted as defrauded;

the information on the c persons forms a fourth data set;

taking the intersection of the third data set and the fourth data set, the intersection containing information on d defrauded persons;

the model score of the prediction model = d/b.

In a possible implementation manner, determining the model joint-score victimization risk value according to the model score, the prediction model and the victimization coefficient calculation rule includes:

inputting personal information into the plurality of prediction models to obtain the victimization probability output by each prediction model;

and determining the model joint-score victimization risk value according to the victimization probabilities, the model scores and a coefficient determination rule.

According to this technical scheme, a plurality of prediction models produce a plurality of victimization probabilities, which are combined with the corresponding model scores to determine the final model joint-score victimization risk value. Using several prediction models improves the accuracy of the joint value to a certain extent and reduces the large deviations that can occur when a single prediction model is used, since the result of a single model is easily affected by the model itself and by its training data set.

In one possible implementation, obtaining the regional risk coefficient includes:

obtaining regional fraud case data, the regional fraud case data being the ratio of the number of fraud cases to the number of all cases in each of a plurality of regions;

sorting the regions by this ratio to form a regional-ratio sequence;

calculating the median and the standard deviation of the regional-ratio sequence;

and determining the regional risk coefficient according to the median, the standard deviation, the regional fraud case data and a regional risk determination rule.

In a possible implementation manner, determining the target risk value according to the regional risk coefficient, the model joint-score victimization risk value and the risk prediction rule includes:

obtaining, from the personal information corresponding to the model joint-score victimization risk value, the region to which the person belongs;

determining the regional risk coefficient corresponding to that region;

target risk value = model joint-score victimization risk value × regional risk coefficient.

According to this technical scheme, the model joint-score victimization risk value and the regional risk coefficient analyze the risk at both the individual and the regional level, which improves the accuracy of the target risk value to a certain extent.

In one possible implementation manner, obtaining the target personnel information includes:

acquiring telephone fraud data, which is collected from telephone information, and phishing data, which is collected from website content;

acquiring the unique identification code of a target person from the telephone fraud data;

extracting the target personnel information from the telephone fraud data and the phishing data according to the unique identification code.

In a second aspect of the present application, a telecommunications fraud risk prediction system is provided. The system comprises:

a data acquisition module, used for acquiring case data, regional risk coefficients and target personnel information, the target personnel information forming a personnel feature data set;

a data processing module, used for determining a target data set according to the personnel feature data set, the case data and a data processing rule;

a data classification module, used for determining a test data set and a training data set according to the target data set and a data classification rule, the training data set being used for training models and the test data set for scoring them;

a model training module, used for determining a plurality of prediction models according to a preset training framework and the training data set;

a model scoring module, used for determining a model score for each prediction model according to the plurality of prediction models, the test data set and a model scoring rule;

a personal risk calculation module, used for determining a model joint-score victimization risk value according to the model scores, the prediction models and a victimization coefficient calculation rule;

and a target risk determination module, used for determining a target risk value according to the regional risk coefficient, the model joint-score victimization risk value and a risk prediction rule.

In a third aspect of the present application, an electronic device is provided. The electronic device includes a memory storing a computer program, and a processor that implements the telecommunication fraud risk prediction method described above when executing the program.

In a fourth aspect of the present application, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, performs the method according to the first aspect of the present application.

In summary, the present application includes at least one of the following beneficial technical effects:

1. Target personnel information is obtained and formed into a personnel feature data set; the data in the set are labeled according to the data processing rule; the labeled data form a target data set, which is divided into a test data set and a training data set. Inputting the training data set into the preset training framework yields a plurality of trained prediction models, which are scored according to the test data set and the model scoring rule; the model joint-score victimization risk value is then calculated according to the model scores, the prediction models and the victimization coefficient calculation rule; finally, the target risk value is determined from the model joint-score victimization risk value, the regional risk coefficient and the risk prediction rule. By combining the model joint-score victimization risk value obtained from several trained prediction models with the regional risk coefficient of the person's region, the method determines the probability that the person will suffer telecommunication fraud in that region, improves the accuracy of identifying users susceptible to telecommunication fraud, and enables police officers to carry out targeted dissuasion according to that probability, so that the incidence of telecommunication fraud cases is reduced to a certain extent;

2. By calculating both the model joint-score victimization risk value and the regional risk coefficient, the method analyzes risk at both the individual and the regional level, which improves the accuracy of the target risk value to a certain extent.

Drawings

FIG. 1 is a schematic flow chart of a telecommunication fraud risk prediction method provided by the present application.

FIG. 2 is a schematic structural diagram of a telecommunication fraud risk prediction system provided by the present application.

FIG. 3 is a schematic structural diagram of an electronic device provided by the present application.

Reference numerals: 200, telecommunication fraud risk prediction system; 201, data acquisition module; 202, data processing module; 203, data classification module; 204, model training module; 205, model scoring module; 206, personal risk calculation module; 207, target risk determination module; 301, CPU; 302, ROM; 303, RAM; 304, I/O interface; 305, input section; 306, output section; 307, storage section; 308, communication section; 309, driver; 310, removable medium.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, unless otherwise specified, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

The embodiments of the present application will be described in further detail with reference to the drawings attached hereto.

An embodiment of the present application provides a telecommunication fraud risk prediction method, the main flow of which is described below.

As shown in FIG. 1:

Step S101: obtaining telephone fraud data, phishing data, case data and regional fraud case data.

Specifically, the telephone fraud data, the phishing data, the case data and the regional fraud case data are obtained through the public security system. The telephone fraud data comprise name, gender, ID card number, age, occupation, educational background, marital status, time of first answering a fraud call, number of fraud calls answered, time of first dialing a fraud call, number of fraud calls dialed, duration of answered fraud calls, duration of dialed fraud calls, and number of communications with the same fraud number. The phishing data comprise name, gender, ID card number, time of first accessing a telecommunication fraud website, total number of accesses to telecommunication fraud websites, total access duration, and average access duration. The case data comprise the person who reported the case and that person's basic information, the victim and the victim's basic information, the time at which the victim first came into contact with the fraud information, and the reporting time. The regional fraud case data comprise, for each of a plurality of regions, the total number of cases, the number of telecommunication fraud cases, and the ratio of the number of telecommunication fraud cases to the total number of cases.

Step S102: extracting target personnel information according to the telephone fraud data and the phishing data, the target personnel information forming a personnel feature data set.

Specifically, the personnel feature information is extracted from the telephone fraud data and the phishing data. For example, the name and ID card number of a target person are obtained from the telephone fraud data; in this embodiment, the ID card number serves as the unique identification code. According to the unique identification code, the corresponding gender is obtained from the household registration data in the public security system and represented numerically, with male as 1 and female as 0; the age is likewise extracted from the household registration information. The person's occupation is extracted from the social security data in the public security system according to the unique identification code, and is encoded as whether the person works at a public institution, a state-owned enterprise, a private enterprise, an individual industrial and commercial household, or other. The educational background is extracted from the student-status records in the public security system according to the unique identification code, and is encoded as whether the person is illiterate or has completed primary school, junior high school, a bachelor's degree, or postgraduate study or above. The marital status is extracted from the family information in the public security system according to the unique identification code, and is encoded as whether the person is unmarried, married, or divorced.

According to the unique identification code, the following are obtained for the corresponding target person from the telephone fraud data: the number of fraud calls answered, the number of fraud calls dialed, the total duration of answered fraud calls, the total duration of dialed fraud calls, the average duration of answered fraud calls, the average duration of dialed fraud calls, the number of communications with the same fraud number, the time of first answering a fraud call, and the time of first dialing a fraud call. Likewise, the total number of accesses to telecommunication fraud websites, the total access duration, the average access duration, and the time of first access are obtained for the target person from the phishing data. The information of all target persons forms the personnel feature data set. The personnel feature data set is updated in real time: the telephone fraud data and the phishing data are scanned dynamically, the corresponding entries are updated when new information appears, and a new entry is constructed whenever a new target person appears.
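A minimal sketch of how per-person records from the two sources could be merged on the unique identification code, assuming each source is available as a list of dictionaries carrying the ID card number under a hypothetical `id_number` key. The function name and all field names are illustrative, not taken from the source. Re-running the function on newly scanned records updates existing entries and adds new persons, mirroring the real-time update described above.

```python
def build_person_features(phone_records, web_records, people=None, key="id_number"):
    """Merge telephone-fraud and phishing records per person (hypothetical schema).

    Each record is a dict containing the unique identification code under `key`.
    Fields are prefixed with their source so that names from the two sources
    cannot collide. Passing an existing `people` dict updates it in place.
    """
    people = {} if people is None else people
    for source, records in (("phone", phone_records), ("web", web_records)):
        for rec in records:
            pid = rec[key]
            # Create an entry for a newly seen target person, or reuse the existing one.
            person = people.setdefault(pid, {key: pid})
            for field, value in rec.items():
                if field != key:
                    # Newer values overwrite older ones for the same field.
                    person[f"{source}_{field}"] = value
    return people
```

Prefixing fields by source is only one way to avoid name clashes; a fixed schema with explicit columns would serve equally well.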

In this embodiment, the occupation, educational background and marital status features are constructed by one-hot encoding, which uses an N-bit status register to encode N states; each state has its own independent register bit, and only one bit is active at any time. One-hot encoding is well known to those skilled in the art and is not described in detail here.
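To make the encoding concrete, the sketch below builds a numeric feature vector for one person, assuming the category lists contain exactly the options named in Step S102. The identifiers, the function names and the ordering of the vector are hypothetical choices; the source does not specify them.

```python
# Category lists follow the options named in the text; the identifiers are illustrative.
OCCUPATIONS = ["public_institution", "state_owned_enterprise", "private_enterprise",
               "individual_business", "other"]
EDUCATION = ["illiterate", "primary_school", "junior_high_school", "bachelor",
             "postgraduate_or_above"]
MARITAL_STATUS = ["unmarried", "married", "divorced"]
GENDER_CODES = {"male": 1, "female": 0}  # numeric gender coding from the text


def one_hot(value, categories):
    """Return an N-bit vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_person(gender, age, occupation, education, marital_status):
    """Build a feature vector: gender code, age, then three one-hot groups."""
    return ([GENDER_CODES[gender], age]
            + one_hot(occupation, OCCUPATIONS)
            + one_hot(education, EDUCATION)
            + one_hot(marital_status, MARITAL_STATUS))
```

For example, a 35-year-old married woman with a bachelor's degree working at a private enterprise yields a 15-element vector with exactly one 1 in each one-hot group.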

Step S103: marking the personnel feature data set and determining a target data set according to the personnel feature data set, the case data and the data processing rule.

Specifically, the time at which the victim first came into contact with the fraud information and the reporting time are obtained from the case data, and the victimization period is calculated as the number of days elapsed between these two times. All victimization periods in the case data are obtained and sorted in ascending order to form the sequence

$$\{t_1, t_2, \ldots, t_n\}, \qquad t_1 \le t_2 \le \cdots \le t_n$$

The median $M_t$ and the standard deviation $\sigma_t$ of the sequence are then obtained, and the victimization period threshold $T$ is the sum of the median and the standard deviation. The specific calculation formulas are as follows:

$$M_t = \begin{cases} t_{(n+1)/2}, & n \text{ odd} \\ \tfrac{1}{2}\left(t_{n/2} + t_{n/2+1}\right), & n \text{ even} \end{cases}$$

$$\sigma_t = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2}, \qquad \bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$$

$$T = M_t + \sigma_t$$

where $n$ is the number of elements in the sequence, $t_i$ is the $i$-th element of the sequence, $\bar{t}$ is the average of all elements in the sequence, and $T$ is the victimization period threshold.
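The threshold calculation can be sketched with Python's standard `statistics` module, assuming the case data are available as (first-contact date, reporting date) pairs and that the standard deviation is the population form (dividing by n), as in the formula above. The function name is hypothetical.

```python
from datetime import date, timedelta
from statistics import median, pstdev


def victimization_period_threshold(cases):
    """Median plus population standard deviation of the victimization periods.

    `cases` is an iterable of (first_contact_date, reporting_date) pairs;
    each victimization period is the number of days between the two dates.
    """
    periods = sorted((reported - first_contact).days for first_contact, reported in cases)
    if not periods:
        raise ValueError("no cases to compute a threshold from")
    return median(periods) + pstdev(periods)


if __name__ == "__main__":
    start = date(2022, 1, 1)
    demo = [(start, start + timedelta(days=d)) for d in (2, 4, 4, 4, 5, 5, 7, 9)]
    print(victimization_period_threshold(demo))  # 4.5 (median) + 2.0 (std) = 6.5
```

If the sample standard deviation (dividing by n - 1) were intended instead, `statistics.stdev` would replace `pstdev`.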

The personnel feature data set is compared with the case data. When a target person appears in both, the person has been defrauded and is given a defrauded mark; the information of target persons with the defrauded mark forms the first data set. When a target person appears in the personnel feature data set but not in the case data, the person has not reported a case. In that case, the times at which the person first dialed a fraud call, first answered a fraud call and first accessed a telecommunication fraud website are obtained from the personnel feature data set, and the earliest of these is taken as the time at which the person first came into contact with fraud information. The number of days from that time to the current time is calculated and compared with the victimization period threshold. When the number of days is greater than the threshold, the person is given a non-defrauded mark, and the information of target persons with the non-defrauded mark forms the second data set. When the number of days is less than the threshold, the person is given a to-be-predicted mark, and the information of these persons forms the to-be-predicted data set. The first data set and the second data set are combined to form the target data set.

Whenever the data in the personnel feature data set are updated, the target data set is recalculated accordingly.
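The labeling rule can be sketched as follows, assuming each person's first-contact dates are stored under hypothetical keys (`first_call_dialed`, `first_call_answered`, `first_site_visit`) and that `reported_ids` holds the ID numbers found in the case data. The source does not say what happens when the number of days equals the threshold, so the sketch leaves such persons unlabeled.

```python
from datetime import date

# Hypothetical keys for the three first-contact times named in the text.
CONTACT_FIELDS = ("first_call_dialed", "first_call_answered", "first_site_visit")


def label_people(people, reported_ids, threshold_days, today: date):
    """Split persons into the target data set (labels) and the to-be-predicted set.

    people: dict of id -> dict holding first-contact dates under CONTACT_FIELDS.
    reported_ids: ids that appear in the case data (reported victims).
    Returns (labels, to_predict): labels maps id -> 1 (defrauded) or 0 (not defrauded).
    """
    labels, to_predict = {}, []
    for pid, info in people.items():
        if pid in reported_ids:
            labels[pid] = 1  # first data set: defrauded
            continue
        contacts = [info[f] for f in CONTACT_FIELDS if info.get(f) is not None]
        if not contacts:
            continue  # no first-contact time recorded, so the person cannot be labeled
        days = (today - min(contacts)).days  # earliest time = first contact with fraud
        if days > threshold_days:
            labels[pid] = 0  # second data set: not defrauded
        elif days < threshold_days:
            to_predict.append(pid)  # to-be-predicted data set
        # days == threshold_days is not covered by the source and is left unlabeled.
    return labels, to_predict
```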

Step S104: determining a test data set and a training data set according to the target data set and the data classification rule.

Specifically, the data in the target data set are randomly sampled at a manually set proportion; the sampled data form the test data set, and the remaining data form the training data set. For example, if the target data set contains 100 records and the sampling rate obtained from the database is 10%, 10 records are extracted as the test data set and the remaining 90 records form the training data set. When the sampling ratio is set manually, the total amount of data in the target data set is also taken into account.
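As a sketch of the random split, assuming the target data set is held as a Python list and using the standard `random` module; the function name and the optional `seed` parameter (for reproducibility) are illustrative additions, not part of the source.

```python
import random


def split_target_set(records, test_ratio, seed=None):
    """Randomly sample `test_ratio` of the records as the test set; the rest is training data."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    rng = random.Random(seed)
    n_test = round(len(records) * test_ratio)
    # Sample indices without replacement so every record lands in exactly one set.
    test_idx = set(rng.sample(range(len(records)), n_test))
    test = [r for i, r in enumerate(records) if i in test_idx]
    train = [r for i, r in enumerate(records) if i not in test_idx]
    return train, test
```

With 100 records and a 10% ratio this yields 10 test records and 90 training records, matching the example above.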

Step S105: determining a plurality of prediction models according to a preset training framework and the training data set.

Specifically, the preset training framework includes a logistic regression model, a decision tree model, a naive Bayes model and a neural network model. The training data set is input into the preset training framework to obtain a plurality of prediction models, namely the trained logistic regression, decision tree, naive Bayes and neural network models.
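One possible realization of the preset training framework, assuming scikit-learn is used. The source names only the four model families, not a library or any hyperparameters, so the function name and the hyperparameters below are placeholders.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def train_prediction_models(X, y, seed=0):
    """Fit the four model families on the training data (label 1 = defrauded).

    Hyperparameters are illustrative placeholders, not values from the source.
    """
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "naive_bayes": GaussianNB(),
        "neural_network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                        random_state=seed),
    }
    for model in models.values():
        model.fit(X, y)
    return models
```

Each fitted model exposes `predict`, which can supply the predicted labels for the score in Step S106, and `predict_proba`, whose column for class 1 can serve as the victimization probability in Step S107.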

Step S106: determining the model score of each prediction model according to the plurality of prediction models, the test data set and the model scoring rule.

Specifically, the test data set contains information on b defrauded persons, where b is a positive integer greater than 1, and these b persons form the third data set. The test data set is input into each prediction model, which outputs information on c persons predicted as defrauded; these c persons form the fourth data set. The intersection of the third data set and the fourth data set contains information on d defrauded persons, and the model score of the prediction model is d/b, that is, the recall of the model on defrauded persons. For example, suppose the test data set contains 100 records, 50 of which belong to defrauded persons, so b = 50. When the 100 records are input into the logistic regression model, it outputs 60 records as defrauded, of which 40 are indeed defrauded, so c = 60, d = 40, and the score of the logistic regression model is 40/50 = 0.8. The scores of the decision tree model, the naive Bayes model and the neural network model are calculated in the same way and are not described again here.
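The d/b rule can be sketched with set operations, assuming true and predicted labels are given as dictionaries from a person identifier to 1 (defrauded) or 0; the function name is hypothetical.

```python
def model_score(true_labels, predicted_labels):
    """Model score d/b: the share of actually defrauded persons the model also flags.

    Both arguments map person id -> label (1 = defrauded, 0 = not defrauded).
    """
    actual = {pid for pid, y in true_labels.items() if y == 1}          # third data set, size b
    predicted = {pid for pid, y in predicted_labels.items() if y == 1}  # fourth data set, size c
    if not actual:
        raise ValueError("test set contains no defrauded persons (b = 0)")
    return len(actual & predicted) / len(actual)  # d / b
```

Because d/b is the recall on the defrauded class, it does not penalize the c - d persons who are wrongly flagged; in the example above the 20 false positives do not affect the score of 0.8.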

Step S107: determining the model joint-score victimization risk value according to the model scores, the prediction models and the victimization coefficient calculation rule.

Specifically, the target person information in the to-be-predicted data set is input into each prediction model, which outputs the person's victimization probability. The model joint-score victimization risk value of the person is calculated as follows:

$$P = \frac{1}{n}\sum_{i=1}^{n} q_i\, m_i$$

where $P$ is the model joint-score victimization risk value, $i$ indexes the prediction models, $q_i$ is the victimization probability output by the $i$-th prediction model, $m_i$ is the model score of the $i$-th prediction model, and $n$ is the number of prediction models. For example, a person's information is input into the logistic regression, decision tree, naive Bayes and neural network models, which output victimization probabilities of 0.6, 0.8, 0.6 and 0.8 respectively. In this embodiment, the model scores of these four models are 0.8, 0.7, 0.9 and 0.85 respectively, so the person's model joint-score victimization risk value is (0.6 × 0.8 + 0.8 × 0.7 + 0.6 × 0.9 + 0.8 × 0.85)/4 = 2.26/4 = 0.565.
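The formula translates directly into a few lines, assuming the per-model probabilities and scores are passed as two equally long lists in the same model order; the function name is hypothetical.

```python
def joint_victimization_risk(probabilities, scores):
    """P = (1/n) * sum(q_i * m_i) over the n prediction models."""
    if not probabilities or len(probabilities) != len(scores):
        raise ValueError("need exactly one model score per victimization probability")
    return sum(q * m for q, m in zip(probabilities, scores)) / len(probabilities)
```

Because the sum is divided by n rather than by the sum of the model scores, P never exceeds the largest model score and reaches 1 only if every model has score 1 and outputs probability 1.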

Step S108: and determining the regional risk coefficient according to the regional fraud case data and the regional risk coefficient rule.

Specifically, according to the data of the region fraud cases, the total number of cases in a plurality of regions, the ratio of the number of telecommunication fraud cases in the region to the total number of cases of telecommunication fraud cases in the region can be obtained, the ratio is recorded as a risk ratio, the regions are sorted according to the risk ratio, the sorted risk ratios form a region proportion number sequence, the median and the standard deviation of the region proportion number sequence are calculated, the median added to twice the standard deviation is recorded as a first risk threshold, the standard deviation subtracted from the median added to twice is recorded as a second risk threshold, and for a certain region, when the risk ratio of the region is greater than or equal to the first risk threshold, the region risk coefficient of the region is the risk ratio of the region added by one; when the risk ratio of the region is smaller than the first risk threshold and larger than the second risk threshold, the region risk coefficient of the region is 1; and when the risk ratio of the region is smaller than the second risk threshold, the region risk coefficient of the region is a value obtained by subtracting the risk ratio of the region. The specific calculation formula is as follows:

$$R_k=\begin{cases}1+r_k, & r_k\ge M+2\sigma\\ 1, & M-2\sigma<r_k<M+2\sigma\\ 1-r_k, & r_k<M-2\sigma\end{cases}$$

where

R_k represents the regional risk coefficient of region k,

r_k represents the risk ratio of region k,

M represents the median of the regional ratio sequence, and

σ represents the standard deviation of the regional ratio sequence.
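The regional risk coefficient rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the risk ratio r_k is read as each region's share of all telecommunication fraud cases across the regions; the standard deviation is taken as the population standard deviation (`statistics.pstdev`), since the text does not say which is meant; and a ratio exactly equal to the second risk threshold, which the text leaves unspecified, is treated like a ratio below it. The function name and the input format (a dict from region to case count) are hypothetical.

```python
import statistics
from typing import Dict


def regional_risk_coefficients(fraud_cases: Dict[str, int]) -> Dict[str, float]:
    """Map each region k to its regional risk coefficient R_k.

    fraud_cases: number of telecommunication fraud cases per region (hypothetical input shape).
    """
    total = sum(fraud_cases.values())
    if total == 0:
        raise ValueError("no telecommunication fraud cases in the regional data")
    # Assumption: risk ratio r_k is the region's share of all fraud cases.
    ratios = {region: count / total for region, count in fraud_cases.items()}
    sequence = sorted(ratios.values())  # the regional ratio sequence
    median = statistics.median(sequence)
    sigma = statistics.pstdev(sequence)  # assumption: population standard deviation
    first_threshold = median + 2 * sigma
    second_threshold = median - 2 * sigma
    coefficients = {}
    for region, r in ratios.items():
        if r >= first_threshold:
            coefficients[region] = 1 + r  # high-risk region: amplify
        elif r > second_threshold:
            coefficients[region] = 1.0  # typical region: neutral
        else:
            # Low-risk region: dampen. r == second_threshold is unspecified
            # in the text; this sketch treats it like r < second_threshold.
            coefficients[region] = 1 - r
    return coefficients


# Nine regions with 5 cases each and one hotspot with 55 cases (100 in total).
cases = {f"R{i}": 5 for i in range(9)}
cases["R9"] = 55
coeffs = regional_risk_coefficients(cases)
print(coeffs["R9"], coeffs["R0"])  # 1.55 1.0
```

In this example the shares are 0.05 and 0.55, the median is 0.05 and the population standard deviation is 0.15, so the first risk threshold is 0.35 and only the hotspot region is amplified.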

Step S109: determine a target risk value according to the regional risk coefficient, the model joint score deception risk value and a risk prediction rule.

Specifically, the target risk value of a person is the product of that person's model joint score deception risk value and the regional risk coefficient of the region to which the person belongs. The specific calculation formula is as follows:

$$T_a = p_a \times R_k$$

where

T_a represents the target risk value of person a,

p_a represents the model joint score deception risk value of person a, and

R_k is the regional risk coefficient of region k, the region to which person a belongs.
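The risk prediction rule can be sketched in Python as follows, applying T_a = p_a × R_k to every person. This is a minimal illustration, assuming the per-person joint values, the person-to-region mapping and the regional coefficients are already available as dictionaries; the function name `target_risk_values`, the region identifiers and the coefficient 1.5 are hypothetical.

```python
from typing import Dict


def target_risk_values(joint_risk: Dict[str, float],
                       person_region: Dict[str, str],
                       region_coefficients: Dict[str, float]) -> Dict[str, float]:
    """Target risk value T_a = p_a * R_k for each person a belonging to region k.

    joint_risk:          p_a, model joint score deception risk value per person
    person_region:       region k to which each person belongs
    region_coefficients: R_k, regional risk coefficient per region
    """
    # A missing region or coefficient raises KeyError instead of defaulting to 1.
    return {person: p * region_coefficients[person_region[person]]
            for person, p in joint_risk.items()}


# Person "a" (joint value 0.565 from the worked example) in a hypothetical
# high-risk region with coefficient 1.5; person "b" in a neutral region.
risks = target_risk_values(
    joint_risk={"a": 0.565, "b": 0.4},
    person_region={"a": "R9", "b": "R0"},
    region_coefficients={"R9": 1.5, "R0": 1.0},
)
print(round(risks["a"], 4), risks["b"])  # 0.8475 0.4
```

Raising on a missing coefficient surfaces gaps in the regional case data rather than silently treating an unknown region as neutral.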

An embodiment of the present application provides a telecommunication fraud risk prediction system 200. Referring to Fig. 2, the telecommunication fraud risk prediction system 200 includes:

the data acquisition module 201, configured to acquire case data, regional risk coefficients and target personnel information, wherein the target personnel information forms a personnel characteristic data set;

the data processing module 202, configured to determine a target data set according to the personnel characteristic data set, the case data and a data processing rule;

the data classification module 203, configured to determine a test data set and a training data set according to the target data set and a data classification rule, wherein the training data set is used to train the models and the test data set is used to score them;

the model training module 204, configured to determine a plurality of prediction models according to a preset training framework and the training data set;

the model scoring module 205, configured to determine a model score for each prediction model according to the plurality of prediction models, the test data set and a model scoring rule;

the personal risk calculation module 206, configured to determine a model joint score deception risk value according to the model scores, the prediction models and a deception coefficient calculation rule; and

the target risk determining module 207, configured to determine a target risk value according to the regional risk coefficient, the model joint score deception risk value and a risk prediction rule.

Those skilled in the art will understand that, for convenience and brevity of description, the specific working process of each module described above can be found in the corresponding process of the foregoing method embodiment and is not repeated here.

An embodiment of the present application discloses an electronic device. Referring to Fig. 3, the electronic device includes a central processing unit (CPU) 301 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage section 307 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for system operation. The CPU 301, the ROM 302 and the RAM 303 are connected to one another via a bus, to which an input/output (I/O) interface 304 is also connected.

The following components are connected to the I/O interface 304: an input section 305 including a keyboard, a mouse and the like; an output section 306 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like; a storage section 307 including a hard disk and the like; and a communication section 308 including a network interface card such as a LAN card or a modem. The communication section 308 performs communication processing via a network such as the Internet. A drive 309 is also connected to the I/O interface 304 as needed. A removable medium 310, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 309 as needed, so that a computer program read from it can be installed into the storage section 307 as needed.

In particular, according to embodiments of the present application, the process described above with reference to the flowchart of Fig. 1 may be implemented as a computer software program. For example, embodiments of the present application include a computer program product comprising a computer program carried on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 308 and/or installed from the removable medium 310. When executed by the central processing unit (CPU) 301, the computer program performs the above-described functions defined in the apparatus of the present application.

It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The above description is only a preferred embodiment of the present application and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the particular combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the application; for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims

1. A telecommunication fraud risk prediction method, comprising:

determining a target data set according to the personnel characteristic data set, the case data and the data processing rule;

determining a model joint score deception risk value according to the model score, the prediction model and a deception coefficient calculation rule;

and determining a target risk value according to the regional risk coefficient, the model joint score deception risk value and a risk prediction rule.

2. The telecommunication fraud risk prediction method of claim 1, wherein determining the target data set according to the personnel characteristic data set, the case data and the data processing rule comprises:

the first data set and the second data set constitute a target data set.

3. The telecommunication fraud risk prediction method of claim 1, wherein determining a model score corresponding to each prediction model according to the plurality of prediction models, the test data set and the model scoring rule comprises:

the information of the b deceived persons forms a third data set;

the information of the c deceived persons forms a fourth data set;

the model score of the predictive model = d/b.

4. The telecommunication fraud risk prediction method of claim 1, wherein determining the model joint score deception risk value according to the model score, the prediction model and the deception coefficient calculation rule comprises:

5. The telecommunication fraud risk prediction method of claim 1, wherein obtaining a regional risk coefficient comprises:

6. The telecommunication fraud risk prediction method of claim 1, wherein determining the target risk value according to the regional risk coefficient, the model joint score deception risk value and the risk prediction rule comprises:

obtaining, according to the model joint score deception risk value, the region of the person whose information corresponds to that model joint score deception risk value;

7. The telecommunication fraud risk prediction method of claim 1, wherein acquiring target personnel information comprises:

obtaining telephone fraud data and phishing data, wherein the telephone fraud data refers to data on fraud committed by means of telephone information, and the phishing data refers to data on phishing committed by means of website content;

8. A telecommunication fraud risk prediction system, comprising:

a data acquisition module (201) for acquiring case data, regional risk coefficients and target personnel information, wherein the target personnel information forms a personnel characteristic data set;

a data processing module (202) for determining a target data set according to the personnel characteristic data set, the case data and a data processing rule;

a data classification module (203) for determining a test data set and a training data set according to the target data set and a data classification rule, wherein the training data set is used for training the models and the test data set is used for scoring the models;

a model training module (204) for determining a plurality of prediction models according to a preset training framework and the training data set;

a model scoring module (205) for determining a model score for each prediction model according to the plurality of prediction models, the test data set and a model scoring rule;

a personal risk calculation module (206) for determining a model joint score deception risk value according to the model scores, the prediction models and a deception coefficient calculation rule; and

a target risk determining module (207) for determining a target risk value according to the regional risk coefficient, the model joint score deception risk value and a risk prediction rule.

9. An electronic device, comprising a memory and a processor, the memory having stored thereon a computer program which can be loaded by the processor and which performs the method of any of claims 1 to 7.

10. A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the method according to any one of claims 1 to 7.