CN110263817B - Risk grade classification method and device based on user account - Google Patents

Publication number
CN110263817B
Authority
CN
China
Prior art keywords
user account
risk
discrete value
data
account data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910448945.2A
Other languages
Chinese (zh)
Other versions
CN110263817A (en
Inventor
王川
祝慧佳
司书强
邓黄健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910448945.2A
Publication of CN110263817A
Application granted
Publication of CN110263817B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification provides a risk classification method and device based on a user account. The method comprises the following steps: acquiring a user account data set, and processing account data corresponding to each user account to obtain first characteristic data and second characteristic data; performing discretization operation on the first characteristic data and the second characteristic data to obtain a first discrete value and a second discrete value of each user account, and establishing a corresponding relation between the first discrete value and the second discrete value; detecting the user accounts according to the first discrete value and the second discrete value to obtain a first risk score of each user account; based on the first discrete value, the second discrete value and the first risk score, performing clustering operation on the user account to obtain a second risk score of the user account under the same corresponding relationship; and dividing the risk grade of the corresponding relation to which the user account belongs according to the second risk score and the corresponding relation.

Description

Risk grade classification method and device based on user account
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for risk classification based on a user account.
Background
With the rapid development of computer and Internet technology, social activities carried out on content-based network platforms, such as instant messaging, forum discussion and online transactions, generate behavior records and publish or propagate data such as corpora; these are collectively referred to herein as social content. While such activities make information exchange fast and convenient, spam is also growing by the day. For example, fraud, gambling and pornographic content is spread through large numbers of posts sent by certain users over a period of time, which pollutes the network environment and puts the content security of the Internet at risk.
In the prior art, risk identification for Internet content relies mainly on keywords, manual review, text models and the like. In actual risk confrontation, however, keywords and manual review fall short in accuracy, coverage and automation, and text models cope poorly with fast-mutating content. The traditional expert scoring method suffers from a cold-start problem when dividing user risk, and dividing by variables that experts pick by intuition is neither rigorous nor scientific, so the cost of explaining the results afterwards is high.
In view of the prior art, a risk grade classification scheme that is efficient, convenient, accurate and rigorous is needed.
Disclosure of Invention
The embodiment of the specification provides a risk classification method and device based on a user account, and aims to solve the problems that in the prior art, the risk classification efficiency is low and the risk classification is not accurate and rigorous enough.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
the method for risk rating classification based on the user account provided by the embodiment of the specification comprises the following steps:
acquiring a preset user account data set, wherein the user account data set comprises user accounts and account data corresponding to the user accounts;
processing the account data to obtain first characteristic data and second characteristic data of each user account;
performing discretization operation on the first characteristic data and the second characteristic data of the user accounts to obtain a first discrete value and a second discrete value of each user account, and establishing a corresponding relation between the first discrete value and the second discrete value of each user account;
detecting the user accounts according to the first discrete value and the second discrete value to obtain a first risk score of each user account;
performing clustering operation on the user accounts based on the first discrete value, the second discrete value and the first risk score of each user account to obtain a second risk score of the user accounts under the same corresponding relation;
and dividing the risk level of the corresponding relation to which the user account belongs according to the second risk score and the corresponding relation.
An embodiment of the present specification provides a risk classification device based on a user account, including:
the acquisition module is used for acquiring a preset user account data set, wherein the user account data set comprises user accounts and account data corresponding to the user accounts;
the processing module is used for processing the account data to obtain first characteristic data and second characteristic data of each user account;
the discrete module is used for performing discretization operation on the first characteristic data and the second characteristic data of the user accounts to obtain a first discrete value and a second discrete value of each user account, and establishing a corresponding relation between the first discrete value and the second discrete value of each user account;
the detection module is used for detecting the user accounts according to the first discrete value and the second discrete value to obtain a first risk score of each user account;
the clustering module is used for performing clustering operation on the user accounts based on the first discrete value, the second discrete value and the first risk score of each user account to obtain a second risk score of the user accounts under the same corresponding relation;
and the dividing module is used for dividing the risk grade of the corresponding relation to which the user account belongs according to the second risk score and the corresponding relation.
An electronic device provided in an embodiment of the present specification includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the risk classification method based on a user account when executing the computer program.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:
a user account data set is obtained, wherein the user account data set comprises user accounts and account data corresponding to the user accounts, and the account data are processed to obtain first characteristic data and second characteristic data; a discretization operation is performed on the first characteristic data and the second characteristic data to obtain a first discrete value and a second discrete value of each user account, and a corresponding relation between the first discrete value and the second discrete value is established; the user accounts are detected according to the first discrete value and the second discrete value to obtain a first risk score of each user account; a clustering operation is performed on the user accounts based on the first discrete value, the second discrete value and the first risk score to obtain a second risk score of the user accounts under the same corresponding relation; and the risk grade of the corresponding relation to which the user account belongs is divided according to the second risk score and the corresponding relation. Based on this scheme, intelligent user risk classification along the account dimension can be realized, the efficiency of risk classification is improved, and the classification is more accurate and rigorous.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic diagram of an overall architecture of a platform involved in a practical application scenario according to the solution of the present specification;
Fig. 2 is a schematic flowchart of a risk classification method based on a user account according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a user account risk classification hierarchy according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a risk classification device based on a user account according to an embodiment of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present disclosure, shall fall within the scope of protection of the present application.
Users publish social content such as messages, voice and articles through network platforms. That content may, however, involve violence, gambling, pornography and the like, which poses a great risk to the content security of the Internet. In the prior art, content risk is identified and prevented mainly by means of keywords, manual review, text models and the like, but in actual risk confrontation keywords and manual review fall short in accuracy, coverage and automation. As for the text models built to screen the various risks, social content mutates cheaply, quickly and in many forms, so the iteration speed of the text models cannot keep up with the mutation speed of the content, and the text models therefore cope poorly with adversarial content.
In the course of this ongoing contest with content risk, it has gradually been found that a small number of accounts repeatedly publish a large amount of risky content, and that most variant content originates from these accounts. Content risk prevention and control along the account dimension has therefore emerged, and it can effectively make up for the shortcomings of content-level prevention and control. For risk division along the account dimension, however, the traditional approach is expert scoring, which has a cold-start problem; moreover, expert skill varies across risk fields and full coverage is hard to achieve. In addition, dividing by variables that experts pick by intuition is neither rigorous nor scientific, and the cost of explaining the results afterwards is high.
Therefore, a systematic methodology is needed for dividing user account risk along the account dimension, so as to achieve efficient, rigorous, scientific and intelligent user risk division and to distinguish user risk at every level.
Fig. 1 is a schematic diagram of an overall architecture of a platform related to the technical solution of the present specification in an actual application scenario. The overall platform architecture comprises at least one social domain, wherein a social domain may be a smaller social domain or a larger social domain composed of a plurality of smaller social domains. For example, all the Internet content involved in a content-based network platform (such as Baidu or Zhihu) can be regarded as one social domain, and each of the smaller domains that make it up within the platform (such as chat domains, social domains and public domains) can also be regarded as a social domain. The application scenario in the embodiment of the present specification may be the division of user risk levels in a social domain in the content security field.
In the embodiment of the specification, a user account data set is acquired from a database of a social domain, and the account data of each user account in the set is processed to obtain first characteristic data and second characteristic data of each user account. A discretization operation is performed on the first characteristic data and the second characteristic data of all the user accounts to obtain a first discrete value and a second discrete value of each user account, and a corresponding relation between the first discrete value and the second discrete value of each user account is established. The user accounts are detected according to the first discrete value and the second discrete value to obtain a first risk score of each user account. After the first risk score is obtained, a clustering operation is performed on the user accounts based on the first discrete value, the second discrete value and the first risk score of each user account to obtain a second risk score of the user accounts under the same corresponding relation. Finally, the risk level of the corresponding relation to which each user account belongs is divided according to the second risk score and the corresponding relation. With this technical scheme, the risk grades of all users in the social domain can be divided intelligently into multiple levels, such as a first risk level, a second risk level and a third risk level, which improves the efficiency of risk grade division and makes the division more accurate and rigorous.
Based on the above-described scenarios, the following describes the embodiments of the present specification in detail.
Fig. 2 is a schematic flow chart of a risk rating method based on a user account according to an embodiment of the present specification, where the method specifically includes the following steps:
in step S210, a predetermined user account data set is obtained, where the user account data set includes each user account and account data corresponding to each user account.
In one or more embodiments of the present description, user account data in one or more social domains may be obtained from a database of the one or more social domains, and a set of the user account data may be used as a user account data set. In practical applications, one social domain may correspond to one database, or a plurality of social domains share one database, so that account data of all user accounts of one social domain may be obtained from one database, or account data of all user accounts of a plurality of social domains may be obtained. Since the database in the social domain stores data of user accounts, a user has at least one user account, and the user who has the account may be referred to as a member, and the user account may be referred to as a member account (i.e., a member identifier).
Specifically, in an embodiment of the present specification, the database of the social domain may be a data warehouse such as ODPS (Open Data Processing Service). ODPS can be used to store and compute structured data in batches, and provides massive data warehouse solutions as well as analysis and modeling services for big data. The data warehouse stores account data for all user accounts of one or more social domains.
In step S220, the account data is processed to obtain first characteristic data and second characteristic data of each user account.
In one or more embodiments of the present specification, after account data of a user account is acquired from the data repository, the first characteristic data and the second characteristic data of each user account may be extracted by preprocessing the account data, such as data cleansing and characteristic derivation. The first characteristic data of the user account may be regarded as a first variable, and the second characteristic data may be regarded as a second variable.
Further, in an embodiment of the present specification, the first variable and the second variable may be combined as follows: when the first variable is the number of violations, the corresponding second variable is the risk concentration; when the first variable is the number of risk exposures, the corresponding second variable is the risk exposure concentration; and when the first variable is the number of posts published, the corresponding second variable is the average number of posts published per day. Combining the variables in this way carries more business value and is consistent with the business logic.
In step S230, a discretization operation is performed on the first feature data and the second feature data of the user accounts to obtain a first discrete value and a second discrete value of each user account, and a corresponding relationship between the first discrete value and the second discrete value of each user account is established.
In one or more embodiments of the present specification, discretization divides data with continuous attributes at breakpoints and assigns the data to different categories. Performing the discretization operation on the first feature data (i.e., the first variable) and the second feature data (i.e., the second variable) thus converts each variable from a specific business value into a discrete value.
Specifically, in an embodiment of the present specification, the discretization may be performed by a quantile method. For example, to discretize the first variable (such as the number of violations), the violation counts of all user accounts over a period of time are sorted in descending order and the variable is discretized at the quintile and the decile. If, for example, the resulting segments are [0, 2), [2, 3) and ≥ 3, the corresponding discrete values are 3, 2 and 1 in sequence.
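The segment example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the breakpoints 2 and 3 are taken from the example segments, while the function name `discretize_violations` and the sample accounts are hypothetical. How the breakpoints are derived from the quantiles is not shown here.

```python
from bisect import bisect_right

# Segment boundaries from the example in the text: [0, 2), [2, 3), >= 3.
BREAKPOINTS = [2, 3]


def discretize_violations(count, breakpoints=BREAKPOINTS):
    """Map a violation count to a discrete value (hypothetical helper).

    Larger counts receive smaller discrete values, matching the
    example mapping [0, 2) -> 3, [2, 3) -> 2, >= 3 -> 1.
    """
    # bisect_right returns 0 for [0, 2), 1 for [2, 3), 2 for >= 3.
    segment = bisect_right(breakpoints, count)
    return len(breakpoints) + 1 - segment


# Illustrative accounts and counts (made up for this sketch).
counts = {"A": 0, "B": 2, "C": 7}
discrete = {account: discretize_violations(c) for account, c in counts.items()}
print(discrete)  # {'A': 3, 'B': 2, 'C': 1}
```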
Further, in an embodiment of the present specification, after obtaining the first discrete value and the second discrete value of each user account, a corresponding relationship between the first discrete value and the second discrete value of each user account may also be established. For example: the discretization values of the first variable and the second variable corresponding to the user account A after discretization are respectively 1 and 4, the corresponding relation between the first discrete value and the second discrete value of the user account A is 1-4, and the user account A belongs to the category of the corresponding relation 1-4.
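A minimal sketch of how such corresponding relations might be represented: each account's pair of discrete values is joined into a key such as "1-4", and accounts sharing a key fall into the same category. The function names and the sample accounts are hypothetical, not part of the specification.

```python
from collections import defaultdict


def correspondence_key(first_discrete, second_discrete):
    """Build the corresponding-relation label, e.g. (1, 4) -> '1-4'."""
    return f"{first_discrete}-{second_discrete}"


def group_by_correspondence(accounts):
    """Group account identifiers by their corresponding relation."""
    groups = defaultdict(list)
    for account_id, (first, second) in accounts.items():
        groups[correspondence_key(first, second)].append(account_id)
    return dict(groups)


# Illustrative data: account -> (first discrete value, second discrete value).
accounts = {"A": (1, 4), "B": (1, 4), "C": (3, 2)}
groups = group_by_correspondence(accounts)
print(groups)  # {'1-4': ['A', 'B'], '3-2': ['C']}
```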
In step S240, the user accounts are detected according to the first discrete value and the second discrete value, so as to obtain a first risk score of each user account.
In one or more embodiments of the present specification, a first risk score (i.e., a first abnormality score) of each user account may be obtained by inputting the first discrete value and the second discrete value of each user account into an abnormality detection model for detection, where the first risk score indicates an abnormality degree, and the higher the score is, the more abnormal the user is.
Further, in an embodiment of this specification, the anomaly detection model is constructed in advance, and the construction process may include: first discretizing standard first characteristic data (the first variable) and standard second characteristic data (the second variable) to obtain the corresponding discrete values, and then inputting the discrete values into an anomaly detection algorithm for training to obtain the anomaly detection model. The standard first variable and the standard second variable may be characteristic data used to evaluate whether a user account is abnormal.
In one or more embodiments of the present disclosure, the anomaly detection algorithm may be the Isolation Forest algorithm (i.e., the iForest anomaly detection algorithm), in which case the anomaly detection model is an Isolation Forest model. Isolation Forest is an unsupervised anomaly detection method: it returns an anomaly score for each instance, and a higher score indicates a more anomalous instance. The algorithm separates instances by randomly selecting records and features, so instances that are separated more easily are more likely to be anomalous.
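To make the isolation idea concrete, the following is a compact pure-Python sketch of an isolation-forest style scorer over pairs of discrete values. It is a simplified illustration rather than the model used in the embodiment: the tree count, sample size, seed and sample data are assumptions, and a node stops splitting as soon as the randomly chosen feature is constant within it.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop when the chosen feature is constant
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf size."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[side], depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return one anomaly score in (0, 1) per row; higher is more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(row, t) for t in trees) / n_trees / norm)
            for row in rows]


# Illustrative (first discrete value, second discrete value) pairs:
# three common patterns plus one rare pair placed last.
rows = [(3, 5)] * 20 + [(3, 4)] * 20 + [(2, 4)] * 10 + [(1, 1)]
scores = isolation_scores(rows)
print(max(range(len(rows)), key=scores.__getitem__))  # index of the rare pair
```

The rare pair is isolated after very few random splits, so its average path is short and its score is the highest, which corresponds to the first risk score being higher for more abnormal accounts.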
In step S250, based on the first discrete value, the second discrete value, and the first risk score of each user account, a clustering operation is performed on the user accounts to obtain a second risk score of the user accounts in the same corresponding relationship.
In one or more embodiments of the present specification, the first discrete value, the second discrete value, and the first risk score of each user account are input into a clustering model for clustering, so that the second risk scores (i.e., the second abnormal scores) of all user accounts in the same corresponding relationship can be obtained.
Further, in an embodiment of this specification, the clustering model is constructed in advance, and the construction process may include: first discretizing standard first characteristic data (the first variable) and standard second characteristic data (the second variable) to obtain the corresponding discrete values, then inputting the discrete values into the trained anomaly detection model to obtain standard first risk scores, and finally training on the discrete values and the standard first risk scores with a clustering algorithm to obtain the clustering model.
In one or more embodiments of the present disclosure, the clustering algorithm may be the CFSFDP (clustering by fast search and find of density peaks) algorithm, in which case the clustering model is a CFSFDP clustering model. The core idea of CFSFDP lies in how it characterizes a cluster center, which has both of the following properties: its own density is high, that is, it is surrounded by neighbors whose density does not exceed its own; and its "distance" from other data points of higher density is relatively large, that is, any two cluster centers should be far apart.
In an embodiment of the present specification, the CFSFDP clustering algorithm may include the following steps:
performing text preprocessing on the data in the data set, and calculating the distance between every two texts;
calculating the local density and the distance of each text according to a manually set cut-off distance d_c and the pairwise distances between the texts;
calculating the product of the local density and the distance of each text in the data set, and determining the cluster center points;
and assigning each remaining data point to the class of the cluster center nearest to it, thereby completing the clustering operation.
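The steps above can be sketched as follows for numeric points (text preprocessing is omitted). This is a minimal illustration under assumptions: Euclidean distance, a neighbor count as the local density, ties in density broken by index, and a caller-supplied number of centers `n_centers`; none of these choices is specified in the text. The final step follows the description here and assigns each point to its nearest center, whereas the published density-peaks method assigns each point to the cluster of its nearest neighbor of higher density.

```python
import math


def density_peak_clusters(points, d_c, n_centers):
    """Return (center indices, cluster label per point)."""
    n = len(points)
    # Step 1: pairwise distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step 2a: local density = number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Step 2b: distance to the nearest point of higher density
    # (equal densities are ordered by index so that one point is "densest").
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(higher) if higher else max(dist[i]))
    # Step 3: centers are the points with the largest density * distance.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
    # Step 4: assign every point to its nearest center.
    labels = [min(range(n_centers), key=lambda k: dist[i][centers[k]])
              for i in range(n)]
    return centers, labels


# Two well-separated illustrative groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
centers, labels = density_peak_clusters(points, d_c=1.5, n_centers=2)
print(centers, labels)  # [0, 5] [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```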
In step S260, the risk levels of the corresponding relationship to which the user account belongs are classified according to the second risk score and the corresponding relationship.
In one or more embodiments of the present specification, because the clustering is centered on the second risk score, each corresponding relationship corresponds to one second risk score; that is, all user accounts under a given corresponding relationship share one second risk score. Sorting the second risk scores in descending order reveals how abnormal each corresponding-relationship group is, and in principle a group with a higher second risk score leans toward a higher risk level. In addition, following the ranking of the second risk scores, corresponding relationships that share the same first discrete value can be assigned to the same risk grade, and the grades are assigned in turn until the risk grade of every corresponding relationship has been determined.
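One possible reading of this ranking step is sketched below: corresponding relationships are sorted by second risk score in descending order, and relationships that share a first discrete value receive the same grade, with grades numbered in the order in which each first discrete value first appears in the ranking. The score values are invented for illustration and both function names are hypothetical.

```python
def rank_correspondences(second_scores):
    """Sort corresponding-relationship keys by second risk score, highest first."""
    return sorted(second_scores, key=second_scores.get, reverse=True)


def grade_by_first_discrete(second_scores):
    """Give relationships with the same first discrete value the same grade."""
    grade_of_first = {}
    ranked = rank_correspondences(second_scores)
    for key in ranked:
        first = key.split("-")[0]
        if first not in grade_of_first:
            grade_of_first[first] = len(grade_of_first) + 1
    return {key: grade_of_first[key.split("-")[0]] for key in ranked}


# Invented second risk scores, one per corresponding relationship.
second_scores = {"1-4": 0.91, "1-5": 0.88, "2-4": 0.74, "3-1": 0.41, "2-1": 0.69}
grades = grade_by_first_discrete(second_scores)
print(grades)  # {'1-4': 1, '1-5': 1, '2-4': 2, '2-1': 2, '3-1': 3}
```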
Continuing with the above specific embodiment, as shown in table 1 below, by arranging the second risk scores in reverse order, the corresponding relationships corresponding to the second risk scores are also arranged in sequence from top to bottom, where the violation number segments may represent first discrete values in the corresponding relationships, and the risk concentration segments represent second discrete values in the corresponding relationships; the higher the second risk score is, the higher the risk level of all the user accounts under the corresponding relationship is, and if the risk level is expressed by color depth, the higher the second risk score is, the darker the corresponding relationship color is.
TABLE 1 (provided as images in the original publication: Figure BDA0002074426650000091 and Figure BDA0002074426650000101)
Further, in an embodiment of the present specification, after the risk levels of the corresponding relationships to which the user accounts belong are divided according to the second risk scores and the corresponding relationships, the first discrete values and the second discrete values in the corresponding relationships may also be mapped according to the corresponding relationships and the risk levels, so as to restore specific service data of the corresponding relationships to which the user accounts belong.
As shown in table 2 below, the violation count segments (segments 1-3) and the risk concentration segments (segments 1-5) in table 1 are layered according to the variable combinations in the corresponding relationships, so that the discrete values of the two-dimensional matrix (violation count and risk concentration) are restored to specific business values. The restoration maps the discretized variables in each corresponding relationship back to specific business values, which makes it convenient to divide the risk levels of the corresponding relationships, and further of each user account, according to specific business data, so that the result can be applied in a more quantitative way.
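The restoration can be pictured as a pair of lookup tables from discrete values back to business-value ranges. In the sketch below the violation segments come from the discretization example given earlier in this description, and the risk concentration segments come from the complete embodiment described later. The numbering of the concentration segments (1 for the highest range) and the right-closed interval ends are assumptions, and the helper name `restore` is hypothetical.

```python
# Discrete value -> violation count range, from the discretization example.
VIOLATION_SEGMENTS = {3: "[0, 2)", 2: "[2, 3)", 1: ">= 3"}

# Discrete value -> risk concentration range. The numbering (1 = highest
# concentration) and the closed right ends are assumptions of this sketch.
CONCENTRATION_SEGMENTS = {
    1: "(0.9, 1]",
    2: "(0.5, 0.9]",
    3: "(0.4, 0.5]",
    4: "(0.1, 0.4]",
    5: "(0, 0.1]",
}


def restore(correspondence):
    """Map a key such as '1-4' back to business-value ranges."""
    first, second = (int(part) for part in correspondence.split("-"))
    return {
        "violations": VIOLATION_SEGMENTS[first],
        "risk_concentration": CONCENTRATION_SEGMENTS[second],
    }


restored = restore("1-4")
print(restored)  # {'violations': '>= 3', 'risk_concentration': '(0.1, 0.4]'}
```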
TABLE 2 (provided as an image in the original publication: Figure BDA0002074426650000102)
In one or more embodiments of the present description, a first risk score is obtained for each user account through the constructed iForest anomaly detection model, and the corresponding relationships of the user accounts are then clustered with the constructed CFSFDP clustering model, centered on the second risk score, to complete the grade division of all user accounts.
A complete embodiment is used below to fully describe the technical solution in this specification in detail, referring to fig. 3, which shows a schematic structural diagram of a user account risk classification hierarchy provided in this specification embodiment, and mainly includes the following contents:
In one embodiment of the present disclosure, all member accounts in the social domain may be divided into three risk levels, namely a first risk level, a second risk level and a third risk level. The first risk level may include black, suspected black and black-gray members, white-gray members may be further divided out of the second risk level, and the third risk level may include suspected white and pure white members, so that member risk is divided into 6 more specific risk grades. Specifically, the process of dividing the risk hierarchy may include the following steps:
The first step: divide the first risk level out of all the user accounts, and further divide the members in the first risk level into black, suspected black and black-gray members. Specifically:
All the user account data in the social domain is taken as the total user account data, and a first user account data set is extracted from it according to a first preset condition. The first preset condition may be that a member's number of violations is greater than a preset number, that is, the member has reached the violation threshold. All the members in the first user account data set can therefore be taken as the first risk level. The user account data set in the above embodiment is this first user account data set; the number of violations is taken as the first feature data of all the members in the first risk level, and the risk concentration as the second feature data.
Further, the number of violations and the risk concentration of all members are discretized to obtain a first discrete value and a second discrete value for each user account. For example, the numbers of violations of the members within a time window (such as 30 days) can be sorted in descending order and then discretized at the fifth and tenth quantile positions; the discretized variable takes the values 1, 2 and 3, each corresponding to one violation-count segment. Then, data distribution exploration shows that the risk concentration has abnormal inflection points at 0.9, 0.5, 0.4 and 0.1, so the risk concentration is discretized at these points into 5 segments: (0.9, 1), (0.5, 0.9), (0.4, 0.5), (0.1, 0.4) and (0, 0.1). Correspondences can be established from these segments, and each correspondence represents a group with similar behavior.
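The discretization just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this specification: the function names are hypothetical, the two rank cut points are read as the top 5% and top 10% of the descending order, and the risk concentration segments are numbered 1 to 5 in the order they are listed, with boundary values assigned to the lower segment.

```python
from bisect import bisect_left

# Inflection points of the risk concentration named in the text.
CONCENTRATION_CUTS = (0.1, 0.4, 0.5, 0.9)


def discretize_by_rank(counts, cut_percents=(5, 10)):
    """Map each violation count to 1, 2 or 3 by its rank in descending order.

    cut_percents is an assumed reading of the two cut points (top 5%, top 10%).
    """
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i], reverse=True)
    values = [0] * n
    for rank, idx in enumerate(order, start=1):
        # Integer arithmetic avoids floating-point edge cases at the cut points.
        if rank * 100 <= cut_percents[0] * n:
            values[idx] = 1
        elif rank * 100 <= cut_percents[1] * n:
            values[idx] = 2
        else:
            values[idx] = 3
    return values


def discretize_concentration(value, cuts=CONCENTRATION_CUTS):
    """Map a risk concentration in [0, 1] to a segment number 1..5.

    Assumed numbering: 1 is (0.9, 1], 2 is (0.5, 0.9], ..., 5 is [0, 0.1].
    """
    # bisect_left counts the cut points strictly below the value.
    return len(cuts) + 1 - bisect_left(cuts, value)


def correspondence(first_value, second_value):
    """Build the correspondence key, e.g. '1-4', from two discrete values."""
    return f"{first_value}-{second_value}"
```

With rank-based cuts, accounts with equal violation counts may fall on either side of a cut point; a production version would need an explicit tie rule.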
The discretized variables are input into the iF anomaly detection model for detection to obtain a first risk score for each member account. Then, the two discretized variables (number of violations and risk concentration) are fused with the first risk score obtained from the iF anomaly detection model, and the members are clustered with the CFSFDP clustering model to obtain the second risk scores of all user accounts under the same correspondence. According to the second risk score and the correspondence, all members in the first risk level can be further divided: for example, members whose variable correspondence is 1-4, 1-5, 1-2 or 1-1 are black members; those whose correspondence is 2-4, 2-5, 2-2 or 2-1 are suspected-black members; and those whose correspondence is 3-4, 3-5, 3-2 or 3-1 are black-gray members.
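To make the clustering step concrete, the sketch below implements a basic density-peaks procedure, on the assumption that CFSFDP here denotes clustering by fast search and find of density peaks. Each point is the triple (first discrete value, second discrete value, first risk score). The function name, the cutoff distance `d_c` and the fixed number of clusters are illustrative assumptions; how the second risk score is read off the resulting clusters is not spelled out in this passage and is not modeled.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Cluster points with a basic density-peaks procedure.

    points: list of equal-length numeric tuples.
    d_c: cutoff distance used for the local density.
    Returns one cluster label (0 .. n_clusters - 1) per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in that order.
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:pos], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Cluster centers: the points with the largest rho * delta.
    # Sorting `order` with a stable sort keeps the densest point first on ties.
    centers = sorted(order, key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label

    # Every other point inherits the label of its nearest denser neighbor,
    # which has already been labeled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels
```

Because the first risk score lies in a much smaller range than the discrete values, a real pipeline would probably rescale the three coordinates before computing distances; that choice is left open here.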
The second step: divide a second risk level. The user account data of the members in the first risk level is removed from the total user account data, a second user account data set (namely, the second risk level) is extracted from the remaining user account data according to a second preset condition, and white-gray members are further divided out of the second risk level. Specifically:
Other members are divided into the second risk level according to the second preset condition. The second preset condition may be that the number of times a member is identified by a text model is greater than a predetermined number (e.g., 30); such members have not reached the violation threshold but carry a certain potential risk. All members in this set may therefore be taken as the second risk level. Here the user account data set of the above embodiments is the second user account data set; the number of risk exposures is the first feature data of all members in the second risk level, and the risk exposure concentration is their second feature data. Finally, all members in the second risk level are further classified as white-gray members using the same method as in the first step, namely detection with the iF anomaly detection model fused with the CFSFDP clustering model.
The third step: divide a third risk level. The user account data of all members in the first and second risk levels is removed from the total user account data, the set of remaining user account data is taken as the third risk level, and suspected-white and pure-white members are further divided out of the third risk level. Specifically:
The number of distributions is taken as the first feature data of all members in the third risk level, and the average daily number of distributions as their second feature data. All members in the third risk level are then further divided into suspected-white and pure-white members using the same method as in the first step, namely detection with the iF anomaly detection model fused with the CFSFDP clustering model.
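The three steps above amount to successive set subtraction on the total user account data, with a different feature pair at each level. A minimal sketch follows; the field names (`violations`, `text_model_hits`), the function name and the thresholds are hypothetical and chosen only for illustration.

```python
# Feature pair (first, second) used at each risk level, per the three steps.
FEATURES_BY_LEVEL = {
    1: ("violation count", "risk concentration"),
    2: ("risk exposure count", "risk exposure concentration"),
    3: ("distribution count", "average daily distribution count"),
}


def split_risk_levels(accounts, violation_limit, text_hit_limit):
    """Split accounts into the three top-level risk sets.

    accounts: dict mapping account id -> dict with the hypothetical keys
    'violations' and 'text_model_hits'.
    """
    # Step 1: accounts whose violations exceed the preset number.
    first = {a for a, d in accounts.items()
             if d["violations"] > violation_limit}
    # Step 2: of the rest, accounts identified by the text model too often.
    rest = set(accounts) - first
    second = {a for a in rest
              if accounts[a]["text_model_hits"] > text_hit_limit}
    # Step 3: everything left over.
    third = rest - second
    return first, second, third
```

Each returned set would then go through the same discretize, detect and cluster procedure, using its own feature pair from `FEATURES_BY_LEVEL`.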
In a practical scenario, all members in a social domain are divided into 3 risk levels and, further, 6 specific risk grades. These grades can be tightly integrated with daily prevention and control: they can serve as evaluation criteria for behavior models, policy audits and the like, and they enable tiered policy control and tiered circuit-breaking.
Based on the same idea, an embodiment of this specification further provides a risk grade classification device based on a user account. Fig. 4 shows such a device provided in an embodiment of this specification, where the device 400 mainly includes:
an obtaining module 401, configured to obtain a predetermined user account data set, where the user account data set includes user accounts and account data corresponding to the user accounts;
a processing module 402, configured to process the account data to obtain first feature data and second feature data of each user account;
a discretization module 403, configured to perform discretization on the first feature data and the second feature data of the user account to obtain a first discrete value and a second discrete value of each user account, and establish a corresponding relationship between the first discrete value and the second discrete value of each user account;
a detection module 404, configured to detect the user accounts according to the first discrete value and the second discrete value, so as to obtain a first risk score of each user account;
a clustering module 405, configured to perform a clustering operation on the user accounts based on the first discrete value, the second discrete value, and the first risk score of each user account, to obtain a second risk score of the user accounts in the same corresponding relationship;
and the dividing module 406 is configured to divide the risk level of the corresponding relationship to which the user account belongs according to the second risk score and the corresponding relationship.
According to an embodiment of the present application, in the apparatus, the obtaining module 401 is specifically configured to obtain, from a database in one or more social domains, user account data in the one or more social domains, and use a set of the user account data as a user account data set.
According to an embodiment of the present application, in the apparatus, the processing module 402 is specifically configured to perform data cleansing and feature derivation on the account data, and extract first feature data and second feature data of each user account.
According to an embodiment of the present application, in the apparatus, the detecting module 404 is specifically configured to input the first discrete value and the second discrete value into an anomaly detection model for detection, so as to obtain a first risk score of each user account.
According to an embodiment of the present application, the apparatus further includes a first constructing module 407, configured to perform discretization processing on standard first feature data and second feature data to obtain discrete values, and train the discrete values based on an anomaly detection algorithm to obtain the anomaly detection model.
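As an illustration of training an anomaly detection model on discrete values, the sketch below builds a small isolation forest in pure Python. Reading the anomaly detection algorithm as an isolation forest is an assumption, and all names and parameters (`fit_forest`, `risk_score`, `n_trees`, `sample_size`) are hypothetical. Scores lie in (0, 1], and rows that are easier to isolate score higher.

```python
import math
import random


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build_tree(rows, depth, max_depth, rng):
    """Recursively isolate rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    # Only features that still vary inside this node can be split on.
    dims = [d for d in range(len(rows[0]))
            if min(r[d] for r in rows) < max(r[d] for r in rows)]
    if not dims:
        return ("leaf", len(rows))
    d = rng.choice(dims)
    split = rng.uniform(min(r[d] for r in rows), max(r[d] for r in rows))
    left = [r for r in rows if r[d] < split]
    right = [r for r in rows if r[d] >= split]
    return ("node", d, split,
            _build_tree(left, depth + 1, max_depth, rng),
            _build_tree(right, depth + 1, max_depth, rng))


def _path_length(tree, row, depth=0):
    """Depth at which row lands, plus the expected depth below a leaf."""
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, d, split, left, right = tree
    return _path_length(left if row[d] < split else right, row, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Train on a list of (first discrete value, second discrete value) rows."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [_build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def risk_score(model, row):
    """Anomaly score in (0, 1]; higher means easier to isolate."""
    trees, size = model
    mean_path = sum(_path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / _c(size))
```

In practice a maintained library implementation would normally be preferred; the pure-Python version is only meant to show what training on the discrete values involves.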
According to an embodiment of the present application, in the apparatus, the clustering module 405 is specifically configured to input the first discrete value, the second discrete value, and the first risk score into a clustering model for clustering, so as to obtain a second risk score of the user account under the same corresponding relationship.
According to an embodiment of the application, the apparatus further includes a second building module 408, configured to discretize the standard first feature data and the second feature data to obtain discrete values, input the discrete values into the anomaly detection model to obtain a standard first risk score, and train the discrete values and the standard first risk score based on a clustering algorithm to obtain the clustering model.
According to an embodiment of the present application, in the apparatus, the dividing module 406 is specifically configured to arrange the second risk scores in order, assign the correspondences that share the same first discrete value to one risk level according to the order of the second risk scores, and continue dividing in this way until the risk levels of all correspondences are determined.
According to an embodiment of the present application, the apparatus further includes a mapping module 409, configured to map the first discrete value and the second discrete value in the correspondence according to the correspondence and the risk level, so as to restore the specific service data of the correspondence to which the user account belongs.
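One possible reading of this dividing rule is sketched below: the correspondences are walked in descending order of second risk score, each new first discrete value opens the next risk level, and correspondences sharing that first discrete value join its level. The function name, the input format and the descending order are assumptions made for illustration.

```python
def assign_levels(second_scores):
    """Assign a risk level to every correspondence.

    second_scores: dict mapping a correspondence key such as '1-4'
    (first discrete value, second discrete value) to its second risk score.
    Returns a dict mapping each correspondence key to a level 1, 2, 3, ...
    """
    # Arrange correspondences by second risk score, highest first (assumed).
    ranked = sorted(second_scores, key=second_scores.get, reverse=True)
    level_of_first_value = {}
    levels = {}
    for key in ranked:
        first_value = key.split("-")[0]
        # A first discrete value seen for the first time opens the next level.
        if first_value not in level_of_first_value:
            level_of_first_value[first_value] = len(level_of_first_value) + 1
        levels[key] = level_of_first_value[first_value]
    return levels
```

For the first risk level in the example above, levels 1, 2 and 3 would correspond to black, suspected-black and black-gray members.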
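The mapping step can be pictured as a lookup from discrete values back to the business ranges they were cut from. The sketch below assumes the risk concentration segments are numbered 1 to 5 in the order listed earlier in this description; the table names, the segment labels for the violation counts and the function name are hypothetical.

```python
# Hypothetical lookup tables from discrete values back to business ranges.
VIOLATION_SEGMENTS = {
    1: "violation-count segment 1",
    2: "violation-count segment 2",
    3: "violation-count segment 3",
}
CONCENTRATION_SEGMENTS = {
    1: (0.9, 1.0),
    2: (0.5, 0.9),
    3: (0.4, 0.5),
    4: (0.1, 0.4),
    5: (0.0, 0.1),
}


def restore_business_data(correspondence, risk_level):
    """Translate a correspondence key such as '1-4' into readable ranges."""
    first_value, second_value = (int(x) for x in correspondence.split("-"))
    return {
        "risk_level": risk_level,
        "violations": VIOLATION_SEGMENTS[first_value],
        "risk_concentration": CONCENTRATION_SEGMENTS[second_value],
    }
```

A reviewer looking at level 1 and correspondence "1-4" would then see the underlying violation segment and concentration range rather than bare codes.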
An embodiment of the present specification further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for risk classification based on a user account when executing the program.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the embodiments of the apparatus, the electronic device and the nonvolatile computer storage medium are substantially similar to the method embodiments, they are described briefly, and for relevant details reference may be made to the corresponding parts of the method embodiments.
The apparatus, the electronic device, the nonvolatile computer storage medium and the method provided in the embodiments of the present description correspond to each other, and therefore, the apparatus, the electronic device, and the nonvolatile computer storage medium also have similar advantageous technical effects to the corresponding method.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
All the embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described briefly, and for relevant details reference may be made to the corresponding parts of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. A risk grade classification method based on a user account, comprising the following steps:
acquiring a preset user account data set, wherein the user account data set comprises user accounts and account data corresponding to the user accounts;
processing the account data to obtain first characteristic data and second characteristic data of each user account;
performing discretization operation on the first characteristic data and the second characteristic data of the user accounts to obtain a first discrete value and a second discrete value of each user account, and establishing a corresponding relation between the first discrete value and the second discrete value of each user account;
detecting the user accounts according to the first discrete value and the second discrete value to obtain a first risk score of each user account, which specifically includes: inputting the first discrete value and the second discrete value into an anomaly detection model for detection to obtain a first risk score of each user account;
performing clustering operation on the user accounts based on the first discrete value, the second discrete value and the first risk score of each user account to obtain a second risk score of the user accounts under the same corresponding relationship, specifically comprising: inputting the first discrete value, the second discrete value and the first risk score into a clustering model for clustering to obtain a second risk score of the user account under the same corresponding relation;
and dividing the risk level of the corresponding relation to which the user account belongs according to the second risk score and the corresponding relation.
2. The method of claim 1, wherein the obtaining a predetermined set of user account data comprises:
and acquiring user account data in one or more social domains from databases of the one or more social domains, and taking the set of the user account data as a user account data set.
3. The method of claim 1, wherein the processing the account data to obtain first characteristic data and second characteristic data of each user account comprises:
and performing data cleaning and characteristic derivation on the account data, and extracting first characteristic data and second characteristic data of each user account.
4. The method of claim 1, further comprising constructing the anomaly detection model, in particular,
discretizing the standard first feature data and the standard second feature data to obtain discrete values, and training the discrete values based on an anomaly detection algorithm to obtain the anomaly detection model.
5. The method of claim 1, further comprising, constructing the clustering model, in particular,
discretizing the standard first characteristic data and the standard second characteristic data to obtain discrete values, inputting the discrete values into an anomaly detection model to obtain standard first risk scores, and training the discrete values and the standard first risk scores based on a clustering algorithm to obtain the clustering model.
6. The method of claim 1, wherein the classifying the risk level of the corresponding relationship to which the user account belongs according to the second risk score and the corresponding relationship comprises:
and arranging the second risk scores in sequence, dividing the corresponding relations with the same first discrete values into first risk grades according to the arrangement sequence of the second risk scores, and sequentially dividing until the risk grades of all the corresponding relations are determined.
7. The method according to claim 1, after the dividing the risk level of the corresponding relationship to which the user account belongs according to the second risk score and the corresponding relationship, further comprising:
and mapping the first discrete value and the second discrete value in the corresponding relationship according to the corresponding relationship and the risk level so as to restore the specific service data of the corresponding relationship to which the user account belongs.
8. The method of claim 1, comprising:
acquiring user account data in one or more social domains from databases of the one or more social domains as total user account data, extracting the user account data meeting a first preset condition from the total user account data, and taking a set of the user account data meeting the first preset condition as a first user account data set; and when the user account data set is a first user account data set, taking the violation times as the first characteristic data and taking the risk concentration as the second characteristic data.
9. The method of claim 8, further comprising:
removing user account data in the first user account data set from the total user account data, and taking a set of user account data meeting a second preset condition in the remaining user account data as a second user account data set; and when the user account data set is a second user account data set, taking the risk exposure times as the first characteristic data and taking the risk exposure concentration as the second characteristic data.
10. The method of claim 9, further comprising:
removing user account data in the first user account data set and the second user account data set from the total user account data, and taking the rest user account data set as a third user account data set; and when the user account data set is a third user account data set, taking the distribution times as the first characteristic data and taking the average daily distribution times as the second characteristic data.
11. A risk grade classification apparatus based on a user account, the apparatus comprising:
an acquisition module, configured to acquire a preset user account data set, wherein the user account data set comprises user accounts and account data corresponding to the user accounts;
the processing module is used for processing the account data to obtain first characteristic data and second characteristic data of each user account;
the discretization module is used for performing discretization operation on the first characteristic data and the second characteristic data of the user accounts to obtain a first discrete value and a second discrete value of each user account, and establishing a corresponding relation between the first discrete value and the second discrete value of each user account;
the detection module is configured to detect the user accounts according to the first discrete value and the second discrete value to obtain a first risk score of each user account, and specifically includes: inputting the first discrete value and the second discrete value into an anomaly detection model for detection to obtain a first risk score of each user account;
the clustering module is configured to perform clustering operation on the user accounts based on the first discrete value, the second discrete value, and the first risk score of each user account to obtain a second risk score of the user accounts under the same corresponding relationship, and specifically includes: inputting the first discrete value, the second discrete value and the first risk score into a clustering model for clustering to obtain a second risk score of the user account under the same corresponding relation;
and the dividing module is used for dividing the risk grade of the corresponding relation to which the user account belongs according to the second risk score and the corresponding relation.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the risk grade classification method based on a user account of any one of claims 1 to 10 when executing the program.
CN201910448945.2A 2019-05-28 2019-05-28 Risk grade classification method and device based on user account Active CN110263817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910448945.2A CN110263817B (en) 2019-05-28 2019-05-28 Risk grade classification method and device based on user account

Publications (2)

Publication Number Publication Date
CN110263817A CN110263817A (en) 2019-09-20
CN110263817B true CN110263817B (en) 2023-04-07

Family

ID=67915546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910448945.2A Active CN110263817B (en) 2019-05-28 2019-05-28 Risk grade classification method and device based on user account

Country Status (1)

Country Link
CN (1) CN110263817B (en)

Similar Documents

Publication Publication Date Title
CN110400169B (en) Information pushing method, device and equipment
CN107391545B (en) Method for classifying users, input method and device
CN109508879B (en) Risk identification method, device and equipment
CN110263817B (en) Risk grade classification method and device based on user account
CN108596410B (en) Automatic risk control event processing method and device
CN111538794B (en) Data fusion method, device and equipment
CN110674188A (en) Feature extraction method, device and equipment
CN108764915B (en) Model training method, data type identification method and computer equipment
CN111080304A (en) Credible relationship identification method, device and equipment
CN110263157A (en) Data risk prediction method, device and equipment
CN110633989A (en) Method and device for determining risk behavior generation model
TW201923629A (en) Data processing method and apparatus
CN112672184A (en) Video auditing and publishing method
CN111782637A (en) Model construction method, device and equipment
CN109300041A (en) Typical karst ecosystem recommendation method, electronic device and readable storage medium
CN112132238A (en) Method, device, equipment and readable medium for identifying private data
CN105989066A (en) Information processing method and device
CN109492401B (en) Content carrier risk detection method, device, equipment and medium
CN111259975B (en) Method and device for generating classifier and method and device for classifying text
CN111723280B (en) Information processing method and device, storage medium and electronic equipment
CN110175733B (en) Public opinion information processing method and server
CN108595395B (en) Nickname generation method, device and equipment
CN110008352A (en) Entity discovery method and device
CN113255857B (en) Risk detection method, device and equipment for graphic code
CN112528021B (en) Model training method, model training device and intelligent equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201016

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201016

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant