WO2018192348A1 - Data processing method, apparatus, and server - Google Patents

Data processing method, apparatus, and server

Info

Publication number
WO2018192348A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
target asset
target
data
behavior
Application number
PCT/CN2018/080842
Inventor
郑巧玲
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018192348A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Definitions

  • The present invention relates to the field of data processing technologies, and in particular to a data processing method, apparatus, and server.
  • User asset status is part of the user information, like basic attributes such as the user's age, gender, country, province, and city, and is important for describing a user. It is widely used in scenarios such as user profile generation and information recommendation, so optimizing how user asset status is mined is of great significance for implementing those scenarios well. User asset status indicates whether the user owns a given asset, for example whether the user owns real estate or a car.
  • an embodiment of the present invention provides a data processing method, apparatus, and server, to improve processing efficiency of user asset state mining.
  • the embodiment of the present invention provides the following technical solutions:
  • A data processing method is applied to a server, the method comprising:
  • acquiring, from at least one data source, user features of a user to be mined;
  • retrieving a pre-trained target asset state prediction model, the target asset state prediction model being trained according to user features of positive sample users and negative sample users acquired from the at least one data source, wherein the likelihood that a positive sample user has the target asset is greater than the likelihood that a negative sample user has the target asset, and the user features include at least user behavior features;
  • predicting, according to the user features of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset;
  • if the probability that the user to be mined has the target asset is greater than a probability threshold, determining that the user to be mined has the target asset.
  • the embodiment of the invention further provides a data processing device, which is applied to a server, and the data processing device includes:
  • a feature acquisition module configured to acquire, from at least one data source, user features of the user to be mined;
  • a model retrieval module configured to retrieve a pre-trained target asset state prediction model, wherein the target asset state prediction model is trained according to user features of positive sample users and negative sample users acquired from the at least one data source; the likelihood that a positive sample user has the target asset is greater than the likelihood that a negative sample user has the target asset; the user features include at least user behavior features;
  • a probability prediction module configured to predict, according to the user features of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset;
  • a first result determining module configured to determine that the user to be mined has the target asset if the probability that the user to be mined has the target asset is greater than a probability threshold.
  • the embodiment of the invention further provides a server, comprising the data processing device described above.
  • Embodiments of the present invention can train the target asset state prediction model according to at least the behavior features of positive sample users and negative sample users in the corresponding data sources. When target asset state mining is performed for a user to be mined, the user features of that user are obtained from the at least one data source, the target asset state prediction model predicts the probability that the user has the target asset, and the user is determined to have the target asset when that probability is greater than a probability threshold, thereby mining the target asset state.
  • Because the model is trained on at least user behavior features and predicts the probability that a user has the target asset, the target asset state is mined automatically, with no need to manually query user asset data at institutions such as banks, housing authorities, and vehicle management offices. This improves the processing efficiency of user asset state mining.
  • Moreover, obtaining user asset data from banks, housing authorities, vehicle management offices, and similar institutions requires their authorization and consent. Embodiments of the present invention can mine the target asset state using at least the user behavior features recorded in data sources such as social and search platforms, which reduces the restrictions on using this mining approach.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for training a target asset state prediction model according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for determining a target user according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for determining a score of a primary user according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of processing according to an embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a block diagram showing another structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a block diagram showing still another structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 9 is a block diagram showing another structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a hardware structure of a server according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention.
  • the method may be applied to a server.
  • The server may be a service device with data processing capability located on the network side, or a computing device with data processing capability located on the user side, such as a PC (personal computer).
  • the method can include:
  • Step S100: Acquire user features of the user to be mined from at least one data source.
  • The user to be mined is the user for whom the target asset is to be mined; that is, the embodiment needs to determine whether this user has the target asset, so as to identify users who have the target asset.
  • The user features of the user to be mined include the user behavior features of the user to be mined.
  • The user features of the user to be mined may further include at least one of the user's basic attributes (such as age, gender, and education) and interest features.
  • The user features of the user to be mined may be obtained from at least one data source according to the user's user ID, identity card number, mobile phone number, and the like.
  • the data source may be an application platform on which the user data is recorded, such as a social platform or a search platform.
  • Such an application platform can provide a user registration function (when registering, the user can fill in basic attributes such as age, gender, and education), and for a registered user, the platform can record the corresponding user behavior data according to the registered user's behavior on the platform.
  • User behavior data refers to data that characterizes user behavior.
  • the user behavior data may be behavior data of a purchase, a search, and the like implemented by the user on the e-commerce platform, and the user behavior data may also be social behavior data implemented by the user on the social platform.
  • the user behavior data can also be search behavior data that the user implements in the search engine.
  • the user behavior data recorded by the data source may exist in the form of a user behavior log, and the user behavior characteristics may be determined through the user behavior data in the user behavior log.
  • the user's interest characteristics may also be analyzed based on the user behavior data generated by the historical behavior of the registered user for a period of time.
  • The user features may be obtained from data sources such as social platforms and search platforms over the network; alternatively, if the data source and the server belong to the same service provider, the server may obtain the user features of the user to be mined through an interface of the application platform corresponding to the data source.
  • The server and the application platform corresponding to the data source may use the same account system, and the server may access the user accounts of that application platform, so that the user to be mined can log in to the server with the user account registered on the application platform.
  • Step S110: Retrieve a pre-trained target asset state prediction model, where the model is trained according to user features of positive sample users and user features of negative sample users acquired from the at least one data source; the likelihood that a positive sample user has the target asset is greater than the likelihood that a negative sample user has the target asset; the user features include at least user behavior features.
  • The embodiment may pre-train a target asset state prediction model that can predict the probability that a given user has the target asset. During training, multiple pieces of user behavior data can be obtained from the at least one data source.
  • By analyzing these pieces of user behavior data, positive sample users and negative sample users are selected from the users to whom the data corresponds. The user features of the positive sample users and of the negative sample users in the at least one data source are then obtained, and the target asset state prediction model is trained by a machine learning method.
  • A positive sample user may be a user who, based on analysis of the user behavior data, is more likely to have the target asset; "more likely" is meant in the statistical sense, that is, the user has a high probability of having the target asset. Correspondingly, the probability that a negative sample user has the target asset is lower than the probability that a positive sample user has it.
  • the user feature of the positive sample user may include the user behavior feature of the positive sample user.
  • the user feature of the positive sample user may further include at least one of a user base attribute, an interest feature, and the like of the positive sample user.
  • the user feature of the negative sample user may include the user behavior feature of the negative sample user, and may also include at least one of a user base attribute, an interest feature, and the like of the negative sample user.
  • Step S120: Predict, according to the user features of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset.
  • The user features of the user to be mined are used as input data and fed into the target asset state prediction model.
  • the target asset state prediction model can be used to predict the probability that the user to be mined has the target asset.
  • Step S130: If the probability that the user to be mined has the target asset is greater than a probability threshold, determine that the user to be mined has the target asset.
  • The embodiment may set a lower limit on the probability of having the target asset and use it as the probability threshold. When the probability predicted by the target asset state prediction model for the user to be mined is greater than the probability threshold, the user is considered to have the target asset, which completes the target asset state mining.
  • In summary, the embodiment can train the target asset state prediction model according to at least the behavior features of positive sample users and negative sample users in the corresponding data sources. When target asset state mining is performed for a user to be mined, the probability that the user has the target asset is predicted from the user's features in the at least one data source and the target asset state prediction model, and the user is determined to have the target asset when that probability is greater than the probability threshold.
  • Because the target asset state prediction model predicts the probability that a user has the target asset, the target asset state is mined automatically, with no need to manually query user asset data at institutions such as banks, housing authorities, and vehicle management offices. This improves the processing efficiency of user asset state mining.
  • Moreover, obtaining user asset data from banks, housing authorities, vehicle management offices, and similar institutions requires their authorization and consent. The embodiment can mine the target asset state using at least the user behavior features recorded in data sources such as social and search platforms, so it depends less on such authorization and there are fewer restrictions on using user asset state mining.
  • If the probability predicted in step S120 that the user to be mined has the target asset is less than or equal to the probability threshold, it may be determined that the user to be mined does not have the target asset.
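The decision in steps S120 to S130 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Model` type, the `toy_model` stand-in, and the feature name `car_insurance_searches` are all hypothetical.

```python
from typing import Callable, Mapping

# Hypothetical type: any callable that maps a feature dict to a probability in [0, 1].
Model = Callable[[Mapping[str, float]], float]


def has_target_asset(features: Mapping[str, float], model: Model,
                     threshold: float = 0.5) -> bool:
    """Steps S120-S130: predict the probability, then compare it with the threshold."""
    probability = model(features)
    # Strictly greater than the threshold: the user is taken to have the asset.
    # Less than or equal to the threshold: the user is taken not to have it.
    return probability > threshold


def toy_model(features: Mapping[str, float]) -> float:
    """Toy stand-in for a trained prediction model (assumption, illustration only)."""
    return min(1.0, 0.25 * features.get("car_insurance_searches", 0.0))
```

With this toy model, three matching searches give a probability of 0.75, which is above a threshold of 0.5, while two searches give exactly 0.5, which is not strictly greater and is therefore treated as "does not have the asset".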
  • the training process of the target asset state prediction model is introduced below.
  • The main idea of training the target asset state prediction model is to select positive sample users, who are more likely to have the target asset, and negative sample users, who are less likely to have it, and to train the model using the user features of the positive and negative sample users in at least one data source as the input features for model training.
  • FIG. 2 shows a method flow for training a target asset state prediction model, which can be applied to a server, which mainly implements selection of a positive sample user by matching processing of user behavior features;
  • the method may include:
  • Step S200: Determine target users from a user set; the user behavior features of a target user match the predetermined positive feature words of the target asset and do not match the predetermined negative feature words of the target asset.
  • The user set is the set of users corresponding to the behavior data set (comprising multiple pieces of behavior data) collected from the at least one data source.
  • A positive feature word of the target asset is a keyword that represents the target asset and describes a positive feature of it (a feature that matches the target asset). Taking car ownership as the target asset, for example, the positive feature words can be "car friends, car insurance, traffic violation, car loan, selling a car" and so on.
  • A negative feature word of the target asset is a filter word for the target asset and describes a negative feature of it (a feature that does not match the target asset). For car ownership, the negative feature words can be "car rental, driving school, buying a car" and so on.
  • The positive and negative feature words of the target asset can be preset, for example according to experience; the preset words are the predetermined positive feature words and predetermined negative feature words referred to here.
  • The feature words given above for car ownership are only examples; the positive and negative feature words of a target asset can be very numerous. The embodiment can enumerate as many positive and negative feature words of the target asset as possible, so that the determination of target users is as accurate as possible.
  • After obtaining the user behavior data of the users in the user set, the embodiment may match the user behavior features represented by each user's behavior data against the predetermined positive feature words and the predetermined negative feature words of the target asset respectively, and determine as target users those users whose behavior features match the positive feature words and do not match the negative feature words.
  • Taking car ownership as the target asset, the feature words in a target user's behavior features should match positive feature words such as "car friends, car insurance, traffic violation, car loan, selling a car" and should not match negative feature words such as "car rental, driving school, buying a car".
  • Preliminary users can be determined by matching against the positive feature words alone. However, a user's behavior data may match not only positive feature words but also some negative feature words, so filtering by negative feature words makes the determination of target users more accurate.
  • Again taking car ownership as the example, the preliminary users matched by the positive feature words may include users who do not own a car but merely want to learn about vehicles. These users are filtered out using negative feature words such as "car rental, driving school, buying a car", which are very likely unrelated to owning a car, and the users who more reliably own a car are retained.
  • In this way, user behavior data that interferes with the determination of target users is filtered out, and the resulting set of target users who own a car is more accurate.
  • The process of determining target users may also rely on the predetermined positive feature words alone, without the predetermined negative feature words; that is, the embodiment may directly take the users in the user set whose user behavior features match the predetermined positive feature words of the target asset as target users.
  • Filtering out interfering users with the predetermined negative feature words, after first determining the users whose behavior features match the positive feature words, is only an optional way to improve the accuracy of the target users. Provided the positive feature words are set reasonably, directly determining the users whose behavior features match them as target users still yields a reasonably accurate result.
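The feature-word matching described above can be sketched as simple substring matching over each user's behavior text. This is a minimal sketch under stated assumptions: the English word lists are illustrative renderings of the examples in the text, the substring test stands in for whatever matching the embodiment actually uses, and `select_target_users` and its parameters are hypothetical names.

```python
from typing import Dict, Iterable, List

# Illustrative word lists for the "owns a car" target asset (assumed renderings).
POSITIVE_WORDS = ["car insurance", "traffic violation", "car loan", "selling a car"]
NEGATIVE_WORDS = ["car rental", "driving school", "buying a car"]


def matches_any(text: str, words: Iterable[str]) -> bool:
    """Return True if any feature word occurs in the behavior text."""
    lowered = text.lower()
    return any(word in lowered for word in words)


def select_target_users(behavior_text_by_user: Dict[str, List[str]],
                        positive: Iterable[str] = POSITIVE_WORDS,
                        negative: Iterable[str] = NEGATIVE_WORDS,
                        use_negative_filter: bool = True) -> List[str]:
    """Keep users who hit a positive word and, optionally, hit no negative word."""
    positive, negative = list(positive), list(negative)
    targets = []
    for user_id, texts in behavior_text_by_user.items():
        hit_positive = any(matches_any(t, positive) for t in texts)
        hit_negative = any(matches_any(t, negative) for t in texts)
        if hit_positive and not (use_negative_filter and hit_negative):
            targets.append(user_id)
    return targets


example_users = {
    "u1": ["renew car insurance"],
    "u2": ["car loan rates", "driving school near me"],
    "u3": ["weather today"],
}
```

Setting `use_negative_filter=False` corresponds to the optional variant that relies on positive feature words alone: in the example data, `u2` is then kept even though it also matches "driving school".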
  • Step S210: Use the target users as the positive sample users for training the target asset state prediction model, and select at least one user other than the positive sample users from the user set as the negative sample users for training the model.
  • Since the user behavior features of a target user match the predetermined positive feature words of the target asset and do not match the predetermined negative feature words, the target user is more likely to have the target asset and can therefore be used as a positive sample user for training the model.
  • At least one user other than the positive sample users may be selected from the user set as the negative sample users for training the model.
  • The way negative sample users are selected is not limited in this embodiment: they may be selected at random from the users in the user set other than the positive samples, or selected from those users according to a sample distribution rule.
  • the ratio of the positive sample user and the negative sample user may be set according to actual conditions.
  • the ratio of the positive sample user to the negative sample user may be 1:1, or N:1, etc., where N is a set value.
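Random selection of negative sample users at a configured ratio might look like the sketch below. The function name, the `positives_per_negative` parameter (the N in an N:1 ratio), and the fixed seed are assumptions for illustration; the text leaves the selection method open.

```python
import random
from typing import List, Sequence, Set


def sample_negative_users(user_set: Sequence[str], positive_users: Set[str],
                          positives_per_negative: float = 1.0,
                          seed: int = 0) -> List[str]:
    """Randomly pick negative sample users from the users that are not positive samples.

    positives_per_negative is N in a positive:negative ratio of N:1.
    """
    candidates = [u for u in user_set if u not in positive_users]
    # At least one negative user is requested, capped by how many candidates exist.
    wanted = max(1, round(len(positive_users) / positives_per_negative))
    count = min(wanted, len(candidates))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(candidates, count)
```

A ratio of 1:1 with four positive users requests four negative users; a ratio of 2:1 requests two.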
  • Step S220: Acquire, from the at least one data source, the user features of the positive sample users and the user features of the negative sample users.
  • After the positive sample users and negative sample users are determined, the embodiment may obtain their user features from the at least one data source.
  • The user features of a positive sample user include at least the user behavior features of that user; optionally, the user's basic attributes (such as age, gender, and education), interest features, and the like may also be used as user features.
  • Likewise, the user features of a negative sample user include at least the user behavior features of that user; optionally, the basic attributes, interest features, and the like of the negative sample user may also be included.
  • the specific form of the user feature can be defined according to the actual situation.
  • Step S230: Train the target asset state prediction model by a machine learning method according to the user features of the positive sample users and the user features of the negative sample users.
  • The embodiment may use the user features of the positive sample users and of the negative sample users as the input data of the machine learning method, and train the target asset state prediction model with that method.
  • The machine learning methods that may be used include Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and the boosting model xgboost (eXtreme Gradient Boosting), among others.
  • During training, the parameters of the model may be tuned; for the xgboost model, for example, the tree depth, the shrinkage step size (also called the learning rate, denoted eta), and the number of iterations may be adjusted.
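To make the role of tunable training parameters concrete, the sketch below trains a logistic regression (one of the listed methods) by plain gradient descent in pure Python. It is not xgboost: `learning_rate` and `iterations` here only play a role analogous to eta and the number of iterations. The two-feature toy data (counts of positive-word and negative-word matches per user) is invented for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights: Sequence[float], x: Sequence[float]) -> float:
    """weights holds one weight per feature followed by a bias term."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])


def train_logistic_regression(features: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              learning_rate: float = 0.1,
                              iterations: int = 500) -> List[float]:
    """Batch gradient descent on the average log loss."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)  # last entry is the bias
    for _ in range(iterations):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            error = predict_probability(weights, x) - y
            for j, v in enumerate(x):
                grad[j] += error * v
            grad[-1] += error
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad)]
    return weights


# Toy training data (assumption): [positive-word hits, negative-word hits] per user.
X = [[3, 0], [2, 0], [4, 1], [0, 2], [0, 0], [1, 3]]
y = [1, 1, 1, 0, 0, 0]  # 1 = positive sample user, 0 = negative sample user
weights = train_logistic_regression(X, y)
```

Changing `learning_rate` or `iterations` and re-checking the quality metrics is the kind of parameter adjustment the paragraph above describes.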
  • Model quality can be judged from metrics of the model output such as the area under the curve (AUC), error rate, recall, and precision, and the training result can be optimized accordingly.
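The quality metrics named above can be computed from held-out labels and predicted probabilities as in this sketch. The function name and the pairwise formulation of AUC (the fraction of positive/negative pairs ranked correctly, ties counted as one half) are choices made here for brevity, not taken from the source.

```python
from typing import Dict, Sequence


def evaluate(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute precision, recall, error rate, and AUC for binary predictions."""
    predictions = [1 if s > threshold else 0 for s in scores]
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    error_rate = (fp + fn) / len(labels)

    # AUC as the probability that a random positive outscores a random negative.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"precision": precision, "recall": recall,
            "error_rate": error_rate, "auc": auc}
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small validation set; a rank-based computation would be preferable at scale.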
  • To determine the target users, the embodiment may preprocess the collected user behavior data to obtain multiple user behavior records, and then match the user behavior features represented by the user behavior records against the predetermined positive feature words and negative feature words of the target asset.
  • FIG. 3 is a flowchart of a method for determining a target user according to an embodiment of the present invention.
  • the method may be applied to a server.
  • the method may include:
  • Step S300: Acquire multiple pieces of user behavior data collected from at least one data source; the users corresponding to these pieces of user behavior data make up the user set.
  • The user behavior data may be in the form of user behavior logs.
  • The embodiment may collect massive user behavior data from at least one data source. The user behavior data describes the behavior of users on the application platform corresponding to the data source, and the users involved in the collected data correspond to the user set described above.
  • Step S310: Preprocess the multiple pieces of user behavior data to obtain preprocessed user behavior data, and extract the user behavior record corresponding to each piece of preprocessed user behavior data to obtain multiple user behavior records; a user behavior record represents a user's behavior feature at a point in time.
  • After collecting the user behavior data, the embodiment may preprocess it and then extract the user behavior record corresponding to each piece of preprocessed user behavior data, obtaining multiple user behavior records.
  • The number of user behavior records is not greater than the number of collected pieces of user behavior data, and one user behavior record corresponds to one piece of preprocessed user behavior data.
  • Preprocessing may consist of deleting user behavior data that is data noise and/or filling in missing values in the user behavior data.
  • That is, the embodiment may delete from the collected user behavior data the pieces that are data noise, and/or fill in attribute values for the pieces that are missing attribute values.
  • User behavior data that is data noise refers to user behavior data containing incorrect attribute values or outlier attribute values that deviate from what is expected. Noise in user behavior data can have several causes, such as equipment failure during data collection, errors in data input, errors in data transmission, or damage to the storage medium.
  • Such data can be deleted during preprocessing. For example, suppose the collected user behavior data contains a piece whose time attribute is the year 2050. Since the current year has not yet reached 2050, this piece was probably produced by a wrong year at data input or by a device fault, and it should be deleted during preprocessing. This is only one example of behavior data containing an incorrect or outlier attribute value.
  • For user behavior data with missing attribute values, an optional treatment is to fill in the missing values, for example with a predetermined value. If a piece of user behavior data lacks the value of the age attribute, the embodiment may fill it with a predetermined age value.
  • Filling in a missing age value is only an illustration; in actual use, the attribute types whose missing values need to be filled can be chosen as needed.
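The two preprocessing operations above, deleting noisy rows and filling missing attribute values, can be sketched as follows. The row layout (dicts with `time` and `age` keys), the rule "a timestamp in the future is noise", and the fill value `DEFAULT_AGE` are assumptions chosen to mirror the 2050 and missing-age examples.

```python
from datetime import datetime
from typing import Dict, List, Optional

DEFAULT_AGE = 30  # hypothetical predetermined fill value


def preprocess(rows: List[Dict], now: Optional[datetime] = None,
               default_age: int = DEFAULT_AGE) -> List[Dict]:
    """Drop rows whose timestamp lies in the future and fill missing ages."""
    now = now or datetime.now()
    cleaned = []
    for row in rows:
        # Noise: a behavior time later than "now" cannot be a real observation.
        if row["time"] > now:
            continue
        fixed = dict(row)  # copy so the input rows are left untouched
        if fixed.get("age") is None:
            fixed["age"] = default_age
        cleaned.append(fixed)
    return cleaned


raw_rows = [
    {"user_id": "u1", "time": datetime(2018, 1, 5), "age": 34},
    {"user_id": "u2", "time": datetime(2050, 1, 1), "age": 28},   # noisy year
    {"user_id": "u3", "time": datetime(2018, 2, 9), "age": None},  # missing age
]
```

Calling `preprocess(raw_rows, now=datetime(2018, 3, 28))` keeps `u1` unchanged, drops `u2`, and returns `u3` with the age filled in, so the output is never longer than the input.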
  • After preprocessing, the number of pieces of user behavior data may be smaller than the number in the originally collected user behavior data set.
  • The embodiment extracts the user behavior record corresponding to each piece of preprocessed user behavior data, obtaining multiple user behavior records, each of which corresponds to one piece of preprocessed user behavior data.
  • A user behavior record represents a user's behavior feature at a point in time, that is, which behavior the user performed at that time.
  • An optional form of a user behavior record is {user id, behavior time, behavior type, behavior number, behavior description}, where the user id uniquely identifies a user and the behavior time indicates the point in time at which the user corresponding to the user id performed the behavior.
  • the pre-processed user behavior data is obtained, and the user behavior records corresponding to each pre-processed user behavior data are extracted to perform subsequent target asset state prediction.
  • the training of the model can greatly reduce the amount of data processing.
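As a rough illustration of the record form described above, the sketch below keeps only the five record fields from one preprocessed row. The class name, the field names, and the defaults for a missing count or description are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BehaviorRecord:
    user_id: str
    behavior_time: datetime
    behavior_type: str
    behavior_count: int
    description: str


def extract_record(row):
    """Keep only the five record fields from one preprocessed log row."""
    return BehaviorRecord(
        user_id=row["user_id"],
        behavior_time=row["behavior_time"],
        behavior_type=row["behavior_type"],
        behavior_count=int(row.get("behavior_count", 1)),
        description=row.get("description", ""),
    )
```

Any other attributes in the raw row are discarded, which is where the reduction in data volume comes from.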
  • The obtained user behavior records correspond to at least one data source, and the users corresponding to those records belong to the user set.
  • Steps S300 to S310 are only one specific implementation of obtaining, from the at least one data source, the user behavior records corresponding to the users in the user set so as to obtain multiple user behavior records; in other possible implementations of the embodiment of the present invention, the multiple user behavior records can be obtained in other ways.
  • Step S320: According to the positive feature words and the negative feature words predetermined for the target asset, determine, among the multiple user behavior records, the user behavior records whose user behavior characteristics match the positive feature words and do not match the negative feature words.
  • Step S330: Determine the users corresponding to the determined user behavior records as primary users.
  • Step S340: Determine target users from the primary users.
  • In summary, the embodiment of the present invention can preprocess the multiple pieces of user behavior data collected from at least one data source, extract user behavior records to obtain multiple user behavior records, and then, according to the positive and negative feature words predetermined for the target asset, determine from those records the user behavior records that match the positive feature words and do not match the negative feature words. The users corresponding to the determined records are taken as primary users, and target users are determined from the primary users.
  • Optionally, the embodiment of the present invention may directly use the primary users as the target users.
  • Alternatively, the primary users may be further screened to obtain the target users.
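The matching in steps S320 and S330 could be sketched as below. This is a minimal illustration that assumes plain case-insensitive substring matching on the behavior description; the text does not specify the matching technique, and the function name and input layout are hypothetical.

```python
def select_primary_users(records, positive_words, negative_words):
    """Return ids of users with a record matching a positive word and no negative word."""
    primary = set()
    for user_id, description in records:
        text = description.lower()
        has_positive = any(w.lower() in text for w in positive_words)
        has_negative = any(w.lower() in text for w in negative_words)
        if has_positive and not has_negative:
            primary.add(user_id)
    return primary
```

A user becomes a primary user as soon as one of their records passes both checks.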
  • When screening, the embodiment of the present invention may determine a score for each primary user. The score of a primary user represents the probability that the primary user has the target asset: the higher the score, the more likely the primary user is to have the target asset.
  • According to the scores, a first number of target users may be determined from the primary users, where the first number is less than the number of primary users.
  • FIG. 4 is a flowchart of a method for determining the score of a primary user according to an embodiment of the present invention. Referring to FIG. 4, the method may include:
  • Step S400: Determine the data sources corresponding to the user behavior records of the primary user to obtain the data sources corresponding to the primary user, and determine the number of behaviors and the behavior occurrence time of the primary user in each corresponding data source.
  • For each primary user, the embodiment of the present invention may first determine the user behavior records of that primary user, that is, the records whose user behavior characteristics match the positive feature words and do not match the negative feature words. In other words, it needs to work out which of the user behavior records determined in step S320 belong to the primary user.
  • For example, suppose step S320 determines 100 user behavior records, which record the user behavior characteristics of three users A, B, and C. For primary user A, it is necessary to determine which of the 100 records are the user behavior records of primary user A. Optionally, the users can be distinguished by the user id (such as a user account) in each user behavior record, so the records determined in step S320 that carry the user id of primary user A are the user behavior records of primary user A. Performing this processing for every primary user yields the user behavior records corresponding to each primary user.
  • After the user behavior records of a primary user are obtained, the embodiment of the present invention may determine the data sources corresponding to those records. For example, suppose the user behavior records of primary user A determined in step S320 are the 1st to the 20th records. These 20 records may come from different data sources, so the embodiment needs to determine the data source of each of them in order to obtain the data sources corresponding to primary user A. Performing this processing for every primary user yields the data sources corresponding to each primary user.
  • The embodiment of the present invention can then determine, from the user behavior records of the primary user in each corresponding data source, the number of behaviors and the behavior occurrence time of the primary user in that data source (optionally, these can be determined from information such as the behavior time and the behavior count recorded in the primary user's user behavior records for each data source).
  • Optionally, the user behavior records of the same data source may be aggregated. During aggregation, the latest behavior time among a user's multiple user behavior records in a data source is taken as the behavior occurrence time of the user in that data source, and the accumulated behavior count over those records is taken as the number of user behaviors of the user in that data source.
  • For example, if primary user A has 20 user behavior records in data source 1, the number of behaviors of primary user A in data source 1 may be the accumulated behavior count of the 20 records, and the behavior occurrence time of primary user A in data source 1 may be the time of the most recent behavior among the 20 records.
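The per-source aggregation described above (accumulated behavior count and most recent behavior time) might look like the following sketch. The tuple layout `(user_id, source, time, count)` is an assumption made for the example.

```python
from collections import defaultdict
from datetime import datetime


def aggregate(records):
    """Map (user_id, source) -> (total behavior count, latest behavior time)."""
    totals = defaultdict(int)
    latest = {}
    for user_id, source, when, count in records:
        key = (user_id, source)
        totals[key] += count
        if key not in latest or when > latest[key]:
            latest[key] = when
    return {key: (totals[key], latest[key]) for key in totals}
```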
  • Step S410: For each data source corresponding to the primary user, combine the data source weight of the data source with the number of user behaviors and the behavior occurrence time of the primary user in the data source to obtain the score of the primary user in each corresponding data source.
  • The score of the u-th primary user in the i-th data source can be determined according to the following formula:
  • In the formula, the number of behaviors is normalized by the sigmoid function, so that the more frequent the behavior, the higher the score; that is, the number of user behaviors of the primary user in a data source is positively correlated with the score of the primary user in that data source.
  • t 0 represents the current system time and ⁇ is the time decay parameter.
  • The time decay function means that the closer the behavior occurrence time is to the current system time, the larger the score, and the farther it is from the current system time, the smaller the score; that is, the difference between the current system time and the behavior occurrence time of the primary user in a data source is negatively correlated with the score of the primary user in that data source.
  • Step S420: Add up the scores of the primary user in the corresponding data sources to obtain the score of the primary user.
  • In this sum, N is the number of data sources corresponding to the u-th primary user.
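The scoring formula itself does not survive in this text, so the sketch below shows only one assumed form that is consistent with the surrounding description: the data source weight, multiplied by a sigmoid of the behavior count, multiplied by an exponential decay in the time since the behavior. The exponential shape of the decay, the use of days as the time unit, and all names are assumptions rather than the patent's formula.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def source_score(weight, count, age_days, decay):
    """Assumed form: weight * sigmoid(count) * exp(-decay * age_days)."""
    return weight * sigmoid(count) * math.exp(-decay * age_days)


def user_score(per_source, decay=0.01):
    """Sum the per-source scores; per_source holds (weight, count, age_days)."""
    return sum(source_score(w, n, age, decay) for w, n, age in per_source)
```

Under this form the score rises with the behavior count and falls as the behavior recedes into the past, matching the two correlations stated above.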
  • The factors that affect the score of a primary user include the following. First, the weights of different data sources differ because different data sources represent different behaviors (the aforementioned data sources include social, search, e-commerce, and so on, and the corresponding user behaviors may be social behavior, search behavior, e-commerce transaction behavior, and so on). For example, purchasing auto insurance or auto parts clearly indicates that the user has a car, whereas searching for a certain car model or browsing car-related information only indicates that the user is interested in cars.
  • Second, the number (frequency) of user behaviors is also an important factor. Continuing the example above, if a user buys auto parts and auto insurance multiple times, the behavior weights are superimposed, which indicates even more clearly that the user has a car.
  • Third, the weight differs with the time of the user behavior: the more recently the behavior occurred, the better it reflects the user's current asset status.
  • In summary, for each primary user, the embodiment of the present invention may determine the data sources corresponding to the user behavior records of the primary user, and determine the number of behaviors and the behavior occurrence time of the primary user in each corresponding data source.
  • The score of the primary user in each corresponding data source is then determined. The process of determining the score of a primary user in one data source includes combining the data source weight of the data source with the number of user behaviors and the behavior occurrence time of the primary user in that data source.
  • Finally, the scores of the primary user in the corresponding data sources are added up to obtain the score of the primary user.
  • Optionally, the embodiment of the present invention may assign a uniform weight value to every data source.
  • Alternatively, the weights of different data sources may differ. Specifically, for a data source, some of the primary users in the data source may be selected as positive samples, and a certain proportion of negative samples may be randomly selected from the user set corresponding to the data source. An initial weight value is assigned to the data source, the features of the positive and negative samples of the data source are input into a logistic regression (LR) model for training, and the result to which the model iteration converges is taken as the weight value of the data source. Performing this processing for each data source yields the weight of each data source. Learning the data source weights is not limited to the LR model; other machine learning methods can be selected according to specific needs.
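One way to read the weight-learning procedure above is as a one-feature logistic regression per data source, with the learned coefficient used as the data source weight. The sketch below follows that reading. It is an interpretation, not the patent's stated algorithm: the choice of a single activity feature, the learning rate, the epoch count, and the function name are all assumptions.

```python
import math


def train_source_weight(samples, init_weight=0.0, lr=0.1, epochs=500):
    """Fit p = sigmoid(w * x + b) by gradient descent and return w.

    samples: list of (x, label), where x is a per-user activity feature in
    this data source and label is 1 for positive and 0 for negative samples.
    """
    w, b = init_weight, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x  # gradient of the log loss w.r.t. w
            grad_b += p - y        # gradient of the log loss w.r.t. b
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w
```

A data source whose activity separates positive from negative samples well ends up with a larger positive coefficient than one whose activity is uninformative.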
  • After the scores are determined, the embodiment of the present invention may select, as the target users, the first number of primary users ranked highest by score; or it may determine the primary users whose scores are greater than a score threshold and randomly select the first number of primary users from them as the target users.
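Both selection strategies described above can be sketched briefly. The function names and the `seed` parameter (added only to make the random choice reproducible) are assumptions for this example.

```python
import random


def top_n(scores, n):
    """Return the n user ids with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def random_above_threshold(scores, n, threshold, seed=None):
    """Randomly pick up to n user ids whose score exceeds the threshold."""
    eligible = sorted(u for u, s in scores.items() if s > threshold)
    rng = random.Random(seed)
    return rng.sample(eligible, min(n, len(eligible)))
```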
  • Optionally, the embodiment of the present invention may further classify the user behavior records by using a topic model; that is, use a similarity method to calculate the similarity between the user behavior characteristics of each user behavior record and the topic keywords of the sample users, obtain the similarity corresponding to each user behavior record, and take the users corresponding to the first number of user behavior records with the highest similarity as the positive sample users.
  • After the target asset state prediction model is trained, the embodiment of the present invention may randomly select a second number of users from the primary users as test sample users, and use the user characteristics of the test sample users to evaluate the accuracy and recall rate of the trained target asset state prediction model.
  • Specifically, the user characteristics of each test sample user can be input into the target asset state prediction model, and the proportion of test sample users that the model correctly predicts as having the target asset determines the accuracy of the target asset state prediction model.
  • the target asset state prediction model is 10 of the users predicted to have the target asset.
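The evaluation above can be illustrated with a small sketch. It assumes that "accuracy" here is meant in the sense of precision, that is, the share of users predicted to have the target asset who really have it; that reading, and the set-based interface, are assumptions.

```python
def precision_recall(predicted, actual):
    """predicted/actual: sets of user ids predicted / known to have the asset."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```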
  • Optionally, the results of the target asset state prediction model on the test sample users can be represented by an ROC curve; that is, the user characteristics of each test sample user are input into the target asset state prediction model to obtain the prediction results for the test sample users, and the prediction results are then represented by a Receiver Operating Characteristic (ROC) curve. The ROC curve, also called the receiver operating characteristic curve, is a comprehensive index reflecting the continuous variables of sensitivity and specificity, and it is constructed by a plotting method.
  • Each prediction result for the test samples can be used as a continuous variable, and the ROC curve is constructed by calculating a series of sensitivity and specificity values.
  • A value between 0.1 and 1 can be taken from the ROC curve as the probability threshold; that is, the probability threshold can be selected by means of the ROC curve on the test samples of the asset.
  • If the probability, predicted by the target asset state prediction model, that a user has the target asset is greater than the probability threshold, the user is classified into the positive class (that is, has the target asset); if the predicted probability is less than the probability threshold, the user is classified into the negative class (that is, does not have the target asset). Lowering the probability threshold, for example to 0.5, identifies more positive-class users, that is, increases the proportion of positive-class users that are identified, but also causes more users who should be negative to be classified as positive.
  • The ROC curve can be used to visualize this change, that is, how the accuracy of positive-class identification changes with different probability threshold choices, and thereby to evaluate the target asset state prediction model.
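To make the construction of the ROC curve concrete, the sketch below computes one (false positive rate, true positive rate) point per candidate threshold and the area under the resulting curve. It is a minimal illustration that assumes the test set contains both classes; the function names are hypothetical, and scores at or above a threshold are counted as positive predictions.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Each step down in threshold moves the curve up (more true positives) or right (more false positives), which is the trade-off described above.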
  • Embodiments of the present invention can adjust the model for different asset types, and select a different probability threshold for each asset according to the ROC curve of the model on that asset's test sample users, so as to balance the true positive rate (the proportion of actual positive-class users that are predicted as positive) against the false positive rate (the proportion of users that are actually not positive but are predicted as positive) and improve the prediction accuracy of the model. The appropriate thresholds for different assets (such as housing and cars) are in principle different, so the prediction model of each asset should be adjusted according to the test samples of that asset, and its threshold chosen from the ROC curve of that model on those test samples.
  • Optionally, for each asset, the embodiment of the present invention may select the probability threshold of the prediction model of the asset by means of the ROC curve of the prediction results on the test samples of that asset; performing this process for each asset yields a probability threshold for the prediction model of each asset.
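The text does not say which rule is used to read a threshold off the ROC curve. The sketch below uses one common choice as an assumption, Youden's J statistic (the threshold that maximizes true positive rate minus false positive rate), and applies it separately to each asset's test set. The function names and the input layout are hypothetical, and each test set is assumed to contain both classes.

```python
def pick_threshold(scores, labels):
    """Return the candidate threshold maximizing TPR - FPR (Youden's J)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, float("-inf")
    for threshold in sorted(set(scores)):
        # A user is predicted positive when the score exceeds the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold


def thresholds_per_asset(test_sets):
    """test_sets maps an asset name to (scores, labels) for its test users."""
    return {asset: pick_threshold(s, y) for asset, (s, y) in test_sets.items()}
```

Other selection rules, such as fixing a maximum tolerated false positive rate, would fit the same description.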
  • In the following description, the classifier is one form of the target asset state prediction model.
  • Optionally, the processing flow of the embodiment of the present invention may be as shown in FIG. 5. Referring to FIG. 5, user behavior data is collected from at least one data source.
  • The embodiment of the present invention can preprocess the collected user behavior data, extract the user behavior records, and set the keywords and filter words of the target asset. Text semantic mining is then performed on the user behavior records with the keywords and filter words of the target asset, analyzing out the user behavior records that match the positive feature words and do not match the negative feature words, so as to filter out refined user behavior records whose behavior characteristics match the target asset.
  • The users corresponding to the filtered user behavior records are taken as the primary users.
  • The user characteristics of the user to be mined are then input into the target asset state prediction classifier to obtain the probability that the user to be mined has the target asset, and this probability is compared with the probability threshold to determine whether the user to be mined has the target asset.
  • After it is determined whether the user to be mined has the target asset, the embodiment of the present invention may generate a user portrait of the user to be mined according to the determined result that the user to be mined has the target asset (for example, the result is used as one data dimension of the user portrait of the user to be mined), so that the mining result of the user asset state is applied to user portrait generation.
  • The embodiment of the present invention may also recommend information associated with the target asset to the user to be mined according to the determined result that the user to be mined has the target asset.
  • Taking a vehicle as an example of the target asset, the information associated with the target asset may be, for example, new car information, vehicle traffic restriction information, and the like.
  • Because the embodiment of the present invention can train the target asset state prediction model at least according to user behavior characteristics, and then use the model to predict the probability that a user has the target asset, the target asset state is mined automatically without manually querying user asset data at institutions such as banks, housing administration bureaus, and vehicle administration offices, which improves the processing efficiency of user asset state mining.
  • In addition, whereas querying user asset data requires the authorization and consent of institutions such as banks, housing administration bureaus, and vehicle administration offices, the embodiment of the present invention can use at least the user behavior characteristics recorded in data sources such as social and search platforms to mine the target asset state, which reduces the usage limitations of the mining method.
  • The data processing apparatus provided by the embodiment of the present invention is described below. The data processing apparatus described below can be regarded as the functional module structure that a server needs in order to implement the data processing method provided by the embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus is applicable to a server, and may include:
  • a feature obtaining module 100, configured to obtain, from at least one data source, the user characteristics of the user to be mined;
  • a model retrieval module 200, configured to obtain a pre-trained target asset state prediction model, the target asset state prediction model being trained according to the user characteristics of positive sample users and the user characteristics of negative sample users obtained from the at least one data source, where the probability that a positive sample user has the target asset is greater than the probability that a negative sample user has the target asset, and the user characteristics include at least user behavior characteristics;
  • a probability prediction module 300, configured to predict, according to the user characteristics of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset;
  • a first result determining module 400, configured to determine that the user to be mined has the target asset if the probability that the user to be mined has the target asset is greater than the probability threshold.
  • Optionally, the apparatus may further include:
  • a second result determining module 500, configured to determine that the user to be mined does not have the target asset if the probability that the user to be mined has the target asset is less than or equal to the probability threshold.
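The division of work between the probability prediction module and the two result determining modules could be sketched as follows. The linear-logistic model, the feature names, and the weights are purely illustrative assumptions; the text does not fix the model family.

```python
import math


def predict_probability(features, weights, bias=0.0):
    """Hypothetical linear-logistic model over named user features."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def has_target_asset(features, weights, threshold, bias=0.0):
    """True only when the predicted probability exceeds the threshold."""
    return predict_probability(features, weights, bias) > threshold
```

A probability exactly equal to the threshold falls on the "does not have the target asset" side, in line with the second result determining module.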
  • FIG. 7 is another structural block diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6 and FIG. 7, the apparatus may further include:
  • a model training module 600, configured to: determine target users from a user set, where the user behavior characteristics of a target user match the positive feature words of the predetermined target asset and do not match the negative feature words of the predetermined target asset; use the target users as the positive sample users for training the target asset state prediction model, and select from the user set at least one user other than the positive sample users as a negative sample user for training the target asset state prediction model; obtain, from the at least one data source, the user characteristics of the positive sample users and the user characteristics of the negative sample users; and train the target asset state prediction model by a machine learning method according to the user characteristics of the positive sample users and the user characteristics of the negative sample users.
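The sample construction performed by the model training module might be sketched as below. Drawing the negative sample users uniformly at random from the remaining users is an assumption, as are the function name and the `seed` parameter.

```python
import random


def build_samples(user_set, target_users, negative_count, seed=None):
    """Positive samples are the target users; negatives are drawn from the rest."""
    positives = set(target_users)
    candidates = sorted(set(user_set) - positives)
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(negative_count, len(candidates)))
    return positives, set(negatives)
```

Note that the negative sample users are only less likely to have the target asset than the positive sample users; they are not guaranteed to lack it.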
  • The model training module 600 being configured to determine the target users from the user set specifically includes:
  • determining target users from the primary users.
  • The model training module 600 being configured to determine target users from the primary users specifically includes:
  • determining the score of each primary user, where the score of a primary user characterizes the probability that the primary user has the target asset;
  • determining a first number of target users from the primary users according to the scores of the primary users.
  • The model training module 600 being configured to determine the score of a primary user specifically includes:
  • determining the data sources corresponding to the user behavior records of the primary user to obtain the data sources corresponding to the primary user, and determining the number of behaviors and the behavior occurrence time of the primary user in each corresponding data source;
  • for each data source corresponding to the primary user, combining the data source weight of the data source with the number of user behaviors and the behavior occurrence time of the primary user in the data source to obtain the score of the primary user in each corresponding data source;
  • adding up the scores of the primary user in the corresponding data sources to obtain the score of the primary user.
  • The model training module 600 being configured to determine the first number of target users from the primary users according to the scores of the primary users specifically includes:
  • determining the primary users whose scores are greater than the score threshold, and randomly selecting the first number of primary users from them as the target users.
  • The model training module 600 being configured to obtain the multiple user behavior records specifically includes:
  • preprocessing the multiple pieces of user behavior data to obtain the preprocessed user behavior data, and extracting the user behavior record corresponding to each piece of preprocessed user behavior data to obtain the multiple user behavior records.
  • The model training module 600 being configured to preprocess the multiple pieces of user behavior data specifically includes:
  • FIG. 8 is a block diagram showing another structure of the data processing apparatus according to the embodiment of the present invention. As shown in FIG. 7 and FIG. 8, the apparatus may further include:
  • a model testing module 700, configured to: randomly select a second number of users from the primary users as test sample users;
  • represent the prediction results for the test sample users by an ROC curve;
  • adjust the probability threshold according to the ROC curve.
  • the user feature further includes: basic attribute information and/or an interest feature.
  • FIG. 9 is a block diagram showing another structure of the data processing apparatus according to the embodiment of the present invention. As shown in FIG. 6 and FIG. 9, the apparatus may further include:
  • a portrait generation module 800, configured to generate a user portrait of the user to be mined according to the determined result that the user to be mined has the target asset;
  • an information recommendation module 900, configured to recommend, according to the determined result that the user to be mined has the target asset, information associated with the target asset to the user to be mined.
  • Either the portrait generation module 800 or the information recommendation module 900 alone may be applied to the apparatus shown in FIG. 6.
  • the embodiment of the invention further provides a server, which may include the data processing device described above.
  • FIG. 10 shows a hardware structural block diagram of a server.
  • the server may include: a processor 10, a communication interface 20, a memory 30, and a communication bus 40;
  • the processor 10, the communication interface 20, and the memory 30 complete communication with each other through the communication bus 40;
  • the communication interface 20 can be an interface of the communication module, such as an interface of the GSM module;
  • the processor 10 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the memory 30 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the processor 10 is specifically configured to:
  • obtain a pre-trained target asset state prediction model, the target asset state prediction model being trained according to the user characteristics of positive sample users and negative sample users obtained from the at least one data source, where the possibility that a positive sample user has the target asset is greater than the possibility that a negative sample user has the target asset, and the user characteristics include at least user behavior characteristics;
  • if the probability that the user to be mined has the target asset is greater than the probability threshold, determine that the user to be mined has the target asset.
  • an embodiment of the present invention further provides a storage medium for storing program code, and the program code is used to execute the data processing method provided by the foregoing embodiment.
  • the embodiment of the invention further provides a computer program product comprising instructions, which when executed on a server, causes the server to execute the data processing method provided by the above embodiment.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, a software module executed by a processor, or a combination of both.
  • the software module can be placed in random access memory (RAM), memory, read only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

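Embodiments of the present invention provide a data processing method, apparatus, and server. The method includes: obtaining user characteristics of a user to be mined from at least one data source; obtaining a pre-trained target asset state prediction model; predicting, according to the user characteristics of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset; and, if the probability that the user to be mined has the target asset is greater than a probability threshold, determining that the user to be mined has the target asset. Embodiments of the present invention can improve the processing efficiency of user asset state mining.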

Description

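Data processing method, apparatus, and server
This application claims priority to Chinese Patent Application No. 201710261884.X, entitled "A data processing method, apparatus, and server" and filed with the Chinese Patent Office on April 20, 2017, which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a data processing method, apparatus, and server.
Background
As part of user information, the user asset state, like basic attributes such as the user's age, gender, country, province, and city, is very important information for describing a user, and it is widely used in scenarios such as user portrait generation and information recommendation. To better serve purposes such as user portrait generation and information recommendation, it is important to optimize the way user asset states are mined. The user asset state indicates whether a user owns a certain asset, for example whether the user has real estate or a vehicle.
At present, mining the asset state of a user requires going to institutions where user asset data is registered, such as banks, housing administration bureaus, and vehicle administration offices, to make manual queries, and then judging from the manual query results whether the user has a specific asset. Because this approach can only find out whether a user has a specific asset by visiting the specific institutions where user asset data is registered, the processing efficiency of user asset state mining is low.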
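Summary
In view of this, embodiments of the present invention provide a data processing method, apparatus, and server to improve the processing efficiency of user asset state mining.
To achieve the above objective, embodiments of the present invention provide the following technical solutions:
A data processing method, applied to a server, the method including:
obtaining user characteristics of a user to be mined from at least one data source;
obtaining a pre-trained target asset state prediction model, the target asset state prediction model being trained according to the user characteristics of positive sample users and negative sample users obtained from the at least one data source, where the possibility that a positive sample user has the target asset is greater than the possibility that a negative sample user has the target asset, and the user characteristics include at least user behavior characteristics;
predicting, according to the user characteristics of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset;
if the probability that the user to be mined has the target asset is greater than a probability threshold, determining that the user to be mined has the target asset.
Embodiments of the present invention further provide a data processing apparatus, applied to a server, the data processing apparatus including:
a feature obtaining module, configured to obtain user characteristics of a user to be mined from at least one data source;
a model retrieval module, configured to obtain a pre-trained target asset state prediction model, the target asset state prediction model being trained according to the user characteristics of positive sample users and negative sample users obtained from the at least one data source, where the possibility that a positive sample user has the target asset is greater than the possibility that a negative sample user has the target asset, and the user characteristics include at least user behavior characteristics;
a probability prediction module, configured to predict, according to the user characteristics of the user to be mined and the target asset state prediction model, the probability that the user to be mined has the target asset;
a first result determining module, configured to determine that the user to be mined has the target asset if the probability that the user to be mined has the target asset is greater than the probability threshold.
Embodiments of the present invention further provide a server, including the data processing apparatus described above.
Based on the above technical solutions, embodiments of the present invention can train a target asset state prediction model at least according to the behavior characteristics of positive sample users and negative sample users in the data sources. Then, when mining the target asset state of a user to be mined, the probability that the user to be mined has the target asset can be predicted by the target asset state prediction model according to the user characteristics of the user to be mined in at least one data source, and when that probability is greater than the probability threshold, it is determined that the user to be mined has the target asset, thereby mining the target asset state.
Because embodiments of the present invention can train the target asset state prediction model at least according to user behavior characteristics, and then use the model to predict the probability that a user has the target asset, the target asset state is mined automatically without manually querying user asset data at institutions such as banks, housing administration bureaus, and vehicle administration offices, which improves the processing efficiency of user asset state mining. In addition, whereas querying user asset data requires the authorization and consent of institutions such as banks, housing administration bureaus, and vehicle administration offices, embodiments of the present invention can use at least the user behavior characteristics recorded in data sources such as social and search platforms to mine the target asset state, which reduces the usage limitations of the mining method.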
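Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for the description are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the data processing method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a method for training the target asset status prediction model provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining target users provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a method for determining the score of a preliminary user provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the processing of an embodiment of the present invention;
FIG. 6 is a structural block diagram of the data processing apparatus provided by an embodiment of the present invention;
FIG. 7 is another structural block diagram of the data processing apparatus provided by an embodiment of the present invention;
FIG. 8 is yet another structural block diagram of the data processing apparatus provided by an embodiment of the present invention;
FIG. 9 is a further structural block diagram of the data processing apparatus provided by an embodiment of the present invention;
FIG. 10 is a hardware structural block diagram of the server provided by an embodiment of the present invention.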
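Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Let the target asset be any one type of user asset, such as real estate or a vehicle. Embodiments of the present invention mine, through data processing, whether a user has the target asset (that is, mine the target asset status). The data processing procedure may be as shown in FIG. 1, which is a flowchart of the data processing method provided by an embodiment of the present invention. The method may be applied to a server. The server may be a network-side service device with data processing capability, or a user-side computing device with data processing capability, such as a PC (personal computer).
Referring to FIG. 1, the method may include:
Step S100: Obtain user features of a user to be mined from at least one data source.
The user to be mined is a user whose target asset status is to be mined; that is, the embodiment needs to determine whether this user has the target asset, so as to discover users who have it.
The user features of the user to be mined include the user's behavior features. In some possible implementations, they may further include at least one of the user's basic attributes (such as age, gender, and education level), interest features, and similar features.
The user features of the user to be mined may be obtained from at least one data source according to a user ID such as the user's account, identity card number, or mobile phone number. A data source may be an application platform that records user data, such as a social platform or a search platform. Such a platform may provide user registration (during which basic attributes such as age, gender, and education level may be requested) and may record user behavior data according to the behavior of registered users on the platform. User behavior data is data that characterizes user behavior: for example, purchases and searches on an e-commerce platform, social behavior on a social platform, or searches in a search engine. The user behavior data recorded by a data source may exist in the form of user behavior logs, from which user behavior features can be determined. In some possible implementations, a user's interest features may also be derived from the user behavior data generated by a registered user's historical behavior over a period of time.
User features may be obtained from data sources such as social and search platforms by web crawling. Alternatively, where the data source and the server belong to the same service provider, the server may obtain the user features of the user to be mined through an interface of the application platform corresponding to the data source.
In some possible implementations, the server and the application platform corresponding to the data source may share one account system. The server may accept the user accounts of that application platform, so that the user to be mined can log in to the server with the account registered on the platform.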
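Step S110: Retrieve a pre-trained target asset status prediction model. The model is trained on the user features of positive sample users and of negative sample users obtained from the at least one data source, where the probability that a positive sample user has the target asset is greater than the probability that a negative sample user has it, and the user features include at least user behavior features.
Embodiments of the present invention may train the target asset status prediction model in advance; the model can predict the probability that a given user has the target asset. For training, multiple pieces of user behavior data may be obtained from the at least one data source and analyzed so as to select positive sample users and negative sample users from the users to whom the data belongs. The user features of the positive sample users and of the negative sample users in the at least one data source are then obtained, and the model is trained with a machine learning method.
Optionally, a positive sample user may be a user who, after analysis of the user behavior data, is determined to be relatively likely to have the target asset; statistically, this is a user with a relatively high probability of having the target asset. Correspondingly, a negative sample user is less likely than a positive sample user to have the target asset, that is, has a lower probability of having it.
The user features of a positive sample user may include that user's behavior features and, in some possible implementations, at least one of the user's basic attributes, interest features, and similar features. Likewise, the user features of a negative sample user may include that user's behavior features and at least one of the user's basic attributes, interest features, and the like.
Step S120: Predict, based on the user features of the user to be mined and the target asset status prediction model, the probability that the user to be mined has the target asset.
The user features of the user to be mined are fed into the target asset status prediction model as input data, and the model outputs the predicted probability that the user has the target asset.
Step S130: If the probability that the user to be mined has the target asset is greater than a probability threshold, determine that the user to be mined has the target asset.
Embodiments of the present invention may set a lower limit on the probability of a user having the target asset and use it as the probability threshold. When the probability predicted by the model for the user to be mined is greater than this threshold, the user is considered to have the target asset, which completes the target asset status mining.
As can be seen, embodiments of the present invention can train the target asset status prediction model based at least on the behavior features of positive and negative sample users in the data sources. When mining the target asset status of a user to be mined, the model predicts, from the user's features in at least one data source, the probability that the user has the target asset; when that probability is greater than the probability threshold, the user is determined to have the target asset, which completes the mining of the target asset status.
Because embodiments of the present invention can train the target asset status prediction model based at least on user behavior features and then use the model to predict the probability that a user has the target asset, target asset status is mined automatically, with no need to query user asset data manually at banks, housing authorities, vehicle administration offices, and similar institutions. This improves the processing efficiency of user asset status mining. In addition, querying user asset data requires authorization from those institutions, whereas embodiments of the present invention can use at least the user behavior features recorded in data sources such as social and search platforms. This reduces the cases in which user asset data can be obtained only with the authorization of such institutions and lowers the usage restrictions on user asset status mining.
Optionally, if the probability predicted in step S120 is less than or equal to the probability threshold, it may be determined that the user to be mined does not have the target asset.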
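The decision rule of steps S120 and S130 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the feature name `car_insurance_purchases`, and the stand-in `toy_model` are hypothetical and do not come from the application, which leaves the model form open.

```python
from typing import Callable, Dict

Features = Dict[str, float]


def has_target_asset(features: Features,
                     predict_proba: Callable[[Features], float],
                     threshold: float) -> bool:
    """Predict a probability (S120) and compare it with the threshold (S130)."""
    probability = predict_proba(features)
    # Strictly greater than the threshold: the user is taken to have the asset.
    # Less than or equal to the threshold: the user is taken not to have it.
    return probability > threshold


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for a trained model, for demonstration only."""
    return min(1.0, 0.2 * features.get("car_insurance_purchases", 0.0))


print(has_target_asset({"car_insurance_purchases": 4}, toy_model, 0.6))
print(has_target_asset({"car_insurance_purchases": 2}, toy_model, 0.6))
```

Passing the model as a callable keeps the threshold comparison independent of whichever classifier is actually trained.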
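The training of the target asset status prediction model is described below. The main idea is to select positive sample users, who are relatively likely to have the target asset, and negative sample users, who are relatively unlikely to have it, and to train the model using the user features of both groups in at least one data source as input features.
Following this idea, FIG. 2 shows a method for training the target asset status prediction model. The method may be applied to a server, and it selects positive sample users mainly by matching user behavior features.
Referring to FIG. 2, the method may include:
Step S200: Determine target users from a user set. The user behavior features of a target user match predetermined positive feature words of the target asset and do not match predetermined negative feature words of the target asset.
Optionally, the user set may be the set of users corresponding to a behavior data set (containing multiple pieces of behavior data) collected from the at least one data source; that is, the user set contains the users to whom the collected behavior data belongs.
Optionally, the positive feature words of the target asset may be keywords of the target asset, describing its positive features (features that match the target asset). Taking vehicle ownership as the target asset, the positive feature words may be "车友" (fellow car owners), "车险" (car insurance), "违章" (traffic violation), "车贷" (car loan), "卖车" (selling a car), and so on.
The negative feature words of the target asset may be filter words of the target asset, describing its negative features (features that do not match the target asset). Taking vehicle ownership as the target asset, the negative feature words may be "租车" (car rental), "驾校" (driving school), "买车" (buying a car), and so on.
The positive and negative feature words of the target asset may be preset. As one possible implementation, they may be preset based on experience; the preset words are the predetermined positive feature words and the predetermined negative feature words of the target asset.
The feature words given above for vehicle ownership are only examples. The positive and negative feature words of a target asset can be very rich, and embodiments of the present invention may enumerate as many of them as possible so that target users are determined as accurately as possible.
Once the predetermined positive and negative feature words are set, embodiments of the present invention may match the user behavior features represented by each user's behavior data against both sets of words, and determine as target users those whose behavior features match the positive feature words and do not match the negative feature words.
In the vehicle ownership example, the feature words of the behavior features of a target user (a user with a vehicle) should match positive feature words such as "车友", "车险", "违章", "车贷", and "卖车", and should not match negative feature words such as "租车", "驾校", and "买车".
Matching against the positive feature words alone can already identify target users. In some cases, however, user behavior data matches negative feature words as well as positive ones; the data matching negative feature words may then be filtered out so that target users are determined more accurately.
Specifically, in the vehicle ownership example, the users initially matched by positive feature words such as "车友", "车险", "违章", "车贷", and "卖车" may include users who really own a vehicle, and also users who do not own one but want information about vehicles (for example, users who searched for car loans or car insurance). The latter need to be filtered out of the initially matched users so that only users who really own a vehicle remain. To do so, users whose behavior involves negative feature words such as "租车", "驾校", and "买车", and who are therefore very likely unrelated to vehicle ownership, are removed from the initially matched users. This filters out behavior data that would interfere with determining target users and makes the result more accurate.
Optionally, target users may also be determined directly from the predetermined positive feature words alone, without the negative feature words; that is, users in the user set whose behavior features match the positive feature words are taken directly as target users. Filtering out interfering users with the negative feature words after positive matching is only an optional way of improving accuracy. With reasonably chosen positive feature words, users whose behavior features match them can be determined directly as target users with a certain degree of accuracy.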
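The positive and negative feature word matching of step S200 can be sketched as below. The word lists are the vehicle examples quoted in the text; plain substring matching and the function names are assumptions made for illustration, since the application does not fix the matching technique.

```python
# Example feature words for vehicle ownership, taken from the description.
POSITIVE_WORDS = ["车友", "车险", "违章", "车贷", "卖车"]
NEGATIVE_WORDS = ["租车", "驾校", "买车"]


def matches_any(text, words):
    """Return True if any feature word occurs in the behavior text (assumed rule)."""
    return any(word in text for word in words)


def is_target_candidate(behavior_text, use_negative_filter=True):
    """Keep behavior that matches a positive word and, optionally, no negative word."""
    if not matches_any(behavior_text, POSITIVE_WORDS):
        return False
    if use_negative_filter and matches_any(behavior_text, NEGATIVE_WORDS):
        return False
    return True


print(is_target_candidate("续保车险"))          # positive word only
print(is_target_candidate("车贷 买车 攻略"))     # positive and negative words
```

Setting `use_negative_filter=False` corresponds to the optional variant in which positive feature words alone decide the target users.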
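Step S210: Use the target users as the positive sample users for training the target asset status prediction model, and select at least one user other than the positive sample users from the user set as the negative sample users for training the model.
In this embodiment, because the behavior features of a target user match the predetermined positive feature words of the target asset and do not match its predetermined negative feature words, the target user is relatively likely to have the target asset and can serve as a positive sample user for training the model.
After the positive sample users are determined, at least one user other than the positive sample users may be selected from the user set as the negative sample users for training the model. The selection method is not limited in this embodiment: negative sample users may be chosen at random from the users in the user set other than the positive samples, or chosen from those users according to the sample distribution.
Note that the ratio of positive to negative sample users may be set according to the actual situation, for example 1:1 or N:1, where N is a set value.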
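A minimal sketch of random negative sampling at a ratio of N positives per negative is shown below. The function name, the `positives_per_negative` parameter, and the fixed seed are hypothetical choices for illustration; the application also allows sampling according to the sample distribution, which is not shown here.

```python
import random


def sample_negatives(user_set, positives, positives_per_negative=1.0, seed=0):
    """Randomly pick negative sample users from the users that are not positives.

    positives_per_negative is N in an N:1 positive-to-negative ratio.
    """
    pool = sorted(set(user_set) - set(positives))
    wanted = max(1, round(len(positives) / positives_per_negative))
    count = min(len(pool), wanted)
    # A seeded generator keeps the illustration reproducible.
    return random.Random(seed).sample(pool, count)


users = ["u%d" % i for i in range(10)]
positive_users = ["u0", "u1", "u2", "u3"]
print(sample_negatives(users, positive_users, positives_per_negative=1))
print(sample_negatives(users, positive_users, positives_per_negative=2))
```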
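Step S220: Obtain the user features of the positive sample users and of the negative sample users from the at least one data source.
After the positive and negative sample users are determined, embodiments of the present invention may obtain their user features from the at least one data source. The user features of a positive sample user include at least the user's behavior features; in some possible implementations, the user's basic attributes (such as age, gender, and education level), interest features, and the like may also be used as user features. Similarly, the user features of a negative sample user include at least the user's behavior features and, in some possible implementations, the user's basic attributes, interest features, and the like. The specific form of the user features may be defined according to the actual situation.
Step S230: Train the target asset status prediction model from the user features of the positive sample users and of the negative sample users by a machine training method.
Optionally, embodiments of the present invention may use the user features of the positive sample users and of the negative sample users as the input data of the machine training method and train the target asset status prediction model with that method.
The machine training methods used in embodiments of the present invention may include decision tree (DT), logistic regression (LR), naive Bayes network (Naïve Bayes, NB), random forest (RF), support vector machine (SVM), and the boosting model xgboost (eXtreme Gradient Boosting). In some possible implementations, LR, a classic model for binary classification, may be used, or the xgboost model, which is accurate and fast, may be chosen.
Optionally, model parameters may be tuned during training to obtain a better target asset status prediction model. For an xgboost model, for example, the tree depth, the shrinkage step size (also called the learning rate, denoted eta), and the number of iterations may be adjusted. As one possible implementation, model quality may be judged, and the training result optimized, according to the area under the curve (AUC), error rate, recall, and precision of the model output.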
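As an illustration of step S230 with the LR option, the sketch below trains a logistic regression by batch gradient descent using only the standard library. The two features (insurance purchases, forum visits), the toy data, and the hyperparameters are invented for the example; a production system would more likely use an established library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - learning_rate * g / n for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability that a user with feature vector x has the target asset."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [car insurance purchases, car forum visits].
X_train = [[3, 2], [2, 3], [4, 1], [0, 0], [0, 1], [1, 0]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = positive sample user, 0 = negative sample user
model = train_logistic_regression(X_train, y_train)
print(predict_proba(model, [3, 2]), predict_proba(model, [0, 0]))
```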
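Optionally, embodiments of the present invention may preprocess the multiple pieces of user behavior data collected from at least one data source to obtain multiple user behavior records, and then determine target users by matching the user behavior features represented by those records against the predetermined positive and negative feature words of the target asset.
Optionally, FIG. 3 shows a flowchart of a method for determining target users provided by an embodiment of the present invention. The method may be applied to a server. Referring to FIG. 3, the method may include:
Step S300: Obtain multiple pieces of user behavior data collected from at least one data source; the users corresponding to the data are included in the user set.
Optionally, user behavior data may exist in the form of user behavior logs. Embodiments of the present invention may collect massive user behavior data from at least one data source. A piece of user behavior data describes the behavior of a user on the application platform corresponding to a data source, and the users involved in the collected data correspond to the user set described above.
Step S310: Preprocess the multiple pieces of user behavior data to obtain preprocessed user behavior data, and extract the user behavior record corresponding to each piece of preprocessed data to obtain multiple user behavior records. One user behavior record represents the user behavior features of one user at one point in time.
Optionally, embodiments of the present invention may preprocess the collected user behavior data and then extract the user behavior record corresponding to each piece of preprocessed data. The number of records obtained is no greater than the number of pieces of data collected, and one record may correspond to one piece of preprocessed data.
Optionally, preprocessing may consist of deleting user behavior data that is data noise and/or filling in missing values in user behavior data. Specifically, embodiments of the present invention may delete from the collected data the pieces that are data noise and/or fill in attribute values for the pieces that lack them.
User behavior data that is data noise is data containing erroneous attribute values or attribute values that are outliers deviating from expectation. Noise can arise for many reasons, such as faulty collection equipment, data entry errors, transmission errors, or damaged storage media. Such data may be deleted during preprocessing. For example, if the collected data contains a piece of behavior data whose time attribute is the year 2050, which has not yet arrived, that piece probably results from a wrong year at data entry or from an equipment fault and needs to be deleted. This is only one example of data containing erroneous or outlying attribute values.
Missing attribute values in user behavior data collected from data sources are common and even unavoidable: some values cannot be obtained, and some are omitted. Such data therefore needs handling. One option is to fill in the missing attribute values, for example with predetermined values. If a piece of user behavior data lacks the value of the age attribute, a predetermined age value may be used to fill it. This age example is only illustrative; in practice, the attribute types whose missing values are to be filled may be set as needed.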
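The two preprocessing operations can be sketched as follows, mirroring the examples in the text: a row dated in the future (such as the year 2050) is dropped as noise, and a missing age is filled with a predetermined value. The row layout, the field names, and the default age of 30 are assumptions for illustration.

```python
from datetime import datetime


def preprocess(rows, now, defaults):
    """Drop rows with a future timestamp and fill missing attributes with defaults."""
    cleaned = []
    for row in rows:
        if row["time"] > now:
            # A behavior time later than the current time is treated as noise.
            continue
        filled = dict(row)
        for key, default in defaults.items():
            if filled.get(key) is None:
                filled[key] = default
        cleaned.append(filled)
    return cleaned


raw_rows = [
    {"user_id": "A", "time": datetime(2017, 3, 1), "age": 28},
    {"user_id": "B", "time": datetime(2050, 1, 1), "age": 40},   # noise
    {"user_id": "C", "time": datetime(2017, 4, 2), "age": None},  # missing age
]
clean_rows = preprocess(raw_rows, now=datetime(2018, 3, 28), defaults={"age": 30})
print(clean_rows)
```

Copying each row before filling leaves the collected data untouched, which makes it easy to compare the raw and preprocessed sets.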
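Optionally, because preprocessing involves deleting noisy data, the number of pieces of preprocessed user behavior data may be smaller than the number in the original user behavior data set.
After the preprocessed user behavior data is obtained, embodiments of the present invention may extract the user behavior record corresponding to each piece, obtaining multiple user behavior records, each corresponding to one piece of preprocessed user behavior data.
A user behavior record may represent the behavior features of one user at one point in time, such as the behavior the user performed at that time and the number of times it was performed. One optional form of a user behavior record is {user id, behavior time, behavior type, behavior count, behavior description}, where the user id uniquely identifies a user and the behavior time indicates when that user performed the behavior.
Preprocessing the user behavior data in the set and extracting a user behavior record from each piece of preprocessed data, before the subsequent training of the target asset status prediction model, can greatly reduce the amount of data to be processed.
Correspondingly, the user behavior records obtained correspond to at least one data source, and the users to whom they correspond belong to the user set.
Note that steps S300 and S310 are only one specific way of obtaining, from at least one data source, the user behavior records of the users in the user set. In other possible implementations, the user behavior records may be obtained in other ways.
Step S320: According to the predetermined positive feature words and the predetermined negative feature words of the target asset, determine, from the multiple user behavior records, the records whose user behavior features match the positive feature words and do not match the negative feature words.
Step S330: Determine the users corresponding to the determined user behavior records as preliminary users.
Step S340: Determine target users from the preliminary users.
As can be seen, embodiments of the present invention may preprocess the user behavior data collected from at least one data source and extract user behavior records; determine, according to the predetermined positive and negative feature words of the target asset, the records whose behavior features match the positive feature words and do not match the negative feature words; determine the users corresponding to those records as preliminary users; and determine target users from the preliminary users.
Optionally, after the preliminary users are determined, they may be used directly as the target users.
Alternatively, target users may be selected from the preliminary users. Specifically, a score may be determined for each preliminary user. The score of a preliminary user characterizes the probability that the user has the target asset: the higher the score, the higher that probability. A first number of target users may then be determined from the preliminary users according to their scores, the first number being smaller than the number of preliminary users.
Optionally, the score of one preliminary user may be determined as shown in FIG. 4; performing this processing for every preliminary user yields the score of each. FIG. 4 is a flowchart of a method for determining the score of a preliminary user provided by an embodiment of the present invention. Referring to FIG. 4, the method may include:
Step S400: Determine the data sources to which the preliminary user's behavior records correspond, obtaining the data sources corresponding to the preliminary user; and determine the preliminary user's behavior count and behavior occurrence time in each of those data sources.
For a preliminary user, embodiments of the present invention may determine the user behavior records corresponding to that user (records whose behavior features match the positive feature words and do not match the negative feature words). That is, it is necessary to work out which of the records determined in step S320 belong to the preliminary user. For example, if step S320 determines 100 user behavior records covering the behavior features of three users A, B, and C, then for preliminary user A it must be determined which of the 100 records are A's. (Optionally, users may be distinguished by the user id, such as the user account, in each record; the records whose user id matches that of preliminary user A are then taken as A's records.) Doing this for every preliminary user yields the user behavior records corresponding to each.
After the records of a preliminary user are determined, the data sources to which they correspond may be determined. For example, if the records of preliminary user A are the 1st to 20th records determined in step S320, those records may come from different data sources; the data source of each must be determined, giving the data sources corresponding to preliminary user A. Doing this for every preliminary user yields the data sources corresponding to each.
After the data sources corresponding to a preliminary user are obtained, the behavior count and behavior occurrence time of the user's records in each of those data sources may be determined, giving the user's behavior count and behavior occurrence time per data source. (Optionally, these may be determined from the behavior time, behavior count, and other information in the user's records for each data source.)
Optionally, among the records of one user, those from the same data source may be aggregated. In aggregation, the most recent behavior occurrence time among a user's records in a data source is taken as the user's behavior occurrence time for that data source, and the sum of the behavior counts in those records is taken as the user's behavior count for that data source.
For example, if preliminary user A has 20 user behavior records in data source 1, A's behavior count in data source 1 may be the sum of the behavior counts of the 20 records, and A's behavior occurrence time in data source 1 may be the most recent behavior occurrence time among them.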
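The per-user, per-data-source aggregation described above (sum of behavior counts, most recent behavior time) can be sketched as follows. The record layout with a `source` field is an assumption: the record form quoted in the text does not name a data source field, so the source of each record is taken as already known here.

```python
def aggregate_by_source(records):
    """Group records by (user id, data source); sum counts, keep the latest time."""
    aggregated = {}
    for record in records:
        key = (record["user_id"], record["source"])
        if key not in aggregated:
            aggregated[key] = {"count": 0, "latest_time": record["time"]}
        entry = aggregated[key]
        entry["count"] += record["count"]
        entry["latest_time"] = max(entry["latest_time"], record["time"])
    return aggregated


# Times are plain day numbers here, purely for illustration.
sample_records = [
    {"user_id": "A", "source": "search", "time": 3, "count": 2},
    {"user_id": "A", "source": "search", "time": 7, "count": 1},
    {"user_id": "A", "source": "shop", "time": 5, "count": 4},
]
per_source = aggregate_by_source(sample_records)
print(per_source)
```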
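Step S410: For each data source corresponding to the preliminary user, combine the data source weight, the preliminary user's behavior count in the data source, and the behavior occurrence time to obtain the preliminary user's score for that data source.
Optionally, taking the u-th preliminary user as an example, let w_i denote the weight of the i-th data source, m the behavior count of the u-th preliminary user in the i-th data source, and t the behavior occurrence time of the u-th preliminary user in the i-th data source. The score of the u-th preliminary user for the i-th data source may then be determined by the following formula:
Figure PCTCN2018080842-appb-000002
where the sigmoid function
Figure PCTCN2018080842-appb-000003
performs normalization and expresses that the higher the behavior frequency, the higher the score; that is, a preliminary user's behavior count in a data source is positively correlated with the user's score for that data source;
t_0 denotes the current system time, and α is a time decay parameter. The function
Figure PCTCN2018080842-appb-000004
expresses that the closer the behavior occurrence time is to the current system time, the higher the score, and the farther from the current system time, the lower the score; that is, the difference between the current system time and the preliminary user's behavior occurrence time in a data source is negatively correlated with the user's score for that data source.
Step S420: Add up the preliminary user's scores for all corresponding data sources to obtain the preliminary user's score.
Let s_u denote the score of the u-th preliminary user. It may be determined by the following formula:
Figure PCTCN2018080842-appb-000005
where N denotes the number of data sources corresponding to the u-th preliminary user.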
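The scoring formulas appear only as images in this copy, so their exact form cannot be confirmed here. The sketch below is one form consistent with the surrounding description, and every detail of it is an assumption: the per-source score is taken as the product of the data source weight, a sigmoid of the behavior count, and an exponential time decay exp(-α(t_0 - t)); the user score is the sum over the user's data sources.

```python
import math


def source_score(weight, count, event_time, now, alpha):
    """Assumed per-source score: weight * sigmoid(count) * exp(-alpha * (now - event_time))."""
    frequency_term = 1.0 / (1.0 + math.exp(-count))      # grows with behavior count
    decay_term = math.exp(-alpha * (now - event_time))   # shrinks as behavior gets older
    return weight * frequency_term * decay_term


def user_score(per_source, weights, now, alpha):
    """Sum the per-source scores over the data sources of one preliminary user."""
    return sum(
        source_score(weights[source], count, event_time, now, alpha)
        for source, (count, event_time) in per_source.items()
    )


# Hypothetical data: times in days, two data sources with different weights.
weights = {"shop": 2.0, "search": 1.0}
behavior = {"shop": (5, 9.0), "search": (2, 4.0)}  # (behavior count, latest time)
print(user_score(behavior, weights, now=10.0, alpha=0.1))
```

Whatever the exact expressions in the figures are, the two monotonic properties stated in the text (higher count gives a higher score, older behavior gives a lower score) hold for this form.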
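As can be seen, the score of a preliminary user depends on several factors. First, different data sources represent different behaviors, so their weights differ (as mentioned above, data sources include social, search, e-commerce, and so on, and the corresponding behaviors may be social behavior, search behavior, e-commerce transactions, and so on). For example, buying car insurance or car accessories clearly indicates that the user owns a vehicle, whereas searching for a car model or browsing car-related information only indicates interest in cars. Second, the behavior count (frequency) is also important: continuing the example, a user who buys car accessories or car insurance many times accumulates behavior weight and is more clearly a vehicle owner than a user who buys only once. Third, behaviors that occurred at different times carry different weight: the more recent the behavior, the better it reflects the user's current asset status.
Therefore, for each preliminary user, embodiments of the present invention may determine the data sources to which the user's behavior records correspond, and the user's behavior count and behavior occurrence time in each of those data sources.
Then, for each preliminary user, the user's score for each corresponding data source is determined. The score of one preliminary user for one data source is obtained by combining the data source weight, the user's behavior count in the data source, and the behavior occurrence time.
Finally, for each preliminary user, the scores for all corresponding data sources are added up to obtain the user's score.
Optionally, embodiments of the present invention may assign the same weight value to every data source.
Alternatively, different data sources may have different weights. Specifically, for a data source, some of the preliminary users in the data source may be selected as positive samples, a certain proportion of negative samples may be selected at random from the user set corresponding to the data source, and the data source may be given an initial weight value. The features of the positive and negative samples of the data source are input into an LR model for training, and the result output when the model's iterations converge is taken as the weight value of that data source. Doing this for every data source yields the weight of each. Learning data source weights is not limited to the LR model; other machine learning methods may be chosen as needed.
Optionally, after the scores of the preliminary users are determined, the first number of preliminary users with the highest scores may be selected as target users. Alternatively, the preliminary users whose scores are greater than a score threshold may be determined, and the first number of preliminary users may be selected at random from them as target users.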
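Both selection options can be sketched in a few lines. The function names, the example scores, and the seeded random generator are illustrative assumptions.

```python
import random


def select_top_n(scores, n):
    """Option 1: take the n preliminary users with the highest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [user for user, _ in ranked[:n]]


def select_random_above_threshold(scores, threshold, n, seed=0):
    """Option 2: pick n users at random among those scoring above the threshold."""
    eligible = sorted(user for user, score in scores.items() if score > threshold)
    return random.Random(seed).sample(eligible, min(n, len(eligible)))


example_scores = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.1}
print(select_top_n(example_scores, 2))
print(select_random_above_threshold(example_scores, 0.3, 2))
```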
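Optionally, besides the approach described above, positive sample users may also be determined by classifying the user behavior records with a topic model: a similarity method is used to compute the similarity between the user behavior features of each record and the topic keywords of positive sample users, giving a similarity for each record, and the first number of users corresponding to the records with the highest similarity are taken as positive sample users.
Optionally, after the target asset status prediction model is trained by the method shown in FIG. 2, a second number of users may be selected at random from the preliminary users as test sample users, and their user features used to evaluate indicators of the trained model such as accuracy and recall.
Specifically, the user features of each test sample user may be input into the target asset status prediction model, and the accuracy of the model determined from the proportion of test sample users whose target asset result is predicted correctly. For example, suppose there are 100 test sample users, of whom 80 have the target asset and 20 do not; 60 of the users who have the target asset are predicted to have it, and 10 of the users who do not have it are predicted to have it. The accuracy of the target asset status prediction model is then
Figure PCTCN2018080842-appb-000006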
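The result of this example is given only as an image in this copy, so which quantity the figure computes cannot be confirmed. As a hedged aid, the sketch below derives the common metrics from the counts stated in the text (60 true positives, 20 false negatives, 10 false positives, 10 true negatives); the function name is hypothetical, and the mapping to the figure is an assumption.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all users predicted correctly
        "precision": tp / (tp + fp),     # share of predicted owners who own the asset
        "recall": tp / (tp + fn),        # share of actual owners who are found
    }


# Counts from the example: 80 owners (60 found), 20 non-owners (10 mislabeled).
metrics = confusion_metrics(tp=60, fn=20, fp=10, tn=10)
print(metrics)
```

Under the "correct predictions over all test samples" reading used later in the text (80 of 100 correct gives 80%), the example works out to (60 + 10) / 100 = 70%; under the precision reading it would be 60 / 70, about 85.7%.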
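Further, the model's predictions for the test sample users may be represented by an ROC curve: the user features of each test sample user are input into the target asset status prediction model to obtain predictions, which are then represented by a receiver operating characteristic (ROC) curve. The ROC curve is a composite indicator of sensitivity and specificity for a continuous variable and reveals their relationship graphically. A series of sensitivity and specificity values is computed by setting multiple cut-off values on the continuous variable, and the curve is drawn with sensitivity on the vertical axis and (1 - specificity) on the horizontal axis. The larger the area under the curve, the higher the diagnostic accuracy. Embodiments of the present invention may treat the predictions for the test samples as the continuous variable and construct the ROC curve by computing a series of sensitivity and specificity values.
Correspondingly, the area under the ROC curve, which lies between 0.1 and 1, may be used as the probability threshold; that is, the probability threshold may be chosen from the ROC curve of the asset's test samples.
For example, in a binary classification model with continuous outputs, suppose a probability threshold has been fixed, say 0.6. If the probability predicted by the model that a user has the target asset is greater than this threshold, the user is assigned to the positive class (has the target asset); if it is less than the threshold, the user is assigned to the negative class (does not have the target asset). Lowering the threshold, say to 0.5, identifies more positives, that is, raises the proportion of actual positives that are identified, but also treats more actual negatives as positives. The ROC curve visualizes this trade-off, that is, how the accuracy of identifying positive users changes with the choice of probability threshold, and so allows the prediction accuracy of the model to be evaluated. For example, if at one probability threshold the model predicts 80 of 100 test samples correctly, the accuracy is 80/100 = 80%; if adjusting the threshold lowers the accuracy to 70%, this change can be used to evaluate the prediction accuracy of the model.
Embodiments of the present invention may adjust the model for different asset types and choose different probability thresholds according to the model's ROC curve on the test sample users of each asset, so as to balance the true positive rate (the proportion of actual positives predicted as positive) against the false positive rate (the proportion of actual negatives predicted as positive) and improve prediction accuracy. Different assets (a house, a vehicle) should in principle have different thresholds. The prediction model of each asset should be adjusted on that asset's test samples, and its probability threshold chosen from its ROC curve on those samples. That is, for each asset, after its prediction model is built, the probability threshold of the model may be chosen from the ROC curve of the predictions on the asset's test samples; doing this for every asset yields the probability threshold of each asset's prediction model.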
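The threshold trade-off can be made concrete with a small sketch that computes ROC points (true positive rate and false positive rate) at several candidate thresholds. The text does not say how a threshold is read off the curve; choosing the point that maximizes TPR minus FPR (Youden's J statistic) is an assumption made here, and the labels and scores are invented test data.

```python
def roc_points(labels, scores, thresholds):
    """Return (threshold, true positive rate, false positive rate) per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        # A user is predicted positive when the score is strictly above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def pick_threshold(points):
    """Assumed selection rule: maximize TPR - FPR (Youden's J)."""
    return max(points, key=lambda p: p[1] - p[2])[0]


test_labels = [1, 1, 1, 1, 0, 0, 0, 0]                   # 1 = has the target asset
test_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]   # predicted probabilities
points = roc_points(test_labels, test_scores, [0.2, 0.5, 0.65])
print(points)
print(pick_threshold(points))
```

Lowering the threshold from 0.5 to 0.2 in this toy data raises the true positive rate but doubles the false positive rate, which is the trade-off the paragraph above describes.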
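Optionally, taking a classifier as the form of the target asset status prediction model, the processing of an embodiment of the present invention may be as shown in FIG. 5. Referring to FIG. 5, after multiple pieces of user behavior data are collected from at least one data source, they may be preprocessed and user behavior records extracted, and the keywords and filter words of the target asset may be set. Text semantic mining is then performed on the user behavior records using the keywords and filter words to find the records whose behavior features match the positive feature words and do not match the negative feature words, that is, to further filter out from the extracted records those whose behavior features match the target asset. The users corresponding to the filtered records are taken as preliminary users.
The score of each preliminary user is then determined, the positive sample users for model training are determined according to the scores, and negative sample users are selected.
The user features of the positive and negative sample users are fed into a classifier training model, and a target asset status prediction classifier is trained.
The user features of the user to be mined are then fed into the classifier to obtain the probability that the user has the target asset, and this probability is compared with the probability threshold to determine whether the user has the target asset.
Optionally, after the result that the user to be mined has the target asset is determined, a user profile of the user to be mined may be generated according to that result (for example, the result may serve as one data dimension in generating the user's profile), so that the mined user asset status is applied to user profile generation.
In another application, information associated with the target asset may be recommended to the user to be mined according to the determined result. Taking vehicle ownership as the target asset, the associated information may be, for example, new car information or vehicle traffic restriction information.
Embodiments of the present invention can train the target asset status prediction model based at least on user behavior features and then use the model to predict the probability that a user has the target asset, so that target asset status is mined automatically, with no need to query user asset data manually at banks, housing authorities, vehicle administration offices, and similar institutions. This improves the processing efficiency of user asset status mining. In addition, querying user asset data requires authorization from those institutions, whereas embodiments of the present invention can use at least the user behavior features recorded in data sources such as social and search platforms, so the mining approach is subject to fewer usage restrictions.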
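The data processing apparatus provided by embodiments of the present invention is described below. The apparatus may be regarded as the functional module structure that a server needs in order to implement the data processing method provided by embodiments of the present invention.
FIG. 6 is a structural block diagram of the data processing apparatus provided by an embodiment of the present invention. The apparatus may be applied to a server. Referring to FIG. 6, the apparatus may include:
a feature obtaining module 100, configured to obtain user features of a user to be mined from at least one data source;
a model retrieval module 200, configured to obtain a pre-trained target asset status prediction model, the model being trained on the user features of positive sample users and of negative sample users obtained from the at least one data source, where the probability that a positive sample user has the target asset is greater than the probability that a negative sample user has it, and the user features include at least user behavior features;
a probability prediction module 300, configured to predict, based on the user features of the user to be mined and the target asset status prediction model, the probability that the user to be mined has the target asset; and
a first result determining module 400, configured to determine that the user to be mined has the target asset if the probability that the user to be mined has the target asset is greater than a probability threshold.
Optionally, as shown in FIG. 6, the apparatus may further include:
a second result determining module 500, configured to determine that the user to be mined does not have the target asset if the probability that the user to be mined has the target asset is less than or equal to the probability threshold.
Optionally, FIG. 7 shows another structural block diagram of the data processing apparatus provided by an embodiment of the present invention. With reference to FIG. 6 and FIG. 7, the apparatus may further include:
a model training module 600, configured to: determine target users from a user set, where the user behavior features of a target user match predetermined positive feature words of the target asset and do not match predetermined negative feature words of the target asset; use the target users as the positive sample users for training the target asset status prediction model, and select at least one user other than the positive sample users from the user set as the negative sample users for training the model; obtain the user features of the positive sample users and of the negative sample users from the at least one data source; and train the target asset status prediction model from those user features by a machine training method.
Optionally, the model training module 600 being configured to determine target users from the user set specifically includes:
obtaining, from at least one data source, the user behavior records of the users in the user set to obtain multiple user behavior records, where one user behavior record represents the user behavior features of one user at one point in time;
determining, according to the predetermined positive feature words and the predetermined negative feature words of the target asset, the records among the multiple user behavior records whose user behavior features match the positive feature words and do not match the negative feature words;
determining the users corresponding to the determined user behavior records as preliminary users; and
determining target users from the preliminary users.
Optionally, the model training module 600 being configured to determine target users from the preliminary users specifically includes:
determining the score of each preliminary user, where the score of a preliminary user characterizes the probability that the preliminary user has the target asset; and
determining a first number of target users from the preliminary users according to the scores.
Optionally, the model training module 600 being configured to determine the score of one preliminary user specifically includes:
determining the data sources to which the preliminary user's behavior records correspond, obtaining the data sources corresponding to the preliminary user, and determining the preliminary user's behavior count and behavior occurrence time in each of those data sources;
for each data source corresponding to the preliminary user, combining the data source weight, the preliminary user's behavior count in the data source, and the behavior occurrence time to obtain the preliminary user's score for that data source; and
adding up the preliminary user's scores for all corresponding data sources to obtain the preliminary user's score.
Optionally, the model training module 600 being configured to determine a first number of target users from the preliminary users according to the scores specifically includes:
selecting the first number of preliminary users with the highest scores as target users;
or determining the preliminary users whose scores are greater than a score threshold and selecting the first number of preliminary users at random from them as target users.
Optionally, the model training module 600 being configured to obtain multiple user behavior records specifically includes:
obtaining multiple pieces of user behavior data collected from at least one data source, the users corresponding to the data being included in the user set; and
preprocessing the multiple pieces of user behavior data to obtain preprocessed user behavior data, and extracting the user behavior record corresponding to each piece of preprocessed data to obtain multiple user behavior records.
Optionally, the model training module 600 being configured to preprocess the multiple pieces of user behavior data specifically includes:
deleting, from the multiple pieces of user behavior data, the pieces that are data noise;
and/or filling in attribute values for the pieces of user behavior data that lack them.
Optionally, FIG. 8 shows yet another structural block diagram of the data processing apparatus provided by an embodiment of the present invention. With reference to FIG. 7 and FIG. 8, the apparatus may further include:
a model testing module 700, configured to: select a second number of users at random from the preliminary users as test sample users;
input the user features of each test sample user into the trained target asset status prediction model to obtain the model's predictions for the test sample users;
represent the predictions by an ROC curve; and
adjust the probability threshold according to the ROC curve.
Optionally, the user features further include basic attribute information and/or interest features.
Optionally, FIG. 9 shows a further structural block diagram of the data processing apparatus provided by an embodiment of the present invention. With reference to FIG. 6 and FIG. 9, the apparatus may further include:
a profile generation module 800, configured to generate a user profile of the user to be mined according to the determined result that the user to be mined has the target asset; and
an information recommendation module 900, configured to recommend information associated with the target asset to the user to be mined according to the determined result that the user to be mined has the target asset.
Optionally, only one of the profile generation module 800 and the information recommendation module 900 may be used in the apparatus shown in FIG. 6.
Embodiments of the present invention further provide a server, which may include the data processing apparatus described above.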
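Optionally, FIG. 10 shows a hardware structural block diagram of the server. Referring to FIG. 10, the server may include a processor 10, a communication interface 20, a memory 30, and a communication bus 40.
The processor 10, the communication interface 20, and the memory 30 communicate with one another through the communication bus 40.
Optionally, the communication interface 20 may be an interface of a communication module, such as an interface of a GSM module.
The processor 10 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 30 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The processor 10 is specifically configured to:
obtain user features of a user to be mined from at least one data source;
obtain a pre-trained target asset status prediction model, the model being trained on user features of positive sample users and negative sample users obtained from the at least one data source, where a positive sample user is more likely to have the target asset than a negative sample user, and the user features include at least user behavior features;
predict, based on the user features of the user to be mined and the target asset status prediction model, the probability that the user to be mined has the target asset; and
if the probability that the user to be mined has the target asset is greater than a probability threshold, determine that the user to be mined has the target asset.
In addition, embodiments of the present invention further provide a storage medium for storing program code, the program code being used to execute the data processing method provided by the above embodiments.
Embodiments of the present invention further provide a computer program product including instructions which, when run on a server, cause the server to execute the data processing method provided by the above embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, see the description of the method.
A person skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may implement the described functions in different ways for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the core idea or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.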

Claims (17)

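  1. A data processing method, applied to a server, the method comprising:
    obtaining user features of a user to be mined from at least one data source;
    obtaining a pre-trained target asset status prediction model, the target asset status prediction model being trained on user features of positive sample users and user features of negative sample users obtained from the at least one data source, wherein the probability that a positive sample user has a target asset is greater than the probability that a negative sample user has the target asset, and the user features comprise at least user behavior features;
    predicting, based on the user features of the user to be mined and the target asset status prediction model, the probability that the user to be mined has the target asset; and
    if the probability that the user to be mined has the target asset is greater than a probability threshold, determining that the user to be mined has the target asset.
  2. The data processing method according to claim 1, further comprising:
    if the probability that the user to be mined has the target asset is less than or equal to the probability threshold, determining that the user to be mined does not have the target asset.
  3. The data processing method according to claim 1 or 2, wherein the target asset status prediction model is obtained by the following training:
    determining target users from a user set, wherein the user behavior features of a target user match predetermined positive feature words of the target asset and do not match predetermined negative feature words of the target asset;
    using the target users as the positive sample users for training the target asset status prediction model, and selecting at least one user other than the positive sample users from the user set as the negative sample users for training the target asset status prediction model;
    obtaining the user features of the positive sample users and the user features of the negative sample users from the at least one data source; and
    training the target asset status prediction model from the user features of the positive sample users and the user features of the negative sample users by a machine training method.
  4. The data processing method according to claim 3, wherein determining target users from the user set comprises:
    obtaining, from at least one data source, the user behavior records of the users in the user set to obtain multiple user behavior records, wherein one user behavior record represents the user behavior features of one user at one point in time;
    determining, according to the predetermined positive feature words of the target asset and the predetermined negative feature words of the target asset, the user behavior records among the multiple user behavior records whose user behavior features match the positive feature words and do not match the negative feature words;
    determining the users corresponding to the determined user behavior records as preliminary users; and
    determining target users from the preliminary users.
  5. The data processing method according to claim 4, wherein determining target users from the preliminary users comprises:
    determining the score of each preliminary user, wherein the score of a preliminary user characterizes the probability that the preliminary user has the target asset; and
    determining a first number of target users from the preliminary users according to the scores of the preliminary users.
  6. The data processing method according to claim 5, wherein determining the score of one preliminary user comprises:
    determining the data sources to which the user behavior records of the preliminary user correspond, to obtain the data sources corresponding to the preliminary user, and determining the behavior count and the behavior occurrence time of the preliminary user in each corresponding data source;
    for each data source corresponding to the preliminary user, combining the data source weight of the data source, the behavior count of the preliminary user in the data source, and the behavior occurrence time, to obtain the score of the preliminary user for each corresponding data source; and
    adding up the scores of the preliminary user for the corresponding data sources to obtain the score of the preliminary user.
  7. The data processing method according to claim 6, wherein determining a first number of target users from the preliminary users according to the scores of the preliminary users comprises:
    selecting the first number of preliminary users with the highest scores as target users;
    or determining the preliminary users whose scores are greater than a score threshold, and selecting the first number of preliminary users at random from the preliminary users whose scores are greater than the score threshold as target users.
  8. The data processing method according to claim 4, wherein obtaining multiple user behavior records comprises:
    obtaining multiple pieces of user behavior data collected from at least one data source, the users corresponding to the multiple pieces of user behavior data being included in the user set; and
    preprocessing the multiple pieces of user behavior data to obtain preprocessed user behavior data, and extracting the user behavior record corresponding to each piece of preprocessed user behavior data to obtain multiple user behavior records.
  9. The data processing method according to claim 8, wherein preprocessing the multiple pieces of user behavior data comprises:
    deleting, from the multiple pieces of user behavior data, the user behavior data that is data noise;
    and/or filling in attribute values for the user behavior data, among the multiple pieces of user behavior data, that lacks attribute values.
  10. The data processing method according to any one of claims 4 to 9, further comprising:
    selecting a second number of users at random from the preliminary users as test sample users;
    inputting the user features of each test sample user into the target asset status prediction model to obtain predictions for the test sample users;
    representing the predictions by an ROC curve; and
    adjusting the probability threshold according to the ROC curve.
  11. The data processing method according to claim 1, wherein the user features further comprise basic attribute information and/or interest features.
  12. The data processing method according to claim 1, further comprising:
    generating a user profile of the user to be mined according to the determined result that the user to be mined has the target asset;
    or recommending information associated with the target asset to the user to be mined according to the determined result that the user to be mined has the target asset.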
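  13. A data processing apparatus, applied to a server, the data processing apparatus comprising:
    a feature obtaining module, configured to obtain user features of a user to be mined from at least one data source;
    a model retrieval module, configured to obtain a pre-trained target asset status prediction model, the target asset status prediction model being trained on user features of positive sample users and user features of negative sample users obtained from the at least one data source, wherein the probability that a positive sample user has a target asset is greater than the probability that a negative sample user has the target asset, and the user features comprise at least user behavior features;
    a probability prediction module, configured to predict, based on the user features of the user to be mined and the target asset status prediction model, the probability that the user to be mined has the target asset; and
    a first result determining module, configured to determine that the user to be mined has the target asset if the probability that the user to be mined has the target asset is greater than a probability threshold.
  14. The data processing apparatus according to claim 13, further comprising:
    a model training module, configured to: determine target users from a user set, wherein the user behavior features of a target user match predetermined positive feature words of the target asset and do not match predetermined negative feature words of the target asset; use the target users as the positive sample users for training the target asset status prediction model, and select at least one user other than the positive sample users from the user set as the negative sample users for training the target asset status prediction model; obtain the user features of the positive sample users and the user features of the negative sample users from the at least one data source; and train the target asset status prediction model from the user features of the positive sample users and the user features of the negative sample users by a machine training method.
  15. A server, comprising:
    a processor, a communication interface, a memory, and a communication bus;
    wherein the processor, the communication interface, and the memory communicate with one another through the communication bus, and the communication interface is an interface of a communication module;
    the memory is configured to store program code and transmit the program code to the processor; and
    the processor is configured to invoke instructions of the program code in the memory to execute the data processing method according to any one of claims 1 to 12.
  16. A storage medium, configured to store program code, the program code being used to execute the data processing method according to any one of claims 1 to 12.
  17. A computer program product comprising instructions which, when run on a computer, cause the computer to execute the data processing method according to any one of claims 1 to 12.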
PCT/CN2018/080842 2017-04-20 2018-03-28 数据处理方法、装置及服务器 WO2018192348A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710261884.XA CN108734327A (zh) 2017-04-20 2017-04-20 一种数据处理方法、装置及服务器
CN201710261884.X 2017-04-20

Publications (1)

Publication Number Publication Date
WO2018192348A1 true WO2018192348A1 (zh) 2018-10-25

Family

ID=63857099

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080842 WO2018192348A1 (zh) 2017-04-20 2018-03-28 数据处理方法、装置及服务器

Country Status (2)

Country Link
CN (1) CN108734327A (zh)
WO (1) WO2018192348A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487262A (zh) * 2020-11-25 2021-03-12 建信金融科技有限责任公司 一种数据处理的方法和装置

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472640A (zh) * 2018-11-09 2019-03-15 斑马网络技术有限公司 客户识别方法、装置、设备以及存储介质
CN109783539A (zh) * 2019-01-07 2019-05-21 腾讯科技(深圳)有限公司 用户挖掘及其模型构建方法、装置及计算机设备
CN109858974A (zh) * 2019-02-18 2019-06-07 重庆邮电大学 已购车用户识别模型构建方法及识别方法
CN109919219B (zh) * 2019-03-01 2021-02-26 北京邮电大学 一种基于粒计算ML-kNN的Xgboost多视角画像构建方法
CN113728342A * 2019-05-31 2021-11-30 Abb瑞士股份有限公司 用于配置用于监控工业过程和工业资产的监控系统的方法
CN111126714A * 2019-12-31 2020-05-08 青梧桐有限责任公司 基于长租公寓租房场景下的退租预测系统及方法
CN112269937B * 2020-11-16 2024-02-02 加和(北京)信息科技有限公司 一种计算用户相似度的方法、系统及装置
CN117649300B * 2024-01-29 2024-04-30 山东新睿信息科技有限公司 一种基于数字孪生的资产分配管理方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980211A (zh) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 一种机器学习模型及其建立方法
CN104751234A (zh) * 2013-12-31 2015-07-01 华为技术有限公司 一种用户资产的预测方法及装置
CN105447730A (zh) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 目标用户定向方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10331785B2 (en) * 2012-02-17 2019-06-25 Tivo Solutions Inc. Identifying multimedia asset similarity using blended semantic and latent feature analysis
CN103064987B (zh) * 2013-01-31 2016-09-21 五八同城信息技术有限公司 一种虚假交易信息识别方法
CN104933075A (zh) * 2014-03-20 2015-09-23 百度在线网络技术(北京)有限公司 用户属性预测平台和方法
CN104331502B (zh) * 2014-11-19 2018-04-03 杭州亚信软件有限公司 针对快递员周边人群营销中快递员数据的识别方法
CN104933157A (zh) * 2015-06-26 2015-09-23 百度在线网络技术(北京)有限公司 用于获取用户属性信息的方法、装置及服务器

Also Published As

Publication number Publication date
CN108734327A (zh) 2018-11-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18787336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18787336

Country of ref document: EP

Kind code of ref document: A1