CN113762585A - Data processing method, account type identification method and device - Google Patents

Info

Publication number
CN113762585A
Authority
CN
China
Prior art keywords
account
sample
identification
service data
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110535924.1A
Other languages
Chinese (zh)
Other versions
CN113762585B (en
Inventor
张堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110535924.1A priority Critical patent/CN113762585B/en
Publication of CN113762585A publication Critical patent/CN113762585A/en
Application granted granted Critical
Publication of CN113762585B publication Critical patent/CN113762585B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The application discloses a data processing method, an account type identification method and apparatus, a computer device, and a storage medium, and belongs to the field of computer technologies. A plurality of initial identification models and an initial fusion model are jointly trained using multiple groups of sample service data of a sample account under multiple service reference dimensions, together with the labeled account type. While the parameters of any one model are being adjusted, the parameters of the other models are held fixed, so that no single model is trained in isolation, layer by layer. This provides an end-to-end training mode that can greatly improve the identification accuracy of the trained account identification models and fusion model, and thereby the identification accuracy of account types.

Description

Data processing method, account type identification method and device
Technical Field
The present application relates to the field of computer technologies, and in particular to a data processing method, an account type identification method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology and the advancement of Artificial Intelligence (AI) technology, machine learning models are required to perform recognition tasks in more and more application scenarios, such as face recognition, voice recognition, and type recognition (i.e., classification).
Take identifying high-quality accounts among short-video accounts as an example. Each user's account has multiple service reference dimensions, such as account activity, account consumption, and account interaction force. High-quality accounts with high activity, good consumption, and strong interaction force are usually easy for a model to identify. However, there are also high-quality accounts with good consumption but low activity, and current models identify these accounts with low accuracy. A method that can improve the identification accuracy of account types is therefore urgently needed.
Disclosure of Invention
The embodiments of the present application provide a data processing method, an account type identification method and apparatus, a computer device, and a storage medium, which can improve the identification accuracy of account types. The technical solutions are as follows:
In one aspect, a data processing method is provided, the method including:
acquiring multiple groups of sample service data and an account type of a sample account, wherein different groups of sample service data correspond to different service reference dimensions;
adjusting parameters of a plurality of initial identification models and an initial fusion model based on the multiple groups of sample service data and the account type, wherein, while the parameters of any one initial identification model or of the initial fusion model are being adjusted, the parameters of the other models are kept unchanged;
and, in response to the iteration meeting a convergence condition, obtaining a plurality of account identification models and a fusion model with adjusted parameters, wherein each account identification model is used to obtain a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used to obtain a predicted account type based on multiple groups of service data.
In one aspect, a method for identifying an account type is provided, and the method includes:
acquiring a plurality of groups of service data of a target account, wherein different groups of service data correspond to different service reference dimensions;
acquiring a plurality of first identification results of the target account based on the plurality of groups of service data, wherein the first identification results are predicted account types determined based on a single group of service data;
acquiring a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of groups of service data;
determining a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
In one aspect, an apparatus for processing data is provided, the apparatus comprising:
a first acquisition module, configured to acquire multiple groups of sample service data and an account type of a sample account, wherein different groups of sample service data correspond to different service reference dimensions;
an adjusting module, configured to adjust parameters of a plurality of initial identification models and an initial fusion model based on the multiple groups of sample service data and the account type, wherein, while the parameters of any one initial identification model or of the initial fusion model are being adjusted, the parameters of the other models are kept unchanged;
and a second obtaining module, configured to obtain, in response to the iteration meeting a convergence condition, a plurality of account identification models and a fusion model with adjusted parameters, wherein each account identification model is used to obtain a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used to obtain a predicted account type based on multiple groups of service data.
In one possible embodiment, the adjusting module is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results of the sample account;
inputting the plurality of first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account;
determining a loss function value based on the plurality of first sample identification results, the second sample identification results, and the account number type;
in response to a stop condition not being met, adjusting the parameters of any one of the plurality of initial identification models until the stop condition is met, and then adjusting the parameters of the next initial identification model;
and adjusting the parameters of the initial fusion model once the parameters of the plurality of initial identification models have been adjusted.
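The alternating adjustment described in the steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: each identification model is assumed to be a tiny logistic model, the fusion model a logistic model over the first results, the loss a sum of binary cross-entropies, and the stop condition a fixed step budget. All names (`forward`, `total_loss`, `adjust`, `train_round`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def forward(models, fusion, groups):
    """Return the first results (one per dimension) and the fused second result."""
    firsts = [sigmoid(sum(w * x for w, x in zip(m, g)))
              for m, g in zip(models, groups)]
    second = sigmoid(sum(a * f for a, f in zip(fusion[:-1], firsts)) + fusion[-1])
    return firsts, second

def total_loss(models, fusion, samples):
    """Assumed loss: cross-entropy of every first result plus the second result."""
    loss = 0.0
    for groups, label in samples:
        firsts, second = forward(models, fusion, groups)
        loss += sum(bce(f, label) for f in firsts) + bce(second, label)
    return loss / len(samples)

def adjust(params, loss_fn, lr=0.5, steps=50, h=1e-5):
    """Gradient descent on ONE parameter list; every other model stays frozen.

    Gradients are estimated by central finite differences to keep the sketch
    short; the fixed step budget stands in for the stop condition.
    """
    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            old = params[i]
            params[i] = old + h
            up = loss_fn()
            params[i] = old - h
            down = loss_fn()
            params[i] = old
            grads.append((up - down) / (2 * h))
        for i, g in enumerate(grads):
            params[i] -= lr * g

def train_round(models, fusion, samples):
    """One iteration: each identification model in turn, then the fusion model."""
    loss_fn = lambda: total_loss(models, fusion, samples)
    for model in models:
        adjust(model, loss_fn)
    adjust(fusion, loss_fn)
    return loss_fn()
```

Because the loss seen by each model includes the fused second result, adjusting one identification model already takes the other models' outputs into account, which is the point of training jointly rather than layer by layer.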
In one possible embodiment, the adjusting module is further configured to:
adjusting parameters of a basic identification model based on any one group of sample service data of the sample account to obtain a corresponding initial identification model, wherein that initial identification model corresponds to the service reference dimension to which that group of sample service data belongs.
In one possible embodiment, the adjusting module is further configured to:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results;
and adjusting parameters of a basic fusion model based on the plurality of first sample identification results to obtain the initial fusion model.
In one possible implementation, the convergence condition is that the difference between the loss function values of the current iteration and the previous iteration is smaller than a convergence threshold, the convergence threshold being a value greater than or equal to 0.
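A minimal sketch of this convergence check is given below. It assumes that the "difference" is taken as an absolute value and adds an iteration cap as a safety net; both choices, and the names `has_converged` and `run_until_converged`, are assumptions rather than details stated in the application.

```python
def has_converged(prev_loss, curr_loss, threshold=1e-4):
    """True when the loss changed by less than `threshold` between iterations."""
    if threshold < 0:
        raise ValueError("convergence threshold must be >= 0")
    return abs(prev_loss - curr_loss) < threshold

def run_until_converged(step, threshold=1e-4, max_iters=1000):
    """Call `step()` (one training iteration returning a loss) until convergence.

    Returns (number_of_iterations_run, last_loss).
    """
    prev = step()
    for i in range(1, max_iters):
        curr = step()
        if has_converged(prev, curr, threshold):
            return i + 1, curr
        prev = curr
    return max_iters, prev
```

Note that with a strict "smaller than" comparison a threshold of exactly 0 can never be satisfied, which is why the sketch also bounds the number of iterations.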
In one aspect, an apparatus for identifying an account type is provided, and the apparatus includes:
the first acquisition module is used for acquiring a plurality of groups of service data of the target account, wherein different groups of service data correspond to different service reference dimensions;
a second obtaining module, configured to obtain multiple first identification results of the target account based on the multiple sets of service data, where the first identification results are predicted account types determined based on a single set of the service data;
a third obtaining module, configured to obtain a second recognition result of the target account based on multiple first recognition results of the target account, where the second recognition result is a predicted account type determined based on the multiple sets of service data;
a determination module, configured to determine a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
In one possible implementation, the second obtaining module is configured to:
for any one group of service data among the multiple groups of service data, determining the service reference dimension corresponding to that group of service data;
determining the account identification model corresponding to the service reference dimension based on a mapping relationship between service reference dimensions and account identification models, wherein the account identification model is used to obtain a predicted account type based on a single group of service data of that service reference dimension;
and inputting that group of service data into the account identification model, and processing it through the account identification model to obtain the first identification result corresponding to that group of service data.
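The lookup described above can be sketched as a dictionary that maps each service reference dimension to its model. The dictionary-based registry, the callable model type, and the name `first_results` are assumptions made for illustration only.

```python
from typing import Callable, Dict, Sequence

# A model is assumed to be any callable that maps one group of service data
# (a sequence of numbers) to a score for the predicted account type.
Model = Callable[[Sequence[float]], float]

def first_results(groups: Dict[str, Sequence[float]],
                  registry: Dict[str, Model]) -> Dict[str, float]:
    """Route each group of service data to the model registered for its dimension."""
    results = {}
    for dimension, data in groups.items():
        model = registry.get(dimension)
        if model is None:
            raise KeyError(f"no account identification model for {dimension!r}")
        results[dimension] = model(data)
    return results
```

Failing loudly on an unmapped dimension is a design choice of this sketch; the application does not say how a missing mapping is handled.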
In one possible implementation, the third obtaining module is configured to:
inputting the plurality of first recognition results into a fusion model, and weighting the first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used to obtain a predicted account type based on multiple groups of service data;
and performing linear mapping on the sum of the weighted recognition results to obtain the second recognition result.
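The two fusion steps above, weighting followed by a linear mapping of the weighted sum, can be written out as below. The parameter names `weights`, `scale`, and `offset` are hypothetical; the application does not name the parameters of the fusion model.

```python
def fuse(first_results, weights, scale=1.0, offset=0.0):
    """Weight each first result, sum the weighted results, then map linearly."""
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    weighted = [w * r for w, r in zip(weights, first_results)]
    return scale * sum(weighted) + offset
```

For example, `fuse([0.5, 1.0], [2.0, 1.0], scale=0.5, offset=0.25)` weights the results to `[1.0, 1.0]`, sums them to `2.0`, and maps that to `1.25`.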
In one possible implementation, the first obtaining module is configured to:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and dividing all the service data belonging to the same service reference dimension into the same group of service data.
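Grouping by service reference dimension amounts to a lookup followed by bucketing, as in the sketch below. The record layout of `(field_name, value)` pairs and the field-to-dimension table are assumptions for illustration; the field names in the usage note are invented.

```python
from collections import defaultdict

def group_by_dimension(records, dimension_of):
    """Place every (field_name, value) record in the group of its dimension."""
    groups = defaultdict(list)
    for field_name, value in records:
        groups[dimension_of[field_name]].append((field_name, value))
    return dict(groups)
```

With a hypothetical table such as `{"login_days": "activity", "session_minutes": "activity", "gift_spend": "consumption"}`, the two activity fields end up in one group and the spend field in another.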
In one possible implementation, the service reference dimensions include at least two of: account activity, account influence, account consumption, account interaction force, number of associated accounts, and video playing completion rate.
In one aspect, a computer device is provided, which includes one or more processors and one or more memories, where at least one computer program is stored in the one or more memories, and the at least one computer program is loaded and executed by the one or more processors to implement the processing method of the data or the identification method of the account type.
In one aspect, a storage medium is provided, and at least one computer program is stored in the storage medium, and is loaded and executed by a processor to implement the data processing method or the account type identification method.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer readable storage medium. The one or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute the above-mentioned data processing method or account type identification method.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
A plurality of initial identification models and an initial fusion model are jointly trained using multiple groups of sample service data of a sample account under multiple service reference dimensions, together with the labeled account type. While the parameters of any one model are being adjusted, the parameters of the other models are held fixed, so that no single model is trained in isolation, layer by layer. This provides an end-to-end training mode: each model is not trained independently on its own identification result alone, but jointly with the identification results of the other models. The identification accuracy of the trained account identification models and fusion model can thus be greatly improved, which in turn improves the identification accuracy of account types.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a cross-validation method provided by an embodiment of the present application;
Fig. 3 is a flowchart of a data processing method provided in an embodiment of the present application;
Fig. 4 is a flowchart of a data processing method provided in an embodiment of the present application;
Fig. 5 is a flowchart of a method for identifying an account type according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of an account type identification method according to an embodiment of the present application;
Fig. 7 is a schematic diagram of an account type identification method according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an account type identification device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function. It should be understood that "first," "second," and "nth" imply no logical or temporal dependency and place no limitation on number or order of execution.
The term "at least one" in this application means one or more, and "a plurality of" means two or more; for example, a plurality of first locations means two or more first locations.
Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include audio processing, computer vision, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The solution provided in the embodiment of the present application relates to technologies such as machine learning of artificial intelligence, and mainly relates to how to improve accuracy of identifying account types by using a machine learning model, which will be described in detail in each embodiment below.
Fig. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application. Referring to fig. 1, in this implementation environment, a terminal 110 and a server 120 are included, and the terminal 110 and the server 120 are exemplary illustrations of computer devices.
The terminal 110 is configured to provide service data of a target account. A user logs in to the target account on the terminal 110 and may initiate various service requests and service behaviors. Either the server 120 collects and compiles the service data of the target account, or the terminal 110 collects and compiles it and then sends it to the server 120.
The terminal 110 and the server 120 can be directly or indirectly connected through wired or wireless communication, and the application is not limited thereto.
The server 120 may be configured to provide business services to the terminal 110 and to identify the account type of a target account. That is, the server 120 may train multiple account identification models and a fusion model, where each account identification model obtains a predicted account type of an input account based on a single group of service data of the corresponding service reference dimension, and the fusion model obtains the predicted account type of the input account based on multiple groups of service data. The multiple first identification results output by the account identification models and the second identification result output by the fusion model may further be provided to a technician, who can then analyze the account type of the input account. Once determined, the account type can guide subsequent resource recommendation for the target account. Optionally, resource recommendation has two aspects: how to recommend resources to the target account accurately, and whether to recommend the target account to other accounts to increase its exposure. In addition, accounts can be ranked as a whole by account consumption, so that accounts with higher consumption potential can be mined and high-quality accounts (also called high-level accounts or core accounts) can be mined or removed, which gives the method wide application value.
It should be noted that, as disclosed in the present application, the service data of each account may be stored on the blockchain.
Optionally, the server 120 maintains the multiple account identification models and the fusion model only on the server side. The server 120 groups the collected service data of each account by service reference dimension to obtain multiple groups of service data, calls the account identification models to obtain multiple first identification results, calls the fusion model to obtain a second identification result, and determines the predicted account type of each account based on the first identification results and the second identification result. Resources can then be recommended accurately to the terminal on which each account is logged in, based on that account's predicted account type.
Optionally, after the account identification models and the fusion model are obtained through training, the server 120 sends them to the terminal 110. The terminal 110 can then call the models locally to determine the predicted account type of its own account and actively request the server 120 to recommend corresponding resources according to that predicted account type.
Optionally, the terminal 110 trains the account identification models and the fusion model locally and then calls them locally to determine the predicted account type of its own account. The terminal 110 then actively requests the server 120 to recommend corresponding resources according to that predicted account type, which can reduce the communication overhead between the terminal 110 and the server 120.
The server 120 may include at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Alternatively, the server 120 may undertake primary computational tasks and the terminal 110 may undertake secondary computational tasks; alternatively, the server 120 undertakes the secondary computing work and the terminal 110 undertakes the primary computing work; alternatively, the terminal 110 and the server 120 perform cooperative computing by using a distributed computing architecture.
Optionally, the server 120 is an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), big data and artificial intelligence platform, and the like.
Optionally, the terminal 110 is a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an e-book reader, or the like, but is not limited thereto.
Those skilled in the art will appreciate that the terminal 110 may refer to any one of a plurality of terminals, and that there may be more or fewer terminals. For example, there may be only one terminal, or tens or hundreds of terminals, or more. The number of terminals and the device types are not limited in the embodiments of the present application.
Stacking is an ensemble learning method that fuses multiple models. When multiple groups of service data are involved, stacking trains one or more models on each group of service data and has them predict on the training set and the test set, in preparation for the subsequent model fusion step. If multiple models are trained for each group of service data, they can be regarded as a whole; the purpose is to perform cross-validation to prevent overfitting. If overfitting is not a concern, a single model may be trained for each group of service data.
Fig. 2 is a schematic diagram of a cross-validation method provided in an embodiment of the present application. As shown at 200, Fig. 2 illustrates the training process of the models corresponding to any one group of service data. The service data (i.e., the training data) is randomly divided into 5 parts by data volume, and 5 instances of model 1 are trained from a basic model; these 5 models can be regarded as one model group 1. When training each model 1, 1 of the 5 parts is selected as the validation set and the remaining 4 parts are combined into the training set; the model is trained and its predictions on the validation set are obtained. The validation-set predictions of the 5 models are then concatenated to obtain the complete prediction result set of model group 1. Further, the test set is input into each of the 5 models to obtain 5 test results, and their average is taken as the final test result of model group 1. Performing this process for each group of service data yields a prediction result set and a final test result for each model group, and a fusion model can be trained based on these prediction result sets and final test results. In actual use, service data of different service reference dimensions is input into the corresponding model groups, the outputs of each model group are averaged and then input into the fusion model, and the final service processing result is output.
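The 5-fold procedure for one group of service data can be sketched as below. The base model here is a deliberately trivial stand-in (it always predicts the mean training label) and is an assumption, as are the names `fit_mean_model` and `out_of_fold`; only the fold handling mirrors the description.

```python
import random

def fit_mean_model(train):
    """Toy base model (an assumption): always predicts the mean training label."""
    mean = sum(label for _, label in train) / len(train)
    return lambda features: mean

def out_of_fold(samples, test_features, fit=fit_mean_model, k=5, seed=0):
    """Train k models; return out-of-fold predictions and averaged test predictions.

    `samples` is a list of (features, label) pairs for one group of service data.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)       # random split into k parts
    folds = [order[i::k] for i in range(k)]
    oof = [None] * len(samples)              # complete prediction result set
    test_preds = []
    for held_out in folds:
        held = set(held_out)
        train = [samples[i] for i in order if i not in held]
        model = fit(train)                   # trained on the other k - 1 parts
        for i in held_out:
            oof[i] = model(samples[i][0])    # predict only on the validation part
        test_preds.append([model(x) for x in test_features])
    test_avg = [sum(col) / k for col in zip(*test_preds)]
    return oof, test_avg
```

Every training sample receives exactly one prediction, made by the model that did not see it, and those predictions are what the fusion model would be trained on. The cost is that k models per group must be kept and run.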
The stacking method suffers from insufficient information during training. Take the task of identifying high-quality accounts among short-video accounts as an example. Each account has multiple service reference dimensions, such as account activity, account consumption, and account interaction force, and whether an account is high-quality is judged comprehensively from them. In practice there is a class of accounts that are ultimately judged high-quality because their consumption is very good, even though their activity is low; this shows that the final identification result must combine multiple groups of service data. When a model is trained on each group of service data separately, a training sample can become noise under some groups. For example, ordinary high-quality accounts also have good consumption, so such an account is a normal sample when training on the consumption group. Ordinary high-quality accounts also have high activity, however, whereas this account has low activity yet carries the high-quality label, which is the opposite of what typical high-quality accounts exhibit. The account is therefore a noise sample when training on the activity group, which introduces new noise into training, interferes with the training process, and reduces the identification accuracy of the overall model.
In addition, after the stacking method trains a model on each group of service data, each model must predict on the same data set, and the predictions serve as input data for the next layer. To ensure that the models do not overfit the training set, the cross-validation scheme shown in Fig. 2 is used, that is, a model group rather than a single model is trained on each group of service data. As a result, several times as many models must be run in actual use, which greatly reduces computational efficiency.
In view of this, an embodiment of the present application provides a data processing method with an end-to-end training scheme. It avoids the insufficient training information that arises when the fusion model is trained only after the service data has been grouped, so that multiple account identification models and a fusion model with higher identification accuracy can be obtained through training.
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present application. Referring to Fig. 3, the embodiment is applied to a computer device; the following description takes a server as an example of the computer device. The embodiment includes the following steps:
301. The server obtains a plurality of groups of sample service data and the account type of a sample account, wherein different groups of sample service data correspond to different service reference dimensions.
The sample account may be any account registered on a platform provided by the server, and there may be one or more sample accounts. This embodiment describes only a single iteration for a single sample account as an example, which does not limit the number of sample accounts; when there are multiple sample accounts, a service data processing flow similar to that of the single sample account may be executed for each of them.
Optionally, depending on the services provided by the platform on which the account is registered, the sample account may be a short video account, a social media account, a video uploader account, an official account, a game account, a comment account, or the like, which is not specifically limited in this embodiment of the present application.
In some embodiments, when obtaining the plurality of groups of sample service data, the server may first obtain a plurality of original sample service data of the sample account, determine a service reference dimension to which each of the plurality of sample service data of the sample account belongs, and divide the plurality of sample service data into the plurality of groups of sample service data based on a plurality of service reference dimensions, for example, divide each sample service data belonging to the same service reference dimension into the same group of sample service data.
It should be noted that a service reference dimension is an evaluation dimension that needs to be considered when account types are divided, and different service reference dimensions can be set for different account types to be divided. For example, when determining whether a sample account is a high-quality account, the service reference dimensions to be considered include, but are not limited to: account activity, account influence, account consumption condition, account interaction force and the like. For another example, when determining whether a sample account is a high-activity account, the service reference dimensions to be considered may include only account activity, account interaction force and the like, without considering account consumption condition or account influence. For another example, if the sample account is a short video account, a video playing completion rate may be introduced as a service reference dimension, and if the sample account is a social media account, the account homepage popularity may be introduced as a service reference dimension.
In some embodiments, the account types of the sample accounts are also classification labels of the sample accounts, and the account types may be labeled manually by a technician, or sample accounts with different account types (for example, 1000 high-quality accounts and 1000 non-high-quality accounts) may be collected to obtain sample accounts that naturally carry the labeled account types.
In some embodiments, taking the sample account as the short video account as an example, the service reference dimension includes at least two of the following: account liveness, account influence, account consumption, account interaction force, number of associated accounts, video playing completion rate and the like.
Optionally, the plurality of sample service data of the sample account includes: the account's daily play count, weekly play count and monthly play count; the accumulated play duration, accumulated click count, accumulated comment count and accumulated share count of the video works uploaded by the account; the account's accumulated total consumption amount and the consumption amount of its latest order; the accumulated number of comments on all posts of the account and the number of comments on its latest post; the account's fan count and comment count; the play completion rate of all videos the account has clicked cumulatively and the play completion rate of videos the account clicked within 7 days; and the like.
When grouping the sample service data on this basis, the service reference dimension to which each sample service data belongs must be determined, that is: the daily, weekly and monthly play counts of the account belong to account activity; the accumulated play duration, accumulated click count, accumulated comment count and accumulated share count of the video works uploaded by the account belong to account influence; the accumulated total consumption amount and the consumption amount of the latest order belong to account consumption condition; the accumulated number of comments on all posts and the number of comments on the latest post belong to account interaction force; the fan count and comment count belong to the number of associated accounts; and the play completion rate of all videos clicked cumulatively and the play completion rate of videos clicked within 7 days belong to video playing completion rate.
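The grouping step above amounts to a lookup from each sample service data item to its service reference dimension. A minimal sketch follows; the feature names and the `DIMENSION_OF` table are hypothetical and cover only part of the short video example.

```python
from collections import defaultdict

# Hypothetical feature names; the mapping mirrors part of the short video
# grouping described above and is not taken from the application.
DIMENSION_OF = {
    "daily_play_count": "account_activity",
    "weekly_play_count": "account_activity",
    "monthly_play_count": "account_activity",
    "total_consumption_amount": "account_consumption",
    "latest_order_amount": "account_consumption",
    "fan_count": "associated_accounts",
}

def group_by_dimension(sample):
    """Split one account's service data into groups, one per reference dimension."""
    groups = defaultdict(dict)
    for name, value in sample.items():
        groups[DIMENSION_OF[name]][name] = value
    return dict(groups)
```

Each resulting group would then be routed to the recognition model that corresponds to its dimension.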
In some embodiments, taking the sample account as the social media account as an example, the business reference dimension includes at least two of: account activity, account influence, account consumption, account interaction force, number of associated accounts, account homepage popularity and the like.
Optionally, the plurality of sample service data of the sample account includes: the account's daily active duration, weekly active duration and monthly active duration; the accumulated click count, accumulated comment count and accumulated share count of the posts published by the account; the account's accumulated total consumption amount, the consumption amount of its latest order and its membership level; the accumulated number of comments on all posts of the account, the number of comments on its latest post and the read count of its latest post; the account's fan count and following count; the daily, weekly and monthly visit counts of the account homepage; and the like.
When grouping the sample service data on this basis, the service reference dimension to which each sample service data belongs must be determined, that is: the daily, weekly and monthly active durations of the account belong to account activity; the accumulated click count, accumulated comment count and accumulated share count of the posts published by the account belong to account influence; the accumulated total consumption amount, the consumption amount of the latest order and the membership level belong to account consumption condition; the accumulated number of comments on all posts, the number of comments on the latest post and the read count of the latest post belong to account interaction force; the fan count and following count belong to the number of associated accounts; and the daily, weekly and monthly visit counts of the account homepage belong to account homepage popularity.
In some embodiments, a user may log in to the sample account on a terminal and initiate various service requests to the server based on the sample account. When providing the services corresponding to those requests, the server records the sample service data of the sample account and uses statistical analysis tools to group the sample service data by service reference dimension, obtaining the multiple groups of sample service data of the sample account.
In other embodiments, the terminal locally records each sample service data of the sample account, and uses various statistical analysis tools to group each sample service data according to a service reference dimension, so as to obtain multiple groups of sample service data of the sample account, and sends the multiple groups of sample service data of the sample account to the server, and the server receives the multiple groups of sample service data of the sample account.
Optionally, according to the settings of a technician, sample accounts may be divided into different account types. For example, they may be divided into high-quality accounts and non-high-quality accounts; into head accounts, waist accounts and bottom accounts; or into high-credit, medium-credit and low-credit accounts. Accurately dividing account types can guide downstream tasks such as personalized resource recommendation for accounts of different types.
The embodiment of the present application is described only with the example of identifying whether the sample account is a high-quality account, which does not limit the account type of the sample account. A high-quality account is a core account of the platform, and needs to be evaluated comprehensively across multiple service reference dimensions such as account activity, account consumption condition and account interaction force.
302. The server adjusts the parameters of a plurality of initial recognition models and an initial fusion model based on the plurality of groups of sample service data and the account type, wherein, when the parameters of any initial recognition model or of the initial fusion model are adjusted, the parameters of the other models are kept unchanged.
In some embodiments, when adjusting the parameters of the initial recognition models and the initial fusion model, the server may perform the following operations in each iteration: inputting the multiple groups of sample service data into the multiple initial recognition models respectively to obtain multiple first sample identification results of the sample account; inputting the multiple first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account; determining a loss function value based on the multiple first sample identification results, the second sample identification result and the account type; in response to the stop condition not being met, adjusting the parameters of one of the multiple initial recognition models until the stop condition is met, and then adjusting the parameters of the next initial recognition model; and, in response to the parameters of all of the multiple initial recognition models having been adjusted, adjusting the parameters of the initial fusion model.
Alternatively, the stop condition may be that the loss function value is less than a loss threshold, which may be any value greater than or equal to 0, or the stop condition may be that the number of iterations is greater than a number threshold, which may be any integer greater than or equal to 1.
Taking any one iteration as an example: if the stop condition is not met, the parameters of the current initial recognition model are adjusted while the parameters of all other models are held fixed. Once the stop condition is met, the adjustment of the current initial recognition model is complete, and the parameters of the next initial recognition model are adjusted in the same way. These steps are repeated until the parameters of all initial recognition models have been adjusted. The parameters of the initial fusion model are then adjusted, again with the parameters of the other models held fixed. When the adjustment of the initial fusion model is also complete, the current iteration is finished and the next iteration is performed.
Optionally, one or more sample account numbers may be input in the current iteration process, one or more sample account numbers may also be input in the next iteration process, and the sample account numbers in the current iteration process and the next iteration process may be the same or different, which is not specifically limited in this embodiment of the disclosure.
In the embodiment of the application, there is no need to train multiple models on each group of service data for cross-validation as in the stacking method. Because overall training is performed again end to end in the subsequent steps, the overfitting problem caused by layered training is avoided.
In some embodiments, the plurality of initial recognition models may each be a base recognition model that is not pre-trained, or the plurality of initial recognition models may also be models that are pre-trained on the basis of the base recognition model by the plurality of sets of sample traffic data.
Optionally, the basic recognition model may be a Gradient Boosting (GB) tree model, a Gradient Boosting Decision Tree (GBDT) model, an eXtreme Gradient Boosting (XGBoost) model, a Light Gradient Boosting Machine (LightGBM) model, or the like, and the embodiment of the present application does not specifically limit the model structure of the basic recognition model.
In some embodiments, the obtaining of any of the plurality of initial recognition models comprises: and performing parameter adjustment on the basic identification model based on any group of sample service data of the sample account to obtain any initial identification model, wherein the any initial identification model corresponds to a service reference dimension to which the any group of sample service data belongs.
In other words, starting from basic recognition models of the same structure, the server pre-trains one basic recognition model on each group of sample service data to obtain one initial recognition model, so that each initial recognition model corresponds to the service reference dimension of the group of sample service data it was trained on. This completes the initialization of the initial recognition models.
In some embodiments, the initial fusion model may be a base fusion model that is not pre-trained, or the initial fusion model may be a model that is pre-trained on the basis of the base fusion model through the plurality of first sample recognition results.
Optionally, the basic fusion model may be a logistic regression model, a least squares model, or the like, and the embodiment of the present application does not specifically limit the model structure of the basic fusion model.
In some embodiments, the obtaining of the initial fusion model comprises: respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results; and adjusting parameters of the basic fusion model based on the plurality of first sample identification results to obtain the initial fusion model.
In the above process, after each initial recognition model is obtained through pre-training, each group of sample service data is input into each corresponding initial recognition model again, and each first sample recognition result output by each initial recognition model is integrated to construct a new training set, wherein the training set is used for training the basic fusion model to finally obtain the initial fusion model, that is, the initialization work of the initial fusion model is completed.
303. In response to the iteration meeting the convergence condition, the server obtains a plurality of account identification models and a fusion model with adjusted parameters, wherein each account identification model is used for obtaining a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used for obtaining a predicted account type based on a plurality of groups of service data.
In some embodiments, the convergence condition is that the difference between the loss function values of the current iteration and the last iteration is smaller than a convergence threshold, where the convergence threshold is a value greater than or equal to 0.
In some embodiments, step 302 is performed iteratively, and one pass of training all the initial recognition models and the initial fusion model is referred to as one iteration. In each iteration, the loss function value obtained when the initial fusion model is trained in the current iteration and the loss function value obtained when it was trained in the previous iteration are acquired, and the difference between the two is computed. If the difference meets the convergence condition, training stops; the multiple initial recognition models of the current iteration are determined as the multiple account identification models, and the initial fusion model of the current iteration is determined as the fusion model. If the difference does not meet the convergence condition, the next iteration is performed, until the convergence condition is met.
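The alternating scheme of steps 301 to 303 can be sketched as a block-wise update loop. The sketch below makes several assumptions that are not in the application: each recognition model is replaced by a linear scorer `X @ w` (a stand-in for a gradient boosting tree), the fusion model is a logistic regression, plain gradient descent is used, the stop condition is a fixed step count (`inner_steps`), and the convergence test uses the absolute difference of consecutive loss values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def forward(groups, ws, c, b):
    # First sample results: one column per recognition model.
    f = np.column_stack([X @ w for X, w in zip(groups, ws)])
    # Second sample result: output of the fusion model.
    return f, sigmoid(f @ c + b)

def train_end_to_end(groups, y, max_outer=50, inner_steps=20, lr=0.1, tol=1e-6):
    ws = [np.zeros(X.shape[1]) for X in groups]  # recognition model parameters
    c, b = np.ones(len(groups)), 0.0             # fusion model parameters
    prev, history = None, []
    for _ in range(max_outer):
        # Adjust one recognition model at a time; all other parameters stay fixed.
        for i, X in enumerate(groups):
            for _ in range(inner_steps):
                _, p = forward(groups, ws, c, b)
                ws[i] = ws[i] - lr * c[i] * (X.T @ (p - y)) / len(y)
        # Then adjust the fusion model with every recognition model fixed.
        for _ in range(inner_steps):
            f, p = forward(groups, ws, c, b)
            c = c - lr * (f.T @ (p - y)) / len(y)
            b = b - lr * float(np.mean(p - y))
        _, p = forward(groups, ws, c, b)
        loss = log_loss(p, y)
        history.append(loss)
        # Convergence: loss barely changed since the previous iteration.
        if prev is not None and abs(prev - loss) < tol:
            break
        prev = loss
    return ws, c, b, history
```

Every update uses the gradient of the overall loss on the fused output, so no model is fitted against its own separate loss.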
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, multiple initial recognition models and an initial fusion model are jointly trained using the multiple groups of sample service data of the sample account under multiple service reference dimensions together with the labeled account type. While the parameters of any one model are adjusted, the parameters of the other models are held fixed. A single model is therefore not trained in isolation, layer by layer; instead, training is end to end: each model is trained not only on its own identification result but jointly with the identification results of the other models. This can greatly improve the identification accuracy of the trained account identification models and fusion model, and hence the accuracy of account type identification.
Fig. 4 is a flowchart of a data processing method according to an embodiment of the present application. Referring to Fig. 4, the embodiment is applied to a computer device; the following description takes a server as an example of the computer device. The embodiment includes the following steps:
401. The server obtains a plurality of groups of sample service data and the account type of a sample account, wherein different groups of sample service data correspond to different service reference dimensions.
Step 401 is similar to step 301, and is not described herein.
402. The server inputs the multiple groups of sample service data into multiple initial identification models respectively to obtain multiple first sample identification results of the sample account.
In some embodiments, for any group of sample service data of the multiple groups of sample service data, in the current iteration process, the server determines a service reference dimension corresponding to the any group of service data; determining an initial identification model corresponding to the service reference dimension based on the mapping relation between the service reference dimension and the initial identification model; and inputting any group of sample service data into the initial identification model, and processing any group of sample service data through the initial identification model to obtain a first sample identification result corresponding to any group of sample service data. The above operation is performed on each group of sample service data, and a plurality of first sample identification results can be obtained.
Optionally, the process of acquiring any initial recognition model in the multiple initial recognition models is similar to the process of acquiring any initial recognition model in step 302, and is not described herein again.
403. The server inputs the plurality of first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account.
In some embodiments, the server inputs the plurality of first sample recognition results into an initial fusion model, and weights the plurality of first sample recognition results through the initial fusion model to obtain a plurality of sample weighted recognition results; and performing linear mapping on the sum value of the weighted identification results of the plurality of samples to obtain the second sample identification result.
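The weighting and mapping step can be illustrated with a few lines. This sketch assumes the fusion model is a logistic regression, so the mapping applied to the weighted sum is a sigmoid; the function names and arguments are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable form for both large positive and large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fuse(first_results, weights, bias):
    """Weight each first sample result, sum the weighted results, then map to (0, 1)."""
    weighted = [c * f for c, f in zip(weights, first_results)]
    return sigmoid(sum(weighted) + bias)
```

For instance, `fuse([1.0, -1.0], [2.0, 2.0], 0.0)` gives 0.5, because the two weighted results cancel.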
Optionally, the process of acquiring the initial fusion model is similar to the process of acquiring the initial fusion model in step 302, and is not described herein again.
404. The server determines a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type.
The loss function value here is the loss function value of the multiple initial identification models and the initial fusion model taken as a whole. That is, in each iteration, the first sample identification result output by each initial identification model does not directly enter a loss function of that single model; instead, it is combined with the first sample identification results output by the other initial identification models and the second sample identification result output by the initial fusion model to obtain one overall loss function value.
For example, assume that each initial recognition model is a gradient boosting tree model and the initial fusion model is a logistic regression model, that the current iteration has reached the i-th group of sample service data, that the first sample recognition result output by the i-th gradient boosting tree model for a certain sample account is x, that the account type (i.e., the actual classification label) of the sample account is y, and that the loss function adopted by the gradient boosting tree model is L. If the gradient boosting tree model were trained separately in the stacking manner, the loss function value L(x, y) would need to be obtained separately. In the end-to-end training method of the embodiment of the present application, however, the loss function value L(x, y) is not needed; instead, the second sample identification result p output by the logistic regression model is determined first, as shown in the following formula:
$$p = \frac{1}{1 + e^{-\left(\sum_{i=1}^{n} c_i f_i + b\right)}}$$
where $c_i$ denotes the weight parameter of the i-th gradient boosting tree model, $f_i$ denotes the first sample recognition result output by the i-th gradient boosting tree model, i is greater than or equal to 1 and less than or equal to n, n is the number of gradient boosting tree models and is an integer greater than or equal to 1, and b is the intercept (i.e., the bias parameter) of the logistic regression model.
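Since the parameters of the other models are held fixed while the i-th model is trained, let C collect all terms that are constant with respect to x (the weighted outputs of the other models and the intercept b). The above equation can then be simplified as:
$$p = \frac{1}{1 + e^{-(c_i x + C)}}$$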
Therefore, when the gradient boosting tree model is trained, the loss function value L(x, y) of the individual model need not be referenced; instead, the loss function value L(p, y) of the overall model is referenced, so that the current gradient boosting tree model is trained within the overall model framework. Assuming the loss function used is the log-likelihood loss, the loss function value is expressed as follows:
$$L = -\left(y \log p + (1 - y)\log(1 - p)\right)$$
405. In response to the stop condition not being met, the server adjusts the parameters of one of the plurality of initial recognition models until the stop condition is met, and then adjusts the parameters of the next initial recognition model.
Alternatively, the stop condition may be that the loss function value is less than a loss threshold, which may be any value greater than or equal to 0, or the stop condition may be that the number of iterations is greater than a number threshold, which may be any integer greater than or equal to 1.
In some embodiments, in response to the stop condition not being met, for any one of the plurality of initial recognition models, the server adjusts the parameters of that initial recognition model while keeping the parameters of the other initial recognition models and of the initial fusion model unchanged. When the stop condition is met, the adjustment of that initial recognition model is complete, and the server then adjusts the parameters of the next initial recognition model.
In other words, the server traverses each group of sample service data, when training the initial recognition model corresponding to each group of sample service data, the parameters of other initial recognition models corresponding to other groups of sample service data are fixed and the parameters of the initial fusion model are fixed, the current initial recognition model is trained again in the overall model frame until the current initial recognition model is trained completely, the next initial recognition model is trained continuously, the above steps are repeatedly executed until all the initial recognition models are trained, and then the following step 406 is executed to train the initial fusion model.
406. In response to the parameters of all of the plurality of initial recognition models having been adjusted, the server adjusts the parameters of the initial fusion model.
In some embodiments, the server starts adjusting the parameters of the initial fusion model only when it is ensured that all the parameters of the initial recognition model are adjusted. Next, the parameters of the adjusted initial recognition models are kept unchanged, the parameters of the initial fusion model are adjusted until the adjusted parameters meet the stopping condition, the loss function value of the current iteration process during the training of the initial fusion model is obtained, a difference value is calculated with the loss function value of the last iteration process during the training of the initial fusion model, and when the difference value meets the convergence condition, the following step 407 is executed.
In other words, when the server trains the initial fusion model, the parameters of all the initial recognition models are fixed and unchanged, the initial fusion model of this time is trained again in the whole model framework until the stopping condition is met, the loss function value of the iteration process of this time when the initial fusion model is trained is obtained, the difference value is calculated with the loss function value of the iteration process of the last time when the initial fusion model is trained, and when the difference value meets the convergence condition, the following step 407 is executed.
In the step 402-.
407. In response to the iteration meeting the convergence condition, the server obtains a plurality of account identification models and a fusion model with adjusted parameters, wherein each account identification model is used for obtaining a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used for obtaining a predicted account type based on a plurality of groups of service data.
In some embodiments, the convergence condition is that the difference between the loss function values of the current iteration and the last iteration is smaller than a convergence threshold, where the convergence threshold is a value greater than or equal to 0.
Step 407 is similar to step 303, and is not described herein.
In some embodiments, the parameters that need to be input in the training process include the first and second derivatives of the loss function value with respect to the model output (i.e., the first sample recognition result x). Their expressions are as follows:
$$\frac{\partial L}{\partial x} = c_i\,(p - y)$$
$$\frac{\partial^2 L}{\partial x^2} = c_i^2\, p\,(1 - p)$$
in the iterative process, the first derivative and the second derivative are input by using the formula, so that the normal operation of the model iterative process can be ensured, an end-to-end training mode is realized, and the account identification model and the fusion model are trained.
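The derivatives can be checked numerically. The sketch below assumes the fused output has the form p = sigmoid(c_i * x + C) with log-likelihood loss, in which case the chain rule gives a first derivative of c_i * (p - y) and a second derivative of c_i^2 * p * (1 - p). The function names are hypothetical. A (gradient, Hessian) pair of this kind is what gradient boosting libraries such as XGBoost and LightGBM expect from a user-defined objective.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def overall_loss(x, y, c_i, C):
    """Log-likelihood loss of the fused output, as a function of the i-th model's output x."""
    p = sigmoid(c_i * x + C)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_hess(x, y, c_i, C):
    """First and second derivatives of the overall loss with respect to x."""
    p = sigmoid(c_i * x + C)
    grad = c_i * (p - y)
    hess = c_i ** 2 * p * (1 - p)
    return grad, hess
```

The Hessian is non-negative for any weight c_i, so the overall loss remains convex in the output of the model being trained.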
In the embodiment of the application, the server obtains the plurality of account identification models and the fusion model by adjusting the parameters of the plurality of initial identification models and the initial fusion model. When the parameters of each model are adjusted, the parameters of the other models in the overall framework are kept unchanged, and the calculation uses the final loss function of the overall model rather than the loss function of the individual model. This prevents the training of a single model from being split off from the training of the other models, solves the problem of model overfitting, removes the need to train multiple models for each group of sample service data, and greatly reduces the data volume of the models.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, the multiple initial recognition models and the initial fusion model are jointly trained using multiple groups of sample service data of the sample account under multiple service reference dimensions together with the labeled account type. While the parameters of each model are being adjusted, the parameters of the other models are held fixed, so a single model is not trained in an isolated, layer-by-layer manner. Instead, an end-to-end training mode is provided: each model is trained not only with regard to its own recognition result but also jointly with the recognition results of the other models. This can greatly improve the recognition accuracy of the trained account identification models and fusion model, and thereby the recognition accuracy of the account type.
In an exemplary scenario, an experiment was performed on the task of determining whether a short video account is a high-quality account. With the stacking method, the recall rate on the test set is 46.6% when the precision (translated elsewhere as accuracy rate) is 30%; with the end-to-end training mode provided by the embodiment of the present application, the recall rate rises to 50.9% at the same 30% precision.
Fig. 5 is a flowchart of an account type identification method according to an embodiment of the present application. Referring to fig. 5, the embodiment is applied to a computer device. The following description takes a server as an example of the computer device, and the embodiment includes:
501. The server acquires a plurality of groups of service data of the target account, and different groups of service data correspond to different service reference dimensions.
The target account may be any account registered on the platform provided by the server, and there may be one or more target accounts. The embodiment of the present application describes only the account type identification process for a single target account as an example, which does not limit the number of target accounts. When there are multiple target accounts, an account type identification process similar to that of the single target account may be performed for each of them.
Optionally, depending on the services provided by the platform on which the account is registered, the target account may be a short video account, a social media account, an uploader ("UP") account (video uploader), an official (public) account, a game account, a comment account, or the like, which is not specifically limited in this embodiment of the present application.
In some embodiments, when acquiring the multiple sets of service data, the server may first acquire multiple pieces of original service data of the target account, determine the service reference dimension to which each piece of service data belongs, and divide the pieces of service data into the multiple sets of service data based on the multiple service reference dimensions, for example, by placing all pieces of service data that belong to the same service reference dimension into the same set.
It should be noted that a service reference dimension is an evaluation dimension that needs to be considered when account types are divided, and different service reference dimensions can be set for different account types to be divided. For example, when determining whether a target account is a high-quality account, the service reference dimensions to be considered include, but are not limited to: account activity, account influence, account consumption, account interaction force, and the like. As another example, when determining whether the target account is a highly active account, the service reference dimensions to be considered may include only account activity, account interaction force, and the like, while account consumption and account influence need not be considered. As a further example, if the target account is a short video account, a video playing completion rate may be introduced as a service reference dimension, and if the target account is a social media account, the popularity of the account homepage may be introduced as a service reference dimension, and so on.
In some embodiments, taking the case where the target account is a short video account as an example, the service reference dimensions include at least two of the following: account activity, account influence, account consumption, account interaction force, number of associated accounts, video playing completion rate, and the like.
Optionally, the multiple pieces of service data of the target account include: the daily play count of the account, the weekly play count of the account, the monthly play count of the account, the cumulative play duration of video works uploaded by the account, the cumulative click count of videos uploaded by the account, the cumulative comment count of video works uploaded by the account, the cumulative share count of video works uploaded by the account, the cumulative total consumption amount of the account, the consumption amount of the account's latest order, the cumulative comment count across all of the account's posts (dynamic updates), the comment count of the account's latest post, the fan count of the account, the follow count of the account, the playing completion rate of all videos the account has cumulatively clicked, the playing completion rate of videos the account clicked within 7 days, and the like.
When the service data are grouped on this basis, the service reference dimension to which each piece of service data belongs needs to be determined, that is: the daily, weekly, and monthly play counts of the account belong to account activity; the cumulative play duration, cumulative click count, cumulative comment count, and cumulative share count of the video works uploaded by the account belong to account influence; the cumulative total consumption amount of the account and the consumption amount of the account's latest order belong to account consumption; the cumulative comment count across all of the account's posts and the comment count of the account's latest post belong to account interaction force; the fan count and the follow count of the account belong to the number of associated accounts; and the playing completion rate of all videos the account has cumulatively clicked and the playing completion rate of videos the account clicked within 7 days belong to the video playing completion rate.
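The grouping step for a short video account can be sketched as a lookup from each piece of service data to its service reference dimension. This is a minimal sketch: the field names, dimension keys, and the decision to skip unmapped fields are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical field names mapped to the service reference dimensions
# described in the text (only a subset is listed).
DIMENSION_OF = {
    "daily_play_count": "account_activity",
    "weekly_play_count": "account_activity",
    "monthly_play_count": "account_activity",
    "uploaded_video_clicks": "account_influence",
    "uploaded_video_shares": "account_influence",
    "total_spend": "account_consumption",
    "latest_order_spend": "account_consumption",
    "latest_post_comments": "account_interaction",
    "fan_count": "associated_accounts",
    "completion_rate_7d": "video_completion_rate",
}


def group_by_dimension(raw, dimension_of=DIMENSION_OF):
    """Split raw service data into one group per service reference dimension."""
    groups = defaultdict(dict)
    for field, value in raw.items():
        dimension = dimension_of.get(field)
        if dimension is None:
            continue  # assumption: fields without a known dimension are skipped
        groups[dimension][field] = value
    return dict(groups)
```

Each resulting group can then be passed to the model responsible for that dimension, so adding a new dimension only requires extending the mapping.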
In some embodiments, taking the case where the target account is a social media account as an example, the service reference dimensions include at least two of the following: account activity, account influence, account consumption, account interaction force, number of associated accounts, account homepage popularity, and the like.
Optionally, the multiple pieces of service data of the target account include: the daily active duration of the account, the weekly active duration of the account, the monthly active duration of the account, the cumulative click count of posts (dynamic updates) published by the account, the cumulative comment count of posts published by the account, the cumulative share count of posts published by the account, the cumulative total consumption amount of the account, the consumption amount of the account's latest order, the membership level of the account, the cumulative comment count across all of the account's posts, the comment count of the account's latest post, the read count of the account's latest post, the fan count of the account, the follow count of the account, the daily visits to the account homepage, the weekly visits to the account homepage, the monthly visits to the account homepage, and the like.
When the service data are grouped on this basis, the service reference dimension to which each piece of service data belongs needs to be determined, that is: the daily, weekly, and monthly active durations of the account belong to account activity; the cumulative click count, cumulative comment count, and cumulative share count of posts published by the account belong to account influence; the cumulative total consumption amount of the account, the consumption amount of the account's latest order, and the membership level of the account belong to account consumption; the cumulative comment count across all of the account's posts, the comment count of the account's latest post, and the read count of the account's latest post belong to account interaction force; the fan count and the follow count of the account belong to the number of associated accounts; and the daily, weekly, and monthly visits to the account homepage belong to account homepage popularity.
In some embodiments, a user may log in to the target account on a terminal and initiate various service requests to the server based on the target account. When providing the services corresponding to these requests to the target account, the server records the service data of the target account and uses various statistical analysis tools to group the service data by service reference dimension, obtaining the multiple groups of service data of the target account.
In other embodiments, the terminal locally records each service data of the target account, and uses various statistical analysis tools to group each service data according to a service reference dimension to obtain multiple groups of service data of the target account, and sends the multiple groups of service data of the target account to the server to request the server to identify the account type of the target account based on the multiple groups of service data of the target account.
Optionally, according to settings made by a technician, target accounts may be divided into different account types. For example, target accounts are divided into high-quality accounts and non-high-quality accounts, or into head (top-tier), waist (mid-tier), and bottom (low-tier) accounts, or into high-credit, medium-credit, and low-credit accounts. Accurately dividing account types can guide downstream tasks such as personalized resource recommendation for accounts of different account types.
The embodiment of the present application takes only the identification of whether the target account is a high-quality account as an example, which should not be construed as limiting the account type of the target account. A high-quality account refers to a core or premium account of the platform and needs to be comprehensively evaluated across multiple service reference dimensions such as account activity, account consumption, and account interaction force.
502. The server obtains a plurality of first identification results of the target account based on the plurality of sets of service data, wherein the first identification results are predicted account types determined based on the single set of service data.
In some embodiments, the server inputs the multiple sets of service data into multiple account identification models respectively, and processes the multiple sets of service data through the multiple account identification models respectively to obtain the multiple first identification results, where the account identification model is used to obtain a predicted account type based on a single set of service data of a corresponding service reference dimension.
Optionally, after acquiring the multiple sets of service data of the target account, the server performs the following for any set of service data: it determines the service reference dimension corresponding to that set; based on the mapping relationship between service reference dimensions and account identification models, it determines, from the plurality of locally pre-stored account identification models, the account identification model corresponding to that service reference dimension, where the account identification model is used to obtain a predicted account type based on a single set of service data of that dimension; and it inputs the set of service data into the corresponding account identification model and processes it through the model to obtain the first identification result corresponding to that set. Performing these steps on each set of service data of the target account yields the plurality of first identification results.
In the process, one account identification model is trained for each service reference dimension, and the service data corresponding to the service reference dimension is input into the corresponding account identification model, so that the service data of different service reference dimensions can be specifically processed by using different account identification models, the influence of the service data of other service reference dimensions can be eliminated from the prediction result, and the account type of the target account can be accurately predicted in a single service reference dimension.
In some embodiments, the server may pre-store a mapping relationship between service reference dimensions and account identification models, where each service reference dimension corresponds to a unique account identification model. The server then determines, according to the service reference dimension to which each input set of service data belongs, the account identification model mapped to that dimension. For example, the server pre-stores three account identification models A, B, and C, whose corresponding service reference dimensions are account activity, account consumption, and account interaction, respectively. The input service data of the target account include: an online duration of 10 hours in the last 7 days, a cumulative consumption of 10,000 yuan, and 100 comments posted in the last 7 days. Since the service reference dimension corresponding to the cumulative consumption of 10,000 yuan is account consumption, this piece of service data is input into account identification model B, which corresponds to account consumption.
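The A/B/C example above can be sketched as a dictionary that maps each service reference dimension to its model. The three scoring functions below are trivial placeholders standing in for trained account identification models, and every name and scaling constant is hypothetical.

```python
def model_a(data):
    # Placeholder for model A (account activity); not a trained model.
    return min(1.0, data["online_hours_7d"] / 40.0)


def model_b(data):
    # Placeholder for model B (account consumption); not a trained model.
    return min(1.0, data["total_spend_yuan"] / 20000.0)


def model_c(data):
    # Placeholder for model C (account interaction); not a trained model.
    return min(1.0, data["comments_7d"] / 100.0)


MODEL_FOR_DIMENSION = {
    "account_activity": model_a,
    "account_consumption": model_b,
    "account_interaction": model_c,
}


def first_results(groups, model_for_dimension=MODEL_FOR_DIMENSION):
    """Send each group of service data to the model mapped to its dimension."""
    results = {}
    for dimension, data in groups.items():
        if dimension not in model_for_dimension:
            raise KeyError(f"no account identification model for {dimension!r}")
        results[dimension] = model_for_dimension[dimension](data)
    return results
```

Raising on an unmapped dimension is one possible choice; it reflects the one-to-one mapping described in the text, under which every input group should have exactly one model.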
Optionally, the account identification model may be any classification model, for example, a Gradient Boosting (GB) model, a Gradient Boosting Decision Tree (GBDT) model, an eXtreme Gradient Boosting (XGBoost) model, a Light Gradient Boosting Machine (LightGBM) model, and the like. The embodiment of the present application does not specifically limit the model structure of the account identification model.
In an exemplary embodiment, it is assumed that the account identification model is used to identify whether an input account is a high-quality account, where a high-quality account refers to a core or premium account of the platform and needs to be comprehensively evaluated across multiple dimensions such as account activity, account consumption, and account interaction force. Taking the case where the account identification model is an XGBoost model as an example, the XGBoost model is a strong learner integrated from a plurality of weak learners, where a weak learner may be a CART (Classification And Regression Tree) or a linear classifier (gblinear), which is not specifically limited in this embodiment of the present application. Ensemble learning of this kind can reduce variance and bias and improve prediction performance; its main approaches include boosting, bagging, and stacking, and XGBoost belongs to the boosting family.
For the XGBoost model, the server inputs the corresponding group of service data into the XGBoost model, that is, into each of its weak learners. Each weak learner performs feature splitting on the group of service data to reach a leaf node of its decision tree and outputs the corresponding leaf node score. The server then weights and combines the leaf node scores output by the weak learners to obtain a prediction probability (i.e., the first recognition result), which represents the likelihood, determined from this group of service data, that the target account is a high-quality account. Optionally, the prediction probability is a value greater than or equal to 0 and less than or equal to 1: the larger the prediction probability, the more likely the XGBoost model considers the target account to be a high-quality account, and the smaller the prediction probability, the less likely. Optionally, each decision tree may be a binary tree, that is, each weak learner splits a node into a left subtree and a right subtree when performing feature splitting.
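The leaf-score mechanism can be illustrated with a hand-written toy ensemble. This is a sketch under stated assumptions: the trees are hard-coded dictionaries rather than trained learners, and the leaf scores are combined by a plain sum followed by a logistic function, which is the usual convention for gradient-boosted binary classifiers but is not spelled out in the text. All names are hypothetical.

```python
import math


def leaf_score(tree, features):
    """Walk one binary decision tree and return the score of the leaf reached."""
    node = tree
    while "leaf" not in node:
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


def predict_probability(trees, features, base_score=0.0):
    """Combine the leaf scores of all trees into a probability in (0, 1)."""
    margin = base_score + sum(leaf_score(tree, features) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# Toy ensemble for the "account activity" group (values are made up).
TOY_TREES = [
    {"feature": "daily_play_count", "threshold": 10,
     "left": {"leaf": -0.4}, "right": {"leaf": 0.6}},
    {"feature": "weekly_play_count", "threshold": 50,
     "left": {"leaf": -0.2}, "right": {"leaf": 0.3}},
]
```

Calling `predict_probability(TOY_TREES, {"daily_play_count": 12, "weekly_play_count": 30})` routes the sample to the right leaf of the first tree and the left leaf of the second, then maps the summed score to a probability.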
503. The server obtains a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of groups of service data.
In some embodiments, the server inputs the plurality of first recognition results into a fusion model, and weights the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for obtaining a predicted account type based on a plurality of groups of service data; and performing linear mapping on the sum value of the weighted recognition results to obtain the second recognition result.
In the process, the linear weighted fusion is performed on each first recognition result, so that the second recognition result can comprehensively refer to respective values of the plurality of first recognition results, and the importance degree of each group of service data in the second recognition result can be dynamically adjusted by adjusting the weight of each first recognition result, so that the second recognition result has higher accuracy.
Optionally, the fusion model may be a Logistic Regression (LR) model. The LR model assigns a respective weight parameter to each account identification model; the first identification result output by each account identification model is multiplied by its corresponding weight parameter to obtain a plurality of weighted identification results, the weighted identification results are added to obtain a sum, and the sum is subjected to linear mapping to obtain the second identification result.
In one possible implementation, assume that c_i denotes the weight parameter of the i-th account identification model and f_i denotes the first recognition result output by the i-th account identification model, where i is an integer greater than or equal to 1 and less than or equal to n, n is the number of account identification models and is an integer greater than or equal to 1, and b is the intercept (i.e., bias parameter) of the logistic regression model. The second recognition result p output by the logistic regression model can then be expressed as:
Figure BDA0003069825480000241
Letting C collect all constant terms, the above equation can be simplified as:
Figure BDA0003069825480000251
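The weighted fusion can be sketched as follows. The sketch assumes the standard logistic regression form p = 1 / (1 + exp(-(c_1 f_1 + ... + c_n f_n + b))); the text describes the final step only as a mapping of the weighted sum, so the sigmoid, the function name, and the argument layout are assumptions for illustration and are not the expressions shown in the figures.

```python
import math


def fuse(first_results, weights, bias):
    """Weight each first recognition result, sum them, add the bias, and
    map the sum to a probability with the logistic function (assumed)."""
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    z = sum(c * f for c, f in zip(weights, first_results)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With this form, raising the weight of one account identification model increases the influence of its first recognition result on the fused probability, which matches the role the text gives to the weight parameters.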
In some embodiments, the fusion model may be a least squares model instead of the LR model. Both the LR model and the least squares model can perform linear weighted fusion on the plurality of first recognition results, so that the final second recognition result integrates the first recognition result of each account identification model and achieves a global account type prediction.
Fig. 6 is a schematic flow chart of an account type identification method provided in the embodiment of the present application. As shown in 600, assume that the server collects n sets of service data of a target account (n is greater than or equal to 1). The n sets of service data are input into the n account identification models that respectively correspond to them, for example, each account identification model being a gradient boosting tree model, to obtain n first identification results, and the n first identification results are input into the fusion model to obtain the second identification result.
504. The server determines a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
In some embodiments, the server determines that the predicted account type of the target account is a high quality account in response to the second recognition result being greater than a first target threshold. The first target threshold is any value greater than or equal to 0 and less than or equal to 1.
In some embodiments, the server determines that the predicted account type of the target account is a high-quality account in response to that a target recognition result in the plurality of first recognition results is greater than a second target threshold, where the target recognition result refers to a first recognition result corresponding to a decisive business data in the plurality of sets of business data, and the decisive business data may be specified by a technician, for example, a consumption situation of the account belongs to the decisive business data. The second target threshold is any value greater than or equal to 0 and less than or equal to 1.
In some embodiments, the server determines that the predicted account type of the target account is a high quality account in response to a number of first recognition results of the plurality of first recognition results that are greater than a third target threshold being greater than a number threshold. The third target threshold is any value greater than or equal to 0 and less than or equal to 1, and the number threshold is any value greater than or equal to 1.
In some embodiments, when at least one of the above three conditions is satisfied, the server may determine that the predicted account type of the target account is a high-quality account. The manner of determining the account type of the target account is not specifically limited in this embodiment.
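The three conditions and their combination can be sketched as a single decision function. The parameter names are hypothetical, the thresholds are supplied by the caller, and treating a missing decisive dimension as a score of 0.0 is an assumption of this sketch.

```python
def is_high_quality(first_results, second_result, *, first_threshold,
                    decisive_dimension, second_threshold,
                    third_threshold, count_threshold):
    """Return True if at least one of the three conditions holds.

    first_results maps each service reference dimension to its first
    recognition result; second_result is the fused recognition result.
    """
    # Condition 1: the fused (second) result exceeds the first target threshold.
    fused_ok = second_result > first_threshold
    # Condition 2: the result for the decisive service data exceeds the
    # second target threshold (missing dimension treated as 0.0 here).
    decisive_ok = first_results.get(decisive_dimension, 0.0) > second_threshold
    # Condition 3: enough first results exceed the third target threshold.
    count = sum(1 for value in first_results.values() if value > third_threshold)
    count_ok = count > count_threshold
    return fused_ok or decisive_ok or count_ok
```

A deployment could just as well require several conditions at once; the `or` here reflects only the "at least one of the three conditions" variant described above.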
Fig. 7 is a schematic diagram of an account type identification method according to an embodiment of the present application. As shown in 700, assuming that the target account is a short video account, the server analyzes the service requests of the target account with various statistical analysis tools to obtain multiple pieces of service data of the target account, and groups them by service reference dimension, for example into groups for account activity, account consumption, account interaction force, and so on. Each group of service data is input into the corresponding account identification model (a gradient boosting tree model is taken as an example) to obtain the respective first recognition result, and the first recognition results are input into the fusion model to obtain the second recognition result, which is the overall predicted probability that the target account is a high-quality account. Based on the first recognition results and the second recognition result, account analysis can be performed to determine the predicted account type to which the target account ultimately belongs, and downstream applications such as recommendation slot allocation, overall ranking of account consumption, and mining or removal of high-quality accounts can be carried out based on the predicted account type. For example, short videos published by high-quality accounts are preferentially recommended to other accounts to increase the exposure of high-quality accounts and thereby improve metrics such as user duration and retention; as another example, the characteristics of high-quality accounts are analyzed and used as reference factors for account analysis and management tools on the platform.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, the original service data are grouped by service reference dimension, a first identification result of the target account is predicted independently from each group of service data, and a second identification result of the target account is predicted by combining the first identification results. When the final predicted account type is determined, the second identification result and the first identification results are considered together, rather than making a single-dimension judgment based only on the second identification result, which can greatly improve the identification accuracy of the account type.
Furthermore, a separate account identification model is built for the service data of each service reference dimension and is used to obtain the first identification result of the corresponding group of service data; a fusion model is built over the first identification results and is used to obtain the second identification result. The first and second identification results can thus be obtained automatically through machine learning models, which improves data processing efficiency.
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. Referring to fig. 8, the apparatus includes:
a first obtaining module 801, configured to obtain multiple sets of sample service data and account types of a sample account, where different sets of sample service data correspond to different service reference dimensions;
an adjusting module 802, configured to adjust parameters of multiple initial identification models and an initial fusion model based on the multiple sets of sample service data and the account types, where the parameters of the other models are kept unchanged when the parameters of any initial identification model or of the initial fusion model are adjusted;
a second obtaining module 803, configured to obtain, in response to that the iteration meets the convergence condition, a plurality of account identification models and a fusion model after parameter adjustment, where the account identification model is used to obtain a predicted account type based on a single set of service data of a corresponding service reference dimension, and the fusion model is used to obtain the predicted account type based on a plurality of sets of service data.
According to the device provided by the embodiment of the application, the multiple initial recognition models and the initial fusion model are jointly trained using multiple groups of sample service data of the sample account under multiple service reference dimensions together with the labeled account type. While the parameters of each model are being adjusted, the parameters of the other models are held fixed, so a single model is not trained in an isolated, layer-by-layer manner. Instead, an end-to-end training mode is provided: each model is trained not only with regard to its own recognition result but also jointly with the recognition results of the other models. This can greatly improve the recognition accuracy of the trained account identification models and fusion model, and thereby the recognition accuracy of the account type.
In one possible implementation, the adjusting module 802 is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results of the sample account;
inputting the plurality of first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account;
determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type;
in response to the stop condition not being met, adjusting the parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting the parameters of the next initial recognition model;
and adjusting the parameters of the initial fusion model in response to completion of the adjustment of the parameters of the plurality of initial recognition models.
In one possible implementation, the adjusting module 802 is further configured to:
and adjusting parameters of the basic identification model based on any group of sample service data of the sample account to obtain any initial identification model, wherein any initial identification model corresponds to a service reference dimension to which any group of sample service data belongs.
In one possible implementation, the adjusting module 802 is further configured to:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results;
and adjusting parameters of the basic fusion model based on the plurality of first sample identification results to obtain the initial fusion model.
In one possible embodiment, the convergence condition is that the difference between the loss function values of the current iteration and the last iteration is smaller than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that, when the data processing apparatus provided in the above embodiment processes service data, the division into the above functional modules is only an example. In practical applications, the functions can be allocated to different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the data processing apparatus and the data processing method provided in the foregoing embodiments belong to the same concept; the specific implementation process is described in detail in the data processing method embodiments and is not described herein again.
Fig. 9 is a schematic structural diagram of an apparatus for identifying an account type according to an embodiment of the present application. Referring to fig. 9, the apparatus includes:
a first obtaining module 901, configured to obtain multiple sets of service data of a target account, where different sets of service data correspond to different service reference dimensions;
a second obtaining module 902, configured to obtain, based on the multiple sets of service data, multiple first identification results of the target account, where the first identification results are predicted account types determined based on a single set of the service data;
a third obtaining module 903, configured to obtain a second identification result of the target account based on multiple first identification results of the target account, where the second identification result is a predicted account type determined based on the multiple sets of service data;
a determining module 904, configured to determine a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
With the apparatus provided in this embodiment of the application, the original service data is grouped by service reference dimension, a first identification result of the target account is predicted independently from each group of service data, and a second identification result of the target account is predicted by combining the first identification results. When the final predicted account type is determined, the second identification result and the first identification results are considered together, rather than relying on the second identification result alone, which can greatly improve the accuracy of account type identification.
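One way the determining module 904 could combine the two kinds of results is sketched below. The combination rule (a weighted average of the second identification result and the mean of the first identification results, compared against a threshold), the default weight, the threshold, and the two type labels are all assumptions made for illustration; this passage does not prescribe a specific rule.

```python
def determine_account_type(first_results, second_result,
                           second_weight=0.5, threshold=0.5,
                           labels=("negative", "positive")):
    # first_results: per-dimension scores in [0, 1] (first identification results)
    # second_result: fused score in [0, 1] (second identification result)
    if not first_results:
        raise ValueError("at least one first identification result is required")
    mean_first = sum(first_results) / len(first_results)
    # Hypothetical rule: blend the fused score with the per-dimension average.
    score = second_weight * second_result + (1.0 - second_weight) * mean_first
    label = labels[1] if score >= threshold else labels[0]
    return label, score
```

With this rule, a single confident per-dimension result cannot decide the outcome on its own, and neither can the fused result; both contribute to the final score.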
In a possible implementation, the second obtaining module 902 is configured to:
for any group of service data in the multiple groups of service data, determining a service reference dimension corresponding to the any group of service data;
determining an account identification model corresponding to a business reference dimension based on a mapping relation between the business reference dimension and the account identification model, wherein the account identification model is used for acquiring a predicted account type based on single group of business data of the business reference dimension;
and inputting any group of business data into the account number identification model, and processing any group of business data through the account number identification model to obtain a first identification result corresponding to any group of business data.
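The lookup of an account identification model by service reference dimension can be sketched with a plain dictionary. The dimension keys and the toy models in the usage note are hypothetical placeholders; in practice each entry would be a trained account identification model.

```python
from typing import Callable, Dict, Sequence

# A model maps one group of service data to a score (first identification result).
Model = Callable[[Sequence[float]], float]

def get_first_results(groups: Dict[str, Sequence[float]],
                      models: Dict[str, Model]) -> Dict[str, float]:
    results = {}
    for dimension, data in groups.items():
        # Mapping relation: service reference dimension -> identification model.
        model = models.get(dimension)
        if model is None:
            raise KeyError(f"no account identification model for {dimension!r}")
        results[dimension] = model(data)
    return results
```

For example, with `models = {"liveness": lambda d: min(1.0, sum(d) / len(d) / 10), "consumption": lambda d: d[0]}`, calling `get_first_results({"liveness": [5.0, 15.0], "consumption": [0.2]}, models)` yields one score per dimension.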
In one possible implementation, the third obtaining module 903 is configured to:
inputting the plurality of first recognition results into a fusion model, weighting the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on a plurality of groups of service data;
and performing linear mapping on the sum value of the weighted recognition results to obtain the second recognition result.
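The two fusion steps (weighting each first recognition result, then linearly mapping the sum of the weighted results) can be written compactly. In the sketch below the weights, `scale`, and `offset` stand for learned fusion-model parameters; their names and the form `scale * sum + offset` of the linear mapping are assumptions.

```python
def fuse(first_results, weights, scale=1.0, offset=0.0):
    # Step 1: weight each first recognition result.
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    weighted = [w * r for w, r in zip(weights, first_results)]
    # Step 2: linear mapping of the sum of the weighted results.
    second_result = scale * sum(weighted) + offset
    return second_result, weighted
```

For instance, `fuse([0.5, 1.0], [0.5, 0.25], scale=2.0, offset=0.1)` weights the inputs to `[0.25, 0.25]`, sums them to 0.5, and maps the sum to 1.1.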
In one possible implementation, the first obtaining module 901 is configured to:
determining service reference dimensions to which a plurality of service data of the target account belong;
and dividing all the service data belonging to the same service reference dimension into the same group of service data.
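The grouping step can be sketched as a single pass over the service data. The record layout (pairs of field name and value) and the field-to-dimension table are hypothetical; the embodiment only requires that all service data of the same service reference dimension end up in the same group.

```python
from collections import defaultdict

def group_by_dimension(records, dimension_of):
    # records: iterable of (field_name, value) pairs for one target account
    # dimension_of: dict mapping each field name to its service reference dimension
    groups = defaultdict(list)
    for field_name, value in records:
        groups[dimension_of[field_name]].append(value)
    return dict(groups)
```

A field name missing from `dimension_of` raises `KeyError`, which makes unmapped service data visible instead of silently dropping it.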
In one possible embodiment, the service reference dimension includes at least two of: account liveness, account influence, account consumption, account interaction force, number of associated accounts, and video playing completion rate.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that, when the account type identification apparatus provided in the above embodiment identifies an account type, the division into the above functional modules is used only as an example. In practical applications, the functions can be allocated to different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the account type identification apparatus and the account type identification method provided in the above embodiments belong to the same concept; the specific implementation process of the apparatus is described in detail in the account type identification method embodiments and is not repeated here.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. The following description takes the case where the computer device is a terminal 1000 as an example. Optionally, the device type of the terminal 1000 includes: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Terminal 1000 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
Optionally, the processor 1001 includes one or more processing cores, such as a 4-core processor or an 8-core processor. Optionally, the processor 1001 is implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). In some embodiments, the processor 1001 includes a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 further includes an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
In some embodiments, memory 1002 includes one or more computer-readable storage media, which are optionally non-transitory. Optionally, the memory 1002 also includes high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1002 is used to store at least one program code for execution by the processor 1001 to implement the methods for processing data or identifying account types provided by the various embodiments herein.
In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal lines. Each peripheral can be connected to the peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, display screen 1005, camera assembly 1006, audio circuitry 1007, positioning assembly 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1001, the memory 1002, and the peripheral interface 1003 are implemented on a separate chip or circuit board, which is not limited by this embodiment.
The radio frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Optionally, the radio frequency circuit 1004 communicates with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 further includes NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 1005 is used to display a UI (User Interface). Optionally, the UI includes graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over the surface of the display screen 1005. The touch signal can be input to the processor 1001 as a control signal to be processed. Optionally, the display screen 1005 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there are at least two display screens 1005, each of which is disposed on a different surface of terminal 1000 or in a folded design; in still other embodiments, the display screen 1005 is a flexible display disposed on a curved surface or a folded surface of terminal 1000. Optionally, the display screen 1005 can even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. Optionally, the display screen 1005 is made using LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or similar technology.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 1006 also includes a flash. Optionally, the flash is a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and is used for light compensation under different color temperatures.
In some embodiments, the audio circuit 1007 includes a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones are respectively arranged at different positions of terminal 1000. Optionally, the microphone is an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. Alternatively, the speaker is a conventional membrane speaker, or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to human, but also the electric signal can be converted into a sound wave inaudible to human for use in distance measurement or the like. In some embodiments, the audio circuit 1007 also includes a headphone jack.
The positioning component 1008 is used to determine the current geographic location of terminal 1000 for navigation or LBS (Location Based Service). Optionally, the positioning component 1008 is a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1009 is used to supply power to the various components in terminal 1000. Optionally, the power supply 1009 is an alternating-current source, a direct-current source, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery supports wired charging or wireless charging. The rechargeable battery may also support fast-charging technology.
In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
In some embodiments, the acceleration sensor 1011 detects the magnitude of acceleration on the three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 is used to detect the components of gravitational acceleration on the three coordinate axes. Optionally, the processor 1001 controls the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 is also used to collect motion data of a game or a user.
In some embodiments, the gyro sensor 1012 detects the body direction and rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 cooperate to acquire the 3D motion of the terminal 1000 by the user. The processor 1001 implements the following functions according to the data collected by the gyro sensor 1012: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Alternatively, pressure sensor 1013 is disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Alternatively, fingerprint sensor 1014 is disposed on a front, back, or side of terminal 1000. When a physical key or vendor Logo is provided on terminal 1000, fingerprint sensor 1014 can be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 controls the display brightness of the display screen 1005 according to the intensity of the ambient light collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 also dynamically adjusts the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.
The proximity sensor 1016, also known as a distance sensor, is typically disposed on the front panel of terminal 1000. The proximity sensor 1016 is used to measure the distance between the user and the front of terminal 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 is gradually decreasing, the processor 1001 controls the display screen 1005 to switch from a bright screen state to a dark screen state; when the proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 is gradually increasing, the processor 1001 controls the display screen 1005 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and can include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 1100 may vary considerably in configuration or performance. The computer device 1100 includes one or more processors (Central Processing Units, CPUs) 1101 and one or more memories 1102, where the memory 1102 stores at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1101 to implement the data processing method or the account type identification method provided in the foregoing embodiments. Optionally, the computer device 1100 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the computer device 1100 further includes other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, for example, a memory including at least one computer program, which is executable by a processor in a terminal to perform the data processing method or the account type identification method in the above embodiments, is also provided. For example, the computer-readable storage medium includes a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute the method for processing data or the method for identifying the account type in the above embodiments.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be implemented by hardware, or can be implemented by a program instructing relevant hardware, and optionally, the program is stored in a computer readable storage medium, and optionally, the above mentioned storage medium is a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of processing data, the method comprising:
acquiring a plurality of groups of sample service data and account types of sample accounts, wherein different groups of sample service data correspond to different service reference dimensions;
adjusting parameters of a plurality of initial identification models and initial fusion models based on the plurality of groups of sample service data and the account number types, wherein when the parameters of any initial identification model or the initial fusion model are adjusted, the parameters of other models are kept unchanged;
and in response to the iteration meeting the convergence condition, acquiring a plurality of account identification models and a fusion model after parameter adjustment, wherein each account identification model is used for acquiring a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used for acquiring a predicted account type based on a plurality of groups of service data.
2. The method of claim 1, wherein adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of sets of sample service data and the account type comprises:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results of the sample account;
inputting the plurality of first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account;
determining a loss function value based on the plurality of first sample identification results, the second sample identification results, and the account number type;
in response to the non-compliance with the stop condition, adjusting parameters of any one of the plurality of initial recognition models until the stop condition is met, and adjusting parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the adjustment of the parameters of the plurality of initial recognition models.
3. The method of claim 1, wherein before adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of sets of sample service data and the account type, the method further comprises:
and adjusting parameters of a basic identification model based on any group of sample service data of the sample account to obtain any initial identification model, wherein any initial identification model corresponds to a service reference dimension to which any group of sample service data belongs.
4. The method of claim 1, wherein before adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of sets of sample service data and the account type, the method further comprises:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results;
and adjusting parameters of a basic fusion model based on the plurality of first sample identification results to obtain the initial fusion model.
5. The method according to any one of claims 1 to 4, wherein the convergence condition is that the difference between the loss function values of the current iteration process and the last iteration process is smaller than a convergence threshold value, and the convergence threshold value is a value greater than or equal to 0.
6. An account type identification method is characterized by comprising the following steps:
acquiring a plurality of groups of service data of a target account, wherein different groups of service data correspond to different service reference dimensions;
acquiring a plurality of first identification results of the target account based on the plurality of groups of service data, wherein the first identification results are predicted account types determined based on a single group of service data;
acquiring a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of groups of service data;
determining a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
7. The method of claim 6, wherein the obtaining a plurality of first recognition results of the target account based on the plurality of sets of service data comprises:
for any group of service data in the multiple groups of service data, determining a service reference dimension corresponding to the any group of service data;
determining an account identification model corresponding to a business reference dimension based on a mapping relation between the business reference dimension and the account identification model, wherein the account identification model is used for acquiring a predicted account type based on single group of business data of the business reference dimension;
and inputting the any group of business data into the account identification model, and processing the any group of business data through the account identification model to obtain a first identification result corresponding to the any group of business data.
8. The method of claim 6, wherein the obtaining a second recognition result of the target account based on the plurality of first recognition results of the target account comprises:
inputting the first recognition results into a fusion model, weighting the first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on a plurality of groups of service data;
and performing linear mapping on the sum of the weighted recognition results to obtain the second recognition result.
9. The method according to claim 6, wherein the acquiring multiple sets of service data of the target account comprises:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and dividing all the service data belonging to the same service reference dimension into the same group of service data.
10. The method according to any one of claims 6 to 9, wherein the service reference dimension comprises at least two of: account liveness, account influence, account consumption, account interaction force, number of associated accounts, and video playing completion rate.
11. An apparatus for processing data, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of groups of sample service data and account types of the sample accounts, wherein different groups of sample service data correspond to different service reference dimensions;
the adjusting module is used for adjusting parameters of a plurality of initial identification models and initial fusion models based on the plurality of groups of sample service data and the account number types, wherein when the parameters of any initial identification model or the initial fusion model are adjusted, the parameters of other models are kept unchanged;
and the second obtaining module is used for acquiring, in response to the iteration meeting the convergence condition, a plurality of account identification models and a fusion model after parameter adjustment, wherein each account identification model is used for obtaining a predicted account type based on a single group of service data of the corresponding service reference dimension, and the fusion model is used for obtaining a predicted account type based on a plurality of groups of service data.
12. The apparatus of claim 11, wherein the adjustment module is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial identification models to obtain multiple first sample identification results of the sample account;
inputting the plurality of first sample identification results into the initial fusion model to obtain a second sample identification result of the sample account;
determining a loss function value based on the plurality of first sample identification results, the second sample identification results, and the account number type;
in response to the non-compliance with the stop condition, adjusting parameters of any one of the plurality of initial recognition models until the stop condition is met, and adjusting parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the adjustment of the parameters of the plurality of initial recognition models.
13. An apparatus for identifying account types, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of groups of service data of the target account, wherein different groups of service data correspond to different service reference dimensions;
a second obtaining module, configured to obtain multiple first identification results of the target account based on the multiple sets of service data, where the first identification results are predicted account types determined based on a single set of the service data;
a third obtaining module, configured to obtain a second recognition result of the target account based on multiple first recognition results of the target account, where the second recognition result is a predicted account type determined based on the multiple sets of service data;
a determination module, configured to determine a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
14. A computer device, characterized in that the computer device comprises one or more processors and one or more memories in which at least one computer program is stored, the at least one computer program being loaded and executed by the one or more processors to implement a method of processing data according to any one of claims 1 to 5; or, to implement the account type identification method according to any one of claims 6 to 10.
15. A storage medium, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement a method of processing data according to any one of claims 1 to 5; or, to implement the account type identification method according to any one of claims 6 to 10.
CN202110535924.1A 2021-05-17 2021-05-17 Data processing method, account type identification method and device Active CN113762585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110535924.1A CN113762585B (en) 2021-05-17 2021-05-17 Data processing method, account type identification method and device

Publications (2)

Publication Number Publication Date
CN113762585A (en) 2021-12-07
CN113762585B (en) 2023-08-01

Family

ID=78787072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110535924.1A Active CN113762585B (en) 2021-05-17 2021-05-17 Data processing method, account type identification method and device

Country Status (1)

Country Link
CN (1) CN113762585B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169440A (en) * 2021-12-08 2022-03-11 北京百度网讯科技有限公司 Model training method, data processing method, device, electronic device and medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107346464A (en) * 2016-05-06 2017-11-14 腾讯科技(深圳)有限公司 Operational indicator forecasting method and device
CN109948670A (en) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 Training method and device for data identification model, and data processing method and device
CN110188836A (en) * 2019-06-21 2019-08-30 西安交通大学 Brain function network classification method based on variational autoencoder
CN110399925A (en) * 2019-07-26 2019-11-01 腾讯科技(武汉)有限公司 Account risk identification method, device and storage medium
CN110598840A (en) * 2018-06-13 2019-12-20 富士通株式会社 Knowledge migration method, information processing apparatus, and storage medium
CN110738263A (en) * 2019-10-17 2020-01-31 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and device
CN111582694A (en) * 2020-04-29 2020-08-25 腾讯科技(深圳)有限公司 Learning evaluation method and device
CN111708823A (en) * 2020-08-18 2020-09-25 腾讯科技(深圳)有限公司 Abnormal social account identification method and device, computer equipment and storage medium
US20200312307A1 (en) * 2019-03-25 2020-10-01 Microsoft Technology Licensing, Llc Dynamic Combination of Acoustic Model States
CN111783998A (en) * 2020-06-30 2020-10-16 百度在线网络技术(北京)有限公司 Illegal account recognition model training method and device and electronic equipment
CN112016633A (en) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
WO2020247949A1 (en) * 2019-06-07 2020-12-10 The Regents Of The University Of California General form of the tree alternating optimization (tao) for learning decision trees
CN112221156A (en) * 2020-10-27 2021-01-15 腾讯科技(深圳)有限公司 Data abnormality recognition method, data abnormality recognition device, storage medium, and electronic device
CN112257876A (en) * 2020-11-15 2021-01-22 腾讯科技(深圳)有限公司 Federated learning method, apparatus, computer device and medium
CN112488163A (en) * 2020-11-17 2021-03-12 中国平安财产保险股份有限公司 Abnormal account identification method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
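孙宁;顾正东;刘佶鑫;韩光: "Deep fusion neural network for facial age estimation", 中国图象图形学报, no. 01 *
曲文龙;陈笑屹;李一漪;汪慎文: "A deep gradient boosting regression prediction model", 计算机应用与软件, no. 09 *
陈启伟;王伟;马迪;毛伟: "Class-imbalanced credit scoring model based on Ext-GBDT ensemble", 计算机应用研究, no. 02 *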

Also Published As

Publication number Publication date
CN113762585B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN111298445B (en) Target account detection method and device, electronic equipment and storage medium
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN109784351B (en) Behavior data classification method and device and classification model training method and device
CN111552888A (en) Content recommendation method, device, equipment and storage medium
CN111931877A (en) Target detection method, device, equipment and storage medium
CN112749728A (en) Student model training method and device, computer equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN110555102A (en) Media title recognition method, device and storage medium
CN111753498A (en) Text processing method, device, equipment and storage medium
CN112561084B (en) Feature extraction method and device, computer equipment and storage medium
CN114282587A (en) Data processing method and device, computer equipment and storage medium
CN111931075B (en) Content recommendation method and device, computer equipment and storage medium
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN112036492B (en) Sample set processing method, device, equipment and storage medium
CN113762585B (en) Data processing method, account type identification method and device
CN113724189A (en) Image processing method, device, equipment and storage medium
CN113674856A (en) Medical data processing method, device, equipment and medium based on artificial intelligence
CN112527104A (en) Method, device and equipment for determining parameters and storage medium
CN111259252B (en) User identification recognition method and device, computer equipment and storage medium
CN114385854A (en) Resource recommendation method and device, electronic equipment and storage medium
CN114328948A (en) Training method of text standardization model, text standardization method and device
CN114764480A (en) Group type identification method and device, computer equipment and medium
CN113486260A (en) Interactive information generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant