CN107886366A - Method for generating a gender classification model, gender filling method, terminal and storage medium - Google Patents

Info

Publication number
CN107886366A
Authority
CN
China
Prior art keywords
user
business
gender
sex
data collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201711176286.9A
Other languages
Chinese (zh)
Inventor
黄程波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jinli Communication Equipment Co Ltd
Original Assignee
Shenzhen Jinli Communication Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jinli Communication Equipment Co Ltd filed Critical Shenzhen Jinli Communication Equipment Co Ltd
Priority to CN201711176286.9A priority Critical patent/CN107886366A/en
Publication of CN107886366A publication Critical patent/CN107886366A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present invention discloses a method for generating a gender classification model, a gender filling method, a terminal, and a storage medium. The method for generating the gender classification model includes: obtaining a gender data set of users in multiple services and their behavior data set in multiple application programs, to generate a target matrix table; filtering out, according to the gender data set in the target matrix table, the users to be trained from the multiple services, where the users to be trained are the set of users who have gender information in multiple preset services and whose gender information is identical; converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set; training the gender classification model on the training data set using a decision tree algorithm; and cross-validating the gender classification model according to algorithm tuning parameters and the test data set to obtain an optimal gender classification model.

Description

Method for generating a gender classification model, gender filling method, terminal and storage medium
Technical field
The present invention relates to the field of electronic technology, and in particular to a method for generating a gender classification model, a gender filling method, a terminal, and a storage medium.
Background
At present, with the development of Internet technology, the spread of e-commerce, and the gradual adoption of high-performance smart mobile terminals, the mobile Internet has created an entirely new communication environment for users that can largely meet their differentiated needs, and mobile applications are multiplying at an astonishing rate. E-commerce generally refers to commercial trade activity around the world in which, in an open mobile Internet environment and based on a browser/server application model, buyers and sellers carry out various commercial activities online. Unlike the traditional offline service model, however, in online transactions the merchant knows little about the user's basic personal information. This limits the merchant's understanding of user needs and easily leads to wasted delivery of advertisements, promotions and other marketing measures, or to advertisements that fail to reach their intended goal. It is therefore highly necessary to study and predict users' basic attribute information and historical behavior in order to pinpoint user needs and provide better personalized services. Gender is the most basic demographic indicator and one of the most important components of a user-profile tagging system. Combined with a user's other basic attributes and historical behavior, gender information is commonly used to analyze and gain insight into the user's interests and individual needs, and in audience targeting gender is one of the most important filter conditions. However, basic attribute information such as gender and age is regarded as private personal information. When registering on platforms such as WeChat and Sina, users may choose not to fill in this kind of private information, so many network application companies find it difficult to obtain basic attribute information such as users' gender and age.
In the prior art, a user's gender information is obtained essentially in one of two ways: relying on the gender information the user has filled in, or building a model on the data of a single service to predict it. For example, some network application companies require users to fill in gender information when registering a personal account, or let them fill it in optionally, but this kind of private information is sensitive for users. Mandatory registration fields therefore give a poor user experience and can easily annoy users who care about privacy. Moreover, users may deliberately enter false information, and such false information harms personalized recommendation. In practice, most users do not fill in basic attribute information such as gender when registering. The prior art also predicts gender by modeling a single kind of data, such as the names of the apps a user has installed or the list of installed package names, and uses the predicted gender as the user's final gender label. However, if a model is built solely on a user's behavior data in a single service, the predicted gender tends to have low accuracy. Even when the accuracy is high, the behavior data is collected only from the user population of that single service, so user coverage is narrow and the gender of users of other services remains missing.
Therefore, to solve the problems of obtaining gender from user-filled information or predicting it from the data of a single service, a gender classification model is needed that is built on data from multiple services combined with the categories of apps users use and their historical behavior data. The model can then predict users' gender labels, so that the gender of all users across the multiple services can be filled in effectively.
Summary of the invention
Embodiments of the present invention provide a method for generating a gender classification model, a gender filling method, a terminal, and a storage medium. The gender data set and behavior data set of users with higher confidence can be used as the training data set to train a gender classification model, and an optimal gender classification model can be obtained through cross-validation with algorithm tuning parameters and a test data set. The optimal gender classification model can then predict and fill in gender labels for users who have no gender information in the multiple services or whose gender information has low confidence, improving the overall accuracy of the final gender labels of all users on the platform.
In a first aspect, an embodiment of the present invention provides a method for generating a gender classification model, the method including: obtaining a gender data set of users in multiple services and their behavior data set in multiple application programs, to generate a target matrix table; filtering out, according to the gender data set in the target matrix table, the users to be trained from the multiple services, where the users to be trained are the set of users who have gender information in multiple preset services and whose gender information is identical; converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set; training the gender classification model on the training data set using a decision tree algorithm; and cross-validating the gender classification model according to algorithm tuning parameters and the test data set to obtain an optimal gender classification model.
In a second aspect, an embodiment of the present invention provides a gender filling method, the method including: obtaining a gender data set of users in multiple services and their behavior data set in multiple application programs, to generate a target matrix table; filtering out, according to the gender data set, the users to be filled and the users to be corrected from the multiple services, where the users to be filled are the set of users who have no gender information in the multiple services, and the users to be corrected are the set of users who have gender information in the multiple services but whose reported genders are split evenly between the two values; obtaining, according to the behavior data set, the number of clicks of each user to be filled and each user to be corrected in each application program as a feature vector; predicting, from the feature vector, the gender of each user to be filled using the optimal gender classification model of the first aspect, and filling in the prediction result; and predicting, from the feature vector, the gender of each user to be corrected using the optimal gender classification model, combining the prediction with that user's gender data set, taking the mode as the final gender of the user to be corrected, and filling it in.
In a third aspect, an embodiment of the present invention provides a terminal that includes units for performing the methods of the first and second aspects.
In a fourth aspect, an embodiment of the present invention provides another terminal, including a processor, an input device, an output device, and a memory, which are interconnected. The memory stores a computer program that supports the terminal in performing the above methods. The computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the methods of the first and second aspects.
In a fifth aspect, an embodiment of the present invention provides a storage medium. The computer-readable storage medium stores a computer program that includes program instructions which, when executed by a processor, cause the processor to perform the methods of the first and second aspects.
Embodiments of the present invention provide a method for generating a gender classification model, a gender filling method, a terminal, and a storage medium. Using user data from multiple services, the gender data set of users with higher confidence is combined with the categories of apps they use and their click behavior data set to form a feature data set, which includes a training data set and a test data set. A gender classification model is trained on the training data set using a decision tree algorithm, and an optimal gender classification model is obtained by cross-validation with algorithm tuning parameters and the test data set. The model is then used to fill in the gender of users who have no gender information in the multiple services and of users whose gender information has low confidence. This effectively improves the overall accuracy of the final gender labels of all users on the platform. The final gender labels support personalized service recommendation tailored to the preferences and needs of users of each gender, and they also play an important role in precisely targeted marketing and in the accuracy of click-through-rate estimation.
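The filling and correction rule of the second aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the "M"/"F" encoding, and the `predicted` argument (standing in for the output of the optimal gender classification model) are all assumptions made for the example.

```python
from collections import Counter

def classify_user(reported):
    """Label a user from the genders reported across services (None = missing)."""
    known = [g for g in reported if g is not None]
    if not known:
        return "to_fill"
    counts = Counter(known)
    if len(counts) == 2 and len(set(counts.values())) == 1:
        return "to_correct"  # reports split evenly between the two values
    return "other"

def final_gender(reported, predicted):
    """Fill with the prediction, or break an even split by taking the mode.

    `predicted` is a hypothetical model output for this user.
    """
    status = classify_user(reported)
    if status == "to_fill":
        return predicted
    if status == "to_correct":
        votes = [g for g in reported if g is not None] + [predicted]
        return Counter(votes).most_common(1)[0][0]
    return None  # users outside the two groups are not handled in this sketch

# Hypothetical users: one reported gender per service, None where missing.
filled = final_gender([None, None, None, None], "F")
corrected = final_gender(["M", "F", None, None], "F")
```

Because an evenly split user has an even number of reports, adding the single model prediction always produces a strict majority, so the mode is well defined.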
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for generating a gender classification model provided by an embodiment of the present invention;
Fig. 2 is a detailed schematic flowchart of step S11 in the method for generating the gender classification model shown in Fig. 1;
Fig. 3 is a detailed schematic flowchart of step S11b within step S11 shown in Fig. 2;
Fig. 4 is a schematic flowchart of a gender filling method provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a gender filling method provided by a first embodiment of the present invention;
Fig. 6 is a schematic flowchart of a gender filling method provided by a second embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal corresponding to the method of Fig. 1, provided by an embodiment of the present invention;
Fig. 8 is a schematic block diagram of the first acquisition unit of the terminal shown in Fig. 7;
Fig. 9 is a schematic structural diagram of a terminal corresponding to the method of Fig. 4, provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a terminal corresponding to the method of Fig. 5, provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a terminal corresponding to the method of Fig. 6, provided by an embodiment of the present invention;
Fig. 12 is a schematic block diagram of another terminal provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
It should also be understood that the terminology used in this description is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this description and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this description and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
In specific implementations, the terminal described in the embodiments of the present invention includes, but is not limited to, portable devices such as mobile phones, laptop computers or tablet computers having a touch-sensitive surface (for example, a touch-screen display and/or a touchpad). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (for example, a touch-screen display and/or a touchpad).
In the following discussion, a terminal including a display and a touch-sensitive surface is described. It should be understood, however, that the terminal may include one or more other physical user-interface devices such as a physical keyboard, a mouse and/or a joystick.
The terminal supports various application programs, such as one or more of the following: a drawing application, a presentation application, a word-processing application, a website-creation application, a disk-authoring application, a spreadsheet application, a gaming application, a telephone application, a videoconferencing application, an email application, an instant-messaging application, a workout-support application, a photo-management application, a digital camera application, a digital video camera application, a web-browsing application, a digital music player application and/or a digital video player application.
The various application programs that can be executed on the terminal may use at least one common physical user-interface device, such as the touch-sensitive surface. One or more functions of the touch-sensitive surface, and the corresponding information displayed on the terminal, may be adjusted and/or changed between applications and/or within a given application. In this way, a common physical architecture of the terminal (for example, the touch-sensitive surface) can support various applications with user interfaces that are intuitive and transparent to the user.
Refer to Fig. 1, which is a schematic flowchart of a method for generating a gender classification model provided by an embodiment of the present invention. The method can run on devices such as smartphones (for example, Android phones and iOS phones), tablet computers with mobile networking capability, personal digital assistants (PDAs), and smart wearable devices. As shown in the figure, the method may include steps S11 to S15.
S11: obtain a gender data set of users in multiple services and their behavior data set in multiple application programs, to generate a target matrix table. Specifically, the multiple services may include services such as phone purchase, after-sales, extended warranty, and reading. The multiple application programs may be assigned to application categories according to the main function of each application. The application categories may include browser, input method, news and information, online community, and entertainment and social networking, among others. Preferably, there may be 478 application categories, such as browser, input method, news and information, and online community. In this embodiment, techniques such as crawling external website data, querying internal databases, or purchasing interfaces are used to obtain the gender information that users have or have not reported in multiple services such as phone purchase, after-sales, extended warranty, and reading. The users' behavior data set in multiple application categories, such as browser, input method, news and information, and online community, can be obtained at the same time. The obtained gender data set and behavior data set are combined to generate the target matrix table.
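As a rough sketch of step S11, the matrix can be assembled by joining per-service gender reports with per-category click counts on the user ID. The service and category names, the record layouts, and the `build_matrix` function are assumptions for illustration; the source does not specify a storage format.

```python
from collections import Counter, defaultdict

# Hypothetical names for the services and a few application categories.
SERVICES = ["purchase", "after_sales", "warranty", "reading"]
CATEGORIES = ["browser", "input_method", "news", "community"]

gender_records = [  # (user_id, service, reported gender)
    ("u1", "purchase", "M"), ("u1", "warranty", "M"), ("u1", "reading", "M"),
    ("u2", "purchase", "F"), ("u2", "reading", "M"),
]
click_log = [  # (user_id, app category), one entry per click
    ("u1", "browser"), ("u1", "browser"), ("u1", "news"),
    ("u2", "input_method"), ("u3", "community"),
]

def build_matrix(gender_records, click_log):
    """Return {user_id: row}; each row has one gender column per service
    (None when not reported) and one click-count column per category."""
    genders = defaultdict(dict)
    for user, service, gender in gender_records:
        genders[user][service] = gender
    clicks = defaultdict(Counter)
    for user, category in click_log:
        clicks[user][category] += 1
    matrix = {}
    for user in sorted(set(genders) | set(clicks)):
        row = {s: genders[user].get(s) for s in SERVICES}
        row.update({c: clicks[user][c] for c in CATEGORIES})
        matrix[user] = row
    return matrix

matrix = build_matrix(gender_records, click_log)
```

Users who appear only in the click log (such as `u3` here) still get a row, with every gender column empty, which is what later makes them candidates for filling.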
S12: according to the gender data set in the target matrix table, filter out the users to be trained from the multiple services, where the users to be trained are the set of users who have gender information in multiple preset services and whose gender information is identical. Specifically, according to the gender data set in the target matrix table, the users who have gender information in the multiple preset services and whose gender information is identical across those services are selected as the users to be trained. The multiple preset services can be customized according to user requirements, or the system can detect the services that contain more gender information and have higher overall gender accuracy and use them as the preset services. For example, in this embodiment, the multiple services may include four major services: phone purchase, after-sales, extended warranty, and reading. If it is detected that, among these, the three services of phone purchase, extended warranty, and reading have more reported gender information and that the gender information is consistent, then phone purchase, extended warranty, and reading are set in advance as the multiple preset services. In some feasible embodiments, the number of preset services accounts for at least 75% of the total number of the multiple services.
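The selection rule in step S12 can be sketched as below. The sketch assumes that a user must have reported a gender in every preset service, which is one reading of the text; the service names and the `select_training_users` function are hypothetical.

```python
# Example preset services from the text (3 of 4 services, i.e. 75%).
PRESET_SERVICES = ["purchase", "warranty", "reading"]

def select_training_users(matrix, preset_services):
    """Return {user_id: gender} for users whose gender is reported in every
    preset service and is the same in all of them."""
    selected = {}
    for user, row in matrix.items():
        reported = [row.get(s) for s in preset_services]
        if all(g is not None for g in reported) and len(set(reported)) == 1:
            selected[user] = reported[0]
    return selected

# Hypothetical rows: one gender column per service (None = not reported).
matrix = {
    "u1": {"purchase": "M", "after_sales": None, "warranty": "M", "reading": "M"},
    "u2": {"purchase": "F", "after_sales": "F", "warranty": "M", "reading": "F"},
    "u3": {"purchase": "F", "after_sales": None, "warranty": None, "reading": "F"},
    "u4": {"purchase": "F", "after_sales": "M", "warranty": "F", "reading": "F"},
}
training_users = select_training_users(matrix, PRESET_SERVICES)
```

In this toy data `u2` is dropped for inconsistent reports and `u3` for a missing report, while `u4` is kept because the conflicting value comes from a service outside the preset list.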
S13: convert the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set. Specifically, the users to be trained are the set of users who have gender information in multiple preset services and whose gender information is identical, so the gender data they reported in the preset services has relatively high confidence. In this embodiment, the gender data set and behavior data set of these higher-confidence users to be trained are converted into the feature data set of the gender classification model. The behavior data set includes the historical behavior data of the users to be trained in using multiple application programs. For example, the number of times a user clicks each application program within a preset time is obtained, and each user's click counts for each application program within the preset time are converted into a vector, yielding each user's click feature vector over the application programs. In addition, the feature data set is randomly divided into a training data set and a test data set according to a preset ratio. In this embodiment, the feature data set is randomly divided in a 7:3 ratio, with the training data set accounting for 70% of the feature data set and the test data set for 30%. In some feasible embodiments, the preset ratio can be customized according to user requirements.
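A minimal sketch of the vector conversion and the 7:3 random split in step S13 follows. The application names, the label encoding (1 for male, 0 for female), the fixed seed, and both helper functions are assumptions made for the example.

```python
import random

APPS = ["browser", "input_method", "news", "community"]  # illustrative subset

def to_feature_vector(click_counts, apps=APPS):
    """Turn {app: clicks} into a fixed-order vector; missing apps count as 0."""
    return [click_counts.get(app, 0) for app in apps]

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle a copy of the samples and split it at train_ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical users: (click counts, label), with 1 = male and 0 = female.
raw = [({"browser": i, "news": 2 * i}, i % 2) for i in range(10)]
samples = [(to_feature_vector(clicks), label) for clicks, label in raw]
train_set, test_set = split_dataset(samples)
```

Fixing the seed makes the split reproducible, which helps when comparing parameter settings against the same test data set.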
S14: train the gender classification model on the training data set using a decision tree algorithm. Specifically, in this embodiment, the feature data set is randomly divided into the training data set and the test data set in a 7:3 ratio; that is, the training data set accounts for 70% of the feature data set and the test data set for 30%. The training data set serves as the training set for the gender classification model: users in the training data set whose reported gender is male are used as positive samples, and users whose reported gender is female are used as negative samples. The decision tree algorithm may include the CART algorithm (Classification And Regression Tree), the ID3 algorithm, the C4.5 algorithm, and the random forest algorithm. In this embodiment, the gender data set and behavior data set of the positive and negative samples are obtained, and the random forest algorithm is used to train on them with multiple trees, producing the gender classification model. A random forest is a classifier that uses multiple trees to train on and predict samples; its output class is the mode of the classes output by the individual trees. The random forest algorithm can handle very high-dimensional data without feature selection. Its feature subsets are chosen at random: at each node, a random subset of all features is selected and used to compute the best split. The random forest algorithm can balance errors on imbalanced data sets, and it can maintain the accuracy of the trained model even when a substantial portion of features is missing.
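The ideas named in step S14 (bootstrap samples, a random feature subset per tree, and a majority vote) can be illustrated with a heavily simplified stand-in. The sketch below uses one-split trees (decision stumps) rather than full trees and picks the feature subset once per tree rather than per node, so it is not a full random forest and not the embodiment's implementation; all names and the toy data are assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label (the mode)."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(X, y, features):
    """Pick the (feature, threshold) among `features` with the lowest
    weighted Gini impurity and return a one-split tree."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]

def train_forest(X, y, num_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # "sqrt"-style feature subset size
    forest = []
    for _ in range(num_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(d), k)             # random feature subset
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Output the mode of the individual trees' votes."""
    return majority([lo if row[f] <= t else hi for f, t, lo, hi in forest])

# Toy click vectors [game clicks, shopping clicks]; 1 = male (positive sample),
# 0 = female (negative sample). The data is made up for illustration.
X = [[9, 1], [8, 2], [7, 1], [9, 0], [1, 8], [2, 9], [0, 7], [1, 9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(X, y)
```

A production system would more likely rely on an existing random forest library than on hand-written trees; the sketch only shows how the vote over many randomized trees produces the final class.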
S15: cross-validate the gender classification model according to algorithm tuning parameters and the test data set to obtain an optimal gender classification model. Specifically, the algorithm tuning parameters may include: the number of trees in the random forest (numTrees), the feature subset selection strategy (Feature Subset Strategy), the attribute selection measure (impurity), the maximum depth of the tree (max Depth), and the maximum width of the tree (max Bins). The number of trees has no default value, and its tuning range includes [20, 50, 90, 100, 150, 160, 210, 220]. The feature subset selection strategy has no default value, and its tuning range includes auto, sqrt, log2, and one third. The attribute selection measure has no default value, and its tuning options include Gini impurity (gini) and information gain (entropy). The maximum depth of the tree has no default value, and its tuning range includes [5, 10, 20, 25, 30]. The maximum width of the tree has no default value, and its tuning range includes [50, 100, 200, 300, 400, 500]. In this embodiment, different tuning parameters are set, the randomly divided training data set is trained under each setting, and the test data set is used to cross-validate the resulting gender classification model to obtain the optimal gender classification model. The evaluation metrics of the cross-validation include precision (Precision), recall (Recall), and overall accuracy (Accuracy). For a binary classification system, there are four possible outcomes of the gender classification model's prediction: the user is male and is predicted male; the user is male but is predicted female; the user is female but is predicted male; and the user is female and is predicted female. Precision is defined as the ratio of the number of users correctly predicted as a class to the total number of users predicted as that class. Taking the male samples of the test data set as an example, precision for the male class = number of users who are male and predicted male / (number of users who are male and predicted male + number of users who are female but predicted male). Recall is defined as the ratio of the number of users correctly predicted as a class to the actual number of users in that class. Taking the male samples of the test data set as an example, recall for the male class = number of users who are male and predicted male / (number of users who are male and predicted male + number of users who are male but predicted female). The overall accuracy of the gender classification model is defined as the ratio of the number of correct predictions to the total number of predictions: overall accuracy = (number of users who are male and predicted male + number of users who are female and predicted female) / total number of users. In this embodiment, because the male-to-female ratio of the training data set and the test data set is within 5:1, no oversampling or undersampling is applied, and provided that the model does not overfit, overall accuracy is the main model evaluation metric. In this embodiment, the gender classification model trained on the training data set is cross-validated using the algorithm tuning parameters and the test data set, and the optimal gender classification model is thereby obtained. Preferably, the overall accuracy of the optimal gender classification model reaches at least 89.41%.
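The three evaluation metrics and the size of the tuning space in step S15 can be sketched as follows. The parameter names in the text resemble those of Spark MLlib's random forest trainer, but the source does not name a library, so the dictionary keys, the value spellings (for example `onethird`), and the assumption of an exhaustive grid are illustrative only. The labels and predictions are made up.

```python
from itertools import product

def evaluate(y_true, y_pred, positive=1):
    """Precision and recall for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs),
    }

# Tuning ranges as listed in the text; the key names are illustrative.
param_grid = {
    "num_trees": [20, 50, 90, 100, 150, 160, 210, 220],
    "feature_subset_strategy": ["auto", "sqrt", "log2", "onethird"],
    "impurity": ["gini", "entropy"],
    "max_depth": [5, 10, 20, 25, 30],
    "max_bins": [50, 100, 200, 300, 400, 500],
}
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# Hypothetical test labels and predictions (1 = male, 0 = female).
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
scores = evaluate(y_true, y_pred)
```

If every combination were tried, the listed ranges would give 8 x 4 x 2 x 5 x 6 = 1920 candidate settings; each would be trained on the training data set and scored on the test data set, keeping the setting with the highest overall accuracy.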
In the above embodiment, the users' gender data set in multiple services and their behavior data set in multiple application programs are integrated, and the higher-confidence users to be trained are selected according to the gender data set. The gender data set and behavior data set of the users to be trained are converted into the feature data set for training the model, and the gender classification model is obtained by training a decision tree algorithm on this higher-confidence feature data set, so the model's predictions are more accurate and the model is more credible. The gender classification model is then cross-validated according to the algorithm tuning parameters and the test data set to obtain the optimal gender classification model. Because the gender classification model is trained on higher-confidence users and built on data from multiple services, its user coverage is wider and its gender prediction results are more accurate.
Refer to Fig. 2, which is a detailed schematic flow diagram of step S11 in the method for generating the gender classification model shown in Fig. 1. As shown, step S11 includes S11a-S11b.
S11a, obtain the gender data set of users in multiple businesses and their behavior data set in multiple application programs to generate an original matrix table, wherein the gender data set includes each user's gender information in each business, the behavior data set includes the number of times each user clicks each application program within a preset time, each row of the original matrix table corresponds to a user ID, and the columns are the corresponding user's gender information in each business and the number of times that user clicks each application program within the preset time; the gender data set and the behavior data set are associated through the user ID and serve as the feature data set. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. In the present embodiment, technical means such as crawling external website data, querying internal databases or purchasing data interfaces may be used to obtain the gender data set, which comprises the gender information that users have reported in multiple businesses such as phone purchase, after-sales, extended warranty and reading, together with the user IDs that have no gender information. Because a user's access to an application program produces a corresponding historical behavior record of browsing and accessing that program, in the present embodiment the behavior data set includes the number of times a user clicks each application program in a mobile Internet environment. Each click on an application program produces one Internet log entry, so counting a user's Internet log entries for all application programs within the preset time gives the number of times that user clicked each application program; the same statistics are then computed for all users, yielding the behavior data set of all users accessing all application programs. The preset time may be an arbitrarily set period, or may be customized according to the data requirements of model training. The original matrix table can be generated from the gender data set and the behavior data set, wherein the rows of the original matrix table are user IDs and the columns are each user's gender information in each business and the number of times that user clicks each application program. Specifically, the obtained user IDs, the users' gender data set in multiple businesses and their behavior data set in multiple application programs are collected and collated to generate the original matrix table. In the present embodiment, the original matrix table may be as shown in Table 1.
Table 1
In the original matrix table shown in Table 1, each row corresponds to one user ID, and the columns may include each user's gender data set in multiple businesses and the behavior data set produced when that user accesses each application program within the preset time. In the present embodiment, the behavior data set includes the number of times a user clicks each application program in a mobile Internet environment.
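The construction of the original matrix table described in S11a can be sketched as follows. This is an illustration under stated assumptions: the business keys, the `(user_id, app)` log format and the function name `build_original_matrix` are hypothetical, and `None` is used here to mark a missing value.

```python
from collections import Counter, defaultdict

# Hypothetical keys for the four businesses named in the text.
BUSINESSES = ["purchase", "after_sales", "extended_warranty", "reading"]


def build_original_matrix(gender_records, click_log, apps):
    """Join gender data and click counts on the user ID.

    gender_records: {user_id: {business: "male" or "female"}}
    click_log: iterable of (user_id, app) log entries within the preset time
    apps: application program columns to include
    """
    clicks = defaultdict(Counter)
    for user_id, app in click_log:
        clicks[user_id][app] += 1  # one log entry per click
    table = {}
    for user_id in sorted(set(gender_records) | set(clicks)):
        reported = gender_records.get(user_id, {})
        row = {business: reported.get(business) for business in BUSINESSES}
        # None marks an application program the user never clicked.
        row.update({app: clicks[user_id].get(app) for app in apps})
        table[user_id] = row
    return table


gender_records = {"u1": {"purchase": "male", "reading": "male"}, "u2": {}}
click_log = [("u1", "browser"), ("u1", "browser"), ("u2", "input_method")]
table = build_original_matrix(gender_records, click_log, ["browser", "input_method"])
```

Each row is keyed by user ID; the first four columns hold the reported gender per business and the remaining columns hold click counts.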
S11b, perform data cleaning on the original matrix table to generate the objective matrix table. Specifically, in the present embodiment, as shown in Fig. 3, which is a detailed schematic flow diagram of step S11b in step S11 shown in Fig. 2, data cleaning is performed on the original matrix table, and step S11b specifically includes S11b1-S11b2:
S11b1, identify the application programs in the original matrix table whose missing rate is greater than 90%. Specifically, a very large number of application programs are currently available for download on the market, and the application programs installed and used by different users are not the same. Apart from commonly used mobile applications such as WeChat and Alipay, the installation and usage frequency of more niche application programs such as Meiyou and Baicizhan vary from person to person; as a result, more than 90% of the values in the original matrix table are null. In the present embodiment, the application programs obtained in the original matrix table may fall into 478 application categories such as browser, input method, news and information, and online community. It is therefore necessary to identify the application programs in the original matrix table whose missing rate is greater than 90%.
S11b2, delete the identified application programs from the original matrix table to generate the objective matrix table.
In the above embodiment, the gender data set of users in multiple businesses and their behavior data set in multiple application programs are integrated, collated and cleaned to generate the objective matrix table, wherein the gender data set includes the gender information that users have or have not reported in each business, and the behavior data set includes the number of times users click each application program within the preset time. Because the application programs available for installation on the market are too numerous to count and different people use different application programs, data cleaning must be performed on the original matrix table to remove the application programs whose missing rate is greater than 90%. Performing data cleaning on the original matrix table therefore reduces the complexity of the algorithm processing and improves the efficiency and accuracy of training the gender classification model.
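The cleaning rule of S11b1-S11b2 can be sketched as follows, assuming the table is a mapping from user ID to a row dictionary in which `None` marks a missing click count. The function name `drop_sparse_apps` and the sample data are hypothetical.

```python
def drop_sparse_apps(table, app_columns, max_missing_rate=0.9):
    """Remove application columns whose missing rate exceeds the threshold."""
    user_count = len(table)
    dropped = set()
    for app in app_columns:
        missing = sum(1 for row in table.values() if row.get(app) is None)
        if user_count and missing / user_count > max_missing_rate:
            dropped.add(app)
    cleaned = {
        user_id: {col: val for col, val in row.items() if col not in dropped}
        for user_id, row in table.items()
    }
    return cleaned, sorted(dropped)


# Hypothetical table: 20 users, only one of whom ever clicked "niche_app".
table = {
    f"u{i}": {"purchase": None, "browser": i + 1, "niche_app": 3 if i == 0 else None}
    for i in range(20)
}
cleaned, dropped = drop_sparse_apps(table, ["browser", "niche_app"])
```

Only the application columns are candidates for deletion; the gender columns are kept even when empty, since the later screening steps depend on them.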
Refer to Fig. 4, which is a schematic flow diagram of a gender filling method provided by an embodiment of the present invention. The method may run on devices such as a smartphone (for example an Android phone or an iOS phone), a tablet computer with communication and interaction functions, a personal digital assistant (PDA), or a smart wearable device. As shown, the method may include steps S21 to S25.
S21, obtain the gender data set of users in multiple businesses and their behavior data set in multiple application programs to generate an objective matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, among others. Preferably, the application categories may include 478 application categories such as browser, input method, news and information, and online community. In the present embodiment, technical means such as crawling external website data, querying internal databases or purchasing data interfaces are used to obtain the gender information that users have or have not reported in multiple businesses such as phone purchase, after-sales, extended warranty and reading; the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community may also be obtained; and the obtained gender data set and behavior data set are combined to generate the objective matrix table.
S22, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data set, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users who have reported gender information in at least some of the multiple businesses, with the businesses containing different gender information each accounting for half. Specifically, in the present embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two reports disagreeing, such as male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses containing different gender information each accounting for half, such as female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
S23, obtain, according to the behavior data set, the number of times each user clicks each application program as a feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user for each application program are converted into a vector, thereby obtaining each user's click feature vector over the application programs.
S24, according to the feature vector, use the optimal gender classification model obtained by the method described with reference to Figs. 1-3 to predict the gender of the users to be filled, and fill in the prediction result. Specifically, the optimal gender classification model is used to predict on the feature vector data set of each user to be filled, giving each such user's predicted gender, and the predicted gender of each user to be filled is then filled in.
S25, according to the feature vector, use the optimal gender classification model to predict the gender of the users to be corrected, combine the prediction with the gender data set of each user to be corrected, and take the mode as that user's final gender and fill it in. Specifically, the optimal gender classification model is used to predict on the feature vector data set of each user to be corrected, giving each such user's predicted gender; the predicted gender is combined with the user's gender data set in the multiple businesses, and the mode of these values is filled in as the user's final gender. For example, when the predicted gender of a user to be corrected is female, the gender information the user reported in the phone purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female, the mode of the prediction result and the gender data set taken together is used as the final gender of that user and filled in; that is, the gender finally filled in for the user to be corrected is female.
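The mode rule of S25 can be sketched as follows. The function name `final_gender` is hypothetical, and the predicted gender is simply passed in rather than produced by a model.

```python
from collections import Counter


def final_gender(predicted, reported):
    """Take the mode of the model prediction and the reported genders.

    reported: genders reported across the businesses, None where absent.
    For a user to be corrected the reported values are evenly split,
    so the prediction adds the deciding vote.
    """
    votes = [predicted] + [gender for gender in reported if gender is not None]
    return Counter(votes).most_common(1)[0][0]


# The example from the text: purchase and reading report male,
# extended warranty and after-sales report female, prediction is female.
result = final_gender("female", ["male", "female", "female", "male"])
```

Because a user to be corrected always has an even number of evenly split reports, adding the single prediction makes the vote count odd and the mode unique.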
The gender filling method provided by the present embodiment obtains the gender data set of users in multiple businesses and their behavior data set in multiple application programs; screens out, according to the gender data set, the users to be filled, who have reported no gender information in the multiple businesses, and the users to be corrected, whose reported gender information has lower confidence; converts the click counts of the users to be filled and the users to be corrected in each application program into feature vectors; and uses the optimal gender classification model to predict the gender of both groups and fill it in accordingly. The model has high credibility, high classification precision and high prediction accuracy. Because the optimal gender classification model predicts for the users to be filled and the users to be corrected on the basis of multiple businesses, its user coverage is wider: it can effectively fill in the gender of user groups whose gender is unknown, and correct the gender of user groups that reported gender information in some businesses but with lower confidence, the latter correction also being a form of user gender filling. This solution can more accurately predict and fill in the gender label of users who have no gender information, or whose gender information has lower confidence, in the multiple businesses, improving the overall accuracy of the gender labels finally determined for all users on the platform.
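The screening of users to be filled and users to be corrected summarized above can be sketched as follows. The function name `screen_user` and the returned labels are hypothetical; the rule itself follows the text: no reports at all means "to be filled", and an even split between the two genders means "to be corrected".

```python
from collections import Counter


def screen_user(reported):
    """Classify one user from the genders reported across the businesses.

    reported: list with one entry per business, None where nothing was reported.
    """
    values = [gender for gender in reported if gender is not None]
    if not values:
        return "to_fill"
    counts = Counter(values)
    # Different genders reported, each by half of the reporting businesses.
    if len(counts) == 2 and counts["male"] == counts["female"]:
        return "to_correct"
    return "other"


# Order assumed here: purchase, after-sales, extended warranty, reading.
no_reports = screen_user([None, None, None, None])
two_split = screen_user(["male", "female", None, None])
four_split = screen_user(["female", "male", "male", "female"])
consistent = screen_user(["male", None, "male", "male"])
```

Users labelled "other" here are neither filled nor corrected by this method; how they are handled is not specified in this passage.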
Refer to Fig. 5, which is a schematic flow diagram of a gender filling method provided by the first embodiment of the present invention. The method may run on devices such as a smartphone (for example an Android phone or an iOS phone), a tablet computer with communication and interaction functions, a personal digital assistant (PDA), or a smart wearable device. As shown, the method may include steps S31 to S37.
S31, obtain the gender data set of users in multiple businesses and their behavior data set in multiple application programs to generate an objective matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, among others. Preferably, the application categories may include 478 application categories such as browser, input method, news and information, and online community. In the present embodiment, technical means such as crawling external website data, querying internal databases or purchasing data interfaces are used to obtain the gender information that users have or have not reported in multiple businesses such as phone purchase, after-sales, extended warranty and reading; the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community may also be obtained; and the obtained gender data set and behavior data set are combined to generate the objective matrix table.
S32, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data set, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users who have reported gender information in at least some of the multiple businesses, with the businesses containing different gender information each accounting for half. Specifically, in the present embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two reports disagreeing, such as male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses containing different gender information each accounting for half, such as female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
S33, obtain, according to the behavior data set, the number of times each user clicks each application program as a feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user for each application program are converted into a vector, thereby obtaining each user's click feature vector over the application programs.
S34, according to the feature vector, use the optimal gender classification model obtained by the method described with reference to Figs. 1-3 to predict the gender of the users to be filled, and fill in the prediction result. Specifically, the optimal gender classification model is used to predict on the feature vector data set of each user to be filled, giving each such user's predicted gender, and the predicted gender of each user to be filled is then filled in.
S35, obtain the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
S36, if the prediction result is female, the score of the prediction result is S1. Specifically, when the gender predicted by the optimal gender classification model is female, the score of the prediction result is the overall accuracy rate S1 with which the optimal gender classification model predicts that the user to be filled is female.
S37, if the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1. Specifically, when the gender predicted by the optimal gender classification model is male, the score of the prediction result is (1 - S1).
In the above embodiment, for a user to be filled who has not reported a gender attribute in any of the four major businesses (phone purchase, after-sales, extended warranty and reading), the user's gender label is predicted by calling the optimal gender classification model, the gender prediction result is taken as the user's final prediction result, and the prediction result is scored accordingly. When the gender prediction result is male, the score is (1 - S1), where S1 is the overall accuracy rate with which the optimal gender classification model predicts that the user is female; when the gender prediction result is female, the score is S1. The score indicates the accuracy of the prediction result.
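The scoring rule of S36 and S37 can be sketched as follows. The function name `fill_score` and the value 0.9 are hypothetical; S1 is supplied by the caller.

```python
def fill_score(predicted, s1):
    """Score a filled prediction: S1 for female, S2 = 1 - S1 for male.

    s1: overall accuracy rate with which the model predicts a user is female.
    """
    if not 0.0 <= s1 <= 1.0:
        raise ValueError("s1 must lie in [0, 1]")
    if predicted == "female":
        return s1
    if predicted == "male":
        return 1.0 - s1
    raise ValueError("predicted must be 'female' or 'male'")


# Hypothetical S1 value, for illustration only.
female_score = fill_score("female", 0.9)
male_score = fill_score("male", 0.9)
```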
Refer to Fig. 6, which is a schematic flow diagram of a gender filling method provided by the second embodiment of the present invention. The method may run on devices such as a smartphone (for example an Android phone or an iOS phone), a tablet computer with communication and interaction functions, a personal digital assistant (PDA), or a smart wearable device. As shown, the method may include steps S41 to S49.
S41, obtain the gender data set of users in multiple businesses and their behavior data set in multiple application programs to generate an objective matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, among others. Preferably, the application categories may include 478 application categories such as browser, input method, news and information, and online community. In the present embodiment, technical means such as crawling external website data, querying internal databases or purchasing data interfaces are used to obtain the gender information that users have or have not reported in multiple businesses such as phone purchase, after-sales, extended warranty and reading; the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community may also be obtained; and the obtained gender data set and behavior data set are combined to generate the objective matrix table.
S42, screen out the users to be filled and the users to be corrected of the multiple businesses according to the gender data set, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users who have reported gender information in at least some of the multiple businesses, with the businesses containing different gender information each accounting for half. Specifically, in the present embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. If it is detected that a user has not filled in and reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four major businesses, a user may have reported gender information only in the phone purchase business and the after-sales business, with the two reports disagreeing, such as male in the phone purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the businesses containing different gender information each accounting for half, such as female in the phone purchase business and the reading business, and male in the after-sales business and the extended warranty business.
S43, obtain, according to the behavior data set, the number of times each user clicks each application program as a feature vector. Specifically, the behavior data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user for each application program are converted into a vector, thereby obtaining each user's click feature vector over the application programs.
S44, according to the feature vector, use the optimal gender classification model to predict the gender of the users to be corrected, combine the prediction with the gender data set of each user to be corrected, and take the mode as that user's final gender and fill it in. Specifically, the optimal gender classification model is used to predict on the feature vector data set of each user to be corrected, giving each such user's predicted gender; the predicted gender is combined with the user's gender data set in the multiple businesses, and the mode of these values is filled in as the user's final gender. For example, when the predicted gender of a user to be corrected is female, the gender information the user reported in the phone purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female, the mode of the prediction result and the gender data set taken together is used as the final gender of that user and filled in; that is, the gender finally filled in for the user to be corrected is female.
S45, compare, one by one, the gender results obtained from a sample survey of users who have reported gender information in each business with the gender information those users reported in the corresponding business. Specifically, users who have reported gender information in the multiple businesses are randomly sampled, and the sample survey results are compared with the gender information the corresponding users reported in each business, so as to obtain the overall gender accuracy rate of each business.
S46, calculate the overall gender accuracy rate zn of each business according to the comparison results. Specifically, in the present embodiment, the multiple businesses may include four major businesses: phone purchase, after-sales, extended warranty and reading. From the comparison results of the sample survey, the overall gender accuracy rates of the phone purchase business, the after-sales business, the extended warranty business and the reading business are obtained as z1, z2, z3 and z4 respectively. In some feasible embodiments, if no sample survey is conducted on the multiple businesses, the overall accuracy rates z1, z2, z3 and z4 default to 1.0.
S47, obtain the overall accuracy rate S1 with which the optimal gender classification model predicts that a user is female.
S48, if the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn). Specifically, n is the total number of the multiple businesses, and the overall gender accuracy rate zn is set to zero for any business in which the reported gender information is male and for any business with no gender information. For example, the multiple businesses include four major businesses, namely the phone purchase business, the after-sales business, the extended warranty business and the reading business, so n = 4, and the overall gender accuracy rates of the gender information reported in the phone purchase business, the after-sales business, the extended warranty business and the reading business are z1, z2, z3 and z4 respectively. When the predicted gender of a user to be corrected is female, the gender information the user reported in the phone purchase business is male, the gender information reported in the extended warranty business is female, and no gender information was reported in the remaining two businesses, then z1 = 0, z2 = 0 and z4 = 0, so the score of that user to be corrected is S3 = 1 - S1 × (1 - z3). As another example, when the predicted gender of a user to be corrected is female, the gender information reported in the phone purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female, then n = 4, z1 = 0 and z4 = 0, so the score of that user to be corrected is S3 = 1 - S1 × (1 - z2) × (1 - z3).
S49, if the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), wherein n is the total number of the multiple businesses, and the overall gender accuracy rate zn is set to zero for any business in which the reported gender information is female and for any business with no gender information. Specifically, the multiple businesses include four major businesses, namely the phone purchase business, the after-sales business, the extended warranty business and the reading business, so n = 4, and the overall gender accuracy rates of the gender information reported in the phone purchase business, the after-sales business, the extended warranty business and the reading business are z1, z2, z3 and z4 respectively. When the predicted gender of a user to be corrected is male, the gender information the user reported in the phone purchase business is male, the gender information reported in the extended warranty business is female, and no gender information was reported in the remaining two businesses, then z2 = 0, z3 = 0 and z4 = 0, so the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1). As another example, when the predicted gender of a user to be corrected is male, the gender information reported in the phone purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female, then z2 = 0 and z3 = 0, so the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1) × (1 - z4).
In the above embodiment, a user to be corrected with lower confidence is a user who, among the four major businesses (phone purchase, after-sales, extended warranty and reading), has reported a gender attribute in only two businesses with inconsistent gender information, or has reported a gender attribute in all four businesses with the inconsistent gender information each accounting for half. For such a user, the gender label is predicted by calling the optimal gender classification model and combined with the user's gender data set, the mode is filled in as the user's final gender, and the result is scored accordingly. The score indicates the accuracy of the final gender.
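The scores S3 and S4 can be sketched as follows, reproducing the formulas exactly as written in S48 and S49. The function name `correction_score`, the business keys and all numeric values are hypothetical.

```python
def correction_score(final_gender, s1, reported, accuracy=None):
    """Score the final gender of a user to be corrected.

    Female: S3 = 1 - S1 * (1 - z1) * ... * (1 - zn)
    Male:   S4 = 1 - (1 - S1) * (1 - z1) * ... * (1 - zn)
    reported: {business: "male", "female" or None}
    accuracy: {business: z}; z defaults to 1.0 when no sample survey was run.
    """
    if final_gender not in ("female", "male"):
        raise ValueError("final_gender must be 'female' or 'male'")
    accuracy = accuracy or {}
    remainder = s1 if final_gender == "female" else 1.0 - s1
    for business, gender in reported.items():
        # z counts only for businesses that reported the final gender;
        # it is zero for the opposite gender or for no report.
        z = accuracy.get(business, 1.0) if gender == final_gender else 0.0
        remainder *= 1.0 - z
    return 1.0 - remainder


# Hypothetical accuracy rates z1..z4 and S1, for illustration only.
z = {"purchase": 0.9, "after_sales": 0.85, "extended_warranty": 0.8, "reading": 0.7}
reported = {"purchase": "male", "after_sales": None,
            "extended_warranty": "female", "reading": None}
s3 = correction_score("female", 0.9, reported, z)  # 1 - S1 * (1 - z3)
s4 = correction_score("male", 0.9, reported, z)    # 1 - (1 - S1) * (1 - z1)
```

The `reported` dictionary mirrors the first worked example in each step: only the extended warranty business contributes to S3, and only the phone purchase business contributes to S4.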
Refer to Fig. 7, which is a schematic structural diagram of a terminal corresponding to the method of Fig. 1, provided by an embodiment of the present invention. The terminal 100 may be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA), or a smart wearable device. The terminal 100 includes a first acquisition unit 110, a first screening unit 120, a data processing unit 130, a model training unit 140 and a model tuning unit 150.
The first acquisition unit 110 is configured to obtain the gender data set of users in multiple businesses and their behavior data set in multiple application programs to generate an objective matrix table. Specifically, the multiple businesses may include phone purchase, after-sales, extended warranty, reading and other businesses. The multiple application programs may be divided into application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, among others. Preferably, the application categories may include 478 application categories such as browser, input method, news and information, and online community. In the present embodiment, the first acquisition unit 110 obtains the gender information that users have or have not reported in multiple businesses such as phone purchase, after-sales, extended warranty and reading, and may also obtain the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community. The first acquisition unit 110 is further configured to combine the obtained gender data set and behavior data set to generate the objective matrix table.
The first screening unit 120 is configured to screen out, according to the gender data set in the target matrix table, the users to be trained of the multiple businesses, where the users to be trained are the set of users who have gender information in multiple preset businesses and whose gender information is identical across those businesses. Specifically, the first screening unit 120 screens out, according to the gender data set of the multiple preset businesses in the obtained target matrix table, the users to be trained who have gender information and whose gender information is identical. The multiple preset businesses may be user-defined according to user requirements, or the system may detect the businesses that contain more gender information and have higher overall gender accuracy and use them as the multiple preset businesses. For example, in this embodiment the multiple businesses may include the four major businesses of device purchase, after-sales, extended warranty and reading; if it is detected that users have reported more gender information in three of them, namely device purchase, extended warranty and reading, and that this gender information is consistent, then device purchase, extended warranty and reading are set in advance as the multiple preset businesses. In some feasible embodiments, the number of preset businesses is at least 75% of the total number of the multiple businesses.
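The screening rule described for the first screening unit 120 can be sketched as follows. This is a minimal illustration, not the patented implementation: the business keys, the `"M"`/`"F"` encoding, the use of `None` for unreported gender and the function name `select_training_users` are all assumptions made for the example.

```python
# Hypothetical keys for the three preset businesses named in the text.
PRESET_BUSINESSES = ["purchase", "warranty", "reading"]

def select_training_users(gender_table, preset=PRESET_BUSINESSES):
    """Return {user_id: gender} for users whose gender is reported in every
    preset business and is identical across all of them."""
    selected = {}
    for user_id, reported in gender_table.items():
        values = [reported.get(business) for business in preset]
        # None marks a business in which no gender was reported (assumption).
        if None not in values and len(set(values)) == 1:
            selected[user_id] = values[0]
    return selected

gender_table = {
    "u1": {"purchase": "M", "after_sales": None, "warranty": "M", "reading": "M"},
    "u2": {"purchase": "F", "after_sales": "F", "warranty": "M", "reading": "F"},
    "u3": {"purchase": None, "after_sales": None, "warranty": None, "reading": None},
}
training_users = select_training_users(gender_table)
```

Here only `u1` qualifies: `u2` reports conflicting values and `u3` reports nothing. With three preset businesses out of four, the 75% share mentioned above is met.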
The data processing unit 130 is configured to convert the gender data set and the behavioral data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, where the feature data set includes a training data set and a test data set. Specifically, the users to be trained are the set of users who have gender information in the multiple preset businesses and whose gender information is identical, so the gender data set reported by the users to be trained in the multiple preset businesses has high confidence. In this embodiment, the data processing unit 130 selects the gender data set and the behavioral data set of these high-confidence users to be trained and converts them into the feature data set of the gender classification model. The behavioral data set includes the historical behavioral data of the users to be trained in using the multiple application programs. For example, the number of times each user clicks each application program within a preset time is obtained, and these click counts are converted into vectors, yielding a click feature vector for each user over the application programs. In addition, the data processing unit 130 is further configured to randomly divide the feature data set into the training data set and the test data set according to a preset ratio. In this embodiment, the feature data set is randomly divided in a 7:3 ratio, so that the training data set accounts for 70 percent of the feature data set and the test data set accounts for 30 percent. In some feasible embodiments, the preset ratio may be user-defined according to user requirements.
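The vector conversion and the 7:3 random split can be illustrated with the following sketch. It uses only the Python standard library; the category list, the fixed seed and the function names are assumptions for the example, and the source does not specify how the split is implemented.

```python
import random

# Hypothetical subset of the application categories named in the text.
APP_CATEGORIES = ["browser", "input_method", "news", "community"]

def to_click_vector(click_counts, categories=APP_CATEGORIES):
    """Turn {category: clicks} into a fixed-order click feature vector."""
    return [click_counts.get(category, 0) for category in categories]

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Randomly divide samples into training and test sets by train_ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded only for reproducibility
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Ten toy samples of the form (feature_vector, label); the data is made up.
samples = [(to_click_vector({"browser": i, "news": 2 * i}), i % 2) for i in range(10)]
train_set, test_set = split_dataset(samples)
```

A category the user never clicked is encoded as 0 here; the cleaning step in the patent treats such entries as missing values instead, so a real pipeline would need to pick one convention.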
The model training unit 140 is configured to train the gender classification model on the training data set using a decision tree algorithm. Specifically, in this embodiment the feature data set is randomly divided into the training data set and the test data set in a 7:3 ratio, that is, the training data set accounts for 70 percent of the feature data set and the test data set accounts for 30 percent. The training data set serves as the training set of the gender classification model: users in the training data set whose reported gender information is male are used as positive samples, and users whose reported gender information is female are used as negative samples. The decision tree algorithm may include the CART algorithm (Classification And Regression Tree), the ID3 algorithm, the C4.5 algorithm and the random forest algorithm. In this embodiment, the gender data set and the behavioral data set of the positive and negative samples are obtained and trained with the random forest algorithm using multiple trees, so as to train the gender classification model. The random forest algorithm is a classifier that trains on and predicts samples using multiple trees; the output class is the mode of the classes output by the individual trees. The random forest algorithm can handle very high-dimensional data without feature selection, because its feature subsets are chosen at random: at each node, a random subset of all features is selected and used to compute the best split. For unbalanced data sets the random forest algorithm can balance the error, and it can maintain the accuracy of the trained model even when a substantial portion of the features is missing.
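Two ingredients of the random forest described above, the random feature subset at each node and the mode vote over the trees' outputs, can be shown in isolation. The sketch below does not train anything: the three "trees" are hand-written threshold rules standing in for trained trees, and all names and thresholds are assumptions for illustration.

```python
import random
from collections import Counter

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices at random, as done at each tree node."""
    return sorted(rng.sample(range(n_features), k))

def forest_predict(trees, x):
    """Return the mode of the classes output by the individual trees."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three stand-in "trees"; 1 = male (positive sample), 0 = female (negative).
trees = [
    lambda x: 1 if x[0] > 5 else 0,
    lambda x: 1 if x[1] > 3 else 0,
    lambda x: 1 if x[2] > 10 else 0,
]
label = forest_predict(trees, [8, 1, 12])  # the trees vote 1, 0, 1
subset = random_feature_subset(n_features=478, k=21, rng=random.Random(0))
```

An odd number of trees avoids ties in a two-class vote. The subset size 21 is only an example close to the square root of 478, matching the `sqrt` strategy as commonly defined.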
The model tuning unit 150 is configured to cross-validate the gender classification model according to algorithm tuning parameters and the test data set, so as to obtain the optimal gender classification model. Specifically, the algorithm tuning parameters may include the number of trees in the random forest (numTrees), the feature subset selection strategy (featureSubsetStrategy), the attribute selection measure (impurity), the maximum depth of a tree (maxDepth) and the maximum width of a tree (maxBins). The number of trees has no default value, and its tuning range includes [20, 50, 90, 100, 150, 160, 210, 220]. The feature subset selection strategy has no default value, and its tuning range includes auto, sqrt, log2 and onethird. The attribute selection measure has no default value, and its tuning range includes Gini impurity (gini) and information entropy (entropy). The maximum depth of a tree has no default value, and its tuning range includes [5, 10, 20, 25, 30]. The maximum width of a tree has no default value, and its tuning range includes [50, 100, 200, 300, 400, 500]. In this embodiment, different tuning parameters are set, the randomly divided training data set is trained accordingly, and the gender classification model is cross-validated with the test data set to obtain the optimal gender classification model. The evaluation indicators of the cross-validation include precision, recall and overall accuracy.

For a two-class system, there are four possible outcomes when the gender classification model predicts and fills a gender: the user is male and is predicted male; the user is male but is predicted female; the user is female but is predicted male; the user is female and is predicted female. Precision is defined as the ratio of the number of users correctly predicted as a class to the total number of users predicted as that class. Taking the male samples of the test data set as an example: precision of the male class = (number of users who are male and predicted male) / (number of users who are male and predicted male + number of users who are female but predicted male). Recall is defined as the ratio of the number of users correctly predicted as a class to the actual number of users in that class. Taking the male samples of the test data set as an example: recall of the male class = (number of users who are male and predicted male) / (number of users who are male and predicted male + number of users who are male but predicted female). The overall accuracy of the gender classification model is defined as the ratio of the number of correct predictions to the number of predictions actually made: overall accuracy = (number of users who are male and predicted male + number of users who are female and predicted female) / total number of users.

In this embodiment, because the male-to-female ratio of the training data set and the test data set is within 5:1, no oversampling or undersampling is performed, and provided the model does not overfit, the model is evaluated mainly by its overall accuracy. In this embodiment, the gender classification model trained on the training data set is cross-validated using the algorithm tuning parameters and the test data set, and the optimal gender classification model is thereby obtained. Preferably, the overall accuracy of the optimal gender classification model is at least 89.41%.
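The three evaluation indicators defined above can be computed directly from the true and predicted labels. The following is a small sketch with made-up labels; the function name `evaluate` and the `"M"`/`"F"` encoding are assumptions for the example.

```python
def evaluate(y_true, y_pred, positive="M"):
    """Compute precision and recall for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # male, predicted male
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # female, predicted male
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # male, predicted female
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy

# Toy test labels (fabricated for illustration only).
y_true = ["M", "M", "M", "F", "F", "M"]
y_pred = ["M", "M", "F", "M", "F", "M"]
precision, recall, accuracy = evaluate(y_true, y_pred)
```

In this toy case three males are predicted male, one female is predicted male and one male is predicted female, so precision and recall for the male class are both 3/4 and overall accuracy is 4/6.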
In the above embodiment, the first acquisition unit 110 integrates the gender data set of users in multiple businesses and their behavioral data set in multiple application programs; the first screening unit 120 then screens out the high-confidence users to be trained; the data processing unit 130 converts the gender data set and the behavioral data set of the users to be trained into the feature data set for model training; and the model training unit 140 trains on this high-confidence user feature data set with a decision tree algorithm to obtain the gender classification model, which therefore has high prediction accuracy, high credibility and high classification precision. The model tuning unit 150 adjusts the algorithm tuning parameters and cross-validates the gender classification model with the test data set to obtain the optimal gender classification model. Because the gender classification model is built from high-confidence users to be trained and from the data of multiple businesses, its user coverage is wide and the accuracy of its gender prediction results is high.
Refer to Fig. 8, which is a schematic block diagram of the first acquisition unit 110 of the terminal shown in Fig. 7. In this embodiment, the first acquisition unit 110 is configured to obtain the gender data set of users in multiple businesses and their behavioral data set in multiple application programs, so as to generate the target matrix table. Specifically, the first acquisition unit 110 includes a matrix information acquisition unit 111 and a data cleaning unit 112, where the data cleaning unit 112 further includes a data identification unit 112a and a data deletion unit 112b.
The matrix information acquisition unit 111 is configured to obtain the gender data set of users in multiple businesses and their behavioral data set in multiple application programs, so as to generate an original matrix table. The gender data set includes the gender information of each user in each business, and the behavioral data set includes the number of times each user clicks each application program within a preset time. The rows of the original matrix table are user IDs, and the columns are the gender information of the corresponding user in each business and the number of times that user clicks each application program within the preset time; the gender data set and the behavioral data set are associated through the user ID to form the feature data set. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty, reading and the like. In this embodiment, the user IDs and the gender data set, covering both users who have reported gender information in the multiple businesses and users without gender information, may be obtained by techniques such as crawling external website data, querying internal databases or purchasing interfaces.

When a user accesses an application program, historical behavioral data for that access is generated. In this embodiment, the behavioral data set includes the number of times a user clicks each application program in a mobile internet environment. Each click on an application program generates one internet log entry, so counting a user's internet log entries for all application programs within the preset time gives the number of times that user clicked each application program; performing the same count for all users yields the behavioral data set of all users accessing all application programs. The preset time may be an arbitrarily chosen period, or may be user-defined according to the data requirements of model training. The original matrix table can be generated from the gender data set and the behavioral data set, where the rows of the original matrix table are user IDs and the columns are the gender information of each corresponding user in each business and the number of times that user clicks each application program. Specifically, the obtained user IDs, the users' gender data set in the multiple businesses and their behavioral data set in the multiple application programs are collected and organized to generate the original matrix table. In this embodiment, the original matrix table may be as shown in Table 2.
Table 2
In the original matrix table shown in Table 2, each row corresponds to one user ID, and the columns may include the gender data set of each user in the multiple businesses and the behavioral data set generated by that user accessing each application program within the preset time. In this embodiment, the behavioral data set includes the number of times a user clicks each application program in a mobile internet environment.
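The construction of the original matrix table from gender records and click logs can be sketched as below. The layout (one row per user ID, gender columns followed by click-count columns) follows the description above; the column keys, the log format of `(user_id, app)` pairs and the use of `None` for missing entries are assumptions made for the example.

```python
from collections import Counter

BUSINESSES = ["purchase", "after_sales", "warranty", "reading"]  # hypothetical keys
APPS = ["browser", "input_method", "news"]  # hypothetical subset of categories

def build_original_matrix(gender_records, click_log):
    """Build {user_id: row}, where row = gender per business + clicks per app.

    click_log is a list of (user_id, app) pairs, one per internet log entry.
    Unreported genders and never-clicked apps are stored as None.
    """
    clicks = Counter(click_log)
    matrix = {}
    for user_id, reported in gender_records.items():
        gender_cols = [reported.get(business) for business in BUSINESSES]
        click_cols = [clicks.get((user_id, app)) for app in APPS]
        matrix[user_id] = gender_cols + click_cols
    return matrix

gender_records = {"u1": {"purchase": "M", "reading": "M"}, "u2": {}}
click_log = [("u1", "browser"), ("u1", "browser"), ("u1", "news"), ("u2", "input_method")]
matrix = build_original_matrix(gender_records, click_log)
```

Joining on the user ID is what associates the two data sets; in this toy input `u1` has two browser clicks and one news click, while `u2` has no gender information at all.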
The data cleaning unit 112 is configured to perform data cleaning on the original matrix table. Specifically, the data cleaning unit 112 includes a data identification unit 112a and a data deletion unit 112b.
The data identification unit 112a is configured to identify the application programs in the original matrix table whose missing rate exceeds 90%. Specifically, a very large number of application programs are currently available for download, and the application programs installed and used by different users differ. Apart from some commonly used mobile applications such as WeChat and Alipay, the installation and frequency of use of more niche applications such as Meiyou and Baicizhan vary from person to person; as a result, more than 90% of the values in the original matrix table are null. In this embodiment, the categories of the application programs obtained in the original matrix table may include 478 application program categories such as browser, input method, news and information, and online community; it is therefore necessary to identify the application programs in the original matrix table whose missing rate exceeds 90%.
The data deletion unit 112b is configured to delete the identified application programs from the original matrix table so as to generate the target matrix table.
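The identification and deletion steps performed by units 112a and 112b amount to dropping every application column whose missing rate exceeds 90%. A minimal sketch follows, assuming rows are lists of click counts with `None` marking a missing value; the function name and the toy data are illustrative only.

```python
def drop_sparse_apps(rows, app_names, threshold=0.9):
    """Remove app columns whose share of missing (None) values exceeds threshold."""
    n_rows = len(rows)
    keep = [
        j for j in range(len(app_names))
        if sum(1 for row in rows if row[j] is None) / n_rows <= threshold
    ]
    kept_names = [app_names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_rows, kept_names

app_names = ["a", "b", "c"]
# Ten users: app "a" is used by all, "b" by none, "c" by exactly one user.
rows = [[i + 1, None, 5 if i == 0 else None] for i in range(10)]
kept_rows, kept_names = drop_sparse_apps(rows, app_names)
```

App `b` (100% missing) is removed, while app `c` sits exactly at 90% and is kept, because the text specifies a missing rate of more than 90%.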
In the above embodiment, the first acquisition unit 110 integrates, organizes and cleans the gender data set of users in multiple businesses and their behavioral data set in multiple application programs to generate the target matrix table. The matrix information acquisition unit 111 obtains the gender information that users have or have not reported in the multiple businesses, their user IDs, and the number of times they click each application program within the preset time. Because the application programs available for installation are too numerous to count and different people use different application programs, the data cleaning unit 112 needs to clean the original matrix table by removing the application programs whose missing rate exceeds 90%. Performing data cleaning on the original matrix table reduces the complexity of the algorithm and improves the training efficiency and accuracy of the gender classification model.
Refer to Fig. 9, which is a structural schematic diagram of a terminal corresponding to the method of Fig. 4, provided by an embodiment of the present invention. The terminal 200 may be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA) or a smart wearable device. The terminal 200 includes a second acquisition unit 210, a second screening unit 220, a first feature processing unit 230, a first filling unit 240 and a second filling unit 250.
The second acquisition unit 210 is configured to obtain the gender data set of users in multiple businesses and their behavioral data set in multiple application programs, so as to generate a target matrix table. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty, reading and the like. The multiple application programs may be assigned to application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, and so on. Preferably, the application categories may include 478 application program categories such as browser, input method, news and information, and online community. In this embodiment, the second acquisition unit 210 obtains the gender information that users have or have not reported in the multiple businesses (device purchase, after-sales, extended warranty, reading, etc.), may also obtain the users' behavioral data set in the multiple application categories (browser, input method, news and information, online community, etc.), and combines the obtained gender data set and behavioral data set to generate the target matrix table.
The second screening unit 220 is configured to screen out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses. The users to be filled are the set of users who have no gender information in any of the multiple businesses. The users to be corrected are the set of users who have gender information in some of the multiple businesses and for whom the businesses containing different gender information each account for half. Specifically, in this embodiment the multiple businesses may include the four major businesses of device purchase, after-sales, extended warranty and reading. If it is detected that a user has not reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four major businesses a user may have reported gender information only in the device purchase business and the after-sales business, with different values: male in the device purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses with the values evenly split: female in the device purchase business and the reading business, and male in the after-sales business and the extended warranty business.
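The two user groups defined above can be separated with a simple count of the reported values. The sketch below is one reading of the rule, not the patented implementation: the labels `"to_fill"`, `"to_correct"` and `"other"`, the `"M"`/`"F"` encoding and the `None` convention are assumptions for the example.

```python
from collections import Counter

def classify_user(reported):
    """Classify a user from {business: gender or None}.

    'to_fill'    : no gender reported in any business
    'to_correct' : both genders reported, each in half of the reporting businesses
    'other'      : anything else (for example, consistent reports)
    """
    values = [gender for gender in reported.values() if gender is not None]
    if not values:
        return "to_fill"
    counts = Counter(values)
    if len(counts) == 2 and len(set(counts.values())) == 1:
        return "to_correct"
    return "other"

no_info = {"purchase": None, "after_sales": None, "warranty": None, "reading": None}
two_conflict = {"purchase": "M", "after_sales": "F", "warranty": None, "reading": None}
four_split = {"purchase": "F", "after_sales": "M", "warranty": "M", "reading": "F"}
consistent = {"purchase": "M", "after_sales": None, "warranty": "M", "reading": "M"}
groups = [classify_user(u) for u in (no_info, two_conflict, four_split, consistent)]
```

The two conflicting examples mirror the cases given in the text: one male and one female report, and two of each across all four businesses.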
The first feature processing unit 230 is configured to obtain, according to the behavioral data set, the number of times each user clicks each application program as a feature vector. Specifically, the behavioral data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user for each application program are converted into vectors, yielding a click feature vector for each user over the application programs.
The first filling unit 240 is configured to predict, according to the feature vector, the gender of the user to be filled using the optimal gender classification model obtained by the method described in Figs. 1-3, and to fill in the prediction result. Specifically, the first filling unit 240 uses the optimal gender classification model to make a prediction on the feature vector data set of each user to be filled, obtains the predicted gender of each user to be filled, and fills in the predicted gender of each user to be filled as that user's final gender label.
The second filling unit 250 is configured to predict, according to the feature vector, the gender of the user to be corrected using the optimal gender classification model, to combine the prediction with the gender data set of the user to be corrected, and to take the mode as the final gender of the user to be corrected and fill it in. Specifically, the second filling unit 250 uses the optimal gender classification model to make a prediction on the feature vector data set of each user to be corrected and obtains the predicted gender of each user to be corrected; the predicted gender of each user to be corrected is combined with that user's gender data set in the multiple businesses, and the mode of the gender results is filled in as the user's final gender label. For example, suppose the predicted gender of a user to be corrected is female, the gender information the user reported in the device purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female. Combining the prediction result with the gender data set and taking the mode gives three female values against two male values, so the final gender of that user to be corrected is female.
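The mode-based correction performed by the second filling unit 250 can be reproduced with a few lines. This is an illustrative sketch: the function name and the `"M"`/`"F"` encoding are assumptions, and the predicted gender is passed in directly rather than produced by a model.

```python
from collections import Counter

def corrected_gender(predicted, reported):
    """Combine the model prediction with the reported genders and take the mode.

    reported is a list of per-business values; None means not reported.
    """
    votes = [predicted] + [gender for gender in reported if gender is not None]
    return Counter(votes).most_common(1)[0][0]

# The example from the text: purchase = M, after-sales = F, warranty = F, reading = M,
# and the model predicts F.
final_gender = corrected_gender("F", ["M", "F", "F", "M"])
```

Because a user to be corrected always has an even split of reported values, the single model prediction breaks the tie, so the mode is always well defined for this group.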
With the gender filling method provided by this embodiment, the second acquisition unit 210 obtains the gender data set of users in multiple businesses and their behavioral data set in multiple application programs; the second screening unit 220 screens out the users to be filled, who have not reported gender information in any of the multiple businesses, and the low-confidence users to be corrected; the first feature processing unit 230 converts the click counts of the users to be filled and the users to be corrected in each application program into feature vectors; and the first filling unit 240 and the second filling unit 250 use the optimal gender classification model to predict the gender of the users to be filled and the users to be corrected and fill it in accordingly. The model has high credibility, high classification precision and high prediction accuracy. The optimal gender classification model can thus predict users to be filled and users to be corrected on the basis of multiple businesses, and its user coverage is wide: it can effectively fill in the user group of unknown gender and correct the gender of the user group that has reported gender information in some businesses but with low confidence. Correcting the gender of this latter group is itself a form of user gender filling. This scheme can more accurately predict and fill in the gender label of users who have no gender information in the multiple businesses or whose gender information has low confidence, improving the overall accuracy of the final gender labels of all users on the platform.
Refer to Fig. 10, which is a structural schematic diagram of a terminal corresponding to the method of Fig. 5, provided by an embodiment of the present invention. The terminal 300 may be a device with mobile networking capability, such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA) or a smart wearable device. The terminal 300 includes a third acquisition unit 310, a third screening unit 320, a second feature processing unit 330, a third filling unit 340, a first accuracy acquisition unit 350, a first scoring unit 360 and a second scoring unit 370.
The third acquisition unit 310 is configured to obtain the gender data set of users in multiple businesses and their behavioral data set in multiple application programs, so as to generate a target matrix table. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty, reading and the like. The multiple application programs may be assigned to application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, and so on. Preferably, the application categories may include 478 application program categories such as browser, input method, news and information, and online community. In this embodiment, the third acquisition unit 310 obtains the gender information that users have or have not reported in the multiple businesses (device purchase, after-sales, extended warranty, reading, etc.), may also obtain the users' behavioral data set in the multiple application categories (browser, input method, news and information, online community, etc.), and combines the obtained gender data set and behavioral data set to generate the target matrix table.
The third screening unit 320 is configured to screen out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses. The users to be filled are the set of users who have no gender information in any of the multiple businesses. The users to be corrected are the set of users who have gender information in some of the multiple businesses and for whom the businesses containing different gender information each account for half. Specifically, in this embodiment the multiple businesses may include the four major businesses of device purchase, after-sales, extended warranty and reading. If it is detected that a user has not reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled of the multiple businesses. If it is detected that a user has gender information in some of the multiple businesses and the businesses containing different gender information each account for half, that user group is screened out as the users to be corrected of the multiple businesses. For example, among the four major businesses a user may have reported gender information only in the device purchase business and the after-sales business, with different values: male in the device purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses with the values evenly split: female in the device purchase business and the reading business, and male in the after-sales business and the extended warranty business.
The second feature processing unit 330 is configured to obtain, according to the behavioral data set, the number of times each user clicks each application program as a feature vector. Specifically, the behavioral data set includes the counted number of times each user clicks each application program within the preset time; the counted click numbers of each user for each application program are converted into vectors, yielding a click feature vector for each user over the application programs.
The third filling unit 340 is configured to predict, according to the feature vector, the gender of the user to be filled using the optimal gender classification model obtained by the method described in Figs. 1-3, and to fill in the prediction result. Specifically, the third filling unit 340 uses the optimal gender classification model to make a prediction on the feature vector data set of each user to be filled, obtains the predicted gender of each user to be filled, and fills in the predicted gender of each user to be filled as that user's final gender label.
The first accuracy acquisition unit 350 is configured to obtain the overall accuracy S1 with which the optimal gender classification model predicts that the user is female.
The first scoring unit 360 is configured to set the score of the prediction result to S1 if the prediction result is female. Specifically, when the gender predicted by the optimal gender classification model is female, the score of the prediction result is the overall accuracy S1 with which the optimal gender classification model predicts that the user to be filled is female.
The second scoring unit 370 is configured to set the score of the prediction result to S2 if the prediction result is male, where S2 = 1 - S1. Specifically, when the gender predicted by the optimal gender classification model is male, the score of the prediction result is (1 - S1).
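The scoring rule applied by the first and second scoring units reduces to a single branch on the predicted gender. The following sketch assumes an `"F"`/`"M"` encoding and uses a made-up value for S1; the source does not state what S1 is numerically.

```python
def score_prediction(predicted_gender, s1):
    """Score a filled prediction: S1 if predicted female, S2 = 1 - S1 if predicted male.

    s1 is the overall accuracy with which the model predicts that a user is female.
    """
    if predicted_gender == "F":
        return s1
    return 1.0 - s1

S1 = 0.9  # hypothetical value, for illustration only
female_score = score_prediction("F", S1)
male_score = score_prediction("M", S1)
```

Under this rule the two scores always sum to 1, so a single stored value S1 is enough to score every filled user.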
In the above embodiment, for a user to be filled who has not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty and reading), the third filling unit 340 calls the optimal gender classification model to predict the user's gender label and takes the gender prediction result as the user's final prediction result, and the first scoring unit 360 and the second scoring unit 370 score the gender prediction result accordingly. When the gender prediction result is male, the second scoring unit 370 sets the score to (1 - S1), where S1 is the overall accuracy with which the optimal gender classification model predicts that the user is female; when the gender prediction result is female, the first scoring unit 360 sets the score to S1. The score indicates the accuracy of the prediction result.
Referring to Figure 11, which is a schematic structural diagram of a terminal corresponding to the method of Fig. 6 provided by an embodiment of the present invention. The terminal 400 may be a device with mobile networking capability, such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a personal digital assistant (PDA), or a smart wearable device. The terminal 400 includes a fourth acquisition unit 410, a fourth screening unit 420, a third feature processing unit 430, a fourth filling unit 440, a sampling comparison unit 450, a calculation unit 460, a second accuracy acquisition unit 470, a third scoring unit 480, and a fourth scoring unit 490.
The fourth acquisition unit 410 is configured to obtain the users' gender data set in multiple businesses and their behavior data set in multiple application programs so as to generate a target matrix table. Specifically, the multiple businesses may include device purchase, after-sales, extended warranty, reading, and the like. The application programs may be divided into application categories according to the main function of each application program; the application categories may include browser, input method, news and information, online community, entertainment and social networking, and so on. Preferably, there may be 478 application categories, including browser, input method, news and information, and online community. In this embodiment, the fourth acquisition unit 410 obtains the gender information that users have or have not reported in multiple businesses such as device purchase, after-sales, extended warranty, and reading, and also obtains the users' behavior data set in multiple application categories such as browser, input method, news and information, and online community; the obtained gender data set and behavior data set are combined to generate the target matrix table.
The fourth screening unit 420 is configured to screen out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses. The users to be filled are the set of users who have no gender information in any of the multiple businesses; the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half. Specifically, in this embodiment, the multiple businesses may include the four major businesses of device purchase, after-sales, extended warranty, and reading. If a user is detected not to have reported gender information in any of the multiple businesses, that user group is screened out as the users to be filled. If a user is detected to have gender information in some of the businesses, with the businesses containing different gender information each accounting for half, that user group is screened out as the users to be corrected. For example, a user may have reported gender information only in the device purchase business and the after-sales business, with the two reports differing: male in the device purchase business and female in the after-sales business. Alternatively, a user may have reported gender information in all four businesses, with the differing reports evenly split: female in the device purchase and reading businesses, and male in the after-sales and extended warranty businesses.
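The screening rule above can be sketched as follows. This is an illustrative sketch only: the function name, the group labels, and the convention that an unreported gender is represented by None are assumptions, not part of the embodiment.

```python
from collections import Counter
from typing import Dict, Optional


def classify_user(reported: Dict[str, Optional[str]]) -> str:
    """Assign a user to a screening group from per-business gender reports.

    `reported` maps a business name to "male", "female", or None (not reported).
    """
    values = [g for g in reported.values() if g is not None]
    if not values:
        # No gender information in any business.
        return "to_be_filled"
    counts = Counter(values)
    if len(counts) == 2 and counts["male"] == counts["female"]:
        # Conflicting reports, each gender accounting for half.
        return "to_be_corrected"
    return "other"
```

A user who reported male in device purchase and female in after-sales, and nothing elsewhere, falls into the "to_be_corrected" group under this sketch.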
The third feature processing unit 430 is configured to obtain, according to the behavior data set, the number of clicks of each user in each application program as a feature vector. Specifically, the behavior data set includes the counted number of times each user clicked each application program within a preset time; the counted click numbers of each user under each application program are converted into a vector, thereby obtaining each user's click feature vector over the application programs.
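As a minimal sketch of the vector conversion, the click counts of one user can be laid out in a fixed order of application categories. The function name is hypothetical, and treating a category with no recorded clicks as 0 is an assumption that the embodiment does not specify.

```python
from typing import Dict, List


def click_feature_vector(clicks: Dict[str, int], app_categories: List[str]) -> List[int]:
    """Convert one user's per-category click counts into a fixed-order vector.

    Categories absent from `clicks` are filled with 0 (an assumption).
    """
    return [clicks.get(category, 0) for category in app_categories]


categories = ["browser", "input_method", "news", "community"]
vector = click_feature_vector({"browser": 12, "news": 3}, categories)
```

Using the same category order for every user keeps the vectors comparable across users.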
The fourth filling unit 440 is configured to predict, according to the feature vectors, the gender of each user to be corrected by using the optimal gender classification model, combine the prediction with the gender data set of the user to be corrected, and take the mode as the final gender of the user to be corrected and fill it in. Specifically, the fourth filling unit 440 uses the optimal gender classification model to make predictions on the feature vector data set of each user to be corrected and obtains the predicted gender of each such user; the predicted gender is combined with the user's gender data set in the multiple businesses, and the mode of the combined values is filled in as the user's final gender label. For example, when the predicted gender of a user to be corrected is female, the gender information the user reported in the device purchase business and the reading business is male, and the gender information reported in the extended warranty business and the after-sales business is female, the prediction result is combined with the gender data set and the mode is taken; the final gender of that user to be corrected is therefore female.
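The mode-taking step can be sketched as below. The function name is hypothetical, and the fallback to the model prediction on a tie is an assumption; for a user to be corrected the reported genders are evenly split, so adding the prediction normally yields a unique mode.

```python
from collections import Counter
from typing import Iterable, Optional


def final_gender(predicted: str, reported: Iterable[Optional[str]]) -> str:
    """Take the mode of the model prediction and the reported genders.

    None entries (businesses with no report) are ignored.
    """
    votes = [predicted] + [g for g in reported if g is not None]
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return predicted  # tie fallback, an assumption of this sketch
    return ranked[0][0]
```

With a female prediction and reports of male, male, female, female, the votes are three female to two male, so the final gender is female, matching the example in the text.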
The sampling comparison unit 450 is configured to compare, one by one, the gender results obtained from a sample survey of users who have reported gender information in each business with the gender information those users reported in that business. Specifically, random sampling is performed on the users who have reported gender information in the multiple businesses, and the survey results are compared with the gender information reported by the corresponding users in each business, so as to obtain the overall gender accuracy of each business.
The calculation unit 460 is configured to calculate the overall gender accuracy of each business according to the comparison result. In this embodiment, the multiple businesses may include the four major businesses of device purchase, after-sales, extended warranty, and reading; the overall gender accuracies of the device purchase, after-sales, extended warranty, and reading businesses obtained from the sample-survey comparison are z1, z2, z3, and z4, respectively. In some feasible embodiments, if no sample survey is carried out for the multiple businesses, the overall accuracies z1, z2, z3, and z4 default to 1.0.
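One plausible reading of the comparison is that the overall gender accuracy of a business is the fraction of sampled users whose reported gender matches the surveyed gender. The sketch below follows that reading; the function name and the pair layout are assumptions.

```python
from typing import List, Tuple


def business_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Overall gender accuracy of one business from sample-survey comparisons.

    Each pair is (reported_gender, surveyed_gender) for one sampled user.
    With no sample survey, the accuracy defaults to 1.0, as in the text.
    """
    if not pairs:
        return 1.0
    matches = sum(1 for reported, surveyed in pairs if reported == surveyed)
    return matches / len(pairs)
```

For example, if three of four sampled reports agree with the survey, the accuracy of that business is 0.75.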
The second accuracy acquisition unit 470 is configured to obtain the overall accuracy S1 with which the optimal gender classification model predicts that a user is female.
The third scoring unit 480 is configured to set the score of the final gender to S3 if the final gender is female, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn). Specifically, n is the total number of the multiple businesses, and the overall gender accuracy zn is set to zero for any business in which the reported gender information is male and for any business with no gender information. For example, the multiple businesses include the four major businesses of device purchase, reading, extended warranty, and after-sales, so n = 4, and the overall gender accuracies of the reported gender information in the device purchase, after-sales, extended warranty, and reading businesses are z1, z2, z3, and z4, respectively. When the predicted gender of a user to be corrected is female, the gender information reported in the device purchase business is male, the gender information reported in the extended warranty business is female, and no gender information is reported in the remaining two businesses, then z1 = 0, z2 = 0, z4 = 0, and the score of that user to be corrected is S3 = 1 - S1 × (1 - z3). As another example, when the predicted gender of a user to be corrected is female, the gender information reported in the device purchase and reading businesses is male, and the gender information reported in the extended warranty and after-sales businesses is female, then n = 4, z1 = 0, z2 = 0, and the score of that user to be corrected is S3 = 1 - S1 × (1 - z3) × (1 - z4).
The fourth scoring unit 490 is configured to set the score of the final gender to S4 if the final gender is male, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), n is the total number of the multiple businesses, and the overall gender accuracy zn is set to zero for any business in which the reported gender information is female and for any business with no gender information. Specifically, the multiple businesses include the four major businesses of device purchase, reading, extended warranty, and after-sales, so n = 4, and the overall gender accuracies of the reported gender information in the device purchase, after-sales, extended warranty, and reading businesses are z1, z2, z3, and z4, respectively. When the predicted gender of a user to be corrected is male, the gender information reported in the device purchase business is male, the gender information reported in the extended warranty business is female, and no gender information is reported in the remaining two businesses, then z2 = 0, z3 = 0, z4 = 0, and the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1). As another example, when the predicted gender of a user to be corrected is male, the gender information reported in the device purchase and reading businesses is male, and the gender information reported in the extended warranty and after-sales businesses is female, then z2 = 0, z3 = 0, and the score of that user to be corrected is S4 = 1 - (1 - S1) × (1 - z1) × (1 - z4).
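The two scoring formulas can be written as a single sketch that zeroes the accuracy of every business whose report is missing or disagrees with the final gender, then applies S3 or S4 as stated in the text. The function name, the dictionary layout, and the business keys are assumptions made for illustration.

```python
from typing import Dict, Optional


def corrected_score(
    final_gender: str,
    s1: float,
    reported: Dict[str, Optional[str]],
    accuracy: Dict[str, float],
) -> float:
    """Score the final gender of a user to be corrected.

    Implements S3 = 1 - S1 * prod(1 - z_i) for "female" and
    S4 = 1 - (1 - S1) * prod(1 - z_i) for "male", with z_i set to zero for
    businesses whose report is missing or differs from the final gender.
    """
    if final_gender not in ("female", "male"):
        raise ValueError("final_gender must be 'female' or 'male'")
    product = s1 if final_gender == "female" else 1.0 - s1
    for business, z in accuracy.items():
        agrees = reported.get(business) == final_gender
        product *= 1.0 - (z if agrees else 0.0)
    return 1.0 - product


accuracy = {"purchase": 0.9, "after_sales": 0.9, "warranty": 0.5, "reading": 0.9}
reported = {"purchase": "male", "warranty": "female"}
s3 = corrected_score("female", 0.8, reported, accuracy)  # 1 - 0.8 * (1 - 0.5)
s4 = corrected_score("male", 0.8, reported, accuracy)    # 1 - 0.2 * (1 - 0.9)
```

The accuracy values here are made-up inputs chosen only to make the arithmetic easy to follow.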
In the above embodiment, the users to be corrected are users whose reported gender has low confidence: users who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty, and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information evenly split. For such users, the fourth filling unit 440 calls the optimal gender classification model to predict the user's gender label, combines the prediction with the gender data set of the user to be corrected, and takes the mode as the user's final gender; the third scoring unit 480 and the fourth scoring unit 490 then score it accordingly. The score indicates the accuracy of the predicted final gender.
Referring to Figure 12, which is a schematic block diagram of a terminal provided by another embodiment of the present invention. As shown in the figure, the terminal in this embodiment may include one or more processors 801, one or more input devices 802, one or more output devices 803, and a memory 804. The processor 801, input device 802, output device 803, and memory 804 are connected by a bus 805. The memory 804 is used to store a computer program that includes program instructions, and the processor 801 is used to execute the program instructions stored in the memory 804.
The processor 801 is configured to call the program instructions to perform:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set in the target matrix table, the users to be trained of the multiple businesses, the users to be trained being the set of users who have gender information in multiple preset businesses and whose gender information is identical.
Converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set includes a training data set and a test data set.
Training the gender classification model on the training data set using a decision tree algorithm.
Cross-validating the gender classification model according to the algorithm tuning parameters and the test data set to obtain the optimal gender classification model.
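The screening of users to be trained in the steps above can be sketched as a simple predicate. The function name and the `min_businesses` parameter are hypothetical; the text says only "multiple preset businesses", so the threshold of 2 is an assumption.

```python
from typing import Dict, Optional


def is_training_user(reported: Dict[str, Optional[str]], min_businesses: int = 2) -> bool:
    """Return True if the user's reports are usable as a training label.

    A user qualifies when gender information is present in at least
    `min_businesses` businesses (assumed threshold) and all reports agree.
    """
    values = [g for g in reported.values() if g is not None]
    return len(values) >= min_businesses and len(set(values)) == 1
```

Under this sketch, the agreed gender of a qualifying user would serve as the label when the decision tree is trained.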
The following may further be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate an original matrix table, wherein the gender data set includes the users' gender information in each business, and the behavior data set includes the number of times the users clicked each application program within a preset time. The rows of the original matrix table are user ID numbers; the columns are the corresponding user's gender information in each business and the number of times that user clicked each application program within the preset time. The gender data set and the behavior data set are associated by user ID number and serve as the feature data set. Data cleansing is performed on the original matrix table to generate the target matrix table. Specifically:
Identifying the application programs whose missing rate in the original matrix table exceeds 90%.
Deleting the identified application programs from the original matrix table to generate the target matrix table.
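The data cleansing step can be sketched in plain Python as follows. The table layout (a dictionary keyed by user ID, with None marking a missing click count), the function name, and the column names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]


def drop_sparse_apps(
    table: Dict[str, Row], app_columns: List[str], threshold: float = 0.9
) -> Tuple[Dict[str, Row], List[str]]:
    """Delete application columns whose missing rate exceeds `threshold`.

    `table` maps a user ID to that user's row; a missing value is None or an
    absent key. Non-application columns (such as gender) are left untouched.
    """
    n_users = len(table)
    dropped = []
    for app in app_columns:
        missing = sum(1 for row in table.values() if row.get(app) is None)
        if n_users and missing / n_users > threshold:
            dropped.append(app)
    cleaned = {
        uid: {col: val for col, val in row.items() if col not in dropped}
        for uid, row in table.items()
    }
    return cleaned, dropped


# Ten users: "browser" is always present, "rare_app" is always missing.
original = {
    f"u{i}": {"purchase_gender": "female", "browser": i, "rare_app": None}
    for i in range(10)
}
target, removed = drop_sparse_apps(original, ["browser", "rare_app"])
```

The returned list of removed columns makes it easy to apply the same column selection when feature vectors are later built for prediction.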
The processor 801 may be further configured to call the program instructions to perform:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be filled by using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Predicting, according to the feature vectors, the gender of the users to be corrected by using the optimal gender classification model, combining the prediction with the gender data set of the users to be corrected, and taking the mode as the final gender of each user to be corrected and filling it in.
The following may further be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be filled by using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female.
If the prediction result is female, the score of the prediction result is S1.
If the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1.
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty, and reading), the optimal gender classification model is called to predict the user's gender label, and the gender prediction is taken as the user's final prediction result and scored accordingly. When the predicted gender is male, the score is (1 - S1), where S1 is the overall accuracy with which the optimal gender classification model predicts that a user is female; when the predicted gender is female, the score is S1. The score indicates the accuracy of the prediction result.
The following may also be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be corrected by using the optimal gender classification model, combining the prediction with the gender data set of the users to be corrected, and taking the mode as the final gender of each user to be corrected and filling it in.
Comparing, one by one, the gender results obtained from a sample survey of users who have reported gender information in each business with the gender information those users reported in that business.
Calculating the overall gender accuracy zn of each business according to the comparison result.
Obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female.
If the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn).
If the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), n is the total number of the multiple businesses, and the overall gender accuracy zn is set to zero for any business in which the reported gender information is female and for any business with no gender information.
In the above embodiment, the users to be corrected are users whose reported gender has low confidence: users who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty, and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information evenly split. For such users, the optimal gender classification model is called to predict the user's gender label, the prediction is combined with the gender data set of the user to be corrected, the mode is filled in as the user's final gender, and the result is scored accordingly. The score indicates the accuracy of the predicted final gender.
It should be understood that, in the embodiments of the present invention, the processor 801 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 802 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint direction information), a microphone, and the like; the output device 803 may include a display (such as an LCD), a speaker, and the like.
The memory 804 may include read-only memory and random access memory, and provides instructions and data to the processor 801. A part of the memory 804 may also include non-volatile random access memory. For example, the memory 804 may also store device type information.
In a specific implementation, the processor 801, input device 802, and output device 803 described in the embodiments of the present invention may carry out the implementations described in the first and second embodiments of the generation method of the gender classification model and of the gender filling method provided by the embodiments of the present invention, and may also carry out the implementation of the terminal described in the embodiments of the present invention; details are not repeated here.
Another embodiment of the present invention provides a storage medium. The storage medium may be a computer-readable storage medium that stores a computer program, the computer program including program instructions which, when executed by a processor, implement:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set in the target matrix table, the users to be trained of the multiple businesses, the users to be trained being the set of users who have gender information in multiple preset businesses and whose gender information is identical.
Converting the gender data set and behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set includes a training data set and a test data set.
Training the gender classification model on the training data set using a decision tree algorithm.
Cross-validating the gender classification model according to the algorithm tuning parameters and the test data set to obtain the optimal gender classification model.
The following may further be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate an original matrix table, wherein the gender data set includes the users' gender information in each business, and the behavior data set includes the number of times the users clicked each application program within a preset time. The rows of the original matrix table are user ID numbers; the columns are the corresponding user's gender information in each business and the number of times that user clicked each application program within the preset time. The gender data set and the behavior data set are associated by user ID number and serve as the feature data set. Data cleansing is performed on the original matrix table to generate the target matrix table. Specifically:
Identifying the application programs whose missing rate in the original matrix table exceeds 90%.
Deleting the identified application programs from the original matrix table to generate the target matrix table.
When executed by the processor, the program instructions may further implement:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be filled by using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Predicting, according to the feature vectors, the gender of the users to be corrected by using the optimal gender classification model, combining the prediction with the gender data set of the users to be corrected, and taking the mode as the final gender of each user to be corrected and filling it in.
The following may also be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be filled by using the optimal gender classification model obtained by the generation method of the gender classification model, and filling in the prediction result.
Obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female.
If the prediction result is female, the score of the prediction result is S1.
If the prediction result is male, the score of the prediction result is S2, where S2 = 1 - S1.
In the above embodiment, for users to be filled who have not reported a gender attribute in any of the four major businesses (device purchase, after-sales, extended warranty, and reading), the optimal gender classification model is called to predict the user's gender label, and the gender prediction is taken as the user's final prediction result and scored accordingly. When the predicted gender is male, the score is (1 - S1), where S1 is the overall accuracy with which the optimal gender classification model predicts that a user is female; when the predicted gender is female, the score is S1. The score indicates the accuracy of the prediction result.
Further, the following may also be implemented:
Obtaining the users' gender data set in multiple businesses and their behavior data set in multiple application programs to generate a target matrix table.
Screening out, according to the gender data set, the users to be filled and the users to be corrected of the multiple businesses, wherein the users to be filled are the set of users who have no gender information in any of the multiple businesses, and the users to be corrected are the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half.
Obtaining, according to the behavior data set, the number of clicks of each user in each application program as a feature vector.
Predicting, according to the feature vectors, the gender of the users to be corrected by using the optimal gender classification model, combining the prediction with the gender data set of the users to be corrected, and taking the mode as the final gender of each user to be corrected and filling it in.
Comparing, one by one, the gender results obtained from a sample survey of users who have reported gender information in each business with the gender information those users reported in that business.
Calculating the overall gender accuracy zn of each business according to the comparison result.
Obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female.
If the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z1) × (1 - z2) × ... × (1 - zn).
If the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z1) × (1 - z2) × ... × (1 - zn), n is the total number of the multiple businesses, and the overall gender accuracy zn is set to zero for any business in which the reported gender information is female and for any business with no gender information.
In the above embodiment, the users to be corrected are users whose reported gender has low confidence: users who have reported a gender attribute in only two of the four major businesses (device purchase, after-sales, extended warranty, and reading) with inconsistent gender information, or who have reported a gender attribute in all four businesses with the inconsistent gender information evenly split. For such users, the optimal gender classification model is called to predict the user's gender label, the prediction is combined with the gender data set of the user to be corrected, the mode is filled in as the user's final gender, and the result is scored accordingly. The score indicates the accuracy of the predicted final gender.
The storage medium can be the internal storage unit of the terminal described in foregoing any embodiment, such as terminal is hard Disk or internal memory.The storage medium can also be the grafting being equipped with the External memory equipment of the terminal, such as the terminal Formula hard disk, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash card (Flash Card) etc..Further, the storage medium can also both including the terminal internal storage unit and also including External memory equipment.The storage medium is used to store the computer program and other program sums needed for the terminal According to.The storage medium can be also used for temporarily storing the data that has exported or will export.
Those of ordinary skill in the art are it is to be appreciated that the list of each example described with reference to the embodiments described herein Member and algorithm steps, it can be realized with electronic hardware, computer software or the combination of the two, in order to clearly demonstrate hardware With the interchangeability of software, the composition and step of each example are generally described according to function in the above description.This A little functions are performed with hardware or software mode actually, application-specific and design constraint depending on technical scheme.Specially Industry technical staff can realize described function using distinct methods to each specific application, but this realization is not It is considered as beyond the scope of this invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may also be an electrical, mechanical, or other form of connection.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and such modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

  1. A method for generating a gender classification model, characterized by comprising:
    acquiring a gender data set of users in multiple businesses and a behavior data set of the users in multiple application programs, to generate a target matrix table;
    screening out, according to the gender data set in the target matrix table, users to be trained of the multiple businesses, wherein the users to be trained comprise the set of users who have gender information in multiple preset businesses and whose gender information is identical across those businesses;
    converting the gender data set and the behavior data set of the users to be trained in the target matrix table into a feature data set for training the gender classification model, wherein the feature data set comprises a training data set and a test data set;
    training the gender classification model on the training data set using a decision tree algorithm;
    cross-validating the gender classification model according to algorithm tuning parameters and the test data set, to obtain an optimal gender classification model.
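The conversion of the target matrix table into a training data set and a test data set (the third step of claim 1) might look roughly like the sketch below. The row layout, the `user_id` key, the 0/1 label encoding, the replacement of missing click counts by 0, and the 20% hold-out share are assumptions for illustration only; the claim does not fix any of them. Training the decision tree itself would be delegated to a library and is not shown.

```python
import random

def build_feature_set(rows, app_columns, labels, test_share=0.2, seed=0):
    """Turn matrix-table rows into (features, label) samples and split them.

    rows:        list of dicts, one per user, holding "user_id" and click counts
                 per application (hypothetical layout).
    app_columns: names of the application columns used as features.
    labels:      user_id -> "F" or "M" for the users to be trained.
    """
    samples = []
    for row in rows:
        uid = row["user_id"]
        if uid not in labels:
            continue  # only users to be trained contribute samples
        # Assumption: a missing click count is treated as zero clicks.
        x = [row.get(app) or 0 for app in app_columns]
        y = 1 if labels[uid] == "F" else 0  # assumed encoding: 1 = female
        samples.append((x, y))
    # Deterministic shuffle so the split is reproducible.
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_share)
    return samples[n_test:], samples[:n_test]  # (training set, test set)
```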
  2. The method according to claim 1, characterized in that acquiring the gender data set of users in multiple businesses and the behavior data set of the users in multiple application programs to generate the target matrix table specifically comprises:
    acquiring the gender data set of users in multiple businesses and the behavior data set of the users in multiple application programs, to generate an original matrix table, wherein the gender data set comprises the gender information of each user in each business, the behavior data set comprises the number of times each user clicks each application program within a preset time, each row of the original matrix table corresponds to a user ID, and the columns are the gender information of the corresponding user in each business and the number of times that user clicks each application program within the preset time;
    performing data cleaning on the original matrix table to generate the target matrix table.
  3. The method according to claim 2, characterized in that performing data cleaning on the original matrix table to generate the target matrix table specifically comprises:
    identifying application programs whose missing rate in the original matrix table exceeds 90%;
    deleting the identified application programs from the original matrix table to generate the target matrix table.
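The cleaning step of claim 3 can be sketched as below. The representation of the matrix table as a list of dicts and the use of `None` for a missing click count are assumptions for the example; only the 90% threshold comes from the claim.

```python
def drop_sparse_apps(rows, app_columns, max_missing_rate=0.9):
    """Remove application columns whose missing rate exceeds the threshold.

    rows:        list of dicts, one per user (hypothetical layout).
    app_columns: names of the application columns to inspect.
    Returns the cleaned rows and the set of dropped column names.
    """
    n = len(rows)
    dropped = set()
    for col in app_columns:
        # Assumption: a click count of None (or an absent key) means "missing".
        missing = sum(1 for r in rows if r.get(col) is None)
        if n > 0 and missing / n > max_missing_rate:
            dropped.add(col)
    # Rebuild every row without the dropped application columns.
    cleaned = [{k: v for k, v in r.items() if k not in dropped} for r in rows]
    return cleaned, dropped
```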
  4. The method according to claim 1, characterized in that the multiple businesses comprise a device purchase business, an after-sales business, an extended warranty business, and a reading business.
  5. The method according to claim 1, characterized in that the number of the multiple preset businesses accounts for at least 75% of the total number of the multiple businesses.
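One possible reading of the screening rule in claims 1 and 5 is sketched below: a user qualifies for training when gender information is present in at least 75% of the businesses and all reported values agree. The data layout and the names used are hypothetical, and the interpretation of the 75% share is an assumption rather than the patent's stated procedure.

```python
def select_training_users(gender_by_user, min_share=0.75):
    """Pick users with enough reported gender labels that all agree.

    gender_by_user: user_id -> {business name: "F", "M", or None}
                    (hypothetical layout; None means nothing was reported).
    Returns user_id -> agreed label for the selected users.
    """
    selected = {}
    for user_id, reports in gender_by_user.items():
        if not reports:
            continue
        known = [g for g in reports.values() if g is not None]
        enough = len(known) / len(reports) >= min_share
        consistent = len(set(known)) == 1
        if enough and consistent:
            selected[user_id] = known[0]
    return selected
```

With four businesses, this sketch keeps users who reported the same gender in at least three of them.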
  6. The method according to claim 1, characterized in that the decision tree algorithm comprises: the CART algorithm, the ID3 algorithm, the C4.5 algorithm, and the random forest algorithm.
  7. The method according to claim 1, characterized in that the algorithm tuning parameters comprise: the number of decision trees, the feature subset selection strategy, the attribute selection measure, the maximum depth of the tree, and the maximum width of the tree.
  8. The method according to claim 1, characterized in that the evaluation indicators of the cross-validation comprise: precision, recall, and overall accuracy.
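The three evaluation indicators named in claim 8 can be computed from true and predicted labels as in the following sketch. Treating `"F"` as the positive class and the function name `evaluate` are assumptions for the example.

```python
def evaluate(y_true, y_pred, positive="F"):
    """Compute precision, recall, and overall accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of all predictions that are correct.
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }
```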
  9. A gender filling method, characterized by comprising:
    acquiring a gender data set of users in multiple businesses and a behavior data set of the users in multiple application programs, to generate a target matrix table;
    screening out, according to the gender data set, users to be filled and users to be corrected of the multiple businesses, wherein the users to be filled comprise the set of users who have no gender information in any of the multiple businesses, and the users to be corrected comprise the set of users for whom some of the multiple businesses contain gender information and the businesses containing different gender information each account for half;
    acquiring, according to the behavior data set, the number of clicks of each user to be filled and each user to be corrected in each application program as a feature vector;
    predicting, according to the feature vector, the gender of the user to be filled using the optimal gender classification model according to any one of claims 1-8, and filling in the prediction result;
    predicting, according to the feature vector, the gender of the user to be corrected using the optimal gender classification model, combining the prediction with the gender data set of the user to be corrected, and taking the mode as the final gender of the user to be corrected and filling it in.
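The screening of users to be filled and users to be corrected in claim 9 can be illustrated with the sketch below. The dictionary layout, the labels `"F"`/`"M"`, and the use of `None` for a business without gender information are assumptions for the example.

```python
from collections import Counter

def split_users(gender_by_user):
    """Separate users to be filled from users to be corrected.

    gender_by_user: user_id -> {business name: "F", "M", or None}
                    (hypothetical layout; None means nothing was reported).
    """
    to_fill, to_correct = [], []
    for user_id, reports in gender_by_user.items():
        known = [g for g in reports.values() if g is not None]
        if not known:
            # No gender information in any business.
            to_fill.append(user_id)
            continue
        counts = Counter(known)
        # Two different labels, each reported by half of the reporting businesses.
        if len(counts) == 2 and len(set(counts.values())) == 1:
            to_correct.append(user_id)
    return to_fill, to_correct
```

Users whose reported labels agree, or disagree without an even split, fall into neither group in this sketch.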
  10. The method according to claim 9, characterized in that, after filling in the prediction result, the method further comprises:
    obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female;
    if the prediction result is female, the score of the prediction result is S1;
    if the prediction result is male, the score of the prediction result is S2, where S2 equals 1 - S1.
  11. The method according to claim 9, characterized in that, after taking the mode as the final gender of the user to be corrected and filling it in, the method further comprises:
    comparing, one by one, the gender results obtained from a sample survey of users who have reported gender information in each business with the gender information those users reported in the corresponding business;
    calculating, according to the comparison results, the overall gender accuracy z_n of each business;
    obtaining the overall accuracy S1 with which the optimal gender classification model predicts that a user is female;
    if the final gender is female, the score of the final gender is S3, where S3 = 1 - S1 × (1 - z_1) × (1 - z_2) × ... × (1 - z_n), n is the total number of the multiple businesses, and the overall gender accuracy z_n of any business in which the reported gender information is male or in which no gender information is reported takes the value zero;
    if the final gender is male, the score of the final gender is S4, where S4 = 1 - (1 - S1) × (1 - z_1) × (1 - z_2) × ... × (1 - z_n), n is the total number of the multiple businesses, and the overall gender accuracy z_n of any business in which the reported gender information is female or in which no gender information is reported takes the value zero.
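A worked sketch of the scoring rules may help. The code below transcribes the expressions for S1/S2 (claim 10) and S3/S4 (claim 11) as they are written in the claims, without reinterpreting them. The function names, the labels `"F"`/`"M"`, and the dictionary layout are hypothetical.

```python
def prediction_score(predicted, s1):
    """Claim 10: S1 if the prediction is female, otherwise S2 = 1 - S1."""
    return s1 if predicted == "F" else 1 - s1

def final_gender_score(final, s1, reports, accuracy):
    """Claim 11: score S3 (final gender female) or S4 (final gender male).

    reports:  business -> "F", "M", or None (assumed encoding of reported gender).
    accuracy: business -> overall gender accuracy z_i of that business.
    """
    factor = 1.0
    for business, reported in reports.items():
        # z_i is set to zero for businesses that reported the other gender
        # or reported nothing, so only agreeing businesses contribute.
        z = accuracy[business] if reported == final else 0.0
        factor *= 1 - z
    if final == "F":
        return 1 - s1 * factor        # S3 as written in the claim
    return 1 - (1 - s1) * factor      # S4 as written in the claim
```

For example, with S1 = 0.8, one business reporting female with accuracy 0.9, and one reporting male with accuracy 0.7, the expressions give S3 = 1 - 0.8 × 0.1 and S4 = 1 - 0.2 × 0.3.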
  12. A terminal, characterized by comprising units for performing the method according to any one of claims 1-11.
  13. A terminal, characterized by comprising a processor, an input device, an output device, and a memory, wherein the processor, the input device, the output device, and the memory are connected to one another, the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to call the program instructions to perform the method according to any one of claims 1-11.
  14. A storage medium, characterized in that the storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-11.
CN201711176286.9A 2017-11-22 2017-11-22 Generation method, sex fill method, terminal and the storage medium of Gender Classification model Withdrawn CN107886366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711176286.9A CN107886366A (en) 2017-11-22 2017-11-22 Generation method, sex fill method, terminal and the storage medium of Gender Classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711176286.9A CN107886366A (en) 2017-11-22 2017-11-22 Generation method, sex fill method, terminal and the storage medium of Gender Classification model

Publications (1)

Publication Number Publication Date
CN107886366A true CN107886366A (en) 2018-04-06

Family

ID=61778274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711176286.9A Withdrawn CN107886366A (en) 2017-11-22 2017-11-22 Generation method, sex fill method, terminal and the storage medium of Gender Classification model

Country Status (1)

Country Link
CN (1) CN107886366A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960922A (en) * 2018-07-09 2018-12-07 中国联合网络通信集团有限公司 The replacement prediction technique and device of terminal
CN109492104A (en) * 2018-11-09 2019-03-19 北京京东尚科信息技术有限公司 Training method, classification method, system, equipment and the medium of intent classifier model
CN110097170A (en) * 2019-04-25 2019-08-06 深圳市豪斯莱科技有限公司 Information pushes object prediction model acquisition methods, terminal and storage medium
CN110502432A (en) * 2019-07-23 2019-11-26 平安科技(深圳)有限公司 Intelligent test method, device, equipment and readable storage medium storing program for executing
CN110781374A (en) * 2018-07-13 2020-02-11 北京字节跳动网络技术有限公司 User data matching method and device, electronic equipment and computer readable medium
CN110784760A (en) * 2019-09-16 2020-02-11 清华大学 Video playing method, video player and computer storage medium
CN111078742A (en) * 2019-12-09 2020-04-28 秒针信息技术有限公司 User classification model training method, user classification method and device
CN111178983A (en) * 2020-01-03 2020-05-19 北京搜狐新媒体信息技术有限公司 User gender prediction method, device, equipment and storage medium
WO2020192460A1 (en) * 2019-03-25 2020-10-01 华为技术有限公司 Data processing method, terminal-side device, cloud-side device, and terminal-cloud collaboration system
CN113657917A (en) * 2020-05-12 2021-11-16 上海佳投互联网技术集团有限公司 Visitor gender analysis method and system based on USER-AGENT
CN116992267A (en) * 2023-09-28 2023-11-03 北京融信数联科技有限公司 Regional population gender identification method and system based on signaling data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654131A (en) * 2015-12-30 2016-06-08 小米科技有限责任公司 Classification model training method and device
CN106203473A (en) * 2016-06-24 2016-12-07 有米科技股份有限公司 A kind of mobile subscriber's gender prediction's method based on installation kit list
CN106682686A (en) * 2016-12-09 2017-05-17 北京拓明科技有限公司 User gender prediction method based on mobile phone Internet-surfing behavior
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897727A (en) * 2015-12-21 2017-06-27 百度在线网络技术(北京)有限公司 A kind of user's gender identification method and device
CN105654131A (en) * 2015-12-30 2016-06-08 小米科技有限责任公司 Classification model training method and device
CN106203473A (en) * 2016-06-24 2016-12-07 有米科技股份有限公司 A kind of mobile subscriber's gender prediction's method based on installation kit list
CN106682686A (en) * 2016-12-09 2017-05-17 北京拓明科技有限公司 User gender prediction method based on mobile phone Internet-surfing behavior

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960922A (en) * 2018-07-09 2018-12-07 中国联合网络通信集团有限公司 The replacement prediction technique and device of terminal
CN110781374A (en) * 2018-07-13 2020-02-11 北京字节跳动网络技术有限公司 User data matching method and device, electronic equipment and computer readable medium
CN109492104A (en) * 2018-11-09 2019-03-19 北京京东尚科信息技术有限公司 Training method, classification method, system, equipment and the medium of intent classifier model
WO2020192460A1 (en) * 2019-03-25 2020-10-01 华为技术有限公司 Data processing method, terminal-side device, cloud-side device, and terminal-cloud collaboration system
CN110097170A (en) * 2019-04-25 2019-08-06 深圳市豪斯莱科技有限公司 Information pushes object prediction model acquisition methods, terminal and storage medium
CN110502432A (en) * 2019-07-23 2019-11-26 平安科技(深圳)有限公司 Intelligent test method, device, equipment and readable storage medium storing program for executing
CN110502432B (en) * 2019-07-23 2023-11-28 平安科技(深圳)有限公司 Intelligent test method, device, equipment and readable storage medium
CN110784760A (en) * 2019-09-16 2020-02-11 清华大学 Video playing method, video player and computer storage medium
CN111078742A (en) * 2019-12-09 2020-04-28 秒针信息技术有限公司 User classification model training method, user classification method and device
CN111078742B (en) * 2019-12-09 2023-09-05 秒针信息技术有限公司 User classification model training method, user classification method and device
CN111178983A (en) * 2020-01-03 2020-05-19 北京搜狐新媒体信息技术有限公司 User gender prediction method, device, equipment and storage medium
CN111178983B (en) * 2020-01-03 2024-03-12 北京搜狐新媒体信息技术有限公司 User gender prediction method, device, equipment and storage medium
CN113657917A (en) * 2020-05-12 2021-11-16 上海佳投互联网技术集团有限公司 Visitor gender analysis method and system based on USER-AGENT
CN116992267A (en) * 2023-09-28 2023-11-03 北京融信数联科技有限公司 Regional population gender identification method and system based on signaling data
CN116992267B (en) * 2023-09-28 2024-01-23 北京融信数联科技有限公司 Regional population gender identification method and system based on signaling data

Similar Documents

Publication Publication Date Title
CN107886366A (en) Generation method, sex fill method, terminal and the storage medium of Gender Classification model
CN105718515B (en) Data-storage system and its method and data analysis system and its method
CN102708130B (en) Calculate the easily extensible engine that fine point of user is mated for offer
CN108363821A (en) A kind of information-pushing method, device, terminal device and storage medium
CN106779457A (en) A kind of rating business credit method and system
CN109615487A (en) Products Show method, apparatus, equipment and storage medium based on user behavior
CN105490823B (en) data processing method and device
CN107273436A (en) The training method and trainer of a kind of recommended models
CN106919579A (en) A kind of information processing method and device, equipment
CN107292463A (en) A kind of method and system that the project evaluation is carried out to application program
CN109978033A (en) The method and apparatus of the building of biconditional operation people's identification model and biconditional operation people identification
CN108073659A (en) A kind of love and marriage object recommendation method and device
CN110246007A (en) A kind of Method of Commodity Recommendation and device
CN108764332A (en) A kind of Channel Quality analysis method, computing device and storage medium
CN103150696A (en) Method and device for selecting potential customer of target value-added service
CN103646049B (en) The method and system of automatically generated data form
CN110781308A (en) Anti-fraud system for building knowledge graph based on big data
CN107408114A (en) Based on transactions access pattern-recognition connection relation
CN107563621A (en) A kind of website user's wastage analysis method and device
CN108648068A (en) A kind of assessing credit risks method and system
CN109325845A (en) A kind of financial product intelligent recommendation method and system
CN108021651A (en) Network public opinion risk assessment method and device
CN106651547A (en) Data processing method and apparatus
CN109767269A (en) A kind for the treatment of method and apparatus of game data
Sharaf Addin et al. Customer mobile behavioral segmentation and analysis in telecom using machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20180406)