CN108629381A - Crowd screening method based on big data, and terminal device - Google Patents

Info

Publication number
CN108629381A
Authority
CN
China
Prior art keywords
sample
model
characteristic value
screening
output characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810455659.4A
Other languages
Chinese (zh)
Inventor
卢少烽
洪博然
徐亮
阮晓雯
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810455659.4A priority Critical patent/CN108629381A/en
Priority to PCT/CN2018/097561 priority patent/WO2019218482A1/en
Publication of CN108629381A publication Critical patent/CN108629381A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention is applicable to the technical field of data processing and provides a crowd screening method based on big data, a terminal device, and a computer-readable storage medium. The method includes: obtaining multiple pieces of sample information of a sample population, each piece of sample information including a sample environment feature and a sample characteristic value; fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model; obtaining multiple individual features of a crowd to be screened, and inputting the multiple individual features into the screening model to obtain an output characteristic value set corresponding to the multiple individual features, the output characteristic value set including multiple output characteristic values; and adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set. The present invention screens the crowd to be screened according to multiple features, which improves the accuracy of crowd screening.

Description

Crowd screening method based on big data, and terminal device
Technical field
The present invention belongs to the technical field of data processing, and in particular relates to a crowd screening method based on big data, a terminal device, and a computer-readable storage medium.
Background
In real life, there is often a need to screen out a target group from a very large population. The screening is generally based on certain features; for example, from all residents of a county, the target group aged over fifty is screened out. Statistics is the science of understanding the aggregate quantitative characteristics and quantitative relationships of objective phenomena. When a target group that has become distinctive because of some change of state needs to be screened out, statistics is used to find the features correlated with that change of state, so that the target group can be screened out according to the correlated features.
However, in existing crowd screening methods, few correlated features can be found through statistics, so the likelihood that a crowd screened out according to those features really is the target group is low. For example, when screening for people with chronic obstructive pulmonary disease (COPD), screening relies mainly on statistics of the patient population: if the statistics show that many people in a certain region have COPD, or that many people in a certain age range have COPD, that region or age range is used as the correlated feature for screening out the target group. In reality COPD has many more correlated features, so the actual target group should not be limited to that region or age range. In summary, existing crowd screening methods can rely on only a few correlated features, and their screening accuracy is low.
Summary of the invention
In view of this, embodiments of the present invention provide a crowd screening method based on big data, a terminal device, and a computer-readable storage medium, to solve the problem in the prior art that crowd screening can rely on only a few correlated features and that screening accuracy is low.
A first aspect of the embodiments of the present invention provides a crowd screening method based on big data, including:
obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model;
obtaining multiple individual features of a crowd to be screened, and inputting the multiple individual features into the screening model to obtain an output characteristic value set corresponding to the multiple individual features, where the output characteristic value set includes multiple output characteristic values;
adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
A second aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model;
obtaining multiple individual features of a crowd to be screened, and inputting the multiple individual features into the screening model to obtain an output characteristic value set corresponding to the multiple individual features, where the output characteristic value set includes multiple output characteristic values;
adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, where the computer program implements the following steps when executed by a processor:
obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model;
obtaining multiple individual features of a crowd to be screened, and inputting the multiple individual features into the screening model to obtain an output characteristic value set corresponding to the multiple individual features, where the output characteristic value set includes multiple output characteristic values;
adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, multiple pieces of sample information of a sample population are obtained, where each piece of sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual corresponding to the sample information to which it belongs. After acquisition, the multiple pieces of sample information are fitted to a preset preprocessing model, and the fitted preprocessing model is output as the screening model. The crowd to be screened is then screened with the screening model: multiple individual features of the crowd to be screened are obtained and input into the screening model, which computes and outputs an output characteristic value set corresponding to the multiple individual features, the set including multiple output characteristic values. Finally, the output characteristic values in the set that meet a preset condition are added to a target characteristic value set, and the target group corresponding to the target characteristic value set is determined from the crowd to be screened. By training the screening model on multiple pieces of sample information, the embodiments of the present invention allow the crowd to be screened to be screened according to multiple features, which improves the accuracy of crowd screening.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 1 of the present invention;
Fig. 2 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 2 of the present invention;
Fig. 3 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 3 of the present invention;
Fig. 4 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 4 of the present invention;
Fig. 5 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 5 of the present invention;
Fig. 6 is a structural block diagram of the terminal device provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of the terminal device provided by Embodiment 7 of the present invention.
Detailed description
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention can be thoroughly understood. However, it will be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Fig. 1 shows the implementation flow of the crowd screening method based on big data provided by an embodiment of the present invention, detailed as follows:
In S101, multiple pieces of sample information of a sample population are obtained, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds.
In the embodiments of the present invention, to screen out the required target group, multiple pieces of sample information of multiple sample individuals in the sample population are obtained first, where one piece of sample information corresponds to one sample individual. The sample population is determined according to sample conditions, which include the sample source, the sample selection factor, and so on. The sample source determines where the sample population is drawn from; for example, the sample source may be the personal records of a certain city. The sample selection factor determines which sample individuals are selected from the source to form the sample population; for example, the sample selection factor may be males aged 50 or over. The sample condition formed by this sample source and sample selection factor is therefore that the sample population consists of the sample individuals screened out from the personal records of a certain city who are males aged 50 or over. Of course, the sample conditions are not limited to this example and can be determined according to the actual application scenario. To improve the accuracy of target group screening, after the sample conditions for the sample population are determined, the number of pieces of sample information obtained from the sample population can be set so that it is of a preset order of magnitude, which can be set freely according to the accuracy requirement; for example, the order of magnitude may be one thousand. The sample information in the sample population includes a sample environment feature and a sample characteristic value. The sample environment feature indicates the environmental state of the sample individual corresponding to the sample information and may include one or more sub-features; for example, it may include the sample individual's age, household drinking water type, marital status, occupation, blood pressure status, respiratory rate, diabetes status, drinking status, and smoking status. Likewise, the sub-features included in the sample environment feature can be determined according to the actual application scenario. The sample characteristic value in the sample information indicates the individual state of the sample individual corresponding to the sample information, where the individual state is tied to the purpose for which the target group is screened in the embodiments of the present invention. For example, if the target group affected by anxiety disorder needs to be screened out, the individual state is whether the individual is affected by anxiety disorder, and the value range of the sample characteristic value can be limited to the two integers 0 and 1: a sample characteristic value of 1 indicates that the corresponding sample individual is affected by anxiety disorder, and a sample characteristic value of 0 indicates that the corresponding sample individual is not affected by anxiety disorder.
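As an illustration of the structure described above, one piece of sample information can be sketched as a small record. This is a minimal sketch only: the class name `SampleInfo`, the field names, and the sub-feature keys are hypothetical and do not come from the patent text, and the 0/1 value range follows the anxiety disorder example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SampleInfo:
    """One piece of sample information for one sample individual (hypothetical layout)."""

    # Sample environment feature: sub-feature name -> numeric value.
    environment: Dict[str, float]
    # Sample characteristic value: 1 = affected by anxiety disorder, 0 = not affected.
    characteristic_value: int

    def __post_init__(self) -> None:
        # The example in the text limits the value range to the integers 0 and 1.
        if self.characteristic_value not in (0, 1):
            raise ValueError("sample characteristic value must be 0 or 1")


# Two illustrative sample individuals (made-up values).
samples = [
    SampleInfo({"age": 52, "smoking": 1, "drinking": 1}, 1),
    SampleInfo({"age": 61, "smoking": 0, "drinking": 0}, 0),
]
```

Keeping the sub-features in a dictionary lets the set of sub-features vary with the application scenario, as the text allows.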
Preferably, the multiple pieces of sample information are obtained from health records or a specific database. Generally, health records or specific databases such as hospital databases store many different types of sample individuals and a large volume of information; the sample environment features they contain are relatively complete, and the sample characteristic values are already defined and relatively accurate. The sample population can therefore be determined directly from the health records or the specific database, and the multiple pieces of sample information obtained based on that sample population.
In S102, the multiple pieces of sample information are fitted to a preset preprocessing model, and the fitted preprocessing model is used as the screening model.
The sample environment feature in a piece of sample information may include multiple sub-features, and the individual state is related to the multiple sub-features of the corresponding sample individual; that is, multiple sub-features can influence the sample characteristic value. For example, if the adult male represented by a sample individual is married, has three children, changes occupation frequently, and both drinks and smokes, then the probability that this sample individual suffers from anxiety disorder, that is, the probability that the sample characteristic value is 1, is relatively high. In this example, the marital status, number of children, occupation, drinking status, and smoking status included in the sample environment feature of the sample individual are associated with that individual's sample characteristic value. In general, however, the rule by which the individual state changes under the influence of multiple sub-features is not known. Therefore, in the embodiments of the present invention, the multiple pieces of sample information, whose sample environment features and sample characteristic values are already determined, are fitted to a preset preprocessing model. The preprocessing model is trained continuously during fitting, and the preprocessing model for which fitting is complete is finally output as the screening model.
In S103, multiple individual features of the crowd to be screened are obtained and input into the screening model to obtain an output characteristic value set corresponding to the multiple individual features, where the output characteristic value set includes multiple output characteristic values.
After the screening model is generated from the multiple pieces of sample information, the crowd to be screened is analyzed. First, multiple individual features of the crowd to be screened are obtained, where each individual feature corresponds to an individual to be screened in that crowd. Generally, an individual feature has the same format as the sample environment feature of a sample individual, that is, it contains the same kinds of sub-features. Optionally, after an individual feature is obtained, it is processed so that only the sub-features of the same kinds as those in the sample environment feature are retained. For example, if the individual feature includes name, age, gender, drinking status, and smoking status, and the sample environment feature includes age, drinking status, and smoking status, the individual feature is processed before it is input into the screening model so that only the sub-features for age, drinking status, and smoking status are kept, which reduces the complexity of subsequent calculation. In addition, if the value of some sub-feature in the processed individual feature is empty, the average value of that sub-feature across the sample environment features of the multiple sample individuals, or a preset value, is used as the value of that sub-feature in the individual feature. This prevents errors in subsequent calculation and makes the calculation more stable.
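The optional processing step above (keep only the sub-features that the sample environment feature also has, and fill empty values with the sample average) can be sketched as follows. The function name `prepare_individual_feature` and the dictionary layout are assumptions made for illustration, and only the average-value option is shown, not the preset-value option.

```python
from statistics import mean
from typing import Dict, List


def prepare_individual_feature(
    individual: Dict[str, object],
    sample_environments: List[Dict[str, float]],
) -> Dict[str, float]:
    """Keep only sub-features present in the sample environment feature and
    fill empty (None or missing) values with the sample average (hypothetical helper)."""
    # Sub-feature kinds are taken from the first sample; all samples are
    # assumed to share the same sub-feature names.
    kept_names = sample_environments[0].keys()
    prepared: Dict[str, float] = {}
    for name in kept_names:
        value = individual.get(name)
        if value is None:
            # Empty value: fall back to the average of this sub-feature
            # over the sample environment features.
            value = mean(env[name] for env in sample_environments)
        prepared[name] = value
    return prepared


sample_envs = [{"age": 50, "smoking": 1}, {"age": 60, "smoking": 0}]
person = {"name": "A", "gender": "M", "age": None, "smoking": 1}
print(prepare_individual_feature(person, sample_envs))
```

Sub-features such as name and gender are dropped because the sample environment feature does not contain them, which mirrors the example in the text.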
After the multiple individual features are determined, they are input into the screening model, and each individual feature yields a corresponding output characteristic value after calculation by the screening model. An output characteristic value has the same format as the sample characteristic value in the sample information, but it is not restricted to the value range of the sample characteristic value. The output characteristic values corresponding to the multiple individual features are collected to generate the output characteristic value set.
In S104, the output characteristic values in the output characteristic value set that meet a preset condition are added to a target characteristic value set, and the target group corresponding to the target characteristic value set is determined from the crowd to be screened.
After the output characteristic value set is determined, the output characteristic values that meet the preset condition are obtained from it and added to the target characteristic value set. In the embodiments of the present invention, the preset condition can be set so that the output characteristic values falling within a preset proportion of the set are added to the target characteristic value set. For example, the output characteristic values in the set are sorted by size, and after sorting, the output characteristic values in the top 10% are added to the target characteristic value set. The preset condition can also be set according to a feature threshold; the method of determining the feature threshold is described later. Because each output characteristic value corresponds to an individual to be screened in the crowd to be screened, the target group within that crowd can be determined from the target characteristic value set, which completes the screening of the crowd to be screened.
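The proportion-based preset condition above can be sketched as follows. The function name `select_top_fraction` and the identifier-to-value dictionary are hypothetical, and rounding the number of kept individuals up is an assumption, since the text does not say how a fractional count is handled.

```python
import math
from typing import Dict, List


def select_top_fraction(outputs: Dict[str, float], fraction: float = 0.10) -> List[str]:
    """Return the identifiers of the individuals whose output characteristic
    values rank in the top `fraction` of the set (hypothetical helper)."""
    # Sort individuals by output characteristic value, largest first.
    ranked = sorted(outputs, key=outputs.get, reverse=True)
    # Assumption: round the number of kept individuals up.
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


# Ten individuals to be screened with made-up output characteristic values.
outputs = {f"p{i}": i / 10 for i in range(10)}
target_group = select_top_fraction(outputs)  # top 10% of 10 individuals
```

Because each output characteristic value is keyed by its individual, the returned identifiers directly name the target group.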
As the embodiment shown in Fig. 1 illustrates, in the embodiments of the present invention, multiple pieces of sample information corresponding to multiple sample individuals in the sample population are obtained first, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual. The multiple pieces of sample information are fitted to a preset preprocessing model, and the fitted preprocessing model is output as the screening model. The crowd to be screened is then screened: the individual features of the multiple individuals to be screened are obtained and input into the screening model to obtain an output characteristic value set made up of multiple output characteristic values. Finally, the output characteristic values in the set that meet the preset condition are added to the target characteristic value set, and the individuals to be screened that correspond to the target characteristic value set are determined from the crowd to be screened as the target group. This realizes automatic screening and improves the accuracy of target group screening.
Fig. 2 shows, on the basis of Embodiment 1 of the present invention, the steps obtained by refining the process of fitting the multiple pieces of sample information to the preprocessing model and using the fitted preprocessing model as the screening model. An embodiment of the present invention provides an implementation flowchart of the crowd screening method based on big data; as shown in Fig. 2, the crowd screening method may include the following steps:
In S201, the multiple pieces of sample information are input into the preprocessing model to train the preprocessing model, where the sample environment feature of the sample information is used as the input parameter of the preprocessing model and the sample characteristic value of the sample information is used as the reference parameter of the preprocessing model.
When the multiple pieces of sample information are input into the preprocessing model, the sample environment feature in the sample information serves as the input parameter of the preprocessing model and the sample characteristic value in the sample information serves as its reference parameter. Specifically, a sample information set is built from the multiple pieces of sample information: $(Characters_{environ_1}, Value_{environ_1}), (Characters_{environ_2}, Value_{environ_2}), \ldots, (Characters_{environ_n}, Value_{environ_n})$, where $Characters_{environ_i}$ represents the sample environment feature of the $i$-th piece of sample information, which may include multiple sub-features, $Value_{environ_i}$ represents the sample characteristic value of the $i$-th piece of sample information, and $n$ represents the total number of pieces of sample information obtained from the sample population. The sample information set is input into the preprocessing model to train it. In the embodiments of the present invention, the formula by which the preprocessing model computes on the input parameter is:
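The formula itself is not reproduced in this text. A reconstruction from the definitions given for it ($K$ functions $f(\cdot)$ drawn from a function space, whose results are summed to give the predicted value) is sketched below; the hat notation for the predicted value and the symbol $\mathcal{F}$ for the function space are assumptions.

$$\widehat{Value}_{environ_i} = \sum_{k=1}^{K} f_k\left(Characters_{environ_i}\right), \quad f_k \in \mathcal{F}$$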
In the above formula, the left-hand side is the predicted value for the input parameter $Characters_{environ_i}$, that is, the output obtained after $Characters_{environ_i}$ is input into the preprocessing model as the input parameter and the preprocessing model performs its calculation. $f(\cdot)$ in the formula denotes a function in a function space, where a function space is the set of functions of a given type from one set to another; the $f(\cdot)$ functions are initially unknown. $K$ indicates that the preprocessing model contains $K$ such $f(\cdot)$ functions, and the final predicted value is obtained only after the results of all $f(\cdot)$ functions are summed.
The actual training of the preprocessing model depends on the above formula. In the embodiments of the present invention, the $f(\cdot)$ functions are learned by sequential, round-by-round learning, so that the $K$ functions finally obtained fit the data in the multiple pieces of sample information as closely as possible. For example, given the input parameter $Characters_{environ_i}$, the $t$-th round of prediction is performed while the prediction result of round $t-1$ is retained; that is, the preprocessing model is trained round by round so that the gap between the predicted value and the actual value ($Value_{environ_i}$) gradually decreases, as shown below:
……
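Only the ellipsis of these round-by-round formulas remains in this text. A reconstruction consistent with the description (each round keeps the previous round's prediction and adds one new function) is sketched below; the starting value of 0 for round 0 and the hat notation are assumptions.

$$\widehat{Value}^{(0)}_{environ_i} = 0$$

$$\widehat{Value}^{(1)}_{environ_i} = \widehat{Value}^{(0)}_{environ_i} + f_1\left(Characters_{environ_i}\right)$$

$$\cdots$$

$$\widehat{Value}^{(t)}_{environ_i} = \sum_{k=1}^{t} f_k\left(Characters_{environ_i}\right) = \widehat{Value}^{(t-1)}_{environ_i} + f_t\left(Characters_{environ_i}\right)$$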
In the above formula, the round-$t$ term is the predicted value after the $t$-th round of prediction, given the input parameter $Characters_{environ_i}$. To determine the $f(\cdot)$ function required in each round of sequential learning so that it fits the sample information set as closely as possible, an optimization function is constructed, with the specific formula as follows:
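The optimization function is not reproduced in this text. A reconstruction from the terms that the description names (an error function over the reference parameter and the round-$t$ prediction, the regularization term $\Omega(f_t)$, and the constant term $D$) is sketched below; the symbols $Obj^{(t)}$ and $l$ are assumptions, and the concrete form of the error function $l$ is not stated in the source.

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(Value_{environ_i},\ \widehat{Value}^{(t-1)}_{environ_i} + f_t\left(Characters_{environ_i}\right)\right) + \Omega(f_t) + D$$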
In the above formula, $Value_{environ_i}$ is the reference parameter in the sample information set corresponding to the input parameter $Characters_{environ_i}$; in the embodiments of the present invention it is the sample characteristic value in the sample information, and it serves as a parameter of the optimization function. In the optimization function, $\Omega(f_t)$ is the regularization term and $D$ is the constant term. The regularization term controls the degree of training of the optimization function and prevents the preprocessing model from overfitting the sample information set; the constant term is a constant set to limit the numerical range of the optimization function. Note that the remaining term of the optimization function is the error function, and optimizing the optimization function means determining a suitable $f(\cdot)$ function that makes the value of this error function as small as possible.
In the embodiments of the present invention, so that the optimization function can be optimized at the computational level, the above error function is expanded, with $g_i$ and $h_i$ defined as:
The expanded optimization function is:
Because the constant term has essentially no effect on the optimization process, the constant term is removed from the expanded optimization function, which yields the simplified function of the expanded optimization function at round $t$, with the formula as follows:
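The three formulas referred to above (the definitions of $g_i$ and $h_i$, the expanded optimization function, and the simplified function) are not reproduced in this text. A reconstruction is sketched below under the assumption that the expansion is a second-order Taylor expansion of the error function $l$ around the round $t-1$ prediction; the symbols $Obj^{(t)}$, $\widetilde{Obj}^{(t)}$, and $l$ are assumptions.

$$g_i = \frac{\partial\, l\left(Value_{environ_i},\ \widehat{Value}^{(t-1)}_{environ_i}\right)}{\partial\, \widehat{Value}^{(t-1)}_{environ_i}}, \qquad h_i = \frac{\partial^2 l\left(Value_{environ_i},\ \widehat{Value}^{(t-1)}_{environ_i}\right)}{\partial \left(\widehat{Value}^{(t-1)}_{environ_i}\right)^2}$$

$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(Value_{environ_i},\ \widehat{Value}^{(t-1)}_{environ_i}\right) + g_i\, f_t\left(Characters_{environ_i}\right) + \frac{1}{2} h_i\, f_t^2\left(Characters_{environ_i}\right) \right] + \Omega(f_t) + D$$

$$\widetilde{Obj}^{(t)} = \sum_{i=1}^{n} \left[ g_i\, f_t\left(Characters_{environ_i}\right) + \frac{1}{2} h_i\, f_t^2\left(Characters_{environ_i}\right) \right] + \Omega(f_t)$$

In this sketch the round $t-1$ error term is dropped together with $D$, on the assumption that it is fixed once round $t-1$ is complete and so does not affect the choice of $f_t$.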
In the final simplified function, the output value depends on the values of $g_i$ and $h_i$, so a suitable $f(\cdot)$ function can be determined quickly, which makes training simpler. In the embodiments of the present invention, the preprocessing model is trained with the sequential learning method described above together with the optimization function (the simplified function may also be used).
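To illustrate how a simplified function of this kind depends only on $g_i$ and $h_i$, the following sketch evaluates it for a candidate round-$t$ function. It assumes a squared-error error function and the form $\sum_i [g_i f_t + \frac{1}{2} h_i f_t^2] + \Omega(f_t)$; neither the error function nor this exact form is given in the source, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_error_grad_hess(
    reference: Sequence[float], previous_pred: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """g_i and h_i for the assumed error function l(y, p) = (y - p)**2."""
    g = [2.0 * (p - y) for y, p in zip(reference, previous_pred)]  # first derivative in p
    h = [2.0 for _ in reference]                                   # second derivative in p
    return g, h


def simplified_objective(
    g: Sequence[float],
    h: Sequence[float],
    f_t_values: Sequence[float],
    omega: float = 0.0,
) -> float:
    """Sum of g_i * f_t(x_i) + 0.5 * h_i * f_t(x_i)**2, plus the regularization value."""
    return sum(gi * fi + 0.5 * hi * fi * fi for gi, hi, fi in zip(g, h, f_t_values)) + omega


# Reference parameters (sample characteristic values) and round t-1 predictions.
reference = [1.0, 0.0]
previous = [0.0, 0.0]
g, h = squared_error_grad_hess(reference, previous)

# Two candidate outputs of f_t on the two samples; the lower value is preferred.
good = simplified_objective(g, h, [1.0, 0.0])   # corrects the first prediction
idle = simplified_objective(g, h, [0.0, 0.0])   # changes nothing
```

Comparing candidates this way needs only the two lists `g` and `h`, which can be computed once per round.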
In S202, the trained preprocessing model is output as the screening model.
After all input parameters and reference parameters in the sample information set have been input into the preprocessing model and training of the preprocessing model is complete, the trained preprocessing model (mainly the trained $f(\cdot)$ functions) is output as the screening model. When the crowd to be screened needs to be screened, the individual feature $Characters_{environ_x}$ of an individual to be screened is input into the screening model, and the predicted value for that individual is calculated with the optimized calculation formula in the screening model.
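Prediction with the screening model, as described above, amounts to summing the results of the trained $f(\cdot)$ functions for one individual feature. A minimal sketch follows; the two example functions are hypothetical stand-ins for trained functions and their numbers are made up.

```python
from typing import Callable, Dict, List

Feature = Dict[str, float]


def predict(trained_functions: List[Callable[[Feature], float]], feature: Feature) -> float:
    """Output characteristic value: the sum of all trained f() results for one individual."""
    return sum(f(feature) for f in trained_functions)


# Hypothetical stand-ins for K = 2 trained functions.
trained = [
    lambda x: 0.5 if x["smoking"] else 0.0,
    lambda x: 0.25 if x["age"] >= 50 else 0.0,
]

output_value = predict(trained, {"smoking": 1, "age": 55})
```

Since the sum is not clipped, the output characteristic value is not restricted to the value range of the sample characteristic value, in line with the description of S103.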
As the embodiment shown in Fig. 2 illustrates, in the embodiments of the present invention, the sample environment feature in the sample information is used as the input parameter of the preprocessing model and the sample characteristic value of the sample information is used as its reference parameter, so that the multiple pieces of sample information are input into the preprocessing model to train it. The trained preprocessing model is finally output as the screening model, which improves how closely the screening model fits the multiple pieces of sample information and improves the accuracy of crowd screening performed with the screening model.
Fig. 3 shows, on the basis of Embodiment 1 of the present invention, one specific process obtained by refining the step of adding the output characteristic values in the output characteristic value set that meet the preset condition to the target characteristic value set. An embodiment of the present invention provides an implementation flowchart of the crowd screening method based on big data; as shown in Fig. 3, the crowd screening method may include the following steps:
In S301, a feature threshold is obtained, where the feature threshold is used to judge whether an output characteristic value meets the preset condition.
In the embodiments of the present invention, after the screening model is determined, the multiple individual features of the crowd to be screened are obtained and automatically input into the screening model to obtain an output characteristic value set containing multiple output characteristic values. To obtain the output characteristic values in the set that meet the preset condition, the feature threshold is obtained first; an output characteristic value meets the preset condition if it is greater than or equal to the feature threshold.
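The threshold-based preset condition can be sketched as a simple filter. The function name `build_target_set` and the identifier-to-value dictionary are hypothetical, and the threshold is passed in as a plain number here.

```python
from typing import Dict


def build_target_set(outputs: Dict[str, float], feature_threshold: float) -> Dict[str, float]:
    """Keep the output characteristic values that are greater than or equal to
    the feature threshold, keyed by individual (hypothetical helper)."""
    return {
        person: value
        for person, value in outputs.items()
        if value >= feature_threshold  # the preset condition from S301
    }


outputs = {"a": 1.8, "b": 1.7, "c": 0.4}
target_set = build_target_set(outputs, 1.7)
```

The keys of the returned dictionary identify the individuals to be screened who form the target group.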
Fig. 4 shows, on the basis of step S301 and for the case where the sample characteristic value takes either a first characteristic value or a second characteristic value, a specific process obtained by refining the step of obtaining the characteristic threshold. An embodiment of the present invention provides an implementation flowchart of the big-data-based crowd screening method. As shown in Fig. 4, the crowd screening method may include the following steps:
In S401, big data analysis is performed on the multiple pieces of sample information to determine the quantitative proportion, among the multiple pieces of sample information, of the sample information whose sample characteristic value is the first characteristic value.
The value range of the sample characteristic value may differ between application scenarios. This embodiment is described for the case where the sample characteristic value takes either a first characteristic value or a second characteristic value and the first characteristic value is greater than the second characteristic value. For example, a sample characteristic value equal to the first characteristic value indicates that the corresponding sample individual is affected by anxiety disorder, a sample characteristic value equal to the second characteristic value indicates that the corresponding sample individual is not affected by anxiety disorder, and the purpose of screening is to pick out, from the crowd to be screened, the target group affected by anxiety disorder. Big data analysis is first performed on the multiple pieces of sample information to extract the first sample quantity, namely the number of pieces of sample information whose sample characteristic value is the first characteristic value, and thereby to obtain the quantitative proportion of the first sample quantity among all the sample information.
In S402, the characteristic threshold is calculated from the quantitative proportion, the first characteristic value and the second characteristic value.
Because the sample characteristic value can only be the first characteristic value or the second characteristic value, the difference between the first characteristic value and the second characteristic value is calculated, and the product of that difference and the quantitative proportion is subtracted from the first characteristic value to obtain the characteristic threshold. For example, if the first characteristic value is 2 and accounts for 30% of the sample information, and the second characteristic value is 1 and accounts for 70%, then the difference is 2 - 1 = 1 and the characteristic threshold is 2 - 1 × 30% = 1.7. This calculation applies only where the sample characteristic value is either the first characteristic value or the second characteristic value and the first characteristic value is greater than the second characteristic value. For other possible sample characteristic values the calculation can be extended on this basis, which is not described further in this embodiment.
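The calculation in S401 and S402 can be illustrated with a minimal Python sketch. The function names and the sample list are hypothetical and are not part of the original disclosure; the sketch only assumes, as the text does, that every sample carries one of two characteristic values and that the first is greater than the second.

```python
def first_value_proportion(sample_values, first_value):
    """S401: share of samples whose characteristic value equals first_value."""
    if not sample_values:
        raise ValueError("sample_values must not be empty")
    matches = sum(1 for value in sample_values if value == first_value)
    return matches / len(sample_values)


def characteristic_threshold(first_value, second_value, proportion):
    """S402: first value minus (difference x proportion)."""
    if first_value <= second_value:
        raise ValueError("first_value must be greater than second_value")
    return first_value - (first_value - second_value) * proportion


# Hypothetical data: 3 of 10 samples carry the first characteristic value (2).
samples = [2, 1, 1, 2, 1, 1, 1, 2, 1, 1]
p = first_value_proportion(samples, first_value=2)
threshold = characteristic_threshold(2, 1, p)
print(p, threshold)
```

With these hypothetical inputs the proportion is 30% and the threshold is approximately 1.7, matching the worked figures in the text up to floating-point rounding.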
Fig. 5 shows, on the basis of step S301, another specific process obtained by refining the step of obtaining the characteristic threshold. An embodiment of the present invention provides an implementation flowchart of the big-data-based crowd screening method. As shown in Fig. 5, the crowd screening method may include the following steps:
In S501, the sample environment features of the multiple pieces of sample information are input to the screening model, and the multiple result characteristic values output by the screening model, which correspond to the sample environment features of the multiple pieces of sample information, are obtained.
In this embodiment, after the screening model has been generated, the multiple pieces of sample information are input to it. Specifically, the sample environment features of the multiple pieces of sample information are input to the screening model as input parameters, but the sample characteristic values of the sample information are not used as reference parameters of the screening model. Instead, the calculation formula of the screening model is applied directly to the sample environment features to compute multiple output parameters, namely the multiple result characteristic values corresponding to the sample environment features. In general, the result characteristic value that the screening model computes for a piece of sample information differs from that sample information's original sample characteristic value.
In S502, the multiple result characteristic values are sorted to generate a result characteristic value sequence.
After the multiple result characteristic values have been obtained, they are sorted by numerical value to generate the result characteristic value sequence, whose head is the largest result characteristic value. Note that if a first result characteristic value and a second result characteristic value among the multiple result characteristic values are equal, they are placed in the sequence according to the input order of the sample information to which they correspond. Specifically, an ordered writing mechanism is set up. When a result characteristic value is to be written to the result characteristic value sequence, it is first determined whether the sequence already contains an existing result characteristic value equal to it. If no equal value exists, the sequence is searched, by numerical value, for the smallest existing result characteristic value that is greater than the value to be written, and the value is written at the position after that existing value. If an equal existing value is found, the value is written at the position after that existing value; if several equal existing values are found, the last of them is located and the value is written at the position after it.
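The ordered writing mechanism described for S502 amounts to a stable insertion into a descending sequence. The following Python sketch illustrates it under stated assumptions: the function name, the `(sample_id, value)` tuple layout and the sample data are hypothetical, and a linear scan is used purely for clarity.

```python
def insert_result(sequence, sample_id, value):
    """Insert (sample_id, value) into a descending sequence.

    The new entry is placed after every existing entry whose value is
    greater than or equal to it, so equal values keep their input order.
    """
    position = 0
    while position < len(sequence) and sequence[position][1] >= value:
        position += 1
    sequence.insert(position, (sample_id, value))


# Hypothetical result characteristic values, written in input order.
sequence = []
for sample_id, value in [("a", 0.7), ("b", 0.9), ("c", 0.7), ("d", 0.3), ("e", 0.9)]:
    insert_result(sequence, sample_id, value)

print([sample_id for sample_id, _ in sequence])
```

Skipping past every entry that is greater than or equal to the new value covers both branches of the text: with no equal entry the value lands after the smallest larger one, and with equal entries it lands after the last of them.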
In S503, a preset screening ratio is obtained, and the result characteristic value corresponding to the screening ratio is found in the result characteristic value sequence and output as the characteristic threshold.
After the result characteristic value sequence has been generated, the preset screening ratio is obtained, the screening position corresponding to the screening ratio is located in the sequence, and the result characteristic value at that position is output as the characteristic threshold. For example, if 300 result characteristic values have been written to the sequence and the screening ratio is 10%, the screening position is the 30th, so the value of the 30th result characteristic value in the sequence is extracted as the characteristic threshold. Optionally, the screening ratio may be determined by a big data analysis method. For example, multiple pieces of nationwide sample information may be collected, each containing a sample environment feature and a sample characteristic value. The screening sample quantity, namely the number of samples whose sample characteristic value equals the value corresponding to the screening purpose, is counted first, and the proportion of the screening sample quantity among all the nationwide sample information is then used as the screening ratio. Determining the ratio by big data analysis may make the resulting characteristic threshold more generally applicable. The screening ratio may of course also be preset manually.
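The lookup in S503 can be sketched in Python as follows. The function name and the data are hypothetical. Rounding the screening position to the nearest integer, with a minimum of 1, is an assumption of this sketch, since the text only gives an example in which the position is a whole number.

```python
def threshold_from_ratio(sorted_values, screening_ratio):
    """Return the value at the screening position of a descending sequence."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0 < screening_ratio <= 1:
        raise ValueError("screening_ratio must be in (0, 1]")
    # Assumption: round to the nearest 1-based position, at least 1.
    position = max(1, round(len(sorted_values) * screening_ratio))
    return sorted_values[position - 1]


# Hypothetical sequence of 300 result characteristic values, largest first.
result_values = sorted(range(1, 301), reverse=True)
ratio_threshold = threshold_from_ratio(result_values, 0.10)
print(ratio_threshold)
```

For 300 values and a 10% ratio the sketch selects the 30th entry, as in the example in the text.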
In S302, the output characteristic values in the output characteristic value set that are greater than or equal to the characteristic threshold are extracted and added to the target characteristic value set.
After the characteristic threshold has been determined, the output characteristic values that are greater than or equal to it are extracted from the output characteristic value set. The extracted output characteristic values correspond to the target individuals of the screening, so they are added to the target characteristic value set, from which the target group corresponding to the target characteristic values is subsequently determined in the crowd to be screened.
As can be seen from the embodiment shown in Fig. 3, in this embodiment of the present invention the characteristic threshold is obtained by one of several methods, the output characteristic values that are greater than or equal to the characteristic threshold are extracted from the output characteristic value set, and the extracted output characteristic values are added to the target characteristic value set. Setting a characteristic threshold makes generation of the target characteristic value set simpler and more efficient.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 6 shows a structural block diagram of a terminal device provided by an embodiment of the present invention. The units included in the terminal device are used to perform the steps in the embodiment corresponding to Fig. 1; for details, refer to Fig. 1 and the related description of the embodiment corresponding to Fig. 1. For ease of description, only the parts related to this embodiment are shown.
Referring to Fig. 6, the terminal device includes:
A first acquisition unit 61, configured to obtain multiple pieces of sample information of a sample population, the sample information including a sample environment feature and a sample characteristic value, the sample characteristic value being used to describe the individual state of the sample individual in the sample population to which the sample information corresponds;
A fitting unit 62, configured to fit the multiple pieces of sample information to a preset preprocessing model and to use the fitted preprocessing model as a screening model;
A second acquisition unit 63, configured to obtain multiple personal features of a crowd to be screened and to input the multiple personal features to the screening model to obtain an output characteristic value set corresponding to the multiple personal features, the output characteristic value set including multiple output characteristic values;
A target determination unit 64, configured to add the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and to determine, from the crowd to be screened, a target group corresponding to the target characteristic value set.
Optionally, the fitting unit 62 includes:
An input unit, configured to input the multiple pieces of sample information to the preprocessing model to train the preprocessing model, wherein the sample environment feature of the sample information is used as the input parameter of the preprocessing model and the sample characteristic value of the sample information is used as the reference parameter of the preprocessing model;
An output unit, configured to output the trained preprocessing model as the screening model.
Optionally, the target determination unit 64 includes:
A threshold acquisition unit, configured to obtain a characteristic threshold, the characteristic threshold being used to judge whether an output characteristic value meets the preset condition;
An extraction unit, configured to extract the output characteristic values in the output characteristic value set that are greater than or equal to the characteristic threshold and to add the extracted output characteristic values to the target characteristic value set.
Optionally, the sample characteristic value is a first characteristic value or a second characteristic value, and the threshold acquisition unit includes:
An analysis unit, configured to perform big data analysis on the multiple pieces of sample information to determine the quantitative proportion, among the multiple pieces of sample information, of the sample information whose sample characteristic value is the first characteristic value;
A calculation unit, configured to calculate the characteristic threshold from the quantitative proportion, the first characteristic value and the second characteristic value.
Optionally, the threshold acquisition unit includes:
A feature input unit, configured to input the sample environment features of the multiple pieces of sample information to the screening model and to obtain the multiple result characteristic values output by the screening model, which correspond to the sample environment features of the multiple pieces of sample information;
A sorting unit, configured to sort the multiple result characteristic values to generate a result characteristic value sequence;
A threshold output unit, configured to obtain a preset screening ratio, to find the result characteristic value corresponding to the screening ratio in the result characteristic value sequence, and to output it as the characteristic threshold.
Therefore, the terminal device provided by this embodiment of the present invention fits multiple pieces of sample information to the preprocessing model to obtain a screening model and determines the target group according to the screening model, so that crowd screening can be performed on the basis of multiple features, which improves the accuracy of crowd screening.
Fig. 7 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 that is stored in the memory 71 and executable on the processor 70, for example a terminal device control program. When executing the computer program 72, the processor 70 implements the steps in each of the above embodiments of the big-data-based crowd screening method, such as steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the units in each of the above device embodiments, such as the functions of units 61 to 64 shown in Fig. 6.
Illustratively, the computer program 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a first acquisition unit, a fitting unit, a second acquisition unit and a target determination unit, whose specific functions are as follows:
A first acquisition unit, configured to obtain multiple pieces of sample information of a sample population, the sample information including a sample environment feature and a sample characteristic value, the sample characteristic value being used to describe the individual state of the sample individual in the sample population to which the sample information corresponds;
A fitting unit, configured to fit the multiple pieces of sample information to a preset preprocessing model and to use the fitted preprocessing model as a screening model;
A second acquisition unit, configured to obtain multiple personal features of a crowd to be screened and to input the multiple personal features to the screening model to obtain an output characteristic value set corresponding to the multiple personal features, the output characteristic value set including multiple output characteristic values;
A target determination unit, configured to add the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and to determine, from the crowd to be screened, a target group corresponding to the target characteristic value set.
The terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device 7 may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on it; the terminal device 7 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the terminal device 7 may also include input/output devices, network access devices, buses and the like.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card fitted to the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer program and the other programs and data required by the terminal device 7. The memory 71 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units is given only as an example. In practical applications the above functions may be assigned to different functional units as required; that is, the internal structure of the device may be divided into different functional units to perform all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units in the above system, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A big-data-based crowd screening method, characterized by comprising:
Obtaining multiple pieces of sample information of a sample population, the sample information comprising a sample environment feature and a sample characteristic value, the sample characteristic value being used to describe the individual state of the sample individual in the sample population to which the sample information corresponds;
Fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal features to the screening model to obtain an output characteristic value set corresponding to the multiple personal features, the output characteristic value set comprising multiple output characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, a target group corresponding to the target characteristic value set.
2. The crowd screening method according to claim 1, characterized in that fitting the multiple pieces of sample information to the preset preprocessing model and using the fitted preprocessing model as the screening model comprises:
Inputting the multiple pieces of sample information to the preprocessing model to train the preprocessing model, wherein the sample environment feature of the sample information is used as the input parameter of the preprocessing model and the sample characteristic value of the sample information is used as the reference parameter of the preprocessing model;
Outputting the trained preprocessing model as the screening model.
3. The crowd screening method according to claim 1, characterized in that adding the output characteristic values in the output characteristic value set that meet the preset condition to the target characteristic value set comprises:
Obtaining a characteristic threshold, the characteristic threshold being used to judge whether an output characteristic value meets the preset condition;
Extracting the output characteristic values in the output characteristic value set that are greater than or equal to the characteristic threshold, and adding the extracted output characteristic values to the target characteristic value set.
4. The crowd screening method according to claim 3, characterized in that the sample characteristic value is a first characteristic value or a second characteristic value, and obtaining the characteristic threshold comprises:
Performing big data analysis on the multiple pieces of sample information to determine the quantitative proportion, among the multiple pieces of sample information, of the sample information whose sample characteristic value is the first characteristic value;
Calculating the characteristic threshold from the quantitative proportion, the first characteristic value and the second characteristic value.
5. The crowd screening method according to claim 3, characterized in that obtaining the characteristic threshold comprises:
Inputting the sample environment features of the multiple pieces of sample information to the screening model, and obtaining the multiple result characteristic values output by the screening model, which correspond to the sample environment features of the multiple pieces of sample information;
Sorting the multiple result characteristic values to generate a result characteristic value sequence;
Obtaining a preset screening ratio, finding the result characteristic value corresponding to the screening ratio in the result characteristic value sequence, and outputting it as the characteristic threshold.
6. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
Obtaining multiple pieces of sample information of a sample population, the sample information comprising a sample environment feature and a sample characteristic value, the sample characteristic value being used to describe the individual state of the sample individual in the sample population to which the sample information corresponds;
Fitting the multiple pieces of sample information to a preset preprocessing model, and using the fitted preprocessing model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal features to the screening model to obtain an output characteristic value set corresponding to the multiple personal features, the output characteristic value set comprising multiple output characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, a target group corresponding to the target characteristic value set.
7. The terminal device according to claim 6, characterized in that fitting the multiple pieces of sample information to the preset preprocessing model and using the fitted preprocessing model as the screening model comprises:
Inputting the multiple pieces of sample information to the preprocessing model to train the preprocessing model, wherein the sample environment feature of the sample information is used as the input parameter of the preprocessing model and the sample characteristic value of the sample information is used as the reference parameter of the preprocessing model;
Outputting the trained preprocessing model as the screening model.
8. The terminal device according to claim 6, characterized in that adding the output characteristic values in the output characteristic value set that meet the preset condition to the target characteristic value set comprises:
Obtaining a characteristic threshold, the characteristic threshold being used to judge whether an output characteristic value meets the preset condition;
Extracting the output characteristic values in the output characteristic value set that are greater than or equal to the characteristic threshold, and adding the extracted output characteristic values to the target characteristic value set.
9. The terminal device according to claim 8, characterized in that the sample characteristic value is a first characteristic value or a second characteristic value, and obtaining the characteristic threshold comprises:
Performing big data analysis on the multiple pieces of sample information to determine the quantitative proportion, among the multiple pieces of sample information, of the sample information whose sample characteristic value is the first characteristic value;
Calculating the characteristic threshold from the quantitative proportion, the first characteristic value and the second characteristic value.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the crowd screening method according to any one of claims 1 to 5.
CN201810455659.4A 2018-05-14 2018-05-14 Crowd's screening technique based on big data and terminal device Withdrawn CN108629381A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810455659.4A CN108629381A (en) 2018-05-14 2018-05-14 Crowd's screening technique based on big data and terminal device
PCT/CN2018/097561 WO2019218482A1 (en) 2018-05-14 2018-07-27 Big data-based population screening method and apparatus, terminal device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810455659.4A CN108629381A (en) 2018-05-14 2018-05-14 Crowd's screening technique based on big data and terminal device

Publications (1)

Publication Number Publication Date
CN108629381A true CN108629381A (en) 2018-10-09

Family

ID=63693185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810455659.4A Withdrawn CN108629381A (en) 2018-05-14 2018-05-14 Crowd's screening technique based on big data and terminal device

Country Status (2)

Country Link
CN (1) CN108629381A (en)
WO (1) WO2019218482A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726242A (en) * 2018-12-29 2019-05-07 陕西西部资信股份有限公司 Data processing method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610168B (en) * 2021-08-11 2024-05-14 平安科技(深圳)有限公司 Data processing method, device, equipment and medium
CN114334696B (en) * 2021-12-30 2024-03-05 中国电信股份有限公司 Quality detection method and device, electronic equipment and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106214120A (en) * 2016-08-19 2016-12-14 靳晓亮 A kind of methods for screening of glaucoma
CN107895596A (en) * 2016-12-19 2018-04-10 平安科技(深圳)有限公司 Risk Forecast Method and system
CN106706627A (en) * 2017-03-06 2017-05-24 温鹏 Combined application of hematin and beta-glucuronidase in detection of nasopharyngeal epithelial cell heterogeneity hyperplasia and reagent kit

Also Published As

Publication number Publication date
WO2019218482A1 (en) 2019-11-21

Similar Documents

Publication Publication Date Title
CN109902222B (en) Recommendation method and device
WO2021155706A1 (en) Method and device for training business prediction model by using unbalanced positive and negative samples
WO2017206936A1 (en) Machine learning based network model construction method and apparatus
CN109857860A (en) File classification method, device, computer equipment and storage medium
CN111967971B (en) Bank customer data processing method and device
CN110968701A (en) Relationship map establishing method, device and equipment for graph neural network
CN107423442A (en) Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis
CN109376844A (en) The automatic training method of neural network and device recommended based on cloud platform and model
CN108898476A (en) A kind of loan customer credit-graded approach and device
JP6908302B2 (en) Learning device, identification device and program
CN108629381A (en) Crowd's screening technique based on big data and terminal device
CN113902131B (en) Updating method of node model for resisting discrimination propagation in federal learning
CN110147389A (en) Account number treating method and apparatus, storage medium and electronic device
CN116305289B (en) Medical privacy data processing method, device, computer equipment and storage medium
Chen et al. Research on credit card default prediction based on k-means SMOTE and BP neural network
CN111062444A (en) Credit risk prediction method, system, terminal and storage medium
Wang et al. Research on maize disease recognition method based on improved resnet50
CN107223260A (en) Method for dynamicalling update grader complexity
CN113011895A (en) Associated account sample screening method, device and equipment and computer storage medium
CN110334720A (en) Feature extracting method, device, server and the storage medium of business datum
CN113380360B (en) Similar medical record retrieval method and system based on multi-mode medical record map
Li et al. A credit risk model with small sample data based on G-XGBoost
CN116090618A (en) Operation situation sensing method and device for power communication network
CN107402984B (en) A kind of classification method and device based on theme
Harikumar et al. Prescriptive analytics through constrained Bayesian optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181009
