CN108629381A - Crowd screening method based on big data, and terminal device - Google Patents
- Publication number
- CN108629381A (application number CN201810455659.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- model
- characteristic value
- screening
- output characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention belongs to the technical field of data processing and provides a crowd screening method based on big data, a terminal device, and a computer-readable storage medium. The method includes: obtaining multiple pieces of sample information of a sample population, each piece of sample information including a sample environment feature and a sample characteristic value; fitting the multiple pieces of sample information to a preset pretreatment model, and using the fitted pretreatment model as a screening model; obtaining multiple personal features of a crowd to be screened and inputting them into the screening model to obtain an output characteristic value set corresponding to the personal features, the set including multiple output characteristic values; and adding the output characteristic values in the set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set. The invention screens the crowd to be screened according to multiple features, which improves the accuracy of crowd screening.
Description
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a crowd screening method based on big data, a terminal device, and a computer-readable storage medium.
Background
In practice, there is often a need to screen a target group out of a very large population. The screening is usually based on a single feature, for example screening out, from all residents of a county, the target group aged over fifty. Statistics is the science of understanding the aggregate quantitative characteristics and quantitative relationships of objective phenomena. When the target group to be screened out is one that has become distinctive because of some change of state, statistics is needed to find the features correlated with that change of state, so that the target group can be screened out according to those correlated features.
However, in existing crowd screening methods, few correlated features can be found through statistics, so the accuracy with which the crowd screened out by those features matches the target group is low. For example, when screening for people with chronic obstructive pulmonary disease, the screening relies mainly on statistics of the patient population: if the statistics show that many people in a certain region have the disease, or that many people in a certain age range have it, that region or age range is used as the correlated feature for screening out the target group. In fact, chronic obstructive pulmonary disease has many more correlated features, so the actual target group should not be limited to that region or age range. In summary, existing crowd screening methods can rely on only a few correlated features, and their screening accuracy is low.
Summary of the invention
In view of this, embodiments of the present invention provide a crowd screening method based on big data, a terminal device, and a computer-readable storage medium, to solve the problem in the prior art that crowd screening can rely on only a few correlated features and that the screening accuracy is low.
A first aspect of the embodiments of the present invention provides a crowd screening method based on big data, including:
Obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
Fitting the multiple pieces of sample information to a preset pretreatment model, and using the fitted pretreatment model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal features into the screening model to obtain an output characteristic value set corresponding to the multiple personal features, where the output characteristic value set includes multiple output characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
A second aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, where the processor implements the following steps when executing the computer program:
Obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
Fitting the multiple pieces of sample information to a preset pretreatment model, and using the fitted pretreatment model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal features into the screening model to obtain an output characteristic value set corresponding to the multiple personal features, where the output characteristic value set includes multiple output characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium that stores a computer program, where the computer program implements the following steps when executed by a processor:
Obtaining multiple pieces of sample information of a sample population, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds;
Fitting the multiple pieces of sample information to a preset pretreatment model, and using the fitted pretreatment model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal features into the screening model to obtain an output characteristic value set corresponding to the multiple personal features, where the output characteristic value set includes multiple output characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset condition to a target characteristic value set, and determining, from the crowd to be screened, the target group corresponding to the target characteristic value set.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention obtain multiple pieces of sample information of a sample population, where each piece of sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual to which that sample information corresponds. The multiple pieces of sample information are then fitted to a preset pretreatment model, and the fitted pretreatment model is output as the screening model, which is used to screen the crowd to be screened. Multiple personal features of the crowd to be screened are obtained and input into the screening model, which computes and outputs an output characteristic value set corresponding to the personal features; the set includes multiple output characteristic values. Finally, the output characteristic values in the set that meet a preset condition are added to a target characteristic value set, and the target group corresponding to the target characteristic value set is determined from the crowd to be screened. Because the screening model is trained on multiple pieces of sample information, the crowd to be screened can be screened according to multiple features, which improves the accuracy of crowd screening.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 1 of the present invention;
Fig. 2 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 2 of the present invention;
Fig. 3 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 3 of the present invention;
Fig. 4 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 4 of the present invention;
Fig. 5 is an implementation flowchart of the crowd screening method based on big data provided by Embodiment 5 of the present invention;
Fig. 6 is a structural block diagram of the terminal device provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of the terminal device provided by Embodiment 7 of the present invention.
Detailed description
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to a person skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below through specific embodiments.
Fig. 1 shows the implementation flow of the crowd screening method based on big data provided by an embodiment of the present invention, detailed as follows:
In S101, multiple pieces of sample information of a sample population are obtained. The sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual in the sample population to which the sample information corresponds.
In the embodiments of the present invention, in order to screen out the required target group, multiple pieces of sample information are first obtained for multiple sample individuals in a sample population, where one piece of sample information corresponds to one sample individual. The sample population is determined by sample conditions, which include the sample source, the sample selection factor, and so on. The sample source determines where the sample population is drawn from, for example the personal records of a certain city. The sample selection factor determines which sample individuals from that source make up the sample population, for example males aged 50 or above. The sample condition formed by this sample source and sample selection factor is therefore that the sample population consists of the sample individuals selected from the personal records of that city who are males aged 50 or above. Of course, the sample conditions are not limited to this example and can be determined according to the actual application scenario. To improve the accuracy of target group screening, once the sample conditions for the sample population are determined, the number of pieces of sample information obtained from the sample population can be set so that it is at a preset order of magnitude; the order of magnitude can be set freely according to the accuracy requirement, for example 1,000. Each piece of sample information in the sample population includes a sample environment feature and a sample characteristic value. The sample environment feature indicates the environmental state of the sample individual to which the sample information corresponds and may include one or more sub-features, such as the sample individual's age, household drinking water type, marital status, occupation, blood pressure, respiratory rate, diabetes status, drinking status, and smoking status. Likewise, the sub-features included in the sample environment feature can be determined according to the actual application scenario. The sample characteristic value in the sample information indicates the individual state of the corresponding sample individual, where the individual state is associated with the purpose of screening the target group. For example, if the target group to be screened out is people affected by anxiety disorder, the individual state is whether the individual is affected by anxiety disorder, and the value range of the sample characteristic value can be limited to the two integers 0 and 1: a sample characteristic value of 1 indicates that the corresponding sample individual is affected by anxiety disorder, and a sample characteristic value of 0 indicates that the corresponding sample individual is not.
Preferably, the multiple pieces of sample information are obtained from health records or a specific database. Generally, health records or a specific database, such as a hospital database, store different types of sample individuals and a large volume of information; the sample environment features they contain are relatively complete, and the sample characteristic values are already defined with relatively high accuracy. The sample population can therefore be determined directly from the health records or the specific database, and the multiple pieces of sample information can then be obtained based on that sample population.
In S102, the multiple pieces of sample information are fitted to a preset pretreatment model, and the fitted pretreatment model is used as the screening model.
The sample environment feature in a piece of sample information may include multiple sub-features, and the individual state is related to the multiple sub-features of the corresponding sample individual; that is, multiple sub-features can influence the sample characteristic value. For example, if the adult male represented by a sample individual is married, has three children, changes occupation frequently, and both drinks and smokes, then the probability that this sample individual suffers from anxiety disorder, that is, the probability that the sample characteristic value is 1, is relatively high. In this example, the marital status, number of children, occupation, drinking status, and smoking status included in the sample environment feature of the sample individual are associated with that individual's sample characteristic value. Ordinarily, however, the rule by which the individual state varies under the influence of multiple sub-features cannot be known. Therefore, in the embodiments of the present invention, the multiple pieces of sample information, whose sample environment features and sample characteristic values are already determined, are fitted to a preset pretreatment model. The pretreatment model is trained continuously during the fitting process, and the pretreatment model for which fitting is complete is finally output as the screening model.
In S103, multiple personal features of the crowd to be screened are obtained and input into the screening model to obtain an output characteristic value set corresponding to the multiple personal features, where the output characteristic value set includes multiple output characteristic values.
After the screening model is generated from the multiple pieces of sample information, the crowd to be screened is analyzed. First, multiple personal features of the crowd to be screened are obtained, where each personal feature corresponds to one individual to be screened in the crowd. Generally, a personal feature has the same format as the sample environment feature of a sample individual, that is, it contains the same kinds of sub-features. Optionally, after a personal feature is obtained, it is processed so that only the sub-features of the same kinds as those in the sample environment feature are retained. For example, if the personal feature includes name, age, gender, drinking status, and smoking status, and the sample environment feature includes age, drinking status, and smoking status, then before the personal feature is input into the screening model it is processed so that only the age, drinking status, and smoking status sub-features are kept, which reduces the complexity of subsequent calculation. In addition, if the value of a sub-feature in the processed personal feature is empty, the average value of that sub-feature across the sample environment features of the multiple sample individuals, or a preset value, is used as its value in the personal feature, which prevents errors in subsequent calculation and improves computational stability.
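The optional processing of a personal feature described above (keep only the sub-features that also appear in the sample environment feature, then fill empty values with the sample average or a preset value) might look like the following sketch. The function name `align_personal_feature`, the dictionary layout, and the assumption that sub-features are numeric are illustrative choices, not details given in the patent.

```python
def align_personal_feature(personal, sample_environments, preset=None):
    """Retain sub-features shared with the sample environment feature and fill empty values.

    personal: dict of sub-feature name -> value (None means empty).
    sample_environments: list of dicts, one per sample individual.
    preset: optional dict of fallback values used instead of the sample average.
    """
    kinds = list(sample_environments[0].keys())  # kinds of sub-features the model was trained on
    aligned = {}
    for kind in kinds:
        value = personal.get(kind)
        if value is None:
            if preset is not None and kind in preset:
                value = preset[kind]  # preset value takes priority when supplied
            else:
                known = [env[kind] for env in sample_environments if env.get(kind) is not None]
                value = sum(known) / len(known)  # average over the sample individuals
        aligned[kind] = value
    return aligned

sample_envs = [
    {"age": 60, "drinking": 1, "smoking": 0},
    {"age": 50, "drinking": 0, "smoking": 1},
]
person = {"name": "A", "age": 55, "gender": "male", "drinking": None, "smoking": 1}
aligned = align_personal_feature(person, sample_envs)
```

In this toy case the name and gender sub-features are dropped, and the empty drinking status is replaced by the sample average.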
After the multiple personal features are determined, they are input into the screening model. Each personal feature yields a corresponding output characteristic value after calculation by the screening model. The output characteristic value has the same format as the sample characteristic value of the sample information but is not limited to the value range of the sample characteristic value. The multiple output characteristic values corresponding to the multiple personal features are collected to generate the output characteristic value set.
In S104, the output characteristic values in the output characteristic value set that meet the preset condition are added to a target characteristic value set, and the target group corresponding to the target characteristic value set is determined from the crowd to be screened.
After the output characteristic value set is determined, the output characteristic values that meet the preset condition are obtained from it and added to the target characteristic value set. In the embodiments of the present invention, the preset condition can be set so that the output characteristic values within a preset proportion of the output characteristic value set are added to the target characteristic value set. For example, the output characteristic values in the set are sorted by size, and after sorting, the output characteristic values ranked in the top 10% are added to the target characteristic value set. The preset condition can also be set according to a characteristic threshold; the method for determining the characteristic threshold is described in detail later. Because each output characteristic value corresponds to an individual to be screened in the crowd to be screened, the target group in the crowd to be screened can be determined from the target characteristic value set, which completes the screening.
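The proportion-based preset condition can be sketched as below. The mapping from individual identifier to output characteristic value, the function name `select_top_ratio`, and rounding the cut-off count upward are assumptions made for illustration; the patent only states that values ranked in the top 10% are kept.

```python
import math

def select_top_ratio(outputs, ratio=0.10):
    """Return the ids whose output characteristic values rank in the top `ratio`.

    outputs: dict of individual id -> output characteristic value.
    """
    ranked = sorted(outputs, key=outputs.get, reverse=True)  # largest value first
    keep = math.ceil(len(ranked) * ratio)                    # assumed rounding rule
    return ranked[:keep]

# Twenty hypothetical individuals with made-up output characteristic values.
outputs = {f"p{i}": i / 20 for i in range(20)}
target_group = select_top_ratio(outputs, 0.10)
```

Because each output characteristic value is keyed by the individual it came from, the selected keys directly identify the target group.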
As the embodiment shown in Fig. 1 makes clear, in the embodiments of the present invention, multiple pieces of sample information corresponding to multiple sample individuals in the sample population are first obtained, where the sample information includes a sample environment feature and a sample characteristic value, and the sample characteristic value describes the individual state of the sample individual. The multiple pieces of sample information are fitted to a preset pretreatment model, and the pretreatment model for which fitting is complete is output as the screening model. To screen the crowd to be screened, multiple personal features of the multiple individuals to be screened are obtained and input into the screening model, which yields an output characteristic value set made up of multiple output characteristic values. Finally, the output characteristic values in the set that meet the preset condition are added to the target characteristic value set, and the individuals to be screened that correspond to the target characteristic value set are determined from the crowd to be screened as the target group. This achieves automatic screening and improves the accuracy of target group screening.
Fig. 2 shows, on the basis of Embodiment 1 of the present invention, a refinement of the step of fitting the multiple pieces of sample information to the pretreatment model and using the fitted pretreatment model as the screening model. It is an implementation flowchart of the crowd screening method based on big data provided by an embodiment of the present invention. As shown in the figure, the crowd screening method may include the following steps:
In S201, the multiple pieces of sample information are input into the pretreatment model to train the pretreatment model, where the sample environment feature of the sample information is used as the input parameter of the pretreatment model and the sample characteristic value of the sample information is used as the reference parameter of the pretreatment model.
When the multiple pieces of sample information are input into the pretreatment model, the sample environment feature in each piece of sample information is used as an input parameter of the pretreatment model, and the sample characteristic value in the sample information is used as a reference parameter. Specifically, a sample information set is built from the multiple pieces of sample information: $(Characters_{environ_1}, Value_{environ_1}), (Characters_{environ_2}, Value_{environ_2}), \ldots, (Characters_{environ_n}, Value_{environ_n})$, where $Characters_{environ_i}$ represents the sample environment feature of the i-th piece of sample information, which may include multiple sub-features, $Value_{environ_i}$ represents the sample characteristic value of the i-th piece of sample information, and n is the total number of pieces of sample information obtained from the sample population. The sample information set is input into the pretreatment model to train it. In the embodiments of the present invention, the formula by which the pretreatment model computes on an input parameter is:
$$\hat{Value}_{environ_i} = \sum_{k=1}^{K} f_k(Characters_{environ_i})$$
In the above formula, $\hat{Value}_{environ_i}$ represents the predicted value for the input parameter $Characters_{environ_i}$, that is, the output of the pretreatment model after $Characters_{environ_i}$ is input into it as the input parameter. $f(\cdot)$ in the formula denotes a function in a function space, where a function space is a set of functions of a given type from one set to another; the $f(\cdot)$ functions are initially unknown. K indicates that the pretreatment model contains K such $f(\cdot)$ functions, and the final predicted value is obtained only after the results of all K functions are added up.
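As a small illustration of the additive form just described, the predicted value is the sum of the outputs of the K functions applied to the same input. The two functions below are hand-written placeholders chosen for the example; in the patent the functions are learned from the sample information set, not written by hand.

```python
def predict(functions, x):
    """Predicted value = sum of the outputs of the K functions for input x."""
    return sum(f(x) for f in functions)

# Two hypothetical functions over a dict of sub-features (K = 2).
def f1(x):
    return 0.4 if x["smoking"] == 1 else 0.1

def f2(x):
    return 0.3 if x["age"] >= 50 else 0.0

pred = predict([f1, f2], {"age": 62, "smoking": 1})
```

With these placeholder functions, a 62-year-old smoker receives 0.4 from the first function and 0.3 from the second, and the two contributions are added.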
The actual training of the pretreatment model relies on the above formula. In the embodiments of the present invention, the $f(\cdot)$ functions are learned sequentially, round by round, so that the K functions finally obtained fit the data in the multiple pieces of sample information as closely as possible. For example, with $Characters_{environ_i}$ as the input parameter, the prediction of round t is made while retaining the prediction result of round t-1; that is, the pretreatment model is trained round by round so that the gap between the predicted value $\hat{Value}_{environ_i}$ and the actual value $Value_{environ_i}$ gradually shrinks, as follows:
$$\hat{Value}_{environ_i}^{(1)} = \hat{Value}_{environ_i}^{(0)} + f_1(Characters_{environ_i})$$
$$\hat{Value}_{environ_i}^{(2)} = \hat{Value}_{environ_i}^{(1)} + f_2(Characters_{environ_i})$$
……
$$\hat{Value}_{environ_i}^{(t)} = \hat{Value}_{environ_i}^{(t-1)} + f_t(Characters_{environ_i})$$
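The round-by-round idea (keep the prediction of round t-1 and add one new function in round t) can be sketched with a deliberately simple learner. Everything specific here is an assumption for illustration: a single numeric sub-feature, a squared-error criterion, a one-split "stump" as the form of each function, and a starting prediction of 0. The patent does not fix any of these choices at this point.

```python
def fit_stump(xs, residuals):
    """Fit one function: a single threshold split that best matches the residuals."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((r - (lm if x < thr else rm)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x < thr else rm

def train_in_order(xs, ys, rounds=3):
    """Learn `rounds` functions one at a time, each added to the previous prediction."""
    functions = []
    preds = [0.0] * len(xs)  # assumed starting prediction before round 1
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # gap left by round t-1
        f_t = fit_stump(xs, residuals)
        functions.append(f_t)
        preds = [p + f_t(x) for p, x in zip(preds, xs)]  # round t = round t-1 + f_t
    return functions, preds

ages = [30, 40, 55, 60, 70]   # made-up sub-feature values
labels = [0, 0, 1, 1, 1]      # made-up sample characteristic values
functions, preds = train_in_order(ages, labels, rounds=3)
```

In this toy data a single split at age 55 already separates the two label values, so the later rounds add functions that contribute nothing further.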
In the above formulas, $\hat{Value}_{environ_i}^{(t)}$ is the predicted value after round t of prediction, given the input parameter $Characters_{environ_i}$. To determine the $f(\cdot)$ function required in each round of learning so that the prediction approaches the sample information set as closely as possible, an optimization function is built, with the following formula:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)} + f_t(Characters_{environ_i})\right) + \Omega(f_t) + D$$
In the above formula, $Value_{environ_i}$ is the reference parameter in the sample information set corresponding to the input parameter $Characters_{environ_i}$, that is, the sample characteristic value in the sample information, which serves as a parameter of the optimization function in the embodiments of the present invention. $\Omega(f_t)$ in the optimization function is the regular term and D is the constant term. The regular term controls the degree of training of the optimization function and prevents the pretreatment model from overfitting the sample information set; the constant term is a constant, set to limit the numerical range of the optimization function. It is worth noting that $l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)} + f_t(Characters_{environ_i})\right)$ is the error function; optimizing the optimization function is the process of determining a suitable $f(\cdot)$ function that makes the value of this error function as small as possible.
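In the embodiments of the present invention, to make the optimization function easier to optimize computationally, the error function $l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)} + f_t(Characters_{environ_i})\right)$ is expanded about $\hat{Value}_{environ_i}^{(t-1)}$, with the definitions:
$$g_i = \frac{\partial\, l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)}\right)}{\partial\, \hat{Value}_{environ_i}^{(t-1)}}, \qquad h_i = \frac{\partial^2\, l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)}\right)}{\partial \left(\hat{Value}_{environ_i}^{(t-1)}\right)^2}$$
The optimization function after expansion is:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(Value_{environ_i},\ \hat{Value}_{environ_i}^{(t-1)}\right) + g_i f_t(Characters_{environ_i}) + \frac{1}{2} h_i f_t^2(Characters_{environ_i}) \right] + \Omega(f_t) + D$$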
Because the constant terms do not affect the optimization of the optimization function, they are removed from the expanded optimization function, which gives the simplified function of the expanded optimization function in round t:
$$\sum_{i=1}^{n} \left[ g_i f_t(Characters_{environ_i}) + \frac{1}{2} h_i f_t^2(Characters_{environ_i}) \right] + \Omega(f_t)$$
In the final simplified function, the output value depends on the values of $g_i$ and $h_i$, so a suitable $f(\cdot)$ function can be determined quickly, which makes training simpler. In the embodiments of the present invention, the pretreatment model is trained using the sequential learning method described above together with the optimization function (or, equally, the simplified function).
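To make the role of $g_i$ and $h_i$ concrete, the sketch below evaluates the simplified round-t function for one specific error function. The squared error $l(y, p) = (y - p)^2$ is an assumption chosen for illustration (the patent does not name the error function here); for it, $g_i = 2(p - y)$ and $h_i = 2$, and the second-order expansion is exact, so the simplified function equals the true change in total error.

```python
def loss(y, p):
    """Assumed error function: squared error."""
    return (y - p) ** 2

def grad(y, p):
    """g_i: first derivative of the error with respect to the previous prediction."""
    return 2.0 * (p - y)

def hess(y, p):
    """h_i: second derivative of the error with respect to the previous prediction."""
    return 2.0

def simplified_objective(ys, prev_preds, f_outputs, reg=0.0):
    """Sum of g_i * f + 0.5 * h_i * f^2 over samples, plus the regular term."""
    return sum(
        grad(y, p) * f + 0.5 * hess(y, p) * f * f
        for y, p, f in zip(ys, prev_preds, f_outputs)
    ) + reg

ys = [1.0, 0.0, 1.0]      # made-up sample characteristic values
prev = [0.5, 0.5, 0.5]    # predictions kept from round t-1
step = [0.5, -0.5, 0.5]   # outputs of a candidate function for round t

full_change = sum(loss(y, p + f) - loss(y, p) for y, p, f in zip(ys, prev, step))
approx_change = simplified_objective(ys, prev, step)
```

A candidate function can therefore be scored from $g_i$ and $h_i$ alone, without re-evaluating the error function for every candidate.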
In S202, the trained pretreatment model is output as the screening model.
After all input parameters and reference parameters in the sample information set have been input into the pretreatment model and training of the pretreatment model is complete, the trained pretreatment model (mainly the trained $f(\cdot)$ functions) is output as the screening model. When the crowd to be screened needs to be screened, the personal feature $Characters_{environ_x}$ of an individual to be screened is input into the screening model, and the predicted value $\hat{Value}_{environ_x}$ is calculated by the optimized calculation formula in the screening model:
$$\hat{Value}_{environ_x} = \sum_{k=1}^{K} f_k(Characters_{environ_x})$$
As the embodiment shown in Fig. 2 makes clear, in the embodiments of the present invention, the sample environment feature in the sample information is used as the input parameter of the pretreatment model and the sample characteristic value of the sample information is used as its reference parameter, so that the multiple pieces of sample information are input into the pretreatment model to train it. The trained pretreatment model is finally output as the screening model, which improves how closely the screening model fits the multiple pieces of sample information and improves the accuracy of crowd screening performed with the screening model.
Fig. 3 shows, on the basis of Embodiment 1 of the present invention, a specific flow obtained by refining the step of adding the output characteristic values in the output characteristic value set that meet the preset condition to the target characteristic value set. It is an implementation flowchart of the crowd screening method based on big data provided by an embodiment of the present invention. As shown in Fig. 3, the crowd screening method may include the following steps:
In S301, a characteristic threshold is obtained; the characteristic threshold is used to judge whether an output characteristic value meets the preset condition.
In the embodiments of the present invention, after the screening model is determined, multiple personal features of the crowd to be screened are obtained and automatically input into the screening model, which yields an output characteristic value set containing multiple output characteristic values. To obtain the output characteristic values in the set that meet the preset condition, the characteristic threshold is obtained first; the output characteristic values that meet the preset condition are those greater than or equal to the characteristic threshold.
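The threshold-based preset condition reduces to a simple comparison, sketched below. The function name `select_by_threshold`, the identifier-to-value mapping, and the numbers are hypothetical.

```python
def select_by_threshold(outputs, threshold):
    """Keep output characteristic values greater than or equal to the characteristic threshold."""
    return {pid: v for pid, v in outputs.items() if v >= threshold}

# Made-up output characteristic values for three individuals to be screened.
outputs = {"p1": 0.92, "p2": 0.40, "p3": 0.70}
target_values = select_by_threshold(outputs, threshold=0.70)
target_group = sorted(target_values)  # ids of the individuals in the target group
```

Note that a value exactly equal to the threshold is kept, matching the "greater than or equal to" wording.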
Fig. 4 shows, on the basis of step S301 and for the case where the sample characteristic value takes either a first characteristic value or a second characteristic value, a specific flow obtained by refining the step of obtaining the characteristic threshold. It is an implementation flowchart of the crowd screening method based on big data provided by an embodiment of the present invention. As shown in Fig. 4, the crowd screening method may include the following steps:
In S401, big data analysis is performed on the multiple pieces of sample information to determine the quantity ratio, among the multiple pieces of sample information, of the sample information whose sample characteristic value is the first characteristic value.
The value range of the sample characteristics may differ between application scenarios. This
embodiment is described for the case in which the sample characteristics take either the First
Eigenvalue or the Second Eigenvalue, and the First Eigenvalue is greater than the Second Eigenvalue.
For example, suppose that sample characteristics equal to the First Eigenvalue indicate that the
corresponding sample individual is affected by anxiety disorder, that sample characteristics equal to
the Second Eigenvalue indicate that the individual is not affected, and that the purpose of screening
is to pick out, from the crowd to be screened, the target group affected by anxiety disorder. Big data
analysis is then first performed on the multiple pieces of sample information to count the first
sample quantity, that is, the number of pieces of sample information whose sample characteristics take
the First Eigenvalue, and from it the quantitative proportion that the first sample quantity occupies
in all the sample information.
In S402, the characteristic threshold is calculated according to the quantitative proportion, the
First Eigenvalue and the Second Eigenvalue.
Because the sample characteristics can only take the First Eigenvalue or the Second Eigenvalue, the
difference between the First Eigenvalue and the Second Eigenvalue is calculated, and the product of
this difference and the quantitative proportion is subtracted from the First Eigenvalue to obtain the
characteristic threshold. For example, if the First Eigenvalue is 2 with a quantitative proportion of
30%, and the Second Eigenvalue is 1 with a quantitative proportion of 70%, the difference is
2 - 1 = 1 and the characteristic threshold is 2 - 1 × 30% = 1.7. This calculation applies only when
the sample characteristics take either the First Eigenvalue or the Second Eigenvalue and the First
Eigenvalue is greater than the Second Eigenvalue. It can be extended to other possible sample
characteristics, which this embodiment does not describe further.
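The calculation in S401 and S402 can be sketched in a few lines of Python. This is a minimal
illustration, not the patented implementation: the function name `characteristic_threshold` and the
plain list of sample characteristics are assumptions made for the example.

```python
def characteristic_threshold(sample_characteristics, first_value, second_value):
    """Sketch of S401-S402 for two-valued sample characteristics.

    Hypothetical helper: `sample_characteristics` is a list holding the sample
    characteristics of every piece of sample information.
    """
    if first_value <= second_value:
        raise ValueError("the First Eigenvalue must be greater than the Second Eigenvalue")
    if not sample_characteristics:
        raise ValueError("at least one piece of sample information is required")

    # S401: quantitative proportion of samples taking the First Eigenvalue.
    first_count = sum(1 for value in sample_characteristics if value == first_value)
    proportion = first_count / len(sample_characteristics)

    # S402: First Eigenvalue minus (difference x proportion).
    difference = first_value - second_value
    return first_value - difference * proportion


# Worked example from the text: 30% of samples take 2, 70% take 1.
labels = [2] * 3 + [1] * 7
threshold = characteristic_threshold(labels, first_value=2, second_value=1)
print(round(threshold, 6))  # 1.7
```

The larger the share of samples that take the First Eigenvalue, the lower the threshold falls toward
the Second Eigenvalue, so more of the crowd to be screened passes the comparison.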
Fig. 5 shows another detailed process obtained by refining, on the basis of step S301, the step of
obtaining the characteristic threshold. This embodiment of the present invention provides an
implementation flowchart of the crowd screening method based on big data. As shown in Fig. 5, the
crowd screening method may include the following steps:
In S501, the sample environment features of the multiple pieces of sample information are input to the
screening model, and the multiple result characteristic values that the screening model outputs for
those sample environment features are obtained.
In this embodiment of the present invention, after the screening model is generated, the multiple
pieces of sample information are input to it. Specifically, the sample environment features of the
sample information are input to the screening model as input parameters, but the sample
characteristics are not used as reference parameters of the screening model. Instead, the calculation
formula of the screening model is applied directly to the sample environment features to calculate
multiple output parameters, namely the multiple result characteristic values corresponding to the
sample environment features. In general, the result characteristic value that the screening model
calculates for a piece of sample information differs from the original sample characteristics of that
sample information.
In S502, the multiple result characteristic values are sorted to generate a result characteristic value sequence.
After the multiple result characteristic values are obtained, they are sorted by numerical value to
generate the result characteristic value sequence, whose head is the result characteristic value with
the largest numerical value. Note that if a first result characteristic value and a second result
characteristic value among them are identical, their order in the sequence follows the input order of
the corresponding sample information. Specifically, an ordered writing mechanism is set. When a result
characteristic value needs to be written into the sequence, it is judged whether the sequence already
contains an existing result characteristic value identical to it. If not, the smallest existing result
characteristic value that is greater than it is found according to its numerical value, and the result
characteristic value is written at the position after that existing value. If an identical existing
result characteristic value is present, the result characteristic value is written at the position
after it; if multiple identical existing result characteristic values are present, the last of them is
found and the result characteristic value is written at the position after it.
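The ordered writing mechanism of S502 amounts to a stable insertion into a descending sequence. The
sketch below is one way to express it in Python under stated assumptions: the class name
`ResultSequence`, the `sample_id` argument and the parallel list of negated values are illustrative
choices, not details given in the source.

```python
import bisect


class ResultSequence:
    """Descending sequence in which equal values keep their input order."""

    def __init__(self):
        self.items = []        # (result characteristic value, sample id), largest first
        self._neg_values = []  # negated values, ascending, kept parallel for bisect

    def write(self, value, sample_id):
        # bisect_right on the negated values lands just after every existing
        # entry that is greater than or equal to `value`, i.e. after the
        # smallest larger value and after the last identical value.
        position = bisect.bisect_right(self._neg_values, -value)
        self._neg_values.insert(position, -value)
        self.items.insert(position, (value, sample_id))
        return position


sequence = ResultSequence()
for sample_id, value in enumerate([1.2, 1.9, 1.2, 1.5]):
    sequence.write(value, sample_id)

print(sequence.items)  # [(1.9, 1), (1.5, 3), (1.2, 0), (1.2, 2)]
```

The two entries with value 1.2 stay in input order (sample 0 before sample 2), which is the tie rule
described above.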
In S503, a preset screening ratio is obtained, and the result characteristic value corresponding to
the screening ratio is found in the result characteristic value sequence and output as the
characteristic threshold.
After the result characteristic value sequence is generated, the preset screening ratio is obtained,
the screening position corresponding to the screening ratio is located in the sequence, and the result
characteristic value at the screening position is output as the characteristic threshold. For example,
if 300 result characteristic values have been written into the sequence and the screening ratio is
10%, the screening position is the 30th, so the numerical value of the result characteristic value at
the 30th position of the sequence is taken as the characteristic threshold. Optionally, the screening
ratio can be determined by a big data analysis method. For example, multiple pieces of national sample
information can be collected nationwide, each including a sample environment feature and sample
characteristics. The screening sample quantity, that is, the number of pieces whose sample
characteristics take the value corresponding to the screening purpose, is counted first; the
proportion that the screening sample quantity occupies in all the national sample information is then
determined and used as the screening ratio. Determining the ratio by big data analysis can make the
generated characteristic threshold more generally applicable. The screening ratio can of course also
be preset manually.
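The lookup in S503 can be sketched as follows. The source gives only the 300 × 10% = 30th example, so
the rounding rule and the clamping to a valid position are assumptions, as is the function name
`threshold_from_ratio`.

```python
def threshold_from_ratio(result_sequence, screening_ratio):
    """Return the value at the screening position of a descending sequence.

    Assumption: the screening position is round(length x ratio), clamped to
    the range 1..length; the source only gives the 300 x 10% = 30th example.
    """
    if not result_sequence:
        raise ValueError("the result characteristic value sequence is empty")
    if not 0 < screening_ratio <= 1:
        raise ValueError("the screening ratio must be in (0, 1]")

    length = len(result_sequence)
    position = min(length, max(1, round(length * screening_ratio)))  # 1-based
    return result_sequence[position - 1]


# 300 hypothetical result characteristic values, already sorted largest first.
values = [3.0 - 0.01 * i for i in range(300)]
threshold = threshold_from_ratio(values, 0.10)
print(threshold == values[29])  # True: the 30th value is the threshold
```

Because the sequence is sorted from largest to smallest, roughly the top 10% of the sample result
characteristic values are greater than or equal to the threshold chosen this way.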
In S302, the output characteristic values in the output characteristic value set that are greater than
or equal to the characteristic threshold are extracted, and the extracted output characteristic values
are added to the target signature value set.
After characteristic threshold value determines, the output feature more than or equal to characteristic threshold value is extracted from output characteristic value set
Value, then the output characteristic value extracted is corresponding with the target individual of screening, therefore the output characteristic value extracted is added to target
Characteristic value collection, to determine target group corresponding with object feature value subsequently from crowd to be screened.
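Step S302 and the subsequent lookup of the target group can be sketched as a simple filter. The
mapping from individual identifiers to output characteristic values and the name
`select_target_group` are assumptions made for illustration.

```python
def select_target_group(output_values, threshold):
    """Sketch of S302 plus the lookup of the matching individuals.

    Hypothetical input: `output_values` maps an individual's identifier to the
    output characteristic value the screening model produced for them.
    """
    # Target signature value set: outputs greater than or equal to the threshold.
    target_values = {
        person: value for person, value in output_values.items() if value >= threshold
    }
    # Target group: the individuals whose outputs were kept.
    target_group = sorted(target_values)
    return target_values, target_group


outputs = {"p1": 1.82, "p2": 1.40, "p3": 1.70, "p4": 0.95}
target_values, target_group = select_target_group(outputs, threshold=1.7)
print(target_group)  # ['p1', 'p3']
```

Keeping the identifier next to each output characteristic value is what allows the target group to be
read back from the target signature value set.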
As the embodiment shown in Fig. 3 makes clear, in this embodiment of the present invention the
characteristic threshold is obtained by one of several methods, the output characteristic values
greater than or equal to the characteristic threshold are extracted from the output characteristic
value set, and the extracted output characteristic values are added to the target signature value set.
Setting a characteristic threshold makes generating the target signature value set simpler and more
efficient.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an
execution order. The execution order of each process should be determined by its function and internal
logic, and should not limit the implementation process of the embodiments of the present invention in
any way.
Fig. 6 shows a structural block diagram of a terminal device provided by an embodiment of the present
invention. The units included in the terminal device are used to execute the steps in the embodiment
corresponding to Fig. 1; for details, refer to Fig. 1 and the related description of the embodiment
corresponding to Fig. 1. For ease of description, only the parts related to this embodiment are shown.
Referring to Fig. 6, the terminal device includes:
A first acquisition unit 61, configured to obtain multiple pieces of sample information of a sample
population, wherein the sample information includes sample environment features and sample
characteristics, and the sample characteristics describe the individual state of the sample individual
in the sample population to which the sample information corresponds;
A fitting unit 62, configured to fit the multiple pieces of sample information to a preset pretreated
model, and to use the fitted pretreated model as a screening model;
A second acquisition unit 63, configured to obtain multiple personal features of a crowd to be
screened, and to input the multiple personal features to the screening model to obtain an output
characteristic value set corresponding to the multiple personal features, the output characteristic
value set including multiple output characteristic values;
A target determination unit 64, configured to add the output characteristic values in the output
characteristic value set that meet a preset condition to a target signature value set, and to
determine, from the crowd to be screened, a target group corresponding to the target signature value set.
Optionally, the fitting unit 62 includes:
An input unit, configured to input the multiple pieces of sample information to the pretreated model to
train the pretreated model, wherein the sample environment features of the sample information serve as
the input parameters of the pretreated model and the sample characteristics of the sample information
serve as the reference parameters of the pretreated model;
An output unit, configured to output the trained pretreated model as the screening model.
Optionally, the target determination unit 64 includes:
A threshold value acquiring unit, configured to obtain a characteristic threshold, the characteristic
threshold being used to judge whether an output characteristic value meets the preset condition;
An extraction unit, configured to extract the output characteristic values in the output characteristic
value set that are greater than or equal to the characteristic threshold, and to add the extracted
output characteristic values to the target signature value set.
Optionally, the sample characteristics are the First Eigenvalue or the Second Eigenvalue, and the
threshold value acquiring unit includes:
An analysis unit, configured to perform big data analysis on the multiple pieces of sample information
and determine the quantitative proportion, among the multiple pieces of sample information, of the
sample information whose sample characteristics take the First Eigenvalue;
A computing unit, configured to calculate the characteristic threshold according to the quantitative
proportion, the First Eigenvalue and the Second Eigenvalue.
Optionally, the threshold value acquiring unit includes:
A feature input unit, configured to input the sample environment features of the multiple pieces of
sample information to the screening model, and to obtain the multiple result characteristic values
that the screening model outputs for those sample environment features;
A sequencing unit, configured to sort the multiple result characteristic values to generate a result
characteristic value sequence;
A threshold value output unit, configured to obtain a preset screening ratio, find the result
characteristic value corresponding to the screening ratio in the result characteristic value sequence,
and output it as the characteristic threshold.
Therefore, the terminal device provided by the embodiment of the present invention fits multiple pieces
of sample information to the pretreated model to obtain the screening model, and determines the target
group according to the screening model, so that the crowd can be screened according to multiple
features, which improves the accuracy of crowd screening.
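The cooperation of units 61 to 64 can be illustrated with a small end-to-end sketch. It is not the
device's actual implementation: this section does not fix the form of the pretreated model, so a
least-squares line over a single numeric sample environment feature is assumed purely for
illustration, and the names `fit_screening_model` and `screen` are hypothetical.

```python
def fit_screening_model(samples):
    """Fitting unit: least-squares line through (environment feature, sample characteristics).

    Assumption: one numeric environment feature and a linear pretreated model.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:
        raise ValueError("the environment feature must vary across samples")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / var_x
    intercept = mean_y - slope * mean_x
    return lambda feature: slope * feature + intercept


def screen(samples, candidates, threshold):
    # First acquisition unit: `samples` is the sample information already obtained.
    # Fitting unit: fit the assumed pretreated model to get the screening model.
    model = fit_screening_model(samples)
    # Second acquisition unit: feed each personal feature to the screening model.
    outputs = {pid: model(feature) for pid, feature in candidates.items()}
    # Target determination unit: keep outputs at or above the threshold.
    return sorted(pid for pid, value in outputs.items() if value >= threshold)


samples = [(0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
candidates = {"a": 0.5, "b": 2.5, "c": 3.5}
print(screen(samples, candidates, threshold=1.7))  # ['b', 'c']
```

In a real deployment the threshold would come from one of the two methods described for Fig. 4 and
Fig. 5 rather than being passed in as a constant.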
Fig. 7 is a schematic diagram of a terminal device provided by an embodiment of the present invention.
As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and
a computer program 72 stored in the memory 71 and executable on the processor 70, for example a
terminal device control program. When executing the computer program 72, the processor 70 implements
the steps in the above embodiments of the crowd screening method based on big data, for example steps
S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70
implements the functions of the units in the above apparatus embodiments, for example the functions of
units 61 to 64 shown in Fig. 6.
Illustratively, the computer program 72 can be divided into one or more units, which are stored in the
memory 71 and executed by the processor 70 to carry out the present invention. The one or more units
may be a series of computer program instruction segments capable of completing specific functions, and
the instruction segments describe the execution process of the computer program 72 in the terminal
device 7. For example, the computer program 72 can be divided into a first acquisition unit, a fitting
unit, a second acquisition unit and a target determination unit, whose specific functions are as
follows:
A first acquisition unit, configured to obtain multiple pieces of sample information of a sample
population, wherein the sample information includes sample environment features and sample
characteristics, and the sample characteristics describe the individual state of the sample individual
in the sample population to which the sample information corresponds;
A fitting unit, configured to fit the multiple pieces of sample information to a preset pretreated
model, and to use the fitted pretreated model as a screening model;
A second acquisition unit, configured to obtain multiple personal features of a crowd to be screened,
and to input the multiple personal features to the screening model to obtain an output characteristic
value set corresponding to the multiple personal features, the output characteristic value set
including multiple output characteristic values;
A target determination unit, configured to add the output characteristic values in the output
characteristic value set that meet a preset condition to a target signature value set, and to
determine, from the crowd to be screened, a target group corresponding to the target signature value set.
The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop
computer or a cloud server. The terminal device 7 may include, but is not limited to, the processor 70
and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the
terminal device 7 and does not limit it; the terminal device 7 may include more or fewer components
than shown, combine certain components, or use different components. For example, the terminal device
7 may also include input and output devices, a network access device, a bus, and so on.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or another
general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an
application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a
field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic
device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The
general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or internal
memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal
device 7, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital
(Secure Digital, SD) card or a flash card (Flash Card) fitted to the terminal device 7. Further, the
memory 71 may include both an internal storage unit and an external storage device of the terminal
device 7. The memory 71 is used to store the computer program and other programs and data required by
the terminal device 7. The memory 71 can also be used to temporarily store data that has been output
or is about to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only
the division into the functional units described above is used as an example. In practical
applications, the above functions can be assigned to different functional units as needed, that is,
the internal structure of the apparatus is divided into different functional units to complete all or
part of the functions described above. The functional units in the embodiments may be integrated in
one processing unit, each unit may exist physically on its own, or two or more units may be integrated
in one unit; the integrated unit may be implemented in the form of hardware or in the form of a
software functional unit. In addition, the specific names of the functional units are only for ease of
distinguishing them from one another and are not intended to limit the protection scope of this
application. For the specific working process of the units in the above system, refer to the
corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are
not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples
described in connection with the embodiments disclosed herein can be implemented in electronic
hardware, or in a combination of computer software and electronic hardware. Whether these functions
are executed in hardware or software depends on the specific application and design constraints of the
technical solution. Skilled professionals may use different methods to implement the described
functions for each specific application, but such implementations should not be considered beyond the
scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed
apparatus/terminal device and method may be implemented in other ways. For example, the
apparatus/terminal device embodiments described above are merely illustrative. For example, the
division into units is only a division by logical function, and other divisions are possible in actual
implementation: multiple units or components may be combined or integrated into another system, or
some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or
communication connection shown or discussed may be indirect coupling or communication connection
through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components
shown as units may or may not be physical units; that is, they may be located in one place or
distributed over multiple network units. Some or all of the units can be selected according to actual
needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one
processing unit, each unit may exist physically on its own, or two or more units may be integrated in
one unit. The integrated unit may be implemented in the form of hardware or in the form of a software
functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an
independent product, it can be stored in a computer-readable storage medium. Based on this
understanding, the present invention may implement all or part of the processes in the methods of the
above embodiments by instructing relevant hardware through a computer program. The computer program
can be stored in a computer-readable storage medium and, when executed by a processor, can implement
the steps of each of the above method embodiments. The computer program includes computer program
code, which may be in source code form, object code form, an executable file, or some intermediate
form. The computer-readable medium may include: any entity or device capable of carrying the computer
program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical
disc, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random
Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution
medium, and so on. It should be noted that the content included in the computer-readable medium may be
appropriately added to or removed according to the requirements of legislation and patent practice in
a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice,
computer-readable media do not include electrical carrier signals and telecommunication signals.
The embodiments described above are only intended to illustrate the technical solutions of the present
invention, not to limit them. Although the present invention has been described in detail with
reference to the foregoing embodiments, those of ordinary skill in the art should understand that the
technical solutions recorded in the foregoing embodiments can still be modified, or some of their
technical features can be equivalently replaced; such modifications or replacements do not cause the
essence of the corresponding technical solutions to depart from the spirit and scope of the technical
solutions of the embodiments of the present invention, and should all be included within the
protection scope of the present invention.
Claims (10)
1. A crowd screening method based on big data, characterized by comprising:
Obtaining multiple pieces of sample information of a sample population, wherein the sample information
includes sample environment features and sample characteristics, and the sample characteristics
describe the individual state of the sample individual in the sample population to which the sample
information corresponds;
Fitting the multiple pieces of sample information to a preset pretreated model, and using the fitted
pretreated model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal
features to the screening model to obtain an output characteristic value set corresponding to the
multiple personal features, the output characteristic value set including multiple output
characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset
condition to a target signature value set, and determining, from the crowd to be screened, a target
group corresponding to the target signature value set.
2. The crowd screening method according to claim 1, characterized in that fitting the multiple pieces
of sample information to the preset pretreated model and using the fitted pretreated model as the
screening model comprises:
Inputting the multiple pieces of sample information to the pretreated model to train the pretreated
model, wherein the sample environment features of the sample information serve as the input parameters
of the pretreated model and the sample characteristics of the sample information serve as the
reference parameters of the pretreated model;
Outputting the trained pretreated model as the screening model.
3. The crowd screening method according to claim 1, characterized in that adding the output
characteristic values in the output characteristic value set that meet the preset condition to the
target signature value set comprises:
Obtaining a characteristic threshold, the characteristic threshold being used to judge whether an
output characteristic value meets the preset condition;
Extracting the output characteristic values in the output characteristic value set that are greater
than or equal to the characteristic threshold, and adding the extracted output characteristic values
to the target signature value set.
4. The crowd screening method according to claim 3, characterized in that the sample characteristics
are a First Eigenvalue or a Second Eigenvalue, and obtaining the characteristic threshold comprises:
Performing big data analysis on the multiple pieces of sample information to determine the quantitative
proportion, among the multiple pieces of sample information, of the sample information whose sample
characteristics take the First Eigenvalue;
Calculating the characteristic threshold according to the quantitative proportion, the First
Eigenvalue and the Second Eigenvalue.
5. The crowd screening method according to claim 3, characterized in that obtaining the characteristic
threshold comprises:
Inputting the sample environment features of the multiple pieces of sample information to the screening
model, and obtaining the multiple result characteristic values that the screening model outputs for
the sample environment features of the multiple pieces of sample information;
Sorting the multiple result characteristic values to generate a result characteristic value sequence;
Obtaining a preset screening ratio, finding the result characteristic value corresponding to the
screening ratio in the result characteristic value sequence, and outputting it as the characteristic
threshold.
6. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a
computer program stored in the memory and executable on the processor, and the processor implements
the following steps when executing the computer program:
Obtaining multiple pieces of sample information of a sample population, wherein the sample information
includes sample environment features and sample characteristics, and the sample characteristics
describe the individual state of the sample individual in the sample population to which the sample
information corresponds;
Fitting the multiple pieces of sample information to a preset pretreated model, and using the fitted
pretreated model as a screening model;
Obtaining multiple personal features of a crowd to be screened, and inputting the multiple personal
features to the screening model to obtain an output characteristic value set corresponding to the
multiple personal features, the output characteristic value set including multiple output
characteristic values;
Adding the output characteristic values in the output characteristic value set that meet a preset
condition to a target signature value set, and determining, from the crowd to be screened, a target
group corresponding to the target signature value set.
7. The terminal device according to claim 6, characterized in that fitting the multiple pieces of
sample information to the preset pretreated model and using the fitted pretreated model as the
screening model comprises:
Inputting the multiple pieces of sample information to the pretreated model to train the pretreated
model, wherein the sample environment features of the sample information serve as the input parameters
of the pretreated model and the sample characteristics of the sample information serve as the
reference parameters of the pretreated model;
Outputting the trained pretreated model as the screening model.
8. The terminal device according to claim 6, characterized in that adding the output characteristic
values in the output characteristic value set that meet the preset condition to the target signature
value set comprises:
Obtaining a characteristic threshold, the characteristic threshold being used to judge whether an
output characteristic value meets the preset condition;
Extracting the output characteristic values in the output characteristic value set that are greater
than or equal to the characteristic threshold, and adding the extracted output characteristic values
to the target signature value set.
9. The terminal device according to claim 8, characterized in that the sample characteristics are a
First Eigenvalue or a Second Eigenvalue, and obtaining the characteristic threshold comprises:
Performing big data analysis on the multiple pieces of sample information to determine the quantitative
proportion, among the multiple pieces of sample information, of the sample information whose sample
characteristics take the First Eigenvalue;
Calculating the characteristic threshold according to the quantitative proportion, the First
Eigenvalue and the Second Eigenvalue.
10. A computer-readable storage medium storing a computer program, characterized in that the computer
program, when executed by a processor, implements the steps of the crowd screening method according to
any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810455659.4A CN108629381A (en) | 2018-05-14 | 2018-05-14 | Crowd's screening technique based on big data and terminal device |
PCT/CN2018/097561 WO2019218482A1 (en) | 2018-05-14 | 2018-07-27 | Big data-based population screening method and apparatus, terminal device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810455659.4A CN108629381A (en) | 2018-05-14 | 2018-05-14 | Crowd's screening technique based on big data and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108629381A (en) | 2018-10-09 |
Family
ID=63693185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810455659.4A Withdrawn CN108629381A (en) | 2018-05-14 | 2018-05-14 | Crowd's screening technique based on big data and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108629381A (en) |
WO (1) | WO2019218482A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726242A (en) * | 2018-12-29 | 2019-05-07 | 陕西西部资信股份有限公司 | Data processing method and system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113610168B (en) * | 2021-08-11 | 2024-05-14 | 平安科技(深圳)有限公司 | Data processing method, device, equipment and medium |
CN114334696B (en) * | 2021-12-30 | 2024-03-05 | 中国电信股份有限公司 | Quality detection method and device, electronic equipment and computer readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106214120A (en) * | 2016-08-19 | 2016-12-14 | 靳晓亮 | A kind of methods for screening of glaucoma |
CN107895596A (en) * | 2016-12-19 | 2018-04-10 | 平安科技(深圳)有限公司 | Risk Forecast Method and system |
CN106706627A (en) * | 2017-03-06 | 2017-05-24 | 温鹏 | Combined application of hematin and beta-glucuronidase in detection of nasopharyngeal epithelial cell heterogeneity hyperplasia and reagent kit |
2018
- 2018-05-14: CN application CN201810455659.4A, published as CN108629381A (status: Withdrawn)
- 2018-07-27: WO application PCT/CN2018/097561 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2019218482A1 (en) | 2019-11-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109902222B (en) | Recommendation method and device | |
WO2021155706A1 (en) | Method and device for training business prediction model by using unbalanced positive and negative samples | |
WO2017206936A1 (en) | Machine learning based network model construction method and apparatus | |
CN109857860A (en) | File classification method, device, computer equipment and storage medium | |
CN111967971B (en) | Bank customer data processing method and device | |
CN110968701A (en) | Relationship map establishing method, device and equipment for graph neural network | |
CN107423442A (en) | Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis | |
CN109376844A (en) | The automatic training method of neural network and device recommended based on cloud platform and model | |
CN108898476A (en) | A kind of loan customer credit-graded approach and device | |
JP6908302B2 (en) | Learning device, identification device and program | |
CN108629381A (en) | Crowd's screening technique based on big data and terminal device | |
CN113902131B (en) | Updating method of node model for resisting discrimination propagation in federal learning | |
CN110147389A (en) | Account number treating method and apparatus, storage medium and electronic device | |
CN116305289B (en) | Medical privacy data processing method, device, computer equipment and storage medium | |
Chen et al. | Research on credit card default prediction based on k-means SMOTE and BP neural network | |
CN111062444A (en) | Credit risk prediction method, system, terminal and storage medium | |
Wang et al. | Research on maize disease recognition method based on improved resnet50 | |
CN107223260A (en) | Method for dynamicalling update grader complexity | |
CN113011895A (en) | Associated account sample screening method, device and equipment and computer storage medium | |
CN110334720A (en) | Feature extracting method, device, server and the storage medium of business datum | |
CN113380360B (en) | Similar medical record retrieval method and system based on multi-mode medical record map | |
Li et al. | A credit risk model with small sample data based on G-XGBoost | |
CN116090618A (en) | Operation situation sensing method and device for power communication network | |
CN107402984B (en) | A kind of classification method and device based on theme | |
Harikumar et al. | Prescriptive analytics through constrained Bayesian optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20181009 |