CN111415099A - Poverty-stricken student identification method based on multi-class BP-Adaboost
- Publication number
- CN111415099A (CN202010236492.XA)
- Authority
- CN
- China
- Prior art keywords
- poverty
- student
- data set
- data
- students
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Life Sciences & Earth Sciences (AREA)
- Educational Administration (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Tourism & Hospitality (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Economics (AREA)
- Computational Linguistics (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Marketing (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Business, Economics & Management (AREA)
- Educational Technology (AREA)
- Game Theory and Decision Science (AREA)
- Primary Health Care (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A method for identifying poverty-stricken students based on multi-class BP-Adaboost comprises the following steps: (1) acquiring multi-dimensional historical data on poverty-stricken students from previous years; (2) preprocessing the acquired historical data to construct a student feature matrix S; (3) dividing the historical data into three categories by degree of poverty, labeling each student with a poverty category, and constructing a training data set; (4) designing a BP-Adaboost classification model and training it on the data set built from the feature matrices of previous years' students at each poverty level; (5) using the trained model to assist in identifying poverty-stricken students. The invention uses the behavior data that students generate at school to design a BP-Adaboost-based multi-class model that quickly and accurately assigns each student to one of three poverty categories, thereby helping the student aid administrators of colleges and universities make decisions.
Description
Technical Field
The invention belongs to the technical field of feature extraction and classification algorithms, and particularly relates to a method for identifying poverty-stricken students based on multi-class BP-Adaboost.
Background
Student financial aid is an important measure for alleviating poverty, promoting fairness in education, and thereby advancing social fairness. Identifying poverty-stricken students is the foundation for effectively implementing national student aid policies in colleges and universities and is essential to making that aid precise. At present, most colleges and universities identify poverty-stricken students through public appraisal within the class and review by college counselors, after the student's home village or town has issued supporting documentation. This approach suffers from identification bias, from personal subjective impressions entering each stage of appraisal, and from poor students declining to be appraised out of self-esteem, all of which undermine the fairness, efficiency and accuracy of identification.
The arrival of the big-data era and the growing maturity of advanced learning methods bring new ideas and technical support to student aid work, and give colleges and universities a new opportunity to make that work fast, convenient, efficient and accurate. Campus informatization has advanced considerably: every activity of a student on campus generates data that records the student's characteristics and reflects the student's real circumstances. Applied properly, these data can assist the identification process to a certain extent, making the results more truthful and objective and directing more help to students who are genuinely poor.
At present, using big data and machine learning to assist the identification of poverty-stricken students in colleges and universities is still at an exploratory stage, and China has no unified identification method. Existing techniques offer some ideas, but they either fall short of practical needs or are difficult to implement. For example, patent application No. 201810972342.8, entitled "A student poverty degree prediction method based on machine learning", predicts the degree of poverty from behavior data generated by students at school, but it requires dozens of types of data, which easily leads to the curse of dimensionality and makes implementation harder.
Therefore, identifying poverty-stricken students effectively and accurately by means of big data and machine learning has become the key to research on assisting precise financial aid for such students.
Disclosure of Invention
To solve the prior-art problems that high-dimensional data on poverty-stricken students are difficult to process and that such students are difficult to fund precisely, the invention provides a method for identifying poverty-stricken students based on multi-class BP-Adaboost.
To achieve this purpose, the invention adopts the following technical scheme:
A method for identifying poverty-stricken students based on multi-class BP-Adaboost, characterized by comprising the following steps:
Step 1: obtaining historical behavior data of students and multi-dimensional historical data on poverty-stricken students from previous years, the multi-dimensional historical data comprising the student's family and economic conditions, campus consumption, academic performance, and basic information as a poverty-stricken student;
the specific steps of acquiring the multi-dimensional historical data and establishing the feature matrix of poverty-stricken students are as follows:
1) extracting the student's family and economic conditions, including whether the student is an only child, whether the student is an orphan, whether the family is a registered poverty-stricken household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family includes a person receiving urban or rural extreme-poverty support, and whether the family receives the urban or rural minimum living allowance; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extracting academic performance, including grade point, semester average score, and number of failed courses; extracting basic information as a poverty-stricken student, including whether the student enrolled through the green channel and whether the student took out a hometown-based student loan;
2) let the student family and economic condition data set be $E=\{e_1,e_2,\dots,e_n\}$, where $n$ denotes the student number and $e_n$ is a matrix consisting of: whether the student is an only child, whether the student is an orphan, whether the family is a registered poverty-stricken household, whether the student is the child of a martyr or of a family entitled to preferential treatment, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family includes a person receiving urban or rural extreme-poverty support, and whether the family receives the urban or rural minimum living allowance;
3) let the campus consumption data set be $C=\{c_1,c_2,\dots,c_n\}$, where $n$ denotes the student number and $c_n$ is a matrix consisting of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day;
4) let the academic performance data set be $G=\{g_1,g_2,\dots,g_n\}$, where $n$ denotes the student number and $g_n$ is a matrix consisting of grade point, semester average score, and number of failed courses;
5) let the basic information data set of poverty-stricken students be $B=\{b_1,b_2,\dots,b_n\}$, where $n$ denotes the student number and $b_n$ is a matrix consisting of whether the student enrolled through the green channel and whether the student took out a hometown-based student loan;
Step 2: preprocessing the multi-dimensional historical data collected in step 1, specifically:
1) handling missing values in the data set: missing values cost the data part of its information, so empty fields are filled with the mean value;
2) removing duplicate data: sorting previous years' records of poverty-stricken students by student number, detecting duplicates by comparing whether adjacent records are similar, and deleting any duplicate records;
3) feature coding: encoding the student family and economic condition data set E and the basic information data set B using one-hot coding;
4) normalization: normalizing the campus consumption data set C and the academic performance data set G with the Sigmoid function;
5) merging the student family and economic condition data set E, the normalized campus consumption data set, the normalized academic performance data set, and the basic information data set B into the student feature matrix S;
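Sub-steps 1) and 2) above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names (`student_id`, `total_spend`, `gpa`) are hypothetical, and "similar adjacent records" is simplified here to exact equality.

```python
from statistics import mean

def fill_missing_with_mean(rows, columns):
    """Replace None in each listed column with the mean of that column's observed values."""
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        col_mean = mean(observed) if observed else 0.0
        for r in rows:
            if r[col] is None:
                r[col] = col_mean
    return rows

def drop_adjacent_duplicates(rows, key="student_id"):
    """Sort by student number, then drop any record equal to its predecessor."""
    ordered = sorted(rows, key=lambda r: r[key])
    kept = []
    for r in ordered:
        if not kept or r != kept[-1]:
            kept.append(r)
    return kept

# Hypothetical records: one missing value and one repeated record.
records = [
    {"student_id": 2, "total_spend": 900.0, "gpa": 3.1},
    {"student_id": 1, "total_spend": None, "gpa": 2.5},
    {"student_id": 2, "total_spend": 900.0, "gpa": 3.1},
]
clean = drop_adjacent_duplicates(fill_missing_with_mean(records, ["total_spend", "gpa"]))
```

The order follows the text (fill first, then deduplicate); an implementation might prefer to deduplicate first so that repeated records do not skew the column mean.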
Step 3: dividing the multi-dimensional historical data into three categories by degree of poverty, labeling each student with a poverty category, and constructing a training data set, specifically:
classifying students into three levels according to their poverty grade in previous years, namely not poor, generally poor, and especially poor, and using one-hot codes as the poverty category labels to construct a training data set $T=\{(x_1,y_1),\dots,(x_i,y_i),\dots,(x_n,y_n)\}$, where the input data $x_i$ are drawn at random from the student feature matrix S, the label $y_i\in\{001,010,011\}$, with 001, 010 and 011 corresponding to not poor, generally poor, and especially poor respectively, $n$ is the number of samples, and T contains 70% of the data in the student feature matrix;
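A minimal sketch of this labeling and random 70% draw, assuming each feature row already has a known poverty grade. The grade names, the `build_training_set` helper, and the fixed seed are illustrative assumptions; the label codes are copied from the text as given.

```python
import random

# Label codes as listed in the text for the three categories.
LABELS = {"not_poor": "001", "generally_poor": "010", "especially_poor": "011"}

def build_training_set(features, grades, fraction=0.7, seed=0):
    """Pair each feature row with its label code and randomly draw `fraction` of the pairs."""
    pairs = [(x, LABELS[g]) for x, g in zip(features, grades)]
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(pairs, round(fraction * len(pairs)))

# Hypothetical data: ten students with two features each.
features = [[i / 10, 1 - i / 10] for i in range(10)]
grades = ["not_poor", "generally_poor", "especially_poor"] * 3 + ["not_poor"]
T = build_training_set(features, grades)
```

Sampling without replacement keeps the remaining 30% of rows available for evaluation, although the text does not say how that remainder is used.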
Step 4: designing a BP-Adaboost classification model and training it on the data set built from the feature matrices, extracted in step 1, of previous years' students at each poverty level, specifically:
1) inputting the training data set T and initializing the weights of the training data, $D_1=(w_{1,1},\dots,w_{1,i},\dots,w_{1,N})$, where $w_{1,i}=1/N$, $i=1,2,\dots,N$, and $N$ denotes the amount of data in the student feature matrix S; at the same time, setting the iteration counter $m=1$ and the total number of iterations $M=10$;
2) starting the iteration with a three-layer BP neural network consisting of an input layer with 17 nodes, a hidden layer with 18 nodes, and an output layer with 3 nodes;
3) training on the training data set with the current weight distribution to obtain a weak classifier $G_m(x): X\to\{001,010,011\}$, where 001, 010 and 011 correspond to not poor, generally poor, and especially poor respectively;
4) calculating the error rate $\mathrm{err}_m$ of the current classifier $G_m(x)$ on the training data:
5) calculating the coefficient $\alpha_m$ of $G_m(x)$:
where $K$ denotes the poverty categories and $\alpha_m$ represents the importance of $G_m(x)$ in the final classifier; $\alpha_m$ increases as $\mathrm{err}_m$ decreases, that is, the smaller the classification error rate, the greater the classifier's contribution to the final classifier;
6) updating the weight distribution of the training data set:
$D_{m+1}=(w_{m+1,1},\dots,w_{m+1,i},\dots,w_{m+1,N})$,
$w_{m+1,i}$ can be converted to the following formula:
from this, the weights of samples misclassified by the base classifier $G_m(x)$ are increased and the weights of correctly classified samples are decreased, so that the BP-Adaboost classification model pays more attention to the misclassified samples, which play a greater role in the next round of learning, thereby improving the classification ability of the model;
$Z_m$ is a normalization factor:
which makes $D_{m+1}$ a probability distribution;
7) deciding whether to terminate the iteration: when $m<M$, setting $m=m+1$ and returning to sub-step 3) to continue iterating; otherwise, terminating the iteration, completing the training of the BP-Adaboost classifier, and obtaining the final classifier
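The formulas for sub-steps 4) to 7) are not reproduced in this text. Assuming the method follows the standard multi-class AdaBoost (SAMME) formulation, which agrees with the properties stated above (a coefficient that involves $K$ and grows as $\mathrm{err}_m$ falls, and a normalization factor $Z_m$ that makes $D_{m+1}$ a probability distribution), the expressions would take roughly the following form. This is a reconstruction under that assumption, not the patent's verified text; $K$ is taken here as the number of poverty categories ($K=3$) and $I(\cdot)$ is the indicator function.

```latex
\begin{aligned}
\mathrm{err}_m &= \frac{\sum_{i=1}^{N} w_{m,i}\, I\big(G_m(x_i)\neq y_i\big)}{\sum_{i=1}^{N} w_{m,i}} \\
\alpha_m &= \ln\frac{1-\mathrm{err}_m}{\mathrm{err}_m} + \ln(K-1) \\
w_{m+1,i} &= \frac{w_{m,i}}{Z_m}\exp\!\Big(\alpha_m\, I\big(G_m(x_i)\neq y_i\big)\Big) \\
Z_m &= \sum_{i=1}^{N} w_{m,i}\exp\!\Big(\alpha_m\, I\big(G_m(x_i)\neq y_i\big)\Big) \\
G(x) &= \arg\max_{k}\sum_{m=1}^{M}\alpha_m\, I\big(G_m(x)=k\big)
\end{aligned}
```

Under this form, $\alpha_m>0$ exactly when $\mathrm{err}_m<1-1/K$, so a weak classifier only needs to beat random guessing among the $K$ categories to receive a positive vote.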
Step 5: using the trained model to assist in identifying poverty-stricken students, specifically:
1) extracting the family and economic conditions of the student to be identified, including whether the student is an only child, whether the student is an orphan, whether the family is a registered poverty-stricken household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family includes a person receiving urban or rural extreme-poverty support, and whether the family receives the urban or rural minimum living allowance; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extracting academic performance, including grade point, semester average score, and number of failed courses; extracting basic information as a poverty-stricken student, including whether the student enrolled through the green channel and whether the student took out a hometown-based student loan;
2) preprocessing the acquired student data and constructing the student feature matrix S;
3) inputting the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain the identification result: an output of 1 means the student is not poor, 2 means generally poor, and 3 means especially poor.
The campus consumption data set C and the academic performance data set G are normalized with the Sigmoid function, specifically:
1) normalizing each item of data in the campus consumption data set C with the Sigmoid function, which maps a value $v$ to $1/(1+e^{-v})$, to obtain the normalized campus consumption data; together these form the normalized campus consumption data set;
2) normalizing each item of data in the academic performance data set G with the Sigmoid function in the same way to obtain the normalized academic performance data; together these form the normalized academic performance data set.
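A minimal sketch of this element-wise Sigmoid normalization. The function names and the sample values are illustrative assumptions; the two-branch form is only a guard against floating-point overflow for large negative inputs.

```python
import math

def sigmoid(value):
    """Map a real number into (0, 1) without overflowing for large magnitudes."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)

def normalize_rows(rows):
    """Apply the Sigmoid function to every item of every row."""
    return [[sigmoid(v) for v in row] for row in rows]

# Hypothetical academic performance rows: grade point, semester average, failed courses.
normalized_g = normalize_rows([[0.0, 2.0, 1.0], [3.5, 0.5, 0.0]])
```

Large raw amounts, such as a total consumption in the thousands, saturate very close to 1 under a plain Sigmoid, so an implementation may want to rescale such columns first; the text does not state whether it does.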
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention provides a method for identifying poverty-stricken students based on multi-class BP-Adaboost, which changes the traditional way of identifying such students and, by using machine learning in the identification process, overcomes human subjectivity; compared with existing machine-learning methods for identifying poverty-stricken students, the method selects the key factors in identification, reduces the dimensionality of the student data, and avoids the curse of dimensionality in machine learning; the method uses BP-Adaboost as the classifier, which has high classification precision and effectively improves the accuracy of identification.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flowchart of the training of the BP-Adaboost classification model.
Detailed Description
The present invention will be further described with reference to the following embodiments and the accompanying drawings, but the present invention is not limited to the following embodiments.
A method for identifying poverty-stricken students based on multi-class BP-Adaboost comprises the following steps:
Step (1): collecting historical data on poverty-stricken students from previous years; the multi-dimensional historical data comprise the student's family and economic conditions, campus consumption, academic performance, and basic information as a poverty-stricken student, and a feature matrix of previous years' poverty-stricken students is established; because the classification model of the invention is built on the features of these data, accurate selection of the basic data lays the foundation for accurate classification later; the specific steps are (1.1) to (1.5):
(1.1) extracting the student's family and economic conditions, including whether the student is an only child, whether the student is an orphan, whether the family is a registered poverty-stricken household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family includes a person receiving urban or rural extreme-poverty support, and whether the family receives the urban or rural minimum living allowance; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extracting academic performance, including grade point, semester average score, and number of failed courses; extracting basic information as a poverty-stricken student, including whether the student enrolled through the green channel and whether the student took out a hometown-based student loan;
(1.2) establishing the student family and economic condition data set $E=\{e_1,e_2,\dots,e_n\}$, where $n$ denotes the student number and $e_n$ is a matrix consisting of: whether the student is an only child, whether the student is an orphan, whether the family is a registered poverty-stricken household, whether the student is the child of a martyr or of a family entitled to preferential treatment, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family includes a person receiving urban or rural extreme-poverty support, and whether the family receives the urban or rural minimum living allowance;
(1.3) establishing the campus consumption data set $C=\{c_1,c_2,\dots,c_n\}$, where $n$ denotes the student number and $c_n$ is a matrix consisting of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day;
(1.4) establishing the academic performance data set $G=\{g_1,g_2,\dots,g_n\}$, where $n$ denotes the student number and $g_n$ is a matrix consisting of grade point, semester average score, and number of failed courses;
(1.5) establishing the basic information data set of poverty-stricken students $B=\{b_1,b_2,\dots,b_n\}$, where $n$ denotes the student number and $b_n$ is a matrix consisting of whether the student enrolled through the green channel and whether the student took out a hometown-based student loan;
Step (2): data obtained in practice often contain missing and duplicate values; for example, student consumption records may be lost because of a faulty canteen card reader, so the data must be preprocessed before use; preprocessing has no standard procedure, and the procedure here is designed only for the workflow of the invention, as described in steps (2.1) to (2.5):
(2.1) handling missing values in the data set: missing values cost the data part of its information, and some less robust models cannot compute on data that contain them; the campus consumption data and academic performance data used by the method may be incomplete because of faults in acquisition equipment or for other reasons, and empty fields are filled with the mean value;
(2.2) removing duplicate data: sorting previous years' records of poverty-stricken students by student number, detecting duplicates by comparing whether adjacent records are similar, and deleting any duplicate records;
(2.3) feature coding: encoding the student family and economic condition data set E and the basic information data set B using one-hot coding;
(2.4) data normalization adjusts attribute values by scaling the data into a small, specific interval; in this implementation, the campus consumption data set C and the academic performance data set G are normalized with the Sigmoid function, as described in steps (2.4.1) and (2.4.2):
(2.4.1) normalizing each item of data in the campus consumption data set C with the Sigmoid function, which maps a value $v$ to $1/(1+e^{-v})$, to obtain the normalized campus consumption data; together these form the normalized campus consumption data set;
(2.4.2) normalizing each item of data in the academic performance data set G with the Sigmoid function in the same way to obtain the normalized academic performance data; together these form the normalized academic performance data set;
(2.5) merging the student family and economic condition data set E, the normalized campus consumption data set, the normalized academic performance data set, and the basic information data set B into the student feature matrix S;
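Steps (2.3) and (2.5) can be sketched as below. This is an illustration under stated assumptions: all field names are hypothetical, the consumption and performance values are assumed to be already normalized, and each yes/no field is encoded as a single 0/1 column, which is a simplification of the one-hot coding named in the text.

```python
# Hypothetical field names, grouped as E, C, G and B in the order used for merging.
FAMILY_FLAGS = ["only_child", "orphan", "registered_poor_household",
                "student_disabled_or_ill", "parent_disabled_or_ill",
                "extreme_poverty_support", "minimum_living_allowance"]
CONSUMPTION = ["total_spend", "max_daily_spend", "avg_daily_spend",
               "max_monthly_spend", "avg_daily_purchases"]
GRADES = ["grade_point", "semester_average", "failed_courses"]
AID_FLAGS = ["green_channel", "origin_loan"]

def feature_row(student):
    """Concatenate the E, C, G and B parts of one student into a single feature row."""
    e = [1.0 if student[f] else 0.0 for f in FAMILY_FLAGS]
    c = [float(student[f]) for f in CONSUMPTION]
    g = [float(student[f]) for f in GRADES]
    b = [1.0 if student[f] else 0.0 for f in AID_FLAGS]
    return e + c + g + b

def feature_matrix(students):
    """Stack one feature row per student to form S."""
    return [feature_row(s) for s in students]

# One hypothetical student record.
example = {f: False for f in FAMILY_FLAGS + AID_FLAGS}
example.update({f: 0.5 for f in CONSUMPTION + GRADES})
example["orphan"] = True
S = feature_matrix([example])
```

With the seven family fields listed in step (1.1), this layout gives 7 + 5 + 3 + 2 = 17 columns, which equals the 17 input nodes stated for the network, although the text does not spell out that mapping.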
Step (3): dividing the student data in the student feature matrix S into three classes according to the national financial aid standard for poverty-stricken students, namely not poor, generally poor, and especially poor, and using one-hot codes as the poverty category labels to construct a training data set $T=\{(x_1,y_1),\dots,(x_i,y_i),\dots,(x_n,y_n)\}$, where the input data $x_i$ are drawn at random from the student feature matrix S, the label $y_i\in\{001,010,011\}$, with 001, 010 and 011 corresponding to not poor, generally poor, and especially poor respectively, $n$ is the number of samples, and T contains 70% of the data in the student feature matrix;
Step (4): as shown in FIG. 2, designing a BP-Adaboost classification model for poverty-stricken students and training it with weighted data, specifically:
(3.1) inputting the training data set T and initializing the weights of the training data, $D_1=(w_{1,1},\dots,w_{1,i},\dots,w_{1,N})$, where $w_{1,i}=1/N$, $i=1,2,\dots,N$, and $N$ denotes the amount of data in the student feature matrix S; at the same time, setting the iteration counter $m=1$ and the total number of iterations $M=10$;
(3.2) starting the iteration with a three-layer BP neural network consisting of an input layer with 17 nodes, a hidden layer with 18 nodes, and an output layer with 3 nodes;
(3.3) training on the training data set with the current weight distribution to obtain a weak classifier $G_m(x): X\to\{001,010,011\}$, where 001, 010 and 011 correspond to not poor, generally poor, and especially poor respectively;
(3.4) calculating the error rate $\mathrm{err}_m$ of the current classifier $G_m(x)$ on the training data:
(3.5) calculating the coefficient $\alpha_m$ of $G_m(x)$:
where $K$ denotes the poverty categories, with 1, 2 and 3 denoting not poor, generally poor, and especially poor, and $\alpha_m$ represents the importance of $G_m(x)$ in the final classifier; $\alpha_m$ increases as $\mathrm{err}_m$ decreases, that is, the smaller the classification error rate, the greater the classifier's contribution to the final classifier;
(3.6) updating the weight distribution of the training data set:
$D_{m+1}=(w_{m+1,1},\dots,w_{m+1,i},\dots,w_{m+1,N})$,
$w_{m+1,i}$ can be converted to the following formula:
from this, the weights of samples misclassified by the base classifier $G_m(x)$ are increased and the weights of correctly classified samples are decreased, so that the BP-Adaboost classification model pays more attention to the misclassified samples, which play a greater role in the next round of learning, thereby improving the classification ability of the model;
$Z_m$ is a normalization factor:
which makes $D_{m+1}$ a probability distribution;
(3.7) deciding whether to terminate the iteration: when $m<M$, setting $m=m+1$ and returning to step (3.3) to continue iterating; otherwise, terminating the iteration, completing the training of the BP-Adaboost classifier, and obtaining the final classifier
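The training loop of steps (3.1) to (3.7) can be sketched as follows. This is an illustration under several assumptions, not the patent's implementation: the error rate, coefficient and weight update use the standard multi-class AdaBoost (SAMME) form, since the formulas are not reproduced in this text; the weak learner is a small softmax network trained by backpropagation with per-sample weights; classes are integers 0 to 2 rather than the label codes; and the toy data, the hidden size of 4 and the names `train_bp` and `train_bp_adaboost` are hypothetical.

```python
import math
import random

def _sigmoid(v):
    v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-v))

def train_bp(X, y, w, k=3, hidden=4, epochs=100, lr=0.5, seed=0):
    """Train a one-hidden-layer network by backpropagation on weighted samples."""
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(k)]

    def forward(x):
        # The last entry of each weight row is the bias; zip stops before it.
        h = [_sigmoid(sum(a * b for a, b in zip(r, x)) + r[-1]) for r in W1]
        z = [sum(a * b for a, b in zip(r, h)) + r[-1] for r in W2]
        top = max(z)
        e = [math.exp(v - top) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    for _ in range(epochs):
        for x, t, wi in zip(X, y, w):
            h, p = forward(x)
            # Softmax cross-entropy gradient, scaled by the sample weight times n.
            dz = [wi * n * (p[c] - (1.0 if c == t else 0.0)) for c in range(k)]
            dh = [sum(dz[c] * W2[c][j] for c in range(k)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for c in range(k):
                for j in range(hidden):
                    W2[c][j] -= lr * dz[c] * h[j]
                W2[c][-1] -= lr * dz[c]
            for j in range(hidden):
                for i in range(d):
                    W1[j][i] -= lr * dh[j] * x[i]
                W1[j][-1] -= lr * dh[j]

    def predict(x):
        p = forward(x)[1]
        return max(range(k), key=p.__getitem__)
    return predict

def train_bp_adaboost(X, y, k=3, rounds=10):
    """Return a list of (alpha, weak classifier) pairs and the final sample weights."""
    n = len(X)
    w = [1.0 / n] * n                                    # (3.1) uniform initial weights
    ensemble = []
    for m in range(rounds):
        g = train_bp(X, y, w, k=k, seed=m)               # (3.2)-(3.3) weak classifier
        miss = [1.0 if g(x) != t else 0.0 for x, t in zip(X, y)]
        err = sum(wi * mi for wi, mi in zip(w, miss))    # (3.4) weighted error rate
        if err >= 1.0 - 1.0 / k:                         # assumed guard: no better than chance
            break
        err = max(err, 1e-10)                            # avoid division by zero
        alpha = math.log((1.0 - err) / err) + math.log(k - 1.0)   # (3.5), SAMME form
        ensemble.append((alpha, g))
        w = [wi * math.exp(alpha * mi) for wi, mi in zip(w, miss)]  # (3.6) reweight
        z = sum(w)                                       # normalization factor
        w = [wi / z for wi in w]
    return ensemble, w

# Toy data: three well-separated clusters in two dimensions.
X = [[0.1, 0.1], [0.2, 0.0], [0.0, 0.2], [0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
     [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
ensemble, weights = train_bp_adaboost(X, y, rounds=3)
```

For the configuration in the text, the same loop would be called with 17-dimensional rows, `hidden=18` and `rounds=10`. Passing the sample weights into the gradient is one way to make a neural network respect the weight distribution; resampling the training set according to the weights is a common alternative.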
Step (4): Obtain the data of the students to be identified, preprocess it, input the preprocessed data into the classification model, and use the classification result to assist in identifying poverty-stricken students. The specific steps are as follows:
(4.1) Extract the family and economic situation of the student to be identified, including whether the student is a solitary child, whether the student is an orphan, whether the student belongs to a registered ("file and card") impoverished household, whether the student is disabled or ill, the degree of the student's disability or illness, whether a parent is disabled or ill, the degree of the parent's disability or illness, whether the student is an urban or rural extreme-poverty support recipient, and whether the family is an urban or rural minimum-living-guarantee household; extract the campus consumption data, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extract the student achievement data, including grade point, average semester grade, and number of failed courses; extract the basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student-origin loan;
(4.2) Preprocess the acquired student data (missing-value handling, deduplication, feature encoding and normalization) and construct the student feature matrix S;
(4.3) Input the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain the identification result: an output of 1 means the student is non-poverty, an output of 2 means general poverty, and an output of 3 means special poverty;
(4.4) Verify the identification results of the classification model in practice, submit the lists of students found to be suspected cases of hidden poverty or of false identification to the college administrators for handling, and keep adjusting the model according to the feedback from this verification;
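Steps (4.3) and (4.4) can be sketched as a small post-processing routine. The sketch assumes that "hidden poverty" means the model predicts a higher poverty level than the one currently on record, and that "false identification" means it predicts a lower one; this reading, the function name `review_list`, and the tuple layout are hypothetical.

```python
# Output codes of step (4.3) mapped to their category names.
LABELS = {1: "non-poverty", 2: "general poverty", 3: "special poverty"}

def review_list(student_ids, predicted, recorded):
    """Collect students whose predicted level differs from the recorded level.

    Assumption: predicted > recorded is treated as suspected hidden poverty,
    predicted < recorded as suspected false identification.
    """
    flagged = []
    for sid, p, r in zip(student_ids, predicted, recorded):
        if p == r:
            continue  # model and record agree, nothing to review
        kind = "suspected hidden poverty" if p > r else "suspected false identification"
        flagged.append((sid, LABELS[r], LABELS[p], kind))
    return flagged
```

The resulting list is what would be handed to the administrators for manual checking; their decisions could then be fed back as corrected labels for retraining.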
While the foregoing shows and describes the principles and advantages of the present invention, the embodiments are not limited to the examples above, which are given by way of illustration only; various changes and modifications within the spirit and scope of the invention will be apparent to those skilled in the art from this disclosure.
Claims (2)
1. A method for identifying poverty-stricken students based on multi-classification BP-Adaboost, characterized by comprising the following steps:
Step 1: acquire the students' historical behavior data, obtaining multidimensional historical data on poverty-stricken students from past years, which comprises the students' family and economic situation, campus consumption, student achievement, and basic information on poverty-stricken students;
The specific steps for acquiring the multidimensional historical data on poverty-stricken students from past years and establishing the poverty-stricken student feature matrix are as follows:
1) Extract the family and economic situation of the student, including whether the student is a solitary child, whether the student is an orphan, whether the student belongs to a registered ("file and card") impoverished household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the student is an urban or rural extreme-poverty support recipient, and whether the family is an urban or rural minimum-living-guarantee household; extract the campus consumption data, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extract the student achievement data, including grade point, average semester grade, and number of failed courses; extract the basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student-origin loan;
2) Let the student family and economic situation data set be $E = \{e_1, e_2, \ldots, e_n\}$, where n denotes the student number and $e_n$ is a matrix consisting of whether the student is a solitary child, whether the student belongs to a registered impoverished household, whether a parent is disabled or ill, whether the student is an urban or rural extreme-poverty support recipient, and whether the family is an urban or rural minimum-living-guarantee household;
3) Let the campus consumption data set be $C = \{c_1, c_2, \ldots, c_n\}$, where n denotes the student number and $c_n$ is a matrix consisting of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day;
4) Let the student achievement data set be $G = \{g_1, g_2, \ldots, g_n\}$, where n denotes the student number and $g_n$ is a matrix consisting of grade point, average semester grade, and number of failed courses;
5) Let the poverty-stricken student basic situation data set be $B = \{b_1, b_2, \ldots, b_n\}$, where n denotes the student number and $b_n$ is a matrix consisting of whether the student enrolled through the green channel and whether the student has taken out a student-origin loan;
Step 2: preprocess the multidimensional historical data on poverty-stricken students from past years collected in step 1. The specific steps are as follows:
1) Handle missing values in the data set: missing values cause the data to lose part of its information, so empty fields are filled with the mean value;
2) Remove duplicate data: sort the past-year poverty-stricken student data by student number, detect duplicates by comparing whether adjacent records are similar, and delete any duplicate records;
3) Apply feature encoding, using one-hot encoding, to the student family and economic situation data set E and the poverty-stricken student basic situation data set B;
4) Normalization: normalize the campus consumption data set C and the student achievement data set G with the Sigmoid function to obtain the normalized campus consumption data set and the normalized student achievement data set;
5) Merge the student family and economic situation data set E, the normalized campus consumption data set, the normalized student achievement data set, and the poverty-stricken student basic situation data set B into the student feature matrix S;
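The five preprocessing sub-steps of step 2 can be sketched in plain Python as follows. The field names (`student_id`, `orphan`, `gpa`), the helper names, the use of exact equality as the test for "similar" adjacent records, and the two-column encoding of yes/no attributes are all assumptions made for illustration.

```python
import math

def fill_missing_with_mean(rows, col):
    """Replace None in a numeric column with the mean of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, col: mean if r[col] is None else r[col]} for r in rows]

def drop_adjacent_duplicates(rows, key="student_id"):
    """Sort by student number, then drop any record equal to its predecessor."""
    kept = []
    for r in sorted(rows, key=lambda r: r[key]):
        if not kept or kept[-1] != r:
            kept.append(r)
    return kept

def one_hot(flag):
    """Encode a yes/no attribute as two indicator columns (no, yes)."""
    return [0.0, 1.0] if flag else [1.0, 0.0]

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def feature_row(r, flag_cols, numeric_cols):
    """Concatenate one-hot flags and Sigmoid-normalized numbers into one row of S."""
    feats = []
    for c in flag_cols:
        feats.extend(one_hot(r[c]))
    feats.extend(sigmoid(r[c]) for c in numeric_cols)
    return feats

# Hypothetical records: one missing value and one exact duplicate.
rows = [
    {"student_id": 2, "orphan": False, "gpa": 3.0},
    {"student_id": 1, "orphan": True, "gpa": None},
    {"student_id": 2, "orphan": False, "gpa": 3.0},
    {"student_id": 3, "orphan": False, "gpa": 1.0},
]
rows = fill_missing_with_mean(rows, "gpa")
rows = drop_adjacent_duplicates(rows)
S = [feature_row(r, ["orphan"], ["gpa"]) for r in rows]
```

One practical point: the Sigmoid saturates quickly, so raw consumption amounts in the hundreds or thousands all map to values very close to 1. Centering or rescaling such columns before applying it may be necessary, although the text does not say whether this is done.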
Step 3: divide the multidimensional historical data on poverty-stricken students from past years into three categories by poverty level, attach a poverty category label to each student, and construct the training data set. The specific steps are as follows:
Classify the students into three levels according to their poverty grade in past years, namely non-poverty, general poverty and special poverty, and use one-hot coding as the student poverty class label to construct the training data set $T = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, where the input data $x_i$ is randomly extracted from the student feature matrix S, the label $y_i \in \{001, 010, 011\}$, with 001, 010 and 011 corresponding to non-poverty, general poverty and special poverty, respectively, and n is the number of records.
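A minimal sketch of how the training set T might be assembled is given below. The label codes are copied verbatim from the text; sampling without replacement, the fixed seed, and the name `build_training_set` are assumptions.

```python
import random

# Label codes as given in the text.
CODES = {"non-poverty": "001", "general poverty": "010", "special poverty": "011"}

def build_training_set(S, levels, n, seed=0):
    """Randomly draw n rows of S and pair each with the code of its poverty level.

    Assumption: rows are drawn without replacement.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(S)), n)
    return [(S[i], CODES[levels[i]]) for i in idx]
```

For example, `build_training_set(S, levels, n=len(S))` returns every row of S in shuffled order together with its label.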
Step 4: design the BP-Adaboost classification model and train it with the data set constructed from the feature matrix of past-year students of each poverty level extracted in step 1. The specific steps are as follows:
1) Input the training data set T and initialize the weights of the training data, $D_1 = (w_{11}, \ldots, w_{1i}, \ldots, w_{1N})$, where $w_{1i} = 1/N$, $i = 1, 2, \ldots, N$, and N is the number of records in the student feature matrix S; at the same time, set the iteration counter m = 1 and the total number of iterations M = 10;
2) Start the iteration using a three-layer BP neural network consisting of an input layer with 17 nodes, a hidden layer with 18 nodes, and an output layer with 3 nodes;
3) Train on the training data set with its current weight distribution to obtain a weak classifier $G_m(x): X \to \{001, 010, 011\}$, where 001, 010 and 011 correspond to non-poverty, general poverty and special poverty, respectively;
4) Compute the error rate of the current classifier $G_m(x)$ on the training data:
where $y_i \in \{001, 010, 011\}$, with 001, 010 and 011 corresponding to non-poverty, general poverty and special poverty, respectively, and n is the number of records;
5) Compute the coefficient $\alpha_m$ of $G_m(x)$:
where K denotes the poverty categories, $\alpha_m$ represents the importance of $G_m(x)$ in the final classifier, and $\alpha_m$ increases as $err_m$ decreases, i.e. the smaller the classification error rate, the greater the contribution of the classifier to the final classifier;
6) Update the weight distribution of the training data set:
$D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$,
$w_{m+1,i}$ can be rewritten as the following formula:
It follows that the weights of the samples misclassified by the base classifier $G_m(x)$ are increased and the weights of the correctly classified samples are decreased, so the BP-Adaboost classification model pays more attention to the misclassified samples, which play a greater role in the next round of learning, thereby improving the classification ability of the model;
$Z_m$ is a normalization factor:
which makes $D_{m+1}$ a probability distribution;
7) Decide whether to terminate the iteration: when m < M, set m = m + 1, return to sub-step 3) and continue iterating; otherwise, terminate the iteration, which completes the training of the BP-Adaboost classifier and yields the final classifier
Step 5: use the trained model to assist in identifying poverty-stricken students. The specific steps are as follows:
1) Extract the family and economic situation of the student to be identified, including whether the student is a solitary child, whether the student is an orphan, whether the student belongs to a registered ("file and card") impoverished household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the student is an urban or rural extreme-poverty support recipient, and whether the family is an urban or rural minimum-living-guarantee household; extract the campus consumption data, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average number of purchases per day; extract the student achievement data, including grade point, average semester grade, and number of failed courses; extract the basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student-origin loan;
2) Preprocess the acquired student data and construct the student feature matrix S;
3) Input the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain the identification result: an output of 1 means the student is non-poverty, an output of 2 means general poverty, and an output of 3 means special poverty.
2. The method for identifying poverty-stricken students based on multi-classification BP-Adaboost as claimed in claim 1, wherein the campus consumption data set C and the student achievement data set G are normalized using the Sigmoid function, the specific steps being as follows:
1) Normalize each item of data in the campus consumption data set C using the Sigmoid function to obtain the normalized student campus consumption data, which together form the normalized campus consumption data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010236492.XA CN111415099A (en) | 2020-03-30 | 2020-03-30 | Poverty-poverty identification method based on multi-classification BP-Adaboost |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111415099A (en) | 2020-07-14 |
Family
ID=71494673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010236492.XA (Pending; published as CN111415099A) | Poverty-poverty identification method based on multi-classification BP-Adaboost | 2020-03-30 | 2020-03-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111415099A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018090657A1 (en) * | 2016-11-18 | 2018-05-24 | 同济大学 | Bp_adaboost model-based method and system for predicting credit card user default |
CN108960273A (en) * | 2018-05-03 | 2018-12-07 | 淮阴工学院 | A kind of poor student's identification based on deep learning |
CN109145113A (en) * | 2018-08-24 | 2019-01-04 | 北京桃花岛信息技术有限公司 | A kind of student's poverty degree prediction technique based on machine learning |
CN109992592A (en) * | 2019-04-10 | 2019-07-09 | 哈尔滨工业大学 | Impoverished College Studentss recognition methods based on campus consumption card pipelined data |
Non-Patent Citations (1)
Title |
---|
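Wei Wei (魏巍): "Campus card analysis *** for university data analysis and identification of poverty-stricken students" (面向高校数据分析和贫困生认定的一卡通分析***), CNKI Outstanding Master's Theses Full-text Database, vol. 2019, no. 12, pages 228 - 232 * |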
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112231621A (en) * | 2020-10-13 | 2021-01-15 | 电子科技大学 | Method for reducing element detection limit based on BP-adaboost |
CN112231621B (en) * | 2020-10-13 | 2021-09-24 | 电子科技大学 | Method for reducing element detection limit based on BP-adaboost |
CN112416914A (en) * | 2020-10-15 | 2021-02-26 | 三峡大学 | Difficult student identification and early warning method and system based on big data analysis |
CN112541579A (en) * | 2020-12-23 | 2021-03-23 | 北京北明数科信息技术有限公司 | Model training method, poverty degree information identification method, device and storage medium |
CN112541579B (en) * | 2020-12-23 | 2023-08-08 | 北京北明数科信息技术有限公司 | Model training method, lean degree information identification method, device and storage medium |
CN113407516A (en) * | 2021-06-02 | 2021-09-17 | 浪潮软件股份有限公司 | Assisted object management method based on student status data |
CN116664014A (en) * | 2023-07-25 | 2023-08-29 | 临沂大学 | Comprehensive evaluation system and method for college student management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||