CN111415099A - Poverty-stricken student identification method based on multi-classification BP-Adaboost - Google Patents

Poverty-stricken student identification method based on multi-classification BP-Adaboost

Info

Publication number
CN111415099A
Authority
CN
China
Prior art keywords
poverty
student
data set
data
students
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010236492.XA
Other languages
Chinese (zh)
Inventor
杨建锋
魏瀚哲
王朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern University
Original Assignee
Northwestern University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern University filed Critical Northwestern University
Priority to CN202010236492.XA priority Critical patent/CN111415099A/en
Publication of CN111415099A publication Critical patent/CN111415099A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Marketing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A poverty-stricken student identification method based on multi-classification BP-Adaboost comprises the following steps: (1) acquiring multi-dimensional historical data of poverty-stricken students from past years; (2) preprocessing the acquired historical data to construct a student feature matrix S; (3) dividing the multi-dimensional historical data into three categories according to the degree of poverty, attaching a poverty-category label to each student, and constructing a training data set; (4) designing a BP-Adaboost classification model and training it with the data set constructed from the past-year feature matrices of each poverty degree; (5) using the trained model to assist the identification of poverty-stricken students. The invention uses the behavior data that students generate at school to design a BP-Adaboost-based multi-classification model that can quickly and accurately classify students into three poverty categories, so that each student's poverty status can be judged to assist the college and university staff who manage aid for poverty-stricken students in making decisions.

Description

Poverty-stricken student identification method based on multi-classification BP-Adaboost
Technical Field
The invention belongs to the technical field of feature extraction and classification algorithms, and particularly relates to a poverty-stricken student identification method based on multi-classification BP-Adaboost.
Background
Student financial aid is an important measure for poverty alleviation, for promoting fairness in education, and thereby for achieving social fairness. Identifying poverty-stricken students in colleges and universities is the basic work for effectively implementing national student aid policies and an important part of making student aid precise. At present, most colleges and universities identify poverty-stricken students through class-level appraisal and review by college counselors, after the student presents supporting evidence issued by the village or town where the student lives. This identification mode suffers from deviations in the identification result, from individual subjective feelings easily entering each stage of the evaluation, and from poor students giving up the evaluation out of self-esteem, all of which affect the fairness, efficiency and accuracy of the identification.
The arrival of the big-data era and the growing maturity of advanced learning methods bring new ideas and technical support to the funding of poverty-stricken students, and give colleges and universities new opportunities to use big data and advanced learning methods to make funding work fast, convenient, efficient and accurate. Campus informatization has developed considerably: the activities of students on campus generate data that record the students' characteristics and reflect their real circumstances. Applied reasonably, these data can assist the identification of poverty-stricken students to a certain extent, make the identification result more realistic and objective, and direct more help to students who are genuinely poor.
At present, the use of big data and machine learning to assist the identification of poverty-stricken students is still at an exploratory stage, and no unified identification method exists in China. Although some techniques offer useful ideas, they either cannot meet practical needs or are difficult to implement. For example, patent application No. 201810972342.8, entitled "Student poverty degree prediction method based on machine learning", predicts the degree of student poverty from behavior data generated by students at school, but it requires dozens of types of data, which easily leads to the curse of dimensionality and increases the difficulty of implementation.
Therefore, effectively and accurately identifying poverty-stricken students by means of big data and machine learning has become the key to research on assisting the precise funding of poverty-stricken students.
Disclosure of Invention
In order to solve the problems in the prior art that high-dimensional data on poverty-stricken students are difficult to process and that poverty-stricken students are difficult to fund accurately, the invention provides a poverty-stricken student identification method based on multi-classification BP-Adaboost.
In order to achieve this purpose, the invention adopts the following technical scheme:
a poverty-stricken student identification method based on multi-classification BP-Adaboost, characterized by comprising the following steps:
step 1, obtaining historical behavior data of students, namely multi-dimensional historical data of poverty-stricken students from past years, wherein the data comprise the students' family and economic conditions, campus consumption, academic performance, and basic information on poverty-stricken students;
the specific steps of acquiring the past-year multi-dimensional historical data and establishing the poverty-stricken student feature matrix are as follows:
1) extracting the student's family and economic conditions, including whether the student is an only child, whether the student is an orphan, whether the family is a registered (file-and-card) poor household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family belongs to the urban or rural extremely-poor persons receiving support, and whether the family is an urban or rural minimum-living-guarantee household; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily number of purchases; extracting academic performance, including grade point, semester average score and number of failed courses; extracting basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student loan from the place of origin;
2) let the family and economic condition data set be $E = \{e_1, e_2, \ldots, e_n\}$, where $n$ denotes the student number and $e_n$ is a matrix composed of whether the student is an only child, whether the student is an orphan, whether the family is a registered poor household, whether the student is the child of a martyr or of a family entitled to preferential care, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family belongs to the urban or rural extremely-poor persons receiving support, and whether the family is an urban or rural minimum-living-guarantee household;
3) let the campus consumption data set be $C = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the student number and $c_n$ is a matrix composed of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily number of purchases;
4) let the academic performance data set be $G = \{g_1, g_2, \ldots, g_n\}$, where $n$ denotes the student number and $g_n$ is a matrix composed of grade point, semester average score and number of failed courses;
5) let the basic-information data set of poverty-stricken students be $B = \{b_1, b_2, \ldots, b_n\}$, where $n$ denotes the student number and $b_n$ is a matrix composed of whether the student enrolled through the green channel and whether the student has taken out a student loan from the place of origin;
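The four per-student groups above can be pictured as small records that are flattened into one numeric row. The sketch below is only an illustration: the class and field names are hypothetical, yes/no items are assumed to be stored as 0 or 1, and the family group uses the seven items listed in sub-step 1). Under those assumptions a row has 7 + 5 + 3 + 2 = 17 entries; whether the patent counts its features this way is not stated.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass
class FamilyEconomic:          # e_n: yes/no indicators stored as 0 or 1 (assumption)
    only_child: int
    orphan: int
    registered_poor_household: int
    student_disabled_or_ill: int
    parent_disabled_or_ill: int
    extreme_poverty_support: int
    minimum_living_guarantee: int


@dataclass
class CampusConsumption:       # c_n
    total_amount: float
    max_daily_amount: float
    avg_daily_amount: float
    max_monthly_amount: float
    avg_daily_purchases: float


@dataclass
class Achievement:             # g_n
    grade_point: float
    semester_average: float
    failed_courses: int


@dataclass
class BasicInfo:               # b_n
    green_channel: int
    origin_loan: int


def feature_row(e: FamilyEconomic, c: CampusConsumption,
                g: Achievement, b: BasicInfo) -> List[float]:
    """Concatenate the four groups, in the order E, C, G, B, into one flat row."""
    return [float(v) for part in (e, c, g, b) for v in astuple(part)]


# Made-up values for a single student, for illustration only.
row = feature_row(
    FamilyEconomic(0, 0, 1, 0, 1, 0, 1),
    CampusConsumption(5200.0, 45.0, 17.5, 640.0, 2.4),
    Achievement(3.1, 78.5, 1),
    BasicInfo(1, 1),
)
print(len(row))
```

Keeping the groups separate until the last moment makes it easy to apply a different preprocessing step to each group before they are merged.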
step 2, preprocessing the past-year multi-dimensional historical data collected in step 1; the specific steps are as follows:
1) processing missing values in the data set: missing values cause the data to lose part of their information, so empty fields are filled with the mean value;
2) removing duplicate data: the past-year data are sorted by student number, adjacent records are compared to detect whether a record is repeated, and repeated records are deleted;
3) feature-coding the family and economic condition data set E and the basic-information data set B using one-hot coding;
4) normalization: normalizing the campus consumption data set C and the academic performance data set G with the Sigmoid function;
5) merging the family and economic condition data set E, the normalized campus consumption data set $\tilde{C}$, the normalized academic performance data set $\tilde{G}$ and the basic-information data set B into the student feature matrix S;
step 3, dividing the past-year multi-dimensional historical data into three categories according to the degree of poverty, attaching a poverty-category label to each student, and constructing a training data set; the specific steps are as follows:
classifying the students into three levels according to their past-year poverty grade, namely non-poor, generally poor and especially poor, and using one-hot coding as the students' poverty-category label to construct a training data set $T = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, where the input data $x_i$ are randomly drawn from the student feature matrix S, the label $y_i \in \{001, 010, 011\}$, where 001, 010 and 011 correspond to non-poor, generally poor and especially poor respectively, $n$ is the amount of data, and the amount of data in T is 70% of the student feature matrix;
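The construction of T can be sketched as a random 70% draw from the rows of S together with their label codes. The function name, the fixed random seed and the decision to return the remaining 30% as a second set are assumptions made for the illustration; the text does not say how the remaining rows are used.

```python
import numpy as np

# Label codes exactly as given in the text.
LABEL_CODES = {"non-poor": "001", "generally poor": "010", "especially poor": "011"}


def build_training_set(S, grades, train_fraction=0.7, seed=0):
    """Randomly draw train_fraction of the rows of S, with their label codes, as T."""
    S = np.asarray(S, dtype=float)
    codes = np.array([LABEL_CODES[g] for g in grades])
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(S))            # random order of the row indices
    n_train = int(round(train_fraction * len(S)))
    train_idx, rest_idx = order[:n_train], order[n_train:]
    return (S[train_idx], codes[train_idx]), (S[rest_idx], codes[rest_idx])


# Ten made-up students with two features each.
S_demo = np.arange(20, dtype=float).reshape(10, 2)
grades_demo = ["non-poor", "generally poor", "especially poor", "non-poor", "non-poor",
               "generally poor", "especially poor", "non-poor", "generally poor", "non-poor"]
(X_train, y_train), (X_rest, y_rest) = build_training_set(S_demo, grades_demo)
print(X_train.shape, X_rest.shape)
```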
step 4, designing a BP-Adaboost classification model and training it with the data set constructed from the past-year feature matrices of each poverty degree extracted in step 1; the specific steps are as follows:
1) inputting the training data set T and initializing the weights of the training data, $D_1 = (w_{11}, \ldots, w_{1i}, \ldots, w_{1N})$, where $w_{1i} = 1/N$, $i = 1, 2, \ldots, N$, and $N$ is the amount of data in the student feature matrix S; at the same time, setting the iteration counter $m = 1$ and the total number of iterations $M = 10$;
2) starting the iteration with a three-layer BP neural network comprising an input layer, a hidden layer and an output layer, where the input layer has 17 nodes, the hidden layer has 18 nodes and the output layer has 3 nodes;
3) training on the training data set with the current weight distribution to obtain a weak classifier $G_m(x): X \to \{001, 010, 011\}$, where 001, 010 and 011 correspond to non-poor, generally poor and especially poor respectively;
4) calculating the error rate $err_m$ of the current classifier $G_m(x)$ on the training data:
[formula image not reproduced: error rate $err_m$ of $G_m(x)$]
5) calculating the coefficient $\alpha_m$ of $G_m(x)$:
[formula image not reproduced: coefficient $\alpha_m$]
where $K$ denotes the poverty categories and $\alpha_m$ represents the importance of $G_m(x)$ in the final classifier; $\alpha_m$ increases as $err_m$ decreases, i.e. the smaller the classification error rate, the greater the classifier's contribution to the final classifier;
6) updating the weight distribution of the training data set:
$D_{m+1} = (w_{m+1,1}, \ldots, w_{m+1,i}, \ldots, w_{m+1,N})$,
[formula image not reproduced: update rule for $w_{m+1,i}$]
$w_{m+1,i}$ can be rewritten in the following form:
[formula image not reproduced: case-by-case form of $w_{m+1,i}$]
it follows that the weights of the samples misclassified by the basic classifier $G_m(x)$ are enlarged and the weights of the correctly classified samples are reduced, so that the BP-Adaboost classification model focuses more on the misclassified samples, which play a greater role in the next round of learning, thereby improving the classification capability of the model;
$Z_m$ is a normalization factor:
[formula image not reproduced: normalization factor $Z_m$]
which makes $D_{m+1}$ a probability distribution;
7) judging whether to terminate the iteration: when $m < M$, setting $m = m + 1$, returning to sub-step 3) and continuing to iterate; otherwise, terminating the iteration, which completes the training of the BP-Adaboost classifier and yields the final classifier
[formula image not reproduced: final classifier]
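The formulas for this step are not available as text. The surrounding description (a coefficient that involves $K$ and grows as $err_m$ falls, a normalization factor $Z_m$, and larger weights for misclassified samples) is consistent with the multi-class AdaBoost variant commonly called SAMME. Assuming that variant, taking $K$ as the number of categories and writing $I(\cdot)$ for the indicator function, the quantities would read as follows. This is a sketch under that assumption; the constants and exact form used in the patent may differ.

```latex
\begin{align*}
err_m &= \frac{\sum_{i=1}^{N} w_{mi}\, I\big(G_m(x_i) \neq y_i\big)}{\sum_{i=1}^{N} w_{mi}}, \\
\alpha_m &= \ln\frac{1 - err_m}{err_m} + \ln(K - 1), \\
w_{m+1,i} &= \frac{w_{mi}}{Z_m}\exp\big(\alpha_m\, I(G_m(x_i) \neq y_i)\big)
  = \begin{cases}
      \dfrac{w_{mi}}{Z_m}, & G_m(x_i) = y_i, \\[2ex]
      \dfrac{w_{mi}\, e^{\alpha_m}}{Z_m}, & G_m(x_i) \neq y_i,
    \end{cases} \\
Z_m &= \sum_{i=1}^{N} w_{mi}\exp\big(\alpha_m\, I(G_m(x_i) \neq y_i)\big), \\
G(x) &= \arg\max_{k} \sum_{m=1}^{M} \alpha_m\, I\big(G_m(x) = k\big).
\end{align*}
```

In this form $\alpha_m$ is positive only when $err_m < (K-1)/K$, that is, when the weak classifier does better than random guessing among $K$ categories. Dividing by $Z_m$ is what lowers the relative weight of correctly classified samples, since their unnormalized weight is unchanged.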
step 5, using the trained model to assist the identification of poverty-stricken students; the specific steps are as follows:
1) extracting the family and economic conditions of the student to be identified, including whether the student is an only child, whether the student is an orphan, whether the family is a registered (file-and-card) poor household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family belongs to the urban or rural extremely-poor persons receiving support, and whether the family is an urban or rural minimum-living-guarantee household; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily number of purchases; extracting academic performance, including grade point, semester average score and number of failed courses; extracting basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student loan from the place of origin;
2) preprocessing the acquired student data and constructing the student feature matrix S;
3) inputting the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain the identification result: if the output is 1 the student is non-poor, if the output is 2 the student is generally poor, and if the output is 3 the student is especially poor.
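How the outputs of the individual weak classifiers are combined into the single result 1, 2 or 3 is given by the final-classifier formula, which is not available as text. The usual AdaBoost aggregation is a weighted vote in which each weak classifier adds its coefficient to the category it predicts. The sketch below assumes that rule; the function name, the tie-breaking behavior and the example numbers are illustrative only.

```python
from collections import defaultdict

CATEGORY_NAMES = {1: "non-poor", 2: "generally poor", 3: "especially poor"}


def weighted_vote(weak_outputs, alphas):
    """Return the category (1, 2 or 3) whose weak classifiers have the largest total coefficient."""
    score = defaultdict(float)
    for category, alpha in zip(weak_outputs, alphas):
        score[category] += alpha
    # Ties are resolved in favour of the lowest category number (an arbitrary choice).
    return max(sorted(score), key=lambda k: score[k])


# Four hypothetical weak classifiers voting on one student.
outputs = [1, 2, 2, 3]
alphas = [0.9, 0.5, 0.6, 0.3]
result = weighted_vote(outputs, alphas)
print(result, CATEGORY_NAMES[result])
```

Here category 2 wins because its two weaker votes together outweigh the single stronger vote for category 1.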
The campus consumption data set C and the academic performance data set G are normalized with the Sigmoid function; the specific steps are as follows:
1) normalizing each item of data in the campus consumption data set C with the Sigmoid function, where $\tilde{c}_n$ denotes the normalized campus consumption data of a student,
$$\tilde{c}_n = \frac{1}{1 + e^{-c_n}},$$
and the normalized campus consumption data set is recorded as $\tilde{C} = \{\tilde{c}_1, \tilde{c}_2, \ldots, \tilde{c}_n\}$;
2) normalizing each item of data in the academic performance data set G with the Sigmoid function, where $\tilde{g}_n$ denotes the normalized academic performance data of a student,
$$\tilde{g}_n = \frac{1}{1 + e^{-g_n}},$$
and the normalized academic performance data set is recorded as $\tilde{G} = \{\tilde{g}_1, \tilde{g}_2, \ldots, \tilde{g}_n\}$.
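A minimal sketch of the element-wise Sigmoid normalization, using NumPy. The function name and the sample values are made up for the illustration.

```python
import numpy as np


def sigmoid_normalize(values):
    """Map each entry x to 1 / (1 + exp(-x)); the result lies between 0 and 1."""
    x = np.asarray(values, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))


# Made-up record: grade point, semester average score, number of failed courses.
g_demo = np.array([3.2, 78.5, 0.0])
g_norm = sigmoid_normalize(g_demo)
print(g_norm)
```

Because the Sigmoid function saturates quickly, raw values in the tens or hundreds (such as consumption amounts) all map to numbers very close to 1. A practical implementation might rescale or standardize such columns first; the text does not say whether it does.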
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention provides a poverty-stricken student identification method based on multi-classification BP-Adaboost, which changes the traditional identification mode and overcomes human subjectivity by using a machine learning method in the identification process; compared with existing machine-learning methods for identifying poverty-stricken students, the method selects the key factors in the identification, reduces the dimensionality of the student data and avoids the curse of dimensionality in machine learning; the method uses BP-Adaboost as the classifier, has higher classification precision, and effectively improves the accuracy of the identification of poverty-stricken students.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flowchart of the training of the BP-Adaboost classification model.
Detailed Description
The present invention will be further described with reference to the following embodiments and the accompanying drawings, but the present invention is not limited to the following embodiments.
A poverty-stricken student identification method based on multi-classification BP-Adaboost comprises the following steps:
step (1): collecting historical data of poverty-stricken students from past years; the multi-dimensional historical data comprise the students' family conditions, economic conditions, campus consumption, academic performance and basic information on poverty-stricken students, from which a past-year poverty-stricken student feature matrix is established; the classification model of the invention is built on the features of the poverty-stricken student data, so accurate selection of the basic data lays the foundation for accurate classification later on; the specific steps are as follows (1.1) to (1.5):
(1.1) extracting the student's family and economic conditions, including whether the student is an only child, whether the student is an orphan, whether the family is a registered (file-and-card) poor household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family belongs to the urban or rural extremely-poor persons receiving support, and whether the family is an urban or rural minimum-living-guarantee household; extracting campus consumption, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily number of purchases; extracting academic performance, including grade point, semester average score and number of failed courses; extracting basic information on poverty-stricken students, including whether the student enrolled through the green channel and whether the student has taken out a student loan from the place of origin;
(1.2) let the family and economic condition data set be $E = \{e_1, e_2, \ldots, e_n\}$, where $n$ denotes the student number and $e_n$ is a matrix composed of whether the student is an only child, whether the student is an orphan, whether the family is a registered poor household, whether the student is the child of a martyr or of a family entitled to preferential care, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family belongs to the urban or rural extremely-poor persons receiving support, and whether the family is an urban or rural minimum-living-guarantee household; the family and economic condition data set E is thereby established;
(1.3) let the campus consumption data set be $C = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the student number and $c_n$ is a matrix composed of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily number of purchases; the campus consumption data set C is thereby established;
(1.4) let the academic performance data set be $G = \{g_1, g_2, \ldots, g_n\}$, where $n$ denotes the student number and $g_n$ is a matrix composed of grade point, semester average score and number of failed courses; the academic performance data set G is thereby established;
(1.5) let the basic-information data set of poverty-stricken students be $B = \{b_1, b_2, \ldots, b_n\}$, where $n$ denotes the student number and $b_n$ is a matrix composed of whether the student enrolled through the green channel and whether the student has taken out a student loan; the basic-information data set B is thereby established;
step (2): data obtained in practice often contain missing and duplicate values; for example, student consumption records may be missing because of a faulty canteen card reader, so the data must be preprocessed before use; preprocessing has no standard procedure, and the preprocessing here is designed only for the process of the invention, as described in steps (2.1) to (2.5):
(2.1) processing missing values in the data set: missing values cause the data to lose part of their information, and some less robust models cannot compute on data with missing values; the campus consumption data and academic performance data used by the method may have missing entries because of the acquisition equipment or other reasons, and the empty fields are filled with the mean value;
(2.2) removing duplicate data: the past-year data are sorted by student number, adjacent records are compared to detect whether a record is repeated, and repeated records are deleted;
(2.3) feature-coding the family and economic condition data set E and the basic-information data set B using one-hot coding;
(2.4) data normalization adjusts attribute values by scaling the data into a small specific interval; in this implementation the campus consumption data set C and the academic performance data set G are normalized with the Sigmoid function, as described in steps (2.4.1) and (2.4.2):
(2.4.1) normalizing each item of data in the campus consumption data set C with the Sigmoid function, where $\tilde{c}_n$ denotes the normalized campus consumption data of a student,
$$\tilde{c}_n = \frac{1}{1 + e^{-c_n}},$$
and the normalized campus consumption data set is recorded as $\tilde{C} = \{\tilde{c}_1, \tilde{c}_2, \ldots, \tilde{c}_n\}$;
(2.4.2) normalizing each item of data in the academic performance data set G with the Sigmoid function, where $\tilde{g}_n$ denotes the normalized academic performance data of a student,
$$\tilde{g}_n = \frac{1}{1 + e^{-g_n}},$$
and the normalized academic performance data set is recorded as $\tilde{G} = \{\tilde{g}_1, \tilde{g}_2, \ldots, \tilde{g}_n\}$;
(2.5) merging the family and economic condition data set E, the normalized campus consumption data set $\tilde{C}$, the normalized academic performance data set $\tilde{G}$ and the basic-information data set B into the student feature matrix S;
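The mean filling, duplicate removal and merging of step (2) can be sketched with NumPy as below. Several choices here are assumptions rather than details from the text: missing entries are represented as NaN, two adjacent records count as duplicates only if the student number and every field are exactly equal (the text only says "similar"), and the merge is a plain column-wise concatenation in the order E, C, G, B. All function names are hypothetical.

```python
import numpy as np


def fill_missing_with_mean(X):
    """Replace NaN entries with the mean of the observed values in the same column."""
    X = np.array(X, dtype=float)               # copy, so the caller's data is untouched
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X


def drop_adjacent_duplicates(ids, X):
    """Sort by student number, then drop any record identical to the one before it."""
    ids = np.asarray(ids)
    order = np.argsort(ids, kind="stable")
    ids, X = ids[order], np.asarray(X)[order]
    keep = np.ones(len(ids), dtype=bool)
    for i in range(1, len(ids)):
        if ids[i] == ids[i - 1] and np.array_equal(X[i], X[i - 1]):
            keep[i] = False
    return ids[keep], X[keep]


def merge_features(E, C_norm, G_norm, B):
    """Concatenate the four per-student blocks column-wise into the feature matrix S."""
    return np.hstack([E, C_norm, G_norm, B])


# Made-up records: student 2 appears twice, student 1 has a missing first field.
ids_demo = [2, 1, 2]
X_demo = [[10.0, 3.0], [np.nan, 4.0], [10.0, 3.0]]
filled = fill_missing_with_mean(X_demo)
ids_out, X_out = drop_adjacent_duplicates(ids_demo, filled)
print(ids_out, X_out)
```

Filling missing values before removing duplicates follows the order of steps (2.1) and (2.2); it also keeps the equality test simple, since NaN never compares equal to itself.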
step (3): dividing the data in the student feature matrix S into three classes according to the national funding standard for poverty-stricken students, namely non-poor, generally poor and especially poor, and using one-hot coding as the students' poverty-category label to construct a training data set $T = \{(x_1, y_1), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, where the input data $x_i$ are randomly drawn from the student feature matrix S, the label $y_i \in \{001, 010, 011\}$, where 001, 010 and 011 correspond to non-poor, generally poor and especially poor respectively, $n$ is the amount of data, and the amount of data in T is 70% of the student feature matrix;
and (4): as shown in fig. two, a BP-Adaboost poor living classification model is designed, and the classification model is trained by using data with weights, and the specific steps are as follows:
(3.1) inputting training data set T, initializing weight D ═ W of training data11,…,W1i,…,w1n) Wherein w is1i1/N, i is 1,2, … N, N represents the amount of data in the student feature matrix S; meanwhile, setting the iteration number M to be 1, and setting the total iteration number to be M, wherein the M is 10;
(3.2) starting iteration, adopting a three-layer neural network, wherein the neural network adopts a BP neural network and comprises an input layer, a hidden layer and an output layer, the input layer is provided with 17 nodes, the hidden layer is provided with 18 nodes, and the output layer is provided with 3 nodes;
(3.3) training the training data set with weight distribution to obtain a weak classifier: : gm(x) The method comprises the following steps X → {001, 010,011 }, where 001,010,011 correspond to poverty, general poverty, and extra poverty, respectively;
(3.4) calculating the training data in the current classifier Gm(x) Error rate of:
Figure BDA0002431162660000111
(3.5) calculation of Gm(x) Coefficient α ofm
Figure BDA0002431162660000112
K denotes species of poverty, 1,2, 3 denote poverty, general poverty and special poverty, αmRepresents Gm(x) Importance in the final classifier, αmWith errmDecreasing and increasing, i.e. the smaller the classification error rate, the greater the contribution of the classifier in the final classifier; (3.6) updating the weight distribution of the training data set:
Dm+1=(wm+1,1,…,Wm+1,i,…,Wm+1,N),
Figure BDA0002431162660000121
Wm+1,ican be converted to the following formula:
Figure BDA0002431162660000122
from this, the basic classifier Gm(x) The weight of the misclassified samples is enlarged, and the weight of the correctly classified samples is reduced, so that the BP-Adaboost classification model focuses more on the misclassified samples, and the misclassified samples play a greater role in the next round of learning, thereby improving the classification capability of the classification model;
Zmis a normalization factor:
Figure BDA0002431162660000123
it makes Dm+1Becoming a probability distribution;
(3.7) judging whether to terminate iteration, and when M is less than M, skipping to the step (3.3), wherein the iteration time M is M +1, and continuing to iterate; otherwise, terminating iteration, finishing the training of the BP-Adaboost classifier, and obtaining the final classifier
Figure BDA0002431162660000124
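Steps (3.4) to (3.7) can be sketched in code. The formula images are not reproduced in the text, so the sketch below assumes the standard multi-class AdaBoost (SAMME) expressions, with K taken as the number of classes; the function names `boosting_round` and `final_classifier` are hypothetical, and the BP weak classifier itself is represented only by its predictions.

```python
import math


def boosting_round(weights, predictions, labels, num_classes=3):
    """One boosting round under the assumed SAMME formulas.

    Returns (err_m, alpha_m, new_weights) for a weak classifier whose
    predictions on the training samples are given.
    """
    total = sum(weights)
    missed = [p != y for p, y in zip(predictions, labels)]
    # Weighted error rate of the current weak classifier.
    err = sum(w for w, bad in zip(weights, missed) if bad) / total
    if err <= 0.0 or err >= 1.0 - 1.0 / num_classes:
        raise ValueError("weak classifier must satisfy 0 < err < 1 - 1/K")
    # Classifier coefficient: grows as the error rate shrinks.
    alpha = math.log((1.0 - err) / err) + math.log(num_classes - 1)
    # Scale up misclassified samples, then normalize by Z_m so that the
    # weights again form a probability distribution.
    scaled = [w * math.exp(alpha) if bad else w for w, bad in zip(weights, missed)]
    z = sum(scaled)
    return err, alpha, [w / z for w in scaled]


def final_classifier(alphas, votes, classes=(1, 2, 3)):
    """Weighted vote: the class with the largest summed coefficient wins."""
    scores = {k: 0.0 for k in classes}
    for alpha, vote in zip(alphas, votes):
        scores[vote] += alpha
    return max(classes, key=lambda k: scores[k])
```

With four equally weighted samples and one misclassification, `boosting_round([0.25] * 4, [1, 2, 3, 1], [1, 2, 3, 2])` gives an error rate of 0.25 and moves most of the weight onto the misclassified fourth sample, which is the behaviour described in step (3.6).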
And (4): the method comprises the following steps of obtaining data of students to be identified, preprocessing the data of the students, inputting the preprocessed data into a classification model, and using a classification result for auxiliary identification of poverty-stricken students, wherein the specific steps are as follows:
(4.1) extracting the family condition and economic condition of the student to be identified, including whether the student is a solitary child or not, whether the student is an orphan or not, whether the student is a card-setting impoverished or not, whether the student is disabled or ill or not, the level of the disabled or ill degree of the student, whether the parent is disabled or ill or not, whether the parent is disabled or ill degree, whether the person is particularly stranded in urban and rural areas or whether the family is the lowest life guarantee family in urban and rural areas; extracting campus consumption conditions including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount and average daily consumption times; extracting student achievement conditions including achievement points, average achievement of a scholarly period and the number of hung disciplines; extracting basic conditions of poverty and poverty, including whether to enter a school through a green channel or not and whether to transact a biographical loan or not;
(4.2) preprocessing the acquired student data, wherein the preprocessing step comprises missing value processing, duplicate removal, feature coding and normalization, and constructing a student feature matrix S;
(4.3) inputting the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain an identification result: an output of 1 indicates non-poverty, an output of 2 indicates general poverty, and an output of 3 indicates special poverty;
(4.4) verifying the identification results of the classification model in practice, submitting the resulting lists of students with suspected hidden poverty and of suspected misidentified students to college administrators for handling, and continuously adjusting the model according to the verification feedback;
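Step (4.4) does not say how the two review lists are derived. One possible reading, shown here only as an assumption, is to compare the model output with the current official list: students predicted as class 2 or 3 who are not on the list are treated as suspected hidden poverty, and listed students predicted as class 1 are treated as suspected misidentification. The function name `review_lists` and the data layout are hypothetical.

```python
def review_lists(predicted, registered):
    """Build the two lists handed to administrators for manual review.

    predicted:  dict mapping student id -> model output (1, 2 or 3)
    registered: set of student ids currently on the impoverished list
    """
    # Predicted poor (class 2 or 3) but absent from the official list.
    suspected_hidden = sorted(
        sid for sid, cls in predicted.items()
        if cls in (2, 3) and sid not in registered
    )
    # On the official list but predicted as non-poverty (class 1).
    suspected_misidentified = sorted(
        sid for sid, cls in predicted.items()
        if cls == 1 and sid in registered
    )
    return suspected_hidden, suspected_misidentified
```

Both lists are only candidates for manual verification; the feedback from that verification is what step (4.4) uses to adjust the model.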
while the foregoing shows and describes the principles of the present invention, together with the advantages thereof, the embodiments of the invention are not limited by the foregoing examples, which are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this disclosure.

Claims (2)

1. An impoverished-student identification method based on multi-classification BP-Adaboost, characterized by comprising the following steps:
step 1, obtaining historical behavior data of students, namely multidimensional historical data of impoverished students from past years, comprising the students' family and economic situation, campus consumption situation, student achievement situation, and basic information on impoverished students;
the specific steps of acquiring the multidimensional historical data of impoverished students from past years and establishing the impoverished-student feature matrix are as follows:
1) extracting the family and economic situation of the student, including whether the student is an only child, whether the student is an orphan, whether the family is a registered (filed and carded) impoverished household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family members are urban or rural extremely-poor supported persons, and whether the family is an urban or rural minimum-living-guarantee household; extracting the campus consumption situation, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average daily number of transactions; extracting the student achievement situation, including grade point, semester average score, and number of failed courses; extracting the basic situation of impoverished students, including whether the student enrolled through the green channel and whether the student has taken out a student-origin (home-region) loan;
2) let the student family and economic situation data set be E = {e_1, e_2, …, e_n}, where n denotes the student number and e_n is a matrix consisting of whether the student is an only child, whether the family is a registered impoverished household, whether a parent is disabled or ill, whether the family members are urban or rural extremely-poor supported persons, and whether the family is an urban or rural minimum-living-guarantee household;
3) let the campus consumption situation data set be C = {c_1, c_2, …, c_n}, where n denotes the student number and c_n is a matrix composed of total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average daily number of transactions;
4) let the student achievement situation data set be G = {g_1, g_2, …, g_n}, where n denotes the student number and g_n is a matrix composed of grade point, semester average score, and number of failed courses;
5) let the impoverished-student basic situation data set be B = {b_1, b_2, …, b_n}, where n denotes the student number and b_n is a matrix formed by whether the student enrolled through the green channel and whether the student has taken out a student-origin loan;
step 2, preprocessing the multidimensional historical data of impoverished students from past years collected in step 1; the specific steps are as follows:
1) processing missing values in the data set: missing values cause the data to lose part of its information, so missing fields are filled with the average value;
2) removing duplicate data: sorting the past-year impoverished-student data by student number, detecting duplicate records by comparing whether adjacent records are similar, and deleting any duplicates found;
3) applying feature coding, in one-hot form, to the student family and economic situation data set E and the impoverished-student basic situation data set B;
4) normalization, namely, normalizing the campus consumption condition data set C and the student achievement condition data set G by using a Sigmoid function, and recording the normalized campus consumption condition data set as
Figure FDA0002431162650000021
and the normalized student achievement situation data set as
Figure FDA0002431162650000022
5) merging the student family and economic situation data set E, the normalized campus consumption situation data set
Figure FDA0002431162650000023
the normalized student achievement situation data set
Figure FDA0002431162650000024
and the impoverished-student basic situation data set B into a student feature matrix S;
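The preprocessing in step 2 can be sketched with the standard library alone. The field names in the usage are hypothetical, duplicate detection is simplified to exact equality of adjacent records (the claim says "similar"), and the two-column one-hot layout for yes/no fields is an assumption.

```python
import math


def fill_missing_with_mean(rows, column):
    """Replace None in one numeric column with the mean of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    mean = sum(known) / len(known)
    return [dict(r, **{column: mean if r[column] is None else r[column]}) for r in rows]


def drop_adjacent_duplicates(rows, key="student_id"):
    """Sort by student number, then drop records equal to their predecessor."""
    kept = []
    for row in sorted(rows, key=lambda r: r[key]):
        if not kept or kept[-1] != row:
            kept.append(row)
    return kept


def one_hot(flag):
    """Two-column one-hot code for a yes/no field: no -> [1, 0], yes -> [0, 1]."""
    return [0, 1] if flag else [1, 0]


def sigmoid(value):
    """Sigmoid normalization; assumes non-negative inputs such as amounts."""
    return 1.0 / (1.0 + math.exp(-value))


def feature_row(record, binary_fields, numeric_fields):
    """Concatenate one-hot coded flags and sigmoid-normalized numbers."""
    encoded = [bit for f in binary_fields for bit in one_hot(record[f])]
    scaled = [sigmoid(record[f]) for f in numeric_fields]
    return encoded + scaled
```

Applying `feature_row` to every cleaned record yields the rows of the feature matrix S. Large raw amounts saturate the sigmoid close to 1, so rescaling the inputs first may be needed in practice.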
step 3, dividing the multidimensional historical data of impoverished students from past years into three categories by degree of poverty, attaching poverty category labels to the students, and constructing a training data set, with the following specific steps:
classifying the students into three levels according to their past-year poverty grade, namely non-poverty, general poverty, and special poverty, and using one-hot coding as the student poverty class label to construct a training data set T = {(x_1, y_1), …, (x_i, y_i), …, (x_n, y_n)}, where the input data x_i is randomly drawn from the student feature matrix S, the label y_i ∈ {001, 010, 011}, with 001, 010, and 011 corresponding to non-poverty, general poverty, and special poverty, respectively, and n is the amount of data.
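A minimal sketch of building T from a feature matrix and the past-year grades follows. The label codes are the ones given in the claim; the function name `build_training_set`, the grade strings, and the use of a seeded shuffle to realize "randomly drawn" are assumptions.

```python
import random

# Label codes exactly as stated in step 3.
LABEL_CODES = {
    "non-poverty": "001",
    "general poverty": "010",
    "special poverty": "011",
}


def build_training_set(feature_matrix, grades, seed=0):
    """Pair each feature row with its label code and shuffle the pairs."""
    if len(feature_matrix) != len(grades):
        raise ValueError("each feature row needs exactly one poverty grade")
    pairs = [(x, LABEL_CODES[g]) for x, g in zip(feature_matrix, grades)]
    # Seeded shuffle so that the random draw is reproducible.
    random.Random(seed).shuffle(pairs)
    return pairs
```

An unknown grade string raises `KeyError`, which surfaces labeling mistakes before training starts.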
Step 4, designing a BP-Adaboost classification model and training it with the data set constructed from the past-year impoverished-student feature matrices of each poverty level extracted in step 1, specifically comprising the following steps:
1) inputting the training data set T and initializing the weight distribution of the training data D_1 = (w_{1,1}, …, w_{1,i}, …, w_{1,N}), where w_{1,i} = 1/N, i = 1, 2, …, N, and N represents the amount of data in the student feature matrix S; meanwhile, setting the iteration counter m = 1 and the total number of iterations M = 10;
2) starting iteration, and adopting a three-layer neural network, wherein the neural network adopts a BP neural network and comprises an input layer, a hidden layer and an output layer, the input layer is provided with 17 nodes, the hidden layer is provided with 18 nodes, and the output layer is provided with 3 nodes;
3) training on the weighted training data set to obtain a weak classifier G_m(x): X → {001, 010, 011}, where 001, 010, and 011 correspond to non-poverty, general poverty, and special poverty, respectively;
4) calculating the error rate of the current classifier G_m(x) on the training data:
Figure FDA0002431162650000041
where y_i ∈ {001, 010, 011}, with 001, 010, and 011 corresponding to non-poverty, general poverty, and special poverty, respectively, and n is the amount of data;
5) calculating the coefficient α_m of G_m(x):
Figure FDA0002431162650000042
where K denotes the poverty categories; α_m represents the importance of G_m(x) in the final classifier; α_m increases as err_m decreases, i.e. the smaller the classification error rate, the greater the contribution of the classifier to the final classifier;
6) updating the weight distribution of the training data set:
D_{m+1} = (w_{m+1,1}, …, w_{m+1,i}, …, w_{m+1,N}),
Figure FDA0002431162650000043
w_{m+1,i} can be rewritten as:
Figure FDA0002431162650000044
It follows that the weights of samples misclassified by the base classifier G_m(x) are increased, while the weights of correctly classified samples are reduced, so that the BP-Adaboost classification model focuses more on the misclassified samples and they play a greater role in the next round of learning, thereby improving the classification capability of the model;
Z_m is a normalization factor:
Figure FDA0002431162650000045
which makes D_{m+1} a probability distribution;
7) judging whether to terminate the iteration: when m < M, setting the iteration count m = m + 1 and returning to sub-step 3) to continue iterating; otherwise, terminating the iteration, completing the training of the BP-Adaboost classifier, and obtaining the final classifier:
Figure FDA0002431162650000051
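The formula images for this step are not reproduced in the text. As a reading aid only, the standard multi-class AdaBoost formulation (SAMME) uses the expressions below, written with the symbols of this step. That the patent's figures match these forms term for term is an assumption, as is the reading of K as the number of poverty classes (K = 3) and of I(·) as the indicator function.

```latex
\begin{align}
\mathrm{err}_m &= \sum_{i=1}^{N} w_{m,i}\, I\bigl(G_m(x_i) \neq y_i\bigr), \\
\alpha_m &= \ln\frac{1-\mathrm{err}_m}{\mathrm{err}_m} + \ln(K-1), \\
w_{m+1,i} &= \frac{w_{m,i}}{Z_m}\,\exp\bigl(\alpha_m\, I(G_m(x_i) \neq y_i)\bigr), \\
Z_m &= \sum_{i=1}^{N} w_{m,i}\,\exp\bigl(\alpha_m\, I(G_m(x_i) \neq y_i)\bigr), \\
G(x) &= \arg\max_{k} \sum_{m=1}^{M} \alpha_m\, I\bigl(G_m(x) = k\bigr).
\end{align}
```

Under these forms α_m is positive whenever err_m < 1 - 1/K, so misclassified samples gain weight and, after division by Z_m, correctly classified samples lose weight, which matches the behaviour described in sub-step 6).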
Step 5, using the trained model to assist the identification of impoverished students, specifically comprising the following steps:
1) extracting the family and economic situation of the student to be identified, including whether the student is an only child, whether the student is an orphan, whether the family is a registered (filed and carded) impoverished household, whether the student is disabled or ill, whether a parent is disabled or ill, whether the family members are urban or rural extremely-poor supported persons, and whether the family is an urban or rural minimum-living-guarantee household; extracting the campus consumption situation, including total consumption amount, maximum daily consumption amount, average daily consumption amount, maximum monthly consumption amount, and average daily number of transactions; extracting the student achievement situation, including grade point, semester average score, and number of failed courses; extracting the basic situation of impoverished students, including whether the student enrolled through the green channel and whether the student has taken out a student-origin (home-region) loan;
2) preprocessing the acquired student data and constructing a student characteristic matrix S;
3) inputting the student feature matrix S to be classified into the trained BP-Adaboost classification model to obtain an identification result: an output of 1 indicates non-poverty, an output of 2 indicates general poverty, and an output of 3 indicates special poverty.
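The three-layer network with 17 input, 18 hidden, and 3 output nodes and the 1/2/3 output convention can be illustrated with a forward pass. This is a sketch only: the sigmoid activation, the random initial weights, and reading the class as the index of the largest output node are assumptions, and backpropagation training is omitted.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 17, 18, 3  # layer sizes stated in the claim


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(seed=0):
    """Small random weights; each row is one node's weights plus a trailing bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT + 1)] for _ in range(N_HIDDEN)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)] for _ in range(N_OUTPUT)]
    return hidden, output


def layer_forward(layer, inputs):
    """Weighted sum plus bias for every node, passed through the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
        for row in layer
    ]


def forward(network, features):
    """Propagate one 17-value feature vector through both layers."""
    if len(features) != N_INPUT:
        raise ValueError("expected %d input features" % N_INPUT)
    hidden, output = network
    return layer_forward(output, layer_forward(hidden, features))


def predict_class(network, features):
    """Map the three output activations to class 1, 2 or 3."""
    scores = forward(network, features)
    return scores.index(max(scores)) + 1
```

In the boosted model, each of the M weak classifiers would be one such network, and their class outputs would be combined by the weighted vote of the final classifier.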
2. The impoverished-student identification method based on multi-classification BP-Adaboost as claimed in claim 1, wherein the campus consumption situation data set C and the student achievement situation data set G are normalized by using the Sigmoid function; the specific steps are as follows:
1) normalizing each item of data in the campus consumption situation data set C using the Sigmoid function, where
Figure FDA0002431162650000061
denotes the normalized student campus consumption data:
Figure FDA0002431162650000062
the normalized campus consumption situation data set is recorded as
Figure FDA0002431162650000063
2) normalizing each item of data in the student achievement situation data set G using the Sigmoid function, where
Figure FDA0002431162650000064
denotes the normalized student achievement situation data:
Figure FDA0002431162650000065
the normalized student achievement situation data set is recorded as
Figure FDA0002431162650000066
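The formula images of this claim are not reproduced in the text. For orientation, the standard Sigmoid function applied element-wise would read as follows; the tilde notation for the normalized quantities is an assumption, as is the absence of any additional scaling inside the exponent.

```latex
\begin{align}
\tilde{c}_i &= \frac{1}{1 + e^{-c_i}}, &
\tilde{C} &= \{\tilde{c}_1, \tilde{c}_2, \ldots, \tilde{c}_n\}, \\
\tilde{g}_i &= \frac{1}{1 + e^{-g_i}}, &
\tilde{G} &= \{\tilde{g}_1, \tilde{g}_2, \ldots, \tilde{g}_n\}.
\end{align}
```

Each normalized value lies strictly between 0 and 1, which places the consumption and achievement features on the same scale as the one-hot coded fields.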
CN202010236492.XA 2020-03-30 2020-03-30 Poverty-poverty identification method based on multi-classification BP-Adaboost Pending CN111415099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236492.XA CN111415099A (en) 2020-03-30 2020-03-30 Poverty-poverty identification method based on multi-classification BP-Adaboost

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010236492.XA CN111415099A (en) 2020-03-30 2020-03-30 Poverty-poverty identification method based on multi-classification BP-Adaboost

Publications (1)

Publication Number Publication Date
CN111415099A true CN111415099A (en) 2020-07-14

Family

ID=71494673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236492.XA Pending CN111415099A (en) 2020-03-30 2020-03-30 Poverty-poverty identification method based on multi-classification BP-Adaboost

Country Status (1)

Country Link
CN (1) CN111415099A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination