CN109460914A - Method for determining bridge health grade based on semi-supervised error-correction learning - Google Patents

Info

Publication number
CN109460914A
Authority
CN
China
Prior art keywords
data
bridge
label
cluster
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811307692.9A
Other languages
Chinese (zh)
Other versions
CN109460914B (en)
Inventor
杨云
杨璐晖
黄雪梅
黄韶峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201811307692.9A priority Critical patent/CN109460914B/en
Publication of CN109460914A publication Critical patent/CN109460914A/en
Application granted granted Critical
Publication of CN109460914B publication Critical patent/CN109460914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for determining bridge health grade based on semi-supervised error-correction learning. The original labels of the labeled bridge data are separated from the features; an unsupervised learning method is used to cluster the data features; the labels of the resulting cluster data, which carry cluster assignments, are unified with those of the original labeled bridge data; according to the result of label unification, training-set data whose cluster label matches the original label form the high-confidence bridge dataset, and data whose two labels differ form the low-confidence dataset; according to a preset iteration parameter k, a classifier is trained on the high-confidence bridge dataset by a supervised learning method, the trained classifier is applied iteratively to the low-confidence bridge dataset, and the high-confidence bridge dataset is updated; a classifier is trained again on the updated high-confidence bridge dataset to obtain the final classifier; the data of a bridge are input into the final classifier to obtain the bridge health grade.

Description

Method for determining bridge health grade based on semi-supervised error-correction learning
Technical field
The invention belongs to the technical field of bridge health grade assessment, and in particular relates to a method for determining bridge health grade based on semi-supervised error-correction learning.
Background
China has a very large number of highway bridges, and their maintenance and management is an enormous task. Following the direction of highway bridge maintenance management in China ("standardized evaluation, scientific decision-making"), further strengthening highway bridge maintenance management, raising its level, and ensuring maintenance quality are of great significance to China's economic development and social stability. The rating of a bridge's overall technical condition, that is, its health grade, directly determines how much attention maintenance managers pay to the bridge, including how often it is periodically inspected and whether it can remain in normal use, all of which matter greatly to bridge maintenance management. An accurate and reasonable judgment of the overall health grade of a bridge is therefore critical to highway bridge maintenance management.
Semi-supervised learning is a very active area of applied research in machine learning. Part of the data it uses is labeled and part is unlabeled. A semi-supervised algorithm trains a model on labeled and unlabeled data together and then uses the trained model to recognize data. Semi-supervised learning requires as little human effort as possible while still delivering relatively high accuracy. The present method combines supervised and unsupervised learning into a single semi-supervised assessment model. Supervised learning trains a model on labeled data and then uses the trained model to recognize data; unsupervised learning uses data without labels and tries to discover hidden structure in the unlabeled data in order to recognize data.
Existing methods for determining bridge health grade include Zhang Yongqing's analytic hierarchy process evaluation method, the bridge evaluation method of China's "Technical Specifications for Highway Maintenance" (JTJ 073-96), the bridge defect condition index BCI, and the SR index of the United States "Recording and Coding Guide for the Structure Inventory and Appraisal of the Nation's Bridges". Some other methods share a similar basic principle: experts first assign fuzzy classifications to the indicators of interest and give each indicator a weight, forming an industry or regional standard; in use, the specified indicators of an actual bridge are scored by fuzzy classification and summed with weights to reach an evaluation conclusion. All of these methods use a mathematical model (or several formulas) to compute an overall score for the bridge from the scores of its components, determine the bridge's defect condition grade from that score, and propose corresponding maintenance measures. Because there is error between the mathematical model (or formula) and the practical problem, the model cannot reflect the complex relationship between the condition of each component and the overall condition of the bridge with sufficient precision. Conventional machine learning, by using a classification model, can discover the relationship between data and labels by itself and can establish the relationship between component-level scores and the overall technical rating of the bridge with higher accuracy. However, a conventional classification model assumes during training that the labels of all training samples are correct. When the sample data contain bridge data whose labels are uncertain, conventional machine learning methods are affected by those data, and the classification model cannot reach good accuracy.
Summary of the invention
The object of the present invention is to provide a method for determining bridge health grade based on semi-supervised error-correction learning, so as to solve the problem that current supervised-learning methods for determining bridge health grade produce assessment results of low accuracy because the sample data contain bridge data with uncertain labels, which affects highway bridge maintenance management.
The technical solution adopted by the invention is a method for determining bridge health grade based on semi-supervised error-correction learning, with the following specific steps:
Step S1: separate the label information of the labeled original bridge data in the training set from the features, that is, separate the original labels of the original labeled bridge data from the features;
Step S2: using an unsupervised learning method, cluster the data features of the original labeled bridge data;
Step S3: unify the labels of the cluster data obtained by clustering, which carry cluster assignments, with those of the original labeled bridge data;
Step S4: according to the result of label unification, take the training-set data whose cluster label is the same as the original label as the high-confidence bridge dataset, and the data whose two labels differ as the low-confidence dataset;
Step S5: according to a preset iteration parameter k, train a classifier on the high-confidence bridge dataset by a supervised learning method, iterate over the low-confidence bridge dataset with the trained classifier, and update the high-confidence bridge dataset;
Step S6: train again on the updated high-confidence bridge dataset to obtain the final classifier;
Step S7: input the data of a bridge into the final classifier to obtain the bridge health grade.
Further, the label information in step S1 is the bridge technical grade.
Further, step S3 comprises the following specific steps:
Step S31: intersect each cluster of the cluster data with each class of the original labeled bridge data;
Step S32: compute the ratio of the number of records in each intersection to the number of records in its corresponding cluster, giving the label probability of the cluster for each intersection;
Step S33: select the maximum label probability for each cluster;
Step S34: determine the intersection corresponding to the maximum label probability of each cluster and assign the label of the original labeled bridge data of that intersection to the cluster; label unification is then complete and the cluster has obtained its cluster label.
Further, step S5 comprises the following specific steps:
Step S51: judge whether the iteration parameter k has reached its set value; if so, stop iterating;
Step S52: cluster the low-confidence bridge data by an unsupervised learning method to obtain low-confidence cluster data carrying cluster assignments;
Step S53: unify the labels of the low-confidence bridge data and the low-confidence cluster data by the method of steps S31 to S34, assigning the corresponding cluster labels to the low-confidence cluster data;
Step S54: train a classifier on the high-confidence bridge dataset by a supervised learning method;
Step S55: merge the low-confidence bridge data whose cluster label, original label, and classifier label all agree into the high-confidence bridge dataset, and iterate again over the low-confidence bridge data whose cluster label, original label, and classifier label do not all agree;
Step S56: increment the iteration parameter k by 1 and repeat steps S51 to S55.
Further, the size of the iteration parameter k is determined according to the size of the training set and the estimated proportion of erroneous data in the training set.
The advantages of the invention are as follows. The method for determining bridge health grade based on semi-supervised error-correction learning requires no complicated modeling and analysis, which avoids the errors introduced by assumptions about and simplifications of the practical problem during modeling, and improves the precision of the bridge health grade assessment model. It does not require great expense to calibrate labels for bridge data: historical data whose labels have a certain accuracy are obtained in the conventional way (weighted scoring by mathematical formula), and error-correction learning then rejects most of the historical bridge data whose labels are wrong or uncertain and selects the bridge data with high confidence. This reduces the influence of wrongly or uncertainly labeled bridge data on classifier training and improves the accuracy of the classifier, and in turn the accuracy of bridge health grade assessment. The method helps determine the frequency of periodic inspection and whether a bridge can remain in normal use, raises the level of highway bridge maintenance management, safeguards highway bridges, improves maintenance quality, and saves manpower and material resources.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for determining bridge health grade based on semi-supervised error-correction learning according to the invention;
Fig. 2 is a chart of the class statistics of the bridge data of the invention;
Fig. 3 compares the classifier of the invention with a classifier trained on a dataset not processed by error-correction learning, in terms of the sum of mutual information with the original data and the cluster data.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
The method for determining bridge health grade based on semi-supervised error-correction learning, as shown in Fig. 1, has the following specific steps:
Step S1: separate the label information of the labeled original bridge data in the training set from the features, that is, separate the original labels of the original labeled bridge data from the features, to simplify processing;
Step S2: using an unsupervised learning method, cluster the data features of the original labeled bridge data;
Because clustering does not depend on the labels the data originally carry, but instead finds the relationships among the data from within the data and assigns the data to different clusters, the result obtained by clustering better represents the internal structure of the data.
Step S3: unify the labels of the cluster data obtained by clustering, which carry cluster assignments, with those of the original labeled bridge data;
In general, the clusters of the cluster data do not correspond to the labels of the original labeled bridge data. The labels of the original bridge data are Class I, Class II, Class III, Class IV, and Class V bridges, whereas the cluster data have no label information: according to the preset number of clusters, the data can only be divided into clusters 1, 2, 3, 4, and 5, and a given cluster, say cluster 1, cannot be matched to any particular label of the original data. Moreover, the result may differ from one clustering run to the next: data assigned to cluster 1 in the first run may be assigned to cluster 2 in the second. The original labeled bridge data and the cluster data therefore need label unification, so that each cluster is given a cluster label consistent with the labels of the original labeled bridge data.
The specific steps of label unification between the cluster data and the original labeled bridge data are as follows:
Step S31: intersect each cluster of the cluster data with each class of the original labeled bridge data;
Step S32: compute the ratio of the number of records in each intersection to the number of records in its corresponding cluster, giving the label probability of the cluster for each intersection;
Step S33: select the maximum label probability for each cluster;
Step S34: determine the intersection corresponding to the maximum label probability of each cluster and assign the label of the original labeled bridge data of that intersection to the cluster; label unification is then complete and the cluster has obtained its cluster label.
Specifically, each cluster of the cluster data is intersected with each class of the original labeled bridge data, giving an intersection matrix of size (number of classes) × (number of classes). Each intersection consists of the cluster data whose label agrees with the original labeled bridge data it was intersected with; that is, each entry of the intersection matrix represents the data that belong to a given cluster and also carry a given label in the original data. The ratio of each entry of the intersection matrix to the size of its corresponding cluster is then computed, giving a probability matrix of the same shape. Each entry of the probability matrix represents the probability that the data of a given cluster belong to the corresponding class of the original bridge data. The probability matrix is traversed and the maximum probability is taken; the original label corresponding to that entry, that is, the label of the original labeled bridge data, is assigned to the cluster corresponding to that entry, so that the cluster index is converted into a cluster label corresponding to a bridge technical grade, completing label unification for that cluster. All entries in the row and column of the probability matrix that contain the entry just used are then set to 0: once a label has been assigned to a cluster, that label is not assigned to any other cluster, and that cluster is not assigned any other label.
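The matching described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the function name `unify_labels` and the toy inputs are assumptions, and the row and column of a matched entry are removed from the candidate set rather than set to 0, which has the same effect.

```python
from collections import Counter

def unify_labels(cluster_ids, original_labels):
    """Map each cluster id to one original label by greedy max-probability matching.

    Hypothetical helper for illustration; both arguments are equal-length sequences.
    """
    clusters = sorted(set(cluster_ids))
    labels = sorted(set(original_labels))
    # Intersection matrix: number of records in cluster c that carry original label l.
    counts = Counter(zip(cluster_ids, original_labels))
    sizes = Counter(cluster_ids)
    # Probability matrix: share of cluster c that carries original label l.
    prob = {(c, l): counts[(c, l)] / sizes[c] for c in clusters for l in labels}
    mapping = {}
    while prob and max(prob.values()) > 0:
        # Take the largest remaining probability and fix that cluster-to-label pair.
        (c, l), _ = max(prob.items(), key=lambda kv: kv[1])
        mapping[c] = l
        # Dropping the row and column is equivalent to setting them to 0.
        prob = {k: v for k, v in prob.items() if k[0] != c and k[1] != l}
    return mapping

# Toy data: three clusters and three bridge classes.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2]
original = ["II", "II", "I", "I", "I", "I", "III", "III", "III"]
mapping = unify_labels(cluster_ids, original)
cluster_labels = [mapping[c] for c in cluster_ids]
```

In this toy case cluster 2 is matched first (all of its records are Class III), then cluster 1 to Class I, and finally cluster 0 to Class II.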
Step S4: according to the result of label unification, take the training-set data whose cluster label is the same as the original label as the high-confidence bridge dataset, and the data whose two labels differ as the low-confidence dataset;
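A minimal sketch of the split in step S4, assuming the cluster labels have already been unified with the original labels. The function name `split_by_agreement` is hypothetical and the function returns record indices rather than the records themselves.

```python
def split_by_agreement(cluster_labels, original_labels):
    """Return (high, low): indices where the two labels agree, and where they differ."""
    high, low = [], []
    for i, (c, o) in enumerate(zip(cluster_labels, original_labels)):
        # Agreement between clustering and the original label marks high confidence.
        (high if c == o else low).append(i)
    return high, low

high_idx, low_idx = split_by_agreement(["I", "II", "I"], ["I", "I", "I"])
```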
Step S5: according to a preset iteration parameter k, train a classifier on the high-confidence bridge dataset by a supervised learning method, iterate over the low-confidence bridge dataset with the trained classifier, and update the high-confidence bridge dataset;
The iteration parameter k is a manually set hyperparameter; its value can be set according to the size of the bridge training set and the estimated proportion of erroneous data in it. k represents the number of times the label-uncertain data are screened. The smaller k is, the fewer screening rounds are performed, and the data in the screened pool are more accurate but fewer. The larger k is, the more screening rounds are performed, and the data in the screened pool are less accurate but more numerous; in the extreme, all data end up in the high-confidence dataset, which is equivalent to training the model directly on all data with their original labels. The method then reduces to an ordinary machine learning model and error correction loses its meaning.
The specific steps of training a classifier on the high-confidence bridge dataset by a supervised learning method according to the preset iteration parameter k, iterating over the low-confidence bridge dataset with the trained classifier, and updating the high-confidence bridge dataset are as follows:
Step S51: judge whether the iteration parameter k has reached its set value; if so, stop iterating;
Step S52: cluster the low-confidence bridge data by an unsupervised learning method to obtain low-confidence cluster data carrying cluster assignments;
Step S53: unify the labels of the low-confidence bridge data and the low-confidence cluster data by the method of steps S31 to S34, assigning the corresponding cluster labels to the low-confidence cluster data;
Step S54: train a classifier on the high-confidence bridge dataset by a supervised learning method;
Step S55: merge the low-confidence bridge data whose cluster label, original label, and classifier label all agree into the high-confidence bridge dataset, and iterate again over the low-confidence bridge data whose cluster label, original label, and classifier label do not all agree;
Step S56: increment the iteration parameter k by 1 and repeat steps S51 to S55.
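The loop in steps S51 to S56 can be sketched as follows. This is a sketch under stated assumptions, not the patent's implementation: `refine`, `train`, and `cluster_and_label` are hypothetical names, the clustering-plus-unification step and the classifier are passed in as callables, and the two toy callables at the bottom are stand-ins on one-dimensional data. Stopping early when the low-confidence set is empty is also an assumption.

```python
def refine(high, low, k, train, cluster_and_label):
    """Run up to k screening rounds; high and low are lists of (feature, original_label)."""
    for _ in range(k):                              # S51/S56: stop after k rounds
        if not low:                                 # assumption: nothing left to screen
            break
        feats = [x for x, _ in low]
        orig = [y for _, y in low]
        clus = cluster_and_label(feats, orig)       # S52-S53: cluster, then unify labels
        predict = train(high)                       # S54: train on high-confidence data
        pred = [predict(x) for x in feats]
        # S55: promote records whose cluster, original, and classifier labels all agree.
        agree = [c == o == p for c, o, p in zip(clus, orig, pred)]
        high = high + [s for s, a in zip(low, agree) if a]
        low = [s for s, a in zip(low, agree) if not a]
    return high, low

def nearest_mean_trainer(samples):
    """Toy classifier: predict the label whose mean feature value is closest."""
    sums = {}
    for x, y in samples:
        s, n = sums.get(y, (0.0, 0))
        sums[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in sums.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def threshold_clusterer(feats, orig):
    """Toy stand-in for clustering plus label unification: below 5 is "I", else "II"."""
    return ["I" if x < 5 else "II" for x in feats]

high = [(1.0, "I"), (9.0, "II")]
low = [(2.0, "I"), (8.0, "I"), (7.5, "II")]
new_high, new_low = refine(high, low, k=2, train=nearest_mean_trainer,
                           cluster_and_label=threshold_clusterer)
```

In the toy run the record (8.0, "I") never reaches agreement, so it stays in the low-confidence set and is excluded from the final training data.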
Step S6: train again on the updated high-confidence bridge dataset to obtain the final classifier;
Step S7: input the data of a bridge into the final classifier to obtain the bridge health grade.
After k iterations, the confidence of the high-confidence bridge dataset is relatively high and it contains few data with uncertain labels. A classifier trained on this dataset is little affected by label-uncertain data and therefore performs better than an ordinary classifier; this classifier is the final classifier required.
To achieve data error correction in this way, the data are first clustered: unsupervised learning finds the internal relationships in the data, and each cluster obtained is compared with each class of the original data. Data in a cluster that belong to the same class in the original data are consistent with that class label. The labels are then compared and the intersection is taken of the data whose cluster label and original label agree. The intersection consists of data that both clustering and the original classification place in the same class; such data can be regarded as having relatively high label confidence, that is, their labels are quite certain and unlikely to be wrong. However, this subset may be only a small fraction of the original data, and its volume is likely to be insufficient for training a classification model, while the remaining data still contain correctly labeled samples. The remaining data are therefore iterated over: in each screening round they are classified and clustered and the results are compared with the original labels, and the data with high confidence are selected. After screening, the label accuracy of the screened dataset can be considered higher than that of the original dataset. During screening, the performance of the classifier also improves as the high-confidence dataset grows. When the set iteration threshold is reached, the label-uncertain data are considered to have been largely screened out, and the high-confidence dataset contains data with relatively high label confidence. A classifier then trained on these data will be more accurate than one trained on the original dataset.
The present invention is proposed for the situation in which bridge health grades obtained by existing mathematical formulas contain errors. Because the existing bridge health grade data are all computed by mathematical formulas, all bridge data may contain errors, so evaluating the model with existing, inaccurate bridge data would be unreasonable. To demonstrate the validity of the method, nine UCI datasets were used: Iris, Banknote Authentication, Ionosphere, Seeds, Sonar, Wine, Glass, Statlog (Heart), and Letter-recognition, covering both binary and multi-class classification tasks. To establish the validity of the method on data with uncertain labels, each dataset was divided into a training set and a test set in a ratio of 80% to 20%, and labels in the training set of each dataset were manually corrupted at ratios of 10%, 30%, and 50%: the labels of the selected data were changed at random to other classes, so that the training set contained erroneous, uncertain data. A classifier was then trained on these data with the method, giving the error-correction-learning assessment classifier. An ordinary classifier without error-correction learning was trained on the same corrupted training set. The uncorrupted test set was then input into both classifiers, and the results of the two classifiers were compared with the original, fully correct labels of the data. The results are given in Table 1, which shows that the classifier trained on the dataset corrected by the error-correction learning method of the invention has higher assessment accuracy than the ordinary classifier that does not use the method.
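The label corruption used in this validation can be illustrated with a short sketch. The function name `corrupt_labels`, the seed, and the toy labels are assumptions; the sketch only shows one plausible way to change a fixed share of training labels to a different, randomly chosen class.

```python
import random

def corrupt_labels(labels, ratio, classes, seed=0):
    """Return a copy of labels in which a given ratio is flipped to another class."""
    rng = random.Random(seed)            # fixed seed so the corruption is repeatable
    noisy = list(labels)
    n_flip = int(round(ratio * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Pick any class other than the current one, so every chosen label changes.
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

clean = ["a"] * 50 + ["b"] * 50
noisy = corrupt_labels(clean, 0.3, ["a", "b", "c"])
n_changed = sum(n != c for n, c in zip(noisy, clean))
```

Only the training labels would be corrupted in this way; the test labels stay untouched so that accuracy is measured against correct labels.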
Table 1. Assessment accuracy (%) of classifiers trained on UCI datasets after manual label corruption
The invention specifically addresses the determination of bridge health grade. The original bridge data are Baolong bridge information data obtained from Yunnan Communications Investment and Construction Group Co., Ltd. The bridge data have 76 attributes in total. During attribute selection, information clearly unrelated to bridge technical evaluation, such as national expressway network mileage and the number of cap beams inspected, was deleted. Attributes whose values are identical for all records were also deleted and not considered in training. For the remaining attributes, after consulting relevant professionals, the following were finally selected: bridge technical condition grade, function type, traffic load, deck pavement, years since completion, category, structure type, expansion joint type, bearing type, bridge structure type, superstructure load-bearing member score, superstructure general member score, bearing score, wing wall and ear wall score, conical slope and slope protection score, pier score, abutment score, pile foundation score, riverbed score, regulating structure score, deck pavement score, expansion joint device score, sidewalk score, railing and guardrail score, lighting and signs score, and drainage system score. The bridge technical condition grade serves as the data label.
The component scores listed above, from the superstructure load-bearing member score through the drainage system score, are each based on the defect condition of the corresponding component (superstructure load-bearing members, superstructure general members, bearings, wing walls and ear walls, conical slopes and slope protection, piers, abutments, pile foundations, riverbed, regulating structures, deck pavement, expansion joint devices, sidewalks, railings and guardrails, lighting and signs, and the drainage system) as found by on-site instrument survey, on the effect of the defect on function in use, and on corrections for defect condition. For the calculation of each score, see the "Standards for Technical Condition Evaluation of Highway Bridges" (JTG/T H21-2011).
As shown in Fig. 2, the dataset has three classes (classes 2, 1, and 3 from left to right). Class 2 has the most data, 254 records; class 1 is next, with 240 records; class 3 has the fewest, 7 records. Because class 3 has so few records, the dataset is imbalanced. In addition, the labels of the dataset were computed by mathematical formula, so some data have uncertain labels; that is, the data are noisy.
Before the experiment, because class 3 has so few records, and to avoid a training set or test set containing no class 3 data at all, the data of each class were split separately into training data and test data at a ratio of 8:2: class 1 has 192 training records and 48 test records, class 2 has 204 training records and 51 test records, and class 3 has 5 training records and 2 test records. The training data are imbalanced, so the training set is processed with the SMOTE method to resolve the imbalance. The training set is then corrected with the error correction learning method of the invention, after which the classifier is trained to obtain the final classification model.
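The oversampling step can be illustrated with a simplified sketch in the spirit of SMOTE: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name, the neighbour count `k=2` and the toy feature vectors are assumptions; this is not the exact procedure or library used in the experiment.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Five hypothetical class 3 training vectors (two scores each).
class3_train = [[60.0, 55.0], [58.0, 52.0], [62.0, 50.0],
                [57.0, 54.0], [61.0, 53.0]]
new_points = smote_like_oversample(class3_train, n_new=10)
```

Oversampling is applied to the training set only, so the test set keeps its original class proportions.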
Because the bridge data labels were calculated by mathematical formula, the bridge data themselves suffer from label uncertainty. If only the bridge data were used to assess model accuracy, the benchmark itself could be wrong, yet no entirely accurate bridge data exist for the assessment. An assessment based on mutual information is therefore introduced. Mutual information is a measure of the interdependence between two random variables: the larger the mutual information, the stronger the dependence between the two variables. Because the original bridge labels are inaccurate, we introduce the clustering result as a second reference. For the error correction learning method and for the method without error correction, we calculate the sum of the mutual information between the method's result and the original labels and the mutual information between the method's result and the cluster labels, and we assess the accuracy of the two methods over 10 runs.
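As an illustration of this assessment, the sketch below computes the mutual information between two label lists and sums it over the two references. The use of natural logarithms and of unnormalised mutual information is an assumption, since the text does not state which variant was used; the function names and toy labels are likewise hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two equally long label lists."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b)))
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def evaluation_score(predicted, original, clustered):
    """Sum of MI with the original labels and MI with the cluster labels."""
    return (mutual_information(predicted, original)
            + mutual_information(predicted, clustered))


# Toy labels: the prediction matches the original labels exactly and is
# statistically independent of the cluster labels.
predicted = [1, 1, 2, 2]
original = [1, 1, 2, 2]
clustered = [1, 2, 1, 2]
score = evaluation_score(predicted, original, clustered)
```

In this toy case the first term equals the entropy of the labels (log 2) and the second term is zero, so a method scores higher the more it agrees with both references.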
The clustering result is obtained as follows:
1. Cluster the training data to obtain three cluster centres;
2. Calculate the distance from each test record to the three cluster centres, and assign the record to the class of the nearest cluster centre;
3. Unify the cluster labels with the labels of the original data.
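Steps 1 and 2 above can be sketched as follows. The text does not name the clustering algorithm, so k-means with a deterministic farthest-first initialisation is used here purely as an assumption; all function names and the toy data are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centre(point, centres):
    """Index of the centre closest to `point`."""
    return min(range(len(centres)),
               key=lambda i: squared_distance(point, centres[i]))


def farthest_first_centres(points, k):
    """Deterministic initialisation: start from the first point, then keep
    adding the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centres)))
    return centres


def k_means(points, k, iterations=20):
    """Plain Lloyd iterations; returns k cluster centres."""
    centres = farthest_first_centres(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_centre(p, centres)].append(p)
        # Each centre becomes the mean of its group (unchanged if empty).
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


# Hypothetical two-dimensional training and test records.
train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.9, 0.0], [10.1, 0.2]]
test = [[0.1, 0.2], [5.1, 5.0], [10.0, 0.1]]
centres = k_means(train, k=3)
test_clusters = [nearest_centre(p, centres) for p in test]
```

The cluster indices returned here are arbitrary; step 3 is what maps them onto the original grade labels.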
For each of the two methods, the sum of its mutual information with the original data and with the cluster data was calculated; the final results are shown in Fig. 3. Averaged over 10 runs, the mean mutual information sum between the result of the error correction learning method of the invention and the original data and cluster result is 0.6678160581326551, while the mean mutual information sum for the method without error correction is 0.626207974109575. It follows that the present invention performs better than the ordinary method without error correction.
Bridge health grades are divided into 5 classes, as follows:
It must be noted that the data used in the present invention are historical data; that is, the health grade of each record has already been calculated by mathematical formula. We use only the basic component scores and other bridge information in these historical data as the input features of our method, and use the grades in these data as the bridge health grades for training the classifier of our method. During training, the method rejects the misclassified records in the historical data (the low-confidence data set) and keeps only the records whose grade is essentially correct (the high-confidence data set). A classifier is then trained on the high-confidence data. Because the misclassified records have been screened out, classifier performance is not degraded by mislabelled training data, and the resulting classifier has higher accuracy. After the classifier is trained, the bridge must still be inspected in the same way as before (that is, as in the formula-based scoring method for calculating bridge health grade), and each component is scored as before to obtain the basic scores. The basic scores and the basic bridge information, namely bridge technical condition grade, function type, traffic load, deck pavement, years since completion, category, structure type, expansion joint type, bearing type, bridge structure type, superstructure load-bearing member score, superstructure general member score, bearing score, wing wall and ear wall score, conical slope and slope protection score, pier score, abutment score, pier and abutment foundation score, riverbed score, regulating structure score, deck pavement score, expansion joint device score, sidewalk score, railing and guardrail score, lighting and sign score, and drainage system score, are input to our method, which automatically outputs the corresponding health grade of the bridge.
The core technology of the present invention is the method as a whole: several assessment modes are used to obtain labels for the data, and the relationship between those labels is used to judge whether a data label has high confidence. The basis of judgment includes, but is not limited to, whether the labels are equal; for example, using mutual information to judge the difference between data and the confidence of data labels is an alternative. The classification model and clustering algorithm used in the invention are likewise not limited to any specific method. Any classification method or clustering method may be used in the assessment procedure of the invention. The core is to label the data by different methods, and any method that can separate the data into classes qualifies, not only machine learning methods.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (5)

1. A method for determining bridge health grade based on semi-supervised error correction learning, characterized in that the specific steps are as follows:
Step S1, separating the label information from the features of the labelled original bridge data in the training set, that is, separating the original labels from the features of the original labelled bridge data;
Step S2, performing a clustering operation on the data features of the original labelled bridge data by an unsupervised learning method;
Step S3, unifying the labels of the cluster data carrying class information obtained by the clustering operation with the labels of the original labelled bridge data;
Step S4, according to the label unification result, taking the data in the training set whose cluster label is identical to the original label as the high-confidence bridge data set, and the data whose two class labels differ as the low-confidence bridge data set;
Step S5, according to a preset iteration parameter K, training a classifier on the high-confidence bridge data set by a supervised learning method, iterating over the low-confidence bridge data set with the trained classifier, and updating the high-confidence bridge data set;
Step S6, training again on the updated high-confidence bridge data set to obtain the final classifier;
Step S7, inputting the bridge data of a bridge into the final classifier to obtain the bridge health grade.
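Step S4 amounts to a simple partition of the training set by label agreement. The sketch below assumes the cluster labels have already been unified with the original labels as in step S3; the function name and toy values are hypothetical.

```python
def split_by_agreement(samples, original_labels, cluster_labels):
    """Partition the training set into high- and low-confidence subsets
    according to whether the original and cluster labels agree."""
    high, low = [], []
    for x, y, c in zip(samples, original_labels, cluster_labels):
        (high if y == c else low).append((x, y))
    return high, low


# Hypothetical one-dimensional features with original and cluster labels.
samples = [0.9, 1.1, 4.8, 5.2, 5.0]
original_labels = [1, 1, 2, 2, 1]
cluster_labels = [1, 1, 2, 2, 2]
high_conf, low_conf = split_by_agreement(samples, original_labels, cluster_labels)
```

Each subset keeps the original label alongside the features, so the low-confidence records can be re-examined in step S5.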
2. The method for determining bridge health grade based on semi-supervised error correction learning according to claim 1, characterized in that the label information in step S1 is the bridge technical condition grade.
3. The method for determining bridge health grade based on semi-supervised error correction learning according to claim 1 or 2, characterized in that the specific steps of step S3 are as follows:
Step S31, taking the intersection of each class of data in the cluster data with each class of data in the original labelled bridge data;
Step S32, calculating the ratio of the number of records in each intersection to the number of records in its corresponding cluster, obtaining the label probability of each intersection with respect to its cluster;
Step S33, selecting the maximum label probability for the cluster data of each class;
Step S34, determining the intersection corresponding to the maximum label probability of the cluster data of each class, and assigning the label of the original labelled bridge data corresponding to that intersection to the cluster data of that class, thereby completing label unification, so that the cluster data of that class obtain a cluster label.
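Steps S31 to S34 can be sketched as a mapping from each cluster to the original label with which it overlaps most. The function name and the toy labels are assumptions used for illustration only.

```python
from collections import Counter, defaultdict


def unify_cluster_labels(cluster_ids, original_labels):
    """Map each cluster id to the original label it overlaps most."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, original_labels):
        members[cid].append(label)
    mapping = {}
    for cid, labels in members.items():
        # S31, S32: intersection size with each original class,
        # divided by the size of the cluster.
        probs = {label: n / len(labels) for label, n in Counter(labels).items()}
        # S33, S34: the label with the highest probability wins.
        mapping[cid] = max(probs, key=probs.get)
    return mapping


# Hypothetical cluster assignments and original grades for eight records.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2]
original_labels = [2, 2, 1, 1, 1, 1, 2, 3]
mapping = unify_cluster_labels(cluster_ids, original_labels)
cluster_labels = [mapping[c] for c in cluster_ids]
```

Here cluster 0 receives label 2 with probability 2/3 and cluster 1 receives label 1 with probability 3/4, so the third and seventh records end up with a cluster label that differs from their original label.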
4. The method for determining bridge health grade based on semi-supervised error correction learning according to claim 3, characterized in that the specific steps of step S5 are as follows:
Step S51, judging whether the iteration parameter K has reached the set value, and if so, stopping the iteration;
Step S52, clustering the low-confidence bridge data by an unsupervised learning method to obtain low-confidence cluster data carrying cluster information;
Step S53, unifying the labels of the low-confidence bridge data and the low-confidence cluster data by the method of steps S31 to S34, and assigning the corresponding cluster labels to the low-confidence cluster data;
Step S54, training a classifier on the high-confidence bridge data set by a supervised learning method;
Step S55, merging the low-confidence bridge data whose cluster label, original label and classification label are consistent into the high-confidence bridge data set, and iterating again over the low-confidence bridge data whose cluster label, original label and classification label are inconsistent;
Step S56, incrementing the iteration parameter K by 1 and repeating steps S51 to S55.
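The iteration of steps S51 to S56 can be sketched with the clustering and classification steps passed in as functions, since the method is not tied to any particular algorithm. The loop function, the two stand-in helpers and the one-dimensional toy data are all assumptions made for illustration.

```python
def error_correction_loop(high, low, max_iter, cluster_label_fn, train_fn):
    """high, low: lists of (features, original_label) pairs.
    cluster_label_fn(samples) returns unified cluster labels (S52, S53).
    train_fn(labelled) returns a function mapping features to a label (S54)."""
    high, low = list(high), list(low)
    k = 0
    while k < max_iter and low:                      # S51
        cluster_labels = cluster_label_fn(low)       # S52, S53
        predict = train_fn(high)                     # S54
        still_low = []
        for (x, y), c in zip(low, cluster_labels):   # S55
            if c == y == predict(x):
                high.append((x, y))
            else:
                still_low.append((x, y))
        low = still_low
        k += 1                                       # S56
    return high, low


def train_nearest_mean(labelled):
    """Stand-in classifier: predict the label whose feature mean is closest."""
    sums = {}
    for x, y in labelled:
        s, n = sums.get(y, (0.0, 0))
        sums[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in sums.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def threshold_cluster_labels(samples):
    """Stand-in for clustering plus label unification on 1-D toy data."""
    return [1 if x < 5 else 2 for x, _ in samples]


high0 = [(1.0, 1), (2.0, 1), (8.0, 2), (9.0, 2)]
low0 = [(1.5, 1), (8.5, 1)]
high1, low1 = error_correction_loop(high0, low0, 3,
                                    threshold_cluster_labels, train_nearest_mean)
```

In this toy run the record (1.5, 1) is promoted to the high-confidence set in the first pass, while (8.5, 1) never reaches three-way agreement and remains low-confidence when the iteration limit is reached.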
5. The method for determining bridge health grade based on semi-supervised error correction learning according to claim 1 or 4, characterized in that the size of the iteration parameter K is determined according to the size of the training set and the estimated proportion of erroneous data in the training set.
CN201811307692.9A 2018-11-05 2018-11-05 Semi-supervised error correction learning-based bridge health grade determination method Active CN109460914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811307692.9A CN109460914B (en) 2018-11-05 2018-11-05 Semi-supervised error correction learning-based bridge health grade determination method

Publications (2)

Publication Number Publication Date
CN109460914A true CN109460914A (en) 2019-03-12
CN109460914B CN109460914B (en) 2021-12-31

Family

ID=65609387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811307692.9A Active CN109460914B (en) 2018-11-05 2018-11-05 Semi-supervised error correction learning-based bridge health grade determination method

Country Status (1)

Country Link
CN (1) CN109460914B (en)

Also Published As

Publication number Publication date
CN109460914B (en) 2021-12-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant