CN108664607A - A transfer-learning-based method for improving data quality in a power telecommunication network - Google Patents

A transfer-learning-based method for improving data quality in a power telecommunication network Download PDF

Info

Publication number
CN108664607A
CN108664607A (application CN201810445948.6A)
Authority
CN
China
Prior art keywords
sample
kernel
space
cluster
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810445948.6A
Other languages
Chinese (zh)
Inventor
杨济海
李仁华
彭汐单
巢玉坚
邓永康
伍小生
田晖
郑富永
王�华
付萍萍
胡游君
邱玉祥
吕顺利
周鹏
邓伟
刘皓
蔡新忠
查凡
王宏
丁传文
刘洋
李石君
余伟
余放
李宇轩
李敏
彭亮
彭超
陈雪莲
陈艳华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information And Communication Branch Of Jiangxi Electric Power Co Ltd
Wuhan University WHU
NARI Group Corp
Information and Telecommunication Branch of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
Information And Communication Branch Of Jiangxi Electric Power Co Ltd
Wuhan University WHU
NARI Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information And Communication Branch Of Jiangxi Electric Power Co Ltd, Wuhan University WHU, NARI Group Corp filed Critical Information And Communication Branch Of Jiangxi Electric Power Co Ltd
Priority to CN201810445948.6A priority Critical patent/CN108664607A/en
Publication of CN108664607A publication Critical patent/CN108664607A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The present invention relates to a transfer-learning-based method for improving data quality in a power telecommunication network. First, kernel discriminant analysis is applied to the set L to find a suitable kernel mapping space, and all samples in L, U and O are mapped into that kernel space so that the marginal distributions of the source-domain and target-domain samples become very close. Then, bisecting k-means is used to select, from the source domain, samples whose conditional probability distribution is similar to that of the target domain. In the kernel space obtained in step 1, the samples selected in step 2 and the labeled target-domain samples jointly train a model, which predicts labels for the unlabeled samples in the target domain. This finally yields N predictions for the set U, and the final label of each sample in U is determined by majority voting. Through transfer learning, the present invention effectively resolves the mismatch between training-set and test-set sample distributions, solves the problem that labeled samples are too few to train on, and saves considerable manpower and expense.

Description

A transfer-learning-based method for improving data quality in a power telecommunication network
Technical field
The invention belongs to the technical field of power telecommunication network data quality improvement, and specifically relates to a transfer-learning-based method for improving data quality in a power telecommunication network.
Background technology
With the deepening of State Grid Corporation of China's "three centralizations, five majors" reform and the rapid construction of the strong smart grid, enterprise informatization has advanced comprehensively. The dedicated power communication network supporting the smart grid entered the information-system management stage after three years of rapid development, and a communications management system, "SG-TMS", has been built with "two-level deployment" at headquarters and provincial companies and "four-level application" across headquarters, branches, provincial companies, and city and county companies. Through standardized, regulated project construction and continuous improvement of system functions, "SG-TMS" has become deeply embedded in the daily work of tens of thousands of power communication professionals, comprehensively collecting the construction, operation and management data of tens of thousands of devices over recent years. The accumulated mass of power communication data, together with data from numerous external systems, forms the basis for big data analysis.
To find the required information efficiently and accurately within this accumulated mass of data, information classification is an indispensable first step. Through classification, information can be organized effectively and located quickly and accurately. Classification learning is an important branch of machine learning and has been extensively researched and developed.
Traditional classification learning rests on two basic assumptions that ensure the trained classification model is accurate and reliable: (1) the training samples used for learning and the new test samples are independent and identically distributed; (2) enough training samples are available to learn a good classification model. In practice, however, these two conditions often cannot be satisfied. First, as time passes, the originally available labeled sample data may become outdated, and its distribution diverges from that of the new test samples both semantically and statistically. In addition, labeled sample data is often very scarce and hard to obtain, while discarding outdated data entirely is excessively wasteful.
In recent years, in-depth research on transfer learning has addressed these problems. Transfer learning is a new machine learning paradigm that uses knowledge from a source domain to solve problems in a target domain; its research areas mainly include text classification, text clustering, sentiment classification, image classification, collaborative filtering, sensor-based location estimation, and AI planning.
In the text-processing field, Dai et al. proposed a co-clustering method that clusters documents and word features simultaneously and transfers knowledge through word features shared across domains. They also proposed a transfer naive Bayes classifier, which first estimates the data distribution of the source domain and then continuously corrects it to adapt it to the target-domain data. Zhuang et al. processed text at the concept level, proposing a transfer learning method that mines document concepts and word-feature concepts. Building on this, Long et al. proposed a dual transfer model that further partitions concepts and improves classification accuracy. Gu et al. proposed a multi-task clustering method with a shared subspace and applied it to transfer classification.
In image processing, Dai et al. proposed a translated transfer learning method that uses text data to assist image clustering. Raina et al. proposed a new self-taught learning method that learns from unlabeled data, using sparse coding to construct high-level features from large amounts of unlabeled data to improve image classification performance. Zhu et al. studied a heterogeneous transfer learning method that uses the tag annotations on images as a bridge for knowledge transfer between text and images, improving classification on image data.
In collaborative filtering, Wang et al. proposed a feature-subspace transfer learning method to overcome the sparsity problem in collaborative filtering: a user feature subspace is learned from auxiliary data and transferred to the target domain. Pan et al. studied a transfer learning algorithm for collaborative filtering with uncertain ratings, using auxiliary data with uncertain ratings as constraints in the matrix factorization objective. Cao et al. proposed a link prediction model based on a shared latent feature strategy across projects, whose performance improves on single-task learning.
Summary of the invention
As power telecommunication network informatization continuously deepens, massive data on communication management, equipment operation, network construction and the like gradually accumulates, containing great value waiting to be mined. However, over time the originally available labeled sample data becomes outdated, and its distribution diverges from that of new test samples both semantically and statistically. In addition, labeled sample data is often very scarce and hard to obtain, while discarding outdated data entirely is excessively wasteful. Because the data is strongly time-sensitive, mining hidden information from it may otherwise produce biased results.
Before presenting the solution, the present invention defines the following terms:
The outdated data is the source domain, and the new data is the target domain. Let L = {X_L, Y_L} denote the labeled samples in the target domain, where X_L = {x_1, ..., x_γ} and Y_L = {y_1, ..., y_γ}, containing γ samples; let U = {X_U} denote the unlabeled samples in the target domain, where X_U = {x_{γ+1}, ..., x_{γ+u}}, containing u samples. Similarly, let O = {X_O, Y_O} denote the source-domain samples, containing o samples.
The present invention uses transfer-learning domain knowledge: selected source-domain samples are trained jointly with the target-domain samples. The basic principle of the transfer is that the selected source-domain samples possess marginal and conditional distributions that are the same as, or similar to, those of the target domain.
To achieve the above goal, the scheme proposed by the present invention is as follows:
A transfer-learning-based method for improving data quality in a power telecommunication network, characterized by the definitions: L = {X_L, Y_L} denotes the labeled samples in the target domain, where X_L = {x_1, ..., x_γ} and Y_L = {y_1, ..., y_γ}, containing γ samples; U = {X_U} denotes the unlabeled samples in the target domain, where X_U = {x_{γ+1}, ..., x_{γ+u}}, containing u samples; O = {X_O, Y_O} denotes the source-domain samples, containing o samples. The method specifically comprises:
Step 1: apply kernel discriminant analysis to the set L to find a suitable kernel mapping space, and map all samples in L, U and O into the kernel space, so that the marginal distribution of the source-domain samples in the kernel space is close to that of the target-domain samples;
Step 2: in the kernel space obtained in step 1, use bisecting k-means to select source-domain samples whose conditional probability distribution is similar to that of the target domain, and record the selected samples in the original space as the sample set S;
Step 3: in the kernel space obtained in step 1, train a model jointly on the samples selected in step 2 and the labeled target-domain samples, and predict labels for the unlabeled samples in the target domain;
Step 4: execute steps 1-3 N times; in step 1, the kernel mapping space is found from the samples in L on the first iteration and from the samples in L ∪ S on subsequent iterations; finally, N predictions for the set U are obtained, and the final label of each sample in U is determined by majority voting.
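The four-step loop above can be sketched in Python as follows. The function names `kda_fit`, `select_source` and `train_predict` are illustrative stand-ins, not part of the patent, for the KDA mapping, bisecting-k-means selection and classifier components that the method describes:

```python
from collections import Counter

def improve_data_quality(L, U, O, kda_fit, select_source, train_predict, n_rounds=3):
    """Hedged sketch of steps 1-4: repeat the KDA mapping, source-sample
    selection and joint training n_rounds times, then majority-vote."""
    votes = [[] for _ in U]   # one vote list per unlabeled sample in U
    S = None                  # selected source samples in the original space
    for i in range(n_rounds):
        learn = L if S is None else L + S          # L on round 1, L ∪ S after
        mapping = kda_fit(learn)                   # step 1: kernel mapping space
        NL, NU, NO = mapping(L), mapping(U), mapping(O)
        SO, S = select_source(NO, NL)              # step 2: select source samples
        preds = train_predict(SO + NL, NU)         # step 3: joint training + prediction
        for v, p in zip(votes, preds):
            v.append(p)
    # step 4: majority vote over the N predictions for each sample in U
    return [Counter(v).most_common(1)[0][0] for v in votes]
```

With trivial stubs (identity mapping, pass-through selection, majority-class classifier) the driver returns one label per sample in U.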
In the above transfer-learning-based method for improving power telecommunication network data quality, step 1 specifically comprises:
Step 1.1: compute the matrix W. W = (W_i)_{i=1,...,NC} is a block-diagonal matrix in which W_i is an l_i × l_i matrix whose every element is 1/l_i, l_i is the number of samples of class i, and NC is the total number of classes, i.e.:
W = diag(W_1, ..., W_{NC}), with (W_i)_{jk} = 1/l_i for j, k = 1, ..., l_i;
Step 1.2: compute the kernel matrix K. The kernel function κ(x_i, x_j) defines the dot product in the feature space F, i.e. κ(x_i, x_j) = φ(x_i)·φ(x_j), and each element of the kernel matrix K is κ_ij = κ(x_i, x_j). The present invention selects the Gaussian kernel as the kernel function, i.e.:
κ(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²)), where σ > 0 is the bandwidth of the Gaussian kernel;
Step 1.3: simplify the objective function. The eigenvector decomposition of the kernel matrix K gives K = P Λ P^T, where Λ is the diagonal matrix formed by the nonzero eigenvalues and the columns of P are mutually orthogonal unit eigenvectors corresponding to the eigenvalues in Λ. The objective function then reduces to:
λβ = P^T W P β
where β = Λ P^T α; finding the β that maximizes λ, the corresponding α can be computed;
Step 1.4: map samples into the kernel space. The projection of a sample z onto v is:
v · φ(z) = Σ_{i=1}^{l} α_i κ(x_i, z).
In the above transfer-learning-based method for improving power telecommunication network data quality, step 2 is based on the definitions:
Definition 2.1: given a cluster C and its two sub-clusters C_1 and C_2 with C_1 ∪ C_2 = C and C_1 ∩ C_2 = ∅, then:
Par(C, C_1, C_2) = [SSE(C) - SSE(C_1) - SSE(C_2)]
where SSE(C) is the sum of the distances from the non-centroid points in C to the centroid, and Par(C, C_1, C_2) indicates whether C can be decomposed into the two sub-clusters C_1 and C_2, taking value 1 (it can, i.e. the split reduces the total SSE) or 0 (it cannot);
Definition 2.2: given a cluster C_i whose samples are all marked "+" or "-", the purity of C_i is:
Purity(C_i) = max(n_+, n_-) / (n_+ + n_-)
where Purity(C_i), the purity of C_i, is the largest proportion among the positive and negative samples;
Step 2 specifically comprises:
Step 2.1: randomly select 2 samples from C_i as the initial mean vectors μ_1 and μ_2, serving as the centroids of the sub-clusters C_i1 and C_i2 respectively;
Step 2.2: compute the Euclidean distance from each sample in cluster C_i to μ_1 and μ_2; if the sample is closest to μ_1, assign it to cluster C_i1, otherwise to cluster C_i2;
Step 2.3: compute the new mean vector μ'_1 = (1/|C_i1|) Σ_{x∈C_i1} x for cluster C_i1; if μ_1 ≠ μ'_1, update μ_1 to μ'_1; do the same for cluster C_i2;
Step 2.4: if the current mean vectors are no longer updated, cluster C_i is finally divided into the two sub-clusters C_i1 and C_i2; otherwise, repeat steps 2.2 to 2.3.
In the above transfer-learning-based method for improving power telecommunication network data quality, step 4 specifically comprises:
Step 4.1: given L, U and O, set i = 1 and the number of iterations to N;
Step 4.2: LearnKDA = L; if i > 1, LearnKDA = L ∪ S_{i-1};
Step 4.3: apply kernel discriminant analysis to the samples in the set LearnKDA to find the kernel mapping space;
Step 4.4: map the sets L, U and O into the kernel space as NL_i, NU_i and NO_i respectively;
Step 4.5: using bisecting k-means clustering, select samples from NO_i; the selected sample set is SO_i, and S_i denotes that sample set in the original space;
Step 4.6: train a model C_i on SO_i and NL_i, and predict a label for each sample in NU_i;
Step 4.7: set i = i + 1 and repeat steps 4.2 to 4.6 until i = N;
Step 4.8: the N predictions for the set U thus obtained are combined by majority voting to determine the final label of each sample in U.
Therefore, the invention has the following advantages:
Through transfer learning, it effectively resolves the mismatch between training-set and test-set sample distributions, solves the problem that labeled samples are too few to train on, and saves considerable manpower and expense.
Description of the drawings
Fig. 1a: sample distribution of the target domain in the original space.
Fig. 1b: sample distribution of the source domain in the original space.
Fig. 1c: sample distribution of the target domain in the kernel space.
Fig. 1d: sample distribution of the source domain in the kernel space.
Figs. 1a-1d together compare the sample distributions of the source domain and target domain in the original space and the kernel space.
Fig. 2 is the operational flowchart.
Detailed description of the embodiments
Step 1: feature mapping based on kernel functions
Kernel discriminant analysis (KDA), like SVM and kernel PCA, uses the "kernel trick": the data is first mapped nonlinearly into some feature space F, and linear discriminant analysis (LDA) is then performed in that feature space, thereby implicitly performing a nonlinear discriminant analysis of the original input space.
Let φ be the nonlinear mapping from the input space to some feature space F. The linear discriminant to be found in F maximizes
J(v) = (v^T S_B^φ v) / (v^T S_W^φ v)
where v ∈ F, and S_B^φ and S_W^φ are the between-class and within-class scatter matrices in F, i.e.:
S_B^φ = Σ_{i=1}^{NC} l_i (m_i^φ - m^φ)(m_i^φ - m^φ)^T, S_W^φ = Σ_{i=1}^{NC} Σ_{x∈X_i} (φ(x) - m_i^φ)(φ(x) - m_i^φ)^T
where m_i^φ = (1/l_i) Σ_{x∈X_i} φ(x) is the mean of the class-i samples in F, m^φ = (1/l) Σ_x φ(x) is the overall mean in F, l is the total number of samples, l_i is the number of samples of class i, and NC is the total number of classes.
According to reproducing-kernel theory, any v ∈ F must lie in the span of the training samples in F, so v admits an expansion of the form
v = Σ_{i=1}^{l} α_i φ(x_i).
Replacing dot products with the kernel function then yields the objective function of KDA:
λ(α) = (α^T K W K α) / (α^T K K α).
1.1 The matrix W
W = (W_i)_{i=1,...,NC} is a block-diagonal matrix, where W_i is an l_i × l_i matrix whose every element is 1/l_i, i.e.:
W = diag(W_1, ..., W_{NC}), with (W_i)_{jk} = 1/l_i for j, k = 1, ..., l_i.
1.2 The kernel matrix K
The kernel function κ(x_i, x_j) defines the dot product in the feature space F, i.e. κ(x_i, x_j) = φ(x_i)·φ(x_j), and each element of the kernel matrix K is κ_ij = κ(x_i, x_j). The present invention selects the Gaussian kernel as the kernel function, i.e.:
κ(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²)), where σ > 0 is the bandwidth of the Gaussian kernel.
1.3 Simplifying the objective function
The eigenvector decomposition of the kernel matrix K gives K = P Λ P^T, where Λ is the diagonal matrix formed by the nonzero eigenvalues and the columns of P are mutually orthogonal unit eigenvectors corresponding to the eigenvalues in Λ. The objective function then reduces to:
λβ = P^T W P β
where β = Λ P^T α; finding the β that maximizes λ, the corresponding α can be computed.
1.4 Mapping samples into the kernel space
The projection of a sample z onto v is:
v · φ(z) = Σ_{i=1}^{l} α_i κ(x_i, z).
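Assuming the expansion coefficients α have already been obtained from the eigenproblem of section 1.3, the Gaussian kernel of section 1.2 and the projection of section 1.4 can be sketched in plain Python (the variable names are illustrative):

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel κ(x, y) = exp(-||x - y||² / (2σ²)), σ > 0."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def project(z, train_samples, alpha, sigma=1.0):
    """Projection of a sample z onto v = Σ α_i φ(x_i):
    v · φ(z) = Σ_i α_i κ(x_i, z)."""
    return sum(a * gaussian_kernel(x, z, sigma)
               for a, x in zip(alpha, train_samples))
```

Mapping a whole set into the one-dimensional kernel space is then just `[project(z, X, alpha) for z in X]`; in practice several discriminant directions (several α vectors) would be kept.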
Step 2: sample selection based on clustering
The bisecting k-means clustering algorithm is a variant of the k-means clustering algorithm, designed mainly to remedy the uncertainty that k-means introduces into the clustering result through its random choice of initial centroids; bisecting k-means is much less sensitive to that random choice.
In Euclidean space, the quality of a cluster C_i is usually measured by the sum of squared errors (SSE): after clustering, an error is computed for each point, namely the distance from each non-centroid point to the centroid u_i, i.e.:
SSE(C_i) = Σ_{x∈C_i} dist(x, u_i)²
Before the sample selection operation, two definitions are given.
Definition 2.1: given a cluster C and its two sub-clusters C_1 and C_2 with C_1 ∪ C_2 = C and C_1 ∩ C_2 = ∅, then:
Par(C, C_1, C_2) = [SSE(C) - SSE(C_1) - SSE(C_2)]
Definition 2.2: given a cluster C_i whose samples are all marked "+" or "-", the purity of C_i is:
Purity(C_i) = max(n_+, n_-) / (n_+ + n_-)
Using the bisecting k-means clustering algorithm, the sample selection procedure is as follows:
(1) Initially, the source-domain samples and the labeled target-domain samples together form the data set to be clustered, initialized as a single cluster C_0, i.e. C = {C_0}.
(2) Take a cluster C_i from C and perform a k-means clustering operation with k = 2, obtaining two sub-clusters C_i1 and C_i2.
(3) If Purity(C_i) ≤ 0.9 or Par(C_i, C_i1, C_i2) = 1, replace C_i with C_i1 and C_i2 in the set C.
(4) Repeat steps (2) and (3) until every element of the set C has been traversed.
(5) Finally C = {C_1, ..., C_k} is obtained. The label of cluster C_i is the most frequent label among the labeled target-domain samples in C_i, i.e. CL_i = argmax_{j∈[1,NC]} nc_ij, where nc_ij is the number of labeled target-domain samples of class j in cluster C_i. From cluster C_i, the source-domain samples whose labels agree with the cluster label are selected.
The concrete operations of step (2) are as follows:
(a) Randomly select 2 samples from C_i as the initial mean vectors μ_1 and μ_2, serving as the centroids of the sub-clusters C_i1 and C_i2 respectively.
(b) Compute the Euclidean distance from each sample in cluster C_i to μ_1 and μ_2; if the sample is closest to μ_1, assign it to cluster C_i1, otherwise to cluster C_i2.
(c) Compute the new mean vector μ'_1 = (1/|C_i1|) Σ_{x∈C_i1} x for cluster C_i1; if μ_1 ≠ μ'_1, update μ_1 to μ'_1; do the same for cluster C_i2.
(d) If the current mean vectors no longer update, cluster C_i is finally divided into the two sub-clusters C_i1 and C_i2; otherwise, repeat steps (b) and (c).
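A minimal pure-Python sketch of the 2-means split in steps (a)-(d), together with the SSE measure used by the Par criterion (the initial centroids are chosen at random, so the split itself is randomized, as the text notes):

```python
import random

def sse(cluster, mean):
    """Sum of squared distances from the points of a cluster to its centroid."""
    return sum(sum((a - m) ** 2 for a, m in zip(x, mean)) for x in cluster)

def mean_vector(cluster):
    """Componentwise mean of the points in a cluster."""
    dim = len(cluster[0])
    return [sum(x[d] for x in cluster) / len(cluster) for d in range(dim)]

def bisect(cluster):
    """Steps (a)-(d): split one cluster into two by k-means with k = 2."""
    mu1, mu2 = random.sample(cluster, 2)           # (a) random initial means
    while True:
        c1, c2 = [], []
        for x in cluster:                          # (b) assign to nearest mean
            d1 = sum((a - b) ** 2 for a, b in zip(x, mu1))
            d2 = sum((a - b) ** 2 for a, b in zip(x, mu2))
            (c1 if d1 <= d2 else c2).append(x)
        n1, n2 = mean_vector(c1), mean_vector(c2)  # (c) recompute the means
        if n1 == mu1 and n2 == mu2:                # (d) means stable: done
            return c1, c2
        mu1, mu2 = n1, n2
```

On two well-separated groups the split recovers them regardless of the initialization, and Par(C, C_1, C_2) can then be evaluated as 1 when sse(C) - sse(C_1) - sse(C_2) > 0.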
Step 3: training the classifier
A classifier is trained on the samples selected in step 2 together with the labeled target-domain samples, and is used to predict labels for the unlabeled samples in the target domain. The classifier model can be chosen from support vector machines (SVM), logistic regression, decision trees, naive Bayes and similar models, with cross-validation used to measure model quality.
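The cross-validation check of model quality can be sketched generically; `train_fn`, which fits any one of the candidate models (SVM, logistic regression, decision tree, naive Bayes) and returns a predict function, is an illustrative interface assumed here, not one defined by the patent:

```python
def k_fold_accuracy(samples, train_fn, k=5):
    """Estimate classifier quality by k-fold cross-validation: train on
    k-1 folds, measure accuracy on the held-out fold, and average."""
    folds = [samples[i::k] for i in range(k)]       # simple round-robin folds
    accs = []
    for i in range(k):
        held_out = folds[i]
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train_fn(train)                     # fit one candidate model
        hits = sum(model(x) == y for x, y in held_out)
        accs.append(hits / len(held_out))
    return sum(accs) / k
```

Running this once per candidate model on the step-2 training set and keeping the model with the highest average accuracy matches the "measure model quality with cross-validation" instruction.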
Step 4: repeat steps 1-3 N times
The concrete operations are as follows:
(1) Given L, U and O, set i = 1 and the number of iterations to N.
(2) LearnKDA = L; if i > 1, LearnKDA = L ∪ S_{i-1}.
(3) Apply the KDA method of step 1 to the samples in the set LearnKDA to find the kernel mapping space.
(4) Map the sets L, U and O into the kernel space as NL_i, NU_i and NO_i respectively.
(5) Using the clustering method of step 2, select samples from NO_i; the selected sample set is SO_i, and S_i denotes that sample set in the original space.
(6) Train a model C_i on SO_i and NL_i, and predict a label for each sample in NU_i.
(7) Set i = i + 1 and repeat steps (2)-(6) until i = N.
(8) The N predictions for the set U thus obtained are combined by majority voting to determine the final label of each sample in U.
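The final majority vote of step (8) reduces to taking the per-sample mode over the N rounds of predictions; a minimal sketch, where the list-of-lists layout of the predictions is an assumed representation:

```python
from collections import Counter

def majority_vote(rounds):
    """rounds[i][j] is round i's predicted label for sample j of U;
    return the most common label per sample across the N rounds."""
    per_sample = zip(*rounds)   # transpose: one tuple of N votes per sample
    return [Counter(votes).most_common(1)[0][0] for votes in per_sample]
```

Ties are broken by first occurrence in `Counter.most_common`, which is acceptable for a sketch; a production version might prefer the prediction from the latest round.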
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in a similar way, without departing from the spirit of the invention or exceeding the scope of the appended claims.

Claims (4)

1. A transfer-learning-based method for improving data quality in a power telecommunication network, characterized by the definitions: L = {X_L, Y_L} denotes the labeled samples in the target domain, where X_L = {x_1, ..., x_γ} and Y_L = {y_1, ..., y_γ}, containing γ samples; U = {X_U} denotes the unlabeled samples in the target domain, where X_U = {x_{γ+1}, ..., x_{γ+u}}, containing u samples; O = {X_O, Y_O} denotes the source-domain samples, containing o samples; the method specifically comprising:
Step 1: applying kernel discriminant analysis to the set L to find a suitable kernel mapping space, and mapping all samples in L, U and O into the kernel space, so that the marginal distribution of the source-domain samples in the kernel space is close to that of the target-domain samples;
Step 2: in the kernel space obtained in step 1, using bisecting k-means to select source-domain samples whose conditional probability distribution is similar to that of the target domain, and recording the selected samples in the original space as the sample set S;
Step 3: in the kernel space obtained in step 1, jointly training a model on the samples selected in step 2 and the labeled target-domain samples, and predicting labels for the unlabeled samples in the target domain;
Step 4: executing steps 1-3 N times, where in step 1 the kernel mapping space is found from the samples in L on the first iteration and from the samples in L ∪ S on subsequent iterations; finally obtaining N predictions for the set U and determining the final label of each sample in U by majority voting.
2. The transfer-learning-based method for improving power telecommunication network data quality according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1: computing the matrix W, where W = (W_i)_{i=1,...,NC} is a block-diagonal matrix, W_i is an l_i × l_i matrix whose every element is 1/l_i, l_i is the number of samples of class i, and NC is the total number of classes, i.e.:
W = diag(W_1, ..., W_{NC}), with (W_i)_{jk} = 1/l_i for j, k = 1, ..., l_i;
Step 1.2: computing the kernel matrix K, where the kernel function κ(x_i, x_j) defines the dot product in the feature space F, i.e. κ(x_i, x_j) = φ(x_i)·φ(x_j), and each element of K is κ_ij = κ(x_i, x_j); the Gaussian kernel is selected as the kernel function, i.e.:
κ(x_i, x_j) = exp(-||x_i - x_j||² / (2σ²)), where σ > 0 is the bandwidth of the Gaussian kernel;
Step 1.3: simplifying the objective function: the eigenvector decomposition of the kernel matrix K gives K = P Λ P^T, where Λ is the diagonal matrix formed by the nonzero eigenvalues and the columns of P are mutually orthogonal unit eigenvectors corresponding to the eigenvalues in Λ; the objective function then reduces to:
λβ = P^T W P β
where β = Λ P^T α; finding the β that maximizes λ, the corresponding α can be computed;
Step 1.4: mapping samples into the kernel space, the projection of a sample z onto v being:
v · φ(z) = Σ_{i=1}^{l} α_i κ(x_i, z).
3. The transfer-learning-based method for improving power telecommunication network data quality according to claim 1, characterized in that step 2 is based on the definitions:
Definition 2.1: given a cluster C and its two sub-clusters C_1 and C_2 with C_1 ∪ C_2 = C and C_1 ∩ C_2 = ∅, then:
Par(C, C_1, C_2) = [SSE(C) - SSE(C_1) - SSE(C_2)]
where SSE(C) is the sum of the distances from the non-centroid points in C to the centroid, and Par(C, C_1, C_2) indicates whether C can be decomposed into the two sub-clusters C_1 and C_2, taking value 1 (it can) or 0 (it cannot);
Definition 2.2: given a cluster C_i whose samples are all marked "+" or "-", the purity of C_i is:
Purity(C_i) = max(n_+, n_-) / (n_+ + n_-)
where Purity(C_i), the purity of C_i, is the largest proportion among the positive and negative samples;
step 2 specifically comprising:
Step 2.1: randomly selecting 2 samples from C_i as the initial mean vectors μ_1 and μ_2, serving as the centroids of the sub-clusters C_i1 and C_i2 respectively;
Step 2.2: computing the Euclidean distance from each sample in cluster C_i to μ_1 and μ_2; if the sample is closest to μ_1, assigning it to cluster C_i1, otherwise to cluster C_i2;
Step 2.3: computing the new mean vector μ'_1 = (1/|C_i1|) Σ_{x∈C_i1} x for cluster C_i1; if μ_1 ≠ μ'_1, updating μ_1 to μ'_1; doing the same for cluster C_i2;
Step 2.4: if the current mean vectors are no longer updated, cluster C_i is finally divided into the two sub-clusters C_i1 and C_i2; otherwise, repeating steps 2.2 to 2.3.
4. The transfer-learning-based method for improving power telecommunication network data quality according to claim 1, characterized in that step 4 specifically comprises:
Step 4.1: given L, U and O, setting i = 1 and the number of iterations to N;
Step 4.2: LearnKDA = L; if i > 1, LearnKDA = L ∪ S_{i-1};
Step 4.3: applying kernel discriminant analysis to the samples in the set LearnKDA to find the kernel mapping space;
Step 4.4: mapping the sets L, U and O into the kernel space as NL_i, NU_i and NO_i respectively;
Step 4.5: using bisecting k-means clustering, selecting samples from NO_i; the selected sample set is SO_i, and S_i denotes that sample set in the original space;
Step 4.6: training a model C_i on SO_i and NL_i, and predicting a label for each sample in NU_i;
Step 4.7: setting i = i + 1 and repeating steps 4.2 to 4.6 until i = N;
Step 4.8: finally obtaining N predictions for the set U and determining the final label of each sample in U by majority voting.
CN201810445948.6A 2018-05-11 2018-05-11 Transfer-learning-based method for improving data quality in a power telecommunication network Pending CN108664607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810445948.6A CN108664607A (en) 2018-05-11 2018-05-11 Transfer-learning-based method for improving data quality in a power telecommunication network


Publications (1)

Publication Number Publication Date
CN108664607A true CN108664607A (en) 2018-10-16

Family

ID=63779040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810445948.6A Pending CN108664607A (en) Transfer-learning-based method for improving data quality in a power telecommunication network

Country Status (1)

Country Link
CN (1) CN108664607A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210018A (en) * 2019-05-14 2019-09-06 北京百度网讯科技有限公司 Matching method and device for registration departments
CN110210018B (en) * 2019-05-14 2023-07-11 北京百度网讯科技有限公司 Matching method and device for registration departments
CN110490275A (en) * 2019-06-28 2019-11-22 北京理工大学 Driving behavior prediction method based on transfer learning
CN110490275B (en) * 2019-06-28 2020-07-07 北京理工大学 Driving behavior prediction method based on transfer learning
CN110766212A (en) * 2019-10-15 2020-02-07 哈尔滨工程大学 Ultra-short-term photovoltaic power prediction method for electric fields with missing historical data

Similar Documents

Publication Publication Date Title
CN113822494B (en) Risk prediction method, device, equipment and storage medium
US10599623B2 (en) Matching multidimensional projections of functional space
US9990380B2 (en) Proximity search and navigation for functional information systems
Chong et al. Simultaneous image classification and annotation
CN108875816A (en) Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion
Mazzetto et al. Adversarial multi class learning under weak supervision with performance guarantees
CN116644755B (en) Multi-task learning-based few-sample named entity recognition method, device and medium
Athani et al. Student academic performance and social behavior predictor using data mining techniques
CN108664607A (en) A kind of power telecom network quality of data method for improving based on transfer learning
Li et al. Beyond confusion matrix: learning from multiple annotators with awareness of instance features
Sun et al. Hierarchical multilabel classification with optimal path prediction
Huang et al. Learning consistent region features for lifelong person re-identification
CN114093445B (en) Patient screening marking method based on partial multi-marking learning
CN117171413B (en) Data processing system and method for digital collection management
CN109857892A (en) Semi-supervised cross-module state Hash search method based on category transmitting
Fadhil Hybrid of K-means clustering and naive Bayes classifier for predicting performance of an employee
Shrivastava et al. Selection of efficient and accurate prediction algorithm for employing real time 5G data load prediction
Chefrour et al. A Novel Incremental Learning Algorithm Based on Incremental Vector Support Machina and Incremental Neural Network Learn++.
Zhou et al. MetaMove: On improving human mobility classification and prediction via metalearning
Wu et al. Multi-graph-view learning for complicated object classification
Lai et al. A new method for stock price prediction based on MRFs and SSVM
Li et al. CRNN: Integrating classification rules into neural network
US11875250B1 (en) Deep neural networks with semantically weighted loss functions
Han et al. BALQUE: Batch active learning by querying unstable examples with calibrated confidence
Rastogi et al. Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181016