CN110263939A

CN110263939A - Evaluation method, apparatus, device, and medium for a representation learning model

Info

Publication number: CN110263939A
Application number: CN201910549544.6A
Authority: CN
Inventors: 周晋; 李超; 王翔
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2019-09-20

Abstract

This application discloses an evaluation method for a representation learning model. The method includes: for a representation learning model trained in an unsupervised manner, generating a performance evaluation index that includes at least one of a first index and a second index; and determining the training status of the representation learning model according to the performance evaluation index. The first index is a quantitative index, generated from the representation vectors that the model learns during training for each sample in a first sample subset, that measures how close same-class samples are and how far apart different-class samples are. The second index is a quantitative index, generated from the similarity vectors corresponding to the representation vectors that the model learns during training for each sample in a second sample set, that measures the stability of the sample representations. With these quantitative indexes, evaluation no longer depends on a downstream machine learning task, which greatly speeds up the training iteration of representation learning as a whole. This application also discloses a corresponding apparatus, device, and medium.

Description

Evaluation method, apparatus, device, and medium for a representation learning model

Technical field

This application relates to the field of computer technology, and in particular to an evaluation method, apparatus, device, and computer storage medium for a representation learning model.

Background

Representation learning refers to learning a representation of data so that raw data is converted into a form that machine learning can exploit effectively, making it easier to extract useful information when subsequently building a classifier or another prediction task. Put simply, it converts data into a vector representation such that the vector contains as much information useful to downstream tasks as possible. In recent years, representation learning has attracted wide attention in fields such as speech and images.

Unsupervised representation learning refers to training a representation learning model on unlabeled training data. Because there are no known labels, the results of unsupervised learning cannot be compared with actual labels, so a model trained by unsupervised learning is difficult to evaluate.

In general, a representation learning model trained in an unsupervised manner is evaluated through the evaluation results of a downstream machine learning task. This lengthens the training and optimization-iteration cycle of the unsupervised representation learning model, increases the time cost of model training, slows model iteration, and causes losses in practical applications.

Summary of the invention

This application provides an evaluation method for a representation learning model. It proposes two quantitative indexes for assessing training quality, which measure the training state of an unsupervised representation learning model so that anomalies in the training process are found in time. This avoids a longer training cycle, slower training, and higher training time cost, and thereby avoids losses in practical applications. This application also provides a corresponding apparatus, device, medium, and computer program product.

A first aspect of this application provides an evaluation method for a representation learning model, the method comprising:

for a representation learning model trained in an unsupervised manner, generating a performance evaluation index of the representation learning model, the performance evaluation index including at least one of a first index and a second index;

wherein the first index is a quantitative index, generated from the representation vectors that the representation learning model learns during training for each sample in a first sample subset, for measuring how close same-class samples are and how far apart different-class samples are; the first sample subset is generated by labeling a first subset of the training sample set of the representation learning model, and the first subset includes samples of different classes;

the second index is a quantitative index, generated from the similarity vectors corresponding to the representation vectors that the representation learning model learns during training for each sample in a second sample set, for measuring the stability of the sample representations; the second sample set is a second subset of the training sample set; and

determining the training status of the representation learning model according to the performance evaluation index.

A second aspect of this application provides an evaluation apparatus for a representation learning model, the apparatus comprising:

an index generation module, configured to generate, for a representation learning model trained in an unsupervised manner, a performance evaluation index of the representation learning model, the performance evaluation index including at least one of a first index and a second index; and

an evaluation module, configured to determine the training status of the representation learning model according to the performance evaluation index.

Optionally, the index generation module includes:

a first acquisition submodule, configured to obtain the representation vectors that the representation learning model learns during training for each sample of the first sample subset;

a generation submodule, configured to determine the inter-class distance and the intra-class distance of each class of samples according to the representation vector and label of each sample of the first sample subset, and to generate a division ratio according to the ratio of the inter-class distance to the intra-class distance; and

a first determination submodule, configured to use the division ratio as the first index.

Optionally, the performance evaluation index includes the first index;

the evaluation module is specifically configured to:

determine that the training of the representation learning model has stabilized when the division ratios determined over multiple iteration rounds within a preset time period are in a converged state and the converged value is greater than a first reference threshold.

Optionally, the index generation module includes:

a second acquisition submodule, configured to obtain the representation vectors that the representation learning model learns, over multiple iteration rounds during training, for each sample of the training sample set;

an adding submodule, configured to select, according to the representation vectors of the samples in the training sample set learned in each iteration round, a preset number of most similar samples for each sample in the second sample set, and to add the preset number of similar samples selected for a sample to the similar sample set corresponding to that sample of the second sample set and that iteration round; and

a second determination submodule, configured to generate, from the multiple similar sample sets corresponding to each sample in the second sample set, a Jaccard index for each sample in the second sample set, and to use the Jaccard index as the second index.

Optionally, the performance evaluation index includes the second index;

the evaluation module is then specifically configured to:

determine that the training of the representation learning model has stabilized when the proportion of samples in the second sample set whose Jaccard index is greater than a Jaccard index threshold exceeds a preset ratio.

Optionally, the performance evaluation index includes the first index and the second index;

the evaluation module then includes:

a weighting submodule, configured to perform weighting on the first index and the second index; and

an evaluation submodule, configured to determine the training status of the representation learning model according to the weighting result.

Optionally, the apparatus further includes:

a first display module, configured to draw and display a training effect curve of the representation learning model according to the performance evaluation indexes generated for different iteration rounds of the representation learning model, the training effect curve showing how the performance of the representation learning model changes over the course of training.

Optionally, the index generation module is specifically configured to:

generate the performance evaluation indexes of different iteration rounds for representation learning models configured with different hyperparameters;

the apparatus further includes:

a second display module, configured to draw and display a comparison chart of the representation learning models, the comparison chart showing the respective training effect curves of the representation learning models based on different hyperparameters, each training effect curve showing how the performance of a representation learning model changes over the course of training.

Optionally, the representation learning model is a word vector representation learning model.

A third aspect of this application provides a terminal device, the terminal device including a processor and a memory:

the memory is configured to store a computer program;

the processor is configured to execute, according to the computer program, the evaluation method for a representation learning model described in the first aspect of this application.

A fourth aspect of this application provides a computer-readable storage medium configured to store a computer program, the computer program being used to execute the evaluation method for a representation learning model described in the first aspect.

A fifth aspect of this application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the evaluation method for a representation learning model described in the first aspect.

As can be seen from the above technical solutions, the embodiments of this application have the following advantages:

The embodiments of this application provide an evaluation method for a representation learning model. For a representation learning model trained in an unsupervised manner, the method proposes two quantitative indexes, at least one of which can be used to measure the training state of the model. Specifically, for the first index, part of the data in the training sample set is labeled; during training, the intra-group distance and inter-group distance of each class of samples are calculated from the labeled data, and a quantitative index measuring how close same-class samples are and how far apart different-class samples are is generated from them. The closer the index shows same-class samples to be, and the farther apart different-class samples, the better the classification ability of the model, so the training status of the representation learning model can be determined from the first index. For the second index, some samples are selected from the training sample set; during training, the similarity between their representation vectors and the representations of the samples in the entire training sample set is calculated periodically, the similarity vector corresponding to each representation vector in each period is determined, and a quantitative index measuring the stability of the sample representations is generated. The more stable this index, the more stable the model, so the training status of the representation learning model can be determined from the second index.

These quantitative indexes let the user keep track of model training in time, that is, whether the model is gradually improving and whether training can be ended, without depending on a downstream machine learning task. This greatly speeds up the training iteration of representation learning as a whole and saves the time otherwise spent training, tuning, and evaluating an additional downstream model. Moreover, because the method provides quantitative evaluation results, subsequent adjustments to the model can be decided from the performance of different parameter combinations. On the one hand, this avoids the error-prone subjectivity of relying on past experience; on the other hand, it makes automatic hyperparameter tuning possible.

Brief description of the drawings

Fig. 1 is a scenario architecture diagram of the evaluation method for a representation learning model in an embodiment of this application;

Fig. 2 is a flowchart of the evaluation method for a representation learning model in an embodiment of this application;

Fig. 3 is a comparison chart of the training time of representation learning models in an embodiment of this application;

Fig. 4 is a chart of the training effects of different representation learning models in an embodiment of this application;

Fig. 5A is a scenario diagram of the evaluation method for a representation learning model in an embodiment of this application;

Fig. 5B is a flowchart of the evaluation method for a representation learning model in an embodiment of this application;

Fig. 6 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 7 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 8 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 9 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 10 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 11 is a schematic structural diagram of the evaluation apparatus for a representation learning model in an embodiment of this application;

Fig. 12 is a schematic structural diagram of a terminal in an embodiment of this application.

Detailed description of embodiments

To help those skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.

The terms "first", "second", "third", "fourth", and the like (if present) in the description, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

At present, the evaluation of a representation learning model trained in an unsupervised manner depends on the evaluation results of a downstream machine learning task, which lengthens the training cycle of the unsupervised representation learning model, slows training, and increases training time cost. To address this problem, this application provides a method that enables quantitative evaluation during training. The method is implemented through at least one of two quantitative indexes. One index labels a small amount of data in advance and, as model training progresses, computes in real time a measure that characterizes the classification ability of the model. The other index designates a small number of samples to be tracked throughout training and periodically computes the similarity between their representations and the representations of the full sample set, assessing how stable the results are and thereby evaluating the stability of the model.

With this method, the user can learn the training state without depending on a downstream machine learning task, which greatly speeds up the training iteration of representation learning as a whole and saves the time otherwise spent training, tuning, and evaluating an additional downstream model. Moreover, because the method provides quantitative evaluation results, subsequent adjustments to the model can be decided from the performance of different parameter combinations. On the one hand, this avoids the error-prone subjectivity of relying on past experience; on the other hand, it makes automatic hyperparameter tuning possible.

It can be understood that the evaluation method for a representation learning model provided by this application can be applied to any processing device with data processing capability, which may be a terminal or a server. The terminal may be a desktop computer, a portable mobile terminal device such as a tablet computer or smartphone, or a mainframe. A server is a device that provides computing services; it may be an independent computing device or a computing cluster composed of multiple computing devices.

The evaluation method for a representation learning model provided by this application can be stored in the processing device in the form of a computer program, and the processing device implements the method by executing the computer program. The computer program may be standalone, or may be a functional module, plug-in, or applet that runs on top of another program.

In practice, the evaluation method for a representation learning model provided by this application can be applied to, but is not limited to, the application environment shown in Fig. 1.

As shown in Fig. 1, a terminal 102 is connected to a database 104 that stores the training sample set. The terminal 102 obtains samples from the training sample set in the database 104 and trains the representation learning model in an unsupervised manner. During training, the terminal 102 generates, from the representation vectors of the samples in the first sample subset, the first index measuring how close same-class samples are and how far apart different-class samples are; generates, from the similarity vectors corresponding to the representation vectors of the samples in the second sample set, the second index measuring the stability of the sample representations; and determines the training status of the representation learning model based on at least one of the first index and the second index.

Next, each step of the evaluation method for a representation learning model provided by the embodiments of this application is described in detail from the perspective of the terminal.

Referring to the flowchart of the evaluation method for a representation learning model shown in Fig. 2, the method includes:

S201: For a representation learning model trained in an unsupervised manner, generate a performance evaluation index of the representation learning model.

In this embodiment, the terminal uses samples from the training sample set to train the representation learning model in an unsupervised manner. To assess the training state of the representation learning model, the terminal generates a performance evaluation index for it, which is used to evaluate the representation learning model.

The performance evaluation index includes at least one of the first index and the second index. The first index is a quantitative index computed in real time from pre-labeled samples. It characterizes the classification ability of the model, that is, its ability to give same-class samples similar representations and different-class samples dissimilar representations; in other words, it measures how close same-class samples are and how far apart different-class samples are. The second index is a quantitative index computed from a small number of samples tracked throughout training. It periodically computes the similarity between the representations of the tracked samples and the representations of the full sample set, and so quantitatively evaluates the stability of the model's output.

The first index is generated from the representation vectors that the representation learning model learns during training for each sample in the first sample subset. The first sample subset is generated from a first subset of the training sample set. Specifically, a small number of samples are selected from the training sample set to form the first subset, which includes samples of different classes so that the model's ability to classify samples of different classes can be determined. Labeling the samples in the first subset yields the first sample subset. That is, the first sample subset includes the first subset together with the label of each sample in it.

In a specific implementation, the terminal obtains the representation vectors that the representation learning model learns during training for each sample of the first sample subset, determines the inter-class distance and intra-class distance of each class of samples according to the representation vector and label of each sample, generates a division ratio according to the ratio of the inter-class distance to the intra-class distance, and uses the division ratio as the first index. The first index can be calculated with the following formula:
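The formula itself is not reproduced in this text. One form that would be consistent with the symbol definitions that follow, and with the later statement that a division ratio of 1 means no difference between the average within-class and between-class distances, is sketched below. How the two within-class terms are combined (here, a simple average) is an assumption.

```latex
\Lambda = \frac{\operatorname{mean}(d_{AB})}{\tfrac{1}{2}\left[\operatorname{mean}(d_{A}) + \operatorname{mean}(d_{B})\right]}
```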

Here, Λ denotes the division ratio; d_AB denotes the distance between samples of class A and class B; d_A and d_B denote the distances between samples within class A and within class B, respectively; and mean denotes the average. Accordingly, mean(d_AB) is the average distance between classes A and B, and mean(d_A) and mean(d_B) are the average sample distances within class A and within class B, respectively.

Sample distance can take many forms. In some possible implementations it can be characterized by cosine distance, as in the following formula:
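The formula is not reproduced in this text. Assuming the common convention that cosine distance is one minus cosine similarity (so that a larger value means the samples are farther apart), it would read:

```latex
d(X, Y) = 1 - \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert}
```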

Here, d is the cosine distance, X and Y are the representation vectors of two different samples, and ‖X‖ and ‖Y‖ are the lengths (norms) of X and Y, respectively.

Of course, in practice the sample distance can also be computed in other ways, for example with Euclidean distance, Manhattan distance, or Chebyshev distance; this embodiment does not limit this. In addition, in some cases the terminal may compute the division ratio using the median of the distances instead of the mean. The above are only examples and do not limit the technical solution of this application.
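The division ratio described above can be sketched in Python for the two-class case. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, cosine distance is taken as one minus cosine similarity, the two within-class means are averaged (an assumption), and every vector is assumed to be non-zero with at least two samples per class.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance (1 - cosine similarity) between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def mean_within(vectors):
    """Mean cosine distance over all distinct pairs inside one class."""
    d = cosine_distance_matrix(vectors, vectors)
    upper = np.triu_indices(len(vectors), k=1)  # each unordered pair once, no self-pairs
    return d[upper].mean()

def division_ratio(vectors_a, vectors_b):
    """Mean between-class distance divided by the averaged within-class distance."""
    between = cosine_distance_matrix(vectors_a, vectors_b).mean()
    # Assumption: the within-class term is the plain average of the two class means.
    within = 0.5 * (mean_within(vectors_a) + mean_within(vectors_b))
    return between / within
```

With two tight, well-separated clusters the ratio is far above 1; swapping `.mean()` for `np.median` would give the median-based variant mentioned above.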

The second index is a quantitative index, generated from the similarity vectors corresponding to the representation vectors that the representation learning model learns during training for each sample in the second sample set, for measuring the stability of the sample representations. The second sample set is a second subset of the training sample set.

It can be understood that the representation learning model is trained iteratively in rounds. To compute the second index, the terminal can obtain the representation vectors that the model learns for each sample of the training sample set in multiple iteration rounds. For each iteration round, based on the representation vectors of the samples in the training sample set learned in that round, the terminal selects the N most similar samples for each sample in the second sample set and adds those N similar samples to the similar sample set corresponding to that sample and that iteration round. From the multiple similar sample sets corresponding to each sample in the second sample set, the terminal generates a Jaccard index for each sample in the second sample set and uses the Jaccard index as the second index. N is a preset number that can be set as required, for example to a positive integer greater than 1.
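The construction of the similar sample sets for one iteration round can be sketched as follows. This is an illustrative sketch under stated assumptions: similarity is taken to be cosine similarity, a tracked sample is excluded from its own neighbour set, and the name `similar_sample_sets` and its parameters are hypothetical.

```python
import numpy as np

def similar_sample_sets(all_vectors, tracked_ids, n):
    """For each tracked sample id, return the set of ids of its n most similar
    samples in the full training set for one iteration round."""
    unit = all_vectors / np.linalg.norm(all_vectors, axis=1, keepdims=True)
    sims = unit[tracked_ids] @ unit.T            # shape: (num_tracked, num_samples)
    result = {}
    for row, sample_id in enumerate(tracked_ids):
        scores = sims[row].copy()
        scores[sample_id] = -np.inf              # assumption: a sample is not its own neighbour
        top = np.argsort(-scores)[:n]            # indices of the n highest similarities
        result[sample_id] = set(int(j) for j in top)
    return result
```

Calling this once per iteration round, with the representation vectors learned in that round, yields one similar sample set per tracked sample and round.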

The Jaccard index, also called the Jaccard similarity coefficient, is a statistic used to compare the similarity or diversity of sample sets. It equals the ratio of the size of the intersection of two sample sets to the size of their union, and is also known as intersection over union. In this application, it can be calculated with the following formula:

$$J_i = \frac{\left| S_i^{t} \cap S_i^{t+n} \right|}{\left| S_i^{t} \cup S_i^{t+n} \right|}$$

Here, $S_i^{t}$ is the similar sample set of the i-th sample in the second sample set at iteration step t (that is, round t) of the model, and $S_i^{t+n}$ is the similar sample set of the same sample at iteration step t+n (that is, round t+n); i is a positive integer from 1 to k, where k is the number of elements in the second sample set; $\left| S_i^{t} \cap S_i^{t+n} \right|$ is the number of elements in the intersection of the two similar sample sets, and $\left| S_i^{t} \cup S_i^{t+n} \right|$ is the number of elements in their union.
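As a minimal sketch, the Jaccard index of the similar sample sets of one tracked sample at two iteration rounds can be computed with Python sets. The function name is hypothetical, and treating two empty sets as identical is an assumption made only to avoid division by zero.

```python
def jaccard_index(set_t, set_t_plus_n):
    """Size of the intersection divided by the size of the union of two sets."""
    union = set_t | set_t_plus_n
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(set_t & set_t_plus_n) / len(union)
```

For example, the sets {1, 2, 3} and {2, 3, 4} share two elements out of four distinct ones, giving an index of 0.5.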

S202: Determine the training status of the representation learning model according to the performance evaluation index.

In practice, the performance evaluation index includes at least one of the division ratio and the Jaccard index. On this basis, the terminal can determine the training status of the representation learning model through the following implementations, described in turn below.

The first implementation determines the training status of the representation learning model based on the division ratio alone. Specifically, when the division ratios determined over multiple iteration rounds within a preset time period are in a converged state and the converged value is greater than the first reference threshold, it is determined that the training of the representation learning model has stabilized.

It can be understood that the reference value of the division ratio Λ is 1, which indicates no difference between the average distance within the two classes and the average distance between them. Accordingly, during an effective representation learning training process, the division ratio should be greater than 1 and should increase gradually until it stabilizes. The terminal can therefore determine whether the training of the representation learning model has stabilized from the convergence behavior and converged value of the division ratios determined over multiple iteration rounds within the preset time period.

The terminal can determine the convergence behavior and converged value of the division ratio as follows. For each iteration round within the preset time period, it calculates the inter-class distance and intra-class distance of each class of samples from the representation vector and label of each sample in the first sample subset, and calculates the division ratio from these distances. The division ratio before an iteration is denoted the first division ratio, and the division ratio after the iteration is denoted the second division ratio. When the first division ratio and the second division ratio are both greater than the first reference threshold, and the absolute difference between the second division ratio and the first division ratio is less than the second reference threshold, the division ratio is determined to have converged, with a converged value greater than the first reference threshold.

It should be noted that the preset time period, the first reference threshold, and the second reference threshold can be set as required. As an example, the preset time period may be one day, the first reference threshold may be 2, and the second reference threshold may be 0.01.
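The convergence test for the division ratio can be sketched as a small predicate. The function and parameter names are hypothetical; the default thresholds are the example values given above (2 and 0.01).

```python
def division_ratio_converged(ratio_before, ratio_after,
                             first_threshold=2.0, second_threshold=0.01):
    """True when both division ratios exceed the first reference threshold and
    their absolute difference is below the second reference threshold."""
    both_large = ratio_before > first_threshold and ratio_after > first_threshold
    small_change = abs(ratio_after - ratio_before) < second_threshold
    return both_large and small_change
```

In a training loop this check would be applied to the division ratios of consecutive iteration rounds within the preset time period.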

The larger the division ratio, the more pronounced the differences between classes and the more similar the samples within each class. This reflects that the results of unsupervised representation learning agree with the classification standard on the small labeled sample, indicates that the representation vectors output by the representation learning model carry valuable information, and thus assures the effectiveness of the model training as a whole.

The second implementation determines the training status of the representation learning model based on the Jaccard index alone. Specifically, when the proportion of samples in the second sample set whose Jaccard index is greater than the Jaccard index threshold exceeds the preset ratio, it is determined that the training of the representation learning model has stabilized.

It can be understood that, during an effective representation learning training process, as training progresses, the similar sample sets corresponding to each sample in the second sample set should become highly correlated and tend gradually to become fixed; in other words, the similar sample set of each sample no longer changes substantially. The terminal can therefore determine whether the training of the representation learning model has stabilized from the magnitude of the Jaccard index within the preset time period. For a sample in the second sample set, a Jaccard index greater than the Jaccard index threshold indicates that the sample's similar sample set is almost the same before and after the iterations; if the Jaccard indexes of more than the preset ratio of samples are all greater than the Jaccard index threshold, the representation learning model has stabilized.

It should be noted that the preset ratio and the Jaccard index threshold can be set as required. As an example, the Jaccard index threshold may be set to 70% and the preset ratio to 80%.
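The stability decision based on the Jaccard index can be sketched as follows. The function and parameter names are hypothetical; the defaults are the example values given above (70% and 80%), and "more than" is read as a strict comparison.

```python
def training_stable(jaccard_values, index_threshold=0.7, preset_ratio=0.8):
    """True when the share of tracked samples whose Jaccard index exceeds
    index_threshold is strictly greater than preset_ratio."""
    above = sum(1 for j in jaccard_values if j > index_threshold)
    return above / len(jaccard_values) > preset_ratio
```

Here `jaccard_values` holds one Jaccard index per sample of the second sample set, and is assumed to be non-empty.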

The third implementation determines the training status of the representation learning model based on the division ratio and the Jaccard index jointly. In this case, it is possible to judge separately whether the division ratio and the Jaccard index each satisfy their corresponding criteria and determine the training status from the results; it is also possible to weight the division ratio and the Jaccard index and determine the training status of the representation learning model from the weighted result.

It should be noted that the division ratio and the Jaccard index are on different scales. When weighting them, the division ratio and the Jaccard index may first be normalized, and the weighting is then performed on the normalized indexes. The respective weights of the division ratio and the Jaccard index can be set according to actual needs.

It should also be noted that the above three implementations are described with the division ratio as the first index and the Jaccard index as the second index. In other possible implementations of the embodiments of the present application, when the first index and the second index are other parameters, the training status of the representation learning model can be determined with reference to at least one of those parameters.
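The normalize-then-weight idea can be sketched as below. The source does not specify the normalization or the weights, so min-max scaling, the bounds `(1.0, 5.0)` for the division ratio and the equal weights are all assumptions made only for this example.

```python
def min_max_normalize(value, lower, upper):
    """Scale value into [0, 1] using the given bounds (values outside are clipped)."""
    if upper == lower:
        return 0.0
    clipped = min(max(value, lower), upper)
    return (clipped - lower) / (upper - lower)


def weighted_score(division_ratio, mean_jaccard,
                   ratio_bounds=(1.0, 5.0), weights=(0.5, 0.5)):
    """Combine the two indexes into one score in [0, 1].

    ratio_bounds and weights are hypothetical settings, to be chosen per task.
    """
    ratio_norm = min_max_normalize(division_ratio, *ratio_bounds)
    jaccard_norm = min_max_normalize(mean_jaccard, 0.0, 1.0)
    w_ratio, w_jaccard = weights
    return (w_ratio * ratio_norm + w_jaccard * jaccard_norm) / (w_ratio + w_jaccard)
```

The combined score can then be compared against a task-specific threshold to decide the training status.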

As can be seen from the above, the embodiments of the present application provide an evaluation method for a representation learning model. For a representation learning model trained in an unsupervised manner, the method proposes two quantitative indexes, at least one of which can be used to measure the training state of the model. Specifically, for the first index, part of the data in the training sample set is labeled; during training, the intra-class distance and the inter-class distance of the samples of each class are calculated based on the labeled data, and a quantitative index measuring how close samples of the same class are and how far apart samples of different classes are is generated from these distances. The closer the samples of the same class and the farther apart the samples of different classes, the better the classification capability of the model, so the training status of the representation learning model can be determined based on the first index. For the second index, some samples are selected from the training sample set; during training, the similarity between their representation vectors and the representation vectors of the samples in the entire training sample set is calculated periodically, the similarity vector corresponding to each representation vector in each period is determined, and a quantitative index measuring the stability of the sample representations is generated. The more stable this index, the more stable the model, so the training status of the representation learning model can also be determined based on the second index.

With the above quantitative indexes, the user can grasp the status of model training in a timely manner, that is, whether the model is gradually improving and whether training can be ended, without depending on a subsequent machine learning task. This greatly speeds up the training iteration of the entire representation learning process and saves the time otherwise spent on training, tuning and evaluating an additional downstream model.

The present application also provides a comparison between the time consumed by unsupervised representation learning using the evaluation method of the present application and the time consumed by conventional unsupervised representation learning. As shown in Fig. 3, the conventional approach takes 7 days, comprising the time spent on unsupervised representation learning (2 days) and the time spent determining the status of that learning based on a subsequent machine learning task (5 days). The present application can instead determine the learning status based on the performance evaluation index during the unsupervised representation learning itself, without relying on subsequent machine learning, which directly saves the time of the subsequent machine learning task and accelerates training.

Considering that the above performance evaluation index varies during training, the terminal may also draw and display a training effect curve of the representation learning model according to the performance evaluation indexes generated in different iteration rounds. The training effect curve represents how the performance of the representation learning model changes over the training process, so that the process and effect of model training are visualized and the user can see intuitively whether training is effective.

Further, for representation learning models configured with different hyperparameters, the terminal may also generate the performance evaluation indexes of different iteration rounds, and draw and display a comparison effect diagram of the representation learning models. The comparison effect diagram shows the respective training effect curves of the representation learning models based on different hyperparameters, which helps the user decide on model selection and the direction of parameter tuning. On the one hand, this avoids the error-prone subjectivity of relying on historical experience; on the other hand, it makes automatic hyperparameter tuning possible.

For ease of understanding, the present application also provides a specific example of the comparison effect diagram. As shown in Fig. 4, it shows the training effect curves of the representation learning models corresponding to 5 different hyperparameter combinations, namely curves 41 to 45. The division ratios of the representation learning models corresponding to curve 41 and curve 42 converge to a lower value, whereas the division ratios of the representation learning models corresponding to curve 43, curve 44 and curve 45 converge to a higher value, and the representation learning model corresponding to curve 43 reaches a stable state first.
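A comparison such as the one read off Fig. 4 could also be automated. The sketch below is only an illustration under assumptions: each curve is a list of division ratios per iteration round, "stable" is taken to mean staying within a tolerance of the final value, and configurations are ranked by final value first and by how early they stabilize second. None of these criteria or names come from the application.

```python
def first_stable_round(curve, tolerance=0.05):
    """Index of the first round from which the curve stays within tolerance of its final value."""
    final = curve[-1]
    stable_from = len(curve) - 1
    # Walk backwards until the curve leaves the tolerance band around the final value.
    for i in range(len(curve) - 1, -1, -1):
        if abs(curve[i] - final) <= tolerance:
            stable_from = i
        else:
            break
    return stable_from


def rank_configurations(curves, tolerance=0.05):
    """Rank hyperparameter configurations: higher final value first, earlier stability breaks ties."""
    summary = [
        (name, curve[-1], first_stable_round(curve, tolerance))
        for name, curve in curves.items()
    ]
    return sorted(summary, key=lambda item: (-item[1], item[2]))
```

Such a ranking is one possible starting point for the automatic hyperparameter tuning mentioned above.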

The evaluation method for a representation learning model provided by the present application can be applied to a variety of unsupervised representation learning tasks, such as t-distributed stochastic neighbor embedding (t-SNE), manifold learning and word vector representation learning, and is suitable for various loss functions used for representation learning, including but not limited to the noise-contrastive estimation loss function (NCE loss).

To make the technical solution of the present application clearer and easier to understand, the evaluation method for a representation learning model of the present application is introduced below with reference to the concrete scenario of "vectorized representation of the domain names a user has visited within one month".

Refer to the application scenario diagram of the evaluation method shown in Fig. 5A and the flowchart of the evaluation method shown in Fig. 5B. The application scenario includes a terminal 102. The terminal 102 obtains its access records within one month from the local cache, extracts domain names from the access records, and generates a training sample set with the domain names as samples. Based on this training sample set, it trains the representation learning model in an unsupervised manner to realize domain name vectorization.

During training, the representation learning model is also evaluated through the following steps:

Step 1: Part of the domain names are manually selected from the training sample set to form a first subset, and the samples in the first subset are labeled to generate the first sample subset.

Step 2: Part of the domain names are selected from the training sample set to form a second subset, which is used as the second sample subset.

The first subset includes at least two classes of domain names that differ considerably, for example Concern Mafia website domain names and long-video-on-demand website domain names. In practice, about 50 samples are taken per class for the subsequent calculation of the division ratio. For the second subset, the number of domain names can be set according to actual needs. In this embodiment, the second subset includes 9 domain names, which are added to a concern list so that the second index, which characterizes model stability, can subsequently be calculated based on the domain names in the concern list.

It should be noted that Step 1 and Step 2 may be executed in parallel or sequentially in a set order; this embodiment does not limit this.

Step 3: The representation learning model for domain names is trained based on the training sample set.

In this embodiment, the representation learning model for domain names may be a word2vec model, which takes a domain name as input and outputs the representation vector corresponding to that domain name. The domain names in the training sample set are input into the word2vec model, which extracts the corresponding features based on its learning algorithm and converts them into vectors, thereby generating the representation vectors for the domain names.

Step 4: During model training, the division ratio corresponding to each iteration round is calculated synchronously based on the representation vector and label of each sample in the first sample subset.

For each iteration round, the representation vectors that the word2vec model has learned for the samples in the first sample subset are obtained. Based on these representation vectors and the labels of the samples, the inter-class distance and the intra-class distance of the samples of each class can be determined, and the division ratio is obtained as the ratio of the inter-class distance to the intra-class distance. In this way, the division ratio corresponding to each iteration round can be obtained.
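The calculation in Step 4 can be sketched as follows. The exact definitions of the two distances are not restated in this passage, so the sketch assumes that the intra-class distance is the mean pairwise Euclidean distance between samples with the same label and that the inter-class distance is the mean pairwise Euclidean distance between samples with different labels. The function names are illustrative.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def division_ratio(vectors, labels):
    """Ratio of mean inter-class distance to mean intra-class distance (assumed definitions)."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        distance = euclidean(vectors[i], vectors[j])
        (intra if labels[i] == labels[j] else inter).append(distance)
    if not intra or not inter:
        raise ValueError("need at least two classes and two samples in some class")
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    if mean_intra == 0:
        return math.inf  # all same-class samples coincide
    return mean_inter / mean_intra
```

Calling `division_ratio` once per iteration round on the current representation vectors of the labeled samples yields the per-round values from which a training effect curve can be built.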

Step 5: During model training, cosine similarity is calculated periodically between the domain names in the concern list and the full training samples (that is, all samples in the training sample set) to obtain the similar domain names corresponding to each domain name in the concern list, which form similar sample sets. The Jaccard index corresponding to each domain name in the second sample subset is then generated from these similar sample sets.

In a specific implementation, the number of similar domain names selected can be set as required. As an example, this embodiment selects the 8 domain names most similar to each domain name in the concern list to form its similar sample set.
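Building the similar sample set in Step 5 can be sketched as below. The dictionary of representation vectors keyed by domain name and the function names are assumptions for illustration; `k=8` follows the example in this embodiment.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similar_sample_set(target, embeddings, k=8):
    """Return the k domain names most similar to `target` by cosine similarity.

    embeddings maps every domain name in the training sample set to its
    current representation vector (hypothetical layout).
    """
    scores = [
        (cosine_similarity(embeddings[target], vector), name)
        for name, vector in embeddings.items()
        if name != target
    ]
    scores.sort(reverse=True)  # highest similarity first
    return {name for _, name in scores[:k]}
```

Running this periodically for each domain name in the concern list gives one similar sample set per period, and the sets of consecutive periods can then be compared with the Jaccard index.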

Step 4 and Step 5 may be executed in parallel or sequentially in a set order.

Step 6: Judge whether the division ratio and the stability index both satisfy their preset conditions; if so, execute Step 7; if not, return to Step 3.

Step 7: Determine the training status of the representation learning model.

Specifically, for the division ratio, it can be judged whether it has converged and whether the converged value is greater than a first reference threshold. If so, the model represents both same-class and different-class samples well, and the model tends to a stable state.

For the Jaccard index, it can be judged whether the proportion of samples in the second sample subset whose Jaccard index is greater than the Jaccard index threshold exceeds the preset ratio; if so, the training of the model tends to be stable. In some cases, the preset ratio can be set to 100%, that is, it is judged whether the Jaccard indexes of all samples in the second sample subset are greater than the Jaccard index threshold. If every sample satisfies $J > p$, where $J$ is the Jaccard index of the sample and $p$ is the Jaccard index threshold, the training result of the word2vec word vectorization representation learning is considered to be stable.

When the division ratio and the Jaccard index both satisfy their conditions, it is determined that the word2vec model tends to a stable state. When at least one of the division ratio and the Jaccard index does not satisfy its condition, the model still has room for optimization and can be further optimized and adjusted.

The above are some specific implementations of the evaluation method for a representation learning model provided by the embodiments of the present application. On this basis, the embodiments of the present application also provide a corresponding device, which is introduced below from the perspective of functional modularization.
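The joint decision of Step 6 and Step 7 can be sketched as follows. The source does not define how convergence is detected, so the sketch assumes the division ratio has converged when its last few values vary by no more than a tolerance; the window size, the tolerance and all names are hypothetical.

```python
def has_converged(history, window=5, tolerance=0.01):
    """Assumed convergence test: the last `window` values span at most `tolerance`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tolerance


def assess_training(ratio_history, stable_share, first_reference_threshold,
                    preset_ratio=1.0):
    """Return "stable" only if both indexes satisfy their conditions.

    ratio_history: division ratio per iteration round.
    stable_share: share of monitored samples whose Jaccard index exceeds the
                  Jaccard index threshold in the current period.
    """
    ratio_ok = (has_converged(ratio_history)
                and ratio_history[-1] > first_reference_threshold)
    # >= keeps a preset ratio of 100% satisfiable (assumption).
    jaccard_ok = stable_share >= preset_ratio
    return "stable" if ratio_ok and jaccard_ok else "needs optimization"
```

When the result is "needs optimization", training returns to Step 3 and the indexes are recomputed in later rounds.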

Referring to the evaluation device for a representation learning model shown in Fig. 6, the device 600 includes:

an index generation module 610, configured to generate, for a representation learning model trained in an unsupervised manner, a performance evaluation index of the representation learning model, the performance evaluation index including at least one of a first index and a second index;

an evaluation module 620, configured to determine the training status of the representation learning model according to the performance evaluation index.

Optionally, referring to Fig. 7, which is a schematic structural diagram of the evaluation device for a representation learning model provided by the embodiments of the present application, on the basis of the structure shown in Fig. 6, the index generation module 610 includes:

a first acquisition submodule 611, configured to obtain the representation vectors learned by the representation learning model for the samples in the first sample subset during training;

a generation submodule 612, configured to determine, according to the representation vectors and labels of the samples in the first sample subset, the inter-class distance and the intra-class distance of the samples of each class, and to generate the division ratio according to the ratio of the inter-class distance to the intra-class distance;

a first determination submodule 613, configured to use the division ratio as the first index.

Optionally, the performance evaluation index includes the first index;

The evaluation module 620 is specifically configured to:

Optionally, referring to Fig. 8, which is a schematic structural diagram of the evaluation device for a representation learning model provided by the embodiments of the present application, on the basis of the structure shown in Fig. 6, the index generation module 610 includes:

a second acquisition submodule 614, configured to obtain the representation vectors learned by the representation learning model for the samples in the training sample set in multiple iteration rounds during training;

an adding submodule 615, configured to select, according to the representation vectors of the samples in the training sample set learned in each iteration round, a preset number of most similar samples for each sample in the second sample subset, and to add the preset number of similar samples selected for a sample to the similar sample set corresponding to that sample in the second sample subset and to that iteration round;

a second determination submodule 616, configured to generate, from the multiple similar sample sets corresponding to each sample in the second sample subset, the Jaccard index corresponding to each sample in the second sample subset, and to use the Jaccard index as the second index.

Optionally, the performance evaluation index includes the second index;

The evaluation module 620 is then specifically configured to:

Optionally, referring to Fig. 9, which is a schematic structural diagram of the evaluation device for a representation learning model provided by the embodiments of the present application, on the basis of the structure shown in Fig. 6, the performance evaluation index includes the first index and the second index;

The evaluation module 620 includes:

a weighting submodule 621, configured to weight the first index and the second index;

an evaluation submodule 622, configured to determine the training status of the representation learning model according to the weighted result.

It should be noted that the structure of Fig. 9 may also include the above weighting submodule and evaluation submodule on the basis of the structure shown in Fig. 7 or Fig. 8.

Optionally, referring to Fig. 10, which is a schematic structural diagram of the evaluation device for a representation learning model provided by the embodiments of the present application, on the basis of the structure shown in Fig. 6, the performance evaluation index includes the first index and the second index;

The device 600 further includes:

a first display module 630, configured to draw and display a training effect curve of the representation learning model according to the performance evaluation indexes generated by the representation learning model in different iteration rounds, the training effect curve representing how the performance of the representation learning model changes over the training process.

Of course, the structure of Fig. 10 may also further include the first display module on the basis of any of Fig. 6 to Fig. 9.

Optionally, referring to Fig. 11, which is a schematic structural diagram of the evaluation device for a representation learning model provided by the embodiments of the present application, on the basis of the structure shown in Fig. 6, the performance evaluation index includes the first index and the second index;

The index generation module 610 is specifically configured to:

The device 600 further includes:

a second display module 640, configured to draw and display a comparison effect diagram of the representation learning models, the comparison effect diagram showing the respective training effect curves of the representation learning models based on different hyperparameters, each training effect curve representing how the performance of the corresponding representation learning model changes over the training process.

The structure of Fig. 11 may also further include the second display module on the basis of any of Fig. 6 to Fig. 9.

The embodiments of the present application also provide a device, which may specifically be a terminal. As shown in Fig. 12, for ease of description, only the parts relevant to the embodiments of the present application are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, an in-vehicle computer and the like. The following takes a mobile phone as an example of the terminal:

Fig. 12 shows a block diagram of part of the structure of a mobile phone related to the terminal provided by the embodiments of the present application. Referring to Fig. 12, the mobile phone includes components such as a radio frequency (RF) circuit 1210, a memory 1220, an input unit 1230, a display unit 1240, a sensor 1250, an audio circuit 1260, a wireless fidelity (WiFi) module 1270, a processor 1280 and a power supply 1290. Those skilled in the art will understand that the mobile phone structure shown in Fig. 12 does not constitute a limitation on the mobile phone, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.

The components of the mobile phone are described in detail below with reference to Fig. 12:

The RF circuit 1210 can be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 1280 for processing; in addition, it sends uplink data to the base station. In general, the RF circuit 1210 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer and the like. In addition, the RF circuit 1210 can also communicate with networks and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS) and the like.

The memory 1220 can be used to store software programs and modules. The processor 1280 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1220. The memory 1220 may mainly include a program storage area and a data storage area. The program storage area can store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function) and the like; the data storage area can store data created through use of the mobile phone (such as audio data or a phone book) and the like. In addition, the memory 1220 may include high-speed random access memory, and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.

The input unit 1230 can be used to receive input numeric or character information and to generate key signal input related to the user settings and function control of the mobile phone. Specifically, the input unit 1230 may include a touch panel 1231 and other input devices 1232. The touch panel 1231, also called a touch screen, collects the user's touch operations on or near it (such as operations performed on or near the touch panel 1231 with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch panel 1231 may include two parts, a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates and sends them to the processor 1280, and can receive and execute commands sent by the processor 1280. In addition, the touch panel 1231 can be implemented in multiple types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel 1231, the input unit 1230 may also include other input devices 1232. Specifically, the other input devices 1232 may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick and the like.

The display unit 1240 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 1240 may include a display panel 1241, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) or the like. Further, the touch panel 1231 can cover the display panel 1241. After detecting a touch operation on or near it, the touch panel 1231 transmits the operation to the processor 1280 to determine the type of the touch event, and the processor 1280 then provides the corresponding visual output on the display panel 1241 according to the type of the touch event. Although in Fig. 12 the touch panel 1231 and the display panel 1241 are two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1231 and the display panel 1241 can be integrated to implement the input and output functions of the mobile phone.

The mobile phone may also include at least one sensor 1250, such as a light sensor, a motion sensor and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1241 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1241 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the posture of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors with which the mobile phone may also be equipped, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described here.

The audio circuit 1260, a loudspeaker 1261 and a microphone 1262 can provide an audio interface between the user and the mobile phone. The audio circuit 1260 can transmit the electrical signal converted from received audio data to the loudspeaker 1261, which converts it into a sound signal for output. Conversely, the microphone 1262 converts the collected sound signal into an electrical signal, which the audio circuit 1260 receives and converts into audio data; the audio data is then output to the processor 1280 for processing and sent through the RF circuit 1210 to, for example, another mobile phone, or output to the memory 1220 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 1270, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media and so on, providing the user with wireless broadband Internet access. Although Fig. 12 shows the WiFi module 1270, it can be understood that the module is not an essential part of the mobile phone and can be omitted as needed without changing the essence of the invention.

The processor 1280 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1220 and calling the data stored in the memory 1220, thereby monitoring the mobile phone as a whole. Optionally, the processor 1280 may include one or more processing units. Preferably, the processor 1280 can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1280.

The mobile phone further includes a power supply 1290 (such as a battery) that supplies power to the components. Preferably, the power supply can be logically connected to the processor 1280 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system.

Although not shown, the mobile phone may also include a camera, a Bluetooth module and the like, which are not described here.

In the embodiments of the present application, the processor 1280 included in the terminal also has the following functions:

Optionally, the processor 1280 is also used to execute the steps of any implementation of the evaluation method for a representation learning model provided by the embodiments of the present application.

The embodiments of the present application also provide a computer-readable storage medium for storing a computer program, the computer program being used to execute any implementation of the evaluation method for a representation learning model described in the foregoing embodiments.

The embodiments of the present application also provide a computer program product including instructions which, when run on a computer, cause the computer to execute any implementation of the evaluation method for a representation learning model described in the foregoing embodiments.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

It, can also be in addition, each functional unit in each embodiment of the application can integrate in one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of software functional units.

If the integrated unit is realized in the form of SFU software functional unit and sells or use as independent product When, it can store in a computer readable storage medium.Based on this understanding, the technical solution of the application is substantially The all or part of the part that contributes to existing technology or the technical solution can be in the form of software products in other words It embodies, which is stored in a storage medium, including some instructions are used so that a computer Equipment (can be personal computer, server or the network equipment etc.) executes the complete of each embodiment the method for the application Portion or part steps.And storage medium above-mentioned includes: USB flash disk, mobile hard disk, read-only memory (full name in English: Read-Only Memory, english abbreviation: ROM), random access memory (full name in English: Random Access Memory, english abbreviation: RAM), the various media that can store program code such as magnetic or disk.

The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. An evaluation method for a representation learning model, characterized by comprising:

for a representation learning model trained in an unsupervised manner, generating a performance evaluation index of the representation learning model, the performance evaluation index comprising at least one of a first index and a second index;

wherein the first index is a quantitative index, generated based on the representation vectors of the samples in a first sample subset learned by the representation learning model during training, for measuring how close samples of the same class are and how far apart samples of different classes are; the first sample subset is generated by labeling a first subset of the training sample set of the representation learning model, the first subset comprising samples of different classes;

the second index is a quantitative index, generated based on the similarity vectors corresponding to the representation vectors of the samples in a second sample subset learned by the representation learning model during training, for measuring the stability of the sample representations; the second sample subset is a second subset of the training sample set;

2. The method according to claim 1, characterized in that generating the first index comprises:

obtaining the representation vectors learned by the representation learning model for the samples of the first sample subset during training;

determining, according to the representation vectors and labels of the samples of the first sample subset, the inter-class distance and the intra-class distance of the samples of each class, and generating a division ratio according to the ratio of the inter-class distance to the intra-class distance;

using the division ratio as the first index.
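As an illustration only, the division ratio could be computed along the following lines. The claim does not specify how the inter-class and intra-class distances are measured, so this sketch assumes Euclidean distance, takes the intra-class distance as the mean distance from each sample to its own class centroid, and takes the inter-class distance as the mean distance between pairs of class centroids. The function names `centroid` and `division_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def division_ratio(vectors, labels):
    """Ratio of inter-class distance to intra-class distance.

    Assumption: both distances are centroid-based Euclidean distances;
    the source text does not fix a particular definition.
    """
    by_class = defaultdict(list)
    for vector, label in zip(vectors, labels):
        by_class[label].append(vector)
    if len(by_class) < 2:
        raise ValueError("at least two classes are required")

    centroids = {label: centroid(vs) for label, vs in by_class.items()}

    # Intra-class distance: mean distance of every sample to its class centroid.
    intra = sum(
        math.dist(vector, centroids[label])
        for vector, label in zip(vectors, labels)
    ) / len(vectors)
    if intra == 0:
        raise ValueError("intra-class distance is zero; ratio is undefined")

    # Inter-class distance: mean distance over all pairs of class centroids.
    keys = list(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(math.dist(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)

    return inter / intra
```

Under these assumptions a larger ratio means that samples of the same class sit closer together relative to the spacing between classes.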

3. The method according to claim 2, characterized in that the performance evaluation index comprises the first index;

determining the training status of the representation learning model according to the performance evaluation index then comprises:

when the multiple division ratios determined over multiple iteration rounds are in a converged state within a preset time period and the converged value is greater than a first reference threshold, determining that the training of the representation learning model has stabilized.
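A minimal sketch of such a check follows. The claim does not define how convergence is detected, so this example makes two assumptions: the "preset time period" is modeled as the last `window` iteration rounds, and the ratios count as converged when their spread within that window is at most `tolerance`. The function name and all parameter names are hypothetical.

```python
def training_stabilized(ratios, window, tolerance, first_reference_threshold):
    """Return True if the division ratio has converged above the threshold.

    ratios: division ratios, one per iteration round, oldest first.
    window: number of most recent rounds treated as the preset period (assumed).
    tolerance: maximum spread within the window to count as converged (assumed).
    """
    if len(ratios) < window:
        return False  # not enough rounds observed yet
    recent = ratios[-window:]
    converged = max(recent) - min(recent) <= tolerance
    converged_value = sum(recent) / window  # mean of the window as the converged value
    return converged and converged_value > first_reference_threshold
```

For example, `training_stabilized([1.0, 2.0, 2.9, 3.0, 3.01], window=3, tolerance=0.2, first_reference_threshold=2.5)` looks only at the last three ratios.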

4. The method according to claim 1, characterized in that generating the second index comprises:

obtaining the representation vectors learned by the representation learning model for the samples of the training sample set in each of multiple iteration rounds during training;

according to the representation vectors of the samples of the training sample set learned in each iteration round, selecting, for each sample in the second sample subset, a preset number of most similar samples, and adding the preset number of similar samples selected for the sample to the similar sample set corresponding to that sample in the second sample subset and to that iteration round;

for the multiple similar sample sets corresponding to each sample in the second sample subset, generating a Jaccard index corresponding to each sample in the second sample subset, and using the Jaccard index as the second index.
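The steps above can be sketched as follows. Two details are assumptions rather than statements of the claimed method: similarity is measured with cosine similarity, and the Jaccard index over several similar sample sets is taken as the size of their common intersection divided by the size of their union. All function names are hypothetical, and vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, query_id, k):
    """IDs of the k samples most similar to query_id in one iteration round."""
    scored = [
        (cosine(embeddings[query_id], vector), other_id)
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return {other_id for _, other_id in scored[:k]}


def jaccard_index(sets):
    """|intersection| / |union| over a non-empty list of sets."""
    union = set().union(*sets)
    if not union:
        return 1.0  # all sets empty: treat as identical
    return len(set.intersection(*sets)) / len(union)


def stability_indexes(rounds, second_subset, k):
    """Per-sample Jaccard index across iteration rounds.

    rounds: list of {sample_id: vector} dicts, one per iteration round.
    second_subset: sample IDs whose neighbourhood stability is measured.
    """
    return {
        sample_id: jaccard_index(
            [top_k_similar(embeddings, sample_id, k) for embeddings in rounds]
        )
        for sample_id in second_subset
    }
```

A value near 1 indicates that a sample keeps the same nearest neighbours from round to round, while a value near 0 indicates that its neighbourhood is still changing.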

5. The method according to claim 4, characterized in that the performance evaluation index comprises the second index;

when the proportion of samples in the second sample subset whose Jaccard index is greater than a Jaccard index threshold exceeds a preset proportion, determining that the training of the representation learning model has stabilized.
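This decision rule can be illustrated with a short sketch. It assumes the per-sample Jaccard indexes are already available as a mapping from sample ID to value; the function name and parameter names are hypothetical, and the threshold values in any call are chosen by the user rather than taken from the source.

```python
def representation_stabilized(jaccard_by_sample, jaccard_threshold, preset_proportion):
    """Return True if enough samples have a Jaccard index above the threshold.

    jaccard_by_sample: {sample_id: Jaccard index} for the second sample subset.
    jaccard_threshold: per-sample threshold (strictly greater than counts).
    preset_proportion: required share of samples above the threshold, in [0, 1].
    """
    if not jaccard_by_sample:
        return False  # nothing to evaluate
    above = sum(1 for value in jaccard_by_sample.values() if value > jaccard_threshold)
    return above / len(jaccard_by_sample) > preset_proportion
```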

6. The method according to any one of claims 1 to 5, characterized in that the performance evaluation index comprises the first index and the second index;

performing weighting on the first index and the second index;

determining the training status of the representation learning model according to the weighting result.
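One possible reading of the weighting step is a simple weighted sum, sketched below. The claim does not state the weights, whether the indexes are normalized first, or how the per-sample second index is aggregated; this example assumes the second index is the mean of the per-sample Jaccard indexes and that the weights are supplied by the user. All names are hypothetical.

```python
def weighted_score(division_ratio_value, jaccard_by_sample, first_weight, second_weight):
    """Weighted sum of the first index and an aggregated second index.

    Assumption: the second index is aggregated as the mean per-sample
    Jaccard index; the source does not specify the aggregation.
    """
    if not jaccard_by_sample:
        raise ValueError("the second sample subset must not be empty")
    mean_jaccard = sum(jaccard_by_sample.values()) / len(jaccard_by_sample)
    return first_weight * division_ratio_value + second_weight * mean_jaccard
```

Because the division ratio is unbounded while the Jaccard index lies in [0, 1], in practice the weights would likely need to account for the difference in scale.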

7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

drawing and displaying a training effect curve of the representation learning model according to the performance evaluation indexes generated in different iteration rounds of the representation learning model, the training effect curve indicating how the performance of the representation learning model changes over the course of training.

8. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

for representation learning models configured with different hyperparameters, generating the performance evaluation indexes of different iteration rounds;

drawing and displaying a comparison chart of the representation learning models, the comparison chart showing the respective training effect curves of the representation learning models based on the different hyperparameters, each training effect curve indicating how the performance of the corresponding representation learning model changes over the course of training.

9. The method according to any one of claims 1 to 5, characterized in that the representation learning model is a word vector representation learning model.

10. An evaluation device for a representation learning model, characterized by comprising:

an index generation module, configured to generate, for a representation learning model trained in an unsupervised manner, a performance evaluation index of the representation learning model, the performance evaluation index comprising at least one of a first index and a second index;

11. The device according to claim 10, characterized in that the index generation module is specifically configured to:

use the division ratio as the first index.

12. The device according to claim 10, characterized in that the index generation module is specifically configured to:

according to the representation vectors of the samples of the training sample set learned in each iteration round, select, for each sample in the second sample subset, a preset number of most similar samples, and add the preset number of similar samples selected for the sample to the similar sample set corresponding to that sample in the second sample subset and to that iteration round; and, for the multiple similar sample sets corresponding to each sample in the second sample subset, generate a Jaccard index corresponding to each sample in the second sample subset, and use the Jaccard index as the second index.

13. The device according to any one of claims 10 to 12, characterized in that the performance evaluation index comprises the first index and the second index;

the evaluation module is then specifically configured to:

perform weighting on the first index and the second index;

14. A terminal device, characterized in that the terminal device comprises a processor and a memory:

the memory is configured to store a computer program;

the processor is configured to execute, according to the computer program, the method according to any one of claims 1 to 9.

15. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a computer program, the computer program being used to execute the method according to any one of claims 1 to 9.