CN110287857A - Training method of feature point detection model - Google Patents

Training method of feature point detection model

Info

Publication number
CN110287857A
Authority
CN
China
Prior art keywords
convolution kernel
model
convolution
weighted value
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910540027.2A
Other languages
Chinese (zh)
Other versions
CN110287857B (en)
Inventor
齐子铭
李启东
李志阳
张伟
傅松林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd
Priority to CN201910540027.2A
Publication of CN110287857A
Application granted
Publication of CN110287857B
Legal status: Active
Anticipated expiration

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method for a feature point detection model, suitable for execution on a computing device. The method comprises: obtaining training images with annotation data, the annotation data being the feature point coordinates of a target object in each image; inputting the training images into a first model for processing, to obtain a trained first model; processing the network structure of the trained first model to obtain a second model; initializing the network parameters of the second model based on the network parameters of the trained first model, to obtain an initialized second model; and inputting the training images into the initialized second model for processing, so as to output predicted feature point coordinates of the target object in the training images, and training the initialized second model using the predicted feature point coordinates and the annotated feature point coordinates, the trained second model serving as the feature point detection model.

Description

Training method of feature point detection model
Technical field
The present invention relates to the field of deep learning, and more particularly to a training method for a feature point detection model, a computing device and a storage medium.
Background art
Feature point detection, such as cat face alignment or dog face alignment, is widely used in many real-world scenarios. For example, when photographing a pet, or when a person poses with a pet, detecting the position and contour points of the pet's face allows stickers or text to be added in real time, adding fun to the shot. Unlike human face alignment, however, pets are lively and active, so their postures and expressions vary widely, and there are many breeds, all of which increase the difficulty of feature point detection.
In addition, feature point detection on mobile devices has very high real-time requirements, and current feature point detection models face the following difficulties: the models are large and cannot be deployed on mobile devices; they compute inefficiently, so cat face points cannot be detected in real time; and directly training a small model fails to reach the required detection accuracy.
Therefore, a feature point detection model is needed that both reaches the detection accuracy of a large model and satisfies the application requirements of mobile devices.
Summary of the invention
To this end, the present invention provides a training method for a feature point detection model, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, a training method for a feature point detection model is provided, suitable for execution on a computing device. First, training images with annotation data are obtained, the annotation data being the feature point coordinates of the target object in each image. Then the training images are input into a first model for processing, to obtain a trained first model. Next, the network structure of the trained first model is processed to obtain a second model. Then, based on the network parameters of the trained first model, the network parameters of the second model are initialized, to obtain an initialized second model. Finally, the training images are input into the initialized second model for processing, so as to output predicted feature point coordinates of the target object in the training images, and the initialized second model is trained using the predicted feature point coordinates and the annotated feature point coordinates; the trained second model serves as the feature point detection model.
Optionally, in the above method, the parameters of the second model are updated based on the loss value between the annotated feature point coordinates and the predicted feature point coordinates, and training ends when the loss value satisfies a predetermined condition.
Optionally, in the above method, the loss value is calculated based on the following formula:
loss = \sum_{i=1}^{N} \left[ (x_i - targetx_i)^2 + (y_i - targety_i)^2 \right]
where N is the number of feature points, x_i and y_i are respectively the abscissa and ordinate of the i-th predicted feature point, and targetx_i and targety_i are respectively the abscissa and ordinate of the i-th annotated feature point.
Optionally, in the above method, the first model includes a plurality of convolution layers, each convolution layer including a plurality of convolution kernels. First, the convolution kernels may be deleted, to obtain the kernels after deletion. Then the kernels after deletion are split, to obtain the split kernels. Finally, the split kernels are grouped.
Optionally, in the above method, the number of weight values equal to 0 in the kernels of each convolution layer is counted; if the counted number exceeds a predetermined value, or the weight values of some row, column or diagonal of a kernel are all 0, a first number of kernels is deleted. If the number of remaining kernels after deletion is still greater than a preset value, a second number of kernels is deleted from the remaining kernels based on the sum of squares of their weight values, to obtain the preprocessed kernels.
Optionally, in the above method, the remaining kernels are sorted in descending order of the sum of squares of their weight values; the L2 distance between the kernel with the largest sum of squared weight values and each other remaining kernel is calculated; and the remaining kernels other than the kernel with the largest sum of squared weight values are deleted in descending order of L2 distance, until the number of remaining kernels after deletion is not greater than the preset value.
Optionally, in the above method, the L2 distance between the kernel with the largest sum of squared weight values and each other remaining kernel is calculated based on the following formula:
L2 = \sqrt{ \sum_{k=1}^{n} (c_k - i_k)^2 }
where n is the number of weights in a kernel, c_k is the k-th weight value of the kernel with the largest sum of squared weight values, and i_k is the k-th weight value of one of the other remaining kernels.
Optionally, in the above method, the kernels after deletion are split into first convolution kernels of a first channel number and second convolution kernels of a second channel number.
Optionally, in the above method, the weight values of the first convolution kernels after splitting are initialized from the weight values at the centers of the first channel number of kernels before splitting; the weight values of the second convolution kernels after splitting are initialized from the weight values of the second channel number of kernels before splitting.
Optionally, in the above method, the sum of squared weight values of each kernel after splitting is calculated and sorted in descending order; based on this ordering, the split kernels are grouped.
Optionally, in the above method, before the training images are input into the first model or the initialized second model for processing, the training images may be cropped based on the annotated feature point coordinates, and data augmentation may be applied to the cropped images.
Optionally, in the above method, a convex hull is computed based on the feature point coordinates of the target object, to obtain a minimum bounding rectangle; the minimum bounding rectangle is enlarged by a predetermined multiple, so that the image can be cropped based on the enlarged rectangle.
Optionally, in the above method, the data augmentation includes one or more of stretching, flipping, rotation, affine transformation, occlusion and color cast.
According to another aspect of the present invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method described above.
According to yet another aspect of the present invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform the method described above.
According to the solution of the present invention, for relatively complex application scenarios such as cat face and dog face detection, the network structure of a trained large model is first processed to obtain a small model with few parameters; the trained large model is then used to initialize the processed small model; and finally the processed small model is trained by fine-tuning to obtain the feature point detection model. The solution therefore both guarantees a high detection accuracy for the feature point detection model and satisfies the computation-speed and memory requirements of mobile devices.
Brief description of the drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects are indicative of the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a block diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a schematic flow chart of a training method 200 for a feature point detection model according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of a training image according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of cropping a training image according to an embodiment of the invention;
Fig. 5 shows a schematic diagram of splitting a 3*3 convolution kernel according to an embodiment of the invention;
Fig. 6 shows a schematic diagram of grouped convolution according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and so that the scope of the disclosure can be fully conveyed to those skilled in the art.
For feature point detection of a target object in an image, deep learning with a complex, deep neural network can reach high accuracy, but its application on mobile devices is limited. For relatively complex application scenarios such as cat face detection and dog face detection, this solution processes a high-accuracy large model to reduce its parameter count, and then uses the trained large model to train a small model with fewer parameters, so that the small model approaches the accuracy of the large model while meeting the real-time requirements of mobile devices.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114 and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In some embodiments, the computing device 100 is configured to perform the training method 200 for a feature point detection model, and the program data 124 contains instructions for performing the method 200.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or image input device) or other peripherals (for example, a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a way as to encode information in the signal. As non-limiting examples, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods.
The computing device 100 may be implemented as part of a small-form-factor portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player, a wireless web browsing device, a personal headset, an application-specific device, or a hybrid device including any of the above functions. Of course, the computing device 100 may also be implemented as a personal computer including desktop and notebook configurations, or as a server having the above configuration. Embodiments of the present invention are not limited in this regard.
Fig. 2 shows a schematic flow chart of a training method 200 for a feature point detection model according to an embodiment of the invention. The method 200 is suitable for execution on the computing device 100. The feature point detection model performs feature point detection on an image input to it and outputs the feature point coordinates of the target object. Before the method 200 is performed, the network structure of the feature point detection model may be constructed in advance. According to an embodiment of the invention, a high-accuracy feature point detection model, i.e. the first model, may be constructed first. Table 1 shows part of the network structure parameters of the first model according to an embodiment of the invention. As shown in Table 1, the input layer of the first model takes a 112*112 image, followed by a plurality of convolution layers, each of which includes convolution, batch normalization (BN) and activation, and finally a pooling layer and a fully connected layer. For example, in the layer name BM_Conv1_BN_Conv1_ReLU, Conv1 denotes the first convolution layer, BN denotes batch normalization, and ReLU denotes the activation. In BM_GlobalPooling, GlobalPooling denotes the global pooling layer, where global_pooling=Ave means the pooling layer averages the feature maps output by the preceding convolution layer. In BM_FullyConnect, FullyConnect is the fully connected layer, i.e. the output layer of the model, which outputs 56 values (the x and y coordinates of 28 feature points). In the parameters, kh and kw denote the height and width of the convolution kernel, padding is the padding value, and stride is the stride. group denotes grouped convolution, group=1 meaning no grouping. In addition, the activation shown in Table 1 uses the ReLU activation function, but any type of activation function such as ReLU, tanh, sigmoid or LeakyReLU may be used, without limitation here.
Table 1
The solution is described in detail below with reference to Fig. 2. As shown in Fig. 2, the method begins at step S210 by obtaining training images with annotation data, the annotation data being the feature point coordinates of the target object in each image. Fig. 3 shows a schematic diagram of a training image according to an embodiment of the invention. As noted above, the method 200 is mainly used to detect facial feature points of animals such as cats and dogs. Therefore, in some embodiments of the invention, the training images contain facial images of small animals, such as cat faces or dog faces. As shown in Fig. 3, the cat face image is annotated with the coordinates of 28 feature points characterizing the cat face, i.e. 28 abscissas and 28 ordinates. The number of annotated feature points in the figure is merely exemplary; a different number of feature points may be annotated according to the actual application.
In one implementation of the invention, the obtained training images may be cropped based on the annotated feature point coordinates. Fig. 4 shows a schematic diagram of cropping a training image according to an embodiment of the invention. First, the convex hull is computed from the coordinates of the 28 annotated feature points, and the minimum bounding rectangle is then computed, its four vertices being A0, B0, C0 and D0. This rectangle is expanded outward to obtain the cat face cropping rectangle, whose four vertices are A, B, C and D. Methods for computing a convex hull and a minimum bounding rectangle are common knowledge to those skilled in the art and are not described here.
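By way of a concrete, non-limiting illustration, this cropping step may be sketched with OpenCV as follows; the function name crop_face and the expansion multiple of 1.2 are assumptions for illustration only, since the patent leaves the predetermined multiple unspecified.

```python
import cv2
import numpy as np

def crop_face(image, points, scale=1.2):
    """Crop a face region from `image` given annotated feature points.

    points: (N, 2) array of annotated feature point coordinates.
    scale:  expansion multiple applied to the minimum bounding rectangle
            (the patent does not fix this value; 1.2 is an assumption).
    """
    hull = cv2.convexHull(points.astype(np.float32))    # convex hull of the points
    (cx, cy), (w, h), angle = cv2.minAreaRect(hull)     # minimum bounding rectangle
    # Expand the rectangle outward by `scale`, then take its axis-aligned bounds.
    box = cv2.boxPoints(((cx, cy), (w * scale, h * scale), angle))
    x0, y0 = np.floor(box.min(axis=0)).astype(int)
    x1, y1 = np.ceil(box.max(axis=0)).astype(int)
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, image.shape[1]), min(y1, image.shape[0])
    return image[y0:y1, x0:x1]
```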
Data augmentation may also be applied to the cropped images, adding various disturbances to the training images to improve the robustness of the model. The augmentation may include stretching, flipping, rotation, affine transformation, exposure change, occlusion, color cast and so on. This alleviates data imbalance and gives the model better robustness.
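A possible augmentation pipeline in torchvision terms is sketched below; the specific operations, magnitudes and probabilities are illustrative assumptions rather than values fixed by the patent, and in a real pipeline the geometric transforms would also have to be applied to the annotated coordinates.

```python
import torchvision.transforms as T

# An illustrative pipeline covering several of the listed disturbances.
# Geometric transforms (flip, affine) must be mirrored on the 28 annotated
# points in practice; that bookkeeping is omitted here for brevity.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                    # flipping
    T.RandomAffine(degrees=15, translate=(0.05, 0.05),
                   scale=(0.9, 1.1), shear=5),        # rotation / stretching / affine
    T.ColorJitter(brightness=0.3, contrast=0.3,
                  saturation=0.3, hue=0.05),          # exposure / color cast
    T.ToTensor(),
    T.RandomErasing(p=0.3),                           # occlusion
])
```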
Next, in step S220, the training images are input into the first model for processing, to obtain the trained first model.
First, the parameters of the first model may be initialized randomly, and the model then trained on the training images, continually adjusting the network parameters until the loss value becomes smaller and smaller. The model parameters may be saved at the end of training, so that the trained first model can be used for the subsequent training of the second model. According to one implementation of the invention, the parameters of the first model may be updated based on the loss value between the annotated feature point coordinates and the feature point coordinates predicted by the first model, and training ends when the loss value satisfies a predetermined condition. For example, the predetermined condition may be that the difference between two successive loss values computed with gradient descent is less than a predetermined threshold, or that the number of iterations reaches a predetermined count. In other embodiments, the obtained images may also be divided into a training set and a test set; for example, 10658 cat face pictures in total were collected and divided into a training set of 10530 and a test set of 128. Training with cross-validation between training and test sets avoids overfitting or underfitting of the network, but is not limited thereto.
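A minimal sketch of this training stage is given below, assuming PyTorch (the patent does not name a framework); the L2 loss and the two stopping conditions follow the description above, while the hyperparameter values are illustrative.

```python
import torch

def train(model, loader, lr=1e-3, tol=1e-6, max_iters=100000):
    """Train the first model from random initialization: L2 loss over the
    predicted (x, y) pairs, stopping when the change in loss falls below
    `tol` or `max_iters` is reached. Hyperparameters are assumptions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss, it = float("inf"), 0
    while it < max_iters:
        for images, targets in loader:   # targets: (B, 56) annotated coordinates
            preds = model(images)        # (B, 56) predicted coordinates
            loss = ((preds - targets) ** 2).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            if abs(prev_loss - loss.item()) < tol:
                return model             # successive losses differ by < threshold
            prev_loss, it = loss.item(), it + 1
    return model
```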
Then, in step S230, the network structure of the trained first model is processed to obtain the second model.
The network structure of the first model may be processed to reduce its parameters, thereby reducing the parameter count of the model. The first model includes a plurality of convolution layers, each of which includes a plurality of convolution kernels. According to one embodiment of the invention, the convolution kernels may first be deleted, to obtain the kernels after deletion. Then the kernels after deletion are split, to obtain the split kernels. Finally, the split kernels are grouped. Compared with the first model, the second model has fewer parameters and fewer convolution layers.
For kernel deletion, the following method may be used: count the number of weight values equal to 0 in the kernels of each convolution layer; when the counted number exceeds a predetermined value, or the weight values of some row, column or diagonal of a kernel are all 0, delete a first number of kernels. If the number of remaining kernels is still greater than a preset value, delete a second number of kernels from the remaining kernels based on the sum of squares of their weight values, to obtain the preprocessed kernels. Specifically, the remaining kernels may be sorted in descending order of the sum of squares of their weight values; the L2 distance between the kernel with the largest sum of squared weight values and each other remaining kernel is then calculated; finally, the remaining kernels other than the kernel with the largest sum of squared weight values are deleted in descending order of L2 distance, until the number of remaining kernels after deletion is not greater than the preset value.
Take a 3*3 kernel as an example, reducing N channels to M channels. If the weight values of some row, column or diagonal of a 3x3 kernel are all 0, that kernel is deleted; likewise, if more than 7 of the 9 weight values of a 3*3 kernel are 0, that kernel is deleted. If the number of remaining kernel channels after this step is still greater than M, screen further: compute the sum of squared weight values of each remaining 3x3 kernel, choose the kernel with the largest sum, denote it C, and compute the L2 norm (L2 distance) between C and each of the other remaining kernels (say X of them), giving X L2 distances. Sort these X distances in ascending order and select kernels starting from the smallest distance, until the total number of selected kernels reaches M.
The L2 distance may be calculated based on the following formula:
L2 = \sqrt{ \sum_{k=1}^{n} (c_k - i_k)^2 }
where n is the number of weights in a kernel, c_k is the k-th weight value of the kernel with the largest sum of squared weight values, and i_k is the k-th weight value of one of the other remaining kernels.
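The two-stage deletion rule may be sketched as follows; the function name prune_kernels and the per-kernel array layout are illustrative assumptions, and the threshold of 7 zeros follows the 3x3 example above.

```python
import numpy as np

def prune_kernels(kernels, M, zero_thresh=7):
    """Prune a layer's kernels from N channels down to M, in the two stages
    described above. `kernels` has shape (N, kh, kw)."""
    keep = []
    for k in kernels:
        zeros = np.sum(k == 0)
        line_zero = (np.all(k == 0, axis=0).any() or         # some column all 0
                     np.all(k == 0, axis=1).any() or         # some row all 0
                     np.all(np.diag(k) == 0) or              # main diagonal all 0
                     np.all(np.diag(np.fliplr(k)) == 0))     # anti-diagonal all 0
        if zeros <= zero_thresh and not line_zero:           # stage 1: drop degenerate kernels
            keep.append(k)
    if len(keep) > M:                                        # stage 2: L2-distance screening
        sq = [np.sum(k ** 2) for k in keep]
        c = keep[int(np.argmax(sq))]                         # kernel C: largest squared sum
        dist = [np.sqrt(np.sum((c - k) ** 2)) for k in keep]
        order = np.argsort(dist)                             # ascending L2 distance to C
        keep = [keep[i] for i in order[:M]]                  # keep the M closest (C itself first)
    return np.stack(keep)
```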
For example, for an N-channel 5x5 convolution in the first model, 3N/4 of the channels may be deleted, compressing the channels to N/4; that is, the number of kernels is reduced to 1/4 of the original. For an N-channel 1x1 convolution in the first model (except the last convolution layer), N/2 of the channels are deleted, compressing the channels to N/2; that is, the number of kernels is reduced to 1/2 of the original.
Since a large convolution kernel often performs no better than small kernels, the large kernel may be decomposed, which deepens the model while reducing its parameters and improving its classification performance. The preprocessed kernels may then be split: the kernels after deletion are split into first convolution kernels of a first channel number and second convolution kernels of a second channel number.
Fig. 5 shows a schematic diagram of splitting a 3*3 convolution kernel according to an embodiment of the invention. As shown in Fig. 5, the N-channel 3x3 kernels in the first model are split into N/4-channel 1x1 convolutions followed by N/2-channel 3x3 convolutions. Further, the N/2-channel 3x3 kernels may be decomposed again, separating the convolutions in the x and y directions to form two convolution layers, a 1x3 kernel and a 3x1 kernel respectively. The 3x3 kernel is only an example; the splitting method can be generalized to kernels of other sizes. For example, one 5*5 kernel may be replaced by two consecutive 3*3 kernels, each followed by an activation; one 5*5 kernel may also be replaced by two consecutive 1*5 and 5*1 kernels.
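In PyTorch terms, such a split replaces a single convolution module with a small sequential stack; the sketch below follows the N/4 and N/2 channel numbers above, while the module structure itself is an illustrative assumption rather than the patent's exact network definition.

```python
import torch.nn as nn

def split_3x3(in_ch, n):
    """Replace an n-channel 3x3 convolution with an n/4-channel 1x1
    convolution followed by an n/2-channel pair of 1x3 and 3x1
    convolutions (the separated x- and y-direction convolutions)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, n // 4, kernel_size=1),                        # N/4-channel 1x1
        nn.ReLU(inplace=True),
        nn.Conv2d(n // 4, n // 2, kernel_size=(1, 3), padding=(0, 1)),  # x-direction 1x3
        nn.Conv2d(n // 2, n // 2, kernel_size=(3, 1), padding=(1, 0)),  # y-direction 3x1
        nn.ReLU(inplace=True),
    )
```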
The split kernels may then be grouped into grouped convolutions. Grouped convolution allows the convolution to run in parallel and reduces the training parameters, making overfitting less likely. Grouped convolution takes the kernels of several channels as one group; the kernels of each group are convolved with the input data of the corresponding group, and the output data are then recombined by concatenation. Fig. 6 shows a schematic diagram of grouped convolution according to an embodiment of the invention. As shown in Fig. 6, the 16-channel kernels are k_1, ..., k_16; their sums of squares are calculated and sorted in descending order, the sorted kernels being denoted A_1, ..., A_16. The 16-channel convolution may be divided into 4 groups: A_1, A_2, A_3, A_4 are assigned in turn to the first through fourth groups; A_5, A_6, A_7, A_8 in turn to the first through fourth groups; A_9, A_10, A_11, A_12 in turn to the first through fourth groups; and A_13, A_14, A_15, A_16 in turn to the first through fourth groups. The above is only an example of dividing 16 channels into 4 groups; the method can be generalized to grouping convolutions with any number of channels.
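The round-robin assignment of sorted kernels to groups may be sketched as follows; group_by_energy is a hypothetical helper name, and the sketch returns index lists per group rather than rebuilding the layer.

```python
import numpy as np

def group_by_energy(kernels, n_groups=4):
    """Assign kernels to groups round-robin in descending order of their
    sums of squared weights, as in the 16-channel / 4-group example above.
    Returns a list of kernel-index lists, one per group."""
    sq = np.array([np.sum(k ** 2) for k in kernels])
    order = np.argsort(-sq)                   # descending squared sum: A1, A2, ...
    groups = [[] for _ in range(n_groups)]
    for rank, idx in enumerate(order):
        groups[rank % n_groups].append(idx)   # A1->g1, A2->g2, ..., A5->g1, ...
    return groups
```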
By processing the network structure of the first model with the above method, i.e. deleting, splitting and grouping the kernels of each convolution layer in the first model, the second model obtained after processing has fewer network parameters, achieving the purpose of compressing the model. Table 2 shows part of the network parameters of the second model according to an embodiment of the invention. As shown in Table 2, compared with the network structure of the first model, the number of channels of the 5*5 kernels in the second model is reduced to 1/4 of the original, i.e. 32x56x56 is reduced to 8x56x56. The 3*3 kernels are decomposed into 1*1 kernels and grouped 3*3 convolutions. The number of channels of the 1*1 kernels after splitting is reduced to 1/4 of the original, i.e. 64x28x28 is reduced to 16x28x28. Since grouped 3*3 convolution with group=4 is used, the number of channels of the 3*3 kernels after splitting is reduced to 1/2 of the original, i.e. the original 128x14x14 is reduced to 16x28x28. It should be noted that some parameters in the second model coincide with parameters in the first model and are not repeated here.
Table 2
The initialization of a neural network is crucial to the convergence of the model. Traditional random initialization from a Gaussian distribution makes the model difficult to converge as the network deepens. For the initialization of the first model, the weights may be drawn from a Gaussian distribution with mean 0 and variance 2/n. The second model may be initialized based on the parameters of the trained first model.
Then, in step S240, the network parameters of the second model are initialized based on the network parameters of the trained first model, to obtain the initialized second model.
According to one embodiment of the invention, the weight values of the first convolution kernels after splitting may be initialized from the weight values at the centers of the first channel number of kernels before splitting; the weight values of the second convolution kernels after splitting are initialized from the weight values of the second channel number of kernels before splitting.
For example, the 1x1 kernels after splitting may be initialized with the center weight values of the N/4 3x3 kernels selected in descending order of their sums of squared weight values before splitting; the 3x3 kernels after splitting may be initialized directly with the weights of the N/2 3x3 kernels selected in descending order of their sums of squared weight values before splitting.
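A sketch of this initialization rule is given below, treating each kernel as a single 3x3 slice for simplicity (a real convolution weight also has an input-channel dimension); the helper name is an assumption.

```python
import numpy as np

def init_split_weights(old_kernels, n):
    """Initialize the split 1x1 and 3x3 kernels from the trained 3x3 kernels.
    `old_kernels` has shape (N, 3, 3); selection is by descending sum of
    squared weights, as described above."""
    sq = np.sum(old_kernels ** 2, axis=(1, 2))
    order = np.argsort(-sq)                        # descending squared sum
    w_1x1 = old_kernels[order[: n // 4], 1, 1]     # centers of the top N/4 kernels
    w_3x3 = old_kernels[order[: n // 2]]           # top N/2 kernels, copied whole
    return w_1x1.reshape(-1, 1, 1), w_3x3
```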
Finally, in step S250, the training images are input into the initialized second model for processing, so as to output the predicted feature point coordinates of the target object in the training images, and the initialized second model is trained using the predicted feature point coordinates and the annotated feature point coordinates; the trained second model serves as the feature point detection model.
According to one embodiment of the invention, the parameters of the second model are updated based on the loss value between the annotated feature point coordinates and the feature point coordinates predicted by the second model, and training ends when the loss value satisfies a predetermined condition. For example, the predetermined condition may be that the difference between two successive loss values computed with gradient descent is less than a predetermined threshold, or that the number of iterations reaches a predetermined count. In an embodiment of the invention, the second model may be trained by fine-tuning (finetune): the second model is fixed with the parameters of the corresponding layers of the first model, the loss value during training of the second model is driven down to a very low value, and the learning rate is then reduced, which gives good convergence. The Adam optimization algorithm may be used to optimize the network training, the learning rate hyperparameter may be set to 0.0002, and the loss value may be computed with the L2 loss function:
loss = \sum_{i=1}^{N} \left[ (x_i - targetx_i)^2 + (y_i - targety_i)^2 \right]
where N is the number of feature points (N=28 in this embodiment of the invention), x_i and y_i are respectively the abscissa and ordinate of the i-th predicted feature point, and targetx_i and targety_i are respectively the abscissa and ordinate of the i-th annotated feature point.
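A minimal fine-tuning sketch in PyTorch terms is given below; the Adam optimizer and the 0.0002 learning rate follow the description above, while the frozen-layer mechanism and the epoch count are illustrative assumptions.

```python
import torch

def finetune(second_model, loader, frozen_layers, lr=2e-4, epochs=30):
    """Fine-tune the initialized second model with Adam at lr=0.0002,
    freezing the layers whose parameters were copied from the first model.
    `frozen_layers` (name prefixes) and `epochs` are assumptions."""
    for name, p in second_model.named_parameters():
        if any(name.startswith(prefix) for prefix in frozen_layers):
            p.requires_grad = False   # fix parameters shared with the first model
    opt = torch.optim.Adam(
        (p for p in second_model.parameters() if p.requires_grad), lr=lr)
    for _ in range(epochs):
        for images, targets in loader:
            loss = ((second_model(images) - targets) ** 2).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return second_model
```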
The size of the second model obtained by the above training is about 3 MB, which satisfies the limits mobile devices place on model size. The method based on the feature point detection model can be executed on a mobile device: the cat face image to be detected is first cropped, the cropped cat face image is then input into the feature point detection model (the trained second model) for detection, and the positions of the 28 cat face points are finally output.
Since the postures, expressions and breeds of animals are so diverse, directly using a simple convolutional neural network for feature point detection gives insufficient accuracy, while a complex convolutional neural network meets the accuracy requirement but is too large to be applied on a mobile device for real-time detection. In view of these problems, this solution first processes a high-accuracy, large feature point detection model to reduce its parameter count, and then trains the processed small model by fine-tuning; the trained small model serves as the feature point detection model. In this way, the model is guaranteed high accuracy while matching the computation efficiency and memory of mobile devices.
A9. The method as described in A1, wherein the step of training the initialized second model using the predicted feature point coordinates and the annotated feature point coordinates comprises:
updating the parameters of the second model based on the loss value between the annotated feature point coordinates and the predicted feature point coordinates, training ending when the loss value satisfies a predetermined condition.
A10. The method as described in A9, wherein the loss value is calculated based on the following formula:
loss = \sum_{i=1}^{N} \left[ (x_i - targetx_i)^2 + (y_i - targety_i)^2 \right]
where N is the number of feature points, x_i and y_i are respectively the abscissa and ordinate of the i-th predicted feature point, and targetx_i and targety_i are respectively the abscissa and ordinate of the i-th annotated feature point.
A11. The method as described in A1, wherein before the training images are input into the first model or the initialized second model for processing, the method further comprises:
cropping the training images based on the annotated feature point coordinates; and
applying data augmentation to the cropped images.
A12. The method as described in A11, wherein the step of cropping the training images comprises:
computing a convex hull based on the feature point coordinates of the target object, to obtain a minimum bounding rectangle;
enlarging the minimum bounding rectangle by a predetermined multiple, so that the image can be cropped based on the enlarged rectangle.
A13. The method as described in A11, wherein the data augmentation includes one or more of stretching, flipping, rotation, affine transformation, occlusion and color cast.
It should be appreciated that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple submodules.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from the embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and they may furthermore be divided into multiple submodules, subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of both. Thus, the methods and devices of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embedded in a tangible medium, such as a floppy disk, CD-ROM, hard drive or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes a device for practicing the invention.
Where the program code is executed on programmable computers, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to perform the method of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media generally embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of the embodiments are described herein as methods, or as combinations of method elements, that can be implemented by a processor of a computer system or by other devices performing the described functions. Thus, a processor having the necessary instructions for implementing such a method or method element forms a device for implementing the method or method element. Furthermore, an element described herein of a device embodiment is an example of a device for carrying out the function performed by the element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described in terms of a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the present disclosure is illustrative and not restrictive, the scope of the invention being defined by the appended claims.

Claims (10)

1. A training method for a feature point detection model, suitable for execution on a computing device, the method comprising:
obtaining training images with annotation data, the annotation data being the feature point coordinates of a target object in each image;
inputting the training images into a first model for processing, to obtain a trained first model;
processing the network structure of the trained first model to obtain a second model;
initializing the network parameters of the second model based on the network parameters of the trained first model, to obtain an initialized second model; and
inputting the training images into the initialized second model for processing, so as to output predicted feature point coordinates of the target object in the training images, and training the initialized second model using the predicted feature point coordinates and the annotated feature point coordinates, the trained second model serving as the feature point detection model.
2. The method of claim 1, wherein the first model comprises a plurality of convolution layers, each convolution layer comprising a plurality of convolution kernels, and the step of processing the network structure of the trained first model comprises:
deleting convolution kernels, to obtain the kernels after deletion;
splitting the kernels after deletion, to obtain the split kernels; and
grouping the split kernels.
3. The method of claim 2, wherein the step of deleting convolution kernels comprises:
counting the number of weight values equal to 0 in the kernels of each convolution layer, and deleting a first number of kernels if the counted number exceeds a predetermined value, or if the weight values of some row, column or diagonal of a kernel are all 0; and
if the number of remaining kernels after deletion is greater than a preset value, deleting a second number of kernels from the remaining kernels based on the sum of squares of their weight values, to obtain the preprocessed kernels.
4. The method of claim 3, wherein the step of deleting the second number of kernels from the remaining kernels based on the sum of squares of their weight values comprises:
sorting the remaining kernels in descending order of the sum of squares of their weight values;
calculating the L2 distance between the kernel with the largest sum of squared weight values and each other remaining kernel; and
deleting the remaining kernels other than the kernel with the largest sum of squared weight values in descending order of L2 distance, until the number of remaining kernels after deletion is not greater than the preset value.
5. The method of claim 4, wherein the L2 distance is calculated based on the following formula:
L2 = \sqrt{ \sum_{k=1}^{n} (c_k - i_k)^2 }
where n is the number of weights in a convolution kernel, c_k is the k-th weight value of the kernel with the largest sum of squared weight values, and i_k is the k-th weight value of one of the other remaining kernels.
6. The method of claim 2, wherein the step of splitting the kernels after deletion comprises:
splitting the kernels after deletion into first convolution kernels of a first channel number and second convolution kernels of a second channel number.
7. The method of claim 6, wherein the step of initializing the network parameters of the second model based on the network parameters of the trained first model comprises:
initializing the weight values of the first convolution kernels after splitting from the weight values at the centers of the first channel number of kernels before splitting; and
initializing the weight values of the second convolution kernels after splitting from the weight values of the second channel number of kernels before splitting.
8. The method of claim 2, wherein the step of grouping the split kernels comprises:
calculating the sum of squared weight values of each kernel after splitting, and sorting in descending order; and
grouping the split kernels based on the ordering of the sums of squared weight values.
9. A computing device, comprising:
a memory;
one or more processors; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any one of claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform the method of any one of claims 1-8.
CN201910540027.2A 2019-06-20 2019-06-20 Training method of feature point detection model Active CN110287857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910540027.2A CN110287857B (en) 2019-06-20 2019-06-20 Training method of feature point detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910540027.2A CN110287857B (en) 2019-06-20 2019-06-20 Training method of feature point detection model

Publications (2)

Publication Number Publication Date
CN110287857A true CN110287857A (en) 2019-09-27
CN110287857B CN110287857B (en) 2021-06-04

Family

ID=68005205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910540027.2A Active CN110287857B (en) 2019-06-20 2019-06-20 Training method of feature point detection model

Country Status (1)

Country Link
CN (1) CN110287857B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022215A (en) * 2016-05-05 2016-10-12 北京海鑫科金高科技股份有限公司 Face feature point positioning method and device
US20170337472A1 (en) * 2016-05-19 2017-11-23 Nec Laboratories America, Inc. Passive pruning of filters in a convolutional neural network
CN106778550A * 2016-11-30 2017-05-31 北京小米移动软件有限公司 Face detection method and apparatus
CN109034119A * 2018-08-27 2018-12-18 苏州广目信息技术有限公司 Face detection method based on an optimized fully convolutional neural network
CN109191453A (en) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image category detection model
CN109726746A * 2018-12-20 2019-05-07 浙江大华技术股份有限公司 Template matching method and device
CN109800802A (en) * 2019-01-10 2019-05-24 深圳绿米联创科技有限公司 Visual sensor and object detecting method and device applied to visual sensor
CN109886397A * 2019-03-21 2019-06-14 西安交通大学 Neural network structure pruning and compression optimization method for convolution layers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI GUO et al.: "Sparseness Ratio Allocation and Neuron Re-pruning for Neural Networks Compression", 2018 IEEE International Symposium on Circuits and Systems (ISCAS) *
WU HUAN: "Research on convolutional neural network compression and forward inference acceleration", China Masters' Theses Full-text Database *
LI SIQI: "Comparison of compression and acceleration algorithms for convolutional neural network models", Information & Computer (Theoretical Edition) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881746A (en) * 2020-06-23 2020-11-03 安徽清新互联信息科技有限公司 Face feature point positioning method and system based on information fusion
CN111881746B (en) * 2020-06-23 2024-04-02 安徽清新互联信息科技有限公司 Face feature point positioning method and system based on information fusion
CN112084911A (en) * 2020-08-28 2020-12-15 安徽清新互联信息科技有限公司 Human face feature point positioning method and system based on global attention
CN112084911B (en) * 2020-08-28 2023-03-07 安徽清新互联信息科技有限公司 Human face feature point positioning method and system based on global attention
CN113408638A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Model training method, device, equipment and computer storage medium
CN115496749A (en) * 2022-11-14 2022-12-20 江苏智云天工科技有限公司 Product defect detection method and system based on target detection training preprocessing
CN115496749B (en) * 2022-11-14 2023-01-31 江苏智云天工科技有限公司 Product defect detection method and system based on target detection training preprocessing
CN115880512A (en) * 2023-02-01 2023-03-31 有米科技股份有限公司 Icon matching method and device

Also Published As

Publication number Publication date
CN110287857B (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN110287857A (en) A kind of training method of characteristic point detection model
CN110070072A (en) A method of generating object detection model
JP6961011B2 (en) Systems and methods for data management
CN110084313A (en) A method of generating object detection model
CN110096964A (en) A method of generating image recognition model
CN110414574A (en) A kind of object detection method calculates equipment and storage medium
CN110084253A (en) A method of generating object detection model
CN107403141B (en) Face detection method and device, computer readable storage medium and equipment
CN108334499A (en) A kind of text label tagging equipment, method and computing device
CN104866868B (en) Metal coins recognition methods based on deep neural network and device
CN109978063A (en) A method of generating the alignment model of target object
CN108062526A (en) A kind of estimation method of human posture and mobile terminal
CN109816011A (en) Generate the method and video key frame extracting method of portrait parted pattern
CN110610237A (en) Quantitative training method and device of model and storage medium
CN106780512A (en) The method of segmentation figure picture, using and computing device
CN112348177B (en) Neural network model verification method, device, computer equipment and storage medium
CN111832437A (en) Building drawing identification method, electronic equipment and related product
CN110516803A (en) Traditional computer vision algorithm is embodied as neural network
CN109671020A (en) Image processing method, device, electronic equipment and computer storage medium
CN107292352A (en) Image classification method and device based on convolutional neural networks
CN108121931A (en) two-dimensional code data processing method, device and mobile terminal
CN110276289A (en) Generate the method and human face characteristic point method for tracing of Matching Model
CN110647974A (en) Network layer operation method and device in deep neural network
CN109118490A (en) A kind of image segmentation network generation method and image partition method
CN113449776A (en) Chinese herbal medicine identification method and device based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant