CN107392316B - Network training method and device, computing equipment and computer storage medium - Google Patents

Network training method and device, computing equipment and computer storage medium

Info

Publication number
CN107392316B
CN107392316B (application CN201710555959.5A)
Authority
CN
China
Prior art keywords
network
sample image
image
style
sample
Prior art date
Legal status
Active
Application number
CN201710555959.5A
Other languages
Chinese (zh)
Other versions
CN107392316A (en)
Inventor
申发龙
颜水成
曾钢
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201710555959.5A priority Critical patent/CN107392316B/en
Publication of CN107392316A publication Critical patent/CN107392316A/en
Application granted granted Critical
Publication of CN107392316B publication Critical patent/CN107392316B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image


Abstract

The invention discloses a network training method and apparatus, computing equipment and a computer storage medium, wherein the network training method is completed through multiple iterations. The training step of one iteration process comprises: extracting a first sample image and a second sample image; obtaining, according to a first network and the first sample image, a second network corresponding to the style of the first sample image; generating a third sample image corresponding to the second sample image using the second network; and updating the weight parameters of the first network according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image. The training step is executed iteratively until a predetermined convergence condition is met. According to the technical scheme provided by the invention, a first network applicable to images of any style and any content can be obtained through training, and this first network helps to quickly obtain the corresponding image conversion network, thereby improving the efficiency of image stylization processing.

Description

Network training method and device, computing equipment and computer storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a network training method, a network training device, computing equipment and a computer storage medium.
Background
With image stylization processing technology, the style of a style image can be transferred to an everyday photograph, giving the image a better visual effect. In the prior art, a given style image is directly input into a neural network, a large number of content images are then used as sample images, an image conversion network corresponding to the given style image is obtained through many iterations of training, and this image conversion network is used to perform style conversion on an input content image.
In the prior art, for a given style image of any style, thousands of iterations are required to train the neural network and obtain an image conversion network corresponding to that style. During the training of the image conversion network, these thousands of iterations entail a huge amount of computation and a long training time, resulting in low image stylization processing efficiency.
Disclosure of Invention
In view of the above, the present invention has been developed to provide a network training method, apparatus, computing device, and computer storage medium that overcome or at least partially address the above-identified problems.
According to an aspect of the present invention, there is provided a network training method, which is performed through a plurality of iterations;
the training step of one iteration process comprises the following steps:
extracting a first sample image and a second sample image;
obtaining a second network corresponding to the style of the first sample image according to the first network and the first sample image;
generating a third sample image corresponding to the second sample image using the second network;
updating the weight parameters of the first network according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image;
the method comprises the following steps: and iteratively executing the training steps until a preset convergence condition is met.
Further, extracting the first sample image and the second sample image further comprises:
a first sample image is extracted from a style image library, and at least one second sample image is extracted from a content image library.
Further, during the multiple iterations, one first sample image is extracted and kept fixed while at least one second sample image is extracted in turn; after the second sample images in the content image library have been extracted, the next first sample image is selected and at least one second sample image is again extracted.
Further, obtaining a second network corresponding to the style of the first sample image according to the first network and the first sample image further comprises:
and inputting the first sample image into the first network to obtain a second network corresponding to the style of the first sample image.
Further, inputting the first sample image into the first network, and obtaining a second network corresponding to the style of the first sample image further includes:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain a second network corresponding to the style texture features.
Further, updating the weight parameter of the first network according to the loss between the third sample image and the first sample image and the second sample image further comprises:
and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and updating the weight parameter of the first network by using the first network loss function.
Further, the predetermined convergence condition includes: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches the preset visual effect parameter.
Further, the first network is a meta-network obtained by training the neural network, and the second network is an image conversion network.
Further, the method is performed by a terminal or a server.
According to another aspect of the present invention, there is provided a network training apparatus which operates through a plurality of iterations; the device includes:
an extraction module adapted to extract a first sample image and a second sample image;
the generating module is suitable for obtaining a second network corresponding to the style of the first sample image according to the first network and the first sample image;
a sample processing module adapted to generate a third sample image corresponding to the second sample image using the second network;
the updating module is suitable for updating the weight parameters of the first network according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image;
the network training device is run iteratively until a predetermined convergence condition is met.
Further, the extraction module is further adapted to:
a first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library.
Further, the apparatus is further adapted to: during the multiple iterations, extract one first sample image and keep it fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, select the next first sample image and again extract at least one second sample image.
Further, the generation module is further adapted to:
and inputting the first sample image into the first network to obtain a second network corresponding to the style of the first sample image.
Further, the generation module is further adapted to:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain a second network corresponding to the style texture features.
Further, the update module is further adapted to:
and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and updating the weight parameter of the first network by using the first network loss function.
Further, the predetermined convergence condition includes: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches the preset visual effect parameter.
Further, the first network is a meta-network obtained by training the neural network, and the second network is an image conversion network.
According to another aspect of the present invention, there is provided a terminal including the network training apparatus described above.
According to another aspect of the present invention, there is provided a server including the network training apparatus described above.
According to yet another aspect of the present invention, there is provided a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the network training method.
According to still another aspect of the present invention, a computer storage medium is provided, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the network training method.
According to the technical scheme provided by the invention, a first sample image and a second sample image are extracted; a second network corresponding to the style of the first sample image is obtained according to the first network and the first sample image; a third sample image corresponding to the second sample image is then generated using the second network; the weight parameters of the first network are updated according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image; and the training step is executed iteratively until a predetermined convergence condition is met. The technical scheme provided by the invention can train a first network applicable to images of any style and any content, and this first network facilitates rapidly obtaining the corresponding image conversion network, thereby improving the efficiency of image stylization processing.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of an image stylization processing method according to one embodiment of the invention;
FIG. 2a shows an exemplary diagram of a first image;
FIG. 2b shows an exemplary diagram of a second image;
FIG. 2c shows an example diagram of a third image;
FIG. 3a shows a schematic flow diagram of a network training method according to an embodiment of the invention;
FIG. 3b shows a schematic flow diagram of a network training method according to another embodiment of the invention;
FIG. 4 shows a flow diagram of an image stylization processing method according to another embodiment of the invention;
FIG. 5 shows a block diagram of an image stylization processing apparatus according to one embodiment of the present invention;
fig. 6 is a block diagram showing the configuration of an image stylization processing apparatus according to another embodiment of the present invention;
FIG. 7 is a block diagram of a network training apparatus according to another embodiment of the present invention;
FIG. 8 illustrates a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flow diagram of an image stylization processing method according to an embodiment of the present invention, which may be executed by a terminal or a server, the method being executed based on a trained first network, as shown in fig. 1, the method including the steps of:
step S100, a first image is acquired.
The first image may be a style image of any style, and is not limited to style images of certain specific styles. When the user wants to process an image into an image having a style consistent with that of a given first image, the first image may be acquired in step S100. To distinguish it from the first image, the image that the user wants to process is referred to in the present invention as the second image to be processed.
Step S101, inputting the first image into the first network to obtain a second network corresponding to the style of the first image.
The first network has been trained; specifically, the sample images used for training the first network comprise a plurality of first sample images stored in a style image library and a plurality of second sample images stored in a content image library. The first sample images are style sample images, and the second sample images are content sample images. Since the trained first network is applicable to any style image and any content image, after the first image acquired in step S100 is input into the first network in step S101, the second network corresponding to the style of the first image can be mapped quickly without training on the first image.
The training process of the first network is completed through a plurality of iterations. Optionally, in one iteration, a first sample image is extracted from the style image library, at least one second sample image is extracted from the content image library, and the first network is trained using the first sample image and the at least one second sample image.
Optionally, the one-iteration process comprises: generating a third sample image corresponding to the second sample image using a second network corresponding to the style of the first sample image; and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and updating the weight parameter of the first network according to the first network loss function.
In an embodiment of the present invention, the first network is a meta network obtained by training a neural network, and the second network is an image conversion network. In the prior art, a neural network is trained directly and at length to obtain a corresponding image conversion network. In the present invention, by contrast, the neural network is trained into a meta-network; because the trained meta-network is well suited to images of any style and any content, the corresponding image conversion network can be mapped quickly using the meta-network instead of being obtained by directly training a neural network. Compared with the prior art, the speed of obtaining the image conversion network is therefore greatly improved.
And S102, performing stylization processing on the second image to be processed by using a second network to obtain a third image corresponding to the second image.
After the second network corresponding to the style of the first image is obtained, the second image to be processed is stylized using the second network; the third image obtained after the stylization processing is a style migration image corresponding to the second image and has a style consistent with that of the first image. Fig. 2a and 2b show exemplary views of a first image and a second image, respectively; the second image shown in Fig. 2b is stylized using a second network corresponding to the style of the first image shown in Fig. 2a, and the resulting third image is shown in Fig. 2c. As shown in Fig. 2c, this third image has the style of the first image shown in Fig. 2a.
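To make the flow of steps S100 to S102 concrete, the following minimal sketch shows how a trained meta-network could map a style image to the parameters of a conversion network in a single forward pass and then stylize a content image. It is only an illustration under strong simplifying assumptions: the TinyMetaNet class, the one-convolution "second network" and all names below are invented for the example and are not the architecture of the embodiment.

```python
# A minimal, self-contained sketch of steps S100-S102. TinyMetaNet and the
# one-convolution "second network" are illustrative assumptions, not the
# architecture of the embodiment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMetaNet(nn.Module):
    """Toy first network: maps a style image to the parameters (weight and bias)
    of a single 3x3 convolution that plays the role of the second network."""
    def __init__(self, channels=3):
        super().__init__()
        self.channels = channels
        n_params = channels * channels * 3 * 3 + channels   # conv weight + bias
        self.fc = nn.Linear(channels, n_params)

    def forward(self, style_image):
        # Global average pooling as a crude stand-in for style texture features.
        feat = style_image.mean(dim=(2, 3))                  # (1, C)
        params = self.fc(feat)[0]
        c = self.channels
        weight = params[: c * c * 9].view(c, c, 3, 3)
        bias = params[c * c * 9:]
        return weight, bias

def stylize(meta_net, style_image, content_image):
    with torch.no_grad():
        # Step S101: one forward pass maps the style image to the second network.
        weight, bias = meta_net(style_image)
        # Step S102: the generated second network stylizes the content image.
        return F.conv2d(content_image, weight, bias, padding=1)

if __name__ == "__main__":
    meta_net = TinyMetaNet()                     # assumed to be already trained
    style = torch.rand(1, 3, 256, 256)           # step S100: acquired first image
    content = torch.rand(1, 3, 256, 256)         # second image to be processed
    print(stylize(meta_net, style, content).shape)   # torch.Size([1, 3, 256, 256])
```

In practice the second network would be a full image conversion network whose many layer parameters are all emitted by the meta-network.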
According to the image stylization processing method provided by the embodiment of the invention, a first image is obtained, then the first image is input into a first network to obtain a second network corresponding to the style of the first image, and then the second network is utilized to perform stylization processing on a second image to be processed to obtain a third image corresponding to the second image. Compared with the image stylization processing mode in the prior art, the technical scheme provided by the invention can rapidly obtain the corresponding image conversion network by utilizing the trained first network, thereby effectively improving the efficiency of the image stylization processing and optimizing the image stylization processing mode.
Fig. 3a is a flow chart of a network training method according to an embodiment of the present invention, which may be executed by a terminal or a server, and is completed through multiple iterations, as shown in fig. 3a, the training step of one iteration process includes:
step S200, a first sample image and a second sample image are extracted.
In order to train the first network, in step S200, a first sample image and a second sample image need to be extracted. The first sample image is a style image, and the second sample image is a content image. The number of the extracted first sample image and the second sample image can be set by those skilled in the art according to actual needs, and is not limited herein.
Step S201, according to the first network and the first sample image, obtaining a second network corresponding to the style of the first sample image.
In one embodiment of the present invention, the first network is a meta-network obtained by training a neural network. After the first sample image is extracted, a second network corresponding to the style of the first sample image can be obtained according to the neural network and the extracted first sample image, and the second network is an image conversion network.
Step S202 is to generate a third sample image corresponding to the second sample image using the second network.
After the second network corresponding to the style of the first sample image is obtained, a third sample image corresponding to the second sample image can be generated by using the second network, wherein the third sample image is a style transition image, and the style transition image has a style consistent with that of the first sample image.
Step S203, updating the weight parameter of the first network according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image.
Specifically, the weight parameter of the first network may be updated according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image.
Step S204, the training step is executed iteratively until a preset convergence condition is met.
The predetermined convergence condition can be set by those skilled in the art according to actual needs, and is not limited herein. And iteratively executing the training steps until a preset convergence condition is met, thereby obtaining a trained first network. The trained first network can be suitable for images of any style and images of any content, and then the first network can be used for quickly mapping to obtain a corresponding image conversion network instead of directly training the first network to obtain the image conversion network, so that compared with the prior art, the speed of obtaining the image conversion network is greatly improved.
According to the network training method provided by the embodiment of the invention, the first network suitable for the images with any styles and images with any contents can be obtained through training, and the first network is utilized to help to quickly obtain the corresponding image conversion network, so that the efficiency of image stylization processing is improved.
Fig. 3b is a flowchart illustrating a network training method according to another embodiment of the present invention, which may be executed by a terminal or a server, and is performed through multiple iterations, as shown in fig. 3b, where the training step of one iteration process includes:
step S300, a first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library.
In a specific training process, the style image library stores 100,000 first sample images and the content image library stores 100,000 second sample images, wherein the first sample images are style images and the second sample images are content images. In step S300, one first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library. The number of second sample images can be set by those skilled in the art according to actual needs, and is not limited herein.
Step S301, inputting the first sample image into the first network, and obtaining a second network corresponding to the style of the first sample image.
In one embodiment of the present invention, the first network is a meta-network obtained by training a neural network; for example, the neural network may be a VGG-16 convolutional neural network. Specifically, in step S301, style texture features are extracted from the first sample image, the extracted style texture features are input into the first network, and a forward propagation operation is performed in the first network to obtain a second network corresponding to the style texture features.
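The embodiment does not fix what the "style texture features" are. A common choice in style-transfer work, used here purely as an assumption, is the set of Gram matrices of VGG-16 feature maps; the layer indices below are likewise illustrative.

```python
# One possible realization of "style texture features", assumed for illustration:
# Gram matrices of VGG-16 feature maps at a few layers. Neither the feature
# definition nor the layer indices are fixed by the embodiment.
import torch
from torchvision.models import vgg16

STYLE_LAYERS = (3, 8, 15, 22)   # relu1_2, relu2_2, relu3_3, relu4_3 in vgg16().features

def gram_matrix(fmap):
    n, c, h, w = fmap.shape
    f = fmap.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_texture_features(image, features=None):
    """Run the image through VGG-16 and collect Gram matrices at selected layers."""
    if features is None:
        features = vgg16().features.eval()   # pretrained weights would be loaded in practice
    grams, x = [], image
    with torch.no_grad():
        for i, layer in enumerate(features):
            x = layer(x)
            if i in STYLE_LAYERS:
                grams.append(gram_matrix(x).flatten(1))
    return torch.cat(grams, dim=1)           # one flat style descriptor per image
```

The resulting descriptor is what would be fed into the first network, whose forward propagation then emits the parameters of the second network.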
Step S302 is to generate a third sample image corresponding to at least one second sample image, respectively, using a second network corresponding to the style of the first sample image.
After the second network corresponding to the style of the first sample image is obtained, corresponding third sample images can be generated for at least one second sample image respectively by using the second network corresponding to the style of the first sample image, wherein the third sample images are style transition images corresponding to the second sample images, and the style transition images have the style consistent with the first sample images. When 8 second sample images are extracted in step S300, corresponding third sample images are generated for the 8 second sample images, respectively, i.e., one corresponding third sample image is generated for each second sample image in step S302.
Step S303, obtaining a first network loss function according to the style loss between the at least one third sample image and the first sample image and the content loss between the at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function.
Wherein, those skilled in the art can set the specific content of the first network loss function according to actual needs, and the content is not limited herein. In one embodiment, the first network loss function may be:
$$\min_{\theta}\; \mathbb{E}_{I_c,\,I_s}\Big[\,\lambda_c\,\big\|CP(I)-CP(I_c)\big\|_2^2+\lambda_s\,\big\|SP(I)-SP(I_s)\big\|_2^2\,\Big]$$
where I_c is the second sample image, I_s is the first sample image, I is the third sample image, CP is the perceptual function for perceiving the content difference, SP is the perceptual function for perceiving the style difference, ||CP(I) - CP(I_c)||_2^2 is the content loss between the third sample image and the corresponding second sample image, ||SP(I) - SP(I_s)||_2^2 is the style loss between the third sample image and the first sample image, θ is the weight parameter of the first network, λ_c is the preset content loss weight, and λ_s is the preset style loss weight. According to the first network loss function, a back propagation operation is performed, and the weight parameter θ of the first network is updated according to the operation result.
In a specific training process, the first network is a meta-network obtained by training a neural network, and the second network is an image conversion network. The first network is trained using a stochastic gradient descent algorithm. The specific training process comprises:
1. Set the number of iterations k for one first sample image and the number m of second sample images I_c. For example, k may be set to 20 and m to 8, meaning that in the training of the meta-network, 20 iterations are performed for each first sample image, and 8 second sample images I_c are extracted from the content image library in each iteration.
2. Fixedly extract a first sample image I_s from the style image library.
3. Input the first sample image I_s into the first network N(·; θ) and perform a feed-forward propagation operation in it to obtain the second network w corresponding to I_s. The mapping between the second network w and the first network N(·; θ) is: w ← N(I_s; θ).
4. Input m second sample images I_c, which may be denoted {I_c^(1), I_c^(2), ..., I_c^(m)}.
5. Using the second network w, generate a corresponding third sample image I for each second sample image I_c.
6. Update the weight parameter θ of the first network according to the first network loss function.
The first network loss function is specifically:
$$\min_{\theta}\; \mathbb{E}_{I_c,\,I_s}\Big[\,\lambda_c\,\big\|CP(I)-CP(I_c)\big\|_2^2+\lambda_s\,\big\|SP(I)-SP(I_s)\big\|_2^2\,\Big]$$
In the first network loss function, λ_c is the preset content loss weight and λ_s is the preset style loss weight. These six steps are put together in the sketch below.
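A compact sketch of steps 1 to 6 for one first sample image follows, again with a toy meta-network and simplified pixel-level stand-ins for the perceptual losses so that the example stays self-contained; none of the class or variable names below come from the embodiment, and a practical implementation would use the full image conversion network and the VGG-based losses sketched earlier.

```python
# A compact sketch of steps 1-6 for one first sample image. The toy meta-network
# and the simplified stand-ins for CP/SP keep the example self-contained.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMetaNet(nn.Module):
    """Toy first network N(.; theta): style image -> parameters of one 3x3 conv."""
    def __init__(self, c=3):
        super().__init__()
        self.c = c
        self.fc = nn.Linear(c, c * c * 9 + c)

    def forward(self, style_img):
        p = self.fc(style_img.mean(dim=(2, 3)))[0]      # crude style descriptor
        return (p[: self.c * self.c * 9].view(self.c, self.c, 3, 3),
                p[self.c * self.c * 9:])

def pixel_gram(x):                                      # simplified stand-in for SP(.)
    n, c, h, w = x.shape
    f = x.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

k, m, lambda_c, lambda_s = 20, 8, 1.0, 10.0             # step 1: iterations and batch size
meta_net = TinyMetaNet()
optimizer = torch.optim.SGD(meta_net.parameters(), lr=1e-3)   # stochastic gradient descent

style_img = torch.rand(1, 3, 64, 64)                    # step 2: fixed first sample image I_s
for step in range(k):
    weight, bias = meta_net(style_img)                  # step 3: w <- N(I_s; theta)
    content_batch = torch.rand(m, 3, 64, 64)            # step 4: m second sample images I_c
    third = F.conv2d(content_batch, weight, bias, padding=1)   # step 5: third sample images I

    # Step 6: first network loss (content term + style term) and back propagation.
    content_loss = F.mse_loss(third, content_batch)
    style_loss = F.mse_loss(pixel_gram(third),
                            pixel_gram(style_img).expand(m, -1, -1))
    loss = lambda_c * content_loss + lambda_s * style_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                    # update weight parameter theta
```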
Step S304, the training step is executed iteratively until a preset convergence condition is met.
The predetermined convergence condition can be set by those skilled in the art according to actual requirements and is not limited herein. For example, the predetermined convergence condition may include: the number of iterations reaching a preset number of iterations; and/or the output value of the first network loss function being smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaching a preset visual effect parameter. In other words, whether the predetermined convergence condition is satisfied may be judged from whether the number of iterations has reached the preset number, from whether the output value of the first network loss function is below the preset threshold, or from whether the visual effect parameter of the third sample image corresponding to the second sample image has reached the preset visual effect parameter. In step S304, the training step is executed iteratively until the predetermined convergence condition is satisfied, thereby obtaining a trained first network.
It is worth noting that, to improve the stability of the first network during training, in the multiple iterations one first sample image is extracted and kept fixed while at least one second sample image is extracted in turn; after the second sample images in the content image library have been extracted, the next first sample image is selected and at least one second sample image is again extracted, as sketched after this paragraph.
By fixing the first sample image and continuously replacing the second sample image, a first network suited to that first sample image and to any second sample image can be trained efficiently; the next first sample image is then selected and the second sample images are cycled through again, so that the first network becomes suited to both first sample images and to any second sample image. This process is repeated until the first sample images in the style image library and the second sample images in the content image library have all been extracted, so that a first network suited to any first sample image and any second sample image, that is, to any style image and any content image, is obtained through training. This effectively shortens the time required to train the first network and improves its training efficiency.
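The extraction schedule just described, together with a convergence check, can be sketched as an outer loop. The placeholder libraries, the train_step stub and the thresholds below are illustrative assumptions only; train_step stands for the per-iteration update shown in the previous sketch.

```python
# A sketch of the extraction schedule and convergence check described above.
import random
import torch

style_library = [torch.rand(1, 3, 64, 64) for _ in range(5)]      # first sample images
content_library = [torch.rand(1, 3, 64, 64) for _ in range(40)]   # second sample images

def train_step(style_img, content_batch):
    """Stand-in: generate the second network from style_img, stylize the batch,
    update the first network, and return the first network loss value."""
    return random.random()

m = 8                     # second sample images extracted per iteration
max_iters = 10_000        # preset number of iterations
loss_threshold = 0.05     # preset loss threshold

it, converged = 0, False
while not converged:
    for style_img in style_library:                       # fix one first sample image ...
        for start in range(0, len(content_library), m):   # ... cycle through the content library
            content_batch = content_library[start:start + m]
            loss = train_step(style_img, content_batch)
            it += 1
            # Predetermined convergence condition: preset iteration count reached
            # and/or loss below the preset threshold (visual checks are manual).
            if it >= max_iters or loss < loss_threshold:
                converged = True
                break
        if converged:
            break
```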
According to the network training method provided by the embodiment of the invention, not only can the first network suitable for the images with any styles and images with any contents be obtained through training, but also the time required by network training is effectively shortened, and the network training efficiency is improved; in addition, the first network is utilized to help to obtain the corresponding image conversion network quickly, and the efficiency of the image stylization processing is improved.
Fig. 4 shows a flowchart of an image stylization processing method according to another embodiment of the present invention, which may be executed by a terminal or a server, the method being executed based on a trained first network, as shown in fig. 4, the method including the steps of:
in step S400, a first image is acquired.
The first image may be a style image of any style, and is not limited to style images of certain specific styles. Specifically, the first image may be a style image from a website, or a style image shared by other users. When the user wants to process the second image to be processed into an image having a style consistent with that of a given first image, the first image may be acquired in step S400.
Step S401, inputting the first image into the first network, and performing a forward propagation operation in the first network to obtain a second network corresponding to the style of the first image.
Because the first network has been trained, it is well suited to images of any style and any content; after the first image is input into the first network, no training on the first image is needed, and the second network corresponding to the style of the first image can be mapped quickly with only one forward propagation operation in the first network. In a specific application, after a first image is input into the first network, the second network corresponding to the style of the first image can be obtained in only 0.02 s; this second network is an image conversion network.
And step S402, performing stylization processing on the second image to be processed by using a second network to obtain a third image corresponding to the second image.
After the second network corresponding to the style of the first image is obtained, the second image to be processed is stylized by the second network, and a third image corresponding to the second image can be conveniently obtained. The third image is the style transition image corresponding to the second image.
The advantages of the image stylization processing method provided by the present invention will be described below by comparing with two image stylization processing methods in the prior art. Table 1 shows the comparison result between the present method and two image stylization processing methods in the prior art.
TABLE 1
Method                    Applicable styles   Time to obtain the image conversion network   Time to obtain the style migration image
Gatys et al.              Any style           (no image conversion network obtained)        9.52 s
Johnson et al.            One style only      4 hours                                       0.015 s
Method of the invention   Any style           0.022 s                                       0.015 s
As shown in Table 1, Gatys et al. proposed, in the paper "A Neural Algorithm of Artistic Style", a method that does not produce an image conversion network but can be applied to any style; it takes 9.52 s to obtain a corresponding style migration image.
Johnson et al. published the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" at the European Conference on Computer Vision (ECCV) in 2016; the method proposed in that paper takes 4 hours to obtain a corresponding image conversion network and is applicable to only one style, but takes only 0.015 s to obtain a corresponding style migration image.
Compared with these two methods, the image stylization processing method provided by the invention is applicable to any style, takes only 0.022 s to obtain the corresponding image conversion network, and takes only 0.015 s to obtain the corresponding style migration image, thereby effectively improving both the speed of obtaining the image conversion network and the efficiency of obtaining the style migration image.
According to the image stylization processing method provided by the embodiment of the invention, a first image is obtained, then the first image is input into a first network, forward propagation operation is carried out once in the first network to obtain a second network corresponding to the style of the first image, and then stylization processing is carried out on the second image to be processed by utilizing the second network to obtain a third image corresponding to the second image. Compared with the image stylization processing mode in the prior art, the technical scheme provided by the invention can map and obtain the corresponding image conversion network quickly by performing forward propagation operation once in the trained first network, thereby effectively improving the speed of obtaining the image conversion network, improving the efficiency of image stylization processing and optimizing the image stylization processing mode; in addition, the obtained image conversion network can be used for conveniently and quickly stylizing the image.
Fig. 5 is a block diagram showing the structure of an image stylization processing apparatus according to an embodiment of the present invention, which operates based on a trained first network, as shown in fig. 5, and includes: an acquisition module 510, a mapping module 520, and a processing module 530.
The obtaining module 510 is adapted to: a first image is acquired.
The first image may be a stylistic image with any style, and is not limited to stylistic images with certain specific styles. When the user wants to process the second image to be processed into an image having a style consistent with one of the first images, the obtaining module 510 needs to obtain the first image.
The mapping module 520 is adapted to: the first image is input into a first network, and a second network corresponding to the style of the first image is obtained.
Specifically, the sample images used for training the first network comprise a plurality of first sample images stored in the style image library and a plurality of second sample images stored in the content image library. After the mapping module 520 inputs the first image acquired by the acquiring module 510 into the first network, the second network corresponding to the style of the first image can be mapped quickly without training on the first image.
The processing module 530 is adapted to: and performing stylization processing on the second image to be processed by using a second network to obtain a third image corresponding to the second image.
The processing module 530 performs stylization on the second image to be processed by using the second network obtained by the mapping module 520, and conveniently obtains a third image corresponding to the second image, wherein the third image has a style consistent with that of the first image.
According to the image stylization processing device provided by the embodiment of the invention, the acquisition module acquires a first image, the mapping module inputs the first image into the first network to obtain a second network corresponding to the style of the first image, and the processing module performs stylization processing on the second image to be processed by using the second network to obtain a third image corresponding to the second image. Compared with the image stylization processing mode in the prior art, the technical scheme provided by the invention can rapidly obtain the corresponding image conversion network by utilizing the trained first network, thereby effectively improving the efficiency of the image stylization processing and optimizing the image stylization processing mode.
Fig. 6 is a block diagram showing a configuration of an image stylization processing apparatus according to another embodiment of the present invention, which includes, as shown in fig. 6: an acquisition module 610, a first network training module 620, a mapping module 630, and a processing module 640.
The acquisition module 610 is adapted to: a first image is acquired.
Wherein the training process of the first network is completed through a plurality of iterations. The first network training module 620 is adapted to: in an iterative process, a first sample image is extracted from the style image library, at least one second sample image is extracted from the content image library, and the first network is trained by using the first sample image and the at least one second sample image.
Optionally, the first network training module 620 is adapted to: generating a third sample image corresponding to the second sample image by using a second network corresponding to the style of the first sample image in an iteration process; and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and updating the weight parameter of the first network according to the first network loss function.
In a particular embodiment, the first network training module 620 may include: an extraction unit 621, a generation unit 622, a processing unit 623, and an update unit 624.
In particular, the extraction unit 621 is adapted to: extract a first sample image from the style image library, and extract at least one second sample image from the content image library.
The generating unit 622 is adapted to: and inputting the first sample image into the first network to obtain a second network corresponding to the style of the first sample image.
In an embodiment of the present invention, the first network is a meta-network obtained by training a neural network, and the second network is an image transformation network. The generating unit 622 is further adapted to: extracting style texture features from the first sample image; and inputting the style texture features into the first network to obtain a second network corresponding to the style texture features.
The processing unit 623 is adapted to: and generating corresponding third sample images respectively aiming at the at least one second sample image by utilizing a second network corresponding to the style of the first sample image.
The update unit 624 is adapted to: and obtaining a first network loss function according to the style loss between the at least one third sample image and the first sample image and the content loss between the at least one third sample image and the corresponding second sample image, and updating the weight parameter of the first network according to the first network loss function. Wherein, those skilled in the art can set the specific content of the first network loss function according to actual needs, and the content is not limited herein. In one embodiment, the first network loss function may be:
$$\min_{\theta}\; \mathbb{E}_{I_c,\,I_s}\Big[\,\lambda_c\,\big\|CP(I)-CP(I_c)\big\|_2^2+\lambda_s\,\big\|SP(I)-SP(I_s)\big\|_2^2\,\Big]$$
where I_c is the second sample image, I_s is the first sample image, I is the third sample image, CP is the perceptual function for perceiving the content difference, SP is the perceptual function for perceiving the style difference, ||CP(I) - CP(I_c)||_2^2 is the content loss between the third sample image and the corresponding second sample image, ||SP(I) - SP(I_s)||_2^2 is the style loss between the third sample image and the first sample image, θ is the weight parameter of the neural network, λ_c is the preset content loss weight, and λ_s is the preset style loss weight.
The first network training module 620 runs iteratively until a predetermined convergence condition is met. The first network training module 620 is further adapted to: extract one first sample image and keep it fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, select the next first sample image and again extract at least one second sample image. In this way, a first network suited to images of any style and any content can be trained efficiently, effectively shortening the time required to train the first network and improving its training efficiency.
The mapping module 630 is adapted to: and inputting the first image into a first network, and carrying out forward propagation operation once in the first network to obtain a second network corresponding to the style of the first image.
Since the first network is trained by the first network training module 620, the first network can be well suitable for images of any style and images of any content, the mapping module 630 inputs the first image acquired by the acquisition module 610 into the first network trained by the first network training module 620, and the second network corresponding to the style of the first image can be quickly mapped by performing forward propagation operation in the first network only once without training the first image.
The processing module 640 is adapted to: and performing stylization processing on the second image to be processed by using a second network to obtain a third image corresponding to the second image.
According to the image stylization processing device provided by the embodiment of the invention, the acquisition module acquires a first image, the first network training module trains a first network, the mapping module inputs the first image into the first network, forward propagation operation is performed in the first network once to obtain a second network corresponding to the style of the first image, and the processing module stylizes the second image by using the second network to obtain a third image corresponding to the second image. Compared with the image stylization processing mode in the prior art, the technical scheme provided by the invention can map and obtain the corresponding image conversion network quickly by performing forward propagation operation once in the trained first network, thereby effectively improving the speed of obtaining the image conversion network, improving the efficiency of image stylization processing and optimizing the image stylization processing mode; in addition, the obtained image conversion network can be used for conveniently and quickly stylizing the image.
Fig. 7 is a block diagram of a network training apparatus according to another embodiment of the present invention, which is completed through multiple iterations, as shown in fig. 7, and includes: an extraction module 710, a generation module 720, a sample processing module 730, and an update module 740.
The extraction module 710 is adapted to: a first sample image and a second sample image are extracted.
The extraction module 710 is further adapted to: extract a first sample image from the style image library, and extract at least one second sample image from the content image library.
The generation module 720 is adapted to: and obtaining a second network corresponding to the style of the first sample image according to the first network and the first sample image.
The generation module 720 is further adapted to: and inputting the first sample image into the first network to obtain a second network corresponding to the style of the first sample image. In particular, the generating module 720 is further adapted to: extracting style texture features from the first sample image; and inputting the style texture features into the first network to obtain a second network corresponding to the style texture features.
The sample processing module 730 is adapted to: and generating a third sample image corresponding to the second sample image by using the second network.
The update module 740 is adapted to: and updating the weight parameters of the first network according to the loss between the third sample image and the first sample image and between the third sample image and the second sample image.
The update module 740 is further adapted to: and obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, and updating the weight parameter of the first network by using the first network loss function.
Wherein, those skilled in the art can set the specific content of the first network loss function according to actual needs, and the content is not limited herein. In one embodiment, the first network loss function may be:
$$\min_{\theta}\; \mathbb{E}_{I_c,\,I_s}\Big[\,\lambda_c\,\big\|CP(I)-CP(I_c)\big\|_2^2+\lambda_s\,\big\|SP(I)-SP(I_s)\big\|_2^2\,\Big]$$
where I_c is the second sample image, I_s is the first sample image, I is the third sample image, CP is the perceptual function for perceiving the content difference, SP is the perceptual function for perceiving the style difference, ||CP(I) - CP(I_c)||_2^2 is the content loss between the third sample image and the corresponding second sample image, ||SP(I) - SP(I_s)||_2^2 is the style loss between the third sample image and the first sample image, θ is the weight parameter of the first network, λ_c is the preset content loss weight, and λ_s is the preset style loss weight.
The network training device is run iteratively until a predetermined convergence condition is met. Wherein the predetermined convergence condition includes: the iteration times reach the preset iteration times; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches the preset visual effect parameter.
The network training device is further adapted to: during the multiple iterations, extract one first sample image and keep it fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, select the next first sample image and again extract at least one second sample image in turn. The network training device thereby effectively shortens the time required to train the first network and improves its training efficiency.
The invention also provides a terminal which comprises the network training device. The terminal can be a mobile phone, a PAD, a computer, a camera device and the like.
The invention also provides a server which comprises the network training device.
The invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the network training method in any method embodiment. The computer storage medium can be a memory card of a mobile phone, a memory card of a PAD, a magnetic disk of a computer, a memory card of a camera device, and the like.
Fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device. The computing device can be a mobile phone, a PAD, a computer, a camera device, a server, and the like.
As shown in fig. 8, the computing device may include: a processor (processor)802, a Communications Interface 804, a memory 806, and a communication bus 808.
Wherein:
the processor 802, communication interface 804, and memory 806 communicate with one another via a communication bus 808.
A communication interface 804 for communicating with network elements of other devices, such as clients or other servers.
The processor 802 is configured to execute the program 810, and may specifically perform relevant steps in the above network training method embodiments.
In particular, the program 810 may include program code comprising computer operating instructions.
The processor 802 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
The memory 806 stores a program 810. The memory 806 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 810 may be specifically configured to cause the processor 802 to perform the network training method in any of the method embodiments described above. For specific implementation of each step in the program 810, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing network training embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
By the scheme provided by the embodiment, the first network suitable for the images of any style and any content can be trained, and the first network is utilized to help to quickly obtain the corresponding image conversion network, so that the efficiency of image stylization processing is improved.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering. These words may be interpreted as names.

Claims (15)

1. A network training method, the method being performed through a plurality of iterations;
the training step of one iteration process comprises the following steps:
extracting a first sample image and a second sample image;
inputting the first sample image into a first network, and performing forward propagation operation in the first network to obtain a second network corresponding to the style of the first sample image;
generating a third sample image corresponding to the second sample image using the second network;
obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, performing back propagation operation according to the first network loss function to obtain an operation result, and updating the weight parameter of the first network according to the operation result;
the method comprises the following steps: iteratively executing the training steps until a preset convergence condition is met;
wherein, during the plurality of iterations, one first sample image is kept fixed while at least one second sample image is extracted in turn; and after the second sample images in the content image library have been extracted, the first sample image is replaced with the next first sample image and at least one second sample image is extracted again.
2. The method of claim 1, wherein the extracting the first and second sample images further comprises:
a first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library.
3. The method of claim 1 or 2, wherein inputting the first sample image into the first network to obtain a second network corresponding to the style of the first sample image further comprises:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain the second network corresponding to the style texture features.
4. The method according to claim 1 or 2, wherein the preset convergence condition comprises:
the number of iterations reaches a preset number of iterations; and/or
the output value of the first network loss function is smaller than a preset threshold value; and/or
the visual effect parameter of the third sample image corresponding to the second sample image reaches a preset visual effect parameter.
5. The method according to claim 1 or 2, wherein the first network is a meta-network obtained by training a neural network, and the second network is an image transformation network.
6. The method according to claim 1 or 2, wherein the method is performed by a terminal or a server.
7. A network training apparatus, the apparatus performing training through a plurality of iterations; the apparatus comprising:
an extraction module adapted to extract a first sample image and a second sample image;
the generating module is suitable for inputting the first sample image into a first network and carrying out forward propagation operation in the first network to obtain a second network corresponding to the style of the first sample image;
a sample processing module adapted to generate a third sample image corresponding to the second sample image using the second network;
an updating module adapted to update the weight parameters of the first network according to the losses between the third sample image and the first sample image and between the third sample image and the second sample image;
the network training apparatus operating iteratively until a preset convergence condition is met;
wherein the apparatus is further adapted to: during the plurality of iterations, keep one first sample image fixed while extracting at least one second sample image in turn; and after the second sample images in the content image library have been extracted, replace the first sample image with the next first sample image and then extract at least one second sample image;
the update module is further adapted to: obtaining a first network loss function according to the style loss between the third sample image and the first sample image and the content loss between the third sample image and the second sample image, performing back propagation operation according to the first network loss function to obtain an operation result, and updating the weight parameter of the first network according to the operation result.
8. The apparatus of claim 7, wherein the extraction module is further adapted to:
a first sample image is extracted from the style image library, and at least one second sample image is extracted from the content image library.
9. The apparatus of claim 7 or 8, wherein the generating module is further adapted to:
extracting style texture features from the first sample image;
and inputting the style texture features into the first network to obtain the second network corresponding to the style texture features.
10. The apparatus of claim 7 or 8, wherein the preset convergence condition comprises: the number of iterations reaches a preset number of iterations; and/or the output value of the first network loss function is smaller than a preset threshold value; and/or the visual effect parameter of the third sample image corresponding to the second sample image reaches a preset visual effect parameter.
11. The apparatus according to claim 7 or 8, wherein the first network is a meta-network obtained by training a neural network, and the second network is an image transformation network.
12. A terminal comprising the network training apparatus of any one of claims 7-11.
13. A server comprising the network training apparatus of any one of claims 7-11.
14. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the network training method according to any one of claims 1-6.
15. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the network training method of any one of claims 1-6.
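The following is a minimal sketch, in Python, of the sampling strategy recited in the "wherein" clause of claim 1 and of the convergence conditions of claim 4. It is illustrative only: the library contents, max_iterations, and loss_threshold values are assumptions, and train_step is a placeholder standing in for one training iteration such as the one sketched earlier in the description.

```python
# Illustrative sketch only; values and helper names are assumptions, not from the patent.
import itertools
import random

def train_step(style_img, content_img):
    # Placeholder for one forward/backward pass that updates the first network's
    # weight parameters and returns the first-network loss value.
    return random.random()

style_library = ["style_%02d.jpg" % i for i in range(8)]        # first sample images
content_library = ["content_%04d.jpg" % i for i in range(100)]  # second sample images

max_iterations = 20000   # preset number of iterations (first alternative in claim 4)
loss_threshold = 0.05    # preset loss threshold (second alternative in claim 4)

iteration = 0
done = False
# Keep one style image fixed while the content images are extracted in turn;
# once the content library is exhausted, replace it with the next style image.
for style_img in itertools.cycle(style_library):
    for content_img in content_library:
        loss = train_step(style_img, content_img)
        iteration += 1
        if iteration >= max_iterations or loss < loss_threshold:
            done = True
            break
    if done:
        break
```

The third alternative in claim 4, a visual effect parameter of the generated image reaching a preset value, would replace or supplement the loss check inside the inner loop.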
CN201710555959.5A 2017-06-30 2017-06-30 Network training method and device, computing equipment and computer storage medium Active CN107392316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710555959.5A CN107392316B (en) 2017-06-30 2017-06-30 Network training method and device, computing equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710555959.5A CN107392316B (en) 2017-06-30 2017-06-30 Network training method and device, computing equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN107392316A CN107392316A (en) 2017-11-24
CN107392316B true CN107392316B (en) 2021-05-18

Family

ID=60335436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710555959.5A Active CN107392316B (en) 2017-06-30 2017-06-30 Network training method and device, computing equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN107392316B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038818A * 2017-12-06 2018-05-15 电子科技大学 A generative adversarial network image style transfer method based on multi-cycle consistency
CN109949255B (en) * 2017-12-20 2023-07-28 华为技术有限公司 Image reconstruction method and device
CN108537776A * 2018-03-12 2018-09-14 维沃移动通信有限公司 An image style transfer model generation method and mobile terminal
CN108733439A (en) * 2018-03-26 2018-11-02 西安万像电子科技有限公司 Image processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096358A (en) * 2015-08-05 2015-11-25 云南大学 Line enhanced simulation method for pyrography artistic effect
CN106847294A (en) * 2017-01-17 2017-06-13 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device based on artificial intelligence
CN106886975A * 2016-11-29 2017-06-23 华南理工大学 An image stylization method capable of running in real time

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096358A (en) * 2015-08-05 2015-11-25 云南大学 Line enhanced simulation method for pyrography artistic effect
CN106886975A * 2016-11-29 2017-06-23 华南理工大学 An image stylization method capable of running in real time
CN106847294A (en) * 2017-01-17 2017-06-13 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device based on artificial intelligence

Also Published As

Publication number Publication date
CN107392316A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN107392842B (en) Image stylization processing method and device, computing equipment and computer storage medium
CN107277391B (en) Image conversion network processing method, server, computing device and storage medium
CN107277615B (en) Live broadcast stylization processing method and device, computing device and storage medium
CN107516290B (en) Image conversion network acquisition method and device, computing equipment and storage medium
CN107392316B (en) Network training method and device, computing equipment and computer storage medium
CN108875523B (en) Human body joint point detection method, device, system and storage medium
CN107730514B (en) Scene segmentation network training method and device, computing equipment and storage medium
CN111488149B (en) Canvas element-based table rendering method and device and computer equipment
CN110555795A (en) High resolution style migration
CN109522902B (en) Extraction of space-time feature representations
CN108010538B (en) Audio data processing method and device and computing equipment
CN110610154A (en) Behavior recognition method and apparatus, computer device, and storage medium
CN109859113B (en) Model generation method, image enhancement method, device and computer-readable storage medium
CN106327188B (en) Method and device for binding bank card in payment application
CN111160288A (en) Gesture key point detection method and device, computer equipment and storage medium
CN107644423B (en) Scene segmentation-based video data real-time processing method and device and computing equipment
CN108096833B (en) Motion sensing game control method and device based on cascade neural network and computing equipment
CN109543139A (en) Convolution algorithm method, apparatus, computer equipment and computer readable storage medium
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN107808394B (en) Image processing method based on convolutional neural network and mobile terminal
CN107577943B (en) Sample prediction method and device based on machine learning and server
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
CN106126670B (en) Operation data sorting processing method and device
KR102239588B1 (en) Image processing method and apparatus
CN107622498B (en) Image crossing processing method and device based on scene segmentation and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant