WO2018161775A1 - Training method, apparatus, and storage medium for a neural network model for image processing

Training method, apparatus, and storage medium for a neural network model for image processing

Info

Publication number
WO2018161775A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
network model
intermediate image
loss
Prior art date
Application number
PCT/CN2018/075958
Other languages
English (en)
French (fr)
Inventor
黄浩智
王浩
罗文寒
马林
杨鹏
姜文浩
朱晓龙
刘威
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to KR1020197021770A (KR102281017B1)
Priority to JP2019524446A (JP6755395B2)
Priority to EP18764177.4A (EP3540637B1)
Publication of WO2018161775A1
Priority to US16/373,034 (US10970600B2)
Priority to US17/187,473 (US11610082B2)

Classifications

    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns characterised by the process organisation or structure, e.g. boosting cascade
    • G06T5/00 Image enhancement or restoration
    • G06V10/30 Noise filtering
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G06V10/764 Arrangements for image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V10/993 Evaluation of the quality of the acquired pattern
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the present application relates to the field of computer technology, and in particular, to a training method, apparatus, and storage medium for a neural network model for image processing.
  • In image processing, a neural network model is usually used to transform features of an image, for example to perform image color feature conversion, image lighting and shadow feature conversion, or image style feature conversion.
  • a neural network model for image processing needs to be trained.
  • the embodiment of the present application provides a training method for a neural network model for image processing, which is applied to an electronic device, and the method includes:
  • Embodiments of the present application propose a training apparatus for a neural network model for image processing, the apparatus comprising a processor and a memory connected to the processor, wherein the memory stores machine-readable instruction modules executable by the processor, the machine-readable instruction modules comprising:
  • An input obtaining module configured to acquire a plurality of temporally adjacent video frames;
  • An output obtaining module configured to separately process the plurality of video frames by using a neural network model to output corresponding intermediate images;
  • a loss acquisition module configured to acquire optical flow information of the change from an earlier video frame to a later video frame among the plurality of temporally adjacent video frames, obtain a changed image by changing the intermediate image corresponding to the earlier video frame according to the optical flow information, obtain a time loss between the intermediate image corresponding to the later video frame and the changed image, and acquire a feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and a target feature image; and
  • a model adjustment module configured to adjust the neural network model according to the time loss and the feature loss, and return to the step of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies a training end condition.
  • the embodiment of the present application further provides a non-transitory computer readable storage medium, wherein the storage medium stores machine readable instructions, which are executable by a processor to perform the following operations:
  • 1A is a schematic diagram of an implementation environment of a training method for a neural network model for image processing according to an embodiment of the present application
  • 1B is a schematic diagram showing an internal structure of an electronic device for implementing a training method for a neural network model for image processing according to an embodiment of the present application;
  • FIG. 2 is a schematic flow chart of a training method for a neural network model for image processing according to an embodiment of the present application
  • FIG. 3 is a schematic flow chart of a training method for a neural network model for image processing according to another embodiment of the present application.
  • FIG. 4 is a training architecture diagram of a neural network model for image processing in an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application
  • FIG. 6 is another structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application.
  • A neural network model for image processing trained by a traditional neural network model training method does not take the time consistency between video frames into account, and therefore introduces a large amount of flicker noise when performing feature conversion on a video, resulting in a poor video feature conversion effect.
  • the embodiment of the present application provides a training method, device, and storage medium for a neural network model for image processing.
  • time loss and feature loss are used as a feedback adjustment basis.
  • the neural network model is adjusted to train a neural network model that can be used for image processing.
  • When determining the time loss, the intermediate image corresponding to the earlier video frame is changed according to the optical flow information of the change from the earlier video frame to the later video frame, and the result is taken as the expected intermediate image corresponding to the later video frame, from which the time loss is obtained.
  • This time loss reflects the loss in time consistency between the respective intermediate images of temporally adjacent video frames.
  • In this way, the trained neural network model takes the time consistency between the video frames of a video into account when performing feature conversion on the video, which greatly reduces the flicker noise introduced during the feature conversion process and thereby improves the conversion effect of feature conversion on the video.
  • Moreover, combining the neural network model computation with the processing capability of the electronic device's processor to process the video images improves the computing speed without sacrificing the video image feature conversion effect, thereby producing a better neural network model for image processing.
  • FIG. 1A is a schematic diagram of an implementation environment of a training method for a neural network model for image processing according to an embodiment of the present application.
  • The electronic device 1 is integrated with a training apparatus 11 for a neural network model for image processing.
  • the electronic device 1 and the user terminal 2 are connected by a network 3, and the network 3 may be a wired network or a wireless network.
  • FIG. 1B is a schematic diagram showing the internal structure of an electronic device for implementing a training method for a neural network model for image processing according to an embodiment of the present application.
  • the electronic device includes a processor 102, a non-volatile storage medium 103, and an internal memory 104 connected by a system bus 101.
  • The non-volatile storage medium 103 of the electronic device stores an operating system 1031 and further stores a training apparatus 1032 for a neural network model for image processing, and the training apparatus 1032 is used to implement a training method for a neural network model for image processing.
  • the processor 102 of the electronic device is used to provide computing and control capabilities to support the operation of the entire electronic device.
  • the internal memory 104 in the electronic device provides an environment for the operation of the training device for the neural network model of image processing in the non-volatile storage medium 103.
  • the internal memory 104 can store computer readable instructions that, when executed by the processor 102, can cause the processor 102 to perform a training method for a neural network model of image processing.
  • the electronic device can be a terminal or a server.
  • the terminal may be a personal computer or a mobile electronic device including at least one of a mobile phone, a tablet, a personal digital assistant, or a wearable device.
  • the server can be implemented as a stand-alone server or a server cluster consisting of multiple physical servers. A person skilled in the art can understand that the structure shown in FIG.
  • 1B is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the electronic device to which the solution of the present application is applied.
  • the specific electronic device may be More or less components than those shown in Figure IB are included, or some components are combined, or have different component arrangements.
  • FIG. 2 is a schematic flow chart of a training method of a neural network model for image processing in an embodiment of the present application. This embodiment is mainly illustrated by the method applied to the electronic device in Fig. 1B described above.
  • the training method for the neural network model for image processing specifically includes the following steps:
  • Step S202 acquiring a plurality of temporally adjacent video frames.
  • video refers to data that can be divided into a sequence of still images arranged in chronological order.
  • a still image obtained by dividing a video can be used as a video frame.
  • Temporally adjacent video frames refer to video frames that are adjacent to each other when the frames of a video are arranged in time order.
  • the acquired temporally adjacent video frames may specifically be two or more than two temporally adjacent video frames. For example, if the video frames arranged in time series are p1, p2, p3, p4, ..., p1 and p2 are temporally adjacent video frames, and p1, p2, and p3 are also temporally adjacent video frames.
  • a set of training samples is disposed in the electronic device, and a plurality of sets of temporally adjacent video frames are stored in the training sample set, and the electronic device may acquire any set of temporally adjacent video frames from the training sample set.
  • The temporally adjacent video frames in the training sample set may be obtained by the electronic device by splitting a video acquired from the Internet, or by splitting a video recorded by an imaging device included in the electronic device.
  • a plurality of training sample sets may be set in the electronic device, and each training sample set is set with a corresponding training sample set identifier.
  • the user can access the training sample set through the electronic device and select a training sample set for training through the electronic device.
  • the electronic device may detect a selection instruction that is triggered by the user and carries the identifier of the training sample set, and the electronic device extracts the training sample set identifier in the selection instruction, and acquires the temporally adjacent video frame from the training sample set corresponding to the training sample set identifier.
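  • The following is a minimal sketch of how an electronic device might split a video into groups of temporally adjacent video frames for a training sample set, using OpenCV. The function name extract_adjacent_frames and the group_size parameter are illustrative and not part of this application.

```python
import cv2

def extract_adjacent_frames(video_path, group_size=2):
    """Split a video into still frames and yield groups of temporally adjacent frames."""
    cap = cv2.VideoCapture(video_path)
    window = []
    while True:
        ok, frame = cap.read()          # frames arrive in chronological order
        if not ok:
            break
        window.append(frame)
        if len(window) == group_size:
            yield list(window)          # e.g. (p1, p2), then (p2, p3), ...
            window.pop(0)               # slide the window forward by one frame
    cap.release()

# Example: iterate over pairs of temporally adjacent frames from a training video.
# for x_prev, x_cur in extract_adjacent_frames("training_video.mp4"):
#     pass
```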
  • Step S204 The plurality of video frames are respectively processed by the neural network model to output the corresponding intermediate image.
  • the neural network model refers to a complex network model formed by multiple layers connected to each other.
  • the electronic device can train a neural network model, and the neural network model obtained after the training is completed can be used for image processing.
  • Specifically, the neural network model may include multiple feature conversion layers. Each feature conversion layer has corresponding nonlinear change operators, and each layer may have more than one nonlinear change operator. Each nonlinear change operator in a feature conversion layer applies a nonlinear change to the input image to obtain a feature map as the operation result.
  • Each feature conversion layer receives the operation result of the previous layer, and outputs the operation result of the layer to the next layer after its own operation.
  • After acquiring the temporally adjacent video frames, the electronic device inputs them into the neural network model, where they pass sequentially through the feature conversion layers of the neural network model. On each feature conversion layer, the electronic device uses the nonlinear change operators corresponding to that layer to nonlinearly change the pixel values of the pixels included in the feature map output by the previous layer, and outputs the feature map of the current feature conversion layer. If the current feature conversion layer is the first feature conversion layer, the feature map output by the previous layer is the input video frame itself.
  • the pixel value corresponding to the pixel may specifically be an RGB (Red Green Blue) three-channel color value of the pixel.
  • the neural network model to be trained may specifically include three convolution layers, five residual modules, two deconvolution layers, and one convolution layer.
  • After a video frame is input into the neural network model, it first passes through the convolution layers. Each convolution kernel of a convolution layer performs a convolution operation on the pixel value matrix corresponding to its input, yielding a corresponding pixel value matrix, that is, a feature map, so that a convolution layer produces as many feature maps as it has convolution kernels. The pixel values at corresponding pixel positions of the feature maps are then further computed according to the bias term corresponding to each feature map, and a feature map is synthesized as the intermediate image of the output.
  • the electronic device can be set to perform a downsampling operation after a convolution operation of one of the convolution layers.
  • The method of downsampling may specifically be mean sampling or extremum (for example, maximum) sampling.
  • the resolution of the feature map obtained after the downsampling operation is reduced to 1/4 of the resolution of the input video frame.
  • Correspondingly, the electronic device needs to set an upsampling operation corresponding to the earlier downsampling operation after the deconvolution operation of the deconvolution layer, so that the resolution of the feature map obtained after the upsampling operation is increased to 4 times the resolution of the feature map before the upsampling operation, ensuring that the output intermediate image has the same resolution as the input video frame.
  • In this embodiment, the number of layers included in the neural network model and the types of the layers can be customized or adjusted according to subsequent training results, provided that the resolution of the image input to the neural network model is consistent with the resolution of the image output by the neural network model.
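  • A minimal PyTorch sketch of a network with this shape (three convolution layers, five residual modules, two deconvolution layers, and one final convolution layer) is shown below. The channel counts, kernel sizes, and strides are assumptions chosen so that the feature map is downsampled and then upsampled back, keeping the output resolution equal to the input resolution as required above.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.block(x)   # residual connection keeps resolution unchanged

class TransformNet(nn.Module):
    """Three conv layers, five residual modules, two deconv layers, one final conv layer."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 9, stride=1, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),    # downsample
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # downsample
            *[ResidualBlock(128) for _ in range(5)],
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 9, stride=1, padding=4))

    def forward(self, x):
        return self.model(x)       # output resolution matches the input resolution
```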
  • Step S206: acquire the optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames.
  • Optical flow can represent the moving speed of gray-scale patterns in an image. All the optical flows in an image, arranged by spatial position, constitute an optical flow field.
  • the optical flow field characterizes the variation of pixel points in the image and can be used to determine the motion information of corresponding pixels between images.
  • The earlier video frame refers to the video frame with the earlier time stamp among the temporally adjacent video frames; the later video frame refers to the video frame with the later time stamp among the temporally adjacent video frames.
  • For example, if temporally adjacent video frames are sequentially arranged as x1, x2, and x3, then x1 is an earlier video frame with respect to x2 and x3, while x2 is a later video frame with respect to x1 and an earlier video frame with respect to x3.
  • The optical flow information of the change from the earlier video frame to the later video frame may be represented by the optical flow field between the earlier video frame and the later video frame.
  • The method for calculating the optical flow information may specifically be a differential optical flow algorithm derived from the optical flow constraint equation, a region-matching-based optical flow algorithm, an energy-based optical flow algorithm, a phase-based optical flow algorithm, a neurodynamic optical flow algorithm, or the like; this is not specifically limited in the embodiments of the present application.
  • Specifically, the electronic device may calculate, according to the method for calculating the optical flow information, the optical flow information of the change from the earlier video frame to the later video frame, and obtain the optical flow corresponding to each pixel in the earlier video frame. The electronic device can also select feature points from the earlier video frame and use a sparse optical flow calculation method to calculate the optical flow corresponding to the selected feature points.
  • For example, if the position of pixel A in the earlier video frame is (x1, y1) and the position of pixel A in the later video frame is (x2, y2), then the velocity vector of pixel A is (x2 - x1, y2 - y1). The vector field formed by the velocity vectors of the corresponding pixels in the earlier video frame constitutes the optical flow field of the change from the earlier video frame to the later video frame.
  • In this embodiment, the electronic device may calculate the optical flow information between two adjacent video frames among the temporally adjacent video frames, and may also calculate the optical flow information between two non-adjacent video frames among the temporally adjacent video frames.
  • For example, if the temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, the electronic device can calculate the optical flow information between x1 and x2 and the optical flow information between x2 and x3, and can also calculate the optical flow information between x1 and x3.
  • In this embodiment, when calculating the optical flow information of the change from the earlier video frame to the later video frame, the electronic device may also determine the confidence of the calculated optical flow information. The confidence is in one-to-one correspondence with the optical flow information and is used to indicate the reliability of the corresponding optical flow information: the higher the confidence, the more accurately the calculated optical flow information represents the motion information of the pixels in the image.
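  • The sketch below computes dense optical flow between an earlier frame and a later frame with OpenCV's Farneback algorithm. The Farneback method is only one possible differential optical flow algorithm; the parameter values are illustrative defaults, and confidence estimation (for example by forward-backward consistency checking) is not shown.

```python
import cv2

def dense_flow(earlier_bgr, later_bgr):
    """Dense optical flow of the change from the earlier frame to the later frame.

    Returns an H x W x 2 array; flow[y, x] = (dx, dy) means the pixel at (x, y) in the
    earlier frame moves to roughly (x + dx, y + dy) in the later frame.
    """
    prev_gray = cv2.cvtColor(earlier_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(later_bgr, cv2.COLOR_BGR2GRAY)
    # Positional arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```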
  • Step S208: acquire the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information.
  • Specifically, the electronic device may move the pixels included in the intermediate image corresponding to the earlier video frame according to the optical flow information of the change from the earlier video frame to the later video frame, and obtain the image formed by the changed pixels, that is, the expected pixel value distribution of the intermediate image corresponding to the later video frame.
  • In this embodiment, the electronic device may use the optical flow information between two adjacent video frames among the temporally adjacent video frames to change the intermediate image corresponding to the earlier of the two adjacent frames, thereby obtaining, according to the optical flow information, the intermediate image expected for the later of the two adjacent frames.
  • For example, suppose temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images output by the neural network model for x1, x2, and x3 are, in order, y1, y2, and y3. Suppose the optical flow information of the change from x1 to x2 is g1, and the optical flow information of the change from x2 to x3 is g2. The electronic device can change y1 according to g1 to obtain z2, and change z2 according to g2 to obtain z3, where z2 is the expected intermediate image corresponding to x2 and z3 is the expected intermediate image corresponding to x3.
  • In this embodiment, the electronic device may also use the optical flow information between two non-adjacent video frames among the temporally adjacent video frames, together with the intermediate image corresponding to the earlier of those two frames, to obtain, according to the optical flow information, the intermediate image expected for the later of the two non-adjacent frames.
  • temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images of x1, x2, and x3 output by the neural network model are sequentially sorted into y1, y2, and y3.
  • If the optical flow information of the change from x1 to x3 is g3, the electronic device can change y1 according to g3 to obtain z3, where z3 is the expected intermediate image corresponding to x3.
  • In this embodiment, when changing the pixels included in the intermediate image corresponding to the earlier video frame according to the corresponding optical flow information, the electronic device may also use the confidence of the optical flow information as a weight to obtain the image formed by the changed pixels.
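  • The following sketch changes (warps) the intermediate image corresponding to the earlier frame according to the optical flow, optionally weighting the changed pixels by the flow confidence. It approximates the warp by sampling the earlier image with the forward flow evaluated at each destination pixel; the function name warp_by_flow is illustrative.

```python
import cv2
import numpy as np

def warp_by_flow(earlier_intermediate, flow, confidence=None):
    """Warp the earlier frame's intermediate image toward the later frame using optical flow.

    earlier_intermediate: H x W x C image; flow: H x W x 2 array, e.g. from dense_flow();
    confidence: optional H x W array in [0, 1] used as a per-pixel weight.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Each destination pixel samples the source position it (approximately) came from.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    warped = cv2.remap(earlier_intermediate, map_x, map_y, cv2.INTER_LINEAR)
    if confidence is not None:
        warped = warped * confidence[..., None]   # use the flow confidence as a weight
    return warped
```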
  • Step S210: acquire the time loss between the intermediate image corresponding to the later video frame and the changed image acquired in step S208.
  • The time loss can be used to represent the difference, in the time domain, between the images obtained by processing the temporally adjacent video frames with the neural network model, relative to the change of the temporally adjacent video frames themselves in the time domain.
  • Specifically, the electronic device may compare the intermediate image corresponding to the later video frame with the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information of the change from the earlier frame to the later frame, obtain the difference between the two, and determine, according to that difference, the time loss between the intermediate image corresponding to the later video frame and the changed image.
  • For example, suppose the earlier video frame is x_{t-1}, the later video frame is x_t, and the optical flow information of the change from x_{t-1} to x_t is G_t. The intermediate image output by the neural network model for x_{t-1} is y_{t-1}, and the intermediate image output for x_t is y_t. The electronic device can change y_{t-1} according to the optical flow information G_t to obtain z_t, where z_t can be regarded as the image that the neural network model is expected to output after processing the video frame x_t. The electronic device may then compare the difference between z_t and y_t to obtain the time loss between y_t and z_t.
  • temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images corresponding to x1, x2, and x3 output by the neural network model are sequentially sorted into y1, y2, and y3.
  • the optical flow information in which x1 changes to x2 is g1
  • the optical flow information in which x2 changes to x3 is g2
  • the optical flow information in which x1 changes to x3 is g3.
  • The electronic device can change y1 according to g1 to obtain z2, change z2 according to g2 to obtain z3, and change y1 according to g3 to obtain z'3, where z2 is the expected intermediate image corresponding to x2 and both z3 and z'3 are expected intermediate images corresponding to x3. The electronic device can compare the difference between y2 and z2 to obtain the time loss between y2 and z2; it can also compare the difference between y3 and z3 and the difference between y3 and z'3, and obtain, according to the weights of z3 and z'3, the time loss between y3 and both z3 and z'3.
  • Step S212: acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image.
  • The image feature to which the neural network model is to convert images when used for feature conversion is the image feature corresponding to the target feature image.
  • the feature loss is the difference between the image feature corresponding to the intermediate image output by the neural network model and the image feature corresponding to the target feature image.
  • the image features may specifically be image color features, image light features or image style features, and the like.
  • the target feature image may be a target color feature image, a target light shadow feature image or a target style feature image, and the like; the feature loss of the intermediate image and the target feature image may specifically be a color feature loss, a light feature loss or a style feature loss.
  • the electronic device may first determine an image feature to be trained, and acquire an image conforming to the image feature as the target feature image.
  • The electronic device can further extract the image features corresponding to the target feature image by using a trained neural network model for extracting image features, compare the image features corresponding to the intermediate image with the image features corresponding to the target feature image to obtain the difference between the two, and determine the feature loss between the intermediate image and the target feature image based on that difference.
  • For example, suppose the neural network model is used to transform the image style features of images, the target style feature image is S, and the temporally adjacent video frames number two frames, the earlier video frame being x_{t-1} and the later video frame being x_t. The intermediate image output by the neural network model for x_{t-1} is y_{t-1}, and the intermediate image output for x_t is y_t. The electronic device can compare the difference between y_{t-1} and S and the difference between y_t and S, respectively, to obtain the style feature loss between y_{t-1} and S and the style feature loss between y_t and S.
  • Step S214 adjusting the neural network model according to the time loss and the feature loss, and returning to the step S202 of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
  • the process of training the neural network model is a process of determining a nonlinear change operator corresponding to each feature conversion layer in the neural network model to be trained.
  • Specifically, the electronic device may first initialize the nonlinear change operator corresponding to each feature conversion layer in the neural network model to be trained, continuously optimize these initialized nonlinear change operators in the subsequent training process, and take the optimized nonlinear change operators as the nonlinear change operators of the trained neural network model.
  • Specifically, the electronic device may construct a time domain loss function according to the time loss, construct a spatial domain loss function according to the feature loss, and combine the time domain loss function with the spatial domain loss function to obtain a mixed loss function. The electronic device may then calculate the rate of change of the mixed loss function with respect to the nonlinear change operator corresponding to each feature conversion layer in the neural network model, and adjust the nonlinear change operator corresponding to each feature conversion layer according to the calculated rates of change so that the rates of change become smaller, thereby optimizing the neural network model during training.
  • the training end condition may be that the number of trainings of the neural network model reaches a preset number of trainings.
  • the electronic device can count the number of trainings when training the neural network model. When the counting reaches the preset training number, the electronic device can determine that the neural network model satisfies the training end condition and end training of the neural network model.
  • the training end condition may also be that the mixing loss function satisfies the convergence condition.
  • Specifically, the rate of change of the mixed loss function calculated after each round of training with respect to the nonlinear change operator of each feature conversion layer in the neural network model is recorded; when the calculated rate of change gradually approaches a certain value, the electronic device can determine that the neural network model satisfies the training end condition and end the training of the neural network model.
  • When the neural network model is trained by the above training method for a neural network model for image processing, the time loss and the feature loss are jointly used as the feedback adjustment basis for adjusting the neural network model, so as to train a neural network model that can be used for image processing.
  • When determining the time loss, the intermediate image corresponding to the earlier video frame is changed according to the optical flow information of the change from the earlier video frame to the later video frame, and the result is taken as the expected intermediate image corresponding to the later video frame, from which the time loss is obtained. This time loss reflects the loss in time consistency between the respective intermediate images of temporally adjacent video frames.
  • In this way, the trained neural network model takes the time consistency between the video frames of a video into account when performing feature conversion on the video, which greatly reduces the flicker noise introduced during the feature conversion process and thereby improves the conversion effect of feature conversion on the video.
  • Moreover, combining the neural network model computation with the processing capability of the electronic device's processor to process the video images improves the computing speed without sacrificing the video image feature conversion effect, thereby producing a better neural network model for image processing.
  • In one embodiment, adjusting the neural network model according to the time loss and the feature loss in the training method for a neural network model for image processing specifically includes: acquiring the content loss between the intermediate image and the input video frame corresponding to the intermediate image; generating a training cost based on the time loss, the feature loss, and the content loss; and adjusting the neural network model according to the training cost.
  • the content loss refers to the difference in image content between the intermediate image output through the neural network model and the corresponding input video frame.
  • Specifically, the electronic device may use a trained neural network model for extracting image content features to respectively extract the image content features corresponding to the intermediate image and the image content features corresponding to the input video frame corresponding to the intermediate image, compare the image content features corresponding to the intermediate image with the image content features corresponding to the corresponding input video frame to obtain the difference between the two, and determine the content loss between the intermediate image and the corresponding video frame according to that difference.
  • Specifically, the electronic device may construct a time domain loss function according to the time loss, jointly construct a spatial domain loss function according to the feature loss and the content loss, and generate a training cost that is positively correlated with both the time domain loss function and the spatial domain loss function.
  • The electronic device can then calculate the rate of change of the training cost with respect to the nonlinear change operator corresponding to each feature conversion layer in the neural network model, and adjust the nonlinear change operator corresponding to each feature conversion layer according to the calculated rates of change so that the rates of change become smaller, thereby optimizing the neural network model during training.
  • In one embodiment, the electronic device may further perform denoising processing on the intermediate image output by the neural network model. Specifically, based on a denoising algorithm that implements Total Variation (TV) minimization, the electronic device may determine a total variation minimization term for denoising the edge pixels of the intermediate image, and combine the total variation minimization term with the feature loss and the content loss to jointly construct the spatial domain loss function used for neural network model training. Denoising the image by means of the total variation minimization term improves the conversion effect of the neural network model when performing feature conversion on video.
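  • A minimal PyTorch sketch of a total variation minimization term is shown below. Whether an absolute-difference or squared-difference form is intended here is not specified; the L1 form below is an assumption.

```python
import torch

def total_variation(img):
    """Total variation term for an (N, C, H, W) image tensor output by the neural network model.

    Penalizing differences between neighboring pixels suppresses noise, especially
    around edge pixels of the intermediate image.
    """
    diff_h = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum()
    diff_w = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum()
    return diff_h + diff_w
```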
  • In this embodiment, when the neural network model is trained, the time loss, the feature loss, and the content loss are jointly used as the feedback adjustment basis for adjusting the neural network model, so that a neural network model usable for image processing is trained in the three dimensions of time, content, and feature. This ensures the accuracy of image feature conversion and improves the conversion effect of the trained neural network model when performing feature conversion on video.
  • In one embodiment, step S210 specifically includes: subtracting the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determining, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
  • Specifically, the difference distribution map obtained by the electronic device by subtracting the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image may be a pixel value difference matrix.
  • The electronic device can perform a dimensionality reduction operation on the difference distribution map to obtain a time loss value. After the electronic device selects a dimensionality reduction method for the first calculation of the time loss, subsequent time loss calculations use the same dimensionality reduction method.
  • In this embodiment, the time loss between the intermediate image corresponding to the later video frame and the changed image is computed from the differences between the pixel values at corresponding pixel positions of the two images, which makes the calculation of the time loss more accurate.
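  • A sketch of the time loss computed as described above is shown below: the two images are subtracted at corresponding pixel positions to form the difference distribution map, which is then reduced to a single value. Mean squared error is used here as one possible dimensionality reduction, and the optional confidence weighting reflects the variant described earlier.

```python
import torch

def time_loss(later_intermediate, changed_image, confidence=None):
    """Time loss between the later frame's intermediate image and the changed (warped) image.

    later_intermediate, changed_image: tensors of identical shape.
    confidence: optional tensor broadcastable to the same shape, weighting each pixel.
    """
    diff = later_intermediate - changed_image      # difference distribution map
    if confidence is not None:
        diff = diff * confidence
    return (diff ** 2).mean()                      # dimensionality reduction to a scalar loss
```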
  • In one embodiment, the step of acquiring the content loss between the intermediate image and the input video frame corresponding to the intermediate image in the training method for a neural network model for image processing comprises: inputting the video frame and the corresponding intermediate image into an evaluation network model; acquiring the feature map corresponding to the video frame and the feature map corresponding to the intermediate image that are output by a layer included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the corresponding video frame.
  • the evaluation network model is used to extract image features of the input image.
  • the evaluation network model may specifically be an Alexnet network model, a VGG (Visual Geometry Group) network model, or a GoogLeNet network.
  • the layer included in the evaluation network model corresponds to a plurality of feature extraction factors, and each feature extraction factor extracts different features.
  • A feature map is the image processing result obtained by processing the input image with a change operator of a layer in the evaluation network model; this result is an image feature matrix composed of the response values produced when the change operator processes the input image matrix.
  • the evaluation network model can obtain a matrix of pixel values corresponding to the input video frame and a matrix of pixel values corresponding to the corresponding intermediate image.
  • the layer included in the evaluation network model operates on the pixel value matrix corresponding to the input video frame or the intermediate image according to the feature extraction factor corresponding to the layer, and obtains a corresponding response value to form a feature map.
  • the characteristics of different layers extracted in the evaluation network model are different.
  • Specifically, the electronic device may set in advance the feature map output by the layer of the evaluation network model that extracts image content features as the feature map used for content loss calculation.
  • the layer for extracting image content features in the evaluation network model may be one layer or multiple layers.
  • After acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the input video frame corresponding to the intermediate image, the electronic device subtracts the values at corresponding pixel positions of the two feature maps to obtain a content difference matrix between them, and performs a dimensionality reduction operation on the content difference matrix to obtain the content loss.
  • In this embodiment, the image content features of the video frame before feature conversion and of the intermediate image after feature conversion are extracted by the evaluation network model, and the content loss between the intermediate image and the corresponding input video frame is calculated from the feature maps that reflect the image content features, which makes the calculation of the content loss more accurate.
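  • The sketch below uses a pre-trained VGG-19 from torchvision as the evaluation network model and computes the content loss from the feature maps of one layer, following the procedure above. The use of torchvision, the weights argument (which assumes a recent torchvision version), and the layer index 21 (a fourth-block convolution commonly used for content features) are illustrative assumptions.

```python
import torch
import torchvision.models as models

# Evaluation network model: a pre-trained VGG-19 used only for feature extraction.
_vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def feature_map(x, layer_idx):
    """Return the feature map output by one layer of the evaluation network model."""
    for i, layer in enumerate(_vgg):
        x = layer(x)
        if i == layer_idx:
            return x
    raise IndexError("layer index beyond network depth")

def content_loss(video_frame, intermediate_image, layer_idx=21):
    """Content loss between the input video frame and the intermediate image output by the model."""
    f_in = feature_map(video_frame, layer_idx)
    f_out = feature_map(intermediate_image, layer_idx)
    diff = f_out - f_in                      # content difference matrix
    return (diff ** 2).mean()                # dimensionality reduction to a scalar loss
```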
  • In one embodiment, step S212 specifically includes: inputting the intermediate image and the target feature image into the evaluation network model; acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image that are output by a layer included in the evaluation network model; and determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
  • the electronic device may previously set a feature map of the layer output of the extracted image feature in the evaluation network model as a feature map for performing feature loss calculation.
  • the layer for extracting image features in the evaluation network model may be one layer or multiple layers.
  • In this embodiment, the image features of the target feature image and of the feature-converted intermediate image are extracted by the evaluation network model, and the feature loss between them is calculated from the feature maps that reflect the image features, which makes the calculation of the feature loss more accurate.
  • In one embodiment, in the training method for a neural network model for image processing, the step of determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image specifically includes: determining a feature matrix corresponding to the intermediate image according to the feature map corresponding to the intermediate image; determining a feature matrix corresponding to the target feature image according to the feature map corresponding to the target feature image; subtracting the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determining the feature loss between the intermediate image and the target feature image according to the feature difference matrix.
  • For example, when the neural network model is used to perform image style feature conversion on images, the feature matrix corresponding to the intermediate image may specifically be a style feature matrix. The style feature matrix is a matrix that reflects image style features, and may specifically be a Gram matrix.
  • Specifically, the electronic device can take the inner product of the feature map corresponding to the intermediate image to obtain the corresponding Gram matrix as the style feature matrix corresponding to the intermediate image, and take the inner product of the feature map corresponding to the target style image to obtain the corresponding Gram matrix as the style feature matrix corresponding to the target style image.
  • The electronic device can further subtract the values at corresponding positions of the style feature matrix corresponding to the intermediate image and the style feature matrix corresponding to the target style image to obtain a style difference feature matrix, and then perform a dimensionality reduction operation on the style difference feature matrix to obtain the style feature loss.
  • the feature matrix that can reflect the image features is used to calculate the feature loss between the image obtained by the feature conversion and the target feature image, so that the calculation of the feature loss is more accurate.
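  • A sketch of the Gram matrix (style feature matrix) and the resulting style feature loss is shown below, following the inner-product and subtraction procedure described above. The mean-squared reduction of the style difference feature matrix is an assumption.

```python
import torch

def gram_matrix(feature_map):
    """Style feature matrix: inner products between the channels of a feature map."""
    n, c, h, w = feature_map.shape
    f = feature_map.view(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_feature_loss(intermediate_feat, target_feat):
    """Feature loss between an intermediate image and the target style image,
    computed from their style feature matrices."""
    diff = gram_matrix(intermediate_feat) - gram_matrix(target_feat)   # style difference feature matrix
    return (diff ** 2).mean()                                          # dimensionality reduction
```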
  • In one embodiment, the electronic device may select a VGG-19 network model as the evaluation network model; this network model includes 16 convolution layers and 5 pooling layers.
  • The features extracted by the fourth convolution layer of the model can reflect image content features, while the features extracted by the first, second, third, and fourth convolution layers of the model can reflect image style features.
  • The electronic device may acquire the feature map corresponding to the intermediate image and the feature map corresponding to the input video frame corresponding to the intermediate image, both output by the fourth convolution layer, and calculate the content loss between the intermediate image and the corresponding video frame based on the acquired feature maps.
  • The electronic device may acquire the feature maps corresponding to the intermediate image and the feature maps corresponding to the target feature image output by the first, second, third, and fourth convolution layers, and calculate the feature loss between the intermediate image and the target feature image based on the acquired feature maps.
  • In one embodiment, in the training method for a neural network model for image processing, adjusting the neural network model according to the training cost includes: determining, in reverse order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer; and adjusting, in reverse order, the nonlinear change operators corresponding to the layers included in the neural network model, so that the rate of change of the training cost with respect to the nonlinear change operator of each adjusted layer becomes smaller.
  • Specifically, starting from the last layer included in the neural network model, the electronic device may determine the rate of change of the training cost with respect to the nonlinear change operator corresponding to the current layer, and then determine, in reverse order, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer.
  • The electronic device can then adjust, in reverse order, the nonlinear change operators corresponding to the layers included in the neural network model, so that the rate of change of the training cost with respect to each adjusted nonlinear change operator decreases.
  • For example, suppose the training cost is L, the nonlinear change operator corresponding to the first layer in reverse order is z, the nonlinear change operator corresponding to the second layer in reverse order is b, and the nonlinear change operator corresponding to the third layer in reverse order is c. The rate of change of L with respect to z is ∂L/∂z, the rate of change of L with respect to b is ∂L/∂b, and the rate of change of L with respect to c is ∂L/∂c. When solving these rates of change, the chain rule conducts the gradient layer by layer to the preceding layer. The electronic device can sequentially adjust the nonlinear change operators z, b, c, and so on, up to the nonlinear change operator corresponding to the first layer included in the neural network model (that is, the last layer in reverse order), so that the calculated rates of change decrease.
  • The training cost may be specifically expressed as:
  • L_hybrid = L_spatial(x_i, y_i, s) + λ · L_temporal(y_t, y_{t-1})
  • where L_hybrid represents the training cost, L_spatial(x_i, y_i, s) represents the spatial domain loss function, L_temporal(y_t, y_{t-1}) represents the time domain loss function generated from the time loss, and λ is the weight corresponding to the time domain loss function.
  • The spatial domain loss function can be expressed as:
  • L_spatial(x_i, y_i, s) = α · L_content^l(x_i, y_i) + β · L_feature^l(y_i, s) + γ · R_tv
  • where l represents a layer used for extracting image features in the evaluation network model, L_content^l(x_i, y_i) represents the content loss between the image input to the neural network model and the image output by the neural network model, L_feature^l(y_i, s) represents the feature loss between the image output by the neural network model and the target feature image, R_tv represents the total variation minimization term, and α, β, and γ are the weights corresponding to the respective losses. For example, the value of α may be 1, the value of β may be 1, and the value of γ may be 10^4.
  • In this embodiment, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer of the neural network model is solved by the back propagation method, and the nonlinear change operator corresponding to each layer is adjusted so that the calculated rates of change decrease, thereby training the neural network model so that the trained neural network model achieves a better image conversion effect.
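  • The sketch below assembles the training cost L_hybrid from the pieces described above and takes one gradient step; the back propagation described in this section corresponds to loss.backward() in an autograd framework. The MSE-style reductions, the helper structure, and the value of the temporal weight lam (λ) are assumptions; alpha, beta, and gamma follow the example values given above.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(y_cur, z_cur, content_pairs, style_gram_pairs,
                alpha=1.0, beta=1.0, gamma=1e4, lam=1e4):
    """Training cost L_hybrid = L_spatial + lam * L_temporal (sketch).

    y_cur:            intermediate image for the later frame, shape (N, C, H, W).
    z_cur:            earlier frame's intermediate image warped by optical flow.
    content_pairs:    list of (output_feature_map, input_feature_map) from the evaluation network.
    style_gram_pairs: list of (output_gram, target_gram) style feature matrices.
    lam is an assumed weight for the time domain loss function.
    """
    l_content = sum(F.mse_loss(a, b) for a, b in content_pairs)
    l_feature = sum(F.mse_loss(a, b) for a, b in style_gram_pairs)
    r_tv = ((y_cur[:, :, 1:, :] - y_cur[:, :, :-1, :]).abs().sum()
            + (y_cur[:, :, :, 1:] - y_cur[:, :, :, :-1]).abs().sum())
    l_spatial = alpha * l_content + beta * l_feature + gamma * r_tv
    l_temporal = F.mse_loss(y_cur, z_cur)      # time loss between later output and warped earlier output
    return l_spatial + lam * l_temporal

# One training step: back propagation computes the rate of change of the training cost
# with respect to each layer's parameters, and the optimizer reduces it.
# loss = hybrid_loss(...)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```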
  • a training method for a neural network model for image processing specifically includes the following steps:
  • Step S302 acquiring a plurality of temporally adjacent video frames.
  • Step S304 the plurality of video frames are respectively processed by a neural network model to cause the neural network model to output a corresponding intermediate image.
  • Step S306: acquire the optical flow information of the change from the earlier video frame to the later video frame.
  • Step S308: acquire the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information.
  • Step S310: subtract the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determine, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
  • Step S312: input the intermediate image and the target feature image into the evaluation network model; acquire the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by a layer included in the evaluation network model; determine a feature matrix corresponding to the intermediate image according to the feature map corresponding to the intermediate image; determine a feature matrix corresponding to the target feature image according to the feature map corresponding to the target feature image; subtract the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determine the feature loss between the intermediate image and the target feature image according to the feature difference matrix.
  • Step S314, inputting the video frame and the intermediate image corresponding to the video frame into the evaluation network model; acquiring the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layers included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the corresponding video frame.
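A minimal sketch of the content loss in step S314, assuming the two feature maps come from the same layer of the evaluation network:

```python
import torch.nn.functional as F

def content_loss(feat_intermediate, feat_video_frame):
    """Content loss between feature maps of the network output and its input frame."""
    return F.mse_loss(feat_intermediate, feat_video_frame)
```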
  • Step S316 generating a training cost based on time loss, feature loss, and content loss.
  • Step S318, determining, in reverse order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer; and adjusting, in reverse order, the nonlinear change operators corresponding to the layers included in the neural network model so that the rate of change of the training cost with respect to each adjusted layer's nonlinear change operator decreases.
  • Step S320, determining whether the neural network model satisfies the training end condition; if it does, proceeding to step S322; if it does not, returning to step S302.
  • Step S322, ending the training of the neural network model.
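Putting steps S302 to S322 together, a highly simplified training loop could be organized as below; every helper passed in is an assumption standing in for the corresponding step, and the weighting of the individual losses is omitted for brevity.

```python
def train(model, optimizer, target_feature_image,
          get_adjacent_frames, flow_fn, warp_fn,
          temporal_fn, feature_fn, content_fn, num_iterations=10000):
    """Minimal sketch of the S302-S322 loop, not the procedure itself."""
    for _ in range(num_iterations):                         # S320/S322: iteration budget as end condition
        x_prev, x_next = get_adjacent_frames()              # S302
        y_prev, y_next = model(x_prev), model(x_next)       # S304
        flow = flow_fn(x_prev, x_next)                      # S306
        warped = warp_fn(y_prev, flow)                      # S308
        cost = (temporal_fn(y_next, warped)                 # S310: time loss
                + feature_fn(y_next, target_feature_image)  # S312: feature loss
                + content_fn(y_next, x_next))                # S314-S316: content loss, training cost
        optimizer.zero_grad()
        cost.backward()                                      # S318: reverse-order rates of change
        optimizer.step()
```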
  • the time loss, the feature loss and the content loss are used together as the feedback adjustment basis to adjust the neural network model, so the neural network model is trained in the three dimensions of time, feature and content, which improves the training effect of the neural network model.
  • FIG. 4 is a diagram showing a training architecture of a neural network model for image processing in one embodiment of the present application.
  • in this embodiment the neural network model consists of three convolution layers, five residual modules, two deconvolution layers and one convolution layer; the electronic device may input the earlier video frame x_{t-1} and the later video frame x_t into the neural network model respectively, and obtain the intermediate images y_{t-1} and y_t output by the neural network model.
  • the electronic device can obtain the time domain loss function for y_{t-1} and y_t according to the optical flow information between x_{t-1} and x_t; it then inputs x_{t-1}, x_t, y_{t-1}, y_t and the target feature image S into the evaluation network model, and obtains, from the feature maps output by the layers included in the evaluation network model, the content losses between x_{t-1} and y_{t-1} and between x_t and y_t, and the feature losses between y_{t-1} and S and between y_t and S, thereby obtaining the spatial domain loss function.
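A network with the shape described for FIG. 4 (three convolution layers, five residual modules, two deconvolution layers and one convolution layer) could be sketched in PyTorch as follows; channel widths, kernel sizes and strides are illustrative assumptions, chosen only so that the output resolution matches the input resolution.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)            # residual connection

class TransformNet(nn.Module):
    """Illustrative transformation network: 3 conv + 5 residual + 2 deconv + 1 conv."""
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, 9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),    # downsample
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # downsample
            *[ResidualBlock(128) for _ in range(5)],
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 9, padding=4))   # output resolution equals input resolution

    def forward(self, x):
        return self.model(x)
```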
  • the neural network model may be used for video feature conversion.
  • the electronic device may divide the video that needs feature conversion into temporally adjacent video frames, input the divided video frames into the trained neural network model in sequence, obtain the feature-converted output image corresponding to each video frame after processing by the neural network model, and then combine the output images according to the time order of the corresponding input video frames to obtain the feature-converted video.
  • the neural network model can perform feature conversion on multiple frames of video frames at the same time.
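As an illustration of this deployment, the sketch below splits a video into frames, runs an assumed stylize_frame callable (standing in for the trained neural network model) on each frame, and reassembles the outputs in their original order using OpenCV; for simplicity it processes one frame at a time.

```python
import cv2

def convert_video(src_path, dst_path, stylize_frame):
    """Frame-by-frame feature conversion of a video; stylize_frame must return
    a uint8 BGR image of a fixed size."""
    reader = cv2.VideoCapture(src_path)
    fps = reader.get(cv2.CAP_PROP_FPS)
    writer = None
    while True:
        ok, frame = reader.read()
        if not ok:
            break
        out = stylize_frame(frame)
        if writer is None:
            h, w = out.shape[:2]
            writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(out)                   # keep the original temporal order
    reader.release()
    if writer is not None:
        writer.release()
```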
  • a training apparatus 500 for a neural network model for image processing is provided, and the apparatus specifically includes: an input acquisition module 501, an output acquisition module 502, a loss acquisition module 503, and a model adjustment module 504.
  • the input obtaining module 501 is configured to acquire a plurality of temporally adjacent video frames.
  • the output obtaining module 502 is configured to process the plurality of video frames respectively through a neural network model to output the corresponding intermediate image.
  • the loss obtaining module 503 is configured to acquire the optical flow information from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquire the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information; acquire the time loss between the intermediate image corresponding to the later video frame and the changed image; and acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image.
  • the model adjustment module 504 is configured to adjust the neural network model according to the time loss and the feature loss, and return to the process of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
  • the model adjustment module 504 is further configured to acquire the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames; generate a training cost according to the time loss, the feature loss and the content loss; and adjust the neural network model according to the training cost.
  • when training the neural network model, the neural network model is adjusted by using the time loss, the feature loss and the content loss together as the feedback adjustment basis, so as to train a neural network model that can be used for image processing; the three dimensions of time, content and feature ensure the accuracy of image feature conversion and improve the conversion effect of the trained neural network model when performing feature conversion on video.
  • the model adjustment module 504 is further configured to input the intermediate image and the video frame corresponding to the intermediate image into the evaluation network model; acquire the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layers included in the evaluation network model; and determine the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the corresponding video frame.
  • the image content features of the video frame before feature conversion and of the intermediate image after feature conversion are extracted by the evaluation network model, and the output feature maps carrying the extracted image content features are used to calculate the content loss between the corresponding input images, making the calculation of the content loss more accurate.
  • the model adjustment module 504 is further configured to determine, in reverse order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer; and adjust, in reverse order, the nonlinear change operators corresponding to the layers included in the neural network model so that the rate of change of the training cost with respect to each adjusted layer's nonlinear change operator decreases.
  • the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer of the neural network model is solved by back propagation, and each layer's nonlinear change operator is adjusted so that the calculated rate of change decreases, thereby training the neural network model so that the trained model performs better at image conversion.
  • the loss obtaining module 503 is further configured to subtract the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determine, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
  • the time loss between the intermediate image corresponding to the later video frame and the changed image is calculated from the difference between the pixel values at corresponding pixel positions of that intermediate image and the changed image, making the calculation of the time loss more accurate.
  • the loss obtaining module 503 is further configured to input the intermediate image and the target feature image into the evaluation network model; acquire the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layers included in the evaluation network model; and determine the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
  • the image features of the target feature image and of the feature-converted intermediate image are extracted by the evaluation network model, and the output feature maps carrying the extracted image features are used to calculate the feature loss between the corresponding input images, making the calculation of the feature loss more accurate.
  • the loss obtaining module 503 is further configured to determine the feature matrix corresponding to the intermediate image according to the feature map corresponding to the intermediate image; determine the feature matrix corresponding to the target feature image according to the feature map corresponding to the target feature image; subtract the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determine the feature loss between the intermediate image and the target feature image according to the feature difference matrix.
  • the feature matrix that can reflect the image features is used to calculate the feature loss between the image obtained by the feature conversion and the target feature image, so that the calculation of the feature loss is more accurate.
  • FIG. 6 is another structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application.
  • the apparatus includes a processor 610, and a memory 630 coupled to the processor 610 via a bus 620.
  • a machine readable instruction module executable by the processor 610 is stored in the memory 630.
  • the machine readable instruction module includes an input acquisition module 601, an output acquisition module 602, a loss acquisition module 603, and a model adjustment module 604.
  • the input obtaining module 601 is configured to acquire a plurality of temporally adjacent video frames.
  • the output obtaining module 602 is configured to process the plurality of video frames respectively through a neural network model to output the corresponding intermediate image.
  • the loss acquisition module 603 is configured to acquire the optical flow information from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquire the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information; acquire the time loss between the intermediate image corresponding to the later video frame and the changed image; and acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image.
  • the model adjustment module 604 is configured to adjust the neural network model according to the time loss and the feature loss, and return to the process of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
  • the specific functions of the input obtaining module 601, the output obtaining module 602, the loss obtaining module 603 and the model adjustment module 604 are the same as those of the foregoing input obtaining module 501, output obtaining module 502, loss obtaining module 503 and model adjustment module 504, and will not be described again here.
  • when training the neural network model, the above training device for a neural network model for image processing adjusts the neural network model by using the time loss and the feature loss together as the feedback adjustment basis, so as to train a neural network model that can be used for image processing; the intermediate image corresponding to the earlier video frame is changed according to the optical flow information from the earlier video frame to the later video frame, so as to obtain the intermediate image expected to correspond to the later video frame, from which the time loss is obtained. This time loss reflects the loss in temporal consistency between the intermediate images corresponding to temporally adjacent video frames.
  • when performing feature conversion on a video, the trained neural network model takes into account the temporal consistency between the video frames, which greatly reduces the flicker noise introduced during the feature conversion process and thereby improves the conversion effect when performing feature conversion on the video.
  • the neural network model calculation is combined with the electronic device processor capability to process the video image, which improves the processor computing speed without sacrificing the video image feature conversion effect, thereby producing a better neural network model for image processing.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.


Abstract

一种用于图像处理的神经网络模型的训练方法、装置和存储介质,所述方法包括:获取多个时间相邻的视频帧(S202);将所述多个视频帧分别经过神经网络模型处理以使神经网络模型输出相对应的中间图像(S204);获取多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息(S206);获取时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像(S208);获取时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗(S210);获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗(S212);根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件(S214)。

Description

一种用于图像处理的神经网络模型的训练方法、装置和存储介质
本申请要求于2017年3月8日提交中国专利局、申请号为201710136471.9,申请名称为“用于图像处理的神经网络模型的训练方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种用于图像处理的神经网络模型的训练方法、装置和存储介质。
背景技术
随着计算机技术的发展,在图像处理技术中,通常会用到神经网络模型来对图像的特征进行转换处理,比如图像颜色特征转换、图像光影特征转换或者图像风格特征转换等。在通过神经网络模型对图像进行特征转换处理之前,需要先训练出用于图像处理的神经网络模型。
发明内容
本申请实施例提出一种用于图像处理的神经网络模型的训练方法,应用于电子设备,所述方法包括:
获取多个时间相邻的视频帧;
将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像;
获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;
获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;
获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间 的时间损耗;
获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
本申请实施例提出一种用于图像处理的神经网络模型的训练装置,所述装置包括处理器以及与所述处理器相连接的存储器,所述存储器中存储有可由所述处理器执行的机器可读指令模块,所述机器可读指令模块包括:
输入获取模块,用于获取多个时间相邻的视频帧;
输出获取模块,用于将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像;
损耗获取模块,用于获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
模型调整模块,用于根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
本申请实施例还提出一种非易失性计算机可读存储介质,所述存储介质中存储有机器可读指令,所述机器可读指令可以由处理器执行以完成以下操作:
获取多个时间相邻的视频帧;
将所述多个视频帧分别经过神经网络模型处理以使所述神经网络 模型输出相对应的中间图像;
获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;
获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;
获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;
获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
附图说明
图1A是本申请实施例提供的用于图像处理的神经网络模型的训练方法的实施环境示意图;
图1B为本申请一个实施例中用于实现用于图像处理的神经网络模型的训练方法的电子设备的内部结构示意图;
图2为本申请一个实施例中用于图像处理的神经网络模型的训练方法的流程示意图;
图3为本申请另一个实施例中用于图像处理的神经网络模型的训练方法的流程示意图;
图4为本申请一个实施例中用于图像处理的神经网络模型的训练架构图;
图5为本申请一个实施例中用于图像处理的神经网络模型的训练装置的结构框图;
图6为本申请一个实施例中用于图像处理的神经网络模型的训练装置的另一结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
采用传统的神经网络模型训练方法训练出的用于图像处理的神经网络模型在对视频进行特征转换时,由于没有考虑各视频帧之间的时间一致性,因此会引入大量的闪烁噪声(flickering),导致视频特征转换的效果较差。
有鉴于此,本申请实施例提出了一种用于图像处理的神经网络模型的训练方法、装置和存储介质,在对神经网络模型进行训练时,将时间损耗与特征损耗协同作为反馈调整依据来调整神经网络模型,以训练得到可用于图像处理的神经网络模型。其中,在对神经网络模型进行训练时,通过将时间相邻的视频帧作为输入,以对时序靠前的视频帧所对应的中间图像,按照时序靠前的视频帧变化至时序靠后的视频帧的光流信息,得到时序靠后的视频帧预期所对应的中间图像,从而得到时间损耗。该时间损耗反映了时间相邻的视频帧各自对应的中间图像之间在时间一致性上的损耗。训练后的神经网络模型在对视频进行特征转换时,会考虑视频的各视频帧之间的时间一致性,极大地减少了特征转换过程中引入的闪烁噪声,从而提高了对视频进行特征转换时的转换效果。同时,将神经网络模型计算与电子设备处理器能力结合在一起来处理视频图像,提高了处理器计算速度且不必牺牲视频图像特征转换效果,从而产生更优的用于图像处理的神经网络模型。
图1A是本申请实施例提供的用于图像处理的神经网络模型的训练 方法的实施环境示意图。其中,电子设备1集成有本申请任一实施例提供的用于图像处理的神经网络模型的训练装置11,该用于图像处理的神经网络模型的训练装置11用于实现本申请任一实施例提供的用于图像处理的神经网络模型的训练方法。该电子设备1与用户终端2之间通过网络3连接,所述网络3可以是有线网络,也可以是无线网络。
图1B为本申请一个实施例中用于实现用于图像处理的神经网络模型的训练方法的电子设备的内部结构示意图。参照图1B,该电子设备包括通过***总线101连接的处理器102、非易失性存储介质103和内存储器104。其中,电子设备的非易失性存储介质103存储有操作***1031,还存储有一种用于图像处理的神经网络模型的训练装置1032,该用于图像处理的神经网络模型的训练装置1032用于实现一种用于图像处理的神经网络模型的训练方法。电子设备的处理器102用于提供计算和控制能力,支撑整个电子设备的运行。电子设备中的内存储器104为非易失性存储介质103中的用于图像处理的神经网络模型的训练装置的运行提供环境。该内存储器104中可存储有计算机可读指令,该计算机可读指令被处理器102执行时,可使得处理器102执行一种用于图像处理的神经网络模型的训练方法。该电子设备可以是终端,也可以是服务器。终端可以是个人计算机或者移动电子设备,移动电子设备包括手机、平板电脑、个人数字助理或者穿戴式设备等中的至少一种。服务器可以用独立的服务器或者是多个物理服务器组成的服务器集群来实现。本领域技术人员可以理解,图1B中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的电子设备的限定,具体的电子设备可以包括比图1B中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
图2为本申请一个实施例中用于图像处理的神经网络模型的训练方法的流程示意图。本实施例主要以该方法应用于上述图1B中的电子设备 来举例说明。参照图2,该用于图像处理的神经网络模型的训练方法具体包括如下步骤:
步骤S202,获取多个时间相邻的视频帧。
具体地,视频是指可分割为按时间顺序排列的静态图像序列的数据。将视频分割得到的静态图像可作为视频帧。时间相邻的视频帧是指按时序排列的视频帧中相邻的视频帧。获取的时间相邻的视频帧,具体可以是两个或多于两个且时间相邻的视频帧。比如,若按时序排列的视频帧为p1,p2,p3,p4……,则p1和p2为时间相邻的视频帧,p1,p2和p3也是时间相邻的视频帧。
在本申请一个实施例中,电子设备中设置有训练样本集,在训练样本集中存储着多组时间相邻的视频帧,电子设备可从训练样本集中获取任意一组时间相邻的视频帧。训练样本集中的时间相邻的视频帧可以是由电子设备根据从互联网上获取的视频分割得到,也可以是由电子设备根据通过该电子设备包括的摄像设备录制的视频分割得到。
在本申请一个实施例中,电子设备中可设置多个训练样本集,每个训练样本集都设置有对应的训练样本集标识。用户通过电子设备可以访问训练样本集,并通过电子设备选择用于进行训练的训练样本集。电子设备可检测用户触发的携带有训练样本集标识的选择指令,电子设备提取选择指令中的训练样本集标识,从训练样本集标识对应的训练样本集中获取时间相邻的视频帧。
步骤S204,将多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像。
其中,神经网络模型是指由多层互相连接而形成的复杂网络模型。在本实施例中,电子设备可对一个神经网络模型进行训练,训练结束后得到的神经网络模型可用于图像处理。神经网络模型可包括多层特征转换层,每层特征转换层都有对应的非线性变化算子,每层的非线性变化 算子可以是多个,每层特征转换层中的一个非线性变化算子对输入的图像进行非线性变化,得到特征图(feature map)作为运算结果。每个特征转换层接收前一层的运算结果,经过自身的运算,对下一层输出本层的运算结果。
具体地,电子设备在获取到时间相邻的视频帧之后,将时间相邻的视频帧分别输入神经网络模型,依次通过神经网络模型的各特征转换层。在每一层特征转换层上,电子设备利用该特征转换层对应的非线性变化算子,对上一层输出的特征图中包括的像素点对应的像素值进行非线性变化,并输出当前特征转换层上的特征图。其中,如果当前特征转换层为第一级特征转换层,则上一层输出的特征图为输入的视频帧。像素点对应的像素值具体可以为像素点的RGB(Red Green Blue)三通道颜色值。
举例说明,在本申请一个实施例中,需训练的神经网络模型具体可包括3个卷积层、5个残差模块、2个反卷积层和1个卷积层。电子设备将视频帧输入神经网络模型后,首先经过卷积层,该卷积层对应的各卷积核对输入的视频帧对应的像素值矩阵进行卷积操作,得到与该卷积层中各卷积核各自对应的像素值矩阵,亦即特征图,再将得到的各特征图共同作为下一层卷积层的输入,逐层进行非线性变化,直至最后一层卷积层输出相应卷积核数量的特征图,再按照各特征图对应的偏置项对各特征图中对应的像素位置的像素值进行运算,合成一个特征图作为输出的中间图像。
电子设备可设置在其中一层卷积层的卷积操作后进行下采样操作。下采样的方式具体可以是均值采样,或者极值采样。比如,下采样的方式为对2*2像素区域进行均值采样,那么其中一个2*2像素区域对应的像素值矩阵为[1,2,3,4],那么下采样得到的像素值为:(1+2+3+4)/4=2.5。下采样操作后得到的特征图的分辨率减小为输入的视频帧分辨率的1/4。 进一步地,电子设备需在反卷积层的反卷积操作后设置与在前的下采样操作相应的上采样操作,使得上采样操作后得到的特征图的分辨率增大为上采样操作前的特征图的分辨率的4倍,以保证输出的中间图像与输入的视频帧的分辨率一致。
其中,神经网络模型中包括的层的个数以及层的类型可自定义调整,也可根据后续的训练结果相应调整。但需满足输入神经网络模型的图像的分辨率与神经网络模型输出的图像的分辨率一致。
步骤S206,获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息。
其中,光流可表示图像中灰度模式的运动速度。图像中按照空间位置排列的所有光流组成光流场。光流场表征了图像中像素点的变化情况,可用来确定图像间相应像素点的运动信息。
在本申请实施例中,时序靠前的视频帧,是指时间相邻的视频帧中时间戳较早的视频帧;时序靠后的视频帧,则是指时间相邻的视频帧中时间戳较晚的视频帧。比如时间相邻的视频帧按时序排列依次为x1,x2和x3,则x1相对于x2和x3为时序靠前的视频帧;x2相对于x1为时序靠后的视频帧,x2相对于x3为时序靠前的视频帧。
在本申请实施例中,时序靠前的视频帧变化至时序靠后的视频帧的光流信息,可由时序靠前的视频帧与时序靠后的视频帧之间的光流场表示。在本实施例中,用于计算光流信息的方式具体可以是根据光流约束方程得到的基于微分的光流算法、基于区域匹配的光流算法、基于能量的光流算法、基于相位的光流算法和神经动力学光流算法等中的任意一种,本申请实施例对此不做具体限定。
具体地,电子设备可按照用于计算光流信息的方式计算时序靠前的视频帧变化至时序靠后的视频帧的光流信息,得到时序靠前的视频帧中每个像素点相应的于时序靠后的视频帧中相应的像素点的光流。电子设 备也可从时序靠前的视频帧中选取特征点,采用稀疏光流计算方式,计算选取的特征点相应的光流。比如,时序靠前的视频帧中像素点A的位置为(x1,y1),时序靠后的视频帧中像素点A的位置为(x2,y2),那么像素点A的速度矢量
Figure PCTCN2018075958-appb-000001
时序靠前的视频帧中各像素点变化至时序靠后的视频帧中相应像素点的速度矢量形成的矢量场,即为时序靠前的视频帧变化至时序靠后的视频帧的光流场。
在本申请一个实施例中,当时间相邻的视频帧是多于两个且时间相邻的视频帧时,电子设备可计算时间相邻的视频帧中相邻的两帧视频帧之间的光流信息,也可以计算时间相邻的视频帧中不相邻的两帧视频帧之间的光流信息。比如,时间相邻的视频帧按时序排列依次为x1,x2和x3,电子设备可计算x1与x2之间的光流信息,x2与x3之间的光流信息,还可以计算x1与x3之间的光流信息。
在本申请一个实施例中,电子设备在按照用于计算光流信息的方式计算时序靠前的视频帧变化至时序靠后的视频帧的光流信息时,也可确定计算得到的光流信息的置信度。光流信息的置信度与光流信息一一对应,用于表示相应的光流信息的可信程度。光流信息的置信度越高,表示计算得到的光流信息所表征的图像中像素点的运动信息越准确。
步骤S208,获取时序靠前的视频帧所对应的中间图像按光流信息变化后的图像。
具体地,电子设备可将时序靠前的视频帧所对应的中间图像中包括的像素点,按照时序靠前的视频帧变化至时序靠后的视频帧的光流信息进行变化,得到变化后的像素点形成的图像,亦即时序靠后的视频帧预期所对应的中间图像的像素值分布。
在本申请一个实施例中,当时间相邻的视频帧是多于两个且时间相邻的视频帧时,电子设备可按照时间相邻的视频帧中相邻的两帧视频帧之间的光流信息,对相邻的两帧视频帧中时序靠前的视频帧所对应的中 间图像按照该光流信息得到相邻的两帧视频帧中时序靠后的视频帧预期所对应的中间图像。比如,时间相邻的视频帧按时序排列依次为x1,x2和x3,神经网络模型输出的x1、x2和x3的中间图像相应排序依次为y1,y2和y3。x1变化至x2的光流信息为g1,x2变化至x3的光流信息为g2,电子设备可将y1按照g1变化为z2,将z2按照g2变化为z3;其中,z2为x2预期对应的中间图像,z3为x3预期对应的中间图像。
在本申请一个实施例中,电子设备也可按照时间相邻的视频帧中不相邻的两帧视频帧之间的光流信息,对不相邻的两帧视频帧中时序靠前的视频帧所对应的中间图像按照该光流信息得到不相邻的两帧视频帧中时序靠后的视频帧预期所对应的中间图像。比如,时间相邻的视频帧按时序排列依次为x1,x2和x3,神经网络模型输出的x1、x2和x3的中间图像相应排序依次为y1,y2和y3。x1变化至x3的光流信息为g3,电子设备可将y1按照g3变化为z3,z3为x3预期对应的中间图像。
在本申请一个实施例中,电子设备也可在将时序靠前的视频帧所对应的中间图像中包括的像素点按照相应的光流信息变化时,将光流信息的置信度作为权重,修正变化后的像素点形成的图像。
步骤S210,获取时序靠后的视频帧所对应的中间图像与步骤S208中获取的变化后的图像间的时间损耗。
其中,时间损耗可用于表征时间相邻的视频帧在时域上的变化,与时间相邻的视频帧通过神经网络模型处理后得到的图像之间在时域上的变化的差异。具体地,电子设备可将时序靠后的视频帧所对应的中间图像,与将时序靠前的视频帧所对应的中间图像按照时序靠前的视频帧变化至时序靠后的视频帧的光流信息变化后的图像进行比较,得到两者之间的差异,根据该差异确定时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
举例说明,假设时间相邻的视频帧的帧数为两帧,时序靠前的视频 帧为x t-1,时序靠后的视频帧为x t,且x t-1变化至x t的光流信息为G t。x t-1经过神经网络模型处理后输出的中间图像为y t-1,x t经过神经网络模型处理后输出的中间图像为y t。电子设备可将y t-1按照x t-1变化至x t的光流信息G t进行变化,得到z t,z t可作为预期的时序靠后的视频帧x t所对应的神经网络模型处理后输出的图像。电子设备可再比较y t与z t的差异,从而得到y t与z t间的时间损耗。
举例说明,假设时间相邻的视频帧按时序排列依次为x1,x2和x3,神经网络模型输出的x1、x2和x3对应的中间图像相应排序依次为y1,y2和y3。x1变化至x2的光流信息为g1,x2变化至x3的光流信息为g2,x1变化至x3的光流信息为g3。电子设备可将y1按照g1变化为z2,将z2按照g2变化为z3,将y1按照g3变化为z’3;z2为x2预期对应的中间图像,z3与z’3均为x3预期对应的中间图像,电子设备可比较y2与z2的差异,得到y2与z2之间的时间损耗;电子设备可比较y3与z3的差异,以及y3与z’3的差异,根据z3与z’3的权重得到y3与z3和z’3之间的时间损耗。
步骤S212,获取多个时间相邻的视频帧对应的中间图像与目标特征图像之间的特征损耗。
其中,神经网络模型用于对图像进行特征转换时需转换至的图像特征即为目标特征图像所对应的图像特征。特征损耗为神经网络模型输出的中间图像所对应的图像特征,与目标特征图像所对应的图像特征之间的差异。图像特征具体可以是图像颜色特征、图像光影特征或者图像风格特征等。相应地,目标特征图像具体可以是目标颜色特征图像、目标光影特征图像或者目标风格特征图像等;中间图像与目标特征图像的特征损耗具体可以是颜色特征损耗、光影特征损耗或者风格特征损耗等。
具体地,电子设备可先确定需训练至的图像特征,并获取符合该图像特征的图像作为目标特征图像。电子设备可再采用训练完成的用于提 取图像特征的神经网络模型分别提取中间图像与目标特征图像对应的图像特征,再将中间图像对应的图像特征与目标特征图像对应的图像特征进行比较,得到两者之间的差异,根据该差异确定中间图像与目标特征图像之间的特征损耗。
举例说明,假设神经网络模型用于对图像进行图像风格特征转换,目标风格特征图像为S,时间相邻的视频帧的帧数为两帧,时序靠前的视频帧为x t-1,时序靠后的视频帧为x t。x t-1经过神经网络模型处理后输出的中间图像为y t-1,x t经过神经网络模型处理后输出的中间图像为y t。电子设备可分别比较y t-1与S的差异以及y t与S的差异,从而得到y t-1与S之间的风格特征损耗以及y t与S之间的风格特征损耗。
步骤S214,根据时间损耗和特征损耗调整神经网络模型,返回获取多个时间相邻的视频帧的步骤S202继续训练,直至神经网络模型满足训练结束条件。
具体地,训练神经网络模型的过程为确定需训练的神经网络模型中各特征转换层对应的非线性变化算子的过程。在确定各非线性变化算子时,电子设备可以先初始化需训练的神经网络模型中各特征转换层对应的非线性变化算子,并在后续的训练过程中,不断优化该初始化的非线性变化算子,并将优化得到的最优的非线性变化算子作为训练好的神经网络模型的非线性变化算子。
在本申请一个实施例中,电子设备可根据时间损耗构建时间域损失函数,根据特征损耗构建空间域损失函数,将时间域损失函数与空间域损失函数合并得到混合损失函数,再计算混合损失函数随神经网络模型中各特征转换层对应的非线性变化算子的变化率。电子设备可根据计算得到的变化率调整神经网络模型中各特征转换层对应的非线性变化算子,使得计算得到的变化率变小,以使得神经网络模型得到训练优化。
在本申请一个实施例中,训练结束条件可以是对神经网络模型的训 练次数达到预设训练次数。电子设备可在对神经网络模型进行训练时,对训练次数进行计数,当计数达到预设训练次数时,电子设备可判定神经网络模型满足训练结束条件,并结束对神经网络模型的训练。
在本申请一个实施例中,训练结束条件也可以是混合损失函数满足收敛条件。电子设备可在对神经网络模型进行训练时,对每次训练完成后计算得到的混合损失函数随神经网络模型中各特征转换层对应的非线性变化算子的变化率进行记录,当计算得到的该变化率逐渐靠近于某一特定数值时,电子设备可判定神经网络模型满足训练结束条件,并结束对神经网络模型的训练。
上述用于图像处理的神经网络模型的训练方法,在对神经网络模型进行训练时,将时间损耗与特征损耗协同作为反馈调整依据来调整神经网络模型,以训练得到可用于图像处理的神经网络模型。其中,在对神经网络模型进行训练时,通过将时间相邻的视频帧作为输入,以对时序靠前的视频帧所对应的中间图像按照时序靠前的视频帧变化至时序靠后的视频帧的光流信息,得到时序靠后的视频帧预期所对应的中间图像,从而得到时间损耗。该时间损耗反映了时间相邻的视频帧各自对应的中间图像之间在时间一致性上的损耗。训练后的神经网络模型在对视频进行特征转换时,会考虑视频的各视频帧之间的时间一致性,极大地减少了特征转换过程中引入的闪烁噪声,从而提高了对视频进行特征转换时的转换效果。同时,将神经网络模型计算与电子设备处理器能力结合在一起来处理视频图像,提高了处理器计算速度且不必牺牲视频图像特征转换效果,从而产生更优的用于图像处理的神经网络模型。
在本申请一个实施例中,该用于图像处理的神经网络模型的训练方法中根据时间损耗和特征损耗调整神经网络模型具体包括:获取中间图像与中间图像对应的输入的视频帧之间的内容损耗;根据时间损耗、特征损耗和内容损耗,生成训练代价;按照训练代价调整神经网络模型。
其中,内容损耗是指通过神经网络模型输出的中间图像与相应的输入的视频帧之间在图像内容上的差异。具体地,电子设备可采用训练完成的用于提取图像内容特征的神经网络模型分别提取中间图像对应的图像内容特征以及中间图像对应的输入的视频帧对应的图像内容特征,再将中间图像对应的图像内容特征与相应的输入的视频帧对应的图像内容特征进行比较,得到两者之间的差异,根据该差异确定中间图像与相应的视频帧之间的内容损耗。
在本申请一个实施例中,电子设备可根据时间损耗构建时间域损失函数,再根据特征损耗和内容损耗联合构建空间域损失函数,并生成与时间域损失函数正相关且与空间域损失函数正相关的训练代价。电子设备可再计算训练代价随神经网络模型中各特征转换层对应的非线性变化算子的变化率,并根据计算得到的变化率调整神经网络模型中各特征转换层对应的非线性变化算子,使得计算得到的变化率变小,以使得神经网络模型得到训练优化。
在本申请一个实施例中,电子设备还可对神经网络模型输出的中间图像进行去噪处理。具体地,电子设备可基于实现全变分(Total Variation,TV)的去噪算法,确定用于对中间图像的边缘像素点进行去噪处理的全变分最小化项,并将该全变分最小化项联合特征损耗和内容损耗联合构建空间域损失函数,以进行神经网络模型训练。这种采用全变分最小化项来对图像进行去噪处理的方式提高了神经网络模型对视频进行特征转换时的转换效果。
在本实施例中,在对神经网络模型进行训练时,将时间损耗、特征损耗与内容损耗协同作为反馈调整依据来调整神经网络模型,以训练得到可用于图像处理的神经网络模型,从时间、内容与特征三个维度保证了图像特征转换的准确性,提高了训练得到的神经网络模型对视频进行特征转换时的转换效果。
在本申请一个实施例中,步骤S210具体包括:将时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;根据差异分布图,确定时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
具体地,电子设备将时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减得到的差异分布图,具体可以是像素值差异矩阵。电子设备可对差异分布图进行降维运算得到时间损耗数值。电子设备在首次计算时间损耗时选定采用的降维运算方式后,后续的时间损耗计算均采用选定的该降维运算方式。其中,降维运算具体可以是均值降维或者极值降维。比如,像素值差异矩阵为[1,2,3,4],那么均值降维运算得到的时间损耗为:(1+2+3+4)/4=2.5。
在本实施例中,通过时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的像素值的差异,计算时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗,使得时间损耗的计算更为准确。
在本申请一个实施例中,该用于图像处理的神经网络模型的训练方法中获取中间图像与中间图像对应的输入的视频帧之间的内容损耗的步骤包括:将视频帧与相应的中间图像输入评价网络模型;获取评价网络模型所包括的层输出的,与视频帧对应的特征图和与中间图像对应的特征图;根据中间图像所对应的特征图和相应的视频帧所对应的特征图,确定中间图像与相应的视频帧之间的内容损耗。
其中,评价网络模型用于提取输入图像的图像特征。在本实施例中,评价网络模型具体可以是Alexnet网络模型、VGG(Visual Geometry Group视觉几何组)网络模型或者GoogLeNet网络。评价网络模型所包括的层对应有多个特征提取因子,每个特征提取因子提取不同的特征。特征图是通过评价网络模型中的层的变化算子对输入的图像处理得到 的图像处理结果,图像处理结果为图像特征矩阵,该图像特征矩阵由通过变化算子对输入的图像矩阵进行处理得到的响应值构成。
具体地,电子设备将视频帧与相应的中间图像输入评价网络模型后,评价网络模型可得到与输入的视频帧对应的像素值矩阵以及与相应的中间图像对应的像素值矩阵。评价网络模型所包括的层按照该层所对应的特征提取因子,对输入的视频帧或中间图像对应的像素值矩阵进行操作,得到相应的响应值以构成特征图。评价网络模型中不同的层提取的特征不同。电子设备可事先设置将评价网络模型中提取图像内容特征的层输出的特征图作为进行内容损耗计算的特征图。其中,评价网络模型中提取图像内容特征的层具体可以是一层,也可以是多层。
电子设备在获取中间图像所对应的特征图和中间图像对应的输入的视频帧所对应的特征图后,将中间图像所对应的特征图和相应的视频帧所对应的特征图中对应的像素位置的像素值相减,得到两者之间的内容差异矩阵,再对内容差异矩阵进行降维运算得到内容损耗。
在本实施例中,通过评价网络模型来提取特征转换前的视频帧与特征转换后的中间图像的图像内容特征,利用输出的提取了图像内容特征的特征图来计算相应输入的图像之间的内容损耗,使得内容损耗的计算更为准确。
在本申请一个实施例中,步骤S212具体包括:将中间图像与目标特征图像输入评价网络模型;获取评价网络模型所包括的层输出的,与中间图像对应的特征图和与目标特征图像对应的特征图;根据中间图像所对应的特征图和目标特征图像所对应的特征图,确定中间图像与目标特征图像之间的特征损耗。
具体地,电子设备可事先设置将评价网络模型中提取图像特征的层输出的特征图作为进行特征损耗计算的特征图。其中,评价网络模型中提取图像特征的层具体可以是一层,也可以是多层。在本实施例中,通 过评价网络模型来提取目标特征图像与特征转换后的中间图像的图像特征,利用评价网络模型输出的提取了图像特征的特征图来计算相应输入的图像之间的特征损耗,使得特征损耗的计算更为准确。
在本申请一个实施例中,该用于图像处理的神经网络模型的训练方法中根据中间图像所对应的特征图和目标特征图像所对应的特征图,确定中间图像与目标特征图像之间的特征损耗的步骤具体包括:根据中间图像所对应的特征图,确定中间图像所对应的特征矩阵;根据目标特征图像所对应的特征图,确定目标特征图像所对应的特征矩阵;将中间图像所对应的特征矩阵和目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;根据特征差异矩阵,确定中间图像与目标特征图像间的特征损耗。
在本申请一个实施例中,神经网络模型用于对图像进行图像风格特征转换,中间图像所对应的特征矩阵具体可以是风格特征矩阵。风格特征矩阵是反映图像风格特征的矩阵。风格特征矩阵具体可以是格拉姆矩阵(Gram Matrix)。电子设备可通过将中间图像所对应的特征图求取内积得到相应的格拉姆矩阵作为中间图像所对应的风格特征矩阵,将目标风格图像所对应的特征图求取内积得到相应的格拉姆矩阵作为目标风格图像所对应的风格特征矩阵。电子设备可再将中间图像所对应的风格特征矩阵和目标风格图像所对应的风格特征矩阵中对应位置的数值相减,得到风格差异特征矩阵;再对风格差异特征矩阵进行降维运算得到风格特征损耗。
在本实施例中,采用了可反映图像特征的特征矩阵具体计算特征转换得到的图像与目标特征图像间的特征损耗,使得特征损耗的计算更为准确。
举例说明,电子设备可选取VGG-19网络模型作为评价网络模型,该网络模型包括16层卷积层和5层池化层。试验表明该模型的第四层 卷积层提取的特征能体现图像内容特征,该模型的第一、二、三、四层卷积层提取的特征能体现图像风格特征。电子设备可获取第四层卷积层输出的中间图像所对应的特征图和该中间图像对应的输入的视频帧所对应的特征图,并基于获取的特征图计算中间图像与相应的视频帧之间的内容损耗。电子设备可获取第一、二、三、四层卷积层输出的中间图像所对应的特征图和所述中间图像对应的输入的视频帧所对应的特征图,并基于获取的特征图计算中间图像与相应的视频帧之间的风格特征损耗。
在本申请一个实施例中,该用于图像处理的神经网络模型的训练方法中按照训练代价调整神经网络模型,包括:按照神经网络模型所包括的层的顺序,逆序确定训练代价随各层所对应的非线性变化算子的变化率;按逆序调整神经网络模型所包括的层所对应的非线性变化算子,使得训练代价随相应调整的层所对应的非线性变化算子的变化率减小。
具体地,图像被输入神经网络模型后,每经过一层则进行一次非线性变化,并将输出的运算结果作为下一层的输入。电子设备可按照神经网络模型所包括的层的顺序,从神经网络模型所包括的最后一层起,确定训练代价随当前层所对应的非线性变化算子的变化率,再依次逆序确定训练代价随各层所对应的非线性变化算子的变化率。电子设备可再按逆序依次调整神经网络模型所包括的层所对应的非线性变化算子,使得训练代价随相应调整的层所对应的非线性变化算子的变化率减小。
举例说明,假设训练代价为L,按照神经网络模型所包括的层的顺序,逆序第一层所对应的非线性变化算子为z,则L随z的变化率为
Figure PCTCN2018075958-appb-000002
逆序第二层所对应的非线性变化算子为b,则L随b的变化率为
Figure PCTCN2018075958-appb-000003
逆序第三层所对应的非线性变化算子为c,则L随c的变化率为
Figure PCTCN2018075958-appb-000004
在求解变化率时,链式求导会一层一层的将梯度传导到在前的层。在逆序求解变化率至神经网络模型所包括的第 一层时,电子设备可逆序依次调整非线性变化算子z、b、c至神经网络模型所包括的第一层(即逆序最后一层)对应的非线性变化算子,使得逆序最后一层求得的变化率减小。
在本申请一个实施例中,训练代价具体可表示为:
Figure PCTCN2018075958-appb-000005
其中,L hybrid表示训练代价,L spatial(x i,y i,s)表示空间域损失函数;L temporal(y t,y t-1)表示时间域损失函数,由时间损耗生成,λ为时间域损失函数相应的权重。空间域损失函数具体可表示为:
Figure PCTCN2018075958-appb-000006
其中,l表示评价网络模型中提取图像特征的层;
Figure PCTCN2018075958-appb-000007
表示输入神经网络模型的图像与神经网络模型输出的图像之间的内容损耗;
Figure PCTCN2018075958-appb-000008
表示神经网络模型输出的图像与目标特征图像之间的特征损耗;R tv表示全变分最小化项;α、β和γ为各项损耗相应的权重。比如,α的取值可为1,β的取值可为1,γ的取值可为10 4
在本实施例中,通过反向传播方式求解训练代价随神经网络模型各层所对应的非线性变化算子的变化率,通过调节神经网络模型各层所对应的非线性变化算子使得计算得到的变化率减小,以训练神经网络模型,使得训练得到的神经网络模型用于进行图像转换时的效果更优。
如图3所示,在本申请一个具体的实施例中,用于图像处理的神经网络模型的训练方法具体包括以下步骤:
步骤S302,获取多个时间相邻的视频帧。
步骤S304,将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像。
步骤S306,获取时序靠前的视频帧变化至时序靠后的视频帧的光流信息。
步骤S308,获取时序靠前的视频帧所对应的中间图像按光流信息变 化后的图像。
步骤S310,将时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;根据差异分布图,确定时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
步骤S312,将中间图像与目标特征图像输入评价网络模型;获取评价网络模型所包括的层输出的,与中间图像对应的特征图和与目标特征图像对应的特征图;根据中间图像所对应的特征图,确定中间图像所对应的特征矩阵;根据目标特征图像所对应的特征图,确定目标特征图像所对应的特征矩阵;将中间图像所对应的特征矩阵和目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;根据特征差异矩阵,确定中间图像与目标特征图像间的特征损耗。
步骤S314,将视频帧与该视频帧对应的中间图像输入评价网络模型;获取评价网络模型所包括的层输出的,与视频帧对应的特征图和与中间图像对应的特征图;根据中间图像所对应的特征图和相应的视频帧所对应的特征图,确定中间图像与相应的视频帧之间的内容损耗。
步骤S316,根据时间损耗、特征损耗和内容损耗,生成训练代价。
步骤S318,按照神经网络模型所包括的层的顺序,逆序确定训练代价随各层所对应的非线性变化算子的变化率;按逆序调整神经网络模型所包括的层所对应的非线性变化算子,使得训练代价随相应调整的层所对应的非线性变化算子的变化率减小。
步骤S320,判断神经网络模型是否满足训练结束条件;若神经网络模型满足训练结束条件,则跳转至步骤S322;若神经网络模型不满足训练结束条件,则跳转至步骤S302。
步骤S322,结束训练神经网络模型。
在本实施例中,在对用于图像处理的神经网络模型进行训练时,将时间损耗、特征损耗与内容损耗协同作为反馈调整依据来调整神经网络 模型,在时间、特征与内容三个维度来训练神经网络模型,提高了神经网络模型的训练效果。
图4示出了本申请一个实施例中用于图像处理的神经网络模型的训练架构图。参考图4,本实施例中神经网络模型由3个卷积层,5个残差模块,2个反卷积层和1个卷积层组成,电子设备可将时序靠前的视频帧x t-1,时序靠后的视频帧x t分别输入神经网络模型中,得到神经网络模型输出的中间图像为y t-1和y t。电子设备可按照x t-1与x t之间的光流信息,得到y t-1与y t的时间域损失函数;再将x t-1、x t、y t-1、y t和目标特征图像S输入评价网络模型,通过评价网络模型所包括的层输出的特征图,得到x t-1与y t-1、x t与y t之间的内容损耗,y t-1与S、y t与S之间的特征损耗,从而得到空间域损失函数。
在本申请一个实施例中,电子设备按照该用于图像处理的神经网络模型的训练方法对神经网络模型训练完成后,可将该神经网络模型用于进行视频特征转换。电子设备可将需要进行特征转换的视频分割为时间相邻的视频帧,依次将分割得到的视频帧输入训练完成的神经网络模型,经神经网络模型处理后得到每帧视频帧对应的特征转换后的输出图像,再将各输出图像按照所对应的输入视频帧的时间顺序合并,得到特征转换后的视频。其中,神经网络模型可同时对多帧视频帧进行特征转换。
如图5所示,在本申请一个实施例中,提供一种用于图像处理的神经网络模型的训练装置500,该装置具体包括:输入获取模块501、输出获取模块502、损耗获取模块503和模型调整模块504。
输入获取模块501,用于获取多个时间相邻的视频帧。
输出获取模块502,用于将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像。
损耗获取模块503,用于获取所述多个时间相邻的视频帧中时序靠 前的视频帧变化至时序靠后的视频帧的光流信息;获取时序靠前的视频帧所对应的中间图像按光流信息变化后的图像;获取时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;获取多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗。
模型调整模块504,用于根据时间损耗和特征损耗调整神经网络模型,返回获取多个时间相邻的视频帧的步骤继续训练,直至神经网络模型满足训练结束条件。
在本申请一个实施例中,模型调整模块504还用于获取多个时间相邻的视频帧对应的中间图像与相应的视频帧之间的内容损耗;根据时间损耗、特征损耗和内容损耗,生成训练代价;按照训练代价调整神经网络模型。
在本实施例中,在对神经网络模型进行训练时,将时间损耗、特征损耗与内容损耗协同作为反馈调整依据来调整神经网络模型,以训练得到可用于图像处理的神经网络模型,从时间、内容与特征三个维度保证了图像特征转换的准确性,提高了训练得到的神经网络模型对视频进行特征转换时的转换效果。
在本申请一个实施例中,模型调整模块504还用于将所述中间图像与所述中间图像对应的视频帧输入评价网络模型;获取评价网络模型所包括的层输出的,与视频帧对应的特征图和与中间图像对应的特征图;根据中间图像所对应的特征图和相应的视频帧所对应的特征图,确定中间图像与相应的视频帧之间的内容损耗。
在本实施例中,通过评价网络模型来提取特征转换前的视频帧与特征转换后的中间图像的图像内容特征,利用输出的提取了图像内容特征的特征图来计算相应输入的图像之间的内容损耗,使得内容损耗的计算更为准确。
在本申请一个实施例中,模型调整模块504还用于按照神经网络模型所包括的层的顺序,逆序确定训练代价随各层所对应的非线性变化算子的变化率;按逆序调整神经网络模型所包括的层所对应的非线性变化算子,使得训练代价随相应调整的层所对应的非线性变化算子的变化率减小。
在本实施例中,通过反向传播方式求解训练代价随神经网络模型各层所对应的非线性变化算子的变化率,通过调节神经网络模型各层所对应的非线性变化算子使得计算得到的变化率减小,以训练神经网络模型,使得训练得到的神经网络模型用于进行图像转换时的效果更优。
在本申请一个实施例中,损耗获取模块503还用于将时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;根据差异分布图,确定时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
在本实施例中,通过时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的像素值的差异,计算时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗,使得时间损耗的计算更为准确。
在本申请一个实施例中,损耗获取模块503还用于将中间图像与目标特征图像输入评价网络模型;获取评价网络模型所包括的层输出的,与中间图像对应的特征图和与目标特征图像对应的特征图;根据中间图像所对应的特征图和目标特征图像所对应的特征图,确定中间图像与目标特征图像间的特征损耗。
在本实施例中,通过评价网络模型来提取目标特征图像与特征转换后的中间图像的图像特征,利用输出的提取了图像特征的特征图来计算相应输入的图像之间的特征损耗,使得特征损耗的计算更为准确。
在本申请一个实施例中,损耗获取模块503还用于根据中间图像所 对应的特征图,确定中间图像所对应的特征矩阵;根据目标特征图像所对应的特征图,确定目标特征图像所对应的特征矩阵;将中间图像所对应的特征矩阵和目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;根据特征差异矩阵,确定中间图像与目标特征图像间的特征损耗。
在本实施例中,采用了可反映图像特征的特征矩阵具体计算特征转换得到的图像与目标特征图像之间的特征损耗,使得特征损耗的计算更为准确。
图6是本申请一个实施例提供的一种用于图像处理的神经网络模型的训练装置的另一结构框图。如图6所示,该装置包括:处理器610,与所述处理器610通过总线620相连接的存储器630。所述存储器630中存储有可由所述处理器610执行的机器可读指令模块。所述机器可读指令模块包括:输入获取模块601、输出获取模块602、损耗获取模块603和模型调整模块604。
输入获取模块601,用于获取多个时间相邻的视频帧。
输出获取模块602,用于将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像。
损耗获取模块603,用于获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;获取时序靠前的视频帧所对应的中间图像按光流信息变化后的图像;获取时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;获取多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗。
模型调整模块604,用于根据时间损耗和特征损耗调整神经网络模型,返回获取多个时间相邻的视频帧的步骤继续训练,直至神经网络模型满足训练结束条件。
在本申请实施例中,上述输入获取模块601、输出获取模块602、 损耗获取模块603和模型调整模块604的具体功能与前述的输入获取模块501、输出获取模块502、损耗获取模块503和模型调整模块504相同,在此不再赘述。
上述用于图像处理的神经网络模型的训练装置,在对神经网络模型进行训练时,将时间损耗与特征损耗协同作为反馈调整依据来调整神经网络模型,以训练得到可用于图像处理的神经网络模型。其中,在对神经网络模型进行训练时,通过将时间相邻的视频帧作为输入,以对时序靠前的视频帧所对应的中间图像,按照时序靠前的视频帧变化至时序靠后的视频帧的光流信息,得到时序靠后的视频帧预期所对应的中间图像,从而得到时间损耗。该时间损耗反映了时间相邻的视频帧各自对应的中间图像之间在时间一致性上的损耗。训练后的神经网络模型在对视频进行特征转换时,会考虑视频的各视频帧之间的时间一致性,极大地减少了特征转换过程中引入的闪烁噪声,从而提高了对视频进行特征转换时的转换效果。同时,将神经网络模型计算与电子设备处理器能力结合在一起来处理视频图像,提高了处理器计算速度且不必牺牲视频图像特征转换效果,从而产生更优的用于图像处理的神经网络模型。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体 和详细,但并不能因此而理解为对本申请范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请的保护范围应以所附权利要求为准。

Claims (21)

  1. 一种用于图像处理的神经网络模型的训练方法,应用于电子设备,所述方法包括:
    获取多个时间相邻的视频帧;
    将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像;
    获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;
    获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;
    获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;
    获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
    根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
  2. 根据权利要求1所述的方法,所述根据所述时间损耗和所述特征损耗调整所述神经网络模型,包括:
    获取所述多个时间相邻的视频帧对应的中间图像与相应的视频帧之间的内容损耗;
    根据所述时间损耗、所述特征损耗和所述内容损耗,生成训练代价;
    按照所述训练代价调整所述神经网络模型。
  3. 根据权利要求2所述的方法,所述获取所述多个时间相邻的视频帧对应的中间图像与相应的视频帧之间的内容损耗,包括:
    将所述中间图像与所述中间图像对应的视频帧输入评价网络模型;
    获取所述评价网络模型所包括的层输出的,与所述视频帧对应的特征图和与所述中间图像对应的特征图;
    根据所述中间图像所对应的特征图和所述视频帧所对应的特征图,确定所述中间图像与相应的视频帧之间的内容损耗。
  4. 根据权利要求2所述的方法,所述按照所述训练代价调整所述神经网络模型,包括:
    按照所述神经网络模型所包括的层的顺序,逆序确定所述训练代价随各所述层所对应的非线性变化算子的变化率;
    按所述逆序调整所述神经网络模型所包括的层所对应的非线性变化算子,使得所述训练代价随相应调整的所述层所对应的非线性变化算子的变化率减小。
  5. 根据权利要求1至4中任一项所述的方法,所述获取时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗,包括:
    将所述时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;
    根据所述差异分布图,确定所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
  6. 根据权利要求1至4中任一项所述的方法,所述获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗,包括:
    将所述中间图像与目标特征图像输入评价网络模型;
    获取所述评价网络模型所包括的层输出的,与所述中间图像对应的特征图和与所述目标特征图像对应的特征图;
    根据所述中间图像所对应的特征图和所述目标特征图像所对应的特征图,确定所述中间图像与目标特征图像间的特征损耗。
  7. 根据权利要求6所述的方法,所述根据所述中间图像所对应的特征图和所述目标特征图像所对应的特征图,确定所述中间图像与目标 特征图像间的特征损耗,包括:
    根据所述中间图像所对应的特征图,确定所述中间图像所对应的特征矩阵;
    根据所述目标特征图像所对应的特征图,确定所述目标特征图像所对应的特征矩阵;
    将所述中间图像所对应的特征矩阵和所述目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;
    根据所述特征差异矩阵,确定所述中间图像与所述目标特征图像间的特征损耗。
  8. 一种用于图像处理的神经网络模型的训练装置,所述装置包括处理器以及与所述处理器相连接的存储器,所述存储器中存储有可由所述处理器执行的机器可读指令模块,所述机器可读指令模块包括:
    输入获取模块,用于获取多个时间相邻的视频帧;
    输出获取模块,用于将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像;
    损耗获取模块,用于获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
    模型调整模块,用于根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
  9. 根据权利要求8所述的装置,所述模型调整模块还用于获取所述多个时间相邻的视频帧对应的中间图像与相应的视频帧之间的内容 损耗;根据所述时间损耗、所述特征损耗和所述内容损耗,生成训练代价;按照所述训练代价调整所述神经网络模型。
  10. 根据权利要求9所述的装置,,所述模型调整模块还用于将所述中间图像与所述中间图像对应的视频帧输入评价网络模型;获取所述评价网络模型所包括的层输出的,与所述视频帧对应的特征图和与所述中间图像对应的特征图;根据所述中间图像所对应的特征图和所述视频帧所对应的特征图,确定所述中间图像与相应的视频帧之间的内容损耗。
  11. 根据权利要求9所述的装置,所述模型调整模块还用于按照所述神经网络模型所包括的层的顺序,逆序确定所述训练代价随各所述层所对应的非线性变化算子的变化率;按所述逆序调整所述神经网络模型所包括的层所对应的非线性变化算子,使得所述训练代价随相应调整的所述层所对应的非线性变化算子的变化率减小。
  12. 根据权利要求8至11中任一项所述的装置,所述损耗获取模块还用于将所述时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;根据所述差异分布图,确定所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
  13. 根据权利要求8至11中任一项所述的装置,所述损耗获取模块还用于将所述中间图像与目标特征图像输入评价网络模型;获取所述评价网络模型所包括的层输出的,与所述中间图像对应的特征图和与所述目标特征图像对应的特征图;根据所述中间图像所对应的特征图和所述目标特征图像所对应的特征图,确定所述中间图像与目标特征图像间的特征损耗。
  14. 根据权利要求13所述的装置,所述损耗获取模块还用于根据所述中间图像所对应的特征图,确定所述中间图像所对应的特征矩阵;根据所述目标特征图像所对应的特征图,确定所述目标特征图像所对应 的特征矩阵;将所述中间图像所对应的特征矩阵和所述目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;根据所述特征差异矩阵,确定所述中间图像与所述目标特征图像间的特征损耗。
  15. 一种非易失性计算机可读存储介质,所述存储介质中存储有机器可读指令,所述机器可读指令可以由处理器执行以完成以下操作:
    获取多个时间相邻的视频帧;
    将所述多个视频帧分别经过神经网络模型处理以使所述神经网络模型输出相对应的中间图像;
    获取所述多个时间相邻的视频帧中时序靠前的视频帧变化至时序靠后的视频帧的光流信息;
    获取所述时序靠前的视频帧所对应的中间图像按所述光流信息变化后的图像;
    获取所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗;
    获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗;
    根据所述时间损耗和所述特征损耗调整所述神经网络模型,返回所述获取多个时间相邻的视频帧的步骤继续训练,直至所述神经网络模型满足训练结束条件。
  16. 根据权利要求15所述的存储介质,所述根据所述时间损耗和所述特征损耗调整所述神经网络模型,包括:
    获取所述多个时间相邻的视频帧对应的中间图像与相应的视频帧之间的内容损耗;
    根据所述时间损耗、所述特征损耗和所述内容损耗,生成训练代价;
    按照所述训练代价调整所述神经网络模型。
  17. 根据权利要求16所述的存储介质,所述获取所述多个时间相 邻的视频帧对应的中间图像与相应的视频帧之间的内容损耗,包括:
    将所述中间图像与所述中间图像对应的视频帧输入评价网络模型;
    获取所述评价网络模型所包括的层输出的,与所述视频帧对应的特征图和与所述中间图像对应的特征图;
    根据所述中间图像所对应的特征图和所述视频帧所对应的特征图,确定所述中间图像与相应的视频帧之间的内容损耗。
  18. 根据权利要求16所述的存储介质,所述按照所述训练代价调整所述神经网络模型,包括:
    按照所述神经网络模型所包括的层的顺序,逆序确定所述训练代价随各所述层所对应的非线性变化算子的变化率;
    按所述逆序调整所述神经网络模型所包括的层所对应的非线性变化算子,使得所述训练代价随相应调整的所述层所对应的非线性变化算子的变化率减小。
  19. 根据权利要求15至18中任一项所述的存储介质,所述获取时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗,包括:
    将所述时序靠后的视频帧所对应的中间图像与变化后的图像中对应的像素位置的数值相减,得到差异分布图;
    根据所述差异分布图,确定所述时序靠后的视频帧所对应的中间图像与变化后的图像间的时间损耗。
  20. 根据权利要求15至18中任一项所述的存储介质,所述获取所述多个时间相邻的视频帧对应的中间图像与目标特征图像间的特征损耗,包括:
    将所述中间图像与目标特征图像输入评价网络模型;
    获取所述评价网络模型所包括的层输出的,与所述中间图像对应的特征图和与所述目标特征图像对应的特征图;
    根据所述中间图像所对应的特征图和所述目标特征图像所对应的 特征图,确定所述中间图像与目标特征图像间的特征损耗。
  21. 根据权利要求20所述的存储介质,所述根据所述中间图像所对应的特征图和所述目标特征图像所对应的特征图,确定所述中间图像与目标特征图像间的特征损耗,包括:
    根据所述中间图像所对应的特征图,确定所述中间图像所对应的特征矩阵;
    根据所述目标特征图像所对应的特征图,确定所述目标特征图像所对应的特征矩阵;
    将所述中间图像所对应的特征矩阵和所述目标特征图像所对应的特征矩阵中对应位置的数值相减,得到特征差异矩阵;
    根据所述特征差异矩阵,确定所述中间图像与所述目标特征图像间的特征损耗。
PCT/CN2018/075958 2017-03-08 2018-02-09 一种用于图像处理的神经网络模型的训练方法、装置和存储介质 WO2018161775A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020197021770A KR102281017B1 (ko) 2017-03-08 2018-02-09 이미지 처리를 위한 신경망 모델 훈련 방법, 장치 및 저장 매체
JP2019524446A JP6755395B2 (ja) 2017-03-08 2018-02-09 画像処理用のニューラルネットワークモデルのトレーニング方法、装置、及び記憶媒体
EP18764177.4A EP3540637B1 (en) 2017-03-08 2018-02-09 Neural network model training method, device and storage medium for image processing
US16/373,034 US10970600B2 (en) 2017-03-08 2019-04-02 Method and apparatus for training neural network model used for image processing, and storage medium
US17/187,473 US11610082B2 (en) 2017-03-08 2021-02-26 Method and apparatus for training neural network model used for image processing, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710136471.9 2017-03-08
CN201710136471.9A CN108304755B (zh) 2017-03-08 2017-03-08 用于图像处理的神经网络模型的训练方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/373,034 Continuation US10970600B2 (en) 2017-03-08 2019-04-02 Method and apparatus for training neural network model used for image processing, and storage medium

Publications (1)

Publication Number Publication Date
WO2018161775A1 true WO2018161775A1 (zh) 2018-09-13

Family

ID=62872021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075958 WO2018161775A1 (zh) 2017-03-08 2018-02-09 一种用于图像处理的神经网络模型的训练方法、装置和存储介质

Country Status (7)

Country Link
US (2) US10970600B2 (zh)
EP (1) EP3540637B1 (zh)
JP (1) JP6755395B2 (zh)
KR (1) KR102281017B1 (zh)
CN (1) CN108304755B (zh)
TW (1) TWI672667B (zh)
WO (1) WO2018161775A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532431A (zh) * 2019-07-23 2019-12-03 平安科技(深圳)有限公司 短视频关键词提取方法、装置及存储介质
CN110555861A (zh) * 2019-08-09 2019-12-10 北京字节跳动网络技术有限公司 光流计算方法、装置及电子设备
CN111091144A (zh) * 2019-11-27 2020-05-01 云南电网有限责任公司电力科学研究院 基于深度伪孪生网络的图像特征点匹配方法及装置
CN111314733A (zh) * 2020-01-20 2020-06-19 北京百度网讯科技有限公司 用于评估视频清晰度的方法和装置
CN111340905A (zh) * 2020-02-13 2020-06-26 北京百度网讯科技有限公司 图像风格化方法、装置、设备和介质
CN111353597A (zh) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 一种目标检测神经网络训练方法和装置
CN111754503A (zh) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 基于两通道卷积神经网络的肠镜退镜超速占比监测方法
CN112016041A (zh) * 2020-08-27 2020-12-01 重庆大学 基于格拉姆求和角场图像化和Shortcut-CNN的时间序列实时分类方法
KR20210107084A (ko) * 2019-03-07 2021-08-31 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 이미지 처리 방법 및 장치, 컴퓨터 디바이스, 및 저장 매체
CN113591761A (zh) * 2021-08-09 2021-11-02 成都华栖云科技有限公司 一种视频镜头语言识别方法
CN113792654A (zh) * 2021-09-14 2021-12-14 湖南快乐阳光互动娱乐传媒有限公司 视频片段的整合方法、装置、电子设备及存储介质
CN114760524A (zh) * 2020-12-25 2022-07-15 深圳Tcl新技术有限公司 视频处理方法、装置、智能终端及计算机可读存储介质

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10181195B2 (en) * 2015-12-28 2019-01-15 Facebook, Inc. Systems and methods for determining optical flow
US10713754B1 (en) * 2018-02-28 2020-07-14 Snap Inc. Remote distribution of neural networks
CN109272486B (zh) * 2018-08-14 2022-07-08 中国科学院深圳先进技术研究院 Mr图像预测模型的训练方法、装置、设备及存储介质
US10318842B1 (en) * 2018-09-05 2019-06-11 StradVision, Inc. Learning method, learning device for optimizing parameters of CNN by using multiple video frames and testing method, testing device using the same
CN109068174B (zh) * 2018-09-12 2019-12-27 上海交通大学 基于循环卷积神经网络的视频帧率上变换方法及***
CN109389072B (zh) * 2018-09-29 2022-03-08 北京字节跳动网络技术有限公司 数据处理方法和装置
CN109712228B (zh) * 2018-11-19 2023-02-24 中国科学院深圳先进技术研究院 建立三维重建模型的方法、装置、电子设备及存储介质
CN109785249A (zh) * 2018-12-22 2019-05-21 昆明理工大学 一种基于持续性记忆密集网络的图像高效去噪方法
CN109840598B (zh) * 2019-04-29 2019-08-09 深兰人工智能芯片研究院(江苏)有限公司 一种深度学习网络模型的建立方法及装置
CN110378936B (zh) * 2019-07-30 2021-11-05 北京字节跳动网络技术有限公司 光流计算方法、装置及电子设备
CN110677651A (zh) * 2019-09-02 2020-01-10 合肥图鸭信息科技有限公司 一种视频压缩方法
CN110599421B (zh) * 2019-09-12 2023-06-09 腾讯科技(深圳)有限公司 模型训练方法、视频模糊帧转换方法、设备及存储介质
US20210096934A1 (en) * 2019-10-01 2021-04-01 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for enhancing a patient positioning system
CN110717593B (zh) * 2019-10-14 2022-04-19 上海商汤临港智能科技有限公司 神经网络训练、移动信息测量、关键帧检测的方法及装置
US11023791B2 (en) * 2019-10-30 2021-06-01 Kyocera Document Solutions Inc. Color conversion using neural networks
CN110753225A (zh) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 一种视频压缩方法、装置及终端设备
CN110830848B (zh) * 2019-11-04 2021-12-07 上海眼控科技股份有限公司 图像插值方法、装置、计算机设备和存储介质
CN110913230A (zh) * 2019-11-29 2020-03-24 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备
CN110913219A (zh) * 2019-11-29 2020-03-24 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备
CN110830806A (zh) * 2019-11-29 2020-02-21 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备
CN110913218A (zh) * 2019-11-29 2020-03-24 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备
US11080834B2 (en) * 2019-12-26 2021-08-03 Ping An Technology (Shenzhen) Co., Ltd. Image processing method and electronic device
CN111083478A (zh) * 2019-12-31 2020-04-28 合肥图鸭信息科技有限公司 一种视频帧重构方法、装置及终端设备
CN111083499A (zh) * 2019-12-31 2020-04-28 合肥图鸭信息科技有限公司 一种视频帧重构方法、装置及终端设备
CN111083479A (zh) * 2019-12-31 2020-04-28 合肥图鸭信息科技有限公司 一种视频帧预测方法、装置及终端设备
KR102207736B1 (ko) * 2020-01-14 2021-01-26 한국과학기술원 심층 신경망 구조를 이용한 프레임 보간 방법 및 장치
KR102198480B1 (ko) * 2020-02-28 2021-01-05 연세대학교 산학협력단 재귀 그래프 모델링을 통한 비디오 요약 생성 장치 및 방법
CN111340195B (zh) * 2020-03-09 2023-08-22 创新奇智(上海)科技有限公司 网络模型的训练方法及装置、图像处理方法及存储介质
CN111524166B (zh) * 2020-04-22 2023-06-30 北京百度网讯科技有限公司 视频帧的处理方法和装置
CN111726621B (zh) * 2020-04-24 2022-12-30 中国科学院微电子研究所 一种视频转换方法及装置
CN111915573A (zh) * 2020-07-14 2020-11-10 武汉楚精灵医疗科技有限公司 一种基于时序特征学习的消化内镜下病灶跟踪方法
US11272097B2 (en) * 2020-07-30 2022-03-08 Steven Brian Demers Aesthetic learning methods and apparatus for automating image capture device controls
CN112104830B (zh) * 2020-08-13 2022-09-27 北京迈格威科技有限公司 视频插帧方法、模型训练方法及对应装置
CN111970518B (zh) * 2020-08-14 2022-07-22 山东云海国创云计算装备产业创新中心有限公司 一种图像丢帧处理方法、***、设备及计算机存储介质
CN112116692B (zh) * 2020-08-28 2024-05-10 北京完美赤金科技有限公司 模型渲染方法、装置、设备
CN112055249B (zh) * 2020-09-17 2022-07-08 京东方科技集团股份有限公司 一种视频插帧方法及装置
CN112288621B (zh) * 2020-09-21 2022-09-16 山东师范大学 基于神经网络的图像风格迁移方法及***
WO2022070574A1 (ja) * 2020-09-29 2022-04-07 富士フイルム株式会社 情報処理装置、情報処理方法及び情報処理プログラム
CN112561167B (zh) * 2020-12-17 2023-10-24 北京百度网讯科技有限公司 出行推荐方法、装置、电子设备及存储介质
EP4262207A4 (en) 2021-02-22 2024-03-27 Samsung Electronics Co., Ltd. IMAGE ENCODING AND DECODING DEVICE USING AI AND IMAGE ENCODING AND DECODING METHOD USING SAID DEVICE
EP4250729A4 (en) 2021-02-22 2024-05-01 Samsung Electronics Co., Ltd. AI-BASED IMAGE ENCODING AND DECODING APPARATUS AND RELATED METHOD
WO2022250372A1 (ko) * 2021-05-24 2022-12-01 삼성전자 주식회사 Ai에 기반한 프레임 보간 방법 및 장치
CN113542651B (zh) * 2021-05-28 2023-10-27 爱芯元智半导体(宁波)有限公司 模型训练方法、视频插帧方法及对应装置
KR102404166B1 (ko) * 2021-07-20 2022-06-02 국민대학교산학협력단 스타일 전이를 활용한 엑스레이 영상의 유체 탐지 방법 및 장치
WO2023004727A1 (zh) * 2021-07-30 2023-02-02 华为技术有限公司 视频处理方法、视频处理装置及电子装置
CN113706414B (zh) * 2021-08-26 2022-09-09 荣耀终端有限公司 视频优化模型的训练方法和电子设备
CN113705665B (zh) * 2021-08-26 2022-09-23 荣耀终端有限公司 图像变换网络模型的训练方法和电子设备
KR102658912B1 (ko) * 2021-09-24 2024-04-18 한국과학기술원 도메인별 최적화를 위한 생성 신경망의 에너지 효율적인 재학습 방법

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504366A (zh) * 2014-11-24 2015-04-08 上海闻泰电子科技有限公司 基于光流特征的笑脸识别***及方法
WO2015079470A2 (en) * 2013-11-29 2015-06-04 Protodesign S.R.L. Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene
CN105160310A (zh) * 2015-08-25 2015-12-16 西安电子科技大学 基于3d卷积神经网络的人体行为识别方法
CN106203533A (zh) * 2016-07-26 2016-12-07 厦门大学 基于混合训练的深度学习人脸验证方法
CN106407889A (zh) * 2016-08-26 2017-02-15 上海交通大学 基于光流图深度学习模型在视频中人体交互动作识别方法

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9892606B2 (en) * 2001-11-15 2018-02-13 Avigilon Fortress Corporation Video surveillance system employing video primitives
KR101284561B1 (ko) * 2011-02-14 2013-07-11 충남대학교산학협력단 멀티 모달리티 감정인식 시스템, 감정인식 방법 및 그 기록매체
CN102209246B (zh) * 2011-05-23 2013-01-09 北京工业大学 一种实时视频白平衡处理***
US8655030B2 (en) * 2012-04-18 2014-02-18 Vixs Systems, Inc. Video processing system with face detection and methods for use therewith
US9213901B2 (en) * 2013-09-04 2015-12-15 Xerox Corporation Robust and computationally efficient video-based object tracking in regularized motion environments
US9741107B2 (en) * 2015-06-05 2017-08-22 Sony Corporation Full reference image quality assessment based on convolutional neural network
CN106469443B (zh) * 2015-08-13 2020-01-21 微软技术许可有限责任公司 机器视觉特征跟踪***
US10157309B2 (en) * 2016-01-14 2018-12-18 Nvidia Corporation Online detection and classification of dynamic gestures with recurrent convolutional neural networks
US10423830B2 (en) * 2016-04-22 2019-09-24 Intel Corporation Eye contact correction in real time using neural network based machine learning
CN106056628B (zh) * 2016-05-30 2019-06-18 中国科学院计算技术研究所 基于深度卷积神经网络特征融合的目标跟踪方法及***
US10037471B2 (en) * 2016-07-05 2018-07-31 Nauto Global Limited System and method for image analysis
CN106331433B (zh) * 2016-08-25 2020-04-24 上海交通大学 基于深度递归神经网络的视频去噪方法
CN108073933B (zh) * 2016-11-08 2021-05-25 杭州海康威视数字技术股份有限公司 一种目标检测方法及装置
US20180190377A1 (en) * 2016-12-30 2018-07-05 Dirk Schneemann, LLC Modeling and learning character traits and medical condition based on 3d facial features

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015079470A2 (en) * 2013-11-29 2015-06-04 Protodesign S.R.L. Video coding system for images and video from air or satellite platform assisted by sensors and by a geometric model of the scene
CN104504366A (zh) * 2014-11-24 2015-04-08 上海闻泰电子科技有限公司 基于光流特征的笑脸识别***及方法
CN105160310A (zh) * 2015-08-25 2015-12-16 西安电子科技大学 基于3d卷积神经网络的人体行为识别方法
CN106203533A (zh) * 2016-07-26 2016-12-07 厦门大学 基于混合训练的深度学习人脸验证方法
CN106407889A (zh) * 2016-08-26 2017-02-15 上海交通大学 基于光流图深度学习模型在视频中人体交互动作识别方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3540637A4 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353597B (zh) * 2018-12-24 2023-12-05 杭州海康威视数字技术股份有限公司 一种目标检测神经网络训练方法和装置
CN111353597A (zh) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 一种目标检测神经网络训练方法和装置
KR20210107084A (ko) * 2019-03-07 2021-08-31 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 이미지 처리 방법 및 장치, 컴퓨터 디바이스, 및 저장 매체
US11900567B2 (en) 2019-03-07 2024-02-13 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, computer device, and storage medium
KR102509817B1 (ko) * 2019-03-07 2023-03-14 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 이미지 처리 방법 및 장치, 컴퓨터 디바이스, 및 저장 매체
CN110532431A (zh) * 2019-07-23 2019-12-03 平安科技(深圳)有限公司 短视频关键词提取方法、装置及存储介质
CN110532431B (zh) * 2019-07-23 2023-04-18 平安科技(深圳)有限公司 短视频关键词提取方法、装置及存储介质
CN110555861B (zh) * 2019-08-09 2023-04-25 北京字节跳动网络技术有限公司 光流计算方法、装置及电子设备
CN110555861A (zh) * 2019-08-09 2019-12-10 北京字节跳动网络技术有限公司 光流计算方法、装置及电子设备
CN111091144A (zh) * 2019-11-27 2020-05-01 云南电网有限责任公司电力科学研究院 基于深度伪孪生网络的图像特征点匹配方法及装置
CN111314733A (zh) * 2020-01-20 2020-06-19 北京百度网讯科技有限公司 用于评估视频清晰度的方法和装置
CN111314733B (zh) * 2020-01-20 2022-06-10 北京百度网讯科技有限公司 用于评估视频清晰度的方法和装置
CN111340905A (zh) * 2020-02-13 2020-06-26 北京百度网讯科技有限公司 图像风格化方法、装置、设备和介质
CN111340905B (zh) * 2020-02-13 2023-08-04 北京百度网讯科技有限公司 图像风格化方法、装置、设备和介质
CN111754503A (zh) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 基于两通道卷积神经网络的肠镜退镜超速占比监测方法
CN111754503B (zh) * 2020-07-01 2023-12-08 武汉楚精灵医疗科技有限公司 基于两通道卷积神经网络的肠镜退镜超速占比监测方法
CN112016041A (zh) * 2020-08-27 2020-12-01 重庆大学 基于格拉姆求和角场图像化和Shortcut-CNN的时间序列实时分类方法
CN112016041B (zh) * 2020-08-27 2023-08-04 重庆大学 基于格拉姆求和角场图像化和Shortcut-CNN的时间序列实时分类方法
CN114760524A (zh) * 2020-12-25 2022-07-15 深圳Tcl新技术有限公司 视频处理方法、装置、智能终端及计算机可读存储介质
CN113591761A (zh) * 2021-08-09 2021-11-02 成都华栖云科技有限公司 一种视频镜头语言识别方法
CN113792654A (zh) * 2021-09-14 2021-12-14 湖南快乐阳光互动娱乐传媒有限公司 视频片段的整合方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
EP3540637A1 (en) 2019-09-18
JP2019534520A (ja) 2019-11-28
KR20190100320A (ko) 2019-08-28
EP3540637B1 (en) 2023-02-01
EP3540637A4 (en) 2020-09-02
KR102281017B1 (ko) 2021-07-22
CN108304755B (zh) 2021-05-18
US11610082B2 (en) 2023-03-21
US10970600B2 (en) 2021-04-06
JP6755395B2 (ja) 2020-09-16
US20210182616A1 (en) 2021-06-17
CN108304755A (zh) 2018-07-20
US20190228264A1 (en) 2019-07-25
TW201833867A (zh) 2018-09-16
TWI672667B (zh) 2019-09-21

Similar Documents

Publication Publication Date Title
WO2018161775A1 (zh) 一种用于图像处理的神经网络模型的训练方法、装置和存储介质
EP3716198A1 (en) Image reconstruction method and device
US9344690B2 (en) Image demosaicing
CN110751649B (zh) 视频质量评估方法、装置、电子设备及存储介质
CN111835983B (zh) 一种基于生成对抗网络的多曝光图高动态范围成像方法及***
TW202042176A (zh) 圖像生成網路的訓練及影像處理方法和裝置、電子設備
WO2023005818A1 (zh) 噪声图像生成方法、装置、电子设备及存储介质
Wang et al. Low-light image enhancement based on virtual exposure
WO2023160426A1 (zh) 视频插帧方法、训练方法、装置和电子设备
Bare et al. Real-time video super-resolution via motion convolution kernel estimation
CN112950596A (zh) 基于多区域多层次的色调映射全向图像质量评价方法
JP7463186B2 (ja) 情報処理装置、情報処理方法及びプログラム
CN110049242A (zh) 一种图像处理方法和装置
CN111429371A (zh) 图像处理方法、装置及终端设备
Fu et al. Screen content image quality assessment using Euclidean distance
CN104182931B (zh) 超分辨率方法和装置
CN112565887A (zh) 一种视频处理方法、装置、终端及存储介质
CN116980549A (zh) 视频帧处理方法、装置、计算机设备和存储介质
CN115471413A (zh) 图像处理方法及装置、计算机可读存储介质和电子设备
Choudhury et al. HDR image quality assessment using machine-learning based combination of quality metrics
CN116797510A (zh) 图像处理方法、装置、计算机设备和存储介质
Sun et al. Explore unsupervised exposure correction via illumination component divided guidance
JP7512150B2 (ja) 情報処理装置、情報処理方法およびプログラム
Chen et al. Low-light image enhancement and acceleration processing based on ZYNQ
CN113034358B (zh) 一种超分辨率图像处理方法以及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18764177

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019524446

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018764177

Country of ref document: EP

Effective date: 20190613

ENP Entry into the national phase

Ref document number: 20197021770

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE