WO2018161775A1 - Training method, apparatus and storage medium for a neural network model for image processing
- Publication number: WO2018161775A1 (PCT/CN2018/075958)
- Authority: WIPO (PCT)
- Prior art keywords: feature, image, network model, intermediate image, loss
Classifications
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2148: Generating training patterns characterised by the process organisation or structure, e.g. boosting cascade
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/045: Combinations of networks
- G06V10/30: Image preprocessing; noise filtering
- G06V10/454: Local feature extraction; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]
- G06V10/751: Image or video pattern matching; comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06V10/758: Involving statistics of pixels or of feature values, e.g. histogram matching
- G06V10/764: Recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
- G06V10/82: Recognition or understanding using neural networks
- G06V10/993: Evaluation of the quality of the acquired pattern
- G06V20/40: Scenes; scene-specific elements in video content
- G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06T5/00: Image enhancement or restoration
Definitions
- the present application relates to the field of computer technology, and in particular, to a training method, apparatus, and storage medium for a neural network model for image processing.
- Neural network models are commonly used to transform the features of an image, such as image color feature conversion, image light-and-shadow feature conversion, or image style feature conversion. Before such processing, a neural network model for image processing needs to be trained.
- the embodiment of the present application provides a training method for a neural network model for image processing, which is applied to an electronic device, and the method includes:
- Embodiments of the present application further propose a training apparatus for a neural network model for image processing, the apparatus comprising a processor and a memory connected to the processor, wherein the memory stores a machine-readable instruction module executable by the processor, the machine-readable instruction module comprising:
- An input acquisition module configured to acquire a plurality of temporally adjacent video frames;
- An output acquisition module configured to process each of the plurality of video frames with a neural network model to output a corresponding intermediate image;
- A loss acquisition module configured to: acquire optical flow information describing the change from an earlier video frame to a later video frame among the plurality of temporally adjacent video frames; obtain the image formed by changing the intermediate image corresponding to the earlier video frame according to the optical flow information; obtain a time loss between the intermediate image corresponding to the later video frame and the changed image; and acquire a feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and a target feature image;
- A model adjustment module configured to adjust the neural network model according to the time loss and the feature loss, and to return to the step of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies a training end condition.
- the embodiment of the present application further provides a non-transitory computer readable storage medium, wherein the storage medium stores machine readable instructions, which are executable by a processor to perform the following operations:
- 1A is a schematic diagram of an implementation environment of a training method for a neural network model for image processing according to an embodiment of the present application
- 1B is a schematic diagram showing an internal structure of an electronic device for implementing a training method for a neural network model for image processing according to an embodiment of the present application;
- FIG. 2 is a schematic flow chart of a training method for a neural network model for image processing according to an embodiment of the present application
- FIG. 3 is a schematic flow chart of a training method for a neural network model for image processing according to another embodiment of the present application.
- FIG. 4 is a training architecture diagram of a neural network model for image processing in an embodiment of the present application.
- FIG. 5 is a structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application
- FIG. 6 is another structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application.
- A neural network model for image processing trained by the traditional training method does not take the temporal consistency between video frames into account; when performing feature conversion on a video, it therefore introduces a large amount of flicker noise, resulting in a poor conversion effect for the video.
- the embodiment of the present application provides a training method, device, and storage medium for a neural network model for image processing.
- In the embodiments, when training the neural network model, the time loss and the feature loss are used together as the basis of feedback adjustment, and the neural network model is adjusted accordingly to train a neural network model that can be used for image processing.
- Specifically, the intermediate image corresponding to the earlier video frame is changed according to the optical flow information describing the change from the earlier video frame to the later video frame, yielding the intermediate image that the later video frame is expected to produce, from which the time loss is obtained.
- This time loss reflects the loss in time consistency between the respective intermediate images of temporally adjacent video frames.
- The trained neural network model thus takes the temporal consistency between video frames into account when performing feature conversion on a video, which greatly reduces the flicker noise introduced during the feature conversion process and improves the conversion effect on the video.
- Moreover, combining the neural network model computation with the processing capability of the electronic device improves computing speed without sacrificing the feature conversion effect on video images, producing a better neural network model for image processing.
- FIG. 1A is a schematic diagram of an implementation environment of a training method for a neural network model for image processing according to an embodiment of the present application.
- In FIG. 1A, the electronic device 1 is integrated with a training apparatus 11 for a neural network model for image processing.
- the electronic device 1 and the user terminal 2 are connected by a network 3, and the network 3 may be a wired network or a wireless network.
- FIG. 1B is a schematic diagram showing the internal structure of an electronic device for implementing a training method for a neural network model for image processing according to an embodiment of the present application.
- the electronic device includes a processor 102, a non-volatile storage medium 103, and an internal memory 104 connected by a system bus 101.
- The non-volatile storage medium 103 of the electronic device stores an operating system 1031 and a training apparatus 1032 for a neural network model for image processing; the training apparatus 1032 is used to implement a training method for a neural network model for image processing.
- the processor 102 of the electronic device is used to provide computing and control capabilities to support the operation of the entire electronic device.
- the internal memory 104 in the electronic device provides an environment for the operation of the training device for the neural network model of image processing in the non-volatile storage medium 103.
- the internal memory 104 can store computer readable instructions that, when executed by the processor 102, can cause the processor 102 to perform a training method for a neural network model of image processing.
- the electronic device can be a terminal or a server.
- the terminal may be a personal computer or a mobile electronic device including at least one of a mobile phone, a tablet, a personal digital assistant, or a wearable device.
- The server can be implemented as a stand-alone server or as a server cluster consisting of multiple physical servers. A person skilled in the art can understand that the structure shown in FIG. 1B is only a block diagram of the part of the structure related to the solution of the present application and does not constitute a limitation on the electronic device to which the solution is applied; a specific electronic device may include more or fewer components than shown in FIG. 1B, combine some components, or arrange the components differently.
- FIG. 2 is a schematic flow chart of a training method of a neural network model for image processing in an embodiment of the present application. This embodiment is mainly illustrated by applying the method to the electronic device in FIG. 1B described above.
- the training method for the neural network model for image processing specifically includes the following steps:
- Step S202 acquiring a plurality of temporally adjacent video frames.
- video refers to data that can be divided into a sequence of still images arranged in chronological order.
- a still image obtained by dividing a video can be used as a video frame.
- Time-adjacent video frames refer to adjacent video frames in a video frame arranged in time series.
- the acquired temporally adjacent video frames may specifically be two or more than two temporally adjacent video frames. For example, if the video frames arranged in time series are p1, p2, p3, p4, ..., p1 and p2 are temporally adjacent video frames, and p1, p2, and p3 are also temporally adjacent video frames.
- a set of training samples is disposed in the electronic device, and a plurality of sets of temporally adjacent video frames are stored in the training sample set, and the electronic device may acquire any set of temporally adjacent video frames from the training sample set.
- The temporally adjacent video frames in the training sample set may be obtained by the electronic device by segmenting videos obtained from the Internet, or by segmenting videos recorded by an imaging device included in the electronic device.
- a plurality of training sample sets may be set in the electronic device, and each training sample set is set with a corresponding training sample set identifier.
- the user can access the training sample set through the electronic device and select a training sample set for training through the electronic device.
- the electronic device may detect a selection instruction that is triggered by the user and carries the identifier of the training sample set, and the electronic device extracts the training sample set identifier in the selection instruction, and acquires the temporally adjacent video frame from the training sample set corresponding to the training sample set identifier.
- Step S204 The plurality of video frames are respectively processed by the neural network model to output the corresponding intermediate image.
- the neural network model refers to a complex network model formed by multiple layers connected to each other.
- the electronic device can train a neural network model, and the neural network model obtained after the training is completed can be used for image processing.
- The neural network model may include multiple feature conversion layers, each with corresponding nonlinear change operators; a layer may have several such operators, and each nonlinear change operator of a feature conversion layer applies a nonlinear change to that layer's input to obtain a feature map as the operation result.
- Each feature conversion layer receives the operation result of the previous layer, and outputs the operation result of the layer to the next layer after its own operation.
- After acquiring the temporally adjacent video frames, the electronic device inputs them into the neural network model, where they pass through the feature conversion layers in sequence. At each feature conversion layer, the electronic device applies the nonlinear change operators corresponding to that layer to the pixel values of the pixels included in the feature map output by the previous layer, and outputs the feature map of the current layer. If the current feature conversion layer is the first feature conversion layer, the feature map output by the previous layer is the input video frame itself.
- the pixel value corresponding to the pixel may specifically be an RGB (Red Green Blue) three-channel color value of the pixel.
- the neural network model to be trained may specifically include three convolution layers, five residual modules, two deconvolution layers, and one convolution layer.
- After the video frame is input into the neural network model, it first passes through the convolution layers: each convolution kernel of a convolution layer performs a convolution operation on the pixel value matrix of its input, yielding one feature map (a matrix of corresponding pixel values) per convolution kernel. The pixel values at corresponding positions of these feature maps are then combined with the bias term corresponding to each feature map, and a feature map is synthesized as the output intermediate image.
- the electronic device can be set to perform a downsampling operation after a convolution operation of one of the convolution layers.
- The method of downsampling may specifically be mean sampling or extremum sampling.
- the resolution of the feature map obtained after the downsampling operation is reduced to 1/4 of the resolution of the input video frame.
- Accordingly, the electronic device needs to set an upsampling operation corresponding to the earlier downsampling operation after the deconvolution operation of the deconvolution layers, so that the resolution of the feature map obtained after upsampling is increased to 4 times the resolution of the feature map before upsampling, ensuring that the output intermediate image has the same resolution as the input video frame.
- The number and types of layers included in the neural network model can be customized or adjusted according to subsequent training results, provided that the resolution of the image input to the neural network model remains consistent with the resolution of the image output by the neural network model.
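- To make the architecture above concrete, the following is a minimal PyTorch sketch of such a transform network (three convolution layers, five residual modules, two deconvolution layers, and one output convolution layer). The channel widths, kernel sizes, and use of instance normalization are illustrative assumptions, and strided convolutions stand in for the convolution-plus-downsampling described above; this is not code from the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)  # residual (skip) connection

class TransformNet(nn.Module):
    """3 conv layers (two of them downsampling to 1/4 resolution),
    5 residual modules, 2 deconv layers upsampling back, 1 output conv."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 9, stride=1, padding=4), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 1/2 resolution
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 1/4 resolution
            *[ResidualBlock(128) for _ in range(5)],
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 9, stride=1, padding=4),  # output resolution equals input resolution
        )

    def forward(self, x):
        return self.net(x)
```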
- Step S206: Acquire optical flow information describing the change from an earlier video frame to a later video frame among the plurality of temporally adjacent video frames.
- Optical flow can represent the moving speed of the grayscale pattern in an image. All the optical flows in an image, arranged by spatial position, constitute an optical flow field. The optical flow field characterizes how pixels in the image change and can be used to determine the motion of corresponding pixels between images.
- The earlier video frame refers to the frame with the earlier time stamp among the temporally adjacent video frames; the later video frame refers to the frame with the later time stamp. For example, if temporally adjacent video frames are arranged in order as x1, x2, and x3, then x1 is an earlier video frame relative to x2 and x3, while x2 is a later video frame relative to x1 and an earlier video frame relative to x3.
- The optical flow information describing the change from the earlier video frame to the later video frame may be represented by the optical flow field between the earlier video frame and the later video frame.
- The method for calculating the optical flow information may specifically be a differential optical flow algorithm derived from the optical flow constraint equation, a region-matching-based optical flow algorithm, an energy-based optical flow algorithm, a phase-based optical flow algorithm, a neurodynamic optical flow algorithm, or the like; this is not specifically limited in the embodiments of the present application.
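- As an illustration only, dense optical flow between two frames can be computed with OpenCV's Farneback method, one concrete instance of the differential algorithms listed above; the parameter values below are common defaults, not values from the patent:

```python
import cv2

def dense_flow(frame_earlier, frame_later):
    """Per-pixel optical flow from the earlier frame to the later frame."""
    g0 = cv2.cvtColor(frame_earlier, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_later, cv2.COLOR_BGR2GRAY)
    # Returns an (H, W, 2) array: flow[y, x] = (dx, dy) displacement in pixels.
    return cv2.calcOpticalFlowFarneback(g0, g1, None, pyr_scale=0.5, levels=3,
                                        winsize=15, iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
```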
- According to such a method, the electronic device may calculate the optical flow information describing the change from the earlier video frame to the later video frame, obtaining the optical flow corresponding to each pixel in the earlier video frame. The electronic device can also select feature points from the earlier video frame and use a sparse optical flow calculation method to compute the optical flow corresponding only to the selected feature points.
- For example, if the position of pixel A in the earlier video frame is (x1, y1) and its position in the later video frame is (x2, y2), then the velocity vector of pixel A is (x2 - x1, y2 - y1). The vector field formed by the velocity vectors of the corresponding pixels in the earlier video frame constitutes the optical flow field describing the change from the earlier video frame to the later video frame.
- The electronic device may calculate optical flow information between two adjacent video frames among the temporally adjacent video frames, and may also calculate optical flow information between two non-adjacent video frames. For example, if the temporally adjacent video frames are arranged in order as x1, x2, and x3, the electronic device can calculate the optical flow information between x1 and x2 and between x2 and x3, and can also calculate the optical flow information between x1 and x3.
- When calculating, according to such a method, the optical flow information describing the change from the earlier video frame to the later video frame, the electronic device may also determine the confidence of the calculated optical flow information. The confidence is in one-to-one correspondence with the optical flow information and indicates the reliability of the corresponding optical flow information: the higher the confidence, the more accurately the calculated optical flow information represents the motion of pixels in the image.
- Step S208: Acquire the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information.
- The electronic device may move the pixels included in the intermediate image corresponding to the earlier video frame according to the optical flow information describing the change from the earlier video frame to the later video frame, obtaining the image formed by the changed pixels, that is, the expected pixel value distribution of the intermediate image corresponding to the later video frame.
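- A minimal PyTorch sketch of this change (warping) follows, assuming a dense flow field sampled with `grid_sample`; the flow convention and interpolation mode are assumptions, not specified in the patent:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp `image` (N,C,H,W) by dense flow `flow` (N,2,H,W), where
    flow[:,0] is horizontal and flow[:,1] vertical displacement in pixels."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(image.device)  # pixel grid (2,H,W)
    coords = base.unsqueeze(0) + flow                             # displaced coordinates
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1), align_corners=True)
```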
- The electronic device may use the optical flow information between two adjacent video frames to change the intermediate image corresponding to the earlier frame of the pair, obtaining the expected intermediate image corresponding to the later frame of the pair according to the optical flow information.
- temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images of x1, x2, and x3 output by the neural network model are sequentially sorted into y1, y2, and y3.
- the optical flow information of x1 changing to x2 is g1
- the optical flow information of x2 changing to x3 is g2
- The electronic device can change y1 according to g1 to obtain z2, and change z2 according to g2 to obtain z3, where z2 is the expected intermediate image corresponding to x2 and z3 is the expected intermediate image corresponding to x3.
- The electronic device may also use the optical flow information between two non-adjacent video frames among the temporally adjacent video frames, together with the intermediate image corresponding to the earlier frame of the pair, to obtain the expected intermediate image corresponding to the later frame of the pair according to the optical flow information.
- temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images of x1, x2, and x3 output by the neural network model are sequentially sorted into y1, y2, and y3.
- If the optical flow information of x1 changing to x3 is g3, the electronic device can change y1 according to g3 to obtain z3, where z3 is the expected intermediate image corresponding to x3.
- When changing the pixels included in the intermediate image corresponding to the earlier video frame according to the corresponding optical flow information, the electronic device may also use the confidence of the optical flow information as a weight on the image formed by the changed pixels.
- Step S210: Acquire the time loss between the intermediate image corresponding to the later video frame and the changed image acquired in step S208.
- The time loss represents the difference, in the time domain, between the images obtained by processing temporally adjacent video frames with the neural network model, relative to the change of those video frames themselves in the time domain.
- The electronic device may compare the intermediate image corresponding to the later video frame with the image obtained by changing the intermediate image corresponding to the earlier video frame according to the optical flow information, obtain the difference between the two, and determine the time loss between the intermediate image corresponding to the later video frame and the changed image according to that difference.
- For example, let the earlier video frame be x_{t-1}, the later video frame be x_t, and the optical flow information describing the change from x_{t-1} to x_t be G_t. The intermediate image output for x_{t-1} after processing by the neural network model is y_{t-1}, and the intermediate image output for x_t is y_t. The electronic device can change y_{t-1} according to G_t to obtain z_t, and z_t can be used as the image that the neural network model is expected to output for the video frame x_t. The electronic device may then compare the difference between z_t and y_t, obtaining the time loss between y_t and z_t.
- temporally adjacent video frames are sequentially arranged in the order of x1, x2, and x3, and the intermediate images corresponding to x1, x2, and x3 output by the neural network model are sequentially sorted into y1, y2, and y3.
- the optical flow information in which x1 changes to x2 is g1
- the optical flow information in which x2 changes to x3 is g2
- the optical flow information in which x1 changes to x3 is g3.
- The electronic device can change y1 according to g1 to obtain z2, change z2 according to g2 to obtain z3, and change y1 according to g3 to obtain z'3; z2 is the expected intermediate image corresponding to x2, while z3 and z'3 are both expected intermediate images corresponding to x3. The electronic device can compare y2 with z2 to obtain the time loss between them, and can compare y3 with z3 and with z'3, obtaining the time loss between y3 and z3 and z'3 according to the weights of z3 and z'3.
- Step S212 Acquire feature loss between the intermediate image corresponding to the plurality of temporally adjacent video frames and the target feature image.
- The image feature that the neural network model is to convert images to, when used for feature conversion, is the image feature corresponding to the target feature image.
- the feature loss is the difference between the image feature corresponding to the intermediate image output by the neural network model and the image feature corresponding to the target feature image.
- the image features may specifically be image color features, image light features or image style features, and the like.
- the target feature image may be a target color feature image, a target light shadow feature image or a target style feature image, and the like; the feature loss of the intermediate image and the target feature image may specifically be a color feature loss, a light feature loss or a style feature loss.
- the electronic device may first determine an image feature to be trained, and acquire an image conforming to the image feature as the target feature image.
- The electronic device can use a trained neural network model for extracting image features to extract the image features corresponding to the intermediate image and to the target feature image, compare the two sets of image features to obtain the difference between them, and determine the feature loss between the intermediate image and the target feature image according to that difference.
- For example, suppose the neural network model is used to transform the image style features of images, the target style feature image is S, and the temporally adjacent video frames number two: the earlier video frame x_{t-1} and the later video frame x_t. The intermediate image output for x_{t-1} after processing by the neural network model is y_{t-1}, and the intermediate image output for x_t is y_t. The electronic device can compare the difference between y_{t-1} and S and the difference between y_t and S, respectively, to obtain the style feature loss between y_{t-1} and S and the style feature loss between y_t and S.
- Step S214 adjusting the neural network model according to the time loss and the feature loss, and returning to the step S202 of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
- the process of training the neural network model is a process of determining a nonlinear change operator corresponding to each feature conversion layer in the neural network model to be trained.
- The electronic device may first initialize the nonlinear change operators corresponding to each feature conversion layer of the neural network model to be trained, and continuously optimize these initialized operators in the subsequent training process; the optimized operators obtained when training ends are used as the nonlinear change operators of the trained neural network model.
- Specifically, the electronic device may construct a time domain loss function according to the time loss, construct a spatial domain loss function according to the feature loss, and combine the two to obtain a hybrid loss function. The electronic device can then calculate the rate of change of the hybrid loss function with respect to the nonlinear change operator of each feature conversion layer, and adjust the operator of each layer according to the calculated rate of change so that the rate of change becomes smaller, thereby optimizing the neural network model during training.
- the training end condition may be that the number of trainings of the neural network model reaches a preset number of trainings.
- the electronic device can count the number of trainings when training the neural network model. When the counting reaches the preset training number, the electronic device can determine that the neural network model satisfies the training end condition and end training of the neural network model.
- the training end condition may also be that the mixing loss function satisfies the convergence condition.
- Specifically, the rate of change of the hybrid loss function with respect to the nonlinear change operators of each feature conversion layer, calculated after each round of training, may be recorded; when the calculated rate of change gradually approaches a certain value, the electronic device can determine that the neural network model satisfies the training end condition and end the training of the neural network model.
- In the above training method for a neural network model for image processing, when training the neural network model, the time loss and the feature loss are used together as the basis of feedback adjustment, and the neural network model is adjusted accordingly to train a model usable for image processing. The intermediate image corresponding to the earlier video frame is changed according to the optical flow information describing the change from the earlier video frame to the later video frame, yielding the expected intermediate image corresponding to the later video frame, from which the time loss is obtained. This time loss reflects the loss in time consistency between the respective intermediate images of temporally adjacent video frames. When performing feature conversion on a video, the trained neural network model therefore takes the temporal consistency between video frames into account, greatly reducing the flicker noise introduced during feature conversion and improving the conversion effect on the video. Moreover, combining the neural network model computation with the processing capability of the electronic device improves computing speed without sacrificing the feature conversion effect on video images, producing a better neural network model for image processing.
- In an embodiment, adjusting the neural network model according to the time loss and the feature loss specifically includes: acquiring the content loss between each intermediate image and the input video frame corresponding to it; generating a training cost based on the time loss, the feature loss, and the content loss; and adjusting the neural network model according to the training cost.
- the content loss refers to the difference in image content between the intermediate image output through the neural network model and the corresponding input video frame.
- The electronic device may use a trained neural network model for extracting image content features to extract the image content features of the intermediate image and of its corresponding input video frame, compare the two to obtain the difference between them, and determine the content loss between the intermediate image and the corresponding video frame according to that difference.
- The electronic device may construct a time domain loss function according to the time loss and jointly construct a spatial domain loss function according to the feature loss and the content loss, and then generate a training cost positively correlated with both the time domain loss function and the spatial domain loss function. The electronic device can then calculate the rate of change of the training cost with respect to the nonlinear change operator of each feature conversion layer, and adjust each operator according to the calculated rate of change so that it becomes smaller, thereby optimizing the neural network model during training.
- In an embodiment, the electronic device may further perform denoising processing on the intermediate images output by the neural network model. Specifically, based on a Total Variation (TV) denoising algorithm, the electronic device may determine a total variation minimization term for denoising the edge pixels of the intermediate image, and combine this term with the feature loss and the content loss to jointly construct the spatial domain loss function used for training the neural network model. Denoising the image with the total variation minimization term improves the conversion effect of the neural network model when performing feature conversion on video.
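- A sketch of such a total variation minimization term, assuming the common absolute-difference form over neighboring pixels (the exact form used in the patent is not specified here):

```python
def tv_minimization(y):
    """Total variation term: penalizes differences between neighboring pixels
    of the intermediate image y (N,C,H,W), which suppresses noise at edges."""
    dh = (y[:, :, 1:, :] - y[:, :, :-1, :]).abs().mean()  # vertical neighbors
    dw = (y[:, :, :, 1:] - y[:, :, :, :-1]).abs().mean()  # horizontal neighbors
    return dh + dw
```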
- In this embodiment, when training the neural network model, the time loss, the feature loss, and the content loss are used together as the basis of feedback adjustment to train a neural network model usable for image processing; the three dimensions of time, content, and feature ensure the accuracy of image feature conversion and improve the conversion effect of the trained neural network model when performing feature conversion on video.
- In an embodiment, step S210 specifically includes: subtracting the pixel values at corresponding pixel positions of the intermediate image corresponding to the later video frame and of the changed image to obtain a difference distribution map; and determining, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- The difference distribution map obtained by this pixel-wise subtraction may be a pixel value difference matrix.
- The electronic device can perform a dimensionality reduction operation on the difference distribution map to obtain a time loss value. Once the electronic device has selected a dimensionality reduction method for the first calculation of the time loss, subsequent time loss calculations use the same method.
- In this embodiment, the time loss between the intermediate image corresponding to the later video frame and the changed image is computed from the differences between the pixel values at corresponding pixel positions of the two images, which makes the calculation of the time loss more accurate.
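- A minimal sketch of this time loss, assuming a mean-square dimensionality reduction of the difference distribution map (the description requires some fixed reduction method but does not mandate which one):

```python
def temporal_loss(y_t, z_t):
    """Time loss between y_t, the intermediate image of the later frame, and
    z_t, the earlier frame's intermediate image changed by optical flow."""
    diff = y_t - z_t           # difference distribution map (pixel value difference matrix)
    return (diff ** 2).mean()  # dimensionality reduction to a scalar loss value
```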
- In an embodiment, the step of acquiring the content loss between an intermediate image and its corresponding input video frame comprises: inputting the video frame and the corresponding intermediate image into an evaluation network model; obtaining the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by layers included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the video frame.
- the evaluation network model is used to extract image features of the input image.
- the evaluation network model may specifically be an Alexnet network model, a VGG (Visual Geometry Group) network model, or a GoogLeNet network.
- the layer included in the evaluation network model corresponds to a plurality of feature extraction factors, and each feature extraction factor extracts different features.
- A feature map is the image processing result obtained by applying the change operators of a layer in the evaluation network model to the input image; the result is an image feature matrix composed of the response values produced when the change operators process the input image matrix.
- the evaluation network model can obtain a matrix of pixel values corresponding to the input video frame and a matrix of pixel values corresponding to the corresponding intermediate image.
- the layer included in the evaluation network model operates on the pixel value matrix corresponding to the input video frame or the intermediate image according to the feature extraction factor corresponding to the layer, and obtains a corresponding response value to form a feature map.
- the characteristics of different layers extracted in the evaluation network model are different.
- the electronic device may previously set a feature map of the layer output of the extracted image content feature in the evaluation network model as a feature map for performing content loss calculation.
- the layer for extracting image content features in the evaluation network model may be one layer or multiple layers.
- After acquiring the feature map corresponding to the intermediate image and the feature map corresponding to its input video frame, the electronic device subtracts the pixel values at corresponding positions of the two feature maps to obtain a content difference matrix, and performs a dimensionality reduction operation on the content difference matrix to obtain the content loss.
- In this embodiment, the evaluation network model extracts the image content features of the video frame before feature conversion and of the intermediate image after feature conversion, and the content loss is calculated from the extracted feature maps, which makes the calculation of the content loss more accurate.
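- A sketch of this content loss using torchvision's pretrained VGG-19 as the evaluation network model; the layer index (conv4_2) and the mean-square reduction are illustrative assumptions, and input normalization is omitted for brevity:

```python
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the evaluation network stays fixed during training

def features(x, layer_idx):
    """Feature map output by the evaluation network layer at `layer_idx`."""
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_idx:
            return x
    raise IndexError(layer_idx)

def content_loss(frame, intermediate, layer_idx=21):  # 21 = conv4_2 (assumed choice)
    diff = features(intermediate, layer_idx) - features(frame, layer_idx)
    return (diff ** 2).mean()  # reduce the content difference matrix to a scalar
```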
- In an embodiment, step S212 specifically includes: inputting the intermediate image and the target feature image into the evaluation network model; acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by layers included in the evaluation network model; and determining the feature loss between the intermediate image and the target feature image according to the two feature maps.
- the electronic device may previously set a feature map of the layer output of the extracted image feature in the evaluation network model as a feature map for performing feature loss calculation.
- the layer for extracting image features in the evaluation network model may be one layer or multiple layers.
- In this embodiment, the evaluation network model extracts the image features of the target feature image and of the feature-converted intermediate image, and the feature loss is calculated from the extracted feature maps, which makes the calculation of the feature loss more accurate.
- In an embodiment, the step of determining the feature loss between the intermediate image and the target feature image according to their feature maps specifically includes: determining the feature matrix corresponding to the intermediate image according to the feature map corresponding to the intermediate image; determining the feature matrix corresponding to the target feature image according to the feature map corresponding to the target feature image; subtracting the values at corresponding positions of the two feature matrices to obtain a feature difference matrix; and determining the feature loss between the intermediate image and the target feature image according to the feature difference matrix.
- When the neural network model is used to perform image style feature conversion on images, the feature matrix corresponding to the intermediate image may specifically be a style feature matrix.
- the style feature matrix is a matrix that reflects image style features.
- the style feature matrix may specifically be a Gram Matrix.
- The electronic device can take the inner product of the feature map corresponding to the intermediate image to obtain the corresponding Gram matrix as the style feature matrix of the intermediate image, and take the inner product of the feature map corresponding to the target style image to obtain the corresponding Gram matrix as the style feature matrix of the target style image. The electronic device can then subtract the values at corresponding positions of the two style feature matrices to obtain a style difference feature matrix, and perform a dimensionality reduction operation on it to obtain the style feature loss.
- the feature matrix that can reflect the image features is used to calculate the feature loss between the image obtained by the feature conversion and the target feature image, so that the calculation of the feature loss is more accurate.
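- A sketch of this Gram-matrix style feature loss, reusing the hypothetical `features` helper from the content loss sketch; the layer indices and the normalization are assumptions:

```python
def gram_matrix(fmap):
    """Style feature matrix: inner product of a feature map with itself."""
    n, c, h, w = fmap.shape
    f = fmap.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # (N,C,C), normalized

def style_loss(intermediate, target_style, layer_indices=(1, 6, 11, 20)):
    """Sum the style feature loss over several evaluation network layers."""
    loss = 0.0
    for idx in layer_indices:
        g_mid = gram_matrix(features(intermediate, idx))
        g_tgt = gram_matrix(features(target_style, idx))
        loss = loss + ((g_mid - g_tgt) ** 2).mean()  # reduce style difference matrix
    return loss
```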
- In an embodiment, the electronic device may select a VGG-19 network model as the evaluation network model; this network model includes 16 convolution layers and 5 pooling layers. The features extracted by the fourth convolution layer of the model can reflect image content characteristics, and the features extracted by the first, second, third, and fourth convolution layers can reflect image style features. The electronic device may acquire the feature map corresponding to the intermediate image and the feature map corresponding to its input video frame output by the fourth convolution layer and calculate the content loss between the intermediate image and the corresponding video frame based on them; it may likewise acquire the feature maps output by the first, second, third, and fourth convolution layers and calculate the style feature loss based on the acquired feature maps.
- In an embodiment, adjusting the neural network model according to the training cost includes: determining, in reverse order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator of each layer; and adjusting, in reverse order, the nonlinear change operator of each layer so that the rate of change of the training cost with respect to the adjusted operator decreases.
- Starting from the last layer of the neural network model, the electronic device may determine the rate of change of the training cost with respect to the nonlinear change operator of the current layer, and proceed in reverse order to determine the rate of change with respect to the operator of each layer. The electronic device can then sequentially adjust, in reverse order, the nonlinear change operator of each layer so that the rate of change of the training cost with respect to the adjusted operator decreases.
- For example, let the training cost be L, the nonlinear change operator of the first layer in reverse order be z, that of the second layer in reverse order be b, and that of the third layer in reverse order be c. The corresponding rates of change are ∂L/∂z, ∂L/∂b, and ∂L/∂c. When solving these rates of change, chain derivation conducts the gradient layer by layer to the preceding layer. The electronic device can then sequentially adjust the nonlinear change operators z, b, c, and so on, down to the operator of the first layer of the neural network model (that is, the last layer in reverse order), so that the obtained rates of change decrease.
- The training cost may be specifically expressed as:

  L_hybrid = L_spatial(x_i, y_i, s) + λ · L_temporal(y_t, y_{t-1})

  where L_hybrid represents the training cost; L_spatial(x_i, y_i, s) represents the spatial domain loss function; L_temporal(y_t, y_{t-1}) represents the time domain loss function, generated from the time loss; and λ is the weight corresponding to the time domain loss function.
- The spatial domain loss function can be expressed as:

  L_spatial(x_i, y_i, s) = α · ℓ_content(x_i, y_i; l) + β · ℓ_feature(y_i, s; l) + γ · R_tv(y_i)

  where l denotes a layer used for extracting image features in the evaluation network model; ℓ_content(x_i, y_i; l) represents the content loss between the image input to the neural network model and the image it outputs; ℓ_feature(y_i, s; l) represents the feature loss between the image output by the neural network model and the target feature image; R_tv represents the total variation minimization term; and α, β, and γ are the weights corresponding to the respective losses. For example, α may take the value 1, β the value 1, and γ the value 10^4.
- In this embodiment, the rate of change of the training cost with respect to the nonlinear change operator of each layer of the neural network model is solved by the back-propagation method, and each operator is adjusted so that the calculated rate of change is reduced, training the neural network model so that it performs better at image conversion.
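- In modern frameworks, this reverse-order solution of the rates of change is what automatic differentiation performs. The following sketch of one adjustment step reuses the hypothetical TransformNet, warp, and loss helpers from the earlier sketches; the optimizer, learning rate, and the value of λ are assumptions, while the spatial weights follow the example values given above:

```python
import torch.optim as optim

net = TransformNet()
optimizer = optim.Adam(net.parameters(), lr=1e-3)

def train_step(x_prev, x_curr, flow, style_image, lam=1.0):
    """One feedback adjustment on L_hybrid = L_spatial + lam * L_temporal."""
    y_prev, y_curr = net(x_prev), net(x_curr)
    z_curr = warp(y_prev, flow)                      # expected intermediate image for x_curr
    spatial = (content_loss(x_curr, y_curr)          # alpha = 1
               + style_loss(y_curr, style_image)     # beta = 1
               + 1e4 * tv_minimization(y_curr))      # gamma = 1e4 (example value above)
    loss = spatial + lam * temporal_loss(y_curr, z_curr)
    optimizer.zero_grad()
    loss.backward()   # back propagation: gradients conducted layer by layer in reverse order
    optimizer.step()  # adjust the operators so that the training cost decreases
    return loss.item()
```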
- a training method for a neural network model for image processing specifically includes the following steps:
- Step S302 acquiring a plurality of temporally adjacent video frames.
- Step S304 the plurality of video frames are respectively processed by a neural network model to cause the neural network model to output a corresponding intermediate image.
- Step S306: Acquire optical flow information describing the change from the earlier video frame to the later video frame.
- Step S308: Obtain the image formed by changing the intermediate image corresponding to the earlier video frame according to the optical flow information.
- Step S310: Subtract the pixel values at corresponding pixel positions of the intermediate image corresponding to the later video frame and of the changed image to obtain a difference distribution map, and determine, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- Step S312 inputting the intermediate image and the target feature image into the evaluation network model; acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layer included in the evaluation network model; determining, according to the feature map corresponding to the intermediate image, the feature matrix corresponding to the intermediate image; determining, according to the feature map corresponding to the target feature image, the feature matrix corresponding to the target feature image; subtracting the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determining, according to the feature difference matrix, the feature loss between the intermediate image and the target feature image.
- Step S314 inputting the video frame and the intermediate image corresponding to the video frame into the evaluation network model; acquiring the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layer included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the video frame.
- Step S316 generating a training cost based on time loss, feature loss, and content loss.
- Step S318 determining, in reverse order according to the sequence of layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer; and adjusting, in this reverse order, the nonlinear change operator corresponding to each layer included in the neural network model so that the training cost decreases with the rate of change of the nonlinear change operator corresponding to the correspondingly adjusted layer.
- Step S320 determining whether the neural network model satisfies the training end condition; if the neural network model satisfies the training end condition, the process proceeds to step S322; if not, the process returns to step S302.
- Step S322 ending the training of the neural network model.
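Purely as an illustration of steps S302 to S322: in the sketch below, model, flow_fn, warp_fn, the losses helper, and the optimizer are all assumptions standing in for components described above, and a PyTorch-style autograd API is assumed.

```python
def train(model, frame_pairs, s, flow_fn, warp_fn, losses, optimizer, max_steps=10000):
    # frame_pairs yields temporally adjacent frames (x_prev, x_next); s is the
    # target feature image. All components are hypothetical stand-ins.
    for step, (x_prev, x_next) in enumerate(frame_pairs):               # S302
        y_prev, y_next = model(x_prev), model(x_next)                   # S304
        flow = flow_fn(x_prev, x_next)                                  # S306
        y_warped = warp_fn(y_prev, flow)                                # S308
        t_loss = losses.temporal(y_next, y_warped)                      # S310
        f_loss = losses.feature(y_prev, s) + losses.feature(y_next, s)  # S312
        c_loss = losses.content(y_prev, x_prev) + losses.content(y_next, x_next)  # S314
        cost = losses.hybrid(t_loss, f_loss, c_loss)                    # S316
        optimizer.zero_grad()                                           # S318: back-propagate and adjust
        cost.backward()                                                 # the operators in reverse layer order
        optimizer.step()
        if step + 1 >= max_steps:                                       # S320: training end condition
            break                                                       # S322: training ends
```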
- the time loss, the feature loss, and the content loss are used as the feedback adjustment basis to adjust the neural network model; training the neural network model in the three dimensions of time, feature, and content improves the training effect of the neural network model.
- FIG. 4 is a diagram showing a training architecture of a neural network model for image processing in one embodiment of the present application.
- the neural network model in this embodiment includes three convolution layers, five residual modules, two deconvolution layers, and one convolution layer; the electronic device may input the earlier video frame x t-1 and the later video frame x t into the neural network model respectively, and obtain the intermediate images output by the neural network model as y t-1 and y t .
- the electronic device can obtain the time domain loss function of y t-1 and y t according to the optical flow information between x t-1 and x t ; then x t-1 , x t , y t-1 , y t and the target feature image S are input into the evaluation network model, and the content losses between x t-1 and y t-1 and between x t and y t , and the feature losses between y t-1 and S and between y t and S, are obtained through the feature maps output by the layers included in the evaluation network model, resulting in the spatial domain loss function.
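A minimal sketch of such an image transformation network, assuming PyTorch; the kernel sizes, channel widths, and activations are assumptions, since the embodiment fixes only the layer counts (three convolution layers, five residual modules, two deconvolution layers, and one final convolution layer):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

# Three convolutions, five residual modules, two deconvolutions, one final convolution.
transform_net = nn.Sequential(
    nn.Conv2d(3, 32, 9, padding=4), nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    *[ResidualBlock(128) for _ in range(5)],
    nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, 9, padding=4))
```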
- the neural network model may be used for video feature conversion.
- the electronic device may divide the video that needs feature conversion into temporally adjacent video frames, input the divided video frames into the trained neural network model in sequence, and obtain, after processing by the neural network model, the feature-converted output image corresponding to each video frame; the output images are then combined according to the time sequence of the corresponding input video frames to obtain the feature-converted video.
- the neural network model can perform feature conversion on multiple video frames at the same time.
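For illustration, a sketch of this split, convert, and recombine flow using OpenCV; cv2 and the stylize callable are assumptions, since the patent does not prescribe a particular video I/O library.

```python
import cv2

def convert_video(path_in, path_out, stylize):
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = None
    while True:
        ok, frame = cap.read()        # split the video into frames
        if not ok:
            break
        out = stylize(frame)          # feature-convert one frame
        if writer is None:
            h, w = out.shape[:2]
            writer = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(out)             # recombine frames in the original time order
    cap.release()
    if writer is not None:
        writer.release()
```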
- a training apparatus 500 for a neural network model for image processing is provided; the apparatus specifically includes: an input acquisition module 501, an output acquisition module 502, a loss acquisition module 503, and a model adjustment module 504.
- the input obtaining module 501 is configured to acquire a plurality of temporally adjacent video frames.
- the output obtaining module 502 is configured to process the plurality of video frames respectively through a neural network model to output the corresponding intermediate image.
- the loss obtaining module 503 is configured to acquire optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquire the image obtained after the intermediate image corresponding to the earlier video frame is changed according to the optical flow information; acquire the time loss between the intermediate image corresponding to the later video frame and the changed image; and acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image.
- the model adjustment module 504 is configured to adjust the neural network model according to the time loss and the feature loss, and return to the process of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
- the model adjustment module 504 is further configured to acquire the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames; generate a training cost according to the time loss, the feature loss, and the content loss; and adjust the neural network model according to the training cost.
- when training the neural network model, the neural network model is adjusted by using the time loss, the feature loss, and the content loss as the feedback adjustment basis, so as to train a neural network model that can be used for image processing; the three dimensions of time, content, and feature ensure the accuracy of image feature conversion and improve the conversion effect of the trained neural network model when performing feature conversion on a video.
- the model adjustment module 504 is further configured to input the intermediate image and the video frame corresponding to the intermediate image into an evaluation network model; acquire the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layer included in the evaluation network model; and determine the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the corresponding video frame.
- the image content of the video frame before feature conversion and of the intermediate image after feature conversion is extracted by the evaluation network model, and the content loss between the corresponding input images is calculated from the extracted feature maps, which makes the calculation of the content loss more accurate.
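As a sketch of this idea, assuming torchvision's pretrained VGG-16 as the evaluation network and a mean-squared error over one layer's feature maps (both assumptions; the patent does not name the evaluation network or the distance):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Truncate VGG-16 after relu3_3 (an arbitrary choice of feature-extraction layer).
features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()

def content_loss(intermediate, video_frame):
    # Compare feature maps of the converted image and its source frame.
    with torch.no_grad():
        target = features(video_frame)
    return F.mse_loss(features(intermediate), target)
```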
- the model adjustment module 504 is further configured to determine, in reverse order according to the order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer; and adjust, in this reverse order, the nonlinear change operator corresponding to each layer included in the neural network model so that the training cost decreases with the rate of change of the nonlinear change operator corresponding to the correspondingly adjusted layer.
- the rate of change of the training cost with respect to the nonlinear change operator corresponding to each layer of the neural network model is solved by the back-propagation method, and the nonlinear change operator corresponding to each layer is adjusted so that the calculated rate of change is reduced, thereby training the neural network model; as a result, the trained neural network model performs better at image conversion.
- the loss obtaining module 503 is further configured to subtract the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map, and to determine, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- the time loss between the intermediate image corresponding to the later video frame and the changed image is thus obtained from the differences between the pixel values at corresponding pixel positions of the two images, which makes the calculation of the time loss more accurate.
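A hedged sketch of this pixel-wise time loss, assuming a dense displacement field and bilinear warping via PyTorch's grid_sample; the warping implementation is an assumption, not fixed by the patent:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    # image: (N, C, H, W); flow: (N, 2, H, W) displacement in pixels (x, y).
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    gx = 2 * (xs.float().unsqueeze(0) + flow[:, 0]) / (w - 1) - 1
    gy = 2 * (ys.float().unsqueeze(0) + flow[:, 1]) / (h - 1) - 1
    grid = torch.stack((gx, gy), dim=-1)      # (N, H, W, 2), normalized to [-1, 1]
    return F.grid_sample(image, grid, align_corners=True)

def time_loss(y_later, y_earlier, flow):
    changed = warp(y_earlier, flow)           # move the earlier intermediate image by the flow
    diff = y_later - changed                  # difference distribution map
    return (diff ** 2).mean()
```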
- the loss obtaining module 503 is further configured to input the intermediate image and the target feature image into the evaluation network model; acquire the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layer included in the evaluation network model; and determine the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
- the image features of the target feature image and of the feature-converted intermediate image are extracted by the evaluation network model, and the feature loss between the corresponding input images is calculated from the extracted feature maps, which makes the calculation of the feature loss more accurate.
- the loss obtaining module 503 is further configured to determine, according to the feature map corresponding to the intermediate image, the feature matrix corresponding to the intermediate image; determine, according to the feature map corresponding to the target feature image, the feature matrix corresponding to the target feature image; subtract the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determine, according to the feature difference matrix, the feature loss between the intermediate image and the target feature image.
- a feature matrix that reflects the image features is used to calculate the feature loss between the image obtained by feature conversion and the target feature image, which makes the calculation of the feature loss more accurate.
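One common choice of such a feature matrix, an assumption here since the patent does not name one, is the Gram matrix of a layer's feature map:

```python
import torch

def gram_matrix(fmap):
    # fmap: (N, C, H, W) feature map from one evaluation-network layer.
    n, c, h, w = fmap.shape
    f = fmap.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (N, C, C) feature matrix

def feature_loss(fmap_intermediate, fmap_target):
    diff = gram_matrix(fmap_intermediate) - gram_matrix(fmap_target)  # feature difference matrix
    return (diff ** 2).sum()
```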
- FIG. 6 is another structural block diagram of a training apparatus for a neural network model for image processing according to an embodiment of the present application.
- the apparatus includes a processor 610, and a memory 630 coupled to the processor 610 via a bus 620.
- a machine readable instruction module executable by the processor 610 is stored in the memory 630.
- the machine readable instruction module includes an input acquisition module 601, an output acquisition module 602, a loss acquisition module 603, and a model adjustment module 604.
- the input obtaining module 601 is configured to acquire a plurality of temporally adjacent video frames.
- the output obtaining module 602 is configured to process the plurality of video frames respectively through a neural network model to output the corresponding intermediate image.
- the loss acquisition module 603 is configured to acquire optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquire the image obtained after the intermediate image corresponding to the earlier video frame is changed according to the optical flow information; acquire the time loss between the intermediate image corresponding to the later video frame and the changed image; and acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image.
- the model adjustment module 604 is configured to adjust the neural network model according to the time loss and the feature loss, and return to the process of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies the training end condition.
- the specific functions of the input obtaining module 601, the output obtaining module 602, the loss obtaining module 603, and the model adjusting module 604 are the same as those of the foregoing input obtaining module 501, output obtaining module 502, loss obtaining module 503, and model adjustment module 504, and will not be described again here.
- when training the neural network model, the training apparatus for the neural network model for image processing adjusts the neural network model by using the time loss and the feature loss as the feedback adjustment basis, so as to train a neural network model that can be used for image processing.
- the intermediate image corresponding to the earlier video frame is changed according to the optical flow information from the earlier video frame to the later video frame, and the changed image serves as the expected intermediate image corresponding to the later video frame, from which the time loss is obtained. This time loss reflects the loss in temporal consistency between the respective intermediate images of temporally adjacent video frames.
- the trained neural network model therefore takes the temporal consistency between video frames into account when performing feature conversion on a video, which greatly reduces the flicker noise introduced during the feature conversion process and improves the conversion effect of the feature conversion of the video.
- the neural network model calculation is combined with the computing capability of the processor of the electronic device to process the video images, which improves the computing speed without sacrificing the feature conversion effect on the video images, thereby producing a better neural network model for image processing.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.
Claims (21)
- 1. A training method for a neural network model for image processing, applied to an electronic device, the method comprising: acquiring a plurality of temporally adjacent video frames; processing the plurality of video frames respectively through a neural network model so that the neural network model outputs corresponding intermediate images; acquiring optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquiring the image obtained after the intermediate image corresponding to the earlier video frame is changed according to the optical flow information; acquiring the time loss between the intermediate image corresponding to the later video frame and the changed image; acquiring the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and a target feature image; and adjusting the neural network model according to the time loss and the feature loss, and returning to the step of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies a training end condition.
- 2. The method according to claim 1, wherein the adjusting the neural network model according to the time loss and the feature loss comprises: acquiring the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames; generating a training cost according to the time loss, the feature loss, and the content loss; and adjusting the neural network model according to the training cost.
- 3. The method according to claim 2, wherein the acquiring the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames comprises: inputting the intermediate image and the video frame corresponding to the intermediate image into an evaluation network model; acquiring the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layer included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the video frame.
- 4. The method according to claim 2, wherein the adjusting the neural network model according to the training cost comprises: determining, in reverse order according to the order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each of the layers; and adjusting, in the reverse order, the nonlinear change operator corresponding to each layer included in the neural network model so that the training cost decreases with the rate of change of the nonlinear change operator corresponding to the correspondingly adjusted layer.
- 5. The method according to any one of claims 1 to 4, wherein the acquiring the time loss between the intermediate image corresponding to the later video frame and the changed image comprises: subtracting the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determining, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- 6. The method according to any one of claims 1 to 4, wherein the acquiring the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image comprises: inputting the intermediate image and the target feature image into an evaluation network model; acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layer included in the evaluation network model; and determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
- 7. The method according to claim 6, wherein the determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image comprises: determining, according to the feature map corresponding to the intermediate image, the feature matrix corresponding to the intermediate image; determining, according to the feature map corresponding to the target feature image, the feature matrix corresponding to the target feature image; subtracting the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determining, according to the feature difference matrix, the feature loss between the intermediate image and the target feature image.
- 8. A training apparatus for a neural network model for image processing, the apparatus comprising a processor and a memory connected to the processor, the memory storing machine readable instruction modules executable by the processor, the machine readable instruction modules comprising: an input acquisition module, configured to acquire a plurality of temporally adjacent video frames; an output acquisition module, configured to process the plurality of video frames respectively through a neural network model so that the neural network model outputs corresponding intermediate images; a loss acquisition module, configured to acquire optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames, acquire the image obtained after the intermediate image corresponding to the earlier video frame is changed according to the optical flow information, acquire the time loss between the intermediate image corresponding to the later video frame and the changed image, and acquire the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and a target feature image; and a model adjustment module, configured to adjust the neural network model according to the time loss and the feature loss, and return to the step of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies a training end condition.
- 9. The apparatus according to claim 8, wherein the model adjustment module is further configured to acquire the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames; generate a training cost according to the time loss, the feature loss, and the content loss; and adjust the neural network model according to the training cost.
- 10. The apparatus according to claim 9, wherein the model adjustment module is further configured to input the intermediate image and the video frame corresponding to the intermediate image into an evaluation network model; acquire the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layer included in the evaluation network model; and determine the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the video frame.
- 11. The apparatus according to claim 9, wherein the model adjustment module is further configured to determine, in reverse order according to the order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each of the layers; and adjust, in the reverse order, the nonlinear change operator corresponding to each layer included in the neural network model so that the training cost decreases with the rate of change of the nonlinear change operator corresponding to the correspondingly adjusted layer.
- 12. The apparatus according to any one of claims 8 to 11, wherein the loss acquisition module is further configured to subtract the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determine, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- 13. The apparatus according to any one of claims 8 to 11, wherein the loss acquisition module is further configured to input the intermediate image and the target feature image into an evaluation network model; acquire the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layer included in the evaluation network model; and determine the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
- 14. The apparatus according to claim 13, wherein the loss acquisition module is further configured to determine, according to the feature map corresponding to the intermediate image, the feature matrix corresponding to the intermediate image; determine, according to the feature map corresponding to the target feature image, the feature matrix corresponding to the target feature image; subtract the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determine, according to the feature difference matrix, the feature loss between the intermediate image and the target feature image.
- 15. A non-volatile computer readable storage medium storing machine readable instructions, the machine readable instructions being executable by a processor to perform the following operations: acquiring a plurality of temporally adjacent video frames; processing the plurality of video frames respectively through a neural network model so that the neural network model outputs corresponding intermediate images; acquiring optical flow information of the change from the earlier video frame to the later video frame among the plurality of temporally adjacent video frames; acquiring the image obtained after the intermediate image corresponding to the earlier video frame is changed according to the optical flow information; acquiring the time loss between the intermediate image corresponding to the later video frame and the changed image; acquiring the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and a target feature image; and adjusting the neural network model according to the time loss and the feature loss, and returning to the step of acquiring a plurality of temporally adjacent video frames to continue training until the neural network model satisfies a training end condition.
- 16. The storage medium according to claim 15, wherein the adjusting the neural network model according to the time loss and the feature loss comprises: acquiring the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames; generating a training cost according to the time loss, the feature loss, and the content loss; and adjusting the neural network model according to the training cost.
- 17. The storage medium according to claim 16, wherein the acquiring the content loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the corresponding video frames comprises: inputting the intermediate image and the video frame corresponding to the intermediate image into an evaluation network model; acquiring the feature map corresponding to the video frame and the feature map corresponding to the intermediate image output by the layer included in the evaluation network model; and determining the content loss between the intermediate image and the corresponding video frame according to the feature map corresponding to the intermediate image and the feature map corresponding to the video frame.
- 18. The storage medium according to claim 16, wherein the adjusting the neural network model according to the training cost comprises: determining, in reverse order according to the order of the layers included in the neural network model, the rate of change of the training cost with respect to the nonlinear change operator corresponding to each of the layers; and adjusting, in the reverse order, the nonlinear change operator corresponding to each layer included in the neural network model so that the training cost decreases with the rate of change of the nonlinear change operator corresponding to the correspondingly adjusted layer.
- 19. The storage medium according to any one of claims 15 to 18, wherein the acquiring the time loss between the intermediate image corresponding to the later video frame and the changed image comprises: subtracting the values at corresponding pixel positions of the intermediate image corresponding to the later video frame and the changed image to obtain a difference distribution map; and determining, according to the difference distribution map, the time loss between the intermediate image corresponding to the later video frame and the changed image.
- 20. The storage medium according to any one of claims 15 to 18, wherein the acquiring the feature loss between the intermediate images corresponding to the plurality of temporally adjacent video frames and the target feature image comprises: inputting the intermediate image and the target feature image into an evaluation network model; acquiring the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image output by the layer included in the evaluation network model; and determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image.
- 21. The storage medium according to claim 20, wherein the determining the feature loss between the intermediate image and the target feature image according to the feature map corresponding to the intermediate image and the feature map corresponding to the target feature image comprises: determining, according to the feature map corresponding to the intermediate image, the feature matrix corresponding to the intermediate image; determining, according to the feature map corresponding to the target feature image, the feature matrix corresponding to the target feature image; subtracting the values at corresponding positions of the feature matrix corresponding to the intermediate image and the feature matrix corresponding to the target feature image to obtain a feature difference matrix; and determining, according to the feature difference matrix, the feature loss between the intermediate image and the target feature image.