CN116095328A - Video encoding method, model training method, apparatus, and storage medium - Google Patents

Video encoding method, model training method, apparatus, and storage medium

Info

Publication number
CN116095328A
CN116095328A
Authority
CN
China
Prior art keywords
video frame
coding
video
loss value
target loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111290503.3A
Other languages
Chinese (zh)
Inventor
孔德辉
徐科
宋剑军
任聪
易自尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd filed Critical Sanechips Technology Co Ltd
Priority to CN202111290503.3A priority Critical patent/CN116095328A/en
Priority to PCT/CN2022/080753 priority patent/WO2023077707A1/en
Publication of CN116095328A publication Critical patent/CN116095328A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115 Selection of the code volume for a coding unit prior to coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a video coding method, a model training method, a device, and a storage medium. The model training method includes: extracting a current video frame from an original video sample and acquiring coding constraint parameters corresponding to the current video frame; obtaining a reconstructed video frame corresponding to the current video frame according to the current video frame and the coding constraint parameters; determining a target loss value according to the current video frame and the reconstructed video frame; and training a video coding model according to the target loss value until the video coding model meets a convergence condition. The embodiments of the invention train the video coding model on the video frames to be coded and their corresponding reconstructed frames, so that coding is performed on the global information of each video frame; this overcomes the limitation of classical codec schemes based on block partitioning and improves video coding quality under the same constraints.

Description

Video encoding method, model training method, apparatus, and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a video encoding method, a model training method, a device, and a storage medium.
Background
Video coding starts from the high correlation of video signals and the visual characteristics of the human eye, and eliminates the redundancy arising from these correlations and perceptual characteristics through a suitable coding scheme, so as to compress the video signal and reduce the transmission bit rate.
Current video coding standards basically follow a block-based scheme: a video frame is divided into coding blocks, each block is intra- or inter-coded, and the coding residual is transformed and then compressed with a given quantization parameter (Quantization Parameter, QP), so that each frame is compressed block by block. Because these standard coding frameworks concentrate on shallow image information and local blocks, coding is not performed on the global information of the video frame, and coding efficiency is therefore limited.
Disclosure of Invention
The embodiments of the present invention provide a video coding method, a model training method, a device, and a storage medium, so as to improve coding performance.
In a first aspect, an embodiment of the present invention provides a model training method, where the method includes:
extracting a current video frame from an original video sample;
acquiring coding constraint parameters corresponding to the current video frame;
obtaining a reconstructed video frame corresponding to the current video frame according to the current video frame and the coding constraint parameter;
determining a target loss value according to the current video frame and the reconstructed video frame;
and training a video coding model according to the target loss value until the video coding model meets a convergence condition.
In a second aspect, an embodiment of the present invention provides a video encoding method, including:
acquiring an original video;
dividing the original video into a plurality of video frames;
and obtaining an output code stream according to each video frame, preset coding constraint parameters, and a video coding model obtained through training by the model training method of the first aspect.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the model training method as provided above in the first aspect or the video encoding method as provided above in the second aspect when the computer program is executed.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the model training method provided in the first aspect above, or the video encoding method provided in the second aspect above.
According to the scheme of the embodiments of the present invention, a current video frame is first extracted from an original video sample, and the coding constraint parameters corresponding to the current video frame are acquired; a reconstructed video frame corresponding to the current video frame is obtained according to the current video frame and the coding constraint parameters; a target loss value is determined according to the current video frame and the reconstructed video frame; and the video coding model is trained according to the target loss value until it meets a convergence condition. The embodiments of the present invention train the video coding model on the video frames to be coded and their corresponding reconstructed frames, so that coding is performed on the global information of each video frame; this overcomes the limitation of classical codec schemes based on block partitioning and improves video coding quality under the same constraints.
Drawings
Fig. 1 is a schematic diagram of the framework of a video coding model according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the architecture of a video coding model according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating steps of a model training method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of the sub-steps of step S330 in FIG. 3 in the supervised training mode;
FIG. 5 is a schematic flowchart of the sub-steps of step S330 in FIG. 3 in the unsupervised training mode;
fig. 6 is a flowchart illustrating steps of a video encoding method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of the present disclosure.
It should be appreciated that in the description of the embodiments of the present invention, terms such as "first" and "second" are used only to distinguish technical features and are not to be construed as indicating or implying relative importance, the number of the indicated technical features, or their precedence. "At least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between objects and indicates that three relationships may exist: for example, "A and/or B" may mean that A exists alone, A and B exist together, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" and similar expressions mean any combination of the listed items, including any single item or any group of items. For example, "at least one of a, b, and c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple.
The embodiment of the invention provides a video coding method, a model training method, equipment and a storage medium, which are used for coding based on global information of video frames so as to overcome the defect that a classical coding and decoding scheme is based on block division and improve the video coding quality under the same constraint.
First, the framework of the video coding model to which the video coding method of the embodiments of the present invention applies is described. Referring to Fig. 1, Fig. 1 is a schematic diagram of the framework of a video coding model according to an embodiment of the present invention. The video coding model of the embodiment of the present invention may include a coding element generation module and a coding module, where the coding element generation module is used to determine coding elements according to the video frame input to the model and the coding constraint parameters, and the coding module is used to code the current video frame according to the coding elements and output the coded video frame.
It should be appreciated that video encoding, as described in the embodiments of the present invention, generally refers to the process of compressing an original video into a bitstream to reduce the amount of data required to represent the video, thereby enabling more efficient storage and/or transmission. Video decoding is the reverse process, in which the video code stream is reconstructed to obtain the reconstructed video. Most coding standards use block-based adaptive inter/intra prediction, and the basic block unit of video coding is often referred to as a Coding Unit (CU). A CU is typically obtained by block partitioning of the video frames of the original video.
In the encoding process, a reference block is generally generated through spatial (intra) prediction or temporal (inter) prediction and subtracted from the current block (the block currently processed or to be processed) to obtain a residual block; the residual is transformed into a transform domain and quantized to reduce the amount of data to be transmitted. In inter coding, a reference block may be selected from other video frames, and the motion estimation unit provides a reference picture and/or a spatial offset between the position (X, Y coordinates) of the reference block and the position of the current block as inter prediction parameters. This offset is also called a Motion Vector (MV).
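As an illustrative aid (not part of the claimed scheme), the residual-quantization step of this classical block-based pipeline can be sketched in a few lines of Python; the transform stage is omitted for brevity, and the QP-to-step-size mapping below is an assumption borrowed from common H.26x practice rather than something specified here.

```python
import numpy as np

def block_residual_code(current_block: np.ndarray,
                        reference_block: np.ndarray,
                        qp: int) -> np.ndarray:
    """Quantize the prediction residual of one block.

    `current_block` and `reference_block` are 2-D luma arrays of equal
    shape; the QP-to-step mapping is an assumed H.26x-style convention
    in which the quantization step grows exponentially with QP.
    """
    residual = current_block.astype(np.int32) - reference_block.astype(np.int32)
    step = 2.0 ** ((qp - 4) / 6.0)         # assumed step-size mapping
    return np.round(residual / step)       # lossy step: this is where rate is saved

def block_reconstruct(quantized: np.ndarray,
                      reference_block: np.ndarray,
                      qp: int) -> np.ndarray:
    """Inverse-quantize and add the prediction back (decoder side)."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return (quantized * step + reference_block).clip(0, 255)
```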
The embodiment of the invention constructs the coding element generation module of the video coding model based on the coding elements used in the coding process. Specifically, the encoding element generation module may include at least one among a CU partition sub-module, a QP determination sub-module, and an MV estimation sub-module.
Referring to Fig. 2, Fig. 2 is a schematic diagram of the architecture of a video coding model according to an embodiment of the present invention. In the example shown in Fig. 2, the video coding model adopts an architecture based on a Graph Neural Network (GNN). The coding element generation module includes a CU partitioning sub-module, a QP determination sub-module, and an MV estimation sub-module: the CU partitioning sub-module is used to output CU partitioning results based on the video data input to the model, the QP determination sub-module is used to output QP values based on that video data, and the MV estimation sub-module is used to output MV estimation results based on that video data. It should be appreciated that the video data input to the model includes the current (to-be-encoded) video frame and the coding constraint parameters; when the coding mode is inter-frame coding, the video data also includes a reference video frame.
The CU partitioning sub-module, the QP determination sub-module, and the MV estimation sub-module are interrelated: the CU partitioning sub-module further updates its own output based on the output of the QP determination sub-module; the QP determination sub-module further updates its own output based on the outputs of the CU partitioning sub-module and the MV estimation sub-module; and the MV estimation sub-module further updates its own output based on the output of the CU partitioning sub-module. Of course, in a particular implementation, the coding element generation module may include more or fewer sub-modules than in the example shown in Fig. 2.
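For illustration only, a minimal PyTorch sketch of such a coding element generation module is given below. The patent does not disclose a concrete network, so the layer sizes, the single round of message passing among the three sub-modules, and the tensor interfaces are all assumptions.

```python
import torch
import torch.nn as nn

class CodingElementGenerator(nn.Module):
    """Sketch of the coding element generation module of Fig. 2.

    The three heads exchange messages like nodes of a small graph:
    CU partitioning informs QP and MV, while QP and MV feed back into
    CU partitioning, as described in the text above.
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(            # shared frame features
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cu_head = nn.Linear(feat_dim + 1, feat_dim)   # +1: constraint scalar
        self.qp_head = nn.Linear(feat_dim + 1, feat_dim)
        self.mv_head = nn.Linear(feat_dim + 1, feat_dim)
        self.msg_qp_to_cu = nn.Linear(feat_dim, feat_dim)  # edges of the graph
        self.msg_cu_to_qp = nn.Linear(feat_dim, feat_dim)
        self.msg_mv_to_qp = nn.Linear(feat_dim, feat_dim)
        self.msg_cu_to_mv = nn.Linear(feat_dim, feat_dim)

    def forward(self, frame: torch.Tensor, constraint: torch.Tensor):
        # frame: (B, 3, H, W); constraint: (B, 1) coding constraint parameter
        f = self.backbone(frame)                   # (B, feat_dim)
        f = torch.cat([f, constraint], dim=1)      # append rate/mode parameter
        cu = torch.relu(self.cu_head(f))
        qp = torch.relu(self.qp_head(f))
        mv = torch.relu(self.mv_head(f))
        # one round of message passing along the edges described in the text
        cu = cu + self.msg_qp_to_cu(qp)
        qp = qp + self.msg_cu_to_qp(cu) + self.msg_mv_to_qp(mv)
        mv = mv + self.msg_cu_to_mv(cu)
        return cu, qp, mv
```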
A video coding model based on the GNN architecture can be deployed on a GNN accelerator to reduce the hardware cost of iteration. When new nodes or edges are introduced (equivalent to updating the codec scheme), the model can be recompiled through the tool chain and redeployed on a similar GNN accelerator, so that product iteration is possible without replacing the hardware encoder, which offers better flexibility.
Referring to fig. 3, fig. 3 illustrates a model training method according to an embodiment of the present invention, which can be applied to the video coding model shown in fig. 1 or fig. 2. The model training method provided by the embodiment of the invention comprises the following steps:
step S310, extracting a current video frame from an original video sample;
step S320, obtaining coding constraint parameters corresponding to the current video frame;
step S330, obtaining a reconstructed video frame corresponding to the current video frame according to the current video frame and the coding constraint parameter;
step S340, determining a target loss value according to the current video frame and the reconstructed video frame;
and step S350, training a video coding model according to the target loss value until the video coding model meets a convergence condition.
The model training method provided by the embodiments of the present invention may be executed by a server, which may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple software modules (for example, to provide distributed services) or as a single software module. This is not specifically limited in the present invention.
In step S310, original (uncompressed) videos of different application scenarios may be collected to construct training samples. When there is an explicit deployment requirement (for example, a video cloud-conference or video-surveillance scenario), the proportion of original video samples from the corresponding scenario may be increased to improve the processing performance of the video coding model for videos of that scenario. It should be appreciated that each collected original video sample includes multiple video frames, and the video frame currently to be encoded is selected from them as the current video frame.
In step S320, the coding constraint parameters may be preconfigured by the user. Illustratively, the coding constraint parameters may include at least one of a coding mode indication parameter and a rate control parameter; for example, the coding mode indication parameter may represent intra coding with 0 and inter coding with 1, and the rate control parameter may use a specific numerical value to express the corresponding bit-rate requirement. It should be noted that the coding constraint parameters may also include other types of parameters besides those mentioned above, which is not limited by the embodiments of the present invention.
The coding constraint parameters indicate the coding mode to the video coding model, so the model can code flexibly based on them, which gives it stronger generality. In addition, introducing coding constraint parameters such as the coding mode indication parameter and the rate control parameter into the coding framework enables finer-grained coding control.
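A minimal sketch of how such constraint parameters might be packaged for the model follows; the field names and the two-element vector layout are assumptions for illustration.

```python
from dataclasses import dataclass

import torch

INTRA, INTER = 0, 1   # mode encoding taken from the example above

@dataclass
class CodingConstraints:
    """Coding constraint parameters passed to the model with each frame."""
    mode: int               # coding mode indication: 0 = intra, 1 = inter
    target_bitrate: float   # rate control requirement, e.g. in kbit/s

    def as_tensor(self) -> torch.Tensor:
        """Flatten into the numeric vector the model consumes."""
        return torch.tensor([float(self.mode), self.target_bitrate])

# example: request inter coding under a 2000 kbit/s budget
constraints = CodingConstraints(mode=INTER, target_bitrate=2000.0)
```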
In step S330, the reconstructed video frame corresponding to the current video frame may be determined by different method steps depending on the training mode adopted. The training modes here include a supervised training mode and an unsupervised training mode.
Referring to fig. 4, for example, in the supervised training mode, in step S330, the acquisition of the reconstructed video frame corresponding to the current video frame may be specifically implemented by the following method steps:
step S331, a first code stream sample is obtained, where the first code stream sample is obtained by pre-encoding the original video sample based on a preset video encoding standard.
Step S332, decoding the first code stream sample, and obtaining a reconstructed video frame corresponding to the current video frame according to a decoding result.
Illustratively, in step S331, the original video samples may be pre-encoded using the H.266 (Versatile Video Coding) standard to obtain an output code stream.
In step S332, the output code stream obtained in step S331 may be decoded based on the same video coding standard to obtain a decoding result, where the decoding result is the reconstructed video; the reconstructed video frame corresponding to the current video frame is then obtained from the reconstructed video.
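The pre-encode/decode step can be sketched as follows; `reference_encode` and `reference_decode` are hypothetical wrappers around a standard codec (for example an H.266 reference implementation), since the patent does not name one.

```python
def build_supervision_pairs(original_frames, reference_encode, reference_decode):
    """Pair each original frame with its standard-codec reconstruction.

    `reference_encode` maps a frame sequence to a bitstream (step S331)
    and `reference_decode` maps the bitstream back to reconstructed
    frames (step S332). Both are hypothetical callables.
    """
    bitstream = reference_encode(original_frames)     # first code stream sample
    reconstructed = reference_decode(bitstream)       # reconstructed video
    return list(zip(original_frames, reconstructed))  # (x, x') training pairs
```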
It will be appreciated that embodiments of the present invention train the video coding model based on a target loss value determined from the current video frame and the reconstructed video frame. In the supervised training mode, in step S340, the target loss value may be determined in the following three ways.
In a first way, a first target loss value is determined from a norm of a difference between the reconstructed video frame and the current video frame. Specifically, the first target loss value may be calculated by the following formula (1):
f1 = ||x′ - x||_1 (1)
In formula (1), x represents the current video frame, x′ represents the reconstructed video frame, ||·||_1 represents the L1 norm, and f1 represents the first target loss value.
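A direct PyTorch rendering of formula (1) might look as follows; the mean reduction over pixels is an assumption, as the text only specifies the L1 norm.

```python
import torch

def first_target_loss(x: torch.Tensor, x_rec: torch.Tensor) -> torch.Tensor:
    """Formula (1): f1 = ||x' - x||_1, with an assumed mean reduction."""
    return torch.mean(torch.abs(x_rec - x))
```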
In a second mode, inputting the current video frame into a preset discriminator network to obtain a first image quality discrimination result; inputting the reconstructed video frame into a preset discriminator network to obtain a second image quality discrimination result; and determining a second target loss value according to the norm of the difference between the second image quality judging result and the first image quality judging result.
The discriminator network may be the discriminator of a trained Generative Adversarial Network (GAN) or of a trained Enhanced Super-Resolution Generative Adversarial Network (ESRGAN); the embodiments of the present invention do not limit the specific type of discriminator.
In this second way, the discriminator is used to assess the image quality degradation of the reconstructed video frame relative to the original video frame: the current video frame is input into the discriminator network to obtain the first image quality discrimination result, the reconstructed video frame is input into the same network to obtain the second image quality discrimination result, and the target loss value determined from the two discrimination results serves as a measure of the coding quality.
Specifically, the second target loss value may be calculated by the following formula (2):
f2 = ||g(h′(h(x))) - g(x)||_1 (2)
In formula (2), x represents the current video frame, h represents the encoding function, h′ represents the decoding function, h′(h(x)) represents the reconstructed video frame, g represents the discriminator output, g(x) represents the first image quality discrimination result, g(h′(h(x))) represents the second image quality discrimination result, and f2 represents the second target loss value.
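Formula (2) could be sketched as follows, assuming a pretrained and frozen discriminator standing in for g.

```python
import torch

def second_target_loss(x: torch.Tensor, x_rec: torch.Tensor,
                       discriminator) -> torch.Tensor:
    """Formula (2): f2 = ||g(h'(h(x))) - g(x)||_1.

    `discriminator` plays the role of g and is assumed to be a
    pretrained, frozen GAN/ESRGAN discriminator; x_rec stands for
    h'(h(x)).
    """
    with torch.no_grad():                 # g(x): first quality result, no grad
        g_x = discriminator(x)
    g_rec = discriminator(x_rec)          # g(h'(h(x))): second quality result
    return torch.mean(torch.abs(g_rec - g_x))
```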
A third way is to obtain a first target loss value, determined according to the norm of the difference between the reconstructed video frame and the current video frame; to acquire a length parameter, determined according to the length of the current video frame after pre-encoding; and to determine a third target loss value according to the first target loss value and the length parameter. Specifically, the third target loss value may be calculated by the following formula (3):
f3 = ||x′ - x||_1 + λ·len(h(x)) (3)
In formula (3), x represents the current video frame; x′ represents the reconstructed video frame; ||·||_1 represents the L1 norm; ||x′ - x||_1 represents the first target loss value; len(h(x)) represents the length of the current video frame after pre-encoding, i.e. the length parameter; λ represents a weighting coefficient; and f3 represents the third target loss value.
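Formula (3) might be rendered as follows; the value of λ is left open by the text, so the default below is purely illustrative.

```python
import torch

def third_target_loss(x: torch.Tensor, x_rec: torch.Tensor,
                      code_length: torch.Tensor,
                      lam: float = 0.01) -> torch.Tensor:
    """Formula (3): f3 = ||x' - x||_1 + lambda * len(h(x)).

    `code_length` is a (differentiable estimate of the) pre-encoded
    length of the current frame; lam = 0.01 is an illustrative value.
    """
    return torch.mean(torch.abs(x_rec - x)) + lam * code_length
```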
In step S350, training the video coding model according to the target loss value until the video coding model meets a convergence condition may include: training the video coding model according to at least one of the first target loss value, the second target loss value, and the third target loss value until the video coding model meets the convergence condition.
In particular implementations, the target loss function (for example, any one or more of formulas (1), (2), and (3)) may be selected flexibly; the target loss value is then determined from the current video frame, the reconstructed video frame, and the selected target loss function, and the video coding model is finally trained based on that target loss value.
For example, in the inter-frame coding mode, a convergence condition may be defined in terms of the target loss value. The current video frame, the coding constraint parameters, and the reference video frame are taken as inputs to train the video coding model; a current loss value is determined from the coding result output by the model and the target loss function; and whether the model meets the convergence condition is determined by comparing the current loss value with the target loss value. If the convergence condition is not satisfied, the model parameters of the video coding model are adjusted, and the following steps are executed iteratively: train the video coding model with the current video frame, the coding constraint parameters, and the reference video frame as inputs; determine the current loss value from the coding result output by the model and the target loss function; and determine whether the model meets the convergence condition by comparing the current loss value with the target loss value. If the convergence condition is met, training ends.
It will be appreciated that if the coding mode is intra coding, the video coding model is trained with only the current video frame and the coding constraint parameters as inputs.
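Putting the above together, an assumed training loop might look like this; the optimizer choice, learning rate, epoch cap, and the model call signature (reference frame only in inter mode) are all illustrative assumptions.

```python
import torch

def train_until_convergence(model, loader, loss_fn, target_loss: float,
                            lr: float = 1e-4, max_epochs: int = 100) -> None:
    """Iterate as described above: encode, compute the current loss,
    and stop once it no longer exceeds the target loss value."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        running = 0.0
        for x, constraint, x_ref in loader:     # x_ref is None for intra coding
            x_rec = (model(x, constraint) if x_ref is None
                     else model(x, constraint, x_ref))
            loss = loss_fn(x, x_rec)            # current loss value
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if running / len(loader) <= target_loss:    # convergence condition met
            return
```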
In the supervised training method described above, the original video samples are first pre-encoded with an advanced coding standard to obtain an output code stream; the output code stream obtained in step S331 is decoded with the same standard to obtain the reconstructed video; the reconstructed video frame corresponding to the current video frame is obtained from the reconstructed video; and the target loss value is determined based on the reconstructed video frame so that model training can be performed on that value. Because the target loss value is determined based on the reconstructed video frame, the overall reconstruction quality of the code stream output by the model can be improved.
It can be understood that the video coding model provided by the embodiment of the invention is also applicable to an unsupervised training mode.
Referring to fig. 5, in the unsupervised training mode, in step S330, the obtaining the reconstructed video frame corresponding to the current video frame may be specifically implemented by the following method steps:
step S333, performing CU division on the current video frame by using the video coding model to obtain a plurality of CU division results;
step S334, coding based on each CU partition result by using the video coding model, to obtain a plurality of second code stream samples corresponding to the plurality of CU partition results one by one;
step S335, decoding each of the second code stream samples to obtain a plurality of reconstructed video frames corresponding to the plurality of CU partitioning results one by one.
Illustratively, the current video frame is input into the video coding model multiple times, each pass yielding a CU partitioning result. Under each CU partitioning result, the video coding model outputs a corresponding second code stream sample. Each second code stream sample is then decoded to obtain the reconstructed video frames corresponding one-to-one to the CU partitioning results.
In an unsupervised training mode, in step S340, the target loss value may be specifically determined by the following method steps: determining norms of differences between the reconstructed video frame and the current video frame corresponding to each CU dividing result to obtain a fourth target loss value corresponding to each CU dividing result; and selecting a minimum value from fourth target loss values corresponding to the CU division results respectively as a fifth target loss value.
Specifically, the fifth target loss value may be calculated by the following formula (4):
f = min ||x - h′(h(g(x)))||_1 (4)
In formula (4), g(x) represents a current CU partitioning result, h′(h(g(x))) represents the reconstructed video frame obtained based on that CU partitioning result, and the minimum is taken over the CU partitioning results; f represents the fifth target loss value.
In the above unsupervised training process, the CU partitioning is modeled so as to find the best partition for processing the video frame, achieving maximum coupling within each CU under the coding parameter constraints and minimum correlation between CUs.
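A sketch of formula (4) under these assumptions follows; `reconstruct_with_new_partition` is a hypothetical interface that returns a reconstruction under a fresh CU partitioning result, and `num_partitions` is likewise an assumption.

```python
import torch

def fifth_target_loss(x: torch.Tensor, model,
                      num_partitions: int = 8) -> torch.Tensor:
    """Formula (4): f = min ||x - h'(h(g(x)))||_1 over CU partitionings."""
    fourth_losses = []
    for _ in range(num_partitions):
        x_rec = model.reconstruct_with_new_partition(x)        # h'(h(g(x)))
        fourth_losses.append(torch.mean(torch.abs(x - x_rec))) # fourth loss values
    return torch.stack(fourth_losses).min()                    # fifth target loss
```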
According to the scheme of the embodiments of the present invention, a current video frame is first extracted from an original video sample, and the coding constraint parameters corresponding to the current video frame are acquired; a reconstructed video frame corresponding to the current video frame is obtained according to the current video frame and the coding constraint parameters; a target loss value is determined according to the current video frame and the reconstructed video frame; and the video coding model is trained according to the target loss value until it meets a convergence condition. The embodiments of the present invention train the video coding model on the video frames to be coded and their corresponding reconstructed frames, so that coding is performed on the global information of each video frame; this overcomes the limitation of classical codec schemes based on block partitioning and improves video coding quality under the same constraints.
In addition, the embodiment of the invention provides a video coding model based on the GNN architecture, which can be deployed to the GNN accelerator to reduce the iterative hardware cost.
The embodiment of the invention also provides a video coding method. Referring to fig. 6, the video encoding method provided by the embodiment of the invention includes, but is not limited to, the following steps:
s410, acquiring an original video;
s420, dividing the original video into a plurality of video frames;
s430, obtaining an output code stream according to each video frame, the preset coding constraint parameters and the video coding model obtained through training by the model training method provided by any one of the previous embodiments.
It should be noted that the video encoding method provided by the embodiments of the present invention may be executed by a terminal device, which may be hardware or software. When the terminal device is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, in-vehicle computers, laptop computers, desktop computers, and the like. When the terminal device is software, it may be installed in the electronic devices listed above, and may be implemented as multiple software modules (for example, to provide distributed services) or as a single software module. No specific limitation is made here.
In a specific implementation, the terminal device divides an uncompressed original video into a plurality of video frames, inputs the video frames into the trained video coding model, and encodes them with the video coding model based on the preset coding constraint parameters.
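A minimal sketch of steps S410-S430 on the terminal side might look as follows; the assumption that the model returns a per-frame payload as bytes, and the plain concatenation of payloads in place of a real container format, are simplifications for illustration.

```python
import torch

def encode_video(model, frames, constraints) -> bytes:
    """Steps S410-S430: run each frame of the original video through the
    trained model under the preset coding constraints."""
    model.eval()
    stream = bytearray()
    with torch.no_grad():
        for frame in frames:                     # S420: the frame sequence
            stream += model(frame, constraints)  # S430: assumed per-frame bytes
    return bytes(stream)
```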
Fig. 7 shows an electronic device 500 provided by an embodiment of the invention. The electronic device 500 includes, but is not limited to:
a memory 510 for storing a program;
the processor 520 is configured to execute the program stored in the memory 510, and when the processor 520 executes the program stored in the memory 510, the processor 520 is configured to execute the model training method or the video encoding method described above.
The processor 520 and the memory 510 may be connected by a bus or other means.
The memory 510, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs, such as those implementing the model training method or the video encoding method described in any embodiment of the present invention. The processor 520 implements the model training method or the video encoding method described above by running the non-transitory software programs and instructions stored in the memory 510.
The memory 510 may include a program storage area, which may store an operating system and the application programs required for at least one function, and a data storage area, which may store the data needed to perform the model training method or the video encoding method described above. In addition, the memory 510 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 510 may optionally include memory located remotely from the processor 520, connected to the processor 520 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the above-described model training method or video encoding method are stored in the memory 510, which when executed by the one or more processors 520, perform the model training method or video encoding method provided by any embodiment of the present invention.
The embodiment of the invention also provides a storage medium which stores computer executable instructions for executing the model training method or the video coding method.
In an embodiment, the storage medium stores computer-executable instructions that are executed by one or more control processors 520, for example, by one of the processors 520 in the electronic device 500, so that the one or more processors 520 perform the model training method or the video encoding method provided in any embodiment of the present invention.
The embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically include computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the present invention, and such equivalent modifications and substitutions are intended to fall within the scope of the present invention as defined by the following claims.

Claims (11)

1. A method of model training, the method comprising:
extracting a current video frame from an original video sample;
acquiring coding constraint parameters corresponding to the current video frame;
obtaining a reconstructed video frame corresponding to the current video frame according to the current video frame and the coding constraint parameter;
determining a target loss value according to the current video frame and the reconstructed video frame;
and training a video coding model according to the target loss value until the video coding model meets a convergence condition.
2. The method of claim 1, wherein the obtaining the reconstructed video frame corresponding to the current video frame comprises:
acquiring a first code stream sample, wherein the first code stream sample is obtained by pre-coding the original video sample based on a preset video coding standard;
and decoding the first code stream sample, and acquiring a reconstructed video frame corresponding to the current video frame according to a decoding result.
3. The method of claim 1, wherein said determining a target loss value from said current video frame and said reconstructed video frame comprises:
determining a first target loss value according to a norm of a difference between the reconstructed video frame and the current video frame;
the training of the video coding model according to the target loss value comprises the following steps:
and training a video coding model according to the first target loss value.
4. The method of claim 1, wherein said determining a target loss value from said current video frame and said reconstructed video frame comprises:
inputting the current video frame into a preset discriminator network to obtain a first image quality discrimination result;
inputting the reconstructed video frame into a preset discriminator network to obtain a second image quality discrimination result;
determining a second target loss value according to the norm of the difference between the second image quality discrimination result and the first image quality discrimination result;
the training of the video coding model according to the target loss value comprises the following steps:
and training a video coding model according to the second target loss value.
5. The method of claim 1, wherein said determining a target loss value from said current video frame and said reconstructed video frame comprises:
obtaining a first target loss value, wherein the first target loss value is determined according to a norm of a difference value between the reconstructed video frame and the current video frame;
Acquiring a length parameter, wherein the length parameter is determined according to the length of the current video frame after precoding;
determining a third target loss value according to the first target loss value and the length parameter;
the training of the video coding model according to the target loss value comprises the following steps:
and training a video coding model according to the third target loss value.
6. The method of claim 1, wherein the obtaining the reconstructed video frame corresponding to the current video frame comprises:
dividing a current video frame into coding units CUs by using the video coding model to obtain a plurality of CU dividing results;
coding based on each CU dividing result by utilizing the video coding model to obtain a plurality of second code stream samples corresponding to the CU dividing results one by one;
decoding each second code stream sample to obtain a plurality of reconstructed video frames corresponding to a plurality of CU dividing results one by one;
the determining a target loss value according to the current video frame and the reconstructed video frame comprises the following steps:
determining norms of differences between the reconstructed video frame and the current video frame corresponding to each CU dividing result to obtain a fourth target loss value corresponding to each CU dividing result;
selecting a minimum value from fourth target loss values corresponding to the CU division results respectively as a fifth target loss value;
the training of the video coding model according to the target loss value comprises the following steps:
and training a video coding model according to the fifth target loss value.
7. The method of claim 1, wherein the coding constraint parameters comprise at least one of coding mode indication parameters and rate control parameters.
8. The method of claim 1, wherein the video coding model comprises a coding element generation module and a coding module;
the coding element generation module is used for generating a coding element according to the current video frame input to the model and the coding constraint parameters;
the encoding module is used for encoding the current video frame according to the encoding element;
the coding element generation module comprises at least one of a CU partitioning sub-module, a quantization parameter QP determination sub-module, and a motion vector MV estimation sub-module.
9. A video encoding method, the method comprising:
acquiring an original video;
dividing the original video into a plurality of video frames;
obtaining an output code stream according to each video frame, preset coding constraint parameters and a video coding model obtained by training according to the method of any one of claims 1-8.
10. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-9 when the computer program is executed.
11. A computer readable storage medium, characterized in that a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-9.
CN202111290503.3A 2021-11-02 2021-11-02 Video encoding method, model training method, apparatus, and storage medium Pending CN116095328A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111290503.3A CN116095328A (en) 2021-11-02 2021-11-02 Video encoding method, model training method, apparatus, and storage medium
PCT/CN2022/080753 WO2023077707A1 (en) 2021-11-02 2022-03-14 Video encoding method, model training method, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111290503.3A CN116095328A (en) 2021-11-02 2021-11-02 Video encoding method, model training method, apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN116095328A 2023-05-09

Family

ID=86210650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111290503.3A Pending CN116095328A (en) 2021-11-02 2021-11-02 Video encoding method, model training method, apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN116095328A (en)
WO (1) WO2023077707A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116233445A (en) * 2023-05-10 2023-06-06 腾讯科技(深圳)有限公司 Video encoding and decoding processing method and device, computer equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117459732A (en) * 2023-10-25 2024-01-26 书行科技(北京)有限公司 Video encoding method, apparatus, device, readable storage medium, and program product

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111163314A (en) * 2018-11-07 2020-05-15 合肥图鸭信息科技有限公司 Image compression method and system
CN110267041B (en) * 2019-06-28 2021-11-09 Oppo广东移动通信有限公司 Image encoding method, image encoding device, electronic device, and computer-readable storage medium
CN111919220A (en) * 2019-11-13 2020-11-10 深圳信息职业技术学院 Adaptive pre-coding model training method, adaptive pre-coding method and base station
CN113132723B (en) * 2019-12-31 2023-11-14 武汉Tcl集团工业研究院有限公司 Image compression method and device
CN111862995A (en) * 2020-06-22 2020-10-30 北京达佳互联信息技术有限公司 Code rate determination model training method, code rate determination method and device
CN113194320B (en) * 2021-04-30 2022-11-22 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device

Also Published As

Publication number Publication date
WO2023077707A1 (en) 2023-05-11

Similar Documents

Publication Publication Date Title
CN107105278B (en) The video coding and decoding system that motion vector automatically generates
EP3821373B1 (en) Video processing
US20170359584A1 (en) A method and apparatus for performing graph-based prediction using optimazation function
CN111263161B (en) Video compression processing method and device, storage medium and electronic equipment
CN113766249B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
Pessoa et al. End-to-end learning of video compression using spatio-temporal autoencoders
EP3334163A1 (en) Device and method for performing transform by using singleton coefficient update
CN116095328A (en) Video encoding method, model training method, apparatus, and storage medium
CN110740319B (en) Video encoding and decoding method and device, electronic equipment and storage medium
Ayzik et al. Deep image compression using decoder side information
JP2017537518A (en) Method and apparatus for decoding / encoding a video signal using a transformation derived from a graph template
KR102059842B1 (en) Method and apparatus for performing graph-based transformation using generalized graph parameters
JP2024520151A (en) Feature data encoding and decoding method and apparatus
CN115442618A (en) Time domain-space domain self-adaptive video compression based on neural network
CN114157863B (en) Video coding method, system and storage medium based on digital retina
KR102605285B1 (en) Method and device for encoding/decoding video signals using optimized transformation according to a multigraph-based model
Zhou et al. $\ell_ {2} $ Restoration of $\ell_ {\infty} $-Decoded Images Via Soft-Decision Estimation
US9648336B2 (en) Encoding apparatus and method
EP2510694A1 (en) Method and apparatus for coding and decoding an image block
CN110876058B (en) Historical candidate list updating method and device
JP2024511084A (en) Multidistribution entropy modeling of latency features in image and video coding using neural networks
US11589038B2 (en) Methods for video encoding and video decoding
CN116320410A (en) Data processing method, device, equipment and readable storage medium
CN114422804A (en) Method, device and system for jointly encoding and decoding digital retina video stream and feature stream
US11647228B2 (en) Method and apparatus for encoding and decoding video signal using transform domain prediction for prediction unit partition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination