WO2023050439A1 - Encoding/decoding method, code stream, encoder, decoder, storage medium and *** - Google Patents

Encoding/decoding method, code stream, encoder, decoder, storage medium and ***

Info

Publication number
WO2023050439A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
feature
data
node
encoding
Prior art date
Application number
PCT/CN2021/122480
Other languages
English (en)
French (fr)
Inventor
虞露
王超
王东
Original Assignee
浙江大学
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学, Oppo广东移动通信有限公司 filed Critical 浙江大学
Priority to CN202180102716.0A priority Critical patent/CN117981309A/zh
Priority to PCT/CN2021/122480 priority patent/WO2023050439A1/zh
Publication of WO2023050439A1 publication Critical patent/WO2023050439A1/zh
Priority to US18/618,752 priority patent/US20240244218A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the embodiments of the present application relate to the technical field of intelligent encoding, and in particular, to an encoding and decoding method, a code stream, an encoder, a decoder, a storage medium, and a system.
  • image and video codec processing can include traditional methods and intelligent methods based on neural networks.
  • the traditional method is to perform de-redundancy processing on the input data.
  • the image or video codec process uses the spatial correlation within each frame of images and the temporal correlation between multiple frames of images to perform de-redundancy, while the intelligent method uses a neural network to process image information and extract feature data.
  • the decoded data obtained through the encoding and decoding process is directly used as the input data of the intelligent task network.
  • the decoded data may contain a large amount of redundant information that is not needed by the intelligent task network, and the transmission of this redundant information will lead to a waste of bandwidth or a decrease in the efficiency of the intelligent task network; in addition, there is almost no correlation between the encoding and decoding process and the intelligent task network, so the encoding and decoding process cannot be optimized for the intelligent task network.
  • the embodiments of the present application provide a codec method, code stream, encoder, decoder, storage medium and system, which can not only better learn the image information required by the intelligent task network, but also greatly reduce the complexity of the intelligent task network, thereby improving the accuracy and speed of the intelligent task network.
  • the embodiment of the present application provides a decoding method, which includes: parsing the code stream to determine the reconstructed feature data; and using the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result.
  • the embodiment of the present application provides an encoding method, which includes: using the intelligent task network to perform feature extraction on input image data to obtain initial feature data; and
  • using the encoding network to encode the initial feature data, and writing the obtained encoded bits into the code stream.
  • the embodiment of the present application provides a code stream, which is generated by performing bit coding according to the information to be encoded; the information to be encoded includes at least initial feature data, and the initial feature data is obtained by using the intelligent task network to perform feature extraction on input image data.
  • the embodiment of the present application provides an encoder, which includes a first feature extraction unit and an encoding unit; wherein,
  • the first feature extraction unit is configured to use the intelligent task network to perform feature extraction on the input image data to obtain initial feature data;
  • the encoding unit is configured to use the encoding network to encode the initial feature data, and write the obtained encoded bits into the code stream.
  • the embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
  • a first memory for storing a computer program capable of running on the first processor
  • the first processor is configured to execute the method of the second aspect when running the computer program.
  • the embodiment of the present application provides a decoder, which includes an analysis unit and a feature analysis unit; wherein,
  • the parsing unit is configured to parse the code stream to determine the reconstructed feature data;
  • the feature analysis unit is configured to use the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result.
  • the embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
  • a second memory for storing a computer program capable of running on the second processor
  • the second processor is configured to execute the method of the first aspect when running the computer program.
  • the embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed, the method of the first aspect is implemented, or the method of the second aspect is implemented.
  • the embodiment of the present application provides an intelligent analysis system, the intelligent analysis system at least includes the encoder according to the fourth aspect or the fifth aspect and the decoder according to the sixth aspect or the seventh aspect.
  • the embodiment of the present application provides an encoding and decoding method, code stream, encoder, decoder, storage medium and system.
  • on the encoder side, the intelligent task network is used to perform feature extraction on input image data to obtain initial feature data; the encoding network then encodes the initial feature data and writes the obtained coded bits into the code stream.
  • on the decoder side, the reconstructed feature data is determined by parsing the code stream, and the intelligent task network performs feature analysis on the reconstructed feature data to determine the target result.
  • using the feature extraction of the intelligent task network as the input of the encoding network can not only better learn the image information required by the intelligent task network, but also save the steps in related technologies of restoring the image and then re-extracting feature data from the restored image.
  • the decoding network can perform the processing of the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network, thereby improving the accuracy and speed of the intelligent task network.
  • FIG. 1 is a schematic diagram of an overall framework of an encoding and decoding system
  • FIG. 2 is a schematic diagram of an overall framework of an intelligent task network
  • FIG. 3 is a schematic diagram of an overall framework of cascading a codec system and an intelligent task network
  • FIG. 4A is a detailed schematic diagram of an encoder provided in an embodiment of the present application.
  • FIG. 4B is a detailed schematic diagram of a decoder provided in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an encoding method provided in an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a decoding method provided in an embodiment of the present application.
  • FIG. 7 is a schematic flow diagram of an intelligent fusion network model provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an end-to-end codec network provided by an embodiment of the present application.
  • FIG. 9A is a schematic structural diagram of an attention mechanism module provided by an embodiment of the present application.
  • FIG. 9B is a schematic structural diagram of a residual block provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an intelligent task network provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an intelligent fusion network model provided by an embodiment of the present application.
  • FIG. 12A is a schematic structural diagram of a Lee network model provided by the embodiment of the present application.
  • FIG. 12B is a schematic structural diagram of a Duan network model provided in the embodiment of the present application.
  • Fig. 13A is a schematic structural diagram of a yolo_v3 network model provided by the embodiment of the present application.
  • Fig. 13B is a schematic structural diagram of another yolo_v3 network model provided by the embodiment of the present application.
  • FIG. 13C is a schematic structural diagram of a ResNet-FPN network model provided by the embodiment of the present application.
  • FIG. 13D is a schematic structural diagram of a Mask-RCNN network model provided by the embodiment of the present application.
  • FIG. 14 is a schematic diagram of the composition and structure of an encoder provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a specific hardware structure of an encoder provided in an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a decoder provided in an embodiment of the present application.
  • FIG. 17 is a schematic diagram of a specific hardware structure of a decoder provided in an embodiment of the present application.
  • FIG. 18 is a schematic diagram of the composition and structure of an intelligent analysis system provided by the embodiment of the present application.
  • references to "some embodiments" describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other where there is no conflict.
  • the terms "first\second\third" involved in the embodiments of the present application are only used to distinguish similar objects and do not represent a specific ordering of objects. Understandably, the specific order or sequence of "first\second\third" may be interchanged where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • image and video codec processing can include traditional methods and intelligent methods based on neural network processing.
  • the traditional method is to perform de-redundancy processing on the input data.
  • the image or video codec process uses the spatial correlation within each frame of images and the temporal correlation between multiple frames of images to perform de-redundancy, while the intelligent method uses a neural network to process image information and extract feature data.
  • image-based encoding and decoding processing can be divided into traditional methods and neural network-based intelligent methods.
  • the traditional method is to use the spatial correlation of pixels to perform de-redundancy processing on the image, and to obtain and transmit the code stream through transformation, quantization and entropy coding.
  • the intelligent method is to use neural network for encoding and decoding processing.
  • neural-network-based image encoding and decoding methods have produced many efficient neural network structures, which can be used to extract feature information from images, for example:
  • CNN (Convolutional Neural Networks)
  • GAN (Generative Adversarial Networks)
  • RNN (Recurrent Neural Networks)
  • as well as other network structures that can improve end-to-end neural-network-based image compression performance.
  • among them, GAN-based image encoding and decoding methods have achieved remarkable results in improving the subjective quality of images.
  • video-based codec processing can also be divided into traditional methods and neural network-based intelligent methods.
  • the traditional method is to perform video encoding and decoding processing through intra-frame or inter-frame predictive coding, transformation, quantization, entropy coding, and loop filtering.
  • intelligent methods mainly focus on three aspects: hybrid neural network coding (that is, embedding neural networks into the video coding framework in place of traditional coding modules), neural network rate-distortion optimized coding, and end-to-end video coding.
  • the hybrid neural network coding is generally used in the inter-frame prediction module, the loop filter module and the entropy coding module;
  • the neural network rate-distortion optimized coding uses the highly nonlinear characteristics of the neural network to train it into an efficient discriminator and classifier, for example applied to the decision-making stage of video coding mode selection;
  • end-to-end video coding is generally divided into using CNNs to replace all modules of traditional coding methods, or expanding the input dimensions of the neural network to perform end-to-end compression on all frames.
  • FIG. 1 shows a schematic diagram of an overall framework of a codec system.
  • the encoding method consists of E1 and E2; E1 refers to the process of extracting feature data and encoding it, that is, the feature data can be obtained after E1; E2 refers to the process of processing the feature data to obtain the code stream, that is, the code stream can be obtained after E2.
  • the decoding method consists of D1 and D2; D2 refers to the process of receiving the code stream and parsing it into feature data, that is, the reconstructed feature data can be obtained after D2; D1 refers to the process of transforming the reconstructed feature data into decoded data by a traditional method or a neural-network-based method, that is, the decoded data (specifically, the "decoded image") can be obtained after D1.
  • the intelligent task network generally analyzes the image or video to accomplish task objectives such as target detection, target tracking, or behavior recognition.
  • the input of the intelligent task network is the decoded data obtained by the encoding method and the decoding method
  • the processing flow of the intelligent task network is generally composed of A1 and A2; A1 refers to the process of extracting, from the input decoded data, the feature data relevant to the target of the intelligent task network;
  • A2 refers to the process of processing the feature data and obtaining the result.
  • FIG. 2 which shows a schematic diagram of an overall framework of an intelligent task network.
  • the characteristic data can be obtained after passing through A1
  • the target result can be obtained after the characteristic data passes through A2.
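The A1/A2 split above can be traced at the level of tensor shapes. The following toy sketch does exactly that; the stride, channel count, and result count are illustrative assumptions, not values from this application:

```python
# A1: extract feature data from the decoded image; A2: process the feature
# data to obtain the task result. Shapes stand in for real tensors.

def a1(decoded_image_shape, stride=8, channels=256):
    """A1: a (H, W, 3) decoded image yields a (H/stride, W/stride, channels)
    feature map (assumed extraction geometry)."""
    h, w, _ = decoded_image_shape
    return (h // stride, w // stride, channels)

def a2(feature_shape, num_results=10):
    """A2: produce a fixed number of hypothetical detection candidates."""
    return [("object", i) for i in range(num_results)]

features = a1((512, 512, 3))   # feature data obtained after A1
result = a2(features)          # target result obtained after A2
```

This only illustrates the two-stage data flow of FIG. 2, not any concrete task network.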
  • the input data of the intelligent task network is generally the decoded data obtained through the encoding method and the decoding method, and the decoded data is directly used as the input of the intelligent task network.
  • FIG. 3 shows a schematic diagram of an overall framework of cascading a codec system and an intelligent task network.
  • the encoding method and the decoding method form the encoding and decoding system.
  • after the decoded data is obtained through the encoding method and the decoding method, it is directly input into A1, and the feature data can be obtained through A1; the feature data is then processed by A2, so as to obtain the target result output by the intelligent task network.
  • the decoded data obtained by the encoding method and the decoding method are directly input into the intelligent task network.
  • the decoded data may include a large amount of redundant information that is not needed by the intelligent task network, and the transmission of this redundant information leads to a waste of bandwidth or reduces the efficiency of the intelligent task network; on the other hand, the correlation between the end-to-end encoding and decoding process and the intelligent task network is almost zero, so the encoding and decoding process cannot be optimized for the intelligent task network.
  • an embodiment of the present application provides an encoding method, which is applied to an encoder.
  • the intelligent task network is used to extract the features of the input image data to obtain initial feature data; the encoding network is used to encode the initial feature data, and the obtained encoded bits are written into the code stream.
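The encoder-side flow just described can be sketched at shape level: the task network's feature extractor yields the initial feature data, and the encoding network turns it into coded bits. The stride, channel count, and per-element bit cost below are illustrative assumptions:

```python
# Shape-level sketch of: feature extraction -> encoding network -> code stream.

def extract_features(image_shape, stride=4, channels=64):
    """Feature extraction: a (H, W, 3) image becomes a
    (H/stride, W/stride, channels) feature map."""
    h, w, _ = image_shape
    return (h // stride, w // stride, channels)

def encode_features(feature_shape, bits_per_element=2):
    """Encoding network: feature data becomes a number of coded bits
    that would be written into the code stream."""
    h, w, c = feature_shape
    return h * w * c * bits_per_element

initial = extract_features((256, 256, 3))  # initial feature data
coded_bits = encode_features(initial)      # bits written into the code stream
```

The point of the sketch is only the ordering: the encoder never restores an image; it passes the extracted features straight to the encoding network.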
  • the embodiment of the present application also provides a decoding method, which is applied to a decoder: the code stream is parsed to determine the reconstructed feature data, and the intelligent task network is used to perform feature analysis on the reconstructed feature data to determine the target result.
  • the decoding network can perform the processing of the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network, thereby improving the accuracy and speed of the intelligent task network.
  • the encoder 10 includes a transform and quantization unit 101, an intra frame estimation unit 102, an intra frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control unit
  • CABAC (Context-based Adaptive Binary Arithmetic Coding)
  • a video coding block can be obtained by dividing the coding tree unit (Coding Tree Unit, CTU), and the residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101: the video coding block is transformed, which includes transforming the residual information from the pixel domain to the transform domain, and the obtained transform coefficients are quantized to further reduce the bit rate;
  • the intra frame estimation unit 102 and the intra frame prediction unit 103 are used to perform intra-frame prediction on the video coding block; specifically, they are used to determine the intra-frame prediction mode to be used to encode the video coding block;
  • the motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-frame predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames, to provide temporal prediction information; the motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors, which estimate the motion of the video coding block;
  • the context model can be based on adjacent coding blocks and can be used to encode information indicating the determined intra-frame prediction mode, and to output the code stream of the video signal; the decoded image buffer unit 110 is used to store the reconstructed video coding blocks for prediction reference. As video image encoding progresses, new reconstructed video coding blocks are continuously generated, and these reconstructed video coding blocks are stored in the decoded image buffer unit 110.
  • the decoder 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, and a decoded image buffer unit 206, etc.; the decoding unit 201 can implement header information decoding and CABAC decoding, and the filtering unit 205 can implement DBF filtering/SAO filtering/ALF filtering.
  • after the code stream of the video signal is output, it is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain decoded transform coefficients; the transform coefficients are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain; the intra prediction unit 203 can generate prediction data for the current video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture; the motion compensation unit 204 determines prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses the prediction information to generate the predictive block for the video decoding block being decoded; a decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block produced by the intra prediction unit 203 or the motion compensation unit 204; the decoded video signal can then be filtered by the filtering unit 205 to remove blocking artifacts; the decoded video blocks are stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for outputting the video signal, that is, obtaining the restored original video signal.
  • FIG. 5 shows a schematic flowchart of an encoding method provided in an embodiment of the present application. As shown in Figure 5, the method may include:
  • S501: Use the intelligent task network to perform feature extraction on the input image data to obtain initial feature data.
  • S502: Use the encoding network to encode the initial feature data, and write the obtained encoded bits into the code stream.
  • the encoder may include an intelligent task network and an encoding network.
  • the intelligent task network is used to realize the feature extraction of the input image data
  • the coding network is used to realize the coding processing of the obtained initial feature data. In this way, using the feature extraction of the intelligent task network as the input of the encoding network can help the encoding network to better learn the image information required by the intelligent task network.
  • after the intelligent task network extracts the initial feature data, it does not execute the subsequent processing flow of the intelligent task network; instead, an encoding node of the same dimension is used directly to encode the data, so that subsequently, in the decoder, once the reconstructed feature data is determined through the decoding network, the subsequent processing flow of the intelligent task network can be performed on the reconstructed feature data. In this way, the image information required by the intelligent task network can be better learned, and the image restoration and feature re-extraction steps of related technologies are saved, so that the decoding network can perform the processing of the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network.
  • the intelligent task network may at least include a feature extraction sub-network
  • the feature extraction of the input image data by using the intelligent task network to determine the initial feature data may include: using the feature extraction sub-network to perform feature extraction on the input image data to obtain the initial feature data at the first feature node.
  • the feature extraction sub-network may include N feature extraction layers, where N is an integer greater than or equal to 1.
  • said using the feature extraction sub-network to perform feature extraction on the input image data to obtain the initial feature data at the first feature node may include:
  • when N is equal to 1, the single feature extraction layer is used to perform feature extraction on the input image data to obtain the initial feature data at the first feature node;
  • when N is greater than 1, the N feature extraction layers are used in sequence to perform feature extraction on the input image data to obtain the initial feature data at the first feature node.
  • the first feature node may be the feature node corresponding to a different feature extraction layer; which layer it is, specifically, is determined according to actual conditions. For example, when it is determined in the intelligent task network that encoding and decoding processing is required after a certain feature extraction layer, the feature node corresponding to that feature extraction layer is the first feature node, and the feature extraction layers up to that point form the feature extraction sub-network. The initial feature data extracted after passing through these layers is then input into the encoding network.
  • the initial feature data at the first feature node may be obtained by feature extraction through one feature extraction layer, or through two or more feature extraction layers; this is not specifically limited in the embodiments of the present application.
  • if the first feature node is the feature node corresponding to the first feature extraction layer, the feature data extracted at this point is the initial feature data to be input into the encoding network, and the feature extraction sub-network consists of only the first feature extraction layer;
  • if the first feature node is the feature node corresponding to the second feature extraction layer, the feature data extracted at this point is the initial feature data to be input into the encoding network, and the feature extraction sub-network consists of the first feature extraction layer and the second feature extraction layer.
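The choice of split point described above can be sketched as running only the first K layers of the task network; the output at that point is the initial feature data handed to the encoding network. The layer behavior (halving resolution, doubling channels) is an assumption purely for illustration:

```python
# Sketch: layers[0:split_at] form the feature extraction sub-network, and the
# output at the split point is the initial feature data at the first feature
# node. Each toy layer maps (H, W, C) -> (H/2, W/2, 2C).

def make_layer(factor=2):
    def layer(shape):
        h, w, c = shape
        return (h // factor, w // factor, c * factor)
    return layer

def extract_at_node(input_shape, layers, split_at):
    """Run only the first `split_at` layers; stop at the first feature node."""
    x = input_shape
    for layer in layers[:split_at]:
        x = layer(x)
    return x  # initial feature data at the first feature node

layers = [make_layer() for _ in range(4)]
node_after_1 = extract_at_node((256, 256, 4), layers, 1)  # split after layer 1
node_after_2 = extract_at_node((256, 256, 4), layers, 2)  # split after layer 2
```

Different split points give different feature dimensions, which is exactly why the dimension-matching and adaptation discussion below the figure descriptions is needed.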
  • the method may also include:
  • when the first encoding node in the encoding network matches the data dimension of the first feature node, determining the initial feature data at the first feature node as the feature data to be encoded at the first encoding node; or,
  • when the first encoding node in the encoding network does not match the data dimension of the first feature node, using the adaptation network to perform data dimension conversion on the initial feature data at the first feature node to obtain the feature data to be encoded at the first encoding node.
• When parameters such as the number of feature-space channels and the resolution at the first feature node are completely consistent with those at the first encoding node, it can be determined that the first feature node in the intelligent task network matches the data dimension of the first encoding node in the encoding network.
• In this case, the first encoding node of the same data dimension in the encoding network can be used directly for encoding processing; that is, the initial feature data is input into the encoding network at the first encoding node, the encoding network encodes the initial feature data, and the obtained encoded bits are written into the code stream.
• Otherwise, the adaptation network can be used to perform data dimension conversion on the initial feature data at the first feature node to obtain the feature data to be encoded at the first encoding node.
• In this case, encoding the initial feature data with the encoding network and writing the obtained encoded bits into the code stream may include: inputting the feature data to be encoded into the first encoding node in the encoding network, using the encoding network to encode the feature data to be encoded, and writing the obtained encoded bits into the code stream.
• The adaptation network here may include a one-layer or multi-layer network structure, and the network structure may use, but is not limited to, upsampling, downsampling, and selection or repetition of some channels. That is to say, in the cascade of the intelligent task network and the encoding network, the spatial resolution or the number of channels of the feature map input to the analysis network may not match those of the reconstructed feature map.
• In that case, a single-layer or multi-layer network structure acts as an adapter to perform dimension conversion of the features, thereby adapting the cascade of the two parts of the network. The network structure of the adapter may use, but is not limited to, upsampling, downsampling, and selection or repetition of some channels, without any limitation.
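• As a concrete illustration of the adapter just described, the following is a minimal Python sketch that performs data dimension conversion on a (C, H, W) feature map using channel selection/repetition and nearest-neighbour resampling. The function name and the non-learned stand-in operations are illustrative assumptions only; a real adaptation network would use one or more learned layers.

```python
import numpy as np

def adapt(feature, target_channels=None, scale=None):
    """Hypothetical one-layer adapter: converts a (C, H, W) feature map to the
    dimensions expected by the next network's node. A channel mismatch is fixed
    by selecting or repeating channels; a spatial mismatch by nearest-neighbour
    up-sampling or strided down-sampling."""
    c, h, w = feature.shape
    out = feature
    if target_channels is not None:
        if target_channels <= c:                 # selection of some channels
            out = out[:target_channels]
        else:                                    # repetition of some channels
            reps = -(-target_channels // c)      # ceiling division
            out = np.concatenate([out] * reps, axis=0)[:target_channels]
    if scale is not None:
        if scale >= 1:                           # up-sampling
            out = out.repeat(int(scale), axis=1).repeat(int(scale), axis=2)
        else:                                    # down-sampling
            step = int(round(1 / scale))
            out = out[:, ::step, ::step]
    return out
```

For example, adapting a 64-channel 16 × 16 feature map to a node expecting 128 channels at 32 × 32 resolution yields an array of shape (128, 32, 32).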
  • the embodiment of the present application mainly provides an intelligent fusion network model in which an end-to-end codec network and an intelligent task network are cascaded.
  • the end-to-end codec network includes an encoding network and a decoding network; that is, an intelligent fusion network model can include an encoding network, a decoding network, and an intelligent task network.
  • a part of the intelligent task network and the encoding network are used in the encoder, and another part of the intelligent task network and the decoding network are used in the decoder.
  • the training of the intelligent fusion network model can be performed in the encoder, or in the decoder, or even in both the encoder and the decoder.
  • the method may also include:
  • the preset network model includes an initial encoding network, an initial decoding network and an initial intelligent task network, and the initial encoding network and the initial decoding network are connected to the initial intelligent task network through nodes;
  • the model obtained after training is determined as an intelligent fusion network model; wherein, the intelligent fusion network model includes an encoding network, a decoding network and an intelligent task network.
  • the loss function corresponding to the preset network model can be divided into two parts: the loss function of the codec network and the loss function of the intelligent task network.
  • the method may also include:
  • the loss value of the intelligent task network is determined.
  • the retraining method of the intelligent fusion network model in the embodiment of the present application may be to perform joint training on a fusion network formed by connecting the initial intelligent task network and the initial codec network through nodes.
• In some embodiments, the loss function can be as follows:

  loss = λ1 · loss_task + λ2 · D(x, Re) + R    (1)

• Here, R represents the codec bit rate of the codec network; λ1 and λ2 represent the rate-distortion trade-off parameters, and different λ1 and λ2 correspond to different models, that is, different total bit rates; loss_task represents the intelligent task network loss value, and D(x, Re) indicates the distortion value between the input and decoded data. Note that x and Re represent the data at the encoding node and at the reconstruction node used by the codec network, rather than image data; in addition, the distortion value here can be measured using the Mean Squared Error (MSE).
• In formula (1), λ1 · loss_task represents the loss function of the intelligent task network, and λ2 · D(x, Re) + R indicates the loss function of the codec network; that is, the loss function of the intelligent fusion network model can be obtained jointly from the loss function of the intelligent task network and the loss function of the codec network.
  • the values of ⁇ 1 and ⁇ 2 are specifically set according to the actual situation, for example, the value of ⁇ 1 is 0.3, and the value of ⁇ 2 is 0.7, but there is no limitation thereto.
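• The joint loss of formula (1) can be sketched as follows. This is a minimal illustration: the helper names are assumptions, and in practice loss_task, the distortion D(x, Re), and the rate R are produced by the networks themselves rather than passed in as numbers.

```python
import numpy as np

def mse(x, re):
    """Distortion D(x, Re): mean squared error between the data x at the
    encoding node and the data Re at the reconstruction node."""
    x, re = np.asarray(x, dtype=float), np.asarray(re, dtype=float)
    return float(np.mean((x - re) ** 2))

def fusion_loss(loss_task, distortion, rate, lam1=0.3, lam2=0.7):
    """Formula (1): loss = lam1 * loss_task + lam2 * D(x, Re) + R.
    lam1 * loss_task is the intelligent-task term; lam2 * distortion + rate
    is the codec term (rate is the codec bit rate R)."""
    return lam1 * loss_task + lam2 * distortion + rate
```

With the example values λ1 = 0.3 and λ2 = 0.7 mentioned above, fusion_loss(1.0, 0.5, 0.1) evaluates to 0.3 + 0.35 + 0.1 = 0.75.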
  • the intelligent fusion network model retraining method in the embodiment of the present application may also be trained step by step.
• For example, the value of λ2 can first be set to zero and λ1 to an arbitrary value, at which point the intelligent task network is trained; then λ1 is set to zero and λ2 to an arbitrary value, at which point the codec network (including the encoding network and the decoding network) is trained; finally, joint training is performed. There is no limitation on the training method here: various other training methods, or even combinations of several different methods, may also be used.
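• The step-by-step training just described can be sketched as a schedule of (λ1, λ2) settings. The stage names and the function are illustrative assumptions; 0.3 and 0.7 are just the example values mentioned above.

```python
def training_schedule(lam1=0.3, lam2=0.7):
    """Hypothetical three-stage schedule for step-by-step retraining:
    stage 1 trains only the intelligent task network (lam2 = 0),
    stage 2 trains only the codec network (lam1 = 0),
    stage 3 trains both jointly with the final trade-off parameters."""
    return [
        ("task_only",  lam1, 0.0),   # lam2 = 0: codec term switched off
        ("codec_only", 0.0,  lam2),  # lam1 = 0: task term switched off
        ("joint",      lam1, lam2),  # joint training of the fusion model
    ]
```

A training loop would iterate over these stages and plug each (λ1, λ2) pair into formula (1) before optimizing.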
  • the intelligent task network may include a feature extraction subnetwork and a feature analysis subnetwork; wherein, the feature extraction subnetwork can be used to perform feature extraction on input image data and determine the initial feature data, which is then encoded by the encoding network.
  • the feature analysis sub-network can be used to perform feature analysis on the input feature data to determine the target result; here it can refer to the completion of task targets such as target detection, target tracking, or behavior recognition.
  • the method may further include:
• After training, the intelligent task network can directly perform feature extraction and feature analysis on the input image data; at this point it is no longer necessary to go through the encoding network and the decoding network, and the target result can still be determined.
• That is, for the processing of local images, there is no need for data transmission through the end-to-end codec network; after training, the intelligent task network (including the feature extraction sub-network and the feature analysis sub-network) can also be applied on its own to the analysis and processing of images to obtain the target results of intelligent tasks.
  • This embodiment provides an encoding method using an encoder.
  • the initial feature data is obtained by using the intelligent task network to perform feature extraction on the input image data; the encoding process is performed on the initial feature data by using the encoding network, and the obtained encoded bits are written into the code stream.
• Using the feature extraction of the intelligent task network as the input of the encoding network not only allows the image information required by the intelligent task network to be learned better, but also saves the process, found in related technologies, of restoring the image and then re-extracting image feature data from it.
  • the complexity of the intelligent task network is greatly reduced, thereby improving the accuracy and speed of the intelligent task network.
  • the embodiment of the present application provides a code stream, where the code stream is generated by performing bit coding according to the information to be coded.
  • the information to be encoded includes at least initial feature data, and the initial feature data is obtained by extracting features from input image data through an intelligent task network.
  • the encoder After the encoder generates the code stream, it can be transmitted to the decoder, so that the subsequent decoder can obtain the reconstructed feature data by parsing the code stream.
  • FIG. 6 shows a schematic flow chart of a decoding method provided in an embodiment of the present application. As shown in Figure 6, the method may include:
• S601: Parse the code stream, and determine the reconstructed feature data.
• S602: Use the intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result.
  • the decoder may include a decoding network.
  • the analyzing the code stream to determine the reconstruction feature data may further include: using a decoding network to analyze the code stream to determine the reconstruction feature data.
  • the decoder not only has a decoding function, but also has an intelligent analysis function; that is to say, in addition to the decoding network, the decoder in the embodiment of the present application also includes an intelligent task network.
• The intelligent task network can be used to perform feature analysis on the reconstructed feature data, so as to determine the target result; here this may refer to completing task objectives such as target detection, target tracking, or behavior recognition.
  • the analyzing the code stream and determining the reconstruction feature data may include:
• Parsing the code stream, and, when the first feature node in the intelligent task network matches the data dimension of the first reconstruction node, determining the candidate reconstruction feature data at the first reconstruction node as the reconstructed feature data at the first feature node; or, when the data dimensions do not match, using the adaptation network to perform data dimension conversion on the candidate reconstruction feature data.
• Here, when parameters such as the number of feature-space channels and the resolution at the first feature node are completely consistent with those at the first reconstruction node, it can be determined that the first feature node in the intelligent task network matches the data dimension of the first reconstruction node in the decoding network.
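• The matching rule above can be sketched as follows. Nodes are modelled as simple dicts, and the function names and the adapter callable are illustrative assumptions, not the actual node descriptors of any implementation.

```python
def dimensions_match(node_a, node_b):
    """Two nodes match when the number of feature-space channels and the
    resolution (H, W) are completely consistent."""
    return (node_a["channels"] == node_b["channels"]
            and node_a["resolution"] == node_b["resolution"])

def reconstructed_features(candidate, feature_node, reconstruction_node, adapter=None):
    """Decoder-side rule: on a match, the candidate reconstruction data is used
    directly as the reconstructed feature data; otherwise a (hypothetical)
    adaptation network converts its data dimensions first."""
    if dimensions_match(feature_node, reconstruction_node):
        return candidate
    if adapter is None:
        raise ValueError("dimension mismatch requires an adaptation network")
    return adapter(candidate)
```

The same predicate applies symmetrically on the encoder side between the first feature node and the first encoding node.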
• The adaptation network here may include a one-layer or multi-layer network structure, and the network structure may use, but is not limited to, upsampling, downsampling, and selection or repetition of some channels. That is to say, in the cascade of the intelligent task network and the decoding network, the spatial resolution or the number of channels of the feature map input to the analysis network may not match those of the reconstructed feature map.
• In that case, a single-layer or multi-layer network structure acts as an adapter to perform dimension conversion of the features, thereby adapting the cascade of the two parts of the network. The network structure of the adapter may use, but is not limited to, upsampling, downsampling, and selection or repetition of some channels, without any limitation.
  • the feature analysis of the reconstruction feature data by using the intelligent task network to determine the target result may include:
  • the reconstructed feature data is input to the first feature node in the intelligent task network, and the feature analysis is performed on the reconstructed feature data by using the intelligent task network to obtain the target result.
• The first feature node may be a feature node corresponding to different feature extraction layers; which feature extraction layer it corresponds to is determined according to actual conditions. For example, when the intelligent task network determines that encoding and decoding processing is required after a certain feature extraction layer, the feature node corresponding to that layer is the first feature node, and the initial feature data extracted after passing through that layer is processed by the encoding network and the decoding network, so that the reconstructed feature data at the first feature node is obtained; the target result can then be obtained by analysis.
• The intelligent task network may include a feature extraction sub-network and a feature analysis sub-network; correspondingly, in a specific embodiment, using the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result may include:
  • the reconstructed feature data is input to the first feature node, and the feature analysis sub-network is used to perform feature analysis on the reconstructed feature data to obtain the target result.
• The feature extraction sub-network can include several feature extraction layers; the first feature node here can be obtained through one feature extraction layer, or through two or more feature extraction layers, which the embodiments of the present application do not specifically limit.
• For example, assuming the feature extraction sub-network includes four feature extraction layers, the reconstructed feature data obtained after the fourth feature extraction layer is input to the feature analysis sub-network for feature analysis, and the target result can be obtained. Alternatively, assuming the feature extraction sub-network includes two feature extraction layers, the reconstructed feature data obtained after the second feature extraction layer is input to the feature analysis sub-network for feature analysis, and the target result can be obtained.
• The feature analysis sub-network can include a region proposal network (Region Proposal Network, RPN) and a region-of-interest head network (Region Of Interest_Heads, ROI_Heads).
  • the output end of the region generation network is connected to the input end of the ROI head network
  • the input end of the region generation network is also connected to the ROI head network
  • the output end of the ROI head network is used to output the target result.
  • using the feature analysis sub-network to perform feature analysis on the reconstructed feature data to obtain the target result may include:
• The reconstructed feature data is first processed through the region proposal network to obtain the target region; then the reconstructed feature data and the target region are intelligently analyzed through the ROI head network, so as to obtain the target result.
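• The two-stage flow above can be sketched as follows, where the `rpn` and `roi_heads` callables are stand-ins for the real region proposal network and ROI head network:

```python
def analyze(features, rpn, roi_heads):
    """Feature analysis sub-network flow: the region proposal network (RPN)
    maps the reconstructed feature data to target regions; the ROI head
    network then analyses the features together with those regions."""
    regions = rpn(features)              # RPN output feeds the ROI head input
    return roi_heads(features, regions)  # ROI head also receives the features
```

This mirrors the wiring described above: the RPN's output and the features themselves are both connected to the ROI head network, whose output is the target result.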
• The embodiment of the present application mainly provides an intelligent fusion network model in which the end-to-end codec network and the intelligent task network are cascaded, with the goal of enabling the intelligent task network to achieve optimal performance.
  • the end-to-end codec network includes an encoding network and a decoding network; that is, an intelligent fusion network model can include an encoding network, a decoding network, and an intelligent task network.
  • part of the intelligent task network and the encoding network are used in the encoder, and the other part of the intelligent task network and the decoding network are used in the decoder.
  • the method may also include:
  • the preset network model includes an initial encoding network, an initial decoding network and an initial intelligent task network, and the initial encoding network and the initial decoding network are connected to the initial intelligent task network through nodes;
  • the model obtained after training is determined as an intelligent fusion network model; wherein, the intelligent fusion network model includes an encoding network, a decoding network and an intelligent task network.
  • the loss function corresponding to the preset network model can be divided into two parts: the loss function of the codec network and the loss function of the intelligent task network.
  • the method may also include:
  • the loss value of the intelligent task network is determined.
  • the retraining method of the intelligent fusion network model in the embodiment of the present application may be to perform joint training on a fusion network formed by connecting the initial intelligent task network and the initial codec network through nodes.
• In some embodiments, the loss function may be as shown in the above formula (1), where λ1 · loss_task represents the loss function of the intelligent task network, and λ2 · D(x, Re) + R indicates the loss function of the codec network; that is, the loss function of the intelligent fusion network model can be obtained jointly from the loss function of the intelligent task network and the loss function of the codec network.
  • the values of ⁇ 1 and ⁇ 2 are specifically set according to the actual situation, for example, the value of ⁇ 1 is 0.3, and the value of ⁇ 2 is 0.7, but there is no limitation thereto.
  • the intelligent fusion network model retraining method in the embodiment of the present application may also be trained step by step.
• For example, the value of λ2 can first be set to zero and λ1 to an arbitrary value, at which point the intelligent task network is trained; then λ1 is set to zero and λ2 to an arbitrary value, at which point the codec network (including the encoding network and the decoding network) is trained; finally, joint training is performed. There is no limitation on the training method here: various other training methods, or even combinations of several different methods, may also be used.
  • This embodiment also provides a decoding method, which is applied to a decoder.
• By parsing the code stream, the reconstructed feature data is determined; the intelligent task network is then used to perform feature analysis on the reconstructed feature data, and the target result is determined.
• In this way, the decoding network can hand over to the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network, thereby improving its accuracy and speed.
• For the input image data, the intelligent task network may first perform feature extraction on it, and the extracted initial feature data is then input into the encoding network; that is to say, the feature extraction part of the intelligent task network is used as the pre-processing flow of the encoding network. Using the feature extraction of the intelligent task network as the input of the encoding network helps the codec network better learn the image information required by the intelligent task network.
• At the decoder, the reconstructed feature data can be obtained by parsing the code stream, and the reconstructed feature data is then input into the intelligent task network for feature analysis; that is, the analysis and processing part of the intelligent task network is used as the post-processing flow of the decoding network, so that the decoding network can support the intelligent task network's analysis without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network.
• Referring to FIG. 7, it shows a schematic flowchart of an intelligent fusion network model provided by an embodiment of the present application.
• As shown in FIG. 7, after the input data to be encoded (i.e., the input image data) passes through A1, the feature data can be obtained; the code stream can then be obtained after the encoding processing of E2; the code stream is input into D2 for decoding processing, and the reconstructed feature data can be obtained; after the reconstructed feature data is input into A2 for feature analysis, the target result can be obtained.
• Among them, A1 and A2 belong to the intelligent task network, and E2 and D2 belong to the codec network. Here, A1 refers to the process of extracting features from the input data to be encoded and obtaining feature data oriented to the intelligent task network's target; E2 refers to the process of processing the feature data and obtaining the code stream; D2 refers to the process of receiving the code stream and parsing it into reconstructed feature data; and A2 refers to the process of processing the reconstructed feature data and obtaining the result.
• The decoding process does not need to reconstruct the decoded image; it only needs to reconstruct into the feature space, and the feature space is then used as the input of the intelligent task network without using the decoded image. That is to say, the feature data extracted by the intelligent task network's A1 is encoded and decoded by E2 and D2, and the reconstructed feature data decoded by D2 is analyzed by A2 to directly obtain the target result.
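• The A1 → E2 → D2 → A2 flow can be sketched as a simple composition of stand-in callables (a minimal illustration; the real modules are the networks of FIG. 7):

```python
def fusion_pipeline(image, a1, e2, d2, a2):
    """FIG. 7 flow with stand-in callables: A1 extracts features, E2 encodes
    them into a code stream, D2 decodes the stream into reconstructed
    features, and A2 analyses them into the target result. No decoded image
    is reconstructed anywhere along the way."""
    features = a1(image)   # intelligent task network: feature extraction
    stream = e2(features)  # encoding network -> code stream
    recon = d2(stream)     # decoding network -> reconstructed features
    return a2(recon)       # intelligent task network: feature analysis
```

Note the asymmetry with a conventional pipeline: the output of D2 feeds A2 directly in the feature space, rather than producing a decoded image for A1 to re-process.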
• Regarding the codec network and the intelligent task network used here, the codec network can be divided into an encoding network and a decoding network.
  • the encoding network can use the feature extraction sub-network of the intelligent task network and some nodes of the end-to-end encoding network.
  • the input image data is extracted through the intelligent task network, and the intelligent task network can no longer be executed after reaching a certain feature node. Instead, it directly uses an end-to-end image compression network corresponding to encoding nodes of the same dimension for compression.
  • the decoding network also inputs the reconstruction feature data at the reconstruction node to the intelligent task network after the decoding execution reaches the reconstruction node corresponding to the same dimension as the encoding node, and performs the subsequent processing flow of the intelligent task network.
  • the codec network and intelligent task network used here may be various commonly used end-to-end codec networks and intelligent task networks, which have nothing to do with the specific network structure and type.
• The codec network itself can use variants of various neural network structures such as CNN, RNN, and GAN; the intelligent task network likewise has no restrictions on the task goals or network structures involved, which can be target detection, target tracking, behavior recognition, pattern recognition, and other tasks involving image processing.
  • FIG. 8 shows a schematic structural diagram of an end-to-end codec network provided by an embodiment of the present application.
  • an encoding network and a decoding network may be included.
  • Conv is the abbreviation of convolution (Convolution)
• "1 × 1", "3 × 3", and "5 × 5" all indicate the size of the convolution kernel; "N" indicates the number of convolution kernels (i.e., the number of output channels of the convolutional layer)
  • "/2” means 2 times downsampling processing, which halves the input size
  • “ ⁇ 2” means 2 times upsampling processing, which doubles the input size. Since the down-sampling processing is performed by 2 times in the encoding network, the up-sampling processing needs to be performed by 2 times in the decoding network.
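• The relationship between the "/2" and "×2" operations can be sketched as follows, with nearest-neighbour stand-ins for the learned strided-convolution and up-sampling layers of the real networks:

```python
import numpy as np

def down2(x):
    """'/2': 2x down-sampling, halving each spatial dimension of a 2-D map
    (a stride-2 pick, standing in for the encoder's strided convolution)."""
    return x[::2, ::2]

def up2(x):
    """'x2': 2x up-sampling, doubling each spatial dimension (nearest
    neighbour, standing in for the decoder's up-sampling layer)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)
```

Because each "/2" in the encoding network halves the size, each corresponding "×2" in the decoding network must double it so that the round trip restores the original spatial dimensions.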
  • FIG. 9A shows a schematic structural diagram of an attention mechanism module provided by an embodiment of the present application.
• It may consist of a residual block (Residual Block, RB), a 1 × 1 convolutional layer (represented by 1 × 1 Conv), an activation function, a multiplier, and an adder.
  • the activation function can be represented by a Sigmoid function, which is a common S-type function, also known as an S-type growth curve.
  • the Sigmoid function is often used as the activation function of a neural network to map variables between 0 and 1.
• The residual block can be composed of three convolutional layers: a first, a second, and a third convolutional layer. The first convolutional layer has a convolution kernel size of 1 × 1 and N/2 output channels, which can be represented by 1 × 1 Conv, N/2; the second convolutional layer has a convolution kernel size of 3 × 3 and N/2 output channels, which can be represented by 3 × 3 Conv, N/2; the third convolutional layer has a convolution kernel size of 1 × 1 and N output channels, which can be represented by 1 × 1 Conv, N.
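• The channel bookkeeping of the attention mechanism module can be sketched as follows. Random 1 × 1 weights stand in for the learned layers, and a second 1 × 1 layer stands in for the 3 × 3 layer; only the shapes and the RB → 1 × 1 Conv → Sigmoid → multiplier → adder flow are the point of this sketch.

```python
import numpy as np

def conv1x1(x, c_out, seed):
    """1x1 convolution on a (C, H, W) feature map: per-pixel channel mixing
    (random weights, since only the shapes matter in this sketch)."""
    w = np.random.default_rng(seed).standard_normal((c_out, x.shape[0]))
    return np.einsum("oc,chw->ohw", w, x)

def residual_block(x):
    """RB channel flow: 1x1 Conv,N/2 -> 3x3 Conv,N/2 -> 1x1 Conv,N
    (the 3x3 layer is stood in for by another 1x1 layer for brevity)."""
    n = x.shape[0]
    h = conv1x1(x, n // 2, seed=0)   # 1x1 Conv, N/2
    h = conv1x1(h, n // 2, seed=1)   # stand-in for 3x3 Conv, N/2
    return conv1x1(h, n, seed=2)     # 1x1 Conv, N

def attention_module(x):
    """Sigmoid(1x1 Conv(RB(x))) gates x (the multiplier); a residual add
    (the adder) then produces an output of the same (N, H, W) shape."""
    gate = 1.0 / (1.0 + np.exp(-conv1x1(residual_block(x), x.shape[0], seed=3)))
    return x + x * gate
```

The Sigmoid maps the gating branch into (0, 1), so the module rescales the input features channel-by-channel and position-by-position without changing their dimensions.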
  • the intelligent task network may include a feature extraction sub-network and a feature analysis sub-network.
• F0 represents the input, which is the input image data.
  • the feature extraction sub-network includes four feature extraction layers: the first convolution module corresponds to the first feature extraction layer, and its corresponding feature node is represented by F1; the second convolution module corresponds to the second feature extraction layer, and its corresponding feature The node is represented by F2; the third convolution module corresponds to the third feature extraction layer, and its corresponding feature node is represented by F3; the fourth convolution module corresponds to the fourth feature extraction layer, and its corresponding feature node is represented by F4.
  • the feature analysis sub-network can include a region generation network (RPN) and a region of interest head network (ROI_Heads), and the final output is the target result.
  • FIG. 11 shows a schematic structural diagram of an intelligent fusion network model provided by an embodiment of the present application.
  • a joint network that integrates end-to-end codec network and intelligent task network is shown here, and the goal is to make the intelligent task network achieve optimal performance through the processing and retraining of the joint network.
• Encoding nodes such as e0, e1, e2, e3, e4, e5, e6, e7, e8, and e9 are set in the encoding network; reconstruction nodes such as d6, d7, d8, d9, and d10 are set in the decoding network; and feature nodes such as F0, F1, F2, F3, and F4 are set in the intelligent task network.
  • e0 and d0 are the input node and output node of the end-to-end codec
  • F0 is the input node of the intelligent task network.
• Regarding the input size, it is W × H × 3; after the first convolution module, since the size is halved, it becomes (W/2) × (H/2); after the second convolution module, with the size halved again, it becomes (W/4) × (H/4); after the third convolution module, (W/8) × (H/8); and after the fourth convolution module, (W/16) × (H/16).
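• The size progression above can be sketched as follows (a minimal illustration of the repeated halving at the feature nodes F1 through F4, with example input sizes):

```python
def feature_sizes(w, h, layers=4):
    """Spatial size at feature nodes F1..F4: each convolution module halves
    the input size, starting from a W x H x 3 input at F0."""
    sizes = []
    for _ in range(layers):
        w, h = w // 2, h // 2    # each module halves both spatial dimensions
        sizes.append((w, h))
    return sizes
```

For a 256 × 256 input, the four feature nodes sit at 128 × 128, 64 × 64, 32 × 32, and 16 × 16 respectively.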
  • the intelligent fusion network model of the embodiment of the present application is shown in Figure 11.
• In the original processing flow, the input node and output node of the end-to-end codec network are e0 and d0 respectively, and the input node of the intelligent task network is F0 (that is, the decoded image output by the end-to-end codec network).
• In the intelligent fusion network model, the initial feature data at the F1 node of the intelligent task network is used as the input at the e1 node of the encoding network, and the reconstructed feature data at the d2 node obtained through the decoding network can be used as the feature data at the F1 node, after which the subsequent processing flow of the intelligent task network is performed.
• The codec network described in the embodiment of this application can be, for example, a traditional video codec, an intelligent end-to-end image codec, a partially intelligent traditional video codec, an end-to-end video codec, and so on; there is no limit here.
  • intelligent task network and the end-to-end codec network proposed in the embodiment of the present application can also be replaced by other common network structures.
  • Lee network and Duan network can be used for specific implementation.
  • the Lee network uses the transfer learning method to improve the quality of the network reconstructed image; the Duan network uses the high-level semantic map to enhance the low-level visual features, and verifies that this method can effectively improve the bit rate-accuracy-distortion performance of image compression .
  • the composition structure of the Lee network model is shown in Figure 12A
  • the composition structure of the Duan network model is shown in Figure 12B.
• For the intelligent task network, the target recognition network yolo_v3 can be used for a specific implementation, and the composition structure of its network model is shown in Figure 13A and Figure 13B; in addition, the target detection network ResNet-FPN and the instance segmentation network Mask-RCNN can also be used, wherein the structure of the target detection network model is shown in Figure 13C, and the structure of the instance segmentation network model is shown in Figure 13D.
  • inputting the feature space vector of the codec network instead of the original image into the intelligent task network can save the process of image restoration and extract and restore image features, and better improve the accuracy and speed of the intelligent task network.
  • using the feature extraction of the intelligent task network as the input of the end-to-end image codec network helps the codec network to better learn the image information required by the intelligent task network.
• In short, the feature extraction part of the intelligent task network is used as the pre-processing flow of the end-to-end encoding network, and the analysis and processing part of the intelligent task network is used as the post-processing flow of the end-to-end decoding network, so that the decoding network can support the processing of the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network.
• This embodiment elaborates on the specific implementation of the foregoing embodiments in detail. From the technical solutions of the foregoing embodiments, it can be seen that they not only allow the image information required by the intelligent task network to be learned better, but also reduce the complexity of the intelligent task network, thereby improving its accuracy and speed.
  • FIG. 14 shows a schematic diagram of the composition and structure of an encoder 140 provided by the embodiment of the present application.
  • the encoder 140 may include: a first feature extraction unit 1401 and an encoding unit 1402; wherein,
  • the first feature extraction unit 1401 is configured to use the intelligent task network to perform feature extraction on the input image data to obtain initial feature data;
  • the encoding unit 1402 is configured to use the encoding network to encode the initial feature data, and write the obtained encoded bits into the code stream.
  • the intelligent task network includes at least a feature extraction sub-network.
  • the first feature extraction unit 1401 is specifically configured to use the feature extraction sub-network to perform feature extraction on the input image data to obtain the initial feature data.
• In some embodiments, the feature extraction sub-network includes N feature extraction layers, where N is an integer greater than or equal to 1; correspondingly, the first feature extraction unit 1401 is also configured to, when N is equal to 1, use the single feature extraction layer to perform feature extraction on the input image data to obtain the initial feature data at the first feature node; and, when N is greater than 1, to use the N feature extraction layers to perform feature extraction on the input image data to obtain the initial feature data at the first feature node.
  • the encoder 140 may further include a first dimension conversion unit 1403; wherein,
  • the encoding unit 1402 is further configured to: when the first encoding node in the encoding network matches the data dimension of the first feature node, determine the initial feature data at the first feature node as the feature data to be encoded at the first encoding node; or, when the first encoding node in the encoding network does not match the data dimension of the first feature node, perform, through the first dimension conversion unit 1403 using the adaptation network, data dimension conversion on the initial feature data at the first feature node to obtain the feature data to be encoded at the first encoding node.
  • the encoding unit 1402 is specifically configured to input the feature data to be encoded to the first encoding node in the encoding network, and use the encoding network to perform encoding processing on the feature data to be encoded , and write the encoded bits into the code stream.
  • the adaptation network includes one or more layers of network structure.
  • the encoder 140 may further include a first training unit 1404 configured to: determine at least one training sample; use the at least one training sample to train a preset network model, wherein the preset network model includes an initial encoding network, an initial decoding network and an initial intelligent task network, and the initial encoding network and initial decoding network are connected to the initial intelligent task network through nodes; and when the loss function corresponding to the preset network model converges to a preset threshold, determine the trained model as an intelligent fusion network model, wherein the intelligent fusion network model includes an encoding network, a decoding network and an intelligent task network.
  • the intelligent task network includes a feature extraction sub-network and a feature analysis sub-network.
  • the encoder 140 may further include a first feature analysis unit 1405;
  • the first feature extraction unit 1401 is further configured to use the feature extraction sub-network to perform feature extraction on the input image data to obtain initial feature data;
  • the first feature analysis unit 1405 is configured to use the feature analysis sub-network to perform feature analysis on the initial feature data to determine a target result.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
  • the embodiment of the present application provides a computer storage medium, which is applied to the encoder 140; the computer storage medium stores a computer program, and when the computer program is executed by the first processor, the method described in any one of the foregoing embodiments is implemented.
  • FIG. 15 shows a schematic diagram of a specific hardware structure of an encoder 140 provided by an embodiment of the present application.
  • it may include: a first communication interface 1501 , a first memory 1502 and a first processor 1503 ; each component is coupled together through a first bus system 1504 .
  • the first bus system 1504 is used to realize connection and communication between these components.
  • the first bus system 1504 includes not only a data bus, but also a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are labeled as the first bus system 1504 in FIG. 15. Wherein,
  • the first communication interface 1501 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the first memory 1502 is used to store computer programs that can run on the first processor 1503;
  • the first processor 1503 is configured to, when running the computer program, execute:
  • using the intelligent task network to perform feature extraction on the input image data to obtain initial feature data; and using the encoding network to encode the initial feature data, and writing the obtained encoded bits into the code stream.
  • the first memory 1502 in the embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash.
  • the volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), Synchlink Dynamic Random Access Memory (SLDRAM) and Direct Rambus Random Access Memory (DR RAM).
  • the first processor 1503 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the first processor 1503 or an instruction in the form of software.
  • the above-mentioned first processor 1503 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a ready-made programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • the storage medium is located in the first memory 1502, and the first processor 1503 reads the information in the first memory 1502, and completes the steps of the above method in combination with its hardware.
  • the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
  • the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processor (Digital Signal Processing, DSP), digital signal processing device (DSP Device, DSPD), programmable Logic device (Programmable Logic Device, PLD), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), general-purpose processor, controller, microcontroller, microprocessor, other devices for performing the functions described in this application electronic unit or its combination.
  • the techniques described herein can be implemented through modules (eg, procedures, functions, and so on) that perform the functions described herein.
  • Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
  • the first processor 1503 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • This embodiment provides an encoder.
  • the feature extraction of the intelligent task network is used as the input of the encoding network, which can not only better learn the image information required by the intelligent task network, but also save the image restoration and feature re-extraction processes of the related art, so that the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, which greatly reduces the complexity of the intelligent task network and improves its accuracy and speed.
  • FIG. 16 shows a schematic diagram of the composition and structure of a decoder 160 provided in the embodiment of the present application.
  • the decoder 160 may include: an analysis unit 1601 and a second feature analysis unit 1602; wherein,
  • the parsing unit 1601 is configured to parse the code stream and determine the reconstructed feature data;
  • the second feature analysis unit 1602 is configured to use the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result.
  • the decoder 160 may further include a second dimension conversion unit 1603; wherein,
  • the parsing unit 1601 is further configured to: parse the code stream, and when the first feature node in the intelligent task network matches the data dimension of the first reconstruction node, determine the candidate reconstructed feature data at the first reconstruction node as the reconstructed feature data; or, parse the code stream, and when the first feature node in the intelligent task network does not match the data dimension of the first reconstruction node, perform, through the second dimension conversion unit 1603 using the adaptation network, data dimension conversion on the candidate reconstructed feature data at the first reconstruction node to obtain the reconstructed feature data.
  • the second feature analysis unit 1602 is specifically configured to input the reconstructed feature data into the first feature node in the intelligent task network, and use the intelligent task network to analyze the reconstructed feature The data is subjected to feature analysis to obtain the target result.
  • the adaptation network includes one or more layers of network structure.
  • the intelligent task network includes a feature extraction sub-network and a feature analysis sub-network; correspondingly, the second feature analysis unit 1602 is specifically configured to input the reconstructed feature data to the first feature node, and use the feature analysis sub-network to perform feature analysis on the reconstructed feature data to obtain the target result.
  • the feature analysis sub-network includes a region generation network and a region-of-interest head network; correspondingly, the second feature analysis unit 1602 is specifically configured to process the reconstructed feature data through the region generation network to obtain the target region; And intelligently analyze the reconstructed feature data and the target area through the head network of the region of interest to obtain the target result.
  • the parsing unit 1601 is further configured to use a decoding network to parse the code stream to determine reconstruction feature data.
  • the decoder 160 may further include a second training unit 1604 configured to determine at least one training sample; and use at least one training sample to train the preset network model; wherein, the preset network model Including the initial encoding network, initial decoding network and initial intelligent task network, and the initial encoding network and initial decoding network are connected to the initial intelligent task network through nodes; and when the loss function corresponding to the preset network model converges to the preset threshold, the training The obtained model is determined to be an intelligent fusion network model; wherein, the intelligent fusion network model includes an encoding network, a decoding network and an intelligent task network.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • the integrated units are implemented in the form of software function modules and are not sold or used as independent products, they can be stored in a computer-readable storage medium.
  • this embodiment provides a computer storage medium, which is applied to the decoder 160, and the computer storage medium stores a computer program, and when the computer program is executed by the second processor, any one of the preceding embodiments is implemented. the method described.
  • FIG. 17 shows a schematic diagram of a specific hardware structure of the decoder 160 provided by the embodiment of the present application.
  • it may include: a second communication interface 1701 , a second memory 1702 and a second processor 1703 ; each component is coupled together through a second bus system 1704 .
  • the second bus system 1704 is used to realize connection and communication between these components.
  • the second bus system 1704 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as the second bus system 1704 in FIG. 17. Wherein,
  • the second communication interface 1701 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • the second memory 1702 is used to store computer programs that can run on the second processor 1703;
  • the second processor 1703 is configured to, when running the computer program, execute: parsing the code stream to determine reconstructed feature data; and using the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result.
  • the second processor 1703 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • the hardware function of the second memory 1702 is similar to that of the first memory 1502, and the hardware function of the second processor 1703 is similar to that of the first processor 1503; details will not be described here.
  • This embodiment provides a decoder, which may include a parsing unit and a feature analysis unit. In this way, since the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, the complexity of the intelligent task network is greatly reduced, thereby improving its accuracy and speed.
  • the intelligent analysis system 180 may include an encoder 1801 and a decoder 1802; wherein, the encoder 1801 may be the encoder described in any of the foregoing embodiments, and the decoder 1802 may be the decoder described in any of the foregoing embodiments.
  • the intelligent analysis system 180 has an intelligent fusion network model, and the intelligent fusion network model may include an encoding network, a decoding network, and an intelligent task network.
  • a part of the intelligent task network and the encoding network are used in the encoder 1801
  • another part of the intelligent task network and the decoding network are used in the decoder 1802 .
  • the decoding network can perform the processing of the intelligent task network without restoring to the image dimension, which greatly reduces the complexity of the intelligent task network, thereby improving the accuracy and speed of the intelligent task network.
  • the intelligent task network is used to extract features from the input image data to obtain initial feature data; the encoding network is used to encode the initial feature data, and the obtained coded bits are written into the code stream.
  • the reconstruction feature data is determined by analyzing the code stream; the intelligent task network is used to perform feature analysis on the reconstruction feature data to determine the target result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose an encoding/decoding method, a code stream, an encoder, a decoder, a storage medium and a system. The method includes: parsing a code stream to determine reconstructed feature data; and using an intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result. In this way, not only can the image information required by the intelligent task network be learned better, but also, since the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, the complexity of the intelligent task network is reduced, thereby improving its accuracy and speed.

Description

Encoding/decoding method, code stream, encoder, decoder, storage medium and system — Technical Field
Embodiments of the present application relate to the technical field of intelligent coding, and in particular to an encoding/decoding method, a code stream, an encoder, a decoder, a storage medium and a system.
Background
At present, image and video encoding/decoding can be performed by traditional methods or by neural-network-based intelligent methods. Traditional methods remove redundancy from the input data; for example, an image or video codec exploits the spatial correlation within each frame and the temporal correlation between frames to remove redundancy. Intelligent methods use neural networks to process the image information and extract feature data.
In the related art, the decoded data obtained through the encoding/decoding process is generally used directly as the input data of the intelligent task network. However, the decoded data may contain a large amount of redundant information that the intelligent task network does not need; transmitting this redundant information wastes bandwidth or reduces the efficiency of the intelligent task network. In addition, there is almost no correlation between the end-to-end encoding/decoding process and the intelligent task network, so the encoding/decoding process cannot be optimized for the intelligent task network.
Summary
Embodiments of the present application provide an encoding/decoding method, a code stream, an encoder, a decoder, a storage medium and a system, which can not only better learn the image information required by the intelligent task network, but also reduce the complexity of the intelligent task network, thereby improving its accuracy and speed.
The technical solutions of the embodiments of the present application can be implemented as follows:
In a first aspect, an embodiment of the present application provides a decoding method, including:
parsing a code stream to determine reconstructed feature data;
using an intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result.
In a second aspect, an embodiment of the present application provides an encoding method, including:
using an intelligent task network to perform feature extraction on input image data to obtain initial feature data;
using an encoding network to encode the initial feature data, and writing the obtained encoded bits into a code stream.
In a third aspect, an embodiment of the present application provides a code stream generated by bit-encoding information to be encoded; the information to be encoded includes at least initial feature data, which is obtained by performing feature extraction on input image data through an intelligent task network.
In a fourth aspect, an embodiment of the present application provides an encoder, including a first feature extraction unit and an encoding unit; wherein,
the first feature extraction unit is configured to use an intelligent task network to perform feature extraction on input image data to obtain initial feature data;
the encoding unit is configured to use an encoding network to encode the initial feature data and write the obtained encoded bits into a code stream.
In a fifth aspect, an embodiment of the present application provides an encoder, including a first memory and a first processor; wherein,
the first memory is used to store a computer program that can run on the first processor;
the first processor is used to execute the method of the second aspect when running the computer program.
In a sixth aspect, an embodiment of the present application provides a decoder, including a parsing unit and a feature analysis unit; wherein,
the parsing unit is configured to parse a code stream to determine reconstructed feature data;
the feature analysis unit is configured to use an intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result.
In a seventh aspect, an embodiment of the present application provides a decoder, including a second memory and a second processor; wherein,
the second memory is used to store a computer program that can run on the second processor;
the second processor is used to execute the method of the first aspect when running the computer program.
In an eighth aspect, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed, implements the method of the first aspect or the method of the second aspect.
In a ninth aspect, an embodiment of the present application provides an intelligent analysis system including at least the encoder of the fourth or fifth aspect and the decoder of the sixth or seventh aspect.
Embodiments of the present application provide an encoding/decoding method, a code stream, an encoder, a decoder, a storage medium and a system. On the encoder side, an intelligent task network is used to perform feature extraction on input image data to obtain initial feature data, and an encoding network is used to encode the initial feature data and write the obtained encoded bits into a code stream. On the decoder side, the code stream is parsed to determine reconstructed feature data, and the intelligent task network is used to perform feature analysis on the reconstructed feature data to determine a target result. In this way, with the feature extraction of the intelligent task network as the input of the encoding network, not only can the image information required by the intelligent task network be learned better, but the image restoration and feature re-extraction processes of the related art can also be saved, so that the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, which greatly reduces the complexity of the intelligent task network, thereby improving its accuracy and speed.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the overall framework of an encoding/decoding system;
FIG. 2 is a schematic diagram of the overall framework of an intelligent task network;
FIG. 3 is a schematic diagram of the overall framework of an encoding/decoding system cascaded with an intelligent task network;
FIG. 4A is a detailed framework schematic diagram of an encoder provided by an embodiment of the present application;
FIG. 4B is a detailed framework schematic diagram of a decoder provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of an encoding method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of a decoding method provided by an embodiment of the present application;
FIG. 7 is a schematic block flow diagram of an intelligent fusion network model provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an end-to-end encoding/decoding network provided by an embodiment of the present application;
FIG. 9A is a schematic structural diagram of an attention mechanism module provided by an embodiment of the present application;
FIG. 9B is a schematic structural diagram of a residual block provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an intelligent task network provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an intelligent fusion network model provided by an embodiment of the present application;
FIG. 12A is a schematic structural diagram of a Lee network model provided by an embodiment of the present application;
FIG. 12B is a schematic structural diagram of a Duan network model provided by an embodiment of the present application;
FIG. 13A is a schematic structural diagram of a yolo_v3 network model provided by an embodiment of the present application;
FIG. 13B is a schematic structural diagram of another yolo_v3 network model provided by an embodiment of the present application;
FIG. 13C is a schematic structural diagram of a ResNet-FPN network model provided by an embodiment of the present application;
FIG. 13D is a schematic structural diagram of a Mask-RCNN network model provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of the composition and structure of an encoder provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of a specific hardware structure of an encoder provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of the composition and structure of a decoder provided by an embodiment of the present application;
FIG. 17 is a schematic diagram of a specific hardware structure of a decoder provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of the composition and structure of an intelligent analysis system provided by an embodiment of the present application.
Detailed Description
In order to understand the features and technical content of the embodiments of the present application in more detail, the implementation of the embodiments is described below with reference to the accompanying drawings, which are provided for reference only and are not intended to limit the embodiments.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. It should also be pointed out that the terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not represent a specific ordering of the objects; it can be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments described here can be implemented in an order other than that illustrated or described here.
At present, image and video encoding/decoding can be performed by traditional methods or by neural-network-based intelligent methods. Traditional methods remove redundancy from the input data; for example, an image or video codec exploits the spatial correlation within each frame and the temporal correlation between frames to remove redundancy. Intelligent methods use neural networks to process the image information and extract feature data.
In a specific embodiment, for images, image-based encoding/decoding can be divided into traditional methods and neural-network-based intelligent methods. Traditional methods exploit the spatial correlation of pixels to remove redundancy from the image, and obtain and transmit a code stream through transform, quantization and entropy coding. Intelligent methods use neural networks for encoding and decoding; current neural-network-based image codecs have proposed many efficient network structures that can be used to extract image feature information. Here, Convolutional Neural Networks (CNN) were the earliest network structure used for image encoding/decoding; on the basis of CNN, many improved network structures and probability estimation models have been derived. Taking network structures as an example, they include Generative Adversarial Networks (GAN), Recurrent Neural Networks (RNN) and other structures, all of which can improve the performance of neural-network-based end-to-end image compression. Among them, GAN-based image codecs have achieved notable improvements in subjective image quality.
In another specific embodiment, for video, video-based encoding/decoding can likewise be divided into traditional methods and neural-network-based intelligent methods. Traditional methods perform video encoding/decoding through intra or inter prediction, transform, quantization, entropy coding and in-loop filtering. Intelligent methods currently focus on three areas: hybrid neural network coding (embedding neural networks into the video framework to replace traditional coding modules), neural network rate-distortion optimized coding, and end-to-end video coding. Here, hybrid neural network coding is most commonly applied to the inter prediction, in-loop filtering and entropy coding modules; neural network rate-distortion optimized coding exploits the highly nonlinear nature of neural networks to train them into efficient discriminators and classifiers, for example in the mode-decision stage of video coding; end-to-end video coding currently either uses CNNs to replace all modules of the traditional coding method, or enlarges the input dimension of the neural network to compress all frames end-to-end.
In the related art, referring to FIG. 1, a schematic diagram of the overall framework of an encoding/decoding system is shown. As shown in FIG. 1, for the input data to be encoded, the encoding method consists of E1 and E2; E1 refers to the process of extracting feature data and encoding, after which feature data is obtained; E2 refers to the process of processing the feature data to obtain a code stream, i.e., a code stream is obtained after E2. Correspondingly, the decoding method consists of D1 and D2; D2 refers to the process of receiving the code stream and parsing it into feature data, i.e., reconstructed feature data is obtained after D2; D1 refers to the process of transforming the reconstructed feature data into decoded data by a traditional method or a neural network, i.e., decoded data (specifically, a "decoded image") is obtained after D1.
In addition, in the embodiments of the present application, the intelligent task network generally analyzes images or videos to accomplish task objectives such as object detection, object tracking or action recognition. The input of the intelligent task network is the decoded data obtained by the encoding and decoding methods, and its processing flow generally consists of A1 and A2; A1 refers to the process of extracting features from the input decoded data for the objective of the intelligent task network to obtain feature data, and A2 refers to the process of processing the feature data to obtain the result. Specifically, referring to FIG. 2, a schematic diagram of the overall framework of an intelligent task network is shown. As shown in FIG. 2, for the input decoded data, feature data is obtained after A1, and the target result is obtained after the feature data passes through A2.
It can be understood that the input data of the intelligent task network is generally the decoded data obtained by the encoding and decoding methods, which is used directly as its input. Referring to FIG. 3, a schematic diagram of the overall framework of an encoding/decoding system cascaded with an intelligent task network is shown. As shown in FIG. 3, the encoding method and the decoding method form the encoding/decoding system; after the decoded data is obtained through them, it is directly input into A1 to obtain feature data; A2 then processes the feature data to obtain the target result output by the intelligent task network.
In this way, when the decoded data obtained by the encoding and decoding methods is input directly into the intelligent task network, on the one hand the decoded data may contain a large amount of redundant information that the network does not need, whose transmission wastes bandwidth or reduces the efficiency of the intelligent task network; on the other hand, the correlation between the end-to-end encoding/decoding process and the intelligent task network is almost zero, so the encoding/decoding process cannot be optimized for the intelligent task network.
Based on this, an embodiment of the present application provides an encoding method applied to an encoder: using an intelligent task network to perform feature extraction on input image data to obtain initial feature data; and using an encoding network to encode the initial feature data and write the obtained encoded bits into a code stream.
An embodiment of the present application further provides a decoding method applied to a decoder: parsing a code stream to determine reconstructed feature data; and using an intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result.
In this way, with the feature extraction of the intelligent task network as the input of the encoding network, not only can the image information required by the intelligent task network be learned better, but the image restoration and feature re-extraction processes of the related art can also be saved, so that the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, which greatly reduces the complexity of the intelligent task network and improves its accuracy and speed.
The embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to FIG. 4A, a detailed framework schematic diagram of an encoder provided by an embodiment of the present application is shown. As shown in FIG. 4A, the encoder 10 includes a transform and quantization unit 101, an intra estimation unit 102, an intra prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded picture buffer unit 110, etc.; the filtering unit 108 can implement DBF/SAO/ALF filtering, and the encoding unit 109 can implement header information coding and Context-based Adaptive Binary Arithmetic Coding (CABAC). For the input original video signal, a video coding block can be obtained by partitioning into Coding Tree Units (CTU); the residual pixel information obtained after intra or inter prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate. The intra estimation unit 102 and the intra prediction unit 103 are used to perform intra prediction on the video coding block; specifically, they determine the intra prediction mode to be used to encode the block. The motion compensation unit 104 and the motion estimation unit 105 perform inter prediction of the received video coding block relative to one or more blocks in one or more reference frames to provide temporal prediction information; the motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors that estimate the motion of the block, and the motion compensation unit 104 then performs motion compensation based on the motion vectors determined by the motion estimation unit 105. After determining the intra prediction mode, the intra prediction unit 103 also provides the selected intra prediction data to the encoding unit 109, and the motion estimation unit 105 also sends the calculated motion vector data to the encoding unit 109. In addition, the inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block, reconstructing a residual block in the pixel domain; blocking artifacts of the reconstructed residual block are removed through the filter control analysis unit 107 and the filtering unit 108, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to generate a reconstructed video coding block. The encoding unit 109 is used to encode various coding parameters and the quantized transform coefficients; in a CABAC-based coding algorithm, the context may be based on neighboring coding blocks and may be used to encode information indicating the determined intra prediction mode, outputting the code stream of the video signal. The decoded picture buffer unit 110 is used to store the reconstructed video coding blocks for prediction reference; as video coding proceeds, new reconstructed video coding blocks are continuously generated and stored in the decoded picture buffer unit 110.
Referring to FIG. 4B, a detailed framework schematic diagram of a decoder provided by an embodiment of the present application is shown. As shown in FIG. 4B, the decoder 20 includes a decoding unit 201, an inverse transform and inverse quantization unit 202, an intra prediction unit 203, a motion compensation unit 204, a filtering unit 205, a decoded picture buffer unit 206, etc.; the decoding unit 201 can implement header information decoding and CABAC decoding, and the filtering unit 205 can implement DBF/SAO/ALF filtering. After the input video signal undergoes the encoding process of FIG. 4A, the code stream of the video signal is output; the code stream is input into the video decoding system 20 and first passes through the decoding unit 201 to obtain decoded transform coefficients, which are processed by the inverse transform and inverse quantization unit 202 to generate a residual block in the pixel domain. The intra prediction unit 203 may be used to generate prediction data of the current video decoding block based on the determined intra prediction mode and data from previously decoded blocks of the current frame or picture. The motion compensation unit 204 determines the prediction information for the video decoding block by parsing motion vectors and other associated syntax elements, and uses this prediction information to generate the predictive block of the video decoding block being decoded. A decoded video block is formed by summing the residual block from the inverse transform and inverse quantization unit 202 with the corresponding predictive block generated by the intra prediction unit 203 or the motion compensation unit 204. The decoded video signal passes through the filtering unit 205 to remove blocking artifacts, improving video quality; the decoded video blocks are then stored in the decoded picture buffer unit 206, which stores reference pictures for subsequent intra prediction or motion compensation and is also used for the output of the video signal, i.e., the restored original video signal is obtained.
In an embodiment of the present application, referring to FIG. 5, a schematic flowchart of an encoding method provided by an embodiment of the present application is shown. As shown in FIG. 5, the method may include:
S501: using an intelligent task network to perform feature extraction on input image data to obtain initial feature data.
S502: using an encoding network to encode the initial feature data and write the obtained encoded bits into a code stream.
It should be noted that this encoding method is applied to an encoder. In the embodiments of the present application, the encoder may include an intelligent task network and an encoding network; the intelligent task network performs feature extraction on the input image data, and the encoding network encodes the resulting initial feature data. Using the feature extraction of the intelligent task network as the input of the encoding network can help the encoding network better learn the image information required by the intelligent task network.
It should also be noted that in the encoder, after the intelligent task network extracts the initial feature data, it does not execute the subsequent processing flow of the intelligent task network; instead, the data is directly encoded at the encoding node of the same dimension. Subsequently, in the decoder, after the reconstructed feature data is determined by the decoding network, the subsequent processing flow of the intelligent task network can be executed on the reconstructed feature data. In this way, not only can the image information required by the intelligent task network be learned better, but the image restoration and feature re-extraction processes of the related art can also be saved, so that the decoding network can perform the processing of the intelligent task network without restoring the data to the image dimension, greatly reducing the complexity of the intelligent task network.
In some embodiments, the intelligent task network may include at least a feature extraction sub-network; using the intelligent task network to perform feature extraction on the input image data to determine initial feature data may include: using the feature extraction sub-network to perform feature extraction on the input image data to obtain initial feature data at a first feature node.
Further, in some embodiments, the feature extraction sub-network may include N feature extraction layers, N being an integer greater than or equal to 1. Correspondingly, using the feature extraction sub-network to perform feature extraction on the input image data to obtain the initial feature data at the first feature node may include:
when N is equal to 1, using the feature extraction layer to perform feature extraction on the input image data to obtain the initial feature data at the first feature node;
when N is greater than 1, using the N feature extraction layers to perform feature extraction on the input image data to obtain the initial feature data at the first feature node.
It should be noted that the first feature node may be the feature node corresponding to a different feature extraction layer; which layer it corresponds to is determined according to the actual situation. For example, when it is determined in the intelligent task network that encoding and decoding are needed after a certain feature extraction layer, the feature node corresponding to that layer is the first feature node, the layers up to that point form the feature extraction sub-network, and the initial feature data extracted after that layer is input into the encoding network.
That is, the initial feature data at the first feature node may be extracted by one feature extraction layer, or by two or more feature extraction layers, which is not specifically limited in the embodiments of the present application.
Exemplarily, if encoding and decoding are needed after the first feature extraction layer, the first feature node is the feature node corresponding to the first feature extraction layer, the feature data extracted at this point is the initial feature data to be input into the encoding network, and the feature extraction sub-network consists of only the first feature extraction layer; if encoding and decoding are needed after the second feature extraction layer, the first feature node is the feature node corresponding to the second feature extraction layer, the feature data extracted at this point is the initial feature data to be input into the encoding network, and the feature extraction sub-network consists of the first and second feature extraction layers.
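The choice of a first feature node among the N feature extraction layers can be sketched numerically. The following is a minimal illustrative sketch, not the patent's actual network: simple average pooling with channel replication stands in for a real feature extraction layer, and the function names, channel counts and shapes are assumptions made for illustration only.

```python
import numpy as np

def feature_extraction_layer(x, out_channels):
    """One illustrative 'feature extraction layer': 2x2 average pooling
    over the spatial dimensions, then channel expansion by replication.
    x has shape (C, H, W)."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    reps = int(np.ceil(out_channels / c))
    return np.tile(pooled, (reps, 1, 1))[:out_channels]

def extract_initial_feature_data(image, n_layers, channels_per_layer):
    """Run the image through the first N layers; the output is the
    'initial feature data at the first feature node'."""
    feat = image
    for i in range(n_layers):
        feat = feature_extraction_layer(feat, channels_per_layer[i])
    return feat

image = np.random.rand(3, 64, 64)                        # input image data (C, H, W)
node1 = extract_initial_feature_data(image, 1, [16])     # N = 1: node after layer 1
node2 = extract_initial_feature_data(image, 2, [16, 32]) # N = 2: node after layer 2
print(node1.shape, node2.shape)                          # (16, 32, 32) (32, 16, 16)
```

Depending on where the first feature node is placed, the data handed to the encoding network has a different dimension, which is exactly why the dimension-matching step below is needed.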
Further, after the initial feature data at the first feature node is obtained, which encoding node of the encoding network it corresponds to as feature data to be encoded depends on the dimensions of the two. Therefore, in some embodiments, the method may further include:
when the first encoding node in the encoding network matches the data dimension of the first feature node, determining the initial feature data at the first feature node as the feature data to be encoded at the first encoding node; or,
when the first encoding node in the encoding network does not match the data dimension of the first feature node, using an adaptation network to perform data dimension conversion on the initial feature data at the first feature node to obtain the feature data to be encoded at the first encoding node.
It should be noted that in the embodiments of the present application, when parameters such as the number of feature-space channels and the resolution at the first feature node are completely consistent with those at the first encoding node, it can be determined that the data dimensions of the first feature node in the intelligent task network and the first encoding node in the encoding network match. That is, after the initial feature data at the first feature node is extracted by the intelligent task network, it can be encoded directly at the first encoding node of the same data dimension in the encoding network: the initial feature data is input to the first encoding node, encoded by the encoding network, and the obtained encoded bits are written into the code stream.
It should also be noted that in the embodiments of the present application, when parameters such as the number of feature-space channels and the resolution at the first feature node are not completely consistent with those of the first encoding node in the encoding network, the data dimensions of the two may not match. In this case, the adaptation network can be used to perform data dimension conversion on the initial feature data at the first feature node to obtain the feature data to be encoded at the first encoding node. Then, using the encoding network to encode the initial feature data and write the obtained encoded bits into the code stream may include: inputting the feature data to be encoded into the first encoding node of the encoding network, encoding it with the encoding network, and writing the obtained encoded bits into the code stream.
It should also be noted that in the embodiments of the present application, the adaptation network here may include one or more layers of network structure, and that network structure may use, but is not limited to, upsampling, downsampling, selecting or repeating some channels, etc. That is, in the cascade of the intelligent task network and the encoding network, there is also the problem that the spatial resolution or number of channels of the feature map input to the analysis network and of the reconstructed feature map do not match; on this basis, a single-layer or multi-layer network structure can be added as an adapter to perform dimension conversion of the features, thereby adapting the cascade of the two networks. The network structure of the adapter may use, but is not limited to, upsampling, downsampling, selecting or repeating some channels, etc., which is not limited in any way here.
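The dimension-matching decision and the adapter operations named above (upsampling, downsampling, selecting or repeating channels) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's adaptation network: nearest-neighbour resampling and channel repetition/selection stand in for the one-or-more-layer network structure, and the function name `adapt` and all shapes are illustrative.

```python
import numpy as np

def adapt(feature, target_shape):
    """Illustrative adaptation: convert feature data of shape (C, H, W)
    to target_shape by repeating/selecting channels and nearest-neighbour
    spatial resampling. When the dimensions already match, the data is
    passed through unchanged (the 'match' branch in the text)."""
    c, h, w = feature.shape
    tc, th, tw = target_shape
    if (c, h, w) == (tc, th, tw):
        return feature                          # data dimensions match
    # channel adaptation: repeat, then select the first tc channels
    reps = int(np.ceil(tc / c))
    out = np.tile(feature, (reps, 1, 1))[:tc]
    # spatial adaptation: nearest-neighbour up/downsampling
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return out[:, rows][:, :, cols]

feat = np.random.rand(16, 32, 32)               # data at the first feature node
encoded_input = adapt(feat, (32, 16, 16))       # match the first encoding node
print(encoded_input.shape)                      # (32, 16, 16)
```

The same sketch applies symmetrically on the decoder side, where candidate reconstructed feature data at the first reconstruction node may need conversion to the dimension of the first feature node.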
It can be understood that the embodiments of the present application mainly provide an intelligent fusion network model in which an end-to-end encoding/decoding network is cascaded with an intelligent task network. Here, the end-to-end encoding/decoding network includes an encoding network and a decoding network; that is, the intelligent fusion network model may include the encoding network, the decoding network and the intelligent task network. One part of the intelligent task network and the encoding network are used in the encoder, and the other part of the intelligent task network and the decoding network are used in the decoder.
In the embodiments of the present application, the intelligent fusion network model may be trained in the encoder, in the decoder, or even in both, which is not limited here.
In a possible implementation, for the training of the intelligent fusion network model, the method may further include:
determining at least one training sample;
using the at least one training sample to train a preset network model; wherein the preset network model includes an initial encoding network, an initial decoding network and an initial intelligent task network, and the initial encoding network and initial decoding network are connected to the initial intelligent task network through nodes;
when the loss function corresponding to the preset network model converges to a preset threshold, determining the trained model as the intelligent fusion network model; wherein the intelligent fusion network model includes the encoding network, the decoding network and the intelligent task network.
In a specific embodiment, the loss function corresponding to the preset network model can be divided into two parts: the loss function of the encoding/decoding network and the loss function of the intelligent task network. Specifically, in some embodiments, the method may further include:
determining a first rate-distortion trade-off parameter of the intelligent task network, a loss value of the intelligent task network, a second rate-distortion trade-off parameter of the encoding/decoding network, a distortion value of the encoding/decoding network and a code-stream bit rate;
determining the loss function corresponding to the preset network model according to the first rate-distortion trade-off parameter, the second rate-distortion trade-off parameter, the loss value of the intelligent task network, the distortion value of the encoding/decoding network and the code-stream bit rate.
That is, the retraining method of the intelligent fusion network model of the embodiments of the present application may perform joint training on the fusion network formed by connecting the initial intelligent task network and the initial encoding/decoding network through nodes. Exemplarily, the loss function may be as follows:
Loss = λ1·loss_task + λ2·d(x, x̂) + R    (1)
where R denotes the code-stream bit rate of the encoding/decoding network; λ1 and λ2 denote rate-distortion trade-off parameters, and different λ1, λ2 correspond to different models, i.e., different total bit rates; loss_task denotes the loss value of the intelligent task network; and d(x, x̂) denotes the distortion value between the input image and the decoded image. Here, x and x̂ respectively denote the data at the encoding node and the reconstruction node used by the encoding/decoding network, rather than image data; in addition, the distortion value here may be measured using the Mean Squared Error (MSE).
Equation (1) can be viewed as two parts: λ1·loss_task represents the loss function of the intelligent task network, and λ2·d(x, x̂) + R represents the loss function of the encoding/decoding network; that is, the loss function of the intelligent fusion network model can be obtained jointly from the loss function of the intelligent task network and that of the encoding/decoding network. The values of λ1 and λ2 are set according to the actual situation; exemplarily, λ1 is 0.3 and λ2 is 0.7, but no limitation is imposed.
It should be noted that the retraining method of the intelligent fusion network model of the embodiments of the present application may also be performed in stages. For example, λ2 may first be set to zero and λ1 to an arbitrary value to train the intelligent task network; then λ1 may be set to zero and λ2 to an arbitrary value to train the encoding/decoding network (including the encoding network and the decoding network); and finally joint training may be performed. The training method is not limited in any way here; various other training methods, or even combinations of different training methods, may also be used.
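The joint loss and the staged-training settings described above can be computed directly. The sketch below assumes MSE as the distortion measure (as the text suggests); the function name `fusion_loss` and all sample values are illustrative, not from the patent.

```python
import numpy as np

def fusion_loss(loss_task, x, x_hat, rate, lam1, lam2):
    """Loss = λ1·loss_task + λ2·d(x, x̂) + R, where d is the MSE between
    the data at the encoding node (x) and at the reconstruction node (x̂),
    and R is the code-stream bit rate."""
    d = np.mean((x - x_hat) ** 2)           # MSE distortion
    return lam1 * loss_task + lam2 * d + rate

x = np.array([1.0, 2.0, 3.0, 4.0])          # encoding-node data
x_hat = np.array([1.0, 2.0, 3.0, 6.0])      # reconstruction-node data (MSE = 1.0)
# Joint training (exemplary λ1 = 0.3, λ2 = 0.7 from the text):
print(fusion_loss(2.0, x, x_hat, rate=0.5, lam1=0.3, lam2=0.7))  # ≈ 0.6 + 0.7 + 0.5 = 1.8
# Staged training: λ2 = 0 leaves only the task term (plus the rate),
# λ1 = 0 leaves only the codec term.
print(fusion_loss(2.0, x, x_hat, rate=0.5, lam1=0.3, lam2=0.0))  # ≈ 0.6 + 0.5 = 1.1
```

Setting one trade-off parameter to zero, as in the staged scheme, simply removes that branch's contribution to the total loss, which is why the two stages can be trained independently before the final joint pass.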
It should also be noted that the intelligent task network may include a feature extraction sub-network and a feature analysis sub-network. The feature extraction sub-network can be used to perform feature extraction on the input image data to determine the initial feature data, which is then encoded by the encoding network. The feature analysis sub-network can be used to perform feature analysis on the input feature data to determine the target result; here this may mean accomplishing task objectives such as object detection, object tracking or action recognition.
In this way, after the intelligent task network is trained, in some embodiments the method may further include:
using the feature extraction sub-network to perform feature extraction on the input image data to obtain initial feature data;
using the feature analysis sub-network to perform feature analysis on the initial feature data to determine the target result.
That is, an intelligent task network that has completed training can directly perform feature extraction and feature analysis on the input image data, and can determine the target result without passing through the encoding network and the decoding network. Exemplarily, for the processing of local images, there is no need to transmit the data through the end-to-end encoding/decoding network; in this case, the trained intelligent task network (including the feature extraction sub-network and the feature analysis sub-network) can likewise be applied to image analysis to obtain the target result of the intelligent task.
This embodiment provides an encoding method applied to an encoder: using the intelligent task network to perform feature extraction on the input image data to obtain initial feature data; and using the encoding network to encode the initial feature data and write the obtained encoded bits into a code stream. In this way, with the feature extraction of the intelligent task network as the input of the encoding network, not only can the image information required by the intelligent task network be learned better, but the image restoration and feature re-extraction processes of the related art can also be saved, which greatly reduces the complexity of the intelligent task network, thereby improving its accuracy and speed.
In another embodiment of the present application, a code stream is provided, which is generated by bit-encoding information to be encoded.
In the embodiments of the present application, the information to be encoded includes at least initial feature data obtained by performing feature extraction on input image data through the intelligent task network. In this way, after the encoder generates the code stream, it can be transmitted to the decoder, so that the decoder can subsequently obtain the reconstructed feature data by parsing the code stream.
In yet another embodiment of the present application, referring to FIG. 6, a schematic flowchart of a decoding method provided by an embodiment of the present application is shown. As shown in FIG. 6, the method may include:
S601: parsing the code stream to determine reconstructed feature data.
S602: using the intelligent task network to perform feature analysis on the reconstructed feature data to determine a target result.
It should be noted that this decoding method is applied to a decoder. In the embodiments of the present application, the decoder may include a decoding network. Thus, for S601, parsing the code stream to determine the reconstructed feature data may further include: using the decoding network to parse the code stream to determine the reconstructed feature data.
It should also be noted that in the embodiments of the present application, the decoder not only has a decoding function but may also have an intelligent analysis function; that is, in addition to the decoding network, the decoder in the embodiments of the present application also includes the intelligent task network. In this way, in the decoder, there is no need to reconstruct the decoded image; instead, after reconstructing to the feature space, i.e., after decoding to obtain the reconstructed feature data, the intelligent task network can perform feature analysis on the reconstructed feature data to determine the target result, which here may mean accomplishing task objectives such as object detection, object tracking or action recognition.
In some embodiments, parsing the code stream to determine the reconstructed feature data may include:
parsing the code stream, and when the first feature node in the intelligent task network matches the data dimension of a first reconstruction node, determining the candidate reconstructed feature data at the first reconstruction node as the reconstructed feature data; or,
parsing the code stream, and when the first feature node in the intelligent task network does not match the data dimension of the first reconstruction node, using the adaptation network to perform data dimension conversion on the candidate reconstructed feature data at the first reconstruction node to obtain the reconstructed feature data.
It should be noted that not all of the reconstructed feature data obtained by decoding meets the requirements; this depends on the data dimensions of the feature nodes in the intelligent task network. Specifically, when the first feature node in the intelligent task network matches the data dimension of the first reconstruction node, the candidate reconstructed feature data at the first reconstruction node can be determined as the reconstructed feature data at the first feature node. That is, in the embodiments of the present application, when parameters such as the number of feature-space channels and the resolution at the first feature node are completely consistent with those at the first reconstruction node, it can be determined that the data dimensions of the first feature node in the intelligent task network and the first reconstruction node in the decoding network match.
It should also be noted that when parameters such as the number of feature-space channels and the resolution at the first feature node in the intelligent task network are not completely consistent with those at the first reconstruction node, the data dimensions of the two may not match. In this case, the adaptation network needs to be used to perform data dimension conversion on the candidate reconstructed feature data at the first reconstruction node to obtain the reconstructed feature data.
It should also be noted that the adaptation network here may include one or more layers of network structure, and that network structure may use, but is not limited to, upsampling, downsampling, selecting or repeating some channels, etc. That is, in the cascade of the intelligent task network and the decoding network, there is also the problem that the spatial resolution or number of channels of the feature map input to the analysis network and of the reconstructed feature map do not match; on this basis, a single-layer or multi-layer network structure can be added as an adapter to perform dimension conversion of the features, thereby adapting the cascade of the two networks. The network structure of the adapter may use, but is not limited to, upsampling, downsampling, selecting or repeating some channels, etc., which is not limited in any way here.
Further, after the reconstructed feature data is determined, in some embodiments, for S602, using the intelligent task network to perform feature analysis on the reconstructed feature data to determine the target result may include:
inputting the reconstructed feature data into the first feature node in the intelligent task network, and using the intelligent task network to perform feature analysis on the reconstructed feature data to obtain the target result.
It should be noted that the first feature node may be the feature node corresponding to a different feature extraction layer; which layer it corresponds to is determined according to the actual situation. For example, when it is determined in the intelligent task network that encoding and decoding are needed after a certain feature extraction layer, the feature node corresponding to that layer is the first feature node, and the initial feature data extracted after that layer is processed by the encoding network and the decoding network, so that the reconstructed feature data at the first feature node can be obtained and analyzed to obtain the target result.
可以理解的是,对于智能任务网络而言,智能任务网络可以包括特征提取子网络和特征分析子网络;相应地,在一些实施例中,在一种具体的实施例中,所述利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果,可以包括:
当第一特征节点为经过特征提取子网络后得到的特征节点时,将重建特征数据输入到第一特征节点,并利用特征分析子网络对重建特征数据进行特征分析,得到目标结果。
需要说明的是,特征提取子网络可以包括若干个特征提取层;而这里的第一特征节点可以是经过一特征提取层得到的,也可以是经过两个或更多个特征提取层得到的,本申请实施例不作具体限定。
示例性地,如果在智能任务网络中确定在第四个特征提取层之后需要进行编码和解码处理,即第一特征节点为第四个特征提取层对应的特征节点时,这时候特征提取子网络包括四个特征提取层,而且第四个特征提取层之后所得到的重建特征数据将输入到特征分析子网络进行特征分析,可以得到目标结果。如果在智能任务网络中确定在第二个特征提取层之后需要进行编码和解码处理,即第一特征节点为第二个特征提取层对应的特征节点,那么这时候特征提取子网络包括两个特征提取层,而且第二个特征提取层之后所得到的重建特征数据将输入到特征分析子网络进行特征分析,即可得到目标结果。
还需要说明的是,对于特征分析子网络而言,在一些实施例中,特征分析子网络可以包括区域生成网络(Region Proposal Network,RPN)和感兴趣区域头部网络(Region Of Interest_Heads,ROI_Heads)。其中,区域生成网络的输出端与感兴趣区域头部网络的输入端连接,区域生成网络的输入端也与感兴趣区域头部网络连接,而感兴趣区域头部网络的输出端用于输出目标结果。
相应地,在一些实施例中,利用特征分析子网络对重建特征数据进行特征分析,得到目标结果,可以包括:
通过区域生成网络对重建特征数据进行处理,得到目标区域;
通过感兴趣区域头部网络对重建特征数据和目标区域进行智能分析,得到目标结果。
也就是说,对于重建特征数据,首先通过区域生成网络对其进行处理,得到目标区域;然后通过感兴趣区域头部网络对重建特征数据和目标区域进行智能分析,从而可以得到目标结果。
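为便于理解上述两步流程,可以给出如下极简示意。其中region_proposal与roi_heads仅为说明性的玩具实现(按响应强度取候选区域中心、对邻域做均值池化),真实的区域生成网络与感兴趣区域头部网络包含锚框回归、分类等复杂结构,此处并非其真实实现:

```python
import numpy as np

def region_proposal(feat, k=2):
    # 区域生成网络(RPN)的极简示意:按响应强度取前k个位置作为候选区域中心
    score = feat.mean(axis=0)                     # (H, W) 响应图
    idx = np.argsort(score, axis=None)[::-1][:k]  # 响应最大的k个位置
    return [np.unravel_index(i, score.shape) for i in idx]

def roi_heads(feat, regions, size=1):
    # 感兴趣区域头部网络(ROI_Heads)的极简示意:对每个候选区域邻域做均值池化
    out = []
    for (r, c) in regions:
        r0, r1 = max(r - size, 0), r + size + 1
        c0, c1 = max(c - size, 0), c + size + 1
        out.append(feat[:, r0:r1, c0:c1].mean())
    return out
```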
还可以理解的是,本申请实施例主要是提供了一种端到端的编解码网络与智能任务网络级联的智能融合网络模型,其目标是经过该智能融合网络模型的处理和重训练可以使得智能任务网络达到最优性能。其中,端到端的编解码网络包括编码网络和解码网络;也就是说,智能融合网络模型可以包括编码网络、解码网络和智能任务网络。在这里,智能任务网络的一部分和编码网络是在编码器中使用,智能任务网络的另一部分和解码网络是在解码器中使用。
进一步地,对于智能融合网络模型的训练,在一些实施例中,该方法还可以包括:
确定至少一个训练样本;
利用至少一个训练样本对预设网络模型进行训练;其中,预设网络模型包括初始编码网络、初始解码网络和初始智能任务网络,且初始编码网络和初始解码网络与初始智能任务网络通过节点连接;
当预设网络模型对应的损失函数收敛到预设阈值时,将训练后得到的模型确定为智能融合网络模型;其中,智能融合网络模型包括编码网络、解码网络和智能任务网络。
在一种具体的实施例中,对于预设网络模型对应的损失函数来说,可以分为两部分:编解码网络的损失函数和智能任务网络的损失函数。具体地,在一些实施例中,该方法还可以包括:
确定智能任务网络的第一率失真权衡参数、智能任务网络的损失值、编解码网络的第二率失真权衡参数和编解码网络的失真值和码流比特率;
根据第一率失真权衡参数、第二率失真权衡参数以及智能任务网络的损失值、编解码网络的失真值和码流比特率,确定出预设网络模型对应的损失函数。
也就是说,本申请实施例的智能融合网络模型的重训练方法,可以是将初始智能任务网络和初始编解码网络通过节点连接起来形成的融合网络进行联合训练。示例性地,损失函数可以如上述的式(1)所示。
对于式(1)而言,其可以看作两部分:λ 1·loss task表示智能任务网络的损失函数,λ 2·(D+R)表示编解码网络的损失函数,其中,D表示编解码网络的失真值,R表示码流比特率;也就是说,智能融合网络模型的损失函数可以是由智能任务网络的损失函数和编解码网络的损失函数共同得到的。其中,λ 1、λ 2的取值根据实际情况进行具体设定,示例性地,λ 1的取值为0.3,λ 2的取值为0.7,但是并不作任何限定。
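按照上述描述,该损失函数可以写成如下示意形式。其中失真值D与码流比特率R的具体组合方式以式(1)为准,此处的加和形式以及默认的λ取值仅为示例性假设:

```python
def fusion_loss(loss_task, distortion, rate, lam1=0.3, lam2=0.7):
    # 式(1)的一种示意实现:
    #   lam1 * loss_task     —— 智能任务网络的损失部分
    #   lam2 * (D + R)       —— 编解码网络的损失部分(D为失真值,R为码流比特率)
    # 具体组合方式与λ取值以原文式(1)及实际训练配置为准
    return lam1 * loss_task + lam2 * (distortion + rate)
```

分步训练时,可先令lam2=0只训练智能任务网络,再令lam1=0只训练编解码网络,最后再进行联合训练。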
还需要说明的是,本申请实施例的智能融合网络模型的重训练方法,也可以是分步进行训练的。例如,可以先将λ 2的取值设置为零,λ 1的取值设置为任意值,此时针对智能任务网络进行训练;然后再将λ 1的取值设置为零,λ 2的取值为任意值,此时针对编解码网络(包括编码网络和解码网络)进行训练;最后再进行联合训练等,这里针对训练方法并不作任何限定,还可以是其他多种训练方法,甚至也可以是多种不同训练方法组合使用等等。
本实施例还提供了一种解码方法,应用于解码器。通过解析码流,确定重建特征数据;利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果。这样,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够节省相关技术中的图像恢复及提取恢复图像特征数据的流程,使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度,进而提升了智能任务网络的精度和速度。
在本申请的又一实施例中,针对输入图像数据,在编码器中,首先可以是由智能任务网络对其进行特征提取,然后将提取得到的初始特征数据输入到编码网络中;也就是说,以智能任务网络的特征提取部分作为编码网络的前置处理流程,即利用智能任务网络的特征提取作为编码网络的输入,可以有助于编解码网络更好地学习到智能任务网络所需的图像信息。然后,在利用编码网络对初始特征数据进行编码处理得到码流后,当码流传输到解码器中时,可以通过解析码流得到重建特征数据,将重建特征数据再输入到智能任务网络中进行特征分析,即以智能任务网络的分析处理部分作为解码网络的后续处理流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的分析处理,大大降低了智能任务网络的复杂度。
参见图7,其示出了本申请实施例提供的一种智能融合网络模型的流程框图示意图。如图7所示,针对输入的待编码数据(即输入图像数据),在经过A1的特征提取之后,可以得到特征数据;然后特征数据经过E2的编码处理之后可以得到码流;码流在输入D2进行解码处理之后,可以得到重建特征数据,而重建特征数据输入A2进行特征分析之后,可以得到目标结果。其中,A1和A2属于智能任务网络,E2和D2属于编解码网络;在这里,A1是指对于输入的待编码数据针对智能任务网络的目标进行特征提取并得到特征数据的流程,E2是指对特征数据进行处理并得到码流的流程,D2是指接收码流并将码流解析为重建特征数据的流程,A2则是指对重建特征数据进行处理并得到结果的流程。
根据图7可以明显看出,解码处理时无需重构到解码图像,而只需要重建到特征空间,之后将特征空间作为智能任务网络的输入而不使用解码图像。也就是说,将经过智能任务网络A1提取的特征数据,采用E2和D2进行编码和解码,然后将D2解码后的重建特征数据使用A2进行分析可以直接得到目标结果。
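图7所示的A1→E2→D2→A2流程可以抽象为如下函数级联的示意。各阶段以可调用对象表示,仅用于说明数据在各流程之间的传递关系,并非具体网络实现:

```python
def fusion_pipeline(x, A1, E2, D2, A2):
    # 图7流程示意:A1特征提取 -> E2编码 -> D2解码重建特征 -> A2特征分析
    feature = A1(x)            # 对待编码数据进行特征提取,得到特征数据
    bitstream = E2(feature)    # 对特征数据进行编码处理,得到码流
    rec_feature = D2(bitstream)  # 解析码流,得到重建特征数据(不重构解码图像)
    return A2(rec_feature)     # 对重建特征数据进行特征分析,得到目标结果
```

可以看出,D2的输出直接作为A2的输入,整个流程中不出现解码图像。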
在本申请实施例中,这里所使用的编解码网络和智能任务网络,其中,编解码网络可以分为编码网络和解码网络。具体而言,编码网络可以使用智能任务网络的特征提取子网络以及端到端的编码网络的部分节点,输入图像数据通过智能任务网络进行特征提取,到某个特征节点后不再继续执行智能任务网络,而是从数据维度相同的编码节点起,直接使用端到端图像压缩网络进行压缩。解码网络同样在解码执行到与编码节点维度对应相同的重建节点后,将重建节点处的重建特征数据输入到智能任务网络,并进行智能任务网络的后续处理流程。
另外,在本申请实施例中,这里所使用的编解码网络和智能任务网络,可以是各种常用的端到端的编解码网络和智能任务网络,其与具体的网络结构和类型无关。例如,编解码网络本身可以使用CNN、RNN以及GAN等多种神经网络结构的变种;智能任务网络针对所执行的任务目标和网络结构同样也不作任何限定,可以是目标检测、目标跟踪、行为识别、模式识别等涉及图像处理的任务目标。
示例性地,参见图8,其示出了本申请实施例提供的一种端到端的编解码网络的结构示意图。如图8所示,该编解码网络可以包括编码网络和解码网络。其中,“Conv”为卷积(Convolution)的缩写,“1×1”、“3×3”、“5×5”均表示卷积核的大小;“N”表示卷积核的数量(即该卷积层的输出通道数),“/2”表示2倍的下采样处理,使得输入尺寸减半;“×2”表示2倍的上采样处理,使得输入尺寸扩大一倍。由于编码网络中进行了2倍的下采样处理,在解码网络中对应需要进行2倍的上采样处理。
还需要说明的是,在图8的编解码网络中,还包括有注意力机制模块,图9A示出了本申请实施例提供的一种注意力机制模块的结构示意图。如图9A所示,其可以由残差块(Residual Block,RB)、1×1的卷积层(用1×1 Conv表示)、激活函数、乘法器和加法器组成。其中,激活函数可以用Sigmoid函数表示,其是一种常见的S型函数,也称为S型生长曲线。Sigmoid函数常被用作神经网络的激活函数,将变量映射到0与1之间。图9B示出了本申请实施例提供的一种残差块的结构示意图。如图9B所示,残差块可以由第一卷积层、第二卷积层和第三卷积层等三个卷积层组成,其中,第一卷积层为1×1的卷积核尺寸,N/2的输出通道数,可以用1×1 Conv,N/2表示;第二卷积层为3×3的卷积核尺寸,N/2的输出通道数,可以用3×3 Conv,N/2表示;第三卷积层为1×1的卷积核尺寸,N的输出通道数,可以用1×1 Conv,N表示。
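注意力机制模块的数据流可以用如下示意代码表达。此处假设主干分支与掩码分支分别以可调用对象trunk、mask表示,输出为输入与Sigmoid加权后的主干输出之和;这一组合方式是端到端图像压缩中常见的做法,具体结构以图9A为准:

```python
import numpy as np

def attention_module(x, trunk, mask):
    # 掩码分支输出经Sigmoid映射到(0,1),作为逐元素权重
    weight = 1.0 / (1.0 + np.exp(-mask(x)))
    # 主干分支输出按权重加权(乘法器),再与输入相加(加法器)
    return x + trunk(x) * weight
```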
参见图10,其示出了本申请实施例提供的一种智能任务网络的结构示意图。如图10所示,智能任务网络可以包括特征提取子网络和特征分析子网络。其中,F0表示输入,其为输入图像数据。特征提取子网络包括四个特征提取层:第一卷积模块对应第一个特征提取层,其对应的特征节点用F1表示;第二卷积模块对应第二个特征提取层,其对应的特征节点用F2表示;第三卷积模块对应第三个特征提取层,其对应的特征节点用F3表示;第四卷积模块对应第四个特征提取层,其对应的特征节点用F4表示。特征分析子网络可以包括区域生成网络(RPN)和感兴趣区域头部网络(ROI_Heads),最终的输出为目标结果。
基于图8示出的端到端的编解码网络和图10示出的智能任务网络为例,图11示出了本申请实施例提供的一种智能融合网络模型的结构示意图。如图11所示,这里示出了一种融合端到端的编解码网络与智能任务网络的联合网络,目标是经过该联合网络的处理和重训练使得智能任务网络达到最优性能。
在图11中,编码网络中设置有e0、e1、e2、e3、e4、e5、e6、e7、e8、e9等编码节点,解码网络中设置有d0、d1、d2、d3、d4、d5、d6、d7、d8、d9、d10等重建节点,智能任务网络中设置有F0、F1、F2、F3、F4等特征节点。其中,e0和d0为端到端的编解码的输入节点和输出节点,F0为智能任务网络的输入节点。对于输入尺寸来说,其为W×H×3;而经过第一卷积模块之后,由于尺寸减半,这时候为(W/2)×(H/2)×C 1;在经过第二卷积模块之后,由于尺寸继续减半,这时候为(W/4)×(H/4)×C 2;在经过第三卷积模块之后,由于尺寸继续减半,这时候为(W/8)×(H/8)×C 3;在经过第四卷积模块之后,由于尺寸继续减半,这时候为(W/16)×(H/16)×C 4;其中,C 1至C 4分别表示各卷积模块的输出通道数。
也就是说,本申请实施例的智能融合网络模型如图11所示,在相关技术中,原始处理流程中端到端的编解码网络的输入节点和输出节点分别为e0、d0,智能任务网络的输入节点为F0(即经过端到端的编解码网络的解码图像)。在本申请实施例中,可以探索F1、F2、F3、F4等特征节点的融合网络性能,以F1节点为例,首先将智能任务网络的F1节点处的初始特征数据作为编码网络中的e1节点处的输入并通过解码网络得到d2节点处的重建特征数据,可以将其作为F1节点处的特征数据,然后进行后续的智能任务网络处理流程。
还需要说明的是,图11所示的不同特征层数对应的d节点、e节点处提取的特征空间通道数和分辨率与智能任务网络对应的F节点需要完全一致,因此,本申请实施例需要对d节点、e节点处的数据与F节点处数据维度进行匹配。
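由于每个卷积模块都进行2倍下采样,各特征节点处的空间尺寸可以按如下示意计算(feature_map_size为示例函数名,仅用于说明尺寸递减规律):

```python
def feature_map_size(W, H, level):
    # 第level个卷积模块之后的空间尺寸:每经过一个模块,宽高各减半(参见图11)
    return W // (2 ** level), H // (2 ** level)
```

例如,对于512×256的输入,F1节点处的空间尺寸为256×128,F4节点处为32×16;相应的e节点、d节点必须提供与之一致的空间尺寸和通道数,否则需要经过适配网络转换。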
还需要说明的是,本申请实施例所述的编解码网络,可以是诸如传统的视频编解码、智能端到端图像编解码、传统视频编解码的部分智能化以及视频的端到端编解码等等,这里并不作任何限定。此外,本申请实施例提出的智能任务网络和端到端的编解码网络,同样可以使用其他常见的网络结构来代替。例如,在端到端的编解码领域,可以使用Lee网络和Duan网络来具体实施。其中,Lee网络采用迁移学习的方法提升网络重建图像的质量;Duan网络则使用高层的语义图增强低级的视觉特征,并且验证了这种方法可以有效地提升图像压缩的码率-精度-失真性能。在这里,Lee网络模型的组成结构如图12A所示,Duan网络模型的组成结构如图12B所示。
相应地,在智能任务网络领域,可以使用目标识别网络yolo_v3来具体实施,其网络模型的组成结构如图13A和图13B所示;此外,也可以使用目标检测网络ResNet-FPN以及实例分割网络Mask-RCNN,其中,目标检测网络模型的组成结构如图13C所示,实例分割网络模型的组成结构如图13D所示。
综上可知,将编解码网络的特征空间向量而非原始图像输入智能任务网络,能够节省图像恢复并提取恢复图像特征的流程,更好地提升智能任务网络的精度和速度。同时,使用智能任务网络的特征提取作为端到端图像编解码网络的输入,有助于编解码网络更好地学习到智能任务网络所需的图像信息。这样,本申请实施例以智能任务网络的特征提取部分作为端到端编码网络的前置处理流程,以智能任务网络的分析处理部分作为图像端到端解码网络的后续处理流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度。
本实施例对前述实施例的具体实现进行了详细阐述,根据前述实施例的技术方案,从中可以看出,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够降低智能任务网络的复杂度,进而提升智能任务网络的精度和速度。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图14,其示出了本申请实施例提供的一种编码器140的组成结构示意图。如图14所示,该编码器140可以包括:第一特征提取单元1401和编码单元1402;其中,
第一特征提取单元1401,配置为利用智能任务网络对输入图像数据进行特征提取,得到初始特征数据;
编码单元1402,配置为利用编码网络对初始特征数据进行编码处理,并将得到的编码比特写入码流。
在一些实施例中,智能任务网络至少包括特征提取子网络,相应地,第一特征提取单元1401,具体配置为利用特征提取子网络对输入图像数据进行特征提取,得到第一特征节点处的初始特征数据。
在一些实施例中,特征提取子网络包括N个特征提取层,N为大于或等于1的整数;相应地,第一特征提取单元1401,还配置为当N等于1时,利用特征提取层对输入图像数据进行特征提取,得到第一特征节点处的初始特征数据;以及当N大于1时,利用N个特征提取层对输入图像数据进行特征提取,得到第一特征节点处的初始特征数据。
在一些实施例中,参见图14,编码器140还可以包括第一维度转换单元1403;其中,
编码单元1402,还配置为当所述编码网络中的第一编码节点与所述第一特征节点的数据维度匹配时,将所述第一特征节点处的初始特征数据确定为所述第一编码节点处的待编码特征数据;或者,当所述编码网络中的第一编码节点与所述第一特征节点的数据维度不匹配时,通过第一维度转换单元1403,利用适配网络对所述第一特征节点处的初始特征数据进行数据维度转换,得到所述第一编码节点处的待编码特征数据。
在一些实施例中,编码单元1402,具体配置为将所述待编码特征数据输入到所述编码网络中的所述第一编码节点,利用所述编码网络对所述待编码特征数据进行编码处理,并将得到的编码比特写入码流。
在一些实施例中,适配网络包括一层或多层网络结构。
在一些实施例中,参见图14,编码器140还可以包括第一训练单元1404,配置为确定至少一个训练样本;以及利用至少一个训练样本对预设网络模型进行训练;其中,预设网络模型包括初始编码网络、初始解码网络和初始智能任务网络,且初始编码网络和初始解码网络与初始智能任务网络通过节点连接;以及当预设网络模型对应的损失函数收敛到预设阈值时,将训练后得到的模型确定为智能融合网络模型;其中,智能融合网络模型包括编码网络、解码网络和智能任务网络。
在一些实施例中,智能任务网络包括特征提取子网络和特征分析子网络。参见图14,编码器140还可以包括第一特征分析单元1405;
第一特征提取单元1401,还配置为利用所述特征提取子网络对输入图像数据进行特征提取,得到初始特征数据;
第一特征分析单元1405,配置为利用所述特征分析子网络对所述初始特征数据进行特征分析,确定目标结果。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于编码器140,该计算机存储介质存储有计算机程序,所述计算机程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码器140的组成以及计算机存储介质,参见图15,其示出了本申请实施例提供的一种编码器140的具体硬件结构示意图。如图15所示,可以包括:第一通信接口1501、第一存储器1502和第一处理器1503;各个组件通过第一总线系统1504耦合在一起。可理解,第一总线系统1504用于实现这些组件之间的连接通信。第一总线系统1504除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图15中将各种总线都标为第一总线系统1504。其中,
第一通信接口1501,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器1502,用于存储能够在第一处理器1503上运行的计算机程序;
第一处理器1503,用于在运行所述计算机程序时,执行:
利用智能任务网络对输入图像数据进行特征提取,得到初始特征数据;
利用编码网络对初始特征数据进行编码处理,并将得到的编码比特写入码流。
可以理解,本申请实施例中的第一存储器1502可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的***和方法的第一存储器1502旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器1503可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器1503中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器1503可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器1502,第一处理器1503读取第一存储器1502中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器1503还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
本实施例提供了一种编码器,在该编码器中,以智能任务网络的特征提取作为编码网络的输入,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够节省相关技术中的图像恢复及提取恢复图像特征数据的流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度,进而提升了智能任务网络的精度和速度。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图16,其示出了本申请实施例提供的一种解码器160的组成结构示意图。如图16所示,该解码器160可以包括:解析单元1601和第二特征分析单元1602;其中,
解析单元1601,配置为解析码流,确定重建特征数据;
第二特征分析单元1602,配置为利用智能任务网络对重建特征数据进行特征分析,确定目标结果。
在一些实施例中,参见图16,解码器160还可以包括第二维度转换单元1603;其中,
解析单元1601,还配置为解析码流,当所述智能任务网络中的第一特征节点与第一重建节点的数据维度匹配时,将所述第一重建节点处的候选重建特征数据确定为所述重建特征数据;或者,解析码流,当所述智能任务网络中的第一特征节点与第一重建节点的数据维度不匹配时,通过第二维度转换单元1603,利用适配网络对所述第一重建节点处的候选重建特征数据进行数据维度转换,得到所述重建特征数据。
在一些实施例中,第二特征分析单元1602,具体配置为将所述重建特征数据输入到所述智能任务网络中的所述第一特征节点,并利用所述智能任务网络对所述重建特征数据进行特征分析,得到所述目标结果。
在一些实施例中,适配网络包括一层或多层网络结构。
在一些实施例中,智能任务网络包括特征提取子网络和特征分析子网络;相应地,第二特征分析单元1602,具体配置为当所述第一特征节点为经过所述特征提取子网络后得到的特征节点时,将所述重建特征数据输入到所述第一特征节点,并利用所述特征分析子网络对所述重建特征数据进行特征分析,得到所述目标结果。
在一些实施例中,特征分析子网络包括区域生成网络和感兴趣区域头部网络;相应地,第二特征分析单元1602,具体配置为通过区域生成网络对重建特征数据进行处理,得到目标区域;以及通过感兴趣区域头部网络对重建特征数据和目标区域进行智能分析,得到目标结果。
在一些实施例中,解析单元1601,还配置为利用解码网络对码流进行解析,确定重建特征数据。
在一些实施例中,参见图16,解码器160还可以包括第二训练单元1604,配置为确定至少一个训练样本;以及利用至少一个训练样本对预设网络模型进行训练;其中,预设网络模型包括初始编码网络、初始解码网络和初始智能任务网络,且初始编码网络和初始解码网络与初始智能任务网络通过节点连接;以及当预设网络模型对应的损失函数收敛到预设阈值时,将训练后得到的模型确定为智能融合网络模型;其中,智能融合网络模型包括编码网络、解码网络和智能任务网络。
可以理解地,在本实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本实施例提供了一种计算机存储介质,应用于解码器160,该计算机存储介质存储有计算机程序,所述计算机程序被第二处理器执行时实现前述实施例中任一项所述的方法。
基于上述解码器160的组成以及计算机存储介质,参见图17,其示出了本申请实施例提供的一种解码器160的具体硬件结构示意图。如图17所示,可以包括:第二通信接口1701、第二存储器1702和第二处理器1703;各个组件通过第二总线系统1704耦合在一起。可理解,第二总线系统1704用于实现这些组件之间的连接通信。第二总线系统1704除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图17中将各种总线都标为第二总线系统1704。其中,
第二通信接口1701,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第二存储器1702,用于存储能够在第二处理器1703上运行的计算机程序;
第二处理器1703,用于在运行所述计算机程序时,执行:
解析码流,确定重建特征数据;
利用智能任务网络对重建特征数据进行特征分析,确定目标结果。
可选地,作为另一个实施例,第二处理器1703还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
可以理解,第二存储器1702与第一存储器1502的硬件功能类似,第二处理器1703与第一处理器1503的硬件功能类似;这里不再详述。
本实施例提供了一种解码器,该解码器可以包括解析单元和特征分析单元。这样,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够节省相关技术中的图像恢复及提取恢复图像特征数据的流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度,进而提升了智能任务网络的精度和速度。
在本申请的再一实施例中,参见图18,其示出了本申请实施例提供的一种智能分析系统的组成结构示意图。如图18所示,智能分析系统180可以包括编码器1801和解码器1802;其中,编码器1801可以为前述实施例中任一项所述的编码器,解码器1802可以为前述实施例中任一项所述的解码器。
在本申请实施例中,智能分析系统180中具有智能融合网络模型,且智能融合网络模型可以包括编码网络、解码网络和智能任务网络。其中,智能任务网络的一部分和编码网络是在编码器1801中使用,智能任务网络的另一部分和解码网络是在解码器1802中使用。这样,以智能任务网络的特征提取作为编码网络的输入,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够节省相关技术中的图像恢复及提取恢复图像特征数据的流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度,进而提升了智能任务网络的精度和速度。
需要说明的是,在本申请中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
工业实用性
本申请实施例中,在编码器侧,利用智能任务网络对输入图像数据进行特征提取,得到初始特征数据;利用编码网络对初始特征数据进行编码处理,并将得到的编码比特写入码流。在解码器侧,通过解析码流,确定重建特征数据;利用智能任务网络对重建特征数据进行特征分析,确定目标结果。这样,以智能任务网络的特征提取作为编码网络的输入,不仅能够更好地学习到智能任务网络所需的图像信息,而且还能够节省相关技术中的图像恢复及提取恢复图像特征数据的流程,从而使得解码网络无需还原到图像维度即可执行智能任务网络的处理,大大降低了智能任务网络的复杂度,进而提升了智能任务网络的精度和速度。

Claims (23)

  1. 一种解码方法,所述方法包括:
    解析码流,确定重建特征数据;
    利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果。
  2. 根据权利要求1所述的方法,其中,所述解析码流,确定重建特征数据,包括:
    解析码流,当所述智能任务网络中的第一特征节点与第一重建节点的数据维度匹配时,将所述第一重建节点处的候选重建特征数据确定为所述重建特征数据;或者,
    解析码流,当所述智能任务网络中的第一特征节点与第一重建节点的数据维度不匹配时,利用适配网络对所述第一重建节点处的候选重建特征数据进行数据维度转换,得到所述重建特征数据。
  3. 根据权利要求2所述的方法,其中,所述利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果,包括:
    将所述重建特征数据输入到所述智能任务网络中的所述第一特征节点,并利用所述智能任务网络对所述重建特征数据进行特征分析,得到所述目标结果。
  4. 根据权利要求3所述的方法,其中,所述适配网络包括一层或多层网络结构。
  5. 根据权利要求3所述的方法,其中,所述智能任务网络包括特征提取子网络和特征分析子网络;
    相应地,所述利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果,包括:
    当所述第一特征节点为经过所述特征提取子网络后得到的特征节点时,将所述重建特征数据输入到所述第一特征节点,并利用所述特征分析子网络对所述重建特征数据进行特征分析,得到所述目标结果。
  6. 根据权利要求5所述的方法,其中,所述特征分析子网络包括区域生成网络和感兴趣区域头部网络;
    所述利用所述特征分析子网络对所述重建特征数据进行特征分析,得到所述目标结果,包括:
    通过所述区域生成网络对所述重建特征数据进行处理,得到目标区域;
    通过所述感兴趣区域头部网络对所述重建特征数据和所述目标区域进行智能分析,得到所述目标结果。
  7. 根据权利要求1至6任一项所述的方法,其中,所述解析码流,确定重建特征数据,还包括:利用解码网络对所述码流进行解析,确定所述重建特征数据。
  8. 根据权利要求7所述的方法,其中,所述方法还包括:
    确定至少一个训练样本;
    利用所述至少一个训练样本对预设网络模型进行训练;其中,所述预设网络模型包括初始编码网络、初始解码网络和初始智能任务网络,且所述初始编码网络和所述初始解码网络与所述初始智能任务网络通过节点连接;
    当所述预设网络模型对应的损失函数收敛到预设阈值时,将训练后得到的模型确定为智能融合网络模型;其中,所述智能融合网络模型包括编码网络、所述解码网络和所述智能任务网络。
  9. 一种编码方法,所述方法包括:
    利用智能任务网络对输入图像数据进行特征提取,得到初始特征数据;
    利用编码网络对所述初始特征数据进行编码处理,并将得到的编码比特写入码流。
  10. 根据权利要求9所述的方法,其中,所述智能任务网络至少包括特征提取子网络,所述利用智能任务网络对输入图像数据进行特征提取,确定初始特征数据,包括:
    利用所述特征提取子网络对输入图像数据进行特征提取,得到第一特征节点处的所述初始特征数据。
  11. 根据权利要求10所述的方法,其中,所述特征提取子网络包括N个特征提取层,N为大于或等于1的整数;
    所述利用所述特征提取子网络对输入图像数据进行特征提取,得到第一特征节点处的所述初始特征数据,包括:
    当N等于1时,利用所述特征提取层对输入图像数据进行特征提取,得到所述第一特征节点处的所述初始特征数据;
    当N大于1时,利用所述N个特征提取层对输入图像数据进行特征提取,得到所述第一特征节点处的所述初始特征数据。
  12. 根据权利要求11所述的方法,其中,所述方法还包括:
    当所述编码网络中的第一编码节点与所述第一特征节点的数据维度匹配时,将所述第一特征节点处的初始特征数据确定为所述第一编码节点处的待编码特征数据;或者,
    当所述编码网络中的第一编码节点与所述第一特征节点的数据维度不匹配时,利用适配网络对所述第一特征节点处的初始特征数据进行数据维度转换,得到所述第一编码节点处的待编码特征数据。
  13. 根据权利要求12所述的方法,其中,所述利用编码网络对所述初始特征数据进行编码处理,并将得到的编码比特写入码流,包括:
    将所述待编码特征数据输入到所述编码网络中的所述第一编码节点,利用所述编码网络对所述待编码特征数据进行编码处理,并将得到的编码比特写入码流。
  14. 根据权利要求12所述的方法,其中,所述适配网络包括一层或多层网络结构。
  15. 根据权利要求9所述的方法,其中,所述方法还包括:
    确定至少一个训练样本;
    利用所述至少一个训练样本对预设网络模型进行训练;其中,所述预设网络模型包括初始编码网络、初始解码网络和初始智能任务网络,且所述初始编码网络和所述初始解码网络与所述初始智能任务网络通过节点连接;
    当所述预设网络模型对应的损失函数收敛到预设阈值时,将训练后得到的模型确定为智能融合网络模型;其中,所述智能融合网络模型包括所述编码网络、解码网络和所述智能任务网络。
  16. 根据权利要求15所述的方法,其中,所述智能任务网络包括特征提取子网络和特征分析子网络,所述方法还包括:
    利用所述特征提取子网络对输入图像数据进行特征提取,得到初始特征数据;
    利用所述特征分析子网络对所述初始特征数据进行特征分析,确定目标结果。
  17. 一种码流,所述码流是根据待编码信息进行比特编码生成的;其中,所述待编码信息至少包括初始特征数据,所述初始特征数据是通过智能任务网络对输入图像数据进行特征提取得到的。
  18. 一种编码器,所述编码器包括第一特征提取单元和编码单元;其中,
    所述第一特征提取单元,配置为利用智能任务网络对输入图像数据进行特征提取,得到初始特征数据;
    所述编码单元,配置为利用编码网络对所述初始特征数据进行编码处理,并将得到的编码比特写入码流。
  19. 一种编码器,所述编码器包括第一存储器和第一处理器;其中,
    所述第一存储器,用于存储能够在所述第一处理器上运行的计算机程序;
    所述第一处理器,用于在运行所述计算机程序时,执行如权利要求9至16任一项所述的方法。
  20. 一种解码器,所述解码器包括解析单元和特征分析单元;其中,
    所述解析单元,配置为解析码流,确定重建特征数据;
    所述特征分析单元,配置为利用智能任务网络对所述重建特征数据进行特征分析,确定目标结果。
  21. 一种解码器,所述解码器包括第二存储器和第二处理器;其中,
    所述第二存储器,用于存储能够在所述第二处理器上运行的计算机程序;
    所述第二处理器,用于在运行所述计算机程序时,执行如权利要求1至8任一项所述的方法。
  22. 一种计算机存储介质,其中,所述计算机存储介质存储有计算机程序,所述计算机程序被执行时实现如权利要求1至8任一项所述的方法、或者实现如权利要求9至16任一项所述的方法。
  23. 一种智能分析***,其中,所述智能分析***至少包括如权利要求18或19所述的编码器和如权利要求20或21所述的解码器。
PCT/CN2021/122480 2021-09-30 2021-09-30 编解码方法、码流、编码器、解码器、存储介质和系统 WO2023050439A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202180102716.0A CN117981309A (zh) 2021-09-30 2021-09-30 编解码方法、码流、编码器、解码器、存储介质和系统
PCT/CN2021/122480 WO2023050439A1 (zh) 2021-09-30 2021-09-30 编解码方法、码流、编码器、解码器、存储介质和系统
US18/618,752 US20240244218A1 (en) 2021-09-30 2024-03-27 Encoding method, decoding method, bitstream, encoder, decoder, storage medium, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/122480 WO2023050439A1 (zh) 2021-09-30 2021-09-30 编解码方法、码流、编码器、解码器、存储介质和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/618,752 Continuation US20240244218A1 (en) 2021-09-30 2024-03-27 Encoding method, decoding method, bitstream, encoder, decoder, storage medium, and system

Publications (1)

Publication Number Publication Date
WO2023050439A1 true WO2023050439A1 (zh) 2023-04-06

Family

ID=85781230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/122480 WO2023050439A1 (zh) 2021-09-30 2021-09-30 编解码方法、码流、编码器、解码器、存储介质和系统

Country Status (3)

Country Link
US (1) US20240244218A1 (zh)
CN (1) CN117981309A (zh)
WO (1) WO2023050439A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104717512A (zh) * 2013-12-16 2015-06-17 浙江大学 一种前向双假设编码图像块的编解码方法和装置
EP3315006A1 (en) * 2016-10-31 2018-05-02 Carnegie Mellon University Control arrangement and method for controlling a position of a transfer device of a harvesting machine
CN109862315A (zh) * 2019-01-24 2019-06-07 华为技术有限公司 视频处理方法、相关设备及计算机存储介质
CN111325252A (zh) * 2020-02-12 2020-06-23 腾讯科技(深圳)有限公司 图像处理方法、装置、设备、介质
CN112866697A (zh) * 2020-12-31 2021-05-28 杭州海康威视数字技术股份有限公司 视频图像编解码方法、装置、电子设备及存储介质


Also Published As

Publication number Publication date
CN117981309A (zh) 2024-05-03
US20240244218A1 (en) 2024-07-18

Similar Documents

Publication Publication Date Title
Minnen et al. Joint autoregressive and hierarchical priors for learned image compression
CN110059796B (zh) 卷积神经网络的生成方法及装置
CN112866694B (zh) 联合非对称卷积块和条件上下文的智能图像压缩优化方法
TW202247650A (zh) 使用機器學習系統進行隱式圖像和視訊壓縮
WO2020192034A1 (zh) 滤波方法及装置、计算机存储介质
CN110971901A (zh) 卷积神经网络的处理方法及装置
TW202312031A (zh) 使用機器學習系統的網路參數子空間中的實例自我調整影像和視訊壓縮
CN114096987A (zh) 视频处理方法及装置
TW202337211A (zh) 條件圖像壓縮
US20240242467A1 (en) Video encoding and decoding method, encoder, decoder and storage medium
Rhee et al. Channel-wise progressive learning for lossless image compression
WO2023024115A1 (zh) 编码方法、解码方法、编码器、解码器和解码系统
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
TWI826160B (zh) 圖像編解碼方法和裝置
WO2023050439A1 (zh) 编解码方法、码流、编码器、解码器、存储介质和系统
KR20160078984A (ko) 오리지널 이미지의 저품질 버전 및 에피톰으로부터 오리지널 이미지의 추정치를 구축하기 위한 방법 및 장치
WO2021056214A1 (zh) 编码方法、解码方法、编码器、解码器以及存储介质
CN118120233A (zh) 基于注意力的图像和视频压缩上下文建模
CN117321989A (zh) 基于神经网络的图像处理中的辅助信息的独立定位
WO2022246809A1 (zh) 编解码方法、码流、编码器、解码器以及存储介质
WO2022067806A1 (zh) 一种视频编解码方法、编码器、解码器及存储介质
US20240020884A1 (en) Online meta learning for meta-controlled sr in image and video compression
KR102604657B1 (ko) 영상 압축 성능 개선 방법 및 장치
WO2024120499A1 (en) Method, apparatus, and medium for visual data processing
WO2023279968A1 (zh) 视频图像的编解码方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21959005

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180102716.0

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021959005

Country of ref document: EP

Effective date: 20240430