CN115861343B - Arbitrary scale image representation method and system based on dynamic implicit image function - Google Patents


Info

Publication number
CN115861343B
Authority
CN
China
Prior art keywords: coordinate, slice, image, processing, feature map
Prior art date
Legal status
Active
Application number
CN202211590183.8A
Other languages
Chinese (zh)
Other versions
CN115861343A (en)
Inventor
金枝 (Zhi Jin)
何宗耀 (Zongyao He)
Current Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Original Assignee
Sun Yat Sen University
Sun Yat Sen University Shenzhen Campus
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University, Sun Yat Sen University Shenzhen Campus filed Critical Sun Yat Sen University
Priority to CN202211590183.8A
Publication of CN115861343A
Application granted
Publication of CN115861343B


Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for arbitrary-scale image representation based on a dynamic implicit image function. The method comprises: acquiring an image to be processed; performing implicit encoding on the image through a pre-trained encoder to obtain a two-dimensional feature map; and inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values. Embodiments of the invention reduce the computational cost of continuous image representation, improve processing performance, and can be widely applied in the technical field of artificial intelligence.

Description

Arbitrary scale image representation method and system based on dynamic implicit image function
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a system for arbitrary-scale image representation based on a dynamic implicit image function.
Background
Digital images are two-dimensional representations of the real world in the digital domain: the continuous physical world is quantized by a sensor and stored in a computer as a discrete matrix of pixels. If images could instead be expressed in a continuous form, an image of any resolution could be obtained from continuous space, preserving the fidelity of the depicted scene. Although continuous image representation methods in the related art perform well, their computational cost grows quadratically with the image magnification, making arbitrary-scale super-resolution reconstruction extremely time-consuming. In view of the foregoing, the technical problems in the related art need to be solved.
Disclosure of Invention
In view of this, embodiments of the invention provide a method and a system for arbitrary-scale image representation based on a dynamic implicit image function, so as to reduce computational cost and improve processing performance.
In one aspect, the invention provides a method for arbitrary-scale image representation based on a dynamic implicit image function, including:
acquiring an image to be processed;
Performing implicit coding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
Optionally, the dynamic coordinate slicing of the two-dimensional feature map includes:
inputting the image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and slicing the feature coordinate set according to the image magnification to obtain a coordinate slice.
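As a concrete illustration of the grouping and slicing steps above, the following Python sketch groups the high-resolution coordinates of an h×w feature map by their nearest hidden code and then cuts each group into coordinate slices. The helper names are ours, and the assumption that each hidden code owns an r×r block of high-resolution coordinates at magnification r is an interpretation of the description, not a specification from the patent.

```python
# Sketch of coordinate grouping and slicing (hypothetical helper names).
# At magnification r, each latent code in the h x w feature map is taken
# to own an r x r group of high-resolution coordinates; each group is
# then cut into coordinate slices.

def coordinate_groups(h, w, r):
    """Map every HR pixel coordinate to its nearest latent code (i, j).

    Returns a dict {(i, j): [list of HR (y, x) coordinates]}.
    """
    groups = {}
    for y in range(h * r):
        for x in range(w * r):
            key = (y // r, x // r)  # index of the owning latent code
            groups.setdefault(key, []).append((y, x))
    return groups

def slice_group(coords, interval):
    """Cut one coordinate group into slices of length `interval`;
    all coordinates in a slice will share the same hidden code."""
    return [coords[i:i + interval] for i in range(0, len(coords), interval)]

groups = coordinate_groups(h=2, w=2, r=4)          # 4x upscaling
assert len(groups) == 4                            # one group per latent code
assert all(len(g) == 16 for g in groups.values())  # r*r coords per group
slices = slice_group(groups[(0, 0)], interval=4)   # interval = r (linear order)
assert len(slices) == 4 and len(slices[0]) == 4
```

With interval = r (here 4), the decoder is invoked once per slice instead of once per coordinate, which is the source of the cost savings described above.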
Optionally, slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:
determining a slice interval according to the image magnification;
and dividing the feature coordinate set according to the slice interval to obtain coordinate slices, where all coordinates in a slice share the same hidden code.
Optionally, the pixel value prediction by the two-stage multilayer perceptron includes:
inputting a coordinate slice and a slice hidden code;
performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
obtaining a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value of the coordinate to be predicted.
Optionally, the two-stage multilayer perceptron comprises hidden layers each consisting of a linear layer followed by an activation function.
Optionally, before the pre-trained encoder performs implicit encoding on the image to be processed to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction on the training image through the encoder and the dynamic implicit image network to obtain predicted pixel values;
determining a pixel loss value according to the pixel values of the training image and the predicted pixel values;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.
In another aspect, an embodiment of the invention further provides a system, including:
a first module for acquiring an image to be processed;
a second module for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and a third module for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
Optionally, the third module includes:
a first sub-module for performing dynamic coordinate slicing on the two-dimensional feature map;
and a second sub-module for performing pixel value prediction through the two-stage multilayer perceptron.
Optionally, the first sub-module includes:
a first unit for inputting an image magnification;
a second unit for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and a third unit for slicing the feature coordinate set according to the image magnification to obtain a coordinate slice.
Optionally, the second sub-module includes:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit for performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit for obtaining a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and a seventh unit for performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value of the coordinate to be predicted.
Compared with the prior art, the technical solution provided by the invention has the following technical effects. By inputting the two-dimensional feature map into a dynamic implicit image network and performing dynamic coordinate slicing on it, the neural network can execute a many-to-many mapping from a coordinate slice to a pixel value slice, so the decoder needs to use a hidden code only once to predict all pixel values corresponding to the coordinate slice, which reduces computational cost. Moreover, predicting pixel values with the two-stage multilayer perceptron allows the decoder to take a non-fixed number of coordinates as input, which reduces the number of hidden layers and improves processing performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of an arbitrary scale image representation method based on a dynamic implicit image function provided by an embodiment of the present application;
FIG. 2 is an overall frame diagram of a dynamic implicit image function provided by an embodiment of the present application;
FIG. 3 is a diagram illustrating an example of a coordinate slice provided by an embodiment of the present application;
Fig. 4 is a block diagram of a two-stage multilayer perceptron according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiments of the present application provide a method and a system for arbitrary-scale image representation based on a dynamic implicit image function, which mainly relate to artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics; AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Specifically, the method and system for arbitrary-scale image representation based on a dynamic implicit image function provided by the embodiments of the present application may employ computer vision and machine learning/deep learning techniques from the artificial intelligence field to analyze and process images and obtain a continuous image representation. It can be understood that, for different tasks, the method provided in the embodiments of the present application may be executed in the application scenario of the corresponding artificial intelligence system; moreover, it may be executed at any point in the operation flow of that system.
Implicit neural representation: compared with explicit representations, implicit neural representations can capture the details of an object with a small number of parameters, and their differentiable nature allows back-propagation through a neural rendering model. However, when applied to two-dimensional vision tasks, implicit neural representations typically require an independent prediction for each pixel, incurring significant computational cost and long run times.
The Local Implicit Image Function (LIIF) is a novel implicit representation of an image that uses a multilayer perceptron to infer the pixel value at each coordinate.
In the related art, although LIIF provides stable performance in arbitrary-scale super-resolution tasks of up to 30x, its computational cost grows rapidly as the magnification increases.
In view of this, referring to fig. 1, an embodiment of the present invention provides a method for arbitrary-scale image representation based on a dynamic implicit image function, including:
s101, acquiring an image to be processed;
S102, performing implicit coding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
S103, inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
In the embodiment of the invention, a Dynamic Implicit Image Function (DIIF) is provided, which is a fast and effective arbitrary-scale image representation method. Referring to fig. 2, I_in denotes the input image, and an encoder maps the input image to a two-dimensional feature map as its DIIF representation. Given the resolution of the real image, the hidden code z* and the coordinate slice around it, X* = [x_1st, …, x_last], can be obtained from the two-dimensional feature map, where x_1st denotes the first coordinate of the coordinate slice and x_last denotes the last coordinate. The decoding function then uses this information to predict all pixel values of the coordinate slice; that is, pixel value prediction is performed by the two-stage multilayer perceptron (also called the coarse-to-fine multilayer perceptron): the slice hidden vector H* is predicted by the first (coarse) stage and is then fed, together with a coordinate to be predicted x_i, into the second (fine) stage, which outputs the pixel value I_out-i at that coordinate. In the training stage, the embodiment of the invention computes a loss function using the predicted pixel value I_out-i and the real image's pixel value I_gt-i; the encoder and the decoding function are jointly trained in a self-supervised super-resolution task, and the learned network parameters are shared by all images. By using image coordinate grouping and slicing strategies, embodiments of the invention enable the neural network to perform a many-to-many mapping from coordinate slices to pixel value slices, instead of predicting the pixel value of one given coordinate at a time.
The embodiment of the invention further provides a two-stage multilayer perceptron (Coarse-to-Fine Multilayer Perceptron, C2F-MLP) that performs image decoding based on a dynamic coordinate slicing strategy, so that the number of coordinates in each slice changes with the magnification; DIIF with dynamic coordinate slicing can thus markedly reduce the computational cost of large-scale super-resolution. Experimental results show that, compared with existing arbitrary-scale super-resolution methods, DIIF achieves the best computational efficiency and super-resolution performance.
Further as a preferred embodiment, the dynamic coordinate slicing of the two-dimensional feature map includes:
inputting the image magnification;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and slicing the feature coordinate set according to the image magnification to obtain a coordinate slice.
In the embodiment of the invention, a vector is selected from the two-dimensional feature map as the hidden code, and the coordinates in the two-dimensional feature map that are closer to this hidden code than to any other hidden code are grouped together, yielding a feature coordinate set. The hidden code is shared within one coordinate set, so the decoder can use the hidden code only once to predict all pixel values corresponding to that set. The number of coordinates in a coordinate set grows with the magnification, so the larger the magnification, the more computational cost is saved. However, coordinate grouping requires the decoder to predict all pixel values of the group at the same time, which places a heavy burden on the decoder for large-scale super-resolution. The embodiment of the invention therefore slices the feature coordinate set according to the image magnification: one coordinate set is divided into several coordinate slices, and the hidden code input is shared only within a coordinate slice rather than across the whole coordinate set.
Further as a preferred embodiment, slicing the feature coordinate set according to the image magnification to obtain a coordinate slice includes:
determining a slice interval according to the image magnification;
and dividing the feature coordinate set according to the slice interval to obtain coordinate slices, where all coordinates in a slice share the same hidden code.
The simplest way to set the slice interval is fixed coordinate slicing, which uses the same slice interval regardless of magnification. With this strategy, however, the computational cost still grows quadratically as the magnification increases, and it also suffers from two major problems: spatial discontinuities and redundant coordinates within a coordinate slice. To address these problems, embodiments of the invention propose dynamic coordinate slicing, which adjusts the slice interval as the magnification changes. The first strategy that may be adopted is linear-order coordinate slicing, which sets the slice interval to the magnification; the computational cost of DIIF then increases linearly with the magnification. Another strategy is to set the slice interval to the square of the magnification, called constant-order coordinate slicing; the computational cost of DIIF is then determined only by the resolution of the input image and remains unchanged as the magnification increases. In the embodiment of the invention, the feature coordinate set is divided according to the slice interval to obtain coordinate slices, and all coordinates in a slice share the same hidden code. Referring to fig. 3, fig. 3 shows a 4x-magnification coordinate group sliced with a slice interval of 4, where z* denotes the hidden code, x_1st denotes the first coordinate of a coordinate slice, and x_last denotes the last coordinate.
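The three slice-interval strategies described above can be compared with a small Python sketch. The strategy names follow the text, but the helper functions and the assumption that each hidden code owns r² high-resolution coordinates at magnification r (for a 2D image) are ours:

```python
def slice_interval(r, strategy):
    """Slice interval for magnification r under the three strategies
    discussed above (an illustrative sketch, not the patent's code)."""
    if strategy == "fixed":
        return 4          # any constant; cost still grows quadratically
    if strategy == "linear":
        return r          # linear-order: cost grows linearly with r
    if strategy == "constant":
        return r * r      # constant-order: cost set by input resolution
    raise ValueError(strategy)

def num_slices(h, w, r, strategy):
    """Number of slice-level decoder calls for an h x w feature map,
    assuming each hidden code owns an r*r coordinate group."""
    group_size = r * r
    interval = slice_interval(r, strategy)
    per_group = -(-group_size // interval)   # ceiling division
    return h * w * per_group

# For a single hidden code (h = w = 1):
assert num_slices(1, 1, 4, "constant") == 1   # one call, any magnification
assert num_slices(1, 1, 4, "linear") == 4     # grows linearly in r
assert num_slices(1, 1, 8, "linear") == 8
assert num_slices(1, 1, 8, "fixed") == 16     # grows quadratically in r
```

The assertions make the scaling behavior concrete: under constant-order slicing the number of decoder calls per hidden code stays at one, while fixed slicing inherits the quadratic growth.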
Further as a preferred embodiment, the pixel value prediction by the two-stage multilayer perceptron includes:
inputting a coordinate slice and a slice hidden code;
performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
obtaining a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value of the coordinate to be predicted.
To execute the dynamic coordinate slicing strategy, the decoder needs the scalability to take a non-fixed number of coordinates as input and output the corresponding pixel values. However, an ordinary MLP only accepts a fixed-length vector as input. To solve this problem, the embodiment of the invention proposes a two-stage multilayer perceptron (C2F-MLP) as the decoder, divided into a first (coarse) stage for predicting slice hidden vectors and a second (fine) stage for predicting pixel values. In the embodiment of the invention, the hidden layers of the coarse stage take the boundary coordinates of the coordinate slice and the corresponding hidden code as input and generate the slice hidden vector. The slice hidden vector contains the information of all pixel values in the slice and serves as input to the fine stage. The computational cost of the coarse stage is determined by the number of coordinate slices, which, thanks to the dynamic coordinate slicing strategy, is much smaller than the number of output coordinates. The coarse stage also allows the decoding function to exploit spatial relationships within the slice, making its pixel value predictions more accurate. The hidden layers of the fine stage take the slice hidden vector output by the coarse stage and any coordinate in the given coordinate slice as input to predict the pixel value at that coordinate; the fine stage is designed to predict the pixel value at each coordinate independently. The decoding function can be expressed as:
I(X*) = f_θ(z*, [x_tl − v*, …, x_rb − v*]),
where I is the pixel value, X* = [x_tl, …, x_rb] is the given coordinate slice, f_θ is the decoder, z* is the hidden code corresponding to the coordinate slice, v* is the coordinate of the hidden code, and x_tl and x_rb are the first and last coordinates of the coordinate slice, respectively.
Since the slice hidden vector is shorter than the hidden code and the fine stage has fewer hidden layers, the computational cost of DIIF's fine stage is significantly lower than that of LIIF's decoder.
Further as a preferred embodiment, the two-stage multilayer perceptron comprises hidden layers each consisting of a linear layer followed by an activation function.
Referring to fig. 4, the C2F-MLP divides the decoder into a coarse stage for predicting slice hidden vectors and a fine stage for predicting pixel values. Each hidden layer of the C2F-MLP consists of a linear layer of dimension 256 followed by a ReLU activation function. The coarse stage takes the hidden code z*, the first coordinate x_1st of the coordinate slice, the last coordinate x_last of the coordinate slice, and the pixel area a under the current magnification as input, and outputs the slice hidden vector H_tl~rb. The fine stage takes the slice hidden vector and the coordinate to be predicted x_i as input, and outputs I_i. To predict RGB values, the fine stage ends with an output linear layer of dimension 3.
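A minimal numpy sketch of the C2F-MLP structure described above may help fix the data flow. The 256-dimensional hidden layers, the 3-dimensional RGB output, and the coarse-stage inputs (z*, x_1st, x_last, area a) follow the description; the latent dimension, a single hidden layer per stage, and the random weight initialization are simplifying assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

class CoarseToFineMLP:
    """Untrained sketch of the two-stage (coarse-to-fine) MLP decoder.
    The coarse stage runs once per coordinate slice; the fine stage
    runs once per coordinate to be predicted."""
    def __init__(self, latent_dim=64, coord_dim=2, hidden=256):
        in_coarse = latent_dim + 2 * coord_dim + 1     # z*, x_1st, x_last, area a
        self.w1 = rng.standard_normal((in_coarse, hidden)) * 0.02
        self.b1 = np.zeros(hidden)
        in_fine = hidden + coord_dim                   # slice hidden vector, x_i
        self.w2 = rng.standard_normal((in_fine, hidden)) * 0.02
        self.b2 = np.zeros(hidden)
        self.w3 = rng.standard_normal((hidden, 3)) * 0.02   # RGB output layer
        self.b3 = np.zeros(3)

    def coarse(self, z, x_first, x_last, area):
        """Predict the slice hidden vector from slice boundaries and hidden code."""
        inp = np.concatenate([z, x_first, x_last, [area]])
        return np.maximum(linear(inp, self.w1, self.b1), 0.0)   # ReLU

    def fine(self, h_slice, x_i):
        """Predict the RGB value at one coordinate of the slice."""
        inp = np.concatenate([h_slice, x_i])
        h = np.maximum(linear(inp, self.w2, self.b2), 0.0)      # ReLU
        return linear(h, self.w3, self.b3)

mlp = CoarseToFineMLP()
h_slice = mlp.coarse(z=np.zeros(64), x_first=np.array([0.0, 0.0]),
                     x_last=np.array([0.75, 0.75]), area=1 / 16)
rgb = mlp.fine(h_slice, np.array([0.25, 0.5]))
assert h_slice.shape == (256,)
assert rgb.shape == (3,)
```

Note how the expensive coarse pass is amortized: `coarse` is called once per slice, after which `fine` is called cheaply for every coordinate in that slice.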
Further as a preferred embodiment, before the pre-trained encoder performs implicit encoding on the image to be processed to obtain the two-dimensional feature map, the method further includes pre-training the encoder and the dynamic implicit image network, specifically including:
acquiring a training image;
performing pixel prediction on the training image through the encoder and the dynamic implicit image network to obtain predicted pixel values;
determining a pixel loss value according to the pixel values of the training image and the predicted pixel values;
and updating the weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain the trained encoder and dynamic implicit image network.
In the embodiment of the invention, the training stage computes a pixel-level loss using the predicted pixel values and the pixel values of the real image. The encoder and the decoding function are trained jointly in a self-supervised super-resolution task, and the learned network parameters are shared by all images.
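The pixel-level loss of the training stage can be sketched as follows. The patent text does not name a specific loss; L1 (mean absolute error) is assumed here, as is common for LIIF-style super-resolution training:

```python
def l1_pixel_loss(pred, target):
    """Mean absolute error between predicted pixel values (I_out) and
    ground-truth pixel values (I_gt). L1 is an assumed choice; the
    patent only says a pixel-level loss is computed."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

assert l1_pixel_loss([0.0, 1.0], [1.0, 1.0]) == 0.5
assert l1_pixel_loss([0.5, 0.5], [0.5, 0.5]) == 0.0
```

During training, this scalar would be back-propagated through both the decoding function and the encoder, consistent with the joint self-supervised training described above.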
In another aspect, an embodiment of the invention further provides a system, including:
a first module for acquiring an image to be processed;
a second module for performing implicit encoding on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
and a third module for inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing on the two-dimensional feature map, and performing pixel value prediction through a two-stage multilayer perceptron to obtain image pixel values.
Optionally, the third module includes:
a first sub-module for performing dynamic coordinate slicing on the two-dimensional feature map;
and a second sub-module for performing pixel value prediction through the two-stage multilayer perceptron.
Optionally, the first sub-module includes:
a first unit for inputting an image magnification;
a second unit for acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
and a third unit for slicing the feature coordinate set according to the image magnification to obtain a coordinate slice.
Optionally, the second sub-module includes:
a fourth unit for inputting a coordinate slice and a slice hidden code;
a fifth unit for performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit for obtaining a coordinate to be predicted, where the coordinate to be predicted is any coordinate in the coordinate slice;
and a seventh unit for performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain the pixel value of the coordinate to be predicted.
The invention provides a method and a system for arbitrary-scale image representation based on a dynamic implicit image function, enabling fast and effective arbitrary-scale image representation. In DIIF, a pixel-based image is represented as a two-dimensional feature map, and the decoding function takes coordinate slices and local feature vectors as input to predict the corresponding sets of pixel values. By sharing local feature vectors inside coordinate slices, DIIF can perform large-scale super-resolution reconstruction at very low computational cost. Experimental results show that DIIF outperforms existing arbitrary-scale super-resolution methods in both super-resolution performance and computational efficiency at all scaling factors; compared with LIIF, DIIF saves up to 87% of the computational cost while consistently achieving better PSNR. DIIF can be efficiently applied to scenarios where images need to be presented in real time at any resolution: it can implement arbitrary zooming in image viewing/editing software, upscale and restore low-resolution images, and compress high-resolution images for storage.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and these equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.

Claims (4)

1. An arbitrary scale image representation method based on a dynamic implicit image function, the method comprising:
acquiring an image to be processed;
performing implicit coding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
inputting the two-dimensional feature map into a dynamic implicit image network, performing dynamic coordinate slicing processing on the two-dimensional feature map, and performing pixel value prediction processing through a dual-stage multi-layer perceptron to obtain an image pixel value;
wherein the dynamic coordinate slicing processing on the two-dimensional feature map comprises the following steps:
inputting an image magnification factor;
acquiring a feature vector from the two-dimensional feature map, determining the feature vector as a hidden code, and grouping coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
slicing the feature coordinate set according to the image magnification factor to obtain a coordinate slice;
wherein the slicing of the feature coordinate set according to the image magnification factor to obtain a coordinate slice comprises the following steps:
determining a slice interval according to the image magnification factor;
dividing the feature coordinate set according to the slice interval to obtain a coordinate slice, wherein all coordinates in the coordinate slice share the same hidden code;
wherein the pixel value prediction processing through the dual-stage multi-layer perceptron comprises the following steps:
inputting a coordinate slice and a slice hidden code;
performing first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
acquiring a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and performing second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
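The slicing and two-stage decoding of claim 1 can be sketched as follows. This is a minimal illustrative NumPy sketch, not the patented implementation: the function names (`slice_coordinates`, `dual_stage_mlp`), the choice of slice interval (magnification squared), and the random stand-in weights are all assumptions made for illustration; the claim only states that the slice interval is determined from the magnification factor.

```python
import numpy as np

def slice_coordinates(coords, magnification, slice_len=None):
    """Dynamic coordinate slicing: group query coordinates into slices,
    where every coordinate in a slice shares one hidden code.
    The interval magnification**2 is an illustrative assumption."""
    if slice_len is None:
        slice_len = int(magnification) ** 2
    return [coords[i:i + slice_len] for i in range(0, len(coords), slice_len)]

def dual_stage_mlp(coord_slice, hidden_code, rng):
    """Dual-stage decoding. Stage 1 maps the shared slice hidden code
    to a single slice hidden vector (computed once per slice); stage 2
    decodes one RGB value per query coordinate. Random weights stand
    in for trained parameters."""
    w1 = rng.standard_normal((hidden_code.size, 16))
    slice_hidden = np.tanh(hidden_code @ w1)          # stage 1: once per slice
    w2 = rng.standard_normal((slice_hidden.size + 2, 3))
    pixels = []
    for xy in coord_slice:                            # stage 2: per coordinate
        pixels.append(np.tanh(np.concatenate([slice_hidden, xy]) @ w2))
    return np.stack(pixels)                           # (len(slice), 3) RGB
```

Because the first stage runs once per slice rather than once per coordinate, the per-pixel cost falls as the slice interval grows, which is the claimed source of the computational saving.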
2. The method of claim 1, wherein the dual-stage multi-layer perceptron comprises a hidden layer, the hidden layer consisting of a linear layer and an activation function.
3. The method according to claim 1 or 2, wherein before the implicit coding processing is performed on the image to be processed through the pre-trained encoder to obtain the two-dimensional feature map, the method further comprises pre-training the encoder and the dynamic implicit image network, which specifically comprises:
acquiring a training image;
performing pixel prediction processing on the training image through the encoder and the dynamic implicit image network to obtain a predicted pixel value;
determining a pixel loss value according to a pixel value of the training image and the predicted pixel value;
and updating weight parameters of the encoder and the dynamic implicit image network according to the pixel loss value to obtain a trained encoder and a trained dynamic implicit image network.
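The pre-training loop of claim 3 reduces to: predict pixels, compare them against the training image, and update the weights from the resulting loss. The sketch below makes heavy assumptions for illustration: a single linear map `w` stands in for the encoder plus dynamic implicit image network, and an L1 pixel loss with subgradient descent stands in for the loss and optimizer, which the claim leaves unspecified.

```python
import numpy as np

def l1_pixel_loss(pred, target):
    """Mean absolute error between predicted and ground-truth pixel values."""
    return np.abs(pred - target).mean()

def train_step(w, feats, target, lr=0.05):
    """One pre-training update on the stand-in model pred = feats @ w.
    Returns the updated weights and the loss before the update; the
    real network would be updated by backpropagation instead of this
    closed-form L1 subgradient."""
    pred = feats @ w
    grad = feats.T @ np.sign(pred - target) / len(feats)
    return w - lr * grad, l1_pixel_loss(pred, target)
```

Iterating `train_step` drives the pixel loss down, mirroring the claimed loop of computing a pixel loss value and updating the weight parameters until a trained model is obtained.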
4. An arbitrary scale image representation system based on a dynamic implicit image function, the system comprising:
a first module, configured to acquire an image to be processed;
a second module, configured to perform implicit coding processing on the image to be processed through a pre-trained encoder to obtain a two-dimensional feature map;
a third module, configured to input the two-dimensional feature map into a dynamic implicit image network, perform dynamic coordinate slicing processing on the two-dimensional feature map, and perform pixel value prediction processing through a dual-stage multi-layer perceptron to obtain an image pixel value;
wherein the third module comprises:
a first sub-module, configured to perform the dynamic coordinate slicing processing on the two-dimensional feature map;
a second sub-module, configured to perform the pixel value prediction processing through the dual-stage multi-layer perceptron;
wherein the first sub-module comprises:
a first unit, configured to input an image magnification factor;
a second unit, configured to acquire a feature vector from the two-dimensional feature map, determine the feature vector as a hidden code, and group coordinates in the two-dimensional feature map according to the hidden code to obtain a feature coordinate set;
a third unit, configured to slice the feature coordinate set according to the image magnification factor to obtain a coordinate slice;
wherein the slicing performed by the third unit comprises:
determining a slice interval according to the image magnification factor;
dividing the feature coordinate set according to the slice interval to obtain a coordinate slice, wherein all coordinates in the coordinate slice share the same hidden code;
and wherein the second sub-module comprises:
a fourth unit, configured to input a coordinate slice and a slice hidden code;
a fifth unit, configured to perform first-stage processing on the coordinate slice and the slice hidden code to obtain a slice hidden vector;
a sixth unit, configured to acquire a coordinate to be predicted, wherein the coordinate to be predicted is any coordinate in the coordinate slice;
and a seventh unit, configured to perform second-stage processing on the slice hidden vector according to the coordinate to be predicted to obtain a pixel value of the coordinate to be predicted.
CN202211590183.8A 2022-12-12 2022-12-12 Arbitrary scale image representation method and system based on dynamic implicit image function Active CN115861343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211590183.8A CN115861343B (en) 2022-12-12 2022-12-12 Arbitrary scale image representation method and system based on dynamic implicit image function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211590183.8A CN115861343B (en) 2022-12-12 2022-12-12 Arbitrary scale image representation method and system based on dynamic implicit image function

Publications (2)

Publication Number Publication Date
CN115861343A CN115861343A (en) 2023-03-28
CN115861343B 2024-06-04

Family

ID=85672081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211590183.8A Active CN115861343B (en) 2022-12-12 2022-12-12 Arbitrary scale image representation method and system based on dynamic implicit image function

Country Status (1)

Country Link
CN (1) CN115861343B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014197994A1 (en) * 2013-06-12 2014-12-18 University Health Network Method and system for automated quality assurance and automated treatment planning in radiation therapy
CN111784570A (en) * 2019-04-04 2020-10-16 Tcl集团股份有限公司 Video image super-resolution reconstruction method and device
KR102193108B1 (en) * 2019-10-10 2020-12-18 서울대학교산학협력단 Observation method for two-dimensional river mixing using RGB image acquired by the unmanned aerial vehicle
CN112163655A (en) * 2020-09-30 2021-01-01 上海麦广互娱文化传媒股份有限公司 Dynamic implicit two-dimensional code and generation and detection method and device thereof
CN112419150A (en) * 2020-11-06 2021-02-26 中国科学技术大学 Random multiple image super-resolution reconstruction method based on bilateral up-sampling network
CN112446489A (en) * 2020-11-25 2021-03-05 天津大学 Dynamic network embedded link prediction method based on variational self-encoder
WO2021122850A1 (en) * 2019-12-17 2021-06-24 Canon Kabushiki Kaisha Method, device, and computer program for improving encapsulation of media content
WO2021183336A1 (en) * 2020-03-09 2021-09-16 Schlumberger Technology Corporation Fast front tracking in eor flooding simulation on coarse grids
WO2021216747A1 (en) * 2020-04-21 2021-10-28 Massachusetts Institute Of Technology Real-Time Photorealistic 3D Holography with Deep Neural Networks
EP3907695A1 (en) * 2019-08-14 2021-11-10 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof
CN113689539A (en) * 2021-07-06 2021-11-23 清华大学 Dynamic scene real-time three-dimensional reconstruction method and device based on implicit optical flow field
CN113947521A (en) * 2021-10-14 2022-01-18 展讯通信(上海)有限公司 Image resolution conversion method and device based on deep neural network and terminal equipment
US11308657B1 (en) * 2021-08-11 2022-04-19 Neon Evolution Inc. Methods and systems for image processing using a learning engine
CN114897912A (en) * 2022-04-24 2022-08-12 广东工业大学 Three-dimensional point cloud segmentation method and system based on enhanced cyclic slicing network
CN115049556A (en) * 2022-06-27 2022-09-13 安徽大学 StyleGAN-based face image restoration method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10185584B2 (en) * 2013-08-20 2019-01-22 Teleputers, Llc System and method for self-protecting data
US10403007B2 (en) * 2017-03-07 2019-09-03 Children's Medical Center Corporation Registration-based motion tracking for motion-robust imaging
US11847560B2 (en) * 2020-07-27 2023-12-19 Robert Bosch Gmbh Hardware compute fabrics for deep equilibrium models


Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
"3D场景表征—神经辐射场(NeRF)近期成果综述";朱方;《 中国传媒大学学报(自然科学版) 》;20221020;全文 *
"Meta-sr: A magnification-arbitrary network for super-resolution";Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, and Jian Sun.;《https://doi.org/10.48550/arXiv.1903.00875》;20190303;全文 *
Huanrong Zhang ; Jie Xiao ; Zhi Jin."Multi-scale Image Super-Resolution via A Single Extendable Deep Network" .《 IEEE Journal of Selected Topics in Signal Processing》.2020,全文. *
Luke Lozenski ; Mark A. Anastasio ; Umberto Villa."A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields".《IEEE Transactions on Computational Imaging》.2022,全文. *
Ning Ni ; Hanlin Wu ; Libao Zhang."A Memory-Efficient Self-Supervised Dynamic Image Reconstruction Method Using Neural Fields".《2022 IEEE International Conference on Image Processing (ICIP)》 .2022,全文. *
Xin Huang ; Qi Zhang ; Ying Feng ; Hongdong Li ; Xuan Wang ; Qing Wang ."HDR-NeRF: High Dynamic Range Neural Radiance Fields".《2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》.2022,全文. *
Yinbo Chen ; Sifei Liu ; Xiaolong Wang."Learning Continuous Image Representation with Local Implicit Image Function".《2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》.2021,全文. *
基于Softplus+HKELM的彩色图像超分辨率算法;王亚刚;王萌;;计算机与数字工程;20200120(第01期);全文 *
李哲远 ; 陈翔宇 ; 乔宇 ; 董超 ; 井焜."注意力机制在单图像超分辨率中的分析研究".《集成技术》.2022,全文. *
李征 ; 金迪 ; 黄雪原 ; 袁科."基于隐式反馈的推荐研究综述".《河南大学学报(自然科学版)》.2022,全文. *
边缘修正的多尺度卷积神经网络重建算法;程德强;蔡迎春;陈亮亮;宋玉龙;;激光与光电子学进展;20180328(第09期);全文 *

Also Published As

Publication number Publication date
CN115861343A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN110782490B (en) Video depth map estimation method and device with space-time consistency
Ye et al. Inverted pyramid multi-task transformer for dense scene understanding
CN113034380B (en) Video space-time super-resolution method and device based on improved deformable convolution correction
Endo et al. Animating landscape: self-supervised learning of decoupled motion and appearance for single-image video synthesis
Li et al. Hst: Hierarchical swin transformer for compressed image super-resolution
CA3137297C (en) Adaptive convolutions in neural networks
Grant et al. Deep disentangled representations for volumetric reconstruction
Zhai et al. Optical flow estimation using channel attention mechanism and dilated convolutional neural networks
CN115294282A (en) Monocular depth estimation system and method for enhancing feature fusion in three-dimensional scene reconstruction
CN113095254A (en) Method and system for positioning key points of human body part
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN115272437A (en) Image depth estimation method and device based on global and local features
CN113066018A (en) Image enhancement method and related device
CN114283347A (en) Target detection method, system, intelligent terminal and computer readable storage medium
Gao et al. Augmented weighted bidirectional feature pyramid network for marine object detection
CN113436224B (en) Intelligent image clipping method and device based on explicit composition rule modeling
Kim et al. Latent transformations neural network for object view synthesis
Suzuki et al. Residual learning of video frame interpolation using convolutional LSTM
CN115861343B (en) Arbitrary scale image representation method and system based on dynamic implicit image function
WO2023170069A1 (en) Generating compressed representations of video for efficient learning of video tasks
Chen et al. Adaptive hybrid composition based super-resolution network via fine-grained channel pruning
US20220172421A1 (en) Enhancement of Three-Dimensional Facial Scans
Xiang et al. InvFlow: Involution and multi-scale interaction for unsupervised learning of optical flow
CN113902985A (en) Training method and device of video frame optimization model and computer equipment
WO2023051408A1 (en) Feature map processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant