Detailed Description
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
FIG. 1 is a block diagram of a video decoding apparatus according to an embodiment of the present invention.
As shown in FIG. 1, the video decoding apparatus 100 includes a video decoder 110, a volatile memory 120, and a memory unit 130. The video decoder 110 decodes a video stream to obtain a plurality of image frames. In some embodiments, the video decoder 110 supports video codec standards such as Alliance for Open Media (AOMedia) Video 1 (AV1), High Efficiency Video Coding (HEVC), and VP9, although the invention is not limited in this respect. In an embodiment, the video decoder 110 can be implemented by a hardware circuit, such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another logic circuit with the same function, but the invention is not limited thereto. In another embodiment, the video decoder 110 can be implemented by a central processing unit or a digital signal processor executing corresponding program code.
The volatile memory 120 is, for example, a static random access memory (SRAM) that temporarily stores the decoded blocks required by the video decoder 110 during the video decoding process. In some embodiments, the video decoder 110 and the volatile memory 120 may be integrated into an integrated circuit 150.
In one embodiment, the video decoder 110 may receive a video stream generated by an encoder for decoding. In another embodiment, the video decoding apparatus 100 further comprises a storage device 140, and the video stream can be stored in the storage device 140, for example in the form of a video file. The video decoder 110 may read the video stream in the video file from the storage device 140 for decoding. The storage device 140 is a non-volatile memory, such as a hard disk drive, a solid-state disk, or a flash memory, but the invention is not limited thereto.
FIG. 2 is a diagram of a superblock and its recursive partitioning into coding units according to an embodiment of the present invention. As shown in FIG. 2, the super block 200 is 128x128 pixels in size. In the AV1 standard, for example, the minimum coding unit size is 4x4 pixels, so the video decoder 110 may need to recursively divide the super block 200 to be decoded until 4x4 coding units are obtained.
First, the super block 200 is divided into four 64x64 blocks, namely the blocks 21 to 24, and whether each of the 64x64 blocks 21 to 24 needs to be divided is then determined in a sequential scanning (raster scan) order. Taking the 64x64 blocks as an example, the video decoder 110 first determines whether the upper-left 64x64 block (block 21) needs to be divided, and then determines in turn whether the upper-right, lower-left and lower-right 64x64 blocks 22, 23 and 24 need to be divided.
For example, provided that the video decoder 110 decodes correctly according to the video codec standard, if the decoded syntax value is PARTITION_SPLIT, the partitioning continues; if the decoded syntax value is another value (e.g., PARTITION_NONE, PARTITION_VERT, ...), the partitioning returns and the decoding phase is entered again. Decoding according to the video codec standard itself is outside the scope of the present invention and is not described herein.
If the video decoder 110 determines that the first 64x64 block 21 needs to be divided, the block 21 is divided into four 32x32 blocks, namely blocks 211 to 214. Similarly, as shown in FIG. 2, the video decoder 110 determines that the blocks 211 and 214 need to be further divided while the blocks 212 and 213 do not, so the video decoder 110 divides each of the blocks 211 and 214 into four 16x16 blocks, namely blocks 2111 to 2114 and blocks 2141 to 2144, respectively. In addition, the video decoder 110 determines that none of the blocks 2111 to 2114 and 2141 to 2144 needs to be further divided, so each of them can be regarded as a coding unit (CU).
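The partition of FIG. 2 can be pictured as a small quadtree. The following Python sketch is illustrative only: the nested dictionary `TREE` and the helper `coding_units` are hypothetical names introduced here, and the split decisions are hard-coded from the example rather than decoded from a bitstream. A value of `None` marks a block that is not divided further.

```python
# Hypothetical model of the FIG. 2 example: keys are the block labels used in
# the description, None marks a block that becomes a coding unit as-is.
LEAF_QUAD_211 = {"2111": None, "2112": None, "2113": None, "2114": None}
LEAF_QUAD_214 = {"2141": None, "2142": None, "2143": None, "2144": None}

TREE = {
    "21": {"211": LEAF_QUAD_211, "212": None, "213": None, "214": LEAF_QUAD_214},
    "22": None,
    "23": None,
    "24": None,
}


def coding_units(tree, size):
    """Return (label, size) pairs in sequential scanning order.

    `size` is the edge length of the blocks at this level of `tree`.
    """
    units = []
    for label, subtree in tree.items():  # dicts keep insertion order
        if subtree is None:
            units.append((label, size))
        else:
            units.extend(coding_units(subtree, size // 2))
    return units


# The four children of the 128x128 super block 200 are 64x64 blocks.
units = coding_units(TREE, 64)
for index, (label, size) in enumerate(units):
    print(f"CU{index}: block {label} ({size}x{size})")
```

This version relies on the host language's call stack for the recursion, so it only shows the resulting set of coding units, not how a decoder would track the pending blocks.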
FIGS. 3A-3O are schematic diagrams illustrating a stacking sequence in the embodiment of FIG. 2 according to the present invention.
In one embodiment, a portion of the memory space of the volatile memory 120 can be used as a stack memory 121, and the video decoder 110 can execute a block recursive partitioning instruction, such as QTREE, implementing the block recursive partitioning operation by accessing the stack memory 121 in a first-in-last-out (FILO), i.e., last-in-first-out (LIFO), manner. The add (push) operation and the delete (pop) operation are performed at the same end of the stack memory 121.
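As a rough illustration of the push and pop behavior, the sketch below models a fixed-size stack in Python. The class name `StackMemory`, its fields, and the capacity of 8 entries are assumptions made for this example; they do not describe the actual layout of the stack memory 121.

```python
class StackMemory:
    """Hypothetical fixed-capacity LIFO store; push and pop use the same end."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.top = 0  # number of occupied entries, also the next free index

    def push(self, block):
        if self.top == len(self.entries):
            raise OverflowError("stack memory is full")
        self.entries[self.top] = block
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("stack memory is empty")
        self.top -= 1
        return self.entries[self.top]


stack = StackMemory(capacity=8)  # capacity chosen arbitrarily for the sketch
for block in ("24", "23", "22"):  # pushed in reverse scanning order
    stack.push(block)
first_out = stack.pop()  # the block pushed last comes out first
```

Pushing the blocks in reverse scanning order means they come back out in scanning order, which is the property the partitioning relies on.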
For illustrative purposes, the stack order of FIGS. 3A-3O illustrates how the block recursive partitioning instruction is implemented according to the partition order of the superblock in FIG. 2.
In one embodiment, the video decoder 110 may preset a maximum partition depth of the stack memory 121, where the maximum partition depth represents the number of entries occupied in the stack memory 121 when all blocks in the super block 200 are partitioned down to the minimum coding unit (i.e., 4x4 blocks). For example, although a 128x128 superblock can be divided into 1024 4x4 blocks, the video decoder 110 does not actually use 1024 entries when executing the block recursive partitioning instruction; instead, it processes one of the four blocks of each layer at a time in the sequential scanning order. Therefore, when the maximum partition depth of the stack memory 121 is calculated, the super block 200 is partitioned as shown in FIG. 4A, i.e., the blocks of each layer occupy only 3 entries of the stack memory 121. Thus, the 64x64, 32x32, 16x16 and 8x8 blocks each occupy 3 entries in the stack memory 121. Since the 4x4 block is the smallest coding unit, the video decoder 110 can decode a 4x4 block directly without dividing it further, and thus does not need to write 4x4 blocks into the stack memory 121.
Therefore, executing the block recursive partitioning instruction on the super block 200 occupies at most 12 entries (4 layers with 3 entries each) in the stack memory 121, so the maximum partition depth of the stack memory 121 is 12. Assuming that the stack memory 121 has N entries (e.g., N = 12), the logical address of the bottommost entry of the stack memory 121 is #0, and the logical address of the topmost entry is #(N-1), i.e., #11, as shown in FIG. 4B. However, the present invention is not limited to this maximum partition depth, and the video decoder 110 may adjust the maximum partition depth depending on the actual situation of the decoding process.
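The count of 12 entries can be reproduced with a short calculation. The function below is a sketch under the assumptions stated in the text: every division is a four-way split, three sibling blocks are stored per layer, and blocks of the minimum size are never stored. The name `max_stack_entries` and its parameters are hypothetical.

```python
import math


def max_stack_entries(superblock_size=128, min_cu_size=4):
    """Worst-case stack entries for purely four-way partitioning."""
    # Number of halvings from the superblock down to the minimum coding unit:
    # 128 -> 64 -> 32 -> 16 -> 8 -> 4 is 5 halvings.
    halvings = int(math.log2(superblock_size // min_cu_size))
    # Layers whose blocks are written to the stack: 64, 32, 16 and 8.
    # The minimum-size layer is decoded directly and is excluded.
    stored_layers = halvings - 1
    return 3 * stored_layers


entries = max_stack_entries()  # 128x128 superblock, 4x4 minimum coding unit
top_address = entries - 1      # logical address of the topmost entry
```

For the default arguments this gives 12 entries, with logical addresses running from #0 to #11.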
As shown in FIG. 3A, the video decoder 110 performs a first division process, dividing the super block 200 (a 128x128 block) into four 64x64 blocks 21 to 24. In this embodiment, the video decoder 110 first determines whether the block 21 needs to be further divided, and defers that determination for the blocks 22 to 24. The block 21 is therefore not written into the stack memory 121, while the block 24, the block 23 and the block 22 are written into the stack memory 121 in that order, i.e., the block 24 is at the bottom of the stack memory 121.
Then, the video decoder 110 determines that the block 21 needs to be further divided, so a second division process is performed to divide the block 21 into four 32x32 blocks 211 to 214. Because they result from a second or later division process, the blocks 211 to 214 may also be referred to as sub-blocks. Similarly, following the sequential scanning order, the video decoder 110 first determines whether the first 32x32 block 211 needs to be further divided, and defers that determination for the blocks 212 to 214. The block 211 is therefore not written into the stack memory 121, while the block 214, the block 213, and the block 212 are written into the stack memory 121 in that order, as shown in FIG. 3B.
Then, the video decoder 110 determines that the block 211 needs to be further divided, so a third division process is performed to divide the block 211 into four 16x16 blocks 2111 to 2114 (also referred to as sub-blocks). Similarly, following the sequential scanning order within the 16x16 layer, the video decoder 110 first determines whether the first 16x16 block 2111 needs to be further divided, and defers that determination for the blocks 2112 to 2114. The block 2111 is therefore not written into the stack memory 121, while the block 2114, the block 2113 and the block 2112 are written into the stack memory 121 in that order, as shown in FIG. 3C.
Then, the video decoder 110 determines that the block 2111 does not need to be further divided, i.e., the size of the block 2111 is maintained at 16 × 16 during the fourth dividing process, so the video decoder 110 can set the block 2111 as the coding unit CU0 and decode the block 2111. At this time, stack memory 121 remains as the memory content shown in fig. 3C.
The video decoder 110 then fetches (pops) the block 2112 from the stack memory 121 and determines that the block 2112 does not need to be further divided. That is, in the fifth division process, the size of the block 2112 is still maintained at 16x16, i.e., the division is terminated. Accordingly, the video decoder 110 may set the block 2112 as the coding unit CU1 and decode the block 2112. At this time, the storage contents of the stack memory 121 are as shown in FIG. 3D.
Similarly, the video decoder 110 continues to fetch the block 2113 from the stack memory 121 in the sequential scanning order of the 16 × 16 block layer, and determines that the block 2113 needs no further division. That is, at the sixth division processing, the size of the block 2113 is still maintained at 16 × 16, that is, the division is terminated. Accordingly, the video decoder 110 may set the block 2113 as the coding unit CU2 and decode the block 2113. At this time, the storage contents of the stack memory 121 are as shown in fig. 3E.
Similarly, in response to the partition termination of the coding unit CU2, the video decoder 110 continues to fetch the block 2114 from the stack memory 121 in the sequential scanning order at the 16 × 16 block level and determines that the block 2114 does not need to be further partitioned. That is, at the seventh division processing, the size of the block 2114 is still maintained at 16 × 16, that is, the division is terminated. Accordingly, the video decoder 110 may set the block 2114 as the coding unit CU3 and decode the block 2114. At this time, the storage contents of stack memory 121 are as shown in FIG. 3F, and the recursive division processing of block 211 has ended.
Then, the video decoder 110 continues to fetch the 32x32 block 212 from the stack memory 121 according to the sequential scanning order of the layer of 32x32 blocks, and determines that the block 212 does not need to be further divided. That is, during the eighth division, the size of the block 212 is still maintained at 32x32, i.e., the division is terminated. Thus, video decoder 110 may set block 212 as coding unit CU4 and decode block 212. At this time, the storage contents of stack memory 121 are as shown in fig. 3G.
Then, in response to the partition of the coding unit CU4 ending, the video decoder 110 continues to fetch the 32x32 block 213 from the stack memory 121 according to the sequential scanning order at this layer of 32x32 blocks, and determines that the block 213 does not need to be further partitioned. That is, in the ninth division process, the size of the block 213 is maintained at 32x32, i.e., the division is terminated. Accordingly, the video decoder 110 may set the block 213 as the coding unit CU5 and decode the block 213. At this time, the storage contents of the stack memory 121 are as shown in fig. 3H.
Then, in response to the partition of the coding unit CU5 ending, the video decoder 110 continues to fetch the 32x32 block 214 from the stack memory 121 in the sequential scanning order of the 32x32 layer, and determines that the block 214 needs to be further divided. Therefore, the video decoder 110 performs the tenth division process to divide the 32x32 block 214 into four 16x16 blocks 2141 to 2144, and writes the blocks 2144, 2143, and 2142 into the stack memory 121 in that order, as shown in FIG. 3I.
The processing of the 16x16 blocks 2141 to 2144 is similar to the processing of the 16x16 blocks 2111 to 2114. In detail, the video decoder 110 first determines that the block 2141 does not need to be further divided, i.e., the size of the block 2141 is still maintained at 16 × 16 in the eleventh division process, i.e., the division is terminated. Thus, video decoder 110 may set block 2141 to coding unit CU6 and decode block 2141. At this time, stack memory 121 remains in the storage content shown in fig. 3I.
Then, in response to the partition of the coding unit CU6 ending, the video decoder 110 continues to fetch the 16 × 16 block 2142 from the stack memory 121 in the sequential scanning order at this layer of the 16 × 16 block, and determines that the block 2142 does not need to be further partitioned. That is, during the twelfth division process, the size of the block 2142 remains at 16x16, i.e., the division is terminated. Thus, video decoder 110 may set block 2142 as coding unit CU7 and decode block 2142. At this time, the storage contents of the stack memory 121 are as shown in fig. 3J.
In response to the partition of the coding unit CU7 ending, the video decoder 110 continues to fetch the 16x16 block 2143 from the stack memory 121 in the sequential scanning order of the 16x16 layer and determines that the block 2143 does not need to be further divided. That is, in the thirteenth division process, the size of the block 2143 is still maintained at 16x16, i.e., the division is terminated. Thus, the video decoder 110 may set the block 2143 as the coding unit CU8 and decode the block 2143. At this time, the storage contents of the stack memory 121 are as shown in FIG. 3K.
In response to the partition of coding unit CU8 ending, video decoder 110 continues to fetch 16x16 block 2144 from stack memory 121 in the sequential scanning order at this layer of 16x16 blocks and determines that block 2144 does not need to be further partitioned. That is, during the fourteenth dividing process, the size of the block 2144 is still maintained at 16 × 16, i.e., the division is terminated. Thus, video decoder 110 may set block 2144 as coding unit CU9 and decode block 2144. At this time, the contents of stack memory 121 are as shown in FIG. 3L, and all recursive partitioning processes for 64 × 64 block 21 are completed.
The video decoder 110 then continues to fetch the 64x64 blocks 22 from the stack memory 121 in the sequential scanning order of the layer of 64x64 blocks, and determines that the blocks 22 do not need to be further divided. That is, during the fifteenth division process, the size of the block 22 is maintained at 64 × 64, i.e., the division is terminated. Accordingly, video decoder 110 may set block 22 as coding unit CU10 and decode block 22. At this time, the storage contents of the stack memory 121 are as shown in fig. 3M.
In response to the partition of the coding unit CU10 ending, the video decoder 110 continues to fetch the 64x64 block 23 from the stack memory 121 in the sequential scanning order of the 64x64 layer and determines that the block 23 does not need to be further divided. That is, in the sixteenth division process, the size of the block 23 is still maintained at 64x64, i.e., the division is terminated. Accordingly, the video decoder 110 may set the block 23 as the coding unit CU11 and decode the block 23. At this time, the storage contents of the stack memory 121 are as shown in FIG. 3N.
In response to the partition of the coding unit CU11 ending, the video decoder 110 continues to fetch the 64x64 block 24 from the stack memory 121 in the sequential scanning order of the 64x64 layer, and determines that the block 24 does not need to be further divided. That is, in the seventeenth division process, the size of the block 24 is still maintained at 64x64, i.e., the division is terminated. Thus, the video decoder 110 may set the block 24 as the coding unit CU12 and decode the block 24. At this time, the stack memory 121 has been completely emptied, as shown in FIG. 3O. When decoding of the coding unit CU12 is completed, the decoding process of the super block 200 is terminated.
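The walk through FIGS. 3A-3O can be replayed with an explicit stack. The Python sketch below is a simplified model, not the decoder's implementation: the set `SPLIT` hard-codes which blocks the example divides, block labels follow the description, and a plain list stands in for the stack memory 121. The names `children` and `decode_superblock` are hypothetical.

```python
# Blocks that the example of FIG. 2 divides further (hard-coded assumption).
SPLIT = {"200", "21", "211", "214"}


def children(label):
    """Labels of the four sub-blocks in sequential scanning order."""
    prefix = "2" if label == "200" else label  # super block 200 -> 21..24
    return [prefix + str(i) for i in range(1, 5)]


def decode_superblock(root="200"):
    stack = []      # stands in for the stack memory 121, top is the list end
    order = []      # blocks in the order they become coding units
    snapshots = []  # stack contents at the moment each coding unit is decoded
    current = root
    while True:
        # Keep dividing the current block; defer its three siblings.
        while current in SPLIT:
            first, second, third, fourth = children(current)
            stack.extend([fourth, third, second])
            current = first
        order.append(current)
        snapshots.append(list(stack))
        if not stack:
            break
        current = stack.pop()  # the top block becomes the current block
    return order, snapshots


order, snapshots = decode_superblock()
for index, (label, contents) in enumerate(zip(order, snapshots)):
    print(f"CU{index} = block {label}, stack (bottom to top): {contents}")
```

In this replay the stack never holds more than 9 blocks, reached while the block 2111 is decoded, and it is empty when the block 24 is decoded.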
FIG. 5A is a flowchart of a method for recursively partitioning decoded blocks according to an embodiment of the invention. FIG. 5B is a flowchart of the block recursive partitioning process of the current block in the embodiment of FIG. 5A. Refer to FIG. 1 and FIGS. 5A-5B.
In step S510, a super block is obtained from a video stream to be decoded, wherein the super block has a first size in both the horizontal and vertical directions. For example, the maximum size of a superblock in the AV1 standard is 128x128.
In step S520, the super block is divided into a first block, a second block, a third block and a fourth block in the horizontal direction and the vertical direction. For example, the order of the first block, the second block, the third block and the fourth block is determined according to the sequential scanning order (e.g., corresponding to the blocks 21, 22, 23 and 24 in fig. 2), i.e., the first block, the second block, the third block and the fourth block are respectively located at the upper left, the upper right, the lower left and the lower right of the super block.
In step S530, the first block is selected as a current block to perform a block recursive partitioning process, and the fourth block, the third block and the second block are sequentially written into a stack memory. For example, the fourth block written to the stack memory first is at the bottom of the stack memory.
In step S540, in response to the end of the block recursive partitioning process of the current block, a top block is fetched from the stack memory as the current block, and the block recursive partitioning process is performed on the current block. Note that the stack memory 121 implements the operation of block recursive partitioning in a first-in-last-out (FILO) or last-in-first-out (LIFO) manner. For the stack memory 121, the add (push) operation and the delete (pop) operation are performed at the same end of the stack memory 121. Thus, the next block fetched from the stack memory is the top-most block (or sub-block) of the stack memory.
In step S550, it is determined whether the current block is the fourth block. If the current block is the fourth block, the process ends. If the current block is not the fourth block, go back to step S540.
In this embodiment, the block recursive division process of the current block can be further divided into steps S521-S527.
In step S521, it is determined whether the current block needs to be divided. If the current block needs to be divided, step S522 is performed. If the current block is determined not to be divided, step S525 is performed.
In step S522, the current block is divided into a first sub-block, a second sub-block, a third sub-block and a fourth sub-block in the horizontal direction and the vertical direction, and the fourth sub-block, the third sub-block and the second sub-block are sequentially written into the stack memory.
In step S523, the first sub-block is set as the current sub-block, and whether the current sub-block needs to be divided is determined. If the current sub-block needs to be divided, step S524 is executed. If the current sub-block does not need to be divided, step S525 is performed.
In step S524, the block recursive partitioning process is repeatedly performed on the current sub-block until the size of the current sub-block is equal to a predetermined size. For example, the predetermined size is the size of the smallest Coding Unit (CU), i.e., a 4 × 4 block.
In step S525, the current sub-block is decoded.
In step S526, in response to the decoding of the current sub-block ending, the top sub-block is fetched from the stack memory as the current sub-block, and the block recursive partitioning process is performed on the current sub-block.
In step S527, in response to the end of the block recursive partitioning process of the current sub-block, it is determined whether the current sub-block is the fourth sub-block. If the current sub-block is the fourth sub-block, step S540 is performed. If the current sub-block is not the fourth sub-block, go back to step S526.
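One way to read steps S510-S550 and S521-S527 together is as a single loop over an explicit stack. The sketch below is an interpretation with several assumptions: blocks are modeled as `(x, y, size)` tuples, the split decision and the decoding are supplied as hypothetical callbacks `needs_split` and `decode`, the "is this the fourth block" checks of steps S527 and S550 are replaced by testing whether the stack is empty, and blocks of the minimum size are decoded directly rather than pushed, following the earlier remark about 4x4 blocks.

```python
def quarter(block):
    """Split (x, y, size) into four sub-blocks in sequential scanning order."""
    x, y, size = block
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def decode_superblock(size, needs_split, decode, min_size=4):
    """Decode one superblock; returns the peak number of stacked blocks."""
    stack = []
    max_depth = 0
    current = (0, 0, size)          # S510: the superblock
    force_split = True              # S520: the superblock is always divided
    while True:
        # S521/S523: decide whether the current block must be divided.
        while force_split or (current[2] > min_size and needs_split(current)):
            force_split = False
            first, second, third, fourth = quarter(current)   # S522
            if first[2] == min_size:
                # Minimum-size blocks are decoded directly, never stacked.
                for sub_block in (first, second, third):
                    decode(sub_block)
                current = fourth
                break
            stack.extend([fourth, third, second])             # S530/S522
            max_depth = max(max_depth, len(stack))
            current = first                                   # S524
        decode(current)                                       # S525
        if not stack:               # stands in for the S527/S550 checks
            return max_depth
        current = stack.pop()                                 # S526/S540


# Split decisions of the FIG. 2 example: blocks 21, 211 and 214.
FIG2_SPLITS = {(0, 0, 64), (0, 0, 32), (32, 32, 32)}
decoded = []
depth = decode_superblock(128, lambda b: b in FIG2_SPLITS, decoded.append)
```

Passing a callback that always requests a split exercises the worst case described earlier, in which every block is divided down to 4x4.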
In summary, the present invention provides an integrated circuit and a method for recursively partitioning a decoding block, which can utilize a stack memory to enable a video decoder to efficiently recursively partition a super block into a plurality of coding units, thereby improving decoding performance.
The terms "first," "second," "third," and the like are used in the claims to modify claim elements; they do not indicate any priority, precedence, or order between elements, or the order in which method steps are performed, but merely distinguish one element from another element having the same name.
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.