CN110945872A - Video processing apparatus and method - Google Patents


Info

Publication number
CN110945872A
CN110945872A (Application CN201880042827.5A)
Authority
CN
China
Prior art keywords
cache
read
search range
search
image block
Prior art date
Legal status
Pending
Application number
CN201880042827.5A
Other languages
Chinese (zh)
Inventor
陈秋伯
郑萧桢
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN110945872A

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 — using transform coding
    • H04N19/61 — using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video codec and a corresponding method are provided. The method includes the following steps: adjusting the amount of data to be read into a cache of the video codec according to bandwidth consumption; reading at least some image blocks of a reference frame into the cache based on that amount of data; performing search matching for a target image block in the current image based on the image data in the cache; and performing inter-frame encoding/decoding on the target image block based on the search matching result. In this way, the video codec's consumption of system-on-chip bandwidth can be adjusted according to the codec's bandwidth requirements, and its power consumption is kept within a controllable range.

Description

Video processing apparatus and method

Technical Field
Embodiments of the present application relate to the field of video processing. More particularly, embodiments of the present application relate to a video codec and a corresponding method.
Background
To reduce the bandwidth occupied by video storage and transmission, video data generally needs to be compressed by encoding. The encoding compression process includes prediction, transform, quantization, and entropy coding. Prediction comes in two types, intra-frame and inter-frame, and aims to remove redundant information from the current image block to be coded by using prediction block information. Intra-frame prediction obtains prediction block data using information from the current frame, whereas inter-frame prediction obtains prediction block data using information from a reference frame.
However, in video encoder circuits, inter prediction occupies a large amount of bandwidth and incurs high external memory access power consumption. In a system on chip (SoC) in particular, bandwidth resources are limited and are typically shared by multiple modules (e.g., CPUs, GPUs, and image processors). The large bandwidth demand of the video encoder therefore inevitably squeezes the bandwidth available to other modules, and the bandwidth it occupies may prevent the external memory from responding promptly to other real-time processing modules. As video resolution moves from high definition to ultra-high definition, the bandwidth consumed by the encoder's inter prediction will grow several-fold.
Disclosure of Invention
The embodiments of the present application provide a bandwidth control technique for a video codec circuit, which can effectively control the codec's consumption of the limited system-on-chip bandwidth and keep the bandwidth demand and power consumption of the codec circuit within a controllable range.
In a first aspect of embodiments of the present application, there is provided a method in a video codec, including:
adjusting the amount of data to be read into a cache of the video codec according to bandwidth consumption;
reading at least some image blocks of a reference frame into the cache based on that amount of data;
performing search matching for a target image block in the current image based on the image data in the cache; and
performing inter-frame encoding/decoding on the target image block based on the search matching result.
In a second aspect of embodiments of the present application, there is provided a video codec. The video codec includes a memory and one or more processors communicatively coupled with the memory. The memory has stored thereon instructions that, when executed by the one or more processors, cause the video codec to:
adjust the amount of data to be read into a cache of the video codec according to bandwidth consumption;
read at least some image blocks of a reference frame into the cache based on that amount of data;
perform search matching for a target image block in the current image based on the image data in the cache; and
perform inter-frame encoding/decoding on the target image block based on the search matching result.
According to a third aspect of embodiments of the present application, there is provided a drone comprising a video codec according to the second aspect of embodiments of the present application.
According to a fourth aspect of embodiments herein, there is provided a computer program which, when executed by at least one processor, causes the at least one processor to perform the method according to the first aspect of embodiments herein.
According to a fifth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing the computer program according to the fourth aspect of the embodiments of the present application.
With the embodiments of the present application, the video codec's consumption of system-on-chip bandwidth can be adjusted according to the codec's bandwidth requirements, and the power consumption of the codec circuit is kept within a controllable range.
Drawings
The above and other features of the embodiments of the present application will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic diagram illustrating a video codec scheme.
Fig. 2 is a schematic diagram illustrating a video codec scheme according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a video encoding and decoding method according to an embodiment of the present application.
Fig. 4 is a block diagram illustrating a video codec according to an embodiment of the present application.
Fig. 5 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present application.
Fig. 6 is a diagram illustrating a search range according to an embodiment of the present application.
Fig. 7 is a schematic diagram illustrating reference frame compression according to an embodiment of the application.
It is noted that the drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the technology of embodiments of the application. In addition, for purposes of clarity, like reference numbers refer to like elements throughout the drawings.
Detailed Description
Technical solutions in the embodiments of the present application will be described below clearly with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As described above, to reduce the bandwidth occupied by video storage and transmission, video data generally needs to be compressed by encoding. The encoding compression process includes prediction, transform, quantization, and entropy coding. Prediction comes in two types: intra-frame prediction, which obtains prediction block data using information from the current frame, and inter-frame prediction, which obtains prediction block data using information from a reference frame. Specifically, the inter prediction process includes: dividing the image block to be encoded into several sub-image blocks; for each sub-image block, searching the reference image for the best-matching image block to use as its prediction block; subtracting the corresponding pixel values of the sub-image block and the prediction block to obtain a residual; and combining the residuals of the sub-image blocks to obtain the residual of the whole image block.
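The subtract-and-combine step above can be sketched as follows. This is an illustrative sketch only; the function names and the tiny 2x2 block are our own assumptions, not the circuit design described in this application:

```python
def block_residual(block, prediction):
    """Encoder side: per-pixel residual between a sub-image block and its
    best-matching prediction block found in the reference image."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]

def reconstruct(prediction, residual):
    """Decoder side: each pixel is the prediction value plus the residual."""
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]

# A sub-image block and its prediction block differ only slightly,
# so the residual is small and cheap to code.
block = [[10, 12], [8, 9]]
pred = [[9, 12], [8, 7]]
res = block_residual(block, pred)
assert reconstruct(pred, res) == block  # lossless round trip before quantization
```

In a real codec this residual would next be transformed and quantized, as described below.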
The residuals can be decorrelated with a transform matrix (i.e., redundant information in the image block is removed) to improve coding efficiency. A two-dimensional transform is usually applied to the data blocks in the image block: at the encoder, the residual information of a data block is multiplied by a transform matrix and its transpose to obtain transform coefficients. The transform coefficients are then quantized into quantized coefficients. Finally, the quantized coefficients are entropy coded, and the resulting bitstream, together with the coding mode information (such as the intra prediction mode and motion vector information), is stored or sent to the decoder.
Correspondingly, at the decoder, entropy decoding is performed on the received bitstream to obtain the corresponding residuals; a prediction image block is derived from the decoded information, such as motion vectors or the intra prediction mode; and the value of each pixel in the current sub-image block is obtained from the prediction block and the residual of the image block.
In inter-frame prediction, for each prediction unit of the current frame, search matching is performed within a search area (SA) at the corresponding position of the reference frame. To make prediction more accurate, the search is usually performed in search ranges centered on several candidate motion vectors (MVs), taken from the MVs of already-coded prediction units of the current frame or the reference frame. For each SA, a larger search range also means more accurate inter prediction. Inter prediction therefore requires a large amount of reference frame data for image block matching.
Since on-chip memory is expensive, the reference frame data may be stored in an external memory and read into an on-chip buffer only when needed for inter prediction. Fig. 1 shows a schematic diagram of a video codec 10. As can be seen from fig. 1, the reference frame data required for inter prediction is stored in the external memory 160 and loaded into the on-chip buffer 120 when necessary.
Several schemes may be used to alleviate the large bandwidth and energy consumption of inter prediction. One scheme is to design the on-chip cache 120 as a line cache that holds an entire row of reference frame data, which suits the left-to-right, top-to-bottom encoding order of the current frame. However, a line cache consumes a large number of memory cells and limits the inter prediction search range, which affects prediction accuracy. Another scheme compresses the reference frame data with a reference frame compression module (see, e.g., the decompression unit 140 and compression unit 150 shown in fig. 1), reducing the amount of data exchanged with the external memory and thereby reducing both the bandwidth requirement and the read/write power consumption. Reference frame compression alleviates the bandwidth requirement of inter prediction on average; however, since reference frame compression is typically lossless, its compression ratio changes dynamically with the video content. The maximum or worst-case instantaneous bandwidth requirement therefore remains uncontrollable, so SoC system design must still allocate bandwidth for the worst case.
To improve encoding efficiency, a cache architecture can be used to ensure that the search range is not limited. The defining property of a cache is that it can be mapped to any region of an image, so reference frame data at any position can in principle be cached, leaving the search range unrestricted. Fig. 2 is a schematic diagram illustrating a video codec scheme according to an embodiment of the present application. As shown in fig. 2, the reference frame data is cached by the cache 220 to ensure an unlimited inter prediction search range, which preserves prediction accuracy in scenes with severe motion. The main features of the cache 220 are: (1) it is composed of cache lines, and the reference frame image is partitioned into cache-line-sized pieces that map to the cache according to some mapping scheme; (2) for each piece of cache line data from the image, it also stores the position coordinates of that data, so that on a read it can determine whether the currently required cache-line-sized piece of data is already in the cache.
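Feature (2), storing each cache line's position coordinates alongside its data, can be sketched as follows. The class name and the 16x4 line dimensions are illustrative assumptions, not the actual design of cache 220:

```python
class RefFrameCache:
    """Toy reference frame cache keyed by cache-line position coordinates."""

    def __init__(self, line_w=16, line_h=4):
        self.line_w, self.line_h = line_w, line_h
        self.lines = {}  # (x0, y0) origin of a cache line -> pixel data

    def line_origin(self, x, y):
        # Align a pixel coordinate down to the origin of its cache line.
        return (x - x % self.line_w, y - y % self.line_h)

    def contains(self, x, y):
        # A read first checks whether the required cache-line-sized
        # piece of data is already cached (hit) or not (miss).
        return self.line_origin(x, y) in self.lines

    def store(self, x, y, data):
        self.lines[self.line_origin(x, y)] = data

cache = RefFrameCache()
cache.store(17, 5, b"...")        # lands in the line with origin (16, 4)
assert cache.contains(31, 7)      # same 16x4 line: hit
assert not cache.contains(32, 7)  # next line over: miss
```

Storing coordinates as the lookup key is what lets the cache map to any region of the image, unlike a line cache tied to one row.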
Since video codec circuits are usually designed as pipelines, when inter prediction is performed on the current coding unit (e.g., a coding tree unit, CTU), the data of the corresponding search range must be prepared in advance, that is, read into the cache 220 from the external memory 260. A search range is a region centered on the MV of an adjacent coded position in the current frame or the reference frame; the corresponding search ranges are denoted SAn (n = 1, 2, 3, ...). In general, reading all of the SAn from the external memory 260 would exceed the set bandwidth limit. Owing to image correlation, some of the required reference frame data may already reside in the cache 220, so an SAn need not be fetched from the external memory 260 in full; however, the proportion of useful data in the cache 220 changes dynamically with the image content, and practical applications must consider the worst case.
In one embodiment of the present application, the number and size of the search ranges of the current coding unit can be determined by counting the bandwidth consumed when reading data for previous coding units. The bandwidth control module 270 may control the number or size of the SAs to be read and, for each SA to be read or its sub-range SA_small, check one by one whether the data, in cache line units, is already in the cache 220. Only data that is absent is read from the external memory 260. The bandwidth consumption counted during this process can then be used for bandwidth control of the next read.
As shown in fig. 2, in the scheme using the cache 220, since some reference frame data already exists in the cache 220, not all of the reference frame data for image block matching needs to be fetched from the external memory 260; part of it can be read directly from the cache 220. For example, starting from the position pointed to by a candidate motion vector, the required data may be compared against the data stored in the cache 220; only when the comparison shows that the cache 220 does not hold the data does it need to be fetched from the external memory 260. It is therefore possible to determine the total amount of data that needs to be read from the external memory 260 and to cap the amount actually read according to a predetermined upper limit: when the amount of data read from the external memory 260 reaches the upper limit, reading stops even if the total required amount has not yet been read. In other words, bandwidth consumption is reduced by appropriately reducing the amount of reference frame data used for image block matching.
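The capped read described above can be sketched as follows, assuming for simplicity that the budget is counted in whole cache lines; the names and the one-line-per-read granularity are our assumptions:

```python
def prefetch_with_budget(lines_needed, cache, read_line, budget):
    """Fetch missing cache lines from external memory, stopping once the
    predetermined upper limit (budget, in lines) has been reached."""
    lines_read = 0
    for pos in lines_needed:
        if pos in cache:              # already cached: no external read
            continue
        if lines_read >= budget:      # upper limit reached: stop reading,
            break                     # even though more data was wanted
        cache[pos] = read_line(pos)
        lines_read += 1
    return lines_read

cache = {(0, 0): b"hit"}              # one line is already resident
needed = [(0, 0), (1, 0), (2, 0), (3, 0)]
n = prefetch_with_budget(needed, cache, lambda pos: b"data", budget=2)
assert n == 2                         # only two external reads allowed
assert (3, 0) not in cache            # the last line was never fetched
```

The inner prediction search then simply works with whatever subset of the search range made it into the cache.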
Fig. 6 is a diagram illustrating a search range according to an embodiment of the present application. In fig. 6, each small gray box represents the mapping range of one cache line. Since the reference data of a search range is typically read in units of one cache line, it is preferable to align the search range to the cache line size, i.e., to ensure that the search range is an integer multiple of the cache line size. The number and size of the SAs are not limited in the present application. For each SAn, a smaller sub-search range SAn_small = scale_factor × SAn may be defined within it, where scale_factor is a scaling parameter between 0 and 1. Sub-search ranges of different sizes can be obtained by adjusting scale_factor, and bandwidth occupation can be reduced by reading only the data of the sub-search range. Note that the SAn need not all be the same size, and the size of SAn_small is preferably an integer multiple of the cache line size. For each SAn or SAn_small, a cache read checks, from left to right and top to bottom, whether the image area corresponding to each cache line in SAn or SAn_small is in the cache. If it is, this is called a cache hit, and no action is taken; if not, it is called a cache miss, and the image area must be read from the external memory and stored in the cache.
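Computing SAn_small = scale_factor × SAn while keeping it aligned to whole cache lines, as recommended above, can be sketched like this. The flooring policy, the minimum of one cache line, and the choice to apply scale_factor to each dimension (rather than to the area) are our assumptions:

```python
def sub_search_range(sa_w, sa_h, scale_factor, line_w=16, line_h=4):
    """Shrink a search range of sa_w x sa_h pixels by scale_factor (0..1],
    rounding each dimension down to an integer multiple of the cache line."""
    w = max(line_w, int(sa_w * scale_factor) // line_w * line_w)
    h = max(line_h, int(sa_h * scale_factor) // line_h * line_h)
    return w, h

# Halving a 128x64 search range keeps it cache-line aligned:
assert sub_search_range(128, 64, 0.5) == (64, 32)
# A 0.3 factor floors to the nearest aligned size:
assert sub_search_range(128, 64, 0.3) == (32, 16)
```

Alignment matters because any partially covered cache line would have to be fetched in full anyway, wasting part of the bandwidth budget.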
Fig. 3 is a flowchart illustrating a video encoding and decoding method according to an embodiment of the present application. The method can be applied to a video codec that uses a cache as its on-chip buffer. As shown in fig. 3, at step S310, the amount of data to be read into the cache of the video codec is adjusted according to bandwidth consumption. At step S320, at least some image blocks of the reference frame are read into the cache based on that amount of data. At step S330, search matching is performed for the target image block in the current image based on the image data in the cache. At step S340, inter-frame encoding/decoding is performed on the target image block based on the search matching result. In the embodiments of the present application, the target image block may refer to an image block of the current image that is encoded/decoded after the current encoded/decoded block.
The operation of the various steps in fig. 3 is described below by way of several detailed examples.
For example, a search range corresponding to the target image block may be determined in the reference frame, the proportion of the search range to be read may be determined according to bandwidth consumption, and the sub-search range within the search range may be read from the reference frame into the cache according to that proportion. The cache may include multiple cache lines. At least some image blocks of the reference frame may be read into the cache by reading, each time, either one search range or a sub-search range within one search range from the reference frame and storing the data in at least one cache line. The amount of data in the search ranges read at different times may be the same or different, as may the amount of data in the sub-search ranges. Preferably, the data amount of the search range, or of a sub-search range within it, is an integer multiple of the cache line size.
Preferably, an image block to be read is read into the cache only when it is determined not to be present in the cache. If an image block to be read is found to already exist in the cache, the read of that block is skipped.
In one example, the cache may also store location coordinates of the at least some of the image blocks.
In one example, taking every N rows of image blocks in the current image as one cycle, the adjustment of the amount of data to be read into the cache according to bandwidth consumption may be performed after the first M image blocks of the cycle have been inter-frame encoded/decoded, where N and M are positive integers. For example, N may equal 1, and the N rows of image blocks may be N rows of coding/decoding tree units.
In one example, before adjusting the amount of data to be read into the cache according to bandwidth consumption, the total number of cache misses that occurred during cache reads for one or more image blocks preceding the current encode/decode block in the current image is computed, and the bandwidth consumption is determined based at least on this total. The search range corresponding to the target image block may comprise one or more search ranges and/or one or more sub-search ranges, each sub-search range being defined within a corresponding search range. For example, if the total number of cache misses is greater than a first threshold, a cache read is performed on the search range at a first read proportion. If the total is less than or equal to the first threshold and greater than a second threshold, the cache read is performed at a second read proportion, which is greater than the first. If the total is less than or equal to the second threshold, the full search range is read.
In another example, the search range or sub-search range to be read into the cache is selected from at least two candidate search ranges based on the total number of cache misses and the priorities of those search ranges.
When the at least two search ranges have different priorities, the search ranges in the reference frame, or the sub-search ranges within them, may be selected in order of priority from high to low. For example, assume there are a first search range and a second search range, where the first has the higher priority. If the total number of cache misses is greater than a first threshold, only a portion of the first search range is read into the cache. If the total is less than or equal to the first threshold and greater than a second threshold, the whole first search range is read into the cache. If the total is less than or equal to the second threshold, both the first and the second search range are read into the cache.
For example, assume the priority of the first search range SA1 is higher than that of the second search range SA2. The bandwidth control module operates in units of one row of CTUs. If the currently read CTU is not among the first L CTUs of the row, bandwidth control is applied to it as follows:
1) count the number of cache misses during prefetch of the previous N CTUs, recorded as sum;
2) compare sum with the threshold TH0: if sum > TH0, read only SA1_small = scale_factor1 × SA1, where scale_factor1 is a parameter;
3) if sum <= TH0, compare sum with the threshold TH1: if sum > TH1, read only SA1;
4) if sum <= TH1, read SA1 and SA2;
5) count the cache misses generated in this process and use them for bandwidth control of the next CTU.
The significance of the parameter L is as follows: at the beginning of each row, the video content differs significantly from the end of the previous row, so the reference frame data stored in the cache needs to be updated on a large scale. Cache reads for the first L coding units are therefore exempt from bandwidth limitation. The choice of L is related to the hardware pipeline, e.g., L = 8. The thresholds TH0 and TH1 directly control the number of cache misses; exceeding them indicates that more bandwidth is being consumed. scale_factor1 is a scaling factor that shrinks the prefetch range and thereby reduces bandwidth. One possible setting for the parameter N is 10; the larger N is, the larger TH0 and TH1 should be. For example, when N = 10, TH0 may be set to 500 and TH1 to 300, and scale_factor1 may be set to 0.5.
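Steps 1) to 5) above can be condensed into a small decision function. The dimension-wise application of scale_factor1 and the return format are our assumptions for this sketch:

```python
def ranges_to_read(sum_misses, sa1, sa2, th0=500, th1=300, scale_factor1=0.5):
    """Decide which search ranges to prefetch for the next CTU, given the
    cache miss count (sum) of the previous N CTUs. SA1 outranks SA2."""
    if sum_misses > th0:
        # Heavy bandwidth use: read only SA1_small = scale_factor1 * SA1.
        w, h = sa1
        return [(int(w * scale_factor1), int(h * scale_factor1))]
    if sum_misses > th1:
        return [sa1]          # moderate use: full SA1, skip SA2
    return [sa1, sa2]         # light use: read both search ranges

sa1, sa2 = (128, 64), (96, 48)
assert ranges_to_read(600, sa1, sa2) == [(64, 32)]
assert ranges_to_read(400, sa1, sa2) == [sa1]
assert ranges_to_read(200, sa1, sa2) == [sa1, sa2]
```

Feeding the miss count of one CTU into the decision for the next gives the closed control loop described in step 5).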
When the at least two search ranges have the same priority, assume there are a first and a second search range with equal priority. If the total number of cache misses is greater than a first threshold, a cache read is performed on the first search range at a first read proportion and on the second search range at a second read proportion. If the total is less than or equal to the first threshold and greater than a second threshold, the cache read is performed on the first search range at a third read proportion and on the second search range at a fourth, where the third proportion is greater than the first and the fourth is greater than the second. If the total is less than or equal to the second threshold, both the first and the second search range are read into the cache.
For example, assume the priority of the first search range SA1 is equal to that of the second search range SA2. The bandwidth control module operates in units of one row of CTUs. If the currently read CTU is not among the first L CTUs of the row, bandwidth control is applied to it as follows:
6) count the number of cache misses during cache reads of the previous N CTUs, recorded as sum;
7) compare sum with the threshold TH0: if sum > TH0, read only SA1_small = scale_factor1 × SA1 and SA2_small = scale_factor2 × SA2, where scale_factor1 and scale_factor2 are parameters;
8) if sum <= TH0, compare sum with the threshold TH1: if sum > TH1, read only SA1_small = scale_factor3 × SA1 and SA2_small = scale_factor4 × SA2, where scale_factor3 and scale_factor4 are parameters with scale_factor1 < scale_factor3 and scale_factor2 < scale_factor4;
9) if sum <= TH1, read SA1 and SA2;
10) count the cache misses generated in this process and use them for bandwidth control of the next CTU.
The significance of the parameter L is the same as above: at the beginning of each row, the video content differs significantly from the end of the previous row, so the reference frame data stored in the cache needs to be updated on a large scale, and cache reads for the first L coding units are exempt from bandwidth limitation. The choice of L is related to the hardware pipeline, e.g., L = 8. The thresholds TH0 and TH1 directly control the number of cache misses; exceeding them indicates that more bandwidth is being consumed. The scale factors shrink the prefetch range and thereby reduce bandwidth. One possible setting for the parameter N is 10; the larger N is, the larger TH0 and TH1 should be. For example, when N = 10, TH0 may be set to 500 and TH1 to 300. In one example, scale_factor1 = 0.3, scale_factor2 = 0.3, scale_factor3 = 0.6, and scale_factor4 = 0.6.
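The equal-priority steps 6) to 10) differ from the previous case only in that both ranges are always read and only their scale factors change; a minimal sketch, under the same assumptions as before:

```python
def equal_priority_ranges(sum_misses, sa1, sa2, th0=500, th1=300,
                          f1=0.3, f2=0.3, f3=0.6, f4=0.6):
    """Scale both SA1 and SA2 together according to the miss count; the
    factors satisfy f1 < f3 and f2 < f4, as required above."""
    def scaled(sa, f):
        return (int(sa[0] * f), int(sa[1] * f))
    if sum_misses > th0:
        return [scaled(sa1, f1), scaled(sa2, f2)]   # strongest shrink
    if sum_misses > th1:
        return [scaled(sa1, f3), scaled(sa2, f4)]   # milder shrink
    return [sa1, sa2]                               # no shrink

sa1, sa2 = (128, 64), (96, 48)
assert equal_priority_ranges(600, sa1, sa2) == [(38, 19), (28, 14)]
assert equal_priority_ranges(200, sa1, sa2) == [sa1, sa2]
```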
The bandwidth control method above is mainly aimed at video codec systems without a reference frame compression/decompression module. In a video encoder system that has such modules (e.g., the decompression unit 240 and compression unit 250 shown in fig. 2), the compression ratio information of reference frame compression can be used to control the bandwidth even more precisely. The defining feature of a video codec system with reference frame compression is that reference video frames are compressed before being stored in the external memory, typically in units of one cache line or slice. As shown in fig. 7, the amount of data corresponding to a cache line is reduced, so the bandwidth occupied when a cache miss occurs and a read is required shrinks proportionally. To reflect this reduction, the calculation of sum is modified as follows:
    sum = Σ_{i=1}^{m} cr(i)
In the above formula, m is the number of cache misses and cr(i) is the compression ratio of the image block corresponding to the i-th cache miss, defined as the compressed data amount divided by the original data amount. The compression ratio can typically be provided directly by the reference frame decompression module or obtained by a simple calculation. Moreover, the formula takes the same form whether or not a reference frame compression/decompression module is present: without reference frame compression, cr(i) = 1; with it, cr(i) is the compression ratio of the corresponding image block data.
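The modified sum can be computed as in this sketch; the list-based interface is our assumption, while the formula itself is the one given above:

```python
def weighted_miss_count(miss_ratios):
    """sum = cr(1) + ... + cr(m), where miss_ratios[i] is the compression
    ratio (compressed size / original size) of the i-th cache miss.
    Without reference frame compression every cr(i) is 1, so the
    weighted sum reduces to the plain miss count m."""
    return sum(miss_ratios)

# Three misses, two of them on blocks compressed to half their size:
assert weighted_miss_count([0.5, 0.5, 1.0]) == 2.0
# No compression: the sum equals the miss count.
assert weighted_miss_count([1, 1, 1]) == 3
```

The weighted value can then be compared against TH0 and TH1 exactly as in the unweighted control schemes.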
Accordingly, the actual search range used when performing a cache read for the current image block may be determined based on the compression ratios of one or more image blocks preceding it: the number of cache misses that occurred when reading each image block may be multiplied by that block's compression ratio, and the products summed to obtain the total number of cache misses. The target image block is an image block of the current image that is encoded/decoded after the current encode/decode block.
Bandwidth control combined with reference frame compression ratio information offers a further advantage: for the same number of cache misses, sum takes a smaller value. A video encoder with reference frame compression can therefore read more search range data for subsequent inter prediction, and thus achieve better encoding efficiency.
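The modified sum, together with the per-block weighting described above, can be sketched as follows (the helper name and sample numbers are illustrative, not part of the application):

```python
def weighted_miss_total(miss_counts, compression_ratios):
    """Bandwidth-weighted total of cache misses (hypothetical helper).

    miss_counts[j]        -- cache misses observed while reading image block j
    compression_ratios[j] -- compressed size / original size of block j
                             (1.0 when reference frame compression is absent)
    """
    return sum(m * cr for m, cr in zip(miss_counts, compression_ratios))

# Without reference frame compression every cr is 1, so the weighted sum
# degenerates to the plain miss count m:
plain = weighted_miss_total([4, 2, 3], [1.0, 1.0, 1.0])

# With compression the same misses occupy proportionally less bandwidth:
compressed = weighted_miss_total([4, 2, 3], [0.5, 0.25, 0.5])
```

With the smaller weighted total, more of the search range may be read for the same bandwidth budget.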
Fig. 4 is a block diagram illustrating a video codec according to an embodiment of the present application. The video codec can be applied to various platforms, such as a drone, an unmanned vehicle, or a robot. As shown in fig. 4, the video codec 40 includes a memory 410 and a processor 420.
The memory 410 stores program instructions. For example, the memory 410 may be Random Access Memory (RAM) or Read Only Memory (ROM), or any combination thereof. The memory 410 may also include persistent storage such as any one or combination of magnetic memory, optical memory, solid state memory, or even remotely mounted memory.
Processor 420 may include any combination of one or more Central Processing Units (CPUs), multiple processors, microcontrollers, Digital Signal Processors (DSPs), application specific integrated circuits, and the like.
Processor 420 may call program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: adjusting the data amount to be read into a cache of a video encoder/decoder according to the bandwidth use consumption; reading at least a portion of image blocks of a reference frame into the cache based on the amount of data; performing a search match on a target image block in a current image based on image data in the cache; and performing inter-frame encoding/decoding on the target image block based on the search matching result. The target image block may refer to an image block of the current image that is coded/decoded after the current coding/decoding block.
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: determining a search range corresponding to the target image block in the reference frame; determining the proportion of the search range to be read according to the bandwidth usage consumption; and reading sub-search ranges in the search range from the reference frame into the cache according to the proportion to be read.
In one example, a cache includes a plurality of cache lines. Processor 420 may call program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: reading one search range or sub-search ranges in one search range from the reference frame each time, wherein the at least partial image block comprises the search range or the sub-search ranges in the search range; and reading the data read each time into at least one cache line of the cache.
In one example, the cache also stores location coordinates of the at least a portion of the image block.
In one example, the amount of data of the one search range or the sub-search ranges in the one search range is an integer multiple of the cache line size.
In one example, the amount of data for the search ranges read at different times may be the same or different. Alternatively, the data amount of the sub-search ranges in the search ranges read at different times may be the same or different.
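The cache-line alignment constraint above can be illustrated with a short sketch (the function name and the 64-byte line size are assumptions for illustration):

```python
def blocks_per_read(range_bytes, cache_line_bytes=64):
    """Number of whole cache lines needed for one (sub-)search range read.

    The text requires each read amount to be an integer multiple of the
    cache line; 64 bytes is an assumed line size.
    """
    assert range_bytes % cache_line_bytes == 0, \
        "search-range reads must be cache-line aligned"
    return range_bytes // cache_line_bytes
```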
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: upon determining that an image block to be read in does not exist in the cache, reading the image block into the cache; or, upon determining that an image block to be read in already exists in the cache, forgoing reading the image block into the cache.
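The read-or-forgo policy above can be sketched as a minimal cache model (the dictionary-based structure is a simplification for illustration; the actual hardware organization is not specified here):

```python
class SearchRangeCache:
    """Minimal sketch of the hit/miss read policy described above."""

    def __init__(self):
        self.lines = {}  # (x, y) position coordinates -> block data

    def read_block(self, coord, fetch_from_memory):
        """Return (data, was_miss) for the image block at coord."""
        if coord in self.lines:
            # Block already cached: forgo reading it in again.
            return self.lines[coord], False
        # Cache miss: read the block from external memory into the cache.
        data = fetch_from_memory(coord)
        self.lines[coord] = data
        return data, True
```

A repeated read of the same coordinates is then served from the cache without consuming external memory bandwidth.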
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: taking every N rows of image blocks in the current image as one period, and after the first M image blocks in the period have been inter-frame encoded/decoded, starting to adjust the amount of data to be read into the cache of the video encoder/decoder according to the bandwidth usage consumption, where N and M are positive integers. For example, N = 1. In addition, the N rows of image blocks may be N rows of coding/decoding tree units.
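The periodic trigger above can be sketched as follows (parameter names and the default values of N and M are hypothetical):

```python
def bandwidth_control_active(block_index, blocks_per_row, n=1, m=4):
    """Whether bandwidth control applies to the image block at block_index
    (0-based raster order).

    Each period covers n rows of image blocks; control is skipped for the
    first m blocks of each period so miss statistics can accumulate first.
    """
    period_length = n * blocks_per_row
    return (block_index % period_length) >= m
```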
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: prior to said adjusting the amount of data to be read into said cache in accordance with bandwidth usage consumption, calculating a total number of cache misses occurring when performing cache reads on one or more image blocks preceding the current encode/decode block in the current image, and determining the bandwidth usage consumption based at least on said total number of cache misses.
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: performing a cache read in the search range at a first read rate if the total number of cache misses is greater than a first threshold; performing a cache read in the search range at a second read rate if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, the second read rate being greater than the first read rate; if the total number of cache misses is less than or equal to a second threshold, a cache read is performed on the search range.
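The three-tier policy above can be sketched as follows (the ratios 0.25, 0.5, and 1.0 are illustrative; the text fixes only their ordering):

```python
def select_read_ratio(total_misses, first_threshold, second_threshold):
    """Map the total cache-miss count to a fraction of the search range
    to read into the cache. Requires first_threshold > second_threshold.
    """
    assert first_threshold > second_threshold
    if total_misses > first_threshold:
        return 0.25      # first (lowest) read rate under heavy bandwidth use
    if total_misses > second_threshold:
        return 0.5       # second, larger read rate under moderate use
    return 1.0           # low pressure: cache-read the whole search range
```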
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: determining a search range or a sub-search range to be read into the cache memory from the at least two search ranges to be read based on the total number of cache misses and priorities of the at least two search ranges to be read. If the search ranges have different priorities, the search ranges in the reference frame or the sub-search ranges in the search ranges may be selected in order of priority from high to low.
For example, it is assumed that the at least two search ranges include a first search range and a second search range, and the first search range has a higher priority than the second search range. Reading a portion of a first search range into the cache if the total number of cache misses is greater than a first threshold; reading the first search range into the cache if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold; determining to perform a cache read in the first search range and the second search range if the total number of cache misses is less than or equal to a second threshold.
For example, it is assumed that the at least two search ranges include a first search range and a second search range, and the priority of the first search range is the same as the priority of the second search range. If the total number of cache misses is greater than a first threshold, performing a cache read in a first search range at a first read rate, and performing a cache read in a second search range at a second read rate; if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, performing a cache read in a first search range at a third read rate, and performing a cache read in a second search range at a fourth read rate, wherein the third read rate is greater than the first read rate, the fourth read rate is greater than the second read rate; reading the first search range and the second search range into the cache if the total number of cache misses is less than or equal to a second threshold.
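The two-range read plans above, covering both the different-priority and equal-priority cases, can be sketched as follows (the fractions are illustrative; t1 and t2 denote the first and second thresholds):

```python
def plan_reads(total_misses, t1, t2, same_priority):
    """Return the fraction of each of two search ranges to read, as
    (range1_fraction, range2_fraction). Requires t1 > t2.
    """
    if same_priority:
        if total_misses > t1:
            return (0.25, 0.25)   # first and second read rates
        if total_misses > t2:
            return (0.5, 0.5)     # third and fourth (larger) read rates
        return (1.0, 1.0)         # read both ranges in full
    # Otherwise, range 1 has the higher priority.
    if total_misses > t1:
        return (0.5, 0.0)         # only a portion of range 1
    if total_misses > t2:
        return (1.0, 0.0)         # all of range 1, none of range 2
    return (1.0, 1.0)             # read both ranges
```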
In one example, processor 420 may invoke program instructions stored in memory 410. When the program instructions are executed, processor 420 may perform the following operations: an actual search range when performing cache reading on the current image block is determined based on compression rates of one or more image blocks preceding the current image block. For example, the number of cache misses that occur when a read is performed for each image block may be multiplied by the compression rate of that image block, and the multiplication results summed to get the total number of cache misses.
Furthermore, embodiments of the present application may be implemented by means of a computer program product. The computer program product may be a computer readable storage medium, for example. The computer readable storage medium stores a computer program, and when the computer program is executed on a computing device, the computer program can perform relevant operations to implement the above technical solutions of the embodiments of the present application.
For example, FIG. 5 is a block diagram illustrating a computer-readable storage medium 50 according to one embodiment of the application. As shown in fig. 5, the computer-readable storage medium 50 includes a computer program 510. The computer program 510, when executed by at least one processor, causes the at least one processor to perform the various steps of the method, for example, as described above in connection with fig. 3.
The computer program 510 stored on the computer-readable storage medium 50 may be loaded into the memory 410 of the video codec 40 shown in fig. 4, for example, so that the processor 420 of the video codec 40 performs a corresponding operation.
Those skilled in the art will appreciate that examples of computer-readable storage medium 50 include, but are not limited to: semiconductor storage media, optical storage media, magnetic storage media, or any other form of computer-readable storage media.
The methods and related apparatuses of the embodiments of the present application have been described above in connection with preferred embodiments. Those skilled in the art will appreciate that the methods illustrated above are exemplary only. The methods of the embodiments of the present application are not limited to the steps or sequences shown above. For example, the above steps may be performed in an order different from that of the embodiments, or in parallel.
It should be understood that the above-described embodiments of the present application may be implemented by software, hardware, or a combination of both software and hardware. Such arrangements of embodiments of the present application are typically provided as downloadable software images, shared databases, etc. arranged or encoded on a computer readable medium such as an optical medium (e.g., CD-ROM), floppy or hard disk, or other media such as firmware or microcode on one or more ROM or RAM or PROM chips, or in one or more modules. The software or firmware or such configurations may be installed on a computing device to cause one or more processors in the computing device to perform the techniques described in embodiments of the present application.
Furthermore, each functional block or respective feature of the device used in each of the above-described embodiments may be implemented or executed by a circuit, which is typically one or more integrated circuits. Circuitry designed to perform the various functions described in this specification may include a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC) or a general purpose integrated circuit, a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, or the processor may be an existing processor, controller, microcontroller, or state machine. The general-purpose processor or each circuit described above may be configured by a digital circuit, or may be configured by a logic circuit. Further, when advanced technology capable of replacing the current integrated circuit is developed due to the advancement of semiconductor technology, the embodiments of the present application can also use the integrated circuit obtained using the advanced technology.
The program running on the apparatus according to the embodiment of the present application may be a program that causes a computer to realize the functions of the embodiment of the present application by controlling a Central Processing Unit (CPU). The program or information processed by the program may be temporarily stored in a volatile memory (such as a random access memory RAM), a Hard Disk Drive (HDD), a nonvolatile memory (such as a flash memory), or other memory system. A program for realizing the functions of the embodiments of the present application may be recorded on a computer-readable recording medium. The corresponding functions can be realized by causing a computer system to read the programs recorded on the recording medium and execute the programs. The term "computer system" as used herein may be a computer system embedded in the device and may include an operating system or hardware (e.g., peripheral devices).
As above, the embodiments of the present application have been described in detail with reference to the accompanying drawings. However, the specific configuration is not limited to the above-described embodiment, and the embodiment of the present application also includes any design modification without departing from the gist of the embodiment of the present application. In addition, various modifications may be made to the embodiments described in the present application within the scope of the claims, and embodiments obtained by appropriately combining technical means of different embodiments are also included in the technical scope of the embodiments of the present application. Further, components having the same effects described in the above embodiments may be substituted for each other.

Claims (51)

  1. A method in a video codec, comprising:
    adjusting the data amount to be read into a cache of a video encoder/decoder according to the bandwidth use consumption;
    reading at least a portion of image blocks of a reference frame into the cache based on the amount of data;
    performing a search match on a target image block in a current image based on image data in the cache;
    and performing inter-frame encoding/decoding on the target image block based on the search matching result.
  2. The method of claim 1, wherein said adjusting an amount of data to be read into a cache of a video codec as a function of bandwidth usage consumption comprises:
    determining a search range corresponding to the target image block in the reference frame;
    determining the proportion to be read of the search range according to the bandwidth use consumption;
    and reading sub-search ranges in the search range from the reference frame into the cache according to the proportion to be read.
  3. The method of claim 1, wherein the cache comprises a plurality of cache lines;
    said reading at least a portion of image blocks of a reference frame into said cache comprises:
    reading one search range or sub-search ranges in one search range from the reference frame each time, wherein the at least partial image block comprises the search range or the sub-search ranges in the search range;
    reading the data read each time into at least one cache line of the cache.
  4. The method of claim 3, wherein the cache further stores location coordinates of the at least partial image block.
  5. The method of claim 3, wherein the amount of data of the one search range or the sub-search ranges in the one search range is an integer multiple of the cache line size.
  6. The method of claim 3, wherein the data amount of the search ranges read at different times is the same or different; or,
    the data amount of the sub-search ranges in the search ranges read at different times is the same or different.
  7. The method of claim 1, wherein reading at least partial image blocks of a reference frame into the cache based on the amount of data comprises:
    upon determining that an image block to be read in is not present in the cache, reading the image block into the cache.
  8. The method of claim 1, wherein upon determining that an image block to be read in already exists in the cache, forgoing reading the image block into the cache.
  9. The method of claim 1, wherein, taking every N rows of image blocks in the current image as one period, the adjusting of the amount of data to be read into the cache of the video codec according to the bandwidth usage consumption is started after the first M image blocks within the period have been inter-frame encoded/decoded, where N and M are positive integers.
  10. The method of claim 9, wherein N = 1.
  11. The method of claim 9, wherein the N rows of image blocks are N rows of coding/decoding tree elements.
  12. The method of claim 1, further comprising: prior to said adjusting the amount of data to be read into said cache in accordance with bandwidth usage consumption:
    calculating a total number of cache misses occurring when performing cache reading on one or more image blocks preceding a current encode/decode block in a current image; and
    determining bandwidth usage consumption based at least on the total number of cache misses.
  13. The method of claim 12, wherein if the total number of cache misses is greater than a first threshold, performing a cache read in the search range at a first read rate.
  14. The method of claim 13, wherein if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, performing a cache read in the search range at a second read rate, the second read rate greater than the first read rate.
  15. The method of claim 14, wherein if the total number of cache misses is less than or equal to a second threshold, performing a cache read on the search range.
  16. The method according to claim 2, wherein the search range corresponding to the target image block comprises one or more search ranges and/or one or more sub-search ranges.
  17. The method of claim 16, wherein each sub-search range is defined in a corresponding one of the search ranges.
  18. The method of claim 12, further comprising:
    determining a search range or a sub-search range to be read into the cache memory from the at least two search ranges to be read based on the total number of cache misses and priorities of the at least two search ranges to be read.
  19. The method of claim 18, wherein the search ranges have different priorities, and the search ranges in the reference frame or the sub-search ranges in the search ranges are selected in order of priority from high to low.
  20. The method of claim 18, wherein the at least two search scopes include a first search scope and a second search scope, the first search scope having a higher priority than the second search scope;
    reading a portion of a first search range into the cache if the total number of cache misses is greater than a first threshold;
    and/or,
    reading the first search range into the cache if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold;
    and/or,
    determining to perform a cache read in the first search range and the second search range if the total number of cache misses is less than or equal to a second threshold.
  21. The method of claim 18, wherein the at least two search scopes include a first search scope and a second search scope, the first search scope having a same priority as the second search scope;
    if the total number of cache misses is greater than a first threshold, performing a cache read in a first search range at a first read rate, and performing a cache read in a second search range at a second read rate;
    and/or,
    if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, performing a cache read in a first search range at a third read rate, and performing a cache read in a second search range at a fourth read rate, wherein the third read rate is greater than the first read rate, the fourth read rate is greater than the second read rate;
    and/or,
    reading the first search range and the second search range into the cache if the total number of cache misses is less than or equal to a second threshold.
  22. The method of any one of claims 1-21, further comprising: determining an actual search range for performing a cache read on the current image block based on compression ratios of one or more image blocks preceding the current image block.
  23. The method of claim 22, wherein the number of cache misses occurring when a read is performed for each image block is multiplied by the compression ratio of the image block, and the multiplication results are summed to obtain the total number of cache misses.
  24. The method of claim 1, wherein the target image block is an image block in the current image that is coded/decoded after a current coded/decoded block.
  25. A video codec, comprising:
    a memory; and
    one or more processors communicatively coupled with the memory,
    wherein the memory has stored thereon instructions that, when executed by the one or more processors, cause the video codec to:
    adjusting the data amount to be read into a cache of a video encoder/decoder according to the bandwidth use consumption;
    reading at least a portion of image blocks of a reference frame into the cache based on the amount of data;
    performing a search match on a target image block in a current image based on image data in the cache;
    and performing inter-frame encoding/decoding on the target image block based on the search matching result.
  26. The video codec of claim 25, wherein said adjusting an amount of data to be read into a cache of the video codec as a function of bandwidth usage consumption comprises:
    determining a search range corresponding to the target image block in the reference frame;
    determining the proportion to be read of the search range according to the bandwidth use consumption;
    and reading sub-search ranges in the search range from the reference frame into the cache according to the proportion to be read.
  27. The video codec of claim 25, wherein the cache comprises a plurality of cache lines;
    said reading at least a portion of image blocks of a reference frame into said cache comprises:
    reading one search range or sub-search ranges in one search range from the reference frame each time, wherein the at least partial image block comprises the search range or the sub-search ranges in the search range;
    reading the data read each time into at least one cache line of the cache.
  28. The video codec of claim 27 wherein the cache further stores location coordinates of the at least some tiles.
  29. The video codec of claim 27, wherein the amount of data of the one search range or the sub-search ranges in the one search range is an integer multiple of the cache line size.
  30. The video codec of claim 27, wherein the amount of data of the search ranges read at different times is the same or different; or,
    the data amount of the sub-search ranges in the search ranges read at different times is the same or different.
  31. The video codec of claim 25 wherein reading at least a portion of image blocks of a reference frame into the cache based on the amount of data comprises:
    upon determining that an image block to be read in is not present in the cache, reading the image block into the cache.
  32. The video codec of claim 25, wherein upon determining that an image block to be read in already exists in the cache, reading the image block into the cache is forgone.
  33. The video codec of claim 25, wherein, taking every N rows of image blocks in the current image as one period, the adjusting of the amount of data to be read into the cache of the video codec according to the bandwidth usage consumption is started after the first M image blocks within the period have been inter-frame encoded/decoded, where N and M are positive integers.
  34. The video codec of claim 33, wherein N = 1.
  35. The video codec of claim 33 wherein the N rows of image blocks are N rows of codec tree elements.
  36. The video codec of claim 25, wherein the operations further comprise, prior to said adjusting the amount of data to be read into said cache in accordance with bandwidth usage consumption:
    calculating a total number of cache misses occurring when performing cache reading on one or more image blocks preceding a current encode/decode block in a current image; and
    determining bandwidth usage consumption based at least on the total number of cache misses.
  37. The video codec of claim 36, wherein if the total number of cache misses is greater than a first threshold, a cache read is performed in the search range at a first read rate.
  38. The video codec of claim 37 wherein if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, a cache read is performed in the search range at a second read rate, the second read rate being greater than the first read rate.
  39. The video codec of claim 38, wherein if the total number of cache misses is less than or equal to a second threshold, a cache read is performed on the search range.
  40. The video codec of claim 26, wherein the search range corresponding to the target image block comprises one or more search ranges and/or one or more sub-search ranges.
  41. The video codec of claim 40, wherein each of the sub-search ranges is defined in a corresponding one of the search ranges.
  42. The video codec of claim 36, wherein the operations further comprise:
    determining a search range or a sub-search range to be read into the cache memory from the at least two search ranges to be read based on the total number of cache misses and priorities of the at least two search ranges to be read.
  43. The video codec of claim 42, wherein the search ranges have different priorities, and the search range in the reference frame or the sub-search ranges in the search range are selected in order of priority from high to low.
  44. The video codec of claim 42, wherein the at least two search ranges include a first search range and a second search range, the first search range having a higher priority than the second search range;
    reading a portion of a first search range into the cache if the total number of cache misses is greater than a first threshold;
    and/or,
    reading the first search range into the cache if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold;
    and/or,
    determining to perform a cache read in the first search range and the second search range if the total number of cache misses is less than or equal to a second threshold.
  45. The video codec of claim 42, wherein the at least two search ranges include a first search range and a second search range, the first search range having the same priority as the second search range;
    if the total number of cache misses is greater than a first threshold, performing a cache read in a first search range at a first read rate, and performing a cache read in a second search range at a second read rate;
    and/or,
    if the total number of cache misses is less than or equal to a first threshold and greater than a second threshold, performing a cache read in a first search range at a third read rate, and performing a cache read in a second search range at a fourth read rate, wherein the third read rate is greater than the first read rate, the fourth read rate is greater than the second read rate;
    and/or,
    reading the first search range and the second search range into the cache if the total number of cache misses is less than or equal to a second threshold.
  46. The video codec of any one of claims 25-45, wherein an actual search range used when performing a cache read on the current image block is determined based on compression ratios of one or more image blocks preceding the current image block.
  47. The video codec of claim 46 wherein the number of cache misses that occur when a read is performed for each image block is multiplied by the compression ratio for that image block and the multiplication results are summed to obtain the total number of cache misses.
  48. The video codec of claim 25, wherein the target image block is an image block in the current picture that is coded/decoded after a current coded/decoded block.
  49. A drone comprising a video codec according to any one of claims 25 to 48.
  50. A computer program comprising instructions for performing the method according to any one of claims 1-24 when run on one or more processors.
  51. A computer-readable storage medium storing a computer program according to claim 50.
CN201880042827.5A 2018-08-01 2018-08-01 Video processing apparatus and method Pending CN110945872A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/098044 WO2020024152A1 (en) 2018-08-01 2018-08-01 Video processing device and method

Publications (1)

Publication Number Publication Date
CN110945872A (en) 2020-03-31

Family

ID=69230795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880042827.5A Pending CN110945872A (en) 2018-08-01 2018-08-01 Video processing apparatus and method

Country Status (2)

Country Link
CN (1) CN110945872A (en)
WO (1) WO2020024152A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115190305A (en) * 2021-04-01 2022-10-14 Oppo广东移动通信有限公司 Method, apparatus, medium, and system for image processing in video encoding apparatus

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101257628A (en) * 2008-03-20 2008-09-03 武汉大学 Adjustable compressing method for realizing video code stream frame rate
CN101272497A (en) * 2008-05-07 2008-09-24 北京数码视讯科技股份有限公司 Video encoding method
US20090274215A1 (en) * 2008-04-30 2009-11-05 Metsugi Katsuhiko Image processing apparatus, image processing method and image processing program
CN102647586A (en) * 2011-02-16 2012-08-22 富士通株式会社 Code rate control method and device used in video coding system
CN107615765A (en) * 2015-06-03 2018-01-19 联发科技股份有限公司 The method and apparatus of resource-sharing in video coding and decoding system between intra block replication mode and inter-frame forecast mode

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2008136178A1 (en) * 2007-04-26 2008-11-13 Panasonic Corporation Motion detection apparatus, motion detection method, and motion detection program
CN103763555A (en) * 2014-01-19 2014-04-30 林雁 Motion estimation method for reducing memory bandwidth requirements
CN105376586A (en) * 2015-11-17 2016-03-02 复旦大学 Three-level flow line hardware architecture suitable for integer motion estimation in HEVC standard

Also Published As

Publication number Publication date
WO2020024152A1 (en) 2020-02-06

Similar Documents

Publication Publication Date Title
CN107846597B (en) Data caching method and device for a video decoder
US7965773B1 (en) Macroblock cache
US9473778B2 (en) Skip thresholding in pipelined video encoders
KR100952861B1 (en) Processing digital video data
Gupte et al. Memory bandwidth and power reduction using lossy reference frame compression in video encoding
TWI688260B (en) Methods, systems, and devices including an encoder for image processing
US11223838B2 (en) AI-assisted programmable hardware video codec
TWI816684B (en) Video encoding device and encoder
JP2013532926A (en) Method and system for encoding video frames using multiple processors
CN111355962A (en) Video decoding caching method suitable for multiple reference frames, computer device and computer readable storage medium
CN101783958B (en) Method and device for computing temporal direct mode motion vectors in AVS (Audio Video coding Standard)
US8225043B1 (en) High performance caching for motion compensated video decoder
US20170013262A1 (en) Rate control encoding method and rate control encoding device using skip mode information
CN111432213A (en) Adaptive tile data size coding for video and image compression
CN110945872A (en) Video processing apparatus and method
US7881367B2 (en) Method of video coding for handheld apparatus
CN116233453B (en) Video coding method and device
US20150055707A1 (en) Method and Apparatus for Motion Compensation Reference Data Caching
TWI785073B (en) Multi-codec encoder and multi-codec encoding system
WO2021175108A1 (en) Inter-frame prediction method, encoder, decoder, and computer readable storage medium
JP2003153283A (en) Method for performing motion estimation in video encoding, video encoding system, and video encoding device
CN114584779A (en) Video coding method, device, equipment and medium based on the H.264 coding algorithm
US6873735B1 (en) System for improved efficiency in motion compensated video processing and method thereof
CN110945870A (en) Video processing apparatus and method
US20090201989A1 (en) Systems and Methods to Optimize Entropy Decoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200331